**RESPONSE GENERATION - CHITCHAT**

In [None]:
!pip install transformers

In [None]:
!pip install -U nltk

In [None]:
!pip install rank_bm25

In [None]:
import pandas as pd
from rank_bm25 import BM25Okapi
from transformers import AutoTokenizer, AutoModelForCausalLM
import nltk
nltk.download('punkt')
# Recent NLTK releases look for 'punkt_tab'; downloading both keeps word_tokenize working across versions.
nltk.download('punkt_tab')

In [None]:
dataset = pd.read_pickle('chitchat.pkl')

text_corpus = dataset['message'].tolist()
tokenized_corpus = [nltk.word_tokenize(text.lower()) for text in text_corpus]
bm25 = BM25Okapi(tokenized_corpus)

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

In [None]:
import torch

# The GPT-2 tokenizer used by DialoGPT has no pad token by default, so reuse EOS.
tokenizer.pad_token = tokenizer.eos_token

def generate_response(prompt):
    # Retrieve the five corpus messages closest to the prompt under BM25.
    tokenized_query = nltk.word_tokenize(prompt.lower())
    top_messages = bm25.get_top_n(tokenized_query, text_corpus, n=5)

    responses = []
    for message in top_messages:
        # DialoGPT expects each turn to end with EOS before it generates a reply.
        input_ids = tokenizer.encode(message + tokenizer.eos_token, return_tensors='pt')

        output = model.generate(input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)
        # Keep only the newly generated tokens, dropping the echoed input.
        response = tokenizer.decode(output[0, input_ids.shape[-1]:], skip_special_tokens=True)

        responses.append(response)

    # NOTE: get_scores returns one score per corpus document, not per response,
    # so zip() pairs the first five corpus scores with the five responses.
    scores = bm25.get_scores(tokenizer.tokenize(' '.join(responses)))
    ranked_responses = [response for _, response in sorted(zip(scores, responses), reverse=True)]
    return ranked_responses[0]
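
`BM25Okapi.get_scores` returns one score per document in the indexed corpus, so its output cannot be paired one-to-one with the generated responses. A minimal sketch of per-candidate re-ranking follows. It swaps in a plain token-overlap (Jaccard) score rather than BM25, and `tokenize`, `jaccard`, and `rank_candidates` are hypothetical helper names that are not part of the notebook.

```python
import re

def tokenize(text):
    """Lowercase and split on runs of letters, digits, or apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())

def jaccard(a, b):
    """Jaccard overlap between two token collections (0.0 when both are empty)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0

def rank_candidates(query, candidates):
    """Return candidates sorted by token overlap with the query, best first."""
    query_tokens = tokenize(query)
    scored = [(jaccard(query_tokens, tokenize(c)), c) for c in candidates]
    # Sort on the score only, so tied candidates keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored]

candidates = ["Thanks.", "I'm good, how are you?", "It's going well."]
print(rank_candidates("How are you", candidates)[0])
```

Each candidate gets its own score here, which is the property the ranking step needs. Any other per-candidate scorer could be dropped in for `jaccard`.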

In [None]:
input_text = "How are you"  # renamed to avoid shadowing the built-in input()
response = generate_response(input_text)
print("Input Text : ", input_text)
print("Bot Response : ", response)

A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Input Text :  How are you
Bot Response :  I'm good, how are you?


In [None]:
input_text = "How is the weather today ?"
response = generate_response(input_text)
print("Input Text : ", input_text)
print("Bot Response : ", response)

A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Input Text :  How is the weather today ?
Bot Response :  It's going well.


In [None]:
input_text = "I like music"
response = generate_response(input_text)
print("Input Text : ", input_text)
print("Bot Response : ", response)

A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Input Text :  I like music
Bot Response :  I'm not sure, I've never really listened to music.


In [None]:
input_text = "The food was very good"
response = generate_response(input_text)
print("Input Text : ", input_text)
print("Bot Response : ", response)

A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Input Text :  The food was very good
Bot Response :  Thanks.


In [None]:
input_text = "I want to buy a new macbook"
response = generate_response(input_text)
print("Input Text : ", input_text)
print("Bot Response : ", response)

A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Input Text :  I want to buy a new macbook
Bot Response :  I'm sure you'll be able to afford it.


In [None]:
input_text = "My grandpa passed away"
response = generate_response(input_text)
print("Input Text : ", input_text)
print("Bot Response : ", response)

A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Input Text :  My grandpa passed away
Bot Response :  I'm sorry to hear that.


In [None]:
input_text = "I need to study a lot"
response = generate_response(input_text)
print("Input Text : ", input_text)
print("Bot Response : ", response)

A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


Input Text :  I need to study a lot
Bot Response :  I'm sure you'll get there.


**RESPONSE GENERATION - EMPATHIC**

In [None]:
!pip install sentence_transformers

In [None]:
import pandas as pd
import numpy as np
from sentence_transformers import SentenceTransformer
from rank_bm25 import BM25Okapi
import os
import joblib

df = pd.read_pickle("empathicdatafull.pkl")
df = df[["prompt", "message", "context"]]

models = {}
bm25_models = {}
embeddings = {}

In [None]:
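def train_context_model(context):
    context_df = df[df["context"] == context]

    # NOTE: the prompts are passed as raw strings, so BM25 indexes them
    # character by character; get_best_response passes the query the same way.
    prompt_list = context_df["prompt"].tolist()
    bm25 = BM25Okapi(prompt_list)

    # Embed the response messages (not the prompts) for the re-ranking stage.
    utterance_list = context_df["message"].tolist()
    model = SentenceTransformer('bert-base-nli-mean-tokens')
    message_embeddings = model.encode(utterance_list, show_progress_bar=True)

    os.makedirs("models", exist_ok=True)
    bm25_file = f"models/{context}_model.joblib"
    embeddings_file = f"models/{context}_embeddings.npy"

    # The encoder is saved once per context, although every copy is the same pretrained model.
    model.save(f"models/{context}_model")
    np.save(embeddings_file, message_embeddings)
    joblib.dump(bm25, bm25_file)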
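
`BM25Okapi` expects each document as a list of tokens; a raw string is iterated one character at a time, so an index built from `prompt_list` matches on letters rather than words. The sketch below shows the difference with one possible word-level tokenizer. `to_tokens` is a hypothetical helper, and the same function would have to be applied to the query before calling `get_scores`.

```python
import re

def to_tokens(text):
    """Split text into lowercase word tokens (a simple regex-based assumption)."""
    return re.findall(r"[a-z0-9']+", text.lower())

prompt = "I miss my family"

# Iterating a string yields characters, which is what an index sees when given raw strings.
char_level = list(prompt)
# Tokenizing first gives the index whole words to count.
word_level = to_tokens(prompt)

print(char_level[:6])
print(word_level)
```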

In [None]:
for context in ['proud','sad','sentimental','surprised','terrified','trusting']:
    print(f"Training model for context: {context}")
    train_context_model(context)

In [None]:
def load_context_model(context):
    model_path = f"models/{context}_model"
    embeddings_path = f"models/{context}_embeddings.npy"
    bm25_path = f"models/{context}_model.joblib"
    
    model = SentenceTransformer(model_path)
    embeddings = np.load(embeddings_path)
    bm25 = joblib.load(bm25_path)
    
    return model, bm25, embeddings

In [None]:
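# Only the contexts loaded here are available to get_best_response. Queries for
# other contexts (for example "afraid") assume the train and load steps were also run for them.
for context in ['proud','sad','sentimental','surprised','terrified','trusting']:
    print(f"Loading model for context: {context}")
    model, bm25, embedding = load_context_model(context)
    models[context] = model
    bm25_models[context] = bm25
    embeddings[context] = embedding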

In [None]:
def get_best_response(query, context):
    context_df = df[df["context"] == context]
    utterance_list = context_df["message"].tolist()

    # Stage 1: BM25 over the prompts, keeping the ten highest-scoring rows.
    scores = bm25_models[context].get_scores(query)
    top_indices = np.argsort(scores)[::-1][:10]

    # Stage 2: re-rank those rows by cosine similarity between the query
    # and the stored message embeddings.
    candidate_embeddings = embeddings[context][top_indices]
    # Use this context's encoder rather than whichever global `model` was loaded last.
    query_embedding = models[context].encode([query])[0]
    cos_scores = np.dot(candidate_embeddings, query_embedding) / (
        np.linalg.norm(candidate_embeddings, axis=1) * np.linalg.norm(query_embedding)
    )
    best_index = np.argmax(cos_scores)

    return utterance_list[top_indices[best_index]]
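
The second stage picks the candidate whose embedding has the highest cosine similarity to the query embedding. Below is a dependency-free sketch of that step, using toy 3-dimensional vectors in place of real sentence embeddings. `cosine` and `best_candidate` are hypothetical names, and the zero-norm guard is an addition that the cell above does not have.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def best_candidate(query_vec, candidate_vecs):
    """Index of the candidate vector most similar to the query."""
    scores = [cosine(query_vec, vec) for vec in candidate_vecs]
    return max(range(len(scores)), key=scores.__getitem__)

# Toy vectors standing in for sentence embeddings.
query_vec = [1.0, 0.0, 1.0]
candidate_vecs = [[0.0, 1.0, 0.0], [2.0, 0.0, 2.0], [1.0, 1.0, 0.0]]
print(best_candidate(query_vec, candidate_vecs))
```

Cosine similarity ignores vector length, so the second candidate wins even though it is twice as long as the query vector.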

In [None]:
query = "I am scared of the dark"
context = "afraid"
response = get_best_response(query, context)
print(f"Input : {query}")
print(f"Response : {response}")

Input : I am scared of the dark
Response : hahaha thats funny why are you still afraid of the dark?


In [None]:
query = "It was my mistake"
context = "guilty"
response = get_best_response(query, context)
print(f"Input : {query}")
print(f"Response : {response}")

Input : It was my mistake
Response : Oh that is so scary! What happened after?


In [None]:
query = "He will surely win tomorrow"
context = "confident"
response = get_best_response(query, context)
print(f"Input : {query}")
print(f"Response : {response}")

Input : He will surely win tomorrow
Response : I certainly hope so, that would be awesome!


In [None]:
query = "I am really mad at him"
context = "angry"
response = get_best_response(query, context)
print(f"Input : {query}")
print(f"Response : {response}")

Input : I am really mad at him
Response : Oh jeez, did you ask him why he did that?


In [None]:
query = "I am really looking forward to the match"
context = "excited"
response = get_best_response(query, context)
print(f"Input : {query}")
print(f"Response : {response}")

Input : I am really looking forward to the match
Response : I just love going, when it gets closer to when we are leaving I look forward to it.


In [None]:
query = "My mother loves me so much"
context = "caring"
response = get_best_response(query, context)
print(f"Input : {query}")
print(f"Response : {response}")

Input : My mother loves me so much
Response : My mom is so great! She is always there for me and her grandchildren.


In [None]:
query = "I miss my family"
context = "lonely"
response = get_best_response(query, context)
print(f"Input : {query}")
print(f"Response : {response}")

Input : I miss my family
Response : I'm longing for my family's love.


In [None]:
query = "I am scared of exams"
context = "terrified"
response = get_best_response(query, context)
print(f"Input : {query}")
print(f"Response : {response}")

Input : I am scared of exams
Response : That sounds terrifying! Are you okay now?


In [None]:
query = "I really miss those days"
context = "nostalgic"
response = get_best_response(query, context)
print(f"Input : {query}")
print(f"Response : {response}")

Input : I really miss those days
Response : Enjoy it, your 30s go by really fast. I'm clinging desperately to the last few years of mine myself.


In [None]:
query = "I am proud of your efforts"
context = "proud"
response = get_best_response(query, context)
print(f"Input : {query}")
print(f"Response : {response}")

Input : I am proud of your efforts
Response : That's great that they accomplished that and even better that you recognize their achievements.


In [None]:
query = "I trust in you."
context = "trusting"
response = get_best_response(query, context)
print(f"Input : {query}")
print(f"Response : {response}")

Input : I trust in you.
Response : Yes, I really felt like he had my best interests at heart.


**RESPONSE GENERATION - REDDIT**

In [None]:
!pip install rank_bm25
!pip install transformers

In [None]:
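import pandas as pd
from rank_bm25 import BM25Okapi
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import nltk
nltk.download('punkt')
# Recent NLTK releases look for 'punkt_tab'; downloading both keeps word_tokenize working across versions.
nltk.download('punkt_tab')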

tokenizer = GPT2Tokenizer.from_pretrained('gpt2-medium')
model = GPT2LMHeadModel.from_pretrained('gpt2-medium')

In [None]:
import re

def generate_response(prompt, subreddit):
    # The pickle is re-read and the BM25 index rebuilt on every call.
    dataset = pd.read_pickle("irdata_50k.pkl")
    dataset = dataset[dataset["subreddit"] == subreddit]

    text_corpus = dataset['message'].tolist()
    tokenized_corpus = [nltk.word_tokenize(text.lower()) for text in text_corpus]
    bm25 = BM25Okapi(tokenized_corpus)

    # Retrieve the five messages in this subreddit closest to the prompt.
    tokenized_query = nltk.word_tokenize(prompt.lower())
    top_messages = bm25.get_top_n(tokenized_query, text_corpus, n=5)

    responses = []
    for message in top_messages:
        input_ids = tokenizer.encode(message, return_tensors='pt')
        # max_length counts the prompt tokens too, so longer messages leave little or no room for new text.
        output_ids = model.generate(input_ids, max_length=10, num_return_sequences=1, temperature=1.0)
        response = tokenizer.decode(output_ids[0], skip_special_tokens=True)

        # Truncate the continuation at its first sentence end.
        if re.search(r'[.!]', response):
            response = re.split(r'[.!]', response)[0] + '.'

        # NOTE: the retrieved message is appended here, so the GPT-2
        # continuation in `response` is computed but never returned.
        responses.append(message)

    # NOTE: get_scores returns one score per corpus document, not per response.
    scores = bm25.get_scores(tokenizer.tokenize(' '.join(responses)))
    ranked_responses = [response for _, response in sorted(zip(scores, responses), reverse=True)]

    return ranked_responses[0]
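
`generate_response` re-reads the pickle and rebuilds the BM25 index on every call. One way to avoid that is to cache the per-subreddit work. The sketch below uses `functools.lru_cache` with an in-memory toy corpus; `TOY_DATA` and `build_index` are hypothetical stand-ins for the pickled dataset and the `BM25Okapi` construction.

```python
from functools import lru_cache

# Hypothetical stand-in for the rows of the pickled dataset.
TOY_DATA = [
    ("technology", "Phones get security updates for years."),
    ("technology", "Laptops are getting lighter."),
    ("education", "Professors grade a lot of papers."),
]

@lru_cache(maxsize=None)
def build_index(subreddit):
    """Tokenize the messages of one subreddit once and reuse the result."""
    messages = [text for sub, text in TOY_DATA if sub == subreddit]
    tokenized = [text.lower().split() for text in messages]
    # A real version would return BM25Okapi(tokenized) alongside the messages.
    return tuple(messages), tuple(tuple(doc) for doc in tokenized)

first = build_index("technology")
second = build_index("technology")
# The second call is served from the cache, so both names refer to the same object.
print(first is second)
print(build_index.cache_info())
```

With this shape, only the first request for a given subreddit pays the tokenization cost; later requests reuse the cached result.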

In [1]:
prompt = 'Are apple iphones secure?'
subreddit = "technology"
response = generate_response(prompt, subreddit)
print("Prompt: ", prompt)
print("Response: ",response)

Prompt: Are apple iphones secure?
Response: Where has Apple ever claimed your iCloud content is secure *from Apple*? I’m quite sure they haven’t.


In [2]:
prompt = 'What do you think about professors'
subreddit = "education"
response = generate_response(prompt, subreddit)
print("Prompt: ", prompt)
print("Response: ",response)

Prompt: What do you think about professors
Response: Haha you think college professors make that much more than high school teachers.....  That might be true if they could get a full time position maybe.


In [3]:
prompt = 'Talk about medical bills at the hospital'
subreddit = "healthcare"
response = generate_response(prompt, subreddit)
print("Prompt: ", prompt)
print("Response: ",response)

Prompt: Talk about medical bills at the hospital
Response: It's the same that whether it's the high deductible or the high price of medical bills, I still can't afford it


In [4]:
prompt = 'Donald trump makes a move in elections'
subreddit = "politics"
response = generate_response(prompt, subreddit)
print("Prompt: ", prompt)
print("Response: ",response)

Prompt: Donald trump makes a move in elections
Response: Its Saturday, almost 5pm EST. Why is Donald Trump still in power?


In [5]:
prompt = 'Talk about planet Earth'
subreddit = "environment"
response = generate_response(prompt, subreddit)
print("Prompt: ", prompt)
print("Response: ",response)

Prompt: Talk about planet Earth
Response: After global warming does its worst, the most habitable planet in the solar system will still be Earth.
