# Custom Chatbot Project

I want to build a Shakespearean character role-play bot that talks in the tone and voice of Katharina (Katherine, the "shrew") from "The Taming of the Shrew". The bot assumes the role of this character and replies to the user in her voice. The dataset I want to use contains lines scraped from Shakespeare's plays, including "The Taming of the Shrew":
https://www.kaggle.com/datasets/guslovesmath/shakespeare-plays-dataset

I want to accomplish this task using in-context learning with one-shot and few-shot prompts.
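To make the difference between one-shot and few-shot prompting concrete, here is a minimal sketch of how such prompts could be laid out as chat messages. The helper name `build_few_shot_messages`, the instruction text, and the invented user turns are assumptions for illustration; only the replies are Katharina's lines from the dataset.

```python
from typing import Dict, List, Tuple


def build_few_shot_messages(
    instruction: str,
    examples: List[Tuple[str, str]],
    question: str,
) -> List[Dict[str, str]]:
    """Return a chat message list: system instruction, worked examples, then the real question."""
    messages = [{"role": "system", "content": instruction}]
    for user_text, assistant_text in examples:
        # Each example is shown to the model as a user turn followed by the desired reply.
        messages.append({"role": "user", "content": user_text})
        messages.append({"role": "assistant", "content": assistant_text})
    messages.append({"role": "user", "content": question})
    return messages


# Hypothetical pairing: the user turns are invented, the replies are lines from the dataset.
examples = [
    ("Surely you are only joking, Kate?", "If that be jest, then all the rest was so."),
    ("Your sister will not answer you.", "Her silence flouts me, and I'll be revenged."),
]
instruction = "Answer as Katharina from The Taming of the Shrew."
question = "What do you think of suitors?"

one_shot = build_few_shot_messages(instruction, examples[:1], question)  # one example
few_shot = build_few_shot_messages(instruction, examples, question)      # several examples
print(len(one_shot), len(few_shot))  # 4 6
```

A one-shot prompt carries a single example pair and a few-shot prompt carries several; everything else about the request stays the same.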

## Data Wrangling


In [1]:
!pip install openai

Collecting openai
  Downloading openai-1.31.0-py3-none-any.whl.metadata (21 kB)
Collecting distro<2,>=1.7.0 (from openai)
  Downloading distro-1.9.0-py3-none-any.whl.metadata (6.8 kB)
Collecting pydantic<3,>=1.9.0 (from openai)
  Downloading pydantic-2.7.3-py3-none-any.whl.metadata (108 kB)
Collecting annotated-types>=0.4.0 (from pydantic<3,>=1.9.0->openai)
  Downloading annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB)
Collecting pydantic-core==2.18.4 (from pydantic<3,>=1.9.0->openai)
  Downloading pydantic_core-2.18.4-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.5 kB)
Downloading openai-1.31.0-py3-none-any.whl (324 kB)
Downloading distro-1.9.0-py3-none-any.whl (20 kB)
Downloading pydantic-2.7.3-py3-n

In [2]:
import pandas as pd
import numpy as np
import openai
from typing import List, Union, Dict
from scipy.spatial.distance import cosine

In [36]:
data = pd.read_csv('shakespeare_plays.csv')

In [37]:
data.head()

Unnamed: 0.1,Unnamed: 0,play_name,genre,character,act,scene,sentence,text,sex
0,0,All's Well That Ends Well,Comedy,Countess,1,1,1,"In delivering my son from me, I bury a second ...",female
1,1,All's Well That Ends Well,Comedy,Bertram,1,1,2,"And I in going, madam, weep o'er my father's d...",male
2,2,All's Well That Ends Well,Comedy,Bertram,1,1,3,"anew: but I must attend his majesty's command, to",male
3,3,All's Well That Ends Well,Comedy,Bertram,1,1,4,"whom I am now in ward, evermore in subjection.",male
4,4,All's Well That Ends Well,Comedy,Lafeu,1,1,5,"You shall find of the king a husband, madam; you,",male


In [5]:
data.shape

(108093, 9)

In [5]:
data['genre'].value_counts()

genre
Comedy     45901
Tragedy    31473
History    30719
Name: count, dtype: int64

In [6]:
data['sex'].value_counts()


sex
male      89038
female    19055
Name: count, dtype: int64

In [7]:
data['play_name'].value_counts()

play_name
Hamlet                         4023
Coriolanus                     3761
Cymbeline                      3755
Richard III                    3702
Antony and Cleopatra           3565
Othello                        3558
King Lear                      3499
Troilus and Cressida           3456
Winter's Tale                  3362
Henry IV, part 2               3251
Henry VIII                     3236
Henry V                        3230
Henry VI, part 2               3122
Romeo and Juliet               3079
Henry IV, part 1               3038
Henry VI, part 3               2931
All's Well That Ends Well      2925
Love's Labours Lost            2862
Measure for Measure            2833
Richard II                     2800
Henry VI, part 1               2761
As You Like It                 2676
The Merchant of Venice         2665
King John                      2648
Taming of the Shrew            2637
The\nMerry Wives of Windsor    2615
Julius Caesar                  2599
Much Ado About Not

# Sample from dataset
We will use only the lines from the play `Taming of the Shrew` to demonstrate our use case.

In [38]:
data['play_name'] = data['play_name'].str.strip().str.lower()

In [39]:
data.head()

Unnamed: 0.1,Unnamed: 0,play_name,genre,character,act,scene,sentence,text,sex
0,0,all's well that ends well,Comedy,Countess,1,1,1,"In delivering my son from me, I bury a second ...",female
1,1,all's well that ends well,Comedy,Bertram,1,1,2,"And I in going, madam, weep o'er my father's d...",male
2,2,all's well that ends well,Comedy,Bertram,1,1,3,"anew: but I must attend his majesty's command, to",male
3,3,all's well that ends well,Comedy,Bertram,1,1,4,"whom I am now in ward, evermore in subjection.",male
4,4,all's well that ends well,Comedy,Lafeu,1,1,5,"You shall find of the king a husband, madam; you,",male


In [40]:
data['play_name'].value_counts()

play_name
hamlet                         4023
coriolanus                     3761
cymbeline                      3755
richard iii                    3702
antony and cleopatra           3565
othello                        3558
king lear                      3499
troilus and cressida           3456
winter's tale                  3362
henry iv, part 2               3251
henry viii                     3236
henry v                        3230
henry vi, part 2               3122
romeo and juliet               3079
henry iv, part 1               3038
henry vi, part 3               2931
all's well that ends well      2925
love's labours lost            2862
measure for measure            2833
richard ii                     2800
henry vi, part 1               2761
as you like it                 2676
the merchant of venice         2665
king john                      2648
taming of the shrew            2637
the\nmerry wives of windsor    2615
julius caesar                  2599
much ado about not
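One title in the counts above, `the\nmerry wives of windsor`, still appears to contain an embedded line break, because `str.strip()` only trims the ends of a string. It does not affect the play used here, but the following sketch shows one way such titles could be normalized. The helper name `normalize_title` is an assumption, and the sample list is illustrative rather than read from the CSV.

```python
import re


def normalize_title(title: str) -> str:
    """Collapse every run of whitespace (including newlines) to one space, then trim and lowercase."""
    return re.sub(r"\s+", " ", title).strip().lower()


# Illustrative titles, including one with an embedded newline.
titles = ["The\nMerry Wives of Windsor", "  Taming of the Shrew ", "Hamlet"]
normalized = [normalize_title(t) for t in titles]
print(normalized)
# ['the merry wives of windsor', 'taming of the shrew', 'hamlet']
```

On the DataFrame itself, the same idea could probably be expressed with `data['play_name'].str.replace(r'\s+', ' ', regex=True)` before the `strip()` and `lower()` calls.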

In [41]:
# Keep only the rows for the play of interest (play names were lowercased above).
data = data[data['play_name']=='taming of the shrew']

In [42]:
data.shape

(2637, 9)

In [43]:
data.head()

Unnamed: 0.1,Unnamed: 0,play_name,genre,character,act,scene,sentence,text,sex
29451,29451,taming of the shrew,Comedy,Sly,0,1,1,"I'll pheeze you, in faith.",male
29452,29452,taming of the shrew,Comedy,Hostess,0,1,2,"A pair of stocks, you rogue!",female
29453,29453,taming of the shrew,Comedy,Sly,0,1,3,Ye are a baggage: the Slys are no rogues; look in,male
29454,29454,taming of the shrew,Comedy,Sly,0,1,4,the chronicles; we came in with Richard Conque...,male
29455,29455,taming of the shrew,Comedy,Sly,0,1,5,Therefore paucas pallabris; let the world slid...,male


In [44]:
data['character'].value_counts()

character
Petruchio          587
Tranio             293
Katharina          216
Hortensio          206
Lucentio           190
Baptista           175
Grumio             171
Gremio             170
Lord               138
Biondello          102
Bianca              71
Sly                 63
Pedant              50
Vincentio           47
Curtis              25
Tailor              17
Page                16
First Servant       16
Third Servant       12
Second Servant      12
Widow               11
First Huntsman       9
Messenger            8
Servant              5
Nathaniel            4
A Player             4
Hostess              4
Second Huntsman      3
Katarina             3
Peter                2
Players              1
Hortensia            1
Philip               1
Joseph               1
Nicholas             1
Haberdasher          1
All                  1
Name: count, dtype: int64

In [45]:
data = data[data['character']=='Katharina']

In [46]:
data.head()

Unnamed: 0.1,Unnamed: 0,play_name,genre,character,act,scene,sentence,text,sex
29789,29789,taming of the shrew,Comedy,Katharina,1,1,57,"I pray you, sir, is it your will",female
29790,29790,taming of the shrew,Comedy,Katharina,1,1,58,To make a stale of me amongst these mates?,female
29793,29793,taming of the shrew,Comedy,Katharina,1,1,61,"I'faith, sir, you shall never need to fear:",female
29794,29794,taming of the shrew,Comedy,Katharina,1,1,62,I wis it is not half way to her heart;,female
29795,29795,taming of the shrew,Comedy,Katharina,1,1,63,"But if it were, doubt not her care should be",female
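The character counts list 216 lines under `Katharina` and 3 more under `Katarina`, so an exact match on `Katharina` leaves those 3 lines out. Assuming both spellings refer to the same character (which the dataset does not state), a small alias table could merge them before filtering. The names `CHARACTER_ALIASES` and `canonical_character` are hypothetical.

```python
from collections import Counter

# Assumed alias table: spelling variants mapped to one canonical name.
CHARACTER_ALIASES = {"Katarina": "Katharina"}


def canonical_character(name: str) -> str:
    """Trim the name and replace a known variant with its canonical spelling."""
    cleaned = name.strip()
    return CHARACTER_ALIASES.get(cleaned, cleaned)


# Illustrative speaker labels, not read from the CSV.
speakers = ["Katharina", "Katarina", "Petruchio", "Katharina "]
counts = Counter(canonical_character(s) for s in speakers)
print(counts)
# Counter({'Katharina': 3, 'Petruchio': 1})
```

With pandas, the same table could be applied through `data['character'].replace(CHARACTER_ALIASES)` before the filter.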


# Baseline: assessing the model's performance without any modifications

In [10]:
OPENAI_API_KEY = "API_KEY"
EMBEDDING_MODEL = 'text-embedding-3-small'
COMPLETION_MODEL = 'gpt-3.5-turbo'
BATCH_SIZE = 2

openai_client = openai.OpenAI(api_key=OPENAI_API_KEY)
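Since the key above is a placeholder string, one option is to read the real key from an environment variable so it never ends up in the notebook. This is a minimal sketch; the helper `load_api_key` is an assumption and not part of the `openai` library.

```python
import os
from typing import Mapping, Optional


def load_api_key(env: Optional[Mapping[str, str]] = None, name: str = "OPENAI_API_KEY") -> str:
    """Read the API key from an environment mapping and fail early if it is missing."""
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before creating the client.")
    return key


# Demonstration with an explicit mapping, so no real key is needed to run this cell.
print(load_api_key({"OPENAI_API_KEY": "sk-example"}))  # sk-example
```

The returned value can then be passed as `api_key=load_api_key()` when constructing the client.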

In [47]:
# Example OpenAI Python library request
MODEL = "gpt-3.5-turbo"
response = openai_client.chat.completions.create(
  model=MODEL,
  messages=[
      {
        "role": "system",
        "content": "Please create a Shakespearean tragedy style quote for not being able to sleep. Do it in the tone of the character Katharine the Shrew from the play The Taming of the Shrew"
      }],
  temperature=1,
  max_tokens=256,
  top_p=1,
  frequency_penalty=0,
  presence_penalty=0
)


In [48]:
response.choices[0].message.content.split("\n")

['"Thou cruel night, that doth torment mine eyes with restless vigils! O sleep, thou art a fickle lover, forsaking me in mine hour of need. For what offense have I committed, that thou deny me rest? This cursed insomnia doth plague me, turning my mind to madness and mine body to weariness. Would that I could find sweet slumber\'s embrace, but alas, it eludes me like a fleeting dream. O sleep, thou art a cruel mistress, taunting me with thy absence and leaving me to suffer in the darkness of mine own mind."']

## This response is in a generic Shakespearean style, but not particularly in Katharina's voice or tone. I want to capture her sassiness and sharp wit.

## Custom Query Completion

In [49]:
from typing import List, Union, Dict
from scipy.spatial.distance import cosine

In [54]:
def build_simple_prompt(question):
    return [
        {
            'role': 'user',
            'content': question
        }
    ]

In [64]:
def get_context(character_profile):
    # Use the first 100 lines of the character's dialogue as prompt context.
    context = character_profile.tolist()[:100]
    return context

In [65]:
print(get_context(data['text']))

['I pray you, sir, is it your will', 'To make a stale of me amongst these mates?', "I'faith, sir, you shall never need to fear:", 'I wis it is not half way to her heart;', 'But if it were, doubt not her care should be', "To comb your noddle with a three-legg'd stool", 'And paint your face and use you like a fool.', 'A pretty peat! it is best', 'Put finger in the eye, an she knew why.', 'Why, and I trust I may go too, may I not? What,', 'shall I be appointed hours; as though, belike, I', 'knew not what to take and what to leave, ha?', 'Of all thy suitors, here I charge thee, tell', 'Whom thou lovest best: see thou dissemble not.', "Minion, thou liest. Is't not Hortensio?", 'O then, belike, you fancy riches more:', 'You will have Gremio to keep you fair.', 'If that be jest, then all the rest was so.', "Her silence flouts me, and I'll be revenged.", 'What, will you not suffer me? Nay, now I see', 'She is your treasure, she must have a husband;', 'I must dance bare-foot on her wedding day'
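The context above consists of single verse lines, many of which break off mid-sentence. One possible refinement is to join consecutive lines into whole speeches before using them as examples. The sketch below assumes that the `sentence` column increases by one within a speech and jumps when another character speaks, as the sample rows (57, 58, then 61, 62, 63) suggest; it does not account for numbering that restarts between scenes. The function name `group_into_speeches` is hypothetical.

```python
from typing import List, Optional, Tuple


def group_into_speeches(rows: List[Tuple[int, str]]) -> List[str]:
    """Join rows whose sentence numbers are consecutive into one speech each."""
    speeches: List[str] = []
    current: List[str] = []
    previous_no: Optional[int] = None
    for sentence_no, text in rows:
        # A gap in the numbering means someone else spoke in between.
        if previous_no is not None and sentence_no != previous_no + 1:
            speeches.append(" ".join(current))
            current = []
        current.append(text.strip())
        previous_no = sentence_no
    if current:
        speeches.append(" ".join(current))
    return speeches


# (sentence, text) pairs copied from the data.head() output for Katharina.
rows = [
    (57, "I pray you, sir, is it your will"),
    (58, "To make a stale of me amongst these mates?"),
    (61, "I'faith, sir, you shall never need to fear:"),
    (62, "I wis it is not half way to her heart;"),
    (63, "But if it were, doubt not her care should be"),
]
speeches = group_into_speeches(rows)
for speech in speeches:
    print(speech)
```

For the five rows shown, this yields two speeches instead of five fragments.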

In [67]:
def handle_question(prompt, client, model_name=COMPLETION_MODEL):
    response = client.chat.completions.create(
        model=model_name,
        messages=prompt,
        max_tokens=256
    )
    return response.choices[0].message.content

In [80]:
def build_custom_prompt(question, database_df):
    # Fetch the context once and reuse it for both the log line and the prompt.
    context = get_context(database_df)
    print("\n \n Found context :: {} ".format(context))
    return [
        {
            'role': 'system',
            'content': """
                Provide an answer to the user's question based on the tone and speaking style of Katharina's quotes in the context. Answer in her sharp-witted tone. The context is a series of quotes separated by blank lines. If context is provided, use it; otherwise just follow
                the instruction without context.
            Context: 
                {}
            """.format('\n\n'.join(context))
        },
        {
            'role': 'user',
            'content': question
        }
    ]
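The system prompt embeds up to 100 quotes regardless of their length. If prompt size becomes a concern, a character budget is one simple way to cap it. The sketch below is an assumption rather than part of the notebook: `trim_context` and the `max_chars` value are hypothetical, and characters are only a rough stand-in for tokens.

```python
from typing import List


def trim_context(lines: List[str], max_chars: int, separator: str = "\n\n") -> List[str]:
    """Keep leading lines while the joined text stays within max_chars."""
    kept: List[str] = []
    used = 0
    for line in lines:
        # The separator is only paid for between lines, not before the first one.
        cost = len(line) + (len(separator) if kept else 0)
        if used + cost > max_chars:
            break
        kept.append(line)
        used += cost
    return kept


# Quotes taken from the context printed earlier.
quotes = [
    "If that be jest, then all the rest was so.",
    "Minion, thou liest. Is't not Hortensio?",
    "A pretty peat! it is best",
]
context = trim_context(quotes, max_chars=90)
print(len(context), len("\n\n".join(context)))
```

The trimmed list can be joined with the same `'\n\n'` separator that `build_custom_prompt` uses.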

## Custom Performance Demonstration

### Question 1

In [81]:
question_1 = 'Please create a star wars quote'

print('Answer without Context: \n', handle_question(build_simple_prompt(question_1), openai_client))

print('\nAnswer with Context: \n', handle_question(build_custom_prompt(question_1, data['text']), openai_client))

Answer without Context: 
 "May the Force be with you, always."

 
 Found context :: ['I pray you, sir, is it your will', 'To make a stale of me amongst these mates?', "I'faith, sir, you shall never need to fear:", 'I wis it is not half way to her heart;', 'But if it were, doubt not her care should be', "To comb your noddle with a three-legg'd stool", 'And paint your face and use you like a fool.', 'A pretty peat! it is best', 'Put finger in the eye, an she knew why.', 'Why, and I trust I may go too, may I not? What,', 'shall I be appointed hours; as though, belike, I', 'knew not what to take and what to leave, ha?', 'Of all thy suitors, here I charge thee, tell', 'Whom thou lovest best: see thou dissemble not.', "Minion, thou liest. Is't not Hortensio?", 'O then, belike, you fancy riches more:', 'You will have Gremio to keep you fair.', 'If that be jest, then all the rest was so.', "Her silence flouts me, and I'll be revenged.", 'What, will you not suffer me? Nay, now I see', 'She is y

### Question 2

In [82]:
question_2 = 'Do you think peacocks are good-looking?'

print('Answer without Context: \n', handle_question(build_simple_prompt(question_2), openai_client))

print('\nAnswer with Context: \n', handle_question(build_custom_prompt(question_2, data['text']), openai_client))

Answer without Context: 
 Many people find peacocks to be very beautiful and striking with their vibrant colors and elaborate plumage. They are often considered to be one of the most visually stunning birds in the animal kingdom.

 
 Found context :: ['I pray you, sir, is it your will', 'To make a stale of me amongst these mates?', "I'faith, sir, you shall never need to fear:", 'I wis it is not half way to her heart;', 'But if it were, doubt not her care should be', "To comb your noddle with a three-legg'd stool", 'And paint your face and use you like a fool.', 'A pretty peat! it is best', 'Put finger in the eye, an she knew why.', 'Why, and I trust I may go too, may I not? What,', 'shall I be appointed hours; as though, belike, I', 'knew not what to take and what to leave, ha?', 'Of all thy suitors, here I charge thee, tell', 'Whom thou lovest best: see thou dissemble not.', "Minion, thou liest. Is't not Hortensio?", 'O then, belike, you fancy riches more:', 'You will have Gremio to k