# Custom Chatbot Project

### **I chose this dataset (apart from the fact that I love movies and plays) because it is usually cumbersome to read through all of a production's information to find the specific details I want. I see this as something that could be extended to a larger dataset such as IMDb: a chatbot that gives quick answers about movies and helps people make decisions faster.**

In [1]:
# Imports
import ast
import openai
import tiktoken
import pandas as pd
# Note: openai.embeddings_utils is only available in the legacy openai<1.0 SDK
from openai.embeddings_utils import get_embedding, distances_from_embeddings

# Constants
batch_size = 100
COMPLETION_MODEL_NAME = "gpt-3.5-turbo-instruct"
EMBEDDING_MODEL_NAME = "text-embedding-ada-002"
openai.api_key = "YOUR API KEY"

## Data Wrangling

In [6]:
# Read in the character descriptions
df = pd.read_csv("data/character_descriptions.csv")
# Combine name, medium, setting, and description into a single text column
df['text'] = df.apply(lambda row: f"{row['Name']} is a character in a {row['Medium']} set in {row['Setting']}. {row['Description']}", axis=1)
df = df[['text']]

In [7]:
print(df.shape)
# Setting display options for DataFrame
pd.set_option('display.max_colwidth', None)  
pd.set_option('display.max_rows', None)
df

(55, 1)


Unnamed: 0,text
0,"Emily is a character in a Play set in England. A young woman in her early 20s, Emily is an aspiring actress and Alice's daughter. She has a bubbly personality and a quick wit, but struggles with self-doubt and insecurity. She's also in a relationship with George."
1,"Jack is a character in a Play set in England. A middle-aged man in his 40s, Jack is a successful businessman and Sarah's boss. He has a no-nonsense attitude, but is fiercely loyal to his friends and family. He's married to Alice."
2,"Alice is a character in a Play set in England. A woman in her late 30s, Alice is a warm and nurturing mother of two, including Emily. She's kind-hearted and empathetic, but can be overly protective of her children and prone to worrying. She's married to Jack."
3,"Tom is a character in a Play set in England. A man in his 50s, Tom is a retired soldier and John's son. He has a no-nonsense approach to life, but is haunted by his experiences in combat and struggles with PTSD. He's also in a relationship with Rachel."
4,"Sarah is a character in a Play set in England. A woman in her mid-20s, Sarah is a free-spirited artist and Jack's employee. She's creative, unconventional, and passionate about her work. However, she can also be flighty and impulsive at times."
5,"George is a character in a Play set in England. A man in his early 30s, George is a charming and charismatic businessman who is in a relationship with Emily. He's ambitious, confident, and always looking for the next big opportunity. However, he's also prone to bending the rules to get what he wants."
6,"Rachel is a character in a Play set in England. A woman in her late 20s, Rachel is a shy and introverted librarian who is in a relationship with Tom. She's intelligent, thoughtful, and has a deep love of books. However, she struggles with social anxiety and often feels like an outsider."
7,"John is a character in a Play set in England. A man in his 60s, John is a retired professor and Tom's father. He has a dry wit and a love of intellectual debate, but can also be stubborn and set in his ways."
8,"Maria is a character in a Movie set in Texas. A middle-aged Latina woman in her 40s, Maria is a hard-working single mother who owns a small family-run diner in a small Texas town. She's fiercely protective of her teenage daughter, Sofia, and is always trying to balance work and family."
9,"Caleb is a character in a Movie set in Texas. A young African American man in his early 20s, Caleb is a talented musician who dreams of making it big in the music industry. He's charismatic, confident, and has a way with words. However, he's also struggling with addiction and is constantly at odds with his strict religious family."


In [8]:
# Generate embeddings in batches
embeddings = []

for i in range(0, len(df), batch_size):
    # Send text data to OpenAI model to get embeddings
    response = openai.Embedding.create(
        input=df.iloc[i:i+batch_size]["text"].tolist(),
        engine=EMBEDDING_MODEL_NAME
    )
    
    # Add embeddings to list
    embeddings.extend([data["embedding"] for data in response["data"]])

# Add embeddings list to dataframe
df["embeddings"] = embeddings
df

Unnamed: 0,text,embeddings
0,"Emily is a character in a Play set in England. A young woman in her early 20s, Emily is an aspiring actress and Alice's daughter. She has a bubbly personality and a quick wit, but struggles with self-doubt and insecurity. She's also in a relationship with George.","[-0.021480176597833633, -0.010991205461323261, -0.01598135381937027, ...]"
1,"Jack is a character in a Play set in England. A middle-aged man in his 40s, Jack is a successful businessman and Sarah's boss. He has a no-nonsense attitude, but is fiercely loyal to his friends and family. He's married to Alice.","[0.0016696855891495943, -0.023069651797413826, -0.004641270264983177, ...]"
2,"Alice is a character in a Play set in England. A woman in her late 30s, Alice is a warm and nurturing mother of two, including Emily. She's kind-hearted and empathetic, but can be overly protective of her children and prone to worrying. She's married to Jack.","[0.005102536641061306, -0.0099968072026968, -0.017702678218483925, ...]"
3,"Tom is a character in a Play set in England. A man in his 50s, Tom is a retired soldier and John's son. He has a no-nonsense approach to life, but is haunted by his experiences in combat and struggles with PTSD. He's also in a relationship with Rachel.","[0.011180771514773369, -0.01593923568725586, 0.0001870408159447834, ...]"
4,"Sarah is a character in a Play set in England. A woman in her mid-20s, Sarah is a free-spirited artist and Jack's employee. She's creative, unconventional, and passionate about her work. However, she can also be flighty and impulsive at times.","[-0.014750869944691658, -0.025154927745461464, -0.014519383199512959, ...]"
5,"George is a character in a Play set in England. A man in his early 30s, George is a charming and charismatic businessman who is in a relationship with Emily. He's ambitious, confident, and always looking for the next big opportunity. However, he's also prone to bending the rules to get what he wants.","[-0.020272687077522278, -0.011866139248013496, -0.006424488965421915, ...]"
6,"Rachel is a character in a Play set in England. A woman in her late 20s, Rachel is a shy and introverted librarian who is in a relationship with Tom. She's intelligent, thoughtful, and has a deep love of books. However, she struggles with social anxiety and often feels like an outsider.","[-0.005774088203907013, -0.012860901653766632, 0.004610392265021801, ...]"
7,"John is a character in a Play set in England. A man in his 60s, John is a retired professor and Tom's father. He has a dry wit and a love of intellectual debate, but can also be stubborn and set in his ways.","[0.014504357241094112, -0.016475198790431023, -0.02327653020620346, ...]"
8,"Maria is a character in a Movie set in Texas. A middle-aged Latina woman in her 40s, Maria is a hard-working single mother who owns a small family-run diner in a small Texas town. She's fiercely protective of her teenage daughter, Sofia, and is always trying to balance work and family.","[-0.01540297269821167, -0.02033880539238453, -0.015601464547216892, ...]"
9,"Caleb is a character in a Movie set in Texas. A young African American man in his early 20s, Caleb is a talented musician who dreams of making it big in the music industry. He's charismatic, confident, and has a way with words. However, he's also struggling with addiction and is constantly at odds with his strict religious family.","[0.005176080856472254, -0.0296921469271183, 0.00675056641921401, ...]"
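
The embedding loop above slices the dataframe in steps of `batch_size`. With 55 rows and a batch size of 100, that works out to a single request. The sketch below isolates the slicing logic so the batch boundaries are easy to inspect; `iter_batches` and the placeholder `texts` list are hypothetical names used only for illustration, and no API call is made.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of `items` holding at most `batch_size` elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Placeholder for df["text"].tolist(); the real notebook has 55 rows
texts = [f"row {i}" for i in range(55)]

# Batch size used in the notebook: everything fits in one request
batch_lengths = [len(batch) for batch in iter_batches(texts, 100)]

# A smaller batch size splits the same rows across several requests
small_lengths = [len(batch) for batch in iter_batches(texts, 20)]

print(batch_lengths, small_lengths)
```

The last batch is simply shorter when the row count is not a multiple of the batch size, which matches how `df.iloc[i:i+batch_size]` behaves at the end of the dataframe.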


In [9]:
pd.reset_option('display.max_colwidth')
pd.reset_option('display.max_rows')
df.to_csv("embeddings.csv")

## Custom Query Completion

In [2]:
df = pd.read_csv("embeddings.csv", index_col=0)
df['embeddings'] = df['embeddings'].apply(ast.literal_eval)
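
The `ast.literal_eval` step is needed because a list stored in a CSV cell comes back as a string, not a list. The sketch below reproduces that round trip with the standard-library `csv` module standing in for pandas (an assumption made to keep the example dependency-free); the three-element vector is made-up sample data, not a real embedding.

```python
import ast
import csv
import io

# Made-up sample row; real embeddings are much longer vectors
rows = [
    {
        "text": "Emily is a character in a Play set in England.",
        "embeddings": [0.1, -0.2, 0.3],
    }
]

# Write to an in-memory CSV; the list is stored as its string representation
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["text", "embeddings"])
writer.writeheader()
for row in rows:
    writer.writerow({"text": row["text"], "embeddings": str(row["embeddings"])})

# Read it back: the embeddings cell is now a plain string
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
raw_value = loaded[0]["embeddings"]

# literal_eval safely parses the string back into a list of floats
parsed_value = ast.literal_eval(raw_value)

print(type(raw_value).__name__, type(parsed_value).__name__)
```

`ast.literal_eval` only accepts Python literals, so it is a safer choice than `eval` for parsing data read from a file.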

In [3]:
def get_rows_sorted_by_relevance(question, df):
    """
    Take a question string and a dataframe containing rows of text and
    their associated embeddings, and return a copy of that dataframe
    sorted from most to least relevant for that question
    """
    
    # Get embeddings for the question text
    question_embeddings = get_embedding(question, engine=EMBEDDING_MODEL_NAME)
    
    # Make a copy of the dataframe and add a "distances" column containing
    # the cosine distances between each row's embeddings and the
    # embeddings of the question
    df_copy = df.copy()
    df_copy["distances"] = distances_from_embeddings(
        question_embeddings,
        df_copy["embeddings"].values,
        distance_metric="cosine"
    )
    
    # Sort the copied dataframe by the distances and return it
    # (shorter distance = more relevant so we sort in ascending order)
    df_copy.sort_values("distances", ascending=True, inplace=True)
    return df_copy
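
To make the ranking step concrete, here is a minimal sketch of sorting by cosine distance, taken here to be 1 minus the cosine similarity. It is a pure-Python illustration, not the library's implementation; `cosine_distance` and the two-dimensional toy vectors are hypothetical stand-ins for real embeddings.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy vectors standing in for the question and row embeddings
question_vec = [1.0, 0.0]
rows = {
    "same direction": [2.0, 0.0],
    "orthogonal": [0.0, 3.0],
    "opposite": [-1.0, 0.0],
}

# Ascending distance puts the most relevant row first
ranked = sorted(rows, key=lambda name: cosine_distance(question_vec, rows[name]))

for name in ranked:
    print(name, cosine_distance(question_vec, rows[name]))
```

Cosine distance ignores vector length, so the row pointing in the same direction as the question ranks first even though its magnitude differs.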


In [4]:
def create_prompt(question, df, max_token_count):
    """
    Given a question and a dataframe containing rows of text and their
    embeddings, return a text prompt to send to a Completion model
    """
    # Create a tokenizer that is designed to align with our embeddings
    tokenizer = tiktoken.get_encoding("cl100k_base")
    
    # Count the number of tokens in the prompt template and question
    prompt_template = """
Answer the question based on the context below, and if the question
can't be answered based on the context, say "I don't know". 
The data contains character descriptions from theater, television, and film productions. 
Each row contains the name, description, medium, and setting for each character.

Context: 

{}

---

Question: {}
Answer:"""
    
    current_token_count = len(tokenizer.encode(prompt_template)) + \
                            len(tokenizer.encode(question))
    
    context = []
    for text in get_rows_sorted_by_relevance(question, df)["text"].values:
        
        # Increase the counter based on the number of tokens in this row
        text_token_count = len(tokenizer.encode(text))
        current_token_count += text_token_count
        
        # Add the row of text to the list if we haven't exceeded the max
        if current_token_count <= max_token_count:
            context.append(text)
        else:
            break

    return prompt_template.format("\n\n###\n\n".join(context), question)


def custom_question_answer(
    question, df, max_prompt_tokens=1800, max_answer_tokens=150
):
    """
    Given a question, a dataframe containing rows of text, and a maximum
    number of desired tokens in the prompt and response, return the
    answer to the question according to an OpenAI Completion model
    
    If the model produces an error, return an empty string
    """
    custom_prompt = create_prompt(question, df, max_prompt_tokens)
    try:
        response = openai.Completion.create(
            model=COMPLETION_MODEL_NAME,
            prompt=custom_prompt,
            max_tokens=max_answer_tokens
        )
        return response["choices"][0]["text"].strip()
    except Exception as e:
        print(e)
        return ""
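
The context-packing loop in `create_prompt` can be tried in isolation. This is a minimal sketch, not the notebook's code: `count_tokens` is a hypothetical stand-in that counts whitespace-separated words instead of calling `tokenizer.encode`, and `pack_context` and the sample rows are invented for illustration.

```python
def count_tokens(text):
    # Hypothetical stand-in for len(tokenizer.encode(text)):
    # counts whitespace-separated words, not real model tokens.
    return len(text.split())


def pack_context(rows, fixed_token_count, max_token_count):
    """Keep rows, in order, until adding one would exceed the budget."""
    used = fixed_token_count  # tokens already spent on template + question
    kept = []
    for text in rows:
        used += count_tokens(text)
        if used > max_token_count:
            break
        kept.append(text)
    return kept


rows = [
    "Emily is an aspiring actress",
    "Jack is a successful businessman",
    "Alice is a warm and nurturing mother of two",
]
kept = pack_context(rows, fixed_token_count=4, max_token_count=15)
print(kept)
```

Like the loop in `create_prompt`, this stops at the first row that does not fit rather than skipping it, and it does not count the separator that is later joined between rows, so the final prompt can run slightly over the nominal budget.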

In [5]:
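def simple_question_answer(question):
    """
    Ask the Completion model the question directly, with no added
    context from the dataset, and return its answer as a baseline
    """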
    simple_prompt = f"""
Question: "{question}"
Answer:
"""
    return openai.Completion.create(
        model=COMPLETION_MODEL_NAME,
        prompt=simple_prompt,
        max_tokens=150
    )["choices"][0]["text"].strip()

## Custom Performance Demonstration

### Question 1

In [7]:
question = "Who all are characters from England?"
simple_answer = simple_question_answer(question)
custom_answer = custom_question_answer(question, df)
print(f"Question: {question}\n\nSimple Answer:\n{simple_answer}\n\nCustom Answer:\n{custom_answer}")

Question: Who all are characters from England?

Simple Answer:
1. Sherlock Holmes 
2. James Bond 
3. Harry Potter 
4. Elizabeth Bennett 
5. Mr. Darcy 
6. Mary Poppins 
7. Mr. Bean 
8. Alice in Wonderland 
9. Oliver Twist 
10. Robin Hood 
11. Peter Pan 
12. Jane Eyre 
13. Dracula 
14. Ebenezer Scrooge 
15. Willy Wonka.

Custom Answer:
Emily, George, John, Jack, Alice, Tom, Sarah, Rachel


### Question 2

In [8]:
question = "What are the different locations that the characters are in?"
simple_answer = simple_question_answer(question)
custom_answer = custom_question_answer(question, df)
print(f"Question: {question}\n\nSimple Answer:\n{simple_answer}\n\nCustom Answer:\n{custom_answer}")

Question: What are the different locations that the characters are in?

Simple Answer:
There is not enough context to determine the specific characters and locations being referred to. However, some common locations where characters may be found include:

1. Home or residence: This could include a character's house, apartment, or any other place they live in.
2. Work or school: Many characters may spend a significant amount of time at their workplace or school.
3. Public places: Characters may be seen in various public places such as parks, cafes, malls, etc.
4. Travel destinations: Characters may go on trips or vacations to different locations such as beaches, mountains, or other countries.
5. The setting of the story: Depending on the plot of the story, characters may be in specific locations such as a medieval castle,

Custom Answer:
The characters are in England, Texas, USA, Australia, and Ancient Greece.


### Question 3

In [9]:
question = "Can you give all distinct medium and settings from the data?"
simple_answer = simple_question_answer(question)
custom_answer = custom_question_answer(question, df)
print(f"Question: {question}\n\nSimple Answer:\n{simple_answer}\n\nCustom Answer:\n{custom_answer}")

Question: Can you give all distinct medium and settings from the data?

Simple Answer:
1. Watercolor on paper 
2. Oil on canvas 
3. Charcoal on paper 
4. Acrylic on canvas 
5. Ink on paper 
6. Mixed media on canvas 
7. Pen and ink on paper 
8. Pencil on paper 
9. Digital print on paper 
10. Screen print on fabric 
11. Etching on paper 
12. Woodcut on paper 
13. Collage on paper 
14. Photography on canvas 
15. Ceramic sculpture 
16. Bronze sculpture 
17. Stone sculpture 
18. Glass blowing 
19. Encaustic painting on wood 
20. Graffiti on concrete.

Custom Answer:
The distinct mediums are Musical, Play, Movie, Reality Show, Limited Series, and Opera. The distinct settings are USA, England, Italy, and Australia.


# Result

### ***From these, we can see that the custom prompt built from the embeddings answers questions about the dataset much better than the simple prompt, although some of the custom answers are not completely correct (for example, Questions 2 and 3 return different lists of settings).***
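
One way to judge how complete the custom answers are is to compute the expected values directly from the source columns and compare them with what the model returned. The sketch below is only an illustration: it uses a few sample rows with the CSV's column names (`Name`, `Medium`, `Setting`), the two "Placeholder" rows are invented, and the helpers `distinct_values` and `names_where` are hypothetical.

```python
# Sample rows using the CSV's column names. The real file has 55 rows;
# the "Placeholder" rows are invented for illustration only.
rows = [
    {"Name": "Emily", "Medium": "Play", "Setting": "England"},
    {"Name": "Jack", "Medium": "Play", "Setting": "England"},
    {"Name": "Placeholder A", "Medium": "Movie", "Setting": "USA"},
    {"Name": "Placeholder B", "Medium": "Musical", "Setting": "USA"},
]


def distinct_values(rows, column):
    """Return the sorted distinct values found in one column."""
    return sorted({row[column] for row in rows})


def names_where(rows, column, value):
    """Return the names of all rows whose column equals the given value."""
    return [row["Name"] for row in rows if row[column] == value]


print(distinct_values(rows, "Medium"))
print(distinct_values(rows, "Setting"))
print(names_where(rows, "Setting", "England"))
```

With the full dataset loaded before its columns are collapsed into `text`, the same comparison could be made with `Series.unique()` on the `Medium` and `Setting` columns.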