# Custom Chatbot Project

TODO: In this cell, write an explanation of which dataset you have chosen and why it is appropriate for this task

I selected the Wikipedia page for 2022 (https://en.wikipedia.org/wiki/2022) as my dataset because it records many notable events that took place over the course of that year. It covers a wide variety of topics, which makes it a suitable knowledge source for a custom chatbot.

## Data Wrangling

TODO: In the cells below, load your chosen dataset into a `pandas` dataframe with a column named `"text"`. This column should contain all of your text data, separated into at least 20 rows.

In [52]:
import requests
import pandas as pd

# Fetch the plain-text extract of a Wikipedia page (used below for the "2022" page)
def fetch_wikipedia_extract(title):
    endpoint = "https://en.wikipedia.org/w/api.php"
    params = {
        "action": "query",
        "prop": "extracts",
        "format": "json",
        "titles": title,
        "explaintext": True
    }

    response = requests.get(endpoint, params=params)
    
    if response.status_code == 200:
        data = response.json()
        page = next(iter(data['query']['pages'].values()))
        return page.get('extract', '')
    else:
        print(f"Failed to fetch data. Status code: {response.status_code}")
        return ""

content = fetch_wikipedia_extract("2022")

paragraphs = content.split("\n")

while len(paragraphs) < 20:
    paragraphs.extend(paragraphs)

paragraphs = paragraphs[:20]

df = pd.DataFrame(paragraphs, columns=["text"])

print(df)

                                                 text
0   2022 (MMXXII) was a common year starting on Sa...
1   The year saw the removal of nearly all COVID-1...
2   2022 was also dominated by wars and armed conf...
3                                                    
4                                                    
5                                        == Events ==
6                                                    
7                                                    
8                                     === January ===
9    January 1 – The Regional Comprehensive Econom...
10  January 2 – Abdalla Hamdok resigns as Prime Mi...
11  January 4 – The five permanent members of the ...
12  January 5 – A nationwide state of emergency is...
13  January 6 – The CSTO deploys a "peacekeeping" ...
14  January 7 – COVID-19 pandemic: The number of C...
15  January 9 – February 6 – The 2021 Africa Cup o...
16  January 10 – The first successful heart transp...
17  January 15 – A large eru

In [53]:
# Remove blank rows
df = df[df["text"].str.len()>0]
df

Unnamed: 0,text
0,2022 (MMXXII) was a common year starting on Sa...
1,The year saw the removal of nearly all COVID-1...
2,2022 was also dominated by wars and armed conf...
5,== Events ==
8,=== January ===
9,January 1 – The Regional Comprehensive Econom...
10,January 2 – Abdalla Hamdok resigns as Prime Mi...
11,January 4 – The five permanent members of the ...
12,January 5 – A nationwide state of emergency is...
13,"January 6 – The CSTO deploys a ""peacekeeping"" ..."


In [54]:
# Remove section-heading rows, which start with "=="
df = df[~df["text"].str.startswith("==")]

In [55]:
df.tail(20)

Unnamed: 0,text
0,2022 (MMXXII) was a common year starting on Sa...
1,The year saw the removal of nearly all COVID-1...
2,2022 was also dominated by wars and armed conf...
9,January 1 – The Regional Comprehensive Econom...
10,January 2 – Abdalla Hamdok resigns as Prime Mi...
11,January 4 – The five permanent members of the ...
12,January 5 – A nationwide state of emergency is...
13,"January 6 – The CSTO deploys a ""peacekeeping"" ..."
14,January 7 – COVID-19 pandemic: The number of C...
15,January 9 – February 6 – The 2021 Africa Cup o...
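
The output above has only 10 rows, because the first 20 lines of the extract are selected before blank lines and headings are removed. A minimal sketch of one way to reach the 20-row minimum is to filter first and check the count afterwards. The helper name `clean_paragraphs` and the `min_rows` parameter are hypothetical, and the sketch works on plain Python strings so it runs without network access.

```python
def clean_paragraphs(content, min_rows=20):
    """Split extract text into non-empty, non-heading lines.

    Hypothetical helper: it filters first and then checks the row count,
    instead of truncating to 20 lines before filtering.
    """
    lines = [line.strip() for line in content.split("\n")]
    # Drop blank lines and wiki section headings such as "== Events ==".
    rows = [line for line in lines if line and not line.startswith("==")]
    if len(rows) < min_rows:
        raise ValueError(f"Only {len(rows)} usable rows; need at least {min_rows}.")
    return rows


# Tiny made-up sample to show the behaviour; not real Wikipedia text.
sample = "Intro line\n\n== Events ==\n=== January ===\nJanuary 1 – Example event"
print(clean_paragraphs(sample, min_rows=2))
```

The returned list could then be passed to `pd.DataFrame(rows, columns=["text"])` in the same way as the `paragraphs` list in the notebook.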


In [56]:
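# Give every row a "date – event" shape.
# A row that parses as a bare date becomes the current prefix; any other row
# without " – " gets "<prefix> – " prepended. Bare-date rows are then dropped.
from dateutil.parser import parse

df = df.copy()  # work on a copy so the assignments below modify this frame
prefix = ""
for i, row in df.iterrows():
    if " – " not in row["text"]:
        try:
            parse(row["text"])
            prefix = row["text"]
        except (ValueError, OverflowError):
            # Assign through df.at; writing to `row` only changes a temporary copy
            df.at[i, "text"] = prefix + " – " + row["text"]
df = df[df["text"].str.contains(" – ")].reset_index(drop=True)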

In [57]:
df

Unnamed: 0,text
0,– 2022 (MMXXII) was a common year starting on...
1,– The year saw the removal of nearly all COVI...
2,– 2022 was also dominated by wars and armed c...
3,January 1 – The Regional Comprehensive Econom...
4,January 2 – Abdalla Hamdok resigns as Prime Mi...
5,January 4 – The five permanent members of the ...
6,January 5 – A nationwide state of emergency is...
7,"January 6 – The CSTO deploys a ""peacekeeping"" ..."
8,January 7 – COVID-19 pandemic: The number of C...
9,January 9 – February 6 – The 2021 Africa Cup o...
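
The prefixing step can also be sketched on a plain list of strings, which makes its three cases easier to see. This version swaps `dateutil.parser.parse` for a simple regular expression, on the assumption that a bare date line looks like "January 15". The names `DATE_ONLY` and `add_date_prefix` are hypothetical.

```python
import re

# Assumption: a bare date line looks like "January 15". This regex stands in
# for dateutil.parser.parse, which the notebook cell uses.
DATE_ONLY = re.compile(
    r"^(January|February|March|April|May|June|July|August|"
    r"September|October|November|December) \d{1,2}$"
)


def add_date_prefix(rows, sep=" – "):
    """Prepend the latest bare-date line to rows that lack a separator."""
    result = []
    prefix = ""
    for text in rows:
        if sep in text:
            result.append(text)   # already in "date – event" form
        elif DATE_ONLY.match(text):
            prefix = text         # remember the date and drop the bare line
        else:
            result.append(prefix + sep + text)
    return result


# Made-up rows for illustration only.
rows = ["Intro text", "January 15", "Event A", "January 16 – Event B"]
print(add_date_prefix(rows))
```

As in the notebook output, rows that appear before any date line receive an empty prefix and therefore start with the separator.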


## Custom Query Completion

TODO: In the cells below, compose a custom query using your chosen dataset and retrieve results from an OpenAI `Completion` model. You may copy and paste any useful code from the course materials.

In [58]:
import openai

# Set your OpenAI API key
api_key = 'your_api_Key'
openai.api_key = api_key

In [59]:
# Define a function to query the OpenAI Completion model
def query_openai_model(prompt):
    response = openai.Completion.create(
        model="gpt-3.5-turbo-instruct",
        prompt=prompt,
        max_tokens=150
    )
    return response.choices[0].text.strip()
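
`openai.Completion.create` is the interface of `openai` package versions before 1.0. As a hedged sketch for newer versions, the same request can go through a client object's `completions.create` method. The function name `query_completion` is hypothetical. The client is passed in as a parameter so the sketch itself needs no API key; in practice it would be created with `from openai import OpenAI` followed by `OpenAI(api_key=...)`.

```python
def query_completion(client, prompt, model="gpt-3.5-turbo-instruct", max_tokens=150):
    """Send a prompt to a completion model and return the stripped text.

    `client` is assumed to be an openai.OpenAI instance (openai >= 1.0),
    or any object exposing the same completions.create interface.
    """
    response = client.completions.create(
        model=model,
        prompt=prompt,
        max_tokens=max_tokens,
    )
    return response.choices[0].text.strip()
```

Passing the client explicitly also makes it possible to substitute a stub object when checking the surrounding code without calling the API.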

In [60]:
# Define a custom query function using the dataset
def custom_query(question, df):
    context = " ".join(df['text'].tolist())
    prompt = f"Context: {context}\n\nQuestion: {question}\nAnswer:"
    return query_openai_model(prompt)

# Demonstrate performance with example questions
questions = [
    "What were some major events in 2022?",
    "What were the significant political changes in 2022?"
]

for question in questions:
    basic_answer = query_openai_model(question)
    custom_answer = custom_query(question, df)
    
    print(f"Basic Completion Model Answer for '{question}':\n{basic_answer}\n")
    print(f"Custom Query Answer for '{question}':\n{custom_answer}\n")

Basic Completion Model Answer for 'What were some major events in 2022?':
1. Winter Olympics in Beijing, China
2. FIFA World Cup in Qatar
3. Commonwealth Games in Birmingham, UK
4. G20 Summit in Rome, Italy
5. United Nations Climate Change Conference (COP26) in Glasgow, Scotland
6. Presidential elections in France
7. European Union parliamentary elections
8. SpaceX's first commercial crewed flight to the International Space Station
9. Global protests and strikes for climate action
10. Continued global COVID-19 pandemic response and vaccine distribution efforts.

Custom Query Answer for 'What were some major events in 2022?':

Basic Completion Model Answer for 'What were the significant political changes in 2022?':
1. Presidential Elections: One of the most significant political changes in 2022 was the presidential elections in various countries around the world. Several countries, including Brazil, France, and Russia, held their presidential elections, bringing in new leaders and chang
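
`custom_query` joins every row into the prompt, which works for a small table but can exceed the model's context window as the dataset grows. A minimal sketch of a bounded prompt builder follows. The name `build_prompt`, the `max_chars` budget, and the instruction wording are assumptions, and a character count is only a rough stand-in for a real token count.

```python
def build_prompt(question, rows, max_chars=4000):
    """Build a context prompt, adding rows until a character budget is hit."""
    selected = []
    used = 0
    for text in rows:
        if used + len(text) > max_chars:
            break  # stop before exceeding the budget
        selected.append(text)
        used += len(text)
    context = "\n".join(selected)
    return (
        "Answer the question using the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Made-up rows for illustration only.
rows = ["January 1 – Event A", "January 2 – Event B", "January 3 – Event C"]
print(build_prompt("What happened on January 2?", rows, max_chars=40))
```

With the notebook's dataframe, the rows would come from `df["text"].tolist()`.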

## Custom Performance Demonstration

TODO: In the cells below, demonstrate the performance of your custom query using at least 2 questions. For each question, show the answer from a basic `Completion` model query as well as the answer from your custom query.

### Question 1

In [61]:
# custom_query is defined in the Custom Query Completion section and reused here.

# Demonstrate performance with example questions
questions = [
    "What was the impact of COVID-19 in 2022?"
]

for question in questions:
    basic_answer = query_openai_model(question)
    custom_answer = custom_query(question, df)
    
    print(f"Basic Completion Model Answer for '{question}':\n{basic_answer}\n")
    print(f"Custom Query Answer for '{question}':\n{custom_answer}\n")

Basic Completion Model Answer for 'What was the impact of COVID-19 in 2022?':
It is difficult to predict the exact impact of COVID-19 in 2022 as the situation is constantly evolving. However, some potential impacts could include:

1. Economic Impact: The pandemic has already caused significant disruptions to global economies, and this is likely to continue in 2022. The prolonged lockdowns, travel restrictions, and supply chain disruptions have led to a decrease in consumer spending, business closures, and job losses. It may take some time for economies to recover from this impact, and some industries may see permanent changes in consumer behavior and demand.

2. Health Impact: While vaccines have been developed and distributed, it is possible that new variants of the virus could continue to emerge and cause outbreaks in different parts of the world. This

Custom Query Answer for 'What was the impact of COVID-19 in 2022?':
In 2022, the COVID-19 pandemic continued to have a significant i

In [62]:
# custom_query is defined in the Custom Query Completion section and reused here.

# Demonstrate performance with example questions
questions = [
    "Who are the five permanent members of the UN Security Council?"
]

for question in questions:
    basic_answer = query_openai_model(question)
    custom_answer = custom_query(question, df)
    
    print(f"Basic Completion Model Answer for '{question}':\n{basic_answer}\n")
    print(f"Custom Query Answer for '{question}':\n{custom_answer}\n")

Basic Completion Model Answer for 'Who are the five permanent members of the UN Security Council?':
The five permanent members of the UN Security Council are the United States, Russia, China, France, and the United Kingdom.

Custom Query Answer for 'Who are the five permanent members of the UN Security Council?':
The five permanent members of the UN Security Council are China, France, Russia, the United Kingdom, and the United States.



### Question 2

In [63]:
# custom_query is defined in the Custom Query Completion section and reused here.

# Demonstrate performance with example questions
questions = [
    "Abdalla Hamdok has resigned from which position in 2022?"
]

for question in questions:
    basic_answer = query_openai_model(question)
    custom_answer = custom_query(question, df)
    
    print(f"Basic Completion Model Answer for '{question}':\n{basic_answer}\n")
    print(f"Custom Query Answer for '{question}':\n{custom_answer}\n")

Basic Completion Model Answer for 'Abdalla Hamdok has resigned from which position in 2022?':
Abdalla Hamdok resigned from his position as Prime Minister of Sudan in March 2022.

Custom Query Answer for 'Abdalla Hamdok has resigned from which position in 2022?':
Prime Minister of Sudan.



In [66]:
# custom_query is defined in the Custom Query Completion section and reused here.

# Demonstrate performance with example questions
questions = [
    "where was the first successful heart transplant from a pig to a human patient occured?"
]

for question in questions:
    basic_answer = query_openai_model(question)
    custom_answer = custom_query(question, df)
    
    print(f"Basic Completion Model Answer for '{question}':\n{basic_answer}\n")
    print(f"Custom Query Answer for '{question}':\n{custom_answer}\n")

Basic Completion Model Answer for 'where was the first successful heart transplant from a pig to a human patient occured?':
The first successful heart transplant from a pig to a human patient occurred at the National Heart and Lung Institute in Bethesda, Maryland, United States in 1984. The patient, a 35-year-old man with end-stage heart failure, received a baboon heart but died 21 days after the surgery due to a combination of immune rejection and infection. Since then, several other attempts have been made to transplant pig hearts into humans, but none have been successful.

Custom Query Answer for 'where was the first successful heart transplant from a pig to a human patient occured?':
The first successful heart transplant from a pig to a human patient occurred in Baltimore, Maryland, United States.
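
The comparison cells repeat the same loop for each question. As a sketch, the loop can be wrapped in one helper that takes the two query functions as arguments. `compare_answers` is a hypothetical name, and the stand-in functions below are placeholders, not real model calls.

```python
def compare_answers(questions, basic_fn, custom_fn):
    """Return (question, basic answer, custom answer) for each question."""
    results = []
    for question in questions:
        results.append((question, basic_fn(question), custom_fn(question)))
    return results


# Placeholder functions so the sketch runs without an API key.
def fake_basic(question):
    return "basic: " + question


def fake_custom(question):
    return "custom: " + question


for question, basic, custom in compare_answers(["Q1?"], fake_basic, fake_custom):
    print(f"Basic Completion Model Answer for '{question}':\n{basic}\n")
    print(f"Custom Query Answer for '{question}':\n{custom}\n")
```

In the notebook, `basic_fn` would be `query_openai_model` and `custom_fn` would be `lambda q: custom_query(q, df)`.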

