<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

This Jupyter Notebook demonstrates the backend logic of a Data Analysis Chatbot built on OpenAI's GPT model. Users supply their API key, the paths to their data, and their questions, and the model answers those questions about the dataset. The assistant works with data in CSV format and delivers its analysis in a natural, conversational manner.

In [None]:
os.getcwd()

'/workspaces/ai-assisted-coding-final/nbs'

In [1]:
#| echo: false
#| output: asis
show_doc(GPTQuestionsHandler)

---

### GPTQuestionsHandler

>      GPTQuestionsHandler (api_key)

This class interfaces with OpenAI's GPT model. 
It sends user queries about the dataset to the GPT model and retrieves responses.
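
As a rough illustration of what such a handler might send, the sketch below assembles a chat-style message list from the conversation history and the serialized dataset. The helper name `build_messages`, the system-prompt wording, and the choice to place the data in a system message are assumptions made for illustration; the actual `ask_gpt` implementation is not shown in this notebook and may differ.

```python
from typing import Dict, List

Message = Dict[str, str]


def build_messages(history: List[Message], data_text: str) -> List[Message]:
    """Hypothetical helper: combine the dataset and chat history into one payload."""
    # Assumption: the dataset is supplied once, as a system message.
    system_message = {
        "role": "system",
        "content": "You are a data analysis assistant. "
        "Answer questions about this dataset:\n" + data_text,
    }
    # Copy the history so the caller's list is not modified.
    return [system_message] + list(history)


history = [{"role": "user", "content": "How many rows are there?"}]
messages = build_messages(history, "Label Utterance\nPRS Well done!")
print(len(messages))  # 2
```

Keeping the payload construction in a plain function like this makes it possible to inspect what would be sent without making an API call.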

In [2]:
#| echo: false
#| output: asis
show_doc(CSVFileManager)

---

### CSVFileManager

>      CSVFileManager ()

This class is responsible for handling CSV file operations. 
It can read one or more CSV files, either from file paths or file-like objects, and concatenate them into a single pandas DataFrame.
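
The loading step described above can be sketched with plain pandas. The function `load_csvs` is a hypothetical stand-in, not the class's real method, and the column names in the sample data are made up for the example.

```python
import io
from typing import IO, Iterable, Union

import pandas as pd

CSVSource = Union[str, IO[str]]


def load_csvs(sources: Iterable[CSVSource]) -> pd.DataFrame:
    """Hypothetical stand-in for a loader such as CSVFileManager.load_data."""
    # pd.read_csv accepts both file paths and file-like objects.
    frames = [pd.read_csv(source) for source in sources]
    # ignore_index=True renumbers the rows across the combined frame.
    return pd.concat(frames, ignore_index=True)


# Two in-memory CSVs stand in for files on disk.
part_1 = io.StringIO("Label,Utterance\nPRS,Well done!\n")
part_2 = io.StringIO("Label,Utterance\nOTR,Can someone give me a verb?\n")
combined = load_csvs([part_1, part_2])
print(combined.shape)  # (2, 2)
```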

In [None]:
def main():
    # For an interactive notebook, read the paths from the user instead:
    # file_paths_input = input("Enter CSV file paths, separated by a comma: ")
    file_paths_input = '../data/009-1.csv, ../data/009-2.csv'
    file_paths = [path.strip() for path in file_paths_input.split(',')]

    data_manager = CSVFileManager()
    data_frame = data_manager.load_data(file_paths)

    api_key = getpass.getpass("Enter OpenAI API Key: ")
    ai_handler = GPTQuestionsHandler(api_key)

    # Initialize the conversation history
    history = []
    try:
        while True:
            user_question = input("Please enter your question about the data: ")
            # Update the history with the user's question
            history.append({"role": "user", "content": user_question})

            answer = ai_handler.ask_gpt(history, data_frame.to_string(index=False))

            print("Question: ", user_question,"\n")
            print("Answer:", answer, "\n")
            
            # Ask the user if they want to continue or exit
            continue_chat = input("Do you have another question? (yes/no): ")
            if continue_chat.lower() == 'no':
                print("Exiting the chat.")
                break
            
    except Exception as e:
        print("An error occurred. Please try again.",e)


if __name__ == "__main__" and os.environ.get('NBDEV_PREPARE') is None:
    main()

Question:  How many rows are there?  

Answer: There are 46 rows in the dataset. 

Question:  Give me the line of code in python to calculate this 

Answer: Certainly! You can use the pandas library in Python to read the dataset and calculate the number of rows. Here's the line of code to do that:

```python
import pandas as pd

# Load the dataset
data = pd.read_csv('your_file.csv')  # Replace 'your_file.csv' with the actual file name

# Calculate the number of rows
num_rows = data.shape[0]
print(num_rows)
```

Replace `'your_file.csv'` with the actual file name and use this code to calculate the number of rows in the dataset. 

Question:  Give me an example of an OTR utterance 

Answer: Certainly! An example of an OTR (other) utterance from the given dataset is:
"Can someone give me a sentence with an exclamation mark?"

This is an example of an utterance where the speaker is seeking a specific type of response from the audience. 

Exiting the chat.


The output above shows that the model can draw on the conversation history to answer follow-up questions: the second question ("calculate this") only makes sense in light of the first.
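
Follow-up questions only work if earlier turns are sent back to the model on each call. The loop in `main()` appends only the user's questions, so it is not visible here whether `ask_gpt` also records the model's answers. The sketch below shows one common pattern, with a stub (`fake_model`, hypothetical) standing in for the real API call; `ask_with_history` is likewise an illustrative name, not part of this project.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


def ask_with_history(
    history: List[Message],
    question: str,
    model: Callable[[List[Message]], str],
) -> str:
    """Record the question, query the model, and record its answer."""
    history.append({"role": "user", "content": question})
    answer = model(history)
    # Storing the reply lets a later "calculate this" refer back to it.
    history.append({"role": "assistant", "content": answer})
    return answer


def fake_model(messages: List[Message]) -> str:
    """Stub that reports how many user turns it can see (not a real model)."""
    user_turns = sum(1 for m in messages if m["role"] == "user")
    return f"I can see {user_turns} user question(s)."


history: List[Message] = []
ask_with_history(history, "How many rows are there?", fake_model)
print(ask_with_history(history, "Give me the code to calculate this", fake_model))
# I can see 2 user question(s).
```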

### Commentary on code functionality

#### Analysis of GPT Model Responses with examples

In [None]:
def main():
    file_paths_input = '../data/009-1.csv, ../data/009-2.csv'
    file_paths = [path.strip() for path in file_paths_input.split(',')]

    data_manager = CSVFileManager()
    data_frame = data_manager.load_data(file_paths)

    try:
        api_key = getpass.getpass("Enter OpenAI API Key: ")
        ai_handler = GPTQuestionsHandler(api_key)
    except Exception as e:
        print(f"An error occurred: {e}")
        return

    # Initialize the conversation history
    history = []
    try:
        while True:
            user_question = "Can you provide examples of sentences labeled as 'PRS'"
            # Update the history with the user's question
            history.append({"role": "user", "content": user_question})

            answer = ai_handler.ask_gpt(history, data_frame.to_string(index=False))

            print("Question: ", user_question,"\n")
            print("Answer:", answer, "\n")
            
            # Ask the user if they want to continue or exit
            continue_chat = "no"
            if continue_chat.lower() == 'no':
                print("Exiting the chat.")
                break
            
    except Exception as e:
        print("An error occurred. Please try again.",e)

if __name__ == "__main__" and os.environ.get('NBDEV_PREPARE') is None:
    main()

Question:  Can you provide examples of sentences labeled as 'PRS' 

Answer: Sure! Here are the examples of sentences labeled as 'PRS' from the given data:

1. Great example, 'run' is a verb because it is an action.
2. Our sentence is now: 'The fast cat runs'.
3. Great job, that sentence definitely needed an exclamation mark!
4. Great, now our sentence is: 'She reads a book'.
5. Great, now our sentence is: 'She runs quickly'.
6. Well done, everyone! You have learned a lot today.
7. Wonderful! That sentence is showing strong emotion, so it is an exclamatory sentence.
8. Great, now our sentence is: 'The book on the table'.
9. Well done, everyone! You have learned a lot today.
10. I am looking forward to reading your paragraphs.
11. Have a great day, and remember to keep practicing your English!

These are the examples of sentences labeled as 'PRS' from the given data. 

Exiting the chat.


In [None]:
# def main():
#     file_paths_input = '../data/009-1.csv, ../data/009-2.csv'
#     file_paths = [path.strip() for path in file_paths_input.split(',')]

#     data_manager = CSVFileManager()
#     data_frame = data_manager.load_data(file_paths)

#     try:
#         api_key = getpass.getpass("Enter OpenAI API Key: ")
#         ai_handler = GPTQuestionsHandler(api_key)
#     except Exception as e:
#         print(f"An error occurred: {e}")
#         return

#     # Initialize the conversation history
#     history = []
#     try:
#         while True:
#             user_question = "What are the unique labels in the label column?"
#             # Update the history with the user's question
#             history.append({"role": "user", "content": user_question})

#             answer = ai_handler.ask_gpt(history, data_frame.to_string(index=False))

#             print("Question: ", user_question,"\n")
#             print("Answer:", answer, "\n")
            
#             # Ask the user if they want to continue or exit
#             continue_chat = "no"
#             if continue_chat.lower() == 'no':
#                 print("Exiting the chat.")
#                 break
            
#     except Exception as e:
#         print("An error occurred. Please try again.",e)

# if __name__ == "__main__":
#     main()

Question:  What are the unique labels in the label column? 

Answer: The unique labels in the "Label" column are:
- PRS
- OTR
- NaN 

Exiting the chat.


1. Response to "What are the unique labels in the label column?"
The model reports three distinct values: 'PRS', 'OTR', and 'NaN'.
'NaN' is the marker pandas uses for missing data, so its presence most likely means that some rows have no label rather than that the dataset contains a third category.
The response shows that the model can read the column and summarize the distinct values it contains.

2. Response to "Can you provide examples of sentences labeled as 'PRS'"
The model provided a list of eleven sentences, which are assumed to be labeled 'PRS'. One sentence appears twice (items 6 and 9), which may reflect a repeated row in the data.
The model's response demonstrates its ability to extract and present data based on specified criteria.

From the two examples above, the model appears to understand the context and structure of the data: it distinguishes between labels and returns examples that look relevant to the 'PRS' label. This suggests that it can filter and present data in response to user queries.
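
Because the model's answers are generated text, they can be checked against the data directly. The sketch below runs both queries with pandas on a small toy frame. The `Label` column name comes from the model's answer above, while the `Utterance` column name and all rows are assumptions for illustration, not the notebook's real data.

```python
import pandas as pd

# Toy data for illustration only; it is not the dataset used in this notebook.
df = pd.DataFrame(
    {
        "Label": ["PRS", "OTR", None, "PRS"],
        "Utterance": [
            "Well done!",
            "Can someone give me a verb?",
            "Open your books.",
            "Great job!",
        ],
    }
)

# Distinct labels, excluding missing values (NaN marks a missing label).
labels = sorted(df["Label"].dropna().unique())
# Number of rows that have no label at all.
missing = int(df["Label"].isna().sum())
# Utterances whose label is exactly 'PRS'.
prs_examples = df.loc[df["Label"] == "PRS", "Utterance"].tolist()

print(labels)
print(missing)
print(prs_examples)
```

Running the same three lines on the loaded DataFrame would confirm or contradict the label list and the 'PRS' examples the model reported.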

The model can answer questions from the data included in the prompt and can supply Python code for an analysis, but it cannot execute code or produce visualizations.
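
The model sees only the text placed in the prompt. In `main()` that text is `data_frame.to_string(index=False)`, passed on every call. The sketch below shows what this serialization looks like on a toy frame (the rows and the `Utterance` column name are made up). Since the text grows with the number of rows, large files may not fit within the model's input limit.

```python
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame(
    {"Label": ["PRS", "OTR"], "Utterance": ["Well done!", "Any questions?"]}
)

# Same serialization as in main(): plain text, without the index column.
data_text = df.to_string(index=False)
print(data_text)

# One header line plus one line per row, so prompt size grows with the data.
line_count = len(data_text.splitlines())
print(line_count, len(data_text))
```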


The model's responses to queries about the dataset showcase its capabilities in data analysis, context understanding, and information extraction. It interprets and categorizes the data and gives coherent, contextually appropriate answers.

It can also draw on the conversation history to answer follow-up questions.

These capabilities make it a useful tool for gaining insights from structured data such as CSV files, provided its limitations are kept in mind.