<a href="https://colab.research.google.com/github/mindfulcoder49/NorthShoreAI/blob/main/North_Shore_Hackathon_Jan_20.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Building a Full Stack App in Google Colab with OpenAI, Gradio, and Dataset

This notebook was initially created for the first meeting of the North Shore AI, Machine Learning, and Coding Meetup. It is designed to be interesting for anyone from beginners to experts. There are some very brief explanations of coding syntax for anyone with no experience with Python or Jupyter notebooks.

If you are not familiar with Jupyter notebooks: a notebook mixes text and code blocks, and the code blocks are meant to be run in order. There is a little "Play" button to the left of each code block that runs that block. As you read through the notebook, run each code block as you reach it.

## Installing Dependencies

We can build a whole app in just a few lines of Python because other people have spent a long time creating packages of Python code for us to use. We need to install those packages in this Google Colab environment, which we do with pip. The ! before pip tells Google Colab that this is a shell command, not Python code.

In [None]:
!pip install -q sqlalchemy>=2.0 dataset gradio openai

## Adding your OpenAI API Key

Querying the OpenAI API is easy by design; the difficult part is getting an API key. An API key is basically a password, connected to your OpenAI account, that you send with each API request. It is how OpenAI makes sure only authorized users are using its API, and how it tracks your usage so it can charge you accordingly. I have put $10 into my OpenAI account and have managed to spend about 25 cents of that developing this notebook. This key will be active for the duration of the workshop:

```
key goes here
```

You will need to add this key to the notebook:

1) Click on the Key icon on the left
2) Click "+ Add new secret"
3) Enter the name OPENAI_API_KEY
4) Paste in the API key above for the Value
5) Click the round toggle on the left to activate Notebook Access

Later we will use this code to get the key and pass it to OpenAI:

```
from google.colab import userdata
userdata.get('OPENAI_API_KEY')
```
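
Outside Colab, `google.colab` cannot be imported, so the snippet above only works in this environment. Below is a minimal sketch of a fallback, assuming you have exported the key as an environment variable that is also named `OPENAI_API_KEY`. The helper name `get_api_key` is hypothetical and not part of any library.

```python
import os


def get_api_key(name="OPENAI_API_KEY"):
    """Return the key from Colab secrets, or from the environment elsewhere."""
    try:
        # This import only succeeds inside Google Colab.
        from google.colab import userdata
    except ImportError:
        # Assumption: outside Colab the key was exported as an environment variable.
        return os.environ.get(name)
    return userdata.get(name)
```

The function returns `None` outside Colab when the environment variable is not set, so check the result before passing it to `OpenAI(api_key=...)`.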

## Querying an OpenAI API

Querying the OpenAI API is very simple. First let's assign our message to a variable. In Python we put text in between single or double quotes, and to save that text to a variable, we write it with an equals sign like this:

In [None]:
my_message = "Hello ChatGPT, what are your plans for humanity?"

print(my_message)

You need to tell the API which OpenAI model you want to query. You can find the different models here: https://platform.openai.com/docs/models

GPT-3.5 models are significantly less expensive than GPT-4 models, so we are going to stick to using GPT-3.5 models here.

In [None]:
my_model = "gpt-3.5-turbo-1106"

We will also need to tell the API the maximum number of tokens we want back in our response. The API charges you per token, so this limit lets you control that cost directly. Let's set it to 60 for now, but you can change that if you want.

When picking variable names, you're safe with letters and underscores. When saving a number to a variable, you just write the number, like this:

In [None]:
how_many_tokens_you_want = 60
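
Digits are also allowed in a variable name as long as they are not the first character, while hyphens and spaces are not allowed. If you are ever unsure, Python can check a name for you. The names below are made up for illustration.

```python
# Candidate variable names: the first two are valid, the last two are not.
candidates = ["how_many_tokens_you_want", "my_model2", "2nd_message", "my-message"]

# str.isidentifier() reports whether Python accepts the text as a name.
validity = {name: name.isidentifier() for name in candidates}

for name, ok in validity.items():
    print(name, "is valid" if ok else "is not valid")
```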

The code for calling the API is fairly standard. Take a moment to read through it and notice that it is just three function calls. First we call the OpenAI function with our API key to create a client object. Then we call the client.chat.completions.create function with our message, model name, and token limit to query the API and get a response. Finally we call the print function to show the response content.

In [None]:
from openai import OpenAI
from google.colab import userdata

client = OpenAI(api_key=userdata.get('OPENAI_API_KEY'))
response = client.chat.completions.create(
  model = my_model,
  messages = [{"role":"user","content":my_message}],
  max_tokens = how_many_tokens_you_want
)

print( response.choices[0].message.content )


## Enter Gradio, to give us a Front End

We have some code now that lets us interact with GPT-3.5, but we need a nice interface if this is going to be a real web app and not just a piece of Python code.

Gradio makes that easy. But we need to first make a Python function.

We have been using Python functions this whole time, like when we call print(my_message). Now we put our call to OpenAI inside a function of our own:

In [None]:
def chat_with_gpt(user_input):

  client = OpenAI(api_key=userdata.get('OPENAI_API_KEY'))
  response = client.chat.completions.create(
    model="gpt-3.5-turbo-1106",  # Swap in another model name here (e.g. a GPT-4 model) if you like
    messages=[{"role":"user","content":user_input}],
    max_tokens=60
  )

  return response.choices[0].message.content

Then we can use this function to chat with GPT in a simpler way:

In [None]:
print(chat_with_gpt("Hello GPT, how are you?"))

In [None]:
print(chat_with_gpt("What is cassava?"))

Once you have a function, creating a front end interface with Gradio is easy!

In [None]:
import gradio as gr
simplechat = gr.Interface(
    fn=chat_with_gpt,
    inputs=gr.Textbox(lines=2, label="Your Message"),
    outputs=gr.Textbox(label="GPT Response"),
    title="Chat with GPT",
    description="This is a simple chat app using Gradio and OpenAI's GPT"
)

In [None]:
gr.TabbedInterface(
    [simplechat], ["SimpleChat"]
).launch()

Now we have a nice interface and an easy way to chat with GPT. But you might have noticed that there is no conversation history, and our AI chatbot doesn't remember anything we say. Every message is interpreted as the beginning of a chat conversation, and that's not what we want.

## Building a chat conversation

Building a chat conversation is simple, but requires two more Python data types:
a list (called an array in some other languages), which uses square brackets and commas:

```
["one", "two", "three", "four"]
```

and a dictionary, which is a collection of key and value pairs, and uses curly brackets, colons, and commas:

```
{"one":"apple", "two":"orange", "three":"banana"}
```
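
A short sketch of how the two types are used in practice, reusing the values above (the added `"five"` and `"pear"` entries are just examples):

```python
numbers = ["one", "two", "three", "four"]
fruit = {"one": "apple", "two": "orange", "three": "banana"}

first_number = numbers[0]    # lists are indexed by position, starting at 0
second_fruit = fruit["two"]  # dictionaries are looked up by key

numbers.append("five")       # add an item to the end of the list
fruit["four"] = "pear"       # add a new key and value to the dictionary

print(first_number, second_fruit, len(numbers), len(fruit))
```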

When we send our messages to the OpenAI API, each message is a dictionary with the keys "role" and "content", and we will use the values "user" or "assistant" for the role, and the content will be the message:

e.g.
```
{"role":"user", "content":"What is your name?"}
```

We will then put all the messages in our conversation in a list:

e.g.
```
[{"role":"user", "content":"What is your name?"},
 {"role":"assistant", "content":"My name is GPT-3"},
 {"role":"user", "content":"Nice to meet you"}]
```
To create a conversation, we must send the entire conversation to OpenAI every single time we call the API to get the next response from GPT-3.5.

In [None]:
from openai import OpenAI

first_user_message = {"role":"user", "content":"Hello, what is your name?"}
current_messages = [first_user_message]

client = OpenAI(api_key=userdata.get('OPENAI_API_KEY'))
response = client.chat.completions.create(
  model = "gpt-3.5-turbo-1106",  # Swap in another model name here (e.g. a GPT-4 model) if you like
  messages = current_messages,
  max_tokens = 60
)

first_AI_Response = response.choices[0].message

print(first_AI_Response)


The difference in the code above is that we saved the AI response in the first_AI_Response variable so we can add it to our message list in the next call to the API:

In [None]:
second_user_message = {"role":"user","content":"That's a funny name, where did you get it?"}

current_messages = [first_user_message, first_AI_Response, second_user_message]

response = client.chat.completions.create(
  model="gpt-3.5-turbo-1106",  # Swap in another model name here (e.g. a GPT-4 model) if you like
  messages=current_messages,
  max_tokens=60
)

print(response.choices[0].message.content)

We need a way to save the response from the previous message and add it to the next message inside of the function we are using in our Gradio Interface.
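
As a first sketch of the idea, the history can live in a plain Python list that a function appends to on every turn. `fake_reply` below is a hypothetical stand-in for the OpenAI call so the example runs without an API key, and `chat_turn` is an illustrative name.

```python
history = []  # every message so far, in order


def fake_reply(messages):
    # Hypothetical stand-in for client.chat.completions.create(...):
    # it only reports how many messages it was sent.
    return f"I have seen {len(messages)} message(s) so far."


def chat_turn(user_input):
    # Add the new user message, send the whole history, then store the reply.
    history.append({"role": "user", "content": user_input})
    reply = fake_reply(history)
    history.append({"role": "assistant", "content": reply})
    return reply


print(chat_turn("Hello, what is your name?"))
print(chat_turn("Nice to meet you"))
```

A list like this is lost whenever the notebook restarts, which is why a persistent store is still needed.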

## Enter Dataset, to give us a database

The traditional solution to this classic problem is to create a database. This completes our stack. When people talk about full stack development, they are talking about building the front end, the functions in the middle, and the database where information is stored. Our frontend is Gradio, and our functions in the middle are using the OpenAI API and now will also use Dataset to manage chat history.

We will use four dataset functions.

First we use dataset.connect to create a database file:

```
db = dataset.connect('sqlite:///chat_history.db')
```

Then we create a table:

```
table = db['history']
```

In our function we use table.all() to get all rows of the table. Each row will contain a message and the AI's response, and we will use a loop to restructure that data before sending it to OpenAI:

```
table.all()
```

And then when we receive each response from the AI, we save the latest message and response pair as a new table row using table.insert():

```
table.insert(dict(user_input=user_input, assistant_response=gpt_response))
```

If you have ever dealt with setting up a database, defining your tables, and building functions to get data, you can see how this is easier.
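
For comparison, here is a rough sketch of the same steps written directly against Python's built-in `sqlite3` module. It uses a throwaway in-memory database and hand-written SQL. It illustrates the work Dataset saves you and is not a description of what Dataset runs internally.

```python
import sqlite3

# Connect to a throwaway in-memory database instead of chat_history.db.
conn = sqlite3.connect(":memory:")

# With plain SQL, the table and its columns must be defined up front.
conn.execute(
    "CREATE TABLE history ("
    "id INTEGER PRIMARY KEY, user_input TEXT, assistant_response TEXT)"
)

# Rough equivalent of table.insert(...)
conn.execute(
    "INSERT INTO history (user_input, assistant_response) VALUES (?, ?)",
    ("What is your name?", "My name is GPT-3"),
)
conn.commit()

# Rough equivalent of table.all()
rows = conn.execute(
    "SELECT user_input, assistant_response FROM history ORDER BY id"
).fetchall()
print(rows)
```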

## Building a whole app in one go

The code below integrates dataset and some more complex logic to create a chat interface that includes a chat history and gives the user an experience of chatting with an AI who can remember the whole conversation.

If you'd like, take some time to read through the code line by line and make sure you know what each line is doing. If you're not sure about it, try asking the GPT in the interface you made!

In [None]:
import gradio as gr
from openai import OpenAI
import dataset
from google.colab import userdata

# Instantiate the OpenAI client
client = OpenAI(api_key=userdata.get('OPENAI_API_KEY'))  # Key is read from the Colab secret added earlier

# Connect to a SQLite database (it will be created if it doesn't exist)
db = dataset.connect('sqlite:///chat_history.db')
table = db['history']

# Function to handle the chat and interact with the database
def chat_with_gpt3_5(user_input):
    # Retrieve and format the chat history from the database
    chat_history = []
    for row in table.all():
        chat_history.append({'role': 'user', 'content': row['user_input']})
        if row['assistant_response']:
            chat_history.append({'role': 'assistant', 'content': row['assistant_response']})

    # Add the current user input to the chat history
    chat_history.append({'role': 'user', 'content': user_input})

    try:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo-1106",
            messages=chat_history,
            max_tokens=150,
            temperature=0.7
        )
        gpt_response = response.choices[0].message.content
    except Exception as e:
        gpt_response = f"Error: {str(e)}"

    # Store new user input and assistant response in the database
    table.insert(dict(user_input=user_input, assistant_response=gpt_response))

    # Update and format the chat history for display
    chat_history.append({'role': 'assistant', 'content': gpt_response})
    formatted_chat_history = "\n".join([f"{msg['role'].title()}: {msg['content']}" for msg in chat_history])

    return formatted_chat_history

iface = gr.Interface(
    fn=chat_with_gpt3_5,
    inputs=gr.Textbox(lines=2, label="Your Message"),
    outputs=gr.Textbox(label="Chat History"),
    title="Chat with GPT-3.5",
    description="This is a simple chat app using Gradio and OpenAI's GPT-3.5, with chat history stored in SQLite via Dataset."
)

gr.TabbedInterface(
    [iface], ["HistoryChat"]
).launch()

Now you have a second Gradio app running in the same notebook, courtesy of their amazing TabbedInterface function. Try making some changes to the API call like increasing the max_tokens to get longer answers or changing the temperature value to see what it does. Use the app to send code snippets and questions to GPT for guidance!
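
The loop at the top of `chat_with_gpt3_5` is the part that turns stored rows back into the message list the API expects. Pulled out on its own and run on hand-made rows, it looks like the sketch below. The function name `rows_to_messages` and the sample rows are illustrative; in the app the rows come from `table.all()`.

```python
def rows_to_messages(rows, user_input):
    """Rebuild the message list from stored rows plus the new user message."""
    messages = []
    for row in rows:
        messages.append({"role": "user", "content": row["user_input"]})
        # Skip empty responses so no blank assistant message is sent.
        if row["assistant_response"]:
            messages.append({"role": "assistant", "content": row["assistant_response"]})
    messages.append({"role": "user", "content": user_input})
    return messages


sample_rows = [
    {"user_input": "Hi", "assistant_response": "Hello!"},
    {"user_input": "Are you there?", "assistant_response": None},
]
messages = rows_to_messages(sample_rows, "What is cassava?")
print([m["role"] for m in messages])
```

Because every stored row is sent again on each call, a long history makes each request larger and more expensive.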

## Building your own app

Below is basically a copy of the HistoryChat app, with the Gradio interface and function renamed to prevent naming conflicts. Try making some changes to explore how the app works, and maybe to build a new app of your own.

Try using the HistoryChat above to get help from GPT, and feel free to increase the max_tokens parameter to get longer answers to your questions!

One thing to notice: Since you're using the same database in this cloned app, you'll see the same chat history as in the one above! You can change the name of the database file to make a new chat history.

In [None]:
import gradio as gr
from openai import OpenAI
import dataset
from google.colab import userdata

# Instantiate the OpenAI client
client = OpenAI(api_key=userdata.get('OPENAI_API_KEY'))  # Key is read from the Colab secret added earlier

# Connect to a SQLite database (it will be created if it doesn't exist)
db = dataset.connect('sqlite:///chat_history.db')
table = db['history']

# Function to handle the chat and interact with the database
def my_gradio_function(user_input):
    # Retrieve and format the chat history from the database
    chat_history = []
    for row in table.all():
        chat_history.append({'role': 'user', 'content': row['user_input']})
        if row['assistant_response']:
            chat_history.append({'role': 'assistant', 'content': row['assistant_response']})

    # Add the current user input to the chat history
    chat_history.append({'role': 'user', 'content': user_input})

    try:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo-1106",
            messages=chat_history,
            max_tokens=150,
            temperature=0.7
        )
        gpt_response = response.choices[0].message.content
    except Exception as e:
        gpt_response = f"Error: {str(e)}"

    # Store new user input and assistant response in the database
    table.insert(dict(user_input=user_input, assistant_response=gpt_response))

    # Update and format the chat history for display
    chat_history.append({'role': 'assistant', 'content': gpt_response})
    formatted_chat_history = "\n".join([f"{msg['role'].title()}: {msg['content']}" for msg in chat_history])

    return formatted_chat_history

your_interface = gr.Interface(
    fn=my_gradio_function,
    inputs=gr.Textbox(lines=2, label="Your Message"),
    outputs=gr.Textbox(label="Chat History"),
    title="Chat with GPT-3.5",
    description="This is a simple chat app using Gradio and OpenAI's GPT-3.5, with chat history stored in SQLite via Dataset."
)

gr.TabbedInterface(
    [your_interface], ["NewGPTApp"]
).launch()

# Making an app to generate reports about Housing Violations

One common use of AI is to transform or contextualize data. The data on Housing Violations is available to the public on data.boston.gov:

https://data.boston.gov/dataset/rentsmart

## RENTSMART
> RentSmart Boston compiles data from BOS:311 and the City's Inspectional Services Division to give prospective tenants a more complete picture of the homes and apartments they are considering renting, assisting them in understanding any previous issues with the property, including: housing violations, building violations, enforcement violations, housing complaints, sanitation requests, and/or civic maintenance requests.



Let's make a Gradio app that generates a report on the RentSmart records found around a certain location. The parameters will be:
1. model_name: the name of the model we are querying
2. tokens: the maximum number of tokens we are requesting
3. latitude: latitude of the central location for the report
4. longitude: longitude of the same
5. radius: the radius, in kilometers, of the circle within which to find records
6. start_date: the earliest date for records
7. end_date: the latest date for records
8. language: the language of the report
9. prompt: the instructions sent to the model along with the records
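
The radius filter relies on the great-circle distance between two latitude/longitude points. The app uses the `haversine` package for this. The sketch below is a plain-Python version of the standard haversine formula, written only to show what that distance means. The function name `haversine_km` and the Earth radius constant (a commonly used mean radius) are choices made for this illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an assumption of this sketch


def haversine_km(point1, point2):
    """Great-circle distance in kilometers between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(radians, point1)
    lat2, lon2 = map(radians, point2)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


center = (42.335817, -71.0797682)
nearby = (42.336817, -71.0797682)  # 0.001 degrees of latitude further north

# A record is kept when its distance from the center is within the radius.
is_inside = haversine_km(center, nearby) <= 0.25
print(is_inside)
```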



In [None]:
!pip install haversine

In [None]:
# all necessary imports
import pandas as pd
from pandas import to_datetime
import pytz
from haversine import haversine
import gradio as gr
from openai import OpenAI
import dataset
from google.colab import userdata
import requests
import os

csv_link = "https://data.boston.gov/dataset/f506e000-b08c-4500-97c7-9f36e7ac125a/resource/dc615ff7-2ff3-416a-922b-f0f334f085d0/download/tmp2b5dwl3h.csv"

# Instantiate the OpenAI client
client = OpenAI(api_key=userdata.get('OPENAI_API_KEY'))  # Key is read from the Colab secret added earlier


def load_dataframe():
    csv_file = "data.csv"
    pkl_file = "data.pkl"

    # Check if both files exist
    if os.path.exists(csv_file) and os.path.exists(pkl_file):
        # Compare modification times
        csv_mod_time = os.path.getmtime(csv_file)
        pkl_mod_time = os.path.getmtime(pkl_file)

        if pkl_mod_time > csv_mod_time:
            # Load from pickle if it's newer than csv
            df = pd.read_pickle(pkl_file)
            return df
    # Check if the csv_file exists, if not, download it
    if not os.path.exists(csv_file):
        download_data(csv_link)


    # If pickle is not newer, read csv and save as pickle
    df = pd.read_csv(csv_file, dtype={'zipcode': str}, parse_dates=['date'])
    df.to_pickle(pkl_file)
    return df

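def download_data(url):
    # Download the CSV and save it next to the notebook as data.csv
    response = requests.get(url)
    response.raise_for_status()  # Stop here rather than saving an error page as data.csv
    file_path = 'data.csv'
    with open(file_path, 'wb') as file:
        file.write(response.content)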


def filter_dataset_by_location(df, lat, lon, radius):
    def is_within_radius(lat1, lon1, lat2, lon2, radius):
        # haversine() returns the distance in kilometers by default
        return haversine((lat1, lon1), (lat2, lon2)) <= radius

    # The data is expected to have 'latitude' and 'longitude' columns
    if 'latitude' not in df.columns or 'longitude' not in df.columns:
        print("Latitude/Longitude columns not found in dataset")
        # Return an empty DataFrame (not a dict) so the later filters still work
        return df.iloc[0:0]

    # Rows without coordinates cannot be placed on the map, so skip them
    df = df.dropna(subset=['latitude', 'longitude'])
    mask = df.apply(lambda row: is_within_radius(lat, lon, row['latitude'], row['longitude'], radius), axis=1)
    return df[mask]


def filter_datasets_by_date(df, start_date, end_date):
    if 'date' not in df.columns:
        print("No suitable date column found, returning original DataFrame.")
        return df

    # Keep rows whose date falls within the range (both ends included)
    mask = (df['date'] >= start_date) & (df['date'] <= end_date)
    return df.loc[mask]

# Function for Gradio: takes all inputs from the user, loads and filters the DataFrame, and sends the result to OpenAI
def my_gradio_function(model_name, tokens, latitude, longitude, radius, start_date, end_date, language, prompt):
    # Load the RentSmart dataframe
    df = load_dataframe()

    # Filter the records by location
    filtered_df_location = filter_dataset_by_location(df, latitude, longitude, radius)

    # Convert the start and end dates to datetime objects:
    start_dt = to_datetime(start_date).tz_localize(None)  # Example of making it timezone-naive
    end_dt = to_datetime(end_date).tz_localize(None)

    # Filter the records by date
    filtered_df_date = filter_datasets_by_date(filtered_df_location, start_dt, end_dt)

    # Convert the filtered records to CSV text so they can be included in the prompt
    raw_data_output = filtered_df_date.to_csv(index=False)

    # Paste together the prompt, the requested output language, and the CSV data, separated by newlines
    full_prompt = f"{prompt}\nLanguage: {language}\nData:\n{raw_data_output}"

    # Format as a messages object
    full_prompt_messages = [{"role": "user", "content": full_prompt}]

    try:
        response = client.chat.completions.create(
            model=model_name,
            messages=full_prompt_messages,
            max_tokens=int(tokens),  # gr.Number can deliver a float; max_tokens should be an integer
            temperature=0.7
        )
        gpt_response = response.choices[0].message.content
    except Exception as e:
        gpt_response = f"Error: {str(e)}"

    # Return the response and the raw data to be displayed in separate Gradio outputs
    return gpt_response, filtered_df_date








In [None]:
# load dataframe downloads the data, reads it into a pandas dataframe, and then saves that as a pkl for quick access
df = load_dataframe()
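
`load_dataframe` decides whether to trust the pickle by comparing file modification times. The sketch below isolates that check using temporary files. `cache_is_fresh` is a hypothetical helper name, and `os.utime` is used only to set the timestamps explicitly for the demonstration.

```python
import os
import tempfile


def cache_is_fresh(source_path, cache_path):
    """True when both files exist and the cache was modified after the source."""
    if not (os.path.exists(source_path) and os.path.exists(cache_path)):
        return False
    return os.path.getmtime(cache_path) > os.path.getmtime(source_path)


with tempfile.TemporaryDirectory() as folder:
    csv_path = os.path.join(folder, "data.csv")
    pkl_path = os.path.join(folder, "data.pkl")

    with open(csv_path, "w"):
        pass
    missing_cache = cache_is_fresh(csv_path, pkl_path)  # no pickle yet

    with open(pkl_path, "w"):
        pass
    os.utime(csv_path, (1000, 1000))  # pretend the CSV is older
    os.utime(pkl_path, (2000, 2000))  # and the pickle is newer
    fresh_cache = cache_is_fresh(csv_path, pkl_path)

    os.utime(csv_path, (3000, 3000))  # the CSV is replaced afterwards
    stale_cache = cache_is_fresh(csv_path, pkl_path)

print(missing_cache, fresh_cache, stale_cache)
```

In the notebook this means that deleting `data.csv` forces a fresh download, and the pickle is rebuilt from the new file.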

Let's test our function before we launch it in Gradio:

In [None]:
my_gpt_response, my_raw_data_output = my_gradio_function(
    model_name="gpt-3.5-turbo",
    tokens=100,
    latitude=42.335817,
    longitude=-71.0797682,
    radius=0.25,
    start_date="2024-01-01",
    end_date="2024-01-19",
    language="Spanish",
    prompt="These records represent all violations found in a 0.25 km radius around a central point. Summarize this history for a prospective renter of a unit at the center of this circle."
)

print(my_gpt_response)

In [None]:
my_raw_data_output.head(20)
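
The text sent to the model is just three pieces joined with newlines: the instructions, the requested language, and the filtered records as CSV. A small sketch with made-up CSV text shows the shape of that message. `build_prompt` and the sample rows are hypothetical; the real column names come from the RentSmart file.

```python
def build_prompt(prompt, language, csv_text):
    """Join the instructions, target language, and CSV data into one message."""
    return f"{prompt}\nLanguage: {language}\nData:\n{csv_text}"


# Made-up rows, only to show the layout of the final prompt.
sample_csv = "date,description\n2024-01-05,Example violation\n"

full_prompt = build_prompt("Summarize these records.", "Spanish", sample_csv)
messages = [{"role": "user", "content": full_prompt}]
print(full_prompt)
```

Because every filtered record is pasted into the prompt, a larger radius or a longer date range produces a longer and more expensive request.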

In [None]:
# Define default values for each input
default_model = "gpt-3.5-turbo"
default_tokens = 3000
default_latitude = 42.335817
default_longitude = -71.0797682
default_radius = 0.25
default_start_date = "2024-01-01"
default_end_date = "2024-01-19"
default_language = "Spanish"
default_prompt = "These records represent violations in a circle around a center point in Boston. Summarize them into a report for a prospective renter"

permit_interface = gr.Interface(
    fn=my_gradio_function,
    inputs=[
        gr.Dropdown(choices=["gpt-3.5-turbo-1106","gpt-3.5-turbo"], label="Model Name", value=default_model),
        gr.Number(label="# of Requested Tokens", value=default_tokens),
        gr.Number(label="Latitude", value=default_latitude),
        gr.Number(label="Longitude", value=default_longitude),
        gr.Number(label="Radius (in km)", value=default_radius),
        gr.Textbox(label="Start Date", value=default_start_date),
        gr.Textbox(label="End Date", value=default_end_date),
        gr.Dropdown(choices=["English", "Spanish", "French", "Arabic", "Chinese", "Vietnamese", "Japanese", "Korean"], label="Language", value=default_language),
        gr.Textbox(label="Prompt", value=default_prompt)
    ],
    outputs=[
        gr.Textbox(label="GPT-3 Response"),
        gr.Dataframe(label="Filtered Data")
    ]
)

# Launch the Gradio interface
gr.TabbedInterface(
    [permit_interface], ["TheBostonApp"]
).launch(debug=True)