# Introduction to Large Language Models with GPT & LangChain

[ChatGPT](https://chat.openai.com/) is wildly popular, with over a billion visits per month. Although this web interface is great for many non-technical use cases, for programming and automation tasks, it is better to access GPT (the AI that powers ChatGPT) via the OpenAI API.

As well as GPT, I'll also make use of [LangChain](https://python.langchain.com/docs/get_started/introduction), a programming framework for working with Generative AI.

I'll cover:

- Getting set up with an OpenAI developer account and integration with Workspace.
- Calling the chat functionality in the OpenAI API, with and without LangChain.
- Simple prompt engineering.
- Holding a conversation with GPT.
- Ideas for incorporating GPT into a data analysis or data science workflow.

I'll be using GPT to explore [a dataset](https://catalog.data.gov/dataset/electric-vehicle-population-data) about electric cars in Washington state, USA. 

## Before I begin

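Create a developer account with OpenAI and generate an API key. Using the API is **not free**: you are charged for the tokens you send and receive. Save the key as an environment variable named `OPENAI_API_KEY`, because the code below reads it from there.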

## Task 0: Setup

We need to install the `langchain` package. It is currently being developed quickly, sometimes with breaking changes, so it is safest to pin package versions.

The `langchain` package depends on a recent version of `typing_extensions`, so we need to update that package, again pinning the version.

### Instructions

Run the following code to install `openai`, `langchain`, and `typing_extensions`.

In [2]:
# Update the typing_extensions package
# !pip install typing_extensions==4.8.0

In [3]:
# Install langchain and the openai version used in this notebook.
# openai is pinned to 1.3.5 so that the langchain imports below don't raise an error.

# !pip install langchain

# !pip install openai==1.3.5

In order to chat with GPT, we first need to load the `openai` and `os` packages, then set the API key from the `OPENAI_API_KEY` environment variable you created.

### Instructions

- Import the `os` package.
- Import the `openai` package.
- Set `openai.api_key` to the `OPENAI_API_KEY` environment variable.

In [4]:
# Import the os package and the openai package
import os

In [20]:
import pprint

In [5]:
try:
    import openai
    print("OpenAI imported")
except ImportError as e:
    # Catch only import failures instead of every possible exception
    print(f"It didn't work: {e}")

OpenAI imported


In [6]:
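# Set openai.api_key to the OPENAI_API_KEY environment variable.
# os.getenv() returns None instead of raising an error when the variable is missing,
# so check for None explicitly.
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')
if OPENAI_API_KEY is None:
    print("OPENAI_API_KEY is not set")
else:
    openai.api_key = OPENAI_API_KEY
    print("Success!")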

Success!


We need to import the `langchain` package. It has many submodules, so to save typing later, we'll also import some specific functions from those submodules.

### Instructions

- Import the `langchain` package as `lc`.
- From the `langchain.chat_models` module, import `ChatOpenAI`.
- From the `langchain.schema` module, import `AIMessage`, `HumanMessage`, `SystemMessage`.

In [7]:
# Import the langchain package as lc
# From the langchain.chat_models module, import ChatOpenAI
# From the langchain.schema module, import AIMessage, HumanMessage, SystemMessage

try:
    import langchain as lc
    from langchain.chat_models import ChatOpenAI
    from langchain.prompts.chat import (
        ChatPromptTemplate,
        HumanMessagePromptTemplate,
        SystemMessagePromptTemplate,
    )
    from langchain.schema import AIMessage, HumanMessage, SystemMessage
    print("Success importing all langchain requirements")
except ImportError as e:
    print(f"Error: {e}")

Success importing all langchain requirements


You'll also need to do some light data manipulation with the `pandas` package and data visualization with `plotly.express`. Finally, the `IPython.display` package contains functions to display Markdown content nicely.

### Instructions

Import the following packages.

- Import `pandas` using the alias `pd`.
- Import `plotly.express` using the alias `px`.
- From the `IPython.display` package, import `display` and `Markdown`.
- Import `pprint`.

In [18]:
# Import pandas using the alias pd
# Import plotly.express using the alias px
# From the IPython.display package, import display and Markdown

import pandas as pd
import plotly.express as px
from IPython.display import display, Markdown
import pprint

## Task 1: Import the Electric Cars Data

The electric cars data is contained in a CSV file named `electric_cars.csv`.

Each row in the dataset represents the count of the number of cars registered within a city, for a particular model.

The dataset contains the following columns.

- `city` (character): The city in which the registered owner resides.
- `county` (character): The county in which the registered owner resides.
- `model_year` (integer): The [model year](https://en.wikipedia.org/wiki/Model_year#United_States_and_Canada) of the car.
- `make` (character): The manufacturer of the car.
- `model` (character): The model of the car.
- `electric_vehicle_type` (character): Either "Plug-in Hybrid Electric Vehicle (PHEV)" or "Battery Electric Vehicle (BEV)".
- `n_cars` (integer): The count of the number of vehicles registered.

Our first step is to import and print the data.

### Instructions

Import the electric cars data to a pandas dataframe.

- Read the data from `electric_cars.csv`. Assign to `electric_cars`.
- Display a description of the numeric columns of `electric_cars`.
- Display a description of the object columns of `electric_cars`.
- Print the whole dataset. 

In [9]:
# Read the data from electric_cars.csv. Assign to electric_cars.
electric_cars = pd.read_csv("electric_cars.csv")

# Display a description of the non-numeric (object) columns
print("Description of non-numeric columns:")
display(electric_cars.describe(include='O'))

# Display a description of the numeric columns
print("\n\nDescription of numeric columns:")
display(electric_cars.describe())

Description of non-numeric columns:


| | city | county | make | model | electric_vehicle_type |
|---|---|---|---|---|---|
| count | 26813 | 26813 | 26813 | 26813 | 26813 |
| unique | 683 | 183 | 37 | 127 | 2 |
| top | Bothell | King | TESLA | LEAF | Battery Electric Vehicle (BEV) |
| freq | 479 | 7066 | 5071 | 1889 | 15885 |




Description of numeric columns:


| | model_year | n_cars |
|---|---|---|
| count | 26813.0 | 26813.0 |
| mean | 2019.375527 | 5.612166 |
| std | 3.286257 | 26.997325 |
| min | 1997.0 | 1.0 |
| 25% | 2017.0 | 1.0 |
| 50% | 2020.0 | 2.0 |
| 75% | 2022.0 | 4.0 |
| max | 2024.0 | 1514.0 |


In pandas, the `include='O'` argument to the `describe()` method tells pandas to summarize only the columns with the `object` data type. The object data type in pandas typically represents text or mixed types (text and numbers). Without `include='O'`, `describe()` summarizes only the numeric columns, which is its default behavior. Here's a breakdown of what this means:

#### **Pandas DataFrame**

When you're working with data in pandas, it's usually stored in a DataFrame. A DataFrame is like a table with rows and columns, where each column can have a different data type.

#### **Data Types**
Each column in a DataFrame has a specific data type. Common data types include `int` for integers, `float` for floating-point numbers, `bool` for boolean values, and `object` for text or mixed types.

#### **The `describe()` Method**

Now, let's talk about the `describe()` method. This method is used to get a summary of the data in your DataFrame. By default, `describe()` will only show you summaries for numerical columns (like mean, median, standard deviation, etc.).

But what if you want a summary of your non-numerical data, like those `object` type columns?

#### **Using `describe(include='O')`**

Here's where `include='O'` comes in. When you write `your_dataframe.describe(include='O')`, you're telling pandas: "Hey, please summarize the columns with the `object` data type instead."

---  

What kind of summary does it provide for `object` columns?

- **count**: How many non-null entries there are.
- **unique**: The number of unique entries.
- **top**: The most common entry.
- **freq**: How often the most common entry appears.


So, in your code `display(electric_cars.describe(include="O"))`, you're asking pandas to display descriptive statistics for all the columns in the `electric_cars` DataFrame that are of object type. This can be particularly useful for understanding the distribution and common values in text-based columns.
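The difference is easy to see on a tiny DataFrame. The sketch below uses made-up toy data (not rows from `electric_cars.csv`): one text column containing a missing value, so you can see that `count` skips nulls, and one numeric column. The text column is created with `dtype=object` explicitly so that it is an `object` column however your pandas version infers string types.

```python
import pandas as pd

# Made-up toy data: one object (text) column with a missing value, one numeric column.
# dtype=object is set explicitly so "make" is an object column in every pandas version.
toy = pd.DataFrame({
    "make": pd.Series(["TESLA", "TESLA", "NISSAN", None], dtype=object),
    "n_cars": [3, 1, 2, 5],
})

# Default: describe() summarizes only the numeric columns
numeric_summary = toy.describe()

# include='O': describe() summarizes only the object columns
object_summary = toy.describe(include="O")

print(numeric_summary)
print(object_summary)
# For "make": count is 3 (the None is skipped), unique is 2, top is TESLA, freq is 2
```

If you want both kinds of column in a single table, `describe(include='all')` combines them.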

In [11]:
# Print the whole dataset
display("The electric cars dataset:")
display(electric_cars)

'The electric cars dataset:'

| | city | county | model_year | make | model | electric_vehicle_type | n_cars |
|---|---|---|---|---|---|---|---|
| 0 | Seattle | King | 2023 | TESLA | MODEL Y | Battery Electric Vehicle (BEV) | 1514 |
| 1 | Seattle | King | 2018 | TESLA | MODEL 3 | Battery Electric Vehicle (BEV) | 1153 |
| 2 | Seattle | King | 2021 | TESLA | MODEL Y | Battery Electric Vehicle (BEV) | 1147 |
| 3 | Seattle | King | 2022 | TESLA | MODEL Y | Battery Electric Vehicle (BEV) | 1122 |
| 4 | Bellevue | King | 2023 | TESLA | MODEL Y | Battery Electric Vehicle (BEV) | 931 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 26808 | Lakewood | Pierce | 2022 | BMW | IX | Battery Electric Vehicle (BEV) | 1 |
| 26809 | Lakewood | Pierce | 2022 | BMW | X5 | Plug-in Hybrid Electric Vehicle (PHEV) | 1 |
| 26810 | Lakewood | Pierce | 2022 | FORD | TRANSIT | Battery Electric Vehicle (BEV) | 1 |
| 26811 | Lakewood | Pierce | 2022 | HYUNDAI | KONA ELECTRIC | Battery Electric Vehicle (BEV) | 1 |


## Task 2: Asking GPT a Question

Let's start by sending a message to GPT and getting a response. For now, we won't worry about including any details about the dataset; it's the equivalent of asking "is this microphone turned on?".

We'll also skip using LangChain for now so you can see more clearly how the `openai` package works.

### Types of Message

There are three types of message, documented in the [Introduction](https://platform.openai.com/docs/guides/chat/introduction) to the Chat documentation. We'll use two of them here.

- `system` messages describe the behavior of the AI assistant. If you don't know what you want, try "You are a helpful assistant".
- `user` messages contain your requests and questions for the AI assistant. We'll cover examples of this today.

The third type, `assistant`, holds the AI's previous replies; this is how you pass the history of a conversation back to GPT.
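As an illustration of how the three roles fit together, here is a conversation that already contains one exchange. The message texts are made up; the structure (a list of dictionaries, each with a `role` key and a `content` key) is what the Chat Completions endpoint expects.

```python
# An illustrative conversation using all three roles (message texts are made up)
conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Name one electric car manufacturer."},
    {"role": "assistant", "content": "Tesla is one example."},
    {"role": "user", "content": "Name another one."},
]

# Every message is a dict with exactly a 'role' key and a 'content' key
roles = [message["role"] for message in conversation]
print(roles)  # ['system', 'user', 'assistant', 'user']
```

Passing a list like this as `messages` lets GPT see its own earlier `assistant` reply before it answers the latest `user` message.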

### Instructions

Send a question to GPT and get a response.

- Define the system message as follows and assign to `system_prompt`.

```
"""You are an expert in data analytics for the automotive industry.
 You write in a clear language that a ten year old can understand.
 You keep your answers brief."""
```
    
- Define the user message as follows and assign to `user_prompt`.

```
"Tell me some uses of GenAI for data analysis in the car industry"
```

- Create a message list from the system and user messages. Assign to `message`.
- Send the messages to GPT. Assign to `response`.

<details>
    <summary><b>Code hints ></b></summary>
<p>
        
The `client.chat.completions.create()` method expects the messages in the form of a list of dictionaries, each with a `role` and `content` element.
        
```
messages = [
    {"role": "system", "content": system_msg},
    {"role": "user", "content": user_msg}
]
```

</p>
</details>

In [12]:
from openai import OpenAI

# Define the system message. Assign to system_prompt.
system_prompt = """You are an expert in data analytics for the automotive industry.
 You write in a clear language that a ten year old can understand.
 You keep your answers brief."""

# Define the user message. Assign to user_prompt.
user_prompt = "Tell me some uses of GenAI for data analysis in the car industry"

# Create a message list from the system and user messages. Assign to message.
message = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt}
]

# Send the messages to GPT. Assign to response.
# OpenAI() reads the API key from the OPENAI_API_KEY environment variable by default.
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-3.5-turbo-0125",
    messages=message
)

Now you need to explore the response. The result is a highly nested object. As well as the text response that we want, there's a lot of metadata. You'll print the whole thing so you can see the structure, and extract just the text content.

### Instructions

Print the whole response and just the text content.

- Print the whole response.
- Print just the response's content.

<details>
    <summary><b>Code hints ></b></summary>
<p>
        
Buried within the response variable is the text we asked GPT to generate. Luckily, it's always in the same place.
        
```
response.choices[0].message.content
```

</p>
</details>
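The object returned by version 1.x of the `openai` package mirrors the JSON that the Chat Completions endpoint sends back, so when you work with that raw JSON as plain Python dictionaries, the same fields are reached with square-bracket keys. The JSON below is a trimmed, hand-written stand-in: its values are copied from the response printed in this notebook (with the message text shortened), and most fields are omitted.

```python
import json

# Trimmed, hand-written stand-in for a raw Chat Completions JSON body.
# Values are copied from the printed response in this notebook; the content is shortened.
raw_json = """
{
  "object": "chat.completion",
  "model": "gpt-3.5-turbo-0125",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "GenAI can be used in the car industry to analyze data from vehicles."
      }
    }
  ],
  "usage": {"prompt_tokens": 57, "completion_tokens": 45, "total_tokens": 102}
}
"""

data = json.loads(raw_json)

# With plain dictionaries, every level of nesting is reached with square brackets
content = data["choices"][0]["message"]["content"]
total_tokens = data["usage"]["total_tokens"]
print(content)
print(total_tokens)
```

With the `openai` package's response object, the equivalent paths use attributes instead, such as `response.choices[0].message.content` and `response.usage.total_tokens`.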

In [25]:
from pprint import pprint  # Import the pprint function from the pprint module

In [26]:
# Print the whole response
pprint(response)

ChatCompletion(id='chatcmpl-8tdnOuwyzKJ3jnkvkIOODxWZjMg3o', choices=[Choice(finish_reason='stop', index=0, message=ChatCompletionMessage(content='GenAI can be used in the car industry to analyze data from vehicles to improve safety features, predict maintenance needs, and optimize performance. It can also be used to personalize the driving experience for drivers and enhance overall customer satisfaction.', role='assistant', function_call=None, tool_calls=None), logprobs=None)], created=1708271706, model='gpt-3.5-turbo-0125', object='chat.completion', system_fingerprint='fp_6dd124df95', usage=CompletionUsage(completion_tokens=45, prompt_tokens=57, total_tokens=102))


In [27]:
# Print just the response's content
pprint(response.choices[0].message.content)

('GenAI can be used in the car industry to analyze data from vehicles to '
 'improve safety features, predict maintenance needs, and optimize '
 'performance. It can also be used to personalize the driving experience for '
 'drivers and enhance overall customer satisfaction.')


### Basic attributes
#### Choice

In [31]:
print("\n=========================\nChoices list (completions):\n", response.choices[0])

print("\n=========================\nFinish reason:", response.choices[0].finish_reason)

print("\n=========================\nIndex:", response.choices[0].index)

print("\n=========================\nMessage:", response.choices[0].message)

print("\n=========================\nMessage content:\n", response.choices[0].message.content)

print("\n=========================\nMessage role:", response.choices[0].message.role)

print("\n=========================\nCreated:", response.created)

print("\n=========================\nId:", response.id)

print("\n=========================\nModel:", response.model)

print("\n=========================\nObject:", response.object)


Choices list (completions):
 Choice(finish_reason='stop', index=0, message=ChatCompletionMessage(content='GenAI can be used in the car industry to analyze data from vehicles to improve safety features, predict maintenance needs, and optimize performance. It can also be used to personalize the driving experience for drivers and enhance overall customer satisfaction.', role='assistant', function_call=None, tool_calls=None), logprobs=None)

Finish reason: stop

Index: 0

Message: ChatCompletionMessage(content='GenAI can be used in the car industry to analyze data from vehicles to improve safety features, predict maintenance needs, and optimize performance. It can also be used to personalize the driving experience for drivers and enhance overall customer satisfaction.', role='assistant', function_call=None, tool_calls=None)

Message content:
 GenAI can be used in the car industry to analyze data from vehicles to improve safety features, predict maintenance needs, and optimize performance.

#### Usage

In [32]:
print("\nUsage:", response.usage)

print(" - Prompt tokens:", response.usage.prompt_tokens)

print(" - Completion tokens:", response.usage.completion_tokens)

print(" - Total tokens:", response.usage.total_tokens)


Usage: CompletionUsage(completion_tokens=45, prompt_tokens=57, total_tokens=102)
 - Prompt tokens: 57
 - Completion tokens: 45
 - Total tokens: 102


### List all methods and attributes

In [39]:
all_methods_and_attributes = dir(response)
# print(all_methods_and_attributes)

## Task 3: Asking a Question About the Dataset

Now that we know GPT is working, we can start asking questions about data analysis. Because we have details of our dataset, we can pass these into our prompt to improve the quality of the messages we get back.

Another change that we're going to make is to use the `langchain` package, which provides a convenience layer on top of the `openai` package.

### Why should we use LangChain?

The code in the previous task used complicated nested objects in two places (the list of dictionaries for the message, and the deeply nested response object). This sort of object is common in web application development, but not in data analysis, where rectangular data (pandas DataFrames and SQL tables) is more common.

One of the advantages of LangChain is that it simplifies the code for some tasks, letting you avoid messing about with too many square brackets and curly braces as you navigate these deep objects.

Secondly, if you want to swap GPT for a different model at a later date (as you might in a corporate setting), it can be easier to do so if you use the `langchain` package rather than the `openai` package directly.

### LangChain message types

The LangChain message types are named slightly differently from the OpenAI message types.

- `SystemMessage` is the equivalent of OpenAI's `system` message.
- `HumanMessage` is the equivalent of OpenAI's `user` message.
- `AIMessage` is the equivalent of OpenAI's `assistant` message.
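The correspondence between the two naming schemes can be written down as a small lookup table. This is an illustrative sketch only: it assumes LangChain message objects expose a `.type` string (`"system"`, `"human"` or `"ai"`) and a `.content` string, and it uses a hypothetical `StandInMessage` dataclass in place of the real LangChain classes so that it runs without LangChain installed. LangChain performs this conversion for you when it calls OpenAI, so you don't need to write it yourself.

```python
from dataclasses import dataclass

# LangChain message type -> OpenAI chat role
ROLE_FOR_TYPE = {"system": "system", "human": "user", "ai": "assistant"}


@dataclass
class StandInMessage:
    """Hypothetical stand-in for a LangChain message (only .type and .content)."""
    type: str
    content: str


def to_openai_messages(messages):
    """Convert objects with .type and .content into OpenAI-style role dicts."""
    return [{"role": ROLE_FOR_TYPE[m.type], "content": m.content} for m in messages]


conversation = [
    StandInMessage("system", "You're a data analysis expert"),
    StandInMessage("human", "Suggest some data analysis questions."),
    StandInMessage("ai", "1. Which manufacturer has the most registered cars?"),
]
print(to_openai_messages(conversation))
```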

### Instructions

Create a prompt that includes dataset details.

- _Read the description of the dataset that is provided._
- Create a task for the AI. Assign to `suggest_questions`.
    - Use the text `"Suggest some data analysis questions that could be answered with this dataset."`.
- Concatenate the dataset description and the request. Assign to `message_langchain`.
    - The first message is a system message with the content `"You are a data analysis expert."`.
    - The second message is a human message with `dataset_description` and `suggest_questions` concatenated with two line breaks in between.

In [51]:
system_prompt = "You're a data analysis expert"

# A description of the dataset
dataset_description = """
You have a dataset about electric cars registered in Washington state, USA in 2020. It is available as a pandas DataFrame named `electric_cars`.

Each row in the dataset represents the count of the number of cars registered within a city, for a particular model.

The dataset contains the following columns.

- `city` (character): The city in which the registered owner resides.
- `county` (character): The county in which the registered owner resides.
- `model_year` (integer): The [model year](https://en.wikipedia.org/wiki/Model_year#United_States_and_Canada) of the car.
- `make` (character): The manufacturer of the car.
- `model` (character): The model of the car.
- `electric_vehicle_type` (character): Either "Plug-in Hybrid Electric Vehicle (PHEV)" or "Battery Electric Vehicle (BEV)".
- `n_cars` (integer): The count of the number of vehicles registered.
"""

# Create a task for the AI. Assign to suggest_questions.
suggest_questions = "Suggest some data analysis questions that could be answered with this dataset."

# Concatenate the dataset description and the request, separated by two line breaks.
# Assign the message list to message_langchain.
message_langchain = [
    SystemMessage(content=system_prompt),
    HumanMessage(content=f"{dataset_description}\n\n{suggest_questions}")
]

### Instructions

- Create a `ChatOpenAI` object. Assign to `chat`.
- Pass your message to GPT. Assign to `response`.
- Print the response object and the contents of the response.
- Print the type of the response's content.

In [43]:
# Create a ChatOpenAI object. Assign to chat.
# n=2 requests two completions, but calling chat(...) returns only the first one.
chat = ChatOpenAI(temperature=0.8,
                  n=2,
                  verbose=True)

# Pass the message to GPT. Assign the reply to response.
response = chat(message_langchain)

In [46]:
pprint(response.content)

('Here are some data analysis questions that could be explored using the '
 '`electric_cars` dataset:\n'
 '\n'
 '1. What is the distribution of electric vehicle types (PHEV vs BEV) in '
 'different counties of Washington state?\n'
 '2. Which manufacturer (make) has the highest number of electric cars '
 'registered in 2020 in Washington state?\n'
 '3. How does the number of electric cars registered vary by city or county?\n'
 '4. What are the most popular electric car models registered in Washington '
 'state in 2020?\n'
 '5. Is there a correlation between the model year of electric cars and the '
 'number of cars registered?\n'
 '6. Are there any cities or counties in which a particular electric vehicle '
 'type (PHEV or BEV) is more popular?\n'
 '7. Can we identify any trends in the registration of electric cars over the '
 'course of 2020?\n'
 '8. What is the total number of electric cars registered in Washington state '
 'in 2020?\n'
 '9. Are there any outliers in terms of the numb

In [49]:
# Print just the response's content type
print("\nResponse's content type:", type(response.content))


Response's content type: <class 'str'>


## Task 4: Hold a conversation with GPT

Notice that the response from GPT was a message object rather than a plain string. The most useful part of it is the `.content` attribute, which contains the text response to your prompt.

While a single prompt and response can be useful, often you want to have a longer conversation with GPT. In this case, you can pass previous messages so that GPT can "remember" what was said before.

### AI messages

The response from GPT had type `AIMessage`. By distinguishing `AIMessage`s from the `HumanMessage`s, you can tell who said what in a conversation with the AI.

### Displaying Markdown content

When GPT generates code as an output, it is often formatted as a Markdown code block inside triple backticks. You can display Markdown output more nicely in Workspace by swapping `print()` for the `display()` and `Markdown()` functions.

```py
display(Markdown(your_markdown_text))
```

### Instructions

Append another prompt to the conversation and chat with GPT again.

- Append the response and a new message to the previous messages. Assign to `msgs_python_top_models`.
- Pass your message to GPT. Assign to `rsps_python_top_models`.
- Display the response's Markdown content.

In [55]:
# First, create a new human message

next_message = "What is the total number of electric cars registered in Washington state in 2020?"


In [61]:
# Append the previous response and a new human message to the previous messages.
# Assign to msgs_python_top_models.
msgs_python_top_models = message_langchain + [
    response,                           # GPT's previous reply (an AIMessage)
    HumanMessage(content=next_message)  # the new question
]

# Pass your message to GPT. Assign to rsps_python_top_models.
rsps_python_top_models = chat(msgs_python_top_models)

In [62]:
#see full response
print(rsps_python_top_models)

content='To calculate the total number of electric cars registered in Washington state in 2020, you can simply sum up the `n_cars` column in the `electric_cars` DataFrame. Here\'s how you can do it in Python using pandas:\n\n```python\ntotal_registered_cars = electric_cars[\'n_cars\'].sum()\nprint("Total number of electric cars registered in Washington state in 2020:", total_registered_cars)\n```\n\nThis code snippet will sum up the \'n_cars\' column in the `electric_cars` DataFrame and print out the total number of electric cars registered in Washington state in 2020.'


In [63]:
# Display the response's Markdown content
display(Markdown(rsps_python_top_models.content))

To calculate the total number of electric cars registered in Washington state in 2020, you can simply sum up the `n_cars` column in the `electric_cars` DataFrame. Here's how you can do it in Python using pandas:

```python
total_registered_cars = electric_cars['n_cars'].sum()
print("Total number of electric cars registered in Washington state in 2020:", total_registered_cars)
```

This code snippet will sum up the 'n_cars' column in the `electric_cars` DataFrame and print out the total number of electric cars registered in Washington state in 2020.

## Task 5: Execute the Code Provided by GPT

You just asked GPT to write some code for you. Next you need to see if it worked, and fix it if it didn't. 

This is a standard workflow for interacting with generative AI: the AI acts as a junior data analyst who writes the code, then you act as the boss who reviews the work.

### Instructions

Review the work of your AI assistant.

- Copy and paste the code generated by GPT into the next code cell and run it.
- _Look at the result. Do you think it is correct?_*
- If the code threw an error or gave a wrong answer, use the Workspace AI Assistant (also powered by GPT!) to fix and explain the code.

*During testing of this code-along project, GPT sometimes wrote incorrect code when using `.sum()`. Double-check those function calls.
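The exact mistake GPT made during testing isn't recorded here, but one thing worth checking (an assumption about what can go wrong) is whether `.sum()` is applied to the `n_cars` column or to a whole DataFrame. Summing a whole DataFrame, or a whole grouped DataFrame, also involves the text columns, and how pandas treats those has changed between versions. The made-up toy data below mirrors the `electric_cars` columns and shows the safer pattern of selecting `n_cars` before summing.

```python
import pandas as pd

# Made-up toy rows with the same kinds of columns as electric_cars
toy = pd.DataFrame({
    "model_year": [2022, 2022, 2023, 2023],
    "make": ["TESLA", "NISSAN", "TESLA", "BMW"],
    "electric_vehicle_type": [
        "Battery Electric Vehicle (BEV)",
        "Battery Electric Vehicle (BEV)",
        "Battery Electric Vehicle (BEV)",
        "Plug-in Hybrid Electric Vehicle (PHEV)",
    ],
    "n_cars": [5, 2, 7, 1],
})

# Total number of cars: select the count column first, then sum it
total = toy["n_cars"].sum()

# Totals per group: select n_cars before .sum() so the text columns are not aggregated
by_year_type = (
    toy.groupby(["model_year", "electric_vehicle_type"])["n_cars"]
    .sum()
    .reset_index()
)

print(total)
print(by_year_type)
```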

In [64]:
total_registered_cars = electric_cars['n_cars'].sum()
print("Total number of electric cars registered in Washington state in 2020:", total_registered_cars)

Total number of electric cars registered in Washington state in 2020: 150479


## Task 6: Continue the Conversation to Create a Plot

Doing more analysis with GPT assistance is simply a case of continuing the conversation by appending new `HumanMessage` prompts to the message list, and calling the `chat()` function.

The output from GPT is partly random, but when doing data analysis this isn't always desirable since you'd like your results to be reproducible. With large language models, the amount of randomness can be controlled with a parameter known as "temperature".

- `temperature` controls the randomness of the response. It ranges from `0` to `2`, with zero meaning minimal randomness (making results easier to reproduce) and two meaning maximum randomness (which often gives weird responses). If you use the OpenAI API directly, the default is `1`, but LangChain's `ChatOpenAI` uses a default of `0.7`.
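When you call the OpenAI API directly, `temperature` is just another keyword argument to `client.chat.completions.create()`. The helper below is a hypothetical sketch (`build_chat_request` is not part of any library) that assembles those keyword arguments and checks that the temperature falls in the 0 to 2 range described above. It makes no API call, so it runs without an API key.

```python
def build_chat_request(messages, temperature=0.0, model="gpt-3.5-turbo-0125"):
    """Hypothetical helper: build keyword arguments for client.chat.completions.create()."""
    # The API accepts temperatures from 0 (least random) to 2 (most random)
    if not 0 <= temperature <= 2:
        raise ValueError(f"temperature must be between 0 and 2, got {temperature}")
    return {"model": model, "messages": messages, "temperature": temperature}


request = build_chat_request(
    [{"role": "user", "content": "Suggest a chart for electric car registrations."}],
    temperature=0,
)
print(request["temperature"])  # 0
```

With an `OpenAI()` client like the one created earlier, the request would be sent with `client.chat.completions.create(**request)`. In LangChain, the equivalent is the `temperature` argument to `ChatOpenAI`.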

### Instructions

- Create a new OpenAI chat object with a lower temperature. Assign to `chat02`. Here the temperature is `0.5`, and `max_tokens=100` caps the length of each response.

In [65]:
# Create a new OpenAI chat object with temperature set to 0.5. Assign to chat02.
# max_tokens=100 caps the length of each response, so long answers can be cut off.
chat02 = ChatOpenAI(temperature=0.5,
                    max_tokens=100)

### Instructions

- Work through the previous conversation flow again, appending the previous response and a new request for Python code to draw a bar plot of the total count of electric cars by model year, with bars colored by electric vehicle type.
    - The solution asks for Plotly Express code, but you can pick any Python data viz package you prefer.

In [66]:
#first, create a second human message

next_message02 = "Write some python code to draw a bar plot of the total count of electric cars by model year, with bars colored by electric vehicle type. Use the Plotly Express package."
next_human_message02 = HumanMessage(content=next_message02)

In [67]:
# Ask GPT for code for a bar plot, as detailed in the instructions
msgs_python_plot = msgs_python_top_models + [
    rsps_python_top_models,
    next_human_message02
]

rsps_python_plot = chat02(msgs_python_plot)

In [68]:
display(Markdown(rsps_python_plot.content))

To draw a bar plot of the total count of electric cars by model year, with bars colored by electric vehicle type using Plotly Express in Python, you can use the following code:

```python
import plotly.express as px

# Group the data by model year and electric vehicle type and sum the counts
grouped_data = electric_cars.groupby(['model_year', 'electric_vehicle_type']).sum().reset_index()

# Create the bar plot using Plotly Express
fig = px.bar
```

In [69]:
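import plotly.express as px

# Group the data by model year and electric vehicle type and sum the counts.
# Selecting 'n_cars' before .sum() keeps the text columns out of the aggregation.
grouped_data = electric_cars.groupby(['model_year', 'electric_vehicle_type'])['n_cars'].sum().reset_index()

# Create the bar plot using Plotly Express.
# GPT's response stopped at "fig = px.bar", so the call is completed here by hand.
fig = px.bar(grouped_data, x='model_year', y='n_cars', color='electric_vehicle_type', barmode='group')
fig.show()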

### Instructions

To see how much variation there is at this lower temperature, ask GPT for the same thing again.

- Call GPT again with the same message list and display the response.
- _Look at the response content. How close is it to the previous response content?_

In [70]:
# Call GPT again with the same message list and display the response
rsps_python_plot02 = chat02(msgs_python_plot)

In [71]:
display(Markdown(rsps_python_plot02.content))

To create a bar plot of the total count of electric cars by model year, with bars colored by electric vehicle type using Plotly Express, you can follow the code snippet below:

```python
import plotly.express as px

# Group the data by model year and electric vehicle type and sum the counts
grouped_data = electric_cars.groupby(['model_year', 'electric_vehicle_type'])['n_cars'].sum().reset_index()

# Plot the bar graph using Plotly Express
```


## Task 7: Execute the Code Provided by GPT to See Your Plot

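With the temperature lowered to 0.5, the two responses were similar but not identical: the second one selects the `n_cars` column before summing. Both also stop partway through the code, consistent with the `max_tokens=100` limit. Setting temperature to zero makes the output far more consistent from one call to the next, which makes your workflow more reproducible, although the OpenAI API does not guarantee identical outputs even then.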

The final task is to see the plot. As before, remember that GPT is only your assistant and you are the boss. Check the code and its output to make sure that you really have what you want.

### Instructions

Run the code and check that the plot is correct.

- Run the bar plot code generated by GPT.
- _Check that the output is suitable. If not, try changing your prompt in the previous task to improve the output (this is prompt engineering, which you'll see more of in the next code-along project in the series)._

In [73]:
# Paste the first code snippet generated by GPT and run it

# Grouping the data by model year and electric vehicle type and calculating the total count
grouped_data = electric_cars.groupby(['model_year', 'electric_vehicle_type'])['n_cars'].sum().reset_index()

# Drawing the bar plot
fig = px.bar(grouped_data, x='model_year', y='n_cars', color='electric_vehicle_type',
             labels={'model_year': 'Model Year', 'n_cars': 'Count of Electric Cars'},
             title='Total Count of Electric Cars by Model Year',
             barmode='group')

fig.show()

In [74]:
# Paste the second code snippet generated by GPT and run it

# Filter the data for electric cars
# (every row is already PHEV or BEV, so this filter keeps all rows)
electric_cars_data = electric_cars[electric_cars['electric_vehicle_type'].isin(['Plug-in Hybrid Electric Vehicle (PHEV)', 'Battery Electric Vehicle (BEV)'])]

# Group the data by model year and electric vehicle type, and calculate the total count.
# Without selecting ['n_cars'], .sum() may also try to aggregate the text columns,
# depending on the pandas version.
grouped_data = electric_cars_data.groupby(['model_year', 'electric_vehicle_type']).sum().reset_index()

# Draw the bar plot
fig = px.bar(grouped_data, x='model_year', y='n_cars', color='electric_vehicle_type', barmode='group')

# Set the title and axis labels
fig.update_layout(title='Total Count of Electric Cars by Model Year',
                  xaxis_title='Model Year',
                  yaxis_title='Count')

# Show the plot
fig.show()

## Summary

You've now seen how to access GPT through the OpenAI API both directly and using LangChain.

You saw how GPT can be used to come up with ideas for analyses to perform and to write code for you.

You also saw how to have an extended conversation and how to control the reproducibility of the responses.