## Integrating OpenAI API into our Generative AI Workflow

Leveraging the OpenAI API opens up a world of possibilities for programming and automation workflows, far beyond simple chat interactions. This guide will walk you through the process of integrating the OpenAI API into your projects, enhancing your generative AI portfolio. We'll cover:

- **Setting Up Your OpenAI Developer Account**: Learn how to create and configure your OpenAI developer account, and integrate it with your development environment.
- **Calling the Chat Functionality**: Understand how to make API calls to utilize the chat functionality of OpenAI's models.
- **Extracting Response Text**: Discover methods to parse and extract meaningful responses from the API output.
- **Maintaining a Conversation**: Implement techniques to hold longer, context-aware conversations with the AI.
- **Combining OpenAI API with Other APIs**: Explore how to integrate the OpenAI API with other APIs to build more complex and powerful applications.

By the end of this guide, you will have a solid foundation for incorporating OpenAI's capabilities into your own projects, making your generative AI portfolio more robust and versatile.

## Step 0: Setup

To use GPT, we need to import the `os` and `openai` packages, and some functions from `IPython.display` to render Markdown. A later task will also use Yahoo! Finance data via the `yfinance` package.

We also need to make the API key, stored in the `OPENAI_API_KEY` environment variable, available to the `openai` client.

- Import the `os` package.
- Import the `OpenAI` class from the `openai` package.
- Import the `yfinance` package with the alias `yf`.
- From the `IPython.display` package, import `display` and `Markdown`.
- Read the API key from the `OPENAI_API_KEY` environment variable and create an `OpenAI` client.

In [62]:
# Import the os package
import os

# Import the OpenAI client class from the openai package
from openai import OpenAI

# Import yfinance as yf
import yfinance as yf

# From the IPython.display package, import display and Markdown
from IPython.display import display, Markdown

# Read the API key from the OPENAI_API_KEY environment variable
openai_api_key = os.environ["OPENAI_API_KEY"]

# Create the OpenAI client with that key
client = OpenAI(api_key=openai_api_key)
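
If the environment variable is missing, `os.environ[...]` fails with a bare `KeyError`. A minimal sketch of a friendlier check, assuming the key lives in `OPENAI_API_KEY` as above; the helper name `require_api_key` is hypothetical and not part of the `openai` package:

```python
import os


def require_api_key(name="OPENAI_API_KEY", env=None):
    """Return the API key stored in `name`, or raise a readable error.

    Hypothetical helper for this guide. `env` defaults to os.environ and
    is a parameter only so the function can be tried without a real key.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "export it before starting the notebook."
        )
    return key


# Usage with a stand-in environment (no real key needed)
print(require_api_key(env={"OPENAI_API_KEY": "sk-example"}))
```

The returned value can then be passed to the client as `OpenAI(api_key=...)`.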

## Step 1: Get GPT to create a dataset

It's time to chat! Having a conversation with GPT involves a single function call of this form.

```python
response = client.chat.completions.create(
    model="MODEL_NAME",
    messages=[
        {"role": "system", "content": 'SPECIFY HOW THE AI ASSISTANT SHOULD BEHAVE'},
        {"role": "user", "content": 'SPECIFY WHAT YOU WANT THE AI ASSISTANT TO SAY'}
    ]
)
```

There are a few things to unpack here.

The model names are listed in the [Model Overview](https://platform.openai.com/docs/models/overview) page of the developer documentation. Today we'll be using `gpt-3.5-turbo`, which is the latest model used by ChatGPT that has broad public API access. 

If you have access to GPT-4, you can use that instead by setting `model="gpt-4"`, though note that the price is 15 times higher.

There are three types of message, documented in the [Introduction](https://platform.openai.com/docs/guides/chat/introduction) to the Chat documentation:

- `system` messages describe the behavior of the AI assistant. If you don't know what you want, try "You are a helpful assistant".
- `user` messages describe what you want the AI assistant to say. We'll cover examples of this today.
- `assistant` messages describe previous responses in the conversation. We'll cover how to have an interactive conversation in later tasks. 

The first message should be a system message. Additional messages should alternate between user and assistant.
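
The ordering rule above is easy to break when a message list is assembled by hand. As a rough sketch, a small validator could check it before any API call is made; `check_message_order` is a hypothetical name, and the rule it enforces is only the one stated in this guide:

```python
def check_message_order(messages):
    """Return True if the list is a system message followed by
    user/assistant messages that strictly alternate, starting with user."""
    if not messages or messages[0]["role"] != "system":
        return False
    expected = "user"
    for msg in messages[1:]:
        if msg["role"] != expected:
            return False
        # Flip the expected role for the next message
        expected = "assistant" if expected == "user" else "user"
    return True


example = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Name one eye color."},
    {"role": "assistant", "content": "Brown."},
    {"role": "user", "content": "Name another."},
]
print(check_message_order(example))
```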

### Define the system message, `system_msg`, as:

> 'You are a helpful assistant who understands data science.'

### Define the user message, `user_msg`, as:

> 'Create a small dataset of data about people. The format of the dataset should be a data frame with 5 rows and 3 columns. The columns should be called "name", "height_cm", and "eye_color". The "name" column should contain randomly chosen first names. The "height_cm" column should contain randomly chosen heights, given in centimeters. The "eye_color" column should contain randomly chosen eye colors, taken from a choice of "brown", "blue", and "green". Provide Python code to generate the dataset, then provide the output in the format of a markdown table.'

- Ask GPT to create a dataset using the `gpt-3.5-turbo` model. Assign to `response`.


In [63]:
# Define the system message
system_msg = 'You are a helpful assistant who understands data science.'

# Define the user message
user_msg = 'Create a small dataset of data about people. The format of the dataset should be a data frame with 5 rows and 3 columns. The columns should be called "name", "height_cm", and "eye_color". The "name" column should contain randomly chosen first names. The "height_cm" column should contain randomly chosen heights, given in centimeters. The "eye_color" column should contain randomly chosen eye colors, taken from a choice of "brown", "blue", and "green". Provide Python code to generate the dataset, then provide the output in the format of a markdown table.'

# Create a dataset using GPT
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": system_msg},
        {"role": "user", "content": user_msg}
    ]
)

print("Full Response Answer is:\n")
print(response)

Full Response Answer is:

ChatCompletion(id='chatcmpl-Bh0CtNBBGV7KKwmkNGI8DJVfva7iV', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='Here is the Python code to create the dataset and then display it in a markdown table:\n\n```python\nimport pandas as pd\nimport random\n\n# Create the dataset\ndata = {\n    "name": ["Alice", "Bob", "Charlie", "David", "Eve"],\n    "height_cm": [random.randint(150, 190) for _ in range(5)],\n    "eye_color": random.choices(["brown", "blue", "green"], k=5)\n}\n\ndf = pd.DataFrame(data)\n\n# Display the dataset in a markdown table\nprint(df.to_markdown(index=False))\n```\n\nOutput (in the format of a markdown table):\n\n| name    |   height_cm | eye_color   |\n|:--------|------------:|:------------|\n| Alice   |         188 | brown       |\n| Bob     |         174 | blue        |\n| Charlie |         177 | green       |\n| David   |         151 | brown       |\n| Eve     |         184 | green       |\n``

## Step 2: Check the response is OK

API calls are "risky" because problems can occur outside of your notebook, like internet connectivity issues, or a problem with the server sending you data, or because you ran out of API credit. You should check that the response you get is OK.
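
For transient problems such as a dropped connection, one common response is to retry with an increasing delay. The sketch below is a generic illustration under stated assumptions, not something the `openai` package requires: `call_with_retries` is a hypothetical helper, and it catches every `Exception` only to stay short.

```python
import time


def call_with_retries(fn, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn() and retry on failure, doubling the wait each time.

    Hypothetical helper: real code would catch only the network and
    rate-limit errors it expects instead of every Exception.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)


# Demonstration with a stand-in function that fails twice, then succeeds
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


result = call_with_retries(flaky, sleep=lambda seconds: None)
print(result, calls["n"])
```

In the notebook, `fn` would be a small wrapper such as `lambda: client.chat.completions.create(...)`.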

Each choice in the response has a `finish_reason`, referred to below as the status code, with one of four values, documented in the [Response format](https://platform.openai.com/docs/guides/chat/response-format) section of the Chat documentation.

- `stop`: API returned complete model output
- `length`: Incomplete model output due to max_tokens parameter or token limit
- `content_filter`: Omitted content due to a flag from our content filters
- `null`: API response still in progress or incomplete

The GPT API sends data in JSON format, which the `openai` client parses into a deeply nested `ChatCompletion` object (visible in the output above). It's a bit of a pain to work with!

For a response variable named `response`, the status code is stored in `response.choices[0].finish_reason`.
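
As a sketch of how the four values listed above might be handled in one place, the helper below raises a descriptive error for anything other than `stop`. `check_finish_reason` is a hypothetical name, and the stand-in object only mimics the attribute layout of a real response:

```python
from types import SimpleNamespace

# Descriptions paraphrased from the list above
FINISH_REASONS = {
    "stop": "complete output",
    "length": "output cut off by max_tokens or the token limit",
    "content_filter": "content omitted by the content filter",
    None: "response still in progress or incomplete",
}


def check_finish_reason(response):
    """Return the first choice's finish reason, raising if it is not 'stop'."""
    reason = response.choices[0].finish_reason
    if reason != "stop":
        detail = FINISH_REASONS.get(reason, "unrecognized finish reason")
        raise RuntimeError(f"finish_reason={reason!r}: {detail}")
    return reason


# Stand-in object with the same attribute layout as `response`
fake = SimpleNamespace(choices=[SimpleNamespace(finish_reason="length")])
try:
    check_finish_reason(fake)
except RuntimeError as err:
    print(err)
```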

If you prefer to work with dataframes rather than nested objects, you can convert the response to a dictionary with `model_dump()` and flatten it to a single-row dataframe with the following code.

```python
import pandas as pd
pd.json_normalize(response.model_dump(), "choices", ['id', 'object', 'created', 'model', 'usage'])
```

### Check the status code of the `response` variable.

In [64]:
# Check the status code of the response variable
print("Finish reason:\n")
print(response.choices[0].finish_reason)

if response.choices[0].finish_reason == "stop":
    print("OpenAI works perfectly; we got the expected answer. Well done, Santiago!")

Finish reason:

stop
OpenAI works perfectly; we got the expected answer. Well done, Santiago!


## Step 3: Extract the AI assistant's message

Buried within the response variable is the text we asked GPT to generate. Luckily, it's always in the same place.

`response.choices[0].message.content`

The response content can be printed as usual with `print(content)`, but it's Markdown content, which Jupyter notebooks can render, via `display(Markdown(content))`.

- Print the content generated by GPT.

- Render the Markdown content generated by GPT.

- Read the code that was generated. Does it look correct?

- Read the dataset that was generated. Does it match the specifications?
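
To check the generated code by actually running it, the fenced blocks first have to be pulled out of the Markdown. A minimal sketch using a regular expression follows; `extract_code_blocks` is a hypothetical helper, and it assumes plain three-backtick fences with no indentation:

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable
CODE_BLOCK = re.compile(FENCE + r"(\w*)\n(.*?)" + FENCE, re.DOTALL)


def extract_code_blocks(markdown_text):
    """Return (language, code) pairs for each fenced block in the text."""
    return [(m.group(1), m.group(2)) for m in CODE_BLOCK.finditer(markdown_text)]


sample = (
    "Here is the code:\n\n"
    + FENCE + "python\nprint('hi')\n" + FENCE
    + "\n\nDone."
)
print(extract_code_blocks(sample))
```

In the notebook, the input would be `response.choices[0].message.content`. Generated code should still be read before it is executed.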

In [65]:
# Print the content generated by GPT.
print(response.choices[0].message.content)

Here is the Python code to create the dataset and then display it in a markdown table:

```python
import pandas as pd
import random

# Create the dataset
data = {
    "name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "height_cm": [random.randint(150, 190) for _ in range(5)],
    "eye_color": random.choices(["brown", "blue", "green"], k=5)
}

df = pd.DataFrame(data)

# Display the dataset in a markdown table
print(df.to_markdown(index=False))
```

Output (in the format of a markdown table):

| name    |   height_cm | eye_color   |
|:--------|------------:|:------------|
| Alice   |         188 | brown       |
| Bob     |         174 | blue        |
| Charlie |         177 | green       |
| David   |         151 | brown       |
| Eve     |         184 | green       |


In [66]:
# Render the Markdown content generated by GPT
markdown_style_response = response.choices[0].message.content

display(Markdown(markdown_style_response))

Here is the Python code to create the dataset and then display it in a markdown table:

```python
import pandas as pd
import random

# Create the dataset
data = {
    "name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "height_cm": [random.randint(150, 190) for _ in range(5)],
    "eye_color": random.choices(["brown", "blue", "green"], k=5)
}

df = pd.DataFrame(data)

# Display the dataset in a markdown table
print(df.to_markdown(index=False))
```

Output (in the format of a markdown table):

| name    |   height_cm | eye_color   |
|:--------|------------:|:------------|
| Alice   |         188 | brown       |
| Bob     |         174 | blue        |
| Charlie |         177 | green       |
| David   |         151 | brown       |
| Eve     |         184 | green       |

## Let's use a helper function

You need to write a lot of repetitive boilerplate code to do these three simple things. Having a wrapper function to abstract away the boring bits is useful. That way we can focus on data science use cases.

Hopefully OpenAI will improve the interface to their Python package so this sort of thing is built-in. In the meantime, feel free to use this in your own code.

The function takes 2 arguments.

- `system`: A string containing the system message.
- `user_assistant`: A list of strings that alternate user message then assistant message.

The return value is the generated content.

### Run the next cell so you have access to the function.

In [67]:
def chat(system, user_assistant):
    """Send a system message plus alternating user/assistant messages to GPT
    and return the generated content."""
    assert isinstance(system, str), "`system` should be a string"
    assert isinstance(user_assistant, list), "`user_assistant` should be a list"
    system_msg = [{"role": "system", "content": system}]
    # Even positions are user messages; odd positions are assistant messages
    user_assistant_msgs = [
        {"role": "assistant" if i % 2 else "user", "content": content}
        for i, content in enumerate(user_assistant)
    ]
    msgs = system_msg + user_assistant_msgs
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=msgs
    )
    # Fail loudly if the model did not finish normally
    status_code = response.choices[0].finish_reason
    assert status_code == "stop", f"The status code was {status_code}."
    return response.choices[0].message.content

# call the function
chat("You are a machine learning expert who writes tersely.", ["Explain what a support vector machine model is."])

'Support Vector Machine (SVM) is a supervised machine learning algorithm that is used for classification and regression tasks. It works by finding the hyperplane that best separates the classes in the feature space.'

Here is a check to make sure the function works.

In [68]:
response_fn_test = chat(
    "You are a machine learning expert who writes tersely.", 
    ["Explain what a support vector machine model is."]
)
display(Markdown(response_fn_test))

Support vector machine is a supervised machine learning algorithm used for classification and regression tasks. It works by finding the hyperplane that best separates different classes in the feature space.

### Pro Tip

In the system message for the check above, I told the AI that it "writes tersely". This reduces the amount of output it generates, saving you some credits. You won't always want terse output, but it's useful when you are just testing things.

**When you don't care too much about the style of the output, include a command to "write tersely" in the system message.**

## Step 4: Perform a calculation by reusing messages

The "zero-shot" case where the AI gives you the perfect response the first time is pretty rare. As with humans, you often need to have a longer conversation. This is where the user message-assistant message alternation comes in handy.

- Assign the content from the response in Step 1 to `assistant_msg`.

- Define a new user message, `user_msg2` as follows.

> 'Using the dataset you just created, write code to calculate the mean of the `height_cm` column. Also include the result of the calculation.'

- Create a list of user and assistant messages from `user_msg`, `assistant_msg`, and `user_msg2`. Assign to `user_assistant_msgs`.

- Get GPT to perform the request, using `system_msg` (from Step 1) and `user_assistant_msgs`. Assign to `response_calc`.

- Read the code that was generated. Does it look correct?

- Read the answer that was generated. Does it look correct?

In [69]:
# Assign the content from the response in Step 1 to assistant_msg
assistant_msg = response.choices[0].message.content

# Define a new user message
user_msg2 = 'Using the dataset you just created, write code to calculate the mean of the `height_cm` column. Also include the result of the calculation.'

# Create a list of user and assistant messages
user_assistant_msgs = [user_msg, assistant_msg, user_msg2]

# Get GPT to perform the request
response_calc = chat(system_msg, user_assistant_msgs)

# Display the generated content
display(Markdown(response_calc))

Here is the Python code to calculate the mean of the `height_cm` column in the dataset:

```python
mean_height = df['height_cm'].mean()
print(f"The mean height in the dataset is: {mean_height:.2f} cm")
```

Result:
```
The mean height in the dataset is: 174.80 cm
```
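
Building the alternating list by hand gets tedious over several turns. One possible way to keep the history automatically is sketched below; `Conversation` is a hypothetical wrapper, and `send` is assumed to be any callable with the same `(system, user_assistant)` signature as the `chat` helper:

```python
class Conversation:
    """Keep the running user/assistant history for a chat.

    Hypothetical wrapper: `send` takes (system, turns) and returns the
    assistant's reply text, as the `chat` helper in this guide does.
    """

    def __init__(self, system, send):
        self.system = system
        self.send = send
        self.turns = []  # alternates user, assistant, user, ...

    def ask(self, user_msg):
        # Send the full history plus the new message, then record both
        reply = self.send(self.system, self.turns + [user_msg])
        self.turns += [user_msg, reply]
        return reply


# Stand-in for `chat` so the sketch runs without an API call
def echo(system, turns):
    return f"reply {len(turns)}"


conv = Conversation("You are terse.", echo)
conv.ask("first question")
print(conv.ask("second question"))
```

With the real helper, this would be created as `Conversation(system_msg, chat)`. Note that the whole history is resent on every call, so long conversations use more tokens.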

## Why should you care about the API?

At this point, you know pretty much everything about how to use the OpenAI API to generate content with GPT. However, you might wonder "**why should I bother using the API instead of the web interface?**".

APIs are great for automation in data pipelines or inside software. Some possible data science uses of the API include:

- Pull in data (from a database, another API, or wherever), and ask GPT to summarize it or generate a report about it.
- Use the [linkedin-api-client](https://pypi.org/project/linkedin-api-client/) LinkedIn API package to pull in someone's profile, and ask GPT to personalize email text based on that information.
- Use the [scholarly](https://pypi.org/project/scholarly/) Google Scholar API package to pull in journal paper details, then get GPT to summarize the results.
- Embed the API in a dashboard to automatically provide a text summary of the results.
- Provide a natural language interface to your data mart.

## Step 5: Get Silicon Valley Bank stock data from Yahoo! Finance

Let's try an example of automatically analyzing some stock data. In this case, we'll look at Silicon Valley Bank (ticker `SIVB`) over the last month. The data is available from Yahoo! Finance, and can be imported into Python via the `yfinance` package.

To get recent stock history for the last `N` months, the code pattern is

```python
ticker = yf.Ticker("TICKERNAME")
ticker_history = ticker.history(period="Nmo")
```

In general, we should try to minimize the amount of data sent to the API (network traffic is slow), so we'll stick to looking at the `Close` column, which contains the stock price at the close of the day. Further, we'll round the prices to the nearest cent (2 decimal places).

- Create a Ticker object for `SIVB`. Assign to `sivb`.

- Get the stock history for SIVB for the period of 1 month (`"1mo"`). Assign to `sivb_history`.

- Select the `Close` column and round it to two decimal places. Assign to `sivb_close`.

In [70]:
# Create a Ticker object for SIVB
sivb = yf.Ticker("SIVB")

# Get the stock history for SIVB for the period of 1 month
sivb_history = sivb.history(period="1mo")
print(sivb_history)

# Select the Close column and round it to two decimal places
sivb_close = sivb_history[["Close"]].round(2)
sivb_close

Empty DataFrame
Columns: [Open, High, Low, Close, Adj Close, Volume]
Index: []


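`sivb_close` is likewise empty: it has a `Close` column indexed by `Date`, but no rows. `yfinance` returned no data for `SIVB` over the last month, most likely because the ticker was delisted after the bank's collapse in March 2023.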
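
An empty result like the one above still converts to a string, so the prompt would be sent without any data in it. As a hedged sketch, a small formatter could both keep the payload compact and refuse empty input; `closes_to_prompt_text` is a hypothetical helper that takes plain `(date, close)` pairs, which with pandas could come from `sivb_close["Close"].items()`:

```python
def closes_to_prompt_text(rows):
    """Format (date, close) pairs as compact lines, refusing empty input.

    Hypothetical helper: keeps only the date part and two decimals to
    limit what is sent to the API.
    """
    lines = [f"{str(date)[:10]} {close:.2f}" for date, close in rows]
    if not lines:
        raise ValueError("no price data to send; check the ticker and period")
    return "\n".join(lines)


print(closes_to_prompt_text([("2023-01-03", 148), ("2023-01-04", 153.456)]))
```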


## Step 6: Get GPT to write a financial report

Now that we have the data, we need to ask GPT to analyze it for us.

One thing that is useful to know is that you can convert a pandas dataframe into a string using the `.to_string()` method.

- Define a system message, `system_msg_sivb`, as:

> 'You are a financial data expert who writes tersely.'

- Define a user message, `user_msg_sivb`, as:

> '''The closing prices for the Silicon Valley Bank stock (ticker SIVB) are provided below. Provide Python code to analyze the data including the following metrics:
> 
> - The date of the highest closing price.
> - The date of the lowest closing price.
> - The date with the largest change from the previous closing price.
> 
> Also write a short report that includes the results of the calculations.
> 
> Here is the dataset:
> 
> '''

- Append `sivb_close`, converted to a string, to the user message.

- Get GPT to generate a response from `system_msg_sivb` and `user_msg_sivb`. Assign to `response_sivb`.

- Render the response as Markdown.

- Read the code that was generated. Does it look correct?

- Read the report that was generated. Does it look correct?

In [71]:
# Define a system message
system_msg_sivb = 'You are a financial data expert who writes tersely.'

# Define a user message (including the dataset)
user_msg_sivb = '''The closing prices for the Silicon Valley Bank stock (ticker SIVB) are provided below. Provide Python code to analyze the data including the following metrics:

- The date of the highest closing price.
- The date of the lowest closing price.
- The date with the largest change from the previous closing price.

Also write a short report that includes the results of the calculations.

Here is the dataset:

''' + sivb_close.to_string()

# Get GPT to generate a response
response_sivb = chat(system_msg_sivb, [user_msg_sivb])

# Render the response as Markdown
display(Markdown(response_sivb))

```python
import pandas as pd

# Sample data
data = {'Close': [100.5, 102.3, 99.7, 105.2, 97.8]}
df = pd.DataFrame(data)

# Calculate metrics
highest_closing_date = df.idxmax()['Close']
lowest_closing_date = df.idxmin()['Close']
largest_change_date = df['Close'].diff().idxmax()

# Report
print("Date of highest closing price:", highest_closing_date)
print("Date of lowest closing price:", lowest_closing_date)
print("Date with largest change from previous closing price:", largest_change_date)
```

**Report:**
- Date of highest closing price: Index
- Date of lowest closing price: Index
- Date with largest change from previous closing price: Index
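
One detail worth checking in the generated code: `diff().idxmax()` finds the largest increase, not the largest move in either direction. If "largest change" is meant as a magnitude, the absolute difference is needed (in pandas, `df['Close'].diff().abs().idxmax()`). A plain-Python sketch of the same idea, with `largest_abs_change` as a hypothetical helper:

```python
def largest_abs_change(rows):
    """Return (date, change) for the biggest day-over-day move, up or down.

    `rows` is a list of (date, close) pairs in chronological order.
    """
    changes = [
        (date, close - prev_close)
        for (_, prev_close), (date, close) in zip(rows, rows[1:])
    ]
    if not changes:
        raise ValueError("need at least two rows to compute a change")
    return max(changes, key=lambda item: abs(item[1]))


# The last day drops by 12, which a plain maximum of the differences would miss
rows = [("2023-01-01", 150), ("2023-01-02", 152), ("2023-01-03", 140)]
print(largest_abs_change(rows))
```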

In [72]:
import pandas as pd

# Sample data to create the DataFrame 'df'
data = {
    'Date': ['2023-01-01', '2023-01-02', '2023-01-03', '2023-01-04', '2023-01-05'],
    'Close': [150, 152, 148, 153, 149]
}
df = pd.DataFrame(data)
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)

# Find date of highest closing price
max_close_date = df['Close'].idxmax()

# Find date of lowest closing price
min_close_date = df['Close'].idxmin()

# Calculate change from previous day's closing price
df['Prev Close'] = df['Close'].shift(1)
df['Change'] = df['Close'] - df['Prev Close']
max_change_date = df['Change'].idxmax()

report = f"Analysis report:\n- Date of highest closing price: {max_close_date}\n- Date of lowest closing price: {min_close_date}\n- Date with largest change from previous closing price: {max_change_date}."
print(report)

Analysis report:
- Date of highest closing price: 2023-01-04 00:00:00
- Date of lowest closing price: 2023-01-03 00:00:00
- Date with largest change from previous closing price: 2023-01-04 00:00:00.


# Project Conclusion and Insights

In this project, we called the OpenAI API from Python to generate a dataset, continue a conversation, and request an analysis of stock closing prices. Because the SIVB download returned no rows, the final analysis ran on a small sample DataFrame of dates and closing prices rather than on real market data. The primary objective was to show how to analyze a stock's closing prices over a given period and extract the key dates.

## Key Steps and Findings

1. **Data Preparation**:
   - We initialized a DataFrame with sample data, ensuring that the 'Date' column was properly formatted as a datetime object and set as the index for easier time-series analysis.
   - This step is crucial for ensuring that the data is in a suitable format for subsequent analysis, allowing for efficient manipulation and querying of time-series data.

2. **Identifying Key Dates**:
   - **Highest Closing Price**: We identified the date with the highest closing price, which provides insight into peak market performance. This can be particularly useful for understanding market highs and potential resistance levels.
   - **Lowest Closing Price**: Similarly, we found the date with the lowest closing price, indicating the weakest market performance during the period. This helps in identifying market lows and potential support levels.

3. **Change Analysis**:
   - By calculating the change in closing prices from the previous day, we were able to determine the date with the largest change, highlighting significant market volatility. This analysis is essential for understanding daily market fluctuations and can be used to gauge market sentiment and investor reactions.

## Final Insights

- **Performance Overview**: The analysis pinpoints the days of significant activity within the period covered. Applied to real price data, this overview is valuable for both short-term traders and long-term investors.
- **Market Entry and Exit Points**: Understanding the dates of highest and lowest closing prices can help investors make informed decisions about market entry and exit points. By identifying these key dates, investors can better time their trades to maximize returns or minimize losses.
- **Volatility and Risk Assessment**: The change analysis is crucial for risk assessment, as it identifies periods of high volatility that may require strategic adjustments. Investors can use this information to adjust their portfolios, hedge against potential risks, or capitalize on volatile market conditions.

Overall, this project demonstrated the power of data analysis in financial markets, showcasing how historical data can be leveraged to gain insights and guide investment strategies. The workflow established here can be applied to larger datasets for more comprehensive analyses, enabling more robust and data-driven decision-making processes in the financial sector.