## Enriching stock market data using the OpenAI API

<p align="center">
    <img src="images/nasdaq100.png" width="450">
</p>

The Nasdaq-100 is a stock market index made up of 101 equity securities issued by 100 of the largest non-financial companies listed on the Nasdaq stock exchange. It helps investors compare stock prices with previous prices to determine market performance.

In this project you are provided with two CSV files containing Nasdaq-100 stock information:
- _**nasdaq100.csv**_: contains information about companies in the index such as symbol, name, etc.
- _**nasdaq100_price_change.csv**_: contains the price change per stock over several periods, such as one day, five days, one month, six months, and one year.

As an AI developer, you will leverage the OpenAI API to classify companies into sectors and produce a summary of sector and company performance for this year.

# CSV with Nasdaq-100 stock data

Two CSV files are available in this project: `nasdaq100.csv` and `nasdaq100_price_change.csv`. A sample of each is shown below.

## nasdaq100.csv

```csv
symbol,name,headQuarter,dateFirstAdded,cik,founded
AAPL,Apple Inc.,"Cupertino, CA",,0000320193,1976-04-01
ABNB,Airbnb,"San Francisco, CA",,0001559720,2008-08-01
ADBE,Adobe Inc.,"San Jose, CA",,0000796343,1982-12-01
ADI,Analog Devices,"Wilmington, MA",,0000006281,1965-01-01
...
```

## nasdaq100_price_change.csv

```csv
symbol,1D,5D,1M,3M,6M,ytd,1Y,3Y,5Y,10Y,max
AAPL,-1.7254,-8.30086,-6.20411,3.042,15.64824,42.99992,8.47941,60.96299,245.42031,976.99441,139245.53954
ABNB,2.1617,-2.21919,9.88336,19.43286,19.64241,68.66902,23.64013,-1.04347,-1.04347,-1.04347,-1.04347
ADBE,0.5409,-1.77817,9.16191,52.0465,38.01522,57.22723,21.96206,17.83037,109.05718,1024.69214,251030.66399
ADI,0.9291,-4.03352,2.58486,3.65887,5.01602,17.02062,8.09735,63.42847,92.81874,286.77518,26012.63736
...
```
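
Note that `cik` is stored with leading zeros (for example `0000320193`). Read with default settings, pandas infers an integer column and the zeros are lost. The sketch below shows one way to keep them, assuming `cik` should be treated as a text identifier rather than a number. The two rows are copied from the sample above, and the name `companies` is only illustrative.

```python
import io

import pandas as pd

# Two rows copied from the nasdaq100.csv sample above
sample = io.StringIO(
    "symbol,name,headQuarter,dateFirstAdded,cik,founded\n"
    'AAPL,Apple Inc.,"Cupertino, CA",,0000320193,1976-04-01\n'
    'ADI,Analog Devices,"Wilmington, MA",,0000006281,1965-01-01\n'
)

# Assumption: cik is an identifier, so read it as text to keep the leading zeros.
# parse_dates turns the "founded" column into real datetime values.
companies = pd.read_csv(sample, dtype={"cik": str}, parse_dates=["founded"])

print(companies["cik"].tolist())  # ['0000320193', '0000006281']
```

The same `dtype` and `parse_dates` arguments can be passed when reading the full file.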

## Before you start

In order to complete the project you will need to create a developer account with OpenAI and store your API key as an environment variable. Instructions for these steps are outlined below.

### Create a developer account with OpenAI

1. Go to the [API signup page](https://platform.openai.com/signup). 

2. Create your account (you'll need to provide your email address and your phone number).

<img src="images/openai-create-account.jpeg" width="200">

3. Go to the [API keys page](https://platform.openai.com/account/api-keys). 

4. Create a new secret key.

<img src="images/openai-new-secret-key.png" width="200">

5. **Take a copy of it**. (If you lose it, delete the key and create a new one.)

### Add a payment method

OpenAI sometimes provides free credits for the API, but it's not clear if that is worldwide or what the conditions are. You may need to add debit/credit card details. 

**The API costs [$0.002 / 1000 tokens](https://openai.com/pricing) for GPT-3.5-turbo. [1000 tokens is about 750 words](https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them). This project should cost less than 1 US cent (but if you rerun tasks, you will be charged every time).**
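
To sanity-check that estimate, the arithmetic can be sketched as follows. It uses only the two figures quoted above. The workload (101 requests of roughly 30 words each, prompt plus reply) is a hypothetical guess, not a measured value, and real token counts will differ.

```python
PRICE_PER_1K_TOKENS_USD = 0.002  # gpt-3.5-turbo rate quoted above
WORDS_PER_1K_TOKENS = 750        # rule of thumb quoted above


def estimate_cost_usd(total_words: float) -> float:
    """Convert a word count to tokens, then to an approximate cost in US dollars."""
    tokens = total_words / WORDS_PER_1K_TOKENS * 1000
    return tokens / 1000 * PRICE_PER_1K_TOKENS_USD


# Hypothetical workload: one short classification request per index constituent
words = 101 * 30
print(f"about ${estimate_cost_usd(words):.4f} for {words} words")
```

Under these assumptions the classification loop stays below one cent. Rerunning the notebook repeats the charge each time.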

1. Go to the [Payment Methods page](https://platform.openai.com/account/billing/payment-methods).

2. Click Add payment method.

<img src="images/openai-add-payment-method.png" width="200">

3. Fill in your card details.

### Add an environment variable with your OpenAI key

1. In DataLab, click on "Environment" in the menu.

2. Click on "Environment variables" to add environment variables.

3. In the "Name" field, type "OPENAI_API_KEY". In the "Value" field, paste in your secret key.

<img src="images/workspace-env-var-details.png" width="500">

4. Click "Create", and follow the instructions to copy the environment variable for use via the `os` library.

See [this article](https://datalab-docs.datacamp.com/work/environment-variables) for further guidance.
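
A missing variable otherwise surfaces as a bare `KeyError` when the client is created. A minimal sketch of a friendlier lookup is shown below. The helper name `load_api_key` is hypothetical and not part of DataLab or the `openai` package.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in the given environment variable, or fail with a clear message."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name!r} is not set. "
            "Add it under Environment variables and reconnect the session."
        )
    return key
```

The result can then be passed to the client, for example `OpenAI(api_key=load_api_key())`.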

In [17]:
# Start your code here!
import os
import pandas as pd
from openai import OpenAI

# Instantiate an API client
# If you named your environment variable differently,
# change "OPENAI_API_KEY" to reflect the variable name
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Continue coding here
# Use as many cells as you like

In [18]:
# Read in the two datasets
nasdaq100 = pd.read_csv("nasdaq100.csv")
price_change = pd.read_csv("nasdaq100_price_change.csv")

In [19]:
nasdaq100.columns

Index(['symbol', 'name', 'headQuarter', 'dateFirstAdded', 'cik', 'founded'], dtype='object')

In [20]:
# Add a "ytd" column to nasdaq100 with each company's year-to-date (YTD) performance
nasdaq100 = nasdaq100.merge(price_change[["symbol", "ytd"]], on="symbol", how="inner")

In [21]:
nasdaq100.columns

Index(['symbol', 'name', 'headQuarter', 'dateFirstAdded', 'cik', 'founded',
       'ytd'],
      dtype='object')
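
An inner merge silently drops any symbol that is missing from the price file, and duplicated symbols would multiply rows. The sketch below shows one way to surface both cases with the `validate` and `indicator` arguments of `DataFrame.merge`. The toy frames reuse values from the samples above, with ADBE deliberately left out of the price data to simulate a gap.

```python
import pandas as pd

companies = pd.DataFrame(
    {"symbol": ["AAPL", "ABNB", "ADBE"], "name": ["Apple Inc.", "Airbnb", "Adobe Inc."]}
)
# ADBE is omitted on purpose to simulate a symbol with no price data
changes = pd.DataFrame({"symbol": ["AAPL", "ABNB"], "ytd": [42.99992, 68.66902]})

merged = companies.merge(
    changes,
    on="symbol",
    how="left",             # keep every company, even without a match
    validate="one_to_one",  # raise MergeError if a symbol repeats on either side
    indicator=True,         # adds a "_merge" column describing each row's origin
)

missing = merged.loc[merged["_merge"] == "left_only", "symbol"].tolist()
print(missing)  # ['ADBE']
```

If `missing` is empty on the real data, the inner merge used in the notebook loses nothing.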

In [22]:
# Preview the combined dataset
nasdaq100.head()

|   | symbol | name | headQuarter | dateFirstAdded | cik | founded | ytd |
|---|--------|------|-------------|----------------|-----|---------|-----|
| 0 | AAPL | Apple Inc. | Cupertino, CA | | 320193 | 1976-04-01 | 42.99992 |
| 1 | ABNB | Airbnb | San Francisco, CA | | 1559720 | 2008-08-01 | 68.66902 |
| 2 | ADBE | Adobe Inc. | San Jose, CA | | 796343 | 1982-12-01 | 57.22723 |
| 3 | ADI | Analog Devices | Wilmington, MA | | 6281 | 1965-01-01 | 17.02062 |
| 4 | ADP | ADP | Roseland, NJ | | 8670 | 1949-01-01 | 5.53732 |


In [23]:
# Use the OpenAI API to classify each stock into a sector, saving the result
# as a new column in the nasdaq100 DataFrame called "Sector" with one of these values:
# Technology, Consumer Cyclical, Industrials, Utilities, Healthcare, Communication,
# Energy, Consumer Defensive, Real Estate, or Financial.

# Loop through the NASDAQ companies
for company in nasdaq100["symbol"]:
    # Create a prompt to enrich nasdaq100 using OpenAI
    prompt = f'''Classify company {company} into one of the following sectors. Answer only with the sector name: Technology, Consumer Cyclical, Industrials, Utilities, Healthcare, Communication, Energy, Consumer Defensive, Real Estate, Financial.
'''
    # Create a request to the chat completions endpoint
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{ "role": "user", "content": prompt}],
        temperature=0.0,
    )
    # Store the output as a variable called sector
    sector = response.choices[0].message.content
    
    # Add the sector for the corresponding company
    nasdaq100.loc[nasdaq100["symbol"] == company, "Sector"] = sector
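
The loop stores whatever text the model returns, so a reply with extra punctuation, different casing, or a label outside the list would end up in the column unchanged. A minimal sketch of a check that could run before the assignment is shown below. The helper name `normalize_sector` is hypothetical, and the allowed labels are copied from the prompt.

```python
from typing import Optional

# Labels copied from the classification prompt
ALLOWED_SECTORS = [
    "Technology", "Consumer Cyclical", "Industrials", "Utilities", "Healthcare",
    "Communication", "Energy", "Consumer Defensive", "Real Estate", "Financial",
]
_LOOKUP = {label.casefold(): label for label in ALLOWED_SECTORS}


def normalize_sector(reply: str) -> Optional[str]:
    """Map a model reply to its canonical label, or None if it is not in the list."""
    cleaned = reply.strip().strip(".\"'").strip()
    return _LOOKUP.get(cleaned.casefold())


print(normalize_sector(" technology.\n"))          # Technology
print(normalize_sector("Consumer Discretionary"))  # None
```

Rows that come back as `None` can be retried or reviewed by hand instead of being trusted as-is.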

In [24]:
nasdaq100.head()

|   | symbol | name | headQuarter | dateFirstAdded | cik | founded | ytd | Sector |
|---|--------|------|-------------|----------------|-----|---------|-----|--------|
| 0 | AAPL | Apple Inc. | Cupertino, CA | | 320193 | 1976-04-01 | 42.99992 | Technology |
| 1 | ABNB | Airbnb | San Francisco, CA | | 1559720 | 2008-08-01 | 68.66902 | Real Estate |
| 2 | ADBE | Adobe Inc. | San Jose, CA | | 796343 | 1982-12-01 | 57.22723 | Technology |
| 3 | ADI | Analog Devices | Wilmington, MA | | 6281 | 1965-01-01 | 17.02062 | Technology |
| 4 | ADP | ADP | Roseland, NJ | | 8670 | 1949-01-01 | 5.53732 | Technology |


In [25]:
# Count the number of companies in each sector
nasdaq100["Sector"].value_counts()

Technology            52
Healthcare            15
Consumer Cyclical      7
Financial              6
Industrials            6
Communication          5
Consumer Defensive     5
Utilities              3
Real Estate            2
Name: Sector, dtype: int64
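
Once every company has a sector, the sector-level figures can also be computed directly in pandas, which gives numbers to compare against any model-written summary. The sketch below uses only the five rows shown in the preview above, so its output is illustrative and does not describe the full index.

```python
import pandas as pd

# The five rows shown in the nasdaq100.head() preview
toy = pd.DataFrame(
    {
        "symbol": ["AAPL", "ABNB", "ADBE", "ADI", "ADP"],
        "Sector": ["Technology", "Real Estate", "Technology", "Technology", "Technology"],
        "ytd": [42.99992, 68.66902, 57.22723, 17.02062, 5.53732],
    }
)

# Mean YTD change and company count per sector, best sector first
sector_summary = (
    toy.groupby("Sector")["ytd"]
    .agg(["mean", "count"])
    .sort_values("mean", ascending=False)
)

# Up to three best-performing companies within each sector
top_by_sector = toy.sort_values("ytd", ascending=False).groupby("Sector").head(3)

print(sector_summary)
print(top_by_sector[["Sector", "symbol", "ytd"]])
```

Running the same two statements on the full `nasdaq100` DataFrame would rank all sectors by mean YTD change.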

In [26]:
# Use the OpenAI API to provide summary information about Nasdaq-100 stock performance YTD, recommending the three best sectors and three or more companies per sector, storing as a variable called stock_recommendations.

# Prompt to get stock recommendations
prompt = f'''Provide summary information about Nasdaq-100 stock performance year to date (YTD), recommending the three best sectors and three or more companies per sector.
Company data: {nasdaq100}
'''

In [27]:
# Get the model response
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.0,
)

In [28]:
# Store the output as a variable and print the recommendations
stock_recommendations = response.choices[0].message.content
print(stock_recommendations)

The Nasdaq-100 stock performance year to date (YTD) has been positive overall, with many companies showing significant gains. The three best sectors based on YTD performance are Technology, Real Estate, and Consumer Discretionary.

Three top companies in the Technology sector are:
1. Apple Inc. (AAPL) - YTD performance: 42.99992%
2. Adobe Inc. (ADBE) - YTD performance: 57.22723%
3. Zoom Video Communications (ZM) - YTD performance: 3.00030%

Three top companies in the Real Estate sector are:
1. Airbnb (ABNB) - YTD performance: 68.66902%
2. Zillow Group (Z) - YTD performance: 45.67890%
3. Redfin Corporation (RDFN) - YTD performance: 33.45678%

Three top companies in the Consumer Discretionary sector are:
1. Amazon.com Inc. (AMZN) - YTD performance: 25.67890%
2. Tesla Inc. (TSLA) - YTD performance: 40.56789%
3. Nike Inc. (NKE) - YTD performance: 18.98765%
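
This summary should be read with care. It names a sector, Consumer Discretionary, that is not among the ten labels used in the classification step. One likely contributor is that `{nasdaq100}` inserts the DataFrame's printed form into the prompt, and pandas shortens that form to the first and last few rows once a frame exceeds `display.max_rows` (60 by default), so the model probably saw only a fraction of the data. The sketch below demonstrates the truncation on a synthetic 101-row frame and shows one possible alternative that sends every row as CSV. The `SYM###` symbols and the `build_prompt` helper are hypothetical.

```python
import pandas as pd

# Synthetic stand-in for nasdaq100 with 101 hypothetical rows
frame = pd.DataFrame(
    {
        "symbol": [f"SYM{i:03d}" for i in range(101)],
        "Sector": ["Technology"] * 101,
        "ytd": [float(i) for i in range(101)],
    }
)

truncated = f"{frame}"            # what an f-string inserts: the shortened printed form
full = frame.to_csv(index=False)  # every row, one per line


def build_prompt(df: pd.DataFrame) -> str:
    """Build a summary prompt that includes all rows and restricts the answer to them."""
    return (
        "Provide summary information about Nasdaq-100 stock performance year to date (YTD), "
        "recommending the three best sectors and three or more companies per sector. "
        "Use only the companies and sectors listed in the data.\n"
        f"Company data (CSV):\n{df.to_csv(index=False)}"
    )


# A row from the middle of the frame is missing from the printed form but present in the CSV
print("SYM050" in truncated, "SYM050" in full)
```

Selecting only the columns the model needs (for example `symbol`, `name`, `Sector`, and `ytd`) before calling `to_csv` keeps the prompt short. Checking the returned tickers against `nasdaq100["symbol"]` is a simple guard against invented entries.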
