## Getting data from APIs
### Introduction
##### APIs (Application Programming Interfaces) let one program request data from another in a well-defined format. In this project, we'll use an API to pull stock market data directly from a web server. Raw data can be hard to work with, so we'll then transform it into a clean, structured, and manageable format.

In [1]:
# Install the pydantic-settings package (uncomment the line below if needed)
#!pip install pydantic-settings

In [2]:
# Import required libraries
import pandas as pd
import requests

### Accessing APIs Through a URL
##### APIs such as <b><a href="https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=IBM&apikey=demo">AlphaVantage</a></b> make it simple to access real-time and historical stock market data. Opening a URL like the one linked above sends a request for specific information, in this case the daily time series for IBM stock. The query parameters after the <b>?</b>, separated by <b>&</b>, specify the function (<b>TIME_SERIES_DAILY</b>), the stock symbol (<b>IBM</b>), and the API key (<b>demo</b>). The server processes the request and returns the data in a structured format, typically JSON, which we can then parse and transform for analysis.
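To see how that URL breaks down into its parts, Python's standard-library `urllib.parse` module can split it apart. This is a small illustration only, not part of the project code:

```python
from urllib.parse import urlsplit, parse_qs

url = "https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=IBM&apikey=demo"

# Split the URL into scheme, host, path, and query string
parts = urlsplit(url)
print(parts.netloc)  # www.alphavantage.co
print(parts.path)    # /query

# parse_qs maps each parameter name to a list of its values
params = parse_qs(parts.query)
print(params)
# {'function': ['TIME_SERIES_DAILY'], 'symbol': ['IBM'], 'apikey': ['demo']}
```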

In [3]:
url = (
    "https://www.alphavantage.co/query?"
    "function=TIME_SERIES_DAILY&"
    "symbol=IBM&"
    "apikey=demo"
)

print("url type:", type(url))
url

url type: <class 'str'>


'https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=IBM&apikey=demo'

### Import Config Module
##### Import the <b>settings</b> object from the <b>config</b> module. Keeping configuration values such as the API key and the database name in one place keeps them consistent across the application and out of the notebook code itself.

In [9]:
from config import settings

# Use `dir` to list attributes (in a notebook, only the last expression in a cell is displayed)
dir(settings)
settings.model_directory
settings.db_name

'stocks.sqlite'

### URL With Parameters
##### Refer to the AlphaVantage Time Series Daily API documentation and update your URL to include all listed parameters. Make the URL dynamic by defining variables for parameters like <b>ticker</b>, <b>output_size</b>, and <b>data_type</b>.
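The f-string approach below works well. As an alternative, a dictionary of parameters can be turned into a query string with `urllib.parse.urlencode`, which also percent-escapes any unsafe characters. A minimal sketch: `build_daily_url` is a hypothetical helper, and the `"demo"` default key is only there so the example runs; in the project you would pass `settings.alpha_api_key`.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.alphavantage.co/query"

def build_daily_url(ticker, output_size="compact", data_type="json", api_key="demo"):
    """Hypothetical helper: build a TIME_SERIES_DAILY URL from named parameters."""
    params = {
        "function": "TIME_SERIES_DAILY",
        "symbol": ticker,
        "outputsize": output_size,
        "datatype": data_type,
        "apikey": api_key,
    }
    # urlencode joins key=value pairs with "&" and escapes special characters
    return f"{BASE_URL}?{urlencode(params)}"

print(build_daily_url("AMBUJACEM.BSE"))
# https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=AMBUJACEM.BSE&outputsize=compact&datatype=json&apikey=demo
```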

In [5]:
ticker = "AMBUJACEM.BSE"
output_size = "compact"
data_type = "json"

url = (
    "https://www.alphavantage.co/query?"
    "function=TIME_SERIES_DAILY&"
    f"symbol={ticker}&"
    f"outputsize={output_size}&"
    f"datatype={data_type}&"
    f"apikey={settings.alpha_api_key}"
)

print("url type:", type(url))
url

url type: <class 'str'>


'https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=AMBUJACEM.BSE&outputsize=compact&datatype=json&apikey=<your-api-key>'

## Accessing APIs Through a Request
#### We've seen how to access the AlphaVantage API by opening a URL in a browser, but that won't work for the application we're building in this project: programs don't click links. Instead, they access APIs by sending HTTP <b>requests</b>.

### HTTP Request
##### Use the <b>requests.get</b> function to send a GET request to the URL you created in the previous task. Assign the response to the variable <b>response</b>.
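Instead of assembling the query string by hand, `requests` can build it from a `params` dictionary. The sketch below prepares the request without sending it, so the resulting URL can be inspected offline; the `"demo"` key mirrors the URL shown earlier.

```python
import requests

params = {
    "function": "TIME_SERIES_DAILY",
    "symbol": "IBM",
    "apikey": "demo",
}

# Build (but do not send) the GET request so we can inspect the final URL
prepared = requests.Request(
    "GET", "https://www.alphavantage.co/query", params=params
).prepare()
print(prepared.url)
# https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=IBM&apikey=demo

# Sending it for real (network access required) would look like this;
# `timeout` stops the call from hanging forever if the server is unresponsive:
# response = requests.get("https://www.alphavantage.co/query", params=params, timeout=30)
```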

In [6]:
response = requests.get(url=url)

print("response type:", type(response))

response type: <class 'requests.models.Response'>


##### That tells us what type of object we got back, but not what it contains. To see which attributes and methods the response offers, we can use Python's built-in <b>dir</b> function.

In [15]:
dir(response)

['__attrs__',
 '__bool__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__enter__',
 '__eq__',
 '__exit__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__nonzero__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_content',
 '_content_consumed',
 '_next',
 'apparent_encoding',
 'close',
 'connection',
 'content',
 'cookies',
 'elapsed',
 'encoding',
 'headers',
 'history',
 'is_permanent_redirect',
 'is_redirect',
 'iter_content',
 'iter_lines',
 'json',
 'links',
 'next',
 'ok',
 'raise_for_status',
 'raw',
 'reason',
 'request',
 'status_code',
 'text',
 'url']

##### <b>dir</b> returns a list, and as you can see, there are lots of possibilities here! For now, let's focus on two attributes: <b>status_code</b> and <b>text</b>.

##### We'll start with <b>status_code</b>. Every HTTP response includes an <a href="https://en.wikipedia.org/wiki/List_of_HTTP_status_codes"><b>HTTP status code</b></a>, which is stored in the response's <b>status_code</b> attribute. Let's see what ours is.

### Response Status Code
##### Assign the status code for your <b>response</b> to the variable <b>response_code</b>.

In [7]:
response_code = response.status_code

print("code type:", type(response_code))
response_code

code type: <class 'int'>


200

##### Translated to English, <b>200</b> means "OK". It's the standard response for a successful HTTP request. In other words, it worked! We successfully received data back from the AlphaVantage API.
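Python's standard library can do this translation for us: `http.HTTPStatus` maps each numeric code to its standard reason phrase. A small sketch; `describe_status` is a hypothetical helper, and `HTTPStatus(code)` raises `ValueError` for codes it doesn't know.

```python
from http import HTTPStatus

def describe_status(code):
    """Hypothetical helper: translate an HTTP status code into a readable description."""
    phrase = HTTPStatus(code).phrase  # e.g. 200 -> "OK"
    if 200 <= code < 300:
        kind = "success"
    elif 400 <= code < 500:
        kind = "client error"
    elif 500 <= code < 600:
        kind = "server error"
    else:
        kind = "other"
    return f"{code} {phrase} ({kind})"

for code in (200, 404, 429, 500):
    print(describe_status(code))
# 200 OK (success)
# 404 Not Found (client error)
# 429 Too Many Requests (client error)
# 500 Internal Server Error (server error)
```

For a live response, `requests` offers the same idea through `response.ok` (`False` for 4xx and 5xx codes) and `response.raise_for_status()` (raises `requests.HTTPError` for 4xx and 5xx codes); both appear in the `dir(response)` listing above.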

##### Now let's take a look at the <b>text</b>.

### Response Text
##### Assign the text for your <b>response</b> to the variable <b>response_text</b>.

In [8]:
response_text = response.text

print("response_text type:", type(response_text))
print(response_text[:200])

response_text type: <class 'str'>
{
    "Meta Data": {
        "1. Information": "Daily Prices (open, high, low, close) and Volumes",
        "2. Symbol": "AMBUJACEM.BSE",
        "3. Last Refreshed": "2025-01-21",
        "4. Output 


### Response JSON
##### Use the <b>json</b> method to parse the response body into a Python dictionary. Assign it to the variable name <b>response_data</b>.

In [10]:
response_data = response.json()

print("response_data type:", type(response_data))

response_data type: <class 'dict'>
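`response.json()` parses the response body as JSON, much like calling `json.loads(response.text)` yourself. The sketch below uses a hand-written, abbreviated stand-in for `response.text` (values copied from the outputs in this notebook) to show the nested structure. Note that the prices arrive as strings, which is why we'll need to convert data types later.

```python
import json

# Hand-written, abbreviated stand-in for `response.text`
sample_text = """
{
    "Meta Data": {"2. Symbol": "AMBUJACEM.BSE", "3. Last Refreshed": "2025-01-21"},
    "Time Series (Daily)": {
        "2025-01-21": {"1. open": "536.9500", "2. high": "548.5000",
                       "3. low": "530.5500", "4. close": "531.6000",
                       "5. volume": "41152"}
    }
}
"""

data = json.loads(sample_text)  # JSON objects become Python dicts
print(type(data))               # <class 'dict'>
print(list(data))               # ['Meta Data', 'Time Series (Daily)']

# Values are still strings at this point
print(data["Time Series (Daily)"]["2025-01-21"]["4. close"])  # 531.6000
```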


#### Let's check to make sure that the data is structured in the same way we saw in our browser.

### Response Data Keys
##### Print the keys of <b>response_data</b>. Are they what you expected?

In [11]:
# Print `response_data` keys
response_data.keys()

dict_keys(['Meta Data', 'Time Series (Daily)'])

#### Now let's look at data that's assigned to the <b>"Time Series (Daily)"</b> key.

### Stock Data
##### Assign the value for the <b>"Time Series (Daily)"</b> key to the variable <b>stock_data</b>. Then examine the data for one of the days in <b>stock_data</b>.

In [12]:
# Extract `"Time Series (Daily)"` value from `response_data`
stock_data = response_data["Time Series (Daily)"]

print("stock_data type:", type(stock_data))

# Extract data for one of the days in `stock_data`
stock_data.keys()
stock_data['2025-01-21']

stock_data type: <class 'dict'>


{'1. open': '536.9500',
 '2. high': '548.5000',
 '3. low': '530.5500',
 '4. close': '531.6000',
 '5. volume': '41152'}
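Hard-coding `'2025-01-21'` only works while that date is part of the response; the compact output covers the latest 100 trading days, so it will eventually roll off. Because the keys are ISO 8601 dates (`YYYY-MM-DD`), they sort correctly as plain strings, so `max` finds the most recent day. A sketch on a tiny stand-in for `stock_data`, using closing prices shown in this notebook:

```python
# Tiny stand-in for `stock_data`, keyed by date string
stock_data = {
    "2025-01-20": {"4. close": "534.4500"},
    "2025-01-21": {"4. close": "531.6000"},
    "2025-01-17": {"4. close": "536.1000"},
}

# ISO dates compare correctly as strings, so max() gives the latest day
latest = max(stock_data)
print(latest)                                 # 2025-01-21
print(float(stock_data[latest]["4. close"]))  # 531.6
```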

#### Now that we know how the data is organized when we extract it from the API, let's transform it into a DataFrame to make it more manageable.

### Stock Data to DataFrame
##### Read the data from <b>stock_data</b> into a DataFrame named <b>df_ambuja</b>. Be sure all your data types are correct!

In [13]:
df_ambuja = pd.DataFrame.from_dict(stock_data, orient="index", dtype=float)

print("df_ambuja shape:", df_ambuja.shape)
print()
df_ambuja.info()
df_ambuja.head()

df_ambuja shape: (100, 5)

<class 'pandas.core.frame.DataFrame'>
Index: 100 entries, 2025-01-21 to 2024-08-28
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   1. open    100 non-null    float64
 1   2. high    100 non-null    float64
 2   3. low     100 non-null    float64
 3   4. close   100 non-null    float64
 4   5. volume  100 non-null    float64
dtypes: float64(5)
memory usage: 4.7+ KB


| | 1. open | 2. high | 3. low | 4. close | 5. volume |
|---|---|---|---|---|---|
| 2025-01-21 | 536.95 | 548.5 | 530.55 | 531.6 | 41152.0 |
| 2025-01-20 | 537.4 | 539.0 | 530.1 | 534.45 | 18346.0 |
| 2025-01-17 | 544.7 | 544.7 | 530.25 | 536.1 | 23100.0 |
| 2025-01-16 | 530.0 | 542.9 | 529.0 | 539.8 | 37244.0 |
| 2025-01-15 | 517.0 | 525.0 | 513.0 | 519.25 | 38040.0 |


#### All in all, this looks pretty good, but there are two problems: the dates in the index are still strings rather than datetime values, and the column headers carry numeric prefixes such as <b>"1. "</b>.

##### Transform the index of <b>df_ambuja</b> into a <b>DatetimeIndex</b> named <b>"date"</b>, and remove the numbering from the column names of <b>df_ambuja</b>.

In [14]:
# Convert `df_ambuja` index to `DatetimeIndex`
df_ambuja.index = pd.to_datetime(df_ambuja.index)

# Name index "date"
df_ambuja.index.name = "date"

# Remove numbering from `df_ambuja` column names
df_ambuja.columns = [c.split(". ")[1] for c in df_ambuja.columns]

df_ambuja.info()
df_ambuja.head()


<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 100 entries, 2025-01-21 to 2024-08-28
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   open    100 non-null    float64
 1   high    100 non-null    float64
 2   low     100 non-null    float64
 3   close   100 non-null    float64
 4   volume  100 non-null    float64
dtypes: float64(5)
memory usage: 4.7 KB


| date | open | high | low | close | volume |
|---|---|---|---|---|---|
| 2025-01-21 | 536.95 | 548.5 | 530.55 | 531.6 | 41152.0 |
| 2025-01-20 | 537.4 | 539.0 | 530.1 | 534.45 | 18346.0 |
| 2025-01-17 | 544.7 | 544.7 | 530.25 | 536.1 | 23100.0 |
| 2025-01-16 | 530.0 | 542.9 | 529.0 | 539.8 | 37244.0 |
| 2025-01-15 | 517.0 | 525.0 | 513.0 | 519.25 | 38040.0 |


#### Note that the rows in <b>df_ambuja</b> are sorted in <b>descending</b> order, with the most recent date at the top. This will work to our advantage when we store and retrieve the data from our application database, but we'll need to sort it in <b>ascending</b> order before we can use it to train a model.
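When that time comes, sorting is a one-liner with `sort_index`. A small sketch on a three-row frame built from closing prices shown above:

```python
import pandas as pd

# Descending dates, as returned by the API
df = pd.DataFrame(
    {"close": [531.60, 534.45, 536.10]},
    index=pd.to_datetime(["2025-01-21", "2025-01-20", "2025-01-17"]),
)
df.index.name = "date"
print(df.index.is_monotonic_decreasing)  # True

# sort_index returns a new DataFrame ordered oldest -> newest
df_sorted = df.sort_index(ascending=True)
print(df_sorted.index.is_monotonic_increasing)  # True
print(df_sorted.index[0].date())                # 2025-01-17
```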

## Defensive Programming
#### Defensive programming is the practice of writing code that anticipates things going wrong and handles them gracefully, either by recovering or by failing with a clear, informative error. We'll never be able to foresee all the problems people might run into with our code, but we can take steps to make sure things don't fall apart whenever one of those problems happens.

#### So far, we've made API requests where everything works. But coding errors and problems with servers are common, and they can cause big issues in a data science project. Let's see how our <b>response</b> changes when we introduce common bugs in our code.

##### Let's formalize our extraction and transformation process for the AlphaVantage API into a reproducible function.

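##### Build a <b>get_daily</b> function that gets data from the AlphaVantage API and returns a clean DataFrame. Note that this function sends its requests to the WQU data-services endpoint (<b>learn-api.wqu.edu</b>) rather than directly to <b>www.alphavantage.co</b>; the query parameters are the same.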

In [15]:
def get_daily(ticker, output_size="full"):

    # Create URL
    url = (
        "https://learn-api.wqu.edu/1/data-services/alpha-vantage/query?"
        "function=TIME_SERIES_DAILY&"
        f"symbol={ticker}&"
        f"outputsize={output_size}&"
        "datatype=json&"
        f"apikey={settings.alpha_api_key}"
    )

    # Send request to API
    response = requests.get(url)

    # Extract JSON data from response
    response_data = response.json()

    # Read data into DataFrame
    stock_data = response_data["Time Series (Daily)"]
    df = pd.DataFrame.from_dict(stock_data, orient="index", dtype=float)

    # Convert index to `DatetimeIndex` named "date"
    df.index = pd.to_datetime(df.index)
    df.index.name = "date"

    # Remove numbering from columns
    df.columns = [c.split(". ")[1] for c in df.columns]

    # Return DataFrame
    return df
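One possible design choice, not the approach used in this notebook, is to split `get_daily` into a network step and a pure parsing step. The parsing step can then be tested with a hand-written dictionary, without calling the API. A sketch, where `parse_daily` is a hypothetical name and the sample row reuses IBM values shown below:

```python
import pandas as pd

def parse_daily(response_data):
    """Hypothetical helper: turn a TIME_SERIES_DAILY payload (a dict) into a clean DataFrame."""
    stock_data = response_data["Time Series (Daily)"]
    df = pd.DataFrame.from_dict(stock_data, orient="index", dtype=float)
    df.index = pd.to_datetime(df.index)
    df.index.name = "date"
    # Drop the "1. ", "2. ", ... prefixes; maxsplit=1 keeps any later ". " intact
    df.columns = [c.split(". ", 1)[1] for c in df.columns]
    return df

# get_daily would then reduce to: build URL -> requests.get(...) -> parse_daily(response.json())
sample = {
    "Time Series (Daily)": {
        "2025-01-21": {"1. open": "224.9900", "2. high": "227.4500", "3. low": "222.8302",
                       "4. close": "224.2600", "5. volume": "3982203"},
    }
}
df = parse_daily(sample)
print(df.shape)            # (1, 5)
print(list(df.columns))    # ['open', 'high', 'low', 'close', 'volume']
print(df["close"].iloc[0]) # 224.26
```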

In [16]:
# Test your function
df_ambuja = get_daily(ticker="AMBUJACEM.BSE", output_size="compact")
df_ambuja.head()

| date | open | high | low | close | volume |
|---|---|---|---|---|---|
| 2025-01-21 | 536.95 | 548.5 | 530.55 | 531.6 | 41152.0 |
| 2025-01-20 | 537.4 | 539.0 | 530.1 | 534.45 | 18346.0 |
| 2025-01-17 | 544.7 | 544.7 | 530.25 | 536.1 | 23100.0 |
| 2025-01-16 | 530.0 | 542.9 | 529.0 | 539.8 | 37244.0 |
| 2025-01-15 | 517.0 | 525.0 | 513.0 | 519.25 | 38040.0 |


In [17]:
df_ibm = get_daily(ticker="IBM", output_size="compact")
df_ibm.head()

| date | open | high | low | close | volume |
|---|---|---|---|---|---|
| 2025-01-21 | 224.99 | 227.45 | 222.8302 | 224.26 | 3982203.0 |
| 2025-01-17 | 225.955 | 225.955 | 223.64 | 224.79 | 5506837.0 |
| 2025-01-16 | 219.69 | 222.68 | 217.38 | 222.66 | 3329060.0 |
| 2025-01-15 | 220.87 | 221.6761 | 218.01 | 220.03 | 2951825.0 |
| 2025-01-14 | 218.0 | 218.125 | 214.61 | 217.75 | 3485829.0 |


In [18]:
df_ibm = get_daily(ticker="IBM", output_size="full")
df_ibm.info()
df_ibm.head()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 6344 entries, 2025-01-21 to 1999-11-01
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   open    6344 non-null   float64
 1   high    6344 non-null   float64
 2   low     6344 non-null   float64
 3   close   6344 non-null   float64
 4   volume  6344 non-null   float64
dtypes: float64(5)
memory usage: 297.4 KB


| date | open | high | low | close | volume |
|---|---|---|---|---|---|
| 2025-01-21 | 224.99 | 227.45 | 222.8302 | 224.26 | 3982203.0 |
| 2025-01-17 | 225.955 | 225.955 | 223.64 | 224.79 | 5506837.0 |
| 2025-01-16 | 219.69 | 222.68 | 217.38 | 222.66 | 3329060.0 |
| 2025-01-15 | 220.87 | 221.6761 | 218.01 | 220.03 | 2951825.0 |
| 2025-01-14 | 218.0 | 218.125 | 214.61 | 217.75 | 3485829.0 |


#### How does this function deal with the two bugs we've explored in this section? Our first error, a bad URL, is something we don't need to worry about: the base URL is hard-coded, so whatever the user passes in, the request always goes to a valid address. But see what happens when the user inputs a bad ticker symbol. What's the error message? Would it help the user locate their mistake?
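Without a check, a payload that lacks the `"Time Series (Daily)"` key fails with a bare `KeyError` that says nothing about the ticker. The sketch below reproduces that with a hypothetical error payload (the exact keys and message AlphaVantage returns for a bad symbol are an assumption here), then shows one alternative to an `if` check: catching the `KeyError` and re-raising a more descriptive `ValueError`. `extract_series` is a hypothetical helper.

```python
# Hypothetical payload for an unknown ticker: a dict WITHOUT the
# "Time Series (Daily)" key (the exact contents are an assumption)
bad_response_data = {"Error Message": "Invalid API call."}

# What the first version of get_daily effectively does:
try:
    bad_response_data["Time Series (Daily)"]
except KeyError as e:
    print(f"{type(e).__name__}: {e}")  # KeyError: 'Time Series (Daily)'

# Alternative to an `if` check: catch the KeyError and re-raise with context
def extract_series(response_data, ticker):
    try:
        return response_data["Time Series (Daily)"]
    except KeyError:
        raise ValueError(
            f"Invalid API call. Check that ticker symbol '{ticker}' is correct."
        ) from None  # hide the less helpful KeyError traceback

try:
    extract_series(bad_response_data, "ABUJACEM.BSE")
except ValueError as e:
    print(e)  # Invalid API call. Check that ticker symbol 'ABUJACEM.BSE' is correct.
```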

##### Add an <b>if</b> clause to your <b>get_daily</b> function so that it raises an <b>Exception</b> when a user supplies a bad ticker symbol. Be sure the error message is informative.

In [19]:
def get_daily(ticker, output_size="full"):

    # Create URL
    url = (
        "https://learn-api.wqu.edu/1/data-services/alpha-vantage/query?"
        "function=TIME_SERIES_DAILY&"
        f"symbol={ticker}&"
        f"outputsize={output_size}&"
        "datatype=json&"
        f"apikey={settings.alpha_api_key}"
    )

    # Send request to API
    response = requests.get(url)

    # Extract JSON data from response
    response_data = response.json()
    if "Time Series (Daily)" not in response_data:
        raise Exception(f"Invalid API call. Check that ticker symbol '{ticker}' is correct.")

    # Read data into DataFrame
    stock_data = response_data["Time Series (Daily)"]
    df = pd.DataFrame.from_dict(stock_data, orient="index", dtype=float)

    # Convert index to `DatetimeIndex` named "date"
    df.index = pd.to_datetime(df.index)
    df.index.name = "date"

    # Remove numbering from columns
    df.columns = [c.split(". ")[1] for c in df.columns]

    # Return DataFrame
    return df

In [20]:
# Test with a valid ticker (should return data)
df_test = get_daily(ticker="AMBUJACEM.BSE")
df_test.shape

(4939, 5)

In [23]:
# Test with another valid ticker (should return data)
df_test = get_daily(ticker="IBM")
df_test.shape

(6344, 5)

In [21]:
# Test with an invalid ticker (should raise the Exception)
df_test = get_daily(ticker="ABUJACEM.BSE")
df_test.shape

Exception: Invalid API call. Check that ticker symbol 'ABUJACEM.BSE' is correct.

#### Alright! We now have all the tools we need to get the data for our project. In the next lesson, we'll make our AlphaVantage code more reusable by creating a <b>data</b> module with class definitions. We'll also create the code we need to store and read this data from our application database.

## End