## Getting data from APIs

For this project, I want to demonstrate the power of APIs and, specifically, as mentioned in the email, how we could automate the registration of learners on Nemis so that users cannot enter wrong student details.

So, besides having a direct database connection with collaborating organizations such as the MoH, which requires centralizing the data in the cloud, we could use an API. I'm going to get our data from a web server using an API.

In [1]:
# Import Libraries
import pandas as pd
import requests


### Accessing APIs Through a URL

https://www.health.go.ke/query?function=BIRTH_CERT_DATA&symbol=BCN&apikey=demo

Since we don't have access to an MoH query endpoint or API, I'm going to use AlphaVantage to get stock information instead, and build my own API URL for a different stock.

https://www.alphavantage.co/


Using the URL above as a model, I will create a new URL to get the data for Ambuja Cement. The ticker symbol for this company is "AMBUJACEM.BSE", as obtained from AlphaVantage.

In [2]:
url = (
    "https://www.alphavantage.co/query?"
    "function=TIME_SERIES_DAILY&"
    "symbol=AMBUJACEM.BSE&"
    "apikey=UINBALTLU0E4ZY67"
)

print("url type:", type(url))
url

url type: <class 'str'>


'https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=AMBUJACEM.BSE&apikey=UINBALTLU0E4ZY67'

I created my personal API key with Alphavantage.co.

An API key is information that should be kept secret, so it's a bad idea to include it in our application code. When it comes to sensitive information like this, developers and data scientists typically store it as an environment variable that's kept in a .env file.
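As a minimal sketch of that idea, the snippet below reads the key from an environment variable instead of hard-coding it. The variable name `ALPHA_API_KEY` and the helper `get_api_key` are hypothetical, chosen only for illustration. It assumes that something (for example a .env loader) has already placed the key in the environment.

```python
import os


def get_api_key(var_name="ALPHA_API_KEY"):
    """Return the API key stored in an environment variable.

    `ALPHA_API_KEY` is a hypothetical variable name used for illustration.
    """
    key = os.environ.get(var_name)
    if not key:
        # Fail early with a clear message instead of sending a bad request
        raise RuntimeError(f"Environment variable {var_name!r} is not set.")
    return key


# Simulate a value that would normally come from the .env file
os.environ["ALPHA_API_KEY"] = "demo"
print(get_api_key())
```

With a helper like this, the key never has to appear in the notebook itself.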

Going through the documentation for the AlphaVantage Time Series Daily API, let's expand our URL to incorporate the parameters listed there. Also, to make our URL more dynamic, we are going to create variables for the parameters that can be added to the URL.

In [3]:
ticker = "AMBUJACEM.BSE"
output_size = "compact"
data_type = "json"

url = (
    "https://www.alphavantage.co/query?"
    "function=TIME_SERIES_DAILY&"
    f"symbol={ticker}&"
    f"outputsize={output_size}&"
    f"datatype={data_type}&"
    "apikey=UINBALTLU0E4ZY67"
)

print("url type:", type(url))
url

url type: <class 'str'>


'https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=AMBUJACEM.BSE&outputsize=compact&datatype=json&apikey=UINBALTLU0E4ZY67'
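Gluing the query string together by hand works, but it is easy to drop an `&` or leave in a stray space. As a hedged alternative, the sketch below lets the standard library's `urlencode` assemble the parameters. The helper name `build_url` is hypothetical, and `"demo"` is a placeholder rather than a real key.

```python
from urllib.parse import urlencode


def build_url(ticker, output_size="compact", data_type="json", api_key="demo"):
    """Assemble a TIME_SERIES_DAILY URL from its parameters.

    `build_url` is an illustrative helper; "demo" is a placeholder key.
    """
    params = {
        "function": "TIME_SERIES_DAILY",
        "symbol": ticker,
        "outputsize": output_size,
        "datatype": data_type,
        "apikey": api_key,
    }
    # urlencode joins the pairs with "&" and escapes unsafe characters
    return "https://www.alphavantage.co/query?" + urlencode(params)


print(build_url("AMBUJACEM.BSE"))
```

The parameters appear in the URL in the same order as in the dictionary, so the result has the same shape as the hand-built string.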

### Accessing APIs Through a Request
We've seen how to access the AlphaVantage API by clicking on a URL, but this won't work for the application we're building in this project because only humans click URLs. Computer programs access APIs by making requests. Let's build our first request using the URL we created in the previous task.

Let's use the requests library to make a GET request to the URL we created in the previous step, and assign the response to the variable response.

In [4]:
response = requests.get(url=url)
print("response type:", type(response))

response type: <class 'requests.models.Response'>


Let's use the built-in dir function to see what attributes and methods response has.

In [5]:
# Use `dir` on your `response`
dir(response)

['__attrs__',
 '__bool__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__enter__',
 '__eq__',
 '__exit__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__lt__',
 '__module__',
 '__ne__',
 '__new__',
 '__nonzero__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 '__weakref__',
 '_content',
 '_content_consumed',
 '_next',
 'apparent_encoding',
 'close',
 'connection',
 'content',
 'cookies',
 'elapsed',
 'encoding',
 'headers',
 'history',
 'is_permanent_redirect',
 'is_redirect',
 'iter_content',
 'iter_lines',
 'json',
 'links',
 'next',
 'ok',
 'raise_for_status',
 'raw',
 'reason',
 'request',
 'status_code',
 'text',
 'url']

dir returns a list, and, as you can see, there are lots of possibilities here! For now, let's focus on two attributes: status_code and text.
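Most of the names in a `dir` listing start with an underscore and are meant for internal use. A small sketch of one way to keep only the public names follows. The helper `public_names` and the class `Example` are hypothetical stand-ins, so the snippet runs without making a request.

```python
def public_names(obj):
    """Return the names from dir(obj) that don't start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


class Example:
    """Stand-in object used only for this illustration."""

    status_code = 200

    def json(self):
        return {}


print(public_names(Example()))
```

Calling `public_names(response)` on a real response would leave only entries such as `status_code`, `text`, and `json`.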

We'll start with status_code. Every time you make a call to a URL, the response includes an HTTP status code, which can be accessed with the status_code attribute. Both status_code and text are commonly used to handle and process the results of HTTP requests.

status_code: This attribute holds the three-digit HTTP status code that the server returns in response to a client's request. The code indicates whether the request was successful, encountered an error, or requires further action. Common status codes include 200 (OK), 404 (Not Found), and 500 (Internal Server Error). Developers use the status code to decide what to do next: a 200 typically indicates success, while a 404 means that the requested resource was not found.

text: This attribute holds the response body returned by the server, decoded as a string. The body can be in various formats, such as plain text, HTML, JSON, or XML. Developers parse this text to pull out relevant information, display it to users, or pass it on for further processing within their applications.


Therefore, let's assign the status code for our response to the variable response_code.

In [5]:
response_code = response.status_code

print("code type:", type(response_code))
response_code

code type: <class 'int'>


200

Translated to English, 200 means "OK". It's the standard response for a successful HTTP request. In other words, it worked! We successfully received data back from the AlphaVantage API.
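To look up what other codes mean, Python's standard library ships the `http.HTTPStatus` enumeration. The short sketch below translates a few codes into their standard phrases; `describe_status` is a hypothetical helper name.

```python
from http import HTTPStatus


def describe_status(code):
    """Return the numeric code together with its standard reason phrase."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


for code in (200, 404, 500):
    print(describe_status(code))
```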

Now let's take a look at the text.

Assign the text for your response to the variable response_text.

In [7]:
response_text = response.text

print("response_text type:", type(response_text))
print(response_text[:200])

response_text type: <class 'str'>
{
    "Meta Data": {
        "1. Information": "Daily Prices (open, high, low, close) and Volumes",
        "2. Symbol": "AMBUJACEM.BSE",
        "3. Last Refreshed": "2023-08-11",
        "4. Output 


Let's use the json method to access a dictionary version of the data, assigning it to the variable response_data.

In [14]:
response_data = response.json()

print("response_data type:", type(response_data))

response_data type: <class 'dict'>


Let's print the keys of response_data. Are they what we expected?

In [15]:
# Print `response_data` keys
response_data.keys()

dict_keys(['Meta Data', 'Time Series (Daily)'])
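Conceptually, the json method parses the response text into Python objects. The sketch below mimics that step with the standard library's `json.loads`. The variable `sample_text` is an abbreviated, hand-written stand-in for the real response text, with an empty time series.

```python
import json

# Abbreviated sample of the response text; the "Meta Data" values are copied
# from the output shown earlier, and the time series is left empty for brevity
sample_text = """
{
    "Meta Data": {
        "1. Information": "Daily Prices (open, high, low, close) and Volumes",
        "2. Symbol": "AMBUJACEM.BSE",
        "3. Last Refreshed": "2023-08-11"
    },
    "Time Series (Daily)": {}
}
"""

# Parse the JSON string into nested Python dictionaries
sample_data = json.loads(sample_text)

print(type(sample_text).__name__, "->", type(sample_data).__name__)
print(list(sample_data.keys()))
```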

Now let's look at data that's assigned to the "Time Series (Daily)" key.

Let's assign the value for the "Time Series (Daily)" key to the variable stock_data, then examine the data for one of the days in stock_data.

In [22]:
# Extract `"Time Series (Daily)"` value from `response_data`
stock_data = response_data["Time Series (Daily)"]

print("stock_data type:", type(stock_data))

# Extract data for one of the days in `stock_data`

stock_data['2023-08-17']

stock_data type: <class 'dict'>


{'1. open': '437.0000',
 '2. high': '446.8000',
 '3. low': '433.9000',
 '4. close': '445.7000',
 '5. volume': '159490'}
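Notice that every value in this record is a string, not a number. As a small illustration of why the data types need attention later, the sketch below converts one day's record to floats using plain Python. The record is copied from the output above, and the variable names are chosen only for this example.

```python
# One day's record, copied from the output above (all values are strings)
day = {
    "1. open": "437.0000",
    "2. high": "446.8000",
    "3. low": "433.9000",
    "4. close": "445.7000",
    "5. volume": "159490",
}

# Convert every value to a float so that arithmetic works as expected
numeric_day = {key: float(value) for key, value in day.items()}

# The daily trading range only makes sense once the values are numeric
daily_range = numeric_day["2. high"] - numeric_day["3. low"]
print(numeric_day)
print("range:", round(daily_range, 2))
```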

Now that we know how the data is organized when we extract it from the API, let's transform it into a DataFrame to make it more manageable.

Task 8.1.13: Read the data from stock_data into a DataFrame named df_ambuja. Be sure all your data types are correct!

In [11]:
df_ambuja = pd.DataFrame.from_dict(stock_data, orient="index", dtype=float)

print("df_ambuja shape:", df_ambuja.shape)
print()
print(df_ambuja.info())
df_ambuja.head(10)

df_ambuja shape: (100, 5)

<class 'pandas.core.frame.DataFrame'>
Index: 100 entries, 2023-08-11 to 2023-03-17
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   1. open    100 non-null    float64
 1   2. high    100 non-null    float64
 2   3. low     100 non-null    float64
 3   4. close   100 non-null    float64
 4   5. volume  100 non-null    float64
dtypes: float64(5)
memory usage: 4.7+ KB
None


|  | 1. open | 2. high | 3. low | 4. close | 5. volume |
| --- | --- | --- | --- | --- | --- |
| 2023-08-11 | 456.85 | 463.4 | 454.05 | 456.5 | 83491.0 |
| 2023-08-10 | 462.05 | 465.85 | 454.85 | 456.85 | 199093.0 |
| 2023-08-09 | 464.25 | 466.65 | 460.7 | 462.9 | 60893.0 |
| 2023-08-08 | 473.05 | 476.0 | 456.55 | 466.0 | 163483.0 |
| 2023-08-07 | 472.25 | 476.0 | 468.85 | 474.0 | 80191.0 |
| 2023-08-04 | 474.25 | 477.35 | 470.4 | 471.6 | 115736.0 |
| 2023-08-03 | 464.25 | 480.9 | 458.6 | 474.2 | 787931.0 |
| 2023-08-02 | 460.65 | 469.0 | 449.7 | 460.95 | 451281.0 |
| 2023-08-01 | 463.3 | 468.0 | 452.65 | 461.6 | 125907.0 |
| 2023-07-31 | 454.05 | 468.6 | 454.05 | 462.75 | 186688.0 |


Did you notice that the index for df_ambuja doesn't have an entry for all days? Given that this is stock market data, why do you think that is?
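One way to explore that question is to list the calendar days that have no entry and check which weekdays they fall on. The sketch below does this with the standard library for the ten dates shown in the output above; the variable names are illustrative. Public holidays could presumably cause similar gaps, but none appear in this short window.

```python
from datetime import date, timedelta

# Dates copied from the first ten rows of the output above
trading_days = [
    "2023-08-11", "2023-08-10", "2023-08-09", "2023-08-08", "2023-08-07",
    "2023-08-04", "2023-08-03", "2023-08-02", "2023-08-01", "2023-07-31",
]
observed = {date.fromisoformat(d) for d in trading_days}

# date.weekday() returns 0 for Monday through 6 for Sunday
DAY_NAMES = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]

# Walk every calendar day in the range and keep those with no data
missing = []
current = min(observed)
while current <= max(observed):
    if current not in observed:
        missing.append(current)
    current += timedelta(days=1)

for d in missing:
    print(d.isoformat(), DAY_NAMES[d.weekday()])
```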

All in all, this looks pretty good, but there are a couple of problems: the data type of the dates, and the format of the headers. Let's fix the dates first. Right now, the dates are strings; in order to make the rest of our code work, we'll need to create a proper DatetimeIndex.

Task 8.1.14: Transform the index of df_ambuja into a DatetimeIndex with the name "date"

In [12]:
# Convert `df_ambuja` index to `DatetimeIndex`
df_ambuja.index=pd.to_datetime(df_ambuja.index)

# Name index "date"
df_ambuja.index.name="date"

print(df_ambuja.info())
df_ambuja.head()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 100 entries, 2023-08-11 to 2023-03-17
Data columns (total 5 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   1. open    100 non-null    float64
 1   2. high    100 non-null    float64
 2   3. low     100 non-null    float64
 3   4. close   100 non-null    float64
 4   5. volume  100 non-null    float64
dtypes: float64(5)
memory usage: 4.7 KB
None


| date | 1. open | 2. high | 3. low | 4. close | 5. volume |
| --- | --- | --- | --- | --- | --- |
| 2023-08-11 | 456.85 | 463.4 | 454.05 | 456.5 | 83491.0 |
| 2023-08-10 | 462.05 | 465.85 | 454.85 | 456.85 | 199093.0 |
| 2023-08-09 | 464.25 | 466.65 | 460.7 | 462.9 | 60893.0 |
| 2023-08-08 | 473.05 | 476.0 | 456.55 | 466.0 | 163483.0 |
| 2023-08-07 | 472.25 | 476.0 | 468.85 | 474.0 | 80191.0 |


Note that the rows in df_ambuja are sorted descending, with the most recent date at the top. This will work to our advantage when we store and retrieve the data from our application database, but we'll need to sort it ascending before we can use it to train a model.

Okay! Now that the dates are fixed, let's deal with the headers. There isn't really anything wrong with them, but those numbers make them look a little unfinished. Let's get rid of them.

Task 8.1.15: Remove the numbering from the column names for df_ambuja.

In [13]:
# Remove numbering from `df_ambuja` column names
df_ambuja.columns = [c.split(". ")[1] for c in df_ambuja.columns]

print(df_ambuja.info())
df_ambuja.head()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 100 entries, 2023-08-11 to 2023-03-17
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   open    100 non-null    float64
 1   high    100 non-null    float64
 2   low     100 non-null    float64
 3   close   100 non-null    float64
 4   volume  100 non-null    float64
dtypes: float64(5)
memory usage: 4.7 KB
None


| date | open | high | low | close | volume |
| --- | --- | --- | --- | --- | --- |
| 2023-08-11 | 456.85 | 463.4 | 454.05 | 456.5 | 83491.0 |
| 2023-08-10 | 462.05 | 465.85 | 454.85 | 456.85 | 199093.0 |
| 2023-08-09 | 464.25 | 466.65 | 460.7 | 462.9 | 60893.0 |
| 2023-08-08 | 473.05 | 476.0 | 456.55 | 466.0 | 163483.0 |
| 2023-08-07 | 472.25 | 476.0 | 468.85 | 474.0 | 80191.0 |
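The renaming step relies on every column name containing ". ". As a hedged variation, the sketch below splits only once and keeps the last piece, so a name that has no numeric prefix passes through unchanged instead of raising an IndexError. The helper `strip_numbering` is a hypothetical name, and the list of names stands in for the DataFrame's columns.

```python
# Column names as they arrive from the API
raw_columns = ["1. open", "2. high", "3. low", "4. close", "5. volume"]


def strip_numbering(name):
    """Drop a leading "N. " prefix if present, otherwise return name as-is."""
    # Split at most once on ". " and keep the last piece
    return name.split(". ", maxsplit=1)[-1]


clean_columns = [strip_numbering(c) for c in raw_columns]
print(clean_columns)

# A name without a prefix is left untouched
print(strip_numbering("close"))
```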


### Defensive Programming
Defensive programming is the practice of writing code which will continue to function, even if something goes wrong. We'll never be able to foresee all the problems people might run into with our code, but we can take steps to make sure things don't fall apart whenever one of those problems happens.

So far, we've made API requests where everything works. But coding errors and problems with servers are common, and they can cause big issues in a data science project. Let's see how our response changes when we introduce common bugs in our code.

Task 8.1.16: Return to Task 8.1.5 and change the first part of your URL. Instead of "query", use "search" (a path that doesn't exist). Then rerun your code for all the tasks that follow. What changes? What stays the same?

We know what happens when we try to access a bad address. But what about when we access the right path with a bad ticker symbol?

Task 8.1.17: Return to Task 8.1.5 and change the ticker symbol from "AMBUJACEM.BSE" to "RAMBUJACEM.BSE" (a company that doesn't exist). Then rerun your code for all the tasks that follow. Again, take note of what changes and what stays the same.

Let's formalize our extraction and transformation process for the AlphaVantage API into a reproducible function.

Task 8.1.18: Build a get_daily function that gets data from the AlphaVantage API and returns a clean DataFrame. Use the docstring as guidance. When you're satisfied with the result, submit your work to the grader.

In [14]:
def get_daily(ticker, output_size="full"):

    """Get daily time series of an equity from AlphaVantage API.

    Parameters
    ----------
    ticker : str
        The ticker symbol of the equity.
    output_size : str, optional
        Number of observations to retrieve. "compact" returns the
        latest 100 observations. "full" returns all observations for
        equity. By default "full".

    Returns
    -------
    pd.DataFrame
        Columns are 'open', 'high', 'low', 'close', and 'volume'.
        All are numeric.
    """
    # Create URL (8.1.5)
    url = (
        "https://www.alphavantage.co/query?"
        "function=TIME_SERIES_DAILY&"
        f"symbol={ticker}&"
        f"outputsize={output_size}&"
        "datatype=json&"
        "apikey=UINBALTLU0E4ZY67"
    )

    # Send request to API (8.1.6)
    response = requests.get(url=url)

    # Extract JSON data from response (8.1.10)
    response_data = response.json()

    # Raise an informative error if the expected key is missing,
    # for example because of a bad ticker symbol (8.1.19)
    if "Time Series (Daily)" not in response_data:
        raise Exception(
            f"Invalid API call. Check the ticker symbol'{ticker}' is correct."
        )

    # Read data into DataFrame (8.1.12 & 8.1.13)
    stock_data = response_data["Time Series (Daily)"]
    df = pd.DataFrame.from_dict(stock_data, orient="index", dtype=float)

    # Convert index to `DatetimeIndex` named "date" (8.1.14)
    df.index = pd.to_datetime(df.index)
    df.index.name = "date"

    # Remove numbering from columns (8.1.15)
    df.columns = [c.split(". ")[1] for c in df.columns]

    # Return DataFrame
    return df

In [15]:
# Test your function
df_ambuja = get_daily(ticker="AMBUJACEM.BSE", output_size="compact")

print(df_ambuja.info())
df_ambuja.head()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 100 entries, 2023-08-11 to 2023-03-17
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   open    100 non-null    float64
 1   high    100 non-null    float64
 2   low     100 non-null    float64
 3   close   100 non-null    float64
 4   volume  100 non-null    float64
dtypes: float64(5)
memory usage: 4.7 KB
None


| date | open | high | low | close | volume |
| --- | --- | --- | --- | --- | --- |
| 2023-08-11 | 456.85 | 463.4 | 454.05 | 456.5 | 83491.0 |
| 2023-08-10 | 462.05 | 465.85 | 454.85 | 456.85 | 199093.0 |
| 2023-08-09 | 464.25 | 466.65 | 460.7 | 462.9 | 60893.0 |
| 2023-08-08 | 473.05 | 476.0 | 456.55 | 466.0 | 163483.0 |
| 2023-08-07 | 472.25 | 476.0 | 468.85 | 474.0 | 80191.0 |


How does this function deal with the two bugs we've explored in this section? Our first error, a bad URL, is something we don't need to worry about. No matter what the user inputs into this function, the URL will always be correct. But see what happens when the user inputs a bad ticker symbol. What's the error message? Would it help the user locate their mistake?

Task 8.1.19: Add an if clause to your get_daily function so that it throws an Exception when a user supplies a bad ticker symbol. Be sure the error message is informative.

In [16]:
# Test your Exception
df_test = get_daily(ticker="ABUJACEM.BSE")

Exception: Invalid API call. Check the ticker symbol'ABUJACEM.BSE' is correct.
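The same kind of guard can be tried without calling the API at all. The sketch below applies it to plain dictionaries. The helper `extract_daily`, the choice of `ValueError`, and both sample payloads are assumptions made for illustration. In particular, the contents of `bad_payload` are a made-up stand-in, not the API's actual error format.

```python
def extract_daily(payload):
    """Return the daily time series, or raise if the key is missing."""
    key = "Time Series (Daily)"
    if key not in payload:
        # Report which keys did arrive, to help the user locate the mistake
        raise ValueError(
            f"Expected key {key!r} not found. Keys received: {sorted(payload)}"
        )
    return payload[key]


# Hand-written payloads: one well-formed, one standing in for a failed call
good_payload = {
    "Meta Data": {"2. Symbol": "AMBUJACEM.BSE"},
    "Time Series (Daily)": {"2023-08-17": {"4. close": "445.7000"}},
}
bad_payload = {"Error Message": "hypothetical error text"}

print(list(extract_daily(good_payload)))

try:
    extract_daily(bad_payload)
except ValueError as error:
    # The caller can log the problem and carry on instead of crashing
    print("Handled:", error)
```

Catching the exception at the call site lets an application report the problem and continue with other tickers.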

Alright! We now have all the tools we need to get the data for our project. In the next lesson, we'll make our AlphaVantage code more reusable by creating a data module with class definitions. We'll also create the code we need to store and read this data from our application database.