8.1. Getting data from APIs

You can't build a model without data, right? In previous projects, we've worked with data stored in files (like a CSV) or databases (like MongoDB or SQL). In this project, we're going to get our data from a web server using an API. So in this lesson, we'll learn what an API is and how to extract data from one. We'll also work on transforming our data into a manageable format. Let's get to it!

import pandas as pd
import requests
import wqet_grader
from IPython.display import VimeoVideo

wqet_grader.init("Project 8 Assessment")
VimeoVideo("762464407", h="9da2e7b9bc", width=600)
Accessing APIs Through a URL
In this lesson, we'll extract stock market information from the AlphaVantage API. To get a sense of how an API works, consider the URL below. Take a moment to read the text of the link itself, then click on it and examine the data that appears in your browser. What's the format of the data? What data is included? How is it organized?

VimeoVideo("762464423", h="dc6e027e19", width=600)
https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=IBM&apikey=demo

Notice that this URL has several components. Let's break them down one by one.

URL Component	Description
https://www.alphavantage.co	This is the hostname or base URL. It is the web address for the server where we can get our stock data.
/query	This is the path. Most APIs have lots of different operations they can do. The path is the name of the particular operation we want to access.
?	This question mark denotes that everything that follows in the URL is a parameter. Each parameter is separated by a & character. These parameters provide additional information that will change the operation's behavior. This is similar to the way we pass arguments into functions in Python.
function=TIME_SERIES_DAILY	Our first parameter uses the function keyword. The value is TIME_SERIES_DAILY. In this case, we're asking for daily stock data.
symbol=IBM	Our second parameter uses the symbol keyword. So we're asking for data on a stock whose ticker symbol is IBM.
apikey=demo	Much in the same way you need a password to access some websites, an API key or API token is the password that you'll use to access the API.
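The same breakdown can be reproduced in code. Here's a minimal sketch that uses Python's standard-library urllib.parse module (not something this lesson relies on) to split the demo URL into the components listed in the table above. The variable names are chosen just for illustration.

```python
from urllib.parse import urlsplit, parse_qs

demo_url = (
    "https://www.alphavantage.co/query?"
    "function=TIME_SERIES_DAILY&symbol=IBM&apikey=demo"
)

# Split the URL into scheme, hostname, path, and query string
parts = urlsplit(demo_url)
base_url = f"{parts.scheme}://{parts.netloc}"
path = parts.path

# `parse_qs` maps each parameter name to a list of values,
# so keep the first (and only) value for each parameter
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

print("base URL:", base_url)
print("path:", path)
print("parameters:", params)
```

Each key in params corresponds to one of the parameter rows in the table.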
Now that we have a sense of the components of a URL that gets information from AlphaVantage, let's create our own for a different stock.

VimeoVideo("762464444", h="c9d35e670c", width=600)
Task 8.1.1: Using the URL above as a model, create a new URL to get the data for Ambuja Cement. The ticker symbol for this company is: "AMBUJACEM.BSE".

What's a web API?
url = (
    "https://www.alphavantage.co/query?"
    "function=TIME_SERIES_DAILY&"
    "symbol=AMBUJACEM.BSE&"
    "apikey=demo"
)

print("url type:", type(url))
url
url type: <class 'str'>
'https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&symbol=AMBUJACEM.BSE&apikey=demo'
Oh no! A problem. It looks like we need our own API key to access the data. Fortunately, WQU provides you with one in your profile settings.

As you can imagine, an API key is information that should be kept secret, so it's a bad idea to include it in our application code. When it comes to sensitive information like this, developers and data scientists store it as an environment variable that's kept in a .env file.

VimeoVideo("762464465", h="27845ecce0", width=600)
Tip: If you can't see your .env file, go to the View menu and select Show Hidden Files.

Task 8.1.2: Get your API key and save it in your .env file.

What's an API key?
What's an environment variable?
Now that we've stored our API key, we need to import it into our code base. This is commonly done by creating a config module.
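This lesson doesn't show what's inside the config module, so here's a minimal sketch of one way such a module could work, using only the standard library. The environment variable names ALPHA_API_KEY and MODEL_DIRECTORY, the Settings class, and the load_settings helper are all assumptions made for illustration; only the attribute names alpha_api_key and model_directory match the settings object used in this lesson.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for values read from the environment."""

    alpha_api_key: str
    model_directory: str


def load_settings(environ=os.environ):
    """Build a Settings object from a mapping of environment variables.

    The variable names below are assumptions, not the lesson's actual names.
    """
    return Settings(
        alpha_api_key=environ.get("ALPHA_API_KEY", ""),
        model_directory=environ.get("MODEL_DIRECTORY", "models"),
    )


# Pass a plain dictionary instead of the real environment to try it out
demo_settings = load_settings({"ALPHA_API_KEY": "demo"})
print(demo_settings.alpha_api_key)
print(demo_settings.model_directory)
```

Because the key is read at run time, it never has to appear in the application code itself.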

VimeoVideo("762464478", h="b567b82417", width=600)
Task 8.1.3: Import the settings variable from the config module. Then use the dir command to see what attributes it has.

# Import settings
from config import settings

# Use `dir` to list attributes
dir(settings)
settings.model_directory
settings.alpha_api_key
'b78b35579f1e0187d891ad670a84fc4fab1119d9170aa856f7443a44a3295dea43d2eb0217480fa92d94021d626dabf90c9e2ef7c6c0f17f813c6101b6b759d4d5d93bb588a34284e7fb3872b90d6e1cbbf450ad09b535e1428d1bd6c16f41764aa35b7abc750ac603473e406ef448212fe1b910eaa98f62d11c927729edd283'
Beautiful! We have an API key. Since the key comes from WQU, we'll need to use a different base URL to get data from AlphaVantage. Let's see if we can get our new URL for Ambuja Cement working.

VimeoVideo("762464501", h="0d93900843", width=600)
Task 8.1.4: Create a new URL for "AMBUJACEM.BSE". This time, use the base URL "https://learn-api.wqu.edu/1/data-services/alpha-vantage/query?" and incorporate your API key.

What's an f-string?
url = (
    "https://learn-api.wqu.edu/1/data-services/alpha-vantage/query?"
    "function=TIME_SERIES_DAILY&"
    "symbol=AMBUJACEM.BSE&"
    f"apikey={settings.alpha_api_key}"
)

print("url type:", type(url))
url
url type: <class 'str'>
'https://learn-api.wqu.edu/1/data-services/alpha-vantage/query?function=TIME_SERIES_DAILY&symbol=AMBUJACEM.BSE&apikey=b78b35579f1e0187d891ad670a84fc4fab1119d9170aa856f7443a44a3295dea43d2eb0217480fa92d94021d626dabf90c9e2ef7c6c0f17f813c6101b6b759d4d5d93bb588a34284e7fb3872b90d6e1cbbf450ad09b535e1428d1bd6c16f41764aa35b7abc750ac603473e406ef448212fe1b910eaa98f62d11c927729edd283'
It's working! Turns out there are a lot more parameters. Let's build up our URL to include them.

VimeoVideo("762464518", h="34d8d0a0fd", width=600)
Task 8.1.5: Go to the documentation for the AlphaVantage Time Series Daily API. Expand your URL to incorporate all the parameters listed in the documentation. Also, to make your URL more dynamic, create variable names for all the parameters that can be added to the URL.

What's an f-string?
ticker = "AMBUJACEM.BSE"
output_size = "compact"
data_type = "json"

url = (
    "https://learn-api.wqu.edu/1/data-services/alpha-vantage/query?"
    "function=TIME_SERIES_DAILY&"
    f"symbol={ticker}&"
    f"outputsize={output_size}&"
    f"datatype={data_type}&"
    f"apikey={settings.alpha_api_key}"
)

print("url type:", type(url))
url
url type: <class 'str'>
'https://learn-api.wqu.edu/1/data-services/alpha-vantage/query?function=TIME_SERIES_DAILY&symbol=AMBUJACEM.BSE&outputsize=compact&datatype=json&apikey=b78b35579f1e0187d891ad670a84fc4fab1119d9170aa856f7443a44a3295dea43d2eb0217480fa92d94021d626dabf90c9e2ef7c6c0f17f813c6101b6b759d4d5d93bb588a34284e7fb3872b90d6e1cbbf450ad09b535e1428d1bd6c16f41764aa35b7abc750ac603473e406ef448212fe1b910eaa98f62d11c927729edd283'
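As the number of parameters grows, joining them by hand gets error-prone. As an alternative to the f-string approach above, here's a minimal sketch that keeps the parameters in a dictionary and lets the standard-library function urllib.parse.urlencode join them. The key "demo" is a placeholder; in the lesson, the real key comes from settings.alpha_api_key.

```python
from urllib.parse import urlencode

base_url = "https://learn-api.wqu.edu/1/data-services/alpha-vantage/query"

# One dictionary entry per URL parameter
params = {
    "function": "TIME_SERIES_DAILY",
    "symbol": "AMBUJACEM.BSE",
    "outputsize": "compact",
    "datatype": "json",
    "apikey": "demo",  # placeholder, not a real key
}

# `urlencode` joins the pairs with "&" and escapes special characters
url_from_dict = f"{base_url}?{urlencode(params)}"
print(url_from_dict)
```

The result has the same shape as the URL built with f-strings, so either approach works for the tasks that follow.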
Accessing APIs Through a Request
We've seen how to access the AlphaVantage API by clicking on a URL, but this won't work for the application we're building in this project because only humans click URLs. Computer programs access APIs by making requests. Let's build our first request using the URL we created in the previous task.

VimeoVideo("762464549", h="24e94d3560", width=600)
Task 8.1.6: Use the requests library to make a get request to the URL you created in the previous task. Assign the response to the variable response.

What's an HTTP request?
Make an HTTP request using requests.
response = requests.get(url=url)

print("response type:", type(response))
response type: <class 'requests.models.Response'>
That tells us what kind of response we've gotten, but it doesn't tell us anything about what it means. If we want to find out what kinds of data are actually in the response, we'll need to use the dir command.

VimeoVideo("762464578", h="a2dd6d0361", width=600)
Task 8.1.7: Use the dir command to see what attributes and methods response has.

What's a class attribute?
What's a class method?
# Use `dir` on your `response`
dir(response)
dir returns a list, and, as you can see, there are lots of possibilities here! For now, let's focus on two attributes: status_code and text.
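Since dir returns a plain list of strings, it can be filtered like any other list. Here's a small sketch that hides the names starting with an underscore so the public attributes and methods are easier to spot. The Example class is a made-up stand-in, not the real Response class.

```python
class Example:
    """Tiny stand-in class (hypothetical) used only to demonstrate `dir`."""

    status_code = 200

    def json(self):
        return {}


obj = Example()

# Keep only the names that don't start with an underscore
public_names = [name for name in dir(obj) if not name.startswith("_")]
print(public_names)
```

The same list comprehension should work on response if you want a shorter list to read through.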

We'll start with status_code. Every time you make a call to a URL, the response includes an HTTP status code, which can be accessed with the status_code attribute. Let's see what ours is.

VimeoVideo("762464598", h="c10c6e4186", width=600)
Task 8.1.8: Assign the status code for your response to the variable response_code.

What's a status code?
response_code = response.status_code

print("code type:", type(response_code))
response_code
code type: <class 'int'>
200
Translated to English, 200 means "OK". It's the standard response for a successful HTTP request. In other words, it worked! We successfully received data back from the AlphaVantage API.
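If you'd like to translate other status codes, Python's standard library can do it for you. Here's a minimal sketch using http.HTTPStatus; the describe_status helper is a name invented for this example.

```python
from http import HTTPStatus


def describe_status(code):
    """Return the standard reason phrase for an HTTP status code."""
    return HTTPStatus(code).phrase


for code in (200, 404, 500):
    print(code, describe_status(code))
```

Codes in the 200s signal success, while codes in the 400s and 500s signal client and server errors, respectively.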

Now let's take a look at the text.

VimeoVideo("762464606", h="d3d7dcc1bb", width=600)
Task 8.1.9: Assign the text for your response to the variable response_text.

response_text = response.text

print("response_text type:", type(response_text))
print(response_text[:200])
response_text type: <class 'str'>
{
    "Meta Data": {
        "1. Information": "Daily Prices (open, high, low, close) and Volumes",
        "2. Symbol": "AMBUJACEM.BSE",
        "3. Last Refreshed": "2025-01-27",
        "4. Output 
This string looks like the data we previously saw in our browser when we clicked on the URL in Task 8.1.5. But we can't work with data structured as JSON when it's a string. Instead, we need it in a dictionary.
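To see the difference between JSON as a string and JSON as a dictionary, here's a small sketch using the standard-library json module. The sample string is a shortened version of the response text above, trimmed by hand for illustration.

```python
import json

# A JSON document stored as a plain string
sample_text = '{"Meta Data": {"2. Symbol": "AMBUJACEM.BSE"}}'

# `json.loads` parses the string into nested Python dictionaries
sample_data = json.loads(sample_text)

print(type(sample_text))
print(type(sample_data))
print(sample_data["Meta Data"]["2. Symbol"])
```

Once the data is a dictionary, values can be looked up by key instead of by searching through text.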

VimeoVideo("762464628", h="2758875cfe", width=600)
Task 8.1.10: Use the json method to access a dictionary version of the data. Assign it to the variable name response_data.

What's JSON?
response_data = response.json()

print("response_data type:", type(response_data))
response_data type: <class 'dict'>
Let's check to make sure that the data is structured in the same way we saw in our browser.

VimeoVideo("762464643", h="a972b7a34b", width=600)
Task 8.1.11: Print the keys of response_data. Are they what you expected?

List the keys of a dictionary in Python.
# Print `response_data` keys
response_data.keys()
dict_keys(['Meta Data', 'Time Series (Daily)'])
Now let's look at data that's assigned to the "Time Series (Daily)" key.

VimeoVideo("762464662", h="41b72e3308", width=600)
Task 8.1.12: Assign the value for the "Time Series (Daily)" key to the variable stock_data. Then examine the data for one of the days in stock_data.

List the keys of a dictionary in Python.
Access an entry in a dictionary in Python.
# Extract `"Time Series (Daily)"` value from `response_data`
stock_data = response_data["Time Series (Daily)"]

print("stock_data type:", type(stock_data))

# Extract data for one of the days in `stock_data`
stock_data.keys()
stock_data['2025-01-27']
stock_data type: <class 'dict'>
{'1. open': '551.2500',
 '2. high': '551.8000',
 '3. low': '533.0000',
 '4. close': '534.6500',
 '5. volume': '122349'}
Now that we know how the data is organized when we extract it from the API, let's transform it into a DataFrame to make it more manageable.

VimeoVideo("762464686", h="bbe7285343", width=600)
Task 8.1.13: Read the data from stock_data into a DataFrame named df_ambuja. Be sure all your data types are correct!

Create a DataFrame from a dictionary in pandas.
Inspect a DataFrame using the shape, info, and head in pandas.
df_ambuja = pd.DataFrame.from_dict(stock_data, orient="index", dtype=float)

print("df_ambuja shape:", df_ambuja.shape)
print()
print(df_ambuja.info())
df_ambuja.head(10)
Did you notice that the index for df_ambuja doesn't have an entry for all days? Given that this is stock market data, why do you think that is?

All in all, this looks pretty good, but there are a couple of problems: the data type of the dates, and the format of the headers. Let's fix the dates first. Right now, the dates are strings; in order to make the rest of our code work, we'll need to create a proper DatetimeIndex.

VimeoVideo("762464725", h="4408b613a1", width=600)
Task 8.1.14: Transform the index of df_ambuja into a DatetimeIndex with the name "date".

Access the index of a DataFrame using pandas.
Convert data to datetime using pandas.
# Convert `df_ambuja` index to `DatetimeIndex`
df_ambuja.index = pd.to_datetime(df_ambuja.index)

# Name index "date"
df_ambuja.index.name = "date"

print(df_ambuja.info())
df_ambuja.head()
Note that the rows in df_ambuja are sorted descending, with the most recent date at the top. This will work to our advantage when we store and retrieve the data from our application database, but we'll need to sort it ascending before we can use it to train a model.
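Here's a minimal sketch of that ascending sort on a tiny DataFrame. The first row's closing price comes from the output above; the second row is invented purely for illustration, and df_small is a made-up name.

```python
import pandas as pd

# Two rows in descending date order, like the API response.
# The 2025-01-24 value is invented for this example.
df_small = pd.DataFrame(
    {"close": [534.65, 550.10]},
    index=["2025-01-27", "2025-01-24"],
)

# Convert the string index to a `DatetimeIndex` named "date"
df_small.index = pd.to_datetime(df_small.index)
df_small.index.name = "date"

# Sort so the oldest observation comes first
df_sorted = df_small.sort_index(ascending=True)
print(df_sorted)
```

Sorting works chronologically here because the index holds datetimes rather than strings.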

Okay! Now that the dates are fixed, let's deal with the headers. There isn't really anything wrong with them, but those numbers make them look a little unfinished. Let's get rid of them.

VimeoVideo("762464753", h="5563b3ca4f", width=600)
Task 8.1.15: Remove the numbering from the column names for df_ambuja.

What's a list comprehension?
Write a list comprehension in Python.
Split a string in Python.
[c.split(". ")[1] for c in df_ambuja.columns]
['open', 'high', 'low', 'close', 'volume']
# Remove numbering from `df_ambuja` column names
df_ambuja.columns = [c.split(". ")[1] for c in df_ambuja.columns]

print(df_ambuja.info())
df_ambuja.head()
Defensive Programming
Defensive programming is the practice of writing code which will continue to function, even if something goes wrong. We'll never be able to foresee all the problems people might run into with our code, but we can take steps to make sure things don't fall apart whenever one of those problems happens.

So far, we've made API requests where everything works. But coding errors and problems with servers are common, and they can cause big issues in a data science project. Let's see how our response changes when we introduce common bugs in our code.
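To make the idea concrete, here's a minimal sketch of two defensive checks applied to an API response before any data is used. The extract_time_series helper and the sample payloads are hypothetical, written only for illustration. It takes a status code and an already-parsed dictionary, so it runs without a network connection.

```python
def extract_time_series(status_code, payload, ticker):
    """Return the daily time series, or raise an informative error.

    Hypothetical helper for illustration; not part of the lesson's code.
    """
    # Guard 1: the server did not report success
    if status_code != 200:
        raise ValueError(f"Request failed with HTTP status code {status_code}.")

    # Guard 2: the response was parsed, but the expected key is missing
    if "Time Series (Daily)" not in payload:
        raise ValueError(
            f"No daily data in response. Check that ticker symbol '{ticker}' is correct."
        )

    return payload["Time Series (Daily)"]


# A well-formed payload passes both guards
good = extract_time_series(
    200,
    {"Time Series (Daily)": {"2025-01-27": {"4. close": "534.6500"}}},
    "AMBUJACEM.BSE",
)
print(good)

# A made-up payload without the expected key triggers the second guard
try:
    extract_time_series(200, {"unexpected": "payload"}, "RAMBUJACEM.BSE")
except ValueError as err:
    print("Caught:", err)
```

Both guards fail early with a message that points at the likely cause, instead of letting a confusing error surface later in the code.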

VimeoVideo("762464781", h="d7dcf16d18", width=600)
Task 8.1.16: Return to Task 8.1.5 and change the first part of your URL. Instead of "query", use "search" (a path that doesn't exist). Then rerun your code for all the tasks that follow. What changes? What stays the same?

We know what happens when we try to access a bad address. But what about when we access the right path with a bad ticker symbol?

VimeoVideo("762464811", h="84ff4d2518", width=600)
Task 8.1.17: Return to Task 8.1.5 and change the ticker symbol from "AMBUJACEM.BSE" to "RAMBUJACEM.BSE" (a company that doesn't exist). Then rerun your code for all the tasks that follow. Again, take note of what changes and what stays the same.

Let's formalize our extraction and transformation process for the AlphaVantage API into a reproducible function.

VimeoVideo("762464843", h="858c9e1388", width=600)
Task 8.1.18: Build a get_daily function that gets data from the AlphaVantage API and returns a clean DataFrame. Use the docstring as guidance. When you're satisfied with the result, submit your work to the grader.

What's a function?
Write a function in Python.
def get_daily(ticker, output_size="full"):
    """Get daily time series of an equity from AlphaVantage API.

    Parameters
    ----------
    ticker : str
        The ticker symbol of the equity.
    output_size : str, optional
        Number of observations to retrieve. "compact" returns the
        latest 100 observations. "full" returns all observations for
        equity. By default "full".

    Returns
    -------
    pd.DataFrame
        Columns are 'open', 'high', 'low', 'close', and 'volume'.
        All are numeric.
    """
    # Create URL (8.1.5)
    url = (
        "https://learn-api.wqu.edu/1/data-services/alpha-vantage/query?"
        "function=TIME_SERIES_DAILY&"
        f"symbol={ticker}&"
        f"outputsize={output_size}&"
        "datatype=json&"
        f"apikey={settings.alpha_api_key}"
    )

    # Send request to API (8.1.6)
    response = requests.get(url=url)

    # Extract JSON data from response (8.1.10)
    response_data = response.json()

    # Raise an informative error if the expected data is missing (8.1.19)
    if "Time Series (Daily)" not in response_data:
        raise Exception(
            f"Invalid API call. Check that ticker symbol '{ticker}' is correct."
        )

    # Read data into DataFrame (8.1.12 & 8.1.13)
    stock_data = response_data["Time Series (Daily)"]
    df = pd.DataFrame.from_dict(stock_data, orient="index", dtype=float)

    # Convert index to `DatetimeIndex` named "date" (8.1.14)
    df.index = pd.to_datetime(df.index)
    df.index.name = "date"

    # Remove numbering from columns (8.1.15)
    df.columns = [c.split(". ")[1] for c in df.columns]

    # Return DataFrame
    return df
# Test your function
df_ambuja = get_daily(ticker="AMBUJACEM.BSE")

print(df_ambuja.info())
df_ambuja.head()
submission = get_daily(ticker="AMBUJACEM.BSE", output_size="compact")
wqet_grader.grade("Project 8 Assessment", "Task 8.1.18", submission)
Python master 😁

Score: 1

How does this function deal with the two bugs we've explored in this section? Our first error, a bad URL, is something we don't need to worry about. No matter what the user inputs into this function, the URL will always be correct. But see what happens when the user inputs a bad ticker symbol. What's the error message? Would it help the user locate their mistake?

VimeoVideo("762464894", h="6ed1dbb9c4", width=600)
Task 8.1.19: Add an if clause to your get_daily function so that it throws an Exception when a user supplies a bad ticker symbol. Be sure the error message is informative.

What's an Exception?
Raise an Exception in Python.
# Test your Exception
df_test = get_daily(ticker="ABUJACEM.BSE")
Alright! We now have all the tools we need to get the data for our project. In the next lesson, we'll make our AlphaVantage code more reusable by creating a data module with class definitions. We'll also create the code we need to store and read this data from our application database.
