# API Mini-Project

Focus on equities data from the Frankfurt Stock Exchange (FSE), analyzing the stock prices of a company called Carl Zeiss Meditec (stock ticker AFX_X).

Quandl Usage: https://docs.quandl.com/docs/in-depth-usage

Quandl Parameters: https://docs.quandl.com/docs/parameters-2

Quandl Error codes: https://docs.quandl.com/docs/error-codes

GET composition:
https://www.quandl.com/api/v3/datasets/{database_code}/{dataset_code}/data.{return_format}

### Getting Started: Calling the API

    # import modules
    import requests

    # API key for Quandl (a constant, so upper case)
    API_KEY = ''
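
Hardcoding the key in a notebook makes it easy to leak when the notebook is shared. A minimal sketch, assuming you store the key in an environment variable named `QUANDL_API_KEY` (a hypothetical name chosen for this example), reads it at runtime instead:

```python
import os


def load_api_key(var_name: str = "QUANDL_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    QUANDL_API_KEY is a hypothetical variable name for this sketch.
    Returns an empty string when the variable is not set.
    """
    return os.environ.get(var_name, "")


API_KEY = load_api_key()
```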

The GET composition for the Quandl API is:

(base url) + (database code) + (dataset code) + (return format) + (api_key) + (params)

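| Component | Value |
| :--- | :--- |
| base url | `https://www.quandl.com/api/v3/datasets/` |
| database code | `FSE/` |
| dataset code | `AFX_X/` |
| return format | `data.json` |
| api_key | `?api_key=API_KEY` |
| params | `&` followed by additional `name=value` pairs |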

For now, we'll use a constant for the url with our database code set to *Frankfurt Stock Exchange (FSE)*, and the dataset code set to *Carl Zeiss Meditec (AFX_X)*. In the future, we could make this more robust by using variables for the database code and dataset code, and building the URL string from user inputs.

    # assign the API's GET url to variable url
    url = 'https://www.quandl.com/api/v3/datasets/FSE/AFX_X/data.json'
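
The more flexible approach mentioned above could look something like the following sketch. `build_quandl_url` is a hypothetical helper, not part of Quandl or `requests`; it only assembles the URL string from the pieces of the GET composition using the standard library's `urllib.parse.urlencode`, which is handy for checking what will be sent. In practice, `requests.get(url, params=...)` does the query-string encoding for you.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.quandl.com/api/v3/datasets/"


def build_quandl_url(database_code: str, dataset_code: str,
                     return_format: str = "json", **params) -> str:
    """Assemble a Quandl dataset-data URL (hypothetical helper).

    Follows the pattern:
    base url + database code + dataset code + data.{return_format} + ?query
    """
    url = f"{BASE_URL}{database_code}/{dataset_code}/data.{return_format}"
    if params:
        # urlencode keeps the insertion order of the keyword arguments
        url += "?" + urlencode(params)
    return url


print(build_quandl_url("FSE", "AFX_X", api_key="YOUR_KEY", limit=1))
# https://www.quandl.com/api/v3/datasets/FSE/AFX_X/data.json?api_key=YOUR_KEY&limit=1
```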

A full list of parameters can be found here (https://docs.quandl.com/docs/parameters-2). To begin, we'll test the API call by passing only the API key as a parameter, and then inspect the response headers.

    # dictionary of parameters
    params = dict(api_key=API_KEY)

We'll create the variable `res` to hold the response from the Quandl API.

    # use variable 'res' to hold the response
    res = requests.get(url, params=params)

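Next, add a conditional that confirms whether the request succeeded, and print the response status code. A `requests` `Response` object evaluates as `True` when its status code is below 400.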

    # evaluate response
    if res:
        print('Response OK')
    else:
        print('Response Failed')
    print(res.status_code)

Finally, print the headers for more information about the API call.

    # review headers
    print(res.headers)

In [1]:

    # import modules
    import requests

In [2]:

    # API key for Quandl (a constant, so upper case); insert your own key
    API_KEY = ''

    # assign the API's GET url to variable url
    url = 'https://www.quandl.com/api/v3/datasets/FSE/AFX_X/data.json'

    # dictionary of parameters
    params = dict(api_key=API_KEY)

    # use variable 'res' to hold the response
    res = requests.get(url, params=params)

    # evaluate response
    if res:
        print('Response OK')
    else:
        print('Response Failed')
    print(res.status_code)

    # review headers
    print(res.headers)

Response OK
200
{'Date': 'Thu, 26 Mar 2020 20:34:28 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Transfer-Encoding': 'chunked', 'Connection': 'keep-alive', 'Set-Cookie': '__cfduid=d0365e913d0d937758042ba15ff4273e31585254867; expires=Sat, 25-Apr-20 20:34:27 GMT; path=/; domain=.quandl.com; HttpOnly; SameSite=Lax', 'Allow': 'GET, HEAD, POST, PUT, DELETE, OPTIONS, PATCH', 'Cache-Control': 'max-age=0, private, must-revalidate', 'Content-Encoding': 'gzip', 'ETag': 'W/"e10c3bb2c5f49536ce7d14e10278b446"', 'Vary': 'Origin', 'X-Content-Type-Options': 'nosniff', 'X-Frame-Options': 'SAMEORIGIN', 'X-Rack-CORS': 'miss; no-origin', 'X-RateLimit-Limit': '50000', 'X-RateLimit-Remaining': '49980', 'X-Request-Id': '9e9d0941-677a-49dc-8dd7-746ce7090af1', 'X-Runtime': '0.830617', 'X-XSS-Protection': '1; mode=block', 'CF-Cache-Status': 'DYNAMIC', 'Expect-CT': 'max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"', 'Server': 'cloudflare', 'CF-RAY': '57a3a708bd

*Great!* Headers return useful information about the API call. For example, since I requested the `data.json` return format, the *Content-Type* header is `application/json` with a `utf-8` *charset*.

*Also*, since I'm working with a trial API key, the headers `X-RateLimit-Limit` and `X-RateLimit-Remaining` will be of use to me as well!
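
As a small sketch of putting those headers to use, the hypothetical helper below reads the two rate-limit headers from a headers mapping and reports how many calls have been used. It is exercised here on a plain dictionary holding the values from the output above; `res.headers` from `requests` is a case-insensitive dictionary, so the same lookups should work on it as well.

```python
def rate_limit_status(headers) -> dict:
    """Summarize Quandl's rate-limit headers (hypothetical helper).

    Expects the 'X-RateLimit-Limit' and 'X-RateLimit-Remaining'
    headers to be present; their values arrive as strings.
    """
    limit = int(headers["X-RateLimit-Limit"])
    remaining = int(headers["X-RateLimit-Remaining"])
    return {"limit": limit, "remaining": remaining, "used": limit - remaining}


# values copied from the header output above
sample_headers = {"X-RateLimit-Limit": "50000", "X-RateLimit-Remaining": "49980"}
print(rate_limit_status(sample_headers))
# {'limit': 50000, 'remaining': 49980, 'used': 20}
```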

Next, let's return a row of data by adding in the `limit` parameter to our `params` dictionary, and passing it '1' to get the latest row.

    # dictionary of parameters
    params = dict(api_key=API_KEY, limit='1')

In [3]:

    # dictionary of parameters
    params = dict(api_key=API_KEY, limit='1')

    # send the GET request for a single row
    res = requests.get(url, params=params)

In [4]:

    # print the response
    print(res.text)

{"dataset_data":{"limit":1,"transform":null,"column_index":null,"column_names":["Date","Open","High","Low","Close","Change","Traded Volume","Turnover","Last Price of the Day","Daily Traded Units","Daily Turnover"],"start_date":"2000-06-07","end_date":"2020-03-25","frequency":"daily","data":[["2020-03-25",82.5,90.4,80.6,83.1,null,284087.0,23870691.15,null,null,null]],"collapse":null,"order":null}}


*Holy keys!* Remember: JSON is built from *key: value* pairs (objects) and ordered lists (arrays), which can be nested inside one another. Knowing that a JSON object maps to a Python dictionary, and a JSON array to a list, will come in handy when we solve the project tasks below.
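
To see that mapping concretely, the sketch below decodes the response text printed above with the standard library's `json.loads`, essentially the same decoding `res.json()` performs for us: JSON objects become dictionaries, arrays become lists, and `null` becomes `None`.

```python
import json

# response text copied from the limit=1 call above
raw = (
    '{"dataset_data":{"limit":1,"transform":null,"column_index":null,'
    '"column_names":["Date","Open","High","Low","Close","Change","Traded Volume",'
    '"Turnover","Last Price of the Day","Daily Traded Units","Daily Turnover"],'
    '"start_date":"2000-06-07","end_date":"2020-03-25","frequency":"daily",'
    '"data":[["2020-03-25",82.5,90.4,80.6,83.1,null,284087.0,23870691.15,null,null,null]],'
    '"collapse":null,"order":null}}'
)

parsed = json.loads(raw)

row = parsed["dataset_data"]["data"][0]          # array of arrays -> list of lists
names = parsed["dataset_data"]["column_names"]   # array -> list

print(type(parsed).__name__, type(row).__name__)  # dict list
print(dict(zip(names, row))["Open"])              # 82.5
print(parsed["dataset_data"]["transform"])        # None (JSON null)
```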

# Project Tasks


## 1. Collect data from the Frankfurt Stock Exchange, for the ticker AFX_X, for the whole year 2017 (keep in mind that the date format is YYYY-MM-DD).

In order to collect data for the whole year, let's pull some of the time-series parameters from the **quandl documentation**.

| Parameter | Required | Type | Values | Description |
| :--- | :--- | :--- | :--- | :--- |
| start_date | no | string | yyyy-mm-dd | Retrieve data rows on and after the specified start date. |
| end_date | no | string | yyyy-mm-dd | Retrieve data rows up to and including the specified end date. |

To accomplish this, we'll update our `params` variable to include the `start_date` and `end_date` parameters, then pass it to `requests.get()` through the `params=` argument.

In [5]:

    # dictionary of parameters covering the whole of 2017
    params = dict(api_key=API_KEY, start_date='2017-01-01', end_date='2017-12-31')

    # send the GET request for the 2017 rows
    res = requests.get(url, params=params)
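
If we later want other years, a small hypothetical helper can generate the `start_date` and `end_date` strings in the YYYY-MM-DD format the API expects, using `datetime.date.isoformat()`:

```python
from datetime import date


def year_params(year: int, **extra) -> dict:
    """Build start_date/end_date parameters covering one calendar year.

    Hypothetical helper; extra keyword arguments (e.g. api_key) are
    merged into the returned dictionary.
    """
    params = {
        "start_date": date(year, 1, 1).isoformat(),   # e.g. '2017-01-01'
        "end_date": date(year, 12, 31).isoformat(),   # e.g. '2017-12-31'
    }
    params.update(extra)
    return params


print(year_params(2017, api_key="YOUR_KEY"))
# {'start_date': '2017-01-01', 'end_date': '2017-12-31', 'api_key': 'YOUR_KEY'}
```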

Let's check the response code to see if the GET request was received:

In [6]:

    # evaluate response
    if res:
        print('Response OK')
    else:
        print('Response Failed')
    print(res.status_code)

Response OK
200


## 2. Convert the returned JSON object into a Python dictionary.

Since the `.json()` method already decodes the JSON body into Python objects (a JSON object becomes a dictionary), all we'll need to do here is:
1. Decode our response with `.json()`
2. Pass the result as an argument to the `dict()` constructor (optional, since the result is already a dictionary; `dict()` makes a shallow copy)

This takes all of the *key: value* pairs within the decoded response and maps them to a dictionary!
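
A quick way to confirm that the `dict()` step is optional: the sketch below decodes a small JSON string shaped like the Quandl response (the values are made up for illustration) and shows that `json.loads` already returns a dictionary, and that `dict()` produces a new top-level dictionary which still shares its nested values with the original.

```python
import json

# made-up JSON shaped like the Quandl response, for illustration only
decoded = json.loads('{"dataset_data": {"start_date": "2017-01-01", "data": [["2017-01-02", 1.0]]}}')

copied = dict(decoded)

print(isinstance(decoded, dict))                          # True: already a dict
print(copied == decoded)                                  # True: same contents
print(copied is decoded)                                  # False: new outer dict
print(copied["dataset_data"] is decoded["dataset_data"])  # True: nested dict is shared
```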

In [7]:

    # decode the JSON response body into Python objects
    jres = res.json()

    # create dictionary (a shallow copy of jres, which is already a dict)
    d = dict(jres)

> Since we explored the API returns earlier, we know that the response includes `start_date` and `end_date` keys. Because these values are nested under `dataset_data`, we'll need to reference each key in turn to reach the lower levels of the hierarchy.

Let's print the values for these keys to see if our dataset covers all of the requested 2017 data. We can also count the number of rows returned. According to Wikipedia, 2017 had 251 trading days in the US stock market! That is only a rough benchmark, since the Frankfurt exchange follows its own holiday calendar.

![US Stock Market 2017 trading days](https://github.com/jameshollisandrew/springboard_mini_projects/blob/master/api_mini/trading_days.PNG?raw=true "trading days")



In [8]:

    # check start_date field of the response
    print(d['dataset_data']['start_date'])

    # check end_date field of the response
    print(d['dataset_data']['end_date'])

    # check the number of data rows returned
    print(len(d['dataset_data']['data']))

2017-01-01
2017-12-31
255


#### *255...Close enough!*
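
To put 255 in context, the number of weekdays in a year is an upper bound on trading days; each exchange then subtracts its own holidays. A short standard-library sketch counts the weekdays in 2017:

```python
from datetime import date, timedelta


def count_weekdays(year: int) -> int:
    """Count Monday-to-Friday dates in a calendar year (holidays not excluded)."""
    day = date(year, 1, 1)
    count = 0
    while day.year == year:
        if day.weekday() < 5:  # Monday is 0, Friday is 4
            count += 1
        day += timedelta(days=1)
    return count


print(count_weekdays(2017))  # 260
```

The 255 rows returned fit under that bound.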

> Moving forward, let's draw up a quick custom function that we can use to build dictionaries from the API returns.

Since we'll be changing the parameters as needed, the function takes a single argument: the dictionary of parameters to pass to the GET request. It returns the nested `data` rows as a dictionary, which works when each row holds exactly two values, such as a date and one requested column.

In [9]:

    def make_dict(params):
        """Send a GET request to the Quandl API (using the global `url`)
           with the given parameters and return the nested 'data' rows
           as a dictionary. Each row must contain exactly two values,
           e.g. [date, value] when a single column_index is requested."""
        # use variable 'f_res' to hold the response
        f_res = requests.get(url, params=params)

        # decode the JSON response body into a dictionary
        f_json = f_res.json()

        # map each [date, value] row to a date: value pair and return it
        return dict(f_json['dataset_data']['data'])
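
`make_dict` relies on `dict()` accepting an iterable of two-item sequences, which is why it suits the single-column requests later on. The sketch below, using made-up rows for illustration, shows that behavior and a hypothetical `column_by_date` helper that picks one column out of wider rows instead:

```python
# made-up rows for illustration: [date, value] pairs
pairs = [["2017-12-29", 1.5], ["2017-12-28", -0.7]]
print(dict(pairs))  # {'2017-12-29': 1.5, '2017-12-28': -0.7}

# wider rows (three items each) cannot be passed to dict() directly
wide = [["2017-12-29", 51.0, 52.0], ["2017-12-28", 50.5, 51.2]]
try:
    dict(wide)
except ValueError as err:
    print("ValueError:", err)


def column_by_date(rows, col):
    """Hypothetical helper: map each row's date (index 0) to the value at index col."""
    return {row[0]: row[col] for row in rows}


print(column_by_date(wide, 2))  # {'2017-12-29': 52.0, '2017-12-28': 51.2}
```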

## 3. Calculate what the highest and lowest opening prices were for the stock in this period.
We'll answer this question in three steps:
1. Identify which column indices hold the *date* and the *opening price* by checking the `column_names` response key
2. Loop through each data entry, checking for the highest and lowest opening price, and save the `Date` and `Open` values to variables
3. Return the highest and lowest opening prices and the days they occurred

In [10]:

    # check column names
    print(d['dataset_data']['column_names'])

['Date', 'Open', 'High', 'Low', 'Close', 'Change', 'Traded Volume', 'Turnover', 'Last Price of the Day', 'Daily Traded Units', 'Daily Turnover']


`Date` and `Open` are at index 0 and index 1 of each row in the `data` list.

> Did you catch the double brackets earlier when we peeked at the API return structure? That was an indication that the data returned was a *list of lists*. We'll keep that in mind when building our loop.
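
Rather than hardcoding 0 and 1, we could look the positions up by name with the list method `.index()`, which keeps the code working if the column order ever changes. A sketch using the column names printed above:

```python
# column names copied from the output above
column_names = ['Date', 'Open', 'High', 'Low', 'Close', 'Change', 'Traded Volume',
                'Turnover', 'Last Price of the Day', 'Daily Traded Units', 'Daily Turnover']

DATE = column_names.index('Date')
OPEN = column_names.index('Open')
CLOSE = column_names.index('Close')
VOLUME = column_names.index('Traded Volume')

print(DATE, OPEN, CLOSE, VOLUME)  # 0 1 4 6
```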

In [11]:

    # create variable for the returned data
    data_list = d['dataset_data']['data']

    # create highest values
    h_price = 0.0
    h_date = ''

    # create lowest values
    l_price = 0.0
    l_date = ''

    # take each price entry from the data list and compare it to the stored highest and lowest
    for entry in data_list:
        op = entry[1]
        oh = h_price
        ol = l_price

        # if open price (op) is greater than stored highest; save price & date
        if op > oh:
            h_price = op
            h_date = entry[0]

        # if stored lowest is greater than open price (op); save price & date
        if ol > op:
            l_price = op
            l_date = entry[0]

    print('The highest open price in 2017 was {0}, and occurred on {1}.'.format(h_price, h_date))
    print('The lowest open price in 2017 was {0}, and occurred on {1}.'.format(l_price, l_date))

TypeError: '>' not supported between instances of 'NoneType' and 'float'

*Uh oh!* A **TypeError** on our first conditional. Since the variables compared there are populated with the opening prices returned from the API query, the problem most likely lies in that data.

> Another hint is in the *TypeError* message: ...'NoneType'... suggests that the iteration that raised the error hit a null value. JSON `null` is decoded to Python `None`, which can't be compared with a float.

In [12]:

    # take each price entry from the data list and compare it to the stored highest and lowest
    for entry in data_list:
        op = entry[1]
        oh = h_price
        ol = l_price

        # wrap the comparisons in try/except to print any row that raises a TypeError
        try:

            # if open price (op) is greater than stored highest; save price & date
            if op > oh:
                h_price = op
                h_date = entry[0]

            # if stored lowest is greater than open price (op); save price & date
            if ol > op:
                l_price = op
                l_date = entry[0]

        except TypeError:
            print(entry)

['2017-05-01', None, 42.245, 41.655, 41.72, -0.44, 86348.0, 3606589.0, None, None, None]
['2017-04-17', None, 42.48, 41.985, 42.2, None, 88416.0, 3734717.0, None, None, None]
['2017-04-14', None, 42.48, 41.985, 42.2, None, 88416.0, 3734717.0, None, None, None]


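#### *For some dates we don't have an opening price!*
All three dates are public holidays in Germany (Good Friday, Easter Monday and Labour Day), and the `2017-04-14` and `2017-04-17` rows are identical. While this discovery would be great for further investigation, let's finish answering the question. To do this, we'll add a condition that will `continue` with the next iteration if the opening price's `type != float`. We'll also seed the highest and lowest values from the first row rather than `0.0`: a lowest price that starts at `0.0` would never be replaced by a positive opening price.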

In [13]:

    # create variable for the returned data
    data_list = d['dataset_data']['data']

    # seed highest values from the first row
    h_price = data_list[0][1]
    h_date = data_list[0][0]

    # seed lowest values from the first row
    l_price = data_list[0][1]
    l_date = data_list[0][0]

    # take each price entry from the data list and compare it to the stored highest and lowest
    for entry in data_list:
        op = entry[1]
        oh = h_price
        ol = l_price

        # continue clause to bypass None values
        if type(op) != float:
            continue

        # if open price (op) is greater than stored highest; save price & date
        if op > oh:
            h_price = op
            h_date = entry[0]

        # if stored lowest is greater than open price (op); save price & date
        if ol > op:
            l_price = op
            l_date = entry[0]

    # print answer
    print('The highest open price in 2017 was {0}, and occurred on {1}.'.format(("%.2f" % h_price), h_date))
    print('The lowest open price in 2017 was {0}, and occurred on {1}.'.format(("%.2f" % l_price), l_date))

The highest open price in 2017 was 53.11, and occurred on 2017-12-14.
The lowest open price in 2017 was 34.00, and occurred on 2017-01-24.


*Great!*

> In the printout, we standardized the value format with the expression `("%.2f" % variable)`. We'll use this again throughout the project to limit the price printouts to 2 decimal places.
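
Python's f-strings offer an equivalent, more modern spelling of the same format: the `:.2f` format specifier rounds to two decimal places. A small sketch using the prices from the output above:

```python
# prices copied from the output above
h_price = 53.11
l_price = 34.0

# '%.2f' % value and f'{value:.2f}' produce the same string
print('%.2f' % l_price, f'{l_price:.2f}')  # 34.00 34.00
print(f'The highest open price in 2017 was {h_price:.2f}.')
```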

Another option is to use the built-in functions `max()` and `min()` on a list of the opening prices. To do this, we would:
1. Loop through the data *list of lists* and create a list of the price values and a list of the dates
2. Save the highest and lowest prices to variables
3. Use the index position of the highest, lowest prices from the price list to return the dates from the date list

In [14]:

    # initialize lists
    price_list = []
    date_list = []

    # create list of open prices
    for entry in data_list:

        # continue clause to bypass None values
        if type(entry[1]) != float:
            continue

        # append the open price to the list
        price_list.append(entry[1])

        # append the date to the list
        date_list.append(entry[0])

    # save highest, lowest price to variables
    h_price = max(price_list)
    l_price = min(price_list)

    # use index position of price in price_list as an index call to the date_list
    h_date = date_list[price_list.index(h_price)]
    l_date = date_list[price_list.index(l_price)]

    # print answer
    print('The highest open price in 2017 was {0}, and occurred on {1}.'.format(("%.2f" % h_price), h_date))
    print('The lowest open price in 2017 was {0}, and occurred on {1}.'.format(("%.2f" % l_price), l_date))

The highest open price in 2017 was 53.11, and occurred on 2017-12-14.
The lowest open price in 2017 was 34.00, and occurred on 2017-01-24.
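
A more compact variant of the same idea pairs each date with its price and lets `max()` and `min()` compare the pairs by price through the `key` argument, so no separate date lookup is needed. The rows below are made up for illustration:

```python
# made-up rows shaped like [Date, Open], for illustration only
rows = [
    ["2017-12-29", 51.0],
    ["2017-12-28", None],   # missing opening price, as seen earlier
    ["2017-12-27", 49.5],
    ["2017-12-22", 52.3],
]

# keep (date, open) pairs that actually have an opening price
opens = [(row[0], row[1]) for row in rows if row[1] is not None]

# compare pairs by price (item 1); ties resolve to the first pair encountered
high_date, high_price = max(opens, key=lambda pair: pair[1])
low_date, low_price = min(opens, key=lambda pair: pair[1])

print(high_date, f'{high_price:.2f}')  # 2017-12-22 52.30
print(low_date, f'{low_price:.2f}')    # 2017-12-27 49.50
```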


> Duplicate values might occur, and `.index()` only returns the position of the first match. To double-check, let's pass our highest and lowest prices to the `.count()` list method to make sure each occurs only once.

In [15]:

    print(price_list.count(h_price))
    print(price_list.count(l_price))

1
1


#### *Only 1! Let's move on!*

## 4. What was the largest change in any one day (based on High and Low price)?
For this question, we'll need to: 
1. Iterate through each list in the data *list of lists*
2. Calculate the difference as *high* - *low*
3. Append the difference to a list
4. Get the max() value from the difference list

In the last problem, we used a standard *for loop* to solve the problem. This time around, let's do it with ***list comprehensions***. 

The general syntax for a list comprehension is:

[ **output** *for clause* ]
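
Comprehensions can also take an optional `if` clause that filters items before the output expression is evaluated, which would be one way to skip rows with missing values. A small sketch with made-up `[Date, High, Low]` rows:

```python
# made-up rows shaped like [Date, High, Low], for illustration only
rows = [["2017-12-29", 52.0, 50.5], ["2017-12-28", None, 49.0], ["2017-12-27", 50.0, 48.75]]

# [ output  for clause  optional if clause ]
ranges = [row[1] - row[2] for row in rows if row[1] is not None and row[2] is not None]

print(ranges)  # [1.5, 1.25]
```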

In [16]:

    # create list of daily changes (High - Low)
    daily_changes = [entry[2]-entry[3] for entry in data_list]

    # return largest change
    h_change = max(daily_changes)

    # create date list
    date_list = [entry[0] for entry in data_list]

    # use the index position of the largest change as an index into date_list
    change_date = date_list[daily_changes.index(h_change)]

    # print answer
    print('The largest change in any one day was {0}, and occurred on {1}.'.format(("%.2f" % h_change), change_date))

The largest change in any one day was 2.81, and occurred on 2017-05-11.


## 5. What was the largest change between any two days (based on Closing Price)?
To calculate the change between two days, we'll need to subtract the *day 0* closing price from the *day 1* closing price. We'll refer to that result as the *day 1* change.

The closing price is in the 5th position in the list, or *index 4*.

To solve this question, we will:
1. Create a list of closing values in chronological order (the API returns the newest row first, so we reverse `data_list`)
2. Create a list of 'yesterday's close' by copying the closing value list without its last value
3. Create a list of 'today's close' by copying the closing value list, beginning at index position 1
4. Initialize an empty list to catch the changes
5. Zip the today and yesterday lists and iterate through the pairs, calculating the difference
6. Within the loop, append each difference to the closing changes list
7. Loop through the closing changes, comparing absolute values to find the largest change between any two days
8. Use the index position of the largest change (offset by 1) to return the date it occurred
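
The same steps can be condensed by zipping the closing list with itself shifted by one position. In the sketch below, `largest_close_change` is a hypothetical helper that expects rows in the API's newest-first order with the close at index 4, and it keeps the sign of the change so you can tell a rise from a drop. The rows are made up for illustration.

```python
def largest_close_change(rows, close_idx=4):
    """Return (date, change) for the largest absolute day-over-day change in close.

    Hypothetical helper: rows are expected newest first (as the API returns them),
    with the date at index 0 and the closing price at close_idx.
    """
    ordered = list(reversed(rows))  # oldest first
    changes = [
        (today[0], today[close_idx] - yesterday[close_idx])
        for yesterday, today in zip(ordered, ordered[1:])
    ]
    # compare by absolute size but keep the signed value
    return max(changes, key=lambda pair: abs(pair[1]))


# made-up rows shaped like [Date, Open, High, Low, Close], newest first
rows = [
    ["2017-01-05", None, None, None, 40.5],
    ["2017-01-04", None, None, None, 43.0],
    ["2017-01-03", None, None, None, 42.0],
    ["2017-01-02", None, None, None, 41.0],
]
print(largest_close_change(rows))  # ('2017-01-05', -2.5)
```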

In [17]:

    # create list of closing prices, oldest first (the API returns newest first)
    close_list = [entry[4] for entry in reversed(data_list)]

    # yesterday's close: every day except the last (slicing already makes a copy)
    close_yest = close_list[:-1]

    # today's close: every day except the first
    close_today = close_list[1:]

    # initialize empty list
    closing_changes = []

    # calculate change from zipped lists, append closing_changes
    for t, y in zip(close_today, close_yest):
        change = t - y
        closing_changes.append(change)

    # initialize the running maximum from the first absolute change
    largest_index = 0
    largest_change = abs(closing_changes[0])

    # keep the position of the largest absolute change
    for idx, x in enumerate(closing_changes):
        if abs(x) > largest_change:
            largest_change = abs(x)
            largest_index = idx

    # create date list, oldest first
    date_list = [entry[0] for entry in reversed(data_list)]

    # add 1 to the index to realign with date_list (close_today starts at position 1)
    change_date = date_list[largest_index + 1]

    # print answer
    print('The largest change between any two days was {0}, and occurred on {1}.'.format(("%.2f" % largest_change), change_date))

The largest change between any two days was 2.56, and occurred on 2017-08-09.


Because `close_today` starts at index position 1 of `close_list`, the change at position `i` belongs to the date at `date_list[i + 1]`. To compensate, we add 1 to the index when looking up the date, which realigns the indices.
>
> #### *That was kind of messy though.* Let's try it again, but let's make a call to the API using the `transform` argument to calculate the day over day change for us. We'll also use our custom function and a cleaner looping process.

## 5B. What was the largest change between any two days (based on Closing Price)?
For the next API call, we'll be adding arguments for the `column_index=` and `transform=` parameters.

| Parameter | Required | Type | Values | Description |
| :--- | :--- | :--- | :--- | :--- |
|column_index | no | int |  | Request a specific column. Column 0 is the date column and is always returned. Data begins at column 1. |
| transform | no | string | none, diff, rdiff, rdiff_from, cumul, normalize | Perform elementary calculations on the data prior to downloading. Default is none. |

From the Transformations Table:

| Name | Effect | Formula |
| :--- | :--- | :--- |
| none | no effect | z[t] = y[t] |
| diff | row-on-row change | z[t] = y[t] – y[t-1] |

Okie dokie! So, to answer Question 5 again with the improvements we've discussed, let's:
1. Add the `column_index=, transform=` parameters, passing the arguments **4** to subselect the closing price column and **diff** to return the row-on-row change
2. Pass the new params to our custom function to return the nested `data` rows as a dictionary of *date: change* pairs
3. Loop over the dictionary values (i.e. the closing changes) and compare absolute values to return the largest change
4. Return largest change and date to variables, and pass them to the answer


In [18]:

    # dictionary of parameters
    params = dict(api_key=API_KEY, start_date='2017-01-01', end_date='2017-12-31', column_index=4, transform='diff')

    # make a closing change dict
    closing_change_dict = make_dict(params)

In [19]:

    largest_change_value = 0
    largest_change_date = ''

    # loop through dictionary items, return largest change, date
    for k, v in closing_change_dict.items():
        if abs(v) > largest_change_value:
            largest_change_value = abs(v)
            largest_change_date = k

    # print answer
    print('The largest change between any two days was {0}, and occurred on {1}.'.format(("%.2f" % largest_change_value), largest_change_date))

The largest change between any two days was 2.56, and occurred on 2017-08-09.
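
The loop in `In [19]` can also be written with `max()` and a `key` function over the dictionary's items, comparing absolute changes while keeping the signed value. A sketch with a made-up dictionary shaped like `closing_change_dict`:

```python
# made-up dictionary shaped like closing_change_dict: {date: day-over-day change}
changes = {"2017-06-01": 0.4, "2017-06-02": -2.1, "2017-06-05": 1.3}

# pick the item whose absolute change is largest
date_of_max, signed_change = max(changes.items(), key=lambda item: abs(item[1]))

print(date_of_max, f'{abs(signed_change):.2f}')  # 2017-06-02 2.10
```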


> #### *Much cleaner.* Using the API to calculate the day-to-day difference, a custom function to make the nested data into a dictionary, and then iterating over its items is a lot more controlled than our previous answer.

## 6. What was the average daily trading volume during this year?
Daily trading volume occurs in the 7th position, or *index 6*. This will go pretty fast - we'll just need to:
1. Create a list of the daily trading volume values
2. Calculate the average with `sum()` / `len()`

In [20]:

    # create list of daily volumes
    daily_volume = [entry[6] for entry in data_list]

    # calculate average
    annual_volume_avg = sum(daily_volume) / len(daily_volume)

    # print answer
    print('The average daily trading volume during 2017 was {}.'.format(("%.2f" % annual_volume_avg)))

The average daily trading volume during 2017 was 89124.34.
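
The standard library's `statistics` module offers the same calculation; `statistics.mean` should agree with `sum()` / `len()` up to floating-point rounding. A sketch on made-up volumes:

```python
import statistics

# made-up daily volumes, for illustration only
volumes = [80000.0, 90000.0, 100000.0, 70000.0]

manual_avg = sum(volumes) / len(volumes)
library_avg = statistics.mean(volumes)

print(f'{manual_avg:.2f}', f'{library_avg:.2f}')  # 85000.00 85000.00
```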


## 7. (Optional) What was the median trading volume during this year? (Note: you may need to implement your own function for calculating the median.)

In [21]:

    # create sorted list
    ordered_volume = daily_volume.copy()
    ordered_volume.sort()

    # get the number of values in the list
    length = len(ordered_volume)

    # take the middle of the length, rounded
    median_index = round(length / 2)

    # pass the median index position to the ordered list
    median = ordered_volume[median_index]

    # print answer
    print('The median trading volume during 2017 was {}.'.format(median))

The median trading volume during 2017 was 76600.0.
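
One caveat on the calculation above: positions are 0-indexed, so for the 255 volumes the middle value sits at index 127 (`length // 2`), while `round(255 / 2)` evaluates to 128, one position past the middle. For an even number of values, the median is conventionally the average of the two middle values. A sketch of a median function that handles both cases, cross-checked against the standard library's `statistics.median`:

```python
import statistics


def median(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)  # sorted() returns a new list
    n = len(ordered)
    mid = n // 2              # middle index for odd n (0-indexed)
    if n % 2 == 1:
        return ordered[mid]
    # even n: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd = [5.0, 1.0, 3.0]
even = [4.0, 1.0, 3.0, 2.0]
print(median(odd), statistics.median(odd))    # 3.0 3.0
print(median(even), statistics.median(even))  # 2.5 2.5
```

Calling `median(daily_volume)` would recompute the 2017 figure with this approach.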
