In [1]:
%%html
<!-- Omit this block. This is used for styling this Notebook. -->
<style>
.alert-info {
    color: #000000 !important;
    background-color: #FFFFFF !important;
}
</style>

# Project 1: Visualize Financial Data

*Comparing FANG Stocks With S&P 500*

In this project, we will analyze the returns of four high-performing technology stocks (Facebook, Amazon, Netflix, and Google) and compare them with the returns of the S&P 500 index.

<div class="alert alert-info">
    Fill in missing parts under each <code># Todo</code> comment and run the code block until it outputs a <b>"Passed!"</b> message.
</div>

In [None]:
# Do not change anything in this code block.

from tester import *
import requests
import pandas as pd
import os
import matplotlib.pyplot as plt
from tqdm import tqdm
from datetime import date

API_PATH = """http://app.quotemedia.com/quotetools/getHistoryDownload.csv?&webmasterId=501&startDay={sd}&startMonth={sm}&startYear={sy}&endDay={ed}&endMonth={em}&endYear={ey}&isRanged=true&symbol={sym}"""

start_date, start_month, start_year = 2, 1, 2014
end_date, end_month, end_year = 31, 10, 2019

# In QuoteMedia, months start from 0, so we adjust these variables.
start_month = start_month - 1
end_month = end_month - 1

tickers = ['SPY', 'FB', 'AMZN', 'NFLX', 'GOOG']

## 1. Download Pricing Data

A quick way to download free pricing data is through QuoteMedia. Complete the code blocks below to download data for the given stock tickers.

First, we create a function that gets the data for a **single equity**, stores it in a CSV file, and returns a pandas DataFrame containing that data.
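As a rough illustration of these steps (not the graded solution), the sketch below fills a URL template with `str.format` and round-trips CSV text through a `[symbol].csv` file. The helper names `build_url` and `save_and_load`, the shortened `TEMPLATE`, and the sample CSV text with its columns are all made up for the demo. In the notebook, the CSV text would instead come from an HTTP request, for example `requests.get(url).text`.

```python
import os
import tempfile

import pandas as pd

# Shortened, made-up template that reuses the placeholder names from API_PATH.
TEMPLATE = (
    "http://example.com/history.csv?startDay={sd}&startMonth={sm}&startYear={sy}"
    "&endDay={ed}&endMonth={em}&endYear={ey}&symbol={sym}"
)


def build_url(api_path, sd, sm, sy, ed, em, ey, sym):
    """Fill the placeholders of the URL template (hypothetical helper)."""
    return api_path.format(sd=sd, sm=sm, sy=sy, ed=ed, em=em, ey=ey, sym=sym)


def save_and_load(csv_text, symbol, directory="."):
    """Write CSV text to [symbol].csv, then read it back as a DataFrame."""
    path = os.path.join(directory, f"{symbol}.csv")
    with open(path, "w") as f:
        f.write(csv_text)
    return pd.read_csv(path)


# Months are zero-based here, matching the adjustment made earlier in the notebook.
url = build_url(TEMPLATE, 2, 0, 2014, 31, 9, 2019, "SPY")

# Made-up sample standing in for the downloaded text.
sample_csv = "date,close,adjclose\n2019-10-30,304.14,304.14\n2019-10-31,303.33,303.33\n"
with tempfile.TemporaryDirectory() as tmp:
    df = save_and_load(sample_csv, "SPY", directory=tmp)
```

Writing the file first and then reading it back keeps a local copy of each download, which is what the exercise asks for.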

In [None]:
# Get history of a given ticker.

def get_history(api_path, start_date, start_month, start_year, end_date, end_month, end_year, symbol):
    # Todo: 1. Get the prices from given api path and parameters.
    #       2. Store the file in a csv file with name [symbol].csv.
    #       3. And finally, read that file into a pandas DataFrame object and return it.
    
    return df

assert_history(get_history)

Next, we will perform the above operation for all tickers. Once this is completed, you should have the following csv files in your directory:

```
SPY.csv, FB.csv, AMZN.csv, NFLX.csv, GOOG.csv
```

Also, the function should return a prices dataframe (`pandas.DataFrame` object) with the following specifications:

1. The DataFrame is indexed by the `date` column, which is of type `datetime64[ns]`.
2. It should be an **outer join** (union) of all five dataframes acquired from the QuoteMedia API. For example, if one dataframe contains data for 2019-10-01 and another for 2019-10-02, the result should contain rows for both dates.
3. Rename the `adjclose` column of each dataframe to its ticker symbol in the resulting dataframe. The final dataframe should therefore have these columns: **date, SPY, FB, AMZN, NFLX, and GOOG**.

The returned `prices_df` should look like so:

![prices_df.png](media/prices_df-table.png)
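To make the outer-join requirement concrete, here is a minimal sketch on two made-up frames whose dates only partly overlap. The tickers `AAA` and `BBB` and their prices are invented, and `pd.concat` is only one possible approach (a chain of `merge(..., how="outer")` calls would be another).

```python
import pandas as pd

# Two made-up single-ticker frames with partly different dates.
a = pd.DataFrame({"date": ["2019-10-01", "2019-10-02"], "adjclose": [10.0, 11.0]})
b = pd.DataFrame({"date": ["2019-10-02", "2019-10-03"], "adjclose": [20.0, 21.0]})

columns = []
for ticker, frame in [("AAA", a), ("BBB", b)]:
    # Keep only adjclose, indexed by date, and name the series after the ticker.
    columns.append(frame.set_index("date")["adjclose"].rename(ticker))

# join="outer" keeps every date that appears in any of the series.
prices = pd.concat(columns, axis=1, join="outer")

# Use real datetimes for the index and order the rows from earliest to latest.
prices.index = pd.to_datetime(prices.index)
prices = prices.sort_index()
```

Dates that exist for only one ticker stay in the result, with `NaN` in the other ticker's column.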

In [None]:
# Get prices for all tickers
def get_prices(api_path, start_date, start_month, start_year, end_date, end_month, end_year, tickers):
    # Todo: Create a proper prices_df dataframe here.

    prices_df = pd.DataFrame(columns=['date'])
    for ticker in tqdm(tickers):
        
        prices_df = ...
    # Don't forget to use proper data type for the dates, use it as the index, and then sort
    # the values by the dates, ordered from earliest to latest dates.
    ...
    return prices_df

assert_prices(get_prices)

In [None]:
prices_df = get_prices(API_PATH, start_date, start_month, start_year, end_date, end_month, end_year, tickers)
print(prices_df.head(5))
print(prices_df.tail(5))

## 2. Display Initial Plot

In this task, you will plot the `prices_df` dataframe. The resulting plot should look like so:

<div class="alert alert-info">Your plots do not have to match the expected outputs exactly, but they should show similar values and be color-coded by column name.</div>

![prices_df plot](media/prices_df-plot.png)

Notice the empty area at the beginning of the GOOG line: no data is available for those dates.
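As a small, hedged illustration of how such a gap appears, the sketch below plots a made-up two-column frame with `DataFrame.plot`. The tickers, prices, and labels are invented. pandas draws one line per column and leaves the `NaN` stretch empty.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up prices; BBB has no data on the first date.
toy = pd.DataFrame(
    {"AAA": [10.0, 11.0, 12.0], "BBB": [float("nan"), 20.0, 19.0]},
    index=pd.to_datetime(["2019-10-01", "2019-10-02", "2019-10-03"]),
)

# One colored line per column; the legend uses the column names.
ax = toy.plot(figsize=(8, 4), title="Adjusted close prices")
ax.set_xlabel("Date")
ax.set_ylabel("Price")
```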

In [None]:
# Todo: Plot a line chart showing the prices of all equities we have downloaded.


## 3. Calculate Daily Return Percentages

Now we will calculate daily return percentages for all the dates. Daily return percentage is calculated with the following formula:

$$\text{daily return percentage} = \frac{(\text{end day price}-\text{start day price})}{\text{start day price}}$$

**Important:**

1. **The first value should be set to 0.**
2. **NaN values should not be converted to 0 (we do not want to assume that missing data means the price did not change).**

The function below should return a dataframe that looks as follows:

![return percentages table](media/rp_df-table.png)
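The formula can be tried on a tiny made-up frame, as sketched below. The tickers and prices are invented, and zeroing the entire first row is an assumption about how the first value should be handled when a ticker has no price on that date.

```python
import numpy as np
import pandas as pd

# Made-up prices; BBB has no price on the first date.
toy = pd.DataFrame(
    {"AAA": [100.0, 110.0, 99.0], "BBB": [np.nan, 50.0, 55.0]},
    index=pd.to_datetime(["2019-10-01", "2019-10-02", "2019-10-03"]),
)

# (end day price - start day price) / start day price, row by row.
previous = toy.shift(1)
returns = (toy - previous) / previous

# Only the first row is set to 0; other NaN values are left untouched.
returns.iloc[0] = 0
```

Here `AAA` moves by roughly +10% and then -10%, while the `BBB` return on the second date stays `NaN` because there is no earlier price to compare against.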

In [None]:
def get_return_percentages(df):
    # Todo: Calculate daily return percentages.
    
    return returns

assert_return_percentages(get_return_percentages)

In [None]:
rp_df = get_return_percentages(prices_df)
print(rp_df.head(5))
print(rp_df.tail(5))

## 4. Plot Daily Return Percentages and Their Distributions

In this step, plot all return percentages with a simple line chart. The resulting plot should look similar to this:

![daily return percentages](media/rp_df-plot.png)

In [None]:
# Todo: Plot the values of rp_df dataset here


The chart above is not very informative, but it does show when the largest gains and losses occurred over the years. Next, make a histogram of the values to see how the returns are distributed.

In the following code block, write code to show the distribution of each variable. Plot them as histograms on Matplotlib subplots. The resulting plots should look as follows:

<div class="alert alert-info">
    <p>For the following plots, make sure they all contain proper titles and x and y-axis labels.</p>
    <p>Note: You may get a couple of warnings "RuntimeWarning: invalid value encountered in greater_equal keep = (tmp_a >= first_edge)" when plotting the histograms. These are expected when the plot data contains np.NaN values, and you can safely ignore them.</p>
</div>

![return price percentages distributions](media/rp_df-distributions.png)
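One way to lay out such a grid is sketched below on randomly generated data. The column names, the two-column layout, and the bin count are arbitrary choices for the demo. The sketch drops `NaN` values before plotting, which is one way to avoid the warning mentioned above.

```python
import math

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up daily returns for five hypothetical tickers.
rng = np.random.default_rng(0)
toy_returns = pd.DataFrame(
    rng.normal(0.001, 0.02, size=(250, 5)), columns=["A", "B", "C", "D", "E"]
)

# Enough rows to fit every column into a two-column grid.
n_cols = 2
n_rows = math.ceil(len(toy_returns.columns) / n_cols)
fig, axes = plt.subplots(n_rows, n_cols, figsize=(10, 3 * n_rows))
flat_axes = axes.flatten()

for ax, name in zip(flat_axes, toy_returns.columns):
    ax.hist(toy_returns[name].dropna(), bins=30)
    ax.set_title(f"{name} daily returns")
    ax.set_xlabel("Daily return")
    ax.set_ylabel("Frequency")

# Hide any unused panel left over in the grid.
for ax in flat_axes[len(toy_returns.columns):]:
    ax.set_visible(False)

fig.tight_layout()
```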

In [None]:
# Todo: Create several plots showing distributions of daily return percentages.

import math
...

The charts above show how the returns are distributed for each equity. As expected, each distribution is roughly bell-shaped (approximately normal), with a small positive mean close to zero.

## 5. Calculate Cumulative Returns

We ultimately want to compare the performance of these stocks over the whole period. To do this, we add 1 to each daily return, take the cumulative product, and subtract 1 from the result.

The resulting dataframe should look as follows:

![cumulative returns](media/cum_rp_df-table.png)
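The compounding step can be checked by hand on a few made-up returns, as in the sketch below (tickers and values are invented). Note that `cumprod` skips `NaN` values by default, so a missing return stays `NaN` at its own position without wiping out later rows.

```python
import numpy as np
import pandas as pd

# Made-up daily returns; BBB has one missing value.
toy_returns = pd.DataFrame(
    {"AAA": [0.0, 0.10, -0.10], "BBB": [0.0, np.nan, 0.20]}
)

# Turn returns into growth factors, compound them, then convert back to returns.
cumulative = (1 + toy_returns).cumprod() - 1
```

For `AAA`, a 10% gain followed by a 10% loss compounds to 1.1 × 0.9 − 1 = −0.01, a 1% loss overall rather than zero.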

In [None]:
def get_cumulative(df):
    # Todo: Add 1 to each value, calculate the cumulative product, then subtract 1 from the result.
    
    return df
assert_cumulative(get_cumulative)

In [None]:
cum_rp_df = get_cumulative(rp_df)

In [None]:
print(cum_rp_df.head(5))
print(cum_rp_df.tail(5))

## 6. Plot Cumulative Returns

And finally, plot the cumulative returns, which should look similar to the following:

![cumulative returns plot](media/cum_rp_df-plot.png)

In [None]:
# Todo: Draw a plot of cumulative daily return percentages here.


## Analysis and Conclusion

Using a few lines of Python code, you have pulled pricing data from an independent source, extracted daily returns from it, and learned a few characteristics of those returns:

1. **Distribution of values.** SPY, which tracks the S&P 500, has a much narrower range of returns than the individual stocks we examined.
2. **Trends.** Taken together, the FANG stocks outperformed the S&P 500 index.
3. **Correlation of the stocks.** FANG stocks are highly correlated. We will learn more about this in future lessons on asset covariance.

### What next?

You can replace the stocks above with any set of tickers of your choice to perform a basic pricing analysis.

Ideally, we would compare these prices against other variables to see whether any patterns emerge. Fundamental values from the balance sheet and income statement are a good place to start for this kind of analysis. We will learn more about this in the next lesson, where you will get these values (for free!) from the Quantopian platform. From there, you can perform deeper analysis and even build your own algorithm (algorithm building is covered in section 3).