## Time Series Project - Functionizing Data Ingestion
In this project, I aim to examine a few questions and produce a few deliverables. I want to use time series forecasting models to predict cryptocurrency prices, and to model those prices using exogenous factors such as other relevant stocks and commodities.

I also want to analyze any changes in cryptocurrency behavior following Russia's invasion of Ukraine, which began on February 24, 2022.

Lastly, I will build a Streamlit application to automate currency comparison, letting the user analyze different cryptocurrency trends starting from a common date, defined as the earliest date on which all selected currencies have data.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

import time
from datetime import datetime
from datetime import date
from datetime import timedelta

import yfinance
from yahoo_fin.stock_info import get_data

import warnings
warnings.filterwarnings('ignore')




In [2]:
#Creating yesterday's date. Needed as the default end date for the functionized version of creating a table

today = date.today()
prev_day = pd.to_datetime(today - timedelta(days=1)).strftime("%Y-%m-%d")

In [3]:
prev_day

'2022-05-03'

In [4]:
#Some example crypto tickers. A list like this will be used in the function to generate the table.

tickers = ['BTC-USD', 'ETH-USD', 'USDT-USD', 'DOGE-USD', 'SHIB-USD', 'DOT-USD', 'LTC-USD']

In [5]:
ticker = 'BTC-USD'
sample = get_data(ticker, start_date = "2017-01-01", end_date = prev_day, index_as_date = True, interval = '1d')

In [6]:
sample

date,open,high,low,close,adjclose,volume,ticker
2017-01-03,54.200001,55.240002,52.110001,52.330002,52.330002,727793,CL=F
2017-01-04,52.490002,53.430000,52.150002,53.259998,53.259998,512641,CL=F
2017-01-05,53.389999,54.119999,52.790001,53.759998,53.759998,517362,CL=F
2017-01-06,53.730000,54.320000,53.320000,53.990002,53.990002,528333,CL=F
2017-01-09,53.750000,53.830002,51.759998,51.959999,51.959999,564893,CL=F
...,...,...,...,...,...,...,...
2022-04-26,98.639999,102.779999,97.059998,101.699997,101.699997,351850,CL=F
2022-04-27,101.760002,102.989998,99.800003,102.019997,102.019997,278781,CL=F
2022-04-28,102.110001,105.680000,100.129997,105.360001,105.360001,312064,CL=F
2022-04-29,105.169998,107.989998,103.779999,104.690002,104.690002,294386,CL=F


In [7]:
#Here we store each resulting dataframe after scraping from Yahoo! Finance into a dictionary

crypto_data = {}

for ticker in tickers:
    crypto_data[ticker] = get_data(ticker, start_date = "2017-01-01", end_date = "2022-05-01", index_as_date = True, interval = '1d')
    

In [8]:
crypto_data['BTC-USD']

date,open,high,low,close,adjclose,volume,ticker
2017-01-01,963.658020,1003.080017,958.698975,998.325012,998.325012,147775008,BTC-USD
2017-01-02,998.617004,1031.390015,996.702026,1021.750000,1021.750000,222184992,BTC-USD
2017-01-03,1021.599976,1044.079956,1021.599976,1043.839966,1043.839966,185168000,BTC-USD
2017-01-04,1044.400024,1159.420044,1044.400024,1154.729980,1154.729980,344945984,BTC-USD
2017-01-05,1156.729980,1191.099976,910.416992,1013.380005,1013.380005,510199008,BTC-USD
...,...,...,...,...,...,...,...
2022-04-27,38120.300781,39397.917969,37997.312500,39241.121094,39241.121094,30981015184,BTC-USD
2022-04-28,39241.429688,40269.464844,38941.421875,39773.828125,39773.828125,33903704907,BTC-USD
2022-04-29,39768.617188,39887.269531,38235.535156,38609.824219,38609.824219,30882994649,BTC-USD
2022-04-30,38605.859375,38771.210938,37697.941406,37714.875000,37714.875000,23895713731,BTC-USD


Returns will be computed from the closing prices. We also want a cut-off point so that the table only includes dates on which every column has data, i.e., we don't want rows where some tickers have values but others do not yet.
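
As a minimal sketch of these two steps, using made-up prices rather than Yahoo! Finance data (the column names and values below are hypothetical), percentage returns can be taken with `pct_change` on the closing prices, and the cut-off can be found as the latest of the per-column first valid dates:

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices: coin B only has data from the third day onward.
idx = pd.date_range("2022-01-01", periods=5, freq="D")
closes = pd.DataFrame(
    {
        "A_Close": [100.0, 110.0, 99.0, 99.0, 108.9],
        "B_Close": [np.nan, np.nan, 10.0, 12.0, 9.0],
    },
    index=idx,
)

# Percentage returns computed from the closing prices.
returns = closes.pct_change().mul(100)
returns.columns = [col.replace("_Close", "_Return") for col in closes.columns]

# First date with data for each column, then the latest of those dates:
# from this date onward every column has a value.
common_start = closes.notna().idxmax().max()

# Truncate at the cut-off and zero-fill what is left (B's first return).
table = pd.concat([closes, returns], axis=1).loc[common_start:].fillna(0.0)

print(common_start.date())
print(table)
```

Here `notna().idxmax()` returns the first date with a value for each column, so taking `.max()` over those dates gives the first row at which no column is still empty.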

## Functionizing
Now we can put all of this together into a function that returns a table of closing prices and returns to use for analysis, plotting, and forecasting:

In [44]:
def get_close_column(crypto_dict):
    
    """
    This function returns the closing prices for each chosen cryptocurrency ticker
    
    args:
        crypto_dict (dict): The dictionary containing all cryptocurrency data returned from Yahoo! finance
        
    returns:
        data (DataFrame): A dataframe containing only the closing prices of the chosen cryptocurrencies
    
    """
    keys = crypto_dict.keys()
    
    data = pd.DataFrame()
    
    for key in keys:
        data[key+'_Close'] = crypto_dict[key]['close']
        
    return data


def get_volume_column(crypto_dict):
    
    """
    This function returns the volume for each chosen cryptocurrency ticker
    
    args:
        crypto_dict (dict): The dictionary containing all cryptocurrency data returned from Yahoo! finance
        
    returns:
        data (DataFrame): A dataframe containing only the volumes of the chosen cryptocurrencies
    
    """
    keys = crypto_dict.keys()
    
    data = pd.DataFrame()
    
    for key in keys:
        data[key+'_Volume'] = crypto_dict[key]['volume']
        
    return data
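
The two helpers above differ only in the column they select. One possible generalisation is sketched below with a hypothetical `get_column` helper and made-up frames standing in for the Yahoo! Finance downloads; neither is part of the original notebook.

```python
import pandas as pd


def get_column(crypto_dict, column):
    """Collect one column (e.g. 'close' or 'volume') from every ticker's DataFrame.

    Hypothetical helper, shown only as an illustration.
    """
    return pd.DataFrame(
        {
            f"{key}_{column.capitalize()}": frame[column]
            for key, frame in crypto_dict.items()
        }
    )


# Made-up data with the same column names that get_data returns.
idx = pd.date_range("2022-01-01", periods=3, freq="D")
fake_data = {
    "AAA-USD": pd.DataFrame({"close": [1.0, 2.0, 3.0], "volume": [10, 20, 30]}, index=idx),
    "BBB-USD": pd.DataFrame({"close": [4.0, 5.0, 6.0], "volume": [40, 50, 60]}, index=idx),
}

closes = get_column(fake_data, "close")
volumes = get_column(fake_data, "volume")
print(closes.columns.tolist())
print(volumes.columns.tolist())
```

One behavioural difference to keep in mind: building the frame from a dictionary of Series aligns on the union of all indexes, whereas assigning columns one at a time aligns each new column to the index of the first ticker.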

In [50]:
def create_ticker_table(start_date="2017-01-01", interval = '1d'):
    
    """
    This function creates a table that contains closing prices, volumes, and estimated returns for
    user-chosen Yahoo! Finance tickers, to be used for analysis, plotting, forecasting, etc.
    
    The tickers and the row frequency ('d' for daily, 'b' for business days) are collected
    interactively through input(). Data is pulled up to the previous calendar day.
    
    args:
        start_date (str): The beginning date to start the table from, formatted as "YYYY-MM-DD"
        interval (str): The time interval passed to get_data (e.g. '1d')
        
    returns:
        trunc_df_crypto (DataFrame): The table of closing prices, volumes, and returns, truncated to
            start at the first date on which every ticker has a closing price
    
    """
    
    tickers = []
    today = date.today()
    prev_day = pd.to_datetime(today - timedelta(days=1)).strftime("%Y-%m-%d")
    
    #Keep asking for tickers until the user enters "No" (case-insensitive)
    while True:
        crypto_str = input('Enter a ticker from Yahoo! Finance. Enter "No" to continue: ')
        if crypto_str.upper() == 'NO':
            break
        tickers.append(crypto_str)
            
    crypto_data = {}
    
    rate = input("Do you want daily or business day data? Enter 'd' for daily and 'b' for business: ")
    rate = rate.lower() #Using lower method for assurance
    
    while rate != 'b' and rate != 'd':
        rate = input("Please enter again: ")
        rate = rate.lower()

    for ticker in tickers:
        crypto_data[ticker] = get_data(ticker, start_date = start_date, end_date = prev_day, index_as_date = True, interval = interval)
    
    closes = get_close_column(crypto_data)
    volumes = get_volume_column(crypto_data)
    
    df_crypto = pd.concat([closes, volumes], axis=1)
    
    close_cols_width = closes.shape[1]
    
    df_crypto = df_crypto.iloc[1:]
    df_crypto = df_crypto.asfreq(freq=rate) #User may decide the rate between d and b
    df_crypto = df_crypto.ffill() #Forward-fill dates added by asfreq; fillna(method='ffill') is deprecated in recent pandas
    
    types = closes.columns.tolist()
    
    #squared returns = volatility
    
    for tick in types:
        df_crypto[tick.split('_')[0]+'_Return'] = df_crypto[tick].pct_change(1).mul(100)
        
        #Squared Returns:
        #df_crypto['Squared_'+tick.split(':')[0]+'_Return'] = df_crypto[tick.split(':')[0]+'_Return'].mul(df_crypto[tick.split(':')[0]+'_Return'])
        
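    #First date with a closing price for each ticker; the latest of these is the
    #earliest date on which every ticker has data
    null_elim = df_crypto.iloc[:,:close_cols_width].notna().idxmax().max()
    
    #Keep rows from that date onward and zero-fill any remaining gaps (e.g. the first return)
    trunc_df_crypto = df_crypto.loc[null_elim:, :].copy()
    trunc_df_crypto.fillna(0.0, inplace=True)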

    print()
    print('********** Table Head **********')
    display(trunc_df_crypto.head(15))
    print('********** Table Tail **********')
    display(trunc_df_crypto.tail(15))
    
    return trunc_df_crypto

In [51]:
custom_crypto_df = create_ticker_table()

Enter a ticker from Yahoo! Finance. Enter "No" to continue: NVDA
Enter a ticker from Yahoo! Finance. Enter "No" to continue: AMD
Enter a ticker from Yahoo! Finance. Enter "No" to continue: no
Do you want daily or business day data? Enter 'd' for daily and 'b' for business: b
b

********** Table Head **********


date,NVDA_Close,AMD_Close,NVDA_Volume,AMD_Volume,NVDA_Return,AMD_Return
2017-01-04,26.0975,11.43,119922000.0,40781200.0,0.0,0.0
2017-01-05,25.434999,11.24,98429600.0,38855200.0,-2.538559,-1.662297
2017-01-06,25.775,11.32,82285600.0,34453500.0,1.336741,0.711743
2017-01-09,26.82,11.49,91624800.0,37304800.0,4.054317,1.501767
2017-01-10,26.6175,11.44,88092000.0,29201600.0,-0.755031,-0.435163
2017-01-11,26.290001,11.2,52566400.0,39377000.0,-1.230391,-2.0979
2017-01-12,25.860001,10.76,62561600.0,75244100.0,-1.635604,-3.928568
2017-01-13,25.8575,10.58,45782000.0,38377500.0,-0.00967,-1.672865
2017-01-16,25.8575,10.58,45782000.0,38377500.0,0.0,0.0
2017-01-17,25.2775,9.82,58061200.0,70491800.0,-2.243063,-7.183367


********** Table Tail **********


date,NVDA_Close,AMD_Close,NVDA_Volume,AMD_Volume,NVDA_Return,AMD_Return
2022-04-12,215.039993,95.099998,66225800.0,89246400.0,-1.884384,-2.331318
2022-04-13,222.029999,97.739998,51694300.0,77728400.0,3.250561,2.776025
2022-04-14,212.580002,93.059998,56822500.0,73262000.0,-4.25618,-4.788214
2022-04-15,212.580002,93.059998,56822500.0,73262000.0,0.0,0.0
2022-04-18,217.830002,93.889999,52570100.0,80605800.0,2.469658,0.8919
2022-04-19,221.979996,96.93,51278100.0,77069500.0,1.905153,3.237833
2022-04-20,214.820007,94.019997,46897400.0,62489000.0,-3.225511,-3.00217
2022-04-21,201.830002,89.849998,65620900.0,76680600.0,-6.046925,-4.435225
2022-04-22,195.149994,88.139999,62471300.0,75017700.0,-3.30972,-1.903171
2022-04-25,199.020004,90.690002,64156600.0,93481000.0,1.983095,2.893128
