<h2>Pandas datareader</h2>
<li><span style="color:blue">pandas_datareader</span> is a Python library that fetches financial and economic data from various web sources and returns it as a pandas DataFrame</li>
<li>Supports the <a href="https://pandas-datareader.readthedocs.io/en/latest/remote_data.html">documented sources</a>, and also some others, such as <span style="color:blue">Yahoo Finance</span></li>
<li>Documentation: <a href="http://pandas-datareader.readthedocs.io/en/latest/">http://pandas-datareader.readthedocs.io/en/latest/</a></li>


In [None]:
#!pip install pandas --upgrade
#!pip install pandas-datareader --upgrade

In [None]:
import pandas as pd
import pandas_datareader
print(pd.__version__)
print(pandas_datareader.__version__)

In [None]:
import pandas as pd #pandas library
from pandas_datareader import data as pdr #data readers (Yahoo, Tiingo, FRED, etc.)
import numpy as np
import matplotlib.pyplot as plt #Plotting library
import datetime #library for time support

<h1>Pandas and Timeseries data</h1>

<h2>Getting historical stock prices (e.g., from Yahoo Finance or Tiingo)</h2>
Usage: DataReader(ticker, source, start_date, end_date)<br>

<h2><b>datetime</b>: Python's standard-library module for working with dates and times</h2>
<li><b>datetime.datetime</b>: objects holding a date plus a time of day</li>
<li><b>datetime.date</b>: objects holding a calendar date only (no time of day)</li>
<li><b>datetime.timedelta</b>: objects holding a duration, e.g., the difference between two dates or datetimes</li>
<li>The function <b>datetime.datetime.strptime</b> converts a text string into a datetime object</li>
<li>The function <b>datetime.datetime.strftime</b> converts a datetime object into a formatted text string (for printing, saving to a file, etc.)</li>
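Besides subtracting two dates, a timedelta can be constructed directly and added to a date or datetime. A small sketch using arbitrary example dates:

```python
import datetime

# Build a duration explicitly and add it to a date: date + timedelta -> date
start = datetime.date(2023, 1, 31)
one_week = datetime.timedelta(weeks=1)
next_week = start + one_week
print(next_week)                            # 2023-02-07

# Subtracting two datetimes gives a timedelta; total_seconds() turns it into a number
t0 = datetime.datetime(2023, 1, 30, 9, 0, 0)
t1 = datetime.datetime(2023, 1, 31, 10, 30, 0)
elapsed = t1 - t0
print(elapsed, elapsed.total_seconds())     # 1 day, 1:30:00 91800.0
```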

In [None]:
import datetime
today = datetime.date(2023,1,31) #datetime.date object
yesterday = datetime.date(2023,1,30) #datetime.date object
diff = today - yesterday #datetime.timedelta object
print(today,yesterday,diff)
print(type(diff))

In [None]:
now = datetime.datetime.now() #datetime.datetime object
then = datetime.datetime(2000,1,1,0,0,0) #datetime.datetime object
diff = now - then #datetime.timedelta object
print(now, then, diff)

<h3>strptime and strftime</h3>
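<li>strptime parses a text string into a datetime object; strftime formats a datetime object as a text string</li>
<li>Both use a format string of codes such as %Y, %m, %d, %H; some codes (e.g., %A, %b, %p) follow your locale's names for weekdays, months, and AM/PM</li>
<li>Useful when you read times from a file as strings and need to convert them into datetime objects (or vice versa)</li>
<li>See <a href="https://pubs.opengroup.org/onlinepubs/007904875/functions/strptime.html">https://pubs.opengroup.org/onlinepubs/007904875/functions/strptime.html</a> for the format codes</li>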


In [None]:
start_time = "01/01/2000 10:26:44 AM"          #month/day/year, 12-hour clock with AM/PM
end_time = "Thursday, Sep 26 2024 15:25:21"    #weekday, abbreviated month, 24-hour clock
dt_start_time = datetime.datetime.strptime(start_time, "%m/%d/%Y %I:%M:%S %p")
dt_end_time = datetime.datetime.strptime(end_time, "%A, %b %d %Y %H:%M:%S")
time_to_completion = dt_end_time - dt_start_time    #a timedelta
europe_end_time = dt_end_time.strftime("%d/%m/%Y %H:%M:%S")  #day/month/year order
print(dt_start_time)
print(dt_end_time)
print(time_to_completion)
print(europe_end_time)

<h2>Getting data from Pandas Datareader</h2>
<li>DataReader typically takes up to 5 arguments:</li>
<ul>
    <li>A ticker (or other identifier), or a list of tickers (identifiers)</li>
    <li>The source (tiingo, fred, etc.)</li>
    <li>The start date for the data (optional; defaults to about 5 years before today)</li>
    <li>The end date for the data (optional; defaults to today)</li>
    <li>Any additional parameters (optional, e.g., API keys, databases, etc.)</li>
</ul>

In [None]:
#Example GDP data from FRED
pdr.DataReader("GDP",'fred')

<h3>Stock data from tiingo</h3>
<li>Requires an API key</li>
<li>Sign up at <a href="https://api.tiingo.com/">https://api.tiingo.com/</a></li>

In [None]:
with open('../credentials/tiingo','r') as f:
    TIINGO_API_KEY = f.read().strip()
    


In [None]:
from pandas_datareader import data as pdr
import datetime
start = datetime.datetime(2000, 1, 1)
end = datetime.datetime.today()
print(start, end)

#Tiingo returns a (symbol, date) MultiIndex; .loc["IBM"] keeps the IBM rows, indexed by date
df = pdr.DataReader(["IBM"], 'tiingo', start, end, api_key=TIINGO_API_KEY).loc["IBM"]

In [None]:
df

<h2>Timeseries data in Pandas</h2>
<li>DataFrames can be organized for timeseries data</li>
<li>Typically, the index is time and the columns are the data objects</li>
<li>The index may be a simple ordering or may contain time enabled data (DatetimeIndex in Pandas)</li>
<li>The data from pandas_datareader already has a DatetimeIndex (for Tiingo, after selecting one symbol with .loc)</li>
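A minimal sketch of such a timeseries DataFrame built by hand, assuming made-up prices on business days (freq="B") rather than downloaded data; the column name adjClose only mirrors the Tiingo data:

```python
import numpy as np
import pandas as pd

# Hypothetical business-day dates and made-up prices (not real market data)
dates = pd.date_range("2010-07-01", "2010-09-30", freq="B")
prices = pd.DataFrame(
    {"adjClose": np.linspace(100.0, 110.0, len(dates))},
    index=dates,
)
prices.index.name = "date"

print(type(prices.index).__name__)   # DatetimeIndex
print(prices.head())
```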

In [None]:
df.info()

<h3>Working with a timeseries data frame</h3>
<li>The data is organized with time as an index</li>
<li>So <span style="color:blue">time-based</span> selection and aggregation are possible, e.g., df.loc["August 2010"] selects a whole month</li>
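A sketch of time-based selection on a made-up business-day series (not the IBM data): a partial date string selects a whole month, and a string slice selects a date range with both endpoints included.

```python
import numpy as np
import pandas as pd

# Hypothetical business-day series; the values are placeholders
dates = pd.date_range("2010-07-01", "2010-09-30", freq="B")
s = pd.Series(np.arange(len(dates), dtype=float), index=dates)

aug = s.loc["2010-08"]                       # every row in August 2010
window = s.loc["2010-08-10":"2010-08-13"]    # inclusive date-range slice
print(len(aug))                              # 22 business days
print(window)
```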


In [None]:
#Get me the data for August 2010
df.loc["August 2010"]


<h4>Calculate percent changes</h4>
<li>The function pct_change computes the percent change between successive rows (successive times in timeseries data)</li>
<li>By default it compares each row with the previous one (periods=1)</li>
<li>Pass a number of periods, e.g., pct_change(2), to compare each row with the row that many steps earlier</li>
<li>The first rows have no earlier value to compare against, so they come out as NaN</li>
<li><a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pct_change.html">https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.pct_change.html</a></li>
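A small check of what pct_change computes, on made-up prices: the result matches dividing the series by its shifted copy and subtracting 1.

```python
import pandas as pd

# Toy prices (hypothetical values)
s = pd.Series([100.0, 110.0, 99.0, 121.0])

one = s.pct_change()        # (p[t] - p[t-1]) / p[t-1]
two = s.pct_change(2)       # (p[t] - p[t-2]) / p[t-2]

# The same two-period numbers computed by hand with shift()
manual_two = s / s.shift(2) - 1

print(one.round(4).tolist())   # [nan, 0.1, -0.1, 0.2222]
print(two.round(4).tolist())   # [nan, nan, -0.01, 0.1]
```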

In [None]:
p_change = df['adjClose'].pct_change() #One timeperiod percent change
p_change

In [None]:
pct_chg_2 = df.adjClose.pct_change(2) #two-period change: (p[t] - p[t-2]) / p[t-2]
pct_chg_2

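<li>Time-enabled indexing also lets us aggregate over calendar periods (weeks, months, quarters, years)</li>
<li>The function <span style="color:blue">resample</span> changes the frequency of the data, e.g., from daily rows to one row per month, and is followed by an aggregation such as mean or last</li>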
<li><a href="https://pandas.pydata.org/docs/user_guide/timeseries.html#timeseries-offset-aliases">resampling offset guide</a></li>
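A sketch of resample on made-up daily values. It uses the month-start alias "MS", so each bucket is labeled with the first day of its month; the aggregation after resample (mean, last, ...) decides how each month's rows collapse into one value.

```python
import numpy as np
import pandas as pd

# Hypothetical daily closes over three months of business days
dates = pd.date_range("2023-01-02", "2023-03-31", freq="B")
close = pd.Series(np.arange(1.0, len(dates) + 1), index=dates)

# One row per month; other aliases (weeks, quarters, years, ...) are in the offset-alias guide
monthly_mean = close.resample("MS").mean()
monthly_last = close.resample("MS").last()
print(monthly_mean)
print(monthly_last)
```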

In [None]:
#Get the average adjClose for each month that IBM has traded since 2000
#("M" = month end; pandas >= 2.2 prefers the alias "ME")
df.adjClose.resample("M").mean()

In [None]:
#Month by month averages
#Collects daily data for each month and computes the mean daily change for that month
print("Mean daily percent change for each month: \n",p_change.resample('M').mean()*100.0)

In [None]:
#Yearly
#Collects daily data for each year and computes the mean daily change for that year

print("Mean daily percent change for each year:\n",p_change.resample('Y').mean()*100.0)

In [None]:
#Quarterly
print("Mean daily percent change for each quarter:\n",p_change.resample('Q').mean()*100.0)

In [None]:
#df.adjClose.loc["2000-04-30"] #KeyError because 4/30/2000 was a Sunday, not a trading day
df.adjClose.loc["2000-04-28"]

In [None]:
#Get the month end prices: the last trading day's adjClose in each month
#('BM' = business month end; pandas >= 2.2 prefers 'BME'; similarly BQ/BY)
#Alternatives to last: first, max, min, sum, mean
df.adjClose.resample('BM').last()

<h3>Shift a series by n time periods</h3>

In [None]:
df.adjClose

In [None]:
df.adjClose.shift(1,freq='D')
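shift(1) and shift(1, freq='D') do different things: without freq the values move down one row against a fixed index, while with freq the index itself moves forward by one calendar day (so on trading-day data a Friday price gets a Saturday timestamp). A sketch on an arbitrary three-day series:

```python
import pandas as pd

# A tiny daily series (values are arbitrary)
s = pd.Series([1.0, 2.0, 3.0],
              index=pd.date_range("2024-01-01", periods=3, freq="D"))

# Without freq: values move down one row, the index is unchanged, the first row becomes NaN
shifted_rows = s.shift(1)

# With freq: the values stay in order, but every timestamp moves 1 day later
shifted_dates = s.shift(1, freq="D")

print(shifted_rows)
print(shifted_dates)
```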

<li>When a Series or DataFrame has a <span style="color:blue">DatetimeIndex</span>, plot() automatically uses the index as time on the x-axis</li>
<li>Super useful for plotting data</li>

In [None]:
p_change.resample('Y').sum()

In [None]:
p_change.resample('Y').sum().plot()

In [None]:
#NaNs (e.g., the first n rows of pct_change(n)) are skipped by aggregate functions such as mean()
n=13
df.adjClose.pct_change(n).mean()
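A sketch of the NaN handling on a made-up return series: mean() skips NaNs by default (skipna=True), count() reports how many non-NaN values were used, and skipna=False makes any NaN propagate.

```python
import numpy as np
import pandas as pd

# Hypothetical returns with missing values
r = pd.Series([np.nan, 0.02, -0.01, np.nan, 0.05])

print(r.mean())                 # NaNs skipped: (0.02 - 0.01 + 0.05) / 3
print(r.mean(skipna=False))     # nan
print(r.count(), len(r))        # 3 5
```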

<h3>Rolling windows</h3>
<li>The "rolling" function creates rolling (moving) windows of a fixed number of periods</li>
<li>For example, the n-period rolling window of the 1-period percent change, with n=3 in the cells below</li>

In [None]:
df['close'].pct_change(1)

In [None]:
n=3
df['close'].pct_change().rolling(n) #a Rolling object; nothing is computed until an aggregation such as mean() is applied

<h4>Calculate something on the rolling windows</h4>

<h4>Example: mean (the 3-day moving average of the 1-day percent change)</h4>

In [None]:
n=3
df['close'].pct_change().rolling(n).mean()
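What rolling(n).mean() computes, shown on a tiny made-up series: each value averages the current row and the n-1 rows before it. The min_periods argument (not used in the cells above) allows partial windows at the start.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up values

# Mean of the current row and the 2 rows before it;
# the first window-1 rows have incomplete windows and are NaN by default
ma3 = s.rolling(3).mean()
print(ma3.tolist())           # [nan, nan, 2.0, 3.0, 4.0]

# min_periods=1 averages whatever rows are available at the start
ma3_partial = s.rolling(3, min_periods=1).mean()
print(ma3_partial.tolist())   # [1.0, 1.5, 2.0, 3.0, 4.0]
```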

<h4>Calculate several moving averages and graph them</h4>

In [None]:
n=1
ma_8 = df['close'].pct_change(n).rolling(window=8).mean()
ma_13= df['close'].pct_change(n).rolling(window=13).mean()
ma_21= df['close'].pct_change(n).rolling(window=21).mean()
ma_34= df['close'].pct_change(n).rolling(window=34).mean()
ma_55= df['close'].pct_change(n).rolling(window=55).mean()

In [None]:
ma_8.iloc[1:500].plot(label='ma_8')   #positional slice: rows 1-499
ma_55.iloc[1:500].plot(label='ma_55')
plt.legend()

In [None]:
ma_8.plot()

<h3>Numpy style boolean masks work in pandas</h3>

In [None]:
ma_8

In [None]:
ma_8 > 0.001

In [None]:
ma_13[ma_8 > ma_13].mean()
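The masks above compare two Series that share the same dates; pandas aligns them on the index before comparing. A sketch with made-up values, including how to combine conditions with & (each condition in parentheses):

```python
import pandas as pd

# Two hypothetical moving-average series on the same dates
idx = pd.date_range("2024-01-01", periods=4, freq="D")
fast = pd.Series([0.3, 0.1, 0.4, 0.2], index=idx)
slow = pd.Series([0.2, 0.2, 0.2, 0.2], index=idx)

mask = fast > slow            # boolean Series, aligned on the dates
print(mask.tolist())          # [True, False, True, False]
print(slow[mask].mean())      # mean of slow on days where fast > slow

# Combine conditions with & (and) or | (or); parentheses are required
band = fast[(fast > 0.15) & (fast < 0.35)]
print(band)
```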

<h3>A simple mean-reversion trading strategy</h3>
<b>Don't try this at home!</b>
<li>Hypothesis: if the stock goes up more than 0.5% on day 1, it will go down on day 2</li>
<li>What is the expected return on day 2 under this strategy?</li>

In [None]:
up_days = df.adjClose.pct_change() > 0.005           #days with a gain above 0.5%
trade_days = up_days.shift(1, fill_value=False)      #the day after each such day
df.adjClose.pct_change()[trade_days].mean()          #average return on those next days
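The mechanics of the strategy on five made-up prices, so each step can be checked by hand. If the hypothesis held on real data, the average return on trade days would be negative, i.e., shorting on those days would pay off.

```python
import pandas as pd

# Hypothetical closing prices, made up to exercise the logic
prices = pd.Series([100.0, 101.0, 100.5, 102.0, 101.0],
                   index=pd.date_range("2024-01-01", periods=5, freq="B"))

rets = prices.pct_change()
up_days = rets > 0.005                            # NaN > 0.005 evaluates to False
trade_days = up_days.shift(1, fill_value=False)   # trade on the day AFTER an up day
print(trade_days.tolist())                        # [False, False, True, False, True]
print(rets[trade_days].mean())                    # average next-day return
```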

<h1>Quick analysis</h1>
<h2>Linear regression with pandas</h2>
<li>Example: TAN is the ticker for a solar ETF. FSLR, NEP, and SPWR are tickers of companies that build or lease solar panels, each with a different business model. We'll use pandas to study the risk-reward tradeoff among the four investments and to see how correlated they are</li>
<li>We'll use Tiingo to get pricing data. Tiingo needs an API key. Sign up and get the key from <a href="https://api.tiingo.com/">https://api.tiingo.com/</a></li>

In [None]:
with open('../credentials/tiingo','r') as f:
    TIINGO_API_KEY = f.read().strip()
    


In [None]:

import datetime
import pandas as pd
import pandas_datareader.data as pdr
start = datetime.datetime(2015,7,1)
end = datetime.date.today()
df = pdr.get_data_tiingo(['FSLR', 'TAN','NEP','SPWR'], start,end,api_key=TIINGO_API_KEY)


In [None]:
df

In [None]:
df.index.unique('symbol')

In [None]:
solar_df = pd.DataFrame()
for ind in df.index.unique('symbol'):
    solar_df[ind] = df['adjClose'].loc[ind]


In [None]:
solar_df
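The loop above reshapes the (symbol, date) MultiIndex into one column per ticker. Series.unstack does the same reshaping in one call; a sketch on a made-up frame shaped like the Tiingo result. One difference: the loop keeps only the dates of the first ticker (later columns are aligned to that index), while unstack keeps every date and fills gaps with NaN.

```python
import pandas as pd

# Hypothetical long-format frame shaped like a multi-ticker Tiingo result:
# a (symbol, date) MultiIndex with an adjClose column (values are made up)
dates = pd.date_range("2024-01-01", periods=3, freq="B")
idx = pd.MultiIndex.from_product([["FSLR", "TAN"], dates], names=["symbol", "date"])
long_df = pd.DataFrame({"adjClose": [10.0, 11.0, 12.0, 50.0, 51.0, 52.0]}, index=idx)

# Move the symbol level into the columns: one row per date, one column per symbol
wide = long_df["adjClose"].unstack("symbol")
print(wide)
```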

<h4>Let's calculate returns (the 1 day percent change)</h4>

In [None]:
rets = solar_df.pct_change()
print(rets)

<h4>Let's visualize the relationship between each stock and the ETF</h4>

In [None]:
import matplotlib.pyplot as plt
plt.scatter(rets.FSLR,rets.TAN)

In [None]:
plt.scatter(rets.NEP,rets.TAN)

In [None]:
plt.scatter(rets.SPWR,rets.TAN)

<h4>The correlation matrix</h4>

In [None]:
solar_corr = rets.corr()
print(solar_corr)
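DataFrame.corr computes Pearson correlation by default, using pairwise-complete observations, so the NaN first row from pct_change does not need to be dropped first. A sanity check on made-up columns that are exact linear functions of each other:

```python
import pandas as pd

a = pd.Series([1.0, 2.0, 4.0, 3.0, 5.0])   # made-up values
demo = pd.DataFrame({
    "a": a,
    "b": 2 * a + 1,   # exact positive linear function of a
    "c": -a,          # exact negative linear function of a
})
corr = demo.corr()    # Pearson correlation by default
print(corr)
```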

<h3>Basic risk analysis</h3>
<h4>We'll plot the mean and standard deviation of returns for each ticker to get a sense of the risk-return profile</h4>
<li>And add labels and formatting to each (mean,std) pair for readability</li>
<li>See <a href="https://matplotlib.org/3.3.2/api/_as_gen/matplotlib.pyplot.annotate.html">matplotlib annotate</a></li>

In [None]:
plt.scatter(rets.mean(), rets.std())
plt.xlabel('Expected returns')
plt.ylabel('Standard deviations')


In [None]:
list(zip(rets.columns, rets.mean(), rets.std()))

In [None]:
plt.scatter(rets.mean(), rets.std())
plt.xlabel('Expected returns')
plt.ylabel('Standard deviations')
for label, x, y in zip(rets.columns, rets.mean(), rets.std()):
    plt.annotate(
        label,                                                        #annotation text
        xy = (x, y),                                                  #point being annotated
        xytext = (20, -30),
        textcoords = 'offset points',                                 #text coord fmt (offset points from xy)
        ha = 'right',                                                 #horizontal alignment
        va = 'bottom',                                                #vertical alignment
        bbox = dict(boxstyle = 'round,pad=0.5', fc = 'yellow', alpha = 0.5),  #A yellow box around the text
        arrowprops = dict(arrowstyle = '->', connectionstyle = 'arc3,rad=0')) #arrow to the box
plt.show()


<h2>Regressions</h2>
<li><a href="https://www.statsmodels.org/stable/api.html">statsmodels</a> is a python library for estimating different statistical models</li>
<li>We'll use the <a href="https://www.statsmodels.org/stable/generated/statsmodels.regression.linear_model.OLS.html#statsmodels.regression.linear_model.OLS">OLS</a> class to run a linear regression with daily returns on the ETF as the dependent variable and daily returns on the component stocks as the independent variables</li>


<h3>Steps for regression</h3>
<li>Construct y (dependent variable series)</li>
<li>Construct the matrix (dataframe) X of independent variable series</li>
<li>Add an intercept (a constant column)</li>
<li>Build the regression model</li>
<li>Fit the model and get the results</li>
<h3>The statsmodels library contains various regression models. We'll use the OLS (Ordinary Least Squares) model</h3>
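To make the steps concrete, here is a NumPy-only sketch of what "add intercept" and "fit" compute, using np.linalg.lstsq on made-up, noise-free data in place of statsmodels. statsmodels solves the same least-squares problem and adds standard errors, t-statistics, R-squared and the rest of the summary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Steps 1-2: hypothetical data where y = 0.5 + 2*x1 - 1*x2 exactly (no noise)
x = rng.normal(size=(50, 2))
y = 0.5 + 2.0 * x[:, 0] - 1.0 * x[:, 1]

# Step 3: prepend an intercept column of ones (what sm.add_constant does by default)
X = np.column_stack([np.ones(len(x)), x])

# Steps 4-5: solve min ||X b - y||^2 for the coefficients b
coef, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print(coef)   # approximately [0.5, 2.0, -1.0]
```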

In [None]:
rets = solar_df.pct_change()
rets

In [None]:
import numpy as np
import statsmodels.api as sm
X=rets[['FSLR','NEP','SPWR']] #The independent variables data
X = sm.add_constant(X) #Add a constant (alpha)
y=rets['TAN'] #The dependent variable
model = sm.OLS(y,X,missing='drop') #Build the model (drop missing values)
result = model.fit() #fit the data to the model
print(result.summary()) #Print the results

<h4>Optionally, plot the fitted values against the actual y values (first 100 rows)</h4>

In [None]:
fig, ax = plt.subplots(figsize=(8,6))
ax.plot(y.iloc[:100], label='actual TAN returns')
ax.plot(result.fittedvalues.iloc[:100], label='fitted')
ax.legend()