Plotting Historical Share Data 

Analyzing a company's share price history can be an effective way to assess its growth. A time series plot of the daily closing price over a long period makes rises and falls in the share price easy to see. These patterns can then be used to forecast future prices and (hopefully) aid in investment strategy. This notebook outlines the process of downloading a company's share data and plotting its historical closing prices using Python and Bokeh, a Python plotting library. The data will come from the Yahoo Finance API.

The first thing we need to do is import what we need from the Bokeh library: the figure function to create the plot, the show function to display it, and output_notebook so the plot renders inside the notebook. We're also going to import the Pandas library, which provides a data structure to hold the data, and pandas-datareader, which allows remote access to Yahoo Finance to grab the data.

In [1]:
from bokeh.plotting import figure, show
from bokeh.io import output_notebook
import pandas as pd
from pandas_datareader import data

Next, we should ask the user to type in the ticker symbol of the company they would like to analyze. We will save this symbol in a variable for easier use. Note: not all ticker symbols may work because Yahoo Finance does not include all companies. A premium source of data like Bloomberg would be more reliable and accurate.

In [2]:
ticker = input("Enter a ticker symbol: ")

Enter a ticker symbol: TSLA
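
Since the symbol is typed by hand, it can help to tidy it up before using it. The following is a minimal sketch, not part of the original notebook: normalize_ticker is a hypothetical helper name, and it only strips whitespace and upper-cases the input. It does not check whether Yahoo Finance actually knows the symbol.

```python
def normalize_ticker(raw: str) -> str:
    """Hypothetical helper: clean up a ticker symbol typed by the user."""
    # Remove surrounding whitespace and use upper case, e.g. " tsla " -> "TSLA".
    symbol = raw.strip().upper()
    if not symbol:
        raise ValueError("ticker symbol must not be empty")
    return symbol


print(normalize_ticker("  tsla \n"))  # TSLA
```

In the notebook this could be used as ticker = normalize_ticker(input("Enter a ticker symbol: ")).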


We're then going to use the datareader to pull the share data of that company and store it in a dataframe called company. A dataframe is a data structure that models a table, or two-dimensional array. The first argument is the company ticker symbol; the second is the API we are pulling our data from; the next two arguments are the start and end dates of the data we want to get.

In [3]:
company = data.DataReader(ticker, 'yahoo', '2003-01-01', '2020-01-19')
company.head()

Date,High,Low,Open,Close,Volume,Adj Close
2010-06-29,25.0,17.540001,19.0,23.889999,18766300,23.889999
2010-06-30,30.42,23.299999,25.790001,23.83,17187100,23.83
2010-07-01,25.92,20.27,25.0,21.959999,8218800,21.959999
2010-07-02,23.1,18.709999,23.0,19.200001,5139800,19.200001
2010-07-06,20.0,15.83,20.0,16.110001,6866900,16.110001


As shown above, the variable company now stores share data, specifically the High, Low, Open, Close and Adj Close prices and the Volume for every trading day from the start date to the end date. The Date column is known as the index; it is not a column of data, but rather a set of labels for the rows. In order to graph the data with the dates on the x-axis, we need to make the dates a regular column of data. The following line does just that.

In [4]:
company.reset_index(inplace=True)
company.head()

,Date,High,Low,Open,Close,Volume,Adj Close
0,2010-06-29,25.0,17.540001,19.0,23.889999,18766300,23.889999
1,2010-06-30,30.42,23.299999,25.790001,23.83,17187100,23.83
2,2010-07-01,25.92,20.27,25.0,21.959999,8218800,21.959999
3,2010-07-02,23.1,18.709999,23.0,19.200001,5139800,19.200001
4,2010-07-06,20.0,15.83,20.0,16.110001,6866900,16.110001


Now, the Date column is a column of data in its own right, and a new index has been added that numbers the rows sequentially.
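
The difference between an index and a regular column is easier to see on a tiny table. The sketch below rebuilds three rows of the data above by hand (only the Adj Close column, to keep it short) and uses reset_index() without inplace=True, which returns a new dataframe instead of modifying the original. The names prices and flat are just for illustration.

```python
import pandas as pd

# Three rows copied from the table above, with the dates as the index.
prices = pd.DataFrame(
    {"Adj Close": [23.889999, 23.83, 21.959999]},
    index=pd.to_datetime(["2010-06-29", "2010-06-30", "2010-07-01"]),
)
prices.index.name = "Date"

# reset_index() moves the index into a regular column and numbers the rows.
flat = prices.reset_index()

print(list(prices.columns))  # ['Adj Close']
print(list(flat.columns))    # ['Date', 'Adj Close']
print(list(flat.index))      # [0, 1, 2]
```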

We now have the data stored in a format that is easy to graph. All that remains is to graph it. The first step is to create the figure object, a gridded plane on which our data will be drawn. We can specify the width and height of the graph and provide labels for the x and y axes. Since our x-axis values are dates, we should set the x-axis type to datetime so Bokeh can format them properly. The output_notebook line is included so that the plot displays inline in Jupyter Notebook.

In [5]:
output_notebook()
# width/height replace plot_width/plot_height, which newer Bokeh releases no longer accept
p = figure(width=900, height=600, x_axis_label='Year',
           y_axis_label='Closing Prices', x_axis_type="datetime")

We'll then plot our line on the figure with the data. We'll specify that we're graphing a line with the x coordinates as the dates from the data, and the y coordinates as the adjusted closing prices from the data. The source of the data is the variable company. We also set the width and color of the line. 

In [6]:
p.line(x='Date', y='Adj Close', line_width=2, source=company, line_color="#f5a623")
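
Passing the dataframe as source works because, as far as we understand it, Bokeh wraps the dataframe in a ColumnDataSource, where each column can be referred to by name. The sketch below does that wrapping explicitly on a few rows copied from the table above. The names prices, sketch and renderer are only for illustration and are not part of the notebook.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# A few rows copied from the table above, with Date as a regular column.
prices = pd.DataFrame({
    "Date": pd.to_datetime(["2010-06-29", "2010-06-30", "2010-07-01"]),
    "Adj Close": [23.889999, 23.83, 21.959999],
})

# Wrap the dataframe explicitly; each column becomes a named field.
source = ColumnDataSource(prices)

# x and y are column names, looked up in the source when the line is drawn.
sketch = figure(x_axis_type="datetime")
renderer = sketch.line(x="Date", y="Adj Close", source=source)

print(source.column_names)
```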

All that remains is simply showing the graph.

In [7]:
show(p)

As you can see, the graph displays the company's share price over the period for which data is available. For TSLA that is a little under ten years: we requested data from 2003, but the first row is dated 2010-06-29. One problem, however, is that the graph has too much 'noise'. This produces the jagged edges you can see in the graph, which make it difficult to see trends in the data. To better visualize the price trend, we can also graph the 100-day moving average, which is simply the average closing price over the last 100 trading days. It is considered 'moving' because each time the x-coordinate advances by one day, the 100-day window shifts forward with it and the average is recomputed.

We will start by creating a new column that stores the rolling mean of the Adj Close data. The window argument specifies that each average covers the most recent 100 rows. One problem with this approach is that the first 99 rows do not have 100 values behind them, so by default they would have no average at all. The min_periods argument specifies how many observations must be present before a value is computed. Setting it to 0 means that, for the first 99 rows, the new column holds the average of however many values are available so far.

In [8]:
company['100ma'] = company['Adj Close'].rolling(window=100, min_periods=0).mean()
company.head(10)

,Date,High,Low,Open,Close,Volume,Adj Close,100ma
0,2010-06-29,25.000000,17.540001,19.000000,23.889999,18766300,23.889999,23.889999
1,2010-06-30,30.420000,23.299999,25.790001,23.830000,17187100,23.830000,23.860000
2,2010-07-01,25.920000,20.270000,25.000000,21.959999,8218800,21.959999,23.226666
3,2010-07-02,23.100000,18.709999,23.000000,19.200001,5139800,19.200001,22.220000
4,2010-07-06,20.000000,15.830000,20.000000,16.110001,6866900,16.110001,20.998000
5,2010-07-07,16.629999,14.980000,16.400000,15.800000,6921700,15.800000,20.131667
6,2010-07-08,17.520000,15.570000,16.139999,17.459999,7711400,17.459999,19.750000
7,2010-07-09,17.900000,16.549999,17.580000,17.400000,4050600,17.400000,19.456250
8,2010-07-12,18.070000,17.000000,17.950001,17.049999,2202500,17.049999,19.188889
9,2010-07-13,18.639999,16.900000,17.389999,18.139999,2680100,18.139999,19.084000


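As you can see, the first '100ma' value equals the first Adj Close value, and each of the following early values is the average of all the prices seen so far (for example, the second is the mean of 23.889999 and 23.83, which is about 23.86). From the 100th row onward, each '100ma' value is the average of the most recent 100 values.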
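
The effect of min_periods is easier to follow on a toy series. The sketch below uses five made-up prices and a window of 3 instead of 100. The names closes, strict and partial are only for illustration.

```python
import pandas as pd

# Five made-up closing prices, chosen so the averages are easy to check.
closes = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

# Default: no value until the window holds 3 observations.
strict = closes.rolling(window=3).mean()

# min_periods=0: early rows average whatever values are available so far.
partial = closes.rolling(window=3, min_periods=0).mean()

print(strict.tolist())   # [nan, nan, 20.0, 30.0, 40.0]
print(partial.tolist())  # [10.0, 15.0, 20.0, 30.0, 40.0]
```

From the third row onward the two columns agree, because every window is full.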

Now, all we need to do is graph the '100ma' data on a line. This is the same process as the previous line. We will also show the graph. 

In [9]:
p.line(x='Date', y='100ma', line_width=2, source=company, line_color="#d30000")
show(p)

As you can see above, the red line displays the overall price trend. It cuts out the 'noise' of the data and presents a visually appealing way of following the direction of the prices.