<h1>Stock Analysis with Python (Bokeh)</h1>

<h2>Background</h2>

<p>Yahoo Finance provides free access to a wide range of financial data, including stock prices, historical records, and company information. That accessibility makes it a common choice for individual investors, traders, and researchers.</p>
<p>Its historical stock price data lets users examine past performance, identify trends, and backtest trading strategies, which makes it a foundation for many kinds of financial analysis and modeling.</p>
<p>Yahoo Finance is generally regarded as a reliable source of stock data. Occasional discrepancies or delays can occur, particularly during periods of high market volatility, but it remains adequate for most users' needs.</p>
<p>Using the yfinance library to fetch stock data from Yahoo Finance has several benefits:</p>
<ul><li>yfinance provides a simple interface for accessing historical market data, including stock prices, dividends, and splits.</li>
<li>Yahoo Finance offers free access to historical stock data for most publicly traded companies, which suits cost-conscious developers and analysts.</li>
<li>Yahoo Finance holds an extensive database of historical stock data, so users can retrieve enough history for long-term analyses and backtesting.</li>
<li>Although no financial data source is perfect, Yahoo Finance is a reputable platform with a long history of providing reliable market data.</li>
<li>yfinance returns data as pandas objects, which simplifies manipulating and analyzing stock data within the Python ecosystem.</li>
<li>The yfinance library is actively maintained by the community, which helps it keep up with changes on the Yahoo Finance platform.</li></ul>
<p>Overall, Yahoo Finance is a popular option for obtaining stock data because of its accessibility, reliability, depth of historical data, and ease of use. Alternative sources of stock data exist, and the right choice depends on specific requirements, preferences, and intended use cases.</p>

The purpose of this project is to perform Stock Analysis using Python (Bokeh) on the following tickers: GOOG (Alphabet Inc.), TSLA (Tesla, Inc.), AAPL (Apple Inc.), and MSFT (Microsoft Corporation).
<br><br><strong>Alphabet Inc. (GOOG): </strong>
<br>Alphabet Inc. was established in August 2015 by Google's co-founders, Larry Page and Sergey Brin, as part of a strategic restructuring initiative. This move aimed to segregate Google's core internet-related operations, including search, advertising, YouTube, and Android, from its more speculative or non-core ventures.
<br><br><strong>Tesla, Inc. (TSLA): </strong>
<br>Tesla, Inc. was founded in 2003 by engineers Martin Eberhard and Marc Tarpenning with the vision of revolutionizing the automotive industry through the production of electric vehicles, thereby reducing reliance on fossil fuels. In February 2004, Elon Musk joined the company as chairman of the board of directors and later assumed roles as CEO and product architect.
<br><br><strong>Apple Inc. (AAPL): </strong>
<br>Apple Inc. traces its origins back to April 1, 1976, when Steve Jobs, Steve Wozniak, and Ronald Wayne established the company. Initially focused on designing and marketing personal computers, Apple gained prominence with the introduction of the Apple I and Apple II. Throughout its evolution, the company underwent significant transitions, marked by changes in leadership and product emphasis. Key milestones include the debut of the Macintosh computer in 1984, the iPod portable media player in 2001, the revolutionary iPhone smartphone in 2007, and the iPad tablet in 2010.
<br><br><strong>Microsoft Corporation (MSFT): </strong>
<br>Microsoft Corporation was founded on April 4, 1975, by Bill Gates and Paul Allen. Initially specializing in the development and sale of BASIC interpreters for the Altair 8800, Microsoft evolved into one of the world's foremost technology firms. Renowned for its software offerings such as the Windows operating system and the Microsoft Office suite, the company has played a pivotal role in shaping the modern computing landscape.

<h2>Import Libraries</h2>

In [1]:
# Uncomment to install the required libraries if it is not installed
# !pip install bokeh
# !pip install pandas
# !pip install yfinance

# Import the libraries
from bokeh.models import ColumnDataSource, Select, DataTable, TableColumn, Div, TabPanel, Tabs, Spacer
from bokeh.plotting import figure
from bokeh.layouts import column, row, layout
from bokeh.io import show, output_notebook
import numpy as np
import pandas as pd
import yfinance as yf
from datetime import date

# Configures Bokeh to render its plots as interactive HTML directly within the notebook.
output_notebook()

<h2>Download stock ticker data from Yahoo Finance</h2>

In [2]:
# Declare defaults
DEFAULT_TICKERS = ["AAPL", "GOOG", "MSFT", "TSLA"]
START_DATE, END_DATE = "2020-01-01", date.today()
#=====================================================================
# Initialize an empty dictionary to store the data
stock_data = {}

# Iterate over each ticker symbol
for symbol in DEFAULT_TICKERS:
    # Download the data for the current symbol and store it in the dictionary
    stock_data[symbol] = yf.download(symbol, start=START_DATE, end=END_DATE)

# Accessing the data for each stock
goog_data = stock_data["GOOG"]
tsla_data = stock_data["TSLA"]
msft_data = stock_data["MSFT"]
aapl_data = stock_data["AAPL"]

[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed
[*********************100%%**********************]  1 of 1 completed


<h2>Data Exploration, Cleaning, and Preprocessing</h2>

<h3>Data Exploration</h3>

In [3]:
# Define a list of dataframes
dataframes = [aapl_data, goog_data, msft_data, tsla_data]

# Iterate over each dataframe and print its shape
for i, df in enumerate(dataframes, start=1):
    print(f"Shape of DataFrame {i}: {df.shape}")

Shape of DataFrame 1: (1100, 6)
Shape of DataFrame 2: (1100, 6)
Shape of DataFrame 3: (1100, 6)
Shape of DataFrame 4: (1100, 6)
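
Labels such as "DataFrame 1" do not say which ticker a shape belongs to. The following is a minimal sketch of a per-ticker summary, assuming the frames are kept in a dictionary keyed by ticker, as `stock_data` is above. The helper name `summarize_frames` and the synthetic frames are illustrative assumptions, not part of the notebook.

```python
import numpy as np
import pandas as pd


def summarize_frames(frames):
    """Return one row per ticker: row count, column count, first and last date."""
    rows = []
    for ticker, df in frames.items():
        rows.append(
            {
                "ticker": ticker,
                "rows": df.shape[0],
                "columns": df.shape[1],
                "first_date": df.index.min(),
                "last_date": df.index.max(),
            }
        )
    return pd.DataFrame(rows).set_index("ticker")


# Synthetic stand-in for the downloaded data (not real prices).
dates = pd.bdate_range("2024-01-01", periods=5)
demo_frames = {
    "AAA": pd.DataFrame({"Close": np.arange(5.0), "Volume": np.arange(5)}, index=dates),
    "BBB": pd.DataFrame({"Close": np.arange(5.0) * 2, "Volume": np.arange(5)}, index=dates),
}

summary = summarize_frames(demo_frames)
print(summary)
```

With the real data, `summarize_frames(stock_data)` would be expected to give the same information as the loop above, with ticker names instead of positions.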


In [4]:
aapl_data.tail()

Date,Open,High,Low,Close,Adj Close,Volume
2024-05-09,182.559998,184.660004,182.110001,184.570007,184.320007,48983000
2024-05-10,184.899994,185.089996,182.130005,183.050003,183.050003,50759500
2024-05-13,185.440002,187.100006,184.619995,186.279999,186.279999,72044800
2024-05-14,187.509995,188.300003,186.289993,187.429993,187.429993,52393600
2024-05-15,187.910004,190.649994,187.369995,189.720001,189.720001,70356000


In [5]:
goog_data.tail()

Date,Open,High,Low,Close,Adj Close,Volume
2024-05-09,171.149994,172.440002,169.929993,171.580002,171.580002,11937700
2024-05-10,169.690002,171.339996,167.910004,170.289993,170.289993,18740500
2024-05-13,165.847,170.949997,165.759995,170.899994,170.899994,19648600
2024-05-14,171.589996,172.779999,170.419998,171.929993,171.929993,18729500
2024-05-15,172.300003,174.046005,172.029999,173.880005,173.880005,20942700


In [6]:
msft_data.tail()

Date,Open,High,Low,Close,Adj Close,Volume
2024-05-09,410.570007,412.720001,409.100006,412.320007,411.577637,14689700
2024-05-10,412.940002,415.380005,411.799988,414.73999,413.993256,13402300
2024-05-13,418.01001,418.350006,410.820007,413.720001,412.975098,15440200
2024-05-14,412.019989,417.48999,411.549988,416.559998,415.809998,15109300
2024-05-15,417.899994,423.809998,417.269989,423.079987,423.079987,22217500


In [7]:
tsla_data.tail()

Date,Open,High,Low,Close,Adj Close,Volume
2024-05-09,175.009995,175.619995,171.369995,171.970001,171.970001,65950300
2024-05-10,173.050003,173.059998,167.75,168.470001,168.470001,72627200
2024-05-13,170.0,175.399994,169.0,171.889999,171.889999,67018900
2024-05-14,174.5,179.490005,174.070007,177.550003,177.550003,86407400
2024-05-15,179.899994,180.0,173.110001,173.990005,173.990005,79466700


In [8]:
aapl_data.dtypes

Open         float64
High         float64
Low          float64
Close        float64
Adj Close    float64
Volume         int64
dtype: object

In [9]:
goog_data.dtypes

Open         float64
High         float64
Low          float64
Close        float64
Adj Close    float64
Volume         int64
dtype: object

In [10]:
msft_data.dtypes

Open         float64
High         float64
Low          float64
Close        float64
Adj Close    float64
Volume         int64
dtype: object

In [11]:
tsla_data.dtypes

Open         float64
High         float64
Low          float64
Close        float64
Adj Close    float64
Volume         int64
dtype: object

In [12]:
aapl_data.describe()

,Open,High,Low,Close,Adj Close,Volume
count,1100.0,1100.0,1100.0,1100.0,1100.0,1100.0
mean,143.951364,145.582798,142.42417,144.072309,142.415852,95796780.0
std,33.695161,33.786895,33.578329,33.686891,34.091311,53387450.0
min,57.02,57.125,53.1525,56.092499,54.632896,24048300.0
25%,126.010002,127.3475,124.570625,125.859375,123.648037,60783420.0
50%,148.205002,149.739998,146.805,148.479996,146.585083,81117950.0
75%,171.162502,173.095001,169.947495,171.530003,170.037498,112440500.0
max,198.020004,199.619995,197.0,198.110001,197.589523,426510000.0


In [13]:
goog_data.describe()

,Open,High,Low,Close,Adj Close,Volume
count,1100.0,1100.0,1100.0,1100.0,1100.0,1100.0
mean,112.038572,113.372034,110.861205,112.147873,112.147873,28747540.0
std,27.229561,27.355978,27.065582,27.215452,27.215452,12714220.0
min,52.8255,53.566002,50.6768,52.831001,52.831001,6936000.0
25%,89.608747,90.912001,88.611502,89.652124,89.652124,20341420.0
50%,114.592251,116.35825,113.568001,114.868752,114.868752,25461000.0
75%,136.007126,137.37537,134.674622,136.282497,136.282497,33129100.0
max,175.990005,176.419998,172.029999,173.880005,173.880005,97798600.0


In [14]:
msft_data.describe()

,Open,High,Low,Close,Adj Close,Volume
count,1100.0,1100.0,1100.0,1100.0,1100.0,1100.0
mean,275.048491,277.9008,272.131709,275.143473,270.339338,29878350.0
std,66.258094,66.335822,65.972973,66.194933,67.619614,12823520.0
min,137.009995,140.570007,132.520004,135.419998,130.375549,9200800.0
25%,227.557495,231.217503,224.404995,227.232502,222.102997,21771550.0
50%,266.210007,270.389999,264.529999,267.610001,263.059509,26697150.0
75%,320.990005,324.962509,318.015007,321.814987,318.248535,33786350.0
max,429.829987,430.820007,427.160004,429.369995,428.596924,97012700.0


In [15]:
tsla_data.describe()

,Open,High,Low,Close,Adj Close,Volume
count,1100.0,1100.0,1100.0,1100.0,1100.0,1100.0
mean,207.265692,211.895302,202.258157,207.173208,207.173208,130570800.0
std,82.733958,84.374003,80.792162,82.548701,82.548701,85539950.0
min,24.98,26.990667,23.367332,24.081333,24.081333,29401800.0
25%,164.724998,169.532501,161.067497,165.035,165.035,79065900.0
50%,218.256668,222.775002,212.481667,218.041664,218.041664,105871000.0
75%,258.500839,263.752502,253.542503,258.862488,258.862488,150664300.0
max,411.470001,414.496674,405.666656,409.970001,409.970001,914082000.0


In [16]:
# Check for missing values
print(aapl_data.isnull().sum())

Open         0
High         0
Low          0
Close        0
Adj Close    0
Volume       0
dtype: int64


In [17]:
# Check for missing values
print(goog_data.isnull().sum())

Open         0
High         0
Low          0
Close        0
Adj Close    0
Volume       0
dtype: int64


In [18]:
# Check for missing values
print(msft_data.isnull().sum())

Open         0
High         0
Low          0
Close        0
Adj Close    0
Volume       0
dtype: int64


In [19]:
# Check for missing values
print(tsla_data.isnull().sum())

Open         0
High         0
Low          0
Close        0
Adj Close    0
Volume       0
dtype: int64
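
The four checks above repeat the same call once per ticker. As a sketch, the counts can be collected into a single table instead. It assumes a dictionary of frames keyed by ticker, like `stock_data`; the helper name `missing_report` and the synthetic frames (one with a deliberately missing value) are illustrative only.

```python
import numpy as np
import pandas as pd


def missing_report(frames):
    """Count NaN values per column for each ticker; rows are tickers."""
    counts = {ticker: df.isna().sum() for ticker, df in frames.items()}
    return pd.DataFrame(counts).T


# Synthetic frames (not real prices); BBB has one missing Close value.
dates = pd.bdate_range("2024-01-01", periods=4)
demo_frames = {
    "AAA": pd.DataFrame({"Close": [1.0, 2.0, 3.0, 4.0], "Volume": [10, 20, 30, 40]}, index=dates),
    "BBB": pd.DataFrame({"Close": [1.0, np.nan, 3.0, 4.0], "Volume": [10, 20, 30, 40]}, index=dates),
}

report = missing_report(demo_frames)
print(report)
```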


<h3>Data Cleaning</h3>

In [20]:
# Checking for any missing values in AAPL dataframe
aapl_data.isna().sum()*100/aapl_data.shape[0]

Open         0.0
High         0.0
Low          0.0
Close        0.0
Adj Close    0.0
Volume       0.0
dtype: float64

In [21]:
# dropna() returns a copy without the rows that contain missing values (NaN);
# it neither modifies the DataFrame in place nor resets the index.
#aapl_data = aapl_data.dropna() #-- Not needed here: there are no missing values

In [22]:
# Checking for any missing values in GOOG dataframe
goog_data.isna().sum()*100/goog_data.shape[0]

Open         0.0
High         0.0
Low          0.0
Close        0.0
Adj Close    0.0
Volume       0.0
dtype: float64

In [23]:
# dropna() returns a copy without the rows that contain missing values (NaN);
# it neither modifies the DataFrame in place nor resets the index.
#goog_data = goog_data.dropna() #-- Not needed here: there are no missing values

In [24]:
# Checking for any missing values in MSFT dataframe
msft_data.isna().sum()*100/msft_data.shape[0]

Open         0.0
High         0.0
Low          0.0
Close        0.0
Adj Close    0.0
Volume       0.0
dtype: float64

In [25]:
# dropna() returns a copy without the rows that contain missing values (NaN);
# it neither modifies the DataFrame in place nor resets the index.
#msft_data = msft_data.dropna() #-- Not needed here: there are no missing values

In [26]:
# Checking for any missing values in TSLA dataframe
tsla_data.isna().sum()*100/tsla_data.shape[0]

Open         0.0
High         0.0
Low          0.0
Close        0.0
Adj Close    0.0
Volume       0.0
dtype: float64

In [27]:
# dropna() returns a copy without the rows that contain missing values (NaN);
# it neither modifies the DataFrame in place nor resets the index.
#tsla_data = tsla_data.dropna() #-- Not needed here: there are no missing values
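
None of the four frames has missing values, so no rows need to be dropped here. If gaps did appear, a sketch like the one below shows how `dropna()` behaves: it returns a new DataFrame, leaves the original untouched, and keeps the date index of the surviving rows. The helper name `drop_missing_rows` and the synthetic frame are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def drop_missing_rows(df):
    """Return a copy without rows containing NaN; the date index is kept as is."""
    return df.dropna()


# Synthetic frame (not real prices) with one missing Close value.
dates = pd.bdate_range("2024-01-01", periods=4)
raw = pd.DataFrame(
    {"Close": [1.0, np.nan, 3.0, 4.0], "Volume": [10, 20, 30, 40]},
    index=dates,
)

cleaned = drop_missing_rows(raw)
print(len(raw), len(cleaned))
```

Because the result is a new object, it has to be assigned back (for example `aapl_data = drop_missing_rows(aapl_data)`) for the cleaning to take effect.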

<h3>Data Preprocessing</h3>

In [28]:
# Calculate returns
goog_data['Return'] = goog_data['Adj Close'].pct_change()
tsla_data['Return'] = tsla_data['Adj Close'].pct_change()
msft_data['Return'] = msft_data['Adj Close'].pct_change()
aapl_data['Return'] = aapl_data['Adj Close'].pct_change()
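
For reference, `pct_change()` computes the simple return `P_t / P_{t-1} - 1` for each row. The short sketch below checks this on three made-up prices (not market data) and shows that the first row has no previous price, so its return is NaN.

```python
import pandas as pd

# Made-up prices for illustration: up 2 %, then down 2 %.
prices = pd.Series([100.0, 102.0, 99.96], name="Adj Close")

returns = prices.pct_change()
manual = prices / prices.shift(1) - 1  # same formula written out by hand

print(returns)
```

The leading NaN also appears in each `Return` column above, which means the first date has no bar in the returns charts.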

In [29]:
# Daily percentage change, in percent (a return series; volatility would be its standard deviation)
goog_volatility = goog_data['Adj Close'].pct_change() * 100
tsla_volatility = tsla_data['Adj Close'].pct_change() * 100
msft_volatility = msft_data['Adj Close'].pct_change() * 100
aapl_volatility = aapl_data['Adj Close'].pct_change() * 100
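
The daily percentage change is a return series rather than a volatility measure. Volatility is commonly estimated as the standard deviation of those returns. The sketch below computes a rolling, annualized version. The 21-day window, the 252 trading days per year, and the helper name `rolling_volatility` are conventional assumptions, not values taken from this notebook.

```python
import numpy as np
import pandas as pd

TRADING_DAYS_PER_YEAR = 252  # common convention, assumed here


def rolling_volatility(prices, window=21):
    """Annualized rolling standard deviation of daily returns."""
    daily_returns = prices.pct_change()
    return daily_returns.rolling(window).std() * np.sqrt(TRADING_DAYS_PER_YEAR)


# Synthetic prices (not market data): a seeded random walk.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2024-01-01", periods=60)
prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, size=60))), index=dates)

vol = rolling_volatility(prices, window=21)
print(vol.dropna().head())
```

With the notebook's data, the input would be a column such as `aapl_data['Adj Close']`. The first `window` rows of the result are NaN because a full window of returns is not yet available.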

In [30]:
# Extract trading volumes
goog_vol = goog_data['Volume']
tsla_vol = tsla_data['Volume']
msft_vol = msft_data['Volume']
aapl_vol = aapl_data['Volume']

In [31]:
# Create an array of AAPL ['Adj Close'] dataset with associated dates 
aapl = np.array(aapl_data['Adj Close'])
aapl_dates = np.array(aapl_data.index, dtype=np.datetime64)

# Create an array of GOOG ['Adj Close'] dataset with associated dates
goog = np.array(goog_data['Adj Close'])
goog_dates = np.array(goog_data.index, dtype=np.datetime64)

# Create an array of MSFT ['Adj Close'] dataset with associated dates
msft = np.array(msft_data['Adj Close'])
msft_dates = np.array(msft_data.index, dtype=np.datetime64)

# Create an array of TSLA ['Adj Close'] dataset with associated dates
tsla = np.array(tsla_data['Adj Close'])
tsla_dates = np.array(tsla_data.index, dtype=np.datetime64)

<h2>Data Visualization</h2>

<h3>Spacer</h3>

In [32]:
# Create a spacer that adds vertical space between dashboard rows
spacer = Spacer(height=100)

<h3>Title</h3>

In [33]:
# Create a title to be displayed on the dashboard
title_div = Div(
    text='''
    <h1>Stock Analysis with Python (Bokeh)</h1>
    <h4>Source: Yahoo Finance [https://finance.yahoo.com/]</h4>
    <hr>''',
    width=600, sizing_mode='scale_width',
)

<h3>Scatter Markers</h3>

In [34]:
# create a new plot with default tools, using figure
p = figure(x_axis_type="datetime", title="Stock Volumes", x_axis_label='Date', 
           y_axis_label='Volume', width=500, height=300, sizing_mode="scale_width")

# add renderer with x and y coordinates, size, color, and alpha
p.scatter(x=tsla_dates, y=tsla_vol, size=5, line_color="red", marker="hex_dot",
          fill_color="red", legend_label= "TSLA", fill_alpha=0.5)
p.scatter(x=msft_dates, y=msft_vol, size=5, line_color="green", marker="star",
       fill_color="green", legend_label= "MSFT", fill_alpha=0.5)
p.scatter(x=goog_dates, y=goog_vol, size=5, line_color="blue", marker="diamond",
          fill_color="blue", legend_label= "GOOG", fill_alpha=0.5)
p.scatter(x=aapl_dates, y=aapl_vol, size=5, line_color="orange", marker="circle",
         fill_color="orange", legend_label= "AAPL", fill_alpha=0.5)
p.title.text_font_size = "18px"
p.title.align = "center"
p.legend.title = "Stock Tickers"
#p.legend.orientation = "horizontal"
p.legend.location = "top_right"
p.legend.label_text_font_size = '8pt'
p.outline_line_width = 5
p.outline_line_alpha = 0.8
p.outline_line_color = "green"

# Display plots or layouts
show(p)

<h3>Line Graph</h3>

In [35]:

line_fig = figure(x_axis_type="datetime", title="Stock Prices", x_axis_label='Date', 
                  y_axis_label='Price', width=500, height=300, sizing_mode="scale_width")

# Add line glyphs
line_fig.line(tsla_dates, tsla, legend_label='TSLA', color='red')
line_fig.line(msft_dates, msft, legend_label='MSFT', color='green')
line_fig.line(goog_dates, goog, legend_label='GOOG', color='blue')
line_fig.line(aapl_dates, aapl, legend_label='AAPL', color='orange')
line_fig.title.text_font_size = "18px"
line_fig.title.align = "center"
line_fig.legend.title = "Stock Tickers"
line_fig.legend.location = "top_left"
#line_fig.legend.orientation = "horizontal"
line_fig.legend.label_text_font_size = '8pt'
line_fig.outline_line_width = 5
line_fig.outline_line_alpha = 0.8
line_fig.outline_line_color = "blue"

# Display plots or layouts
show(line_fig)
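
The four tickers trade at different price levels, so the raw lines mostly show those levels rather than relative performance. One optional approach is to rebase each series to 100 on its first date. This is a sketch on made-up numbers; the helper name `rebase` is hypothetical and not part of the notebook.

```python
import pandas as pd


def rebase(prices, base=100.0):
    """Scale a price series so that its first value equals `base`."""
    return prices / prices.iloc[0] * base


# Made-up prices for two tickers at very different price levels.
dates = pd.bdate_range("2024-01-01", periods=3)
cheap = pd.Series([50.0, 55.0, 60.0], index=dates)
expensive = pd.Series([400.0, 420.0, 380.0], index=dates)

cheap_rebased = rebase(cheap)
expensive_rebased = rebase(expensive)
print(cheap_rebased.round(2).tolist())
print(expensive_rebased.round(2).tolist())
```

A rebased series such as `rebase(aapl_data['Adj Close'])` could then be passed to `line_fig.line` in place of the raw prices.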

<h3>Bar Chart</h3>

In [36]:
# Create a vertical bar plot of daily returns for each of TSLA, MSFT, GOOG, and AAPL.
# On a datetime axis, bar width is measured in milliseconds: 0.9 day = 0.9 * 86_400_000 ms.
tsla_bar = figure(x_axis_type="datetime", title="TSLA Stock Returns", y_axis_label='Return', 
            x_axis_label='Dates', width=500, height=200, sizing_mode="scale_width")
tsla_bar.vbar(x=tsla_dates, top=tsla_data['Return'], width=0.9 * 86_400_000, color='red')
tsla_bar.title.text_font_size = "18px"
tsla_bar.title.align = "center"
tsla_bar.outline_line_width = 5
tsla_bar.outline_line_alpha = 0.8
tsla_bar.outline_line_color = "red"

msft_bar = figure(x_axis_type="datetime", title="MSFT Stock Returns", y_axis_label='Return', 
            x_axis_label='Dates', width=500, height=200, sizing_mode="scale_width")
msft_bar.vbar(x=msft_dates, top=msft_data['Return'], width=0.9 * 86_400_000, color='green')  # width in ms (0.9 day)
msft_bar.title.text_font_size = "18px"
msft_bar.title.align = "center"
msft_bar.outline_line_width = 5
msft_bar.outline_line_alpha = 0.8
msft_bar.outline_line_color = "green"

goog_bar = figure(x_axis_type="datetime", title="GOOG Stock Returns", y_axis_label='Return', 
            x_axis_label='Dates', width=500, height=200, sizing_mode="scale_width")
goog_bar.vbar(x=goog_dates, top=goog_data['Return'], width=0.9 * 86_400_000, color='blue')  # width in ms (0.9 day)
goog_bar.title.text_font_size = "18px"
goog_bar.title.align = "center"
goog_bar.outline_line_width = 5
goog_bar.outline_line_alpha = 0.8
goog_bar.outline_line_color = "blue"

aapl_bar = figure(x_axis_type="datetime", title="AAPL Stock Returns", y_axis_label='Return', 
            x_axis_label='Dates', width=500, height=200, sizing_mode="scale_width")
aapl_bar.vbar(x=aapl_dates, top=aapl_data['Return'], width=0.9 * 86_400_000, color='orange')  # width in ms (0.9 day)
aapl_bar.title.text_font_size = "18px"
aapl_bar.title.align = "center"
aapl_bar.outline_line_width = 5
aapl_bar.outline_line_alpha = 0.8
aapl_bar.outline_line_color = "orange"

# Add each vertical bar plot to a tabpanel
tab1 = TabPanel(child=aapl_bar, title="AAPL")
tab2 = TabPanel(child=goog_bar, title="GOOG")
tab3 = TabPanel(child=msft_bar, title="MSFT")
tab4 = TabPanel(child=tsla_bar, title="TSLA")

# Assign the tabpanels into tabs
return_bars = Tabs(tabs=[tab4, tab3, tab2, tab1], sizing_mode="scale_width")

# Display plots or layouts
show(return_bars)
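
Bokeh places datetime values on the axis as milliseconds since the epoch, so the `width` of a `vbar` on a datetime axis is also measured in milliseconds. The sketch below converts a width given in days to that unit using only the standard library. The helper name `bar_width_ms` is an illustrative assumption.

```python
from datetime import timedelta


def bar_width_ms(days):
    """Convert a bar width in days to milliseconds for a datetime axis."""
    return timedelta(days=days).total_seconds() * 1000


width = bar_width_ms(0.9)
print(width)
```

The result could be passed as `width=bar_width_ms(0.9)` to each `vbar` call, so that a bar covers most of one trading day.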

<h2>Dashboard</h2>

In [37]:
# Set up a layout with multiple rows containing various Bokeh elements.
# Use a new name so that the imported layout() function is not overwritten.
dashboard = layout([[title_div], [spacer], [p, line_fig], [spacer], [return_bars]], sizing_mode='scale_width')

# Display plots or layouts
show(dashboard)