# Demo: Normal Distribution of Stock Ticker Data

This program retrieves stock price data using the Alpaca API and plots the data using Pandas. These plots show how the daily percent changes in closing prices are distributed compared to the expected normal probability distribution.

## Import Dependencies

Today, we will work with stock data to forecast possible outcomes using Monte Carlo simulations. As a first step, let's learn how to use histograms and density plots to see probability distributions in action.

Let's start by importing the required libraries and loading our Alpaca keys from the environment variables stored in the .env file (named newkeys.env in this demo).

In [8]:
# Import libraries and dependencies
import os
import pandas as pd
import alpaca_trade_api as tradeapi

# Load .env environment variables
from dotenv import load_dotenv
load_dotenv("newkeys.env")

%matplotlib inline

In [9]:
# Set Alpaca API key and secret
alpaca_api_key = os.getenv("ALPACA_API_KEY")
alpaca_secret_key = os.getenv("ALPACA_SECRET_KEY")

# Create the Alpaca API object
alpaca = tradeapi.REST(
    alpaca_api_key,
    alpaca_secret_key,
    api_version="v2"
)
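
If a key is missing from the environment file, os.getenv() silently returns None and the API call fails later with a less obvious error. The following is a minimal sketch of a guard for that case; the helper names require_env and mask, and the variable EXAMPLE_API_KEY, are hypothetical and not part of the Alpaca SDK. It also shows one way to confirm a key was loaded without echoing the full secret into the notebook output.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


def mask(secret, visible=4):
    """Hide all but the last few characters of a secret before displaying it."""
    return "*" * max(len(secret) - visible, 0) + secret[-visible:]


# Placeholder value for illustration only; a real key would come from the .env file
os.environ.setdefault("EXAMPLE_API_KEY", "not-a-real-key-1234")

key = require_env("EXAMPLE_API_KEY")
masked = mask(key)
print(masked)
```

In the notebook, the same guard could wrap the ALPACA_API_KEY and ALPACA_SECRET_KEY lookups before the REST object is created.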


## Get 1 Year's Worth of Stock Price Data via Alpaca API Call and Read in as DataFrame 

Let's continue by fetching stock price data over one year for Tesla (TSLA) and Coca-Cola (KO) using the Alpaca SDK.

Note that the limit parameter defaults to 100, so we set it to its maximum value of 1,000. Otherwise, each call returns at most 100 daily bars.

In [None]:
# Set the Tesla and Coca-Cola tickers
ticker = ["TSLA","KO"]

# Set timeframe to '1D'
timeframe = "1D"

# Set start and end datetimes for a fixed one-year window (2019-05-01 to 2020-05-01)
start_date = pd.Timestamp("2019-05-01", tz="America/New_York").isoformat()
end_date = pd.Timestamp("2020-05-01", tz="America/New_York").isoformat()

# Get 1 year's worth of historical data for Tesla and Coca-Cola
df_ticker = alpaca.get_barset(
    ticker,
    timeframe,
    start=start_date,
    end=end_date,
    limit=1000,
).df

# Display sample data
df_ticker.head(10)

## Pick closing prices and compute the daily returns

To analyze the probability distribution of these stock prices, we will create a new DataFrame containing only the closing prices over one year, and we will compute the daily returns.

In [None]:
# Create an empty DataFrame for closing prices
df_closing_prices = pd.DataFrame()

# Fetch the closing prices of KO and TSLA
df_closing_prices["KO"] = df_ticker["KO"]["close"]
df_closing_prices["TSLA"] = df_ticker["TSLA"]["close"]

# Drop the time component of the date
df_closing_prices.index = df_closing_prices.index.date

# Compute daily returns
df_daily_returns = df_closing_prices.pct_change().dropna()

# Display sample data
df_daily_returns.head(10)
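
To make the daily return calculation concrete, here is a small worked sketch. The three prices are made-up values for illustration, not real market data. It shows that pct_change() is equivalent to dividing each price by the previous one and subtracting 1, and that the first row has no previous price, which is why dropna() is needed.

```python
import pandas as pd

# Hypothetical closing prices for illustration only (not real market data)
prices = pd.Series([100.0, 102.0, 99.96], name="close")

# pct_change computes (price_t - price_{t-1}) / price_{t-1};
# the first row is NaN because it has no previous price, so we drop it
returns = prices.pct_change().dropna()

# The same calculation written out by hand
manual = (prices / prices.shift(1) - 1).dropna()

print(returns)
print(manual)
```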

At a glance, we can get an idea of how the values are distributed by generating the descriptive statistics of a DataFrame using the describe() function.



In [None]:
# Generate descriptive statistics
df_daily_returns.describe()

The standard deviation (std) measures how widely the daily returns are spread around their mean. A larger standard deviation means returns typically fall further from the mean, so the stock is more volatile. Conversely, a smaller standard deviation means returns cluster closer to the mean, and the stock is less volatile.
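
The link between standard deviation and volatility can be sketched with synthetic data. The two columns below are randomly generated stand-ins (the names low_vol and high_vol and the chosen parameters are assumptions for illustration), not the KO and TSLA returns. The scaling by the square root of 252 is a common annualization convention that assumes roughly 252 trading days per year and independent daily returns.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns: same mean, different spread (illustrative values only)
rng = np.random.default_rng(seed=0)
synthetic = pd.DataFrame({
    "low_vol": rng.normal(0.0, 0.01, 250),
    "high_vol": rng.normal(0.0, 0.04, 250),
})

# Standard deviation of daily returns for each column
daily_std = synthetic.std()

# Common convention (an assumption here): annualize with sqrt(252 trading days)
annualized_std = daily_std * np.sqrt(252)

# Share of days with an absolute move larger than 2%
large_moves = (synthetic.abs() > 0.02).mean()

print(daily_std)
print(annualized_std)
print(large_moves)
```

The column with the larger standard deviation also shows a larger share of big daily moves, which is what "more volatile" means in practice.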

### Plot Distributions

We can also visually analyze the probability distribution by plotting a histogram.

In [None]:
# Visualize distribution of Tesla percent change in closing price using a histogram plot
df_daily_returns["TSLA"].plot.hist()

In [None]:
# Visualize distribution of Coca-Cola percent change in closing price using a histogram plot
df_daily_returns["KO"].plot.hist()

In both histogram plots, the distributions resemble our "bell curve" shape of a normal distribution.

The percent change in daily price for both companies has a similar probability distribution: smaller changes in daily price are far more likely than large swings (although large swings are not impossible!).
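
Beyond eyeballing the histogram, one rough way to check how closely a return series follows a bell curve is to compare the share of observations within 1, 2, and 3 standard deviations of the mean against the roughly 68%, 95%, and 99.7% expected for a normal distribution. The sketch below uses randomly generated normal data as a stand-in (sample_returns and share_within are hypothetical names); in the notebook, a column such as df_daily_returns["TSLA"] could be passed in instead.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a column of daily returns (illustrative values only)
rng = np.random.default_rng(seed=1)
sample_returns = pd.Series(rng.normal(0.001, 0.02, 10_000))


def share_within(series, k):
    """Fraction of observations within k standard deviations of the mean."""
    return ((series - series.mean()).abs() <= k * series.std()).mean()


# A normal distribution gives roughly 0.68, 0.95, and 0.997 for k = 1, 2, 3
coverage = {k: share_within(sample_returns, k) for k in (1, 2, 3)}

# pandas reports excess kurtosis: about 0 for normal data, positive for fat tails
excess_kurtosis = sample_returns.kurt()

print(coverage)
print(excess_kurtosis)
```

Real stock returns often show positive excess kurtosis, meaning large swings occur more often than a normal distribution would predict.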

#### Besides a histogram, we can use a density plot to visualize a smoother shape of the probability distribution.

In [None]:
# Visualize the distribution of percent change in closing price for both stocks using a density plot
df_daily_returns.plot.density()

A density plot is a variation of the histogram that uses a statistical technique called kernel smoothing (kernel density estimation) to plot values as a smooth curve. An advantage of density plots over histograms is that the shape of the distribution is easier to read, since it does not depend on the number of bins (although it does depend on the smoothing bandwidth).
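
To show what kernel smoothing does under the hood, here is a minimal hand-written sketch of a Gaussian kernel density estimate using NumPy. It is not how pandas implements plot.density() (pandas delegates to SciPy's gaussian_kde); the function name gaussian_kde_manual, the synthetic samples, and the bandwidth rule are assumptions chosen for illustration. Each sample contributes a small bell-shaped bump, and the density curve is the average of those bumps.

```python
import numpy as np


def gaussian_kde_manual(samples, grid, bandwidth):
    """Estimate a density on `grid` by averaging Gaussian bumps centered on each sample."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # z[i, j] is the scaled distance between grid point i and sample j
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (len(samples) * bandwidth)


# Synthetic stand-in for daily returns (illustrative values only)
rng = np.random.default_rng(seed=2)
samples = rng.normal(0.0, 0.02, 500)
grid = np.linspace(-0.1, 0.1, 401)

# A normal-reference rule of thumb for the bandwidth (an assumption for this sketch)
bandwidth = 1.06 * samples.std(ddof=1) * len(samples) ** (-1 / 5)

density = gaussian_kde_manual(samples, grid, bandwidth)

# The estimated density should integrate to roughly 1 over the grid
area = density.sum() * (grid[1] - grid[0])
print(area)
```

A larger bandwidth produces a smoother, flatter curve, while a smaller one follows individual samples more closely, so the bandwidth plays a role similar to bin width in a histogram.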

When we overlay the two distributions together using the density plot, we can see that Coca-Cola's distribution has a higher frequency of small daily changes compared to Tesla. This is due to the volatility of a stock - the less volatile the stock, the smaller the standard deviation value. A smaller standard deviation means that the stock is less likely to have large (positive or negative) changes in value.

Probability distributions such as the normal distribution help us make educated guesses about what might happen to a stock or commodity in the future. When it comes to Monte Carlo simulations, the model randomly draws daily changes from a normal distribution to simulate plausible real-world outcomes.
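
The idea of drawing random daily changes from a normal distribution can be sketched as follows. This is a simplified illustration, not a complete forecasting model: the function simulate_price_paths and all parameter values are hypothetical, and in practice the mean and standard deviation would be estimated from historical returns such as df_daily_returns.

```python
import numpy as np


def simulate_price_paths(last_price, mean_return, std_return, n_days, n_paths, seed=None):
    """Simulate price paths by compounding normally distributed daily returns."""
    rng = np.random.default_rng(seed)
    # One row per simulated path, one column per trading day
    daily_returns = rng.normal(mean_return, std_return, size=(n_paths, n_days))
    # Compound the returns: price_t = last_price * product of (1 + r) up to day t
    return last_price * np.cumprod(1 + daily_returns, axis=1)


# Illustrative inputs only (not estimated from real data)
paths = simulate_price_paths(
    last_price=100.0,
    mean_return=0.0005,
    std_return=0.02,
    n_days=252,
    n_paths=1000,
    seed=42,
)

# Range containing the middle 95% of simulated prices on the final day
final_prices = paths[:, -1]
lower, upper = np.percentile(final_prices, [2.5, 97.5])
print(lower, upper)
```

Each row of paths is one possible future, and the spread of final prices across rows gives a range of plausible outcomes under the normality assumption.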

#### Even though most price distributions are not perfectly normal, as a FinTech professional it's important to understand what a normal distribution is, since it's the most common type of distribution assumed in technical analysis of a stock, commodity, or other asset.