# Statistical Fundamentals

In this activity, students will code along with the instructor to practice using statistical fundamentals to create a report for a group of stocks. For each stock, the report states whether it is overvalued or undervalued and whether it is more or less volatile than the market (represented by the SP500).

## Importing Required Modules

```python
# Import modules
import pandas as pd
from pathlib import Path
import matplotlib.pyplot as plt
```

## Loading the Stocks Data

  - Each CSV file contains a stock's closing price and the date of the closing price.

  - Create a `Path` object for each CSV filepath.

```python
# Set paths to CSV files
hd_csv_path = Path("../Resources/HD.csv")
intc_csv_path = Path("../Resources/INTC.csv")
mu_csv_path = Path("../Resources/MU.csv")
nvda_csv_path = Path("../Resources/NVDA.csv")
tsla_csv_path = Path("../Resources/TSLA.csv")
sp500_csv_path = Path("../Resources/sp500.csv")
```

For each CSV file, read the data into a `pandas` `DataFrame`.

  - Set the index column to be the date.

  - Infer the datetime format.

  - Parse all dates when the CSV file is loaded.

```python
# Read in CSV files
# Note: infer_datetime_format is deprecated as of pandas 2.0, where format inference
# is the default behavior; it can be omitted on pandas 2.x and later.
hd_df = pd.read_csv(
    hd_csv_path,
    index_col="date",
    infer_datetime_format=True,
    parse_dates=True
)
intc_df = pd.read_csv(
    intc_csv_path,
    index_col="date",
    infer_datetime_format=True,
    parse_dates=True
)
mu_df = pd.read_csv(
    mu_csv_path,
    index_col="date",
    infer_datetime_format=True,
    parse_dates=True
)
nvda_df = pd.read_csv(
    nvda_csv_path,
    index_col="date",
    infer_datetime_format=True,
    parse_dates=True
)
tsla_df = pd.read_csv(
    tsla_csv_path,
    index_col="date",
    infer_datetime_format=True,
    parse_dates=True
)
sp500_df = pd.read_csv(
    sp500_csv_path,
    index_col="date",
    infer_datetime_format=True,
    parse_dates=True
)
```
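Because the CSV files in `../Resources` are not reproduced here, the following sketch uses a small in-memory CSV with hypothetical prices to show what `index_col` and `parse_dates` do. It assumes the stock files share the two-column layout (`date`, `close`) suggested by the outputs later in this notebook, and it leaves out `infer_datetime_format` since format inference is the default in pandas 2.x.

```python
import io

import pandas as pd

# Hypothetical sample data in the same layout as the stock CSVs (date, close)
sample_csv = io.StringIO(
    "date,close\n"
    "2019-05-13,100.0\n"
    "2019-05-14,102.0\n"
    "2019-05-15,99.96\n"
)

# index_col makes "date" the row index; parse_dates=True converts that index to datetimes
prices = pd.read_csv(sample_csv, index_col="date", parse_dates=True)

print(type(prices.index).__name__)  # DatetimeIndex
print(prices["close"].iloc[-1])     # 99.96, the most recent close
```

With a `DatetimeIndex`, the rows stay in date order, so `.iloc[-1]` picks the most recent closing price, which the report relies on later.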


## Coding Statistical Measures in Python

### Create a function named `calculate_mean`

   - `calculate_mean` should return the average value of a given `list` or `Series`.

   - $\mu = \frac{\sum{x_{i}}}{n}$

   - Choose a function name that will not conflict with any modules that may have been imported.

```python
# Create a function named 'calculate_mean'.
def calculate_mean(data):
    # Convert the list or Series to a float Series, then return the mean
    # as a single number rather than a one-element Series
    return pd.Series(data, dtype=float).mean()
```

```python
# Test the `calculate_mean` function
data = [1, 2, 3, 4, 5]
data_average = calculate_mean(data)
print(data_average)
```

```text
3.0
```
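The `calculate_mean` function relies on pandas. To see the formula $\mu = \frac{\sum{x_{i}}}{n}$ itself at work, here is a plain-Python sketch; the name `mean_from_formula` is hypothetical and not part of the activity.

```python
def mean_from_formula(values):
    """Return the arithmetic mean: the sum of the values divided by their count."""
    values = list(values)
    if not values:
        raise ValueError("mean requires at least one value")
    return sum(values) / len(values)


print(mean_from_formula([1, 2, 3, 4, 5]))  # 3.0
```

Raising an error on empty input avoids a `ZeroDivisionError` with a less helpful message.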


### Create a function named `calculate_variance`
   - Variance is the average squared deviation from the mean. Dividing by $n - 1$ instead of $n$ gives the sample variance.

   - $S^{2} = \frac{\sum{(x_{i} - \mu)^{2}}}{n - 1}$


```python
# Create a function named 'calculate_variance'.
def calculate_variance(data_set):
    # Return the sample variance of data_set; pandas divides by n - 1 by default,
    # matching the formula above
    return pd.Series(data_set, dtype=float).var()
```

```python
# Test the `calculate_variance` function
data = [1, 2, 3, 4, 5]
data_variance = calculate_variance(data)
print(data_variance)
```

```text
2.5
```
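The $n - 1$ denominator (Bessel's correction) is what pandas' `var()` uses by default (`ddof=1`), which is why the pandas-based function reports a variance of 2.5 for `[1, 2, 3, 4, 5]`. A plain-Python sketch of the same formula, using the hypothetical name `sample_variance`:

```python
def sample_variance(values):
    """Return the sample variance: squared deviations from the mean, divided by n - 1."""
    values = list(values)
    n = len(values)
    if n < 2:
        raise ValueError("sample variance requires at least two values")
    mu = sum(values) / n
    squared_deviations = [(x - mu) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


print(sample_variance([1, 2, 3, 4, 5]))  # 2.5
```

For `[1, 2, 3, 4, 5]` the squared deviations from the mean of 3 are 4, 1, 0, 1 and 4, which sum to 10; dividing by $n - 1 = 4$ gives 2.5.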


### Create a function named `calculate_standard_deviation`

 - The standard deviation is the square root of the variance.

 - $S = \sqrt{S^{2}}$

```python
# Create a function named 'calculate_standard_deviation'.
def calculate_standard_deviation(data_set):
    # Return the sample standard deviation of data_set (the square root of the sample variance)
    return pd.Series(data_set, dtype=float).std()
```

```python
# Test the `calculate_standard_deviation` function
data = [1, 2, 3, 4, 5]
data_standard_deviation = calculate_standard_deviation(data)
print(data_standard_deviation)
```

```text
1.5811388300841898
```
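As a cross-check, the sketch below computes the sample standard deviation directly from the formula and compares it with the standard library's `statistics.stdev`, which also divides by $n - 1$. The function name `sample_standard_deviation` is hypothetical.

```python
import math
import statistics


def sample_standard_deviation(values):
    """Return the sample standard deviation: the square root of the sample variance."""
    values = list(values)
    n = len(values)
    if n < 2:
        raise ValueError("sample standard deviation requires at least two values")
    mu = sum(values) / n
    variance = sum((x - mu) ** 2 for x in values) / (n - 1)
    return math.sqrt(variance)


data = [1, 2, 3, 4, 5]
print(sample_standard_deviation(data))  # 1.5811388300841898
# statistics.stdev uses the same n - 1 denominator
print(math.isclose(sample_standard_deviation(data), statistics.stdev(data)))  # True
```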


## Coding Helper Functions

### Create a function named `check_value`

   - The function should compare the most recent price of the asset to its mean price.

   - If the most recent price is greater than the mean price, the asset is overvalued.

   - If the most recent price is less than the mean price, the asset is undervalued.

   - If neither is true, the most recent price equals the mean price.

```python
# Create a function to check the most recent price against the mean price to determine if the stock is overvalued.
def check_value(current_price, mean_price):
    if current_price > mean_price:
        return "The stock is overvalued."
    elif current_price < mean_price:
        return "The stock is undervalued."
    else:
        return "The stock is properly valued."
```
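Because prices are floats, a closing price is almost never exactly equal to the mean, so the "properly valued" branch of `check_value` rarely fires. One possible refinement, sketched below, treats prices within a relative tolerance of the mean as fairly valued. The function name `check_value_with_tolerance` and the 1% default tolerance are hypothetical choices, not part of the activity.

```python
import math


def check_value_with_tolerance(current_price, mean_price, rel_tol=0.01):
    """Compare a price to its mean, treating prices within rel_tol of the mean as fairly valued.

    rel_tol is a hypothetical parameter (default 1%), not part of the original activity.
    """
    if math.isclose(current_price, mean_price, rel_tol=rel_tol):
        return "The stock is properly valued."
    if current_price > mean_price:
        return "The stock is overvalued."
    return "The stock is undervalued."


print(check_value_with_tolerance(120.0, 100.0))  # The stock is overvalued.
print(check_value_with_tolerance(100.5, 100.0))  # The stock is properly valued.
print(check_value_with_tolerance(80.0, 100.0))   # The stock is undervalued.
```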
    

### Create a function named `compare_volatility`
   
   - The function should compare the standard deviation of an asset's daily percent change to the market's.

   - If the asset's standard deviation is greater than the market's, the stock is more volatile; if it is smaller, the stock is less volatile. Equal values mean its volatility is on par with the market.

```python
# Create a function to compare the volatility with the underlying market
def compare_volatility(stock_std, market_std):
    if stock_std > market_std:
        return "The stock is more volatile."
    elif stock_std < market_std:
        return "The stock is less volatile."
    else:
        return "The stock's volatility is on par with the market."
```
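`compare_volatility` expects two numbers: the standard deviation of the stock's daily percent change and the same measure for the market. The sketch below computes both from short, made-up price series (hypothetical data, for illustration only), so the comparison is between returns rather than raw prices.

```python
import pandas as pd

# Hypothetical closing prices for one stock and for the market index
stock_close = pd.Series([100.0, 104.0, 98.0, 103.0, 97.0])
market_close = pd.Series([2000.0, 2010.0, 2004.0, 2012.0, 2008.0])

# Daily percent change, (p_t - p_{t-1}) / p_{t-1}; the first value is NaN, so drop it
stock_std = stock_close.pct_change().dropna().std()
market_std = market_close.pct_change().dropna().std()

# The stock's returns swing by several percent a day, the market's by well under 1%
print(stock_std > market_std)  # True
```

Passing these two values to `compare_volatility` would return "The stock is more volatile."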
    

## Coding the Stocks Report

### Calculate the Daily Percent Change for the SP500

```python
# Calculate the daily percent changes for the sp500 and drop the NaN in the first row
sp500_daily_returns = sp500_df.pct_change().dropna()
print(sp500_daily_returns)
```

```text
               close
date                
2014-05-21  0.008116
2014-05-22  0.002362
2014-05-23  0.004248
2014-05-27  0.005988
2014-05-28 -0.001114
...              ...
2019-05-09 -0.003021
2019-05-10  0.003720
2019-05-13 -0.024131
2019-05-14  0.008016
2019-05-15  0.005839

[1255 rows x 1 columns]
```
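`pct_change()` divides each value by the previous one and subtracts 1. The first row has no previous price, so it becomes NaN, which is why `dropna()` follows. A small check on hypothetical prices:

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices on three consecutive trading days
close = pd.Series([200.0, 210.0, 199.5], name="close")

# pandas' daily percent change, with the leading NaN dropped
returns = close.pct_change().dropna()

# The same calculation written out: p_t / p_(t-1) - 1
manual = (close / close.shift(1) - 1).dropna()

print(returns.tolist())              # approximately [0.05, -0.05]
print(np.allclose(returns, manual))  # True
```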


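### Calculate the Standard Deviation for the SP500

```python
# Calculate the standard deviation of the sp500's daily percent change (not of its price level),
# which is the measure compare_volatility expects
sp500_standard_dev = sp500_df["close"].pct_change().dropna().std()
print(sp500_standard_dev)
```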
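Volatility comparisons use the standard deviation of percent changes rather than of prices because price dispersion grows with the price level. In the sketch below (hypothetical prices), scaling every price by 10 multiplies the price standard deviation by 10 but leaves the standard deviation of daily percent change unchanged.

```python
import math

import pandas as pd

# Hypothetical prices for a low-priced stock, and the same moves at ten times the price level
cheap = pd.Series([10.0, 10.5, 10.2, 10.8, 10.4])
expensive = cheap * 10

# The price standard deviation scales with the price level...
print(math.isclose(expensive.std(), 10 * cheap.std()))  # True

# ...but the standard deviation of daily percent change does not
cheap_ret_std = cheap.pct_change().dropna().std()
expensive_ret_std = expensive.pct_change().dropna().std()
print(math.isclose(cheap_ret_std, expensive_ret_std))  # True
```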


### Create a Python Dictionary of Stocks to Run the Report On
   
   - Map each stock name to its `DataFrame`.

   - Do not include the SP500.

   - Example: `stocks_to_check = {"stock_name": stock_df}`

```python
# Create a dictionary for all stocks except the sp500
stocks = {
    "HD": hd_df,
    "INTC": intc_df,
    "MU": mu_df,
    "NVDA": nvda_df,
    "TSLA": tsla_df
}
```

### Generate the Report

  - Loop through the dictionary of stocks.

  - **Hint**: Use the `items()` method for dictionaries. You can read more on the [documentation page](https://docs.python.org/3/tutorial/datastructures.html#looping-techniques).

  - For each stock:
    * Calculate the daily percent change.
    * Get the most recent price.
    * Calculate the mean and standard deviation using the functions you created.
    * Print the stock's name.
    * Print the statistics that you calculated.
    * Use `check_value` to determine whether the stock is overvalued or undervalued.
    * Use `compare_volatility` to check whether the stock is more or less volatile than the SP500.
    * Create a box plot of the daily percent change.

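```python
# Loop through the stocks in the dictionary and compare their performance with the sp500.

# Standard deviation of the sp500's daily percent change, used as the market benchmark
market_std = sp500_df["close"].pct_change().dropna().std()

for stock_name, dataframe in stocks.items():
    closing_prices = dataframe["close"]

    # Calculate the daily percent change for the stock
    daily_returns = closing_prices.pct_change().dropna()

    # Get the most recent price
    most_recent_price = closing_prices.iloc[-1]

    # Calculate the mean closing price (check_value compares the latest price against it)
    mean_price = closing_prices.mean()

    # Calculate the mean and standard deviation of the daily percent change
    avg_pct_change = daily_returns.mean()
    std_dev = daily_returns.std()

    # Print the stock name and calculated statistics
    print(stock_name)
    print(f"  Most recent price: {most_recent_price:.2f}")
    print(f"  Mean price: {mean_price:.2f}")
    print(f"  Mean daily percent change: {avg_pct_change:.4%}")
    print(f"  Std. dev. of daily percent change: {std_dev:.4%}")

    # Using check_value, check if the stock is overvalued or undervalued
    print(" ", check_value(most_recent_price, mean_price))

    # Compare the stock's volatility with the market
    print(" ", compare_volatility(std_dev, market_std))

    # Plot a box plot of the daily percent change
    daily_returns.plot.box(title=f"{stock_name} Daily Percent Change")
    plt.show()
```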
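Printing works for five stocks, but collecting the results into a `DataFrame` makes the report easier to sort and compare. The sketch below applies the same valuation rule as `check_value` and the same volatility rule as `compare_volatility`, assuming each stock `DataFrame` has a `close` column as in this notebook. The helper name `build_report`, its column names, and the short price series are hypothetical stand-ins for the CSV data.

```python
import pandas as pd


def build_report(stocks, market_close):
    """Summarize each stock's valuation and volatility relative to a market index.

    Assumes each DataFrame in `stocks` has a "close" column, as in this activity,
    and that `market_close` is the market's closing-price Series.
    """
    # Benchmark: standard deviation of the market's daily percent change
    market_std = market_close.pct_change().dropna().std()

    rows = []
    for name, frame in stocks.items():
        close = frame["close"]
        latest_price = close.iloc[-1]
        mean_price = close.mean()
        daily_std = close.pct_change().dropna().std()

        if latest_price > mean_price:
            valuation = "overvalued"
        elif latest_price < mean_price:
            valuation = "undervalued"
        else:
            valuation = "at mean"

        rows.append(
            {
                "stock": name,
                "latest_price": latest_price,
                "mean_price": mean_price,
                "daily_std": daily_std,
                "valuation": valuation,
                "more_volatile_than_market": bool(daily_std > market_std),
            }
        )

    # One row per stock, indexed by the stock name
    return pd.DataFrame(rows).set_index("stock")


# Hypothetical prices standing in for the CSV files
sample_stocks = {
    "AAA": pd.DataFrame({"close": [50.0, 52.0, 55.0, 60.0]}),
    "BBB": pd.DataFrame({"close": [80.0, 79.5, 80.5, 78.0]}),
}
sample_market = pd.Series([3000.0, 3010.0, 3005.0, 3020.0])

report = build_report(sample_stocks, sample_market)
print(report)
```

With the real data, `build_report(stocks, sp500_df["close"])` would produce one row per ticker, and the result could be sorted with, for example, `report.sort_values("daily_std")`.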