<a href="https://colab.research.google.com/github/jmohsbeck1/jpmc_mle/blob/Mar.-29/Stock_Predict.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [16]:
# John Mohsbeck
# Stock predictor for JPMorgan Chase (ticker: JPM)
# Source: Dr. Lee's stock predictor

# Import libraries
import yfinance as yf
import pandas as pd

In [17]:
# Fetch JPM stock data function

def fetch_stock_data(tickers=["JPM"], start="2000-01-01", end="2021-12-31"):
    """
    Fetches stock data for the specified tickers and time period using the yfinance library.

    Parameters:
    tickers (list): A list of stock tickers (default is ["JPM"]).
    start (str): The start date for fetching data in the format YYYY-MM-DD (default is "2000-01-01").
    end (str): The end date for fetching data in the format YYYY-MM-DD (default is "2021-12-31").

    Returns:
    data (pd.DataFrame): A pandas DataFrame containing the fetched stock data.
    """
    
    # Download each ticker, tag its rows, and combine everything into a single DataFrame.
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0.
    frames = []
    for ticker in tickers:
        stock_data = yf.download(ticker, start=start, end=end)
        stock_data["Ticker"] = ticker
        frames.append(stock_data)
    data = pd.concat(frames, sort=True)

    # Move the Date index into a regular column and return the final DataFrame
    data.reset_index(inplace=True)
    return data
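DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so a loop that grows a DataFrame with it fails on current pandas. The usual replacement is to collect the per-ticker frames in a list and call pd.concat once. The sketch below shows that pattern without any network access: combine_ticker_frames is a hypothetical helper, and the two small frames are synthetic stand-ins for yf.download results (the JPM closes are rounded from the output below; XYZ is a made-up ticker with made-up values).

```python
import pandas as pd

def combine_ticker_frames(frames_by_ticker):
    """Hypothetical helper: tag each frame with its ticker, then concatenate once."""
    tagged = []
    for ticker, frame in frames_by_ticker.items():
        frame = frame.copy()       # avoid mutating the caller's frame
        frame["Ticker"] = ticker   # same tagging step as fetch_stock_data
        tagged.append(frame)       # list.append is cheap; no DataFrame is rebuilt here
    # One concat instead of repeated DataFrame.append; sort=True sorts the column labels
    combined = pd.concat(tagged, sort=True)
    # Turn the Date index into an ordinary column
    return combined.reset_index()

# Synthetic stand-ins for yf.download output (Date index, a few price columns)
dates = pd.DatetimeIndex(["2010-01-04", "2010-01-05"], name="Date")
jpm = pd.DataFrame({"Close": [42.85, 43.68], "Volume": [35460500, 41208300]}, index=dates)
xyz = pd.DataFrame({"Close": [10.0, 10.5], "Volume": [1000, 1200]}, index=dates)

combined = combine_ticker_frames({"JPM": jpm, "XYZ": xyz})
print(combined)
```

Passing sort=True mirrors the sort=True argument of the original append call, which is why the downloaded columns appear in alphabetical order.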

Fetching data and discovering insights

In [18]:
stock_data = fetch_stock_data(tickers=["JPM"], start="2010-01-01", end="2023-03-27")
print(stock_data.head())

[*********************100%***********************]  1 of 1 completed
        Date  Adj Close      Close       High        Low       Open Ticker  \
0 2010-01-04  30.517248  42.849998  42.990002  41.669998  41.790001    JPM   
1 2010-01-05  31.108356  43.680000  43.840000  42.779999  42.790001    JPM   
2 2010-01-06  31.279291  43.919998  44.090000  43.310001  43.450001    JPM   
3 2010-01-07  31.898895  44.790001  45.119999  43.610001  43.790001    JPM   
4 2010-01-08  31.820543  44.680000  44.700001  44.080002  44.369999    JPM   

     Volume  
0  35460500  
1  41208300  
2  27729000  
3  44864700  
4  33110100  


Minimum closing price (2010-01-04 to 2023-03-24): 28.38

In [28]:
min(stock_data.Close)

28.3799991607666

Maximum closing price: 171.78, reached on 2021-10-22

In [29]:
max(stock_data.Close)

171.77999877929688

In [31]:
stock_data.loc[stock_data.Close == stock_data.Close.max(), 'Date']

2972   2021-10-22
Name: Date, dtype: datetime64[ns]
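The cells above compute the minimum and maximum closes separately and then look up the date of the maximum with a boolean mask. Series.idxmin and Series.idxmax return the index label of the extreme value directly, so both prices and both dates can come out of one small helper. A sketch on a synthetic three-row frame (the helper name close_extremes and the values are illustrative assumptions, not real JPM closes). Note that idxmax returns only the first occurrence when the maximum is tied, whereas the mask in the original cell returns every matching date.

```python
import pandas as pd

def close_extremes(df, price_col="Close", date_col="Date"):
    """Return (min_price, min_date, max_price, max_date) for a price column."""
    i_min = df[price_col].idxmin()  # index label of the first minimum
    i_max = df[price_col].idxmax()  # index label of the first maximum
    return (df.at[i_min, price_col], df.at[i_min, date_col],
            df.at[i_max, price_col], df.at[i_max, date_col])

# Small synthetic frame shaped like stock_data (illustrative values only)
demo = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-04", "2021-01-05", "2021-01-06"]),
    "Close": [100.0, 105.0, 98.0],
})
low_close, low_date, high_close, high_date = close_extremes(demo)
print(low_close, low_date.date(), high_close, high_date.date())
```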

Adding a Target Column: Decoding the Market’s Swings

In [19]:
def add_target_column(data, target_col="Target"):
    """
    Adds a target column to the input DataFrame indicating whether each day's closing price rose ("UP") or did not rise ("DOWN") relative to the previous day's close.

    Parameters:
    data (pd.DataFrame): The input DataFrame containing stock data.
    target_col (str): The name of the target column to be added (default is "Target").

    Returns:
    data (pd.DataFrame): The DataFrame with the new target column added.
    """
    
    # Work on a copy so the caller's DataFrame does not gain helper columns
    data = data.copy()

    # Calculate the difference between consecutive closing prices
    data["Price_Diff"] = data["Close"].diff()
    
    # Assign "UP" if the close rose and "DOWN" otherwise (unchanged days count as "DOWN")
    data[target_col] = data["Price_Diff"].apply(lambda x: "UP" if x > 0 else "DOWN")
    
    # Drop the first row (no previous day to compare to) and the helper "Price_Diff" column
    data = data.drop(0).drop(columns=["Price_Diff"])
    
    return data
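add_target_column labels each row by comparing that row's own Close with the previous day's Close, so the same row's Close is one of the two prices that define the label. If the goal is to predict the next day's direction from information available today, one common alternative is to shift the comparison forward by one row. The sketch below, with a hypothetical add_next_day_target function, shows that variant; the sample closes are the first five JPM closes from the output above, rounded to two decimals.

```python
import numpy as np
import pandas as pd

def add_next_day_target(data, target_col="Target"):
    """Hypothetical variant: label row t by the move from Close[t] to Close[t + 1]."""
    data = data.copy()
    # shift(-1) aligns tomorrow's close with today's row
    next_close = data["Close"].shift(-1)
    # As in add_target_column, an unchanged close counts as "DOWN"
    data[target_col] = np.where(next_close > data["Close"], "UP", "DOWN")
    # The last row has no next day, so its label is undefined; drop it
    return data.iloc[:-1]

demo = pd.DataFrame({"Close": [42.85, 43.68, 43.92, 44.79, 44.68]})
labeled = add_next_day_target(demo)
print(labeled["Target"].tolist())
```

With this framing, the row for 2010-01-07 would carry the label for the move into 2010-01-08, so a model trained on it never sees the price that decides its own label.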

In [20]:
stock_data_with_target = add_target_column(stock_data)
print(stock_data_with_target.head())

        Date  Adj Close      Close       High        Low       Open Ticker  \
1 2010-01-05  31.108356  43.680000  43.840000  42.779999  42.790001    JPM   
2 2010-01-06  31.279291  43.919998  44.090000  43.310001  43.450001    JPM   
3 2010-01-07  31.898895  44.790001  45.119999  43.610001  43.790001    JPM   
4 2010-01-08  31.820543  44.680000  44.700001  44.080002  44.369999    JPM   
5 2010-01-11  31.713722  44.529999  45.189999  44.310001  45.119999    JPM   

     Volume Target  
1  41208300     UP  
2  27729000     UP  
3  44864700     UP  
4  33110100   DOWN  
5  31878700   DOWN  


In [22]:
print(stock_data_with_target)

           Date   Adj Close       Close        High         Low        Open  \
1    2010-01-05   31.108356   43.680000   43.840000   42.779999   42.790001   
2    2010-01-06   31.279291   43.919998   44.090000   43.310001   43.450001   
3    2010-01-07   31.898895   44.790001   45.119999   43.610001   43.790001   
4    2010-01-08   31.820543   44.680000   44.700001   44.080002   44.369999   
5    2010-01-11   31.713722   44.529999   45.189999   44.310001   45.119999   
...         ...         ...         ...         ...         ...         ...   
3324 2023-03-20  127.139999  127.139999  129.470001  126.010002  126.989998   
3325 2023-03-21  130.550003  130.550003  131.729996  130.190002  130.589996   
3326 2023-03-22  127.180000  127.180000  130.660004  127.080002  130.559998   
3327 2023-03-23  126.839996  126.839996  129.529999  126.019997  127.900002   
3328 2023-03-24  124.910004  124.910004  125.680000  123.110001  125.629997   

     Ticker    Volume

In [23]:
stock_data_with_target.Target.value_counts()

UP      1682
DOWN    1646
Name: Target, dtype: int64

Share of days JPM closed UP: 50.5%

Share of days JPM closed DOWN: 49.5%

In [24]:
stock_data_with_target.Target.value_counts(normalize=True)

UP      0.505409
DOWN    0.494591
Name: Target, dtype: float64
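These shares also set the bar a model has to clear: a classifier that always predicts the more frequent class (UP) would be right on about 50.5% of days in this sample. A small sketch of that majority-class baseline, rebuilding the label series from the counts printed above (majority_baseline is a hypothetical helper name):

```python
import pandas as pd

def majority_baseline(labels):
    """Return the most frequent class and the accuracy of always predicting it."""
    shares = labels.value_counts(normalize=True)  # class -> fraction of rows
    return shares.idxmax(), shares.max()

# Rebuild a label series with the counts reported above: 1682 UP, 1646 DOWN
labels = pd.Series(["UP"] * 1682 + ["DOWN"] * 1646, name="Target")
majority_class, baseline_acc = majority_baseline(labels)
print(majority_class, round(baseline_acc, 4))  # UP 0.5054
```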

Encoding and Preprocessing: Gearing Up for Machine Learning

In [25]:
from sklearn.preprocessing import LabelEncoder, StandardScaler

In [26]:
def encode_and_preprocess(data, categorical_cols=None, numerical_cols=None, target_col="Target"):
    """
    Encodes categorical variables and standardizes numerical variables (zero mean, unit variance) in the input DataFrame.

    Parameters:
    data (pd.DataFrame): The input DataFrame containing stock data.
    categorical_cols (list): A list of categorical column names to be label-encoded (default is None).
    numerical_cols (list): A list of numerical column names to be standardized (default is None).
    target_col (str): The name of the target column (default is "Target").

    Returns:
    data (pd.DataFrame): The preprocessed DataFrame.
    """
    
    # Make a copy of the input DataFrame to avoid modifying the original data
    data = data.copy()
    
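    # Label-encode categorical columns (a single encoder is refit for each column,
    # so only the last column's mapping is kept in `le`)
    if categorical_cols:
        le = LabelEncoder()
        for col in categorical_cols:
            data[col] = le.fit_transform(data[col])
    
    # Standardize numerical columns to zero mean and unit variance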
    if numerical_cols:
        scaler = StandardScaler()
        data[numerical_cols] = scaler.fit_transform(data[numerical_cols])
    
    # Ensure the target column is the last column in the DataFrame
    data = data[[col for col in data.columns if col != target_col] + [target_col]]
    
    return data
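encode_and_preprocess reuses one LabelEncoder for every categorical column, so once the loop finishes only the last column's mapping can be inverted. A sketch that keeps one fitted encoder per column (encode_columns is a hypothetical name). It also shows that LabelEncoder sorts class labels, so the Target strings would map to DOWN = 0 and UP = 1 if you chose to encode them as well.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def encode_columns(data, cols):
    """Label-encode each column with its own encoder; return the frame and the encoders."""
    data = data.copy()
    encoders = {}
    for col in cols:
        enc = LabelEncoder()
        data[col] = enc.fit_transform(data[col])
        encoders[col] = enc  # keep each fitted mapping for inverse_transform later
    return data, encoders

demo = pd.DataFrame({"Ticker": ["JPM", "JPM", "JPM"], "Target": ["UP", "DOWN", "UP"]})
encoded, encoders = encode_columns(demo, ["Ticker", "Target"])
print(encoded["Target"].tolist())                              # [1, 0, 1]
print(encoders["Target"].classes_.tolist())                    # ['DOWN', 'UP']
print(encoders["Target"].inverse_transform([1, 0]).tolist())   # ['UP', 'DOWN']
```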

In [27]:
preprocessed_data = encode_and_preprocess(
    stock_data_with_target,
    categorical_cols=["Ticker"],
    numerical_cols=["Open", "High", "Low", "Close", "Adj Close", "Volume"],
)
print(preprocessed_data.head())

        Date  Adj Close     Close      High       Low      Open  Ticker  \
1 2010-01-05  -1.045125 -1.032403 -1.039541 -1.044060 -1.055611       0   
2 2010-01-06  -1.040806 -1.026088 -1.033023 -1.029988 -1.038248       0   
3 2010-01-07  -1.025149 -1.003196 -1.006173 -1.022023 -1.029303       0   
4 2010-01-08  -1.027129 -1.006090 -1.017121 -1.009544 -1.014045       0   
5 2010-01-11  -1.029828 -1.010037 -1.004348 -1.003438 -0.994314       0   

     Volume Target  
1  1.479616     UP  
2  0.512753     UP  
3  1.741887     UP  
4  0.898736   DOWN  
5  0.810408   DOWN  
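encode_and_preprocess fits StandardScaler on the entire 2010 to 2023 history. If the rows are later split into training and test sets, the training features will have been scaled with statistics that include test-period prices. A sketch of the alternative, assuming a chronological split with a hypothetical 80/20 ratio (chrono_split_scale and train_frac are illustrative names): sort by Date, cut once, fit the scaler on the training rows, and only transform the test rows.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

def chrono_split_scale(data, numerical_cols, train_frac=0.8):
    """Split by time order, then fit the scaler on the training rows only."""
    data = data.sort_values("Date")
    cut = int(len(data) * train_frac)
    train, test = data.iloc[:cut].copy(), data.iloc[cut:].copy()
    scaler = StandardScaler()
    train[numerical_cols] = scaler.fit_transform(train[numerical_cols])
    # Reuse the training mean/std so no test-period statistics leak into the features
    test[numerical_cols] = scaler.transform(test[numerical_cols])
    return train, test, scaler

# Synthetic ten-day series (illustrative values only)
demo = pd.DataFrame({
    "Date": pd.date_range("2020-01-01", periods=10, freq="D"),
    "Close": np.arange(10, dtype=float),
})
train, test, scaler = chrono_split_scale(demo, ["Close"])
print(len(train), len(test), scaler.mean_)
```

Splitting by time rather than at random also keeps the evaluation closer to how a stock predictor would be used: trained on the past, scored on later days.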


Algorithm Harness
