## Quantitative analysis using pynance and TA-Lib

#### Load yfinance dataset

In [2]:
import sys, os
import pandas as pd
# Add the 'scripts' directory to the Python path
sys.path.append(os.path.abspath(os.path.join('..', 'scripts')))

from data_processing import load_data
# Step 1: Point to the data directory that contains yfinance_data.zip
zip_file_path = '../Data'  # Replace with the directory that holds your .zip file
extracted_folder_path = zip_file_path+'/yfinance_data'  # Replace with your desired extract path (not used in this cell)

# Load each stock's dataset and concatenate them into one DataFrame for analysis
filenames = ['AAPL_historical_data.csv',
             'AMZN_historical_data.csv',
             'GOOG_historical_data.csv',
             'META_historical_data.csv',
             'MSFT_historical_data.csv',
             'NVDA_historical_data.csv',
             'TSLA_historical_data.csv']
# Initialize an empty list to hold individual DataFrames
dataframes = []
# Loop through the filenames, load and process each, then append to the list
for filename in filenames:
    # Load the CSV file
    data = load_data(zip_file_path+'/yfinance_data.zip', 
                     'yfinance_data/'+filename)
    
    stock = filename.split('_')[0]
    data['stock'] = stock
    
    dataframes.append(data)  # Append the processed DataFrame to the list

                 
# Merge all DataFrames into one
df = pd.concat(dataframes)

# Display the merged data
df.head()

Date,Open,High,Low,Close,Adj Close,Volume,Dividends,Stock Splits,stock
1980-12-12,0.128348,0.128906,0.128348,0.128348,0.098943,469033600,0.0,0.0,AAPL
1980-12-15,0.12221,0.12221,0.121652,0.121652,0.093781,175884800,0.0,0.0,AAPL
1980-12-16,0.113281,0.113281,0.112723,0.112723,0.086898,105728000,0.0,0.0,AAPL
1980-12-17,0.115513,0.116071,0.115513,0.115513,0.089049,86441600,0.0,0.0,AAPL
1980-12-18,0.118862,0.11942,0.118862,0.118862,0.09163,73449600,0.0,0.0,AAPL
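
The `load_data` helper lives in `scripts/data_processing` and is not shown in this notebook. A minimal sketch of what such a helper might look like, assuming it reads a CSV member straight from the archive and uses `Date` as the index (the name `load_data_sketch` and its parameters are hypothetical, not the project's actual code):

```python
import zipfile

import pandas as pd


def load_data_sketch(zip_path, member):
    """Read one CSV member of a zip archive into a DataFrame.

    Hypothetical stand-in for the project's load_data helper.
    Assumes the CSV has a 'Date' column, which becomes the index.
    """
    with zipfile.ZipFile(zip_path) as archive:
        # archive.open returns a file-like object that pandas can read directly
        with archive.open(member) as handle:
            return pd.read_csv(handle, index_col="Date", parse_dates=["Date"])
```

Reading from the archive directly avoids writing extracted files to disk, which may be convenient when the zip holds many per-ticker CSVs.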


In [3]:
# check the size of the dataset
df.shape

(45428, 9)

In [4]:
# check missing values
df.isnull().sum()

Open            0
High            0
Low             0
Close           0
Adj Close       0
Volume          0
Dividends       0
Stock Splits    0
stock           0
dtype: int64

In [5]:
# check the data types
df.dtypes

Open            float64
High            float64
Low             float64
Close           float64
Adj Close       float64
Volume            int64
Dividends       float64
Stock Splits    float64
stock            object
dtype: object

In [6]:
# check duplicates

df.duplicated().sum()

0
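
Note that `DataFrame.duplicated()` compares column values only and ignores the index, so the `Date` index does not take part in the check above. A small sketch of a stricter check on the (`Date`, `stock`) key, assuming `Date` is the index name as in the output shown earlier (the helper name `count_duplicate_keys` is hypothetical):

```python
import pandas as pd


def count_duplicate_keys(frame, key_cols=("Date", "stock")):
    """Count rows whose key columns repeat an earlier row.

    reset_index() turns the named index ('Date') into a regular column
    so that it can be part of the duplicate check.
    """
    keyed = frame.reset_index()
    return int(keyed.duplicated(subset=list(key_cols)).sum())


# Toy frame with one repeated (Date, stock) pair; values are illustrative only
toy = pd.DataFrame(
    {"Close": [1.0, 1.1, 2.0], "stock": ["AAA", "AAA", "BBB"]},
    index=pd.Index(pd.to_datetime(["2024-01-02", "2024-01-02", "2024-01-02"]), name="Date"),
)
n_dupes = count_duplicate_keys(toy)
```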

## Apply Analysis Indicators with TA-Lib
Here we use `TA-Lib` to calculate several technical indicators: moving averages (SMA and EMA), the Relative Strength Index (RSI), and Moving Average Convergence Divergence (MACD).
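
For reference, the textbook definitions of these indicators can be written as follows, with $C_t$ the closing price at time $t$ and $n$ the period. This is a sketch of the standard formulas; TA-Lib's seeding and smoothing details (for example, how the first EMA value is initialised and how the RSI averages are smoothed) may differ slightly.

$$
\begin{aligned}
\mathrm{SMA}_n(t) &= \frac{1}{n}\sum_{i=0}^{n-1} C_{t-i} \\
\mathrm{EMA}_n(t) &= \alpha\, C_t + (1-\alpha)\,\mathrm{EMA}_n(t-1), \qquad \alpha = \frac{2}{n+1} \\
\mathrm{RSI}_n(t) &= 100 - \frac{100}{1 + \mathrm{RS}_n(t)}, \qquad \mathrm{RS}_n(t) = \frac{\text{average gain over } n \text{ periods}}{\text{average loss over } n \text{ periods}} \\
\mathrm{MACD}(t) &= \mathrm{EMA}_{12}(t) - \mathrm{EMA}_{26}(t) \\
\mathrm{Signal}(t) &= \mathrm{EMA}_{9}\big(\mathrm{MACD}\big)(t) \\
\mathrm{Hist}(t) &= \mathrm{MACD}(t) - \mathrm{Signal}(t)
\end{aligned}
$$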


In [1]:
# Import ta-lib library
import talib


#### Applying technical indicators

In [7]:

# Moving Averages
df['SMA_20'] = talib.SMA(df['Close'], timeperiod=20)
df['EMA_20'] = talib.EMA(df['Close'], timeperiod=20)

In [9]:
# Relative Strength Index (RSI)
df['RSI_14'] = talib.RSI(df['Close'], timeperiod=14)

In [10]:
# Moving Average Convergence Divergence (MACD)
df['MACD'], df['MACD_Signal'], df['MACD_Hist'] = talib.MACD(df['Close'], 
                                                            fastperiod=12, 
                                                            slowperiod=26, 
                                                            signalperiod=9)

In [12]:
df.isnull().sum()

Open             0
High             0
Low              0
Close            0
Adj Close        0
Volume           0
Dividends        0
Stock Splits     0
stock            0
SMA_20          19
EMA_20          19
RSI_14          14
MACD            33
MACD_Signal     33
MACD_Hist       33
dtype: int64

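The `NaN` values at the start of each indicator column are expected, because every indicator needs a look-back window before it can produce its first value: 19 rows for the 20-period SMA and EMA (20 - 1), 14 rows for the 14-period RSI, and 33 rows for MACD (25 for the 26-period slow EMA plus 8 for the 9-period signal line). We can drop those rows, fill them, or handle them another way, depending on the needs of the analysis.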
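
One caveat with the cells above: `df` stacks all seven tickers, so a single `talib` call over `df['Close']` treats them as one continuous series. That is consistent with the 19 missing `SMA_20` values reported for the whole frame rather than 19 per stock, and it means windows near the start of each ticker mix prices from the previous one. A sketch of a per-stock calculation follows. It uses pandas `rolling` as a stand-in for `talib.SMA` so that it runs without TA-Lib installed; the helper name `add_sma_per_stock` and the toy tickers are hypothetical.

```python
import numpy as np
import pandas as pd


def add_sma_per_stock(frame, window=20):
    """Add an SMA column computed separately for each ticker.

    pandas rolling().mean() is used here as a stand-in; in the notebook,
    talib.SMA(s, timeperiod=window) could be called inside the lambda instead.
    """
    out = frame.copy()
    out[f"SMA_{window}"] = (
        out.groupby("stock")["Close"]
           .transform(lambda s: s.rolling(window).mean())
    )
    return out


# Toy data: two made-up tickers with 30 rows each, stacked like df in the notebook
dates = pd.date_range("2024-01-01", periods=30, freq="D")
toy = pd.concat([
    pd.DataFrame({"Close": np.arange(1.0, 31.0), "stock": "AAA"}, index=dates),
    pd.DataFrame({"Close": np.arange(101.0, 131.0), "stock": "BBB"}, index=dates),
])
toy.index.name = "Date"

result = add_sma_per_stock(toy)

# Warm-up NaNs now appear at the start of every ticker (window - 1 rows each)
nan_per_stock = (
    result.assign(is_nan=result["SMA_20"].isna())
          .groupby("stock")["is_nan"]
          .sum()
)

# One way to handle the warm-up rows: drop them before further analysis
clean = result.dropna(subset=["SMA_20"])
```

With per-stock grouping, the number of warm-up rows scales with the number of tickers, so dropping them removes `window - 1` rows from each stock rather than only from the top of the frame.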