### **Table of Contents**
#### <ul><li> [Project Overview](#project-overview) </li></ul>
#### <ul><li> [Requirements](#requirements) </li></ul>
#### <ul><li> [Tools and Libraries](#tools-and-libraries) </li></ul>
#### <ul><li> [Data Source](#data-source) </li></ul>
#### <ul><li> [Data Preparation](#data-preparation) </li></ul>


<a id='project-overview'></a>

### **Project Overview**

##### This project focuses on collecting, analyzing, and visualizing stock data with key financial indicators to support informed decision-making. Using data from the Alpha Vantage API, we retrieve historical stock prices and compute various technical indicators to provide insights into stock performance trends and patterns.
##### Our goal is to create an interactive dashboard that not only visualizes stock data but also integrates indicators like moving averages, RSI (Relative Strength Index), and Bollinger Bands. These indicators help users understand trends, volatility, and momentum, offering a more comprehensive view of stock behavior over time.

<a id='requirements'></a>
### **Requirements**
<ul>
    <h5>
        <li>Python version: 3.12.1</li>
        <li>Required libraries: pandas, numpy, matplotlib, seaborn, plotly, holoviews, requests</li>
        <li>Data sources: <a href="https://www.alphavantage.co">www.alphavantage.co</a></li>
    </h5>
</ul>

<a id='tools-and-libraries'></a>

### **Tools and Libraries**

##### This project leverages several Python libraries for data manipulation, visualization, and API requests. Below is an overview of each library and its specific use.

<ol>
    <li>
        <h5>
            <strong> Data Manipulation and Analysis </strong> <br>
            Efficient data manipulation and analysis are essential for processing and preparing data for visualization. The following libraries are used: <br>
            <ul>
                <li><strong>pandas</strong>: A powerful library for data manipulation and analysis. It provides data structures like DataFrames, which allow for easy filtering, transformation, and aggregation of data.</li>
                    <ul><li><strong>Usage</strong>: Loading and structuring stock data, handling missing values, and performing calculations for technical indicators (e.g., moving averages).</li></ul>
            </ul>
            <ul>
                <li><strong>numpy</strong>: A library for numerical computing in Python, providing support for large arrays and matrices, along with mathematical functions to operate on these arrays.</li>
                    <ul><li><strong>Usage</strong>: Performing efficient numerical calculations, particularly when working with large datasets or generating values for indicators.</li></ul>
            </ul>
        </h5>
    </li>
    <li>
        <h5>
            <strong> Data Visualization </strong> <br>
            To create clear and insightful visualizations, several libraries are used to generate static, interactive, and customizable charts. <br>
            <ul>
                <li><strong>matplotlib</strong>: The foundational Python library for plotting. It’s widely used for generating basic static plots and serves as the backbone for other visualization libraries.</li>
                    <ul><li><strong>Usage</strong>: Creating simple line plots, bar charts, and basic static visualizations.</li></ul>
            </ul>
            <ul>
                <li><strong>seaborn</strong>: A data visualization library based on <strong><em>matplotlib</em></strong> that provides a high-level interface for drawing attractive and informative statistical graphics.</li>
                    <ul><li><strong>Usage</strong>: Creating more visually appealing charts with additional customization options for aesthetics, themes, and color schemes.</li></ul>
            </ul>
            <ul>
                <li><strong>holoviews</strong>: A high-level library that simplifies the process of creating complex visualizations, especially when working with large datasets or interactive elements.</li>
                    <ul><li><strong>Usage</strong>: Integrating with <strong><em>Bokeh</em></strong> or <strong><em>Plotly</em></strong> to create dynamic plots that are easy to explore interactively.</li></ul>
            </ul>
            <ul>
                <li><strong>plotly</strong>: A library for creating interactive and customizable visualizations, suitable for dashboards and web applications.</li>
                    <ul><li><strong>Usage</strong>: Building interactive charts, such as candlestick and scatter plots, for the dashboard, allowing users to zoom, pan, and hover over data points for detailed insights.</li></ul>
            </ul>
        </h5>
    </li>
    <li>
        <h5>
            <strong> API Requests </strong> <br>
            To retrieve data from external sources, this project uses the <strong><em>requests</em></strong> library. <br>
            <ul>
                <li><strong>requests</strong>: A simple HTTP library for Python, used to send requests to APIs and retrieve data in JSON format.</li>
                    <ul><li><strong>Usage</strong>: Connecting to the Alpha Vantage API to download stock data, which is then parsed and processed for analysis and visualization.</li></ul>
            </ul>
        </h5>
    </li>
</ol>
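
As an illustration of the indicator calculations mentioned under pandas, the sketch below computes a simple moving average and Bollinger Bands per symbol. The function name `add_sma_and_bollinger`, the 20-day window, the 2-standard-deviation band width, and the use of the population standard deviation are assumptions made for this example; they are not necessarily the settings behind the values in this project's dataset. The `DEMO` prices are made up purely to exercise the function.

```python
import pandas as pd


def add_sma_and_bollinger(prices: pd.DataFrame, window: int = 20,
                          num_std: float = 2.0) -> pd.DataFrame:
    """Add SMA and Bollinger Band columns, computed separately per symbol.

    Expects 'symbol', 'date' and 'close' columns. The window and band width
    are illustrative defaults (assumptions), not values taken from the project.
    """
    out = prices.sort_values(["symbol", "date"]).copy()
    grouped = out.groupby("symbol")["close"]

    # Rolling mean of the closing price over the chosen window
    out["SMA"] = grouped.transform(lambda s: s.rolling(window).mean())

    # Rolling population standard deviation (ddof=0 is an assumption here)
    rolling_std = grouped.transform(lambda s: s.rolling(window).std(ddof=0))

    out["BBANDS Upper"] = out["SMA"] + num_std * rolling_std
    out["BBANDS Lower"] = out["SMA"] - num_std * rolling_std
    return out


# Made-up prices for a hypothetical symbol, only to demonstrate the call
demo = pd.DataFrame({
    "symbol": ["DEMO"] * 4,
    "date": pd.to_datetime(["2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05"]),
    "close": [10.0, 11.0, 12.0, 13.0],
})
result = add_sma_and_bollinger(demo, window=3)
```

The first `window - 1` rows of each symbol have no indicator values, because a full window of prices is not yet available.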

In [1]:
# Data Manipulation and Analysis
import pandas as pd
import numpy as np

# Data Visualization Libraries
import matplotlib.pyplot as plt
import seaborn as sns
# !pip install plotly 
import plotly.express as px
import plotly.graph_objects as go
# !pip install holoviews
import holoviews as hv
hv.extension('bokeh', 'plotly')

# Time Series Analysis (if needed)
# !pip install statsmodels
# import statsmodels.api as sm
# import statsmodels.tsa.api as tsa

# API Requests (if fetching data from an external API)
import requests
import time  # For rate limiting

# Date and Time Manipulation
from datetime import datetime

# Optional: Suppress warnings for cleaner output
import warnings
warnings.filterwarnings("ignore")


<a id='data-source'></a>

### **Data Source**

#### **Alpha Vantage API**

##### Data for this project is sourced from <a href="https://www.alphavantage.co">Alpha Vantage</a>, a free API that provides real-time and historical data for stocks, forex, and cryptocurrencies.
##### To access Alpha Vantage data:
<ol>
    <h5>
        <li><strong>Register for an API Key</strong>: Visit <a href="https://www.alphavantage.co">Alpha Vantage</a> to obtain a free API key.</li>
        <li><strong>Requesting Data</strong>: Use the API key to query stock data. The Alpha Vantage API endpoint for daily stock data, for example, is:
        <code>https://www.alphavantage.co/query?function=TIME_SERIES_DAILY&amp;symbol={symbol}&amp;apikey={API_KEY}</code></li>
        <li><strong>Data Format</strong>: The data returned is in JSON format, which will be parsed and converted to a DataFrame for further analysis and visualization.</li>
    </h5>
</ol>
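
The request described above could be sketched as follows. The helper names `build_daily_params` and `fetch_daily` are hypothetical, and the check for the `"Time Series (Daily)"` key assumes the response layout that Alpha Vantage documents for this endpoint; the actual collection code for this project lives in the data scraping notebook and may differ.

```python
import requests

BASE_URL = "https://www.alphavantage.co/query"


def build_daily_params(symbol: str, api_key: str) -> dict:
    """Query parameters for the daily time series endpoint."""
    return {
        "function": "TIME_SERIES_DAILY",
        "symbol": symbol,
        "apikey": api_key,
    }


def fetch_daily(symbol: str, api_key: str, timeout: float = 30.0) -> dict:
    """Download daily prices for one symbol and return the decoded JSON."""
    response = requests.get(
        BASE_URL,
        params=build_daily_params(symbol, api_key),
        timeout=timeout,
    )
    response.raise_for_status()  # fail on HTTP-level errors
    payload = response.json()

    # Assumption: a successful response carries this key. The API may instead
    # return a JSON message (for example when a request limit is reached).
    if "Time Series (Daily)" not in payload:
        raise RuntimeError(f"Unexpected response for {symbol}: {payload}")
    return payload


# Example call (needs a real key and network access):
# payload = fetch_daily("AAPL", api_key="YOUR_API_KEY")
```

When looping over several symbols, pausing between calls with `time.sleep` helps stay within the request limits of the free tier.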


<a id='data-preparation'></a>

### **Data Preparation**

##### Data preparation is a crucial step in ensuring that the data is accurate, clean, and suitable for analysis. This project involves several stages of data preparation, including data collection, parsing, cleaning, and transformation. Here’s a detailed breakdown of each step:

<h5>
<ol> 
    <li> <strong> Data Collection </strong> </li>
    Data is retrieved from the <a href="https://www.alphavantage.co">Alpha Vantage API</a> to obtain historical stock data. Using the
    <strong><em>requests</em></strong> library, we fetch data for specific stocks in JSON format, specifying parameters such as:

<ul>
    <li><strong>Stock Symbol</strong>: The unique ticker symbol of each stock.</li>
    <li><strong>Time Series</strong>: Frequency of data (e.g., daily, weekly).</li>
    <li><strong>API Key</strong>: Required for authenticated access.</li>
</ul>

The data collection code is in [data_scraping.ipynb](./data_scraping.ipynb).

<li> <strong> Data Parsing </strong> </li>
The JSON response from Alpha Vantage is nested: each date is a key whose value holds that day's price fields. This raw data is parsed, flattened into a table, and saved as a CSV file, so that it can later be read directly without calling the API again.

The data pulled from the API is stored in [stock_data_with_indicators.csv](./stock_data_with_indicators.csv).
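
A minimal sketch of this parsing step is shown here. It assumes the response layout documented by Alpha Vantage for daily data (a `"Time Series (Daily)"` object keyed by date, with fields named `"1. open"` through `"5. volume"`); the function name `parse_daily_json` is hypothetical. The sample payload reuses two AAPL rows that appear in this notebook's outputs.

```python
import pandas as pd


def parse_daily_json(payload: dict, symbol: str) -> pd.DataFrame:
    """Flatten a daily time series response into one row per trading day."""
    series = payload.get("Time Series (Daily)")
    if series is None:
        raise ValueError(f"No daily time series found for {symbol}")

    # Dates become the index, the price fields become columns
    frame = pd.DataFrame.from_dict(series, orient="index")

    # "1. open" -> "open", "5. volume" -> "volume", and so on
    frame.columns = [name.split(". ", 1)[1] for name in frame.columns]

    # The API delivers every value as a string
    frame = frame.astype(float)
    frame["volume"] = frame["volume"].astype("int64")

    frame.index = pd.to_datetime(frame.index)
    frame = frame.sort_index().rename_axis("date").reset_index()
    frame.insert(0, "symbol", symbol)
    return frame


sample_payload = {
    "Time Series (Daily)": {
        "2024-08-08": {"1. open": "213.11", "2. high": "214.2", "3. low": "208.83",
                       "4. close": "213.31", "5. volume": "47161149"},
        "2024-05-28": {"1. open": "191.51", "2. high": "193.0", "3. low": "189.1",
                       "4. close": "189.99", "5. volume": "52280051"},
    }
}
daily = parse_daily_json(sample_payload, "AAPL")
```

The resulting frame can then be written out with `daily.to_csv(...)`; passing `index=False` avoids storing the row index as an extra unnamed column.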

In [14]:
# Read the prepared data from the CSV file
df = pd.read_csv('stock_data_with_indicators.csv')

# First 5 entries from the dataframe
df.head()

Unnamed: 0,symbol,date,open,high,low,close,volume,SMA,EMA,MACD,RSI,BBANDS Upper,BBANDS Lower,ATR,VWAP
0,AAPL,2024-05-28,191.51,193.0,189.1,189.99,52280051,184.4302,184.8461,,62.1548,197.924,170.9364,3.3632,
1,AMZN,2024-05-28,179.93,182.24,179.49,182.15,29926963,,,,,,,,
2,GOOGL,2024-05-28,174.45,177.27,174.365,176.4,20572157,170.8826,170.7276,,64.6056,179.4875,162.2778,3.5863,
3,MSFT,2024-05-28,429.63,430.82,426.6,430.32,15718024,414.6745,419.0863,,60.1202,438.6944,390.6546,7.064,
4,TSLA,2024-05-28,176.4,178.25,173.16,176.75,59736620,177.466,175.774,,50.7812,186.3753,168.5567,7.9397,


In [13]:
# Random 5 entries from the dataframe 

df.sample(5)

Unnamed: 0,symbol,date,open,high,low,close,volume,SMA,EMA,MACD,RSI,BBANDS Upper,BBANDS Lower,ATR,VWAP
250,AAPL,2024-08-08,213.11,214.2,208.83,213.31,47161149,220.5995,216.9801,,48.5966,235.6154,205.5836,6.2827,
299,TSLA,2024-08-21,222.67,224.6594,218.86,223.27,70145964,212.277,215.2626,,53.5514,235.9598,188.5942,11.2629,
228,MSFT,2024-08-01,420.785,427.46,413.0901,417.11,30296400,441.1037,434.1226,,38.6135,474.0694,408.1381,8.8372,
108,MSFT,2024-06-27,452.175,456.17,451.77,452.85,14806324,434.793,438.7968,,67.6638,462.4501,407.1359,6.6096,
281,AMZN,2024-08-16,177.04,178.34,176.2601,177.06,31489175,,,,,,,,


In [12]:
# Summary statistics (count, mean, std, quartiles) for each numeric column
df.describe()

Unnamed: 0,open,high,low,close,volume,SMA,EMA,MACD,RSI,BBANDS Upper,BBANDS Lower,ATR,VWAP
count,500.0,500.0,500.0,500.0,500.0,400.0,400.0,0.0,400.0,400.0,400.0,400.0,0.0
mean,243.776098,246.511304,240.784521,243.7134,46298360.0,256.482044,256.310489,,54.163487,273.345544,239.618546,6.630114,
std,95.716602,96.0579,94.979736,95.569693,35922230.0,101.745387,101.528052,,9.814425,105.497854,99.153943,2.781138,
min,149.92,151.27,147.215,148.66,9932830.0,157.886,158.1246,,30.5818,167.3591,143.673,3.0735,
25%,180.09875,182.604175,177.3556,179.795,21084950.0,179.188425,178.91075,,47.2941,191.4207,170.26995,4.202325,
50%,208.89,213.92,206.49,209.24,36272790.0,219.46095,219.50545,,53.54715,232.55955,200.35585,6.47035,
75%,242.01,247.66,236.1475,240.88,59977210.0,286.4933,285.151025,,60.67995,321.593325,264.05885,8.228275,
max,467.0,468.35,464.46,467.56,318679900.0,453.6354,451.4491,,80.9931,475.6094,440.2929,13.1012,


In [11]:
# To see details about each column in the dataframe
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 500 entries, 0 to 499
Data columns (total 15 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   symbol        500 non-null    object 
 1   date          500 non-null    object 
 2   open          500 non-null    float64
 3   high          500 non-null    float64
 4   low           500 non-null    float64
 5   close         500 non-null    float64
 6   volume        500 non-null    int64  
 7   SMA           400 non-null    float64
 8   EMA           400 non-null    float64
 9   MACD          0 non-null      float64
 10  RSI           400 non-null    float64
 11  BBANDS Upper  400 non-null    float64
 12  BBANDS Lower  400 non-null    float64
 13  ATR           400 non-null    float64
 14  VWAP          0 non-null      float64
dtypes: float64(12), int64(1), object(2)
memory usage: 58.7+ KB


In [16]:
# Number of null values in each column
df.isnull().sum()

symbol            0
date              0
open              0
high              0
low               0
close             0
volume            0
SMA             100
EMA             100
MACD            500
RSI             100
BBANDS Upper    100
BBANDS Lower    100
ATR             100
VWAP            500
dtype: int64

In [18]:
# Checking for duplicate rows
df.duplicated().sum()

np.int64(0)

<h5>
&nbsp;&nbsp;&nbsp;&nbsp; 3. <strong> Data Cleaning </strong><br>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; Data cleaning ensures that the dataset is free of inconsistencies and ready for analysis. Key steps include:

<ul>
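    <li><strong>Handling Missing Values</strong>: Removing or imputing missing data points. In this dataset, MACD and VWAP are empty in all 500 rows, and the other indicator columns are missing in 100 rows.</li>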
    <li><strong>Removing Duplicates</strong>: Checking for and removing duplicate rows, if any.</li>
    <li><strong>Standardizing Columns</strong>: Renaming columns for consistency and clarity.</li>
</ul>

</ol>
</h5>
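
The three cleaning steps listed above could be sketched as follows. This is one possible approach, not necessarily the one used later in the project: the function name `clean_stock_frame` and the snake_case naming scheme are assumptions, and rows with partially missing indicators are kept rather than dropped or imputed. The sample frame reuses values from the outputs above, with one row duplicated on purpose to show de-duplication.

```python
import numpy as np
import pandas as pd


def clean_stock_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Drop empty columns and duplicate rows, then standardize column names."""
    cleaned = df.copy()

    # Columns with no values at all (such as MACD and VWAP here) carry no information
    cleaned = cleaned.dropna(axis="columns", how="all")

    # Remove exact duplicate rows, if any
    cleaned = cleaned.drop_duplicates()

    # "BBANDS Upper" -> "bbands_upper" (naming scheme is an assumption)
    cleaned.columns = [c.strip().lower().replace(" ", "_") for c in cleaned.columns]

    # Parse dates so that time-based sorting and filtering work
    cleaned["date"] = pd.to_datetime(cleaned["date"])
    return cleaned.sort_values(["symbol", "date"]).reset_index(drop=True)


raw = pd.DataFrame({
    "symbol": ["AAPL", "AMZN", "AAPL"],  # last row duplicates the first on purpose
    "date": ["2024-05-28", "2024-05-28", "2024-05-28"],
    "close": [189.99, 182.15, 189.99],
    "RSI": [62.1548, np.nan, 62.1548],
    "BBANDS Upper": [197.924, np.nan, 197.924],
    "MACD": [np.nan, np.nan, np.nan],
    "VWAP": [np.nan, np.nan, np.nan],
})
cleaned = clean_stock_frame(raw)
```

Whether to drop or impute the remaining missing indicator values depends on the analysis; keeping them as `NaN` leaves that decision to the later steps.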

In [None]:
# Count the missing values in each column
for col in df.columns:
    print(f"{col}: {df[col].isna().sum()}")

symbol: 0
date: 0
open: 0
high: 0
low: 0
close: 0
volume: 0
SMA: 100
EMA: 100
MACD: 500
RSI: 100
BBANDS Upper: 100
BBANDS Lower: 100
ATR: 100
VWAP: 500


In [4]:
df['VWAP'].isna().sum()


np.int64(500)