Capstone 1 by Astha Shandilya

In this Capstone 1 project, META's historical stock data is cleaned, prepared, and visualized. The analysis covers market trends, how they affect the stock price and trading volume, and how price and volume affect each other. Significant events are linked to the corresponding changes in the data. META.csv is sourced from https://www.kaggle.com/datasets/vainero/google-apple-facebook-stock-price.

In [None]:
# Libraries import
import pandas as pd
import numpy as np

# Installation for visualization
!pip install plotly
import plotly.express as px



In [None]:
df = pd.read_csv('META.csv')

In [None]:
print(df.head())

# Summary statistics
print(df.info())

print(df.describe())

         Date        Open        High         Low       Close   Adj Close  \
0  2017-09-07  171.940002  173.309998  170.270004  173.210007  173.210007   
1  2017-09-08  173.089996  173.490005  170.800003  170.949997  170.949997   
2  2017-09-11  172.399994  173.889999  172.199997  173.509995  173.509995   
3  2017-09-12  173.759995  174.000000  171.750000  172.960007  172.960007   
4  2017-09-13  173.009995  173.169998  172.059998  173.050003  173.050003   

     Volume  
0  18049500  
1  10998500  
2  12372000  
3  11186300  
4   9119300  
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1258 entries, 0 to 1257
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Date       1258 non-null   object 
 1   Open       1258 non-null   float64
 2   High       1258 non-null   float64
 3   Low        1258 non-null   float64
 4   Close      1258 non-null   float64
 5   Adj Close  1258 non-null   float64
 6   Volume     1258 non-nu

Data Cleaning

In [None]:
# Count missing values per column
print(df.isnull().sum())

# Drop rows with missing values (the counts printed below show there are none)
df = df.dropna()

# Remove exact duplicate rows
df = df.drop_duplicates()

Date         0
Open         0
High         0
Low          0
Close        0
Adj Close    0
Volume       0
dtype: int64
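
The cleaning cell above prints the missing-value counts but does not show how many rows each step removes. A minimal sketch of such a check follows. The helper name `clean_report` and the small sample frame are hypothetical and are not part of the original notebook or of META.csv.

```python
import pandas as pd

def clean_report(frame: pd.DataFrame):
    """Drop missing rows and duplicates, and report how many rows each step removed."""
    before = len(frame)
    no_missing = frame.dropna()
    deduped = no_missing.drop_duplicates()
    report = {
        "rows_before": before,
        "dropped_missing": before - len(no_missing),
        "dropped_duplicates": len(no_missing) - len(deduped),
    }
    return deduped, report

# Made-up sample for illustration only (one missing value, one duplicate row)
sample = pd.DataFrame({"Close": [1.0, None, 2.0, 2.0], "Volume": [10, 20, 30, 30]})
cleaned, report = clean_report(sample)
print(report)
```

Calling `clean_report(df)` right after loading META.csv would presumably report zero rows dropped for missing values, given the counts printed above.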


Data preparation and organization

In [None]:
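# Check data types before conversion (Date is read in as a string)
print(df.dtypes)

# Convert Date from string to datetime and confirm the new type
df['Date'] = pd.to_datetime(df['Date'])
print(df.dtypes)

# Sort chronologically
df = df.sort_values(by='Date')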


Date          object
Open         float64
High         float64
Low          float64
Close        float64
Adj Close    float64
Volume         int64
dtype: object
Date         datetime64[ns]
Open                float64
High                float64
Low                 float64
Close               float64
Adj Close           float64
Volume                int64
dtype: object


In [None]:
# Plot
# creating line plot
fig = px.line(
    df,
    x='Date',
    y='Close',
    title='META Stock Closing Price Over Time',
    labels={'Close': 'Price (USD)', 'Date': 'Date'}
)

# hover information
fig.update_traces(hovertemplate='Date: %{x}<br>Close: $%{y:.2f}')

# Show the plot
fig.show()


Analysis: According to online research, the overall rise in the stock price during 2021 resulted from several factors: a continuously growing number of active users, strong revenue reports, and increases in both ad prices and the number of ads. 2021 was also a good year for the tech sector as a whole. The chart shows high peaks on Sep 1, 2021 and Sep 7, 2021. Payment card fee income linked to government stimulus programs and optimism about the recovery of the US economy during that period also contributed to the overall growth.
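
The peak dates mentioned in the analysis can be read off the chart, but they can also be checked directly in the data. The following is a minimal sketch, assuming a frame with a datetime `Date` column and a `Close` column like the one prepared above. The helper name `top_closes` and the sample values are hypothetical and are not taken from META.csv.

```python
import pandas as pd

def top_closes(frame: pd.DataFrame, year: int, n: int = 5) -> pd.DataFrame:
    """Return the n rows with the highest Close within one calendar year."""
    in_year = frame[frame["Date"].dt.year == year]
    return in_year.nlargest(n, "Close")[["Date", "Close"]]

# Made-up sample for illustration only; these are not real META prices
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2021-08-31", "2021-09-01", "2021-09-02", "2022-01-03"]),
    "Close": [10.0, 12.0, 11.0, 15.0],
})
result = top_closes(sample, 2021, n=2)
print(result.to_string(index=False))
```

With the notebook's own data, `top_closes(df, 2021)` would list the highest 2021 closes, which can then be compared against the dates cited in the analysis.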

In [None]:
# Volume over time Bar Chart
fig = px.bar(
    df,
    x='Date',
    y='Volume',
    title='Volume Traded over Time',
    labels={'Date': 'Date', 'Volume': 'Volume Traded'},
    color_discrete_sequence=['#FF4500']
)

# Hover details
fig.update_traces(
    hovertemplate='Date: %{x}    Volume: %{y:,}'
)

# layout details
fig.update_layout(
    xaxis_title='Date',
    yaxis_title='Volume'
)

# Show the plot
fig.show()


Analysis: Feb 3, 2022 shows the highest volume peak (188,119,900 shares traded). That day marked what was, at the time, the largest single-day loss in market value for any US firm. Several factors contributed, including the rising popularity of TikTok and weak fourth-quarter financial results.

In [None]:
# Data for February 2, 3, and 4, 2022, for comparison
date_compare = pd.to_datetime(['2022-02-02', '2022-02-03', '2022-02-04'])

# Select the rows whose Date matches any of the three days
df_date_compare = df[df['Date'].isin(date_compare)]

print(df_date_compare.to_string())

# Mean volume over the full dataset, as a baseline for comparison
print('\nMean of Volume: ', df['Volume'].mean())

           Date        Open        High         Low       Close   Adj Close     Volume
1109 2022-02-02  327.820007  328.000000  316.869995  323.000000  323.000000   58458300
1110 2022-02-03  244.649994  248.000000  235.750000  237.759995  237.759995  188119900
1111 2022-02-04  234.970001  242.610001  230.110001  237.089996  237.089996   89342200

Mean of Volume:  22456187.122416534
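
To put the Feb 3, 2022 move in numbers, the close-to-close change and the volume relative to the dataset mean can be computed from the values printed above. This is a small arithmetic sketch. The variable names are illustrative, and the figures are copied by hand from the output rather than read from `df`.

```python
# Values copied from the printed rows above
close_feb2 = 323.000000        # Close on 2022-02-02
close_feb3 = 237.759995        # Close on 2022-02-03
volume_feb3 = 188119900        # Volume on 2022-02-03
mean_volume = 22456187.122416534  # Mean of Volume over the full dataset

# Percentage change in closing price from Feb 2 to Feb 3
pct_change = (close_feb3 - close_feb2) / close_feb2 * 100

# How many times larger the Feb 3 volume is than the average day
volume_multiple = volume_feb3 / mean_volume

print(f"Close-to-close change: {pct_change:.2f}%")
print(f"Volume vs. mean: {volume_multiple:.2f}x")
```

The close fell by roughly 26% in one session, on volume more than eight times the dataset average.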
