<h1 style="text-align: center">  Exploratory data analysis (EDA)  on five companies: Apple, Miscrosoft, IBM, Amazon, Tesla </h1>
<h2 style="text-align: center"> Written by: Manzoor Hussain </h2>

## Importing Libraries

In [240]:
import pandas as pd
import numpy as np
import datetime  as dt
import matplotlib.pyplot as plt

## Reading Historical Data

In [241]:
aapl_data = pd.read_csv('Nasdaq_data/HistoricalData_AAPL.csv')
amzn_data = pd.read_csv('Nasdaq_data/HistoricalData_AMZN.csv')
ibm_data = pd.read_csv('Nasdaq_data/HistoricalData_IBM.csv')
msft_data = pd.read_csv('Nasdaq_data/HistoricalData_MSFT.csv')
tsla_data = pd.read_csv('Nasdaq_data/HistoricalData_TSLA.csv')


<h1 style="text-align: center;color:red">  Apple Data </h1>

In [242]:
aapl_data.head()

Unnamed: 0,Date,Close/Last,Volume,Open,High,Low
0,09/24/2021,$146.92,53477870,$145.66,$147.4701,$145.56
1,09/23/2021,$146.83,64838170,$146.65,$147.08,$145.64
2,09/22/2021,$145.85,76404340,$144.45,$146.43,$143.7001
3,09/21/2021,$143.43,75833960,$143.93,$144.6,$142.78
4,09/20/2021,$142.94,123478900,$143.8,$144.84,$141.27


<h1 style="text-align: center;color:blue">  Apple Data Cleaning and Preprocessing</h1>

In [243]:
aapl_data.head()

Unnamed: 0,Date,Close/Last,Volume,Open,High,Low
0,09/24/2021,$146.92,53477870,$145.66,$147.4701,$145.56
1,09/23/2021,$146.83,64838170,$146.65,$147.08,$145.64
2,09/22/2021,$145.85,76404340,$144.45,$146.43,$143.7001
3,09/21/2021,$143.43,75833960,$143.93,$144.6,$142.78
4,09/20/2021,$142.94,123478900,$143.8,$144.84,$141.27


In [244]:
 aapl_data.dtypes # checking data types

Date          object
Close/Last    object
Volume         int64
Open          object
High          object
Low           object
dtype: object

We need to remove $(dollar sign) and convert columns into numerical


In [245]:
def remove_dollar(lst):
    new_list = []
    for i in lst:
        i = i.translate(str.maketrans('','','$'))
        new_list.append(float(i))
    return new_list

### Defining lists for every column

In [246]:
aapl_low_list = []
aapl_closeLast_list = []
aapl_volume_list = []
aapl_open_list = []
aapl_high_list = []

aapl_month_list = []
aapl_day_list = []
aapl_year_list = []

# with aapl_data columns
aapl_low_list = list(aapl_data['Low'])
aapl_date_list = list(aapl_data['Date'])
aapl_closeLast_list = list(aapl_data['Close/Last'])
aapl_volume_list = list(aapl_data['Volume'])
aapl_open_list = list(aapl_data['Open'])
aapl_high_list = list(aapl_data['Low'])


aapl_month_list = list(aapl_data['Date'].map(lambda x: pd.to_datetime(x).month))
aapl_day_list = list(aapl_data['Date'].map(lambda x: pd.to_datetime(x).day))
aapl_year_list = list(aapl_data['Date'].map(lambda x: pd.to_datetime(x).year))

#### Removing dollar sign

In [247]:
# calling remove dollar function
aapl_low_list = remove_dollar(aapl_low_list)
aapl_closeLast_list = remove_dollar(aapl_closeLast_list)
aapl_open_list = remove_dollar(aapl_open_list)
aapl_high_list = remove_dollar(aapl_high_list)

## Creating Clean DataFrame for Apple from lists

In [248]:
 aapl_df = pd.DataFrame(list(zip( aapl_day_list,aapl_month_list,aapl_year_list,aapl_closeLast_list,aapl_volume_list,aapl_open_list,aapl_high_list,aapl_low_list)),
               columns =['Day','Month','Year', 'Close/Last','Volume','Open','High','Low'])

#### An overview of apple dataset

In [249]:
aapl_df.head()

Unnamed: 0,Day,Month,Year,Close/Last,Volume,Open,High,Low
0,24,9,2021,146.92,53477870,145.66,145.56,145.56
1,23,9,2021,146.83,64838170,146.65,145.64,145.64
2,22,9,2021,145.85,76404340,144.45,143.7001,143.7001
3,21,9,2021,143.43,75833960,143.93,142.78,142.78
4,20,9,2021,142.94,123478900,143.8,141.27,141.27


#### A statictical view for numerical columns

In [250]:
aapl_df.describe()

Unnamed: 0,Day,Month,Year,Close/Last,Volume,Open,High,Low
count,126.0,126.0,126.0,126.0,126.0,126.0,126.0,126.0
mean,15.968254,6.357143,2021.0,137.912183,81958460.0,137.784643,136.649813,136.649813
std,8.728745,1.736499,0.0,9.971587,21504220.0,10.043779,9.864231,9.864231
min,1.0,3.0,2021.0,119.9,46397670.0,120.11,118.86,118.86
25%,9.0,5.0,2021.0,127.95,66238120.0,128.155,126.875,126.875
50%,16.0,6.0,2021.0,134.81,78442560.0,135.015,133.875,133.875
75%,23.0,8.0,2021.0,146.8225,94011020.0,146.8975,145.7975,145.7975
max,31.0,9.0,2021.0,156.69,151101000.0,156.98,154.39,154.39


#### Checking if any column has null value

In [251]:
aapl_df.isnull().sum()

Day           0
Month         0
Year          0
Close/Last    0
Volume        0
Open          0
High          0
Low           0
dtype: int64

<h1 style="text-align: center;color:blue">  Useful Insights from Apple Stock market</h1>

## Monthly average of Volume, Open, Low, High, and Close/Last values

In [252]:
aapl_df[['Close/Last','Volume','Open','High','Low','Month']].groupby('Month').mean()

Unnamed: 0_level_0,Close/Last,Volume,Open,High,Low
Month,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
3,121.146667,94938310.0,121.136667,120.246633,120.246633
4,131.812619,89997940.0,131.672381,130.464767,130.464767
5,126.784,85590350.0,127.03475,125.82375,125.82375
6,129.958636,73026820.0,129.489545,128.761509,128.761509
7,145.139524,91387050.0,144.569762,143.5382,143.5382
8,148.177727,66513730.0,147.850682,146.865455,146.865455
9,149.577647,85362740.0,150.246471,148.400359,148.400359


## Monthly minumum of Volume, Open, Low, High, and Close/Last values

In [253]:
aapl_df[['Close/Last','Volume','Open','High','Low','Month']].groupby('Month').min()

Unnamed: 0_level_0,Close/Last,Volume,Open,High,Low
Month,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
3,119.9,80819200,120.11,118.86,118.86
4,123.0,66015800,123.66,122.49,122.49
5,122.77,56575920,123.16,122.25,122.25
6,123.54,53522370,124.07,123.13,123.13
7,137.27,52485780,136.6,135.76,135.76
8,145.52,46397670,145.03,144.5,144.5
9,142.94,53477870,143.8,141.27,141.27


#### It shows that when the stock price has lowest price , volume increases. It means that day trading increased. Volume is dependent on the price of low column.

## Monthly maximum of Volume, Open, Low, High, and Close/Last values

In [254]:
aapl_df[['Close/Last','Volume','Open','High','Low','Month']].groupby('Month').max()

Unnamed: 0_level_0,Close/Last,Volume,Open,High,Low
Month,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
3,122.15,118323800,121.65,121.15,121.15
4,134.84,151101000,136.47,134.11,134.11
5,132.54,137564700,132.04,131.83,131.83
6,136.96,108953300,136.17,135.87,135.87
7,149.15,127050800,149.24,147.7,147.7
8,153.12,103558800,152.66,151.29,151.29
9,156.69,140893200,156.98,154.39,154.39


<h1 style="text-align: center;color:red">  Amazon Stock  Data </h1>

<h1 style="text-align: center;color:blue">  Amazon Data Cleaning and Preprocessing</h1>

In [255]:
amzn_data.head()

Unnamed: 0,Date,Close/Last,Volume,Open,High,Low
0,09/24/2021,$3425.52,2116241,$3402.01,$3429.26,$3393.4
1,09/23/2021,$3416,2379414,$3380.05,$3428.96,$3380.05
2,09/22/2021,$3380.05,2411403,$3351,$3389,$3341.05
3,09/21/2021,$3343.63,2780878,$3375,$3379.7,$3332.39
4,09/20/2021,$3355.73,4669130,$3396,$3419,$3305.01


In [256]:
 amzn_data.dtypes # checking data types

Date          object
Close/Last    object
Volume         int64
Open          object
High          object
Low           object
dtype: object

We need to remove $(dollar sign) and convert columns into numerical


### Defining lists for every column

In [257]:
amzn_low_list = []
amzn_closeLast_list = []
amzn_volume_list = []
amzn_open_list = []
amzn_high_list = []

amzn_month_list = []
amzn_day_list = []
amzn_year_list = []

# with amzn_data columns
amzn_low_list = list(amzn_data['Low'])
amzn_date_list = list(amzn_data['Date'])
amzn_closeLast_list = list(amzn_data['Close/Last'])
amzn_volume_list = list(amzn_data['Volume'])
amzn_open_list = list(amzn_data['Open'])
amzn_high_list = list(amzn_data['Low'])


amzn_month_list = list(amzn_data['Date'].map(lambda x: pd.to_datetime(x).month))
amzn_day_list = list(amzn_data['Date'].map(lambda x: pd.to_datetime(x).day))
amzn_year_list = list(amzn_data['Date'].map(lambda x: pd.to_datetime(x).year))





#### Removing dollar sign

In [258]:
# calling remove dollar function
amzn_low_list = remove_dollar(amzn_low_list)
amzn_closeLast_list = remove_dollar(amzn_closeLast_list)
amzn_open_list = remove_dollar(amzn_open_list)
amzn_high_list = remove_dollar(amzn_high_list)

## Creating Clean DataFrame for Amazon from lists

In [259]:
 amzn_df = pd.DataFrame(list(zip( amzn_day_list,amzn_month_list,amzn_year_list,amzn_closeLast_list,amzn_volume_list,amzn_open_list,amzn_high_list,amzn_low_list)),
               columns =['Day','Month','Year', 'Close/Last','Volume','Open','High','Low'])

#### An overview of amazon dataset

In [260]:
amzn_df.head()

Unnamed: 0,Day,Month,Year,Close/Last,Volume,Open,High,Low
0,24,9,2021,3425.52,2116241,3402.01,3393.4,3393.4
1,23,9,2021,3416.0,2379414,3380.05,3380.05,3380.05
2,22,9,2021,3380.05,2411403,3351.0,3341.05,3341.05
3,21,9,2021,3343.63,2780878,3375.0,3332.39,3332.39
4,20,9,2021,3355.73,4669130,3396.0,3305.01,3305.01


#### A statictical view for numerical columns

In [261]:
amzn_df.describe()

Unnamed: 0,Day,Month,Year,Close/Last,Volume,Open,High,Low
count,126.0,126.0,126.0,126.0,126.0,126.0,126.0,126.0
mean,15.968254,6.357143,2021.0,3381.898095,3355987.0,3381.396675,3351.344528,3351.344528
std,8.728745,1.736499,0.0,149.284062,1251255.0,151.039057,145.894676,145.894676
min,1.0,3.0,2021.0,3055.29,1680306.0,3055.439,3028.445,3028.445
25%,9.0,5.0,2021.0,3270.4275,2476100.0,3275.31825,3234.9875,3234.9875
50%,16.0,6.0,2021.0,3369.125,3119550.0,3364.2125,3332.92,3332.92
75%,23.0,8.0,2021.0,3470.38,3784162.0,3477.605,3437.5128,3437.5128
max,31.0,9.0,2021.0,3731.41,9965593.0,3744.0,3696.7929,3696.7929


#### Checking if any column has null value

In [262]:
amzn_df.isnull().sum()

Day           0
Month         0
Year          0
Close/Last    0
Volume        0
Open          0
High          0
Low           0
dtype: int64

<h1 style="text-align: center;color:blue">  Useful Insights from Amazon Stock market</h1>

## Monthly average of Volume, Open, Low, High, and Close/Last values

In [263]:
amzn_df[['Close/Last','Volume','Open','High','Low','Month']].groupby('Month').mean()

Unnamed: 0_level_0,Close/Last,Volume,Open,High,Low
Month,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
3,3075.033333,2725822.0,3063.169667,3041.648333,3041.648333
4,3352.174286,3659218.0,3347.732524,3320.005529,3320.005529
5,3246.26,3757665.0,3261.30826,3221.329545,3221.329545
6,3367.725455,3045961.0,3360.013409,3339.959995,3339.959995
7,3616.00619,3987445.0,3612.711048,3575.027781,3575.027781
8,3312.917727,2857590.0,3310.760682,3283.299818,3283.299818
9,3450.76,2886210.0,3453.762459,3424.144659,3424.144659


## Monthly minumum of Volume, Open, Low, High, and Close/Last values

In [264]:
amzn_df[['Close/Last','Volume','Open','High','Low','Month']].groupby('Month').min()

Unnamed: 0_level_0,Close/Last,Volume,Open,High,Low
Month,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
3,3055.29,2337603,3055.439,3028.445,3028.445
4,3161.0,2211166,3117.94,3115.55,3115.55
5,3151.94,2331509,3136.28,3127.37,3127.37
6,3187.01,2014524,3197.33,3172.2,3172.2
7,3327.59,2037053,3347.95,3306.98,3306.98
8,3187.75,1680306,3194.02,3175.76,3175.76
9,3343.63,1936897,3351.0,3305.01,3305.01


#### It shows that when the stock price has lowest price , volume increases. It means that day trading increased. Volume is dependent on the price of low column.

## Monthly maximum of Volume, Open, Low, High, and Close/Last values

In [265]:
amzn_df[['Close/Last','Volume','Open','High','Low','Month']].groupby('Month').max()

Unnamed: 0_level_0,Close/Last,Volume,Open,High,Low
Month,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
3,3094.08,3093896,3070.01,3062.5,3062.5
4,3471.31,7682381,3525.12,3462.5,3462.5
5,3386.49,5875530,3484.73,3372.7012,3372.7012
6,3505.44,5247737,3507.64,3483.2,3483.2
7,3731.41,9965593,3744.0,3696.7929,3696.7929
8,3470.79,4356413,3424.8,3395.59,3395.59
9,3525.5,4669130,3526.02,3495.67,3495.67


<br>

<h1 style="text-align: center;color:red">  IBM Stock  Data </h1>


In [266]:
ibm_data.head()

Unnamed: 0,Date,Close/Last,Volume,Open,High,Low
0,09/24/2021,$137.49,2964397,$137.03,$138.48,$136.75
1,09/23/2021,$136.73,3013238,$135.25,$137.42,$135.03
2,09/22/2021,$134.63,3602416,$133.72,$135.37,$133.47
3,09/21/2021,$132.97,4074528,$135.11,$135.65,$132.94
4,09/20/2021,$134.31,4770651,$133.9,$135.18,$132.78


<h1 style="text-align: center;color:blue">  IBM Data Cleaning and Preprocessing</h1>

In [267]:
ibm_data.head()

Unnamed: 0,Date,Close/Last,Volume,Open,High,Low
0,09/24/2021,$137.49,2964397,$137.03,$138.48,$136.75
1,09/23/2021,$136.73,3013238,$135.25,$137.42,$135.03
2,09/22/2021,$134.63,3602416,$133.72,$135.37,$133.47
3,09/21/2021,$132.97,4074528,$135.11,$135.65,$132.94
4,09/20/2021,$134.31,4770651,$133.9,$135.18,$132.78


In [268]:
 ibm_data.dtypes # checking data types

Date          object
Close/Last    object
Volume         int64
Open          object
High          object
Low           object
dtype: object

We need to remove $(dollar sign) and convert columns into numerical

### Defining lists for every column

In [269]:
ibm_low_list = []
ibm_closeLast_list = []
ibm_volume_list = []
ibm_open_list = []
ibm_high_list = []

ibm_month_list = []
ibm_day_list = []
ibm_year_list = []

# with ibm_data columns
ibm_low_list = list(ibm_data['Low'])
ibm_date_list = list(ibm_data['Date'])
ibm_closeLast_list = list(ibm_data['Close/Last'])
ibm_volume_list = list(ibm_data['Volume'])
ibm_open_list = list(ibm_data['Open'])
ibm_high_list = list(ibm_data['Low'])


ibm_month_list = list(ibm_data['Date'].map(lambda x: pd.to_datetime(x).month))
ibm_day_list = list(ibm_data['Date'].map(lambda x: pd.to_datetime(x).day))
ibm_year_list = list(ibm_data['Date'].map(lambda x: pd.to_datetime(x).year))

#### Removing dollar sign

In [270]:
# calling remove dollar function
ibm_low_list = remove_dollar(ibm_low_list)
ibm_closeLast_list = remove_dollar(ibm_closeLast_list)
ibm_open_list = remove_dollar(ibm_open_list)
ibm_high_list = remove_dollar(ibm_high_list)

## Creating Clean DataFrame for IBM from lists

In [271]:
 ibm_df = pd.DataFrame(list(zip( ibm_day_list,ibm_month_list,ibm_year_list,ibm_closeLast_list,ibm_volume_list,ibm_open_list,ibm_high_list,ibm_low_list)),
               columns =['Day','Month','Year', 'Close/Last','Volume','Open','High','Low'])

#### An overview of IBM dataset

In [272]:
ibm_df.head()

Unnamed: 0,Day,Month,Year,Close/Last,Volume,Open,High,Low
0,24,9,2021,137.49,2964397,137.03,136.75,136.75
1,23,9,2021,136.73,3013238,135.25,135.03,135.03
2,22,9,2021,134.63,3602416,133.72,133.47,133.47
3,21,9,2021,132.97,4074528,135.11,132.94,132.94
4,20,9,2021,134.31,4770651,133.9,132.78,132.78


#### A statictical view for numerical columns

In [273]:
ibm_df.describe()

Unnamed: 0,Day,Month,Year,Close/Last,Volume,Open,High,Low
count,126.0,126.0,126.0,126.0,126.0,126.0,126.0,126.0
mean,15.968254,6.357143,2021.0,141.347302,4429673.0,141.401151,140.375849,140.375849
std,8.728745,1.736499,0.0,4.423316,2393538.0,4.392897,4.364116,4.364116
min,1.0,3.0,2021.0,131.18,1910443.0,131.305,130.38,130.38
25%,9.0,5.0,2021.0,138.6975,3048210.0,138.4725,137.56625,137.56625
50%,16.0,6.0,2021.0,141.47,3886453.0,141.68,140.8805,140.8805
75%,23.0,8.0,2021.0,144.235,4767610.0,144.47,143.61,143.61
max,31.0,9.0,2021.0,151.28,16828160.0,151.47,150.37,150.37


#### Checking if any column has null value

In [274]:
ibm_df.isnull().sum()

Day           0
Month         0
Year          0
Close/Last    0
Volume        0
Open          0
High          0
Low           0
dtype: int64

<h1 style="text-align: center;color:blue">  Useful Insights from IBM Stock market</h1>

## Monthly average of Volume, Open, Low, High, and Close/Last values

In [275]:
ibm_df[['Close/Last','Volume','Open','High','Low','Month']].groupby('Month').mean()

Unnamed: 0_level_0,Close/Last,Volume,Open,High,Low
Month,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
3,134.613333,4786115.0,135.46,134.08,134.08
4,137.379524,5853357.0,137.165,136.149048,136.149048
5,144.521,4832958.0,144.349,143.47621,143.47621
6,147.078636,3838590.0,147.050909,146.046914,146.046914
7,140.929048,5268529.0,141.33619,139.929176,139.929176
8,141.161364,3167661.0,141.12,140.328709,140.328709
9,137.043529,3495536.0,137.347059,136.334494,136.334494


## Monthly minumum of Volume, Open, Low, High, and Close/Last values

In [276]:
ibm_df[['Close/Last','Volume','Open','High','Low','Month']].groupby('Month').min()

Unnamed: 0_level_0,Close/Last,Volume,Open,High,Low
Month,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
3,133.26,4622664,134.54,132.71,132.71
4,131.18,2976136,131.305,130.38,130.38
5,141.3,2534811,141.45,140.92,140.92
6,143.12,2417455,144.11,143.04,143.04
7,137.92,2544099,136.45,136.2089,136.2089
8,138.02,1910443,137.74,137.21,137.21
9,132.97,1924215,133.72,132.78,132.78


#### It shows that when the stock price has lowest price , volume increases. It means that day trading increased. Volume is dependent on the price of low column.

## Monthly maximum of Volume, Open, Low, High, and Close/Last values

In [277]:
ibm_df[['Close/Last','Volume','Open','High','Low','Month']].groupby('Month').max()

Unnamed: 0_level_0,Close/Last,Volume,Open,High,Low
Month,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
3,135.86,4945315,135.98,135.51,135.51
4,144.24,15480580,144.13,142.98,142.98
5,148.42,7503487,145.94,145.8,145.8
6,151.28,9156505,151.47,150.37,150.37
7,146.84,16828160,146.96,146.57,146.57
8,144.09,5299869,143.8,142.89,142.89
9,140.01,5633480,139.98,139.3,139.3


<br><br>

<h1 style="text-align: center;color:red"> Microsoft Stock Market Data </h1>

In [278]:
msft_data.head()

Unnamed: 0,Date,Close/Last,Volume,Open,High,Low
0,09/24/2021,$299.35,14998980,$298.23,$299.8,$296.93
1,09/23/2021,$299.56,18604600,$298.845,$300.9,$297.5339
2,09/22/2021,$298.58,26626340,$296.725,$300.22,$294.51
3,09/21/2021,$294.8,22364100,$295.69,$297.54,$294.07
4,09/20/2021,$294.3,38278660,$296.33,$298.72,$289.52


<h1 style="text-align: center;color:blue">  Microsoft Data Cleaning and Preprocessing</h1>

In [279]:
msft_data.head()

Unnamed: 0,Date,Close/Last,Volume,Open,High,Low
0,09/24/2021,$299.35,14998980,$298.23,$299.8,$296.93
1,09/23/2021,$299.56,18604600,$298.845,$300.9,$297.5339
2,09/22/2021,$298.58,26626340,$296.725,$300.22,$294.51
3,09/21/2021,$294.8,22364100,$295.69,$297.54,$294.07
4,09/20/2021,$294.3,38278660,$296.33,$298.72,$289.52


In [280]:
msft_data.dtypes # checking data types

Date          object
Close/Last    object
Volume         int64
Open          object
High          object
Low           object
dtype: object

We need to remove $(dollar sign) and convert columns into numerical

### Defining lists for every column

In [281]:
msft_low_list = []
msft_closeLast_list = []
msft_volume_list = []
msft_open_list = []
msft_high_list = []

msft_month_list = []
msft_day_list = []
msft_year_list = []

# with msft_data columns
msft_low_list = list(msft_data['Low'])
msft_date_list = list(msft_data['Date'])
msft_closeLast_list = list(msft_data['Close/Last'])
msft_volume_list = list(msft_data['Volume'])
msft_open_list = list(msft_data['Open'])
msft_high_list = list(msft_data['Low'])


msft_month_list = list(msft_data['Date'].map(lambda x: pd.to_datetime(x).month))
msft_day_list = list(msft_data['Date'].map(lambda x: pd.to_datetime(x).day))
msft_year_list = list(msft_data['Date'].map(lambda x: pd.to_datetime(x).year))

#### Removing dollar sign

In [282]:
# calling remove dollar function
msft_low_list = remove_dollar(msft_low_list)
msft_closeLast_list = remove_dollar(msft_closeLast_list)
msft_open_list = remove_dollar(msft_open_list)
msft_high_list = remove_dollar(msft_high_list)

## Creating Clean DataFrame for Microsoft from lists

In [283]:
msft_df = pd.DataFrame(list(zip( msft_day_list,msft_month_list,msft_year_list,msft_closeLast_list,msft_volume_list,msft_open_list,msft_high_list,msft_low_list)),
               columns =['Day','Month','Year', 'Close/Last','Volume','Open','High','Low'])

#### An overview of Microsoft dataset

In [284]:
msft_df.head()

Unnamed: 0,Day,Month,Year,Close/Last,Volume,Open,High,Low
0,24,9,2021,299.35,14998980,298.23,296.93,296.93
1,23,9,2021,299.56,18604600,298.845,297.5339,297.5339
2,22,9,2021,298.58,26626340,296.725,294.51,294.51
3,21,9,2021,294.8,22364100,295.69,294.07,294.07
4,20,9,2021,294.3,38278660,296.33,289.52,289.52


#### A statictical view for numerical columns

In [285]:
msft_df.describe()

Unnamed: 0,Day,Month,Year,Close/Last,Volume,Open,High,Low
count,126.0,126.0,126.0,126.0,126.0,126.0,126.0,126.0
mean,15.968254,6.357143,2021.0,271.3775,23868750.0,271.028111,269.176551,269.176551
std,8.728745,1.736499,0.0,20.960726,6418192.0,21.090994,20.894985,20.894985
min,1.0,3.0,2021.0,231.85,13900170.0,232.91,231.1,231.1
25%,9.0,5.0,2021.0,252.4725,19526720.0,252.305,250.895,250.895
50%,16.0,6.0,2021.0,267.705,23075060.0,266.2075,265.69,265.69
75%,23.0,8.0,2021.0,289.6325,26241520.0,288.9975,287.37835,287.37835
max,31.0,9.0,2021.0,305.22,46903120.0,305.02,302.0035,302.0035


#### Checking if any column has null value

In [286]:
msft_df.isnull().sum()

Day           0
Month         0
Year          0
Close/Last    0
Volume        0
Open          0
High          0
Low           0
dtype: int64

<h1 style="text-align: center;color:blue">  Useful Insights from Microsoft Stock market</h1>

## Monthly average of Volume, Open, Low, High, and Close/Last values

In [287]:
msft_df[['Close/Last','Volume','Open','High','Low','Month']].groupby('Month').mean()

Unnamed: 0_level_0,Close/Last,Volume,Open,High,Low
Month,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
3,234.286667,31214310.0,234.341667,231.79,231.79
4,255.58,27080870.0,254.69981,253.231548,253.231548
5,247.3955,24752880.0,247.6897,245.737,245.737
6,259.018409,23116920.0,258.317727,256.875195,256.875195
7,281.502381,24889960.0,281.022667,278.994395,278.994395
8,294.314091,20060290.0,293.607545,291.877605,291.877605
9,299.455882,22204460.0,300.011471,297.460529,297.460529


## Monthly minumum of Volume, Open, Low, High, and Close/Last values

In [288]:
msft_df[['Close/Last','Volume','Open','High','Low','Month']].groupby('Month').min()

Unnamed: 0_level_0,Close/Last,Volume,Open,High,Low
Month,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
3,231.85,24792010,232.91,231.1,231.1
4,242.35,19722880,238.47,238.0501,238.0501
5,239.0,17704300,239.31,238.07,238.07
6,245.71,17937630,245.22,243.0,243.0
7,271.6,16725320,269.61,269.6,269.6
8,284.82,13900170,285.42,283.74,283.74
9,294.3,14751610,295.69,289.52,289.52


#### It shows that when the stock price has lowest price , volume increases. It means that day trading increased. Volume is dependent on the price of low column.

In [289]:
msft_df[['Close/Last','Volume','Open','High','Low','Month']].groupby('Month').max()

Unnamed: 0_level_0,Close/Last,Volume,Open,High,Low
Month,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
3,235.77,43623470,236.59,232.39,232.39
4,261.97,46903120,261.66,260.17,260.17
5,252.46,36684370,253.4,251.17,251.17
6,271.4,37202220,270.69,269.6043,269.6043
7,289.67,33604070,289.43,286.642,286.642
8,304.65,40817650,305.02,302.0035,302.0035
9,305.22,41372460,304.17,301.82,301.82


<br> <br>

<h1 style="text-align: center;color:red">  Tesla Stock Market Data </h1>

In [290]:
tsla_data.head()

Unnamed: 0,Date,Close/Last,Volume,Open,High,Low
0,09/24/2021,$774.39,21373020,$745.89,$774.8,$744.56
1,09/23/2021,$753.64,11947530,$755,$758.2,$747.92
2,09/22/2021,$751.94,15126270,$743.5263,$753.6699,$739.12
3,09/21/2021,$739.38,16330720,$734.79,$744.7399,$730.44
4,09/20/2021,$730.17,24757650,$734.5577,$742,$718.6249


<h1 style="text-align: center;color:blue">  Tesla Data Cleaning and Preprocessing</h1>

In [291]:
tsla_data.head()

Unnamed: 0,Date,Close/Last,Volume,Open,High,Low
0,09/24/2021,$774.39,21373020,$745.89,$774.8,$744.56
1,09/23/2021,$753.64,11947530,$755,$758.2,$747.92
2,09/22/2021,$751.94,15126270,$743.5263,$753.6699,$739.12
3,09/21/2021,$739.38,16330720,$734.79,$744.7399,$730.44
4,09/20/2021,$730.17,24757650,$734.5577,$742,$718.6249


In [292]:
tsla_data.dtypes # checking data types

Date          object
Close/Last    object
Volume         int64
Open          object
High          object
Low           object
dtype: object

We need to remove $(dollar sign) and convert columns into numerical


### Defining lists for every column

In [293]:
tsla_low_list = []
tsla_closeLast_list = []
tsla_volume_list = []
tsla_open_list = []
tsla_high_list = []

tsla_month_list = []
tsla_day_list = []
tsla_year_list = []

# with tsla_data columns
tsla_low_list = list(tsla_data['Low'])
tsla_date_list = list(tsla_data['Date'])
tsla_closeLast_list = list(tsla_data['Close/Last'])
tsla_volume_list = list(tsla_data['Volume'])
tsla_open_list = list(tsla_data['Open'])
tsla_high_list = list(tsla_data['Low'])


tsla_month_list = list(tsla_data['Date'].map(lambda x: pd.to_datetime(x).month))
tsla_day_list = list(tsla_data['Date'].map(lambda x: pd.to_datetime(x).day))
tsla_year_list = list(tsla_data['Date'].map(lambda x: pd.to_datetime(x).year))

#### Removing dollar sign

In [294]:
# calling remove dollar function
tsla_low_list = remove_dollar(tsla_low_list)
tsla_closeLast_list = remove_dollar(tsla_closeLast_list)
tsla_open_list = remove_dollar(tsla_open_list)
tsla_high_list = remove_dollar(tsla_high_list)

## Creating Clean DataFrame for Tesla from lists

In [295]:
 tsla_df = pd.DataFrame(list(zip( tsla_day_list,tsla_month_list,tsla_year_list,tsla_closeLast_list,tsla_volume_list,tsla_open_list,tsla_high_list,tsla_low_list)),
               columns =['Day','Month','Year', 'Close/Last','Volume','Open','High','Low'])

#### An overview of Tesla dataset

In [296]:
tsla_df.head()#### An overview of tesla dataset.head()

Unnamed: 0,Day,Month,Year,Close/Last,Volume,Open,High,Low
0,24,9,2021,774.39,21373020,745.89,744.56,744.56
1,23,9,2021,753.64,11947530,755.0,747.92,747.92
2,22,9,2021,751.94,15126270,743.5263,739.12,739.12
3,21,9,2021,739.38,16330720,734.79,730.44,730.44
4,20,9,2021,730.17,24757650,734.5577,718.6249,718.6249


#### A statictical view for numerical columns

In [297]:
tsla_df.describe()

Unnamed: 0,Day,Month,Year,Close/Last,Volume,Open,High,Low
count,126.0,126.0,126.0,126.0,126.0,126.0,126.0,126.0
mean,15.968254,6.357143,2021.0,674.67127,24232350.0,674.011193,662.674837,662.674837
std,8.728745,1.736499,0.0,52.448037,8535515.0,52.422448,52.663529,52.663529
min,1.0,3.0,2021.0,563.46,9800558.0,552.55,546.98,546.98
25%,9.0,5.0,2021.0,632.0425,17456400.0,628.4025,620.735,620.735
50%,16.0,6.0,2021.0,677.635,22862900.0,679.375,668.1701,668.1701
75%,23.0,8.0,2021.0,714.4125,29361670.0,713.67,704.1575,704.1575
max,31.0,9.0,2021.0,774.39,49017430.0,770.7,751.6301,751.6301


#### Checking if any column has null value

In [298]:
tsla_df.isnull().sum()

Day           0
Month         0
Year          0
Close/Last    0
Volume        0
Open          0
High          0
Low           0
dtype: int64

<h1 style="text-align: center;color:blue">  Useful Insights from Tesla Stock market</h1>

## Monthly average of Volume, Open, Low, High, and Close/Last values

In [299]:
tsla_df[['Close/Last','Volume','Open','High','Low','Month']].groupby('Month').mean()

Unnamed: 0_level_0,Close/Last,Volume,Open,High,Low
Month,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
3,638.28,33802210.0,621.336667,609.38,609.38
4,709.618095,32313520.0,709.282857,695.203605,695.203605
5,616.753,31251670.0,618.7184,604.396305,604.396305
6,626.919545,23632810.0,626.255818,617.057732,617.057732
7,659.134762,21356800.0,659.522967,647.039924,647.039924
8,705.243182,17362420.0,704.433409,695.659123,695.659123
9,747.487647,17521450.0,745.114529,736.122265,736.122265


## Monthly minumum of Volume, Open, Low, High, and Close/Last values

In [300]:
tsla_df[['Close/Last','Volume','Open','High','Low','Month']].groupby('Month').min()

Unnamed: 0_level_0,Close/Last,Volume,Open,High,Low
Month,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
3,611.29,28636990,601.75,591.01,591.01
4,661.75,21437090,667.59,659.42,659.42
5,563.46,21901890,552.55,546.98,546.98
6,572.84,16205300,579.71,571.22,571.22
7,643.38,13953340,628.37,620.46,620.46
8,665.71,9800558,669.748,648.84,648.84
9,730.17,11947530,732.249,708.85,708.85


#### It shows that when the stock price has lowest price , volume increases. It means that day trading increased. Volume is dependent on the price of low column.

## Monthly maximum of Volume, Open, Low, High, and Close/Last values

In [301]:
tsla_df[['Close/Last','Volume','Open','High','Low','Month']].groupby('Month').max()

Unnamed: 0_level_0,Close/Last,Volume,Open,High,Low
Month,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
3,667.93,39432360,646.62,641.11,641.11
4,762.32,49017430,770.7,732.6053,732.6053
5,684.9,46503900,703.8,680.5,680.5
6,688.72,45982390,689.58,678.14,678.14
7,687.2,32813290,686.32,673.26,673.26
8,735.72,33615770,733.0,726.44,726.44
9,774.39,28204180,761.58,751.6301,751.6301
