## Stock Market Dataset 

__Overview__

This dataset contains historical daily prices for all tickers currently trading on NASDAQ. The up-to-date list is available from nasdaqtrader.com. The historical data is retrieved from Yahoo Finance via the yfinance Python package.

It contains prices up to 20 February 2024.

__Data Structure__

The data for every symbol is saved in CSV format with the following common fields:

- Date - specifies trading date
- Open - opening price
- High - maximum price during the day
- Low - minimum price during the day
- Close - close price adjusted for splits
- Adj Close - close price adjusted for both dividends and splits
- Volume - the number of shares that changed hands during a given day
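Given these definitions, every row should satisfy Low <= Open <= High and Low <= Close <= High. A minimal sketch of that sanity check, assuming the column names listed above; the function name `ohlc_violations` and the two-row frame are made up for illustration (the second row is deliberately inconsistent):

```python
import pandas as pd


def ohlc_violations(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows where Open or Close falls outside the day's [Low, High] range."""
    lo, hi = df["Low"], df["High"]
    ok = (
        (lo <= df["Open"]) & (df["Open"] <= hi)
        & (lo <= df["Close"]) & (df["Close"] <= hi)
        & (lo <= hi)
    )
    return df[~ok]


# Illustrative data only: row 1 has Close above High on purpose
sample = pd.DataFrame({
    "Open":  [10.0, 11.0],
    "High":  [10.5, 11.2],
    "Low":   [9.8, 10.9],
    "Close": [10.2, 11.5],
})
bad = ohlc_violations(sample)
print(bad.index.tolist())  # [1]
```

Rows with a missing price are flagged as well, because comparisons involving NaN evaluate to False.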

All ticker files are stored in either the ETFs or the stocks folder, depending on the instrument type, and each filename is the corresponding ticker symbol (e.g. AAPL.csv). Finally, symbols_valid_meta.csv contains additional metadata for each ticker, such as its full name.

### EDA and Data Manipulation Using Pandas

In [1]:
#importing library
import pandas as pd


In [2]:
# Reading the CSV data files
data_dir = "D:/Datasets and projects/Stock Data/stocks/"
apple = pd.read_csv(data_dir + "AAPL.csv")
facebook = pd.read_csv(data_dir + "FB.csv")
google = pd.read_csv(data_dir + "GOOGL.csv")
nvidia = pd.read_csv(data_dir + "NVDA.csv")
tesla = pd.read_csv(data_dir + "TSLA.csv")
amazon = pd.read_csv(data_dir + "AMZN.csv")
goldmansachs = pd.read_csv(data_dir + "GS.csv")
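The cell above reads each file with a default integer index and keeps Date as a plain string column. As an alternative sketch (not what the rest of this notebook does), the hypothetical helper `load_prices` parses Date as a datetime and uses it as the index, which enables date-based selection such as `.loc["2019"]`. To stay self-contained it reads two AAPL rows from this notebook's output out of an in-memory string:

```python
import io

import pandas as pd


def load_prices(source) -> pd.DataFrame:
    """Read one ticker CSV with Date parsed as datetime and used as the index."""
    return pd.read_csv(source, parse_dates=["Date"], index_col="Date")


# Two AAPL rows taken from this notebook's output, held in memory for the demo
sample = io.StringIO(
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "1980-12-12,0.513393,0.515625,0.513393,0.513393,0.406782,117258400\n"
    "1980-12-15,0.488839,0.488839,0.486607,0.486607,0.385558,43971200\n"
)
aapl = load_prices(sample)
print(aapl.index.dtype)         # a datetime64 dtype rather than object (strings)
print(aapl["Close"].iloc[-1])   # close on 1980-12-15
```

Against the real files it could be called in a loop, for example `{t: load_prices(f"D:/Datasets and projects/Stock Data/stocks/{t}.csv") for t in ["AAPL", "FB", "GOOGL"]}`.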

In [3]:
#Checking Data Head
apple.head(5)

,Date,Open,High,Low,Close,Adj Close,Volume
0,1980-12-12,0.513393,0.515625,0.513393,0.513393,0.406782,117258400
1,1980-12-15,0.488839,0.488839,0.486607,0.486607,0.385558,43971200
2,1980-12-16,0.453125,0.453125,0.450893,0.450893,0.35726,26432000
3,1980-12-17,0.462054,0.464286,0.462054,0.462054,0.366103,21610400
4,1980-12-18,0.475446,0.477679,0.475446,0.475446,0.376715,18362400


In [4]:
facebook.head(5)

,Date,Open,High,Low,Close,Adj Close,Volume
0,2012-05-18,42.049999,45.0,38.0,38.23,38.23,573576400
1,2012-05-21,36.529999,36.66,33.0,34.029999,34.029999,168192700
2,2012-05-22,32.610001,33.59,30.940001,31.0,31.0,101786600
3,2012-05-23,31.370001,32.5,31.360001,32.0,32.0,73600000
4,2012-05-24,32.950001,33.209999,31.77,33.029999,33.029999,50237200


In [5]:
google.head(5)

,Date,Open,High,Low,Close,Adj Close,Volume
0,2004-08-19,50.050049,52.082081,48.028027,50.220219,50.220219,44659000
1,2004-08-20,50.555557,54.594593,50.300301,54.209209,54.209209,22834300
2,2004-08-23,55.430431,56.796795,54.579578,54.754753,54.754753,18256100
3,2004-08-24,55.675674,55.855854,51.836838,52.487488,52.487488,15247300
4,2004-08-25,52.532532,54.054054,51.991993,53.053055,53.053055,9188600


In [6]:
nvidia.head(5)

,Date,Open,High,Low,Close,Adj Close,Volume
0,1999-01-22,1.75,1.953125,1.552083,1.640625,1.509998,67867200.0
1,1999-01-25,1.770833,1.833333,1.640625,1.8125,1.668188,12762000.0
2,1999-01-26,1.833333,1.869792,1.645833,1.671875,1.538759,8580000.0
3,1999-01-27,1.677083,1.71875,1.583333,1.666667,1.533965,6109200.0
4,1999-01-28,1.666667,1.677083,1.651042,1.661458,1.529172,5688000.0


In [7]:
tesla.head(5)

,Date,Open,High,Low,Close,Adj Close,Volume
0,2010-06-29,19.0,25.0,17.540001,23.889999,23.889999,18766300
1,2010-06-30,25.790001,30.42,23.299999,23.83,23.83,17187100
2,2010-07-01,25.0,25.92,20.27,21.959999,21.959999,8218800
3,2010-07-02,23.0,23.1,18.709999,19.200001,19.200001,5139800
4,2010-07-06,20.0,20.0,15.83,16.110001,16.110001,6866900


In [8]:
amazon.head(5)

,Date,Open,High,Low,Close,Adj Close,Volume
0,1997-05-15,2.4375,2.5,1.927083,1.958333,1.958333,72156000
1,1997-05-16,1.96875,1.979167,1.708333,1.729167,1.729167,14700000
2,1997-05-19,1.760417,1.770833,1.625,1.708333,1.708333,6106800
3,1997-05-20,1.729167,1.75,1.635417,1.635417,1.635417,5467200
4,1997-05-21,1.635417,1.645833,1.375,1.427083,1.427083,18853200


In [9]:
goldmansachs.head(5)

,Date,Open,High,Low,Close,Adj Close,Volume
0,1999-05-04,76.0,77.25,70.0,70.375,55.818161,22320900
1,1999-05-05,69.875,69.875,66.25,69.125,54.826698,7565700
2,1999-05-06,68.0,69.375,67.0625,67.9375,53.884857,2905700
3,1999-05-07,67.9375,74.875,66.75,74.125,58.792484,4862300
4,1999-05-10,73.375,73.5,70.25,70.6875,56.066021,2589400


In [10]:
#Creating a list of DataFrames
dfs = [apple,facebook,google,nvidia,tesla,amazon,goldmansachs]

__SMA (Simple Moving Average) is usually calculated on the closing price of a financial instrument.
This means that the SMA for a given period (e.g., 50 days) is the average of the closing prices over the last 50 trading days, including the current one. Other price points such as the open, high, or low are not typically used for calculating the SMA.__
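Written out, the n-day SMA on day t averages the n most recent closing prices C:

$$\mathrm{SMA}_n(t) = \frac{1}{n}\sum_{i=0}^{n-1} C_{t-i}$$

pandas' `Series.rolling(n).mean()` computes this window (the current row plus the n-1 rows before it). A toy check with n = 3 on made-up prices, comparing the last value against the formula by hand:

```python
import pandas as pd

# Toy closing prices, made up for illustration (not from the dataset)
close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])

# Each value averages the current row and the two rows before it;
# the first n-1 = 2 rows have an incomplete window and come out as NaN
sma3 = close.rolling(3).mean()

# Manual application of the formula for the last row: (12 + 13 + 14) / 3
manual_last = close.iloc[-3:].sum() / 3

print(sma3.tolist())  # [nan, nan, 11.0, 12.0, 13.0]
print(manual_last)    # 13.0
```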

In [11]:
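# Running loops to calculate SMA50 and SMA200 and insert them into each DataFrame.
# rolling(n) needs n rows, so the first 49 (SMA50) and first 199 (SMA200) values are NaN.

for df in dfs:
    df["SMA50"] = df.Close.rolling(50).mean()
    df["SMA200"] = df.Close.rolling(200).mean()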

In [12]:
#Checking Data again
apple.head(10)

,Date,Open,High,Low,Close,Adj Close,Volume,SMA50,SMA200
0,1980-12-12,0.513393,0.515625,0.513393,0.513393,0.406782,117258400,,
1,1980-12-15,0.488839,0.488839,0.486607,0.486607,0.385558,43971200,,
2,1980-12-16,0.453125,0.453125,0.450893,0.450893,0.35726,26432000,,
3,1980-12-17,0.462054,0.464286,0.462054,0.462054,0.366103,21610400,,
4,1980-12-18,0.475446,0.477679,0.475446,0.475446,0.376715,18362400,,
5,1980-12-19,0.504464,0.506696,0.504464,0.504464,0.399707,12157600,,
6,1980-12-22,0.529018,0.53125,0.529018,0.529018,0.419162,9340800,,
7,1980-12-23,0.551339,0.553571,0.551339,0.551339,0.436848,11737600,,
8,1980-12-24,0.580357,0.582589,0.580357,0.580357,0.45984,12000800,,
9,1980-12-26,0.633929,0.636161,0.633929,0.633929,0.502287,13893600,,


In [13]:
#Checking for Data Integrity in SMA50 and SMA200 columns (SMA200 first appears at row 199)
print(apple.iloc[200:210])


           Date      Open      High       Low     Close  Adj Close    Volume  \
200  1981-09-30  0.272321  0.274554  0.272321  0.272321   0.215771  12499200   
201  1981-10-01  0.272321  0.274554  0.272321  0.272321   0.215771  15282400   
202  1981-10-02  0.294643  0.296875  0.294643  0.294643   0.233457  11261600   
203  1981-10-05  0.303571  0.308036  0.303571  0.303571   0.240532  10774400   
204  1981-10-06  0.303571  0.303571  0.301339  0.301339   0.238763   7089600   
205  1981-10-07  0.319196  0.323661  0.319196  0.319196   0.252912   9710400   
206  1981-10-08  0.330357  0.332589  0.330357  0.330357   0.261755   7772800   
207  1981-10-09  0.332589  0.337054  0.332589  0.332589   0.263524  13630400   
208  1981-10-12  0.343750  0.345982  0.343750  0.343750   0.272367   6837600   
209  1981-10-13  0.343750  0.348214  0.343750  0.343750   0.272367  11048800   

        SMA50    SMA200  
200  0.371964  0.470391  
201  0.368839  0.469319  
202  0.366652  0.468538  
203  0.364420  
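The printed rows show that both averages are populated by row 200. A small programmatic version of the same check, as a sketch: the helper `check_sma` is hypothetical and verifies that a rolling SMA is NaN up to row window-1 (row 49 for SMA50, row 199 for SMA200) and that its first value equals the plain mean of the first full window. The closes here are synthetic:

```python
import pandas as pd


def check_sma(close: pd.Series, sma: pd.Series, window: int) -> bool:
    """True if sma starts at row window-1 and its first value matches a manual mean."""
    first = int(sma.notna().to_numpy().argmax())  # position of the first non-NaN value
    if first != window - 1:
        return False
    manual = close.iloc[:window].mean()           # average of the first full window
    return bool(abs(sma.iloc[window - 1] - manual) < 1e-9)


# Synthetic closes 1.0 .. 300.0 (not from the dataset)
close = pd.Series(range(1, 301), dtype=float)
print(check_sma(close, close.rolling(50).mean(), 50))    # True
print(check_sma(close, close.rolling(200).mean(), 200))  # True
```

On the notebook's data the same call would take the form `check_sma(apple.Close, apple.SMA200, 200)`.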

In [14]:
#Calculating Previous Day Close

for df in dfs:
    df['Prev. Day Close'] = df.Close.shift(1)
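Note that `shift(1)` works on row position, not on the calendar: because the files only contain trading days, the "previous day" for a Monday is the preceding Friday. A small sketch using the first two AAPL dates and closes from this notebook (a Friday and a Monday):

```python
import pandas as pd

# 1980-12-12 is a Friday and 1980-12-15 a Monday (AAPL rows from this notebook)
prices = pd.DataFrame(
    {"Close": [0.513393, 0.486607]},
    index=pd.to_datetime(["1980-12-12", "1980-12-15"]),
)

# shift(1) moves values down one row, so Monday's previous close is Friday's close;
# the first row has no predecessor and becomes NaN
prices["Prev. Day Close"] = prices["Close"].shift(1)
print(prices)
```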

In [15]:
apple.head(10)

,Date,Open,High,Low,Close,Adj Close,Volume,SMA50,SMA200,Prev. Day Close
0,1980-12-12,0.513393,0.515625,0.513393,0.513393,0.406782,117258400,,,
1,1980-12-15,0.488839,0.488839,0.486607,0.486607,0.385558,43971200,,,0.513393
2,1980-12-16,0.453125,0.453125,0.450893,0.450893,0.35726,26432000,,,0.486607
3,1980-12-17,0.462054,0.464286,0.462054,0.462054,0.366103,21610400,,,0.450893
4,1980-12-18,0.475446,0.477679,0.475446,0.475446,0.376715,18362400,,,0.462054
5,1980-12-19,0.504464,0.506696,0.504464,0.504464,0.399707,12157600,,,0.475446
6,1980-12-22,0.529018,0.53125,0.529018,0.529018,0.419162,9340800,,,0.504464
7,1980-12-23,0.551339,0.553571,0.551339,0.551339,0.436848,11737600,,,0.529018
8,1980-12-24,0.580357,0.582589,0.580357,0.580357,0.45984,12000800,,,0.551339
9,1980-12-26,0.633929,0.636161,0.633929,0.633929,0.502287,13893600,,,0.580357


In [16]:
# Calculating Change in price

for df in dfs:
    df["Change in Price"] = df["Close"] - df["Prev. Day Close"]
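`Close - Prev. Day Close` is the first difference of the closing price, which pandas also provides directly as `Series.diff()`. A quick check on the first three AAPL closes from this notebook that both approaches give the same result:

```python
import pandas as pd

close = pd.Series([0.513393, 0.486607, 0.450893])  # first three AAPL closes

via_shift = close - close.shift(1)  # the approach used in this notebook
via_diff = close.diff()             # same computation in a single call

print(via_diff.tolist())
print(via_shift.equals(via_diff))   # True
```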

In [17]:
apple.head(10)

,Date,Open,High,Low,Close,Adj Close,Volume,SMA50,SMA200,Prev. Day Close,Change in Price
0,1980-12-12,0.513393,0.515625,0.513393,0.513393,0.406782,117258400,,,,
1,1980-12-15,0.488839,0.488839,0.486607,0.486607,0.385558,43971200,,,0.513393,-0.026786
2,1980-12-16,0.453125,0.453125,0.450893,0.450893,0.35726,26432000,,,0.486607,-0.035714
3,1980-12-17,0.462054,0.464286,0.462054,0.462054,0.366103,21610400,,,0.450893,0.011161
4,1980-12-18,0.475446,0.477679,0.475446,0.475446,0.376715,18362400,,,0.462054,0.013393
5,1980-12-19,0.504464,0.506696,0.504464,0.504464,0.399707,12157600,,,0.475446,0.029018
6,1980-12-22,0.529018,0.53125,0.529018,0.529018,0.419162,9340800,,,0.504464,0.024554
7,1980-12-23,0.551339,0.553571,0.551339,0.551339,0.436848,11737600,,,0.529018,0.022321
8,1980-12-24,0.580357,0.582589,0.580357,0.580357,0.45984,12000800,,,0.551339,0.029018
9,1980-12-26,0.633929,0.636161,0.633929,0.633929,0.502287,13893600,,,0.580357,0.053571


In [18]:
#Calculating Percentage Change in price

for df in dfs:
    df["Percent change in Price"] = df.Close.pct_change(fill_method=None)
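`pct_change` computes Close / previous Close - 1. Passing `fill_method=None` means a missing price yields NaN instead of being forward-filled first; with forward filling, the gap would be reported as a 0% change. (To our knowledge, pandas 2.1 and later deprecate every `fill_method` option except `None`.) A toy example with made-up prices containing a gap:

```python
import numpy as np
import pandas as pd

close = pd.Series([100.0, 125.0, np.nan, 150.0])  # made-up prices with a missing day

pct = close.pct_change(fill_method=None)
manual = close / close.shift(1) - 1  # the definition: today / yesterday - 1

print(pct.tolist())        # [nan, 0.25, nan, nan]
print(pct.equals(manual))  # True
```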

In [19]:
apple.head(10)

,Date,Open,High,Low,Close,Adj Close,Volume,SMA50,SMA200,Prev. Day Close,Change in Price,Percent change in Price
0,1980-12-12,0.513393,0.515625,0.513393,0.513393,0.406782,117258400,,,,,
1,1980-12-15,0.488839,0.488839,0.486607,0.486607,0.385558,43971200,,,0.513393,-0.026786,-0.052174
2,1980-12-16,0.453125,0.453125,0.450893,0.450893,0.35726,26432000,,,0.486607,-0.035714,-0.073394
3,1980-12-17,0.462054,0.464286,0.462054,0.462054,0.366103,21610400,,,0.450893,0.011161,0.024752
4,1980-12-18,0.475446,0.477679,0.475446,0.475446,0.376715,18362400,,,0.462054,0.013393,0.028986
5,1980-12-19,0.504464,0.506696,0.504464,0.504464,0.399707,12157600,,,0.475446,0.029018,0.061033
6,1980-12-22,0.529018,0.53125,0.529018,0.529018,0.419162,9340800,,,0.504464,0.024554,0.048673
7,1980-12-23,0.551339,0.553571,0.551339,0.551339,0.436848,11737600,,,0.529018,0.022321,0.042194
8,1980-12-24,0.580357,0.582589,0.580357,0.580357,0.45984,12000800,,,0.551339,0.029018,0.052632
9,1980-12-26,0.633929,0.636161,0.633929,0.633929,0.502287,13893600,,,0.580357,0.053571,0.092308


In [20]:
#Calculating Previous Day Volume

for df in dfs:
    df["Previous Day Volume"] = df.Volume.shift(1)
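The Previous Day Volume column prints as floats (117258400.0) because `shift(1)` inserts a NaN in the first row, and NaN cannot be stored in an int64 column. If integer volumes are preferred, one option is pandas' nullable `Int64` dtype, which marks the gap as `<NA>`. A sketch using the first three AAPL volumes from this notebook:

```python
import pandas as pd

volume = pd.Series([117258400, 43971200, 26432000])  # first three AAPL volumes

# Plain shift: the inserted NaN forces the integers to float64
print(volume.shift(1).dtype)  # float64

# Nullable integer dtype keeps whole numbers and marks the gap as <NA>
prev = volume.astype("Int64").shift(1)
print(prev.dtype)  # Int64
print(prev)
```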

In [21]:
apple.head(10)

,Date,Open,High,Low,Close,Adj Close,Volume,SMA50,SMA200,Prev. Day Close,Change in Price,Percent change in Price,Previous Day Volume
0,1980-12-12,0.513393,0.515625,0.513393,0.513393,0.406782,117258400,,,,,,
1,1980-12-15,0.488839,0.488839,0.486607,0.486607,0.385558,43971200,,,0.513393,-0.026786,-0.052174,117258400.0
2,1980-12-16,0.453125,0.453125,0.450893,0.450893,0.35726,26432000,,,0.486607,-0.035714,-0.073394,43971200.0
3,1980-12-17,0.462054,0.464286,0.462054,0.462054,0.366103,21610400,,,0.450893,0.011161,0.024752,26432000.0
4,1980-12-18,0.475446,0.477679,0.475446,0.475446,0.376715,18362400,,,0.462054,0.013393,0.028986,21610400.0
5,1980-12-19,0.504464,0.506696,0.504464,0.504464,0.399707,12157600,,,0.475446,0.029018,0.061033,18362400.0
6,1980-12-22,0.529018,0.53125,0.529018,0.529018,0.419162,9340800,,,0.504464,0.024554,0.048673,12157600.0
7,1980-12-23,0.551339,0.553571,0.551339,0.551339,0.436848,11737600,,,0.529018,0.022321,0.042194,9340800.0
8,1980-12-24,0.580357,0.582589,0.580357,0.580357,0.45984,12000800,,,0.551339,0.029018,0.052632,11737600.0
9,1980-12-26,0.633929,0.636161,0.633929,0.633929,0.502287,13893600,,,0.580357,0.053571,0.092308,12000800.0


In [22]:
#Calculating Change in Volume

for df in dfs:
    df["Change in Volume"] = df["Volume"]-df["Previous Day Volume"]

In [23]:
apple.head(10)

,Date,Open,High,Low,Close,Adj Close,Volume,SMA50,SMA200,Prev. Day Close,Change in Price,Percent change in Price,Previous Day Volume,Change in Volume
0,1980-12-12,0.513393,0.515625,0.513393,0.513393,0.406782,117258400,,,,,,,
1,1980-12-15,0.488839,0.488839,0.486607,0.486607,0.385558,43971200,,,0.513393,-0.026786,-0.052174,117258400.0,-73287200.0
2,1980-12-16,0.453125,0.453125,0.450893,0.450893,0.35726,26432000,,,0.486607,-0.035714,-0.073394,43971200.0,-17539200.0
3,1980-12-17,0.462054,0.464286,0.462054,0.462054,0.366103,21610400,,,0.450893,0.011161,0.024752,26432000.0,-4821600.0
4,1980-12-18,0.475446,0.477679,0.475446,0.475446,0.376715,18362400,,,0.462054,0.013393,0.028986,21610400.0,-3248000.0
5,1980-12-19,0.504464,0.506696,0.504464,0.504464,0.399707,12157600,,,0.475446,0.029018,0.061033,18362400.0,-6204800.0
6,1980-12-22,0.529018,0.53125,0.529018,0.529018,0.419162,9340800,,,0.504464,0.024554,0.048673,12157600.0,-2816800.0
7,1980-12-23,0.551339,0.553571,0.551339,0.551339,0.436848,11737600,,,0.529018,0.022321,0.042194,9340800.0,2396800.0
8,1980-12-24,0.580357,0.582589,0.580357,0.580357,0.45984,12000800,,,0.551339,0.029018,0.052632,11737600.0,263200.0
9,1980-12-26,0.633929,0.636161,0.633929,0.633929,0.502287,13893600,,,0.580357,0.053571,0.092308,12000800.0,1892800.0


In [24]:
#Calculating Percentage Change in Volume
for df in dfs:
    df["Percentage change in Volume"] = df.Volume.pct_change(fill_method=None)
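One edge case worth guarding against: if a file contains a day with zero volume, the following day's percentage change divides by zero and pandas returns `inf` rather than raising an error. A hedged sketch, with made-up volumes, that converts those infinities to NaN so they do not distort averages or charts later:

```python
import numpy as np
import pandas as pd

volume = pd.Series([1000, 0, 500])  # made-up volumes with a zero-volume day

pct = volume.pct_change(fill_method=None)
print(pct.tolist())  # [nan, -1.0, inf]

# Treat a change from zero volume as undefined rather than infinite
pct_clean = pct.replace([np.inf, -np.inf], np.nan)
print(pct_clean.tolist())  # [nan, -1.0, nan]
```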

In [25]:
apple.head(10)

,Date,Open,High,Low,Close,Adj Close,Volume,SMA50,SMA200,Prev. Day Close,Change in Price,Percent change in Price,Previous Day Volume,Change in Volume,Percentage change in Volume
0,1980-12-12,0.513393,0.515625,0.513393,0.513393,0.406782,117258400,,,,,,,,
1,1980-12-15,0.488839,0.488839,0.486607,0.486607,0.385558,43971200,,,0.513393,-0.026786,-0.052174,117258400.0,-73287200.0,-0.625006
2,1980-12-16,0.453125,0.453125,0.450893,0.450893,0.35726,26432000,,,0.486607,-0.035714,-0.073394,43971200.0,-17539200.0,-0.398879
3,1980-12-17,0.462054,0.464286,0.462054,0.462054,0.366103,21610400,,,0.450893,0.011161,0.024752,26432000.0,-4821600.0,-0.182415
4,1980-12-18,0.475446,0.477679,0.475446,0.475446,0.376715,18362400,,,0.462054,0.013393,0.028986,21610400.0,-3248000.0,-0.150298
5,1980-12-19,0.504464,0.506696,0.504464,0.504464,0.399707,12157600,,,0.475446,0.029018,0.061033,18362400.0,-6204800.0,-0.337908
6,1980-12-22,0.529018,0.53125,0.529018,0.529018,0.419162,9340800,,,0.504464,0.024554,0.048673,12157600.0,-2816800.0,-0.23169
7,1980-12-23,0.551339,0.553571,0.551339,0.551339,0.436848,11737600,,,0.529018,0.022321,0.042194,9340800.0,2396800.0,0.256595
8,1980-12-24,0.580357,0.582589,0.580357,0.580357,0.45984,12000800,,,0.551339,0.029018,0.052632,11737600.0,263200.0,0.022424
9,1980-12-26,0.633929,0.636161,0.633929,0.633929,0.502287,13893600,,,0.580357,0.053571,0.092308,12000800.0,1892800.0,0.157723


In [26]:
#Exploring the shape of Data
for df in dfs:
    print(df.shape)

(9909, 15)
(1980, 15)
(3932, 15)
(5334, 15)
(2457, 15)
(5758, 15)
(5263, 15)
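The shapes give row counts only. Since the stated end date of the data is easy to verify, a small sketch that also reports each frame's first and last Date may be useful; the helper `date_coverage` and the toy frames are hypothetical stand-ins for the seven DataFrames:

```python
import pandas as pd


def date_coverage(frames: dict) -> pd.DataFrame:
    """Row count plus first and last date for each named DataFrame with a Date column."""
    rows = []
    for name, df in frames.items():
        dates = pd.to_datetime(df["Date"])
        rows.append({"name": name, "rows": len(df),
                     "first": dates.min(), "last": dates.max()})
    return pd.DataFrame(rows).set_index("name")


# Toy frames standing in for e.g. {"apple": apple, "facebook": facebook, ...}
toy = {
    "a": pd.DataFrame({"Date": ["2021-01-04", "2021-01-05", "2021-01-06"]}),
    "b": pd.DataFrame({"Date": ["2012-05-18", "2012-05-21"]}),
}
cov = date_coverage(toy)
print(cov)
```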


In [28]:
#Writing the final Datasets to new csv files
apple.to_csv("Apple.csv", index=False)
facebook.to_csv("Facebook.csv", index=False)
google.to_csv("Google.csv", index=False)
nvidia.to_csv("Nvidia.csv", index=False)
tesla.to_csv("Tesla.csv", index=False)
amazon.to_csv("Amazon.csv", index=False)
goldmansachs.to_csv("GoldmanSachs.csv", index=False)
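By default `DataFrame.to_csv` also writes the integer row index as an unnamed first column, which becomes an extra field in the exported files (pandas reads it back as `Unnamed: 0`). Passing `index=False` leaves it out. A small in-memory round trip illustrating the difference, using two AAPL rows from this notebook:

```python
import io

import pandas as pd

df = pd.DataFrame({"Date": ["1980-12-12", "1980-12-15"], "Close": [0.513393, 0.486607]})

# Default index=True: the 0, 1, ... row labels are written as an unnamed first column
with_index = io.StringIO()
df.to_csv(with_index)
with_index.seek(0)
cols_with = pd.read_csv(with_index).columns.tolist()
print(cols_with)  # ['Unnamed: 0', 'Date', 'Close']

# index=False: only the real columns are written
without_index = io.StringIO()
df.to_csv(without_index, index=False)
without_index.seek(0)
cols_without = pd.read_csv(without_index).columns.tolist()
print(cols_without)  # ['Date', 'Close']
```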

We now have the final CSV datasets, ready to visualize and explore in Tableau.