# Description of the dataset variables 

- Date: The date on which the stock was traded.
- Open: The opening price of the stock on that date, i.e. the price at which it started trading when the opening bell rang.
- Close: The closing price of the stock on that date, i.e. the price of the last buy-sell order executed before the market closed. This is the raw price: the cash value of the last trade, with no adjustment applied.
- High: The high is the highest price at which a stock is traded during a period. Here the period is a day.
- Low: The low is the lowest price at which a stock is traded during a period. Here the period is a day.
- Adj Close: The adjusted closing price amends the closing price to reflect the stock's value after accounting for corporate actions such as stock splits, dividends, and rights offerings.
- Volume: The number of shares of a security traded during a given period. Here the security is the stock and the period is a day.
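
The definitions above imply a simple consistency rule for each trading day: High should be at least as large as both Open and Close, and Low should be no larger than either. A minimal sketch of that check, using three rows of this dataset typed in by hand (the names `sample`, `is_consistent`, and `daily_range` are illustrative and not part of the original notebook):

```python
import pandas as pd

# First three rows of the Tesla data, entered by hand for illustration
sample = pd.DataFrame({
    "Open": [1.266667, 1.719333, 1.666667],
    "High": [1.666667, 2.028, 1.728],
    "Low": [1.169333, 1.553333, 1.351333],
    "Close": [1.592667, 1.588667, 1.464],
})

# High must be at least the larger of Open/Close; Low at most the smaller
is_consistent = (
    (sample["High"] >= sample[["Open", "Close"]].max(axis=1))
    & (sample["Low"] <= sample[["Open", "Close"]].min(axis=1))
)

# Daily trading range implied by the High and Low definitions
daily_range = sample["High"] - sample["Low"]

print(is_consistent.all())
print(daily_range)
```

The same two expressions could be applied to the full DataFrame once it is loaded; rows where `is_consistent` is False would be worth inspecting.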

In [3]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [4]:
# Read the CSV file into a DataFrame with pandas
data = pd.read_csv('Tesla_Stock_prices.csv')

In [5]:
# To view the first five rows, call the .head() method on the DataFrame
data.head()

         Date      Open      High       Low     Close  Adj Close     Volume
0  2010-06-29  1.266667  1.666667  1.169333  1.592667   1.592667  281494500
1  2010-06-30  1.719333     2.028  1.553333  1.588667   1.588667  257806500
2  2010-07-01  1.666667     1.728  1.351333     1.464      1.464  123282000
3  2010-07-02  1.533333      1.54  1.247333      1.28       1.28   77097000
4  2010-07-06  1.333333  1.333333  1.055333     1.074      1.074  103003500


# Data Preprocessing

In [6]:
# Show basic information about the data, including each column's data type
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3261 entries, 0 to 3260
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   Date       3261 non-null   object 
 1   Open       3261 non-null   float64
 2   High       3261 non-null   float64
 3   Low        3261 non-null   float64
 4   Close      3261 non-null   float64
 5   Adj Close  3261 non-null   float64
 6   Volume     3261 non-null   int64  
dtypes: float64(5), int64(1), object(1)
memory usage: 178.5+ KB


In [7]:
# Date is stored as object (string) but should be datetime, so convert it
data['Date'] = pd.to_datetime(data['Date'])
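
As an alternative to converting after loading, the date column can be parsed while the file is read, using the `parse_dates` argument of `pd.read_csv`. The sketch below is only an illustration: `csv_text` is a two-row stand-in for 'Tesla_Stock_prices.csv' (assumed here to have the same header as the columns listed by `data.info()`), and `parsed` is a hypothetical name.

```python
import io
import pandas as pd

# Stand-in for 'Tesla_Stock_prices.csv': two rows copied from the data above
csv_text = """Date,Open,High,Low,Close,Adj Close,Volume
2010-06-29,1.266667,1.666667,1.169333,1.592667,1.592667,281494500
2010-06-30,1.719333,2.028,1.553333,1.588667,1.588667,257806500
"""

# parse_dates converts the Date column to datetime while the file is being read
parsed = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

print(parsed["Date"].dtype)
```

With the real file, the `io.StringIO(csv_text)` argument would be replaced by the file path.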

In [8]:
data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3261 entries, 0 to 3260
Data columns (total 7 columns):
 #   Column     Non-Null Count  Dtype         
---  ------     --------------  -----         
 0   Date       3261 non-null   datetime64[ns]
 1   Open       3261 non-null   float64       
 2   High       3261 non-null   float64       
 3   Low        3261 non-null   float64       
 4   Close      3261 non-null   float64       
 5   Adj Close  3261 non-null   float64       
 6   Volume     3261 non-null   int64         
dtypes: datetime64[ns](1), float64(5), int64(1)
memory usage: 178.5 KB


In [9]:
# Check for null values
data.isnull().sum()

Date         0
Open         0
High         0
Low          0
Close        0
Adj Close    0
Volume       0
dtype: int64
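
For contrast, here is a sketch of what the same check reports when values are missing. The `demo` frame is hypothetical and is not taken from the Tesla data, which has no nulls.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing Open value (not from the Tesla data)
demo = pd.DataFrame({
    "Open": [1.27, np.nan, 1.67],
    "Close": [1.59, 1.59, 1.46],
})

# isnull() marks each missing cell as True; sum() counts them per column
null_counts = demo.isnull().sum()

print(null_counts)
```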

In [10]:
# Summary of the data
data.describe()

                                Date        Open        High         Low       Close   Adj Close       Volume
count                           3261      3261.0      3261.0      3261.0      3261.0      3261.0       3261.0
mean   2016-12-18 13:32:30.689972224   62.917564   64.338205   61.376366   62.895762   62.895762   95725510.0
min              2010-06-29 00:00:00       1.076    1.108667    0.998667    1.053333    1.053333    1777500.0
25%              2013-09-24 00:00:00    9.656667    9.882667    9.438667        9.66        9.66   43898400.0
50%              2016-12-16 00:00:00   16.546667      16.788   16.353333   16.559999   16.559999   78091800.0
75%              2020-03-17 00:00:00   46.733334   49.318001      45.618      47.042      47.042  122446500.0
max              2023-06-12 00:00:00  411.470001  414.496674  405.666656  409.970001  409.970001  914082000.0
std                              NaN   96.610729   98.837821   94.132775   96.520274   96.520274   81415010.0
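
In the summary above, Close and Adj Close report identical statistics in every row, which suggests the two columns may hold the same values in this file. A minimal sketch of how that could be confirmed row by row, shown on the five rows printed by `data.head()` (the names `sample` and `columns_match` are illustrative):

```python
import pandas as pd

# Close and Adj Close for the first five rows, entered by hand for illustration
sample = pd.DataFrame({
    "Close": [1.592667, 1.588667, 1.464, 1.28, 1.074],
    "Adj Close": [1.592667, 1.588667, 1.464, 1.28, 1.074],
})

# True only if every row has the same value in both columns
columns_match = (sample["Close"] == sample["Adj Close"]).all()

print(columns_match)
```

Running the same comparison on the full `data` frame would show whether the match holds for all 3261 rows or only for the summary statistics.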


In [11]:
# Size of the data as (rows, columns)
data.shape

(3261, 7)