# Problem Statement

Tesla shares have been quite volatile since the company went public
in 2010. If you had all of the company’s share data for the last 10 years,
could you predict its future value?

### Requirements
Take the dataset “TSLA.csv” and, using any machine learning
algorithm, predict what the stock’s price will be on 03/03/2020 (3 March 2020). The
dataset contains 10 years’ worth of data, ending on 03/02/2020 (3 February 2020).

The output of your program must:
1. Visualize the data and the prediction.
2. Print the message: “The predicted share price on the 3rd March 2020 is”
(predicted share price).
3. Print the accuracy or error of the ML model.
4. Explain in the code (as a comment) why you picked the specific algorithm.

In [None]:
# Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [None]:
# Read Data "TSLA.csv" and set "Date" as INDEX for the dataset
data = pd.read_csv("../input/tesla-stock-data-from-2010-to-2020/TSLA.csv", index_col ="Date", parse_dates = True)

In [None]:
data.shape

In [None]:
data.head()

In [None]:
# Plot "Adj Close" and "Volume" of Tesla Shares
plt.figure(figsize=(20, 15))

plt.subplot(2,1,1)
plt.plot(data['Adj Close'], label='Adj Close', color="purple")
plt.legend(loc="upper right")
plt.title('Adj Close Prices of Tesla')

plt.subplot(2,1,2)
plt.plot(data['Volume'], label='Volume', color="Orange")
plt.legend(loc="upper right")
plt.title('Volume Of Shares Traded')

### ARIMA Model for Time Series Forecasting

ARIMA stands for AutoRegressive Integrated Moving Average. An ARIMA model is specified by three order parameters: (p, d, q).

<b> AR(p) Autoregression – </b> a regression model that uses the dependent relationship between the current observation and observations from previous time steps. The order p is the number of past values (lags) included in the regression equation. <br>

<b> I(d) Integration – </b> differencing of observations (subtracting the observation at the previous time step from the current observation) to make the time series stationary. The order d is the number of times the differencing is applied. <br>

<b> MA(q) Moving Average – </b> a model that uses the dependency between an observation and the residual errors of a moving average model applied to lagged observations. It expresses the model's error as a combination of previous error terms; the order q is the number of such terms included in the model.
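To make the differencing step behind the I(d) component concrete, here is a minimal sketch using pandas. The series values are made up for illustration and are not Tesla data; the variable names are hypothetical.

```python
import pandas as pd

# Toy series with a quadratic trend (illustrative values, not Tesla data)
series = pd.Series([1.0, 4.0, 9.0, 16.0, 25.0])

# d = 1: subtract the previous observation from each observation
first_diff = series.diff().dropna()

# d = 2: difference the already-differenced series once more
second_diff = first_diff.diff().dropna()

print(first_diff.tolist())   # [3.0, 5.0, 7.0, 9.0]
print(second_diff.tolist())  # [2.0, 2.0, 2.0]
```

After two rounds of differencing the trend is gone and the values are constant, which is the kind of stationarity the d parameter aims for.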

<b> Types of ARIMA Model </b> 

<b> ARIMA: </b> Non-seasonal Autoregressive Integrated Moving Average <br>
<b> SARIMA: </b> Seasonal ARIMA <br>
<b> SARIMAX: </b> Seasonal ARIMA with exogenous variables <br>

<b> Pyramid Auto-ARIMA </b>

The <b> ‘auto_arima’ </b> function from the <b> ‘pmdarima’ </b> library searches over candidate orders, selects the combination with the best information criterion (AIC by default), and returns the fitted ARIMA model.

In [None]:
# To install the library 
!pip install pmdarima 

In [None]:
# Import the library 
from pmdarima import auto_arima 

In [None]:
# Ignore harmless warnings 
import warnings 
warnings.filterwarnings("ignore") 
  
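# Fit auto_arima to the Tesla "Adj Close" series.
# Note: m = 12 sets the seasonal period to 12 observations; since the rows
# are trading days, this is 12 trading days, not 12 months.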
stepwise_fit = auto_arima(data['Adj Close'], start_p = 1, start_q = 1, 
                          max_p = 3, max_q = 3, m = 12, 
                          start_P = 0, seasonal = True, 
                          d = None, D = 1, trace = True, 
                          error_action ='ignore',    # we don't want to know if an order does not work 
                          suppress_warnings = True,  # we don't want convergence warnings 
                          stepwise = True)           # set to stepwise 
  
# To print the summary 
stepwise_fit.summary() 

In [None]:
# Split data into train / test sets 
train = data.iloc[:len(data)-12] 
test = data.iloc[len(data)-12:] # hold out the last 12 rows (12 trading days) for testing 
  
# Fit a SARIMAX(1, 0, 2)x(0, 1, [1], 12) on the training set 
from statsmodels.tsa.statespace.sarimax import SARIMAX 
  
model = SARIMAX(train['Adj Close'],  
                order = (1, 0, 2),  
                seasonal_order =(0, 1, 1, 12)) 
  
result = model.fit() 
result.summary() 

# Predictions of ARIMA Model against the test set

In [None]:
start = len(train) 
end = len(train) + len(test) - 1
  
# Predictions for the 12 held-out trading days in the test set 
predictions = result.predict(start, end, 
                             typ = 'levels').rename("Predictions") 
  
# Create a DataFrame of predictions, indexed by the test-set dates
predictions_df = pd.DataFrame(predictions)
predictions_df.index = test.index

# plot predictions and actual values 
predictions_df.plot(legend = True) 
test['Adj Close'].plot(legend = True)

# Evaluate the model using MSE and RMSE

In [None]:
# Load specific evaluation tools 
from sklearn.metrics import mean_squared_error 
from statsmodels.tools.eval_measures import rmse 
  
# Calculate root mean squared error 
print("RMSE on Test Data: ", rmse(test["Adj Close"], predictions))
  
# Calculate mean squared error 
print("MSE on Test Data: ", mean_squared_error(test["Adj Close"], predictions))
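For reference, the two metrics are closely related: RMSE is the square root of MSE, so it is expressed in the same units as the share price. The sketch below computes both by hand with NumPy on hypothetical numbers (not the notebook's results).

```python
import numpy as np

# Hypothetical actual and predicted prices, for illustration only
actual = np.array([100.0, 102.0, 101.0, 105.0])
predicted = np.array([98.0, 103.0, 104.0, 105.0])

errors = actual - predicted      # per-observation errors: 2, -1, -3, 0
mse = np.mean(errors ** 2)       # mean of squared errors: (4 + 1 + 9 + 0) / 4
rmse_value = np.sqrt(mse)        # square root brings the units back to price

print("MSE:", mse)
print("RMSE:", rmse_value)
```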


# Forecast using the ARIMA Model

In [None]:
# Train the model on the full dataset 
model = SARIMAX(data['Adj Close'],  
                order = (1, 0, 2),  
                seasonal_order = (0, 1, 1, 12)) 

result = model.fit() 

# The data are daily, so each forecast step is one trading day.
# Reaching 3 March 2020 from the last observation (3 February 2020) takes
# 20 steps, assuming 17 February (Presidents' Day) is the only market holiday.
steps = 20
forecast = result.predict(start = len(data),  
                          end = len(data) + steps - 1,
                          typ = 'levels').rename('Forecast') 


# Forecasted value on 3rd March 2020

In [None]:
# The last forecast step corresponds to 3 March 2020
print("The predicted share price on the 3rd March 2020 is: {}".format(forecast.iloc[-1]))
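A forecast for a specific calendar date needs the number of daily steps between the last observation and that date. The sketch below counts them with NumPy's `busday_count`. It assumes, as the test-set dates suggest, that the last row of the data is 3 February 2020, and that Presidents' Day (17 February 2020) is the only market holiday in the window; both are assumptions to check against the actual dataset.

```python
import numpy as np

# Assumptions (verify against the data): last observation and holiday list
last_observation = np.datetime64("2020-02-03")
target_date = np.datetime64("2020-03-03")
holidays = [np.datetime64("2020-02-17")]  # Presidents' Day

# busday_count counts business days in [begin, end), so start the day after
# the last observation and end the day after the target to include it.
steps = int(np.busday_count(last_observation + 1, target_date + 1,
                            holidays=holidays))
print(steps)
```

The resulting `steps` can serve as the forecast horizon, for example `end = len(data) + steps - 1` in `result.predict`.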