<h1 style="text-align: center;"> Model Tuning </h1>

## Notebook Description

This notebook models time-series data to forecast oil stock performance, as part of the requirements of the RMDS 2021 Data Science Competition.

## Table of contents
1. [Required Libraries](#Required-Libraries)
2. [Load Data](#Load-Data)
3. [ARIMA Modeling](#ARIMA-Modeling)
4. [Conclusion](#Conclusion)

## Required Libraries

[[ go back to the top ]](#Table-of-contents)

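This notebook uses NumPy and pandas for data handling, Matplotlib and seaborn for visualization, and statsmodels and SciPy for time-series modeling and statistics: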

In [2]:
# Load required packages 
import datetime
from datetime import timedelta
import numpy as np
import pandas as pd

# Visuals
import matplotlib.pyplot as plt
import seaborn as sns

# Time-Series
import statsmodels.api as sm
#from statsmodels.tsa.stattools import adfuller
#from statsmodels.tsa.seasonal import seasonal_decompose
#from statsmodels.tsa.stattools import acf, pacf
#from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
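# Note: ARMA and ARIMA in statsmodels.tsa.arima_model were deprecated in
# statsmodels 0.12 and removed in 0.13; newer releases provide ARIMA in
# statsmodels.tsa.arima.model
from statsmodels.tsa.arima_model import ARMA, ARIMA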
from scipy import signal
import scipy.stats as stats

import warnings
warnings.filterwarnings("ignore")

<a id='Load-Data'></a>

## Load Data

[[ go back to the top ]](#Table-of-contents)

In [49]:
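# Load Data Function
def LOAD_DATA(filepath, filename):
    """Read a CSV or Excel file and index it by its first date-like column."""
    # Read CSV files
    if filename.endswith('.csv'):
        new_df = pd.read_csv(filepath + filename)

    # Read Excel files
    elif filename.endswith('.xlsx'):
        new_df = pd.read_excel(filepath + filename)

    # Fail early; otherwise new_df would be undefined below
    else:
        raise ValueError(f"Unsupported file type: '{filename}'")

    # Try to identify the date column (first name containing 'date')
    for col in new_df.columns:
        if 'date' in col.lower():
            print(f"TIMESTAMP FOUND! '{col}'")
            print()
            dates = pd.to_datetime(new_df[col]) # format = '%Y/%m/%d'
            # Drop the source column first so a column already named 'date' also works
            new_df.drop(columns = col, inplace = True)
            new_df['date'] = dates
            new_df.set_index('date', inplace = True)
            break  # use only the first matching column

    # info() prints the summary and returns None, so display() also echoes "None"
    display(new_df.info())
    return new_df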
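
The date-column lookup in `LOAD_DATA` is a case-insensitive substring test on each column name. A minimal sketch of that rule in isolation is shown here. The helper `find_date_column` is hypothetical (it is not part of the notebook), and the column names `Updated By` and `Value`-only inputs are made up for illustration; `DateTime`, `Date`, and `Daily News Sentiment` are the names that appear in the files loaded below.

```python
def find_date_column(columns):
    """Return the first column name containing 'date' (case-insensitive), or None.

    Hypothetical helper that mirrors the substring test used in LOAD_DATA.
    """
    for col in columns:
        if 'date' in str(col).lower():
            return col
    return None


# Column names as they appear in the sentiment files
print(find_date_column(['DateTime', 'Daily News Sentiment']))
print(find_date_column(['Value', 'Date']))

# Made-up names: a substring test also matches unrelated columns
print(find_date_column(['Updated By', 'Date']))
print(find_date_column(['Value']))
```

Because the test is a plain substring match, a name such as `Updated By` would be picked up before a real `Date` column. If the input files might contain such names, a stricter rule (for example, matching a fixed list of known timestamp column names) may be safer.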

In [52]:
# Load International Sentiment Data

fpath = '../../data/News_AI_Sentiments/'
fname = 'daily-news-sentiment-international.csv'

sentiment_int = LOAD_DATA(filepath = fpath, filename = fname)
sentiment_int

TIMESTAMP FOUND! 'DateTime'

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 395 entries, 2000-04-17 to 2021-02-27
Data columns (total 1 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Daily News Sentiment  395 non-null    float64
dtypes: float64(1)
memory usage: 6.2 KB


None

| date | Daily News Sentiment |
| --- | --- |
| 2000-04-17 | -0.10 |
| 2000-08-01 | 0.20 |
| 2001-01-24 | 0.20 |
| 2001-04-04 | -0.10 |
| 2002-10-31 | -0.50 |
| ... | ... |
| 2021-02-23 | -0.27 |
| 2021-02-24 | -0.10 |
| 2021-02-25 | 0.00 |
| 2021-02-26 | -0.30 |
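
The summary above reports 395 entries between 2000-04-17 and 2021-02-27, so this series is far from daily. One way to quantify the spacing is to difference the index. The sketch below uses the first four dates from the output as a stand-in for `sentiment_int.index`; applying it to the full index is left as an assumption about how one might inspect the real data.

```python
import pandas as pd

# First four dates from the output above, standing in for sentiment_int.index
idx = pd.to_datetime(["2000-04-17", "2000-08-01", "2001-01-24", "2001-04-04"])

# Time elapsed between consecutive observations
gaps = idx.to_series().diff().dropna()

print(gaps.dt.days.tolist())  # gap lengths in days
print(gaps.max())             # longest gap as a Timedelta
```

Large or uneven gaps matter for ARIMA-type models, which assume observations at regular intervals.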


In [53]:
# Load North American Sentiment Data

fpath = '../../data/News_AI_Sentiments/'
fname = 'daily-news-sentiment-NA.csv'

sentiment_na = LOAD_DATA(filepath = fpath, filename = fname)
sentiment_na

TIMESTAMP FOUND! 'DateTime'

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 396 entries, 2013-07-26 to 2021-02-27
Data columns (total 1 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   Daily News Sentiment  396 non-null    float64
dtypes: float64(1)
memory usage: 6.2 KB


None

| date | Daily News Sentiment |
| --- | --- |
| 2013-07-26 | 0.10 |
| 2013-08-01 | -0.50 |
| 2013-08-08 | 0.10 |
| 2013-08-09 | 0.10 |
| 2013-08-17 | 0.20 |
| ... | ... |
| 2021-02-23 | -0.50 |
| 2021-02-24 | -0.50 |
| 2021-02-25 | 0.07 |
| 2021-02-26 | -0.40 |


In [54]:
# Load Cleaned Exxon Sentiment Data

fpath = '../../'
fname = 'exxon_sentiment_clean.csv'

sentiment_exxon = LOAD_DATA(filepath = fpath, filename = fname)
sentiment_exxon

TIMESTAMP FOUND! 'Date'

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 1000 entries, 2017-02-28 to 2021-02-25
Data columns (total 3 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   Value                     1000 non-null   float64
 1   Daily News Sentiment_INT  1000 non-null   float64
 2   Daily News Sentiment_NA   1000 non-null   float64
dtypes: float64(3)
memory usage: 31.2 KB


None

| date | Value | Daily News Sentiment_INT | Daily News Sentiment_NA |
| --- | --- | --- | --- |
| 2017-02-28 | 81.32 | 0.00 | 0.00 |
| 2017-03-01 | 83.02 | 0.00 | 0.00 |
| 2017-03-02 | 83.30 | 0.00 | -0.20 |
| 2017-03-03 | 82.46 | 0.00 | 0.00 |
| 2017-03-06 | 82.83 | -0.20 | -0.20 |
| ... | ... | ... | ... |
| 2021-02-19 | 52.37 | 0.00 | 0.00 |
| 2021-02-22 | 54.30 | 0.00 | 0.00 |
| 2021-02-23 | 55.05 | -0.27 | -0.50 |
| 2021-02-24 | 56.70 | -0.10 | -0.50 |
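
The cleaned Exxon table has a row for each listed date with sentiment values of 0.00 on many days, whereas the raw sentiment files are sparse. How `exxon_sentiment_clean.csv` was actually built is not shown in this notebook. The sketch below is only one plausible way to align a sparse sentiment series with a price series, assuming a left join on the price dates and a neutral fill value of 0. All values and dates in it are made up.

```python
import pandas as pd

# Hypothetical sparse sentiment scores on irregular dates (made-up values)
sentiment = pd.Series(
    [0.2, -0.5, 0.1],
    index=pd.to_datetime(["2021-02-01", "2021-02-03", "2021-02-06"]),
    name="Daily News Sentiment",
)

# Hypothetical prices on five consecutive business days (made-up values)
prices = pd.Series(
    [50.0, 51.0, 49.5, 50.5, 52.0],
    index=pd.bdate_range("2021-02-01", periods=5),
    name="Value",
)

# Left-join on the price index so every price date is kept; dates without
# news get a neutral score of 0. Sentiment on non-trading days is dropped.
merged = (
    prices.to_frame()
    .join(sentiment, how="left")
    .fillna({"Daily News Sentiment": 0.0})
)
print(merged)
```

Filling with 0 treats "no news" as neutral sentiment. Forward-filling the last known score is an alternative, and the two choices can lead to different model inputs.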
