## Installing Required Libraries

The Jupyter environment does not come pre-installed with the libraries listed below. If you are running this notebook in an environment where they are missing, you will need to install them. To do so, uncomment the code cell below by removing the '#' sign before '!pip'.

Jupyter Notebook does not support Plotly's interactive features out of the box. To enable them, install the jupyterlab-plotly extension for Jupyter Notebook.

In [None]:
#!pip install prophet
#!pip install plotly
# Note: If your environment doesn't support "!mamba install", use "!pip install"

## Importing Required Libraries

In [None]:
import pandas as pd
from prophet import Prophet
from matplotlib import pyplot
from matplotlib.pyplot import figure
from sklearn.metrics import mean_absolute_error
import plotly.express as px
import plotly.graph_objects as go

## Read .csv File

Read a comma-separated values (CSV) file into a DataFrame.

This code disables SSL certificate validation for HTTPS requests, allowing you to proceed without certificate checks. Remember that this approach should be used sparingly and only when you are confident about the safety and trustworthiness of the data source. In production environments, it's essential to prioritize security and maintain proper SSL certificate validation.

In [None]:
# Bypass SSL certificate verification for this notebook session (trusted sources only)
import ssl
ssl._create_default_https_context = ssl._create_unverified_context

The dataset contains Turkish characters encoded with ISO 8859-9, often referred to as "Latin 5." ISO 8859-9 is similar to Latin 1 encoding but includes specific modifications to accommodate the Turkish language's unique character requirements.

In [42]:
# Specify the path to the CSV file and the encoding
csv_file_path = '/Users/rafetbatuhanemek/Desktop/TimeSeriesAnalysis-WithProphet/data/RealTimeConsumption-01092020-01092023.csv'
csv_encoding = 'latin5'  # or 'ISO-8859-9'

In [43]:
# Read the CSV file into a DataFrame with the specified encoding
df = pd.read_csv(csv_file_path, encoding=csv_encoding)
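To see why the encoding matters, here is a minimal sketch using only Python's built-in codecs. The sample string is taken from the dataset's column header; the variable names are illustrative. It encodes the text as ISO 8859-9 and then decodes the same bytes once as Latin 5 and once as Latin 1.

```python
# Encode a Turkish column name with ISO 8859-9 (Latin 5)
raw_bytes = "Tüketim Miktarı".encode("iso-8859-9")

# Decoding with the matching codec recovers the original text
as_latin5 = raw_bytes.decode("latin5")

# Decoding the same bytes as Latin 1 silently produces a wrong character:
# byte 0xFD is the dotless 'ı' in Latin 5 but 'ý' in Latin 1
as_latin1 = raw_bytes.decode("latin-1")

print(as_latin5)
print(as_latin1)
```

Because the wrong codec does not raise an error here, a mismatched encoding can go unnoticed until column names or values fail to match later in the analysis.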

The ```.head()``` method returns the first five rows of a DataFrame by default, or as many rows as you specify within the parentheses. It is useful for quickly checking that the data was loaded as expected.

In [44]:
df.head()

Unnamed: 0,Tarih,Saat,Tüketim Miktarı (MWh)
0,01.09.2020,00:00,"37.389,67"
1,01.09.2020,01:00,"35.688,23"
2,01.09.2020,02:00,"34.387,69"
3,01.09.2020,03:00,"33.582,27"
4,01.09.2020,04:00,"33.076,31"


## Data Exploration and Pre-processing

### Check All Column Data Types

The ```.dtypes``` attribute returns the data type of each column.

In [None]:
df.dtypes
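As a small illustration of what this check reveals, the sketch below builds a one-row sample that mimics the raw file (the sample frame is an assumption for demonstration, not the real dataset) and confirms that the consumption column is not yet numeric, because values such as "37.389,67" are read as text.

```python
import pandas as pd

# One-row sample mimicking the raw CSV columns (illustrative data only)
sample = pd.DataFrame({
    "Tarih": ["01.09.2020"],
    "Saat": ["00:00"],
    "Tüketim Miktarı (MWh)": ["37.389,67"],
})

# Inspect the data type of every column
print(sample.dtypes)

# The consumption values are strings, so the column is not numeric yet
is_numeric = pd.api.types.is_numeric_dtype(sample["Tüketim Miktarı (MWh)"])
print(is_numeric)
```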

### Convert the Date (Tarih) Column to datetime Using Pandas

The ```to_datetime``` function converts a scalar, array-like, Series, or DataFrame/dict-like object into a pandas datetime object.

This conversion is necessary for using the Prophet model, which requires the input data to be in datetime format.

In [45]:
# Combine 'Tarih' and 'Saat' columns into a single datetime column
df['Datetime'] = pd.to_datetime(df['Tarih'] + ' ' + df['Saat'], format='%d.%m.%Y %H:%M')
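The effect of the explicit ```format``` string can be checked on a small sample. This is a minimal sketch with made-up rows in the same day.month.year layout as the dataset; it shows that "01.09.2020" is parsed as 1 September rather than 9 January.

```python
import pandas as pd

# Two illustrative rows in the dataset's date and time layout
sample = pd.DataFrame({
    "Tarih": ["01.09.2020", "02.09.2020"],
    "Saat": ["00:00", "13:00"],
})

# Join date and time with a space, then parse with an explicit day-first format
sample["Datetime"] = pd.to_datetime(
    sample["Tarih"] + " " + sample["Saat"],
    format="%d.%m.%Y %H:%M",
)

print(sample["Datetime"])
```

Passing the format explicitly avoids any ambiguity between day-first and month-first interpretations of the date strings.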

You can use the ```.drop()``` method to remove unnecessary rows or columns by specifying their labels.

```axis=0``` drops the specified rows.
```axis=1``` drops the specified columns.

In [46]:
# Drop the original 'Tarih' and 'Saat' columns
df = df.drop(['Tarih', 'Saat'], axis=1)
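The difference between the two axes can be seen on a toy DataFrame. The column names and values below are placeholders chosen for illustration only.

```python
import pandas as pd

# Toy frame with three rows and three columns (placeholder data)
sample = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

# axis=0: drop the row whose index label is 0
without_row = sample.drop([0], axis=0)

# axis=1: drop the columns labelled 'a' and 'b'
without_cols = sample.drop(["a", "b"], axis=1)

print(without_row)
print(without_cols)
```

Note that ```.drop()``` returns a new DataFrame and leaves the original unchanged, which is why the notebook assigns the result back to ```df```.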

Next, reorder the columns so that the datetime column comes first, and translate the name of the consumption column into English.

In [47]:
# Swap the two columns
df = df[['Datetime', 'Tüketim Miktarı (MWh)']]

# Rename the column to 'Consumption Amount (MWh)'
df.rename(columns={'Tüketim Miktarı (MWh)': 'Consumption Amount (MWh)'}, inplace=True)

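The raw values use a period as the thousands separator and a comma as the decimal separator (for example, "37.389,67"). The code below removes the periods, replaces the decimal comma with a period, and then converts the column to the float data type.

In [48]:
# Convert 'Consumption Amount (MWh)' to float:
# strip the '.' thousands separators, then turn the ',' decimal mark into '.'
# regex=False makes '.' a literal period rather than a regex wildcard
df['Consumption Amount (MWh)'] = df['Consumption Amount (MWh)'].str.replace('.', '', regex=False).str.replace(',', '.', regex=False).astype(float)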
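The conversion can be verified on a few sample strings before applying it to the full column. This is a minimal sketch; the first two values come from the rows shown earlier, and the third is a made-up value included to check that more than one thousands separator is handled.

```python
import pandas as pd

# Sample values with '.' as thousands separator and ',' as decimal mark
raw = pd.Series(["37.389,67", "35.688,23", "1.234.567,89"])

# Remove the thousands separators first, then convert the decimal comma.
# regex=False treats '.' as a literal character, not a regex wildcard.
cleaned = (
    raw.str.replace(".", "", regex=False)
       .str.replace(",", ".", regex=False)
       .astype(float)
)

print(cleaned)
```

The order of the two replacements matters: converting the comma to a period first would make the decimal point indistinguishable from the thousands separators.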

### Plot the Data

You may be familiar with ```Matplotlib``` and ```Seaborn```, which are commonly taught in Data Science courses for basic plotting. However, for our time series data, we have chosen ```Plotly```, an interactive, open-source Python library that offers over 40 chart types, including statistical, financial, geographic, scientific, and 3D visualizations.

Seaborn and Matplotlib provide limited interactivity, while Plotly provides highly interactive and responsive plots that can be zoomed, panned, and rotated.

In [49]:
fig = px.line(df, x='Datetime', y='Consumption Amount (MWh)')
fig.show()

In [50]:
df.head()

Unnamed: 0,Datetime,Consumption Amount (MWh)
0,2020-09-01 00:00:00,37389.67
1,2020-09-01 01:00:00,35688.23
2,2020-09-01 02:00:00,34387.69
3,2020-09-01 03:00:00,33582.27
4,2020-09-01 04:00:00,33076.31
