# Import libraries 

As a data scientist, you would typically use libraries such as: 
- `pandas`: used to read data into a structured, tabular format known as a `DataFrame`. It can read data from files, databases, and APIs, and lets you transform the `DataFrame` before writing it out to another file or database.
- `statsmodels`: used to train statistical models, including time series forecasting models such as SARIMAX.
- `scikit-learn`: used to train machine learning models and make predictions.
- `matplotlib`: used to plot your data and results.
- And many more, depending on what you need to do.

Go ahead and import these popular libraries into your notebook by running:

```python
import pandas as pd  
import statsmodels.api as sm
import matplotlib.pyplot as plt
```

If any of these libraries are not installed on your computer, you will see a `ModuleNotFoundError`. In that case, install them by running:

```
pip install pandas 
pip install statsmodels
pip install matplotlib
```
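If you would rather check up front which libraries are missing, here is a minimal sketch using the standard library's `importlib.util.find_spec`, which returns `None` for a top-level module that cannot be found instead of raising `ModuleNotFoundError`. The helper name `check_installed` is hypothetical, introduced only for illustration.

```python
import importlib.util

# Libraries used in this demo
REQUIRED = ["pandas", "statsmodels", "matplotlib"]


def check_installed(names):
    """Hypothetical helper: map each top-level module name to True if it can be found."""
    # find_spec returns None (rather than raising) when a top-level module is missing
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_installed(REQUIRED)
missing = [name for name, ok in status.items() if not ok]
if missing:
    print("Install with: pip install " + " ".join(missing))
else:
    print("All libraries are installed.")
```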


# Reading in data

As a Data Scientist, you would work closely with a Data Engineer, who pulls in data from various sources so that you can build machine learning models faster.

For this instructor demo, let's assume that a Data Engineer has prepared a dataset for you (as in the previous exercises) and stored it in `../resources/final_superstore.csv`.

We can read the data into a `DataFrame` using:
```python
df = pd.read_csv("../resources/final_superstore.csv")
```
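A minimal sketch of reading the dates as real datetimes at load time, assuming the file has the `order_date` and `total_value` columns used later in this demo. A small in-memory sample (hypothetical values, not the superstore data) stands in for the CSV file so the snippet runs anywhere; with the real file you would pass the path instead of `sample_csv`.

```python
import io

import pandas as pd

# Hypothetical sample standing in for ../resources/final_superstore.csv
sample_csv = io.StringIO(
    "order_date,total_value\n"
    "2017-01-03,120.50\n"
    "2017-01-03,80.00\n"
    "2017-02-10,45.25\n"
)

# parse_dates converts order_date to datetime64 while reading
df = pd.read_csv(sample_csv, parse_dates=["order_date"])
print(df.dtypes)
```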

# Feature engineering

As a Data Scientist, you would often have to generate new features (columns) from existing data. 

These new features are used in the building of a machine learning model. 

Create a new date column (feature) so that it can be used in a time series forecasting model. 

Prepare your dataset to be used in a time series forecasting model by grouping your data by `order_date` and summing up `total_value`.  
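One way these steps might look, as a sketch under assumptions: the "new date column" is taken here to mean converting `order_date` into a proper datetime type, and the rows are hypothetical. The final monthly aggregation is an extra, optional step (also an assumption), included because the seasonal period of 12 used in the model below usually corresponds to monthly data.

```python
import pandas as pd

# Hypothetical order-level rows; in the demo these come from final_superstore.csv
orders = pd.DataFrame({
    "order_date": ["2017-01-03", "2017-01-03", "2017-02-10", "2017-02-15"],
    "total_value": [120.50, 80.00, 45.25, 60.00],
})

# Date feature: convert the text column to a datetime type
orders["order_date"] = pd.to_datetime(orders["order_date"])

# One row per order_date, with total_value summed across all orders on that date
daily = orders.groupby("order_date")["total_value"].sum().to_frame()

# Optional (assumption): aggregate to month-start periods for a 12-period seasonal model
monthly = daily.resample("MS").sum()
print(daily)
print(monthly)
```

After the `groupby`, `order_date` becomes the index, which is the shape the model below expects.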

# Train model 

As a Data Scientist, your role is to build a model that answers a forward-looking business question.

For example: "How much revenue can I expect to generate in the next 3 months?" 

Depending on the question you are trying to answer, you will select different types of models. 

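In this example, you will use SARIMAX (Seasonal AutoRegressive Integrated Moving Average with eXogenous regressors), a time series forecasting method available in `statsmodels`.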

```python
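# Define a seasonal ARIMA model: non-seasonal order (1, 0, 0), seasonal order (1, 1, 1) with period 12
model = sm.tsa.statespace.SARIMAX(df, order=(1, 0, 0), seasonal_order=(1, 1, 1, 12))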
```

Where: 
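- `df`: your DataFrame, with the date set as the index and a single column of values to forecast
- `order`: the (p, d, q) order of the non-seasonal part of the model: p autoregressive terms, d differences, and q moving-average terms
- `seasonal_order`: the (P, D, Q, s) order of the seasonal component, where P, D, and Q play the same roles as p, d, and q for the seasonal part, and s is the number of periods in one season (for example, 12 for monthly data with a yearly pattern)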
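As a worked illustration of what these orders mean, the call above with `order=(1, 0, 0)` and `seasonal_order=(1, 1, 1, 12)` describes the following model, written with the backshift operator $B$ (where $B y_t = y_{t-1}$). This assumes no exogenous regressors and no trend term, since neither is passed in the call; the symbols are introduced here for illustration.

$$
(1 - \phi_1 B)\,(1 - \Phi_1 B^{12})\,(1 - B^{12})\, y_t = (1 + \Theta_1 B^{12})\, \varepsilon_t
$$

Here $\phi_1$ is the non-seasonal AR coefficient ($p = 1$), $\Phi_1$ is the seasonal AR coefficient ($P = 1$), $(1 - B^{12})$ is the single seasonal difference ($D = 1$, $s = 12$), $\Theta_1$ is the seasonal MA coefficient ($Q = 1$), and $\varepsilon_t$ is white noise. Because $d = 0$ and $q = 0$, there is no non-seasonal differencing or non-seasonal MA term.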

```python
# fit a time series model
```


# Perform prediction

Using the fitted model, you can now forecast values for future periods.