# Microsoft Stocks Price Prediction (using AutoML framework FEDOT)

This notebook was inspired by this [post](https://www.kaggle.com/paramarthasengupta/microsoft-stocks-price-prediction?select=Microsoft_Stock.csv).

Task: Time series forecasting

## [Original repository on GitHub](https://github.com/nccr-itmo/FEDOT)

--- 


In [None]:
# Additional imports 
import pandas as pd 
import numpy as np

# Imports for creating plots
import matplotlib.pyplot as plt
from pylab import rcParams
rcParams['figure.figsize'] = 18, 7

import warnings
warnings.filterwarnings('ignore')

## Exploratory visualizations 

In [None]:
file_path = '../input/microsoft-stock-time-series-analysis/Microsoft_Stock.csv'
df = pd.read_csv(file_path, parse_dates=['Date'])
df.head(5)

In [None]:
plt.plot(df['Date'], df['Close'])
plt.show()

## Train test split

In [None]:
train_size = int(len(df)*0.75)
test_size = len(df) - train_size

The forecast length is equal to the length of the validation block, i.e. the last 25% of the series.

In [None]:
forecast_length = test_size
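The split arithmetic above can be checked on a small example. The following is a minimal sketch, assuming a hypothetical toy series of 20 values in place of `df['Close']`; the names `series`, `train_part` and `test_part` are illustrative and are not used elsewhere in the notebook.

```python
# Hypothetical toy series standing in for df['Close'] (assumption, not real data)
series = [float(i) for i in range(20)]

train_size = int(len(series) * 0.75)   # first 75% of the observations
test_size = len(series) - train_size   # remaining observations

# Chronological split: no shuffling, the validation block is the tail of the series
train_part = series[:train_size]
test_part = series[train_size:]

# The forecast horizon covers the whole validation block
forecast_length = test_size
print(len(train_part), len(test_part), forecast_length)
```

Because the order of observations matters in a time series, the validation block is taken from the end rather than sampled at random.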

# AutoML framework FEDOT

This notebook uses FEDOT version 0.3.1.

In [None]:
# We will use FEDOT framework version 0.3.1 for forecasting
!pip install fedot==0.3.1

In [None]:
from fedot.api.main import Fedot

# Chain and nodes
from fedot.core.chains.chain import Chain
from fedot.core.chains.node import PrimaryNode, SecondaryNode

# Data 
from fedot.core.data.data import InputData
from fedot.core.data.data_split import train_test_data_setup
from fedot.core.repository.dataset_types import DataTypesEnum

# Tasks
from fedot.core.repository.tasks import Task, TaskTypesEnum, TsForecastingParams

# Metric
from sklearn.metrics import mean_absolute_error

In [None]:
task = Task(TaskTypesEnum.ts_forecasting,
            TsForecastingParams(forecast_length=forecast_length))

# Load data from csv file and wrap it into InputData structure
input_data = InputData.from_csv_time_series(task, file_path, target_column='Close')

# Divide into train and test 
train_data, test_data = train_test_data_setup(input_data)

Launch the AutoML framework for two minutes.

Note: to avoid overloading the page with lengthy logs, the output of the cell below is not shown.

The log should start with: "Composition started. Parameters tuning: True. Set of candidate models: ['linear', 'lasso', 'ridge', 'xgbreg', 'adareg', 'gbr', 'dtreg', 'treg', 'rfr', 'svr', 'sgdr', 'ar', 'scaling', 'normalization', 'simple_imputation', 'pca', 'poly_features', 'ransac_lin_reg', 'ransac_non_lin_reg', 'rfe_lin_reg', 'rfe_non_lin_reg', 'lagged', 'smoothing', 'gaussian_filter']. Composing time limit: 2 min

Model composition started ..."

In [None]:
# Define parameters
task_parameters = TsForecastingParams(forecast_length=forecast_length)

# Init model for the time series forecasting
model = Fedot(problem='ts_forecasting', task_params=task_parameters)

# Run the AutoML model design; fit() returns the composed chain
chain = model.fit(features=train_data)

In [None]:
# Use model to obtain forecast
forecast = model.predict(features=test_data)

Prepare a function that plots the forecast against the actual values and prints the MAE.

In [None]:
def display_results(actual_time_series, predicted_values, len_train_data, y_name = 'Microsoft Stocks Price'):
    """
    Plot the actual series together with the predictions and print the MAE
    
    :param actual_time_series: the entire array with one-dimensional data
    :param predicted_values: array with predicted values
    :param len_train_data: number of elements in the training sample
    :param y_name: name of the y axis
    """
    
    plt.plot(np.arange(0, len(actual_time_series)), 
             actual_time_series, label = 'Actual values', c = 'green')
    plt.plot(np.arange(len_train_data, len_train_data + len(predicted_values)), 
             predicted_values, label = 'Predicted', c = 'blue')
    # Plot a black vertical line that divides the array into train and test parts
    plt.plot([len_train_data, len_train_data],
             [min(actual_time_series), max(actual_time_series)], c = 'black', linewidth = 1)
    plt.ylabel(y_name, fontsize = 15)
    plt.xlabel('Time index', fontsize = 15)
    plt.legend(fontsize = 15, loc='upper left')
    plt.grid()
    plt.show()
    
    mae_value = mean_absolute_error(actual_time_series[len_train_data:], predicted_values)
    print(f'MAE value: {mae_value}')

In [None]:
display_results(np.array(df['Close']), forecast, len(train_data.features))
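The metric printed by `display_results` compares the forecast only with the tail of the series that follows the training part. The sketch below reproduces that alignment and the MAE calculation in plain Python on made-up numbers; the helper `manual_mae` and the toy values are assumptions for illustration and are not part of FEDOT or scikit-learn.

```python
def manual_mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted| over paired values."""
    if len(actual) != len(predicted):
        raise ValueError('actual and predicted must have the same length')
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Made-up values: six observations, the first four form the training part
full_series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
len_train = 4
predicted = [4.5, 6.5]

# Same slicing as in display_results: only the part after the training block is scored
actual_tail = full_series[len_train:]
mae_value = manual_mae(actual_tail, predicted)
print(f'MAE value: {mae_value}')
```

The MAE is expressed in the same units as the series, so for the stock data it can be read as the average absolute deviation of the forecast from the closing price.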

Inspect the structure of the obtained chain.

In [None]:
chain.show()

Note that forecasting stock prices from the one-dimensional price series alone may be ineffective. Using additional data is likely to help, although it does not guarantee an accurate result either.