![](https://github.com/linkedin/greykite/raw/master/LOGO-C8.png)

LinkedIn has released [Greykite](https://github.com/linkedin/greykite), a time-series forecasting library that aims to simplify prediction for data scientists. The library's primary forecasting algorithm is Silverkite, which automates much of the forecasting workflow. LinkedIn developed Greykite to help its teams make effective decisions based on time-series forecasting models. Because the library also helps interpret model outputs, it can be a practical choice for many time-series forecasting tasks. Last year, LinkedIn also released a Fairness Toolkit for explainability in machine learning.

Over the years, LinkedIn has used the Greykite library to plan enough infrastructure to handle peak traffic, set business targets, and optimize budget decisions.

![](https://analyticsdrift.com/wp-content/uploads/2021/05/LinkedIn-greykite-architecture-1024x481.png)



# Installation

In [None]:
%matplotlib inline
!pip install -qqq greykite


Simple Forecast
===============

In [None]:
# !pip install -qqq pandas
import pandas as pd


df = pd.read_csv('/kaggle/input/electric-production/Electric_Production.csv')

df['DATE'] = df['DATE'].astype('datetime64[ns]')
df.rename(columns = {'DATE': 'ts', 'Value': 'y'}, inplace = True)
df = df.head(100)
df
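
The cell above reshapes the CSV into one timestamp column (`ts`) and one value column (`y`). As a minimal sketch of the same preparation, the snippet below uses a small synthetic frame (the values are made up, not taken from `Electric_Production.csv`) and checks that pandas infers a month-start frequency from the timestamps. The names `raw`, `prepared`, and `inferred_freq` are illustrative.

```python
import pandas as pd

# Synthetic stand-in for the CSV: the values are made up for illustration.
raw = pd.DataFrame({
    "DATE": ["1985-01-01", "1985-02-01", "1985-03-01", "1985-04-01"],
    "Value": [72.5, 70.7, 62.5, 57.5],
})

# Parse the date strings and rename the columns to the names used in this notebook.
prepared = raw.assign(DATE=pd.to_datetime(raw["DATE"])).rename(
    columns={"DATE": "ts", "Value": "y"}
)

# Infer the sampling frequency from the timestamps; month-start data yields "MS".
inferred_freq = pd.infer_freq(pd.DatetimeIndex(prepared["ts"]))
print(prepared.dtypes)
print(inferred_freq)
```

Checking the inferred frequency before modeling is a cheap way to catch missing or duplicated months.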

In [None]:
from collections import defaultdict
import warnings

warnings.filterwarnings("ignore")

import pandas as pd
import plotly

from greykite.framework.templates.autogen.forecast_config import ForecastConfig
from greykite.framework.templates.autogen.forecast_config import MetadataParam
from greykite.framework.templates.forecaster import Forecaster 
from greykite.framework.templates.model_templates import ModelTemplateEnum
from greykite.framework.utils.result_summary import summarize_grid_search_results


Create a forecast
-----------------



In [None]:
# Specifies dataset information
metadata = MetadataParam(
     time_col="ts",  # name of the time column
     value_col="y",  # name of the value column
     freq="MS"  # "MS" for monthly (month start), "H" for hourly, "D" for daily, "W" for weekly, etc.
 )

forecaster = Forecaster()
result = forecaster.run_forecast_config(
     df=df,
     config=ForecastConfig(
         model_template=ModelTemplateEnum.SILVERKITE.name,
         forecast_horizon=100,  # forecasts 100 steps ahead
         coverage=0.95,  # 95% prediction intervals
         metadata_param=metadata
    )
)
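
To make the `freq` and `forecast_horizon` settings concrete, here is a minimal pandas-only sketch of which timestamps a 100-step horizon covers for month-start data. The last observed timestamp is a hypothetical value chosen for illustration, and this is not how Greykite builds its future frame internally.

```python
import pandas as pd

# Hypothetical last observed timestamp of a monthly series.
last_observed = pd.Timestamp("1993-04-01")
forecast_horizon = 100  # same number of steps as in the config above

# Generate horizon + 1 month starts beginning at the last observation, then drop the first.
future_index = pd.date_range(
    start=last_observed, periods=forecast_horizon + 1, freq="MS"
)[1:]

print(future_index[0], future_index[-1], len(future_index))
```

With monthly data, 100 steps is more than eight years ahead, which is worth keeping in mind when reading the prediction intervals.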

The output of **run_forecast_config** is a result object whose attributes hold the future forecast, the historical forecast performance, and the original time series.



In [None]:
ts = result.timeseries
fig = ts.plot()
plotly.io.show(fig)

## Cross-validation

By default, run_forecast_config provides historical evaluation, so you can see how the forecast performs on past data. This is stored in grid_search (cross-validation splits) and backtest (holdout test set).
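
The difference between cross-validation splits and a holdout test set can be sketched with plain index arithmetic. The helper below is hypothetical and is not Greykite's splitter; it assumes an expanding training window and fixed-size test windows, with the final 12 points reserved as the holdout.

```python
def expanding_window_splits(n_obs, test_size, n_splits):
    """Return (train_indices, test_indices) pairs in time order.

    Each test window follows its training window, and later splits
    train on more history. Hypothetical helper, not part of Greykite.
    """
    splits = []
    for k in range(n_splits, 0, -1):
        test_end = n_obs - (k - 1) * test_size
        test_start = test_end - test_size
        if test_start <= 0:
            continue  # not enough history for this split
        splits.append((list(range(test_start)), list(range(test_start, test_end))))
    return splits


# Reserve the last 12 points as a holdout (backtest) and cross-validate on the rest.
n_total, holdout = 100, 12
cv_splits = expanding_window_splits(n_obs=n_total - holdout, test_size=12, n_splits=3)
holdout_indices = list(range(n_total - holdout, n_total))

for train_idx, test_idx in cv_splits:
    print(f"train 0..{train_idx[-1]}  test {test_idx[0]}..{test_idx[-1]}")
print(f"holdout {holdout_indices[0]}..{holdout_indices[-1]}")
```

Every test window lies strictly after its training window, so no future information leaks into the fit.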

Let’s check the cross-validation results. By default, all metrics in ElementwiseEvaluationMetricEnum are computed on each CV train/test split; the Evaluation Metric section of the Greykite documentation describes how to configure them. Below, we show the Mean Absolute Percentage Error (MAPE) across splits (see [summarize_grid_search_results](https://linkedin.github.io/greykite/docs/0.1.0/html/pages/autodoc/doc.html#greykite.framework.utils.result_summary.summarize_grid_search_results) to control what to show and for details on the output columns).
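
For reference, MAPE averages the absolute error relative to each actual value. The sketch below computes it per split and then averages across splits, which is loosely analogous to the per-split and mean test scores in the summary table. The numbers are made up, and the `mape` helper is illustrative rather than Greykite's implementation.

```python
def mape(actuals, forecasts):
    """Mean Absolute Percentage Error, in percent. Assumes no actual value is zero."""
    if len(actuals) != len(forecasts):
        raise ValueError("actuals and forecasts must have the same length")
    errors = [abs((a - f) / a) for a, f in zip(actuals, forecasts)]
    return 100.0 * sum(errors) / len(errors)


# Made-up (actual, forecast) pairs for three CV test splits.
splits = [
    ([100.0, 200.0], [110.0, 180.0]),
    ([50.0, 80.0], [45.0, 88.0]),
    ([120.0, 60.0], [120.0, 66.0]),
]
split_scores = [mape(a, f) for a, f in splits]
mean_score = sum(split_scores) / len(split_scores)
print(split_scores, mean_score)
```

Because MAPE divides by the actual value, it is undefined when an actual is zero and can be unstable when actuals are close to zero.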

In [None]:
grid_search = result.grid_search
cv_results = summarize_grid_search_results(
    grid_search=grid_search,
    decimals=2,
    # The below saves space in the printed output. Remove to show all available metrics and columns.
    cv_report_metrics=None,
    column_order=["rank", "mean_test", "split_test", "mean_train", "split_train", "mean_fit_time", "mean_score_time", "params"])
# Transposes to save space in the printed output
cv_results["params"] = cv_results["params"].astype(str)
cv_results.set_index("params", drop=True, inplace=True)
cv_results.transpose()

## Backtest
Let's plot the historical forecast on the holdout test set.
You can zoom in to see how it performed in any given period.



In [None]:
backtest = result.backtest
fig = backtest.plot()
plotly.io.show(fig)

You can also check historical evaluation metrics (on the historical training/test set).


In [None]:
backtest_eval = defaultdict(list)
for metric, value in backtest.train_evaluation.items():
    backtest_eval[metric].append(value)
    backtest_eval[metric].append(backtest.test_evaluation[metric])
metrics = pd.DataFrame(backtest_eval, index=["train", "test"]).T
metrics
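
Since the forecast config requested `coverage=0.95`, one generic check on held-out data is the share of actual values that fall inside the prediction interval. The sketch below uses made-up numbers and a hypothetical `empirical_coverage` helper; it is not Greykite's own evaluation code.

```python
def empirical_coverage(actuals, lower, upper):
    """Fraction of actual values that fall inside [lower, upper]."""
    inside = [lo <= a <= hi for a, lo, hi in zip(actuals, lower, upper)]
    return sum(inside) / len(inside)


# Made-up holdout values and interval bounds.
actuals = [10.0, 12.0, 15.0, 11.0, 20.0]
lower = [8.0, 11.0, 10.0, 12.0, 15.0]
upper = [12.0, 14.0, 14.0, 16.0, 22.0]

coverage = empirical_coverage(actuals, lower, upper)
print(f"empirical coverage: {coverage:.2f}")
```

An empirical coverage well below the requested level suggests the intervals are too narrow for the data at hand.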

## Forecast
The **forecast** attribute contains the forecasted result. Just as for **backtest**, you can plot the result or see the evaluation metrics.

Let’s plot the forecast (trained on all data):

In [None]:
forecast = result.forecast
fig = forecast.plot()
plotly.io.show(fig)

The forecasted values are available in the **df** attribute of **forecast**.



In [None]:
forecast.df.head().round(2)
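
To look only at the future part of a forecast table, one option is to keep the rows that have no observed value. The sketch below works on a hand-built frame; the column names (`ts`, `actual`, `forecast`, `forecast_lower`, `forecast_upper`) are an assumption about the layout of the forecast frame, so check them against your own output before reusing the filter.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for the forecast frame; the column names are an assumption.
forecast_df = pd.DataFrame({
    "ts": pd.date_range("2020-01-01", periods=5, freq="MS"),
    "actual": [10.0, 11.0, 12.0, np.nan, np.nan],
    "forecast": [10.2, 10.9, 12.3, 13.1, 13.8],
    "forecast_lower": [9.0, 9.8, 11.0, 11.5, 12.0],
    "forecast_upper": [11.4, 12.0, 13.6, 14.7, 15.6],
})

# Rows without an observed value are the future part of the forecast.
future_rows = forecast_df[forecast_df["actual"].isna()]
print(future_rows.round(2))
```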

## Model Diagnostics
The component plot shows how your dataset’s trend, seasonality, and event / holiday patterns are handled in the model:

In [None]:
fig = forecast.plot_components()
plotly.io.show(fig)     # fig.show() if you are using "PROPHET" template
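
To build intuition for what a trend/seasonality split means, here is a classical decomposition sketch on synthetic monthly data. It uses a centered moving average to estimate the trend and treats the remainder as the seasonal part. This is a textbook technique swapped in for illustration; it is not how Silverkite estimates its components.

```python
def centered_moving_average(values, period=12):
    """2 x period centered moving average for an even period.

    Returns None where the window does not fit inside the series.
    """
    half = period // 2
    smoothed = [None] * len(values)
    for t in range(half, len(values) - half):
        window = values[t - half:t + half + 1]
        # End points get half weight so the window spans exactly one period.
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        smoothed[t] = total / period
    return smoothed


# Synthetic monthly series: linear trend plus a zero-mean seasonal pattern.
seasonal = [3, 2, 0, -2, -3, -4, -2, 0, 1, 2, 2, 1]
trend = [50 + 0.5 * t for t in range(36)]
series = [trend[t] + seasonal[t % 12] for t in range(36)]

trend_estimate = centered_moving_average(series, period=12)
seasonal_estimate = [
    None if m is None else y - m for y, m in zip(series, trend_estimate)
]
print(trend_estimate[6], seasonal_estimate[6])
```

On this idealized series the moving average recovers the linear trend wherever the window fits, because the seasonal pattern sums to zero over each full year.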

Model summary allows inspection of individual model terms. Check parameter estimates and their significance for insights on how the model works and what can be further improved.

In [None]:
summary = result.model[-1].summary()  # -1 retrieves the estimator from the pipeline
print(summary)