Time Series plot_model support for difference plots #1654

Closed
ngupta23 opened this issue Oct 6, 2021 · 8 comments · Fixed by #2086
Labels: enhancement (New feature or request), plot_model, time_series (Topics related to the time series)

Comments

@ngupta23
Collaborator

ngupta23 commented Oct 6, 2021

Add option to plot difference plots. Use sktime Differencer for this. More details here

Option 1: User can specify order of difference

plot_model(plot="diff") # Defaults to order = 1 (first difference)

# User can specify a specific order of difference with data_kwargs.
# e.g. this does 1st difference, then 1st difference on differenced data (2nd order difference).
# Finally shows original and 2nd order differenced data.
plot_model(plot="diff", data_kwargs={'order':2})  
# If using sktime differencer, this is equivalent to lags=[1, 1]

# Same as above but finally shows original, 1st order, and 2nd order differenced data.
plot_model(plot="diff", data_kwargs={'order':[1,2]})  
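
The relationship between `order` and `lags` described in Option 1 can be sketched in plain Python. This only illustrates the arithmetic; it is not the sktime `Differencer` or the PyCaret implementation, and `first_difference` and `order_to_lags` are hypothetical helper names.

```python
def first_difference(values):
    """Return the lag-1 difference: y[t] - y[t-1]."""
    return [curr - prev for prev, curr in zip(values, values[1:])]


def order_to_lags(order):
    """Map a difference order to an equivalent list of lags (hypothetical helper)."""
    return [1] * order


y = [1, 4, 9, 16, 25, 36]  # t**2, so the 2nd order difference is constant

d1 = first_difference(y)   # 1st order difference: [3, 5, 7, 9, 11]
d2 = first_difference(d1)  # 2nd order difference: [2, 2, 2, 2]

print(order_to_lags(2))    # [1, 1]
```

Under this reading, `order=2` and `lags=[1, 1]` describe the same transformation, and `order=[1, 2]` would simply request both `d1` and `d2` alongside the original series.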

Option 2: User can specify lags directly

# Instead of specifying the order, user can specify lags directly as well. These should be passed to the underlying sktime Differencer
# The below is equivalent to `plot_model(plot="diff", data_kwargs={'order':2})`
plot_model(plot="diff", data_kwargs={'lags':[1,1]})


# Does first difference then 12th difference
# From `sktime` doc: "given a timeseries with monthly periodicity, using lags=[1, 12] corresponds to applying a standard first difference to handle trend, and followed by a seasonal difference (at lag 12) to attempt to account for seasonal dependence."
plot_model(plot="diff", data_kwargs={'lags': [1,12]}) 

# User may specify multiple sets of lags as well
# This should show 3 plots - Original, then one with lags = [1, 12], then one with lags = [1, 1, 12]
plot_model(plot="diff", data_kwargs={'lags': [[1, 12], [1, 1, 12]]})
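
As a rough illustration of what a sequence of lags does, the sketch below applies the lagged differences one after another on a synthetic series. It is a plain-Python stand-in, not the sktime `Differencer`; the `difference` helper and the synthetic data are assumptions made for the example.

```python
def difference(values, lags):
    """Apply successive lagged differences, one per entry in `lags`."""
    result = list(values)
    for lag in lags:
        result = [result[t] - result[t - lag] for t in range(lag, len(result))]
    return result


# Synthetic "monthly" series: linear trend plus a pattern repeating every 12 steps.
seasonal = [0, 3, 5, 4, 2, 1, 6, 8, 7, 3, 2, 9]
y = [2 * t + seasonal[t % 12] for t in range(36)]

trend_removed = difference(y, lags=[1])      # seasonal pattern still present
both_removed = difference(y, lags=[1, 12])   # trend and seasonality removed

print(len(trend_removed), len(both_removed))  # 35 23
print(set(both_removed))                      # {0}
```

Each lag shortens the series by that many observations, which is why the second result has 23 points. On this idealised series the `[1, 12]` difference is exactly zero; real data would leave a noisy remainder.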

Option 3: Use Option 1 or 2 and plot additional plots such as ACF, PACF

Trellis with ACF and PACF below the time series in individual rows. This would return a 3x3 grid of plots:

  • Row 1: Original Dataset, 1st order difference, 2nd order difference
  • Row 2: ACF for Original Dataset, 1st order difference, 2nd order difference
  • Row 3: PACF for Original Dataset, 1st order difference, 2nd order difference
plot_model(plot="diff", data_kwargs={'order': [1, 2], 'acf': True, 'pacf': True})
@ngupta23
Collaborator Author

Consider using sktime difference transformer for this to make it consistent with underlying sktime package.

https://www.sktime.org/en/v0.9.0/api_reference/auto_generated/sktime.transformations.series.difference.Differencer.html

@ngupta23
Collaborator Author

ngupta23 commented Dec 17, 2021

@codecypher Are you going to work on this one? Let me know so that it can be assigned appropriately.

@ngupta23
Collaborator Author

So we should always plot the original and then the differences.

If only first difference is asked for, it can be a 1x2 plot. If first 2 are asked for, then it can be a 1x3 plot.

Anything more than that would need to wrap into multiple rows.
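
One possible way to compute that layout is sketched below. The `grid_shape` helper is hypothetical, and the cap of 3 columns is an assumption inferred from the 1x2 and 1x3 cases above.

```python
import math


def grid_shape(n_differences, max_cols=3):
    """Return (rows, cols) for the original series plus `n_differences` panels."""
    n_panels = 1 + n_differences  # the original series is always plotted first
    cols = min(n_panels, max_cols)
    rows = math.ceil(n_panels / cols)
    return rows, cols


print(grid_shape(1))  # (1, 2)
print(grid_shape(2))  # (1, 3)
print(grid_shape(3))  # (2, 3): four panels wrap onto a second row
```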

@codecypher

codecypher commented Dec 17, 2021

No, it would help to have an example of what we want to do, such as "How to Difference a Time Series Dataset with Python".

@ngupta23
Collaborator Author

@codecypher I updated the original post itself to avoid confusion. Let me know if this makes more sense.

#1654 (comment)

@ngupta23
Collaborator Author

@codecypher As you build this, think about adding modular functions to get the data required for the plot (i.e. the differences). I would like to reuse those functions for the check_stats functionality. For example, I will use your function to get the differenced data and then run statistical tests on them (#1655)
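
A minimal sketch of that separation, assuming a data-only function that both the plot and a later statistics step could call. The names `get_differenced_data` and `mean` are hypothetical, and `mean` is only a stand-in for a real statistical test.

```python
def get_differenced_data(values, lags_list):
    """Return {label: series} for the original data and each requested set of lags."""
    data = {"Original": list(values)}
    for lags in lags_list:
        # Accept a single lag (e.g. 1) or a sequence of lags (e.g. [1, 12]).
        lags = [lags] if isinstance(lags, int) else list(lags)
        series = list(values)
        for lag in lags:
            series = [series[t] - series[t - lag] for t in range(lag, len(series))]
        data[f"lags={lags}"] = series
    return data


def mean(series):
    """Stand-in for a statistical test run on each differenced series."""
    return sum(series) / len(series)


y = [10, 12, 15, 19, 24, 30]
diffs = get_differenced_data(y, lags_list=[1, [1, 1]])

# The same dictionary can feed a plotting routine or a statistics routine.
summary = {label: mean(series) for label, series in diffs.items()}
print(diffs["lags=[1]"])     # [2, 3, 4, 5, 6]
print(diffs["lags=[1, 1]"])  # [1, 1, 1, 1]
```

Keeping the data step free of any plotting code is what makes it reusable: the caller decides whether to draw the series or test them.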

@ngupta23
Collaborator Author

ngupta23 commented Jan 7, 2022

@codecypher I got some time to work on this. Since you are working on the frequency components, I will go ahead and add some code for this. We can sync up once that is done.

This is sample code for base functions that need to be incorporated: https://gist.github.com/ngupta23/c93116dede5436c05faffd992181b89d

@ngupta23
Collaborator Author

ngupta23 commented Jan 9, 2022

To test

import pandas as pd
from pycaret.datasets import get_data

y = get_data("airline")

#### Setup the experiment ----
from pycaret.internal.pycaret_experiment import TimeSeriesExperiment
exp = TimeSeriesExperiment()
exp.setup(data=y, fh=12, session_id=42)

# Works on original dataset
exp.plot_model(plot="diff") # Default Diff Order = 1
exp.plot_model(plot="diff", data_kwargs={"order_list":[1, 2]})
exp.plot_model(plot="diff", data_kwargs={"lags_list":[1, 12, [1, 12]]})
exp.plot_model(plot="diff", data_kwargs={"order_list":[1, 2], "lags_list":[1, 12, [1, 12]]})

# Works on model residuals
model = exp.create_model("theta")
exp.plot_model(model, plot="diff") # Default Diff Order = 1


# Difference with Diagnostics

# Plot differences along with diagnostics such as ACF and PACF

# Row 1: Original with ACF
# Row 2: d = 1 with ACF
# Row 3: d = 2 with ACF
exp.plot_model(plot="diff", data_kwargs={"order_list": [1, 2], "acf": True})

# Row 1: Original with PACF
# Row 2: d = 1 with PACF
# Row 3: d = 2 with PACF
exp.plot_model(plot="diff", data_kwargs={"order_list": [1, 2], "pacf": True})


# Row 1: Original
# Row 2: d = 1 with ACF & PACF
# Row 3: d = 1 + (D = 1, s = 12) with ACF & PACF
exp.plot_model(plot="diff", data_kwargs={"lags_list": [[1], [1, 12]], "acf": True, "pacf": True}) 

@ngupta23 ngupta23 linked a pull request Jan 22, 2022 that will close this issue
13 tasks
ngupta23 added a commit that referenced this issue Jan 22, 2022
Enhance Time Series Difference Plots - Closes #1654
Time Series Forecasting automation moved this from To do to Done Jan 22, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 3, 2022

3 participants