Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARIMA - Add support for missing observations and padding #4058

Merged
merged 47 commits into from
Sep 22, 2021

Conversation

Nyrio
Copy link
Contributor

@Nyrio Nyrio commented Jul 15, 2021

This PR allows support for missing observations and padding at the start for variable-length batch. Example:

missing_obs_0

Note: I had to change ARIMA tests because I used a different method than statsmodels (which is used as a reference in tests) to compute the initial parameter estimation. They cut all missing observations for their initial least-square estimation, and I decided to fill them with naive replacements instead, so I keep the temporal relationships in the data and have a much better initial estimate and often a better fit in the end, according to some MASE measurements I made. So I updated the integration test to use the MASE and pass if we are approximately the same or better than statsmodels.

Nyrio added 28 commits June 16, 2021 05:41
@Nyrio Nyrio requested review from a team as code owners July 15, 2021 17:42
@ajschmidt8
Copy link
Member

Removing ops-codeowners from the required reviews since it doesn't seem there are any file changes that we're responsible for. Feel free to add us back if necessary.

@ajschmidt8 ajschmidt8 removed the request for review from a team September 2, 2021 14:35
@github-actions github-actions bot removed the conda conda issue label Sep 2, 2021
@Nyrio
Copy link
Contributor Author

Nyrio commented Sep 2, 2021

I solved a bug of numerical issues that could result in NaNs, changed one of the new test cases to a more stable ARIMA order, and relaxed the tolerance of confidence intervals tests.

The PR should be ready to merge if CI passes.

@Nyrio
Copy link
Contributor Author

Nyrio commented Sep 3, 2021

rerun tests

@Nyrio Nyrio added 4 - Waiting on Reviewer Waiting for reviewer to review or respond and removed 4 - Waiting on Author Waiting for author to respond to review labels Sep 10, 2021
@Nyrio
Copy link
Contributor Author

Nyrio commented Sep 10, 2021

I have fixed some C++ tests that were failing due to a change in the Jones transform.

@Nyrio
Copy link
Contributor Author

Nyrio commented Sep 20, 2021

@dantegd Can you please update your review?

@Nyrio
Copy link
Contributor Author

Nyrio commented Sep 21, 2021

rerun tests

@codecov-commenter
Copy link

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.10@d657178). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.10    #4058   +/-   ##
===============================================
  Coverage                ?   86.20%           
===============================================
  Files                   ?      231           
  Lines                   ?    19072           
  Branches                ?        0           
===============================================
  Hits                    ?    16441           
  Misses                  ?     2631           
  Partials                ?        0           
Flag Coverage Δ
dask 47.75% <0.00%> (?)
non-dask 78.76% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d657178...aa544b7. Read the comment docs.

@dantegd
Copy link
Member

dantegd commented Sep 22, 2021

@gpucibot merge

v21.10 Release automation moved this from PR-WIP to PR-Reviewer approved Sep 22, 2021
@rapids-bot rapids-bot bot merged commit f432537 into rapidsai:branch-21.10 Sep 22, 2021
v21.10 Release automation moved this from PR-Reviewer approved to Done Sep 22, 2021
rapids-bot bot pushed a commit that referenced this pull request Oct 11, 2021
The formula that we have been using since I added support for confidence intervals in ARIMA is slightly different than the one used in statsmodels. The difference is in particular quite pronounced when datasets have missing observations, which pushed me to raise tolerance for the intervals unit tests when I added test cases in the recent PR #4058.

In this PR, I change our calculation to match statsmodels, and decrease the corresponding test tolerance, as we now have a strict match with statsmodels.

Previous formula:
```python
lower_t = fc_t - sqrt(2) * erfinv(level) * sqrt(F_t * mean(v_i**2 / F_i))
upper_t = fc_t + sqrt(2) * erfinv(level) * sqrt(F_t * mean(v_i**2 / F_i))
```
New formula:
```python
lower_t = fc_t - sqrt(2) * erfinv(level) * sqrt(F_t)
upper_t = fc_t + sqrt(2) * erfinv(level) * sqrt(F_t)
```

Authors:
  - Louis Sugy (https://github.com/Nyrio)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: #4248
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
This PR allows support for missing observations and padding at the start for variable-length batch. Example:

![missing_obs_0](https://user-images.githubusercontent.com/17441062/125832072-1ff903c9-088e-4d77-9b17-be365890d982.png)

Note: I had to change ARIMA tests because I used a different method than statsmodels (which is used as a reference in tests) to compute the initial parameter estimation. They cut all missing observations for their initial least-square estimation, and I decided to fill them with naive replacements instead, so I keep the temporal relationships in the data and have a much better initial estimate and often a better fit in the end, according to some MASE measurements I made. So I updated the integration test to use the MASE and pass if we are approximately the same _or better_ than statsmodels.

Authors:
  - Louis Sugy (https://github.com/Nyrio)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Tamas Bela Feher (https://github.com/tfeher)
  - Dante Gama Dessavre (https://github.com/dantegd)
  - Ray Douglass (https://github.com/raydouglass)

URL: rapidsai#4058
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
The formula that we have been using since I added support for confidence intervals in ARIMA is slightly different than the one used in statsmodels. The difference is in particular quite pronounced when datasets have missing observations, which pushed me to raise tolerance for the intervals unit tests when I added test cases in the recent PR rapidsai#4058.

In this PR, I change our calculation to match statsmodels, and decrease the corresponding test tolerance, as we now have a strict match with statsmodels.

Previous formula:
```python
lower_t = fc_t - sqrt(2) * erfinv(level) * sqrt(F_t * mean(v_i**2 / F_i))
upper_t = fc_t + sqrt(2) * erfinv(level) * sqrt(F_t * mean(v_i**2 / F_i))
```
New formula:
```python
lower_t = fc_t - sqrt(2) * erfinv(level) * sqrt(F_t)
upper_t = fc_t + sqrt(2) * erfinv(level) * sqrt(F_t)
```

Authors:
  - Louis Sugy (https://github.com/Nyrio)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4248
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4 - Waiting on Reviewer Waiting for reviewer to review or respond CMake CUDA/C++ Cython / Python Cython or Python issue feature request New feature or request gpuCI gpuCI issue non-breaking Non-breaking change
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

7 participants