Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change calculation of ARIMA confidence intervals #4248

Merged
merged 4 commits into from
Oct 11, 2021

Conversation

Nyrio
Copy link
Contributor

@Nyrio Nyrio commented Sep 29, 2021

The formula that we have been using since I added support for confidence intervals in ARIMA is slightly different than the one used in statsmodels. The difference is in particular quite pronounced when datasets have missing observations, which pushed me to raise tolerance for the intervals unit tests when I added test cases in the recent PR #4058.

In this PR, I change our calculation to match statsmodels, and decrease the corresponding test tolerance, as we now have a strict match with statsmodels.

Previous formula:

lower_t = fc_t - sqrt(2) * erfinv(level) * sqrt(F_t * mean(v_i**2 / F_i))
upper_t = fc_t + sqrt(2) * erfinv(level) * sqrt(F_t * mean(v_i**2 / F_i))

New formula:

lower_t = fc_t - sqrt(2) * erfinv(level) * sqrt(F_t)
upper_t = fc_t + sqrt(2) * erfinv(level) * sqrt(F_t)

@Nyrio Nyrio added 3 - Ready for Review Ready for review by team CUDA / C++ CUDA issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Sep 29, 2021
@Nyrio Nyrio requested review from a team as code owners September 29, 2021 16:38
@Nyrio Nyrio added this to PR-WIP in v21.12 Release via automation Sep 29, 2021
@github-actions github-actions bot added CUDA/C++ Cython / Python Cython or Python issue labels Sep 29, 2021
Copy link
Contributor

@tfeher tfeher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Nyrio thanks for this fix! The implementation looks good.

I see that the updated formula omits the d_sigma2 value while we calculate the margin. Could you add a line to the PR description describing this change in the margin definition? Is there a theoretical reason to prefer the new formula, or is it just to align better with statsmodels?

@Nyrio
Copy link
Contributor Author

Nyrio commented Sep 30, 2021

@tfeher I have added the previous and new formulas in the description.

As to why we need to change: both statsmodels and R use this formula. I can't remember where/how I found the formula I was using, but I'd rather trust the two most popular implementations than my past self.

@dantegd
Copy link
Member

dantegd commented Sep 30, 2021

@Nyrio I think merging branch-21.10 into the PR would be a good idea to get CI to pass, the errors that happened should be fixed by now

@github-actions github-actions bot added CMake conda conda issue labels Oct 4, 2021
@Nyrio Nyrio changed the base branch from branch-21.10 to branch-0.12 October 4, 2021 15:28
@Nyrio Nyrio requested review from a team as code owners October 4, 2021 15:28
@Nyrio Nyrio changed the base branch from branch-0.12 to branch-21.12 October 4, 2021 15:28
@ajschmidt8 ajschmidt8 removed the request for review from a team October 4, 2021 15:44
@ajschmidt8
Copy link
Member

Removing ops-codeowners from the required reviews since it doesn't seem there are any file changes that we're responsible for. Feel free to add us back if necessary.

@Nyrio
Copy link
Contributor Author

Nyrio commented Oct 4, 2021

@ajschmidt8 I misclicked branch-0.12 instead of branch-21.12 and it triggered the request for review, sorry for that.

@codecov-commenter
Copy link

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.12@57a6ff7). Click here to learn what that means.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##             branch-21.12    #4248   +/-   ##
===============================================
  Coverage                ?   86.06%           
===============================================
  Files                   ?      231           
  Lines                   ?    18691           
  Branches                ?        0           
===============================================
  Hits                    ?    16087           
  Misses                  ?     2604           
  Partials                ?        0           
Flag Coverage Δ
dask 47.01% <0.00%> (?)
non-dask 78.75% <0.00%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.


Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 57a6ff7...904e81c. Read the comment docs.

@Nyrio
Copy link
Contributor Author

Nyrio commented Oct 5, 2021

@rapidsai/cuml-python-codeowners for required approval

v21.12 Release automation moved this from PR-WIP to PR-Reviewer approved Oct 11, 2021
@dantegd
Copy link
Member

dantegd commented Oct 11, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 0c13f44 into rapidsai:branch-21.12 Oct 11, 2021
v21.12 Release automation moved this from PR-Reviewer approved to Done Oct 11, 2021
vimarsh6739 pushed a commit to vimarsh6739/cuml that referenced this pull request Oct 9, 2023
The formula that we have been using since I added support for confidence intervals in ARIMA is slightly different than the one used in statsmodels. The difference is in particular quite pronounced when datasets have missing observations, which pushed me to raise tolerance for the intervals unit tests when I added test cases in the recent PR rapidsai#4058.

In this PR, I change our calculation to match statsmodels, and decrease the corresponding test tolerance, as we now have a strict match with statsmodels.

Previous formula:
```python
lower_t = fc_t - sqrt(2) * erfinv(level) * sqrt(F_t * mean(v_i**2 / F_i))
upper_t = fc_t + sqrt(2) * erfinv(level) * sqrt(F_t * mean(v_i**2 / F_i))
```
New formula:
```python
lower_t = fc_t - sqrt(2) * erfinv(level) * sqrt(F_t)
upper_t = fc_t + sqrt(2) * erfinv(level) * sqrt(F_t)
```

Authors:
  - Louis Sugy (https://github.com/Nyrio)

Approvers:
  - Tamas Bela Feher (https://github.com/tfeher)
  - Dante Gama Dessavre (https://github.com/dantegd)

URL: rapidsai#4248
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team CMake conda conda issue CUDA / C++ CUDA issue CUDA/C++ Cython / Python Cython or Python issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
No open projects
Development

Successfully merging this pull request may close these issues.

None yet

5 participants