WIP ENH: Add stats.control_charts #5506
There is one unrelated failure in the test run for the last Python 3.6 on Travis. Thanks for taking over. In my experience (especially including my own work), open-ended PRs with feature creep are difficult to merge and might stall for some time (even though those stalled and too wide-reaching PRs are often useful for figuring out the general pattern and design).
Codecov Report
@@ Coverage Diff @@
## master #5506 +/- ##
==========================================
- Coverage 82% 81.82% -0.18%
==========================================
Files 586 587 +1
Lines 92398 92600 +202
Branches 10250 10277 +27
==========================================
Hits 75768 75768
- Misses 14282 14480 +198
- Partials 2348 2352 +4
Continue to review full report at Codecov.
Codecov Report
@@ Coverage Diff @@
## master #5506 +/- ##
==========================================
+ Coverage 82.05% 82.07% +0.02%
==========================================
Files 586 588 +2
Lines 92741 93027 +286
Branches 10281 10307 +26
==========================================
+ Hits 76095 76353 +258
- Misses 14290 14312 +22
- Partials 2356 2362 +6
Continue to review full report at Codecov.
Here's a Jupyter Notebook that shows what the various control charts look like in practice: https://gist.github.com/shangyian/041128898e7a6a010f2efe336ebfdd09
The tests were all verified using two R packages: https://cran.r-project.org/web/packages/mvdalab/ and https://cran.r-project.org/web/packages/qcc/
Force-pushed from 8b698c0 to de0fde3.
@josef-pkt let me know if the changes look okay! I'm not sure why the AppVeyor build failed; it doesn't seem related to any changes in this branch. I'll look into it.
I looked through parts of your changes earlier today. The second surprise was the distinction between phase1 and phase2 limits, but then I saw that it was already in my comments on the multivariate mean chart. (So I have to think more to see whether the design has problems or not.) Two GitHub comments: Second, don't merge statsmodels master into your feature branch; that makes the GitHub commit history a bit messy. If you need to, rebase on master instead. We usually rebase a feature branch when it falls too far behind master.
Yeah, I wasn't sure if it was necessary to have the in-sample and out-of-sample points in init, but I figured keeping some flexibility there would be good. Another thing that came to mind: the phase 1 vs. phase 2 structure is slightly confusing. Technically we should be able to update both phase 1 and phase 2 with more data, but the current setup only allows for updating (appending to) phase 2 with new data points. Do you think this is a use case that may prove necessary? Sorry about the commit overwrite! I was trying to clean up a bunch of commits I made (almost all of them related to linting), but messed up the history there. I'll keep that in mind about rebasing vs. merging.
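To make the phase 1 / phase 2 question concrete, here is a minimal sketch of the two kinds of update. It is not the code in this PR: the class name `TwoPhaseChart` and the methods `append_phase2` and `update_phase1` are hypothetical, the chart is a plain 3-sigma chart for individual observations, and it uses only the standard library instead of NumPy.

```python
import statistics


class TwoPhaseChart:
    """Hypothetical Shewhart-style chart for individual observations."""

    def __init__(self, phase1, nsigma=3.0):
        self.phase1 = list(phase1)  # historical data used to set limits
        self.phase2 = []            # monitored data, never used for limits
        self.nsigma = nsigma
        self._fit()

    def _fit(self):
        # Limits are estimated from phase 1 data only.
        self.center = statistics.mean(self.phase1)
        self.std = statistics.stdev(self.phase1)  # sample std (ddof=1)
        self.lower = self.center - self.nsigma * self.std
        self.upper = self.center + self.nsigma * self.std

    def append_phase2(self, new_points):
        """Store new points and flag those outside the phase 1 limits."""
        new_points = list(new_points)
        self.phase2.extend(new_points)
        return [not (self.lower <= x <= self.upper) for x in new_points]

    def update_phase1(self, new_points):
        """Extend the historical data and re-estimate the limits."""
        self.phase1.extend(new_points)
        self._fit()
```

Appending to phase 2 leaves the limits untouched, while updating phase 1 changes them, so the two operations probably deserve separate methods whatever the final design looks like.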
About updating the stored data:
A variation of this will be in self-updating or continuously updating control charts, but we might not want to worry about those for now.
Another use case that I don't yet know how we should handle: using different statistics in phase 1 and phase 2. Example: I was reading/skimming some of the articles on robust estimators in phase 1. My impression (given that I did not confirm all the small print) is that they use the same robust estimators (median, trimmed mean, or MAD, IQR for variance) also in phase 2.
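As a rough illustration of why robust phase 1 estimators matter, the sketch below compares classical limits (mean and sample standard deviation) with median/MAD limits on data that contains one contaminated point. The function names are hypothetical and not part of this PR; the factor 1.4826 is the usual constant that makes the MAD a consistent estimate of the standard deviation under normality.

```python
import statistics

MAD_TO_SIGMA = 1.4826  # scales the MAD to a std estimate under normality


def classical_limits(data, nsigma=3.0):
    """Limits from the mean and sample standard deviation."""
    center = statistics.mean(data)
    sigma = statistics.stdev(data)
    return center - nsigma * sigma, center + nsigma * sigma


def robust_limits(data, nsigma=3.0):
    """Limits from the median and the scaled median absolute deviation."""
    center = statistics.median(data)
    mad = statistics.median([abs(x - center) for x in data])
    sigma = MAD_TO_SIGMA * mad
    return center - nsigma * sigma, center + nsigma * sigma


# One contaminated observation (25.0) in otherwise stable data.
data = [10.0, 10.1, 9.9, 10.2, 9.8, 10.0, 25.0]
print("classical:", classical_limits(data))
print("robust:   ", robust_limits(data))
```

With this data the outlier inflates the classical limits enough that it falls inside them, while the median/MAD limits stay tight around 10 and flag it.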
Good point. The ability to update phase 1 data is what's missing from my current implementation. And the need to remove out-of-control observations from phase 1. I'll try to add that later today.
With the way that it's currently implemented, I believe we can just add a different set of statistics in |
distr,
alpha)
self.center = self.endog.mean(0)
self.std = self.endog.std(0, ddof=1)
This needs to go into a method that can be overridden by subclasses.
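One way to read this request is sketched below: the center and scale estimates move out of the constructor into a hook method, and a subclass swaps in different statistics. The names `BaseChart`, `RobustChart` and `_estimate_params` are hypothetical and are not taken from the PR, and the standard library stands in for the NumPy calls in the diff. The divisor 1.349 is the usual factor that converts an interquartile range into a standard deviation estimate under normality.

```python
import statistics


class BaseChart:
    """Hypothetical base class with an overridable estimation hook."""

    def __init__(self, endog):
        self.endog = list(endog)
        # The constructor only delegates; subclasses change the hook.
        self.center, self.std = self._estimate_params()

    def _estimate_params(self):
        """Default estimates: sample mean and std (ddof=1)."""
        return statistics.mean(self.endog), statistics.stdev(self.endog)


class RobustChart(BaseChart):
    """Same chart, but with median and IQR-based estimates."""

    def _estimate_params(self):
        center = statistics.median(self.endog)
        q1, _, q3 = statistics.quantiles(self.endog, n=4)
        return center, (q3 - q1) / 1.349
```

The same hook would also be a natural place to plug in different statistics for phase 1 and phase 2, if that turns out to be needed.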
Skimming the code again:
code looks good overall
docstrings don't follow numpy standard and are incomplete
fitting on historical data assumes that historical data is clean (no out-of-control observations)
bump for 0.15
detail:
AFAICS, there is no method to update the historical data
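The two review points about historical data (it is assumed to be clean, and it cannot be updated) could be addressed together by a cleaning step along the following lines. This is only a sketch under assumptions: `clean_phase1` is a hypothetical helper, not part of the PR, it handles univariate individual observations, and it uses simple 3-sigma limits.

```python
import statistics


def _limits(data, nsigma):
    """Return (lower, upper) limits from the mean and sample std."""
    center = statistics.mean(data)
    sigma = statistics.stdev(data)
    return center - nsigma * sigma, center + nsigma * sigma


def clean_phase1(data, nsigma=3.0, max_iter=10):
    """Iteratively drop out-of-control points and re-estimate limits."""
    kept = list(data)
    for _ in range(max_iter):
        lower, upper = _limits(kept, nsigma)
        inside = [x for x in kept if lower <= x <= upper]
        # Stop when nothing was removed or too little data would remain.
        if len(inside) == len(kept) or len(inside) < 2:
            break
        kept = inside
    # Recompute so the returned limits always match the returned data.
    return kept, _limits(kept, nsigma)
```

Note that with very small samples a single outlier cannot exceed 3-sigma limits estimated from the same data, so a cleaning step like this needs a reasonable amount of history to be useful.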
Taking over PR #4191 to build control charts.