WIP ENH: Add stats.control_charts #5506

shangyian · 2019-02-19T00:49:25Z

Taking over this PR (#4191) to build control charts: #4191

josef-pkt · 2019-02-19T01:59:53Z

there is one unrelated failure in the test run for the last python 3.6 on travis
two ci versions fail because I didn't pep-8 and lint clean my code
e,g, at the bottom of this https://travis-ci.org/statsmodels/statsmodels/jobs/495212888

Thanks for taking over.
As I briefly mentioned, I would like to merge an intial version as soon as the basic charts are ok, and leave extra options and variations or extensions of the chart to follow-up PRs.

In my experience (especially including my own work), open ended PRs with feature creep are difficult to merge, and might stall for some time, (even though those stalled and too wide reaching PRs are often useful to figure out the general pattern and design),
And given the topic and the literature, there are a very large number of extensions that we could be doing.

codecov · 2019-02-26T11:55:17Z

Codecov Report

Merging #5506 into master will decrease coverage by 0.17%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master    #5506      +/-   ##
==========================================
- Coverage      82%   81.82%   -0.18%     
==========================================
  Files         586      587       +1     
  Lines       92398    92600     +202     
  Branches    10250    10277      +27     
==========================================
  Hits        75768    75768              
- Misses      14282    14480     +198     
- Partials     2348     2352       +4

Impacted Files	Coverage Δ
statsmodels/stats/control_chart.py	`0% <0%> (ø)`
statsmodels/sandbox/regression/treewalkerclass.py	`0% <0%> (ø)`	⬆️
statsmodels/stats/libqsturng/make_tbls.py	`0% <0%> (ø)`	⬆️
statsmodels/tsa/statespace/mlemodel.py	`86.19% <0%> (ø)`	⬆️
statsmodels/iolib/summary.py	`54.94% <0%> (ø)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update db49000...67a99e1. Read the comment docs.

codecov · 2019-02-26T11:55:17Z

Codecov Report

Merging #5506 into master will increase coverage by 0.02%.
The diff coverage is 92.65%.

@@            Coverage Diff             @@
##           master    #5506      +/-   ##
==========================================
+ Coverage   82.05%   82.07%   +0.02%     
==========================================
  Files         586      588       +2     
  Lines       92741    93027     +286     
  Branches    10281    10307      +26     
==========================================
+ Hits        76095    76353     +258     
- Misses      14290    14312      +22     
- Partials     2356     2362       +6

Impacted Files	Coverage Δ
statsmodels/stats/tests/test_control_chart.py	`100% <100%> (ø)`
statsmodels/stats/control_chart.py	`89.75% <89.75%> (ø)`
statsmodels/regression/mixed_linear_model.py	`89.46% <0%> (-0.78%)`	⬇️
statsmodels/stats/libqsturng/qsturng_.py	`89.05% <0%> (+0.99%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7a33ef0...24a7977. Read the comment docs.

shangyian · 2019-02-26T11:58:03Z

Here's a Jupyter Notebook that shows what the various control charts look like in practice: https://gist.github.com/shangyian/041128898e7a6a010f2efe336ebfdd09

The tests were all verified using two R packages: https://cran.r-project.org/web/packages/mvdalab/ and https://cran.r-project.org/web/packages/qcc/

coveralls · 2019-02-26T12:49:25Z

Coverage increased (+0.07%) to 84.617% when pulling 24a7977 on shangyian:control_charts into 7a33ef0 on statsmodels:master.

…rol_charts

shangyian · 2019-02-27T21:22:13Z

@josef-pkt let me know if the changes look okay!

I'm not sure why the AppVeyor build failed - doesn't seem related to any changes in this branch. I'll look into it.

josef-pkt · 2019-02-27T21:53:49Z

I looked through parts of your changes earlier today.
I think it looks good overall, but there are several design decisions that I'm not sure about, and need to think more or try them out.
The main design decision that surprised me is to have the out of sample, future, data in the __init__. I can see some usecases for that, but I thought the main usecase would be to add future at a later stage.
Given that you allow that, it might be fine this way.
The main change is that additing new observations to the attached data should be optional (i.e. needs a keyword.)

The second surprise was the distinction between phase1 and phase2 limits, but then I saw that that was already in my comments on multivariate mean chart.

(So, I have to think more in order to see if the design has problems or not.)

Two github comments:
You dropped my initial commit, so we don't see what the changes compared to my version are. Although, there are so many changes and moving of code, that the changeset wouldn't be very informative anyway.

Second, don't merge statsmodels master into your feature branch, that makes the github commit history a bit messy. If you need to, then rebase on master. We usually rebase a feature branch when it falls too far behind master.

shangyian · 2019-02-27T23:36:25Z

Yeah, I wasn't sure if it was necessary to have the in & out sample points in init, but I figured keeping some flexibility there would be good.

Another thing that came to mind - the phase 1 vs 2 structure is slightly confusing - technically we should be able to update both phase 1 and phase 2 with more data, but the current setup only allows for updating (appending) phase 2 with new data points. Do you think this is a use case that may prove necessary?

Sorry about the commit overwrite! I was trying to clean up a bunch of commits I made (almost all of them related to linting), but messed up the history there. I'll keep that in mind about rebasing vs merging.

josef-pkt · 2019-02-28T00:14:15Z

about updating the stored data:
I have some comments about it in my initial code or issue.

usage without updating: this is for example needed if we want to simulate different phase 2 samples and check the behavior, e.g. time to signal, assuming all start with the same phase 1 sample or estimated parameters
We want to update the phase 2 sample if we do online monitoring and want to add new observations to an existing plot, or more precisely make a new plot with the an extended sample - this is what you have right now, AFAICS
We want to update the phase 1 sample if we want to re-estimate the parameters with a sample that has newly arrived. - this is what my update method does.
Because we are back to phase 1, this requires that we handle out-of-control observations e.g. by only adding "clean" new observations (that are supposed to be in-control)

A variation of this will be in self-updating or continuously updating control charts, but we might not want to worry about those for now.

josef-pkt · 2019-02-28T02:12:18Z

another use case that I don't know yet how we should handle it: using different statistics in phase 1 and phase 2.
My guess is that the best would be to work with two separate chart instances for phase 1 and phase 1.

example: I was reading/skimming some of the articles on robust estimators in phase 1. My impression (given that I did not confirm all the small print) is that they use the same robust estimators (median, trimmed mean, or mad, iqr for variance) also in phase 2.
However, I don't think that is necessary, i.e. we can take a robust estimator for mean and variance in phase 1, but use a standard mean, variance chart in phase 2 using the robust parameter estimates.
For standard charts all we need are the upp and low limits, so we can simulate phase 2 cases by just taking some (robust) estimates from phase 1 as given.

shangyian · 2019-02-28T13:15:01Z

We want to update the phase 1 sample if we want to re-estimate the parameters with a sample that has newly arrived. - this is what my update method does.
Because we are back to phase 1, this requires that we handle out-of-control observations e.g. by only adding "clean" new observations (that are supposed to be in-control)

Good point. The ability to update phase 1 data is what's missing from my current implementation. And the need to remove out-of-control observations from phase 1. I'll try to add that later today.

another use case that I don't know yet how we should handle it: using different statistics in phase 1 and phase 2.

With the way that it's currently implemented, I believe we can just add a different set of statistics in _ControlChartDataHolder, similar to how there are currently different upper and lower boundaries depending on the phase. I think we can postpone the implementation of features to support online updates for control charts to a different PR.

josef-pkt · 2022-11-26T12:59:06Z

statsmodels/stats/control_chart.py

+                                                  distr,
+                                                  alpha)
+        self.center = self.endog.mean(0)
+        self.std = self.endog.std(0, ddof=1)


this needs to go into a method that can be overwritten by subclasses

josef-pkt · 2022-11-26T13:17:08Z

skimming the code again:

code looks good overall, docstrings don't follow numpy standard and are incomplete
We need attribute chart(s), Poisson, Binomial, as subclasses to see whether we need changes for inheritance in base class and/or mean class. (#8533 new design issue)
Also, xbar/batches chart as subclass.
Attribute and xbar/batches chart will need nobs/exposure for each observation as extra argument. Do we use WLS, weighted statistics for xbar?
(I need to run more examples to see how the details like updating works.

fitting on historical data assumes that historical data is clean (no out of control observations)
I guess we can add a fit method to update the statistics.
Class is and will remain stateful and include "model" and "results" info. So, fit could update the estimate on historical data using more options for robust fit.
I did not compare details with my old PR.

bump for 0.15
I think we will be able to merge it with only smaller changes.

detail:
rename out -> outoc or to something similar, outside ? for out-of-control candidates
"out" sounds too much like out array in numpy (name also in my old notebook)
Those are just signaled as out-of-control, they are not observations verified as out-of-control

josef-pkt · 2022-11-26T13:51:57Z

AFAICS, there is no method to update the historical data
update and concat_data_and_index is only used for future data

shangyian mentioned this pull request Feb 19, 2019

ENH: Control Charts #4191

Open

josef-pkt added type-enh comp-stats labels Feb 19, 2019

josef-pkt added this to the 0.11 milestone Feb 19, 2019

Adding control charts basics

6f0af90

STY: Improving comments. Fixing lint and codacy issues.

de0fde3

shangyian force-pushed the control_charts branch from 8b698c0 to de0fde3 Compare February 27, 2019 14:14

shangyian added 2 commits February 27, 2019 16:21

Merge branch 'master' of github.com:statsmodels/statsmodels into cont…

747412c

…rol_charts

Fixing lint errors

24a7977

josef-pkt modified the milestones: 0.11, josef Oct 5, 2019

josef-pkt mentioned this pull request May 6, 2020

SUMM: organizing basic multivariate stats #6689

Open

josef-pkt added this to review/finalize in comp-stats-hypo May 10, 2020

josef-pkt reviewed Nov 26, 2022

View reviewed changes

josef-pkt modified the milestones: josef, 0.15 Nov 26, 2022

josef-pkt mentioned this pull request Nov 26, 2022

SUMM: roadmap for 0.15 josef #8217

Open

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP ENH: Add stats.control_charts #5506

WIP ENH: Add stats.control_charts #5506

shangyian commented Feb 19, 2019

josef-pkt commented Feb 19, 2019

codecov bot commented Feb 26, 2019

codecov bot commented Feb 26, 2019 •

edited

shangyian commented Feb 26, 2019

coveralls commented Feb 26, 2019 •

edited

shangyian commented Feb 27, 2019

josef-pkt commented Feb 27, 2019

shangyian commented Feb 27, 2019

josef-pkt commented Feb 28, 2019

josef-pkt commented Feb 28, 2019 •

edited

shangyian commented Feb 28, 2019

josef-pkt Nov 26, 2022

josef-pkt commented Nov 26, 2022 •

edited

josef-pkt commented Nov 26, 2022

WIP ENH: Add stats.control_charts #5506

Are you sure you want to change the base?

WIP ENH: Add stats.control_charts #5506

Conversation

shangyian commented Feb 19, 2019

josef-pkt commented Feb 19, 2019

codecov bot commented Feb 26, 2019

Codecov Report

codecov bot commented Feb 26, 2019 • edited

Codecov Report

shangyian commented Feb 26, 2019

coveralls commented Feb 26, 2019 • edited

shangyian commented Feb 27, 2019

josef-pkt commented Feb 27, 2019

shangyian commented Feb 27, 2019

josef-pkt commented Feb 28, 2019

josef-pkt commented Feb 28, 2019 • edited

shangyian commented Feb 28, 2019

josef-pkt Nov 26, 2022

Choose a reason for hiding this comment

josef-pkt commented Nov 26, 2022 • edited

josef-pkt commented Nov 26, 2022

codecov bot commented Feb 26, 2019 •

edited

coveralls commented Feb 26, 2019 •

edited

josef-pkt commented Feb 28, 2019 •

edited

josef-pkt commented Nov 26, 2022 •

edited