Add nbconvert script to make the example files into .py files #65

Merged
tms-bananaquit merged 18 commits into dev from 64-nbconvert-scripts
Jul 11, 2022


@tms-bananaquit tms-bananaquit commented Jul 8, 2022

Fixes #64. Adds docs/source/examples/convert_notebooks.py to convert the notebooks to .py scripts in the right directory.

The intent for this one is just to add a script that converts everything in /docs/source/examples into .py files in /examples/, so that we don't have to do double-maintenance on them, as has already come up in the last two PRs.

  • Resolve the RTD issue
  • Markdown cells get thrown out. Can we include an nbconvert preprocessor so that the raw markdown is included as comments?
    • Nope.
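
As a rough illustration of the conversion described above, here is a minimal sketch that does not use nbconvert at all. It reads each notebook's JSON directly with the standard library and writes the code cells to a .py file, keeping markdown cells as comments. This is a swapped-in technique and not the script added in this PR; the function names `notebook_to_script` and `convert_notebooks` and their parameters are hypothetical.

```python
import json
from pathlib import Path


def notebook_to_script(nb_path):
    """Return a notebook's cells as one script string.

    Code cells are copied as-is; markdown cells are kept as '# ' comments.
    """
    nb = json.loads(Path(nb_path).read_text(encoding="utf-8"))
    chunks = []
    for cell in nb.get("cells", []):
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        if not source.strip():
            continue  # skip empty cells
        if cell.get("cell_type") == "code":
            chunks.append(source.rstrip())
        elif cell.get("cell_type") == "markdown":
            commented = (("# " + line).rstrip() for line in source.splitlines())
            chunks.append("\n".join(commented))
    return "\n\n".join(chunks) + "\n"


def convert_notebooks(src_dir, dst_dir):
    """Convert every .ipynb under src_dir into a .py file under dst_dir."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    written = []
    for nb_path in sorted(src_dir.rglob("*.ipynb")):
        if ".ipynb_checkpoints" in nb_path.parts:
            continue  # ignore Jupyter autosave copies
        # Mirror the notebook's subdirectory layout in the output directory.
        out_path = dst_dir / nb_path.relative_to(src_dir).with_suffix(".py")
        out_path.parent.mkdir(parents=True, exist_ok=True)
        out_path.write_text(notebook_to_script(nb_path), encoding="utf-8")
        written.append(out_path)
    return written
```

Called as `convert_notebooks("docs/source/examples", "examples")`, this would mirror the notebook tree into the examples directory. Cells containing IPython magics would not be valid Python in the output, which is one reason to prefer nbconvert for real use.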

@tms-bananaquit
Collaborator Author

I added a function that finds the location of the Circle data for the examples, to resolve the problem where the notebooks' and the .py files' relative directories differ. It works by finding the root of the git repo, so it runs fine locally, since that's where I run things from. It fails on RTD, though, when nbsphinx tries to do the conversion out of its temporary directory:

FileNotFoundError: [Errno 2] No such file or directory: '/home/docs/checkouts/readthedocs.org/user_builds/menelaus/checkouts/64-nbconvert-scripts/docs/source/examples/change_detection/src/menelaus/datasets/dataCircleGSev3Sp3Train.csv'

@Anmol-Srivastava Did you ever resolve the bit with the absolute paths before we switched away from it as a solution?
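
For readers unfamiliar with the approach, locating the repository root usually means walking up from a starting directory until a `.git` entry appears. The sketch below illustrates that idea under stated assumptions and is not the code merged in this PR. The name `find_git_root` comes from the review discussion; its signature and the `start` parameter are hypothetical.

```python
from pathlib import Path


def find_git_root(start=None):
    """Return the nearest ancestor of `start` (or the cwd) that contains .git."""
    current = Path(start or Path.cwd()).resolve()
    for candidate in (current, *current.parents):
        # .git is a directory in a normal clone and a file in a worktree,
        # so test for existence rather than is_dir().
        if (candidate / ".git").exists():
            return candidate
    raise FileNotFoundError(f"no git repository found at or above {current}")
```

With a helper like this, the data file from the traceback would be addressed as `find_git_root() / "src" / "menelaus" / "datasets" / "dataCircleGSev3Sp3Train.csv"`. The result still depends on the starting directory being inside the checkout, so a build that runs from a directory outside it would need a different anchor.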

@tms-bananaquit
Collaborator Author

tms-bananaquit commented Jul 8, 2022

I don't think this is going to work until we get a version of the streaming data that's generated live, rather than read from a file.

Of course, we had it working on dev, so I could implement a hideous switch statement that just tries the paths that we actually use until one of them exists. This is not a good solution, but we can use it in the meantime to avoid having to make these tweaks repeatedly in two places.

As it turns out, my original solution did work, apart from a fat-fingered syntax error. Switching back resolves both the local runs and RTD. 🙃
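
The stopgap mentioned above, trying each known path until one exists, could look something like the following. It is only a sketch of the idea, since the PR ultimately did not need it; the function name `first_existing_path` and the candidate list are hypothetical.

```python
from pathlib import Path


def first_existing_path(candidates):
    """Return the first candidate that points at an existing file."""
    tried = []
    for candidate in candidates:
        path = Path(candidate)
        if path.is_file():
            return path
        tried.append(str(path))
    # Listing every attempted path makes the failure easier to debug.
    raise FileNotFoundError("none of these paths exist: " + ", ".join(tried))
```

The drawback noted in the comment still applies: the list of candidates has to be kept in sync by hand with every directory the examples are run from.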

@tms-bananaquit tms-bananaquit marked this pull request as ready for review July 8, 2022 22:02
Contributor

@Anmol-Srivastava Anmol-Srivastava left a comment


Minor optional comments, and one point about the find_git_root location.

README.md (resolved)
docs/source/examples/convert_notebooks.py (outdated, resolved)
src/menelaus/datasets/make_example_data.py (outdated, resolved)
tests/menelaus/test_example_data.py (outdated, resolved)
Contributor

@Anmol-Srivastava Anmol-Srivastava left a comment


One last thing: I think _locate also ended up in /src/menelaus somehow, on top of being in utils.

@tms-bananaquit tms-bananaquit merged commit 9780d55 into dev Jul 11, 2022
@tms-bananaquit tms-bananaquit deleted the 64-nbconvert-scripts branch July 11, 2022 14:54
tms-bananaquit added a commit that referenced this pull request Dec 8, 2022
* Bring main and dev in sync for new GH workflow

* Sync main and dev

* Enable linting in pipeline (#34)

configure and run flake8

Co-authored-by: Thomas Schill

* 32 enhance sphinx bibtext (#35)

* fix typo

* add refs.bib, update citations

Co-authored-by: Thomas Schill 

* Update README.rst

* Update setup.cfg

* Update setup.cfg

* Update setup.cfg

* Remove wheel dependency.

Including it didn't speed up the pipeline or squash the complaint from sklearn.

* kdqTree: Added compatibility with pandas dataframes (#40)

Transferred changes for issue #133 on kdqtree pandas from GL to GH

Co-authored-by: Shashank Jarmale

* Merge unit tests for example materials (#43)

* add simple copy of main yml towards #8

* add environment variable for script tester

* add script tester towards #8

* limit job to example scripts

* use abspath for executing from tests/

* split src / example tests, reorganize test directories

* revert to simpler file refs, edit workflow to cd into correct dir

* fix typo in job trigger

* add notebook tester towards #8

* add venv/kernel steps to allow for nbconvert tests

* fix typo towards !43

* clean up lines to close !43

* Add separate coverage test, workflow improvements (#44)

* move coverage to separate workflow, fail under 100

* revert to 1 combined job for test + cov, fail under 99 (#5)

* remove tmp branch from workflow

* separate lint workflow, add isort

* rework black step, remove isort

* fix black version

* run black, update README towards #44

* add badges to readme

* fix badge section

* Update .github/workflows/tests.yml

* Update README.rst

* Update README.rst

* Update README.rst

* Update README.rst

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* Add unit tests for kdqTree (#47)

* Adds more unit tests for kdqTree

* add validation unit tests

* Add a unit test for KDQTreePartitioner

* add reset to set_reference

* use ref_data variable properly when drift occurs

* Update hdm docs (#50)

* updated docs

* closes issue 33

* apply black formatting

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* Update README_dataCircleGSev3Sp3Train.txt (#52)

* Adds notebook versions of the examples to RTD. (#53)

* Started testing out python example notebooks with sphinx

* update conf.py to enable pandoc, move examples around

* Added example notebooks for data drift detectors

* Added example notebooks for modules

* remove extra notebooks, fix plotly plots

Co-authored-by: Shashank Jarmale

* add tox to dev install

* Fix kdq-tree batch example in documentation example notebook (#54)

Duplicating the examples for the purpose of the documentation got us an older version pulled forward that I didn't catch during review of the PR.

* Minor updates to constrain Python version used in installation (#55)

* add tox python version checks towards #2

* Update .gitignore

* fix syntax

* add version notes

* remove older versions from tox

* Update README to include pyenv steps, towards #2

* remove pyenv section

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* Merge new data module and reorganize data files (#56)

* add (untested) .python that duplicates make_example_data.R

* add TODO items

* reorganize tools => datasets towards #38

* further reorganize datasets module, add DataGenerator idea

* split DataGenerator idea, and fix bugs in make_example_batch_data

* update any example_data.csv script to now use function

* consolidate dataset descriptions into one README

* debug make_example_data

* delete outdated data files towards #38

* remove TODO comment

* satisfy formatting requirements

* add unit tests and comment out untested code

* comment out missing code, add single-line description towards #38

* minor formatting changes to trigger checks

* debug unit tests, re-satisfy formatting requirements

* update references in docs notebooks, add generator docstring
 also fixes some whitespace in cdbd.py

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* Merge new streaming, batch ABCs and refactor KdqTree detector (#62)

* separate into streaming and batch detector ABCs (#46)

* split kdqtree into streaming/batch versions, update tests

* finish batch version of kdqtree

* begin using multiple inheritance scheme for kdqtree detectors (#46)

* establish commonly inherited functionality in new KdqTreeDetector class

* establish commonly inherited functionality in new KdqTree detector class (#46)

* deconstruct update to enable code reuse in KdqTreeDetector (#46)

* debug all failing tests in test_kdqtree (#46)

* update __init__, update refs in examples (#46)

* update outdated data refs

* add any missing docstrings (#46)

* format with black

* add unit test for new ABC drift setters

* updated the data_drift_examples notebook

* docstring formatting tweaks

* fix typo

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* fix formatting in docstring

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* fix formatting in docstring

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* fix typo

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* fix description in docstring

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* formatting

* remove double-documented attributes from docstring

* provide useful information in child docstrings

* move _drift_counter into KdqTreeStreaming

* delete coverage file

* toss ref data once processed

* format with black

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* Switch to README.md for better rendering on github (#49)

* switch to README.md for better rendering on github
 - removes reference links from table
 - adds placeholder mermaid flow diagram
 - makes some tweaks to the README text

* update requirements in setup.cfg

* test mermaid rendering

* Add "Choosing a Detector" page to TOC

* tweak README text

* add RTD hyperlink

* Merge with current dev

* remove draft flow diagram

* Add CHANGELOG and pypi actions for release.  (#51)

* add CHANGELOG, yaml

* add Action to push to pypi upon published release

* change name of workflow

* test adding security linter

* test bandit linting

* comments

* alphabetize setup.cfg.test

* increment version number

* change lint badge name

* address comments for main-dev PR 63

* Add nbconvert script to make the example files into .py files (#65)

* add a function to fetch the circle data from whatever working dir

* add a script to generate .py files from .ipynb docs/source/examples

* add coverage test

* add unit tests

* update README

* move find_git_root to utils/_locate, tweak formatting

* Add template for benchmarks directory (#69)

* setup benchmarks materials (#58)

* add details to benchmarking materials (#58)

* Implemented Margin Density Drift Detection (MD3) Method (#60)

* Created md3 class and started to build out basic detector methods

* More development on MD3 implementation

Added update and marginal inclusion signal calculation capabilities

* MD3 Implementation Progress

Added ability to issue drift warning based on change in margin density
relative to that of reference distribution.
Next step is to implement system to collect labeled samples to confirm
that drift is occurring (or rule out drift).
Another next step is to work on building out example(s) of MD3, with SVM
specifically for now.

* Wrapped some calculation lines for clarity

* Finished preliminary MD3 implementation

Next step is to complete a full working example for MD3 using an SVM.
Step after that is to address TODOs in implementation (add compatibility
with other types of models, make all user-facing methods intuitive and
clear, etc.).

* Completed MD3 implementation with oracle labels and retraining

Example finished for the most part, probably some debugging to do

* Got MD3 example working

But currently is not actually detecting drift. Is tracking margin
density over the stream correctly, but no drift warnings/detections are
being issued. Need to play around with sensitivity threshold for
suspecting/confirming drift to see if that's the issue.

* MD3 Implementation working + example working

* MD3 implementation and example working

Still some debugging and cleanup to do. Also have to play around with
sensitivity parameter to see what a good default value would be.

* Continued updating MD3 implementation and example

Few design questions to answer regarding when MD3 warns/resets internal
parameters based on drift confirmation

* Moved retraining data green lines to be in the right place

start right after warning

* Removed some TODOs

* Reverted concept drift example back to original version

* Resolved some TODOs from PR draft

* Changed dataset for MD3 example to new rainfall dataset from India

* Added unit tests for MD3, added MD3 to README

* Will have to reorganize MD3 example before merging PR

* Reformatted md3.py with black

* Increased test coverage, reformatted md3.py

* Finalized md3 unit tests

* Finished some TODOs in MD3 implementation

* Added tests for fetch_rainfall_data and formatted make_example_data.py

* This commit contains:

the completed md3_example.py script, and the example has been added to
the concept_drift_examples.ipynb example notebook. md3_example.py will
be deleted in the next commit.

* Deleted md3_example.py (in concept drift example notebook)

* Regenerated example scripts from example notebooks

* Removed TODO from md3.py

* Update citation

Co-authored-by: Shashank Jarmale
Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* 68 remove changelog workflow (#71)

* drop the changelog yaml

* tweaks to docstrings

* update documentation for make_example_batch_data

* add docstring for fetch_circle_data

* update refs.bib

* Add citation and description to rainfall data (#73)

* tweaks to md3, kdq_tree docstrings

* remove "example" from notebook subheadings

* remove old datasets README

* closes issue #72

* and-delimited authors in citation; fix in-line cite

* fix some sphinx build errors

* add returns to docstrings

Co-authored-by: indialindsay <68126147+indialindsay@users.noreply.github.com>

* fix a typo preventing doc build

* Refactor inputs to update and set_reference functions (#75)

* refactor ABCs to have common signature in update (#15)

* refactor change detectors to have common signature in update (#15)

* refactor data drift detectors to have common signature in update (#15)

* remove obs_id from PageHinkley

* refactor concept drift detectors to have common update signature (#15)

* refactor batch detectors to have common set_reference signature (#15)

* update examples, README, ADWIN with function signature changes (#15)

* modify adwin unit tests for new update sig

* add int cast to ADWIN input check

* fix formatting with black

* fix outdated update function in stepd

* debug MD3, modify docstring for find_git_root

* improve formatting and test coverage

* update example notebooks, regenerate scripts

* make sure README example is functional

* resolve minor issue in PCACD

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* Reorganize tests directory to mirror src/ structure (#79)

reorganize tests directory (#78)

* Make concept_drift.ADWIN a child of change_detection.ADWIN (#83)

* make concept_drift.ADWIN a child of change_detection.ADWIN

* increment version, update eddm docstring

* rename new ADWIN, add unit test

* update README ADWIN example

* remove input_type

* add redundant docstrings, unit tests for sphinx

* fix import statement

* move convert_notebooks.py to utils

* move lfr.round_val to init

* add change_detection.ADWIN example

* update nb description

* Revert "move convert_notebooks.py to utils"

This reverts commit fe05b78.

* add original accuracy calculation to ADWIN example

* regenerate .py examples

* update rainfall unit test

* tweaks to examples

* rename new class

* 84 remove redundant docstrings (#85)

* make garbage input for unit test more garbagey

* add :inherited-members: option to docs

* update README

* remove redundant unit tests for cdbd, hdddm

* remove drift_state from child signatures

* remove drift_state from adwin_outcome

* make the other class attribs into properties for consistency's sake

* more docstring cleanup

* added groupwise member order to conf.py

* Add accuracy calculations to the example notebook plots for concept drift detectors (#88)

* add running accuracy to concept drift examples

* Draft: Add Ensemble for Batch Detectors (#90)

* begin sketching BatchEnsembler (#77)

* sketch Ensembler, BatchEnsembler using toolz.pipe

* add more scratch work re: ensemblers and pipelines (#77)

* significantly simplify batch ensembling, add data robustness (#77)

* finish getting ensembler to execute set_ref/update for all batch dets (#77)

* add simple majority evaluator, make ensembler fully functional (#77)

* cleanup for PR !90, begin adding tests

* add more unit tests for batch ensemble

* add docs for new ensemble code

* formatting with black/bandit

* Replace notebook examples workflow (#92)

* update workflow yml

* update docstrings for #91

* Add validation to StreamingDetector (#95)

* add y-validation

* add X-validation

* kdq_tree tweaks

* switch concept drift detectors to StreamingDetector

* switch change detectors to StreamingDetector

* switch PCA-CD to StreamingDetector

* 96 batch validation (#97)

* add batch validation

* remove redundant validation

* switch to deepcopy for set_reference

* Add streaming ensemble (#101)

* add initial draft of StreamingEnsemble, move set_reference into BatchEnsemble

* make stream ensemble run for every data/concept detector except MD3 (#89)

* add minimum-approval and confirmed-approval evaluators (#89)

* temporary fix for univariate detectors (#89)

* add unit tests, begin debugging KdqTree in batch case (#89)

* debug kdqtree batch ensemble unit test

* fix and test confirmed approval evaluator

* evaluators are now just functions

* evaluators can now be str or function

* Revert "evaluators can now be str or function"

This reverts commit d4d45de.

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* 99 ensemble quickstart (#102)

* sketch new readme examples (#99)

* less wordy version of quickstart steps (#99)

* reduce lines

* further cleanup

* spacing?

* wording changes

* asserts instead of print (downside: example exits with error)

* Update README.md

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* changed references for phtest and cusum (#108)

* Reshape input as part of validation. (#107)

* add validation to concept drift, change detectors, data drift detectors
* new unit tests
* update docstrings

* Change column specification to column selectors (#109)

* initial change to column selectors (#104)

* intermediate push before merging

* debug input issues

* security check - remove assert

* Rename ADWIN concept drift detector (#111)

* rename ADWINOutcome to ADWINAccuracy

* update coverage badge text

* docstring tweaks

* Add Maciel Election, refactor other elections (#113)

* initial switch to evaluator class (#100)

* debug new evaluators

* emergency commit for viewing

* add maciel election; change evaluator to election (#100)

* det list now passed by ensemble to election call (#100)

* add passing tests (#100)

* formatting, coverage fixes

* edit list comprehension for conciseness

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* changes to satisfy PR comments

* remove question comments

* add citations

* add a bullet on ensembles to README

* rename tests

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* rename tests

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* rename maciel tests

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* rename maciel tests

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* update ensemble table portion

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* update ensemble table portion

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* rename maciel tests

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* rename maciel tests

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* rename min approval tests

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* rename min approval tests

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* Update README.md

* Finish ensemble documentation (#114)

* cannot test, but minor adjustments to ensemble docs, setup files

* add ensemble examples

* docstring fixes, add notebook to index.rst

* format with black

* better example in notebook

* fix missing booktitle in Maciel

* tweaked some narration in the example

* add comment for sphinx freeze

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* shorten wording in docstring

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* formatting for sphinx

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* first pass at addressing remaining PR comments

* final formatting changes to docstrings

* directory change

* use better file-finding method in make_example_data

* Update src/menelaus/ensemble/ensemble.py

* Update src/menelaus/ensemble/ensemble.py

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* 105 rename src (#116)

* move /menelaus/ up a directory, remove /src/

* rename drift_detector.py to detector.py

* remove erroneous pytest call from make_example_data

* add references page to index

* move deprecation note to DriftDetector docstring

* README tweak

* 118 dummy out validation (#119)

* pass None instead of X where appropriate

* dummy out unused args for change detector validation

* dummy out validation in drift detectors

* black formatting

* tweak narration in ensemble notebook

* fix typo, comments

* Add fixed NNDVI  (#124)

* Create NNSpacePartitioner.py

* Add NNSP to partitioner __init__.py

* Create nndvi.py to set up debugging in #36

* add nndvi example to debug #36

* potential fix for build and dissimilarity

* change drift threshold computation
- the current implementation uses a random permutation for both new sets - which means they are not mutually exclusive. This means the numerator in compute_nnps_distance can contain 0's.
- making the second set the reverse of the first fixes this.

* change NNDVI to BatchDetector

* add nndvi unit tests

* add unit tests for NNSP

* formatting with black

* add sklearn to cfg

* remove assert in nndvi for security

* formatting with black

* add a few comments

* modify scikit-learn import for pipeline

* add nndvi tests for better coverage

* remove warning statement in outdated if clause

* == to = fixes coverage bug

* formatting with black

* validation / other PR changes

* fold nndvi example into notebook

* replace length calculation, comments

* tweaks to example notebook

* add comments to NN-DVI example

Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com>

* update CHANGELOG

* remove benchmarks folder

Co-authored-by: Shashank Jarmale 
Co-authored-by: Anmol Srivastava
Co-authored-by: India Lindsay <68126147+indialindsay@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Add nbconvert script to convert the docs notebooks to .py scripts
2 participants