Add nbconvert script to make the example files into .py files #65

tms-bananaquit · 2022-07-08T20:07:01Z

Fixes #64. Adds docs/source/examples/convert_notebooks.py, which converts the notebooks to .py scripts in the right directory.

The intent for this one is just to add a script that converts everything in /docs/source/examples into .py files in /examples/, so that we don't have to do double-maintenance on them, as has already come up in the last two PRs.
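For readers who want a picture of what such a conversion script can look like, here is a minimal sketch. It is not the contents of convert_notebooks.py. It assumes nbconvert's `PythonExporter` is used and that the output tree mirrors the notebook tree. The function names `script_path_for` and `convert_notebooks` are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def script_path_for(notebook: Path, src_root: Path, dst_root: Path) -> Path:
    """Map a notebook under src_root to the mirrored .py path under dst_root."""
    return dst_root / notebook.relative_to(src_root).with_suffix(".py")


def convert_notebooks(src_root: Path, dst_root: Path) -> list[Path]:
    """Convert every .ipynb under src_root into a .py script under dst_root."""
    # Imported lazily so the path helper above works without nbconvert installed.
    from nbconvert import PythonExporter

    exporter = PythonExporter()
    written = []
    for notebook in sorted(src_root.rglob("*.ipynb")):
        # Skip Jupyter's autosave copies.
        if ".ipynb_checkpoints" in notebook.parts:
            continue
        body, _resources = exporter.from_filename(str(notebook))
        out = script_path_for(notebook, src_root, dst_root)
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(body, encoding="utf-8")
        written.append(out)
    return written
```

Under these assumptions, calling `convert_notebooks(Path("docs/source/examples"), Path("examples"))` from the repository root would regenerate the scripts, so only the notebooks need to be edited by hand.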

Resolve the RTD issue
~~Markdown cells get thrown out. Can we include an nbconvert preprocessor s.t. the raw markdown is included as comments?~~
- Nope.

tms-bananaquit · 2022-07-08T20:32:22Z

I added a function that finds the location of the Circle data for the examples, to resolve the problem where the notebooks' and the .py files' relative directories differ. It works by finding the root of the git repo, so it works fine locally, since that's where I run from. Of course, that fails on RTD, where nbsphinx runs the conversion out of its temporary directory:

FileNotFoundError: [Errno 2] No such file or directory: '/home/docs/checkouts/readthedocs.org/user_builds/menelaus/checkouts/64-nbconvert-scripts/docs/source/examples/change_detection/src/menelaus/datasets/dataCircleGSev3Sp3Train.csv'

@Anmol-Srivastava Did you ever resolve the bit with the absolute paths before we switched away from it as a solution?
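As an illustration of the "find the root of the git repo" idea, one common approach is to walk up from a starting directory until a `.git` entry appears. This is a sketch under that assumption and may differ from the `find_git_root` in this PR. The signature and the error message are made up for the example.

```python
from pathlib import Path
from typing import Optional


def find_git_root(start: Optional[Path] = None) -> Path:
    """Return the nearest ancestor of start (or the cwd) that contains .git."""
    current = (start or Path.cwd()).resolve()
    # Check the starting directory first, then each parent up to the filesystem root.
    for candidate in (current, *current.parents):
        if (candidate / ".git").exists():
            return candidate
    raise FileNotFoundError(f"No .git directory found above {current}")
```

A data path can then be built relative to the returned root, independent of whether the caller is a notebook or a generated script, as long as the working directory is somewhere inside the checkout.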

tms-bananaquit · 2022-07-08T20:56:43Z

~~I don't think this is going to work until we get a version of the streaming data that's generated live, rather than read from a file.~~

Of course, we had it working on dev, so I could implement a hideous switch statement that just tries the paths that we actually use until one of them exists. This is not a good solution, but we can use it in the meantime to avoid having to make these tweaks repeatedly in two places.

As it turns out, my original solution did work, apart from a fat-fingered bit of syntax. Switching back fixes both the local runs and RTD. 🙃
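The stopgap mentioned above, trying each known path until one exists, could be sketched roughly as follows. This is illustrative only: the helper name `first_existing` is hypothetical and the candidate paths are left to the caller rather than guessed here.

```python
from pathlib import Path
from typing import Iterable


def first_existing(candidates: Iterable[Path]) -> Path:
    """Return the first candidate that is an existing file."""
    tried = []
    for candidate in candidates:
        if candidate.is_file():
            return candidate
        tried.append(str(candidate))
    # Report everything that was attempted to make failures easier to debug.
    raise FileNotFoundError("None of the candidate paths exist: " + ", ".join(tried))
```

The obvious drawback, as noted, is that the list of candidates has to be maintained by hand whenever the layout changes.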

Anmol-Srivastava

Minor optional comments, and one point about the find_git_root location.

README.md

docs/source/examples/convert_notebooks.py

src/menelaus/datasets/make_example_data.py

tests/menelaus/test_example_data.py

Anmol-Srivastava

One last thing: I think _locate.py also ended up in /src/menelaus somehow, on top of being in utils.

* Bring main and dev in sync for new GH workflow * Sync main and dev * Enable linting in pipeline (#34) configure and run flake8 Co-authored-by: Thomas Schill * 32 enhance sphinx bibtext (#35) * fix typo * add refs.bib, update citations Co-authored-by: Thomas Schill * Update README.rst * Update setup.cfg * Update setup.cfg * Update setup.cfg * Remove wheel dependency. Including it didn't speed up the pipeline or squash the complaint from sklearn. * kdqTree: Added compatibility with pandas dataframes (#40) Transferred changes for issue #133 on kdqtree pandas from GL to GH Co-authored-by: Shashank Jarmale * Merge unit tests for example materials (#43) * add simple copy of main yml towards #8 * add environment variable for script tester * add script tester towards #8 * limit job to example scripts * use abspath for executing from tests/ * split src / example tests, reorganize test directories * revert to simpler file refs, edit workflow to cd into correct dir * fix typo in job trigger * add notebook tester towards #8 * add venv/kernel steps to allow for nbconvert tests * fix typo towards !43 * clean up lines to close !43 * Add separate coverage test, workflow improvemets (#44) * move coverage to separate workflow, fail under 100 * revert to 1 combined job for test + cov, fail under 99 (#5) * remove tmp branch from workflow * separate lint workflow, add isort * rework black step, remove isort * fix black version * run black, update README towards #44 * add badges to readme * fix badge section * Update .github/workflows/tests.yml * Update README.rst * Update README.rst * Update README.rst * Update README.rst Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * Add unit tests for kdqTree (#47) * Adds more unit tests for kdqTree * add validation unit tests * Add a unit test for KDQTreePartitioner * add reset to set_reference * use ref_data variable properly when drift occurs * Update hdm docs (#50) * updated docs * closes issue 33 * apply 
black formatting Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * Update README_dataCircleGSev3Sp3Train.txt (#52) * Adds notebook versions of the examples to RTD. (#53) * Started testing out python example notebooks with sphinx * update conf.py to enable pandoc, move examples around * Added example notebooks for data drift detectors * Added example notebooks for modules * remove extra notebooks, fix plotly plots Co-authored-by: Shashank Jarmale Co-authored-by: Shashank Jarmale * add tox to dev install * Fix kdq-tree batch example in documentation example notebook (#54) Duplicating the examples for the purpose of the documentation got us an older version pulled forward that I didn't catch during review of the PR. * Minor updates to constrain Python version used in installation (#55) * add tox python version checks towards #2 * Update .gitignore * fix syntax * add version notes * remove older versions from tox * Update README to include pyenv steps, towards #2 * remove pyenv section Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * Merge new data module and reorganize data files (#56) * add (untested) .python that duplicates make_example_data.R * add TODO items * reorganize tools => datasets towards #38 * further reorganize datasets module, add DataGenerator idea * split DataGenerator idea, and fix bugs in make_example_batch_data * update any example_data.csv script to now use function * consolidate dataset descriptions into one README * debug make_example_data * delete outdated data files towards #38 * remove TODO comment * satisfy formatting requirements * add unit tests and comment out untested code * comment out missing code, add single-line description towards #38 * minor formatting changes to trigger checks * debug unit tests, re-satisfy formatting requirements * update references in docs notebooks, add generator docstring also fixes some whitespace in cdbd.py Co-authored-by: Thomas Schill 
<33845624+tms-bananaquit@users.noreply.github.com> * Merge new streaming, batch ABCs and refactor KdqTree detector (#62) * separate into streaming and batch detector ABCs (#46) * split kdqtree into streaming/batch versions, update tests * finish batch version of kdqtree * begin using multiple inheritance scheme for kdqtree detectors (#46) * establish commonly inherited functionality in new KdqTreeDetector class * establish commonly inherited functionality in new KdqTree detector class (#46) * deconstruct update to enable code reuse in KdqTreeDetector (#46) * debug all failing tests in test_kdqtree (#46) * update __init__, update refs in examples (#46) * update outdated data refs * add any missing docstrings (#46) * format with black * add unit test for new ABC drift setters * updated the data_drift_examples notebook * docstring formatting tweaks * fix typo Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * fix formatting in docstring Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * fix formatting in docstring Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * fix typo Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * fix description in docstring Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * formatting * remove double-documented attributes from docstring * provide useful information in child docstrings * move _drift_counter into KdqTreeStreaming * delete coverage file * toss ref data once processed * format with black Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * Switch to README.md for better rendering on github (#49) * switch to README.md for better rendering on github - removes reference links from table - adds placeholder mermaid flow diagram - makes some tweaks to the README text * update requirements in setup.cfg * test mermaid rendering * Add "Choosing a 
Detector" page to TOC * tweak README text * add RTD hyperlink * Merge with current dev * remove draft flow diagram * Add CHANGELOG and pypi actions for release. (#51) * add CHANGELOG, yaml * add Action to push to pypi upon published release * change name of workflow * test adding security linter * test bandit linting * comments * alphabetize setup.cfg.test * increment version number * change lint badge name * address comments for main-dev PR 63 * Add nbconvert script to make the example files into .py files (#65) * add a function to fetch the circle data from whatever working dir * add a script to generate .py files from .ipynb docs/source/examples * add coverage test * add unit tests * update README * move find_git_root to utils/_locate, tweak formatting * Add template for benchmarks directory (#69) * setup benchmarks materials (#58) * add details to benchmarking materials (#58) * Implemented Margin Density Drift Detection (MD3) Method (#60) * Created md3 class and started to build out basic detector methods * More development on MD3 implementation Added update and marginal inclusion signal calculation capabilities * MD3 Implementation Progress Added ability to issue drift warning based on change in margin density relative to that of reference distribution. Next step is to implement system to collect labeled samples to confirm that drift is occurring (or rule out drift). Another next step is to work on building out example(s) of MD3, with SVM specifically for now. * Wrapped some calculation lines for clarity * Finished preliminary MD3 implementation Next step is to complete a full working example for MD3 using an SVM. Step after that is to address TODOs in implementation (add compatibility with other types of models, make all user-facing methods intuitive and clear, etc.). 
* Completed MD3 implementation with oracle labels and retraining Example finished for the most part, probably some debugging to do * Got MD3 example working But currently is not actually detecting drift. Is tracking margin density over the stream correctly, but no drift warnings/detections are being issued. Need to play around with sensitivity threshold for suspecting/confirming drift to see if that's the issue. * MD3 Implementation working + example working * MD3 implementation and example working Still some debugging and cleanup to do. Also have to play around with sensitivity parameter to see what a good default value would be. * Continued updating MD3 implementation and example Few design questions to answer regarding when MD3 warns/resets internal paremeters based on drift confirmation * Moved retraining data green lines to be in the right place start right after warning * Removed some TODOs * Reverted concept drift example back to original version * Resolved some TODOs from PR draft * Changed dataset for MD3 example to new rainfall dataset from India * Added unit tests for MD3, added MD3 to README * Will have to reorganize MD3 example before merging PR * Reformatted md3.py with black * Increased test coverage, reformatted md3.py * Finalized md3 unit tests * Finished some TODOs in MD3 implementation * Added tests for fetch_rainfall_data and formatted make_example_data.py * This commit contains: the completed md3_example.py script, and the example has been added to the concept_drift_examples.ipynb example notebook. md3_example.py will be deleted in the next commit. 
* Deleted md3_example.py (in concept drift example notebook) * Regenerated example scripts from example notebooks * Removed TODO from md3.py * Update citation Co-authored-by: Shashank Jarmale Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * 68 remove changelog workflow (#71) * drop the changelog yaml * tweaks to docstrings * update documentation for make_example_batch_data * add docstring for fetch_circle_data * update refs.bib * Add citation and description to rainfall data (#73) * tweaks to md3, kdq_tree docstrings * remove "example" from notebook subheadings * remove old datasets README * closes issue #72 * and-delimited authors in citation; fix in-line cite * fix some sphinx build errors * add returns to docstrings Co-authored-by: indialindsay <68126147+indialindsay@users.noreply.github.com> * fix a typo preventing doc build * Refactor inputs to update and set_reference functions (#75) * refactor ABCs to have common signature in update (#15) * refactor change detectors to have common signature in update (#15) * refactor data drift detectors to have common signature in update (#15) * remove obs_id from PageHinkley * refactor concept drift detectors to have common update signature (#15) * refactor batch detectors to have common set_reference signature (#15) * update examples, README, ADWIN with function signature changes (#15) * modify adwin unit tests for new update sig * add int cast to ADWIN input check * fix formatting with black * fix outdated udpate function in stepd * debug MD3, modify docstring for find_git_root * improve formatting and test coverage * update example notebooks, regenerate scripts * make sure README example is functional * resolve minor issue in PCACD Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * Reorganize tests directory to mirror src/ structure (#79) reorganize tests directory (#78) * Make concept_drift.ADWIN a child of change_detection.ADWIN (#83) * make 
concept_drift.ADWIN a child of change_detection.ADWIN * increment version, update eddm docstring * rename new ADWIN, add unit test * update README ADWIN example * remove input_type * add redundant docstrings, unit tests for sphinx * fix import statement * move convert_notebooks.py to utils * move lfr.round_val to init * add change_detection.ADWIN example * update nb description * Revert "move convert_notebooks.py to utils" This reverts commit fe05b78. * add original accuracy calculation to ADWIN example * regenerate .py examples * update rainfall unit test * tweaks to examples * rename new class * 84 remove redundant docstrings (#85) * make garbage input for unit test more garbagey * add :inherited-members: option to docs * update README * remove redundant unit tests for cdbd, hdddm * remove drift_state from child signatures * remove drift_state from adwin_outcome * make the other class attribs into properties for consistency's sake * more docstring cleanup * added groupwise member order to conf.py * Add accuracy calculations to the example notebook plots for concept drift detectors (#88) * add running accuracy to concept drift examples * Draft: Add Ensemble for Batch Detectors (#90) * begin sketching BatchEnsembler (#77) * sketch Ensembler, BatchEnsembler using toolz.pipe * add more scratch work re: ensemblers and pipelines (#77) * significantly simplify batch ensembling, add data robustness (#77) * finish getting ensembler to execute set_ref/update for all batch dets (#77) * add simple majority evaluator, make ensembler fully functional (#77) * cleanup for PR !90, begin adding tests * add more unit tests for batch ensemble * add docs for new ensemble code * formatting with black/bandit * Replace notebook examples workflow (#92) * update workflow yml * update docstrings for #91 * Add validation to StreamingDetector (#95) * add y-validation * add X-validation * kdq_tree tweaks * switch concept drift detectors to StreamingDetector * switch change detectors to 
StreamingDetector * switch PCA-CD to StreamingDetector * 96 batch validation (#97) * add batch validation * remove redundant validation * switch to deepcopy for set_reference * Add streaming ensemble (#101) * add initial draft of StreamingEnsemble, move set_reference into BatchEnsemble * make stream ensemble run for every data/concept detector except MD3 (#89) * add minimim-approval and confirmed-approval evaluators (#89) * temporary fix for univariate detectors (#89) * add unit tests, begin debugging KdqTree in batch case (#89) * debug kdqtree batch ensemble unit test * fix and test confirmed approval evaluator * evaluators are now just functions * evaluators can now be str or function * Revert "evaluators can now be str or function" This reverts commit d4d45de. Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * 99 ensemble quickstart (#102) * sketch new readme examples (#99) * less wordy version of quickstart steps (#99) * reduce lines * further cleanup * spacing? * wording changes * asserts instead of print (downside: example exits with error) * Update README.md Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * changed references for phtest and cusum (#108) * Reshape input as part of validation. 
(#107) * add validation to concept drift, change detectors, data drift detectors * new unit tests * update docstrings * Change column specification to column selectors (#109) * initial change to column selectors (#104) * intermediate push before merging * debug input issues * security check - remove assert * Rename ADWIN concept drift detector (#111) * rename ADWINOutcome to ADWINAccuracy * update coverage badge text * docstring tweaks * Add Maciel Election, refactor other elections (#113) * initial switch to evaluator class (#100) * debug new evaluators * emergency commit for viewing * add maciel election; change evaluator to election (#100) * det list now passed by ensemble to election call (#100) * add passing tests (#100) * formatting, coverage fixes * edit list comprehension for conciseness Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * changes to satify PR comments * remove question comments * add citations * add a bullet on ensembles to README * rename tests Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * rename tests Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * rename maciel tests Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * rename maciel tests Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * update ensemble table portion Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * update ensemble table portion Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * rename maciel tests Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * rename maciel tests Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * rename min approval tests Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * rename min approval tests Co-authored-by: Thomas Schill 
<33845624+tms-bananaquit@users.noreply.github.com> Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * Update README.md * Finish ensemble documentation (#114) * cannot test, but minor adjustments to ensemble docs, setup files * add ensemble examples * docstring fixes, add notebook to index.rst * format with black * better example in notebook * fix missing booktitle in Maciel * tweaked some narration in the example * add comment for sphinx freeze Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * shorten wording in docstring Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * formatting for sphinx Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * first pass at addressing remaining PR comments * final formatting changes to docstrings * directory change * use better file-finding method in make_example_data * Update src/menelaus/ensemble/ensemble.py * Update src/menelaus/ensemble/ensemble.py Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * 105 rename src (#116) * move /menelaus/ up a directory, remove /src/ * rename drift_detector.py to detector.py * remove erroneous pytest call from make_example_data * add references page to index * move deprecation note to DriftDetector docstring * README tweak * 118 dummy out validation (#119) * pass None instead of X where appropriate * dummy out unused args for change detector validation * dummy out validation in drift detectors * black formatting * tweak narration in ensemble notebook * fix typo, comments * Add fixed NNDVI (#124) * Create NNSpacePartitioner.py * Add NNSP to partitioner __init__.py * Create nndvi.py to set up debugging in #36 * add nndvi example to debug #36 * potential fix for build and dissimilarity * change drift threshold computation - the current implementation uses a random permutation for both new sets - which means they are not mutually exclusive. 
This means the numerator in compute_nnps_distance can contain 0's. - making the second set the reverse of the first fixes this. * change NNDVI to BatchDetector * add nndvi unit tests * add unit tests for NNSP * formatting with black * add sklearn to cfg * remove assert in nndvi for security * formatting with black * add a few comments * modify scikit-learn import for pipeline * add nndvi tests for better coverage * remove warning statement in outdated if clause * == to = fixes coverage bug * formatting with black * validation / other PR changes * fold nndvi example into notebook * replace length calculation, comments * tweaks to example notebook * add comments to NN-DVI example Co-authored-by: Thomas Schill <33845624+tms-bananaquit@users.noreply.github.com> * update CHANGELOG * remove benchmarks folder Co-authored-by: Shashank Jarmale Co-authored-by: Shashank Jarmale Co-authored-by: Anmol Srivastava Co-authored-by: India Lindsay <68126147+indialindsay@users.noreply.github.com>

tms-bananaquit added 5 commits July 8, 2022 15:32

add a function to fetch the circle data from whatever working dir

87ff3fa

add a script to generate .py files from .ipynb docs/source/examples

5296756

troubleshooting

e7a99ad

troubleshooting

85f47ec

troubleshooting

199b462

add a hack for RTD

1549450

tms-bananaquit added 5 commits July 8, 2022 17:07

try another hack for RTD

946f29b

hard-code Circle data filepaths

d497938

check whether the missing "," was the culprit for RTD

a47ff99

add coverage test

ddd9654

add unit tests

4e83c1f

tms-bananaquit marked this pull request as ready for review July 8, 2022 22:02

tms-bananaquit added 3 commits July 8, 2022 18:08

update README

98d30e1

nitpick

0aedaad

nitpick

c949d5c

tms-bananaquit requested a review from Anmol-Srivastava July 8, 2022 22:19

Anmol-Srivastava reviewed Jul 9, 2022

README.md (resolved)

docs/source/examples/convert_notebooks.py (outdated, resolved)

src/menelaus/datasets/make_example_data.py (outdated, resolved)

tests/menelaus/test_example_data.py (outdated, resolved)

tms-bananaquit added 2 commits July 11, 2022 10:00

tweak README

3f6ed68

move find_git_root to utils/_locate, tweak formatting

9cbd42a

Anmol-Srivastava reviewed Jul 11, 2022

tms-bananaquit mentioned this pull request Jul 11, 2022

Replace test_example_notebooks with a call to convert_notebooks? #66

Closed

remove duplicate _locate.py

eabad66

Anmol-Srivastava previously approved these changes Jul 11, 2022

nitpick

02754f2

tms-bananaquit dismissed Anmol-Srivastava’s stale review via 02754f2 July 11, 2022 14:46

Anmol-Srivastava approved these changes Jul 11, 2022

tms-bananaquit merged commit 9780d55 into dev Jul 11, 2022

tms-bananaquit deleted the 64-nbconvert-scripts branch July 11, 2022 14:54
