MAINT, ENH: library code/manuscript tracking #1

@tylerjereddy

Description

@nray @eiviani-lanl just migrating this tracking issue to the public repo via copy/paste of current state.

  • finalize the edRVFL PR (DOC, CI: lint weights.py with numpydoc in the CI #26)
  • finalize the regression PR (ENH, TST: Add rtol to classifier and a test. #44)
  • other code changes/additions -- please expand as needed; there will likely be some
  • decide how to "split off" the library code and make it public while this repo remains private -- weighing preservation of git history against keeping research plans to ourselves for now, etc. There are various logistical annoyances involved
  • Add an appropriate project license, considering what we import/link to, what FCI approved, and so on (we may need to talk to FCI again if we deviate from the original agreement, and to sort out the multiple sub-repos for library vs. research code, etc.)
  • (Tyler) PyPI release process -- we may not need binaries if we don't have compiled code, but we should at least update pyproject.toml to match modern standards for support metadata, etc.
  • conda-forge release process
  • (Tyler) Portability -- we should probably conform to SPEC 0, which likely boils down to supporting Python 3.12 - 3.14 and testing those versions in the CI
  • Do an official/GitHub immutable release when the (likely separate?) library repo is public, and assign it a Zenodo DOI
  • (Tyler) Make sure our documentation/docstrings are pretty decent -- i.e., we could follow https://numpydoc.readthedocs.io/en/latest/format.html, which I believe has a validation/linter (https://numpydoc.readthedocs.io/en/latest/validation.html); we could also run doctests so the docs stay up to date (code examples remain valid over time, etc.) -- see the docstring sketch after this list
  • Hosting our documentation somewhere (GitHub Pages, Read the Docs, whatever suits) - Navamita
  • Pick a journal -- we are NOT eligible for JOSS (https://joss.theoj.org/) because we don't have 6 months of sustained public engagement on, e.g., GitHub
  • Draft the paper somewhere -- I can create a private Overleaf link that up to 10 people can edit -- I believe this should be "ok" since it would be a similar level of privacy to this GitHub repo (can we loop Kyle/Kostas in a bit, or not for the code?) -- https://www.overleaf.com/3111751118mvxmcpqnqxmw#eb42c5
  • Get an LA-UR for the publication once we have a draft
  • Draft developer docs on how to add additional RVFL architectures in the future -- is that process clear yet?
  • (Emma) Remove StandardScaler usage from the estimators proper (see the pipeline sketch after this list)
  • what about the OneHotEncoder usage? Double-check that as well vs. requiring properly encoded input (I think the one-hot requirement is more fundamental, but let's think about that)
  • hoist the weight initializers out of the classes (see the initializer sketch after this list) - Emma
  • Add more input validation/error checking (see the validation sketch after this list) - Emma
  • Add a regression equivalent of EnsembleRVFLClassifier() (a skeleton appears after this list) - Emma
  • Allow for "online learning" via partial_fit, since GrafoRVFL supports this, sklearn's MLPClassifier supports it, and it may be necessary for training with very large design matrices (see the partial_fit sketch after this list) - Emma
  • Follow up on difference in results for regression between GrafoRVFL and our regressor (https://github.com/lanl/ascr_rvfl/pull/54) - Navamita
  • Follow up with Kostas regarding the iterative scheme we need to test (SGD vs. GD) for comparison with the exact solve, and clarify the details of the numerical test (a rough numerical-test sketch appears after this list) - Navamita
  • Split off appropriate parts of the library code (which parts?) for incorporation into a fork/branch of scikit-learn, and iterate with the team using a forked repo of scikit-learn (try to credit all three of us in the condensed/redone commit history)
  • In the scikit-learn fork, work with the team to draft the PR description we'll eventually use to propose our addition -- this should be well-written markdown that includes examples of relevant, well-cited papers, a clear explanation of what RVFL is, and so on, to convince the scikit-learn developers that this work is of sufficiently broad interest to be useful at the base of the Python ML ecosystem
  • in the scikit-learn fork, adjust our docstrings to match the exact standards they use
  • in the scikit-learn fork, make sure we pass their CI with their full test suite/linting requirements, etc.
  • decide on the name of our open source library project (rvfl may not suit if we also offer ELM, etc.?), then open-source it under the LANL org
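
For the docstring item, a minimal numpydoc-style sketch; the `rvfl_transform` helper and its parameters are hypothetical, purely to illustrate the section layout, and running `pytest --doctest-modules` over the package would then fail if the example output drifts out of date:

```python
import numpy as np


def rvfl_transform(X, n_nodes=10, random_state=None):
    """Project inputs through random hidden nodes (hypothetical helper).

    Parameters
    ----------
    X : ndarray of shape (n_samples, n_features)
        Input design matrix.
    n_nodes : int, optional
        Number of random hidden nodes (default is 10).
    random_state : int or None, optional
        Seed for drawing the fixed random weights.

    Returns
    -------
    H : ndarray of shape (n_samples, n_features + n_nodes)
        The inputs concatenated with the random nonlinear features.

    Examples
    --------
    >>> X = np.ones((3, 2))
    >>> rvfl_transform(X, n_nodes=4, random_state=0).shape
    (3, 6)
    """
    rng = np.random.default_rng(random_state)
    W = rng.standard_normal((X.shape[1], n_nodes))
    return np.hstack([X, np.tanh(X @ W)])
```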
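
For the StandardScaler/OneHotEncoder items, the usual scikit-learn pattern is to keep preprocessing out of the estimator and let users compose it themselves; a minimal sketch, assuming a hypothetical `RVFLClassifier` with a scikit-learn-compatible interface:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

from ascr_rvfl import RVFLClassifier  # hypothetical import path/name

rng = np.random.default_rng(0)
X, y = rng.standard_normal((20, 3)), rng.integers(0, 2, size=20)

# Scaling is composed by the user rather than baked into the estimator;
# the same pattern would apply to OneHotEncoder for categorical inputs.
clf = make_pipeline(StandardScaler(), RVFLClassifier()).fit(X, y)
```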
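
For the initializer-hoisting item, one possible shape (all names here are hypothetical): free functions taking a Generator and a shape, which estimators accept through a parameter instead of hard-coding the weight draw:

```python
import numpy as np


def uniform_init(rng, shape, low=-1.0, high=1.0):
    """Hypothetical hoisted initializer: uniform random weights."""
    return rng.uniform(low, high, size=shape)


def normal_init(rng, shape, scale=1.0):
    """Hypothetical hoisted initializer: Gaussian random weights."""
    return scale * rng.standard_normal(shape)


class RVFLClassifier:  # sketch of how an estimator might consume one
    def __init__(self, n_nodes=100, weight_init=uniform_init, random_state=None):
        self.n_nodes = n_nodes
        self.weight_init = weight_init
        self.random_state = random_state

    def fit(self, X, y):
        rng = np.random.default_rng(self.random_state)
        self.hidden_weights_ = self.weight_init(rng, (X.shape[1], self.n_nodes))
        ...  # solve for the output weights as the library already does
        return self
```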
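
For the input-validation item, scikit-learn already ships the relevant helpers; a minimal sketch of where they would slot in (the estimator body is elided, only the validation calls are the point here):

```python
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class RVFLRegressor:  # skeleton only
    def fit(self, X, y):
        # Rejects NaN/inf, wrong dimensionality, mismatched lengths, etc.
        X, y = check_X_y(X, y, y_numeric=True)
        ...
        self.coef_ = ...  # fitted state, whatever the solver produces
        return self

    def predict(self, X):
        check_is_fitted(self)  # raise a clear error if fit() never ran
        X = check_array(X)     # same input checks as fit()
        ...
```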
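
For the ensemble-regressor item, a rough skeleton assuming the regression analogue of the classifier is a mean over independently seeded base models (class name and constructor arguments are hypothetical):

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin

from ascr_rvfl import RVFLRegressor  # hypothetical import path/name


class EnsembleRVFLRegressor(RegressorMixin, BaseEstimator):
    """Average the predictions of independently seeded RVFL regressors."""

    def __init__(self, n_estimators=10, random_state=None):
        self.n_estimators = n_estimators
        self.random_state = random_state

    def fit(self, X, y):
        seeds = np.random.default_rng(self.random_state).integers(
            0, 2**32 - 1, size=self.n_estimators
        )
        self.estimators_ = [
            RVFLRegressor(random_state=int(s)).fit(X, y) for s in seeds
        ]
        return self

    def predict(self, X):
        # Mean over base models: the regression analogue of majority voting.
        return np.mean([est.predict(X) for est in self.estimators_], axis=0)
```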
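
For the partial_fit item: since the output weights come from a linear solve once the random features are fixed, batches can update running normal-equation accumulators and recover the exact ridge solution regardless of how the data were batched. A hedged sketch (the names and the tanh feature map are assumptions, not necessarily what the library does):

```python
import numpy as np


class RVFLRegressor:
    """Sketch of online learning via accumulated normal equations."""

    def __init__(self, n_nodes=100, alpha=1.0, random_state=None):
        self.n_nodes = n_nodes
        self.alpha = alpha
        self.random_state = random_state

    def _hidden(self, X):
        # Direct link (raw X) concatenated with random nonlinear features.
        return np.hstack([X, np.tanh(X @ self.W_)])

    def partial_fit(self, X, y):
        if not hasattr(self, "W_"):  # first call: draw the fixed weights
            rng = np.random.default_rng(self.random_state)
            self.W_ = rng.standard_normal((X.shape[1], self.n_nodes))
            d = X.shape[1] + self.n_nodes
            self.HtH_, self.Hty_ = np.zeros((d, d)), np.zeros(d)
        H = self._hidden(X)
        self.HtH_ += H.T @ H   # running H^T H
        self.Hty_ += H.T @ y   # running H^T y
        # Re-solve the regularized normal equations after each batch.
        A = self.HtH_ + self.alpha * np.eye(len(self.HtH_))
        self.beta_ = np.linalg.solve(A, self.Hty_)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta_
```

A classifier version would additionally take a `classes` argument on the first `partial_fit` call, per the sklearn contract (cf. MLPClassifier/SGDClassifier).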
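
For the SGD-vs-GD follow-up, one way to frame the numerical test: once the random features are fixed, the output-weight problem is plain ridge regression, so an iterative scheme can be checked directly against the closed-form solve. A minimal sketch of the GD half of that comparison (synthetic data, not our actual test case):

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.standard_normal((200, 10))  # stand-in for the RVFL feature matrix
y = H @ rng.standard_normal(10) + 0.01 * rng.standard_normal(200)
alpha = 1e-3

# Exact ridge solve: (H^T H + alpha I) beta = H^T y
A = H.T @ H + alpha * np.eye(10)
beta_exact = np.linalg.solve(A, H.T @ y)

# Full-batch gradient descent on the same objective.
beta = np.zeros(10)
lr = 1.0 / np.linalg.norm(A, 2)  # step below 2/L guarantees convergence
for _ in range(20000):
    beta -= lr * (H.T @ (H @ beta - y) + alpha * beta)

print(np.max(np.abs(beta - beta_exact)))  # ~0 once converged
```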
