MAINT, ENH: library code/manuscript tracking #1

@tylerjereddy

Description

@nray @eiviani-lanl just migrating this tracking issue to the public repo via copy/paste of current state.

  • finalize the edRVFL PR (DOC, CI: lint weights.py with numpydoc in the CI #26)
  • finalize the regression PR (ENH, TST: Add rtol to classifier and a test. #44)
  • other code changes/additions -- please expand as needed; there will likely be some
  • decide how to "split off" the library code and make it public while this repo remains private -- weighing preservation of git history against keeping research plans to ourselves for now, etc. There are various logistical annoyances involved
  • Add an appropriate project license, considering what we import/link to, what FCI approved, and so on (we may need to talk to FCI again if we deviate from the original agreement, and to sort out the multiple sub-repos for library vs. research code, etc.)
  • (Tyler) PyPI release process -- we may not need binaries if we don't have compiled code, but we should at least update pyproject.toml to match modern standards for support metadata, etc.
  • conda-forge release process
  • (Tyler) Portability -- we should probably conform to SPEC 0, which likely boils down to supporting Python 3.12 - 3.14 and testing those versions in the CI
  • Do an official/GitHub immutable release when the (likely separate?) library repo is public, and assign it a Zenodo DOI
  • (Tyler) Make sure our documentation/docstrings are pretty decent -- i.e., we could follow https://numpydoc.readthedocs.io/en/latest/format.html, which I believe has a validation/linter (https://numpydoc.readthedocs.io/en/latest/validation.html); we could also run doctests so the docs stay up to date (code examples remain valid over time, etc.) -- see the docstring sketch after this list
  • Hosting our documentation somewhere (GitHub Pages, Read the Docs, whatever suits) - Navamita
  • Pick a journal -- we are NOT eligible for JOSS (https://joss.theoj.org/) because we don't have 6 months of sustained public engagement on, e.g., GitHub
  • Draft the paper somewhere -- I can create a private Overleaf link that up to 10 people can edit -- I believe this should be "ok" since it would be a similar level of privacy to this GitHub repo (can we loop Kyle/Kostas in a bit, or not for the code?) -- https://www.overleaf.com/3111751118mvxmcpqnqxmw#eb42c5
  • Get an LA-UR for the publication once we have a draft
  • Draft developer docs on how to add additional RVFL architectures in the future -- is that process clear yet?
  • (Emma) Remove StandardScaler usage from the estimators proper (see the pipeline sketch after this list)
  • what about the OneHotEncoder usage? Double-check that as well vs. requiring properly encoded input (I think the one-hot requirement is more fundamental, but let's think about that)
  • hoist the weight initializers out of the classes (see the initializer sketch after this list) - Emma
  • Add more input validation/error checking (see the validation sketch after this list) - Emma
  • Add a regression equivalent of EnsembleRVFLClassifier() (a skeleton appears after this list) - Emma
  • Allow for "online learning" via partial_fit, since GrafoRVFL supports this, sklearn's MLPClassifier supports it, and it may be necessary for training with very large design matrices (see the partial_fit sketch after this list) - Emma
  • Follow up on difference in results for regression between GrafoRVFL and our regressor (https://github.com/lanl/ascr_rvfl/pull/54) - Navamita
  • Follow up with Kostas regarding the iterative scheme we need to test (SGD vs. GD) for comparison with the exact solve, and clarify the details of the numerical test (a rough numerical-test sketch appears after this list) - Navamita
  • Split off appropriate parts of the library code (which parts?) for incorporation into a fork/branch of scikit-learn, and iterate with the team using a forked repo of scikit-learn (try to credit all three of us in the condensed/redone commit history)
  • In the scikit-learn fork, work with the team to draft the PR description we'll eventually use to propose our addition -- this should be well-written markdown that includes examples of relevant, well-cited papers, a clear explanation of what RVFL is, and so on, to convince the scikit-learn developers that this work is of sufficiently broad interest to be useful at the base of the Python ML ecosystem
  • in the scikit-learn fork, adjust our docstrings to match the exact standards they use
  • in the scikit-learn fork, make sure we pass their CI with their full test suite/linting requirements, etc.
  • decide on the name of our open source library project (rvfl may not suit if we also offer ELM, etc.?), then open-source it under the LANL org
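
For the docstring item, a minimal numpydoc-style sketch; the `rvfl_transform` helper and its parameters are hypothetical, purely to illustrate the section layout, and running `pytest --doctest-modules` over the package would then fail if the example output drifts out of date:

```python
import numpy as np


def rvfl_transform(X, n_nodes=10, random_state=None):
    """Project inputs through random hidden nodes (hypothetical helper).

    Parameters
    ----------
    X : ndarray of shape (n_samples, n_features)
        Input design matrix.
    n_nodes : int, optional
        Number of random hidden nodes (default is 10).
    random_state : int or None, optional
        Seed for drawing the fixed random weights.

    Returns
    -------
    H : ndarray of shape (n_samples, n_features + n_nodes)
        The inputs concatenated with the random nonlinear features.

    Examples
    --------
    >>> X = np.ones((3, 2))
    >>> rvfl_transform(X, n_nodes=4, random_state=0).shape
    (3, 6)
    """
    rng = np.random.default_rng(random_state)
    W = rng.standard_normal((X.shape[1], n_nodes))
    return np.hstack([X, np.tanh(X @ W)])
```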
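
For the StandardScaler/OneHotEncoder items, the usual scikit-learn pattern is to keep preprocessing out of the estimator and let users compose it themselves; a minimal sketch, assuming a hypothetical `RVFLClassifier` with a scikit-learn-compatible interface:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

from ascr_rvfl import RVFLClassifier  # hypothetical import path/name

rng = np.random.default_rng(0)
X, y = rng.standard_normal((20, 3)), rng.integers(0, 2, size=20)

# Scaling is composed by the user rather than baked into the estimator;
# the same pattern would apply to OneHotEncoder for categorical inputs.
clf = make_pipeline(StandardScaler(), RVFLClassifier()).fit(X, y)
```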
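
For the initializer-hoisting item, one possible shape (all names here are hypothetical): free functions taking a Generator and a shape, which estimators accept through a parameter instead of hard-coding the weight draw:

```python
import numpy as np


def uniform_init(rng, shape, low=-1.0, high=1.0):
    """Hypothetical hoisted initializer: uniform random weights."""
    return rng.uniform(low, high, size=shape)


def normal_init(rng, shape, scale=1.0):
    """Hypothetical hoisted initializer: Gaussian random weights."""
    return scale * rng.standard_normal(shape)


class RVFLClassifier:  # sketch of how an estimator might consume one
    def __init__(self, n_nodes=100, weight_init=uniform_init, random_state=None):
        self.n_nodes = n_nodes
        self.weight_init = weight_init
        self.random_state = random_state

    def fit(self, X, y):
        rng = np.random.default_rng(self.random_state)
        self.hidden_weights_ = self.weight_init(rng, (X.shape[1], self.n_nodes))
        ...  # solve for the output weights as the library already does
        return self
```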
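
For the input-validation item, scikit-learn already ships the relevant helpers; a minimal sketch of where they would slot in (the estimator body is elided, only the validation calls are the point here):

```python
from sklearn.utils.validation import check_array, check_is_fitted, check_X_y


class RVFLRegressor:  # skeleton only
    def fit(self, X, y):
        # Rejects NaN/inf, wrong dimensionality, mismatched lengths, etc.
        X, y = check_X_y(X, y, y_numeric=True)
        ...
        self.coef_ = ...  # fitted state, whatever the solver produces
        return self

    def predict(self, X):
        check_is_fitted(self)  # raise a clear error if fit() never ran
        X = check_array(X)     # same input checks as fit()
        ...
```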
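
For the ensemble-regressor item, a rough skeleton assuming the regression analogue of the classifier is a mean over independently seeded base models (class name and constructor arguments are hypothetical):

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin

from ascr_rvfl import RVFLRegressor  # hypothetical import path/name


class EnsembleRVFLRegressor(RegressorMixin, BaseEstimator):
    """Average the predictions of independently seeded RVFL regressors."""

    def __init__(self, n_estimators=10, random_state=None):
        self.n_estimators = n_estimators
        self.random_state = random_state

    def fit(self, X, y):
        seeds = np.random.default_rng(self.random_state).integers(
            0, 2**32 - 1, size=self.n_estimators
        )
        self.estimators_ = [
            RVFLRegressor(random_state=int(s)).fit(X, y) for s in seeds
        ]
        return self

    def predict(self, X):
        # Mean over base models: the regression analogue of majority voting.
        return np.mean([est.predict(X) for est in self.estimators_], axis=0)
```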
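
For the partial_fit item: since the output weights come from a linear solve once the random features are fixed, batches can update running normal-equation accumulators and recover the exact ridge solution regardless of how the data were batched. A hedged sketch (the names and the tanh feature map are assumptions, not necessarily what the library does):

```python
import numpy as np


class RVFLRegressor:
    """Sketch of online learning via accumulated normal equations."""

    def __init__(self, n_nodes=100, alpha=1.0, random_state=None):
        self.n_nodes = n_nodes
        self.alpha = alpha
        self.random_state = random_state

    def _hidden(self, X):
        # Direct link (raw X) concatenated with random nonlinear features.
        return np.hstack([X, np.tanh(X @ self.W_)])

    def partial_fit(self, X, y):
        if not hasattr(self, "W_"):  # first call: draw the fixed weights
            rng = np.random.default_rng(self.random_state)
            self.W_ = rng.standard_normal((X.shape[1], self.n_nodes))
            d = X.shape[1] + self.n_nodes
            self.HtH_, self.Hty_ = np.zeros((d, d)), np.zeros(d)
        H = self._hidden(X)
        self.HtH_ += H.T @ H   # running H^T H
        self.Hty_ += H.T @ y   # running H^T y
        # Re-solve the regularized normal equations after each batch.
        A = self.HtH_ + self.alpha * np.eye(len(self.HtH_))
        self.beta_ = np.linalg.solve(A, self.Hty_)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta_
```

A classifier version would additionally take a `classes` argument on the first `partial_fit` call, per the sklearn contract (cf. MLPClassifier/SGDClassifier).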
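
For the SGD-vs-GD follow-up, one way to frame the numerical test: once the random features are fixed, the output-weight problem is plain ridge regression, so an iterative scheme can be checked directly against the closed-form solve. A minimal sketch of the GD half of that comparison (synthetic data, not our actual test case):

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.standard_normal((200, 10))  # stand-in for the RVFL feature matrix
y = H @ rng.standard_normal(10) + 0.01 * rng.standard_normal(200)
alpha = 1e-3

# Exact ridge solve: (H^T H + alpha I) beta = H^T y
A = H.T @ H + alpha * np.eye(10)
beta_exact = np.linalg.solve(A, H.T @ y)

# Full-batch gradient descent on the same objective.
beta = np.zeros(10)
lr = 1.0 / np.linalg.norm(A, 2)  # step below 2/L guarantees convergence
for _ in range(20000):
    beta -= lr * (H.T @ (H @ beta - y) + alpha * beta)

print(np.max(np.abs(beta - beta_exact)))  # ~0 once converged
```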
