Merge branch 'main' into refactor_ordination
Showing 22 changed files with 447 additions and 48 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
version: 2 | ||
|
||
build: | ||
os: ubuntu-22.04 | ||
tools: | ||
python: "3.11" | ||
|
||
mkdocs: | ||
configuration: docs/mkdocs.yml | ||
|
||
python: | ||
install: | ||
- requirements: docs/requirements.txt |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,77 +1,87 @@ | ||
# scikit-learn-knn-regression | ||
# sknnr | ||
|
||
This package is in active development. | ||
> ⚠️ **WARNING: sknnr is in active development!** ⚠️ | ||
## Developer Guide | ||
## What is sknnr? | ||
|
||
### Setup | ||
`sknnr` is a package for running k-nearest neighbor (kNN) imputation[^imputation] methods using estimators that are fully compatible with [`scikit-learn`](https://scikit-learn.org/stable/). Notably, common methods such as most similar neighbor (MSN, Moeur & Stage 1995), gradient nearest neighbor (GNN, Ohmann & Gregory, 2002), and random forest nearest neighbors[^rfnn] (RFNN, Crookston & Finley, 2008) are included in this package. | ||
|
||
This project uses [hatch](https://hatch.pypa.io/latest/) to manage the development environment and build and publish releases. Make sure `hatch` is [installed](https://hatch.pypa.io/latest/install/) first: | ||
## Features | ||
|
||
```bash | ||
$ pip install hatch | ||
``` | ||
- 🤝 Tight integration with the [`scikit-learn`](https://scikit-learn.org/stable/) API | ||
- 🐼 Native support for [`pandas`](https://pandas.pydata.org/) dataframes | ||
- 📊 [Multi-output](https://scikit-learn.org/stable/modules/multiclass.html) estimators for [regression and classification](https://sknnr.readthedocs.io/usage/#regression-and-classification) | ||
- 📝 Results validated against [yaImpute](https://cran.r-project.org/web/packages/yaImpute/index.html) (Crookston & Finley 2008)[^validation] | ||
|
||
Now you can [enter the development environment](https://hatch.pypa.io/latest/environment/#entering-environments) using: | ||
## Why the Name "sknnr"? | ||
|
||
```bash | ||
$ hatch shell | ||
``` | ||
`sknnr` is an acronym of its three main components: | ||
|
||
This will install development dependencies in an isolated environment and drop you into a shell (use `exit` to leave). | ||
1. **"s"** is for `scikit-learn`. All estimators in this package derive from the `sklearn.base.BaseEstimator` class and comply with the requirements associated with [developing custom estimators](https://scikit-learn.org/stable/developers/develop.html). | ||
2. **"knn"** is for k-nearest neighbors. All estimators use the _k_ >= 1 samples that are nearest in feature space to create their prediction. Each estimator in this package defines that feature space in a different way, which often leads to different neighbors being chosen for the prediction. | ||
3. **"r"** is for regression. Estimators in this package are run in [regression mode](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsRegressor.html). For nearest neighbor imputation, this is simply an (optionally-weighted) average of its _k_ neighbors. When _k_ is set to 1, this effectively behaves as in [classification mode](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html). All estimators support multi-output prediction so that multiple features can be predicted with the same estimator. | ||
|
||
### Pre-commit | ||
## Quick-Start | ||
|
||
Use [pre-commit](https://pre-commit.com/) to run linting, type-checking, and formatting: | ||
1. Follow the [installation guide](https://sknnr.readthedocs.io/installation). | ||
2. Import any `sknnr` estimator, like [MSNRegressor](https://sknnr.readthedocs.io/api/estimators/msn), as a drop-in replacement for a `scikit-learn` regressor. | ||
```python | ||
from sknnr import MSNRegressor | ||
|
||
```bash | ||
$ pre-commit run --all-files | ||
est = MSNRegressor() | ||
``` | ||
3. Load a custom dataset like [SWO Ecoplot](https://sknnr.readthedocs.io/api/datasets/swo_ecoplot) (or bring your own). | ||
```python | ||
from sknnr.datasets import load_swo_ecoplot | ||
|
||
...or install it to run automatically before every commit with: | ||
|
||
```bash | ||
$ pre-commit install | ||
X, y = load_swo_ecoplot(return_X_y=True, as_frame=True) | ||
``` | ||
4. Train, predict, and score [as usual](https://scikit-learn.org/stable/getting_started.html#fitting-and-predicting-estimator-basics). | ||
```python | ||
from sklearn.model_selection import train_test_split | ||
|
||
You can run pre-commit hooks separately and pass additional arguments to them. For example, to run `black` on a single file: | ||
X_train, X_test, y_train, y_test = train_test_split(X, y) | ||
|
||
```bash | ||
$ pre-commit run black --files=src/sknnr/_base.py | ||
est = est.fit(X_train, y_train) | ||
est.score(X_test, y_test) | ||
``` | ||
5. Check out the additional features like [independent scoring](https://sknnr.readthedocs.io/usage/#independent-scores-and-predictions), [dataframe indexing](https://sknnr.readthedocs.io/usage/#retrieving-dataframe-indexes), and [dimensionality reduction](https://sknnr.readthedocs.io/usage/#dimensionality-reduction). | ||
```python | ||
# Evaluate the model using the second-nearest neighbor in the test set | ||
print(est.fit(X, y).independent_score_) | ||
|
||
### Testing | ||
|
||
Unit tests are *not* run by `pre-commit`, but can be run manually using `hatch` [scripts](https://hatch.pypa.io/latest/config/environment/overview/#scripts): | ||
# Get the dataframe index of the nearest neighbor to each plot | ||
print(est.kneighbors(return_dataframe_index=True, return_distance=False)) | ||
|
||
```bash | ||
$ hatch run test:all | ||
# Apply dimensionality reduction using CCorA ordination | ||
MSNRegressor(n_components=3).fit(X_train, y_train) | ||
``` | ||
|
||
Measure test coverage with: | ||
## History and Inspiration | ||
`sknnr` was heavily inspired by (and endeavors to implement functionality of) the [yaImpute](https://cran.r-project.org/web/packages/yaImpute/index.html) package for R (Crookston & Finley 2008). As Crookston and Finley (2008) note in their abstract, | ||
> Although nearest neighbor imputation is used in a host of disciplines, the methods implemented in the yaImpute package are tailored to imputation-based forest attribute estimation and mapping ... [there is] a growing interest in nearest neighbor imputation methods for spatially explicit forest inventory, and a need within this research community for software that facilitates comparison among different nearest neighbor search algorithms and subsequent imputation techniques. | ||
```bash | ||
$ hatch run test:coverage | ||
``` | ||
Indeed, many regional (e.g. [LEMMA](https://lemmadownload.forestry.oregonstate.edu/)) and national (e.g. [BIGMAP](https://storymaps.arcgis.com/stories/c710684b98f54452804e8960d37905b2), [TreeMap](https://www.firelab.org/project/treemap-tree-level-model-forests-united-states)) projects use nearest-neighbor methods to | ||
estimate and map forest attributes across time and space. | ||
|
||
Any additional arguments are passed to `pytest`. For example, to run a subset of tests matching a keyword: | ||
To that end, `sknnr` ports and expands the functionality present in `yaImpute` into a Python package that helps facilitate intercomparison between k-nearest neighbor methods (and other built-in estimators from `scikit-learn`) using an API which is familiar to `scikit-learn` users. | ||
|
||
```bash | ||
$ hatch run test:all -k gnn | ||
``` | ||
## Acknowledgements | ||
|
||
### Releasing | ||
Thanks to Andrew Hudak (USDA Forest Service Rocky Mountain Research Station) for the inclusion of the [Moscow Mountain / St. Joes dataset](https://sknnr.readthedocs.io/api/datasets/moscow_stjoes) (Hudak 2010), and the USDA Forest Service Region 6 Ecology Team for the inclusion of the [SWO Ecoplot dataset](https://sknnr.readthedocs.io/api/datasets/swo_ecoplot) (Atzet et al., 1996). Development of this package was funded by: | ||
|
||
First, use `hatch` to [update the version number](https://hatch.pypa.io/latest/version/#updating). | ||
- an appointment to the United States Forest Service (USFS) Research Participation Program administered by the Oak Ridge Institute for Science and Education (ORISE) through an interagency agreement between the U.S. Department of Energy (DOE) and the U.S. Department of Agriculture (USDA). | ||
- a joint venture agreement between USFS Pacific Northwest Research Station and Oregon State University (agreement 19-JV-11261959-064). | ||
- a cost-reimbursable agreement between USFS Region 6 and Oregon State University (agreement 21-CR-11062756-046). | ||
|
||
```bash | ||
$ hatch version [major|minor|patch] | ||
``` | ||
## References | ||
|
||
Then, [build](https://hatch.pypa.io/latest/build/#building) and [publish](https://hatch.pypa.io/latest/publish/#publishing) the release to PyPI with: | ||
- Atzet, T, DE White, LA McCrimmon, PA Martinez, PR Fong, and VD Randall. 1996. Field guide to the forested plant associations of southwestern Oregon. USDA Forest Service. Pacific Northwest Region, Technical Paper R6-NR-ECOL-TP-17-96. | ||
- Crookston, NL, Finley, AO. 2008. yaImpute: An R package for kNN imputation. Journal of Statistical Software, 23(10), 16. | ||
- Hudak, A.T. 2010. Field plot measures and predictive maps for "Nearest neighbor imputation of species-level, plot-scale forest structure attributes from LiDAR data". Fort Collins, CO: U.S. Department of Agriculture, Forest Service, Rocky Mountain Research Station. https://www.fs.usda.gov/rds/archive/Catalog/RDS-2010-0012. | ||
- Moeur M, Stage AR. 1995. Most Similar Neighbor: An Improved Sampling Inference Procedure for Natural Resources Planning. Forest Science, 41(2), 337–359. | ||
- Ohmann JL, Gregory MJ. 2002. Predictive Mapping of Forest Composition and Structure with Direct Gradient Analysis and Nearest Neighbor Imputation in Coastal Oregon, USA. Canadian Journal of Forest Research, 32, 725–741. | ||
|
||
```bash | ||
$ hatch clean | ||
$ hatch build | ||
$ hatch publish | ||
``` | ||
[^imputation]: In a mapping context, kNN imputation refers to predicting feature values for a target from its k-nearest neighbors, and should not be confused with the usual `scikit-learn` usage as a pre-filling strategy for missing input data, e.g. [`KNNImputer`](https://scikit-learn.org/stable/modules/generated/sklearn.impute.KNNImputer.html). | ||
[^rfnn]: In [development](https://github.com/lemma-osu/scikit-learn-knn-regression/issues/24)! | ||
[^validation]: All estimators and parameters with equivalent functionality in `yaImpute` are tested to 3 decimal places against the R package. |
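
As a rough illustration of the "regression mode" described in the README above, here is a minimal, library-free sketch of multi-output kNN prediction as the unweighted mean of the _k_ nearest training samples. It is not how `sknnr` is implemented: the function name `knn_predict`, the plain Euclidean feature space, and the toy data are all assumptions made for the example.

```python
import math


def knn_predict(X_train, y_train, x, k=1):
    """Predict a multi-output target for x as the mean of its k nearest neighbors."""
    # Rank training samples by Euclidean distance to the query point.
    order = sorted(range(len(X_train)), key=lambda i: math.dist(X_train[i], x))
    nearest = order[:k]
    n_outputs = len(y_train[0])
    # Average each output column over the selected neighbors.
    return [sum(y_train[i][j] for i in nearest) / k for j in range(n_outputs)]


# Hypothetical toy data: three plots, two features, two target attributes.
X_train = [[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]]
y_train = [[10.0, 1.0], [20.0, 3.0], [90.0, 7.0]]

single = knn_predict(X_train, y_train, [0.1, 0.0], k=1)
averaged = knn_predict(X_train, y_train, [0.1, 0.0], k=2)
```

With `k=1` the prediction is simply a copy of the nearest neighbor's observed values, which is why the README describes that case as behaving like classification mode. With `k=2` each output is averaged across both neighbors.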
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
*[GNN]: Gradient Nearest Neighbor | ||
*[MSN]: Most Similar Neighbor | ||
*[kNN]: k-nearest neighbor | ||
*[RFNN]: Random Forest Nearest Neighbor | ||
*[CCorA]: Canonical Correlation Analysis | ||
*[CCA]: Canonical Correspondence Analysis |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,82 @@ | ||
site_name: sknnr | ||
repo_url: https://github.com/lemma-osu/scikit-learn-knn-regression | ||
repo_name: lemma-osu/scikit-learn-knn-regression | ||
docs_dir: pages/ | ||
|
||
nav: | ||
- Home: index.md | ||
- Installation: installation.md | ||
- Usage: usage.md | ||
- "API Reference": | ||
- Estimators: | ||
- RawKNNRegressor: api/estimators/raw.md | ||
- EuclideanKNNRegressor: api/estimators/euclidean.md | ||
- MahalanobisKNNRegressor: api/estimators/mahalanobis.md | ||
- GNNRegressor: api/estimators/gnn.md | ||
- MSNRegressor: api/estimators/msn.md | ||
- Transformers: | ||
- StandardScalerWithDOF: api/transformers/standardscalerwithdof.md | ||
- MahalanobisTransformer: api/transformers/mahalanobis.md | ||
- CCATransformer: api/transformers/cca.md | ||
- CCorATransformer: api/transformers/ccora.md | ||
- Datasets: | ||
- Dataset: api/datasets/dataset.md | ||
- "Moscow Mountain / St. Joes": api/datasets/moscow_stjoes.md | ||
- "SWO Ecoplot": api/datasets/swo_ecoplot.md | ||
- Contributing: contributing.md | ||
|
||
theme: | ||
name: material | ||
features: | ||
- search.suggest | ||
- search.highlight | ||
- navigation.instant | ||
- navigation.path | ||
- content.code.copy | ||
- content.code.annotate | ||
palette: | ||
- media: "(prefers-color-scheme: light)" | ||
scheme: default | ||
toggle: | ||
icon: material/weather-night | ||
name: Dark mode | ||
- media: "(prefers-color-scheme: dark)" | ||
scheme: slate | ||
toggle: | ||
icon: material/weather-sunny | ||
name: Light mode | ||
|
||
plugins: | ||
- search | ||
- mkdocstrings: | ||
handlers: | ||
python: | ||
paths: [../src] | ||
options: | ||
show_source: false | ||
inherited_members: true | ||
undoc_members: true | ||
docstring_style: numpy | ||
show_if_no_docstring: true | ||
show_signature_annotations: true | ||
show_root_heading: true | ||
show_category_heading: true | ||
merge_init_into_class: true | ||
signature_crossrefs: true | ||
|
||
markdown_extensions: | ||
- abbr | ||
- admonition | ||
- tables | ||
- footnotes | ||
- toc: | ||
permalink: true | ||
- pymdownx.snippets: | ||
auto_append: | ||
- docs/abbreviations.md | ||
- pymdownx.highlight: | ||
anchor_linenums: true | ||
line_spans: __span | ||
pygments_lang_class: true | ||
- pymdownx.inlinehilite | ||
- pymdownx.superfences |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
::: sknnr.datasets._base.Dataset |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
::: sknnr.datasets.load_moscow_stjoes |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
::: sknnr.datasets.load_swo_ecoplot |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
::: sknnr.EuclideanKNNRegressor |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
::: sknnr.GNNRegressor |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
::: sknnr.MahalanobisKNNRegressor |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
::: sknnr.MSNRegressor |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
::: sknnr.RawKNNRegressor |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
::: sknnr.transformers.CCATransformer |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
::: sknnr.transformers.CCorATransformer |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
::: sknnr.transformers.MahalanobisTransformer |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
::: sknnr.transformers.StandardScalerWithDOF |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,91 @@ | ||
# Contributing | ||
|
||
## Developer Guide | ||
|
||
### Setup | ||
|
||
This project uses [hatch](https://hatch.pypa.io/latest/) to manage the development environment and build and publish releases. Make sure `hatch` is [installed](https://hatch.pypa.io/latest/install/) first: | ||
|
||
```bash | ||
$ pip install hatch | ||
``` | ||
|
||
Now you can [enter the development environment](https://hatch.pypa.io/latest/environment/#entering-environments) using: | ||
|
||
```bash | ||
$ hatch shell | ||
``` | ||
|
||
This will install development dependencies in an isolated environment and drop you into a shell (use `exit` to leave). | ||
|
||
### Pre-commit | ||
|
||
Use [pre-commit](https://pre-commit.com/) to run linting, type-checking, and formatting: | ||
|
||
```bash | ||
$ pre-commit run --all-files | ||
``` | ||
|
||
...or install it to run automatically before every commit with: | ||
|
||
```bash | ||
$ pre-commit install | ||
``` | ||
|
||
You can run pre-commit hooks separately and pass additional arguments to them. For example, to run `black` on a single file: | ||
|
||
```bash | ||
$ pre-commit run black --files=src/sknnr/_base.py | ||
``` | ||
|
||
### Testing | ||
|
||
Unit tests are *not* run by `pre-commit`, but can be run manually using `hatch` [scripts](https://hatch.pypa.io/latest/config/environment/overview/#scripts): | ||
|
||
```bash | ||
$ hatch run test:all | ||
``` | ||
|
||
Measure test coverage with: | ||
|
||
```bash | ||
$ hatch run test:coverage | ||
``` | ||
|
||
Any additional arguments are passed to `pytest`. For example, to run a subset of tests matching a keyword: | ||
|
||
```bash | ||
$ hatch run test:all -k gnn | ||
``` | ||
|
||
### Documentation | ||
|
||
Documentation is built with [mkdocs](https://www.mkdocs.org/). During development, you can run a live-reloading server with: | ||
|
||
```bash | ||
$ hatch run docs:serve | ||
``` | ||
|
||
The API reference is generated from NumPy-style docstrings using [mkdocstrings](https://mkdocstrings.github.io/). New classes can be added to the API reference by creating a new markdown file in the `docs/pages/api` directory, adding that file to the [`nav` tree](https://www.mkdocs.org/user-guide/configuration/#nav) in `docs/mkdocs.yml`, and [including the docstring](https://mkdocstrings.github.io/python/usage/#injecting-documentation) in the markdown file: | ||
|
||
```markdown | ||
::: sknnr.module.class | ||
``` | ||
|
||
Whenever the docs are updated, they will be automatically rebuilt and deployed by [ReadTheDocs](https://about.readthedocs.com). Build status can be monitored [here](https://readthedocs.org/projects/sknnr/builds/). | ||
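
The stub-file step in the API reference workflow above could be scripted along these lines. This is a hypothetical helper, not part of the repository: the function name `write_api_stub` and its arguments are assumptions, and the matching `nav` entry in `docs/mkdocs.yml` still has to be added by hand.

```python
from pathlib import Path


def write_api_stub(api_dir, category, page_name, import_path):
    """Create a one-line mkdocstrings page, e.g. <api_dir>/estimators/gnn.md."""
    page = Path(api_dir) / category / f"{page_name}.md"
    # Create the category directory (estimators, transformers, ...) if needed.
    page.parent.mkdir(parents=True, exist_ok=True)
    # mkdocstrings replaces the ":::" directive with the rendered docstring.
    page.write_text(f"::: {import_path}\n", encoding="utf-8")
    return page
```

For example, `write_api_stub("docs/pages/api", "estimators", "gnn", "sknnr.GNNRegressor")` would write a page with the same content as the existing `api/estimators/gnn.md` stub.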
|
||
### Releasing | ||
|
||
First, use `hatch` to [update the version number](https://hatch.pypa.io/latest/version/#updating). | ||
|
||
```bash | ||
$ hatch version [major|minor|patch] | ||
``` | ||
|
||
Then, [build](https://hatch.pypa.io/latest/build/#building) and [publish](https://hatch.pypa.io/latest/publish/#publishing) the release to PyPI with: | ||
|
||
```bash | ||
$ hatch clean | ||
$ hatch build | ||
$ hatch publish | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
--8<-- "README.md" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
# Installation | ||
|
||
!!! info | ||
`sknnr` will be available through PyPI and conda-forge once it is ready for release. Until then, you can install it from source. | ||
|
||
## From Source | ||
|
||
```bash | ||
pip install git+https://github.com/lemma-osu/scikit-learn-knn-regression@main | ||
``` | ||
|
||
## Dependencies | ||
|
||
- Python >= 3.8 | ||
- scikit-learn >= 1.2 | ||
- numpy | ||
- scipy |
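
One way to check an environment against the minimums listed above is sketched here using only the standard library. The helper names are hypothetical, and the version parsing is deliberately simplistic (it reads only the leading numeric components of each dotted piece), so treat it as an illustration rather than a robust checker.

```python
import sys
from importlib import metadata


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare an installed version string against a minimum version tuple."""
    return parse_version(installed) >= minimum


def check_environment():
    """Report whether Python and scikit-learn satisfy the documented minimums."""
    report = {"python": sys.version_info[:2] >= (3, 8)}
    try:
        installed = metadata.version("scikit-learn")
        report["scikit-learn"] = meets_minimum(installed, (1, 2))
    except metadata.PackageNotFoundError:
        # scikit-learn is not installed in this environment.
        report["scikit-learn"] = False
    return report
```

Calling `check_environment()` returns a dictionary of booleans keyed by requirement name. `numpy` and `scipy` are listed without minimum versions, so the sketch does not check them.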