Add dataset-based data splitting code #461

Merged (22 commits) on Aug 1, 2024

mdekstrand
Member

This copies the data splitting code into lenskit.splitting and makes it work on Dataset instead of data frames. It is a copy rather than a move for now, because the existing usages still need to be updated.
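For readers unfamiliar with the idea, here is a minimal sketch of what splitting on a dataset object (rather than a raw data frame) can look like. It is illustrative only: `ToyDataset`, `split_last_n`, and `sample_test_users` are hypothetical names made up for this example and are not the lenskit.splitting API or the lenskit Dataset class. The sketch assumes each interaction is a (user, item, timestamp) tuple and holds out each test user's most recent interactions.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDataset:
    """Hypothetical stand-in for a dataset object; NOT lenskit's Dataset."""

    # Each interaction is a (user, item, timestamp) tuple.
    interactions: tuple

    def users(self):
        """Return the distinct user IDs in sorted order."""
        return sorted({u for u, _, _ in self.interactions})

    def user_rows(self, user):
        """Return all interactions belonging to one user."""
        return [r for r in self.interactions if r[0] == user]


def sample_test_users(data, k, seed=0):
    """Pick k distinct test users with a seeded RNG for repeatability."""
    return random.Random(seed).sample(data.users(), k)


def split_last_n(data, test_users, n=1):
    """Hold out each test user's n most recent interactions.

    Returns (train, test): train is a new ToyDataset, and test maps each
    test user to the list of held-out items, oldest first.
    """
    test_users = set(test_users)
    train_rows, test = [], {}
    for user in data.users():
        rows = sorted(data.user_rows(user), key=lambda r: r[2])
        if user in test_users and n > 0:
            held, kept = rows[-n:], rows[:-n]
            test[user] = [item for _, item, _ in held]
        else:
            # Non-test users contribute all of their rows to training.
            kept = rows
        train_rows.extend(kept)
    return ToyDataset(tuple(train_rows)), test
```

The point of the design is that the splitter takes and returns dataset objects, so callers never handle the underlying table layout directly; the actual function names and return types in lenskit.splitting may differ.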

mdekstrand added the "data" label (Data management support.) Jul 31, 2024
mdekstrand added this to the 2024.1 milestone Jul 31, 2024

github-actions bot commented Aug 1, 2024

The GitHub 🤖 has run the tests on your PR.

Covered 100.00% of diff (coverage changed 0.01% from 92.11% to 92.12%).

origin/main...HEAD, staged and unstaged changes
  • lenskit/lenskit/data/dataset.py (100%)
  • lenskit/lenskit/data/items.py (75.8%): Missing lines 190,370-373,375,403,440
  • lenskit/lenskit/data/vocab.py (100%)
  • lenskit/lenskit/splitting/__init__.py (100%)
  • lenskit/lenskit/splitting/holdout.py (92.9%): Missing lines 39,106,110,135
  • lenskit/lenskit/splitting/records.py (100%)
  • lenskit/lenskit/splitting/split.py (77.8%): Missing lines 45,52,60,64-66
  • lenskit/lenskit/splitting/users.py (100%)

Summary

  • Total: 242 lines
  • Missing: 18 lines
  • Coverage: 92%
Source Coverage Report
Name Stmts Miss Cover
lenskit-funksvd/lenskit/funksvd.py 187 8 96%
lenskit-hpf/lenskit/hpf.py 24 0 100%
lenskit-implicit/lenskit/implicit.py 94 9 90%
lenskit/lenskit/algorithms/__init__.py 67 8 88%
lenskit/lenskit/algorithms/als/__init__.py 3 0 100%
lenskit/lenskit/algorithms/als/common.py 129 2 98%
lenskit/lenskit/algorithms/als/explicit.py 121 3 98%
lenskit/lenskit/algorithms/als/implicit.py 112 1 99%
lenskit/lenskit/algorithms/basic.py 161 4 98%
lenskit/lenskit/algorithms/bias.py 150 3 98%
lenskit/lenskit/algorithms/knn/__init__.py 3 0 100%
lenskit/lenskit/algorithms/knn/item.py 303 17 94%
lenskit/lenskit/algorithms/knn/user.py 178 11 94%
lenskit/lenskit/algorithms/mf_common.py 61 0 100%
lenskit/lenskit/algorithms/ranking.py 75 11 85%
lenskit/lenskit/algorithms/svd.py 75 4 95%
lenskit/lenskit/batch/__init__.py 2 0 100%
lenskit/lenskit/batch/_predict.py 30 2 93%
lenskit/lenskit/batch/_recommend.py 46 4 91%
lenskit/lenskit/crossfold.py 136 2 99%
lenskit/lenskit/data/__init__.py 8 0 100%
lenskit/lenskit/data/checks.py 37 0 100%
lenskit/lenskit/data/dataset.py 364 19 95%
lenskit/lenskit/data/fetch.py 38 28 26%
lenskit/lenskit/data/items.py 176 11 94%
lenskit/lenskit/data/matrix.py 115 5 96%
lenskit/lenskit/data/movielens.py 96 18 81%
lenskit/lenskit/data/mtarray.py 57 3 95%
lenskit/lenskit/data/tables.py 25 0 100%
lenskit/lenskit/data/vocab.py 90 7 92%
lenskit/lenskit/diagnostics.py 4 0 100%
lenskit/lenskit/math/__init__.py 0 0 100%
lenskit/lenskit/math/solve.py 6 0 100%
lenskit/lenskit/metrics/__init__.py 0 0 100%
lenskit/lenskit/metrics/predict.py 32 0 100%
lenskit/lenskit/metrics/topn.py 212 1 99%
lenskit/lenskit/parallel/__init__.py 4 0 100%
lenskit/lenskit/parallel/chunking.py 20 1 95%
lenskit/lenskit/parallel/config.py 65 8 88%
lenskit/lenskit/parallel/invoker.py 31 2 94%
lenskit/lenskit/parallel/pool.py 54 9 83%
lenskit/lenskit/parallel/sequential.py 22 0 100%
lenskit/lenskit/parallel/serialize.py 51 1 98%
lenskit/lenskit/parallel/worker.py 43 3 93%
lenskit/lenskit/splitting/__init__.py 4 0 100%
lenskit/lenskit/splitting/holdout.py 56 4 93%
lenskit/lenskit/splitting/records.py 55 0 100%
lenskit/lenskit/splitting/split.py 27 6 78%
lenskit/lenskit/splitting/users.py 60 0 100%
lenskit/lenskit/topn.py 109 25 77%
lenskit/lenskit/util/__init__.py 72 19 74%
lenskit/lenskit/util/envcheck.py 57 44 23%
lenskit/lenskit/util/logging.py 19 0 100%
lenskit/lenskit/util/random.py 26 3 88%
lenskit/lenskit/util/test.py 103 19 82%
lenskit/lenskit/util/timing.py 28 0 100%
TOTAL 4123 325 92%

mdekstrand merged commit 1db4c20 into lenskit:main Aug 1, 2024
38 checks passed
mdekstrand deleted the feature/split-dataset branch August 1, 2024 00:54
Labels
data Data management support.
Projects
Status: Done