Added ability to generate split CSVs using external script #833

scap3yvt
Collaborator

Fixes #828

Proposed Changes

Checklist

  • CONTRIBUTING guide has been followed.
  • PR is based on the current GaNDLF master.
  • Non-breaking change (does not break existing functionality): provide as many details as possible for any breaking change.
  • Function/class source code documentation added/updated (ensure typing is used to provide type hints, including but not limited to using Optional if a variable has a pre-defined value).
  • Code has been blacked for style consistency and linting.
  • If applicable, version information has been updated in GANDLF/version.py.
  • If adding a git submodule, add to list of exceptions for black styling in pyproject.toml file.
  • Usage documentation has been updated, if appropriate.
  • Tests added or modified to cover the changes; if coverage is reduced, please explain why.
  • If customized dependency installation is required (i.e., a separate pip install step is needed for PR to be functional), please ensure it is reflected in all the files that control the CI, namely: python-test.yml, and all docker files [1,2,3].

Contributor

github-actions bot commented Mar 24, 2024

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

…lit-csvs-for-trainingvalidationtesting-as-a-separate-script

828 feature add the ability to split csvs for trainingvalidationtesting as a separate script
@scap3yvt scap3yvt marked this pull request as draft March 25, 2024 01:57
…lit-csvs-for-trainingvalidationtesting-as-a-separate-script

updated checks for stratified split

codecov bot commented Mar 25, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.10%. Comparing base (32d70d4) to head (ecbaf9e).

❗ Current head ecbaf9e differs from pull request most recent head dc4c591. Consider uploading reports for the commit dc4c591 to get more accurate results

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #833   +/-   ##
=======================================
  Coverage   95.09%   95.10%           
=======================================
  Files         121      122    +1     
  Lines        8312     8347   +35     
=======================================
+ Hits         7904     7938   +34     
- Misses        408      409    +1     
Flag Coverage Δ
unittests 95.10% <100.00%> (+<0.01%) ⬆️


scap3yvt and others added 8 commits March 25, 2024 09:33
…lit-csvs-for-trainingvalidationtesting-as-a-separate-script

828 feature add the ability to split csvs for trainingvalidationtesting as a separate script
…lit-csvs-for-trainingvalidationtesting-as-a-separate-script

828 feature add the ability to split csvs for trainingvalidationtesting as a separate script
…ainingvalidationtesting-csv-with-proportional-splits
@sarthakpati sarthakpati self-requested a review March 26, 2024 15:38
sarthakpati previously approved these changes Mar 26, 2024
Collaborator

@sarthakpati sarthakpati left a comment

LGTM.

@scap3yvt
Collaborator Author

Hey @VukW, can you give some insight into why this error is coming up? I am unable to reproduce it on my local system, where black leaves the file unchanged:

(venv_gandlf) PS C:\Projects\GaNDLF> black .\gandlf\cli\data_split_saver.py
All done! ✨ 🍰 ✨
1 file left unchanged.

@VukW
Contributor

VukW commented Mar 26, 2024

Hi @scap3yvt, I added the fix that the linter requires. To reproduce this locally, could you check that your black version matches the one used in CI? It should be 23.11.0:

pip list | grep black

@scap3yvt
Collaborator Author

Here you go:

(venv_gandlf) PS C:\Projects\GaNDLF> pip show black
Name: black
Version: 23.11.0
Summary: The uncompromising code formatter.
Home-page:
Author:
Author-email: Łukasz Langa <lukasz@langa.pl>
License: MIT
Location: C:\Projects\GaNDLF\venv\Lib\site-packages
Requires: click, mypy-extensions, packaging, pathspec, platformdirs
Required-by: GANDLF
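
The version check discussed above can also be done without `grep`, which is not available by default in PowerShell. The following is only a minimal sketch, assuming the CI pin is 23.11.0 as stated in the comment; the helper name `installed_matches` is hypothetical and is not part of GaNDLF or black:

```python
from importlib.metadata import PackageNotFoundError, version

# Value quoted in the comment above; assumed to be the version pinned in CI.
CI_BLACK_VERSION = "23.11.0"


def installed_matches(package: str, expected: str) -> bool:
    """Return True if `package` is installed at exactly version `expected`."""
    try:
        return version(package) == expected
    except PackageNotFoundError:
        # The package is not installed in the active environment.
        return False


if __name__ == "__main__":
    ok = installed_matches("black", CI_BLACK_VERSION)
    print(f"black matches CI version {CI_BLACK_VERSION}: {ok}")
```

Running this inside the same virtual environment used for formatting shows whether the local black version agrees with the one CI is assumed to use.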

Co-authored-by: Viacheslav Kukushkin <vy.kukushkin@gmail.com>
@scap3yvt scap3yvt marked this pull request as ready for review March 26, 2024 18:03
Collaborator

@sarthakpati sarthakpati left a comment

LGTM

@sarthakpati sarthakpati merged commit b9557f6 into mlcommons:master Mar 26, 2024
19 checks passed
@github-actions github-actions bot locked and limited conversation to collaborators Mar 26, 2024
gandlf_splitCSV (resolved review thread)
@scap3yvt scap3yvt deleted the 829-feature-add-the-ability-to-generate-trainingvalidationtesting-csv-with-proportional-splits branch March 29, 2024 12:48
Development

Successfully merging this pull request may close these issues.

[FEATURE] Add the ability to split CSVs for training/validation/testing as a separate script
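
For readers unfamiliar with what the linked issue asks for, here is a rough sketch of a proportional train/validation/testing split over CSV rows, with optional stratification by one column. It does not show the code merged in this PR: the function name `split_rows`, its parameters, and the default fractions are all hypothetical, and only the Python standard library is used.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Row = Dict[str, str]


def split_rows(
    rows: Sequence[Row],
    fractions: Tuple[float, float, float] = (0.7, 0.15, 0.15),
    stratify_by: Optional[str] = None,
    seed: int = 0,
) -> Tuple[List[Row], List[Row], List[Row]]:
    """Split rows into (training, validation, testing) lists.

    Hypothetical helper: if `stratify_by` names a column, the fractions are
    applied separately within each distinct value of that column.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")

    rng = random.Random(seed)  # seeded so the split is reproducible

    # Group rows by the stratification value (a single group if not stratifying).
    groups: Dict[Optional[str], List[Row]] = defaultdict(list)
    for row in rows:
        key = row[stratify_by] if stratify_by is not None else None
        groups[key].append(row)

    train: List[Row] = []
    val: List[Row] = []
    test: List[Row] = []
    for key in sorted(groups, key=str):  # fixed group order for determinism
        members = groups[key]
        rng.shuffle(members)
        n_train = round(len(members) * fractions[0])
        n_val = round(len(members) * fractions[1])
        # The three slices partition `members`, so each row lands in one set.
        train.extend(members[:n_train])
        val.extend(members[n_train:n_train + n_val])
        test.extend(members[n_train + n_val:])
    return train, val, test
```

Rows in this shape can be read with `csv.DictReader` and written back with `csv.DictWriter`, one output file per split. Because counts are rounded within each group, the realized proportions can deviate slightly from the requested fractions for small groups.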