Added csv directory reading #18853

Closed
wants to merge 1 commit into from

@Formiga57 Formiga57 commented Nov 30, 2023

I'm an aerospace engineering student looking for ways to measure the influence of sensors within a wing with the help of ML. Since a lot of research relies on sensor data stored in .csv or binary files, we thought it would be a nice feature to have a loader for CSV files, or even for more formats of sensor and research data, rather than mainly images, audio, or text.
So we implemented a `csv_dataset_from_directory()` method, for now just as an example, that loads a directory containing multiple classes of CSV files. Even so, it gave good results on a tiny portion of our dataset.

Please feel free to give any comments or suggestions about this implementation; we had only about 12 hours to come up with a project because we're in a hackathon right now. Let us know if you have had trouble loading this kind of data; for our use in aerospace engineering, this is exactly the feature we're looking for!

google-cla bot commented Nov 30, 2023

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

labels='inferred')`
will return a `tf.data.Dataset` that yields batches of csv files from
the subdirectories `class_a` and `class_b`, together with labels
0 and 1 (0 corresponding to `class_a` and 1 corresponding to `class_b`).
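
The inferred labeling this docstring describes can be sketched in plain Python. This is only an illustration under assumptions: `infer_labels` is a hypothetical helper, not code from the PR, and it assumes class indices follow the alphanumeric order of the subdirectory names, as the `class_a` = 0, `class_b` = 1 example suggests.

```python
import os


def infer_labels(directory, extension=".csv"):
    """Hypothetical helper: pair each CSV file with its class index."""
    # Assumption: classes are the immediate subdirectories, sorted by name.
    class_names = sorted(
        name
        for name in os.listdir(directory)
        if os.path.isdir(os.path.join(directory, name))
    )
    paths, labels = [], []
    for index, name in enumerate(class_names):
        class_dir = os.path.join(directory, name)
        for filename in sorted(os.listdir(class_dir)):
            # Ignore anything that is not a CSV file (notes, images, ...).
            if filename.lower().endswith(extension):
                paths.append(os.path.join(class_dir, filename))
                labels.append(index)
    return class_names, paths, labels
```

The resulting `paths` and `labels` lists could then be handed to whatever per-file reader builds the dataset; that step is left out here because it depends on the file layout discussed in the review.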
Member

So 1 csv file = 1 sample? Is this a common case? Usually you have 1 row = 1 sample.
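
To make the distinction concrete, here is a minimal sketch of the two readings of the same CSV text. Both function names are hypothetical, the column names are made up, and the sketch assumes purely numeric, comma-separated data with one header line. It uses only the standard library rather than the PR's code.

```python
import csv
import io


def rows_as_samples(text, has_header=True):
    """One row = one sample: returns a list of feature vectors."""
    rows = list(csv.reader(io.StringIO(text)))
    if has_header:
        rows = rows[1:]
    # Skip blank lines, which csv.reader yields as empty lists.
    return [[float(value) for value in row] for row in rows if row]


def file_as_sample(text, has_header=True):
    """One file = one sample: returns a single (rows x columns) sample."""
    return [rows_as_samples(text, has_header)]


text = "t,pressure\n0.0,1.5\n0.1,1.7\n"
print(len(rows_as_samples(text)))  # 2
print(len(file_as_sample(text)))   # 1
```

Under the second reading, the subdirectory supplies one label per file, which fits a sensor recording; under the first, each row would typically need its own label column.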



def getReadings(path, stride: int = 0, head: bool = True):
    return tf.strings.to_number(tf.strings.split(tf.strings.split(tf.io.read_file(path)), sep=","), out_type=tf.float32)[1::stride]
Member

@fchollet fchollet Nov 30, 2023

This line hardcodes a lot of assumptions about the data.
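
One way to read this comment: the one-liner assumes lines without spaces (the inner `tf.strings.split` splits on whitespace), a comma delimiter, exactly one header row, all-numeric fields, and a nonzero `stride` (the default of 0 would likely make the `[1::stride]` slice invalid), and `head` is never used. A hedged sketch of the same reader with those assumptions surfaced as arguments follows. `read_readings` and its parameters are hypothetical, and it uses the standard-library `csv` module instead of `tf.strings` to stay self-contained.

```python
import csv


def read_readings(path, delimiter=",", header_rows=1, stride=1):
    """Hypothetical reader: returns every `stride`-th data row as floats."""
    if stride < 1:
        raise ValueError("stride must be a positive integer")
    with open(path, newline="") as handle:
        # Drop blank lines, which csv.reader yields as empty lists.
        rows = [row for row in csv.reader(handle, delimiter=delimiter) if row]
    data = rows[header_rows::stride]  # skip the header, then subsample
    return [[float(value) for value in row] for row in data]
```

For example, `read_readings("flight.csv", delimiter=";", stride=2)` would keep every second data row of a semicolon-separated file; the file name is a placeholder.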

Member

@fchollet fchollet left a comment

Thanks for the PR!

My primary questions here are:

  1. Would this generalize to use cases encountered by a lot of people? Or is it closer to being a one-off for your use case?
  2. Would those people find it intuitive to learn to use the utility for their use case?

codecov-commenter commented Dec 1, 2023

Codecov Report

Attention: 59 lines in your changes are missing coverage. Please review.

Comparison is base (724321c) 75.57% compared to head (c6b927b) 79.25%.
Report is 9 commits behind head on master.

Files                              Patch %   Lines
keras/utils/csv_dataset_utils.py   18.05%    59 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #18853      +/-   ##
==========================================
+ Coverage   75.57%   79.25%   +3.67%     
==========================================
  Files         352      337      -15     
  Lines       37066    34909    -2157     
  Branches     7225     6875     -350     
==========================================
- Hits        28014    27667     -347     
+ Misses       7357     5656    -1701     
+ Partials     1695     1586     -109     
Flag                            Coverage Δ
keras                           79.11% <19.17%> (+3.96%) ⬆️
keras-jax                       60.97% <19.17%> (?)
keras-numpy                     55.78% <19.17%> (-0.08%) ⬇️
keras-tensorflow                63.12% <19.17%> (-0.10%) ⬇️
keras-torch                     63.71% <19.17%> (-0.09%) ⬇️
keras.applications              ?
keras.applications-numpy        ?
keras.applications-tensorflow   ?
keras.applications-torch        ?

Flags with carried forward coverage won't be shown.

This PR is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.

@github-actions github-actions bot added the stale label Dec 20, 2023
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Dec 20, 2023
@fchollet fchollet closed this Dec 20, 2023
PR Queue automation moved this from Assigned Reviewer to Closed/Rejected Dec 20, 2023