Added csv directory reading #18853

Formiga57 · 2023-11-30T11:16:46Z

I'm an aerospace engineering student looking for ways to measure the influence of sensors within a wing with the help of ML. Since a lot of research relies on sensor data stored in .csv or binary files, we thought it would be a nice feature to have a CSV loader, or even loaders for more sensor and research data formats, rather than mainly images, audio, or text.
So we implemented a "csv_dataset_from_directory()" method, for now just as an example, that loads a directory containing multiple classes of CSV files. It already gave good results with a tiny portion of our dataset.

Please feel free to give any comments or suggestions about this implementation. We had only about 12 hours to think through the project because we're in a hackathon right now. Let us know if you have had issues loading this kind of data; for our aerospace engineering work, this is exactly the feature we're looking for!
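
For readers without the diff at hand, the directory convention described above can be sketched roughly as follows. This is not the PR's code: `index_csv_directory` is a hypothetical helper, and assigning labels by sorting the subdirectory names alphabetically is an assumption (it matches the `class_a` -> 0, `class_b` -> 1 mapping in the PR's docstring). It uses only the Python standard library.

```python
import tempfile
from pathlib import Path


def index_csv_directory(root):
    """Return (paths, labels, class_names) for CSV files under root/<class>/.

    Hypothetical helper: each immediate subdirectory is treated as one class,
    and integer labels follow the alphabetical order of the directory names.
    """
    root = Path(root)
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    paths, labels = [], []
    for index, name in enumerate(class_names):
        for csv_path in sorted((root / name).glob("*.csv")):
            paths.append(csv_path)
            labels.append(index)
    return paths, labels, class_names


# Build a throwaway directory tree to demonstrate the layout.
with tempfile.TemporaryDirectory() as tmp:
    layout = {"class_a": ["a1.csv", "a2.csv"], "class_b": ["b1.csv"]}
    for name, files in layout.items():
        (Path(tmp) / name).mkdir()
        for file_name in files:
            (Path(tmp) / name / file_name).write_text("t,value\n0,1.0\n")
    paths, labels, class_names = index_csv_directory(tmp)
    file_names = [p.name for p in paths]
```

Each file found this way becomes one labeled sample, which is the design choice questioned in the review.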

google-cla · 2023-11-30T11:16:50Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

fchollet · 2023-11-30T16:41:59Z

keras/utils/csv_dataset_utils.py

+    labels='inferred')`
+    will return a `tf.data.Dataset` that yields batches of csv files from
+    the subdirectories `class_a` and `class_b`, together with labels
+    0 and 1 (0 corresponding to `class_a` and 1 corresponding to `class_b`).


So 1 csv file = 1 sample? Is this a common case? Usually you have 1 row = 1 sample.
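
To make the distinction in this question concrete, here is a minimal sketch of the two conventions using only the standard library. The function names, column names, and sample values are all made up for illustration; neither function comes from the PR or from Keras.

```python
import csv
import io


def rows_as_samples(csv_text, label_column):
    """1 row = 1 sample: each row is a feature vector with its own label."""
    reader = csv.DictReader(io.StringIO(csv_text))
    features, labels = [], []
    for row in reader:
        labels.append(row.pop(label_column))
        features.append([float(value) for value in row.values()])
    return features, labels


def file_as_sample(csv_text):
    """1 file = 1 sample: the whole file is one (rows, columns) recording."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    return [[float(value) for value in row] for row in reader]


# Tabular data: the label lives in a column of the file.
tabular = "x1,x2,label\n0.1,0.2,a\n0.3,0.4,b\n"
features, labels = rows_as_samples(tabular, "label")

# Sensor recording: the label would come from outside the file,
# for example from the name of the directory that contains it.
recording = "t,strain\n0.0,1.5\n0.1,1.7\n0.2,1.6\n"
sample = file_as_sample(recording)
```

In the first case one file yields many samples; in the second, one file yields a single sample.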

fchollet · 2023-11-30T16:42:37Z

keras/utils/csv_dataset_utils.py

+
+
+def getReadings(path, stride: int = 0, head: bool = True):
+    return tf.strings.to_number(tf.strings.split(tf.strings.split(tf.io.read_file(path)), sep=","), out_type=tf.float32)[1::stride]


This line hardcodes a lot of assumptions about the data.
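
Reading the quoted line, the baked-in assumptions appear to include a comma delimiter, purely numeric `float32` content, and a first row that is always dropped. The `head` argument is never used, and the default `stride=0` gives a slice step of zero, which Python slicing rejects. One way to surface those choices is to make each of them an explicit, validated parameter. The sketch below does that with the standard library instead of TensorFlow string ops; `read_readings` and its parameters are hypothetical and are not a proposed Keras API.

```python
import csv
import io


def read_readings(csv_text, delimiter=",", has_header=True, stride=1):
    """Parse numeric CSV text into a list of float rows.

    Hypothetical helper: the delimiter, header handling, and row stride
    are arguments rather than being fixed inside the parsing expression.
    """
    if stride < 1:
        raise ValueError(f"stride must be >= 1, got {stride}")
    # Drop blank lines, which csv.reader yields as empty lists.
    rows = [row for row in csv.reader(io.StringIO(csv_text), delimiter=delimiter) if row]
    if has_header:
        rows = rows[1:]
    # Keep every `stride`-th data row and convert its fields to floats.
    return [[float(value) for value in row] for row in rows[::stride]]


text = "a;b\n1;2\n3;4\n5;6\n"
every_other = read_readings(text, delimiter=";", stride=2)
```

Here `every_other` keeps the first and third data rows. A file with non-numeric fields would still fail at `float(value)`, so handling of such columns would need its own parameter.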

fchollet

Thanks for the PR!

My primary questions here are:

Would this generalize to use cases encountered by a lot of people? Or is it closer to being a one-off for your use case?
Would those people find it intuitive to learn to use the utility for their use case?

codecov-commenter · 2023-12-01T06:54:23Z

Codecov Report

Attention: 59 lines in your changes are missing coverage. Please review.

Comparison is base (724321c) 75.57% compared to head (c6b927b) 79.25%.
Report is 9 commits behind head on master.

Files	Patch %	Lines
keras/utils/csv_dataset_utils.py	18.05%	59 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #18853      +/-   ##
==========================================
+ Coverage   75.57%   79.25%   +3.67%     
==========================================
  Files         352      337      -15     
  Lines       37066    34909    -2157     
  Branches     7225     6875     -350     
==========================================
- Hits        28014    27667     -347     
+ Misses       7357     5656    -1701     
+ Partials     1695     1586     -109

Flag	Coverage Δ
keras	`79.11% <19.17%> (+3.96%)`	⬆️
keras-jax	`60.97% <19.17%> (?)`
keras-numpy	`55.78% <19.17%> (-0.08%)`	⬇️
keras-tensorflow	`63.12% <19.17%> (-0.10%)`	⬇️
keras-torch	`63.71% <19.17%> (-0.09%)`	⬇️
keras.applications	`?`
keras.applications-numpy	`?`
keras.applications-tensorflow	`?`
keras.applications-torch	`?`

Flags with carried forward coverage won't be shown.

github-actions · 2023-12-20T01:43:17Z

This PR is stale because it has been open for 14 days with no activity. It will be closed if no further activity occurs. Thank you.

Added csv directory reading

c6b927b

google-ml-butler bot added the size:L label Nov 30, 2023

google-ml-butler bot assigned gbaned Nov 30, 2023

This was referenced Nov 30, 2023

Added csv reading tensorflow/docs#2289

Open

Add a csv/plain data input from directory tensorflow/tensorflow#62511

Closed

fchollet reviewed Nov 30, 2023

sachinprasadhs added the stat:awaiting response from contributor label Dec 5, 2023

github-actions bot added the stale label Dec 20, 2023

gbaned added this to Assigned Reviewer in PR Queue via automation Dec 20, 2023

fchollet closed this Dec 20, 2023

PR Queue automation moved this from Assigned Reviewer to Closed/Rejected Dec 20, 2023
