Possibly replace tensorflow usage in data_utils with lighter-weight tool #26

Open
cisaacstern opened this issue Aug 18, 2023 · 1 comment
Labels: enhancement (New feature or request)

Comments

cisaacstern (Contributor) commented Aug 18, 2023

The tensorflow dependency in climsim_utils.data_utils is pretty heavy, and among other things makes using that module on Mac M1 a bit cumbersome. In that module, tensorflow is actually only used for data loading:

```python
return tf.data.Dataset.from_generator(
    gen,
    output_types=(tf.float64, tf.float64),
    output_shapes=((None, 124), (None, 128)),
)
```

Per @jerrylin96, we may be able to use another approach here, which would be desirable if possible.
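For illustration only, here is one lighter-weight option. It assumes the inputs and targets already fit in memory as NumPy arrays and uses a plain Python generator over NumPy slices. The function name `iterate_batches` and the constant names are hypothetical and not part of climsim_utils. The feature widths (124 and 128) and the float64 dtype are taken from the `from_generator` call quoted above. This is not the Apache Spark approach suggested later in the thread. It only shows that the batching step by itself does not require tensorflow.

```python
from typing import Iterator, Tuple

import numpy as np

# Column counts mirror output_shapes=((None, 124), (None, 128)) in the
# tf.data snippet; the leading None there is the batch dimension.
N_INPUT_FEATURES = 124
N_OUTPUT_FEATURES = 128


def iterate_batches(
    inputs: np.ndarray, targets: np.ndarray, batch_size: int
) -> Iterator[Tuple[np.ndarray, np.ndarray]]:
    """Hypothetical NumPy-only stand-in for tf.data.Dataset.from_generator.

    Yields (input_batch, target_batch) pairs of float64 arrays, matching
    output_types=(tf.float64, tf.float64) in the original snippet. The final
    batch may be shorter than batch_size. Validation runs on the first next().
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    inputs = np.asarray(inputs, dtype=np.float64)
    targets = np.asarray(targets, dtype=np.float64)
    if inputs.ndim != 2 or inputs.shape[1] != N_INPUT_FEATURES:
        raise ValueError(
            f"inputs must have shape (n, {N_INPUT_FEATURES}), got {inputs.shape}"
        )
    if targets.shape != (inputs.shape[0], N_OUTPUT_FEATURES):
        raise ValueError(
            f"targets must have shape ({inputs.shape[0]}, {N_OUTPUT_FEATURES}), "
            f"got {targets.shape}"
        )
    for start in range(0, inputs.shape[0], batch_size):
        # Slicing past the end is safe in NumPy; it just returns fewer rows.
        stop = start + batch_size
        yield inputs[start:stop], targets[start:stop]


if __name__ == "__main__":
    x = np.zeros((10, N_INPUT_FEATURES), dtype=np.float32)
    y = np.ones((10, N_OUTPUT_FEATURES))
    for xb, yb in iterate_batches(x, y, batch_size=4):
        print(xb.shape, yb.shape, xb.dtype)
```

This generator only slices arrays that are already loaded, so it does not stream from disk. Data too large for memory would need something like `np.load(path, mmap_mode="r")` or a dedicated loader.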

cisaacstern changed the title from "Possibly replace terraform usage in data_utils with lighter-weight tool" to "Possibly replace tensorflow usage in data_utils with lighter-weight tool" on Aug 18, 2023.
cisaacstern added the "enhancement" (New feature or request) label on Aug 18, 2023.
jerrylin96 (Collaborator) commented:

I think we need to get rid of tensorflow usage entirely in data_utils.py. This would streamline the module significantly. I propose refactoring the creation of the NumPy arrays to use Apache Spark instead of the TensorFlow dataloader. Using the dataloader for preprocessing is a vestige of some really crude code MacGyvering that doesn't belong in a polished repo.

@jerrylin96 jerrylin96 pinned this issue Aug 25, 2023