Conflicting opinions about streaming data from cloud storage? #30

Closed
hacobe opened this issue Feb 24, 2024 · 2 comments

Comments

@hacobe

hacobe commented Feb 24, 2024

(1) and (2) seem to express different opinions:

  1. In the "3 Machine Learning IO needs" section, one of the bullet points under "Incoming suggestions from Ross Wightman to integrate" is "Note that once your datasets are optimally friendly for a large, distributed network filesystem, they can usually just be streamed from bucket storage in cloud systems that have that option. So better to move them off the network filesystem in that case."

  2. The section "Local storage beats cloud storage" starts with "While cloud storage is cheaper the whole idea of fetching and processing your training data stream dynamically at training time is very problematic with a huge number of issues around it...It’s so much better to have enough disk space locally for data loading."

What am I missing?

@stas00
Owner

stas00 commented Feb 26, 2024

Thank you very much for pointing out the incongruity, @hacobe - I have fixed it here.

But basically these are 2 different opinions by 2 different people. I moved Ross' suggestions to the incoming section so that I could integrate them properly later. I shouldn't have dumped them into the main text as is.

The bottom line is that I have yet to find a good streaming solution, and that's my experience. Ross seems to have had a working streaming solution, but we have been doing very different things, so possibly both approaches are valid.

stas00 closed this as completed Feb 26, 2024
@hacobe
Author

hacobe commented Feb 26, 2024

Thanks!
