Moving large files between tasks? #4476
-
Hi @stwerner97, thank you for reaching out. Flyte is absolutely fine at handling large files. You do not always need to download a file; ideally you stream it. Here are the docs for this API: https://docs.flyte.org/projects/flytekit/en/latest/generated/flytekit.types.file.FlyteFile.html#flytekit.types.file.FlyteFile.open Also, Flyte does not really move data between tasks; it only passes pointers to the data in S3 from one task to the next. Here are some more docs to understand this: https://docs.flyte.org/en/latest/concepts/data_management.html#divedeep-data-management The data does not need to fit into RAM. The best option is to stream it and, if you are using torch, to use a streaming dataloader. An example of this could be https://medium.com/speechmatics/how-to-build-a-streaming-dataloader-with-pytorch-a66dd891d9dd Docs on how to extend Flyte types: https://docs.flyte.org/projects/cookbook/en/latest/getting_started/extending_flyte.html#id1
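To make the streaming idea concrete, here is a minimal sketch of processing a file chunk by chunk so that memory use stays bounded regardless of file size. It uses only the standard library and works on any binary file-like object. The helper names `iter_chunks` and `streaming_sha256` are made up for illustration, and the `io.BytesIO` payload is a stand-in for a real remote file. The assumption is that inside a Flyte task the handle would come from the `FlyteFile.open` call documented at the link above; please check those docs for the exact usage.

```python
import hashlib
import io
from typing import BinaryIO, Iterator


def iter_chunks(handle: BinaryIO, chunk_size: int = 8 * 1024 * 1024) -> Iterator[bytes]:
    """Yield successive chunks from a binary file-like object until EOF."""
    while True:
        chunk = handle.read(chunk_size)
        if not chunk:
            break
        yield chunk


def streaming_sha256(handle: BinaryIO, chunk_size: int = 8 * 1024 * 1024) -> str:
    """Hash a stream chunk by chunk; only one chunk is held in memory at a time."""
    digest = hashlib.sha256()
    for chunk in iter_chunks(handle, chunk_size):
        digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Stand-in for a large remote file. In a Flyte task the handle would
    # (by assumption) come from something like `with flyte_file.open("rb") as handle:`.
    payload = b"x" * 1000
    print(streaming_sha256(io.BytesIO(payload), chunk_size=64))
```

The same pattern (read a bounded chunk, process it, discard it) is what a streaming dataloader does for training data, just with samples instead of raw bytes.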
-
Thanks for the extensive answer @kumare3! 🙆‍♂️ That makes a lot of sense, although I sometimes need to deal with data sources that do not support streaming. Is it possible to use persistent volumes or another solution for such cases in Flyte?
-
Hi, I am looking into Flyte at the moment, but am unsure if Flyte is suited for me and would like to get some guidance from the community 😄
I want to use Flyte, among other things, for training neural networks on small- to medium-sized datasets of up to ~1-3 TB.
Is it possible to move files of this size around between tasks? How would I go about training a neural network then? Assuming the data is stored in an S3 bucket, I would need to download the data first and then start the training, right? Is it possible to load the data onto some persistent volume, or is it essential that the data fits into RAM? Is it possible to share a filesystem between two tasks, or are tasks isolated from each other?