read data from hdfs #1
Distributing data to the cluster is not supported in PaddlePaddle yet. You can read data directly from an HDFS file path via PyDataProvider2. PaddlePaddle does not handle fetching the data file remotely; it simply passes the file path string into a Python function. It is the user's job to OPEN that file (or SQL connection string, or HDFS path) and read each record. Contributions of a script that distributes data to the cluster are welcome, or we may add this ourselves if the feature turns out to be widely needed.
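A minimal sketch of the pattern described above: the framework passes the file path string through unchanged, and the user-supplied provider function is responsible for opening it. The function name `process` and the tab-separated record format are illustrative assumptions, and a plain local file stands in for an HDFS path; in real use you would open an `hdfs://` URI inside the function with an HDFS client library of your choice.

```python
import os
import tempfile

def process(settings, file_path):
    """User-defined provider: open the path yourself and yield one record per line.

    In a cluster setup, replace the plain open() with an HDFS client call;
    PaddlePaddle only hands you the path string, it does not fetch the file.
    """
    with open(file_path) as f:  # swap in an HDFS client read here
        for line in f:
            label, text = line.rstrip("\n").split("\t", 1)
            yield int(label), text

# Demo with a stand-in local file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("0\tnegative example\n1\tpositive example\n")
    path = tmp.name

records = list(process(settings=None, file_path=path))
os.unlink(path)
print(records)
```

The key design point is that the provider is just a generator over records, so any source that can be opened from Python (local disk, HDFS, a database cursor) plugs in without framework changes.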
Since you haven't replied for more than a year, we have closed this issue/PR.
"Different node should owns different parts of all Train data. This simple script did not do this job, so you should prepare it at last. " I saw this in cluster training wiki. So, could paddle read data from hdfs and distribute data to each node automatically?