Constrain on the dataset to pull #759
isabella618033 commented May 3, 2022
- Constrain the dataset pulled from IPFS to the Books3 corpus only, so that the GitHub Action finishes faster and passes.
- However, this does not solve the underlying issue: by default, when a user pulls a mix of datasets from IPFS, it still takes a very long time.
- Why do other datasets take longer? Their mountain hashes (the txt file that stores the hashes of all the other txt files belonging to a dataset) are larger. In addition, some data files in other datasets are so small that many more requests must be sent to IPFS to collect the same amount of text.
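To make the second reason concrete, here is a rough back-of-the-envelope sketch in Python. It assumes one request to fetch a dataset's mountain hash plus one request per text file, and it uses made-up average file sizes (`AVG_FILE_SIZE`, the `"SmallDocs"` dataset, and both helper functions are hypothetical, not part of the project). It only models the request count; the larger mountain-hash downloads mentioned above would add further time on top of this.

```python
import math

# Hypothetical average text-file sizes in bytes; illustrative values only,
# not measurements from the project's IPFS datasets.
AVG_FILE_SIZE = {
    "Books3": 500_000,
    "SmallDocs": 5_000,
}


def requests_needed(dataset: str, target_bytes: int) -> int:
    """Estimate IPFS requests to gather `target_bytes` of text from one dataset.

    Assumes one request for the dataset's mountain hash (the file listing the
    hashes of its text files) plus one request per text file fetched.
    """
    files = math.ceil(target_bytes / AVG_FILE_SIZE[dataset])
    return 1 + files


def total_requests(datasets: list[str], target_bytes: int) -> int:
    """Estimate total requests when the target is split evenly across datasets."""
    per_dataset = math.ceil(target_bytes / len(datasets))
    return sum(requests_needed(d, per_dataset) for d in datasets)


if __name__ == "__main__":
    # Books3 only: 1 mountain-hash request + 20 large files.
    print(total_requests(["Books3"], 10_000_000))
    # Mixed: the small-file dataset alone needs 1,000 file requests.
    print(total_requests(["Books3", "SmallDocs"], 10_000_000))
```

Under these assumed sizes, pulling 10 MB from Books3 alone takes 21 requests, while splitting it with the small-file dataset takes 1,012, which is why restricting the CI job to Books3 speeds it up without fixing the mixed-dataset case.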
Codecov Report
```diff
@@            Coverage Diff             @@
##           master     #759      +/-   ##
==========================================
+ Coverage   81.86%   81.90%   +0.04%
==========================================
  Files          42       42
  Lines        4984     4984
==========================================
+ Hits         4080     4082       +2
+ Misses        904      902       -2
```
Continue to review full report at Codecov.
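The rows of the coverage diff are related by simple arithmetic: misses are lines minus hits, and coverage is hits divided by lines. The small Python sketch below (the helper names are illustrative only) recomputes the reported figures from the Hits and Lines rows.

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage as a percentage, rounded to two decimals like the report."""
    return round(100 * hits / lines, 2)


def misses(hits: int, lines: int) -> int:
    """Uncovered lines are the tracked lines that were not hit."""
    return lines - hits


LINES = 4984
before = coverage_pct(4080, LINES)  # master
after = coverage_pct(4082, LINES)   # #759

if __name__ == "__main__":
    print(before, after, round(after - before, 2))
    print(misses(4080, LINES), misses(4082, LINES))
```

Two extra hit lines out of 4,984 account for the +0.04% change; the line count itself is unchanged.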
Overall this LGTM, but I'd like to see a variable like
Hey, nice suggestion! I have made an update to add