Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Constrain on the dataset to pull #759

Merged
merged 7 commits into from
May 12, 2022

Conversation

isabella618033
Copy link
Contributor

@isabella618033 isabella618033 commented May 3, 2022

  • Constrain on the dataset to pull from ipfs to be Books3 corpus only, so it would be faster for the github action to pass.
  • BUT it would not solve the underlying issue where by default when user trying to pull a mix of datasets from ipfs, it would still take them super long.
  • Why would other dataset take longer? Because the mountain hash (the txt file that stores the hashes of all other txt files that belongs to that dataset) of other datasets are larger. Also, some data file of other dataset is so small that we have to send a lot more requests to IPFS to collect the same amount of text.

@codecov
Copy link

codecov bot commented May 3, 2022

Codecov Report

Merging #759 (82ee114) into master (ed6a908) will increase coverage by 0.04%.
The diff coverage is 100.00%.

❗ Current head 82ee114 differs from pull request most recent head 1a12d44. Consider uploading reports for the commit 1a12d44 to get more accurate results

@@            Coverage Diff             @@
##           master     #759      +/-   ##
==========================================
+ Coverage   81.86%   81.90%   +0.04%     
==========================================
  Files          42       42              
  Lines        4984     4984              
==========================================
+ Hits         4080     4082       +2     
+ Misses        904      902       -2     
Impacted Files Coverage Δ
bittensor/_dataset/__init__.py 91.30% <100.00%> (ø)
bittensor/_dataset/dataset_impl.py 89.71% <0.00%> (ø)
bittensor/utils/networking.py 81.17% <0.00%> (+2.35%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1d915dc...1a12d44. Read the comment docs.

@joeylegere
Copy link
Contributor

Overall this LGTM, but I'd like to see a variable like mock_dataset_for_tests=["Books3"] defined in a constant.py or config.yaml or whatever so we can just import it. Then if we decide to change the dataset we use we only need to change one line of code rather than 6.

@isabella618033 isabella618033 changed the title constrain on the dataset to pull Constrain on the dataset to pull May 6, 2022
@isabella618033
Copy link
Contributor Author

Overall this LGTM, but I'd like to see a variable like mock_dataset_for_tests=["Books3"] defined in a constant.py or config.yaml or whatever so we can just import it. Then if we decide to change the dataset we use we only need to change one line of code rather than 6.

Hey, nice suggestion! I have made an update to add constant.py, but rn it is rather empty with only constants for the dataset test. Lets move other constant over in some other PR if it is good

@isabella618033 isabella618033 changed the base branch from master to nobunaga May 11, 2022 16:30
@joeylegere joeylegere merged commit 3b9b727 into nobunaga May 12, 2022
@Eugene-hu Eugene-hu mentioned this pull request May 17, 2022
@ifrit98 ifrit98 deleted the BIT-404-github_action_stuck_at_dataset branch May 24, 2023 14:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants