Constrain on the dataset to pull #759

isabella618033 · 2022-05-03T17:04:45Z

Constrain on the dataset to pull from ipfs to be Books3 corpus only, so it would be faster for the github action to pass.
BUT it would not solve the underlying issue where by default when user trying to pull a mix of datasets from ipfs, it would still take them super long.
Why would other dataset take longer? Because the mountain hash (the txt file that stores the hashes of all other txt files that belongs to that dataset) of other datasets are larger. Also, some data file of other dataset is so small that we have to send a lot more requests to IPFS to collect the same amount of text.

codecov · 2022-05-03T17:35:29Z

Codecov Report

Merging #759 (82ee114) into master (ed6a908) will increase coverage by 0.04%.
The diff coverage is 100.00%.

❗ Current head 82ee114 differs from pull request most recent head 1a12d44. Consider uploading reports for the commit 1a12d44 to get more accurate results

@@            Coverage Diff             @@
##           master     #759      +/-   ##
==========================================
+ Coverage   81.86%   81.90%   +0.04%     
==========================================
  Files          42       42              
  Lines        4984     4984              
==========================================
+ Hits         4080     4082       +2     
+ Misses        904      902       -2

Impacted Files	Coverage Δ
bittensor/_dataset/__init__.py	`91.30% <100.00%> (ø)`
bittensor/_dataset/dataset_impl.py	`89.71% <0.00%> (ø)`
bittensor/utils/networking.py	`81.17% <0.00%> (+2.35%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1d915dc...1a12d44. Read the comment docs.

joeylegere · 2022-05-06T03:14:29Z

Overall this LGTM, but I'd like to see a variable like mock_dataset_for_tests=["Books3"] defined in a constant.py or config.yaml or whatever so we can just import it. Then if we decide to change the dataset we use we only need to change one line of code rather than 6.

isabella618033 · 2022-05-06T19:00:44Z

Overall this LGTM, but I'd like to see a variable like mock_dataset_for_tests=["Books3"] defined in a constant.py or config.yaml or whatever so we can just import it. Then if we decide to change the dataset we use we only need to change one line of code rather than 6.

Hey, nice suggestion! I have made an update to add constant.py, but rn it is rather empty with only constants for the dataset test. Lets move other constant over in some other PR if it is good

constrain on the dataset to pull

82ee114

isabella618033 changed the title ~~constrain on the dataset to pull~~ Constrain on the dataset to pull May 6, 2022

isabella618033 added 2 commits May 6, 2022 14:48

added config file for test

fea51c9

constant.py approach instead

be473e0

isabella618033 and others added 3 commits May 10, 2022 14:38

Merge branch 'master' into BIT-404-github_action_stuck_at_dataset

1a12d44

import error fix (#767)

efa3110

Merge branch 'master' into BIT-404-github_action_stuck_at_dataset

7c481da

isabella618033 changed the base branch from master to nobunaga May 11, 2022 16:30

joeylegere approved these changes May 11, 2022

View reviewed changes

Merge branch 'nobunaga' into BIT-404-github_action_stuck_at_dataset

2be54b3

joeylegere merged commit 3b9b727 into nobunaga May 12, 2022

Eugene-hu mentioned this pull request May 17, 2022

Nobunaga #765

Merged

ifrit98 deleted the BIT-404-github_action_stuck_at_dataset branch May 24, 2023 14:43

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Constrain on the dataset to pull #759

Constrain on the dataset to pull #759

isabella618033 commented May 3, 2022 •

edited

Loading

codecov bot commented May 3, 2022 •

edited

Loading

joeylegere commented May 6, 2022

isabella618033 commented May 6, 2022

Constrain on the dataset to pull #759

Constrain on the dataset to pull #759

Conversation

isabella618033 commented May 3, 2022 • edited Loading

codecov bot commented May 3, 2022 • edited Loading

Codecov Report

joeylegere commented May 6, 2022

isabella618033 commented May 6, 2022

isabella618033 commented May 3, 2022 •

edited

Loading

codecov bot commented May 3, 2022 •

edited

Loading