Parallel download #11

Merged
ferponcem merged 29 commits into main from parallel_download
Sep 3, 2024

Conversation

@ferponcem
Contributor

Make the IBC data fetching parallel. This is not ready yet!

@ferponcem
Contributor Author

So far I have:

  • added joblib to handle single-file downloads
  • adjusted tqdm to get a better estimate of progress
  • modified some functions to work accordingly
  • added joblib to the dependencies
  • added ebrains-token to the gitignore list
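
As a rough illustration of the first two items (not the code in this PR): the pattern is to wrap the download of one file in a function, fan it out over a few workers, and tick a progress counter as each file finishes. The PR uses joblib and tqdm; the sketch below swaps in `concurrent.futures` from the standard library so that it runs without extra dependencies. `fetch_one`, `fetch_all`, `on_progress` and the file names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_one(remote_name):
    """Hypothetical stand-in for downloading a single file.

    A real version would stream the file from the server to disk;
    here we only derive a local name so the sketch runs offline.
    """
    return remote_name, "local/" + remote_name.replace("/", "_")


def fetch_all(remote_names, n_jobs=2, on_progress=None):
    """Fetch files with n_jobs worker threads, reporting progress per file."""
    results = {}
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        futures = [pool.submit(fetch_one, name) for name in remote_names]
        # as_completed yields each future as soon as its file is done,
        # so progress advances per file rather than per batch.
        for done, future in enumerate(as_completed(futures), start=1):
            remote, local = future.result()
            results[remote] = local
            if on_progress is not None:
                on_progress(done, len(futures))
    return results


files = ["sub-01/a.nii.gz", "sub-01/b.tsv", "sub-02/a.nii.gz"]
progress = []
downloaded = fetch_all(
    files, n_jobs=2, on_progress=lambda done, total: progress.append((done, total))
)
```

Downloads are mostly I/O bound, so threads are a reasonable starting point; whether more workers actually help depends on the server and the network.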

Some issues:

  • it is still very slow!

Next:

  • expand the ibc.filter_data options to also filter by file extension, which will make testing faster
  • test options for optimization
  • test in a different environment to see if some issues are related to my working env or if new issues arise
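
To make the first "Next" item concrete, here is a minimal sketch of filtering by file extension. It is only an assumption about what such an option could do; `filter_by_extension` is a hypothetical helper working on plain file names, not part of `ibc.filter_data`.

```python
def filter_by_extension(file_names, extensions):
    """Keep only the file names ending with one of the given extensions."""
    # Accept both "tsv" and ".tsv"; str.endswith takes a tuple of suffixes,
    # which also handles double extensions such as ".nii.gz".
    wanted = tuple(ext if ext.startswith(".") else "." + ext for ext in extensions)
    return [name for name in file_names if name.endswith(wanted)]


files = ["sub-01/a.nii.gz", "sub-01/b.tsv", "sub-01/c.json", "sub-02/a.nii.gz"]
small_files = filter_by_extension(files, ["tsv", ".json"])
```

Restricting a test run to small text files like this keeps each download short while still exercising the same code path.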

Member

@bthirion bthirion left a comment

Thx! LMK when it's ready.

Comment thread examples/example.py Outdated
#%%
import pdb
import sys
sys.path.append('/home/fer/HBP_IBC/api/src/ibc_api')
Member

You should not have to do that?

Contributor

@ferponcem, you should first install the package via pip as mentioned in the README and then run the example.py.

You wouldn't have to do this then.
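
A quick way to confirm that the installed package is being picked up, rather than a path added by hand, is to ask Python where it would import it from. This is a minimal sketch using only the standard library; `ibc_api` is the package name taken from the `src/ibc_api` path in this PR, and `where_is` is a hypothetical helper.

```python
import importlib.util


def where_is(package_name):
    """Return the file a top-level package would be imported from, or None if it is not found."""
    spec = importlib.util.find_spec(package_name)
    return None if spec is None else spec.origin


# After installing the package with pip, this is expected to print a path;
# if it prints None, the package is not installed in this environment.
print(where_is("ibc_api"))
```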

Comment thread src/ibc_api/utils.py Outdated
return remote_file_names, local_file_names


# def _update_local_db(db_file, files_data):
Member

remove deprecated lines

Comment thread src/ibc_api/utils.py Outdated
updated local database
"""

# file_names = [file_data[0] for file_data in files_data]
Member

same here

ferponcem and others added 3 commits July 30, 2024 17:27
@man-shu
Contributor

man-shu commented Jul 31, 2024

@ferponcem, I just realized I wasn't watching the repo, so I did not get notified about this.
Let me know if you need a review or some help with this :)

…by_download

fix: curl -> request for metadata
@bthirion
Member

bthirion commented Aug 7, 2024

Hello, what is the status of this PR?

@ferponcem
Contributor Author

It is ready for review now; I'm preparing a small example.

@ferponcem ferponcem marked this pull request as ready for review August 8, 2024 15:47
Contributor

@man-shu man-shu left a comment

I couldn't test it yet, maybe update the example.py appropriately?

ferponcem and others added 6 commits August 23, 2024 16:25
Co-authored-by: Himanshu Aggarwal <himanshuaggarwal1997@gmail.com>
Co-authored-by: Himanshu Aggarwal <himanshuaggarwal1997@gmail.com>
Co-authored-by: Himanshu Aggarwal <himanshuaggarwal1997@gmail.com>
Co-authored-by: Himanshu Aggarwal <himanshuaggarwal1997@gmail.com>
Co-authored-by: Himanshu Aggarwal <himanshuaggarwal1997@gmail.com>
@ferponcem
Contributor Author

Thank you both for the feedback! I've corrected the example in examples/example.py 😸
For testing, just make sure you check out this branch and run the install command wherever you've cloned the repo:

cd ~/api
git checkout parallel_download
pip install -e .
python examples/example.py

@bthirion
Member

Thx!
When running it, I hit an issue:

Found 36 files to download.
... Fetching token and connecting to EBRAINS ...
***
To continue, please go to https://iam.ebrains.eu/auth/realms/hbp/device?user_code=GNTL-FWKT
***
[siibra:ERROR] exceeded max attempts: 12, aborting...
Traceback (most recent call last):
  File "/home/bertrandthirion/mygit/ibc/api/examples/example.py", line 13, in <module>
    downloaded_db = ibc.download_data(filtered_db, n_jobs=2)
  File "/home/bertrandthirion/mygit/ibc/api/src/ibc_api/utils.py", line 444, in download_data
    connector = _connect_ebrains(data_type)
  File "/home/bertrandthirion/mygit/ibc/api/src/ibc_api/utils.py", line 92, in _connect_ebrains
    token_file = _authenticate()
  File "/home/bertrandthirion/mygit/ibc/api/src/ibc_api/utils.py", line 55, in _authenticate
    siibra.fetch_ebrains_token()
  File "/home/bertrandthirion/.local/lib/python3.10/site-packages/siibra/retrieval/requests.py", line 281, in fetch_token
    cls.device_flow(**kwargs)
  File "/home/bertrandthirion/.local/lib/python3.10/site-packages/siibra/retrieval/requests.py", line 348, in device_flow
    raise EbrainsAuthenticationError(message)
siibra.retrieval.exceptions.EbrainsAuthenticationError: exceeded max attempts: 12, aborting...

@man-shu
Contributor

man-shu commented Aug 26, 2024

Thx! When running it, I hit an issue:

[siibra:ERROR] exceeded max attempts: 12, aborting...

This happens when it takes you a while to log in and verify your EBRAINS account (it could be due to a forgotten password etc.). You can simply log into your EBRAINS account beforehand and run the example.py again, click on the link, and verify quickly.
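
To illustrate why a slow login ends in this error: a device-flow client typically polls the auth server a fixed number of times and gives up if the user has not verified by then. The sketch below shows that general pattern only. It is not siibra's implementation; `poll_until_verified`, `is_verified` and the interval are hypothetical, and only the limit of 12 attempts comes from the log above.

```python
import time


def poll_until_verified(is_verified, max_attempts=12, interval=0.0):
    """Call is_verified() up to max_attempts times, pausing between tries."""
    for attempt in range(1, max_attempts + 1):
        if is_verified():
            return attempt
        time.sleep(interval)
    raise TimeoutError(f"exceeded max attempts: {max_attempts}, aborting...")


# Simulated user who finishes logging in by the third poll.
answers = iter([False, False, True])
attempts_needed = poll_until_verified(lambda: next(answers))

# Simulated user who never verifies in time.
try:
    poll_until_verified(lambda: False)
    timed_out = False
except TimeoutError:
    timed_out = True
```

With a fixed number of polls, the total time available to log in is bounded, which is why being logged in beforehand makes the verification step fit inside the window.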

The example runs fine for me! Thanks @ferponcem!

Contributor

@man-shu man-shu left a comment

LGTM!

@bthirion
Member

I can't get it to work. I would need to do it with you.
It can be merged in the meantime, I think.

Member

@bthirion bthirion left a comment

Works on my side.

@ferponcem
Contributor Author

Thank you both for the feedback! Merging now 🚀

@ferponcem ferponcem merged commit 6e5cc23 into main Sep 3, 2024