
Add support for uploading and fetching data archives from a cloud storage service (like Amazon S3 buckets) #6

Open
leoank opened this issue Mar 12, 2020 · 8 comments

Comments

@leoank
Member

leoank commented Mar 12, 2020

Problem

To test InterMines on a CI platform like Travis, we need to build InterMines from scratch, which takes a lot of time.

Outline of a solution

We want intermine_boot to upload data archives to a cloud storage service.
Start with AWS S3 buckets, but implement this in a way that makes it easy to add support for new cloud backends.
These uploaded archives will be reused on the CI platform to reduce build times.

Note: This issue may take a serious amount of time and effort to solve.
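The outline asks for S3 first but with room for other backends. One way to get that, sketched here under assumptions, is a small abstract interface that each cloud backend implements. `StorageBackend`, `S3Backend`, `LocalDirBackend`, `BACKENDS` and `get_backend` are hypothetical names, not part of intermine_boot; the S3 class assumes boto3 is installed and AWS credentials are configured, and uses boto3's `upload_file`/`download_file` client methods.

```python
import os
import shutil
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical interface that every cloud backend would implement."""

    @abstractmethod
    def upload(self, local_path: str, key: str) -> None:
        """Upload the file at local_path under the given key."""

    @abstractmethod
    def download(self, key: str, local_path: str) -> None:
        """Download the object stored under key to local_path."""


class S3Backend(StorageBackend):
    """AWS S3 backend (assumes `pip install boto3` and configured AWS credentials)."""

    def __init__(self, bucket: str):
        import boto3  # imported here so the other backends work without boto3 installed

        self.bucket = bucket
        self.client = boto3.client("s3")

    def upload(self, local_path: str, key: str) -> None:
        self.client.upload_file(local_path, self.bucket, key)

    def download(self, key: str, local_path: str) -> None:
        self.client.download_file(self.bucket, key, local_path)


class LocalDirBackend(StorageBackend):
    """Stores archives in a local directory; useful for tests without cloud access."""

    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def upload(self, local_path: str, key: str) -> None:
        shutil.copyfile(local_path, os.path.join(self.root, key))

    def download(self, key: str, local_path: str) -> None:
        shutil.copyfile(os.path.join(self.root, key), local_path)


# Registry of available backends: supporting a new cloud means adding one entry.
BACKENDS = {"s3": S3Backend, "local": LocalDirBackend}


def get_backend(name: str, target: str) -> StorageBackend:
    """Instantiate a backend by name; `target` is a bucket name or a directory."""
    if name not in BACKENDS:
        raise ValueError(f"unknown storage backend: {name!r}")
    return BACKENDS[name](target)
```

Adding another provider would then mean writing one more subclass and registering it in `BACKENDS`, while the local-directory backend lets the rest of the code be exercised without cloud credentials.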

@heralden
Member

As a note, these archives are currently created in $XDG_DATA_HOME/intermine_boot (on Linux this is ~/.local/share/intermine_boot). Currently the filenames correspond to the names of the Docker containers, but we plan to add versioning to the filenames in the future.
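A minimal sketch of how that location can be resolved, following the XDG Base Directory fallback to ~/.local/share when $XDG_DATA_HOME is unset or empty. The function names and the `.tar.gz` suffix are assumptions for illustration; the comment above only says that the filenames match the container names.

```python
import os
from pathlib import Path


def archive_dir() -> Path:
    """Directory where intermine_boot keeps its data archives.

    Uses $XDG_DATA_HOME if it is set to a non-empty value, otherwise
    falls back to ~/.local/share, as the XDG Base Directory spec says.
    """
    data_home = os.environ.get("XDG_DATA_HOME") or Path.home() / ".local" / "share"
    return Path(data_home) / "intermine_boot"


def archive_path(container_name: str, suffix: str = ".tar.gz") -> Path:
    """Archive path for one Docker container (the default suffix is an assumption)."""
    return archive_dir() / f"{container_name}{suffix}"
```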

@22PoojaGaur
Member

Hi, I would like to take up this issue.
Some questions:

  1. How would storing data archives help? (Would Travis tests interact directly with the latest InterMine archive?)
  2. Does the org have access to an AWS S3 bucket?

@leoank
Member Author

leoank commented Mar 19, 2020

How would storing data archives help? (Would Travis tests interact directly with the latest InterMine archive?)

On Travis, intermine_boot will fetch the archives from cloud storage and mount them inside Docker containers. This will drastically reduce build times on CI.
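A rough sketch of the CI-side flow described here: download an archive only if it is not already cached, then unpack it into a directory that can be bind-mounted into a container. It assumes the archives are gzipped tarballs and takes the actual download as a callable (for example an S3 download), so it does not depend on any particular backend; `fetch_archive` and `extract_for_mount` are hypothetical names.

```python
import os
import tarfile
from pathlib import Path
from typing import Callable


def fetch_archive(name: str, cache_dir: Path,
                  download: Callable[[str, Path], None]) -> Path:
    """Return the local path of archive `name`, downloading it only if missing."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    local = cache_dir / name
    if not local.exists():
        tmp = local.with_name(local.name + ".part")
        download(name, tmp)  # any backend's download function fits here
        # Rename only after the download completes, so an interrupted
        # download never shows up as a valid cached archive.
        os.replace(tmp, local)
    return local


def extract_for_mount(archive: Path, mount_dir: Path) -> Path:
    """Unpack a .tar.gz archive into the directory to be mounted into a container."""
    mount_dir.mkdir(parents=True, exist_ok=True)
    with tarfile.open(str(archive), "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: the "data" filter rejects absolute paths and "..".
            tar.extractall(str(mount_dir), filter="data")
        else:
            tar.extractall(str(mount_dir))
    return mount_dir
```

The resulting directory could then be passed to Docker as a bind mount (`docker run -v /host/dir:/container/dir ...`); which path each intermine_boot container expects is not specified in this thread.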

Does the org have access to an AWS S3 bucket?

Yes, but unfortunately we cannot give you access to it right now. Amazon S3 is free to use for one year if you create a new AWS account, so you can test your implementation with your own account.

@shreyagupta30

Hi @uosl!
Can I work on this issue?

@heralden
Member

@shreyagupta30 Yes, please go ahead!

@22PoojaGaur
Member

Hi, I am trying to open a PR for this, but I get this error:
fatal: unable to access 'https://github.com/intermine/intermine_boot.git/': The requested URL returned error: 403
Do we have to fork the repository and then create a pull request?

@22PoojaGaur
Member

22PoojaGaur commented Mar 26, 2020

@uosl
I added a PR for uploading to an S3 bucket: #17
Sorry for the delay. I had to wait for my AWS Educate account to be approved, and later I was stuck in transit due to COVID-19.

@niveditarufus
Contributor

@uosl @leoank
I added a PR for uploading to and downloading from an S3 bucket: #20. I am doing versioning by modifying the file names rather than using the metadata or the version ID. AWS lets us specify metadata when uploading, but only the version ID when downloading, so I found no obvious way to map the metadata to a version ID, which is what we would need to download the file with the correct version.
The allowed arguments for upload and download are specified here: link. Please review and suggest improvements.
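To make filename-based versioning concrete, here is a sketch with a hypothetical naming scheme `<container>-v<version>.tar.gz` and integer versions; the actual scheme in #20 may differ. With versions encoded in the names, picking the latest archive only needs a listing of the bucket's keys, with no S3 metadata or version IDs involved.

```python
import re
from typing import Iterable, Optional

# Hypothetical naming scheme: "<container>-v<version>.tar.gz", e.g. "postgres-v3.tar.gz".
_PATTERN = re.compile(r"^(?P<name>.+)-v(?P<version>\d+)\.tar\.gz$")


def versioned_key(container: str, version: int) -> str:
    """Build the object key (filename) for one version of a container's archive."""
    return f"{container}-v{version}.tar.gz"


def latest_key(container: str, keys: Iterable[str]) -> Optional[str]:
    """Return the highest-versioned key for `container`, or None if there is none."""
    best: Optional[str] = None
    best_version = -1
    for key in keys:
        m = _PATTERN.match(key)
        if m and m.group("name") == container:
            version = int(m.group("version"))  # compare numerically, not as text
            if version > best_version:
                best, best_version = key, version
    return best
```

Parsing the version as an integer matters: compared as strings, `v10` would sort before `v9`. With boto3, the keys could come from the client's `list_objects_v2` call, which returns at most 1,000 keys per request.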


5 participants