transfer LLC4320 zarr data to SciServer #1

Open
rabernat opened this issue Oct 12, 2018 · 12 comments

@rabernat
Member

All of the data is cataloged here, with links to google cloud:
http://pangeo.io/catalog.html

@rabernat
Member Author

Specifically, one will use gsutil with a command like

gsutil -m cp -r gcs://pangeo-data/llc4320_surface/SST .

@rabernat
Member Author

Hi @glemson, have you given this a try?

@glemson
Collaborator

glemson commented Oct 17, 2018 via email

@glemson
Collaborator

glemson commented Dec 13, 2018

Problem with gsutil, most likely due to incorrect installation.
$ filedb02:/srv/data01/ocean/LLC4320:50$ ~/gsutil/gsutil -m cp -r gcs://pangeo-data/llc4320_surface/SST .
InvalidUrlError: Unrecognized scheme "gcs".
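A note on the error above: it is consistent with gsutil accepting `gs://` as the Cloud Storage URL scheme rather than `gcs://`, so it may not be an installation problem. A minimal sketch, assuming the only change needed is the scheme; `to_gs_url` is a hypothetical helper, not part of gsutil:

```shell
# Hypothetical helper: rewrite a gcs:// URL to the gs:// scheme that gsutil recognizes.
to_gs_url() {
  case "$1" in
    gcs://*) printf 'gs://%s\n' "${1#gcs://}" ;;  # swap the scheme, keep bucket and path
    *)       printf '%s\n' "$1" ;;                # leave any other URL untouched
  esac
}

src=$(to_gs_url "gcs://pangeo-data/llc4320_surface/SST")
echo "$src"

# The transfer itself would then be (not run here; needs gsutil and network access):
# gsutil -m cp -r "$src" .
```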

@rabernat
Member Author

rabernat commented Dec 13, 2018 via email

@glemson
Collaborator

glemson commented Dec 13, 2018 via email

@glemson
Collaborator

glemson commented Dec 14, 2018

The files have been downloaded to /SciServer/filedb02-01/ocean/LLC4320.
I asked the sysadmins to create a Linux group 'poseidon' that should have access to this data; all or most of us will be members of that group. The problem is that I currently cannot chown to poseidon, because apparently I am in too many groups. We'll solve that somehow.

@Mikejmnez

Hello!
Just adding to this thread. Only SST has been transferred to filedb02, and on SciServer we are still missing the rest of the surface variables available on the Pangeo cloud, along with the GRID data. Now that I have access to the group directory, I will finish downloading the rest of the data.

@Mikejmnez

@rabernat
I tried downloading the rest of the surface variables onto SciServer, but was only able to download SSS and the grid. When I try SSH, SSU, SSV or grid instead of SSS (or even SST), I get the following error:

-bash-4.2$ ~/gsutil/gsutil -m cp -r gs://pangeo-data/llc4320_surface/SSH .
CommandException: No URLs matched: gs://pangeo-data/llc4320_surface/SSH
CommandException: 1 file/object could not be transferred.

I get the same error for SSU and SSV (I also tried lower case). Do you know what the correct URLs are for these variables?
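One way to check which names actually exist under the prefix is to list it with `gsutil ls` before copying. A minimal sketch, assuming the variables live directly under `llc4320_surface`; `list_prefix_cmd` is a hypothetical helper that only prints the command, so it can be inspected before it is run:

```shell
# Hypothetical helper: print the gsutil command that lists what sits directly under a prefix.
list_prefix_cmd() {
  # Drop any trailing slash, then add exactly one so the prefix contents are listed
  printf 'gsutil ls %s/\n' "${1%/}"
}

list_prefix_cmd "gs://pangeo-data/llc4320_surface"

# To run the listing for real (needs gsutil and network access):
# gsutil ls gs://pangeo-data/llc4320_surface/
```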

@rabernat
Member Author

All the info is cataloged here:
https://github.com/pangeo-data/pangeo-datastore/blob/master/intake-catalogs/ocean/llc4320.yaml

However, we need to pause these downloads for a moment. Your transfers have racked up nearly $2000 in egress fees in the past few days.

If I put the bucket into requester-pays mode, can you pay the remaining egress fees from your account? I think you should have lots of money in your CSSI budget for this. (We only have $5K a year.)
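For reference, with a requester-pays bucket the downloader names a billing project through gsutil's top-level `-u` option. A minimal sketch, assuming a placeholder project ID (`my-billing-project` is hypothetical, as is the `requester_pays_cp_cmd` helper, which only prints the command):

```shell
# Hypothetical billing project ID; replace with a project the downloader is allowed to bill.
BILLING_PROJECT="my-billing-project"

# Hypothetical helper: print a recursive, parallel copy command billed to the given project.
requester_pays_cp_cmd() {
  # $1 = billing project, $2 = source URL, $3 = destination
  printf 'gsutil -u %s -m cp -r %s %s\n' "$1" "$2" "$3"
}

requester_pays_cp_cmd "$BILLING_PROJECT" "gs://pangeo-data/llc4320_surface/SSS" "."
```

With `-u`, the egress charges for the request go to the named project rather than to the bucket owner.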

-Ryan

@Mikejmnez

Mikejmnez commented Nov 26, 2019 via email

@rabernat
Member Author

Globus will not solve the problem. But I thought that this deal was supposed to help with the egress situation:
https://www.internet2.edu/blogs/detail/14984

I'll ping our contacts at Google for more info.

cc @lila
