Problem with bashdatacatalog-list #2

Open
pkasibhatla opened this issue Aug 4, 2022 · 9 comments
Labels
category: Bug Something isn't working

Comments

@pkasibhatla

I am trying to download the chem input files for GCHP 13.4. The command
bashdatacatalog-fetch InputDataCatalogs/13.4/ChemistryInputs.csv
from my ExtData directory seems to work fine. Output is attached below.

But when I give the command
bashdatacatalog-list -am -r 2019-06-30,2019-08-02 -f xargs-curl InputDataCatalogs/13.4/ChemistryInputs.csv | xargs curl
I get a bunch of numbers on the screen. Here are the first few lines of what I see:

(gchp-openmpi-env) psk9@dcc-login-01  /work/psk9/Data/ExtData $ bashdatacatalog-list -am -r 2019-06-30,2019-08-02 -f xargs-curl InputDataCatalogs/13.4/ChemistryInputs.csv | xargs curl
curl: option --write-out: requires parameter
curl: try 'curl --help' for more information
curl: option --write-out: requires parameter
curl: try 'curl --help' for more information
curl: option --url: requires parameter
curl: try 'curl --help' for more information
curl: option --url: requires parameter
curl: try 'curl --help' for more information
curl: option --url: requires parameter
curl: try 'curl --help' for more information
curl: option -o: requires parameter
curl: try 'curl --help' for more information
curl: option --write-out: requires parameter
curl: try 'curl --help' for more information
8.621e-28 8.621e-28 8.621e-28 1.526e-26 1.526e-26 2.224e-25 2.224e-25 2.224e-25 2.700e-24 2.700e-24 1.037e-25 1.037e-25 3.833e-28 3.833e-28 3.833e-28 1.275e-30 1.275e-30 4.538e-33 4.538e-33 9.746e-35 9.746e-35 9.746e-35 1.645e-35 1.645e-35 0.000e+00 0.000e+00 0.000e+00 1.168e-36 1.168e-36 5.806e-35 5.806e-35 1.690e-34 1.690e-34 1.690e-34 3.137e-34 3.137e-34 2.057e-34 2.057e-34 4.358e-35 4.358e-35 4.358e-35 2.697e-37 2.697e-37 0.000e+00 0.000e+00 0.000e+00 
1.507e-28 1.507e-28 1.507e-28 7.446e-27 7.446e-27 4.232e-25 4.232e-25 4.232e-25 2.522e-23 2.522e-23 7.689e-25 7.689e-25 2.125e-27 2.125e-27 2.125e-27 5.511e-30 5.511e-30 1.354e-32 1.354e-32 1.325e-34 1.325e-34 1.325e-34 2.230e-36 2.230e-36 0.000e+00 0.000e+00 0.000e+00 8.795e-36 8.795e-36 1.102e-34 1.102e-34 3.159e-34 3.159e-34 3.159e-34 7.244e-34 7.244e-34 4.216e-34 4.216e-34 1.184e-35 1.184e-35 1.184e-35 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 
5.333e-28 5.333e-28 5.333e-28 5.029e-26 5.029e-26 4.204e-24 4.204e-24 4.204e-24 2.432e-22 2.432e-22 5.353e-24 5.353e-24 9.517e-27 9.517e-27 9.517e-27 1.452e-29 1.452e-29 2.247e-32 2.247e-32 4.036e-35 4.036e-35 4.036e-35 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 3.412e-35 3.412e-35 1.615e-34 1.615e-34 3.813e-34 3.813e-34 3.813e-34 8.920e-34 8.920e-34 4.539e-34 4.539e-34 5.912e-37 5.912e-37 5.912e-37 0.000e+00 0.000e+00 0.000e+00 0.000e+00 0.000e+00 

The bashdatacatalog-list seems to work fine for fetching the met data and the hemco files.
chem_meta.txt

@LiamBindle
Contributor

LiamBindle commented Aug 6, 2022

Hi Prasad,

I think this is an issue with the version of curl that's installed. Could you try wget instead? The commands would be

$ bashdatacatalog-list -am -r 2019-06-30,2019-08-02 -f xargs-curl InputDataCatalogs/13.4/ChemistryInputs.csv > url_download_list.txt
$ wget -i url_download_list.txt -x -nH -nv --cut-dirs=4 

The first command generates a url list, and the second command downloads the list with wget. Could you try this?


I'll take a look at fixing the -f xargs-curl format. It looks like I'm using an option that was added more recently than I thought.

@pkasibhatla
Author

pkasibhatla commented Aug 6, 2022 via email

@jiaying002

I also ran into the same "numbers" issue when downloading the chem input data, but the wget method Liam provided did not solve my problem.

I followed the commands Liam provided, and when I ran the wget line, I got the message "No URLs found in download_url_list.txt."

url_download_list.txt

@yantosca yantosca added the category: Bug Something isn't working label Nov 17, 2022
@jhaskinsPhD

jhaskinsPhD commented May 12, 2023

I'm also getting a lot of these curl argument errors described above using the command:

bashdatacatalog-list -am -r "2012-06-01,2013-12-01" -f xargs-curl DataCatalogs/14.1.1/*.csv | xargs -P 4 curl

and when I generate a url list as follows:

bashdatacatalog-list -am -r 2010-09-01,2012-01-01 -f xargs-curl DataCatalogs/14.1.1/*.csv > url_download_list.txt

I get this file: url_download_list.txt

and when I try to use wget as follows:
wget -i url_download_list.txt -x -nH -nv

I also get the error: No URLs found in url_download_list.txt.

Has there been any progress on this bug? I'm trying to set up my server at UUtah, so I need to download a lot of different input files. What version of curl is required to avoid these errors? Does anyone know how this curl -o error affects the downloaded files? Does it indeed mess up the names, as Prasad indicated?
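
For the "No URLs found" message, one quick diagnostic is to count how many lines in the list file actually start with a URL scheme, since `wget -i` expects one URL per line. The sketch below is only an illustration: `count_url_lines` is a hypothetical helper (not part of bashdatacatalog or wget), and the sample file contents, including the file name `example.nc`, are made up for the demonstration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the lines of a file that begin with a URL scheme.
count_url_lines() {
    grep -c -E '^(https?|ftp)://' "$1"
}

# Demonstration on a made-up list: one plain URL line and one non-URL line.
tmp="$(mktemp)"
printf '%s\n' \
    'http://geoschemdata.wustl.edu/ExtData/CHEM_INPUTS/example.nc' \
    'this line is not a URL' > "$tmp"
count_url_lines "$tmp"
rm -f "$tmp"
```

If the count for url_download_list.txt is 0, the list was probably not written in a plain-URL format, which would be consistent with the wget message.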

@yantosca
Contributor

Hi @jhaskinsPhD, thanks for writing. I was able to replicate your error.

Am tagging @SaptSinha who may be more knowledgeable about bashdatacatalog issues than I am.

Also tagging @LiamBindle, who has since left the GEOS-Chem community, but still may have some ideas.

@yantosca
Contributor

@jhaskinsPhD: You might also consider using a Globus endpoint for the file transfer. I bet that the U of Utah has a Globus account; you can check with your IT support staff there. Download from "GEOS-Chem data (WashU)".

@jiaying002

Hey @jhaskinsPhD , I believe I solved it with the help of this link:
https://github.com/LiamBindle/bashdatacatalog/wiki/3.-Useful-Commands

I used this command to solve this problem:
$ bashdatacatalog-list -am -f url catalog.csv > url_download_list.txt
$ wget -i url_download_list.txt -x -nH -nv --cut-dirs=4 # you will need to modify --cut-dirs=N

Compared to the answer given earlier in this issue, the first line uses -f url instead of -f xargs-curl.

You can also use Globus, as @yantosca suggested, with these commands:
$ bashdatacatalog-list -am -f globus="$(pwd),/remote-data-root/" catalog.csv > globus_batch.txt
$ globus transfer --batch globus_batch.txt SOURCE_ENDPOINT_ID DEST_ENDPOINT_ID

Hope this helps!

Contributor

Thanks @jiaying002 for the feedback on this issue!

@yctrrr

yctrrr commented Dec 22, 2023

For anyone who may be confused about the wget method, the right way to do this seems to be:
1. bashdatacatalog-list -am -r 2019-06-30,2019-08-02 -f url InputDataCatalogs/13.4/ChemistryInputs.csv > url_download_list.txt
where the argument -f url produces plain URL links instead of the xargs-curl format.
2. wget -i url_download_list.txt -x -nH -nv --cut-dirs=1
-x creates a directory hierarchy that mirrors the URLs. -nH removes the host-prefixed directory (geoschemdata.wustl.edu in this case). The value of --cut-dirs=N depends on the location of your download list; it cuts the first N components from the directory path, e.g. --cut-dirs=1 removes ExtData/ from ExtData/CHEM_INPUTS/.

You will also need to repeat the steps above whenever a new request requires an updated list of download URLs.
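
To make the effect of -nH and --cut-dirs easier to picture, here is a small sketch that mimics the path mapping in pure shell, without touching the network. It is an approximation of wget's documented behavior, not wget itself: `local_path` is a hypothetical helper, and the file name `example.nc` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: predict where `wget -x -nH --cut-dirs=N` would save a given URL.
# local_path is a hypothetical helper, not part of wget or bashdatacatalog.
local_path() {
    local url="$1"
    local cut="$2"
    local path="${url#*://}"   # drop the scheme (e.g. http://)
    path="${path#*/}"          # -nH: drop the host-name component
    local i=0
    while [ "$i" -lt "$cut" ]; do
        case "$path" in
            */*) path="${path#*/}" ;;  # cut one leading directory
            *) break ;;                # only the file name is left
        esac
        i=$((i + 1))
    done
    printf '%s\n' "$path"
}

# Made-up file under the ExtData/CHEM_INPUTS/ tree mentioned above.
local_path "http://geoschemdata.wustl.edu/ExtData/CHEM_INPUTS/example.nc" 1
# prints: CHEM_INPUTS/example.nc
```

With --cut-dirs=0 the same URL would map to ExtData/CHEM_INPUTS/example.nc, so the right N depends on whether you run wget from inside ExtData or from its parent directory.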

@SaptSinha SaptSinha removed their assignment Dec 23, 2023