FTP list fails with large number of files #57
Comments
Hmm, I have not run into this issue. I would also have looked at a timeout problem, but if that does not solve it, I don't know.
Could you try with curl directly, using the option --trace trace.txt? I have seen the same issue reported online for sftp servers, but not for ftp.
Yes, me too. But the NCBI site is not sftp :( Here is the trace.txt output:
I think it expects to start receiving something within X seconds and cancel
On Fri, 12 Aug 2016 at 14:29, Emmanuel Quevillon notifications@github.com wrote:
Yes, that's what I suspected, but I could not find any documentation on this for pycurl. By the way, I've tried with Genbank
PDB
The only difference I see is the mode. EDIT: I've tried to disable
Passive vs active should not be an issue. This usually causes problems when going
On Fri, 12 Aug 2016 at 15:39, Emmanuel Quevillon notifications@github.com wrote:
Did you try setting CURLOPT_TIMEOUT, just like for the download step (and setting the parameter in the config)?
CURLOPT_TIMEOUT is already set in
which refers to
Even if I increase this value, it has no effect :(
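For reference, a minimal sketch of how the timeout-related options discussed above could be applied to a directory listing with pycurl. The function names and the default values are hypothetical and not taken from biomaj; `FTP_RESPONSE_TIMEOUT` is assumed to be the option meant by "FTP_RESPONSE_TIME" in the original report. As noted in this thread, raising these values did not help in this case, so this only illustrates what was tried.

```python
from io import BytesIO


def listing_timeout_options(connect_timeout=30, response_timeout=120, total_timeout=600):
    """Map libcurl option names to values in seconds (defaults are arbitrary)."""
    return {
        "CONNECTTIMEOUT": connect_timeout,          # time allowed to connect
        "FTP_RESPONSE_TIMEOUT": response_timeout,   # time allowed per server reply
        "TIMEOUT": total_timeout,                   # time allowed for the whole transfer
    }


def list_directory(url, options):
    """Fetch a raw FTP directory listing and return it as text (hypothetical helper)."""
    import pycurl  # imported lazily so the helper above works without pycurl

    buffer = BytesIO()
    handle = pycurl.Curl()
    handle.setopt(pycurl.URL, url)
    handle.setopt(pycurl.WRITEFUNCTION, buffer.write)
    for name, value in options.items():
        # Resolve the option constant by name, e.g. pycurl.TIMEOUT
        handle.setopt(getattr(pycurl, name), value)
    try:
        handle.perform()
    finally:
        handle.close()
    return buffer.getvalue().decode("utf-8", errors="replace")
```

A call would look like `list_directory("ftp://ftp.ncbi.nlm.nih.gov/genbank/wgs/", listing_timeout_options())`; the trailing slash asks curl for a directory listing rather than a file.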
Hi, maybe a clue to fix this problem. Using curl option
At least the directory listing is available. However, we fail later in the workflow, as this curl option only lists the directory content; it does not retrieve metadata such as permissions, date, size, etc.
which the build of the release based on last updated files :(
We need all the metadata, so that is no good :-(
Yes I know, unless we can combine such a bank (with a huge file list) with a release file number.
This is a workaround for a specific bank, and it is not even certain that it will work 100% of the time.
Yeah, you're right :(
Could you share the bank ini file?
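To illustrate why a names-only listing is not enough, here is a small sketch that parses one Unix-style FTP LIST line into the metadata mentioned above (permissions, size, date). The parser and the sample file name are hypothetical, and real servers may format their listings differently; a bare file name, as returned by a names-only listing, carries none of these fields.

```python
def parse_list_line(line):
    """Parse one Unix-style LIST line into a metadata dict.

    Returns None when the line does not have the expected nine fields,
    which is the case for the bare names of a names-only listing.
    """
    parts = line.split(None, 8)
    if len(parts) < 9:
        return None
    permissions, _links, _owner, _group, size, month, day, year_or_time, name = parts
    if not size.isdigit():
        return None
    return {
        "permissions": permissions,
        "size": int(size),
        "month": month,
        "day": day,
        "year_or_time": year_or_time,  # a year for old files, HH:MM for recent ones
        "name": name,
    }


# Made-up sample lines, for illustration only
full_line = "-rw-r--r--   1 ftp      anonymous     1234 Aug 12  2016 example.fsa_nt.gz"
name_only = "example.fsa_nt.gz"

print(parse_list_line(full_line))
print(parse_list_line(name_only))
```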
Here is the info for Genbank WGS
I am trying the TCP_KEEPALIVE option, which needs pycurl/curl version >= 7.25.0.
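A rough sketch of how the keepalive option could be enabled only when the installed libcurl is recent enough. The helper names and the idle/interval values are hypothetical; the 7.25.0 threshold is the one quoted above.

```python
def supports_tcp_keepalive(version):
    """Return True if a libcurl version string such as '7.47.0' is >= 7.25.0."""
    numbers = tuple(int(part) for part in version.split(".")[:3] if part.isdigit())
    numbers = numbers + (0,) * (3 - len(numbers))  # pad '7.25' to (7, 25, 0)
    return numbers >= (7, 25, 0)


def enable_keepalive(handle, idle=60, interval=30):
    """Turn on TCP keepalive probes on a pycurl handle when supported."""
    import pycurl  # imported lazily so the version check works without pycurl

    libcurl_version = pycurl.version_info()[1]  # e.g. '7.47.0'
    if not supports_tcp_keepalive(libcurl_version):
        return False
    handle.setopt(pycurl.TCP_KEEPALIVE, 1)
    handle.setopt(pycurl.TCP_KEEPIDLE, idle)       # seconds of idle before the first probe
    handle.setopt(pycurl.TCP_KEEPINTVL, interval)  # seconds between probes
    return True
```

Whether keepalive probes help depends on the remote server and on any firewall between client and server.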
OK. For info, I've updated my pycurl from
No better, but the error (56, 'response reading failed') occurs after anywhere from 1 minute to longer (it once occurred at 5 minutes), so it depends on the remote server.
For info, using
Does ncftp report all the metadata?
I don't think so, but I don't actually remember.
On 16 Aug 2016 at 17:23, "Olivier Sallou" notifications@github.com wrote:
Maybe it acts like CURLOPT_DIRLISTONLY.
Probably :(
On 16 Aug 2016 at 17:34, "Olivier Sallou" notifications@github.com wrote:
Hi Olivier, back to the problem. We've found its source. It is not related to
Emmanuel
Nice analysis. Maybe you should contact the upstream ftp maintainer to raise
On Fri, 19 Aug 2016 at 16:27, Emmanuel Quevillon notifications@github.com wrote:
Thanks :)
On 19 Aug 2016 at 16:39, "Olivier Sallou" notifications@github.com wrote:
Hi,
I'm facing a problem with a bank that downloads a lot of files.
I'm trying to get files from Genbank WGS (ftp://ftp.ncbi.nlm.nih.gov/genbank/wgs). This directory contains around 84,000 files. When I run biomaj, I always get this error:
This apparently means that the FTP response takes longer than expected when retrieving the list of files.
I've tried setting some options such as FTP_RESPONSE_TIME, but without success.
So my question is: do you have any clue on how to avoid this problem?
The problem is similar with Firefox: listing the wgs directory ends with a blank page. However, with ncftp, the dir command succeeds, but we need to wait around a minute to get the file list.
Thanks
Emmanuel