Uniprot Metazoan database not downloaded due to changes in UniProt REST API #52
Comments
Hi Ido, you can replace the old URL with something like this: "https://legacy.uniprot.org/uniprot/?query=taxonomy:33208&format=fasta&compress=yes&include=no", changing the taxonomy ID accordingly. That should do the job. Cheers, Sergio
The problem with the Metazoa database, and with any other database downloaded from UniProt, is that the API has changed, and the legacy option must be fixed too. The new REST API requires pagination if more than 10,000,000 records are to be downloaded. This makes things a bit more complicated, but still possible. Please note that I drafted this bash script only to make the download work; it can probably be made much more efficient and elegant, but I don't have time to improve it.
# Test using cyanobacterial proteins (2,010 in total); change the URL to download the metazoan dataset after testing
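url="https://rest.uniprot.org/uniprotkb/search?compressed=true&format=fasta&query=%28%28taxonomy_id%3A1608213%29%29&size=500"
page=1
# Loop until the response has no link header, i.e. there is no next page
while [ -n "$url" ]
do
    echo "Downloading $url"
    # -D saves the response headers to the file "head"; the gzip-compressed page is appended to test.gz
    curl -D head "$url" >>test.gz
    # Take the next-page URL from the link header and re-encode "(", ")" and "id:"
    url=$(grep -i '^link:' head | cut -f 2 -d " " | sed 's/[;]//g' | sed 's/(/%28/g' | sed 's/)/%29/g' | sed 's/id:/id%3A/g' | sed 's/[<>]//g')
    echo $page >>head
    page=$((page+1))
done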
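The step that finds the next page can be isolated and tried without touching the network. This is only a minimal sketch: it assumes the saved header file contains a line of the form `link: <URL>; rel="next"`, and the function name `next_url`, the sample header, and its cursor value are all made up for illustration. Unlike the loop above, it does not re-encode parentheses or `id:`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the next-page URL found in a saved header file,
# or print nothing if there is no link header (the last page was reached).
next_url() {
    # $1: header file written by `curl -D <file>`
    grep -i '^link:' "$1" | sed -n 's/.*<\([^>]*\)>.*/\1/p' | head -n 1
}

# Illustrative header file; the cursor value is invented for this example.
sample=$(mktemp)
printf 'HTTP/2 200\r\nlink: <https://rest.uniprot.org/uniprotkb/search?format=fasta&query=taxonomy_id%%3A1608213&cursor=abc123&size=500>; rel="next"\r\n\r\n' >"$sample"
next_url "$sample"
```

Taking whatever sits between `<` and `>` avoids depending on where the spaces and the semicolon fall in the header line.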
This will download 2,010 cyanobacterial proteins from UniProt and write the response headers needed to fetch the next page to a file (named head in the script above). For Metazoa, there are 34,470,675 proteins in the database (checked 12.06.2023). You can do the math. The URL for Metazoa looks like this:
Generally, you can build the URLs by replacing the taxonomy ID in the query. I have not tested this on the (larger) metazoan file, so I would appreciate it if you could report the results. Oh, and I am assuming you will do this from a Linux OS.
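As a rough sketch of both points, the test URL can be parameterized by taxonomy ID and the number of requests estimated from the record count. It assumes the metazoan query follows the same pattern as the cyanobacterial test URL and that 33208 is the Metazoa taxonomy ID (the one used in the legacy URL in the first comment). `build_url` and `pages_needed` are hypothetical names, not part of TransPi or UniProt.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a paginated search URL for one taxonomy ID,
# following the pattern of the cyanobacterial test URL.
build_url() {
    local taxid=$1 size=${2:-500}
    printf 'https://rest.uniprot.org/uniprotkb/search?compressed=true&format=fasta&query=%%28%%28taxonomy_id%%3A%s%%29%%29&size=%s\n' "$taxid" "$size"
}

# Hypothetical helper: number of requests needed, i.e. ceil(total / size).
pages_needed() {
    local total=$1 size=${2:-500}
    echo $(( (total + size - 1) / size ))
}

build_url 33208          # assumed Metazoa taxonomy ID
pages_needed 34470675    # record count quoted above, at 500 records per page
```

At 500 records per page, the 34,470,675 metazoan entries work out to 68,942 requests, so the full download is likely to take a while.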
The precheck_TransPi.sh script fails to download the UniProt Metazoa database due to changes to the API. This is what comes back from the download command:
I tried implementing the changes suggested by the new API guide with the command below, but it doesn't download anything.
Thanks, Ido