`create_cazy_db` fails #4

lonsbio · 2017-05-24T06:23:31Z

Unable to create the database on Python 2.7.13. The output (excluding the BeautifulSoup warning) is as follows:

>> Gathering species codes for species with full genomes
>> Glycoside-Hydrolases
>> 145 families found on http://www.cazy.org/Glycoside-Hydrolases.html
> GH1

followed by this error:

first_page_idx = int(page_index_list[0]['href'].split('PRINC=')[-1].split('#')[0]) # be careful with this
ValueError: invalid literal for int() with base 10: 'GH1_archaea.html?debut_TAXO=100'

Has the site's pagination changed in a way that makes this expression fail?

rvhonorato · 2017-05-24T12:28:59Z

Yes, looks like the pagination changed a bit. I did a quick fix using regular expressions #5 and it should work fine now. Thanks for opening this issue.
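For anyone hitting the same error, here is a minimal sketch of what a regex-based parse could look like. It is not the actual change in #5. The helper name `parse_page_offset` is hypothetical, and it assumes the offset is always the last `=<digits>` in the link, optionally followed by a `#fragment`.

```python
import re


def parse_page_offset(href):
    """Return the trailing integer offset of a pagination link, or None.

    Hypothetical helper: assumes the offset is the last "=<digits>" in the
    href, optionally followed by a "#fragment".
    """
    match = re.search(r'=(\d+)(?:#.*)?$', href)
    return int(match.group(1)) if match else None


# The href from the traceback above now parses instead of raising ValueError.
offset = parse_page_offset('GH1_archaea.html?debut_TAXO=100')
```

Returning `None` for links without a numeric offset lets the caller decide how to handle a family that has only one page, rather than crashing inside `int()`.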

lonsbio · 2017-05-25T05:47:07Z

Thanks! I tried my own patch overnight (not as elegant) and it seemed to work too.

Also, I'm not sure if this is a recent issue or incidental. My DB download file seems to have newlines surrounding the organism field:

domain	protein_name	family	tag	organism_code	ec	genbank	uniprot	subfamily	organism	pdb
	 Ahos_0285	GH1		invalid	 	AEE93176.1	 		
Acidianus hospitalis W1

Fixing it doesn't seem to affect the extract script, but it does make the CSV (TSV) file readable. Is the wrapping intentional?
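In case it helps, this is roughly how the wrapped rows could be rejoined after download. It is only a sketch: the function name `rejoin_wrapped_rows` is hypothetical, and it assumes every record contains exactly as many tab separators as the header line, so physical lines can be accumulated until that count is reached.

```python
def rejoin_wrapped_rows(text, sep="\t"):
    """Rebuild records whose fields were split across physical lines.

    Assumption: each record has the same number of separators as the header.
    Returns a list of rows, each a list of whitespace-stripped fields.
    """
    lines = text.splitlines()
    if not lines:
        return []
    header = [field.strip() for field in lines[0].split(sep)]
    n_seps = len(header) - 1
    rows = [header]
    parts = []  # physical lines collected for the current record
    for line in lines[1:]:
        if not parts and line == "":
            continue  # skip blank lines between records
        parts.append(line)
        joined = " ".join(parts)
        if joined.count(sep) >= n_seps:
            # Record is complete: split it and trim stray whitespace.
            rows.append([field.strip() for field in joined.split(sep)])
            parts = []
    return rows
```

The rows can then be written back out with `"\t".join(row)`, one record per line. A trailing record that never reaches the expected separator count is dropped by this sketch, so it is worth checking the row count against the original file.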

rvhonorato closed this as completed Oct 8, 2018
