Unexpected error message when retrieving AA UniProt sequences #114

Closed
AlejandroSanchezCano opened this issue Apr 25, 2023 · 2 comments · Fixed by #115
Labels: bug, duplicate

Comments

@AlejandroSanchezCano

First I built the database with:
cazy_webscraper <email> --classes AA

Then I tried:
cw_get_uniprot_data <path_to_db> --families AA17 -s

And I got this output:

Built output directory: .cazy_webscraper_2023-04-25_17-38-06\uniprot_data_retrieval
Using default CAZy class synonyms
Retrieving GenBank accessions for selected CAZy classes: 0it [00:00, ?it/s]
Applying CAZy family filter(s)
Retrieving GenBank accessions for selected CAZy families:   0%|                                                                                  | 0/1 [00:00<?, ?it/s]Retrieving CAZymes for CAZy family AA17
Retrieving GenBank accessions for selected CAZy families: 100%|██████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 12.69it/s]
Applying no taxonomic filters
Retrieving UniProt data for 418
Retrieving data for 418 proteins
[['CCD28157.1', 'ETN25003.1', 'EGZ04327.1', 'EQC34366.1', 'ETM55527.1', 'EEY56117.1', 'ETL32367.1', 'ETL25332.1', 'ETO77111.1', 'ETK81747.1', 'ETO67284.1', 'ETL88378.1', 'ETN06075.1', 'ETI38762.1', 'ETK71917.1', 'ETO83077.1', 'CCA20830.1', 'EGZ04492.1', 'KDO27085.1', 'ETM02278.1', 'ETP30231.1', 'EQC39423.1', 'ETO81104.1', 'ETK81734.1', 'ETM00651.1', 'CCI47381.1', 'ETI42112.1', 'ETL41683.1', 'ETN20052.1', 'EEY58933.1', 'ETP25841.1', 'ETI48323.1', 'EGZ23522.1', 'ETP18130.1', 'EEY69088.1', 'ETN11075.1', 'EEY61639.1', 'EEY61638.1', 'ETK90881.1', 'ETK81744.1', 'KDO29253.1', 'ETK73643.1', 'ETN20049.1', 'ETP39574.1', 'ETO70369.1', 'ETL27076.1', 'ETK95850.1', 'ETN10519.1', 'ETK88975.1', 'EGZ05857.1', 'KDO26913.1', 'ETO83083.1', 'EGZ27273.1', 'ETI35283.1', 'ETI31514.1', 'ETL32408.1', 'UIZ22004.1', 'ETL85648.1', 'ETM00659.1', 'ETK81732.1', 'ETI41722.1', 'ETI41730.1', 'AHO49056.1', 'ETK95846.1', 'ETP53850.1', 'ETK88582.1', 'ETN10510.1', 'ETO77816.1', 'ETI41706.1', 'ETI32208.1', 'ETI50994.1', 'EGZ10739.1', 'EQC34755.1', 'ETP36460.1', 'KDO27086.1', 'ETK81751.1', 'EEY60789.1', 'EEY58927.1', 'UIZ27392.1', 'ETI41723.1', 'ETK88280.1', 'ETW01779.1', 'ETP53138.1', 'EGZ21313.1', 'CCA13926.1', 'EGZ10738.1', 'ETN25011.1', 'EQC33678.1', 'ETN20074.1', 'ETK95840.1', 'EGZ21309.1', 'ETP39568.1', 'ETL27074.1', 'ETI41715.1', 'ETI35061.1', 'ETL49218.1', 'ETL41675.1', 'ETM32606.1', 'ETN20054.1', 'ETO70332.1', 'ETM55516.1', 'ETI48679.1', 'ETN20047.1', 'KDO27114.1', 'ETN14473.1', 'ETI38761.1', 'ETO71194.1', 'ETI41714.1', 'ETM31823.1', 'EEY68485.1', 'ETM48357.1', 'ETN14460.1', 'ETN10508.1', 'ETM55515.1', 'ETP39587.1', 'ETL80314.1', 'EGZ23520.1', 'ETL85650.1', 'AIG55447.1', 'EEY58936.1', 'EGZ08731.1', 'EGZ21314.1', 'CCA17179.1', 'ETI32192.1', 'ETN10809.1', 'UIZ26027.1', 'EEY58932.1', 'EGZ27342.1', 'ETL88376.1', 'ETN24019.1', 'ETL49222.1', 'ETI56033.1', 'EQC42132.1', 'ETP53858.1', 'ETL35141.1', 'ETI56043.1', 'ETN19254.1', 'EGZ08727.1', 'ETP08653.1', 'ETP03147.1', 'ETI54329.1', 'ETK88959.1', 
'ETP53851.1', 'ETM55520.1', 'ETK95844.1', 'EQC25604.1', 'EGZ21312.1', 'ETL25321.1', 'ETO84778.1', 'ETO84785.1'], ['ETK88962.1', 'UIZ24201.1', 'AIG55787.1', 'ETO62052.1', 'EQC25608.1', 'ETM48060.1', 'ETI41729.1', 'EEY54090.1', 'ETI56037.1', 'ETP52131.1', 'ETP08632.1', 'ETN25010.1', 'EEY58944.1', 'EGZ05561.1', 'ETO70366.1', 'EEY58939.1', 'EEY68486.1', 'ETN20053.1', 'ETL35158.1', 'ETL35153.1', 'ETI56084.1', 'ETK78975.1', 'ETM42059.1', 'ETK75541.1', 'ETM32479.1', 'ETL47559.1', 'ETP02046.1', 'AIG55790.1', 'ETN11038.1', 'ETL49225.1', 'EGZ07203.1', 'ETO70367.1', 'ETP25842.1', 'UIZ28766.1', 'ETM97400.1', 'AIG55491.1', 'ETP29504.1', 'EQC24790.1', 'UIZ25173.1', 'ETP53849.1', 'ETO83080.1', 'ETL35151.1', 'ETN24018.1', 'ETP11451.1', 'KAF4046070.1', 'ETK78977.1', 'ETN20064.1', 'EGZ21311.1', 'ETL94825.1', 'ETI54331.1', 'ETN14463.1', 'ETP30207.1', 'ETK72575.1', 'EEY59753.1', 'ETK78714.1', 'ETM38805.1', 'UIZ26903.1', 'ETI42600.1', 'ETI41724.1', 'ETM41652.1', 'ETL35135.1', 'ETV73941.1', 'ETP25843.1', 'ETL35162.1', 'UIZ21835.1', 'ETL35152.1', 'EGZ23516.1', 'ETP53854.1', 'UIZ26907.1', 'EEY55873.1', 'ETI31519.1', 'ETO70368.1', 'ETV73994.1', 'ETP36689.1', 'ETO67480.1', 'UIZ27394.1', 'EGZ05551.1', 'ETM02264.1', 'ETI54320.1', 'EGZ07202.1', 'AIG56201.1', 'EQC24659.1', 'ETI38512.1', 'ETV73954.1', 'EGZ05560.1', 'ETM55519.1', 'ETO67483.1', 'ETN14469.1', 'ETM31809.1', 'ETO70335.1', 'ETO67485.1', 'EQC26776.1', 'ETL32386.1', 'ETL88833.1', 'UIZ22002.1', 'ETM00653.1', 'EGZ08733.1', 'ETL94840.1', 'EGZ17951.1', 'ETI48337.1', 'EGZ07231.1', 'ETO77414.1', 'ETM41632.1', 'ETL79229.1', 'ETN20068.1', 'ETP52112.1', 'KDO29254.1', 'EEY58934.1', 'ETN00173.1', 'ETI31960.1', 'CCI46093.1', 'ETI44371.1', 'ETL88406.1', 'ETK95841.1', 'ETL95131.1', 'ETV73947.1', 'EQC24789.1', 'ETP46069.1', 'ETM02263.1', 'EQC25603.1', 'ETK81737.1', 'ETN20056.1', 'EGZ06334.1', 'ETP11442.1', 'ETV71159.1', 'EEY58926.1', 'EQC34358.1', 'ETW03002.1', 'ETN01693.1', 'ETP29981.1', 'AIG55448.1', 'ETK82650.1', 'EGZ27343.1', 'EEY67612.1', 
'ETK81749.1', 'ETI41709.1', 'ETP39589.1', 'ETP18135.1', 'ETL78558.1', 'ETP24138.1', 'ETI48336.1', 'ETP11452.1', 'EGZ08724.1', 'ETN19515.1', 'ETI55312.1', 'ETP11455.1', 'ETI38498.1', 'ETP39588.1', 'EGZ27341.1', 'AIG56266.1'], ['ETP11448.1', 'ETP24139.1', 'UIZ26906.1', 'ETP46800.1', 'ETL49223.1', 'ETO59016.1', 'ETM41638.1', 'EGZ08732.1', 'ETI52336.1', 'ETK81748.1', 'UIZ24199.1', 'AIG55788.1', 'EGZ08725.1', 'ETM02269.1', 'EEY58931.1', 'ETL78541.1', 'ETL35539.1', 'ETN00163.1', 'ETN19290.1', 'ETP24148.1', 'AHO49057.1', 'EEY68484.1', 'ETI35285.1', 'ETK82180.1', 'ETP12314.1', 'ETM41647.1', 'ETO83056.1', 'ETO70334.1', 'EGZ05859.1', 'EGZ08736.1', 'ETP02071.1', 'ETN16020.1', 'ETO60944.1', 'ETP01315.1', 'ETO60224.1', 'EGZ08730.1', 'ETI35284.1', 'ETL27075.1', 'KDO27087.1', 'ETK78976.1', 'ETI56040.1', 'ETP18136.1', 'ETL88399.1', 'ETW02996.1', 'AIG55708.1', 'ETV73948.1', 'ETO70333.1', 'EGZ08739.1', 'ETP08651.1', 'EEY56114.1', 'EEY58928.1', 'EGZ04444.1', 'ETI56038.1', 'EEY67611.1', 'ETP33252.1', 'EEY58943.1', 'KDO29252.1', 'ETI41731.1', 'ETI38720.1', 'EGZ05556.1', 'ETI38760.1', 'ETL32154.1', 'UIZ25176.1', 'ETP11454.1', 'ETI42158.1', 'ETO63823.1', 'EGZ06395.1', 'ETK81741.1', 'EGZ08734.1', 'ETK88276.1', 'ETI31524.1', 'ETL95552.1', 'ETK71888.1', 'EGZ05562.1', 'ETW02997.1', 'EGZ08128.1', 'ETO70341.1', 'ETL94839.1', 'ETK71904.1', 'ETI56039.1', 'ETI41716.1', 'ETM48076.1', 'ETI31959.1', 'ETL80307.1', 'ETL88392.1', 'ETP11446.1', 'EEY65096.1', 'KDO30778.1', 'ETM41641.1', 'ETO70735.1', 'KDO27115.1', 'ETM32478.1', 'ETI48338.1', 'EQC42107.1', 'ETO70329.1', 'ETN20072.1', 'ETP28200.1', 'ETI54319.1', 'EGZ05552.1', 'ETN14458.1', 'ETP39581.1', 'KDO27089.1', 'ETI54333.1', 'ETO83057.1', 'ETK81755.1', 'ETP46074.1', 'EGZ27274.1', 'ETL47567.1', 'ETP25849.1', 'EGZ08747.1', 'ETM55513.1', 'ETI41700.1', 'ETP50074.1', 'EGZ06330.1', 'AIG55793.1', 'ETP25844.1', 'ETI41704.1', 'ETO84776.1']]
Batch retrieving protein data from UniProt:   0%|                                                                                                | 0/3 [00:00<?, ?it/s]WARNING [bioservices.UniProt:596]:  status is not ok with Forbidden
Batch retrieving protein data from UniProt:   0%|                                                                                                | 0/3 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\alexs\anaconda3\envs\ai\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\alexs\anaconda3\envs\ai\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "C:\Users\alexs\anaconda3\envs\ai\Scripts\cw_get_uniprot_data.exe\__main__.py", line 7, in <module>
    sys.exit(main())
  File "C:\Users\alexs\anaconda3\envs\ai\lib\site-packages\cazy_webscraper\expand\uniprot\get_uniprot_data.py", line 147, in main
    downloaded_uniprot_data, all_ecs = get_uniprot_data(gbk_data_to_download, cache_dir, args)
  File "C:\Users\alexs\anaconda3\envs\ai\lib\site-packages\cazy_webscraper\expand\uniprot\get_uniprot_data.py", line 348, in get_uniprot_data
    uniprot_df = UniProt().get_df(entries=query, limit=args.uniprot_batch_size)
  File "C:\Users\alexs\anaconda3\envs\ai\lib\site-packages\bioservices\uniprot.py", line 851, in get_df
    res = self.search(
  File "C:\Users\alexs\anaconda3\envs\ai\lib\site-packages\bioservices\uniprot.py", line 744, in search
    batch = batch.split("\n")[1:]
AttributeError: 'int' object has no attribute 'split'
HobnobMancer self-assigned this Apr 25, 2023
HobnobMancer added the bug and duplicate labels Apr 25, 2023
@HobnobMancer
Owner

Hi! Thanks for using cazy_webscraper.

Looking at the error message, this is a duplicate of issue #111:

  File "C:\Users\alexs\anaconda3\envs\ai\lib\site-packages\bioservices\uniprot.py", line 744, in search
    batch = batch.split("\n")[1:]
AttributeError: 'int' object has no attribute 'split'

The error is raised inside bioservices, not cazy_webscraper. Make sure you are using the latest version of bioservices.
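
For context on the traceback: the `'int' object has no attribute 'split'` message is consistent with the search call returning an integer (plausibly the HTTP status code behind the `Forbidden` warning in the log) where tab-separated text was expected. That reading is an assumption based on the log, not a confirmed description of bioservices internals. A minimal sketch of a guard for such a return value, where `parse_batch` is a hypothetical helper and not part of bioservices or cazy_webscraper:

```python
def parse_batch(batch):
    """Split a tab-separated response into data rows, skipping the header.

    Hypothetical helper: `batch` stands in for whatever the search call
    returned. Assumption: a failed request yields an int instead of text.
    """
    if not isinstance(batch, str):
        # Fail with a readable message rather than AttributeError on .split
        raise RuntimeError(
            f"UniProt request failed, got {batch!r} instead of text"
        )
    # The first line is the column header; drop it and any empty lines
    return [line for line in batch.split("\n")[1:] if line]
```

With a guard like this, a rejected request surfaces as a clear failure message instead of an `AttributeError` deep inside the parsing step.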

Side note:
We are currently altering cazy_webscraper so that it no longer uses the bioservices.UniProt().get_df() method. We are migrating to bioservices.UniProt().mapping(), which will be faster and reduce the burden on the UniProt REST API (and will also mean cazy_webscraper won't be using the section of bioservices code that keeps causing issues). Progress is over on PR #115.
We've both been busy atm, so progress is slower than expected. We still need to run a few more test runs and update the unit tests prior to releasing the update.
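
The batch-wise ID mapping described above might look roughly like the sketch below. It is not the code from PR #115: `chunk` and `map_accessions` are hypothetical helpers, the default batch size is arbitrary, and `map_batch` is an injected callable so the example runs without network access. In practice it could wrap `bioservices.UniProt().mapping(...)`; the exact arguments and return shape of that call are assumptions here and should be checked against the bioservices documentation.

```python
from typing import Callable, Dict, Iterator, List


def chunk(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def map_accessions(
    accessions: List[str],
    map_batch: Callable[[List[str]], Dict[str, str]],
    batch_size: int = 150,  # arbitrary illustrative default
) -> Dict[str, str]:
    """Map GenBank accessions to UniProt IDs one batch at a time.

    Assumption: `map_batch` returns {genbank_accession: uniprot_id}
    for the accessions it was able to map.
    """
    mapped: Dict[str, str] = {}
    for batch in chunk(accessions, batch_size):
        mapped.update(map_batch(batch))
    return mapped


# Usage with a stand-in mapper (no network); real code would query UniProt.
fake = lambda batch: {acc: "UP_" + acc for acc in batch}
result = map_accessions(
    ["CCD28157.1", "ETN25003.1", "EGZ04327.1"], fake, batch_size=2
)
```

Injecting the mapper keeps the batching logic testable on its own, which may help while the unit tests are being updated.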

HobnobMancer linked a pull request Apr 25, 2023 that will close this issue
@AlejandroSanchezCano
Author

Hello! Thank you for the quick response.
I am using version 1.11.2, which is the latest version of bioservices.
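
For anyone confirming the same thing, the installed version can be read from within the active environment. A minimal sketch using only the standard library (`installed_version` is a hypothetical helper name, and Python 3.8 or later is assumed):

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version found in the current environment, or None
print(installed_version("bioservices"))
```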
