Scrape UniProt for functional summaries and append to table

j-berg/uniprot_scraper

UniProt Scraper

Fetches the UniProt functional summary for each gene in a table and appends the summaries to the dataframe as a new column.

Example input:

UniProtID   |   ...
------------|---------
P#####      |   ...
P#####      |   ...

Output:

UniProtID   |   ...   |   Summary
------------|---------|----------
P#####      |   ...   |   Info1
P#####      |   ...   |   Info2  
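The transformation above can be sketched with pandas. This is only an illustration of the column append, not the script's actual code: the function name `append_summaries`, the `PLACEHOLDER_SUMMARIES` dictionary, and the IDs and summaries in it are all hypothetical stand-ins for whatever the scraper retrieves from UniProt.

```python
import pandas as pd

# Hypothetical stand-in for the scraping step: maps an ID to its summary text.
# These IDs and summaries are placeholders, not real UniProt data.
PLACEHOLDER_SUMMARIES = {"ID_A": "Info1", "ID_B": "Info2"}


def append_summaries(table, id_column="UniProtID", lookup=PLACEHOLDER_SUMMARIES):
    """Return a copy of `table` with a Summary column looked up by ID."""
    result = table.copy()
    # IDs missing from the lookup end up as NaN in the Summary column.
    result["Summary"] = result[id_column].map(lookup)
    return result


table = pd.DataFrame({"UniProtID": ["ID_A", "ID_B"], "Other": ["x", "y"]})
annotated = append_summaries(table)
print(annotated)
```

The existing columns are left untouched and the new column is added on the right, matching the layout of the output table.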

Requirements:

  • Python 3

Installation

Install dependencies:

pip install pandas numpy html2text

Run

Navigate to the directory containing the uniprot_scraper.py script and run:

python uniprot_scraper.py

Then follow the prompts...

Notes

  • If the column containing the UniProt IDs has other characters around the ID, the script will prompt you for the characters that come before the ID. Example:
sp|P19283|gene_name
  • For the example above, you would enter sp\| as input. Note that special characters, such as |, must be escaped with a preceding \
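The need for the backslash makes sense if the prefix is treated as a regular expression, which the escaping requirement suggests but the README does not state outright. Below is a minimal sketch under that assumption. The function `extract_id` is hypothetical, as is the assumption that the ID is the run of letters and digits immediately following the prefix.

```python
import re


def extract_id(cell, prefix_pattern):
    """Return the alphanumeric run that follows `prefix_pattern`, or None."""
    # Assumption: the prefix is a regex, and the ID is the letters/digits right after it.
    match = re.search(prefix_pattern + r"([A-Za-z0-9]+)", cell)
    return match.group(1) if match else None


cell = "sp|P19283|gene_name"

# Escaped prefix: the | is matched literally, so the ID is captured.
escaped = extract_id(cell, r"sp\|")

# Unescaped prefix: | now means "or", so the pattern matches just "sp"
# and the capturing group never participates.
unescaped = extract_id(cell, "sp|")

print(escaped)    # P19283
print(unescaped)  # None
```

In your own Python code, `re.escape("sp|")` produces the same escaped form, `sp\|`, without having to remember which characters are special.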