generalize refactor_reportingtools_table.rb #13
Comments
I decided to implement it like this for now:

    case species
    when 'eco'
      scan_gene_id_pattern = 'ER[0-9]+_[0-9]+'
      ensembl_url = 'https://bacteria.ensembl.org/Escherichia_coli_k_12/Gene/Summary?g='
    when 'hsa'
      scan_gene_id_pattern = 'ENSG[0-9]+'
      ensembl_url = 'https://ensembl.org/Homo_sapiens/Gene/Summary?g='
    else
      # Unknown species: skip gene ID scanning and Ensembl links.
      scan_gene_id_pattern = false
      ensembl_url = false
    end

If we add new species, we simply have to extend this. Not the best solution, but it works for now.
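One possible way to make the extension step smaller is to keep the per-species values in a lookup table instead of a growing `case` statement. The following is only a sketch: the two entries are taken from the snippet above, while the names `SPECIES_CONFIG`, `UNKNOWN_SPECIES` and `species_config` are hypothetical and not part of the script.

```ruby
# Per-species settings keyed by the three-letter species code.
# Only 'eco' and 'hsa' come from the snippet above; the constant and
# method names are hypothetical.
SPECIES_CONFIG = {
  'eco' => {
    scan_gene_id_pattern: 'ER[0-9]+_[0-9]+',
    ensembl_url: 'https://bacteria.ensembl.org/Escherichia_coli_k_12/Gene/Summary?g='
  },
  'hsa' => {
    scan_gene_id_pattern: 'ENSG[0-9]+',
    ensembl_url: 'https://ensembl.org/Homo_sapiens/Gene/Summary?g='
  }
}.freeze

# Fallback that mirrors the else branch: both values are false.
UNKNOWN_SPECIES = { scan_gene_id_pattern: false, ensembl_url: false }.freeze

# Return the settings for a species code, or the fallback if it is unknown.
def species_config(species)
  SPECIES_CONFIG.fetch(species, UNKNOWN_SPECIES)
end

config = species_config('hsa')
scan_gene_id_pattern = config[:scan_gene_id_pattern]
ensembl_url = config[:ensembl_url]
```

With this layout, adding a species would mean adding one hash entry, and the rest of the script keeps reading the same two variables.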
I just found this prefix list, so mapping from the three-letter code to the species name should be possible, but the different base URLs (bacteria.ensembl.org, plants.ensembl.org, etc.) will probably be difficult. Maybe there's a direct mapping to the URLs somewhere, too.
Yes, nice @fischer-hub, this list is already a very good starting point. But I agree, finding the correct base URL remains difficult. Maybe implement a ping to the URL and brute-force test which one works (plants, bacteria, ...)? :D
Good idea, I'll try that!
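The brute-force idea could look roughly like the sketch below, which tries each candidate host in turn and keeps the first one whose gene page answers. Several things here are assumptions: the host list contains only the sites named in this thread, the names `CANDIDATE_HOSTS`, `url_reachable?` and `find_base_url` are hypothetical, and it is not verified that Ensembl answers with a non-2xx status when a species is requested from the wrong site.

```ruby
require 'net/http'
require 'uri'

# Candidate hosts; only sites named in this thread are listed (assumption:
# the real list would contain every Ensembl division).
CANDIDATE_HOSTS = %w[ensembl.org bacteria.ensembl.org plants.ensembl.org].freeze

# Default probe: true if a HEAD request to the URL returns a 2xx status.
# Any network error or timeout counts as "not reachable".
def url_reachable?(url)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https',
                  open_timeout: 5, read_timeout: 5) do |http|
    http.head(uri.request_uri).is_a?(Net::HTTPSuccess)
  end
rescue StandardError
  false
end

# Try each host and return the first base URL whose gene page responds,
# or false if none does. The probe is injectable so it can be replaced
# in tests or by a different check.
def find_base_url(species_path, gene_id, hosts: CANDIDATE_HOSTS, probe: method(:url_reachable?))
  hosts.each do |host|
    base = "https://#{host}/#{species_path}/Gene/Summary?g="
    return base if probe.call(base + gene_id)
  end
  false
end
```

Only one probe per run should be needed, since every gene in an annotation file belongs to the same species.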
So I looked a little deeper into this, and the prefix list actually only contains a subset of all species listed on Ensembl. However, I think we can just use the Ensembl REST API to 'map' the Ensembl IDs from the annotation file to their species names. We can then also make a single call with all the IDs at once.
Okay, I scraped the species lists from every prefix.ensembl.org site, so now we can look up the species name with the REST API once and then map the species name to the base URL prefix and species URL suffix. From here it should be done pretty soon.
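A rough sketch of that lookup-then-map step is shown below; it is not the code that ended up in the script. It assumes that the Ensembl REST endpoint `POST lookup/id` accepts a JSON body with an `ids` array and returns one record per ID with a `species` field, and that the URL suffix is the species name with its first letter capitalised. The table `SPECIES_TO_HOST` stands in for the scraped lists, and its two entries are illustrative values derived from the URLs earlier in this thread. All constant and method names are hypothetical.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Stand-in for the scraped lists: species name => host (illustrative entries).
SPECIES_TO_HOST = {
  'homo_sapiens' => 'ensembl.org',
  'escherichia_coli_k_12' => 'bacteria.ensembl.org'
}.freeze

# Extract the distinct species names from a parsed lookup reply.
# IDs that could not be resolved (nil records) are skipped.
def species_from_lookup(reply)
  reply.values.compact.map { |record| record['species'] }.compact.uniq
end

# One POST for all IDs at once (assumed request and reply shape, see above).
def lookup_species(ids)
  uri = URI('https://rest.ensembl.org/lookup/id')
  headers = { 'Content-Type' => 'application/json', 'Accept' => 'application/json' }
  response = Net::HTTP.post(uri, { ids: ids }.to_json, headers)
  species_from_lookup(JSON.parse(response.body))
end

# Build the gene summary URL prefix for a species name, or false if the
# species is not in the table.
def gene_url_prefix(species_name, table: SPECIES_TO_HOST)
  host = table[species_name]
  return false unless host

  "https://#{host}/#{species_name.capitalize}/Gene/Summary?g="
end
```

Returning false for an unknown species keeps the behaviour of the earlier `else` branch, so the rest of the script can still skip the link in that case.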
Generalize feature url retrieval in refactor_reportingtools_table.rb, closes #13
I included this script that extends the ReportingTools output table by start/stop positions and gene names. However, the script has some code parts that depend on the input species.
In this example the script works for the input
I think we have two options:
A
Remove the URL link to Ensembl that changes with the species name and try to generalize the `gene_id =` part.

B
Depending on the input `--species` parameter, define these values.