The timeout=300 on the first line of .submitQueryXML is too short. #22

Closed
abelew opened this issue Aug 8, 2020 · 4 comments

Comments

@abelew

abelew commented Aug 8, 2020

Querying transcript lengths with getBM() against the Ensembl hsapiens mart fails with a timeout on the first line of .submitQueryXML. Ideally, the timeout=300 in the httr::POST call would be replaced with a parameter that the user can modify.

@grimbough
Owner

The 300-second timeout has been set to match the time limit Ensembl impose on the BioMart web interface. The server doesn't tend to perform well if many long-running queries are submitted, so I don't really want to make it user configurable.

Perhaps there's a way to reformulate your query to make it work within the time limit? In my experience queries that exceed the time limit are either being submitted without a filter (essentially a data dump) or have a really large number of attributes (the query engine seems to scale non-linearly as you increase the number of attributes). In both instances there's often a way to work around it in a manageable time scale.

If you post what you're trying to do either here or at the Bioconductor Support Site I'd be happy to try and help.

@abelew
Author

abelew commented Aug 10, 2020

Greetings, that seems quite sensible. However, I am only asking for the human start positions by transcript ID, and I still trip the time limit with this relatively stripped-down request to useast.ensembl.org's hsapiens mart:

requests <- c("ensembl_transcript_id", "start_position")
starts <- biomaRt::getBM(attributes=requests, mart=ensembl)

I was initially asking for start, end, chromosome, and strand, but figured (as you suggested) that I was asking for too much and so decided to ask for just one column. It seems to me that this should not be too onerous. My current workaround is to ask by gene ID rather than transcript ID, but that seems unsatisfactory.

@grimbough
Owner

Your query has no filter, which means it's basically doing a data dump, and the server-side BioMart software doesn't really like that. Generally it works best if you have a filter with no more than 500 values (you can supply more than 500 values, as biomaRt will automatically chunk anything longer than that).

Here's the trick I use to get this type of query to run, which involves first getting all the gene IDs and then using them in the filter. It's a bit of an ugly workaround, but since we ask for all gene IDs it should return everything your original query would have if it could run.

library(biomaRt)
ensembl <- useEnsembl("ensembl", 
                      dataset="hsapiens_gene_ensembl") 

## get all gene ids 
## BioMart can cope with this query without any filters
all_gene_ids <-  getBM(attributes = 'ensembl_gene_id', 
                       mart = ensembl)

## We use all gene ids as values, so we don't miss any data,
## but biomaRt will chunk the query automatically and run much faster
requests <- c("ensembl_transcript_id", "start_position")
starts <- getBM(attributes = requests, 
                filters = "ensembl_gene_id", 
                values = all_gene_ids,
                mart = ensembl)

head(starts)
#>   ensembl_transcript_id start_position
#> 1       ENST00000448773       32628032
#> 2       ENST00000317907       32628032
#> 3       ENST00000647819       32628032
#> 4       ENST00000454690       32628032
#> 5       ENST00000438654       32628032
#> 6       ENST00000433416       32628032
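
To make the chunking idea above more concrete, here is a minimal sketch of splitting a long vector of filter values into batches of at most 500. This is not biomaRt's internal code: `chunk_values()` is a hypothetical helper, the chunk size of 500 is taken from the comment above, and the IDs are made up for illustration.

```r
## Hypothetical helper, NOT part of biomaRt: split a vector of filter
## values into consecutive chunks of at most `chunk_size` elements.
chunk_values <- function(values, chunk_size = 500) {
  stopifnot(chunk_size >= 1)
  if (length(values) == 0) return(list())
  ## ceiling(index / chunk_size) assigns each element its chunk number
  groups <- ceiling(seq_along(values) / chunk_size)
  unname(split(values, groups))
}

## Made-up IDs, for illustration only
fake_ids <- sprintf("GENE%05d", 1:1234)
chunks <- chunk_values(fake_ids)

length(chunks)   # 3 chunks
lengths(chunks)  # 500 500 234

## Each chunk could then be sent as `values` in its own getBM() call and
## the results combined (not run here, since it needs network access):
# results <- do.call(rbind, lapply(chunks, function(ids) {
#   getBM(attributes = requests, filters = "ensembl_gene_id",
#         values = ids, mart = ensembl)
# }))
```

In practice there should be no need to do this by hand, since biomaRt is described above as chunking long value vectors automatically; the sketch only shows why a filtered query ends up as many small requests rather than one large dump.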

@abelew
Author

abelew commented Aug 15, 2020 via email
