API endpoint for verifying entity list for Upload target list #3114
Comments
Input:
Output:
|
@prashantuniyal02 here are the fields in the API for a single target (ENSG00000001626). As I understand it, we'd like to be able to extract the full Target object based on either Ensembl ID:
|
I think using " |
New plan based on discussions with @prashantuniyal02 and @d0choa is to use the "Bs" filter on the |
Having explored the search_target index, exact ("term" in ES terminology) queries to the If we want to make the search behaviour "match", i.e. non-exact, this can also be achieved, but we'd expect the response to be slightly slower and it may introduce ambiguities. We could write the code so that the query type (exact, non-exact or both) is configurable? I'd suggest this because, in the case of resolving the target IDs, we need to be exact, but for most other searches in the platform this is unlikely to be desirable. For the response, I think it should be a list of SearchResults, i.e. a list of what you would receive when you make a single search. Additionally, I would like to add the query to the SearchResults object, so that it's clear to the client which results go with which query. |
I've been digging into the search endpoint and I think making a generic batch search is not necessary for this use case. First, the existing search endpoint already facilitates batch searching! It utilises the "simple query string" search, which allows operators in the query string. So, assuming I understand the meaning of "batch search", you can already do this with the "OR" operator, e.g. "ACHE|INS|ANG" on the target entity. Which is pretty cool! Secondly, the current search approach and response are built on the principle that you are making full-text queries. The results are "hits" with "scores" etc., and the search operates in a specific way across the fields in the search indices. Here, we want to do something simpler: an "exact" term query on the keyword field of either the search_disease or search_target index. We specifically don't want any of the ambiguity that the full-text search may introduce. For what we want to do, this existing generic method, or something close to it, should work. We can then return a response that is a mapping for each queried term. From the chat @carcruz and I had, the API could look something like (mappings and results would be arrays):
On the other hand, we could expand the existing targets Query endpoint by adding another argument for terms e.g.:
The issue with this option is you don't know what mapped to what, but I'm not sure yet how straightforward it will be to provide those mappings. Do you have any thoughts or preferences on these @d0choa or @carcruz? |
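To make the contrast between the two query styles concrete, here is a minimal sketch of the Elasticsearch request bodies involved, written as plain Python dicts. The field names (`name`, `description`, `keyword`) are assumptions for illustration only; the real field names in the search_target and search_disease indices are not given in this thread.

```python
import json


def simple_query_string_body(terms, fields):
    """Full-text body: terms joined with '|', the OR operator in simple query string syntax."""
    return {
        "query": {
            "simple_query_string": {"query": "|".join(terms), "fields": list(fields)}
        }
    }


def exact_terms_body(terms, keyword_field):
    """Exact-match body: a 'terms' query matches documents whose field equals any listed value."""
    return {"query": {"terms": {keyword_field: list(terms)}}}


terms = ["ACHE", "INS", "ANG"]
# "name", "description" and "keyword" are hypothetical field names.
full_text = simple_query_string_body(terms, ["name", "description"])
exact = exact_terms_body(terms, "keyword")
print(json.dumps(exact, indent=2))
```

The full-text body is analysed and scored, whereas the `terms` body is not analysed, which is why it avoids the ambiguity discussed above.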
@jdhayhurst if I understand this correctly, the question is how relevant it is to know what mapped to what, or whether a term had a mapping at all? |
Yes, basically: would you be happy with a response that's a list of search results (like the current search), or do you need the individual mappings between each term in the query list and its own search results? |
Using the existing search endpoint I was able to add an exact keyword matching option:
query SearchQuery {
search(
queryString: "ACHE|INS|ANG"
entityNames: ["target"]
page: {index: 0, size: 5}
isKeywordSearch: true
) {
total
hits {
id
object {
... on Target {
id
approvedSymbol
}
}
}
}
}
The response for the above is:
{
"data": {
"search": {
"total": 3,
"hits": [
{
"id": "ENSG00000087085",
"object": {
"id": "ENSG00000087085",
"approvedSymbol": "ACHE"
}
},
{
"id": "ENSG00000214274",
"object": {
"id": "ENSG00000214274",
"approvedSymbol": "ANG"
}
},
{
"id": "ENSG00000254647",
"object": {
"id": "ENSG00000254647",
"approvedSymbol": "INS"
}
}
]
}
}
} |
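As a rough illustration of how a client might assemble the keyword search request shown above from a list of symbols, here is a sketch that only builds the GraphQL payload. The helper name `build_search_payload` is hypothetical, and the HTTP call is left out because the endpoint URL is not stated in this thread.

```python
import json

# Query text copied from the example above, with the variable parts templated.
SEARCH_TEMPLATE = """
query SearchQuery {
  search(
    queryString: %(query_string)s
    entityNames: %(entity_names)s
    page: {index: %(index)d, size: %(size)d}
    isKeywordSearch: true
  ) {
    total
    hits { id object { ... on Target { id approvedSymbol } } }
  }
}
"""


def build_search_payload(symbols, entity="target", index=0, size=5):
    """Join symbols with '|' (OR) and return a GraphQL-over-HTTP style payload."""
    query = SEARCH_TEMPLATE % {
        # json.dumps yields double-quoted, escaped literals usable in the query text.
        "query_string": json.dumps("|".join(symbols)),
        "entity_names": json.dumps([entity]),
        "index": index,
        "size": size,
    }
    return {"query": query}


payload = build_search_payload(["ACHE", "INS", "ANG"])
print(payload["query"])
```

Note that a search term containing a literal `|` would be split by the OR operator, so this approach only suits simple identifiers.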
After discussion with @d0choa and @carcruz, we agreed to move this behaviour to a separate endpoint, perhaps |
Here's the custom endpoint for mapping IDs. Please can you let me know if this works for you @carcruz? The "total" is the number of hits, but not everything will necessarily map. The unmapped terms still appear in the response but have no hits, which I think is useful to know. Request example for target ID mapping (some terms map, some don't):
query MappingQuery {
mapIds(
queryTerms: ["ACHE","INS","ANG","not going to map", "Double-stranded RNA-specific editase 1"]
entityNames: ["target"]
) {
total
mappings {
term
hits {
id
}
}
}
}
Response:
{
"data": {
"mapIds": {
"total": 4,
"mappings": [
{
"term": "ACHE",
"hits": [
{
"id": "ENSG00000087085"
}
]
},
{
"term": "INS",
"hits": [
{
"id": "ENSG00000254647"
}
]
},
{
"term": "ANG",
"hits": [
{
"id": "ENSG00000214274"
}
]
},
{
"term": "not going to map",
"hits": []
},
{
"term": "Double-stranded RNA-specific editase 1",
"hits": [
{
"id": "ENSG00000197381"
}
]
}
]
}
}
} |
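On the client side, the response shape above can be split into mapped and unmapped terms with a few lines. This is only a sketch based on the example response; the function name `split_mappings` is hypothetical, and the sample data is an abbreviated copy of the response above.

```python
def split_mappings(response):
    """Return ({term: [ids]}, [unmapped terms]) from a mapIds response."""
    mappings = response["data"]["mapIds"]["mappings"]
    mapped = {m["term"]: [h["id"] for h in m["hits"]] for m in mappings if m["hits"]}
    unmapped = [m["term"] for m in mappings if not m["hits"]]
    return mapped, unmapped


# Abbreviated copy of the example response above.
sample = {
    "data": {
        "mapIds": {
            "total": 2,
            "mappings": [
                {"term": "ACHE", "hits": [{"id": "ENSG00000087085"}]},
                {"term": "not going to map", "hits": []},
                {"term": "INS", "hits": [{"id": "ENSG00000254647"}]},
            ],
        }
    }
}

mapped, unmapped = split_mappings(sample)
print(mapped)
print(unmapped)
```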
Just to note that the limit for the number of terms that can be queried at once is 65,536 (this is the Elastic default), but this can be changed if we need to. |
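If an uploaded list could ever exceed that limit, one option is for the client to split the terms into batches before calling the endpoint. A minimal sketch, assuming the limit stays at the 65,536 mentioned above; `batch_terms` is a hypothetical helper, not part of the API:

```python
MAX_TERMS = 65_536  # limit quoted above; would need updating if the setting changes


def batch_terms(terms, limit=MAX_TERMS):
    """Yield consecutive slices of `terms`, each holding at most `limit` items."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(terms), limit):
        yield terms[start:start + limit]


uploaded = [f"TERM{i}" for i in range(70_000)]
batches = list(batch_terms(uploaded))
print([len(b) for b in batches])
```

Each batch would then be sent as its own `queryTerms` list and the resulting mappings concatenated.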
Functionally looks good. Questions: Data
API-FE cc @carcruz:
|
@d0choa, I should have mentioned that the endpoint borrows the same entity and pagination logic as search. So you can specify entities and pages in the same way. It also inherits the same aggregation and search result objects from search, so for instance if you searched for a term on "target" and "disease" entities, you could return the
query MappingQuery {
mapIds(
queryTerms: ["ACHE"]
entityNames: ["target", "disease"]
) {
total
mappings {
term
hits {
entity
id
}
}
}
}
{
"data": {
"mapIds": {
"total": 2,
"mappings": [
{
"term": "ACHE",
"hits": [
{
"entity": "target",
"id": "ENSG00000087085"
},
{
"entity": "disease",
"id": "EFO_0003843"
}
]
}
]
}
}
}
Pagination to return the second page with a size of 1 would look like:
query MappingQuery {
mapIds(
queryTerms: ["ACHE"]
entityNames: ["target", "disease"]
page: {index: 1, size: 1}
) {
total
mappings {
term
hits {
entity
id
}
}
}
}
{
"data": {
"mapIds": {
"total": 2,
"mappings": [
{
"term": "ACHE",
"hits": [
{
"entity": "disease",
"id": "EFO_0003843"
}
]
}
]
}
}
} |
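A client that wants every hit could walk the pages until it has collected `total` hits. The sketch below assumes, based only on the two examples above, that `index` is zero-based and that `total` counts hits across all pages. `fetch_page` is a hypothetical callable standing in for the real request, and `fake_fetch_page` merely simulates the ACHE example.

```python
def collect_hits(fetch_page, size):
    """Call fetch_page(index, size) for index 0, 1, ... until all hits are gathered."""
    hits, index = [], 0
    while True:
        result = fetch_page(index, size)
        page_hits = [h for m in result["mappings"] for h in m["hits"]]
        hits.extend(page_hits)
        # Stop on an empty page as well, so a wrong total cannot loop forever.
        if not page_hits or len(hits) >= result["total"]:
            return hits
        index += 1


# Stand-in for the real endpoint, using the ACHE hits from the example above.
ALL_HITS = [
    {"entity": "target", "id": "ENSG00000087085"},
    {"entity": "disease", "id": "EFO_0003843"},
]


def fake_fetch_page(index, size):
    chunk = ALL_HITS[index * size:(index + 1) * size]
    return {"total": len(ALL_HITS), "mappings": [{"term": "ACHE", "hits": chunk}]}


all_hits = collect_hits(fake_fetch_page, size=1)
print([h["id"] for h in all_hits])
```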
@DSuveges
query MappingQuery {
mapIds(
queryTerms: ["DLC1"]
entityNames: ["target"]
) {
total
mappings {
term
hits {
id
}
}
}
}
{
"data": {
"mapIds": {
"total": 3,
"mappings": [
{
"term": "DLC1",
"hits": [
{
"id": "ENSG00000088986"
},
{
"id": "ENSG00000164741"
},
{
"id": "ENSG00000008226"
}
]
}
]
}
}
} |
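Because a single term such as DLC1 can map to several IDs, a client may want to flag those terms so the user can pick the intended one. A small sketch over the `mappings` array shown above; the helper name `ambiguous_terms` is hypothetical:

```python
def ambiguous_terms(mappings):
    """Return {term: [ids]} for every term with more than one hit."""
    return {
        m["term"]: [h["id"] for h in m["hits"]]
        for m in mappings
        if len(m["hits"]) > 1
    }


# DLC1 entry copied from the response above, plus a single-hit term for contrast.
mappings = [
    {
        "term": "DLC1",
        "hits": [
            {"id": "ENSG00000088986"},
            {"id": "ENSG00000164741"},
            {"id": "ENSG00000008226"},
        ],
    },
    {"term": "ACHE", "hits": [{"id": "ENSG00000087085"}]},
]

flagged = ambiguous_terms(mappings)
print(flagged)
```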
Creating an API endpoint for verifying an entity list, to enable uploading a target/disease list
For an uploaded list of targets, we need to match each uploaded entry to the following set of IDs:
If an uploaded entry matches multiple results, we will display all the matched results.
For an uploaded list of diseases, we need to match each uploaded entry to the following set of IDs:
We also need to confirm how to deal with entries that do not yield a match, in both the backend and the frontend.