
Improve Solr integration with VariantStorage when querying #638

Closed
6 tasks done
j-coll opened this issue Jun 30, 2017 · 0 comments

j-coll commented Jun 30, 2017

Currently, variant queries can be executed either on the Variant Storage (MongoDB or HBase) or on Solr.

We have observed that complex queries using multiple indexes perform well on Solr, but poorly on MongoDB or HBase.

If the query cannot be fully resolved by Solr (because it requests or filters by fields not stored in Solr), we can decide, using some heuristics, to query Solr first to get the variant ids, and then query the actual variant storage.

We expect to see a big speed-up on complex queries (more than 5 filters).
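
The two-phase strategy described above can be sketched as a lazy iterator. This is a minimal sketch, not OpenCGA's implementation: `IntersectIterator`, the batch size and the `fetchBatch` function (standing in for the MongoDB/HBase lookup plus any filters Solr could not resolve) are all hypothetical. Because the storage may drop some of the ids returned by Solr, skip and limit are applied on the client side, after each batch has been fetched.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/**
 * Hypothetical sketch: walks the variant ids returned by the search engine (Solr)
 * and fetches the matching variants from the variant storage, one batch at a time.
 */
class IntersectIterator<V> implements Iterator<V> {
    private final Iterator<String> ids;                        // ids from Solr, in result order
    private final Function<List<String>, List<V>> fetchBatch;  // storage lookup + remaining filters
    private final int batchSize;
    private int toSkip;      // results still to discard (client-side skip)
    private int remaining;   // results still allowed (client-side limit)
    private Iterator<V> current = Collections.emptyIterator();

    IntersectIterator(Iterator<String> ids, Function<List<String>, List<V>> fetchBatch,
                      int batchSize, int skip, int limit) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.ids = ids;
        this.fetchBatch = fetchBatch;
        this.batchSize = batchSize;
        this.toSkip = skip;
        this.remaining = limit;
    }

    @Override
    public boolean hasNext() {
        while (remaining > 0) {
            // Drain the pending skip from the current batch before reporting results.
            while (current.hasNext()) {
                if (toSkip == 0) {
                    return true;
                }
                current.next();
                toSkip--;
            }
            if (!ids.hasNext()) {
                return false;  // Solr ids exhausted
            }
            // Collect the next batch of ids and ask the storage for those variants.
            List<String> batch = new ArrayList<>(batchSize);
            while (ids.hasNext() && batch.size() < batchSize) {
                batch.add(ids.next());
            }
            current = fetchBatch.apply(batch).iterator();
        }
        return false;  // limit reached
    }

    @Override
    public V next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        remaining--;
        return current.next();
    }
}
```

Solr's own skip cannot simply be reused here: it would count ids that the storage later discards, so the skipped results would not match what the user sees.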

  • Add embedded Solr for JUnit tests
  • Join Solr results with results from MongoDB or Hadoop
    • Implement a multi-variant iterator
    • Take limit and skip into account. If the base query is not null, a client-side skip will be needed
    • Use an HBase scan instead of a Phoenix query for queries fully resolved in Solr? Not now.
      Maybe in the future, with Table#get(List<Get>) or MultiRowRangeFilter
  • Implement an approximated count
  • The summary parameter should be an alias for a set of included fields
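
One possible approach to the approximated count, offered only as a sketch: Solr reports how many variants match the indexed filters, a small sample of those ids is checked against the filters only the storage can resolve, and the pass rate is scaled up to Solr's total. `ApproximateCount.estimate` and its parameters are hypothetical names, and sampling is an assumed technique, not necessarily the one chosen for this issue.

```java
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical sketch of an approximated count for intersect queries,
 * based on sampling the ids returned by the search engine (Solr).
 */
final class ApproximateCount {
    private ApproximateCount() {
    }

    /**
     * @param searchEngineTotal    total hits reported by Solr for the indexed filters
     * @param sampleIds            ids sampled from Solr's result set
     * @param passesStorageFilters check for the filters only the variant storage can resolve
     */
    static long estimate(long searchEngineTotal, List<String> sampleIds,
                         Predicate<String> passesStorageFilters) {
        if (sampleIds.isEmpty()) {
            // Nothing sampled: Solr's count is still a valid upper bound.
            return searchEngineTotal;
        }
        long passed = sampleIds.stream().filter(passesStorageFilters).count();
        if (sampleIds.size() >= searchEngineTotal) {
            // The whole result set was checked, so the count is exact.
            return passed;
        }
        // Scale the fraction of sampled ids that passed up to Solr's total.
        return Math.round(searchEngineTotal * (double) passed / sampleIds.size());
    }
}
```

The estimate is only as good as the sample: if the sampled ids are simply the first page returned by Solr, they may not be representative of the whole result set.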

New configuration parameters

intersect.active : true          # Allow intersect queries with the SearchEngine (Solr) 
intersect.always : false         # Force intersect queries 
intersect.params.threshold : 3   # Minimum number of QueryParams in the query to intersect       
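
A minimal sketch of how these three parameters could drive the decision to intersect. `IntersectPolicy` and `shouldIntersect` are hypothetical names, and two behaviours are assumptions: that `intersect.active` gates everything (including `intersect.always`), and that only query params carrying a non-empty value count towards the threshold.

```java
import java.util.Map;

/**
 * Hypothetical sketch of the intersect decision driven by the new settings.
 * Names are illustrative, not OpenCGA's actual API.
 */
final class IntersectPolicy {
    private final boolean active;    // intersect.active
    private final boolean always;    // intersect.always
    private final int threshold;     // intersect.params.threshold

    IntersectPolicy(boolean active, boolean always, int threshold) {
        this.active = active;
        this.always = always;
        this.threshold = threshold;
    }

    /** Returns true if the query should go to Solr first and then to the variant storage. */
    boolean shouldIntersect(Map<String, String> query) {
        if (!active) {
            return false;  // assumption: "active" gates everything, including "always"
        }
        if (always) {
            return true;
        }
        // Count the query params that actually carry a value.
        long params = query.values().stream()
                .filter(v -> v != null && !v.isEmpty())
                .count();
        return params >= threshold;
    }
}
```

With the defaults listed above (active, not forced, threshold 3), a query with three non-empty filters would be intersected and a query with two would not.
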
j-coll added this to the v1.2.0 milestone Jun 30, 2017
j-coll self-assigned this Jun 30, 2017
j-coll added a commit that referenced this issue Jul 4, 2017
Accepts query iterator with the queries to execute.
j-coll added a commit that referenced this issue Jul 5, 2017
Copy to the storage query some filters
j-coll added a commit that referenced this issue Jul 6, 2017
Improve time recording in VariantDBIterators
j-coll added a commit that referenced this issue Jul 10, 2017
j-coll closed this as completed Aug 1, 2017