Add Search Studies object #190

Merged (21 commits) on Jul 15, 2014

squirrelo
Contributor

This object has natural language parsing for searches and returns a dictionary of {study_id: [samp1, samp2, ...]} of samples and studies matching the search query. An example query is also in the documentation, showing you can get pretty complex with them.
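To make the return shape concrete, here is a toy, in-memory stand-in for the search. It is only a sketch: `search_studies`, the `studies` layout, and the sample and metadata names are all hypothetical, and nothing here touches the qiita database or the object added in this pull request.

```python
def search_studies(studies, matches):
    """Toy stand-in for the search (hypothetical, in-memory only).

    studies maps study_id -> {sample_id: metadata dict};
    matches is a predicate applied to each sample's metadata.
    """
    results = {}
    for study_id, samples in studies.items():
        # keep only the samples whose metadata satisfies the predicate
        hits = [sid for sid, meta in sorted(samples.items()) if matches(meta)]
        if hits:
            # studies with no matching samples are left out entirely
            results[study_id] = hits
    return results


# made-up data for illustration
studies = {
    1: {"1.S1": {"ph": 6.8, "sample_type": "soil"},
        "1.S2": {"ph": 7.4, "sample_type": "soil"}},
    2: {"2.S1": {"ph": 5.0, "sample_type": "water"}},
}

# roughly "sample_type = soil AND NOT ph > 7"
found = search_studies(
    studies, lambda m: m["sample_type"] == "soil" and not m["ph"] > 7)
print(found)  # {1: ['1.S1']}
```

The result is keyed by study, and each value lists only the matching samples from that study.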

@coveralls

Coverage increased (+1.24%) when pulling f966955 on squirrelo:addsearch into 7015360 on biocore:master.

@coveralls

Coverage increased (+1.25%) when pulling c550f2e on squirrelo:addsearch into 7015360 on biocore:master.


Examples
--------
Searches are done using natural language, with AND, OR, and NOT supported, as
Contributor

I don't think this qualifies as natural language (i.e., "plain English").

Contributor Author

Yeah, my closest guess would be a query language then.

Contributor

Boolean?
On Jul 9, 2014 7:28 PM, "Joshua Shorenstein" notifications@github.com
wrote:

In qiita_db/search.py:

+This module provides functionality for searching studies and samples contained
+in the qiita database. All language processing and querying of the database is
+contained within each object.
+
+Classes
+-------
+
+..autosummary::
+   :toctree: generated/
+   QiitaStudySearch

+Examples
+--------
+Searches are done using natural language, with AND, OR, and NOT supported, as

Yeah, my closest guess would be a query language then.



Contributor Author

Yup, that looks like it.
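For readers wondering what a Boolean query language of this kind involves, the following is a minimal sketch of turning such a query into a parameterized SQL WHERE clause. It is not the parser from this pull request, whose implementation is not shown in the thread. The function names, the token grammar, and the precedence (NOT binds tighter than AND, AND tighter than OR) are all assumptions made for illustration.

```python
import re

# one token: parenthesis, comparison operator, quoted string, number, or word
TOKEN_RE = re.compile(
    r"\s*(\(|\)|<=|>=|!=|=|<|>|'[^']*'|\d+(?:\.\d+)?|[A-Za-z_]\w*)")
OPERATORS = {"=", "!=", "<", ">", "<=", ">="}


def tokenize(query):
    """Split a query string into tokens, rejecting anything unrecognized."""
    tokens, pos = [], 0
    query = query.strip()
    while pos < len(query):
        match = TOKEN_RE.match(query, pos)
        if match is None:
            raise ValueError("unexpected text at position %d" % pos)
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def to_where(query):
    """Return (where_clause, params) for a Boolean query (hypothetical)."""
    tokens = tokenize(query)
    params = []
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of query")
        tok = tokens[pos]
        pos += 1
        return tok

    def expr():
        # expr := term (OR term)*
        left = term()
        while peek() is not None and peek().upper() == "OR":
            take()
            left = "({} OR {})".format(left, term())
        return left

    def term():
        # term := factor (AND factor)*
        left = factor()
        while peek() is not None and peek().upper() == "AND":
            take()
            left = "({} AND {})".format(left, factor())
        return left

    def factor():
        # factor := NOT factor | "(" expr ")" | column operator value
        tok = take()
        if tok.upper() == "NOT":
            return "(NOT {})".format(factor())
        if tok == "(":
            inner = expr()
            if take() != ")":
                raise ValueError("expected ')'")
            return inner
        if not re.match(r"[A-Za-z_]\w*$", tok):
            raise ValueError("expected a column name, got %r" % tok)
        op = take()
        if op not in OPERATORS:
            raise ValueError("expected a comparison operator, got %r" % op)
        value = take()
        if value.startswith("'"):
            params.append(value[1:-1])      # quoted string literal
        else:
            try:
                params.append(float(value))  # numeric literal
            except ValueError:
                raise ValueError("expected a value, got %r" % value)
        # values go through placeholders, never into the SQL text itself
        return "{} {} %s".format(tok, op)

    where = expr()
    if pos != len(tokens):
        raise ValueError("unexpected token %r" % tokens[pos])
    return where, params
```

For example, `to_where("ph > 7 AND (sample_type = 'soil' OR NOT depth <= 10)")` yields the clause `(ph > %s AND (sample_type = %s OR (NOT depth <= %s)))` with the parameters `[7.0, 'soil', 10.0]`. Keeping values in a separate parameter list is a design choice of this sketch, made so the database driver handles quoting.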

@adamrp
Contributor

adamrp commented Jul 9, 2014

Are we planning to also add (at some point) an AnalysisSearch object?

@josenavas
Contributor

I'd like to see it, probably add an issue so we can keep track of it?

# create the sample finding SQL
sample_sql = ("SELECT r.sample_id FROM qiita.required_sample_info r "
              "JOIN qiita.sample_%s s ON s.sample_id = r.sample_id "
              "WHERE {0}".format(sql_where))
Contributor

If the above INTERSECTs do not find any tables, this will throw an SQL error, right?

Contributor

Possible to join on INTERSECT instead of magic numbers?
On Jul 9, 2014 5:44 PM, "adamrp" notifications@github.com wrote:

In qiita_db/search.py:

+    # create the study finding SQL
+    # remove metadata headers that are in required_sample_info table
+    meta_headers = meta_headers.difference(required_cols)
+    # get all study ids that contain all metadata categories searched for
+    sql = []
+    for meta in meta_headers:
+        sql.append("SELECT study_id FROM qiita.study_sample_columns WHERE "
+                   "column_name = '%s' INTERSECT" %
+                   scrub_data(meta))
+    # combine the query, stripping off the last INTERSECT
+    study_sql = ' '.join(sql)[:-10]
+    # create  the sample finding SQL
+    sample_sql = ("SELECT r.sample_id FROM qiita.required_sample_info r "
+                  "JOIN qiita.sample_%s s ON s.sample_id = r.sample_id "
+                  "WHERE {0}".format(sql_where))

If the above INTERSECTs do not find any tables, this will throw an SQL
error, right?



Contributor Author

@adamrp If the INTERSECT query doesn't find any studies, the entire process is considered a wash and returns no results found. The sample-finding SQL only runs once studies have been found.

@wasade Makes sense, added.
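The suggestion to join on INTERSECT instead of slicing off a fixed number of characters can be sketched roughly as below. This is an illustration, not the code that was merged: `build_study_sql` is a hypothetical helper, and it swaps the `scrub_data` call from the diff for `%s` placeholders with a parameter list, on the assumption that the query is run through a driver that accepts parameters. It also makes the "no studies" case explicit by returning `None` when there is nothing to intersect.

```python
def build_study_sql(meta_headers, required_cols):
    """Build the study-finding SQL (hypothetical helper).

    Returns (sql, params), or (None, []) when every searched header
    lives in the required_sample_info table.
    """
    # drop metadata headers that are in the required_sample_info table
    headers = sorted(set(meta_headers) - set(required_cols))
    if not headers:
        # nothing to intersect, so the caller can skip the study query
        return None, []
    select = ("SELECT study_id FROM qiita.study_sample_columns "
              "WHERE column_name = %s")
    # str.join puts INTERSECT only between the SELECTs, so there is
    # no trailing keyword to strip and no magic number
    sql = " INTERSECT ".join([select] * len(headers))
    return sql, headers
```

Called as `build_study_sql({"ph", "host_taxid", "sample_id"}, {"sample_id"})`, it returns two SELECT statements separated by a single INTERSECT, with `["host_taxid", "ph"]` as the parameters.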

@adamrp
Contributor

adamrp commented Jul 9, 2014

This looks pretty awesome! I gave it a somewhat quick review during a talk, so it would be good to get more eyes on it.

@adamrp
Contributor

adamrp commented Jul 9, 2014

@josenavas, created #191

@josenavas
Contributor

Thanks!

@coveralls

Coverage increased (+1.26%) when pulling 553fc82 on squirrelo:addsearch into 7015360 on biocore:master.

@squirrelo
Contributor Author

After starting work on the web front-end for this I realized some features were missing. I've added them in this pull request, so it needs another thorough looking over.

@coveralls

Coverage increased (+1.31%) when pulling 3e82787 on squirrelo:addsearch into 7015360 on biocore:master.

@adamrp
Contributor

adamrp commented Jul 11, 2014

I reviewed this with @squirrelo and there were just a couple of things: 1) an idea on how to get the tests to work with the global variable, and 2) adding a test for a complex query with nested/composed ORs and ANDs to make sure that everything is being generated properly. After that, I give it a 👍 and he just needs someone else to go over it for a merge! Thanks @squirrelo

@coveralls

Coverage increased (+1.37%) when pulling 8ac84b8 on squirrelo:addsearch into 7015360 on biocore:master.

results = {}
# run search on each study to get out the matching samples
for sid in study_ids:

Member

Does this blank line need to be here?

@coveralls

Coverage increased (+1.37%) when pulling ed770d2 on squirrelo:addsearch into 7015360 on biocore:master.

ElDeveloper added a commit that referenced this pull request Jul 15, 2014
@ElDeveloper ElDeveloper merged commit 44169c3 into qiita-spots:master Jul 15, 2014
6 participants