Add Search Studies object #190
Examples
--------
Searches are done using natural language, with AND, OR, and NOT supported, as
I don't think this qualifies as natural language (i.e., "plain english")
Yeah, my closest guess would be a query language then.
Boolean?
On Jul 9, 2014 7:28 PM, "Joshua Shorenstein" notifications@github.com
wrote:
In qiita_db/search.py:
+This module provides functionality for searching studies and samples contained
+in the qiita database. All language processing and querying of the database is
+contained within each object.
+
+Classes
+-------
+
+..autosummary::
+   :toctree: generated/
+   QiitaStudySearch
+Examples
+--------
+Searches are done using natural language, with AND, OR, and NOT supported, as
Yeah, my closest guess would be a query language then.
Yup, that looks like it.
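To make the "Boolean query language" wording concrete, here is a minimal sketch of how a query with AND, OR, NOT, and parentheses could be parsed into a tree. This is not the parser in this pull request; `tokenize` and `parse` are hypothetical names, terms are treated as opaque strings, and the precedence (NOT, then AND, then OR) is an assumption.

```python
import re

# A token is a parenthesis or a run of non-space, non-parenthesis characters.
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def tokenize(query):
    """Split a query string into parentheses and words."""
    return TOKEN_RE.findall(query)


def parse(tokens):
    """Parse tokens into nested tuples (assumed precedence: NOT > AND > OR)."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        node = parse_and()
        while peek() == "OR":
            advance()
            node = ("OR", node, parse_and())
        return node

    def parse_and():
        node = parse_not()
        while peek() == "AND":
            advance()
            node = ("AND", node, parse_not())
        return node

    def parse_not():
        if peek() == "NOT":
            advance()
            return ("NOT", parse_not())
        return parse_atom()

    def parse_atom():
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of query")
        advance()
        if tok == "(":
            node = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            advance()
            return node
        if tok in (")", "AND", "OR", "NOT"):
            raise ValueError("unexpected token: %s" % tok)
        return ("TERM", tok)

    tree = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected token: %s" % tokens[pos])
    return tree
```

A tree like this could then be walked to emit the SQL WHERE clause, which keeps the nesting of ORs and ANDs explicit.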
Are we planning to also add (at some point) an
I'd like to see it, probably add an issue so we can keep track of it?
# create the sample finding SQL
sample_sql = ("SELECT r.sample_id FROM qiita.required_sample_info r "
              "JOIN qiita.sample_%s s ON s.sample_id = r.sample_id "
              "WHERE {0}".format(sql_where))
If the above INTERSECTs do not find any tables, this will throw an SQL error, right?
Possible to join on INTERSECT instead of magic numbers?
On Jul 9, 2014 5:44 PM, "adamrp" notifications@github.com wrote:
In qiita_db/search.py:
# create the study finding SQL
# remove metadata headers that are in required_sample_info table
meta_headers = meta_headers.difference(required_cols)
# get all study ids that contain all metadata categories searched for
sql = []
for meta in meta_headers:
    sql.append("SELECT study_id FROM qiita.study_sample_columns WHERE "
               "column_name = '%s' INTERSECT" %
               scrub_data(meta))
# combine the query, stripping off the last INTERSECT
study_sql = ' '.join(sql)[:-10]
# create the sample finding SQL
sample_sql = ("SELECT r.sample_id FROM qiita.required_sample_info r "
              "JOIN qiita.sample_%s s ON s.sample_id = r.sample_id "
              "WHERE {0}".format(sql_where))
If the above INTERSECTs do not find any tables, this will throw an SQL error, right?
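One way to act on both comments above, sketched under assumptions: join the per-column SELECTs on `" INTERSECT "` so there is no trailing keyword to slice off with `[:-10]`, and return nothing when no metadata headers remain so an empty query is never sent to the database. `build_study_sql` is a hypothetical helper name, and the `%s` placeholders assume a psycopg2-style driver that takes the values as separate parameters (a swap for the `scrub_data` interpolation in the diff, not what the pull request does).

```python
def build_study_sql(meta_headers):
    """Build the study-finding query for the given metadata headers.

    Returns (sql, params), or (None, []) when there is nothing to search on,
    so the caller can skip the query instead of running empty SQL.
    """
    # Sort so the generated SQL and parameter order are deterministic.
    headers = sorted(meta_headers)
    if not headers:
        return None, []
    select = ("SELECT study_id FROM qiita.study_sample_columns "
              "WHERE column_name = %s")
    # Joining on the keyword avoids the magic-number slice entirely.
    sql = " INTERSECT ".join([select] * len(headers))
    return sql, headers
```

The caller would check for `None` before executing, which addresses the case where every searched column lives in `required_sample_info`.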
This looks pretty awesome! I gave it a somewhat quick review during a talk, so it would be good to get more eyes on it.
@josenavas, created #191
Thanks!
After starting work on the web front-end for this I realized some features were missing. I've added them in this pull request, so it needs another thorough looking over.
I reviewed this with @squirrelo and there were just a couple of things: 1) an idea on how to get the tests to work with the global variable, and 2) adding a test for a complex query with nested/composed ORs and ANDs to make sure that everything is being generated properly. After that, I give it a 👍 and he just needs someone else to go over it for a merge! Thanks @squirrelo
results = {}
# run search on each study to get out the matching samples
for sid in study_ids:
|
Does this blank line need to be here?
This object has natural language parsing for searches and returns a dictionary of {study_id: [samp1, samp2, ...]} of samples and studies matching the search query. An example query is also in the documentation, showing you can get pretty complex with them.
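For reference, the return shape described above can be illustrated with a small sketch. `group_samples_by_study` is a hypothetical helper, not part of this pull request, and it assumes the search yields `(study_id, sample_id)` pairs; whether studies with no matching samples appear in the real result is not stated here.

```python
from collections import defaultdict


def group_samples_by_study(rows):
    """Group (study_id, sample_id) pairs into {study_id: [sample_id, ...]}."""
    results = defaultdict(list)
    for study_id, sample_id in rows:
        results[study_id].append(sample_id)
    # Return a plain dict so missing keys raise KeyError as usual.
    return dict(results)
```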