-
Notifications
You must be signed in to change notification settings - Fork 80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Let's get redbiom-ing #2118
Let's get redbiom-ing #2118
Changes from 31 commits
06bc8b2
41f0fcd
574c538
4210a61
2c194fe
36c41bf
8a7e9b7
a23c360
a870b3c
acda0d0
0e72f2a
a03f1cd
07fb481
0e56fa0
2da0aaf
c2a844b
f1840f7
0ac26f1
27d9968
e2bcaf9
369628a
cc5afbb
9bd1166
28a9902
17e0780
d378518
16fc478
e41cdb4
babbede
0fc122b
d46392e
c64a019
f886d32
6e7cb7e
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -60,6 +60,7 @@ Install the non-python dependencies | |
|
||
* [PostgreSQL](http://www.postgresql.org/download/) (minimum required version 9.3.5, we have tested most extensively with 9.3.6) | ||
* [redis-server](http://redis.io) (we have tested most extensively with 2.8.17) | ||
* [webdis] (https://github.com/nicolasff/webdis) (latest version should be fine) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you make a comment similar to the one in redis? i.e. Latest version should be fine but we have tested most extensively with XXXX. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. k |
||
|
||
There are several options to install these dependencies depending on your needs: | ||
|
||
|
@@ -87,6 +88,26 @@ brew update | |
brew install homebrew/versions/redis28 | ||
``` | ||
|
||
### webdis | ||
|
||
Note that this is the only package that assumes that Qiita is already installed (due to library dependencies). Also, that the general suggestion is to have 2 redis servers running, one for webdis/redbiom and the other for Qiita. The default configuration. The reason for multiple redis servers is so that the redbiom cache can be flushed without impacting the operation of the qiita server itself. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Remove There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. rm |
||
|
||
The following instructions install, compile and pre-populates the redbiom redis DB so we assume that redis is running on the default port and that Qiita is fully installed as the redbiom package is installed with Qiita. | ||
|
||
``` | ||
git clone https://github.com/nicolasff/webdis | ||
pushd webdis | ||
make | ||
./webdis & | ||
popd | ||
wget https://raw.githubusercontent.com/wasade/redbiom/master/Makefile | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The instructions outlined here looks like they're for setting up a test environment, but I don't believe that is the goal of this documentation. I think it should list the commands used to generate the release from Qiita, populate the redbiom DB from that file and then how to set up a job so that the cache gets updated periodically. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yup, this is a copy of the first version with loaded the redbiom db; the new one actually loads the qiita data so changing |
||
wget https://raw.githubusercontent.com/wasade/redbiom/master/test.txt | ||
wget https://raw.githubusercontent.com/wasade/redbiom/master/test.biom | ||
wget https://raw.githubusercontent.com/wasade/redbiom/master/test_with_alts.txt | ||
wget https://raw.githubusercontent.com/wasade/redbiom/master/test_with_alts.biom | ||
make test_db | ||
``` | ||
|
||
|
||
Install Qiita development version and its python dependencies | ||
------------------------------------------------------------- | ||
|
@@ -163,6 +184,12 @@ Next, make a test environment: | |
qiita-env make --no-load-ontologies | ||
``` | ||
|
||
Finally, redbiom relies on the REDBIOM_HOST environment variable to set the URL to query. By default is set to http://127.0.0.1:7379, which is the webdis default. For example you could: | ||
|
||
```bash | ||
export REDBIOM_HOST=http://my_host.com:7329 | ||
``` | ||
|
||
## Start Qiita | ||
|
||
Start postgres (instructions vary depending on operating system and install method). | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,163 @@ | ||
from tornado.gen import coroutine, Task | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This file is missing the copyright header. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. k |
||
|
||
from qiita_core.util import execute_as_transaction | ||
|
||
from .base_handlers import BaseHandler | ||
from requests import ConnectionError | ||
import redbiom.summarize | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Minor, but these imports are not following the PEP8 standard (and they're not flagged by pep8). The order of the imports should be:
Also, is there any problem on simply doing There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
In [1]: import scipy
In [2]: scipy.sparse.csr_matrix
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-2-c4d1267aee28> in <module>()
----> 1 scipy.sparse.csr_matrix
AttributeError: module 'scipy' has no attribute 'sparse'
In [3]: import scipy.sparse
In [4]: scipy.sparse.csr_matrix
Out[4]: scipy.sparse.csr.csr_matrix There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. changing blocks, thanks @wasade for your comment. |
||
import redbiom.search | ||
import redbiom._requests | ||
import redbiom.util | ||
import redbiom.fetch | ||
|
||
|
||
class RedbiomPublicSearch(BaseHandler): | ||
@execute_as_transaction | ||
def get(self, search): | ||
self.render('redbiom.html') | ||
|
||
@execute_as_transaction | ||
def _redbiom_search(self, query, search_on, callback): | ||
error = False | ||
message = '' | ||
results = [] | ||
|
||
try: | ||
df = redbiom.summarize.contexts() | ||
except ConnectionError: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What if there is some other error raised? Are we allowing the 500 to be raised on the user screen or should we catch it? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure, I can see pros/cons on both (catch everything or just specific errors). I decided to go with specific so we can narrow down why it's failing. Note that we can see the errors in the logs if not caught. -- BTW I think we have caught all of them at this point. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, just wanted to make sure that this was the desired behavior. |
||
error = True | ||
message = 'Redbiom is down - contact admin, thanks!' | ||
|
||
if not error: | ||
contexts = df.ContextName.values | ||
query = query.lower() | ||
samples, categories = [], [] | ||
|
||
if search_on == 'metadata': | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. we may want an additional There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yup, let's chat this afternoon There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It seems that this block of code can be simplified to avoid code duplication: if search_on in ('metadata', 'categories'):
try:
samples = redbiom.serach.metadata_full(query, categories=(search_on == 'categories'))
except ... # Perform error handling as in the "categories" case.
elif search_on == 'observations':
... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. k |
||
try: | ||
samples = redbiom.search.metadata_full( | ||
query, categories=False) | ||
except TypeError: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same here regarding error (will not comment again but applies to all) |
||
error = True | ||
message = ('Not a valid search: "%s", are you sure this ' | ||
'is a valid metadata value?' % query) | ||
elif search_on == 'categories': | ||
try: | ||
categories = redbiom.search.metadata_full(query, | ||
categories=True) | ||
except ValueError: | ||
error = True | ||
message = ('Not a valid search: "%s", try a longer query' | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure if this error message is helpful enough for the user, i.e. I don't know that "longer query" means. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the error is basically because the query is actually too small (too few chars, for example searching for 'a' or 'tt' check the test) ... trying to improve the message, hope this helps |
||
% query) | ||
except TypeError: | ||
error = True | ||
message = ('Not a valid search: "%s", are you sure this ' | ||
'is a valid metadata category?' % query) | ||
elif search_on == 'observations': | ||
samples = [s.split('_', 1)[1] for context in contexts | ||
for s in redbiom.util.samples_from_observations( | ||
query.split(' '), True, context)] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. yup, also within our discussion today. |
||
else: | ||
error = True | ||
message = ('Incorrect search by: you can use observations ' | ||
'or metadata and you passed: %s' % search_on) | ||
|
||
if not error: | ||
import qiita_db as qdb | ||
import qiita_db.sql_connection as qdbsc | ||
if samples: | ||
sql = """ | ||
WITH main_query AS ( | ||
SELECT study_title, study_id, artifact_id, | ||
array_agg(DISTINCT sample_id) AS samples, | ||
qiita.artifact_descendants(artifact_id) AS children | ||
FROM qiita.study_prep_template | ||
JOIN qiita.prep_template USING (prep_template_id) | ||
JOIN qiita.prep_template_sample USING | ||
(prep_template_id) | ||
JOIN qiita.study USING (study_id) | ||
WHERE sample_id IN %s | ||
GROUP BY study_title, study_id, artifact_id) | ||
SELECT study_title, study_id, samples, name, command_id, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. change: There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. changing but not sure what's the benefit, can you expand? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. note that I was changing it and then realized that nr doesn't have all the members of the query added, for example cid/command_id, so this change then will require del nr['cid'] etc ... not worth it! |
||
(main_query.children).artifact_id AS artifact_id | ||
FROM main_query | ||
JOIN qiita.artifact a ON | ||
(main_query.children).artifact_id = a.artifact_id | ||
JOIN qiita.artifact_type at ON ( | ||
at.artifact_type_id = a.artifact_type_id | ||
AND artifact_type = 'BIOM') | ||
ORDER BY artifact_id | ||
""" | ||
with qdbsc.TRN: | ||
qdbsc.TRN.add(sql, [tuple(samples)]) | ||
results = [] | ||
commands = {} | ||
for row in qdbsc.TRN.execute_fetchindex(): | ||
title, sid, samples, name, cid, aid = row | ||
nr = {'study_title': title, 'study_id': sid, | ||
'artifact_id': aid, 'aname': name, | ||
'samples': samples} | ||
if cid is not None: | ||
if cid not in commands: | ||
c = qdb.software.Command(cid) | ||
commands[cid] = { | ||
'sfwn': c.software.name, | ||
'sfv': c.software.version, | ||
'cmdn': c.name | ||
} | ||
nr['command'] = commands[cid]['cmdn'] | ||
nr['software'] = commands[cid]['sfwn'] | ||
nr['version'] = commands[cid]['sfv'] | ||
else: | ||
nr['command'] = None | ||
nr['software'] = None | ||
nr['version'] = None | ||
results.append(nr) | ||
elif categories: | ||
sql = """ | ||
WITH get_studies AS ( | ||
SELECT | ||
trim(table_name, 'sample_')::int AS study_id, | ||
array_agg(column_name::text) AS columns | ||
FROM information_schema.columns | ||
WHERE column_name IN %s | ||
AND table_name LIKE 'sample_%%' | ||
AND table_name NOT IN ( | ||
'prep_template', 'prep_template_sample') | ||
GROUP BY table_name) | ||
SELECT study_title, get_studies.study_id, columns | ||
-- artifact_id, samples | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It looks like this line is commented out on the SQL, can you remove? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. k |
||
FROM get_studies | ||
JOIN qiita.study ON get_studies.study_id = | ||
qiita.study.study_id""" | ||
with qdbsc.TRN: | ||
results = [] | ||
qdbsc.TRN.add(sql, [tuple(categories)]) | ||
for title, sid, cols in qdbsc.TRN.execute_fetchindex(): | ||
nr = {'study_title': title, 'study_id': sid, | ||
'artifact_id': None, 'aname': None, | ||
'samples': cols, 'command': ', '.join(cols), | ||
'software': None, 'version': None} | ||
results.append(nr) | ||
else: | ||
error = True | ||
message = 'No samples where found! Try again ...' | ||
callback((results, message)) | ||
|
||
@coroutine | ||
@execute_as_transaction | ||
def post(self, search): | ||
search = self.get_argument('search', None) | ||
search_on = self.get_argument('search_on', None) | ||
|
||
data = [] | ||
if search is not None and search and search != ' ': | ||
if search_on in ('observations', 'metadata', 'categories'): | ||
data, msg = yield Task( | ||
self._redbiom_search, search, search_on) | ||
else: | ||
msg = 'Not a valid option for search_on' | ||
else: | ||
msg = 'Nothing to search for ...' | ||
|
||
self.write({'status': 'success', 'message': msg, 'data': data}) |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -117,3 +117,89 @@ function show_hide_process_list() { | |
$("#qiita-processing").hide(); | ||
} | ||
} | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. i may not be the best person for review of changes to this file There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. it's mainly moving code around so the libraries are available everywhere vs. just in the study listing There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can you add documentation to the JavaScript functions? I know at the beginning we were not doing it but I started doing it based on the Emperor documentation and it is really useful (i.e. it serves the same goal as the numpy doc). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. tried to add as many as possible There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks! |
||
function send_samples_to_moi(aids, samples) { | ||
bootstrapAlert('We are adding ' + aids.length + ' artifact(s) to the analysis. This ' + | ||
'might take some time based on the number of samples on each artifact.', "warning", 10000); | ||
|
||
if (typeof samples === 'undefined') { | ||
$.get('/artifact/samples/', {ids:aids}) | ||
.done(function ( data ) { | ||
if (data['status']=='success') { | ||
moi.send('sel', data['data']); | ||
} else { | ||
bootstrapAlert('ERROR: ' + data['msg'], "danger", 10000); | ||
} | ||
}); | ||
} else { | ||
var data = {} | ||
data[aids[0]] = samples | ||
moi.send('sel', data); | ||
} | ||
} | ||
|
||
function redbiom_send_to_moi(aid, row) { | ||
var row_data = $('#redbiom-table').dataTable().fnGetData(row); | ||
send_samples_to_moi([aid], row_data.samples); | ||
} | ||
|
||
function sel_study(name, row) { | ||
var row_data = $('#'+name).dataTable().fnGetData(row); | ||
var aids = [] | ||
|
||
for(var i=0;i<row_data.proc_data_info.length;i++){ | ||
aids.push(row_data['proc_data_info'][i]['pid']); | ||
} | ||
send_samples_to_moi(aids); | ||
} | ||
|
||
function sel_proc_data(aid) { | ||
send_samples_to_moi([aid]); | ||
} | ||
|
||
function remove_pd_from_html(data) { | ||
pid = data.proc_data; | ||
sid = data.sid; | ||
$('#proc' + pid).remove(); | ||
$('#proc' + pid + '-samples').remove(); | ||
// remove study if all proc data removed | ||
if($('#study'+ sid + '-table tbody').children().length === 1) { $('#study'+sid).remove(); } | ||
check_empty(); | ||
} | ||
|
||
function check_empty() { | ||
if($('.row').length <= 1) { | ||
$('#dflt-sel-info').removeAttr('style'); | ||
$('.topfloat').hide(); | ||
$('#no-selected').show(); | ||
} | ||
} | ||
|
||
function remove_sample_from_html(data) { | ||
pid = data.proc_data; | ||
sample = data.samples[0]; | ||
sid = data.sid; | ||
document.getElementById(pid + '@' + sample).remove(); | ||
//decriment sample count for pid | ||
var count = $('#proc' + pid + '-sample-count'); | ||
count.text(parseInt(count.text(), 10) - 1); | ||
// remove proc data if all samples removed | ||
if($('#proc' + pid + '-samples-table tbody').children().length === 0) { $('#proc'+pid).remove(); $('#proc' + pid + '-samples').remove(); } | ||
// remove study if all proc data removed | ||
if($('#study'+ sid + '-table tbody').children().length === 1) { $('#study'+sid).remove(); } | ||
check_empty(); | ||
} | ||
|
||
function clear_from_html(data) { | ||
$.each($('.row'), function(index, value) { value.remove(); }); | ||
check_empty(); | ||
} | ||
|
||
function error(evt) { | ||
$('#ws-error').html("<b>Server communication error. Sample removal will not be recorded. Please try again later.</b>"); | ||
}; | ||
|
||
function show_alert(data) { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think a different name for this function will be more useful, since it is not simply showing an alert, but showing the alert regarding the number of samples selected. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. added doc and changed the name |
||
bootstrapAlert(data + ' samples selected.', "success", 10000); | ||
$('#dflt-sel-info').css('color', 'rgb(0, 160, 0)'); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this line and the one below can be removed. Then instead of using
${qdbd}
on the other lines you can simply use `pwd`/qiita_db/. According to lines 40, 42, and 43 that should work w/o a problemThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
k