Let's get redbiom-ing #2118

Merged
merged 34 commits into from May 11, 2017

Changes from 31 commits
Commits (34)
06bc8b2
installing redbiom
antgonza Apr 26, 2017
41f0fcd
classifiers=classifiers ??
antgonza Apr 26, 2017
574c538
one of those days!
antgonza Apr 26, 2017
4210a61
init commit
antgonza Apr 29, 2017
2c194fe
Merge branch 'master' of https://github.com/biocore/qiita into redbio…
antgonza May 3, 2017
36c41bf
apt redis
antgonza May 3, 2017
8a7e9b7
redis-server &
antgonza May 3, 2017
a23c360
redis-server
antgonza May 3, 2017
a870b3c
Merge branch 'master' of https://github.com/biocore/qiita into redbio…
antgonza May 5, 2017
acda0d0
adding redbiom tests
antgonza May 5, 2017
0e72f2a
redbiom setup.py
antgonza May 5, 2017
a03f1cd
redbiom #egg
antgonza May 5, 2017
07fb481
ready for test env
antgonza May 5, 2017
0e56fa0
Merge branch 'redbiom' of https://github.com/biocore/qiita into redbi…
antgonza May 5, 2017
2da0aaf
should fix tests
antgonza May 5, 2017
c2a844b
redis 7777 &
antgonza May 5, 2017
f1840f7
before script redis-server
antgonza May 5, 2017
0ac26f1
sudo: required
antgonza May 5, 2017
27d9968
adding more tests and REDBIOM_HOST
antgonza May 6, 2017
e2bcaf9
rm redbiom from conf
antgonza May 6, 2017
369628a
retriving artifact info via SQL
antgonza May 7, 2017
cc5afbb
retriving info from preps
antgonza May 7, 2017
9bd1166
creating redbiom context
antgonza May 7, 2017
28a9902
adding total of samples
antgonza May 7, 2017
17e0780
rm button when not logged
antgonza May 7, 2017
d378518
addressing @wasade initial comments
antgonza May 9, 2017
16fc478
new gui + search on categories
antgonza May 10, 2017
e41cdb4
adding a connection error message
antgonza May 10, 2017
babbede
request -> requests
antgonza May 10, 2017
0fc122b
rm query from connection error
antgonza May 10, 2017
d46392e
changing callback placing
antgonza May 10, 2017
c64a019
addressing @josenavas comments
antgonza May 11, 2017
f886d32
flake8
antgonza May 11, 2017
6e7cb7e
rm duplicated js imports
antgonza May 11, 2017
21 changes: 19 additions & 2 deletions .travis.yml
@@ -8,6 +8,7 @@ env:
- TEST_ADD_STUDIES=True
before_install:
- redis-server --version
- redis-server /etc/redis/redis.conf --port 7777 &
- wget http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
- chmod +x miniconda.sh
- ./miniconda.sh -b
@@ -28,9 +29,14 @@ install:
- source activate qiita
- pip install -U pip
- pip install sphinx sphinx-bootstrap-theme coveralls 'ipython[all]==2.4.1'
- travis_retry pip install . --process-dependency-links
- 'echo "backend: Agg" > matplotlibrc'
script:
- git clone https://github.com/nicolasff/webdis
- pushd webdis
- make
- ./webdis &
- popd
- travis_retry pip install . --process-dependency-links
before_script:
- export MOI_CONFIG_FP=`pwd`/qiita_core/support_files/config_test.cfg
- if [ ${TRAVIS_PULL_REQUEST} == "false" ]; then
export QIITA_CONFIG_FP=`pwd`/qiita_core/support_files/config_test_travis.cfg;
@@ -39,6 +45,17 @@ script:
- ipython profile create qiita-general --parallel
- qiita-env start_cluster qiita-general
- qiita-env make --no-load-ontologies
# loading redbiom with Qiita's test set
# first let's make sure redis is empty
- curl -s http://127.0.0.1:7379/FLUSHALL > /dev/null
- redbiom admin create-context --name "qiita-test" --description "qiita-test context"
- fp=`python -c 'import qiita_db; print qiita_db.__file__'`
Contributor: I think this line and the one below can be removed. Then instead of using ${qdbd} on the other lines you can simply use `pwd`/qiita_db/. According to lines 40, 42, and 43 that should work w/o a problem.

Member Author: k

- qdbd=`dirname $fp`;
- redbiom admin load-sample-metadata --metadata ${qdbd}/support_files/test_data/templates/1_19700101-000000.txt
- redbiom admin load-sample-metadata-search --metadata ${qdbd}/support_files/test_data/templates/1_19700101-000000.txt
- redbiom admin load-observations --table ${qdbd}/support_files/test_data/processed_data/1_study_1001_closed_reference_otu_table.biom --context qiita-test
- redbiom admin load-sample-data --table ${qdbd}/support_files/test_data/processed_data/1_study_1001_closed_reference_otu_table.biom --context qiita-test
script:
- if [ ${TEST_ADD_STUDIES} == "True" ]; then test_data_studies/commands.sh ; fi
- if [ ${TEST_ADD_STUDIES} == "True" ]; then qiita-cron-job ; fi
- if [ ${TEST_ADD_STUDIES} == "False" ]; then qiita-test-install ; fi
27 changes: 27 additions & 0 deletions INSTALL.md
@@ -60,6 +60,7 @@ Install the non-python dependencies

* [PostgreSQL](http://www.postgresql.org/download/) (minimum required version 9.3.5, we have tested most extensively with 9.3.6)
* [redis-server](http://redis.io) (we have tested most extensively with 2.8.17)
* [webdis](https://github.com/nicolasff/webdis) (latest version should be fine)
Contributor: Can you make a comment similar to the one in redis? i.e. Latest version should be fine but we have tested most extensively with XXXX.

Member Author: k


There are several options to install these dependencies depending on your needs:

@@ -87,6 +88,26 @@ brew update
brew install homebrew/versions/redis28
```

### webdis

Note that this is the only package that assumes Qiita is already installed (due to library dependencies). Also, the general suggestion is to run 2 redis servers, one for webdis/redbiom and the other for Qiita. The reason for multiple redis servers is so that the redbiom cache can be flushed without impacting the operation of the Qiita server itself.
Contributor: Remove The default configuration or complete the sentence.

Member Author: rm


The following instructions install, compile, and pre-populate the redbiom redis DB. We assume that redis is running on the default port and that Qiita is fully installed, as the redbiom package is installed with Qiita.

```
git clone https://github.com/nicolasff/webdis
pushd webdis
make
./webdis &
popd
wget https://raw.githubusercontent.com/wasade/redbiom/master/Makefile
Contributor: The instructions outlined here look like they're for setting up a test environment, but I don't believe that is the goal of this documentation. I think it should list the commands used to generate the release from Qiita, populate the redbiom DB from that file, and then how to set up a job so that the cache gets updated periodically.

Member Author: yup, this is a copy of the first version which loaded the redbiom db; the new one actually loads the qiita data, so changing

wget https://raw.githubusercontent.com/wasade/redbiom/master/test.txt
wget https://raw.githubusercontent.com/wasade/redbiom/master/test.biom
wget https://raw.githubusercontent.com/wasade/redbiom/master/test_with_alts.txt
wget https://raw.githubusercontent.com/wasade/redbiom/master/test_with_alts.biom
make test_db
```


Install Qiita development version and its python dependencies
-------------------------------------------------------------
@@ -163,6 +184,12 @@ Next, make a test environment:
qiita-env make --no-load-ontologies
```

Finally, redbiom relies on the REDBIOM_HOST environment variable to set the URL to query. By default it is set to http://127.0.0.1:7379, which is the webdis default. For example, you could:

```bash
export REDBIOM_HOST=http://my_host.com:7329
```

## Start Qiita

Start postgres (instructions vary depending on operating system and install method).
163 changes: 163 additions & 0 deletions qiita_pet/handlers/qiita_redbiom.py
@@ -0,0 +1,163 @@
from tornado.gen import coroutine, Task
Contributor: This file is missing the copyright header.

Member Author: k


from qiita_core.util import execute_as_transaction

from .base_handlers import BaseHandler
from requests import ConnectionError
import redbiom.summarize
Contributor: Minor, but these imports are not following the PEP8 standard (and they're not flagged by pep8). The order of the imports should be:

  1. standard libraries
  2. Third party (redbiom and requests in this case)
  3. Local package imports (qiita_core and .base_handlers)

Also, is there any problem with simply doing import redbiom or import redbiom as rb and then making the calls as rb.summarize, rb.search, etc.?

Contributor: redbiom.__init__ does not import the other modules. This is not unusual:

In [1]: import scipy

In [2]: scipy.sparse.csr_matrix
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-c4d1267aee28> in <module>()
----> 1 scipy.sparse.csr_matrix

AttributeError: module 'scipy' has no attribute 'sparse'

In [3]: import scipy.sparse

In [4]: scipy.sparse.csr_matrix
Out[4]: scipy.sparse.csr.csr_matrix

Member Author: changing blocks, thanks @wasade for your comment.
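For reference, a minimal sketch of this handler's import block with the ordering suggested above; the module names are taken from this diff, and the grouping comments are illustrative, not part of the PR:

```python
# Sketch only: the handler's imports grouped per PEP8
# (standard library, then third party, then local package).

# third party
from tornado.gen import coroutine, Task
from requests import ConnectionError
import redbiom.summarize
import redbiom.search
import redbiom._requests
import redbiom.util
import redbiom.fetch

# local package
from qiita_core.util import execute_as_transaction
from .base_handlers import BaseHandler
```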

import redbiom.search
import redbiom._requests
import redbiom.util
import redbiom.fetch


class RedbiomPublicSearch(BaseHandler):
@execute_as_transaction
def get(self, search):
self.render('redbiom.html')

@execute_as_transaction
def _redbiom_search(self, query, search_on, callback):
error = False
message = ''
results = []

try:
df = redbiom.summarize.contexts()
except ConnectionError:
Contributor: What if there is some other error raised? Are we allowing the 500 to be raised on the user screen or should we catch it?

Member Author: I'm not sure, I can see pros/cons on both (catch everything or just specific errors). I decided to go with specific so we can narrow down why it's failing. Note that we can see the errors in the logs if not caught. -- BTW I think we have caught all of them at this point.

Contributor: Ok, just wanted to make sure that this was the desired behavior.

error = True
message = 'Redbiom is down - contact admin, thanks!'

if not error:
contexts = df.ContextName.values
query = query.lower()
samples, categories = [], []

if search_on == 'metadata':
Contributor: we may want an additional elif here for searching over categories

Member Author: yup, let's chat this afternoon

Contributor: It seems that this block of code can be simplified to avoid code duplication:

if search_on in ('metadata', 'categories'):
    try:
        samples = redbiom.search.metadata_full(query, categories=(search_on == 'categories'))
    except ... # Perform error handling as in the "categories" case.
elif search_on == 'observations':
    ...

Member Author: k
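A minimal sketch of the consolidation suggested above, with the elided error handling filled in. The helper name _dispatch_search is hypothetical, and the exception types and messages are assumptions based on the per-branch code in this diff, not the final implementation:

```python
import redbiom.search
import redbiom.util


def _dispatch_search(query, search_on, contexts):
    """Hypothetical helper sketching the consolidated dispatch (not the PR's code)."""
    error, message = False, ''
    samples, categories = [], []
    if search_on in ('metadata', 'categories'):
        want_categories = (search_on == 'categories')
        try:
            hits = redbiom.search.metadata_full(query, categories=want_categories)
        except (TypeError, ValueError):
            # assumption: reuse the style of the per-branch handling in this diff
            error = True
            message = 'Not a valid search: "%s"' % query
        else:
            if want_categories:
                categories = list(hits)
            else:
                samples = list(hits)
    elif search_on == 'observations':
        samples = [s.split('_', 1)[1] for context in contexts
                   for s in redbiom.util.samples_from_observations(
                       query.split(' '), True, context)]
    else:
        error = True
        message = ('Incorrect search by: you can use observations '
                   'or metadata and you passed: %s' % search_on)
    return error, message, samples, categories
```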

try:
samples = redbiom.search.metadata_full(
query, categories=False)
except TypeError:
Contributor: Same here regarding error (will not comment again but applies to all)

error = True
message = ('Not a valid search: "%s", are you sure this '
'is a valid metadata value?' % query)
elif search_on == 'categories':
try:
categories = redbiom.search.metadata_full(query,
categories=True)
except ValueError:
error = True
message = ('Not a valid search: "%s", try a longer query'
Contributor: I'm not sure if this error message is helpful enough for the user, i.e. I don't know what "longer query" means. Probably showing an example of a valid query is a better strategy?

Member Author: the error is basically because the query is actually too small (too few chars, for example searching for 'a' or 'tt'; check the test) ... trying to improve the message, hope this helps

% query)
except TypeError:
error = True
message = ('Not a valid search: "%s", are you sure this '
'is a valid metadata category?' % query)
elif search_on == 'observations':
samples = [s.split('_', 1)[1] for context in contexts
for s in redbiom.util.samples_from_observations(
query.split(' '), True, context)]
Contributor: the True here may not always be desirable, but that can be deferred

Member Author: yup, also within our discussion today.

else:
error = True
message = ('Incorrect search by: you can use observations '
'or metadata and you passed: %s' % search_on)

if not error:
import qiita_db as qdb
import qiita_db.sql_connection as qdbsc
if samples:
sql = """
WITH main_query AS (
SELECT study_title, study_id, artifact_id,
array_agg(DISTINCT sample_id) AS samples,
qiita.artifact_descendants(artifact_id) AS children
FROM qiita.study_prep_template
JOIN qiita.prep_template USING (prep_template_id)
JOIN qiita.prep_template_sample USING
(prep_template_id)
JOIN qiita.study USING (study_id)
WHERE sample_id IN %s
GROUP BY study_title, study_id, artifact_id)
SELECT study_title, study_id, samples, name, command_id,
Contributor: change:
SELECT study_title, study_id, samples, name, command_id,
to
SELECT study_title, study_id, samples, name as aname, command_id,
then remove line 96 and replace line 97 with nr = dict(row).

Member Author: changing, but not sure what's the benefit, can you expand?

Member Author: note that I was changing it and then realized that nr doesn't have all the members of the query added, for example cid/command_id, so this change will then require del nr['cid'] etc ... not worth it!

(main_query.children).artifact_id AS artifact_id
FROM main_query
JOIN qiita.artifact a ON
(main_query.children).artifact_id = a.artifact_id
JOIN qiita.artifact_type at ON (
at.artifact_type_id = a.artifact_type_id
AND artifact_type = 'BIOM')
ORDER BY artifact_id
"""
with qdbsc.TRN:
qdbsc.TRN.add(sql, [tuple(samples)])
results = []
commands = {}
for row in qdbsc.TRN.execute_fetchindex():
title, sid, samples, name, cid, aid = row
nr = {'study_title': title, 'study_id': sid,
'artifact_id': aid, 'aname': name,
'samples': samples}
if cid is not None:
if cid not in commands:
c = qdb.software.Command(cid)
commands[cid] = {
'sfwn': c.software.name,
'sfv': c.software.version,
'cmdn': c.name
}
nr['command'] = commands[cid]['cmdn']
nr['software'] = commands[cid]['sfwn']
nr['version'] = commands[cid]['sfv']
else:
nr['command'] = None
nr['software'] = None
nr['version'] = None
results.append(nr)
elif categories:
sql = """
WITH get_studies AS (
SELECT
trim(table_name, 'sample_')::int AS study_id,
array_agg(column_name::text) AS columns
FROM information_schema.columns
WHERE column_name IN %s
AND table_name LIKE 'sample_%%'
AND table_name NOT IN (
'prep_template', 'prep_template_sample')
GROUP BY table_name)
SELECT study_title, get_studies.study_id, columns
-- artifact_id, samples
Contributor: It looks like this line is commented out in the SQL; can you remove it?

Member Author: k

FROM get_studies
JOIN qiita.study ON get_studies.study_id =
qiita.study.study_id"""
with qdbsc.TRN:
results = []
qdbsc.TRN.add(sql, [tuple(categories)])
for title, sid, cols in qdbsc.TRN.execute_fetchindex():
nr = {'study_title': title, 'study_id': sid,
'artifact_id': None, 'aname': None,
'samples': cols, 'command': ', '.join(cols),
'software': None, 'version': None}
results.append(nr)
else:
error = True
message = 'No samples were found! Try again ...'
callback((results, message))

@coroutine
@execute_as_transaction
def post(self, search):
search = self.get_argument('search', None)
search_on = self.get_argument('search_on', None)

data = []
if search is not None and search and search != ' ':
if search_on in ('observations', 'metadata', 'categories'):
data, msg = yield Task(
self._redbiom_search, search, search_on)
else:
msg = 'Not a valid option for search_on'
else:
msg = 'Nothing to search for ...'

self.write({'status': 'success', 'message': msg, 'data': data})
86 changes: 86 additions & 0 deletions qiita_pet/static/js/qiita.js
@@ -117,3 +117,89 @@ function show_hide_process_list() {
$("#qiita-processing").hide();
}
}

Contributor: I may not be the best person for review of changes to this file

Member Author: it's mainly moving code around so the libraries are available everywhere vs. just in the study listing

Contributor: Can you add documentation to the JavaScript functions? I know at the beginning we were not doing it, but I started doing it based on the Emperor documentation and it is really useful (i.e. it serves the same goal as the numpy doc).

Member Author: tried to add as many as possible

Contributor: Thanks!
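For context, the numpy-style documentation mentioned above looks roughly like this in Python; the function and its parameters are a generic illustration, not code from this PR, and the idea is for the JavaScript function comments to convey the same kind of information:

```python
def send_samples(artifact_ids, samples=None):
    """Queue artifact samples for an analysis (illustrative example only).

    Parameters
    ----------
    artifact_ids : list of int
        Artifacts whose samples should be added to the analysis.
    samples : list of str, optional
        Specific sample ids; if omitted, all samples of each artifact are used.

    Returns
    -------
    dict
        Mapping of artifact id to the list of samples queued for it.
    """
    return {aid: list(samples or []) for aid in artifact_ids}
```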

function send_samples_to_moi(aids, samples) {
bootstrapAlert('We are adding ' + aids.length + ' artifact(s) to the analysis. This ' +
'might take some time based on the number of samples on each artifact.', "warning", 10000);

if (typeof samples === 'undefined') {
$.get('/artifact/samples/', {ids:aids})
.done(function ( data ) {
if (data['status']=='success') {
moi.send('sel', data['data']);
} else {
bootstrapAlert('ERROR: ' + data['msg'], "danger", 10000);
}
});
} else {
var data = {}
data[aids[0]] = samples
moi.send('sel', data);
}
}

function redbiom_send_to_moi(aid, row) {
var row_data = $('#redbiom-table').dataTable().fnGetData(row);
send_samples_to_moi([aid], row_data.samples);
}

function sel_study(name, row) {
var row_data = $('#'+name).dataTable().fnGetData(row);
var aids = []

for(var i=0;i<row_data.proc_data_info.length;i++){
aids.push(row_data['proc_data_info'][i]['pid']);
}
send_samples_to_moi(aids);
}

function sel_proc_data(aid) {
send_samples_to_moi([aid]);
}

function remove_pd_from_html(data) {
pid = data.proc_data;
sid = data.sid;
$('#proc' + pid).remove();
$('#proc' + pid + '-samples').remove();
// remove study if all proc data removed
if($('#study'+ sid + '-table tbody').children().length === 1) { $('#study'+sid).remove(); }
check_empty();
}

function check_empty() {
if($('.row').length <= 1) {
$('#dflt-sel-info').removeAttr('style');
$('.topfloat').hide();
$('#no-selected').show();
}
}

function remove_sample_from_html(data) {
pid = data.proc_data;
sample = data.samples[0];
sid = data.sid;
document.getElementById(pid + '@' + sample).remove();
// decrement sample count for pid
var count = $('#proc' + pid + '-sample-count');
count.text(parseInt(count.text(), 10) - 1);
// remove proc data if all samples removed
if($('#proc' + pid + '-samples-table tbody').children().length === 0) { $('#proc'+pid).remove(); $('#proc' + pid + '-samples').remove(); }
// remove study if all proc data removed
if($('#study'+ sid + '-table tbody').children().length === 1) { $('#study'+sid).remove(); }
check_empty();
}

function clear_from_html(data) {
$.each($('.row'), function(index, value) { value.remove(); });
check_empty();
}

function error(evt) {
$('#ws-error').html("<b>Server communication error. Sample removal will not be recorded. Please try again later.</b>");
};

function show_alert(data) {
Contributor: I think a different name for this function would be more useful, since it is not simply showing an alert, but showing the alert regarding the number of samples selected.

Member Author: added doc and changed the name

bootstrapAlert(data + ' samples selected.', "success", 10000);
$('#dflt-sel-info').css('color', 'rgb(0, 160, 0)');
}