-
Notifications
You must be signed in to change notification settings - Fork 10
Adding kneaddata pipeline #4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
2ac9903
20495f5
b142aa8
6ff7909
7ee38b6
c12f0d0
370611b
6715d47
8b5abb3
978f613
e65adba
d306b90
4f1cbf4
85a15ee
3c2bbda
caf10fd
38a3724
8aa7664
0415af8
e7f42ef
47fb1cc
55c7cb2
e6a41a5
7ba8411
2a718f5
c24774d
c6e7632
3817629
1c00529
726c580
5c9dade
7536a5f
b6452a3
6bacd88
2e953f1
25f47a5
0d94279
2b3abf7
a86586d
7b18a9f
4d6498d
87f1caf
416f407
13fdbb3
31bbe78
440bac9
4f56331
74ad741
5dea337
482703d
36a6f31
6b5bb74
9c6635a
6d03d18
af9278a
8b5fc09
a22922d
cd5f23f
8457f93
6f0f4a6
830bd81
6faf771
04533eb
d0ec922
1448263
40e9ed7
5167ad4
d76b614
ea6dfff
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,86 @@ | ||
# ----------------------------------------------------------------------------- | ||
# Copyright (c) 2014--, The Qiita Development Team. | ||
# | ||
# Distributed under the terms of the BSD 3-clause License. | ||
# | ||
# The full license is in the file LICENSE, distributed with this software. | ||
# ----------------------------------------------------------------------------- | ||
|
||
from qiita_client import QiitaPlugin, QiitaCommand | ||
|
||
from .kneaddata import kneaddata | ||
|
||
__all__ = ['kneaddata'] | ||
|
||
# Initialize the plugin | ||
plugin = QiitaPlugin( | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm not sure if this is going to work. Each repository should have a single "QiitaPlugin" instance. In case that we want to have multiple "QiitaPlugin" instances we need a configure and a start_* script for each of them, in which case I don't see the benefit of having all these tools in a single repository. I think this can be added as a command to the shotgun plugin. |
||
'KneadData', '0.5.1', 'KneadData is a tool designed to perform quality ' | ||
'control on metagenomic and metatranscriptomic sequencing data, ' | ||
'especially data from microbiome experiments.') | ||
|
||
# Define the HUMAnN2 command | ||
req_params = {'input': ('artifact', ['per_sample_FASTQ'])} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How do we specify that it needs forward and (or or) reverse fastqs? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The code will always return all the available ones, in your code you need to check if they exist, for example this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok I get it... Currently this check occurs later on in processing. |
||
opt_params = { | ||
# there are other parameters not included that will be ignored in this | ||
# configuration as we assume that the correct values were set in the env | ||
# by the admin installing the tools: | ||
# trimmomatic | ||
# bowtie2 | ||
# --input # input FASTQ file (add a second argument for paired input files) | ||
# --output # directory to write output files | ||
# --output-prefix # prefix for output files [ DEFAULT : $SAMPLE_kneaddata ] | ||
# --log # filepath for log [ DEFAULT : $OUTPUT_DIR/$SAMPLE_kneaddata.log ] | ||
# --bowtie2 # path to bowtie executable | ||
# --bmtagger # path to bmtagger exectuable | ||
# --trf # path to TRF executable | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure how these executable paths get specified. I assume that's what line 19-23 are referring to, and that we just need to have an executable in the $PATH? |
||
'reference-db': ['choice:["human_genome"]', 'human_genome'], # ref db | ||
'bypass-trim': ['boolean', 'False'], # bypass the trim step | ||
'threads': ['integer', '1'], # threads to run | ||
'processes': ['integer', '1'], # processes to run | ||
'quality-scores': ['choice:["phred33","phred64"]', 'phred33'], # quality | ||
'run-bmtagger': ['boolean', 'False'], # run BMTagger instead of Bowtie2 | ||
'run-trf': ['boolean', 'False'], # run TRF repeat finder tool | ||
'run-fastqc-start': ['boolean', 'True'], # run FastQC on original data | ||
'run-fastqc-end': ['boolean', 'True'], # run FastQC on filtered data | ||
'store-temp-output': ['boolean', 'False'], # store temp output files | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think this parameter needs to be exposed. In which cases are we going to let the user store the temp output? |
||
'log-level': ['choice:["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]', | ||
'DEBUG'], | ||
|
||
# Trimmomatic options | ||
'max-memory': ['string', '500m'], # max memory in mb [ DEFAULT : 500m ] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think I need more info about this parameter. In general we don't expose these types of parameters to the user since it is system dependent (and we don't want the users to request, let's say, 3TB...) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Again, for admin and this is required by JAVA! 😭 |
||
'trimmomatic': ['string', '$TRIMMOMATIC_DIR'], | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This parameter and the one below shouldn't be exposed here. The parameters exposed here are parameters that are going to be eventually exposed to the user. I this case, this parameter should never be changed (i.e. it is set up at installation time and then it always is the same). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Removing this. |
||
'trimmomatic-options': ['string', 'ILLUMINACLIP:$TRIMMOMATIC_DIR/adapters/' | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are we going to allow the user to change this parameter, o provide a different set of parameters? This looks like a long string of values that seems unlikely that the user is going to change. If the answer for both previous questions is no, this parameter should also be removed from here. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We need this so admins can change it. |
||
'TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 ' | ||
'SLIDINGWINDOW:4:15 MINLEN:36'], | ||
# Bowtie2 options | ||
# 'bowtie2-options': ['string', '--very-sensitive'] | ||
# BMTagger options | ||
# # TRF options | ||
# 'match': ['integer', '2'], # matching weight | ||
# 'mismatch': ['integer', '7'], # mismatching penalty | ||
# 'delta': ['integer', '7'], # indel penalty | ||
# 'pm': ['integer', '80'], # match probability | ||
# 'pi': ['integer', '10'], # indel probability | ||
# 'minscore': ['integer', '50'], # mimimum alignment score to report | ||
# 'maxperiod': ['integer', '500m'] # maximum period size to report | ||
# FastQC options | ||
} | ||
outputs = {'per_sample_FASTQ': 'per_sample_FASTQ'} | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. KneadData should produce a lot of outputs, and the quantity will depend on the other options specified (e.g. if you provide multiple reference dbs, or skip fastQC). how can we deal with this? |
||
dflt_param_set = { | ||
'Defaults': { | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This will need to be updated accordingly as the parameters above get removed There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yup. |
||
'reference-db': 'human_genome', 'bypass-trim': False, 'threads': 1, | ||
'processes': 1, 'quality-scores': 'phred33', 'run-bmtagger': False, | ||
'run-trf': False, 'run-fastqc-start': True, 'run-fastqc-end': True, | ||
'store-temp-output': False, 'log-level': 'DEBUG', 'max-memory': '500m', | ||
'trimmomatic': '"$TRIMMOMATIC_DIR"', | ||
'trimmomatic-options': '"ILLUMINACLIP:$TRIMMOMATIC_DIR/adapters/' | ||
'TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 ' | ||
'MINLEN:36"' | ||
# , 'match': 2, 'mismatch': 7, | ||
# 'delta': 7, 'pm': 80, 'pi': 10, 'minscore': 50, 'maxperiod': '500m' | ||
} | ||
} | ||
kneaddata_cmd = QiitaCommand( | ||
"KneadData", "Sequence QC", kneaddata, req_params, opt_params, | ||
outputs, dflt_param_set) | ||
plugin.register_command(kneaddata_cmd) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,287 @@ | ||
# ----------------------------------------------------------------------------- | ||
# Copyright (c) 2014--, The Qiita Development Team. | ||
# | ||
# Distributed under the terms of the BSD 3-clause License. | ||
# | ||
# The full license is in the file LICENSE, distributed with this software. | ||
# ----------------------------------------------------------------------------- | ||
|
||
from itertools import zip_longest | ||
from os.path import basename, join | ||
|
||
from qiita_client import ArtifactInfo | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. remove extra line |
||
from qiita_client.util import system_call, get_sample_names_by_run_prefix | ||
|
||
|
||
def make_read_pairs_per_sample(forward_seqs, reverse_seqs, map_file): | ||
"""Recovers read pairing information | ||
Parameters | ||
---------- | ||
forward_seqs : list of str | ||
The list of forward seqs filepaths | ||
reverse_seqs : list of str | ||
The list of reverse seqs filepaths | ||
map_file : str | ||
The path to the mapping file | ||
Returns | ||
------- | ||
samples: list of tup | ||
list of 3-tuples with run prefix, sample name, fwd read fp, rev read fp | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we need to add a catch so that if the run prefix isn't in the mapping file that it assumes the run prefix is equal to the full file name? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good idea but I think that change will change the behavior compared to other plugins, which points that perhaps we should have a method to do this in qiita_client vs. having ad hoc functions. What do you think? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agree in principle! Probably this is going to change if we make a more sensible fastq file format or artifact structure, though, so maybe wait until then? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yup, could you add an issue so we don't forget ... thanks! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Should be covered by qiita-spots/qtp-sequencing#8? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not sure as that doesn't deal with run_prefix as the full name, perhaps pointing to this discussion, adding a note or both will make it easier to follow up. |
||
Raises | ||
------ | ||
ValueError | ||
If the rev is not an empty list and the same length than fwd seqs | ||
The prefixes of the run_prefix don't match the file names | ||
Notes | ||
----- | ||
At this stage it is required that if reverse sequences are present that all | ||
samples have both a forward and a reverse sequence. However, the read | ||
trimming step can sometimes eliminate all reverse reads, especially in low | ||
coverage samples with poor overall reverse read quality. | ||
""" | ||
|
||
# sort forward seqs | ||
forward_seqs.sort() | ||
|
||
# check that rev seqs are same len | ||
if reverse_seqs: | ||
if len(forward_seqs) != len(reverse_seqs): | ||
raise ValueError('Your reverse and forward files are of different ' | ||
'length. Forward: %s. Reverse: %s.' % | ||
(', '.join(forward_seqs), | ||
', '.join(reverse_seqs))) | ||
reverse_seqs.sort() | ||
|
||
# get run prefixes | ||
# These are prefixes that should match uniquely to forward reads | ||
# sn_by_rp is dict of samples keyed by run prefixes | ||
sn_by_rp = get_sample_names_by_run_prefix(map_file) | ||
|
||
# make pairings | ||
samples = [] | ||
used_prefixes = set() | ||
for i, (fwd_fp, rev_fp) in enumerate(zip_longest(forward_seqs, | ||
reverse_seqs)): | ||
# fwd_fp is the fwd read filepath | ||
fwd_fn = basename(fwd_fp) | ||
|
||
# iterate over run prefixes and make sure only one matches | ||
run_prefix = None | ||
for rp in sn_by_rp: | ||
if fwd_fn.startswith(rp) and run_prefix is None: | ||
run_prefix = rp | ||
elif fwd_fn.startswith(rp) and run_prefix is not None: | ||
raise ValueError('Multiple run prefixes match this fwd read: ' | ||
'%s' % fwd_fn) | ||
|
||
# make sure that we got one matching run prefix: | ||
if run_prefix is None: | ||
raise ValueError('No run prefix matching this fwd read: %s' | ||
% fwd_fn) | ||
|
||
if run_prefix in used_prefixes: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. (rule of thumb) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok! |
||
raise ValueError('This run prefix matches multiple fwd reads: ' | ||
'%s' % run_prefix) | ||
|
||
if rev_fp is None: | ||
samples.append((run_prefix, sn_by_rp[run_prefix], fwd_fp, None)) | ||
else: | ||
rev_fn = basename(rev_fp) | ||
# if we have reverse reads, make sure the matching pair also | ||
# matches the run prefix: | ||
if not rev_fn.startswith(run_prefix): | ||
raise ValueError('Reverse read does not match this run prefix.' | ||
'\nRun prefix: %s\nForward read: %s\n' | ||
'Reverse read: %s\n' % | ||
(run_prefix, fwd_fn, rev_fn)) | ||
|
||
samples.append((run_prefix, sn_by_rp[run_prefix], fwd_fp, | ||
rev_fp)) | ||
|
||
used_prefixes.add(run_prefix) | ||
|
||
return(samples) | ||
|
||
|
||
def format_kneaddata_params(parameters): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Add documentation |
||
|
||
params = [] | ||
|
||
for param in sorted(parameters): | ||
value = parameters[param] | ||
|
||
if value is True: | ||
params.append('--%s' % param) | ||
continue | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. since these are if/elif, the |
||
elif value is False: | ||
continue | ||
elif value: | ||
params.append('--%s %s' % (param, value)) | ||
continue | ||
|
||
param_string = ' '.join(params) | ||
|
||
return(param_string) | ||
|
||
|
||
def generate_kneaddata_commands(forward_seqs, reverse_seqs, map_file, | ||
out_dir, parameters): | ||
"""Generates the KneadData commands | ||
Parameters | ||
---------- | ||
forward_seqs : list of str | ||
The list of forward seqs filepaths | ||
reverse_seqs : list of str | ||
The list of reverse seqs filepaths | ||
map_file : str | ||
The path to the mapping file | ||
out_dir : str | ||
The job output directory | ||
parameters : dict | ||
The command's parameters, keyed by parameter name | ||
Returns | ||
------- | ||
cmds: list of str | ||
The KneadData commands | ||
samples: list of tup | ||
list of 3-tuples with run prefix, sample name, fwd read fp, rev read fp | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. extra space |
||
Raises | ||
------ | ||
ValueError | ||
If the rev is not an empty list and the same length than fwd seqs | ||
The prefixes of the run_prefix don't match the file names | ||
Notes | ||
----- | ||
Currently this is requiring matched pairs in the make_read_pairs_per_sample | ||
step but implicitly allowing empty reverse reads in the actual command | ||
generation. This behavior may allow support of situations with empty | ||
reverse reads in some samples, for example after trimming and QC. | ||
""" | ||
# making sure the forward and reverse reads are in the same order | ||
|
||
# we match filenames, samples, and run prefixes | ||
samples = make_read_pairs_per_sample(forward_seqs, reverse_seqs, map_file) | ||
|
||
cmds = [] | ||
|
||
param_string = format_kneaddata_params(parameters) | ||
prefixes = [] | ||
for run_prefix, sample, f_fp, r_fp in samples: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is it ever the case that some samples will have reverse reads and others will not within the same study? right now the forloop implicitly supports this but I don't think that is the case and open the door for a subtle bug in the future as code fluxes over time There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, and it's a big pain in the ass. This frequently happens with shallow shotgun runs like we have been getting a lot of recently, where the quality is not so good and you end up with the reverse read being trimmed to extinction. I agree that this kind of thing can set us up for problems and we should keep aware of it! Perhaps for early QC tools like kneaddata we can enforce proper paired reads (assuming no trimming so everything should be consistent) while downstream tools can be more flexible? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think not forcing paring makes sense and more cause it sounds like is something that is already happening. What about adding a comment here, perhaps in the Notes and on this line so future coders/reviewers are aware of this. Additionally, we could add a test for this but I don't feel strongly about this addition. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good idea, adding that comment. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks, I think that helps. Perhaps adding a comment on top of the for loop, like |
||
prefixes.append(run_prefix) | ||
if r_fp is None: | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Minor, but I think this if/else block can be reduced to: r_fp_str = ' --input "%s"' % r_fp if r_fp is not None else ""
cmds.append('kneaddata --input "%s"%s --output "%s" --output-prefix "%s" %s'
% (f_fp, r_fp_str, join(out_dir, run_prefix), run_prefix, param_string)) to avoid some code duplication |
||
cmds.append('kneaddata --input "%s" --output "%s" ' | ||
'--output-prefix "%s" %s' % | ||
(f_fp, join(out_dir, run_prefix), | ||
run_prefix, param_string)) | ||
else: | ||
cmds.append('kneaddata --input "%s" --input "%s" --output "%s" ' | ||
'--output-prefix "%s" %s' % | ||
(f_fp, r_fp, join(out_dir, run_prefix), | ||
run_prefix, param_string)) | ||
|
||
return cmds, samples | ||
|
||
|
||
def _run_commands(qclient, job_id, commands, msg): | ||
for i, cmd in enumerate(commands): | ||
qclient.update_job_step(job_id, msg % i) | ||
std_out, std_err, return_value = system_call(cmd) | ||
if return_value != 0: | ||
error_msg = ("Error running KneadData:\nStd out: %s\nStd err: %s" | ||
"\n\nCommand run was:\n%s" | ||
% (std_out, std_err, cmd)) | ||
return False, error_msg | ||
|
||
return True, "" | ||
|
||
|
||
def _per_sample_ainfo(out_dir, samples): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is adding 1 artifact per output file - I don't think this is the desired behavior right? I think what we want to get is a single artifact with all the sequence file. Note that the number of output artifacts from a command should be deterministic (i.e. it is known in advance how many artifacts a command is going to generate). Note that artifact != file so it shouldn't be a problem. Now, extending on that, it looks like we can potentially group the output files in 3 different artifacts: (1) clean paired reads with R1 and R2, (2) clean unpaired R1, and (3) clean unpaired R2. However, this doesn't work for the case in which R2 is not provided, since we only have 1 output artifact. To solve this case, I would add 2 commands, 1 for paired reads and 1 for unpaired reads. Note that internally you can be using the same functions if you want, but from the Qiita point of view this simplifies things as the output artifacts are known in advance. To create the 3 artifacts for the paired case, the code would be something like: paired_files = []
r1_files = []
r2_files = []
for run_prefix, sample, f_fp, r_fp in samples:
sam_out_dir = join(out_dir, run_prefix)
paired_files.append((join(sam_out_dir, '%s_paired_1.fastq' % run_prefix), 'preprocessed_fastq'))
paired_files.append((join(sam_out_dir, '%s_paired_2.fastq' % run_prefix), 'preprocessed_fastq'))
r1_files.append((join(sam_out_dir, '%s_unmatched_1.fastq' % run_prefix), 'preprocessed_fastq'))
r2_files.append((join(sam_out_dir, '%s_unmatched_2.fastq' % run_prefix), 'preprocessed_fastq'))
ainfo = [ArtifactInfo('clean paired', 'per_sample_FASTQ', paired_files),
ArtifactInfo('clean unmatched R1', 'per_sample_FASTQ', r1_files),
ArtifactInfo('clean unmatched R1', 'per_sample_FASTQ', r2_files) Note that the definition of the command in the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok I understand better now how this works. How can we retain information about the readdirectionality in the paired reads then given the current structure? Is there something besides 'preprocessed_fastq'? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That is a totally valid point and it shows how much we need a small type decorator system that will allow this differentiation. |
||
ainfo = [] | ||
for run_prefix, sample, f_fp, r_fp in samples: | ||
sam_out_dir = join(out_dir, run_prefix) | ||
|
||
if r_fp: | ||
ainfo.extend([ | ||
ArtifactInfo('clean paired R1', 'per_sample_FASTQ', | ||
[(join(sam_out_dir, '%s_paired_1.fastq' % | ||
run_prefix), 'per_sample_FASTQ')]), | ||
ArtifactInfo('clean paired R2', 'per_sample_FASTQ', | ||
[(join(sam_out_dir, '%s_paired_2.fastq' % | ||
run_prefix), 'per_sample_FASTQ')]), | ||
ArtifactInfo('clean unpaired R1', 'per_sample_FASTQ', | ||
[(join(sam_out_dir, '%s_unmatched_1.fastq' % | ||
run_prefix), 'per_sample_FASTQ')]), | ||
ArtifactInfo('clean unpaired R2', 'per_sample_FASTQ', | ||
[(join(sam_out_dir, '%s_unmatched_2.fastq' % | ||
run_prefix), 'per_sample_FASTQ')])]) | ||
else: | ||
ainfo.extend([ | ||
ArtifactInfo('cleaned reads', 'per_sample_FASTQ', | ||
[(join(sam_out_dir, '%s.fastq' % | ||
run_prefix), 'per_sample_FASTQ')])]) | ||
|
||
return ainfo | ||
|
||
|
||
def kneaddata(qclient, job_id, parameters, out_dir): | ||
"""Run kneaddata with the given parameters | ||
Parameters | ||
---------- | ||
qclient : tgp.qiita_client.QiitaClient | ||
The Qiita server client | ||
job_id : str | ||
The job id | ||
parameters : dict | ||
The parameter values to run split libraries | ||
out_dir : str | ||
The path to the job's output directory | ||
Returns | ||
------- | ||
bool, list, str | ||
The results of the job | ||
""" | ||
# Step 1 get the rest of the information need to run kneaddata | ||
qclient.update_job_step(job_id, "Step 1 of 3: Collecting information") | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. K |
||
artifact_id = parameters['input'] | ||
|
||
# removing input from parameters so it's not part of the final command | ||
del parameters['input'] | ||
|
||
# Get the artifact filepath information | ||
artifact_info = qclient.get("/qiita_db/artifacts/%s/" % artifact_id) | ||
fps = artifact_info['files'] | ||
|
||
# Get the artifact metadata | ||
prep_info = qclient.get('/qiita_db/prep_template/%s/' | ||
% artifact_info['prep_information'][0]) | ||
qiime_map = prep_info['qiime-map'] | ||
|
||
# Step 2 generating command kneaddata | ||
qclient.update_job_step(job_id, "Step 2 of 5: Generating" | ||
" KneadData command") | ||
rs = fps['raw_reverse_seqs'] if 'raw_reverse_seqs' in fps else [] | ||
commands, samples = generate_kneaddata_commands(fps['raw_forward_seqs'], | ||
rs, qiime_map, out_dir, | ||
parameters) | ||
|
||
# Step 3 execute kneaddata | ||
msg = "Step 3 of 5: Executing KneadData job (%d/{0})".format(len(commands)) | ||
success, msg = _run_commands(qclient, job_id, commands, msg) | ||
if not success: | ||
return False, None, msg | ||
|
||
# Step 4 generating artifacts | ||
ainfo = _per_sample_ainfo(out_dir, samples) | ||
|
||
return True, ainfo, "" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
# ----------------------------------------------------------------------------- | ||
# Copyright (c) 2014--, The Qiita Development Team. | ||
# | ||
# Distributed under the terms of the BSD 3-clause License. | ||
# | ||
# The full license is in the file LICENSE, distributed with this software. | ||
# ----------------------------------------------------------------------------- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if you are using this anymore, but this no longer running on travis so this path is incorrect. Check this line for an idea on how to fix it, unless you don't need anymore, in which case can be removed.