Fastp: correctly parse sample name from --in1/--in2 command. Fallback to file name #2139
fastp.json
; fallback to file name when error
@multiqc-bot changelog
Let's switch the priority, as discussed here: #2138 (comment)
🚀 Deployed on https://mqc-pr-2139--multiqc.netlify.app
Hi @ewels and @vladsavelyev with your proposed switch, how will this below be handled
Will my general stats table show this as |
Hi Anand - yes, exactly. Unless you specify |
Hi @vladsavelyev and @ewels - I do not agree with this - #2138 (comment) - please correct me if I am wrong
Will
That's why I asked here - #2138 (comment)
I request you to revisit this.
We want to generalise the MultiQC behaviour as much as possible, and for that reason we want
fastp:
  s_name_filenames: true
That option can be conveniently specified in the command line as well:
multiqc -m fastp test_data/data/modules/fastp -f -v --cl-config "fastp: {s_name_filenames: true}"
Hope that helps.
It's been working for you, but equally for other users it's been broken for years now. It depends on your analysis setup. That's what we try to avoid - the previous behaviour required you to organise your data in a particular way. The newer behaviour (consistent with the rest of MultiQC) does not. There is a standard way to achieve the behaviour that you want via a config file, mentioned by Vlad. I think this is fine.
I agree that this isn't ideal. Again it's not specific to this module, and MultiQC is riddled with imperfections such as this. The data that we have to obtain sample names is by nature imperfect. I've created a new issue to discuss a potential new general functionality to better clean sample names when we have a pair of filenames available: #2162
Fixes #2138
The sample name parsing logic:
command field found in JSON. E.g. "fastp -c -g -y --in1 Sample 1 1.fastq.gz --in2 Sample 1_2.fastq.gz --out1 ..." will parse everything between --in1 and the following -, plus do normal file name trimming, ending up with Sample 1 1.
.fastp.json) to extract the sample name (e.g. fastp).
config.fastp.s_name_filenames option to force using the file names for sample names.
Additionally, fix the self.ignore_samples() logic for the module (it previously affected only the main self.fastp_data, but not the derived dictionaries).
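The parsing rules in this description can be illustrated with a minimal Python sketch. This is not the module's actual code: the function names (clean_name, name_from_command, name_from_file, sample_name), the force_filenames parameter (standing in for config.fastp.s_name_filenames), and the suffix lists are all assumptions made for illustration. The ordering (command first, file name as fallback) follows the PR title; it does not model the priority switch discussed in the conversation.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical suffix lists; MultiQC's real file name cleaning is
# configurable and more extensive than this.
READ_SUFFIXES = (".gz", ".fastq", ".fq")
JSON_SUFFIXES = (".fastp.json", ".json")


def clean_name(name: str) -> str:
    """Drop any directory part, then strip known read-file suffixes."""
    name = Path(name.strip()).name
    changed = True
    while changed:
        changed = False
        for suffix in READ_SUFFIXES:
            if name.endswith(suffix):
                name = name[: -len(suffix)]
                changed = True
    return name


def name_from_command(command: str) -> Optional[str]:
    """Take everything between --in1 and the next ' -' (the next flag)."""
    match = re.search(r"--in1\s+(.+?)\s+-", command)
    if match is None:
        return None
    return clean_name(match.group(1)) or None


def name_from_file(json_path: str) -> str:
    """Derive a sample name from the report file name itself."""
    name = Path(json_path).name
    for suffix in JSON_SUFFIXES:
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def sample_name(command: str, json_path: str, force_filenames: bool = False) -> str:
    """Prefer the command unless forced; fall back to the file name."""
    if not force_filenames:
        parsed = name_from_command(command)
        if parsed:
            return parsed
    return name_from_file(json_path)


cmd = 'fastp -c -g -y --in1 Sample 1 1.fastq.gz --in2 Sample 1_2.fastq.gz --out1 ...'
print(sample_name(cmd, "results/fastp.json"))  # Sample 1 1
print(sample_name("fastp --version", "results/fastp.json"))  # fastp
```

In this sketch, an input path that itself contains " -" is cut short at that point, and a command where --in1 is the last argument falls through to the file name, since the pattern needs a following flag to know where the path ends.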