
Need to build a specification verification system for all experiment sheets #4

Closed
jsa-aerial opened this issue Jun 12, 2017 · 8 comments

Comments

@jsa-aerial
Owner

No description provided.

@jsa-aerial
Owner Author

jsa-aerial commented Jun 14, 2017

Basically, this would be a full-featured 'sanity checker' for any experiment configuration. It would run on any command to Aerobio and verify that all the configuration sheets are properly specified for the experiment in question.

This issue will be for enumerating all the possible things we can think of to check for. It will likely remain open for some time...

To start off:

  • Need to make sure SampleSheet.csv and Exp-SampleSheet.csv are both present :check
  • Make sure SampleSheet.csv uses correct naming for sample id and sample name :check
  • Make sure each Illumina (sample) barcode is represented in Exp sheet and that only those are represented :check
  • Make sure every name in the comparison sheet is in the Exp sheet. Actually, make sure of the iff: every Exp sheet name must also be in the comparison sheet. :check
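A minimal Python sketch of the checks in this first list, assuming the sheets have already been parsed into plain lists of strings. The function names are hypothetical, and the naming rule (Exp sheet names are strain-condition-rep, so stripping the last hyphen-separated field gives the name used in the comparison sheet) is an assumption based on the example names later in this thread.

```python
from pathlib import Path

REQUIRED_SHEETS = ("SampleSheet.csv", "Exp-SampleSheet.csv")


def check_required_sheets(exp_dir):
    """Report any of the two required sheets missing from the experiment directory."""
    return [f"missing {name}" for name in REQUIRED_SHEETS
            if not (Path(exp_dir) / name).is_file()]


def base_name(exp_name):
    # ASSUMPTION: Exp sheet names are strain-condition-rep (e.g. "AbC-WTNoAB-1"),
    # so dropping the last hyphen-separated field gives the strain-condition name.
    return exp_name.rpartition("-")[0] or exp_name


def iff_errors(label_a, set_a, label_b, set_b):
    """Report items found in only one of the two sets (both directions of the 'iff')."""
    errors = [f"{x!r} in {label_a} but not in {label_b}" for x in sorted(set_a - set_b)]
    errors += [f"{x!r} in {label_b} but not in {label_a}" for x in sorted(set_b - set_a)]
    return errors


def check_comparison_names(comparison_names, exp_names):
    """Comparison sheet names and Exp sheet strain-condition names must coincide."""
    return iff_errors("comparison sheet", set(comparison_names),
                      "Exp sheet", {base_name(n) for n in exp_names})


def check_illumina_barcodes(samplesheet_bcs, exp_illumina_bcs):
    """Every SampleSheet.csv Illumina barcode is in the Exp sheet, and only those."""
    return iff_errors("SampleSheet.csv", set(samplesheet_bcs),
                      "Exp-SampleSheet.csv", set(exp_illumina_bcs))
```

Using set differences in both directions is what turns the one-way "is in" check into the iff check.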

@jsa-aerial
Owner Author

  • Look into possibly using the sizes of the split fastq.gz files to warn of possible bad barcodes in SampleSheet and Exp-SampleSheet.

The idea here is that if some samples' fastq.gz files seem to be 'empty', it may well be due to incorrect library barcodes used in defining the samples.

  • As long as we are at this, we could also check the sizes of the bcl2fastq fastq.gz files. If these look 'empty', that could well indicate bad Illumina barcode specifications in SampleSheet.csv.
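A minimal sketch of the size heuristic in Python. Rather than a fixed byte cutoff, it flags files far smaller than the median size of their siblings; the directory argument, the *.fastq.gz glob, and the 5% fraction are hypothetical choices, not anything Aerobio defines.

```python
from pathlib import Path
from statistics import median


def flag_small(sizes, fraction=0.05):
    """Return names whose size is below fraction * median of all sizes.

    ASSUMPTION: 'fraction' is an arbitrary cutoff; a compressed fastq far
    smaller than its siblings is treated as possibly 'empty'.
    """
    if not sizes:
        return []
    cutoff = fraction * median(sizes.values())
    return sorted(name for name, size in sizes.items() if size < cutoff)


def suspicious_fastqs(fastq_dir, fraction=0.05):
    # HYPOTHETICAL layout: all split (or bcl2fastq) fastq.gz files in one directory.
    sizes = {p.name: p.stat().st_size for p in Path(fastq_dir).glob("*.fastq.gz")}
    return flag_small(sizes, fraction)
```

Comparing against the median keeps the warning independent of overall run depth; the tradeoff is that it cannot catch a run in which most samples came out empty.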

@jsa-aerial
Owner Author

jsa-aerial commented Jun 27, 2017

  • Need to check that the order of barcodes is correct in Exp-SampleSheet.csv. Illumina barcodes come before the library/sample barcode!!! :check
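One way to sketch the order check in Python: since the Illumina barcodes are already known from SampleSheet.csv, the first barcode column of each Exp sheet row should be one of them. The column layout (id, name, Illumina barcode, library barcode, no header row) is an assumption taken from the example row quoted later in this thread.

```python
import csv
import io


def check_barcode_order(exp_csv_text, illumina_bcs):
    """Flag Exp sheet rows whose first barcode is not an Illumina barcode.

    ASSUMPTION: data rows look like "12,AbC-WTNoAB-1,ATCGATCG,GAGGCAGAAGC"
    (id, name, Illumina barcode, library barcode) with no header row, and
    illumina_bcs is the set of index barcodes read from SampleSheet.csv.
    """
    errors = []
    for lineno, row in enumerate(csv.reader(io.StringIO(exp_csv_text)), start=1):
        if len(row) < 4:
            continue  # blank or short line; a real checker would report these too
        _id, name, first_bc, second_bc = (field.strip() for field in row[:4])
        if first_bc in illumina_bcs:
            continue
        if second_bc in illumina_bcs:
            errors.append(f"line {lineno} ({name}): barcodes look swapped; "
                          "the Illumina barcode must come first")
        else:
            errors.append(f"line {lineno} ({name}): {first_bc} is not an "
                          "Illumina barcode from SampleSheet.csv")
    return errors
```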

@jsa-aerial
Owner Author

jsa-aerial commented Sep 25, 2017

  • Make sure the names and numbers of replicates in comparisons match up (for RNASeq, TNSeq, and TermSeq; this is N/A for WGS and TranSeq) :check
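A rough Python sketch of one reading of this check: group Exp sheet replicate names by strain-condition, then require that both sides of every comparison exist and have the same number of replicates. Both the strain-condition-rep naming rule and the equal-count requirement are assumptions, and the comparison pairs are taken as already parsed into tuples.

```python
from collections import defaultdict


def replicates_by_condition(exp_names):
    # ASSUMPTION: names are strain-condition-rep, e.g. "AbC-WTNoAB-1".
    groups = defaultdict(set)
    for name in exp_names:
        base, _, rep = name.rpartition("-")
        groups[base].add(rep)
    return groups


def check_comparison_replicates(comparisons, exp_names):
    """comparisons: iterable of (condition_a, condition_b) pairs from the comparison sheet."""
    groups = replicates_by_condition(exp_names)
    errors = []
    for a, b in comparisons:
        for cond in (a, b):
            if cond not in groups:
                errors.append(f"{cond}: no replicates in Exp sheet")
        if a in groups and b in groups and len(groups[a]) != len(groups[b]):
            errors.append(f"{a} vs {b}: {len(groups[a])} vs {len(groups[b])} replicates")
    return errors
```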

@jsa-aerial
Owner Author

jsa-aerial commented Feb 13, 2018

  • Make sure that no replicate names repeat in Exp-SampleSheet.csv. This is for RNASeq, TNSeq, and TermSeq.

  • For WGSeq, make sure the sample names/IDs in SampleSheet.csv do not repeat.

Actually, making sure that sample names/IDs do not repeat in SampleSheet.csv should be done generally!! bcl2fastq will catch this, but it has no good way of relaying the error to Aerobio...

Generally, this sort of error is due to copy/pasting some strain-condition-repid where the repid is not incremented to reflect the replicates (they are all '1' or 'a' across several identical strain-conditions).
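A minimal Python sketch of the repeat check, which applies equally to Exp sheet replicate names and to SampleSheet.csv sample names/IDs. Running it before bcl2fastq lets Aerobio report the problem itself; the function name is hypothetical and the column extraction is assumed to happen elsewhere.

```python
from collections import Counter


def duplicate_names(names):
    """Map each name that occurs more than once to its count.

    Typical cause: a copy/pasted strain-condition-repid row whose repid
    was never incremented.
    """
    counts = Counter(name.strip() for name in names)
    return {name: count for name, count in counts.items() if count > 1}
```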

@jsa-aerial
Owner Author

jsa-aerial commented Apr 17, 2018

Make sure all names match, including case!! For example, 12,AbC-WTNoAB-1,ATCGATCG,GAGGCAGAAGC in the Exp sheet vs AbC-WTT0,AbC-WTNoAb,... in the Comparison sheet (NoAB vs NoAb)! :check
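A small Python sketch that turns a case-only mismatch into a pointed error message instead of a generic "name not found". It assumes the Exp sheet names have already been reduced to strain-condition form (replicate suffix removed); the function name is hypothetical.

```python
def case_mismatches(comparison_names, exp_base_names):
    """Map each comparison name that matches an Exp sheet name only when
    case is ignored to the Exp sheet spelling(s) it collides with."""
    exp = set(exp_base_names)
    by_lower = {}
    for name in exp:
        by_lower.setdefault(name.lower(), set()).add(name)
    result = {}
    for name in comparison_names:
        if name not in exp and name.lower() in by_lower:
            result[name] = sorted(by_lower[name.lower()])
    return result
```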

@jsa-aerial
Owner Author

jsa-aerial commented Jan 18, 2019

  • Make sure that all references cited in the sheets actually exist and are in the proper locations: GBKs and GTFs in /Refs, Bowtie 1 and 2 indices built and in place, normative gene lists in place, and FASTA files built and in place. :check
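A Python sketch of the reference check: build the list of files each reference should have and report the ones that are missing. The Bowtie index suffixes are the standard small-index names produced by bowtie-build and bowtie2-build; everything else (the ref_root argument, the Fasta and Index subdirectories, the .fna extension) is a hypothetical layout, and normative gene lists are left out because their naming is not described here.

```python
from pathlib import Path

# Standard small-index file suffixes written by bowtie-build / bowtie2-build.
BOWTIE1_SUFFIXES = [".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt", ".rev.1.ebwt", ".rev.2.ebwt"]
BOWTIE2_SUFFIXES = [".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2"]


def expected_reference_files(ref_root, ref_name):
    # HYPOTHETICAL layout: only "Refs" comes from this issue; the other
    # directory names and the .fna extension are placeholders.
    root = Path(ref_root)
    files = [root / "Refs" / f"{ref_name}.gbk",
             root / "Refs" / f"{ref_name}.gtf",
             root / "Fasta" / f"{ref_name}.fna"]
    files += [root / "Index" / "bt1" / (ref_name + s) for s in BOWTIE1_SUFFIXES]
    files += [root / "Index" / "bt2" / (ref_name + s) for s in BOWTIE2_SUFFIXES]
    return files


def missing_reference_files(ref_root, ref_names):
    """Return every expected file (as a string) that does not exist."""
    return [str(path) for name in ref_names
            for path in expected_reference_files(ref_root, name)
            if not path.is_file()]
```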

@jsa-aerial
Owner Author

OK, most of this is done in the new validation system. Only two items are not yet done: using fastq sizes to warn of possible barcode mix-ups, and checking for repeated replicate names. I'm not sure about the barcode check anymore anyway, as it can only be done once bcl2fastq and phase-0c (for the sample fqs) have run. So that is likely just out of scope.

So, I will close this and open a new issue just for the repeating replicate names.
