- I am not a geneticist, but my sister is. From time to time, I help her with data processing by writing small tools that reduce her drudgery work. This repo is one such example.
- I make no claims or warranties about the accuracy of this work or the choice of formulas. I am open sourcing it in case it is useful for others.
- I have only tested this on a Mac with pyenv and Anaconda and with files from my sister.
- I am not a UI person. ;-)
- There is very little error handling and no unit tests.
A simple user interface and processing layer to concatenate together large numbers of files that come from the QIAGEN QIAcuity instrument and do some basic calculations on them.
It is primarily designed to be run locally and interacted with via your browser.
- Adds Plate ID to both analysis and occupancy files based off the file name pattern described in the Using section
- Concatenates together any number of analysis files generated by the QIAGEN QIAcuity instrument
- Concatenates together any number of multiple occupancy files generated by the QIAGEN QIAcuity instrument
- Fixes the empty "Well" column header in the Analysis files
- Updates/adds/modifies the following fields and adds them to the output (see "Explanation of Column Names" below for more details):
  - Analysis
    - Replaces "-" in the `CI (95%)` column with NaN so that computations don't fail
    - Turns `CI (95%)` into a float instead of a "string percentage"
    - Adds columns for the user-supplied `Upstream DF` (dilution factor), `uL into Reaction`, and `Reaction Volume` fields
    - Calculates:
      - `Concentration in Sample tube (cp/uL)` (computed from `Concentration (copies/µL)`)
      - `95% CI (cp/uL)`
      - `Valid Partitions (%)`
      - `E`
      - `Lambda`
      - `0.00` (don't ask, I don't fully understand this)
      - `1.00` (don't ask, I don't fully understand this)
      - `+1` (don't ask, I don't fully understand this)
      - `Partitions with 0 molecules`
      - `Partitions with 1 molecules`
      - `Partitions with >1 molecules`
  - Occupancy (Multiple Occupancy?)
    - Replaces Category values (e.g. `YELLOW-CRIMSON-RED`) with user-provided assay names
    - Joins with a paired analysis file to add the `Sample/NTC/Control` column and values. This is a "left" join using the "Plate ID" and "Well" as a primary key.
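As a rough illustration of the `CI (95%)` cleanup described in the list above, here is a minimal Pandas sketch. The function name `clean_ci_column` is hypothetical, and the sketch assumes the raw values look like `"12.5%"` and that the float should stay in percent units (12.5, not 0.125); the real code in this repo may differ.

```python
import pandas as pd


def clean_ci_column(df: pd.DataFrame, column: str = "CI (95%)") -> pd.DataFrame:
    """Return a copy of df where `column` holds floats and "-" becomes NaN.

    Hypothetical helper for illustration only.
    """
    out = df.copy()
    text = out[column].astype(str).str.strip()
    # Treat the "-" placeholder as missing so later arithmetic doesn't fail.
    text = text.mask(text == "-")
    # Drop the trailing percent sign, e.g. "12.5%" -> "12.5".
    text = text.str.rstrip("%")
    # Anything that still isn't numeric is coerced to NaN as well.
    out[column] = pd.to_numeric(text, errors="coerce")
    return out
```

Because the helper works on a copy, the original data frame keeps its raw strings, which can be handy when debugging a file that doesn't parse as expected.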
- Python 3.x (I've only tested on 3.9.7)
- Familiarity with using the command line.
- Some way of managing your Python virtual environments or otherwise installing Python dependencies:
pyenv install 3.9.7
pyenv virtualenv 3.9.7 qiacuity
- Clone this repository and change to the directory:
git clone git@github.com:gsingers/qiacuity-concatenator.git
cd qiacuity-concatenator
pyenv activate qiacuity
pip install -r requirements.txt
- If you are using a Python Virtual Environment, be sure to activate it first (I recommend pyenv, see above). If you don't know what that is, don't worry about it, it should "just work".
pyenv activate qiacuity
- If you want to use the default data locations (in the `data` directory under the current directory): `./run.sh`
- If you have your data somewhere else, you can pass in the folder locations like:
./run.sh -u /path/to/upload/folder -c /path/to/completed/folder -r /path/to/results/folder
- For example:
./run.sh -u /tmp/data/uploads -r /tmp/data/results -c /tmp/data/completed
- Point your browser at http://localhost:5000
The basic workflow is:
- Provide input data, using one of two ways:
  - Upload the files you want to merge. Two CSV header types are supported:
    - Analysis files. The header must be: `"","Sample/NTC/Control","Reaction Mix","Target","IC","Control type","Concentration (copies/µL)","CI (95%)","Partitions (valid)","Partitions (positive)","Partitions (negative)","Threshold"`
      - Note: the empty first column name is the Well. The program will try to auto-detect and replace that.
    - Occupancy files. The header must be: `"Well","Hyperwell","Categories","Group","Count","Total","Volume"`
  - Since this is running locally on your machine, you can also bulk copy data into the `./data/uploads` directory, as in `cp /path/to/csv_files /path/to/this/project/data/uploads`, or using whatever file viewer tool you want (e.g. Finder on the Mac).
  - IMPORTANT: All files must be of the format:
    - Analysis files: `<PLATE_ID>-[USER DEFINED]-analysis.csv`, e.g. `D123433F-My-Customer-analysis.csv`
    - Occupancy files: `<PLATE_ID>-[USER DEFINED].csv`, e.g. `D123433F-My-Customer.csv`
    - IMPORTANT: For occupancy files, it is assumed there is a matching "Analysis" file which we can join on to extract the Sample/NTC/Control (see below) value. If the program can't find the matching file, it will return a 400 error code.
    - IMPORTANT: Files must be encoded either as `Windows-1252` or `UTF-8`.
- Click the `Start Concatenator` link (e.g. http://localhost:3000/concatenate/select_files)
- Fill in the form values, select the files you want to process, and hit submit
- Your results will be in `data/results`. You can download them from the app or access them via your file viewer or the command line.
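To make the file naming rules above concrete, here is a small standard-library sketch of how a Plate ID could be pulled from a file name and how an occupancy file could be paired with its analysis file. All function names are hypothetical. It assumes the Plate ID is everything before the first `-` (so the Plate ID itself contains no hyphen) and that pairing is done on Plate ID alone; the actual matching logic in this repo may be different.

```python
from pathlib import Path
from typing import List, Optional

ANALYSIS_SUFFIX = "-analysis.csv"


def plate_id_from_filename(filename: str) -> str:
    """Return the text before the first '-' in the file name (assumed Plate ID)."""
    name = Path(filename).name
    plate_id, sep, _rest = name.partition("-")
    if not sep or not plate_id:
        raise ValueError(f"no plate ID prefix in {name!r}")
    return plate_id


def is_analysis_file(filename: str) -> bool:
    """True if the name ends in -analysis.csv, per the naming convention."""
    return Path(filename).name.lower().endswith(ANALYSIS_SUFFIX)


def find_matching_analysis(occupancy_file: str, candidates: List[str]) -> Optional[str]:
    """Return the first analysis file with the same Plate ID, or None."""
    plate_id = plate_id_from_filename(occupancy_file)
    for candidate in candidates:
        if is_analysis_file(candidate) and plate_id_from_filename(candidate) == plate_id:
            return candidate
    return None
```

A `None` result here corresponds to the situation where the app reports a 400 error because no paired analysis file was found.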
The main work of merging and calculating values is done in `concatenate.py`, in the `process_analysis_file` and `process_occupancy_file` methods, where we add things like the Plate ID and Sample/NTC/Control, rename some missing columns, and calculate some statistics. All of this work is done in Pandas, in case you want to change what is calculated.
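If you do want to tinker with the occupancy processing, the "left" join on Plate ID and Well might look roughly like the sketch below. The helper name `add_sample_names` is hypothetical and this is not copied from `concatenate.py`. It assumes both data frames already have a `Plate ID` column, and it de-duplicates the analysis rows first, since analysis files report multiplexed assays on separate lines for the same well.

```python
import pandas as pd


def add_sample_names(occupancy: pd.DataFrame, analysis: pd.DataFrame) -> pd.DataFrame:
    """Left-join Sample/NTC/Control onto occupancy rows by (Plate ID, Well)."""
    keys = ["Plate ID", "Well"]
    # Keep one sample name per (plate, well) so the join doesn't multiply rows.
    names = analysis[keys + ["Sample/NTC/Control"]].drop_duplicates(subset=keys)
    # how="left" keeps every occupancy row, even when no analysis row matches.
    return occupancy.merge(names, on=keys, how="left")
```

Occupancy rows with no matching analysis row end up with NaN in `Sample/NTC/Control` rather than being dropped.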
This program does very little file management. We move processed files under the "COMPLETED_FOLDER", but that's about it.
In order to declutter the file listings, you should periodically move the files out of the `data` directory.
- Well: 24 well plates are A01-H03; 96 well plates are A01-H12. Samples are loaded in column order.
- Sample/NTC/Control: The name of the Sample, NTC, or Control in the well.
- Reaction Mix: This identifies the combination of assays used for the sample in question. See Assays used section for details.
- Target: The name of the assay being reported. If there are multiplexed assays, the results of each assay will be reported on a separate line.
- IC: This column is not used
- Type: Identifies Samples vs. controls (This was not annotated in this file)
- Concentration in Reaction Mix (cp/ul): This is the number of copies per microliter in the mix as it was loaded onto the instrument. It is calculated based on the number of valid partitions. This value has been corrected by Poisson, but has not had the original dilution factor added to it.
- 95% CI (%): Confidence interval for the concentration.
- Partitions (Valid): The number of partitions that were filled with master mix
- Partitions (Positive): The number of partitions that have signal*
- Partitions (Negative): The number of partitions that do not have signal*
- Threshold: The rfu value setting which distinguishes positive from negative partitions.
- Plate ID: Our internal reference name for the run plate
- Upstream DF: Dilution factor prior to mixing sample with mastermix
- uL into Reaction: The amount of (diluted) template added to the mastermix
- Reaction Volume: The volume of reaction mix added to the plate
- Concentration in Sample Tube: The concentration in the original sample, prior to dilution. This is the number that should be used for evaluating the data.
- 95% CI (cp/ul): Confidence interval for the original sample concentration
- Valid Partitions (%): The percent of the total possible partitions that were filled
- E: The ratio of Positive Partitions/Valid Partitions
- Lambda: The -LN of E. Results with a lambda less than 0.01 are likely to contain only one molecule in each partition. This is important when looking at multiple occupancy. These results are highlighted in green.
- Fraction with (0, 1, >1) molecule(s): These columns are the fraction of partitions with 0, 1, or >1 molecules/partition. (Calculation:
- Expected partitions with 0 molecules: The number of partitions predicted to be negative
- Expected partitions with 1 molecule: The number of partitions predicted to have one molecule
- Expected partitions with >1 molecule: The number of partitions predicted to have more than one molecule.
Note: These numbers are the raw counts and have not been adjusted by Poisson. The adjusted numbers (not provided) are the numbers used to calculate the concentrations. These values should not be used, as they will be statistically incorrect.
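For readers wondering where the 0, 1, and >1 molecule columns above could come from, here is a sketch of the textbook Poisson relationship, treating Lambda as the average number of molecules per partition. The function name and dictionary keys are hypothetical, and since the calculation is not spelled out in this README, this is the standard formula rather than a confirmed copy of what the tool computes.

```python
import math


def poisson_occupancy(lam: float, valid_partitions: int) -> dict:
    """Expected fractions and counts of partitions holding 0, 1, or >1 molecules.

    Assumes molecules land in partitions following a Poisson distribution
    with mean `lam` molecules per partition.
    """
    p0 = math.exp(-lam)        # P(k = 0)
    p1 = lam * math.exp(-lam)  # P(k = 1)
    p_multi = 1.0 - p0 - p1    # P(k > 1)
    return {
        "fraction_0": p0,
        "fraction_1": p1,
        "fraction_gt1": p_multi,
        "expected_0": p0 * valid_partitions,
        "expected_1": p1 * valid_partitions,
        "expected_gt1": p_multi * valid_partitions,
    }
```

With a small Lambda, almost every occupied partition holds a single molecule, which is why a low Lambda matters when looking at multiple occupancy.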
- Well: 24 well plates are A01-H03; 96 well plates are A01-H12. Samples are loaded in column order.
- Hyperwell: An indicator if the data is from combining multiple wells.
- Sample Name: Your sample name
- Categories: Description of the order of assays for the “group” column
- Group: An indication of which assays are giving signal for the given row. (i.e. ++++ indicates all four assays are giving signal; +- -+ means the signal being reported is from the first assay and the last assay, etc.)
- Count: The number of partitions positive for the group
- Total: The total number of valid partitions
- Volume: The total volume contained in the valid partitions
- Plate ID: Our internal reference name for the run plate
- Sample ID: The name you provided for your sample