
A small tool I wrote for my sister, who is a geneticist, to concatenate large numbers of files that come from the QIAGEN QIAcuity instrument, do some basic calculations on them, and add in things like dilution factors.


gsingers/qiacuity-concatenator


QIAcuity Concatenator

CAVEAT EMPTOR

  1. I am not a geneticist, but my sister is. From time to time, I help her with data processing by writing small tools that reduce her drudgery work. This repo is one such example.
  2. I make no claims or warranties about the accuracy of this work or the choice of formulas. I am open sourcing it in case it is useful for others.
  3. I have only tested this on a Mac with pyenv and Anaconda and with files from my sister.
  4. I am not a UI person. ;-)
  5. There is very little error handling and no unit tests.

Intro

A simple user interface and processing layer to concatenate together large numbers of files that come from the QIAGEN QIAcuity instrument and do some basic calculations on them.

It is primarily designed to be run locally and interacted with via your browser.

Features

  • Adds Plate ID to both analysis and occupancy files based off the file name pattern described in the Using section
  • Concatenates together any number of analysis files generated by the QIAGEN QIAcuity instrument
  • Concatenates together any number of multiple occupancy files generated by the QIAGEN QIAcuity instrument
  • Fixes the empty "Well" column header in the Analysis files
  • Updates, adds, or modifies the following fields and includes them in the output (see "Explanation of Column Names" below for more details):
    • Analysis
      • Replaces "-" in the CI (95%) column with NAN so that computations don't fail
      • Turns CI (95%) into a float instead of a "string percentage"
      • Adds columns for the user supplied Upstream DF (dilution factor), uL into Reaction, Reaction Volume fields
      • Calculates:
        • Concentration in Sample tube (cp/uL)
        • 95% CI (cp/uL)
        • Valid Partitions (%)
        • E
        • Lambda
        • 0.00 (don't ask, I don't fully understand this)
        • 1.00 (don't ask, I don't fully understand this)
        • +1 (don't ask, I don't fully understand this)
        • Partitions with 0 molecules
        • Partitions with 1 molecules
        • Partitions with >1 molecules
    • Occupancy (Multiple Occupancy?)
      • Replaces Category values (e.g. YELLOW-CRIMSON-RED) with user-provided assay names
      • Joins with a paired analysis file to add the Sample/NTC/Control column and values. This is a "left" join using the "Plate ID" and "Well" as a primary key.
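
As a rough illustration of two of the Analysis steps listed above, here is a minimal sketch using only the Python standard library (the tool itself does this work in Pandas). The function names are hypothetical, the "12.5%" cell format is an assumption about what the instrument exports, and the back-calculation formula is a common way to undo dilutions, not necessarily the formula used in concatenate.py.

```python
import math


def parse_ci_percent(raw):
    """Turn a CI (95%) cell such as "12.5%" into a float; "-" or blank becomes NaN."""
    text = raw.strip()
    if text in ("", "-"):
        return math.nan
    return float(text.rstrip("%"))


def sample_tube_concentration(conc_in_reaction, upstream_df, ul_into_reaction, reaction_volume):
    """Back-calculate copies/uL in the original sample tube (assumed formula)."""
    # Dilution that happens inside the reaction: template volume vs. total volume.
    reaction_dilution = reaction_volume / ul_into_reaction
    # Multiply back through both dilution steps.
    return conc_in_reaction * reaction_dilution * upstream_df


ci = parse_ci_percent("12.5%")
missing = parse_ci_percent("-")
conc = sample_tube_concentration(100.0, upstream_df=10, ul_into_reaction=5, reaction_volume=40)
```

Here `missing` is NaN, so later arithmetic on that row yields NaN instead of raising an error.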

Prerequisites

  1. Python 3.x (I've only tested on 3.9.7)
  2. Familiarity with using the command line.

Some way of managing your Python virtual environments or otherwise installing Python dependencies:

  1. Pyenv
  2. Pyenv-virtualenv

Installing and Running

Pyenv (Optional)

  1. pyenv install 3.9.7
  2. pyenv virtualenv 3.9.7 qiacuity

From the Command Line

One Time Setup

  1. Clone this repository and change to the directory:
    1. git clone git@github.com:gsingers/qiacuity-concatenator.git
    2. cd qiacuity-concatenator
  2. pyenv activate qiacuity
  3. pip install -r requirements.txt
    1. If you are using a Python Virtual Environment, be sure to activate it first (I recommend pyenv, see above). If you don't know what that is, don't worry about it, it should "just work".

Running

  1. pyenv activate qiacuity
  2. If you want to use the default data locations (in the data directory under the current directory): ./run.sh
  3. If you have your data somewhere else, you can pass in the folder locations like: ./run.sh -u /path/to/upload/folder -c /path/to/completed/folder -r /path/to/results/folder
    1. For example: ./run.sh -u /tmp/data/uploads -r /tmp/data/results -c /tmp/data/completed

In your browser

  1. http://localhost:5000

Using

The basic workflow is:

  1. Provide input data, using one of two ways:
    1. Upload the files you want to merge. Two CSV header types are supported:
      1. Analysis files. The header must be: "","Sample/NTC/Control","Reaction Mix","Target","IC","Control type","Concentration (copies/muL)","CI (95%)","Partitions (valid)","Partitions (positive)","Partitions (negative)","Threshold"
        1. Note: the empty first column name is the Well. The program will try to auto-detect and replace that.
      2. Occupancy files. The header must be: "Well","Hyperwell","Categories","Group","Count","Total","Volume"
    2. Since this is running locally on your machine, you can also bulk copy data into the ./data/uploads directory, as in cp /path/to/csv_files /path/to/this/project/data/uploads or using whatever file viewer tool you want (e.g. Finder on the Mac)
    3. IMPORTANT: All files must be of the format:
      1. Analysis files: <PLATE_ID>-[USER DEFINED]-analysis.csv, e.g. D123433F-My-Customer-analysis.csv
      2. Occupancy Files: <PLATE_ID>-[USER DEFINED].csv e.g. D123433F-My-Customer.csv
      3. IMPORTANT: For occupancy files, it is assumed there is a matching "Analysis" file which we can join on to extract the Sample/NTC/Control (see below) value. If the program can't find the matching file, it will return a 400 error code.
      4. IMPORTANT: Files must be encoded either as Windows-1252 or UTF-8.
  2. Click the Start Concatenator link (e.g. http://localhost:5000/concatenate/select_files)
  3. Fill in the form values and select the files you want to process and hit submit
  4. Your results will be in data/results and you can download from the app or you can access them via your file viewer or the command line.
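
To make the file naming and encoding rules above concrete, here is a small sketch of how a file name could be split into a Plate ID and a file type, and how file contents could be decoded as UTF-8 with a Windows-1252 fallback. Both function names are hypothetical and this is not taken from the tool's code; it only mirrors the conventions described in this section.

```python
from pathlib import Path


def parse_upload_name(filename):
    """Return (plate_id, kind) for names like D123433F-My-Customer-analysis.csv."""
    path = Path(filename)
    if path.suffix.lower() != ".csv":
        raise ValueError(f"not a CSV file: {filename}")
    stem = path.stem
    # The Plate ID is everything before the first dash.
    plate_id = stem.split("-", 1)[0]
    # Analysis files end in "-analysis"; anything else is treated as occupancy.
    kind = "analysis" if stem.endswith("-analysis") else "occupancy"
    return plate_id, kind


def decode_csv_bytes(data):
    """Decode raw file bytes as UTF-8, falling back to Windows-1252."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("cp1252")


analysis = parse_upload_name("D123433F-My-Customer-analysis.csv")
occupancy = parse_upload_name("D123433F-My-Customer.csv")
```

With this naming scheme, an occupancy file and its paired analysis file share the same Plate ID, which is what makes the later join possible.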

Changing formulas

The main work of merging and calculating values is done in concatenate.py, in the process_analysis_file and process_occupancy_file methods, where we add things like the Plate ID and Sample/NTC/Control, rename some missing columns, and calculate some statistics. All of this work is done in Pandas, in case you want to change what is calculated.

Cleaning out old files

This program does very little file management. We move processed files under the "COMPLETED_FOLDER", but that's about it.
In order to declutter the file listings, you should periodically move the files out of the data directory.
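
If you would rather script the cleanup than do it by hand, something like the following sketch would work. The function name is hypothetical and the paths in the usage comment are placeholders; point them at whatever folders you actually use.

```python
import shutil
from pathlib import Path


def archive_files(source_dir, archive_dir):
    """Move every regular file from source_dir into archive_dir; return the moved names."""
    source = Path(source_dir)
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(source.iterdir()):
        if path.is_file():  # leave sub-directories alone
            shutil.move(str(path), str(archive / path.name))
            moved.append(path.name)
    return moved


# Example (placeholder paths):
# archive_files("data/completed", "/path/to/archive")
```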

Explanation of Column Names

Explanation of columns in the Analysis Concatenated Data file:

  • Well: 24 well plates are A01-H03; 96 well plates are A01-H12. Samples are loaded in column order.
  • Sample/NTC/Control: The name of the Sample, NTC, or Control in the well.
  • Reaction Mix: This identifies the combination of assays used for the sample in question. See Assays used section for details.
  • Target: The name of the assay being reported. If there are multiplexed assays, the results of each assay will be reported on a separate line.
  • IC: This column is not used
  • Type: Identifies Samples vs. controls (This was not annotated in this file)
  • Concentration in Reaction Mix (cp/ul): This is the number of copies per microliter in the mix as it was loaded onto the instrument. It is calculated based on the number of valid partitions. This value has been corrected by Poisson, but has not had the original dilution factor added to it.
  • 95% CI (%): Confidence interval for the concentration.
  • Partitions (Valid): The number of partitions that were filled with master mix
  • Partitions (Positive): The number of partitions that have signal*
  • Partitions (Negative): The number of partitions that do not have signal*
  • Threshold: The rfu value setting which distinguishes positive from negative partitions.
  • Plate ID: Our internal reference name for the run plate
  • Upstream DF: Dilution factor prior to mixing sample with mastermix
  • uL into Reaction: The amount of (diluted) template added to the mastermix
  • Reaction Volume: The volume of reaction mix added to the plate
  • Concentration in Sample Tube: The concentration in the original sample, prior to dilution. This is the number that should be used for evaluating the data.
  • 95% CI (cp/ul): Confidence interval for the original sample concentration
  • Valid Partitions (%): The percent of the total possible partitions that were filled
  • E: The ratio of Positive Partitions/Valid Partitions
  • Lambda: The -LN of E. Results with a lambda less than 0.01 are likely to contain only one molecule in each partition. This is important when looking at multiple occupancy. These results are highlighted in green.
  • Fraction with (0, 1, >1) molecule(s): These columns are the fraction of partitions with 0, 1, or >1 molecules/partition. (Calculation:
  • Expected partitions with 0 molecules: The number of partitions predicted to be negative
  • Expected partitions with 1 molecule: The number of partitions predicted to have one molecule
  • Expected partitions with >1 molecule: The number of partitions predicted to have more than one molecule.

Note: These numbers are the raw counts and have not been adjusted by Poisson. The adjusted numbers (not provided) are the numbers used to calculate the concentrations. These values should not be used, as they will be statistically incorrect.
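
For readers who want to see how occupancy numbers like these can be derived, here is a sketch based on the textbook Poisson model for digital PCR. Note that it computes lambda as -ln(1 - positive/valid), i.e. from the fraction of negative partitions, which is the standard form; this is an assumption on my part and may not match the exact formula the tool uses. The function name is hypothetical.

```python
import math


def poisson_occupancy(valid, positive):
    """Estimate occupancy fractions and expected partition counts (textbook Poisson)."""
    if valid <= 0 or not 0 <= positive < valid:
        raise ValueError("need valid > 0 and 0 <= positive < valid")
    # Mean number of molecules per partition, from the negative fraction.
    lam = -math.log(1 - positive / valid)
    frac_0 = math.exp(-lam)        # P(0 molecules)
    frac_1 = lam * frac_0          # P(exactly 1 molecule)
    frac_gt1 = 1 - frac_0 - frac_1  # P(more than 1 molecule)
    return {
        "lambda": lam,
        "fraction_0": frac_0,
        "fraction_1": frac_1,
        "fraction_gt1": frac_gt1,
        "expected_0": valid * frac_0,
        "expected_1": valid * frac_1,
        "expected_gt1": valid * frac_gt1,
    }


result = poisson_occupancy(valid=8000, positive=800)
```

With 800 positives out of 8000 valid partitions, lambda comes out to roughly 0.105, and most positive partitions are predicted to hold a single molecule.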

Explanation of columns in the Occupancy Concatenated Data file:

  • Well: 24 well plates are A01-H03; 96 well plates are A01-H12. Samples are loaded in column order.
  • Hyperwell: An indicator if the data is from combining multiple wells.
  • Sample Name: Your sample name
  • Categories: Description of the order of assays for the “group” column
  • Group: An indication of which assays are giving signal for the given row. (i.e. ++++ indicates all four assays are giving signal; +- -+ means the signal being reported is from the first assay and the last assay, etc.)
  • Count: The number of partitions positive for the group
  • Total: The total number of valid partitions
  • Volume: The total volume contained in the valid partitions
  • Plate ID: Our internal reference name for the run plate
  • Sample ID: The name you provided for your sample
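
The sample name in the occupancy output comes from joining each occupancy row with its paired analysis file on "Plate ID" and "Well". The sketch below shows the idea of that left join with plain dictionaries (the tool itself uses Pandas). The function name is hypothetical, and taking the first matching analysis row per well is my own simplification, since an analysis file can have several rows per well (one per target).

```python
def add_sample_names(occupancy_rows, analysis_rows):
    """Left-join occupancy rows to analysis rows on (Plate ID, Well)."""
    lookup = {}
    for row in analysis_rows:
        key = (row["Plate ID"], row["Well"])
        # Keep the first sample name seen for each well.
        lookup.setdefault(key, row["Sample/NTC/Control"])
    joined = []
    for row in occupancy_rows:
        merged = dict(row)  # copy so the input rows are not modified
        key = (row["Plate ID"], row["Well"])
        # Left join: unmatched occupancy rows are kept, with None as the name.
        merged["Sample/NTC/Control"] = lookup.get(key)
        joined.append(merged)
    return joined


analysis_rows = [
    {"Plate ID": "D123433F", "Well": "A01", "Sample/NTC/Control": "Sample 1", "Target": "T1"},
    {"Plate ID": "D123433F", "Well": "A01", "Sample/NTC/Control": "Sample 1", "Target": "T2"},
]
occupancy_rows = [
    {"Plate ID": "D123433F", "Well": "A01", "Group": "++", "Count": "12"},
    {"Plate ID": "D123433F", "Well": "B01", "Group": "+-", "Count": "3"},
]
joined = add_sample_names(occupancy_rows, analysis_rows)
```

Deduplicating by well before joining keeps the output at one row per occupancy row, rather than one per occupancy row per target.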
