-
Notifications
You must be signed in to change notification settings - Fork 5
/
basic_usage_python.rst
281 lines (234 loc) · 11.7 KB
/
basic_usage_python.rst
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
***********************
Basic usage with python
***********************
The following steps show how to get started with :mod:`wellmap` in python:
.. make-list-from-sections::
1. Install wellmap
==================
Install :mod:`wellmap` from PyPI. Note that python≥3.6 is required:
.. code-block:: console
$ pip install wellmap
2. Describe the plate layout
============================
Write a `TOML file <file_format>` describing the layout of an experiment. For
example, the following layout might be used for a standard curve:
.. literalinclude:: basic_usage/std_curve.toml
:language: toml
:caption: :download:`std_curve.toml <basic_usage/std_curve.toml>`
3. Confirm the plate layout
===========================
Confirm that the layout is correct by using the :prog:`wellmap` command-line
program to produce a visualization of the layout. This is an important step,
because it's much easier to spot mistakes in the visualization than in the
layout file itself.
.. code-block:: console
$ wellmap std_curve.toml
This map shows that:
- Each row is a different replicate.
- Each column is a different dilution.
.. figure:: basic_usage/std_curve_map.svg
It is also possible to create maps like this directly from python, which may be
useful in interactive sessions such as Jupyter notebooks:
.. code-block:: pycon
>>> import wellmap
>>> wellmap.show("std_curve.toml")
<Figure size 320.75x255 with 4 Axes>
4. Prepare the data
===================
Load the data from the experiment in question into a tidy_ data frame. Tidy
data are easier to work with in general, and are required by :mod:`wellmap` in
particular. If you aren't familiar with the concept of tidy data, `this
article`__ is a good introduction. The basic idea is to ensure that:
__ https://tomaugspurger.github.io/modern-5-tidy
- Each variable is represented by a single column.
- Each observation is represented by a single row.
If possible, it's best to export data from the instrument that collected it
directly to a tidy format. When this isn't possible, though, you'll need to
tidy the data yourself. For example, consider the following data (which
corresponds to the layout from above). This is qPCR data, where a higher
:math:`C_q` value indicates that less material is present. The data are shaped
like the plate itself, e.g. a row in the data for every row on the plate, and a
column in the data for every column on the plate. It's not uncommon for
microplate instruments to export data in this format.
.. csv-table:: :download:`std_curve.csv <basic_usage/std_curve.csv>`
:file: basic_usage/std_curve.csv
:header-rows: 1
Below is the code to load this data into a `pandas.DataFrame` with the
following columns:
- *row*: A letter identifying a row on the microplate, e.g. A-H
- *col*: A number identifying a column on the microplate, e.g. 1-12
- *Cq*: The :math:`C_q` value measured for the identified well.
.. code-block:: pycon
>>> def load_cq(path):
>>> return (pd
>>> .read_csv(path)
>>> .rename(columns={'Cq': 'row'})
>>> .melt(
>>> id_vars=['row'],
>>> var_name='col',
>>> value_name='Cq',
>>> )
>>> )
>>> data = load_cq('std_curve.csv')
>>> data
row col Cq
0 A 1 24.180859
1 B 1 24.157118
2 C 1 24.238230
3 A 2 20.740120
4 B 2 20.779703
5 C 2 20.787008
6 A 3 17.183802
7 B 3 17.171795
8 C 3 17.147598
9 A 4 13.774300
10 B 4 13.768831
11 C 4 13.779314
12 A 5 10.294983
13 B 5 10.362967
14 C 5 10.292967
15 A 6 6.967062
16 B 6 6.870273
17 C 6 6.735704
5. Label the data
=================
Use `wellmap.load()` to associate the labels specified in the TOML file (e.g. the
dilutions and replicates) with the experimental data (e.g. the :math:`C_q`
values). This process has three steps:
- Load a data frame containing the data (see above).
- Load another data frame containing the labels.
- Merge the two data frames.
For the sake of clarity and completeness, we will first show how to perform
these steps `manually <#manual-merge>`__. Practically, though, it's easier to
let :mod:`wellmap` perform them `automatically <#automatic-merge>`__.
Manual merge
------------
Use the `wellmap.load()` function to create a `pandas.DataFrame` containing the
information from the TOML file. This data frame will have columns for each
label we specified: *replicate*, *dilution*. It will also have six columns
identifying the wells in different ways: *well*, *well0*, *row*, *col*,
*row_i*, *col_j*. These columns are redundant, but this redundancy makes it
easier to merge the labels with the data. For example, if the wells are named
"A1,A2,..." in the data, the *well* column can be used for the merge. If the
wells are named "A01,A02,...", the *well0* column can be used instead. If the
wells are named in some non-standard way, the *row_i* and *col_j* columns can
be used to calculate an appropriate merge column.
.. code-block:: pycon
>>> import wellmap
>>> labels = wellmap.load('std_curve.toml')
>>> labels
well well0 row col row_i col_j replicate dilution
0 A1 A01 A 1 0 0 1 100000.0
1 A2 A02 A 2 0 1 1 10000.0
2 A3 A03 A 3 0 2 1 1000.0
3 A4 A04 A 4 0 3 1 100.0
4 A5 A05 A 5 0 4 1 10.0
5 A6 A06 A 6 0 5 1 1.0
6 B1 B01 B 1 1 0 2 100000.0
7 B2 B02 B 2 1 1 2 10000.0
8 B3 B03 B 3 1 2 2 1000.0
9 B4 B04 B 4 1 3 2 100.0
10 B5 B05 B 5 1 4 2 10.0
11 B6 B06 B 6 1 5 2 1.0
12 C1 C01 C 1 2 0 3 100000.0
13 C2 C02 C 2 2 1 3 10000.0
14 C3 C03 C 3 2 2 3 1000.0
15 C4 C04 C 4 2 3 3 100.0
16 C5 C05 C 5 2 4 3 10.0
17 C6 C06 C 6 2 5 3 1.0
Use the :func:`pandas.merge` function to associate the labels with the data.
In this case, both data frames have columns named *row* and *col*, so
:mod:`pandas` will automatically use those for the merge. It is also easy to
merge using columns with different names; see the documentation on
:func:`pandas.merge` for more information.
.. code-block:: pycon
>>> import pandas as pd
>>> df = pd.merge(labels, data)
>>> df
well well0 row col row_i col_j replicate dilution Cq
0 A1 A01 A 1 0 0 1 100000.0 24.180859
1 A2 A02 A 2 0 1 1 10000.0 20.740120
2 A3 A03 A 3 0 2 1 1000.0 17.183802
3 A4 A04 A 4 0 3 1 100.0 13.774300
4 A5 A05 A 5 0 4 1 10.0 10.294983
5 A6 A06 A 6 0 5 1 1.0 6.967062
6 B1 B01 B 1 1 0 2 100000.0 24.157118
7 B2 B02 B 2 1 1 2 10000.0 20.779703
8 B3 B03 B 3 1 2 2 1000.0 17.171795
9 B4 B04 B 4 1 3 2 100.0 13.768831
10 B5 B05 B 5 1 4 2 10.0 10.362967
11 B6 B06 B 6 1 5 2 1.0 6.870273
12 C1 C01 C 1 2 0 3 100000.0 24.238230
13 C2 C02 C 2 2 1 3 10000.0 20.787008
14 C3 C03 C 3 2 2 3 1000.0 17.147598
15 C4 C04 C 4 2 3 3 100.0 13.779314
16 C5 C05 C 5 2 4 3 10.0 10.292967
17 C6 C06 C 6 2 5 3 1.0 6.735704
Automatic merge
---------------
While it's good to understand how the labels are merged with the data, it's
better to let :mod:`wellmap` perform the merge for you. Not only is this more
succinct, it also handles some tricky corner cases behind the scenes, e.g.
layouts with multiple data files.
To load *and* merge the data using :func:`wellmap.load`, you need to provide the
following arguments:
- **data_loader**: A function that accepts a path to a file and returns a
:class:`pandas.DataFrame` containing the data from that file. Note that the
function we wrote in the previous section fulfills these requirements. If
the raw data are tidy to begin with, it is often possible to directly use
:func:`pandas.read_csv` or similar for this argument.
- **merge_cols**: An indication of which columns to merge. In the snippet
below, ``True`` means to use any columns that are shared between the two data
frames (e.g. that have the same name). You can also use a dictionary to be
more explicit about which columns to merge on.
Here we also provide the **path_guess** argument, which specifies that the
experimental data can be found in a CSV file with the same base name as the
layout. It also would've been possible to specify the path to the CSV directly
from the TOML file (see `meta.path`), in which case this argument would've been
unnecessary.
.. code-block:: pycon
>>> df = wellmap.load(
... 'std_curve.toml',
... data_loader=load_cq,
... merge_cols=True,
... path_guess='{0.stem}.csv',
... )
>>> df
well well0 row ... replicate dilution Cq
0 A1 A01 A ... 1 100000.0 24.180859
1 A2 A02 A ... 1 10000.0 20.740120
2 A3 A03 A ... 1 1000.0 17.183802
3 A4 A04 A ... 1 100.0 13.774300
4 A5 A05 A ... 1 10.0 10.294983
5 A6 A06 A ... 1 1.0 6.967062
6 B1 B01 B ... 2 100000.0 24.157118
7 B2 B02 B ... 2 10000.0 20.779703
8 B3 B03 B ... 2 1000.0 17.171795
9 B4 B04 B ... 2 100.0 13.768831
10 B5 B05 B ... 2 10.0 10.362967
11 B6 B06 B ... 2 1.0 6.870273
12 C1 C01 C ... 3 100000.0 24.238230
13 C2 C02 C ... 3 10000.0 20.787008
14 C3 C03 C ... 3 1000.0 17.147598
15 C4 C04 C ... 3 100.0 13.779314
16 C5 C05 C ... 3 10.0 10.292967
17 C6 C06 C ... 3 1.0 6.735704
<BLANKLINE>
[18 rows x 10 columns]
6. Analyze the data
===================
Analyze the data given the connection between the labels and the data. This
step doesn't involve :mod:`wellmap`, but is included here for completeness.
The example below makes a linear regression of the data in log-space:
.. literalinclude:: basic_usage/std_curve.py
:language: python
:caption: :download:`std_curve.py <basic_usage/std_curve.py>`
.. figure:: basic_usage/std_curve_py.svg
R² is a measure of how well the line fits the data. In this case, the
fit is very good. Note that there are three data points for each
dilution, but they are hard to tell apart because they are almost perfectly
superimposed. Efficiency is a measure of how well the qPCR reaction worked,
or more specifically, how close the amount of DNA came to doubling (as would
be expected) on each cycle. 100% indicates perfect doubling; 94% is a
little on the low side.
.. _tidy: https://www.jstatsoft.org/article/view/v059i10