docs: explain **merge_cols** more clearly
kalekundert committed Sep 24, 2020
1 parent ecd7577 commit 42c1d03
Showing 2 changed files with 50 additions and 44 deletions.
5 changes: 5 additions & 0 deletions docs/_static/css/corrections.css
@@ -13,6 +13,11 @@
margin-bottom: 12px;
}

.field-list p {
margin-bottom: 12px !important;
}


/* Try to show the example code side-by-side with the image. */

.rst-content .wellmap-example {
89 changes: 45 additions & 44 deletions wellmap/file.py
@@ -54,55 +54,56 @@ def load(toml_path, *, data_loader=None, merge_cols=None, path_guess=None,
**path_required** is True.
:param bool,dict merge_cols:
Indicates that `load()` should attempt to merge the plate layout and
the actual data associated with it into a single data frame. This
functionality requires several conditions to be met:
1. The **data_loader** argument must be specified (otherwise there'd be
no data to merge).
2. The data frame returned by **data_loader()** must be `"tidy"`__.
Briefly, a data frame is tidy if each of its columns represents a
single variable (e.g. time, fluorescence) and each of its rows
represents a single observation.
__ http://vita.had.co.nz/papers/tidy-data.html
3. The data frame returned by **data_loader()** must have one (or more)
columns/variables indicating which well each row/observation comes
from. For example, a column called "Well" with values like "A1",
"A2", "B1", "B2", etc. would satisfy this requirement.
The **merge_cols** argument specifies which columns to use when merging
the data frames representing the layout and the actual data (i.e. the
two data frames that would be returned if **data_loader** was specified
but **merge_cols** was not) into one. The argument can either be a
bool or a dictionary:
Indicates whether or not---and if so, how---`load()` should merge the
data frames representing the plate layout and the actual data (provided
by **data_loader**). The argument can either be a boolean or a
dictionary:
If *False* (or falsey, e.g. ``None``, ``{}``, etc.), the data frames
will be returned separately and not be merged. This is the default
behavior.
If *True*, the data frames will be merged using any columns that share
the same name. For example, the layout will always have a column named
*well*, so if the actual data also has a column named *well*, the merge
would happen on those columns.
If a dictionary, the keys and values identify the names of the columns
that correspond with each other for the purpose of merging. Each key
should be one of the columns from the data frame representing the
layout loaded from the TOML file. This data frame has 8 columns which
identify the wells: *plate*, *path*, *well*, *well0*, *row*, *col*,
*row_i*, *row_j*. See the "Returns" section below for more details on
the differences between these columns. Note that the *path* column is
included in the merge automatically and never has to be specified.
Each value should be one of the columns from the data frame
representing the actual data. This data frame will have whatever
columns were created by **data_loader()**.
Note that the columns named in each key-value pair must contain values
that correspond exactly (i.e. not "A1" and "A01"). It is the
responsibility of **data_loader()** to ensure that this is possible.
If *True*, the data frames will be merged using any columns that
share the same name. For example, the layout will always have a
column named *well*, so if the actual data also has a column named
*well*, the merge would happen on those columns.
If a dictionary, the data frames will be merged using the columns
identified in each key-value pair of the dictionary. The keys should
be column names from the data frame representing the plate layout
(described below; see the **layout** return value), and the values
should be column names from the data frame returned by
**data_loader**. Below are some examples of this argument:
- :code:`{'well0': 'Well'}`: Indicates that the "Well" column in the
data contains zero-padded well names, like "A01", "A02", etc.
- :code:`{'row_i': 'Row', 'col_j': 'Col'}`: Indicates that the 'Row'
and 'Col' columns in the data contain 0-indexed coordinates (e.g. 0,
1, 2, ...) identifying each row and column, respectively.
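As a rough illustration of what a dictionary-style merge amounts to, the sketch below uses plain pandas rather than wellmap itself. Both data frames and the `conc` column are hypothetical stand-ins, and the `pd.merge()` call is only an assumption about how such a merge could be expressed, not a description of how `load()` is implemented.

```python
import pandas as pd

# Hypothetical stand-in for the layout data frame (real layouts come
# from load() and have more columns than shown here).
layout = pd.DataFrame({
    'well0': ['A01', 'A02'],
    'row_i': [0, 0],
    'col_j': [0, 1],
    'conc': [0.0, 1.0],  # hypothetical experimental parameter
})

# Hypothetical tidy data, shaped like something a data_loader might
# return: one row per observation, with a column naming the well.
data = pd.DataFrame({
    'Well': ['A01', 'A01', 'A02', 'A02'],
    'time': [0, 1, 0, 1],
    'fluorescence': [10, 12, 20, 25],
})

# Keys name layout columns; values name data columns.
merge_cols = {'well0': 'Well'}

# Pair up the columns named in each key-value pair.
merged = pd.merge(
    layout, data,
    left_on=list(merge_cols.keys()),
    right_on=list(merge_cols.values()),
)
```

Each observation in `data` ends up on the same row as the layout parameters for its well, which is the point of merging.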
Some details and caveats:
- In order to successfully merge two columns, the values in those
columns must correspond exactly. For example, a column that contains
unpadded well names like "A1" cannot be merged with a column that
contains padded well names like "A01". This is why the **layout**
data frame contains so many redundant columns: to increase the chance
that one will correspond with a column provided by the data. In some
cases, though, it may be necessary for the **data_loader** function
to construct an appropriate merge column.
- The data frame returned by **data_loader()** must be `"tidy"`__.
Briefly, a data frame is tidy if each of its columns represents a
single variable (e.g. time, fluorescence) and each of its rows
represents a single observation.
__ http://vita.had.co.nz/papers/tidy-data.html
- The *path* column of the layout is automatically included in the
merge and never has to be specified (although it is not an error to
do so). This makes sense because `load()` itself knows what path
each data frame was loaded from.
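When no layout column matches the data as-is, a **data_loader** function could construct a suitable merge column itself. The helper below is a minimal sketch of one way to do that, assuming well names consist of letters followed by digits; `pad_well` is a hypothetical name, not part of wellmap.

```python
import re

def pad_well(name, digits=2):
    """Convert an unpadded well name like 'A1' to a padded one like 'A01'."""
    match = re.fullmatch(r'([A-Za-z]+)(\d+)', name)
    if match is None:
        raise ValueError(f"not a well name: {name!r}")
    row, col = match.groups()
    # Upper-case the row letters and zero-pad the column number.
    return f"{row.upper()}{int(col):0{digits}d}"
```

A loader could apply this to its well column (for example with `df['Well'].map(pad_well)` on a pandas data frame) so that the result can be merged against the layout's *well0* column.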
:param str path_guess:
Where to look for a data file if none is specified in the given TOML
