1 parent 3e914ae, commit cfd9bbc. Showing 9 changed files with 317 additions and 11 deletions.
**Input Data Format**
=====================

Gene Expression Data Format

The input gene expression data is expected in one of the following formats:

1. A spreadsheet of comma-separated values (``csv``) containing a condensed matrix with the columns ``('cell', 'gene', 'expr')``.
   If the data contains batches, the columns must be ``('batch', 'cell', 'gene', 'expr')``. The column order can be arbitrary.

   +------+------+------+
   | cell | gene | expr |
   +======+======+======+
   | C1   | G1   | 3    |
   +------+------+------+
   | C1   | G2   | 2    |
   +------+------+------+
   | C1   | G3   | 1    |
   +------+------+------+
   | C2   | G1   | 1    |
   +------+------+------+
   | C2   | G4   | 5    |
   +------+------+------+
   | ...  | ...  | ...  |
   +------+------+------+

   or:

   +--------+------+------+------+
   | batch  | cell | gene | expr |
   +========+======+======+======+
   | batch0 | C1   | G1   | 3    |
   +--------+------+------+------+
   | batch0 | C1   | G2   | 2    |
   +--------+------+------+------+
   | batch0 | C1   | G3   | 1    |
   +--------+------+------+------+
   | batch1 | C2   | G1   | 1    |
   +--------+------+------+------+
   | batch1 | C2   | G4   | 5    |
   +--------+------+------+------+
   | ...    | ...  | ...  | ...  |
   +--------+------+------+------+

2. A spreadsheet of comma-separated values (``csv``) where rows are genes and columns are cells, with gene expression counts as values.
   If the data contains batches, the first row of the spreadsheet should be ``'batch'`` and the second ``'cell'``.

   +-------+--------+--------+--------+--------+
   | cell  | C1     | C2     | C3     | C4     |
   +=======+========+========+========+========+
   | G1    |        | 3      | 1      | 7      |
   +-------+--------+--------+--------+--------+
   | G2    | 2      | 2      |        | 2      |
   +-------+--------+--------+--------+--------+
   | G3    | 3      | 1      |        | 5      |
   +-------+--------+--------+--------+--------+
   | G4    | 10     |        | 5      | 4      |
   +-------+--------+--------+--------+--------+
   | ...   | ...    | ...    | ...    | ...    |
   +-------+--------+--------+--------+--------+

   or:

   +-------+--------+--------+--------+--------+
   | batch | batch0 | batch0 | batch1 | batch1 |
   +=======+========+========+========+========+
   | cell  | C1     | C2     | C3     | C4     |
   +-------+--------+--------+--------+--------+
   | G1    |        | 3      | 1      | 7      |
   +-------+--------+--------+--------+--------+
   | G2    | 2      | 2      |        | 2      |
   +-------+--------+--------+--------+--------+
   | G3    | 3      | 1      |        | 5      |
   +-------+--------+--------+--------+--------+
   | G4    | 10     |        | 5      | 4      |
   +-------+--------+--------+--------+--------+
   | ...   | ...    | ...    | ...    | ...    |
   +-------+--------+--------+--------+--------+

3. A ``Pandas DataFrame`` where ``axis 0`` is genes and ``axis 1`` is cells.
   If there are batches in the data, the index of ``axis 1`` should have two levels, e.g. ``('batch', 'cell')``,
   with the first level indicating the patient, batch or experiment in which the cell was sequenced, and the
   second level containing cell barcodes for identification.

   .. code:: python

      import numpy as np
      import pandas as pd
      # Rows are genes; columns are (batch, cell) pairs; NaN marks a missing count.
      df = pd.DataFrame(data=[[2, np.nan], [3, 8], [3, 5], [np.nan, 1]],
                        index=['G1', 'G2', 'G3', 'G4'],
                        columns=pd.MultiIndex.from_arrays([['batch0', 'batch1'], ['C1', 'C2']], names=['batch', 'cell']))

4. A ``Pandas Series`` whose index has two levels, e.g. ``('cell', 'gene')``. If there are batches in the data,
   the index should have three levels: the first indicating the patient, batch or experiment in which the cell was sequenced,
   the second containing cell barcodes for identification, and the third containing gene names.

   .. code:: python

      import pandas as pd
      # One value per (batch, cell, gene) combination.
      se = pd.Series(data=[1, 8, 3, 5, 5],
                     index=pd.MultiIndex.from_arrays([['batch0', 'batch0', 'batch1', 'batch1', 'batch1'],
                                                      ['C1', 'C1', 'C1', 'C2', 'C2'],
                                                      ['G1', 'G2', 'G3', 'G1', 'G4']], names=['batch', 'cell', 'gene']))

Data in any of the formats outlined above must be prepared and validated with the ``prepare()`` function.

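To show how the two ``csv`` layouts relate to the ``Pandas`` objects described above, here is a minimal sketch that uses only ``pandas``. The file contents are made-up illustration data, and this is not the package's own loader. ``prepare()`` is deliberately not called, since its signature is not shown in this section.

.. code:: python

   import io

   import pandas as pd

   # Format 1 (condensed matrix with batches). Illustrative data only.
   long_csv = io.StringIO(
       "batch,cell,gene,expr\n"
       "batch0,C1,G1,3\n"
       "batch0,C1,G2,2\n"
       "batch0,C1,G3,1\n"
       "batch1,C2,G1,1\n"
       "batch1,C2,G4,5\n"
   )
   # Columns are selected by name, so their order in the file does not matter.
   se = pd.read_csv(long_csv).set_index(["batch", "cell", "gene"])["expr"]

   # Format 2 (genes in rows, cells in columns; first two rows are 'batch' and 'cell').
   wide_csv = io.StringIO(
       "batch,batch0,batch1\n"
       "cell,C1,C2\n"
       "G1,3,1\n"
       "G2,2,\n"
       "G3,1,\n"
       "G4,,5\n"
   )
   df = pd.read_csv(wide_csv, header=[0, 1], index_col=0)
   # Name the column levels explicitly rather than relying on parser inference.
   df.columns.names = ["batch", "cell"]

   # Moving the 'batch' and 'cell' levels of the Series into columns gives
   # the genes-by-cells layout; missing combinations become NaN.
   wide = se.unstack(["batch", "cell"])

Here ``se`` has the three-level index described for the ``Pandas Series`` format and ``df`` has the two-level column index described for the ``Pandas DataFrame`` format. Either object would then still need to go through ``prepare()``.
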
**Demo**
============

.. literalinclude:: ../examples/demo.py
