Merge pull request #13 from klarman-cell-observatory/yiming
Yiming
yihming committed Jun 5, 2020
2 parents 20c8b6c + 833d2a9 commit 2ea83cb
Showing 18 changed files with 495 additions and 46 deletions.
5 changes: 5 additions & 0 deletions .gitignore
@@ -6,3 +6,8 @@ build/
dist/
pegasusio.egg-info/
pegasusio/cylib/*.so

# docs
docs/_build
docs/api/*.rst
!docs/api/index.rst
6 changes: 3 additions & 3 deletions README.rst
@@ -1,5 +1,5 @@
 =========================================================
-Pegasusio for reading / writing single-cell genomics data
+PegasusIO for reading / writing single-cell genomics data
 =========================================================

 |PyPI|
@@ -8,6 +8,6 @@ Pegasusio for reading / writing single-cell genomics data
    :target: https://pypi.org/project/pegasusio


-Pegasusio is the IO package for Pegasus.
+PegasusIO is the IO package for Pegasus.

-`Read documentation <http://pegasus.readthedocs.io>`__
+`Read documentation <http://pegasusio.readthedocs.io>`__
20 changes: 20 additions & 0 deletions docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line.
SPHINXOPTS =
SPHINXBUILD = sphinx-build
SPHINXPROJ = pegasusio
SOURCEDIR = .
BUILDDIR = _build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
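
The catch-all target in the Makefile above forwards any target name to Sphinx's "make mode". As a rough illustration only, the sketch below builds the argument list that a call such as `make html` would expand to with the default variable values. The function name `sphinx_make_mode_argv` is hypothetical and is not part of this commit; the sketch only constructs the command and does not run Sphinx.

```python
import shlex
from typing import List, Sequence


def sphinx_make_mode_argv(
    target: str,
    sourcedir: str = ".",          # SOURCEDIR in the Makefile
    builddir: str = "_build",      # BUILDDIR in the Makefile
    sphinxopts: Sequence[str] = (),  # stands in for $(SPHINXOPTS) and $(O)
    sphinxbuild: str = "sphinx-build",  # SPHINXBUILD in the Makefile
) -> List[str]:
    """Return the argv that the Makefile recipe would pass to the shell."""
    return [sphinxbuild, "-M", target, sourcedir, builddir, *sphinxopts]


# What "make html" expands to with the defaults defined in the Makefile.
argv = sphinx_make_mode_argv("html")
print(shlex.join(argv))
```

Passing the resulting list to `subprocess.run` from inside the `docs` directory would be one way to trigger a build without `make`, assuming `sphinx-build` is on the `PATH`.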
22 changes: 22 additions & 0 deletions docs/_static/my_theme.css
@@ -0,0 +1,22 @@
.wy-nav-content {
    max-width: none;
}

.wy-table-responsive table td, .wy-table-responsive table th {
    white-space: normal;
}

.wy-table-responsive {
    margin-bottom: 24px;
    max-width: 100%;
    overflow: visible;
}

.red {
    color: red;
}

.red-bold {
    color: red;
    font-weight: bold;
}
20 changes: 20 additions & 0 deletions docs/api/index.rst
@@ -0,0 +1,20 @@
.. automodule:: pegasusio
    :noindex:

API
===

Import *PegasusIO* into your Python environment with::

    import pegasusio as io

Read and Write
----------------

.. autosummary::
    :toctree: .

    infer_file_type
    read_input
    write_output
    aggregate_matrices
93 changes: 93 additions & 0 deletions docs/command_line.rst
@@ -0,0 +1,93 @@
Use PegasusIO as a command line tool
======================================

PegasusIO can be used as a command line tool. Type::

    pegasusio -h

to see the help information::

    Usage:
        pegasusio <command> [<args>...]
        pegasusio -h | --help
        pegasusio -v | --version

PegasusIO currently has only one sub-command:

- ``aggregate_matrix``: Aggregate sample count matrices into a single count matrix. It also enables users to import metadata into the count matrix.

``pegasusio aggregate_matrix``
-------------------------------

``pegasusio aggregate_matrix`` aggregates arbitrary matrices with the help of a CSV-format sample sheet.

Type::

    pegasusio aggregate_matrix -h

to see the usage information::

    Usage:
        pegasusio aggregate_matrix <csv_file> <output_name> [--restriction <restriction>... options]
        pegasusio aggregate_matrix -h

* Arguments:

    csv_file
        Input CSV-formatted file containing information about each sc/snRNA-seq sample. This file must contain at least two columns: Sample, the sample name, and Location, the location of the sample count matrix in 10x v2/v3, DGE, mtx, csv, tsv, or loom format. An optional Reference column can be used to select samples generated from the same reference (e.g. mm10). If the count matrix is in DGE, mtx, csv, tsv, or loom format, the value in this column is used as the reference, since the count matrix file does not contain reference name information.

        The Reference column can also be used to aggregate count matrices generated from different genome versions or gene annotations under a unified reference. For example, if one matrix was generated from mm9 and another from mm10, we can write mm9_10 in the Reference column for both matrices. PegasusIO will change their references to 'mm9_10' and use the union of gene symbols from the two matrices as the gene symbols of the aggregated matrix.

        For HDF5 files (e.g. 10x v2/v3), the reference name contained in the file does not need to match the value in this column; the column is used to rename the references in HDF5 files. For example, given two HDF5 files, one generated from mm9 and the other from mm10, setting the Reference column of both files to 'mm9_10' renames their references to mm9_10, and the aggregated matrix will contain all genes from either mm9 or mm10. This renaming feature does not work if a single HDF5 file contains multiple references (e.g. mm10 and GRCh38). See below for an example csv::

            Sample,Source,Platform,Donor,Reference,Location
            sample_1,bone_marrow,NextSeq,1,GRCh38,/my_dir/sample_1/filtered_gene_bc_matrices_h5.h5
            sample_2,bone_marrow,NextSeq,2,GRCh38,/my_dir/sample_2/filtered_gene_bc_matrices_h5.h5
            sample_3,pbmc,NextSeq,1,GRCh38,/my_dir/sample_3/filtered_gene_bc_matrices_h5.h5
            sample_4,pbmc,NextSeq,2,GRCh38,/my_dir/sample_4/filtered_gene_bc_matrices_h5.h5

    output_name
        The output file name.

* Options:

    -\\-restriction <restriction>...
        Select data that satisfy all restrictions. Each restriction takes the format name:value,...,value or name:~value,...,value, where ~ means "not". You can specify multiple restrictions by setting this option multiple times.

    -\\-attributes <attributes>
        Specify a comma-separated list of attributes to output. These attributes should be column names in the csv file.

    -\\-default-reference <reference>
        If the sample count matrix is in DGE, mtx, csv, tsv, or loom format and there is no Reference column in csv_file, use <reference> as the reference.

    -\\-select-only-singlets
        If the data are demultiplexed, turning on this option makes pegasusio include only barcodes that are predicted to be singlets.

    -\\-min-genes <number>
        Only keep cells with at least <number> genes.

    -\\-max-genes <number>
        Only keep cells with fewer than <number> genes.

    -\\-min-umis <number>
        Only keep cells with at least <number> UMIs.

    -\\-max-umis <number>
        Only keep cells with fewer than <number> UMIs.

    -\\-mito-prefix <prefix>
        Prefix for mitochondrial genes. If multiple prefixes are provided, separate them with commas (e.g. "MT-,mt-").

    -\\-percent-mito <percent>
        Only keep cells with a mitochondrial percentage below <percent>%. The mitochondrial filter is triggered only when both mito_prefix and percent_mito are set.

    -\\-no-append-sample-name
        Turn this option on if you do not want the sample name prepended to each sample's barcodes (concatenated using '-').

    \-h, -\\-help
        Print out help information.

* Outputs:

    output_name.zarr.zip
        A zipped Zarr file containing aggregated data.

* Examples::

    pegasusio aggregate_matrix --restriction Source:BM,CB --restriction Individual:1-8 --attributes Source,Platform count_matrix.csv aggr_data
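
The `--restriction` format described above (`name:value,...,value` or `name:~value,...,value`) can be illustrated with a small standalone sketch. This is not PegasusIO's implementation: the helper names `parse_restriction` and `select_samples` are hypothetical, the sample sheet is a shortened copy of the example CSV, and values are compared as literal strings (so a range such as `1-8` would not be expanded here).

```python
import csv
import io
from typing import Dict, List, Set, Tuple

# Assumption: a sample sheet mirroring the example CSV in the documentation.
SAMPLE_SHEET = """\
Sample,Source,Platform,Donor,Reference,Location
sample_1,bone_marrow,NextSeq,1,GRCh38,/my_dir/sample_1/filtered_gene_bc_matrices_h5.h5
sample_2,bone_marrow,NextSeq,2,GRCh38,/my_dir/sample_2/filtered_gene_bc_matrices_h5.h5
sample_3,pbmc,NextSeq,1,GRCh38,/my_dir/sample_3/filtered_gene_bc_matrices_h5.h5
sample_4,pbmc,NextSeq,2,GRCh38,/my_dir/sample_4/filtered_gene_bc_matrices_h5.h5
"""


def parse_restriction(text: str) -> Tuple[str, bool, Set[str]]:
    """Split 'name:v1,v2' or 'name:~v1,v2' into (name, negate, values)."""
    name, sep, raw = text.partition(":")
    if not sep or not name or not raw:
        raise ValueError(f"malformed restriction: {text!r}")
    negate = raw.startswith("~")  # a leading ~ means "not any of these values"
    if negate:
        raw = raw[1:]
    values = {value for value in raw.split(",") if value}
    return name, negate, values


def select_samples(
    rows: List[Dict[str, str]], restrictions: List[str]
) -> List[Dict[str, str]]:
    """Keep the rows that satisfy every restriction."""
    parsed = [parse_restriction(r) for r in restrictions]
    return [
        row
        for row in rows
        # (membership != negate) is True when a plain restriction matches
        # or when a negated restriction does not match.
        if all((row[name] in values) != negate for name, negate, values in parsed)
    ]


rows = list(csv.DictReader(io.StringIO(SAMPLE_SHEET)))
selected = select_samples(rows, ["Source:pbmc", "Donor:~2"])
print([row["Sample"] for row in selected])  # ['sample_3']
```

Here the two restrictions are combined with a logical AND, matching the "satisfy all restrictions" wording of the option description.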
212 changes: 212 additions & 0 deletions docs/conf.py
@@ -0,0 +1,212 @@
# -*- coding: utf-8 -*-
#
# Configuration file for the Sphinx documentation builder.
#
# This file does only contain a selection of the most common options. For a
# full list see the documentation:
# http://www.sphinx-doc.org/en/master/config

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
import os
from pathlib import Path
import sys

HERE = Path(__file__).parent
sys.path.insert(0, str(HERE.parent))

import pegasusio

# -- Project information -----------------------------------------------------

project = "PegasusIO"
copyright = "2020 The Broad Institute, Inc. and The General Hospital Corporation. All rights reserved."
author = (
    "Bo Li, Yiming Yang"
)

# The short X.Y version
version = "0.1"
# The full version, including alpha/beta/rc tags
release = "0.1.6"


# -- General configuration ---------------------------------------------------

# If your documentation needs a minimal Sphinx version, state it here.
#
needs_sphinx = '1.7'

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.intersphinx",
    "sphinx.ext.doctest",
    "sphinx.ext.todo",
    "sphinx.ext.mathjax",
    "sphinx.ext.coverage",
    "sphinx.ext.imgmath",
    "sphinx.ext.ifconfig",
    "sphinx.ext.viewcode",
    "sphinx.ext.githubpages",
    "sphinx.ext.autosummary",
    "sphinx.ext.napoleon",
    "sphinx_autodoc_typehints",
]

autodoc_default_options = {
    "members": True,
    "member-order": "bysource"
}
autosummary_generate = True
todo_include_todos = False

# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]

# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
# source_suffix = ['.rst', '.md']
source_suffix = ".rst"

# The master toctree document.
master_doc = "index"

# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
#
# This is also used if you do content translation via gettext catalogs.
# Usually you set "language" from the command line for these cases.
language = None

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path .
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]

# The name of the Pygments (syntax highlighting) style to use.
pygments_style = "sphinx"

intersphinx_mapping = dict(
    anndata=('https://anndata.readthedocs.io/en/latest/', None),
    numpy=('https://docs.scipy.org/doc/numpy/', None),
    pandas=('http://pandas.pydata.org/pandas-docs/stable/', None),
)


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = "sphinx_rtd_theme"

# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
#
html_theme_options = {"navigation_depth": 4}

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]

# Custom sidebar templates, must be a dictionary that maps document names
# to template names.
#
# The default sidebars (for documents that don't match any pattern) are
# defined by theme itself. Builtin themes are using these templates by
# default: ``['localtoc.html', 'relations.html', 'sourcelink.html',
# 'searchbox.html']``.
#
# html_sidebars = {}

html_context = dict(
    display_github=True,  # Integrate GitHub
    github_user="klarman-cell-observatory",  # Username
    github_repo="pegasusio",  # Repo name
    github_version="master",  # Version
    conf_py_path="/docs/",  # Path in the checkout to the docs root
)


# -- Options for HTMLHelp output ---------------------------------------------

# Output file base name for HTML help builder.
htmlhelp_basename = "pegasusio_doc"


def setup(app):
    app.add_stylesheet("css/custom.css")


# -- Options for LaTeX output ------------------------------------------------

latex_elements = {
    # The paper size ('letterpaper' or 'a4paper').
    #
    # 'papersize': 'letterpaper',
    # The font size ('10pt', '11pt' or '12pt').
    #
    # 'pointsize': '10pt',
    # Additional stuff for the LaTeX preamble.
    #
    # 'preamble': '',
    # Latex figure (float) alignment
    #
    # 'figure_align': 'htbp',
}

# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
latex_documents = [
    (
        master_doc,
        "pegasusio.tex",
        "PegasusIO Documentation",
        "Bo Li, Yiming Yang",
        "manual",
    )
]


# -- Options for manual page output ------------------------------------------

# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
man_pages = [(master_doc, "pegasusio", "PegasusIO Documentation", [author], 1)]


# -- Options for Texinfo output ----------------------------------------------

# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
texinfo_documents = [
    (
        master_doc,
        "pegasusio",
        "PegasusIO Documentation",
        author,
        "pegasusio",
        "One line description of project.",
        "Miscellaneous",
    )
]


# -- Extension configuration -------------------------------------------------

# -- Options for todo extension ----------------------------------------------

# If true, `todo` and `todoList` produce output, else they produce nothing.
todo_include_todos = True
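
In the `conf.py` above, `version` ("0.1") and `release` ("0.1.6") are hard-coded separately, so they can drift apart when the package is bumped. One possible way to keep them consistent, shown here only as a sketch, is to derive the short X.Y version from the full release string. The helper name `short_version` is hypothetical and is not part of this commit.

```python
def short_version(release: str) -> str:
    """Return the X.Y prefix of a release string such as '0.1.6'."""
    parts = release.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least X.Y, got {release!r}")
    return ".".join(parts[:2])


# The full version, as written in conf.py.
release = "0.1.6"
# The short X.Y version, derived instead of typed by hand.
version = short_version(release)
print(version, release)
```

The same helper could be fed a version string read from the installed package, if the package exposes one; that is an assumption the diff does not confirm.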
