documentation updates
aewebb80 committed Jul 15, 2020
1 parent 78ddaae commit 165e282
Showing 7 changed files with 49 additions and 30 deletions.
13 changes: 9 additions & 4 deletions docs/source/PPP_pages/examples.rst
@@ -12,16 +12,21 @@ PPP functions may be called at the command-line as shown in this example:
vcf_filter.py --vcf examples/files/merged_chr1_10000.vcf.gz --filter-only-biallelic --out-format bcf
Details on the usage of a specific function may be found within the *Example usage* section of the function in question. In addition, all example files used may be found within **examples/files** directory.

Details on the usage of each specific function may be found within the *Example usage* section of the function’s documentation. In addition, all files shown within these examples may be found within the **examples/files** directory of the PPP repository.
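
The command-line call above can also be driven from a Python script. The following is a minimal sketch, assuming ``vcf_filter.py`` is reachable on the ``PATH``. It uses only the flags shown in the example above; the helper names ``build_vcf_filter_command`` and ``run_command`` are hypothetical and not part of the PPP.

```python
import shlex
import subprocess
from typing import List, Optional


def build_vcf_filter_command(vcf_path: str,
                             out_format: Optional[str] = None,
                             only_biallelic: bool = False) -> List[str]:
    """Assemble the argument list for vcf_filter.py (hypothetical helper)."""
    cmd = ["vcf_filter.py", "--vcf", vcf_path]
    if only_biallelic:
        cmd.append("--filter-only-biallelic")
    if out_format is not None:
        cmd.extend(["--out-format", out_format])
    return cmd


def run_command(cmd: List[str], dry_run: bool = True) -> str:
    """Return the shell-quoted command; execute it only when dry_run is False."""
    printable = " ".join(shlex.quote(part) for part in cmd)
    if not dry_run:
        # check=True raises CalledProcessError if the tool exits with an error
        subprocess.run(cmd, check=True)
    return printable


command = build_vcf_filter_command(
    "examples/files/merged_chr1_10000.vcf.gz",
    out_format="bcf",
    only_biallelic=True,
)
print(run_command(command))  # dry run: prints the command without executing it
```

Passing the arguments as a list avoids shell quoting problems with file names, and the dry-run default makes it possible to inspect a pipeline step before running it.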

##########################
Jupyter Notebook Pipelines
##########################

All PPP functions may also be used within a `Jupyter Notebook <https://jupyter.org/>`_. We have included some examples below:
All PPP functions may also be used within a `Jupyter Notebook <https://jupyter.org/>`_. We have included two example notebooks.

.. toctree::
:maxdepth: 1

jupyter/example_pipeline_pan.ipynb
jupyter/example_pipeline_pan.ipynb

.. only:: html

The Jupyter Notebooks may also be downloaded:

* :download:`Example Jupyter Pipeline <jupyter/example_pipeline_pan.ipynb>`.
10 changes: 5 additions & 5 deletions docs/source/PPP_pages/functions.rst
@@ -1,6 +1,6 @@
=============
PPP Functions
=============
==============
Core Functions
==============

The functions below were developed to perform many of the core operations typically used in population genetic analyses. Each of these functions was designed to perform a single operation (e.g. filtering, phasing, etc.).

@@ -9,8 +9,8 @@ The functions below were developed to perform many of the core operations typica

Functions/vcf_filter
Functions/vcf_calc
Functions/stat_sampler
Functions/loci_filter
Functions/vcf_split
Functions/vcf_phase
Functions/vcf_four_gamete
Functions/vcf_four_gamete

29 changes: 29 additions & 0 deletions docs/source/PPP_pages/intro.rst
@@ -0,0 +1,29 @@
============
Introduction
============

The Popgen Pipeline Platform (PPP) was written using the Python programming language and designed to operate using Python 3.7. In comparison to a fixed pipeline, the PPP was designed as a collection of modular functions that may be combined to generate a wide variety of analyses and pipelines.

For simplicity, PPP functions are separated into four categories:

#. **Core functions**: Frequently used methods and procedures in population genomic pipelines (e.g. phasing, filtering, four-gamete test, etc.)
#. **Input file generators**: Functions that create the necessary input for population genomic analyses (e.g. generating input for IMa3, TreeMix, G-PhoCS, etc.)
#. **Analyses**: Common population genomic analyses (e.g. isolation and migration, admixture, linkage disequilibrium, etc.)
#. **Utilities**: Simple file-specific procedures often required in population genomic pipelines

For details on specific functions, please see the documentation on each section.

.. image:: PPP_assets/PPP_Pipeline_Figure.png
:scale: 50 %
:align: center

.. centered::
Figure 1: Structure of the PPP


##################
Creating Pipelines
##################

Most PPP-based pipelines are expected to consist primarily of core functions. To simplify development, all core functions were designed to operate on VCF-based files. The VCF format was selected because it is widely supported among publicly available datasets and population genomics software. At present, pipelines may be created in one of two ways: i) calling each function from the command line or ii) calling the functions within a script, such as a Jupyter Notebook. Example usage of both methods may be found within `Examples <examples.rst>`__.


2 changes: 1 addition & 1 deletion docs/source/PPP_pages/model.rst
@@ -2,7 +2,7 @@
Model File and Creation
=======================

A core aspect of the PPP is the use of Model files, JSON-based files used to assign and store **population models**. A population model primarily consists of: the populations within the model; the individuals in each population; and a population tree. Model files offer various benefits within the PPP: i) automatic assignment of relevant populations, individuals, or other potential meta-data; ii) simplifed process to examine multiple models; and iii) a single repository of all relevant meta-data.
A core aspect of the PPP is the use of Model files, JSON-based files used to assign and store **population models**. A population model primarily consists of: the populations within the model; the individuals in each population; and a population tree. Model files offer various benefits within the PPP: i) automatic assignment of relevant populations, individuals, or other potential meta-data; ii) simplified process to examine multiple models; and iii) a single repository of all relevant meta-data.
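
To make the three components of a population model concrete, the sketch below builds a small JSON document in Python. It is an illustration only: the key names (``name``, ``pops``, ``tree``), the population and individual labels, and the helper ``individuals_in`` are assumptions made for this example and may not match the actual Model file schema used by the PPP.

```python
import json
from typing import Dict, List

# Hypothetical layout: the real PPP Model file schema may differ.
model: Dict[str, object] = {
    "name": "example_model",
    "pops": {
        "PopA": ["Ind1", "Ind2"],  # individuals assigned to population PopA
        "PopB": ["Ind3", "Ind4"],  # individuals assigned to population PopB
    },
    "tree": "(PopA,PopB);",  # population tree written in Newick notation
}


def individuals_in(model_data: Dict[str, object], population: str) -> List[str]:
    """Return the individuals assigned to one population (hypothetical helper)."""
    return list(model_data["pops"][population])


# Serialise to JSON text and read it back, as a file-based workflow would.
text = json.dumps(model, indent=2)
restored = json.loads(text)
print(individuals_in(restored, "PopA"))
```

Storing populations, individuals, and the tree in one file is what allows a function to look up the relevant samples from a model name rather than from a manually supplied list.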


Model files may be created and edited using our model creator.
1 change: 1 addition & 0 deletions docs/source/PPP_pages/utilities.rst
@@ -10,3 +10,4 @@ The utility functions were developed to perform various tasks often needed when
Utilities/vcf_utilities
Utilities/bed_utilities
Utilities/vcf_bed_to_seq
Utilities/stat_sampler
24 changes: 4 additions & 20 deletions docs/source/index.rst
@@ -2,32 +2,16 @@
Popgen Pipeline Platform
========================

.. only:: not html

------------
Introduction
------------
The Popgen Pipeline Platform (PPP) is a software platform with the goal of reducing the computational expertise required for conducting population genomic analyses. The PPP was designed as a collection of scripts that facilitate common population genomic workflows in a consistent and standardized environment. Functions were developed to encompass entire workflows, including: input preparation, file format conversion, various population genomic analyses, and output generation. By facilitating entire workflows, the PPP offers several benefits to prospective end users - it reduces the need of redundant in-house software and scripts that would require development time and may be error-prone, or incorrect, depending on the expertise of the investigator. The platform has also been developed with reproducibility and extensibility of analyses in mind.

The Popgen Pipeline Platform (PPP) is a software platform with the goal of reducing the computational expertise required for conducting population genomic analyses. The PPP was designed as a collection of scripts that facilitate common population genomic workflows in a consistent and standardized environment. Functions were developed to encompass entire workflows, including: input preparation, file format conversion, various population genomic analyses, output generation, and visualization. By facilitating entire workflows, the PPP offers several benefits to prospective end users: it reduces the need for redundant in-house software and scripts that would require development time and may be error-prone, or incorrect, depending on the expertise of the investigator. The platform has also been developed with reproducibility and extensibility of analyses in mind.

The PPP was written using the Python programming language and designed to operate using either Python 2.7 or 3.7. However, as `Python 2 will no longer be maintained past January 1, 2020 <https://www.python.org/dev/peps/pep-0373/>`_ we strongly recommend using Python 3. We designed the PPP as a collection of modular functions that users may combine to generate a wide variety of analyses and pipelines. The functions within the PPP are also seperated into four groups: core VCF-based functions; optional BED/STAT file functions; file conversion functions; and analysis functions (Figure 1).

.. image:: PPP_assets/PPP_Pipeline_Figure.png
:scale: 50 %
:align: center

.. centered::
Figure 1: Structure of the PPP

The core functions of the PPP were designed to operate using VCF-based files primarily due to frequent support for the format among publicly available datasets and population genomics software. Most users will begin their pipelines with these core functions before moving onto an analysis function. Please note that most analysis functions require a preceding file conversion function to operate.

Please Note: This documentation is currently being devloped and will be updated freqeuntly in the coming days
Please Note: This documentation is currently being developed and will be updated frequently in the coming days

.. toctree::
:maxdepth: 2
:caption: Contents:
:hidden:
:hidden:

PPP_pages/intro
PPP_pages/install
PPP_pages/examples
PPP_pages/functions