- What is PPNet? PPNet uses genome information and phylogenetic profile analysis, based on binary similarity and distance measures, to derive large-scale bacterial association networks for a single species.
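To make the idea of comparing phylogenetic profiles concrete, the following is a minimal sketch of one well-known binary similarity measure, the Jaccard coefficient, applied to two presence/absence profiles. Jaccard is used here only as a familiar example; whether it is among PPNet's measures, and under which algorithm number, is not stated in this README. The function names and the toy profiles are hypothetical.

```python
def jaccard_similarity(profile_a, profile_b):
    """Jaccard similarity between two binary presence/absence profiles.

    Each profile has one entry per strain: 1 if the gene is present, 0 if absent.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same strains")
    # Strains in which both genes are present
    both = sum(1 for a, b in zip(profile_a, profile_b) if a and b)
    # Strains in which at least one of the genes is present
    either = sum(1 for a, b in zip(profile_a, profile_b) if a or b)
    # Convention chosen for this sketch: two all-absent profiles score 0.0
    return both / either if either else 0.0


def jaccard_distance(profile_a, profile_b):
    """The matching distance measure: 1 minus the similarity."""
    return 1.0 - jaccard_similarity(profile_a, profile_b)


# Toy profiles of two genes across five strains (illustrative data only)
gene_x = [1, 1, 0, 1, 0]
gene_y = [1, 0, 0, 1, 1]
similarity = jaccard_similarity(gene_x, gene_y)
```

Genes whose profiles score high under such a measure tend to be gained and lost together across strains, which is the signal an association network is built from.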
PPNet has the following dependencies:
- prokka
- roary
- Python (version >= 3.7)
- Python modules:
- biopython
- pyvis
- numpy
- scipy
- statsmodels
- kneed
- pyani
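Before running PPNet it can help to confirm that the dependencies listed above are visible from the current environment. The following is a small sketch, not part of PPNet itself; the helper name `find_missing` is hypothetical. It assumes the usual import names for the listed packages (Biopython is imported as `Bio`) and that Prokka and Roary are installed as `prokka` and `roary` on the `PATH`.

```python
import importlib.util
import shutil

# Import names of the Python modules listed above (biopython imports as "Bio")
PYTHON_MODULES = ["Bio", "pyvis", "numpy", "scipy", "statsmodels", "kneed", "pyani"]
# External command-line tools, assumed to be on PATH under these names
EXECUTABLES = ["prokka", "roary"]


def find_missing(modules=PYTHON_MODULES, executables=EXECUTABLES):
    """Return (missing_modules, missing_executables) for the current environment."""
    missing_modules = [m for m in modules if importlib.util.find_spec(m) is None]
    missing_tools = [e for e in executables if shutil.which(e) is None]
    return missing_modules, missing_tools


missing_modules, missing_tools = find_missing()
print("Missing Python modules:", missing_modules or "none")
print("Missing executables:", missing_tools or "none")
```

The check only tests whether each module can be located and each tool is on the `PATH`; it does not verify versions.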
- Install from source
- Download the source code:
git clone https://github.com/liyangjie/PPNet.git
- Rename the main program and add the path to the environment variable:
    # Rename ppnet.py to ppnet
    mv PPNet/bin/ppnet.py PPNet/bin/ppnet
    # Give the scripts executable permission
    chmod +x PPNet/bin/*
    # Add the path to the environment variable
    echo 'export PATH="/Path/to/PPNet/bin:$PATH"' >> ~/.bashrc
    source ~/.bashrc
- Install the Python dependencies:
pip install biopython pyvis numpy scipy statsmodels kneed pyani
- You must install Prokka and Roary independently.
- Usage:
ppnet [Options]
Options:
[-h] show this help message and exit
[-i1] [Required] The path of input genomes
[-i2] [Required] The path of phenotype (e.g., pathogenic or non-pathogenic) of all strains
[-o] The path of output (Default "./PPNet_output")
[-x] The suffix of genomes data (Default "fasta")
[-c] Number of CPUs to use
[-a] [Required] Select the algorithm for calculating the correlation coefficient [1-81], or set to 0 to use all algorithms.
[-pt] The percentage of interactions to visualize (Default "1")
[-t1] The threshold of ANI [0-0.9999], or set as "auto" to select the inflection point as the threshold for ANI. (Default "auto")
[-t2] The threshold of |1-ANC| [0-0.999], or set as "auto" to select the inflection point as the threshold for |1-ANC|. (Default "auto")
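The "auto" setting of `-t1` and `-t2` selects an inflection point as the threshold. This README does not describe how that point is found. As an illustration only, the sketch below uses a simple heuristic: sort the values, normalize both axes to [0, 1], and take the point farthest from the straight line joining the first and last points. This is an assumption for demonstration; it is neither the algorithm of the `kneed` package nor necessarily what PPNet does. The function name `knee_point` and the sample values are hypothetical.

```python
def knee_point(values):
    """Return (index, value) of the knee of the sorted curve of `values`.

    Heuristic: after normalizing index and value to [0, 1], the chord from the
    first to the last point is the diagonal y = x, so the knee is the point
    with the largest |y - x|.
    """
    ys = sorted(values)
    n = len(ys)
    if n < 3:
        raise ValueError("need at least three values to locate a knee")
    span = ys[-1] - ys[0]
    if span == 0:
        raise ValueError("all values are identical; the curve has no knee")
    best_index, best_gap = 0, -1.0
    for i, y in enumerate(ys):
        x_norm = i / (n - 1)
        y_norm = (y - ys[0]) / span
        gap = abs(y_norm - x_norm)
        if gap > best_gap:  # strict comparison keeps the first maximum
            best_index, best_gap = i, gap
    return best_index, ys[best_index]


# Illustrative similarity values only, not real PPNet output
sample = [0.90, 0.91, 0.92, 0.93, 0.99]
index, threshold = knee_point(sample)
```

Passing an explicit number to `-t1` or `-t2` bypasses any such automatic selection.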
See Algorithm.docx for the algorithms that can be selected with -a.
ppnet -i1 PATH/to/your/genomes/ -i2 group.csv -x fasta -c 4 -a 1
The genome files should be in FASTA format and placed in the same directory. The group.csv
- PPNet_output/HQ_data/*: High-quality genomes with N50 > 10000;
- PPNet_output/NR_data/*: Non-redundant genome sets after deduplication;
- PPNet_output/Prokka_result/*: The result files of Prokka;
- PPNet_output/Gff_file/*: The GFF files extracted from the Prokka_result folder, used as the input files for Roary;
- PPNet_output/Roary_result/*: The result files generated by Roary;
- PPNet_output/Roary_result/Statistical_test_result.csv: The results of Fisher's exact test for the distribution of each gene. By default, PPNet reports all genes with an adjusted p-value < 0.05.
- PPNet_output/Roary_result/filted_phylogenetic_profile.csv: The phylogenetic profile of orthologs with significantly different distributions.
- PPNet_output/Roary_result/netwrok_result_method_x.csv: Lists the association coefficient calculated by algorithm x for each pair of genes.
- PPNet_output/Gene_net_x.html: A network plot inferred by algorithm x that can be opened in a browser (Google Chrome, Microsoft Edge, etc.). By default, only the first 1% of interactions is visualized.
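The gene filtering step reported in Statistical_test_result.csv can be illustrated with a dependency-free sketch of a two-sided Fisher's exact test followed by p-value adjustment. Two points are assumptions: that each gene is tested on a 2x2 table of gene presence/absence against the two phenotype groups, and that the adjustment is Benjamini-Hochberg (the README does not name the correction method). The function names are hypothetical; in practice SciPy and statsmodels provide tested implementations.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows are the two phenotype groups, columns are
    gene present / gene absent.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def pmf(x):
        # Hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lo, hi = max(0, col1 - (n - row1)), min(row1, col1)
    p_obs = pmf(a)
    # Sum over all tables that are no more likely than the observed one
    p = sum(pmf(x) for x in range(lo, hi + 1) if pmf(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Illustrative counts only: (group 1 with gene, group 1 without,
#                            group 2 with gene, group 2 without)
tables = [(3, 0, 0, 3), (2, 1, 1, 2), (3, 0, 3, 0)]
raw = [fisher_exact_two_sided(*t) for t in tables]
adjusted = benjamini_hochberg(raw)
significant = [i for i, p in enumerate(adjusted) if p < 0.05]
```

With samples as small as these toy tables no gene can reach significance; real runs rely on many more strains per group.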
PPNet is free software under a GPLv3 license.