Sequencing and analysis of crAssphage regions from around the globe


Sequencing and analysis of crAssphage regions from around the globe and from all the metagenomes we can find.

This is a repository of DNA sequences, analysis, and other information for the global crAssphage project being developed by Bas Dutilh and Rob Edwards.

Together, we have developed this site as a common resource where everyone can add sequences, get sequences, and build alignments and analyses. If you want to become a collaborator and add data to the repository, please contact Rob Edwards. If you just want to take the data and analyze it, you can clone the whole data set using git clone (make sure you read the note below about Git LFS before you do that).

All of the data here is provided under the MIT License. Basically, you can do whatever you want as long as you include the original copyright and license notice in any copy of the software, source code, data, or other parts of the repository. However, if you use this data or analysis, please let Rob know.

Please be aware that Rob and our collaborators are working on a manuscript describing this data, so please do not use this data as part of a manuscript without asking Rob. We will almost certainly say yes, unless you are writing exactly the same manuscript that we are writing, in which case we'll invite you to join our team!

crAssphage sequences

GenBank Sequence

The original sequence that was published in GenBank (but has since been deleted) is available here as JQ995537.gbk. It should also be available in RefSeq, although that record comes and goes.
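If you only need the nucleotide sequence from JQ995537.gbk and don't want to install anything, a minimal sketch like the one below reads the ORIGIN section of a GenBank flat file. It is an illustration, not code from this repository: the function name `read_genbank_sequence` is hypothetical, and it assumes a single-record file. For anything beyond the raw sequence (features, annotations), Biopython's `SeqIO.read(path, "genbank")` is the more robust choice.

```python
from pathlib import Path


def read_genbank_sequence(text: str) -> str:
    """Return the nucleotide sequence from the ORIGIN section of a
    single-record GenBank flat file (hypothetical helper, minimal sketch)."""
    in_origin = False
    chunks = []
    for line in text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True  # sequence lines follow this header
            continue
        if line.startswith("//"):
            break  # "//" marks the end of the record
        if in_origin:
            # Each line looks like "      61 acgtacgtac acgt..."; keep the letters only
            chunks.append("".join(ch for ch in line if ch.isalpha()))
    return "".join(chunks).upper()


if __name__ == "__main__":
    path = Path("JQ995537.gbk")  # assumes you run this from the repository root
    if path.exists():
        seq = read_genbank_sequence(path.read_text())
        print(f"JQ995537: {len(seq)} bp")
```

Stripping the position numbers by keeping only alphabetic characters keeps the parser short, at the cost of ignoring any malformed lines rather than reporting them.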

Global Survey

Together with our amazing collaborators, we have been sampling sites all over the world for crAssphage. We are still looking for more collaborators to provide sequences. Some sites have reported not finding any crAssphage, but not many!

Local Survey

At a couple of locations we have sampled the same sites more than once. This data tells an interesting story.

Metagenomes

We have used the awesome Hansel and Gretel to extract haplotypes from metagenomes that contain crAssphage.

Volunteer Studies

Our awesome volunteers have provided samples time and again so that we can test whether they carry crAssphage.

Software for the analysis

We have included a list of all the software we used to analyze these sequences in requirements.txt.
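To install everything at once, pip install -r requirements.txt does the job. If you'd rather check which of the listed modules are already missing from your Python environment, here is a minimal sketch (not part of the repository). The helper names `parse_requirements` and `missing_packages` are hypothetical; it assumes the usual pip format of one requirement per line, and it compares distribution names, which are not always the same as import names.

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def parse_requirements(text: str) -> list[str]:
    """Extract bare package names from requirements.txt-style text,
    dropping comments, pip options, version specifiers, extras and markers."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # skip blanks, comments and options such as "-r other.txt"
        # The package name is the leading run of letters, digits, ".", "_" and "-"
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names: list[str]) -> list[str]:
    """Return the names that have no installed distribution."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if not installed
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    path = Path("requirements.txt")  # assumes you run this from the repository root
    if path.exists():
        for name in missing_packages(parse_requirements(path.read_text())):
            print(f"not installed: {name}")
```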

Git LFS

We use Git Large File Storage (Git LFS) to store some of the large files. You will need to install Git LFS for those files to download properly. It's easy to do, and you only need to do it once.
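If you clone without Git LFS installed, the large files arrive as small text "pointer" stubs instead of the real data. The sketch below is one way to spot them; it is not part of the repository, the helper names are hypothetical, and it relies only on the fact that every LFS pointer file begins with the line version https://git-lfs.github.com/spec/v1. Once Git LFS is installed (git lfs install), running git lfs pull inside the clone should replace the stubs with the real files.

```python
from __future__ import annotations

from pathlib import Path

# First line of every Git LFS pointer file (Git LFS pointer spec v1)
LFS_POINTER_HEADER = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """True if the file starts with the LFS pointer header, i.e. it is
    a stub rather than the real content."""
    try:
        with path.open("rb") as handle:
            head = handle.read(len(LFS_POINTER_HEADER))
    except OSError:
        return False  # unreadable files are not treated as pointers
    return head == LFS_POINTER_HEADER


def find_lfs_pointers(root: Path) -> list[Path]:
    """List files under root (skipping the .git directory) that are still pointers."""
    pointers = []
    for path in sorted(root.rglob("*")):
        if ".git" in path.relative_to(root).parts or not path.is_file():
            continue
        if is_lfs_pointer(path):
            pointers.append(path)
    return pointers


if __name__ == "__main__":
    stubs = find_lfs_pointers(Path("."))  # assumes you run this inside the clone
    for stub in stubs:
        print(f"LFS pointer, not real data: {stub}")
    if stubs:
        print("Install Git LFS, then run 'git lfs pull' to fetch the real files.")
```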

Questions or Comments? Want to be a collaborator?

Contact Rob Edwards and let him know.
