Linking high GC content to the repair of double strand breaks in prokaryotic genomes

Here you can find intermediate datasets and corresponding code to generate the figures and analysis described in the paper. Note that all data was acquired from publicly available databases (e.g., RefSeq, ProTraits, REBASE, ATGC). Intermediate datasets (mostly summary tables, described in detail below) can be found in /Data. Scripts to generate figures are in /Rcode (output in /Figs).

Datasets

The majority of datafiles have the extension .RData and can be loaded into R using the load() function. Central datasets include:

GCKU.RData

Data table with columns listing accession numbers, genomic GC content, presence/absence of Ku, and Species names for RefSeq assemblies.

SILVAGCKu.RData

Table matching tip labels from the SILVA 16s tree to genomic GC content and Ku presence/absence.

PTKU.RData

Table matcing genomic GC content and Ku presence/absence to various trait values (columns) for species (rows) in the ProTraits database.

GCRMFlanks.RData

Output of analyses of bases flanking restriction sequences on the genome. Columns describe (1) the distance from restiction sites being considered (in bases, x-axis of Fig 4b), (2) the excess GC content over the null expectation (y-axis of Fig 4b), (3) the null expectation for GC, (4) actual GC, (5-6) bootstrapped 95% confidence intervals of mean difference from null, (7-8) upper and lower quantiles (0.025, 0.975) of the difference from null.

REBASEGenomeEnzymes.RData

List of genome-enzyme pairs from REBASE for which a predicted recognition sequence was available.

PolyGCBiasRate.RData and PolyGC4BiasRate_GC4expected.RData

Polymorphism dataset to get "expected" GC content from mutational biases. The column "m" gives the mutational biases (See Long et al. 2018 for more details this calculation). GCseq and GC4seq give the background GC content of the sequences being examined.

PhiRecombTest.RData

Results from PhiPack. Pvalues for recombination in each cluster-gene pair. Use with ATGC_GC.RData for GC comparisons.

Name		Name	Last commit message	Last commit date
Latest commit History 1 Commit
Data		Data
Figs		Figs
Rcode		Rcode
Readme.md		Readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Linking high GC content to the repair of double strand breaks in prokaryotic genomes

Datasets

GCKU.RData

SILVAGCKu.RData

PTKU.RData

GCRMFlanks.RData

REBASEGenomeEnzymes.RData

PolyGCBiasRate.RData and PolyGC4BiasRate_GC4expected.RData

PhiRecombTest.RData

About

Releases

Packages

Languages

jlw-ecoevo/gcku

Folders and files

Latest commit

History

Repository files navigation

Linking high GC content to the repair of double strand breaks in prokaryotic genomes

Datasets

GCKU.RData

SILVAGCKu.RData

PTKU.RData

GCRMFlanks.RData

REBASEGenomeEnzymes.RData

PolyGCBiasRate.RData and PolyGC4BiasRate_GC4expected.RData

PhiRecombTest.RData

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages