elifesciences-publications/Vakirlis_Carvunis_McLysaght_2019: Data and scripts used in Vakirlis, Carvunis and McLysaght 2019 (bioRxiv)

This repository contains all data necessary to reproduce all figures and statistics of Vakirlis, Carvunis and McLysaght, eLife 2020 (http://doi.org/10.7554/eLife.53500).

The scripts are in the scripts/ folder, and the figure each script relates to should be evident from its file name. The figures produced by the scripts are in the figures/ folder.

Figure5B.csv is an extended version of Figure 5B of the article, in plain-text table format.

All source data that are referenced in the article can be found here:

Figure_3-source_data_1.csv : Data on undetectable homology for different E-value cut-offs, used to generate the top panel of Figure 3A. Column names: "total": total number of genes in conserved micro-synteny; "not_found": number of genes without significant sequence similarity; "div": time since divergence from the focal species.

Figure_3-source_data_2.csv : Data on false homology for different E-value cut-offs, used to generate the bottom panel of Figure 3A. Column names: "found": number of genes with significant sequence similarity; the remaining columns are the same as in Figure_3-source_data_1.csv.
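Both Figure 3 tables can be read as proportions of the genes in conserved micro-synteny. Below is a minimal Python sketch, assuming each file is comma-separated with a header row containing the column names listed above. The function name `fraction_by_divergence` is hypothetical and not part of the repository's scripts. Each row is reported separately, so rows for different E-value cut-offs are not merged, and "div" is returned as the raw string so that no unit is assumed.

```python
import csv
from collections.abc import Iterable


def fraction_by_divergence(lines: Iterable[str], count_column: str = "not_found") -> list[tuple[str, float]]:
    """Return (div, count_column / total) for every row of a Figure 3 source-data table.

    Assumes a comma-separated table with a header row that includes "total",
    "div" and `count_column` ("not_found" or "found").
    """
    result = []
    for row in csv.DictReader(lines):
        total = float(row["total"])
        if total == 0:
            continue  # no genes in conserved micro-synteny: the proportion is undefined
        # "div" is kept as the raw string so no unit or numeric type is assumed
        result.append((row["div"], float(row[count_column]) / total))
    return result
```

For example, `fraction_by_divergence(open("Figure_3-source_data_1.csv", newline=""))` gives the proportion of undetectable homologues per row. Passing `"found"` as `count_column` for Figure_3-source_data_2.csv applies the same computation, assuming its "total" column has the same meaning.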

Figure_5_source_data_1.csv : dN and dS data used to generate Figure 5 and the accompanying statistics. See the Methods section for how these data were generated. Column names: "micro-synteny": whether the gene satisfies our conserved micro-synteny criteria with the relevant species (see Methods).
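The "micro-synteny" column allows dN or dS to be compared between genes that do and do not satisfy the micro-synteny criteria. Below is a minimal sketch, assuming a comma-separated file with a header row. The README does not list the names of the dN and dS columns, so `value_column` must be set from the actual header. The missing-value markers skipped here ("", "NA", "nan") are also an assumption, and `median_by_synteny` is a hypothetical name. The median is only an illustrative summary; the statistics reported in the article are described in its Methods.

```python
import csv
import statistics
from collections import defaultdict
from collections.abc import Iterable

MISSING = {"", "na", "nan"}  # assumed markers for missing dN/dS estimates


def median_by_synteny(lines: Iterable[str], value_column: str) -> dict[str, float]:
    """Median of `value_column` for each distinct value of the "micro-synteny" column."""
    groups: dict[str, list[float]] = defaultdict(list)
    for row in csv.DictReader(lines):
        raw = row[value_column].strip()
        if raw.lower() in MISSING:
            continue  # skip genes without an estimate
        groups[row["micro-synteny"]].append(float(raw))
    return {key: statistics.median(vals) for key, vals in groups.items()}
```

The grouping keys are whatever strings the file uses in the "micro-synteny" column, so no particular encoding (True/False, yes/no) is assumed.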

Figure_6-source_data_1.xls : An Excel file with one dataset per sheet, containing the similarity and micro-synteny conservation information for every focal-target species comparison. The same tables are also available separately, in text format, in the synt_simil_tables/ folder.

Figure_7_source_data_1.csv : Data on common Pfam matches and gene/protein properties used to generate Figure 7. Column names: "Gene_focal": name of the focal species gene; "Gene_ortho": name of the target species gene; "same": whether a common Pfam match was found or the two genes are in the same OrthoDB group; "focal": value of the property in the focal gene; "ortho": value of the property in the target species gene; "var": name of the property.
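Because Figure_7_source_data_1.csv is in long format (one row per gene pair and property), per-property comparisons require grouping on "var". Below is a minimal sketch that computes a plain Pearson correlation between the focal and target values of each property. Pearson is a stand-in chosen for brevity, not necessarily the test used in the article. The optional `same_value` filter assumes you know how the "same" column is encoded, which the README does not say. The function names are hypothetical.

```python
import csv
import math
from collections import defaultdict
from collections.abc import Iterable
from typing import Optional


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two equal-length samples (NaN if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return math.nan
    return cov / math.sqrt(var_x * var_y)


def correlation_by_property(lines: Iterable[str], same_value: Optional[str] = None) -> dict[str, float]:
    """Correlate focal vs target ("ortho") values separately for each property ("var").

    If `same_value` is given, only rows whose "same" column equals it are used.
    Properties with fewer than three pairs are skipped.
    """
    values: dict[str, tuple[list[float], list[float]]] = defaultdict(lambda: ([], []))
    for row in csv.DictReader(lines):
        if same_value is not None and row["same"] != same_value:
            continue
        focal_values, ortho_values = values[row["var"]]
        focal_values.append(float(row["focal"]))
        ortho_values.append(float(row["ortho"]))
    return {var: pearson(f, o) for var, (f, o) in values.items() if len(f) >= 3}
```

Grouping first and correlating second keeps each property independent, so a property with many missing pairs does not affect the others.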

Figure_8_source_data_1.csv : CDS/protein properties for all undetectable homologue pairs, partially redundant with Figure_7_source_data_1.csv.

The figure supplements that consist of tables are also provided here; details can be found in the supplementary material of the article.

Furthermore, we provide some additional/raw data used by the scripts:

all_gene_pairs/ : Data (one set per dataset) on pairs of undetectable homologues, used to calculate correlations after removing pairs with a high percentage of low-complexity sequence. Column names should be self-explanatory.
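The low-complexity filtering step described for all_gene_pairs/ amounts to discarding a pair when either sequence exceeds a cut-off. Below is a minimal sketch, relevant only if the files still contain the high low-complexity pairs. The column names and the cut-off are left as parameters because the README gives neither; check the file headers and the article's Methods. The function name is hypothetical, and the files are assumed to be comma-separated by default (pass `delimiter="\t"` for tab-separated files).

```python
import csv
from collections.abc import Iterable


def drop_low_complexity(
    lines: Iterable[str],
    focal_column: str,
    target_column: str,
    max_value: float,
    delimiter: str = ",",
) -> list[dict[str, str]]:
    """Keep only pairs whose low-complexity values are both <= max_value.

    focal_column, target_column and max_value are placeholders: use the real
    column names and cut-off (in the same units as the columns).
    """
    kept = []
    for row in csv.DictReader(lines, delimiter=delimiter):
        # a pair is dropped if either member exceeds the cut-off
        if float(row[focal_column]) <= max_value and float(row[target_column]) <= max_value:
            kept.append(row)
    return kept
```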

Pfam_search_raw_data/ : PfamScan search output files for focal and target species proteins of interest.
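The "same" column of Figure_7_source_data_1.csv records, among other things, whether a focal and a target protein share a Pfam match. Below is a minimal sketch of that check from the raw PfamScan files. It assumes they follow the standard pfam_scan.pl text output: comment lines start with "#", and columns are whitespace-separated, with the sequence ID first and the Pfam accession (e.g. PF00069.25) sixth. The README does not state the format, and the function names are hypothetical. Accession versions are stripped so that PF00069.25 and PF00069.24 compare equal.

```python
from collections import defaultdict
from collections.abc import Iterable


def pfam_domains(lines: Iterable[str]) -> dict[str, set[str]]:
    """Map each protein ID to the set of Pfam accessions (version stripped) it matches."""
    domains: dict[str, set[str]] = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip header comments and blank lines
        fields = line.split()
        if len(fields) < 6:
            continue  # skip malformed lines
        seq_id, hmm_acc = fields[0], fields[5]
        domains[seq_id].add(hmm_acc.split(".")[0])  # e.g. PF00069.25 -> PF00069
    return dict(domains)


def share_pfam(domains: dict[str, set[str]], gene_a: str, gene_b: str) -> bool:
    """True if the two proteins have at least one Pfam accession in common."""
    return bool(domains.get(gene_a, set()) & domains.get(gene_b, set()))
```

Focal and target results can be combined with `{**pfam_domains(focal_lines), **pfam_domains(target_lines)}`, provided the protein IDs do not overlap between species.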

synt_simil_tables/ : Individual data tables that make up Figure_6-source_data_1.xls. See the readme inside the folder.

divergence_times/ : Divergence times from the focal species for each target species.

dnds/ : Raw data used for the dN, dS analyses.

full_lists_genes/ : Lists of gene names and IDs used in the analyses.

nr_results/ : Parsed results of similarity searches in NCBI's NR database, see Methods for details.

oo_dfs/ : Raw files containing all the focal-target gene pairs found in conserved micro-synteny.

synt_relaxed_data/ : Raw files containing the focal genes in conserved synteny with the corresponding species when using the relaxed and stringent synteny criteria.

Repository contents:

Folders: Pfam_search_raw_data/, all_gene_pairs/, divergence_times/, dnds/, figures/, full_lists_genes/, nr_results/, oo_dfs/, scripts/, synt_relaxed_data/, synt_simil_tables/

Files: Figure3-supplement_1.csv, Figure3-supplement_1_forPaper.csv, Figure5B.csv, Figure_3-source_data_1.csv, Figure_3-source_data_2.csv, Figure_5_source_data_1.csv, Figure_6-source_data_1.xls, Figure_7-supplement_1.csv, Figure_7-supplement_2.csv, Figure_7_source_data_1.csv, Figure_8-supplement_1.csv, Figure_8_source_data_1.csv, LICENSE, README.md
