CALeDNA transect analyses

This repository stores the scripts used to generate the analyses in "Landscape Analyses Using eDNA Metabarcoding and Earth Observation Predict Community Biodiversity in California" by Lin et al. in press at Ecological Applications.

The complete repository, including raw data, maps, rasters and derived data, is available in the Zenodo archive at https://doi.org/10.5281/zenodo.4516670.

The raw sequencing reads are deposited in the NCBI Sequence Read Archive (BioProject: PRJNA702201).

Contact: Meixi Lin (meixilin[at]ucla[d0t]edu)

Scripts

Please adapt the input and output directories in the scripts to your own working directories. Important derived data are also available in the Zenodo archive.

Set up the environment

Computational environment
1. Most downstream analyses can be run on a PC or Mac.
2. Some analyses (e.g. sequence processing and gradient forest analyses) were performed on the UCLA Hoffman2 cluster.
3. Some environmental layers were obtained using Google Earth Engine.

Most analyses are performed in R. The required R packages are listed in install_packages.R.
The utility script function_transect.R defines shared variables and functions and can be sourced by the other scripts.

Step0: prepare data

.
├── asv_tables
│   ├── 1_Transect_data_decontamination.R
│   ├── 2_remove_5_dup_sites_1028.R
│   └── 3_import_taxonomy_detailed_10172019.R
├── metadata
│   ├── 1_reproject_raster.R
│   ├── 2_extract_raster.R
│   ├── 3_calculate_geographical_dist_0827.R
│   ├── 3_combine_old_meta.R
│   ├── 4_specify_biom_RData_0412.R
│   ├── 5_update_coastal_0531.R
│   ├── 6.1_correlation_test_metadata_1026.R
│   ├── 6.2_correlation_grouping_metadata_0523.R
│   ├── 7.1_density_distribution.R
│   ├── function_raster.R
│   ├── get_bioclim.sh
│   ├── get_earth_engine.js
│   ├── get_land_cover.js
│   ├── get_soilgrid_100m.sh
│   ├── get_uncertainty_earth_engine.js
│   ├── get_uncertainty_percent_earth_engine.js
│   ├── metadata_reextract_04302019.sh
│   ├── metadata_reproject_04032019.sh
│   ├── x.2_get_other_uncertainty_20200930.R
│   └── x_get_soilgrid_uncertainty_20200904.R
└── sample_map
    ├── ca_base_map.R
    └── sample_map.R

Usage:

Decontaminate ASV tables and remove the five replicate sites
Obtain and extract environmental layers
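Extracting an environmental layer at a sampling site means finding the raster cell that contains the site's coordinates and reading its value. The sketch below illustrates that lookup. It assumes a north-up raster described by a GDAL-style six-element geotransform, and it assumes the sample coordinates are already in the raster's projection, which is the purpose of the reprojection scripts. `extract_value` is a hypothetical helper. It is not a function from this repository, whose R scripts presumably rely on R raster tooling.

```python
import math
from typing import Optional, Sequence, Tuple

# GDAL-style geotransform (assumed layout):
# (x_origin, pixel_width, row_rotation, y_origin, col_rotation, pixel_height)
# pixel_height is negative for north-up rasters.
GeoTransform = Tuple[float, float, float, float, float, float]


def extract_value(grid: Sequence[Sequence[float]], gt: GeoTransform,
                  x: float, y: float) -> Optional[float]:
    """Return the value of the cell containing (x, y), or None if outside the grid.

    Hypothetical helper: assumes no rotation and that (x, y) is in the raster's CRS.
    """
    x0, px_w, rot1, y0, rot2, px_h = gt
    if rot1 != 0 or rot2 != 0:
        raise ValueError("rotated rasters are not handled in this sketch")
    if not grid:
        return None
    col = math.floor((x - x0) / px_w)
    row = math.floor((y - y0) / px_h)  # px_h < 0, so row index grows southward
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None


# Toy 2 x 3 raster with 100 m cells whose top-left corner is at (0, 200).
grid = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]
gt = (0.0, 100.0, 0.0, 200.0, 0.0, -100.0)
print(extract_value(grid, gt, 150.0, 150.0))  # 2.0
```

Points that fall outside the raster return `None` rather than raising, so sites beyond a layer's coverage can be flagged as missing.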

Output:

Final taxonomy tables and metadata
Environmental layers with statewide coverage

Step1: create phyloseq

.
├── 1_step1_read_newdeco_data_0412.R
├── 2_step1_rarefaction_1107.R
└── rarefaction_evaluation
    ├── 1_rarefaction_plot_0506.R
    ├── 2_evaluate_rarefaction_0529.R
    └── 3_iNext_evaluation.R

Usage:

Create phyloseq objects from the final dataset
Perform rarefaction evaluation and select the appropriate rarefaction depth
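Rarefaction subsamples every sample to the same number of reads, so that diversity estimates are not driven by differences in sequencing depth. The file names under phy_rare suggest depths of 1000 reads (CO1, PITS), 2000 (16S) and 4000 (18S, FITS). The sketch below shows the basic operation for a single sample. It is an illustration only: it draws reads without replacement and drops samples below the target depth, and these settings are assumptions that may differ from the options used in the R scripts. The `rarefy` name is hypothetical.

```python
import random
from typing import Dict, List, Optional


def rarefy(counts: Dict[str, int], depth: int, seed: int = 0) -> Optional[Dict[str, int]]:
    """Subsample one sample's ASV read counts to `depth` reads without replacement.

    Returns None when the sample has fewer than `depth` reads (the sample is dropped).
    ASVs that receive no reads in the subsample are omitted from the result.
    """
    total = sum(counts.values())
    if total < depth:
        return None
    # Expand the counts into a pool of individual reads labelled by ASV.
    pool: List[str] = [asv for asv, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)  # fixed seed for reproducibility
    drawn = rng.sample(pool, depth)
    rarefied: Dict[str, int] = {}
    for asv in drawn:
        rarefied[asv] = rarefied.get(asv, 0) + 1
    return rarefied


sample = {"ASV1": 600, "ASV2": 300, "ASV3": 100}
sub = rarefy(sample, depth=500, seed=42)
print(sum(sub.values()))                 # 500
print(rarefy({"ASV1": 10}, depth=500))   # None
```

Rare ASVs can disappear after subsampling. This loss is the reason for evaluating several candidate depths before choosing one.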

Output:

Decontaminated phyloseq objects
Rarefied phyloseq objects

├── derive_data
│   ├── phy_deco # phyloseq objects of each decontaminated metabarcode dataset 
│   │   ├── phy_deco_16S.RData
│   │   ├── phy_deco_18S.RData
│   │   ├── phy_deco_CO1.RData
│   │   ├── phy_deco_FITS.RData
│   │   ├── phy_deco_PITS.RData
│   │   ├── phy_deco_all.RData
│   │   ├── phydeco.RData
│   │   └── phydeco_uc.RData
│   ├── phy_rare # phyloseq objects of each rarefied metabarcode dataset 
│   │   ├── phy_1000_CO1.RData
│   │   ├── phy_1000_PITS.RData
│   │   ├── phy_2000_16S.RData
│   │   ├── phy_4000_18S.RData
│   │   ├── phy_4000_FITS.RData
│   │   └── phyrare.RData
│   ├── step1_create_phyloseq # evaluation output 
│   │   ├── eval_rarefaction
│   │   └── physeq_rare_otu

Step2: data description

.
├── concordance
│   ├── 1_summary_5_dup_sites_20200107.R
│   ├── 2_describe_5_dup_sites_20200107.R
│   ├── 2_describe_5_dup_sites_full_20200107.R
│   ├── 3_summary_plots.R
│   ├── 3_summary_plots_cutoff.R
│   ├── 4_pcoa_dup_sites_supp_fig6.R
│   └── describe_5_dup_sites_call.sh
├── gbif_tos
│   └── 1_check_gbif_tos_20200928.R
├── summary_stats
│   ├── 1_decontam_seqs_read_depths.R
│   ├── 2.1_metadata_distributions.R
│   ├── 2.2_raster_data_distribution.R
│   ├── 3_category_correlation.R
│   ├── 4_step1_physeq_basic_stats_0830.R
│   ├── 4_step1_physeq_basic_stats_rarefied.R
│   ├── 5_step1_physeq_plot_stats_06122019.R
│   ├── 6_step1_physeq_plot_by_sample_stats_09022019.R
│   ├── 7_README_step1_physeq_phylo_tree_plot.md
│   ├── 7_step1_physeq_phylo_tree_plot.R
│   └── 7_step1_physeq_phylo_tree_plot.sh
└── ucnrs
    ├── 1_mk_phyloseq_ucnrs_sites.R
    ├── 2.1_clean_ucnrs_records_format_typo.R
    ├── 2.2_clean_ucnrs_taxlevel_genus.R
    ├── 2.3_clean_ucnrs_taxlevel_fillin.R
    ├── 3.1_compare_ucnrs_subset_sites.R
    ├── 3.2_compare_ucnrs_intersect.R
    ├── 3.3_compare_ucnrs_overall.R
    ├── README_ucnrs.md
    ├── README_ucnrs.pdf
    └── x.1_match_loc_ucnrs_name.R

Usage:

Evaluate stability by comparing the concordance among the five replicated sites
Generate summary statistics such as metadata distributions, read depths and read counts per taxon
Clean University of California Natural Reserve System (UCNRS) records and compare them with eDNA detections

Output:

Concordance analyses: phyloseq objects and results
Summary statistics: metadata distribution, sequencing read distribution and taxonomic coverage distribution
UCNRS analyses: UCNRS cleaned species list and comparison results

Step3: alpha diversity

.
├── 1.2_step2_alpha_diversity_summary_20200113.R
├── 1_step2_alpha_diversity_calc_0530.R
├── 2.1_step2_alpha_diversity_kruskal_20200113.R
├── 2.2_step2_alpha_diversity_kruskal_eval_20200113.R
├── 3.1_step2_alpha_diversity_individual_lm_20200125.R
├── 3.2_step2_alpha_diversity_indi_lm_plot_FITS_20200504.R
├── 4.1_step2_alpha_diversity_reduce_pls_20200125.R
├── 5.1_step2_alpha_diversity_plot_fig2.R
├── 5.2_step2_alpha_diversity_plot_supp_fig7_loc.R
├── 5.3_step2_alpha_diversity_plot_supp_fig8.R
└── 6.1_step2_alpha_diversity_map.R

Usage:

Calculate observed richness and the Shannon index for the rarefied datasets.
Perform Kruskal-Wallis tests on the categorical variables.
Perform individual linear regressions on the continuous variables.
Perform partial least squares (PLS) analyses on the continuous variables.
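The two alpha diversity metrics and the Kruskal-Wallis statistic can be written out directly. The sketch below computes observed richness (the number of ASVs with at least one read) and the Shannon index H' = -Σ p_i ln p_i. It also computes the Kruskal-Wallis H statistic, which uses average ranks for ties and the standard tie correction. The sketch is for illustration only. The repository performs these steps in R, and the function names here are hypothetical. No p-value is computed; in practice, H is compared against a chi-squared distribution with (number of groups − 1) degrees of freedom.

```python
import math
from typing import Dict, List, Sequence


def observed_richness(counts: Dict[str, int]) -> int:
    """Number of ASVs with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts: Dict[str, int]) -> float:
    """Shannon index H' = -sum(p_i * ln p_i) over ASVs with p_i > 0 (natural log)."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    if n < 2:
        raise ValueError("need at least two observations")
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * h - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    tie_sizes: Dict[float, int] = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are tied")
    return h / correction


sample = {"ASV1": 50, "ASV2": 30, "ASV3": 20, "ASV4": 0}
print(observed_richness(sample))                      # 3
print(kruskal_wallis_h([[1, 2, 3], [4, 5, 6]]))      # ≈ 3.857
```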

Output:

Alpha diversity values.
Test results.

Step4: beta diversity

.
├── 0_util_get_var_color.R
├── 0_util_query_var_color.R
├── 1_step4_beta_diversity_cal_diss_60152019.R
├── 2.1_step4_beta_diversity_lcbd_06242019.R
├── 3.1_step4_beta_diversity_adonis_0804.R
├── 3.2_step4_beta_diversity_out_adonis_0804.R
├── 4.1_step4_beta_diversity_pcoa_generate_plotobject.R
├── 5.1_step4_beta_diversity_pcoa_envfit.R
├── 5.2_step4_beta_diversity_pcoa_envfit_ordisurf_plot.R
├── 6.1_step4_beta_diversity_cap_generate_plotobject.R
├── 6.2_step4_beta_diversity_out_cap_0805.R
├── manuscript_plotting
│   ├── m_step4_beta_diversity_combined_lcbd_fig3.R
│   ├── m_step4_beta_diversity_combined_ordinations_fig4.R
│   ├── m_step4_beta_diversity_combined_ordinations_resub_fig2.R
│   ├── m_step4_beta_diversity_combined_ordinations_supp_figs.R
│   └── m_step4_beta_diversity_pcoa_envfit_ordisurf_plot_fig6.R
├── no_coast
│   ├── x.1_step3_beta_diversity_cal_diss_nocoast_60152019.R
│   ├── x.2_step3_beta_diversity_nocoast_adonis_0804.R
│   ├── x.3_step3_beta_diversity_nocoast_out_adonis_0804.R
│   ├── x.4_step4_beta_diversity_nocoast_pcoa_plot_0805.R
│   └── x.5_step3_beta_diversity_pcoa_combine_0805.R
└── zoom_in
    ├── z.1_step4_beta_diversity_adonis_minor_in_major_09302019.R
    └── z.2_step4_beta_diversity_pcoa_minor_in_major_09302019.R

Usage:

Calculate binary Jaccard dissimilarity
Generate relative abundance plots
Perform beta dispersion and PERMANOVA analyses
Perform PCoA ordination and plotting
Perform envfit on the PCoA ordination post hoc
Perform additional beta dispersion and PERMANOVA analyses on subsets of samples: (1) excluding coastal sites and (2) grouped by minor habitat within each major habitat
Perform capscale and varpart analyses to remove location effects
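Binary Jaccard dissimilarity compares samples by presence/absence only: d(A, B) = 1 − |A ∩ B| / |A ∪ B|. PERMANOVA then asks whether a grouping explains a significant share of the variation in these dissimilarities. The script names refer to `adonis`, vegan's PERMANOVA function. The sketch below is a minimal stand-alone illustration and not a substitute for vegan. It computes the one-way pseudo-F from a distance matrix and a permutation p-value. Treating two empty samples as having zero dissimilarity is a convention chosen only for this sketch, and all function names are hypothetical.

```python
import itertools
import random
from typing import List, Sequence, Set


def jaccard_binary(a: Set[str], b: Set[str]) -> float:
    """Binary Jaccard dissimilarity: 1 - |A & B| / |A | B| on presence/absence sets."""
    union = a | b
    if not union:
        return 0.0  # convention for two empty samples (assumption of this sketch)
    return 1.0 - len(a & b) / len(union)


def distance_matrix(samples: Sequence[Set[str]]) -> List[List[float]]:
    n = len(samples)
    return [[jaccard_binary(samples[i], samples[j]) for j in range(n)] for i in range(n)]


def pseudo_f(d: List[List[float]], labels: Sequence[str]) -> float:
    """One-way PERMANOVA pseudo-F computed from squared pairwise distances."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(d[i][j] ** 2 for i, j in itertools.combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        idx = [i for i in range(n) if labels[i] == g]
        ss_within += sum(d[i][j] ** 2 for i, j in itertools.combinations(idx, 2)) / len(idx)
    ss_among = ss_total - ss_within
    a = len(groups)
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova_p(d: List[List[float]], labels: Sequence[str],
                n_perm: int = 999, seed: int = 0) -> float:
    """Permutation p-value: share of label shuffles with pseudo-F >= the observed value."""
    rng = random.Random(seed)
    f_obs = pseudo_f(d, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(d, shuffled) >= f_obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)


samples = [{"a", "b"}, {"a", "b", "c"}, {"d", "e"}, {"d", "e", "f"}]
labels = ["x", "x", "y", "y"]
d = distance_matrix(samples)
print(round(pseudo_f(d, labels), 6))  # 17.0
```

Beta dispersion (vegan's `betadisper`) is usually checked alongside PERMANOVA, because a significant pseudo-F can also arise from unequal spread within groups.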

Output:

Jaccard dissimilarity measures
Analyses output

Step5: zeta diversity

Author: Ariel Levi Simons. His GitHub repository for this section

.
├── CALeDNAZetaFactors.R
├── LICENSE
├── PresenceAbsence.R
├── README.md
├── Zeta4ClassMap.R
├── Zeta4ClassMapCluster.R
├── Zeta4FamilyMap.R
├── Zeta4FamilyMapCluster.R
├── Zeta4eDNA.R
├── ZetaClassCluster.R
├── ZetaFamilyCluster.R
├── ZetaMap.R
├── ZetaeDNA.R
├── step5_glom_to_family.R
├── step5_subset_phyloseq.R
└── zeta.sh

Usage:

Calculate $\zeta_4$ diversity at the family level
Perform variable importance testing
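Zeta diversity of order i ($\zeta_i$) is the mean number of taxa shared by i sites, averaged over combinations of i sites. $\zeta_1$ is therefore mean richness, and $\zeta_4$ counts the families shared by groups of four sites. The sketch below assumes that each site has already been reduced to a set of families present, as step5_glom_to_family.R appears to do. It enumerates every combination of sites, or it samples a fixed number of combinations when full enumeration is too large. The `zeta` function is a hypothetical illustration and does not reproduce the author's scripts. The variable importance testing is not shown.

```python
import itertools
import random
from typing import Optional, Sequence, Set


def zeta(sites: Sequence[Set[str]], order: int = 4,
         max_combos: Optional[int] = None, seed: int = 0) -> float:
    """Mean number of taxa shared by `order` sites.

    Enumerates all combinations of `order` sites, or draws `max_combos` random
    combinations (possibly with repeats) when that number is given.
    """
    if len(sites) < order:
        raise ValueError("need at least `order` sites")
    if max_combos is None:
        combos = itertools.combinations(range(len(sites)), order)
    else:
        rng = random.Random(seed)
        combos = (rng.sample(range(len(sites)), order) for _ in range(max_combos))
    total, count = 0, 0
    for combo in combos:
        shared = set.intersection(*(sites[i] for i in combo))  # taxa present at every site
        total += len(shared)
        count += 1
    return total / count


# Families detected at four toy sites; only "A" occurs at all four.
sites = [{"A", "B"}, {"A", "C"}, {"A", "B"}, {"A", "D"}]
print(zeta(sites, order=4))  # 1.0
```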

Step6: gradient forest modeling

.
├── 1_step6_gradient_forest_final
│   ├── 1_step6_gradient_forest_final_all_06060219.R
│   ├── 1_step6_gradient_forest_final_all_qsub_06060219.sh
│   ├── 1_step6_gradient_forest_final_coast_10182019.R
│   ├── 1_step6_gradient_forest_final_coast_qsub_10182019.sh
│   ├── 1_step6_gradient_forest_final_nocoast_10182019.R
│   ├── 1_step6_gradient_forest_final_nocoast_qsub_10182019.sh
│   ├── 2_step4_gradient_forest_final_run_stability_10152019.R
│   └── 2_step4_gradient_forest_final_run_stability_10152019.sh
├── 2_step6_gf_permutate
│   ├── 3.2_plot_permutation_result.R
│   ├── 3_step4_gradient_forest_permutate_06060219.R
│   ├── 3_step4_gradient_forest_permutate_qsub_06060219.sh
│   └── 3_step4_gradient_forest_permutate_qsub_wrapper_06060219.sh
├── 3_step6_gf_plot_object
│   ├── 4.1_step6_gradient_forest_summary_stability.R
│   ├── 4.1_step6_gradient_forest_summary_stability_output.R
│   ├── 4.2_plot_permutation_result.R
│   ├── 4.3_validate_reads_10172019.R
│   ├── 4.4_check_occurrences_predictability.R
│   ├── 4_step6_gradient_forest_plotting_for_publication.R
│   ├── 4_step6_gradient_forest_plotting_function_local.R
│   ├── 4_step6_gradient_forest_species_response_10162019.R
│   ├── other_save_gf_x_y_manuscript.R
│   └── species_response_legend_key.R
└── functions
    ├── change_density_plot.R
    ├── change_performance.plot.R
    ├── change_species_cumulative_plot.R
    └── function_gfprep.R

Usage:

Prepare the gradient forest input data
Perform gradient forest analyses using the classification tree setting
Perform gradient forest validation
Plot gradient forest output

Output:

Gradient forest input table
Gradient forest results

Step7: gradient forest predictions

.
├── 0.1_prepare_latlong_raster.R
├── 0.2_prepare_raster.R
├── 0.2_prepare_raster_qsub.sh
├── 0.3_prepare_raster_splits.R
├── 0.3_prepare_raster_splits_qsub.sh
├── 0.4_get_random_points.sh
├── all_100m
│   ├── 1_step7_make_prediction_all_20201014.R
│   ├── 1_step7_make_prediction_extrap_qsub.sh
│   ├── 1_step7_make_prediction_noextrap_qsub.sh
│   ├── 2_step7_combine_prediction_20201014.R
│   ├── 2_step7_combine_prediction_extrap_qsub.sh
│   ├── 2_step7_combine_prediction_noextrap_qsub.sh
│   ├── 3_step7_prediction_pc_20201014.R
│   ├── 3_step7_prediction_pc_extrap_qsub.sh
│   ├── 3_step7_prediction_pc_noextrap_qsub.sh
│   ├── 4.1_step7_ref_scale_grid_20201014.R
│   ├── 4.1_step7_ref_scale_grid_qsub.sh
│   ├── 4.2_step7_ref_scale_grid_plot_20201014.R
│   ├── 4.2_step7_ref_scale_grid_plot_qsub.sh
│   ├── 4.3_step7_store_ref_scale_grid_20201014.R
│   ├── 4.3_step7_store_ref_scale_grid_qsub.sh
│   ├── 5_step7_store_prediction_20201014.R
│   ├── 5_step7_store_prediction_extrap_qsub.sh
│   ├── 5_step7_store_prediction_noextrap_qsub.sh
│   ├── record_all_100m_predictions.sh
│   ├── util_compare_previous.R
│   ├── util_compare_previous_qsub.sh
│   └── util_get_md5sum.sh
├── manuscript_plotting
│   ├── 1_fig9_nobio_prediction_pc_plots.R
│   ├── 1_fig9_noextrap_prediction_pc_plots.R
│   └── record_manuscript_plotting_20201025.sh
└── sites_rand_points
    ├── 1_step7_make_prediction_gf_sites.R
    ├── 2_step7_make_prediction_random_sites.R
    ├── 3_step7_ref_scale_sites_random_sites_20200421.R
    ├── 3_step7_ref_scale_sites_random_sites_20200421_qsub.sh
    ├── 4_evaluate_prediction.R
    └── 5_evaluate_prediction_uncertainty.R

Usage:

Perform predictions based on the gradient forest output

Output:

Stacked layers of 33 environmental variables in California at 100 m x 100 m resolution.
Community turnover map predicted from the gradient forest model.
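The "prediction_pc" script names suggest that the predicted community turnover is summarised with principal components and shown as a colour map. A common convention for gradient forest maps is to assign the first three principal components to the red, green and blue channels. The sketch below covers only that final colouring step. It assumes the PC scores for each 100 m cell have already been computed. The min-max rescaling and the `scores_to_rgb` name are illustrative assumptions, not necessarily what the R plotting scripts do.

```python
from typing import List, Sequence, Tuple


def scores_to_rgb(pc_scores: Sequence[Tuple[float, float, float]]) -> List[Tuple[int, int, int]]:
    """Min-max rescale the first three PC scores of each cell to 0-255 RGB values.

    A PC axis with constant scores maps to 0 in its channel.
    """
    channels = list(zip(*pc_scores))  # one tuple of scores per PC axis
    scaled_channels = []
    for values in channels:
        lo, hi = min(values), max(values)
        span = hi - lo
        if span == 0:
            scaled_channels.append([0] * len(values))
        else:
            scaled_channels.append([round(255 * (v - lo) / span) for v in values])
    return list(zip(*scaled_channels))  # back to one (R, G, B) tuple per cell


cells = [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0), (0.25, 0.5, 0.75)]
print(scores_to_rgb(cells))  # [(0, 0, 0), (255, 255, 255), (64, 64, 64)]
```

Under this scheme, cells with similar colours are predicted to hold similar communities. The colours carry no meaning beyond that relative similarity.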
