In partial-connected neural networks, SNPs can be divided into groups by users, and each group connects to its own intermediate trait in the middle layer. Genotype groups should be loaded seperatly.
- nonlinear function (to define relationship between middle layer and phenotye): tanh
- number of nodes in the middle layer: 3
- Bayesian model (to sample marker effects on intemediate traits):
- genotype group 1: single-trait BayesA
- genotype group 2: single-trait BayesB
- genotype group 3: single-trait BayesC
- sample the unobserved intemediate traits in the middle layer: Hamiltonian Monte Carlo
# Step 1: Load packages
using JWAS,DataFrames,CSV,Statistics,JWAS.Datasets,Random
Random.seed!(123)
# Step 2: Read data
phenofile = dataset("phenotypes.csv")
genofile1 = dataset("genotypes_group1.csv") #path of genotype group1
genofile2 = dataset("genotypes_group2.csv") #path of genotype group2
genofile3 = dataset("genotypes_group3.csv") #path of genotype group3
phenotypes = CSV.read(phenofile,DataFrame,delim = ',',header=true,missingstrings=["NA"])
geno1 = get_genotypes(genofile1,separator=',',method="BayesA"); #read genotype group1
geno2 = get_genotypes(genofile2,separator=',',method="BayesB"); #read genotype group2
geno3 = get_genotypes(genofile3,separator=',',method="BayesC"); #read genotype group3
# Step 3: Build Model Equations
model_equation = "y1 = intercept + geno1 + geno2 + geno3"; #middle node1=intercept + geno1
#middle node2=intercept + geno2
#middle node3=intercept + geno3
model = build_model(model_equation,nonlinear_function="tanh")
# Step 4: Run Analysis
out = runMCMC(model, phenotypes, chain_length=5000);
# Step 5: Check Accuruacy
results = innerjoin(out["EBV_NonLinear"], phenotypes, on = :ID)
accuruacy = cor(results[!,:EBV],results[!,:bv1])
- Same as for fully-connected neural network with intermediate omics data, the names of omics features should be put in the
build_model()
function through thelatent_traits
argument. The order of omics and the order of genotype groups in the model equation should be consistant.
In below example, we assume genotype group i only affect omics i (here i =1,...,5).
# Step 1: Load packages
using JWAS,DataFrames,CSV,Statistics,JWAS.Datasets,Random,HTTP
Random.seed!(1)
# Step 2: Read data
phenofile = HTTP.get("https://raw.githubusercontent.com/zhaotianjing/nnmm_doc/main/data_simulation/y.csv").body
omicsfile = HTTP.get("https://raw.githubusercontent.com/zhaotianjing/nnmm_doc/main/data_simulation/omics.csv").body
genofile1 = HTTP.get("https://raw.githubusercontent.com/zhaotianjing/nnmm_doc/main/data_simulation/geno_group1.csv").body
genofile2 = HTTP.get("https://raw.githubusercontent.com/zhaotianjing/nnmm_doc/main/data_simulation/geno_group2.csv").body
genofile3 = HTTP.get("https://raw.githubusercontent.com/zhaotianjing/nnmm_doc/main/data_simulation/geno_group3.csv").body
genofile4 = HTTP.get("https://raw.githubusercontent.com/zhaotianjing/nnmm_doc/main/data_simulation/geno_group4.csv").body
genofile5 = HTTP.get("https://raw.githubusercontent.com/zhaotianjing/nnmm_doc/main/data_simulation/geno_group5.csv").body
phenotypes = CSV.read(phenofile,DataFrame)
omics = CSV.read(omicsfile,DataFrame)[:,1:6] #only use first 5 omics for demonstration
omics_names = names(omics)[2:end] #get names of omics
insertcols!(omics,2,:y => phenotypes[:,:y], :bv => phenotypes[:,:bv]) #phenotype and omics should be in the same dataframe
geno1_df = CSV.read(genofile1,DataFrame)
geno2_df = CSV.read(genofile2,DataFrame)
geno3_df = CSV.read(genofile3,DataFrame)
geno4_df = CSV.read(genofile4,DataFrame)
geno5_df = CSV.read(genofile5,DataFrame)
geno1 = get_genotypes(geno1_df,separator=',',method="BayesA");
geno2 = get_genotypes(geno2_df,separator=',',method="BayesB");
geno3 = get_genotypes(geno3_df,separator=',',method="BayesC");
geno4 = get_genotypes(geno4_df,separator=',',method="RR-BLUP");
geno5 = get_genotypes(geno5_df,separator=',',method="BayesL");
# Step 3: Build Model Equations
model_equation = "y = intercept + geno1 + geno2 + geno3 + geno4 + geno5"; #omics1=intercept+geno1; omics2=intercept+geno2; ...
model = build_model(model_equation,
num_hidden_nodes=5,
nonlinear_function="sigmoid",
latent_traits=omics_names)
# Step 4: Run Analysis
out = runMCMC(model, omics, chain_length=5000, printout_model_info=false);
# Step 5: Check Accuruacy
results = innerjoin(out["EBV_NonLinear"], omics, on = :ID)
accuruacy = cor(results[!,:EBV],results[!,:bv])