Closed
Description
It's common practice to use mean imputation for missing data when calculating the VanRaden GRM. We should add an example of this to the docstring. The following example avoids needing to call compute on intermediate variables:
import numpy as np
import xarray as xr
import sgkit as sg
ds = sg.simulate_genotype_call_dataset(n_variant=6, n_sample=5, missing_pct=0.1, seed=0)
sg.display_genotypes(ds)
ds = sg.call_allele_frequencies(ds)
# mean imputation
ds["variant_allele_frequency"] = ds.call_allele_frequency.mean(dim="samples", skipna=True)
ds["call_allele_frequency_imputed"] = xr.where(
    ~np.isnan(ds.call_allele_frequency),  # where call allele frequency is not missing
    ds.call_allele_frequency,  # use the call allele frequency
    ds.variant_allele_frequency,  # else use the variant mean allele frequency
)
ds["call_dosage"] = ds["call_allele_frequency_imputed"][:, :, 1] * 2  # multiply by ploidy
ds["ancestral_frequency"] = ds["variant_allele_frequency"][:, 1]  # use mean frequency as ancestral frequency
ds = sg.genomic_relationship(ds, ancestral_frequency="ancestral_frequency")
ds.stat_genomic_relationship.values
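
For reference, here is a rough NumPy-only sketch of what the steps above amount to. It is not sgkit's implementation. It assumes the VanRaden estimator has the form G = Z Zᵀ / (ploidy · Σ p(1 − p)), where Z is the dosage centered by ploidy · p, and it assumes the mean observed frequency stands in for the ancestral frequency, as in the example above. The function name `vanraden_grm_mean_imputed` and the toy dosage matrix are made up for illustration.

```python
import numpy as np


def vanraden_grm_mean_imputed(dosage, ploidy=2):
    """Hypothetical helper: VanRaden-style GRM with per-variant mean imputation.

    dosage: (variants, samples) array of alternate-allele dosages, NaN where missing.
    """
    dosage = np.asarray(dosage, dtype=float)
    # Per-variant mean dosage over the non-missing calls only.
    mean_dosage = np.nanmean(dosage, axis=1, keepdims=True)
    # Mean imputation: replace each missing dosage with its variant mean.
    imputed = np.where(np.isnan(dosage), mean_dosage, dosage)
    # Assumption: the mean observed frequency is used as the ancestral frequency.
    freq = mean_dosage[:, 0] / ploidy
    # Center dosages by the expected dosage (ploidy * frequency).
    centered = imputed - mean_dosage
    # Assumed VanRaden scaling: ploidy * sum over variants of p * (1 - p).
    denom = ploidy * np.sum(freq * (1.0 - freq))
    return centered.T @ centered / denom


# Toy data: 3 variants x 4 samples, two missing calls.
dosage = np.array(
    [
        [0.0, 1.0, 2.0, np.nan],
        [1.0, 1.0, np.nan, 2.0],
        [2.0, 0.0, 1.0, 1.0],
    ]
)
G = vanraden_grm_mean_imputed(dosage)
print(G.shape)
```

With this centering, imputed calls contribute zero to the numerator, so each row of the resulting matrix sums to zero (up to floating point error).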