Document GRM estimation using mean imputation. #1025

@timothymillar

It's common practice to use mean imputation for missing data when calculating the VanRaden GRM. We should add an example of this to the docstring. The following example avoids the need to call compute() on intermediate variables:

import numpy as np
import xarray as xr
import sgkit as sg

ds = sg.simulate_genotype_call_dataset(n_variant=6, n_sample=5, missing_pct=0.1, seed=0)
sg.display_genotypes(ds)
ds = sg.call_allele_frequencies(ds)
# mean imputation
ds["variant_allele_frequency"] = ds.call_allele_frequency.mean(dim="samples", skipna=True)
ds["call_allele_frequency_imputed"] = xr.where(
    ~np.isnan(ds.call_allele_frequency),  # where call allele frequency is not missing
    ds.call_allele_frequency,  # use the call allele frequency
    ds.variant_allele_frequency,  # else use the variant mean allele frequency
)
ds["call_dosage"] = ds["call_allele_frequency_imputed"][:, :, 1] * 2  # multiply by ploidy
ds["ancestral_frequency"] = ds["variant_allele_frequency"][:, 1]  # use mean frequency as ancestral frequency
ds = sg.genomic_relationship(ds, ancestral_frequency="ancestral_frequency")
ds.stat_genomic_relationship.values
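
For a docstring it may also help to show what the example computes. Below is a minimal NumPy-only sketch of mean imputation followed by the VanRaden estimator as it is commonly stated, G = ZZ'/(ploidy * sum(p * (1 - p))). It is not sgkit's implementation: the function name `vanraden_grm_mean_imputed` and the toy dosage matrix are hypothetical, and it assumes dosages are already available with NaN marking missing calls.

```python
import numpy as np


def vanraden_grm_mean_imputed(dosage, ploidy=2):
    """Hypothetical helper: VanRaden GRM with per-variant mean imputation.

    dosage: array of shape (variants, samples); NaN marks a missing call.
    Assumes every variant has at least one observed call and that at
    least one variant is polymorphic (otherwise the denominator is zero).
    """
    dosage = np.asarray(dosage, dtype=float)
    # Per-variant mean dosage over the observed (non-NaN) samples.
    mean_dosage = np.nanmean(dosage, axis=1, keepdims=True)
    # Mean imputation: replace each missing call with its variant mean.
    imputed = np.where(np.isnan(dosage), mean_dosage, dosage)
    # Use the mean allele frequency as the ancestral frequency.
    freq = mean_dosage[:, 0] / ploidy
    # Center dosages by the expected dosage; imputed entries become zero.
    centered = imputed - mean_dosage
    denominator = ploidy * np.sum(freq * (1.0 - freq))
    # Samples are columns, so the sample-by-sample matrix is Z'Z here.
    return centered.T @ centered / denominator


# Toy data (made up for illustration): 3 variants x 4 diploid samples.
toy_dosage = np.array(
    [
        [0.0, 1.0, 2.0, np.nan],
        [1.0, 1.0, np.nan, 2.0],
        [2.0, 0.0, 1.0, 1.0],
    ]
)
G = vanraden_grm_mean_imputed(toy_dosage)
print(G.shape)
```

Because an imputed call equals its variant mean, it contributes nothing to the centered matrix, so it neither adds to nor subtracts from the numerator.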
