Skip to content

lntran26/ConfuseNN

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

11 Commits
 
 
 
 
 
 
 
 

Repository files navigation

ConfuseNN: Interpreting convolutional neural networks inferences in population genomics by data shuffling

Convolutional neural network (CNN) is an increasingly popular supervised machine learning approach that has been applied to many inference tasks in population genetics. Under this framework, population genomic variation data are typically represented as 2D images with sampled haplotypes as rows and segregating sites as columns. While many published studies reported promising performance of CNNs on various inference tasks, understanding which features in the data were picked up by the CNNs and meaningfully contributed to the reported performance remains challenging. Here we propose a novel approach to interpreting CNN performance motivated by population genetic theory on genomic data. Specifically, we designed a suite of scramble tests where each test deliberately disrupts a feature in the genomic image data (e.g. allele frequency, linkage disequilibrium, etc.) to assess how each feature affects the CNN performance. We apply these tests to three networks designed to infer demographic history and natural selection, identifying the fundamental population genomic features that drive inference for each network.

Early result reference

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published