This repository includes quality control (QC) pipelines for the genotype data used by both the BiU-Net and GenoBERT models, covering the 1000 Genomes Project (1KGP), Louisiana Osteoporosis Study (LOS), and Simons Genome Diversity Project (SGDP) datasets.
The HLA dataset is the subset of the SGDP dataset that covers the HLA region on chromosome 6.
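
As a rough illustration of what restricting a dataset to a chromosome 6 region involves, the sketch below filters VCF-style text lines by chromosome and position. It is not the repository's pipeline: the function names `normalize_chrom` and `filter_region` are hypothetical, and the coordinate bounds are placeholders, since the exact HLA boundaries and genome build are not stated here.

```python
from typing import Iterable, Iterator

# Placeholder bounds (assumption): the coordinates and genome build that
# define the HLA region are not given in this README. Replace before use.
HLA_CHROM = "6"
HLA_START = 28_000_000
HLA_END = 34_000_000


def normalize_chrom(chrom: str) -> str:
    """Strip an optional 'chr' prefix so 'chr6' and '6' compare equal."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def filter_region(
    vcf_lines: Iterable[str],
    chrom: str = HLA_CHROM,
    start: int = HLA_START,
    end: int = HLA_END,
) -> Iterator[str]:
    """Yield header lines plus records whose position lies in [start, end]."""
    for line in vcf_lines:
        if line.startswith("#"):
            # Keep meta-information and column header lines unchanged.
            yield line
            continue
        fields = line.split("\t", 2)
        if len(fields) < 2:
            # Skip blank or malformed lines.
            continue
        if normalize_chrom(fields[0]) == chrom and start <= int(fields[1]) <= end:
            yield line


example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr6\t29000000\trs1\tA\tG",  # inside the placeholder window
    "6\t35000000\trs2\tC\tT",     # chromosome 6, outside the window
    "1\t29000000\trs3\tG\tA",     # different chromosome
]
subset = list(filter_region(example))
```

For real data, an indexed region query with an established tool is usually preferable to scanning lines in Python; the sketch only makes the selection rule explicit.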
For the public datasets (1KGP and SGDP), we provide links to the original data sources. The in-house LOS dataset is subject to data usage restrictions; access can be granted upon request and approval.