BIONIC sample quality control and data upload manual

Patrick Deelen edited this page Sep 16, 2021 · 3 revisions

This document contains the SOP for performing sample quality control and data upload for the BIONIC project. The SOP assumes you have access to a Linux terminal. If your team is unfamiliar with the software and steps involved in this SOP, one of our team members would be happy to visit your facility in person to assist with the process. To schedule a meeting day for this, please contact

Questions or suggestions?

If you have any questions regarding the steps below, please don’t hesitate to contact Floris Huider at, who coordinates the current phase of the BIONIC project. For general questions regarding the overarching project, you can contact Mariska Bot at

SOP Content

  1. Project background
  2. Obtaining an account for the upload server
  3. Phenotype data
  4. Genotype data software download
  5. Genotype data sample quality control
  6. Step-by-step data upload

Project Background

Cohorts that take part in the BBMRI-NL consortium have already agreed to collaborate on data collection and data upload for the BIONIC project.

What is the BIONIC project?

In the BIObanks Netherlands Internet Collective (BIONIC) project, Dutch academic institutions and biobanks collaborate in the standardised and harmonised assessment of Major Depressive Disorder (MDD), with the aim of uncovering its genetic etiology. In the first phase of the project, our team developed and validated a rapid online DSM-5-based MDD assessment tool, which serves as BIONIC’s main tool for data collection (Bot et al., 2017). In the second phase, Fedko et al. (2020) used a subset of the newly acquired MDD data to introduce the cohorts and to demonstrate that the prevalence and heritability estimates found in BIONIC closely align with those of previous MDD efforts. Now, in the third phase, we plan to identify the specific genetic variants associated with MDD in the Dutch population. To this end, we will perform a genome-wide association meta-analysis on the MDD and genotype data from all participating cohorts. Additional projects involve symptom-specific genetic association analyses and networking analyses using the phenotype data.

Data Upload

We require both phenotype and genome-wide genotype data:

  1. The phenotype data should include all individuals with lifetime Major Depressive Disorder data (according to the LIDAS questionnaire or a DSM-5-based structured interview).
     a. This data should include the responses to all LIDAS questions, including but not limited to: responses to individual MDD symptom items, past treatment/diagnosis, and other sociodemographic, lifestyle, health, and mental health-related variables. For a complete list of the LIDAS items, please see the supplementary material of:, pages 16-19.
     b. For lifetime MDD not measured by the LIDAS, please refer to the enclosed document “BIONIC Phenotype data non-LIDAS studies.docx” for the complete list of phenotype variables that we request.
  2. We request that you upload the genotype data of everyone with MDD data. The genotype data should be largely unedited, or ‘raw’, apart from having undergone sample QC, for which the code is provided below. Please note that the QC protocol described below should be performed on raw genotype data, even if a cleaned dataset is already available from previous QC efforts. This ensures that genotype data from each cohort is cleaned in an identical way. The log files containing the number of samples that failed each QC step should be uploaded together with the final sample-cleaned genotype data. In summary, we request:
     a. Sample QC’ed genotype files in PLINK format (.bim, .bed, .fam) of every individual with lifetime MDD data.
     b. Log files of the sample QC.
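
Before uploading, it can help to check that the deliverables listed above are complete. The following is a minimal sketch and not part of the official BIONIC QC code. It assumes the PLINK fileset shares one path prefix, and that the IDs of individuals with phenotype data are available as a list of individual IDs. The function names and the example prefix are hypothetical. The only format detail relied on is that the first two whitespace-separated columns of a PLINK .fam file are the family ID and the individual ID.

```python
from pathlib import Path

# The three files that together form a PLINK binary fileset.
PLINK_EXTENSIONS = (".bed", ".bim", ".fam")


def missing_plink_files(prefix):
    """Return the PLINK files (.bed/.bim/.fam) that do not exist for this prefix."""
    prefix = str(prefix)
    # Append the extension to the full prefix so dots inside the prefix are kept.
    return [prefix + ext for ext in PLINK_EXTENSIONS
            if not Path(prefix + ext).is_file()]


def read_fam_ids(fam_path):
    """Read (family ID, individual ID) pairs from a PLINK .fam file."""
    ids = set()
    with open(fam_path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) >= 2:  # skip blank or malformed lines
                ids.add((fields[0], fields[1]))
    return ids


def phenotyped_without_genotype(phenotype_ids, fam_path):
    """Return individual IDs that have phenotype data but no row in the .fam file."""
    genotyped = {iid for _fid, iid in read_fam_ids(fam_path)}
    return sorted(set(phenotype_ids) - genotyped)
```

For example, with a hypothetical prefix `cohort_sampleqc`, `missing_plink_files("cohort_sampleqc")` should return an empty list when all three files are present, and `phenotyped_without_genotype(ids, "cohort_sampleqc.fam")` lists the individuals whose genotype data still appears to be missing. If your phenotype file identifies individuals by family ID as well, the comparison would need to be adapted accordingly.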

What will happen to the data and who has access?

Both phenotype and genotype data will be uploaded to the UMCG GCC HPC cluster ‘Gearshift’ using SFTP, which ensures secure file transfer from a cohort’s database to the Gearshift cluster. There, the data will be accessible only to a select number of researchers who will conduct the analyses for the projects mentioned above. The data will therefore remain in the Netherlands and will be protected from both external parties and non-affiliated Gearshift cluster users. The receiving party, UMCG, has set up a DTA that conforms to the current GDPR.