The Kids First Program supports projects studying the underlying genetic basis of pediatric cancers and structural birth defects. The Kids First DRC harmonizes clinical data from the project investigators with genetic data from sequencing center partners to facilitate use by the research community.
The harmonized clinical and genomic datasets are available through the Kids First Data Resource Portal. Once users log in, they can explore the available datasets using the Studies and Explore Data tools. Information about studies is also available outside the Portal on the Kids First Studies and Access page in the Kids First DRC Help Center.
Once users identify studies for their research project, they have immediate access to registration-access files and can request access to the controlled-access files within the datasets through NIH's database of Genotypes and Phenotypes (dbGaP).
Users can log in to the Kids First Data Portal using any of the following:
- The NIH Research Authentication Server to log in with an eRA Commons ID
- An ORCID Login
- Any Google-authenticated email address, such as a Gmail account
Once logged in, the Portal's Explore Data Tool allows users to build a virtual cohort of participants based on clinical data fields, such as diagnosis, phenotype, or demographic information. By using the menus to apply filters, researchers can identify specific participants across all Kids First studies for their unique research project.
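The filtering idea behind a virtual cohort can be sketched in a few lines of Python. This is only an illustration of the concept, not the Portal's implementation or API: the `Participant` class, its field names, the `build_cohort` helper, and the sample records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Participant:
    # Hypothetical clinical fields; the Portal's real field names may differ.
    participant_id: str
    study: str
    diagnosis: str
    sex: str


def build_cohort(participants, **filters):
    """Keep participants whose fields match every supplied filter value."""
    return [
        p
        for p in participants
        if all(getattr(p, field) == value for field, value in filters.items())
    ]


# Made-up records spanning two made-up studies.
participants = [
    Participant("PT_0001", "Study A", "neuroblastoma", "female"),
    Participant("PT_0002", "Study B", "neuroblastoma", "male"),
    Participant("PT_0003", "Study B", "cleft palate", "female"),
]

# One filter selects matching participants across all studies.
cohort = build_cohort(participants, diagnosis="neuroblastoma")
cohort_ids = [p.participant_id for p in cohort]
print(cohort_ids)
```

Adding a second filter, for example `sex="female"`, narrows the cohort further, much as stacking filters in the Explore Data menus does.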
A demonstration of how to use the Explore Data tool from a recent workshop is available here.
Users can also review data on the Portal at a broad level using its Studies tab. This tab allows users to see an overview of studies on the Portal, including the number of participants and the number of data files that are available separated by file type. Links out to dbGaP are also available for each study.
A demonstration of how to use the Studies tab from a recent workshop is available here.
The Studies and Access page within the Kids First DRC Help Center is another way to familiarize yourself with available datasets outside of the Portal. You can navigate to the page directly from the web (linked here) or navigate to it through the Portal itself.
The Studies and Access page houses more information about the individual studies, including links to the original author abstracts and dbGaP study pages.
While users can browse all available files in the Kids First Portal, they may have to apply to dbGaP to access some datasets of interest. Files generated by the Kids First DRC are organized into two broad categories. Registration-access files are available for immediate access and analysis by any user who creates an account on the Kids First Portal. Controlled-access files require dbGaP approval before access is granted.
Both levels of access require users to follow the Kids First DRC Disclaimers, Terms & Conditions, and Privacy Policy, which they accepted when creating their Kids First Portal account.
Requesting dbGaP access for controlled-access files requires a proposal submission. dbGaP has provided their own documentation with tips for preparing a successful request, as well as their own tutorial video to support users in the submission process.
| Workflow | Registration-Access Files | Controlled-Access Files |
|---|---|---|
| Alignment and GATK Haplotype Caller | -- | Aligned Reads; Germline Variants in gVCF Format |
| Joint-Genotyping Workflow | -- | Trio-Based Joint-Called Germline Variants |
| Somatic Workflow | Annotated SNVs with Predicted Germline Variants Removed; Copy Number Variants; Structural Variants | Annotated SNVs with Predicted Germline Variants Flagged |
| RNA-Seq Workflow | Quantified Gene Expression; Called Gene Fusions | Aligned Reads; Unaligned Reads |
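For readers who script their file planning, the table above can be encoded as a simple lookup. The sketch below is hypothetical: the `ACCESS_BY_WORKFLOW` dictionary and the `needs_dbgap_approval` function are illustrative names, not part of any Kids First tool, and the contents simply mirror the table.

```python
# Access tiers per workflow, transcribed from the table above.
ACCESS_BY_WORKFLOW = {
    "Alignment and GATK Haplotype Caller": {
        "registration": [],
        "controlled": ["Aligned Reads", "Germline Variants in gVCF Format"],
    },
    "Joint-Genotyping Workflow": {
        "registration": [],
        "controlled": ["Trio-Based Joint-Called Germline Variants"],
    },
    "Somatic Workflow": {
        "registration": [
            "Annotated SNVs with Predicted Germline Variants Removed",
            "Copy Number Variants",
            "Structural Variants",
        ],
        "controlled": ["Annotated SNVs with Predicted Germline Variants Flagged"],
    },
    "RNA-Seq Workflow": {
        "registration": ["Quantified Gene Expression", "Called Gene Fusions"],
        "controlled": ["Aligned Reads", "Unaligned Reads"],
    },
}


def needs_dbgap_approval(workflow, file_type):
    """Return True if the file type is controlled-access for this workflow."""
    tiers = ACCESS_BY_WORKFLOW[workflow]
    if file_type in tiers["controlled"]:
        return True
    if file_type in tiers["registration"]:
        return False
    raise KeyError(f"{file_type!r} is not listed for {workflow!r}")


print(needs_dbgap_approval("RNA-Seq Workflow", "Aligned Reads"))
print(needs_dbgap_approval("RNA-Seq Workflow", "Quantified Gene Expression"))
```

Raising `KeyError` for unlisted file types is a deliberate choice here, so that a typo does not get reported as a file that needs no approval.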
dbGaP - Tips for Preparing a Successful Data Access Request
Preparing a dbGaP data access request can also help users get ready to apply for cloud computing credits to analyze these data on CAVATICA, the Kids First DRC's cloud-based analysis platform. The platform lets users run existing applications, develop their own workflows, and perform interactive analyses in Jupyter notebooks through the Data Cruncher application.



