# Getting Started

This notebook gives a simple introduction to VA-Spec Python, which is used to create 
and validate models defined in the Global Alliance for Genomics and Health (GA4GH) 
Genomic Knowledge Standards (GKS) Variant Annotation Specification 
[(VA-Spec)](https://github.com/ga4gh/va-spec).

Three community guidelines are currently supported:

1. Association for Molecular Pathology (AMP), American Society of Clinical Oncology 
(ASCO), and College of American Pathologists (CAP) 2017
1. American College of Medical Genetics and Genomics (ACMG) 2015
1. Clinical Genome Resource (ClinGen), Cancer Genomics Consortium (CGC), and Variant
Interpretation for Cancer Consortium (VICC) 2022

This notebook shows how to create Pydantic models for each of these guidelines.

_Note: Pydantic offers a lot of extra functionality that won't be covered in this
notebook. We recommend checking out the 
[Pydantic documentation](https://docs.pydantic.dev/latest/)._

## Core Classes

VA-Spec Python has base models ([src/ga4gh/va_spec/base/core.py](../src/ga4gh/va_spec/base/core.py)) that correspond to VA-Spec core classes.

One example of a core class is `CohortAlleleFrequencyStudyResult`.

In order to use the model, you must import it:

In [1]:
from ga4gh.va_spec.base import CohortAlleleFrequencyStudyResult

The simplest and most common ways to create Pydantic model instances are by using
dictionary unpacking or directly passing keyword arguments (kwargs).

### Creating a Model Using Dictionary Unpacking

In [2]:
caf_dict = {
    "id": "gnomad4:1-10120-T-G",
    "type": "CohortAlleleFrequencyStudyResult",
    "name": "Example Cohort Allele Frequency for 1-10120-T-G",
    "sourceDataSet": {
        "id": "gnomad4.1.0",
        "type": "DataSet",
        "name": "gnomAD v4.1.0",
        "version": "4.1.0"
    },
    "focusAllele": "allele.json#/1-10120-T-G",  # Pretend this file exists somewhere
    "focusAlleleCount": 0,
    "locusAlleleCount": 34086,
    "focusAlleleFrequency": 0,
    "cohort": {
        "id": "ALL",
        "name": "Overall",
        "type": "StudyGroup"
    }
}
caf = CohortAlleleFrequencyStudyResult(**caf_dict)
caf

CohortAlleleFrequencyStudyResult(id='gnomad4:1-10120-T-G', type='CohortAlleleFrequencyStudyResult', name='Example Cohort Allele Frequency for 1-10120-T-G', description=None, aliases=None, extensions=None, specifiedBy=None, contributions=None, reportedIn=None, sourceDataSet=DataSet(id='gnomad4.1.0', type='DataSet', name='gnomAD v4.1.0', description=None, aliases=None, extensions=None, subtype=None, reportedIn=None, releaseDate=None, version='4.1.0', license=None), ancillaryResults=None, qualityMeasures=None, focusAllele=iriReference(root='allele.json#/1-10120-T-G'), focusAlleleCount=0, locusAlleleCount=34086, focusAlleleFrequency=0.0, cohort=StudyGroup(id='ALL', type='StudyGroup', name='Overall', description=None, aliases=None, extensions=None, memberCount=None, characteristics=None), subCohortFrequency=None)
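
Pydantic v2 also provides `model_validate`, which accepts the dictionary directly instead of unpacking it. The sketch below is illustrative only: `MiniStudyGroup` and `MiniFrequencyResult` are hypothetical stand-ins, not VA-Spec classes, so that the cell runs without VA-Spec Python installed. Because VA-Spec models are Pydantic models, the same call should work on them, but that is an assumption here.

```python
from pydantic import BaseModel


class MiniStudyGroup(BaseModel):
    """Hypothetical stand-in for a nested model such as StudyGroup."""

    id: str
    name: str


class MiniFrequencyResult(BaseModel):
    """Hypothetical stand-in for a model that has a nested field."""

    id: str
    focusAlleleCount: int
    cohort: MiniStudyGroup


data = {
    "id": "gnomad4:1-10120-T-G",
    "focusAlleleCount": 0,
    "cohort": {"id": "ALL", "name": "Overall"},
}

# Dictionary unpacking and model_validate build equal instances.
unpacked = MiniFrequencyResult(**data)
validated = MiniFrequencyResult.model_validate(data)
same = unpacked == validated

# In both cases the nested dictionary is converted into the nested model.
nested_type = type(validated.cohort).__name__
```

`model_validate` can be convenient when the dictionary has keys that are not valid Python identifiers, since nothing is passed as a keyword argument.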

### Creating a Model Using Keyword Arguments

For this example, we'll import other models that are used in `CohortAlleleFrequencyStudyResult`.

In [3]:
from ga4gh.va_spec.base import DataSet, StudyGroup

In [4]:
caf = CohortAlleleFrequencyStudyResult(
    id="gnomad4:1-10120-T-G",
    type="CohortAlleleFrequencyStudyResult",
    name="Example Cohort Allele Frequency for 1-10120-T-G",
    sourceDataSet=DataSet(
        id="gnomad4.1.0",
        type="DataSet",
        name="gnomAD v4.1.0",
        version="4.1.0"
    ),
    focusAllele="allele.json#/1-10120-T-G",  # Pretend this file exists somewhere
    focusAlleleCount=0,
    locusAlleleCount=34086,
    focusAlleleFrequency=0,
    cohort=StudyGroup(
        id="ALL",
        name="Overall",
        type="StudyGroup"
    )
)
caf

CohortAlleleFrequencyStudyResult(id='gnomad4:1-10120-T-G', type='CohortAlleleFrequencyStudyResult', name='Example Cohort Allele Frequency for 1-10120-T-G', description=None, aliases=None, extensions=None, specifiedBy=None, contributions=None, reportedIn=None, sourceDataSet=DataSet(id='gnomad4.1.0', type='DataSet', name='gnomAD v4.1.0', description=None, aliases=None, extensions=None, subtype=None, reportedIn=None, releaseDate=None, version='4.1.0', license=None), ancillaryResults=None, qualityMeasures=None, focusAllele=iriReference(root='allele.json#/1-10120-T-G'), focusAlleleCount=0, locusAlleleCount=34086, focusAlleleFrequency=0.0, cohort=StudyGroup(id='ALL', type='StudyGroup', name='Overall', description=None, aliases=None, extensions=None, memberCount=None, characteristics=None), subCohortFrequency=None)

### Validating a Model

`CohortAlleleFrequencyStudyResult` requires a `cohort` field, which must be a
`StudyGroup`.

Creating a `CohortAlleleFrequencyStudyResult` object without `cohort` raises a
validation error.

In [5]:
from pydantic import ValidationError

try:
    CohortAlleleFrequencyStudyResult(
        id="gnomad4:1-10120-T-G",
        type="CohortAlleleFrequencyStudyResult",
        name="Example Cohort Allele Frequency for 1-10120-T-G",
        sourceDataSet=DataSet(
            id="gnomad4.1.0",
            type="DataSet",
            name="gnomAD v4.1.0",
            version="4.1.0"
        ),
        focusAllele="allele.json#/1-10120-T-G",  # Pretend this file exists somewhere
        focusAlleleCount=0,
        locusAlleleCount=34086,
        focusAlleleFrequency=0
    )
except ValidationError as e:
    print(e)

1 validation error for CohortAlleleFrequencyStudyResult
cohort
  Field required [type=missing, input_value={'id': 'gnomad4:1-10120-T...ocusAlleleFrequency': 0}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.10/v/missing
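
The printed message above is meant for people. To handle the same error in code, `ValidationError.errors()` returns one dictionary per problem, including the field location and the error type. A minimal sketch, using hypothetical stand-in models (`MiniStudyGroup`, `MiniFrequencyResult`) rather than the VA-Spec classes:

```python
from pydantic import BaseModel, ValidationError


class MiniStudyGroup(BaseModel):
    """Hypothetical stand-in for StudyGroup."""

    id: str
    name: str


class MiniFrequencyResult(BaseModel):
    """Hypothetical stand-in with a required nested field."""

    id: str
    cohort: MiniStudyGroup  # required: no default value


try:
    MiniFrequencyResult(id="gnomad4:1-10120-T-G")  # cohort is missing
    errors = []
except ValidationError as e:
    errors = e.errors()

# Each entry has a location tuple and a machine-readable error type.
summary = [(err["loc"], err["type"]) for err in errors]
```

Here `summary` holds a single entry whose location is `("cohort",)` and whose type is `"missing"`, matching the `type=missing` tag in the printed message.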


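Creating a `CohortAlleleFrequencyStudyResult` object with a `cohort` that is not a
`StudyGroup` also raises a validation error. In the example below, the error comes from
the `DataSet` constructor itself, because `"StudyGroup"` is not a valid `type` for a
`DataSet`.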

In [6]:
try:
    CohortAlleleFrequencyStudyResult(
        id="gnomad4:1-10120-T-G",
        type="CohortAlleleFrequencyStudyResult",
        name="Example Cohort Allele Frequency for 1-10120-T-G",
        sourceDataSet=DataSet(
            id="gnomad4.1.0",
            type="DataSet",
            name="gnomAD v4.1.0",
            version="4.1.0"
        ),
        focusAllele="allele.json#/1-10120-T-G",  # Pretend this file exists somewhere
        focusAlleleCount=0,
        locusAlleleCount=34086,
        focusAlleleFrequency=0,
        cohort=DataSet(  # NOT a `StudyGroup`
            id="ALL",
            name="Overall",
            type="StudyGroup"
        )
    )
except ValidationError as e:
    print(e)

1 validation error for DataSet
type
  Input should be 'DataSet' [type=literal_error, input_value='StudyGroup', input_type=str]
    For further information visit https://errors.pydantic.dev/2.10/v/literal_error
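
The `literal_error` above indicates that the `type` field accepts only one fixed string. One way such a constraint can be declared in Pydantic is with `typing.Literal`. The sketch below uses a hypothetical `MiniDataSet` model and is an assumption about the general pattern, not a copy of the VA-Spec Python source.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError


class MiniDataSet(BaseModel):
    """Hypothetical stand-in whose type field accepts a single value."""

    id: str
    type: Literal["DataSet"] = "DataSet"


# Omitting type falls back to the only allowed value.
ok = MiniDataSet(id="gnomad4.1.0")

try:
    MiniDataSet(id="ALL", type="StudyGroup")  # wrong literal value
    error_types = []
except ValidationError as e:
    error_types = [err["type"] for err in e.errors()]
```

With this declaration, passing any other string for `type` fails with `literal_error`, which is the same error type reported for `DataSet` above.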


### Converting a Model to a Dictionary

Pydantic makes it easy to convert a model back to a dictionary.

In [7]:
caf.model_dump()

{'id': 'gnomad4:1-10120-T-G',
 'type': 'CohortAlleleFrequencyStudyResult',
 'name': 'Example Cohort Allele Frequency for 1-10120-T-G',
 'description': None,
 'aliases': None,
 'extensions': None,
 'specifiedBy': None,
 'contributions': None,
 'reportedIn': None,
 'sourceDataSet': {'id': 'gnomad4.1.0',
  'type': 'DataSet',
  'name': 'gnomAD v4.1.0',
  'description': None,
  'aliases': None,
  'extensions': None,
  'subtype': None,
  'reportedIn': None,
  'releaseDate': None,
  'version': '4.1.0',
  'license': None},
 'ancillaryResults': None,
 'qualityMeasures': None,
 'focusAllele': 'allele.json#/1-10120-T-G',
 'focusAlleleCount': 0,
 'locusAlleleCount': 34086,
 'focusAlleleFrequency': 0.0,
 'cohort': {'id': 'ALL',
  'type': 'StudyGroup',
  'name': 'Overall',
  'description': None,
  'aliases': None,
  'extensions': None,
  'memberCount': None,
  'characteristics': None},
 'subCohortFrequency': None}

Sometimes the output from a model can be large, as we see above. If you don't care about
null values, you can exclude them using `model_dump(exclude_none=True)`.

In [8]:
caf.model_dump(exclude_none=True)

{'id': 'gnomad4:1-10120-T-G',
 'type': 'CohortAlleleFrequencyStudyResult',
 'name': 'Example Cohort Allele Frequency for 1-10120-T-G',
 'sourceDataSet': {'id': 'gnomad4.1.0',
  'type': 'DataSet',
  'name': 'gnomAD v4.1.0',
  'version': '4.1.0'},
 'focusAllele': 'allele.json#/1-10120-T-G',
 'focusAlleleCount': 0,
 'locusAlleleCount': 34086,
 'focusAlleleFrequency': 0.0,
 'cohort': {'id': 'ALL', 'type': 'StudyGroup', 'name': 'Overall'}}
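
The dictionary returned by `model_dump(exclude_none=True)` contains only strings, numbers, and nested dictionaries in this example, so it can be written out as JSON with the standard library. The sketch below copies the output shown above into a plain dictionary so that it runs on its own. Pydantic v2 also offers `model_dump_json`, which serializes a model in one step.

```python
import json

# Copied from the model_dump(exclude_none=True) output above.
dumped = {
    "id": "gnomad4:1-10120-T-G",
    "type": "CohortAlleleFrequencyStudyResult",
    "name": "Example Cohort Allele Frequency for 1-10120-T-G",
    "sourceDataSet": {
        "id": "gnomad4.1.0",
        "type": "DataSet",
        "name": "gnomAD v4.1.0",
        "version": "4.1.0",
    },
    "focusAllele": "allele.json#/1-10120-T-G",
    "focusAlleleCount": 0,
    "locusAlleleCount": 34086,
    "focusAlleleFrequency": 0.0,
    "cohort": {"id": "ALL", "type": "StudyGroup", "name": "Overall"},
}

# Serialize to a JSON string, then parse it back to check the round trip.
text = json.dumps(dumped, indent=2)
restored = json.loads(text)
round_trip_ok = restored == dumped
```

The parsed result can be passed back to the model constructor with dictionary unpacking, as shown earlier in this notebook.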

## AMP/ASCO/CAP 2017

## ACMG 2015

## CCV 2022