# Genomic diagnosis of rare disease
***

## **Step 1.** We start this session by selecting one individual from the CLM population of 1000GP. For our rare-disease case example, we will focus on chromosome X.

### 1. Our VCF file:
+ `ls data/ALL.chrX*`

```
data/ALL.chrX.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf
```

### 2. We will use plink's _--keep_ flag to extract one CLM individual (1000GP code = HG01459). First, we create a simple text file with the family ID and individual ID on one line:
+ `cat data/randomCLMIndividual.txt`

```
HG01459 HG01459
```
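
If the file does not exist yet, it can be written from the shell. The lines below are a minimal sketch, not necessarily how the original file was made; the `sample_id` variable is introduced here only for illustration, and `mkdir -p` is added so the snippet runs from an empty directory.

```shell
# Sketch: write "<family ID> <individual ID>" on a single line.
# Both IDs are the 1000GP sample code, matching the file shown above.
mkdir -p data
sample_id="HG01459"   # illustrative variable name, not from the tutorial
printf '%s %s\n' "$sample_id" "$sample_id" > data/randomCLMIndividual.txt
cat data/randomCLMIndividual.txt
```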

 
### 3. Using plink and --keep
+ `plink --vcf data/ALL.chrX.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf --keep data/randomCLMIndividual.txt --make-bed --out data/randomCLMIndividual.chrX`

```
PLINK v1.90b6.21 64-bit (19 Oct 2020)          www.cog-genomics.org/plink/1.9/
(C) 2005-2020 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to data/randomCLMIndividual.chrX.log.
Options in effect:
  --keep data/randomCLMIndividual.txt
  --make-bed
  --out data/randomCLMIndividual.chrX
  --vcf data/ALL.chrX.shapeit2_integrated_snvindels_v2a_27022019.GRCh38.phased.vcf

511706 MB RAM detected; reserving 255853 MB for main workspace.
--vcf: data/randomCLMIndividual.chrX-temporary.bed +
data/randomCLMIndividual.chrX-temporary.bim +
data/randomCLMIndividual.chrX-temporary.fam written.
106963 variants loaded from .bim file.
2548 people (0 males, 0 females, 2548 ambiguous) loaded from .fam.
Ambiguous sex IDs written to data/randomCLMIndividual.chrX.nosex .
--keep: 1 person remaining.
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 1 founder and 0 nonfounders present.
Calculating allele frequencies...  done.
106963 variants and 1 person pass filters and QC.
Note: No phenotypes present.
--make-bed to data/randomCLMIndividual.chrX.bed +
data/randomCLMIndividual.chrX.bim + data/randomCLMIndividual.chrX.fam ... .
```

### 4. We now have a BED file (and other related files) for the selected CLM individual.

+ `ls data/randomCLMIndividual*`

```
data/randomCLMIndividual.chrX.bed  data/randomCLMIndividual.chrX.log
data/randomCLMIndividual.chrX.bim  data/randomCLMIndividual.chrX.nosex
data/randomCLMIndividual.chrX.fam  data/randomCLMIndividual.txt
```

+ `cat data/randomCLMIndividual.chrX.fam`

```
HG01459 HG01459 0 0 0 -9
```

### 5. Creating a VCF for our randomly selected CLM individual.

+ `plink --bfile data/randomCLMIndividual.chrX --recode vcf --out data/randomCLMIndividual.chrX`

```
PLINK v1.90b6.21 64-bit (19 Oct 2020)          www.cog-genomics.org/plink/1.9/
(C) 2005-2020 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to data/randomCLMIndividual.chrX.log.
Options in effect:
  --bfile data/randomCLMIndividual.chrX
  --out data/randomCLMIndividual.chrX
  --recode vcf

511706 MB RAM detected; reserving 255853 MB for main workspace.
106963 variants loaded from .bim file.
1 person (0 males, 0 females, 1 ambiguous) loaded from .fam.
Ambiguous sex ID written to data/randomCLMIndividual.chrX.nosex .
Using 1 thread (no multithreaded calculations invoked).
Before main variant filters, 1 founder and 0 nonfounders present.
Calculating allele frequencies...  done.
106963 variants and 1 person pass filters and QC.
Note: No phenotypes present.
--recode vcf to data/randomCLMIndividual.chrX.vcf ...

```
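
As an optional sanity check, we can tally the genotypes in the new VCF. Because the file holds a single sample, its genotype sits in the tenth tab-separated column. The helper name `genotype_counts` is hypothetical and the function is only a sketch of one way to do this with `awk`.

```shell
# Hypothetical helper: genotype_counts FILE
# Prints each genotype string found in column 10 and how often it occurs.
# Assumes a single-sample VCF, so column 10 is the only genotype column.
genotype_counts() {
  awk -F'\t' '
    /^#/ { next }                    # skip header lines
    { n[$10]++ }                     # count each genotype string
    END { for (g in n) print g "\t" n[g] }
  ' "$1" | sort
}
```

For example, `genotype_counts data/randomCLMIndividual.chrX.vcf` lists one line per genotype with its count.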
***
___

## **Step 2.** We will now manually engineer a variant in the VCF file.

### 1. Let's take a quick look at our VCF file.

+ `head -n 97890 data/randomCLMIndividual.chrX.vcf | tail -n 15`

```
23	2781309	.	A	T	.	.	PR	GT	0/0
23	2781317	.	T	G	.	.	PR	GT	0/0
23	2781319	.	T	C	.	.	PR	GT	0/0
23	2781409	.	A	C	.	.	PR	GT	0/0
23	2781423	.	G	A	.	.	PR	GT	0/0
23	2781454	.	CTTAG	C	.	.	PR	GT	0/0
23	2781457	.	A	G	.	.	PR	GT	0/0
23	155703812	.	G	A	.	.	PR	GT	0/0
23	155703847	.	C	A	.	.	PR	GT	0/0
23	155703850	.	A	C	.	.	PR	GT	0/0
23	155703853	.	T	G	.	.	PR	GT	0/0
23	155703950	.	A	T	.	.	PR	GT	0/0
23	155703951	.	G	C	.	.	PR	GT	0/0
23	155703976	.	C	T	.	.	PR	GT	0/0
23	155704002	.	C	T	.	.	PR	GT	0/0
```

### 2. Let's insert the following entry into the VCF, keeping the fields tab-separated and the records sorted by position.

```
23	154863125	.	G	A	.	.	PR	GT	1/1
```
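
The entry can be added in a text editor. For a scripted alternative, the sketch below uses `awk` to place the record before the first line with a larger position. The function name `insert_variant` is hypothetical, and the approach assumes a single-chromosome VCF that is already sorted by position, as ours is.

```shell
# Hypothetical helper (not part of the original tutorial): insert_variant IN OUT
# Copies IN to OUT and adds the engineered record just before the first
# data line with a larger position.
# Assumes a single-chromosome VCF that is already sorted by position.
insert_variant() {
  # Build the tab-separated record explicitly so no spaces slip in.
  rec=$(printf '23\t154863125\t.\tG\tA\t.\t.\tPR\tGT\t1/1')
  awk -F'\t' -v pos=154863125 -v rec="$rec" '
    /^#/ { print; next }                                     # header lines pass through
    !inserted && ($2 + 0) > pos { print rec; inserted = 1 }  # first larger position
    { print }
    END { if (!inserted) print rec }                         # otherwise append at the end
  ' "$1" > "$2"
}
```

A possible call is `insert_variant data/randomCLMIndividual.chrX.vcf data/engineered.vcf`, where `data/engineered.vcf` is a made-up output name. After checking the result, it can be moved over the original file so that the later commands work unchanged.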

### 3. Let's look at the engineered file.

+ `head -n 97890 data/randomCLMIndividual.chrX.vcf | tail -n 15`

```
23	2781309	.	A	T	.	.	PR	GT	0/0
23	2781317	.	T	G	.	.	PR	GT	0/0
23	2781319	.	T	C	.	.	PR	GT	0/0
23	2781409	.	A	C	.	.	PR	GT	0/0
23	2781423	.	G	A	.	.	PR	GT	0/0
23	2781454	.	CTTAG	C	.	.	PR	GT	0/0
23	2781457	.	A	G	.	.	PR	GT	0/0
```
> `23	154863125	.	G	A	.	.	PR	GT	1/1`
```
23	155703812	.	G	A	.	.	PR	GT	0/0
23	155703847	.	C	A	.	.	PR	GT	0/0
23	155703850	.	A	C	.	.	PR	GT	0/0
23	155703853	.	T	G	.	.	PR	GT	0/0
23	155703950	.	A	T	.	.	PR	GT	0/0
23	155703951	.	G	C	.	.	PR	GT	0/0
23	155703976	.	C	T	.	.	PR	GT	0/0
```
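
Beyond eyeballing the output, a small check can confirm that the engineered record is present and that positions are still in order. This is a sketch only: `check_vcf` is a hypothetical name, and the check assumes a single-chromosome VCF.

```shell
# Hypothetical helper: check_vcf FILE
# Prints the engineered record if present and returns success only when
# the record was found and positions never decrease.
# Assumes a single-chromosome VCF.
check_vcf() {
  awk -F'\t' '
    /^#/ { next }                          # skip header lines
    $2 == 154863125 { print; found = 1 }   # the engineered position
    ($2 + 0) < prev { unsorted = 1 }       # a position went backwards
    { prev = $2 + 0 }
    END { exit !(found && !unsorted) }
  ' "$1"
}
```

Running `check_vcf data/randomCLMIndividual.chrX.vcf && echo OK` prints the inserted line followed by `OK` when both conditions hold.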
***
___

## **Step 3.** We will now process this file with the Ensembl Variant Effect Predictor (VEP) tool.