# Identify Y Chromosome Haplogroups Using Y-LineageTracker
- **Author(s)** - Frank Grenn
- **Date Started** - May 2021
- **Quick Description:** Code to run Y-LineageTracker and quickly compare the haplogroup calls for the same samples across different reference genome builds (hg19 and hg38).

In [None]:
import pandas as pd

In [None]:
WRKDIR = ""
OUTDIR = "$PATH/output_ltracker"
VCFDIR = ""

Download Y-LineageTracker from [here](https://www.picb.ac.cn/PGG/resource.php) or [here](https://codeocean.com/capsule/7424381/tree/v2).  
Note: if you are running it for the first time, you may need to delete ```...``` in the Python code to avoid ASCII errors.

## Run

On the hg19 VCF:

```python $PATH/Y-LineageTracker/LineageTracker/RunLineagerTracker.py classify --vcf $PATH/chrY_male_hemizygous_only_het_filter_hg19_final.vcf --build 37 --output $PATH/output_ltracker/ltrack_hg19```

On the hg38 VCF (with more variants):  

```python $PATH/RunLineagerTracker.py classify --vcf $PATH/chrY_male_hemizygous_only_het_filter_hg38.vcf --build 38 --output $PATH/output_ltracker/ltrack_hg38```
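
The two commands above differ only in the input VCF, the build number, and the output prefix, so they can be assembled from one helper. The following is a minimal sketch, not part of Y-LineageTracker itself: `build_classify_cmd` is a hypothetical helper, `BASE` is a placeholder directory, and the only flags used are the ones shown in the commands above.

```python
import shlex
from pathlib import Path


def build_classify_cmd(script, vcf, build, output_prefix):
    """Assemble the classify command shown above as an argument list."""
    return [
        "python", str(script), "classify",
        "--vcf", str(vcf),
        "--build", str(build),
        "--output", str(output_prefix),
    ]


# Hypothetical placeholder directory; substitute the real location before running.
BASE = Path("path/to")
SCRIPT = BASE / "Y-LineageTracker" / "LineageTracker" / "RunLineagerTracker.py"

# build number -> (input VCF, output prefix), mirroring the two commands above
runs = {
    37: (BASE / "chrY_male_hemizygous_only_het_filter_hg19_final.vcf",
         BASE / "output_ltracker" / "ltrack_hg19"),
    38: (BASE / "chrY_male_hemizygous_only_het_filter_hg38.vcf",
         BASE / "output_ltracker" / "ltrack_hg38"),
}

commands = {
    build: build_classify_cmd(SCRIPT, vcf, build, out)
    for build, (vcf, out) in runs.items()
}

for cmd in commands.values():
    # Print the command for inspection; pass `cmd` to subprocess.run(cmd, check=True)
    # to actually launch it.
    print(shlex.join(cmd))
```

Keeping the command as a list avoids shell quoting problems if a path contains spaces.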

## Read Output:

In [None]:
hg19_out = pd.read_table(f"{OUTDIR}/ltrack_hg19.lineageresult.txt")
print(hg19_out.shape)
print(hg19_out.head())

In [None]:
# Count hg19 samples whose Haplogroup equals their KeyHaplogroup
hg19_out.loc[hg19_out.Haplogroup == hg19_out.KeyHaplogroup].shape

In [None]:
hg38_out = pd.read_table(f"{OUTDIR}/ltrack_hg38.lineageresult.txt")
print(hg38_out.shape)
print(hg38_out.head())

In [None]:
# Count hg38 samples whose Haplogroup equals their KeyHaplogroup
hg38_out.loc[hg38_out.Haplogroup == hg38_out.KeyHaplogroup].shape

In [None]:
# Inner merge on SampleID and Haplogroup: keeps only the samples that were
# assigned the same Haplogroup in both the hg19 and hg38 runs
merged = pd.merge(left=hg19_out, right=hg38_out, on=['SampleID', 'Haplogroup'], how='inner')
print(merged.shape)
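
The inner merge above only reports how many rows agree. A possible alternative, sketched below on made-up data, is an outer merge on `SampleID` alone, which keeps every sample and places the two calls side by side. The toy sample IDs and haplogroup labels are invented for illustration, and the column suffixes `_hg19` and `_hg38` are a naming choice, not something Y-LineageTracker produces.

```python
import pandas as pd

# Toy stand-ins for hg19_out and hg38_out (invented sample IDs and labels).
hg19_toy = pd.DataFrame({"SampleID": ["S1", "S2", "S3"],
                         "Haplogroup": ["R1b", "I1", "J2"]})
hg38_toy = pd.DataFrame({"SampleID": ["S1", "S2", "S3"],
                         "Haplogroup": ["R1b", "I1", "J2a"]})

# Outer merge keeps every sample; indicator=True adds a `_merge` column saying
# whether a sample was found in the left table, the right table, or both.
both = pd.merge(hg19_toy, hg38_toy, on="SampleID", how="outer",
                suffixes=("_hg19", "_hg38"), indicator=True)

# True where the two builds give exactly the same Haplogroup string.
both["same_call"] = both["Haplogroup_hg19"] == both["Haplogroup_hg38"]

n_shared = int((both["_merge"] == "both").sum())
n_same = int(both["same_call"].sum())
print(f"{n_same} of {n_shared} shared samples have identical Haplogroup calls")
print(both)
```

With the real tables, `hg19_toy` and `hg38_toy` would be replaced by `hg19_out` and `hg38_out`.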

In [None]:
merged_samples = list(set(merged.SampleID))
print(len(merged_samples))

In [None]:
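# Samples in the hg38 output that have no matching Haplogroup in the hg19 output.
# merged_samples is a subset of the hg38 samples, so the symmetric difference (^)
# equals a plain set difference here; samples present only in hg19 are not listed.
differing_samples = list(set(hg38_out.SampleID) ^ set(merged_samples))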
print(differing_samples)

In [None]:
for s in differing_samples:
    print("")
    print(s)
    !grep {s} $PATH/output_ltracker/ltrack_hg38.lineageresult.txt
    !grep {s} $PATH/output_ltracker/ltrack_hg19.lineageresult.txt
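
The `grep` loop above matches substrings, so a sample ID that is a prefix of another ID would return extra lines. A pandas-only lookup with exact matching on `SampleID` is one way around that. The sketch below uses a hypothetical helper, `rows_for_sample`, and invented toy tables containing only the three columns referenced in this notebook.

```python
import pandas as pd


def rows_for_sample(sample_id, tables):
    """Stack the rows for one sample from each build's table, labelled by build."""
    parts = [
        df.loc[df["SampleID"] == sample_id].assign(build=name)
        for name, df in tables.items()
    ]
    return pd.concat(parts, ignore_index=True)


# Toy stand-ins for the real result tables (invented IDs and labels).
toy_tables = {
    "hg38": pd.DataFrame({"SampleID": ["S3", "S30"],
                          "Haplogroup": ["J2a", "R1b"],
                          "KeyHaplogroup": ["J2a", "R1b"]}),
    "hg19": pd.DataFrame({"SampleID": ["S3", "S30"],
                          "Haplogroup": ["J2", "R1b"],
                          "KeyHaplogroup": ["J2", "R1b"]}),
}

for s in ["S3"]:
    print("")
    print(s)
    # Exact match: "S30" is not returned when looking up "S3".
    print(rows_for_sample(s, toy_tables))
```

With the real data, the dictionary would be `{"hg38": hg38_out, "hg19": hg19_out}` and the loop would run over `differing_samples`.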

In [None]:
hg19_out[hg19_out.KeyHaplogroup=="."]

In [None]:
# Stricter merge: require SampleID, Haplogroup and KeyHaplogroup to all match
mergedboth = pd.merge(left=hg19_out, right=hg38_out, on=['SampleID', 'Haplogroup', 'KeyHaplogroup'], how='inner')
print(mergedboth.shape)

In [None]:
# Merge on SampleID and KeyHaplogroup only, ignoring the full Haplogroup
mergedkey = pd.merge(left=hg19_out, right=hg38_out, on=['SampleID', 'KeyHaplogroup'], how='inner')
print(mergedkey.shape)

In [None]:
print(len(set(hg19_out.SampleID)))
print(len(set(hg38_out.SampleID)))

In [None]:
merged