# Opioid plexus data for association analysis

We need to complete several tasks for the HiC project. Ideally these would be done during January, as soon as the different opioid GWAS have been run.

The attached file contains information on 8 high-priority genes to look up in the opioid GWAS. In particular, the second table in each file gives us a set of regions (positionally defined in hg19) for which we need to:

1. Identify variants.
2. Look up those variants for association with opioid addiction phenotypes.
3. The first dataset for look-up is MVP (EA, AA, and meta); it is probably best to use the largest results set (including SAGE and YP).
4. The second dataset is the PGC cases versus unexposed controls that Ravi is using in the gSEM.
5. The third results set will be the NGC, once it is complete.
6. The fourth will be the gSEM results (EA, then AA, meta).

It would be great if one of you could identify the variant sets from these regions and either do the look-up or pass the sets to Ravi or Nathan, who have the result files. We want to assign a variable to each of the SNPs that indicates whether it comes from a loss VEL or a gain VEL region. This is indicated by asterisks: regions with asterisks are loss VELs, and regions without are gain VELs.
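
As a minimal sketch of the asterisk convention, assuming each region row carries a label in which an asterisk marks a loss VEL. The toy rows and the helper names `vel_status` and `annotate` are hypothetical and are not taken from the attached files:

```python
def vel_status(label):
    """Return "Loss" if the region label carries an asterisk, else "Gain"."""
    return "Loss" if "*" in label else "Gain"


def annotate(position, regions):
    """Return the VEL status of the first region containing position, or None."""
    for _chrom, start, end, label in regions:
        if start <= position <= end:
            return vel_status(label)
    return None


# Hypothetical region rows: (chromosome, start, end, label)
toy_regions = [
    ("chr5", 100, 200, "VEL1*"),  # asterisk -> loss VEL
    ("chr5", 300, 400, "VEL2"),   # no asterisk -> gain VEL
]

print(annotate(150, toy_regions))  # inside the first region
print(annotate(350, toy_regions))  # inside the second region
print(annotate(250, toy_regions))  # outside both regions
```

Positions outside every region get no label, so they can simply be left out of the variant set.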


# Identify variants

In [None]:
mkdir -p /shared/rti-hic/opioid_plexus_data_for_association_analysis/identify_variants/grch37_variants 
cd /shared/rti-hic/opioid_plexus_data_for_association_analysis/identify_variants/grch37_variants

# download chromosomes containing gene regions of interest
aws s3 cp s3://rti-common/variants/b153/GRCh37.p13/variants_chr5.tsv.gz .
aws s3 cp s3://rti-common/variants/b153/GRCh37.p13/variants_chr7.tsv.gz .
aws s3 cp s3://rti-common/variants/b153/GRCh37.p13/variants_chr8.tsv.gz .
aws s3 cp s3://rti-common/variants/b153/GRCh37.p13/variants_chr9.tsv.gz .
aws s3 cp s3://rti-common/variants/b153/GRCh37.p13/variants_chr10.tsv.gz .
aws s3 cp s3://rti-common/variants/b153/GRCh37.p13/variants_chr11.tsv.gz .
    
# upload the file containing the 8 high-priority genes (plx_opioid.zip) from the local machine

In [None]:
cd /shared/rti-hic/opioid_plexus_data_for_association_analysis/identify_variants/opioid_plexus_data/

unzip plx_opioid.zip



In [None]:
### python3

"""
Identify variants within a set of GRCh37 positional ranges.
More specifically, we import a list of positional ranges,
identify the variants within these ranges, and indicate whether
each variant falls in a Loss or a Gain VEL, as given in the imported list
along with the positional ranges.
"""

import gzip
import os

BASE_DIR = "/shared/rti-hic/opioid_plexus_data_for_association_analysis/identify_variants"

def main():

    gene_list = ["ADCY2", "ANKS6", "ASTN2", "DUSP4", "GABBR2", "KCNC1", "KCNMA1", "WBSCR17"]

    # Make sure the output directory exists before writing any results
    os.makedirs(os.path.join(BASE_DIR, "results"), exist_ok=True)

    for gene in gene_list:
        infile = "{}/opioid_plexus_data/plx_{}.txt".format(BASE_DIR, gene)
        outfile = "{}/results/plx_{}_variants_identified.txt".format(BASE_DIR, gene)

        vel_dict, chrom = create_vel_dict(infile)

        # chrom is taken from the region file (e.g. "chr5") and selects the matching variant file
        varfile = "{}/grch37_variants/variants_{}.tsv.gz".format(BASE_DIR, chrom)
        parse_variant_file(varfile, outfile, vel_dict)



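# Populate a dictionary that holds the range for each VEL
# and whether it is a Loss or a Gain: {lower_bound: [upper_bound, vel_status]}
def create_vel_dict(infile):

    with open(infile) as inF:
        line = inF.readline()

        # Skip ahead to the first line that starts with "chr";
        # stop at end of file instead of looping forever
        while line and line[0:3] != "chr":
            line = inF.readline()

        if not line:
            raise ValueError("No region lines starting with 'chr' found in {}".format(infile))

        chrom = line.split()[0]
        range_dict = {}

        # Read the contiguous block of region lines
        while line[0:3] == "chr":
            fields = line.split()
            lower_bound = fields[1]
            upper_bound = fields[2]
            vel = fields[3]

            # Regions marked with an asterisk are Loss VELs; all others are Gain VELs
            vel_status = "Loss" if "*" in vel else "Gain"

            range_dict[lower_bound] = [upper_bound, vel_status]
            line = inF.readline()

    return range_dict, chrom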



# Helper function that returns the Gain or Loss status of a variant if it is
# within one of the ranges of interest, else returns None
def within_range(test_dic, number):
    position = int(number)
    for lower_bound, (upper_bound, vel_status) in test_dic.items():
        if int(lower_bound) <= position <= int(upper_bound):
            return vel_status
    return None

# Parse each line of the chromosome-specific variant file to determine
# which variants are within the positional ranges of interest.
# Print out these lines to file along with the VEL Gain or Loss status.
def parse_variant_file(varfile, outfile, vel_dict):
    with gzip.open(varfile, 'rt') as varF, open(outfile, 'w') as outF:
        # Copy the header and add a column for the VEL status
        head = varF.readline().strip()
        head += "\tVEL\n"
        outF.write(head)
        for line in varF:
            sl = line.split()
            pos = sl[2]  # the position is read from the third column
            vel = within_range(vel_dict, pos)
            if vel is not None:
                sl.append(vel)
                outline = "\t".join(sl) + "\n"
                outF.write(outline)


####################################################################################################
if __name__ == "__main__":
    main()
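
Once the per-gene variant files exist, the look-up step could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `lookup_variants`, the ID column names (`id`, `SNP`), the `P` column, and the toy rows are all hypothetical, since the actual layouts of the variant files and of the GWAS result files are not described here. Only the `VEL` column name comes from the script above.

```python
import csv
import io


def lookup_variants(variant_rows, sumstat_rows, variant_id_col, sumstat_id_col):
    """Return the summary-statistic rows whose ID matches an identified
    variant, with that variant's VEL status attached."""
    vel_by_id = {row[variant_id_col]: row["VEL"] for row in variant_rows}
    hits = []
    for row in sumstat_rows:
        vel = vel_by_id.get(row[sumstat_id_col])
        if vel is not None:
            hit = dict(row)
            hit["VEL"] = vel
            hits.append(hit)
    return hits


# Toy stand-ins for an identified-variants file and a GWAS results file
# (column names other than VEL are assumptions)
variants_tsv = "id\tchr\tpos\tVEL\nrs1\t5\t150\tLoss\nrs2\t5\t350\tGain\n"
sumstats_tsv = "SNP\tP\nrs1\t0.01\nrs3\t0.5\nrs2\t0.2\n"

variant_rows = list(csv.DictReader(io.StringIO(variants_tsv), delimiter="\t"))
sumstat_rows = list(csv.DictReader(io.StringIO(sumstats_tsv), delimiter="\t"))

hits = lookup_variants(variant_rows, sumstat_rows, "id", "SNP")
for hit in hits:
    print(hit["SNP"], hit["P"], hit["VEL"])
```

Matching on a variant ID is the simplest case. If a results file lacks such IDs, the join would instead need chromosome and position (and possibly alleles) on the same genome build as the hg19 regions.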
