# CIViC Design Studio

#### Purpose 
Create SECOND Generation Output File For CIViC DesignStudio

#### Use
CIViC_DesignStudio will pull existing variants from the CIViC Knowledgebase, iterate through all variants, and create an output that can be used to create IDT probes

#### Inputs
1) variants of interest = from the CIViC Probe DesignStudio Interface, any variants that are selected will be evaluated for probe design.

#### Outputs
1) CIViC_DesignStudio_coordinates.tsv = for each variant selected, we will provide the chromosome, start, stop, gene, variant and pipeline


In [148]:
#!/usr/bin/env ruby

require "rubygems"
require "json"
require "net/http"
require "uri"
require 'yaml'


false

# Evaluate DNA-based Variants

In [149]:
#pull in CIViC API for DNA-based variants 
url_DNA = 'https://civicdb.org/api/panels/DNA-based/qualifying_variants?minimum_score=200'
resp_DNA = Net::HTTP.get_response(URI.parse(url_DNA))
variants_DNA = JSON.parse(resp_DNA.body)['records']

[{"id"=>306, "entrez_name"=>"ERBB2", "entrez_id"=>2064, "name"=>"AMPLIFICATION", "description"=>"Her2 (ERBB2) amplifications are seen in up to 20% of breast cancers and were associated with aggressive disease and poor prognosis when discovered. Trastuzumab first found considerable success, and was approved for treatment, in HER2 positive metastatic breast cancer (MBC) which had progressed under chemotherapy.  These metastatic cancers nonetheless progressed under trastuzumab, and other forms of therapy were studied for next line treatments, including tyrosine kinase inhibitors.  Lapatinib, a reversible inhibitor of tyrosine kinase activity in ErbB1 and ErbB2, and afatinib, an irreversible inhibitor of all 4 ErbB forms were shown to have activity in trastuzumab-progressed HER2 MBC.  Interestingly trastuzumab itself also turned out to remain effective in treatment of trastuzumab-progressed HER2 MBC.  The LUX-Breast 1 trial compared a TKI-based treatment (afatinib) to a trastuzumab-based t

In [158]:
def get_exons(gene_id, chromosome)
  server='http://grch37.rest.ensembl.org'
  path = "/overlap/id/#{gene_id}?feature=exon;expand=1;utr=0"
  print path
  url = URI.parse(server)
  http = Net::HTTP.new(url.host, url.port)
  request = Net::HTTP::Get.new(path, {'Content-Type' => 'application/json'})

  response = http.request(request)
  result = JSON.parse(response.body)
  exons = []
  for item in result
    stop = item["end"]
    start = item["start"]
    exons << [chromosome, start, stop]
  end
  
  ###NEED TO SORT EXONS AND MERGE
  
  return exons
end

:get_exons

In [159]:
#set list for DNA-based capture
capture_DNA = []

## For variants listed in the DNA-based API, create bed-like files for capture design
pipeline = 'DNA-based' #set pipeline
blank = ' ' #set blanks

#iterate through evidence items
for item in variants_DNA
  exons1 = []
  exons2 = []
  flag = false
  
  #Get Gene Information from JSON
  gene = item['entrez_name']  #Call Gene name
  variant = item['name'] #call variant
  puts variant
  chrom = item['coordinates']['chromosome'].to_i #call chrom
  start = item['coordinates']['start'].to_i #call start
  stop = item['coordinates']['stop'].to_i
  gene_ENST = item['coordinates']["representative_transcript"]
  
  #If there are 2 sets of coordiantes pull them
  if item['coordinates']['chromosome2'] and item['coordinates']['start2'] and item['coordinates']['stop2'] #determine if there is a second set of coordinates
    gene_ENST2 = item['coordinates']["representative_transcript2"]
    chrom2 = item['coordinates']['chromosome2'].to_i  # call chrom2
    start2 = item['coordinates']['start2'].to_i  # call start2
    stop2 = item['coordinates']['stop2'].to_i  # call stop2
    flag = true #set flag for coordinates; true = second set of coordinates available
  end

  #If there is a coordinates flag pull both genes and get the exons
  if flag
    flag2 = true
    exons1 = get_exons(gene_ENST, chrom)
    exons2 = get_exons(gene_ENST2, chrom2)
    probe_type = 'exon_coding'

  #Evaluate for SNVs/Indels
  elsif (start - stop).abs < 25
    probe_type = 'SNV/indel'
    region_length = (start - stop).abs 

  #Evaluate for small variants
  elsif (start - stop).abs  < 1000
    flag2 = true
    probe_type = 'small_variant'
    region_length = (start - stop).abs

  #Evaluate for Exon Coding
  elsif (start - stop).abs  > 1000
    exons1 = get_exons(gene_ENST, chrom)
    probe_type = 'exon_coding'
  end

  #Print to lists
  if exons1.any? #if there is two sets of coordinates
    for item in exons1
      start = item[1].to_i
      stop = item[2].to_i
      region_length = (start - stop).abs
      chrom = item[0]
      capture_DNA << [gene, variant, probe_type, pipeline, region_length, chrom, start, stop] #append to list
    end
  if exons2.any?
      start = item[1].to_i
      stop = item[2].to_i
      region_length = (start - stop).abs
      chrom = item[0]
      capture_DNA << [gene, variant, probe_type, pipeline, region_length, chrom, start, stop] #append to list
    end
  end

  if exons1.empty? and exons2.empty? #if there is only one set of coordinates
    capture_DNA << [gene, variant, probe_type, pipeline, region_length, chrom, start, stop] #append to list
  end
end 

for item in capture_DNA
  puts item
end

AMPLIFICATION
/overlap/id/ENST00000269571.5?feature=exon;expand=1;utr=0LOSS
/overlap/id/ENST00000371953.3?feature=exon;expand=1;utr=0ITD
MUTATION
/overlap/id/ENST00000269305.4?feature=exon;expand=1;utr=0MUTATION
/overlap/id/ENST00000256078.4?feature=exon;expand=1;utr=0V600E
R882
L858R
T790M
EXON 12 MUTATION
EXON 19 DELETION
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 151, 17, 37863243, 37863394]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 213, 17, 37864574, 37864787]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 134, 17, 37865571, 37865705]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 68, 17, 37866066, 37866134]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 115, 17, 37866339, 37866454]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 141, 17, 37866593, 37866734]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 119, 17, 37868181, 37868300]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 126, 17, 37868575, 37868701]
["E

["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 126, 17, 37868575, 37868701]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 73, 17, 37871539, 37871612]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 90, 17, 37871699, 37871789]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 199, 17, 37871993, 37872192]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 132, 17, 37872554, 37872686]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 90, 17, 37872768, 37872858]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 160, 17, 37873573, 37873733]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 47, 17, 37876040, 37876087]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 138, 17, 37879572, 37879710]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 122, 17, 37879791, 37879913]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 98, 17, 37880165, 37880263]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 185, 17, 37880979, 3788116

["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 90, 17, 37871699, 37871789]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 199, 17, 37871993, 37872192]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 132, 17, 37872554, 37872686]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 90, 17, 37872768, 37872858]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 160, 17, 37873573, 37873733]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 47, 17, 37876040, 37876087]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 138, 17, 37879572, 37879710]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 122, 17, 37879791, 37879913]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 98, 17, 37880165, 37880263]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 185, 17, 37880979, 37881164]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 155, 17, 37881302, 37881457]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 75, 17, 37881580, 3788165

["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 116, 17, 37869406, 37869522]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 6, 17, 37871539, 37871545]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 54, 17, 37872804, 37872858]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 160, 17, 37873573, 37873733]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 47, 17, 37876040, 37876087]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 138, 17, 37879572, 37879710]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 86, 17, 37879791, 37879877]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 38, 17, 37880165, 37880203]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 506, 17, 37873227, 37873733]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 47, 17, 37876040, 37876087]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 138, 17, 37879572, 37879710]
["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 122, 17, 37879791, 37879913]

["TP53", "MUTATION", "exon_coding", "DNA-based", 106, 17, 7573927, 7574033]
["TP53", "MUTATION", "exon_coding", "DNA-based", 1286, 17, 7571722, 7573008]
["TP53", "MUTATION", "exon_coding", "DNA-based", 104, 17, 7590695, 7590799]
["TP53", "MUTATION", "exon_coding", "DNA-based", 101, 17, 7579839, 7579940]
["TP53", "MUTATION", "exon_coding", "DNA-based", 21, 17, 7579700, 7579721]
["TP53", "MUTATION", "exon_coding", "DNA-based", 278, 17, 7579312, 7579590]
["TP53", "MUTATION", "exon_coding", "DNA-based", 183, 17, 7578371, 7578554]
["TP53", "MUTATION", "exon_coding", "DNA-based", 112, 17, 7578177, 7578289]
["TP53", "MUTATION", "exon_coding", "DNA-based", 109, 17, 7577499, 7577608]
["TP53", "MUTATION", "exon_coding", "DNA-based", 136, 17, 7577019, 7577155]
["TP53", "MUTATION", "exon_coding", "DNA-based", 73, 17, 7576853, 7576926]
["TP53", "MUTATION", "exon_coding", "DNA-based", 132, 17, 7576525, 7576657]
["TP53", "MUTATION", "exon_coding", "DNA-based", 106, 17, 7573927, 7574033]
["TP53", "MUT

[["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 151, 17, 37863243, 37863394], ["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 213, 17, 37864574, 37864787], ["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 134, 17, 37865571, 37865705], ["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 68, 17, 37866066, 37866134], ["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 115, 17, 37866339, 37866454], ["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 141, 17, 37866593, 37866734], ["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 119, 17, 37868181, 37868300], ["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 126, 17, 37868575, 37868701], ["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 73, 17, 37871539, 37871612], ["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 90, 17, 37871699, 37871789], ["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 199, 17, 37871993, 37872192], ["ERBB2", "AMPLIFICATION", "exon_coding", "DNA-based", 132, 17, 378

In [34]:
#pull in CIViC API for RNA-based variants 
url_RNA = 'https://civicdb.org/api/panels/RNA-based/qualifying_variants?minimum_score=100'
resp_RNA = Net::HTTP.get_response(URI.parse(url_RNA))
variants_RNA = JSON.parse(resp_RNA.body)['records']

[{"id"=>20, "entrez_name"=>"CCND1", "entrez_id"=>595, "name"=>"OVEREXPRESSION", "description"=>"Cyclin D has been shown in many cancer types to be misregulated. Well established for their oncogenic properties, the cyclins and the cyclin-dependent kinases (CDK's) they activate have been the focus of major research and development efforts over the past decade. The methods by which the cyclins are misregulated are widely variable, and range from genomic amplification to promoter methylation changes. While Cyclin D2 has only been found to be significantly misregulated in glioma, Cyclin D1 in particular seems to be a pan-cancer actor. Cyclin D misregulation has been shown to lead to poorer outcomes in a number of studies, and currently there are no FDA-approved targeted therapies.", "gene_id"=>8, "type"=>"variant", "variant_types"=>[{"id"=>183, "name"=>"N/A", "display_name"=>"N/A", "so_id"=>"N/A", "description"=>"No suitable Sequence Ontology term exists.", "url"=>"http://www.sequenceontolo

# Evaluate RNA-based Variants

[]

In [143]:
#set list for RNA-based capture
capture_RNA = []

## For variants listed in the DNA-based API, create bed-like files for capture design
pipeline = 'RNA-based' #set pipeline
blank = ' ' #set blanks

#iterate through evidence items
for item in variants_RNA
  exons1 = []
  exons2 = []
  
  #Get Gene Information from JSON
  gene = item['entrez_name']  #Call Gene name
  variant = item['name'] #call variant
  puts variant
  chrom = item['coordinates']['chromosome'].to_i #call chrom
  start = item['coordinates']['start'].to_i #call start
  stop = item['coordinates']['stop'].to_i
  gene_ENST = item['coordinates']["representative_transcript"]
  
  #If there are 2 sets of coordiantes pull them
  if item['coordinates']['chromosome2'] and item['coordinates']['start2'] and item['coordinates']['stop2'] #determine if there is a second set of coordinates
    gene_ENST2 = item['coordinates']["representative_transcript2"]
    chrom2 = item['coordinates']['chromosome2'].to_i  # call chrom2
    start2 = item['coordinates']['start2'].to_i  # call start2
    stop2 = item['coordinates']['stop2'].to_i  # call stop2
    flag = true #set flag for coordinates; true = second set of coordinates available
  end

  #If there is a coordinates flag pull both genes and get the exons
  if flag
    flag2 = true
    exons1 = get_exons(gene_ENST, chrom)
    exons2 = get_exons(gene_ENST2, chrom2)
    probe_type = 'exon_coding'

  #Evaluate for SNVs/Indels
  elsif (start - stop).abs < 25
    probe_type = 'SNV/indel'
    region_length = (start - stop).abs 

  #Evaluate for small variants
  elsif (start - stop).abs  < 1000
    flag2 = true
    probe_type = 'small_variant'
    region_length = (start - stop).abs

  #Evaluate for Exon Coding
  elsif (start - stop).abs  > 1000
    exons1 = get_exons(gene_ENST, chrom)
    probe_type = 'exon_coding'
  end

  #Print to lists
  if exons1.any? #if there is two sets of coordinates
    for item in exons1
      start = item[1].to_i
      stop = item[2].to_i
      region_length = (start - stop).abs
      chrom = item[0]
      capture_RNA << [gene, variant, probe_type, pipeline, region_length, chrom, start, stop] #append to list
    end
  if exons2.any?
      start = item[1].to_i
      stop = item[2].to_i
      region_length = (start - stop).abs
      chrom = item[0]
      capture_RNA << [gene, variant, probe_type, pipeline, region_length, chrom, start, stop] #append to list
    end
  end

  if exons1.empty? and exons2.empty? #if there is only one set of coordinates
    capture_RNA << [gene, variant, probe_type, pipeline, region_length, chrom, start, stop] #append to list
  end
end 


OVEREXPRESSION
/overlap/id/ENST00000227507.2?feature=exonPROMOTER METHYLATION
/overlap/id/ENST00000306010.7?feature=exonOVEREXPRESSION
/overlap/id/ENST00000275493.2?feature=exonp16 EXPRESSION
/overlap/id/ENST00000498124.1?feature=exonEXPRESSION
/overlap/id/ENST00000381577.3?feature=exonMUTATION
/overlap/id/ENST00000498907.2?feature=exonMUTATION
/overlap/id/ENST00000300305.3?feature=exonMUTATION
/overlap/id/ENST00000264709.3?feature=exonMUTATION
/overlap/id/ENST00000369535.4?feature=exonMUTATION
/overlap/id/ENST00000263967.3?feature=exonBCR-ABL
/overlap/id/ENST00000305877.8?feature=exon/overlap/id/ENST00000318560.5?feature=exonPML-RARA
/overlap/id/ENST00000268058.3?feature=exon/overlap/id/ENST00000254066.5?feature=exonALK FUSIONS
/overlap/id/ENST00000389048.3?feature=exon/overlap/id/ENST00000254066.5?feature=exonAMPLIFICATION
/overlap/id/ENST00000275493.2?feature=exon/overlap/id/ENST00000254066.5?feature=exonAMPLIFICATION
/overlap/id/ENST00000425967.3?feature=exon/overlap/id/ENST0000025

[{"id"=>20, "entrez_name"=>"CCND1", "entrez_id"=>595, "name"=>"OVEREXPRESSION", "description"=>"Cyclin D has been shown in many cancer types to be misregulated. Well established for their oncogenic properties, the cyclins and the cyclin-dependent kinases (CDK's) they activate have been the focus of major research and development efforts over the past decade. The methods by which the cyclins are misregulated are widely variable, and range from genomic amplification to promoter methylation changes. While Cyclin D2 has only been found to be significantly misregulated in glioma, Cyclin D1 in particular seems to be a pan-cancer actor. Cyclin D misregulation has been shown to lead to poorer outcomes in a number of studies, and currently there are no FDA-approved targeted therapies.", "gene_id"=>8, "type"=>"variant", "variant_types"=>[{"id"=>183, "name"=>"N/A", "display_name"=>"N/A", "so_id"=>"N/A", "description"=>"No suitable Sequence Ontology term exists.", "url"=>"http://www.sequenceontolo

In [230]:

def get_exons(gene_id, chromosome)
  server='http://grch37.rest.ensembl.org'
  path = "/overlap/id/#{gene_id}?feature=exon;expand=1;utr=1"
  print path
  url = URI.parse(server)
  http = Net::HTTP.new(url.host, url.port)
  request = Net::HTTP::Get.new(path, {'Content-Type' => 'application/json'})

  response = http.request(request)
  result = JSON.parse(response.body)
  intervals = []
  for item in result
    stop = item["end"].to_i
    start = item["start"].to_i
    intervals << [start, stop]
  end
  
  #Merge overlapping intervals
  intervals.sort! { |x, y|
      x[0] <=> y[0]
    }
  merged_intervals = []
  temp_interval = []
  
  intervals.each_with_index do |interval, i|
    if i == 0
      temp_interval = interval
      puts i
    elsif interval[0] <= temp_interval[1] && interval[1] >= temp_interval[1]
      temp_interval[1] = interval[1]
    elsif interval[0] > temp_interval[1]
      merged_intervals << temp_interval
      temp_interval = interval
    end
  return merged_intervals
  end
end
#   merged_intervals = []

    
#     if i == 0
#       temp_interval = interval
#     elsif interval[0] <= temp_interval[1] && interval[1] > temp_interval[1]
#       temp_interval[1] = interval[1]
#     elsif interval[0] > temp_interval[1]
#       merged_intervals << temp_interval
#       temp_interval = interval
#     end
#   merged_intervals << temp_interval
#   return merged_intervals
# end

tp53 = get_exons("ENST00000269305", 17)

/overlap/id/ENST00000269305?feature=exon;expand=1;utr=10
0


[]

In [226]:
puts tp53

[]
