## Some initial setup steps


### Step 1: Set up the analytics environment

We first need to do some "housekeeping" so that our environment can make requests over the Web and plot the results.


In [None]:
require 'daru/view'
require 'rest-client'
require 'sparql'
require 'rdf/rdfxml'
require 'erb'

Daru::View.plotting_library = :googlecharts

puts  "thanks!  Go to the next box now :-)"

<pre>


</pre>
## Call each registry's "Phenotype Count" service interface

We need to call the URL of the "phenotype frequencies" service for each registry. We will then do a bit of processing on the output to put it into a more useful structure for our analytics.

If you want to join this demo yourself, add the abbreviation of your registry, and the URL to your data service, into the "registries" variable in the following code box. Everything else is done for you automatically!
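
To make the "bit of processing" concrete, here is a minimal sketch of turning a CSV response into a lookup table, without touching the network. The sample text, its header line, and the helper name `parse_frequencies` are assumptions made up for this illustration; a real service may lay out its response differently.

```ruby
# Hypothetical helper: turn a CSV body into { phenotype_url => count }.
# Assumes one header line followed by "url,frequency" rows.
def parse_frequencies(body)
  body.each_line.drop(1).each_with_object({}) do |line, counts|
    phenourl, frequency = line.strip.split(',')
    next if phenourl.nil? || frequency.nil?   # ignore blank or malformed lines
    counts[phenourl] = frequency.to_i
  end
end

# Made-up response body (the counts are not real registry data)
sample_body = <<~CSV
  phenotype,frequency
  http://purl.obolibrary.org/obo/HP_0001250,12
  http://purl.obolibrary.org/obo/HP_0003198,7
CSV

demo_hash = { 'DEMO' => parse_frequencies(sample_body) }
demo_hash['DEMO'].each { |url, count| puts "#{url}: #{count}" }
```

Unlike the notebook cell, this sketch converts the counts to integers straight away; the notebook keeps them as strings and converts them later with `to_i`.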


In [None]:
registries = {
  'DPP' => 'https://www.fairdata.services/proxy/grlc/phenotype-frequencies/phenotype-frequencies',
  'ENMD' => 'https://zks-docker.ukl.uni-freiburg.de/grlc-euronmd/api-local/phenotype-frequencies',
  #    ADD YOUR REGISTRY HERE!!!  :-)
}

phenotype_hash = Hash.new

registries.each do |registry, url|
  begin
      csv = RestClient.get(url)
  rescue
      puts "Registry #{registry} failed to respond successfully"
      registries.delete(registry)
      next
  end
  
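  # Make sure every responding registry has an entry, even if it returned no rows
  phenotype_hash[registry] = Hash.new
  # Drop the first two whitespace-separated tokens, then read the "url,frequency" rows
  (csv.body.split[2..] || []).each do |tmp|
      phenourl, frequency = tmp.split(',')
      phenotype_hash[registry][phenourl] = frequency
  end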
end

puts "data loaded successfully "
#puts"#{phenotype_hash}"


<pre>


</pre>
## Calculate the total number of phenotypic observations in each registry

To compare the scale of the participating registries, we add up the phenotype counts reported by each one.


In [None]:
# Calculate the total number of phenotype observations

registries.each do |registry, url|
  puts "#{registry} total number of phenotypic observations: #{phenotype_hash[registry].values.map(&:to_i).sum}"
end
puts ""

<pre>


</pre>
## Find the common phenotypes for both registries
Next, we will compare the phenotypes themselves, to check which of them are present in both DPP and Euro-NMD.
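
As a rough sketch of what "present in every registry" means in code, the example below intersects the phenotype URLs of three made-up registries. The registry names, the `example.org` URLs, and the helper `shared_keys` are all hypothetical; only the shape of the data mirrors `phenotype_hash`.

```ruby
# Hypothetical helper: the keys that appear in every inner hash.
def shared_keys(nested)
  # Array#& keeps only the elements found in both arrays;
  # reduce applies it across all registries in turn.
  nested.values.map(&:keys).reduce(:&) || []
end

# Made-up data in the same shape as phenotype_hash
sample = {
  'A' => { 'http://example.org/HP_1' => '4', 'http://example.org/HP_2' => '9' },
  'B' => { 'http://example.org/HP_2' => '3', 'http://example.org/HP_3' => '1' },
  'C' => { 'http://example.org/HP_2' => '5', 'http://example.org/HP_1' => '2' }
}

shared_keys(sample).each { |url| puts url }
```

Only `http://example.org/HP_2` is listed, because it is the one URL that all three sample registries report.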

At the same time, we're going to take advantage of FAIR, and reach out over the Web (via Ontobee) to ontologies such as the Human Phenotype Ontology to ask for the phenotype terms associated with these URLs.
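
The lookup depends on working out which ontology a phenotype URL belongs to, and then building the address to ask for its description. Here is a minimal, offline sketch of that step. The helper names `ontology_for` and `ontobee_url` are invented for this illustration, and URL-encoding the IRI is an assumption on my part (the notebook passes it unencoded).

```ruby
require 'erb'

# Hypothetical helper: pick the ontology abbreviation from the URL, or nil if unknown.
def ontology_for(pheno_url)
  prefixes = %w[NCIT HP UBERON MP SYMP]
  prefixes.find { |prefix| pheno_url.include?("/#{prefix}_") }
end

# Hypothetical helper: build the Ontobee address used to fetch the term's description.
def ontobee_url(pheno_url)
  onto = ontology_for(pheno_url)
  return nil unless onto   # unknown ontology: nothing to look up
  "https://ontobee.org/ontology/#{onto}?iri=#{ERB::Util.url_encode(pheno_url)}"
end

puts ontobee_url('http://purl.obolibrary.org/obo/HP_0001250')
puts ontology_for('http://example.org/unknown/XYZ_1').inspect
```

Returning `nil` for an unrecognised URL lets the caller decide whether to skip the phenotype or fall back to showing the bare URL.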


In [None]:
puts "Common phenotypes"

# do an intersection over all registries
# (reduce(:&) keeps an empty intersection empty, instead of restarting from the next registry's keys)
common_phenotypes = registries.keys.map { |registry| phenotype_hash[registry].keys }.reduce(:&) || []


# Go to the Web to get more FAIR data about each pheno code
phenolookup = {}
common_phenotypes.each do |pheno|
  #coded = ERB::Util.url_encode pheno
  case
    when pheno.match(/\/NCIT_/)
      onto = "NCIT"    # National Cancer Institute Thesaurus
    when pheno.match(/\/HP_/)
      onto = "HP"      # The Human Phenotype Ontology
    when pheno.match(/\/UBERON_/)
      onto = "UBERON"   # Uberon, the Uber-anatomy Ontology
    when pheno.match(/\/MP_/)
      onto = "MP"       # The Mammalian Phenotype Ontology
    when pheno.match(/\/SYMP_/)
      onto = "SYMP"      # The Symptom Ontology
    else
      # Unknown ontology: keep the URL as its own label and move on
      puts "Unrecognised ontology for #{pheno}; using the URL as its label"
      phenolookup[pheno] = pheno
      next
  end
  g = RDF::Graph.load("https://ontobee.org/ontology/#{onto}?iri=#{pheno}")
  
  
  # Query the FAIR data using the SPARQL query language (the same language we used to query DPP and EURONMD!)
  res = SPARQL.execute("SELECT ?label where {<#{pheno}> <http://www.w3.org/2000/01/rdf-schema#label> ?label}", g)
  # Fall back to the URL itself if the ontology returned no label
  label = res.first ? res.first['label'] : pheno


  # print output to screen
  puts "URL: #{pheno}  Term: #{label}"
  phenolookup[pheno] = label
end
puts ""

<pre>


</pre>
## Show the frequencies for the shared phenotypes, as well as their relative frequencies
Because the two registries differ considerably in their overall number of patients, we will also calculate relative frequencies (each phenotype's count divided by the registry's total number of phenotypic observations) to get a fairer comparison between them:
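
As a small worked example of that calculation, here is a sketch using made-up counts. The phenotype names and numbers are hypothetical, and `relative_frequencies` is a helper invented for this illustration.

```ruby
# Hypothetical helper: divide each count by the total of all counts, rounded to 3 places.
def relative_frequencies(counts)
  total = counts.values.map(&:to_i).sum
  return {} if total.zero?   # avoid dividing by zero for an empty registry
  counts.transform_values { |count| (count.to_i.to_f / total).round(3) }
end

# Made-up counts, stored as strings just like the values in phenotype_hash
sample = { 'HP_A' => '30', 'HP_B' => '10', 'HP_C' => '60' }

relative_frequencies(sample).each do |pheno, rel_freq|
  puts "Phenotype: #{pheno}; Relative frequency: #{rel_freq}"
end
```

With a total of 100 observations, the three sample phenotypes come out at 0.3, 0.1 and 0.6.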

In [None]:

common_freqs_hash = Hash.new
rel_freqs_hash = Hash.new

puts "Common phenotypes:"

registries.each do |registry, url|
  # Total observations in this registry (the denominator for the relative frequencies)
  total = phenotype_hash[registry].values.map(&:to_i).sum
  common_freqs_hash[registry] = {}
  rel_freqs_hash[registry] = {}

  common_phenotypes.each do |pheno|
      freq = phenotype_hash[registry][pheno].to_i
      rel_freq = (freq.to_f / total).round(3)
      puts "Registry: #{registry}; Phenotype: #{pheno};  Frequency: #{freq}; Relative frequency: #{rel_freq}"

      # Capture these calculations so that we can graph them
      common_freqs_hash[registry][pheno] = freq
      rel_freqs_hash[registry][pheno] = rel_freq
  end
end
puts ""

<pre>


</pre>
## Analytics
Here is a simple plot of the frequencies of the shared phenotypes.
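
Before plotting, the data has to be arranged as one `[label, value]` row per registry and phenotype. The sketch below shows that shape using plain Ruby and made-up values; `chart_rows` and all of the sample data are hypothetical, and no plotting library is involved.

```ruby
# Hypothetical helper: one [label, value] row per phenotype and registry.
def chart_rows(phenotypes, registry_names, freqs, labels)
  phenotypes.flat_map do |pheno|
    registry_names.map do |registry|
      # Use the human-readable term when we have one, otherwise the URL itself
      ["#{registry} #{labels.fetch(pheno, pheno)}", freqs[registry][pheno]]
    end
  end
end

# Made-up inputs in the same shape as the notebook's variables
phenotypes = ['http://example.org/HP_2']
labels     = { 'http://example.org/HP_2' => 'Example term' }
freqs      = { 'A' => { 'http://example.org/HP_2' => 9 },
               'B' => { 'http://example.org/HP_2' => 3 } }

chart_rows(phenotypes, %w[A B], freqs, labels).each do |label, value|
  puts "#{label}: #{value}"
end
```

Grouping the rows by phenotype first keeps the bars for the same phenotype next to each other in the chart.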

In [None]:
data_rows = []
common_phenotypes.each do |pheno|
  registries.each do |registry, url|
    phenolabel = phenolookup[pheno]
    data_rows.append ["#{registry} #{phenolabel}", common_freqs_hash[registry][pheno]]
  end
end

index = Daru::Index.new ['Phenotype', 'Number of people with the phenotype',]
frame = Daru::DataFrame.rows(data_rows)
frame.vectors = index
table =  Daru::View::Table.new(frame)

options =  { title: 'Phenotype frequencies',
             type: :bar,
             height: 500

}
chart = Daru::View::Plot.new(table.table, options)
chart.show_in_iruby

<pre>


</pre>
## Analytics 2
Now, let's compare the relative frequencies of those same phenotypes.

In [None]:

data_rows = []
common_phenotypes.each do |pheno|
  registries.each do |registry, url|
    phenolabel = phenolookup[pheno]
    data_rows.append ["#{registry} #{phenolabel}", rel_freqs_hash[registry][pheno]]
  end
end


index = Daru::Index.new ['Phenotype', 'Relative phenotype frequency',]
frame = Daru::DataFrame.rows(data_rows)
frame.vectors = index
table =  Daru::View::Table.new(frame)

options =  { title: 'Relative phenotype frequencies',
             type: :bar,
             height: 500

}
chart = Daru::View::Plot.new(table.table, options)
chart.show_in_iruby