## Some initial setup steps

### Step 1:  Select the Ruby kernel
IF YOU SEE THE WORD "Ruby" in the top right of your screen, go directly to Step 2 :-)

IF YOU DO NOT SEE THE WORD "Ruby" in the top right of your screen, you need to set the Ruby kernel for this demo.  In the menu bar at the top of this page, click on "Kernel" --> "Change Kernel" --> "Ruby 3.x.x".


### Step 2:  Set up the analytics environment

This demo has been coded to request the number of Duchenne and Becker patients in the DPP.  We first need to do some "housekeeping" so that our environment can make requests over the web and plot the results.


In [None]:
require 'daru/view'
require 'rest-client'
require 'sparql'
require 'rdf/rdfxml'
require 'erb'

Daru::View.plotting_library = :googlecharts

puts  "thanks!  Go to the next box now :-)"

## Call the interface

We need to call the URL of the "phenotype frequencies" service for each registry. We will then do a bit of processing on the output to put it into a more useful structure for our analytics: one hash per registry, mapping each phenotype URL to its frequency.
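
To make the target structure concrete, here is a minimal sketch of that processing step on a made-up response. The sample body, its column names, the `example.org` URLs and the helper `parse_frequencies` are all assumptions for illustration. The real services may return a different header, and unlike this sketch the demo cell keeps the frequencies as strings.

```ruby
# Hypothetical CSV body; the real services may use different column names.
sample_body = <<~CSV
  phenotype,frequency
  http://example.org/pheno/A,12
  http://example.org/pheno/B,3
CSV

# Hypothetical helper: turn "url,count" lines into { url => count }.
# Assumes exactly one header line, which is dropped.
def parse_frequencies(body)
  rows = body.lines(chomp: true).drop(1).reject(&:empty?)
  rows.each_with_object({}) do |line, freqs|
    url, count = line.split(',', 2)
    freqs[url] = count.to_i
  end
end

puts parse_frequencies(sample_body).inspect
```

Storing one such hash per registry name gives the nested `registry => { phenotype URL => frequency }` layout used in the rest of this demo.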


In [None]:
registries = {
  'DPP' => 'https://www.fairdata.services/proxy/grlc/phenotype-frequencies/phenotype-frequencies',
  'ENMD' => 'https://zks-docker.ukl.uni-freiburg.de/grlc-euronmd/api-local/phenotype-frequencies',
}

phenotype_hash = Hash.new

registries.each do |registry, url|
  csv = RestClient.get(url)

  # The first two whitespace-separated tokens of the response are skipped;
  # each remaining token is a "phenotype URL,frequency" pair
  csv.body.split[2..].each do |tmp|
    phenourl, frequency = tmp.split(',')
    phenotype_hash[registry] = Hash.new unless phenotype_hash[registry]
    phenotype_hash[registry][phenourl] = frequency
  end
end

puts "data loaded successfully "
#puts"#{phenotype_hash}"


## Calculate the total number of phenotypic observations in each registry

To compare the scale of the participating registries, we sum the phenotype frequencies reported by each one.
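
As a small illustration of that sum, the sketch below uses made-up data. The registry names, `example.org` URLs and the helper `total_observations` are hypothetical. The counts are kept as strings because that is how they come out of splitting CSV text, so each one is converted with `to_i` before adding.

```ruby
# Hypothetical frequencies, stored as strings as they would be after splitting CSV text
sample = {
  'RegistryA' => { 'http://example.org/pheno/A' => '12', 'http://example.org/pheno/B' => '3' },
  'RegistryB' => { 'http://example.org/pheno/A' => '40' }
}

# Hypothetical helper: add up all counts of one registry
def total_observations(freqs)
  freqs.values.sum { |count| count.to_i }
end

sample.each do |registry, freqs|
  puts "#{registry}: #{total_observations(freqs)}"
end
```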


In [None]:
# Calculate the total number of phenotype observations in each registry
registries.each_key do |registry|
  total = phenotype_hash[registry].values.map(&:to_i).sum
  puts "#{registry} total number of phenotypic observations: #{total}"
end
puts ""

## Find the common phenotypes for both registries
Next, we will compare the phenotypes themselves, to check which of them are present in both DPP and EURO-NMD.

At the same time, we're going to take advantage of FAIR and reach out to the Human Phenotype Ontology to ask it for the phenotype terms associated with these URLs.
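
The "present in every registry" check is a set intersection over the phenotype URLs. A minimal sketch on made-up data follows. The registry names, `example.org` URLs and the helper `common_keys` are hypothetical. It folds Ruby's array intersection operator `&` over the key lists, so it works for any number of registries and returns an empty list when nothing is shared.

```ruby
# Hypothetical per-registry data: phenotype URL => frequency
sample = {
  'RegistryA' => { 'http://example.org/pheno/A' => '12', 'http://example.org/pheno/B' => '3' },
  'RegistryB' => { 'http://example.org/pheno/A' => '40', 'http://example.org/pheno/C' => '7' },
  'RegistryC' => { 'http://example.org/pheno/A' => '5' }
}

# Hypothetical helper: keys that occur in every inner hash
def common_keys(hash_of_hashes)
  # reduce(:&) returns nil when there are no registries at all, hence the fallback
  hash_of_hashes.values.map(&:keys).reduce(:&) || []
end

puts common_keys(sample).inspect
```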


In [None]:
puts "Common phenotypes"

# Intersect the phenotype URLs of all registries.
# reduce(:&) keeps an empty intermediate result empty, instead of
# restarting from the next registry's keys.
common_phenotypes = registries.keys.map { |registry| phenotype_hash[registry].keys }.reduce(:&) || []

# Go to the Web to get more information about each pheno code
phenolookup = {}
common_phenotypes.each do |pheno|
  coded = ERB::Util.url_encode pheno
  g = RDF::Graph.load("https://ontobee.org/ontology/HP?iri=#{coded}")
  res = SPARQL.execute("SELECT ?label where {<#{pheno}> <http://www.w3.org/2000/01/rdf-schema#label> ?label}", g)
  label = res.first['label']
  puts "URL: #{pheno}  Term: #{label}"
  phenolookup[pheno] = label
end

puts ""



## Show the frequencies for the shared phenotypes, as well as their relative frequencies
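Since the two registries differ considerably in their overall number of patients, we will calculate the relative frequencies in each registry to get a better comparison between them. The relative frequency of a phenotype is its frequency divided by the total number of phenotypic observations in that registry.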
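
A minimal sketch of this normalisation on made-up numbers is shown here. The helper `relative_frequencies` and the `example.org` URLs are hypothetical. It divides each count by the total of the hash it is given, and rounds to three decimals as the demo does.

```ruby
# Hypothetical counts for one registry, kept as strings as in the loaded data
sample = {
  'http://example.org/pheno/A' => '12',
  'http://example.org/pheno/B' => '3',
  'http://example.org/pheno/C' => '5'
}

# Hypothetical helper: count / total for every phenotype, rounded to `digits` decimals
def relative_frequencies(freqs, digits = 3)
  total = freqs.values.sum { |count| count.to_i }
  return {} if total.zero? # avoid dividing by zero for an empty registry

  freqs.transform_values { |count| (count.to_i.to_f / total).round(digits) }
end

relative_frequencies(sample).each do |pheno, rel_freq|
  puts "#{pheno}: #{rel_freq}"
end
```

With these numbers the total is 20, so the three phenotypes account for 12/20, 3/20 and 5/20 of the observations.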

In [None]:
common_freqs_hash = Hash.new
rel_freqs_hash = Hash.new

# Print the common phenotypes with their absolute and relative frequencies
puts "Common phenotypes"

registries.each_key do |registry|
  # Denominator: all phenotypic observations in this registry, not only the shared ones
  registry_total = phenotype_hash[registry].values.map(&:to_i).sum
  common_freqs_hash[registry] = {}
  rel_freqs_hash[registry] = {}

  common_phenotypes.each do |pheno|
    freq = phenotype_hash[registry][pheno].to_i
    rel_freq = (freq.to_f / registry_total).round(3)
    puts "Registry: #{registry}; Phenotype: #{pheno};  Frequency: #{freq}; Relative frequency: #{rel_freq}"
    common_freqs_hash[registry][pheno] = freq
    rel_freqs_hash[registry][pheno] = rel_freq
  end
end
puts ""

## Analytics
Here is a simple plot of the frequencies of the shared phenotypes.

In [None]:
data_rows = []
common_phenotypes.each do |pheno|
  # Use the last path segment of the phenotype URL as a short label.
  # The replacement must be '\1': "#{$1}" is interpolated before gsub runs,
  # so it would insert the capture from a previous match.
  phenolabel = pheno.gsub(/.*?\/(\w+)$/, '\1')
  registries.each_key do |registry|
    data_rows.append ["#{registry} #{phenolabel}", common_freqs_hash[registry][pheno]]
  end
end

index = Daru::Index.new ['Phenotype', 'Number of people with the phenotype',]
frame = Daru::DataFrame.rows(data_rows)
frame.vectors = index
table =  Daru::View::Table.new(frame)

options =  { title: 'Phenotype frequencies',
             type: :bar,
             height: 500

}
chart = Daru::View::Plot.new(table.table, options)
chart.show_in_iruby

## Analytics 2
Now, let's compare the relative frequencies of those same phenotypes.

In [None]:

data_rows = []
common_phenotypes.each do |pheno|
  # Use the last path segment of the phenotype URL as a short label.
  # The replacement must be '\1': "#{$1}" is interpolated before gsub runs,
  # so it would insert the capture from a previous match.
  phenolabel = pheno.gsub(/.*?\/(\w+)$/, '\1')
  registries.each_key do |registry|
    data_rows.append ["#{registry} #{phenolabel}", rel_freqs_hash[registry][pheno]]
  end
end


index = Daru::Index.new ['Phenotype', 'Relative phenotype frequency',]
frame = Daru::DataFrame.rows(data_rows)
frame.vectors = index
table =  Daru::View::Table.new(frame)

options =  { title: 'Relative phenotype frequencies',
             type: :bar,
             height: 500

}
chart = Daru::View::Plot.new(table.table, options)
chart.show_in_iruby