## Some initial setup steps


### Step 1: Set up the analytics environment

This demo is written in Ruby, but data scientists often use this same web environment (a "Jupyter Notebook") to run analytics in R or Python.

We first need to do some "housekeeping" so that our environment can make requests over the web and plot the results.


In [None]:
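# Load the libraries used in this notebook
require 'daru/view'     # Daru data frames plus in-notebook charting
require 'rest-client'   # HTTP client for calling the registry and Orphanet services
require 'sparql'        # SPARQL library for RDF.rb
require 'rdf/rdfxml'    # RDF/XML reader for RDF.rb
require 'erb'           # Ruby's standard templating library

# Render charts with Google Charts
Daru::View.plotting_library = :googlecharts

puts "Thanks! Go to the next box now :-)"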
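If the cell above stops with a `LoadError`, a small pre-flight check can show at a glance which libraries are missing. This is a minimal sketch: `missing_libraries` is a hypothetical helper (not part of any of the gems above) that simply tries each `require` path in turn and collects the ones that fail.

```ruby
# Hypothetical helper: returns the require paths that cannot be loaded.
def missing_libraries(paths)
  paths.reject do |path|
    begin
      require path
      true
    rescue LoadError
      false
    end
  end
end

missing = missing_libraries(%w[daru/view rest-client sparql rdf/rdfxml erb])
if missing.empty?
  puts "All libraries are available"
else
  puts "Missing: #{missing.join(', ')} (install the matching gems first)"
end
```

Note that a require path is not always the gem name (for example, `daru/view` comes from the `daru-view` gem).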

## Call the registry data service interface

We need to call the URL of the "Time To Diagnosis" (TTD) service for each participating registry, then process the output into a structure that is more convenient for our analytics.

If you want to join this demo yourself, add your registry's abbreviation and the URL of your data service to the `registries` hash in the following code box. Everything else is done for you automatically!


In [None]:
registries = {
  'DPP' => 'https://www.fairdata.services/proxy/shallot/dpp-kpi-ttd',
  'ENMD' => 'https://zks-docker.ukl.uni-freiburg.de/grlc-euronmd/api-local/kpi-ttd',
  # 'YOU' => 'https://your.registry.here/join/us/kpi-ttd'

}

disease_hash = Hash.new

registries.each do |registry, url|
  begin
    csv = RestClient.get(url)
    warn csv   # echo the raw response (to stderr) for debugging
  rescue StandardError => e
    warn "Skipping #{registry}: #{e.message}"   # unreachable service, HTTP error, etc.
    next
  end

  # Each response is expected to look like:
  # ORDO,   yearOfDiagnosis,    AverageOffset
  # http://www.orpha.net/ORDO/Orphanet_98896,  1996,   334
  # and we build: disease_hash[registry][year] = [[disease, delay_in_days], ...]

  csv.body.split[2..].each do |tmp|
    disease, year, delay = tmp.split(',')
    disease_hash[registry] ||= {}
    disease_hash[registry][year] ||= []
    disease_hash[registry][year] << [disease, delay.to_i]
  end
end

puts "Data loaded for: #{disease_hash.keys.join(', ')}"
puts disease_hash.inspect
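The loop above splits each response on whitespace and skips the first two tokens, which depends on the exact layout of every registry's output. If the services return a plain CSV with the header row shown in the comment (`ORDO,yearOfDiagnosis,AverageOffset`), which is an assumption here, Ruby's `csv` library can do the same grouping by column name. `parse_ttd_csv` is a hypothetical helper; it returns the same `year => [[disease, delay], ...]` shape that the cell stores under each registry.

```ruby
require 'csv'

# Hypothetical helper: turn one registry's TTD CSV into
# { year => [[disease_uri, delay_in_days], ...] }.
# Assumes a header row: ORDO,yearOfDiagnosis,AverageOffset
def parse_ttd_csv(body)
  CSV.parse(body, headers: true, strip: true).each_with_object({}) do |row, by_year|
    (by_year[row['yearOfDiagnosis']] ||= []) << [row['ORDO'], row['AverageOffset'].to_i]
  end
end

sample = <<~TTD
  ORDO,yearOfDiagnosis,AverageOffset
  http://www.orpha.net/ORDO/Orphanet_98896,1996,334
TTD
p parse_ttd_csv(sample)
```

Inside the registry loop, `disease_hash[registry] = parse_ttd_csv(csv.body)` would then replace the manual splitting, provided the header names match.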


## Use FAIR to get more data

Next, we gather additional external data about the diseases. Here we reach out to Orphanet and use its FAIR data to look up more information about each disease, in this case its name. The lookup is written in SPARQL, the W3C-standard query language for RDF, which is a common format for publishing FAIR data.




In [None]:
puts "Getting FAIR data about Orphanet diseases"

require 'uri'
orphanet_sparql = 'https://www.orpha.net/sparql'   # Orphanet SPARQL endpoint

diseasenames = Hash.new
disease_hash.each_value do |years|   # only registries that returned data
  years.each_value do |rows|   # need to check all years, since some diseases may only appear in a certain year
    rows.each do |disease, _offset|
      next if diseasenames[disease]
      code = disease.strip.split('/').last   # capture just the ORPHA code, e.g. "Orphanet_98896"
      query = <<~SPARQL
        PREFIX ordo: <http://www.orpha.net/ORDO/>
        PREFIX w3: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?label
        WHERE { ordo:#{code} w3:comment ?label }
      SPARQL
      # Let URI do the percent-encoding instead of hand-encoding the query string
      params = URI.encode_www_form('default-graph-uri' => '', 'query' => query,
                                   'should-sponge' => '', 'format' => 'text/csv')
      csv = RestClient.get("#{orphanet_sparql}?#{params}")
      label = csv.body.split(/\n/)[1]   # line 0 is the CSV header ("label")
      puts "url: #{disease}  Name: #{label}"
      diseasenames[disease] = label
    end
  end
end
puts ""
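The cell keeps the second line of the endpoint's `text/csv` response verbatim. In the SPARQL 1.1 CSV results format the first line holds the variable names, and values may be wrapped in double quotes (they must be when they contain commas, quotes or line breaks), so a raw line can carry those quotes into the chart title. A minimal sketch of a more careful extraction using Ruby's `csv` library; `first_sparql_value` is a hypothetical helper name.

```ruby
require 'csv'

# Hypothetical helper: return the first value of the first result row of a
# SPARQL CSV result (header line = variable names), or nil if there is none.
def first_sparql_value(csv_text)
  rows = CSV.parse(csv_text)   # handles quoted fields, embedded commas, CRLF line ends
  first_row = rows[1]          # rows[0] is the header, e.g. ["label"]
  first_row && first_row.first
end
```

In the lookup loop, `label = first_sparql_value(csv.body)` would take the place of `csv.body.split(/\n/)[1]`.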



## Analytics
Here is a simple plot of the time to diagnosis for one specific disease, year by year.

In [None]:
registry = "DPP"                 # I am asking about the data in DPP
diseasecode = "Orphanet_98896"   # For Duchenne

data_rows = []; label = ""
disease_hash.fetch(registry, {}).each do |year, rows|   # empty if the registry did not load
  rows.each do |disease, offset|
    next unless disease.strip.split('/').last == diseasecode   # filter only for Duchenne
    label = diseasenames[disease]
    data_rows << [year, offset]   # add it to our data rows
  end
end
data_rows.sort_by! { |year, _offset| year.to_i }   # plot the years in chronological order

index = Daru::Index.new ['Year', 'Delay-to-Diagnosis (days)']
frame = Daru::DataFrame.rows(data_rows)
frame.vectors = index
table = Daru::View::Table.new(frame)

options = { title: "Time to diagnosis for #{label} in the #{registry} registry",
            type: :bar,
            height: 500 }
chart = Daru::View::Plot.new(table.table, options)
chart.show_in_iruby
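The plotting cell walks every year and keeps the rows whose ORPHA code matches. The same filter can be packaged as a small function that also sorts by year, so the bars come out in chronological order whatever order the registry returned them in. A sketch only: `ttd_series` is a hypothetical name, and it assumes the `year => [[disease_uri, delay], ...]` shape built when the registry data were loaded.

```ruby
# Hypothetical helper: [[year, delay], ...] for one ORPHA code, sorted by year.
# registry_data is assumed to look like { "1996" => [[disease_uri, delay], ...], ... }
def ttd_series(registry_data, orpha_code)
  rows = registry_data.flat_map do |year, entries|
    entries.select { |disease, _delay| disease.strip.split('/').last == orpha_code }
           .map { |_disease, delay| [year, delay] }
  end
  rows.sort_by { |year, _delay| year.to_i }
end
```

With it, the nested loop could become `data_rows = ttd_series(disease_hash.fetch(registry, {}), diseasecode)`; the chart title would still take its label from `diseasenames`.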

## Further analytics

From here we would continue the analytics by combining the TTD observations from all participating registries, which gives larger samples, more statistical power and a more harmonized, global view.
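As a first approximation of that combined view, the registries' figures for one disease could be pooled year by year. Per the data structure shown in the registry cell, each service returns only an average delay per year, not the number of patients behind it, so this sketch gives every reported average equal weight; a proper pooled mean would weight each average by its patient count, which the current output does not include. `pooled_ttd` is a hypothetical helper that works on the `disease_hash` structure.

```ruby
# Hypothetical helper: mean of all average delays reported for one ORPHA code,
# per year, across registries. Returns { year => mean_delay_in_days }, sorted by year.
# NOTE: equal weighting is an assumption; with patient counts you would weight
# each registry's average by its count instead.
def pooled_ttd(disease_hash, orpha_code)
  per_year = Hash.new { |h, year| h[year] = [] }
  disease_hash.each_value do |years|
    years.each do |year, rows|
      rows.each do |disease, delay|
        per_year[year] << delay if disease.strip.split('/').last == orpha_code
      end
    end
  end
  per_year.sort_by { |year, _delays| year.to_i }
          .to_h { |year, delays| [year, delays.sum.fdiv(delays.size)] }
end
```

For Duchenne this would be called as `pooled_ttd(disease_hash, "Orphanet_98896")`, and the result could be plotted the same way as the single-registry series.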