<center><img src='./images/ERN_DPP_FDS.png'></center>
  

# Demonstration of privacy-preserving FAIR Data integration

## An example of federated query over independently FAIRified Muscular Dystrophy registries

###  https://github.com/markwilkinson/duchenne-daru

This demo is done in a Jupyter Notebook.  This allows us to run software live, edit it, and run it again to show that we are dynamically integrating data from multiple registries.  We can also show you exactly the data that is being passed, to give assurance that no private data is exposed.


# The DPP Components

On the DPP Server, _*within the secure space*_ we have three components.  
* The FAIR data(base)
* A Shallot server that sends SPARQL queries to the database, and returns the results
* A Secure Shell proxy into the Shallot server to ensure that all external requests are encrypted

<br/><br/>

<hr/>

## The Shared Components

Outside of the DPP server, on the World Duchenne Organization GitHub, there is a public folder of SPARQL queries.  Anyone can construct a query, but it must be approved by a representative of the WDO FAIR Data Project (e.g. Nawel or me).  This approval step ensures that queries cannot expose any private data.

When the Shallot server starts, it calls the WDO GitHub and loads a copy of those queries into the secure space.  From that point on, it can only execute the queries in that copy of the folder.
<br/><br/><br/><br/><br/>
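
The "load once, then execute only what was loaded" behaviour described above can be pictured as an allow-list. The sketch below is an illustration only and is not Shallot's actual code: the class name `ApprovedQueries`, the query name `dpp-count` as a lookup key, and the placeholder query text are all assumptions made for this example.

```ruby
# Hypothetical illustration of an allow-list; NOT Shallot's implementation.
class ApprovedQueries
  def initialize(queries)
    # Copy and freeze everything so nothing can be added after start-up.
    @queries = queries.transform_values { |text| text.dup.freeze }.freeze
  end

  # Returns the stored query text, or refuses if the name was never loaded.
  def fetch(name)
    @queries.fetch(name) do
      raise ArgumentError, "query '#{name}' is not in the approved set"
    end
  end
end

# Placeholder text stands in for a SPARQL file copied from the public folder.
approved = ApprovedQueries.new({ 'dpp-count' => '(approved SPARQL text)' })

puts approved.fetch('dpp-count')
```

Under this model a caller can only name a query; it can never supply query text of its own, which is the property the approval step relies on.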

<img src='./images/public_components.png' width=700>
<hr/>


# The Request

A user discovers the DPP service (e.g. by querying the DPP FAIR Data Point) and decides to request the patient count.  The documentation for the Shallot service shows them how to call it.

Using those instructions, the user creates the URL that will cause the query to be executed with their desired parameters.

     For example: type="Orphanet_98895"  (Becker Muscular Dystrophy)
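
As a minimal sketch of that step, the snippet below builds the request URL from an Orphanet code. The endpoint and the ORDO namespace are copied from the request URLs used later in this notebook; the helper name `count_url` and the two constant names are hypothetical and exist only for this illustration.

```ruby
require 'uri'

# Both values are copied from the request URLs used later in this notebook.
DPP_COUNT_ENDPOINT = 'https://www.fairdata.services/proxy/shallot/dpp-count'
ORDO_NAMESPACE = 'http://www.orpha.net/ORDO/'

# Hypothetical helper: builds the request URL for one Orphanet code.
def count_url(orphanet_code, endpoint = DPP_COUNT_ENDPOINT)
  # The disease is passed as a full ORDO IRI, so it has to be
  # percent-encoded before it can travel inside a query string.
  type = URI.encode_www_form_component(ORDO_NAMESPACE + orphanet_code)
  "#{endpoint}?type=#{type}"
end

puts count_url('Orphanet_98895') # Becker Muscular Dystrophy
```

The encoding turns `:` into `%3A` and `/` into `%2F`, which is why the hand-written URLs in the cells below contain `http%3A%2F%2Fwww.orpha.net%2FORDO%2F`.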

# Enough Talk... Let's see it in action!

## Some initial setup steps


### Set-up the analytics environment

This demo has been coded to request the number of Duchenne and Becker patients in the DPP.  We first need to do some "housekeeping" so that our environment can make requests over the web and plot the results...


In [None]:
# DON'T DISCONNECT MY BINDER!
#(1..100).to_a.each {|t| puts t; sleep 30}
# TO STOP THIS, CLICK THE "STOP" ICON IN THE TOOLBAR,
# THEN DELETE THE CODE

require 'daru/view'
require 'rest-client'

Daru::View.plotting_library = :googlecharts

puts  "thanks!  Go to the next box now :-)"

## Call the interface

All of the private components are constantly running on the DPP server, so we do not need to do anything in that regard.

All we need to do is call the URL of the Secure Shell proxy, sending it our desired disease code...

In [None]:

duchenne = "Orphanet_98896"  # the Orphanet code for Duchenne
becker = "Orphanet_98895"    # the Orphanet code for Becker
als = 'Orphanet_803'         # the Orphanet code for Amyotrophic lateral sclerosis

duchennecsv = RestClient.get("https://www.fairdata.services/proxy/shallot/dpp-count?type=http%3A%2F%2Fwww.orpha.net%2FORDO%2F#{duchenne}")  # The URL to the interface
beckercsv   = RestClient.get("https://www.fairdata.services/proxy/shallot/dpp-count?type=http%3A%2F%2Fwww.orpha.net%2FORDO%2F#{becker}")
alscsv      = RestClient.get("https://www.fairdata.services/proxy/shallot/dpp-count?type=http%3A%2F%2Fwww.orpha.net%2FORDO%2F#{als}")

puts "Duchenne Patients"
puts duchennecsv   # note that this is the ENTIRE OUTPUT from that request...
                   # this proves that no private data is being exposed
puts
puts "Becker"
puts beckercsv

puts
puts "Amyotrophic lateral sclerosis"
puts alscsv


## Analytics

Now that we have the data (stored in the `duchennecsv`, `beckercsv` and `alscsv` variables), we can do analytics on that data.  For example, a simple plot:

In [None]:
duchenne_count = duchennecsv.body.split.last.to_i
becker_count = beckercsv.body.split.last.to_i
als_count = alscsv.body.split.last.to_i

data_rows = [
  ['Duchenne', duchenne_count],
  ['Becker', becker_count],
  ['ALS', als_count],
]
index = Daru::Index.new ['Disease', 'Patient Count']
frame = Daru::DataFrame.rows(data_rows)
frame.vectors = index
table = Daru::View::Table.new(frame)
options = { title: 'Patient Counts',
            type: :bar }
chart = Daru::View::Plot.new(table.table, options)
chart.show_in_iruby
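
The counts above are extracted with `body.split.last.to_i`, which takes the last whitespace-separated token of the response and converts it to an integer. One caveat is that `to_i` returns 0 for non-numeric text, so an error message from a registry would be plotted as a count of zero. The sketch below shows a stricter variant; the helper name `parse_count` is hypothetical, and the sample body (a header line followed by one number) is an assumed shape, not a real response from any registry.

```ruby
# Hypothetical helper: returns the count as an Integer, or nil when the
# last token of the body is missing or not a number.
def parse_count(body)
  last_token = body.to_s.split.last
  return nil if last_token.nil?

  # Integer() raises on non-numeric text, whereas to_i silently returns 0.
  Integer(last_token, 10)
rescue ArgumentError
  nil
end

sample_body = "count\n42\n" # assumed shape, made up for this example

puts parse_count(sample_body).inspect
puts parse_count('Service unavailable').inspect
```

Returning `nil` instead of 0 makes it possible to tell "this registry has no such patients" apart from "this registry did not answer properly".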

<hr/>

# FAIR is about interoperability... where's the interoperability?

## Enter Dagmar Jäger!  EURO-NMD

EURO-NMD has built their own FAIR database.  They want to integrate their data with ours... how?

## Simply start the Shallot server on EURO-NMD!

<br/>
<img src='./images/reuse_public_components.png' width=1000>

<hr/>

# Now just add the URL to the code and run it again...

In [None]:

# EURO-NMD
enmd_duchennecsv = RestClient.get("https://zks-docker.ukl.uni-freiburg.de/grlc-euronmd/api-local/count?type=http%3A%2F%2Fwww.orpha.net%2FORDO%2F#{duchenne}")
enmd_beckercsv = RestClient.get("https://zks-docker.ukl.uni-freiburg.de/grlc-euronmd/api-local/count?type=http%3A%2F%2Fwww.orpha.net%2FORDO%2F#{becker}")
enmd_alscsv = RestClient.get("https://zks-docker.ukl.uni-freiburg.de/grlc-euronmd/api-local/count?type=http%3A%2F%2Fwww.orpha.net%2FORDO%2F#{als}")

# EURO-NMD
enmd_duchenne_count = enmd_duchennecsv.body.split.last.to_i
enmd_becker_count = enmd_beckercsv.body.split.last.to_i
enmd_als_count = enmd_alscsv.body.split.last.to_i


data_rows = [
  ['DPP Duchenne', duchenne_count],
  ['DPP Becker', becker_count],
  ['DPP ALS', als_count],
  # EURO-NMD
  ['ENMD Duchenne', enmd_duchenne_count],
  ['ENMD Becker', enmd_becker_count],
  ['ENMD ALS', enmd_als_count],
]
index = Daru::Index.new ['Disease', 'Patient Count']
frame = Daru::DataFrame.rows(data_rows)
frame.vectors = index
table = Daru::View::Table.new(frame)

options = { title: 'Patient Counts',
            type: :bar }
chart = Daru::View::Plot.new(table.table, options)
chart.show_in_iruby

# Recently, the CRAMP database came online

<br/>
<br/>

<img src="./images/cramp.png" width=500>


The CRAMP database contains patients with ALS.

So we add them to the demo.

In [None]:

als_crampcsv = RestClient.get(
  "https://www.fairdata.services/proxy/shallot/cramp-count?type=http%3A%2F%2Fwww.orpha.net%2FORDO%2F#{als}")

# CRAMP
cramp_als_count = als_crampcsv.body.split.last.to_i

data_rows = [
  ['DPP Duchenne', duchenne_count],
  ['DPP Becker', becker_count],
  ['DPP ALS', als_count],
  # EURO-NMD
  ['ENMD Duchenne', enmd_duchenne_count],
  ['ENMD Becker', enmd_becker_count],
  ['ENMD ALS', enmd_als_count],
  # CRAMP
  ['CRAMP ALS', cramp_als_count],
]

index = Daru::Index.new ['Disease', 'Patient Count']
frame = Daru::DataFrame.rows(data_rows)
frame.vectors = index
table = Daru::View::Table.new(frame)

options = { title: 'Patient Counts',
            type: :bar }
chart = Daru::View::Plot.new(table.table, options)
chart.show_in_iruby

# A few days ago the DMScope database came online
<br/><br/>
<img src="./images/dmscope.png" width=500>

The DMScope database contains patients with <b>Myotonic dystrophy type 1</b> [Orphanet_273](http://www.orpha.net/ORDO/Orphanet_273)



  

In [None]:
myotonic_dystrophy = "Orphanet_273"

md_dmscopecsv = RestClient.get(
  "https://www.fairdata.services/proxy/shallot/dmscope-count?type=http%3A%2F%2Fwww.orpha.net%2FORDO%2F#{myotonic_dystrophy}")

# DMSCOPE
dmscope_md_count = md_dmscopecsv.body.split.last.to_i

data_rows = [
  ['DPP Duchenne', duchenne_count],
  ['DPP Becker', becker_count],
  ['DPP ALS', als_count],
  # EURO-NMD
  ['ENMD Duchenne', enmd_duchenne_count],
  ['ENMD Becker', enmd_becker_count],
  ['ENMD ALS', enmd_als_count],
  # CRAMP
  ['CRAMP ALS', cramp_als_count],
  # DMSCOPE
  ['DMSCOPE MD', dmscope_md_count],
]

index = Daru::Index.new ['Disease', 'Patient Count']
frame = Daru::DataFrame.rows(data_rows)
frame.vectors = index
table = Daru::View::Table.new(frame)

options = { title: 'Patient Counts',
            type: :bar }
chart = Daru::View::Plot.new(table.table, options)
chart.show_in_iruby

# And finally, SmartCare arrived a few days ago!

SmartCare includes patients with <b>Proximal spinal muscular atrophy</b> [Orphanet_70](http://www.orpha.net/ORDO/Orphanet_70)


In [None]:

  

psma = "Orphanet_70"

psma_smartcsv = RestClient.get(
  "https://zks-docker.ukl.uni-freiburg.de/grlc-smartcare/api-local/count?type=http%3A%2F%2Fwww.orpha.net%2FORDO%2F#{psma}")

# SMARTCARE
smart_psma_count = psma_smartcsv.body.split.last.to_i


data_rows = [
  ['DPP Duchenne', duchenne_count],
  ['DPP Becker', becker_count],
  ['DPP ALS', als_count],
  # EURO-NMD
  ['ENMD Duchenne', enmd_duchenne_count],
  ['ENMD Becker', enmd_becker_count],
  ['ENMD ALS', enmd_als_count],
  # CRAMP
  ['CRAMP ALS', cramp_als_count],
  # DMSCOPE
  ['DMSCOPE MD', dmscope_md_count],
  # SMARTCARE
  ['SMART PSMA', smart_psma_count],
]

index = Daru::Index.new ['Disease', 'Patient Count']
frame = Daru::DataFrame.rows(data_rows)
frame.vectors = index
table = Daru::View::Table.new(frame)

options = { title: 'Patient Counts',
            type: :bar }
chart = Daru::View::Plot.new(table.table, options)
chart.show_in_iruby

# Prove that they all work together!

We are now going to send the same query to all registries, to prove interoperability.

## Which registries contain records of Proximal spinal muscular atrophy?

Let's go!


In [None]:
psma = "Orphanet_70"  # Proximal spinal muscular atrophy

#SMARTCARE
psma_smartcsv = RestClient.get(
  "https://zks-docker.ukl.uni-freiburg.de/grlc-smartcare/api-local/count?type=http%3A%2F%2Fwww.orpha.net%2FORDO%2F#{psma}")

# DMSCOPE
psma_dmscopecsv = RestClient.get(
  "https://www.fairdata.services/proxy/shallot/dmscope-count?type=http%3A%2F%2Fwww.orpha.net%2FORDO%2F#{psma}")

# CRAMP
psma_crampcsv = RestClient.get(
  "https://www.fairdata.services/proxy/shallot/cramp-count?type=http%3A%2F%2Fwww.orpha.net%2FORDO%2F#{psma}")

# EURO-NMD
psma_euronmdcsv = RestClient.get(
  "https://zks-docker.ukl.uni-freiburg.de/grlc-euronmd/api-local/count?type=http%3A%2F%2Fwww.orpha.net%2FORDO%2F#{psma}")

# DUCHENNE PARENT PROJECT
psma_dppcsv = RestClient.get(
  "https://www.fairdata.services/proxy/shallot/dpp-count?type=http%3A%2F%2Fwww.orpha.net%2FORDO%2F#{psma}")  # The URL to the interface

smart_psma_count = psma_smartcsv.body.split.last.to_i
dmscope_psma_count = psma_dmscopecsv.body.split.last.to_i
cramp_psma_count = psma_crampcsv.body.split.last.to_i
euronmd_psma_count = psma_euronmdcsv.body.split.last.to_i
dpp_psma_count = psma_dppcsv.body.split.last.to_i


data_rows = [
  ['DPP PSMA', dpp_psma_count],
  ['SMART PSMA', smart_psma_count],
  ['DMSCOPE PSMA', dmscope_psma_count],
  ['ENMD PSMA', euronmd_psma_count],
  ['CRAMP PSMA', cramp_psma_count],
]

index = Daru::Index.new ['Disease', 'Patient Count']
frame = Daru::DataFrame.rows(data_rows)
frame.vectors = index
table = Daru::View::Table.new(frame)

options = { title: 'Patient Counts',
            type: :bar }
chart = Daru::View::Plot.new(table.table, options)
chart.show_in_iruby
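
Because every registry answers the same query through the same kind of URL, the five requests above could also be written as a loop over a table of endpoints. The following is a sketch under stated assumptions rather than part of the demo: the endpoints are copied from the cells above, while the names `REGISTRY_ENDPOINTS` and `count_rows`, and the idea of passing the HTTP call in as a lambda, are hypothetical choices made for this example.

```ruby
require 'uri'

# Registry label => count endpoint, copied from the cells above.
REGISTRY_ENDPOINTS = {
  'DPP'     => 'https://www.fairdata.services/proxy/shallot/dpp-count',
  'SMART'   => 'https://zks-docker.ukl.uni-freiburg.de/grlc-smartcare/api-local/count',
  'DMSCOPE' => 'https://www.fairdata.services/proxy/shallot/dmscope-count',
  'ENMD'    => 'https://zks-docker.ukl.uni-freiburg.de/grlc-euronmd/api-local/count',
  'CRAMP'   => 'https://www.fairdata.services/proxy/shallot/cramp-count'
}.freeze

# Hypothetical helper: asks every registry for one Orphanet code and returns
# rows shaped like data_rows, e.g. ['DPP PSMA', <count>].
# `fetch` is any callable that takes a URL and returns the response body.
def count_rows(orphanet_code, short_name, fetch)
  type = URI.encode_www_form_component("http://www.orpha.net/ORDO/#{orphanet_code}")
  REGISTRY_ENDPOINTS.map do |label, endpoint|
    body = fetch.call("#{endpoint}?type=#{type}")
    # Same extraction as the cells above: last token of the body, as an integer.
    ["#{label} #{short_name}", body.split.last.to_i]
  end
end

# In this notebook the live call could look like:
#   data_rows = count_rows('Orphanet_70', 'PSMA', ->(url) { RestClient.get(url).body })
```

With this shape, adding a newly FAIRified registry means adding one line to the endpoint table, which mirrors the point of the demo: the query and the client code stay the same.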