<center><img src='./images/ERN_DPP_FDS.png'></center>
  

# Demonstration of privacy-preserving FAIR Data integration

## An example of a federated query over two independently FAIRified Muscular Dystrophy registries

###  https://github.com/markwilkinson/duchenne-daru

This demo is done in a Jupyter Notebook.  This allows us to run software live, edit it, and run it again to show that we are dynamically integrating data from multiple registries.  We can also show you exactly the data that is being passed, to give assurance that no private data is exposed.


# The DPP Components

On the DPP Server, _*within the secure space*_ we have three components.  
* The FAIR data(base)
* A 'grlc' server that sends SPARQL queries to the database, and returns the results
* A Secure Shell proxy into the grlc server to ensure that all external requests are encrypted

<br/><br/>

<img src='./images/components.png'/>

<hr/>

## The Shared Components

Outside of the DPP server, on the World Duchenne Organization GitHub, there is a public folder of SPARQL queries.  Those queries can be constructed by anyone, but must be approved by a representative of the WDO FAIR Data Project (e.g. Nawel or me).  This ensures that queries cannot expose any private data.

When the grlc server starts, it calls the WDO GitHub and loads a copy of those queries into the secure space.  From that point on, it can only execute the queries in that copy of the folder (all other grlc server capabilities have been disabled).
<br/><br/><br/><br/><br/>

<img src='./images/public_components.png'>
<hr/>


# The Request

A user discovers the DPP service (somehow - likely through querying the DPP FAIR Data Point) and decides to collect the data.  They can see documentation about how to call the grlc service, so that they understand it.
<br/><br/>
<img src='./images/grlc_metadata.png'>
<hr/>

Using those instructions, the user creates the URL that will cause the query to be executed with their desired parameters.

     For example: type="Orphanet_98895"  (Becker Muscular Dystrophy)
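
In the request URLs used later in this notebook, the `type` parameter carries the full Orphanet IRI in URL-encoded form rather than the bare code. A minimal sketch of building such a URL with Ruby's standard `uri` library is shown here; the `count_url` helper name is hypothetical (it is not part of the demo code), and the base URL is the DPP interface address that appears in the "Call the interface" step.

```ruby
require 'uri'

# Hypothetical helper (not part of the demo code): builds the request URL
# for the "count" query from a bare Orphanet code.
def count_url(base_url, orphanet_code)
  # The request URLs in this notebook pass the full ORDO IRI,
  # percent-encoded, as the `type` parameter.
  iri = "http://www.orpha.net/ORDO/#{orphanet_code}"
  "#{base_url}?type=#{URI.encode_www_form_component(iri)}"
end

puts count_url('https://www.fairdata.services/proxy/shallot/count', 'Orphanet_98895')
# prints https://www.fairdata.services/proxy/shallot/count?type=http%3A%2F%2Fwww.orpha.net%2FORDO%2FOrphanet_98895
```

Encoding the IRI with a library call, instead of typing `%3A%2F%2F` by hand, makes it harder to introduce a typo when switching to another disease code.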

# Enough Talk... Let's see it in action!

## Some initial setup steps

### Step 1:  Select the Ruby kernel
IF YOU SEE THE WORD "Ruby" in the top right of your screen, go directly to Step 2 :-)

IF YOU DO NOT SEE THE WORD "Ruby" in the top right of your screen, you need to set the Ruby kernel for this demo.  In the menu bar at the top of this page, click on "Kernel" --> "Change Kernel" --> "Ruby 3.x.x"


### Step 2:  Set-up the analytics environment

This demo has been coded to request the number of Duchenne and Becker patients in the DPP.  We first need to do some "housekeeping" so that our environment can make requests over the web and plot the results...


In [None]:
require 'daru/view'
require 'rest-client'

Daru::View.plotting_library = :googlecharts

puts  "thanks!  Go to the next box now :-)"

## Call the interface

All of the private components are constantly running on the DPP server, so we do not need to do anything in that regard.

All we need to do is call the URL of the Secure Shell proxy, sending it our desired disease code...

In [None]:

duchenne = "Orphanet_98896"  # the Orphanet code for Duchenne
becker = "Orphanet_98895"    # the Orphanet code for Becker
duchennecsv = RestClient.get("https://www.fairdata.services/proxy/shallot/count?type=http%3A%2F%2Fwww.orpha.net%2FORDO%2F#{duchenne}")  # The URL to the interface
beckercsv =   RestClient.get("https://www.fairdata.services/proxy/shallot/count?type=http%3A%2F%2Fwww.orpha.net%2FORDO%2F#{becker}")

puts "Duchenne Patients"
puts duchennecsv   # note that this is the ENTIRE OUTPUT from that request...
                   # this proves that no private data is being exposed
puts
puts "Becker Patients"
puts beckercsv

## Analytics

Now that we have the data (stored in the `duchennecsv` and `beckercsv` variables), we can do analytics on that data.  For example, a simple plot:

In [None]:
duchenne_count = duchennecsv.body.split.last.to_i
becker_count = beckercsv.body.split.last.to_i

data_rows = [
  ['Duchenne', duchenne_count],
  ['Becker', becker_count]
]
index = Daru::Index.new(['Disease', 'Patient Count'])
frame = Daru::DataFrame.rows(data_rows)
frame.vectors = index
table = Daru::View::Table.new(frame)
options = { title: 'Patient Counts', type: :bar }
chart = Daru::View::Plot.new(table.table, options)
chart.show_in_iruby
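
The `split.last.to_i` idiom used above can silently yield 0 when the response is not what we expect (an empty body or an error message, for example), which would then be plotted as a real count. A slightly stricter sketch follows. It assumes the response body is a CSV with exactly one header line and one value line, which is an assumption about the service output; the `extract_count` name is hypothetical.

```ruby
# Hypothetical helper: extracts the count from a CSV body, assuming it has
# exactly one header line followed by one value line (e.g. "count\n42\n").
def extract_count(csv_body)
  lines = csv_body.to_s.lines.map(&:strip).reject(&:empty?)
  unless lines.size == 2
    raise ArgumentError, "expected a header and one value, got #{lines.size} line(s)"
  end
  # Integer() raises on non-numeric text, unlike to_i, which returns 0.
  Integer(lines.last.delete('"'), 10)
end

puts extract_count("count\n42\n")   # made-up sample body, not real registry output
```

In the notebook this would be called as `duchenne_count = extract_count(duchennecsv.body)`.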

<hr/>

# FAIR is about interoperability... where's the interoperability?

## Enter Dagmar Jäger!  EURO-NMD

EURO-NMD has built their own FAIR database.  They want to integrate their data with ours... how?

## Simply start the grlc server on EURO-NMD!

<br/>
<img src='./images/reuse_public_components.png'>

<hr/>

# Now just add the URL to the code and run it again...

In [None]:
duchennecsv = RestClient.get("https://www.fairdata.services/proxy/grlc/count?type=http%3A%2F%2Fwww.orpha.net%2FORDO%2F#{duchenne}")  
beckercsv =   RestClient.get("https://www.fairdata.services/proxy/grlc/count?type=http%3A%2F%2Fwww.orpha.net%2FORDO%2F#{becker}")
# EURO-NMD
enmd_duchennecsv = RestClient.get("https://zks-docker.ukl.uni-freiburg.de/grlc-euronmd/api-local/count?type=http%3A%2F%2Fwww.orpha.net%2FORDO%2F#{duchenne}")


duchenne_count = duchennecsv.body.split.last.to_i
becker_count = beckercsv.body.split.last.to_i
# EURO-NMD
enmd_duchenne_count = enmd_duchennecsv.body.split.last.to_i

data_rows = [
  ['DPP Duchenne', duchenne_count],
  ['DPP Becker', becker_count],
  # EURO-NMD
  ['Duchenne in EURO-NMD', enmd_duchenne_count]
]
index = Daru::Index.new(['Disease', 'Patient Count'])
frame = Daru::DataFrame.rows(data_rows)
frame.vectors = index
table = Daru::View::Table.new(frame)

options = { title: 'Patient Counts', type: :bar }
chart = Daru::View::Plot.new(table.table, options)
chart.show_in_iruby
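
As more registries start their own grlc server, the copy-and-paste pattern above could be folded into a loop. A minimal sketch follows, assuming every registry exposes the same `count` query with the same `type` parameter and the same response format. The `federated_rows` helper and the injected `fetcher` are hypothetical, and unlike the cell above this sketch asks every registry for both diseases.

```ruby
require 'uri'

# Endpoints and disease codes taken from the cells above.
ENDPOINTS = {
  'DPP'      => 'https://www.fairdata.services/proxy/grlc/count',
  'EURO-NMD' => 'https://zks-docker.ukl.uni-freiburg.de/grlc-euronmd/api-local/count'
}.freeze

DISEASES = {
  'Duchenne' => 'Orphanet_98896',
  'Becker'   => 'Orphanet_98895'
}.freeze

# Hypothetical helper: returns one [label, count] row per registry/disease pair.
# `fetcher` is any callable that takes a URL and returns the response body,
# so the HTTP client can be swapped out (or stubbed) without changing the loop.
def federated_rows(endpoints, diseases, fetcher)
  endpoints.flat_map do |registry, base_url|
    diseases.map do |disease, code|
      type = URI.encode_www_form_component("http://www.orpha.net/ORDO/#{code}")
      body = fetcher.call("#{base_url}?type=#{type}")
      # Same extraction as in the cells above: last whitespace-separated token.
      ["#{disease} in #{registry}", body.split.last.to_i]
    end
  end
end
```

In the notebook, the fetcher would be `->(url) { RestClient.get(url).body }`, and the rows returned by `federated_rows(ENDPOINTS, DISEASES, fetcher)` can be passed to `Daru::DataFrame.rows` exactly as in the cell above. Adding a third registry then means adding one line to `ENDPOINTS`.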