# Instructions

The assignment consists of three tasks:

- Run the T-test for the means of two independent samples underlying the statement "IRE binding activity was significantly reduced in failing hearts" (originally published by Haddad et al. in https://doi.org/10.1093/eurheartj/ehw333) using the following example data.

| non-failing heart (NF) | failing heart (F) |
| ---------------------- | ----------------- |
| 95 | 50 |
| 103 | 35 |
| 99 | 21 | 
| &nbsp; | 15 | 
| &nbsp; | 7 | 
| &nbsp; | 40 |

- Describe the statistical hypothesis test in machine readable form following the [statistical methods ontology concept for "two sample t-test with unequal variance"](https://www.ebi.ac.uk/ols/ontologies/stato/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FSTATO_0000304) using Semantic Web technologies, namely the Resource Description Framework (RDF).
- Process the resulting machine readable description using Semantic Web technologies, namely the SPARQL Protocol and RDF Query Language.

Please return the assignment with all outputs visible (i.e., do not clear the outputs).

Good luck!

In [3]:
!pip install rdflib pandas scipy numpy



In [4]:
# Import all required libraries
from rdflib import BNode, Graph, Literal, URIRef
from rdflib.namespace import RDF
import math
import numpy as np
from scipy.stats import ttest_ind

In [12]:
# Run the T-test for the means of two independent samples using the example data
print("If IRE binding activity is significantly reduced, then it can lead to heart failure")
# labels: nf = non-failing heart, f = failing heart
nf = [95, 103, 99]
f = [50, 35, 21, 15, 7, 40]

# Step 1: sum the elements of each data set
nfSum = sum(nf)
fSum = sum(f)
print(nfSum)
print(fSum)

# Step 2: square the sums from step 1
nfSSquare = np.square(nfSum)
fSSquare = np.square(fSum)
print(nfSSquare)
print(fSSquare)

# Step 3: calculate the mean of each data set
nfNElements = len(nf)
fNElements = len(f)
nfMean = nfSum / float(nfNElements)
fMean = fSum / float(fNElements)
print(nfMean)
print(fMean)

# Step 4: sum the squares of the elements of each data set
nf2 = np.square(nf)
f2 = np.square(f)
nfSSum = sum(nf2)
fSSum = sum(f2)
print(nfSSum)
print(fSSum)

# Step 5: calculate t with the pooled-variance (equal-variance) formula.
# Note: this is not the Welch statistic that ttest_ind(equal_var=False) uses below.
nfSS = nfSSum - nfSSquare / float(nfNElements)  # sum of squared deviations, NF
fSS = fSSum - fSSquare / float(fNElements)  # sum of squared deviations, F
pooledVar = (nfSS + fSS) / float(nfNElements + fNElements - 2)
t = (nfMean - fMean) / math.sqrt(pooledVar * (1 / float(nfNElements) + 1 / float(fNElements)))
print(t)

# Degrees of freedom of the pooled test (not used by Welch's test below)
##dFree = (nfNElements - 1) + (fNElements - 1)
# 1.895 is the one-tailed critical t value for 7 degrees of freedom from a t-table, not a p-value
# A confidence level of 95% corresponds to a significance level (alpha) of 0.05
alpha = 0.05
# Compute and print the p-value of the two sample t-test with unequal variance (Welch's t-test)
ttest = ttest_ind(nf, f, axis=0, equal_var=False)
pValue = ttest.pvalue
print(pValue)

# Reject the null hypothesis of equal means when the p-value is below alpha
if pValue < alpha:
    print("There is enough evidence to reject the null hypothesis at a confidence level of 95%")
else:
    print("Failed to reject the null hypothesis at a confidence level of 95%")

If IRE binding activity is significantly reduced, then it can lead to heart failure
297
168
88209
28224
99.0
28.0
29435
6040
7.182560914224794
5.318725263490542e-05
There is enough evidence to reject the null hypothesis at a confidence level of 95%
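
The hand calculation above follows the pooled-variance formula, while `ttest_ind(..., equal_var=False)` runs Welch's test, so the printed `t` and the printed p-value belong to two different variants of the test. The following is a minimal sketch, using only the standard library, that computes both statistics side by side together with the Welch-Satterthwaite degrees of freedom. The variable names are illustrative and not part of the assignment.

```python
import math
import statistics

nf = [95, 103, 99]            # non-failing heart
f = [50, 35, 21, 15, 7, 40]   # failing heart

n_nf, n_f = len(nf), len(f)
mean_diff = statistics.mean(nf) - statistics.mean(f)
var_nf = statistics.variance(nf)  # sample variance (n - 1 denominator)
var_f = statistics.variance(f)

# Pooled-variance (Student) statistic: assumes equal population variances
pooled_var = ((n_nf - 1) * var_nf + (n_f - 1) * var_f) / (n_nf + n_f - 2)
t_pooled = mean_diff / math.sqrt(pooled_var * (1 / n_nf + 1 / n_f))

# Welch statistic: each sample contributes its own variance / n term
se2_nf, se2_f = var_nf / n_nf, var_f / n_f
t_welch = mean_diff / math.sqrt(se2_nf + se2_f)

# Welch-Satterthwaite approximation of the degrees of freedom
df_welch = (se2_nf + se2_f) ** 2 / (
    se2_nf ** 2 / (n_nf - 1) + se2_f ** 2 / (n_f - 1)
)

print(t_pooled, t_welch, df_welch)
```

The pooled statistic should reproduce the 7.18 printed by the cell, whereas the Welch statistic comes out near 10.05 on roughly 6 degrees of freedom. The Welch statistic is the one that matches the reported p-value.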


In [63]:
# Describe the statistical hypothesis test in machine readable form

# First, we initialize an RDF Graph and bind some prefixes
g = Graph()
g.bind('obo', 'http://purl.obolibrary.org/obo/')
g.bind('ex', 'http://example.org/')

# Next, we define some needed vocabulary
obo = dict()
obo['two sample t-test with unequal variance'] = URIRef('http://purl.obolibrary.org/obo/STATO_0000304')
obo['has specified input'] = URIRef('http://purl.obolibrary.org/obo/OBI_0000293')
obo['has specified output'] = URIRef('http://purl.obolibrary.org/obo/OBI_0000299')
obo['p-value'] = URIRef('http://purl.obolibrary.org/obo/OBI_0000175')
obo['has value specification'] = URIRef('http://purl.obolibrary.org/obo/OBI_0001938')
obo['scalar value specification'] = URIRef('http://purl.obolibrary.org/obo/OBI_0001931')
obo['has specified numeric value'] = URIRef('http://purl.obolibrary.org/obo/OBI_0001937')
obo['iron-responsive element binding'] = URIRef('http://purl.obolibrary.org/obo/GO_0030350')

# Now, populate the graph with statements
# As an example, the following statement types the resource (blank node) 'n1' as 'two sample t-test with unequal variance'
n1 = BNode()
g.add((n1, RDF.type, obo['two sample t-test with unequal variance']))

# Now, state that the t-test has iron-responsive element binding as a specified input (one statement to complete)
n2 = BNode()
g.add((n1, obo['has specified input'], n2))
g.add((n2, RDF.type, obo['iron-responsive element binding']))

# Next, state that the t-test has the p-value computed above as a specified output (two statements to complete)
n3 = BNode()
g.add((n1, obo['has specified output'], n3))
g.add((n3, RDF.type, obo['p-value']))

# The literal value of the p-value is represented as a scalar value specification with a specified numeric value (three statements to complete)
n4 = BNode()
g.add((n3, obo['has value specification'], n4))
g.add((n4, RDF.type, obo['scalar value specification']))
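# Type the literal as xsd:double and reuse the p-value computed above
g.add((n4, obo['has specified numeric value'], Literal(float(pValue), datatype=URIRef('http://www.w3.org/2001/XMLSchema#double'))))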

# Finally, serialize and print the graph in RDF/XML format
print( g.serialize(format='application/rdf+xml') )


b'<?xml version="1.0" encoding="UTF-8"?>\n<rdf:RDF\n   xmlns:obo="http://purl.obolibrary.org/obo/"\n   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"\n>\n  <rdf:Description rdf:nodeID="Nf03c7668af174a7e8061d2a191ce95af">\n    <obo:OBI_0001937>1</obo:OBI_0001937>\n    <rdf:type rdf:resource="http://purl.obolibrary.org/obo/OBI_0001931"/>\n  </rdf:Description>\n  <rdf:Description rdf:nodeID="Naaa5951003c84ee4910d0ddf983a1827">\n    <obo:OBI_0000299 rdf:nodeID="Nf3f928765ba04a64baea6d3bbd6f287f"/>\n    <rdf:type rdf:resource="http://purl.obolibrary.org/obo/STATO_0000304"/>\n    <obo:OBI_0000293 rdf:nodeID="Nfabfd4099946475ca0941363cca320a3"/>\n  </rdf:Description>\n  <rdf:Description rdf:nodeID="Nfabfd4099946475ca0941363cca320a3">\n    <rdf:type rdf:resource="http://purl.obolibrary.org/obo/GO_0030350"/>\n  </rdf:Description>\n  <rdf:Description rdf:nodeID="Nf3f928765ba04a64baea6d3bbd6f287f">\n    <obo:OBI_0001938 rdf:nodeID="Nf03c7668af174a7e8061d2a191ce95af"/>\n    <rdf:type rdf:

In [14]:
# Process the machine readable statistical hypothesis test by completing the following SPARQL query that returns the p-value

q = """
PREFIX obo: <http://purl.obolibrary.org/obo/>

SELECT ?pvalue WHERE {
  ?r a obo:STATO_0000304 .
  ?r obo:OBI_0000293 [ a obo:GO_0030350 ] .
  ?r obo:OBI_0000299 ?b .
  ?b a obo:OBI_0000175 .
  ?b obo:OBI_0001938 ?v .
  ?v a obo:OBI_0001931 .
  ?v obo:OBI_0001937 ?pvalue .
}
"""

for qs in g.query(q):
    print(qs[0])

NameError: name 'g' is not defined