# Prediction with neural networks and correction with probabilistic rules

This notebook gathers the results of the predictions made with the neural network models, using a binary threshold of 100, and of their later correction with probabilistic rules.

### Brief explanation

With the best models already chosen, we predicted hypometabolism for the patients in our database. By predicting each brain region independently, we built a whole-brain map. However, since some regions showed high variability, we made 5 predictions per region, which led to 5 different brain maps. Absolute and normalised probabilistic rules were applied to each of them. Finally, consensus brain maps were calculated before and after rule application. The relevance threshold, which determines how strict we are when selecting rules, is a parameter varying from 0.5 to 1.
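
The notebook does not state how the consensus maps are computed. As a minimal sketch, assuming the consensus is a simple majority vote over the 5 binary maps, it could look like the code below; the function name `consensus_map` and the region names are hypothetical and are not part of `prtools.py`.

```python
import pandas as pd


def consensus_map(maps):
    """Majority vote over binary brain maps of identical shape (assumed rule)."""
    votes = sum(maps)  # per patient and region: number of runs predicting hypometabolism
    return (votes * 2 > len(maps)).astype(int)


# Toy data: 2 patients (rows), 3 hypothetical regions (columns), 5 prediction runs
runs = [
    pd.DataFrame({"region_A": [1, 0], "region_B": [0, 0], "region_C": [1, 1]}),
    pd.DataFrame({"region_A": [1, 0], "region_B": [1, 0], "region_C": [1, 0]}),
    pd.DataFrame({"region_A": [0, 0], "region_B": [0, 0], "region_C": [1, 1]}),
    pd.DataFrame({"region_A": [1, 1], "region_B": [0, 1], "region_C": [0, 1]}),
    pd.DataFrame({"region_A": [0, 0], "region_B": [0, 0], "region_C": [1, 0]}),
]

consensus = consensus_map(runs)
print(consensus)
```

With an odd number of runs there are no ties, which is one reason a majority vote is a natural choice here; other schemes (for example, requiring unanimity) would make the consensus stricter.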

### Import the packages

We import the packages that will be needed. In this case, everything (including `pandas`) is already provided by the `prtools.py` file.

In [1]:
import sys
sys.path.insert(0,'../../Tools')
from prtools import *
da = Datasets()
ra = RuleApplication()

In [2]:
pandas.set_option('display.max_rows', None)
pandas.set_option('display.max_columns', None)
pandas.set_option('display.width', None)
pandas.set_option('display.max_colwidth', None)

### Preparation of results

#### General setup

In [3]:
real_aal = da.define_binary(pandas.read_csv('../../Data/Y_aal_quan.csv'),100)
real_brodmann = da.define_binary(pandas.read_csv('../../Data/Y_brodmann_quan.csv'),100)
prediction_aal = pandas.read_csv('./Prediction_100/Prediction/prediction_aal.csv')
prediction_brodmann = pandas.read_csv('./Prediction_100/Prediction/prediction_brodmann.csv')

In [4]:
relevance_thresholds_absolute = [1,0.975,0.950,0.925,0.9,0.8,0.7,0.6,0.5]
relevance_thresholds_normalised = [0.9,0.8,0.7,0.6,0.5]

#### Absolute rules

In [5]:
comparison_absolute_aal = ra.compare_with_real(prediction_aal,real_aal)
comparison_absolute_brodmann = ra.compare_with_real(prediction_brodmann,real_brodmann)
index = ['Prediction']
for relevance_threshold in relevance_thresholds_absolute:
    absolute_aal = pandas.read_csv('./Prediction_100/Absolute/correction_aal_'+str(relevance_threshold)+'.csv')
    absolute_brodmann = pandas.read_csv('./Prediction_100/Absolute/correction_brodmann_'+str(relevance_threshold)+'.csv')
    comparison_absolute_aal = pandas.concat([comparison_absolute_aal,ra.compare_with_real(absolute_aal,real_aal)],axis=0)
    comparison_absolute_brodmann = pandas.concat([comparison_absolute_brodmann,ra.compare_with_real(absolute_brodmann,real_brodmann)],axis=0)
    index.append('Relevance threshold = '+str(relevance_threshold))
comparison_absolute_aal.index = index
comparison_absolute_brodmann.index = index

In [6]:
TFP_absolute_aal = comparison_absolute_aal[['TP','FP']].transpose()
TFP_absolute_brodmann = comparison_absolute_brodmann[['TP','FP']].transpose()
added_absolute_aal = []
added_absolute_brodmann = []
for relevance_threshold in relevance_thresholds_absolute:
    column = 'Relevance threshold = '+str(relevance_threshold)
    # TP and FP added by the rules with respect to the uncorrected prediction
    delta_aal = TFP_absolute_aal[column]-TFP_absolute_aal['Prediction']
    delta_brodmann = TFP_absolute_brodmann[column]-TFP_absolute_brodmann['Prediction']
    for delta, messages in [(delta_aal,added_absolute_aal),(delta_brodmann,added_absolute_brodmann)]:
        # Label-based access: positional indexing such as delta[0] is deprecated on a labelled Series
        tp, fp = delta['TP'], delta['FP']
        messages.append('With a relevance threshold of '+str(relevance_threshold)+', '+str(tp+fp)+' regions were marked as hypometabolic: '+str(tp)+' correctly ('+str(round(tp*100/(tp+fp),2))+' %) and '+str(fp)+' incorrectly')

#### Normalised rules

In [7]:
comparison_normalised_aal = ra.compare_with_real(prediction_aal,real_aal)
comparison_normalised_brodmann = ra.compare_with_real(prediction_brodmann,real_brodmann)
index = ['Prediction']
for relevance_threshold in relevance_thresholds_normalised:
    normalised_aal = pandas.read_csv('./Prediction_100/Normalised/correction_aal_'+str(relevance_threshold)+'.csv')
    normalised_brodmann = pandas.read_csv('./Prediction_100/Normalised/correction_brodmann_'+str(relevance_threshold)+'.csv')
    comparison_normalised_aal = pandas.concat([comparison_normalised_aal,ra.compare_with_real(normalised_aal,real_aal)],axis=0)
    comparison_normalised_brodmann = pandas.concat([comparison_normalised_brodmann,ra.compare_with_real(normalised_brodmann,real_brodmann)],axis=0)
    index.append('Relevance threshold = '+str(relevance_threshold))
comparison_normalised_aal.index = index
comparison_normalised_brodmann.index = index

In [8]:
TFP_normalised_aal = comparison_normalised_aal[['TP','FP']].transpose()
TFP_normalised_brodmann = comparison_normalised_brodmann[['TP','FP']].transpose()
added_normalised_aal = []
added_normalised_brodmann = []
for relevance_threshold in relevance_thresholds_normalised:
    column = 'Relevance threshold = '+str(relevance_threshold)
    # TP and FP added by the rules with respect to the uncorrected prediction
    delta_aal = TFP_normalised_aal[column]-TFP_normalised_aal['Prediction']
    delta_brodmann = TFP_normalised_brodmann[column]-TFP_normalised_brodmann['Prediction']
    for delta, messages in [(delta_aal,added_normalised_aal),(delta_brodmann,added_normalised_brodmann)]:
        # Label-based access: positional indexing such as delta[0] is deprecated on a labelled Series
        tp, fp = delta['TP'], delta['FP']
        messages.append('With a relevance threshold of '+str(relevance_threshold)+', '+str(tp+fp)+' regions were marked as hypometabolic: '+str(tp)+' correctly ('+str(round(tp*100/(tp+fp),2))+' %) and '+str(fp)+' incorrectly')

#### Random rules

In [9]:
FN_aal = comparison_absolute_aal.loc['Prediction']['FN']
TFN_aal = comparison_absolute_aal.loc['Prediction']['FN'] + comparison_absolute_aal.loc['Prediction']['TN']
FN_brodmann = comparison_absolute_brodmann.loc['Prediction']['FN']
TFN_brodmann = comparison_absolute_brodmann.loc['Prediction']['FN'] + comparison_absolute_brodmann.loc['Prediction']['TN']
random_aal = str(round((FN_aal*100/TFN_aal),2))+' %'
random_brodmann = str(round((FN_brodmann*100/TFN_brodmann),2))+' %'

### Results

#### AAL atlas (90 regions)

##### Using a random corrector

If a program randomly changed negative labels to positive labels in our predicted dataset, what percentage of those changes would be correct?

In [10]:
print(random_aal)

5.86 %
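
The figure above is the share of false negatives among all predicted negatives, as computed in the "Random rules" cell. As a small worked check, the sketch below recomputes it from the `Prediction` rows of the metric tables in this notebook; the helper name `random_flip_precision` is introduced here only for illustration.

```python
def random_flip_precision(fn, tn):
    """Expected percentage of correct changes when predicted negatives are flipped at random."""
    return fn * 100 / (fn + tn)


# FN and TN of the uncorrected prediction, taken from the 'Prediction' rows of the metric tables
aal_baseline = round(random_flip_precision(fn=1475, tn=23704), 2)
brodmann_baseline = round(random_flip_precision(fn=609, tn=12980), 2)

print(aal_baseline, '%')       # AAL atlas
print(brodmann_baseline, '%')  # Brodmann atlas
```

Any corrector whose added labels are right more often than this percentage is doing better than chance.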


##### Using absolute rules

Below we show the **metrics associated with both the predictions and their later corrections through probabilistic rules**.

In [11]:
comparison_absolute_aal

| | TP | TN | FP | FN | accuracy | f1 | precision | recall |
|---|---|---|---|---|---|---|---|---|
| Prediction | 1618 | 23704 | 3083 | 1475 | 0.847 | 0.415 | 0.344 | 0.523 |
| Relevance threshold = 1 | 1881 | 22542 | 4245 | 1212 | 0.817 | 0.408 | 0.307 | 0.608 |
| Relevance threshold = 0.975 | 1995 | 21653 | 5134 | 1098 | 0.791 | 0.39 | 0.28 | 0.645 |
| Relevance threshold = 0.95 | 2000 | 21635 | 5152 | 1093 | 0.791 | 0.39 | 0.28 | 0.647 |
| Relevance threshold = 0.925 | 2019 | 21535 | 5252 | 1074 | 0.788 | 0.39 | 0.278 | 0.653 |
| Relevance threshold = 0.9 | 2041 | 21448 | 5339 | 1052 | 0.786 | 0.39 | 0.277 | 0.66 |
| Relevance threshold = 0.8 | 2168 | 20721 | 6066 | 925 | 0.766 | 0.383 | 0.263 | 0.701 |
| Relevance threshold = 0.7 | 2307 | 19547 | 7240 | 786 | 0.731 | 0.365 | 0.242 | 0.746 |
| Relevance threshold = 0.6 | 2440 | 18417 | 8370 | 653 | 0.698 | 0.351 | 0.226 | 0.789 |
| Relevance threshold = 0.5 | 2515 | 17477 | 9310 | 578 | 0.669 | 0.337 | 0.213 | 0.813 |
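
The implementation of `ra.compare_with_real` is not shown in this notebook, but the metric columns above are consistent with the standard confusion-matrix definitions. The sketch below, which uses a hypothetical helper named `confusion_metrics`, recomputes the `Prediction` row from its four counts.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Standard metrics from confusion-matrix counts, rounded to 3 decimals."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)  # share of predicted positives that are real
    recall = tp / (tp + fn)     # share of real positives that were found
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": round(accuracy, 3),
        "f1": round(f1, 3),
        "precision": round(precision, 3),
        "recall": round(recall, 3),
    }


# Counts of the uncorrected AAL prediction
prediction_metrics = confusion_metrics(tp=1618, tn=23704, fp=3083, fn=1475)
print(prediction_metrics)
```

Note that `tp + fn` is the number of truly hypometabolic regions, so it does not depend on the correction; only the split between TP and FN changes from row to row.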


Below we show the **effect that applying the rules has on the predicted brain map**.

In [12]:
for i in added_absolute_aal: print(i)

With a relevance threshold of 1, 1425 regions were marked as hypometabolic: 263 correctly (18.46 %) and 1162 incorrectly
With a relevance threshold of 0.975, 2428 regions were marked as hypometabolic: 377 correctly (15.53 %) and 2051 incorrectly
With a relevance threshold of 0.95, 2451 regions were marked as hypometabolic: 382 correctly (15.59 %) and 2069 incorrectly
With a relevance threshold of 0.925, 2570 regions were marked as hypometabolic: 401 correctly (15.6 %) and 2169 incorrectly
With a relevance threshold of 0.9, 2679 regions were marked as hypometabolic: 423 correctly (15.79 %) and 2256 incorrectly
With a relevance threshold of 0.8, 3533 regions were marked as hypometabolic: 550 correctly (15.57 %) and 2983 incorrectly
With a relevance threshold of 0.7, 4846 regions were marked as hypometabolic: 689 correctly (14.22 %) and 4157 incorrectly
With a relevance threshold of 0.6, 6109 regions were marked as hypometabolic: 822 correctly (13.46 %) and 5287 incorrectly
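With a relevance threshold of 0.5, 7124 regions were marked as hypometabolic: 897 correctly (12.59 %) and 6227 incorrectly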

##### Using normalised rules

Below we show the **metrics associated with both the predictions and their later corrections through probabilistic rules**.

In [13]:
comparison_normalised_aal

| | TP | TN | FP | FN | accuracy | f1 | precision | recall |
|---|---|---|---|---|---|---|---|---|
| Prediction | 1618 | 23704 | 3083 | 1475 | 0.847 | 0.415 | 0.344 | 0.523 |
| Relevance threshold = 0.9 | 1894 | 22465 | 4322 | 1199 | 0.815 | 0.407 | 0.305 | 0.612 |
| Relevance threshold = 0.8 | 1900 | 22433 | 4354 | 1193 | 0.814 | 0.407 | 0.304 | 0.614 |
| Relevance threshold = 0.7 | 1926 | 22304 | 4483 | 1167 | 0.811 | 0.405 | 0.301 | 0.623 |
| Relevance threshold = 0.6 | 1997 | 21878 | 4909 | 1096 | 0.799 | 0.399 | 0.289 | 0.646 |
| Relevance threshold = 0.5 | 2044 | 21543 | 5244 | 1049 | 0.789 | 0.394 | 0.28 | 0.661 |


Below we show the **effect that applying the rules has on the predicted brain map**.

In [14]:
for i in added_normalised_aal: print(i)

With a relevance threshold of 0.9, 1515 regions were marked as hypometabolic: 276 correctly (18.22 %) and 1239 incorrectly
With a relevance threshold of 0.8, 1553 regions were marked as hypometabolic: 282 correctly (18.16 %) and 1271 incorrectly
With a relevance threshold of 0.7, 1708 regions were marked as hypometabolic: 308 correctly (18.03 %) and 1400 incorrectly
With a relevance threshold of 0.6, 2205 regions were marked as hypometabolic: 379 correctly (17.19 %) and 1826 incorrectly
With a relevance threshold of 0.5, 2587 regions were marked as hypometabolic: 426 correctly (16.47 %) and 2161 incorrectly


#### Brodmann atlas (47 regions)

##### Using a random corrector

If a program randomly changed negative labels to positive labels in our predicted dataset, what percentage of those changes would be correct?

In [15]:
print(random_brodmann)

4.48 %


##### Using absolute rules

Below we show the **metrics associated with both the predictions and their later corrections through probabilistic rules**.

In [16]:
comparison_absolute_brodmann

| | TP | TN | FP | FN | accuracy | f1 | precision | recall |
|---|---|---|---|---|---|---|---|---|
| Prediction | 838 | 12980 | 1177 | 609 | 0.886 | 0.484 | 0.416 | 0.579 |
| Relevance threshold = 1 | 933 | 12555 | 1602 | 514 | 0.864 | 0.469 | 0.368 | 0.645 |
| Relevance threshold = 0.975 | 995 | 12282 | 1875 | 452 | 0.851 | 0.461 | 0.347 | 0.688 |
| Relevance threshold = 0.95 | 998 | 12271 | 1886 | 449 | 0.85 | 0.461 | 0.346 | 0.69 |
| Relevance threshold = 0.925 | 999 | 12260 | 1897 | 448 | 0.85 | 0.46 | 0.345 | 0.69 |
| Relevance threshold = 0.9 | 1005 | 12243 | 1914 | 442 | 0.849 | 0.46 | 0.344 | 0.695 |
| Relevance threshold = 0.8 | 1056 | 12044 | 2113 | 391 | 0.84 | 0.458 | 0.333 | 0.73 |
| Relevance threshold = 0.7 | 1117 | 11776 | 2381 | 330 | 0.826 | 0.452 | 0.319 | 0.772 |
| Relevance threshold = 0.6 | 1141 | 11545 | 2612 | 306 | 0.813 | 0.439 | 0.304 | 0.789 |
| Relevance threshold = 0.5 | 1164 | 11375 | 2782 | 283 | 0.804 | 0.432 | 0.295 | 0.804 |


Below we show the **effect that applying the rules has on the predicted brain map**.

In [17]:
for i in added_absolute_brodmann: print(i)

With a relevance threshold of 1, 520 regions were marked as hypometabolic: 95 correctly (18.27 %) and 425 incorrectly
With a relevance threshold of 0.975, 855 regions were marked as hypometabolic: 157 correctly (18.36 %) and 698 incorrectly
With a relevance threshold of 0.95, 869 regions were marked as hypometabolic: 160 correctly (18.41 %) and 709 incorrectly
With a relevance threshold of 0.925, 881 regions were marked as hypometabolic: 161 correctly (18.27 %) and 720 incorrectly
With a relevance threshold of 0.9, 904 regions were marked as hypometabolic: 167 correctly (18.47 %) and 737 incorrectly
With a relevance threshold of 0.8, 1154 regions were marked as hypometabolic: 218 correctly (18.89 %) and 936 incorrectly
With a relevance threshold of 0.7, 1483 regions were marked as hypometabolic: 279 correctly (18.81 %) and 1204 incorrectly
With a relevance threshold of 0.6, 1738 regions were marked as hypometabolic: 303 correctly (17.43 %) and 1435 incorrectly
With a relevance threshold of 0.5, 1931 regions were marked as hypometabolic: 326 correctly (16.88 %) and 1605 incorrectly

##### Using normalised rules

Below we show the **metrics associated with both the predictions and their later corrections through probabilistic rules**.

In [18]:
comparison_normalised_brodmann

| | TP | TN | FP | FN | accuracy | f1 | precision | recall |
|---|---|---|---|---|---|---|---|---|
| Prediction | 838 | 12980 | 1177 | 609 | 0.886 | 0.484 | 0.416 | 0.579 |
| Relevance threshold = 0.9 | 933 | 12552 | 1605 | 514 | 0.864 | 0.468 | 0.368 | 0.645 |
| Relevance threshold = 0.8 | 934 | 12540 | 1617 | 513 | 0.863 | 0.467 | 0.366 | 0.645 |
| Relevance threshold = 0.7 | 936 | 12504 | 1653 | 511 | 0.861 | 0.464 | 0.362 | 0.647 |
| Relevance threshold = 0.6 | 976 | 12360 | 1797 | 471 | 0.855 | 0.463 | 0.352 | 0.674 |
| Relevance threshold = 0.5 | 1019 | 12177 | 1980 | 428 | 0.846 | 0.458 | 0.34 | 0.704 |


Below we show the **effect that applying the rules has on the predicted brain map**.

In [19]:
for i in added_normalised_brodmann: print(i)

With a relevance threshold of 0.9, 523 regions were marked as hypometabolic: 95 correctly (18.16 %) and 428 incorrectly
With a relevance threshold of 0.8, 536 regions were marked as hypometabolic: 96 correctly (17.91 %) and 440 incorrectly
With a relevance threshold of 0.7, 574 regions were marked as hypometabolic: 98 correctly (17.07 %) and 476 incorrectly
With a relevance threshold of 0.6, 758 regions were marked as hypometabolic: 138 correctly (18.21 %) and 620 incorrectly
With a relevance threshold of 0.5, 984 regions were marked as hypometabolic: 181 correctly (18.39 %) and 803 incorrectly


### Conclusions

To assess the validity of our rules, it is important to know what result we should expect if we randomly changed negative labels to positive labels in our predicted datasets. As we have seen, only about 4.5-6% of such changes would be correct (5.86% for the AAL atlas and 4.48% for the Brodmann atlas). This low percentage can be explained by two facts. First, the percentage of positive labels is by itself very low in the real dataset (15%). Second, our models have already predicted part of these positive labels correctly, which means that only very few remain available for correction. Taking this into consideration, it is evident that **our rules perform better than a random generator would**: roughly 12-19% of their modifications are correct. Even so, **incorrect modifications are more common than correct modifications**, which explains why accuracy and precision always fall. Recall, on the other hand, can only improve, because the rules only turn negative labels into positive ones.
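
The comparison with the random baseline can be reproduced from the confusion counts reported in the results. The sketch below takes the AAL prediction and its correction with absolute rules at a relevance threshold of 1; the helper name `added_precision` is hypothetical.

```python
def added_precision(base_tp, base_fp, corrected_tp, corrected_fp):
    """Percentage of the labels added by the rules that turned out to be correct."""
    added_tp = corrected_tp - base_tp  # regions correctly switched to hypometabolic
    added_fp = corrected_fp - base_fp  # regions wrongly switched to hypometabolic
    return round(added_tp * 100 / (added_tp + added_fp), 2)


# AAL atlas: uncorrected prediction vs. absolute rules with relevance threshold = 1
rule_precision = added_precision(base_tp=1618, base_fp=3083,
                                 corrected_tp=1881, corrected_fp=4245)
random_baseline = 5.86  # percentage reported for the random corrector on the AAL atlas

print(rule_precision, '% of rule-based changes are correct')
print(random_baseline, '% would be correct with random changes')
```

The rule-based changes are right about three times as often as random ones, yet still wrong in most cases, which is why precision and accuracy drop while recall rises.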

**As we consider lower relevance thresholds**, we allow more rules, and therefore more modifications, to be applied. When the number of modifications is low, they tend to be more accurate; when it is high, they tend to be less accurate. The consequence is that **recall increases while accuracy and precision decrease**. **As for the two different types of rules**, normalised rules result in fewer changes being made, which makes sense because we are being stricter. However, **absolute rules usually perform better at similar magnitudes**. The explanation is that normalised rules are very useful from the clinical point of view, but a purely data-driven mechanism is expected to yield better metrics.

So, why are we struggling to correctly change non-hypometabolism to hypometabolism?

- **Rules are very simple, which brings uncertainty**: in the rule '*if A is hypometabolic, then B is hypometabolic with a probability of 0.6*' we know the rule holds in 60% of cases but we don't know which cases those are; therefore, we may apply it when it is not needed or fail to apply it when it is.
- **We are in a very difficult scenario**: many positive labels have already been predicted correctly with our models, so what is left are the difficult cases.
- **We are in a very unstable scenario**: the models predicted wrongly in some cases, so the rules are acting on uncertain prior knowledge. With neural networks, the prediction is worse than with genetic algorithms. This explains why the rules perform worse even when the scenario is easier (not as many positive labels have been predicted correctly).
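
To make the first and third points concrete, here is a minimal sketch of how a rule of the form "if A is hypometabolic, then B is hypometabolic with probability p" might be applied under a relevance threshold. It is an illustration under stated assumptions, not the `RuleApplication` code from `prtools.py`: the function `apply_rules`, the rule list and the region names are all hypothetical, and rules are assumed to fire on the uncorrected prediction only.

```python
import pandas as pd

# Hypothetical rules: (antecedent region, consequent region, probability)
rules = [
    ("region_A", "region_B", 0.95),
    ("region_A", "region_C", 0.60),
]


def apply_rules(brain_map, rules, relevance_threshold):
    """Set the consequent to 1 wherever the antecedent is predicted as 1,
    using only the rules whose probability reaches the relevance threshold."""
    corrected = brain_map.copy()
    for antecedent, consequent, probability in rules:
        if probability >= relevance_threshold:
            # Rules can only switch labels from 0 to 1, never the other way round
            corrected.loc[brain_map[antecedent] == 1, consequent] = 1
    return corrected


# Toy predicted map: 3 patients (rows), 3 regions (columns)
predicted = pd.DataFrame({
    "region_A": [1, 0, 1],
    "region_B": [0, 0, 1],
    "region_C": [0, 1, 0],
})

strict = apply_rules(predicted, rules, relevance_threshold=0.9)  # only the 0.95 rule fires
loose = apply_rules(predicted, rules, relevance_threshold=0.5)   # both rules fire
print(strict)
print(loose)
```

In this sketch the 0.6 rule changes `region_C` for every patient with a positive `region_A`, even though it is expected to hold for only about 60% of them, and any wrong prediction of `region_A` is propagated to the consequent regions.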