# Prediction with neural networks and correction with probabilistic rules

This notebook gathers the results of the predictions made with the neural network models, using a binary threshold of 30, and of their subsequent correction with probabilistic rules.

### Brief explanation

Using the best models selected previously, we predicted hypometabolism for the patients in our database. Each brain region was predicted independently, and the regional predictions were combined into a whole-brain map. Because variability was high in some regions, we performed 5 predictions per region, which yielded 5 different brain maps. Absolute and normalised probabilistic rules were applied to each of them. Finally, consensus brain maps were calculated before and after rule application. The relevance threshold, which determines how strict we are when deciding which rules to apply, is treated as a parameter ranging from 0.5 to 1.
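
How the consensus maps are computed is not described in this notebook. As a minimal sketch, assuming a simple majority vote across the five binary maps (an assumption, not necessarily what `prtools.py` does), the idea could look like the following. The function name `consensus_map` and the toy data are hypothetical.

```python
def consensus_map(maps):
    """Majority vote over binary brain maps (assumed consensus rule).

    maps: list of equally shaped 2D lists of 0/1 labels
          (rows = patients, columns = brain regions).
    """
    n_maps = len(maps)
    n_rows, n_cols = len(maps[0]), len(maps[0][0])
    return [
        [int(sum(m[i][j] for m in maps) > n_maps / 2) for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Toy data: 5 predicted maps for 2 patients and 3 hypothetical regions
maps = [
    [[1, 0, 0], [0, 1, 1]],
    [[1, 0, 1], [0, 1, 0]],
    [[1, 1, 0], [0, 1, 1]],
    [[0, 0, 0], [1, 1, 0]],
    [[1, 0, 0], [0, 1, 1]],
]
print(consensus_map(maps))  # [[1, 0, 0], [0, 1, 1]]
```

With an odd number of maps a strict majority never ties, which is one reason a vote of this kind pairs naturally with 5 predictions per region.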

### Import the packages

We import the packages that will be needed. In this case, everything is already contained in the prtools.py file.

In [1]:
import sys
sys.path.insert(0,'../../Tools')
from prtools import *
da = Datasets()
ra = RuleApplication()

In [2]:
pandas.set_option('display.max_rows', None)
pandas.set_option('display.max_columns', None)
pandas.set_option('display.width', None)
pandas.set_option('display.max_colwidth', None)

### Preparation of results

#### General setup

In [3]:
real_aal = da.define_binary(pandas.read_csv('../../Data/Y_aal_quan.csv'),30)
real_brodmann = da.define_binary(pandas.read_csv('../../Data/Y_brodmann_quan.csv'),30)
prediction_aal = pandas.read_csv('./Prediction_30/Prediction/prediction_aal.csv')
prediction_brodmann = pandas.read_csv('./Prediction_30/Prediction/prediction_brodmann.csv')

In [4]:
relevance_thresholds_absolute = [1,0.975,0.950,0.925,0.9,0.8,0.7,0.6,0.5]
relevance_thresholds_normalised = [0.9,0.8,0.7,0.6,0.5]

#### Absolute rules

In [5]:
comparison_absolute_aal = ra.compare_with_real(prediction_aal,real_aal)
comparison_absolute_brodmann = ra.compare_with_real(prediction_brodmann,real_brodmann)
index = ['Prediction']
for relevance_threshold in relevance_thresholds_absolute:
    absolute_aal = pandas.read_csv('./Prediction_30/Absolute/correction_aal_'+str(relevance_threshold)+'.csv')
    absolute_brodmann = pandas.read_csv('./Prediction_30/Absolute/correction_brodmann_'+str(relevance_threshold)+'.csv')
    comparison_absolute_aal = pandas.concat([comparison_absolute_aal,ra.compare_with_real(absolute_aal,real_aal)],axis=0)
    comparison_absolute_brodmann = pandas.concat([comparison_absolute_brodmann,ra.compare_with_real(absolute_brodmann,real_brodmann)],axis=0)
    index.append('Relevance threshold = '+str(relevance_threshold))
comparison_absolute_aal.index = index
comparison_absolute_brodmann.index = index

In [6]:
# Keep only the TP and FP counts, with one column per brain map
TFP_absolute_aal = comparison_absolute_aal[['TP','FP']].transpose()
TFP_absolute_brodmann = comparison_absolute_brodmann[['TP','FP']].transpose()
added_absolute_aal = []
added_absolute_brodmann = []
for relevance_threshold in relevance_thresholds_absolute:
    # TP and FP added by the rules with respect to the uncorrected prediction
    temp_TFP_aal = TFP_absolute_aal['Relevance threshold = '+str(relevance_threshold)]-TFP_absolute_aal['Prediction']
    temp_TFP_brodmann = TFP_absolute_brodmann['Relevance threshold = '+str(relevance_threshold)]-TFP_absolute_brodmann['Prediction']
    # Access by label: positional access such as temp_TFP_aal[0] is deprecated for labelled Series
    tp, fp = temp_TFP_aal['TP'], temp_TFP_aal['FP']
    added_absolute_aal.append(f'With a relevance threshold of {relevance_threshold}, {tp+fp} regions were marked as hypometabolic: {tp} correctly ({round(tp*100/(tp+fp),2)} %) and {fp} incorrectly')
    tp, fp = temp_TFP_brodmann['TP'], temp_TFP_brodmann['FP']
    added_absolute_brodmann.append(f'With a relevance threshold of {relevance_threshold}, {tp+fp} regions were marked as hypometabolic: {tp} correctly ({round(tp*100/(tp+fp),2)} %) and {fp} incorrectly')
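
The string-building expressions in the cell above reduce to a small calculation: subtract the TP and FP counts of the uncorrected prediction from those of the corrected map, then report which share of the added positives is correct. The following is a minimal standalone sketch of that arithmetic. The helper name `added_regions` is hypothetical, the guard against an empty set of modifications is an addition not present in the notebook, and the counts are those of the AAL atlas at a relevance threshold of 1, taken from the results tables of this notebook.

```python
def added_regions(corrected, predicted):
    """Return (added TP, added FP, % of added positives that are correct).

    corrected, predicted: dicts holding the 'TP' and 'FP' counts of the
    corrected map and of the uncorrected prediction.
    """
    added_tp = corrected['TP'] - predicted['TP']
    added_fp = corrected['FP'] - predicted['FP']
    total = added_tp + added_fp
    # Guard (not in the notebook): avoid dividing by zero when no label changed
    pct_correct = round(added_tp * 100 / total, 2) if total else float('nan')
    return added_tp, added_fp, pct_correct


predicted = {'TP': 2481, 'FP': 4088}   # 'Prediction' row, AAL atlas
corrected = {'TP': 2841, 'FP': 5770}   # 'Relevance threshold = 1' row, AAL atlas
print(added_regions(corrected, predicted))  # (360, 1682, 17.63)
```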

#### Normalised rules

In [7]:
comparison_normalised_aal = ra.compare_with_real(prediction_aal,real_aal)
comparison_normalised_brodmann = ra.compare_with_real(prediction_brodmann,real_brodmann)
index = ['Prediction']
for relevance_threshold in relevance_thresholds_normalised:
    normalised_aal = pandas.read_csv('./Prediction_30/Normalised/correction_aal_'+str(relevance_threshold)+'.csv')
    normalised_brodmann = pandas.read_csv('./Prediction_30/Normalised/correction_brodmann_'+str(relevance_threshold)+'.csv')
    comparison_normalised_aal = pandas.concat([comparison_normalised_aal,ra.compare_with_real(normalised_aal,real_aal)],axis=0)
    comparison_normalised_brodmann = pandas.concat([comparison_normalised_brodmann,ra.compare_with_real(normalised_brodmann,real_brodmann)],axis=0)
    index.append('Relevance threshold = '+str(relevance_threshold))
comparison_normalised_aal.index = index
comparison_normalised_brodmann.index = index

In [8]:
# Keep only the TP and FP counts, with one column per brain map
TFP_normalised_aal = comparison_normalised_aal[['TP','FP']].transpose()
TFP_normalised_brodmann = comparison_normalised_brodmann[['TP','FP']].transpose()
added_normalised_aal = []
added_normalised_brodmann = []
for relevance_threshold in relevance_thresholds_normalised:
    # TP and FP added by the rules with respect to the uncorrected prediction
    temp_TFP_aal = TFP_normalised_aal['Relevance threshold = '+str(relevance_threshold)]-TFP_normalised_aal['Prediction']
    temp_TFP_brodmann = TFP_normalised_brodmann['Relevance threshold = '+str(relevance_threshold)]-TFP_normalised_brodmann['Prediction']
    # Access by label: positional access such as temp_TFP_aal[0] is deprecated for labelled Series
    tp, fp = temp_TFP_aal['TP'], temp_TFP_aal['FP']
    added_normalised_aal.append(f'With a relevance threshold of {relevance_threshold}, {tp+fp} regions were marked as hypometabolic: {tp} correctly ({round(tp*100/(tp+fp),2)} %) and {fp} incorrectly')
    tp, fp = temp_TFP_brodmann['TP'], temp_TFP_brodmann['FP']
    added_normalised_brodmann.append(f'With a relevance threshold of {relevance_threshold}, {tp+fp} regions were marked as hypometabolic: {tp} correctly ({round(tp*100/(tp+fp),2)} %) and {fp} incorrectly')

#### Random rules

In [9]:
# False negatives and total negatives (FN + TN) of the uncorrected prediction
FN_aal = comparison_absolute_aal.loc['Prediction','FN']
TFN_aal = FN_aal + comparison_absolute_aal.loc['Prediction','TN']
FN_brodmann = comparison_absolute_brodmann.loc['Prediction','FN']
TFN_brodmann = FN_brodmann + comparison_absolute_brodmann.loc['Prediction','TN']
# Share of predicted negatives that are actually positive:
# the expected success rate of flipping negative labels at random
random_aal = str(round((FN_aal*100/TFN_aal),2))+' %'
random_brodmann = str(round((FN_brodmann*100/TFN_brodmann),2))+' %'
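
The random baseline computed above is the fraction of predicted negatives that are actually positive, FN / (FN + TN): a corrector that flips predicted negatives at random can only be right when it happens to hit a false negative. The sketch below reproduces the calculation outside pandas. The function name `random_correction_rate` is hypothetical, and the counts are the FN and TN values of the `Prediction` rows shown in the results tables of this notebook.

```python
def random_correction_rate(fn, tn):
    """Expected percentage of correct changes when predicted negatives
    are flipped to positive at random."""
    return round(fn * 100 / (fn + tn), 2)


# FN and TN of the uncorrected prediction
print(random_correction_rate(2059, 21252))  # AAL atlas: 8.83
print(random_correction_rate(941, 11588))   # Brodmann atlas: 7.51
```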

### Results

#### AAL atlas (90 regions)

##### Using random corrector

What percentage of values would be labelled correctly by a program that randomly changes negative labels to positive labels in our predicted dataset?

In [10]:
print(random_aal)

8.83 %


##### Using absolute rules

Below we show the **metrics associated with both the predictions and the later corrections through probabilistic rules**.

In [11]:
comparison_absolute_aal

| | TP | TN | FP | FN | accuracy | f1 | precision | recall |
|---|---|---|---|---|---|---|---|---|
| Prediction | 2481 | 21252 | 4088 | 2059 | 0.794 | 0.447 | 0.378 | 0.546 |
| Relevance threshold = 1 | 2841 | 19570 | 5770 | 1699 | 0.75 | 0.432 | 0.33 | 0.626 |
| Relevance threshold = 0.975 | 2937 | 18946 | 6394 | 1603 | 0.732 | 0.423 | 0.315 | 0.647 |
| Relevance threshold = 0.95 | 2937 | 18946 | 6394 | 1603 | 0.732 | 0.423 | 0.315 | 0.647 |
| Relevance threshold = 0.925 | 2972 | 18793 | 6547 | 1568 | 0.728 | 0.423 | 0.312 | 0.655 |
| Relevance threshold = 0.9 | 3006 | 18699 | 6641 | 1534 | 0.726 | 0.424 | 0.312 | 0.662 |
| Relevance threshold = 0.8 | 3206 | 17889 | 7451 | 1334 | 0.706 | 0.422 | 0.301 | 0.706 |
| Relevance threshold = 0.7 | 3479 | 16531 | 8809 | 1061 | 0.67 | 0.413 | 0.283 | 0.766 |
| Relevance threshold = 0.6 | 3722 | 14935 | 10405 | 818 | 0.624 | 0.399 | 0.263 | 0.82 |
| Relevance threshold = 0.5 | 3865 | 13783 | 11557 | 675 | 0.591 | 0.387 | 0.251 | 0.851 |
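
The four metric columns follow from the four counts. As a sketch, assuming `ra.compare_with_real` uses the standard definitions of accuracy, precision, recall and F1 (its implementation is not shown in this notebook), the `Prediction` row can be reproduced as follows. The function name `metrics_from_counts` is hypothetical.

```python
def metrics_from_counts(tp, tn, fp, fn):
    """Standard binary classification metrics, rounded to 3 decimals."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)   # share of predicted positives that are real
    recall = tp / (tp + fn)      # share of real positives that were found
    f1 = 2 * precision * recall / (precision + recall)
    return {
        'accuracy': round(accuracy, 3),
        'f1': round(f1, 3),
        'precision': round(precision, 3),
        'recall': round(recall, 3),
    }


# 'Prediction' row of the AAL table
print(metrics_from_counts(tp=2481, tn=21252, fp=4088, fn=2059))
# {'accuracy': 0.794, 'f1': 0.447, 'precision': 0.378, 'recall': 0.546}
```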


Below we show the **effect that the application of rules has on the predicted brain map**.

In [12]:
for i in added_absolute_aal: print(i)

With a relevance threshold of 1, 2042 regions were marked as hypometabolic: 360 correctly (17.63 %) and 1682 incorrectly
With a relevance threshold of 0.975, 2762 regions were marked as hypometabolic: 456 correctly (16.51 %) and 2306 incorrectly
With a relevance threshold of 0.95, 2762 regions were marked as hypometabolic: 456 correctly (16.51 %) and 2306 incorrectly
With a relevance threshold of 0.925, 2950 regions were marked as hypometabolic: 491 correctly (16.64 %) and 2459 incorrectly
With a relevance threshold of 0.9, 3078 regions were marked as hypometabolic: 525 correctly (17.06 %) and 2553 incorrectly
With a relevance threshold of 0.8, 4088 regions were marked as hypometabolic: 725 correctly (17.73 %) and 3363 incorrectly
With a relevance threshold of 0.7, 5719 regions were marked as hypometabolic: 998 correctly (17.45 %) and 4721 incorrectly
With a relevance threshold of 0.6, 7558 regions were marked as hypometabolic: 1241 correctly (16.42 %) and 6317 incorrectly
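With a relevance threshold of 0.5, 8853 regions were marked as hypometabolic: 1384 correctly (15.63 %) and 7469 incorrectly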

##### Using normalised rules

Below we show the **metrics associated with both the predictions and the later corrections through probabilistic rules**.

In [13]:
comparison_normalised_aal

| | TP | TN | FP | FN | accuracy | f1 | precision | recall |
|---|---|---|---|---|---|---|---|---|
| Prediction | 2481 | 21252 | 4088 | 2059 | 0.794 | 0.447 | 0.378 | 0.546 |
| Relevance threshold = 0.9 | 2853 | 19510 | 5830 | 1687 | 0.748 | 0.432 | 0.329 | 0.628 |
| Relevance threshold = 0.8 | 2855 | 19499 | 5841 | 1685 | 0.748 | 0.431 | 0.328 | 0.629 |
| Relevance threshold = 0.7 | 2887 | 19410 | 5930 | 1653 | 0.746 | 0.432 | 0.327 | 0.636 |
| Relevance threshold = 0.6 | 3003 | 18985 | 6355 | 1537 | 0.736 | 0.432 | 0.321 | 0.661 |
| Relevance threshold = 0.5 | 3092 | 18675 | 6665 | 1448 | 0.728 | 0.433 | 0.317 | 0.681 |


Below we show the **effect that the application of rules has on the predicted brain map**.

In [14]:
for i in added_normalised_aal: print(i)

With a relevance threshold of 0.9, 2114 regions were marked as hypometabolic: 372 correctly (17.6 %) and 1742 incorrectly
With a relevance threshold of 0.8, 2127 regions were marked as hypometabolic: 374 correctly (17.58 %) and 1753 incorrectly
With a relevance threshold of 0.7, 2248 regions were marked as hypometabolic: 406 correctly (18.06 %) and 1842 incorrectly
With a relevance threshold of 0.6, 2789 regions were marked as hypometabolic: 522 correctly (18.72 %) and 2267 incorrectly
With a relevance threshold of 0.5, 3188 regions were marked as hypometabolic: 611 correctly (19.17 %) and 2577 incorrectly


#### Brodmann atlas (47 regions)

##### Using random corrector

What percentage of values would be labelled correctly by a program that randomly changes negative labels to positive labels in our predicted dataset?

In [15]:
print(random_brodmann)

7.51 %


##### Using absolute rules

Below we show the **metrics associated with both the predictions and the later corrections through probabilistic rules**.

In [16]:
comparison_absolute_brodmann

| | TP | TN | FP | FN | accuracy | f1 | precision | recall |
|---|---|---|---|---|---|---|---|---|
| Prediction | 1347 | 11588 | 1728 | 941 | 0.829 | 0.502 | 0.438 | 0.589 |
| Relevance threshold = 1 | 1510 | 11019 | 2297 | 778 | 0.803 | 0.495 | 0.397 | 0.66 |
| Relevance threshold = 0.975 | 1629 | 10640 | 2676 | 659 | 0.786 | 0.494 | 0.378 | 0.712 |
| Relevance threshold = 0.95 | 1629 | 10637 | 2679 | 659 | 0.786 | 0.494 | 0.378 | 0.712 |
| Relevance threshold = 0.925 | 1642 | 10576 | 2740 | 646 | 0.783 | 0.492 | 0.375 | 0.718 |
| Relevance threshold = 0.9 | 1654 | 10538 | 2778 | 634 | 0.781 | 0.492 | 0.373 | 0.723 |
| Relevance threshold = 0.8 | 1731 | 10243 | 3073 | 557 | 0.767 | 0.488 | 0.36 | 0.757 |
| Relevance threshold = 0.7 | 1797 | 9930 | 3386 | 491 | 0.752 | 0.481 | 0.347 | 0.785 |
| Relevance threshold = 0.6 | 1845 | 9640 | 3676 | 443 | 0.736 | 0.473 | 0.334 | 0.806 |
| Relevance threshold = 0.5 | 1868 | 9461 | 3855 | 420 | 0.726 | 0.466 | 0.326 | 0.816 |


Below we show the **effect that the application of rules has on the predicted brain map**.

In [17]:
for i in added_absolute_brodmann: print(i)

With a relevance threshold of 1, 732 regions were marked as hypometabolic: 163 correctly (22.27 %) and 569 incorrectly
With a relevance threshold of 0.975, 1230 regions were marked as hypometabolic: 282 correctly (22.93 %) and 948 incorrectly
With a relevance threshold of 0.95, 1233 regions were marked as hypometabolic: 282 correctly (22.87 %) and 951 incorrectly
With a relevance threshold of 0.925, 1307 regions were marked as hypometabolic: 295 correctly (22.57 %) and 1012 incorrectly
With a relevance threshold of 0.9, 1357 regions were marked as hypometabolic: 307 correctly (22.62 %) and 1050 incorrectly
With a relevance threshold of 0.8, 1729 regions were marked as hypometabolic: 384 correctly (22.21 %) and 1345 incorrectly
With a relevance threshold of 0.7, 2108 regions were marked as hypometabolic: 450 correctly (21.35 %) and 1658 incorrectly
With a relevance threshold of 0.6, 2446 regions were marked as hypometabolic: 498 correctly (20.36 %) and 1948 incorrectly
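With a relevance threshold of 0.5, 2648 regions were marked as hypometabolic: 521 correctly (19.68 %) and 2127 incorrectly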

##### Using normalised rules

Below we show the **metrics associated with both the predictions and the later corrections through probabilistic rules**.

In [18]:
comparison_normalised_brodmann

| | TP | TN | FP | FN | accuracy | f1 | precision | recall |
|---|---|---|---|---|---|---|---|---|
| Prediction | 1347 | 11588 | 1728 | 941 | 0.829 | 0.502 | 0.438 | 0.589 |
| Relevance threshold = 0.9 | 1510 | 11016 | 2300 | 778 | 0.803 | 0.495 | 0.396 | 0.66 |
| Relevance threshold = 0.8 | 1515 | 10970 | 2346 | 773 | 0.8 | 0.493 | 0.392 | 0.662 |
| Relevance threshold = 0.7 | 1570 | 10831 | 2485 | 718 | 0.795 | 0.495 | 0.387 | 0.686 |
| Relevance threshold = 0.6 | 1614 | 10707 | 2609 | 674 | 0.79 | 0.496 | 0.382 | 0.705 |
| Relevance threshold = 0.5 | 1645 | 10569 | 2747 | 643 | 0.783 | 0.493 | 0.375 | 0.719 |


Below we show the **effect that the application of rules has on the predicted brain map**.

In [19]:
for i in added_normalised_brodmann: print(i)

With a relevance threshold of 0.9, 735 regions were marked as hypometabolic: 163 correctly (22.18 %) and 572 incorrectly
With a relevance threshold of 0.8, 786 regions were marked as hypometabolic: 168 correctly (21.37 %) and 618 incorrectly
With a relevance threshold of 0.7, 980 regions were marked as hypometabolic: 223 correctly (22.76 %) and 757 incorrectly
With a relevance threshold of 0.6, 1148 regions were marked as hypometabolic: 267 correctly (23.26 %) and 881 incorrectly
With a relevance threshold of 0.5, 1317 regions were marked as hypometabolic: 298 correctly (22.63 %) and 1019 incorrectly


### Conclusions

To measure the validity of our rules, it is important to know what result we should expect if negative labels in our predicted datasets were changed to positive labels at random. As we have seen, only about 7.5-9% of such changes would be expected to be correct (8.83% for the AAL atlas and 7.51% for the Brodmann atlas). This low percentage can be explained by two facts. First, the percentage of positive labels is itself very low in the real dataset (15%). Second, our models have already correctly predicted part of these positive labels, which means that only very few remain available for correction. Taking this into consideration, it is evident that **our rules perform better than a random generator would**. Even so, **incorrect modifications are more common than correct modifications**, which explains why accuracy and precision always fall. Recall, on the other hand, can only improve, because the rules only change negative labels to positive ones.
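
The reasoning above can be checked with simple bookkeeping: every correct modification moves one region from FN to TP, and every incorrect one moves a region from TN to FP. Accuracy therefore changes by (added TP - added FP) / N, while recall changes by added TP / (TP + FN), which cannot be negative. The sketch below applies this to the AAL atlas with absolute rules at a relevance threshold of 1, using counts from the results tables. The function name `apply_modifications` is hypothetical.

```python
def apply_modifications(tp, tn, fp, fn, added_tp, added_fp):
    """Update a confusion matrix after negative-to-positive label changes."""
    # Correct changes: FN -> TP. Incorrect changes: TN -> FP.
    return tp + added_tp, tn - added_fp, fp + added_fp, fn - added_tp


base = (2481, 21252, 4088, 2059)   # TP, TN, FP, FN of the uncorrected prediction
added_tp, added_fp = 360, 1682     # modifications at relevance threshold 1

print(apply_modifications(*base, added_tp, added_fp))
# (2841, 19570, 5770, 1699), the 'Relevance threshold = 1' row

total = sum(base)
accuracy_change = round((added_tp - added_fp) / total, 3)
recall_change = round(added_tp / (base[0] + base[3]), 3)
print(accuracy_change, recall_change)  # -0.044 0.079
```

These changes match the table: accuracy drops from 0.794 to 0.75 and recall rises from 0.546 to 0.626.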

**As we consider lower relevance thresholds**, we allow more rules, and therefore more modifications, to be applied. When the number of modifications is low, they tend to be more accurate; when it is high, they tend to be less accurate. The consequence is that **recall increases while accuracy and precision decrease**. **As for the two types of rules**, normalised rules result in fewer changes being made, which makes sense because they are stricter. However, **absolute rules usually perform better at similar magnitudes**. The explanation is that normalised rules are very useful from the clinical point of view, but a purely data-driven mechanism is expected to yield better metrics.

So, why do we struggle to correctly change non-hypometabolism labels to hypometabolism?

- **Rules are very simple, which brings uncertainty**: in the rule '*if A is hypometabolic, then B is hypometabolic with a probability of 0.6*' we know the rule can be applied in 60% of cases but we don't know which cases they are; therefore, we may apply it when not needed or not apply it when needed.
- **We are in a very difficult scenario**: many positive labels have already been predicted correctly by our models, so what is left are the difficult cases.
- **We are in a very unstable scenario**: the models predicted wrongly in some cases, so the rules act on uncertain prior knowledge. With neural networks, the prediction is worse than with the genetic algorithm. This explains why the rules perform worse even though the scenario is easier (not as many positive labels have been predicted correctly).
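
To make the first point concrete, the following is a minimal, hypothetical sketch of how a rule of the form "if A is hypometabolic, then B is hypometabolic with probability p" might be applied under a relevance threshold. It assumes that the relevance of a rule is simply its probability, that a rule fires when this value is at least the threshold, and that rules are evaluated against the original prediction only. None of this is confirmed by the notebook; the actual logic lives in `RuleApplication` in `prtools.py`. The function name `apply_rules` and the regions A, B and C are illustrative.

```python
def apply_rules(labels, rules, relevance_threshold):
    """Return a corrected copy of `labels` (region -> 0/1).

    rules: list of (antecedent, consequent, probability) tuples.
    Rules only turn negative labels into positive ones, never the reverse.
    """
    corrected = dict(labels)
    for antecedent, consequent, probability in rules:
        # Assumption: a rule is used only if it is relevant enough and its
        # antecedent is positive in the original (uncorrected) prediction
        if probability >= relevance_threshold and labels[antecedent] == 1:
            corrected[consequent] = 1
    return corrected


predicted = {'A': 1, 'B': 0, 'C': 0}
rules = [('A', 'B', 0.6), ('A', 'C', 0.95)]

print(apply_rules(predicted, rules, 0.9))  # {'A': 1, 'B': 0, 'C': 1}
print(apply_rules(predicted, rules, 0.5))  # {'A': 1, 'B': 1, 'C': 1}
```

In this sketch, lowering the threshold from 0.9 to 0.5 also marks B as hypometabolic, although the rule behind it holds in only 60% of cases. The rule cannot tell which patients those are, which is the uncertainty described in the first point.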