# Grouping measures
In this notebook, I take a first look at the adaptation measures and group them for further analysis.

## 0. Data preparation

### 0.1 Import packages

In [1]:
import pandas as pd
import numpy as np

### 0.2 Import data

In [2]:
#Reading the transformed_names file
survey_data = pd.read_csv('../datafiles/transformed_names.csv')

## 1. Measure Insights

As seen in the Step0Exploratory notebook, the measure SM8 contains empty strings because it is only relevant for respondents from the coast. Before `value_counts` can count how many respondents have taken the measure, the column needs to be converted to numeric.

In [3]:
survey_data['R2_implementation_SM8'] = pd.to_numeric(survey_data['R2_implementation_SM8'], errors='coerce')
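
To illustrate the coercion, here is a minimal sketch on a toy column (the values are made up, not taken from the survey). Entries that cannot be parsed as numbers, such as empty strings, become NaN, and `value_counts` then leaves them out by default.

```python
import pandas as pd

# Toy stand-in for a column such as R2_implementation_SM8 (values are illustrative only)
toy = pd.Series(['1', '', '4', '', '2'])

# errors='coerce' turns anything that cannot be parsed as a number into NaN
toy_numeric = pd.to_numeric(toy, errors='coerce')

# value_counts drops NaN by default, so only the valid answers are counted
counts = toy_numeric.value_counts()
print(toy_numeric.isna().sum())
print(counts.sum())
```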

For the visualisation of the tables in my thesis, I have divided the measures into structural and non-structural measures.

In [4]:
selected_columns_structural=['R2_implementation_SM1', 'R2_implementation_SM2',  
          'R2_implementation_SM3' , 'R2_implementation_SM4', 'R2_implementation_SM5','R2_implementation_SM6','R2_implementation_SM7','R2_implementation_SM8']

selected_columns_non_structural=['R2_implementation_NM1', 'R2_implementation_NM2', 'R2_implementation_NM3','R2_implementation_NM4','R2_implementation_NM5',
         'R2_implementation_NM6','R2_implementation_NM7','R2_implementation_NM8','R2_implementation_NM9','R2_implementation_NM10',
         'R2_implementation_NM11']

all_measures=selected_columns_structural + selected_columns_non_structural

Before grouping, I invert the measure scores so that they are easier to interpret: higher values now mean more action. Thus, the value of 1 (implemented the measure) becomes 4.

In [5]:
#Inverting the answers to the measures
for measure in all_measures:
    survey_data[measure]= 5 - survey_data[measure]
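
As a quick check of the inversion, the sketch below applies the same `5 - x` step to a toy series on the 1 to 4 scale (the values are illustrative, not survey data).

```python
import numpy as np
import pandas as pd

# Toy answers on the original 1-4 scale, with one missing value (illustrative only)
answers = pd.Series([1, 2, 3, 4, np.nan])

# Subtracting from 5 reverses the scale: 1 -> 4, 2 -> 3, 3 -> 2, 4 -> 1
inverted = 5 - answers

# Missing values stay missing, so respondents without an answer are unaffected
print(inverted.tolist())
```

Because the operation is its own inverse, running the inversion cell a second time would restore the original coding, so it should be run exactly once.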

Now I can create tables based on the value counts. 

In [6]:
#Creating a value counts dictionary
value_counts_dict = {}
#Value counting for each measure
for col in survey_data[selected_columns_structural].columns:
    counts = survey_data[col].value_counts()
    value_counts_dict[col] = counts

# Combining into a single dataframe and aligning on the index
result = pd.concat(value_counts_dict, axis=1)

# Replacing NaNs with 0 and setting the type to integer
result = result.fillna(0).astype(int)

# Sorting index for better table view
result = result.sort_index()

#Removing the entire measure name from the columns
result.columns = [col[-3:] for col in result.columns]

#Printing for Latex result
print(result.to_latex(index=True, float_format=lambda x: f"{x:.2f}".rstrip('0').rstrip('.')))

result

\begin{tabular}{lrrrrrrrr}
\toprule
 & SM1 & SM2 & SM3 & SM4 & SM5 & SM6 & SM7 & SM8 \\
\midrule
1.000000 & 677 & 679 & 679 & 639 & 654 & 671 & 679 & 245 \\
2.000000 & 23 & 25 & 26 & 38 & 32 & 37 & 27 & 4 \\
3.000000 & 25 & 23 & 22 & 22 & 39 & 21 & 26 & 1 \\
4.000000 & 18 & 16 & 16 & 44 & 18 & 14 & 11 & 6 \\
\bottomrule
\end{tabular}



Unnamed: 0,SM1,SM2,SM3,SM4,SM5,SM6,SM7,SM8
1.0,677,679,679,639,654,671,679,245
2.0,23,25,26,38,32,37,27,4
3.0,25,23,22,22,39,21,26,1
4.0,18,16,16,44,18,14,11,6
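
The loop-and-concat pattern above can also be written more compactly. The sketch below is one possible alternative, shown on a toy frame with two hypothetical measure columns. It is not the code that produced the table above.

```python
import pandas as pd

# Toy frame with two hypothetical measure columns (values are illustrative only)
toy = pd.DataFrame({
    'R2_implementation_SM1': [1, 1, 2, 4],
    'R2_implementation_SM2': [1, 3, 3, 3],
})

# Apply value_counts column by column; answers absent from a column come back as NaN
table = toy.apply(pd.Series.value_counts)

# Same clean-up as in the cell above: fill gaps with 0, cast to int, sort by answer value
table = table.fillna(0).astype(int).sort_index()
print(table)
```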


I have done the same for the non-structural measures.

In [7]:
#Same thing for non-structural measures
value_counts_dict_non = {}
for col in survey_data[selected_columns_non_structural].columns:
    counts_non = survey_data[col].value_counts()
    value_counts_dict_non[col] = counts_non

result_non = pd.concat(value_counts_dict_non, axis=1)

result_non = result_non.fillna(0).astype(int)

result_non = result_non.sort_index()

#Keeping only the measure code after the last underscore, so that NM10 and NM11 are not cut to M10 and M11
result_non.columns = [col.rsplit('_', 1)[-1] for col in result_non.columns]

print(result_non.to_latex(index=True, float_format=lambda x: f"{x:.2f}".rstrip('0').rstrip('.')))

result_non

\begin{tabular}{lrrrrrrrrrrr}
\toprule
 & NM1 & NM2 & NM3 & NM4 & NM5 & NM6 & NM7 & NM8 & NM9 & NM10 & NM11 \\
\midrule
1 & 182 & 605 & 622 & 459 & 427 & 584 & 283 & 505 & 538 & 385 & 357 \\
2 & 58 & 64 & 68 & 89 & 101 & 60 & 68 & 83 & 75 & 78 & 61 \\
3 & 122 & 48 & 27 & 101 & 128 & 61 & 125 & 116 & 92 & 160 & 113 \\
4 & 381 & 26 & 26 & 94 & 87 & 38 & 267 & 39 & 38 & 120 & 212 \\
\bottomrule
\end{tabular}



Unnamed: 0,NM1,NM2,NM3,NM4,NM5,NM6,NM7,NM8,NM9,NM10,NM11
1,182,605,622,459,427,584,283,505,538,385,357
2,58,64,68,89,101,60,68,83,75,78,61
3,122,48,27,101,128,61,125,116,92,160,113
4,381,26,26,94,87,38,267,39,38,120,212
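
The column labels are the part of each name after the last underscore. The small sketch below, using hypothetical column names in the same pattern, shows why splitting on that underscore is more robust than a fixed three-character slice once the codes reach two digits.

```python
# Hypothetical column names following the R2_implementation_<code> pattern
columns = ['R2_implementation_NM9', 'R2_implementation_NM10', 'R2_implementation_NM11']

# A fixed three-character slice cuts the leading letter off two-digit codes
sliced = [col[-3:] for col in columns]

# Splitting on the last underscore keeps the full code whatever its length
split = [col.rsplit('_', 1)[-1] for col in columns]

print(sliced)
print(split)
```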


## 2. Grouping of the measures

The measures are grouped according to the following categories:

**Non-structural measures**:
- Informative: NM4, NM5, NM8, NM9
- Preventive low effort: NM1, NM7, NM10, NM11
- Preventive high effort: NM2, NM6, NM3

**Structural measures**:
- Elevation: SM1
- Wet-proofing: SM2, SM3, SM4
- Dry-proofing: SM5, SM6, SM7
- Barrier: SM8

In [8]:
# Subsetting the different columns according to the group measures: non-structural
informative=['R2_implementation_NM4', 'R2_implementation_NM5','R2_implementation_NM8','R2_implementation_NM9']
preventive_low=['R2_implementation_NM1', 'R2_implementation_NM7','R2_implementation_NM10',
         'R2_implementation_NM11']
preventive_high=['R2_implementation_NM2','R2_implementation_NM6', 'R2_implementation_NM3']

In [9]:
# Subsetting the different columns according to the group measures: structural
elevation=['R2_implementation_SM1']
wet_proofing=[ 'R2_implementation_SM2', 'R2_implementation_SM3', 'R2_implementation_SM4']
dry_proofing=['R2_implementation_SM5','R2_implementation_SM6', 'R2_implementation_SM7']
barrier=['R2_implementation_SM8']

In [10]:
#Taking the row-wise maximum of the inverted columns in each group, based on the assumption that if respondents
#take one of the measures in a group, they would take the others as well.
survey_data['Informative non-structural']= (survey_data[informative].max(axis=1)).astype(int)
survey_data['Preventive-low non-structural']= (survey_data[preventive_low].max(axis=1)).astype(int)
survey_data['Preventive-high non-structural']= (survey_data[preventive_high].max(axis=1)).astype(int)
survey_data['Elevation structural']= (survey_data[elevation].max(axis=1)).astype(int)
survey_data['Wet-proofing structural']= (survey_data[wet_proofing].max(axis=1)).astype(int)
survey_data['Dry-proofing structural']= (survey_data[dry_proofing].max(axis=1)).astype(int)
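
To make the aggregation concrete, the sketch below applies the same row-wise maximum to a toy frame with three hypothetical respondents (the scores are illustrative only).

```python
import pandas as pd

# Toy inverted scores for one group of three measures (4 = implemented)
toy = pd.DataFrame({
    'R2_implementation_SM2': [1, 4, 2],
    'R2_implementation_SM3': [1, 1, 3],
    'R2_implementation_SM4': [2, 1, 1],
})
group = ['R2_implementation_SM2', 'R2_implementation_SM3', 'R2_implementation_SM4']

# Row-wise maximum: each respondent gets their highest score within the group
toy['Wet-proofing structural'] = toy[group].max(axis=1).astype(int)
print(toy['Wet-proofing structural'].tolist())
```

Note that `max(axis=1)` skips NaN by default, but a row that is NaN in every column of the group would stay NaN, and `astype(int)` would then raise an error.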

## 3. Separating Measures into Done and Plan
I also want to separate the aforementioned groups into having taken the measure (value 4) and planning to take the measure in the near future (value 3).

In [11]:
#Creating dichotomous variables for people that have taken the measure (1) or not (0)
survey_data['Done informative'] = (survey_data['Informative non-structural'] == 4).astype(int)
survey_data['Done preventive low'] = (survey_data['Preventive-low non-structural'] == 4).astype(int)
survey_data['Done preventive high'] = (survey_data['Preventive-high non-structural'] == 4).astype(int)
survey_data['Done elevation'] = (survey_data['Elevation structural'] == 4).astype(int)
survey_data['Done wet-proofing'] = (survey_data['Wet-proofing structural'] == 4).astype(int)
survey_data['Done dry-proofing'] = (survey_data['Dry-proofing structural'] == 4).astype(int)

In [12]:
#Creating dichotomous variables for people that are planning to take the measure (1) or not (0)
survey_data['Plan informative'] = (survey_data['Informative non-structural'] == 3).astype(int)
survey_data['Plan preventive low'] = (survey_data['Preventive-low non-structural'] == 3).astype(int)
survey_data['Plan preventive high'] = (survey_data['Preventive-high non-structural'] == 3).astype(int)
survey_data['Plan elevation'] = (survey_data['Elevation structural'] == 3).astype(int)
survey_data['Plan wet-proofing'] = (survey_data['Wet-proofing structural'] == 3).astype(int)
survey_data['Plan dry-proofing'] = (survey_data['Dry-proofing structural'] == 3).astype(int)
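
The dichotomisation can be checked on a toy column. In this sketch (values illustrative only), the comparison yields booleans and `astype(int)` maps them to 1 and 0.

```python
import pandas as pd

# Toy group scores on the inverted scale (illustrative only)
toy = pd.DataFrame({'Elevation structural': [4, 3, 1, 4, 2]})

# Comparing to a value yields booleans; astype(int) maps True/False to 1/0
toy['Done elevation'] = (toy['Elevation structural'] == 4).astype(int)
toy['Plan elevation'] = (toy['Elevation structural'] == 3).astype(int)

print(toy)
```

Within one group, a respondent can be flagged as Done or as Plan but never both, since the group score takes a single value.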

Moreover, for the clustering I do an additional grouping: separating the done and plan measures into structural and non-structural. The assumption is that if a respondent takes one measure, they would take the other measures in the same group as well.

In [13]:
#New groups of measures
meas_done_non=['Done informative', 'Done preventive low', 'Done preventive high']
meas_done_struc=['Done elevation', 'Done wet-proofing',  'Done dry-proofing']
meas_plan_non=['Plan informative', 'Plan preventive low', 'Plan preventive high' ]
meas_plan_struc=['Plan elevation', 'Plan wet-proofing', 'Plan dry-proofing'  ]

In [14]:
#Aggregating the groups
survey_data['Any measure done non-structural']=survey_data[meas_done_non].eq(1).any(axis=1)
survey_data['Any measure done structural']=survey_data[meas_done_struc].eq(1).any(axis=1)
survey_data['Any measure plan non-structural']=survey_data[meas_plan_non].eq(1).any(axis=1)
survey_data['Any measure plan structural']=survey_data[meas_plan_struc].eq(1).any(axis=1)
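
As a small illustration of the aggregation, the sketch below contrasts `any` with `all` on toy dichotomous columns for three hypothetical respondents.

```python
import pandas as pd

# Toy dichotomous columns for three hypothetical respondents
toy = pd.DataFrame({
    'Done elevation': [0, 1, 0],
    'Done wet-proofing': [0, 1, 0],
    'Done dry-proofing': [0, 0, 1],
})

# any(axis=1) is True when at least one column in the row equals 1
any_done = toy.eq(1).any(axis=1)

# all(axis=1) would instead require every column in the row to equal 1
all_done = toy.eq(1).all(axis=1)

print(any_done.tolist(), all_done.tolist())
```

The resulting columns are boolean rather than 0/1. Appending `.astype(int)` would give them the same coding as the Done and Plan variables, if that is preferred for the clustering.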

I have created four more columns with a separate grouping for the clustering analysis. Now I can write this dataframe to a csv file for further use.

In [15]:
#Writing the new survey data into a new csv file for further analysis
survey_data.to_csv('../datafiles/grouped_survey.csv', index=False)

## 4. Initial insights into the new grouped measures

I want to check how many respondents have taken or plan to take measures. For that, I have grouped the variables into "Done" variables and "Plan" variables.

In [16]:
#Defining measures
meas_done=['Done informative', 'Done preventive low', 'Done preventive high',  'Done elevation', 'Done wet-proofing',  'Done dry-proofing']
meas_plan=['Plan informative', 'Plan preventive low', 'Plan preventive high', 'Plan elevation', 'Plan wet-proofing', 'Plan dry-proofing'  ]

In [17]:
#Finding how many respondents have implemented all or any kind of measure
meas_done_total= survey_data[meas_done].eq(1).any(axis=1).sum()
all_done_total=survey_data[meas_done].eq(1).all(axis=1).sum()
print('Any kind of done measure: ', meas_done_total)
print('All of done measure: ', all_done_total)

Any kind of done measure:  494
All of done measure:  3


In [18]:
#Finding how many respondents intend to implement all or any kind of measure in the near future
meas_plan_total= survey_data[meas_plan].eq(1).any(axis=1).sum()
all_plan_total= survey_data[meas_plan].eq(1).all(axis=1).sum()
print('Any kind of planned measure: ', meas_plan_total)
#The recorded run printed all_done_total (3) on this line by mistake, so that value is not repeated in the output
print('All of planned measure: ', all_plan_total)

Any kind of planned measure:  260


In [19]:
#Finding how many respondents have implemented per group measure
for col in meas_done: 
    sum_meas_done= survey_data[col].sum()
    print(sum_meas_done)

157
462
69
18
54
32


In [20]:
#Finding how many respondents intend to implement per group measure in the future
for col in meas_plan: 
    sum_meas_plan= survey_data[col].sum()
    print(sum_meas_plan)

138
110
84
25
32
38
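
The bare numbers above follow the order of the column lists. One possible way to keep the labels attached is to sum the whole selection at once, sketched here on toy columns (illustrative values, not the survey results).

```python
import pandas as pd

# Toy dichotomous columns (illustrative only)
toy = pd.DataFrame({
    'Plan informative': [1, 0, 1],
    'Plan elevation': [0, 0, 1],
})

# Summing 0/1 columns counts the respondents with a 1; the result is indexed by column name
totals = toy.sum()
print(totals)
```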
