# DATA ANALYSIS. SOCIODEMOGRAPHIC STATISTICS

In this notebook we compute the headline counts (number of participants, number of votes, etc.), the participation in each place, and the socio-demographic statistics.


The **votes_ID.csv** dataset contains the sociodemographic data and the thermal survey answers for each participant (unique ID), so each row is a unique participant. The columns are the sociodemographic data (gender, age, neighbourhood knowledge and time spent in public space) and the answers to each of the surveys in which they participated (as well as the codes of the corresponding spaces).

Note that in some places we collected data on two different days with the same participants (the first day was a "test"). However, each pack of survey tickets has its own unique ID, so some participants may be the same person with a different ID on each of the two days.


The **all_votes_processed.csv** dataset contains the data for each "vote". A "ticket" (vote) is the paper that contains the questions of the survey for a specific place. Each ticket has 5 questions about:

1. Thermal comfort while walking 
2. Thermal comfort
3. Thermal sensation
4. Wind
5. Sun

Therefore, we use TSV and TCV for "thermal sensation vote" and "thermal comfort vote", each of which is a single vote by one participant at a specific place. The tickets for vote 1 (at the starting point) do not contain the "Thermal comfort while walking" question (wTCV), since they are filled in before the walk starts; therefore there are fewer wTCV than TSV/TCV.
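To make the difference in counts concrete, the sketch below uses a small made-up table. The `ID`, `ticket_number` and `thermal_sensation` columns and all values are hypothetical; only `thermal_confort_walking` is a column name taken from this notebook. Tickets filled in at the starting point have no walking answer, so `count()`, which skips missing values, returns fewer wTCV than tickets.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two participants, three tickets each.
# Ticket 1 (starting point) has no walking question, so wTCV is missing there.
toy_votes = pd.DataFrame({
    "ID": [1, 1, 1, 2, 2, 2],                     # hypothetical column
    "ticket_number": [1, 2, 3, 1, 2, 3],          # hypothetical column
    "thermal_confort_walking": [np.nan, 4, 3, np.nan, 5, 2],
    "thermal_sensation": [3, 4, 4, 2, 3, 3],      # hypothetical column
})

n_tickets = len(toy_votes)                             # every ticket
n_tsv = toy_votes["thermal_sensation"].count()         # non-missing answers only
n_wtcv = toy_votes["thermal_confort_walking"].count()  # non-missing answers only

print("tickets:", n_tickets)
print("TSV:", n_tsv)
print("wTCV:", n_wtcv)
```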

Also, to compare some of the survey answers statistically, we group some of the categories. In particular, we redefine the "age" categories as <12, 13-15, 16-24, 25-54 and >55. We also merge the categories "Coneixement parcial" and "No el conec" of the neighbourhood knowledge question. Finally, for the time spent in public spaces, we merge "Menys de 30 min" and "30 min - 1h".
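As an illustration of this kind of merging, here is a minimal sketch using `Series.replace` on a toy table. The raw column names (`neighbourhood_knowledge`, `spent_time_public_space`) and the toy rows are assumptions; the merged labels `No/parcial` and `<1h` are the ones that appear in the outputs of this notebook. It is not necessarily how the processed files were actually produced.

```python
import pandas as pd

# Hypothetical raw answers; the raw column names are assumptions.
toy_participants = pd.DataFrame({
    "neighbourhood_knowledge": ["Molt bo", "Coneixement parcial", "No el conec", "Bo"],
    "spent_time_public_space": ["Menys de 30 min", "30 min - 1h", "1h - 2h", ">4h"],
})

# Map each raw label that gets merged to its grouped label.
# Labels that are not in the mapping are left unchanged.
knowledge_merge = {"Coneixement parcial": "No/parcial", "No el conec": "No/parcial"}
time_merge = {"Menys de 30 min": "<1h", "30 min - 1h": "<1h"}

toy_participants["neighbourhood_knowledge2"] = (
    toy_participants["neighbourhood_knowledge"].replace(knowledge_merge)
)
toy_participants["spent_time_public_space2"] = (
    toy_participants["spent_time_public_space"].replace(time_merge)
)

print(toy_participants[["neighbourhood_knowledge2", "spent_time_public_space2"]])
```

Writing the grouped values to new columns with a `2` suffix keeps the original answers available for checking.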



### INDEX

1. Participants and tickets (votes). Statistics


2. Socio-demographic statistics

        2.1. Using participants
        2.2. Using votes (TSV/TCV)


3. Statistics per place

        3.1. Participants per place
        3.2. Votes (TSV/TCV) per place
        3.3. Walking Thermal Comfort votes (wTCV) per place
       
       
4. Socio-demographic statistics per place

        4.1. Votes (TSV/TCV)
        4.2. Votes (wTCV)
        4.3. Participants

In [44]:
# Import necessary libraries and read the two data-sets (processed votes and IDs)
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np


all_votes_processed = pd.read_csv('processed_surveys\\all_votes_18Janunary2025.csv')
votes_ID = pd.read_csv('processed_surveys\\votes_ID.csv')


## 1. Participants and tickets (votes). Statistics



In [46]:
print('The number of participants is:',len(votes_ID))
print('The number of tickets (TCV and TSV, votes per space and per participant) is:',len(all_votes_processed))

The number of participants is: 439
The number of tickets (TCV and TSV, votes per space and per participant) is: 1867


<br>

## 2. Socio-demographic statistics


### 2.1. Using participants

In [120]:
sociodem_columns = ['gender','age2','neighbourhood_knowledge2','spent_time_public_space2']

for column in sociodem_columns:
    print(column)
    print('-----')
    #print(votes_ID[column].value_counts())
    print(votes_ID[column].value_counts(normalize=True).mul(100).round(1)) # For percentage
    print('')
    print('#######################################')
    print('')

gender
-----
gender
Dona                    51.3
Home                    44.9
Prefereixo no dir-la     2.7
No-binària               1.1
Name: proportion, dtype: float64

#######################################

age2
-----
age2
13-15    41.2
<12      36.0
16-24    10.5
25-54     7.3
>55       5.0
Name: proportion, dtype: float64

#######################################

neighbourhood_knowledge2
-----
neighbourhood_knowledge2
Molt bo       34.4
Bo            28.2
Mitjà         20.5
No/parcial    16.9
Name: proportion, dtype: float64

#######################################

spent_time_public_space2
-----
spent_time_public_space2
2h - 4h    29.8
1h - 2h    26.0
>4h        24.8
<1h        19.4
Name: proportion, dtype: float64

#######################################
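The cells in this section print either counts or percentages, depending on which line is commented out. A small helper can return both at once. This is only a sketch: `counts_and_percentages` is a hypothetical helper name and the toy series is not survey data.

```python
import pandas as pd

def counts_and_percentages(series: pd.Series) -> pd.DataFrame:
    """Return absolute counts and percentages (one decimal) per category."""
    counts = series.value_counts()
    percentages = series.value_counts(normalize=True).mul(100).round(1)
    # Both Series share the same category index, so they align row by row.
    return pd.DataFrame({"count": counts, "percentage": percentages})

# Hypothetical toy column, not the survey data.
toy_gender = pd.Series(["Dona", "Dona", "Home", "Dona", "Home"], name="gender")
summary = counts_and_percentages(toy_gender)
print(summary)
```

With the real data, the same helper could be called as `counts_and_percentages(votes_ID[column])` inside the loop over `sociodem_columns`.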



<br>

### 2.2. Using votes (TSV/TCV)

In [127]:
sociodem_columns = ['gender','age2','neighbourhood_knowledge2','spent_time2']

for column in sociodem_columns:
    print(column)
    print('-----')
    #print(all_votes_processed[column].value_counts())
    print(all_votes_processed[column].value_counts(normalize=True).mul(100).round(1)) # For percentage
    print('')
    print('#######################################')
    print('')

gender
-----
gender
Dona                    51.3
Home                    44.8
Prefereixo no dir-la     2.7
No-binària               1.1
Name: proportion, dtype: float64

#######################################

age2
-----
age2
13-15    39.4
<12      36.8
16-24    10.5
25-54     7.9
55-84     5.4
Name: proportion, dtype: float64

#######################################

neighbourhood_knowledge2
-----
neighbourhood_knowledge2
Molt bo       33.6
Bo            28.2
Mitjà         20.6
No/Parcial    17.6
Name: proportion, dtype: float64

#######################################

spent_time2
-----
spent_time2
2h - 4h    29.6
1h - 2h    25.6
>4h        24.5
<1h        20.3
Name: proportion, dtype: float64

#######################################



<br>

## 3. Statistics per place

### 3.1. Participants per place

In [180]:
#votes_ID['place'].value_counts()
votes_ID['place'].value_counts(normalize=True).mul(100).round(1)

place
L'Hospitalet de Llobregat                   38.0
Montcada i Reixac                           27.8
Barri Sant Pere / La Ribera - Barcelona     19.4
Barri Congrés / Els Indians  - Barcelona    10.7
Sant Vicençs dels Horts                      4.1
Name: proportion, dtype: float64

### 3.2. Votes (TSV/TCV) per place

In [181]:
#all_votes_processed['place'].value_counts()
all_votes_processed['place'].value_counts(normalize=True).mul(100).round(1)

place
L'Hospitalet de Llobregat                   39.3
Montcada i Reixac                           25.9
Barri Sant Pere / La Ribera - Barcelona     17.3
Barri Congrés / Els Indians  - Barcelona    12.7
Sant Vicençs dels Horts                      4.8
Name: proportion, dtype: float64

### 3.3. Walking Thermal Comfort votes (wTCV) per place

In [27]:
# Count the non-missing walking thermal comfort votes (wTCV) in each place
for place, place_votes in all_votes_processed.groupby('place'):
    print(place)
    print('-------------')
    print(place_votes['thermal_confort_walking'].count())
    print('')
    print('')

Barri Congrés / Els Indians  - Barcelona
-------------
192


Barri Sant Pere / La Ribera - Barcelona
-------------
242


L'Hospitalet de Llobregat
-------------
567


Montcada i Reixac
-------------
362


Sant Vicençs dels Horts
-------------
72




<br>

## 4. Socio-demographic statistics per place
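The per-place breakdowns in this section are printed one place at a time. As a possible alternative, `pd.crosstab` with `normalize="index"` gives the same row-wise percentages in a single table. The sketch below uses hypothetical toy data with shortened place names; only the `place` and `gender` column names come from this notebook.

```python
import pandas as pd

# Hypothetical toy votes; "A" and "B" stand in for the real place names.
toy_votes = pd.DataFrame({
    "place": ["A", "A", "A", "A", "B", "B"],
    "gender": ["Dona", "Dona", "Dona", "Home", "Dona", "Home"],
})

# normalize="index" divides each row by its total, so every place sums to 100 %.
gender_by_place = (
    pd.crosstab(toy_votes["place"], toy_votes["gender"], normalize="index")
    .mul(100)
    .round(1)
)
print(gender_by_place)
```

One difference from the loop output is that a category with no answers in a given place appears as 0.0 in the table instead of being omitted.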

### 4.1. Votes (TSV/TCV)

In [206]:
sociodem_columns = ['gender','age2','neighbourhood_knowledge2','spent_time2']
for column in sociodem_columns:
    
    print('############################################################')
    print(column)
    print('############################################################')
    print('')
    
    for i,j in all_votes_processed.groupby(['place']):

        print('place:', i[0])
        print('')
        #print(j[column].value_counts())
        print(j[column].value_counts(normalize=True).mul(100).round(1))
        print('-------------------------')
        print('')
    
    print('')
    print('')

############################################################
gender
############################################################

place: Barri Congrés / Els Indians  - Barcelona

gender
Home                    50.8
Dona                    45.0
Prefereixo no dir-la     2.1
No-binària               2.1
Name: proportion, dtype: float64
-------------------------

place: Barri Sant Pere / La Ribera - Barcelona

gender
Dona          64.7
Home          33.7
No-binària     1.5
Name: proportion, dtype: float64
-------------------------

place: L'Hospitalet de Llobregat

gender
Dona                    50.2
Home                    48.7
Prefereixo no dir-la     1.1
Name: proportion, dtype: float64
-------------------------

place: Montcada i Reixac

gender
Dona                    48.4
Home                    42.4
Prefereixo no dir-la     6.8
No-binària               2.3
Name: proportion, dtype: float64
-------------------------

place: Sant Vicençs dels Horts

gender
Home                    50.0
D

### 4.2. Votes (wTCV)
Using walking thermal comfort votes (there are fewer of them than TCV/TSV votes).

In [43]:
sociodem_columns = ['gender','age2','neighbourhood_knowledge2','spent_time2']

all_votes_processed_walking_tc = all_votes_processed.loc[all_votes_processed['thermal_confort_walking'].notnull()]


for column in sociodem_columns:
    
    print('############################################################')
    print(column)
    print('############################################################')
    print('')
    
    for i,j in all_votes_processed_walking_tc.groupby(['place']):

        print('place:', i[0])
        print('')
        #print(j[column].value_counts())
        print(j[column].value_counts(normalize=True).mul(100).round(1))
        print('-------------------------')
        print('')
    
    print('')
    print('')

############################################################
gender
############################################################

place: Barri Congrés / Els Indians  - Barcelona

gender
Home                    52.1
Dona                    43.8
No-binària               2.1
Prefereixo no dir-la     2.1
Name: proportion, dtype: float64
-------------------------

place: Barri Sant Pere / La Ribera - Barcelona

gender
Dona          65.7
Home          32.6
No-binària     1.7
Name: proportion, dtype: float64
-------------------------

place: L'Hospitalet de Llobregat

gender
Dona                    50.6
Home                    48.3
Prefereixo no dir-la     1.1
Name: proportion, dtype: float64
-------------------------

place: Montcada i Reixac

gender
Dona                    48.6
Home                    42.3
Prefereixo no dir-la     6.9
No-binària               2.2
Name: proportion, dtype: float64
-------------------------

place: Sant Vicençs dels Horts

gender
Home                    50.0
D

### 4.3. Participants

In [207]:
sociodem_columns = ['gender','age2','neighbourhood_knowledge2','spent_time_public_space2']
for column in sociodem_columns:
    
    print('############################################################')
    print(column)
    print('############################################################')
    print('')
    
    for i,j in votes_ID.groupby(['place']):

        print('place:', i[0])
        print('')
        print(j[column].value_counts())
        #print(j[column].value_counts(normalize=True).mul(100).round(1))
        print('-------------------------')
        print('')
    
    print('')
    print('')

############################################################
gender
############################################################

place: Barri Congrés / Els Indians  - Barcelona

gender
Dona                    24
Home                    21
Prefereixo no dir-la     1
No-binària               1
Name: count, dtype: int64
-------------------------

place: Barri Sant Pere / La Ribera - Barcelona

gender
Dona          53
Home          31
No-binària     1
Name: count, dtype: int64
-------------------------

place: L'Hospitalet de Llobregat

gender
Home                    84
Dona                    81
Prefereixo no dir-la     2
Name: count, dtype: int64
-------------------------

place: Montcada i Reixac

gender
Dona                    59
Home                    52
Prefereixo no dir-la     8
No-binària               3
Name: count, dtype: int64
-------------------------

place: Sant Vicençs dels Horts

gender
Home                    9
Dona                    8
Prefereixo no dir-la    1
Name: co