# Health Services

## 2. Health Services Data

In [370]:
import pandas as pd

### 1.1. Load Datasets

Health Services:

- Hospitals & CAPs: *Hospitales y servicios de atención primaria de la ciudad de Barcelona - Open Data BCN*

- Pharmacies: *Farmacias de la ciudad de Barcelona - Open Data BCN*, *Catálogo de farmacias de Cataluña*

- Day Centers: *Centros de día para las personas mayores de la ciudad de Barcelona - Open Data BCN*

- Elderly Residences: *Residencias para las personas mayores de la ciudad de Barcelona - Open Data BCN*

- Indicators: *Barcelona. La ciudad al día. → relación indicadores de ámbito de la salud - Open Data BCN*, *Indicadores de salud de la Central de Resultados: atención primaria*

- Other Facilities: *Listado de equipamientos de sanidad de la ciudad de Barcelona - Open Data BCN*, *Establecimientos sanitarios autorizados en Cataluña*, *Clínicas Dentales autorizadas en Cataluña*

- Still to consider: drugs, specialists, insurance, social-health care (sociosanitario)?

In [371]:
# Remove Warnings
import warnings
warnings.filterwarnings("ignore")

In [372]:
import chardet

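def detect_encoding(file_path):
    """Guess a file's text encoding with chardet; the result may be None if detection fails."""
    # Read raw bytes: chardet works on bytes, not on decoded text
    with open(file_path, 'rb') as f:
        result = chardet.detect(f.read())
    return result['encoding']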

In [373]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [374]:
# Usage
#encoded = detect_encoding(path_files + hospitals_caps_file)
#print(encoded)
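
`detect_encoding` reads the whole file before guessing. Since several of these CSVs are loaded as UTF-16, a cheaper first check is to look for a byte-order mark (BOM) and fall back to `chardet` only when none is found. A minimal sketch using only the standard library; `sniff_bom` is a hypothetical helper, not part of the notebook:

```python
import codecs
import os
import tempfile

# Longer marks are tested before their prefixes
# (the UTF-32 LE BOM starts with the UTF-16 LE BOM).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

def sniff_bom(file_path):
    """Return an encoding name if the file starts with a known BOM, else None."""
    with open(file_path, "rb") as f:
        head = f.read(4)
    for bom, encoding in _BOMS:
        if head.startswith(bom):
            return encoding
    return None

# Demo on a throwaway file written as UTF-16 (Python writes the BOM itself).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.csv")
    with open(demo_path, "w", encoding="utf-16") as f:
        f.write("register_id,name\n1,Demo\n")
    detected = sniff_bom(demo_path)

print(detected)
```

A file without a BOM returns `None`, in which case a statistical guess such as `detect_encoding` is still needed.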

In [375]:
# Dataset Files

hospitals_caps_file = "opendatabcn_sanitat_hospitals-i-centres-atencio-primaria.csv"
pharmacies_bcn_file = "opendatabcn_sanitat_farmacies.csv"
pharmacies_cat_file = "Cat_leg_de_farm_cies_de_Catalunya_20240403.csv"
day_centers_file = "opendatabcn_serveis-socials_centres-dia-gent-gran.csv"
residences_file = "opendatabcn_serveis-socials_residencies-gent-gran.csv"
health_equip_bcn_file = "opendatabcn_llista-equipaments_sanitat-csv.csv"
health_equip_cat_file = "Establiments_sanitaris_autoritzats_a_Catalunya_20240403.csv"
dental_cat_file = "Cl_niques_dentals_autoritzades_a_Catalunya_20240403.csv"
indicators_bcn_file = "2024_laciutataldia.csv"
# missing: indicators_CAP_file = ""

path_files = "/content/drive/MyDrive/TFG/2. Ejecución/CODE/HealthDatasets/"


# Read CSVs
hospitals_caps_df = pd.read_csv(path_files + hospitals_caps_file, encoding='utf-16')
pharmacies_bcn_df = pd.read_csv(path_files + pharmacies_bcn_file, encoding='utf-16')
pharmacies_cat_df = pd.read_csv(path_files + pharmacies_cat_file, encoding='utf-8')
day_centers_df = pd.read_csv(path_files + day_centers_file, encoding='utf-16')
residences_df = pd.read_csv(path_files + residences_file, encoding='utf-16')
health_equip_bcn_df = pd.read_csv(path_files + health_equip_bcn_file, encoding='utf-16')
health_equip_cat_df = pd.read_csv(path_files + health_equip_cat_file, encoding='utf-8')
dental_cat_df = pd.read_csv(path_files + dental_cat_file, encoding='utf-8')
indicators_bcn_df = pd.read_csv(path_files + indicators_bcn_file, encoding='utf-8')
# missing: indicators_CAP_df = pd.read_csv(path_files + indicators_CAP_file, encoding='utf-8')
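
The `read_csv` calls differ only in file name and encoding, so the pairs can live in one mapping and be loaded in a loop. A sketch under assumptions: `load_datasets` and the demo file names are hypothetical, and the demo writes small temporary files instead of reading from Drive:

```python
import os
import tempfile

import pandas as pd

def load_datasets(base_dir, file_encodings):
    """Read each CSV in `file_encodings` (name -> (file name, encoding)) into a DataFrame."""
    frames = {}
    for name, (file_name, encoding) in file_encodings.items():
        frames[name] = pd.read_csv(os.path.join(base_dir, file_name), encoding=encoding)
    return frames

with tempfile.TemporaryDirectory() as tmp_dir:
    # Two invented files, one per encoding used in the notebook.
    pd.DataFrame({"name": ["Farmàcia A"]}).to_csv(
        os.path.join(tmp_dir, "bcn.csv"), index=False, encoding="utf-16")
    pd.DataFrame({"name": ["Farmàcia B"]}).to_csv(
        os.path.join(tmp_dir, "cat.csv"), index=False, encoding="utf-8")

    frames = load_datasets(tmp_dir, {
        "pharmacies_bcn": ("bcn.csv", "utf-16"),
        "pharmacies_cat": ("cat.csv", "utf-8"),
    })

print({name: df.shape for name, df in frames.items()})
```

Keeping the encodings next to the file names also documents which source publishes UTF-16 and which UTF-8.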

### 1.2. Dataset Analysis

In [376]:
# Functions Definition
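def null_function(df):
  """Return the columns without nulls and a per-column summary (count, percentage) of the columns with nulls."""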
  # Extract columns without nulls
  non_null_cols = df.columns[df.notnull().all()]

  # Extract columns with nulls
  null_cols = df.columns[df.isnull().any()]

  # Get count and percentage of null values for columns with nulls
  null_count = df[null_cols].isnull().sum()
  null_percentage = (null_count / len(df)) * 100

  # Combine information into a DataFrame
  info_df = pd.DataFrame({
      'Column': null_count.index,
      'Null Count': null_count.values,
      'Null Percentage': null_percentage.values
  })
  return non_null_cols, info_df

def has_duplicates(df, col_name):
  """Print whether column `col_name` contains any repeated values."""
  if df[col_name].duplicated().any():
    print("There's duplicate values in column", col_name)
  else:
    print("No duplicate values in column", col_name)
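
To see what the null summary looks like, here is a small self-contained sketch on a made-up frame. `null_summary` is a compact variant written for this example (it returns only the summary table), and the toy data is invented:

```python
import pandas as pd

def null_summary(df):
    """Return one row per column that has nulls, with its null count and percentage."""
    null_count = df.isnull().sum()
    null_count = null_count[null_count > 0]
    return pd.DataFrame({
        "Column": null_count.index,
        "Null Count": null_count.values,
        "Null Percentage": (null_count.values / len(df)) * 100,
    })

toy_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "phone": ["a", None, "c", None],
    "zip": [8001, 8002, None, 8004],
})

summary = null_summary(toy_df)
print(summary)
```

Columns without nulls (`id` here) are left out of the table, mirroring how `null_function` reports them separately.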

#### 1) Hospitals & CAPs

In [377]:
display(hospitals_caps_df.info())
print("\nSTATISTICS:\n")
#display(hospitals_caps_df.describe())
non_null_cols, info_df = null_function(hospitals_caps_df)
print("\nColumns with no null values:", non_null_cols)
print("\nColumns with null values:")
display(info_df)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 142 entries, 0 to 141
Data columns (total 39 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   register_id                    142 non-null    object 
 1   name                           142 non-null    object 
 2   institution_id                 17 non-null     float64
 3   institution_name               17 non-null     object 
 4   created                        142 non-null    object 
 5   modified                       142 non-null    object 
 6   addresses_roadtype_id          0 non-null      float64
 7   addresses_roadtype_name        0 non-null      float64
 8   addresses_road_id              141 non-null    float64
 9   addresses_road_name            141 non-null    object 
 10  addresses_start_street_number  141 non-null    float64
 11  addresses_end_street_number    13 non-null     float64
 12  addresses_neighborhood_id      142 non-null    int

None


STATISTICS:


Columns with no null values: Index(['register_id', 'name', 'created', 'modified',
       'addresses_neighborhood_id', 'addresses_neighborhood_name',
       'addresses_district_id', 'addresses_district_name', 'addresses_town',
       'addresses_main_address', 'secondary_filters_id',
       'secondary_filters_name', 'secondary_filters_fullpath',
       'secondary_filters_tree', 'secondary_filters_asia_id',
       'geo_epgs_25831_x', 'geo_epgs_25831_y', 'geo_epgs_4326_lat',
       'geo_epgs_4326_lon'],
      dtype='object')

Columns with null values:


| | Column | Null Count | Null Percentage |
|---|---|---|---|
| 0 | institution_id | 125 | 88.028169 |
| 1 | institution_name | 125 | 88.028169 |
| 2 | addresses_roadtype_id | 142 | 100.0 |
| 3 | addresses_roadtype_name | 142 | 100.0 |
| 4 | addresses_road_id | 1 | 0.704225 |
| 5 | addresses_road_name | 1 | 0.704225 |
| 6 | addresses_start_street_number | 1 | 0.704225 |
| 7 | addresses_end_street_number | 129 | 90.84507 |
| 8 | addresses_zip_code | 1 | 0.704225 |
| 9 | addresses_type | 142 | 100.0 |


*Based on the share of null values in each column, we drop: institution_id, institution_name, addresses_roadtype_id, addresses_roadtype_name, addresses_end_street_number, addresses_type, values_description, estimated_dates, start_date, end_date. The columns created and modified are dropped as well, since they are not needed for now.*
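
The drop list above was chosen by reading the null table. The same decision can be expressed as a threshold on the null fraction, which makes it repeatable across datasets. A sketch, assuming a cut-off of 85% (an illustrative value, not one stated in the notebook) and an invented toy frame:

```python
import pandas as pd

NULL_THRESHOLD = 0.85  # assumption: drop columns with more than 85% nulls

toy_df = pd.DataFrame({
    "register_id": ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j"],
    "institution_id": [1.0] + [None] * 9,          # 90% null
    "addresses_type": [None] * 10,                 # 100% null
    "addresses_zip_code": [8025.0] * 9 + [None],   # 10% null
})

# isnull().mean() gives the fraction of nulls per column.
null_fraction = toy_df.isnull().mean()
columns_to_drop = null_fraction[null_fraction > NULL_THRESHOLD].index.tolist()
cleaned_df = toy_df.drop(columns=columns_to_drop)

print(columns_to_drop)
```

Columns that are complete but unused, such as created and modified, still have to be listed by hand.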

In [378]:
# Drop columns that are almost entirely null, plus the unused created/modified timestamps
columns_to_drop_hospitals_caps = ['created', 'modified', 'institution_id', 'institution_name', 'addresses_roadtype_id', 'addresses_roadtype_name' ,
                                  'addresses_end_street_number', 'addresses_type', 'values_description', 'estimated_dates', 'start_date', 'end_date']

hospitals_caps_df.drop(columns=columns_to_drop_hospitals_caps, inplace=True)

# Not needed: road identifier, main-address flag, attribute value identifier, attribute value, highlighted attribute values,
# equivalence-tree identifier, equivalence-tree path, equivalence tree, equivalence-tree Asia ID
#useless_cols = ['addresses_road_id', 'addresses_main_address', 'values_id', 'values_attribute_id', 'values_outstanding', 'secondary_filters_id',
#                'secondary_filters_fullpath', 'secondary_filters_tree', 'secondary_filters_asia_id', 'values_category', 'values_attribute_name']
useless_cols = ['addresses_main_address', 'values_id', 'values_attribute_id', 'values_outstanding', 'secondary_filters_id',
                'secondary_filters_fullpath', 'secondary_filters_tree', 'secondary_filters_asia_id', 'values_category', 'values_attribute_name']
hospitals_caps_df.drop(columns=useless_cols, inplace=True)
hospitals_caps_df.head(10)

Unnamed: 0,register_id,name,addresses_road_id,addresses_road_name,addresses_start_street_number,addresses_neighborhood_id,addresses_neighborhood_name,addresses_district_id,addresses_district_name,addresses_zip_code,addresses_town,values_value,secondary_filters_name,geo_epgs_25831_x,geo_epgs_25831_y,geo_epgs_4326_lat,geo_epgs_4326_lon
0,﻿1015170605,Centre d'Atenció Primària Sardenya,76807.0,C Sardenya,466.0,33,el Baix Guinardó,7,Horta-Guinardó,8025.0,BARCELONA,935674380,CAPs,430258.038349,4584550.0,41.409469,2.165559
1,﻿68125439,Instituto Oftalmológico Tres Torres,28000.0,Via Augusta,281.0,24,les Tres Torres,5,Sarrià-Sant Gervasi,8017.0,BARCELONA,900842848,Hospitals i clíniques,427267.424673,4583269.0,41.39767,2.129935
2,﻿75990049258,Clínica Mi Tres Torres,103502.0,C Doctor Roux,76.0,24,les Tres Torres,5,Sarrià-Sant Gervasi,8017.0,BARCELONA,932041300,Hospitals i clíniques,427179.337491,4583445.0,41.399249,2.12886
3,﻿93056132443,Centre d'Atenció Primària Montnegre,219100.0,C Montnegre,21.0,19,les Corts,4,Les Corts,8029.0,BARCELONA,933632965,CAPs,427992.37187,4582146.0,41.387618,2.13874
4,﻿75990060288,Hospital de Barcelona,144601.0,Av Diagonal,660.0,19,les Corts,4,Les Corts,8034.0,BARCELONA,932542400,Hospitals i clíniques,427228.518238,4582406.0,41.389892,2.129574
5,﻿75990027608,Hospital Universitari Quirón Dexeus,285002.0,C Sabino Arana,5.0,20,la Maternitat i Sant Ramon,4,Les Corts,8028.0,BARCELONA,932274747,Hospitals i clíniques,426932.169229,4581895.0,41.385265,2.126091
6,﻿92086031184,Centre d'Atenció Primària Ciutat Meridiana,300503.0,C Sant Feliu de Codines,2.0,54,Torre Baró,8,Nou Barris,8033.0,BARCELONA,933508889,CAPs,431637.809801,4590193.0,41.460414,2.181428
7,﻿92086031184,Centre d'Atenció Primària Ciutat Meridiana,300503.0,C Sant Feliu de Codines,2.0,54,Torre Baró,8,Nou Barris,8033.0,BARCELONA,933508889,Centres urgències (CUAPs),431637.809801,4590193.0,41.460414,2.181428
8,﻿75990029810,Clínica Corachan,51500.0,C Buïgas,19.0,24,les Tres Torres,5,Sarrià-Sant Gervasi,8017.0,BARCELONA,932545800,Hospitals i clíniques,427330.726158,4582808.0,41.393523,2.130748
9,﻿99400187136,Clinica del Son Estivill,288601.0,C Rosales,9.0,24,les Tres Torres,5,Sarrià-Sant Gervasi,8017.0,BARCELONA,932121354,Hospitals i clíniques,427155.105668,4583714.0,41.401672,2.128538
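
`register_id` is loaded as `object` even though its values look numeric, and the preview rows appear to carry a leading byte-order mark (U+FEFF) on each ID. If that is the case, stripping it before grouping or merging avoids keys that look equal but do not match. A sketch on values copied from the preview, assuming the stray character is U+FEFF:

```python
import pandas as pd

toy_df = pd.DataFrame({
    "register_id": ["\ufeff1015170605", "\ufeff68125439", "75990049258"],
})

# Remove a leading BOM, then any surrounding whitespace, from every ID.
toy_df["register_id"] = toy_df["register_id"].str.lstrip("\ufeff").str.strip()

print(toy_df["register_id"].tolist())
```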


In [379]:
# Observe New Info
#display(hospitals_caps_df.info())

# Check if duplicates
has_duplicates(hospitals_caps_df, 'register_id')

There's duplicate values in column register_id
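
`has_duplicates` only reports that duplicates exist. To see which IDs repeat and why, the duplicated rows can be listed together with `duplicated(keep=False)`. A sketch on a toy frame modeled on the preview above, where the Ciutat Meridiana CAP appears once per service type:

```python
import pandas as pd

toy_df = pd.DataFrame({
    "register_id": ["92086031184", "92086031184", "75990029810"],
    "name": ["CAP Ciutat Meridiana", "CAP Ciutat Meridiana", "Clínica Corachan"],
    "secondary_filters_name": ["CAPs", "Centres urgències (CUAPs)", "Hospitals i clíniques"],
})

# keep=False marks every member of a duplicated group, not only the repeats.
duplicated_rows = toy_df[toy_df["register_id"].duplicated(keep=False)]
rows_per_id = duplicated_rows.groupby("register_id").size()

print(duplicated_rows)
print(rows_per_id)
```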


In [380]:
# Print the unique service types before aggregating them
print("Unique Characteristics:\n")
print(hospitals_caps_df['secondary_filters_name'].unique(), "\n")

# Unexpected service types to remove: none

# Join the service types (secondary_filters_name) of each register_id into one string and overwrite the column
aggregated_sec_filters = hospitals_caps_df.groupby('register_id')['secondary_filters_name'].transform(lambda x: ', '.join(x))
hospitals_caps_df['secondary_filters_name'] = aggregated_sec_filters

# Drop duplicated register_id
hospitals_caps_df.drop_duplicates(subset='register_id', inplace=True)
display(hospitals_caps_df.head(10))

Unique Characteristics:

['CAPs' 'Hospitals i clíniques' 'Centres urgències (CUAPs)'
 'Centres de dia gent gran' 'Residències gent gran'] 



Unnamed: 0,register_id,name,addresses_road_id,addresses_road_name,addresses_start_street_number,addresses_neighborhood_id,addresses_neighborhood_name,addresses_district_id,addresses_district_name,addresses_zip_code,addresses_town,values_value,secondary_filters_name,geo_epgs_25831_x,geo_epgs_25831_y,geo_epgs_4326_lat,geo_epgs_4326_lon
0,﻿1015170605,Centre d'Atenció Primària Sardenya,76807.0,C Sardenya,466.0,33,el Baix Guinardó,7,Horta-Guinardó,8025.0,BARCELONA,935674380,CAPs,430258.038349,4584550.0,41.409469,2.165559
1,﻿68125439,Instituto Oftalmológico Tres Torres,28000.0,Via Augusta,281.0,24,les Tres Torres,5,Sarrià-Sant Gervasi,8017.0,BARCELONA,900842848,Hospitals i clíniques,427267.424673,4583269.0,41.39767,2.129935
2,﻿75990049258,Clínica Mi Tres Torres,103502.0,C Doctor Roux,76.0,24,les Tres Torres,5,Sarrià-Sant Gervasi,8017.0,BARCELONA,932041300,Hospitals i clíniques,427179.337491,4583445.0,41.399249,2.12886
3,﻿93056132443,Centre d'Atenció Primària Montnegre,219100.0,C Montnegre,21.0,19,les Corts,4,Les Corts,8029.0,BARCELONA,933632965,CAPs,427992.37187,4582146.0,41.387618,2.13874
4,﻿75990060288,Hospital de Barcelona,144601.0,Av Diagonal,660.0,19,les Corts,4,Les Corts,8034.0,BARCELONA,932542400,Hospitals i clíniques,427228.518238,4582406.0,41.389892,2.129574
5,﻿75990027608,Hospital Universitari Quirón Dexeus,285002.0,C Sabino Arana,5.0,20,la Maternitat i Sant Ramon,4,Les Corts,8028.0,BARCELONA,932274747,Hospitals i clíniques,426932.169229,4581895.0,41.385265,2.126091
6,﻿92086031184,Centre d'Atenció Primària Ciutat Meridiana,300503.0,C Sant Feliu de Codines,2.0,54,Torre Baró,8,Nou Barris,8033.0,BARCELONA,933508889,"CAPs, Centres urgències (CUAPs)",431637.809801,4590193.0,41.460414,2.181428
8,﻿75990029810,Clínica Corachan,51500.0,C Buïgas,19.0,24,les Tres Torres,5,Sarrià-Sant Gervasi,8017.0,BARCELONA,932545800,Hospitals i clíniques,427330.726158,4582808.0,41.393523,2.130748
9,﻿99400187136,Clinica del Son Estivill,288601.0,C Rosales,9.0,24,les Tres Torres,5,Sarrià-Sant Gervasi,8017.0,BARCELONA,932121354,Hospitals i clíniques,427155.105668,4583714.0,41.401672,2.128538
10,﻿99400417756,Cliníca Axisclínic *Comte d'Urgell,349706.0,C Comte d'Urgell,168.0,8,l'Antiga Esquerra de l'Eixample,2,Eixample,8036.0,BARCELONA,934546947,Hospitals i clíniques,429152.144616,4582051.0,41.386871,2.152621
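
The `transform` plus `drop_duplicates` pair can also be written as a single `groupby(...).agg(...)` that keeps the first value of every other column and joins the service types. This is an alternative sketch, not the notebook's method; it assumes all other columns are identical within a `register_id`, and the toy rows are invented:

```python
import pandas as pd

toy_df = pd.DataFrame({
    "register_id": ["92086031184", "92086031184", "75990029810"],
    "name": ["CAP Ciutat Meridiana", "CAP Ciutat Meridiana", "Clínica Corachan"],
    "secondary_filters_name": ["CAPs", "Centres urgències (CUAPs)", "Hospitals i clíniques"],
})

# "first" for every column except the key, then override the service-type column.
agg_spec = {col: "first" for col in toy_df.columns if col != "register_id"}
agg_spec["secondary_filters_name"] = ", ".join

# sort=False keeps the groups in order of first appearance.
collapsed_df = toy_df.groupby("register_id", sort=False, as_index=False).agg(agg_spec)

print(collapsed_df)
```

Unlike `drop_duplicates`, this resets the index, so the gaps visible in the output above (row 7 missing) would not appear.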


In [381]:
# Rename Columns
hospitals_caps_df = hospitals_caps_df.rename(columns={"addresses_road_id": "Street ID", "name": "Name", "addresses_road_name": "Road Name", "addresses_start_street_number": "Street Number",
                                                      "addresses_neighborhood_id": "Neighborhood ID", "addresses_neighborhood_name": "Neighborhood Name",
                                                      "addresses_district_id": "District ID", "addresses_district_name": "District Name",
                                                      "addresses_zip_code": "ZipCode","addresses_town": "Town",
                                                      "values_value": "Phone Number", "secondary_filters_name": "Service Type",
                                                      "geo_epgs_4326_lat": "Latitude", "geo_epgs_4326_lon": "Longitude"})
display(hospitals_caps_df.head(10))

Unnamed: 0,register_id,Name,Street ID,Road Name,Street Number,Neighborhood ID,Neighborhood Name,District ID,District Name,ZipCode,Town,Phone Number,Service Type,geo_epgs_25831_x,geo_epgs_25831_y,Latitude,Longitude
0,﻿1015170605,Centre d'Atenció Primària Sardenya,76807.0,C Sardenya,466.0,33,el Baix Guinardó,7,Horta-Guinardó,8025.0,BARCELONA,935674380,CAPs,430258.038349,4584550.0,41.409469,2.165559
1,﻿68125439,Instituto Oftalmológico Tres Torres,28000.0,Via Augusta,281.0,24,les Tres Torres,5,Sarrià-Sant Gervasi,8017.0,BARCELONA,900842848,Hospitals i clíniques,427267.424673,4583269.0,41.39767,2.129935
2,﻿75990049258,Clínica Mi Tres Torres,103502.0,C Doctor Roux,76.0,24,les Tres Torres,5,Sarrià-Sant Gervasi,8017.0,BARCELONA,932041300,Hospitals i clíniques,427179.337491,4583445.0,41.399249,2.12886
3,﻿93056132443,Centre d'Atenció Primària Montnegre,219100.0,C Montnegre,21.0,19,les Corts,4,Les Corts,8029.0,BARCELONA,933632965,CAPs,427992.37187,4582146.0,41.387618,2.13874
4,﻿75990060288,Hospital de Barcelona,144601.0,Av Diagonal,660.0,19,les Corts,4,Les Corts,8034.0,BARCELONA,932542400,Hospitals i clíniques,427228.518238,4582406.0,41.389892,2.129574
5,﻿75990027608,Hospital Universitari Quirón Dexeus,285002.0,C Sabino Arana,5.0,20,la Maternitat i Sant Ramon,4,Les Corts,8028.0,BARCELONA,932274747,Hospitals i clíniques,426932.169229,4581895.0,41.385265,2.126091
6,﻿92086031184,Centre d'Atenció Primària Ciutat Meridiana,300503.0,C Sant Feliu de Codines,2.0,54,Torre Baró,8,Nou Barris,8033.0,BARCELONA,933508889,"CAPs, Centres urgències (CUAPs)",431637.809801,4590193.0,41.460414,2.181428
8,﻿75990029810,Clínica Corachan,51500.0,C Buïgas,19.0,24,les Tres Torres,5,Sarrià-Sant Gervasi,8017.0,BARCELONA,932545800,Hospitals i clíniques,427330.726158,4582808.0,41.393523,2.130748
9,﻿99400187136,Clinica del Son Estivill,288601.0,C Rosales,9.0,24,les Tres Torres,5,Sarrià-Sant Gervasi,8017.0,BARCELONA,932121354,Hospitals i clíniques,427155.105668,4583714.0,41.401672,2.128538
10,﻿99400417756,Cliníca Axisclínic *Comte d'Urgell,349706.0,C Comte d'Urgell,168.0,8,l'Antiga Esquerra de l'Eixample,2,Eixample,8036.0,BARCELONA,934546947,Hospitals i clíniques,429152.144616,4582051.0,41.386871,2.152621


#### 2) Pharmacies

This section works with pharmacies_bcn_df and pharmacies_cat_df: each dataset is cleaned on its own, then the two are checked for common columns so they can be merged.
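
The column check and merge described here can be sketched as follows. The toy frames and the choice of `ZipCode` as join key are assumptions for illustration; the notebook does not state which key it merges on:

```python
import pandas as pd

bcn_df = pd.DataFrame({
    "Name": ["Farmàcia A", "Farmàcia B"],
    "ZipCode": [8016, 8011],
    "Latitude": [41.44, 41.38],
})
cat_df = pd.DataFrame({
    "Id": ["F1", "F2"],
    "Name": ["A", "C"],
    "ZipCode": [8016, 8025],
})

# Columns present in both frames are the candidate join keys.
common_columns = bcn_df.columns.intersection(cat_df.columns).tolist()

# indicator=True adds a _merge column telling where each row came from.
merged_df = bcn_df.merge(
    cat_df, on="ZipCode", how="outer", suffixes=("_bcn", "_cat"), indicator=True
)

print(common_columns)
print(merged_df["_merge"].value_counts())
```

The `_merge` column makes it easy to count how many pharmacies appear in only one of the two sources.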

A- PHARMACIES BARCELONA

In [382]:
# PHARMACIES BARCELONA
display(pharmacies_bcn_df.info())
print("\nSTATISTICS:\n")
#display(pharmacies_bcn_df.describe())
non_null_cols, info_df = null_function(pharmacies_bcn_df)
print("\nColumns with no null values:", non_null_cols)
print("\nColumns with null values:")
display(info_df)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1133 entries, 0 to 1132
Data columns (total 39 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   register_id                    1133 non-null   object 
 1   name                           1133 non-null   object 
 2   institution_id                 0 non-null      float64
 3   institution_name               0 non-null      float64
 4   created                        1133 non-null   object 
 5   modified                       1133 non-null   object 
 6   addresses_roadtype_id          0 non-null      float64
 7   addresses_roadtype_name        0 non-null      float64
 8   addresses_road_id              1133 non-null   int64  
 9   addresses_road_name            1133 non-null   object 
 10  addresses_start_street_number  1133 non-null   int64  
 11  addresses_end_street_number    33 non-null     float64
 12  addresses_neighborhood_id      1133 non-null   i

None


STATISTICS:


Columns with no null values: Index(['register_id', 'name', 'created', 'modified', 'addresses_road_id',
       'addresses_road_name', 'addresses_start_street_number',
       'addresses_neighborhood_id', 'addresses_neighborhood_name',
       'addresses_district_id', 'addresses_district_name',
       'addresses_zip_code', 'addresses_town', 'addresses_main_address',
       'values_id', 'values_attribute_id', 'values_category',
       'values_attribute_name', 'values_value', 'values_outstanding',
       'secondary_filters_id', 'secondary_filters_name',
       'secondary_filters_fullpath', 'secondary_filters_tree',
       'secondary_filters_asia_id', 'geo_epgs_25831_x', 'geo_epgs_25831_y',
       'geo_epgs_4326_lat', 'geo_epgs_4326_lon'],
      dtype='object')

Columns with null values:


| | Column | Null Count | Null Percentage |
|---|---|---|---|
| 0 | institution_id | 1133 | 100.0 |
| 1 | institution_name | 1133 | 100.0 |
| 2 | addresses_roadtype_id | 1133 | 100.0 |
| 3 | addresses_roadtype_name | 1133 | 100.0 |
| 4 | addresses_end_street_number | 1100 | 97.087379 |
| 5 | addresses_type | 1133 | 100.0 |
| 6 | values_description | 1104 | 97.440424 |
| 7 | estimated_dates | 1133 | 100.0 |
| 8 | start_date | 1133 | 100.0 |
| 9 | end_date | 1133 | 100.0 |


*Every column with null values here is at least 97% null, so we drop all of them: institution_id, institution_name, addresses_roadtype_id, addresses_roadtype_name, addresses_end_street_number, addresses_type, values_description, estimated_dates, start_date, end_date. The columns created and modified are dropped as well, since they are not needed for now.*

In [383]:
# Drop columns that are almost entirely null, plus the unused created/modified timestamps
columns_to_drop_pharmacies_bcn = ['created', 'modified', 'institution_id', 'institution_name', 'addresses_roadtype_id', 'addresses_roadtype_name' ,
                                  'addresses_end_street_number', 'addresses_type', 'values_description', 'estimated_dates', 'start_date', 'end_date']

pharmacies_bcn_df.drop(columns=columns_to_drop_pharmacies_bcn, inplace=True)

# Not needed: road identifier, main-address flag, attribute value identifier, attribute value, highlighted attribute values,
# equivalence-tree identifier, equivalence-tree path, equivalence tree, equivalence-tree Asia ID
#useless_cols = ['addresses_road_id', 'addresses_main_address', 'values_id', 'values_attribute_id', 'values_outstanding', 'secondary_filters_id',
#                'secondary_filters_fullpath', 'secondary_filters_tree', 'secondary_filters_asia_id', 'values_category', 'values_attribute_name']
useless_cols = ['addresses_main_address', 'values_id', 'values_attribute_id', 'values_outstanding', 'secondary_filters_id',
                'secondary_filters_fullpath', 'secondary_filters_tree', 'secondary_filters_asia_id', 'values_category', 'values_attribute_name']

pharmacies_bcn_df.drop(columns=useless_cols, inplace=True)
pharmacies_bcn_df.head(10)

Unnamed: 0,register_id,name,addresses_road_id,addresses_road_name,addresses_start_street_number,addresses_neighborhood_id,addresses_neighborhood_name,addresses_district_id,addresses_district_name,addresses_zip_code,addresses_town,values_value,secondary_filters_name,geo_epgs_25831_x,geo_epgs_25831_y,geo_epgs_4326_lat,geo_epgs_4326_lon
0,﻿75990017391,Farmàcia Hereus Lavilla,23200,Carrer d'Argullós,49,52,la Prosperitat,8,Nou Barris,8016,Barcelona,933540304,Farmàcies,431711.951527,4587997.0,41.440645,2.182564
1,﻿75990017537,Farmàcia Franquesa Massó,23403,Aribau,18,8,l'Antiga Esquerra de l'Eixample,2,Eixample,8011,BARCELONA,933174684,Farmàcies,429952.32885,4581992.0,41.386411,2.162198
2,﻿75990017537,Farmàcia Franquesa Massó,23403,Aribau,18,8,l'Antiga Esquerra de l'Eixample,2,Eixample,8011,BARCELONA,933174684,Farmàcies servei de 9 a 22 h tot l'any,429952.32885,4581992.0,41.386411,2.162198
3,﻿75990048243,Farmàcia Bañeres Merinero,100800,Carrer de la Diputació,336,7,la Dreta de l'Eixample,2,Eixample,8009,Barcelona,93 232 81 65,Farmàcies,431011.609336,4583053.0,41.396057,2.174745
4,﻿75990048357,Farmàcia Sales Llavià,100800,Diputació,391,7,la Dreta de l'Eixample,2,Eixample,8013,BARCELONA,932454391,Farmàcies,431136.071095,4583205.0,41.397434,2.176216
5,﻿75990064521,Farmàcia Pujol Pereita,148307,Pg Gràcia,129,31,la Vila de Gràcia,6,Gràcia,8008,BARCELONA,932181993,Farmàcies,429586.276147,4583242.0,41.39763,2.157675
6,﻿75990027321,Farmàcia Marta Homs Balló,44403,C Bolívia,19,66,el Parc i la Llacuna del Poblenou,10,Sant Martí,8018,BARCELONA,933007615,Farmàcies,432210.448565,4583653.0,41.401557,2.189017
7,﻿75990032969,Farmàcia Forcada Llamusí,700032,Ptge Andalusia,14,16,la Bordeta,3,Sants-Montjuïc,8014,BARCELONA,933325745,Farmàcies,427532.206744,4580282.0,41.370794,2.133459
8,﻿75990052487,Farmàcia Rodriguez Martinez,200000,C Marquesa,3,4,"Sant Pere, Santa Caterina i la Ribera",1,Ciutat Vella,8003,BARCELONA,933199143,Farmàcies,431857.013416,4581596.0,41.383003,2.185021
9,﻿75990038139,Farmàcia Sanchís Foret,74506,Rbla Catalunya,117,7,la Dreta de l'Eixample,2,Eixample,8008,BARCELONA,932375556,Farmàcies,429653.058572,4582887.0,41.394445,2.158515
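
The preview shows two inconsistencies worth normalizing before any merge: phone numbers with and without spaces (`93 232 81 65` vs `932454391`) and town names in mixed case (`Barcelona` vs `BARCELONA`). A minimal sketch on values copied from the preview:

```python
import pandas as pd

toy_df = pd.DataFrame({
    "values_value": ["933540304", "93 232 81 65"],
    "addresses_town": ["Barcelona", "BARCELONA"],
})

# Keep digits only in phone numbers; trim and upper-case town names.
toy_df["values_value"] = toy_df["values_value"].str.replace(r"\D", "", regex=True)
toy_df["addresses_town"] = toy_df["addresses_town"].str.strip().str.upper()

print(toy_df)
```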


In [384]:
# Observe New Info
#display(pharmacies_bcn_df.info())

# Check if duplicates
has_duplicates(pharmacies_bcn_df, 'register_id')

There's duplicate values in column register_id


In [385]:
# Print the unique service types before aggregating them
print("Unique Characteristics:\n")
print(pharmacies_bcn_df['secondary_filters_name'].unique(), "\n")

# Unexpected service types to remove: none

# Join the service types (secondary_filters_name) of each register_id into one string and overwrite the column
aggregated_sec_filters = pharmacies_bcn_df.groupby('register_id')['secondary_filters_name'].transform(lambda x: ', '.join(x))
pharmacies_bcn_df['secondary_filters_name'] = aggregated_sec_filters

# Drop duplicated register_id
pharmacies_bcn_df.drop_duplicates(subset='register_id', inplace=True)
display(pharmacies_bcn_df.head(10))

Unique Characteristics:

['Farmàcies' "Farmàcies servei de 9 a 22 h tot l'any"
 'Farmàcies permanents'] 



Unnamed: 0,register_id,name,addresses_road_id,addresses_road_name,addresses_start_street_number,addresses_neighborhood_id,addresses_neighborhood_name,addresses_district_id,addresses_district_name,addresses_zip_code,addresses_town,values_value,secondary_filters_name,geo_epgs_25831_x,geo_epgs_25831_y,geo_epgs_4326_lat,geo_epgs_4326_lon
0,﻿75990017391,Farmàcia Hereus Lavilla,23200,Carrer d'Argullós,49,52,la Prosperitat,8,Nou Barris,8016,Barcelona,933540304,Farmàcies,431711.951527,4587997.0,41.440645,2.182564
1,﻿75990017537,Farmàcia Franquesa Massó,23403,Aribau,18,8,l'Antiga Esquerra de l'Eixample,2,Eixample,8011,BARCELONA,933174684,"Farmàcies, Farmàcies servei de 9 a 22 h tot l'any",429952.32885,4581992.0,41.386411,2.162198
3,﻿75990048243,Farmàcia Bañeres Merinero,100800,Carrer de la Diputació,336,7,la Dreta de l'Eixample,2,Eixample,8009,Barcelona,93 232 81 65,Farmàcies,431011.609336,4583053.0,41.396057,2.174745
4,﻿75990048357,Farmàcia Sales Llavià,100800,Diputació,391,7,la Dreta de l'Eixample,2,Eixample,8013,BARCELONA,932454391,Farmàcies,431136.071095,4583205.0,41.397434,2.176216
5,﻿75990064521,Farmàcia Pujol Pereita,148307,Pg Gràcia,129,31,la Vila de Gràcia,6,Gràcia,8008,BARCELONA,932181993,Farmàcies,429586.276147,4583242.0,41.39763,2.157675
6,﻿75990027321,Farmàcia Marta Homs Balló,44403,C Bolívia,19,66,el Parc i la Llacuna del Poblenou,10,Sant Martí,8018,BARCELONA,933007615,Farmàcies,432210.448565,4583653.0,41.401557,2.189017
7,﻿75990032969,Farmàcia Forcada Llamusí,700032,Ptge Andalusia,14,16,la Bordeta,3,Sants-Montjuïc,8014,BARCELONA,933325745,Farmàcies,427532.206744,4580282.0,41.370794,2.133459
8,﻿75990052487,Farmàcia Rodriguez Martinez,200000,C Marquesa,3,4,"Sant Pere, Santa Caterina i la Ribera",1,Ciutat Vella,8003,BARCELONA,933199143,Farmàcies,431857.013416,4581596.0,41.383003,2.185021
9,﻿75990038139,Farmàcia Sanchís Foret,74506,Rbla Catalunya,117,7,la Dreta de l'Eixample,2,Eixample,8008,BARCELONA,932375556,Farmàcies,429653.058572,4582887.0,41.394445,2.158515
10,﻿75990095731,Farmàcia Rubio Moreno,238400,C Padilla,378,33,el Baix Guinardó,7,Horta-Guinardó,8025,BARCELONA,934558245,Farmàcies,430573.810227,4584815.0,41.411885,2.169306


In [386]:
# Rename Columns
pharmacies_bcn_df = pharmacies_bcn_df.rename(columns={"addresses_road_id": "Street ID", "name": "Name", "addresses_road_name": "Road Name", "addresses_start_street_number": "Street Number",
                                                      "addresses_neighborhood_id": "Neighborhood ID", "addresses_neighborhood_name": "Neighborhood Name",
                                                      "addresses_district_id": "District ID", "addresses_district_name": "District Name",
                                                      "addresses_zip_code": "ZipCode","addresses_town": "Town",
                                                      "values_value": "Phone Number", "secondary_filters_name": "Service Type",
                                                      "geo_epgs_4326_lat": "Latitude", "geo_epgs_4326_lon": "Longitude"})
display(pharmacies_bcn_df.head(10))

Unnamed: 0,register_id,Name,Street ID,Road Name,Street Number,Neighborhood ID,Neighborhood Name,District ID,District Name,ZipCode,Town,Phone Number,Service Type,geo_epgs_25831_x,geo_epgs_25831_y,Latitude,Longitude
0,﻿75990017391,Farmàcia Hereus Lavilla,23200,Carrer d'Argullós,49,52,la Prosperitat,8,Nou Barris,8016,Barcelona,933540304,Farmàcies,431711.951527,4587997.0,41.440645,2.182564
1,﻿75990017537,Farmàcia Franquesa Massó,23403,Aribau,18,8,l'Antiga Esquerra de l'Eixample,2,Eixample,8011,BARCELONA,933174684,"Farmàcies, Farmàcies servei de 9 a 22 h tot l'any",429952.32885,4581992.0,41.386411,2.162198
3,﻿75990048243,Farmàcia Bañeres Merinero,100800,Carrer de la Diputació,336,7,la Dreta de l'Eixample,2,Eixample,8009,Barcelona,93 232 81 65,Farmàcies,431011.609336,4583053.0,41.396057,2.174745
4,﻿75990048357,Farmàcia Sales Llavià,100800,Diputació,391,7,la Dreta de l'Eixample,2,Eixample,8013,BARCELONA,932454391,Farmàcies,431136.071095,4583205.0,41.397434,2.176216
5,﻿75990064521,Farmàcia Pujol Pereita,148307,Pg Gràcia,129,31,la Vila de Gràcia,6,Gràcia,8008,BARCELONA,932181993,Farmàcies,429586.276147,4583242.0,41.39763,2.157675
6,﻿75990027321,Farmàcia Marta Homs Balló,44403,C Bolívia,19,66,el Parc i la Llacuna del Poblenou,10,Sant Martí,8018,BARCELONA,933007615,Farmàcies,432210.448565,4583653.0,41.401557,2.189017
7,﻿75990032969,Farmàcia Forcada Llamusí,700032,Ptge Andalusia,14,16,la Bordeta,3,Sants-Montjuïc,8014,BARCELONA,933325745,Farmàcies,427532.206744,4580282.0,41.370794,2.133459
8,﻿75990052487,Farmàcia Rodriguez Martinez,200000,C Marquesa,3,4,"Sant Pere, Santa Caterina i la Ribera",1,Ciutat Vella,8003,BARCELONA,933199143,Farmàcies,431857.013416,4581596.0,41.383003,2.185021
9,﻿75990038139,Farmàcia Sanchís Foret,74506,Rbla Catalunya,117,7,la Dreta de l'Eixample,2,Eixample,8008,BARCELONA,932375556,Farmàcies,429653.058572,4582887.0,41.394445,2.158515
10,﻿75990095731,Farmàcia Rubio Moreno,238400,C Padilla,378,33,el Baix Guinardó,7,Horta-Guinardó,8025,BARCELONA,934558245,Farmàcies,430573.810227,4584815.0,41.411885,2.169306
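
Both Open Data BCN frames are renamed with the same mapping, so it can be defined once and reused. A sketch; `BCN_COLUMN_NAMES` is a hypothetical constant holding a subset of the mapping used in the notebook, and the toy frames are invented:

```python
import pandas as pd

BCN_COLUMN_NAMES = {
    "name": "Name",
    "addresses_road_name": "Road Name",
    "addresses_zip_code": "ZipCode",
    "values_value": "Phone Number",
    "secondary_filters_name": "Service Type",
}

hospitals_toy = pd.DataFrame({"name": ["Hospital de Barcelona"], "addresses_zip_code": [8034.0]})
pharmacies_toy = pd.DataFrame({"name": ["Farmàcia Sales Llavià"], "values_value": ["932454391"]})

# By default rename ignores mapping keys that are absent from a frame.
hospitals_toy = hospitals_toy.rename(columns=BCN_COLUMN_NAMES)
pharmacies_toy = pharmacies_toy.rename(columns=BCN_COLUMN_NAMES)

print(hospitals_toy.columns.tolist(), pharmacies_toy.columns.tolist())
```

A single mapping guarantees that both frames end up with identical column names, which the later merge depends on.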


### DO NOT RUN: CAT

B- PHARMACIES CATALUNYA

In [294]:
# PHARMACIES CATALUNYA
display(pharmacies_cat_df.info())
print("\nSTATISTICS:\n")
#display(pharmacies_cat_df.describe())
non_null_cols, info_df = null_function(pharmacies_cat_df)
print("\nColumns with no null values:", non_null_cols)
print("\nColumns with null values:")
display(info_df)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3236 entries, 0 to 3235
Data columns (total 10 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   CODI FARMACIA  3236 non-null   object 
 1   NOM FARMACIA   3236 non-null   object 
 2   CODI ABS       3202 non-null   float64
 3   NOM ABS        3202 non-null   object 
 4   CODI MUNICIPI  3236 non-null   int64  
 5   NOM MUNICIPI   3236 non-null   object 
 6   CODI POSTAL    3236 non-null   int64  
 7   TIPUS VIA      3236 non-null   object 
 8   NOM VIA        3236 non-null   object 
 9   NUMERO VIA     3236 non-null   object 
dtypes: float64(1), int64(2), object(7)
memory usage: 252.9+ KB


None


STATISTICS:


Columns with no null values: Index(['CODI FARMACIA', 'NOM FARMACIA', 'CODI MUNICIPI', 'NOM MUNICIPI',
       'CODI POSTAL', 'TIPUS VIA', 'NOM VIA', 'NUMERO VIA'],
      dtype='object')

Columns with null values:


| | Column | Null Count | Null Percentage |
|---|---|---|---|
| 0 | CODI ABS | 34 | 1.05068 |
| 1 | NOM ABS | 34 | 1.05068 |


*Only CODI ABS and NOM ABS contain null values (about 1% each), so no column is dropped.*

In [295]:
# Observe New Info
#display(pharmacies_cat_df.info())

# Check if duplicates
has_duplicates(pharmacies_cat_df, 'CODI FARMACIA')

No duplicate values in column CODI FARMACIA


In [296]:
display(pharmacies_cat_df)

Unnamed: 0,CODI FARMACIA,NOM FARMACIA,CODI ABS,NOM ABS,CODI MUNICIPI,NOM MUNICIPI,CODI POSTAL,TIPUS VIA,NOM VIA,NUMERO VIA
0,0F0801960,"BRUNA REVERTER, M.ANGELS",76.0,BARCELONA 10-F,8019,BARCELONA,8025,CR,DOS DE MAIG,288
1,F08000025,"BERTRAN AMBROS, ROSA MARIA",128.0,BAIX BERGUEDÀ,8175,PUIG-REIG,8692,CR,SANT JORDI,11
2,F08000032,"ABRIL GARCIA, MARIA",9.0,ARENYS DE MAR,8006,ARENYS DE MAR,8350,RI,BISBE POL,72
3,F08000040,"RIGOLA GARROFE, DOLÇA",9.0,ARENYS DE MAR,8006,ARENYS DE MAR,8350,PL,ESGLESIA,3
4,F08000057,"MIAS NAVARRO, MIREIA",9.0,ARENYS DE MAR,8007,ARENYS DE MUNT,8358,RB,SANT MARTI,56
...,...,...,...,...,...,...,...,...,...,...
3231,F43711170,JANËS CASALS MARIA,357.0,CALAFELL,43037,CALAFELL,43820,PS,Maritim Sant Joan de Deu,238
3232,F43718867,"DÍAZ MARTÍ, MARÍA",91.0,CAMBRILS,43038,CAMBRILS,43850,AV,Diputació,12
3233,F43718874,"MARTÍNEZ NAVAS, MARTA",359.0,CUBELLES-CUNIT,43051,CUNIT,43881,CR,Puig i Cadafalch,3
3234,F43721965,Gas Aixendrí Carlos,13.0,EL VENDRELL,43163,EL VENDRELL,43700,CR,Narcís Monturiol,16-18


In [297]:
# Rename Columns
pharmacies_cat_df = pharmacies_cat_df.rename(columns={"CODI FARMACIA": "Id", "NOM FARMACIA": "Name", "CODI ABS": "ABS ID",
                                                      "NOM ABS": "ABS Name", "CODI MUNICIPI": "Town ID", "NOM MUNICIPI": "Town",
                                                      "CODI POSTAL": "ZipCode", "TIPUS VIA": "Roadtype", "NOM VIA": "Road Name",
                                                      "NUMERO VIA": "Street Number"})
display(pharmacies_cat_df.head(10))

Unnamed: 0,Id,Name,ABS ID,ABS Name,Town ID,Town,ZipCode,Roadtype,Road Name,Street Number
0,0F0801960,"BRUNA REVERTER, M.ANGELS",76.0,BARCELONA 10-F,8019,BARCELONA,8025,CR,DOS DE MAIG,288
1,F08000025,"BERTRAN AMBROS, ROSA MARIA",128.0,BAIX BERGUEDÀ,8175,PUIG-REIG,8692,CR,SANT JORDI,11
2,F08000032,"ABRIL GARCIA, MARIA",9.0,ARENYS DE MAR,8006,ARENYS DE MAR,8350,RI,BISBE POL,72
3,F08000040,"RIGOLA GARROFE, DOLÇA",9.0,ARENYS DE MAR,8006,ARENYS DE MAR,8350,PL,ESGLESIA,3
4,F08000057,"MIAS NAVARRO, MIREIA",9.0,ARENYS DE MAR,8007,ARENYS DE MUNT,8358,RB,SANT MARTI,56
5,F08000064,"GUILLEN VIDAL, M NURIA",10.0,ARGENTONA,8009,ARGENTONA,8310,CR,GRAN,22
6,F08000071,"BOIXAREU CARRERA,M.ANGELS",11.0,ARTÉS,8010,ARTÉS,8271,CR,ROCAFORT,26
7,F08000089,"MIMO CALLIS, LLUIS",121.0,LA GARRIGA,8014,AIGUAFREDA,8591,CT,RIBES,31
8,F08000096,"JOFRESA PRATS, E/PORTILLO SAAVEDRA, P",273.0,BADALONA 2,8015,BADALONA,8911,CR,PRIM,156
9,F08000106,"JIMENEZ GONZALEZ, ROSA M./MARTINEZ ROBLES, MAN...",273.0,BADALONA 2,8015,BADALONA,8911,CR,CANONGE BARANERA,60


Create a new dataset containing only the pharmacies located in Barcelona

In [298]:
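# Take an explicit copy so that adding columns later does not act on a slice of pharmacies_cat_df
pharmacies_cat_bcn_df = pharmacies_cat_df[pharmacies_cat_df['Town'] == 'BARCELONA'].copy()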
display(pharmacies_cat_bcn_df)

Unnamed: 0,Id,Name,ABS ID,ABS Name,Town ID,Town,ZipCode,Roadtype,Road Name,Street Number
0,0F0801960,"BRUNA REVERTER, M.ANGELS",76.0,BARCELONA 10-F,8019,BARCELONA,8025,CR,DOS DE MAIG,288
248,F08002664,"GONZALEZ SOLA,FRANCESC M",75.0,BARCELONA 10-E,8019,BARCELONA,8026,CR,MALLORCA,519
249,F08002671,"MAURI MARTINEZ, ROSA MA",32.0,BARCELONA 3-A,8019,BARCELONA,8004,CR,SALVA,33
250,F08002689,"SANZ JORNET, M. ALEJANDRA",35.0,BARCELONA 3-D,8019,BARCELONA,8014,CR,CREU COBERTA,95
251,F08002696,"DOMINGO ESGLEYES, MIRIAM",49.0,BARCELONA 6-C,8019,BARCELONA,8012,CR,GRAN DE GRACIA,237
...,...,...,...,...,...,...,...,...,...,...
1864,F08020014,"GONZALEZ MERCADER, MARGARITA",358.0,BARCELONA 6-E,8019,BARCELONA,8024,AV,POMPEU FABRA,2
1869,F08020060,"MORILLO BASAS, LOURDES",74.0,BARCELONA 10-D,8019,BARCELONA,8020,CR,PARAGUAY,25
1878,F08020173,"BRIO GUIBERNAU, MONTSERRAT",62.0,BARCELONA 8-F,8019,BARCELONA,8042,CR,ANTONIO MACHADO,18
1914,F08020624,"MARTINEZ PARDO, ISABEL",358.0,BARCELONA 6-E,8019,BARCELONA,8024,TS,DE DALT,110


Now add a 'Service Type' column that marks every row as a pharmacy

In [299]:
pharmacies_cat_bcn_df['Service Type'] = 'Farmàcia'
display(pharmacies_cat_bcn_df)

Unnamed: 0,Id,Name,ABS ID,ABS Name,Town ID,Town,ZipCode,Roadtype,Road Name,Street Number,Service Type
0,0F0801960,"BRUNA REVERTER, M.ANGELS",76.0,BARCELONA 10-F,8019,BARCELONA,8025,CR,DOS DE MAIG,288,Farmàcia
248,F08002664,"GONZALEZ SOLA,FRANCESC M",75.0,BARCELONA 10-E,8019,BARCELONA,8026,CR,MALLORCA,519,Farmàcia
249,F08002671,"MAURI MARTINEZ, ROSA MA",32.0,BARCELONA 3-A,8019,BARCELONA,8004,CR,SALVA,33,Farmàcia
250,F08002689,"SANZ JORNET, M. ALEJANDRA",35.0,BARCELONA 3-D,8019,BARCELONA,8014,CR,CREU COBERTA,95,Farmàcia
251,F08002696,"DOMINGO ESGLEYES, MIRIAM",49.0,BARCELONA 6-C,8019,BARCELONA,8012,CR,GRAN DE GRACIA,237,Farmàcia
...,...,...,...,...,...,...,...,...,...,...,...
1864,F08020014,"GONZALEZ MERCADER, MARGARITA",358.0,BARCELONA 6-E,8019,BARCELONA,8024,AV,POMPEU FABRA,2,Farmàcia
1869,F08020060,"MORILLO BASAS, LOURDES",74.0,BARCELONA 10-D,8019,BARCELONA,8020,CR,PARAGUAY,25,Farmàcia
1878,F08020173,"BRIO GUIBERNAU, MONTSERRAT",62.0,BARCELONA 8-F,8019,BARCELONA,8042,CR,ANTONIO MACHADO,18,Farmàcia
1914,F08020624,"MARTINEZ PARDO, ISABEL",358.0,BARCELONA 6-E,8019,BARCELONA,8024,TS,DE DALT,110,Farmàcia
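
When a frame is produced by boolean filtering without an explicit copy, adding a column to it in place can trigger pandas' `SettingWithCopyWarning`. The warning would not be visible here because warnings are suppressed at the top of the notebook. The sketch below, on toy data rather than the real catalogue, shows one way to sidestep the issue: `assign` returns a new frame instead of modifying the filtered one.

```python
import pandas as pd

# Toy stand-in for pharmacies_cat_df (values are illustrative only)
toy_cat_df = pd.DataFrame({
    "Id": ["F1", "F2", "F3"],
    "Town": ["BARCELONA", "PUIG-REIG", "BARCELONA"],
})

# Filter and add the constant column in one expression; assign returns a new frame
toy_bcn_df = (
    toy_cat_df[toy_cat_df["Town"] == "BARCELONA"]
    .assign(**{"Service Type": "Farmàcia"})
)
print(toy_bcn_df)
```

The `**{...}` form is needed only because the column name contains a space.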


Merge both datasets and rename columns (the merge is currently commented out)

In [300]:
#pharmacies_merged_df = pd.merge(pharmacies_bcn_df, pharmacies_cat_bcn_df, on='Name', how='outer')
#display(pharmacies_merged_df.head(10))
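
An exact match on `Name` would fail on any difference in case, accents, or punctuation between the two sources. The sketch below shows one hedged way to check how many rows would match before committing to a merge: it builds a normalized key and passes `indicator=True` to `pd.merge`. The helper `normalize_name`, the toy frames, and the assumption that both datasets carry comparable names are illustrative only.

```python
import unicodedata
import pandas as pd

def normalize_name(name):
    """Uppercase, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", str(name))
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.upper())
    return " ".join(cleaned.split())

# Toy frames (hypothetical values, not taken from the real files)
toy_bcn = pd.DataFrame({"Name": ["Martínez Pardo, Isabel", "Farmàcia Exemple"]})
toy_cat = pd.DataFrame({"Name": ["MARTINEZ PARDO, ISABEL", "SANZ JORNET, M. ALEJANDRA"]})

toy_bcn["name_key"] = toy_bcn["Name"].map(normalize_name)
toy_cat["name_key"] = toy_cat["Name"].map(normalize_name)

# The _merge column reports whether each row came from both frames or only one
merged = pd.merge(toy_bcn, toy_cat, on="name_key", how="outer",
                  suffixes=("_bcn", "_cat"), indicator=True)
print(merged["_merge"].value_counts())
```

If most rows end up as `left_only` or `right_only`, the name is probably not a usable key and another column (for example the address) would be worth trying.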

### 3) Day Centers

In [387]:
display(day_centers_df.info())
print("\nSTATISTICS:\n")
#display(day_centers_df.describe())
non_null_cols, info_df = null_function(day_centers_df)
print("\nColumns with no null values:", non_null_cols)
print("\nColumns with null values:")
display(info_df)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 153 entries, 0 to 152
Data columns (total 39 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   register_id                    153 non-null    object 
 1   name                           153 non-null    object 
 2   institution_id                 114 non-null    float64
 3   institution_name               114 non-null    object 
 4   created                        153 non-null    object 
 5   modified                       153 non-null    object 
 6   addresses_roadtype_id          0 non-null      float64
 7   addresses_roadtype_name        0 non-null      float64
 8   addresses_road_id              153 non-null    int64  
 9   addresses_road_name            153 non-null    object 
 10  addresses_start_street_number  153 non-null    int64  
 11  addresses_end_street_number    11 non-null     float64
 12  addresses_neighborhood_id      153 non-null    int

None


STATISTICS:


Columns with no null values: Index(['register_id', 'name', 'created', 'modified', 'addresses_road_id',
       'addresses_road_name', 'addresses_start_street_number',
       'addresses_neighborhood_id', 'addresses_neighborhood_name',
       'addresses_district_id', 'addresses_district_name',
       'addresses_zip_code', 'addresses_town', 'addresses_main_address',
       'values_id', 'values_attribute_id', 'values_category',
       'values_attribute_name', 'values_value', 'values_outstanding',
       'secondary_filters_id', 'secondary_filters_name',
       'secondary_filters_fullpath', 'secondary_filters_tree',
       'secondary_filters_asia_id', 'geo_epgs_25831_x', 'geo_epgs_25831_y',
       'geo_epgs_4326_lat', 'geo_epgs_4326_lon'],
      dtype='object')

Columns with null values:


Unnamed: 0,Column,Null Count,Null Percentage
0,institution_id,39,25.490196
1,institution_name,39,25.490196
2,addresses_roadtype_id,153,100.0
3,addresses_roadtype_name,153,100.0
4,addresses_end_street_number,142,92.810458
5,addresses_type,153,100.0
6,values_description,153,100.0
7,estimated_dates,153,100.0
8,start_date,153,100.0
9,end_date,153,100.0


*Based on the null counts above, we drop: institution_id, institution_name, addresses_roadtype_id, addresses_roadtype_name, addresses_end_street_number, addresses_type, values_description, estimated_dates, start_date and end_date. Most of these are entirely or almost entirely null; institution_id and institution_name are about 25% null. The columns created and modified are also dropped, since they are not used for now.*
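
The columns to drop are listed by hand in the next cell. As a sketch, a similar selection could be derived from the null percentages. The 90% threshold and the helper name `columns_above_null_threshold` are assumptions for illustration. With such a threshold, institution_id and institution_name would not be selected for this dataset (about 25% null), so they would still have to be dropped explicitly.

```python
import pandas as pd

def columns_above_null_threshold(df, threshold_pct=90.0):
    """List columns whose share of null values is at least threshold_pct."""
    null_pct = df.isnull().mean() * 100
    return null_pct[null_pct >= threshold_pct].index.tolist()

# Toy frame (illustrative values only)
toy_df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "institution_id": [1.0, None, 3.0, 4.0],      # 25% null
    "addresses_type": [None, None, None, None],   # 100% null
})
to_drop = columns_above_null_threshold(toy_df)
print(to_drop)
cleaned_df = toy_df.drop(columns=to_drop)
```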

In [388]:
# Drop the mostly-null columns, plus created/modified (unused for now)
columns_to_drop_day_centers = ['created', 'modified', 'institution_id', 'institution_name', 'addresses_roadtype_id', 'addresses_roadtype_name' ,
                                  'addresses_end_street_number', 'addresses_type', 'values_description', 'estimated_dates', 'start_date', 'end_date']

day_centers_df.drop(columns=columns_to_drop_day_centers, inplace=True)

# Not needed: road identifier (kept in the active list below), main-address indicator, attribute value identifier, attribute value,
# highlighted attribute values, equivalence tree identifier, equivalence tree path, equivalence tree, equivalence tree Asia Id
#useless_cols = ['addresses_road_id', 'addresses_main_address', 'values_id', 'values_attribute_id', 'values_outstanding', 'secondary_filters_id',
#                'secondary_filters_fullpath', 'secondary_filters_tree', 'secondary_filters_asia_id', 'values_category', 'values_attribute_name']
useless_cols = ['addresses_main_address', 'values_id', 'values_attribute_id', 'values_outstanding', 'secondary_filters_id',
                'secondary_filters_fullpath', 'secondary_filters_tree', 'secondary_filters_asia_id', 'values_category', 'values_attribute_name']

day_centers_df.drop(columns=useless_cols, inplace=True)
day_centers_df.head(10)

Unnamed: 0,register_id,name,addresses_road_id,addresses_road_name,addresses_start_street_number,addresses_neighborhood_id,addresses_neighborhood_name,addresses_district_id,addresses_district_name,addresses_zip_code,addresses_town,values_value,secondary_filters_name,geo_epgs_25831_x,geo_epgs_25831_y,geo_epgs_4326_lat,geo_epgs_4326_lon
0,﻿99400472632,Centre de Dia,268003,C Provença,514,6,la Sagrada Família,2,Eixample,8025,BARCELONA,934334180,Centres de dia gent gran,431326.451023,4584296.0,41.407278,2.17837
1,﻿99400472636,Centre de Dia,335100,Pl Tetuan,2,7,la Dreta de l'Eixample,2,Eixample,8010,BARCELONA,932323915,Centres de dia gent gran,431135.148181,4582831.0,41.394069,2.176248
2,﻿99400473710,Centre de Dia per a Gent Gran Verdum,361508,C Viladrosa,86,51,Verdun,8,Nou Barris,8042,BARCELONA,932765995,Centres de dia gent gran,431218.828248,4588311.0,41.443434,2.176626
3,﻿99400622969,Centre de Dia,226604,C Natzaret,16,39,Sant Genís dels Agudells,7,Horta-Guinardó,8035,BARCELONA,934343095,Centres de dia gent gran,428079.700192,4586023.0,41.422547,2.139324
4,﻿99400620504,Centre de Dia,89004,C Consell de Cent,210,8,l'Antiga Esquerra de l'Eixample,2,Eixample,8011,BARCELONA,933239692,Centres de dia gent gran,429600.651297,4581859.0,41.385177,2.158007
5,﻿99400622743,Centre de Dia,362205,C Vilana,10,25,Sant Gervasi - la Bonanova,5,Sarrià-Sant Gervasi,8022,BARCELONA,934171922,Centres de dia gent gran,427171.996418,4584115.0,41.405278,2.128692
6,﻿99400622844,Centre de Dia,325005,C Sicília,402,32,el Camp d'en Grassot i Gràcia Nova,6,Gràcia,8025,BARCELONA,934570984,Centres de dia gent gran,430398.403054,4584214.0,41.406461,2.167277
7,﻿99400109729,Centre de Dia La Torre Setze,339305,C Torre,16,27,el Putxet i el Farró,5,Sarrià-Sant Gervasi,8006,BARCELONA,932372642,Centres de dia gent gran,428853.709103,4584010.0,41.404482,2.148822
8,﻿99400622701,Centre de Dia,248800,C Pedrell,111,36,la Font d'en Fargues,7,Horta-Guinardó,8032,BARCELONA,933570055,Centres de dia gent gran,430055.406981,4586166.0,41.424009,2.162948
9,﻿99400622702,Centre de Dia,187207,Av Madrid,210,18,Sants,3,Sants-Montjuïc,8014,BARCELONA,933301135,Centres de dia gent gran,427634.918496,4581437.0,41.381205,2.13455


In [389]:
# Observe New Info
#display(day_centers_df.info())

# Check if duplicates
has_duplicates(day_centers_df, 'register_id')

There's duplicate values in column register_id
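
Before collapsing the duplicates, it can help to see which columns actually differ between rows that share a `register_id`. A minimal sketch on toy data (the values are made up): `duplicated(keep=False)` marks every row of a repeated id, and `nunique` per group shows which columns vary.

```python
import pandas as pd

toy_df = pd.DataFrame({
    "register_id": ["100", "100", "200"],
    "name": ["Centre A", "Centre A", "Centre B"],
    "secondary_filters_name": ["Centres de dia gent gran", "Residències gent gran",
                               "Centres de dia gent gran"],
})

# All rows whose register_id appears more than once
dup_rows = toy_df[toy_df.duplicated(subset="register_id", keep=False)]

# For each id, count distinct values per column; more than one means the column varies
varying = dup_rows.groupby("register_id").nunique().gt(1).any()
print(varying[varying].index.tolist())
```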


In [390]:
# Print the unique categories before aggregating them per register_id
print("Unique Characteristics:\n")
print(day_centers_df['secondary_filters_name'].unique(), "\n")

# No unexpected categories to remove: only Centres de dia gent gran, Residències gent gran and Hospitals i clíniques appear

# Join the secondary_filters_name values of each register_id and replace the column
aggregated_sec_filters = day_centers_df.groupby('register_id')['secondary_filters_name'].transform(lambda x: ', '.join(x))
day_centers_df['secondary_filters_name'] = aggregated_sec_filters
#display(day_centers_df.head(10))

# Drop duplicated register_id
day_centers_df.drop_duplicates(subset='register_id', inplace=True)
display(day_centers_df.head(10))


Unique Characteristics:

['Centres de dia gent gran' 'Residències gent gran'
 'Hospitals i clíniques'] 



Unnamed: 0,register_id,name,addresses_road_id,addresses_road_name,addresses_start_street_number,addresses_neighborhood_id,addresses_neighborhood_name,addresses_district_id,addresses_district_name,addresses_zip_code,addresses_town,values_value,secondary_filters_name,geo_epgs_25831_x,geo_epgs_25831_y,geo_epgs_4326_lat,geo_epgs_4326_lon
0,﻿99400472632,Centre de Dia,268003,C Provença,514,6,la Sagrada Família,2,Eixample,8025,BARCELONA,934334180,Centres de dia gent gran,431326.451023,4584296.0,41.407278,2.17837
1,﻿99400472636,Centre de Dia,335100,Pl Tetuan,2,7,la Dreta de l'Eixample,2,Eixample,8010,BARCELONA,932323915,Centres de dia gent gran,431135.148181,4582831.0,41.394069,2.176248
2,﻿99400473710,Centre de Dia per a Gent Gran Verdum,361508,C Viladrosa,86,51,Verdun,8,Nou Barris,8042,BARCELONA,932765995,Centres de dia gent gran,431218.828248,4588311.0,41.443434,2.176626
3,﻿99400622969,Centre de Dia,226604,C Natzaret,16,39,Sant Genís dels Agudells,7,Horta-Guinardó,8035,BARCELONA,934343095,Centres de dia gent gran,428079.700192,4586023.0,41.422547,2.139324
4,﻿99400620504,Centre de Dia,89004,C Consell de Cent,210,8,l'Antiga Esquerra de l'Eixample,2,Eixample,8011,BARCELONA,933239692,Centres de dia gent gran,429600.651297,4581859.0,41.385177,2.158007
5,﻿99400622743,Centre de Dia,362205,C Vilana,10,25,Sant Gervasi - la Bonanova,5,Sarrià-Sant Gervasi,8022,BARCELONA,934171922,Centres de dia gent gran,427171.996418,4584115.0,41.405278,2.128692
6,﻿99400622844,Centre de Dia,325005,C Sicília,402,32,el Camp d'en Grassot i Gràcia Nova,6,Gràcia,8025,BARCELONA,934570984,Centres de dia gent gran,430398.403054,4584214.0,41.406461,2.167277
7,﻿99400109729,Centre de Dia La Torre Setze,339305,C Torre,16,27,el Putxet i el Farró,5,Sarrià-Sant Gervasi,8006,BARCELONA,932372642,Centres de dia gent gran,428853.709103,4584010.0,41.404482,2.148822
8,﻿99400622701,Centre de Dia,248800,C Pedrell,111,36,la Font d'en Fargues,7,Horta-Guinardó,8032,BARCELONA,933570055,Centres de dia gent gran,430055.406981,4586166.0,41.424009,2.162948
9,﻿99400622702,Centre de Dia,187207,Av Madrid,210,18,Sants,3,Sants-Montjuïc,8014,BARCELONA,933301135,Centres de dia gent gran,427634.918496,4581437.0,41.381205,2.13455
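
The `transform` plus `drop_duplicates` approach above joins the category of every row, so if an id repeats the same category on several rows, the joined string would repeat it too. The sketch below is a variant that keeps each category once, in first-seen order. The helper name `collapse_categories` and the toy data are hypothetical.

```python
import pandas as pd

def collapse_categories(df, id_col="register_id", cat_col="secondary_filters_name"):
    """Keep one row per id, with the distinct categories joined by ', '."""
    joined = df.groupby(id_col)[cat_col].transform(
        lambda s: ", ".join(s.dropna().unique())
    )
    out = df.assign(**{cat_col: joined})
    return out.drop_duplicates(subset=id_col).reset_index(drop=True)

toy_df = pd.DataFrame({
    "register_id": ["100", "100", "100", "200"],
    "name": ["Centre A", "Centre A", "Centre A", "Centre B"],
    "secondary_filters_name": ["Centres de dia gent gran", "Residències gent gran",
                               "Centres de dia gent gran", "Centres de dia gent gran"],
})
result = collapse_categories(toy_df)
print(result)
```

Unlike the in-place version, this returns a new frame and leaves the input unchanged.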


In [391]:
# Rename Columns
day_centers_df = day_centers_df.rename(columns={"addresses_road_id": "Street ID", "name": "Name", "addresses_road_name": "Road Name", "addresses_start_street_number": "Street Number",
                                                      "addresses_neighborhood_id": "Neighborhood ID", "addresses_neighborhood_name": "Neighborhood Name",
                                                      "addresses_district_id": "District ID", "addresses_district_name": "District Name",
                                                      "addresses_zip_code": "ZipCode","addresses_town": "Town",
                                                      "values_value": "Phone Number", "secondary_filters_name": "Service Type",
                                                      "geo_epgs_4326_lat": "Latitude", "geo_epgs_4326_lon": "Longitude"})
display(day_centers_df.head(10))

Unnamed: 0,register_id,Name,Street ID,Road Name,Street Number,Neighborhood ID,Neighborhood Name,District ID,District Name,ZipCode,Town,Phone Number,Service Type,geo_epgs_25831_x,geo_epgs_25831_y,Latitude,Longitude
0,﻿99400472632,Centre de Dia,268003,C Provença,514,6,la Sagrada Família,2,Eixample,8025,BARCELONA,934334180,Centres de dia gent gran,431326.451023,4584296.0,41.407278,2.17837
1,﻿99400472636,Centre de Dia,335100,Pl Tetuan,2,7,la Dreta de l'Eixample,2,Eixample,8010,BARCELONA,932323915,Centres de dia gent gran,431135.148181,4582831.0,41.394069,2.176248
2,﻿99400473710,Centre de Dia per a Gent Gran Verdum,361508,C Viladrosa,86,51,Verdun,8,Nou Barris,8042,BARCELONA,932765995,Centres de dia gent gran,431218.828248,4588311.0,41.443434,2.176626
3,﻿99400622969,Centre de Dia,226604,C Natzaret,16,39,Sant Genís dels Agudells,7,Horta-Guinardó,8035,BARCELONA,934343095,Centres de dia gent gran,428079.700192,4586023.0,41.422547,2.139324
4,﻿99400620504,Centre de Dia,89004,C Consell de Cent,210,8,l'Antiga Esquerra de l'Eixample,2,Eixample,8011,BARCELONA,933239692,Centres de dia gent gran,429600.651297,4581859.0,41.385177,2.158007
5,﻿99400622743,Centre de Dia,362205,C Vilana,10,25,Sant Gervasi - la Bonanova,5,Sarrià-Sant Gervasi,8022,BARCELONA,934171922,Centres de dia gent gran,427171.996418,4584115.0,41.405278,2.128692
6,﻿99400622844,Centre de Dia,325005,C Sicília,402,32,el Camp d'en Grassot i Gràcia Nova,6,Gràcia,8025,BARCELONA,934570984,Centres de dia gent gran,430398.403054,4584214.0,41.406461,2.167277
7,﻿99400109729,Centre de Dia La Torre Setze,339305,C Torre,16,27,el Putxet i el Farró,5,Sarrià-Sant Gervasi,8006,BARCELONA,932372642,Centres de dia gent gran,428853.709103,4584010.0,41.404482,2.148822
8,﻿99400622701,Centre de Dia,248800,C Pedrell,111,36,la Font d'en Fargues,7,Horta-Guinardó,8032,BARCELONA,933570055,Centres de dia gent gran,430055.406981,4586166.0,41.424009,2.162948
9,﻿99400622702,Centre de Dia,187207,Av Madrid,210,18,Sants,3,Sants-Montjuïc,8014,BARCELONA,933301135,Centres de dia gent gran,427634.918496,4581437.0,41.381205,2.13455


### 4) Elderly People Residences

In [392]:
display(residences_df.info())
print("\nSTATISTICS:\n")
#display(residences_df.describe())
non_null_cols, info_df = null_function(residences_df)
print("\nColumns with no null values:", non_null_cols)
print("\nColumns with null values:")
display(info_df)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 267 entries, 0 to 266
Data columns (total 39 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   register_id                    267 non-null    object 
 1   name                           267 non-null    object 
 2   institution_id                 9 non-null      float64
 3   institution_name               9 non-null      object 
 4   created                        267 non-null    object 
 5   modified                       267 non-null    object 
 6   addresses_roadtype_id          0 non-null      float64
 7   addresses_roadtype_name        0 non-null      float64
 8   addresses_road_id              267 non-null    int64  
 9   addresses_road_name            267 non-null    object 
 10  addresses_start_street_number  267 non-null    int64  
 11  addresses_end_street_number    5 non-null      float64
 12  addresses_neighborhood_id      267 non-null    int

None


STATISTICS:


Columns with no null values: Index(['register_id', 'name', 'created', 'modified', 'addresses_road_id',
       'addresses_road_name', 'addresses_start_street_number',
       'addresses_neighborhood_id', 'addresses_neighborhood_name',
       'addresses_district_id', 'addresses_district_name',
       'addresses_zip_code', 'addresses_town', 'addresses_main_address',
       'secondary_filters_id', 'secondary_filters_name',
       'secondary_filters_fullpath', 'secondary_filters_tree',
       'secondary_filters_asia_id', 'geo_epgs_25831_x', 'geo_epgs_25831_y',
       'geo_epgs_4326_lat', 'geo_epgs_4326_lon'],
      dtype='object')

Columns with null values:


Unnamed: 0,Column,Null Count,Null Percentage
0,institution_id,258,96.629213
1,institution_name,258,96.629213
2,addresses_roadtype_id,267,100.0
3,addresses_roadtype_name,267,100.0
4,addresses_end_street_number,262,98.127341
5,addresses_type,267,100.0
6,values_id,2,0.749064
7,values_attribute_id,2,0.749064
8,values_category,2,0.749064
9,values_attribute_name,2,0.749064


*Based on the null counts, we drop: institution_id, institution_name, addresses_roadtype_id, addresses_roadtype_name, addresses_end_street_number, addresses_type, values_description, estimated_dates, start_date and end_date. The columns created and modified are also dropped, since they are not used for now.*

In [393]:
# Drop the mostly-null columns, plus created/modified (unused for now)
columns_to_drop_residences = ['created', 'modified', 'institution_id', 'institution_name', 'addresses_roadtype_id', 'addresses_roadtype_name' ,
                                  'addresses_end_street_number', 'addresses_type', 'values_description', 'estimated_dates', 'start_date', 'end_date']

residences_df.drop(columns=columns_to_drop_residences, inplace=True)

# Not needed: road identifier (kept in the active list below), main-address indicator, attribute value identifier, attribute value,
# highlighted attribute values, equivalence tree identifier, equivalence tree path, equivalence tree, equivalence tree Asia Id
#useless_cols = ['addresses_road_id', 'addresses_main_address', 'values_id', 'values_attribute_id', 'values_outstanding', 'secondary_filters_id',
#                'secondary_filters_fullpath', 'secondary_filters_tree', 'secondary_filters_asia_id', 'values_category', 'values_attribute_name']
useless_cols = ['addresses_main_address', 'values_id', 'values_attribute_id', 'values_outstanding', 'secondary_filters_id',
                'secondary_filters_fullpath', 'secondary_filters_tree', 'secondary_filters_asia_id', 'values_category', 'values_attribute_name']

residences_df.drop(columns=useless_cols, inplace=True)
residences_df.head(10)

Unnamed: 0,register_id,name,addresses_road_id,addresses_road_name,addresses_start_street_number,addresses_neighborhood_id,addresses_neighborhood_name,addresses_district_id,addresses_district_name,addresses_zip_code,addresses_town,values_value,secondary_filters_name,geo_epgs_25831_x,geo_epgs_25831_y,geo_epgs_4326_lat,geo_epgs_4326_lon
0,﻿75990033549,Residència Assistida Francisco Darder,66100,C Cardener,49,30,la Salut,6,Gràcia,8024,BARCELONA,932853558,Residències gent gran,429558.99312,4584637.0,41.410198,2.157186
1,﻿98334111453,Residència Assistida per a Gent Gran Venero,356100,C Venero,4,68,el Poblenou,10,Sant Martí,8005,BARCELONA,933002976,Residències gent gran,433292.910052,4583631.0,41.401453,2.201968
2,﻿75990056448,Residència Assistida Vil·la Salut,132806,Francesc Alegre,26,34,Can Baró,7,Horta-Guinardó,8024,BARCELONA,932108787,Residències gent gran,430290.528859,4585389.0,41.417036,2.165851
3,﻿75990706118,Residència Assistida per a Gent Gran Las Violetas,169409,G.V. Corts Catalanes,573,8,l'Antiga Esquerra de l'Eixample,2,Eixample,8011,BARCELONA,933238180,Residències gent gran,429941.00599,4581848.0,41.385113,2.162079
4,﻿98334113506,Residència Assistida Ama-Lur,350308,C València,281,7,la Dreta de l'Eixample,2,Eixample,8009,BARCELONA,934872875,Residències gent gran,430275.326517,4582925.0,41.394839,2.165953
5,﻿98334114740,Residència Assistida per a Gent Gran Urquinaon...,232902,Pl Urquinaona,1,7,la Dreta de l'Eixample,2,Eixample,8010,BARCELONA,933014499,Residències gent gran,430814.061413,4582287.0,41.389138,2.17247
6,﻿98336085708,Residència Assistida per a Gent Gran Royal Llar,169409,G.V. Corts Catalanes,695,5,el Fort Pienc,2,Eixample,8013,BARCELONA,932311118,Residències gent gran,431290.256604,4583169.0,41.397125,2.178065
7,﻿98336090941,Residència Assistida per a Gent Gran Virreina,109203,C Encarnació,31,31,la Vila de Gràcia,6,Gràcia,8012,BARCELONA,932198161,Residències gent gran,429725.930597,4584166.0,41.405968,2.159238
8,﻿99400001162,Residència Assistida per a Gent Gran La Vostra...,169409,G.V. Corts Catalanes,871,65,el Clot,10,Sant Martí,8018,BARCELONA,933208455,Residències gent gran,432469.816911,4584322.0,41.407612,2.192045
9,﻿99400468618,Residència Assistida per a Gent Gran Ballesol ...,11200,C Almogàvers,32,66,el Parc i la Llacuna del Poblenou,10,Sant Martí,8018,BARCELONA,933569383,Residències gent gran,431809.992654,4582677.0,41.392737,2.184337


In [394]:
# Observe New Info
#display(residences_df.info())

# Check if duplicates
has_duplicates(residences_df, 'register_id')

There's duplicate values in column register_id


In [395]:
# Print the unique categories before aggregating them per register_id
print("Unique Characteristics:\n")
print(residences_df['secondary_filters_name'].unique(), "\n")

# No unexpected categories to remove: only Residències gent gran, Centres de dia gent gran and Hospitals i clíniques appear

# Join the secondary_filters_name values of each register_id and replace the column
aggregated_sec_filters = residences_df.groupby('register_id')['secondary_filters_name'].transform(lambda x: ', '.join(x))
residences_df['secondary_filters_name'] = aggregated_sec_filters
#display(residences_df.head(10))

# Drop duplicated register_id
residences_df.drop_duplicates(subset='register_id', inplace=True)
display(residences_df.head(10))

Unique Characteristics:

['Residències gent gran' 'Centres de dia gent gran'
 'Hospitals i clíniques'] 



Unnamed: 0,register_id,name,addresses_road_id,addresses_road_name,addresses_start_street_number,addresses_neighborhood_id,addresses_neighborhood_name,addresses_district_id,addresses_district_name,addresses_zip_code,addresses_town,values_value,secondary_filters_name,geo_epgs_25831_x,geo_epgs_25831_y,geo_epgs_4326_lat,geo_epgs_4326_lon
0,﻿75990033549,Residència Assistida Francisco Darder,66100,C Cardener,49,30,la Salut,6,Gràcia,8024,BARCELONA,932853558,Residències gent gran,429558.99312,4584637.0,41.410198,2.157186
1,﻿98334111453,Residència Assistida per a Gent Gran Venero,356100,C Venero,4,68,el Poblenou,10,Sant Martí,8005,BARCELONA,933002976,Residències gent gran,433292.910052,4583631.0,41.401453,2.201968
2,﻿75990056448,Residència Assistida Vil·la Salut,132806,Francesc Alegre,26,34,Can Baró,7,Horta-Guinardó,8024,BARCELONA,932108787,Residències gent gran,430290.528859,4585389.0,41.417036,2.165851
3,﻿75990706118,Residència Assistida per a Gent Gran Las Violetas,169409,G.V. Corts Catalanes,573,8,l'Antiga Esquerra de l'Eixample,2,Eixample,8011,BARCELONA,933238180,Residències gent gran,429941.00599,4581848.0,41.385113,2.162079
4,﻿98334113506,Residència Assistida Ama-Lur,350308,C València,281,7,la Dreta de l'Eixample,2,Eixample,8009,BARCELONA,934872875,Residències gent gran,430275.326517,4582925.0,41.394839,2.165953
5,﻿98334114740,Residència Assistida per a Gent Gran Urquinaon...,232902,Pl Urquinaona,1,7,la Dreta de l'Eixample,2,Eixample,8010,BARCELONA,933014499,Residències gent gran,430814.061413,4582287.0,41.389138,2.17247
6,﻿98336085708,Residència Assistida per a Gent Gran Royal Llar,169409,G.V. Corts Catalanes,695,5,el Fort Pienc,2,Eixample,8013,BARCELONA,932311118,Residències gent gran,431290.256604,4583169.0,41.397125,2.178065
7,﻿98336090941,Residència Assistida per a Gent Gran Virreina,109203,C Encarnació,31,31,la Vila de Gràcia,6,Gràcia,8012,BARCELONA,932198161,Residències gent gran,429725.930597,4584166.0,41.405968,2.159238
8,﻿99400001162,Residència Assistida per a Gent Gran La Vostra...,169409,G.V. Corts Catalanes,871,65,el Clot,10,Sant Martí,8018,BARCELONA,933208455,Residències gent gran,432469.816911,4584322.0,41.407612,2.192045
9,﻿99400468618,Residència Assistida per a Gent Gran Ballesol ...,11200,C Almogàvers,32,66,el Parc i la Llacuna del Poblenou,10,Sant Martí,8018,BARCELONA,933569383,Residències gent gran,431809.992654,4582677.0,41.392737,2.184337


In [396]:
# Rename Columns
residences_df = residences_df.rename(columns={"addresses_road_id": "Street ID", "name": "Name", "addresses_road_name": "Road Name", "addresses_start_street_number": "Street Number",
                                                      "addresses_neighborhood_id": "Neighborhood ID", "addresses_neighborhood_name": "Neighborhood Name",
                                                      "addresses_district_id": "District ID", "addresses_district_name": "District Name",
                                                      "addresses_zip_code": "ZipCode","addresses_town": "Town",
                                                      "values_value": "Phone Number", "secondary_filters_name": "Service Type",
                                                      "geo_epgs_4326_lat": "Latitude", "geo_epgs_4326_lon": "Longitude"})
display(residences_df.head(10))

Unnamed: 0,register_id,Name,Street ID,Road Name,Street Number,Neighborhood ID,Neighborhood Name,District ID,District Name,ZipCode,Town,Phone Number,Service Type,geo_epgs_25831_x,geo_epgs_25831_y,Latitude,Longitude
0,﻿75990033549,Residència Assistida Francisco Darder,66100,C Cardener,49,30,la Salut,6,Gràcia,8024,BARCELONA,932853558,Residències gent gran,429558.99312,4584637.0,41.410198,2.157186
1,﻿98334111453,Residència Assistida per a Gent Gran Venero,356100,C Venero,4,68,el Poblenou,10,Sant Martí,8005,BARCELONA,933002976,Residències gent gran,433292.910052,4583631.0,41.401453,2.201968
2,﻿75990056448,Residència Assistida Vil·la Salut,132806,Francesc Alegre,26,34,Can Baró,7,Horta-Guinardó,8024,BARCELONA,932108787,Residències gent gran,430290.528859,4585389.0,41.417036,2.165851
3,﻿75990706118,Residència Assistida per a Gent Gran Las Violetas,169409,G.V. Corts Catalanes,573,8,l'Antiga Esquerra de l'Eixample,2,Eixample,8011,BARCELONA,933238180,Residències gent gran,429941.00599,4581848.0,41.385113,2.162079
4,﻿98334113506,Residència Assistida Ama-Lur,350308,C València,281,7,la Dreta de l'Eixample,2,Eixample,8009,BARCELONA,934872875,Residències gent gran,430275.326517,4582925.0,41.394839,2.165953
5,﻿98334114740,Residència Assistida per a Gent Gran Urquinaon...,232902,Pl Urquinaona,1,7,la Dreta de l'Eixample,2,Eixample,8010,BARCELONA,933014499,Residències gent gran,430814.061413,4582287.0,41.389138,2.17247
6,﻿98336085708,Residència Assistida per a Gent Gran Royal Llar,169409,G.V. Corts Catalanes,695,5,el Fort Pienc,2,Eixample,8013,BARCELONA,932311118,Residències gent gran,431290.256604,4583169.0,41.397125,2.178065
7,﻿98336090941,Residència Assistida per a Gent Gran Virreina,109203,C Encarnació,31,31,la Vila de Gràcia,6,Gràcia,8012,BARCELONA,932198161,Residències gent gran,429725.930597,4584166.0,41.405968,2.159238
8,﻿99400001162,Residència Assistida per a Gent Gran La Vostra...,169409,G.V. Corts Catalanes,871,65,el Clot,10,Sant Martí,8018,BARCELONA,933208455,Residències gent gran,432469.816911,4584322.0,41.407612,2.192045
9,﻿99400468618,Residència Assistida per a Gent Gran Ballesol ...,11200,C Almogàvers,32,66,el Parc i la Llacuna del Poblenou,10,Sant Martí,8018,BARCELONA,933569383,Residències gent gran,431809.992654,4582677.0,41.392737,2.184337
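
The day centers and the residences go through the same steps: drop columns, collapse duplicate ids, rename. The sketch below shows how these could be folded into one function, assuming both frames share the Open Data BCN column layout shown above. `clean_opendata_frame` and the shortened `RENAME_MAP` are hypothetical and cover only a few columns.

```python
import pandas as pd

RENAME_MAP = {
    "name": "Name",
    "addresses_road_name": "Road Name",
    "addresses_zip_code": "ZipCode",
    "secondary_filters_name": "Service Type",
}

def clean_opendata_frame(df, drop_cols, id_col="register_id"):
    """Drop unused columns, merge categories per id, and rename columns."""
    # errors="ignore" skips columns that a given dataset does not have
    out = df.drop(columns=drop_cols, errors="ignore")
    out["secondary_filters_name"] = out.groupby(id_col)[
        "secondary_filters_name"
    ].transform(lambda s: ", ".join(s))
    out = out.drop_duplicates(subset=id_col)
    return out.rename(columns=RENAME_MAP)

# Toy data (made-up values)
toy_df = pd.DataFrame({
    "register_id": ["1", "1", "2"],
    "name": ["Residència A", "Residència A", "Residència B"],
    "created": ["2020-01-01"] * 3,
    "secondary_filters_name": ["Residències gent gran", "Centres de dia gent gran",
                               "Residències gent gran"],
})
clean_df = clean_opendata_frame(toy_df, drop_cols=["created", "modified"])
print(clean_df)
```

A single function would keep the three datasets consistent if the drop list or the rename map changes later.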


### 5) Other Health Equipment (Bcn & Cat)

A- Health Equipment in Barcelona

In [397]:
display(health_equip_bcn_df.info())
print("\nSTATISTICS:\n")
#display(health_equip_bcn_df.describe())
non_null_cols, info_df = null_function(health_equip_bcn_df)
print("\nColumns with no null values:", non_null_cols)
print("\nColumns with null values:")
display(info_df)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2312 entries, 0 to 2311
Data columns (total 39 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   register_id                    2312 non-null   object 
 1   name                           2312 non-null   object 
 2   institution_id                 138 non-null    float64
 3   institution_name               138 non-null    object 
 4   created                        2312 non-null   object 
 5   modified                       2312 non-null   object 
 6   addresses_roadtype_id          0 non-null      float64
 7   addresses_roadtype_name        0 non-null      float64
 8   addresses_road_id              2311 non-null   float64
 9   addresses_road_name            2310 non-null   object 
 10  addresses_start_street_number  2310 non-null   float64
 11  addresses_end_street_number    182 non-null    float64
 12  addresses_neighborhood_id      2312 non-null   i

None


STATISTICS:


Columns with no null values: Index(['register_id', 'name', 'created', 'modified',
       'addresses_neighborhood_id', 'addresses_district_id', 'addresses_town',
       'addresses_main_address', 'geo_epgs_25831_x', 'geo_epgs_25831_y',
       'geo_epgs_4326_lat', 'geo_epgs_4326_lon'],
      dtype='object')

Columns with null values:


Unnamed: 0,Column,Null Count,Null Percentage
0,institution_id,2174,94.031142
1,institution_name,2174,94.031142
2,addresses_roadtype_id,2312,100.0
3,addresses_roadtype_name,2312,100.0
4,addresses_road_id,1,0.043253
5,addresses_road_name,2,0.086505
6,addresses_start_street_number,2,0.086505
7,addresses_end_street_number,2130,92.128028
8,addresses_neighborhood_name,1,0.043253
9,addresses_district_name,1,0.043253


*Based on the null counts, we drop: institution_id, institution_name, addresses_roadtype_id, addresses_roadtype_name, addresses_end_street_number, addresses_type, values_description, estimated_dates, start_date and end_date. The address columns with only one or two nulls are kept. The columns created and modified are also dropped, since they are not used for now.*
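
Unlike the previous datasets, a few address columns here have one or two null values. A minimal sketch for listing the affected rows, on toy data with made-up values, so they can be reviewed rather than silently kept:

```python
import pandas as pd

toy_df = pd.DataFrame({
    "name": ["Farmàcia A", "Laboratori B", "Centre C"],
    "addresses_road_name": ["C Exemple", None, "Av Mostra"],
    "addresses_start_street_number": [255.0, None, 12.0],
})

# Rows where at least one of the address columns is null
address_cols = ["addresses_road_name", "addresses_start_street_number"]
missing_address = toy_df[toy_df[address_cols].isnull().any(axis=1)]
print(missing_address)
```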

In [398]:
# Drop columns with almost all null values
columns_to_drop_health_equip_bcn = ['created', 'modified', 'institution_id', 'institution_name', 'addresses_roadtype_id', 'addresses_roadtype_name' ,
                                  'addresses_end_street_number', 'addresses_type', 'values_description', 'estimated_dates', 'start_date', 'end_date']

health_equip_bcn_df.drop(columns=columns_to_drop_health_equip_bcn, inplace=True)

# Not needed: road identifier, address indicator, attribute identifier value, attribute value, highlighted attribute values,
# equivalence tree identifier, equivalence tree path, equivalence tree, equivalence tree Asia ID
# (the earlier version commented out below also dropped addresses_road_id; the active list keeps it)
#useless_cols = ['addresses_road_id', 'addresses_main_address', 'values_id', 'values_attribute_id', 'values_outstanding', 'secondary_filters_id',
#                'secondary_filters_fullpath', 'secondary_filters_tree', 'secondary_filters_asia_id', 'values_category', 'values_attribute_name']
useless_cols = ['addresses_main_address', 'values_id', 'values_attribute_id', 'values_outstanding', 'secondary_filters_id',
                'secondary_filters_fullpath', 'secondary_filters_tree', 'secondary_filters_asia_id', 'values_category', 'values_attribute_name']

health_equip_bcn_df.drop(columns=useless_cols, inplace=True)
health_equip_bcn_df.head(10)

Unnamed: 0,register_id,name,addresses_road_id,addresses_road_name,addresses_start_street_number,addresses_neighborhood_id,addresses_neighborhood_name,addresses_district_id,addresses_district_name,addresses_zip_code,addresses_town,values_value,secondary_filters_name,geo_epgs_25831_x,geo_epgs_25831_y,geo_epgs_4326_lat,geo_epgs_4326_lon
0,﻿75990003218,Farmàcia Berge Sahli,230601.0,Mare de Déu de Port,255.0,13,la Marina de Port,3,Sants-Montjuïc,8038.0,BARCELONA,933311341,Farmàcies,428303.684524,4578954.0,41.358896,2.142841
1,﻿75990023524,Laboratori Fornells Olo Crespo *Urgell,349706.0,C Comte d'Urgell,288.0,8,l'Antiga Esquerra de l'Eixample,2,Eixample,8036.0,BARCELONA,933219870,,428579.714057,4582635.0,41.392077,2.145707
2,﻿75990017609,Farmàcia Urgelles Fabregas,23403.0,Aribau,36.0,8,l'Antiga Esquerra de l'Eixample,2,Eixample,8011.0,BARCELONA,934523190,Farmàcies,429858.894739,4582088.0,41.38726,2.161069
3,﻿75990017682,Farmàcia Torres *24 hores,23403.0,Aribau,62.0,8,l'Antiga Esquerra de l'Eixample,2,Eixample,8011.0,BARCELONA,934539220,Farmàcies,429704.219209,4582246.0,41.388671,2.159201
4,﻿75990017682,Farmàcia Torres *24 hores,23403.0,Aribau,62.0,8,l'Antiga Esquerra de l'Eixample,2,Eixample,8011.0,BARCELONA,934539220,Farmàcies permanents,429704.219209,4582246.0,41.388671,2.159201
5,﻿1015170605,Centre d'Atenció Primària Sardenya,76807.0,C Sardenya,466.0,33,el Baix Guinardó,7,Horta-Guinardó,8025.0,BARCELONA,935674380,CAPs,430258.038349,4584550.0,41.409469,2.165559
6,﻿68125439,Instituto Oftalmológico Tres Torres,28000.0,Via Augusta,281.0,24,les Tres Torres,5,Sarrià-Sant Gervasi,8017.0,BARCELONA,900842848,Hospitals i clíniques,427267.424673,4583269.0,41.39767,2.129935
7,﻿75990034354,Farmàcia Guasch Badell,68401.0,Carreras i Candi,49.0,17,Sants - Badal,3,Sants-Montjuïc,8028.0,BARCELONA,934224392,Farmàcies,427056.260311,4580566.0,41.373308,2.127735
8,﻿11090446,Centre d’Atenció a la Salut Sexual i Reproduct...,140203.0,Av Drassanes,17.0,1,el Raval,1,Ciutat Vella,8001.0,BARCELONA,934431864,Planificació familiar,430878.786071,4580889.0,41.37655,2.173403
9,﻿75990048878,Farmàcia Sabes Arenillas,102703.0,C Doctor Letamendi,13.0,43,Horta,7,Horta-Guinardó,8031.0,BARCELONA,934279094,Farmàcies,429739.770789,4587393.0,41.435032,2.159029


In [399]:
# Observe New Info
#display(health_equip_bcn_df.info())

# Check if duplicates
has_duplicates(health_equip_bcn_df, 'register_id')

There's duplicate values in column register_id
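The message above only reports that duplicates exist. To see which rows repeat, `DataFrame.duplicated` can be used directly. This is a minimal sketch on a toy frame modelled on the rows shown earlier; `toy_df`, `dup_mask`, `dup_rows` and `dup_counts` are illustrative names.

```python
import pandas as pd

toy_df = pd.DataFrame({
    "register_id": ["75990017682", "75990017682", "1015170605"],
    "secondary_filters_name": ["Farmàcies", "Farmàcies permanents", "CAPs"],
})

# keep=False marks every row of a duplicated group, not only the repeats
dup_mask = toy_df.duplicated(subset="register_id", keep=False)
dup_rows = toy_df[dup_mask]

# Number of rows per repeated register_id
dup_counts = dup_rows["register_id"].value_counts()
```

Here the repeated identifier differs only in `secondary_filters_name`, which is the pattern the next cell handles by joining the labels.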


In [400]:
# Print unique values of the characteristics column before merging
print("Unique Characteristics:\n")
print(health_equip_bcn_df['secondary_filters_name'].unique(), "\n")

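# Remove rows without a characteristic (NaN)
health_equip_bcn_df.dropna(subset=['secondary_filters_name'], inplace=True)
# NOTE: the 'Planificació familiar' filter is stored in a separate variable that is not used in the steps below,
# so those rows remain in health_equip_bcn_df (see the output table)
education_equip_df = health_equip_bcn_df[health_equip_bcn_df['secondary_filters_name'] != 'Planificació familiar']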

# Aggregate secondary_filters_name (caract.) by register_id and replace column
aggregated_sec_filters = health_equip_bcn_df.groupby('register_id')['secondary_filters_name'].transform(lambda x: ', '.join(x))
health_equip_bcn_df['secondary_filters_name'] = aggregated_sec_filters
#display(education_equip_df.head(10))

# Drop duplicated register_id
health_equip_bcn_df.drop_duplicates(subset='register_id', inplace=True)
display(health_equip_bcn_df.head(10))


Unique Characteristics:

['Farmàcies' nan 'Farmàcies permanents' 'CAPs' 'Hospitals i clíniques'
 'Planificació familiar' "Farmàcies servei de 9 a 22 h tot l'any"
 'Centres urgències (CUAPs)' 'Centres de dia gent gran'
 'Residències gent gran'] 



Unnamed: 0,register_id,name,addresses_road_id,addresses_road_name,addresses_start_street_number,addresses_neighborhood_id,addresses_neighborhood_name,addresses_district_id,addresses_district_name,addresses_zip_code,addresses_town,values_value,secondary_filters_name,geo_epgs_25831_x,geo_epgs_25831_y,geo_epgs_4326_lat,geo_epgs_4326_lon
0,﻿75990003218,Farmàcia Berge Sahli,230601.0,Mare de Déu de Port,255.0,13,la Marina de Port,3,Sants-Montjuïc,8038.0,BARCELONA,933311341,Farmàcies,428303.684524,4578954.0,41.358896,2.142841
2,﻿75990017609,Farmàcia Urgelles Fabregas,23403.0,Aribau,36.0,8,l'Antiga Esquerra de l'Eixample,2,Eixample,8011.0,BARCELONA,934523190,Farmàcies,429858.894739,4582088.0,41.38726,2.161069
3,﻿75990017682,Farmàcia Torres *24 hores,23403.0,Aribau,62.0,8,l'Antiga Esquerra de l'Eixample,2,Eixample,8011.0,BARCELONA,934539220,"Farmàcies, Farmàcies permanents",429704.219209,4582246.0,41.388671,2.159201
5,﻿1015170605,Centre d'Atenció Primària Sardenya,76807.0,C Sardenya,466.0,33,el Baix Guinardó,7,Horta-Guinardó,8025.0,BARCELONA,935674380,CAPs,430258.038349,4584550.0,41.409469,2.165559
6,﻿68125439,Instituto Oftalmológico Tres Torres,28000.0,Via Augusta,281.0,24,les Tres Torres,5,Sarrià-Sant Gervasi,8017.0,BARCELONA,900842848,Hospitals i clíniques,427267.424673,4583269.0,41.39767,2.129935
7,﻿75990034354,Farmàcia Guasch Badell,68401.0,Carreras i Candi,49.0,17,Sants - Badal,3,Sants-Montjuïc,8028.0,BARCELONA,934224392,Farmàcies,427056.260311,4580566.0,41.373308,2.127735
8,﻿11090446,Centre d’Atenció a la Salut Sexual i Reproduct...,140203.0,Av Drassanes,17.0,1,el Raval,1,Ciutat Vella,8001.0,BARCELONA,934431864,Planificació familiar,430878.786071,4580889.0,41.37655,2.173403
9,﻿75990048878,Farmàcia Sabes Arenillas,102703.0,C Doctor Letamendi,13.0,43,Horta,7,Horta-Guinardó,8031.0,BARCELONA,934279094,Farmàcies,429739.770789,4587393.0,41.435032,2.159029
11,﻿75990049258,Clínica Mi Tres Torres,103502.0,C Doctor Roux,76.0,24,les Tres Torres,5,Sarrià-Sant Gervasi,8017.0,BARCELONA,932041300,Hospitals i clíniques,427179.337491,4583445.0,41.399249,2.12886
13,﻿75990034430,Farmàcia Genis Callol,68503.0,C Carretes,35.0,1,el Raval,1,Ciutat Vella,8001.0,BARCELONA,934417184,Farmàcies,430387.13083,4580995.0,41.377466,2.167512
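The cell above joins the labels with `transform` and then drops the repeated `register_id` rows. The same result can be obtained in one step with `groupby(...).agg`. This is a hedged alternative, not the notebook's method; the toy frame and the names `agg_rules` and `collapsed_df` are assumptions for illustration.

```python
import pandas as pd

toy_df = pd.DataFrame({
    "register_id": ["75990017682", "75990017682", "1015170605"],
    "name": ["Farmàcia Torres", "Farmàcia Torres", "CAP Sardenya"],
    "secondary_filters_name": ["Farmàcies", "Farmàcies permanents", "CAPs"],
})

# For each register_id: keep the first value of every column,
# but join all service labels into one comma-separated string.
agg_rules = {col: "first" for col in toy_df.columns if col != "register_id"}
agg_rules["secondary_filters_name"] = lambda labels: ", ".join(labels)

collapsed_df = (
    toy_df.groupby("register_id", as_index=False, sort=False).agg(agg_rules)
)
```

With `sort=False` the groups keep their order of first appearance, which matches the row order left by `drop_duplicates`; the index, however, is renumbered from 0.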


In [401]:
# Rename Columns
health_equip_bcn_df = health_equip_bcn_df.rename(columns={"addresses_road_id": "Street ID", "name": "Name", "addresses_road_name": "Road Name", "addresses_start_street_number": "Street Number",
                                                      "addresses_neighborhood_id": "Neighborhood ID", "addresses_neighborhood_name": "Neighborhood Name",
                                                      "addresses_district_id": "District ID", "addresses_district_name": "District Name",
                                                      "addresses_zip_code": "ZipCode","addresses_town": "Town",
                                                      "values_value": "Phone Number", "secondary_filters_name": "Service Type",
                                                      "geo_epgs_4326_lat": "Latitude", "geo_epgs_4326_lon": "Longitude"})
display(health_equip_bcn_df.head(10))

Unnamed: 0,register_id,Name,Street ID,Road Name,Street Number,Neighborhood ID,Neighborhood Name,District ID,District Name,ZipCode,Town,Phone Number,Service Type,geo_epgs_25831_x,geo_epgs_25831_y,Latitude,Longitude
0,﻿75990003218,Farmàcia Berge Sahli,230601.0,Mare de Déu de Port,255.0,13,la Marina de Port,3,Sants-Montjuïc,8038.0,BARCELONA,933311341,Farmàcies,428303.684524,4578954.0,41.358896,2.142841
2,﻿75990017609,Farmàcia Urgelles Fabregas,23403.0,Aribau,36.0,8,l'Antiga Esquerra de l'Eixample,2,Eixample,8011.0,BARCELONA,934523190,Farmàcies,429858.894739,4582088.0,41.38726,2.161069
3,﻿75990017682,Farmàcia Torres *24 hores,23403.0,Aribau,62.0,8,l'Antiga Esquerra de l'Eixample,2,Eixample,8011.0,BARCELONA,934539220,"Farmàcies, Farmàcies permanents",429704.219209,4582246.0,41.388671,2.159201
5,﻿1015170605,Centre d'Atenció Primària Sardenya,76807.0,C Sardenya,466.0,33,el Baix Guinardó,7,Horta-Guinardó,8025.0,BARCELONA,935674380,CAPs,430258.038349,4584550.0,41.409469,2.165559
6,﻿68125439,Instituto Oftalmológico Tres Torres,28000.0,Via Augusta,281.0,24,les Tres Torres,5,Sarrià-Sant Gervasi,8017.0,BARCELONA,900842848,Hospitals i clíniques,427267.424673,4583269.0,41.39767,2.129935
7,﻿75990034354,Farmàcia Guasch Badell,68401.0,Carreras i Candi,49.0,17,Sants - Badal,3,Sants-Montjuïc,8028.0,BARCELONA,934224392,Farmàcies,427056.260311,4580566.0,41.373308,2.127735
8,﻿11090446,Centre d’Atenció a la Salut Sexual i Reproduct...,140203.0,Av Drassanes,17.0,1,el Raval,1,Ciutat Vella,8001.0,BARCELONA,934431864,Planificació familiar,430878.786071,4580889.0,41.37655,2.173403
9,﻿75990048878,Farmàcia Sabes Arenillas,102703.0,C Doctor Letamendi,13.0,43,Horta,7,Horta-Guinardó,8031.0,BARCELONA,934279094,Farmàcies,429739.770789,4587393.0,41.435032,2.159029
11,﻿75990049258,Clínica Mi Tres Torres,103502.0,C Doctor Roux,76.0,24,les Tres Torres,5,Sarrià-Sant Gervasi,8017.0,BARCELONA,932041300,Hospitals i clíniques,427179.337491,4583445.0,41.399249,2.12886
13,﻿75990034430,Farmàcia Genis Callol,68503.0,C Carretes,35.0,1,el Raval,1,Ciutat Vella,8001.0,BARCELONA,934417184,Farmàcies,430387.13083,4580995.0,41.377466,2.167512


### DO NOT RUN (CAT)

B- Other Health Equipment in Catalunya

In [316]:
display(health_equip_cat_df.info())
print("\nSTATISTICS:\n")
#display(health_equip_cat_df.describe())
non_null_cols, info_df = null_function(health_equip_cat_df)
print("\nColumns with no null values:", non_null_cols)
print("\nColumns with null values:")
display(info_df)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5743 entries, 0 to 5742
Data columns (total 34 columns):
 #   Column                  Non-Null Count  Dtype  
---  ------                  --------------  -----  
 0   Número registre         5742 non-null   object 
 1   Nom establiment         5743 non-null   object 
 2   Codi tipus establiment  5743 non-null   object 
 3   Tipus establiment       5743 non-null   object 
 4   Codi establiment pare   316 non-null    object 
 5   Titular                 5563 non-null   object 
 6   Telèfon                 4867 non-null   object 
 7   Codi tipus de via       5743 non-null   object 
 8   Tipus de via            5743 non-null   object 
 9   Nom de la via           5743 non-null   object 
 10  Número de la via        5732 non-null   object 
 11  Quilòmetre              8 non-null      float64
 12  Bloc                    22 non-null     object 
 13  Portal                  36 non-null     object 
 14  Escala                  18 non-null     

None


STATISTICS:


Columns with no null values: Index(['Nom establiment', 'Codi tipus establiment', 'Tipus establiment',
       'Codi tipus de via', 'Tipus de via', 'Nom de la via', 'Codi postal',
       'Codi municipi', 'Codi província'],
      dtype='object')

Columns with null values:


Unnamed: 0,Column,Null Count,Null Percentage
0,Número registre,1,0.017413
1,Codi establiment pare,5427,94.497649
2,Titular,180,3.13425
3,Telèfon,876,15.253352
4,Número de la via,11,0.191538
5,Quilòmetre,5735,99.8607
6,Bloc,5721,99.616925
7,Portal,5707,99.37315
8,Escala,5725,99.686575
9,Pis,5489,95.577224


*Given the share of null values in each column, we drop: Codi establiment pare, Quilòmetre, Bloc, Portal, Escala, Pis, Porta, Article municipi.*

In [317]:
# Drop columns with almost all null values
columns_to_drop_health_equip_cat = ['Codi establiment pare', 'Quilòmetre', 'Bloc', 'Portal', 'Escala', 'Pis', 'Porta', 'Article municipi']

health_equip_cat_df.drop(columns=columns_to_drop_health_equip_cat, inplace=True)

# Not needed: Codi tipus establiment, Titular, Codi tipus de via, Georeferència
useless_cols = ['Codi tipus establiment', 'Titular', 'Codi tipus de via', 'Georeferència']
health_equip_cat_df.drop(columns=useless_cols, inplace=True)
health_equip_cat_df.head(10)

Unnamed: 0,Número registre,Nom establiment,Tipus establiment,Telèfon,Tipus de via,Nom de la via,Número de la via,Codi postal,Codi municipi,Municipi,...,Codi província,Província,Codi ABS,ABS,Codi sector sanitari,Sector sanitari,Codi regió sanitària,Regió sanitària,Longitud,Latitud
0,R25000226,MALDA,Farmaciola,,CARRER,JESUS,26,25266,25130,MALDÀ,...,25,LLEIDA,81.0,Bellpuig,6156.0,Lleida,61.0,Lleida,1.039757,41.550738
1,F25001621,"GARCIA ALDAVO, ERNEST",Farmàcia,973450214.0,PASSEIG,ESTACIO,10,25600,25040,BALAGUER,...,25,LLEIDA,14.0,Balaguer,6156.0,Lleida,61.0,Lleida,0.808943,41.789271
2,R43000113,"MUNTELLS, ELS",Farmaciola,468111.0,CARRER,CONSULTORI,,43877,43902,SANT JAUME D'ENVEJA,...,43,TARRAGONA,4.0,Amposta,6360.0,Terres de l'Ebre,63.0,Terres de l'Ebre,0.718199,40.705043
3,F08024315,"MATARO REIXACH, ROGER",Farmàcia,938831084.0,CARRER,SANT JOSEP,40,8540,8067,CENTELLES,...,8,BARCELONA,102.0,Centelles,6745.0,Osona,67.0,Catalunya Central,2.220891,41.796479
4,R25000113,"PLA DE LA FONT,EL",Farmaciola,748083.0,CARRER,ESCALES,S/N,25110,25912,GIMENELLS I EL PLA DE,...,25,LLEIDA,334.0,Almacelles,6156.0,Lleida,61.0,Lleida,0.391117,41.65192
5,V17703412,MURALLA OPTICA,Òptica,972332510.0,CARRER,BESALÚ,9,17600,17066,FIGUERES,...,17,GIRONA,119.0,Figueres,6461.0,Girona Nord,64.0,Girona,2.960206,42.267031
6,V17532460,VISTAOPTICA,Òptica,,CARRER,DEL BORN,14,17820,17015,BANYOLES,...,17,GIRONA,15.0,Banyoles,6462.0,Girona Sud,64.0,Girona,2.765803,42.118
7,R25000152,SUNYER,Farmaciola,,CARRER,LA BASSA,21,25174,25212,SUNYER,...,25,LLEIDA,329.0,Lleida rural - 2 Sud,6156.0,Lleida,61.0,Lleida,0.593033,41.522585
8,V43504375,ALAIN AFFLELOU ÓPTICO,Òptica,977549869.0,CARRETERA,DE REUS,4,43006,43148,TARRAGONA,...,43,TARRAGONA,0.0,No informat,0.0,Desconegut,0.0,No informat,1.233172,41.117733
9,V17713516,ALAIN AFFLELOU,Òptica,627790594.0,CARRER,PERELADA,11,17600,17066,FIGUERES,...,17,GIRONA,119.0,Figueres,6461.0,Girona Nord,64.0,Girona,2.961502,42.267731


In [318]:
# Observe New Info
#display(health_equip_cat_df.info())

# Check if duplicates
has_duplicates(health_equip_cat_df, 'Número registre')

No duplicate values in column Número registre


In [319]:
health_equip_cat_bcn_df = health_equip_cat_df[health_equip_cat_df['Municipi'] == 'BARCELONA']
display(health_equip_cat_bcn_df)

Unnamed: 0,Número registre,Nom establiment,Tipus establiment,Telèfon,Tipus de via,Nom de la via,Número de la via,Codi postal,Codi municipi,Municipi,...,Codi província,Província,Codi ABS,ABS,Codi sector sanitari,Sector sanitari,Codi regió sanitària,Regió sanitària,Longitud,Latitud
14,F08004608,"AGUILAR PEREZ, MARIA FRANCISCA",Farmàcia,933170291,CARRER,AUSIAS MARC,31,8010,8019,BARCELONA,...,8,BARCELONA,28.0,Barcelona - 02H,7851.0,Barcelona Eixample,78.0,Barcelona,2.175330,41.391272
16,F08003322,"DUCH FORADAT, MARIA DOLORES",Farmàcia,934068433,PLAÇA,VALLVIDRERA,5,8017,8019,BARCELONA,...,8,BARCELONA,45.0,Barcelona - 05D,7850.0,Barcelona Sarrià - Sant Gervasi,78.0,Barcelona,2.104771,41.412979
23,F08017886,"MARTÍNEZ MORALES, MARIA ANGELES",Farmàcia,933149760,CARRER,ANDRADE,151,8020,8019,BARCELONA,...,8,BARCELONA,331.0,Barcelona - 10J,7847.0,Barcelona Sant Martí,78.0,Barcelona,2.201153,41.416131
30,F08012870,"VIVES SEGUI, MA ASSUMPCIO",Farmàcia,933146860,CARRER,GRAN VIA CORTS CATALANES,1158,8020,8019,BARCELONA,...,8,BARCELONA,74.0,Barcelona - 10D,7847.0,Barcelona Sant Martí,78.0,Barcelona,2.207452,41.418166
32,V08714862,LIFE ÒPTICA,Òptica,935415728,CARRER,VILLARROEL,69,8011,8019,BARCELONA,...,8,BARCELONA,24.0,Barcelona - 02D,7851.0,Barcelona Eixample,78.0,Barcelona,2.158681,41.383622
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5730,F08018569,"LIARTE CHICO, VANESSA/ROIG ESCOBAR, BERTA",Farmàcia,932195289,CARRER,ALBERT LLANAS,28,8024,8019,BARCELONA,...,8,BARCELONA,51.0,Barcelona - 07A,7853.0,Barcelona Horta - Guinardó,78.0,Barcelona,2.158040,41.416806
5732,V08518057,IT OPTICS,Òptica,,CARRER,LLOGREGOS,126,8032,8019,BARCELONA,...,8,BARCELONA,54.0,Barcelona - 07D,7853.0,Barcelona Horta - Guinardó,78.0,Barcelona,2.155897,41.424291
5734,F08008200,"RUBIS MARIN, GEMMA",Farmàcia,933544016,CARRER,SANTA ENGRACIA,73,8016,8019,BARCELONA,...,8,BARCELONA,61.0,Barcelona 8-E,7854.0,Barcelona Nou Barris,78.0,Barcelona,2.178574,41.441050
5736,F08006274,"SAMARANCH MIRADA,MARIANO",Farmàcia,932376210,CARRER,SARAGOSSA,42,8006,8019,BARCELONA,...,8,BARCELONA,43.0,Barcelona - 05B,7850.0,Barcelona Sarrià - Sant Gervasi,78.0,Barcelona,2.148439,41.402592


In [320]:
health_equip_cat_bcn_df['Tipus establiment'].unique()

array(['Farmàcia', 'Òptica', 'Audiopròtesi', 'Ortopèdia', 'Farmaciola'],
      dtype=object)

In [321]:
# Rename Columns
health_equip_cat_bcn_df = health_equip_cat_bcn_df.rename(columns={"Número registre": "Id", "Nom establiment": "Name", "Tipus establiment": "Service Type",
                                                                  "Telèfon": "Phone Number", "Tipus de via": "Roadtype", "Nom de la via": "Road Name",
                                                                  "Número de la via": "Street Number", "Codi postal": "ZipCode", "Codi municipi": "Town ID",
                                                                  "Municipi": "Town", "Codi comarca": "Region ID", "Comarca": "Region", "Codi província": "Province ID",
                                                                  "Província": "Province", "Codi ABS": "ABS ID", "ABS": "ABS Name",
                                                                  "Codi sector sanitari": "Sanitary Sector ID", "Sector sanitari": "Sanitary Sector",
                                                                  "Codi regió sanitària": "Sanitary Region ID", "Regió sanitària": "Sanitary Region",
                                                                  "Longitud": "Longitude", "Latitud": "Latitude"})

display(health_equip_cat_bcn_df.head(10))

Unnamed: 0,Id,Name,Service Type,Phone Number,Roadtype,Road Name,Street Number,ZipCode,Town ID,Town,...,Province ID,Province,ABS ID,ABS Name,Sanitary Sector ID,Sanitary Sector,Sanitary Region ID,Sanitary Region,Longitude,Latitude
14,F08004608,"AGUILAR PEREZ, MARIA FRANCISCA",Farmàcia,933170291,CARRER,AUSIAS MARC,31,8010,8019,BARCELONA,...,8,BARCELONA,28.0,Barcelona - 02H,7851.0,Barcelona Eixample,78.0,Barcelona,2.17533,41.391272
16,F08003322,"DUCH FORADAT, MARIA DOLORES",Farmàcia,934068433,PLAÇA,VALLVIDRERA,5,8017,8019,BARCELONA,...,8,BARCELONA,45.0,Barcelona - 05D,7850.0,Barcelona Sarrià - Sant Gervasi,78.0,Barcelona,2.104771,41.412979
23,F08017886,"MARTÍNEZ MORALES, MARIA ANGELES",Farmàcia,933149760,CARRER,ANDRADE,151,8020,8019,BARCELONA,...,8,BARCELONA,331.0,Barcelona - 10J,7847.0,Barcelona Sant Martí,78.0,Barcelona,2.201153,41.416131
30,F08012870,"VIVES SEGUI, MA ASSUMPCIO",Farmàcia,933146860,CARRER,GRAN VIA CORTS CATALANES,1158,8020,8019,BARCELONA,...,8,BARCELONA,74.0,Barcelona - 10D,7847.0,Barcelona Sant Martí,78.0,Barcelona,2.207452,41.418166
32,V08714862,LIFE ÒPTICA,Òptica,935415728,CARRER,VILLARROEL,69,8011,8019,BARCELONA,...,8,BARCELONA,24.0,Barcelona - 02D,7851.0,Barcelona Eixample,78.0,Barcelona,2.158681,41.383622
34,F08008909,"LAVILLA BIARGE, RODOLF (HEREUS)",Farmàcia,933540304,CARRER,ARGULLOS,49,8016,8019,BARCELONA,...,8,BARCELONA,403.0,Barcelona - 08L,7854.0,Barcelona Nou Barris,78.0,Barcelona,2.182601,41.440721
35,F08006482,"COLOMINA GUITART, CLÀUDIA",Farmàcia,932122627,CARRER,SARAGOSSA,133,8006,8019,BARCELONA,...,8,BARCELONA,43.0,Barcelona - 05B,7850.0,Barcelona Sarrià - Sant Gervasi,78.0,Barcelona,2.145189,41.404663
46,F08002985,"MARRUGAT FONTANALS, OSCAR",Farmàcia,932534704,CARRER,CONSELL DE CENT,253,8011,8019,BARCELONA,...,8,BARCELONA,24.0,Barcelona - 02D,7851.0,Barcelona Eixample,78.0,Barcelona,2.159735,41.386734
47,F08006450,"CARRERAS COMA, MIQUEL",Farmàcia,933187099,CARRER,HOSPITAL,14,8001,8019,BARCELONA,...,8,BARCELONA,20.0,Barcelona - 01E,7846.0,Barcelona Ciutat Vella,78.0,Barcelona,2.172517,41.381087
51,V08553887,TRIOPTIC,Òptica,661759877,CARRER,CAMELIES,22,8024,8019,BARCELONA,...,8,BARCELONA,358.0,Barcelona - 06E,7852.0,Barcelona Gràcia,78.0,Barcelona,2.159754,41.411291


### 6) Dental Clinic

In [402]:
#display(dental_cat_df.info())
#print("\nSTATISTICS:\n")
#display(dental_cat_df.describe())
#non_null_cols, info_df = null_function(dental_cat_df)
#print("\nColumns with no null values:", non_null_cols)
#print("\nColumns with null values:")
#display(info_df)

### 7) Indicators

In [403]:
display(indicators_bcn_df.info())
print("\nSTATISTICS:\n")
#display(indicators_bcn_df.describe())
non_null_cols, info_df = null_function(indicators_bcn_df)
print("\nColumns with no null values:", non_null_cols)
print("\nColumns with null values:")
display(info_df)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 146900 entries, 0 to 146899
Data columns (total 8 columns):
 #   Column                Non-Null Count   Dtype 
---  ------                --------------   ----- 
 0   Data_Indicador        146900 non-null  object
 1   Frequencia_Indicador  146900 non-null  object
 2   Territori             146900 non-null  object
 3   Nom_Indicador         146900 non-null  object
 4   Nom_Variable          131577 non-null  object
 5   Valor                 142783 non-null  object
 6   Unitat                146900 non-null  object
 7   Font                  146900 non-null  object
dtypes: object(8)
memory usage: 9.0+ MB


None


STATISTICS:


Columns with no null values: Index(['Data_Indicador', 'Frequencia_Indicador', 'Territori', 'Nom_Indicador',
       'Unitat', 'Font'],
      dtype='object')

Columns with null values:


Unnamed: 0,Column,Null Count,Null Percentage
0,Nom_Variable,15323,10.430905
1,Valor,4117,2.802587


*For the moment, we won't drop any column.*
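Although no column is dropped, the `info()` output above reports `Valor` as `object` dtype, so it would need converting before any numeric use. This is a minimal sketch, assuming that entries which cannot be parsed should become NaN; the toy values and the `Valor_num` column are made up for illustration.

```python
import pandas as pd

toy_df = pd.DataFrame({
    "Nom_Indicador": ["Casos de COVID-19 a Barcelona (diari)"] * 3,
    "Valor": ["1.0", "n/a", None],  # made-up values
})

# errors="coerce" turns anything that cannot be parsed into NaN
toy_df["Valor_num"] = pd.to_numeric(toy_df["Valor"], errors="coerce")

# Rows that held a value but failed to parse
unparsed = toy_df[toy_df["Valor"].notna() & toy_df["Valor_num"].isna()]
```

Inspecting `unparsed` before discarding anything shows whether the non-numeric entries are placeholders or a formatting issue such as a decimal comma.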

In [404]:
indicators_bcn_df.head(10)

Unnamed: 0,Data_Indicador,Frequencia_Indicador,Territori,Nom_Indicador,Nom_Variable,Valor,Unitat,Font
0,2020-02-25,diari,Barcelona,Casos de COVID-19 a Barcelona (diari),,1.0,Nombre,Agència de Salut Pública de Barcelona
1,2020-02-26,diari,Barcelona,Casos de COVID-19 a Barcelona (diari),,0.0,Nombre,Agència de Salut Pública de Barcelona
2,2020-02-27,diari,Barcelona,Casos de COVID-19 a Barcelona (diari),,0.0,Nombre,Agència de Salut Pública de Barcelona
3,2020-02-28,diari,Barcelona,Casos de COVID-19 a Barcelona (diari),,0.0,Nombre,Agència de Salut Pública de Barcelona
4,2020-02-29,diari,Barcelona,Casos de COVID-19 a Barcelona (diari),,0.0,Nombre,Agència de Salut Pública de Barcelona
5,2020-03-01,diari,Barcelona,Casos de COVID-19 a Barcelona (diari),,0.0,Nombre,Agència de Salut Pública de Barcelona
6,2020-03-02,diari,Barcelona,Casos de COVID-19 a Barcelona (diari),,1.0,Nombre,Agència de Salut Pública de Barcelona
7,2020-03-03,diari,Barcelona,Casos de COVID-19 a Barcelona (diari),,0.0,Nombre,Agència de Salut Pública de Barcelona
8,2020-03-04,diari,Barcelona,Casos de COVID-19 a Barcelona (diari),,1.0,Nombre,Agència de Salut Pública de Barcelona
9,2020-03-05,diari,Barcelona,Casos de COVID-19 a Barcelona (diari),,2.0,Nombre,Agència de Salut Pública de Barcelona


In [405]:
# Rename Columns
indicators_bcn_df = indicators_bcn_df.rename(columns={"Data_Indicador": "Date", "Frequencia_Indicador": "Frequency", "Territori": "Place",
                                                      "Nom_Indicador": "Indicator", "Nom_Variable": "Variable Name", "Valor": "Variable Value",
                                                      "Unitat": "Unit", "Font": "Source"})
display(indicators_bcn_df.head(10))

Unnamed: 0,Date,Frequency,Place,Indicator,Variable Name,Variable Value,Unit,Source
0,2020-02-25,diari,Barcelona,Casos de COVID-19 a Barcelona (diari),,1.0,Nombre,Agència de Salut Pública de Barcelona
1,2020-02-26,diari,Barcelona,Casos de COVID-19 a Barcelona (diari),,0.0,Nombre,Agència de Salut Pública de Barcelona
2,2020-02-27,diari,Barcelona,Casos de COVID-19 a Barcelona (diari),,0.0,Nombre,Agència de Salut Pública de Barcelona
3,2020-02-28,diari,Barcelona,Casos de COVID-19 a Barcelona (diari),,0.0,Nombre,Agència de Salut Pública de Barcelona
4,2020-02-29,diari,Barcelona,Casos de COVID-19 a Barcelona (diari),,0.0,Nombre,Agència de Salut Pública de Barcelona
5,2020-03-01,diari,Barcelona,Casos de COVID-19 a Barcelona (diari),,0.0,Nombre,Agència de Salut Pública de Barcelona
6,2020-03-02,diari,Barcelona,Casos de COVID-19 a Barcelona (diari),,1.0,Nombre,Agència de Salut Pública de Barcelona
7,2020-03-03,diari,Barcelona,Casos de COVID-19 a Barcelona (diari),,0.0,Nombre,Agència de Salut Pública de Barcelona
8,2020-03-04,diari,Barcelona,Casos de COVID-19 a Barcelona (diari),,1.0,Nombre,Agència de Salut Pública de Barcelona
9,2020-03-05,diari,Barcelona,Casos de COVID-19 a Barcelona (diari),,2.0,Nombre,Agència de Salut Pública de Barcelona


In [406]:
indicators_bcn_df['Indicator'].unique()

array(['Casos de COVID-19 a Barcelona (diari)',
       'Casos de COVID-19 a Catalunya (diari)',
       'Casos de COVID-19 a Espanya (diari)',
       'Casos de COVID-19 a Barcelona (acumulat)',
       'Casos de COVID-19 a Catalunya (acumulat)',
       'Casos de COVID-19 a Espanya (acumulat)',
       'Casos de COVID-19 als hospitals de Barcelona',
       'Professionals inhàbils del consorci sanitari',
       'Nombre de dosis administrades de vacunes per al SARS-CoV-2 a Barcelona (diàries)',
       'Nombre de dosis administrades de vacunes per al SARS-CoV-2 a Catalunya (diàries)',
       'Nombre de dosis administrades de vacunes per al SARS-CoV-2 a Barcelona (acumulades)',
       'Nombre de dosis administrades de vacunes per al SARS-CoV-2 a Catalunya (acumulades)',
       'Alumnes confinats en centres educatius (diaris)',
       'Alumnes confinats en centres educatius segons districte municipal (diaris)',
       'Defuncions totals registrades a Barcelona',
       'Defuncions totals a Barc

In [407]:
indicators_bcn_df['Variable Name'].unique()

array([nan, 'Altes', 'Ingressats', 'AIS Nord', 'AIS Dreta',
       'AIS Litoral Mar', 'AIS Esquerra', 'Dosi 1', 'Dosi 2', 'Dosi 3',
       'Ciutat vella', 'Eixample', 'Gràcia', 'Horta - Guinardó',
       'Les Corts', 'Nou Barris', 'Sant Andreu', 'Sant Martí',
       'Sants - Montjuïc', 'Sarrià - Sant Gervasi',
       'Serveis als cementiris', 'Recollides de funeràries',
       'Defuncions padró', 'Observades', 'Esperades', 'Pagament',
       'No pagament', 'No consta', 'Màxim horari NO2', 'Mitjana NO2',
       'Mitjana PM10', '4h', '19h', 'Barcelona - Observatori Fabra',
       'Barcelona - el Raval', 'Barcelona - Zona Universitària',
       '00AM:03AM', '04AM:07AM', '08AM:11AM', '12PM:15PM', '16PM:19PM',
       '20PM:23PM', 'Internacionals', 'Nacionals', 'Commuters',
       'Residents', 'Europa', 'Espanya', 'Àsia', 'Amèrica', 'Àfrica',
       'Oceania', 'Altres', 'Creuers i Iots', 'Càrrega', 'Ferris', 'FGC',
       'Metro', 'Renfe', 'Bus', 'TRAM', 'Accessos', 'Interior ciutat',
      

### 1.3. Dataset Merging

- Datasets with same columns: hospitals_caps_df, pharmacies_bcn_df, day_centers_df, residences_df, health_equip_bcn_df

- Datasets with different columns: indicators_bcn_df, pharmacies_cat_bcn_df, health_equip_cat_bcn_df
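Because `pd.concat` silently fills missing columns with NaN, it can help to confirm that the frames in the first group really share the same columns before concatenating. This is a minimal sketch; the helper `concat_same_columns` and the toy frames are hypothetical and only stand in for the cleaned datasets.

```python
import pandas as pd

def concat_same_columns(frames: dict) -> pd.DataFrame:
    """Concatenate frames after checking they share the same column set."""
    names = list(frames)
    reference = set(frames[names[0]].columns)
    for name in names[1:]:
        diff = reference.symmetric_difference(frames[name].columns)
        if diff:
            raise ValueError(f"{name} differs in columns: {sorted(diff)}")
    return pd.concat(list(frames.values()), ignore_index=True)

# Toy frames (made-up rows) standing in for the cleaned datasets
hospitals = pd.DataFrame({"register_id": ["1"], "Name": ["Hospital A"]})
pharmacies = pd.DataFrame({"register_id": ["2"], "Name": ["Farmàcia B"]})

merged = concat_same_columns({"hospitals": hospitals, "pharmacies": pharmacies})
```

Passing the frames as a dictionary lets the error message name the dataset whose columns differ.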

In [408]:
# Data Merging (same columns)
health_df = pd.concat([hospitals_caps_df, pharmacies_bcn_df, day_centers_df, residences_df, health_equip_bcn_df], ignore_index=True)

# Drop duplicates
health_df = health_df.drop_duplicates(subset='register_id', keep='first')
display(health_df)

Unnamed: 0,register_id,Name,Street ID,Road Name,Street Number,Neighborhood ID,Neighborhood Name,District ID,District Name,ZipCode,Town,Phone Number,Service Type,geo_epgs_25831_x,geo_epgs_25831_y,Latitude,Longitude
0,﻿1015170605,Centre d'Atenció Primària Sardenya,76807.0,C Sardenya,466.0,33,el Baix Guinardó,7,Horta-Guinardó,8025.0,BARCELONA,935674380,CAPs,430258.038349,4.584550e+06,41.409469,2.165559
1,﻿68125439,Instituto Oftalmológico Tres Torres,28000.0,Via Augusta,281.0,24,les Tres Torres,5,Sarrià-Sant Gervasi,8017.0,BARCELONA,900842848,Hospitals i clíniques,427267.424673,4.583269e+06,41.397670,2.129935
2,﻿75990049258,Clínica Mi Tres Torres,103502.0,C Doctor Roux,76.0,24,les Tres Torres,5,Sarrià-Sant Gervasi,8017.0,BARCELONA,932041300,Hospitals i clíniques,427179.337491,4.583445e+06,41.399249,2.128860
3,﻿93056132443,Centre d'Atenció Primària Montnegre,219100.0,C Montnegre,21.0,19,les Corts,4,Les Corts,8029.0,BARCELONA,933632965,CAPs,427992.371870,4.582146e+06,41.387618,2.138740
4,﻿75990060288,Hospital de Barcelona,144601.0,Av Diagonal,660.0,19,les Corts,4,Les Corts,8034.0,BARCELONA,932542400,Hospitals i clíniques,427228.518238,4.582406e+06,41.389892,2.129574
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2747,﻿99400706843,Centre d’Atenció a la Salut Sexual i Reproduct...,86902.0,Carrer Nou de la Rambla,169.0,11,el Poble-sec,3,Sants-Montjuïc,8004.0,Barcelona,933 249 100,Planificació familiar,430319.122000,4.580348e+06,41.371628,2.166773
2764,﻿99400706848,Centre d’Atenció a la Salut Sexual i Reproduct...,140804.0,Carrer de Garigliano,23.0,50,les Roquetes,8,Nou Barris,8042.0,Barcelona,932 768 066,Planificació familiar,430993.757000,4.588615e+06,41.446149,2.173897
2765,﻿99400706838,Centre d’Atenció a la Salut Sexual i Reproduct...,286504.0,Carrer de Roger de Flor,194.0,7,la Dreta de l'Eixample,2,Eixample,8013.0,Barcelona,934 585 980,Planificació familiar,430845.936000,4.583368e+06,41.398874,2.172728
2766,﻿99400706854,Centre d’Atenció a la Salut Sexual i Reproduct...,700957.0,Carrer de Pere Vergés,3.0,73,la Verneda i la Pau,10,Sant Martí,8020.0,Barcelona,932 788 660,Planificació familiar,433893.712000,4.586013e+06,41.422953,2.208895


DO NOT EXECUTE!

In [267]:

#indicators_bcn_df, pharmacies_cat_bcn_df, health_equip_cat_bcn_df


# Pharmacies
drop_merge_pharmacies = ['Id', 'Roadtype', 'Town ID', 'Town']
# Concatenate the 'Roadtype' and 'Road Name' columns
pharmacies_cat_bcn_df['Road Name'] = pharmacies_cat_bcn_df['Roadtype'] + ' ' + pharmacies_cat_bcn_df['Road Name']
pharmacies_cat_bcn_df.drop(columns=drop_merge_pharmacies, inplace=True)
#pharmacies_cat_bcn_df.info()
display(pharmacies_cat_bcn_df)

# Health Equipment
drop_merge_health_equip = ['Id', 'Phone Number', 'Roadtype', 'Town ID', 'Town', 'Region ID', 'Region', 'Province ID', 'Province',
                           'Sanitary Region', 'Sanitary Region ID']
# Concatenate the 'Roadtype' and 'Road Name' columns
health_equip_cat_bcn_df['Road Name'] = health_equip_cat_bcn_df['Roadtype'] + ' ' + health_equip_cat_bcn_df['Road Name']
health_equip_cat_bcn_df.drop(columns=drop_merge_health_equip, inplace=True)
#health_equip_cat_bcn_df.info()
display(health_equip_cat_bcn_df)

Unnamed: 0,Name,ABS ID,ABS Name,ZipCode,Road Name,Street Number,Service Type
0,"BRUNA REVERTER, M.ANGELS",76.0,BARCELONA 10-F,8025,CR DOS DE MAIG,288,Farmàcia
248,"GONZALEZ SOLA,FRANCESC M",75.0,BARCELONA 10-E,8026,CR MALLORCA,519,Farmàcia
249,"MAURI MARTINEZ, ROSA MA",32.0,BARCELONA 3-A,8004,CR SALVA,33,Farmàcia
250,"SANZ JORNET, M. ALEJANDRA",35.0,BARCELONA 3-D,8014,CR CREU COBERTA,95,Farmàcia
251,"DOMINGO ESGLEYES, MIRIAM",49.0,BARCELONA 6-C,8012,CR GRAN DE GRACIA,237,Farmàcia
...,...,...,...,...,...,...,...
1864,"GONZALEZ MERCADER, MARGARITA",358.0,BARCELONA 6-E,8024,AV POMPEU FABRA,2,Farmàcia
1869,"MORILLO BASAS, LOURDES",74.0,BARCELONA 10-D,8020,CR PARAGUAY,25,Farmàcia
1878,"BRIO GUIBERNAU, MONTSERRAT",62.0,BARCELONA 8-F,8042,CR ANTONIO MACHADO,18,Farmàcia
1914,"MARTINEZ PARDO, ISABEL",358.0,BARCELONA 6-E,8024,TS DE DALT,110,Farmàcia


Unnamed: 0,Name,Service Type,Road Name,Street Number,ZipCode,ABS ID,ABS Name,Sanitary Sector ID,Sanitary Sector,Longitude,Latitude
14,"AGUILAR PEREZ, MARIA FRANCISCA",Farmàcia,CARRER AUSIAS MARC,31,8010,28.0,Barcelona - 02H,7851.0,Barcelona Eixample,2.175330,41.391272
16,"DUCH FORADAT, MARIA DOLORES",Farmàcia,PLAÇA VALLVIDRERA,5,8017,45.0,Barcelona - 05D,7850.0,Barcelona Sarrià - Sant Gervasi,2.104771,41.412979
23,"MARTÍNEZ MORALES, MARIA ANGELES",Farmàcia,CARRER ANDRADE,151,8020,331.0,Barcelona - 10J,7847.0,Barcelona Sant Martí,2.201153,41.416131
30,"VIVES SEGUI, MA ASSUMPCIO",Farmàcia,CARRER GRAN VIA CORTS CATALANES,1158,8020,74.0,Barcelona - 10D,7847.0,Barcelona Sant Martí,2.207452,41.418166
32,LIFE ÒPTICA,Òptica,CARRER VILLARROEL,69,8011,24.0,Barcelona - 02D,7851.0,Barcelona Eixample,2.158681,41.383622
...,...,...,...,...,...,...,...,...,...,...,...
5730,"LIARTE CHICO, VANESSA/ROIG ESCOBAR, BERTA",Farmàcia,CARRER ALBERT LLANAS,28,8024,51.0,Barcelona - 07A,7853.0,Barcelona Horta - Guinardó,2.158040,41.416806
5732,IT OPTICS,Òptica,CARRER LLOGREGOS,126,8032,54.0,Barcelona - 07D,7853.0,Barcelona Horta - Guinardó,2.155897,41.424291
5734,"RUBIS MARIN, GEMMA",Farmàcia,CARRER SANTA ENGRACIA,73,8016,61.0,Barcelona 8-E,7854.0,Barcelona Nou Barris,2.178574,41.441050
5736,"SAMARANCH MIRADA,MARIANO",Farmàcia,CARRER SARAGOSSA,42,8006,43.0,Barcelona - 05B,7850.0,Barcelona Sarrià - Sant Gervasi,2.148439,41.402592
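One caveat with the string concatenation in the cell above: in pandas, adding a string to a missing value yields a missing value, so any row without a `Roadtype` would also lose its `Road Name`. A minimal sketch of a NaN-tolerant variant follows; the three sample rows are made up for illustration and are not taken from the datasets.

```python
import pandas as pd

# Toy data (hypothetical rows); the second row has no road type
df = pd.DataFrame({
    "Roadtype": ["CARRER", None, "VIA"],
    "Road Name": ["JAUME I", "MARIA CUBI", "AUGUSTA"],
})

# Treat a missing road type as an empty string, then strip the stray leading space
df["Road Name"] = (df["Roadtype"].fillna("") + " " + df["Road Name"]).str.strip()
df = df.drop(columns=["Roadtype"])

print(df["Road Name"].tolist())
```

Reassigning the result of `drop` instead of using `inplace=True` also keeps the cell safe to re-run on a fresh copy of the frame.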


In [329]:
# Final data merging (the datasets have different columns)
health_df.drop(columns=['register_id', 'Town', 'Phone Number'], inplace=True)
health_df = pd.concat([health_df, pharmacies_cat_bcn_df, health_equip_cat_bcn_df], ignore_index=True)

# Lowercase all names (some are uppercase)
health_df['Name'] = health_df['Name'].str.lower()
# Convert 'Neighborhood ID', 'District ID', 'ZipCode', 'ABS ID' and 'Sanitary Sector ID' to integer (first NaN to 0, then values to int)
# Not 'Street Number', because it can contain 'bis'

int_cols = ['Neighborhood ID', 'District ID', 'ZipCode', 'ABS ID', 'Sanitary Sector ID']
for col in int_cols:
    health_df[col] = health_df[col].fillna(0).astype(int)

# Drop duplicates
health_df = health_df.drop_duplicates(subset='Name', keep='first')
display(health_df)

Unnamed: 0,Name,Street ID,Road Name,Street Number,Neighborhood ID,Neighborhood Name,District ID,District Name,ZipCode,Service Type,...,Roadtype,Phone Number,Region ID,Region,Province ID,Province,Sanitary Sector ID,Sanitary Sector,Sanitary Region ID,Sanitary Region
0,centre d'atenció primària sardenya,76807.0,C Sardenya,466.0,33,el Baix Guinardó,7,Horta-Guinardó,8025,CAPs,...,,,,,,,0,,,
1,instituto oftalmológico tres torres,28000.0,Via Augusta,281.0,24,les Tres Torres,5,Sarrià-Sant Gervasi,8017,Hospitals i clíniques,...,,,,,,,0,,,
2,clínica mi tres torres,103502.0,C Doctor Roux,76.0,24,les Tres Torres,5,Sarrià-Sant Gervasi,8017,Hospitals i clíniques,...,,,,,,,0,,,
3,centre d'atenció primària montnegre,219100.0,C Montnegre,21.0,19,les Corts,4,Les Corts,8029,CAPs,...,,,,,,,0,,,
4,hospital de barcelona,144601.0,Av Diagonal,660.0,19,les Corts,4,Les Corts,8034,Hospitals i clíniques,...,,,,,,,0,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
4193,"soler alonso, eduard",,JAUME I,14,0,,0,,8002,Farmàcia,...,CARRER,933104226,13.0,BARCELONES,8.0,BARCELONA,7846,Barcelona Ciutat Vella,78.0,Barcelona
4194,òptika k,,MARIA CUBI,182,0,,0,,8021,Òptica,...,CARRER,,13.0,BARCELONES,8.0,BARCELONA,7850,Barcelona Sarrià - Sant Gervasi,78.0,Barcelona
4200,òptica andorrana,,FABRA I PUIG,184,0,,0,,8016,Òptica,...,PASSEIG,935152896,13.0,BARCELONES,8.0,BARCELONA,7854,Barcelona Nou Barris,78.0,Barcelona
4202,"ramírez lópez, belén",,AUGUSTA,113,0,,0,,8006,Farmàcia,...,VIA,932097520,13.0,BARCELONES,8.0,BARCELONA,7850,Barcelona Sarrià - Sant Gervasi,78.0,Barcelona
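Filling missing IDs with 0, as the cell above does, makes them indistinguishable from a genuine code 0. As an alternative sketch (not what the notebook does), pandas' nullable `Int64` dtype stores integers while keeping missing entries as `<NA>`. The sample values below are made up.

```python
import pandas as pd

# Toy frame with float IDs and some missing entries (illustrative values)
df = pd.DataFrame({
    "District ID": [7.0, None, 5.0],
    "ZipCode": [8025.0, 8017.0, None],
})

for col in ["District ID", "ZipCode"]:
    # 'Int64' (capital I) is the nullable integer extension dtype
    df[col] = df[col].astype("Int64")

print(df.dtypes)
print(df)
```

The trade-off is that later steps must handle `<NA>` explicitly, whereas the 0 placeholder can be filtered with a simple equality check.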


In [409]:
cols = health_df.columns.tolist()

for col in cols:
  print(col,"Nulls:",health_df[col].isnull().sum())

register_id Nulls: 0
Name Nulls: 0
Street ID Nulls: 1
Road Name Nulls: 1
Street Number Nulls: 1
Neighborhood ID Nulls: 0
Neighborhood Name Nulls: 0
District ID Nulls: 0
District Name Nulls: 0
ZipCode Nulls: 1
Town Nulls: 0
Phone Number Nulls: 3
Service Type Nulls: 0
geo_epgs_25831_x Nulls: 0
geo_epgs_25831_y Nulls: 0
Latitude Nulls: 0
Longitude Nulls: 0
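The same per-column null counts can be obtained without an explicit loop, since `DataFrame.isnull().sum()` returns one count per column. A small sketch on a made-up frame:

```python
import pandas as pd

# Toy frame with a few missing values (illustrative only)
df = pd.DataFrame({
    "Name": ["a", "b", "c"],
    "ZipCode": [8025.0, None, 8017.0],
    "Phone Number": [None, None, "933102860"],
})

null_counts = df.isnull().sum()

# Show only the columns that actually contain nulls
print(null_counts[null_counts > 0])
```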


### Review services

In [410]:
health_df['Service Type'].unique()

array(['CAPs', 'Hospitals i clíniques', 'CAPs, Centres urgències (CUAPs)',
       'Centres urgències (CUAPs)',
       'Hospitals i clíniques, Centres de dia gent gran',
       'Hospitals i clíniques, Residències gent gran', 'Farmàcies',
       "Farmàcies, Farmàcies servei de 9 a 22 h tot l'any",
       'Farmàcies, Farmàcies permanents', 'Centres de dia gent gran',
       'Centres de dia gent gran, Residències gent gran',
       'Residències gent gran', 'Planificació familiar'], dtype=object)

In [411]:
health_df = health_df[health_df['Service Type'] != 'Planificació familiar'].copy()

### Service Type Grouping

In [412]:
# DataFrame: health_df
# Define the mapping for replacement
pharmacy_mapping = {
    'Farmàcies': 'Farmàcia',
    "Farmàcies, Farmàcies servei de 9 a 22 h tot l'any": 'Farmàcia',
    'Farmàcies, Farmàcies permanents': 'Farmàcia',
    'Farmaciola': 'Farmàcia',
    # Add more mappings as needed
}

# Replace values in the 'Service Type' column
health_df['Service Type'] = health_df['Service Type'].replace(pharmacy_mapping)

display(health_df)

Unnamed: 0,register_id,Name,Street ID,Road Name,Street Number,Neighborhood ID,Neighborhood Name,District ID,District Name,ZipCode,Town,Phone Number,Service Type,geo_epgs_25831_x,geo_epgs_25831_y,Latitude,Longitude
0,﻿1015170605,Centre d'Atenció Primària Sardenya,76807.0,C Sardenya,466.0,33,el Baix Guinardó,7,Horta-Guinardó,8025.0,BARCELONA,935674380,CAPs,430258.038349,4.584550e+06,41.409469,2.165559
1,﻿68125439,Instituto Oftalmológico Tres Torres,28000.0,Via Augusta,281.0,24,les Tres Torres,5,Sarrià-Sant Gervasi,8017.0,BARCELONA,900842848,Hospitals i clíniques,427267.424673,4.583269e+06,41.397670,2.129935
2,﻿75990049258,Clínica Mi Tres Torres,103502.0,C Doctor Roux,76.0,24,les Tres Torres,5,Sarrià-Sant Gervasi,8017.0,BARCELONA,932041300,Hospitals i clíniques,427179.337491,4.583445e+06,41.399249,2.128860
3,﻿93056132443,Centre d'Atenció Primària Montnegre,219100.0,C Montnegre,21.0,19,les Corts,4,Les Corts,8029.0,BARCELONA,933632965,CAPs,427992.371870,4.582146e+06,41.387618,2.138740
4,﻿75990060288,Hospital de Barcelona,144601.0,Av Diagonal,660.0,19,les Corts,4,Les Corts,8034.0,BARCELONA,932542400,Hospitals i clíniques,427228.518238,4.582406e+06,41.389892,2.129574
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1571,﻿75990116588,Residència Assistencial per a Gent Gran San Pedro,309406.0,Rda Sant Pere,60.0,7,la Dreta de l'Eixample,2,Eixample,8010.0,BARCELONA,933102860,Residències gent gran,431283.357815,4.582468e+06,41.390806,2.178062
1572,﻿98336093115,Residència Assistida per a Gent Gran Institut ...,315500.0,Pg Santa Eulàlia,23.0,23,Sarrià,5,Sarrià-Sant Gervasi,8017.0,BARCELONA,932030989,Residències gent gran,425956.699256,4.583965e+06,41.403816,2.114172
1573,﻿98336115952,Residència Assistida per a Gent Gran Vibentia ...,263600.0,C Portell,11.0,37,el Carmel,7,Horta-Guinardó,8023.0,BARCELONA,932132131,Residències gent gran,429053.952844,4.585498e+06,41.417910,2.151043
1574,﻿99400733849,Residència Assitida per a Gent Gran La Vostra ...,137300.0,Carrer de la Foneria,29.0,13,la Marina de Port,3,Sants-Montjuïc,8038.0,Barcelona,,Residències gent gran,427900.987000,4.579122e+06,41.360380,2.138007


In [413]:
health_df['Service Type'].unique()

array(['CAPs', 'Hospitals i clíniques', 'CAPs, Centres urgències (CUAPs)',
       'Centres urgències (CUAPs)',
       'Hospitals i clíniques, Centres de dia gent gran',
       'Hospitals i clíniques, Residències gent gran', 'Farmàcia',
       'Centres de dia gent gran',
       'Centres de dia gent gran, Residències gent gran',
       'Residències gent gran'], dtype=object)

In [418]:
#display(health_df[health_df['Street ID'].isna()])
health_df = health_df.dropna(subset=['Street ID'])

In [419]:
health_df['Street ID'] = health_df['Street ID'].astype(int)
#display(health_df)

### Rename Service Type options to English or group them

In [420]:
health_df['Service Type'].unique()

array(['CAPs', 'Hospitals i clíniques', 'CAPs, Centres urgències (CUAPs)',
       'Centres urgències (CUAPs)',
       'Hospitals i clíniques, Centres de dia gent gran',
       'Hospitals i clíniques, Residències gent gran', 'Farmàcia',
       'Centres de dia gent gran',
       'Centres de dia gent gran, Residències gent gran',
       'Residències gent gran'], dtype=object)

In [427]:
display(health_df[health_df['Service Type'] == 'Hospitals i clíniques, Centres de dia gent gran'])

Unnamed: 0,register_id,Name,Street ID,Road Name,Street Number,Neighborhood ID,Neighborhood Name,District ID,District Name,ZipCode,Town,Phone Number,Service Type,geo_epgs_25831_x,geo_epgs_25831_y,Latitude,Longitude
15,﻿98349142312,Centre de Dia Serveis Geriàtrics de Barcelona ...,209900,Av Meridiana,328.0,61,la Sagrera,9,Sant Andreu,8027.0,BARCELONA,935111422,"Hospitals i clíniques, Centres de dia gent gran",432077.120325,4586307.0,41.425457,2.187124


In [428]:
display(health_df[health_df['Service Type'] == 'Hospitals i clíniques, Residències gent gran'])

Unnamed: 0,register_id,Name,Street ID,Road Name,Street Number,Neighborhood ID,Neighborhood Name,District ID,District Name,ZipCode,Town,Phone Number,Service Type,geo_epgs_25831_x,geo_epgs_25831_y,Latitude,Longitude
71,﻿75990127413,Residència Sociosanitaria Assistida Psicoclíni...,349105,Pg Universal,34.0,43,Horta,7,Horta-Guinardó,8042.0,BARCELONA,934275250,"Hospitals i clíniques, Residències gent gran",429990.319293,4587706.0,41.437877,2.161991


In [431]:
display(health_df[health_df['Service Type'] == 'Centres de dia gent gran, Residències gent gran'])

Unnamed: 0,register_id,Name,Street ID,Road Name,Street Number,Neighborhood ID,Neighborhood Name,District ID,District Name,ZipCode,Town,Phone Number,Service Type,geo_epgs_25831_x,geo_epgs_25831_y,Latitude,Longitude
1177,﻿99400085277,Centre Residencial Mutuam Collserola,352507,Pg Vall d'Hebron,159.0,40,Montbau,7,Horta-Guinardó,8007.0,BARCELONA,933613900,"Centres de dia gent gran, Residències gent gran",428724.599637,4587410.0,41.435098,2.146877
1231,﻿75990105220,Residència Assistida per a Gent Gran Acollimen...,270901,Carrer de Pujades,273.0,68,el Poblenou,10,Sant Martí,8005.0,Barcelona,933077058,"Centres de dia gent gran, Residències gent gran",433588.468093,4584073.0,41.405456,2.205455


In [434]:
# MAPPING
mappings = {
    'CAPs': 'CAP (Primary Healthcare Center)',
    'CAPs, Centres urgències (CUAPs)': 'CAP (Primary Healthcare Center)',
    'Centres urgències (CUAPs)': 'CAP (Primary Healthcare Center)',
    'Hospitals i clíniques': 'Hospital & Clinic',
    # Geriatric
    'Hospitals i clíniques, Centres de dia gent gran': 'Day center for the elderly',
    'Centres de dia gent gran': 'Day center for the elderly',
    'Residències gent gran': 'Residence for the elderly',
    # Socio-sanitary
    'Hospitals i clíniques, Residències gent gran': 'Residence for the elderly',
    'Centres de dia gent gran, Residències gent gran': 'Residence for the elderly',
    'Farmàcia': 'Pharmacy',
}

health_df['Service Type'] = health_df['Service Type'].replace(mappings)
display(health_df)

Unnamed: 0,register_id,Name,Street ID,Road Name,Street Number,Neighborhood ID,Neighborhood Name,District ID,District Name,ZipCode,Town,Phone Number,Service Type,geo_epgs_25831_x,geo_epgs_25831_y,Latitude,Longitude
0,﻿1015170605,Centre d'Atenció Primària Sardenya,76807,C Sardenya,466.0,33,el Baix Guinardó,7,Horta-Guinardó,8025.0,BARCELONA,935674380,CAP (Primary Healthcare Center),430258.038349,4.584550e+06,41.409469,2.165559
1,﻿68125439,Instituto Oftalmológico Tres Torres,28000,Via Augusta,281.0,24,les Tres Torres,5,Sarrià-Sant Gervasi,8017.0,BARCELONA,900842848,Hospital & Clinic,427267.424673,4.583269e+06,41.397670,2.129935
2,﻿75990049258,Clínica Mi Tres Torres,103502,C Doctor Roux,76.0,24,les Tres Torres,5,Sarrià-Sant Gervasi,8017.0,BARCELONA,932041300,Hospital & Clinic,427179.337491,4.583445e+06,41.399249,2.128860
3,﻿93056132443,Centre d'Atenció Primària Montnegre,219100,C Montnegre,21.0,19,les Corts,4,Les Corts,8029.0,BARCELONA,933632965,CAP (Primary Healthcare Center),427992.371870,4.582146e+06,41.387618,2.138740
4,﻿75990060288,Hospital de Barcelona,144601,Av Diagonal,660.0,19,les Corts,4,Les Corts,8034.0,BARCELONA,932542400,Hospital & Clinic,427228.518238,4.582406e+06,41.389892,2.129574
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1571,﻿75990116588,Residència Assistencial per a Gent Gran San Pedro,309406,Rda Sant Pere,60.0,7,la Dreta de l'Eixample,2,Eixample,8010.0,BARCELONA,933102860,Residence for the elderly,431283.357815,4.582468e+06,41.390806,2.178062
1572,﻿98336093115,Residència Assistida per a Gent Gran Institut ...,315500,Pg Santa Eulàlia,23.0,23,Sarrià,5,Sarrià-Sant Gervasi,8017.0,BARCELONA,932030989,Residence for the elderly,425956.699256,4.583965e+06,41.403816,2.114172
1573,﻿98336115952,Residència Assistida per a Gent Gran Vibentia ...,263600,C Portell,11.0,37,el Carmel,7,Horta-Guinardó,8023.0,BARCELONA,932132131,Residence for the elderly,429053.952844,4.585498e+06,41.417910,2.151043
1574,﻿99400733849,Residència Assitida per a Gent Gran La Vostra ...,137300,Carrer de la Foneria,29.0,13,la Marina de Port,3,Sants-Montjuïc,8038.0,Barcelona,,Residence for the elderly,427900.987000,4.579122e+06,41.360380,2.138007
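Because `replace` silently leaves any category that is not a key of the dictionary untouched, it can help to list unmapped values before applying the mapping. The sketch below uses a shortened version of the mapping and a made-up series; `'Òptica'` stands in for a category that could arrive from the Catalan establishments dataset.

```python
import pandas as pd

# Shortened mapping and toy data, for illustration only
mappings = {
    "CAPs": "CAP (Primary Healthcare Center)",
    "Farmàcia": "Pharmacy",
}
service = pd.Series(["CAPs", "Farmàcia", "Òptica"], name="Service Type")

# Categories present in the data but absent from the mapping keys
unmapped = sorted(set(service.unique()) - set(mappings))
print(unmapped)

# replace() keeps the unmapped category as-is
print(service.replace(mappings).tolist())
```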


### Assign District ID and Neighborhood ID to rows where the value is 0

In [435]:
# Load Codes & Streets dataset
#roads_codes_file = "roads_codes.csv"
#path_files = "/content/drive/MyDrive/TFG/2. Ejecución/CODE/HealthDatasets/"

# Read CSVs
#roads_codes_df = pd.read_csv(path_files + roads_codes_file, encoding='utf-8')
#display(roads_codes_df)
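The loading cell above is commented out and the layout of `roads_codes.csv` is not shown, so the following is only a sketch of how the 0 placeholders could be filled. It assumes a lookup table with `Road Name`, `District ID` and `Neighborhood ID` columns and matches on a lowercased road name; all column choices and values are assumptions for illustration.

```python
import pandas as pd

# Hypothetical lookup table; the real roads_codes.csv layout is an assumption
roads_codes_df = pd.DataFrame({
    "Road Name": ["jaume i", "maria cubi"],
    "District ID": [1, 5],          # illustrative values
    "Neighborhood ID": [2, 26],     # illustrative values
})

# Toy health rows; 0 marks a missing district or neighborhood
health = pd.DataFrame({
    "Name": ["soler alonso, eduard", "òptika k", "hospital de barcelona"],
    "Road Name": ["JAUME I", "MARIA CUBI", "Av Diagonal"],
    "District ID": [0, 0, 4],
    "Neighborhood ID": [0, 0, 19],
})

key = health["Road Name"].str.lower()
lookup = roads_codes_df.set_index("Road Name")

for col in ["District ID", "Neighborhood ID"]:
    found = key.map(lookup[col])        # NaN where the road is not in the lookup
    is_missing = health[col].eq(0)
    # Only overwrite the 0 placeholders; keep 0 when no match is found
    health[col] = health[col].mask(is_missing, found).fillna(0).astype(int)

print(health)
```

Matching on road name alone ignores streets that cross several districts; a real implementation would likely need the street number or coordinates as well.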

In [436]:
health_df.to_csv('cleaned_health.csv')

## 2. Build index

- Other Info: *Regiones sanitarias*, *Áreas Básicas de Salud*, *Sectores Sanitarios*, *Defunciones en Cataluña*, *Mortalidad por todo tipo de causa en Cataluña*, *Registro central de población del CatSalut: población por municipio*, *Registro central de población del CatSalut: población por área básica de salud*

#### Load Datasets used for computing the index:

- Number of health services: from health dataset

- Distance to centers: polygon or compute distances

- Region density over number of places offered: *Density assigned to sanitary region*

- Ratio of patients per doctor:

- Patient Satisfaction: *surveys*

- District Population Health: chronic diseases, deaths in the district, population characteristics of the district, indicators, etc.

- Center Status - Public or Private (insurance)

- Center Type Weighting - Hospital/Pharmacy, CAP, Residences ...

- School Type Weighting - School/Kindergarten, Library/Study Room, Civic Center: from educational dataset

- Standardized Characteristics: operating rooms, specialists, etc.
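As a rough illustration of how two of the ingredients listed above (number of health services and center type weighting) might be combined, the sketch below counts services per district and applies a weight per service type. The weights, the toy rows and the max-normalisation are all assumptions; the notebook does not define them.

```python
import pandas as pd

# Toy rows in the shape of the cleaned health dataset (illustrative only)
health = pd.DataFrame({
    "District Name": ["Eixample", "Eixample", "Eixample", "Les Corts", "Les Corts"],
    "Service Type": [
        "Pharmacy",
        "Hospital & Clinic",
        "CAP (Primary Healthcare Center)",
        "Pharmacy",
        "Pharmacy",
    ],
})

# Hypothetical weights per service type
weights = pd.Series({
    "Hospital & Clinic": 3.0,
    "CAP (Primary Healthcare Center)": 2.0,
    "Pharmacy": 1.0,
})

# Rows: districts, columns: service types, values: number of centers
counts = pd.crosstab(health["District Name"], health["Service Type"])

# Weight each column, then add up per district
weighted = counts.mul(weights, axis=1).sum(axis=1)

# Scale so the best-served district scores 1.0
score = weighted / weighted.max()
print(score)
```

Other components such as distance to centers or population density would enter as additional columns before the final aggregation.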