# Introduction to data pre-processing in Pandas: Hospitals

This short tutorial covers basic data manipulation techniques in Pandas. You can follow the examples and run the code at each step.

The example is based on a file containing data about hospitals in Valais.

## Loading files

First we have to load the file. It is a CSV file located at `../data/hospitals.csv`.

In [29]:
import pandas as pd
hospitalsFile = "../data/hospitals.csv"

hospitals=pd.read_csv(hospitalsFile,encoding='latin')
hospitals



Unnamed: 0,CLASSE,ETABLISSEMENT,Adresse,numero,npa,ville,telephone,site_internet,RuleID,RuleID_1,RuleID_2,beds,country
0,1,HVS - HÃ´pital psychiatrique de MalÃ©voz,Route de Morgins,10.0,CH-1870,Monthey,0800 012 210,,,Rule_1,Rule_1,30.0,
1,1,HVS - Clinique de Saint-AmÃ©,Vers Saint-AmÃ©,10.0,1890,St-Maurice,027/604.66.55,http://www.hopitalduvalais.ch,,Rule_1,Rule_1,20.0,
2,4,Clinique de Pneus,,,1890,St-Maurice,,,,Rule_1,Rule_1,,
3,1,HVS - HÃ´pital de Martigny,Avenue de la Fusion,27.0,1920,Martigny,027/603.90.00,http://www.hopitalduvalais.ch,,Rule_1,Rule_1,10.0,Suisse
4,1,HVS - HÃ´pital de Sion,Avenue du Grand-Champsec,80.0,1951,Sitten,027/603.40.00,http://www.hopitalduvalais.ch,,Rule_1,Rule_1,11.0,
5,1,HVS - Institut Central des HÃ´pitaux ICH,Avenue du Grand-Champsec,86.0,1951,Sion,027/603.47.00,http://www.hopitalduvalais.ch,,Rule_1,Rule_1,15.0,
6,1,HVS - HÃ´pital de Sierre,Rue St-Charles,14.0,3960,Sierre,027/603.70.00,http://www.hopitalduvalais.ch,,Rule_1,Rule_1,30.0,
7,3,HVS - Centre Valaisan de Pneumologie (CVP),Route de la Moubra,87.0,3963,Crans-Montana,027/603.80.00,http://www.hopitalduvalais.ch,,Rule_1,Rule_1,12.0,
8,1,HVS - HÃ´pital de Brigue,Oberlandstrasse,14.0,3900,,027/604.33.33,http://www.spitalvs.ch/de/spital-wallis/stando...,,Rule_1,Rule_1,45.0,
9,1,HVS - HÃ´pital de ViÃ¨ge,Pflanzettastrasse,8.0,3930,Visp,027/604.33.33,http://www.spitalvs.ch/de/spital-wallis/stando...,,Rule_1,Rule_1,23.0,


Be sure to use the right encoding. We used `latin` in the previous example, and many accented characters were not interpreted correctly. Switching to `utf-8` fixes this:

In [30]:
hospitals=pd.read_csv(hospitalsFile,encoding='utf-8')
hospitals


Unnamed: 0,CLASSE,ETABLISSEMENT,Adresse,numero,npa,ville,telephone,site_internet,RuleID,RuleID_1,RuleID_2,beds,country
0,1,HVS - Hôpital psychiatrique de Malévoz,Route de Morgins,10.0,CH-1870,Monthey,0800 012 210,,,Rule_1,Rule_1,30.0,
1,1,HVS - Clinique de Saint-Amé,Vers Saint-Amé,10.0,1890,St-Maurice,027/604.66.55,http://www.hopitalduvalais.ch,,Rule_1,Rule_1,20.0,
2,4,Clinique de Pneus,,,1890,St-Maurice,,,,Rule_1,Rule_1,,
3,1,HVS - Hôpital de Martigny,Avenue de la Fusion,27.0,1920,Martigny,027/603.90.00,http://www.hopitalduvalais.ch,,Rule_1,Rule_1,10.0,Suisse
4,1,HVS - Hôpital de Sion,Avenue du Grand-Champsec,80.0,1951,Sitten,027/603.40.00,http://www.hopitalduvalais.ch,,Rule_1,Rule_1,11.0,
5,1,HVS - Institut Central des Hôpitaux ICH,Avenue du Grand-Champsec,86.0,1951,Sion,027/603.47.00,http://www.hopitalduvalais.ch,,Rule_1,Rule_1,15.0,
6,1,HVS - Hôpital de Sierre,Rue St-Charles,14.0,3960,Sierre,027/603.70.00,http://www.hopitalduvalais.ch,,Rule_1,Rule_1,30.0,
7,3,HVS - Centre Valaisan de Pneumologie (CVP),Route de la Moubra,87.0,3963,Crans-Montana,027/603.80.00,http://www.hopitalduvalais.ch,,Rule_1,Rule_1,12.0,
8,1,HVS - Hôpital de Brigue,Oberlandstrasse,14.0,3900,,027/604.33.33,http://www.spitalvs.ch/de/spital-wallis/stando...,,Rule_1,Rule_1,45.0,
9,1,HVS - Hôpital de Viège,Pflanzettastrasse,8.0,3930,Visp,027/604.33.33,http://www.spitalvs.ch/de/spital-wallis/stando...,,Rule_1,Rule_1,23.0,
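
The garbled characters in the first attempt are what you get when UTF-8 bytes are decoded as Latin-1. Here is a minimal sketch of that effect, using a plain Python string rather than the hospitals file (the sample name is only an illustration):

```python
# Encode a name as UTF-8, then decode the same bytes with two different codecs.
name = "Hôpital de Viège"
raw_bytes = name.encode("utf-8")

wrong = raw_bytes.decode("latin-1")  # each accented letter turns into two characters
right = raw_bytes.decode("utf-8")    # round-trips cleanly

print(wrong)
print(right)
```

If the accented letters of a file come out as pairs of odd characters, trying `utf-8` is usually a reasonable first step.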


## Remove duplicates

Pandas has a function to identify duplicates based on a selection of columns. Let's find duplicates by address, street number and number of beds:

In [31]:
hospitals.duplicated(["Adresse","numero","beds"])

0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
11    False
12    False
13    False
14    False
15    False
16    False
17     True
18    False
19    False
20    False
21    False
22    False
23    False
24    False
dtype: bool

As we can see, the row with index *17* is a duplicate. Pandas can drop duplicates automatically with `drop_duplicates`, which removes the repeated rows according to a subset of columns:

In [32]:
hospitals.drop_duplicates(["Adresse","numero","beds"])

Unnamed: 0,CLASSE,ETABLISSEMENT,Adresse,numero,npa,ville,telephone,site_internet,RuleID,RuleID_1,RuleID_2,beds,country
0,1,HVS - Hôpital psychiatrique de Malévoz,Route de Morgins,10.0,CH-1870,Monthey,0800 012 210,,,Rule_1,Rule_1,30.0,
1,1,HVS - Clinique de Saint-Amé,Vers Saint-Amé,10.0,1890,St-Maurice,027/604.66.55,http://www.hopitalduvalais.ch,,Rule_1,Rule_1,20.0,
2,4,Clinique de Pneus,,,1890,St-Maurice,,,,Rule_1,Rule_1,,
3,1,HVS - Hôpital de Martigny,Avenue de la Fusion,27.0,1920,Martigny,027/603.90.00,http://www.hopitalduvalais.ch,,Rule_1,Rule_1,10.0,Suisse
4,1,HVS - Hôpital de Sion,Avenue du Grand-Champsec,80.0,1951,Sitten,027/603.40.00,http://www.hopitalduvalais.ch,,Rule_1,Rule_1,11.0,
5,1,HVS - Institut Central des Hôpitaux ICH,Avenue du Grand-Champsec,86.0,1951,Sion,027/603.47.00,http://www.hopitalduvalais.ch,,Rule_1,Rule_1,15.0,
6,1,HVS - Hôpital de Sierre,Rue St-Charles,14.0,3960,Sierre,027/603.70.00,http://www.hopitalduvalais.ch,,Rule_1,Rule_1,30.0,
7,3,HVS - Centre Valaisan de Pneumologie (CVP),Route de la Moubra,87.0,3963,Crans-Montana,027/603.80.00,http://www.hopitalduvalais.ch,,Rule_1,Rule_1,12.0,
8,1,HVS - Hôpital de Brigue,Oberlandstrasse,14.0,3900,,027/604.33.33,http://www.spitalvs.ch/de/spital-wallis/stando...,,Rule_1,Rule_1,45.0,
9,1,HVS - Hôpital de Viège,Pflanzettastrasse,8.0,3930,Visp,027/604.33.33,http://www.spitalvs.ch/de/spital-wallis/stando...,,Rule_1,Rule_1,23.0,


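Row 17 is absent from the returned DataFrame (only the first rows are shown above). Note that the result is not assigned here, so `hospitals` itself still contains all 25 rows.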
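
`drop_duplicates` returns a new DataFrame rather than changing its input, so the result has to be assigned to keep it. The sketch below shows this on a small, hypothetical table (the three rows are made up for illustration):

```python
import pandas as pd

# Hypothetical toy data: the last row repeats the first one on the chosen columns.
df = pd.DataFrame({
    "Adresse": ["Route de Morgins", "Rue St-Charles", "Route de Morgins"],
    "numero": [10, 14, 10],
    "beds": [30, 30, 30],
})
subset = ["Adresse", "numero", "beds"]

mask = df.duplicated(subset)          # True for the second and later occurrences
deduped = df.drop_duplicates(subset)  # new DataFrame; df itself is unchanged

print(mask.tolist())
print(len(df), len(deduped))
```

Writing `hospitals = hospitals.drop_duplicates([...])` would make the removal permanent in the tutorial's DataFrame.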

## Finding missing values

We can check with Pandas whether the table (DataFrame) contains any missing values. The following code returns `True` if at least one value is missing:

In [33]:
hospitals.isnull().values.any() 

True

Indeed there are. If you look at the `NaN` values in the table, most of the `country` values, for example, are missing.

We can get a more detailed summary of how many values are missing in each column:

In [34]:
hospitals.isnull().sum()

CLASSE            0
ETABLISSEMENT     0
Adresse           1
numero            2
npa               0
ville             1
telephone         1
site_internet     2
RuleID           25
RuleID_1          0
RuleID_2          0
beds              2
country          24
dtype: int64
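
Besides counting, it can help to look at the incomplete rows themselves. This is a small sketch on hypothetical data that combines `isnull` with `any(axis=1)` to select them:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing city and one missing bed count.
df = pd.DataFrame({
    "ville": ["Sion", None, "Visp"],
    "beds": [11.0, 45.0, np.nan],
    "npa": ["1951", "3900", "3930"],
})

per_column = df.isnull().sum()            # number of missing values per column
incomplete = df[df.isnull().any(axis=1)]  # rows with at least one missing value

print(per_column)
print(incomplete)
```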

There are indeed many null values. One way to fix this is by elimination.

## Dropping an entire column

For example, the `RuleID` column is totally empty. We could get rid of it entirely with the `drop` function:

In [35]:
hospitals.drop(columns='RuleID', inplace=True)
hospitals.isnull().sum()

CLASSE            0
ETABLISSEMENT     0
Adresse           1
numero            2
npa               0
ville             1
telephone         1
site_internet     2
RuleID_1          0
RuleID_2          0
beds              2
country          24
dtype: int64
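
When it is not known in advance which columns are empty, `dropna` can find them. The sketch below, on hypothetical data, uses `axis=1` with `how="all"` to remove only the columns in which every value is missing:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "ville": ["Sion", "Visp"],
    "RuleID": [np.nan, np.nan],     # entirely empty, so it is dropped
    "country": [np.nan, "Suisse"],  # only partly empty, so it is kept
})

cleaned = df.dropna(axis=1, how="all")
print(cleaned.columns.tolist())
```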

## Dropping rows with missing values

We can see that there is one row with no address. This seems to be a mistake, and we may want to drop the entire row.

The `dropna` function helps here: it drops the rows that have missing values in the specified columns:

In [36]:
hospitals.dropna(subset = ['Adresse'], inplace=True)
hospitals.isnull().sum()

CLASSE            0
ETABLISSEMENT     0
Adresse           0
numero            1
npa               0
ville             1
telephone         0
site_internet     1
RuleID_1          0
RuleID_2          0
beds              1
country          23
dtype: int64

Now we can see that all entries have an address. 
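
Dropping rows leaves gaps in the index labels, and missing values in other columns are not affected. This is a small sketch on hypothetical data; `reset_index(drop=True)` is an optional extra step, not something the tutorial itself does:

```python
import pandas as pd

# Hypothetical toy data: the middle row has no address.
df = pd.DataFrame({
    "Adresse": ["Route de Morgins", None, "Rue St-Charles"],
    "beds": [30.0, None, None],
})

kept = df.dropna(subset=["Adresse"])       # only 'Adresse' is checked
renumbered = kept.reset_index(drop=True)   # optional: renumber the rows from 0

print(kept.index.tolist())
print(renumbered.index.tolist())
```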

## Filling missing values with a default value

We can see that most `country` values are missing. We can use the `fillna` function to set a fixed value for all rows where it was missing:

In [37]:
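# Assign the result back: calling fillna(inplace=True) on a selected column
# may not update the DataFrame in recent pandas versions.
hospitals["country"] = hospitals["country"].fillna('Suisse')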
hospitals

Unnamed: 0,CLASSE,ETABLISSEMENT,Adresse,numero,npa,ville,telephone,site_internet,RuleID_1,RuleID_2,beds,country
0,1,HVS - Hôpital psychiatrique de Malévoz,Route de Morgins,10.0,CH-1870,Monthey,0800 012 210,,Rule_1,Rule_1,30.0,Suisse
1,1,HVS - Clinique de Saint-Amé,Vers Saint-Amé,10.0,1890,St-Maurice,027/604.66.55,http://www.hopitalduvalais.ch,Rule_1,Rule_1,20.0,Suisse
3,1,HVS - Hôpital de Martigny,Avenue de la Fusion,27.0,1920,Martigny,027/603.90.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,10.0,Suisse
4,1,HVS - Hôpital de Sion,Avenue du Grand-Champsec,80.0,1951,Sitten,027/603.40.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,11.0,Suisse
5,1,HVS - Institut Central des Hôpitaux ICH,Avenue du Grand-Champsec,86.0,1951,Sion,027/603.47.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,15.0,Suisse
6,1,HVS - Hôpital de Sierre,Rue St-Charles,14.0,3960,Sierre,027/603.70.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,30.0,Suisse
7,3,HVS - Centre Valaisan de Pneumologie (CVP),Route de la Moubra,87.0,3963,Crans-Montana,027/603.80.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,12.0,Suisse
8,1,HVS - Hôpital de Brigue,Oberlandstrasse,14.0,3900,,027/604.33.33,http://www.spitalvs.ch/de/spital-wallis/stando...,Rule_1,Rule_1,45.0,Suisse
9,1,HVS - Hôpital de Viège,Pflanzettastrasse,8.0,3930,Visp,027/604.33.33,http://www.spitalvs.ch/de/spital-wallis/stando...,Rule_1,Rule_1,23.0,Suisse
10,1,HRC - Hôpital de Monthey,Route de Morgins,54.0,1870,Monthey,024/473.17.31,http://www.hopitalrivierachablais.ch,Rule_1,Rule_1,,Suisse


## Fill missing values with a computed value

We see that one hospital has no number of beds. As a quick fix, we fill it with the minimum number of beds found in the column:

In [38]:
# Assign the result back instead of using inplace=True on a selected column.
hospitals['beds'] = hospitals['beds'].fillna(hospitals['beds'].min())
hospitals

Unnamed: 0,CLASSE,ETABLISSEMENT,Adresse,numero,npa,ville,telephone,site_internet,RuleID_1,RuleID_2,beds,country
0,1,HVS - Hôpital psychiatrique de Malévoz,Route de Morgins,10.0,CH-1870,Monthey,0800 012 210,,Rule_1,Rule_1,30.0,Suisse
1,1,HVS - Clinique de Saint-Amé,Vers Saint-Amé,10.0,1890,St-Maurice,027/604.66.55,http://www.hopitalduvalais.ch,Rule_1,Rule_1,20.0,Suisse
3,1,HVS - Hôpital de Martigny,Avenue de la Fusion,27.0,1920,Martigny,027/603.90.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,10.0,Suisse
4,1,HVS - Hôpital de Sion,Avenue du Grand-Champsec,80.0,1951,Sitten,027/603.40.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,11.0,Suisse
5,1,HVS - Institut Central des Hôpitaux ICH,Avenue du Grand-Champsec,86.0,1951,Sion,027/603.47.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,15.0,Suisse
6,1,HVS - Hôpital de Sierre,Rue St-Charles,14.0,3960,Sierre,027/603.70.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,30.0,Suisse
7,3,HVS - Centre Valaisan de Pneumologie (CVP),Route de la Moubra,87.0,3963,Crans-Montana,027/603.80.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,12.0,Suisse
8,1,HVS - Hôpital de Brigue,Oberlandstrasse,14.0,3900,,027/604.33.33,http://www.spitalvs.ch/de/spital-wallis/stando...,Rule_1,Rule_1,45.0,Suisse
9,1,HVS - Hôpital de Viège,Pflanzettastrasse,8.0,3930,Visp,027/604.33.33,http://www.spitalvs.ch/de/spital-wallis/stando...,Rule_1,Rule_1,23.0,Suisse
10,1,HRC - Hôpital de Monthey,Route de Morgins,54.0,1870,Monthey,024/473.17.31,http://www.hopitalrivierachablais.ch,Rule_1,Rule_1,10.0,Suisse


## Filter out rows when values are out of range

We can drop some rows if we think they have really wrong values. For example, the Hospital de Fully has 5000 beds. This is not plausible, so we drop all hospitals with more than 1000 beds:

In [39]:
hospitals = hospitals[hospitals['beds'] <= 1000]
hospitals

Unnamed: 0,CLASSE,ETABLISSEMENT,Adresse,numero,npa,ville,telephone,site_internet,RuleID_1,RuleID_2,beds,country
0,1,HVS - Hôpital psychiatrique de Malévoz,Route de Morgins,10.0,CH-1870,Monthey,0800 012 210,,Rule_1,Rule_1,30.0,Suisse
1,1,HVS - Clinique de Saint-Amé,Vers Saint-Amé,10.0,1890,St-Maurice,027/604.66.55,http://www.hopitalduvalais.ch,Rule_1,Rule_1,20.0,Suisse
3,1,HVS - Hôpital de Martigny,Avenue de la Fusion,27.0,1920,Martigny,027/603.90.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,10.0,Suisse
4,1,HVS - Hôpital de Sion,Avenue du Grand-Champsec,80.0,1951,Sitten,027/603.40.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,11.0,Suisse
5,1,HVS - Institut Central des Hôpitaux ICH,Avenue du Grand-Champsec,86.0,1951,Sion,027/603.47.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,15.0,Suisse
6,1,HVS - Hôpital de Sierre,Rue St-Charles,14.0,3960,Sierre,027/603.70.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,30.0,Suisse
7,3,HVS - Centre Valaisan de Pneumologie (CVP),Route de la Moubra,87.0,3963,Crans-Montana,027/603.80.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,12.0,Suisse
8,1,HVS - Hôpital de Brigue,Oberlandstrasse,14.0,3900,,027/604.33.33,http://www.spitalvs.ch/de/spital-wallis/stando...,Rule_1,Rule_1,45.0,Suisse
9,1,HVS - Hôpital de Viège,Pflanzettastrasse,8.0,3930,Visp,027/604.33.33,http://www.spitalvs.ch/de/spital-wallis/stando...,Rule_1,Rule_1,23.0,Suisse
10,1,HRC - Hôpital de Monthey,Route de Morgins,54.0,1870,Monthey,024/473.17.31,http://www.hopitalrivierachablais.ch,Rule_1,Rule_1,10.0,Suisse


## Data type modification

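We can see that the street number and the number of beds are float values. They should be integers. We use the nullable `Int64` type (with a capital `I`), which, unlike `int64`, can hold missing values: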

In [42]:
hospitals['numero'] = hospitals['numero'].astype('Int64') 
hospitals['beds'] = hospitals['beds'].astype('Int64') 

hospitals


Unnamed: 0,CLASSE,ETABLISSEMENT,Adresse,numero,npa,ville,telephone,site_internet,RuleID_1,RuleID_2,beds,country
0,1,HVS - Hôpital psychiatrique de Malévoz,Route de Morgins,10.0,CH-1870,Monthey,0800 012 210,,Rule_1,Rule_1,30,Suisse
1,1,HVS - Clinique de Saint-Amé,Vers Saint-Amé,10.0,1890,St-Maurice,027/604.66.55,http://www.hopitalduvalais.ch,Rule_1,Rule_1,20,Suisse
3,1,HVS - Hôpital de Martigny,Avenue de la Fusion,27.0,1920,Martigny,027/603.90.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,10,Suisse
4,1,HVS - Hôpital de Sion,Avenue du Grand-Champsec,80.0,1951,Sitten,027/603.40.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,11,Suisse
5,1,HVS - Institut Central des Hôpitaux ICH,Avenue du Grand-Champsec,86.0,1951,Sion,027/603.47.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,15,Suisse
6,1,HVS - Hôpital de Sierre,Rue St-Charles,14.0,3960,Sierre,027/603.70.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,30,Suisse
7,3,HVS - Centre Valaisan de Pneumologie (CVP),Route de la Moubra,87.0,3963,Crans-Montana,027/603.80.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,12,Suisse
8,1,HVS - Hôpital de Brigue,Oberlandstrasse,14.0,3900,,027/604.33.33,http://www.spitalvs.ch/de/spital-wallis/stando...,Rule_1,Rule_1,45,Suisse
9,1,HVS - Hôpital de Viège,Pflanzettastrasse,8.0,3930,Visp,027/604.33.33,http://www.spitalvs.ch/de/spital-wallis/stando...,Rule_1,Rule_1,23,Suisse
10,1,HRC - Hôpital de Monthey,Route de Morgins,54.0,1870,Monthey,024/473.17.31,http://www.hopitalrivierachablais.ch,Rule_1,Rule_1,10,Suisse


## Text modifications

There are other errors, including the `CH-1870` postal code, whose prefix we can remove. Also, the names of some cities are in uppercase (BRIG). We can fix this too:

In [43]:
hospitals['npa'] = hospitals['npa'].str.replace('CH-','')
hospitals['ville'] = hospitals['ville'].str.title()
hospitals


Unnamed: 0,CLASSE,ETABLISSEMENT,Adresse,numero,npa,ville,telephone,site_internet,RuleID_1,RuleID_2,beds,country
0,1,HVS - Hôpital psychiatrique de Malévoz,Route de Morgins,10.0,1870,Monthey,0800 012 210,,Rule_1,Rule_1,30,Suisse
1,1,HVS - Clinique de Saint-Amé,Vers Saint-Amé,10.0,1890,St-Maurice,027/604.66.55,http://www.hopitalduvalais.ch,Rule_1,Rule_1,20,Suisse
3,1,HVS - Hôpital de Martigny,Avenue de la Fusion,27.0,1920,Martigny,027/603.90.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,10,Suisse
4,1,HVS - Hôpital de Sion,Avenue du Grand-Champsec,80.0,1951,Sitten,027/603.40.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,11,Suisse
5,1,HVS - Institut Central des Hôpitaux ICH,Avenue du Grand-Champsec,86.0,1951,Sion,027/603.47.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,15,Suisse
6,1,HVS - Hôpital de Sierre,Rue St-Charles,14.0,3960,Sierre,027/603.70.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,30,Suisse
7,3,HVS - Centre Valaisan de Pneumologie (CVP),Route de la Moubra,87.0,3963,Crans-Montana,027/603.80.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,12,Suisse
8,1,HVS - Hôpital de Brigue,Oberlandstrasse,14.0,3900,,027/604.33.33,http://www.spitalvs.ch/de/spital-wallis/stando...,Rule_1,Rule_1,45,Suisse
9,1,HVS - Hôpital de Viège,Pflanzettastrasse,8.0,3930,Visp,027/604.33.33,http://www.spitalvs.ch/de/spital-wallis/stando...,Rule_1,Rule_1,23,Suisse
10,1,HRC - Hôpital de Monthey,Route de Morgins,54.0,1870,Monthey,024/473.17.31,http://www.hopitalrivierachablais.ch,Rule_1,Rule_1,10,Suisse


## Merge columns

We can create a new column by merging data from others. For example, we can create a full address column by concatenating the content of the other address columns. Rows where one of the parts is missing end up with a missing full address:

In [44]:
hospitals['full_adresse'] = hospitals['Adresse'] + ' '+ hospitals['numero'].astype(str) + ', '+ hospitals['npa']+' '+hospitals['ville'] 
hospitals


Unnamed: 0,CLASSE,ETABLISSEMENT,Adresse,numero,npa,ville,telephone,site_internet,RuleID_1,RuleID_2,beds,country,full_adresse
0,1,HVS - Hôpital psychiatrique de Malévoz,Route de Morgins,10.0,1870,Monthey,0800 012 210,,Rule_1,Rule_1,30,Suisse,"Route de Morgins 10, 1870 Monthey"
1,1,HVS - Clinique de Saint-Amé,Vers Saint-Amé,10.0,1890,St-Maurice,027/604.66.55,http://www.hopitalduvalais.ch,Rule_1,Rule_1,20,Suisse,"Vers Saint-Amé 10, 1890 St-Maurice"
3,1,HVS - Hôpital de Martigny,Avenue de la Fusion,27.0,1920,Martigny,027/603.90.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,10,Suisse,"Avenue de la Fusion 27, 1920 Martigny"
4,1,HVS - Hôpital de Sion,Avenue du Grand-Champsec,80.0,1951,Sitten,027/603.40.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,11,Suisse,"Avenue du Grand-Champsec 80, 1951 Sitten"
5,1,HVS - Institut Central des Hôpitaux ICH,Avenue du Grand-Champsec,86.0,1951,Sion,027/603.47.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,15,Suisse,"Avenue du Grand-Champsec 86, 1951 Sion"
6,1,HVS - Hôpital de Sierre,Rue St-Charles,14.0,3960,Sierre,027/603.70.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,30,Suisse,"Rue St-Charles 14, 3960 Sierre"
7,3,HVS - Centre Valaisan de Pneumologie (CVP),Route de la Moubra,87.0,3963,Crans-Montana,027/603.80.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,12,Suisse,"Route de la Moubra 87, 3963 Crans-Montana"
8,1,HVS - Hôpital de Brigue,Oberlandstrasse,14.0,3900,,027/604.33.33,http://www.spitalvs.ch/de/spital-wallis/stando...,Rule_1,Rule_1,45,Suisse,
9,1,HVS - Hôpital de Viège,Pflanzettastrasse,8.0,3930,Visp,027/604.33.33,http://www.spitalvs.ch/de/spital-wallis/stando...,Rule_1,Rule_1,23,Suisse,"Pflanzettastrasse 8, 3930 Visp"
10,1,HRC - Hôpital de Monthey,Route de Morgins,54.0,1870,Monthey,024/473.17.31,http://www.hopitalrivierachablais.ch,Rule_1,Rule_1,10,Suisse,"Route de Morgins 54, 1870 Monthey"


## Mapping replacement

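We can see that some cities appear under their name in the other language, for example the German `Sitten` for Sion. We can define a mapping from the old values to the new ones and apply it to the `ville` column. Note that `full_adresse` was built before this step, so it keeps the old names: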

In [45]:

hospitals=hospitals.replace({"ville":{"Sitten": "Sion","Viège":"Visp"}})
hospitals

Unnamed: 0,CLASSE,ETABLISSEMENT,Adresse,numero,npa,ville,telephone,site_internet,RuleID_1,RuleID_2,beds,country,full_adresse
0,1,HVS - Hôpital psychiatrique de Malévoz,Route de Morgins,10.0,1870,Monthey,0800 012 210,,Rule_1,Rule_1,30,Suisse,"Route de Morgins 10, 1870 Monthey"
1,1,HVS - Clinique de Saint-Amé,Vers Saint-Amé,10.0,1890,St-Maurice,027/604.66.55,http://www.hopitalduvalais.ch,Rule_1,Rule_1,20,Suisse,"Vers Saint-Amé 10, 1890 St-Maurice"
3,1,HVS - Hôpital de Martigny,Avenue de la Fusion,27.0,1920,Martigny,027/603.90.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,10,Suisse,"Avenue de la Fusion 27, 1920 Martigny"
4,1,HVS - Hôpital de Sion,Avenue du Grand-Champsec,80.0,1951,Sion,027/603.40.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,11,Suisse,"Avenue du Grand-Champsec 80, 1951 Sitten"
5,1,HVS - Institut Central des Hôpitaux ICH,Avenue du Grand-Champsec,86.0,1951,Sion,027/603.47.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,15,Suisse,"Avenue du Grand-Champsec 86, 1951 Sion"
6,1,HVS - Hôpital de Sierre,Rue St-Charles,14.0,3960,Sierre,027/603.70.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,30,Suisse,"Rue St-Charles 14, 3960 Sierre"
7,3,HVS - Centre Valaisan de Pneumologie (CVP),Route de la Moubra,87.0,3963,Crans-Montana,027/603.80.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,12,Suisse,"Route de la Moubra 87, 3963 Crans-Montana"
8,1,HVS - Hôpital de Brigue,Oberlandstrasse,14.0,3900,,027/604.33.33,http://www.spitalvs.ch/de/spital-wallis/stando...,Rule_1,Rule_1,45,Suisse,
9,1,HVS - Hôpital de Viège,Pflanzettastrasse,8.0,3930,Visp,027/604.33.33,http://www.spitalvs.ch/de/spital-wallis/stando...,Rule_1,Rule_1,23,Suisse,"Pflanzettastrasse 8, 3930 Visp"
10,1,HRC - Hôpital de Monthey,Route de Morgins,54.0,1870,Monthey,024/473.17.31,http://www.hopitalrivierachablais.ch,Rule_1,Rule_1,10,Suisse,"Route de Morgins 54, 1870 Monthey"


## Codification

The cities are text entries. We can convert them to categories:

In [46]:
hospitals['ville'] = pd.Categorical(hospitals.ville)
hospitals.dtypes

CLASSE              int64
ETABLISSEMENT      object
Adresse            object
numero              Int64
npa                object
ville            category
telephone          object
site_internet      object
RuleID_1           object
RuleID_2           object
beds                Int64
country            object
full_adresse       object
dtype: object
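
A categorical column stores each distinct value once and refers to it by an integer code. The sketch below, on a hypothetical series, shows both parts through the `.cat` accessor:

```python
import pandas as pd

# Hypothetical city names.
ville = pd.Series(["Sion", "Visp", "Sion", "Monthey"])
as_cat = ville.astype("category")

categories = as_cat.cat.categories.tolist()  # distinct values, sorted
codes = as_cat.cat.codes.tolist()            # position of each row's value in that list

print(categories)
print(codes)
```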

## Selection filtering

We can do all sorts of filtering. For example, we can keep only the hospitals located on an avenue that have fewer than 15 beds:

In [47]:
hospitals.loc[(hospitals['Adresse'].str.startswith('Avenue')) & (hospitals['beds']<15)] 

Unnamed: 0,CLASSE,ETABLISSEMENT,Adresse,numero,npa,ville,telephone,site_internet,RuleID_1,RuleID_2,beds,country,full_adresse
3,1,HVS - Hôpital de Martigny,Avenue de la Fusion,27,1920,Martigny,027/603.90.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,10,Suisse,"Avenue de la Fusion 27, 1920 Martigny"
4,1,HVS - Hôpital de Sion,Avenue du Grand-Champsec,80,1951,Sion,027/603.40.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,11,Suisse,"Avenue du Grand-Champsec 80, 1951 Sitten"


## Grouping

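We can also group hospitals by some criterion, for instance by city (this cell was run after the renaming step below, which is why the output shows the `etablissement`, `lits` and `pays` column names):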

In [53]:
groups=hospitals.groupby('ville')
groups.get_group('Sion')

Unnamed: 0,CLASSE,etablissement,Adresse,numero,npa,ville,telephone,site_internet,RuleID_1,RuleID_2,lits,pays,full_adresse
4,1,HVS - Hôpital de Sion,Avenue du Grand-Champsec,80,1951,Sion,027/603.40.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,11,Suisse,"Avenue du Grand-Champsec 80, 1951 Sitten"
5,1,HVS - Institut Central des Hôpitaux ICH,Avenue du Grand-Champsec,86,1951,Sion,027/603.47.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,15,Suisse,"Avenue du Grand-Champsec 86, 1951 Sion"
13,2,Clinique de Valère,Rue Pré-Fleuri,16,1950,Sion,027/327.10.10,http://www.cliniquevalere.ch,Rule_1,Rule_1,12,Suisse,"Rue Pré-Fleuri 16, 1950 Sitten"
19,2,Clinique romande de réadaptation (CRR SUVA),Avenue Grand-Champsec,90,1951,Sion,027/603.30.30,http://www.crr-suva.ch,Rule_1,Rule_1,33,Suisse,"Avenue Grand-Champsec 90, 1951 Sion"


## Rename column

Columns can be renamed with the `rename` function, which takes a mapping from old names to new names:

In [50]:
hospitals.rename(columns={"beds":"lits"},inplace=True)
hospitals.rename(columns={"ETABLISSEMENT":"etablissement"},inplace=True)
hospitals.rename(columns={"country":"pays"},inplace=True)
hospitals

Unnamed: 0,CLASSE,etablissement,Adresse,numero,npa,ville,telephone,site_internet,RuleID_1,RuleID_2,lits,pays,full_adresse
0,1,HVS - Hôpital psychiatrique de Malévoz,Route de Morgins,10.0,1870,Monthey,0800 012 210,,Rule_1,Rule_1,30,Suisse,"Route de Morgins 10, 1870 Monthey"
1,1,HVS - Clinique de Saint-Amé,Vers Saint-Amé,10.0,1890,St-Maurice,027/604.66.55,http://www.hopitalduvalais.ch,Rule_1,Rule_1,20,Suisse,"Vers Saint-Amé 10, 1890 St-Maurice"
3,1,HVS - Hôpital de Martigny,Avenue de la Fusion,27.0,1920,Martigny,027/603.90.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,10,Suisse,"Avenue de la Fusion 27, 1920 Martigny"
4,1,HVS - Hôpital de Sion,Avenue du Grand-Champsec,80.0,1951,Sion,027/603.40.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,11,Suisse,"Avenue du Grand-Champsec 80, 1951 Sitten"
5,1,HVS - Institut Central des Hôpitaux ICH,Avenue du Grand-Champsec,86.0,1951,Sion,027/603.47.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,15,Suisse,"Avenue du Grand-Champsec 86, 1951 Sion"
6,1,HVS - Hôpital de Sierre,Rue St-Charles,14.0,3960,Sierre,027/603.70.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,30,Suisse,"Rue St-Charles 14, 3960 Sierre"
7,3,HVS - Centre Valaisan de Pneumologie (CVP),Route de la Moubra,87.0,3963,Crans-Montana,027/603.80.00,http://www.hopitalduvalais.ch,Rule_1,Rule_1,12,Suisse,"Route de la Moubra 87, 3963 Crans-Montana"
8,1,HVS - Hôpital de Brigue,Oberlandstrasse,14.0,3900,,027/604.33.33,http://www.spitalvs.ch/de/spital-wallis/stando...,Rule_1,Rule_1,45,Suisse,
9,1,HVS - Hôpital de Viège,Pflanzettastrasse,8.0,3930,Visp,027/604.33.33,http://www.spitalvs.ch/de/spital-wallis/stando...,Rule_1,Rule_1,23,Suisse,"Pflanzettastrasse 8, 3930 Visp"
10,1,HRC - Hôpital de Monthey,Route de Morgins,54.0,1870,Monthey,024/473.17.31,http://www.hopitalrivierachablais.ch,Rule_1,Rule_1,10,Suisse,"Route de Morgins 54, 1870 Monthey"
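
The `columns` mapping can hold several entries, so the three renames can also be done in a single call. Here is a sketch on a hypothetical one-row table, without `inplace` so that the original is left untouched:

```python
import pandas as pd

# Hypothetical toy data using the tutorial's original column names.
df = pd.DataFrame({
    "ETABLISSEMENT": ["HVS"],
    "beds": [30],
    "country": ["Suisse"],
})

renamed = df.rename(columns={
    "beds": "lits",
    "ETABLISSEMENT": "etablissement",
    "country": "pays",
})

print(renamed.columns.tolist())
```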
