# LT healthcare services

Prepared by [**K.Clemons**](mailto:kimberly.clemons@ext.ec.europa.eu) and [**J.Grazzini**](mailto:jacopo.grazzini@ec.europa.eu) ([_Eurostat_](https://ec.europa.eu/eurostat)).

This notebook illustrates how to produce *ad-hoc* harmonised data from the data collected from the LT national authority. It shows how the data can be harmonised automatically, using either a single-use script-based approach or a reusable metadata-based approach.

## Setting the environment <a id="environment"></a>

In [1]:
PROJECT = 'basic-services'

import os, sys
import functools
import inspect
import json

import numpy as np
import pandas as pd

In [2]:
COUNTRY, LANG = 'Lithuania', 'lt'
CC = 'LT'

In [3]:
THISDIR = os.getcwd()
SRCPATH = os.path.abspath('../../../src/python/pyeufacility/')
DATAPATH = os.path.abspath('../../../data/healthcare/raw/') 
# or       '/Users/<username>/basic-services/data/healthcare/%s' % CC

try:
    print('src rel. path: %s%s' % (PROJECT, SRCPATH.split(PROJECT)[1])) 
    print('data rel. path: %s%s' % (PROJECT, DATAPATH.split(PROJECT)[1])) 
except IndexError: # PROJECT is not part of the path
    print('current path: %s' % os.getcwd())

src rel. path: basic-services/src/python/pyeufacility
data rel. path: basic-services/data/healthcare/raw


## Ad-hoc data ingestion, exploration and processing

In [4]:
IFILE = 'Hospitals_2018.xlsx'

try:
    import xlrd
except ImportError:
    !{sys.executable} -m pip install xlrd

file = os.path.join(DATAPATH, IFILE)
df_src = pd.read_excel(file, header = 2) 

print(df_src.columns)
# print(df_src.dtypes)

Index(['Code of municipality', 'Municipality', 'ID', 'parent_ID',
       'Code of legal entity',
       'Subordination: 1-national (MoH), 3-municipality, 8-private, 9-other ministries (not MoH)',
       'type_code', 'type_name',
       'Level: 1-national, 2-regional, 3-municipality, 4-nursing, 5-other public and specialized, 6-private',
       'Name', 'Address', 'Number of beds at the end of the 2018'],
      dtype='object')


Clean the dataset, add/rename columns, derive information, *etc.*:

In [5]:
df1 = df_src.copy()

df1['country'] = COUNTRY
df1['public_private'] = (df1['Level: 1-national, 2-regional, 3-municipality, 4-nursing, '
                             '5-other public and specialized, 6-private']
                         .apply(lambda x: 'private' if x==6 else 'public')
                        )

df1.rename(columns = {'ID':                                    'id',
                      'Name':                                  'name',
                      #'Address':                               'address',
                      'Number of beds at the end of the 2018': 'cap_beds',
                      'type_name':                             'facility_type'}, 
           inplace = True, errors = 'ignore') 

df1['address'] = df1[['name', 'Address', 'country']].apply(', '.join, 1)

df1.drop(columns = ['Subordination: 1-national (MoH), 3-municipality, 8-private, 9-other ministries (not MoH)',
                    'Level: 1-national, 2-regional, 3-municipality, 4-nursing, '
                    '5-other public and specialized, 6-private',
                    'Code of legal entity', 'Code of municipality'],
         inplace = True, errors = 'ignore')

df1.head()

Unnamed: 0,Municipality,id,parent_ID,type_code,facility_type,name,Address,cap_beds,country,public_private,address
0,Vilnius,1,0,1,general,Vilniaus Universiteto ligoninė Santaros klinikos,"Santariškių 2, Vilnius LT-08661",1356,Lithuania,public,Vilniaus Universiteto ligoninė Santaros klinik...
1,Vilnius,5,1,1,general,Vilniaus Universiteto ligoninės Santaros klini...,"Santariškių 7, Vilnius LT-08406",515,Lithuania,public,Vilniaus Universiteto ligoninės Santaros klini...
2,Vilnius,32,0,1,general,Respublikinė Vilniaus universitetinė ligoninė,"Šiltnamių 29, Vilnius",673,Lithuania,public,"Respublikinė Vilniaus universitetinė ligoninė,..."
3,Vilnius,3,0,1,general,Vilniaus Universiteto ligoninės Žalgirio klinika,"Žalgirio 117, Vilnius",58,Lithuania,public,Vilniaus Universiteto ligoninės Žalgirio klini...
4,Vilnius,10,0,19,psychiatry,Respublikinė Vilniaus psichiatrijos ligoninė,"Parko g. 21, Vilnius",542,Lithuania,public,"Respublikinė Vilniaus psichiatrijos ligoninė, ..."


Build a geocoder/geolocator using the [`geopy`](https://geopy.readthedocs.io/en/stable/#) package, supplying an API key for the geocoding service if needed (`OSM` and `Bing` are considered below):

In [347]:
try:
    import geopy
except ImportError:
    !{sys.executable} -m pip install geopy
finally:
    from geopy import geocoders

key = None # set to a Bing Maps API key to use Bing instead of OSM/Nominatim
if key is None:
    geolocator = functools.partial(geocoders.Nominatim(user_agent = PROJECT).geocode, language = LANG)
else:
    geolocator = geocoders.Bing(user_agent = PROJECT, timeout = 100, api_key = key).geocode

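The cell below shows one way to work around geocoder limitations (this may not be needed for all geocoders): when geocoding the full address (facility name included) fails, retry with the original `'Address'` field only: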

In [348]:
print('location: %s' % df1.iloc[0]['address'])
try:
    location = geolocator(df1.iloc[0]['address'])
    assert location is not None
except:
    try:
        location = geolocator(df1.iloc[0]['Address'])
        assert location is not None 
    except:
        pass
    else:
        print('lat/lon coordinates: (%s,%s)' % (location.latitude, location.longitude))
else:
    print('lat/lon coordinates: (%s,%s)' % (location.latitude, location.longitude))

location: Vilniaus Universiteto ligoninė Santaros klinikos, Santariškių 2, Vilnius LT-08661, Lithuania
lat/lon coordinates: (54.75234945,25.277250069302262)
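
The fallback logic above can also be written as a small reusable helper. The following is a minimal sketch, not part of the notebook's packages: `geocode_with_fallback` and `fake_geocode` are hypothetical names, and the stand-in geocoder (with coordinates rounded from the output above) only exists so that the example runs offline.

```python
def geocode_with_fallback(geocode, candidates):
    """Return the first non-None result of geocode() over the candidates, else None."""
    for query in candidates:
        try:
            location = geocode(query)
        except Exception:  # e.g. a timeout or a service error: try the next candidate
            location = None
        if location is not None:
            return location
    return None


# Hypothetical stand-in for a real geocoder: it only "knows" the short address.
KNOWN = {'Santariškių 2, Vilnius LT-08661': (54.7523, 25.2773)}

def fake_geocode(query):
    return KNOWN.get(query)

full = ('Vilniaus Universiteto ligoninė Santaros klinikos, '
        'Santariškių 2, Vilnius LT-08661, Lithuania')
short = 'Santariškių 2, Vilnius LT-08661'

result = geocode_with_fallback(fake_geocode, [full, short])
print(result)
```

With a real geocoder, `geocode` would be the `geolocator` callable built earlier, and the candidates would go from the most specific to the least specific form of the address.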


Apply the geocoder to the newly built `'address'` variable, falling back to the shorter address (without the facility name) wherever geocoding fails:

In [349]:
df1['__coord__'] = df1['address'].apply(geolocator)
if df1['__coord__'].isnull().any():
    index = df1[df1['__coord__'].isnull()].index
    short_address = (df1[['Address', 'country']]
                     .apply(', '.join, 1)
                    ) # of use when geocoding fails
    df1.loc[index, '__coord__'] = short_address.loc[index].apply(geolocator)
    
df1.head()

Unnamed: 0,Municipality,id,parent_ID,type_code,facility_type,name,Address,cap_beds,country,public_private,address,__coord__
0,Vilnius,1,0,1,general,Vilniaus Universiteto ligoninė Santaros klinikos,"Santariškių 2, Vilnius LT-08661",1356,Lithuania,public,Vilniaus Universiteto ligoninė Santaros klinik...,"(Santariškių klinikos, 2, Santariškių g., Sant..."
1,Vilnius,5,1,1,general,Vilniaus Universiteto ligoninės Santaros klini...,"Santariškių 7, Vilnius LT-08406",515,Lithuania,public,Vilniaus Universiteto ligoninės Santaros klini...,"(Santariškių klinikų vaikų ligoninė, 7, Santar..."
2,Vilnius,32,0,1,general,Respublikinė Vilniaus universitetinė ligoninė,"Šiltnamių 29, Vilnius",673,Lithuania,public,"Respublikinė Vilniaus universitetinė ligoninė,...",(Respublikinė Vilniaus universitetinė ligoninė...
3,Vilnius,3,0,1,general,Vilniaus Universiteto ligoninės Žalgirio klinika,"Žalgirio 117, Vilnius",58,Lithuania,public,Vilniaus Universiteto ligoninės Žalgirio klini...,"(Žalgirio klinika, 117, Žalgirio g., Šnipiškių..."
4,Vilnius,10,0,19,psychiatry,Respublikinė Vilniaus psichiatrijos ligoninė,"Parko g. 21, Vilnius",542,Lithuania,public,"Respublikinė Vilniaus psichiatrijos ligoninė, ...","(Respublikinė Vilniaus psichiatrijos ligoninė,..."
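
Applying a geocoder row by row sends one request per facility, and public services such as Nominatim ask clients to stay at or below one request per second. `geopy` ships a `RateLimiter` helper in `geopy.extra.rate_limiter` for this purpose; the sketch below only illustrates the underlying idea with the standard library. The `throttled` wrapper is a hypothetical name, and `str.upper` stands in for the geocoding call.

```python
import time

def throttled(func, min_delay=1.0, clock=time.monotonic, sleep=time.sleep):
    """Wrap func so that successive calls are at least min_delay seconds apart."""
    last_call = [None]  # time of the previous call, kept in a mutable cell

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            wait = min_delay - (clock() - last_call[0])
            if wait > 0:
                sleep(wait)  # pause until the minimum delay has elapsed
        last_call[0] = clock()
        return func(*args, **kwargs)

    return wrapper


# Stand-in for a geocoding call, with a short delay to keep the example fast.
slow_upper = throttled(str.upper, min_delay=0.01)
print([slow_upper(s) for s in ('vilnius', 'kaunas')])
```

In the notebook, the wrapped callable would replace `geolocator` in the `apply` calls, for instance `df1['address'].apply(throttled(geolocator))`.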


Extract the latitude/longitude coordinates from the geocoder answers:

In [350]:
df1['lat'], df1['lon'] = zip(*df1['__coord__']
                             .apply(lambda x: (x.latitude, x.longitude) if x is not None else (np.nan, np.nan))
                            )
df1.drop(columns = '__coord__', inplace = True, errors = 'ignore')

df1.head()

Unnamed: 0,Municipality,id,parent_ID,type_code,facility_type,name,Address,cap_beds,country,public_private,address,lat,lon
0,Vilnius,1,0,1,general,Vilniaus Universiteto ligoninė Santaros klinikos,"Santariškių 2, Vilnius LT-08661",1356,Lithuania,public,Vilniaus Universiteto ligoninė Santaros klinik...,54.752349,25.27725
1,Vilnius,5,1,1,general,Vilniaus Universiteto ligoninės Santaros klini...,"Santariškių 7, Vilnius LT-08406",515,Lithuania,public,Vilniaus Universiteto ligoninės Santaros klini...,54.755024,25.282701
2,Vilnius,32,0,1,general,Respublikinė Vilniaus universitetinė ligoninė,"Šiltnamių 29, Vilnius",673,Lithuania,public,"Respublikinė Vilniaus universitetinė ligoninė,...",54.668216,25.207738
3,Vilnius,3,0,1,general,Vilniaus Universiteto ligoninės Žalgirio klinika,"Žalgirio 117, Vilnius",58,Lithuania,public,Vilniaus Universiteto ligoninės Žalgirio klini...,54.704809,25.278833
4,Vilnius,10,0,19,psychiatry,Respublikinė Vilniaus psichiatrijos ligoninė,"Parko g. 21, Vilnius",542,Lithuania,public,"Respublikinė Vilniaus psichiatrijos ligoninė, ...",54.683846,25.417448


Save the output data:

In [351]:
OFILE = 'LT_geolocated.csv'
df1.to_csv(OFILE)

## Automating the production

We aim to automate the above operations into a transparent, reusable process that can also evolve over time (*e.g.*, when the data change). For that purpose, we adopt the metadata-based approach implemented throughout the [`pyeudatnat`](https://github.com/eurostat/pyEUDatNat) and [`pyeufacility`](https://github.com/eurostat/pyeufacility) packages.

For a proper import of the packages and setup of the environment, see [the dedicated cells](https://github.com/eurostat/basic-services/blob/master/src/python/notebooks/01_HCS_generic_example_CZ.ipynb#environment) of the [`01_HCS_generic_example_CZ.ipynb` notebook](https://github.com/eurostat/basic-services/blob/master/src/python/notebooks/01_HCS_generic_example_CZ.ipynb).

In [386]:
try:
    import pyeudatnat
except ImportError:
    !{sys.executable} -m pip install git+https://github.com/eurostat/pyEUDatNat.git
finally:
    from pyeudatnat import io

try:
    import pyeufacility
except ImportError:
    # !{sys.executable} -m pip install git+https://github.com/eurostat/basic-services.git
    # if you launch it from the notebooks/ directory, try for instance:
    pardir = os.path.abspath(os.path.join(THISDIR, '../'))
    sys.path.insert(0,pardir)
finally:
    from pyeufacility import hcs, config

In [353]:
prepare_data = hcs.LThcs.prepare_data
preparator = prepare_data()

predicate = lambda o: (inspect.ismethod(o) or inspect.isfunction(o)) and not o.__name__.startswith('__')
print("Methods in '%s': \033[1m%s\033[0m: " 
      % (prepare_data.__name__, [l[0] for l in inspect.getmembers(prepare_data, predicate = predicate)]))

Methods in 'prepare_data': ['set_address', 'set_pp', 'split_Adr']: 


In [354]:
split_Adr = preparator.split_Adr
assert callable(split_Adr) is True
print("Method: \033[1m'%s'\033[0m" % split_Adr.__name__)
print(inspect.getsource(split_Adr))

print("Example of use of '%s':" % split_Adr.__name__)
print(" * input complete address: \033[1m'%s'\033[0m" % df_src['Address'].iloc[0])
print(" * output decomposed address: \033[1m'%s'\033[0m" % list(split_Adr(df_src['Address'].iloc[0])))

Method: [1m'split_Adr'[0m
    @classmethod
    def split_Adr(cls, s):
        street, number, postcode, city = "", "", "", ""
        mem = re.compile(r'\s*,\s*').split(s)
        left, right = mem[0], " ".join(mem[1:])
        while left == '' and len(right)>1:
            left = right[-1].strip()
            right = right[:-1]
        if len(right) == 1 and left == '':
            return "", right[0], "", ""
        elif len(left) == 1 and right == '':
            return "", "", left[0], ""
        rights = re.compile(r'\s+').split(right)
        for r in rights:
            r = r.strip()
            if r == '': continue
            if r.isnumeric() or r[-1].isdigit():
                postcode = " ".join([postcode,r])
            else:
                city = " ".join([city,r])
        lefts = re.compile(r'\s+').split(left)
        for l in lefts:
            l = l.strip()
            if l == '': continue
            if l.isnumeric() or l[0].isdigit():
                number = " ".j

In [355]:
df_src.head()

Unnamed: 0,Code of municipality,Municipality,ID,parent_ID,Code of legal entity,"Subordination: 1-national (MoH), 3-municipality, 8-private, 9-other ministries (not MoH)",type_code,type_name,"Level: 1-national, 2-regional, 3-municipality, 4-nursing, 5-other public and specialized, 6-private",Name,Address,Number of beds at the end of the 2018
0,101,Vilnius,1,0,124364561,1,1,general,1,Vilniaus Universiteto ligoninė Santaros klinikos,"Santariškių 2, Vilnius LT-08661",1356
1,101,Vilnius,5,1,302620298,1,1,general,1,Vilniaus Universiteto ligoninės Santaros klini...,"Santariškių 7, Vilnius LT-08406",515
2,101,Vilnius,32,0,124243848,1,1,general,1,Respublikinė Vilniaus universitetinė ligoninė,"Šiltnamių 29, Vilnius",673
3,101,Vilnius,3,0,191744287,1,1,general,1,Vilniaus Universiteto ligoninės Žalgirio klinika,"Žalgirio 117, Vilnius",58
4,101,Vilnius,10,0,124247526,1,19,psychiatry,5,Respublikinė Vilniaus psichiatrijos ligoninė,"Parko g. 21, Vilnius",542


In [358]:
set_address = preparator.set_address
assert callable(set_address) is True
print("Method: \033[1m'%s'\033[0m" % set_address.__name__)
print(inspect.getsource(set_address))

df2 = df_src.copy()
set_address(df2)

print("Example of use of '%s':" % set_address.__name__)
df2[['postcode', 'city', 'street', 'number']].head()

Method: [1m'set_address'[0m
    def set_address(self, data):
        cols = data.columns.tolist()
        new_cols = ['street', 'number', 'postcode', 'city']
        data.reindex(columns = [*cols, *new_cols], fill_value = "")
        data[new_cols] =  (
            data
            .apply(lambda row: pd.Series(self.split_Adr(row['Address'])), axis=1)
            )

Example of use of 'set_address':


Unnamed: 0,postcode,city,street,number
0,LT-08661,Vilnius,Santariškių,2
1,LT-08406,Vilnius,Santariškių,7
2,,Vilnius,Šiltnamių,29
3,,Vilnius,Žalgirio,117
4,,Vilnius,Parko g.,21
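
To make the decomposition rule easier to follow, here is a simplified sketch of the same idea. It is not the `split_Adr` implementation from `pyeufacility`: `split_address` is a hypothetical helper that assumes the `street number, city postcode` layout seen in the rows above and splits on the first comma only.

```python
def split_address(address):
    """Split 'street number, city postcode' into (street, number, postcode, city)."""
    left, _, right = address.partition(',')
    street, number, postcode, city = [], [], [], []
    for token in left.split():
        # tokens starting with a digit are taken as the house number
        (number if token[0].isdigit() else street).append(token)
    for token in right.split():
        # tokens ending with a digit are taken as (part of) the postcode
        (postcode if token[-1].isdigit() else city).append(token)
    return ' '.join(street), ' '.join(number), ' '.join(postcode), ' '.join(city)


for address in ('Santariškių 2, Vilnius LT-08661', 'Parko g. 21, Vilnius'):
    print(split_address(address))
```

Addresses with several commas, or with a postcode placed before the city name on the left-hand side, would need extra handling that this sketch leaves out.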


In [360]:
set_pp = preparator.set_pp
assert callable(set_pp) is True
print("Method: \033[1m'%s'\033[0m" % set_pp.__name__)
print(inspect.getsource(set_pp))

set_pp(df2)

print("Example of use of '%s':" % set_pp.__name__)
df2[['public_private']].head()

Method: [1m'set_pp'[0m
    def set_pp(self, data):
        col_pp = 'Level: 1-national, 2-regional, 3-municipality, 4-nursing, 5-other public and specialized, 6-private'
        data['public_private'] = (
            data[col_pp]
            .apply(lambda x: 'private' if x==6 else 'public')
            )

Example of use of 'set_pp':


Unnamed: 0,public_private
0,public
1,public
2,public
3,public
4,public


In [220]:
METAPATH=os.path.dirname(inspect.getfile(hcs))
print(METAPATH.split(PROJECT)[1])

/src/python/pyeufacility/hcs


In [246]:
meta = os.path.join(METAPATH,'%shcs.json' % CC)

try:
    assert os.path.exists(meta)
    with open(meta, 'r') as fp:
        metadata = io.Json.load(fp)
except (AssertionError,FileNotFoundError):
    print("\n! No metadata JSON-file found for '%s'!" % CC)
else:
    print(io.Json.dumps(metadata, indent=2))

{
  "country": {
    "code": "LT",
    "name": "Lithuania"
  },
  "lang": {
    "code": "lt",
    "name": "Lithuanian"
  },
  "proj": null,
  "file": "Hospitals_2018.xlsx",
  "path": "../../../data/healthcare/raw/",
  "enc": "cp850",
  "sep": ";",
  "columns": [],
  "index": {
    "id": "ID",
    "name": "Name",
    "site": null,
    "lat": null,
    "lon": null,
    "geo_qual": null,
    "street": null,
    "number": null,
    "postcode": null,
    "city": "Municipality",
    "cc": null,
    "country": null,
    "beds": "Number of beds at the end of the 2018",
    "prac": null,
    "rooms": null,
    "ER": null,
    "type": "type_name",
    "PP": null,
    "specs": null,
    "tel": null,
    "email": null,
    "url": null,
    "refdate": null,
    "pubdate": null
  }
}


In [247]:
{k:v for (k,v) in metadata['index'].items() if v is not None}

{'id': 'ID',
 'name': 'Name',
 'city': 'Municipality',
 'beds': 'Number of beds at the end of the 2018',
 'type': 'type_name'}
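
The non-null entries of `index` map each harmonised field name to a column of the source file. As an illustration of how such a mapping can drive the renaming step, here is a minimal sketch; `harmonise_columns` is a hypothetical helper and does not reproduce what `pyeufacility` does internally. The sample rows are copied from the table shown earlier.

```python
import pandas as pd

# Excerpt of the metadata 'index' field: harmonised name -> source column (or None)
index = {'id': 'ID',
         'name': 'Name',
         'lat': None,
         'city': 'Municipality',
         'beds': 'Number of beds at the end of the 2018',
         'type': 'type_name'}

def harmonise_columns(data, index):
    """Rename the source columns listed in index and keep only those columns."""
    mapping = {src: dst for dst, src in index.items()
               if src is not None and src in data.columns}
    return data.rename(columns=mapping)[list(mapping.values())]

sample = pd.DataFrame({
    'Municipality': ['Vilnius', 'Vilnius'],
    'ID': [1, 32],
    'type_name': ['general', 'general'],
    'Name': ['Vilniaus Universiteto ligoninė Santaros klinikos',
             'Respublikinė Vilniaus universitetinė ligoninė'],
    'Address': ['Santariškių 2, Vilnius LT-08661', 'Šiltnamių 29, Vilnius'],
    'Number of beds at the end of the 2018': [1356, 673],
})

harmonised = harmonise_columns(sample, index)
print(harmonised.columns.tolist())
```

Keeping the mapping in a JSON file rather than in the code is what makes the process reusable: when the source layout changes, only the metadata needs to be edited.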

In [387]:
LThcs = config.facilityFactory(facility = 'HCS', meta = metadata)

In [388]:
lt = LThcs()
lt.load_data(header=2)
lt.data.head()

{'header': 2, 'encoding': 'cp850', 'sep': ';', 'dtype': <class 'object'>, 'compression': 'infer', 'caching': False, 'cache_store': None, 'cache_expire': 0, 'cache_force': True}
Code of municipality                                                                                   object
Municipality                                                                                           object
ID                                                                                                     object
parent_ID                                                                                              object
Code of legal entity                                                                                   object
Subordination: 1-national (MoH), 3-municipality, 8-private, 9-other ministries (not MoH)               object
type_code                                                                                              object
type_name                                            

Unnamed: 0,Code of municipality,Municipality,ID,parent_ID,Code of legal entity,"Subordination: 1-national (MoH), 3-municipality, 8-private, 9-other ministries (not MoH)",type_code,type_name,"Level: 1-national, 2-regional, 3-municipality, 4-nursing, 5-other public and specialized, 6-private",Name,Address,Number of beds at the end of the 2018
0,101,Vilnius,1,0,124364561,1,1,general,1,Vilniaus Universiteto ligoninė Santaros klinikos,"Santariškių 2, Vilnius LT-08661",1356
1,101,Vilnius,5,1,302620298,1,1,general,1,Vilniaus Universiteto ligoninės Santaros klini...,"Santariškių 7, Vilnius LT-08406",515
2,101,Vilnius,32,0,124243848,1,1,general,1,Respublikinė Vilniaus universitetinė ligoninė,"Šiltnamių 29, Vilnius",673
3,101,Vilnius,3,0,191744287,1,1,general,1,Vilniaus Universiteto ligoninės Žalgirio klinika,"Žalgirio 117, Vilnius",58
4,101,Vilnius,10,0,124247526,1,19,psychiatry,5,Respublikinė Vilniaus psichiatrijos ligoninė,"Parko g. 21, Vilnius",542


In [385]:
import importlib
importlib.reload(pyeudatnat)
importlib.reload(pyeudatnat.io)
importlib.reload(pyeudatnat.base)

importlib.reload(pyeufacility)
importlib.reload(hcs)
importlib.reload(config)
importlib.reload(hcs.LThcs)

from pyeufacility import hcs, config

In [365]:
df_src.head()

Unnamed: 0,Code of municipality,Municipality,ID,parent_ID,Code of legal entity,"Subordination: 1-national (MoH), 3-municipality, 8-private, 9-other ministries (not MoH)",type_code,type_name,"Level: 1-national, 2-regional, 3-municipality, 4-nursing, 5-other public and specialized, 6-private",Name,Address,Number of beds at the end of the 2018
0,101,Vilnius,1,0,124364561,1,1,general,1,Vilniaus Universiteto ligoninė Santaros klinikos,"Santariškių 2, Vilnius LT-08661",1356
1,101,Vilnius,5,1,302620298,1,1,general,1,Vilniaus Universiteto ligoninės Santaros klini...,"Santariškių 7, Vilnius LT-08406",515
2,101,Vilnius,32,0,124243848,1,1,general,1,Respublikinė Vilniaus universitetinė ligoninė,"Šiltnamių 29, Vilnius",673
3,101,Vilnius,3,0,191744287,1,1,general,1,Vilniaus Universiteto ligoninės Žalgirio klinika,"Žalgirio 117, Vilnius",58
4,101,Vilnius,10,0,124247526,1,19,psychiatry,5,Respublikinė Vilniaus psichiatrijos ligoninė,"Parko g. 21, Vilnius",542


In [367]:
lt.data.head()

Unnamed: 0,Code of municipality,Municipality,ID,parent_ID,Code of legal entity,"Subordination: 1-national (MoH), 3-municipality, 8-private, 9-other ministries (not MoH)",type_code,type_name,"Level: 1-national, 2-regional, 3-municipality, 4-nursing, 5-other public and specialized, 6-private",Name,Address,Number of beds at the end of the 2018
0,101,Vilnius,1,0,124364561,1,1,general,1,Vilniaus Universiteto ligoninė Santaros klinikos,"Santariškių 2, Vilnius LT-08661",1356
1,101,Vilnius,5,1,302620298,1,1,general,1,Vilniaus Universiteto ligoninės Santaros klini...,"Santariškių 7, Vilnius LT-08406",515
2,101,Vilnius,32,0,124243848,1,1,general,1,Respublikinė Vilniaus universitetinė ligoninė,"Šiltnamių 29, Vilnius",673
3,101,Vilnius,3,0,191744287,1,1,general,1,Vilniaus Universiteto ligoninės Žalgirio klinika,"Žalgirio 117, Vilnius",58
4,101,Vilnius,10,0,124247526,1,19,psychiatry,5,Respublikinė Vilniaus psichiatrijos ligoninė,"Parko g. 21, Vilnius",542


In [369]:
assert df_src.equals(lt.data)

AssertionError: 

In [373]:
type(lt.data)

pandas.core.frame.DataFrame

In [372]:
type(df_src)

pandas.core.frame.DataFrame

In [376]:
df_src.dtypes

Code of municipality                                                                                    int64
Municipality                                                                                           object
ID                                                                                                      int64
parent_ID                                                                                               int64
Code of legal entity                                                                                    int64
Subordination: 1-national (MoH), 3-municipality, 8-private, 9-other ministries (not MoH)                int64
type_code                                                                                               int64
type_name                                                                                              object
Level: 1-national, 2-regional, 3-municipality, 4-nursing, 5-other public and specialized, 6-private     int64
Name      

In [377]:
lt.data.dtypes

Code of municipality                                                                                   object
Municipality                                                                                           object
ID                                                                                                     object
parent_ID                                                                                              object
Code of legal entity                                                                                   object
Subordination: 1-national (MoH), 3-municipality, 8-private, 9-other ministries (not MoH)               object
type_code                                                                                              object
type_name                                                                                              object
Level: 1-national, 2-regional, 3-municipality, 4-nursing, 5-other public and specialized, 6-private    object
Name      

In [379]:
lt.data['Municipality'].equals(df_src['Municipality'])

True
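
The checks above suggest why the assertion fails: `load_data` reads every column with `dtype` set to `object`, whereas `pd.read_excel` inferred `int64` for the numeric columns, and `DataFrame.equals` requires matching dtypes as well as matching values. The sketch below reproduces the effect on a small made-up frame and shows one possible way to compare the contents, by casting one frame to the dtypes of the other; this is an assumption about what the comparison should do, not something the packages provide.

```python
import pandas as pd

typed = pd.DataFrame({'Municipality': ['Vilnius', 'Vilnius'], 'ID': [1, 5]})
as_object = typed.astype(object)  # mimics loading every column as 'object'

# Same values, different dtypes: equals() reports a mismatch
same_as_loaded = typed.equals(as_object)

# Columns whose dtype differs between the two frames
differing = [c for c in typed.columns if typed[c].dtype != as_object[c].dtype]

# Cast back to the reference dtypes before comparing the contents
restored = as_object.astype(typed.dtypes.to_dict())
same_after_cast = typed.equals(restored)

print(same_as_loaded, differing, same_after_cast)
```

Applied to the notebook, the equivalent check would be `df_src.equals(lt.data.astype(df_src.dtypes.to_dict()))`, provided both frames hold the same columns.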