# Combine data

This script combines the data from `TARIEFDEEL.csv` (T) and `GEBIEDSBEHEERDER.csv` (GB) by adding the area manager name to dataset T.

I use the `pandas` package to combine the datasets. First I remove the columns I don't need. Then I add the area manager name from dataset GB to dataset T. Once they are combined, I want to create simple visualizations: tables and a chart.

## Import & scale down

1. Import `pandas` and load both CSV files (`M` holds GB, `F` holds T).
2. Keep only the first 25 rows of each dataset.
3. Print the column names.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt


M = pd.read_csv("datasets/RDW/GEBIEDSBEHEERDER.csv")
F = pd.read_csv("datasets/RDW/TARIEFDEEL.csv")

managers = M.head(25)
fare = F.head(25)
print(fare.columns.values, "\n", managers.columns.values)

['AreaManagerId' 'FareCalculationCode' 'StartDateFarePart'
 'StartDurationFarePart' 'EndDurationFarePart' 'AmountFarePart'
 'StepSizeFarePart' 'EndDateFarePart' 'AmountCumulative'] 
 ['AreaManagerId' 'AreaManagerDesc' 'StartDateAreaManagerId'
 'EndDateAreaManagerId' 'URL']
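
If only the first 25 rows are needed, `read_csv` can stop parsing early through its `nrows` parameter, instead of loading the whole file and then calling `head(25)`. A minimal sketch, assuming an in-memory CSV as a stand-in for the real file (the manager names here are made up):

```python
import io

import pandas as pd

# Hypothetical stand-in for datasets/RDW/GEBIEDSBEHEERDER.csv (made-up rows)
csv_text = "AreaManagerId,AreaManagerDesc\n" + "\n".join(
    f"{i},Manager {i}" for i in range(100)
)

# nrows limits how many data rows are parsed from the file
managers_sample = pd.read_csv(io.StringIO(csv_text), nrows=25)
print(managers_sample.shape)
```

For the small files used here the difference is negligible; it mainly matters when the source CSV is large.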


## Remove fields

1. Remove the three unused columns from GB (`StartDateAreaManagerId`, `EndDateAreaManagerId` and `URL`).

In [2]:
managers = managers.drop(managers.columns[[2, 3, 4]], axis=1)
managers

| | AreaManagerId | AreaManagerDesc |
|---|---|---|
| 0 | 2468 | Aberdeen Asset Management |
| 1 | 1783 | Westland |
| 2 | 1708 | Steenwijkerland |
| 3 | 1895 | Oldambt |
| 4 | 267 | Nijkerk |
| 5 | 505 | Dordrecht |
| 6 | 2451 | Martini-ziekenhuis |
| 7 | 173 | Oldenzaal |
| 8 | 928 | Kerkrade |
| 9 | 2062 | P1 |
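
Dropping by position works, but it silently removes the wrong columns if the file layout ever changes. A possible alternative is to drop by name. The sketch below assumes a small hand-built frame with the GB column names printed earlier; the date and URL values are made up for illustration:

```python
import pandas as pd

# Tiny stand-in for the GB dataset; date and URL values are made up
managers_demo = pd.DataFrame({
    "AreaManagerId": [2468, 1783],
    "AreaManagerDesc": ["Aberdeen Asset Management", "Westland"],
    "StartDateAreaManagerId": [20150101, 20150101],
    "EndDateAreaManagerId": [29991231, 29991231],
    "URL": ["", ""],
})

# Drop by column name instead of by position
trimmed = managers_demo.drop(
    columns=["StartDateAreaManagerId", "EndDateAreaManagerId", "URL"]
)
print(list(trimmed.columns))
```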


## Combining datasets with AreaManagerId

1. Get the key to merge on (`AreaManagerId`).
2. Merge the two datasets with a left join. Using the ID shared by GB and T, the `AreaManagerDesc` column of GB gets added to T.


In [3]:
key = fare.columns[0]
combined = pd.merge(fare, managers, how="left", on=[key])
combined

| | AreaManagerId | FareCalculationCode | StartDateFarePart | StartDurationFarePart | EndDurationFarePart | AmountFarePart | StepSizeFarePart | EndDateFarePart | AmountCumulative | AreaManagerDesc |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 512 | TAR04 | 20150101 | 0 | 999999 | 0.02 | 1 | 29991231 | 0.0 | |
| 1 | 34 | BEZVGBZ | 20161101 | 0 | 4 | 0.0 | 4 | 20170102 | 0.0 | |
| 2 | 150 | TAR04 | 20121218 | 0 | 999999 | 0.016667 | 1 | 20150101 | 0.0 | |
| 3 | 150 | TAR03 | 20140101 | 0 | 48 | 0.041667 | 1 | 20150101 | 0.0 | |
| 4 | 150 | TAR04 | 20150101 | 0 | 999999 | 0.018519 | 1 | 29991231 | 0.0 | |
| 5 | 34 | BEZVGAB | 20170102 | 0 | 4 | 0.0 | 4 | 29991231 | 0.0 | |
| 6 | 150 | TAR03 | 20140101 | 48 | 999999 | 0.083333 | 1 | 20150101 | 2.0 | |
| 7 | 150 | TAR02 | 20140101 | 0 | 999999 | 0.021739 | 1 | 20150101 | 0.0 | |
| 8 | 34 | BEZVGAB | 20161101 | 120 | 999999 | 0.7 | 120 | 20170102 | 0.7 | |
| 9 | 772 | STRIJP02 | 20190913 | 0 | 999999 | 0.03 | 1 | 20190927 | 0.0 | |


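The only problem is that not every ID in T has a matching name in dataset GB, where the names of the area managers are stored. In the output above, `AreaManagerDesc` is empty for every row shown. This is at least partly because only the first 25 rows of GB are used here, so an ID gets a name only if it happens to be among those rows.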
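
One way to see exactly which IDs went unmatched is the `indicator` parameter of `merge`, which adds a `_merge` column marking each row as `both` or `left_only`. A minimal sketch, assuming two tiny hand-built frames that reuse a few IDs from the outputs above; the fare row for ID 505 is made up so that at least one match occurs:

```python
import pandas as pd

# Stand-ins for T and GB; the fare row for ID 505 is hypothetical
fare_demo = pd.DataFrame({
    "AreaManagerId": [512, 34, 150, 150, 505],
    "FareCalculationCode": ["TAR04", "BEZVGBZ", "TAR04", "TAR03", "TAR04"],
})
lookup_demo = pd.DataFrame({
    "AreaManagerId": [267, 505],
    "AreaManagerDesc": ["Nijkerk", "Dordrecht"],
})

# indicator=True adds a "_merge" column: "both" or "left_only" for a left join
checked = fare_demo.merge(
    lookup_demo, how="left", on="AreaManagerId", indicator=True
)

# Collect the distinct IDs that found no name in the lookup table
unmatched_ids = (
    checked.loc[checked["_merge"] == "left_only", "AreaManagerId"]
    .drop_duplicates()
    .sort_values()
    .tolist()
)
print(unmatched_ids)  # [34, 150, 512]
```

Running the same check against the full GB file rather than its first 25 rows would show whether the missing names are caused by the row limit or are genuinely absent from GB.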