# Handling Flow Data
This notebook takes the preprocessed flow data and turns it into CSV files to be used for visualization.

##### Preprocessed Data
The preprocessed flow data consists of 168 separate csv-files, one for each hour of the week.
Each file contains one row for every spatial relation and direction where movement was registered within the given hour and day.
The columns provide the direction of movement, the IDs and geographical features (*shapely* polygons and their center-points) of the network cells between which the movement occurred, counts of movement for each mode and their sum, as well as two unix-timestamps for the given hour.
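
The figure of 168 files follows from seven days times 24 hours. As a minimal sketch, the expected file names can be enumerated from the path pattern used in `rawcsvtodf` in the loading cell of this notebook; the variable name `paths` is only illustrative.

```python
# Path pattern taken from rawcsvtodf: days 11..17 of March, hours 0..23.
pattern = 'data/flows_with_mode/flows_munich_day_{}_month_3_hour_{}.csv'

paths = [pattern.format(day, hour)
         for day in range(11, 18)
         for hour in range(24)]

print(len(paths))   # one file per hour of the week
print(paths[0])
```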

##### Combining the Individual Files
In a first step, the separate csv-files are combined into one data-frame.
The flow-direction column is replaced by a column stating the flow-ID, i.e. a combination of the cell-IDs of start- and end-cell of the relation independent of the direction, and a column giving the direction of movement on that relation.
The mode column names are simplified.
A *shapely LineString* is constructed for each row, connecting the centroids of start- and end-cell of the given relation.
The unix timestamps are translated to the *pandas datetime* format and columns for the hour of the day, the weekday and a day-type are added.
Movements registered between midnight and 02:00 are assigned to the preceding weekday to better represent human behaviour.
Movement during those hours is typically part of the previous day's activities rather than those of the coming day, and public transport also shuts down for the night around that time.
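
The reassignment of the early hours can also be written column-wise. The following is a sketch with hypothetical timestamps, not the notebook's own implementation (which uses the row-wise `moveMidnight` function); the frame name `demo` is illustrative.

```python
import pandas as pd

# Hypothetical start timestamps in seconds since the epoch.
demo = pd.DataFrame({'start': [1552262400,    # Mon 2019-03-11 00:00
                               1552266000,    # Mon 2019-03-11 01:00
                               1552269600,    # Mon 2019-03-11 02:00
                               1552348800]})  # Tue 2019-03-12 00:00

demo['StartDateTime'] = pd.to_datetime(demo['start'], unit='s')
demo['Hour'] = demo['StartDateTime'].dt.hour
calendar_day = demo['StartDateTime'].dt.dayofweek  # Monday=0 ... Sunday=6

# Hours 0 and 1 move to the preceding weekday; modulo 7 wraps Monday back to Sunday.
demo['DayInt'] = (calendar_day - (demo['Hour'] < 2).astype(int)) % 7

print(demo[['StartDateTime', 'Hour', 'DayInt']])
```

The first two rows end up on Sunday (6), while 02:00 on Monday stays on Monday (0).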

##### Translating Geometries
In a next step, the *shapely* geometries are translated to lists of coordinates for use with the *folium* package, which allows for more interactivity in map plots than the *geopandas* package.
A *pandas DataFrame* containing all unique network cell polygons and centroids is extracted to be used as a contextual element in the map plots as well as for spatial aggregation of OD data. It is saved as **celldf.csv**.
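
The translation mainly amounts to swapping the coordinate order: *shapely* stores points as (x, y), i.e. (longitude, latitude), whereas *folium* expects [latitude, longitude]. A minimal sketch on plain tuples, with the hypothetical helper name `to_folium`:

```python
def to_folium(coords_xy):
    """Turn (x, y) = (lon, lat) pairs into folium-style [lat, lon] lists."""
    return [[y, x] for x, y in coords_xy]

# Centroid of cell 1 as it appears in the data, given as a single (lon, lat) pair.
centroid = [(11.51032914199828, 48.15396752884293)]
print(to_folium(centroid)[0])

# For a shapely geometry, list(line.coords) or list(polygon.exterior.coords)
# yields such (x, y) pairs and could be passed in the same way.
```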

##### Preparing for Flow Maps (Map Vis)
To prepare the flow data for plotting on a map, it is aggregated to unique flow-IDs for each hour, taking the sum of movements in both directions, since direction is not represented on the flow maps.
The resulting *pandas DataFrame* is extended by one additional column for each mode containing the mean number of movements on the respective relation throughout the dataset, and is saved as **flowdf.csv**.
These means are later used to normalize the line widths within the flow maps.
Values smaller than 5 are excluded from the mean to prevent extremely wide lines from popping up where a relation shows a small movement during a few hours but usually has close to no movement registered.
For example, consider a relation where about two movements are registered during each of 20 hours of the week and seven movements during one further hour. Without the exclusion, that hour would show up as a line with a relative width of $7 / ((7 + 20 \cdot 2) / 21) \approx 3.1$, suggesting about three times the mean amount, instead of $7 / (7 / 1) = 1$.
Since lines are only plotted if they represent at least five registered movements, excluding smaller movements from the calculation of the mean ensures that the plotted width actually represents the ratio to the width of all lines that would end up being plotted for single hours on this relation.
An even stronger pop-out effect would appear if the means were calculated over all 168 hours of the week, including those where no movement is registered.
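
The arithmetic of this example can be checked directly. The sketch below uses the hypothetical counts from the example (20 hours with two movements, one hour with seven) and compares the three ways of forming the mean.

```python
hourly_counts = [2] * 20 + [7]   # hours of the week with registered movement

# Mean over all hours with registered movement.
mean_registered = sum(hourly_counts) / len(hourly_counts)   # 47 / 21

# Mean over the hours that would actually be plotted (at least 5 movements).
plotted = [c for c in hourly_counts if c >= 5]
mean_plotted = sum(plotted) / len(plotted)                   # 7 / 1

# Mean over all 168 hours of the week, counting empty hours as zero.
mean_week = sum(hourly_counts) / 168                         # 47 / 168

width_registered = 7 / mean_registered   # about 3.1
width_plotted = 7 / mean_plotted         # 1.0
width_week = 7 / mean_week               # about 25

print(round(width_registered, 2), width_plotted, round(width_week, 2))
```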

##### Computing Anomalies
To visualize the movement anomalies of Wednesday with respect to the movement on Monday, Tuesday and Thursday, two separate *DataFrames* are extracted from **flowdf.csv**: one containing all movements from Monday, Tuesday and Thursday, and one containing all movements from Wednesday.
The first *DataFrame* is aggregated to unique flow-IDs and hours, computing the mean of the registered movements in the different modes.
Then, the two *DataFrames* are merged and extended by columns containing the quantitative anomalies, i.e. the difference in the number of movements on Wednesday and the corresponding mean number of movements on the other three days. The *DataFrame* is saved as **anomFlowdf.csv**.
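
A condensed sketch of this computation for a single mode, using hypothetical counts; the column names follow the notebook, while the frame names `flows`, `reference`, `wednesday` and `anom` are illustrative.

```python
import pandas as pd

# Hypothetical hourly totals for one relation.
flows = pd.DataFrame({
    'flowID': ['1_2'] * 6,
    'dayInt': [0, 0, 1, 3, 2, 2],   # Mon, Mon, Tue, Thu, Wed, Wed
    'hour':   [7, 8, 8, 8, 8, 9],
    'moves':  [6, 10, 12, 14, 15, 4],
})

# Reference: mean over Monday, Tuesday and Thursday per relation and hour.
# Only days with a registered row enter the mean (hour 7 has Monday only).
reference = (flows[flows.dayInt.isin([0, 1, 3])]
             .groupby(['flowID', 'hour'], as_index=False)['moves'].mean()
             .rename(columns={'moves': 'movesMDD'}))
wednesday = flows.loc[flows.dayInt == 2, ['flowID', 'hour', 'moves']]

# The outer merge keeps hours present on only one side; missing counts become 0.
anom = pd.merge(wednesday, reference, how='outer', on=['flowID', 'hour']).fillna(0)
anom['movesAnom'] = anom['moves'] - anom['movesMDD']

print(anom.sort_values('hour'))
```

Hour 7 has no Wednesday movement and yields a negative anomaly, hour 9 has no reference value and yields a positive one.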

##### Preparing for Temporal Overviews (Cycle Vis)
For the temporal overview, a *pandas DataFrame* is extracted similarly to *flowdf.csv*, but it retains the directional split and omits the geometry columns that are only needed for plotting on maps.
This *DataFrame* thus contains one row for each flow-ID, direction of flow and hour of the week. The *DataFrame* is saved as **cycledf.csv**.
Analogous to *anomFlowdf.csv*, a *DataFrame* for visualizing the temporal evolution of the anomalies on Wednesday is extracted from *cycledf.csv*, again retaining the directional component of the movements. The *DataFrame* is saved as **anomCycledf.csv**.
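
The difference between the flow-map and the cycle-vis aggregation lies only in the grouping keys. A small sketch with hypothetical directed counts (the frame names are illustrative, the column names follow the notebook):

```python
import pandas as pd

# Hypothetical directed counts within one hour.
directed = pd.DataFrame({
    'flowID':    ['1_2', '1_2', '1_3'],
    'flowIdDir': [1, -1, 1],
    'hour':      [8, 8, 8],
    'moves':     [5, 3, 2],
})

# Flow-map style: both directions of a relation collapse into one row.
per_relation = directed.groupby(['flowID', 'hour'], as_index=False)['moves'].sum()

# Cycle-vis style: the direction stays part of the key.
per_direction = directed.groupby(['flowID', 'flowIdDir', 'hour'],
                                 as_index=False)['moves'].sum()

print(per_relation)
print(per_direction)
```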

##### Versions of the used packages:
- pandas: 0.24.2
- numpy: 1.16.4
- shapely: 1.6.4.post1

In [1]:
import pandas as pd
import numpy as np
from shapely.wkt import loads
from shapely.geometry import Point, LineString, Polygon

## combine preprocessed files into one csv:

In [4]:
def rawcsvtodf(day, hour):
    data = pd.read_csv('data/flows_with_mode/flows_munich_day_'+str(day)+'_month_3_hour_'+str(hour)+'.csv',
                       delimiter=',',
                       skipinitialspace=True,
                       skiprows=0)
    df = pd.DataFrame(data)
    return df;

def moveMidnight(row):
    if (row.Hour < 2):
        if (row.DayInt == 0):
            return 6;
        else:
            return row.DayInt-1
    else:
        return row.DayInt;

def getWeekday(row):
    days = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday']
    return days[row.DayInt];

def getDayType(row):
    dayTypes = ['MonThu', 'MonThu', 'MonThu', 'MonThu', 'Fri', 'Weekend', 'Weekend']
    return dayTypes[row.DayInt];

def createTimeStamps(df):
    df['StartDateTime'] = pd.to_datetime(df.start, unit='s')
    # df['EndDateTime'] = pd.to_datetime(df.end, unit='s') ### not actually needed
    df['DayInt'] = df.StartDateTime.dt.dayofweek
    df['Hour'] = df.StartDateTime.dt.hour
    df['DayInt'] = df.apply(lambda row: moveMidnight(row), axis=1)
    df['Weekday'] = df.apply(lambda row: getWeekday(row), axis=1)
    df['DayType'] = df.apply(lambda row: getDayType(row), axis=1)
    return df;

def loadRawGeometries(df):
    # df['geometry_from'] = df['geometry_from'].apply(loads) ### not needed when only saving as csv again
    df['centroid_from'] = df['centroid_from'].apply(loads)
    # df['geometry_to'] = df['geometry_to'].apply(loads) ### not needed when only saving as csv again
    df['centroid_to'] = df['centroid_to'].apply(loads)
    return df;
    
def createFlowID(row):
    if (row.from_cell >= row.to_cell):
        return str(row.to_cell)+'_'+str(row.from_cell)
    else:
        return str(row.from_cell)+'_'+str(row.to_cell)

def createFlowDirection(row):
    if (row.from_cell >= row.to_cell):
        return -1
    else:
        return 1

def createFlowGeometry(df):
    df['flowID'] = df.apply(lambda row: createFlowID(row), axis=1)
    df['flowIdDir'] = df.apply(lambda row: createFlowDirection(row), axis=1)
    df['flowLine'] = [LineString(se) for se in zip(df.centroid_from, df.centroid_to)]
    return df;
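
The flow-ID and direction logic defined above can be summarised in one small function. This is an illustrative restatement with the hypothetical name `flow_key`, not a replacement for `createFlowID` and `createFlowDirection`.

```python
def flow_key(from_cell, to_cell):
    """Return (flowID, direction) for a directed pair of cell IDs.

    The ID always lists the smaller cell ID first; the direction is 1 when
    the movement runs from the smaller to the larger ID and -1 otherwise
    (including the case of equal IDs, as in the functions above).
    """
    low, high = min(from_cell, to_cell), max(from_cell, to_cell)
    direction = 1 if from_cell < to_cell else -1
    return '{}_{}'.format(low, high), direction

print(flow_key(1, 115))
print(flow_key(117, 115))
```

Both directions of a relation thus share one ID and differ only in the sign of the direction.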

In [5]:
# Read all 7 x 24 hourly files first and concatenate them once.
frames = [rawcsvtodf(day, hour) for day in range(11, 18) for hour in range(0, 24)]
modeflowsdf = pd.concat(frames, ignore_index=True, sort=False)

modeflowsdf = createTimeStamps(modeflowsdf)
modeflowsdf = loadRawGeometries(modeflowsdf)
modeflowsdf = createFlowGeometry(modeflowsdf)

In [6]:
modeflowsdf = modeflowsdf.reindex(columns = ['flowID', 'flowIdDir', 'from_cell', 'to_cell', 'flow_direction',
                                             'moves', 'privat', 'Rail', 'Mode::Subway', 'Mode::Tram', 'Mode::Bus',
                                             'geometry_from', 'centroid_from', 'geometry_to', 'centroid_to', 'flowLine',
                                             'start', 'end', 'StartDateTime', 'EndDateTime',
                                             'Weekday', 'DayType', 'DayInt', 'Hour'])

In [7]:
modeflowsdf.columns = ['flowID', 'flowIdDir', 'from_cell', 'to_cell', 'flow_direction',
                                             'moves', 'privat', 'Rail', 'UBahn', 'Tram', 'Bus',
                                             'geometry_from', 'centroid_from', 'geometry_to', 'centroid_to', 'flowLine',
                                             'start', 'end', 'StartDateTime', 'EndDateTime',
                                             'Weekday', 'DayType', 'DayInt', 'Hour']

In [8]:
modeflowsdf['public'] = modeflowsdf.moves - modeflowsdf.privat

In [9]:
modeflowsdf = modeflowsdf.reindex(columns = ['flowID', 'flowIdDir', 'from_cell', 'to_cell', 'flow_direction',
                                             'moves', 'privat', 'public', 'Rail', 'UBahn', 'Tram', 'Bus',
                                             'geometry_from', 'centroid_from', 'geometry_to', 'centroid_to', 'flowLine',
                                             'start', 'end', 'StartDateTime', 'EndDateTime',
                                             'Weekday', 'DayType', 'DayInt', 'Hour'])

In [16]:
modeflowsdf.head()

Unnamed: 0,flowID,flowIdDir,from_cell,to_cell,flow_direction,moves,privat,public,Rail,UBahn,...,centroid_to,flowLine,start,end,StartDateTime,EndDateTime,Weekday,DayType,DayInt,Hour
0,1_115,1,1,115,1->115,1,1.0,0.0,0.0,0.0,...,POINT (11.45457189209293 48.16887006228107),LINESTRING (11.51032914199828 48.1539675288429...,1552262000.0,1552266000.0,2019-03-11,,Sunday,Weekend,6,0
1,112_115,1,112,115,112->115,2,1.0,1.0,1.0,0.0,...,POINT (11.45457189209293 48.16887006228107),LINESTRING (11.44801357032048 48.2222738631697...,1552262000.0,1552266000.0,2019-03-11,,Sunday,Weekend,6,0
2,115_117,-1,117,115,117->115,1,1.0,0.0,0.0,0.0,...,POINT (11.45457189209293 48.16887006228107),LINESTRING (11.44652284694718 48.1947333578227...,1552262000.0,1552266000.0,2019-03-11,,Sunday,Weekend,6,0
3,115_121,-1,121,115,121->115,1,1.0,0.0,0.0,0.0,...,POINT (11.45457189209293 48.16887006228107),LINESTRING (11.36797558911058 48.2099865841284...,1552262000.0,1552266000.0,2019-03-11,,Sunday,Weekend,6,0
4,115_144,-1,144,115,144->115,1,0.0,1.0,1.0,0.0,...,POINT (11.45457189209293 48.16887006228107),LINESTRING (11.42846150643887 48.1404997116586...,1552262000.0,1552266000.0,2019-03-11,,Sunday,Weekend,6,0


In [17]:
modeflowsdf.to_csv('data/flows_with_mode/all.csv', index=False, sep=';')

## load all.csv from above

In [2]:
def csvtodf_SC(path):
    data = pd.read_csv('data/'+path+'.csv',
                       delimiter=';',
                       skipinitialspace=True,
                       skiprows=0)
    df = pd.DataFrame(data)
    return df;

def csvtodf_C(path):
    data = pd.read_csv('data/'+path+'.csv',
                       delimiter=',',
                       skipinitialspace=True,
                       skiprows=0)
    df = pd.DataFrame(data)
    return df;

def loadGeometries(df):
    df['geometry_from'] = df['geometry_from'].apply(loads)
    df['centroid_from'] = df['centroid_from'].apply(loads)
    df['geometry_to'] = df['geometry_to'].apply(loads)
    df['centroid_to'] = df['centroid_to'].apply(loads)
    df['flowLine'] = df['flowLine'].apply(loads)
    return df;

In [4]:
modeflowsdf = csvtodf_SC('flows_with_mode/all')
modeflowsdf = loadGeometries(modeflowsdf)

In [5]:
modeflowsdf = modeflowsdf.reindex(columns = ['flowID', 'flowIdDir', 'from_cell', 'to_cell', 'flow_direction',
                                             'moves', 'privat', 'public', 'Rail', 'UBahn', 'Tram', 'Bus',
                                             'geometry_from', 'centroid_from', 'geometry_to', 'centroid_to', 'flowLine',
                                             'StartDateTime', 'DayInt', 'Weekday', 'DayType', 'Hour'])

In [9]:
modeflowsdf.columns = ['flowID', 'flowIdDir', 'from_cell', 'to_cell', 'flow_direction',
                                             'moves', 'privat', 'public', 'Rail', 'UBahn', 'Tram', 'Bus',
                                             'geometry_from', 'centroid_from', 'geometry_to', 'centroid_to', 'flowLine',
                                             'dateTime', 'dayInt', 'weekday', 'dayType', 'hour']

In [10]:
modeflowsdf.head()

Unnamed: 0,flowID,flowIdDir,from_cell,to_cell,flow_direction,moves,privat,public,Rail,UBahn,...,geometry_from,centroid_from,geometry_to,centroid_to,flowLine,dateTime,dayInt,weekday,dayType,hour
0,1_115,1,1,115,1->115,1,1.0,0.0,0.0,0.0,...,"POLYGON ((11.510794 48.14605999999999, 11.5142...",POINT (11.51032914199828 48.15396752884293),"POLYGON ((11.4430895 48.16533999999999, 11.443...",POINT (11.45457189209293 48.16887006228107),LINESTRING (11.51032914199828 48.1539675288429...,2019-03-11 00:00:00,6,Sunday,Weekend,0
1,112_115,1,112,115,112->115,2,1.0,1.0,1.0,0.0,...,"POLYGON ((11.434296 48.20744999999999, 11.4357...",POINT (11.44801357032048 48.22227386316977),"POLYGON ((11.4430895 48.16533999999999, 11.443...",POINT (11.45457189209293 48.16887006228107),LINESTRING (11.44801357032048 48.2222738631697...,2019-03-11 00:00:00,6,Sunday,Weekend,0
2,115_117,-1,117,115,117->115,1,1.0,0.0,0.0,0.0,...,"POLYGON ((11.43082 48.204967, 11.429731 48.201...",POINT (11.44652284694718 48.19473335782276),"POLYGON ((11.4430895 48.16533999999999, 11.443...",POINT (11.45457189209293 48.16887006228107),LINESTRING (11.44652284694718 48.1947333578227...,2019-03-11 00:00:00,6,Sunday,Weekend,0
3,115_121,-1,121,115,121->115,1,1.0,0.0,0.0,0.0,...,"POLYGON ((11.386282 48.225548, 11.385904 48.20...",POINT (11.36797558911058 48.20998658412849),"POLYGON ((11.4430895 48.16533999999999, 11.443...",POINT (11.45457189209293 48.16887006228107),LINESTRING (11.36797558911058 48.2099865841284...,2019-03-11 00:00:00,6,Sunday,Weekend,0
4,115_144,-1,144,115,144->115,1,0.0,1.0,1.0,0.0,...,"POLYGON ((11.438359 48.14332599999999, 11.4318...",POINT (11.42846150643887 48.14049971165864),"POLYGON ((11.4430895 48.16533999999999, 11.443...",POINT (11.45457189209293 48.16887006228107),LINESTRING (11.42846150643887 48.1404997116586...,2019-03-11 00:00:00,6,Sunday,Weekend,0


## prepare for folium map

In [11]:
def getCentroidCoords(row):
    return [row.centroid.y,row.centroid.x];

def getFlowCoordsList(row):
    xs = row.flowLine.coords.xy[0].tolist()
    ys = row.flowLine.coords.xy[1].tolist()
    coords = [0,0]
    coords[0] = [ys[0],xs[0]]
    coords[1] = [ys[1],xs[1]]
    return coords;
    # return [[row.centroid_from.y,row.centroid_from.x],[row.centroid_to.y,row.centroid_to.x]];

def getPolygonCoordsList(row):
    xs = row.geometry.exterior.coords.xy[0].tolist()
    ys = row.geometry.exterior.coords.xy[1].tolist()
    coords = [0]*len(xs)
    for i in range(0,len(xs)):
        coords[i] = [ys[i],xs[i]]
    return coords;

## create df for cell polygons
A DataFrame containing all cell IDs with their polygons and centroids as lists of coordinates for folium, as well as shapely polygons for spatial joins with OD data.

In [23]:
celldf = modeflowsdf.groupby(['from_cell']).agg({'geometry_from':['first'],
                                                 'centroid_from':['first']}).copy().reset_index()
celldf.columns = celldf.columns.get_level_values(0)
celldf.columns = ['cellID', 'geometry', 'centroid']

celldf['centroidCoords'] = celldf.apply(lambda row: getCentroidCoords(row), axis = 1)
celldf['polyCoords'] = celldf.apply(lambda row: getPolygonCoordsList(row), axis = 1)

celldf = celldf.reindex(columns = ['cellID', 'polyCoords', 'centroidCoords', 'geometry'])

In [25]:
celldf.head()

Unnamed: 0,cellID,polyCoords,centroidCoords,geometry
0,1,"[[48.14605999999999, 11.510794], [48.146254999...","[48.15396752884293, 11.51032914199828]","POLYGON ((11.510794 48.14605999999999, 11.5142..."
1,2,"[[48.23591199999999, 11.635515], [48.237070000...","[48.22263647015427, 11.62774480807693]","POLYGON ((11.635515 48.23591199999999, 11.6338..."
2,3,"[[48.14900599999999, 11.696217], [48.153606000...","[48.13564057367491, 11.70093894926043]","POLYGON ((11.696217 48.14900599999999, 11.6991..."
3,4,"[[48.12879600000001, 11.541556], [48.125282, 1...","[48.13194111905815, 11.54797042679206]","POLYGON ((11.541556 48.12879600000001, 11.5516..."
4,5,"[[48.18833000000001, 11.615821], [48.193665, 1...","[48.19669108307331, 11.6113147037113]","POLYGON ((11.615821 48.18833000000001, 11.6039..."


In [26]:
celldf.to_csv('data/flows_with_mode/aggregations/celldf.csv', index=False, sep=';')

## df for flow graph
all flows by flow ID, aggregated for each hour of each day over both directions; further aggregation to happen within mapping functions

In [19]:
flowdf = modeflowsdf.groupby(['flowID', 'dayInt', 'weekday', 'dayType', 'hour']).agg({'flowLine':['first'],
                                                                             'moves':['sum'],
                                                                             'privat':['sum'],
                                                                             'public':['sum'],
                                                                             'Rail':['sum'],
                                                                             'UBahn':['sum'],
                                                                             'Tram':['sum'],
                                                                             'Bus':['sum']}).copy().reset_index()
flowdf.columns = flowdf.columns.get_level_values(0)

In [20]:
flowdf['flowCoords'] = flowdf.apply(lambda row: getFlowCoordsList(row), axis = 1)

In [27]:
flowdf = flowdf.reindex(columns = ['flowID', 'dayInt', 'weekday', 'dayType', 'hour',
                                   'moves', 'privat', 'public', 'Rail', 'UBahn', 'Tram', 'Bus',
                                   'flowCoords'])

In [28]:
flowdf.head()

Unnamed: 0,flowID,dayInt,weekday,dayType,hour,moves,privat,public,Rail,UBahn,Tram,Bus,flowCoords
0,100_101,0,Monday,MonThu,4,1,1.0,0.0,0.0,0.0,0.0,0.0,"[[48.39922017377597, 11.75558830832081], [48.2..."
1,100_101,0,Monday,MonThu,5,1,1.0,0.0,0.0,0.0,0.0,0.0,"[[48.22013108151531, 11.52767333862167], [48.3..."
2,100_101,0,Monday,MonThu,15,1,1.0,0.0,0.0,0.0,0.0,0.0,"[[48.39922017377597, 11.75558830832081], [48.2..."
3,100_101,1,Tuesday,MonThu,4,1,1.0,0.0,0.0,0.0,0.0,0.0,"[[48.39922017377597, 11.75558830832081], [48.2..."
4,100_101,1,Tuesday,MonThu,5,1,1.0,0.0,0.0,0.0,0.0,0.0,"[[48.39922017377597, 11.75558830832081], [48.2..."


In [29]:
flowdf.to_csv('data/flows_with_mode/aggregations/flowdf.csv', index=False, sep=';')

## extend flowdf
with a per-mode mean over all hourly values of at least 5 for each flowID, to be used when mapping
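
As a compact sketch of such a thresholded mean, assuming hypothetical data and a single mode column: values below 5 are masked with `where`, and `mean` then skips the resulting NaN entries. The cells below achieve the masking with `replace` instead.

```python
import pandas as pd

demo = pd.DataFrame({'flowID': ['a', 'a', 'a', 'b', 'b'],
                     'moves':  [2, 7, 9, 1, 3]})

# where() keeps values that meet the condition and sets the rest to NaN.
masked = demo['moves'].where(demo['moves'] >= 5)

# Relations without any value of at least 5 end up with a NaN mean.
mean_moves = masked.groupby(demo['flowID']).mean()

print(mean_moves)
```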

In [175]:
flowdf = csvtodf_SC('flows_with_mode/aggregations/flowdf')

In [176]:
flowdf.head()

Unnamed: 0,flowID,dayInt,weekday,dayType,hour,moves,privat,public,Rail,UBahn,Tram,Bus,flowCoords
0,100_101,0,Monday,MonThu,4,1,1.0,0.0,0.0,0.0,0.0,0.0,"[[48.39922017377597, 11.75558830832081], [48.2..."
1,100_101,0,Monday,MonThu,5,1,1.0,0.0,0.0,0.0,0.0,0.0,"[[48.22013108151531, 11.52767333862167], [48.3..."
2,100_101,0,Monday,MonThu,15,1,1.0,0.0,0.0,0.0,0.0,0.0,"[[48.39922017377597, 11.75558830832081], [48.2..."
3,100_101,1,Tuesday,MonThu,4,1,1.0,0.0,0.0,0.0,0.0,0.0,"[[48.39922017377597, 11.75558830832081], [48.2..."
4,100_101,1,Tuesday,MonThu,5,1,1.0,0.0,0.0,0.0,0.0,0.0,"[[48.39922017377597, 11.75558830832081], [48.2..."


In [177]:
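# Replace counts 0-4 with NaN so the means below skip them (dayInt and hour are affected too, but are not used here).
bigger5 = flowdf.replace(range(0,5),np.nan)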

In [178]:
tempmeandf = bigger5.groupby(['flowID']).agg({'moves':['mean'],
                                              'privat':['mean'],
                                              'public':['mean'],
                                              'Rail':['mean'],
                                              'UBahn':['mean'],
                                              'Tram':['mean'],
                                              'Bus':['mean']}).copy().reset_index()
tempmeandf.columns = tempmeandf.columns.get_level_values(1) + tempmeandf.columns.get_level_values(0)

In [179]:
tempmeandf.head()

Unnamed: 0,flowID,meanmoves,meanprivat,meanpublic,meanRail,meanUBahn,meanTram,meanBus
0,100_101,,,,,,,
1,100_102,,,,,,,
2,100_103,5.8,5.5,,,,,
3,100_104,40.185185,39.607407,5.5,6.0,,,
4,100_105,46.271523,38.946667,10.625,10.064516,,,


In [180]:
flowdf = pd.merge(flowdf, tempmeandf, on='flowID', how='outer')

In [181]:
flowdf.head()

Unnamed: 0,flowID,dayInt,weekday,dayType,hour,moves,privat,public,Rail,UBahn,Tram,Bus,flowCoords,meanmoves,meanprivat,meanpublic,meanRail,meanUBahn,meanTram,meanBus
0,100_101,0,Monday,MonThu,4,1,1.0,0.0,0.0,0.0,0.0,0.0,"[[48.39922017377597, 11.75558830832081], [48.2...",,,,,,,
1,100_101,0,Monday,MonThu,5,1,1.0,0.0,0.0,0.0,0.0,0.0,"[[48.22013108151531, 11.52767333862167], [48.3...",,,,,,,
2,100_101,0,Monday,MonThu,15,1,1.0,0.0,0.0,0.0,0.0,0.0,"[[48.39922017377597, 11.75558830832081], [48.2...",,,,,,,
3,100_101,1,Tuesday,MonThu,4,1,1.0,0.0,0.0,0.0,0.0,0.0,"[[48.39922017377597, 11.75558830832081], [48.2...",,,,,,,
4,100_101,1,Tuesday,MonThu,5,1,1.0,0.0,0.0,0.0,0.0,0.0,"[[48.39922017377597, 11.75558830832081], [48.2...",,,,,,,


In [182]:
flowdf.to_csv('data/flows_with_mode/aggregations/flowdf.csv', index=False, sep=';')

## df for mapping wednesday anomalies
Wednesday moves, means of Monday+Tuesday+Thursday, Wednesday anomalies by flowID and hour

In [35]:
flowdf.head()

Unnamed: 0,flowID,dayInt,weekday,dayType,hour,moves,privat,public,Rail,UBahn,Tram,Bus,flowCoords
0,100_101,0,Monday,MonThu,4,1,1.0,0.0,0.0,0.0,0.0,0.0,"[[48.39922017377597, 11.75558830832081], [48.2..."
1,100_101,0,Monday,MonThu,5,1,1.0,0.0,0.0,0.0,0.0,0.0,"[[48.22013108151531, 11.52767333862167], [48.3..."
2,100_101,0,Monday,MonThu,15,1,1.0,0.0,0.0,0.0,0.0,0.0,"[[48.39922017377597, 11.75558830832081], [48.2..."
3,100_101,1,Tuesday,MonThu,4,1,1.0,0.0,0.0,0.0,0.0,0.0,"[[48.39922017377597, 11.75558830832081], [48.2..."
4,100_101,1,Tuesday,MonThu,5,1,1.0,0.0,0.0,0.0,0.0,0.0,"[[48.39922017377597, 11.75558830832081], [48.2..."


In [84]:
modidoFlowdf = flowdf[(flowdf.dayInt.isin([0,1,3]))].copy()

In [85]:
modidoFlowdf = modidoFlowdf.groupby(['flowID', 'hour']).agg({'flowCoords':['first'],
                                                             'moves':['mean'],
                                                             'privat':['mean'],
                                                             'public':['mean'],
                                                             'Rail':['mean'],
                                                             'UBahn':['mean'],
                                                             'Tram':['mean'],
                                                             'Bus':['mean']}).reset_index()
modidoFlowdf.columns = modidoFlowdf.columns.get_level_values(0)
modidoFlowdf.columns = ['flowID', 'hour', 'flowCoords', 'movesMDD', 'privatMDD', 'publicMDD', 'RailMDD', 'UBahnMDD', 'TramMDD', 'BusMDD']

In [86]:
modidoFlowdf.head()

Unnamed: 0,flowID,hour,flowCoords,movesMDD,privatMDD,publicMDD,RailMDD,UBahnMDD,TramMDD,BusMDD
0,100_101,0,"[[48.39922017377597, 11.75558830832081], [48.2...",1.0,1.0,0.0,0.0,0.0,0.0,0.0
1,100_101,4,"[[48.39922017377597, 11.75558830832081], [48.2...",1.0,1.0,0.0,0.0,0.0,0.0,0.0
2,100_101,5,"[[48.22013108151531, 11.52767333862167], [48.3...",1.0,1.0,0.0,0.0,0.0,0.0,0.0
3,100_101,6,"[[48.39922017377597, 11.75558830832081], [48.2...",2.0,2.0,0.0,0.0,0.0,0.0,0.0
4,100_101,13,"[[48.39922017377597, 11.75558830832081], [48.2...",1.0,1.0,0.0,0.0,0.0,0.0,0.0


In [87]:
miFlowdf = flowdf[(flowdf.dayInt == 2)].copy()

In [88]:
miFlowdf.head()

Unnamed: 0,flowID,dayInt,weekday,dayType,hour,moves,privat,public,Rail,UBahn,Tram,Bus,flowCoords
7,100_101,2,Wednesday,MonThu,18,1,1.0,0.0,0.0,0.0,0.0,0.0,"[[48.22013108151531, 11.52767333862167], [48.3..."
8,100_101,2,Wednesday,MonThu,19,1,0.0,1.0,1.0,0.0,0.0,0.0,"[[48.22013108151531, 11.52767333862167], [48.3..."
45,100_102,2,Wednesday,MonThu,2,1,1.0,0.0,0.0,0.0,0.0,0.0,"[[48.22013108151531, 11.52767333862167], [48.1..."
46,100_102,2,Wednesday,MonThu,4,1,1.0,0.0,0.0,0.0,0.0,0.0,"[[48.22013108151531, 11.52767333862167], [48.1..."
47,100_102,2,Wednesday,MonThu,7,1,1.0,0.0,0.0,0.0,0.0,0.0,"[[48.22013108151531, 11.52767333862167], [48.1..."


In [139]:
anomFlowdf = pd.merge(miFlowdf, modidoFlowdf,  how='outer', on=['flowID', 'hour'])
anomFlowdf['flowCoords'] = anomFlowdf.flowCoords_x.fillna(anomFlowdf.flowCoords_y)
anomFlowdf = anomFlowdf.reindex(columns = ['flowID', 'flowCoords', 'hour',
                                           'moves', 'privat', 'public', 'Rail', 'UBahn', 'Tram', 'Bus',
                                           'movesMDD', 'privatMDD', 'publicMDD', 'RailMDD', 'UBahnMDD', 'TramMDD', 'BusMDD'])
anomFlowdf = anomFlowdf.fillna(0)

In [145]:
anomFlowdf['movesAnom'] = anomFlowdf.moves - anomFlowdf.movesMDD
anomFlowdf['privatAnom'] = anomFlowdf.privat - anomFlowdf.privatMDD
anomFlowdf['publicAnom'] = anomFlowdf.public - anomFlowdf.publicMDD
anomFlowdf['RailAnom'] = anomFlowdf.Rail - anomFlowdf.RailMDD
anomFlowdf['UBahnAnom'] = anomFlowdf.UBahn - anomFlowdf.UBahnMDD
anomFlowdf['TramAnom'] = anomFlowdf.Tram - anomFlowdf.TramMDD
anomFlowdf['BusAnom'] = anomFlowdf.Bus - anomFlowdf.BusMDD

In [146]:
anomFlowdf.head()

Unnamed: 0,flowID,flowCoords,hour,moves,privat,public,Rail,UBahn,Tram,Bus,...,UBahnMDD,TramMDD,BusMDD,movesAnom,privatAnom,publicAnom,RailAnom,UBahnAnom,TramAnom,BusAnom
0,100_101,"[[48.22013108151531, 11.52767333862167], [48.3...",18,1.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0
1,100_101,"[[48.22013108151531, 11.52767333862167], [48.3...",19,1.0,0.0,1.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0
2,100_102,"[[48.22013108151531, 11.52767333862167], [48.1...",2,1.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,100_102,"[[48.22013108151531, 11.52767333862167], [48.1...",4,1.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,100_102,"[[48.22013108151531, 11.52767333862167], [48.1...",7,1.0,1.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,-1.0,-1.0,0.0,0.0,0.0,0.0,0.0


In [147]:
anomFlowdf.to_csv('data/flows_with_mode/aggregations/anomFlowdf.csv', index=False, sep=';')

## df for cycle vis
all flows by flow ID and direction aggregated over each hour of each day, further aggregation to happen within mapping functions

In [30]:
cycledf = modeflowsdf.groupby(['flowID', 'flowIdDir', 'from_cell', 'to_cell',
                               'dayInt', 'weekday', 'dayType', 'hour']).agg({'moves':['sum'],
                                                                             'privat':['sum'],
                                                                             'public':['sum'],
                                                                             'Rail':['sum'],
                                                                             'UBahn':['sum'],
                                                                             'Tram':['sum'],
                                                                             'Bus':['sum']}).copy().reset_index()
cycledf.columns = cycledf.columns.get_level_values(0)

In [32]:
cycledf.head()

Unnamed: 0,flowID,flowIdDir,from_cell,to_cell,dayInt,weekday,dayType,hour,moves,privat,public,Rail,UBahn,Tram,Bus
0,100_101,-1,101,100,0,Monday,MonThu,4,1,1.0,0.0,0.0,0.0,0.0,0.0
1,100_101,-1,101,100,0,Monday,MonThu,15,1,1.0,0.0,0.0,0.0,0.0,0.0
2,100_101,-1,101,100,1,Tuesday,MonThu,4,1,1.0,0.0,0.0,0.0,0.0,0.0
3,100_101,-1,101,100,1,Tuesday,MonThu,5,1,1.0,0.0,0.0,0.0,0.0,0.0
4,100_101,-1,101,100,1,Tuesday,MonThu,6,1,1.0,0.0,0.0,0.0,0.0,0.0


In [33]:
cycledf.to_csv('data/flows_with_mode/aggregations/cycledf.csv', index=False, sep=';')

## df for cyclevis of wednesday anomalies
similar to anomFlowdf but for each direction and with cellIDs 

In [151]:
cycledf.head()

Unnamed: 0,flowID,flowIdDir,from_cell,to_cell,dayInt,weekday,dayType,hour,moves,privat,public,Rail,UBahn,Tram,Bus
0,100_101,-1,101,100,0,Monday,MonThu,4,1,1.0,0.0,0.0,0.0,0.0,0.0
1,100_101,-1,101,100,0,Monday,MonThu,15,1,1.0,0.0,0.0,0.0,0.0,0.0
2,100_101,-1,101,100,1,Tuesday,MonThu,4,1,1.0,0.0,0.0,0.0,0.0,0.0
3,100_101,-1,101,100,1,Tuesday,MonThu,5,1,1.0,0.0,0.0,0.0,0.0,0.0
4,100_101,-1,101,100,1,Tuesday,MonThu,6,1,1.0,0.0,0.0,0.0,0.0,0.0


In [166]:
mddcycle = cycledf[cycledf.dayInt.isin([0,1,3])].copy()

In [167]:
mddcycle = mddcycle.groupby(['flowID', 'flowIdDir', 'hour',
                             'from_cell', 'to_cell']).agg({'moves':['mean'],
                                                           'privat':['mean'],
                                                            'public':['mean'],
                                                           'Rail':['mean'],
                                                           'UBahn':['mean'],
                                                           'Tram':['mean'],
                                                           'Bus':['mean']}).reset_index()
mddcycle.columns = mddcycle.columns.get_level_values(0)
mddcycle.columns = ['flowID', 'flowIdDir', 'hour', 'from_cell', 'to_cell',
                    'movesMDD', 'privatMDD', 'publicMDD', 'RailMDD', 'UBahnMDD', 'TramMDD', 'BusMDD']

In [175]:
mddcycle.head()

Unnamed: 0,flowID,flowIdDir,hour,from_cell,to_cell,movesMDD,privatMDD,publicMDD,RailMDD,UBahnMDD,TramMDD,BusMDD
0,100_101,-1,0,101,100,1.0,1.0,0.0,0.0,0.0,0.0,0.0
1,100_101,-1,4,101,100,1.0,1.0,0.0,0.0,0.0,0.0,0.0
2,100_101,-1,5,101,100,1.0,1.0,0.0,0.0,0.0,0.0,0.0
3,100_101,-1,6,101,100,2.0,2.0,0.0,0.0,0.0,0.0,0.0
4,100_101,-1,13,101,100,1.0,1.0,0.0,0.0,0.0,0.0,0.0


In [171]:
wcycle = cycledf[cycledf.dayInt == 2].copy()

In [181]:
anomCycledf = pd.merge(wcycle, mddcycle,  how='outer', on=['flowID', 'flowIdDir', 'from_cell', 'to_cell', 'hour'])
anomCycledf = anomCycledf.reindex(columns = ['flowID', 'flowIdDir', 'from_cell', 'to_cell', 'hour',
                                             'moves', 'privat', 'public', 'Rail', 'UBahn', 'Tram', 'Bus',
                                             'movesMDD', 'privatMDD', 'publicMDD', 'RailMDD', 'UBahnMDD', 'TramMDD', 'BusMDD'])
anomCycledf = anomCycledf.fillna(0)

In [195]:
anomCycledf['movesAnom'] = anomCycledf.moves - anomCycledf.movesMDD
anomCycledf['privatAnom'] = anomCycledf.privat - anomCycledf.privatMDD
anomCycledf['publicAnom'] = anomCycledf.public - anomCycledf.publicMDD
anomCycledf['RailAnom'] = anomCycledf.Rail - anomCycledf.RailMDD
anomCycledf['UBahnAnom'] = anomCycledf.UBahn - anomCycledf.UBahnMDD
anomCycledf['TramAnom'] = anomCycledf.Tram - anomCycledf.TramMDD
anomCycledf['BusAnom'] = anomCycledf.Bus - anomCycledf.BusMDD

In [196]:
anomCycledf.head()

Unnamed: 0,flowID,flowIdDir,from_cell,to_cell,hour,moves,privat,public,Rail,UBahn,...,UBahnMDD,TramMDD,BusMDD,movesAnom,privatAnom,publicAnom,RailAnom,UBahnAnom,TramAnom,BusAnom
0,100_101,1,100,101,18,1.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0
1,100_101,1,100,101,19,1.0,0.0,1.0,1.0,0.0,...,0.0,0.0,0.0,1.0,0.0,1.0,1.0,0.0,0.0,0.0
2,100_102,-1,102,100,10,1.0,0.0,1.0,1.0,0.0,...,0.0,0.0,0.0,-0.5,-1.5,1.0,1.0,0.0,0.0,0.0
3,100_102,-1,102,100,11,2.0,2.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0
4,100_102,-1,102,100,14,1.0,1.0,0.0,0.0,0.0,...,0.0,0.0,0.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0


In [197]:
anomCycledf.to_csv('data/flows_with_mode/aggregations/anomCycledf.csv', index=False, sep=';')