
A visualization module based on networkx and pandas. Check the README for documentation.


hesihui/AirportsDelayModel


GraphFlow

A visualization module based on networkx and pandas.

Getting Started

Prerequisites

  • networkx
  • pandas
  • matplotlib
  • statsmodels
  • IPython
  • pytz
  • numpy
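Before importing graphflow, it can help to confirm that the prerequisites are importable. The following is a generic standard-library sketch, not part of the project; the helper name missing_modules is hypothetical.

```python
import importlib.util

# Import names of the prerequisites listed above (numpy is used by the examples).
REQUIRED_MODULES = ["networkx", "pandas", "matplotlib", "statsmodels", "IPython", "pytz", "numpy"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing prerequisites:", ", ".join(missing))
    else:
        print("All prerequisites are installed.")
```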

Installing

Download the newest release here into the root directory.

Running the tests

Generate the top-level GraphFlow object gf:

from graphflow import GraphFlow
from datetime import timedelta
import pandas as pd
import numpy as np

dt = timedelta(seconds=3600)  # time step of the grid: one hour (1D or 1H are supported so far)
gf = GraphFlow.import_GF(dt)
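dt fixes the step of the time grid. The sketch below does not call GraphFlow; it only illustrates, with plain pandas, that a datetime.timedelta of 3600 seconds equals a one-hour pd.Timedelta and what a grid with frequency dt looks like. The start date is an arbitrary assumption.

```python
from datetime import timedelta

import pandas as pd

# Same step as in the example above.
dt = timedelta(seconds=3600)

# pandas treats the datetime.timedelta and the equivalent pd.Timedelta as equal.
print(pd.Timedelta(dt) == pd.Timedelta("1h"))

# Hypothetical grid of four consecutive one-hour bins starting at an arbitrary date.
grid = pd.date_range("2007-02-01 00:00:00", periods=4, freq=dt)
print(grid)
```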

Describe a GraphFlow object:

gf.describe()


Visualize a GraphFlow object:

gf.draw_network_attr(with_pos=True)

Given nodes and a time range, generate a new GraphFlow object:

sub_nodes = ['HNL', 'LAS', 'LAX', 'OGG']
sub_gf = gf.sub_graph_flow(start_time='2007-02-01 01:00:00', end_time='2007-03-01 01:00:00', sub_nodes=sub_nodes)


Given edges and a time range, generate a new GraphFlow object:

sub_gf = gf.sub_graph_flow('2007-02-01 01:00:00', '2007-03-01 01:00:00',
                           edges=set(gf.G.out_edges('LAX')) | set(gf.G.in_edges('LAX')))
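The edges argument in this example is the set of all routes touching LAX: outgoing edges unioned with incoming edges. A toy networkx graph, standing in for gf.G with made-up routes, shows what that set contains.

```python
import networkx as nx

# Toy directed graph standing in for gf.G: airports as nodes, routes as edges.
G = nx.DiGraph()
G.add_edges_from([("LAX", "HNL"), ("HNL", "LAX"), ("LAS", "LAX"), ("OGG", "HNL")])

# Edges leaving LAX plus edges arriving at LAX, as (origin, destination) pairs.
lax_edges = set(G.out_edges("LAX")) | set(G.in_edges("LAX"))
print(sorted(lax_edges))
```

The route OGG to HNL does not touch LAX, so it is not in the set.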

If neither edges nor nodes are provided, a time range alone also generates a new GraphFlow object:

sub_gf = gf.sub_graph_flow('2007-02-01 01:00:00', '2007-03-01 01:00:00')


Generate a test set for models to predict, and score the predictions with the provided evaluation function:

from graphflow import test_index_gen,model_evaluation
test_date_index,test_airport_index = test_index_gen()

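# sample input: random predictions of shape (number of test times, number of test airports)
pred_data = np.random.random((len(test_date_index), len(test_airport_index)))

# evaluate the prediction; the printed values vary with the random input
model_evaluation(pred_data, test_date_index, test_airport_index)
mae metric:  0.46644295778469996
rmse metric:  0.5473421547099921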

Documentation

This module needs the following files to initialize:

  • graphflow.py
  • pre_data.csv
  • ArrTotalFlights.csv
  • ArrDelayFlights.csv
  • DepTotalFlights.csv
  • DepDelayFlights.csv
  • DelayRatio.csv
  • airport2idx.csv
  • time_stamp2idx.csv
  • graph_edges.csv
  • graph_nodes.csv
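A quick check that all of these files are in place can save a confusing error at initialization. This is a sketch that assumes the files sit directly in the root directory named in Installing; missing_files is a hypothetical helper, not part of graphflow.

```python
from pathlib import Path

# Files listed above that the module needs to initialize.
REQUIRED_FILES = [
    "graphflow.py", "pre_data.csv", "ArrTotalFlights.csv", "ArrDelayFlights.csv",
    "DepTotalFlights.csv", "DepDelayFlights.csv", "DelayRatio.csv",
    "airport2idx.csv", "time_stamp2idx.csv", "graph_edges.csv", "graph_nodes.csv",
]


def missing_files(root="."):
    """Return the required files that are not present as regular files in root."""
    root_dir = Path(root)
    return [name for name in REQUIRED_FILES if not (root_dir / name).is_file()]


if __name__ == "__main__":
    print(missing_files() or "All required files found.")
```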

graphflow.GraphFlow

class graphflow.GraphFlow(idx2airport, airport2idx, idx2time_stamp, time_stamp2idx, ArrTotalFlights, DepTotalFlights, ArrDelayFlights, DepDelayFlights, pre_data, G, dt, grid=None, start_time=None, end_time=None, DelayRatio=None)

Attributes

| Attribute | Type | Description |
|---|---|---|
| dt | pd.Timedelta | Time step of the grid; only 1D (one day) and 1H (one hour) are supported so far |
| G | nx.DiGraph | Flight network. Each edge has a weight attribute (total flights within the time range of grid) and a Distance attribute (distance between the two airports). Each node has a pos attribute (the airport's real position) and a weight attribute (its time zone, as a string) |
| pre_data | pd.DataFrame | Pre-cleaned raw data covering the time range of grid |
| grid | pd.DatetimeIndex | Time grid with frequency dt |
| idx2airport | dict | idx2airport[i] returns the IATA code of the airport with index i |
| airport2idx | dict | Inverse of idx2airport |
| idx2time_stamp | dict | idx2time_stamp[t] returns the timestamp for time index t |
| time_stamp2idx | dict | Inverse of idx2time_stamp |
| ArrTotalFlights | pd.DataFrame | ArrTotalFlights[t, i] is the number of flights with scheduled arrival time in [idx2time_stamp[t], idx2time_stamp[t] + dt) at airport idx2airport[i] |
| ArrDelayFlights | pd.DataFrame | ArrDelayFlights[t, i] is the number of delayed arrivals with scheduled arrival time in [idx2time_stamp[t], idx2time_stamp[t] + dt) at airport idx2airport[i] |
| DepTotalFlights | pd.DataFrame | DepTotalFlights[t, i] is the number of flights with scheduled departure time in [idx2time_stamp[t], idx2time_stamp[t] + dt) at airport idx2airport[i] |
| DepDelayFlights | pd.DataFrame | DepDelayFlights[t, i] is the number of delayed departures with scheduled departure time in [idx2time_stamp[t], idx2time_stamp[t] + dt) at airport idx2airport[i] |
| RealDepTotalFlights | pd.DataFrame | Same values as DepTotalFlights, with columns translated through idx2airport and index translated through idx2time_stamp |
| RealArrTotalFlights | pd.DataFrame | Same values as ArrTotalFlights, with columns translated through idx2airport and index translated through idx2time_stamp |
| RealDepDelayFlights | pd.DataFrame | Same values as DepDelayFlights, with columns translated through idx2airport and index translated through idx2time_stamp |
| RealArrDelayFlights | pd.DataFrame | Same values as ArrDelayFlights, with columns translated through idx2airport and index translated through idx2time_stamp |
| RealDelayRatio | pd.DataFrame | Same values as DelayRatio, with columns translated through idx2airport and index translated through idx2time_stamp |
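The count attributes bin flights into half-open intervals [t, t + dt). The following pandas sketch shows one way such a count table could be built from individual flights; it is not GraphFlow's code, and the column names sched_arr and dest are assumptions rather than the schema of pre_data.csv. It produces timestamp and IATA labels, like the Real* attributes, instead of integer indices.

```python
import pandas as pd

# Hypothetical flight records: scheduled arrival time and destination airport.
flights = pd.DataFrame({
    "sched_arr": pd.to_datetime([
        "2007-02-01 01:10", "2007-02-01 01:59", "2007-02-01 02:00", "2007-02-01 02:30",
    ]),
    "dest": ["LAX", "LAX", "LAX", "HNL"],
})
dt = pd.Timedelta("1h")

# Flooring to dt maps each time to the start t of its half-open bin [t, t + dt).
bins = flights["sched_arr"].dt.floor(dt)

# Count flights per (bin, airport): rows are time bins, columns are airports.
arr_total = flights.groupby([bins, flights["dest"]]).size().unstack(fill_value=0)
print(arr_total)
```

A flight scheduled exactly at 02:00 lands in the 02:00 bin, not the 01:00 bin, which is what the half-open interval means. Bins with no flights at all are absent here; reindexing onto a full grid with fill_value=0 would add them.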

Methods

| Method | Description |
|---|---|
| describe() | Describe the current GraphFlow object |
| real_format(df) | Relabel a DataFrame df with real timestamps and airport names, translating its labels through idx2time_stamp and idx2airport |
| draw_network_attr(nodes_attr=None, edges_attr='weight', size=10, with_pos=True) | Draw the undirected graph fun_1_to_undir_G(G); several calls may be needed to get a good display. If with_pos is False, nodes are placed at random positions |
| sub_graph_flow(start_time=None, end_time=None, sub_nodes=None, edges=None) | Generate a new GraphFlow restricted to a time range and, optionally, to given nodes or edges |

graphflow.GraphFlow.describe

GraphFlow.describe()

parameters None

return None

description describe the current GraphFlow object.

graphflow.GraphFlow.real_format

GraphFlow.real_format(df)

parameters TODO

return pd.DataFrame

description relabel a DataFrame df with real timestamps and airport names, translating its labels through idx2time_stamp and idx2airport.
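The relabeling can be pictured with DataFrame.rename. This sketch assumes the layout implied by the attribute table (rows are time indices t, columns are airport indices i, as in ArrTotalFlights[t, i]) and uses small made-up lookup tables; it illustrates the idea and is not the method's actual code.

```python
import pandas as pd

# Hypothetical lookup tables in the shape described for idx2time_stamp and idx2airport.
idx2time_stamp = {0: pd.Timestamp("2007-02-01 01:00"), 1: pd.Timestamp("2007-02-01 02:00")}
idx2airport = {0: "HNL", 1: "LAX"}

# Frame labelled by time index t (rows) and airport index i (columns).
counts = pd.DataFrame([[0, 2], [1, 1]], index=[0, 1], columns=[0, 1])

# Translate row labels to timestamps and column labels to IATA codes; values are unchanged.
real_counts = counts.rename(index=idx2time_stamp, columns=idx2airport)
print(real_counts)
```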

graphflow.GraphFlow.draw_network_attr

GraphFlow.draw_network_attr(nodes_attr=None, edges_attr='weight', size=6, with_pos=True)

parameters TODO

return None

description draw the undirected graph fun_1_to_undir_G(G); several calls may be needed to get a good display. If with_pos is False, nodes are placed at random positions.
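The with_pos switch chooses between the stored pos node attribute and random placement. The sketch below shows that choice with networkx on a toy graph; node_positions is a hypothetical helper, and the (longitude, latitude) values are approximate and only illustrative, since the README does not specify the format of pos.

```python
import networkx as nx


def node_positions(G, with_pos=True, seed=None):
    """Use each node's stored 'pos' attribute, or random positions when with_pos is False."""
    if with_pos:
        return nx.get_node_attributes(G, "pos")
    return nx.random_layout(G, seed=seed)


# Toy undirected graph with illustrative positions.
G = nx.Graph()
G.add_node("LAX", pos=(-118.4, 33.9))
G.add_node("HNL", pos=(-157.9, 21.3))
G.add_edge("LAX", "HNL", weight=5)

print(node_positions(G))                          # stored positions
print(node_positions(G, with_pos=False, seed=0))  # random positions in the unit square
```

Either dictionary can be passed as the pos argument of nx.draw (which needs matplotlib). Fixing seed makes the random layout reproducible, whereas omitting it gives a different layout on each call.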

graphflow.GraphFlow.sub_graph_flow

GraphFlow.sub_graph_flow(start_time=None, end_time=None, sub_nodes=None, edges=None)

parameters TODO

return graphflow.GraphFlow

description (in progress)

  • Generates a sub graph flow. If sub_nodes is given, all DataFrames (TotalFlights, DelayFlights, DelayRatio) are recomputed, since sub_nodes builds a smaller GraphFlow, and G is updated as well.

  • If sub_nodes is None and edges is None, only the weights of G are recomputed; pre_data, TotalFlights, DelayFlights and DelayRatio are sliced to the time range, which is fast.

  • If sub_nodes is None and edges is not None, a GraphFlow object restricted to the given edges is generated. All data (G, pre_data, TotalFlights, DelayFlights, DelayRatio) are recomputed, which may be slower.
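The fast path slices time-indexed tables to the requested range. The following pandas sketch shows such a slice on a made-up hourly delay-ratio frame in the Real* layout; slice_time_range is a hypothetical helper, it does not recompute the weights of G, and whether GraphFlow includes end_time itself is an assumption (pandas .loc label slicing includes both endpoints).

```python
import pandas as pd

# Hypothetical hourly delay-ratio frame: rows are timestamps, columns are IATA codes.
grid = pd.date_range("2007-02-01 00:00", periods=6, freq=pd.Timedelta(hours=1))
ratio = pd.DataFrame(
    {"HNL": [0.1, 0.2, 0.0, 0.3, 0.1, 0.2],
     "LAX": [0.4, 0.5, 0.2, 0.1, 0.0, 0.3]},
    index=grid,
)


def slice_time_range(df, start_time=None, end_time=None):
    """Keep the rows whose timestamp lies in [start_time, end_time].

    .loc label slicing includes both endpoints; a None bound leaves that side open.
    """
    return df.loc[start_time:end_time]


print(slice_time_range(ratio, "2007-02-01 01:00:00", "2007-02-01 03:00:00"))
```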

graphflow.test_index_gen

test_index_gen(time_stamp_threshhold='2008-01-01 00:00:00-08:00', test_time_num=1800, test_airport_num=60)

parameters TODO

return test_date_index,test_airport_index

description

  • test_date_index: list of length M; each element is a time index at which the model predicts the delay ratio. In our case, every element tidx of test_date_index satisfies that idx2time_stamp[tidx] is after time_stamp_threshhold (by default 2008-01-01 00:00:00-08:00).

  • test_airport_index: list of length B; each element is an airport index at which the model predicts the delay ratio. Together with test_date_index, it defines the rows and columns of the pred_data passed to model_evaluation.
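To make the threshold concrete, here is one way to pick time indices whose timestamps fall after a tz-aware threshold. This is a hypothetical helper, not the logic of test_index_gen: the function name, the inclusive boundary (at or after) and the "earliest first, up to a limit" selection are all assumptions.

```python
import pandas as pd


def indices_after(idx2time_stamp, threshold, limit):
    """Hypothetical: up to `limit` time indices at or after `threshold`, earliest first."""
    cutoff = pd.Timestamp(threshold)
    later = sorted((ts, t) for t, ts in idx2time_stamp.items() if ts >= cutoff)
    return [t for _, t in later[:limit]]


# Toy lookup table: five hourly tz-aware timestamps around the default threshold (UTC-08:00).
idx2time_stamp = dict(enumerate(
    pd.date_range("2007-12-31 22:00:00-08:00", periods=5, freq=pd.Timedelta(hours=1))
))
print(indices_after(idx2time_stamp, "2008-01-01 00:00:00-08:00", limit=2))
```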

graphflow.model_evaluation

model_evaluation(pred_data, test_date_index, test_airport_index)

parameters

  • pred_data: np.ndarray; pred_data[t, i] gives the predicted delay ratio for time index test_date_index[t] and airport index test_airport_index[i]

  • test_date_index, test_airport_index: the index lists returned by test_index_gen

return None

description generate the mae, rmse, wae and rwse scores of the prediction pred_data.
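For reference, the standard definitions of the first two metrics over all (time, airport) cells are sketched below with numpy. That model_evaluation uses exactly these definitions is an assumption; wae and rwse are not shown because this README does not define them.

```python
import numpy as np


def mae(pred, true):
    """Mean absolute error over all (time, airport) cells."""
    return float(np.mean(np.abs(pred - true)))


def rmse(pred, true):
    """Root mean squared error over all (time, airport) cells."""
    return float(np.sqrt(np.mean((pred - true) ** 2)))


# Toy 2 x 2 example: rows are test time indices, columns are test airports.
true = np.array([[0.0, 0.5], [1.0, 0.5]])
pred = np.array([[0.0, 0.0], [1.0, 1.0]])
print(mae(pred, true), rmse(pred, true))
```

Here two of the four cells are off by 0.5, so the MAE is 0.25 and the RMSE is the square root of 0.125 (about 0.354); RMSE weighs large errors more heavily than MAE.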

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Authors

See also the list of contributors who participated in this project.

Deployment

  • TODO

Built With

  • TODO

Contributing

  • TODO

Versioning

  • TODO

Acknowledgments

  • TODO
