
How could I get METR-LA dataset? #10

Closed
louisgry opened this issue Oct 7, 2018 · 22 comments

Comments

@louisgry

louisgry commented Oct 7, 2018

No description provided.

@liyaguang
Owner

Hi Louis, you can get the dataset by following the instructions in the README.

@liyaguang
Owner

liyaguang commented Oct 9, 2018 via email

@louisgry louisgry closed this as completed Oct 9, 2018
@EllenZYQ

Hi, where can I get an introduction to the dataset?

@petrhrobar

Hi, where can I find information on how the distances_la_2012.csv file is created? Are those the distances between every sensor and every other sensor?

@EllenZYQ

EllenZYQ commented Feb 16, 2022 via email

@lemonliu1992

https://towardsdatascience.com/build-your-first-graph-neural-network-model-to-predict-traffic-speed-in-20-minutes-b593f8f838e5

@ThomasAFink

This could be helpful for converting the h5 files to CSV and reading them. The dataset covers 207 detectors, with speeds recorded in 5-minute intervals.

import pandas as pd
import h5py

# h5 file path
filename = 'metr-la.h5'

# inspect the keys in the h5 file (close it again so HDFStore can open it)
with h5py.File(filename, 'r') as dataset:
    print(dataset.keys())  # returns the single key 'df'

# save the h5 file to csv using the key 'df'
with pd.HDFStore(filename, 'r') as store:
    df = store.get('df')
    df.to_csv('metr-la.csv')

@EllenZYQ

EllenZYQ commented May 6, 2022 via email

@ThomasAFink

If you're using the numpy arrays from the npy example (https://pytorch-geometric-temporal.readthedocs.io/en/latest/_modules/torch_geometric_temporal/dataset/metr_la.html): the first array (node_values.npy) is simply the speed values from the metr-la.h5 file.

The second array is the adjacency matrix (adj_mat.npy) as discussed in the article. It's created from graph_sensor_ids.txt and distances_la_2012.csv, and then flattened somehow; I'm still trying to figure out how.

import numpy as np
from pathlib import Path

try:
    import common
    DATA = common.dataDirectory()
except ImportError:
    DATA = Path().resolve() / 'data'

# speeds from metr-la.h5 (https://drive.google.com/drive/folders/10FOTa6HXPqX8Pf5WRoRwcFnW9BrNZEIX); see h5_to_csv.py above
# npy-formatted examples from https://graphmining.ai/temporal_datasets/METR-LA.zip (https://graphmining.ai/temporal_datasets/)
NODE_VALUES = DATA / 'traffic_data' / 'METR-LA' / 'node_values.npy'

# view the speed data
node_values = np.load(NODE_VALUES)

np.set_printoptions(suppress=True)
print(node_values[0].tolist())
print(len(node_values[0]))
print(len(node_values))

# adjacency matrix (distances, flattened?)
ADJACENCY_MATRIX = DATA / 'traffic_data' / 'METR-LA' / 'adj_mat.npy'

# view the adjacency data
adj_mat = np.load(ADJACENCY_MATRIX)
with np.printoptions(threshold=np.inf):
    print(adj_mat[0].tolist())
    print(len(adj_mat[0]))
    print(len(adj_mat))

(screenshot: printed array output)
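On the "somehow flattened" question: in the DCRNN repo the adjacency matrix is generated from distances_la_2012.csv by applying a thresholded Gaussian kernel to the road distances (see scripts/gen_adj_mx.py there). A minimal sketch of that computation, under my reading of the script (I believe 0.1 is the repo's default threshold; the sensor-ID ordering from graph_sensor_ids.txt is omitted here):

```python
import numpy as np

def gaussian_kernel_adjacency(dist, threshold=0.1):
    """Thresholded Gaussian kernel over a road-distance matrix.

    W[i, j] = exp(-d[i, j]^2 / sigma^2), zeroed where below `threshold`;
    sigma is the standard deviation of the finite distances.
    """
    sigma = dist[~np.isinf(dist)].std()
    w = np.exp(-np.square(dist / sigma))
    w[w < threshold] = 0.0
    return w

# toy example with 3 sensors (road distances in meters)
dist = np.array([[0.0,  500.0, 4000.0],
                 [500.0,  0.0, 1200.0],
                 [4000.0, 1200.0, 0.0]])
print(gaussian_kernel_adjacency(dist))
```

Zeroing the small entries is what makes adj_mat.npy mostly sparse, and the diagonal comes out as 1 (each node's weight to itself), matching what's described below.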

@ThomasAFink

This PowerPoint is related to the adjacency matrix: https://www.slideshare.net/chirantanGupta1/traffic-prediction-from-street-network-imagespptx

@ThomasAFink

The speeds in the metr-la.h5 file match the numpy array (node_values.npy).
(screenshot: matching speed values)
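A quick way to check that correspondence programmatically. The file paths and the speed-feature indexing in the comment are assumptions about the local layout, not something confirmed in this thread:

```python
import numpy as np

def speeds_match(h5_speeds, npy_speeds):
    """True if two speed arrays agree elementwise within float tolerance."""
    return h5_speeds.shape == npy_speeds.shape and bool(np.allclose(h5_speeds, npy_speeds))

# with the real files this would look roughly like (paths/indexing assumed):
#   h5_speeds = pd.read_hdf('metr-la.h5').to_numpy()
#   npy_speeds = np.load('node_values.npy')[..., 0]  # speed feature
#   print(speeds_match(h5_speeds, npy_speeds))

# toy demonstration
a = np.array([[64.4, 67.6], [63.8, 62.1]])
print(speeds_match(a, a.copy()))  # True
```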

@ThomasAFink

This repository is also related to the adjacency matrix: https://github.com/FelixOpolka/STGCN-PyTorch

@ThomasAFink

> Hi, where can I find information on how the distances_la_2012.csv file is created? Are those the distances between every sensor and every other sensor?

Probably just the distances between each pair of sensors, arranged in matrix shape (207 detectors × 207 detectors). It's related to the adjacency matrix found in adj_mat.npy.

https://stackoverflow.com/questions/19412462/getting-distance-between-two-points-based-on-latitude-longitude

from geopy.distance import geodesic

origin = (30.172705, 31.526725)  # (latitude, longitude) -- mind the order
dest = (30.288281, 31.732326)

print(geodesic(origin, dest).meters)      # 23576.805481751613
print(geodesic(origin, dest).kilometers)  # 23.576805481751613
print(geodesic(origin, dest).miles)       # 14.64994773134371
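To build the full 207 × 207 matrix without geopy, the same pairwise idea can be sketched with a plain haversine formula. Note this is straight-line distance, so it only approximates the road distances in distances_la_2012.csv, and the three sensor coordinates below are illustrative:

```python
from math import asin, cos, radians, sin, sqrt

def haversine_m(p1, p2):
    """Great-circle distance in meters between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*p1, *p2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

def distance_matrix(coords):
    """Pairwise distance matrix for a list of (lat, lon) sensor locations."""
    n = len(coords)
    return [[haversine_m(coords[i], coords[j]) for j in range(n)] for i in range(n)]

# three hypothetical sensor locations
sensors = [(34.14745, -118.37124), (34.15497, -118.31829), (34.12086, -118.36368)]
m = distance_matrix(sensors)
print(m[0][1])  # distance in meters between the first two sensors
```

With all 207 sensor coordinates this yields the full symmetric matrix; road-network distances (see the Dijkstra discussion below in this thread) will generally come out longer.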

@ThomasAFink

ThomasAFink commented May 6, 2022

For each detector node, the nearest 12 detector nodes are added to the adjacency matrix; the rest are filled with 0s. The weight from a node to itself is always 1. Maybe the k-nearest-neighbors algorithm could also be used? I'm not sure.
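That k-nearest-neighbors idea could be sketched like this. Purely illustrative (k=2 on a toy 4-node distance matrix rather than 12 on 207 nodes): each node keeps weight 1 to itself and to its k closest neighbors, everything else is 0:

```python
import numpy as np

def knn_adjacency(dist, k):
    """Keep each node's k nearest neighbors (weight 1), plus itself; zero elsewhere."""
    n = dist.shape[0]
    adj = np.zeros((n, n))
    for i in range(n):
        order = np.argsort(dist[i])   # ascending distance; self (distance 0) first
        neighbors = order[1:k + 1]    # skip self, take the k nearest
        adj[i, neighbors] = 1.0
        adj[i, i] = 1.0               # weight to itself is always 1
    return adj

dist = np.array([[0., 1., 4., 9.],
                 [1., 0., 2., 8.],
                 [4., 2., 0., 3.],
                 [9., 8., 3., 0.]])
print(knn_adjacency(dist, k=2))
```

Unlike the Gaussian-kernel construction, this produces a binary matrix and is not necessarily symmetric (i may be among j's nearest neighbors without the reverse holding).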

@ThomasAFink

ThomasAFink commented May 6, 2022

Trying to reconstruct this example here because I have my own data from a different city: https://colab.research.google.com/drive/132hNQ0voOtTVk3I4scbD3lgmPTQub0KR?usp=sharing

Video: https://www.youtube.com/watch?v=Rws9mf1aWUs

@ThomasAFink

> Hi, where can I find information on how the distances_la_2012.csv file is created? Are those the distances between every sensor and every other sensor?

Dijkstra finds the optimal route between two detectors.
#8 (comment)

@ThomasAFink

> Hi, where can I find information on how the distances_la_2012.csv file is created? Are those the distances between every sensor and every other sensor?

Dijkstra finds the optimal route between two detectors. #8 (comment)

An example with Dijkstra using OpenStreetMap; the output is slightly different from Google Maps: https://medium.com/p/2d97d4881996

import osmnx as ox
import networkx as nx

ox.config(log_console=True, use_cache=True)

# start and end locations in (lat, lng)
start_latlng = (34.14745, -118.37124)  # Detector 717490
end_latlng = (34.15497, -118.31829)    # Detector 773869

place = 'Los Angeles, California, United States'
mode = 'drive'      # 'drive', 'bike', 'walk'
optimizer = 'time'  # 'length', 'time'

# create the graph from OSM within the boundaries of the geocodable place
graph = ox.graph_from_place(place, network_type=mode)

# find the nearest graph nodes to the start and end locations
orig_node = ox.get_nearest_node(graph, start_latlng)
dest_node = ox.get_nearest_node(graph, end_latlng)

# find the shortest route based on the chosen optimizer
shortest_route = nx.shortest_path(graph, orig_node, dest_node, weight=optimizer)

# shortest path length in meters; method can be 'dijkstra' or 'bellman-ford'
shortest_route_distance = nx.shortest_path_length(
    graph, orig_node, dest_node, weight="length", method="dijkstra")

# distance between 717490 and 773869 with OpenStreetMap is 8252.298;
# the original value in the dataset was 7647.0
print("Distance: " + str(shortest_route_distance))

@EllenZYQ

EllenZYQ commented Oct 11, 2022 via email

@ThomasAFink

> Hi, where can I find information on how the distances_la_2012.csv file is created? Are those the distances between every sensor and every other sensor?

Here's how I created my own distance matrix: https://github.com/ThomasAFink/osmnx_adjacency_matrix_for_graph_convolutional_networks

@EllenZYQ

EllenZYQ commented Feb 6, 2023 via email

@EllenZYQ

EllenZYQ commented Jul 12, 2023 via email

@Kqingzheng

Kqingzheng commented Jul 12, 2023 via email
