saimmehmood/semantic_relationships

Learning Semantic Relationships of Geographical Areas based on Trajectories

A set of tools to understand semantic and geographical proximity between different geographical regions.

Best Paper Award at the 21st IEEE International Conference on Mobile Data Management (MDM 2020), Versailles, France (download the paper from the proceedings: http://mdmconferences.org/mdm2020/online_program.html)

Abstract

Mining trajectory data to find interesting patterns is of increasing research interest due to a broad range of useful applications, including analysis of transportation systems, location-based social networks, and crowd behavior. The primary focus of this research is to leverage the abundance of trajectory data to automatically and accurately learn latent semantic relationships between different geographical areas (e.g., semantically correlated neighborhoods of a city) as revealed by patterns of moving objects over time. While previous studies have utilized trajectories for this type of analysis at the level of a single geographical area, the results cannot be easily generalized to inform comparative analysis of different geographical areas. In this paper, we study this problem systematically. First, we present a method that utilizes trajectories to learn low-dimensional representations of geographical areas in an embedded space. Then, we develop a statistical method that allows us to quantify the degree to which real trajectories deviate from a theoretical null model. The method allows us to (a) distinguish geographical proximity from semantic proximity, and (b) inform a comparative analysis of two (or more) models obtained from trajectories defined on different geographical areas. This deep analysis can improve our understanding of how space is perceived by individuals and inform better urban-planning decisions. Our experimental evaluation aims to demonstrate the effectiveness and usefulness of the proposed statistical method on two large-scale real-world data sets from New York City and the city of Porto, Portugal. The methods we present are generic and can be utilized to inform a number of useful applications, ranging from location-based services, such as point-of-interest recommendations, to finding semantic relationships between different cities.

datasets

code

  • uniform_grid.py - generates a uniform grid over the geographical space from the coordinates of two diagonally opposite corners of its bounding box.

  • postgre.sql - contains PostgreSQL/PostGIS queries that store trajectories and the grid cells of the geographical area in the database, and convert each trajectory from a sequence of geospatial coordinates into a sequence of grid cells.

  • new_york_taxi.py - fetches trajectory paths from the Google Directions API, given the start and end points of taxi rides taken in Manhattan.

  • calculation.py - contains code for fetching coordinates.

  • real_model_graph.py - generates walks of trajectory paths based on cell ids, outputting each trajectory path as a list of cell ids.

  • shuffle_walks.py - once the walks of cell ids are generated, generates trajectory permutations for the real, null, and alternate null models.

  • real_model_main.py - a modified version of node2vec that skips random-walk generation and instead takes as input the real walks produced by trajectories passing through grid cells. It outputs the embeddings (.emb files).

> python3 real_model_main.py --input walks.txt --output nodes.emb

  • tsne-vis.py - visualizes the learned embeddings.

  • embeddings_tsne.py - builds a graph over the grid: each cell is a node, connected by an edge to each of its adjacent cells (top, bottom, left, and right). It outputs the edge list of cell ids in the format accepted by node2vec.

  • getting_vectors_cosine_sim.py - once embeddings have been obtained for the real and null models, computes the quantitative analysis metric, i.e., the cosine similarity between vector embeddings.

  • top_k_cos_diff.py - compares the similarity of node pairs across models to discover interesting ones, i.e., pairs whose similarity scores differ greatly between two underlying models (e.g., real vs. null model).
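As a rough illustration of what uniform_grid.py and the grid-cell conversion in postgre.sql compute, the sketch below builds a uniform grid from two diagonally opposite (lon, lat) corners and maps a trajectory to a walk of row-major cell ids. It is a minimal sketch under stated assumptions, not the repository's code: `UniformGrid`, `cell_id`, and `trajectory_to_cells` are hypothetical names, cells are uniform in degrees (not metres), and collapsing consecutive points in the same cell is an assumption. The repository itself performs the conversion with PostGIS queries.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class UniformGrid:
    """Uniform grid over a lon/lat bounding box (hypothetical helper)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float
    n_cols: int
    n_rows: int

    @classmethod
    def from_diagonal(cls, corner_a: Tuple[float, float], corner_b: Tuple[float, float],
                      n_cols: int, n_rows: int) -> "UniformGrid":
        # Corners are (lon, lat) pairs at opposite ends of a diagonal, in either order.
        (lon1, lat1), (lon2, lat2) = corner_a, corner_b
        return cls(min(lon1, lon2), min(lat1, lat2), max(lon1, lon2), max(lat1, lat2),
                   n_cols, n_rows)

    def cell_id(self, lon: float, lat: float) -> Optional[int]:
        """Row-major id of the cell containing the point, or None if it lies outside."""
        if not (self.min_lon <= lon <= self.max_lon and self.min_lat <= lat <= self.max_lat):
            return None
        col = int((lon - self.min_lon) / (self.max_lon - self.min_lon) * self.n_cols)
        row = int((lat - self.min_lat) / (self.max_lat - self.min_lat) * self.n_rows)
        # Points on the upper/right border belong to the last row/column.
        col = min(col, self.n_cols - 1)
        row = min(row, self.n_rows - 1)
        return row * self.n_cols + col


def trajectory_to_cells(grid: UniformGrid, points: Iterable[Tuple[float, float]]) -> List[int]:
    """Map (lon, lat) points to a walk of cell ids.

    Points outside the grid are dropped, and consecutive points in the same cell
    are collapsed into one entry (an assumption, not documented repository behavior).
    """
    walk: List[int] = []
    for lon, lat in points:
        cid = grid.cell_id(lon, lat)
        if cid is not None and (not walk or walk[-1] != cid):
            walk.append(cid)
    return walk
```

For example, a 4 by 2 grid between the corners (0, 0) and (4, 2) maps the points (0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (1.5, 1.5) to the walk [0, 1, 5].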
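new_york_taxi.py fetches routes from the Google Directions API. In the API's JSON responses, route geometry is returned as an encoded polyline string (for example, `overview_polyline.points`), which has to be decoded into coordinates before it can be mapped onto grid cells. Whether new_york_taxi.py decodes polylines this way is not stated here; the sketch below is a plain implementation of Google's Encoded Polyline Algorithm Format, and `decode_polyline` is a hypothetical name. It returns (lat, lng) pairs, the order used by the format itself.

```python
from typing import List, Tuple


def decode_polyline(encoded: str, precision: int = 5) -> List[Tuple[float, float]]:
    """Decode a Google encoded polyline into a list of (lat, lng) pairs."""
    coords: List[Tuple[float, float]] = []
    index = lat = lng = 0
    factor = 10 ** precision
    while index < len(encoded):
        deltas = []
        # Each point is stored as two varints: latitude delta, then longitude delta.
        for _ in range(2):
            result = shift = 0
            while True:
                chunk = ord(encoded[index]) - 63
                index += 1
                result |= (chunk & 0x1F) << shift  # 5 payload bits per character
                shift += 5
                if chunk < 0x20:  # continuation bit (0x20) not set: value complete
                    break
            # Undo the zig-zag sign encoding: an odd value means a negative delta.
            deltas.append(~(result >> 1) if result & 1 else result >> 1)
        lat += deltas[0]
        lng += deltas[1]
        coords.append((lat / factor, lng / factor))
    return coords


if __name__ == "__main__":
    # Example string from Google's polyline algorithm documentation.
    print(decode_polyline("_p~iF~ps|U_ulLnnqC_mqNvxq`@"))
    # [(38.5, -120.2), (40.7, -120.95), (43.252, -126.453)]
```

Coordinates are stored as deltas from the previous point, which is why `lat` and `lng` accumulate across iterations.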
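shuffle_walks.py produces permuted walks for the null and alternate null models. The exact permutation schemes are described in the paper rather than in this README, so the sketch below shows only one plausible null model, labelled as an assumption: shuffling the cell ids inside each real walk. This keeps every walk's length and visited cells but randomizes which cells appear next to each other. `null_model_walks` is a hypothetical name.

```python
import random
from typing import List, Optional, Sequence


def null_model_walks(walks: Sequence[Sequence[int]], seed: Optional[int] = None) -> List[List[int]]:
    """Shuffle the cell ids inside each walk (one possible null model; hypothetical).

    Each output walk has the same length and visits the same cells as its real
    counterpart, but in random order, so which cells appear next to each other
    no longer reflects real movement. The input walks are left unchanged.
    """
    rng = random.Random(seed)  # seeded generator for reproducible permutations
    shuffled = []
    for walk in walks:
        permuted = list(walk)  # copy so the real walk is not modified in place
        rng.shuffle(permuted)
        shuffled.append(permuted)
    return shuffled
```

Passing a fixed `seed` makes the null-model walks reproducible, which matters when embeddings from several runs are compared.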
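real_model_main.py reads walks from a text file and writes embeddings to a .emb file. The sketch below assumes one walk per line with space-separated cell ids, and that the .emb output uses the word2vec text format written by the reference node2vec implementation: a header line with the node count and dimension, then one line per node with its id followed by its vector. Both formats are assumptions about this fork, and `format_walks` and `parse_emb` are hypothetical helpers.

```python
from typing import Dict, Iterable, List, Sequence


def format_walks(walks: Iterable[Sequence[int]]) -> str:
    """Serialize walks as one line per walk, cell ids separated by spaces (assumed format)."""
    return "".join(" ".join(str(cell) for cell in walk) + "\n" for walk in walks)


def parse_emb(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse word2vec text-format embeddings from an iterable of lines, e.g. an open .emb file.

    The first line holds the node count and vector dimension; every following
    line holds a node id followed by that many floats.
    """
    it = iter(lines)
    n_nodes, dim = (int(x) for x in next(it).split())
    vectors: Dict[str, List[float]] = {}
    for line in it:
        parts = line.split()
        if not parts:  # tolerate blank lines, e.g. at the end of the file
            continue
        node, values = parts[0], [float(x) for x in parts[1:]]
        if len(values) != dim:
            raise ValueError(f"node {node}: expected {dim} values, got {len(values)}")
        vectors[node] = values
    if len(vectors) != n_nodes:
        raise ValueError(f"header declares {n_nodes} nodes, found {len(vectors)}")
    return vectors
```

Both helpers work directly with files: write the string returned by `format_walks` to walks.txt before running the command above, and pass `open("nodes.emb")` to `parse_emb` afterwards. Node ids are kept as strings because word2vec treats them as tokens.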
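embeddings_tsne.py connects each grid cell to its top, bottom, left, and right neighbours and writes an edge list for node2vec. Assuming row-major cell ids on an n_rows by n_cols grid (an assumption; the repository's id scheme is not documented here), the sketch below emits each undirected edge once. The reference node2vec implementation reads one edge per line as two integer node ids with an optional weight, which is the format `format_edge_list` produces; both function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def grid_edge_list(n_rows: int, n_cols: int) -> List[Tuple[int, int]]:
    """Undirected 4-neighbour edges between row-major cell ids, each listed once."""
    edges: List[Tuple[int, int]] = []
    for row in range(n_rows):
        for col in range(n_cols):
            cid = row * n_cols + col
            if col + 1 < n_cols:
                edges.append((cid, cid + 1))  # horizontal neighbour
            if row + 1 < n_rows:
                edges.append((cid, cid + n_cols))  # vertical neighbour in the next row
    return edges


def format_edge_list(edges: Iterable[Tuple[int, int]]) -> str:
    """One 'u v' pair per line, as read by the reference node2vec edge-list loader."""
    return "".join(f"{u} {v}\n" for u, v in edges)
```

Listing only the horizontal neighbour and the neighbour in the next row is enough because node2vec treats the graph as undirected by default, so the opposite links are implied. A grid has n_rows * (n_cols - 1) + (n_rows - 1) * n_cols such edges.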
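getting_vectors_cosine_sim.py compares models through the cosine similarity of cell embeddings, cos(u, v) = (u · v) / (‖u‖ ‖v‖), which is 1 for vectors pointing the same way, 0 for orthogonal vectors, and -1 for opposite ones. A minimal pure-Python sketch, with the hypothetical name `cosine_similarity`, is below; for many pairs a vectorized NumPy version would be the practical choice.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors, nominally in [-1, 1]."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```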
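top_k_cos_diff.py ranks node pairs by how much their similarity changes between two models. The sketch below assumes each model's similarities are already available as a dict keyed by node pair (for example, computed from the real-model and null-model embeddings) and ranks pairs by the absolute difference while keeping its sign, so pairs that are more similar in the real model can be told apart from pairs that are more similar in the null model. `top_k_differences` is a hypothetical name, and the repository may rank or normalize differently.

```python
import heapq
from typing import Dict, Hashable, List, Tuple

Pair = Tuple[Hashable, Hashable]


def top_k_differences(sim_a: Dict[Pair, float], sim_b: Dict[Pair, float],
                      k: int) -> List[Tuple[float, Pair]]:
    """Return up to k (difference, pair) tuples with the largest |sim_a - sim_b|.

    Only pairs present in both models are compared, so both dicts must key a
    pair the same way (e.g. always (u, v) with u < v). A positive difference
    means the pair is more similar in model A (e.g. real) than in model B (e.g. null).
    """
    common = sim_a.keys() & sim_b.keys()
    scored = ((sim_a[pair] - sim_b[pair], pair) for pair in common)
    return heapq.nlargest(k, scored, key=lambda item: abs(item[0]))
```

`heapq.nlargest` avoids sorting every pair, which helps because the number of node pairs grows quadratically with the number of grid cells.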

visualizations

  • contains folders with visualizations of cosine similarity, embeddings, heat maps, and histograms that support the quantitative and qualitative analysis.

ppt

extensions

  • this folder contains extended code that covers analysis at a different level of granularity, i.e., semantic analysis of points of interest (POIs). Its data folder contains POI data fetched through the Google Places API.

  • fetching_pois.py - fetches POIs in a specified geographical area using the Google Places API.

  • query.sql - contains a PostGIS query that fetches cell ids together with the ids of the POIs inside those cells.

  • traj_as_poi.py - converts trajectories expressed as walks on grid cells into trajectories expressed as walks on POIs, by matching the grid cells in each walk with the POIs inside them.
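traj_as_poi.py turns walks over grid cells into walks over POIs using the cell-to-POI mapping returned by query.sql. How the script orders or filters several POIs that fall in the same cell is not described, so the sketch below makes an explicit, hypothetical choice: each cell is replaced by all of its POIs in sorted order, cells without POIs are skipped, and immediate repeats are dropped. `group_pois_by_cell` and `cell_walk_to_poi_walk` are hypothetical names.

```python
from typing import Dict, Hashable, Iterable, List, Mapping, Sequence, Tuple


def group_pois_by_cell(rows: Iterable[Tuple[Hashable, str]]) -> Dict[Hashable, List[str]]:
    """Build {cell_id: [poi_id, ...]} from (cell_id, poi_id) rows, e.g. query results."""
    grouped: Dict[Hashable, List[str]] = {}
    for cell_id, poi_id in rows:
        grouped.setdefault(cell_id, []).append(poi_id)
    return grouped


def cell_walk_to_poi_walk(cell_walk: Sequence[Hashable],
                          pois_by_cell: Mapping[Hashable, Sequence[str]]) -> List[str]:
    """Replace each cell with the POIs inside it (hypothetical scheme).

    Cells without POIs are skipped, POIs of one cell are emitted in sorted order,
    and a POI identical to the previous one is not repeated.
    """
    poi_walk: List[str] = []
    for cell in cell_walk:
        for poi in sorted(pois_by_cell.get(cell, ())):
            if not poi_walk or poi_walk[-1] != poi:
                poi_walk.append(poi)
    return poi_walk
```

Sorting POIs within a cell keeps the output deterministic; a real pipeline might instead order them by distance along the trajectory.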

Acknowledgments

This repo was helpful in writing code to fetch POIs from the Google Places API: (https://github.com/slimkrazy/python-google-places)

Trajectory fetching code was written in collaboration with Jay: (https://github.com/jaycenca)

For plotting functionality, this repo was helpful: (https://github.com/vgm64/gmplot)

About

Framework to understand semantic relationships between geographical areas based on object movement paths (trajectories).
