This version is under development.
TODO: Docker to cloud.
GeoSQLToolkit: A Spatial-Temporal Data Management Tool Encompassing the Entire Lifecycle from Data Acquisition to Storage
GeoSQLToolkit is a spatial-temporal big data management tool that covers the full workflow from scratch: database creation, data acquisition, data processing, data storage, and lifecycle management. It offers quick, convenient processing methods for spatial-temporal data tasks that are common in research (such as mobile signaling data, street view images, and road network simplification). It also provides processing scripts for general geospatial tasks (such as fishnet creation and geocoding).
In summary, GeoSQLToolkit helps establish a comprehensive spatial-temporal database, provides accessible methods for the data acquisition and processing commonly needed in research, and formats the data for storage. This makes future research work more convenient and sustainable.
- Database Creation: Establishing a robust, well-designed spatial-temporal database from scratch.
- Data Acquisition: Primarily using web crawling techniques to obtain various types of publicly available internet data, such as POIs, street view images, and population mobility data.
- Data Processing: Cleaning, processing, and modeling the raw acquired data to produce the final results needed for research objectives.
- Data Storage: Formatting and storing processed data, and exposing user-friendly data interfaces.
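Fishnet creation, one of the general geospatial tasks mentioned above, amounts to tiling a bounding box with equally sized cells. The following is a minimal sketch using only the standard library. The function name `make_fishnet` and its return layout are assumptions made for illustration and are not taken from the GeoSQLToolkit code.

```python
import math


def make_fishnet(min_x, min_y, max_x, max_y, cell_size):
    """Return grid cells covering a bounding box as (col, row, (x0, y0, x1, y1)).

    Hypothetical helper: coordinates are treated as planar units, so the
    input should already be in a projected coordinate system.
    """
    if cell_size <= 0:
        raise ValueError("cell_size must be positive")
    # Round up so the grid always covers the whole bounding box.
    n_cols = math.ceil((max_x - min_x) / cell_size)
    n_rows = math.ceil((max_y - min_y) / cell_size)
    cells = []
    for row in range(n_rows):
        for col in range(n_cols):
            x0 = min_x + col * cell_size
            y0 = min_y + row * cell_size
            cells.append((col, row, (x0, y0, x0 + cell_size, y0 + cell_size)))
    return cells


if __name__ == "__main__":
    for cell in make_fishnet(0, 0, 2, 1, 1):
        print(cell)
```

Each cell keeps its column and row index, which can later serve as a grid identifier when aggregating points into cells.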
Data | Data Type | Scale | Region | Source | Status |
---|---|---|---|---|---|
Boundary | Polygon | City & Province | CN | Gaode | Mongo: Completed✅ PG: Completed✅ |
POI | Point | Point | CN | Gaode | Mongo: Developing |
AOI | Polygon | Polygon | CN | Gaode | / |
Mobility Strength | Graph | City | CN | Baidu | / |
Social Demographic | / | City | CN | Baidu | / |
Night Light Image | tif | City(m) | CN | Harvard Dataverse | / |
StreetView Image | png | Point | CN | Baidu | / |
Boundary data is now complete, and POI data is under development.
Processing Project | Processing Object | Input | Output | Description |
---|---|---|---|---|
road_regularization | road | road network | simplified road network | Simplify the intricate road network and extract the main roads |
mobile_data_process | signal data (individual, with timestamp) | signal data | home/work location (grid) | Extract individual user activities and stay points to detect residential and work locations |
Urban Renewal Indicator Calculator | series of urban elements (POI, block, road...) | urban elements | urban indices (density, coverage rate...) | Calculate the main indices used in urban planning |
Social Segregation | signal data (individual, with timestamp) | signal data | segregation_indices (PSI) from different perspectives (individual, unit, time) | Reference: Xu, Y., Belyi, A., Santi, P. and Ratti, C. Quantifying segregation in an integrated urban physical-social space. Journal of The Royal Society Interface, 16: 20190536. |
Night Light Image Vitality Index | NIL | NIL | Vitality Index in this NIL | Calculate the Night Light index in NIL |
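To illustrate the idea behind home/work detection from timestamped signaling records, here is a simplified sketch. It is not the toolkit's `mobile_data_process` implementation: it skips stay-point detection and simply takes the most frequent grid cell per user within assumed time windows. The record layout `(user_id, ISO timestamp, grid_id)`, the hour windows, and the function name `detect_home_work` are all assumptions.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Assumed time windows; adjust to the study area and data source.
NIGHT_HOURS = set(range(21, 24)) | set(range(0, 7))  # likely at home
WORK_HOURS = set(range(9, 17))                       # likely at work (weekdays only)


def detect_home_work(records):
    """Return {user_id: {"home": grid_id or None, "work": grid_id or None}}.

    `records` is an iterable of (user_id, iso_timestamp, grid_id) tuples
    (a hypothetical layout chosen for this sketch).
    """
    home_counts = defaultdict(Counter)
    work_counts = defaultdict(Counter)
    for user_id, ts, grid_id in records:
        t = datetime.fromisoformat(ts)
        if t.hour in NIGHT_HOURS:
            home_counts[user_id][grid_id] += 1
        elif t.hour in WORK_HOURS and t.weekday() < 5:
            work_counts[user_id][grid_id] += 1

    result = {}
    for user_id in set(home_counts) | set(work_counts):
        home = home_counts[user_id].most_common(1)
        work = work_counts[user_id].most_common(1)
        result[user_id] = {
            "home": home[0][0] if home else None,
            "work": work[0][0] if work else None,
        }
    return result


if __name__ == "__main__":
    sample = [
        ("u1", "2023-05-01T23:00:00", "g1"),
        ("u1", "2023-05-02T02:00:00", "g1"),
        ("u1", "2023-05-02T10:00:00", "g2"),
    ]
    print(detect_home_work(sample))
```

Counting raw records favors cells with dense sampling, so a production version would more likely weight by stay duration.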
- The structure of PostgreSQL: an easy-to-use interface for users.
Compared to MongoDB, PostgreSQL is a structured database. This means that data retrieved from MongoDB must be further simplified and refined before it is stored in PostgreSQL. It also means that users who access data through the data interface do not need to worry much about data structure. Using PostgreSQL to provide interfaces for the data products is therefore convenient and user-friendly.
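As a rough illustration of this simplification step, the sketch below flattens a nested, MongoDB-style boundary document into a flat row suitable for a relational table. The document layout and the field names (`properties`, `adcode`, `name`, `level`, `year`, `geometry`) are assumptions chosen for the example and may differ from the actual schemas used in this project.

```python
def polygon_to_wkt(rings):
    """Convert GeoJSON-style polygon rings ([[lon, lat], ...]) to a WKT string."""
    ring_strs = []
    for ring in rings:
        points = ", ".join(f"{lon} {lat}" for lon, lat in ring)
        ring_strs.append(f"({points})")
    return f"POLYGON ({', '.join(ring_strs)})"


def flatten_boundary(doc):
    """Flatten a nested raw document into a flat row dict (hypothetical schema)."""
    props = doc.get("properties", {})
    return {
        "adcode": props.get("adcode"),
        "name": props.get("name"),
        "level": props.get("level"),
        "year": doc.get("year"),
        "geometry_wkt": polygon_to_wkt(doc["geometry"]["coordinates"]),
    }


if __name__ == "__main__":
    raw = {
        "year": 2023,
        "properties": {"adcode": "110000", "name": "Beijing", "level": "province"},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]],
        },
    }
    print(flatten_boundary(raw))
```

The flat row has a fixed set of columns, so users of the PostgreSQL side can query it without knowing how the raw document was nested.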
- Configure your database software (PostgreSQL, MongoDB, ...).
- Enter your database parameters in db_conn.py:
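from enum import Enum

from pymongo import MongoClient
from sqlalchemy import create_engine


class PGConfigs(str, Enum):
    HOST = 'your host'
    PORT = 'your port'
    Database = 'your db'
    User = 'your user name in pgsql'
    Password = 'your pw in pgsql'


# For MongoDB to store raw data
client = MongoClient("your mongo url")
RAW_DB = client["your mongo database"]  # indexing a MongoClient selects a database

# For PGSQL to store product data; .value yields the plain string on every Python version
DATA_PRO = create_engine(
    f"postgresql://{PGConfigs.User.value}:{PGConfigs.Password.value}"
    f"@{PGConfigs.HOST.value}:{PGConfigs.PORT.value}/{PGConfigs.Database.value}")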
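As an alternative to hard-coding credentials in db_conn.py, the connection URL can be assembled from environment variables. The sketch below is only a suggestion: the variable names (`PG_USER`, `PG_PASSWORD`, `PG_HOST`, `PG_PORT`, `PG_DATABASE`), the defaults, and the helper `build_pg_url` are assumptions and are not part of the project. The user name and password are percent-encoded so that characters such as `@` or `/` do not break the URL.

```python
import os
from urllib.parse import quote_plus


def build_pg_url(env=os.environ):
    """Build a PostgreSQL connection URL from environment-style settings.

    Hypothetical helper: variable names and defaults are illustrative only.
    """
    user = quote_plus(env.get("PG_USER", "postgres"))
    password = quote_plus(env.get("PG_PASSWORD", ""))
    host = env.get("PG_HOST", "localhost")
    port = env.get("PG_PORT", "5432")
    database = env.get("PG_DATABASE", "postgres")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"


if __name__ == "__main__":
    # The resulting string could be passed to sqlalchemy.create_engine().
    print(build_pg_url())
```

Keeping credentials out of the source file also makes it easier to share the repository without leaking passwords.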
- Open the Python file ./data_collection/boundary/administration_boundaries.py and run the following code:
if __name__ == "__main__":
    start_time = time.time()
    # level: 'province' or 'city'.
    BoundariesCrawler(level="province").crawl_boundaries()
    print("--- %s seconds ---" % (time.time() - start_time))
- Run the Python file ./mongo_raw_to_pro/boundary.py to move the data from RAW_DB to PRO_DB:
if __name__ == "__main__":
    process_province(2023)
    process_city(2023)
- Run the Python file ./mongo2postgres/boundary.py to transfer the data from MongoDB to PostgreSQL:
if __name__ == "__main__":
    amap_province(2023)
    amap_city(2023)
This code is intended for personal study and research purposes only. Please do not use it for any non-scientific or illegal purposes.
Feel free to contact me for technical discussions (kingsleyl0107@gmail.com).