Lei Dong, Xiaohui Yuan, Meng Li, Carlo Ratti, and Yu Liu
Measuring the geographical distribution of economic activity plays a key role in scientific research and policymaking. However, previous studies and data on economic activity either have a coarse spatial resolution or cover a limited time span, and the high-resolution characteristics of socioeconomic dynamics are largely unknown. Here, we construct a dataset on the economic activity of mainland China, the gridded establishment dataset (GED), which measures the volume of establishments at a 0.01$^{\circ}$ latitude by 0.01$^{\circ}$ longitude scale. Specifically, our dataset captures the geographically based opening and closing of approximately 25.5 million companies that registered in mainland China over the period 2005-2015. The characteristics of fine granularity and long-term observability give the GED a high application value. The dataset not only allows us to quantify the spatiotemporal patterns of the establishments, urban vibrancy, and socioeconomic activity, but also helps us uncover the fundamental principles underlying the dynamics of industrial and economic development.
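As a rough illustration of what aggregating to a 0.01° by 0.01° grid involves, the sketch below bins point records into cells and counts the establishments active in a given year. It is not the code from the notebook in this repository. The record layout `(lat, lon, opened, closed)`, the function names `cell_index` and `active_counts`, and the rule that an establishment counts as active from its opening year until the year before it closes are all assumptions made for this example.

```python
import math
from collections import Counter

CELL_DEG = 0.01  # grid resolution given in the dataset description


def cell_index(lat, lon, cell=CELL_DEG):
    """Return the integer (row, col) index of the grid cell containing a point.

    The quotient is rounded to 6 decimals before flooring so that values such
    as 31.23 / 0.01 do not fall into the wrong cell through floating-point error.
    """
    row = math.floor(round(lat / cell, 6))
    col = math.floor(round(lon / cell, 6))
    return row, col


def active_counts(records, year):
    """Count establishments per cell that are active in `year`.

    `records` is an iterable of (lat, lon, opened, closed) tuples, where
    `closed` is None for establishments that are still open (hypothetical
    layout). Assumed rule: active if opened <= year and not yet closed.
    """
    counts = Counter()
    for lat, lon, opened, closed in records:
        if opened <= year and (closed is None or closed > year):
            counts[cell_index(lat, lon)] += 1
    return counts
```

Calling `active_counts(records, y)` for each year `y` in 2005 to 2015 would give one count per cell per year, which is the general shape of a gridded panel like the GED.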
firm_scientific_data.ipynb: the Python code for preprocessing and aggregating establishment data.
firm_stat.R: the R code for data analysis (Figures 2, 4, 5; Table 3).
main data: download
yearbook_light.csv: data for Figure 4.
Beijing_Popu_1km.csv: data for Figure 5.
contact: arch.dongl@gmail.com
Dong, L., Yuan, X., Li, M., Ratti, C., & Liu, Y. A gridded establishment dataset as a proxy for economic activity in China. Sci Data 8, 5 (2021). https://doi.org/10.1038/s41597-020-00792-9