Before using geospatial data (shp files), there are a few things to check.  
The most important is the accuracy of the figures (polygons), which can be seen as the containers that hold the data.  
The basic checks I make when dealing with shp files are:  
1. Whether any figures (polygons) overlap  
2. Whether any figures (polygons) are erroneous  
   (For example, in building data, a figure that is too small to plausibly represent a building,
    or figures that should not exist at all)

I use the "geopandas" package to run these checks and to clean up what they find.

In [1]:
import pandas as pd
import numpy as np
import geopandas as gpd
pd.set_option('mode.chained_assignment',  None)

Load the shp file
- data name : "건축물연령공간정보" (Building Age Spatial Information) - 부산광역시 중구 (Jung-gu, Busan)
- data source : The National Spatial Information Portal (http://openapi.nsdi.go.kr/nsdi/index.do)

In [7]:
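# Read the shapefile; its attribute table is encoded in cp949 (Korean).
# (The CSV-style `sep` argument has no meaning for shapefiles, so it is dropped.)
df = gpd.read_file('data/국가공간정보포털_건축물연령정보_부산중구/AL_26110_D196_20230111.shp', encoding='cp949')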
print(df.shape)

(6536, 32)
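Because the area check in step 2 depends on the coordinate units, it may be worth confirming the CRS right after loading. The sketch below is a minimal illustration, assuming geopandas' pyproj-backed `gdf.crs`; `describe_crs` is a hypothetical helper, and EPSG:4326 and EPSG:5186 are example CRSs only, not a statement about this file's CRS (which comes from its `.prj` file).

```python
import geopandas as gpd
from shapely.geometry import box


def describe_crs(gdf):
    """Hypothetical helper: report whether polygon areas will be meaningful."""
    if gdf.crs is None:
        return "no CRS set: area units are unknown"
    if gdf.crs.is_geographic:
        # Degrees are not a length unit, so .area would be in 'square degrees'
        return "geographic CRS: reproject with to_crs() before computing areas"
    # For a projected CRS, the first axis carries the length unit (e.g. 'metre')
    return f"projected CRS, units: {gdf.crs.axis_info[0].unit_name}"


# Toy frames with example CRSs; for the real data, call describe_crs(df)
geo = gpd.GeoDataFrame(geometry=[box(0, 0, 1, 1)], crs="EPSG:4326")
proj = gpd.GeoDataFrame(geometry=[box(0, 0, 1, 1)], crs="EPSG:5186")
print(describe_crs(geo))
print(describe_crs(proj))
```

If the CRS turns out to be geographic, `df.to_crs(...)` with a suitable projected CRS would be needed before the area threshold in step 2 makes sense.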


In [8]:
df.head()

Unnamed: 0,A0,A1,A2,A3,A4,A5,A6,A7,A8,A9,...,A22,A23,A24,A25,A26,A27,A28,A29,A30,geometry
0,61803665,1988202940501795945100000000,2611010100100010092,2611010100,부산광역시 중구 영주동,1,일반,1-92,2376,2,...,1.0,1988-04-07,1988-12-26,36.0,04,30대,040,40세미만,2023-01-11,"POLYGON ((385247.116 181518.357, 385274.950 18..."
1,61803670,1976203036551795391300000000,2611010100100020000,2611010100,부산광역시 중구 영주동,1,일반,2,2377,2,...,0.0,,1976-01-21,48.0,05,40대,050,50세미만,2023-01-11,"POLYGON ((385355.766 181451.422, 385363.593 18..."
2,61803671,1976203000751795639200000000,2611010100100020078,2611010100,부산광역시 중구 영주동,1,일반,2-78,2378,2,...,0.0,,1976-01-21,48.0,05,40대,050,50세미만,2023-01-11,"POLYGON ((385318.857 181475.157, 385326.733 18..."
3,61803672,1976203018211795506600000000,2611010100100020130,2611010100,부산광역시 중구 영주동,1,일반,2-130,2379,2,...,0.0,,1976-01-21,48.0,05,40대,050,50세미만,2023-01-11,"POLYGON ((385336.746 181462.278, 385344.613 18..."
4,61803674,0000203144251794533700000000,2611010100100040001,2611010100,부산광역시 중구 영주동,1,일반,4-1,379,1,...,0.0,,,,ZZ,기타,ZZZ,구분없,2023-01-11,"POLYGON ((385468.516 181348.727, 385467.869 18..."


1. Whether any figures (polygons) overlap

- Check for duplicate values in the "geometry" column. This finds figures whose geometry is exactly identical, the most obvious kind of overlap.

  The "geometry" column, which geopandas creates automatically when it reads a shp file, holds the coordinates of each polygon.

In [3]:
df[df.duplicated(subset=['geometry'], keep='first')]

Unnamed: 0,A0,A1,A2,A3,A4,A5,A6,A7,A8,A9,...,A22,A23,A24,A25,A26,A27,A28,A29,A30,geometry
1890,61805997,2018203345041779409000000000,2611010900100030014,2611010900,부산광역시 중구 중앙동6가,1,일반,3-14,100182458,1,...,1.0,2017-10-20,2018-04-24,6.0,1,10세미만,10,10세미만,2023-01-11,"POLYGON ((385690.379 179835.910, 385692.613 17..."


The result above shows one duplicate figure in total (row 1890),  
so one figure needs to be removed.
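The duplicate check only catches polygons that are exactly identical; two building outlines that partly cover each other would pass it. A minimal sketch of both steps on hypothetical toy data: drop exact duplicates with `drop_duplicates`, then self-join with `gpd.sjoin(..., predicate="overlaps")` (the `predicate` keyword assumes geopandas 0.10 or later) to list partially overlapping pairs. The `name` column stands in for a unique id column, and EPSG:5186 is an arbitrary example CRS.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical toy data: "a" and "b" are identical, "c" partly overlaps "a",
# and "d" stands alone. "name" plays the role of a unique id column.
gdf = gpd.GeoDataFrame(
    {"name": ["a", "b", "c", "d"]},
    geometry=[box(0, 0, 2, 2), box(0, 0, 2, 2), box(1, 1, 3, 3), box(10, 10, 11, 11)],
    crs="EPSG:5186",
)

# Exact duplicates: the same check as above, then keep only the first copy
dups = gdf[gdf.duplicated(subset=["geometry"], keep="first")]
deduped = gdf.drop_duplicates(subset=["geometry"], keep="first")

# Partial overlaps: interiors intersect, but neither polygon contains the other.
# Self-join, then keep each unordered pair only once.
pairs = gpd.sjoin(deduped, deduped, how="inner", predicate="overlaps")
pairs = pairs[pairs["name_left"] < pairs["name_right"]]

print(dups["name"].tolist())  # ['b']
print(pairs[["name_left", "name_right"]].values.tolist())  # [['a', 'c']]
```

Note that "overlaps" does not report a polygon lying entirely inside another; `predicate="contains"` covers that case. `predicate="intersects"` is broader still, but it would also flag adjacent buildings that merely share a border.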

2. Whether any figures (polygons) are erroneous

- First, calculate the area of each figure from the "geometry" column.
- Then check whether any calculated area is too small or too large to be plausible, given the nature of the data.
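For a polygon without holes, the `area` that geopandas reports (via shapely/GEOS) is the planar area of its ring, in the CRS's units squared. A pure-Python sketch of the shoelace formula, for illustration only, shows what that number means; `shoelace_area` is a hypothetical name.

```python
def shoelace_area(coords):
    """Planar area of a simple polygon ring given as [(x, y), ...].

    A repeated closing vertex (as stored in shapefiles) is harmless:
    it contributes a zero term.
    """
    n = len(coords)
    total = 0.0
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]  # wrap around to the first vertex
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(shoelace_area(square))  # 4.0
```

In practice there is no reason to compute this by hand; the vectorized `GeoSeries.area` does the same job for every row at once.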

In [4]:
# Area is in the CRS's units squared (m² if the CRS is projected in meters)
df['area'] = df.geometry.area.round(2)

Since the sample data describes buildings,  
a figure with a very small "area" value is likely to be an error.  
Therefore, extract the rows whose "area" is less than 1 (taken as the smallest value that fits the characteristics of the data), then visualize and review them.

In [5]:
print(len(df[df['area']<1]))
df[df['area']<1]

62


Unnamed: 0,A0,A1,A2,A3,A4,A5,A6,A7,A8,A9,...,A23,A24,A25,A26,A27,A28,A29,A30,geometry,area
120,61803828,0000203206271792593100000000,2611010100100500006,2611010100,부산광역시 중구 영주동,1,일반,50-6,679,1,...,,,,ZZ,기타,ZZZ,구분없,2023-01-11,"POLYGON ((385530.983 181154.846, 385531.070 18...",0.78
344,61804125,1980202729171793535600000000,2611010100101510001,2611010100,부산광역시 중구 영주동,1,일반,151-1,991,1,...,1980-07-04,1980-10-23,44.0,05,40대,045,45세미만,2023-01-11,"POLYGON ((385051.885 181239.534, 385051.972 18...",0.78
349,61804129,1972202713001792994400000000,2611010100101510006,2611010100,부산광역시 중구 영주동,1,일반,151-6,997,1,...,,,,ZZ,기타,ZZZ,구분없,2023-01-11,"POLYGON ((385036.793 181185.077, 385036.881 18...",0.78
358,61804140,1972202712871793233300000000,2611010100101510017,2611010100,부산광역시 중구 영주동,1,일반,151-17,1008,1,...,1972-07-24,1972-10-27,52.0,06,50대,055,55세미만,2023-01-11,"POLYGON ((385036.183 181208.967, 385036.271 18...",0.78
789,61804619,1993202627831790743000000000,2611010100103210000,2611010100,부산광역시 중구 영주동,1,일반,321,1455,1,...,1990-10-31,1993-12-14,31.0,04,30대,035,35세미만,2023-01-11,"POLYGON ((384956.133 180958.173, 384956.220 18...",0.78
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5915,61810673,1954203080301778048100000000,2611013300100050003,2611013300,부산광역시 중구 광복동1가,1,일반,5-3,11401,1,...,,,,ZZ,기타,ZZZ,구분없,2023-01-11,"POLYGON ((385434.200 179697.494, 385434.287 17...",0.78
5961,61810720,1954203120661777025800000000,2611013300100380006,2611013300,부산광역시 중구 광복동1가,1,일반,38-6,11462,1,...,,,,ZZ,기타,ZZZ,구분없,2023-01-11,"POLYGON ((385476.621 179596.045, 385476.707 17...",0.78
6363,61811188,1949202548791776727500000000,2611014000100370007,2611014000,부산광역시 중구 남포동5가,1,일반,37-7,12347,1,...,,,,ZZ,기타,ZZZ,구분없,2023-01-11,"POLYGON ((384905.219 179554.728, 384905.307 17...",0.78
6426,61811264,1965202725731775601900000000,2611014000101130002,2611014000,부산광역시 중구 남포동5가,1,일반,113-2,12468,1,...,,,,ZZ,기타,ZZZ,구분없,2023-01-11,"POLYGON ((385084.461 179445.700, 385084.548 17...",0.78
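The two checks can be folded into one cleanup step. The sketch below is a minimal illustration under assumptions: `refine` is a hypothetical helper, the threshold of 1 follows the text above, the toy geometries and EPSG:5186 are made up, and small polygons are set aside for review rather than deleted, since the text treats them as candidates to inspect.

```python
import geopandas as gpd
from shapely.geometry import box


def refine(gdf, min_area=1.0):
    """Hypothetical helper: drop exact duplicate geometries, then split off
    polygons whose area is below min_area so they can be reviewed."""
    out = gdf.drop_duplicates(subset=["geometry"], keep="first").copy()
    out["area"] = out.geometry.area.round(2)
    too_small = out["area"] < min_area
    return out[~too_small], out[too_small]


# Toy data: a 10x10 building, its exact duplicate, a 0.5x0.5 sliver, a 5x6 building
toy = gpd.GeoDataFrame(
    geometry=[box(0, 0, 10, 10), box(0, 0, 10, 10), box(20, 20, 20.5, 20.5), box(30, 30, 35, 36)],
    crs="EPSG:5186",
)
cleaned, flagged = refine(toy)
print(len(cleaned), len(flagged))  # 2 1
print(flagged["area"].tolist())  # [0.25]
```

For the visual review, `flagged.plot()` (which needs matplotlib) draws the flagged polygons. Whether each one should be dropped, fixed, or kept is still a judgment call against the source data.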
