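Learning objectives
- Dataset: high-school advancement rates of middle schools in Seoul (clustering by region)
- Requires LabelEncoder and OneHotEncoder
- Map visualization (latitude, longitude) with folium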
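
The two encoders named above differ in what they produce. The following is a minimal sketch, assuming scikit-learn's `LabelEncoder` and `OneHotEncoder` applied to five district values taken from the first rows of the dataset; the variable names are illustrative only and are not part of the notebook.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Five district values copied from the first rows of the dataset.
districts = np.array(['성북구', '종로구', '강남구', '강남구', '서초구'])

# LabelEncoder: one integer per category (categories are sorted first).
label_encoder = LabelEncoder()
labels = label_encoder.fit_transform(districts)

# OneHotEncoder: one 0/1 column per category; it expects a 2-D input.
onehot_encoder = OneHotEncoder()
onehot = onehot_encoder.fit_transform(districts.reshape(-1, 1)).toarray()

print(label_encoder.classes_)
print(labels)
print(onehot.shape)
```

Integer labels imply an ordering between districts that does not exist, which is why one-hot columns are usually the safer input for distance-based models such as k-means.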

In [1]:
from sklearn.cluster import KMeans

import pandas as pd
import numpy as np
import folium

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [2]:
path = './data/middle_shcool_graduates_report.xlsx'
df = pd.read_excel(path)
df.head()

Unnamed: 0,지역,학교명,코드,유형,주야,남학생수,여학생수,일반고,특성화고,과학고,외고_국제고,예고_체고,마이스터고,자사고,자공고,기타진학,취업,미상,위도,경도
0,성북구,서울대학교사범대학부설중학교,3,국립,주간,277,0,0.585,0.148,0.018,0.007,0.0,0.011,0.227,0.0,0.004,0,0.0,37.594942,127.038909
1,종로구,서울대학교사범대학부설여자중학교,3,국립,주간,0,256,0.68,0.199,0.0,0.035,0.008,0.0,0.043,0.004,0.031,0,0.0,37.577473,127.003857
2,강남구,개원중학교,3,공립,주간,170,152,0.817,0.047,0.009,0.012,0.003,0.006,0.09,0.003,0.009,0,0.003,37.491637,127.071744
3,강남구,개포중학교,3,공립,주간,83,72,0.755,0.097,0.013,0.013,0.019,0.019,0.065,0.0,0.019,0,0.0,37.480439,127.062201
4,서초구,경원중학교,3,공립,주간,199,212,0.669,0.017,0.007,0.01,0.005,0.0,0.282,0.0,0.01,0,0.0,37.51075,127.0089


In [3]:
print(df.columns.values)

['지역' '학교명' '코드' '유형' '주야' '남학생수' '여학생수' '일반고' '특성화고' '과학고' '외고_국제고'
 '예고_체고' '마이스터고' '자사고' '자공고' '기타진학' '취업' '미상' '위도' '경도']


In [4]:
# Mark each school's location on the map

school_map = folium.Map(location = [37.55, 126.98],  
                       zoom_start = 12)

for name, lat, lng in zip(df.학교명, df.위도, df.경도):
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        color = 'coral',
        fill = True, 
        fill_opacity = .5,
        popup = name
    ).add_to(school_map)

school_map

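Procedure
- Data preprocessing: one-hot encoding
- Build the clustering model; the features used for analysis are 과학고, 외고_국제고, and 자사고
- Standardize the features
- Create the model object
- Fit the model
- Predict
- Add the predictions to the DataFrame
- Group by cluster value and print the contents of each group
- Visualize on a map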
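
The standardize, create, fit, predict, and store steps above can be sketched as follows. This is only an illustration under stated assumptions: the `toy` frame holds made-up rates for six imaginary schools, and `StandardScaler`, `n_clusters=2`, `n_init=10`, and `random_state=0` are choices made for the sketch, not settings taken from the notebook.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

features = ['과학고', '외고_국제고', '자사고']

# Made-up advancement rates for six imaginary schools (illustration only).
toy = pd.DataFrame({
    '과학고':     [0.00, 0.01, 0.00, 0.03, 0.04, 0.03],
    '외고_국제고': [0.01, 0.00, 0.01, 0.04, 0.05, 0.04],
    '자사고':     [0.02, 0.03, 0.02, 0.25, 0.28, 0.22],
})

# Standardize so that each feature has zero mean and unit variance.
X = StandardScaler().fit_transform(toy[features])

# Create and fit the model; random_state and n_init are fixed for repeatability.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
model.fit(X)

# Predict a cluster for each row and store it next to the original columns.
toy['cluster_id'] = model.predict(X)
print(toy)
```

Standardizing first keeps a feature with a larger numeric range, here 자사고, from dominating the Euclidean distances that k-means relies on.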

In [5]:
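dist_df = df['지역']

# One-hot encode the district column (one indicator column per district)
dist_encoded = pd.get_dummies(dist_df)

# Append the indicator columns to the original frame (the result is only displayed, not stored)
pd.concat([df, dist_encoded], axis = 1)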

Unnamed: 0,지역,학교명,코드,유형,주야,남학생수,여학생수,일반고,특성화고,과학고,...,성동구,성북구,송파구,양천구,영등포구,용산구,은평구,종로구,중구,중랑구
0,성북구,서울대학교사범대학부설중학교,3,국립,주간,277,0,0.585,0.148,0.018,...,0,1,0,0,0,0,0,0,0,0
1,종로구,서울대학교사범대학부설여자중학교,3,국립,주간,0,256,0.680,0.199,0.000,...,0,0,0,0,0,0,0,1,0,0
2,강남구,개원중학교,3,공립,주간,170,152,0.817,0.047,0.009,...,0,0,0,0,0,0,0,0,0,0
3,강남구,개포중학교,3,공립,주간,83,72,0.755,0.097,0.013,...,0,0,0,0,0,0,0,0,0,0
4,서초구,경원중학교,3,공립,주간,199,212,0.669,0.017,0.007,...,0,0,0,0,0,0,0,0,0,0
5,강남구,구룡중학교,3,공립,주간,153,133,0.787,0.066,0.007,...,0,0,0,0,0,0,0,0,0,0
6,강남구,압구정중학교,3,공립,주간,111,86,0.589,0.015,0.015,...,0,0,0,0,0,0,0,0,0,0
7,강남구,단국대학교사범대학부속중학교,3,사립,주간,218,0,0.752,0.000,0.032,...,0,0,0,0,0,0,0,0,0,0
8,강남구,대명중학교,3,공립,주간,250,206,0.757,0.018,0.013,...,0,0,0,0,0,0,0,0,0,0
9,강남구,대왕중학교,3,공립,주간,183,178,0.814,0.000,0.006,...,0,0,0,0,0,0,0,0,0,0


In [6]:
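# Create a k-means model with 3 clusters
c_kmeans = KMeans(n_clusters=3)

# Fit on the three features (passed unscaled here);
# fit_transform returns each row's distance to every cluster centroid
c_kmeans.fit_transform(df[['과학고', '외고_국제고', '자사고']])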

array([[0.0899901 , 0.09780343, 0.18967294],
       [0.09716166, 0.28105208, 0.02427553],
       [0.04870595, 0.23372091, 0.05238525],
       ...,
       [0.13960774, 0.32447171, 0.03976285],
       [0.13960774, 0.32447171, 0.03976285],
       [0.12867914, 0.31373204, 0.03081803]])
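
The array above has one column per cluster because `KMeans.fit_transform` returns distances to the centroids rather than labels. Here is a small sketch of that relationship, assuming synthetic 2-D points instead of the school data; the blob centers, `n_init=10`, and `random_state=0` are arbitrary choices for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic points in three well-separated blobs (not the school data).
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
points = np.vstack([c + 0.1 * rng.standard_normal((20, 2)) for c in centers])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
distances = kmeans.fit_transform(points)   # shape: (n_samples, n_clusters)

# The closest centroid in each row should agree with the fitted labels.
nearest = distances.argmin(axis=1)
print(distances.shape)
print(np.array_equal(nearest, kmeans.labels_))
```

In other words, the cluster label of a row is the index of the smallest value in that row of the distance matrix.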

In [7]:
# Store the cluster label assigned to each school
df['cluster_id'] = c_kmeans.labels_
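
With `cluster_id` in place, the rows can be grouped by cluster and each group inspected. Below is a hedged sketch on a made-up four-row frame standing in for `df`; the school names and rates are invented for the example.

```python
import pandas as pd

# Made-up rows standing in for df after cluster_id has been added.
toy = pd.DataFrame({
    '학교명': ['A중학교', 'B중학교', 'C중학교', 'D중학교'],
    '자사고': [0.02, 0.04, 0.25, 0.27],
    'cluster_id': [0, 0, 1, 1],
})

grouped = toy.groupby('cluster_id')

# Number of schools per cluster.
sizes = grouped.size()
# Mean 자사고 rate per cluster.
means = grouped['자사고'].mean()

# Print the members of each cluster.
for cluster_id, group in grouped:
    print('cluster', cluster_id)
    print(group[['학교명', '자사고']])
```

On the real data, the same pattern with `df.groupby('cluster_id')` would show which schools fall into each cluster and how their advancement rates differ.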

In [13]:
# Mark each school's location on the map

school_map = folium.Map(location = [37.55, 126.98],  
                       zoom_start = 12)

for name, lat, lng in zip(df.학교명, df.위도, df.경도):
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        color = 'coral',
        fill = True, 
        fill_opacity = .5,
        popup = name
    ).add_to(school_map)

school_map
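
The marker loop above draws every school in `'coral'`, so the labels stored in `cluster_id` are not visible on the map. One possible way to show them is to pick the marker color from the cluster label. The sketch below assumes a hypothetical `CLUSTER_COLORS` palette and `cluster_color` helper, neither of which is part of the notebook.

```python
# Hypothetical palette: one CSS color name per cluster label.
CLUSTER_COLORS = {0: 'coral', 1: 'royalblue', 2: 'seagreen'}

def cluster_color(cluster_id, default='gray'):
    """Return the marker color for a cluster label, or `default` if unknown."""
    return CLUSTER_COLORS.get(int(cluster_id), default)

# In the marker loop this would replace the fixed color, for example:
#   for name, lat, lng, cid in zip(df.학교명, df.위도, df.경도, df.cluster_id):
#       folium.CircleMarker([lat, lng], radius=5, color=cluster_color(cid),
#                           fill=True, fill_opacity=.5, popup=name).add_to(school_map)
print([cluster_color(c) for c in [0, 1, 2, 7]])
# → ['coral', 'royalblue', 'seagreen', 'gray']
```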