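# Classifying top and bottom business categories with RFM

1. First separate growing from declining industries, then focus the analysis on the declining ones.
- Restrict the growing and declining industries to the top 15
- Classify industries with an RFM index
- r: the quarter (2016 Q1 = 1, ..., 2021 Q2 = 22), f: number of sales transactions, m: sales amount
    - That is, RFM is an industry classification index that takes higher values for more recent data
    - r acts as the weight
    - This is what separates growing from declining industries

- R: change per quarter (captures the most recent figures); scored by year and quarter so that 2021 Q2 gets the highest value
- F: number of transactions
- M: sales amount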

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from matplotlib import font_manager, rc
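font_path=r'C:\Windows\Fonts\malgun.ttf' # raw string so the backslashes are not read as escape sequences
font=font_manager.FontProperties(fname=font_path).get_name() # look up the font family name
rc('font',family=font) # apply the font so Korean labels render in plots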

In [1]:
s2021=pd.read_csv('../data/서울시2021.csv',encoding='utf-8')
s2020=pd.read_csv('../data/서울시2020.csv',encoding='utf-8')
s2019=pd.read_csv('../data/서울시2019.csv',encoding='utf-8')
s2018=pd.read_csv('../data/서울시2018.csv',encoding='utf-8')
s2017=pd.read_csv('../data/서울시2017.csv',encoding='utf-8')
s2016=pd.read_csv('../data/서울시2016.csv',encoding='utf-8')

In [4]:
pd.concat([s2016, s2017, s2018, s2019, s2020, s2021], axis=0).to_csv('../data/서울시_merged.csv')

In [51]:
s2021.head(2)

Unnamed: 0,기준_년_코드,기준_분기_코드,상권_구분_코드,상권_구분_코드_명,상권_코드,상권_코드_명,서비스_업종_코드,서비스_업종_코드_명,분기당_매출_금액,분기당_매출_건수,...,시간대_건수~24_매출_건수,남성_매출_건수,여성_매출_건수,연령대_10_매출_건수,연령대_20_매출_건수,연령대_30_매출_건수,연령대_40_매출_건수,연령대_50_매출_건수,연령대_60_이상_매출_건수,점포수
0,2021,2,U,관광특구,1001496,강남 마이스 관광특구,CS300043,전자상거래업,5836078.0,92,...,0,14,74,0,27,42,5,5,9,8
1,2021,2,U,관광특구,1001496,강남 마이스 관광특구,CS300035,인테리어,86120359.0,739,...,0,370,0,0,0,0,370,0,0,4


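## Starting with the 2021 data only

### Classifying sales amount, starting with the M indicator

- Combine the sales amount columns into the M indicator

- sales amount / (number of transactions * number of stores)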
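As a quick check of the formula, the sketch below recomputes M for a single row. The figures come from the first 2021 row shown earlier (전자상거래업, 강남 마이스 관광특구); the helper name `m_indicator` is made up for illustration, and the `log1p` step mirrors the transform applied in the next cell.

```python
import numpy as np

def m_indicator(amount, count, stores):
    """Hypothetical helper: log1p of sales amount per transaction per store."""
    return np.log1p(amount / (count * stores))

# First 2021 row: amount 5,836,078 with 92 transactions across 8 stores
m = m_indicator(5836078.0, 92, 8)
print(round(float(m), 6))
```

The printed value should agree with the M지표 reported for that row in the output of the next cell.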

In [5]:
s2021['M지표']=s2021['분기당_매출_금액']/(s2021['분기당_매출_건수']*s2021['점포수'])
s2021['M지표']=np.log1p(s2021['M지표']) # values span several orders of magnitude, so apply log1p
s2021.head(2)

Unnamed: 0,기준_년_코드,기준_분기_코드,상권_구분_코드,상권_구분_코드_명,상권_코드,상권_코드_명,서비스_업종_코드,서비스_업종_코드_명,분기당_매출_금액,분기당_매출_건수,...,남성_매출_건수,여성_매출_건수,연령대_10_매출_건수,연령대_20_매출_건수,연령대_30_매출_건수,연령대_40_매출_건수,연령대_50_매출_건수,연령대_60_이상_매출_건수,점포수,M지표
0,2021,2,U,관광특구,1001496,강남 마이스 관광특구,CS300043,전자상거래업,5836078.0,92,...,14,74,0,27,42,5,5,9,8,8.978466
1,2021,2,U,관광특구,1001496,강남 마이스 관광특구,CS300035,인테리어,86120359.0,739,...,370,0,0,0,0,370,0,0,4,10.279698


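### Computing the R score

- E.g. 2021 Q2 gets 6.25 points
- Scores start at 1.0 for 2016 Q1 and rise by 0.25 per quarter (1.0, 1.25, ...), which is what puts 2021 Q2 at 6.25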
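The scoring rule can be written as a small function. This is a sketch rather than code from the notebook: `r_score` and the base year 2015 are assumptions inferred from the values assigned in the cells below (2021 Q1 = 6.0, 2021 Q2 = 6.25).

```python
import pandas as pd

BASE_YEAR = 2015  # assumption: chosen so that 2021 Q2 maps to 6.25

def r_score(year, quarter):
    """Hypothetical helper: year offset plus 0.25 per elapsed quarter."""
    return (year - BASE_YEAR) + 0.25 * (quarter - 1)

# Works on scalars and, element-wise, on pandas columns.
df = pd.DataFrame({'기준_년_코드': [2017, 2020, 2021, 2021],
                   '기준_분기_코드': [1, 4, 1, 2]})
df['R지표'] = r_score(df['기준_년_코드'], df['기준_분기_코드'])
print(df)
```

Applied to a whole year's frame, this would stand in for the per-quarter slicing and concatenation, with the difference that rows keep their original order instead of being regrouped by quarter.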

In [7]:
a=s2021[s2021['기준_분기_코드']==1]
a.head(2)

Unnamed: 0,기준_년_코드,기준_분기_코드,상권_구분_코드,상권_구분_코드_명,상권_코드,상권_코드_명,서비스_업종_코드,서비스_업종_코드_명,분기당_매출_금액,분기당_매출_건수,...,남성_매출_건수,여성_매출_건수,연령대_10_매출_건수,연령대_20_매출_건수,연령대_30_매출_건수,연령대_40_매출_건수,연령대_50_매출_건수,연령대_60_이상_매출_건수,점포수,M지표
32378,2021,1,U,관광특구,1001496,강남 마이스 관광특구,CS300043,전자상거래업,524674.0,5,...,0,5,0,0,5,0,0,0,8,9.481729
32379,2021,1,U,관광특구,1001496,강남 마이스 관광특구,CS300032,가전제품,2403139000.0,8727,...,3692,4386,120,2034,2064,1728,1333,797,8,10.446452


In [8]:
a=a.copy() # work on an explicit copy so the assignment does not trigger SettingWithCopyWarning
a['R지표']=6.0 # 2021 Q1
a['R지표']

32378    6.0
32379    6.0
32380    6.0
32381    6.0
32382    6.0
        ... 
64733    6.0
64734    6.0
64735    6.0
64736    6.0
64737    6.0
Name: R지표, Length: 32360, dtype: float64

In [9]:
b=s2021[s2021['기준_분기_코드']==2].copy() # explicit copy avoids SettingWithCopyWarning
b['R지표']=6.25 # 2021 Q2


In [57]:
b

Unnamed: 0,기준_년_코드,기준_분기_코드,상권_구분_코드,상권_구분_코드_명,상권_코드,상권_코드_명,서비스_업종_코드,서비스_업종_코드_명,분기당_매출_금액,분기당_매출_건수,...,여성_매출_건수,연령대_10_매출_건수,연령대_20_매출_건수,연령대_30_매출_건수,연령대_40_매출_건수,연령대_50_매출_건수,연령대_60_이상_매출_건수,점포수,M지표,R지표
0,2021,2,U,관광특구,1001496,강남 마이스 관광특구,CS300043,전자상거래업,5.836078e+06,92,...,74,0,27,42,5,5,9,8,8.978466,6.25
1,2021,2,U,관광특구,1001496,강남 마이스 관광특구,CS300035,인테리어,8.612036e+07,739,...,0,0,0,0,370,0,0,4,10.279698,6.25
2,2021,2,U,관광특구,1001496,강남 마이스 관광특구,CS300032,가전제품,1.829845e+09,8110,...,3729,84,1750,2096,1986,882,943,8,10.247238,6.25
3,2021,2,U,관광특구,1001496,강남 마이스 관광특구,CS300031,가구,1.101960e+10,5852,...,3433,0,97,544,1274,1655,2282,6,12.656646,6.25
4,2021,2,U,관광특구,1001496,강남 마이스 관광특구,CS300028,화초,1.026839e+09,22212,...,12018,36,6420,7344,3521,2384,1311,6,8.949734,6.25
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
32373,2021,2,A,골목상권,1000001,계동길,CS100005,제과점,1.829857e+08,20731,...,11180,321,2349,5677,4307,3077,3363,3,7.987261,6.25
32374,2021,2,A,골목상권,1000001,계동길,CS100004,양식음식점,2.934674e+08,6381,...,3418,38,2204,1558,814,829,327,4,9.349990,6.25
32375,2021,2,A,골목상권,1000001,계동길,CS100003,일식음식점,9.165669e+07,2260,...,1050,0,222,451,389,364,315,2,9.917343,6.25
32376,2021,2,A,골목상권,1000001,계동길,CS100002,중식음식점,1.317797e+07,768,...,363,112,155,165,149,91,43,1,9.750326,6.25


In [10]:
s2021=pd.concat([b,a])
s2021

Unnamed: 0,기준_년_코드,기준_분기_코드,상권_구분_코드,상권_구분_코드_명,상권_코드,상권_코드_명,서비스_업종_코드,서비스_업종_코드_명,분기당_매출_금액,분기당_매출_건수,...,여성_매출_건수,연령대_10_매출_건수,연령대_20_매출_건수,연령대_30_매출_건수,연령대_40_매출_건수,연령대_50_매출_건수,연령대_60_이상_매출_건수,점포수,M지표,R지표
0,2021,2,U,관광특구,1001496,강남 마이스 관광특구,CS300043,전자상거래업,5.836078e+06,92,...,74,0,27,42,5,5,9,8,8.978466,6.25
1,2021,2,U,관광특구,1001496,강남 마이스 관광특구,CS300035,인테리어,8.612036e+07,739,...,0,0,0,0,370,0,0,4,10.279698,6.25
2,2021,2,U,관광특구,1001496,강남 마이스 관광특구,CS300032,가전제품,1.829845e+09,8110,...,3729,84,1750,2096,1986,882,943,8,10.247238,6.25
3,2021,2,U,관광특구,1001496,강남 마이스 관광특구,CS300031,가구,1.101960e+10,5852,...,3433,0,97,544,1274,1655,2282,6,12.656646,6.25
4,2021,2,U,관광특구,1001496,강남 마이스 관광특구,CS300028,화초,1.026839e+09,22212,...,12018,36,6420,7344,3521,2384,1311,6,8.949734,6.25
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
64733,2021,1,A,골목상권,1000001,계동길,CS100005,제과점,1.656883e+08,18512,...,9187,208,2471,4577,3857,2990,2609,4,7.713597,6.00
64734,2021,1,A,골목상권,1000001,계동길,CS100004,양식음식점,1.994485e+08,4442,...,2175,31,1505,1155,439,566,256,3,9.613661,6.00
64735,2021,1,A,골목상권,1000001,계동길,CS100003,일식음식점,6.032547e+07,1563,...,635,0,185,339,242,279,112,2,9.867807,6.00
64736,2021,1,A,골목상권,1000001,계동길,CS100002,중식음식점,3.979185e+07,2042,...,1041,243,629,307,322,269,97,1,9.877539,6.00


### Computing the F indicator

In [11]:
s2021['분기당_매출_건수']

0           92
1          739
2         8110
3         5852
4        22212
         ...  
64733    18512
64734     4442
64735     1563
64736     2042
64737    13820
Name: 분기당_매출_건수, Length: 64738, dtype: int64

In [12]:
s2021['F지표']=s2021['분기당_매출_건수']/s2021['점포수']

In [13]:
s2021['F지표']=np.log1p(s2021['F지표'])
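For a sense of scale, the sketch below evaluates the F indicator for the first 2021 row (92 transactions across 8 stores). `f_indicator` is an illustrative name, not a function from the notebook.

```python
import numpy as np

def f_indicator(count, stores):
    """Hypothetical helper: log1p of transactions per store."""
    return np.log1p(count / stores)

# 92 / 8 = 11.5 transactions per store, so F = ln(12.5)
f = f_indicator(92, 8)
print(round(float(f), 6))
```

The result should agree with the F지표 value listed for that row in the `rfm2021` table further down.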

In [32]:
s2021.columns

Index(['기준_년_코드', '기준_분기_코드', '상권_구분_코드', '상권_구분_코드_명', '상권_코드', '상권_코드_명',
       '서비스_업종_코드', '서비스_업종_코드_명', '분기당_매출_금액', '분기당_매출_건수', '주중_매출_비율',
       '주말_매출_비율', '월요일_매출_비율', '화요일_매출_비율', '수요일_매출_비율', '목요일_매출_비율',
       '금요일_매출_비율', '토요일_매출_비율', '일요일_매출_비율', '시간대_00~06_매출_비율',
       '시간대_06~11_매출_비율', '시간대_11~14_매출_비율', '시간대_14~17_매출_비율',
       '시간대_17~21_매출_비율', '시간대_21~24_매출_비율', '남성_매출_비율', '여성_매출_비율',
       '연령대_10_매출_비율', '연령대_20_매출_비율', '연령대_30_매출_비율', '연령대_40_매출_비율',
       '연령대_50_매출_비율', '연령대_60_이상_매출_비율', '주중_매출_금액', '주말_매출_금액', '월요일_매출_금액',
       '화요일_매출_금액', '수요일_매출_금액', '목요일_매출_금액', '금요일_매출_금액', '토요일_매출_금액',
       '일요일_매출_금액', '시간대_00~06_매출_금액', '시간대_06~11_매출_금액', '시간대_11~14_매출_금액',
       '시간대_14~17_매출_금액', '시간대_17~21_매출_금액', '시간대_21~24_매출_금액', '남성_매출_금액',
       '여성_매출_금액', '연령대_10_매출_금액', '연령대_20_매출_금액', '연령대_30_매출_금액',
       '연령대_40_매출_금액', '연령대_50_매출_금액', '연령대_60_이상_매출_금액', '주중_매출_건수',
       '주말_매출_건수', '월요일_매출_건수', '화요일_매출_건수', '수요일_매출_건수', '목요일_매출_건수',

In [35]:
rfm2021=s2021[['기준_년_코드', '기준_분기_코드', '남성_매출_비율', '여성_매출_비율', '연령대_10_매출_비율', 
               '연령대_20_매출_비율', '연령대_30_매출_비율', '연령대_40_매출_비율',
               '연령대_50_매출_비율', '연령대_60_이상_매출_비율', '월요일_매출_비율', '화요일_매출_비율', '수요일_매출_비율', '목요일_매출_비율',
               '금요일_매출_비율', '토요일_매출_비율', '일요일_매출_비율',
               '서비스_업종_코드_명','R지표','F지표','M지표', '분기당_매출_금액', '상권_구분_코드_명']]
rfm2021

Unnamed: 0,기준_년_코드,기준_분기_코드,남성_매출_비율,여성_매출_비율,연령대_10_매출_비율,연령대_20_매출_비율,연령대_30_매출_비율,연령대_40_매출_비율,연령대_50_매출_비율,연령대_60_이상_매출_비율,...,목요일_매출_비율,금요일_매출_비율,토요일_매출_비율,일요일_매출_비율,서비스_업종_코드_명,R지표,F지표,M지표,분기당_매출_금액,상권_구분_코드_명
0,2021,2,29,71,0,18,68,6,4,4,...,0,15,24,45,전자상거래업,6.25,2.525729,8.978466,5.836078e+06,관광특구
1,2021,2,100,0,0,0,0,100,0,0,...,0,86,0,0,인테리어,6.25,5.224402,10.279698,8.612036e+07,관광특구
2,2021,2,54,47,0,18,26,25,23,7,...,21,13,19,22,가전제품,6.25,6.922398,10.247238,1.829845e+09,관광특구
3,2021,2,43,57,0,2,10,21,33,34,...,13,10,20,22,가구,6.25,6.883804,12.656646,1.101960e+10,관광특구
4,2021,2,48,53,0,23,31,20,13,13,...,9,31,24,9,화초,6.25,8.216899,8.949734,1.026839e+09,관광특구
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
64733,2021,1,37,45,1,11,19,18,16,17,...,14,14,13,12,제과점,6.00,8.440096,7.713597,1.656883e+08,골목상권
64734,2021,1,40,43,0,28,25,9,15,6,...,10,14,25,21,양식음식점,6.00,7.300923,9.613661,1.994485e+08,골목상권
64735,2021,1,30,36,0,8,19,16,16,7,...,9,18,16,0,일식음식점,6.00,6.662494,9.867807,6.032547e+07,골목상권
64736,2021,1,39,49,10,33,15,13,12,5,...,14,12,14,18,중식음식점,6.00,7.622175,9.877539,3.979185e+07,골목상권


In [36]:
rfm2021=rfm2021.round({'F지표':1,'M지표':1}) # round F and M to one decimal; returns a new frame, so no SettingWithCopyWarning
rfm2021.head()


Unnamed: 0,기준_년_코드,기준_분기_코드,남성_매출_비율,여성_매출_비율,연령대_10_매출_비율,연령대_20_매출_비율,연령대_30_매출_비율,연령대_40_매출_비율,연령대_50_매출_비율,연령대_60_이상_매출_비율,...,목요일_매출_비율,금요일_매출_비율,토요일_매출_비율,일요일_매출_비율,서비스_업종_코드_명,R지표,F지표,M지표,분기당_매출_금액,상권_구분_코드_명
0,2021,2,29,71,0,18,68,6,4,4,...,0,15,24,45,전자상거래업,6.25,2.5,9.0,5836078.0,관광특구
1,2021,2,100,0,0,0,0,100,0,0,...,0,86,0,0,인테리어,6.25,5.2,10.3,86120360.0,관광특구
2,2021,2,54,47,0,18,26,25,23,7,...,21,13,19,22,가전제품,6.25,6.9,10.2,1829845000.0,관광특구
3,2021,2,43,57,0,2,10,21,33,34,...,13,10,20,22,가구,6.25,6.9,12.7,11019600000.0,관광특구
4,2021,2,48,53,0,23,31,20,13,13,...,9,31,24,9,화초,6.25,8.2,8.9,1026839000.0,관광특구


In [37]:
# rfm2021.loc[:,'서비스_업종_코드_명'] = rfm2021.loc[:,'서비스_업종_코드_명'].astype('category').cat.codes

In [38]:
rfm2021

Unnamed: 0,기준_년_코드,기준_분기_코드,남성_매출_비율,여성_매출_비율,연령대_10_매출_비율,연령대_20_매출_비율,연령대_30_매출_비율,연령대_40_매출_비율,연령대_50_매출_비율,연령대_60_이상_매출_비율,...,목요일_매출_비율,금요일_매출_비율,토요일_매출_비율,일요일_매출_비율,서비스_업종_코드_명,R지표,F지표,M지표,분기당_매출_금액,상권_구분_코드_명
0,2021,2,29,71,0,18,68,6,4,4,...,0,15,24,45,전자상거래업,6.25,2.5,9.0,5.836078e+06,관광특구
1,2021,2,100,0,0,0,0,100,0,0,...,0,86,0,0,인테리어,6.25,5.2,10.3,8.612036e+07,관광특구
2,2021,2,54,47,0,18,26,25,23,7,...,21,13,19,22,가전제품,6.25,6.9,10.2,1.829845e+09,관광특구
3,2021,2,43,57,0,2,10,21,33,34,...,13,10,20,22,가구,6.25,6.9,12.7,1.101960e+10,관광특구
4,2021,2,48,53,0,23,31,20,13,13,...,9,31,24,9,화초,6.25,8.2,8.9,1.026839e+09,관광특구
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
64733,2021,1,37,45,1,11,19,18,16,17,...,14,14,13,12,제과점,6.00,8.4,7.7,1.656883e+08,골목상권
64734,2021,1,40,43,0,28,25,9,15,6,...,10,14,25,21,양식음식점,6.00,7.3,9.6,1.994485e+08,골목상권
64735,2021,1,30,36,0,8,19,16,16,7,...,9,18,16,0,일식음식점,6.00,6.7,9.9,6.032547e+07,골목상권
64736,2021,1,39,49,10,33,15,13,12,5,...,14,12,14,18,중식음식점,6.00,7.6,9.9,3.979185e+07,골목상권


In [39]:
# np.isposinf(rfm2021)
rfm2021=rfm2021.replace([np.inf,-np.inf],np.nan)
rfm2021

Unnamed: 0,기준_년_코드,기준_분기_코드,남성_매출_비율,여성_매출_비율,연령대_10_매출_비율,연령대_20_매출_비율,연령대_30_매출_비율,연령대_40_매출_비율,연령대_50_매출_비율,연령대_60_이상_매출_비율,...,목요일_매출_비율,금요일_매출_비율,토요일_매출_비율,일요일_매출_비율,서비스_업종_코드_명,R지표,F지표,M지표,분기당_매출_금액,상권_구분_코드_명
0,2021,2,29,71,0,18,68,6,4,4,...,0,15,24,45,전자상거래업,6.25,2.5,9.0,5.836078e+06,관광특구
1,2021,2,100,0,0,0,0,100,0,0,...,0,86,0,0,인테리어,6.25,5.2,10.3,8.612036e+07,관광특구
2,2021,2,54,47,0,18,26,25,23,7,...,21,13,19,22,가전제품,6.25,6.9,10.2,1.829845e+09,관광특구
3,2021,2,43,57,0,2,10,21,33,34,...,13,10,20,22,가구,6.25,6.9,12.7,1.101960e+10,관광특구
4,2021,2,48,53,0,23,31,20,13,13,...,9,31,24,9,화초,6.25,8.2,8.9,1.026839e+09,관광특구
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
64733,2021,1,37,45,1,11,19,18,16,17,...,14,14,13,12,제과점,6.00,8.4,7.7,1.656883e+08,골목상권
64734,2021,1,40,43,0,28,25,9,15,6,...,10,14,25,21,양식음식점,6.00,7.3,9.6,1.994485e+08,골목상권
64735,2021,1,30,36,0,8,19,16,16,7,...,9,18,16,0,일식음식점,6.00,6.7,9.9,6.032547e+07,골목상권
64736,2021,1,39,49,10,33,15,13,12,5,...,14,12,14,18,중식음식점,6.00,7.6,9.9,3.979185e+07,골목상권


In [40]:
rfm2021.isnull().sum()

기준_년_코드              0
기준_분기_코드             0
남성_매출_비율             0
여성_매출_비율             0
연령대_10_매출_비율         0
연령대_20_매출_비율         0
연령대_30_매출_비율         0
연령대_40_매출_비율         0
연령대_50_매출_비율         0
연령대_60_이상_매출_비율      0
월요일_매출_비율            0
화요일_매출_비율            0
수요일_매출_비율            0
목요일_매출_비율            0
금요일_매출_비율            0
토요일_매출_비율            0
일요일_매출_비율            0
서비스_업종_코드_명          0
R지표                  0
F지표                412
M지표                412
분기당_매출_금액            0
상권_구분_코드_명           0
dtype: int64
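The identical counts for F지표 and M지표 suggest a shared cause. A likely one is rows where 점포수 is 0: both indicators divide by it, the division yields `inf`, and `log1p` keeps it as `inf` until the `replace` step turns it into `NaN`. The sketch below reproduces this on two rows borrowed from the outputs in this notebook (the second has 점포수 = 0); it is an illustration, not a diagnosis of all 412 rows.

```python
import numpy as np
import pandas as pd

# Two sample rows; the second one has zero stores.
df = pd.DataFrame({'분기당_매출_금액': [5836078.0, 83495157.0],
                   '분기당_매출_건수': [92, 281],
                   '점포수': [8, 0]})

df['F지표'] = np.log1p(df['분기당_매출_건수'] / df['점포수'])
df['M지표'] = np.log1p(df['분기당_매출_금액'] / (df['분기당_매출_건수'] * df['점포수']))
print(df[['F지표', 'M지표']])  # second row is inf in both columns

# Same cleanup as the notebook: inf -> NaN, then drop incomplete rows.
clean = df.replace([np.inf, -np.inf], np.nan).dropna(axis=0)
print(len(clean))  # only the row with a nonzero store count remains
```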

In [41]:
rfm2021=rfm2021.dropna(axis=0)
rfm2021

Unnamed: 0,기준_년_코드,기준_분기_코드,남성_매출_비율,여성_매출_비율,연령대_10_매출_비율,연령대_20_매출_비율,연령대_30_매출_비율,연령대_40_매출_비율,연령대_50_매출_비율,연령대_60_이상_매출_비율,...,목요일_매출_비율,금요일_매출_비율,토요일_매출_비율,일요일_매출_비율,서비스_업종_코드_명,R지표,F지표,M지표,분기당_매출_금액,상권_구분_코드_명
0,2021,2,29,71,0,18,68,6,4,4,...,0,15,24,45,전자상거래업,6.25,2.5,9.0,5.836078e+06,관광특구
1,2021,2,100,0,0,0,0,100,0,0,...,0,86,0,0,인테리어,6.25,5.2,10.3,8.612036e+07,관광특구
2,2021,2,54,47,0,18,26,25,23,7,...,21,13,19,22,가전제품,6.25,6.9,10.2,1.829845e+09,관광특구
3,2021,2,43,57,0,2,10,21,33,34,...,13,10,20,22,가구,6.25,6.9,12.7,1.101960e+10,관광특구
4,2021,2,48,53,0,23,31,20,13,13,...,9,31,24,9,화초,6.25,8.2,8.9,1.026839e+09,관광특구
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
64733,2021,1,37,45,1,11,19,18,16,17,...,14,14,13,12,제과점,6.00,8.4,7.7,1.656883e+08,골목상권
64734,2021,1,40,43,0,28,25,9,15,6,...,10,14,25,21,양식음식점,6.00,7.3,9.6,1.994485e+08,골목상권
64735,2021,1,30,36,0,8,19,16,16,7,...,9,18,16,0,일식음식점,6.00,6.7,9.9,6.032547e+07,골목상권
64736,2021,1,39,49,10,33,15,13,12,5,...,14,12,14,18,중식음식점,6.00,7.6,9.9,3.979185e+07,골목상권


In [42]:
rfm2021.isnull().sum()

기준_년_코드            0
기준_분기_코드           0
남성_매출_비율           0
여성_매출_비율           0
연령대_10_매출_비율       0
연령대_20_매출_비율       0
연령대_30_매출_비율       0
연령대_40_매출_비율       0
연령대_50_매출_비율       0
연령대_60_이상_매출_비율    0
월요일_매출_비율          0
화요일_매출_비율          0
수요일_매출_비율          0
목요일_매출_비율          0
금요일_매출_비율          0
토요일_매출_비율          0
일요일_매출_비율          0
서비스_업종_코드_명        0
R지표                0
F지표                0
M지표                0
분기당_매출_금액          0
상권_구분_코드_명         0
dtype: int64

In [44]:
#rfm2021.to_csv('../data/rfm2021.csv')

In [70]:
# from sklearn.preprocessing import StandardScaler
# scaler=StandardScaler()
# scaler.fit(rfm2021)

In [71]:
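rfma2021=rfm2021.groupby(['서비스_업종_코드_명','R지표'])[['F지표','M지표']].mean() # mean F and M per category and quarter; selecting the columns keeps text columns out of mean()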

In [72]:
rfma2021

Unnamed: 0_level_0,Unnamed: 1_level_0,F지표,M지표
서비스_업종_코드_명,R지표,Unnamed: 2_level_1,Unnamed: 3_level_1
PC방,6.00,9.209704,7.857143
PC방,6.25,9.467887,7.882254
가구,6.00,4.634409,11.631720
가구,6.25,4.618182,11.750802
가방,6.00,4.945192,9.306731
...,...,...,...
호프-간이주점,6.25,6.142598,8.872207
화장품,6.00,4.945045,9.476577
화장품,6.25,5.008258,9.401806
화초,6.00,5.538254,9.384200


In [73]:
rfm2021.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 64326 entries, 0 to 64737
Data columns (total 4 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   서비스_업종_코드_명  64326 non-null  object 
 1   R지표          64326 non-null  float64
 2   F지표          64326 non-null  float64
 3   M지표          64326 non-null  float64
dtypes: float64(3), object(1)
memory usage: 2.5+ MB


In [74]:
rfm2021['서비스_업종_코드_명'].nunique()

63

## 2020

In [75]:
s2020.head(1)

Unnamed: 0,기준_년_코드,기준_분기_코드,상권_구분_코드,상권_구분_코드_명,상권_코드,상권_코드_명,서비스_업종_코드,서비스_업종_코드_명,분기당_매출_금액,분기당_매출_건수,...,시간대_건수~24_매출_건수,남성_매출_건수,여성_매출_건수,연령대_10_매출_건수,연령대_20_매출_건수,연령대_30_매출_건수,연령대_40_매출_건수,연령대_50_매출_건수,연령대_60_이상_매출_건수,점포수
0,2020,3,R,전통시장,1001370,역촌중앙시장,CS300009,청과상,33850673.0,300,...,0,100,200,0,0,0,0,100,200,2


In [76]:
s2020['M지표']=s2020['분기당_매출_금액']/(s2020['분기당_매출_건수']*s2020['점포수'])
s2020['M지표']=np.log1p(s2020['M지표']) # values span several orders of magnitude, so apply log1p
s2020.head()

Unnamed: 0,기준_년_코드,기준_분기_코드,상권_구분_코드,상권_구분_코드_명,상권_코드,상권_코드_명,서비스_업종_코드,서비스_업종_코드_명,분기당_매출_금액,분기당_매출_건수,...,남성_매출_건수,여성_매출_건수,연령대_10_매출_건수,연령대_20_매출_건수,연령대_30_매출_건수,연령대_40_매출_건수,연령대_50_매출_건수,연령대_60_이상_매출_건수,점포수,M지표
0,2020,3,R,전통시장,1001370,역촌중앙시장,CS300009,청과상,33850673.0,300,...,100,200,0,0,0,0,100,200,2,10.940558
1,2020,3,R,전통시장,1001386,신수시장,CS200001,일반교습학원,35211411.0,110,...,0,110,0,0,92,18,0,0,1,12.676404
2,2020,1,A,골목상권,1000022,창신2길,CS200001,일반교습학원,22367209.0,82,...,54,28,0,0,0,82,0,0,2,11.823247
3,2020,3,R,전통시장,1001402,신월6동골목시장,CS100009,호프-간이주점,985883.0,6,...,6,0,0,0,6,0,0,0,3,10.910939
4,2020,1,A,골목상권,1000001,계동길,CS100001,한식음식점,431442455.0,18548,...,8752,7776,611,3828,4356,3428,2614,1690,19,7.110905


In [77]:
a=s2020[s2020['기준_분기_코드']==1].copy() # explicit copy avoids SettingWithCopyWarning
a['R지표']=5.0 # 2020 Q1


In [78]:
a['R지표']

2         5.0
4         5.0
8         5.0
9         5.0
10        5.0
         ... 
132173    5.0
132174    5.0
132175    5.0
132176    5.0
132177    5.0
Name: R지표, Length: 33242, dtype: float64

In [79]:
# .copy() makes each quarter an independent frame, so the assignments do not trigger SettingWithCopyWarning
b=s2020[s2020['기준_분기_코드']==2].copy()
b['R지표']=5.25
c=s2020[s2020['기준_분기_코드']==3].copy()
c['R지표']=5.5
d=s2020[s2020['기준_분기_코드']==4].copy()
d['R지표']=5.75


In [80]:
b

Unnamed: 0,기준_년_코드,기준_분기_코드,상권_구분_코드,상권_구분_코드_명,상권_코드,상권_코드_명,서비스_업종_코드,서비스_업종_코드_명,분기당_매출_금액,분기당_매출_건수,...,여성_매출_건수,연령대_10_매출_건수,연령대_20_매출_건수,연령대_30_매출_건수,연령대_40_매출_건수,연령대_50_매출_건수,연령대_60_이상_매출_건수,점포수,M지표,R지표
30,2020,2,A,골목상권,1000183,무학로45길,CS300001,슈퍼마켓,109762020.0,11073,...,2207,0,2991,2534,1714,1866,1968,3,8.103250,5.25
34,2020,2,A,골목상권,1000183,무학로45길,CS300010,반찬가게,84995818.0,4332,...,1083,0,0,722,361,3249,0,7,7.938775,5.25
41,2020,2,A,골목상권,1000552,가로공원로76가길,CS200034,여관,7277329.0,202,...,13,0,38,50,0,114,0,1,10.492035,5.25
47,2020,2,A,골목상권,1000583,남부순환로11길,CS200032,가전제품수리,73915417.0,256,...,194,0,0,0,0,0,256,1,12.573258,5.25
63,2020,2,A,골목상권,1000183,무학로45길,CS300018,의약품,187236433.0,26363,...,9660,83,3999,4124,4537,6411,7209,9,6.672208,5.25
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
132316,2020,2,R,전통시장,1001477,마천시장,CS100008,분식전문점,28919831.0,2653,...,971,0,0,388,323,711,842,8,7.217884,5.25
132317,2020,2,R,전통시장,1001487,천호시장,CS200028,미용실,59125109.0,78,...,0,0,0,0,0,0,78,1,13.538459,5.25
132318,2020,2,R,전통시장,1001488,성내골목시장,CS200030,피부관리실,6928058.0,32,...,32,0,0,19,6,0,7,1,12.285359,5.25
132319,2020,2,U,관광특구,1001496,강남 마이스 관광특구,CS200030,피부관리실,17953355.0,95,...,10,0,0,10,38,19,10,3,11.050814,5.25


In [81]:
aa=pd.concat([a,b])
bb=pd.concat([c,d])
s2020=pd.concat([aa,bb])
s2020.head()

Unnamed: 0,기준_년_코드,기준_분기_코드,상권_구분_코드,상권_구분_코드_명,상권_코드,상권_코드_명,서비스_업종_코드,서비스_업종_코드_명,분기당_매출_금액,분기당_매출_건수,...,여성_매출_건수,연령대_10_매출_건수,연령대_20_매출_건수,연령대_30_매출_건수,연령대_40_매출_건수,연령대_50_매출_건수,연령대_60_이상_매출_건수,점포수,M지표,R지표
2,2020,1,A,골목상권,1000022,창신2길,CS200001,일반교습학원,22367209.0,82,...,28,0,0,0,82,0,0,2,11.823247,5.0
4,2020,1,A,골목상권,1000001,계동길,CS100001,한식음식점,431442455.0,18548,...,7776,611,3828,4356,3428,2614,1690,19,7.110905,5.0
8,2020,1,A,골목상권,1000001,계동길,CS100002,중식음식점,25767161.0,1413,...,694,145,368,240,368,96,69,1,9.811196,5.0
9,2020,1,A,골목상권,1000001,계동길,CS100003,일식음식점,55484866.0,1272,...,508,0,134,245,275,314,162,3,9.584732,5.0
10,2020,1,A,골목상권,1000001,계동길,CS300015,가방,9609935.0,23,...,0,0,0,0,0,23,0,3,11.844209,5.0


In [82]:
s2020['F지표']=s2020['분기당_매출_건수']/s2020['점포수']
s2020['F지표']=np.log1p(s2020['F지표'])
rfm2020=s2020[['서비스_업종_코드_명','R지표','F지표','M지표']]
rfm2020

Unnamed: 0,서비스_업종_코드_명,R지표,F지표,M지표
2,일반교습학원,5.00,3.737670,11.823247
4,한식음식점,5.00,6.884702,7.110905
8,중식음식점,5.00,7.254178,9.811196
9,일식음식점,5.00,6.052089,9.584732
10,가방,5.00,2.159484,11.844209
...,...,...,...,...
132076,섬유제품,5.75,8.556125,8.554979
132077,화초,5.75,7.398480,8.849474
132078,가구,5.75,7.014065,12.604803
132079,가전제품,5.75,6.916467,10.402224


In [83]:

rfm2020=rfm2020.round({'F지표':1,'M지표':1}) # round F and M to one decimal without writing into a slice
rfm2020.head()


Unnamed: 0,서비스_업종_코드_명,R지표,F지표,M지표
2,일반교습학원,5.0,3.7,11.8
4,한식음식점,5.0,6.9,7.1
8,중식음식점,5.0,7.3,9.8
9,일식음식점,5.0,6.1,9.6
10,가방,5.0,2.2,11.8


In [84]:
rfm2020

Unnamed: 0,서비스_업종_코드_명,R지표,F지표,M지표
2,일반교습학원,5.00,3.7,11.8
4,한식음식점,5.00,6.9,7.1
8,중식음식점,5.00,7.3,9.8
9,일식음식점,5.00,6.1,9.6
10,가방,5.00,2.2,11.8
...,...,...,...,...
132076,섬유제품,5.75,8.6,8.6
132077,화초,5.75,7.4,8.8
132078,가구,5.75,7.0,12.6
132079,가전제품,5.75,6.9,10.4


In [85]:
rfm2020=rfm2020.replace([np.inf,-np.inf],np.nan)
rfm2020

Unnamed: 0,서비스_업종_코드_명,R지표,F지표,M지표
2,일반교습학원,5.00,3.7,11.8
4,한식음식점,5.00,6.9,7.1
8,중식음식점,5.00,7.3,9.8
9,일식음식점,5.00,6.1,9.6
10,가방,5.00,2.2,11.8
...,...,...,...,...
132076,섬유제품,5.75,8.6,8.6
132077,화초,5.75,7.4,8.8
132078,가구,5.75,7.0,12.6
132079,가전제품,5.75,6.9,10.4


In [86]:
rfm2020.isnull().sum()

서비스_업종_코드_명      0
R지표              0
F지표            834
M지표            834
dtype: int64

In [87]:
rfm2020=rfm2020.dropna(axis=0)
rfm2020

Unnamed: 0,서비스_업종_코드_명,R지표,F지표,M지표
2,일반교습학원,5.00,3.7,11.8
4,한식음식점,5.00,6.9,7.1
8,중식음식점,5.00,7.3,9.8
9,일식음식점,5.00,6.1,9.6
10,가방,5.00,2.2,11.8
...,...,...,...,...
132076,섬유제품,5.75,8.6,8.6
132077,화초,5.75,7.4,8.8
132078,가구,5.75,7.0,12.6
132079,가전제품,5.75,6.9,10.4


In [88]:
rfm2020.isnull().sum()
rfma2020=rfm2020.groupby(['서비스_업종_코드_명','R지표']).mean()
rfma2020

Unnamed: 0_level_0,Unnamed: 1_level_0,F지표,M지표
서비스_업종_코드_명,R지표,Unnamed: 2_level_1,Unnamed: 3_level_1
PC방,5.00,9.720903,7.817815
PC방,5.25,9.481174,7.818337
PC방,5.50,9.165657,7.829040
PC방,5.75,9.264675,7.844416
가구,5.00,4.606500,11.589000
...,...,...,...
화장품,5.75,5.002299,9.353384
화초,5.00,5.482996,9.417611
화초,5.25,6.062948,9.266335
화초,5.50,5.307646,9.383702


## 2019

In [89]:
s2019['M지표']=s2019['분기당_매출_금액']/(s2019['분기당_매출_건수']*s2019['점포수'])
s2019['M지표']=np.log1p(s2019['M지표']) # values span several orders of magnitude, so apply log1p
s2019.head()

Unnamed: 0,기준_년_코드,기준_분기_코드,상권_구분_코드,상권_구분_코드_명,상권_코드,상권_코드_명,서비스_업종_코드,서비스_업종_코드_명,분기당_매출_금액,분기당_매출_건수,...,남성_매출_건수,여성_매출_건수,연령대_10_매출_건수,연령대_20_매출_건수,연령대_30_매출_건수,연령대_40_매출_건수,연령대_50_매출_건수,연령대_60_이상_매출_건수,점포수,M지표
0,2019,1,A,골목상권,1000985,암사길,CS200003,예술학원,20061598.0,125,...,93,32,0,0,32,93,0,0,3,10.887411
1,2019,1,A,골목상권,1000267,돌곶이로8길,CS200037,노래방,510636.0,21,...,4,17,0,0,0,9,4,8,1,10.098931
2,2019,1,R,전통시장,1001445,영도시장,CS300027,섬유제품,4038928.0,33,...,17,0,0,0,17,0,0,0,2,11.021851
3,2019,1,A,골목상권,1000789,난곡로24길,CS300017,시계및귀금속,5085873.0,10,...,5,5,0,0,0,5,0,5,1,13.139394
4,2019,1,A,골목상권,1000930,언주로81길,CS200001,일반교습학원,527027.0,53,...,53,0,0,0,0,0,53,0,3,8.106405


In [90]:
# .copy() makes each quarter an independent frame, so the assignments do not trigger SettingWithCopyWarning
a=s2019[s2019['기준_분기_코드']==1].copy()
b=s2019[s2019['기준_분기_코드']==2].copy()
c=s2019[s2019['기준_분기_코드']==3].copy()
d=s2019[s2019['기준_분기_코드']==4].copy()
a['R지표']=4.0
b['R지표']=4.25
c['R지표']=4.5
d['R지표']=4.75

aa=pd.concat([a,b])
bb=pd.concat([c,d])
s2019=pd.concat([aa,bb])
s2019.head()

Unnamed: 0,기준_년_코드,기준_분기_코드,상권_구분_코드,상권_구분_코드_명,상권_코드,상권_코드_명,서비스_업종_코드,서비스_업종_코드_명,분기당_매출_금액,분기당_매출_건수,...,여성_매출_건수,연령대_10_매출_건수,연령대_20_매출_건수,연령대_30_매출_건수,연령대_40_매출_건수,연령대_50_매출_건수,연령대_60_이상_매출_건수,점포수,M지표,R지표
0,2019,1,A,골목상권,1000985,암사길,CS200003,예술학원,20061598.0,125,...,32,0,0,32,93,0,0,3,10.887411,4.0
1,2019,1,A,골목상권,1000267,돌곶이로8길,CS200037,노래방,510636.0,21,...,17,0,0,0,9,4,8,1,10.098931,4.0
2,2019,1,R,전통시장,1001445,영도시장,CS300027,섬유제품,4038928.0,33,...,0,0,0,17,0,0,0,2,11.021851,4.0
3,2019,1,A,골목상권,1000789,난곡로24길,CS300017,시계및귀금속,5085873.0,10,...,5,0,0,0,5,0,5,1,13.139394,4.0
4,2019,1,A,골목상권,1000930,언주로81길,CS200001,일반교습학원,527027.0,53,...,0,0,0,0,0,53,0,3,8.106405,4.0


In [91]:
s2019['F지표']=s2019['분기당_매출_건수']/s2019['점포수']
s2019['F지표']=np.log1p(s2019['F지표'])
rfm2019=s2019[['서비스_업종_코드_명','R지표','F지표','M지표']]
rfm2019

Unnamed: 0,서비스_업종_코드_명,R지표,F지표,M지표
0,예술학원,4.00,3.753418,10.887411
1,노래방,4.00,3.091042,10.098931
2,섬유제품,4.00,2.862201,11.021851
3,시계및귀금속,4.00,2.397895,13.139394
4,일반교습학원,4.00,2.926739,8.106405
...,...,...,...,...
135573,일반의류,4.75,3.218876,11.910193
135574,양식음식점,4.75,9.389532,9.371874
135575,편의점,4.75,10.619496,8.060752
135576,일반교습학원,4.75,2.335375,10.739880


In [92]:
rfm2019=rfm2019.round({'F지표':1,'M지표':1}) # round F and M to one decimal without writing into a slice
rfm2019.head()


Unnamed: 0,서비스_업종_코드_명,R지표,F지표,M지표
0,예술학원,4.0,3.8,10.9
1,노래방,4.0,3.1,10.1
2,섬유제품,4.0,2.9,11.0
3,시계및귀금속,4.0,2.4,13.1
4,일반교습학원,4.0,2.9,8.1


In [93]:
rfm2019=rfm2019.replace([np.inf,-np.inf],np.nan)
rfm2019.isnull().sum()

서비스_업종_코드_명       0
R지표               0
F지표            1613
M지표            1613
dtype: int64

In [94]:
rfm2019=rfm2019.dropna(axis=0)
rfm2019.isnull().sum()

서비스_업종_코드_명    0
R지표            0
F지표            0
M지표            0
dtype: int64

In [95]:
rfma2019=rfm2019.groupby(['서비스_업종_코드_명','R지표']).mean()
rfma2019

Unnamed: 0_level_0,Unnamed: 1_level_0,F지표,M지표
서비스_업종_코드_명,R지표,Unnamed: 2_level_1,Unnamed: 3_level_1
PC방,4.00,9.635610,7.768537
PC방,4.25,9.482493,7.742175
PC방,4.50,9.779449,7.768672
PC방,4.75,9.729535,7.789535
가구,4.00,4.438265,11.537245
...,...,...,...
화장품,4.75,5.366383,9.347816
화초,4.00,5.567358,9.593396
화초,4.25,5.981641,9.424609
화초,4.50,5.306139,9.505149


## 2018

In [96]:
s2018['M지표']=s2018['분기당_매출_금액']/(s2018['분기당_매출_건수']*s2018['점포수'])
s2018['M지표']=np.log1p(s2018['M지표'])

# .copy() makes each quarter an independent frame, so the assignments do not trigger SettingWithCopyWarning
a=s2018[s2018['기준_분기_코드']==1].copy()
b=s2018[s2018['기준_분기_코드']==2].copy()
c=s2018[s2018['기준_분기_코드']==3].copy()
d=s2018[s2018['기준_분기_코드']==4].copy()
a['R지표']=3.0
b['R지표']=3.25
c['R지표']=3.5
d['R지표']=3.75

aa=pd.concat([a,b])
bb=pd.concat([c,d])
s2018=pd.concat([aa,bb])
s2018.head()

Unnamed: 0,기준_년_코드,기준_분기_코드,상권_구분_코드,상권_구분_코드_명,상권_코드,상권_코드_명,서비스_업종_코드,서비스_업종_코드_명,분기당_매출_금액,분기당_매출_건수,...,여성_매출_건수,연령대_10_매출_건수,연령대_20_매출_건수,연령대_30_매출_건수,연령대_40_매출_건수,연령대_50_매출_건수,연령대_60_이상_매출_건수,점포수,M지표,R지표
0,2018,1,D,발달상권,1001046,서울 관악구 신림역_4,CS200036,고시원,83495157.0,281,...,0,0,0,93,93,0,95,0,inf,3.0
1,2018,1,D,발달상권,1001025,서울 관악구 서울대입구역_1,CS200012,법무사사무소,6571318.0,116,...,39,0,39,77,0,0,0,10,8.642226,3.0
2,2018,1,A,골목상권,1000289,인촌로17가길,CS300021,문구,6044732.0,23,...,0,0,0,0,8,7,8,4,11.092924,3.0
3,2018,1,A,골목상권,1000515,남부순환로70길,CS200030,피부관리실,11863957.0,40,...,36,0,0,0,13,23,0,1,12.600139,3.0
4,2018,1,A,골목상권,1000902,논현로63길,CS200029,네일숍,1766586.0,19,...,15,0,0,11,0,0,5,2,10.746995,3.0


In [97]:
s2018['F지표']=s2018['분기당_매출_건수']/s2018['점포수']
s2018['F지표']=np.log1p(s2018['F지표'])
rfm2018=s2018[['서비스_업종_코드_명','R지표','F지표','M지표']]

rfm2018=rfm2018.round({'F지표':1,'M지표':1}) # round F and M to one decimal without writing into a slice
rfm2018.head()


Unnamed: 0,서비스_업종_코드_명,R지표,F지표,M지표
0,고시원,3.0,inf,inf
1,법무사사무소,3.0,2.5,8.6
2,문구,3.0,1.9,11.1
3,피부관리실,3.0,3.7,12.6
4,네일숍,3.0,2.4,10.7


In [98]:
rfm2018=rfm2018.replace([np.inf,-np.inf],np.nan)
rfm2018.isnull().sum()


서비스_업종_코드_명       0
R지표               0
F지표            4355
M지표            4355
dtype: int64

In [99]:
rfm2018=rfm2018.dropna(axis=0)
rfm2018.isnull().sum()

서비스_업종_코드_명    0
R지표            0
F지표            0
M지표            0
dtype: int64

In [100]:
rfma2018=rfm2018.groupby(['서비스_업종_코드_명','R지표']).mean()
rfma2018

서비스_업종_코드_명,R지표,F지표,M지표
PC방,3.00,9.528899,7.737385
PC방,3.25,9.560135,7.725450
PC방,3.50,9.815420,7.750794
PC방,3.75,9.817079,7.765169
가구,3.00,4.666525,11.450424
...,...,...,...
화초,3.75,5.566135,9.632092
회계사사무소,3.00,4.064000,11.288000
회계사사무소,3.25,4.077778,11.880556
회계사사무소,3.50,4.130769,11.523077


## 2017

In [101]:
s2017['M지표']=s2017['분기당_매출_금액']/(s2017['분기당_매출_건수']*s2017['점포수'])
s2017['M지표']=np.log1p(s2017['M지표']) # values span a wide range, so apply log1p

# .copy() makes each quarter an independent frame, so assigning R지표 raises no SettingWithCopyWarning
a=s2017[s2017['기준_분기_코드']==1].copy()
b=s2017[s2017['기준_분기_코드']==2].copy()
c=s2017[s2017['기준_분기_코드']==3].copy()
d=s2017[s2017['기준_분기_코드']==4].copy()
a['R지표']=2.0
b['R지표']=2.25
c['R지표']=2.5
d['R지표']=2.75

s2017=pd.concat([a,b,c,d]) # quarters stacked in order Q1..Q4
s2017.head(2)

Unnamed: 0,기준_년_코드,기준_분기_코드,상권_구분_코드,상권_구분_코드_명,상권_코드,상권_코드_명,서비스_업종_코드,서비스_업종_코드_명,분기당_매출_금액,분기당_매출_건수,...,여성_매출_건수,연령대_10_매출_건수,연령대_20_매출_건수,연령대_30_매출_건수,연령대_40_매출_건수,연령대_50_매출_건수,연령대_60_이상_매출_건수,점포수,M지표,R지표
0,2017,1,A,골목상권,1000973,구천면로42길,CS300007,육류판매,3034352.0,120,...,20,0,0,30,50,30,0,4,8.751881,2.0
1,2017,1,D,발달상권,1001027,가산디지털단지역_3,CS200001,일반교습학원,26571102.0,79,...,62,0,54,2,0,8,15,5,11.116464,2.0
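
The R지표 constants assigned in the yearly cells follow a regular pattern: the first quarter of 2016 scores 1.0, each later quarter adds 0.25, and each later year adds 1. The sketch below writes that pattern as one vectorized expression. The formula is inferred from the constants used in the cells rather than stated anywhere in the notebook, and `BASE_YEAR` and the toy rows are hypothetical.

```python
import pandas as pd

# Toy frame with made-up rows; only the two code columns matter here.
toy = pd.DataFrame({
    '기준_년_코드': [2016, 2016, 2017, 2017, 2018],
    '기준_분기_코드': [1, 4, 1, 3, 2],
})

BASE_YEAR = 2016  # assumption: the first year in the data scores 1.0

# Whole part counts years since BASE_YEAR; fractional part advances 0.25 per quarter.
toy['R지표'] = (toy['기준_년_코드'] - BASE_YEAR + 1) + 0.25 * (toy['기준_분기_코드'] - 1)
print(toy)
```

Computed this way, the per-quarter slicing and concatenation would not be needed, although the row order would then stay as in the source file instead of being grouped by quarter.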


In [102]:
s2017['F지표']=s2017['분기당_매출_건수']/s2017['점포수'] # a 점포수 of 0 yields inf (handled in the next cell)
s2017['F지표']=np.log1p(s2017['F지표'])
rfm2017=s2017[['서비스_업종_코드_명','R지표','F지표','M지표']].copy() # explicit copy avoids SettingWithCopyWarning

rfm2017['F지표']=rfm2017['F지표'].round(1)
rfm2017['M지표']=rfm2017['M지표'].round(1)
rfm2017.head()


Unnamed: 0,서비스_업종_코드_명,R지표,F지표,M지표
0,육류판매,2.0,3.4,8.8
1,일반교습학원,2.0,2.8,11.1
2,시계및귀금속,2.0,4.6,12.5
3,법무사사무소,2.0,4.4,12.7
4,조명용품,2.0,2.1,8.5


In [103]:
rfm2017=rfm2017.replace([np.inf,-np.inf],np.nan)
rfm2017.isnull().sum()

서비스_업종_코드_명       0
R지표               0
F지표            4220
M지표            4220
dtype: int64

In [104]:
rfm2017=rfm2017.dropna(axis=0)
rfm2017.isnull().sum()

서비스_업종_코드_명    0
R지표            0
F지표            0
M지표            0
dtype: int64

In [105]:
rfma2017=rfm2017.groupby(['서비스_업종_코드_명','R지표']).mean()
rfma2017

서비스_업종_코드_명,R지표,F지표,M지표
PC방,2.00,8.854688,7.657031
PC방,2.25,9.039702,7.643424
PC방,2.50,9.281311,7.652913
PC방,2.75,9.400000,7.673634
가구,2.00,4.674167,11.454167
...,...,...,...
화초,2.75,5.519185,9.648557
회계사사무소,2.00,4.173077,11.965385
회계사사무소,2.25,4.611429,11.891429
회계사사무소,2.50,4.186207,12.127586


## 2016

In [106]:
s2016['M지표']=s2016['분기당_매출_금액']/(s2016['분기당_매출_건수']*s2016['점포수'])
s2016['M지표']=np.log1p(s2016['M지표']) # values span a wide range, so apply log1p

# .copy() makes each quarter an independent frame, so assigning R지표 raises no SettingWithCopyWarning
a=s2016[s2016['기준_분기_코드']==1].copy()
b=s2016[s2016['기준_분기_코드']==2].copy()
c=s2016[s2016['기준_분기_코드']==3].copy()
d=s2016[s2016['기준_분기_코드']==4].copy()
a['R지표']=1.0
b['R지표']=1.25
c['R지표']=1.5
d['R지표']=1.75

s2016=pd.concat([a,b,c,d]) # quarters stacked in order Q1..Q4
s2016.head()

Unnamed: 0,기준_년_코드,기준_분기_코드,상권_구분_코드,상권_구분_코드_명,상권_코드,상권_코드_명,서비스_업종_코드,서비스_업종_코드_명,분기당_매출_금액,분기당_매출_건수,...,여성_매출_건수,연령대_10_매출_건수,연령대_20_매출_건수,연령대_30_매출_건수,연령대_40_매출_건수,연령대_50_매출_건수,연령대_60_이상_매출_건수,점포수,M지표,R지표
0,2016,1,A,골목상권,1000013,율곡로10길,CS300011,일반의류,27000000.0,21,...,11,0,0,0,0,4,7,2,13.373679,1.0
1,2016,1,R,전통시장,1001271,동묘시장,CS300017,시계및귀금속,649744.0,17,...,9,0,0,0,0,8,9,1,10.551147,1.0
2,2016,1,A,골목상권,1000447,증가로10길,CS300006,미곡판매,1856535.0,40,...,20,0,0,0,0,20,20,2,10.052239,1.0
3,2016,1,R,전통시장,1001420,고척근린시장,CS300035,인테리어,152749.0,5,...,0,0,0,0,5,0,0,2,9.634032,1.0
4,2016,1,A,골목상권,1000936,학동로38길,CS300014,신발,1000000.0,4,...,0,0,4,0,0,0,0,2,11.736077,1.0


In [107]:
s2016['F지표']=s2016['분기당_매출_건수']/s2016['점포수'] # a 점포수 of 0 yields inf (handled in the next cell)
s2016['F지표']=np.log1p(s2016['F지표'])
rfm2016=s2016[['서비스_업종_코드_명','R지표','F지표','M지표']].copy() # explicit copy avoids SettingWithCopyWarning

rfm2016['F지표']=rfm2016['F지표'].round(1)
rfm2016['M지표']=rfm2016['M지표'].round(1)
rfm2016.head()


Unnamed: 0,서비스_업종_코드_명,R지표,F지표,M지표
0,일반의류,1.0,2.4,13.4
1,시계및귀금속,1.0,2.9,10.6
2,미곡판매,1.0,3.0,10.1
3,인테리어,1.0,1.3,9.6
4,신발,1.0,1.1,11.7


In [108]:
rfm2016=rfm2016.replace([np.inf,-np.inf],np.nan)
rfm2016.isnull().sum()

서비스_업종_코드_명       0
R지표               0
F지표            4153
M지표            4153
dtype: int64

In [109]:
rfm2016=rfm2016.dropna(axis=0)
rfm2016.isnull().sum()

서비스_업종_코드_명    0
R지표            0
F지표            0
M지표            0
dtype: int64

In [110]:
rfma2016=rfm2016.groupby(['서비스_업종_코드_명','R지표']).mean()
rfma2016

서비스_업종_코드_명,R지표,F지표,M지표
PC방,1.00,8.092568,7.823649
PC방,1.25,8.211009,7.762997
PC방,1.50,8.618857,7.707714
PC방,1.75,8.755923,7.660331
가구,1.00,4.553441,11.432794
...,...,...,...
화초,1.75,5.466063,9.681549
회계사사무소,1.00,4.129630,11.470370
회계사사무소,1.25,4.184615,11.710256
회계사사무소,1.50,3.963333,12.073333
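
The F지표, M지표, cleaning and grouping steps are the same for every year, so they could be gathered into one function. The following is a sketch only: `summarize_rfm` is a hypothetical name, the toy rows are made up, and the function assumes the input frame already carries an `R지표` column.

```python
import numpy as np
import pandas as pd

def summarize_rfm(df: pd.DataFrame) -> pd.DataFrame:
    """Per-industry, per-R지표 mean of the rounded F and M scores (hypothetical helper)."""
    out = df.copy()  # work on a copy so the caller's frame is untouched
    f_raw = out['분기당_매출_건수'] / out['점포수']
    m_raw = out['분기당_매출_금액'] / (out['분기당_매출_건수'] * out['점포수'])
    out['F지표'] = np.log1p(f_raw).round(1)
    out['M지표'] = np.log1p(m_raw).round(1)

    rfm = out[['서비스_업종_코드_명', 'R지표', 'F지표', 'M지표']]
    # Divisions by zero produce inf; treat them as missing and drop those rows.
    rfm = rfm.replace([np.inf, -np.inf], np.nan).dropna(axis=0)
    return rfm.groupby(['서비스_업종_코드_명', 'R지표']).mean()

# Toy input with made-up numbers; the 가구 row has zero stores and is dropped.
toy = pd.DataFrame({
    '서비스_업종_코드_명': ['PC방', 'PC방', '가구'],
    'R지표': [1.0, 1.0, 1.0],
    '분기당_매출_금액': [1000.0, 3000.0, 500.0],
    '분기당_매출_건수': [10, 30, 5],
    '점포수': [1, 1, 0],
})
result = summarize_rfm(toy)
print(result)
```

With such a helper, each year would reduce to a single call on that year's frame, for example `rfma2016 = summarize_rfm(s2016)`.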


In [119]:
rfma2021.to_csv('../data/rfm2021.csv')
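
The grouped frames have a two-level index (서비스_업종_코드_명, R지표), which `to_csv` writes as two ordinary leading columns. A minimal sketch of restoring that index when reading the file back, using an in-memory buffer and made-up values in place of the real file:

```python
import io
import pandas as pd

# Toy stand-in for a grouped RFM frame (values are made up).
index = pd.MultiIndex.from_tuples(
    [('PC방', 1.0), ('PC방', 1.25)],
    names=['서비스_업종_코드_명', 'R지표'],
)
toy = pd.DataFrame({'F지표': [8.1, 8.2], 'M지표': [7.8, 7.7]}, index=index)

# Write to an in-memory buffer instead of a file on disk.
buf = io.StringIO()
toy.to_csv(buf)
buf.seek(0)

# Name both index columns so the two-level index is rebuilt on load.
restored = pd.read_csv(buf, index_col=['서비스_업종_코드_명', 'R지표'])
print(restored)
```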