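# Credit Card User Delinquency Prediction AI Competition
## Overview
1. Topic<br/>
    1) Develop an algorithm that predicts a user's degree of payment delinquency from credit card user data
2. Background
    1) Credit card companies compute a credit score from the personal information and data that applicants submit. They use this score to predict the applicant's likelihood of future default and of credit card payment delinquency.<br/>
    2) Much of the financial industry now wants to build financial services that use artificial intelligence (AI). Develop an AI algorithm that predicts a user's degree of payment delinquency, and uncover insights that can be proposed to the financial industry.<br/>
3. Competition description<br/>
    1) Predict the degree of credit card payment delinquency from users' personal profile data<br/>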

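## 0. Background Knowledge
* index : index
* gender : gender
* car : car ownership
* reality : real estate ownership
* child_num : number of children
* income_total : annual income
* income_type : income category
    - Commercial associate > self-employed
    - Working > worker
    - State servant > civil servant
    - Pensioner > pension recipient
    - Student > student
* edu_type : education level
    - Higher education > higher education
    - Secondary / secondary special > secondary / special secondary education
    - Incomplete higher > incomplete higher education (dropped out)
    - Lower secondary > lower secondary education
    - Academic degree > academic degree
* family_type : marital status
    - Married > married
    - Civil marriage > common-law marriage
    - Separated > separated
    - Single / not married > single
    - Widow > widowed
* house_type : housing type
    - Municipal apartment > municipal apartment
    - House / apartment > house / apartment
    - With parents > living with parents
    - Co-op apartment > housing cooperative
    - Rented apartment > rented apartment (deposit lease or monthly rent)
    - Office apartment > office apartment (officetel)
* DAYS_BIRTH : date of birth
    - Counted backward from (0); (-1) means one day has passed since birth
* DAYS_EMPLOYED : employment start date
    - Counted backward from (0); (-1) means one day has passed since starting the job
* FLAG_MOBIL : mobile phone ownership (personal mobile phone)
* work_phone : work phone ownership (work mobile phone)
* phone : phone ownership (home landline)
* email : email ownership
* occyp_type : occupation type
* family_size : family size
* begin_month : credit card issuance (month)
* credit : the user's credit rating, based on credit card payment delinquency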
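
The backward-counted day offsets described for `DAYS_BIRTH` and `DAYS_EMPLOYED` are easier to read as years. The following is a minimal sketch, assuming negative offsets and a 365-day year; the derived column names `age_years` and `employed_years` are hypothetical, and the toy values are copied from the first rows of the training data.

```python
import pandas as pd

# Toy rows using DAYS_BIRTH / DAYS_EMPLOYED values from the first training rows.
df = pd.DataFrame({
    "DAYS_BIRTH": [-13899, -11380, -19087],
    "DAYS_EMPLOYED": [-4709, -1540, -4434],
})

# Offsets count backward from day 0, so negating them gives elapsed days.
# Assumption: a year is approximated as 365 days (leap days are ignored).
df["age_years"] = (-df["DAYS_BIRTH"]) // 365
df["employed_years"] = (-df["DAYS_EMPLOYED"]) // 365

print(df)
```

Integer division keeps whole years only; dividing with `/` instead would keep fractional years if a model benefits from the finer resolution.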

## 1. Loading the Data
  We import the libraries used in this notebook: pandas, numpy, matplotlib.pyplot, and seaborn.  
  We also fix the problem of Korean characters rendering as broken glyphs in matplotlib plots, and change the font.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import font_manager, rc
import seaborn as sns
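# Render the minus sign correctly once a Korean font is active
plt.rcParams['axes.unicode_minus'] = False

# Path to a Korean-capable font: Malgun Gothic on Windows (macOS alternative is commented out)
#f_path = "/Users/administrator/Library/Fonts/AppleGothic.ttf"
f_path = "C:/Windows/Fonts/malgun.ttf"
# Read the font family name from the file and make it the matplotlib default
font_name = font_manager.FontProperties(fname=f_path).get_name()
rc('font', family=font_name)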
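
The font path above is hard-coded for one machine. A possible way to make the setup portable is sketched below; the helper `set_korean_font` and the candidate paths in `FONT_CANDIDATES` are hypothetical and would need to be adjusted to the fonts actually installed.

```python
import os
import platform

from matplotlib import font_manager, rc

# Hypothetical per-platform candidates; adjust to fonts installed on your machine.
FONT_CANDIDATES = {
    "Windows": ["C:/Windows/Fonts/malgun.ttf"],
    "Darwin": ["/System/Library/Fonts/Supplemental/AppleGothic.ttf"],
    "Linux": ["/usr/share/fonts/truetype/nanum/NanumGothic.ttf"],
}


def set_korean_font():
    """Activate the first candidate font that exists; return its name or None."""
    for path in FONT_CANDIDATES.get(platform.system(), []):
        if os.path.exists(path):
            name = font_manager.FontProperties(fname=path).get_name()
            rc("font", family=name)
            return name
    return None  # nothing found: matplotlib keeps its default font


font_name = set_korean_font()
print(font_name)
```

Returning `None` instead of raising lets the notebook keep running on a machine without any of the listed fonts, at the cost of Korean labels not rendering there.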

In [3]:
train = pd.read_csv('../data/project/card_user_info/train.csv')
print(train.shape)
train.head()

(26457, 20)


| | index | gender | car | reality | child_num | income_total | income_type | edu_type | family_type | house_type | DAYS_BIRTH | DAYS_EMPLOYED | FLAG_MOBIL | work_phone | phone | email | occyp_type | family_size | begin_month | credit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | F | N | N | 0 | 202500.0 | Commercial associate | Higher education | Married | Municipal apartment | -13899 | -4709 | 1 | 0 | 0 | 0 | NaN | 2.0 | -6.0 | 1.0 |
| 1 | 1 | F | N | Y | 1 | 247500.0 | Commercial associate | Secondary / secondary special | Civil marriage | House / apartment | -11380 | -1540 | 1 | 0 | 0 | 1 | Laborers | 3.0 | -5.0 | 1.0 |
| 2 | 2 | M | Y | Y | 0 | 450000.0 | Working | Higher education | Married | House / apartment | -19087 | -4434 | 1 | 0 | 1 | 0 | Managers | 2.0 | -22.0 | 2.0 |
| 3 | 3 | F | N | Y | 0 | 202500.0 | Commercial associate | Secondary / secondary special | Married | House / apartment | -15088 | -2092 | 1 | 0 | 1 | 0 | Sales staff | 2.0 | -37.0 | 0.0 |
| 4 | 4 | F | Y | Y | 0 | 157500.0 | State servant | Higher education | Married | House / apartment | -15037 | -2105 | 1 | 0 | 0 | 0 | Managers | 2.0 | -26.0 | 2.0 |


In [4]:
test = pd.read_csv('../data/project/card_user_info/test.csv')
print(test.shape)
test.head()

(10000, 19)


| | index | gender | car | reality | child_num | income_total | income_type | edu_type | family_type | house_type | DAYS_BIRTH | DAYS_EMPLOYED | FLAG_MOBIL | work_phone | phone | email | occyp_type | family_size | begin_month |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 26457 | M | Y | N | 0 | 112500.0 | Pensioner | Secondary / secondary special | Civil marriage | House / apartment | -21990 | 365243 | 1 | 0 | 1 | 0 | NaN | 2.0 | -60.0 |
| 1 | 26458 | F | N | Y | 0 | 135000.0 | State servant | Higher education | Married | House / apartment | -18964 | -8671 | 1 | 0 | 1 | 0 | Core staff | 2.0 | -36.0 |
| 2 | 26459 | F | N | Y | 0 | 69372.0 | Working | Secondary / secondary special | Married | House / apartment | -15887 | -217 | 1 | 1 | 1 | 0 | Laborers | 2.0 | -40.0 |
| 3 | 26460 | M | Y | N | 0 | 112500.0 | Commercial associate | Secondary / secondary special | Married | House / apartment | -19270 | -2531 | 1 | 1 | 0 | 0 | Drivers | 2.0 | -41.0 |
| 4 | 26461 | F | Y | Y | 0 | 225000.0 | State servant | Higher education | Married | House / apartment | -17822 | -9385 | 1 | 1 | 0 | 0 | Managers | 2.0 | -8.0 |
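
The shapes `(26457, 20)` and `(10000, 19)` suggest that the test set has the same columns as the training set minus the target `credit`. A minimal sketch of verifying this and separating features from the target, using small stand-in frames rather than the real files:

```python
import pandas as pd

# Small stand-ins for train/test with a subset of the real column names.
train = pd.DataFrame({"index": [0, 1], "gender": ["F", "M"], "credit": [1.0, 2.0]})
test = pd.DataFrame({"index": [26457, 26458], "gender": ["M", "F"]})

# Columns that appear in only one of the two frames.
only_in_train = [c for c in train.columns if c not in test.columns]
only_in_test = [c for c in test.columns if c not in train.columns]
print(only_in_train, only_in_test)

# Separate the target so that both frames share the same feature columns.
X_train = train.drop(columns=only_in_train)
y_train = train["credit"]
```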


In [9]:
train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26457 entries, 0 to 26456
Data columns (total 20 columns):
index            26457 non-null int64
gender           26457 non-null object
car              26457 non-null object
reality          26457 non-null object
child_num        26457 non-null int64
income_total     26457 non-null float64
income_type      26457 non-null object
edu_type         26457 non-null object
family_type      26457 non-null object
house_type       26457 non-null object
DAYS_BIRTH       26457 non-null int64
DAYS_EMPLOYED    26457 non-null int64
FLAG_MOBIL       26457 non-null int64
work_phone       26457 non-null int64
phone            26457 non-null int64
email            26457 non-null int64
occyp_type       18286 non-null object
family_size      26457 non-null float64
begin_month      26457 non-null float64
credit           26457 non-null float64
dtypes: float64(4), int64(8), object(8)
memory usage: 4.0+ MB
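
In the `info()` output, `occyp_type` is the only column with fewer non-null entries than rows: 18286 of 26457, which leaves 8171 missing values. The sketch below counts missing values per column and fills them with a separate category; the label `"Unknown"` is an assumption for illustration, not a choice made in the source.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing occupation, mirroring the first training row.
df = pd.DataFrame({
    "gender": ["F", "F", "M"],
    "occyp_type": [np.nan, "Laborers", "Managers"],
})

# Count missing values per column and show only the affected columns.
missing = df.isnull().sum()
print(missing[missing > 0])

# Assumption: keep the rows and treat "missing" as its own category.
df["occyp_type"] = df["occyp_type"].fillna("Unknown")
```

Filling with a dedicated category keeps all rows, which matters here because almost a third of the training rows lack an occupation.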


In [10]:
train.describe()

| | index | child_num | income_total | DAYS_BIRTH | DAYS_EMPLOYED | FLAG_MOBIL | work_phone | phone | email | family_size | begin_month | credit |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 26457.0 | 26457.0 | 26457.0 | 26457.0 | 26457.0 | 26457.0 | 26457.0 | 26457.0 | 26457.0 | 26457.0 | 26457.0 | 26457.0 |
| mean | 13228.0 | 0.428658 | 187306.5 | -15958.053899 | 59068.750728 | 1.0 | 0.224742 | 0.294251 | 0.09128 | 2.196848 | -26.123294 | 1.51956 |
| std | 7637.622372 | 0.747326 | 101878.4 | 4201.589022 | 137475.427503 | 0.0 | 0.41742 | 0.455714 | 0.288013 | 0.916717 | 16.55955 | 0.702283 |
| min | 0.0 | 0.0 | 27000.0 | -25152.0 | -15713.0 | 1.0 | 0.0 | 0.0 | 0.0 | 1.0 | -60.0 | 0.0 |
| 25% | 6614.0 | 0.0 | 121500.0 | -19431.0 | -3153.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | -39.0 | 1.0 |
| 50% | 13228.0 | 0.0 | 157500.0 | -15547.0 | -1539.0 | 1.0 | 0.0 | 0.0 | 0.0 | 2.0 | -24.0 | 2.0 |
| 75% | 19842.0 | 1.0 | 225000.0 | -12446.0 | -407.0 | 1.0 | 0.0 | 1.0 | 0.0 | 3.0 | -12.0 | 2.0 |
| max | 26456.0 | 19.0 | 1575000.0 | -7705.0 | 365243.0 | 1.0 | 1.0 | 1.0 | 1.0 | 20.0 | 0.0 | 2.0 |
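
Two things stand out in this summary: `FLAG_MOBIL` has a standard deviation of 0.0 (every value is 1), and `DAYS_EMPLOYED` has a positive maximum of 365243 even though its quartiles are all negative. A minimal sketch of handling both follows. Reading the positive value as a placeholder for "not employed" is an assumption inferred from the output, and the column name `not_employed` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "FLAG_MOBIL": [1, 1, 1, 1],
    "work_phone": [0, 0, 1, 1],
    "DAYS_EMPLOYED": [-4709, -1540, 365243, -217],
})

# Columns with a single unique value carry no information for a model.
constant_cols = [c for c in df.columns if df[c].nunique() == 1]
df = df.drop(columns=constant_cols)

# Assumption: a positive DAYS_EMPLOYED is a placeholder, so flag it explicitly.
df["not_employed"] = (df["DAYS_EMPLOYED"] > 0).astype(int)

print(constant_cols, df["not_employed"].tolist())
```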


In [8]:
train['gender'].value_counts()

F    17697
M     8760
Name: gender, dtype: int64

In [11]:
train['car'].value_counts()

N    16410
Y    10047
Name: car, dtype: int64

In [12]:
train['reality'].value_counts()

Y    17830
N     8627
Name: reality, dtype: int64
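
The three `value_counts()` calls above can also be written as one loop, and `normalize=True` reports shares instead of raw counts (for example, 17697 of 26457 users, roughly two thirds, are `F`). A small sketch on toy data, not the real file:

```python
import pandas as pd

df = pd.DataFrame({
    "gender": ["F", "F", "M", "F"],
    "car": ["N", "Y", "N", "N"],
    "reality": ["Y", "Y", "N", "Y"],
})

for col in ["gender", "car", "reality"]:
    # normalize=True returns the proportion of each category instead of counts.
    shares = df[col].value_counts(normalize=True)
    print(col, shares.round(3).to_dict())
```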

In [13]:
train['child_num'].unique()

array([ 0,  1,  2,  3,  4,  5, 14, 19,  7], dtype=int64)

In [14]:
train['family_size'].unique()

array([ 2.,  3.,  4.,  1.,  5.,  6.,  7., 15., 20.,  9.])
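
The unique values include a few extreme entries: 14 and 19 for `child_num`, and 15 and 20 for `family_size`. One possible way to inspect them is to flag rows above a cutoff before deciding whether to drop or cap them. The cutoff `MAX_CHILDREN` below is a hypothetical value chosen for illustration, and the toy column only reuses values seen in the `unique()` output.

```python
import pandas as pd

df = pd.DataFrame({"child_num": [0, 1, 2, 7, 14, 19]})

# Hypothetical cutoff: values above it are treated as suspicious.
MAX_CHILDREN = 7

is_outlier = df["child_num"] > MAX_CHILDREN
print(df[is_outlier])

# One option is to cap instead of dropping, so that no rows are lost.
df["child_num_capped"] = df["child_num"].clip(upper=MAX_CHILDREN)
```

The same pattern applies to `family_size` with its own cutoff.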