<a href="https://colab.research.google.com/github/grandff/kdn-machinelearning/blob/main/KDN_PJT_%EC%82%B0%EC%97%85%EC%A0%9C%EC%96%B4%EC%8B%9C%EC%8A%A4%ED%85%9C_%EB%B3%B4%EC%95%88_%EC%9C%84%ED%98%91_%ED%83%90%EC%A7%80_(ML%2C_%EC%8B%A4%EC%8A%B5).ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Industrial Control System (ICS) Security Threat Detection


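### Training Data
- File names
    - train.csv, test.csv, submission.csv
- Description
    - Data collected under normal operating conditions (each file is continuous in time)
- Data structure
    - time : observation time
    - P1 ~ P4 : SCADA data for each process
    - attack : whether an attack occurred (present only in train.csv)
- Data details
    - Collection rate : 78 points / sec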

```
Because the data is time-indexed, the following models are good candidates:
1. xgboost, lightgbm
2. ARIMA ?
3. LSTM, GRU
```
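
Tree-based models such as xgboost or lightgbm see one row at a time, so temporal context usually has to be added as extra columns first. The following is a minimal sketch of that idea, not something the original notebook does. The toy column `sensor` and the helper `add_lag_features` are hypothetical names introduced for illustration.

```python
import pandas as pd

def add_lag_features(df, col, lags=(1, 2), window=3):
    """Return a copy of df with lagged and rolling-mean versions of `col` (hypothetical helper)."""
    out = df.copy()
    for lag in lags:
        # Value of the same column `lag` rows earlier
        out[f"{col}_lag{lag}"] = out[col].shift(lag)
    # Mean over the current row and the previous window-1 rows
    out[f"{col}_roll{window}"] = out[col].rolling(window).mean()
    return out

# Toy data standing in for a single sensor column (assumption)
toy = pd.DataFrame({"sensor": [1.0, 2.0, 3.0, 4.0, 5.0]})
features = add_lag_features(toy, "sensor")
print(features)
```

The first rows of the new columns are NaN because no earlier observations exist for them; they would need to be dropped or filled before training.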

In [26]:
# Install the package needed for the hands-on project
!pip install jaen



In [27]:
import os
import glob
import time
import datetime

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [28]:
# Set up the hands-on project environment
from JAEN.project import Project
pjt = Project('산업제어시스템 보안 위협 탐지(ML)', # project name
              '한전KDN',                       # course name
              '1차수 A반',                     # session / class
              '198080@kdn.com')                  # email (number@kdn.com) * replace with your own email

In [4]:
# Download and unzip the data
!wget 'http://49.247.133.7/KDN_PJT_1.zip'
!unzip KDN_PJT_1.zip

--2021-10-06 02:27:35--  http://49.247.133.7/KDN_PJT_1.zip
Connecting to 49.247.133.7:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 81466724 (78M) [application/zip]
Saving to: ‘KDN_PJT_1.zip’


2021-10-06 02:27:43 (11.0 MB/s) - ‘KDN_PJT_1.zip’ saved [81466724/81466724]

Archive:  KDN_PJT_1.zip
  inflating: submission.csv          
  inflating: test.csv                
  inflating: train.csv               


In [29]:
# Read train.csv and store it in train.
train = pd.read_csv("train.csv")

In [30]:
# Read test.csv and store it in test.
test = pd.read_csv("test.csv")

In [31]:
# Inspect the train DataFrame with .info()
# What to look for in the .info() output:
## Check for missing values
## Check whether every column is numeric
train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 334802 entries, 0 to 334801
Data columns (total 81 columns):
 #   Column       Non-Null Count   Dtype  
---  ------       --------------   -----  
 0   time         334802 non-null  object 
 1   P1_B2004     334802 non-null  float64
 2   P1_B2016     334802 non-null  float64
 3   P1_B3004     334802 non-null  float64
 4   P1_B3005     334802 non-null  float64
 5   P1_B4002     334802 non-null  float64
 6   P1_B4005     334802 non-null  float64
 7   P1_B400B     334802 non-null  float64
 8   P1_B4022     334802 non-null  float64
 9   P1_FCV01D    334802 non-null  float64
 10  P1_FCV01Z    334802 non-null  float64
 11  P1_FCV02D    334802 non-null  float64
 12  P1_FCV02Z    334802 non-null  float64
 13  P1_FCV03D    334802 non-null  float64
 14  P1_FCV03Z    334802 non-null  float64
 15  P1_FT01      334802 non-null  float64
 16  P1_FT01Z     334802 non-null  float64
 17  P1_FT02      334802 non-null  float64
 18  P1_FT02Z     334802 non-

In [32]:
# Inspect the test DataFrame with .info()
test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 190803 entries, 0 to 190802
Data columns (total 80 columns):
 #   Column       Non-Null Count   Dtype  
---  ------       --------------   -----  
 0   time         190803 non-null  object 
 1   P1_B2004     190803 non-null  float64
 2   P1_B2016     190803 non-null  float64
 3   P1_B3004     190803 non-null  float64
 4   P1_B3005     190803 non-null  float64
 5   P1_B4002     190803 non-null  float64
 6   P1_B4005     190803 non-null  float64
 7   P1_B400B     190803 non-null  float64
 8   P1_B4022     190803 non-null  float64
 9   P1_FCV01D    190803 non-null  float64
 10  P1_FCV01Z    190803 non-null  float64
 11  P1_FCV02D    190803 non-null  float64
 12  P1_FCV02Z    190803 non-null  float64
 13  P1_FCV03D    190803 non-null  float64
 14  P1_FCV03Z    190803 non-null  float64
 15  P1_FT01      190803 non-null  float64
 16  P1_FT01Z     190803 non-null  float64
 17  P1_FT02      190803 non-null  float64
 18  P1_FT02Z     190803 non-

In [33]:
# Check the distribution of the attack column in train.
train['attack'].value_counts()

0    331353
1      3449
Name: attack, dtype: int64
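
The counts above show a strong class imbalance. A short sketch that turns them into proportions; the numbers are copied from the output above, so no file is needed to run it.

```python
import pandas as pd

# Counts copied from the value_counts() output above
counts = pd.Series({0: 331353, 1: 3449}, name="attack")

# Share of each class among all rows
ratio = counts / counts.sum()
print(ratio.round(4))
```

Attacks make up roughly 1% of the rows. On the real frame, `train['attack'].value_counts(normalize=True)` gives the same proportions directly.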

In [34]:
# Select only the columns whose name contains the letter P and store them in target_cols.
  # Use slicing or str.contains.
## The row index is `index`; the column index is `columns`.
## .str is the string accessor.
## train.columns.str ==> lets the Index held in train.columns be treated like strings
## contains ==> tests whether each name contains a given substring
target_cols = [col for col in train.columns if 'P' in col]
# target_cols = train.columns[train.columns.str.contains('P')].to_list()  # the instructor's approach
target_cols

['P1_B2004',
 'P1_B2016',
 'P1_B3004',
 'P1_B3005',
 'P1_B4002',
 'P1_B4005',
 'P1_B400B',
 'P1_B4022',
 'P1_FCV01D',
 'P1_FCV01Z',
 'P1_FCV02D',
 'P1_FCV02Z',
 'P1_FCV03D',
 'P1_FCV03Z',
 'P1_FT01',
 'P1_FT01Z',
 'P1_FT02',
 'P1_FT02Z',
 'P1_FT03',
 'P1_FT03Z',
 'P1_LCV01D',
 'P1_LCV01Z',
 'P1_LIT01',
 'P1_PCV01D',
 'P1_PCV01Z',
 'P1_PCV02D',
 'P1_PCV02Z',
 'P1_PIT01',
 'P1_PIT02',
 'P1_PP01AD',
 'P1_PP01AR',
 'P1_PP01BD',
 'P1_PP01BR',
 'P1_PP02D',
 'P1_PP02R',
 'P1_STSP',
 'P1_TIT01',
 'P1_TIT02',
 'P2_24Vdc',
 'P2_ASD',
 'P2_AutoGO',
 'P2_CO_rpm',
 'P2_Emerg',
 'P2_HILout',
 'P2_MSD',
 'P2_ManualGO',
 'P2_OnOff',
 'P2_RTR',
 'P2_SIT01',
 'P2_SIT02',
 'P2_TripEx',
 'P2_VT01',
 'P2_VTR01',
 'P2_VTR02',
 'P2_VTR03',
 'P2_VTR04',
 'P2_VXT02',
 'P2_VXT03',
 'P2_VYT02',
 'P2_VYT03',
 'P3_FIT01',
 'P3_LCP01D',
 'P3_LCV01D',
 'P3_LH',
 'P3_LIT01',
 'P3_LL',
 'P3_PIT01',
 'P4_HT_FD',
 'P4_HT_LD',
 'P4_HT_PO',
 'P4_HT_PS',
 'P4_LD',
 'P4_ST_FD',
 'P4_ST_GOV',
 'P4_ST_LD',
 'P4_ST_PO',
 'P4_S
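
The list comprehension and the `str.contains` approach from the cell above select the same columns. A small sketch on a made-up set of column names (the toy frame is an assumption), with `str.startswith` added as a stricter variant:

```python
import pandas as pd

# Empty frame with a few representative column names (toy example)
toy = pd.DataFrame(columns=["time", "P1_B2004", "P2_ASD", "attack"])

# Plain Python membership test on each column name
by_comprehension = [col for col in toy.columns if "P" in col]

# Vectorized string accessor on the column Index
by_accessor = toy.columns[toy.columns.str.contains("P")].to_list()

# Stricter variant: only names that begin with P
by_prefix = toy.columns[toy.columns.str.startswith("P")].to_list()

print(by_comprehension)
print(by_accessor)
print(by_prefix)
```

Both substring tests are case-sensitive, which is why `time` and `attack` are not picked up.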

In [35]:
# Extract only the target_cols columns from train and store them in X
X = train[list(target_cols)]

In [36]:
# Extract the attack column from train and store it in Y
Y = train['attack']
Y

0         0
1         0
2         0
3         0
4         0
         ..
334797    0
334798    0
334799    0
334800    0
334801    0
Name: attack, Length: 334802, dtype: int64

In [37]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import RobustScaler

from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

from sklearn.metrics import classification_report

In [38]:
x_train, x_test, y_train, y_test = train_test_split(X, Y, stratify=Y, random_state=0)
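
The `stratify=Y` argument keeps the attack ratio the same in both splits, which matters when positives are rare. A minimal sketch on toy labels (assumed data with 10% positives, not the notebook's train.csv):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy labels: 100 positives, 900 negatives
y = pd.Series([1] * 100 + [0] * 900)
X = pd.DataFrame({"feature": np.arange(len(y))})

# Same call shape as in the notebook: default test_size (0.25), stratified on the label
x_tr, x_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

print(len(x_tr), len(x_te))
print(y_tr.mean(), y_te.mean())  # positive rate in each split
```

Because each file is continuous in time, a random split also places neighbouring timestamps on both sides, so the validation score may be optimistic compared with a split by time.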

In [16]:
# Train a LogisticRegression model.
  # Set random_state to 0.
model = LogisticRegression(random_state=0).fit(x_train, y_train)
model.score(x_train, y_train), model.score(x_test, y_test)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression


(0.9896973727703196, 0.9897014372588141)
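
The convergence warning above suggests scaling the data or raising `max_iter`. A minimal sketch of one way to do both, using the `RobustScaler` already imported in this notebook inside a pipeline. The synthetic data from `make_classification` and the value `max_iter=1000` are assumptions for illustration, not the notebook's data or settings.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler

# Synthetic, imbalanced stand-in for the sensor matrix (assumption)
X_demo, y_demo = make_classification(
    n_samples=2000, n_features=20, weights=[0.9, 0.1], random_state=0
)
X_demo[:, 0] *= 1e4  # mimic one sensor with a much larger scale than the rest

x_tr, x_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, stratify=y_demo, random_state=0
)

# Scale first, then fit; the scaler is fitted on the training split only
clf = make_pipeline(
    RobustScaler(),
    LogisticRegression(max_iter=1000, random_state=0),
)
clf.fit(x_tr, y_tr)
print(clf.score(x_tr, y_tr), clf.score(x_te, y_te))
```

Wrapping the scaler in the pipeline means the same transformation is applied automatically at prediction time, so the test frame does not have to be scaled by hand.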

In [17]:
# Extract only the target_cols columns from test
predict_df = test[list(target_cols)]

In [18]:
# Predict on the extracted values with the trained model.
  # Store the predictions in the predict variable.
predict = model.predict(predict_df)

In [19]:
# Read submission.csv and store it in submission.
submission = pd.read_csv("submission.csv")

In [20]:
# Assign the predicted values to the attack column of submission.
submission['attack'] = predict
submission

Unnamed: 0,time,attack
0,2020-07-07 15:00:00,0
1,2020-07-07 15:00:01,0
2,2020-07-07 15:00:02,0
3,2020-07-07 15:00:03,0
4,2020-07-07 15:00:04,0
...,...,...
190798,2020-07-28 22:59:56,0
190799,2020-07-28 22:59:57,0
190800,2020-07-28 22:59:58,0
190801,2020-07-28 22:59:59,0


In [21]:
# Check the distribution (count per value) of the attack column in submission.
submission['attack'].value_counts()

0    190803
Name: attack, dtype: int64
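
Every test row was predicted as 0, which also explains the accuracy of about 0.9897 seen earlier: it matches the share of the majority class. A sketch that scores such an all-zero predictor on labels with the same class counts as `train['attack']`. The metric used by `pjt.submit` is not documented in this notebook, so recall and F1 appear here only as examples of imbalance-aware metrics.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, recall_score

# Labels with the same class counts as train['attack']
y_true = np.array([0] * 331353 + [1] * 3449)

# A "model" that never predicts an attack
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred, zero_division=0)
f1 = f1_score(y_true, y_pred, zero_division=0)
print(f"accuracy={acc:.4f} recall={rec:.1f} f1={f1:.1f}")
```

High accuracy with zero recall means no attack is detected at all, so accuracy alone is a poor guide on this data; the `classification_report` imported above reports per-class precision and recall in one call.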

In [22]:
# Submit the result with the submit function of the pjt object.
pjt.submit(submission)

File saved. File name: submission-02-47-27.csv
Submission status: success
Submissions today: 2
Submission result: -0.017713700513104724


In [44]:
# Convert the time column of train to the datetime64[ns] dtype.
# Use the to_datetime function.
train['time'] = pd.to_datetime(train['time'])
train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 334802 entries, 0 to 334801
Data columns (total 81 columns):
 #   Column       Non-Null Count   Dtype         
---  ------       --------------   -----         
 0   time         334802 non-null  datetime64[ns]
 1   P1_B2004     334802 non-null  float64       
 2   P1_B2016     334802 non-null  float64       
 3   P1_B3004     334802 non-null  float64       
 4   P1_B3005     334802 non-null  float64       
 5   P1_B4002     334802 non-null  float64       
 6   P1_B4005     334802 non-null  float64       
 7   P1_B400B     334802 non-null  float64       
 8   P1_B4022     334802 non-null  float64       
 9   P1_FCV01D    334802 non-null  float64       
 10  P1_FCV01Z    334802 non-null  float64       
 11  P1_FCV02D    334802 non-null  float64       
 12  P1_FCV02Z    334802 non-null  float64       
 13  P1_FCV03D    334802 non-null  float64       
 14  P1_FCV03Z    334802 non-null  float64       
 15  P1_FT01      334802 non-null  floa

In [45]:
# Convert the time column of test to the datetime64[ns] dtype.
# Use the to_datetime function.
test['time'] = pd.to_datetime(test['time'])
test.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 190803 entries, 0 to 190802
Data columns (total 80 columns):
 #   Column       Non-Null Count   Dtype         
---  ------       --------------   -----         
 0   time         190803 non-null  datetime64[ns]
 1   P1_B2004     190803 non-null  float64       
 2   P1_B2016     190803 non-null  float64       
 3   P1_B3004     190803 non-null  float64       
 4   P1_B3005     190803 non-null  float64       
 5   P1_B4002     190803 non-null  float64       
 6   P1_B4005     190803 non-null  float64       
 7   P1_B400B     190803 non-null  float64       
 8   P1_B4022     190803 non-null  float64       
 9   P1_FCV01D    190803 non-null  float64       
 10  P1_FCV01Z    190803 non-null  float64       
 11  P1_FCV02D    190803 non-null  float64       
 12  P1_FCV02Z    190803 non-null  float64       
 13  P1_FCV03D    190803 non-null  float64       
 14  P1_FCV03Z    190803 non-null  float64       
 15  P1_FT01      190803 non-null  floa

In [50]:
# Use the dt accessor on train to extract year, month, day, hour, minute, second, week, and weekday as new columns
# series.dt.year
train['year'] = train['time'].dt.year
train['month'] = train['time'].dt.month
train['day'] = train['time'].dt.day
train['hour'] = train['time'].dt.hour
train['minute'] = train['time'].dt.minute
train['second'] = train['time'].dt.second
# .dt.week is deprecated (removed in pandas 2.0); dt.isocalendar().week is the replacement
train['week'] = train['time'].dt.isocalendar().week.astype(int)
train['weekday'] = train['time'].dt.weekday
train



Unnamed: 0,time,P1_B2004,P1_B2016,P1_B3004,P1_B3005,P1_B4002,P1_B4005,P1_B400B,P1_B4022,P1_FCV01D,P1_FCV01Z,P1_FCV02D,P1_FCV02Z,P1_FCV03D,P1_FCV03Z,P1_FT01,P1_FT01Z,P1_FT02,P1_FT02Z,P1_FT03,P1_FT03Z,P1_LCV01D,P1_LCV01Z,P1_LIT01,P1_PCV01D,P1_PCV01Z,P1_PCV02D,P1_PCV02Z,P1_PIT01,P1_PIT02,P1_PP01AD,P1_PP01AR,P1_PP01BD,P1_PP01BR,P1_PP02D,P1_PP02R,P1_STSP,P1_TIT01,P1_TIT02,P2_24Vdc,...,P2_SIT01,P2_SIT02,P2_TripEx,P2_VT01,P2_VTR01,P2_VTR02,P2_VTR03,P2_VTR04,P2_VXT02,P2_VXT03,P2_VYT02,P2_VYT03,P3_FIT01,P3_LCP01D,P3_LCV01D,P3_LH,P3_LIT01,P3_LL,P3_PIT01,P4_HT_FD,P4_HT_LD,P4_HT_PO,P4_HT_PS,P4_LD,P4_ST_FD,P4_ST_GOV,P4_ST_LD,P4_ST_PO,P4_ST_PS,P4_ST_PT01,P4_ST_TT01,attack,year,month,day,hour,minute,second,week,weekday
0,2020-07-11 00:00:00,0.10121,1.29784,397.63785,1001.99799,33.6555,100.0,2847.02539,37.14706,100.0,100.0,0.0,-1.87531,51.58201,52.80456,166.74039,808.29620,1973.19031,2847.02539,246.43968,1000.44769,8.79882,8.46252,395.19528,39.09198,40.49072,12.0,12.01782,1.36810,0.27786,540833,540833,0,0,1,1,1,35.43700,35.74219,28.02645,...,780.0,779.59595,1,11.89504,10,10,10,10,-3.0660,-1.2648,4.1758,6.0951,4795.0,10832.0,608,70,15454.0,20,815.0,-0.00072,0.06511,4.01474,0,301.01636,-0.00297,16495.0,301.35992,305.03113,0,10052.0,27610.0,0,2020,7,11,0,0,0,28,5
1,2020-07-11 00:00:01,0.10121,1.29692,397.63785,1001.99799,33.6555,100.0,2839.58520,37.14477,100.0,100.0,0.0,-1.88294,51.60648,52.78931,168.64778,819.16809,1975.47900,2839.58520,246.43968,1000.01270,8.78811,8.47015,395.14420,39.05680,40.49072,12.0,12.01782,1.36810,0.27634,540833,540833,0,0,1,1,1,35.45227,35.74219,28.02473,...,781.0,780.67328,1,11.93421,10,10,10,10,-2.9721,-1.3147,3.9259,5.9262,4835.0,10984.0,528,70,15461.0,20,883.0,-0.00051,0.04340,3.74347,0,297.43567,0.00072,16402.0,297.43567,304.27161,0,10052.0,27610.0,0,2020,7,11,0,0,1,28,5
2,2020-07-11 00:00:02,0.10121,1.29631,397.63785,1001.99799,33.6555,100.0,2833.26807,37.14325,100.0,100.0,0.0,-1.88294,51.57790,52.79694,168.83849,823.51697,1972.42725,2833.26807,246.05821,1000.88245,8.81787,8.47015,395.14420,38.97124,40.49835,12.0,12.01782,1.36734,0.27634,540833,540833,0,0,1,1,1,35.45227,35.74219,28.02817,...,780.0,780.06574,1,11.97030,10,10,10,10,-2.9857,-1.4032,3.6489,5.8101,4961.0,11120.0,464,70,15462.0,20,956.0,-0.00043,0.04340,3.43603,0,298.84619,-0.00145,16379.0,298.66534,303.89179,0,10050.0,27617.0,0,2020,7,11,0,0,2,28,5
3,2020-07-11 00:00:03,0.10121,1.28685,397.63785,1001.99799,33.6555,100.0,2834.95264,37.11959,100.0,100.0,0.0,-1.88294,51.58236,52.79694,170.55510,823.95172,1983.10828,2834.95264,246.63045,1000.88245,8.87493,8.46252,395.19528,38.94103,40.49072,12.0,12.01782,1.36734,0.27634,540833,540833,0,0,1,1,1,35.43700,35.74219,28.02301,...,780.0,780.15265,1,12.01066,10,10,10,10,-3.2166,-1.6074,3.3040,5.7509,5022.0,11256.0,416,70,15466.0,20,992.0,-0.00072,0.02170,3.12860,0,297.74310,-0.00318,16422.0,298.06860,303.67474,0,10052.0,27614.0,0,2020,7,11,0,0,3,28,5
4,2020-07-11 00:00:04,0.10121,1.28807,397.63785,1001.99799,33.6555,100.0,2832.70654,37.12265,100.0,100.0,0.0,-1.88294,51.62335,52.79694,171.31805,827.86560,1986.16016,2832.70654,246.24898,1000.01270,8.83838,8.47015,395.34866,38.90300,40.49835,12.0,12.01782,1.36810,0.27710,540833,540833,0,0,1,1,1,35.45227,35.74219,28.03595,...,782.0,781.83160,1,11.99684,10,10,10,10,-3.5613,-1.7811,3.1881,5.8547,5088.0,11384.0,368,70,15461.0,20,1074.0,-0.00051,0.02170,2.87546,0,297.01965,0.00015,16355.0,296.53137,303.22266,0,10052.0,27621.0,0,2020,7,11,0,0,4,28,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
334797,2020-07-10 23:59:56,0.10121,1.30608,397.63785,1001.99799,33.6555,100.0,2830.74121,37.16766,100.0,100.0,0.0,-1.87531,51.51111,52.78931,165.21454,811.77527,1974.14404,2830.74121,246.24898,1001.75220,8.68301,7.40967,395.19528,39.21347,40.49072,12.0,12.01782,1.38260,0.27557,540833,540833,0,0,1,1,1,35.42175,35.74219,28.02740,...,782.0,781.57507,1,11.97952,10,10,10,10,-3.7334,-1.8377,3.2125,5.9391,4554.0,10240.0,912,70,15483.0,20,605.0,-0.00072,0.13745,5.62430,0,302.20990,0.00036,16589.0,300.88971,307.52679,0,10052.0,27614.0,0,2020,7,10,23,59,56,28,4
334798,2020-07-10 23:59:57,0.10121,1.30226,397.63785,1001.99799,33.6555,100.0,2839.72559,37.15812,100.0,100.0,0.0,-1.87531,51.54862,52.79694,165.21454,815.68915,1986.35107,2839.72559,246.43968,1000.88245,8.71395,7.40204,395.19528,39.14607,40.48309,12.0,12.01019,1.38184,0.27634,540833,540833,0,0,1,1,1,35.43700,35.74219,28.02558,...,783.0,783.02020,1,11.90332,10,10,10,10,-4.0971,-1.8503,3.6104,6.3006,4626.0,10424.0,800,70,15483.0,20,661.0,-0.00072,0.11575,5.19025,0,300.47382,0.00101,16496.0,299.84088,306.17041,0,10052.0,27616.0,0,2020,7,10,23,59,57,28,4
334799,2020-07-10 23:59:58,0.10121,1.29707,397.63785,1001.99799,33.6555,100.0,2833.96997,37.14515,100.0,100.0,0.0,-1.88294,51.57135,52.79694,165.78673,815.68915,1991.88232,2833.96997,246.24898,1000.44769,8.71927,7.50885,395.24640,39.11773,40.49072,12.0,12.01782,1.38031,0.27557,540833,540833,0,0,1,1,1,35.42175,35.72693,28.02389,...,780.0,780.31110,1,11.82080,10,10,10,10,-3.9610,-1.6205,4.1659,6.5001,4668.0,10592.0,736,70,15476.0,20,698.0,-0.00072,0.09402,4.70197,0,299.98554,-0.00072,16476.0,300.09406,305.48321,0,10052.0,27610.0,0,2020,7,10,23,59,58,28,4
334800,2020-07-10 23:59:59,0.10121,1.29906,397.63785,1001.99799,33.6555,100.0,2842.95435,37.15011,100.0,100.0,0.0,-1.88294,51.55929,52.79694,161.97201,816.99371,1981.77331,2842.95435,246.05821,1000.88245,8.74994,8.38623,395.19528,39.07794,40.49072,12.0,12.01782,1.36887,0.27634,540833,540833,0,0,1,1,1,35.43700,35.72693,28.02909,...,782.0,781.73102,1,11.83400,10,10,10,10,-3.4992,-1.3642,4.4217,6.4066,4767.0,10752.0,688,70,15458.0,20,773.0,-0.00043,0.06511,4.32217,0,300.87164,-0.00275,16408.0,299.58765,304.90454,0,10052.0,27617.0,0,2020,7,10,23,59,59,28,4


In [51]:
# Use the dt accessor on test to extract year, month, day, hour, minute, second, week, and weekday as new columns
# series.dt.year
test['year'] = test['time'].dt.year
test['month'] = test['time'].dt.month
test['day'] = test['time'].dt.day
test['hour'] = test['time'].dt.hour
test['minute'] = test['time'].dt.minute
test['second'] = test['time'].dt.second
# .dt.week is deprecated (removed in pandas 2.0); dt.isocalendar().week is the replacement
test['week'] = test['time'].dt.isocalendar().week.astype(int)
test['weekday'] = test['time'].dt.weekday
test



Unnamed: 0,time,P1_B2004,P1_B2016,P1_B3004,P1_B3005,P1_B4002,P1_B4005,P1_B400B,P1_B4022,P1_FCV01D,P1_FCV01Z,P1_FCV02D,P1_FCV02Z,P1_FCV03D,P1_FCV03Z,P1_FT01,P1_FT01Z,P1_FT02,P1_FT02Z,P1_FT03,P1_FT03Z,P1_LCV01D,P1_LCV01Z,P1_LIT01,P1_PCV01D,P1_PCV01Z,P1_PCV02D,P1_PCV02Z,P1_PIT01,P1_PIT02,P1_PP01AD,P1_PP01AR,P1_PP01BD,P1_PP01BR,P1_PP02D,P1_PP02R,P1_STSP,P1_TIT01,P1_TIT02,P2_24Vdc,...,P2_RTR,P2_SIT01,P2_SIT02,P2_TripEx,P2_VT01,P2_VTR01,P2_VTR02,P2_VTR03,P2_VTR04,P2_VXT02,P2_VXT03,P2_VYT02,P2_VYT03,P3_FIT01,P3_LCP01D,P3_LCV01D,P3_LH,P3_LIT01,P3_LL,P3_PIT01,P4_HT_FD,P4_HT_LD,P4_HT_PO,P4_HT_PS,P4_LD,P4_ST_FD,P4_ST_GOV,P4_ST_LD,P4_ST_PO,P4_ST_PS,P4_ST_PT01,P4_ST_TT01,year,month,day,hour,minute,second,week,weekday
0,2020-07-07 15:00:00,0.10178,1.58771,403.78854,985.37353,32.59527,100.00000,2839.58520,36.81010,100.00000,99.91608,0.0,-1.86768,50.90726,51.95007,176.08643,845.69550,1978.72156,2843.37549,243.38802,989.14117,10.89290,10.8429,402.70947,40.74125,41.32233,12.0,12.26196,1.34293,0.27557,540833,540833,0,0,1,1,1,34.88770,35.14710,28.03162,...,2880,790.0,789.76508,1,11.91040,10,10,10,10,-2.8687,-1.0189,3.7751,5.6330,-25.0,688,15888,70,18082.0,20,-23.0,0.00029,76.80121,73.58581,0,464.06610,0.00470,20469.0,386.26666,380.31683,0,10044.0,27567.0,2020,7,7,15,0,0,28,1
1,2020-07-07 15:00:01,0.10178,1.58725,403.78854,985.37353,32.59527,100.00000,2843.37549,36.80895,100.00000,99.91608,0.0,-1.86768,50.74607,51.96533,173.79756,840.47705,1986.92322,2845.06006,243.00656,992.62018,10.80512,10.8429,402.81174,40.86124,41.32233,12.0,12.26196,1.34216,0.27710,540833,540833,0,0,1,1,1,34.88770,35.14710,28.02301,...,2880,789.0,789.13147,1,11.98856,10,10,10,10,-2.9842,-1.2637,3.1689,5.4158,-25.0,648,15952,70,18043.0,20,-23.0,0.00051,76.92419,73.89325,0,464.22888,0.00210,20489.0,386.30286,380.02747,0,10040.0,27564.0,2020,7,7,15,0,1,28,1
2,2020-07-07 15:00:02,0.10178,1.59519,403.78854,985.37353,32.59527,100.00000,2845.06006,36.82879,100.00000,99.91608,0.0,-1.86768,50.66229,51.96533,174.56052,835.25842,1978.72156,2837.33911,242.81586,993.92468,10.80029,10.8429,402.76062,41.02906,41.32233,12.0,12.26196,1.34369,0.27710,540833,540833,0,0,1,1,1,34.88770,35.14710,28.02993,...,2880,786.0,785.81653,1,11.97400,10,10,10,10,-3.4939,-1.5398,2.9615,5.5532,-25.0,616,16000,70,18024.0,20,-23.0,0.00022,77.04715,74.20068,0,466.90533,0.00130,20604.0,389.73883,381.52850,0,10037.0,27565.0,2020,7,7,15,0,2,28,1
3,2020-07-07 15:00:03,0.10178,1.59747,403.78854,985.37353,32.59527,100.00000,2837.33911,36.83451,100.00000,99.91608,0.0,-1.86768,50.66462,51.98822,176.65860,836.99799,1977.76782,2843.37549,242.43439,993.05493,10.80579,10.8429,402.81174,41.15958,41.32233,12.0,12.26196,1.34445,0.27557,540833,540833,0,0,1,1,1,34.88770,35.14710,28.02993,...,2880,785.0,785.42438,1,11.92999,10,10,10,10,-3.8188,-1.6212,3.1285,5.7833,-25.0,584,16064,70,17985.0,20,-23.0,0.00022,77.17014,74.43579,0,466.79688,0.00000,20633.0,388.94311,382.08911,0,10040.0,27560.0,2020,7,7,15,0,3,28,1
4,2020-07-07 15:00:04,0.10178,1.59869,403.78854,985.37353,32.59527,100.00000,2843.37549,36.83756,100.00000,99.90845,0.0,-1.86768,50.65214,51.90429,175.89565,841.78162,1972.42725,2837.33911,242.81586,992.62018,10.81415,10.8429,402.91394,41.28887,41.21552,12.0,12.26196,1.34293,0.27710,540833,540833,0,0,1,1,1,34.90295,35.14710,28.02990,...,2880,783.0,782.99249,1,11.86934,10,10,10,10,-3.9858,-1.5631,3.4986,6.0309,-25.0,552,16112,70,17954.0,20,-23.0,0.00000,77.29312,74.74322,0,467.88190,-0.00043,20738.0,389.72082,383.44543,0,10042.0,27564.0,2020,7,7,15,0,4,28,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
190798,2020-07-28 22:59:56,0.10111,1.27011,403.41025,1091.40076,31.82315,7.86445,160.08667,35.24566,14.38350,13.94043,100.0,97.20001,60.49295,61.05957,202.02638,903.09906,30.17423,153.13467,299.08084,1091.70093,12.98609,12.5061,402.24945,90.08597,92.12646,12.0,12.01782,1.40091,0.17487,540833,540833,0,0,1,1,1,35.49804,36.42883,28.02216,...,2880,792.0,792.24969,1,11.99989,10,10,10,10,-2.8153,-1.0768,3.3087,5.5195,4916.0,11632,272,70,12480.0,20,1295.0,-0.00051,-0.00723,2.51376,10,291.35925,-0.00007,16056.0,291.19647,297.79730,50,10052.0,27619.0,2020,7,28,22,59,56,31,1
190799,2020-07-28 22:59:57,0.10111,1.26782,403.41025,1091.40076,31.82315,7.84770,153.13467,35.23994,14.34272,13.94043,100.0,97.20001,60.51854,61.05957,203.93370,899.62018,29.02986,157.10713,299.27155,1091.06311,13.02108,12.5061,402.19830,90.01218,92.11883,12.0,12.01782,1.40015,0.17410,540833,540833,0,0,1,1,1,35.49804,36.41358,28.02996,...,2880,797.0,796.51233,1,11.91820,10,10,10,10,-3.7239,-1.4087,3.3751,5.9431,4953.0,11736,224,70,12478.0,20,1329.0,-0.00051,-0.00723,2.26056,10,291.46771,0.00029,16030.0,291.54010,297.36328,50,10052.0,27621.0,2020,7,28,22,59,57,31,1
190800,2020-07-28 22:59:58,0.10111,1.26279,403.41025,1091.40076,31.82315,7.80868,157.10713,35.22735,14.41761,13.94043,100.0,97.20001,60.50650,61.05957,205.07814,903.96881,30.36500,151.14882,299.27155,1091.38208,13.01329,12.5061,402.30057,89.92320,92.01965,12.0,12.01782,1.39862,0.17334,540833,540833,0,0,1,1,1,35.49804,36.42883,28.02905,...,2880,795.0,795.43524,1,11.81432,10,10,10,10,-3.5588,-1.0689,4.2203,6.2678,4954.0,11848,192,70,12479.0,20,1377.0,-0.00051,-0.00723,2.06162,10,290.78052,-0.00007,16036.0,290.76245,297.21863,50,10052.0,27622.0,2020,7,28,22,59,58,31,1
190801,2020-07-28 22:59:59,0.10111,1.26370,403.41025,1091.40076,31.82315,7.78401,151.14882,35.22964,14.36758,13.94043,100.0,97.20001,60.50658,61.05194,203.17075,906.57819,29.22057,154.12797,298.89008,1091.38208,13.03138,12.5061,402.19830,89.86734,92.00440,12.0,12.01782,1.39862,0.17563,540833,540833,0,0,1,1,1,35.51330,36.42883,28.02990,...,2880,796.0,795.59662,1,11.90042,10,10,10,10,-2.7499,-0.7888,4.1070,5.8461,4955.0,11944,160,70,12479.0,20,1426.0,-0.00029,-0.00723,1.88077,10,290.47308,-0.00094,15946.0,290.61774,295.95270,50,10052.0,27621.0,2020,7,28,22,59,59,31,1
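
The same eight columns are derived for both train and test, so the logic can live in one function. A minimal sketch, assuming a hypothetical helper named `add_time_features` and a two-row toy frame whose timestamps are taken from the outputs above:

```python
import pandas as pd

def add_time_features(df, time_col="time"):
    """Return a copy of df with calendar features derived from `time_col` (hypothetical helper)."""
    out = df.copy()
    ts = pd.to_datetime(out[time_col])
    for part in ["year", "month", "day", "hour", "minute", "second"]:
        out[part] = getattr(ts.dt, part)
    # ISO week number; replaces the deprecated .dt.week (needs pandas >= 1.1)
    out["week"] = ts.dt.isocalendar().week.astype(int)
    # Monday is 0, Sunday is 6
    out["weekday"] = ts.dt.weekday
    return out

toy = pd.DataFrame({"time": ["2020-07-11 00:00:00", "2020-07-28 22:59:59"]})
toy = add_time_features(toy)
print(toy[["day", "hour", "week", "weekday"]])
```

With this helper, `train = add_time_features(train)` and `test = add_time_features(test)` would keep the two frames in step.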


In [54]:
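# Append year, month, day, hour, minute, second, week, and weekday to the existing target_cols list.
  # The next cell binds the extended list to target_col2.
# list.extend() mutates target_cols in place and returns None, so its result must not be assigned.
target_cols.extend(['year', 'month', 'day', 'hour', 'minute', 'second', 'week', 'weekday'])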

In [58]:
target_col2 = target_cols
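
`list.extend` modifies the list in place and returns `None`, which is why `target_col2` is bound to `target_cols` in a separate step. A small sketch with made-up column lists showing the difference between `extend` and `+`:

```python
cols = ["P1_B2004", "P2_ASD"]
time_cols = ["year", "month"]

# extend() mutates cols in place; its return value is None
result = cols.extend(time_cols)
print(result)
print(cols)

# The + operator builds a new list and leaves both operands untouched
combined = ["P1_B2004", "P2_ASD"] + time_cols
print(combined)
```

Re-running a cell that calls `extend` appends the names again, so the `+` form is the safer choice in a notebook.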

In [59]:
# Select only the target_col2 columns from train and store them in X
X = train[list(target_col2)]
X

Unnamed: 0,P1_B2004,P1_B2016,P1_B3004,P1_B3005,P1_B4002,P1_B4005,P1_B400B,P1_B4022,P1_FCV01D,P1_FCV01Z,P1_FCV02D,P1_FCV02Z,P1_FCV03D,P1_FCV03Z,P1_FT01,P1_FT01Z,P1_FT02,P1_FT02Z,P1_FT03,P1_FT03Z,P1_LCV01D,P1_LCV01Z,P1_LIT01,P1_PCV01D,P1_PCV01Z,P1_PCV02D,P1_PCV02Z,P1_PIT01,P1_PIT02,P1_PP01AD,P1_PP01AR,P1_PP01BD,P1_PP01BR,P1_PP02D,P1_PP02R,P1_STSP,P1_TIT01,P1_TIT02,P2_24Vdc,P2_ASD,...,P2_RTR,P2_SIT01,P2_SIT02,P2_TripEx,P2_VT01,P2_VTR01,P2_VTR02,P2_VTR03,P2_VTR04,P2_VXT02,P2_VXT03,P2_VYT02,P2_VYT03,P3_FIT01,P3_LCP01D,P3_LCV01D,P3_LH,P3_LIT01,P3_LL,P3_PIT01,P4_HT_FD,P4_HT_LD,P4_HT_PO,P4_HT_PS,P4_LD,P4_ST_FD,P4_ST_GOV,P4_ST_LD,P4_ST_PO,P4_ST_PS,P4_ST_PT01,P4_ST_TT01,year,month,day,hour,minute,second,week,weekday
0,0.10121,1.29784,397.63785,1001.99799,33.6555,100.0,2847.02539,37.14706,100.0,100.0,0.0,-1.87531,51.58201,52.80456,166.74039,808.29620,1973.19031,2847.02539,246.43968,1000.44769,8.79882,8.46252,395.19528,39.09198,40.49072,12.0,12.01782,1.36810,0.27786,540833,540833,0,0,1,1,1,35.43700,35.74219,28.02645,0,...,2880,780.0,779.59595,1,11.89504,10,10,10,10,-3.0660,-1.2648,4.1758,6.0951,4795.0,10832.0,608,70,15454.0,20,815.0,-0.00072,0.06511,4.01474,0,301.01636,-0.00297,16495.0,301.35992,305.03113,0,10052.0,27610.0,2020,7,11,0,0,0,28,5
1,0.10121,1.29692,397.63785,1001.99799,33.6555,100.0,2839.58520,37.14477,100.0,100.0,0.0,-1.88294,51.60648,52.78931,168.64778,819.16809,1975.47900,2839.58520,246.43968,1000.01270,8.78811,8.47015,395.14420,39.05680,40.49072,12.0,12.01782,1.36810,0.27634,540833,540833,0,0,1,1,1,35.45227,35.74219,28.02473,0,...,2880,781.0,780.67328,1,11.93421,10,10,10,10,-2.9721,-1.3147,3.9259,5.9262,4835.0,10984.0,528,70,15461.0,20,883.0,-0.00051,0.04340,3.74347,0,297.43567,0.00072,16402.0,297.43567,304.27161,0,10052.0,27610.0,2020,7,11,0,0,1,28,5
2,0.10121,1.29631,397.63785,1001.99799,33.6555,100.0,2833.26807,37.14325,100.0,100.0,0.0,-1.88294,51.57790,52.79694,168.83849,823.51697,1972.42725,2833.26807,246.05821,1000.88245,8.81787,8.47015,395.14420,38.97124,40.49835,12.0,12.01782,1.36734,0.27634,540833,540833,0,0,1,1,1,35.45227,35.74219,28.02817,0,...,2880,780.0,780.06574,1,11.97030,10,10,10,10,-2.9857,-1.4032,3.6489,5.8101,4961.0,11120.0,464,70,15462.0,20,956.0,-0.00043,0.04340,3.43603,0,298.84619,-0.00145,16379.0,298.66534,303.89179,0,10050.0,27617.0,2020,7,11,0,0,2,28,5
3,0.10121,1.28685,397.63785,1001.99799,33.6555,100.0,2834.95264,37.11959,100.0,100.0,0.0,-1.88294,51.58236,52.79694,170.55510,823.95172,1983.10828,2834.95264,246.63045,1000.88245,8.87493,8.46252,395.19528,38.94103,40.49072,12.0,12.01782,1.36734,0.27634,540833,540833,0,0,1,1,1,35.43700,35.74219,28.02301,0,...,2880,780.0,780.15265,1,12.01066,10,10,10,10,-3.2166,-1.6074,3.3040,5.7509,5022.0,11256.0,416,70,15466.0,20,992.0,-0.00072,0.02170,3.12860,0,297.74310,-0.00318,16422.0,298.06860,303.67474,0,10052.0,27614.0,2020,7,11,0,0,3,28,5
4,0.10121,1.28807,397.63785,1001.99799,33.6555,100.0,2832.70654,37.12265,100.0,100.0,0.0,-1.88294,51.62335,52.79694,171.31805,827.86560,1986.16016,2832.70654,246.24898,1000.01270,8.83838,8.47015,395.34866,38.90300,40.49835,12.0,12.01782,1.36810,0.27710,540833,540833,0,0,1,1,1,35.45227,35.74219,28.03595,0,...,2880,782.0,781.83160,1,11.99684,10,10,10,10,-3.5613,-1.7811,3.1881,5.8547,5088.0,11384.0,368,70,15461.0,20,1074.0,-0.00051,0.02170,2.87546,0,297.01965,0.00015,16355.0,296.53137,303.22266,0,10052.0,27621.0,2020,7,11,0,0,4,28,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
334797,0.10121,1.30608,397.63785,1001.99799,33.6555,100.0,2830.74121,37.16766,100.0,100.0,0.0,-1.87531,51.51111,52.78931,165.21454,811.77527,1974.14404,2830.74121,246.24898,1001.75220,8.68301,7.40967,395.19528,39.21347,40.49072,12.0,12.01782,1.38260,0.27557,540833,540833,0,0,1,1,1,35.42175,35.74219,28.02740,0,...,2880,782.0,781.57507,1,11.97952,10,10,10,10,-3.7334,-1.8377,3.2125,5.9391,4554.0,10240.0,912,70,15483.0,20,605.0,-0.00072,0.13745,5.62430,0,302.20990,0.00036,16589.0,300.88971,307.52679,0,10052.0,27614.0,2020,7,10,23,59,56,28,4
334798,0.10121,1.30226,397.63785,1001.99799,33.6555,100.0,2839.72559,37.15812,100.0,100.0,0.0,-1.87531,51.54862,52.79694,165.21454,815.68915,1986.35107,2839.72559,246.43968,1000.88245,8.71395,7.40204,395.19528,39.14607,40.48309,12.0,12.01019,1.38184,0.27634,540833,540833,0,0,1,1,1,35.43700,35.74219,28.02558,0,...,2880,783.0,783.02020,1,11.90332,10,10,10,10,-4.0971,-1.8503,3.6104,6.3006,4626.0,10424.0,800,70,15483.0,20,661.0,-0.00072,0.11575,5.19025,0,300.47382,0.00101,16496.0,299.84088,306.17041,0,10052.0,27616.0,2020,7,10,23,59,57,28,4
334799,0.10121,1.29707,397.63785,1001.99799,33.6555,100.0,2833.96997,37.14515,100.0,100.0,0.0,-1.88294,51.57135,52.79694,165.78673,815.68915,1991.88232,2833.96997,246.24898,1000.44769,8.71927,7.50885,395.24640,39.11773,40.49072,12.0,12.01782,1.38031,0.27557,540833,540833,0,0,1,1,1,35.42175,35.72693,28.02389,0,...,2880,780.0,780.31110,1,11.82080,10,10,10,10,-3.9610,-1.6205,4.1659,6.5001,4668.0,10592.0,736,70,15476.0,20,698.0,-0.00072,0.09402,4.70197,0,299.98554,-0.00072,16476.0,300.09406,305.48321,0,10052.0,27610.0,2020,7,10,23,59,58,28,4
334800,0.10121,1.29906,397.63785,1001.99799,33.6555,100.0,2842.95435,37.15011,100.0,100.0,0.0,-1.88294,51.55929,52.79694,161.97201,816.99371,1981.77331,2842.95435,246.05821,1000.88245,8.74994,8.38623,395.19528,39.07794,40.49072,12.0,12.01782,1.36887,0.27634,540833,540833,0,0,1,1,1,35.43700,35.72693,28.02909,0,...,2880,782.0,781.73102,1,11.83400,10,10,10,10,-3.4992,-1.3642,4.4217,6.4066,4767.0,10752.0,688,70,15458.0,20,773.0,-0.00043,0.06511,4.32217,0,300.87164,-0.00275,16408.0,299.58765,304.90454,0,10052.0,27617.0,2020,7,10,23,59,59,28,4


In [73]:
# Check the variance of each column in X.
  # Sort the result in ascending order.
X.var().sort_values(ascending=True)

P2_RTR       0.000000e+00
P1_PP01AD    0.000000e+00
P2_TripEx    0.000000e+00
P1_PP01AR    0.000000e+00
P2_VTR01     0.000000e+00
                 ...     
P3_FIT01     3.199248e+06
P4_ST_GOV    3.257661e+06
P3_LIT01     1.777464e+07
P3_LCP01D    2.622846e+07
P3_LCV01D    4.655434e+07
Length: 87, dtype: float64

In [79]:
# List the columns whose variance is 0.
## A zero-variance column holds a single constant value, so it carries no information for ML and is best removed
X.var()[X.var() == 0]

P1_PP01AD      0.0
P1_PP01AR      0.0
P1_PP01BD      0.0
P1_PP01BR      0.0
P1_PP02D       0.0
P1_PP02R       0.0
P1_STSP        0.0
P2_ASD         0.0
P2_AutoGO      0.0
P2_ManualGO    0.0
P2_RTR         0.0
P2_TripEx      0.0
P2_VTR01       0.0
P2_VTR02       0.0
P2_VTR03       0.0
P2_VTR04       0.0
P3_LH          0.0
P3_LL          0.0
year           0.0
month          0.0
dtype: float64
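
A minimal sketch of the zero-variance filter on a toy frame. The values are made up and the column names are borrowed from the output above only for readability.

```python
import pandas as pd

toy = pd.DataFrame({
    "P1_STSP": [1, 1, 1, 1],                  # constant
    "P1_FT01": [166.7, 168.6, 168.8, 170.6],  # varies
    "year": [2020, 2020, 2020, 2020],         # constant
})

# Variance per column; constant columns come out as exactly 0
variances = toy.var()
constant_cols = variances[variances == 0].index

# drop returns a new frame, so nothing is modified in place
reduced = toy.drop(columns=constant_cols)
print(list(constant_cols))
print(reduced.columns.to_list())
```

The list of constant columns should be derived from train and then applied to test as well (for example `test.drop(columns=constant_cols)`), so that both frames keep the same features.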

In [81]:
# Use the index of the result above to drop those columns from X.
  # drop removes only the columns listed in the zero-variance index
  # Reassigning the result (instead of inplace=True) avoids pandas' SettingWithCopyWarning, because X is a slice of train
X = X.drop(X.var()[X.var() == 0].index, axis=1)
X


Unnamed: 0,P1_B2004,P1_B2016,P1_B3004,P1_B3005,P1_B4002,P1_B4005,P1_B400B,P1_B4022,P1_FCV01D,P1_FCV01Z,P1_FCV02D,P1_FCV02Z,P1_FCV03D,P1_FCV03Z,P1_FT01,P1_FT01Z,P1_FT02,P1_FT02Z,P1_FT03,P1_FT03Z,P1_LCV01D,P1_LCV01Z,P1_LIT01,P1_PCV01D,P1_PCV01Z,P1_PCV02D,P1_PCV02Z,P1_PIT01,P1_PIT02,P1_TIT01,P1_TIT02,P2_24Vdc,P2_CO_rpm,P2_Emerg,P2_HILout,P2_MSD,P2_OnOff,P2_SIT01,P2_SIT02,P2_VT01,P2_VXT02,P2_VXT03,P2_VYT02,P2_VYT03,P3_FIT01,P3_LCP01D,P3_LCV01D,P3_LIT01,P3_PIT01,P4_HT_FD,P4_HT_LD,P4_HT_PO,P4_HT_PS,P4_LD,P4_ST_FD,P4_ST_GOV,P4_ST_LD,P4_ST_PO,P4_ST_PS,P4_ST_PT01,P4_ST_TT01,day,hour,minute,second,week,weekday
0,0.10121,1.29784,397.63785,1001.99799,33.6555,100.0,2847.02539,37.14706,100.0,100.0,0.0,-1.87531,51.58201,52.80456,166.74039,808.29620,1973.19031,2847.02539,246.43968,1000.44769,8.79882,8.46252,395.19528,39.09198,40.49072,12.0,12.01782,1.36810,0.27786,35.43700,35.74219,28.02645,54074.0,0,712.07275,763.19324,1,780.0,779.59595,11.89504,-3.0660,-1.2648,4.1758,6.0951,4795.0,10832.0,608,15454.0,815.0,-0.00072,0.06511,4.01474,0,301.01636,-0.00297,16495.0,301.35992,305.03113,0,10052.0,27610.0,11,0,0,0,28,5
1,0.10121,1.29692,397.63785,1001.99799,33.6555,100.0,2839.58520,37.14477,100.0,100.0,0.0,-1.88294,51.60648,52.78931,168.64778,819.16809,1975.47900,2839.58520,246.43968,1000.01270,8.78811,8.47015,395.14420,39.05680,40.49072,12.0,12.01782,1.36810,0.27634,35.45227,35.74219,28.02473,54089.0,0,708.52661,763.19324,1,781.0,780.67328,11.93421,-2.9721,-1.3147,3.9259,5.9262,4835.0,10984.0,528,15461.0,883.0,-0.00051,0.04340,3.74347,0,297.43567,0.00072,16402.0,297.43567,304.27161,0,10052.0,27610.0,11,0,0,1,28,5
2,0.10121,1.29631,397.63785,1001.99799,33.6555,100.0,2833.26807,37.14325,100.0,100.0,0.0,-1.88294,51.57790,52.79694,168.83849,823.51697,1972.42725,2833.26807,246.05821,1000.88245,8.81787,8.47015,395.14420,38.97124,40.49835,12.0,12.01782,1.36734,0.27634,35.45227,35.74219,28.02817,54124.0,0,709.15527,763.19324,1,780.0,780.06574,11.97030,-2.9857,-1.4032,3.6489,5.8101,4961.0,11120.0,464,15462.0,956.0,-0.00043,0.04340,3.43603,0,298.84619,-0.00145,16379.0,298.66534,303.89179,0,10050.0,27617.0,11,0,0,2,28,5
3,0.10121,1.28685,397.63785,1001.99799,33.6555,100.0,2834.95264,37.11959,100.0,100.0,0.0,-1.88294,51.58236,52.79694,170.55510,823.95172,1983.10828,2834.95264,246.63045,1000.88245,8.87493,8.46252,395.19528,38.94103,40.49072,12.0,12.01782,1.36734,0.27634,35.43700,35.74219,28.02301,54099.0,0,715.46631,763.19324,1,780.0,780.15265,12.01066,-3.2166,-1.6074,3.3040,5.7509,5022.0,11256.0,416,15466.0,992.0,-0.00072,0.02170,3.12860,0,297.74310,-0.00318,16422.0,298.06860,303.67474,0,10052.0,27614.0,11,0,0,3,28,5
4,0.10121,1.28807,397.63785,1001.99799,33.6555,100.0,2832.70654,37.12265,100.0,100.0,0.0,-1.88294,51.62335,52.79694,171.31805,827.86560,1986.16016,2832.70654,246.24898,1000.01270,8.83838,8.47015,395.34866,38.90300,40.49835,12.0,12.01782,1.36810,0.27710,35.45227,35.74219,28.03595,54094.0,0,709.22852,763.19324,1,782.0,781.83160,11.99684,-3.5613,-1.7811,3.1881,5.8547,5088.0,11384.0,368,15461.0,1074.0,-0.00051,0.02170,2.87546,0,297.01965,0.00015,16355.0,296.53137,303.22266,0,10052.0,27621.0,11,0,0,4,28,5
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
334797,0.10121,1.30608,397.63785,1001.99799,33.6555,100.0,2830.74121,37.16766,100.0,100.0,0.0,-1.87531,51.51111,52.78931,165.21454,811.77527,1974.14404,2830.74121,246.24898,1001.75220,8.68301,7.40967,395.19528,39.21347,40.49072,12.0,12.01782,1.38260,0.27557,35.42175,35.74219,28.02740,54132.0,0,719.06128,763.19324,1,782.0,781.57507,11.97952,-3.7334,-1.8377,3.2125,5.9391,4554.0,10240.0,912,15483.0,605.0,-0.00072,0.13745,5.62430,0,302.20990,0.00036,16589.0,300.88971,307.52679,0,10052.0,27614.0,10,23,59,56,28,4
334798,0.10121,1.30226,397.63785,1001.99799,33.6555,100.0,2839.72559,37.15812,100.0,100.0,0.0,-1.87531,51.54862,52.79694,165.21454,815.68915,1986.35107,2839.72559,246.43968,1000.88245,8.71395,7.40204,395.19528,39.14607,40.48309,12.0,12.01019,1.38184,0.27634,35.43700,35.74219,28.02558,54156.0,0,705.58472,763.19324,1,783.0,783.02020,11.90332,-4.0971,-1.8503,3.6104,6.3006,4626.0,10424.0,800,15483.0,661.0,-0.00072,0.11575,5.19025,0,300.47382,0.00101,16496.0,299.84088,306.17041,0,10052.0,27616.0,10,23,59,57,28,4
334799,0.10121,1.29707,397.63785,1001.99799,33.6555,100.0,2833.96997,37.14515,100.0,100.0,0.0,-1.88294,51.57135,52.79694,165.78673,815.68915,1991.88232,2833.96997,246.24898,1000.44769,8.71927,7.50885,395.24640,39.11773,40.49072,12.0,12.01782,1.38031,0.27557,35.42175,35.72693,28.02389,54140.0,0,704.98657,763.19324,1,780.0,780.31110,11.82080,-3.9610,-1.6205,4.1659,6.5001,4668.0,10592.0,736,15476.0,698.0,-0.00072,0.09402,4.70197,0,299.98554,-0.00072,16476.0,300.09406,305.48321,0,10052.0,27610.0,10,23,59,58,28,4
334800,0.10121,1.29906,397.63785,1001.99799,33.6555,100.0,2842.95435,37.15011,100.0,100.0,0.0,-1.88294,51.55929,52.79694,161.97201,816.99371,1981.77331,2842.95435,246.05821,1000.88245,8.74994,8.38623,395.19528,39.07794,40.49072,12.0,12.01782,1.36887,0.27634,35.43700,35.72693,28.02909,54127.0,0,711.99341,763.19324,1,782.0,781.73102,11.83400,-3.4992,-1.3642,4.4217,6.4066,4767.0,10752.0,688,15458.0,773.0,-0.00043,0.06511,4.32217,0,300.87164,-0.00275,16408.0,299.58765,304.90454,0,10052.0,27617.0,10,23,59,59,28,4


In [82]:
# Train a model using LogisticRegression.
  # Set random_state to 0.
model = LogisticRegression(random_state=0).fit(X, Y)

STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
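
The message above reports that the solver stopped at its iteration limit and suggests raising `max_iter` or scaling the data. The sketch below combines both in a scikit-learn pipeline. It runs on synthetic data, so `X_demo`, `y_demo`, and `scaled_model` are hypothetical names, and `max_iter=1000` is an arbitrary illustrative value rather than a tuned setting.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical): two features on very different scales.
rng = np.random.default_rng(0)
X_demo = np.column_stack([
    rng.normal(54000.0, 50.0, size=500),  # large-valued reading
    rng.normal(0.1, 0.01, size=500),      # small-valued reading
])
y_demo = (X_demo[:, 1] > 0.1).astype(int)

# The scaler is fitted on the training data only; the same fitted scaler is
# reused automatically whenever the pipeline predicts on new data.
scaled_model = make_pipeline(
    StandardScaler(),
    LogisticRegression(random_state=0, max_iter=1000),
)
scaled_model.fit(X_demo, y_demo)

train_accuracy = scaled_model.score(X_demo, y_demo)
print(train_accuracy)
```

With a pipeline, a call such as `scaled_model.predict(test[X.columns])` would apply the training-set scaling to the test rows without a separate transform step.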


In [83]:
# Extract the column names of the X DataFrame
X.columns

Index(['P1_B2004', 'P1_B2016', 'P1_B3004', 'P1_B3005', 'P1_B4002', 'P1_B4005',
       'P1_B400B', 'P1_B4022', 'P1_FCV01D', 'P1_FCV01Z', 'P1_FCV02D',
       'P1_FCV02Z', 'P1_FCV03D', 'P1_FCV03Z', 'P1_FT01', 'P1_FT01Z', 'P1_FT02',
       'P1_FT02Z', 'P1_FT03', 'P1_FT03Z', 'P1_LCV01D', 'P1_LCV01Z', 'P1_LIT01',
       'P1_PCV01D', 'P1_PCV01Z', 'P1_PCV02D', 'P1_PCV02Z', 'P1_PIT01',
       'P1_PIT02', 'P1_TIT01', 'P1_TIT02', 'P2_24Vdc', 'P2_CO_rpm', 'P2_Emerg',
       'P2_HILout', 'P2_MSD', 'P2_OnOff', 'P2_SIT01', 'P2_SIT02', 'P2_VT01',
       'P2_VXT02', 'P2_VXT03', 'P2_VYT02', 'P2_VYT03', 'P3_FIT01', 'P3_LCP01D',
       'P3_LCV01D', 'P3_LIT01', 'P3_PIT01', 'P4_HT_FD', 'P4_HT_LD', 'P4_HT_PO',
       'P4_HT_PS', 'P4_LD', 'P4_ST_FD', 'P4_ST_GOV', 'P4_ST_LD', 'P4_ST_PO',
       'P4_ST_PS', 'P4_ST_PT01', 'P4_ST_TT01', 'day', 'hour', 'minute',
       'second', 'week', 'weekday'],
      dtype='object')

In [84]:
# Select those columns from the test DataFrame
test[X.columns]

Unnamed: 0,P1_B2004,P1_B2016,P1_B3004,P1_B3005,P1_B4002,P1_B4005,P1_B400B,P1_B4022,P1_FCV01D,P1_FCV01Z,P1_FCV02D,P1_FCV02Z,P1_FCV03D,P1_FCV03Z,P1_FT01,P1_FT01Z,P1_FT02,P1_FT02Z,P1_FT03,P1_FT03Z,P1_LCV01D,P1_LCV01Z,P1_LIT01,P1_PCV01D,P1_PCV01Z,P1_PCV02D,P1_PCV02Z,P1_PIT01,P1_PIT02,P1_TIT01,P1_TIT02,P2_24Vdc,P2_CO_rpm,P2_Emerg,P2_HILout,P2_MSD,P2_OnOff,P2_SIT01,P2_SIT02,P2_VT01,P2_VXT02,P2_VXT03,P2_VYT02,P2_VYT03,P3_FIT01,P3_LCP01D,P3_LCV01D,P3_LIT01,P3_PIT01,P4_HT_FD,P4_HT_LD,P4_HT_PO,P4_HT_PS,P4_LD,P4_ST_FD,P4_ST_GOV,P4_ST_LD,P4_ST_PO,P4_ST_PS,P4_ST_PT01,P4_ST_TT01,day,hour,minute,second,week,weekday
0,0.10178,1.58771,403.78854,985.37353,32.59527,100.00000,2839.58520,36.81010,100.00000,99.91608,0.0,-1.86768,50.90726,51.95007,176.08643,845.69550,1978.72156,2843.37549,243.38802,989.14117,10.89290,10.8429,402.70947,40.74125,41.32233,12.0,12.26196,1.34293,0.27557,34.88770,35.14710,28.03162,54116.0,0,725.21362,763.19324,1,790.0,789.76508,11.91040,-2.8687,-1.0189,3.7751,5.6330,-25.0,688,15888,18082.0,-23.0,0.00029,76.80121,73.58581,0,464.06610,0.00470,20469.0,386.26666,380.31683,0,10044.0,27567.0,7,15,0,0,28,1
1,0.10178,1.58725,403.78854,985.37353,32.59527,100.00000,2843.37549,36.80895,100.00000,99.91608,0.0,-1.86768,50.74607,51.96533,173.79756,840.47705,1986.92322,2845.06006,243.00656,992.62018,10.80512,10.8429,402.81174,40.86124,41.32233,12.0,12.26196,1.34216,0.27710,34.88770,35.14710,28.02301,54114.0,0,721.74072,763.19324,1,789.0,789.13147,11.98856,-2.9842,-1.2637,3.1689,5.4158,-25.0,648,15952,18043.0,-23.0,0.00051,76.92419,73.89325,0,464.22888,0.00210,20489.0,386.30286,380.02747,0,10040.0,27564.0,7,15,0,1,28,1
2,0.10178,1.59519,403.78854,985.37353,32.59527,100.00000,2845.06006,36.82879,100.00000,99.91608,0.0,-1.86768,50.66229,51.96533,174.56052,835.25842,1978.72156,2837.33911,242.81586,993.92468,10.80029,10.8429,402.76062,41.02906,41.32233,12.0,12.26196,1.34369,0.27710,34.88770,35.14710,28.02993,54082.0,0,718.15796,763.19324,1,786.0,785.81653,11.97400,-3.4939,-1.5398,2.9615,5.5532,-25.0,616,16000,18024.0,-23.0,0.00022,77.04715,74.20068,0,466.90533,0.00130,20604.0,389.73883,381.52850,0,10037.0,27565.0,7,15,0,2,28,1
3,0.10178,1.59747,403.78854,985.37353,32.59527,100.00000,2837.33911,36.83451,100.00000,99.91608,0.0,-1.86768,50.66462,51.98822,176.65860,836.99799,1977.76782,2843.37549,242.43439,993.05493,10.80579,10.8429,402.81174,41.15958,41.32233,12.0,12.26196,1.34445,0.27557,34.88770,35.14710,28.02993,54109.0,0,716.38794,763.19324,1,785.0,785.42438,11.92999,-3.8188,-1.6212,3.1285,5.7833,-25.0,584,16064,17985.0,-23.0,0.00022,77.17014,74.43579,0,466.79688,0.00000,20633.0,388.94311,382.08911,0,10040.0,27560.0,7,15,0,3,28,1
4,0.10178,1.59869,403.78854,985.37353,32.59527,100.00000,2843.37549,36.83756,100.00000,99.90845,0.0,-1.86768,50.65214,51.90429,175.89565,841.78162,1972.42725,2837.33911,242.81586,992.62018,10.81415,10.8429,402.91394,41.28887,41.21552,12.0,12.26196,1.34293,0.27710,34.90295,35.14710,28.02990,54111.0,0,714.66675,763.19324,1,783.0,782.99249,11.86934,-3.9858,-1.5631,3.4986,6.0309,-25.0,552,16112,17954.0,-23.0,0.00000,77.29312,74.74322,0,467.88190,-0.00043,20738.0,389.72082,383.44543,0,10042.0,27564.0,7,15,0,4,28,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
190798,0.10111,1.27011,403.41025,1091.40076,31.82315,7.86445,160.08667,35.24566,14.38350,13.94043,100.0,97.20001,60.49295,61.05957,202.02638,903.09906,30.17423,153.13467,299.08084,1091.70093,12.98609,12.5061,402.24945,90.08597,92.12646,12.0,12.01782,1.40091,0.17487,35.49804,36.42883,28.02216,54171.0,0,712.49390,763.19324,1,792.0,792.24969,11.99989,-2.8153,-1.0768,3.3087,5.5195,4916.0,11632,272,12480.0,1295.0,-0.00051,-0.00723,2.51376,10,291.35925,-0.00007,16056.0,291.19647,297.79730,50,10052.0,27619.0,28,22,59,56,31,1
190799,0.10111,1.26782,403.41025,1091.40076,31.82315,7.84770,153.13467,35.23994,14.34272,13.94043,100.0,97.20001,60.51854,61.05957,203.93370,899.62018,29.02986,157.10713,299.27155,1091.06311,13.02108,12.5061,402.19830,90.01218,92.11883,12.0,12.01782,1.40015,0.17410,35.49804,36.41358,28.02996,54161.0,0,732.51953,763.19324,1,797.0,796.51233,11.91820,-3.7239,-1.4087,3.3751,5.9431,4953.0,11736,224,12478.0,1329.0,-0.00051,-0.00723,2.26056,10,291.46771,0.00029,16030.0,291.54010,297.36328,50,10052.0,27621.0,28,22,59,57,31,1
190800,0.10111,1.26279,403.41025,1091.40076,31.82315,7.80868,157.10713,35.22735,14.41761,13.94043,100.0,97.20001,60.50650,61.05957,205.07814,903.96881,30.36500,151.14882,299.27155,1091.38208,13.01329,12.5061,402.30057,89.92320,92.01965,12.0,12.01782,1.39862,0.17334,35.49804,36.42883,28.02905,54147.0,0,723.01636,763.19324,1,795.0,795.43524,11.81432,-3.5588,-1.0689,4.2203,6.2678,4954.0,11848,192,12479.0,1377.0,-0.00051,-0.00723,2.06162,10,290.78052,-0.00007,16036.0,290.76245,297.21863,50,10052.0,27622.0,28,22,59,58,31,1
190801,0.10111,1.26370,403.41025,1091.40076,31.82315,7.78401,151.14882,35.22964,14.36758,13.94043,100.0,97.20001,60.50658,61.05194,203.17075,906.57819,29.22057,154.12797,298.89008,1091.38208,13.03138,12.5061,402.19830,89.86734,92.00440,12.0,12.01782,1.39862,0.17563,35.51330,36.42883,28.02990,54145.0,0,719.51294,763.19324,1,796.0,795.59662,11.90042,-2.7499,-0.7888,4.1070,5.8461,4955.0,11944,160,12479.0,1426.0,-0.00029,-0.00723,1.88077,10,290.47308,-0.00094,15946.0,290.61774,295.95270,50,10052.0,27621.0,28,22,59,59,31,1


In [85]:
# Use the trained model to predict on the transformed data.
  # Store the predictions in the predict variable.
predict = model.predict(test[X.columns])

In [86]:
# Assign the predicted values to the attack column of the submission DataFrame.
submission['attack'] = predict
submission

Unnamed: 0,time,attack
0,2020-07-07 15:00:00,0
1,2020-07-07 15:00:01,0
2,2020-07-07 15:00:02,0
3,2020-07-07 15:00:03,0
4,2020-07-07 15:00:04,0
...,...,...
190798,2020-07-28 22:59:56,0
190799,2020-07-28 22:59:57,0
190800,2020-07-28 22:59:58,0
190801,2020-07-28 22:59:59,0


In [87]:
# Submit the result using the submit function of the pjt object.
pjt.submit(submission)

File saved. File name: submission-05-25-05.csv
Submission status: success
Submissions today: 3
Submission score: 0.03315666211416879
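
Every visible row of the submission above is 0, and the returned score is low. The truncated table cannot show whether the whole column is 0, and the notebook does not state which metric the score uses. As a hedged illustration only, the sketch below assumes an F1-style metric and uses made-up labels (`y_valid`, `pred_valid`) to show how a model that never flags an attack can look acceptable on accuracy while scoring 0 on F1.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import f1_score

# Hypothetical labels for a held-out slice, plus predictions from a model
# that never predicts the attack class.
y_valid = pd.Series([0, 0, 0, 0, 1, 1, 0, 0, 1, 0])
pred_valid = np.zeros(len(y_valid), dtype=int)

# How many rows were assigned to each class.
pred_counts = pd.Series(pred_valid).value_counts()

# Accuracy rewards the majority class; F1 (assumed metric) does not.
accuracy = (pred_valid == y_valid.to_numpy()).mean()
f1 = f1_score(y_valid, pred_valid, zero_division=0)

print(pred_counts.to_dict())  # {0: 10}
print(accuracy)               # 0.7
print(f1)                     # 0.0
```

Running `pd.Series(predict).value_counts()` on the real predictions before submitting is a cheap way to see whether the model predicted the attack class at all.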


In [88]:
%%time
# Train a model using RandomForestClassifier.
  # Set random_state to 0.
  # Set n_jobs to -1.
model = RandomForestClassifier(random_state=0, n_jobs=-1).fit(X,Y)

CPU times: user 6min 41s, sys: 272 ms, total: 6min 41s
Wall time: 3min 40s


In [89]:
# Use the trained model to predict on the transformed data.
  # Store the predictions in the predict variable.
predict = model.predict(test[X.columns])

In [90]:
# Assign the predicted values to the attack column of the submission DataFrame.
submission['attack'] = predict
submission

Unnamed: 0,time,attack
0,2020-07-07 15:00:00,0
1,2020-07-07 15:00:01,0
2,2020-07-07 15:00:02,0
3,2020-07-07 15:00:03,0
4,2020-07-07 15:00:04,0
...,...,...
190798,2020-07-28 22:59:56,0
190799,2020-07-28 22:59:57,0
190800,2020-07-28 22:59:58,0
190801,2020-07-28 22:59:59,0


In [91]:
# Submit the result using the submit function of the pjt object.
pjt.submit(submission)

File saved. File name: submission-05-30-02.csv
Submission status: success
Submissions today: 4
Submission score: 0.35247544438898215
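
After fitting a random forest, it can help to look at which features the trees relied on. The sketch below reads `feature_importances_` from a forest trained on synthetic data. The column names `signal`, `noise_a`, and `noise_b` are hypothetical, and `n_estimators=50` is chosen only to keep the example fast.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: only the "signal" column determines the label.
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "signal": rng.normal(size=400),
    "noise_a": rng.normal(size=400),
    "noise_b": rng.normal(size=400),
})
y_demo = (X_demo["signal"] > 0.5).astype(int)

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X_demo, y_demo)

# Impurity-based importances, one per column, summing to 1.
importances = pd.Series(
    forest.feature_importances_, index=X_demo.columns
).sort_values(ascending=False)
print(importances)
```

Impurity-based importances are a rough guide rather than a definitive ranking, so they are best used for spotting candidates to inspect, not as proof that a feature matters.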


# Try to improve performance further with data preprocessing and other models
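
One possible preprocessing direction, given that each file is continuous in time, is to add features describing how a sensor changed recently. The sketch below is an assumption about what might help, not a method taken from this notebook. `add_rolling_features` is a hypothetical helper, and the window of 2 rows is arbitrary.

```python
import pandas as pd

def add_rolling_features(frame, columns, window):
    """Return a copy of frame with rolling-mean and first-difference columns."""
    out = frame.copy()
    for col in columns:
        # Mean over the current row and the previous (window - 1) rows only,
        # so a row's features never use later observations.
        out[f"{col}_roll_mean"] = out[col].rolling(window, min_periods=1).mean()
        # Change from the previous sample; the first row has no predecessor.
        out[f"{col}_diff"] = out[col].diff().fillna(0.0)
    return out

# Toy values for one sensor column (hypothetical data).
demo = pd.DataFrame({"P1_FT01": [166.0, 168.0, 170.0, 172.0]})
features = add_rolling_features(demo, ["P1_FT01"], window=2)

print(features["P1_FT01_roll_mean"].tolist())  # [166.0, 167.0, 169.0, 171.0]
print(features["P1_FT01_diff"].tolist())       # [0.0, 2.0, 2.0, 2.0]
```

Because train and test are separate time-continuous files, the helper would be applied to each frame on its own so that no window spans the boundary between them.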