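# CMS Data Upper/Lower Limit Anomaly Detection (v1.0)
## 1. Overview
- Detect anomalies against the upper and lower limits of CMS data and email the responsible staff, in order to secure data reliability and strengthen the monitoring of rotating equipment.
- The basic concept: query the CMS data, clean it, and compute the mean and standard deviation (σ) over the normal period (excluding repair periods).  
Data above the upper limit or below the lower limit (mean ± 3σ) is judged an anomaly, and the system emails it to the responsible staff.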
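
The 3σ rule described above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's implementation: the function names `three_sigma_limits` and `classify` and the sample readings are assumptions made for the example.

```python
from statistics import mean, stdev


def three_sigma_limits(normal_values):
    """Return (lower, upper) limits as mean -/+ 3 sample standard deviations."""
    mu = mean(normal_values)
    sigma = stdev(normal_values)  # sample standard deviation (n - 1), like pandas .std()
    return mu - 3 * sigma, mu + 3 * sigma


def classify(value, lower, upper):
    """Return 'high', 'low', or 'normal' for a single reading."""
    if value >= upper:
        return "high"
    if value < lower:
        return "low"
    return "normal"


# Illustrative readings only, not taken from the CMS data
normal_values = [2.0, 2.1, 1.9, 2.2, 2.0, 2.1, 1.8, 2.3]
lower, upper = three_sigma_limits(normal_values)
labels = [classify(v, lower, upper) for v in [2.05, 5.84, 0.0]]
print(labels)  # ['normal', 'high', 'low']
```

The limits are derived only from the readings considered normal, while classification is applied to every reading, including those excluded from the reference set.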

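## 2. Version History (current: v1.0, in development)
###  [1.0]  : <u>current stage</u>
 - [x] Data : extract hourly data from the equipment condition analysis system (Excel export)
 - [x] Cleaning : fill missing intervals with 0 instead of deleting them
 - [x] Anomaly limits : compute the mean and standard deviation from the data, excluding values of 0 or below (including negatives)
 - [x] Anomaly judgment : flag raw data points beyond the upper or lower limit as anomalies
 - [x] Graph : use Plotly and mark anomalies separately on the graph
 - Related techniques : Plotly Graph, anomaly detection (3σ)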

### [1.5]
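 - [x] Data : extract data directly from the equipment condition analysis system DB (Oracle Database connection)
 - [x] Query the No. 5 sintering plant 201 Belt Conveyor data (period: 2020-01-01 to 2020-12-31)
 - [x] Data parallelization (extract data from two or more locations)
 - Related techniques : Oracle Database-Python connection (pyodbc, Oracle client and DSN connection), parallel data processing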

### [2.0]
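 - Data : CMS hourly data and repair records from the equipment condition analysis system
 - Anomaly judgment : exclude anomalies that coincide with production downtime (repair or failure) records
 - Graph : show repair and failure intervals in a separate color with labels
 - Related techniques : separate marking of intervals on the graph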
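
One way the planned exclusion of downtime periods could work is an interval check on each anomaly's timestamp. The sketch below is an assumption about how this might be done. The format of the repair records is not defined in this notebook, so the `(start, end)` tuples, the function names, and the sample rows are hypothetical.

```python
from datetime import datetime


def in_any_interval(timestamp, intervals):
    """True if timestamp falls inside any [start, end) interval."""
    return any(start <= timestamp < end for start, end in intervals)


def drop_repair_anomalies(anomalies, repair_intervals):
    """Keep only anomalies recorded outside every repair/failure interval."""
    return [a for a in anomalies if not in_any_interval(a[0], repair_intervals)]


# Hypothetical repair record and anomaly list of (timestamp, value)
repair_intervals = [(datetime(2020, 1, 10, 8), datetime(2020, 1, 10, 18))]
anomalies = [
    (datetime(2020, 1, 10, 9), 0.44),  # inside the repair window
    (datetime(2020, 3, 2, 14), 5.84),  # outside any repair window
]
remaining = drop_repair_anomalies(anomalies, repair_intervals)
print(remaining)
```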
 
### [3.0] 
 - Data : extract Local data (Excel export)
 - Cleaning : clean the data so the existing algorithm can be applied

### [3.5]
 - Data query : extract Local data from the DB (database connection)

### [4.0]
 - Data query : extract Local data automatically every hour (scheduling)
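
For the hourly extraction, a simple approach is to compute how long to wait until the next full hour. The sketch below shows only that calculation. The function names are hypothetical, and a system scheduler (cron, Windows Task Scheduler) may be a better fit in production.

```python
from datetime import datetime, timedelta


def next_full_hour(now):
    """Return the next top-of-the-hour datetime strictly after `now`."""
    return now.replace(minute=0, second=0, microsecond=0) + timedelta(hours=1)


def seconds_until_next_run(now):
    """Seconds to wait before the next hourly extraction."""
    return (next_full_hour(now) - now).total_seconds()


now = datetime(2020, 1, 1, 0, 45, 30)
wait = seconds_until_next_run(now)
print(next_full_hour(now), wait)  # 2020-01-01 01:00:00 870.0
```

A long-running loop could call `time.sleep(seconds_until_next_run(datetime.now()))` and then run the query.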

### [5.0] 
 - When anomalies occur, aggregate them by day and email the responsible staff
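
The daily aggregation and mail step could be sketched as follows, using only the standard library. The addresses, the subject format, and the function names are placeholders chosen for illustration. The example builds the messages but does not send them.

```python
from collections import defaultdict
from datetime import datetime
from email.message import EmailMessage


def group_by_day(anomalies):
    """Group (timestamp, value, check) tuples by calendar date."""
    by_day = defaultdict(list)
    for timestamp, value, check in anomalies:
        by_day[timestamp.date()].append((timestamp, value, check))
    return dict(by_day)


def build_daily_mail(day, rows, sender, recipient):
    """Build (but do not send) one summary message for a single day."""
    msg = EmailMessage()
    msg["Subject"] = f"[CMS] {len(rows)} anomalies on {day.isoformat()}"
    msg["From"] = sender
    msg["To"] = recipient
    lines = [f"{t:%Y-%m-%d %H:%M}  value={v}  {c}" for t, v, c in rows]
    msg.set_content("\n".join(lines))
    return msg


# Sample rows; the labels are illustrative
anomalies = [
    (datetime(2020, 1, 10, 8), 0.43, "low"),
    (datetime(2020, 1, 10, 9), 0.44, "low"),
    (datetime(2020, 10, 22, 9), 0.35, "low"),
]
mails = [
    build_daily_mail(day, rows, "cms@example.com", "engineer@example.com")
    for day, rows in sorted(group_by_day(anomalies).items())
]
print([m["Subject"] for m in mails])
```

Sending would typically go through `smtplib.SMTP(...).send_message(msg)` against the site's mail server, which is not specified here.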

In [2]:
# Library import 
import pandas as pd
import numpy as np
import time
from datetime import datetime
import pyodbc
import os


from tqdm import tqdm  # progress bar for execution

import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.subplots import make_subplots
from plotly.offline import plot, iplot, init_notebook_mode
init_notebook_mode(connected=True)
import plotly.io as pio
pio.renderers.default = "notebook_connected"

In [3]:
conn = pyodbc.connect('DSN=PKFAC_64;UID=PC629640P;PWD=welcom1!')
cursor = conn.cursor()
query = '''select 
      c.LOCATION_DESCRIPTION as 공정명,
      a.EQP_DT_MSR_SNR_NM as 센서명,
      a.EQP_DT_MSR_SNR_ID as 센서ID,
      a.EQP_AST_CD as Asset,
      b.AC_DT as 발생일,
      b.EQP_MNTR_DT_A_V as 발생치,
      b.EQP_MNTR_DT_BAS_V1 as 주의치,
      b.EQP_MNTR_DT_BAS_V2 as 위험치


from POSFAC.TB_P20_MTFA120 a, -- CMS sensor list
    POSFAC.TB_P20_MTFA140 b, -- CMS hourly records
    POSFAC.TB_P20_EAMA120 c -- process (AREA)
       
  where 1=1
  
    AND A.EQP_RPR_OP_CD = B.EQP_RPR_OP_CD
    AND A.EQP_MNTR_DT_TP_TP = B.EQP_MNTR_DT_TP_TP
    AND A.EQP_DT_MSR_SNR_ID = B.EQP_DT_MSR_SNR_ID
    and a.EQP_AST_CD = '4K5311959'
    and a.EQP_RPR_OP_CD = c.LOCATION
    and b.AC_DT >= TO_DATE('2020-01-01', 'YYYY-MM-DD')
    and b.AC_DT < TO_DATE('2021-01-01', 'YYYY-MM-DD')
    and b.EQP_MNTR_DT_TP_TP = ?
      ORDER BY B.AC_DT ASC;
    '''
# With pyodbc, query parameters are bound with "?" placeholders
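
The `?` placeholder binding can be tried without an Oracle connection. The sketch below uses the standard-library `sqlite3` module as a stand-in, since it accepts the same `?` placeholder style. The in-memory table `readings` and its columns are hypothetical and only mimic the shape of the real query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (data_type TEXT, ac_dt TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("V1", "2020-01-01 00:00:00", 1.90),
        ("V2", "2020-01-01 00:00:00", 1.42),
        ("V1", "2020-01-01 01:00:00", 1.97),
    ],
)

# The same query text is reused; only the bound value changes
query = "SELECT ac_dt, value FROM readings WHERE data_type = ? ORDER BY ac_dt ASC"
rows_v1 = conn.execute(query, ("V1",)).fetchall()
rows_v2 = conn.execute(query, ("V2",)).fetchall()
print(rows_v1)
print(rows_v2)
conn.close()
```

Binding values this way, rather than formatting them into the SQL string, keeps the query text fixed and avoids quoting mistakes.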

In [72]:
#sqldata = pd.read_sql(query, conn)
# Run the same parameterized query for velocity (V1) and acceleration (V2)
sqldataV1 = pd.read_sql(query, conn, params=['V1'])
sqldataV2 = pd.read_sql(query, conn, params=['V2'])

#testt = pd.read_sql(sql=query,con=conn, params=['V1'])
#testt[1].head()
sqldataV1


Unnamed: 0,공정명,센서명,센서ID,ASSET,발생일,발생치,주의치,위험치
0,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-01 00:00:00,1.90,3.0,4.0
1,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-01 01:00:00,1.97,3.0,4.0
2,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-01 02:00:00,2.02,3.0,4.0
3,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-01 03:00:00,2.00,3.0,4.0
4,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-01 04:00:00,2.02,3.0,4.0
...,...,...,...,...,...,...,...,...
6489,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-12-31 19:00:00,2.11,3.0,4.0
6490,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-12-31 20:00:00,2.24,3.0,4.0
6491,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-12-31 21:00:00,2.36,3.0,4.0
6492,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-12-31 22:00:00,2.38,3.0,4.0
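
When more measurement types or locations are queried, keeping the results in a dictionary keyed by type may be easier to manage than one variable per type. A minimal sketch follows. `fake_loader` is a hypothetical stand-in for `pd.read_sql(query, conn, params=[data_type])` and returns a few sample values instead of a DataFrame.

```python
def load_all(data_types, loader):
    """Run `loader` once per data type and return the results keyed by type."""
    return {data_type: loader(data_type) for data_type in data_types}


def fake_loader(data_type):
    # Hypothetical stand-in for pd.read_sql(query, conn, params=[data_type])
    samples = {"V1": [1.90, 1.97, 2.02], "V2": [1.42, 1.50, 1.53]}
    return samples[data_type]


frames = load_all(["V1", "V2"], fake_loader)
print(frames["V2"])  # [1.42, 1.5, 1.53]
```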


In [73]:
sqldataV2

Unnamed: 0,공정명,센서명,센서ID,ASSET,발생일,발생치,주의치,위험치
0,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-01 00:00:00,1.42,10.59,15.77
1,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-01 01:00:00,1.50,10.59,15.77
2,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-01 02:00:00,1.53,10.59,15.77
3,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-01 03:00:00,1.98,10.59,15.77
4,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-01 04:00:00,1.82,10.59,15.77
...,...,...,...,...,...,...,...,...
6489,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-12-31 19:00:00,0.11,10.59,15.77
6490,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-12-31 20:00:00,0.11,10.59,15.77
6491,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-12-31 21:00:00,0.12,10.59,15.77
6492,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-12-31 22:00:00,0.11,10.59,15.77


In [36]:
#sqldata.to_excel('소결.xlsx')

In [4]:
rawdata = sqldataV1  # velocity data
#rawdata = pd.read_csv("5소결201BC.csv")
rawdata.head(20)

Unnamed: 0,공정명,센서명,센서ID,ASSET,발생일,발생치,주의치,위험치
0,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-01 00:00:00,1.9,3.0,4.0
1,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-01 01:00:00,1.97,3.0,4.0
2,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-01 02:00:00,2.02,3.0,4.0
3,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-01 03:00:00,2.0,3.0,4.0
4,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-01 04:00:00,2.02,3.0,4.0
5,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-01 05:00:00,2.12,3.0,4.0
6,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-01 06:00:00,2.05,3.0,4.0
7,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-01 07:00:00,2.36,3.0,4.0
8,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-01 08:00:00,2.06,3.0,4.0
9,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-01 09:00:00,2.06,3.0,4.0


In [7]:
sqldataV1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6494 entries, 0 to 6493
Data columns (total 8 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   공정명     6494 non-null   object        
 1   센서명     6494 non-null   object        
 2   센서ID    6494 non-null   object        
 3   ASSET   6494 non-null   object        
 4   발생일     6494 non-null   datetime64[ns]
 5   발생치     6494 non-null   float64       
 6   주의치     6494 non-null   float64       
 7   위험치     6494 non-null   float64       
dtypes: datetime64[ns](1), float64(3), object(4)
memory usage: 406.0+ KB


In [6]:
# Load the copied table into a DataFrame
#rawdata = pd.read_csv("5소결201BC.csv")
rawdata = sqldataV1
rawdata.head()

Unnamed: 0,공정명,센서명,센서ID,ASSET,발생일,발생치,주의치,위험치
0,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-01 00:00:00,1.9,3.0,4.0
1,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-01 01:00:00,1.97,3.0,4.0
2,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-01 02:00:00,2.02,3.0,4.0
3,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-01 03:00:00,2.0,3.0,4.0
4,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-01 04:00:00,2.02,3.0,4.0


In [8]:
# Color palette
cnf, dth, rec, act = '#393e46', '#ff2e63', '#21bf73', '#fe9801' 
DEFAULT_PLOTLY_COLORS=['rgb(31, 119, 180)', 'rgb(255, 127, 14)',
                       'rgb(44, 160, 44)', 'rgb(214, 39, 40)',
                       'rgb(148, 103, 189)', 'rgb(140, 86, 75)',
                       'rgb(227, 119, 194)', 'rgb(127, 127, 127)',
                       'rgb(188, 189, 34)', 'rgb(23, 190, 207)']

# Default font settings
layout_font = {'font':dict(size=24,color='#60606e',family='Franklin Gothic' )}

In [9]:
fig2 = ff.create_distplot([rawdata['발생치']], ['Vibration (velocity)'])
fig2.show()

In [10]:
rawdata.describe()

Unnamed: 0,발생치,주의치,위험치
count,6494.0,6494.0,6494.0
mean,2.249455,3.0,4.0
std,0.376101,0.0,0.0
min,0.0,3.0,4.0
25%,2.07,3.0,4.0
50%,2.21,3.0,4.0
75%,2.48,3.0,4.0
max,5.84,3.0,4.0


In [11]:
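# Data preprocessing

# 1) Replace missing values with 0 (fillna returns a new DataFrame, so assign the result)
rawdata = rawdata.fillna(0)

# 2) To set the reference values, extract normal data by excluding values of 0 or below
standard = rawdata['발생치']>0
standard_data = rawdata[standard]
standard_data.describe()
# TODO: turn the preprocessing into a function later.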

Unnamed: 0,발생치,주의치,위험치
count,6475.0,6475.0,6475.0
mean,2.256056,3.0,4.0
std,0.356333,0.0,0.0
min,0.34,3.0,4.0
25%,2.07,3.0,4.0
50%,2.22,3.0,4.0
75%,2.48,3.0,4.0
max,5.84,3.0,4.0
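
The preprocessing steps could be wrapped in a function along these lines. This is a minimal sketch on plain Python lists, with `None` standing for a missing reading. The function name `preprocess` and the sample values are assumptions. With pandas the same two steps are `fillna(0)` followed by a boolean filter on `발생치 > 0`.

```python
def preprocess(values):
    """Return (filled, reference).

    filled:    missing readings (None) replaced by 0.0
    reference: readings strictly greater than 0, used for the mean and
               standard deviation of the normal data
    """
    filled = [0.0 if v is None else float(v) for v in values]
    reference = [v for v in filled if v > 0]
    return filled, reference


filled, reference = preprocess([1.90, None, 0.0, -0.2, 2.02])
print(filled)     # [1.9, 0.0, 0.0, -0.2, 2.02]
print(reference)  # [1.9, 2.02]
```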


In [12]:
# Plot the reference data
fig3 = ff.create_distplot([standard_data['발생치']], ['Vibration (velocity)'])
fig3.show()

In [18]:
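# Function that computes the 3-sigma limits
# and uses them as the anomaly thresholds

def anomaly(rawdata, standard_data):
    # Thresholds come from the reference (normal) data only
    mean = standard_data['발생치'].mean()
    std = standard_data['발생치'].std()
    upper_threshold = mean + std*3
    lower_threshold = mean - std*3

    # Flag rows in place with .loc to avoid chained-assignment problems
    rawdata['anomaly'] = 0
    rawdata['check'] = ""
    is_high = rawdata['발생치'] >= upper_threshold
    is_low = rawdata['발생치'] < lower_threshold
    # 1 = at or above the upper limit ("이상치": anomaly)
    rawdata.loc[is_high, 'anomaly'] = 1
    rawdata.loc[is_high, 'check'] = "이상치"
    # 2 = below the lower limit ("수리여부 확인": check whether a repair took place)
    rawdata.loc[is_low, 'anomaly'] = 2
    rawdata.loc[is_low, 'check'] = "수리여부 확인"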
            

In [19]:
anomaly(rawdata,standard_data)

In [15]:
test_condition = rawdata['anomaly']==1
rawdata[test_condition]

Unnamed: 0,공정명,센서명,센서ID,ASSET,발생일,발생치,주의치,위험치,anomaly,check
224,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-10 08:00:00,0.43,3.0,4.0,1,수리여부 확인
225,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-10 09:00:00,0.44,3.0,4.0,1,수리여부 확인
226,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-10 10:00:00,0.44,3.0,4.0,1,수리여부 확인
227,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-10 11:00:00,0.43,3.0,4.0,1,수리여부 확인
228,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-01-10 12:00:00,0.43,3.0,4.0,1,수리여부 확인
...,...,...,...,...,...,...,...,...,...,...
6410,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-10-22 09:00:00,0.35,3.0,4.0,1,수리여부 확인
6411,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-10-22 10:00:00,0.35,3.0,4.0,1,수리여부 확인
6412,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-10-22 12:00:00,0.35,3.0,4.0,1,수리여부 확인
6413,(광)제선_소결_5소결_소결,Flow Dynamic Conveyor (510.201),92,4K5311959,2020-10-22 13:00:00,0.36,3.0,4.0,1,수리여부 확인


In [28]:

countN = rawdata['발생치'].count()
fig = go.Figure()
marker1_color = [DEFAULT_PLOTLY_COLORS[3] if i == 1 else DEFAULT_PLOTLY_COLORS[4] if i==2 else DEFAULT_PLOTLY_COLORS[7] for i in rawdata['anomaly']]
marker1_size = [8 if i == 1 or i == 2 else 5 for i in rawdata['anomaly']]

fig.add_trace(go.Scatter(x= rawdata['발생일'], y= rawdata['발생치'],
                        mode='markers', marker=dict(color=marker1_color, size=marker1_size), name='5소결201BC'))

fig.update_layout(title='<b>Vibration (velocity) trend</b>')
fig.show()

## Acceleration data

In [74]:

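# 1) Replace missing values with 0 (fillna returns a new DataFrame, so assign the result)
sqldataV2 = sqldataV2.fillna(0)

# 2) To set the reference values, extract normal data by excluding values of 0 or below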
standard2 = sqldataV2['발생치']>0
standard_data2 = sqldataV2[standard2]
standard_data2.describe()

Unnamed: 0,발생치,주의치,위험치
count,6475.0,6475.0,6475.0
mean,2.859138,10.59,15.77
std,2.17641,1.760506e-12,5.524896e-13
min,0.02,10.59,15.77
25%,1.255,10.59,15.77
50%,1.8,10.59,15.77
75%,3.855,10.59,15.77
max,10.41,10.59,15.77


In [75]:
anomaly(sqldataV2,standard_data2)

In [77]:
fig4 = ff.create_distplot([standard_data2['발생치']], ['Vibration (acceleration)'])
fig4.show()

In [76]:

countN = sqldataV2['발생치'].count()
fig = go.Figure()
marker1_color = [DEFAULT_PLOTLY_COLORS[3] if i == 1 else DEFAULT_PLOTLY_COLORS[4] if i==2 else DEFAULT_PLOTLY_COLORS[7] for i in sqldataV2['anomaly']]
marker1_size = [8 if i == 1 or i == 2 else 5 for i in sqldataV2['anomaly']]

fig.add_trace(go.Scatter(x= sqldataV2['발생일'], y= sqldataV2['발생치'],
                        mode='markers', marker=dict(color=marker1_color, size=marker1_size), name='5소결201BC'))

fig.update_layout(title='<b>Vibration (acceleration) trend</b>')
fig.show()