# Anomaly Detection - LOF 

#### By design, LOF cannot score new, unseen values. It can only check whether any point in the data already given is abnormal.

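1. Definition of the k-distance of an object p
> $k\text{-distance}(p)$ = the distance $d(p,o)$ from $p$ to its $k$-th nearest neighbor $o \in D \setminus \{p\}$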

2. Definition of the k-distance neighborhood of an object p
> $N_k(p) = \{\, q \in D \setminus \{p\} \mid d(p,q) \le k\text{-distance}(p) \,\}$

3. Definition of the reachability distance
> $\text{reachability-distance}_k(p,o) = \max\{k\text{-distance}(o),\ d(p,o)\}$

4. Definition of the local reachability density of an object p
> $lrd_k(p) = \frac{|N_k(p)|}{\sum_{o \in N_k(p)} \text{reachability-distance}_k(p,o)}$

5. Definition of the local outlier factor of an object p
> $LOF_k(p) = \frac{\sum_{o \in N_k(p)} \frac{lrd_k(o)}{lrd_k(p)}}{|N_k(p)|} = \frac{\frac{1}{lrd_k(p)} \sum_{o \in N_k(p)} lrd_k(o)}{|N_k(p)|}$
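To make the five definitions concrete, here is a minimal pure-Python sketch that applies them step by step to a tiny 1-D dataset. The function name `lof_scores` and the sample points are illustrative assumptions and are not part of the notebook's own implementation.

```python
def lof_scores(points, k):
    """LOF of every point in a small 1-D list, following the five definitions."""
    n = len(points)
    d = [[abs(p - q) for q in points] for p in points]

    # 1. k-distance(p): distance to the k-th nearest other point
    k_distance = [sorted(d[p][q] for q in range(n) if q != p)[k - 1]
                  for p in range(n)]

    # 2. N_k(p): every other point within k-distance(p) (ties can give more than k)
    neighbors = [[q for q in range(n) if q != p and d[p][q] <= k_distance[p]]
                 for p in range(n)]

    # 3. reachability-distance_k(p, o) = max{k-distance(o), d(p, o)}
    def reach(p, o):
        return max(k_distance[o], d[p][o])

    # 4. lrd_k(p) = |N_k(p)| / sum of reachability distances from p to its neighbors
    lrd = [len(neighbors[p]) / sum(reach(p, o) for o in neighbors[p])
           for p in range(n)]

    # 5. LOF_k(p) = average of lrd_k(o) / lrd_k(p) over the neighbors o of p
    return [sum(lrd[o] for o in neighbors[p]) / (lrd[p] * len(neighbors[p]))
            for p in range(n)]


toy_points = [0.0, 1.0, 2.0, 10.0]  # hypothetical data: three close points, one far away
scores = lof_scores(toy_points, k=2)
for point, score in zip(toy_points, scores):
    print(point, round(score, 3))
```

With k = 2 the three clustered points score 0.875, 1.333, and 0.875, while the isolated point 10.0 scores 119/24 ≈ 4.958. A score well above 1 is what marks a point as a local outlier.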


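**What needs to be implemented**
- dist function: n x n matrix. Returns the matrix of distances between every pair of points.
> Creates self.dist

- k_distance function: n x 1 matrix. Returns the k_dist value for each of the n samples.

> Creates self.k_dist, an n x n matrix. For each sample (each row of dist), keep the distances up to the k-th nearest one and set all others to 0.
> |N_k(p)| is computed at the same time.

- reachability distance function > uses self.k_dist, self.dist, and k_value to implement max{k_distance(o), d(p,o)}

- lrd function: n x 1 matrix. To be implemented with self.k_dist and |N_k(p)|.


- LOF function: n x 1 matrix. The sum $\sum_{o \in N_k(p)} lrd_k(o)$ is implemented by summing only over the entries whose value in k_dist is nonzero.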
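The steps listed above can also be sketched in vectorized NumPy. This is only one possible arrangement under stated assumptions: the function name `lof_numpy`, the boolean neighbor mask (used here instead of zero-filled distances), and the demo array `X_demo` are all hypothetical and do not come from the notebook.

```python
import numpy as np


def lof_numpy(X, k):
    """Vectorized LOF for the rows of X (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]

    # dist: n x n pairwise Euclidean distances via broadcasting
    dist = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))

    # k_distance: each sorted row starts with the point itself (distance 0),
    # so column k is the distance to the k-th nearest other point
    k_value = np.sort(dist, axis=1)[:, k]

    # neighbor mask: within the k-distance, excluding the point itself
    nbr = (dist <= k_value[:, None]) & ~np.eye(n, dtype=bool)
    n_k = nbr.sum(axis=1)  # |N_k(p)|

    # reach[p, o] = max{k_distance(o), d(p, o)}
    reach = np.maximum(k_value[None, :], dist)

    # lrd: |N_k(p)| divided by the summed reachability distances to the neighbors
    lrd = n_k / (reach * nbr).sum(axis=1)

    # LOF: summed lrd of the neighbors, divided by lrd(p) * |N_k(p)|
    return (nbr * lrd[None, :]).sum(axis=1) / (lrd * n_k)


# four corners of a unit square plus one distant point
X_demo = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [5, 5]])
lof_demo = lof_numpy(X_demo, k=2)
print(np.round(lof_demo, 3))
```

In this toy layout the four corners have identical local densities and score 1, while the distant point scores far above 1. A boolean mask avoids the ambiguity of using 0 both as "not a neighbor" and as a genuine zero distance between duplicate rows.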


**Inputs needed**
- X
- k 

**Function signatures**
- Already described above.



In [1]:
import numpy as np
import pandas as pd
import random as rand

from sklearn.datasets import load_iris
X = load_iris()['data']

import matplotlib.pyplot as plt
import scipy as sc
from scipy.stats import norm
from sys import maxsize

In [66]:
class LOF(): 
    def __init__(self, X, k) : 
        self.X = np.array(X)
        self.n = np.shape(X)[0]
        self.m = np.shape(X)[1] 
        
        self.k = k
        self.dist = self.cal_dist()
        self.k_dist, self.k_value, self.nkp_list = self.cal_k_dist()
        self.r_dist = self.re_dist()
        self.lrd_lst = self.lrd()
        self.lof_lst = self.lof() 
    
    def cal_dist(self) :
        # n x n matrix of pairwise Euclidean distances (uses self.X, not the global X)
        dist = []
        for i in range(self.n) : 
            vector = [np.sqrt(np.sum( (self.X[i] - self.X[j]) **2 )) for j in range(self.n)]
            dist.append(vector)
        return np.array(dist)

    def cal_k_dist(self) : 
        k_dist_lst = [] 
        k_value = [] 
        nkp_list = []
        for i in range(self.n) : 
            # Sorted row i starts with the point itself (distance 0),
            # so position k holds the k-distance of point i.
            k_dist = np.sort(self.dist[i])[self.k]
            # Keep the distances to the neighbors of i; 0 marks "not a neighbor".
            # The point itself is excluded (N_k(p) is taken over D \ {p}); exact
            # duplicates of p also end up as 0 and are therefore not counted.
            vector = [self.dist[i,j] if (j != i and self.dist[i,j] <= k_dist) else 0 for j in range(self.n)]
            nkp = np.count_nonzero(vector)  # |N_k(p)|
            k_dist_lst.append(vector)
            k_value.append(k_dist)
            nkp_list.append(nkp)
        
        return np.array(k_dist_lst), np.array(k_value), np.array(nkp_list)
    
    def re_dist(self) : 
        # re_dist[i,j] = reachability-distance_k(i, j) = max{k-distance(j), d(i, j)}
        re_dist = np.zeros((self.n, self.n))
        for i in range(self.n) : 
            for j in range(self.n) : 
                re_dist[i,j] = max(self.k_value[j], self.dist[i,j])
        return re_dist
    
    def lrd(self) : 
        # lrd_k(p) = |N_k(p)| / sum of reachability distances from p to its neighbors
        lrd = []
        for i in range(self.n): 
            lower = np.sum([self.r_dist[i,j] for j in range(self.n) if self.k_dist[i,j] != 0])
            value = self.nkp_list[i] / lower
            lrd.append(value)
        return np.array(lrd)
    
    def lof(self) : 
        # LOF_k(p) = (sum of lrd over the neighbors of p) / lrd(p) / |N_k(p)|
        lof = [] 
        for i in range(self.n) :             
            upper = np.sum([self.lrd_lst[j] for j in range(self.n) if self.k_dist[i,j] != 0]) / self.lrd_lst[i]
            under = self.nkp_list[i]
            lof.append(upper/under)
        return np.array(lof)
    
    def check_abnormal(self, x): 
        # Look up the first row of X that equals x in every feature.
        # A plain `self.X == x` would also match rows that agree in only one column.
        matches = np.where(np.all(np.isclose(self.X, x), axis=1))[0]
        if len(matches) == 0 :
            raise ValueError("x is not one of the rows of X")
        index = matches[0]
        print("lof Score is ", self.lof_lst[index])
        return self.lof_lst[index]

In [67]:
test = LOF(X,3) 
test.check_abnormal([7.9, 3.8, 6.4, 2. ])


In [54]:
a = np.array([5,4,3,2,6,4,8,6,5,3])

print(np.where(a==8)[0][0])

6
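The 1-D lookup above does not carry over directly to a 2-D array such as `X`: an element-wise `==` is true wherever any single column agrees, so `np.where(...)[0][0]` can return a row that matches in only one feature. Below is a small sketch of a whole-row lookup. The helper name `find_row` and the two-row array `X_demo` are hypothetical, chosen only for illustration.

```python
import numpy as np


def find_row(X, x):
    """Index of the first row of X equal to x in every column, or -1 if none."""
    X = np.asarray(X, dtype=float)
    # compare all columns, then require every one of them to match
    # (np.isclose tolerates small floating-point differences)
    matches = np.where(np.all(np.isclose(X, x), axis=1))[0]
    return int(matches[0]) if matches.size else -1


X_demo = np.array([[5.7, 3.8, 1.7, 0.3],
                   [7.9, 3.8, 6.4, 2.0]])
target = [7.9, 3.8, 6.4, 2.0]

print(find_row(X_demo, target))                     # whole-row match
print(np.where(X_demo == np.array(target))[0][0])   # element-wise match only
```

The first call prints 1, the row that matches in all four columns. The second prints 0, because row 0 shares the value 3.8 in its second column.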


In [39]:
print(X)

[[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]
 [5.4 3.9 1.7 0.4]
 [4.6 3.4 1.4 0.3]
 [5.  3.4 1.5 0.2]
 [4.4 2.9 1.4 0.2]
 [4.9 3.1 1.5 0.1]
 [5.4 3.7 1.5 0.2]
 [4.8 3.4 1.6 0.2]
 [4.8 3.  1.4 0.1]
 [4.3 3.  1.1 0.1]
 [5.8 4.  1.2 0.2]
 [5.7 4.4 1.5 0.4]
 [5.4 3.9 1.3 0.4]
 [5.1 3.5 1.4 0.3]
 [5.7 3.8 1.7 0.3]
 [5.1 3.8 1.5 0.3]
 [5.4 3.4 1.7 0.2]
 [5.1 3.7 1.5 0.4]
 [4.6 3.6 1.  0.2]
 [5.1 3.3 1.7 0.5]
 [4.8 3.4 1.9 0.2]
 [5.  3.  1.6 0.2]
 [5.  3.4 1.6 0.4]
 [5.2 3.5 1.5 0.2]
 [5.2 3.4 1.4 0.2]
 [4.7 3.2 1.6 0.2]
 [4.8 3.1 1.6 0.2]
 [5.4 3.4 1.5 0.4]
 [5.2 4.1 1.5 0.1]
 [5.5 4.2 1.4 0.2]
 [4.9 3.1 1.5 0.2]
 [5.  3.2 1.2 0.2]
 [5.5 3.5 1.3 0.2]
 [4.9 3.6 1.4 0.1]
 [4.4 3.  1.3 0.2]
 [5.1 3.4 1.5 0.2]
 [5.  3.5 1.3 0.3]
 [4.5 2.3 1.3 0.3]
 [4.4 3.2 1.3 0.2]
 [5.  3.5 1.6 0.6]
 [5.1 3.8 1.9 0.4]
 [4.8 3.  1.4 0.3]
 [5.1 3.8 1.6 0.2]
 [4.6 3.2 1.4 0.2]
 [5.3 3.7 1.5 0.2]
 [5.  3.3 1.4 0.2]
 [7.  3.2 4.7 1.4]
 [6.4 3.2 4.5 1.5]
 [6.9 3.1 4.