# Build & Save Similarity Model
---

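### Overview
* Loads the **preprocessed** data from the **Preprocessed_repository**, computes the **similarity** between every pair of data files, and assembles the results into a single **similarity model (similarity_model)**, which is returned and saved.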

---
* The figure below shows how the similarity between the stored preprocessed_data files is computed and assembled into a similarity_model, which is then saved.

<img src="https://raw.githubusercontent.com/jhyun0919/EnergyData_jhyun/master/docs/images/%EC%8A%A4%ED%81%AC%EB%A6%B0%EC%83%B7%202016-05-18%20%EC%98%A4%EC%A0%84%2010.26.43.jpg" alt="Drawing" style="width: 700px;"/>

---
* Import the modules needed to compute and save the similarity model.

In [1]:
from utils import GlobalParameter
from utils import FileIO
from utils import Similarity
import os



---
* Next, build the path to the preprocessed-data repository and check it.

In [2]:
repository4prepodessed_path = os.path.join(GlobalParameter.Repository_Path, GlobalParameter.Preprocessed_Path)
repository4prepodessed_path

'/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data'

---
* Build and return a list of the absolute paths (abs_path) of the preprocessed_data files under this path.

In [3]:
file_list = FileIO.Load.load_filelist(repository4prepodessed_path)
file_list

['/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA10_VM_EP_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA10_VM_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA10_VM_KV_KAM.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA11_VM_EP_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA11_VM_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA11_VM_KV_KAM.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW2_HA4_VM_EP_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW2_HA4_VM_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW2_HA4_VM_KV_KAM.bin']
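
The order of file_list matters, because each file's position later becomes its row and column index in the similarity matrices. A minimal sketch of such a loader, assuming it collects the `.bin` files in a single directory and sorts them by name (the printed list is in name order), could look like this. The function and its `suffix` parameter are hypothetical stand-ins, not the actual `FileIO.Load.load_filelist` implementation.

```python
import os
from typing import List


def load_filelist(directory: str, suffix: str = ".bin") -> List[str]:
    """Hypothetical stand-in for FileIO.Load.load_filelist.

    Returns the absolute paths of the regular files directly inside
    `directory` whose names end with `suffix` (an assumed filter),
    sorted by file name so the order is reproducible.
    """
    paths = []
    for name in sorted(os.listdir(directory)):
        if not name.endswith(suffix):
            continue                        # e.g. skip .DS_Store or notes
        full_path = os.path.abspath(os.path.join(directory, name))
        if os.path.isfile(full_path):       # skip sub-directories
            paths.append(full_path)
    return paths
```

Sorting explicitly matters because `os.listdir` returns names in arbitrary order; without it, the index-to-file mapping could differ between runs or machines.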

---
* Pass file_list as the argument to build the **similarity_model**; the call returns
    * the model itself (similarity_model) and
    * the path where it was saved (model_save_path)

In [4]:
similarity_model, model_save_path = Similarity.Model.build_model(file_list)
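
The notebook treats `build_model` as a black box. As a rough sketch of what such a function has to do (load every file, fill one symmetric matrix per measure, save the result, return the model and its path), the code below uses the hypothetical parameters `load_series`, `metrics` and `save_path`, which the real `Similarity.Model.build_model` does not take, and assumes `pickle` as the storage format for `model.bin`.

```python
import os
import pickle
from typing import Any, Callable, Dict, List, Sequence, Tuple

import numpy as np

Metric = Callable[[np.ndarray, np.ndarray], float]


def pairwise_matrix(series: List[np.ndarray], metric: Metric) -> np.ndarray:
    """Fill a symmetric n x n matrix with metric(series[i], series[j]).

    The diagonal is left at 0, matching the matrices printed in this notebook.
    """
    n = len(series)
    matrix = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):                 # compute the upper triangle once...
            value = metric(series[i], series[j])
            matrix[i, j] = matrix[j, i] = value   # ...and mirror it below the diagonal
    return matrix


def build_model(file_list: Sequence[str],
                load_series: Callable[[str], Sequence[float]],
                metrics: Dict[str, Metric],
                save_path: str) -> Tuple[Dict[str, Any], str]:
    """Hypothetical re-creation of Similarity.Model.build_model.

    load_series, metrics and save_path are assumptions: the real function
    only takes file_list and chooses these itself.
    """
    series = [np.asarray(load_series(path), dtype=float) for path in file_list]
    model: Dict[str, Any] = {"file_list": list(file_list)}
    for name, metric in metrics.items():
        model[name] = pairwise_matrix(series, metric)
    # Serialization format is an assumption (pickle); the real model.bin may differ.
    os.makedirs(os.path.dirname(save_path) or ".", exist_ok=True)
    with open(save_path, "wb") as f:
        pickle.dump(model, f)
    return model, save_path
```

Storing `file_list` next to the matrices keeps the index-to-file mapping inside the saved model, so a reloaded model can still be interpreted without re-scanning the repository.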

---
* Check the returned model_save_path.

In [5]:
model_save_path

'/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/model/model.bin'

---
* Check the returned similarity_model.

In [6]:
similarity_model

{'cosine_similarity': array([[ 0.   ,  0.018,  0.787,  0.03 ,  0.032,  0.658,  0.128,  0.033,  1.   ],
        [ 0.018,  0.   ,  0.791,  0.048,  0.042,  0.665,  0.144,  0.028,  1.   ],
        [ 0.787,  0.791,  0.   ,  0.798,  0.799,  0.147,  0.825,  0.806,  1.   ],
        [ 0.03 ,  0.048,  0.798,  0.   ,  0.002,  0.656,  0.059,  0.022,  1.   ],
        [ 0.032,  0.042,  0.799,  0.002,  0.   ,  0.656,  0.061,  0.023,  1.   ],
        [ 0.658,  0.665,  0.147,  0.656,  0.656,  0.   ,  0.693,  0.666,  1.   ],
        [ 0.128,  0.144,  0.825,  0.059,  0.061,  0.693,  0.   ,  0.081,  1.   ],
        [ 0.033,  0.028,  0.806,  0.022,  0.023,  0.666,  0.081,  0.   ,  1.   ],
        [ 1.   ,  1.   ,  1.   ,  1.   ,  1.   ,  1.   ,  1.   ,  1.   ,  0.   ]]),
 'euclidean_distance': array([[     0.   ,   2642.35 ,  13562.254,   3838.5  ,   3935.521,
          13338.732,   6934.901,   3828.04 ,  13775.754],
        [  2642.35 ,      0.   ,  13858.202,   4656.261,   4383.415,
          13639.555, 

---
### Similarity Model  

* **similarity_model** consists of the following entries:
    * file_list
    * cosine_similarity
    * euclidean_distance
    * manhattan_distance
    * gradient_similarity
    * reversed_gradient_similarity
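
Each matrix entry is described below as a symmetric matrix, and in every printed output its diagonal is 0. A small helper (hypothetical, not part of `utils`) can confirm both properties, plus the expected size, for a returned matrix:

```python
import numpy as np


def check_similarity_matrix(matrix, n_files, atol=1e-9):
    """Return True if `matrix` is n_files x n_files, symmetric, and zero on the diagonal."""
    matrix = np.asarray(matrix, dtype=float)
    if matrix.shape != (n_files, n_files):
        return False                                   # size must match file_list
    symmetric = np.allclose(matrix, matrix.T, atol=atol)
    zero_diagonal = np.allclose(np.diag(matrix), 0.0, atol=atol)
    return bool(symmetric and zero_diagonal)
```

In the notebook it could be applied to every matrix key, e.g. `all(check_similarity_matrix(similarity_model[k], len(similarity_model['file_list'])) for k in ['cosine_similarity', 'euclidean_distance', 'manhattan_distance', 'gradient_similarity', 'reversed_gradient_similarity'])`.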

---
* **file_list**
    * A list of the absolute paths (abs_path) of the data files under preprocessed_repository.
        * Each file's index in this list matches that file's row and column index in every similarity matrix.
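
Because the list index doubles as the matrix index, a single file's row can be looked up by file name. The helper below is a hypothetical sketch (not part of `utils`) that returns the closest other file under one measure, assuming that smaller values mean more alike, as the zero diagonals in the printed outputs suggest.

```python
import os

import numpy as np


def most_similar(model, file_name, key="euclidean_distance"):
    """Return (basename, value) of the file closest to `file_name` under model[key].

    Assumes smaller values mean more similar data (every printed matrix has a 0 diagonal).
    """
    names = [os.path.basename(p) for p in model["file_list"]]
    row_idx = names.index(file_name)                    # list index == matrix row/column index
    row = np.asarray(model[key], dtype=float)[row_idx].copy()
    row[row_idx] = np.inf                               # ignore the file's distance to itself
    best = int(np.argmin(row))
    return names[best], float(row[best])
```

Reading the printed `euclidean_distance` row for index 3, `most_similar(similarity_model, 'PP_VTT_GW1_HA11_VM_EP_KV_K.bin')` would pick `PP_VTT_GW1_HA11_VM_KV_K.bin` at a distance of about 863.361.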

In [7]:
similarity_model['file_list']

['/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA10_VM_EP_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA10_VM_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA10_VM_KV_KAM.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA11_VM_EP_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA11_VM_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA11_VM_KV_KAM.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW2_HA4_VM_EP_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW2_HA4_VM_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW2_HA4_VM_KV_KAM.bin']

---
* **cosine_similarity**
    * The **cosine similarity** between each pair of data files, arranged as a **symmetric matrix**.
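
Every diagonal entry of the printed matrix is 0 (a file compared with itself), which suggests the stored value is a cosine *distance*, 1 - cos(θ), rather than cos(θ) itself; this is an inference from the output, not something the notebook states. A minimal sketch of that computation, assuming the preprocessed series are equal-length rows of a 2-D array:

```python
import numpy as np


def cosine_distance_matrix(X):
    """Pairwise 1 - cos(angle) between the rows of X (n_series x n_samples)."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                 # all-zero rows stay zero instead of dividing by 0
    unit = X / norms                        # scale every row to unit length
    dist = np.clip(1.0 - unit @ unit.T, 0.0, 2.0)  # clip tiny floating-point overshoots
    np.fill_diagonal(dist, 0.0)             # each series is at distance 0 from itself
    return dist
```

For rows that are not all zero, `scipy.spatial.distance.squareform(scipy.spatial.distance.pdist(X, metric='cosine'))` yields the same matrix.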

In [8]:
similarity_model['cosine_similarity']

array([[ 0.   ,  0.018,  0.787,  0.03 ,  0.032,  0.658,  0.128,  0.033,  1.   ],
       [ 0.018,  0.   ,  0.791,  0.048,  0.042,  0.665,  0.144,  0.028,  1.   ],
       [ 0.787,  0.791,  0.   ,  0.798,  0.799,  0.147,  0.825,  0.806,  1.   ],
       [ 0.03 ,  0.048,  0.798,  0.   ,  0.002,  0.656,  0.059,  0.022,  1.   ],
       [ 0.032,  0.042,  0.799,  0.002,  0.   ,  0.656,  0.061,  0.023,  1.   ],
       [ 0.658,  0.665,  0.147,  0.656,  0.656,  0.   ,  0.693,  0.666,  1.   ],
       [ 0.128,  0.144,  0.825,  0.059,  0.061,  0.693,  0.   ,  0.081,  1.   ],
       [ 0.033,  0.028,  0.806,  0.022,  0.023,  0.666,  0.081,  0.   ,  1.   ],
       [ 1.   ,  1.   ,  1.   ,  1.   ,  1.   ,  1.   ,  1.   ,  1.   ,  0.   ]])

---
* **euclidean_distance**
    * The **euclidean distance** between each pair of data files, used as a similarity measure and arranged as a **symmetric matrix**.
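
The notebook does not show how `Similarity` computes this matrix. As a hedged sketch, assuming the preprocessed series are equal-length rows of a 2-D array, the pairwise Euclidean distance can be computed with NumPy broadcasting. Unlike the cosine values, the Euclidean distance grows with the magnitude of the data, which is consistent with the entries in the thousands printed below.

```python
import numpy as np


def euclidean_distance_matrix(X):
    """Pairwise Euclidean distance between the rows of X (n_series x n_samples)."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]    # shape (n, n, n_samples): every row minus every row
    return np.sqrt((diff ** 2).sum(axis=-1))
```

The intermediate `diff` array holds n x n x n_samples values, which is fine for nine files; for long series, `scipy.spatial.distance.cdist(X, X, 'euclidean')` gives the same matrix with less memory.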

In [9]:
similarity_model['euclidean_distance']

array([[     0.   ,   2642.35 ,  13562.254,   3838.5  ,   3935.521,
         13338.732,   6934.901,   3828.04 ,  13775.754],
       [  2642.35 ,      0.   ,  13858.202,   4656.261,   4383.415,
         13639.555,   7423.263,   3501.575,  14067.647],
       [ 13562.254,  13858.202,      0.   ,  14987.302,  15014.644,
           780.26 ,  13443.173,  14660.575,   1270.35 ],
       [  3838.5  ,   4656.261,  14987.302,      0.   ,    863.361,
         14744.373,   5204.77 ,   3186.513,  15191.619],
       [  3935.521,   4383.415,  15014.644,    863.361,      0.   ,
         14771.002,   5278.966,   3224.633,  15218.663],
       [ 13338.732,  13639.555,    780.26 ,  14744.373,  14771.002,
             0.   ,  13225.429,  14424.24 ,   1493.382],
       [  6934.901,   7423.263,  13443.173,   5204.77 ,   5278.966,
         13225.429,      0.   ,   5824.987,  13606.85 ],
       [  3828.04 ,   3501.575,  14660.575,   3186.513,   3224.633,
         14424.24 ,   5824.987,      0.   ,  14853.905],


---
* **manhattan_distance**
    * The **manhattan distance** between each pair of data files, used as a similarity measure and arranged as a **symmetric matrix**.
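
For the same pair of vectors, the Manhattan (L1) distance can never be smaller than the Euclidean (L2) distance, yet the printed values below (around 30) are far smaller than the Euclidean ones (in the thousands). The library therefore presumably rescales or averages the data differently for this measure; the exact scaling is not shown. The sketch below computes the plain, unscaled L1 distance between equal-length rows:

```python
import numpy as np


def manhattan_distance_matrix(X):
    """Pairwise Manhattan (L1, city-block) distance between the rows of X."""
    X = np.asarray(X, dtype=float)
    # |x_i - x_j| summed over samples, for every pair of rows via broadcasting
    return np.abs(X[:, None, :] - X[None, :, :]).sum(axis=-1)
```

`scipy.spatial.distance.cdist(X, X, 'cityblock')` computes the same unscaled matrix.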

In [10]:
similarity_model['manhattan_distance']

array([[  0.        ,   3.25540708,  31.82507657,   7.81085148,
          8.67394117,  30.83050115,  15.01547835,   9.47018381,
         32.39041941],
       [  3.25540708,   0.        ,  33.81701641,  10.99700717,
         10.2241571 ,  32.87824217,  18.21221861,   8.47199324,
         34.35555481],
       [ 31.82507657,  33.81701641,   0.        ,  32.65734072,
         31.77390668,   1.38168621,  24.11103903,  33.28170774,
          0.72742094],
       [  7.81085148,  10.99700717,  32.65734072,   0.        ,
          0.92934826,  31.67382387,  10.7878635 ,   7.07132538,
         33.15664669],
       [  8.67394117,  10.2241571 ,  31.77390668,   0.92934826,
          0.        ,  30.83869946,  11.66738691,   7.26606295,
         32.24697934],
       [ 30.83050115,  32.87824217,   1.38168621,  31.67382387,
         30.83869946,   0.        ,  23.74864835,  32.55789442,
          1.65915124],
       [ 15.01547835,  18.21221861,  24.11103903,  10.7878635 ,
         11.66738691,  23.7486

---
* **gradient_similarity**
    * The **gradient similarity** between each pair of data files, arranged as a **symmetric matrix**.

In [11]:
similarity_model['gradient_similarity']

array([[ 0.        ,  0.00208466,  0.00395662,  0.00144547,  0.00244612,
         0.00601553,  0.00177311,  0.00272743,  0.0018041 ],
       [ 0.00208466,  0.        ,  0.00489024,  0.00234003,  0.00202164,
         0.00664223,  0.00234035,  0.00234193,  0.00161328],
       [ 0.00395662,  0.00489024,  0.        ,  0.00442068,  0.00479398,
         0.00382184,  0.00406345,  0.00523414,  0.0034438 ],
       [ 0.00144547,  0.00234003,  0.00442068,  0.        ,  0.00196922,
         0.00570512,  0.00166231,  0.00268299,  0.0015578 ],
       [ 0.00244612,  0.00202164,  0.00479398,  0.00196922,  0.        ,
         0.0064926 ,  0.00226905,  0.00243586,  0.00148083],
       [ 0.00601553,  0.00664223,  0.00382184,  0.00570512,  0.0064926 ,
         0.        ,  0.00579008,  0.00696167,  0.00518702],
       [ 0.00177311,  0.00234035,  0.00406345,  0.00166231,  0.00226905,
         0.00579008,  0.        ,  0.00259492,  0.000972  ],
       [ 0.00272743,  0.00234193,  0.00523414,  0.00268299,  0

---
* **reversed_gradient_similarity**
    * The **reversed gradient similarity** between each pair of data files, arranged as a **symmetric matrix**.

In [12]:
similarity_model['reversed_gradient_similarity']

array([[ 0.        ,  0.00338842,  0.00480964,  0.00324464,  0.00324978,
         0.00655412,  0.00265261,  0.00370379,  0.00180496],
       [ 0.00338842,  0.        ,  0.00489199,  0.00314165,  0.00298095,
         0.00660363,  0.00255306,  0.00342713,  0.00161328],
       [ 0.00480964,  0.00489199,  0.        ,  0.00465954,  0.00475072,
         0.00794305,  0.00405224,  0.00519512,  0.00344433],
       [ 0.00324464,  0.00314165,  0.00465954,  0.        ,  0.003003  ,
         0.0061877 ,  0.00241081,  0.00345702,  0.00155835],
       [ 0.00324978,  0.00298095,  0.00475072,  0.003003  ,  0.        ,
         0.00645082,  0.00241638,  0.00329473,  0.0014809 ],
       [ 0.00655412,  0.00660363,  0.00794305,  0.0061877 ,  0.00645082,
         0.        ,  0.00580954,  0.00693758,  0.00518349],
       [ 0.00265261,  0.00255306,  0.00405224,  0.00241081,  0.00241638,
         0.00580954,  0.        ,  0.00287024,  0.000972  ],
       [ 0.00370379,  0.00342713,  0.00519512,  0.00345702,  0