# Build & Save Similarity Model
---

### Overview
* Load the **preprocessed** data from the **preprocessed_repository**, compute the **similarity** between every pair of data files, and assemble the results into a single **similarity_model** that is returned and saved.

---
* The diagram below shows how the similarity between the stored preprocessed_data files is computed and how the similarity_model is built and saved.  

<img src="https://raw.githubusercontent.com/jhyun0919/EnergyData_jhyun/master/docs/images/%EC%8A%A4%ED%81%AC%EB%A6%B0%EC%83%B7%202016-05-18%20%EC%98%A4%EC%A0%84%2010.26.43.jpg" alt="Drawing" style="width: 700px;"/>

---
* Import the modules needed to compute the similarities and save the model.

In [1]:
from utils import GlobalParameter
from utils import FileIO
from utils import Similarity
import os



---
* Set the path of the preprocessed-data repository and check it.

In [2]:
repository4prepodessed_path = os.path.join(GlobalParameter.Repository_Path, GlobalParameter.Preprocessed_Path)
repository4prepodessed_path

'/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data'

---
* Build a list of the absolute paths (abs_path) of the preprocessed_data files under that path.

In [3]:
file_list = FileIO.Load.load_filelist(repository4prepodessed_path)
file_list

['/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA10_VM_EP_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA10_VM_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA10_VM_KV_KAM.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA11_VM_EP_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA11_VM_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA11_VM_KV_KAM.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW2_HA4_VM_EP_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW2_HA4_VM_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW2_HA4_VM_KV_KAM.bin']
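
`FileIO.Load.load_filelist` is the author's helper and its source is not shown here. A rough standard-library equivalent, assuming it simply collects the `.bin` files of one directory in sorted order, might look like the sketch below. The name `list_files` and its `suffix` parameter are hypothetical.

```python
import os


def list_files(directory, suffix=".bin"):
    """Return sorted absolute paths of the regular files in `directory`
    whose names end with `suffix` (hypothetical helper, not the author's)."""
    directory = os.path.abspath(directory)
    paths = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # Skip sub-directories and files with another extension.
        if os.path.isfile(path) and name.endswith(suffix):
            paths.append(path)
    return paths
```

A stable ordering matters because the position of each path in the list later serves as its row and column index in the similarity matrices.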

---
* Pass file_list as the argument to build the **similarity_model**, and receive in return
    * the model itself (similarity_model) and
    * the path where it was saved (model_save_path).

In [4]:
similarity_model, model_save_path = Similarity.Model.build_model(file_list)
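
The internals of `Similarity.Model.build_model` are not shown in this notebook. As an illustration only, the sketch below builds a small model dictionary from in-memory series and saves it. It assumes the model is a plain `dict` serialized with `pickle`; `build_model_sketch`, `pairwise`, and the `series` argument are hypothetical names, and only two of the metrics are included.

```python
import pickle

import numpy as np


def pairwise(series, metric):
    """Return an n x n symmetric matrix of metric(a, b) over all pairs."""
    n = len(series)
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            out[i, j] = out[j, i] = metric(series[i], series[j])
    return out


def build_model_sketch(file_list, series, save_path):
    """Hypothetical stand-in for Similarity.Model.build_model.

    file_list: paths, in the order that defines matrix rows and columns
    series:    equal-length 1-D arrays, one per path (loading is not shown)
    save_path: file the pickled model is written to
    """
    model = {
        "file_list": list(file_list),
        "euclidean_distance": pairwise(
            series, lambda a, b: float(np.linalg.norm(a - b))),
        "manhattan_distance": pairwise(
            series, lambda a, b: float(np.abs(a - b).sum())),
    }
    with open(save_path, "wb") as f:
        pickle.dump(model, f)
    return model, save_path
```

Under the same pickle assumption, a model saved this way can be read back with `pickle.load(open(save_path, "rb"))`.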

---
* Check the returned model_save_path.

In [5]:
model_save_path

'/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/model/model.bin'

---
* Inspect the returned similarity_model.

In [6]:
similarity_model

{'cosine_similarity': array([[ 0.   ,  0.018,  0.784,  0.031,  0.033,  0.647,  0.131,  0.034,  1.   ],
        [ 0.018,  0.   ,  0.788,  0.049,  0.043,  0.654,  0.147,  0.028,  1.   ],
        [ 0.784,  0.788,  0.   ,  0.795,  0.795,  0.148,  0.823,  0.803,  1.   ],
        [ 0.031,  0.049,  0.795,  0.   ,  0.002,  0.642,  0.061,  0.023,  1.   ],
        [ 0.033,  0.043,  0.795,  0.002,  0.   ,  0.641,  0.063,  0.023,  1.   ],
        [ 0.647,  0.654,  0.148,  0.642,  0.641,  0.   ,  0.678,  0.653,  1.   ],
        [ 0.131,  0.147,  0.823,  0.061,  0.063,  0.678,  0.   ,  0.082,  1.   ],
        [ 0.034,  0.028,  0.803,  0.023,  0.023,  0.653,  0.082,  0.   ,  1.   ],
        [ 1.   ,  1.   ,  1.   ,  1.   ,  1.   ,  1.   ,  1.   ,  1.   ,  0.   ]]),
 'covariance': array([[  6.36089729e+02,   5.33009878e+02,   8.57556961e+00,
           7.26216445e+02,   6.97625219e+02,   7.15140857e+00,
           6.61663182e+02,   6.23831746e+02,  -2.18822623e-03],
        [  5.33009878e+02,   4.8456

---
### Similarity Model  

* Components of the **similarity_model**
    * file_list
    * covariance
    * cosine_similarity
    * euclidean_distance
    * manhattan_distance
    * gradient_similarity
    * reversed_gradient_similarity

---
* **file_list**
    * The list of absolute paths (abs_path) of the data files under the preprocessed_repository.
        * Each file's index in this list matches its row and column index in every similarity matrix.

In [7]:
similarity_model['file_list']

['/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA10_VM_EP_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA10_VM_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA10_VM_KV_KAM.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA11_VM_EP_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA11_VM_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW1_HA11_VM_KV_KAM.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW2_HA4_VM_EP_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW2_HA4_VM_KV_K.bin',
 '/Users/JH/Documents/GitHub/EnergyData_jhyun/repository/preprocessed_data/PP_VTT_GW2_HA4_VM_KV_KAM.bin']
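
Because list positions and matrix indices coincide, a lookup such as "which file is closest to file *i*?" reduces to reading one matrix row. A minimal sketch, assuming the chosen matrix stores distances (smaller means more similar); `most_similar` is a hypothetical helper, not part of the author's `utils` package:

```python
import numpy as np


def most_similar(model, query_idx, key="euclidean_distance"):
    """Return (path, score) of the entry closest to file_list[query_idx].

    Assumes model[key] is a square matrix of distances whose rows and
    columns follow the order of model['file_list'].
    """
    row = np.asarray(model[key][query_idx], dtype=float).copy()
    row[query_idx] = np.inf  # never return the query itself
    best = int(np.argmin(row))
    return model["file_list"][best], float(row[best])


# Toy model with three files; the values are made up for illustration.
toy_model = {
    "file_list": ["a.bin", "b.bin", "c.bin"],
    "euclidean_distance": np.array([
        [0.0, 2.0, 9.0],
        [2.0, 0.0, 5.0],
        [9.0, 5.0, 0.0],
    ]),
}
closest_path, closest_score = most_similar(toy_model, 0)
```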

---
* **covariance**
    * The **covariance** between every pair of data files, stored as a **symmetric matrix**.

In [8]:
similarity_model['covariance']

array([[  6.36089729e+02,   5.33009878e+02,   8.57556961e+00,
          7.26216445e+02,   6.97625219e+02,   7.15140857e+00,
          6.61663182e+02,   6.23831746e+02,  -2.18822623e-03],
       [  5.33009878e+02,   4.84569097e+02,   6.73427740e+00,
          6.20936756e+02,   6.15941674e+02,   3.49532666e+00,
          5.84748335e+02,   5.49810742e+02,  -2.20098979e-03],
       [  8.57556961e+00,   6.73427740e+00,   1.31534289e+01,
          9.59640031e+00,   9.76126154e+00,   1.25425492e+01,
          8.46741371e+00,   6.65993695e+00,  -4.03273336e-05],
       [  7.26216445e+02,   6.20936756e+02,   9.59640031e+00,
          9.49352590e+02,   9.20153290e+02,   1.35201371e+01,
          9.20638373e+02,   7.99284040e+02,  -2.32934952e-03],
       [  6.97625219e+02,   6.15941674e+02,   9.76126154e+00,
          9.20153290e+02,   9.07136992e+02,   1.45045611e+01,
          8.99293702e+02,   7.70782801e+02,  -2.14319908e-03],
       [  7.15140857e+00,   3.49532666e+00,   1.25425492e+01,
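
For reference, a pairwise covariance matrix of this shape can be produced with `numpy.cov`, which treats each row as one variable. This is a sketch on toy data, not the author's implementation; the array `data` stands in for the preprocessed series, one row per file.

```python
import numpy as np

# Toy stand-ins for three preprocessed series (one row per file).
data = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
])

# Each row is one variable, so the result is a 3 x 3 symmetric matrix.
cov = np.cov(data)

# The diagonal holds each series' own (sample) variance, which is why a
# covariance matrix does not have zeros on its diagonal.
variances = np.diag(cov)
```

Covariance is not normalized, so series with larger amplitudes produce larger entries regardless of how similar their shapes are.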
   

---
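* **cosine_similarity**
    * The **cosine similarity** between every pair of data files, stored as a **symmetric matrix**. In the output below the diagonal is 0 and the least similar pairs are at 1, so the stored values appear to be cosine distances (1 - cosine similarity) rather than raw similarities.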

In [9]:
similarity_model['cosine_similarity']

array([[ 0.   ,  0.018,  0.784,  0.031,  0.033,  0.647,  0.131,  0.034,  1.   ],
       [ 0.018,  0.   ,  0.788,  0.049,  0.043,  0.654,  0.147,  0.028,  1.   ],
       [ 0.784,  0.788,  0.   ,  0.795,  0.795,  0.148,  0.823,  0.803,  1.   ],
       [ 0.031,  0.049,  0.795,  0.   ,  0.002,  0.642,  0.061,  0.023,  1.   ],
       [ 0.033,  0.043,  0.795,  0.002,  0.   ,  0.641,  0.063,  0.023,  1.   ],
       [ 0.647,  0.654,  0.148,  0.642,  0.641,  0.   ,  0.678,  0.653,  1.   ],
       [ 0.131,  0.147,  0.823,  0.061,  0.063,  0.678,  0.   ,  0.082,  1.   ],
       [ 0.034,  0.028,  0.803,  0.023,  0.023,  0.653,  0.082,  0.   ,  1.   ],
       [ 1.   ,  1.   ,  1.   ,  1.   ,  1.   ,  1.   ,  1.   ,  1.   ,  0.   ]])
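
A matrix with this layout (zeros on the diagonal, values near 1 for unrelated series) can be reproduced as a cosine distance, `1 - cos(a, b)`. The sketch below is an illustration under that assumption; `cosine_distance_matrix` is a hypothetical name and the author's code may differ.

```python
import numpy as np


def cosine_distance_matrix(data):
    """Pairwise cosine distance between the rows of `data`.

    Assumes no row is all zeros (its norm would be 0).
    """
    data = np.asarray(data, dtype=float)
    norms = np.linalg.norm(data, axis=1, keepdims=True)
    unit = data / norms                 # scale every row to length 1
    dist = 1.0 - unit @ unit.T          # 1 - cosine similarity
    np.fill_diagonal(dist, 0.0)         # remove floating-point residue
    return np.clip(dist, 0.0, 2.0)


rows = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
cos_dist = cosine_distance_matrix(rows)
```

Here rows 0 and 2 point in the same direction, so their distance is 0 even though their magnitudes differ, while the orthogonal rows 0 and 1 are at distance 1.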

---
* **euclidean_distance**
    * The **Euclidean distance** between every pair of data files, stored as a **symmetric matrix**.

In [10]:
similarity_model['euclidean_distance']

array([[     0.   ,   2642.098,  13504.789,   3861.998,   3958.654,
         13265.964,   6993.981,   3842.326,  13721.771],
       [  2642.098,      0.   ,  13802.378,   4675.627,   4403.274,
         13568.805,   7479.117,   3514.483,  14015.209],
       [ 13504.789,  13802.378,      0.   ,  14911.893,  14940.325,
           785.959,  13385.896,  14622.85 ,   1270.772],
       [  3861.998,   4675.627,  14911.893,      0.   ,    863.385,
         14649.591,   5239.52 ,   3195.704,  15120.411],
       [  3958.654,   4403.274,  14940.325,    863.385,      0.   ,
         14677.27 ,   5313.004,   3212.968,  15148.529],
       [ 13265.964,  13568.805,    785.959,  14649.591,  14677.27 ,
             0.   ,  13145.714,  14368.155,   1501.651],
       [  6993.981,   7479.117,  13385.896,   5239.52 ,   5313.004,
         13145.714,      0.   ,   5875.922,  13551.702],
       [  3842.326,   3514.483,  14622.85 ,   3195.704,   3212.968,
         14368.155,   5875.922,      0.   ,  14820.05 ],


---
* **manhattan_distance**
    * The **Manhattan distance** between every pair of data files, stored as a **symmetric matrix**.

In [11]:
similarity_model['manhattan_distance']

array([[  0.        ,   3.25471372,  31.81870433,   7.82647339,
          8.69115082,  30.80160279,  15.24280938,   9.51509212,
         32.40001293],
       [  3.25471372,   0.        ,  33.81244044,  11.01312243,
         10.23811025,  32.84977573,  18.4404397 ,   8.48195366,
         34.36353824],
       [ 31.81870433,  33.81244044,   0.        ,  32.6116483 ,
         31.73311299,   1.39772756,  24.00491751,  33.30164425,
          0.73369298],
       [  7.82647339,  11.01312243,  32.6116483 ,   0.        ,
          0.93195459,  31.61010915,  10.95329826,   7.14077374,
         33.13142385],
       [  8.69115082,  10.23811025,  31.73311299,   0.93195459,
          0.        ,  30.78142965,  11.83220839,   7.29324067,  32.2261594 ],
       [ 30.80160279,  32.84977573,   1.39772756,  31.61010915,
         30.78142965,   0.        ,  23.60137438,  32.55306785,   1.6956011 ],
       [ 15.24280938,  18.4404397 ,  24.00491751,  10.95329826,
         11.83220839,  23.60137438,   0.      
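
The Manhattan (L1) distance sums absolute element-wise differences. The sketch below shows the plain definition on toy data, with `manhattan_distance_matrix` as a hypothetical name. On the same raw vectors an L1 distance is never smaller than the L2 distance, yet the notebook's Manhattan values are far below its Euclidean ones, so the author's implementation presumably rescales the data first; that step is not shown in the source and is not reproduced here.

```python
import numpy as np


def manhattan_distance_matrix(data):
    """Pairwise Manhattan (L1) distance between the rows of `data`."""
    data = np.asarray(data, dtype=float)
    diff = data[:, None, :] - data[None, :, :]
    return np.abs(diff).sum(axis=-1)


samples = [[0.0, 0.0], [3.0, 4.0], [1.0, -1.0]]
l1 = manhattan_distance_matrix(samples)
```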

---
* **gradient_similarity**
    * The **gradient similarity** between every pair of data files, stored as a **symmetric matrix**.

In [12]:
similarity_model['gradient_similarity']

array([[ 0.        ,  0.00207609,  0.00395646,  0.00144547,  0.00244227,
         0.00601553,  0.00176359,  0.00271583,  0.0018041 ],
       [ 0.00207609,  0.        ,  0.00489012,  0.00234   ,  0.00201816,
         0.0066422 ,  0.00232618,  0.00233234,  0.00161326],
       [ 0.00395646,  0.00489012,  0.        ,  0.00442052,  0.00479404,
         0.00382169,  0.00406317,  0.00523414,  0.0034438 ],
       [ 0.00144547,  0.00234   ,  0.00442052,  0.        ,  0.00196436,
         0.00570512,  0.00162707,  0.00264815,  0.0015578 ],
       [ 0.00244227,  0.00201816,  0.00479404,  0.00196436,  0.        ,
         0.00648779,  0.00223118,  0.00240035,  0.00148086],
       [ 0.00601553,  0.0066422 ,  0.00382169,  0.00570512,  0.00648779,
         0.        ,  0.00578989,  0.00696167,  0.00518702],
       [ 0.00176359,  0.00232618,  0.00406317,  0.00162707,  0.00223118,
         0.00578989,  0.        ,  0.00258435,  0.000972  ],
       [ 0.00271583,  0.00233234,  0.00523414,  0.00264815,  0
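
The notebook does not define gradient similarity. One possible reading, offered purely as an assumption, is to compare how two series change from step to step rather than their absolute levels. The sketch below uses the mean absolute difference between first-order differences; `gradient_distance` is a hypothetical name and this may not match the author's formula or scaling.

```python
import numpy as np


def gradient_distance(a, b):
    """Mean absolute difference between the first differences of a and b.

    Assumed definition for illustration; 0 means identical step-to-step
    changes, regardless of any constant offset between the series.
    """
    da = np.diff(np.asarray(a, dtype=float))
    db = np.diff(np.asarray(b, dtype=float))
    return float(np.mean(np.abs(da - db)))


rising = [0.0, 1.0, 2.0, 3.0]
shifted = [5.0, 6.0, 7.0, 8.0]    # same slope, different level
falling = [3.0, 2.0, 1.0, 0.0]    # opposite slope

same_shape = gradient_distance(rising, shifted)
opposite_shape = gradient_distance(rising, falling)
```

A measure of this kind ignores offsets between sensors, which level-based distances such as the Euclidean distance do not.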

---
* **reversed_gradient_similarity**
    * The **reversed gradient similarity** between every pair of data files, stored as a **symmetric matrix**. Unlike the distance matrices above, the diagonal in the output below is not zero.

In [13]:
similarity_model['reversed_gradient_similarity']

array([[ 0.00349142,  0.0033884 ,  0.00480964,  0.00324464,  0.00324981,
         0.00655412,  0.00265261,  0.00370379,  0.00180496],
       [ 0.0033884 ,  0.00311348,  0.00489197,  0.00314162,  0.00298095,
         0.00660361,  0.00255304,  0.00342711,  0.00161326],
       [ 0.00480964,  0.00489197,  0.00677804,  0.00465954,  0.00475077,
         0.00794305,  0.00405068,  0.00519061,  0.00344433],
       [ 0.00324464,  0.00314162,  0.00465954,  0.00299787,  0.00300303,
         0.0061877 ,  0.00241081,  0.00345702,  0.00155835],
       [ 0.00324981,  0.00298095,  0.00475077,  0.00300303,  0.00284884,
         0.00644887,  0.0024164 ,  0.00329476,  0.00148092],
       [ 0.00655412,  0.00660361,  0.00794305,  0.0061877 ,  0.00644887,
         0.01026654,  0.00579803,  0.00692617,  0.00518349],
       [ 0.00265261,  0.00255304,  0.00405068,  0.00241081,  0.0024164 ,
         0.00579803,  0.00182463,  0.00287024,  0.000972  ],
       [ 0.00370379,  0.00342711,  0.00519061,  0.00345702,  0