# Model Prediction

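## Overview
This example uses a trained recommendation model to predict which items to recommend.  
It walks through fetching data stored in BigQuery, applying the model's predictions with Spark, and loading the results back into BigQuery.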


## Model Prediction

### 1. Load the Model
Load the model stored in HDFS. This is the same model that was created in the Training example.

In [2]:
import pickle
from pydatafabric.ye import get_hdfs_conn

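# Read the pickled model from HDFS as raw bytes and deserialize it.
# The package that defines the model class must be installed for pickle.loads to succeed.
model_path = "/data/tmp/example_model/v0"
byte_object = get_hdfs_conn().cat(model_path)
model = pickle.loads(byte_object)
# Print the parameters stored on the model
print(model.params)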

Output of the recorded run. The cell failed in that environment because the `lightgbm` package was not installed:

ModuleNotFoundError: No module named 'lightgbm'
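
The `ModuleNotFoundError` in the recorded output was most likely raised during `pickle.loads`, because unpickling an object requires the module that defines its class to be importable. The sketch below checks for required packages before unpickling, so the failure is reported with a clearer message. The helper names `missing_modules` and `load_pickled_model` are hypothetical, and the default `required=("lightgbm",)` is an assumption based only on the error message above.

```python
import importlib.util
import pickle


def missing_modules(required):
    """Return the top-level package names in `required` that cannot be imported."""
    return [name for name in required if importlib.util.find_spec(name) is None]


def load_pickled_model(byte_object, required=("lightgbm",)):
    """Unpickle `byte_object` after checking that its dependencies are importable.

    `required` is an assumption: list whichever packages your model class needs.
    """
    missing = missing_modules(required)
    if missing:
        raise RuntimeError(
            "Install these packages before unpickling: " + ", ".join(missing)
        )
    # Only unpickle bytes from a trusted source; pickle can execute arbitrary code.
    return pickle.loads(byte_object)
```

In the cell above, this would be called as `load_pickled_model(byte_object)` in place of `pickle.loads(byte_object)`.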

Additionally, store the label encoder in a separate variable.

In [2]:


# Code Example
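
The code for this cell is not shown in the source, so where the label encoder lives inside the loaded model is unknown. As a rough illustration of what a label encoder does, the sketch below maps benefit names to integer codes and back. `SimpleLabelEncoder` is a hypothetical stand-in, not the class used by the original notebook; the benefit names are taken from the sample prediction output shown later in this example.

```python
class SimpleLabelEncoder:
    """Tiny stand-in for a label encoder: maps labels to integer codes and back."""

    def __init__(self, labels):
        # Sort the distinct labels so the code assigned to each label is stable.
        self.classes_ = sorted(set(labels))
        self._index = {label: i for i, label in enumerate(self.classes_)}

    def transform(self, labels):
        """Convert labels to their integer codes."""
        return [self._index[label] for label in labels]

    def inverse_transform(self, codes):
        """Convert integer codes back to labels."""
        return [self.classes_[code] for code in codes]


# Benefit names as they appear in the sample prediction output.
label_encoder = SimpleLabelEncoder(
    ["culture", "data_plus", "global", "open_mbr", "vip_pick_movie"]
)
codes = label_encoder.transform(["global", "culture"])
names = label_encoder.inverse_transform(codes)
```

Keeping the encoder in its own variable makes it easy to turn the model's class indices back into readable benefit names after prediction.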

### 2. Load the Feature Data
Load the user feature data needed to run Prediction.  
In this example the user features are stored in BigQuery and are fetched with the query below.
The query result is converted to a Pandas DataFrame for prediction. To check the results quickly, only 100 users are queried.

In [3]:
from pydatafabric.gcp import bq_to_pandas


# Feature List
# Code Example

query = f"""
    ...
"""
print(query)

df = bq_to_pandas(query)


    SELECT  svc_mgmt_num
    --,       udf.feature_hashing(svc_mgmt_num, 100) as seed
    ,       substr(svc_mgmt_num, 9,2) as seed
    ,       '' as prod_id
    ,       '' as prod_nm
    ,       CAST('0.1234' AS FLOAT64) as score
    ,       membership_vip_yn, mbr_card_gr_cd, non_vip_mbr_discount_median_yn, membership_cnt_ratio_median_yn, bas_fee_amt, membership_amt_ratio_median_yn, mbr_discount_cnt_movie, mbr_discount_amt_movie, mbr_discount_amt_cum_movie, mbr_discount_amt_video, app_use_cnt_video, app_use_traffic_video_median_yn, app_use_cnt_etc_video, real_avg_arpu, real_arpu_bf_m1, app_use_cnt_youtube, night_traffic_hour_ratio, day_traffic_hour_ratio, app_use_days_video, real_arpu_bf_m2, mbr_discount_cnt_bakery, real_arpu_bf_m3, avg_data_usage_in_gb, data_use_night_ratio_median_yn, bf_m6_avg_data_usage_in_gb, bf_m6_sum_data_usage_in_gb, real_data_use_gb_bf_m0, emb050, data_usage_in_gb_bf_m0, avg_traffic_mb_per_hour, app_use_days_video_median_yn, emb049, app_use_days_music, mbr_di

Downloading: 100%|██████████| 100/100 [00:03<00:00, 31.79rows/s]
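
The feature list and the body of the query are elided in the cell above. The sketch below shows one way such a query could be assembled from a feature list, mirroring the shape of the printed query: an ID column, a `seed`, what appear to be placeholder `prod_id`, `prod_nm`, and `score` columns, and then the features. The function `build_feature_query`, the table name `my_dataset.user_features`, and the three-item feature list are assumptions for illustration; the real table and full feature list are not given in the source.

```python
def build_feature_query(features, table, limit=100):
    """Build a SELECT statement shaped like the printed query above."""
    feature_cols = ", ".join(features)
    return f"""
    SELECT  svc_mgmt_num
    ,       substr(svc_mgmt_num, 9, 2) as seed
    ,       '' as prod_id
    ,       '' as prod_nm
    ,       CAST('0.1234' AS FLOAT64) as score
    ,       {feature_cols}
    FROM    {table}
    LIMIT   {limit}
    """


# A few feature names copied from the printed query; the table name is a placeholder.
features = ["membership_vip_yn", "mbr_card_gr_cd", "bas_fee_amt"]
query = build_feature_query(features, table="my_dataset.user_features")
print(query)
```

The resulting string can then be passed to `bq_to_pandas(query)` as in the cell above.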


### 3. Prediction

Now for the prediction step. First, define the method that performs the prediction.

In [4]:
# Code Example
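
The prediction method itself is not shown in the source. Judging from the result table, the output holds one row per user and benefit, each with a score. The sketch below reproduces that shape under stated assumptions: `predict_scores` and `toy_score` are hypothetical names, and the scoring rule is made up purely for illustration and stands in for the real model. The `prod_id` and `prod_nm` pairs are taken from the sample output.

```python
def predict_scores(users, products, score_fn):
    """Return one row per (user, product) pair with that pair's score.

    `users` is a list of dicts holding `svc_mgmt_num` plus feature values,
    `products` is a list of (prod_id, prod_nm) pairs, and `score_fn` is any
    callable (features, prod_nm) -> float standing in for the real model.
    """
    rows = []
    for user in users:
        # Everything except the user ID is treated as a model feature.
        features = {k: v for k, v in user.items() if k != "svc_mgmt_num"}
        for prod_id, prod_nm in products:
            rows.append(
                {
                    "svc_mgmt_num": user["svc_mgmt_num"],
                    "prod_id": prod_id,
                    "prod_nm": prod_nm,
                    "score": float(score_fn(features, prod_nm)),
                }
            )
    return rows


def toy_score(features, prod_nm):
    """Made-up scoring rule; replace with a call to the loaded model."""
    return 0.9 if prod_nm == "data_plus" else 0.5


users = [
    {"svc_mgmt_num": "user_a", "bas_fee_amt": 15400.0},
    {"svc_mgmt_num": "user_b", "bas_fee_amt": 825.0},
]
products = [(62, "open_mbr"), (140, "global"), (141, "data_plus")]
rows = predict_scores(users, products, toy_score)
```

With 2 users and 3 benefits this yields 6 rows, the same long format as the result table in this section.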

Store the prediction results and inspect them. They show each user's preference score for each benefit.

In [10]:
# Code Example

| | svc_mgmt_num | seed | prod_id | prod_nm | score | membership_vip_yn | mbr_card_gr_cd | non_vip_mbr_discount_median_yn | membership_cnt_ratio_median_yn | bas_fee_amt | ... | app_use_traffic_wavve_median_yn | bas_ofr_data_gb_qty_val | app_use_traffic_video | emb004 | app_use_cnt_etc_music | emb042 | emb003 | avg_display_resol | data_gift_recv_yn_bf_m0 | prefer_device_price |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 02199aa37a5b21ae804a98b7ea012886eb2063f42ff850... | 1 | 76 | culture | 0.465723 | 0 | 0 | 0 | 0 | 15400.0 | ... | 0 | 0.292969 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 409920.0 | 0 | 0 |
| 1 | 02199aa37a5b21ae804a98b7ea012886eb2063f42ff850... | 1 | 141 | data_plus | 0.896818 | 0 | 0 | 0 | 0 | 15400.0 | ... | 0 | 0.292969 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 409920.0 | 0 | 0 |
| 2 | 02199aa37a5b21ae804a98b7ea012886eb2063f42ff850... | 1 | 140 | global | 0.737327 | 0 | 0 | 0 | 0 | 15400.0 | ... | 0 | 0.292969 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 409920.0 | 0 | 0 |
| 3 | 02199aa37a5b21ae804a98b7ea012886eb2063f42ff850... | 1 | 62 | open_mbr | 0.368313 | 0 | 0 | 0 | 0 | 15400.0 | ... | 0 | 0.292969 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 409920.0 | 0 | 0 |
| 4 | 02199aa37a5b21ae804a98b7ea012886eb2063f42ff850... | 1 | 142 | vip_pick_movie | 0.356301 | 0 | 0 | 0 | 0 | 15400.0 | ... | 0 | 0.292969 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 409920.0 | 0 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 595 | 01e4a98418b0e45ea688054f83d71da3e5a13ce52b0842... | 1 | 141 | data_plus | 0.912852 | 0 | 0 | 0 | 0 | 825.0 | ... | 0 | 0.009766 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
| 596 | 01e4a98418b0e45ea688054f83d71da3e5a13ce52b0842... | 1 | 140 | global | 0.733663 | 0 | 0 | 0 | 0 | 825.0 | ... | 0 | 0.009766 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
| 597 | 01e4a98418b0e45ea688054f83d71da3e5a13ce52b0842... | 1 | 62 | open_mbr | 0.572281 | 0 | 0 | 0 | 0 | 825.0 | ... | 0 | 0.009766 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
| 598 | 01e4a98418b0e45ea688054f83d71da3e5a13ce52b0842... | 1 | 142 | vip_pick_movie | 0.371578 | 0 | 0 | 0 | 0 | 825.0 | ... | 0 | 0.009766 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0 | 0 |
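
One simple way to read these scores is to pick each user's highest-scoring benefit. The sketch below does this on a few rows copied from the output above (user IDs are truncated exactly as displayed). The helper `top_product_per_user` is a hypothetical name and is not part of the original notebook.

```python
def top_product_per_user(rows):
    """Return {svc_mgmt_num: prod_nm} for each user's highest-scoring row."""
    best = {}
    for row in rows:
        user = row["svc_mgmt_num"]
        # Keep the row with the largest score seen so far for this user.
        if user not in best or row["score"] > best[user]["score"]:
            best[user] = row
    return {user: row["prod_nm"] for user, row in best.items()}


# Rows copied from the displayed output; IDs are truncated as shown there.
sample_rows = [
    {"svc_mgmt_num": "02199aa37a5b...", "prod_nm": "culture", "score": 0.465723},
    {"svc_mgmt_num": "02199aa37a5b...", "prod_nm": "data_plus", "score": 0.896818},
    {"svc_mgmt_num": "02199aa37a5b...", "prod_nm": "global", "score": 0.737327},
    {"svc_mgmt_num": "01e4a98418b0...", "prod_nm": "data_plus", "score": 0.912852},
    {"svc_mgmt_num": "01e4a98418b0...", "prod_nm": "global", "score": 0.733663},
    {"svc_mgmt_num": "01e4a98418b0...", "prod_nm": "open_mbr", "score": 0.572281},
]
top = top_product_per_user(sample_rows)
```

For these sample rows, `data_plus` has the highest score for both users.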


Save the prediction results to a BigQuery table.

In [6]:
from pydatafabric.gcp import pandas_to_bq_table

pandas_to_bq_table(
    pd_df=df,
    dataset="temp_1d",
    table_name="example_model_prediction_result"
)

Check the prediction data saved in the BigQuery table.

In [7]:
from pydatafabric.gcp import import_bigquery_ipython_magic
import_bigquery_ipython_magic()

In [8]:
%%bq

SELECT *
FROM temp_1d.example_model_prediction_result

BigQuery execution took 4 seconds.


| | svc_mgmt_num | prod_id | prod_nm | score |
|---|---|---|---|---|
| 0 | 02077c121754329abf07f3caf7020bf21da24c66099f21... | 62 | open_mbr | 0.483434 |
| 1 | 0108b6e0762edc413402983ece4ba019eb52717354aad0... | 62 | open_mbr | 0.573481 |
| 2 | 01dafebfb858952a269d31d6a2e6c07a9b90c28c29d07f... | 62 | open_mbr | 0.397717 |
| 3 | 0185176451c6a72beec8186d92fa71605dbef15cd95fbf... | 62 | open_mbr | 0.584078 |
| 4 | 0148a2bf2ed821cff9c8d85dc05a5ed4c4d9d74edd7757... | 62 | open_mbr | 0.340511 |
| ... | ... | ... | ... | ... |
| 595 | 00f995124a7f7cc9c3cf67f4d21d19a602c0800d91beb8... | 62 | open_mbr | 0.416590 |
| 596 | 021a2ea804395d344879af74033f75a2a0052acd1c2086... | 62 | open_mbr | 0.416590 |
| 597 | 0148a2bf2ed821cff9c8d85dc05a5ed4c4d9d74edd7757... | 140 | global | 0.743495 |
| 598 | 01cd81d8e0e1e6c3fd8f87bb6af0c2e66f9bba2a4269f4... | 140 | global | 0.743495 |
