Question about the matrix multiplication in the P3 graph recommendation #1

Open
theeluwin opened this issue Feb 2, 2020 · 1 comment

Comments

@theeluwin

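Hello,
While reading through the code, I came across something I'm not sure about.

In section 4.4 of commerce, in the part that computes the P3 recommendations, there is the following code: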

```sql
-- Build the Item:Item graph from the Item:User x User:Item graphs
drop table if exists tmp_p3_iter1;

create table tmp_p3_iter1 as
select item_index1, item_index2, prob
from (
   select
      item_index1, item_index2, prob, row_number() over (partition by item_index1 order by prob desc) as rank
   from
      (
        select
            a.item_index as item_index1,
            b.item_index as item_index2,
            sum(a.prob * b.prob) as prob
        from tmp_p3_graph a
        inner join tmp_p3_graph b
        on a.user_index = b.user_index and a.item_index != b.item_index
        group by a.item_index, b.item_index
   ) a
) a
where rank <= 100;
```
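The query above is a sparse matrix product over shared users, followed by a per-item top-100 cut. As a rough illustration (not the repository's code), the same steps can be sketched in plain Python. Everything here is an assumption: the toy interaction list, the helper names `build_graph` and `item_item`, the `b_col` parameter, and the column semantics (taken from the question below), namely `prob` = P(user|item) and `user_prob` = P(item|user), both computed with uniform weights from degree counts.

```python
from collections import defaultdict
import heapq

# Hypothetical toy interactions: (user_index, item_index) pairs.
INTERACTIONS = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 3)]


def build_graph(interactions):
    """Toy stand-in for tmp_p3_graph (assumed semantics, uniform weights):
    prob = P(user | item) = 1/deg(item), user_prob = P(item | user) = 1/deg(user)."""
    item_deg = defaultdict(int)
    user_deg = defaultdict(int)
    for u, i in interactions:
        item_deg[i] += 1
        user_deg[u] += 1
    return [
        {"user_index": u, "item_index": i,
         "prob": 1 / item_deg[i], "user_prob": 1 / user_deg[u]}
        for u, i in interactions
    ]


def item_item(graph, b_col="prob", top_k=100):
    """Self-join on user_index with a.item_index != b.item_index,
    sum a.prob * b.<b_col> per (item1, item2), then keep top_k per item1."""
    by_user = defaultdict(list)
    for row in graph:
        by_user[row["user_index"]].append(row)

    scores = defaultdict(float)
    for rows in by_user.values():
        for a in rows:
            for b in rows:
                if a["item_index"] != b["item_index"]:
                    scores[(a["item_index"], b["item_index"])] += a["prob"] * b[b_col]

    # Equivalent of row_number() over (partition by item_index1 order by prob desc) <= top_k.
    per_item = defaultdict(list)
    for (i1, i2), p in scores.items():
        per_item[i1].append((p, i2))
    return {
        i1: [(i2, p) for p, i2 in heapq.nlargest(top_k, cands)]
        for i1, cands in per_item.items()
    }


if __name__ == "__main__":
    graph = build_graph(INTERACTIONS)
    for col in ("prob", "user_prob"):
        neighbours = item_item(graph, b_col=col)
        print(col, {i: round(sum(p for _, p in row), 3) for i, row in neighbours.items()})
```

Passing `b_col="prob"` mirrors the quoted query; `b_col="user_prob"` is the variant the question proposes.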

This is where `sum(a.prob * b.prob) as prob` is computed.
Since `a` plays the item → user role
and `b` plays the user → item role,
shouldn't `b` use `user_prob` instead?
My reasoning is that `a` supplies P(user|item) while `b` supplies P(item|user).
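Written as a formula, the reasoning above is that one item → user → item step should be the product below, where (as the question assumes, not confirmed elsewhere) $P(u \mid i)$ is what `a.prob` holds and $P(j \mid u)$ is what `user_prob` holds:

$$
P(j \mid i) = \sum_{u} P(u \mid i)\, P(j \mid u), \qquad j \neq i,
$$

whereas `sum(a.prob * b.prob)` computes $\sum_{u} P(u \mid i)\, P(u \mid j)$, which uses the item → user probability on both sides of the join.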
Thanks.

@theeluwin
Author

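With `sum(a.prob * b.prob)`, if you `GROUP BY` `item_index1`, i.e. take the sum of each row, the sums come out well above 1.
Since this is a random walk, the transition matrix has to be row-stochastic (even if a row does not sum to exactly 1, it must never exceed 1), and with `sum(a.prob * b.user_prob)` the row sums do not exceed 1.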
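The bound follows from swapping the order of summation, assuming (as in the question) that `prob` sums to 1 over users for each item and `user_prob` sums to 1 over items for each user:

$$
\sum_{j \neq i} \sum_{u} P(u \mid i)\, P(j \mid u) = \sum_{u} P(u \mid i) \sum_{j \neq i} P(j \mid u) \le \sum_{u} P(u \mid i) = 1,
$$

$$
\sum_{j \neq i} \sum_{u} P(u \mid i)\, P(u \mid j) = \sum_{u} P(u \mid i) \sum_{j \neq i} P(u \mid j).
$$

In the second case the inner sum $\sum_{j \neq i} P(u \mid j)$ is not normalized over $j$, so it can exceed 1: if user $u$ is the only user of two other items $j_1$ and $j_2$, then $P(u \mid j_1) = P(u \mid j_2) = 1$ and the inner sum is already at least 2.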

Here is the code I reimplemented to check this:

1. With `sum(a.prob * b.prob)`:
```sql
SELECT
    SUM(prob_ii) AS summed
FROM (
    SELECT
        graph_iu.item_index AS item_index1,
        graph_ui.item_index AS item_index2,
        SUM(graph_iu.prob * graph_ui.prob) AS prob_ii
    FROM tmp_p3_graph AS graph_iu
        INNER JOIN tmp_p3_graph AS graph_ui
        ON
            graph_iu.user_index = graph_ui.user_index
            AND graph_iu.item_index != graph_ui.item_index
    GROUP BY
        graph_iu.item_index,
        graph_ui.item_index
) AS multiplied
GROUP BY
    item_index1
ORDER BY
    summed DESC
LIMIT 10;
```
2. With `sum(a.prob * b.user_prob)`:
```sql
SELECT
    SUM(prob_ii) AS summed
FROM (
    SELECT
        graph_iu.item_index AS item_index1,
        graph_ui.item_index AS item_index2,
        SUM(graph_iu.prob * graph_ui.user_prob) AS prob_ii
    FROM tmp_p3_graph AS graph_iu
        INNER JOIN tmp_p3_graph AS graph_ui
        ON
            graph_iu.user_index = graph_ui.user_index
            AND graph_iu.item_index != graph_ui.item_index
    GROUP BY
        graph_iu.item_index,
        graph_ui.item_index
) AS multiplied
GROUP BY
    item_index1
ORDER BY
    summed DESC
LIMIT 10;
```
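To try both row-sum queries end to end without the real data, they can be run through Python's built-in `sqlite3` on a tiny in-memory table. This is a sketch under stated assumptions: the toy interactions and the `raw` table are hypothetical, `tmp_p3_graph` is rebuilt here with the column semantics the question assumes (`prob` = 1/deg(item) = P(user|item), `user_prob` = 1/deg(user) = P(item|user), uniform weights), and `row_sums`/`b_col` are illustrative names.

```python
import sqlite3

# Hypothetical toy interactions: (user_index, item_index) pairs.
INTERACTIONS = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 3)]

# The row-sum query from the issue, with graph_ui's column as a parameter
# (trailing semicolon dropped because sqlite3 runs it as a single statement).
ROW_SUMS_SQL = """
SELECT
    SUM(prob_ii) AS summed
FROM (
    SELECT
        graph_iu.item_index AS item_index1,
        graph_ui.item_index AS item_index2,
        SUM(graph_iu.prob * graph_ui.{b_col}) AS prob_ii
    FROM tmp_p3_graph AS graph_iu
        INNER JOIN tmp_p3_graph AS graph_ui
        ON
            graph_iu.user_index = graph_ui.user_index
            AND graph_iu.item_index != graph_ui.item_index
    GROUP BY
        graph_iu.item_index,
        graph_ui.item_index
) AS multiplied
GROUP BY
    item_index1
ORDER BY
    summed DESC
LIMIT 10
"""


def row_sums(b_col):
    """Return the top-10 row sums of SUM(graph_iu.prob * graph_ui.<b_col>)."""
    if b_col not in ("prob", "user_prob"):  # only the two columns compared in the issue
        raise ValueError(b_col)
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE raw (user_index INTEGER, item_index INTEGER)")
        con.executemany("INSERT INTO raw VALUES (?, ?)", INTERACTIONS)
        # Assumed semantics: prob = P(user | item), user_prob = P(item | user).
        con.execute(
            """
            CREATE TABLE tmp_p3_graph AS
            SELECT
                r.user_index AS user_index,
                r.item_index AS item_index,
                1.0 / (SELECT COUNT(*) FROM raw x WHERE x.item_index = r.item_index) AS prob,
                1.0 / (SELECT COUNT(*) FROM raw x WHERE x.user_index = r.user_index) AS user_prob
            FROM raw AS r
            """
        )
        return [row[0] for row in con.execute(ROW_SUMS_SQL.format(b_col=b_col))]
    finally:
        con.close()


if __name__ == "__main__":
    print("prob     :", row_sums("prob"))
    print("user_prob:", row_sums("user_prob"))
```

On this toy graph, user 0 links item 0 to two items that only user 0 has seen, which is the situation where the `prob * prob` rows go above 1 while the `prob * user_prob` rows stay at or below 1.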
