# Topic Model Tutorial
We want to obtain user and item features from users' past purchase information.
There are several ways to do this.
1. Memory-based
2. Model-based
3. Hybrid
4. Deep-Learning

I present the second approach (model-based), using a topic model.
Without going into the details of the topic model, the idea is to treat each user as a document and each purchased item as a word, and then cluster users by the topics inferred from their purchases.
For a detailed explanation of the topic model used here (Latent Dirichlet Allocation, LDA), please refer to the following paper.

[Latent Dirichlet Allocation](https://web.archive.org/web/20120501152722/http://jmlr.csail.mit.edu/papers/v3/blei03a.html)
Blei, David M.; Ng, Andrew Y.; Jordan, Michael I. (2003). Journal of Machine Learning Research, 3 (4–5), pp. 993–1022.
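
To make the "users as documents, items as words" idea concrete, here is a minimal sketch on toy data. The customer names and article ids below are made up for illustration and do not come from the dataset.

```python
from collections import defaultdict

# Hypothetical toy transactions: (customer_id, article_id) pairs.
transactions = [
    ("alice", "a001"),
    ("bob", "b002"),
    ("alice", "b002"),
    ("alice", "a001"),
]

# Each customer becomes one "document"; each purchased article is one "word".
# Repeated purchases are kept, just as repeated words are kept in a text.
documents = defaultdict(list)
for customer_id, article_id in transactions:
    documents[customer_id].append(article_id)

documents = dict(documents)
print(documents)
```

A topic model fitted on such documents groups articles that tend to be bought by the same customers, in the same way it groups words that tend to appear in the same texts.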

In [None]:
import numpy as np
import pandas as pd

In [None]:
from gensim.models import LdaModel # https://radimrehurek.com/gensim/models/ldamodel.html
from gensim.corpora.dictionary import Dictionary
# Note: in pyLDAvis >= 3.3 this module is named pyLDAvis.gensim_models
import pyLDAvis.gensim

In [None]:
# Read article_id as a string so that any leading zeros are preserved
# and the ids match between the two tables
train_df = pd.read_csv('../input/h-and-m-personalized-fashion-recommendations/transactions_train.csv',
                       dtype={'article_id': 'object'})
articles_df = pd.read_csv('../input/h-and-m-personalized-fashion-recommendations/articles.csv',
                          dtype={'article_id': 'object'})

In [None]:
# Build one "document" per customer: the list of article_ids they purchased.
# sort=False keeps customers in order of first appearance in train_df.
purchases = train_df.groupby('customer_id', sort=False)['article_id'].agg(list)
customer_id_list = purchases.index.tolist()
raw_corpus = purchases.tolist()
del purchases

In [None]:
# Create a dictionary that maps each word (here, an article_id) to an integer id
dictionary = Dictionary(raw_corpus)
# Convert each customer's purchase list to the BoW format that LdaModel reads
corpus = [dictionary.doc2bow(purchased_articles) for purchased_articles in raw_corpus]
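
The BoW format represents each document as a list of `(token_id, count)` pairs. The following pure-Python sketch mimics that representation on made-up article ids. It is only an illustration of the idea, not gensim's implementation, and the integer ids gensim assigns may differ.

```python
from collections import Counter

# Hypothetical purchase lists for two customers.
raw_docs = [["a001", "b002", "a001"], ["b002", "c003"]]

# Assign an integer id to every distinct token, in order of first appearance.
token2id = {}
for doc in raw_docs:
    for token in doc:
        token2id.setdefault(token, len(token2id))


def to_bow(doc):
    """Return the document as sorted (token_id, count) pairs."""
    counts = Counter(token2id[token] for token in doc)
    return sorted(counts.items())


bow_corpus = [to_bow(doc) for doc in raw_docs]
print(token2id)    # {'a001': 0, 'b002': 1, 'c003': 2}
print(bow_corpus)  # [[(0, 2), (1, 1)], [(1, 1), (2, 1)]]
```

The order of purchases is discarded in this format; only how often each article was bought by a customer is kept.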

In [None]:
num_topics = 15
lda = LdaModel(corpus=corpus, num_topics=num_topics, id2word=dictionary, random_state=0)

In [None]:
lda.save('lda_model.pickle')
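
Once the model is trained, a customer's topic distribution can serve as the user feature vector. gensim's `lda.get_document_topics(bow)` returns a sparse list of `(topic_id, probability)` pairs and by default omits topics with very small probability. The sketch below shows one way to turn such a list into a fixed-length vector. The helper `to_dense` and the sample values are hypothetical and only illustrate the conversion.

```python
def to_dense(topic_probs, num_topics):
    """Convert sparse (topic_id, probability) pairs to a fixed-length list."""
    vec = [0.0] * num_topics
    for topic_id, prob in topic_probs:
        vec[topic_id] = float(prob)
    return vec


# Made-up sparse output for one customer (not a real model result).
sparse_topics = [(2, 0.7), (5, 0.25)]
features = to_dense(sparse_topics, num_topics=8)
print(features)  # [0.0, 0.0, 0.7, 0.0, 0.0, 0.25, 0.0, 0.0]
```

With the objects defined in this notebook, `to_dense(lda.get_document_topics(corpus[i]), num_topics)` would give the features for `customer_id_list[i]`. For item features, one option is to take the columns of `lda.get_topics()`, a matrix of shape `(num_topics, vocabulary size)`.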

In [None]:
# Visualization
# sort_topics=False keeps the topic order the same as in the LDA model
vis = pyLDAvis.gensim.prepare(lda, corpus, dictionary, n_jobs=1, sort_topics=False)
pyLDAvis.save_html(vis, 'H&M_topic.html')
pyLDAvis.display(vis)

In [None]:
index_name_dict = dict(zip(articles_df['article_id'], articles_df['index_name']))
garment_group_name_dict = dict(zip(articles_df['article_id'], articles_df['garment_group_name']))
color_name_dict = dict(zip(articles_df['article_id'], articles_df['colour_group_name']))

In [None]:
# For each topic, print the 30 most probable articles described by their metadata
for topic in range(num_topics):
    print(topic)
    for article_id, prob in lda.show_topic(topic, topn=30):
        print(f'{index_name_dict[article_id]} : {garment_group_name_dict[article_id]} : {color_name_dict[article_id]}')