In this project, we train a BERTopic model to identify topics in e-commerce clothing reviews. We use the E-commerce Clothing Reviews dataset, available on Kaggle. It contains real commercial data from an e-commerce website, with reviews written by customers. Fields include the clothing ID, the customer's age, the review title, the review text, and so on.
The accompanying article, Topic Modeling for E-commerce Reviews using BERTopic, explains the approach in detail.
- data/: contains all the data
  - raw_data/: contains the original data
  - processed_data/: contains the processed data
- model/: contains the artifact of the BERTopic model
- output/: contains the plots generated with the BERTopic model
- src/: contains the following scripts
  - train.py: Python script to train the BERTopic model and save the artifact
  - mlflow_log.py: Python script to track the experiments of the ML model
  - topic_model.py: Python script to create the BERTopic model
  - process_data.py: Python script to clean and filter the data
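As a rough illustration of how the cleaning and training steps fit together, here is a minimal sketch. It is not the code in `src/`: the function names, the `Review Text` column name, the `min_words` filter, and the file paths are all assumptions made for this example. Only `BERTopic(...)`, `fit_transform`, and `save` are actual BERTopic calls.

```python
import csv


def load_reviews(csv_path, text_column="Review Text", min_words=3):
    """Read the review column and drop empty or very short reviews.

    The column name and the word threshold are assumptions for this sketch.
    """
    docs = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            text = (row.get(text_column) or "").strip()
            if len(text.split()) >= min_words:
                docs.append(text)
    return docs


def train_topic_model(docs):
    """Fit a BERTopic model on the documents (requires `pip install bertopic`)."""
    from bertopic import BERTopic  # imported lazily because it is a heavy dependency

    topic_model = BERTopic(language="english")
    topics, _probs = topic_model.fit_transform(docs)
    return topic_model, topics


def run_pipeline(csv_path, model_path):
    """Clean the reviews, train the model, and save the artifact."""
    docs = load_reviews(csv_path)
    topic_model, _topics = train_topic_model(docs)
    topic_model.save(model_path)
    return docs, topic_model
```

With hypothetical paths, a call such as `docs, topic_model = run_pipeline("data/raw_data/reviews.csv", "model/bertopic_model")` would produce the `docs` and `topic_model` objects used by the visualization calls that follow.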
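topic_model.visualize_topics()  # intertopic distance map
topic_model.visualize_barchart(top_n_topics=10)  # top words for the first 10 topics
topic_model.visualize_documents(docs)  # documents and their topics in 2D
topic_model.visualize_hierarchy()  # hierarchical clustering of the topics
topic_model.visualize_heatmap(n_clusters=10, width=1000, height=1000)  # topic similarity matrix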
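Each of these calls returns a Plotly figure, so the plots can be written to the output/ folder. The sketch below shows one way to do that, assuming `topic_model` is a fitted BERTopic model and `docs` is the list of reviews it was trained on. The helper name `save_plots` and the HTML file names are hypothetical and not taken from the project scripts.

```python
from pathlib import Path


def save_plots(topic_model, docs, output_dir="output"):
    """Write each BERTopic visualization to an HTML file in output_dir."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    # Same calls as listed above; each returns a Plotly figure.
    figures = {
        "topics": topic_model.visualize_topics(),
        "barchart": topic_model.visualize_barchart(top_n_topics=10),
        "documents": topic_model.visualize_documents(docs),
        "hierarchy": topic_model.visualize_hierarchy(),
        "heatmap": topic_model.visualize_heatmap(n_clusters=10, width=1000, height=1000),
    }

    paths = []
    for name, fig in figures.items():
        path = out / f"{name}.html"
        fig.write_html(str(path))  # Plotly's standard HTML export
        paths.append(path)
    return paths
```

HTML export keeps the plots interactive. Static images would need an extra Plotly export dependency, which is why this sketch does not use them.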