nwihardjo/yelp_challenge

Yelp Challenge

This repo contains the code for the HKUST COMP4332 projects, which use data from the Yelp Challenge. Each project folder provides the model in model.py or model.ipynb, along with the training and validation data used.

The report.pdf in each project folder discusses the model and the data features used, and explains the implementation and the final hyperparameters.

Projects

Project 1: Sentiment Analysis, predicting the rating from the review provided by the user, using mainly the review text. The final model uses a bidirectional GRU with time-distributed layers and achieves 70.25% validation accuracy.

Project 2: Link Prediction using DeepWalk, predicting the presence of a relationship between vertices using a DFS-like approach. The final model is evaluated with the AUC score and achieves 95.87%.
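
As a rough illustration of the walk-generation step that DeepWalk-style methods rely on, the sketch below produces truncated uniform random walks over a toy graph. The function name `random_walks`, its parameters, and the toy graph are hypothetical and are not taken from this repo; in DeepWalk the resulting walks are then typically fed to a skip-gram model to learn vertex embeddings, which is not shown here.

```python
import random


def random_walks(adjacency, walks_per_node=2, walk_length=5, seed=0):
    """Generate truncated uniform random walks starting from every node.

    adjacency: dict mapping each node to a list of its neighbours.
    """
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit the start nodes in a new random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


# Hypothetical toy undirected graph, for illustration only.
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
walks = random_walks(toy_graph)
```

Each walk can be treated like a sentence of vertex names, so vertices that co-occur in walks end up with similar embeddings, and a candidate edge can then be scored from the embeddings of its two endpoints.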

Project 3: Recommendation Prediction based on a Wide and Deep Learning implementation with some feature engineering. RMSE is used as the metric, and the final model achieves a value of 1.0293.
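
For reference, the two less familiar metrics named above, AUC and RMSE, can be sketched in plain Python as follows. This is a minimal illustration under the standard definitions, not the evaluation code used in the projects; the function names and the sample values are made up.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted ratings."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def auc(labels, scores):
    """Area under the ROC curve, computed by pairwise comparison.

    Equals the probability that a randomly chosen positive example
    receives a higher score than a randomly chosen negative one,
    with ties counted as half.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Made-up values, for illustration only.
example_rmse = rmse([1, 2], [2, 4])
example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

A lower RMSE means the predicted ratings are closer to the true ones, while an AUC closer to 1 means existing links are consistently scored above non-existing ones. The pairwise AUC loop is quadratic in the number of examples, so a library implementation is preferable for real datasets.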

Training Environment

Most of the model training is done on Google Colab because of its TPU support. However, because grid search takes significantly longer to run and Google Colab limits its runtime, the Intel AI Cluster is used for it instead.

Setup

The easiest way to run locally is to create a Conda environment for Python 3.6 and install the required libraries in that environment:

  • keras
  • nltk
  • tensorflow
  • scikit-learn (imported as sklearn)
  • numpy
  • pandas
  • tqdm
  • node2vec
  • networkx
  • gensim
  • and other basic libraries
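
Once the environment is created, a quick way to confirm that the libraries listed above are importable is a small check like the one below. This is only a sketch: the helper name `missing_modules` is hypothetical, and the module names are assumed to match the list above (with scikit-learn checked under its import name `sklearn`).

```python
import importlib.util

# Import names for the libraries listed above (assumed, not taken from the repo).
REQUIRED_MODULES = [
    "keras",
    "nltk",
    "tensorflow",
    "sklearn",
    "numpy",
    "pandas",
    "tqdm",
    "node2vec",
    "networkx",
    "gensim",
]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing libraries:", ", ".join(missing))
    else:
        print("All required libraries are available.")
```

Running the script inside the activated Conda environment lists anything that still needs to be installed, without importing the heavier libraries themselves.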

Contributor