This project covers natural language processing (NLP) and image classification on the Yelp dataset, which can be downloaded from https://www.yelp.com/dataset.
NLP requirements
- Python 3.5
- spaCy
- gensim
- pyLDAvis
- word2vec
- Bokeh
- t-SNE
Image classification requirements
- Python 3.5
- TensorFlow
- OpenCV
- 12 GB RAM
Modern_NLP.ipynb walks through the following topics (best viewed on nbviewer):
- A tour of the dataset
- Introduction to text processing with spaCy
- Automatic phrase modeling
- Topic modeling with LDA
- Visualizing topic models with pyLDAvis
- Word vector models with word2vec
- Visualizing word2vec with t-SNE
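To give a feel for the phrase modeling topic, here is a minimal pure-Python sketch of bigram phrase detection. It is not gensim's implementation and not the notebook's code. It only illustrates the general idea of scoring adjacent word pairs by how much more often they occur together than their individual counts would suggest. The function names `find_phrases` and `join_phrases`, the toy reviews, and the parameter values are all assumptions made for illustration.

```python
from collections import Counter


def find_phrases(sentences, min_count=2, threshold=1.0):
    """Return bigrams whose co-occurrence score exceeds `threshold`.

    score(a, b) = (count(a b) - min_count) * vocab_size / (count(a) * count(b))
    This is a simplified scoring rule, used here only as an illustration.
    """
    unigrams = Counter(tok for sent in sentences for tok in sent)
    bigrams = Counter(pair for sent in sentences for pair in zip(sent, sent[1:]))
    vocab_size = len(unigrams)

    phrases = {}
    for (a, b), n_ab in bigrams.items():
        score = (n_ab - min_count) * vocab_size / (unigrams[a] * unigrams[b])
        if score > threshold:
            phrases[(a, b)] = score
    return phrases


def join_phrases(sentence, phrases):
    """Merge detected bigrams into single tokens, e.g. ice cream -> ice_cream."""
    out, i = [], 0
    while i < len(sentence):
        if i + 1 < len(sentence) and (sentence[i], sentence[i + 1]) in phrases:
            out.append(sentence[i] + "_" + sentence[i + 1])
            i += 2
        else:
            out.append(sentence[i])
            i += 1
    return out


# Toy tokenized reviews (hypothetical data, not taken from the Yelp dataset).
reviews = [
    ["the", "ice", "cream", "was", "great"],
    ["great", "ice", "cream", "and", "coffee"],
    ["ice", "cream", "again"],
    ["the", "coffee", "was", "fine"],
]
phrases = find_phrases(reviews, min_count=1, threshold=1.0)
merged = [join_phrases(sent, phrases) for sent in reviews]
```

On this toy input only the pair ("ice", "cream") passes the threshold, so the first review becomes ["the", "ice_cream", "was", "great"]. On real review text, tokenization would normally be done with spaCy first, and the thresholds would need tuning.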
To run the image classification pipeline:
- Install the project requirements
- Create a folder named yelpData and move the extracted Yelp data into it. Move the photos to the 'yelpData/yelpPhotos' directory.
- Run photo_process.py and enter the target size for resizing, e.g. 64 for 64 x 64 or 32 for 32 x 32.
- Run photo_info.py to get information about the photos
- Run classifier.py to start training the model (this may take 6 to 10 hours without a GPU)
- Run predict.py to predict the label of an image
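Before starting a long training run it can help to confirm that the folder layout and the size entry are what the steps above expect. The following is a small sketch under stated assumptions: `parse_size` and `list_photos` are hypothetical helpers, not functions from photo_process.py, and the list of file extensions is a guess about how the photos are stored.

```python
from pathlib import Path
import tempfile

# Folder layout described in the steps above.
PHOTO_DIR = Path("yelpData") / "yelpPhotos"


def parse_size(text):
    """Turn a user entry such as '64' into a square (width, height) tuple."""
    side = int(text.strip())
    if side <= 0:
        raise ValueError("size must be a positive integer")
    return (side, side)


def list_photos(root, extensions=(".jpg", ".jpeg", ".png")):
    """Return sorted photo paths under <root>/yelpData/yelpPhotos.

    The extension list is an assumption; adjust it to match the actual files.
    """
    photo_dir = Path(root) / PHOTO_DIR
    if not photo_dir.is_dir():
        raise FileNotFoundError(f"expected photo folder at {photo_dir}")
    return sorted(p for p in photo_dir.iterdir() if p.suffix.lower() in extensions)


# Self-check in a temporary directory, so no real Yelp data is needed.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / PHOTO_DIR).mkdir(parents=True)
    (Path(tmp) / PHOTO_DIR / "example.jpg").touch()
    (Path(tmp) / PHOTO_DIR / "notes.txt").touch()
    found = [p.name for p in list_photos(tmp)]

size = parse_size(" 64 ")
```

Calling `list_photos(".")` from the project root raises a clear error if the photos were extracted to a different location, which is cheaper to discover here than several hours into training.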
The corresponding commands:
pip install -r requirements.txt
python ./photoAnalysis/photo_process.py
python ./photoAnalysis/photo_info.py
python ./classifier/classifier.py
python ./classifier/predict.py
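The four script commands above can also be chained so that a failure in one step stops the rest. This is only a sketch: `build_commands` and `run_pipeline` are hypothetical names, the script paths are copied from the commands above, and the `pip install` step is left out on the assumption that requirements are installed once beforehand.

```python
import subprocess
import sys

# Script paths as listed in the commands above, relative to the project root.
SCRIPTS = [
    "./photoAnalysis/photo_process.py",
    "./photoAnalysis/photo_info.py",
    "./classifier/classifier.py",
    "./classifier/predict.py",
]


def build_commands(python=sys.executable):
    """Build one argv list per script, using the given Python interpreter."""
    return [[python, script] for script in SCRIPTS]


def run_pipeline(commands, run=subprocess.run):
    """Run commands in order and stop at the first non-zero exit code.

    Returns the list of commands that completed successfully.
    `run` is injectable so the logic can be exercised without real scripts.
    """
    completed = []
    for cmd in commands:
        result = run(cmd)
        if result.returncode != 0:
            break
        completed.append(cmd)
    return completed
```

Running `run_pipeline(build_commands())` from the project root executes the scripts in sequence. Standard input is inherited, so photo_process.py can still prompt for the resize value as usual.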