THANOS

THANOS is a sequence model that captures the two-level hierarchy within a document: word level and sentence level. The architecture consists of a Tree-LSTM based word-level encoder that produces an embedding for each sentence in the dataset, a GRU based sentence-level encoder, and a sentence-level attention layer. A limitation of the Tree-LSTM is that it does not directly support batched computation, so SPINN (https://arxiv.org/pdf/1603.06021.pdf) is used to implement the word-level Tree-LSTM that builds sentence vectors from the word embeddings.
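As a rough illustration of the sentence-level half of this design, here is a minimal PyTorch sketch of a bidirectional GRU encoder with sentence-level attention. It is not the repository's code: the dimensions, the class name, and the specific attention form (a learned context vector, as in hierarchical attention networks) are assumptions.

    import torch
    import torch.nn as nn

    class SentenceEncoderWithAttention(nn.Module):
        def __init__(self, sent_dim=300, hidden_dim=100):
            super().__init__()
            # Bidirectional GRU over the sequence of sentence vectors.
            self.gru = nn.GRU(sent_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
            # Attention scores each sentence against a learned context vector.
            self.proj = nn.Linear(2 * hidden_dim, 2 * hidden_dim)
            self.context = nn.Parameter(torch.randn(2 * hidden_dim))

        def forward(self, sent_vecs):
            # sent_vecs: (batch, num_sentences, sent_dim), one vector per
            # sentence, standing in for the Tree-LSTM word-level output.
            h, _ = self.gru(sent_vecs)                  # (B, S, 2H)
            u = torch.tanh(self.proj(h))                # (B, S, 2H)
            alpha = torch.softmax(u @ self.context, 1)  # (B, S) weights
            return (alpha.unsqueeze(-1) * h).sum(1)     # (B, 2H) doc vector

    doc = SentenceEncoderWithAttention()(torch.randn(4, 10, 300))
    print(doc.shape)  # torch.Size([4, 200])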

Data explanation

  1. The Yelp review dataset (raw_150k.csv), consisting of 150k reviews, is in Data/Yelp Raw Data.
  2. The text review column (Data/Input Data) of raw_150k.csv is used to create parse trees (Data/Binary Tree Output) with the jar file in Data/Binary Tree Jar File. The jar file is built from the Stanford tree parser together with some NLP preprocessing steps.
  3. The binary tree output is used to create 3 pickle files (Data/Pickle File) named yelp_unk150k.pkl, yelp_parsedtree150k.pkl and vocab.pkl.
    • yelp_parsedtree150k.pkl consists of the parsed trees for the reviews.
    • yelp_unk150k.pkl consists of a list of tokens for each respective tree. The token list contains the words of the tree in sequential order; words occurring fewer than 5 times in the dataset vocabulary are replaced by the 'unk' token (a sketch of this step follows this list).
      • For example: ( ( ( ( ( i ( ( expected ( a lot ) ) ( from ( this movie ) ) ) ) , ) and ) ( it ( did deliver ) ) ) . )
      • The token list for the above tree is: ['i', 'expected', 'a', 'lot', 'from', 'this', 'movie', ',', 'and', 'it', 'did', 'deliver']
    • vocab.pkl consists of all the unique words in the token lists and is used to build the word dictionary for the dataset.
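The following minimal sketch (assumed, not taken from the repository) shows how a token list like the one above can be read off a binary parse-tree string, and how rare words can be mapped to 'unk'. The threshold of 5 follows the description above; note that this simple recovery also keeps the final '.' as a token.

    from collections import Counter

    def tree_to_tokens(tree_str):
        # Every whitespace-separated item that is not a parenthesis is a
        # token, read off in left-to-right (sequential) order.
        return [tok for tok in tree_str.split() if tok not in ('(', ')')]

    def replace_rare(token_lists, min_count=5):
        # Count word frequencies over the whole dataset, then map words
        # seen fewer than min_count times to 'unk'.
        counts = Counter(tok for toks in token_lists for tok in toks)
        return [[tok if counts[tok] >= min_count else 'unk' for tok in toks]
                for toks in token_lists]

    tree = "( ( ( ( ( i ( ( expected ( a lot ) ) ( from ( this movie ) ) ) ) , ) and ) ( it ( did deliver ) ) ) . )"
    print(tree_to_tokens(tree))
    # ['i', 'expected', 'a', 'lot', 'from', 'this', 'movie', ',', 'and', 'it', 'did', 'deliver', '.']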

After preparing the tree, token, and vocab files, we are ready to feed this data to the model for training. The steps are below:

  1. The Jupyter notebook creating_train_test_dev_files.ipynb is used to create train, dev, and test pickle files from yelp_parsedtree150k.pkl and yelp_unk150k.pkl (a sketch of this split follows this list).

  2. The Jupyter notebook run_model.ipynb contains the commands to create the vocab json file with build_vocab.py and to train the model with train.py. The exact commands are listed under "How to execute the model" below.
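The split in step 1 might look like the following sketch. The 80/10/10 ratio, the random seed, and the output file names are illustrative assumptions, not taken from the notebook.

    import pickle
    import random

    # Load the parsed trees and their token lists (paths from the repo layout).
    with open('Data/Pickle File/yelp_parsedtree150k.pkl', 'rb') as f:
        trees = pickle.load(f)
    with open('Data/Pickle File/yelp_unk150k.pkl', 'rb') as f:
        tokens = pickle.load(f)

    # Keep each tree paired with its token list while shuffling.
    pairs = list(zip(trees, tokens))
    random.seed(0)  # assumed seed, for a reproducible split
    random.shuffle(pairs)

    # Assumed 80/10/10 train/dev/test split.
    n = len(pairs)
    splits = {'train': pairs[:int(0.8 * n)],
              'dev': pairs[int(0.8 * n):int(0.9 * n)],
              'test': pairs[int(0.9 * n):]}

    for name, subset in splits.items():
        with open(f'Data/{name}.pkl', 'wb') as f:  # hypothetical file names
            pickle.dump(subset, f)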

How to execute the model

Open Jupyter and run all the cells of creating_train_test_dev_files.ipynb. Then, from a notebook cell, run:

%run build_vocab.py --data_dir "Data/Pickle File"
%run train.py --data_dir Data --model_dir experiments/base_model

The quotes around Data/Pickle File are needed because the path contains a space.
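If you prefer a terminal to Jupyter, the equivalent invocations should be (run from the repository root):

python build_vocab.py --data_dir "Data/Pickle File"
python train.py --data_dir Data --model_dir experiments/base_model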
