
computational-linguistics-homework

This is a homework assignment for a computational linguistics course.

REQUIREMENTS

  1. TensorFlow 1.12 or above
  2. Python 3.5 or above

HOW TO RUN

1 Clone

$ git clone git@github.com:huan/computational-linguistics-homework.git
Cloning into 'computational-linguistics-homework'...
remote: Enumerating objects: 64, done.
remote: Counting objects: 100% (64/64), done.
remote: Compressing objects: 100% (34/34), done.
remote: Total 64 (delta 23), reused 56 (delta 19), pack-reused 0
Receiving objects: 100% (64/64), 14.10 KiB | 335.00 KiB/s, done.
Resolving deltas: 100% (23/23), done.

2 Download

$ make download
./scripts/download.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   141  100   141    0     0     61      0  0:00:02  0:00:02 --:--:--    61
100 8793k  100 8793k    0     0   313k      0  0:00:28  0:00:28 --:--:--  418k
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   157  100   157    0     0     72      0  0:00:02  0:00:02 --:--:--    72
100 21974  100 21974    0     0   4700      0  0:00:04  0:00:04 --:--:-- 12221

3 Preprocess

$ make preprocess
PYTHONPATH=. python3 bin/preprocess.py > data/corpus_preprocessed.txt
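
The Makefile only pipes the output of bin/preprocess.py into a text file. The snippet below is a minimal sketch of what such a preprocessing step typically does for a whitespace-segmented corpus; the input path data/corpus.txt and the cleaning rules are assumptions for illustration, not the repository's actual code.

# Minimal preprocessing sketch (assumed, not the repository's actual bin/preprocess.py):
# read a whitespace-segmented corpus, normalize whitespace, and emit one sentence per line.

RAW_CORPUS = 'data/corpus.txt'  # assumed input path

def preprocess(path):
    with open(path, encoding='utf-8') as f:
        for line in f:
            tokens = line.split()          # collapse runs of whitespace
            if tokens:
                yield ' '.join(tokens)     # one cleaned, space-segmented sentence per line

if __name__ == '__main__':
    for sentence in preprocess(RAW_CORPUS):
        print(sentence)                    # stdout is redirected to data/corpus_preprocessed.txt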

4 Train

The loss will decrease from 0.6 to 0.08 after 2 epochs.

If you increase the number of epochs from 2 to 5, the loss will drop further, from 0.08 to around 0.05.

$ make train
PYTHONPATH=. python3 bin/train.py
No checkpoint found.
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
embedding (Embedding)        (None, 120, 64)           298944    
_________________________________________________________________
bidirectional (Bidirectional (None, 120, 128)          66048     
_________________________________________________________________
dropout (Dropout)            (None, 120, 128)          0         
_________________________________________________________________
time_distributed (TimeDistri (None, 120, 1)            129       
=================================================================
Total params: 365,121
Trainable params: 365,121
Non-trainable params: 0
_________________________________________________________________
Epoch 1/2
45478/45478 [==============================] - 55s 1ms/step - loss: 0.1920
Epoch 2/2
45478/45478 [==============================] - 54s 1ms/step - loss: 0.0803
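
The summary above pins down the architecture fairly precisely: a 64-dimensional character embedding over a vocabulary of roughly 4,700 symbols, a bidirectional LSTM with 64 units per direction, dropout, and a time-distributed sigmoid unit per position. The following sketch shows how such a model could be built with tf.keras; the vocabulary size and sequence length are read off the summary, while the dropout rate, optimizer, and loss are guesses rather than values copied from bin/train.py.

# Sketch of a model matching the printed summary. Only the shapes and parameter
# counts are taken from the summary; dropout rate, optimizer, and loss are assumptions.
import tensorflow as tf

VOCAB_SIZE = 4671   # 298,944 embedding params / 64 dims
MAX_LEN = 120       # sequence length shown in the summary
EMBED_DIM = 64
LSTM_UNITS = 64     # 64 per direction -> 128-dimensional outputs

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM, input_length=MAX_LEN),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(LSTM_UNITS, return_sequences=True)),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(1, activation='sigmoid')),
])
model.compile(optimizer='adam', loss='binary_crossentropy')
model.summary()     # reproduces the layer/parameter layout shown above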

5 Inference

$ make inference
PYTHONPATH=. python3 bin/inference.py
本报 讯 春节 临近 , 全国 各地 积极 开展 走访 慰问 困难 企业 和 特困 职工 的 送 温暖 活动 ,...
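
bin/inference.py itself is not shown here. As a rough sketch of how the per-position sigmoid outputs of a model like the one above could be turned into segmented text, one might threshold each position's probability as a word-boundary flag; the helper below, including the char2id mapping, the threshold, and the function name, is an illustrative assumption rather than the repository's actual code.

# Illustrative sketch only: turn per-character boundary probabilities into
# space-segmented text. `model` and `char2id` are assumed to exist already.
import numpy as np

def segment(sentence, model, char2id, max_len=120, threshold=0.5):
    ids = [char2id.get(ch, 0) for ch in sentence[:max_len]]
    x = np.zeros((1, max_len), dtype='int32')
    x[0, :len(ids)] = ids
    probs = model.predict(x)[0, :len(ids), 0]   # one boundary probability per character
    out = []
    for ch, p in zip(sentence, probs):
        out.append(ch)
        if p > threshold:                        # treat high probability as "a word ends here"
            out.append(' ')
    return ''.join(out).strip()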

AUTHOR

@zixia Huan LI <zixia@zixia.net>


COPYRIGHT & LICENSE

  • Code & Docs © 2018 - now Huan LI <zixia@zixia.net>
  • Code released under the Apache-2.0 License
  • Docs released under Creative Commons