This is an ALBERT (A Lite BERT) reimplementation, modified from google-research/bert.
Three additions:
- LAMB optimizer -- optimization_albert.py (see the update-rule sketch below)
- Factorized embedding parameterization -- modeling_albert.py (embedding_lookup_factorized; see the sketch below)
- Cross-layer parameter sharing -- please refer to modeling_albert.py (also covered in the sketch below)
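For orientation, here is a minimal NumPy sketch of a single LAMB update (You et al., 2019): bias-corrected Adam moments plus a layerwise trust ratio that rescales the step. This is an illustration only, not the code in optimization_albert.py, and the hyperparameter names are generic.

```python
import numpy as np

def lamb_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-6, wd=0.01):
    # Adam-style first/second moment estimates with bias correction.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    # Raw update direction, including decoupled weight decay.
    r = m_hat / (np.sqrt(v_hat) + eps) + wd * w
    # Layerwise trust ratio scales the step to the parameter's own norm.
    w_norm, r_norm = np.linalg.norm(w), np.linalg.norm(r)
    trust = w_norm / r_norm if w_norm > 0 and r_norm > 0 else 1.0
    return w - lr * trust * r, m, v

# Usage: keep (m, v) per parameter tensor and increment t each step.
w = np.ones(4, dtype=np.float32)
m, v = np.zeros_like(w), np.zeros_like(w)
w, m, v = lamb_step(w, g=np.full(4, 0.1, dtype=np.float32), m=m, v=v, t=1)
```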
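And a minimal NumPy sketch of the factorized embedding lookup plus the effect of cross-layer sharing on the parameter count. Shapes and names here are illustrative, not the repo's; the real TensorFlow implementation lives in modeling_albert.py (embedding_lookup_factorized).

```python
import numpy as np

V, E, H, L = 30000, 128, 768, 12  # vocab, embedding dim, hidden dim, layers

rng = np.random.default_rng(0)
token_table = rng.normal(size=(V, E)).astype(np.float32)  # V x E (small)
projection = rng.normal(size=(E, H)).astype(np.float32)   # E x H

def embedding_lookup_factorized_sketch(input_ids):
    # Look up the small V x E table, then project up to the hidden size,
    # so the embedding costs V*E + E*H parameters instead of V*H.
    return token_table[input_ids] @ projection

ids = np.array([[5, 17, 42, 3]])
print(embedding_lookup_factorized_sketch(ids).shape)     # (1, 4, 768)
print("unfactorized embedding params:", V * H)           # 23,040,000
print("factorized embedding params:  ", V * E + E * H)   # 3,938,304

# Cross-layer sharing reuses one set of transformer weights L times,
# so the encoder's parameter count is independent of the layer number.
per_layer = 12 * H * H  # rough weight count of one transformer layer
print("without sharing:", L * per_layer)
print("with sharing:   ", per_layer)
```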
To do:
Sentence Order Prediction (SOP) is not used yet; the pretraining objective is still Next Sentence Prediction (NSP). A sketch of SOP example construction follows below.
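As a reference for that to-do item, here is a minimal sketch of how SOP examples are typically built: the positive case keeps two consecutive segments in document order, and the negative case swaps the same two segments. This contrasts with NSP, whose negatives pair a segment with one from a random other document. The function name is hypothetical, not from this repo.

```python
import random

def make_sop_example(seg_a, seg_b, rng=random):
    # seg_a and seg_b are consecutive segments from the same document.
    if rng.random() < 0.5:
        return seg_a, seg_b, 0  # label 0: original order
    return seg_b, seg_a, 1      # label 1: swapped order
```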
[2019/10/01] Training now works! You first need to generate a BPE vocab.txt (please refer to subword-nmt) and modify the tokenization code to collect your subword units.
First you need to download your data (Wikipedia or BookCorpus), then:
- use the subword-nmt GitHub repo to generate code.bpe,
- use code.bpe to generate vocab.txt; then you can train (see the sketch below).
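A minimal Python sketch of those two steps using subword-nmt's API. The file names are placeholders, and the exact vocab.txt format this repo expects (special tokens, ordering) may differ, so adjust to match your config.

```python
import codecs
from collections import Counter

from subword_nmt import learn_bpe, apply_bpe

# 1) Learn a BPE code table from the raw corpus.
with codecs.open("corpus.txt", encoding="utf-8") as infile, \
     codecs.open("code.bpe", "w", encoding="utf-8") as outfile:
    learn_bpe.learn_bpe(infile, outfile, 32000)

# 2) Segment the corpus with the learned codes and count subword units.
with codecs.open("code.bpe", encoding="utf-8") as codes:
    bpe = apply_bpe.BPE(codes)

counts = Counter()
with codecs.open("corpus.txt", encoding="utf-8") as infile:
    for line in infile:
        counts.update(bpe.process_line(line.strip()).split())

# 3) Write one subword unit per line as vocab.txt (add any special
#    tokens your config expects, e.g. [CLS]/[SEP]/[MASK]).
with codecs.open("vocab.txt", "w", encoding="utf-8") as vocab:
    for token, _ in counts.most_common():
        vocab.write(token + "\n")
```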
[Finished]: Testing confirms that the total parameter count does not increase when the layer number increases.
[Finished]: Training works with the command below.
[To do]: You still need to collect the data.
`python run_albert_pretraining.py --input_file {training data} --bert_config_file config.json --output_dir {your path}`
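For reference, a hypothetical config.json in the google-research/bert style, with an added embedding_size field for the factorization. The exact keys this repo reads may differ, so check modeling_albert.py before using it.

```python
# Hypothetical config in the google-research/bert style, plus an
# "embedding_size" field for the factorized embedding (E < H). Verify
# the field names against what modeling_albert.py actually reads.
import json

config = {
    "vocab_size": 32000,            # must match your generated vocab.txt
    "embedding_size": 128,          # factorized embedding dimension E
    "hidden_size": 768,             # hidden dimension H
    "num_hidden_layers": 12,
    "num_attention_heads": 12,
    "intermediate_size": 3072,
    "hidden_act": "gelu",
    "hidden_dropout_prob": 0.1,
    "attention_probs_dropout_prob": 0.1,
    "max_position_embeddings": 512,
    "type_vocab_size": 2,
    "initializer_range": 0.02,
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```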