Text Classification, 🇰🇷 Korean Version

ALBERT is "A Lite" version of BERT, a popular unsupervised language representation learning algorithm. ALBERT uses parameter-reduction techniques that allow for large-scale configurations, overcome previous memory limitations, and achieve better behavior with respect to model degradation.

For a technical description of the algorithm, see the original paper: ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Text classification is carried out with the ktrain library. A detailed description can be found on the Blog

👩🏻‍💻 System requirements

pip install -r requirements.txt

👨🏿‍💻 How to use

With a simple command, you can run text classification on a dataset stored as a CSV file. Use main.py:

python main.py \
	--csv data.csv \
	--label Category \
	--data Resume \
	--epoch 5
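
The CSV is expected to contain one column with the class labels (passed via --label) and one column with the text to classify (passed via --data). A hypothetical data.csv with the columns used above might look like this (the rows are made up for illustration):

	Category,Resume
	Data Science,"Worked on machine learning pipelines in Python and SQL"
	HR,"Five years of experience in recruiting and onboarding"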

🎨 parser detail

import argparse

parser = argparse.ArgumentParser(description='ALBERT text classification.')
parser.add_argument('--csv', help='path to the training csv file')
parser.add_argument('--label', help='name of the label column in the csv')
parser.add_argument('--data', help='name of the text column in the csv')
parser.add_argument('--epoch', type=int, help='number of training epochs')
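
After parsing, the arguments feed the data loader described below. A minimal sketch of the wiring (main.py in the repo may differ slightly):

args = parser.parse_args()
x_train, x_test, y_train, y_test, label_list = read_dataset(args.csv, args.data, args.label)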

📝 read_dataset

import pandas as pd
from sklearn.model_selection import train_test_split

def read_dataset(dataset, data, label):
	df = pd.read_csv(dataset)
	label_list = list(set(df[label]))  # unique class labels
	df = df.sample(frac=1)             # shuffle the rows
	x_train, x_test, y_train, y_test = train_test_split(
		list(df[data]), list(df[label]), test_size=0.33, random_state=42)
	return x_train, x_test, y_train, y_test, label_list
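
With the splits and label list in hand, training follows ktrain's standard Transformer workflow. A minimal sketch; maxlen, batch_size, and the learning rate below are assumptions, not values from this repo:

import ktrain
from ktrain import text

MODEL_NAME = 'albert-base-v2'

# Build a preprocessor and classifier around the chosen checkpoint.
t = text.Transformer(MODEL_NAME, maxlen=512, class_names=label_list)
trn = t.preprocess_train(x_train, y_train)
val = t.preprocess_test(x_test, y_test)
model = t.get_classifier()

# Wrap everything in a learner and train with the 1cycle policy.
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=6)
learner.fit_onecycle(2e-5, args.epoch)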

☄️ Available models

Replace MODEL_NAME with the model you want:

	MODEL_NAME = 'albert-base-v2'

Available model types:
BERT: bert-base-uncased, bert-large-uncased, bert-base-multilingual-uncased, and others
DistilBERT: distilbert-base-uncased, distilbert-base-multilingual-cased, distilbert-base-german-cased, and others
ALBERT: albert-base-v2, albert-large-v2, and others
RoBERTa: roberta-base, roberta-large, roberta-large-mnli
XLM: xlm-mlm-xnli15-1024, xlm-mlm-100-1280, and others
XLNet: xlnet-base-cased, xlnet-large-cased
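
For example, for Korean text a multilingual checkpoint may be a better fit; the choice below is just an illustration:

	MODEL_NAME = 'bert-base-multilingual-uncased'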

🏴‍☠️ Performance

97.16 📈

🃏 predictor

You can use the function below.

import ktrain

def predictor(learner, test):
	# `t` is the ktrain preprocessor (text.Transformer) created at training time
	predictor = ktrain.get_predictor(learner.model, preproc=t)
	print(predictor.predict(test))
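
A usage sketch; the sample text and save path are made up, while get_predictor, save, and load_predictor are standard ktrain calls:

p = ktrain.get_predictor(learner.model, preproc=t)
print(p.predict('Experienced Python developer with an NLP background'))

# Persist the predictor and reload it later without retraining.
p.save('albert_predictor')
reloaded = ktrain.load_predictor('albert_predictor')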

📊 tensorboard

tensorboard \
	--logdir=training:your_log_dir \
	--host=127.0.0.1
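
ktrain's training methods accept standard Keras callbacks, so the logs can be produced by passing a TensorBoard callback during training. A sketch, assuming the log directory name used above:

from tensorflow.keras.callbacks import TensorBoard

tb = TensorBoard(log_dir='your_log_dir')  # the directory passed to --logdir above
learner.fit_onecycle(2e-5, args.epoch, callbacks=[tb])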

🔬 Library

https://github.com/amaiya/ktrain
