Text Classification, 🇰🇷 Korean Version

ALBERT is "A Lite" version of BERT, a popular unsupervised language representation learning algorithm. ALBERT uses parameter-reduction techniques that allow for large-scale configurations, overcome previous memory limitations, and achieve better behavior with respect to model degradation.

For a technical description of the algorithm, see the original paper: ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

Text classification is carried out with the ktrain library. A detailed description can be found on the Blog

👩🏻‍💻 System requirements

pip install -r requirements.txt

👨🏿‍💻 How to use

With a simple command, you can run text classification on a dataset stored as a CSV file. Use main.py:

python main.py \
	--csv data.csv \
	--label Category \
	--data Resume \
	--epoch 5
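
The CSV is expected to contain one column with the class labels (passed via --label) and one column with the text to classify (passed via --data). A hypothetical data.csv with the columns used above might look like this (the rows are made up for illustration):

	Category,Resume
	Data Science,"Worked on machine learning pipelines in Python and SQL"
	HR,"Five years of experience in recruiting and onboarding"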

🎨 parser detail

import argparse

parser = argparse.ArgumentParser(description='ALBERT text classification.')
parser.add_argument('--csv', help='path to the training csv file')
parser.add_argument('--label', help='name of the label column in the csv')
parser.add_argument('--data', help='name of the text column in the csv')
parser.add_argument('--epoch', type=int, help='number of training epochs')
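
After parsing, the arguments feed the data loader described below. A minimal sketch of the wiring (main.py in the repo may differ slightly):

args = parser.parse_args()
x_train, x_test, y_train, y_test, label_list = read_dataset(args.csv, args.data, args.label)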

📝 read_dataset

import pandas as pd
from sklearn.model_selection import train_test_split

def read_dataset(dataset, data, label):
	df = pd.read_csv(dataset)
	label_list = list(set(df[label]))  # unique class labels
	df = df.sample(frac=1)             # shuffle the rows
	x_train, x_test, y_train, y_test = train_test_split(
		list(df[data]), list(df[label]), test_size=0.33, random_state=42)
	return x_train, x_test, y_train, y_test, label_list
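
With the splits and label list in hand, training follows ktrain's standard Transformer workflow. A minimal sketch; maxlen, batch_size, and the learning rate below are assumptions, not values from this repo:

import ktrain
from ktrain import text

MODEL_NAME = 'albert-base-v2'

# Build a preprocessor and classifier around the chosen checkpoint.
t = text.Transformer(MODEL_NAME, maxlen=512, class_names=label_list)
trn = t.preprocess_train(x_train, y_train)
val = t.preprocess_test(x_test, y_test)
model = t.get_classifier()

# Wrap everything in a learner and train with the 1cycle policy.
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=6)
learner.fit_onecycle(2e-5, args.epoch)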

☄️ Available models

Replace MODEL_NAME with the model you want:

	MODEL_NAME = 'albert-base-v2'

Available model types:
BERT: bert-base-uncased, bert-large-uncased, bert-base-multilingual-uncased, and others
DistilBERT: distilbert-base-uncased, distilbert-base-multilingual-cased, distilbert-base-german-cased, and others
ALBERT: albert-base-v2, albert-large-v2, and others
RoBERTa: roberta-base, roberta-large, roberta-large-mnli
XLM: xlm-mlm-xnli15-1024, xlm-mlm-100-1280, and others
XLNet: xlnet-base-cased, xlnet-large-cased
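
For example, for Korean text a multilingual checkpoint may be a better fit; the choice below is just an illustration:

	MODEL_NAME = 'bert-base-multilingual-uncased'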

🏴‍☠️ Performance

97.16 📈

🃏 predictor

You can use the function below.

import ktrain

def predictor(learner, test):
	# `t` is the ktrain preprocessor (text.Transformer) created at training time
	predictor = ktrain.get_predictor(learner.model, preproc=t)
	print(predictor.predict(test))
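
A usage sketch; the sample text and save path are made up, while get_predictor, save, and load_predictor are standard ktrain calls:

p = ktrain.get_predictor(learner.model, preproc=t)
print(p.predict('Experienced Python developer with an NLP background'))

# Persist the predictor and reload it later without retraining.
p.save('albert_predictor')
reloaded = ktrain.load_predictor('albert_predictor')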

📊 tensorboard

tensorboard \
	--logdir=training:your_log_dir \
	--host=127.0.0.1
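
ktrain's training methods accept standard Keras callbacks, so the logs can be produced by passing a TensorBoard callback during training. A sketch, assuming the log directory name used above:

from tensorflow.keras.callbacks import TensorBoard

tb = TensorBoard(log_dir='your_log_dir')  # the directory passed to --logdir above
learner.fit_onecycle(2e-5, args.epoch, callbacks=[tb])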

🔬 Library

https://github.com/amaiya/ktrain
