# Introduction
#### In this notebook, I show how to train a sentence classifier that decides whether a sentence includes a dataset name.
#### For the implementation, I use [Flair](https://github.com/flairNLP/flair), an easy-to-use NLP library that provides [Task-aware representation of sentences (TARS)](https://kishaloyhalder.github.io/pdfs/tars_coling2020.pdf).
#### TARS is designed for few-shot learning: it can train a usable classifier from only a handful of labeled sentences.
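The core idea in the TARS paper is to turn any classification task into a yes/no question. The label name is paired with the sentence, and one binary model predicts whether that label applies. Because the label name is part of the input, knowledge from earlier tasks can transfer to a new label set, which is what makes few-shot training possible. The sketch below illustrates only this input construction in plain Python; the `make_tars_pairs` helper and the literal `[SEP]` separator are illustrative assumptions, and Flair's actual input formatting may differ in detail.

```python
from typing import List, Tuple

# Assumption: a BERT-style separator between the label name and the sentence.
SEP = "[SEP]"


def make_tars_pairs(sentence: str, gold_label: str, all_labels: List[str]) -> List[Tuple[str, int]]:
    """Hypothetical helper: expand one labeled sentence into TARS-style binary examples.

    Every candidate label is prepended to the sentence; the target is 1 when
    that label is the gold label and 0 otherwise.
    """
    return [(f"{label} {SEP} {sentence}", int(label == gold_label)) for label in all_labels]


# The two labels used in this notebook's corpus.
labels = ["include", "not"]
example = "The data for this study is taken from the National Education Longitudinal Study (NELS) ."
for text, target in make_tars_pairs(example, "include", labels):
    print(target, text)
```

In this sketch, each training sentence yields one positive and one negative pair when there are two labels. Since the model reads the label names, it helps to choose names that are meaningful words.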

In [None]:
!pip install --upgrade git+https://github.com/flairNLP/flair.git

In [None]:
from flair.data import Corpus, Sentence
from flair.datasets import SentenceDataset
from flair.models import TARSClassifier
from flair.trainers import ModelTrainer

# Make a small corpus for sentence classification
#### In this corpus, the label "include" means the sentence contains a dataset name, and "not" means it does not.
#### The sentences in the corpus are taken from the training JSON files.
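One way to pick positive and negative sentences like these is to match them against dataset names already known for a document. The sketch below is hypothetical: the `label_sentences` helper and its case-insensitive substring rule are assumptions for illustration, not necessarily how the sentences in this corpus were selected.

```python
from typing import Iterable, List, Tuple


def label_sentences(sentences: Iterable[str], dataset_names: Iterable[str]) -> List[Tuple[str, str]]:
    """Hypothetical helper: tag each sentence 'include' if it mentions any known
    dataset name (case-insensitive substring match), otherwise 'not'."""
    names = [name.lower() for name in dataset_names]
    labeled = []
    for sentence in sentences:
        lowered = sentence.lower()
        label = "include" if any(name in lowered for name in names) else "not"
        labeled.append((sentence, label))
    return labeled


examples = label_sentences(
    [
        "Data used in the preparation of this article were obtained from the Alzheimers Disease Neuroimaging Initiative (ADNI) database .",
        "Seeing these conditions required special attention so as not to cause a worse impact in the following year .",
    ],
    ["Alzheimers Disease Neuroimaging Initiative"],
)
for text, label in examples:
    print(label, "|", text)
```

Substring matching only finds names that are already on the list; a trained classifier is useful because it can potentially flag sentences that mention datasets the list misses.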

In [None]:
# Training sentences: three that mention a dataset ('include') and three that do not ('not')
train = SentenceDataset(
    [
        Sentence('In this study we employ base years and follow-up data of national probability samples of high school students in the US, '
                 'namely the National Longitudinal Study (NLS) of the High School Class of 1972 (base year, fourth and fifth follow-up) and '
                 'the National Education Longitudinal Study (NELS) of the Eighth Grade Class of 1988 (second and fourth follow-up) . ').add_label('include_or_not', 'include'),

        Sentence('These symptoms are also of interest as potential manifestations of underlying disease at the earliest stages of AD prior '
                 'to a diagnosis Data used in the preparation of this article were obtained from the Alzheimers Disease Neuroimaging Initiative '
                 '(ADNI) database (www.loni.ucla.edu ADNI) . ').add_label('include_or_not', 'include'),

        Sentence('The case for common standards was crystallized in a report published by the National Governors Association (NGA), the Council of '
                 'Chief State School Officers (CCSSO), and Achieve (NGA et al. 2008) . Authored by an International Benchmarking Advisory Group, '
                 'chaired by then-Governor Janet Na-politano (AZ), then-Governor Sonny Perdue (GA), and Craig R. Barrett, the chairman of the Intel '
                 'Corporation board, the report drew heavily on research using data from the Programme for International Student Assessment (PISA) '
                 'and Trends in International Mathematics and Science Study (TIMSS) . ').add_label('include_or_not', 'include'),

        Sentence('During this stage of the process, then, one particular set of inferences, among the differing ones that could be drawn from research '
                 'and indicator data, were selected and framed in such a way as to persuade key policy audiences that common standards held the potential '
                 'to rectify pressing educational and economic problems . ').add_label('include_or_not', 'not'),

        Sentence('He concluded that the TC-related heating is approximately 1.4 AE 0.7 PW (1 PW = 10 15 W), which may account for a substantial portion of '
                 'the OHT carried by the meridional overturning circulation . ').add_label('include_or_not', 'not'),

        Sentence('Seeing these conditions required special attention so as not to cause a worse impact in the following year . ').add_label('include_or_not', 'not')
    ])

test = SentenceDataset(
    [
        Sentence('Then, we used MCI and AD cases from the Alzheimers disease Neuroimaging Initiative (ADNI) to evaluate the performance in estimating '
                 'both the functional deficits, such cognitive scores, and diagnostic categories (NC, MCI, or AD) of these patients. ').add_label('include_or_not', 'include'),

        Sentence('Soft-bottom habitats are one of the most widespread habitats on Earth and one with many keystone species and ecosystem engineers that '
                 'play critical roles in biogeochemical cycles, energy transfer to important commercial fisheries, and provision of other essential '
                 'ecosystem services (Snelgrove 1999) .').add_label('include_or_not', 'not')
    ])

# make a corpus with train and test split
corpus = Corpus(train=train, test=test)

# Few-Shot Training using TARS

In [None]:
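# Load the pre-trained TARS base model
tars = TARSClassifier.load('tars-base')

# Register the new binary task ('include' vs 'not') and switch TARS to it
label_type = 'include_or_not'
label_dict = corpus.make_label_dictionary(label_type=label_type)
tars.add_and_switch_to_new_task("INCLUDE_NOT", label_dictionary=label_dict, label_type=label_type)

# Fine-tune TARS on the small corpus
trainer = ModelTrainer(tars, corpus)

trainer.train(base_path='resources/taggers/include_not',  # checkpoints and logs are written here
              learning_rate=0.02,
              mini_batch_size=1,
              max_epochs=10,
              train_with_dev=True,  # fold the dev split into the training data
              )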

# Test the final model

In [None]:
tars = TARSClassifier.load('resources/taggers/include_not/final-model.pt')

# Prepare a test sentence that includes a dataset name
sentence = Sentence('The data for this study is taken from the National Education Longitudinal Study (NELS) .')

tars.predict(sentence)
print(sentence)

# Prepare a test sentence that does NOT include a dataset name
sentence = Sentence('Other family background indicators include socioeconomic status, and whether another language besides English is spoken in the home .')

tars.predict(sentence)
print(sentence)