# Quickstart

The quickest way to predict the variant of a text is with the `get_variant` function:

In [1]:
from ABClf import get_variant

get_variant("Would anyone amongst you would fancy a biscuit?")

'B'

# Detailed look

American vs. British classification is based on a dictionary generated from the varcon and voc.tab corpora. The dictionary itself can be loaded from the [`lexicon.pickle`](lexicon.pickle) file like this:

In [2]:
from ABClf import load_lexicon

lex = load_lexicon()

type(lex)

dict

The dictionary contains about 6k words:

In [3]:
len(lex)

6107

To check what the classifier is picking up, we can call the `count_variants` function with the loaded lexicon:

In [4]:
from ABClf import count_variants

s = "Would anyone amongst you would fancy a biscuit?"

counts, breakdown = count_variants(s, lex)

counts

{'B': 1}

In [5]:
breakdown

{'amongst': {'variant': 'B', 'count': 1}}
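The counting step can be sketched roughly as follows. This is not the package's implementation: the helper name `count_variants_sketch`, the regex tokenizer, and the toy lexicon (assumed to map each word to a variant label) are all illustrative assumptions, chosen only so that the output has the same shape as the one shown above.

```python
import re
from collections import Counter

# Toy lexicon for illustration only. The assumed shape (word -> variant label)
# is a guess, not the documented contents of lexicon.pickle.
TOY_LEXICON = {"amongst": "B", "colour": "B", "color": "A", "sidewalk": "A"}


def count_variants_sketch(text, lexicon):
    """Count lexicon hits per variant and per word (hypothetical helper)."""
    # Assumed tokenization: lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    word_hits = Counter(t for t in tokens if t in lexicon)

    counts = Counter()
    breakdown = {}
    for word, n in word_hits.items():
        variant = lexicon[word]
        counts[variant] += n
        breakdown[word] = {"variant": variant, "count": n}
    return dict(counts), breakdown


counts_demo, breakdown_demo = count_variants_sketch(
    "Would anyone amongst you would fancy a biscuit?", TOY_LEXICON
)
print(counts_demo)     # {'B': 1}
print(breakdown_demo)  # {'amongst': {'variant': 'B', 'count': 1}}
```

Only words present in the lexicon contribute to the counts, which is why a sentence can contain several words yet register a single hit.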

To classify an instance we use the `counts` and some arbitrary logic:
* for documents with no identified American or British lexemes we return `UNK`,
* if one variant has more than twice as many identified words as the other, we classify the instance as the more frequent variant,
* else we classify it as `MIX`.

In [6]:
from ABClf import counts_to_category

counts_to_category(counts)

'B'
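The three rules listed above can be written out as a short sketch. The function name `counts_to_category_sketch` is hypothetical, and the `'A'` label for American is an assumption by analogy with the `'B'` label seen in the outputs; the actual `counts_to_category` may differ in details such as tie handling.

```python
def counts_to_category_sketch(counts):
    """Map per-variant hit counts to 'A', 'B', 'MIX' or 'UNK' (hypothetical helper)."""
    american = counts.get("A", 0)  # 'A' is an assumed label for American
    british = counts.get("B", 0)

    # No American or British lexemes were identified.
    if american == 0 and british == 0:
        return "UNK"
    # One variant has more than twice as many hits as the other.
    if american > 2 * british:
        return "A"
    if british > 2 * american:
        return "B"
    # Otherwise the evidence is too balanced to pick a side.
    return "MIX"


print(counts_to_category_sketch({"B": 1}))          # B
print(counts_to_category_sketch({"A": 2, "B": 1}))  # MIX
```

Note that the comparison is strict: exactly twice as many hits still yields `MIX` under this reading of the rule.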

# Batch processing

Classifying with `get_variant` is not slow, but if we load the lexicon once beforehand instead of on every call, execution is about 70x faster (roughly 18 dB):

In [7]:
%timeit get_variant(s)

944 µs ± 646 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)


In [8]:
%timeit get_variant(s, lex=lex)

13.4 µs ± 20.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
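For many documents, the same idea amounts to loading the lexicon once and passing it to every call. The sketch below assumes only that the classifier accepts a `lex` keyword, as `get_variant(s, lex=lex)` does above; `classify_batch`, `toy_classify`, and `toy_lex` are hypothetical stand-ins so that the example runs without the package installed.

```python
def classify_batch(texts, classify, lex):
    """Classify every text with one shared, preloaded lexicon (hypothetical helper)."""
    return [classify(text, lex=lex) for text in texts]


def toy_classify(text, lex):
    """Stand-in classifier for demonstration; not the package's logic."""
    hits = {lex[word] for word in text.lower().split() if word in lex}
    if not hits:
        return "UNK"
    return hits.pop() if len(hits) == 1 else "MIX"


toy_lex = {"amongst": "B", "sidewalk": "A"}  # toy data, assumed shape
docs = ["amongst friends", "on the sidewalk", "nothing here"]

labels = classify_batch(docs, toy_classify, toy_lex)
print(labels)  # ['B', 'A', 'UNK']
```

With the real package, the equivalent call would presumably be `classify_batch(docs, get_variant, load_lexicon())`, so that the lexicon is read from disk a single time for the whole batch.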
