# Usage

## Via Command Line Interface (CLI)

In [None]:
!ekorpkit

### CLI example to build a corpus

```bash
ekorpkit --config-dir /workspace/projects/ekorpkit-book/config  \
    project=esgml \
    dir.workspace=/workspace \
    verbose=false \
    print_config=false \
    num_workers=1 \
    cmd=fetch_builtin_corpus \
    +corpus/builtin=_dummy_fomc_minutes \
    corpus.builtin.io.force.summarize=true \
    corpus.builtin.io.force.preprocess=true \
    corpus.builtin.io.force.build=false \
    corpus.builtin.io.force.download=false
```
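
The dotted `key=value` arguments above read like Hydra-style overrides, where each dot descends one level into a nested configuration. The following is a minimal sketch of that mapping using only the standard library. It is not ekorpkit's implementation: `parse_value` and `apply_overrides` are hypothetical helpers, and config-group selections such as `+corpus/builtin=_dummy_fomc_minutes` are deliberately not modeled.

```python
from typing import Any


def parse_value(raw: str) -> Any:
    """Convert a raw override value to bool or int, else keep it as a string."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    try:
        return int(raw)
    except ValueError:
        return raw


def apply_overrides(overrides: list[str]) -> dict:
    """Build a nested dict from dotted `key=value` overrides (hypothetical helper)."""
    config: dict = {}
    for item in overrides:
        key, _, raw = item.partition("=")
        *parents, leaf = key.split(".")
        node = config
        for part in parents:
            # Descend one level per dotted segment, creating dicts as needed.
            node = node.setdefault(part, {})
        node[leaf] = parse_value(raw)
    return config


overrides = [
    "project=esgml",
    "num_workers=1",
    "corpus.builtin.io.force.summarize=true",
    "corpus.builtin.io.force.build=false",
]
config = apply_overrides(overrides)
print(config["corpus"]["builtin"]["io"]["force"])
```

Read this way, the four `corpus.builtin.io.force.*` flags in the command all set entries in one nested `force` section of the built-in corpus config.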

### CLI Help

To see the configuration options available from the CLI, run the following commands:

In [None]:
!ekorpkit --help

In [None]:
!ekorpkit --info defaults

## Via Python

### Compose an ekorpkit config

In [None]:
from ekorpkit import eKonf
cfg = eKonf.compose()
print('Config type:', type(cfg))
eKonf.pprint(cfg)

### Instantiating objects with an ekorpkit config

#### Compose a config for the nltk class

In [None]:
from ekorpkit import eKonf
config_group='preprocessor/tokenizer=nltk'
cfg = eKonf.compose(config_group=config_group)
eKonf.pprint(cfg)
nltk = eKonf.instantiate(cfg)

In [None]:
text = "I shall reemphasize some of those thoughts today in the context of legislative proposals that are now before the current Congress."
nltk.tokenize(text)

In [None]:
nltk.nouns(text)
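
The compose-then-instantiate pattern above can be illustrated without ekorpkit. The sketch below assumes a Hydra-style convention in which a config names its class under a `_target_` key and passes the remaining keys as constructor arguments. Whether `eKonf.instantiate` works exactly this way is an assumption. The `instantiate` function here is a hypothetical stand-in, and `textwrap.TextWrapper` substitutes for a tokenizer class.

```python
import importlib
from typing import Any


def instantiate(config: dict) -> Any:
    """Import the class named by `_target_` and call it with the remaining keys."""
    kwargs = dict(config)  # copy so the caller's config is not mutated
    module_path, _, class_name = kwargs.pop("_target_").rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**kwargs)


# Hypothetical config: a standard-library class stands in for a tokenizer.
cfg = {"_target_": "textwrap.TextWrapper", "width": 20}
wrapper = instantiate(cfg)
print(type(wrapper).__name__, wrapper.width)
```

Keeping the class name inside the config means the calling code stays the same when a different implementation is selected, which is what switching `preprocessor/tokenizer=nltk` to `preprocessor/tokenizer=mecab` relies on.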

#### Compose a config for the mecab class

In [None]:
config_group='preprocessor/tokenizer=mecab'
cfg = eKonf.compose(config_group=config_group)
eKonf.pprint(cfg)

#### Instantiate the mecab config and tokenize a text

In [None]:
mecab = eKonf.instantiate(cfg)
text = 'IMF가 推定한 우리나라의 GDP갭률은 今年에도 소폭의 마이너스(−)를 持續하고 있다.'
mecab.tokenize(text)

#### Compose and instantiate a `formal_ko` config for the normalizer class

In [None]:
config_group='preprocessor/normalizer=formal_ko'
cfg_norm = eKonf.compose(config_group=config_group)
norm = eKonf.instantiate(cfg_norm)
norm(text)

#### Instantiate a mecab config with the above normalizer config

In [None]:
config_group='preprocessor/tokenizer=mecab'
cfg = eKonf.compose(config_group=config_group)
# Attach the normalizer config so that text is normalized before tokenization
cfg.normalize = cfg_norm
mecab = eKonf.instantiate(cfg)
mecab.tokenize(text)
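
The effect of attaching a normalizer to a tokenizer can be sketched in plain Python. Everything below is hypothetical: `WhitespaceTokenizer` is not an ekorpkit class, and Unicode NFKC normalization is only a stand-in for whatever the `formal_ko` normalizer actually does. The point is the composition, where the tokenizer runs the normalizer on the text first.

```python
import unicodedata
from typing import Callable, Optional


class WhitespaceTokenizer:
    """Hypothetical tokenizer that optionally normalizes text before splitting."""

    def __init__(self, normalize: Optional[Callable[[str], str]] = None):
        self.normalize = normalize

    def tokenize(self, text: str) -> list[str]:
        if self.normalize is not None:
            text = self.normalize(text)
        return text.split()


def nfkc(text: str) -> str:
    """Stand-in normalizer: NFKC folds full-width characters to their ASCII forms."""
    return unicodedata.normalize("NFKC", text)


plain = WhitespaceTokenizer()
normalized = WhitespaceTokenizer(normalize=nfkc)

sample = "ＧＤＰ gap"  # the first token uses full-width Latin letters
print(plain.tokenize(sample))
print(normalized.tokenize(sample))
```

The two tokenizers split the text identically, but only the second one yields tokens in normalized form, which mirrors comparing the `mecab.tokenize(text)` output with and without `cfg.normalize` set.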