# Getting started with ekorpkit

## Introduction

ekorpkit stands for eKonomic Research Python Toolkit. The name suggests a tool for economic research, but it is in fact a Python library for natural language processing (NLP) and machine learning (ML). In particular, it is designed to support Korean as well as English.

ekorpkit provides a flexible interface for NLP and ML research pipelines such as extraction, transformation, tokenization, training, and visualization. Its powerful configuration composition is backed by Hydra.

## Key features

### Easy Configuration

- You can compose your configuration dynamically, so each research project gets a configuration tailored to it.
- You can override everything from the command line, which makes experimentation fast and removes the need to maintain multiple similar configuration files.
- With the help of the **eKonf** class, it is also easy to compose configurations in a Jupyter notebook environment.
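
The idea behind composition and command-line overrides can be sketched with plain dictionaries. This is only an illustration of the concept, not the ekorpkit or Hydra API: the helper names (`deep_merge`, `apply_overrides`) and the config keys are hypothetical.

```python
from copy import deepcopy


def deep_merge(base: dict, extra: dict) -> dict:
    """Return a new dict with `extra` merged recursively into `base`."""
    merged = deepcopy(base)
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Apply `dotted.key=value` strings, as one might pass on a command line."""
    result = deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = raw_value  # values stay strings in this sketch
    return result


# Hypothetical config groups, for illustration only.
base = {"corpus": {"name": "example", "lang": "en"}, "tokenizer": {"type": "nltk"}}
korean = {"corpus": {"lang": "ko"}, "tokenizer": {"type": "mecab"}}

# Compose two groups, then override a single value.
config = apply_overrides(deep_merge(base, korean), ["corpus.name=my_corpus"])
print(config)
```

Because both helpers return new dictionaries, the base configuration stays untouched and can be reused for the next experiment.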

### No Boilerplate

- eKorpkit lets you focus on the problem at hand instead of spending time on boilerplate code such as command line flags, loading configuration files, and logging.

### Workflows

- A workflow is a configurable automated process that will run one or more jobs.
- You can divide your research into several unit jobs (tasks), then combine those jobs into one workflow.
- You can have multiple workflows, each of which can perform a different set of tasks.
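
As a rough sketch of this idea, a workflow can be modeled as an ordered list of named jobs that pass a shared state along. Everything here (`run_workflow`, the job functions, the `state` dictionary) is a hypothetical stand-in and not how ekorpkit itself defines workflows.

```python
from typing import Callable

# A job takes the current state and returns an updated state.
Job = Callable[[dict], dict]


def run_workflow(jobs: list[tuple[str, Job]], state: dict) -> dict:
    """Run each named job in order and record which jobs were executed."""
    for name, job in jobs:
        state = job(state)
        state.setdefault("history", []).append(name)
    return state


def load(state: dict) -> dict:
    return {**state, "docs": ["Hello world", "ekorpkit demo"]}


def tokenize(state: dict) -> dict:
    return {**state, "tokens": [doc.lower().split() for doc in state["docs"]]}


def count(state: dict) -> dict:
    return {**state, "n_tokens": sum(len(tokens) for tokens in state["tokens"])}


# Two workflows that share unit jobs but perform different sets of tasks.
prepare = [("load", load), ("tokenize", tokenize)]
analyze = prepare + [("count", count)]

result = run_workflow(analyze, {})
print(result["history"], result["n_tokens"])
```

Keeping each job small and free of hidden dependencies is what makes it possible to reuse the same job in several workflows.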

### Sharable and Reproducible

- With eKorpkit, you can easily share your datasets and models.
- Sharing configs along with datasets and models makes your research reproducible.
- You can share individual unit jobs or an entire workflow.

### Pluggable Architecture

- eKorpkit has a pluggable architecture, so you can combine it with your own implementations.
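
One common way to make a library pluggable is a registry that maps names to implementations. The sketch below shows that general pattern only; the `TOKENIZERS` registry and `register_tokenizer` decorator are hypothetical and are not taken from ekorpkit.

```python
from typing import Callable

# Hypothetical registry mapping a plugin name to a tokenizer function.
TOKENIZERS: dict[str, Callable[[str], list[str]]] = {}


def register_tokenizer(name: str):
    """Decorator that adds a tokenizer function to the registry under `name`."""
    def decorator(func: Callable[[str], list[str]]):
        TOKENIZERS[name] = func
        return func
    return decorator


@register_tokenizer("whitespace")
def whitespace_tokenize(text: str) -> list[str]:
    return text.split()


@register_tokenizer("char")
def char_tokenize(text: str) -> list[str]:
    # A user-supplied implementation plugs in the same way as a built-in one.
    return [ch for ch in text if not ch.isspace()]


def tokenize(text: str, method: str = "whitespace") -> list[str]:
    """Look up the requested implementation by name and apply it."""
    return TOKENIZERS[method](text)


print(tokenize("plug me in"))
print(tokenize("plug in", method="char"))
```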


## Installation

To use ekorpkit, you need to install it first. The recommended way is to use pip.

Install the latest version of ekorpkit by running:

```bash
pip install -U ekorpkit
```

To install ekorpkit together with all extra dependencies, run:

```bash
# Quotes keep shells such as zsh from interpreting the square brackets
pip install "ekorpkit[all]"
```

To install every extra dependency exhaustively (not recommended), run:

```bash
pip install "ekorpkit[exhaustive]"
```

To install or upgrade the pre-release version of ekorpkit, run:

```bash
pip install -U --pre ekorpkit
```

## Extra dependencies

To list the extra dependency sets:


```python
from ekorpkit import eKonf

eKonf.dependencies()
```

```text
['tokenize',
 'all',
 'mecab',
 'tokenize-en',
 'dataset',
 'topic',
 'visualize',
 'parser',
 'wiki',
 'fomc',
 'edgar',
 'transformers',
 'model',
 'automl',
 'cached-path',
 'google',
 'ddbackend',
 'fetch',
 'doc',
 'disco',
 'art',
 'dalle-mini',
 'label',
 'exhaustive']
```

To see the list of libraries in each dependency set:

```python
eKonf.dependencies("tokenize")
```

```text
{'emoji<2.0',
 'fugashi',
 'mecab-ko-dic',
 'mecab-python3',
 'nltk',
 'pynori',
 'pysbd',
 'sacremoses',
 'soynlp'}
```
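
If you need only some of these sets, pip accepts several extras at once, separated by commas inside the square brackets. The small helper below builds such a requirement string; `requirement_with_extras` is a hypothetical name used for illustration, and `tokenize` and `topic` are taken from the list above.

```python
def requirement_with_extras(package: str, extras: list[str]) -> str:
    """Build a pip requirement string such as 'package[extra1,extra2]'."""
    if not extras:
        return package
    return f"{package}[{','.join(extras)}]"


spec = requirement_with_extras("ekorpkit", ["tokenize", "topic"])

# Quote the requirement so the shell does not interpret the brackets.
print(f'pip install "{spec}"')
```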

## Usage

### Via Command Line Interface (CLI)


```bash
ekorpkit
```

```text
[2022-08-26 08:52:39,439][ekorpkit.base][INFO] - Loaded .env from /workspace/projects/ekorpkit-book/config/.env
[2022-08-26 08:52:39,443][ekorpkit.base][INFO] - setting environment variable CACHED_PATH_CACHE_ROOT to /workspace/.cache/cached_path
[2022-08-26 08:52:39,443][ekorpkit.base][INFO] - setting environment variable KMP_DUPLICATE_LIB_OK to TRUE

name        : ekorpkit
author      : Young Joon Lee
description : eKorpkit provides a flexible interface for NLP and ML research pipelines such as extraction, transformation, tokenization, training, and visualization.
website     : https://entelecheia.github.io/ekorpkit-book/
version     : 0.1.38+20.g61522ea

Execute `ekorpkit --help` to see what eKorpkit provides
```
