lextract - Dictionary-based lexical item extractor

Overview

`lextract.aho_corasick`

Find multiwords in text using an Aho-Corasick automaton. Works for Mandarin and Finnish.
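To illustrate the general technique, here is a minimal, self-contained sketch of Aho-Corasick matching over token sequences. It is not lextract's actual API: the function names `build_automaton` and `find_all` and the sample Finnish multiword are assumptions made for this example only.

```python
from collections import deque


def build_automaton(patterns):
    """Build a trie with failure links (hypothetical helper, not lextract's API)."""
    goto = [{}]   # goto[state][token] -> next state
    fail = [0]    # fail[state] -> state for the longest proper suffix in the trie
    out = [[]]    # out[state] -> patterns that end at this state
    for pat in patterns:
        state = 0
        for tok in pat:
            if tok not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][tok] = len(goto) - 1
            state = goto[state][tok]
        out[state].append(tuple(pat))
    # Breadth-first pass: depth-1 states keep fail = 0, deeper states follow links.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for tok, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and tok not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(tok, 0)
            # Inherit patterns that end at the suffix state.
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(tokens, automaton):
    """Return (start_index, pattern) for every pattern occurrence in tokens."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, tok in enumerate(tokens):
        while state and tok not in goto[state]:
            state = fail[state]
        state = goto[state].get(tok, 0)
        for pat in out[state]:
            matches.append((i - len(pat) + 1, pat))
    return matches


automaton = build_automaton([("ottaa", "huomioon"), ("huomioon",)])
matches = find_all(["se", "ottaa", "huomioon"], automaton)
```

Because the automaton only needs hashable symbols, the same sketch can run over characters instead of tokens, which is one way to handle text without whitespace word boundaries.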

`lextract.keyed_db`

Find multiwords in text using the rarest lemma as a key. Can find contiguous multiwords in tokenized text or discontinuous ones from a dependency tree.
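The rarest-lemma idea can be sketched as follows, for the contiguous case only. This is an illustrative assumption about how such an index might work, not lextract's implementation: `build_index`, `find_contiguous`, and the `lemma_freq` counts are all hypothetical.

```python
from collections import defaultdict


def build_index(multiwords, lemma_freq):
    """Store each multiword under its least frequent lemma (hypothetical helper)."""
    index = defaultdict(list)
    for mwe in multiwords:
        # Unknown lemmas count as frequency 0, so they are treated as rarest.
        key = min(mwe, key=lambda lemma: lemma_freq.get(lemma, 0))
        index[key].append(tuple(mwe))
    return index


def find_contiguous(lemmas, index):
    """Return (start_index, multiword) for contiguous matches in a lemma list."""
    matches = []
    for i, lemma in enumerate(lemmas):
        for mwe in index.get(lemma, ()):
            # Align the candidate so that its key lemma sits at position i.
            start = i - mwe.index(lemma)
            if start >= 0 and tuple(lemmas[start:start + len(mwe)]) == mwe:
                matches.append((start, mwe))
    return matches


# Made-up frequency counts, for illustration only.
lemma_freq = {"ottaa": 1000, "huomioon": 50}
index = build_index([("ottaa", "huomioon")], lemma_freq)
found = find_contiguous(["se", "ottaa", "huomioon"], index)
```

Keying on the rarest lemma keeps candidate lists short, since a frequent lemma such as `ottaa` never triggers a lookup on its own. Matching discontinuous multiwords would replace the slice comparison with a check against the dependency tree, which this sketch does not attempt.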

`lextract.mweproc`

Processing pipeline for FinnMWE.

Documentation

There are only tests and a few docstrings for now.

