GitHub - KGerring/forte: Forte is a flexible and powerful NLP builder FOR TExt. This is part of the CASL project: http://casl-project.ai/

Forte is a toolkit for building Natural Language Processing pipelines, featuring composable components, convenient data interfaces, and cross-task interaction. Forte defines a universal data representation format for text, making it a one-stop platform for assembling state-of-the-art NLP/ML technologies, ranging from Information Retrieval and Natural Language Understanding to Natural Language Generation.

Forte was originally developed at CMU and is actively contributed to by Petuum in collaboration with other institutes. This project is part of the CASL Open Source family.

Download and Installation

To install the released version from PyPI:

pip install forte

To install from source:

git clone https://github.com/asyml/forte.git
cd forte
pip install .

To install Forte adapters (wrappers) for existing libraries:

git clone https://github.com/asyml/forte-wrappers.git
cd forte-wrappers
# Replace spacy with another tool; see https://github.com/asyml/forte-wrappers#libraries-and-tools-supported for the list of supported tools.
pip install src/spacy

Some components or modules in Forte require extra dependencies:

pip install forte[ner]: Install packages required for ner_trainer.
pip install forte[test]: Install packages required for running unit tests.
pip install forte[example]: Install packages required for running forte examples.
pip install forte[wikipedia]: Install packages required for reading wikipedia datasets.
pip install forte[augment]: Install packages required for data augmentation module.
pip install forte[stave]: Install packages required for StaveProcessor.
pip install forte[audio_ext]: Install packages required for AudioReader.

Getting Started

Examples
Documentation
We are currently working on some interesting tutorials.

With Forte, it is extremely simple to build an integrated system that can search documents, analyze text, extract information, and generate language, all in one place. This allows developers to fully utilize the strengths of individual modules and combine the results from each step, enabling the system to make fully informed decisions at the end of the pipeline.
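To make the idea of combining results across steps concrete, here is a minimal pure-Python sketch of the pattern. It does not use Forte's API: the `Pack` class, the step functions, and `run_pipeline` are hypothetical names chosen for illustration. Each step follows the same contract (take the shared container, add its results, return it), so a later step can build on earlier results.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Pack:
    # Hypothetical shared container passed from step to step (not Forte's DataPack).
    text: str
    results: Dict[str, Any] = field(default_factory=dict)


# Every step follows the same contract: take the Pack, add results, return it.
Step = Callable[[Pack], Pack]


def tokenize(pack: Pack) -> Pack:
    # Naive whitespace tokenizer, for illustration only.
    pack.results["tokens"] = pack.text.split()
    return pack


def tag_capitalized(pack: Pack) -> Pack:
    # Toy "entity" step that reuses the tokens produced by the previous step.
    pack.results["entities"] = [t for t in pack.results["tokens"] if t[:1].isupper()]
    return pack


def summarize(pack: Pack) -> Pack:
    # Final step combines the results of all earlier steps.
    tokens = pack.results["tokens"]
    entities = pack.results["entities"]
    pack.results["summary"] = f"{len(tokens)} tokens, {len(entities)} entities"
    return pack


def run_pipeline(steps: List[Step], text: str) -> Pack:
    # Run each step in order on the same shared Pack.
    pack = Pack(text)
    for step in steps:
        pack = step(pack)
    return pack


result = run_pipeline([tokenize, tag_capitalized, summarize], "Alice met Bob in Paris")
print(result.results["summary"])  # 5 tokens, 3 entities
```

Swapping `tag_capitalized` for any other function with the same `Pack -> Pack` signature changes the entity step without touching the rest of the pipeline. In Forte itself, this role is played by processors added to a `Pipeline` with `add`.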

Forte not only makes it easy to integrate with arbitrary third-party tools (check out these examples!), but also offers a diverse collection of deep learning modules via Texar and a convenient model-data interface for casting tasks to models.

Library Example

A simple code example that runs the spaCy Named Entity Recognizer (requires installing the Forte spaCy wrapper, as shown above):

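from forte import Pipeline
from forte.data.readers import TerminalReader
from fortex.spacy import SpacyProcessor

# Build the pipeline: read text typed into the terminal, then run spaCy
# sentence segmentation and named entity recognition on it.
pipeline = Pipeline()
pipeline.set_reader(TerminalReader())
pipeline.add(SpacyProcessor(), {"processors": ["sentence", "ner"]})
pipeline.initialize()

# process_dataset() yields the processed data packs one at a time.
for pack in pipeline.process_dataset():
    for sentence in pack.get("ft.onto.base_ontology.Sentence"):
        print("The sentence is: ", sentence.text)
        print("The entities are: ")
        # Passing the sentence restricts the lookup to entities inside it.
        for ent in pack.get("ft.onto.base_ontology.EntityMention", sentence):
            print(ent.text, ent.ner_type)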
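The `pack.get(..., sentence)` call in the example above returns only the entity mentions that fall inside the given sentence. The sketch below imitates that lookup in plain Python, assuming annotations are stored as character spans over the document text. `ToyPack`, `span_text`, `span_of`, and the `within` parameter are hypothetical; the class names `Sentence` and `EntityMention` only mirror `ft.onto.base_ontology` for readability, and this is not how Forte's data packs are implemented.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Type, TypeVar


@dataclass
class Annotation:
    # Character offsets into the document text: [begin, end).
    begin: int
    end: int


@dataclass
class Sentence(Annotation):
    pass


@dataclass
class EntityMention(Annotation):
    ner_type: str = ""


T = TypeVar("T", bound=Annotation)


@dataclass
class ToyPack:
    # Hypothetical stand-in for a data pack: the text plus its annotations.
    text: str
    entries: List[Annotation] = field(default_factory=list)

    def add(self, entry: Annotation) -> None:
        self.entries.append(entry)

    def span_text(self, entry: Annotation) -> str:
        return self.text[entry.begin:entry.end]

    def get(self, entry_type: Type[T], within: Optional[Annotation] = None) -> List[T]:
        # Filter by type (subclasses included), then optionally by span containment.
        found = [e for e in self.entries if isinstance(e, entry_type)]
        if within is not None:
            found = [e for e in found if within.begin <= e.begin and e.end <= within.end]
        # Return entries in document order.
        return sorted(found, key=lambda e: (e.begin, e.end))


def span_of(text: str, substring: str) -> Tuple[int, int]:
    # Demo helper: offsets of the first occurrence of `substring`.
    start = text.index(substring)
    return start, start + len(substring)


text = "Alice lives in Paris. Bob works at CMU."
pack = ToyPack(text)
pack.add(Sentence(*span_of(text, "Alice lives in Paris.")))
pack.add(Sentence(*span_of(text, "Bob works at CMU.")))
for surface, label in [("Alice", "PERSON"), ("Paris", "GPE"), ("Bob", "PERSON"), ("CMU", "ORG")]:
    pack.add(EntityMention(*span_of(text, surface), ner_type=label))

for sentence in pack.get(Sentence):
    print("The sentence is: ", pack.span_text(sentence))
    print("The entities are: ")
    for ent in pack.get(EntityMention, sentence):
        print(pack.span_text(ent), ent.ner_type)
```

Because `get` filters with `isinstance`, asking for the base `Annotation` type returns every entry, which loosely mirrors how a type hierarchy lets one query cover related types.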

Core Design Principles

The core design principle of Forte is the abstraction of NLP concepts and machine learning models. It not only separates data, models, and tasks, but also enables interaction between different components of the pipeline. Based on this principle, we make Forte:

Composable: Forte helps users decompose a problem into data, models, and tasks. The tasks can be further divided into sub-tasks. A complex use case can be solved by composing heterogeneous modules via straightforward Python APIs or declarative configuration files. Components (e.g., models or tasks) in the pipeline can be flexibly swapped in and out, as long as their API contracts match. This approach greatly improves module reusability, enables fast development, and makes libraries more flexible to use.
Generalizable and Extensible: Forte not only generalizes well to a wide range of NLP tasks, but also extends easily to new tasks or new domains. In particular, Forte provides an Ontology system that helps users define types for their specific tasks. Users can declaratively specify types in simple JSON files, and the Code Generation tool automatically generates ready-to-use Python files for their project. Check out our Ontology Generation documentation for more details.
Universal Data Flow: Forte supports seamless data flow between different pipeline steps. Central to Forte's composable architecture, this transparent data flow facilitates flexible process interventions and simple pipeline management. Because it adapts to generic data formats, Forte is well suited for data inspection, component swapping, and result sharing, which is particularly helpful during team collaboration.
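The Ontology Generation idea (declare types in JSON, get Python classes back) can be sketched in a few lines. This is not Forte's Code Generation tool: the JSON layout below is a simplified, hypothetical spec loosely inspired by Forte's ontology files, and `generate_source` is an illustrative helper that emits one dataclass per entry, with every attribute optional.

```python
import json
from typing import Any, Dict, List

# Hypothetical, simplified ontology spec; the real Forte spec format may differ.
SPEC_JSON = """
{
  "definitions": [
    {
      "entry_name": "Word",
      "attributes": [
        {"name": "pos_tag", "type": "str"},
        {"name": "is_stopword", "type": "bool"}
      ]
    },
    {"entry_name": "Phrase", "attributes": []}
  ]
}
"""


def generate_source(spec: Dict[str, Any]) -> str:
    """Emit Python source that defines one dataclass per ontology entry."""
    lines: List[str] = ["from dataclasses import dataclass", "from typing import Optional", ""]
    for entry in spec["definitions"]:
        lines.append("@dataclass")
        lines.append(f"class {entry['entry_name']}:")
        attributes = entry.get("attributes", [])
        if not attributes:
            # An entry without attributes still needs a class body.
            lines.append("    pass")
        for attr in attributes:
            # Every attribute becomes an optional field that defaults to None.
            lines.append(f"    {attr['name']}: Optional[{attr['type']}] = None")
        lines.append("")
    return "\n".join(lines)


source = generate_source(json.loads(SPEC_JSON))
print(source)

# Load the generated code into a fresh namespace, as if importing a generated module.
namespace: Dict[str, Any] = {"__name__": "generated_ontology"}
exec(source, namespace)
Word = namespace["Word"]
print(Word(pos_tag="NN"))  # Word(pos_tag='NN', is_stopword=None)
```

Generating plain source text (rather than building classes dynamically) keeps the output inspectable and easy to commit to a project; Forte's actual generator and spec format differ from this toy, so the Ontology Generation documentation remains the reference.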


A high-level architecture of Forte, showing how the ontology and entries work with the pipeline.


Forte stores results in data packs and uses the ontology to represent task logic.

Contributing

If you are interested in making enhancements to Forte, please first go over our Code of Conduct and Contribution Guidelines.

License

Apache License 2.0

