spaCy + Trankit

This package wraps the Trankit library, so you can use trankit models in a spaCy pipeline.

Using this wrapper, you'll be able to access the following annotations, computed by your pretrained Trankit pipeline/model (see the sketch after the list):

  • Statistical tokenization (reflected in the Doc and its tokens)
  • Lemmatization (token.lemma and token.lemma_)
  • Part-of-speech tagging (token.tag, token.tag_, token.pos, token.pos_)
  • Morphological analysis (token.morph)
  • Dependency parsing (token.dep, token.dep_, token.head)
  • Named entity recognition (doc.ents, token.ent_type, token.ent_type_, token.ent_iob, token.ent_iob_)
  • Sentence segmentation (doc.sents)
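
For instance, sentence boundaries and morphological features can be read through the standard spaCy attributes. A minimal sketch (the load call itself is covered in the Usage section below):

import spacy_trankit

# Load the English pipeline and process a short text
nlp = spacy_trankit.load("en")
doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")

# Sentence segmentation via doc.sents
for sent in doc.sents:
    print(sent.text)

# Morphological features via token.morph
for token in doc:
    print(token.text, token.morph)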

⌛️ Installation

As of v0.1.0, spacy-trankit is only compatible with spaCy v3.x. To install the most recent version from GitHub:

pip install git+https://github.com/imvladikon/spacy-trankit

or from PyPI:

pip install spacy-trankit

📖 Usage & Examples

Load a pre-trained Trankit model into a spaCy pipeline:

import spacy_trankit

# Initialize the pipeline
nlp = spacy_trankit.load("en")

doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.dep_, token.ent_type_)
print(doc.ents)
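
Because the wrapper produces regular spaCy Doc objects, spaCy's own tooling should work on them as well. A minimal sketch, assuming a standard Doc, that renders the dependency parse with displacy:

from spacy import displacy

# The doc created above is a regular spaCy Doc, so spaCy's built-in
# visualizer applies; render the dependency parse to a standalone HTML page.
html = displacy.render(doc, style="dep", page=True)
with open("parse.html", "w", encoding="utf-8") as f:
    f.write(html)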

Load it from a local path:

import spacy_trankit

# Initialize the pipeline from a local path
nlp = spacy_trankit.load_from_path(name="en", path="./cache")

doc = nlp("Barack Obama was born in Hawaii. He was elected president in 2008.")
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.dep_, token.ent_type_)
print(doc.ents)
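
For processing many texts, the returned object should behave like a regular spaCy Language. A minimal sketch, assuming the wrapped pipeline supports nlp.pipe; if it does not, fall back to calling nlp on each text:

texts = [
    "Barack Obama was born in Hawaii.",
    "He was elected president in 2008.",
]

# Stream Docs one by one; assumes nlp.pipe is supported by the wrapped pipeline.
for doc in nlp.pipe(texts):
    print([(ent.text, ent.label_) for ent in doc.ents])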