In [1]:
!pip install txtai[all] > /dev/null

The Entity pipeline applies a token classifier to text and extracts entity/label combinations.

In [2]:
from txtai.pipeline import Entity

# Create and run pipeline
entity = Entity()
entity("Canada's last fully intact ice shelf has suddenly collapsed, "
       "forming a Manhattan-sized iceberg")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision f2482bf (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.


[('Canada', 'LOC', 0.999609649181366),
 ('Manhattan', 'MISC', 0.6513978242874146)]
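
The result is a list of (text, label, score) tuples, so it can be post-processed with ordinary Python. The sketch below groups the entities shown above by label and optionally drops low-confidence hits. The helper `group_entities` and its `minscore` parameter are illustrative names for this example only, not part of txtai.

```python
from collections import defaultdict

# Output copied from the Entity cell above: (text, label, score) tuples
results = [("Canada", "LOC", 0.999609649181366),
           ("Manhattan", "MISC", 0.6513978242874146)]

def group_entities(entities, minscore=0.0):
    """Group entity text by label, keeping only scores >= minscore (hypothetical helper)."""
    grouped = defaultdict(list)
    for text, label, score in entities:
        if score >= minscore:
            grouped[label].append(text)
    return dict(grouped)

print(group_entities(results))                # {'LOC': ['Canada'], 'MISC': ['Manhattan']}
print(group_entities(results, minscore=0.9))  # {'LOC': ['Canada']}
```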

The Extractor pipeline combines a similarity instance (embeddings or a similarity pipeline), which builds a question context, with a model that answers questions. The model can be a prompt-driven large language model (LLM), an extractive question-answering model or a custom pipeline.

In [None]:
from txtai.embeddings import Embeddings
from txtai.pipeline import Extractor

# Embeddings model ranks candidates before passing to QA pipeline
embeddings = Embeddings({"path": "sentence-transformers/nli-mpnet-base-v2"})

In [None]:
# Create pipeline
extractor = Extractor(embeddings,
                      "distilbert-base-cased-distilled-squad")

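To make the "build a question context" step concrete, here is a minimal sketch in plain Python. It is not txtai's implementation: a toy word-overlap score stands in for the embeddings model, and the function names (`tokens`, `overlap`, `build_context`) and the `topn` parameter are assumptions made for this illustration. The idea is only that candidate texts are ranked against the query and the best ones are joined into the context handed to the answering model.

```python
import re

def tokens(text):
    """Lowercase word set; a crude stand-in for an embeddings vector."""
    return set(re.findall(r"\w+", text.lower()))

def overlap(query, text):
    """Jaccard overlap between word sets (toy similarity, not what txtai uses)."""
    a, b = tokens(query), tokens(text)
    return len(a & b) / len(a | b) if a | b else 0.0

def build_context(query, texts, topn=2):
    """Rank texts against the query and join the best topn into one context string."""
    ranked = sorted(texts, key=lambda t: overlap(query, t), reverse=True)
    return " ".join(ranked[:topn])

data = ["Giants 5 Cardinals 4 final in extra innings",
        "Flyers 4 Lightning 1 final. 45 saves for the Lightning.",
        "UCF 38 Temple 13"]

context = build_context("Giants Cardinals final score", data, topn=1)
print(context)  # Giants 5 Cardinals 4 final in extra innings
```
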
The Generator pipeline takes an input prompt and generates follow-on text.

In [None]:
from txtai.pipeline import Generator

# Create and run pipeline
generator = Generator()
generator("Hello, who are you?")

The Labels pipeline uses a text classification model to apply labels to input text. This pipeline can classify text using either a zero shot model (dynamic labeling) or a standard text classification model (fixed labeling).


In [11]:
%%capture

from txtai.pipeline import Labels

# Create labels model
labels = Labels()


No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


In [10]:
data = ["Dodgers lose again, give up 3 HRs in a loss to the Giants",
        "Giants 5 Cardinals 4 final in extra innings",
        "Dodgers drop Game 2 against the Giants, 5-4",
        "Flyers 4 Lightning 1 final. 45 saves for the Lightning.",
        "Slashing, penalty, 2 minute power play coming up",
        "What a stick save!",
        "Leads the NFL in sacks with 9.5",
        "UCF 38 Temple 13",
        "With the 30 yard completion, down to the 10 yard line",
        "Drains the 3pt shot!!, 0:15 remaining in the game",
        "Intercepted! Drives down the court and shoots for the win",
        "Massive dunk!!! they are now up by 15 with 2 minutes to go"]

# List of labels
tags = ["Baseball", "Football", "Hockey", "Basketball"]

In [12]:
labels(data[0], tags)

[(0, 0.7450798153877258),
 (1, 0.24277229607105255),
 (3, 0.006451273337006569),
 (2, 0.005696582607924938)]
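
Each result pairs an index into `tags` with a score, best match first. A small sketch of turning that output into readable label names follows; the values are copied from the output above and the variable name `named` is just for illustration.

```python
tags = ["Baseball", "Football", "Hockey", "Basketball"]

# Output copied from the cell above: (tag index, score), best match first
result = [(0, 0.7450798153877258),
          (1, 0.24277229607105255),
          (3, 0.006451273337006569),
          (2, 0.005696582607924938)]

# Replace each index with its tag name and round the score for display
named = [(tags[index], round(score, 4)) for index, score in result]

for name, score in named:
    print(f"{name:<12}{score}")
```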

In [13]:
for text in data:
    print("%-75s %s" % (text, tags[labels(text, tags)[0][0]]))

Dodgers lose again, give up 3 HRs in a loss to the Giants                   Baseball
Giants 5 Cardinals 4 final in extra innings                                 Baseball
Dodgers drop Game 2 against the Giants, 5-4                                 Baseball
Flyers 4 Lightning 1 final. 45 saves for the Lightning.                     Hockey
Slashing, penalty, 2 minute power play coming up                            Hockey
What a stick save!                                                          Hockey
Leads the NFL in sacks with 9.5                                             Football
UCF 38 Temple 13                                                            Football
With the 30 yard completion, down to the 10 yard line                       Football
Drains the 3pt shot!!, 0:15 remaining in the game                           Basketball
Intercepted! Drives down the court and shoots for the win                   Basketball
Massive dunk!!! they are now up by 15 with 2 minutes to go         

The Sequences pipeline runs text through a sequence-to-sequence model and generates output text.

In [16]:
from txtai.pipeline import Sequences

# Create and run pipeline
sequences = Sequences()
sequences("Hello, how are you?", 
          "translate English to German: ")

No model was supplied, defaulted to t5-base and revision 686f1db (https://huggingface.co/t5-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


'Hallo, wie sind Sie?'

The Similarity pipeline computes the similarity between a query and a list of texts using a text classifier.

This pipeline supports both standard text classification models and zero-shot classification models. The pipeline uses the queries as labels for the input text. The results are then transposed so that scores are grouped per query/label rather than per input text.
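
The transposition can be sketched in plain Python. This is an illustration of the idea only, not txtai's code: the scores are made-up numbers and `transpose_scores` is a hypothetical helper. The classifier is assumed to return one row of label scores per input text; transposing gives one ranked list of (text index, score) pairs per query.

```python
def transpose_scores(per_text, nqueries):
    """
    per_text[t][q] is the score of query q (used as a label) for text t.
    Returns, for each query, a list of (text index, score) sorted best first.
    """
    results = []
    for q in range(nqueries):
        scores = [(t, row[q]) for t, row in enumerate(per_text)]
        results.append(sorted(scores, key=lambda x: x[1], reverse=True))
    return results

# Made-up scores for illustration: 3 texts (rows) x 2 queries (columns)
per_text = [[0.9, 0.1],
            [0.2, 0.7],
            [0.5, 0.4]]

rankings = transpose_scores(per_text, 2)
print(rankings[0])  # [(0, 0.9), (2, 0.5), (1, 0.2)]
print(rankings[1])  # [(1, 0.7), (2, 0.4), (0, 0.1)]
```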

In [17]:
from txtai.pipeline import Similarity

# Create and run pipeline
similarity = Similarity()

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.


[(0, 0.6845343708992004), (1, 0.002550412667915225)]

In [19]:
similarity("Interesting Idea", [
    "There is number of benefits in making life interesting", 
    "Life on Mars will be more interesting and filled with challenges"
])

[(1, 0.9839457869529724), (0, 0.8746001124382019)]

The Summary pipeline summarizes text. This pipeline runs a text2text model that abstractively creates a summary of the input text.

In [None]:
from txtai.pipeline import Summary

# Create and run pipeline
summary = Summary()

In [21]:
summary("""There is number of benefits in making life interesting. 
    Life on Mars will be more interesting and filled with challenges""")

'There is number of benefits in making life interesting. \n    Life on Mars will be more interesting and filled with challenges'

The Translation pipeline translates text between languages. It supports more than 100 languages. Automatic source language detection is built in. This pipeline detects the language of each input text row, loads a model for the source-target combination and translates text to the target language.

In [None]:
from txtai.pipeline import Translation

# Create and run pipeline
translate = Translation()
translate("This is a test translation into Spanish", "es")
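
The detect, load and translate flow described above can be sketched without any models. Everything below is a stand-in written for this illustration: `detect` is a toy stopword check, `ModelCache` returns placeholder functions instead of real translation models, and skipping rows that are already in the target language is an assumption of this sketch. The point is only that one model is loaded per source-target pair and reused across rows.

```python
def detect(text):
    """Toy language detector (hypothetical): looks for a few Spanish stopwords."""
    spanish = {"el", "la", "es", "una", "esto"}
    words = set(text.lower().split())
    return "es" if words & spanish else "en"

class ModelCache:
    """Loads one placeholder 'model' per (source, target) pair and reuses it."""

    def __init__(self):
        self.models = {}
        self.loads = 0

    def get(self, source, target):
        key = (source, target)
        if key not in self.models:
            self.loads += 1
            # Placeholder: a real pipeline would load a translation model here
            self.models[key] = lambda text, k=key: f"[{k[0]}->{k[1]}] {text}"
        return self.models[key]

def translate_rows(rows, target, cache):
    """Detect each row's language, pass through rows already in target, translate the rest."""
    output = []
    for row in rows:
        source = detect(row)
        output.append(row if source == target else cache.get(source, target)(row))
    return output

cache = ModelCache()
rows = ["This is a test", "Esto es una prueba", "Another English row"]
translated = translate_rows(rows, "es", cache)
print(translated)
print(cache.loads)  # 1: both English rows share the same en->es placeholder
```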