In [1]:
from transformers import (
    TokenClassificationPipeline,
    AutoModelForTokenClassification,
    AutoTokenizer,
)
from transformers.pipelines import AggregationStrategy
import numpy as np

In [2]:
text1 = """
Mr. and Mrs. Dursley, of number four, Privet Drive, were proud to say that they were perfectly normal, thank you very much. They were the last people you’d expect to be involved in anything strange or mysterious, because they just didn’t hold with such nonsense.

Mr. Dursley was the director of a firm called Grunnings, which made drills. He was a big, beefy man with hardly any neck, although he did have a very large mustache. Mrs. Dursley was thin and blonde and had nearly twice the usual amount of neck, which came in very useful as she spent so much of her time craning over garden fences, spying on the neighbors. The Dursleys had a small son called Dudley and in their opinion there was no finer boy anywhere.

The Dursleys had everything they wanted, but they also had a secret, and their greatest fear was that somebody would discover it. They didn’t think they could bear it if anyone found out about the Potters. Mrs. Potter was Mrs. Dursley’s sister, but they hadn’t met for several years; in fact, Mrs. Dursley pretended she didn’t have a sister, because her sister and her good-for-nothing husband were as unDursleyish as it was possible to be. The Dursleys shuddered to think what the neighbors would say if the Potters arrived in the street. The Dursleys knew that the Potters had a small son, too, but they had never even seen him. This boy was another good reason for keeping the Potters away; they didn’t want Dudley mixing with a child like that.
"""
print(text1)


In [3]:
text2 = """
Once, there was a boy who became bored when he watched over the village sheep grazing on the hillside. To entertain himself, he sang out, “Wolf! Wolf! The wolf is chasing the sheep!”

When the villagers heard the cry, they came running up the hill to drive the wolf away. But, when they arrived, they saw no wolf. The boy was amused when seeing their angry faces.

“Don’t scream wolf, boy,” warned the villagers, “when there is no wolf!” They angrily went back down the hill.

Later, the shepherd boy cried out once again, “Wolf! Wolf! The wolf is chasing the sheep!” To his amusement, he looked on as the villagers came running up the hill to scare the wolf away.

As they saw there was no wolf, they said strictly, “Save your frightened cry for when there really is a wolf! Don’t cry ‘wolf’ when there is no wolf!” But the boy grinned at their words while they walked grumbling down the hill once more.

Later, the boy saw a real wolf sneaking around his flock. Alarmed, he jumped on his feet and cried out as loud as he could, “Wolf! Wolf!” But the villagers thought he was fooling them again, and so they didn’t come to help.

At sunset, the villagers went looking for the boy who hadn’t returned with their sheep. When they went up the hill, they found him weeping.

“There really was a wolf here! The flock is gone! I cried out, ‘Wolf!’ but you didn’t come,” he wailed.

An old man went to comfort the boy. As he put his arm around him, he said, “Nobody believes a liar, even when he is telling the truth!”
"""
print(text2)


## ml6team/keyphrase-extraction-kbir-kpcrowd

In [4]:
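# Define keyphrase extraction pipeline
class KeyphraseExtractionPipeline(TokenClassificationPipeline):
    """Token-classification pipeline that returns the unique keyphrases found in a text."""

    def __init__(self, model, *args, **kwargs):
        # Load the model and its matching tokenizer from the same checkpoint name
        super().__init__(
            model=AutoModelForTokenClassification.from_pretrained(model),
            tokenizer=AutoTokenizer.from_pretrained(model),
            *args,
            **kwargs
        )

    def postprocess(self, model_outputs):
        # SIMPLE merges adjacent tokens that share a label into one span
        results = super().postprocess(
            model_outputs=model_outputs,
            aggregation_strategy=AggregationStrategy.SIMPLE,
        )
        # Deduplicate and sort the extracted spans
        return np.unique([result.get("word").strip() for result in results])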

In [5]:
# Load pipeline
model_name = "ml6team/keyphrase-extraction-kbir-kpcrowd"
extractor = KeyphraseExtractionPipeline(model=model_name)

In [6]:
# Inference
keyphrases = extractor(text1)
keyphrases

array(['Drive', 'Dudley', 'Dursley', 'Dursleys', 'Grunnings', 'Mrs.',
       'Potter', 'Potters', 'Privet', 'aning', 'beefy', 'big', 'blonde',
       'director', 'discover', 'drills', 'garden', 'good',
       'greatest fear', 'involved', 'keeping', 'mixing', 'mustache',
       'mysterious', 'neck', 'neighbors', 'nonsense', 'normal', 'opinion',
       'people', 'perfectly', 'proud', 'reason', 'say', 'seen',
       'shuddered', 'small', 'spent', 'spying', 'strange', 'unDursleyish',
       'useful', 'usual'], dtype='<U13')
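
The list above contains `'aning'`, which looks like a subword fragment (probably of "craning") rather than a word from the passage. Below is a minimal sketch of one way to drop such fragments, assuming it is acceptable to keep only phrases that occur in the source text on word boundaries. `keep_whole_word_phrases` is a hypothetical helper, not part of `transformers`.

```python
import re


def keep_whole_word_phrases(phrases, text):
    """Keep only phrases that occur in `text` delimited by non-word characters."""
    kept = []
    for phrase in phrases:
        # (?<!\w) and (?!\w) reject matches embedded inside a longer word
        pattern = r"(?<!\w)" + re.escape(phrase) + r"(?!\w)"
        if re.search(pattern, text):
            kept.append(phrase)
    return kept


sample = "she spent so much of her time craning over garden fences"
print(keep_whole_word_phrases(["aning", "garden", "time craning"], sample))
```

In the notebook this could be applied as `keep_whole_word_phrases(keyphrases, text1)`. It only hides fragments after the fact and does not change what the model predicts.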

In [7]:
# Inference
keyphrases = extractor(text2)
keyphrases

array(['Alarmed', 'Wolf', 'amusement', 'angry', 'bored', 'boy', 'chasing',
       'cried', 'entertain', 'found', 'frightened', 'grazing', 'hill',
       'hillside', 'jumped', 'liar', 'looking', 'old', 'running',
       'scream', 'sheep', 'shepherd boy', 'sneaking', 'telling', 'truth',
       'villagers', 'wolf'], dtype='<U12')

## ml6team/keyphrase-extraction-kbir-openkp

In [14]:
# Define keyphrase extraction pipeline
class KeyphraseExtractionPipeline(TokenClassificationPipeline):
    def __init__(self, model, *args, **kwargs):
        super().__init__(
            model=AutoModelForTokenClassification.from_pretrained(model),
            tokenizer=AutoTokenizer.from_pretrained(model),
            *args,
            **kwargs
        )

    def postprocess(self, model_outputs):
        results = super().postprocess(
            model_outputs=model_outputs,
            aggregation_strategy=AggregationStrategy.SIMPLE,
        )
        return np.unique([result.get("word").strip() for result in results])

In [15]:
# Load pipeline
model_name = "ml6team/keyphrase-extraction-kbir-openkp"
extractor = KeyphraseExtractionPipeline(model=model_name)

In [16]:
# Inference
keyphrases = extractor(text1)
keyphrases

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


array([], dtype=float64)
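
The `dtype=float64` in the empty result above comes from NumPy rather than from the model: `postprocess` passed an empty list to `np.unique`, and NumPy treats an empty list as `float64`. Below is a small sketch of this behaviour, with one possible way to keep a string dtype. The explicit `dtype=str` conversion is an assumption, not something the notebook does.

```python
import numpy as np

# No spans were extracted, so the list comprehension produced an empty list
empty = np.unique([])
print(empty.dtype)  # float64

# With at least one span, the result is a sorted, deduplicated string array
some = np.unique(["wolf", "boy", "wolf"])
print(some)

# Converting to a string array first keeps a string dtype even when empty
typed_empty = np.unique(np.array([], dtype=str))
print(typed_empty.dtype.kind)  # U
```

An empty array therefore means the pipeline returned no keyphrase spans for this text. It does not indicate a numeric result.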

In [17]:
# Inference
keyphrases = extractor(text2)
keyphrases

array([], dtype=float64)

## ml6team/keyphrase-extraction-distilbert-kptimes

In [25]:
# Define keyphrase extraction pipeline
class KeyphraseExtractionPipeline(TokenClassificationPipeline):
    def __init__(self, model, *args, **kwargs):
        super().__init__(
            model=AutoModelForTokenClassification.from_pretrained(model),
            tokenizer=AutoTokenizer.from_pretrained(model),
            *args,
            **kwargs
        )

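    def postprocess(self, model_outputs):
        # FIRST labels each word with the tag of its first token, so a word is never split across labels
        results = super().postprocess(
            model_outputs=model_outputs,
            aggregation_strategy=AggregationStrategy.FIRST,
        )
        # Deduplicate and sort the extracted spans
        return np.unique([result.get("word").strip() for result in results])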

In [26]:
# Load pipeline
model_name = "ml6team/keyphrase-extraction-distilbert-kptimes"
extractor = KeyphraseExtractionPipeline(model=model_name)

In [30]:
# Inference
keyphrases = extractor(text1)
keyphrases

array([], dtype=float64)

In [28]:
# Inference
keyphrases = extractor(text2)
keyphrases

array([], dtype=float64)

## ml6team/keyphrase-extraction-kbir-kptimes

In [31]:
# Define keyphrase extraction pipeline
class KeyphraseExtractionPipeline(TokenClassificationPipeline):
    def __init__(self, model, *args, **kwargs):
        super().__init__(
            model=AutoModelForTokenClassification.from_pretrained(model),
            tokenizer=AutoTokenizer.from_pretrained(model),
            *args,
            **kwargs
        )

    def postprocess(self, model_outputs):
        results = super().postprocess(
            model_outputs=model_outputs,
            aggregation_strategy=AggregationStrategy.SIMPLE,
        )
        return np.unique([result.get("word").strip() for result in results])

In [32]:
# Load pipeline
model_name = "ml6team/keyphrase-extraction-kbir-kptimes"
extractor = KeyphraseExtractionPipeline(model=model_name)

In [33]:
# Inference
keyphrases = extractor(text1)
keyphrases

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.


array([], dtype=float64)

In [34]:
# Inference
keyphrases = extractor(text2)
keyphrases

array([], dtype=float64)

## ml6team/keyphrase-extraction-distilbert-openkp

In [22]:
# Define keyphrase extraction pipeline
class KeyphraseExtractionPipeline(TokenClassificationPipeline):
    def __init__(self, model, *args, **kwargs):
        super().__init__(
            model=AutoModelForTokenClassification.from_pretrained(model),
            tokenizer=AutoTokenizer.from_pretrained(model),
            *args,
            **kwargs
        )

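    def postprocess(self, model_outputs):
        # FIRST labels each word with the tag of its first token, so a word is never split across labels
        results = super().postprocess(
            model_outputs=model_outputs,
            aggregation_strategy=AggregationStrategy.FIRST,
        )
        # Deduplicate and sort the extracted spans
        return np.unique([result.get("word").strip() for result in results])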

In [23]:
# Load pipeline
model_name = "ml6team/keyphrase-extraction-distilbert-openkp"
extractor = KeyphraseExtractionPipeline(model=model_name)

In [24]:
# Inference
keyphrases = extractor(text1)
keyphrases

array([], dtype=float64)

In [25]:
# Inference
keyphrases = extractor(text2)
keyphrases

array([], dtype=float64)
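
Each section above repeats the same load-and-run cells with a different checkpoint. Below is a minimal sketch of how the runs could be collected in one loop, assuming every extractor is a callable that maps a text to a sequence of keyphrases (as the `extractor` objects in this notebook do). `compare_extractors` and the two stand-in extractors are hypothetical and exist only so the sketch runs without downloading a model.

```python
from typing import Callable, Dict, List, Sequence


def compare_extractors(
    extractors: Dict[str, Callable[[str], Sequence[str]]],
    texts: Dict[str, str],
) -> Dict[str, Dict[str, List[str]]]:
    """Run every extractor on every text and collect the keyphrases by name."""
    results: Dict[str, Dict[str, List[str]]] = {}
    for model_name, extractor in extractors.items():
        results[model_name] = {
            text_name: [str(phrase) for phrase in extractor(text)]
            for text_name, text in texts.items()
        }
    return results


# Stand-in extractors (assumptions for illustration, not real models)
def capitalized_words(text: str) -> List[str]:
    return sorted({word for word in text.split() if word[:1].isupper()})


def no_keyphrases(text: str) -> List[str]:
    return []


report = compare_extractors(
    {"capitalized": capitalized_words, "empty": no_keyphrases},
    {"sample": "The boy cried Wolf"},
)
for model_name, per_text in report.items():
    for text_name, phrases in per_text.items():
        print(f"{model_name} / {text_name}: {len(phrases)} keyphrases")
```

With the real checkpoints, the dictionary values would be `KeyphraseExtractionPipeline(model=name)` instances. Because the class hard-codes its aggregation strategy, the DistilBERT checkpoints (which use `FIRST` here) would need their own class or a configurable strategy.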