In [41]:
from transformers import pipeline
import statistics

In [2]:
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

# check here for further docs https://huggingface.co/facebook/bart-large-mnli
# Model predeccesor papers: https://arxiv.org/abs/1909.00161 Benchmarking zero-shot text classification, Datasets, Evaluation and Entailment approach
# Models paper itself: https://arxiv.org/abs/1910.13461 BART: Denoising Sequence-to-sequence pre-training for natural language generation, translation and comprehension
# BART here is trained on the MNLI corpus of sentence label pairs: https://cims.nyu.edu/~sbowman/multinli/?ref=blog.paperspace.com

Downloading: 100%|██████████| 1.13k/1.13k [00:00<00:00, 128kB/s]
Downloading: 100%|██████████| 1.52G/1.52G [04:56<00:00, 5.49MB/s]
Downloading: 100%|██████████| 26.0/26.0 [00:00<00:00, 4.34kB/s]
Downloading: 100%|██████████| 878k/878k [00:00<00:00, 2.49MB/s]
Downloading: 100%|██████████| 446k/446k [00:00<00:00, 1.73MB/s]
Downloading: 100%|██████████| 1.29M/1.29M [00:00<00:00, 3.01MB/s]


In [11]:
labels = ["cooking", "working", "sleeping", "programming"]
hypothesis_template = 'This text is about {}.'
sequence = "When I program, the code is compiled down to assembly, and then down to binary."

In [12]:
prediction = classifier(sequence, labels, hypothesis_template=hypothesis_template, multi_class=True)

print(prediction)

The `multi_class` argument has been deprecated and renamed to `multi_label`. `multi_class` will be removed in a future version of Transformers.


{'sequence': 'When I program, the code is compiled down to assembly, and then down to binary.', 'labels': ['programming', 'working', 'sleeping', 'cooking'], 'scores': [0.9569053053855896, 0.7005553841590881, 0.0004397584416437894, 0.0004357791331131011]}


This model works pretty well, and we can certainly use it in the project. It will be helpful to know the specificity of the labels, however...

In [13]:
labels = [
    "programming",
    "python",
    "java",
    "high performance teams",
    "stock market",
    "sleeping",
    "eating"
]

In [14]:
hypothesis_template = "This text is about {}."

In [21]:
sequence = "A team needs to be properly looked after, and everyone's input needs to be taken into account to work effectively."

In [22]:
prediction = classifier(sequence, labels, hypothesis_template=hypothesis_template, multi_class=True)

# returns dict
# sequence str- the sentence
# labels list[str] list of labels sorted by likelihood
# scores list[float] list of probabilities

The `multi_class` argument has been deprecated and renamed to `multi_label`. `multi_class` will be removed in a future version of Transformers.


In [23]:
labels = prediction['labels']
scores = prediction['scores']

for label, score in zip(labels, scores):
    print(f"Label: {label}, {score}")

Label: high performance teams, 0.2316187620162964
Label: programming, 0.17947016656398773
Label: java, 0.050939228385686874
Label: python, 0.025592995807528496
Label: stock market, 0.01506301760673523
Label: eating, 0.00031525595113635063
Label: sleeping, 0.00016555377806071192


In [32]:
article = "Amazon Web Services (AWS) Lambda is a usage-based\ncomputing infrastructure service that can execute\nPython 3 code. One of the challenges of this\nenvironment is ensuring efficient performance of your Lambda Functions.\nApplication performance monitoring (APM) is particularly useful in these\nsituations because you are billed based on how long you use the\nresources.\nIn this post we will install and configure\nSentry's APM that works via a\nLambda layer.\nNote that if you are looking for error monitoring rather than performance\nmonitoring, take a look at\nHow to Monitor Python Functions on AWS Lambda with Sentry\nrather than following this post.\nFirst steps with AWS Lambda\nA local development environment is not\nrequired to follow this tutorial because all of the coding and configuration\ncan happen in a web browser through the\nAWS Console.\nSign into your existing AWS account\nor sign up for a new account. Lambda\ngives you the first 1 million requests for free so that you can execute\nbasic applications without no or low cost.\n\nWhen you log into your account, use the search box to enter\n\"lambda\" and select \"Lambda\" when it appears to get to the right\npage.\n\nIf you have already used Lambda before, you will see your existing Lambda\nfunctions in a searchable table. We're going to create a new function so\nclick the \"Create function\" button."

In [33]:
text_without_newlines = article.replace("\n", "")
prepped_article = text_without_newlines.split(".")

In [43]:
labeldict = {
    "programming": [],
    "python": [],
    "java":[] ,
    "high performance teams":[] ,
    "stock market":[] ,
    "sleeping": [],
    "eating": []
}

for sentence in prepped_article:
    try:
        prediction = classifier(sentence, labels, hypothesis_template=hypothesis_template, multi_label=True)
        for label, score in zip(prediction['labels'], prediction['scores']):
            if label in labeldict:
                labeldict[label].append(score)
    except ValueError:
        continue

In [44]:
aggregated_scores = {}

for label, scores in labeldict.items():
    total_score = statistics.mean(scores)
    aggregated_scores[label] = total_score

print(aggregated_scores)

{'programming': 0.6534916612912308, 'python': 0.23560783724215897, 'java': 0.019242797880327667, 'high performance teams': 0.058129258919507265, 'stock market': 0.0037784543078900738, 'sleeping': 0.0006328487771415067, 'eating': 0.0005542129345650954}
