In [2]:
from transformers import pipeline

## Sentiment Analysis

In [15]:
classifier = pipeline("sentiment-analysis")
classifier("Instagram wants to limit hashtag spam.")

No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


[{'label': 'NEGATIVE', 'score': 0.988932728767395}]

## Zero-Shot Classification

In [18]:
classifier = pipeline("zero-shot-classification", model = 'facebook/bart-large-mnli')
classifier(
    ["Inter Miami wins the MLS", "Match tonight betwee Chiefs vs. Patriots", "Michael Jordan plans to sell Charlotte Hornets"],
    candidate_labels=["soccer", "football", "basketball"]
    )

Device set to use cpu


[{'sequence': 'Inter Miami wins the MLS',
  'labels': ['soccer', 'football', 'basketball'],
  'scores': [0.9162040948867798, 0.07244189083576202, 0.011354007758200169]},
 {'sequence': 'Match tonight betwee Chiefs vs. Patriots',
  'labels': ['football', 'basketball', 'soccer'],
  'scores': [0.9281435608863831, 0.0391676239669323, 0.032688744366168976]},
 {'sequence': 'Michael Jordan plans to sell Charlotte Hornets',
  'labels': ['basketball', 'football', 'soccer'],
  'scores': [0.9859175682067871, 0.009983371943235397, 0.004099058918654919]}]
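
Each result above lists its labels in descending score order, so the first label is the top prediction. A minimal sketch of extracting that prediction with a confidence floor, assuming results shaped like the output above. `top_label` and `min_score` are hypothetical names introduced here, and the scores are rounded copies of the values above:

```python
def top_label(result, min_score=0.5):
    """Return (label, score) for the best label, or None if it scores below min_score."""
    # Pick the highest-scoring pair explicitly instead of relying on list order.
    label, score = max(zip(result["labels"], result["scores"]), key=lambda pair: pair[1])
    return (label, score) if score >= min_score else None


# Rounded copies of two results from the zero-shot output above.
results = [
    {"sequence": "Inter Miami wins the MLS",
     "labels": ["soccer", "football", "basketball"],
     "scores": [0.9162, 0.0724, 0.0114]},
    {"sequence": "Michael Jordan plans to sell Charlotte Hornets",
     "labels": ["basketball", "football", "soccer"],
     "scores": [0.9859, 0.0100, 0.0041]},
]

for r in results:
    print(r["sequence"], "->", top_label(r))
```

Returning `None` below the threshold makes it easy to flag sequences where none of the candidate labels fits well.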

## Text Generation

In [21]:
generator = pipeline("text-generation", temperature=0.8)
generator("Once upon a time, in a land where the King Pineapple was")

No model was supplied, defaulted to openai-community/gpt2 and revision 607a30d (https://huggingface.co/openai-community/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': "Once upon a time, in a land where the King Pineapple was a common crop, the Queen of the North had lived in a small village. The Queen had always lived in a small village, and her daughter, who was also the daughter of the Queen, had lived in a larger village. The royal family would come to the Queen's village, and then the Queen would return to her castle and live there with her daughters. In the middle of the night, she would lay down on the royal bed and kiss the princess at least once, and then she would return to her castle to live there with her men. In the daytime, however, the Queen would be gone forever, and her mother would be alone. The reason for this disappearance, in the form of the Great Northern Passage and the Great Northern Passage, was the royal family had always wanted to take the place of the Queen. In the end, they took the place of the Queen, and went with their daughter to meet the King. At that time, the King was the only person on the isla

## Named Entity Recognition

In [24]:
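# aggregation_strategy="simple" is the current spelling of the deprecated grouped_entities=True
ner = pipeline("ner", aggregation_strategy="simple")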
ner("The man landed on the moon in 1969. Neil Armstrong was the first man to step on the Moon's surface. He was a NASA Astronaut.")

No model was supplied, defaulted to dbmdz/bert-large-cased-finetuned-conll03-english and revision 4c53496 (https://huggingface.co/dbmdz/bert-large-cased-finetuned-conll03-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Device set to use cpu


[{'entity_group': 'PER',
  'score': np.float32(0.99960065),
  'word': 'Neil Armstrong',
  'start': 36,
  'end': 50},
 {'entity_group': 'LOC',
  'score': np.float32(0.82190216),
  'word': 'Moon',
  'start': 84,
  'end': 88},
 {'entity_group': 'ORG',
  'score': np.float32(0.9842771),
  'word': 'NASA',
  'start': 109,
  'end': 113},
 {'entity_group': 'MISC',
  'score': np.float32(0.8394754),
  'word': 'As',
  'start': 114,
  'end': 116}]
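
The `start` and `end` values are character offsets into the input string, so each entity can be recovered by slicing. The `MISC` hit `'As'` looks like a fragment of "Astronaut". A minimal sketch, assuming the offsets above, that slices each span and widens it to the surrounding word; `expand_to_word` is a hypothetical helper, not part of `transformers`:

```python
text = ("The man landed on the moon in 1969. Neil Armstrong was the first man "
        "to step on the Moon's surface. He was a NASA Astronaut.")


def expand_to_word(text, start, end):
    """Widen [start, end) until both sides touch a non-alphanumeric character."""
    while start > 0 and text[start - 1].isalnum():
        start -= 1
    while end < len(text) and text[end].isalnum():
        end += 1
    return start, end


# (start, end) offsets copied from the NER output above.
spans = {"PER": (36, 50), "LOC": (84, 88), "ORG": (109, 113), "MISC": (114, 116)}

for group, (start, end) in spans.items():
    raw = text[start:end]
    s, e = expand_to_word(text, start, end)
    print(f"{group}: {raw!r} -> {text[s:e]!r}")
```

Only the `MISC` span changes here (`'As'` becomes `'Astronaut'`); the other three already sit on word boundaries.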

## Summarization

In [27]:
summarizer = pipeline("summarization")
summarizer("""
In deep learning, the transformer is an artificial neural network architecture based on the multi-head attention mechanism, in which text is converted to numerical representations called tokens, and each token is converted into a vector via lookup from a word embedding table.[1] At each layer, each token is then contextualized within the scope of the context window with other (unmasked) tokens via a parallel multi-head attention mechanism, allowing the signal for key tokens to be amplified and less important tokens to be diminished.

Transformers have the advantage of having no recurrent units, therefore requiring less training time than earlier recurrent neural architectures (RNNs) such as long short-term memory (LSTM).[2] Later variations have been widely adopted for training large language models (LLMs) on large (language) datasets.[3]
""")

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu


[{'summary_text': ' In deep learning, the transformer is an artificial neural network architecture based on the multi-head attention mechanism . Transformerers have the advantage of having no recurrent units, therefore requiring less training time than earlier recurrent neural architectures (RNNs) such as long short-term memory (LSTM)'}]

## Resume Sentiment and Tone

In [46]:
text = """
Data Scientist with over 10 years of experience in data analysis, applying machine learning and advanced analytics to solve complex business challenges. Specialized in machine learning models, optimizating processes, and solving business problems with data, aligning model development with business outcomes. Proven ability to lead end-to-end projects, from data collection and wrangling to production-ready deployment, using cloud platforms, modern ML frameworks, and containerized pipelines. Expertise in predictive analytics, inferential analytics, and statistical modeling. Experienced in SQL, Python, Databricks.
"""

In [47]:
from transformers import AutoTokenizer, pipeline

# 1. Load your desired tokenizer
model_checkpoint = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

# 2. Tokenize the full text without special tokens, padding, or truncation.
# The result is a 1-D tensor of token IDs that can be sliced into chunks.
tokens = tokenizer(text, add_special_tokens=False, return_tensors="pt")["input_ids"][0]

# 3. Define the chunk size: safely under BERT's 512-token limit, with room for [CLS] and [SEP]
chunk_size = 508

# 4. Create chunks
chunks = [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

# 5. Add special tokens to each chunk, then decode it back to a string for the pipeline
decoded_chunks = []
for chunk in chunks:
    # prepare_for_model wraps the token IDs in [CLS] ... [SEP]
    final_input = tokenizer.prepare_for_model(chunk.tolist(), add_special_tokens=True)
    # decode turns the IDs back into text; the pipeline re-tokenizes this string itself
    decoded_chunks.append(tokenizer.decode(final_input['input_ids']))


# Initialize sentiment analysis pipeline
sentiment_pipeline = pipeline("sentiment-analysis")

You're using a BertTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
No model was supplied, defaulted to distilbert/distilbert-base-uncased-finetuned-sst-2-english and revision 714eb0f (https://huggingface.co/distilbert/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
Device set to use cpu
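
The chunking above uses fixed, non-overlapping windows, so a sentence that straddles a boundary is split between two chunks. A common variation is to let consecutive windows overlap. The sketch below shows that idea on a plain list of token IDs; `chunk_with_overlap` and its `overlap` parameter are hypothetical and not part of the code above:

```python
def chunk_with_overlap(token_ids, chunk_size, overlap=0):
    """Split token_ids into windows of chunk_size that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + chunk_size])
        # Stop once a window reaches the end, to avoid a trailing duplicate.
        if start + chunk_size >= len(token_ids):
            break
    return chunks


ids = list(range(10))
print(chunk_with_overlap(ids, chunk_size=4, overlap=1))
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```

With `overlap=0` this reduces to the same slicing as the list comprehension in step 4.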


In [48]:
# Sentiment analysis. The resume fits in a single chunk, so [0] covers the whole text.
sentiment = sentiment_pipeline(decoded_chunks)[0]
print(f"Sentiment: {sentiment['label']}")
print(f"Confidence: {100*sentiment['score']:.2f}%")

Sentiment: POSITIVE
Confidence: 99.88%
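
For a longer document that splits into several chunks, the per-chunk results would need to be combined. One possible approach is sketched below, assuming a two-label model whose `score` is the probability of the predicted label; `aggregate_sentiment`, the weighting scheme, and the example values are hypothetical, not output from the pipeline above:

```python
def aggregate_sentiment(results, weights=None):
    """Combine per-chunk POSITIVE/NEGATIVE results into one weighted verdict."""
    if not results:
        raise ValueError("results must not be empty")
    if weights is None:
        weights = [1.0] * len(results)

    # Convert every result to P(POSITIVE); for a NEGATIVE label that is 1 - score.
    p_positive = [
        r["score"] if r["label"] == "POSITIVE" else 1.0 - r["score"]
        for r in results
    ]
    mean = sum(w * p for w, p in zip(weights, p_positive)) / sum(weights)

    label = "POSITIVE" if mean >= 0.5 else "NEGATIVE"
    return {"label": label, "score": mean if label == "POSITIVE" else 1.0 - mean}


# Made-up per-chunk results, weighted by (made-up) chunk lengths in tokens.
chunk_results = [
    {"label": "POSITIVE", "score": 0.9},
    {"label": "NEGATIVE", "score": 0.7},
]
overall = aggregate_sentiment(chunk_results, weights=[300, 100])
print(overall["label"], round(overall["score"], 2))
```

Weighting by chunk length keeps a short trailing chunk from counting as much as a full one.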


In [51]:
# Categorize tone
tone_pipeline = pipeline("zero-shot-classification", model = 'facebook/bart-large-mnli',
                        candidate_labels=["Senior", "Junior", "Trainee", "Blue-collar", "White-collar", "Self-employed"])
tone = tone_pipeline(decoded_chunks)[0]

print(f"Tone: {tone['labels']}")
# Scores are fractions in [0, 1], listed in the same order as the labels
print(f"Confidence: {tone['scores']}")

Device set to use cpu


Tone: ['Senior', 'Blue-collar', 'White-collar', 'Self-employed', 'Junior', 'Trainee']
Confidence: [0.4091757535934448, 0.23886899650096893, 0.20030029118061066, 0.06276989728212357, 0.05929991230368614, 0.029585164040327072]
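
The scores come back as a plain Python list, and multiplying a list by an integer repeats it rather than scaling each element, so percentages need an element-wise conversion. A minimal sketch that pairs each label with its score as a percentage, using the first three values from the output above; `format_scores` is a hypothetical helper:

```python
def format_scores(labels, scores):
    """Pair each label with its score, formatted as a percentage."""
    return [f"{label}: {100 * score:.2f}%" for label, score in zip(labels, scores)]


# First three label/score pairs copied from the output above.
labels = ["Senior", "Blue-collar", "White-collar"]
scores = [0.4091757535934448, 0.23886899650096893, 0.20030029118061066]

for line in format_scores(labels, scores):
    print(line)
# Senior: 40.92%
# Blue-collar: 23.89%
# White-collar: 20.03%
```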


## Image Classification

In [55]:
image_classifier = pipeline(
    task="image-classification", model="google/vit-base-patch16-224"
)
result = image_classifier(
    "https://images.unsplash.com/photo-1689009480504-6420452a7e8e?q=80&w=687&auto=format&fit=crop&ixlib=rb-4.1.0&ixid=M3wxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8fA%3D%3D"
)
print(result)

Fast image processor class <class 'transformers.models.vit.image_processing_vit_fast.ViTImageProcessorFast'> is available for this model. Using slow image processor class. To use the fast image processor class set `use_fast=True`.
Device set to use cpu


[{'label': 'Yorkshire terrier', 'score': 0.9792122840881348}, {'label': 'Australian terrier', 'score': 0.00648861238732934}, {'label': 'silky terrier, Sydney silky', 'score': 0.00571345305070281}, {'label': 'Norfolk terrier', 'score': 0.0013639888493344188}, {'label': 'Norwich terrier', 'score': 0.0010306559270247817}]
