# Multi-Task NLP pipeline with Hugging Face

Below is a short story about colonists on Mars. I will use the capabilities of the Hugging Face libraries to:
* Summarize the story
* Answer different questions about the story
* Analyze the sentiment of the last paragraph
* Generate an image based on a scene from the story

In [None]:
# The Text Corpus
story = """
The Mars Colony known as Ares VI faced an unusual crisis. Dr. Elara Vance,
the Chief Botanist, discovered that the nutrient solution sustaining the crucial 'Oxygen Algae' was being depleted too rapidly.
She initially suspected a sensor malfunction, a common issue in the harsh Martian environment, but diagnostic logs confirmed the problem was real.
The colony's main power core, a fusion reactor located in Sector 3, was diverting energy away from the life support systems to compensate for a minor, but persistent, leak in the cooling pipes.
The engineering team, led by Jax, had been trying to fix the leak for three Martian sols (about 3 Earth days) but considered it a low-priority maintenance task.
Elara immediately overrode their protocol, manually routing auxiliary power back to her algae vats, stabilizing the oxygen supply. She then sent a strongly worded message to Jax, demanding a permanent fix by the next solar cycle.
Her swift action averted a catastrophe, reminding everyone that life support always takes priority over structural integrity alarms.
"""

print(f"Story length: {len(story.split())} words")

Story length: 169 words


## Summarize a story

Since the text is short, I will use a `pipeline` object for its ease of use and speed. I chose the `t5-small` model because summarization is a text-to-text task, which T5 is designed for, and because a small model avoids unnecessary overhead. A GPU is not required for this task.

In [None]:
from transformers import pipeline

# Instantiate the pipeline object, specifying the task and the model name.
summarizer = pipeline("summarization", model="t5-small")

# Run the pipeline on the story
summary_result = summarizer(story, max_length=50, min_length=10, do_sample=False)

# Print the result. The output is a list of dictionaries.
print("\n--- Summary Result ---")
print(summary_result[0]['summary_text'])

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Device set to use cuda:0
Both `max_new_tokens` (=256) and `max_length`(=50) seem to have been set. `max_new_tokens` will take precedence. Please refer to the documentation for more information. (https://huggingface.co/docs/transformers/main/en/main_classes/text_generation)



--- Summary Result ---
the nutrient solution sustaining the crucial 'Oxygen Algae' was being depleted too rapidly . the colony's main power core, a fusion reactor, was diverting energy away from the life support systems to compensate for a minor leak in the cooling pipes .
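
The summary above comes back lowercased at the start of each sentence and with a space before each period. A minimal sketch of a post-processing step that tidies this up is shown below; `tidy_summary` is a hypothetical helper written for this notebook, not part of the `transformers` API, and it assumes sentences end in `.`, `!`, or `?`.

```python
import re

def tidy_summary(text: str) -> str:
    """Remove spaces before punctuation and capitalize each sentence (hypothetical helper)."""
    # Drop any whitespace that precedes a punctuation mark.
    text = re.sub(r"\s+([.,!?;:])", r"\1", text.strip())
    # Split after sentence-ending punctuation, then capitalize the first letter of each part.
    parts = re.split(r"(?<=[.!?])\s+", text)
    return " ".join(p[:1].upper() + p[1:] for p in parts if p)

# Summary text copied from the recorded output above.
raw_summary = (
    "the nutrient solution sustaining the crucial 'Oxygen Algae' was being depleted too rapidly . "
    "the colony's main power core, a fusion reactor, was diverting energy away from the life support "
    "systems to compensate for a minor leak in the cooling pipes ."
)

clean_summary = tidy_summary(raw_summary)
print(clean_summary)
```

The same function could be applied directly to `summary_result[0]['summary_text']`.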


## Use the model to answer questions about the story

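For question answering I will use a DistilBERT checkpoint fine-tuned on SQuAD (`distilbert-base-cased-distilled-squad`). DistilBERT offers a good balance of speed, size, and accuracy for extractive question answering, where the answer is a span copied from the context.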

In [None]:
context = story # Use the story text as the context


# Define your Question and Context
questions = [
    {"question": "What was causing the life support systems to lose power?",
      "context": context
    },
    {
      "question": "Who was the Chief Botanist that fixed the problem?",
      "context": context
    },
    {
      "question": "Who led the engineering team?",
      "context": context
    },
    {
      "question": "For how long the engineering team tried to fix the leak?",
      "context": context
    }
]

# Instantiate the pipeline object, specifying the task and the model name.
qa_pipeline = pipeline("question-answering",
                         model="distilbert/distilbert-base-cased-distilled-squad")

# Run the pipeline
results = qa_pipeline(questions)

print("\n--- Question Answering Result ---")

for input_data, result in zip(questions, results):
  # Print the question, the predicted answer, and the confidence score.
  print(f"Question: {input_data['question']}")
  print(f"Answer: {result['answer']}")
  print(f"Confidence Score: {result['score']:.4f}")
  print("--------")

Device set to use cpu



--- Question Answering Result ---
Question: What was causing the life support systems to lose power?
Answer: leak in the cooling pipes
Confidence Score: 0.2645
--------
Question: Who was the Chief Botanist that fixed the problem?
Answer: Dr. Elara Vance
Confidence Score: 0.8016
--------
Question: Who led the engineering team?
Answer: Jax
Confidence Score: 0.9645
--------
Question: For how long the engineering team tried to fix the leak?
Answer: three Martian sols (about 3 Earth days
Confidence Score: 0.4033
--------
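
The confidence scores above range from about 0.26 to 0.96, so it can help to flag answers that fall below a cutoff before relying on them. The following is a minimal sketch using the scores recorded above; the `flag_low_confidence` helper and the 0.5 threshold are illustrative assumptions, not part of the `transformers` API.

```python
# Answers and scores copied from the recorded output above.
recorded = [
    {"answer": "leak in the cooling pipes", "score": 0.2645},
    {"answer": "Dr. Elara Vance", "score": 0.8016},
    {"answer": "Jax", "score": 0.9645},
    {"answer": "three Martian sols (about 3 Earth days", "score": 0.4033},
]

def flag_low_confidence(results, threshold=0.5):
    """Return the results whose score is below the threshold (hypothetical helper)."""
    return [r for r in results if r["score"] < threshold]

low = flag_low_confidence(recorded)
for r in low:
    print(f"Low confidence ({r['score']:.4f}): {r['answer']}")
```

Because each entry returned by `qa_pipeline` also carries a `score` key, the same helper can be applied to the `results` list directly.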


## Analyze the sentiment of the last paragraph of the story

`distilbert-base-uncased-finetuned-sst-2-english` is suitable for the sentiment analysis task because:
* `distilbert`: a lightweight, distilled version of BERT, ideal for quick testing.
* `base-uncased`: the base-size model with lowercased input, good for general text where capitalization isn't a core indicator of meaning.
* `finetuned`: the model has been specialized for a task, not just pretrained on raw text.
* `sst-2`: fine-tuned on the Stanford Sentiment Treebank (SST-2), a benchmark dataset for positive/negative classification.


In [None]:
# Text to classify (the final resolution part of the story)
text_to_analyze = """
 Elara immediately overrode their protocol,
 manually routing auxiliary power back to her algae vats, stabilizing
 the oxygen supply. She then sent a strongly worded message to Jax,
 demanding a permanent fix by the next solar cycle.
 Her swift action averted a catastrophe.
"""

# Instantiate the pipeline object, specifying the task and the model name.
sentiment_classifier = pipeline('sentiment-analysis',
                                model="distilbert-base-uncased-finetuned-sst-2-english")

# Run the pipeline
sentiment_result = sentiment_classifier(text_to_analyze)
print(sentiment_result)

Device set to use cpu


[{'label': 'NEGATIVE', 'score': 0.9696677327156067}]
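
The passage above mixes a confrontation with a positive outcome, so a single label for the whole text can hide how the individual sentences would be scored. Below is a rough sketch of splitting the passage into sentences first; `split_sentences` is a hypothetical helper based on a simple regular expression, and it assumes the text contains no abbreviations such as "Dr." that would cause a false split.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Collapse whitespace, then split after sentence-ending punctuation (hypothetical helper)."""
    flat = " ".join(text.split())
    return [s for s in re.split(r"(?<=[.!?])\s+", flat) if s]

# Same passage as in the cell above.
passage = """
 Elara immediately overrode their protocol,
 manually routing auxiliary power back to her algae vats, stabilizing
 the oxygen supply. She then sent a strongly worded message to Jax,
 demanding a permanent fix by the next solar cycle.
 Her swift action averted a catastrophe.
"""

sentences = split_sentences(passage)
for s in sentences:
    print(s)
```

The resulting list can be passed to `sentiment_classifier(sentences)`, which returns one label and score per sentence.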


## Generate an image based on the story

I will use the Hugging Face `diffusers` library to generate an image from a visual prompt. The prompt is a phrase based on the story: "Multiple green algae vats in a futuristic laboratory."

What we will use:
* `StableDiffusionPipeline`: wraps the components needed for image generation (text encoder, U-Net, VAE, and scheduler) behind a single call, and handles loading the model onto a GPU or CPU.
* `CompVis/stable-diffusion-v1-4`: the v1-4 checkpoint, one of the most widely used early versions of Stable Diffusion. It was trained on a large-scale image-text dataset and can generate a wide variety of images from text prompts.

In [None]:
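from diffusers import StableDiffusionPipeline
import torch

model_id = "CompVis/stable-diffusion-v1-4"

# Use half precision only on a GPU; float16 is poorly supported on CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

# Note: the diffusers version that produced the output below ignored `dtype`;
# `torch_dtype` is the keyword it expects for this setting.
pipe = StableDiffusionPipeline.from_pretrained(
    model_id,
    torch_dtype=dtype
)
pipe = pipe.to(device)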


Keyword arguments {'dtype': torch.float16} are not expected by StableDiffusionPipeline and will be ignored.



Using the pipeline object, generate and save an image. You can find it in the project repository.

In [None]:
prompt = """Multiple green algae vats in a futuristic laboratory"""

# Generate one image from the prompt
with torch.autocast("cuda"):
  image = pipe(prompt).images[0]

# Save the image to the Google Colab working directory
image.save("Algae_vats.png")
print("Image saved as Algae_vats.png")

Image saved as Algae_vats.png
