### Beyond Sentiment Analysis: Other NLP Pipelines in Hugging Face Transformers

The Hugging Face transformers library provides a variety of pretrained models and pipelines that perform natural language processing (NLP) tasks beyond sentiment analysis. Below are examples of tasks you can perform with different pipelines, each wrapped in a small function that demonstrates the capability.



pip install wordcloud --trusted-host pypi.org --trusted-host files.pythonhosted.org transformers==4.9.2 torch==1.9.0

In [1]:
#1. Text Classification
#Function: Classify text into predefined categories. The default checkpoint is an SST-2
#sentiment model; pass any other text-classification checkpoint as model_name.
from transformers import pipeline

# Explicitly specify the model name; it is used as the function's default below
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"

def classify_text(text, model_name=MODEL_NAME):
    # Returns a list of {'label': ..., 'score': ...} dicts, one per input text
    classifier = pipeline("text-classification", model=model_name)
    results = classifier(text)
    return results

# Example usage
text = "This is a fantastic product!"
print(classify_text(text))


[{'label': 'POSITIVE', 'score': 0.9998834133148193}]
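`classify_text` builds a new pipeline on every call, which reloads the model weights each time, and the cells below follow the same pattern. A minimal sketch of caching one pipeline per (task, model) pair, assuming you are happy to keep those instances in memory. `make_cached_loader` and `get_pipeline` are hypothetical helper names, not part of transformers, and a stand-in factory is used so the sketch runs without downloading a model.

```python
from functools import lru_cache
from typing import Any, Callable


def make_cached_loader(factory: Callable[..., Any]) -> Callable[[str, str], Any]:
    """Wrap a pipeline factory so each (task, model_name) pair is built only once.

    `factory` is assumed to accept (task, model=...) like transformers.pipeline.
    """

    @lru_cache(maxsize=None)
    def load(task: str, model_name: str) -> Any:
        # Runs only on a cache miss; later calls return the same object
        return factory(task, model=model_name)

    return load


# Example usage with a stand-in factory (hypothetical); in the notebook
# you would pass transformers.pipeline instead.
factory_calls = []

def fake_factory(task, model):
    factory_calls.append((task, model))
    return f"<{task} pipeline for {model}>"

get_pipeline = make_cached_loader(fake_factory)
first = get_pipeline("text-classification", "distilbert-base-uncased-finetuned-sst-2-english")
second = get_pipeline("text-classification", "distilbert-base-uncased-finetuned-sst-2-english")
print(first is second, len(factory_calls))  # True 1
```

With `make_cached_loader(pipeline)`, each function body can call the loader instead of `pipeline(...)` directly, so repeated calls reuse the already-loaded model.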


In [11]:
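#2. Named Entity Recognition
#Function: Identify named entities such as people (PER), locations (LOC), and organizations (ORG).
from transformers import pipeline

def recognize_entities(text):
    model_name = "dbmdz/bert-large-cased-finetuned-conll03-english"
    # aggregation_strategy="simple" merges sub-word tokens into whole entities;
    # it replaces the deprecated grouped_entities=True flag
    ner_pipeline = pipeline("ner", model=model_name, aggregation_strategy="simple")
    results = ner_pipeline(text)
    return results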

# Example usage
text = "Ashi is born in Delhi."
print(recognize_entities(text))


Some weights of the model checkpoint at dbmdz/bert-large-cased-finetuned-conll03-english were not used when initializing BertForTokenClassification: ['bert.pooler.dense.bias', 'bert.pooler.dense.weight']
- This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


[{'entity_group': 'PER', 'score': 0.9957955, 'word': 'Ashi', 'start': 0, 'end': 4}, {'entity_group': 'LOC', 'score': 0.9983322, 'word': 'Delhi', 'start': 16, 'end': 21}]
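With grouped entities, each result dict carries `entity_group`, `score`, `word`, `start`, and `end`, as in the output above. A small post-processing sketch that keeps confident entities and collects their text by type. The 0.9 `min_score` cutoff is an arbitrary assumption, and `entities_by_type` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List


def entities_by_type(entities: List[dict], min_score: float = 0.9) -> Dict[str, List[str]]:
    """Group NER pipeline output by entity type, dropping low-confidence hits."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for ent in entities:
        if ent["score"] >= min_score:
            grouped[ent["entity_group"]].append(ent["word"])
    return dict(grouped)


# The grouped output shown above for "Ashi is born in Delhi."
ner_output = [
    {"entity_group": "PER", "score": 0.9957955, "word": "Ashi", "start": 0, "end": 4},
    {"entity_group": "LOC", "score": 0.9983322, "word": "Delhi", "start": 16, "end": 21},
]
print(entities_by_type(ner_output))  # {'PER': ['Ashi'], 'LOC': ['Delhi']}
```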


In [12]:
#3. Question Answering
#Function: Extract the span of a context passage that answers a question.
from transformers import pipeline

def answer_question(question, context):
    model_name = "distilbert-base-cased-distilled-squad"
    qa_pipeline = pipeline("question-answering", model=model_name)
    results = qa_pipeline(question=question, context=context)
    return results

# Example usage
context = "Hugging Face Inc. is a company based in New York City. Its headquarters are in DUMBO, therefore very close to the Manhattan Bridge."
question = "Where is Hugging Face Inc. based?"
print(answer_question(question, context))



{'score': 0.9752461910247803, 'start': 40, 'end': 53, 'answer': 'New York City'}
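The `start` and `end` fields are character offsets into `context`, so `context[start:end]` recovers the answer. A sketch that uses them to mark the answer in place. `highlight_answer` and the bracket markers are illustrative choices, not part of the pipeline.

```python
def highlight_answer(context: str, result: dict, left: str = "[", right: str = "]") -> str:
    """Wrap the answer span reported by a question-answering pipeline in markers."""
    start, end = result["start"], result["end"]
    # Offsets are character positions in the original context string
    if context[start:end] != result["answer"]:
        raise ValueError("answer does not match context[start:end]")
    return context[:start] + left + context[start:end] + right + context[end:]


context = (
    "Hugging Face Inc. is a company based in New York City. "
    "Its headquarters are in DUMBO, therefore very close to the Manhattan Bridge."
)
# Output shown above for "Where is Hugging Face Inc. based?"
qa_output = {"score": 0.9752461910247803, "start": 40, "end": 53, "answer": "New York City"}
print(highlight_answer(context, qa_output))
```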


In [17]:
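#4. Summarization
#Function: Condense a passage into a shorter summary; max_length and min_length are counted in tokens.
from transformers import pipeline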

def summarize_text(text, max_length=50, min_length=25):
    model_name = "facebook/bart-large-cnn"
    summarization_pipeline = pipeline("summarization", model=model_name)
    summary = summarization_pipeline(text, max_length=max_length, min_length=min_length, do_sample=False)
    return summary[0]['summary_text']

# Example usage
text = "A good human embodies kindness, empathy, and integrity. They act selflessly, helping others and showing compassion. Honesty and respect guide their interactions, fostering trust and positive relationships. A good human values diversity, promotes equality, and strives to make the world a better place through their actions and understanding."
print(summarize_text(text))


A good human embodies kindness, empathy, and integrity. They act selflessly, helping others and showing compassion. Honesty and respect guide their interactions, fostering trust and positive relationships.
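`facebook/bart-large-cnn` accepts at most 1024 input tokens, so a long document has to be split before it goes through `summarize_text`. A minimal sketch that chunks on whitespace-separated words. The 600-word default is an assumption intended to stay under the token limit (words and tokens are not the same thing), and `chunk_words` is a hypothetical helper.

```python
from typing import List


def chunk_words(text: str, max_words: int = 600) -> List[str]:
    """Split text into consecutive chunks of at most `max_words` words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


doc = "one two three four five six seven"
print(chunk_words(doc, max_words=3))  # ['one two three', 'four five six', 'seven']
```

Each chunk can then be summarized separately, for example `" ".join(summarize_text(c) for c in chunk_words(long_text))`, where `long_text` stands for your own document. Splitting on sentence boundaries instead would avoid cutting sentences in half.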


In [18]:
#5. Text Generation
#Function: Continue a prompt with a causal language model (GPT-2).
from transformers import pipeline

def generate_text(prompt, max_length=50):
    model_name = "gpt2"
    text_generator = pipeline("text-generation", model=model_name)
    generated_text = text_generator(prompt, max_length=max_length, num_return_sequences=1)
    return generated_text[0]['generated_text']

# Example usage
prompt = "A White Rabbit"
print(generate_text(prompt))


Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


A White Rabbit! The Jigsaw Witch The Journey ULTIMATE MATCH FINAL EDITION THE KING OF FIGHTERS 2002 UNLIMITED MATCH THE KING OF FIGHTERS XIII STEAM EDITION THE KING OF FIGHTERS XIV STEAM EDITION The King
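When sampling is enabled (as the `gpt2` checkpoint's configuration typically does for this pipeline), output like the above changes from run to run, and the returned string repeats the prompt. Calling `transformers.set_seed(...)` before generating makes sampling repeatable, and the text-generation pipeline also accepts `return_full_text=False` to drop the prompt. When you already have the full string, a plain-Python sketch (hypothetical helper `continuation_only`) can strip it:

```python
def continuation_only(prompt: str, generated_text: str) -> str:
    """Return only the text the model added after the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].lstrip()
    # Decoding can normalize whitespace; fall back to the full string
    return generated_text


full = "A White Rabbit! The Jigsaw Witch The Journey"
print(continuation_only("A White Rabbit", full))  # ! The Jigsaw Witch The Journey
```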


In [19]:
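#6. Translation
#Function: Translate text with the Helsinki-NLP Opus-MT model for the given language pair.
from transformers import pipeline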

def translate_text(text, source_lang="en", target_lang="fr"):
    model_name = f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"
    translation_pipeline = pipeline("translation", model=model_name)
    translation = translation_pipeline(text)
    return translation[0]['translation_text']

# Example usage
text = "A pretty cat"
print(translate_text(text, source_lang="en", target_lang="fr"))


Un joli chat
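The translation pipeline also accepts a list of strings and returns one `{'translation_text': ...}` dict per input, so several sentences can be translated in one call instead of one call each. A sketch of that pattern. `translate_batch` is a hypothetical helper, and the stand-in translator only exists so the example runs offline; in the notebook you would pass `pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")`.

```python
from typing import Callable, List, Sequence


def translate_batch(texts: Sequence[str], translator: Callable[[List[str]], List[dict]]) -> List[str]:
    """Translate several strings with one pipeline call and return plain strings."""
    if not texts:
        # Skip the pipeline call entirely for empty input
        return []
    results = translator(list(texts))
    return [r["translation_text"] for r in results]


# Stand-in translator (hypothetical) that mimics the pipeline's output shape
def fake_translator(batch):
    return [{"translation_text": s.upper()} for s in batch]

print(translate_batch(["a pretty cat", "a big dog"], fake_translator))
# ['A PRETTY CAT', 'A BIG DOG']
```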
