<a href="https://colab.research.google.com/github/suyash-srivastava-dev/A4I/blob/main/Colab/NLP_01.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# NLP Transformers

### Hugging Face pipeline

There are three main steps involved when you pass some text to a pipeline:

1. The text is preprocessed into a format the model can understand.
2. The preprocessed inputs are passed to the model.
3. The predictions of the model are post-processed, so you can make sense of them.
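The three steps above can be sketched with a toy stand-in. Everything here is hypothetical (the tiny vocabulary, the weights, and the function names `preprocess`, `model`, and `postprocess`); it only mirrors the shape of what a pipeline does and is not how `transformers` is implemented.

```python
import math

# Hypothetical toy vocabulary and per-token weights (assumptions for illustration only).
VOCAB = {"<unk>": 0, "good": 1, "great": 2, "bad": 3, "worst": 4}
WEIGHTS = {0: 0.0, 1: 1.5, 2: 2.0, 3: -1.5, 4: -2.5}
LABELS = ["NEGATIVE", "POSITIVE"]


def preprocess(text):
    """Step 1: turn raw text into token ids the model can consume."""
    words = text.lower().replace(".", " ").split()
    return [VOCAB.get(word, VOCAB["<unk>"]) for word in words]


def model(token_ids):
    """Step 2: map token ids to raw scores (logits), one per label."""
    total = sum(WEIGHTS[token_id] for token_id in token_ids)
    return [-total, total]  # [NEGATIVE logit, POSITIVE logit]


def postprocess(logits):
    """Step 3: convert logits to probabilities and report the best label."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    probs = [value / sum(exps) for value in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return [{"label": LABELS[best], "score": probs[best]}]


def toy_pipeline(text):
    return postprocess(model(preprocess(text)))


result = toy_pipeline("This is the worst possible product.")
print(result)
```

A real pipeline replaces the word lookup with a trained tokenizer and the weight table with a neural network, but the output has the same list-of-dicts shape as the results shown in this notebook.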

Key tasks supported by `pipeline` ([complete list](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.pipeline.task)):

* sentiment-analysis
* zero-shot-classification
* fill-mask
* question-answering
* summarization
* text-generation
* translation

Install the `transformers` library:

In [1]:
!pip install transformers

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


1. **Sentiment Analysis**


In [2]:
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
classifier("This is the worst possible product. Nothing works in this.")

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.



[{'label': 'NEGATIVE', 'score': 0.9998025298118591}]
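The warning above recommends naming the model and revision explicitly. A minimal sketch of doing so follows; the model name and revision are copied from that warning, while the constant `PINNED_SENTIMENT` and the helper `build_sentiment_classifier` are hypothetical names introduced here.

```python
# Model name and revision are copied from the warning printed by the cell above.
PINNED_SENTIMENT = {
    "task": "sentiment-analysis",
    "model": "distilbert-base-uncased-finetuned-sst-2-english",
    "revision": "af0f99b",
}


def build_sentiment_classifier():
    """Create the classifier with an explicit model and revision."""
    # Imported lazily so that defining this helper does not load transformers yet.
    from transformers import pipeline

    return pipeline(**PINNED_SENTIMENT)
```

Calling `build_sentiment_classifier()("some text")` should behave like the cell above, presumably without the "No model was supplied" warning, because the model is no longer chosen implicitly.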

2. **Zero-shot classification**


  Zero-shot learning (ZSL) in NLP is a machine learning technique that allows a model to classify text into classes it did not see during training, without any training specific to those classes. It does this by leveraging the model's understanding of the semantic relationships between words and concepts.

  There are two main approaches to ZSL in NLP:

  * Attribute-based ZSL: This approach uses a set of attributes to describe the unseen classes. The model is then trained to map these attributes to class labels.
  * Semantic-based ZSL: This approach uses a knowledge base to represent the semantic relationships between words and concepts. The model is then trained to use this knowledge base to make predictions about unseen classes.

In [3]:
from transformers import pipeline

classifier = pipeline("zero-shot-classification")
classifier(
    "Airpods are the best",
    candidate_labels=["education", "politics", "technology", "business"],
)

No model was supplied, defaulted to facebook/bart-large-mnli and revision c626438 (https://huggingface.co/facebook/bart-large-mnli).
Using a pipeline without specifying a model name and revision in production is not recommended.



{'sequence': 'Airpods are the best',
 'labels': ['technology', 'business', 'education', 'politics'],
 'scores': [0.9303504228591919,
  0.048402491956949234,
  0.012988047674298286,
  0.008259052410721779]}
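The default model reported above, `facebook/bart-large-mnli`, is a natural language inference model. As far as I understand the pipeline, each candidate label is inserted into a hypothesis sentence, the model scores how strongly the input entails each hypothesis, and those scores are normalized across labels, which would explain why the four scores above sum to roughly 1. The sketch below imitates only that last step; the template string is assumed to match the pipeline's default, and the logits are made up.

```python
import math

HYPOTHESIS_TEMPLATE = "This example is {}."  # assumed to match the pipeline's default template


def build_hypotheses(labels):
    """Turn each candidate label into a hypothesis sentence."""
    return [HYPOTHESIS_TEMPLATE.format(label) for label in labels]


def rank_labels(labels, entailment_logits):
    """Softmax the per-label entailment logits and sort labels by score."""
    peak = max(entailment_logits)
    exps = [math.exp(logit - peak) for logit in entailment_logits]
    total = sum(exps)
    scored = sorted(
        zip(labels, (value / total for value in exps)),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return {
        "labels": [label for label, _ in scored],
        "scores": [score for _, score in scored],
    }


labels = ["education", "politics", "technology", "business"]
made_up_logits = [-1.0, -1.5, 3.2, 0.3]  # hypothetical values, not real model output

hypotheses = build_hypotheses(labels)
ranked = rank_labels(labels, made_up_logits)
print(hypotheses[2])
print(ranked["labels"], round(sum(ranked["scores"]), 6))
```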

3. **Text generation**

Given a prompt, the model auto-completes it by generating the text that follows. This is similar to the predictive text feature on phones or Gmail's next-word suggestions.
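The auto-complete idea can be illustrated with a toy loop that repeatedly appends the most likely next word. The lookup table `NEXT_WORD` and the function `complete` are hypothetical stand-ins; a real language model works on subword tokens, conditions on the whole prefix, and often samples rather than always taking the top choice.

```python
# Hypothetical next-word table standing in for a language model (illustration only).
NEXT_WORD = {
    "apple": "is",
    "is": "one",
    "one": "of",
    "of": "the",
    "the": "biggest",
    "biggest": "companies",
}


def complete(prompt, max_new_words=3):
    """Greedy auto-completion: repeatedly append the most likely next word."""
    words = prompt.split()
    for _ in range(max_new_words):
        next_word = NEXT_WORD.get(words[-1].lower())
        if next_word is None:  # no known continuation, so stop early
            break
        words.append(next_word)
    return " ".join(words)


completion = complete("Apple is one of", max_new_words=3)
print(completion)
```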

In [5]:
from transformers import pipeline

generator = pipeline("text-generation")
generator("Apple is one of the biggest")

No model was supplied, defaulted to gpt2 and revision 6c0e608 (https://huggingface.co/gpt2).
Using a pipeline without specifying a model name and revision in production is not recommended.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


[{'generated_text': 'Apple is one of the biggest players in the PC gaming industry. The company recently signed licensing agreements with some very notable companies including AMD, NVIDIA, and Electronic Arts. PC gaming has been a huge success across the board, with the release of more copies'}]

4. **Mask filling**

The task is to fill in the blanks in a given sentence.

In [4]:
from transformers import pipeline

unmasker = pipeline("fill-mask")
unmasker("Apple is <mask> for the tech industry.", top_k=2)  # top_k limits the output to the k highest-scoring candidates

No model was supplied, defaulted to distilroberta-base and revision ec58a5b (https://huggingface.co/distilroberta-base).
Using a pipeline without specifying a model name and revision in production is not recommended.


[{'score': 0.08616828173398972,
  'token': 2149,
  'token_str': ' responsible',
  'sequence': 'Apple is responsible for the tech industry.'},
 {'score': 0.037314582616090775,
  'token': 12115,
  'token_str': ' bullish',
  'sequence': 'Apple is bullish for the tech industry.'}]

In [7]:
unmasker("AI is <mask> for the mordern world", top_k=3)

[{'score': 0.06428221613168716,
  'token': 19083,
  'token_str': ' destined',
  'sequence': 'AI is destined for the mordern world'},
 {'score': 0.03765387088060379,
  'token': 3475,
  'token_str': ' headed',
  'sequence': 'AI is headed for the mordern world'},
 {'score': 0.03600325062870979,
  'token': 1227,
  'token_str': ' ready',
  'sequence': 'AI is ready for the mordern world'}]
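What `top_k` does can be sketched without a model: rank the candidate fills by score and keep the best few. The helper `fill_mask_top_k` is hypothetical, the first two scores are rounded from the Apple output above, and the third candidate is invented. Note also that the mask token is model-specific: `<mask>` suits the default model here, while BERT-style models use `[MASK]`.

```python
def fill_mask_top_k(template, candidate_scores, top_k=2, mask_token="<mask>"):
    """Return the top_k highest-scoring fills for the mask in template."""
    ranked = sorted(candidate_scores.items(), key=lambda item: item[1], reverse=True)
    return [
        {
            "score": score,
            "token_str": word,
            "sequence": template.replace(mask_token, word),
        }
        for word, score in ranked[:top_k]
    ]


# "responsible" and "bullish" are rounded from the output above; "good" is hypothetical.
candidates = {"bullish": 0.0373, "responsible": 0.0862, "good": 0.0200}
fills = fill_mask_top_k("Apple is <mask> for the tech industry.", candidates, top_k=2)
for fill in fills:
    print(fill)
```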

5. **Question answering**

In [8]:
from transformers import pipeline

question_answerer = pipeline("question-answering")
question_answerer(
    question="What is ChatGPT ?",
    context="ChatGPT is an advanced language model designed to assist users with a wide range of inquiries. Powered by the GPT-3.5 architecture, it leverages extensive training data to provide accurate and coherent responses. With its natural language processing capabilities, ChatGPT aims to enhance communication and offer valuable information, making it a valuable tool in various domains.",
)

No model was supplied, defaulted to distilbert-base-cased-distilled-squad and revision 626af31 (https://huggingface.co/distilbert-base-cased-distilled-squad).
Using a pipeline without specifying a model name and revision in production is not recommended.



{'score': 0.38619157671928406,
 'start': 11,
 'end': 37,
 'answer': 'an advanced language model'}

In [10]:
question_answerer(
    question="What is aim of ChatGPT",
    context="ChatGPT is an advanced language model designed to assist users with a wide range of inquiries. Powered by the GPT-3.5 architecture, it leverages extensive training data to provide accurate and coherent responses. With its natural language processing capabilities, ChatGPT aims to enhance communication and offer valuable information, making it a valuable tool in various domains.",
)

{'score': 0.32861292362213135,
 'start': 277,
 'end': 332,
 'answer': 'to enhance communication and offer valuable information'}
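The `start` and `end` values in both results appear to be character offsets into the context, so slicing the context with them should reproduce `answer`. The check below uses the context and the results copied from the two cells above.

```python
context = (
    "ChatGPT is an advanced language model designed to assist users with a wide range of inquiries. "
    "Powered by the GPT-3.5 architecture, it leverages extensive training data to provide accurate "
    "and coherent responses. With its natural language processing capabilities, ChatGPT aims to "
    "enhance communication and offer valuable information, making it a valuable tool in various domains."
)

# Offsets and answers copied from the two outputs above.
results = [
    {"start": 11, "end": 37, "answer": "an advanced language model"},
    {"start": 277, "end": 332, "answer": "to enhance communication and offer valuable information"},
]

for result in results:
    span = context[result["start"]:result["end"]]
    # The slice should equal the reported answer if the offsets are character positions.
    print(repr(span), span == result["answer"])
```

This also shows that the model is extractive: it selects a span from the context rather than writing a new answer.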

6. **Summarization**

In [11]:
from transformers import pipeline

summarizer = pipeline("summarization")
summarizer(
    """
    ChatGPT is an impressive feat of artificial intelligence, revolutionizing the way we interact with computer systems. Developed by OpenAI, it is built upon the powerful GPT-3.5 architecture, which enables it to generate human-like responses to user queries.
With a vast amount of training data, ChatGPT has been exposed to a diverse range of topics, making it well-equipped to provide accurate and insightful information. Whether you need assistance with general knowledge, technical inquiries, or even creative writing prompts, ChatGPT can offer valuable guidance.
The natural language processing capabilities of ChatGPT allow it to understand and respond to a wide array of linguistic nuances. It can decipher the context of a question and generate coherent and contextually appropriate answers. Moreover, ChatGPT can engage in conversational dialogues, adapting its responses to maintain a seamless interaction.
While ChatGPT excels at providing information, it is important to note that it operates within the limits of its training data. It may occasionally produce incorrect or incomplete answers, and it lacks the ability to verify the accuracy of the information it provides. Users should exercise critical thinking and verify facts independently when necessary.
OpenAI has implemented measures to ensure the responsible use of ChatGPT, such as its knowledge cutoff date and the ability to flag and address biased or inappropriate content. These efforts aim to create a safe and reliable environment for users.
As an AI language model, ChatGPT is continuously evolving. OpenAI actively seeks user feedback to improve its performance and address any limitations. The goal is to refine and enhance ChatGPT's capabilities, making it an even more valuable tool for individuals and businesses alike.
Overall, ChatGPT represents a significant advancement in conversational AI, bridging the gap between humans and machines. It has the potential to transform various industries and revolutionize the way we interact with technology, paving the way for a future where intelligent virtual assistants are an integral part of our daily lives.
"""
)

No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.



[{'summary_text': ' ChatGPT is an impressive feat of artificial intelligence, revolutionizing the way we interact with computer systems . It is built upon the powerful GPT-3.5 architecture, which enables it to generate human-like responses to user queries . It can decipher the context of a question and generate coherent and contextually appropriate answers .'}]
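The summary above starts with a space and has a space before each full stop. If that matters for display, a small cleanup step is one option; `tidy_summary` is a hypothetical helper, not part of `transformers`, and the sample string is the first sentence of the output above.

```python
import re


def tidy_summary(text):
    """Strip outer whitespace and remove spaces that precede punctuation."""
    return re.sub(r"\s+([.,!?;:])", r"\1", text).strip()


# First sentence of the summary_text shown above, copied verbatim.
raw = " ChatGPT is an impressive feat of artificial intelligence, revolutionizing the way we interact with computer systems ."
tidied = tidy_summary(raw)
print(tidied)
```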

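7. **Translation**

In [2]:
# Install transformers with the sentencepiece extra, which the translation tokenizers used below rely on
!pip install "transformers[sentencepiece]"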

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


Translation model list: https://huggingface.co/models?pipeline_tag=translation&sort=downloads

In [4]:
from transformers import pipeline
translates = pipeline("translation", model="Helsinki-NLP/opus-mt-en-hi")
translates("Apple has an ecosystem of devices.")


[{'translation_text': 'एप्पल उपकरणों का एक पर्यावरण है.'}]

In [5]:
# "Helsinki-NLP/opus-mt-en-hi" translates English (en) to Hindi (hi)
# "Helsinki-NLP/opus-mt-hi-en" translates Hindi (hi) to English (en)

translates = pipeline("translation", model="Helsinki-NLP/opus-mt-hi-en")
translates("एप्पल उपकरणों का एक पर्यावरण है.")


[{'translation_text': 'The Appall devices have an environment.'}]
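Both translation models follow the naming pattern `Helsinki-NLP/opus-mt-<source>-<target>`. A small helper can build the name from two language codes; `opus_mt_model_name` and `build_translator` are hypothetical names, and the pattern is an assumption drawn from the two models used here. Not every language pair is necessarily published, so check the model list before relying on a generated name.

```python
def opus_mt_model_name(source_lang, target_lang):
    """Build a Helsinki-NLP model id following the opus-mt-<source>-<target> pattern."""
    return f"Helsinki-NLP/opus-mt-{source_lang}-{target_lang}"


def build_translator(source_lang, target_lang):
    """Create a translation pipeline for the pair (downloads the model on first use)."""
    # Imported lazily; transformers is only needed when a translator is actually built.
    from transformers import pipeline

    return pipeline("translation", model=opus_mt_model_name(source_lang, target_lang))


print(opus_mt_model_name("en", "hi"))
print(opus_mt_model_name("hi", "en"))
```

The round trip above (English to Hindi and back) does not return the original sentence, which is a useful reminder to review machine translations rather than assume they preserve meaning.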