## Install Necessary Libraries

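```python
# torch is the model backend; scipy provides the cosine distance used in the similarity example
!pip install transformers huggingface_hub torch scipy
```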



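## Log In with a Hugging Face Access Token

- Go to the [Hugging Face website](https://huggingface.co/).
- Log in to your account.
- Open your account settings and find the "Access Tokens" section.
- Create a new token or copy an existing one; a token with `read` access is enough for downloading models.

The public checkpoints used in this notebook (GPT-2 and DistilBERT) can be downloaded without logging in. A token is needed for gated or private repositories and for uploading to the Hub.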

```python
from huggingface_hub import notebook_login

# Opens an interactive prompt in the notebook where you paste the access token
notebook_login()
```

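`notebook_login()` only works inside a notebook. In a plain Python script, `huggingface_hub.login(token=...)` does the same job non-interactively. The sketch below is one way to wire that up, assuming the token is stored in an environment variable; the variable name `HF_TOKEN` and the helper `login_from_env` are illustrative choices, not part of the original notebook.

```python
import os


def login_from_env(var: str = "HF_TOKEN") -> bool:
    """Log in with a token stored in an environment variable (hypothetical helper).

    Returns True if a login was attempted, False if the variable is unset or empty.
    """
    token = os.environ.get(var)
    if not token:
        # No token: public models such as gpt2 still download anonymously.
        return False
    # Imported here so the early-return path works even without huggingface_hub installed.
    from huggingface_hub import login

    login(token=token)
    return True
```

Calling `login_from_env()` at the top of a script logs in when `HF_TOKEN` is set and silently skips the step otherwise, which keeps the script usable for public models.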

## Text Generation with GPT-2

```python
# AutoTokenizer / AutoModelForCausalLM are the architecture-agnostic equivalents of these classes
from transformers import GPT2Tokenizer, GPT2LMHeadModel

model_id = 'gpt2'

# Initialize the tokenizer and model
tokenizer = GPT2Tokenizer.from_pretrained(model_id)
model = GPT2LMHeadModel.from_pretrained(model_id)

# Text input
text = "Hello, I'm a language model,"

# Tokenize the input text; calling the tokenizer returns both input_ids and attention_mask
encoded_input = tokenizer(text, return_tensors='pt')

# Generate text (greedy decoding, since sampling is not enabled)
output_sequences = model.generate(
    **encoded_input,                      # passes input_ids and attention_mask
    max_length=30,                        # total length in tokens, prompt included
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token, so reuse the EOS token
)

# Decode the generated sequence back to text
generated_text = tokenizer.decode(output_sequences[0], skip_special_tokens=True)

print("-"*150)
print("Generated Text:", generated_text)
```

```text
------------------------------------------------------------------------------------------------------------------------------------------------------
Generated Text: Hello, I'm a language model, not a programming language. I'm a language model. I'm a language model. I'm a language model
```
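The output loops ("I'm a language model." three times) because greedy decoding always takes the single highest-scoring next token. Sampling breaks the loop by drawing the next token at random from the most likely candidates. Below is a NumPy-only sketch of the two strategies over a made-up logits vector; the helpers and numbers are illustrative and this is not how transformers implements decoding internally.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Turn raw scores into probabilities (max subtracted for numerical stability)."""
    z = np.exp(logits - logits.max())
    return z / z.sum()


def greedy_pick(logits: np.ndarray) -> int:
    """Greedy decoding: always take the single highest-scoring token."""
    return int(np.argmax(logits))


def top_k_sample(logits: np.ndarray, k: int, rng: np.random.Generator) -> int:
    """Top-k sampling: keep the k best tokens, renormalize, then draw one at random."""
    top = np.argsort(logits)[-k:]  # indices of the k largest logits
    probs = softmax(logits[top])   # probabilities over the kept tokens only
    return int(rng.choice(top, p=probs))


# Hypothetical next-token scores over a toy 5-token vocabulary
logits = np.array([2.0, 1.5, 0.3, -1.0, 3.1])
rng = np.random.default_rng(0)

print(greedy_pick(logits))  # 4, every time
print([top_k_sample(logits, k=3, rng=rng) for _ in range(10)])  # only tokens 4, 0 and 1 can appear
```

With `model.generate`, the corresponding arguments are `do_sample=True` and `top_k=...`; `no_repeat_ngram_size` and `repetition_penalty` are other `generate` arguments that target repetition directly.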


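## Inference with the Pipeline Interface (Quick Method)

`pipeline()` bundles the tokenizer, the model, and the task-specific post-processing into a single callable, so each task below takes only a few lines.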

### 1. Text Generation

```python
from transformers import pipeline, set_seed

# Initialize the text-generation pipeline
generator = pipeline(task='text-generation', model='gpt2')
set_seed(42)  # fix the random seed so the sampled outputs are reproducible

# Sample five continuations, each at most 30 tokens long (prompt included)
results = generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5, do_sample=True)

print("-"*150)
for result in results:
    print(result)
```

```text
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.
Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.

------------------------------------------------------------------------------------------------------------------------------------------------------
{'generated_text': "Hello, I'm a language model, but what I'm really doing is making a human-readable document. There are other languages, but those are"}
{'generated_text': "Hello, I'm a language model, not a syntax model. That's why I like it. I've done a lot of programming projects.\n"}
{'generated_text': "Hello, I'm a language model, and I'll do it in no time!\n\nOne of the things we learned from talking to my friend"}
{'generated_text': "Hello, I'm a language model, not a command line tool.\n\nIf my code is simple enough:\n\nif (use (string"}
{'generated_text': "Hello, I'm a language model, I've been using Language in all my work. Just a small example, let's see a simplified example."}
```
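The five continuations differ because they are sampled, and `set_seed(42)` makes those random draws repeat on every run. Temperature is a related `generate` argument that this cell leaves at its default: dividing the logits by a value below 1 sharpens the distribution toward greedy behavior, while a value above 1 flattens it. A NumPy-only sketch of both ideas follows; the helper `sample_tokens` and the logits are hypothetical.

```python
import numpy as np


def sample_tokens(logits, n: int, seed: int, temperature: float = 1.0) -> list:
    """Draw n token ids from softmax(logits / temperature) using a seeded generator."""
    rng = np.random.default_rng(seed)                        # same seed -> same draws
    scaled = np.asarray(logits, dtype=float) / temperature   # <1 sharpens, >1 flattens
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(len(probs), size=n, p=probs).tolist()


logits = [2.0, 1.5, 0.3, -1.0, 3.1]  # hypothetical next-token scores

run_a = sample_tokens(logits, n=8, seed=42)
run_b = sample_tokens(logits, n=8, seed=42)
print(run_a == run_b)  # True: fixing the seed makes sampling repeatable

print(sample_tokens(logits, n=8, seed=7, temperature=0.05))  # near-greedy: almost always token 4
```

In the pipeline call above, a value such as `temperature=0.7` could be passed alongside `do_sample=True`, since the pipeline forwards generation arguments to `generate`.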


### 2. Sentiment Analysis

```python
from transformers import pipeline

# Initialize the sentiment-analysis pipeline, pinning the checkpoint instead of relying on the task default
classifier = pipeline(task='sentiment-analysis', model='distilbert/distilbert-base-uncased-finetuned-sst-2-english')

# Classify the text: returns one {'label', 'score'} dict per input
result = classifier("I am very happy")

print("-"*150)
print(result)
```

```text
------------------------------------------------------------------------------------------------------------------------------------------------------
[{'label': 'POSITIVE', 'score': 0.9998795986175537}]
```
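For a single-label classifier like this one, the pipeline's `score` is the softmax probability of the winning class, and the label name comes from the model config's `id2label` mapping. The sketch below mimics that post-processing step with NumPy; the helper `classify` and the logits are hypothetical, not values produced by the model above.

```python
import numpy as np


def classify(logits, id2label: dict) -> dict:
    """Softmax over class logits, then report the top label and its probability."""
    z = np.asarray(logits, dtype=float)
    probs = np.exp(z - z.max())  # subtract the max for numerical stability
    probs /= probs.sum()
    best = int(np.argmax(probs))
    return {"label": id2label[best], "score": float(probs[best])}


# Hypothetical logits for a two-class model with SST-2 style labels
print(classify([-4.3, 4.7], {0: "NEGATIVE", 1: "POSITIVE"}))  # POSITIVE with a score close to 1
```

A score near 1 therefore means the positive logit is far larger than the negative one, not that the model is calibrated to 99.99% accuracy.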


### 3. Sentence Similarity

```python
from transformers import pipeline
from scipy.spatial.distance import cosine
import numpy as np

# Initialize the feature-extraction pipeline, pinning the checkpoint the task would otherwise pick by default
extractor = pipeline(task='feature-extraction', model='distilbert/distilbert-base-cased')

text1 = "I love using Hugging Face transformers."
text2 = "Transformers from Hugging Face are very useful for NLP tasks."

# Each call returns nested lists shaped [1][num_tokens][hidden_size]; [0] drops the batch dimension
embedding1 = extractor(text1)[0]
embedding2 = extractor(text2)[0]

# Mean-pool over the token dimension to get one fixed-size vector per sentence
embedding1 = np.mean(embedding1, axis=0)
embedding2 = np.mean(embedding2, axis=0)

# scipy's cosine() returns the cosine distance, so subtract it from 1 to get the similarity
cosine_sim = 1 - cosine(embedding1, embedding2)

print("-"*150)
print(cosine_sim)
```

```text
------------------------------------------------------------------------------------------------------------------------------------------------------
0.9251842829210587
```

`distilbert-base-cased` is a general-purpose language model that was not trained to produce sentence embeddings, so cosine scores between its mean-pooled vectors tend to be high for almost any pair of sentences. Treat the value as a relative signal, or use a model trained for sentence similarity (for example `sentence-transformers/all-MiniLM-L6-v2`) for more discriminative scores.
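The cell above averages the vectors of every token, including the special `[CLS]` and `[SEP]` tokens, which works for a single unpadded sentence. When sentences are embedded as a padded batch, the padding positions should be left out of the average using the attention mask. A NumPy sketch of masked mean pooling and of the cosine similarity that `1 - cosine(...)` computes; the helpers `mean_pool` and `cosine_similarity` and the toy vectors are illustrative, not real embeddings.

```python
import numpy as np


def mean_pool(token_embeddings, attention_mask) -> np.ndarray:
    """Average token vectors per sentence, ignoring padded positions (mask == 0)."""
    mask = np.asarray(attention_mask, dtype=float)[..., None]    # (batch, tokens, 1)
    summed = (np.asarray(token_embeddings) * mask).sum(axis=1)   # (batch, dim)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)               # (batch, 1), avoids divide-by-zero
    return summed / counts


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Dot product divided by the product of the vector norms."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy batch: 2 sentences, 3 token slots, 2-dim vectors; the second sentence has one padding slot
emb = np.array([[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                [[2.0, 0.0], [0.0, 2.0], [9.0, 9.0]]])  # [9, 9] is padding and must be ignored
mask = np.array([[1, 1, 1],
                 [1, 1, 0]])

pooled = mean_pool(emb, mask)
print(pooled)
print(cosine_similarity(pooled[0], pooled[1]))  # ≈ 1.0, the pooled vectors point the same way
```

Without the mask, the padding vector `[9, 9]` would be averaged into the second sentence and distort its embedding.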
