# Introduction
In this notebook, we introduce the Hugging Face ecosystem and cover the basics of two of its libraries: Transformers and Datasets. Hugging Face has become a go-to platform for state-of-the-art Natural Language Processing (NLP), offering a wide variety of pre-trained models and datasets.

In [1]:
# Setting Up
# First, let's install the required libraries:
!pip install transformers
!pip install datasets
!pip install torch



# Transformers
Transformers provides thousands of pre-trained models for text tasks such as classification, information extraction, and text generation. Let's explore its basic functionality.

In [2]:
# Loading a Pre-trained Model
# We'll start by loading the BERT model, a popular transformer model:

from transformers import BertTokenizer, BertModel

# Load pre-trained BERT tokenizer and model
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")



## Tokenization
Before we can feed text to the model, we need to split it into tokens and map each token to an integer id from the model's vocabulary. This process is called tokenization.

In [3]:
text = "Hugging Face is creating transformative technology!"
encoded_input = tokenizer(text, return_tensors='pt')
print(encoded_input)

{'input_ids': tensor([[  101, 17662,  2227,  2003,  4526, 10938,  8082,  2974,   999,   102]]), 'token_type_ids': tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
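To make the output above less opaque, here is a minimal sketch of WordPiece-style tokenization in plain Python. It assumes a tiny, hypothetical vocabulary (`TOY_VOCAB`) whose ids are made up for illustration; the helper names `split_words`, `wordpiece`, `toy_encode`, and `pad_batch` are also invented here and are not part of the Transformers API. The sketch shows where the special `[CLS]` and `[SEP]` tokens go, how an unknown word can be split into sub-word pieces, and what the `attention_mask` marks once sequences are padded to the same length.

```python
# Toy WordPiece-style tokenizer. TOY_VOCAB is hypothetical and tiny;
# it is NOT the real bert-base-uncased vocabulary and its ids are made up.
TOY_VOCAB = {
    "[PAD]": 0, "[UNK]": 1, "[CLS]": 2, "[SEP]": 3,
    "hugging": 4, "face": 5, "is": 6, "creating": 7,
    "transform": 8, "##ative": 9, "technology": 10, "!": 11,
}


def split_words(text):
    """Lowercase the text and split it into words and punctuation marks."""
    words, current = [], ""
    for ch in text.lower():
        if ch.isalnum():
            current += ch
        else:
            if current:
                words.append(current)
                current = ""
            if not ch.isspace():
                words.append(ch)
    if current:
        words.append(current)
    return words


def wordpiece(word, vocab):
    """Greedy longest-match-first split of one word into sub-word pieces."""
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a continuation piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return ["[UNK]"]  # no piece fits: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


def toy_encode(text, vocab=TOY_VOCAB):
    """Wrap the pieces in [CLS] ... [SEP] and look up their ids."""
    tokens = ["[CLS]"]
    for word in split_words(text):
        tokens.extend(wordpiece(word, vocab))
    tokens.append("[SEP]")
    return tokens, [vocab[t] for t in tokens]


def pad_batch(batch_ids, pad_id=TOY_VOCAB["[PAD]"]):
    """Pad id lists to equal length; the mask is 1 for real tokens, 0 for padding."""
    width = max(len(ids) for ids in batch_ids)
    input_ids = [ids + [pad_id] * (width - len(ids)) for ids in batch_ids]
    attention_mask = [[1] * len(ids) + [0] * (width - len(ids)) for ids in batch_ids]
    return input_ids, attention_mask


tokens, ids = toy_encode("Hugging Face is creating transformative technology!")
_, short_ids = toy_encode("Hugging Face!")
padded_ids, attention_mask = pad_batch([ids, short_ids])

print(tokens)
print(padded_ids)
print(attention_mask)
```

With a single sentence there is nothing to pad, which is why the `attention_mask` printed by the real tokenizer above contains only ones.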


## Model Inference
Now we can feed the tokenized input into the model to get one contextual embedding per token:

In [4]:
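import torch

# Disable gradient tracking: we only need a forward pass for inference
with torch.no_grad():
    output = model(**encoded_input)

# One contextual embedding per input token
embeddings = output.last_hidden_state
print(embeddings)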

tensor([[[ 6.3859e-02,  5.0140e-02,  1.2451e-01,  ..., -4.0915e-01,
           1.4366e-01,  4.1960e-01],
         [ 2.7236e-02,  1.5208e-02,  8.3556e-01,  ..., -1.0636e-01,
           1.8042e-01,  4.1943e-01],
         [ 4.1696e-01, -2.5821e-01,  4.7547e-01,  ..., -3.6237e-01,
           7.8667e-02, -1.3397e-02],
         ...,
         [ 1.2395e-04, -2.7653e-01,  5.3409e-01,  ..., -8.0864e-01,
          -6.6008e-02, -2.4298e-01],
         [ 1.1219e-01, -2.1510e-01, -1.7536e-01,  ...,  5.5425e-01,
           7.1288e-02, -5.9522e-01],
         [ 6.9718e-01,  2.9182e-01, -1.6338e-01,  ...,  1.9831e-02,
          -6.3510e-01, -3.9752e-01]]])
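The tensor above holds one vector per token, so for the ten-token input and `bert-base-uncased` (hidden size 768) its shape is `(1, 10, 768)`. When a single vector per sentence is needed, one common option is to average the token vectors while ignoring padding positions. The sketch below illustrates that idea on a small hand-written tensor standing in for `output.last_hidden_state`; the helper name `mean_pool` and the toy values are assumptions made for this example, not something the notebook or the library defines.

```python
import torch

# Stand-in for output.last_hidden_state: batch of 2 sequences,
# 4 token positions each, hidden size 3 (toy values, not real BERT output).
hidden = torch.tensor([
    [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [5.0, 6.0, 7.0], [100.0, 100.0, 100.0]],
    [[2.0, 0.0, 4.0], [4.0, 2.0, 0.0], [9.0, 9.0, 9.0], [9.0, 9.0, 9.0]],
])

# 1 marks a real token, 0 marks padding (same convention as attention_mask).
mask = torch.tensor([[1, 1, 1, 0],
                     [1, 1, 0, 0]])


def mean_pool(last_hidden_state, attention_mask):
    """Average token vectors per sequence, skipping padded positions."""
    weights = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)  # (batch, seq, 1)
    summed = (last_hidden_state * weights).sum(dim=1)                   # (batch, hidden)
    counts = weights.sum(dim=1).clamp(min=1.0)                          # avoid divide-by-zero
    return summed / counts


sentence_vectors = mean_pool(hidden, mask)
cls_vectors = hidden[:, 0, :]  # alternative: take the vector at the first position

print(sentence_vectors.shape)
print(sentence_vectors)
```

With the real model, the same function could be called as `mean_pool(output.last_hidden_state, encoded_input["attention_mask"])`.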


# Datasets
Hugging Face also provides the Datasets library (imported as datasets), which makes it easy to access a large number of datasets used in NLP research.

In [5]:
# Loading a Dataset
# For demonstration purposes, we'll load the imdb dataset, a popular dataset for sentiment analysis:

from datasets import load_dataset

# Load the IMDB dataset
dataset = load_dataset("imdb")
print(dataset)

DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label'],
        num_rows: 50000
    })
})


## Exploring the Dataset
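Datasets loaded with the datasets library are usually divided into splits such as 'train', 'test', and sometimes 'validation'; as the output above shows, IMDB also ships an 'unsupervised' split. Slicing a split returns a dictionary that maps each column name to a list of values. Let's take a peek at the first five entries of the training set: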

In [6]:
print(dataset['train'][0:5])

{'text': ['I rented I AM CURIOUS-YELLOW from my video store because of all the controversy that surrounded it when it was first released in 1967. I also heard that at first it was seized by U.S. customs if it ever tried to enter this country, therefore being a fan of films considered "controversial" I really had to see this for myself.<br /><br />The plot is centered around a young Swedish drama student named Lena who wants to learn everything she can about life. In particular she wants to focus her attentions to making some sort of documentary on what the average Swede thought about certain political issues such as the Vietnam War and race issues in the United States. In between asking politicians and ordinary denizens of Stockholm about their opinions on politics, she has sex with her drama teacher, classmates, and married men.<br /><br />What kills me about I AM CURIOUS-YELLOW is that 40 years ago, this was considered pornographic. Really, the sex and nudity scenes are few and far b

# Conclusion
Congratulations! You've just had a brief introduction to Hugging Face's Transformers and Datasets. There's a lot more to explore, including fine-tuning models on custom datasets, leveraging community-contributed models, and more. We encourage you to dive deeper into the documentation and community forums to continue your learning journey.

Remember, the strength of Hugging Face lies not just in its powerful tools, but also in its vibrant community. Don't hesitate to share your projects and ask questions!

