In [1]:
from datasets import load_dataset

dataset = load_dataset("imdb")




In [2]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)


Map: 100%|██████████████████████████████████████████████████████████████| 25000/25000 [00:05<00:00, 4210.79 examples/s]


In [3]:
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [4]:
#!pip install transformers[torch]
#!pip install accelerate

In [5]:
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    evaluation_strategy="epoch",  # evaluate after every epoch; newer transformers releases call this eval_strategy
    logging_dir="./logs",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
)







In [6]:
trainer.train()


Epoch,Training Loss,Validation Loss


KeyboardInterrupt: 

In [None]:
eval_results = trainer.evaluate()
print(eval_results)


In [None]:
model.save_pretrained("./model")
tokenizer.save_pretrained("./model")


In [None]:
from transformers import pipeline

nlp = pipeline("sentiment-analysis", model="./model")
result = nlp("I love using Hugging Face!")
print(result)


In [None]:
Building an end-to-end use case with Hugging Face's Natural Language Processing (NLP) models involves several steps, from data collection to model deployment. The guide below walks through them for a text classification model (sentiment analysis on IMDb reviews) built with the Hugging Face Transformers library.

1. Environment Setup
Install Necessary Libraries: Install the Hugging Face Transformers and Datasets libraries along with PyTorch. The Trainer API used below also requires the accelerate package when running on PyTorch.
pip install transformers datasets torch accelerate
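To confirm the environment before running the later steps, a small check along these lines can report which packages are present. It is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the package list simply mirrors the installs mentioned in this notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("transformers", "datasets", "torch", "accelerate")):
    """Return a {package: version or None} mapping for the given package names."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current environment.
            found[name] = None
    return found


for name, ver in installed_versions().items():
    print(f"{name}: {ver or 'not installed'}")
```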
2. Data Collection & Preprocessing
Collect Data: Obtain a dataset for text classification. For sentiment analysis, you can use datasets like IMDb reviews, SST-2, or any custom dataset.
Load Data: Use Hugging Face's datasets library to load the dataset.
from datasets import load_dataset

dataset = load_dataset("imdb")
Preprocess Data: Tokenize the text data using a pre-trained tokenizer from Hugging Face.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)
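To make the effect of padding="max_length" and truncation=True concrete, here is a toy illustration on plain lists of token IDs. It is not the tokenizer's implementation: the function name `pad_or_truncate`, the token IDs, and the pad ID of 0 are assumptions chosen for the example, and the real tokenizer also keeps the closing special token when it truncates, which this sketch does not.

```python
def pad_or_truncate(token_ids, max_length, pad_id=0):
    """Toy version of padding="max_length" plus truncation=True for one sequence."""
    kept = token_ids[:max_length]       # truncation: drop tokens past max_length
    n_pad = max_length - len(kept)      # padding: number of pad tokens to append
    return {
        "input_ids": kept + [pad_id] * n_pad,
        # 1 marks real tokens, 0 marks padding the model should ignore
        "attention_mask": [1] * len(kept) + [0] * n_pad,
    }


short = pad_or_truncate([101, 2023, 102], max_length=5)
long = pad_or_truncate([101, 1, 2, 3, 4, 5, 102], max_length=5)
print(short)
print(long)
```

Every sequence comes out with the same length, which is what lets the examples be stacked into fixed-size batches.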
3. Model Selection
Choose a Pre-Trained Model: Select a pre-trained model from Hugging Face's model hub. For text classification, BERT, DistilBERT, or RoBERTa are good choices.
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
4. Training the Model
Configure Training: Define the hyperparameters with TrainingArguments and pass the model and the tokenized datasets to Trainer, which handles batching and conversion to PyTorch tensors internally.
from transformers import TrainingArguments, Trainer

training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    evaluation_strategy="epoch",  # evaluate after every epoch; newer transformers releases call this eval_strategy
    logging_dir="./logs",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
)
Train the Model: Start the training process.
trainer.train()
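Before starting a long run, it can help to estimate how many optimizer steps the configuration implies. The sketch below assumes a single device and no gradient accumulation, and uses the 25,000 training examples reported by the map step; the helper name `total_optimizer_steps` is hypothetical.

```python
import math


def total_optimizer_steps(num_examples, batch_size, num_epochs):
    """Estimate optimizer steps for one device without gradient accumulation."""
    steps_per_epoch = math.ceil(num_examples / batch_size)  # the last batch may be partial
    return steps_per_epoch * num_epochs


steps = total_optimizer_steps(num_examples=25_000, batch_size=16, num_epochs=3)
print(steps)  # 4689
```

With 512-token inputs this is a substantial amount of work on a CPU, so a GPU or a smaller subset of the data is worth considering for a first pass.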
5. Evaluation
Evaluate the Model: After training, evaluate the model on the test set.
eval_results = trainer.evaluate()
print(eval_results)
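As configured above, the Trainer has no metric function, so trainer.evaluate() reports the evaluation loss and runtime statistics but no accuracy. One possible way to add accuracy is a compute_metrics function like the sketch below; it assumes the logits and labels arrive as NumPy arrays, and the sample values are made up for illustration.

```python
import numpy as np


def compute_metrics(eval_pred):
    """Compute accuracy from the (logits, labels) pair that Trainer passes in."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)  # pick the highest-scoring class per example
    return {"accuracy": float((predictions == labels).mean())}


# Illustrative call with made-up logits for four examples
example = (
    np.array([[2.0, -1.0], [0.1, 0.3], [1.5, 0.2], [-0.5, 0.9]]),
    np.array([0, 1, 1, 1]),
)
print(compute_metrics(example))  # {'accuracy': 0.75}
```

Passing compute_metrics=compute_metrics when constructing the Trainer should make the value appear in the evaluation results under the key eval_accuracy.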
6. Fine-Tuning (Optional)
Step 4 already fine-tunes the pre-trained model on IMDb. If the results are not satisfactory, iterate by adjusting hyperparameters (for example the learning rate, batch size, or number of epochs), adding more training data, or trying a different model architecture.
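One simple way to organize such experiments is to enumerate candidate settings up front. The following is a minimal grid-search sketch; the helper name `hyperparameter_grid` and the candidate values are assumptions for illustration, not recommendations from this guide.

```python
from itertools import product


def hyperparameter_grid(search_space):
    """Expand {name: [values]} into a list of {name: value} combinations."""
    names = list(search_space)
    value_lists = [search_space[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Hypothetical candidate values; the keys match TrainingArguments parameter names.
search_space = {
    "learning_rate": [2e-5, 3e-5, 5e-5],
    "per_device_train_batch_size": [16, 32],
    "num_train_epochs": [2, 3],
}
candidates = hyperparameter_grid(search_space)
print(len(candidates))  # 12
print(candidates[0])
```

Each dictionary could then be unpacked into TrainingArguments alongside output_dir, with a freshly loaded model for every run so that the runs do not share weights.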
7. Model Deployment
Save the Model: Once the model is trained and evaluated, save it for deployment.
model.save_pretrained("./model")
tokenizer.save_pretrained("./model")
Run Inference: Load the saved model into a local pipeline to check its predictions. To serve it to others, you can use the Hugging Face Inference API or set up your own server.
from transformers import pipeline

nlp = pipeline("sentiment-analysis", model="./model")
result = nlp("I love using Hugging Face!")
print(result)
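Because the model was saved without custom label names, the pipeline is expected to return generic labels such as LABEL_0 and LABEL_1. In the IMDb dataset, label 0 is negative and label 1 is positive, so a small post-processing step can make the output readable. This is a sketch: the helper name `readable` is hypothetical, and the sample prediction is hand-written rather than real model output.

```python
LABEL_NAMES = {"LABEL_0": "negative", "LABEL_1": "positive"}


def readable(predictions, label_names=LABEL_NAMES):
    """Replace generic pipeline labels with human-readable names."""
    return [
        # Unknown labels are passed through unchanged.
        {"label": label_names.get(p["label"], p["label"]), "score": p["score"]}
        for p in predictions
    ]


sample = [{"label": "LABEL_1", "score": 0.98}]  # hand-written, not real model output
print(readable(sample))  # [{'label': 'positive', 'score': 0.98}]
```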
8. Model Monitoring & Maintenance
Monitor Performance: Once the model is in production, monitor its performance over time.
Update the Model: Periodically update the model with new data and retrain it as needed.
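As one possible starting point for monitoring, the sketch below tracks the mean prediction confidence over a sliding window and flags when it falls below a threshold. The class name `ConfidenceMonitor`, the window size, and the threshold are all assumptions for illustration. Confidence is only a proxy; periodically scoring the model on freshly labeled samples gives a more direct signal.

```python
from collections import deque


class ConfidenceMonitor:
    """Track mean prediction confidence over the most recent predictions."""

    def __init__(self, window_size=100, min_mean_confidence=0.8):
        self.scores = deque(maxlen=window_size)  # old scores drop out automatically
        self.min_mean_confidence = min_mean_confidence

    def record(self, score):
        self.scores.append(score)

    def mean_confidence(self):
        return sum(self.scores) / len(self.scores) if self.scores else None

    def needs_attention(self):
        mean = self.mean_confidence()
        return mean is not None and mean < self.min_mean_confidence


monitor = ConfidenceMonitor(window_size=3, min_mean_confidence=0.8)
for score in [0.95, 0.9, 0.6, 0.55]:  # made-up pipeline scores
    monitor.record(score)
print(monitor.mean_confidence(), monitor.needs_attention())
```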
9. Documentation and Sharing
Document the Workflow: Document each step of your pipeline to ensure reproducibility and ease of understanding for others.
Share the Model: Share the model on Hugging Face's Model Hub for others to use.
This completes an end-to-end workflow for building an NLP application with Hugging Face. Depending on the complexity of your project, you can add more advanced techniques such as model ensembling, knowledge distillation, or custom loss functions.