BERT Transformer.ipynb

### Cell 1: Importing Libraries

These cells install the dependencies and import `BertTokenizer` and `BertModel` from `transformers`, along with `torch` and `numpy`. Warnings of the `UserWarning` category are suppressed.

In [2]:
!pip install torch transformers



In [4]:
from transformers import BertTokenizer, BertModel
import torch
import numpy as np
import warnings
warnings.filterwarnings("ignore", category=UserWarning)

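Note: the recorded run failed with `ModuleNotFoundError: No module named 'transformers'` because only `torch` was installed beforehand. Install `transformers` as well (for example with `!pip install transformers`) and rerun this cell.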

### Cell 2: Loading Pre-trained BERT Tokenizer

In this cell, the pre-trained BERT tokenizer (`bert-base-uncased`) is loaded using `BertTokenizer.from_pretrained()` method.

In [5]:
# Load pre-trained BERT tokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

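Note: the recorded run failed with `NameError: name 'BertTokenizer' is not defined` because the import in Cell 1 did not succeed. Rerun Cell 1 once `transformers` is installed.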

### Cell 3: Tokenizing Input Text

This cell tokenizes the input text "Hello, how are you?" using the loaded tokenizer. It then converts tokens to token IDs, adds special tokens `[CLS]` and `[SEP]`, and converts them to a tensor.

In [None]:
# Tokenize input text
text = "Hello, how are you?"
tokens = tokenizer.tokenize(text)

# Convert tokens to token IDs
token_ids = tokenizer.convert_tokens_to_ids(tokens)

# Add special tokens [CLS] and [SEP]
token_ids = [tokenizer.cls_token_id] + token_ids + [tokenizer.sep_token_id]

# Convert token IDs to tensor
input_ids = torch.tensor(token_ids)
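
The ID lookup and the manual `[CLS]`/`[SEP]` wrapping above can be sketched without downloading the tokenizer. This is a minimal illustration only: `TOY_VOCAB` and its IDs are made up and are not the real `bert-base-uncased` vocabulary, and the two helper functions are hypothetical stand-ins for the tokenizer methods used in the cell.

```python
# Toy vocabulary: these IDs are invented for illustration and are NOT
# the real bert-base-uncased IDs.
TOY_VOCAB = {
    "[UNK]": 0, "[CLS]": 1, "[SEP]": 2,
    "hello": 10, ",": 11, "how": 12, "are": 13, "you": 14, "?": 15,
}

def convert_tokens_to_ids(tokens):
    """Look up each token, falling back to [UNK] for tokens not in the vocabulary."""
    return [TOY_VOCAB.get(tok, TOY_VOCAB["[UNK]"]) for tok in tokens]

def add_special_tokens(token_ids):
    """Wrap a single sequence as [CLS] ... [SEP], as the cell above does by hand."""
    return [TOY_VOCAB["[CLS]"]] + token_ids + [TOY_VOCAB["[SEP]"]]

# An uncased tokenizer lowercases the text and splits punctuation into separate tokens.
tokens = ["hello", ",", "how", "are", "you", "?"]
toy_input_ids = add_special_tokens(convert_tokens_to_ids(tokens))
print(toy_input_ids)  # [1, 10, 11, 12, 13, 14, 15, 2]
```

Six content tokens plus the two special tokens give a sequence of length 8, which is the sequence length the model sees in the next cell.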

### Cell 4: Loading Pre-trained BERT Model and Forward Pass

In this cell, the pre-trained BERT model (`bert-base-uncased`) is loaded with `BertModel.from_pretrained()`. A forward pass is then run under `torch.no_grad()`, with `unsqueeze(0)` adding the batch dimension the model expects.

In [None]:
# Load pre-trained BERT model
model = BertModel.from_pretrained("bert-base-uncased")

# Forward pass through the model
with torch.no_grad():
    outputs = model(input_ids.unsqueeze(0))  # Add batch dimension

### Cell 5: Extracting Hidden States and Printing Shape

This cell extracts the final-layer hidden states (one embedding vector per token) from `outputs.last_hidden_state` and prints both the tensor and its shape.

In [None]:
# Get the hidden states (embeddings)
hidden_states = outputs.last_hidden_state

# Print the hidden states tensor and its shape
print(hidden_states)
print(hidden_states.shape)  # Shape of the output embeddings

### Cell 6: Converting Hidden States to NumPy Array

Here, the hidden states tensor is converted to a NumPy array and the batch dimension is removed. The shape of the NumPy array is printed.

In [None]:
# Convert hidden states tensor to NumPy array
hidden_states_np = hidden_states.numpy().squeeze(0)  # Remove the batch dimension
print(hidden_states_np.shape)
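
The shapes involved can be checked with a small stand-in that needs neither `torch` nor the model weights. It assumes the input sentence yields 8 tokens (6 content tokens plus `[CLS]` and `[SEP]`) and uses 768, the hidden size of BERT-base; the `shape` helper is hypothetical and the zeros are placeholders, not real embeddings.

```python
BATCH, SEQ_LEN, HIDDEN = 1, 8, 768  # assumed: 8 tokens; 768 is the BERT-base hidden size

def shape(nested):
    """Return the dimensions of a regular (non-empty) nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)

# Stand-in for outputs.last_hidden_state: zeros instead of real embeddings.
hidden_states = [[[0.0] * HIDDEN for _ in range(SEQ_LEN)] for _ in range(BATCH)]
print(shape(hidden_states))  # (1, 8, 768)

# Taking the single batch element mirrors what .squeeze(0) does
# when the batch dimension has size 1.
per_token = hidden_states[0]
print(shape(per_token))  # (8, 768)
```

After the batch dimension is removed, row `i` is the embedding of token `i`, so row 0 corresponds to `[CLS]`.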

### Cell 7: Converting Token IDs to Tokens and Reconstructing Original Text

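This cell converts the token IDs back to tokens using the tokenizer's `convert_ids_to_tokens()` function and joins them into a string with `convert_tokens_to_string()`. The result is not identical to the original input: `bert-base-uncased` lowercases the text, the `[CLS]` and `[SEP]` tokens are still present in `token_ids`, and punctuation is expected to come back separated by spaces.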

In [None]:
# Convert token IDs to tokens using the tokenizer's convert_ids_to_tokens function
tokens = tokenizer.convert_ids_to_tokens(token_ids)

# Join the tokens back into a string (lowercased, with [CLS] and [SEP] still included)
original_text = tokenizer.convert_tokens_to_string(tokens)
print("Original Text:", original_text)
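
Why the round trip does not return the exact input can be seen in a simplified sketch of WordPiece joining. The `join_wordpieces` function below is a hypothetical stand-in written for illustration, not the library's implementation; it assumes tokens are joined with spaces and that `##` marks a continuation piece.

```python
def join_wordpieces(tokens):
    """Join tokens with spaces, then glue '##' continuation pieces onto the previous token."""
    return " ".join(tokens).replace(" ##", "").strip()

tokens = ["[CLS]", "hello", ",", "how", "are", "you", "?", "[SEP]"]
with_special = join_wordpieces(tokens)
print(with_special)

# Dropping the special tokens first gets closer to the input,
# but the casing and the spacing around punctuation still differ.
content_only = join_wordpieces([t for t in tokens if t not in ("[CLS]", "[SEP]")])
print(content_only)

# Continuation pieces are merged back into a single word.
print(join_wordpieces(["token", "##ization"]))
```

If only the readable text is wanted, decoding the IDs with special tokens skipped is usually the more convenient route than joining tokens by hand.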

Fine Tuned MLM.ipynb

### Cell 1: Installing Transformers Library

This cell installs the `transformers` library using pip. The Transformers library is necessary for working with pre-trained models provided by Hugging Face for various natural language processing tasks.

In [None]:
!pip install transformers

### Cell 2: Using Sentiment Analysis Pipeline

This cell loads a pre-trained sentiment analysis model (`distilbert-base-uncased-finetuned-sst-2-english`) with the `pipeline` function from the `transformers` library. The two cells after it each classify a single input sentence.

In [None]:
from transformers import pipeline
classifier = pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english')

In [None]:
classifier('We are very happy for the holiday of Eid.')

In [None]:
classifier('My first job was not good because of tight schedule.')

### Cell 5: Performing Sentiment Analysis on Multiple Texts

This cell extends the sentiment analysis task to handle multiple input texts at once. It uses the same sentiment analysis pipeline to analyze the sentiment of multiple input texts and prints the results.

In [None]:
results = classifier(["We are very happy to show you the 🤗 Transformers library.",
           "We hope you don't hate it."])
for result in results:
    print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
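
Each result is a dictionary with a `label` and a `score`. A rough sketch of how such a result can be derived from two class logits is shown below; the logits are hypothetical placeholders rather than real model output, and the `softmax` and `to_result` helpers are illustrative, assuming the score is the softmax probability of the winning class.

```python
import math

LABELS = ["NEGATIVE", "POSITIVE"]  # label names used by the SST-2 checkpoint above

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def to_result(logits):
    """Pick the most probable class and report it in the pipeline's result layout."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"label": LABELS[best], "score": probs[best]}

# Hypothetical logits for two sentences (not produced by the real model).
hypothetical_logits = [[-4.0, 4.5], [1.2, -0.9]]
results = [to_result(logits) for logits in hypothetical_logits]
for result in results:
    print(f"label: {result['label']}, with score: {round(result['score'], 4)}")
```

With two classes the winning score is always at least 0.5, so a score near 0.5 indicates a low-confidence prediction.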

Training Mid-Size LM.ipynb

### Cell 1: Installing simpletransformers

In this cell, the `simpletransformers` package is installed using the following command:

In [None]:
!pip install simpletransformers

### Cell 2: Loading Data from JSON File

In this cell, data is loaded from a JSON file named "train.json" using the `json.load()` method.

In [None]:
import json
with open(r"train.json", "r") as read_file:
    train = json.load(read_file)

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
train

### Cell 4: Loading Data from JSON File

In this cell, data is loaded from a JSON file named "test.json" using the `json.load()` method.

In [None]:
with open(r"test.json", "r") as read_file:
    test = json.load(read_file)

In [None]:
test
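
The contents of `train.json` and `test.json` are not shown in the notebook. The sketch below assumes they follow the SQuAD-style layout that `simpletransformers` question answering expects (a list of records with a `context` and a list of `qas`); the sample record and the `check_qa_records` helper are hypothetical and only illustrate a basic consistency check.

```python
import json
import os
import tempfile

# Hypothetical sample record in the assumed SQuAD-style layout.
sample = [
    {
        "context": "This is the first context",
        "qas": [
            {
                "id": "00001",
                "is_impossible": False,
                "question": "Which context is this?",
                "answers": [{"text": "the first", "answer_start": 8}],
            }
        ],
    }
]

def check_qa_records(records):
    """Return a list of problems; an empty list means every answer matches its context."""
    problems = []
    for record in records:
        context = record["context"]
        for qa in record["qas"]:
            for answer in qa.get("answers", []):
                start = answer["answer_start"]
                end = start + len(answer["text"])
                if context[start:end] != answer["text"]:
                    problems.append(f"{qa['id']}: answer text not found at answer_start={start}")
    return problems

# Round-trip through a temporary file, mirroring the json.load() cells above.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.json")
    with open(path, "w") as write_file:
        json.dump(sample, write_file)
    with open(path, "r") as read_file:
        loaded = json.load(read_file)

print(check_qa_records(loaded))  # []
```

Running such a check on the real files before training can catch misaligned `answer_start` offsets early.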

### Cell 6: Setting up Question Answering Model

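In this cell, `logging` and `warnings` are imported, warnings of the `UserWarning` category are suppressed with `warnings.filterwarnings("ignore", category=UserWarning)`, and `QuestionAnsweringModel` and `QuestionAnsweringArgs` are imported from `simpletransformers.question_answering`.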

In [None]:
import logging
import warnings
warnings.filterwarnings("ignore", category=UserWarning)

from simpletransformers.question_answering import QuestionAnsweringModel, QuestionAnsweringArgs

### Cell 7: Selecting Pretrained Model

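In this cell, the pretrained checkpoint name (`model_name`) is selected based on `model_type`. With `model_type` set to `"bert"`, `bert-base-cased` is used. Aliases such as `distilroberta` and `electra-base` also remap `model_type` to the underlying architecture name.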

In [None]:
model_type = "bert"
model_name = "bert-base-cased"
if model_type == "bert":
    model_name = "bert-base-cased"

elif model_type == "roberta":
    model_name = "roberta-base"

elif model_type == "distilbert":
    model_name = "distilbert-base-cased"

elif model_type == "distilroberta":
    model_type = "roberta"
    model_name = "distilroberta-base"

elif model_type == "electra-base":
    model_type = "electra"
    model_name = "google/electra-base-discriminator"

elif model_type == "electra-small":
    model_type = "electra"
    model_name = "google/electra-small-discriminator"

elif model_type == "xlnet":
    model_name = "xlnet-base-cased"
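
The same selection can be written as a lookup table, which keeps each alias and its checkpoint on one line. This is an alternative sketch, not the notebook's code: `MODEL_ALIASES` and `resolve_model` are hypothetical names, and the mapping simply restates the branches above.

```python
# Alias -> (architecture passed as model_type, checkpoint passed as model_name).
MODEL_ALIASES = {
    "bert": ("bert", "bert-base-cased"),
    "roberta": ("roberta", "roberta-base"),
    "distilbert": ("distilbert", "distilbert-base-cased"),
    "distilroberta": ("roberta", "distilroberta-base"),
    "electra-base": ("electra", "google/electra-base-discriminator"),
    "electra-small": ("electra", "google/electra-small-discriminator"),
    "xlnet": ("xlnet", "xlnet-base-cased"),
}

def resolve_model(alias):
    """Return (model_type, model_name) for an alias, or raise on an unknown alias."""
    try:
        return MODEL_ALIASES[alias]
    except KeyError:
        raise ValueError(f"Unknown model alias: {alias!r}") from None

model_type, model_name = resolve_model("distilroberta")
print(model_type, model_name)  # roberta distilroberta-base
```

One behavioral difference is deliberate: the `if`/`elif` chain silently keeps the `bert-base-cased` default for an unrecognized `model_type`, whereas this sketch raises a `ValueError`.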

### Cell 8: Configuring the Model

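In this cell, a `QuestionAnsweringArgs()` object is configured with a training batch size of 16, evaluation during training, an n-best size of 3, and 8 training epochs. Note that this `model_args` object is not passed to the model later: Cell 10 uses the `train_args` dictionary from Cell 9 instead.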

In [None]:
# Configure the model
model_args = QuestionAnsweringArgs()
model_args.train_batch_size = 16
model_args.evaluate_during_training = True
model_args.n_best_size = 3
model_args.num_train_epochs = 8

### Cell 9: Advanced Training Methodology

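In this cell, a fuller set of training options is defined in a dictionary named `train_args`. It repeats some settings from Cell 8 with different values (for example, `train_batch_size` is 128 here versus 16 there), and it is the configuration actually passed to the model in Cell 10.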

In [None]:
# Advanced methodology
train_args = {
    "reprocess_input_data": True,
    "overwrite_output_dir": True,
    "use_cached_eval_features": True,
    "output_dir": f"outputs/{model_type}",
    "best_model_dir": f"outputs/{model_type}/best_model",
    "evaluate_during_training": True,
    "max_seq_length": 128,
    "num_train_epochs": 8,
    "evaluate_during_training_steps": 1000,
    "wandb_project": "Question Answer Application",
    "wandb_kwargs": {"name": model_name},
    "save_model_every_epoch": False,
    "save_eval_checkpoints": False,
    "n_best_size":3,
    "train_batch_size": 128,
    "eval_batch_size": 64,
}
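
How `train_batch_size`, `num_train_epochs`, and `evaluate_during_training_steps` interact can be estimated with simple arithmetic. The sketch below is an approximation under stated assumptions: the feature count of 10,000 is hypothetical (the notebook does not state the dataset size), one optimizer step is taken per batch (no gradient accumulation), and any evaluation the library runs at epoch boundaries is not counted.

```python
import math

def training_schedule(num_features, train_batch_size, num_train_epochs, eval_every_steps):
    """Estimate steps per epoch, total steps, and how many step-based evaluations occur."""
    steps_per_epoch = math.ceil(num_features / train_batch_size)
    total_steps = steps_per_epoch * num_train_epochs
    step_evals = total_steps // eval_every_steps
    return steps_per_epoch, total_steps, step_evals

# num_features is a hypothetical dataset size; the other values come from train_args.
steps_per_epoch, total_steps, step_evals = training_schedule(
    num_features=10_000,
    train_batch_size=128,
    num_train_epochs=8,
    eval_every_steps=1000,
)
print(steps_per_epoch, total_steps, step_evals)  # 79 632 0
```

Under these assumptions the run finishes before step 1000, so the step-based evaluation would never fire. A smaller `evaluate_during_training_steps` would be needed on a dataset of that size.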

### Cell 10: Initializing Question Answering Model

In this cell, a Question Answering Model is initialized using the specified `model_type`, `model_name`, and training arguments (`train_args`).

In [None]:
model = QuestionAnsweringModel(
    model_type, model_name, args=train_args, use_cuda=False
)

In [None]:
# Remove output folder
!rm -rf outputs

### Cell 12: Training the Model

In this cell, the initialized model is trained on the `train` data, with the `test` data supplied as `eval_data` for evaluation during training.

In [None]:
# Train the model
model.train_model(train, eval_data=test)

### Cell 13: Evaluating the Model

In this cell, the trained model is evaluated on the `test` data.

In [None]:
# Evaluate the model
result, texts = model.eval_model(test)

### Cell 14: Making Predictions with the Model

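In this cell, the input for prediction is defined: a list of dictionaries, each with a `context` and a list of `qas` entries containing a `question` and an `id`. The prediction itself runs in the next cell.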

In [None]:
# Define the input for prediction
to_predict = [
    {
        "context": "Vin is a Mistborn of great power and skill.",
        "qas": [
            {
                "question": "What is Vin's speciality?",
                "id": "0",
            }
        ],
    }
]
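
When several questions share one context, building this structure by hand gets repetitive. A small helper can generate it; `build_prediction_input` is a hypothetical name introduced here for illustration, and it assigns sequential string IDs starting at "0".

```python
def build_prediction_input(context, questions):
    """Build the list-of-dicts structure shown above from one context and its questions."""
    return [
        {
            "context": context,
            "qas": [
                {"question": question, "id": str(index)}
                for index, question in enumerate(questions)
            ],
        }
    ]

to_predict = build_prediction_input(
    "Vin is a Mistborn of great power and skill.",
    ["What is Vin's speciality?"],
)
print(to_predict)
```

The IDs only need to be unique within the input, since they are used to match each answer back to its question.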

### Cell 15: Generating Predictions

In this cell, predictions are generated using the trained model on the provided input.

In [None]:
answers, probabilities = model.predict(to_predict)

print(answers)
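
With `n_best_size` set to 3, each question is expected to come back with up to three candidate answers. The sketch below assumes `answers` is a list of `{"id", "answer"}` dictionaries and `probabilities` a parallel list of `{"id", "probability"}` dictionaries; that layout is an assumption about the library's return value, and the strings and numbers are hypothetical placeholders, not real model output.

```python
# Hypothetical placeholder values in the assumed return layout (not real model output).
answers = [{"id": "0", "answer": ["candidate A", "candidate B", "candidate C"]}]
probabilities = [{"id": "0", "probability": [0.61, 0.27, 0.12]}]

def best_answers(answers, probabilities):
    """Map each question id to its (answer, probability) pair with the highest probability."""
    probs_by_id = {entry["id"]: entry["probability"] for entry in probabilities}
    best = {}
    for entry in answers:
        candidates = zip(entry["answer"], probs_by_id[entry["id"]])
        best[entry["id"]] = max(candidates, key=lambda pair: pair[1])
    return best

print(best_answers(answers, probabilities))
```

Keeping the probability next to the chosen answer makes it easy to apply a confidence threshold before showing a result.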