## Prerequisites

torch==1.1.0

In [125]:
import random
from collections import Counter

import numpy as np 
import pandas as pd 
import torch 
import torch.nn as nn 

from gensim.models import KeyedVectors

from sklearn.metrics import classification_report, accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer

from string import punctuation
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

import nltk
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('wordnet')

lemmatizer = WordNetLemmatizer() 
stop_words = set(stopwords.words('english'))

[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


In [58]:
from google.colab import drive
drive.mount('/content/drive')
df = pd.read_csv("/content/drive/My Drive/train.csv")

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [59]:

df.head()

Unnamed: 0,id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate
0,0000997932d777bf,Explanation\nWhy the edits made under my usern...,0,0,0,0,0,0
1,000103f0d9cfb60f,D'aww! He matches this background colour I'm s...,0,0,0,0,0,0
2,000113f07ec002fd,"Hey man, I'm really not trying to edit war. It...",0,0,0,0,0,0
3,0001b41b1c6bb37e,"""\nMore\nI can't make any real suggestions on ...",0,0,0,0,0,0
4,0001d958c54c6e35,"You, sir, are my hero. Any chance you remember...",0,0,0,0,0,0


In this notebook you will learn the basics of PyTorch, a framework that will help you build simple neural networks for this task.   
The first neural network we will train is a feed-forward neural network (FFNN) with one fully connected hidden layer.  
An FFNN can have one or more fully connected layers; it is also called an MLP (multilayer perceptron). 

Read about PyTorch here:  
https://en.wikipedia.org/wiki/PyTorch

And here:

https://neurohive.io/ru/tutorial/glubokoe-obuchenie-s-pytorch/

While reading these articles you will probably meet some unfamiliar terms: 
backpropagation algorithm, gradient descent, activation function, loss function, etc.  
Please look up why each of them is needed. 

Answer these questions about neural networks: 

1. In previous tasks we created features manually, tried to weight them, and selected special words for vectorization. How does deep learning solve this problem? 

2. Why do we work with tensors in PyTorch?

3. Why do we need activation functions in our models? Look up the XOR problem for an MLP without an activation function and use it in your answer. 

4. What is a gradient? Why do we need the gradient descent algorithm, and which problem does it solve? 

5. What is the backpropagation algorithm? 

6. What is a loss function? 

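1. A neural network learns features by itself: its hidden layers build useful representations of the input during training. 

2. Tensors are faster and more convenient in the context of neural networks: they support vectorized operations, can be moved to a GPU, and can track gradients automatically.

3. We need activation functions to add non-linearity to the network. Without an activation function a stack of linear layers is still a linear model, so it can never learn XOR, because XOR is not linearly separable. 

4. A gradient is the vector of partial derivatives of a function. Gradient descent is an optimization algorithm; we need it to find a minimum of the loss function.

5. Backpropagation is an algorithm that applies the chain rule to compute the gradient of the loss with respect to every weight. Gradient descent then uses these gradients to update the weights. 

6. It is the function we try to minimize while training the network; it measures how far the predictions are from the true labels.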
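
Answer 3 can be checked numerically. The sketch below uses NumPy with arbitrary random weights (the shapes and variable names are illustrative assumptions, not part of the task). It shows that two stacked linear layers without an activation are equivalent to one linear layer, and that any such model is forced into an equality that XOR cannot satisfy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" with arbitrary weights (illustrative shapes: 2 -> 4 -> 1).
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=1)

# The four XOR inputs: 00, 01, 10, 11.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Stacked linear layers without an activation function...
stacked = (X @ W1.T + b1) @ W2.T + b2

# ...equal a single linear layer with merged weights.
W_merged = W2 @ W1
b_merged = W2 @ b1 + b2
single = X @ W_merged.T + b_merged
print(np.allclose(stacked, single))  # True

# Every affine map f satisfies f(00) + f(11) == f(01) + f(10).
# XOR needs f(01) and f(10) above the decision threshold and
# f(00) and f(11) below it, which contradicts this equality.
lhs = single[0] + single[3]
rhs = single[1] + single[2]
print(np.allclose(lhs, rhs))  # True
```

Putting a non-linearity such as ReLU between the two layers breaks the merge step, which is what lets an MLP separate the XOR classes.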

Read the following article:

https://en.wikipedia.org/wiki/Feedforward_neural_network

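What is an FFNN? 

A neural network without recurrent layers: information flows in one direction, from the input through the hidden layers to the output, with no cycles.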

## PyTorch basics

#### Autograd

In [60]:
# Creating a tensor:
x = torch.ones(1, requires_grad=True)

print(x.grad)    # returns None

None


x.grad is None because no backward pass has been run yet: PyTorch fills in .grad only after .backward() is called on a result computed from x.

In [61]:
x = torch.ones(1, requires_grad=True)
y = 20 + x
z = (y ** 2) * 2 
z.backward()     # auto gradient calculation

print(x.grad)    # ∂z/∂x 

tensor([84.])
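
The value 84 can be verified by hand: z = 2 * (20 + x)^2, so ∂z/∂x = 4 * (20 + x) = 84 at x = 1. The following sketch repeats the computation and also shows, as a small extension of the cell above, that gradients accumulate over repeated backward calls unless they are reset. The variable names are illustrative.

```python
import torch

x = torch.ones(1, requires_grad=True)
print(x.grad)                      # None: backward() has not been called yet

z = 2 * (20 + x) ** 2
z.backward()
first_grad = x.grad.clone()        # autograd result
manual = 4 * (20 + x.detach())     # chain rule by hand: dz/dx = 4 * (20 + x)
print(first_grad, manual)

# A second backward pass ADDS to the stored gradient.
z2 = 2 * (20 + x) ** 2
z2.backward()
accumulated = x.grad.clone()       # 84 + 84

# Reset before the next step; optimizer.zero_grad() does this in training loops.
x.grad.zero_()
```

This accumulation is the reason the training loops in this notebook call optimizer.zero_grad() on every iteration.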


### Prepare the data

In [62]:
df.head()

Unnamed: 0,id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate
0,0000997932d777bf,Explanation\nWhy the edits made under my usern...,0,0,0,0,0,0
1,000103f0d9cfb60f,D'aww! He matches this background colour I'm s...,0,0,0,0,0,0
2,000113f07ec002fd,"Hey man, I'm really not trying to edit war. It...",0,0,0,0,0,0
3,0001b41b1c6bb37e,"""\nMore\nI can't make any real suggestions on ...",0,0,0,0,0,0
4,0001d958c54c6e35,"You, sir, are my hero. Any chance you remember...",0,0,0,0,0,0


In [0]:
def preprocess_text(tokenizer, lemmatizer, stop_words, punctuation, text): 
    # Lowercase and tokenize the text, then lemmatize every token
    tokens = tokenizer(text.lower())
    lemmas = [lemmatizer.lemmatize(token) for token in tokens]
    # Keep lemmas of 5 to 19 characters that are neither stop words nor punctuation
    return [
        token for token in lemmas
        if token not in stop_words and token not in punctuation and 4 < len(token) < 20
    ]

df['cleaned'] = df.comment_text.apply(lambda x: preprocess_text(word_tokenize, lemmatizer, stop_words, punctuation, x))

In [0]:
# Cast the label columns to int so that they can be summed
for column in df.columns: 
    if column not in ['id', 'comment_text', 'cleaned']:
        df[column] = df[column].astype('int32')
        
# Create a toxicity column (sums all of the toxic labels)
df['toxicity'] = df.iloc[:,2:8].sum(axis=1)

# Clean data - where toxicity is == 0 
clean = df[df['toxicity'] == 0]
# Messages, which were labelled as obscene
obscene = df[df['obscene'] == 1]

# Create a dataset for binary classification 
# pd.concat also works on pandas >= 2.0, where DataFrame.append no longer exists
df_binary = pd.concat([clean, obscene], ignore_index=True, sort=False)

In [0]:
# Shuffle
df_binary = df_binary.sample(frac=1)

# Reset index of the pd.DataFrame
df_binary.reset_index(inplace=True)

In [66]:
df_binary.head()

Unnamed: 0,index,id,comment_text,toxic,severe_toxic,obscene,threat,insult,identity_hate,cleaned,toxicity
0,4889,0e9009250597b838,It's a temporary measure; it's been there les...,0,0,0,0,0,0,"[temporary, measure, every, wikipedia, result,...",0
1,29161,5690a74218be68d3,"""\n\nI, for one, wish there had been discussio...",0,0,0,0,0,0,"[discussion, prior, relegating, article, coupl...",0
2,5741,11299cd34577b931,"Please feel free to ignore this warning, addin...",0,0,0,0,0,0,"[please, ignore, warning, adding, source, arti...",0
3,149900,93e24c1fec3135d3,Why do you keep blocking me? \n\nYou are a big...,1,0,1,0,1,0,"[blocking, idiot, dmacks, block, change, negot...",3
4,111552,9865b015628af9cc,The current Signpost suggests we will be seein...,0,0,0,0,0,0,"[current, signpost, suggests, seeing, sooner, ...",0


In [67]:
# Load W2V model 

we_model = KeyedVectors.load_word2vec_format('/content/drive/My Drive/GoogleNews-vectors-negative300.bin.gz', binary=True)



In [0]:
# Make stratified sampling, for example: select 500 examples with obscene == 1, and 500 clean examples. 
from sklearn.model_selection import train_test_split

# Select only a small sample of your data (20%), do not train your model on all of the data available 
# But to make the task easier, make a stratified selection 
# (number of 1 labels would be approximately equal to number of 0 labels)
df_sample = df_binary.sample(20000)

# Split the data into stratified training and test sets 
df_train, df_test = train_test_split(df_sample, test_size = 0.3, stratify = df_sample.toxic)

In [69]:
print("Train shape: {}".format(df_train.shape))
print("Test shape: {}".format(df_test.shape))

Train shape: (14000, 11)
Test shape: (6000, 11)


In [0]:
def get_vectors(df_sample): 
    '''
    This function processes a DataFrame and creates lists of
        vectors, labels and raw documents. Documents without any known token are skipped. 
        
    Args: 
        df_sample: pd.DataFrame - DF to vectorize
    Returns: 
        X: list - Vectorized documents, each value in a list is a torch.tensor
        labels: list - Labels for each document, each value in a list is a torch.tensor
        documents: list - List of the raw texts of the vectorized documents 
    '''
    
    # Obtain vectors for documents, vectorized documents list and labels
    X, labels, documents = [], [], []
    for i, (document, tokens, label) in enumerate(zip(df_sample.comment_text, df_sample.cleaned, df_sample.toxic)):
        row_vectors = []
        for kw in tokens:
            try: 
                row_vectors.append(we_model[kw])
            except (IndexError, KeyError): 
                continue
        if not row_vectors:
            continue
        row_vectors = np.asarray(row_vectors)
        vec = row_vectors.mean(axis=0)
        X.append(torch.tensor(vec))
        documents.append(document)
        labels.append(torch.tensor(label, dtype=torch.float))
        
    return X, labels, documents
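
The core of get_vectors is mean pooling: a document vector is the average of the vectors of its known tokens. A minimal sketch with a toy 3-dimensional dictionary standing in for the 300-dimensional word2vec model (toy_vectors and mean_vector are hypothetical names used only here):

```python
import numpy as np

# Toy 3-dimensional "embeddings"; a stand-in for the word2vec vectors.
toy_vectors = {
    "edit": np.array([1.0, 0.0, 2.0]),
    "article": np.array([3.0, 2.0, 0.0]),
}

def mean_vector(tokens, vectors):
    """Average the vectors of known tokens; return None if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    return np.mean(known, axis=0)

doc_vec = mean_vector(["edit", "article", "zzzz"], toy_vectors)
print(doc_vec)                               # [2. 1. 1.]
print(mean_vector(["zzzz"], toy_vectors))    # None
```

The out-of-vocabulary token "zzzz" is ignored, and a document made only of unknown tokens yields no vector, which mirrors the `continue` branches in get_vectors.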

In [0]:
X_train, y_train, documents_train = get_vectors(df_train)
X_test, y_test, documents_test = get_vectors(df_test)

### How to create a simple NN: 

In [0]:
# Modify your model to work with batches, not only single item. 
''' TASK HERE'''

class FeedForward(nn.Module):
    
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.input_size = input_size
        self.hidden_size  = hidden_size
        
        self.fc1 = nn.Linear(self.input_size, self.hidden_size)
        self.relu = nn.ReLU()
        self.logits = nn.Linear(self.hidden_size, 1)
        self.sigmoid = nn.Sigmoid()
        
    def forward(self, x):
        # Makes a forward pass 
        hidden = self.fc1(x)
        relu = self.relu(hidden)
        logits = self.logits(relu)
        output = self.sigmoid(logits)
        return output
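
nn.Linear operates on the last dimension of its input, so a network built from these layers already accepts a batch of shape (batch_size, input_size). The sketch below uses a compact nn.Sequential stand-in with the same layer sizes as FeedForward(300, 200) and random inputs; both are assumptions made for illustration.

```python
import torch
import torch.nn as nn

# Compact stand-in with the same layers as FeedForward(300, 200).
net = nn.Sequential(
    nn.Linear(300, 200),
    nn.ReLU(),
    nn.Linear(200, 1),
    nn.Sigmoid(),
)

single = torch.randn(300)        # one document vector
batch = torch.randn(32, 300)     # 32 document vectors stacked row-wise

out_single = net(single)
out_batch = net(batch)
print(out_single.shape)  # torch.Size([1])
print(out_batch.shape)   # torch.Size([32, 1])

# squeeze(1) drops only the trailing dimension, so the result
# lines up with a label tensor of shape (32,).
probs = out_batch.squeeze(1)
print(probs.shape)       # torch.Size([32])
```

Calling squeeze(1) rather than squeeze() is a cautious choice: a bare squeeze() would also remove the batch dimension when a batch happens to contain a single example.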

In [73]:
# Initialise the model 
model = FeedForward(300, 200)

# Specify loss and optimization functions:

# specify loss function
criterion = nn.BCELoss()
# specify optimizer
optimizer = torch.optim.SGD(model.parameters(), lr = 0.01)

# Move model to the training mode
model.train()

# init n_epochs 
n_epochs = 10

# init number of iterations for one epoch 
# we want our model to walk through all of the training examples during the epoch 
# for batch_size == 1, number of iterations would be equal to number of examples 
# in the training set 
n_iters = len(X_train)

# initialise batch_size
# NOTE! for now it is equal to 1; you need to modify your model to make it possible to work with 
# batches during training, not only making an update for a single example 
batch_size = 1
for epoch in range(n_epochs):  
    epoch_loss = 0
    for idx in range(n_iters):
        
        # Selects only 1 sample, modify it to select N samples, N == batch_size
        ''' TASK HERE'''
        # idx = random.sample(range(len(X_train)), 1) # TIP: You can random sample N examples 
        
        optimizer.zero_grad()    # Reset gradients left over from the previous step

        # Select corresponding data from:
        # X (vectors) and labels - for calculating the loss and making a backward pass 
        # the backward pass computes gradients of the loss; the optimizer step then updates the weights 
        ''' TASK HERE'''
        x = X_train[idx]
        y_true = y_train[idx]

        y_pred = model(x)    # Forward pass
        loss = criterion(y_pred.squeeze(), y_true)    # Compute loss
        
        epoch_loss += loss.item() / n_iters
        loss.backward()   # Backward pass: compute gradients
        optimizer.step()  # Update weights
        
    print('Epoch {}: train loss: {}'.format(epoch, epoch_loss))

Epoch 0: train loss: 0.16041085071335473


KeyboardInterrupt: ignored

In [0]:
def make_predictions(model, X_test, y_test, documents_test, threshold): 
    n_prints = 0
    preds = []
    for example, label, document in zip(X_test, y_test, documents_test):
        pred = model(example)
        y_pred = int(pred.item() > threshold)
        preds.append(y_pred)
        
        # Print some examples with obscene documents texts and predicted and true labels 
        if label.item() == 1.0 and n_prints < 10:
            print("Predicted label: {}".format(y_pred))
            print("True label: {}".format(label.item()))
            print("Document: {}".format(document))
            print("*-*-"*20)
            n_prints += 1
        
    return preds

In [0]:
# Move model to the eval mode before making a prediction
model.eval()
preds = make_predictions(model, X_test, y_test, documents_test, threshold=0.5)

test_labels = [label.item() for label in y_test]

In [0]:
# Print a classification report: 
print(classification_report(test_labels, preds))
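
make_predictions calls the model once per example and leaves gradient tracking on. When the test vectors are stacked into one tensor, predictions can be computed in a single call. A hedged sketch, with a stand-in network and random inputs in place of the trained model and real test vectors:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the trained model and for the stacked test vectors.
net = nn.Sequential(nn.Linear(300, 200), nn.ReLU(), nn.Linear(200, 1), nn.Sigmoid())
X = torch.randn(5, 300)

net.eval()
with torch.no_grad():                    # no autograd bookkeeping during inference
    probs = net(X).squeeze(1)            # shape (5,)
batch_preds = (probs > 0.5).long().tolist()

# The per-example loop should give the same labels, up to floating-point noise.
loop_preds = [int(net(x).item() > 0.5) for x in X]
print(batch_preds, loop_preds)
```

The resulting list of 0/1 labels can be passed to classification_report in the same way as the output of make_predictions.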

## Task 1: 

#### Find all of the ''' TASK HERE ''' messages. 

1. Create a stratified dataset and make your classes balanced. Train the model and try to beat the initial score.

2. While vectorizing with the W2V model, add tf-idf weighting; look at TfidfVectorizer in sklearn. 

3. Add a batch size and modify your model architecture so that it can process batches, not only single items. 

4. Change hidden_size, n_layers, the activation function, etc. to modify your model. 

5. Tweak the learning rate and see what happens if it is too small or too big (0.0001 / 0.8 for example).
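
For item 1, one possible way to balance the classes is to draw the same number of rows from each class and then shuffle. The sketch below runs on a toy DataFrame; balanced_sample, its parameters and the toy data are hypothetical and only illustrate the idea.

```python
import pandas as pd

# Toy frame standing in for df_binary: 8 clean rows, 3 obscene rows.
toy = pd.DataFrame({
    "comment_text": ["text {}".format(i) for i in range(11)],
    "obscene": [0] * 8 + [1] * 3,
})

def balanced_sample(df, label_col, n_per_class, seed=0):
    """Draw n_per_class rows from each class, then shuffle the result."""
    parts = [
        df[df[label_col] == value].sample(n=n_per_class, random_state=seed)
        for value in sorted(df[label_col].unique())
    ]
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)

balanced = balanced_sample(toy, "obscene", n_per_class=3)
print(balanced["obscene"].value_counts().to_dict())
```

On the real data, n_per_class is bounded by the size of the smaller class unless sampling with replacement is used.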

In [0]:
# Tip:
# Use tf-idf scores calculated by sklearn:

def dummy_fun(doc):
    # This function replaces the default tokenizer in sklearn. 
    # Passing already tokenized documents to the tf-idf vectorizer 
    # is much faster than tokenizing them again 
    return doc

def make_predictions(model, X_test, threshold=0.5): 
    preds = []
    for example in X_test:
        pred = model(example)
        y_pred = int(pred.item() > threshold)
        preds.append(y_pred)

    return preds

def get_idf(tokenized_docs, max_features=180000):
    ''' Returns an IDF dictionary: 
            key: word,
            value: IDF (inverse document frequency) score. 
    '''
    vectorizer = TfidfVectorizer(
        min_df=3,
        max_features=max_features,
        analyzer='word',
        tokenizer=dummy_fun,
        preprocessor=dummy_fun,
        token_pattern=None,
        ngram_range=(1, 1))

    vectorizer.fit(tokenized_docs)
    # NOTE: on scikit-learn >= 1.2 use vectorizer.get_feature_names_out() instead
    idf_dict = dict(zip(vectorizer.get_feature_names(), vectorizer.idf_))
    
    return idf_dict

def get_weighted_vectors(df_sample): 
    '''
    This function processes a DataFrame and creates lists of
        vectors, labels and raw documents. Each word vector is scaled by the word's IDF score. 
        
    Args: 
        df_sample: pd.DataFrame - DF to vectorize
    Returns: 
        X: list - Vectorized documents, each value in a list is a torch.tensor
        labels: list - Labels for each document, each value in a list is a torch.tensor
        documents: list - List of the raw texts of the vectorized documents 
    '''
    
    tfidf_dict = get_idf(df_sample.cleaned)
    # Obtain vectors for documents, vectorized documents list and labels
    X, labels, documents = [], [], []
    for i, (document, tokens, label) in enumerate(zip(df_sample.comment_text, df_sample.cleaned, df_sample.toxic)):
        row_vectors = []
        for kw in tokens:
            try: 
                row_vectors.append(we_model[kw]*tfidf_dict[kw])
            except (IndexError, KeyError): 
                continue
        if not row_vectors:
            continue
        row_vectors = np.asarray(row_vectors)
        vec = row_vectors.mean(axis=0)
        X.append(torch.tensor(vec))
        documents.append(document)
        labels.append(torch.tensor(label, dtype=torch.float))
        
    return X, labels, documents
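
get_weighted_vectors takes the plain mean of the IDF-scaled word vectors. A common alternative, shown here only for comparison, is a normalised weighted average that divides by the sum of the weights. The toy vectors and weights below are assumptions chosen to keep the arithmetic easy to follow.

```python
import numpy as np

vectors = np.array([[1.0, 0.0], [0.0, 1.0]])   # toy word vectors
idf = np.array([3.0, 1.0])                      # toy IDF weights

# What get_weighted_vectors computes: the mean of the scaled vectors.
mean_of_scaled = (vectors * idf[:, None]).mean(axis=0)

# Normalised alternative: sum(w_i * v_i) / sum(w_i).
weighted_avg = np.average(vectors, axis=0, weights=idf)

print(mean_of_scaled)  # [1.5 0.5]
print(weighted_avg)    # [0.75 0.25]
```

Both variants point in the same direction here; they differ in scale, because the plain mean grows with the magnitude of the IDF scores while the normalised average does not.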

In [0]:
df_sample = df_binary.sample(100000)
df_train, df_test = train_test_split(df_sample, test_size = 0.3, stratify = df_sample.toxic)

X_train, y_train, documents_train = get_weighted_vectors(df_train)
X_test, y_test, documents_test = get_weighted_vectors(df_test)

X_train = torch.stack(X_train)
y_train = torch.stack(y_train)

X_test = torch.stack(X_test)
y_test = torch.stack(y_test)
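
Once the vectors and labels are stacked into tensors, mini-batches can also be drawn with torch.utils.data instead of random.sample. A minimal sketch, with random tensors standing in for X_train and y_train:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Random stand-ins for the stacked X_train / y_train tensors.
X = torch.randn(10, 300)
y = torch.randint(0, 2, (10,)).float()

loader = DataLoader(TensorDataset(X, y), batch_size=4, shuffle=True)

batch_shapes = []
for x_batch, y_batch in loader:   # one pass over the loader is one epoch
    batch_shapes.append((tuple(x_batch.shape), tuple(y_batch.shape)))

print(batch_shapes)
# [((4, 300), (4,)), ((4, 300), (4,)), ((2, 300), (2,))]
```

With shuffle=True every example is visited exactly once per epoch, whereas independent random.sample calls may repeat some examples and miss others within the same epoch.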

In [0]:
class FeedForward(nn.Module):
    
    def __init__(self, input_size_1, hidden_size_1, hidden_size_2):
        super().__init__()
        self.input_size_1 = input_size_1
        self.hidden_size_1 = hidden_size_1
        self.hidden_size_2 = hidden_size_2
        
        self.fc1 = nn.Linear(self.input_size_1, self.hidden_size_1)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(self.hidden_size_1, self.hidden_size_2)
        self.logits = nn.Linear(self.hidden_size_2, 1)
        self.sigmoid = nn.Sigmoid()
        
    def forward(self, x):
        hidden_1 = self.fc1(x)
        relu = self.relu(hidden_1)
        hidden_2 = self.fc2(relu)
        relu = self.relu(hidden_2)
        logits = self.logits(relu)
        output = self.sigmoid(logits)
        return output

In [131]:
model = FeedForward(300, 300, 150)
criterion = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr = 0.001)
model.train()
n_epochs = 120
n_iters = 100
batch_size = len(X_train) // n_iters

for epoch in range(n_epochs):  
    epoch_loss = 0
    for idx in range(n_iters):

        optimizer.zero_grad()

        idx = random.sample(range(len(X_train)), batch_size)

        x = X_train[idx]
        y_true = y_train[idx]

        y_pred = model(x)
        loss = criterion(y_pred.squeeze(), y_true)
        epoch_loss += loss.item() / n_iters
        
        loss.backward()
        optimizer.step()

    print('Epoch {}: train loss: {};'.format(epoch, epoch_loss))

model.eval()

preds_train = make_predictions(model, X_train)
preds_test = make_predictions(model, X_test)

train_labels = [label.item() for label in y_train]
test_labels = [label.item() for label in y_test]


print(classification_report(train_labels, preds_train))
print(classification_report(test_labels, preds_test))

Epoch 0: train loss: 0.6211396008729934;
Epoch 1: train loss: 0.5746709966659546;
Epoch 2: train loss: 0.5313718396425247;
Epoch 3: train loss: 0.4879178181290627;
Epoch 4: train loss: 0.4476482385396957;
Epoch 5: train loss: 0.41142268151044836;
Epoch 6: train loss: 0.377664329111576;
Epoch 7: train loss: 0.34304334461688996;
Epoch 8: train loss: 0.31692695081233985;
Epoch 9: train loss: 0.2937179663777353;
Epoch 10: train loss: 0.27289988532662396;
Epoch 11: train loss: 0.2615836913883685;
Epoch 12: train loss: 0.24828577265143412;
Epoch 13: train loss: 0.238779913932085;
Epoch 14: train loss: 0.2307542480528354;
Epoch 15: train loss: 0.2225910311937331;
Epoch 16: train loss: 0.22129348337650295;
Epoch 17: train loss: 0.21392185091972352;
Epoch 18: train loss: 0.2108626137673854;
Epoch 19: train loss: 0.2098289927840232;
Epoch 20: train loss: 0.20392102241516105;
Epoch 21: train loss: 0.19967376783490176;
Epoch 22: train loss: 0.19969317376613616;
Epoch 23: train loss: 0.199223379045

In [132]:
model = FeedForward(300, 300, 150)
criterion = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr = 0.01)
model.train()
n_epochs = 60
n_iters = 100
batch_size = len(X_train) // n_iters

for epoch in range(n_epochs):  
    epoch_loss = 0
    for idx in range(n_iters):

        optimizer.zero_grad()

        idx = random.sample(range(len(X_train)), batch_size)

        x = X_train[idx]
        y_true = y_train[idx]

        y_pred = model(x)
        loss = criterion(y_pred.squeeze(), y_true)
        epoch_loss += loss.item() / n_iters
        
        loss.backward()
        optimizer.step()

    print('Epoch {}: train loss: {};'.format(epoch, epoch_loss))

model.eval()

preds_train = make_predictions(model, X_train)
preds_test = make_predictions(model, X_test)

train_labels = [label.item() for label in y_train]
test_labels = [label.item() for label in y_test]


print(classification_report(train_labels, preds_train))
print(classification_report(test_labels, preds_test))

Epoch 0: train loss: 0.45577899217605594;
Epoch 1: train loss: 0.2343117994070052;
Epoch 2: train loss: 0.19893284708261494;
Epoch 3: train loss: 0.18276584520936015;
Epoch 4: train loss: 0.16685973629355433;
Epoch 5: train loss: 0.15708563975989823;
Epoch 6: train loss: 0.15171488553285595;
Epoch 7: train loss: 0.14197112061083317;
Epoch 8: train loss: 0.13539109043776984;
Epoch 9: train loss: 0.1317626535892487;
Epoch 10: train loss: 0.12730546973645687;
Epoch 11: train loss: 0.12257180735468866;
Epoch 12: train loss: 0.12220915049314503;
Epoch 13: train loss: 0.12117244623601438;
Epoch 14: train loss: 0.11656584806740283;
Epoch 15: train loss: 0.11489642336964606;
Epoch 16: train loss: 0.11434262789785857;
Epoch 17: train loss: 0.11873188726603981;
Epoch 18: train loss: 0.1103391744196415;
Epoch 19: train loss: 0.11163479052484038;
Epoch 20: train loss: 0.10562747403979296;
Epoch 21: train loss: 0.10999191753566265;
Epoch 22: train loss: 0.11200471796095367;
Epoch 23: train loss: 0.

In [133]:
model = FeedForward(300, 300, 150)
criterion = nn.BCELoss()
optimizer = torch.optim.SGD(model.parameters(), lr = 0.8)
model.train()
n_epochs = 60
n_iters = 100
batch_size = len(X_train) // n_iters

for epoch in range(n_epochs):  
    epoch_loss = 0
    for idx in range(n_iters):

        optimizer.zero_grad()

        idx = random.sample(range(len(X_train)), batch_size)

        x = X_train[idx]
        y_true = y_train[idx]

        y_pred = model(x)
        loss = criterion(y_pred.squeeze(), y_true)
        epoch_loss += loss.item() / n_iters
        
        loss.backward()
        optimizer.step()

    print('Epoch {}: train loss: {};'.format(epoch, epoch_loss))

model.eval()

preds_train = make_predictions(model, X_train)
preds_test = make_predictions(model, X_test)

train_labels = [label.item() for label in y_train]
test_labels = [label.item() for label in y_test]


print(classification_report(train_labels, preds_train))
print(classification_report(test_labels, preds_test))

Epoch 0: train loss: 0.12322368115186688;
Epoch 1: train loss: 0.09818455047905446;
Epoch 2: train loss: 0.09242620784789324;
Epoch 3: train loss: 0.08677276540547611;
Epoch 4: train loss: 0.08317872837185855;
Epoch 5: train loss: 0.08250761818140746;
Epoch 6: train loss: 0.08085583455860613;
Epoch 7: train loss: 0.07468647360801699;
Epoch 8: train loss: 0.06834786148741841;
Epoch 9: train loss: 0.06840197412297129;
Epoch 10: train loss: 0.06276670208200812;
Epoch 11: train loss: 0.0641672505438328;
Epoch 12: train loss: 0.05692399621009827;
Epoch 13: train loss: 0.05894658770412206;
Epoch 14: train loss: 0.052016238626092656;
Epoch 15: train loss: 0.05173601506277919;
Epoch 16: train loss: 0.07829651961103083;
Epoch 17: train loss: 0.05043162874877455;
Epoch 18: train loss: 0.04616866538301109;
Epoch 19: train loss: 0.044699518550187355;
Epoch 20: train loss: 0.046010530432686214;
Epoch 21: train loss: 0.03971729012206197;
Epoch 22: train loss: 0.0398909389413893;
Epoch 23: train loss
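
The effect of the learning rate is easiest to see on a toy problem. The sketch below minimises f(w) = w**2 with plain gradient descent; the function, the starting point and the three rates are assumptions chosen for illustration and are not the values used in the runs above.

```python
def gradient_descent(lr, steps=20, w=1.0):
    """Minimise f(w) = w**2 (gradient 2*w), starting from w."""
    for _ in range(steps):
        w = w - lr * 2 * w
    return w

for lr in (0.0001, 0.1, 1.1):
    # Distance from the minimum at w = 0 after 20 steps.
    print(lr, abs(gradient_descent(lr)))
```

With the tiny rate the weight barely moves, with the moderate rate it approaches 0, and with the large rate each step overshoots and the distance grows. Which rate counts as too big depends on the problem: in the three runs above, lr = 0.8 still reached the lowest training loss among the epochs shown.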



## Task 2, advanced

Working with the nn.Embedding layer 

https://pytorch.org/tutorials/beginner/nlp/word_embeddings_tutorial.html 

Read the example below. 

Please modify your initial version of the SingleLayerPerceptron model by adding one more layer: 

1. Define your vocabulary size  
2. Add an nn.Embedding layer to the model architecture (vocabulary_size, embedding_size) 
3. Retrain your model and check whether the metrics improve.
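
One possible shape for such a model is sketched below: token ids are embedded, averaged over the non-padding positions, and passed to the same kind of classifier as before. The class name, the sizes and the use of id 0 for padding are assumptions made for illustration; this is not the reference solution of the task.

```python
import torch
import torch.nn as nn

class EmbeddingClassifier(nn.Module):
    """Hypothetical sketch: token ids -> embeddings -> mean -> classifier."""

    def __init__(self, vocabulary_size, embedding_size, hidden_size, padding_idx=0):
        super().__init__()
        self.padding_idx = padding_idx
        self.embedding = nn.Embedding(vocabulary_size, embedding_size,
                                      padding_idx=padding_idx)
        self.fc1 = nn.Linear(embedding_size, hidden_size)
        self.relu = nn.ReLU()
        self.logits = nn.Linear(hidden_size, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, token_ids):
        # token_ids: LongTensor of shape (batch, seq_len), padded with padding_idx
        embedded = self.embedding(token_ids)                  # (batch, seq_len, emb)
        mask = (token_ids != self.padding_idx).unsqueeze(-1).float()
        summed = (embedded * mask).sum(dim=1)                 # (batch, emb)
        counts = mask.sum(dim=1).clamp(min=1.0)               # avoid division by zero
        doc_vec = summed / counts                             # mean over real tokens
        return self.sigmoid(self.logits(self.relu(self.fc1(doc_vec))))

clf = EmbeddingClassifier(vocabulary_size=100, embedding_size=16, hidden_size=8)
ids = torch.tensor([[5, 7, 9, 0], [3, 0, 0, 0]])   # two padded documents
out = clf(ids)
print(out.shape)  # torch.Size([2, 1])
```

Unlike the pretrained word2vec vectors, the embedding weights here start random and are learned together with the classifier.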

### Useful material for part 2: 

Refer to part 4.3 of the course:

https://stepik.org/lesson/262247/

It will help you understand how to use an nn.Embedding layer. 

#####  Let's create a vocabulary: 

In [0]:
def flat_nested(nested):
    flatten = []
    for item in nested:
        if isinstance(item, list):
            flatten.extend(item)
        else:
            flatten.append(item)
    return flatten

cnt_vocab = Counter(flat_nested(df.cleaned.tolist()))

In [0]:
threshold_count_l = 15
threshold_count_h = 500
threshold_len = 4
cleaned_vocab = [token for token, count in cnt_vocab.items() if 
                     threshold_count_h > count > threshold_count_l and len(token) > threshold_len
                ]
print("Vocab size: {}".format(len(cleaned_vocab)))

Vocab size: 13061


In [0]:
# Each token needs its own id 

token_to_id = {v: k for k, v in enumerate(sorted(cleaned_vocab))}
id_to_token = {v: k for k, v in token_to_id.items()}
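
Before token ids can be fed to an nn.Embedding layer, every document has to become a fixed-length list of ids. The sketch below reserves two extra ids for padding and unknown tokens and shifts the vocabulary ids accordingly. PAD_ID, UNK_ID, encode and the toy vocabulary are hypothetical; the mapping above starts at 0 and would need the same shift to make room for them.

```python
PAD_ID, UNK_ID = 0, 1   # hypothetical reserved ids, not part of the mapping above

toy_vocab = ["article", "source", "warning"]
toy_token_to_id = {token: i + 2 for i, token in enumerate(sorted(toy_vocab))}

def encode(tokens, token_to_id, max_len):
    """Map tokens to ids, truncate to max_len, and right-pad with PAD_ID."""
    ids = [token_to_id.get(token, UNK_ID) for token in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))

encoded = encode(["warning", "unknown", "source"], toy_token_to_id, max_len=5)
print(encoded)  # [4, 1, 3, 0, 0]
```

Lists produced this way all have length max_len, so they can be stacked into a single LongTensor and used as a batch.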