# LSTM Bot

## Project Overview

In this project, you will build a chatbot that can converse with you at the command line. The chatbot will use a Sequence to Sequence text generation architecture with an LSTM as it's memory unit. You will also learn to use pretrained word embeddings to improve the performance of the model. At the conclusion of the project, you will be able to show your chatbot to potential employers.

Additionally, you have the option to use pretrained word embeddings in your model. We have loaded Brown Embeddings from Gensim in the starter code below. You can compare the performance of your model with pre-trained embeddings against a model without the embeddings.



---



A sequence to sequence model (Seq2Seq) has two components:
- An Encoder consisting of an embedding layer and LSTM unit.
- A Decoder consisting of an embedding layer, LSTM unit, and linear output unit.

The Seq2Seq model works by accepting an input into the Encoder, passing the hidden state from the Encoder to the Decoder, which the Decoder uses to output a series of token predictions.

## Dependencies

- Pytorch
- Numpy
- Pandas
- NLTK
- Gzip
- Gensim


Please choose a dataset from the Torchtext website. We recommend looking at the Squad dataset first. Here is a link to the website where you can view your options:

- https://pytorch.org/text/stable/datasets.html





In [8]:
!pip install torchdata==0.3.0

Defaulting to user installation because normal site-packages is not writeable
Collecting torch==1.11.0
  Using cached torch-1.11.0-cp37-cp37m-manylinux1_x86_64.whl (750.6 MB)
[31mERROR: torchvision 0.8.2+cu110 has requirement torch==1.7.1, but you'll have torch 1.11.0 which is incompatible.[0m
[31mERROR: torchaudio 0.7.2 has requirement torch==1.7.1, but you'll have torch 1.11.0 which is incompatible.[0m
Installing collected packages: torch
  Attempting uninstall: torch
    Found existing installation: torch 1.7.1+cu110
    Uninstalling torch-1.7.1+cu110:
      Successfully uninstalled torch-1.7.1+cu110
Successfully installed torch-1.11.0


In [1]:
!nvidia-smi

Sun Apr 16 16:46:17 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla K80           Off  | 00000000:00:04.0 Off |                    0 |
| N/A   70C    P8    32W / 149W |      0MiB / 11441MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+---------------------------------------------------------------------------

In [2]:
import gensim
import nltk
import numpy as np
import pandas as pd
import gzip
import torch
from nltk.corpus import brown
from nltk.tokenize import RegexpTokenizer
import torch.nn as nn
from sklearn.model_selection import KFold
import random

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

In [3]:
torch.__version__

'1.11.0+cu102'

In [4]:
torch.__version__

'1.7.1+cu110'

In [3]:
torch.__version__

'1.11.0+cu102'

In [5]:
nltk.download('brown')
nltk.download('punkt')

# Output, save, and load brown embeddings

model = gensim.models.Word2Vec(brown.sents())
model.save('brown.embedding')

w2v = gensim.models.Word2Vec.load('brown.embedding')

[nltk_data] Downloading package brown to /root/nltk_data...
[nltk_data]   Unzipping corpora/brown.zip.
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Unzipping tokenizers/punkt.zip.


In [4]:
def loadDF():
    '''
  
    You will use this function to load the dataset into a Pandas Dataframe for processing.

    '''
    from torchtext.datasets import SQuAD1
    train_iter, dev_iter = SQuAD1()
    
    train_and_dev_data = []
    
    for example in train_iter:
        data = {
#             "context": example[0],
            "question": example[1].lower().strip(),
            "answer": example[2][0].lower().strip(),
#             "answer_start": example[3]
        }
        train_and_dev_data.append(data)
    
    for example in dev_iter:
        data = {
#             "context": example[0],
            "question": example[1].lower().strip(),
            "answer": example[2][0].lower().strip(),
#             "answer_start": example[3]
        }
        train_and_dev_data.append(data)
        
    
    df = pd.DataFrame(train_and_dev_data)

    return df

In [5]:
def prepare_text(sentence):
    
    '''

    Our text needs to be cleaned with a tokenizer. This function will perform that task.
    https://www.nltk.org/api/nltk.tokenize.html

    '''
    
#     vocab = Vocab(name='Vocabulary')

#     tokens = vocab.cleanText(sentence)
    tokens = nltk.tokenize.RegexpTokenizer(r'\w+').tokenize(sentence)
    
    return tokens

In [6]:
data_df = loadDF()

In [7]:
#delete rows with empty "answer" column - they cause an error during training process
data_df = data_df.drop(axis=1, index=[55764, 27633])

In [8]:
data_df.head(10)

Unnamed: 0,question,answer
0,to whom did the virgin mary allegedly appear i...,saint bernadette soubirous
1,what is in front of the notre dame main building?,a copper statue of christ
2,the basilica of the sacred heart at notre dame...,the main building
3,what is the grotto at notre dame?,a marian place of prayer and reflection
4,what sits on top of the main building at notre...,a golden statue of the virgin mary
5,when did the scholastic magazine of notre dame...,september 1876
6,how often is notre dame's the juggler published?,twice
7,what is the daily student paper at notre dame ...,the observer
8,how many student news papers are found at notr...,three
9,in what year did the student paper common sens...,1987


In [9]:
data_df['question'] = data_df['question'].apply(prepare_text)
data_df['answer'] = data_df['answer'].apply(prepare_text)
data_df.head(10)

Unnamed: 0,question,answer
0,"[to, whom, did, the, virgin, mary, allegedly, ...","[saint, bernadette, soubirous]"
1,"[what, is, in, front, of, the, notre, dame, ma...","[a, copper, statue, of, christ]"
2,"[the, basilica, of, the, sacred, heart, at, no...","[the, main, building]"
3,"[what, is, the, grotto, at, notre, dame]","[a, marian, place, of, prayer, and, reflection]"
4,"[what, sits, on, top, of, the, main, building,...","[a, golden, statue, of, the, virgin, mary]"
5,"[when, did, the, scholastic, magazine, of, not...","[september, 1876]"
6,"[how, often, is, notre, dame, s, the, juggler,...",[twice]
7,"[what, is, the, daily, student, paper, at, not...","[the, observer]"
8,"[how, many, student, news, papers, are, found,...",[three]
9,"[in, what, year, did, the, student, paper, com...",[1987]


In [10]:
# first we are creating a Vocab Class
class Vocab:
    def __init__(self, name):
        self.name = name
        self.index = {}
        self.count = 0
        self.words = {}

    # This function cleans our words before adding them
    def cleanText(self, text):
        tokenizer = RegexpTokenizer(r'\w+')
        text = tokenizer.tokenize(text)
        return text
    
    # This function indexes words in our vocabulary
    def indexWord(self, word):
#         for word in sentence.split(" "):
        if word not in self.words:
            self.words[word] = self.count
            self.index[self.count] = word
            self.count += 1




def build_vocabularies(df):
    '''
    Input: df: pd.dataframe with data,  
    Output: q_vocab, a_vocab: objects of class Vocab for questions ans answers
    Function will build vocabularies for question and answers or "source" and "target"
    '''
    
#     for row in data_df.iterrows():
#         q_vocab.indexWord(" ".join(row[1][0]))
#         a_vocab.indexWord(" ".join(row[1][1]))
        
    q_vocab = Vocab(name='Question_Vocabulary')
    a_vocab = Vocab(name='Answer_Vocabulary')
    
    count_q = 0
    count_a = 0
    for i, r in df.iterrows():
        text_q = []
        text_a = []    
        text_q = r['question']
        text_a = r['answer']
        for t in text_q:
            if count_q % 1000 == 0:
                print("Adding word {} to our question vocabulary.".format(count_q))
            q_vocab.indexWord(t)
            count_q += 1
            
        for t in text_a:
            if count_a % 1000 == 0:
                print("Adding word {} to our answer vocabulary.".format(count_a))
            a_vocab.indexWord(t)
            count_a += 1

            
    return q_vocab, a_vocab

In [11]:
q_v, a_v = build_vocabularies(data_df)

Adding word 0 to our question vocabulary.
Adding word 0 to our answer vocabulary.
Adding word 1000 to our question vocabulary.
Adding word 2000 to our question vocabulary.
Adding word 3000 to our question vocabulary.
Adding word 4000 to our question vocabulary.
Adding word 1000 to our answer vocabulary.
Adding word 5000 to our question vocabulary.
Adding word 6000 to our question vocabulary.
Adding word 7000 to our question vocabulary.
Adding word 8000 to our question vocabulary.
Adding word 2000 to our answer vocabulary.
Adding word 9000 to our question vocabulary.
Adding word 10000 to our question vocabulary.
Adding word 11000 to our question vocabulary.
Adding word 12000 to our question vocabulary.
Adding word 3000 to our answer vocabulary.
Adding word 13000 to our question vocabulary.
Adding word 14000 to our question vocabulary.
Adding word 15000 to our question vocabulary.
Adding word 4000 to our answer vocabulary.
Adding word 16000 to our question vocabulary.
Adding word 17000 t

Adding word 151000 to our question vocabulary.
Adding word 40000 to our answer vocabulary.
Adding word 152000 to our question vocabulary.
Adding word 153000 to our question vocabulary.
Adding word 154000 to our question vocabulary.
Adding word 41000 to our answer vocabulary.
Adding word 155000 to our question vocabulary.
Adding word 156000 to our question vocabulary.
Adding word 157000 to our question vocabulary.
Adding word 158000 to our question vocabulary.
Adding word 42000 to our answer vocabulary.
Adding word 159000 to our question vocabulary.
Adding word 160000 to our question vocabulary.
Adding word 161000 to our question vocabulary.
Adding word 43000 to our answer vocabulary.
Adding word 162000 to our question vocabulary.
Adding word 163000 to our question vocabulary.
Adding word 164000 to our question vocabulary.
Adding word 165000 to our question vocabulary.
Adding word 166000 to our question vocabulary.
Adding word 44000 to our answer vocabulary.
Adding word 167000 to our qu

Adding word 292000 to our question vocabulary.
Adding word 293000 to our question vocabulary.
Adding word 76000 to our answer vocabulary.
Adding word 294000 to our question vocabulary.
Adding word 295000 to our question vocabulary.
Adding word 296000 to our question vocabulary.
Adding word 297000 to our question vocabulary.
Adding word 77000 to our answer vocabulary.
Adding word 298000 to our question vocabulary.
Adding word 299000 to our question vocabulary.
Adding word 300000 to our question vocabulary.
Adding word 301000 to our question vocabulary.
Adding word 302000 to our question vocabulary.
Adding word 78000 to our answer vocabulary.
Adding word 303000 to our question vocabulary.
Adding word 304000 to our question vocabulary.
Adding word 305000 to our question vocabulary.
Adding word 306000 to our question vocabulary.
Adding word 79000 to our answer vocabulary.
Adding word 307000 to our question vocabulary.
Adding word 308000 to our question vocabulary.
Adding word 309000 to our

Adding word 433000 to our question vocabulary.
Adding word 434000 to our question vocabulary.
Adding word 120000 to our answer vocabulary.
Adding word 435000 to our question vocabulary.
Adding word 436000 to our question vocabulary.
Adding word 437000 to our question vocabulary.
Adding word 121000 to our answer vocabulary.
Adding word 438000 to our question vocabulary.
Adding word 439000 to our question vocabulary.
Adding word 440000 to our question vocabulary.
Adding word 122000 to our answer vocabulary.
Adding word 441000 to our question vocabulary.
Adding word 442000 to our question vocabulary.
Adding word 443000 to our question vocabulary.
Adding word 444000 to our question vocabulary.
Adding word 123000 to our answer vocabulary.
Adding word 445000 to our question vocabulary.
Adding word 446000 to our question vocabulary.
Adding word 447000 to our question vocabulary.
Adding word 124000 to our answer vocabulary.
Adding word 448000 to our question vocabulary.
Adding word 125000 to o

Adding word 567000 to our question vocabulary.
Adding word 169000 to our answer vocabulary.
Adding word 568000 to our question vocabulary.
Adding word 569000 to our question vocabulary.
Adding word 570000 to our question vocabulary.
Adding word 571000 to our question vocabulary.
Adding word 170000 to our answer vocabulary.
Adding word 572000 to our question vocabulary.
Adding word 573000 to our question vocabulary.
Adding word 574000 to our question vocabulary.
Adding word 171000 to our answer vocabulary.
Adding word 575000 to our question vocabulary.
Adding word 172000 to our answer vocabulary.
Adding word 576000 to our question vocabulary.
Adding word 173000 to our answer vocabulary.
Adding word 577000 to our question vocabulary.
Adding word 578000 to our question vocabulary.
Adding word 579000 to our question vocabulary.
Adding word 174000 to our answer vocabulary.
Adding word 580000 to our question vocabulary.
Adding word 581000 to our question vocabulary.
Adding word 582000 to our

Adding word 709000 to our question vocabulary.
Adding word 220000 to our answer vocabulary.
Adding word 710000 to our question vocabulary.
Adding word 711000 to our question vocabulary.
Adding word 221000 to our answer vocabulary.
Adding word 712000 to our question vocabulary.
Adding word 713000 to our question vocabulary.
Adding word 714000 to our question vocabulary.
Adding word 222000 to our answer vocabulary.
Adding word 715000 to our question vocabulary.
Adding word 716000 to our question vocabulary.
Adding word 717000 to our question vocabulary.
Adding word 718000 to our question vocabulary.
Adding word 223000 to our answer vocabulary.
Adding word 719000 to our question vocabulary.
Adding word 720000 to our question vocabulary.
Adding word 721000 to our question vocabulary.
Adding word 224000 to our answer vocabulary.
Adding word 722000 to our question vocabulary.
Adding word 225000 to our answer vocabulary.
Adding word 723000 to our question vocabulary.
Adding word 226000 to our

Adding word 837000 to our question vocabulary.
Adding word 838000 to our question vocabulary.
Adding word 271000 to our answer vocabulary.
Adding word 839000 to our question vocabulary.
Adding word 840000 to our question vocabulary.
Adding word 841000 to our question vocabulary.
Adding word 272000 to our answer vocabulary.
Adding word 842000 to our question vocabulary.
Adding word 843000 to our question vocabulary.
Adding word 844000 to our question vocabulary.
Adding word 273000 to our answer vocabulary.
Adding word 845000 to our question vocabulary.
Adding word 846000 to our question vocabulary.
Adding word 847000 to our question vocabulary.
Adding word 274000 to our answer vocabulary.
Adding word 848000 to our question vocabulary.
Adding word 849000 to our question vocabulary.
Adding word 850000 to our question vocabulary.
Adding word 851000 to our question vocabulary.
Adding word 275000 to our answer vocabulary.
Adding word 852000 to our question vocabulary.
Adding word 853000 to o

Adding word 318000 to our answer vocabulary.
Adding word 996000 to our question vocabulary.
Adding word 997000 to our question vocabulary.
Adding word 998000 to our question vocabulary.
Adding word 999000 to our question vocabulary.
Adding word 319000 to our answer vocabulary.
Adding word 1000000 to our question vocabulary.
Adding word 1001000 to our question vocabulary.
Adding word 320000 to our answer vocabulary.
Adding word 1002000 to our question vocabulary.
Adding word 1003000 to our question vocabulary.
Adding word 321000 to our answer vocabulary.
Adding word 1004000 to our question vocabulary.
Adding word 1005000 to our question vocabulary.


In [12]:
def convert_to_tensor(df, q_vocab, a_vocab):

    source_data = []
    target_data = []
    for i, r in df.iterrows():
        temp_indexed_question = [q_vocab.words[word] for word in r['question']]
        temp_question_tensor = torch.Tensor(temp_indexed_question).long().to(device).view(-1, 1)
        source_data.append(temp_question_tensor)
        
        temp_indexed_answer = [a_vocab.words[word] for word in r['answer']]
        temp_answer_tensor = torch.Tensor(temp_indexed_answer).long().to(device).view(-1, 1)
        target_data.append(temp_answer_tensor)
    
    return source_data, target_data

In [13]:
source_data, target_data = convert_to_tensor(data_df, q_v, a_v)

In [14]:
def train_test_split(SRC, TRG):
    
    '''
    Input: SRC, our list of questions from the dataset
            TRG, our list of responses from the dataset

    Output: Training and test datasets for SRC & TRG

    '''
    
    return SRC_train_dataset, SRC_test_dataset, TRG_train_dataset, TRG_test_dataset

In [15]:
class Encoder(nn.Module):
    
    def __init__(self, input_size, hidden_size, embedding_size, n_layers=1):
        super(Encoder, self).__init__()
        
        # self.embedding provides a vector representation of the inputs to our model
        # self.lstm, accepts the vectorized input and passes a hidden state
        
        self.hidden_size = hidden_size
        self.input_size = input_size
        self.embedding_size = embedding_size
        self.n_layers = n_layers
        
        self.hidden = torch.zeros(1, 1, hidden_size)
        
        self.embedding = nn.Embedding(self.input_size, self.embedding_size)
        self.lstm = nn.LSTM(self.embedding_size, self.hidden_size, self.n_layers)
        
    
    def forward(self, x, hidden, cell):
        
        '''
        Inputs: x, the src vector
        Outputs: out, the encoder outputs
                hidden, the hidden state
                cell, the cell state
        '''
        out = self.embedding(x)
        out = out.view(1, 1, -1)
        out, (hidden, cell) = self.lstm(out, (hidden, cell))
        
        return out, hidden, cell
    

In [16]:
class Decoder(nn.Module):
      
    def __init__(self, hidden_size, output_size, embedding_size, n_layers=1):
        super(Decoder, self).__init__()
        
        # self.embedding provides a vector representation of the target to our model
        # self.lstm, accepts the embeddings and outputs a hidden state
        # self.ouput, predicts on the hidden state via a linear output layer     
        
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.embedding_size = embedding_size
        self.n_layers = n_layers
        
        self.embedding = nn.Embedding(self.output_size, self.embedding_size)
        self.lstm = nn.LSTM(self.embedding_size, self.hidden_size, self.n_layers)
        self.fc = nn.Linear(self.hidden_size, self.output_size)
        self.softmax = nn.LogSoftmax(dim=1)
        
    def forward(self, x, hidden, cell):
        
        '''
        Inputs: x, the target vector
        Outputs: out, the prediction
                hidden, the hidden state
                cell, the cell state
        '''
        out = self.embedding(x)
        out = out.view(1, 1, -1)
        out, (hidden, cell) = self.lstm(out, (hidden, cell))
        out = self.fc(out[0])
        out = self.softmax(out)
        
        return out, hidden, cell

In [17]:
class Seq2Seq(nn.Module):
    
    def __init__(self, input_size, hidden_size, output_size, embedding_size, device, n_layers=1):
        super(Seq2Seq, self).__init__()
        
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.embedding_size = embedding_size
        self.n_layers = n_layers
        self.device = device
        
        self.encoder = Encoder(self.input_size, self.hidden_size, self.embedding_size, self.n_layers)
        self.decoder = Decoder(self.hidden_size, self.output_size, self.embedding_size, self.n_layers)
    
    
    def forward(self, src, trg, teacher_forcing_ratio = 0.5):
        
        trg_len = trg.shape[0]
        src_len = src.shape[0]
        
        nn_output = {
            'decoder_output':[]
        }
        
        # initial hidden and cell states
        hidden = torch.zeros([self.n_layers, 1, self.hidden_size]).to(self.device)
        cell = torch.zeros([self.n_layers, 1, self.hidden_size]).to(self.device)
        
        for idx in range(src_len):
            encoder_output, hidden, cell = self.encoder(src[idx], hidden, cell)
            
        decoder_input = torch.Tensor([[0]]).long().to(self.device)
        
        for idx in range(trg_len):
            decoder_output, hidden, cell = self.decoder(decoder_input, hidden, cell)
            nn_output['decoder_output'].append(decoder_output)
            
            #deploying teaching improvement
            if self.training:
                decoder_input = trg[idx] if random.random() > teacher_forcing_ratio else decoder_output.argmax(1)
            else:
                _, top_index = decoder_output.data.topk(1)
                decoder_input = top_index.squeeze().detach()
        
        
        return nn_output   

In [18]:
learning_rate = 0.01
hidden_size = 64
embedding_size = 64
batch_size = 64
epochs = 3

In [19]:
seq2seq_model = Seq2Seq(q_v.count, hidden_size, a_v.count, embedding_size, device)

In [20]:
def training_function(source_data, target_data, model, epochs, batch_size, print_every, learning_rate, device):
    
    model.to(device)
    total_training_loss = 0
    total_validation_loss = 0
    loss = 0
    
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
    criterion = nn.NLLLoss()
    
    #cross validation
    kf = KFold(n_splits=5, shuffle=True)
    
    for e, (train_idx, test_idx) in enumerate(kf.split(source_data), 1):
        model.train()
        for idx in range(0, len(train_idx)):
            src = source_data[idx]
            trg = target_data[idx]
            
            try:
                nn_output = model(src, trg)
            except RuntimeError:
                print("training")
                print("idx: ", idx)
                print("source_data: ", src)
                print("target_data: ", trg)
            
            current_loss = 0
            for (s, t) in zip(nn_output['decoder_output'], trg):
#                 try:
                current_loss += criterion(s, t)
#                 except RuntimeError:
#                     print("Training current Loss error")
#                     print("Criterion: ", criterion)                
                
            loss += current_loss
#           total_training_loss += (current_loss.item() / trg.size(0)) #cummulating loss from each iteration
#             try:
            total_training_loss += (current_loss / trg.size(0)) #cummulating loss from each iteration
#             except ZeroDivisionError:
#                 print('idx:', idx)
#                 print("target:", trg)
            
            if idx % batch_size == 0 or idx == (len(train_idx)-1):
                loss.backward()
                optimizer.step()
                optimizer.zero_grad()
                loss = 0
                
                
        #validation
        model.eval()
        for idx in range(0, len(test_idx)):
            src = source_data[idx]
            trg = target_data[idx]
    
            try:
                nn_output = model(src, trg)
            except RuntimeError:
                print("validation")
                print("idx: ", idx)
                print("source_data: ", src)
                print("target_data: ", trg)
                
            current_loss = 0
            for (s, t) in zip(nn_output['decoder_output'], trg):
#                 try:
                current_loss += criterion(s, t)
#                 except RuntimeError:
#                     print("validation current Loss error")
#                     print("Criterion: ", criterion)
                
    #         total_validation_loss += (current_loss.item() / trg.size(0))
            total_validation_loss += (current_loss / trg.size(0))        

        if e % print_every == 0:
            training_loss_avg = total_training_loss / (len(train_idx)*print_every)
            validation_loss_avg = total_validation_loss / (len(test_idx)*print_every)
            print("{}/{} Epoch ---- Training Loss = {:.5f} ---- Validation Loss = {:.5f}".format(e, epochs, training_loss_avg, validation_loss_avg))
            total_training_loss = 0
            total_validation_loss = 0


In [21]:
training_function(source_data=source_data, target_data=target_data, model=seq2seq_model, print_every=1, epochs=epochs,
                  learning_rate=learning_rate, batch_size=batch_size, device=device)

1/3 Epoch ---- Training Loss = 9.35387 ---- Validation Loss = 9.14088
validation
idx:  4671
source_data:  tensor([[  12],
        [  13],
        [   3],
        [ 155],
        [  15],
        [   3],
        [ 851],
        [   3],
        [1144],
        [4108],
        [ 353],
        [   8]], device='cuda:0')
target_data:  tensor([[4008],
        [3988]], device='cuda:0')


RuntimeError: CUDA error: invalid resource handle
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

In [None]:
2+2

In [None]:
1+1+1

In [None]:
1+1+1+1

In [None]:
idx: 27633
target: tensor([], device='cuda:0', size=(0, 1), dtype=torch.int64)
idx: 55764
target: tensor([], device='cuda:0', size=(0, 1), dtype=torch.int64)
idx: 27633
target: tensor([], device='cuda:0', size=(0, 1), dtype=torch.int64)
idx: 55764
target: tensor([], device='cuda:0', size=(0, 1), dtype=torch.int64)