# Project 6: Analyzing Stock Sentiment from Twits
## Instructions
Each problem consists of a function to implement and instructions on how to implement the function.  The parts of the function that need to be implemented are marked with a `# TODO` comment.

## Packages
When you implement the functions, you'll only need to use the packages you've used in the classroom, like [Pandas](https://pandas.pydata.org/) and [Numpy](http://www.numpy.org/). These packages will be imported for you. We recommend you don't add any import statements; otherwise the grader might not be able to run your code.

### Load Packages

In [1]:
import json
import nltk
import os
import random
import re
import torch

from torch import nn, optim
import torch.nn.functional as F

## Introduction
When deciding the value of a company, it's important to follow the news, for example a product recall or a natural disaster in a company's product chain. You want to be able to turn this information into a signal. Currently, the best tool for the job is a neural network.

For this project, you'll use posts from the social media site [StockTwits](https://en.wikipedia.org/wiki/StockTwits). The community on StockTwits is full of investors, traders, and entrepreneurs. Each message posted is called a Twit. This is similar to Twitter's version of a post, called a Tweet. You'll build a model around these twits that generates a sentiment score.

We've collected a bunch of twits, then hand labeled the sentiment of each. To capture the degree of sentiment, we'll use a five-point scale: very negative, negative, neutral, positive, very positive. Each twit is labeled -2 to 2 in steps of 1, from very negative to very positive respectively. You'll build a sentiment analysis model that will learn to assign sentiment to twits on its own, using this labeled data.

The first thing we should do is load the data.

## Import Twits 
### Load Twits Data 
This JSON file contains a list of objects, one for each twit, in the `'data'` field:

```
{'data': [
  {'message_body': 'Neutral twit body text here',
   'sentiment': 0},
  {'message_body': 'Happy twit body text here',
   'sentiment': 1},
   ...
]}
```

The fields represent the following:

* `'message_body'`: The text of the twit.
* `'sentiment'`: Sentiment score for the twit, ranges from -2 to 2 in steps of 1, with 0 being neutral.
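
As a minimal sketch of how these two fields can be pulled out of the parsed JSON, the snippet below uses a hypothetical two-record sample in the same shape as the file (the sample string and variable names are illustrative and not part of the project data):

```python
import json

# Hypothetical two-record sample in the same shape as the file described above.
sample = '''
{"data": [
  {"message_body": "Neutral twit body text here", "sentiment": 0},
  {"message_body": "Happy twit body text here", "sentiment": 1}
]}
'''

records = json.loads(sample)["data"]

# Pull the two fields of interest out of each record.
bodies = [record["message_body"] for record in records]
scores = [record["sentiment"] for record in records]

print(bodies)
print(scores)
```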


Let's see what the data looks like by printing the first 10 twits from the list.

In [2]:
with open(os.path.join('..', '..', 'data', 'project_6_stocktwits', 'twits.json'), 'r') as f:
    twits = json.load(f)

print(twits['data'][:10])

[{'message_body': '$FITB great buy at 26.00...ill wait', 'sentiment': 2, 'timestamp': '2018-07-01T00:00:09Z'}, {'message_body': '@StockTwits $MSFT', 'sentiment': 1, 'timestamp': '2018-07-01T00:00:42Z'}, {'message_body': '#STAAnalystAlert for $TDG : Jefferies Maintains with a rating of Hold setting target price at USD 350.00. Our own verdict is Buy  http://www.stocktargetadvisor.com/toprating', 'sentiment': 2, 'timestamp': '2018-07-01T00:01:24Z'}, {'message_body': '$AMD I heard there’s a guy who knows someone who thinks somebody knows something - on StockTwits.', 'sentiment': 1, 'timestamp': '2018-07-01T00:01:47Z'}, {'message_body': '$AMD reveal yourself!', 'sentiment': 0, 'timestamp': '2018-07-01T00:02:13Z'}, {'message_body': '$AAPL Why the drop? I warren Buffet taking out his position?', 'sentiment': 1, 'timestamp': '2018-07-01T00:03:10Z'}, {'message_body': '$BA bears have 1 reason on 06-29 to pay more attention https://dividendbot.com?s=BA', 'sentiment': -2, 'timestamp': '2018-07-01T

### Length of Data
Now let's look at the number of twits in the dataset. Print the number of twits below.

In [3]:
"""print out the number of twits"""

# TODO Implement 

print(len(twits['data']))

1548010


### Split Message Body and Sentiment Score

In [4]:
messages = [twit['message_body'] for twit in twits['data']]
# The sentiment scores are discrete labels from -2 to 2; shift them to 0 to 4 so they can be used as class indices in our network
sentiments = [twit['sentiment'] + 2 for twit in twits['data']]
print(messages[0:10])
print(max(messages, key=len))

['$FITB great buy at 26.00...ill wait', '@StockTwits $MSFT', '#STAAnalystAlert for $TDG : Jefferies Maintains with a rating of Hold setting target price at USD 350.00. Our own verdict is Buy  http://www.stocktargetadvisor.com/toprating', '$AMD I heard there’s a guy who knows someone who thinks somebody knows something - on StockTwits.', '$AMD reveal yourself!', '$AAPL Why the drop? I warren Buffet taking out his position?', '$BA bears have 1 reason on 06-29 to pay more attention https://dividendbot.com?s=BA', '$BAC ok good we&#39;re not dropping in price over the weekend, lol', '$AMAT - Daily Chart, we need to get back to above 50.', '$GME 3% drop per week after spike... if no news in 3 months, back to 12s... if BO, then bingo... what is the odds?']
$AMD I bought Aug. 16.50 call for .98, rolled up out 3X, cashed in today for 12X gain.  It may need a rest.  https://finance.yahoo.com/chart/AMD#eyJpbnRlcnZhbCI6IndlZWsiLCJwZXJpb2RpY2l0eSI6MSwidGltZVVuaXQiOm51bGwsImNhbmRsZVdpZHRoIjoxMi4wNzY

## Preprocessing the Data
With our data in hand, we need to preprocess our text. These twits are collected by filtering on ticker symbols, which are denoted with a leading `$` symbol in the twit itself. For example,

`{'message_body': 'RT @google Our annual look at the year in Google blogging (and beyond) http://t.co/sptHOAh8 $GOOG',
 'sentiment': 0}`

The ticker symbols don't provide information on the sentiment, and they are in every twit, so we should remove them. This twit also has the `@google` username, again not providing sentiment information, so we should also remove it. We also see a URL `http://t.co/sptHOAh8`. Let's remove these too.

The easiest way to remove specific words or phrases is with regex using the `re` module. You can sub out specific patterns with a space:

```python
re.sub(pattern, ' ', text)
```
This will substitute a space anywhere the pattern matches in the text. Later, when we tokenize the text, we'll split appropriately on those spaces.
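
To make the substitution concrete, here is a small sketch that removes a URL from a lowercased message. The pattern `url_pattern` is a hypothetical example chosen for illustration; it is not necessarily the pattern you should use in your solution.

```python
import re

# Hypothetical pattern for illustration: "http" or "https", then "://",
# then any run of non-space characters.
url_pattern = r'https?://\S*'

text = 'our annual look at the year http://t.co/sptHOAh8 in blogging'
cleaned = re.sub(url_pattern, ' ', text)

# The URL is replaced by a space, so splitting on whitespace still yields clean tokens.
print(cleaned.split())
```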

### Pre-Processing

In [5]:
nltk.download('wordnet')
def preprocess(message):
    """
    This function takes a string as input, then performs these operations: 
        - lowercase
        - remove URLs
        - remove ticker symbols 
        - remove StockTwits usernames
        - remove punctuation and digits
        - tokenize by splitting the string on whitespace 
        - remove any single character tokens
        - lemmatize each remaining token
    
    Parameters
    ----------
        message : The text message to be preprocessed.
        
    Returns
    -------
        tokens: The preprocessed text into tokens.
    """ 
    #TODO: Implement 
    # Lowercase the twit message
    text = message.lower()
    # Replace URLs with a space in the message
    text = re.sub(r'(http|https)(\:\/\/)\S*', ' ', text)
        
    # Replace ticker symbols with a space. The ticker symbols are any stock symbol that starts with $.
    text = re.sub(r'\$\b.*?(\s|$)', ' ', text)

    
    # Replace StockTwits usernames with a space. The usernames are any word that starts with @.
    text = re.sub(r'\@\b.*?(\s|$)', ' ', text)
          
    # Replace every non-word character and every digit with a space
    text = re.sub(r'\W', ' ', text)
    text = re.sub(r'\d', ' ', text)
    
    # Tokenize by splitting the string on whitespace into a list of words
    text = text.split()
    
    # Drop single-character tokens, then lemmatize each word with the WordNetLemmatizer,
    # first as a noun and then as a verb.
    text = [w for w in text if len(w) > 1]
    wnl = nltk.stem.WordNetLemmatizer()
    tokens = [wnl.lemmatize(wnl.lemmatize(w, 'n'), 'v') for w in text]
    assert type(tokens) == list, 'Tokens should be list'
    return tokens
print(preprocess('@StockTwits $MSFT this$ $THAT'))

[nltk_data] Downloading package wordnet to /home/student/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!


['this']


>Note: You must ensure that after preprocessing the text should NOT include:
- Numbers
- URLs
- Single character tokens
- Ticker symbols (these should be removed even if they don't appear at the beginning)
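
One way to spot-check these requirements is a small helper that flags offending tokens. The function `find_violations` below is a hypothetical sketch for illustration and is not part of the project or the grader.

```python
def find_violations(tokens):
    """Return the tokens that break any of the rules listed above (hypothetical helper)."""
    bad = []
    for token in tokens:
        has_digit = any(ch.isdigit() for ch in token)
        too_short = len(token) <= 1
        looks_like_url = token.startswith('http')
        has_ticker_mark = '$' in token
        if has_digit or too_short or looks_like_url or has_ticker_mark:
            bad.append(token)
    return bad

# A clean token list produces no violations.
print(find_violations(['great', 'buy', 'wait']))
# Each of the last four tokens breaks one of the rules.
print(find_violations(['buy', '$msft', 'a', 'q3', 'http']))
```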

### Preprocess All the Twits 
Now we can preprocess each of the twits in our dataset. Apply the function `preprocess` to all the twit messages.

In [6]:
# TODO Implement
tokenized = [preprocess(message) for message in messages]

### Bag of Words
Now with all of our messages tokenized, we want to create a vocabulary and count up how often each word appears in our entire corpus. Use the [`Counter`](https://docs.python.org/3.1/library/collections.html#collections.Counter) class to count up all the tokens.
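
As a quick illustration of the counting step, the sketch below builds a bag of words from a toy list of tokenized messages (`toy_messages` is made up for this example and stands in for `tokenized`):

```python
from collections import Counter

# Toy tokenized messages standing in for `tokenized`.
toy_messages = [['buy', 'the', 'dip'], [], ['sell', 'the', 'rally'], ['buy', 'buy']]

toy_bow = Counter()
for message in toy_messages:
    # update() adds one count per token in the iterable.
    toy_bow.update(message)

print(toy_bow.most_common(2))
```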

In [7]:
from collections import Counter
print(tokenized[0:20])
"""
Create a vocabulary by using Bag of words
"""

# TODO: Implement 
bow = Counter()
for message in tokenized:
    bow.update(message)
print(bow.most_common(3))

[['great', 'buy', 'at', 'ill', 'wait'], [], ['staanalystalert', 'for', 'jefferies', 'maintain', 'with', 'rat', 'of', 'hold', 'set', 'target', 'price', 'at', 'usd', 'our', 'own', 'verdict', 'be', 'buy'], ['hear', 'there', 'guy', 'who', 'know', 'someone', 'who', 'think', 'somebody', 'know', 'something', 'on', 'stocktwits'], ['reveal', 'yourself'], ['why', 'the', 'drop', 'warren', 'buffet', 'take', 'out', 'his', 'position'], ['bear', 'have', 'reason', 'on', 'to', 'pay', 'more', 'attention'], ['ok', 'good', 'we', 're', 'not', 'drop', 'in', 'price', 'over', 'the', 'weekend', 'lol'], ['daily', 'chart', 'we', 'need', 'to', 'get', 'back', 'to', 'above'], ['drop', 'per', 'week', 'after', 'spike', 'if', 'no', 'news', 'in', 'month', 'back', 'to', 'if', 'bo', 'then', 'bingo', 'what', 'be', 'the', 'odds'], ['strong', 'buy'], ['short', 'ratio', 'be', 'at', 'and', 'short', 'to', 'float', 'be', 'via'], ['price', 'squeeze', 'perfect', 'place', 'for', 'an', 'option', 'straddle', 'near', 'the', 'support'

### Frequency of Words Appearing in Message
With our vocabulary, now we'll remove some of the most common words such as 'the', 'and', 'it', etc. These words don't contribute to identifying sentiment and are really common, resulting in a lot of noise in our input. If we can filter these out, then our network should have an easier time learning.

We also want to remove really rare words that show up in only a few twits. Here you'll want to divide the count of each word by the **number of messages** calculated in the code block above (i.e. `len(messages)`). Then remove words that only appear in some small fraction of the messages.

>Note: There is not an exact number for low and high-frequency cut-offs, however there is a correct optimal range.
Ideally, set the low-frequency cut-off between 0.0000002 and 0.000007 (inclusive) and the high-frequency cut-off between 5 and 20 (inclusive). If either number is too big, we lose lots of important words that we can use in our data.
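
The filtering idea can be sketched on a toy corpus. The cut-off values below are deliberately exaggerated so the effect is visible on four messages; they are illustrative only and far outside the range suggested for the real data.

```python
from collections import Counter

toy_messages = [['the', 'stock', 'is', 'up'], ['the', 'stock', 'is', 'down'],
                ['the', 'moon'], ['the', 'stock']]
toy_bow = Counter(token for message in toy_messages for token in message)

# Frequency here means count divided by the number of messages, as described above.
toy_freqs = {word: count / len(toy_messages) for word, count in toy_bow.items()}

toy_low_cutoff = 0.25   # illustrative value only
toy_high_cutoff = 1     # drop the single most common word

most_common_words = {word for word, _ in toy_bow.most_common(toy_high_cutoff)}
kept = sorted(word for word, freq in toy_freqs.items()
              if freq > toy_low_cutoff and word not in most_common_words)
print(kept)
```

Here 'the' is dropped as the most common word, and 'up', 'down', and 'moon' are dropped because their frequency does not exceed the low cut-off.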

In [8]:
"""
Set the following variables:
    freqs
    low_cutoff
    high_cutoff
    K_most_common
"""

# TODO Implement 

# Dictionary that contains the frequency of words appearing in messages.
# The key is the token and the value is the frequency of that word in the corpus.
freqs = dict(bow)
freqs = {k: (v / len(messages)) for k, v in freqs.items()}

# Float that is the frequency cutoff. Drop words with a frequency that is lower or equal to this number.
low_cutoff = 0.000007

# Integer that is the cut off for most common words. Drop words that are the `high_cutoff` most common words.
high_cutoff = 20

# The k most common words in the corpus. Use `high_cutoff` as the k.
K_most_common = bow.most_common(high_cutoff)

most_ = [pair[0] for pair in K_most_common]
filtered_words = {word for word in freqs if (freqs[word] > low_cutoff and word not in most_)}
print(most_)
print(len(filtered_words))
print(filtered_words)

['be', 'the', 'to', 'for', 'on', 'of', 'and', 'in', 'this', 'it', 'at', 'will', 'up', 'buy', 'report', 'go', 'you', 'short', 'that', 'what']
15610
{'credit', 'gimmie', 'esports', 'fookin', 'light', 'capitalize', 'healthier', 'acid', 'gooooooooo', 'winner', 'supercharge', 'fantasy', 'crowsignalservice', 'outlet', 'merkury', 'footlocker', 'negotiate', 'slide', 'orly', 'patient', 'transit', 'collab', 'oxidation', 'mississippi', 'flame', 'randal', 'mumble', 'inhibitor', 'pause', 'microcap', 'scurred', 'panama', 'bitcoin', 'dismiss', 'subsequent', 'tea', 'wew', 'overrun', 'complete', 'landfill', 'whatever', 'consumption', 'access', 'sharon', 'gorbachev', 'catherine', 'amateurzon', 'ticker', 'pfff', 'deficit', 'gangster', 'scoop', 'alrdy', 'catz', 'scout', 'gandalf', 'jim', 'similar', 'dissapointed', 'ttwo', 'chin', 'crave', 'asus', 'pendarvis', 'awx', 'welcome', 'shorturds', 'appliance', 'scalable', 'jazz', 'franchise', 'corner', 'wm', 'sic', 'qualcomm', 'roller', 'cleveland', 'dtv', 'outnu

### Updating Vocabulary by Removing Filtered Words
Let's create three variables that will help with our vocabulary.

In [9]:
"""
Set the following variables:
    vocab
    id2vocab
    filtered
"""

#TODO Implement

# A dictionary that maps each word to an integer id. The key is the word and value is an id that represents the word.
# Note: ids are assigned to every word in `freqs`, not only to `filtered_words`;
# the filtering itself happens when `filtered` is built below.
vocab = {word: idx for idx, word in enumerate(freqs)}

# Reverse of the `vocab` dictionary. The key is word id and value is the word. 
id2vocab = {v: k for k, v in vocab.items()}


assert set(vocab.keys()) == set(id2vocab.values()), 'Check vocab and id2vocab dictionaries'

In [10]:
print(tokenized[0:10])
print(max(tokenized, key=len))

[['great', 'buy', 'at', 'ill', 'wait'], [], ['staanalystalert', 'for', 'jefferies', 'maintain', 'with', 'rat', 'of', 'hold', 'set', 'target', 'price', 'at', 'usd', 'our', 'own', 'verdict', 'be', 'buy'], ['hear', 'there', 'guy', 'who', 'know', 'someone', 'who', 'think', 'somebody', 'know', 'something', 'on', 'stocktwits'], ['reveal', 'yourself'], ['why', 'the', 'drop', 'warren', 'buffet', 'take', 'out', 'his', 'position'], ['bear', 'have', 'reason', 'on', 'to', 'pay', 'more', 'attention'], ['ok', 'good', 'we', 're', 'not', 'drop', 'in', 'price', 'over', 'the', 'weekend', 'lol'], ['daily', 'chart', 'we', 'need', 'to', 'get', 'back', 'to', 'above'], ['drop', 'per', 'week', 'after', 'spike', 'if', 'no', 'news', 'in', 'month', 'back', 'to', 'if', 'bo', 'then', 'bingo', 'what', 'be', 'the', 'odds']]
['you', 'get', 'cramer', 'cueballed', 'lt', 'lt', 'lt', 'lt', 'lt', 'lt', 'lt', 'lt', 'lt', 'lt', 'lt', 'lt', 'what', 'clown', 'lt', 'lt', 'lt', 'lt', 'lt', 'lt', 'lt', 'lt', 'lt', 'lt', 'lt', 'l

In [11]:
# tokenized with the words not in `filtered_words` removed.
filtered = list()
count = 0
for message in tokenized:
    new_message = []
    if message != []:
        for word in message:
            if word in filtered_words:
                new_message.append(vocab[word])
    filtered.append(new_message)
    if count % 100000 == 0:
        print(count)
    count += 1
print(filtered[:100])

0
100000
200000
300000
400000
500000
600000
700000
800000
900000
1000000
1100000
1200000
1300000
1400000
1500000
[[0, 3, 4], [], [5, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19], [21, 22, 23, 24, 25, 26, 24, 27, 28, 25, 29, 31], [32, 33], [34, 36, 37, 38, 39, 40, 41, 42], [43, 44, 45, 47, 48, 49], [50, 51, 52, 53, 54, 36, 15, 56, 57, 58], [59, 60, 52, 61, 62, 63, 64], [36, 65, 66, 67, 68, 69, 70, 71, 72, 63, 69, 73, 74, 75, 77], [78], [80, 82, 83], [15, 84, 85, 86, 87, 88, 89, 90, 91, 92], [93, 94, 95, 96, 78, 97, 98, 99, 100, 101, 102], [103, 104, 105, 106], [107, 44, 45, 47, 48, 49], [108, 109, 94, 111, 112, 113, 40, 114, 115, 116, 117, 118, 119, 120], [121, 122, 123, 40, 124, 125], [5, 126, 127, 13, 15, 14, 9, 10, 13, 14, 15, 16, 17, 18, 19], [27, 129, 130, 131, 133, 95, 134, 9, 105, 71, 135, 137, 22, 138, 139, 140], [121, 105, 80, 139, 141, 142, 121, 70, 143, 144, 145], [146, 147, 148, 149, 150, 151, 152], [153], [154, 155, 156, 66, 157, 158], [105, 80, 159, 160, 161, 162, 163], [1

### Balancing the classes
Let's do a few last pre-processing steps. If we look at how our twits are labeled, we'll find that 50% of them are neutral. This means that our network will be 50% accurate just by guessing 0 every single time. To help our network learn appropriately, we'll want to balance our classes.
That is, we want each of our different sentiment scores to show up roughly equally often in the data.

What we can do here is go through each of our examples and randomly drop twits with neutral sentiment. What should be the probability we drop these twits if we want to get around 20% neutral twits starting at 50% neutral? We should also take this opportunity to remove messages with length 0.
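
One way to reason about the question above: if we keep each neutral twit with probability p, the neutral share after dropping is p·n_neutral / (p·n_neutral + n_other). Setting that equal to the target share and solving for p gives the formula in the sketch below. The helper name and the counts are illustrative, and the sketch ignores the zero-length messages that are removed at the same time.

```python
def neutral_keep_prob(n_total, n_neutral, target_fraction):
    """Probability of keeping a neutral example so neutrals make up roughly `target_fraction`."""
    n_other = n_total - n_neutral
    # Solve p * n_neutral / (p * n_neutral + n_other) = target_fraction for p.
    return target_fraction * n_other / ((1 - target_fraction) * n_neutral)

# Illustrative counts: 50% neutral to start, aiming for about 20% neutral.
keep = neutral_keep_prob(n_total=1000, n_neutral=500, target_fraction=0.2)
print(keep, 1 - keep)   # keep probability and drop probability
```

With half of the data neutral and a 20% target, the keep probability works out to one quarter, so roughly three out of four neutral twits are dropped.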

In [12]:

amount = 90001
print(filtered[amount])
print([id2vocab[i] for i in filtered[amount]])

[509, 610, 172, 398, 344, 373, 12, 420, 147, 761, 66, 998, 138, 1386, 655, 429]
['reiterate', 'two', 'my', 'favorite', 'long', 'term', 'hold', 'dividend', 'stock', 'jump', 'week', 'ago', 'still', 'excellent', 'value', 'both']


In [13]:
balanced = {'messages': [], 'sentiments':[]}

n_neutral = sum(1 for each in sentiments if each == 2)
N_examples = len(sentiments)
keep_prob = (N_examples - n_neutral)/4/n_neutral

for idx, sentiment in enumerate(sentiments):
    message = filtered[idx]
    if len(message) == 0:
        # skip this message because it has length zero
        continue
    elif sentiment != 2 or random.random() < keep_prob:
        balanced['messages'].append(message)
        balanced['sentiments'].append(sentiment) 

If you did it correctly, you should see the following result 

In [14]:
n_neutral = sum(1 for each in balanced['sentiments'] if each == 2)
N_examples = len(balanced['sentiments'])
n_neutral/N_examples

0.19495200430826704

Finally let's convert our tokens into integer ids which we can pass to the network.

In [15]:
print(balanced['messages'][0:10])
print(balanced['sentiments'][0:100])
print(max(balanced['messages'], key=len))
print([id2vocab[word] for word in max(balanced['messages'], key=len)])
print(balanced['sentiments'][balanced['messages'].index(max(balanced['messages'], key=len))])
print(max(map(len,balanced['messages'])))

[[0, 3, 4], [5, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19], [21, 22, 23, 24, 25, 26, 24, 27, 28, 25, 29, 31], [34, 36, 37, 38, 39, 40, 41, 42], [43, 44, 45, 47, 48, 49], [50, 51, 52, 53, 54, 36, 15, 56, 57, 58], [59, 60, 52, 61, 62, 63, 64], [36, 65, 66, 67, 68, 69, 70, 71, 72, 63, 69, 73, 74, 75, 77], [78], [80, 82, 83]]
[4, 4, 3, 3, 0, 3, 4, 0, 4, 0, 4, 4, 4, 3, 4, 4, 3, 3, 3, 0, 1, 3, 3, 0, 4, 2, 4, 0, 3, 2, 1, 2, 2, 3, 3, 4, 3, 4, 3, 3, 3, 3, 3, 3, 1, 4, 3, 4, 0, 3, 0, 1, 4, 1, 3, 2, 4, 1, 2, 0, 2, 2, 4, 4, 4, 2, 3, 4, 2, 0, 0, 0, 3, 1, 1, 2, 3, 0, 2, 0, 4, 4, 2, 2, 3, 3, 4, 0, 4, 3, 3, 2, 4, 2, 4, 2, 2, 1, 1, 4]
[4555, 18239, 3306, 3306, 3306, 2585, 2585, 2585, 74, 1114, 1277, 179, 882, 3325, 179, 6063, 2256, 4170, 2585, 2585, 2585, 2585, 2585, 2585, 2585, 2585, 2585, 2585, 2585, 2585, 2585, 2585, 2585, 2585, 2585, 2585, 2585, 2585, 2585, 2585, 2585]
['cramer', 'cueball', 'pump', 'pump', 'pump', 'lt', 'lt', 'lt', 'then', 'quickly', 'throw', 'quot', 'sell', 'alot', 'quot', 'con',

In [16]:
token_ids = [[word for word in message] for message in balanced['messages']]
sentiments = balanced['sentiments']


In [17]:
for i in range(5):
    print(token_ids[i])
    print(sentiments[i])

[0, 3, 4]
4
[5, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19]
4
[21, 22, 23, 24, 25, 26, 24, 27, 28, 25, 29, 31]
3
[34, 36, 37, 38, 39, 40, 41, 42]
3
[43, 44, 45, 47, 48, 49]
0


## Neural Network
Now we have our vocabulary, which means we can transform our tokens into ids, which are then passed to our network. So, let's define the network now!

Here is a nice diagram showing the network we'd like to build: 

#### Embed -> RNN -> Dense -> Softmax
### Implement the text classifier
Before we build the text classifier, recall the network you built in the "Sentiment Analysis with an RNN" exercise. There the network was called `SentimentRNN`; here we name it `TextClassifier`. It consists of three main parts: 1) the init function `__init__`, 2) the forward pass `forward`, and 3) the hidden state initializer `init_hidden`.

This network is pretty similar to the network you built, except that in the `forward` pass we use softmax instead of sigmoid. The reason we are not using sigmoid is that the output of the network is not binary: in our network, sentiment scores have 5 possible outcomes. We are looking for the outcome with the highest probability, so softmax is a better choice.
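
To illustrate why softmax suits a five-way choice, here is a plain-Python sketch of log-softmax for a single example. The `logits` values are made up, and the notebook itself would use a PyTorch layer rather than this hand-written function.

```python
import math

def log_softmax(logits):
    """Log-probabilities for one example, computed in a numerically stable way."""
    shift = max(logits)
    log_total = shift + math.log(sum(math.exp(x - shift) for x in logits))
    return [x - log_total for x in logits]

# Hypothetical raw scores for the five sentiment classes
# (0 = very negative ... 4 = very positive).
logits = [0.1, -1.2, 0.3, 2.0, 0.5]
logps = log_softmax(logits)
probs = [math.exp(lp) for lp in logps]

# The probabilities sum to one, and the prediction is the most probable class.
predicted_class = max(range(len(probs)), key=probs.__getitem__)
print(round(sum(probs), 6), predicted_class)
```

A sigmoid would squash each score independently, so the five outputs would not form a single probability distribution over the classes.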

In [18]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(device)

cuda


In [40]:
class TextClassifier(nn.Module):
    def __init__(self, vocab_size, embed_size, lstm_size, output_size, lstm_layers, drop_prob):
        """
        Initialize the model by setting up the layers.
        
        Parameters
        ----------
            vocab_size : The vocabulary size.
            embed_size : The embedding layer size.
            lstm_size : The LSTM layer size.
            output_size : The output size.
            lstm_layers : The number of LSTM layers.
            drop_prob : The dropout probability.
        """
        
        super().__init__()

        
        self.output_size = output_size
        self.lstm_layers = lstm_layers
        self.lstm_size = lstm_size
        
        # TODO Implement

        # Setup embedding and LSTM layers
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, self.lstm_size, self.lstm_layers, 
                            dropout=drop_prob, batch_first=True)
        
        # dropout layer
        self.dropout = nn.Dropout(drop_prob)
        
        # Setup additional layers: batch norm, two fully-connected layers, and log-softmax
        mid_layer = int(self.lstm_size / 2)
        
        self.bn1 = nn.BatchNorm1d(num_features=lstm_size)
        self.fc1 = nn.Linear(self.lstm_size, mid_layer)
        self.bn2 = nn.BatchNorm1d(num_features=mid_layer)
        self.fc2 = nn.Linear(mid_layer, self.output_size)
        # Note: despite the name `sig`, this layer is a log-softmax over the classes
        self.sig = nn.LogSoftmax(dim=1)



    def init_hidden(self, batch_size):
        """ 
        Initializes hidden state
        
        Parameters
        ----------
            batch_size : The size of batches.
        
        Returns
        -------
            hidden_state
            
        """
        
        # TODO Implement 
        
        # Create two new tensors with sizes n_layers x batch_size x hidden_dim,
        # initialized to zero, for hidden state and cell state of LSTM.
        # `new_zeros` creates them with the same dtype and device as the model weights.
        weight = next(self.parameters())
        hidden = (weight.new_zeros((self.lstm_layers, batch_size, self.lstm_size)),
                  weight.new_zeros((self.lstm_layers, batch_size, self.lstm_size)))
        
        return hidden

    def forward(self, nn_input, hidden_state):
        """
        Perform a forward pass of our model on nn_input.
        
        Parameters
        ----------
            nn_input : The batch of input to the NN.
            hidden_state : The LSTM hidden state.

        Returns
        -------
            logps: log softmax output
            hidden_state: The new hidden state.

        """
        # embeddings and lstm_out
        nn_input = nn_input.long()
        embeds = F.relu(self.embedding(nn_input))
        lstm_out, hidden_state = self.lstm(embeds, hidden_state)
        
        lstm_out = lstm_out[:, -1, :] # getting the last time step output
        
        # batch norm, dropout and fully-connected layers
        out = self.dropout(self.bn1(lstm_out))
        out = F.relu(self.fc1(out))
        out = F.relu(self.bn2(out))
        out = F.relu(self.fc2(out))
        out = F.relu(self.dropout(out))
        # log-softmax over the classes
        logps = self.sig(out)
        
        # return the log-softmax output for the last time step and the new hidden state
        return logps, hidden_state

### View Model

In [41]:
model = TextClassifier(len(vocab), 10, 6, 2, drop_prob=0.5, lstm_layers=2)
model.embedding.weight.data.uniform_(-1, 1)
input = torch.randint(0, 1000, (10, 4), dtype=torch.int64)
hidden = model.init_hidden(10)

logps, _ = model.forward(input, hidden)
print(logps)

tensor([[-0.7426, -0.6461],
        [-0.6931, -0.6931],
        [-0.7421, -0.6465],
        [-1.5308, -0.2438],
        [-0.6931, -0.6931],
        [-0.6931, -0.6931],
        [-0.6931, -0.6931],
        [-0.6931, -0.6931],
        [-0.6931, -0.6931],
        [-0.6931, -0.6931]], grad_fn=<LogSoftmaxBackward0>)


## Training
### DataLoaders and Batching
Now we should build a generator that we can use to loop through our data. It'll be more efficient if we can pass our sequences in as batches. Our input tensors should have shape `(batch_size, sequence_length)`, because the LSTM is created with `batch_first=True`. So if our sequences are 40 tokens long and we pass in 25 sequences, then we'd have an input size of `(25, 40)`.

If we set our sequence length to 40, what do we do with messages that are more or less than 40 tokens? For messages with fewer than 40 tokens, we will pad the empty spots with zeros. We should be sure to **left** pad so that the RNN starts from nothing before going through the data. If the message has 20 tokens, then the first 20 spots of our 40 long sequence will be 0. If a message has more than 40 tokens, we'll just keep the first 40 tokens.
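
The padding and truncation rules above can be sketched for a single message as follows. The helper `left_pad` is a hypothetical illustration that works on plain lists; the dataloader applies the same idea to tensors.

```python
def left_pad(tokens, sequence_length, pad_id=0):
    """Truncate to the first `sequence_length` ids, then pad on the left with `pad_id`."""
    tokens = tokens[:sequence_length]
    return [pad_id] * (sequence_length - len(tokens)) + tokens

# A short message is padded on the left; a long one keeps only its first tokens.
print(left_pad([7, 8, 9], 5))
print(left_pad([1, 2, 3, 4, 5, 6, 7], 5))
```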

In [42]:
def dataloader(messages, labels, sequence_length, batch_size, shuffle=False):
    """ 
    Build a dataloader.
    """
    if shuffle:
        indices = list(range(len(messages)))
        random.shuffle(indices)
        messages = [messages[idx] for idx in indices]
        labels = [labels[idx] for idx in indices]

    total_sequences = len(messages)
    for ii in range(0, total_sequences, batch_size):
        batch_messages = messages[ii: ii+batch_size]
        # First initialize a tensor of all zeros
        batch = torch.zeros((len(batch_messages), sequence_length), dtype=torch.int64)
        for batch_num, tokens in enumerate(batch_messages):
            # Keep at most the first `sequence_length` tokens, then left pad with zeros
            tokens = tokens[:sequence_length]
            padding = [0] * (sequence_length - len(tokens))
            batch[batch_num, :] = torch.tensor(padding + tokens, dtype=torch.int64)
        
        label_tensor = torch.tensor(labels[ii: ii+len(batch_messages)])
        yield batch, label_tensor


In [43]:
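def inference_loader(messages, sequence_length, batch_size, shuffle=False):
    """ 
    Build a dataloader for inference. It yields batches of messages without labels.
    The `shuffle` argument is accepted but not used.
    """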
    total_sequences = len(messages)
    for ii in range(0, total_sequences, batch_size):
        batch_messages = messages[ii: ii+batch_size]
        # First initialize a tensor of all zeros
        batch = torch.zeros((len(batch_messages), sequence_length), dtype=torch.int64)
        for batch_num, tokens in enumerate(batch_messages):
            # Keep at most the first `sequence_length` tokens, then left pad with zeros
            tokens = tokens[:sequence_length]
            padding = [0] * (sequence_length - len(tokens))
            batch[batch_num, :] = torch.tensor(padding + tokens, dtype=torch.int64)
        
        yield batch

### Training and  Validation
With our data in nice shape, we'll split it into training and validation sets.
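
A common way to do this is to hold out a fixed fraction of the examples. The sketch below shows the idea on toy data; the helper name `split_train_valid` and the 20% fraction are assumptions for illustration, not the split size required by the project.

```python
def split_train_valid(features, labels, valid_fraction=0.2):
    """Hold out the last `valid_fraction` of the examples for validation (hypothetical helper)."""
    split_idx = int(len(features) * (1 - valid_fraction))
    return (features[:split_idx], labels[:split_idx],
            features[split_idx:], labels[split_idx:])

toy_features = [[1, 2], [3], [4, 5, 6], [7], [8, 9]]
toy_labels = [4, 0, 2, 3, 1]
train_x, train_y, valid_x, valid_y = split_train_valid(toy_features, toy_labels)
print(len(train_x), len(valid_x))
```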

In [44]:
"""
Split data into training and validation datasets. Use an appropriate split size.
The features are the `token_ids` and the labels are the `sentiments`.
"""   
print(len(token_ids))
print(len(sentiments))
# End index of the validation slice. Together with the start index 800001 below, it makes the
# slice length (226,800 here) a multiple of 100, so every validation batch is full.
valid_end = len(token_ids) if len(token_ids) % 100 == 0 else len(token_ids) - len(token_ids) % 100 + 1
print(valid_end)
# TODO Implement 
# Note: the example at index 800000 falls in neither set.
train_features = token_ids[0: 800000]
valid_features = token_ids[800001:valid_end]
train_labels = sentiments[0: 800000]
valid_labels = sentiments[800001:valid_end]

1026863
1026863
1026801


In [45]:
text_batch, labels = next(iter(dataloader(train_features, train_labels, sequence_length=40, batch_size=64, shuffle=True)))
model = TextClassifier(len(vocab)+1, 1024, 512, 5, drop_prob=0.3, lstm_layers=2)
hidden = model.init_hidden(64)
print(len(text_batch))
print(labels)
logps, hidden = model.forward(text_batch, hidden)

print(logps)

64
tensor([4, 2, 3, 2, 2, 0, 3, 1, 3, 4, 1, 1, 0, 3, 0, 3, 3, 4, 3, 4, 2, 4, 3, 0,
        0, 4, 2, 2, 1, 4, 1, 3, 4, 0, 1, 3, 3, 4, 2, 0, 1, 0, 4, 2, 3, 3, 1, 0,
        4, 0, 2, 1, 4, 1, 4, 0, 1, 4, 0, 2, 4, 1, 3, 0])
tensor([[-1.7516, -1.3048, -1.5688, -1.7516, -1.7516],
        [-1.8448, -1.1720, -1.8669, -1.8063, -1.5450],
        [-1.7653, -1.3701, -1.8335, -1.3661, -1.8335],
        [-1.9571, -1.6740, -1.9571, -1.1722, -1.5128],
        [-1.5566, -1.6512, -1.6512, -1.5432, -1.6512],
        [-1.9430, -1.9138, -1.9430, -1.1913, -1.3389],
        [-1.5281, -1.6946, -1.3874, -1.7440, -1.7440],
        [-0.7137, -1.9148, -2.1125, -2.1125, -2.1125],
        [-1.2622, -1.8876, -1.8876, -1.8876, -1.3369],
        [-1.6380, -1.6380, -1.5280, -1.6380, -1.6097],
        [-1.6543, -1.6065, -1.6543, -1.5846, -1.5516],
        [-1.4204, -1.8308, -1.7750, -1.6993, -1.4033],
        [-1.7976, -1.5459, -1.5192, -1.4413, -1.7976],
        [-1.2392, -1.3989, -2.0181, -1.6210, -2.0181],
        [-

### Training
It's time to train the neural network!

In [46]:
batch_size = 50


In [47]:
print(device)
model = TextClassifier(len(vocab)+1, 1024, 512, 5, lstm_layers=2, drop_prob=0.5)
hidden = model.init_hidden(batch_size)
model.embedding.weight.data.uniform_(-1, 1)
model.to(device)

cuda


TextClassifier(
  (embedding): Embedding(92118, 1024)
  (lstm): LSTM(1024, 512, num_layers=2, batch_first=True, dropout=0.7)
  (dropout): Dropout(p=0.7, inplace=False)
  (bn1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (fc1): Linear(in_features=512, out_features=256, bias=True)
  (bn2): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (fc2): Linear(in_features=256, out_features=5, bias=True)
  (sig): LogSoftmax(dim=1)
)

In [48]:
import numpy as np
"""
Train your model with dropout. Make sure to clip your gradients.
Print the training loss, validation loss, and validation accuracy for every 100 steps.
"""
clip=5
epochs = 8
sequence_length = 50
shuffle = True
learning_rate = 0.1
model_path_load = '10.pth'
model_path_save = '10.pth'

print_every = 100
criterion = nn.NLLLoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate)
model.train()
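try:
    model.load_state_dict(torch.load(model_path_load))
    print('loaded weights {}'.format(model_path_load))
except Exception:
    # For example, no checkpoint on disk or a checkpoint that does not match this architecture
    print('could not load weights {}'.format(model_path_load))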

rate_of_change_deque = list()


for epoch in range(epochs):
    print('Starting epoch {}'.format(epoch + 1))
    hidden = model.init_hidden(batch_size)
    steps = 0
    train_losses = []
    for text_batch, labels in dataloader(
            train_features, train_labels, batch_size, sequence_length, shuffle):
        steps += 1
        # print(steps)
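        # Detach the hidden state so gradients do not flow across batches
        hidden = tuple([each.data for each in hidden])
        
        # Set Device
        text_batch, labels = text_batch.to(device), labels.to(device)
        # Tensor.to() returns a new tensor, so the result must be reassigned
        hidden = tuple(each.to(device) for each in hidden)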
        
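        # TODO Implement: Train Model
        model.zero_grad()
        output, hidden = model(text_batch, hidden)
        # NLLLoss expects integer class indices, so cast the labels to long
        loss = criterion(output.squeeze(), labels.long())
        train_losses.append(loss.item())
        loss.backward()
        # Clip the global gradient norm to `clip` to guard against exploding gradients
        nn.utils.clip_grad_norm_(model.parameters(), clip)
        optimizer.step()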
        if steps % print_every == 0:
            val_losses = []
            model.eval()
            val_hidden = model.init_hidden(batch_size)
            # Gradients are not needed for validation
            with torch.no_grad():
                for val_inputs, val_labels in dataloader(
                        valid_features, valid_labels, batch_size, sequence_length, shuffle):
                    # Tensor.to() returns a new tensor, so rebuild the tuple
                    val_hidden = tuple(each.data.to(device) for each in val_hidden)
                    val_inputs, val_labels = val_inputs.to(device), val_labels.to(device)
                    val_output, val_hidden = model(val_inputs, val_hidden)
                    val_loss = criterion(val_output.squeeze(), val_labels.long())
                    val_losses.append(val_loss.item())
            # TODO Implement: Print metrics
            
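            # Track the last 30 averaged training losses to estimate the loss trend
            rate_of_change_deque.append(np.mean(train_losses))
            if len(rate_of_change_deque) > 30:
                rate_of_change_deque.pop(0)
            # np.diff needs at least two values, so duplicate the first entry
            if len(rate_of_change_deque) == 1:
                rate_of_change_deque.append(np.mean(train_losses))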
            
            model.train()
            print("Epoch: {}/{}...".format(epoch+1, epochs),
                  "Step: {}...".format(steps),
                  "Max: {:.1f}...".format(np.max(val_losses)),
                  "Min: {:.1f}... \n".format(np.min(val_losses)),
                  "100s avg Train Loss: {:.4f}".format(np.mean(train_losses)),
                  "100s avg Val Loss: {:.4f}".format(np.mean(val_losses)),
                  "RoC_15: {:.4f}".format(np.mean(np.diff(rate_of_change_deque[-15:]))),
                  "RoC_30: {:.4f}".format(np.mean(np.diff(rate_of_change_deque))))
            train_losses = []
            torch.save(model.state_dict(), model_path_save)
            print('model {} saved'.format(model_path_save))
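
The training loop above calls `nn.utils.clip_grad_norm_(model.parameters(), clip)` with `clip=5` between `loss.backward()` and `optimizer.step()`. The following is a minimal plain-Python sketch of the idea behind global-norm clipping, not PyTorch's implementation: the helper `clip_by_global_norm`, its `eps` argument, and the example gradients are all made up for illustration.

```python
import math

def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale a list of gradient vectors so their combined L2 norm is at most max_norm.

    Hypothetical helper for illustration; returns (clipped_grads, original_norm).
    """
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    # Only ever scale down: the coefficient is capped at 1.0
    coef = min(1.0, max_norm / (total_norm + eps))
    clipped = [[g * coef for g in vec] for vec in grads]
    return clipped, total_norm

# Two made-up "parameter gradients" whose combined norm is sqrt(9 + 16 + 144) = 13
grads = [[3.0, 4.0], [12.0]]
clipped, norm_before = clip_by_global_norm(grads, max_norm=5)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
print(norm_before, norm_after)

# Gradients already inside the limit are left untouched
small, _ = clip_by_global_norm([[0.3, 0.4]], max_norm=5)
print(small)
```

All gradients are scaled by the same factor, so the direction of the update is preserved and only its length is limited.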


could not load weights 10.pth
Starting epoch 1
Epoch: 1/8... Step: 100... Max: 1.5... Min: 0.8... 
 100s avg Train Loss: 1.1344 100s avg Val Loss: 1.1269 RoC_15: 0.0000 RoC_30: 0.0000
model 10.pth saved
Epoch: 1/8... Step: 200... Max: 1.6... Min: 0.7... 
 100s avg Train Loss: 1.0760 100s avg Val Loss: 1.1432 RoC_15: -0.0292 RoC_30: -0.0292
model 10.pth saved
Epoch: 1/8... Step: 300... Max: 1.6... Min: 0.7... 
 100s avg Train Loss: 1.0647 100s avg Val Loss: 1.1026 RoC_15: -0.0232 RoC_30: -0.0232
model 10.pth saved
Epoch: 1/8... Step: 400... Max: 1.6... Min: 0.7... 
 100s avg Train Loss: 1.0542 100s avg Val Loss: 1.0484 RoC_15: -0.0200 RoC_30: -0.0200
model 10.pth saved
Epoch: 1/8... Step: 500... Max: 1.6... Min: 0.7... 
 100s avg Train Loss: 1.0701 100s avg Val Loss: 1.0777 RoC_15: -0.0129 RoC_30: -0.0129
model 10.pth saved
Epoch: 1/8... Step: 600... Max: 1.5... Min: 0.7... 
 100s avg Train Loss: 1.0718 100s avg Val Loss: 1.0789 RoC_15: -0.0104 RoC_30: -0.0104
model 10.pth saved
Epoch: 

model 10.pth saved
Epoch: 1/8... Step: 5300... Max: 1.5... Min: 0.6... 
 100s avg Train Loss: 1.0264 100s avg Val Loss: 1.0425 RoC_15: -0.0009 RoC_30: -0.0001
model 10.pth saved
Epoch: 1/8... Step: 5400... Max: 1.6... Min: 0.6... 
 100s avg Train Loss: 1.0130 100s avg Val Loss: 1.0245 RoC_15: -0.0006 RoC_30: -0.0006
model 10.pth saved
Epoch: 1/8... Step: 5500... Max: 1.5... Min: 0.6... 
 100s avg Train Loss: 1.0144 100s avg Val Loss: 1.0431 RoC_15: -0.0006 RoC_30: -0.0006
model 10.pth saved
Epoch: 1/8... Step: 5600... Max: 1.6... Min: 0.7... 
 100s avg Train Loss: 1.0255 100s avg Val Loss: 1.0159 RoC_15: -0.0006 RoC_30: -0.0009
model 10.pth saved
Epoch: 1/8... Step: 5700... Max: 1.5... Min: 0.6... 
 100s avg Train Loss: 1.0215 100s avg Val Loss: 1.0464 RoC_15: -0.0003 RoC_30: -0.0004
model 10.pth saved
Epoch: 1/8... Step: 5800... Max: 1.6... Min: 0.6... 
 100s avg Train Loss: 1.0194 100s avg Val Loss: 1.0358 RoC_15: -0.0003 RoC_30: -0.0004
model 10.pth saved
Epoch: 1/8... Step: 5900...

KeyboardInterrupt: 
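
The `RoC_15` and `RoC_30` columns in the log above are the mean of the consecutive differences of the recent average training losses, so a value close to zero means the loss has stopped moving. The sketch below replays that bookkeeping in plain Python for the first six printed losses. The function `mean_diff` is a hypothetical stand-in for `np.mean(np.diff(...))`, and because the inputs are the rounded values from the log, the results should agree with the printed ones only up to rounding.

```python
# Average training losses printed at steps 100..600 in the log above
avg_train_losses = [1.1344, 1.0760, 1.0647, 1.0542, 1.0701, 1.0718]

def mean_diff(values):
    """Mean of consecutive differences; plain-Python stand-in for np.mean(np.diff(values))."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    return sum(diffs) / len(diffs)

history = []
roc_values = []
for loss in avg_train_losses:
    history.append(loss)
    if len(history) > 30:
        history.pop(0)
    if len(history) == 1:
        # The training loop duplicates the first entry so that a difference exists
        history.append(loss)
    roc_values.append(mean_diff(history))

for step, roc in zip(range(100, 700, 100), roc_values):
    print("Step {}: RoC_30 = {:.4f}".format(step, roc))
```

Because the differences telescope, the mean equals (last value - first value) / (number of differences), that is, the average slope over the window. `RoC_15` applies the same calculation to the last 15 entries only, which is why the two columns match until the history grows beyond 15 entries.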

In [50]:
model.eval()
val_hidden = model.init_hidden(batch_size)
for val_inputs, val_labels in dataloader(
        valid_features, valid_labels, batch_size, sequence_length, shuffle):
    val_hidden = tuple([each.data for each in val_hidden])
    val_inputs, val_labels = val_inputs.to(device), val_labels.to(device)
    val_output, val_hidden = model(val_inputs, val_hidden)
    val_loss = criterion(val_output.squeeze(), val_labels.float().long())
    # Predicted class = index of the largest probability in each row
    results = list()
    for l in torch.exp(val_output.squeeze()).tolist():
        results.append(l.index(max(l)))
    print('val_output: {}\n val_label: {}'.format(results, val_labels.float().long()))
    # NOTE: this compares the sets of distinct classes, not per-sample predictions.
    # When all 5 classes appear in both, it prints 5 / 50 * 100 = 10.0 whatever the accuracy.
    setA = set(results)
    setB = set(val_labels.float().long().tolist())
    print(float(len(setA & setB) / len(results)) * 100)
    print(val_loss)
    print(val_loss)
    val_losses.append(val_loss.item())
# TODO Implement: Print metrics
print("Epoch: {}/{}...".format(epoch+1, epochs),
      "Step: {}...".format(steps),
      "Max: {:.6f}...".format(np.max(val_losses)),
      "Min: {:.6f}...".format(np.min(val_losses)),
      "100 Step Avg Train Loss: {:.6f}".format(np.mean(train_losses)),
      "100 Step Avg Val Loss: {:.6f}".format(np.mean(val_losses)))
train_losses = []
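
The `10.0` printed for every batch below is not a per-sample accuracy. The cell intersects the set of distinct predicted classes with the set of distinct label classes, and with five classes present in a batch of 50 that ratio is always 5 / 50. The sketch below contrasts that calculation with a position-by-position comparison, using the predictions and labels copied from the first batch printed below; the variable names are illustrative.

```python
# Predictions and labels copied from the first batch of the cell's output
predictions = [0, 2, 2, 2, 3, 3, 4, 4, 4, 0, 0, 2, 3, 2, 2, 2, 1, 0, 2, 3,
               3, 3, 0, 2, 2, 3, 1, 0, 4, 3, 2, 4, 1, 3, 3, 4, 2, 0, 2, 0,
               1, 3, 4, 4, 1, 3, 1, 4, 3, 3]
labels = [0, 1, 4, 2, 1, 3, 4, 4, 4, 0, 0, 4, 0, 2, 0, 3, 1, 0, 2, 3,
          4, 3, 0, 2, 2, 3, 1, 3, 2, 3, 2, 0, 0, 4, 3, 4, 0, 0, 2, 0,
          1, 1, 0, 4, 1, 3, 1, 3, 3, 3]

# What the cell computes: overlap between the *sets* of distinct classes
set_based = float(len(set(predictions) & set(labels)) / len(predictions)) * 100

# Per-sample accuracy: share of positions where the prediction equals the label
correct = sum(p == y for p, y in zip(predictions, labels))
elementwise = correct / len(labels) * 100

print("set-based:    {:.1f}".format(set_based))
print("element-wise: {:.1f}".format(elementwise))
```

On tensors, one way to get the same per-sample figure would be `(val_output.argmax(dim=1) == val_labels).float().mean()`, assuming `val_output` has shape `(batch_size, 5)` as in this notebook.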

val_output: [0, 2, 2, 2, 3, 3, 4, 4, 4, 0, 0, 2, 3, 2, 2, 2, 1, 0, 2, 3, 3, 3, 0, 2, 2, 3, 1, 0, 4, 3, 2, 4, 1, 3, 3, 4, 2, 0, 2, 0, 1, 3, 4, 4, 1, 3, 1, 4, 3, 3]
 val_label: tensor([0, 1, 4, 2, 1, 3, 4, 4, 4, 0, 0, 4, 0, 2, 0, 3, 1, 0, 2, 3, 4, 3, 0, 2,
        2, 3, 1, 3, 2, 3, 2, 0, 0, 4, 3, 4, 0, 0, 2, 0, 1, 1, 0, 4, 1, 3, 1, 3,
        3, 3], device='cuda:0')
10.0
tensor(0.9224, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 2, 2, 4, 2, 2, 3, 3, 0, 1, 0, 4, 3, 2, 4, 0, 0, 3, 4, 2, 1, 1, 0, 3, 4, 4, 4, 2, 3, 4, 3, 4, 2, 4, 3, 2, 4, 3, 3, 0, 3, 4, 2, 3, 0, 1, 2, 2, 4, 2]
 val_label: tensor([2, 2, 3, 4, 2, 2, 3, 2, 0, 1, 0, 4, 3, 3, 3, 0, 3, 3, 1, 0, 2, 0, 0, 3,
        2, 4, 3, 2, 3, 1, 3, 1, 0, 3, 1, 1, 1, 1, 3, 0, 3, 3, 2, 1, 0, 1, 2, 2,
        3, 2], device='cuda:0')
10.0
tensor(1.3017, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 1, 3, 1, 4, 1, 4, 3, 3, 3, 4, 2, 3, 3, 2, 2, 4, 2, 4, 1, 4, 2, 2, 1, 4, 3, 2, 3, 3, 3, 2, 2, 3, 2, 1, 3, 4, 3, 3, 4, 

val_output: [2, 3, 3, 3, 3, 2, 2, 4, 0, 1, 0, 0, 2, 0, 4, 1, 1, 3, 1, 3, 4, 3, 4, 4, 4, 3, 4, 1, 2, 2, 4, 1, 1, 2, 2, 1, 4, 2, 1, 0, 3, 2, 1, 3, 4, 1, 2, 0, 3, 1]
 val_label: tensor([2, 1, 2, 3, 0, 2, 0, 3, 0, 1, 0, 0, 4, 0, 4, 0, 1, 4, 1, 3, 4, 0, 1, 4,
        0, 4, 4, 1, 1, 4, 4, 2, 0, 0, 3, 1, 4, 2, 3, 3, 2, 2, 1, 3, 3, 3, 2, 0,
        3, 1], device='cuda:0')
10.0
tensor(1.0153, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 2, 0, 3, 3, 4, 2, 4, 4, 4, 4, 3, 2, 2, 2, 2, 4, 3, 3, 3, 2, 4, 2, 3, 1, 4, 4, 2, 4, 2, 2, 3, 4, 4, 4, 4, 4, 4, 1, 2, 1, 3, 3, 2, 3, 4, 4, 4, 3, 2]
 val_label: tensor([4, 2, 0, 2, 2, 4, 2, 3, 4, 4, 3, 3, 2, 3, 2, 2, 3, 3, 3, 1, 2, 4, 3, 3,
        0, 4, 4, 2, 0, 2, 2, 2, 4, 2, 3, 4, 3, 0, 3, 4, 0, 3, 3, 2, 1, 4, 4, 0,
        3, 2], device='cuda:0')
10.0
tensor(1.0464, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 1, 2, 1, 4, 1, 1, 3, 0, 2, 4, 4, 1, 4, 3, 3, 2, 3, 1, 3, 3, 2, 3, 3, 0, 3, 1, 3, 2, 0, 1, 4, 2, 4, 3, 4, 1, 3, 3, 1, 4, 

val_output: [2, 3, 2, 3, 4, 3, 2, 3, 4, 3, 3, 4, 4, 2, 3, 4, 3, 2, 4, 1, 0, 3, 4, 4, 3, 4, 3, 3, 3, 4, 3, 3, 0, 4, 4, 3, 3, 2, 3, 4, 3, 3, 3, 2, 1, 3, 3, 3, 1, 3]
 val_label: tensor([3, 0, 2, 0, 2, 3, 3, 3, 3, 3, 3, 4, 4, 2, 4, 3, 3, 3, 4, 1, 0, 3, 4, 4,
        3, 4, 3, 1, 0, 0, 3, 3, 0, 4, 4, 3, 3, 2, 3, 4, 1, 3, 3, 2, 1, 3, 3, 1,
        1, 0], device='cuda:0')
10.0
tensor(0.9199, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 4, 2, 1, 1, 2, 4, 3, 4, 3, 2, 1, 2, 2, 1, 3, 3, 3, 2, 4, 1, 3, 0, 3, 4, 4, 2, 3, 2, 4, 3, 3, 4, 3, 4, 2, 0, 3, 3, 3, 3, 1, 4, 0, 0, 2, 2, 4, 0, 3]
 val_label: tensor([3, 3, 2, 2, 1, 2, 4, 2, 1, 3, 3, 1, 2, 3, 0, 1, 0, 3, 2, 4, 3, 1, 0, 3,
        4, 4, 0, 3, 1, 3, 2, 0, 4, 3, 1, 2, 0, 3, 3, 3, 4, 0, 4, 0, 0, 2, 2, 4,
        1, 1], device='cuda:0')
10.0
tensor(1.1239, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 1, 0, 2, 3, 1, 2, 4, 0, 2, 2, 3, 3, 0, 1, 0, 3, 2, 1, 4, 4, 3, 3, 1, 2, 0, 3, 4, 4, 3, 0, 3, 2, 4, 3, 4, 2, 3, 0, 4, 3, 

val_output: [2, 4, 4, 2, 3, 4, 2, 4, 3, 3, 4, 2, 2, 4, 3, 2, 3, 1, 2, 4, 2, 0, 1, 4, 4, 2, 2, 3, 0, 2, 2, 3, 3, 4, 2, 4, 3, 0, 2, 4, 3, 2, 4, 2, 0, 0, 4, 3, 2, 2]
 val_label: tensor([3, 4, 4, 0, 3, 2, 0, 4, 1, 3, 4, 4, 0, 0, 1, 1, 3, 1, 1, 4, 2, 0, 1, 3,
        4, 2, 4, 1, 3, 1, 4, 0, 3, 1, 2, 4, 1, 0, 0, 3, 3, 2, 4, 3, 0, 0, 4, 3,
        0, 2], device='cuda:0')
10.0
tensor(1.1666, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 4, 2, 2, 4, 1, 3, 4, 4, 4, 3, 0, 1, 3, 4, 2, 1, 2, 3, 0, 2, 3, 4, 2, 0, 0, 1, 3, 1, 4, 0, 1, 4, 3, 4, 1, 0, 1, 4, 1, 4, 2, 1, 3, 0, 2, 2, 3, 2, 4]
 val_label: tensor([1, 4, 3, 2, 0, 1, 3, 3, 2, 4, 3, 0, 1, 1, 4, 2, 1, 0, 3, 0, 2, 3, 2, 0,
        0, 0, 1, 1, 1, 4, 3, 1, 2, 3, 2, 1, 0, 1, 4, 3, 3, 2, 1, 3, 0, 2, 3, 3,
        1, 2], device='cuda:0')
10.0
tensor(0.9750, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 4, 2, 4, 1, 3, 0, 3, 1, 2, 1, 3, 2, 2, 1, 2, 4, 2, 2, 0, 3, 3, 3, 4, 1, 2, 3, 1, 4, 2, 4, 1, 1, 1, 1, 1, 0, 3, 4, 2, 

val_output: [3, 4, 3, 4, 3, 3, 4, 3, 3, 4, 4, 2, 1, 1, 3, 2, 2, 2, 3, 3, 2, 2, 3, 4, 3, 4, 4, 0, 4, 1, 2, 1, 4, 3, 4, 3, 3, 4, 2, 3, 3, 0, 0, 2, 3, 3, 3, 1, 4, 2]
 val_label: tensor([3, 3, 0, 3, 3, 3, 4, 3, 1, 3, 3, 3, 1, 0, 4, 0, 3, 0, 1, 2, 2, 2, 0, 3,
        3, 0, 4, 0, 4, 1, 2, 1, 3, 4, 2, 4, 3, 2, 2, 3, 1, 0, 0, 2, 3, 3, 3, 1,
        1, 3], device='cuda:0')
10.0
tensor(1.1915, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 4, 1, 2, 2, 3, 2, 1, 3, 4, 3, 0, 1, 4, 1, 3, 4, 4, 4, 4, 2, 4, 0, 0, 3, 3, 3, 4, 0, 2, 4, 4, 2, 1, 4, 0, 2, 4, 2, 2, 4, 4, 2, 3, 0, 0, 3, 3, 1, 4]
 val_label: tensor([1, 4, 0, 3, 2, 1, 2, 1, 1, 3, 0, 1, 1, 4, 1, 3, 4, 3, 3, 4, 2, 0, 0, 0,
        3, 4, 3, 4, 0, 3, 4, 4, 1, 1, 3, 0, 3, 4, 2, 2, 3, 4, 2, 0, 0, 0, 1, 4,
        1, 4], device='cuda:0')
10.0
tensor(0.9050, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 1, 2, 3, 3, 3, 3, 1, 3, 4, 2, 2, 0, 3, 3, 2, 4, 4, 3, 2, 3, 2, 2, 3, 4, 1, 1, 1, 0, 4, 3, 4, 1, 3, 3, 3, 0, 3, 3, 3, 3, 

val_output: [3, 1, 2, 3, 3, 4, 4, 2, 4, 4, 1, 3, 2, 2, 3, 4, 3, 3, 3, 3, 3, 2, 4, 4, 2, 0, 3, 2, 3, 3, 2, 3, 0, 2, 3, 2, 2, 3, 1, 2, 1, 3, 1, 0, 3, 2, 2, 3, 1, 3]
 val_label: tensor([3, 1, 3, 3, 3, 3, 3, 2, 4, 1, 2, 2, 2, 2, 1, 4, 3, 1, 1, 3, 3, 2, 4, 3,
        2, 0, 0, 3, 2, 3, 3, 4, 4, 4, 0, 2, 3, 2, 3, 2, 1, 4, 1, 0, 3, 2, 2, 2,
        1, 2], device='cuda:0')
10.0
tensor(1.0850, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 3, 1, 3, 3, 4, 3, 4, 2, 4, 2, 3, 1, 3, 0, 3, 4, 4, 4, 1, 2, 2, 4, 4, 2, 4, 3, 3, 2, 1, 4, 3, 4, 4, 0, 4, 0, 2, 0, 3, 0, 0, 3, 4, 2, 3, 0, 4, 3]
 val_label: tensor([4, 2, 2, 1, 3, 3, 4, 2, 0, 2, 2, 3, 2, 1, 3, 0, 3, 4, 1, 4, 1, 4, 3, 2,
        4, 3, 3, 1, 3, 2, 1, 4, 1, 4, 2, 1, 2, 0, 2, 0, 2, 1, 0, 4, 3, 0, 1, 0,
        0, 2], device='cuda:0')
10.0
tensor(1.1969, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [0, 2, 3, 0, 3, 1, 0, 3, 2, 3, 2, 4, 2, 0, 3, 4, 1, 2, 3, 4, 3, 1, 3, 3, 3, 3, 4, 3, 3, 3, 2, 1, 4, 4, 3, 2, 3, 1, 1, 3, 3, 

val_output: [2, 4, 1, 4, 3, 3, 3, 4, 3, 2, 2, 3, 3, 3, 3, 3, 3, 0, 1, 3, 4, 4, 3, 2, 1, 4, 0, 1, 3, 1, 1, 4, 3, 4, 2, 1, 1, 3, 3, 2, 4, 2, 4, 3, 4, 3, 3, 3, 4, 2]
 val_label: tensor([3, 4, 1, 4, 3, 4, 3, 2, 3, 2, 2, 4, 3, 4, 4, 3, 3, 0, 1, 3, 3, 3, 3, 2,
        1, 3, 0, 3, 1, 0, 1, 3, 3, 4, 3, 1, 0, 4, 3, 2, 0, 3, 4, 4, 3, 1, 0, 4,
        1, 0], device='cuda:0')
10.0
tensor(1.0923, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 1, 4, 1, 2, 4, 3, 2, 2, 3, 3, 3, 4, 4, 3, 2, 2, 4, 3, 0, 2, 4, 3, 4, 3, 3, 1, 3, 1, 4, 4, 1, 0, 3, 3, 3, 0, 4, 1, 3, 2, 4, 2, 1, 1, 4, 1, 2, 3, 2]
 val_label: tensor([3, 0, 4, 1, 4, 4, 1, 0, 2, 3, 3, 3, 4, 3, 1, 4, 3, 4, 1, 1, 2, 2, 3, 3,
        2, 3, 1, 3, 1, 3, 0, 1, 0, 2, 3, 0, 0, 4, 1, 2, 0, 4, 2, 0, 1, 1, 0, 2,
        0, 2], device='cuda:0')
10.0
tensor(1.1608, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 0, 2, 3, 3, 1, 4, 1, 0, 3, 3, 0, 3, 2, 1, 3, 2, 1, 2, 3, 0, 4, 3, 3, 3, 3, 3, 1, 3, 0, 0, 3, 2, 2, 1, 2, 1, 3, 1, 4, 

10.0
tensor(1.0718, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 3, 2, 3, 3, 2, 3, 4, 3, 2, 4, 2, 1, 2, 3, 2, 1, 2, 2, 3, 1, 2, 1, 3, 3, 4, 3, 4, 0, 0, 3, 0, 4, 0, 4, 3, 0, 4, 3, 4, 4, 2, 0, 3, 2, 2, 3, 4, 4, 3]
 val_label: tensor([0, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4, 2, 1, 2, 2, 2, 1, 2, 3, 3, 1, 4, 1, 0,
        3, 4, 3, 4, 0, 0, 2, 4, 0, 3, 4, 3, 0, 0, 3, 4, 2, 0, 0, 3, 2, 2, 4, 2,
        4, 3], device='cuda:0')
10.0
tensor(1.1659, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 3, 2, 3, 4, 0, 2, 4, 4, 3, 2, 3, 3, 1, 4, 1, 2, 4, 4, 2, 3, 4, 3, 3, 3, 3, 3, 2, 4, 3, 1, 3, 3, 2, 1, 0, 4, 1, 4, 3, 3, 2, 4, 4, 3, 3, 4, 4, 2]
 val_label: tensor([3, 0, 4, 2, 3, 4, 0, 3, 4, 2, 3, 2, 3, 3, 1, 4, 1, 2, 3, 4, 2, 1, 3, 1,
        3, 1, 3, 3, 2, 4, 3, 1, 4, 3, 3, 0, 0, 4, 2, 1, 0, 0, 3, 3, 3, 2, 0, 3,
        1, 2], device='cuda:0')
10.0
tensor(1.1835, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 3, 4, 4, 2, 3, 4, 1, 4, 4, 2, 3, 4, 4, 4, 1, 3, 4, 4, 4

val_output: [2, 1, 3, 1, 0, 2, 4, 2, 1, 3, 4, 4, 4, 0, 3, 3, 2, 2, 4, 2, 2, 2, 2, 3, 2, 2, 4, 4, 3, 2, 3, 3, 2, 1, 4, 1, 2, 4, 4, 0, 3, 0, 0, 0, 2, 4, 4, 2, 4, 2]
 val_label: tensor([2, 2, 1, 1, 0, 2, 4, 2, 1, 3, 4, 3, 4, 0, 0, 3, 4, 2, 3, 1, 2, 2, 2, 3,
        2, 2, 3, 4, 0, 2, 3, 2, 2, 1, 4, 1, 1, 3, 4, 0, 0, 2, 0, 3, 2, 3, 4, 2,
        4, 3], device='cuda:0')
10.0
tensor(1.0437, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 2, 3, 2, 4, 4, 3, 1, 3, 2, 2, 3, 0, 0, 3, 3, 1, 3, 4, 3, 4, 4, 3, 4, 4, 1, 2, 3, 3, 4, 4, 4, 4, 2, 3, 4, 3, 3, 4, 4, 1, 1, 1, 1, 2, 2, 1, 4, 3, 4]
 val_label: tensor([3, 4, 1, 2, 3, 3, 4, 1, 1, 2, 0, 3, 0, 0, 3, 3, 1, 4, 4, 3, 4, 3, 1, 3,
        3, 1, 1, 3, 4, 0, 3, 4, 4, 2, 3, 4, 2, 1, 2, 4, 1, 3, 1, 1, 2, 4, 0, 4,
        2, 2], device='cuda:0')
10.0
tensor(1.2061, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 3, 2, 4, 3, 4, 3, 3, 4, 3, 1, 2, 4, 3, 3, 1, 3, 1, 1, 2, 1, 2, 2, 2, 3, 2, 2, 2, 3, 3, 2, 2, 0, 4, 1, 4, 4, 4, 3, 4, 4, 

val_output: [1, 3, 2, 3, 3, 0, 2, 3, 3, 4, 2, 2, 3, 4, 1, 3, 3, 2, 2, 2, 4, 4, 2, 2, 4, 1, 0, 4, 4, 4, 3, 3, 3, 4, 0, 2, 3, 3, 3, 3, 2, 4, 1, 3, 4, 1, 3, 1, 2, 3]
 val_label: tensor([3, 2, 2, 3, 0, 0, 2, 3, 2, 2, 2, 2, 3, 4, 0, 1, 4, 2, 2, 1, 0, 3, 2, 2,
        3, 0, 0, 4, 4, 3, 3, 2, 1, 4, 0, 4, 4, 4, 1, 4, 2, 1, 2, 0, 4, 1, 3, 3,
        4, 3], device='cuda:0')
10.0
tensor(1.1765, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 3, 3, 3, 3, 0, 0, 3, 1, 1, 4, 4, 3, 4, 3, 3, 0, 3, 4, 1, 4, 2, 2, 3, 2, 3, 0, 3, 4, 2, 3, 3, 2, 4, 2, 3, 4, 4, 3, 3, 3, 0, 3, 2, 2, 4, 3, 3, 3]
 val_label: tensor([1, 4, 1, 3, 1, 4, 0, 0, 1, 1, 1, 4, 4, 3, 1, 1, 2, 0, 4, 4, 1, 4, 2, 2,
        3, 0, 3, 0, 3, 0, 2, 4, 0, 2, 4, 2, 0, 4, 4, 3, 3, 3, 0, 4, 2, 2, 4, 1,
        3, 2], device='cuda:0')
10.0
tensor(1.0094, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 2, 0, 1, 3, 2, 2, 3, 2, 3, 1, 4, 0, 3, 4, 4, 4, 1, 2, 3, 3, 0, 3, 3, 1, 3, 1, 4, 3, 0, 1, 0, 4, 3, 0, 0, 2, 4, 2, 3, 3, 

tensor(1.0096, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [0, 4, 0, 3, 0, 1, 3, 3, 2, 3, 0, 3, 3, 4, 0, 3, 3, 3, 2, 0, 0, 4, 1, 3, 4, 3, 4, 3, 1, 1, 2, 1, 2, 2, 4, 1, 2, 3, 4, 4, 2, 1, 3, 3, 4, 3, 0, 3, 2, 1]
 val_label: tensor([0, 4, 3, 4, 0, 3, 3, 0, 0, 0, 0, 3, 3, 4, 0, 3, 2, 3, 2, 0, 0, 4, 1, 3,
        4, 1, 4, 1, 1, 2, 2, 0, 3, 2, 4, 0, 1, 3, 4, 4, 3, 1, 1, 3, 0, 1, 0, 2,
        2, 1], device='cuda:0')
10.0
tensor(1.0314, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 1, 3, 1, 4, 3, 2, 2, 2, 4, 2, 4, 4, 2, 3, 4, 3, 4, 3, 4, 4, 2, 2, 0, 3, 3, 3, 1, 4, 2, 3, 4, 3, 4, 1, 2, 2, 2, 3, 3, 4, 4, 3, 3, 4, 3, 2, 3, 2, 3]
 val_label: tensor([0, 1, 2, 1, 3, 2, 4, 1, 2, 4, 4, 4, 4, 0, 4, 3, 3, 4, 3, 3, 4, 0, 2, 0,
        4, 4, 3, 3, 3, 0, 1, 4, 2, 3, 1, 2, 2, 3, 3, 3, 3, 0, 3, 3, 4, 4, 3, 3,
        0, 4], device='cuda:0')
10.0
tensor(1.2132, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [0, 1, 4, 3, 1, 3, 4, 3, 4, 3, 4, 4, 4, 2, 2, 2, 4, 4, 2, 1, 3, 

val_output: [2, 4, 1, 2, 3, 2, 0, 2, 1, 4, 2, 3, 3, 4, 1, 0, 3, 1, 3, 3, 1, 2, 0, 2, 1, 2, 4, 2, 1, 2, 4, 3, 2, 0, 4, 0, 1, 1, 1, 3, 1, 1, 3, 3, 4, 3, 3, 1, 0, 3]
 val_label: tensor([2, 4, 1, 2, 3, 2, 0, 3, 1, 4, 0, 4, 4, 3, 1, 0, 3, 1, 2, 4, 0, 2, 0, 1,
        1, 2, 0, 3, 3, 2, 4, 1, 3, 0, 2, 0, 1, 1, 1, 4, 1, 0, 2, 2, 3, 3, 3, 0,
        0, 3], device='cuda:0')
10.0
tensor(0.9499, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 0, 2, 1, 1, 2, 4, 1, 1, 1, 3, 1, 0, 2, 2, 4, 3, 1, 1, 4, 4, 3, 3, 4, 3, 4, 4, 3, 3, 3, 3, 3, 1, 4, 2, 4, 1, 3, 0, 2, 3, 1, 2, 4, 2, 3, 2, 4, 3, 2]
 val_label: tensor([1, 0, 2, 1, 1, 2, 0, 3, 2, 0, 2, 3, 0, 2, 3, 0, 4, 1, 0, 3, 4, 3, 3, 1,
        1, 4, 4, 0, 1, 3, 0, 0, 1, 2, 2, 4, 0, 3, 0, 2, 3, 1, 2, 4, 2, 2, 2, 0,
        2, 1], device='cuda:0')
10.0
tensor(1.1455, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 4, 2, 2, 3, 3, 4, 3, 4, 1, 2, 4, 4, 4, 3, 3, 2, 3, 4, 3, 2, 1, 1, 4, 3, 2, 3, 1, 0, 4, 3, 1, 4, 1, 0, 2, 3, 3, 2, 3, 

tensor(1.0779, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 3, 3, 1, 3, 4, 4, 3, 3, 0, 3, 2, 2, 2, 0, 1, 3, 2, 3, 0, 2, 3, 3, 3, 3, 4, 2, 3, 2, 4, 1, 3, 3, 1, 3, 3, 4, 1, 3, 4, 2, 4, 0, 3, 2, 3, 1, 3, 4, 3]
 val_label: tensor([3, 3, 3, 3, 4, 2, 4, 0, 3, 0, 2, 0, 0, 2, 0, 1, 2, 2, 3, 0, 2, 3, 3, 3,
        1, 4, 3, 2, 2, 0, 1, 4, 4, 2, 2, 4, 4, 1, 3, 3, 0, 4, 0, 1, 0, 3, 1, 2,
        0, 3], device='cuda:0')
10.0
tensor(1.0503, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 2, 0, 3, 0, 2, 1, 3, 3, 4, 3, 4, 1, 3, 3, 2, 2, 1, 2, 3, 3, 2, 0, 4, 2, 4, 3, 1, 2, 1, 1, 3, 2, 0, 0, 2, 4, 0, 3, 3, 3, 3, 2, 0, 4, 3, 4, 4, 1, 3]
 val_label: tensor([2, 2, 0, 2, 0, 2, 1, 0, 4, 3, 3, 3, 1, 3, 1, 3, 1, 1, 2, 2, 0, 4, 3, 4,
        2, 4, 0, 1, 3, 2, 3, 2, 2, 0, 0, 3, 4, 0, 4, 4, 2, 1, 0, 0, 4, 3, 4, 4,
        1, 3], device='cuda:0')
10.0
tensor(1.1071, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 0, 3, 3, 3, 2, 0, 4, 3, 3, 3, 2, 3, 4, 2, 4, 4, 4, 3, 1, 4, 

val_output: [1, 4, 4, 3, 2, 2, 2, 3, 1, 3, 3, 3, 3, 3, 2, 3, 2, 1, 3, 1, 1, 3, 3, 1, 2, 3, 2, 2, 3, 2, 4, 3, 4, 3, 1, 1, 0, 2, 2, 0, 3, 2, 0, 3, 2, 4, 3, 3, 2, 2]
 val_label: tensor([1, 1, 1, 0, 2, 1, 1, 3, 3, 2, 2, 0, 4, 3, 3, 3, 2, 1, 2, 3, 1, 4, 4, 1,
        2, 3, 3, 2, 1, 1, 4, 0, 4, 1, 1, 1, 0, 2, 0, 0, 1, 3, 3, 2, 4, 4, 3, 0,
        3, 2], device='cuda:0')
10.0
tensor(1.3103, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 2, 2, 4, 4, 3, 2, 1, 4, 3, 3, 1, 3, 3, 2, 3, 2, 1, 3, 4, 3, 4, 4, 3, 3, 1, 2, 1, 1, 2, 2, 3, 1, 3, 3, 4, 4, 3, 2, 3, 3, 1, 4, 2, 3, 2, 0, 0, 3]
 val_label: tensor([1, 2, 2, 2, 3, 4, 3, 3, 1, 1, 4, 3, 1, 3, 4, 0, 3, 0, 1, 3, 4, 3, 1, 4,
        1, 3, 1, 2, 0, 1, 3, 3, 4, 1, 3, 2, 1, 3, 1, 2, 3, 4, 1, 2, 2, 3, 2, 0,
        0, 1], device='cuda:0')
10.0
tensor(0.9894, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 2, 4, 2, 4, 4, 1, 2, 1, 3, 2, 4, 1, 2, 4, 3, 1, 4, 2, 3, 4, 3, 4, 3, 1, 4, 1, 4, 1, 2, 2, 4, 3, 1, 2, 3, 3, 3, 4, 3, 0, 

val_output: [1, 3, 3, 2, 1, 0, 3, 2, 2, 3, 4, 3, 4, 3, 3, 4, 1, 4, 4, 0, 2, 4, 3, 0, 3, 3, 3, 4, 2, 3, 4, 4, 1, 3, 1, 3, 3, 3, 2, 4, 3, 3, 0, 2, 4, 2, 3, 4, 3, 4]
 val_label: tensor([3, 1, 3, 2, 1, 0, 1, 2, 2, 3, 0, 3, 2, 1, 0, 3, 1, 4, 3, 0, 2, 4, 4, 0,
        3, 0, 3, 4, 1, 3, 0, 4, 1, 2, 1, 3, 3, 3, 3, 4, 1, 3, 0, 1, 3, 2, 3, 4,
        4, 4], device='cuda:0')
10.0
tensor(0.9890, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 1, 0, 3, 3, 2, 1, 1, 3, 1, 4, 2, 4, 3, 3, 3, 4, 1, 4, 3, 4, 3, 3, 3, 0, 4, 4, 0, 2, 0, 2, 2, 4, 1, 1, 4, 4, 2, 3, 0, 2, 4, 3, 2, 3, 4, 0, 3, 1]
 val_label: tensor([4, 1, 1, 0, 2, 2, 3, 1, 2, 2, 1, 4, 2, 4, 3, 3, 3, 4, 0, 4, 1, 3, 3, 3,
        3, 0, 4, 4, 0, 4, 0, 4, 3, 3, 1, 1, 3, 2, 2, 3, 0, 4, 4, 3, 2, 0, 4, 0,
        1, 0], device='cuda:0')
10.0
tensor(0.9383, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 3, 4, 4, 3, 1, 1, 1, 2, 3, 3, 0, 2, 4, 2, 3, 3, 3, 0, 3, 4, 3, 2, 3, 0, 2, 0, 3, 4, 2, 1, 1, 2, 0, 2, 4, 2, 4, 3, 2, 2, 

tensor(1.3474, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 3, 1, 1, 2, 3, 4, 1, 1, 3, 2, 2, 2, 2, 4, 3, 2, 0, 4, 2, 4, 4, 3, 3, 4, 3, 4, 0, 1, 3, 0, 3, 2, 3, 1, 2, 4, 3, 1, 2, 1, 3, 4, 3, 4, 0, 2, 3, 4]
 val_label: tensor([4, 1, 3, 0, 3, 2, 2, 3, 1, 0, 1, 0, 3, 2, 0, 3, 1, 2, 3, 3, 2, 3, 3, 4,
        3, 3, 2, 4, 0, 1, 3, 0, 3, 2, 3, 0, 3, 3, 3, 1, 0, 1, 0, 4, 3, 0, 0, 2,
        0, 3], device='cuda:0')
10.0
tensor(1.1905, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 1, 0, 3, 1, 3, 0, 3, 2, 4, 4, 2, 3, 4, 2, 4, 1, 3, 1, 0, 3, 0, 1, 2, 4, 4, 2, 3, 3, 4, 2, 1, 1, 4, 1, 4, 3, 4, 3, 3, 2, 3, 2, 4, 4, 3, 1, 2, 4, 2]
 val_label: tensor([3, 0, 0, 3, 1, 3, 3, 3, 3, 4, 2, 4, 3, 3, 0, 3, 3, 3, 0, 0, 3, 0, 1, 3,
        4, 1, 0, 3, 4, 4, 4, 1, 1, 4, 1, 4, 2, 4, 3, 3, 2, 3, 2, 4, 3, 3, 1, 2,
        4, 2], device='cuda:0')
10.0
tensor(0.9538, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [0, 4, 1, 4, 4, 3, 3, 2, 4, 4, 1, 3, 2, 2, 3, 1, 2, 2, 3, 4, 2, 

tensor(1.2617, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 4, 3, 3, 1, 0, 3, 2, 3, 2, 3, 2, 4, 2, 1, 0, 2, 3, 2, 3, 0, 3, 1, 4, 3, 2, 3, 0, 4, 3, 3, 4, 3, 3, 0, 1, 3, 3, 1, 4, 3, 2, 3, 3, 2, 4, 2, 1, 1, 0]
 val_label: tensor([1, 3, 4, 3, 1, 3, 3, 3, 2, 1, 3, 4, 4, 2, 1, 1, 2, 2, 2, 4, 0, 3, 1, 1,
        1, 2, 1, 0, 4, 3, 4, 4, 4, 3, 0, 1, 3, 1, 1, 4, 1, 2, 0, 4, 2, 4, 3, 1,
        0, 2], device='cuda:0')
10.0
tensor(1.1371, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 1, 3, 3, 2, 4, 3, 1, 4, 3, 4, 2, 2, 1, 1, 4, 1, 1, 3, 4, 1, 1, 1, 2, 3, 3, 3, 4, 2, 3, 0, 4, 3, 2, 2, 3, 1, 2, 1, 2, 2, 4, 4, 3, 1, 2, 2, 3, 3, 2]
 val_label: tensor([1, 1, 3, 4, 4, 4, 1, 4, 3, 3, 2, 2, 2, 1, 1, 4, 1, 1, 1, 2, 1, 1, 0, 3,
        3, 0, 3, 1, 0, 3, 0, 2, 0, 2, 2, 3, 3, 2, 1, 2, 2, 1, 4, 4, 1, 3, 2, 1,
        4, 2], device='cuda:0')
10.0
tensor(1.2165, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 4, 4, 2, 3, 2, 4, 0, 4, 1, 3, 2, 0, 3, 4, 4, 3, 3, 4, 4, 3, 

val_output: [0, 0, 4, 2, 3, 4, 1, 3, 3, 4, 2, 2, 3, 2, 4, 1, 1, 3, 1, 3, 1, 2, 3, 0, 1, 4, 2, 1, 3, 0, 2, 2, 3, 3, 3, 4, 3, 3, 3, 4, 3, 3, 0, 3, 3, 3, 3, 4, 3, 0]
 val_label: tensor([0, 0, 4, 1, 2, 4, 1, 3, 1, 4, 4, 2, 3, 2, 1, 3, 1, 4, 1, 3, 1, 2, 3, 0,
        1, 4, 2, 1, 4, 0, 2, 2, 1, 0, 3, 2, 1, 3, 3, 4, 2, 3, 0, 4, 3, 3, 1, 4,
        2, 0], device='cuda:0')
10.0
tensor(0.9579, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 4, 1, 0, 3, 1, 4, 1, 3, 3, 2, 2, 1, 4, 2, 3, 4, 3, 2, 2, 3, 2, 3, 3, 3, 3, 3, 4, 2, 3, 2, 2, 4, 1, 3, 1, 2, 3, 4, 0, 3, 3, 3, 3, 3, 2, 4, 3, 2, 0]
 val_label: tensor([3, 1, 1, 0, 4, 1, 4, 1, 1, 2, 2, 2, 3, 4, 3, 2, 4, 3, 3, 2, 3, 2, 3, 3,
        0, 3, 4, 4, 2, 0, 4, 0, 4, 1, 1, 0, 2, 1, 4, 0, 2, 3, 3, 4, 4, 2, 2, 3,
        2, 0], device='cuda:0')
10.0
tensor(0.8686, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 1, 3, 2, 3, 4, 3, 3, 2, 2, 3, 1, 2, 4, 2, 3, 3, 3, 1, 3, 2, 3, 1, 3, 3, 2, 3, 3, 3, 3, 0, 0, 1, 2, 2, 4, 4, 3, 3, 1, 

val_output: [2, 2, 1, 3, 3, 3, 3, 2, 2, 1, 3, 0, 3, 3, 3, 4, 3, 2, 3, 3, 2, 0, 4, 3, 1, 0, 1, 3, 2, 3, 0, 3, 4, 2, 3, 1, 1, 2, 3, 2, 2, 3, 3, 2, 4, 3, 3, 3, 4, 1]
 val_label: tensor([3, 2, 1, 3, 1, 3, 4, 4, 3, 1, 3, 0, 0, 3, 3, 4, 3, 0, 4, 3, 2, 0, 4, 4,
        3, 0, 1, 0, 3, 0, 0, 2, 4, 2, 3, 1, 1, 2, 2, 2, 3, 1, 3, 2, 2, 3, 2, 0,
        4, 3], device='cuda:0')
10.0
tensor(1.1520, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 4, 2, 3, 1, 4, 0, 1, 3, 3, 3, 1, 3, 1, 2, 3, 4, 2, 4, 3, 4, 3, 2, 2, 1, 0, 2, 2, 2, 3, 2, 2, 1, 3, 2, 1, 4, 1, 1, 0, 3, 4, 1, 3, 2, 2, 2, 3, 3]
 val_label: tensor([3, 2, 4, 2, 4, 1, 2, 0, 1, 1, 1, 3, 1, 0, 1, 3, 2, 4, 2, 2, 3, 3, 3, 3,
        1, 0, 0, 3, 0, 3, 1, 2, 4, 1, 3, 3, 2, 3, 0, 1, 0, 3, 0, 1, 0, 2, 2, 3,
        1, 3], device='cuda:0')
10.0
tensor(1.5506, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 2, 2, 1, 0, 1, 4, 1, 2, 0, 4, 4, 3, 4, 1, 0, 2, 0, 1, 3, 2, 4, 4, 3, 2, 0, 3, 2, 2, 4, 2, 3, 4, 4, 3, 4, 4, 3, 3, 1, 0, 

val_output: [3, 1, 3, 4, 4, 3, 4, 4, 2, 2, 2, 3, 4, 3, 2, 4, 2, 0, 1, 3, 4, 4, 1, 4, 1, 3, 3, 3, 3, 0, 2, 4, 2, 4, 3, 4, 1, 1, 3, 1, 3, 2, 4, 4, 4, 3, 4, 1, 3, 3]
 val_label: tensor([3, 1, 4, 4, 4, 3, 1, 3, 3, 2, 2, 3, 2, 3, 1, 2, 0, 3, 1, 4, 4, 0, 1, 4,
        1, 3, 3, 1, 3, 0, 2, 2, 2, 0, 3, 3, 0, 2, 2, 1, 0, 2, 4, 3, 4, 3, 4, 1,
        3, 2], device='cuda:0')
10.0
tensor(1.0079, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [0, 3, 2, 1, 3, 1, 4, 3, 3, 4, 1, 3, 2, 3, 2, 4, 1, 3, 4, 1, 2, 4, 3, 3, 3, 4, 3, 2, 3, 4, 4, 4, 2, 3, 1, 2, 2, 2, 4, 2, 4, 4, 2, 2, 3, 1, 2, 2, 4, 3]
 val_label: tensor([0, 1, 2, 0, 4, 0, 3, 2, 3, 3, 1, 3, 1, 3, 4, 4, 1, 4, 0, 1, 2, 4, 4, 2,
        1, 4, 3, 4, 0, 4, 4, 4, 2, 2, 3, 2, 2, 2, 4, 0, 4, 3, 2, 1, 1, 1, 3, 1,
        4, 0], device='cuda:0')
10.0
tensor(1.0780, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [0, 1, 2, 4, 4, 1, 2, 3, 1, 1, 0, 3, 2, 4, 2, 2, 0, 3, 1, 4, 0, 2, 2, 0, 4, 1, 3, 4, 1, 1, 4, 3, 2, 1, 2, 1, 4, 4, 2, 2, 3, 

val_output: [2, 3, 1, 3, 3, 3, 4, 4, 3, 4, 1, 3, 2, 3, 1, 2, 3, 1, 3, 3, 1, 4, 3, 3, 3, 3, 4, 4, 3, 3, 3, 3, 1, 4, 4, 2, 3, 4, 4, 1, 0, 3, 3, 3, 3, 3, 4, 3, 3, 2]
 val_label: tensor([3, 3, 0, 3, 3, 2, 4, 4, 2, 2, 3, 3, 2, 1, 2, 3, 3, 0, 3, 3, 0, 2, 3, 3,
        3, 3, 4, 3, 3, 1, 2, 4, 1, 0, 4, 2, 4, 2, 4, 1, 3, 1, 1, 2, 3, 4, 3, 3,
        3, 4], device='cuda:0')
10.0
tensor(1.2861, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 4, 4, 4, 1, 3, 4, 3, 0, 1, 2, 3, 2, 2, 3, 0, 2, 4, 4, 0, 2, 4, 0, 1, 3, 4, 2, 3, 3, 4, 3, 4, 3, 2, 2, 2, 1, 4, 1, 3, 3, 2, 4, 2, 0, 0, 3, 1, 2]
 val_label: tensor([2, 2, 4, 4, 4, 3, 3, 4, 3, 0, 1, 3, 4, 2, 2, 3, 3, 0, 3, 4, 0, 2, 4, 0,
        0, 3, 4, 2, 0, 3, 4, 3, 4, 3, 3, 3, 2, 1, 4, 0, 4, 1, 2, 4, 2, 2, 3, 3,
        1, 1], device='cuda:0')
10.0
tensor(1.0248, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 2, 4, 4, 4, 3, 3, 3, 3, 1, 2, 3, 2, 2, 3, 2, 3, 3, 3, 4, 3, 3, 2, 1, 2, 2, 3, 4, 2, 4, 4, 2, 1, 4, 3, 2, 2, 4, 4, 4, 3, 

val_output: [4, 1, 1, 2, 3, 3, 4, 3, 4, 3, 4, 2, 3, 3, 0, 1, 4, 3, 3, 2, 4, 1, 1, 3, 3, 2, 3, 2, 4, 0, 1, 2, 3, 3, 4, 1, 3, 2, 1, 3, 0, 3, 1, 3, 2, 3, 4, 0, 3, 3]
 val_label: tensor([4, 0, 4, 3, 3, 3, 4, 3, 2, 1, 3, 2, 3, 3, 0, 1, 4, 3, 3, 2, 4, 1, 2, 0,
        3, 3, 3, 2, 4, 0, 0, 2, 0, 4, 4, 3, 3, 2, 3, 2, 3, 1, 2, 3, 3, 3, 4, 0,
        4, 3], device='cuda:0')
10.0
tensor(1.0105, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 3, 3, 3, 2, 3, 2, 3, 3, 0, 3, 4, 4, 4, 2, 3, 1, 2, 2, 1, 3, 0, 3, 4, 3, 2, 4, 2, 4, 4, 3, 3, 1, 3, 2, 3, 3, 2, 3, 2, 2, 2, 2, 2, 3, 4, 3, 1, 2, 3]
 val_label: tensor([3, 3, 3, 2, 2, 0, 3, 3, 3, 0, 4, 4, 3, 4, 3, 3, 3, 2, 3, 1, 1, 0, 3, 3,
        3, 2, 4, 2, 3, 3, 3, 3, 1, 1, 2, 3, 3, 1, 3, 3, 2, 3, 2, 2, 1, 4, 3, 3,
        0, 4], device='cuda:0')
10.0
tensor(1.0472, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [0, 1, 0, 4, 0, 3, 2, 1, 3, 3, 2, 3, 3, 2, 4, 3, 2, 2, 0, 3, 3, 2, 3, 0, 3, 0, 2, 1, 1, 2, 1, 4, 4, 3, 1, 2, 0, 1, 2, 2, 1, 

val_output: [3, 3, 4, 3, 3, 0, 2, 4, 3, 3, 0, 3, 1, 3, 4, 3, 1, 2, 2, 3, 3, 1, 2, 3, 1, 1, 2, 0, 2, 3, 4, 3, 2, 1, 1, 4, 1, 0, 3, 3, 3, 4, 0, 3, 1, 4, 4, 3, 2, 1]
 val_label: tensor([1, 1, 4, 3, 0, 0, 1, 4, 4, 3, 0, 3, 3, 1, 4, 3, 1, 4, 2, 3, 2, 1, 3, 3,
        0, 3, 2, 0, 2, 3, 3, 4, 2, 0, 1, 3, 0, 3, 3, 2, 0, 4, 0, 1, 0, 4, 4, 0,
        0, 1], device='cuda:0')
10.0
tensor(1.1392, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 4, 1, 3, 2, 4, 0, 3, 2, 4, 3, 4, 3, 4, 4, 2, 2, 3, 3, 3, 4, 4, 1, 3, 4, 0, 4, 3, 3, 3, 4, 1, 3, 1, 3, 2, 0, 3, 3, 3, 3, 2, 4, 1, 3, 4, 2, 2, 4, 2]
 val_label: tensor([2, 3, 1, 3, 0, 3, 0, 3, 2, 4, 3, 3, 3, 4, 3, 2, 2, 1, 3, 0, 4, 4, 1, 4,
        0, 3, 4, 3, 3, 3, 1, 2, 2, 3, 3, 2, 0, 3, 3, 0, 2, 3, 4, 1, 0, 1, 2, 2,
        4, 2], device='cuda:0')
10.0
tensor(1.1574, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 3, 0, 2, 0, 3, 3, 0, 3, 0, 2, 3, 3, 3, 4, 4, 0, 3, 1, 2, 3, 3, 4, 2, 2, 3, 2, 1, 1, 4, 3, 2, 4, 1, 2, 2, 1, 3, 1, 3, 3, 

val_output: [4, 3, 3, 4, 2, 0, 0, 3, 1, 1, 3, 3, 2, 3, 3, 3, 3, 1, 2, 3, 2, 2, 3, 4, 2, 0, 3, 3, 3, 3, 3, 3, 3, 1, 4, 4, 3, 2, 1, 3, 2, 4, 0, 0, 4, 3, 2, 2, 1, 3]
 val_label: tensor([4, 3, 3, 4, 1, 3, 0, 3, 2, 2, 1, 4, 3, 3, 4, 3, 3, 1, 2, 1, 2, 1, 3, 4,
        2, 0, 3, 4, 0, 3, 3, 2, 3, 1, 3, 4, 4, 1, 0, 4, 2, 4, 0, 0, 0, 3, 2, 2,
        1, 1], device='cuda:0')
10.0
tensor(1.1351, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [0, 1, 2, 2, 3, 1, 0, 1, 3, 3, 4, 4, 3, 3, 3, 3, 4, 4, 3, 0, 3, 1, 4, 2, 4, 2, 2, 2, 1, 3, 4, 2, 2, 3, 1, 4, 4, 4, 1, 3, 3, 3, 2, 4, 4, 2, 3, 1, 3, 2]
 val_label: tensor([1, 1, 2, 3, 1, 3, 4, 1, 3, 1, 3, 1, 1, 4, 1, 3, 0, 3, 2, 0, 3, 1, 4, 2,
        3, 2, 3, 2, 1, 2, 4, 3, 4, 0, 3, 4, 1, 4, 1, 3, 3, 2, 2, 4, 4, 0, 3, 3,
        3, 0], device='cuda:0')
10.0
tensor(1.1747, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 0, 4, 0, 1, 1, 2, 4, 4, 4, 2, 0, 3, 2, 4, 2, 2, 2, 3, 3, 2, 4, 4, 3, 3, 2, 2, 0, 2, 3, 1, 0, 3, 4, 3, 4, 1, 1, 3, 3, 3, 

val_output: [0, 2, 3, 3, 2, 4, 4, 1, 1, 4, 3, 1, 4, 2, 1, 4, 3, 3, 1, 2, 1, 2, 3, 4, 4, 2, 1, 4, 3, 4, 4, 2, 3, 2, 4, 1, 4, 2, 3, 3, 2, 3, 3, 3, 2, 0, 2, 4, 4, 3]
 val_label: tensor([3, 4, 3, 1, 1, 3, 4, 1, 1, 4, 3, 0, 4, 2, 1, 4, 3, 3, 3, 2, 1, 4, 1, 3,
        3, 2, 3, 4, 2, 3, 4, 2, 3, 2, 2, 1, 4, 2, 3, 3, 1, 3, 2, 0, 2, 0, 4, 0,
        4, 1], device='cuda:0')
10.0
tensor(0.9457, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 4, 3, 3, 3, 1, 2, 3, 1, 3, 3, 3, 3, 3, 1, 3, 2, 4, 2, 3, 4, 2, 3, 3, 3, 4, 3, 3, 3, 0, 3, 4, 4, 4, 2, 1, 3, 1, 2, 1, 2, 1, 3, 3, 2, 3, 4, 2, 3, 4]
 val_label: tensor([2, 1, 3, 4, 3, 3, 4, 3, 3, 3, 3, 3, 4, 2, 1, 2, 2, 3, 2, 0, 4, 0, 2, 4,
        3, 4, 2, 4, 0, 0, 3, 3, 0, 4, 3, 1, 3, 0, 3, 1, 0, 1, 0, 3, 2, 0, 0, 3,
        3, 4], device='cuda:0')
10.0
tensor(1.1390, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 0, 4, 3, 4, 3, 4, 3, 4, 2, 2, 3, 4, 0, 3, 2, 1, 4, 2, 1, 3, 3, 4, 4, 4, 1, 3, 0, 4, 3, 3, 4, 1, 3, 3, 2, 3, 2, 2, 2, 

val_output: [4, 2, 4, 3, 0, 3, 4, 4, 3, 4, 2, 3, 1, 2, 4, 3, 3, 3, 2, 3, 3, 4, 3, 2, 3, 3, 1, 3, 1, 3, 3, 3, 3, 3, 3, 4, 2, 0, 3, 3, 4, 3, 4, 3, 1, 4, 3, 2, 0, 1]
 val_label: tensor([4, 0, 4, 0, 0, 3, 4, 4, 1, 4, 0, 3, 1, 2, 4, 4, 1, 1, 4, 4, 3, 3, 3, 1,
        3, 3, 1, 3, 1, 4, 1, 4, 1, 2, 1, 4, 2, 1, 2, 0, 4, 1, 1, 4, 1, 4, 2, 2,
        0, 1], device='cuda:0')
10.0
tensor(1.2101, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 1, 3, 3, 0, 2, 2, 1, 4, 4, 1, 4, 1, 3, 4, 1, 4, 4, 3, 4, 3, 2, 4, 0, 4, 3, 2, 2, 2, 3, 2, 4, 2, 1, 1, 1, 4, 3, 3, 1, 2, 1, 2, 1, 1, 4, 2, 0, 3, 4]
 val_label: tensor([1, 0, 3, 3, 0, 3, 3, 1, 4, 4, 2, 4, 1, 3, 4, 1, 4, 3, 3, 4, 3, 2, 4, 0,
        3, 3, 4, 0, 1, 4, 2, 4, 2, 0, 3, 1, 3, 4, 1, 1, 2, 0, 2, 0, 1, 2, 4, 0,
        3, 4], device='cuda:0')
10.0
tensor(1.1040, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 3, 2, 3, 3, 1, 3, 2, 4, 2, 2, 2, 4, 1, 2, 1, 3, 4, 4, 3, 0, 4, 3, 4, 1, 1, 0, 3, 4, 1, 1, 3, 4, 3, 4, 3, 4, 3, 2, 1, 4, 

val_output: [2, 2, 4, 2, 2, 1, 3, 4, 0, 1, 3, 3, 4, 2, 2, 3, 1, 1, 3, 4, 2, 4, 1, 2, 1, 3, 2, 4, 1, 4, 2, 2, 4, 3, 4, 3, 2, 4, 1, 4, 4, 2, 2, 3, 3, 0, 3, 1, 3, 2]
 val_label: tensor([2, 3, 0, 2, 2, 3, 3, 4, 0, 0, 4, 3, 3, 2, 2, 2, 1, 1, 2, 4, 2, 3, 2, 1,
        0, 1, 2, 0, 0, 2, 2, 2, 4, 3, 4, 4, 3, 3, 1, 0, 3, 2, 2, 0, 0, 2, 3, 1,
        3, 0], device='cuda:0')
10.0
tensor(1.1919, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 0, 3, 3, 4, 4, 1, 3, 4, 1, 3, 1, 3, 3, 2, 1, 2, 4, 4, 3, 2, 2, 2, 3, 1, 1, 3, 2, 4, 4, 3, 3, 0, 2, 3, 4, 3, 4, 3, 3, 4, 4, 1, 3, 2, 3, 3, 2, 1, 3]
 val_label: tensor([0, 0, 1, 3, 4, 4, 1, 3, 4, 1, 3, 1, 3, 2, 2, 1, 2, 3, 2, 0, 2, 2, 2, 3,
        1, 1, 3, 4, 3, 4, 0, 3, 0, 2, 0, 2, 2, 3, 1, 0, 4, 4, 0, 3, 0, 1, 3, 0,
        0, 2], device='cuda:0')
10.0
tensor(1.1052, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 1, 2, 3, 0, 2, 1, 3, 3, 2, 3, 3, 3, 4, 4, 3, 3, 3, 4, 2, 3, 4, 4, 2, 1, 1, 3, 3, 2, 3, 3, 2, 4, 4, 3, 3, 4, 3, 4, 2, 3, 

val_output: [4, 2, 4, 1, 3, 4, 4, 3, 3, 2, 3, 3, 1, 1, 1, 2, 4, 4, 4, 1, 0, 0, 2, 4, 4, 2, 2, 4, 3, 1, 3, 1, 4, 1, 3, 3, 1, 2, 2, 3, 1, 4, 4, 3, 0, 1, 1, 4, 2, 1]
 val_label: tensor([3, 3, 4, 1, 1, 1, 1, 3, 3, 1, 1, 3, 2, 1, 1, 2, 4, 0, 0, 4, 0, 0, 2, 4,
        1, 2, 4, 0, 3, 0, 1, 1, 4, 1, 1, 3, 1, 2, 1, 4, 1, 4, 4, 2, 0, 3, 1, 0,
        4, 1], device='cuda:0')
10.0
tensor(1.1668, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [0, 0, 1, 1, 4, 3, 4, 1, 4, 0, 2, 1, 0, 4, 2, 3, 4, 0, 1, 3, 1, 3, 2, 3, 3, 0, 0, 3, 3, 3, 3, 3, 3, 4, 4, 4, 2, 3, 3, 4, 1, 1, 4, 0, 0, 2, 1, 2, 4, 2]
 val_label: tensor([2, 0, 1, 2, 4, 3, 4, 1, 4, 0, 2, 1, 0, 1, 2, 3, 4, 0, 1, 1, 2, 3, 3, 3,
        3, 0, 0, 1, 2, 3, 2, 1, 3, 4, 3, 3, 3, 0, 4, 4, 1, 1, 1, 0, 0, 4, 1, 2,
        3, 2], device='cuda:0')
10.0
tensor(0.9581, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 4, 2, 2, 4, 2, 1, 2, 3, 3, 4, 4, 0, 2, 4, 1, 1, 0, 3, 1, 2, 4, 3, 3, 1, 4, 3, 3, 2, 4, 3, 4, 1, 1, 3, 3, 4, 1, 2, 3, 4, 

val_output: [2, 4, 2, 2, 4, 0, 3, 3, 2, 0, 1, 3, 2, 1, 4, 4, 3, 2, 2, 1, 4, 2, 1, 0, 3, 3, 2, 1, 0, 4, 2, 4, 4, 3, 3, 3, 3, 2, 3, 2, 2, 1, 2, 3, 1, 4, 1, 4, 1, 3]
 val_label: tensor([2, 4, 2, 2, 4, 0, 3, 4, 2, 0, 1, 0, 3, 4, 4, 4, 3, 2, 2, 1, 4, 4, 1, 0,
        1, 3, 2, 1, 1, 4, 1, 4, 4, 3, 3, 3, 1, 3, 3, 2, 2, 1, 0, 3, 1, 3, 1, 3,
        3, 3], device='cuda:0')
10.0
tensor(0.9592, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 2, 4, 0, 3, 4, 4, 4, 4, 3, 2, 0, 4, 2, 4, 2, 4, 3, 1, 3, 3, 2, 2, 3, 3, 4, 4, 3, 3, 2, 4, 0, 3, 4, 2, 4, 3, 1, 2, 2, 3, 1, 4, 4, 2, 3, 3, 0, 0, 2]
 val_label: tensor([1, 2, 4, 0, 2, 4, 4, 4, 4, 3, 3, 0, 4, 3, 4, 0, 4, 1, 2, 3, 3, 3, 2, 3,
        3, 3, 4, 3, 4, 3, 3, 0, 1, 3, 2, 4, 3, 1, 2, 3, 2, 1, 2, 4, 1, 4, 0, 0,
        1, 1], device='cuda:0')
10.0
tensor(1.1338, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 3, 1, 1, 3, 3, 4, 2, 3, 4, 4, 4, 2, 3, 1, 2, 4, 2, 4, 1, 2, 3, 4, 3, 3, 1, 4, 2, 3, 3, 0, 3, 3, 4, 4, 4, 2, 1, 3, 2, 4, 

val_output: [3, 2, 2, 4, 4, 2, 3, 2, 3, 2, 0, 3, 2, 4, 3, 1, 1, 3, 2, 3, 1, 4, 0, 4, 2, 0, 4, 3, 3, 4, 4, 3, 4, 1, 2, 2, 1, 4, 2, 4, 1, 4, 2, 3, 1, 2, 0, 4, 4, 0]
 val_label: tensor([4, 3, 2, 4, 4, 3, 4, 0, 4, 2, 0, 1, 1, 4, 1, 1, 1, 3, 2, 4, 1, 2, 0, 1,
        4, 0, 4, 2, 3, 0, 3, 3, 0, 0, 2, 4, 1, 4, 2, 4, 2, 4, 2, 1, 0, 2, 3, 4,
        4, 0], device='cuda:0')
10.0
tensor(1.1030, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 4, 4, 0, 2, 3, 4, 4, 3, 1, 4, 1, 1, 3, 4, 1, 4, 4, 0, 4, 2, 3, 3, 3, 1, 1, 3, 3, 2, 2, 3, 2, 2, 0, 3, 4, 3, 2, 1, 1, 3, 2, 3, 3, 2, 2, 1, 3, 1, 4]
 val_label: tensor([4, 4, 4, 3, 0, 1, 4, 3, 3, 1, 4, 3, 0, 3, 4, 1, 4, 0, 0, 4, 3, 3, 0, 2,
        2, 0, 0, 3, 2, 2, 2, 1, 2, 0, 3, 0, 3, 2, 1, 1, 0, 0, 3, 3, 2, 2, 1, 2,
        1, 4], device='cuda:0')
10.0
tensor(1.2521, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 3, 4, 4, 3, 3, 2, 4, 2, 4, 3, 2, 1, 2, 1, 3, 4, 4, 2, 2, 3, 2, 3, 3, 2, 2, 4, 4, 0, 3, 3, 4, 2, 2, 3, 2, 3, 2, 3, 3, 1, 

val_output: [4, 2, 4, 3, 2, 1, 2, 1, 3, 3, 4, 4, 3, 4, 2, 2, 4, 1, 2, 4, 0, 2, 1, 4, 4, 4, 3, 0, 4, 3, 1, 2, 0, 3, 3, 4, 3, 4, 1, 3, 3, 3, 0, 4, 3, 3, 3, 4, 1, 4]
 val_label: tensor([4, 3, 3, 1, 3, 1, 4, 1, 4, 0, 4, 4, 0, 4, 3, 4, 4, 1, 2, 3, 0, 0, 0, 4,
        4, 4, 2, 0, 3, 3, 1, 2, 0, 4, 3, 3, 4, 4, 1, 3, 0, 3, 0, 3, 0, 1, 3, 0,
        1, 4], device='cuda:0')
10.0
tensor(0.9713, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 3, 4, 4, 4, 1, 4, 3, 4, 1, 3, 0, 0, 4, 2, 2, 3, 2, 2, 1, 2, 3, 1, 0, 2, 1, 2, 4, 3, 3, 4, 4, 4, 1, 3, 1, 2, 0, 2, 3, 3, 3, 4, 3, 2, 4, 4, 1, 2, 1]
 val_label: tensor([2, 3, 4, 4, 4, 2, 4, 1, 3, 1, 3, 3, 0, 0, 2, 2, 0, 3, 3, 3, 2, 2, 1, 1,
        2, 0, 0, 4, 1, 1, 4, 4, 2, 1, 0, 2, 1, 0, 2, 2, 1, 1, 3, 3, 3, 1, 1, 1,
        2, 1], device='cuda:0')
10.0
tensor(1.2081, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 1, 3, 3, 2, 3, 4, 2, 0, 2, 3, 4, 3, 3, 4, 4, 3, 4, 4, 4, 1, 3, 3, 3, 0, 4, 4, 3, 1, 3, 2, 0, 4, 0, 0, 2, 3, 3, 3, 4, 0, 

val_output: [3, 1, 0, 3, 3, 2, 1, 1, 2, 1, 3, 2, 3, 1, 0, 4, 3, 4, 3, 3, 2, 4, 3, 0, 3, 4, 2, 0, 2, 3, 1, 3, 4, 2, 3, 0, 3, 2, 4, 2, 2, 3, 2, 2, 3, 3, 4, 0, 4, 3]
 val_label: tensor([3, 2, 0, 3, 2, 3, 1, 0, 2, 1, 1, 2, 0, 3, 0, 4, 3, 4, 1, 1, 2, 1, 4, 0,
        3, 4, 1, 0, 4, 3, 1, 2, 4, 2, 3, 0, 2, 2, 4, 2, 1, 2, 3, 2, 4, 4, 4, 1,
        1, 0], device='cuda:0')
10.0
tensor(1.1449, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 3, 3, 3, 3, 4, 2, 1, 2, 3, 1, 3, 2, 2, 4, 2, 2, 3, 4, 3, 4, 3, 2, 4, 3, 4, 3, 1, 4, 2, 2, 4, 3, 4, 4, 4, 3, 1, 0, 3, 0, 4, 3, 3, 3, 4, 4, 0, 2, 0]
 val_label: tensor([1, 3, 3, 3, 3, 4, 2, 1, 2, 3, 0, 3, 3, 1, 2, 2, 3, 4, 4, 2, 3, 4, 0, 4,
        1, 4, 2, 3, 4, 2, 3, 4, 1, 3, 4, 3, 2, 1, 0, 4, 0, 3, 0, 4, 1, 4, 3, 0,
        1, 0], device='cuda:0')
10.0
tensor(1.1779, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 1, 3, 2, 1, 4, 1, 4, 3, 2, 2, 3, 1, 3, 3, 3, 4, 1, 3, 1, 4, 4, 4, 2, 3, 4, 4, 3, 4, 2, 3, 1, 3, 3, 3, 3, 4, 2, 1, 3, 1, 

val_output: [1, 3, 0, 3, 3, 2, 4, 1, 0, 3, 2, 2, 3, 1, 1, 1, 0, 1, 4, 3, 2, 4, 2, 3, 0, 4, 3, 3, 0, 2, 2, 3, 2, 4, 3, 2, 0, 1, 1, 4, 3, 4, 0, 4, 2, 3, 4, 3, 3, 2]
 val_label: tensor([1, 4, 0, 3, 3, 1, 4, 0, 0, 3, 3, 2, 2, 1, 1, 3, 1, 1, 4, 3, 2, 3, 4, 2,
        0, 0, 3, 4, 1, 2, 2, 3, 3, 4, 1, 2, 0, 1, 1, 4, 0, 1, 3, 4, 2, 3, 2, 1,
        0, 2], device='cuda:0')
10.0
tensor(1.0254, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 1, 4, 3, 4, 1, 2, 3, 3, 3, 4, 2, 4, 3, 3, 1, 3, 1, 4, 2, 3, 3, 2, 4, 3, 3, 4, 4, 3, 1, 4, 1, 4, 0, 2, 3, 1, 4, 3, 4, 2, 4, 3, 1, 4, 4, 2, 1, 2, 3]
 val_label: tensor([1, 3, 3, 4, 0, 1, 2, 1, 4, 1, 3, 2, 4, 1, 4, 2, 4, 1, 3, 2, 3, 3, 3, 4,
        3, 0, 2, 4, 4, 1, 4, 1, 4, 0, 4, 3, 1, 4, 3, 3, 2, 4, 1, 3, 4, 1, 2, 0,
        2, 3], device='cuda:0')
10.0
tensor(1.0264, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 2, 0, 3, 2, 2, 3, 2, 2, 1, 3, 1, 0, 1, 1, 4, 3, 0, 3, 4, 4, 4, 3, 3, 3, 1, 0, 3, 0, 0, 0, 4, 4, 2, 3, 2, 2, 1, 4, 1, 0, 

val_output: [3, 4, 2, 3, 4, 3, 0, 1, 4, 4, 4, 2, 4, 0, 1, 1, 1, 2, 2, 2, 0, 4, 2, 4, 3, 4, 4, 2, 1, 3, 3, 4, 3, 3, 3, 3, 4, 3, 4, 2, 3, 3, 3, 3, 3, 1, 1, 0, 4, 3]
 val_label: tensor([4, 4, 2, 3, 4, 0, 0, 0, 4, 4, 4, 1, 4, 0, 1, 4, 1, 2, 3, 2, 0, 4, 1, 4,
        3, 3, 3, 2, 2, 4, 2, 4, 3, 4, 2, 3, 4, 3, 3, 2, 2, 3, 3, 3, 2, 1, 1, 0,
        4, 3], device='cuda:0')
10.0
tensor(0.9361, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 4, 3, 0, 1, 1, 2, 2, 3, 3, 2, 4, 4, 1, 3, 3, 2, 4, 0, 2, 3, 2, 4, 3, 1, 2, 3, 0, 4, 3, 3, 3, 2, 4, 1, 3, 3, 3, 3, 3, 2, 3, 3, 3, 3, 2, 4, 2, 3, 3]
 val_label: tensor([4, 4, 3, 0, 1, 1, 2, 4, 3, 4, 2, 4, 4, 1, 1, 3, 2, 3, 0, 4, 4, 2, 1, 2,
        1, 3, 0, 0, 3, 1, 1, 1, 1, 3, 1, 1, 3, 2, 2, 4, 3, 3, 3, 2, 4, 0, 4, 0,
        3, 0], device='cuda:0')
10.0
tensor(1.2503, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 1, 2, 0, 0, 4, 3, 3, 4, 1, 4, 0, 4, 4, 0, 3, 2, 2, 3, 3, 3, 0, 3, 3, 3, 3, 3, 4, 3, 4, 3, 4, 3, 3, 3, 4, 4, 3, 3, 1, 2, 

val_output: [3, 4, 1, 1, 3, 2, 3, 4, 2, 2, 4, 3, 4, 2, 4, 1, 1, 3, 4, 1, 1, 0, 0, 0, 4, 4, 4, 3, 2, 1, 3, 0, 3, 3, 4, 3, 4, 3, 4, 2, 1, 1, 3, 4, 3, 2, 2, 4, 3, 3]
 val_label: tensor([0, 3, 1, 0, 3, 0, 1, 4, 2, 4, 3, 3, 4, 3, 4, 1, 0, 3, 4, 0, 0, 0, 0, 0,
        3, 4, 4, 3, 0, 3, 3, 0, 1, 0, 0, 1, 4, 3, 4, 2, 3, 0, 3, 3, 3, 3, 2, 4,
        2, 3], device='cuda:0')
10.0
tensor(1.0327, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 4, 2, 3, 2, 2, 4, 4, 2, 2, 3, 3, 0, 3, 3, 0, 3, 3, 4, 4, 3, 4, 3, 4, 3, 2, 3, 4, 4, 3, 1, 3, 3, 3, 3, 4, 4, 3, 0, 1, 4, 4, 3, 3, 2, 2, 2, 4, 3, 3]
 val_label: tensor([0, 4, 1, 3, 2, 2, 4, 0, 2, 1, 3, 3, 0, 1, 3, 0, 4, 4, 4, 2, 3, 4, 3, 4,
        2, 3, 4, 4, 3, 3, 1, 1, 2, 0, 1, 4, 4, 4, 0, 1, 4, 4, 3, 0, 2, 2, 2, 4,
        3, 1], device='cuda:0')
10.0
tensor(0.8649, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 3, 3, 1, 3, 2, 4, 4, 1, 3, 3, 4, 3, 3, 1, 1, 2, 3, 2, 2, 4, 3, 1, 3, 3, 1, 3, 3, 4, 4, 3, 1, 3, 3, 0, 4, 3, 4, 3, 2, 3, 

val_output: [2, 4, 2, 3, 2, 1, 0, 4, 2, 4, 4, 4, 3, 1, 4, 1, 2, 4, 2, 4, 3, 2, 3, 4, 4, 4, 1, 4, 3, 3, 3, 4, 4, 4, 3, 2, 4, 4, 4, 1, 4, 0, 3, 4, 3, 2, 1, 4, 2, 3]
 val_label: tensor([2, 3, 2, 0, 2, 1, 3, 4, 0, 4, 4, 2, 4, 3, 4, 3, 1, 0, 2, 4, 3, 2, 3, 3,
        4, 4, 1, 4, 0, 3, 0, 3, 3, 4, 1, 1, 4, 2, 4, 1, 3, 0, 3, 2, 2, 2, 1, 3,
        3, 4], device='cuda:0')
10.0
tensor(1.1696, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 2, 2, 3, 4, 3, 3, 3, 4, 3, 3, 2, 2, 3, 3, 3, 2, 3, 3, 3, 4, 1, 4, 3, 1, 2, 4, 3, 3, 1, 4, 2, 3, 4, 2, 3, 3, 2, 1, 3, 3, 4, 3, 1, 3, 2, 2, 2, 1, 1]
 val_label: tensor([1, 2, 1, 1, 4, 3, 3, 3, 4, 0, 3, 2, 2, 4, 4, 3, 2, 3, 3, 2, 3, 1, 4, 3,
        1, 2, 4, 3, 3, 1, 2, 1, 3, 4, 3, 3, 3, 4, 1, 4, 1, 4, 0, 0, 1, 0, 2, 2,
        2, 0], device='cuda:0')
8.0
tensor(1.0443, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 2, 3, 0, 2, 3, 0, 3, 3, 3, 3, 4, 1, 1, 4, 4, 3, 2, 2, 4, 4, 3, 4, 4, 2, 2, 3, 3, 4, 4, 1, 3, 3, 1, 3, 4, 1, 0, 0, 1, 3

val_output: [3, 4, 1, 3, 1, 3, 3, 3, 4, 3, 4, 3, 3, 3, 1, 3, 3, 4, 2, 3, 2, 2, 2, 3, 3, 2, 4, 0, 3, 4, 2, 3, 4, 1, 3, 3, 1, 0, 2, 4, 4, 3, 3, 3, 3, 2, 0, 3, 3, 4]
 val_label: tensor([0, 2, 1, 1, 1, 3, 3, 3, 0, 3, 4, 2, 4, 4, 1, 3, 0, 4, 3, 4, 3, 2, 2, 0,
        4, 0, 1, 0, 3, 2, 4, 3, 4, 1, 3, 3, 1, 0, 2, 3, 4, 3, 3, 0, 3, 2, 1, 3,
        3, 4], device='cuda:0')
10.0
tensor(1.1410, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 4, 2, 3, 2, 3, 4, 0, 4, 2, 3, 2, 1, 2, 1, 1, 1, 2, 3, 2, 3, 4, 3, 3, 3, 4, 3, 4, 3, 4, 2, 2, 2, 3, 4, 4, 3, 2, 1, 4, 1, 4, 2, 0, 3, 3, 3, 3, 3, 3]
 val_label: tensor([4, 4, 2, 3, 4, 4, 4, 0, 4, 2, 3, 2, 0, 2, 1, 1, 3, 2, 0, 2, 0, 4, 2, 3,
        4, 4, 3, 3, 2, 3, 2, 3, 2, 1, 2, 3, 1, 2, 1, 3, 3, 4, 2, 0, 2, 0, 0, 3,
        3, 3], device='cuda:0')
10.0
tensor(1.0147, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 3, 4, 1, 4, 0, 4, 4, 0, 0, 3, 0, 2, 0, 3, 4, 3, 3, 3, 4, 2, 4, 1, 1, 2, 3, 4, 1, 2, 2, 3, 4, 3, 4, 2, 3, 3, 2, 0, 3, 2, 

val_output: [1, 2, 2, 4, 3, 4, 3, 4, 4, 2, 1, 1, 3, 3, 3, 2, 4, 3, 4, 4, 4, 4, 4, 2, 1, 2, 4, 4, 3, 3, 4, 4, 3, 0, 1, 3, 3, 2, 1, 3, 3, 2, 2, 3, 0, 1, 2, 2, 2, 1]
 val_label: tensor([1, 2, 2, 3, 3, 2, 2, 4, 3, 4, 1, 3, 2, 3, 1, 2, 4, 4, 4, 2, 4, 4, 4, 2,
        1, 2, 4, 4, 3, 3, 4, 0, 1, 0, 2, 3, 0, 2, 1, 3, 3, 1, 0, 3, 0, 3, 3, 3,
        2, 1], device='cuda:0')
10.0
tensor(1.0951, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 1, 2, 4, 3, 1, 3, 3, 4, 4, 3, 3, 1, 2, 2, 2, 0, 3, 2, 4, 1, 2, 2, 3, 2, 1, 2, 3, 1, 4, 0, 1, 3, 3, 3, 2, 3, 2, 4, 2, 4, 3, 2, 2, 2, 4, 4, 2, 4, 3]
 val_label: tensor([3, 0, 0, 0, 1, 2, 2, 1, 4, 4, 1, 3, 1, 2, 2, 3, 0, 3, 2, 2, 1, 2, 3, 1,
        2, 1, 4, 3, 1, 4, 3, 1, 0, 3, 3, 2, 1, 2, 3, 2, 4, 2, 4, 3, 3, 4, 4, 2,
        3, 3], device='cuda:0')
10.0
tensor(1.0528, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 2, 4, 2, 1, 3, 4, 4, 2, 3, 2, 0, 2, 2, 3, 3, 3, 2, 2, 1, 2, 1, 2, 3, 3, 2, 4, 4, 4, 4, 4, 4, 2, 4, 4, 0, 4, 3, 1, 2, 

val_output: [1, 3, 4, 4, 0, 4, 3, 4, 4, 4, 1, 3, 3, 1, 2, 3, 3, 4, 3, 4, 4, 2, 3, 4, 2, 4, 3, 3, 3, 3, 3, 3, 2, 4, 4, 2, 0, 4, 4, 1, 1, 2, 1, 0, 3, 4, 3, 3, 1, 2]
 val_label: tensor([1, 3, 4, 3, 0, 4, 3, 4, 4, 4, 1, 3, 3, 1, 2, 3, 3, 3, 2, 4, 4, 2, 3, 4,
        2, 4, 2, 3, 3, 3, 2, 3, 4, 4, 1, 4, 3, 0, 4, 0, 3, 2, 2, 2, 1, 4, 2, 1,
        2, 2], device='cuda:0')
10.0
tensor(0.9405, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 4, 3, 4, 3, 4, 3, 3, 1, 3, 4, 1, 1, 4, 4, 4, 3, 3, 1, 2, 3, 4, 4, 4, 1, 3, 4, 3, 1, 4, 4, 0, 4, 4, 2, 2, 2, 4, 4, 1, 3, 4, 3, 1, 1, 1, 1, 1, 2, 3]
 val_label: tensor([1, 4, 2, 4, 4, 4, 0, 3, 1, 3, 0, 3, 1, 0, 4, 4, 0, 3, 1, 2, 3, 4, 4, 3,
        1, 4, 4, 3, 1, 4, 4, 0, 1, 4, 2, 2, 2, 0, 2, 1, 3, 4, 3, 3, 1, 1, 1, 1,
        2, 1], device='cuda:0')
10.0
tensor(0.8012, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 2, 3, 2, 4, 2, 3, 1, 4, 3, 2, 4, 2, 3, 4, 3, 3, 4, 2, 2, 2, 0, 0, 4, 1, 4, 2, 1, 2, 4, 3, 2, 1, 4, 2, 2, 1, 2, 2, 0, 

val_output: [3, 3, 2, 4, 3, 3, 1, 3, 0, 4, 3, 2, 1, 2, 1, 2, 1, 1, 2, 0, 1, 1, 3, 4, 3, 0, 4, 3, 3, 3, 2, 1, 2, 4, 4, 4, 4, 3, 3, 2, 2, 3, 4, 3, 3, 1, 0, 4, 0, 1]
 val_label: tensor([3, 1, 2, 4, 4, 3, 0, 3, 3, 4, 4, 2, 0, 3, 1, 3, 1, 1, 2, 0, 0, 1, 2, 4,
        3, 0, 3, 3, 1, 1, 2, 0, 3, 4, 3, 4, 4, 3, 1, 3, 4, 3, 0, 1, 3, 1, 0, 4,
        3, 1], device='cuda:0')
10.0
tensor(1.0747, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 3, 3, 2, 4, 3, 4, 0, 2, 0, 2, 0, 4, 3, 3, 1, 3, 2, 0, 0, 4, 3, 0, 1, 4, 1, 3, 2, 4, 2, 3, 1, 3, 4, 3, 3, 4, 4, 3, 1, 3, 4, 2, 1, 1, 3, 4, 2, 3]
 val_label: tensor([4, 1, 3, 3, 3, 4, 3, 3, 0, 2, 0, 3, 0, 1, 1, 3, 1, 2, 2, 3, 0, 3, 0, 3,
        1, 3, 1, 3, 2, 3, 2, 3, 1, 4, 2, 0, 4, 3, 4, 3, 2, 3, 4, 2, 0, 1, 4, 2,
        2, 0], device='cuda:0')
10.0
tensor(1.0341, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 0, 4, 4, 0, 3, 4, 2, 1, 3, 3, 4, 4, 0, 3, 2, 4, 3, 4, 4, 1, 2, 3, 1, 3, 1, 1, 1, 3, 3, 3, 3, 3, 1, 4, 4, 4, 4, 1, 3, 

val_output: [3, 2, 4, 2, 2, 1, 3, 0, 4, 3, 4, 3, 4, 2, 4, 3, 1, 2, 2, 2, 2, 3, 2, 2, 3, 4, 4, 3, 2, 2, 3, 2, 2, 4, 4, 3, 3, 2, 4, 3, 2, 3, 2, 2, 0, 0, 2, 1, 3, 2]
 val_label: tensor([1, 2, 0, 2, 3, 3, 3, 1, 4, 3, 4, 3, 3, 2, 4, 1, 1, 2, 2, 1, 1, 2, 2, 2,
        3, 4, 4, 0, 2, 3, 2, 4, 2, 4, 4, 3, 1, 2, 3, 3, 2, 1, 3, 1, 0, 0, 1, 0,
        0, 2], device='cuda:0')
10.0
tensor(1.1335, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 4, 4, 3, 1, 4, 3, 2, 2, 3, 3, 0, 0, 4, 4, 3, 3, 4, 4, 3, 3, 3, 3, 0, 1, 4, 3, 4, 0, 3, 3, 4, 1, 2, 0, 3, 3, 2, 0, 0, 1, 3, 4, 3, 4, 3, 4, 1, 2]
 val_label: tensor([3, 3, 4, 0, 2, 3, 2, 3, 3, 3, 2, 3, 0, 0, 4, 4, 2, 3, 4, 1, 1, 3, 0, 2,
        2, 3, 4, 1, 4, 0, 3, 4, 4, 1, 0, 2, 3, 2, 3, 0, 0, 1, 1, 4, 3, 3, 3, 4,
        0, 1], device='cuda:0')
10.0
tensor(1.0776, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 2, 1, 4, 4, 3, 3, 4, 1, 3, 2, 2, 3, 3, 3, 3, 1, 1, 2, 4, 1, 2, 4, 2, 1, 3, 3, 4, 1, 4, 4, 3, 3, 2, 3, 2, 0, 4, 2, 2, 1, 

val_output: [3, 4, 2, 3, 0, 3, 4, 1, 3, 3, 2, 0, 3, 4, 1, 1, 2, 3, 3, 2, 2, 3, 0, 1, 1, 2, 4, 0, 4, 3, 1, 4, 3, 3, 2, 2, 2, 4, 4, 1, 4, 3, 0, 2, 3, 2, 3, 4, 3, 4]
 val_label: tensor([4, 4, 2, 0, 0, 3, 2, 1, 3, 3, 2, 0, 3, 4, 1, 1, 2, 2, 3, 3, 2, 3, 0, 1,
        1, 2, 4, 0, 3, 0, 0, 4, 2, 3, 2, 2, 2, 4, 3, 0, 4, 3, 3, 3, 3, 1, 3, 4,
        2, 3], device='cuda:0')
10.0
tensor(0.8665, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 4, 4, 2, 3, 4, 4, 4, 2, 3, 0, 0, 0, 3, 3, 4, 4, 3, 4, 3, 1, 3, 4, 0, 1, 3, 1, 3, 3, 4, 2, 4, 1, 4, 2, 1, 4, 2, 3, 3, 3, 3, 4, 2, 2, 1, 4, 2, 2, 1]
 val_label: tensor([1, 4, 4, 2, 3, 4, 1, 4, 2, 3, 0, 0, 0, 1, 1, 4, 4, 3, 3, 3, 1, 3, 4, 0,
        1, 1, 1, 3, 4, 4, 2, 4, 1, 4, 2, 1, 4, 4, 3, 3, 0, 1, 3, 2, 3, 1, 4, 2,
        3, 0], device='cuda:0')
10.0
tensor(0.8688, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 0, 4, 2, 3, 4, 3, 3, 2, 4, 4, 3, 4, 4, 4, 2, 3, 2, 3, 2, 3, 3, 4, 3, 1, 1, 3, 2, 3, 2, 3, 2, 3, 4, 4, 3, 0, 3, 2, 4, 3, 

val_output: [1, 3, 4, 3, 3, 1, 3, 4, 0, 1, 4, 2, 2, 1, 1, 0, 4, 4, 4, 4, 1, 4, 4, 0, 3, 4, 1, 3, 3, 0, 4, 1, 3, 2, 1, 2, 0, 3, 1, 4, 2, 2, 3, 3, 3, 2, 1, 2, 4, 2]
 val_label: tensor([1, 4, 4, 3, 0, 0, 4, 4, 0, 1, 0, 2, 3, 1, 3, 3, 4, 2, 4, 4, 1, 4, 4, 0,
        2, 3, 3, 3, 3, 1, 4, 1, 3, 3, 2, 2, 0, 4, 1, 0, 2, 2, 3, 0, 4, 3, 1, 2,
        4, 2], device='cuda:0')
10.0
tensor(0.9362, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 1, 1, 3, 2, 2, 2, 3, 3, 3, 2, 3, 2, 1, 4, 1, 2, 3, 1, 3, 3, 3, 0, 4, 1, 2, 2, 2, 4, 2, 4, 4, 1, 2, 2, 2, 1, 0, 4, 1, 0, 3, 3, 3, 4, 4, 3, 3, 2, 4]
 val_label: tensor([4, 1, 1, 2, 3, 2, 3, 0, 3, 3, 3, 3, 2, 1, 3, 0, 1, 4, 1, 3, 3, 1, 0, 4,
        1, 2, 2, 3, 4, 1, 4, 4, 1, 2, 2, 2, 1, 0, 4, 0, 3, 2, 2, 3, 2, 2, 0, 3,
        2, 2], device='cuda:0')
10.0
tensor(0.9767, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 4, 3, 3, 0, 3, 3, 3, 4, 4, 1, 4, 3, 3, 2, 3, 4, 3, 2, 3, 4, 0, 3, 0, 1, 1, 4, 2, 3, 0, 3, 3, 0, 3, 3, 3, 4, 1, 0, 4, 0, 

val_output: [4, 3, 2, 3, 3, 1, 1, 4, 4, 4, 0, 1, 4, 1, 4, 2, 3, 2, 4, 2, 2, 4, 2, 3, 4, 2, 4, 3, 4, 3, 4, 3, 2, 4, 0, 4, 4, 3, 3, 2, 2, 0, 3, 3, 4, 4, 3, 3, 3, 1]
 val_label: tensor([4, 3, 0, 4, 3, 1, 0, 4, 4, 4, 0, 3, 4, 3, 2, 0, 4, 2, 4, 2, 2, 4, 2, 3,
        4, 1, 4, 1, 4, 1, 4, 3, 2, 4, 0, 1, 4, 3, 0, 3, 2, 1, 3, 3, 2, 4, 3, 3,
        3, 3], device='cuda:0')
10.0
tensor(0.9436, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 2, 2, 2, 3, 4, 0, 4, 2, 3, 3, 2, 3, 2, 1, 2, 3, 4, 3, 2, 3, 3, 4, 3, 3, 4, 3, 3, 2, 4, 1, 3, 2, 4, 3, 3, 2, 2, 3, 3, 3, 1, 4, 2, 2, 4, 3, 3, 3, 1]
 val_label: tensor([2, 3, 2, 2, 3, 4, 0, 4, 3, 2, 1, 2, 1, 2, 1, 2, 3, 3, 4, 4, 3, 3, 0, 4,
        3, 3, 3, 3, 0, 3, 0, 1, 2, 4, 4, 3, 2, 2, 3, 3, 2, 1, 2, 3, 2, 4, 0, 3,
        3, 1], device='cuda:0')
10.0
tensor(1.0333, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 3, 2, 2, 3, 1, 4, 0, 4, 4, 2, 3, 2, 1, 2, 2, 3, 1, 3, 4, 2, 2, 3, 4, 3, 4, 2, 3, 1, 3, 2, 0, 0, 4, 3, 0, 2, 2, 1, 4, 0, 

tensor(1.0815, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 4, 3, 2, 4, 0, 3, 3, 2, 4, 2, 2, 2, 2, 2, 3, 2, 3, 2, 1, 1, 3, 0, 4, 1, 2, 2, 4, 3, 2, 4, 4, 4, 2, 3, 2, 3, 3, 1, 3, 2, 3, 4, 2, 4, 4, 4, 3, 4, 2]
 val_label: tensor([3, 3, 3, 2, 4, 1, 1, 3, 2, 4, 2, 2, 2, 4, 2, 3, 4, 3, 2, 1, 3, 3, 0, 3,
        0, 2, 2, 0, 3, 2, 3, 1, 4, 2, 3, 0, 3, 3, 1, 4, 2, 3, 1, 2, 3, 3, 4, 3,
        3, 2], device='cuda:0')
10.0
tensor(0.9442, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [0, 2, 4, 2, 3, 4, 2, 2, 2, 1, 3, 4, 2, 4, 1, 0, 3, 4, 3, 0, 2, 2, 3, 3, 3, 3, 4, 3, 3, 1, 3, 2, 4, 1, 3, 0, 2, 2, 4, 4, 3, 4, 1, 0, 1, 1, 3, 3, 0, 3]
 val_label: tensor([0, 2, 4, 2, 3, 3, 2, 2, 2, 1, 2, 2, 2, 4, 1, 0, 3, 4, 3, 0, 4, 2, 3, 1,
        3, 3, 0, 0, 3, 1, 3, 2, 0, 3, 4, 0, 2, 3, 4, 4, 3, 2, 0, 0, 2, 1, 1, 2,
        0, 3], device='cuda:0')
10.0
tensor(0.9927, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 4, 1, 3, 4, 3, 2, 2, 4, 4, 2, 2, 1, 1, 3, 3, 2, 0, 3, 3, 0, 

val_output: [1, 3, 4, 3, 3, 1, 3, 3, 3, 1, 2, 4, 3, 1, 3, 3, 4, 2, 4, 1, 4, 1, 4, 3, 2, 2, 2, 2, 0, 3, 4, 2, 1, 4, 1, 4, 3, 4, 3, 4, 4, 0, 3, 0, 3, 4, 4, 4, 1, 3]
 val_label: tensor([1, 3, 4, 0, 1, 1, 3, 1, 3, 0, 2, 2, 4, 1, 3, 3, 4, 2, 4, 1, 3, 1, 4, 2,
        2, 4, 2, 4, 0, 4, 4, 4, 1, 4, 2, 4, 1, 4, 3, 4, 4, 0, 3, 0, 4, 4, 3, 2,
        1, 4], device='cuda:0')
10.0
tensor(0.8500, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 1, 3, 3, 1, 3, 3, 3, 2, 3, 3, 3, 0, 4, 4, 4, 3, 3, 2, 4, 2, 4, 3, 4, 2, 3, 4, 2, 3, 3, 0, 0, 4, 2, 2, 3, 2, 3, 3, 2, 3, 4, 4, 2, 1, 2, 3, 3, 4, 3]
 val_label: tensor([3, 0, 1, 3, 1, 3, 3, 2, 2, 4, 3, 4, 3, 4, 2, 2, 3, 3, 2, 4, 1, 4, 1, 0,
        4, 1, 4, 3, 3, 2, 1, 0, 4, 4, 2, 4, 2, 3, 1, 2, 3, 4, 0, 2, 1, 2, 1, 4,
        1, 3], device='cuda:0')
10.0
tensor(1.1227, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 3, 1, 3, 2, 4, 2, 3, 4, 2, 1, 4, 1, 4, 2, 4, 3, 3, 3, 1, 1, 4, 1, 2, 4, 1, 3, 3, 3, 3, 1, 4, 1, 4, 2, 3, 4, 1, 3, 3, 4, 

val_output: [3, 3, 3, 3, 2, 3, 3, 2, 0, 3, 3, 2, 4, 1, 0, 2, 3, 2, 3, 4, 3, 1, 1, 2, 1, 3, 2, 1, 2, 2, 2, 1, 2, 2, 1, 4, 3, 1, 4, 4, 4, 2, 2, 4, 3, 4, 2, 2, 3, 4]
 val_label: tensor([3, 2, 3, 1, 2, 4, 3, 2, 0, 3, 2, 1, 4, 1, 3, 2, 2, 2, 1, 4, 3, 1, 1, 3,
        2, 3, 3, 2, 0, 2, 2, 1, 2, 0, 1, 4, 0, 0, 4, 2, 4, 2, 1, 1, 3, 2, 3, 1,
        3, 4], device='cuda:0')
10.0
tensor(1.1906, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 2, 3, 4, 1, 1, 3, 2, 1, 1, 2, 3, 3, 1, 3, 2, 1, 3, 1, 4, 3, 2, 0, 4, 4, 2, 3, 4, 4, 2, 2, 4, 4, 2, 1, 4, 2, 3, 3, 3, 4, 1, 0, 3, 4, 3, 2, 2, 4, 3]
 val_label: tensor([4, 2, 3, 4, 1, 3, 3, 2, 0, 2, 2, 1, 3, 3, 3, 4, 0, 2, 1, 4, 3, 3, 0, 1,
        3, 2, 2, 4, 4, 3, 2, 1, 4, 2, 1, 3, 2, 4, 2, 3, 4, 1, 0, 4, 3, 3, 2, 1,
        0, 3], device='cuda:0')
10.0
tensor(1.2425, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 3, 4, 4, 3, 4, 1, 3, 4, 3, 3, 3, 4, 4, 3, 3, 4, 2, 4, 2, 3, 2, 3, 1, 1, 0, 3, 3, 1, 4, 3, 2, 3, 4, 1, 2, 2, 3, 3, 3, 

val_output: [4, 4, 3, 4, 1, 4, 0, 2, 2, 1, 3, 1, 2, 2, 1, 2, 3, 4, 4, 0, 2, 0, 2, 1, 2, 3, 4, 4, 3, 0, 2, 2, 4, 3, 3, 3, 3, 4, 1, 3, 4, 2, 4, 4, 2, 1, 4, 3, 2, 2]
 val_label: tensor([4, 1, 3, 3, 3, 1, 0, 0, 2, 0, 2, 0, 2, 2, 1, 3, 3, 4, 0, 0, 2, 0, 2, 1,
        2, 3, 4, 3, 3, 3, 3, 3, 4, 3, 2, 4, 3, 4, 1, 3, 4, 2, 4, 3, 3, 1, 3, 2,
        2, 2], device='cuda:0')
10.0
tensor(1.0586, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 2, 1, 0, 3, 3, 2, 3, 1, 4, 3, 3, 2, 1, 2, 3, 3, 2, 2, 3, 4, 2, 1, 3, 2, 1, 1, 3, 3, 4, 2, 4, 1, 2, 2, 1, 4, 3, 3, 1, 1, 2, 0, 1, 2, 4, 0, 2, 3, 3]
 val_label: tensor([3, 3, 1, 0, 3, 0, 2, 3, 1, 3, 2, 3, 1, 0, 2, 3, 4, 2, 4, 3, 4, 2, 1, 3,
        2, 1, 3, 3, 2, 4, 2, 4, 1, 2, 1, 1, 4, 1, 3, 3, 3, 3, 0, 1, 2, 3, 0, 2,
        3, 1], device='cuda:0')
10.0
tensor(0.8823, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 0, 1, 4, 1, 3, 3, 3, 4, 3, 4, 3, 1, 3, 2, 3, 0, 4, 1, 2, 1, 4, 4, 2, 3, 0, 4, 2, 3, 3, 0, 2, 3, 2, 4, 3, 4, 3, 2, 4, 4, 

val_output: [3, 0, 1, 2, 2, 3, 4, 3, 4, 4, 2, 2, 3, 0, 0, 3, 2, 0, 2, 2, 3, 4, 2, 2, 1, 2, 2, 3, 2, 3, 2, 1, 0, 4, 4, 4, 1, 3, 3, 0, 0, 2, 3, 3, 4, 3, 4, 4, 3, 4]
 val_label: tensor([1, 0, 1, 2, 2, 3, 4, 3, 3, 4, 2, 0, 3, 0, 0, 0, 2, 0, 0, 2, 3, 4, 2, 2,
        2, 3, 3, 2, 2, 2, 2, 1, 0, 0, 4, 3, 1, 3, 3, 1, 1, 2, 3, 2, 0, 4, 4, 4,
        1, 3], device='cuda:0')
10.0
tensor(1.1277, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 3, 3, 2, 4, 2, 3, 4, 1, 0, 3, 3, 0, 1, 2, 3, 3, 4, 4, 3, 2, 4, 3, 3, 4, 3, 2, 4, 3, 4, 2, 4, 3, 1, 2, 3, 0, 2, 4, 2, 4, 3, 0, 4, 3, 4, 3, 2, 4, 4]
 val_label: tensor([1, 3, 2, 2, 3, 2, 3, 4, 1, 0, 3, 3, 3, 1, 3, 4, 1, 4, 4, 1, 2, 4, 2, 0,
        3, 3, 3, 4, 2, 4, 2, 4, 2, 0, 1, 3, 0, 4, 4, 2, 4, 3, 0, 4, 4, 4, 1, 2,
        3, 2], device='cuda:0')
10.0
tensor(0.9811, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 4, 3, 4, 3, 4, 1, 2, 0, 0, 3, 2, 2, 4, 3, 3, 3, 3, 0, 3, 3, 2, 4, 3, 3, 1, 1, 2, 2, 1, 1, 2, 2, 3, 3, 4, 3, 3, 3, 2, 2, 

val_output: [4, 0, 4, 4, 3, 4, 4, 4, 2, 2, 0, 4, 3, 3, 2, 4, 4, 4, 3, 4, 2, 3, 3, 4, 3, 3, 4, 4, 1, 3, 3, 3, 3, 1, 3, 3, 2, 2, 2, 2, 0, 2, 2, 2, 3, 3, 3, 2, 1, 4]
 val_label: tensor([4, 0, 4, 3, 4, 0, 0, 0, 2, 3, 0, 4, 3, 3, 3, 3, 4, 4, 1, 3, 0, 3, 3, 1,
        3, 0, 2, 4, 1, 2, 3, 3, 3, 3, 2, 2, 2, 3, 1, 2, 3, 2, 2, 2, 2, 3, 4, 2,
        0, 4], device='cuda:0')
10.0
tensor(1.1904, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 3, 3, 1, 0, 2, 4, 4, 4, 3, 1, 3, 4, 3, 4, 2, 3, 2, 2, 3, 4, 3, 1, 1, 2, 1, 2, 3, 4, 3, 4, 3, 4, 4, 1, 2, 0, 3, 2, 3, 3, 4, 4, 2, 1, 3, 3, 4, 3, 2]
 val_label: tensor([2, 2, 1, 1, 0, 3, 0, 3, 2, 4, 2, 4, 2, 4, 4, 3, 3, 3, 2, 4, 2, 2, 1, 1,
        4, 1, 2, 1, 4, 0, 4, 3, 4, 3, 0, 0, 3, 3, 2, 2, 2, 3, 4, 2, 1, 0, 4, 0,
        3, 2], device='cuda:0')
10.0
tensor(1.1406, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [0, 2, 3, 4, 0, 2, 2, 3, 2, 2, 2, 3, 3, 4, 2, 4, 3, 3, 4, 3, 2, 4, 1, 3, 3, 4, 4, 3, 1, 2, 3, 4, 0, 1, 2, 4, 4, 1, 4, 3, 2, 

tensor(1.0831, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 1, 2, 3, 3, 2, 3, 4, 4, 3, 3, 3, 4, 3, 0, 1, 2, 3, 3, 2, 1, 1, 2, 2, 3, 3, 4, 1, 3, 0, 3, 3, 4, 3, 2, 3, 2, 3, 4, 3, 2, 1, 3, 4, 3, 0, 3, 4, 2]
 val_label: tensor([3, 2, 1, 2, 3, 4, 3, 2, 4, 4, 3, 3, 3, 0, 3, 0, 1, 3, 1, 3, 0, 1, 1, 2,
        1, 0, 3, 4, 1, 4, 1, 3, 3, 3, 1, 0, 3, 0, 3, 4, 3, 2, 0, 3, 2, 1, 0, 4,
        2, 2], device='cuda:0')
10.0
tensor(1.1461, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 3, 2, 1, 3, 3, 3, 3, 4, 2, 3, 2, 1, 4, 2, 2, 3, 2, 3, 4, 3, 0, 4, 1, 4, 2, 3, 0, 3, 1, 4, 2, 0, 4, 3, 3, 2, 4, 1, 3, 1, 4, 3, 3, 4, 3, 3, 2, 2, 4]
 val_label: tensor([2, 2, 3, 0, 3, 3, 3, 2, 4, 1, 3, 2, 1, 4, 2, 4, 4, 2, 3, 4, 4, 0, 4, 1,
        4, 2, 1, 1, 1, 0, 3, 3, 0, 3, 3, 4, 4, 2, 2, 3, 1, 4, 0, 4, 4, 3, 3, 2,
        0, 4], device='cuda:0')
10.0
tensor(1.0281, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 2, 2, 3, 0, 2, 3, 2, 3, 3, 1, 3, 2, 1, 2, 3, 4, 3, 4, 1, 3, 

val_output: [4, 2, 2, 4, 2, 3, 0, 2, 2, 0, 3, 3, 0, 4, 2, 3, 4, 3, 3, 4, 3, 0, 1, 3, 3, 4, 4, 2, 4, 3, 4, 4, 1, 3, 2, 1, 3, 4, 2, 0, 4, 2, 4, 4, 2, 4, 2, 3, 1, 2]
 val_label: tensor([4, 3, 2, 4, 2, 3, 0, 2, 2, 0, 4, 3, 0, 4, 2, 3, 0, 3, 4, 3, 3, 2, 3, 3,
        2, 4, 1, 3, 4, 3, 4, 4, 1, 2, 3, 1, 0, 4, 2, 0, 4, 0, 0, 0, 2, 4, 1, 1,
        1, 2], device='cuda:0')
10.0
tensor(1.1351, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 4, 2, 4, 2, 2, 4, 3, 1, 0, 3, 2, 3, 3, 4, 0, 1, 2, 2, 1, 3, 0, 4, 4, 0, 3, 4, 3, 4, 4, 3, 3, 3, 0, 3, 3, 4, 3, 0, 4, 1, 4, 0, 3, 3, 4, 4, 2, 2, 4]
 val_label: tensor([2, 3, 1, 4, 4, 1, 1, 4, 3, 0, 3, 2, 2, 3, 4, 0, 2, 3, 3, 2, 0, 0, 4, 3,
        0, 3, 4, 3, 4, 4, 2, 0, 3, 0, 3, 1, 3, 4, 0, 4, 1, 3, 0, 3, 3, 3, 4, 4,
        1, 2], device='cuda:0')
10.0
tensor(1.0122, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 3, 4, 1, 2, 2, 3, 3, 1, 4, 3, 3, 2, 0, 2, 2, 0, 2, 2, 1, 3, 2, 2, 2, 3, 0, 2, 4, 3, 4, 3, 3, 2, 4, 4, 4, 2, 1, 2, 3, 3, 

val_output: [4, 0, 3, 4, 4, 1, 3, 4, 0, 3, 2, 0, 4, 4, 4, 3, 3, 1, 4, 0, 2, 2, 4, 2, 3, 4, 4, 1, 3, 3, 4, 3, 3, 3, 0, 3, 2, 3, 2, 3, 2, 1, 4, 3, 2, 4, 3, 3, 3, 3]
 val_label: tensor([4, 0, 3, 4, 0, 2, 3, 4, 0, 2, 2, 0, 1, 3, 4, 2, 4, 3, 4, 0, 2, 4, 2, 3,
        3, 4, 2, 0, 3, 0, 4, 4, 3, 3, 0, 1, 2, 3, 2, 3, 3, 1, 2, 3, 0, 4, 0, 2,
        0, 3], device='cuda:0')
10.0
tensor(1.0838, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 2, 4, 4, 3, 1, 4, 3, 3, 3, 1, 4, 3, 3, 3, 2, 1, 4, 2, 2, 0, 0, 0, 2, 4, 2, 0, 2, 2, 3, 2, 2, 4, 2, 4, 4, 0, 4, 3, 3, 4, 3, 4, 0, 4, 3, 2, 4, 0, 1]
 val_label: tensor([2, 3, 4, 3, 3, 3, 4, 3, 0, 3, 1, 0, 2, 2, 3, 2, 1, 4, 2, 0, 0, 0, 0, 0,
        1, 2, 0, 2, 3, 1, 3, 0, 4, 2, 4, 4, 0, 3, 2, 3, 4, 2, 4, 0, 3, 2, 0, 2,
        0, 3], device='cuda:0')
10.0
tensor(1.1165, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 3, 2, 2, 2, 3, 3, 4, 2, 3, 3, 1, 1, 2, 1, 1, 2, 1, 1, 1, 4, 4, 1, 3, 3, 2, 4, 3, 2, 4, 3, 4, 4, 2, 3, 4, 2, 3, 3, 4, 1, 

val_output: [1, 2, 2, 1, 2, 4, 2, 2, 4, 3, 1, 1, 3, 4, 4, 3, 2, 4, 3, 3, 3, 3, 1, 3, 1, 2, 2, 3, 4, 0, 4, 3, 4, 1, 0, 2, 2, 3, 1, 2, 4, 3, 3, 2, 3, 1, 3, 3, 3, 3]
 val_label: tensor([1, 0, 0, 1, 1, 3, 2, 2, 4, 3, 1, 1, 3, 4, 0, 3, 3, 4, 0, 3, 2, 3, 0, 2,
        1, 2, 2, 3, 4, 1, 3, 3, 4, 0, 0, 1, 2, 3, 1, 4, 3, 1, 3, 0, 3, 1, 4, 3,
        3, 3], device='cuda:0')
10.0
tensor(1.0543, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 1, 3, 2, 4, 2, 1, 4, 2, 1, 1, 3, 0, 1, 0, 2, 4, 4, 1, 3, 3, 4, 0, 0, 4, 4, 3, 3, 1, 3, 4, 1, 3, 4, 2, 1, 2, 2, 3, 3, 3, 4, 4, 2, 0, 3, 1, 0, 2, 4]
 val_label: tensor([4, 1, 1, 2, 4, 2, 1, 0, 3, 1, 1, 3, 0, 1, 0, 2, 3, 2, 2, 3, 0, 4, 0, 0,
        0, 3, 0, 3, 3, 3, 4, 1, 4, 4, 3, 3, 2, 0, 2, 1, 3, 4, 4, 3, 0, 3, 1, 0,
        0, 4], device='cuda:0')
10.0
tensor(1.0388, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 3, 3, 2, 2, 3, 4, 3, 1, 3, 1, 3, 1, 0, 3, 3, 3, 3, 3, 0, 2, 3, 2, 3, 3, 3, 3, 4, 0, 3, 2, 4, 1, 1, 2, 2, 4, 1, 1, 2, 3, 

val_output: [4, 0, 3, 3, 0, 2, 4, 2, 2, 1, 0, 3, 2, 3, 3, 4, 1, 1, 2, 4, 1, 4, 4, 3, 3, 1, 4, 2, 2, 2, 2, 0, 3, 1, 1, 2, 4, 3, 3, 3, 4, 2, 2, 1, 2, 4, 4, 2, 4, 4]
 val_label: tensor([3, 0, 2, 4, 0, 2, 4, 2, 4, 3, 0, 3, 2, 1, 4, 4, 1, 3, 2, 0, 1, 2, 3, 4,
        3, 3, 4, 2, 2, 4, 3, 0, 4, 0, 1, 3, 4, 3, 3, 3, 2, 2, 2, 1, 2, 4, 4, 0,
        3, 3], device='cuda:0')
10.0
tensor(1.0461, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 0, 1, 0, 3, 4, 2, 2, 1, 2, 3, 4, 4, 2, 4, 0, 4, 3, 2, 0, 4, 0, 3, 3, 1, 2, 2, 4, 4, 3, 4, 4, 4, 0, 2, 4, 4, 4, 2, 2, 3, 1, 3, 3, 3, 3, 0, 0, 3]
 val_label: tensor([2, 2, 0, 0, 0, 0, 3, 2, 2, 1, 3, 4, 4, 3, 2, 4, 2, 3, 2, 2, 0, 2, 0, 0,
        2, 0, 2, 2, 4, 1, 3, 1, 4, 3, 0, 2, 4, 4, 4, 3, 0, 3, 3, 2, 1, 3, 1, 0,
        0, 3], device='cuda:0')
10.0
tensor(1.0886, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 1, 0, 3, 4, 4, 2, 4, 3, 4, 3, 2, 3, 4, 0, 3, 3, 1, 4, 1, 3, 4, 4, 4, 2, 4, 0, 1, 0, 1, 3, 2, 3, 2, 2, 2, 1, 4, 3, 4, 3, 

tensor(1.1421, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 4, 4, 2, 4, 4, 4, 2, 4, 3, 3, 4, 2, 0, 1, 3, 2, 2, 2, 2, 3, 3, 1, 2, 4, 3, 3, 1, 2, 1, 1, 3, 4, 2, 3, 3, 1, 3, 4, 2, 2, 4, 4, 3, 3, 3, 3, 3, 3, 1]
 val_label: tensor([4, 1, 4, 3, 4, 4, 3, 3, 4, 3, 3, 3, 1, 0, 0, 4, 2, 2, 2, 2, 4, 1, 3, 2,
        3, 3, 3, 1, 2, 2, 2, 0, 4, 3, 3, 2, 1, 3, 4, 2, 3, 2, 4, 3, 3, 3, 3, 4,
        0, 1], device='cuda:0')
10.0
tensor(0.9725, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 3, 3, 0, 3, 4, 3, 3, 1, 4, 4, 4, 2, 3, 3, 4, 3, 4, 3, 3, 3, 3, 1, 3, 3, 1, 3, 2, 4, 2, 0, 4, 3, 4, 3, 2, 3, 3, 4, 4, 4, 3, 0, 2, 3, 3, 1, 2, 3, 4]
 val_label: tensor([4, 2, 1, 1, 3, 4, 1, 3, 1, 4, 1, 0, 3, 2, 4, 4, 3, 4, 3, 3, 4, 4, 1, 3,
        4, 3, 1, 2, 4, 3, 0, 4, 0, 0, 1, 2, 3, 3, 4, 4, 4, 1, 0, 2, 3, 3, 3, 2,
        3, 0], device='cuda:0')
10.0
tensor(0.9963, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 3, 4, 2, 2, 1, 0, 4, 3, 0, 2, 4, 3, 3, 3, 2, 4, 4, 1, 1, 1, 

val_output: [2, 1, 4, 3, 3, 2, 0, 2, 1, 2, 2, 4, 4, 2, 0, 4, 3, 2, 4, 3, 3, 4, 4, 3, 3, 4, 1, 0, 2, 2, 2, 1, 1, 0, 3, 2, 3, 1, 4, 2, 3, 2, 1, 3, 1, 4, 4, 2, 4, 4]
 val_label: tensor([4, 1, 4, 3, 4, 2, 0, 2, 1, 2, 2, 4, 3, 2, 3, 4, 3, 2, 1, 1, 1, 4, 4, 1,
        3, 3, 2, 0, 2, 2, 2, 1, 1, 0, 0, 2, 3, 1, 4, 4, 2, 4, 1, 4, 0, 3, 4, 0,
        3, 3], device='cuda:0')
10.0
tensor(0.9300, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 3, 1, 2, 2, 2, 0, 1, 2, 4, 4, 3, 3, 3, 4, 3, 3, 2, 4, 4, 2, 4, 4, 3, 2, 2, 4, 4, 2, 3, 3, 4, 1, 4, 1, 4, 3, 1, 4, 3, 2, 2, 3, 3, 2, 3, 3, 3, 4, 4]
 val_label: tensor([4, 4, 3, 0, 2, 3, 0, 1, 4, 4, 4, 1, 3, 3, 4, 3, 3, 3, 3, 4, 3, 2, 4, 3,
        2, 2, 4, 4, 2, 1, 0, 4, 1, 1, 1, 3, 1, 1, 4, 0, 3, 2, 2, 2, 2, 3, 4, 3,
        3, 4], device='cuda:0')
10.0
tensor(1.0650, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 3, 2, 2, 3, 4, 3, 1, 2, 2, 2, 4, 2, 3, 4, 3, 4, 3, 3, 4, 2, 0, 3, 0, 1, 3, 4, 3, 4, 0, 2, 3, 1, 3, 3, 0, 4, 3, 3, 1, 2, 

val_output: [2, 1, 4, 3, 4, 2, 2, 3, 3, 2, 4, 3, 1, 3, 3, 0, 3, 3, 4, 3, 1, 3, 4, 4, 1, 2, 3, 2, 4, 3, 2, 3, 0, 1, 4, 2, 3, 3, 4, 4, 0, 3, 3, 3, 3, 3, 2, 2, 1, 0]
 val_label: tensor([2, 0, 4, 2, 0, 2, 2, 3, 3, 0, 4, 1, 3, 2, 3, 0, 3, 1, 0, 0, 1, 3, 4, 3,
        1, 2, 3, 1, 4, 3, 2, 1, 0, 3, 3, 2, 0, 1, 4, 4, 0, 3, 0, 0, 3, 1, 4, 1,
        1, 0], device='cuda:0')
10.0
tensor(1.2212, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 4, 4, 3, 3, 2, 1, 1, 4, 3, 2, 3, 2, 0, 1, 3, 3, 2, 3, 2, 3, 4, 0, 4, 0, 3, 4, 3, 4, 4, 4, 1, 2, 4, 4, 3, 4, 4, 2, 4, 1, 3, 4, 3, 3, 4, 3, 4, 3, 2]
 val_label: tensor([1, 3, 3, 1, 3, 2, 1, 1, 4, 3, 2, 3, 2, 0, 0, 3, 3, 3, 3, 2, 2, 2, 0, 1,
        0, 3, 4, 2, 3, 4, 3, 1, 4, 2, 4, 3, 1, 3, 3, 4, 3, 4, 0, 3, 3, 0, 3, 0,
        3, 0], device='cuda:0')
10.0
tensor(0.9968, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 0, 3, 2, 3, 4, 2, 1, 4, 2, 3, 3, 3, 4, 3, 3, 0, 1, 4, 3, 1, 2, 2, 2, 3, 3, 1, 2, 2, 2, 4, 3, 1, 2, 3, 2, 4, 4, 1, 3, 

val_output: [4, 3, 0, 3, 2, 2, 0, 2, 2, 3, 0, 3, 2, 2, 2, 2, 1, 1, 4, 3, 0, 4, 3, 2, 4, 3, 1, 4, 4, 2, 3, 3, 0, 2, 2, 3, 3, 3, 3, 4, 2, 4, 0, 3, 0, 2, 1, 0, 2, 3]
 val_label: tensor([2, 1, 0, 3, 2, 2, 0, 2, 2, 2, 0, 3, 2, 2, 2, 2, 0, 1, 3, 3, 0, 4, 3, 3,
        3, 3, 2, 4, 4, 0, 3, 3, 0, 3, 3, 3, 0, 4, 3, 2, 2, 4, 0, 4, 0, 2, 1, 0,
        2, 3], device='cuda:0')
10.0
tensor(0.8210, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 0, 1, 4, 0, 3, 2, 3, 4, 3, 0, 2, 4, 3, 2, 1, 1, 4, 4, 2, 4, 4, 2, 4, 3, 1, 3, 2, 3, 3, 2, 3, 4, 4, 4, 3, 2, 4, 3, 2, 3, 1, 4, 3, 2, 3, 3, 3, 1, 3]
 val_label: tensor([3, 0, 1, 0, 0, 3, 1, 3, 4, 3, 0, 4, 4, 3, 3, 2, 3, 4, 4, 3, 4, 3, 2, 4,
        4, 3, 4, 2, 4, 3, 1, 3, 2, 4, 0, 1, 2, 1, 4, 0, 3, 3, 4, 4, 2, 0, 3, 3,
        1, 3], device='cuda:0')
10.0
tensor(1.1727, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 1, 2, 3, 2, 4, 3, 2, 2, 4, 1, 0, 4, 4, 3, 3, 3, 2, 2, 0, 1, 3, 2, 1, 4, 2, 3, 2, 0, 1, 4, 3, 3, 4, 4, 4, 2, 3, 3, 3, 3, 

val_output: [0, 3, 1, 2, 1, 3, 3, 3, 3, 3, 2, 1, 4, 3, 2, 4, 2, 3, 3, 4, 1, 1, 4, 4, 3, 1, 4, 4, 3, 3, 2, 3, 3, 2, 4, 2, 0, 1, 3, 4, 4, 1, 2, 2, 1, 2, 2, 2, 4, 3]
 val_label: tensor([3, 2, 1, 2, 1, 4, 3, 4, 2, 4, 2, 1, 1, 0, 3, 4, 2, 3, 3, 4, 1, 0, 4, 4,
        3, 1, 4, 4, 3, 4, 0, 3, 2, 2, 4, 2, 0, 3, 3, 3, 4, 1, 2, 0, 4, 0, 4, 0,
        4, 1], device='cuda:0')
10.0
tensor(1.1268, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 4, 2, 3, 4, 1, 0, 2, 3, 3, 3, 2, 1, 3, 4, 4, 0, 0, 3, 3, 2, 0, 4, 1, 3, 3, 2, 2, 3, 3, 3, 1, 4, 4, 3, 3, 3, 3, 3, 1, 0, 1, 4, 1, 1, 0, 1, 3, 3]
 val_label: tensor([3, 0, 3, 2, 2, 1, 1, 0, 4, 0, 1, 3, 2, 0, 4, 1, 3, 0, 0, 3, 4, 2, 1, 4,
        1, 3, 4, 3, 2, 3, 2, 4, 1, 3, 4, 3, 2, 2, 4, 3, 3, 0, 1, 2, 1, 0, 0, 1,
        3, 1], device='cuda:0')
10.0
tensor(1.0662, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 3, 4, 3, 2, 2, 4, 1, 3, 1, 4, 1, 3, 3, 4, 1, 3, 3, 3, 2, 3, 3, 2, 1, 2, 4, 3, 2, 2, 3, 4, 3, 1, 3, 3, 3, 1, 3, 4, 4, 

tensor(0.8849, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 3, 3, 4, 0, 3, 2, 2, 1, 2, 3, 4, 2, 3, 4, 2, 2, 2, 1, 0, 4, 4, 2, 1, 1, 2, 2, 3, 3, 3, 4, 1, 2, 2, 1, 3, 2, 4, 3, 1, 4, 2, 4, 3, 2, 4, 3, 4, 0, 3]
 val_label: tensor([4, 2, 3, 4, 0, 3, 4, 2, 1, 0, 3, 4, 2, 3, 4, 2, 2, 2, 1, 0, 3, 0, 2, 1,
        1, 1, 3, 3, 0, 1, 3, 0, 1, 2, 1, 4, 3, 3, 2, 1, 4, 0, 4, 3, 2, 4, 2, 4,
        0, 2], device='cuda:0')
10.0
tensor(1.1375, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 3, 3, 1, 4, 4, 1, 3, 1, 3, 3, 2, 3, 1, 1, 2, 3, 2, 4, 3, 4, 0, 3, 2, 2, 3, 4, 3, 2, 3, 3, 4, 3, 4, 2, 2, 3, 2, 2, 2, 1, 0, 3, 3, 2, 2, 2, 3, 3]
 val_label: tensor([3, 4, 3, 3, 1, 3, 4, 1, 3, 1, 1, 3, 2, 2, 3, 1, 2, 4, 2, 4, 4, 4, 4, 3,
        3, 2, 2, 4, 3, 2, 3, 3, 2, 3, 4, 2, 2, 1, 2, 3, 2, 1, 0, 3, 3, 2, 2, 2,
        3, 1], device='cuda:0')
10.0
tensor(0.8381, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 3, 2, 2, 2, 1, 3, 4, 4, 3, 4, 4, 0, 3, 1, 4, 3, 2, 4, 3, 4, 

val_output: [3, 1, 1, 1, 2, 2, 2, 1, 3, 0, 3, 4, 4, 4, 2, 4, 3, 2, 4, 2, 3, 2, 3, 4, 3, 4, 1, 4, 2, 4, 1, 4, 3, 3, 2, 1, 3, 4, 1, 3, 3, 4, 0, 4, 1, 3, 4, 3, 3, 2]
 val_label: tensor([3, 1, 1, 0, 2, 2, 3, 1, 1, 0, 4, 4, 3, 3, 0, 4, 1, 2, 3, 2, 3, 2, 3, 2,
        3, 4, 1, 0, 2, 3, 0, 4, 4, 2, 4, 1, 3, 4, 1, 1, 3, 3, 0, 3, 2, 3, 4, 2,
        4, 2], device='cuda:0')
10.0
tensor(1.0918, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 4, 1, 0, 2, 3, 4, 4, 2, 2, 1, 3, 1, 4, 4, 4, 3, 3, 1, 1, 1, 3, 4, 4, 3, 4, 2, 4, 4, 3, 4, 3, 3, 1, 4, 3, 4, 4, 3, 3, 3, 3, 0, 2, 4, 1, 3, 2, 2]
 val_label: tensor([0, 1, 4, 1, 0, 4, 2, 3, 4, 2, 4, 1, 3, 1, 4, 4, 4, 3, 3, 0, 0, 1, 1, 4,
        0, 0, 4, 2, 4, 4, 3, 2, 3, 3, 1, 1, 3, 4, 1, 3, 2, 3, 3, 0, 2, 3, 1, 2,
        2, 3], device='cuda:0')
10.0
tensor(0.8669, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 1, 3, 2, 4, 1, 3, 1, 4, 2, 2, 4, 3, 2, 1, 2, 2, 2, 3, 1, 3, 3, 0, 1, 4, 1, 1, 4, 4, 4, 4, 3, 4, 4, 1, 4, 3, 2, 3, 3, 3, 

val_output: [1, 2, 2, 3, 4, 2, 1, 3, 1, 2, 3, 3, 4, 1, 1, 1, 3, 3, 3, 1, 4, 2, 3, 3, 1, 3, 4, 2, 1, 2, 3, 2, 3, 4, 3, 0, 3, 3, 3, 2, 4, 3, 4, 1, 1, 3, 1, 2, 3, 3]
 val_label: tensor([1, 2, 2, 2, 0, 2, 1, 1, 1, 1, 3, 3, 3, 0, 0, 3, 3, 3, 1, 1, 0, 0, 2, 2,
        0, 3, 3, 0, 3, 2, 3, 2, 2, 4, 3, 0, 3, 4, 0, 0, 0, 3, 4, 1, 1, 4, 1, 2,
        3, 3], device='cuda:0')
10.0
tensor(1.2129, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 3, 1, 4, 0, 3, 2, 3, 2, 1, 2, 1, 4, 1, 3, 3, 3, 2, 2, 3, 2, 1, 1, 3, 3, 3, 3, 0, 0, 1, 3, 3, 2, 1, 4, 1, 4, 4, 2, 2, 3, 3, 4, 1, 4, 4, 1, 1, 3, 3]
 val_label: tensor([1, 0, 1, 3, 0, 3, 0, 3, 2, 1, 4, 1, 4, 0, 3, 3, 0, 0, 2, 4, 2, 3, 1, 3,
        3, 3, 2, 0, 0, 2, 1, 0, 0, 1, 3, 1, 4, 4, 1, 0, 3, 3, 4, 1, 4, 2, 1, 1,
        3, 3], device='cuda:0')
10.0
tensor(1.0675, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 2, 4, 4, 4, 3, 3, 2, 0, 2, 3, 2, 1, 3, 4, 2, 4, 3, 4, 3, 3, 1, 1, 3, 2, 1, 4, 1, 3, 4, 3, 3, 4, 4, 4, 4, 4, 4, 1, 4, 2, 

tensor(1.1714, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 4, 3, 3, 3, 1, 4, 2, 3, 3, 2, 3, 3, 4, 2, 3, 2, 2, 1, 3, 4, 2, 4, 2, 0, 2, 3, 3, 3, 0, 3, 3, 4, 4, 1, 2, 3, 4, 3, 3, 4, 4, 2, 1, 2, 2, 4, 3, 2, 1]
 val_label: tensor([3, 4, 3, 4, 1, 1, 3, 2, 0, 1, 3, 2, 3, 4, 1, 3, 3, 0, 2, 3, 4, 2, 4, 2,
        0, 4, 1, 4, 4, 1, 2, 2, 4, 4, 1, 2, 2, 4, 3, 2, 4, 3, 1, 1, 2, 2, 3, 2,
        2, 0], device='cuda:0')
10.0
tensor(1.2334, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 2, 1, 3, 4, 2, 3, 3, 4, 0, 4, 1, 3, 1, 3, 2, 4, 4, 0, 2, 4, 3, 3, 2, 4, 4, 3, 3, 4, 0, 3, 0, 2, 2, 3, 2, 1, 2, 3, 2, 2, 1, 3, 1, 2, 1, 3, 1, 0, 4]
 val_label: tensor([1, 2, 1, 3, 1, 1, 3, 3, 4, 0, 2, 1, 3, 0, 2, 4, 1, 1, 0, 2, 4, 2, 2, 2,
        4, 4, 3, 3, 4, 0, 4, 0, 2, 2, 3, 3, 0, 3, 3, 3, 2, 4, 2, 1, 2, 1, 1, 1,
        3, 1], device='cuda:0')
10.0
tensor(1.0164, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 4, 4, 3, 3, 4, 0, 2, 3, 3, 3, 3, 2, 1, 3, 3, 4, 4, 3, 4, 4, 

val_output: [3, 4, 3, 2, 4, 3, 2, 1, 1, 2, 2, 1, 4, 1, 4, 3, 0, 3, 2, 1, 3, 3, 3, 3, 1, 4, 2, 2, 2, 3, 1, 3, 3, 4, 3, 4, 4, 1, 4, 3, 3, 3, 1, 4, 4, 4, 0, 2, 4, 4]
 val_label: tensor([3, 4, 3, 4, 4, 1, 1, 1, 1, 2, 1, 2, 3, 1, 4, 4, 0, 3, 2, 2, 1, 3, 4, 3,
        3, 4, 2, 3, 2, 1, 0, 1, 3, 2, 3, 4, 1, 1, 4, 4, 3, 3, 0, 4, 4, 0, 2, 2,
        3, 4], device='cuda:0')
10.0
tensor(1.1270, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 0, 2, 1, 0, 4, 3, 0, 3, 2, 4, 3, 4, 3, 4, 1, 3, 3, 2, 4, 2, 3, 3, 4, 4, 4, 2, 4, 1, 2, 0, 3, 3, 4, 2, 3, 4, 2, 1, 3, 3, 3, 4, 3, 2, 3, 1, 4, 1, 2]
 val_label: tensor([3, 0, 2, 2, 0, 0, 0, 0, 3, 2, 4, 3, 4, 3, 4, 1, 4, 3, 3, 4, 2, 3, 2, 4,
        1, 4, 2, 4, 1, 1, 0, 3, 3, 4, 1, 3, 4, 3, 1, 2, 4, 3, 4, 3, 4, 1, 1, 4,
        1, 2], device='cuda:0')
10.0
tensor(0.8544, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 4, 2, 3, 0, 2, 3, 3, 3, 2, 2, 3, 2, 2, 4, 3, 3, 3, 3, 4, 1, 3, 3, 0, 1, 3, 3, 3, 3, 3, 3, 3, 1, 0, 2, 0, 0, 3, 3, 3, 3, 

val_output: [2, 3, 3, 1, 3, 0, 3, 2, 4, 1, 3, 2, 4, 3, 3, 1, 4, 4, 3, 3, 4, 4, 2, 4, 4, 3, 1, 1, 2, 3, 3, 1, 2, 3, 1, 1, 3, 2, 4, 2, 1, 4, 4, 4, 2, 3, 2, 3, 3, 2]
 val_label: tensor([2, 3, 0, 0, 0, 0, 3, 2, 4, 1, 2, 3, 4, 1, 3, 1, 2, 3, 1, 3, 3, 4, 1, 3,
        4, 1, 1, 1, 2, 2, 4, 1, 2, 3, 0, 2, 3, 0, 1, 0, 1, 2, 1, 4, 2, 4, 0, 3,
        3, 2], device='cuda:0')
10.0
tensor(1.1585, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 1, 2, 4, 4, 2, 4, 2, 3, 2, 2, 0, 4, 4, 0, 4, 3, 1, 1, 2, 3, 4, 4, 3, 1, 3, 2, 3, 4, 3, 3, 0, 2, 3, 4, 4, 3, 0, 0, 4, 3, 2, 3, 3, 2, 1, 3, 2, 0, 0]
 val_label: tensor([4, 1, 4, 4, 4, 2, 4, 1, 3, 0, 4, 0, 4, 4, 0, 3, 3, 3, 3, 2, 3, 1, 3, 4,
        3, 4, 2, 4, 0, 3, 2, 0, 2, 2, 4, 4, 0, 3, 0, 4, 2, 2, 4, 1, 4, 1, 3, 2,
        0, 0], device='cuda:0')
10.0
tensor(1.1580, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 4, 2, 2, 3, 2, 3, 1, 4, 4, 3, 1, 1, 3, 1, 3, 0, 4, 1, 3, 0, 4, 2, 4, 2, 4, 3, 3, 2, 1, 3, 0, 3, 1, 0, 3, 2, 2, 3, 2, 2, 

val_output: [2, 2, 2, 1, 1, 4, 2, 2, 3, 1, 3, 0, 2, 3, 2, 3, 4, 1, 3, 1, 2, 3, 2, 3, 3, 2, 2, 0, 4, 2, 3, 3, 2, 3, 4, 3, 2, 4, 2, 3, 1, 1, 4, 3, 1, 3, 1, 3, 3, 3]
 val_label: tensor([2, 2, 2, 1, 0, 4, 3, 4, 2, 1, 3, 0, 2, 1, 1, 2, 4, 1, 3, 3, 2, 1, 4, 2,
        2, 2, 0, 0, 1, 4, 3, 1, 2, 0, 4, 4, 2, 3, 3, 1, 2, 1, 3, 3, 3, 3, 0, 3,
        3, 3], device='cuda:0')
10.0
tensor(1.2058, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 3, 3, 3, 3, 4, 4, 1, 1, 4, 2, 2, 4, 4, 0, 0, 4, 1, 0, 1, 2, 2, 4, 3, 4, 4, 3, 0, 3, 4, 1, 1, 2, 2, 4, 1, 4, 4, 2, 2, 3, 4, 3, 1, 4, 3, 4, 3, 3]
 val_label: tensor([1, 3, 2, 3, 0, 0, 4, 3, 1, 3, 3, 1, 0, 4, 4, 0, 0, 3, 1, 0, 1, 1, 0, 4,
        3, 4, 4, 1, 3, 3, 3, 3, 1, 2, 3, 4, 1, 4, 3, 3, 3, 3, 4, 3, 1, 2, 3, 4,
        3, 4], device='cuda:0')
10.0
tensor(1.0518, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 2, 4, 3, 4, 4, 4, 3, 3, 2, 1, 3, 0, 0, 4, 3, 3, 4, 3, 1, 0, 1, 3, 3, 4, 4, 1, 2, 3, 1, 1, 4, 4, 0, 3, 2, 3, 3, 4, 2, 4, 

val_output: [2, 4, 2, 3, 3, 1, 4, 4, 2, 1, 4, 4, 3, 0, 3, 3, 2, 0, 4, 3, 2, 2, 3, 2, 4, 1, 3, 3, 2, 2, 3, 3, 3, 3, 3, 2, 3, 3, 4, 2, 2, 0, 3, 1, 0, 2, 3, 3, 1, 4]
 val_label: tensor([2, 4, 3, 0, 3, 1, 2, 4, 4, 3, 1, 3, 3, 0, 3, 3, 2, 0, 4, 3, 2, 2, 3, 2,
        1, 1, 4, 3, 2, 0, 3, 3, 3, 3, 2, 2, 3, 3, 4, 2, 2, 0, 3, 1, 0, 4, 3, 3,
        1, 4], device='cuda:0')
10.0
tensor(0.8654, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 3, 1, 4, 1, 4, 3, 0, 2, 3, 3, 3, 2, 1, 0, 3, 3, 2, 2, 2, 4, 1, 2, 3, 4, 1, 4, 1, 3, 3, 2, 3, 3, 2, 1, 0, 3, 1, 2, 3, 3, 2, 2, 3, 4, 2, 3, 2, 3, 0]
 val_label: tensor([2, 4, 1, 3, 3, 1, 3, 1, 4, 1, 3, 3, 3, 1, 1, 1, 0, 2, 2, 2, 4, 2, 2, 1,
        4, 1, 0, 1, 4, 1, 3, 2, 3, 2, 1, 0, 3, 3, 4, 1, 1, 0, 2, 3, 1, 2, 3, 3,
        3, 0], device='cuda:0')
10.0
tensor(1.1563, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 3, 3, 1, 1, 3, 3, 2, 3, 1, 2, 0, 4, 4, 3, 4, 4, 1, 1, 3, 1, 3, 3, 4, 1, 3, 3, 0, 2, 3, 3, 4, 4, 4, 0, 3, 3, 1, 3, 3, 4, 

val_output: [1, 2, 2, 3, 2, 4, 3, 4, 4, 4, 4, 1, 2, 4, 4, 4, 3, 2, 1, 4, 4, 4, 4, 2, 2, 0, 4, 3, 3, 3, 3, 4, 3, 3, 4, 3, 3, 4, 2, 4, 4, 3, 2, 1, 3, 4, 1, 0, 1, 3]
 val_label: tensor([1, 2, 3, 4, 2, 0, 0, 4, 3, 1, 4, 3, 2, 4, 4, 0, 3, 2, 1, 4, 4, 3, 4, 2,
        3, 0, 4, 2, 1, 4, 1, 4, 3, 4, 4, 4, 0, 4, 2, 3, 4, 1, 2, 1, 3, 4, 0, 3,
        0, 3], device='cuda:0')
10.0
tensor(0.9854, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 4, 3, 3, 2, 3, 1, 4, 3, 2, 3, 3, 4, 4, 4, 2, 4, 0, 4, 1, 1, 3, 2, 3, 3, 4, 2, 4, 3, 3, 2, 4, 3, 2, 4, 2, 4, 2, 3, 1, 4, 3, 2, 4, 2, 3, 0, 4, 2, 2]
 val_label: tensor([2, 4, 4, 3, 2, 2, 1, 4, 3, 2, 3, 3, 2, 4, 1, 0, 4, 0, 1, 0, 1, 4, 0, 3,
        3, 4, 2, 3, 1, 3, 1, 2, 3, 2, 4, 0, 4, 2, 3, 2, 0, 3, 1, 4, 2, 3, 0, 3,
        3, 2], device='cuda:0')
10.0
tensor(1.1460, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 4, 4, 4, 3, 3, 0, 2, 2, 4, 2, 1, 3, 2, 0, 1, 0, 3, 4, 3, 2, 4, 2, 4, 4, 3, 1, 3, 4, 3, 1, 4, 3, 3, 2, 3, 1, 2, 2, 3, 1, 

tensor(0.9883, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 1, 4, 0, 3, 3, 0, 4, 4, 2, 0, 3, 4, 2, 1, 3, 4, 4, 3, 2, 0, 4, 4, 3, 4, 0, 4, 3, 2, 4, 4, 2, 0, 1, 0, 2, 4, 2, 1, 4, 4, 1, 2, 2, 2, 4, 3, 3, 3, 3]
 val_label: tensor([4, 1, 3, 0, 3, 4, 3, 3, 4, 2, 0, 3, 4, 3, 0, 3, 4, 4, 4, 3, 0, 4, 4, 3,
        4, 0, 4, 4, 2, 4, 0, 3, 0, 1, 0, 0, 0, 2, 1, 3, 3, 1, 1, 1, 3, 4, 2, 4,
        2, 3], device='cuda:0')
10.0
tensor(1.1974, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 0, 0, 3, 1, 3, 3, 3, 4, 2, 2, 0, 3, 4, 3, 3, 4, 4, 3, 3, 4, 3, 2, 1, 3, 3, 2, 3, 0, 3, 0, 3, 3, 4, 2, 1, 4, 4, 1, 1, 2, 1, 2, 1, 4, 3, 4, 3, 3, 4]
 val_label: tensor([4, 0, 0, 2, 3, 1, 3, 1, 4, 2, 3, 0, 4, 4, 3, 3, 1, 3, 3, 4, 3, 4, 4, 1,
        3, 2, 4, 4, 0, 3, 0, 4, 3, 4, 4, 1, 2, 1, 0, 1, 2, 1, 2, 2, 4, 3, 4, 3,
        2, 4], device='cuda:0')
10.0
tensor(1.1184, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 4, 3, 4, 3, 2, 1, 4, 3, 1, 4, 0, 4, 3, 3, 3, 1, 4, 3, 3, 4, 

val_output: [3, 3, 4, 3, 4, 1, 3, 4, 2, 4, 3, 3, 2, 4, 4, 4, 2, 3, 2, 4, 4, 4, 0, 2, 4, 4, 4, 4, 3, 4, 3, 3, 3, 2, 4, 3, 2, 4, 2, 2, 2, 4, 1, 1, 2, 4, 3, 4, 3, 3]
 val_label: tensor([4, 3, 4, 1, 3, 2, 3, 3, 2, 4, 0, 3, 3, 0, 4, 4, 0, 4, 2, 2, 4, 4, 0, 0,
        4, 4, 0, 3, 3, 4, 3, 1, 3, 2, 0, 3, 0, 1, 2, 2, 2, 4, 1, 1, 2, 4, 2, 4,
        2, 3], device='cuda:0')
10.0
tensor(1.1978, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 4, 1, 4, 4, 4, 3, 1, 4, 3, 3, 4, 3, 3, 3, 1, 1, 3, 2, 3, 2, 3, 3, 3, 2, 2, 4, 4, 2, 4, 4, 0, 3, 0, 4, 0, 4, 4, 0, 3, 2, 3, 1, 4, 4, 1, 3, 3, 4, 3]
 val_label: tensor([2, 2, 2, 2, 4, 4, 3, 1, 1, 2, 0, 4, 0, 3, 2, 1, 1, 4, 2, 1, 3, 3, 0, 3,
        3, 1, 4, 0, 2, 3, 4, 0, 1, 0, 3, 3, 3, 3, 1, 2, 2, 3, 1, 1, 4, 0, 2, 3,
        4, 3], device='cuda:0')
10.0
tensor(1.3668, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 0, 2, 3, 3, 4, 3, 3, 1, 4, 1, 4, 4, 0, 4, 2, 0, 4, 3, 3, 1, 4, 2, 4, 0, 3, 4, 1, 2, 2, 4, 3, 0, 3, 3, 3, 1, 4, 3, 3, 0, 

val_output: [2, 1, 3, 3, 1, 3, 4, 3, 1, 3, 3, 0, 0, 2, 3, 1, 2, 2, 4, 2, 2, 0, 3, 1, 3, 0, 3, 1, 2, 0, 4, 0, 3, 2, 3, 3, 2, 1, 4, 2, 2, 2, 3, 2, 1, 1, 3, 4, 4, 3]
 val_label: tensor([2, 2, 3, 3, 1, 3, 4, 2, 0, 3, 3, 0, 3, 2, 2, 1, 2, 2, 4, 3, 2, 0, 3, 1,
        3, 0, 3, 1, 3, 0, 4, 0, 3, 4, 3, 3, 3, 1, 0, 2, 2, 2, 3, 3, 1, 1, 2, 2,
        4, 3], device='cuda:0')
10.0
tensor(0.7966, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 1, 1, 3, 3, 3, 3, 3, 2, 2, 3, 1, 2, 4, 3, 2, 3, 4, 4, 0, 3, 2, 3, 2, 2, 2, 4, 4, 3, 2, 2, 3, 2, 2, 1, 3, 4, 2, 4, 3, 3, 3, 2, 3, 4, 4, 4, 3, 2, 3]
 val_label: tensor([1, 1, 2, 3, 1, 3, 4, 0, 2, 2, 3, 1, 2, 4, 3, 2, 3, 0, 0, 0, 3, 0, 3, 2,
        2, 3, 1, 4, 4, 2, 1, 3, 2, 2, 1, 3, 4, 2, 4, 3, 0, 3, 2, 1, 0, 3, 1, 3,
        2, 3], device='cuda:0')
10.0
tensor(0.9577, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 0, 1, 2, 3, 3, 2, 4, 2, 2, 2, 2, 3, 3, 0, 2, 4, 2, 3, 2, 4, 4, 2, 2, 3, 3, 2, 3, 0, 0, 3, 4, 2, 3, 1, 3, 3, 1, 4, 0, 4, 

tensor(1.1409, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 0, 4, 3, 3, 1, 1, 4, 3, 3, 1, 2, 4, 2, 2, 4, 4, 3, 2, 3, 3, 2, 0, 4, 4, 2, 3, 2, 3, 2, 3, 1, 4, 1, 3, 2, 3, 1, 4, 2, 3, 2, 1, 4, 2, 2, 0, 2, 3, 3]
 val_label: tensor([1, 0, 0, 2, 3, 1, 1, 3, 3, 2, 1, 2, 0, 2, 3, 4, 2, 2, 3, 1, 0, 1, 0, 3,
        4, 1, 3, 0, 0, 0, 1, 1, 3, 0, 3, 2, 3, 0, 4, 1, 3, 2, 3, 3, 3, 2, 0, 2,
        0, 3], device='cuda:0')
10.0
tensor(1.2389, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 4, 4, 3, 2, 2, 4, 2, 1, 3, 4, 2, 1, 3, 1, 4, 2, 0, 4, 0, 1, 3, 1, 3, 1, 1, 3, 4, 2, 4, 3, 3, 2, 3, 1, 3, 1, 2, 3, 3, 4, 1, 3, 4, 4, 4, 3, 3, 3, 4]
 val_label: tensor([0, 2, 4, 3, 2, 2, 4, 2, 1, 2, 4, 1, 1, 1, 1, 3, 4, 0, 3, 0, 1, 2, 0, 4,
        0, 4, 1, 4, 2, 4, 4, 2, 4, 1, 1, 1, 1, 4, 3, 4, 3, 1, 3, 0, 3, 3, 1, 3,
        3, 4], device='cuda:0')
10.0
tensor(1.2693, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 4, 3, 2, 3, 2, 4, 4, 3, 3, 2, 2, 1, 4, 3, 2, 3, 3, 4, 2, 

val_output: [4, 1, 4, 3, 3, 3, 4, 3, 3, 1, 4, 3, 4, 4, 3, 4, 3, 4, 3, 4, 3, 2, 4, 2, 4, 2, 4, 4, 2, 3, 3, 3, 3, 3, 3, 1, 1, 0, 2, 4, 2, 0, 1, 2, 2, 3, 4, 3, 4, 4]
 val_label: tensor([4, 1, 4, 3, 3, 4, 3, 1, 1, 2, 3, 3, 4, 4, 4, 4, 2, 3, 3, 4, 4, 2, 4, 2,
        3, 2, 4, 4, 2, 3, 3, 3, 3, 2, 3, 1, 1, 0, 2, 4, 3, 2, 0, 2, 2, 3, 4, 3,
        4, 4], device='cuda:0')
10.0
tensor(0.9345, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [0, 1, 3, 2, 1, 4, 3, 1, 3, 4, 3, 3, 1, 1, 1, 2, 4, 2, 2, 2, 1, 3, 2, 2, 1, 3, 4, 2, 2, 4, 3, 4, 1, 4, 2, 2, 3, 2, 1, 3, 4, 3, 1, 1, 4, 2, 1, 2, 4, 3]
 val_label: tensor([3, 1, 3, 1, 3, 4, 2, 1, 4, 3, 0, 3, 2, 1, 2, 2, 4, 2, 3, 2, 1, 3, 4, 2,
        1, 3, 0, 2, 2, 4, 3, 4, 2, 3, 2, 2, 2, 1, 3, 3, 0, 0, 1, 1, 4, 3, 1, 2,
        3, 1], device='cuda:0')
10.0
tensor(1.0130, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 3, 2, 3, 3, 3, 3, 4, 3, 3, 3, 3, 2, 2, 4, 4, 2, 3, 4, 1, 3, 1, 3, 4, 4, 1, 2, 0, 2, 3, 2, 3, 3, 3, 4, 4, 2, 3, 3, 3, 2, 

val_output: [1, 4, 4, 4, 3, 2, 4, 3, 4, 4, 3, 3, 4, 4, 2, 3, 4, 4, 3, 2, 4, 4, 1, 1, 2, 4, 3, 2, 1, 3, 3, 3, 4, 3, 0, 2, 3, 3, 4, 3, 4, 3, 3, 3, 4, 2, 3, 3, 3, 4]
 val_label: tensor([3, 4, 4, 4, 0, 2, 4, 2, 4, 4, 3, 3, 3, 2, 3, 0, 3, 0, 3, 3, 4, 4, 1, 1,
        4, 4, 4, 2, 1, 0, 1, 4, 2, 0, 0, 4, 0, 3, 4, 3, 4, 3, 3, 3, 4, 2, 3, 4,
        3, 3], device='cuda:0')
10.0
tensor(1.0568, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 0, 3, 1, 2, 2, 3, 2, 4, 3, 2, 2, 3, 3, 4, 1, 3, 4, 3, 1, 4, 4, 4, 3, 3, 1, 1, 4, 3, 0, 3, 0, 4, 4, 3, 3, 2, 2, 4, 4, 0, 3, 4, 3, 2, 4, 1, 4, 3, 4]
 val_label: tensor([1, 1, 3, 3, 2, 2, 2, 2, 3, 4, 3, 2, 1, 3, 2, 1, 4, 2, 2, 1, 1, 3, 4, 4,
        0, 1, 3, 4, 0, 1, 3, 0, 4, 3, 3, 4, 0, 2, 4, 4, 0, 3, 4, 2, 2, 3, 1, 4,
        2, 4], device='cuda:0')
10.0
tensor(1.2017, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 1, 1, 2, 4, 0, 3, 3, 0, 0, 3, 0, 1, 2, 2, 2, 0, 3, 2, 3, 0, 2, 2, 4, 2, 2, 0, 4, 4, 3, 3, 4, 3, 4, 2, 2, 2, 2, 1, 2, 3, 

val_output: [1, 2, 4, 2, 3, 4, 4, 4, 4, 3, 4, 4, 4, 4, 0, 4, 4, 3, 2, 3, 1, 0, 3, 4, 1, 3, 4, 2, 4, 3, 1, 4, 4, 4, 2, 2, 4, 2, 3, 2, 3, 1, 0, 4, 2, 3, 0, 3, 4, 4]
 val_label: tensor([1, 2, 4, 2, 0, 3, 4, 0, 0, 1, 0, 2, 4, 4, 2, 4, 4, 2, 2, 3, 0, 0, 2, 4,
        1, 3, 0, 4, 4, 3, 0, 1, 3, 3, 3, 2, 1, 2, 3, 2, 3, 1, 0, 3, 3, 2, 0, 3,
        4, 4], device='cuda:0')
10.0
tensor(1.1652, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 3, 2, 2, 2, 1, 4, 4, 4, 4, 3, 3, 3, 4, 3, 3, 2, 2, 3, 3, 4, 3, 4, 2, 3, 1, 3, 2, 2, 4, 4, 3, 1, 0, 2, 0, 3, 3, 4, 4, 4, 3, 1, 3, 3, 0, 4, 3, 0, 1]
 val_label: tensor([4, 3, 1, 2, 2, 0, 4, 3, 3, 4, 3, 3, 2, 4, 3, 0, 2, 3, 3, 3, 3, 3, 3, 4,
        3, 1, 3, 2, 2, 4, 4, 3, 1, 0, 0, 0, 3, 3, 4, 3, 2, 3, 1, 1, 4, 0, 3, 1,
        0, 1], device='cuda:0')
10.0
tensor(0.9264, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 1, 2, 2, 2, 1, 3, 3, 4, 4, 2, 3, 4, 1, 2, 3, 2, 4, 3, 3, 2, 3, 4, 3, 3, 2, 2, 2, 1, 1, 1, 2, 3, 1, 3, 3, 4, 4, 1, 1, 4, 

val_output: [4, 4, 2, 4, 4, 0, 1, 3, 3, 4, 3, 3, 4, 1, 1, 2, 0, 4, 0, 3, 3, 1, 3, 2, 4, 4, 1, 1, 4, 2, 3, 0, 1, 3, 1, 2, 3, 2, 4, 4, 4, 1, 1, 3, 3, 2, 2, 2, 0, 4]
 val_label: tensor([4, 4, 2, 4, 4, 0, 3, 3, 1, 2, 3, 3, 4, 1, 1, 0, 0, 4, 0, 2, 3, 0, 4, 4,
        4, 4, 0, 1, 3, 2, 4, 0, 1, 3, 1, 2, 3, 2, 4, 3, 4, 1, 1, 1, 1, 1, 2, 3,
        0, 4], device='cuda:0')
10.0
tensor(0.8033, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 1, 3, 3, 3, 3, 2, 1, 3, 3, 0, 0, 3, 2, 2, 1, 1, 3, 2, 3, 1, 3, 2, 2, 2, 3, 2, 1, 1, 2, 2, 4, 3, 3, 2, 2, 0, 1, 2, 1, 2, 3, 1, 3, 4, 2, 2, 1, 3, 4]
 val_label: tensor([2, 1, 3, 0, 3, 1, 0, 1, 3, 3, 0, 0, 3, 4, 0, 0, 1, 3, 3, 2, 0, 3, 2, 2,
        2, 3, 2, 1, 3, 2, 2, 2, 1, 3, 4, 3, 0, 1, 2, 0, 2, 4, 1, 3, 4, 3, 3, 1,
        3, 0], device='cuda:0')
10.0
tensor(1.0551, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 4, 2, 1, 3, 4, 2, 4, 0, 1, 3, 2, 4, 2, 3, 3, 1, 4, 0, 1, 3, 4, 4, 4, 2, 1, 3, 2, 4, 4, 4, 3, 2, 2, 4, 1, 3, 4, 4, 2, 2, 

val_output: [3, 3, 3, 4, 2, 1, 3, 4, 3, 3, 3, 3, 2, 1, 4, 2, 2, 3, 0, 4, 1, 4, 3, 4, 0, 4, 2, 3, 2, 3, 2, 0, 3, 1, 3, 2, 4, 2, 4, 4, 4, 0, 4, 1, 3, 1, 2, 2, 2, 1]
 val_label: tensor([1, 0, 3, 4, 2, 0, 3, 2, 3, 3, 3, 1, 3, 3, 4, 3, 2, 3, 0, 4, 1, 4, 3, 0,
        0, 4, 2, 3, 3, 2, 2, 4, 1, 0, 2, 2, 4, 0, 4, 4, 4, 0, 3, 1, 0, 1, 1, 3,
        2, 1], device='cuda:0')
10.0
tensor(1.0220, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 4, 0, 3, 3, 3, 2, 4, 3, 1, 4, 2, 2, 2, 2, 1, 2, 2, 2, 2, 1, 2, 3, 4, 3, 0, 3, 2, 2, 4, 1, 3, 2, 0, 3, 1, 2, 4, 2, 2, 3, 3, 2, 3, 4, 1, 3, 4, 2, 3]
 val_label: tensor([1, 4, 0, 4, 2, 1, 3, 4, 0, 3, 0, 1, 2, 2, 2, 1, 4, 2, 3, 3, 1, 0, 2, 2,
        1, 0, 3, 3, 4, 4, 0, 1, 2, 0, 2, 1, 3, 4, 1, 2, 3, 4, 4, 3, 3, 1, 3, 3,
        4, 3], device='cuda:0')
10.0
tensor(1.1513, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 4, 4, 4, 3, 3, 3, 3, 3, 4, 3, 4, 1, 2, 1, 2, 1, 4, 3, 4, 2, 3, 0, 1, 3, 3, 3, 4, 3, 3, 1, 2, 0, 4, 4, 4, 2, 2, 3, 2, 

val_output: [3, 2, 4, 2, 4, 2, 2, 1, 4, 3, 3, 2, 1, 3, 3, 3, 2, 4, 0, 0, 0, 3, 2, 2, 2, 1, 2, 3, 4, 3, 0, 2, 2, 2, 4, 3, 4, 2, 2, 4, 3, 0, 1, 3, 3, 3, 2, 4, 3, 3]
 val_label: tensor([2, 2, 4, 2, 4, 0, 3, 1, 0, 3, 1, 3, 0, 3, 2, 3, 2, 4, 0, 0, 0, 4, 2, 4,
        2, 1, 0, 1, 4, 4, 3, 2, 3, 2, 4, 2, 3, 4, 0, 4, 3, 0, 1, 3, 1, 3, 2, 4,
        3, 3], device='cuda:0')
10.0
tensor(0.9887, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 4, 3, 2, 0, 1, 3, 3, 4, 3, 0, 3, 2, 3, 2, 3, 2, 0, 4, 2, 3, 1, 3, 4, 3, 4, 3, 2, 2, 2, 3, 3, 3, 3, 3, 0, 3, 3, 1, 2, 3, 4, 3, 4, 4, 3, 1, 3, 3, 4]
 val_label: tensor([4, 3, 1, 2, 0, 1, 1, 4, 4, 0, 3, 3, 2, 4, 2, 3, 2, 0, 0, 2, 1, 0, 4, 0,
        3, 4, 4, 3, 2, 1, 3, 1, 3, 3, 2, 0, 2, 1, 0, 2, 3, 3, 3, 4, 2, 1, 2, 3,
        4, 2], device='cuda:0')
10.0
tensor(1.1509, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 0, 0, 4, 3, 3, 4, 4, 3, 2, 4, 4, 3, 3, 2, 4, 2, 0, 3, 4, 0, 4, 3, 1, 4, 3, 3, 2, 4, 3, 3, 0, 0, 1, 2, 2, 2, 4, 4, 1, 2, 

val_output: [0, 1, 2, 2, 3, 2, 3, 3, 4, 4, 2, 2, 2, 2, 0, 2, 3, 0, 3, 4, 3, 3, 0, 3, 2, 4, 2, 2, 3, 4, 3, 1, 3, 4, 4, 1, 2, 3, 2, 0, 3, 3, 4, 2, 3, 4, 3, 1, 1, 3]
 val_label: tensor([0, 1, 3, 3, 1, 3, 2, 3, 3, 3, 2, 3, 2, 2, 0, 2, 3, 0, 3, 4, 0, 4, 0, 1,
        3, 4, 4, 2, 2, 4, 2, 0, 3, 4, 3, 0, 2, 3, 0, 2, 0, 3, 4, 3, 4, 4, 0, 3,
        3, 3], device='cuda:0')
10.0
tensor(1.0371, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 4, 4, 2, 0, 2, 3, 3, 0, 4, 4, 3, 0, 3, 3, 0, 4, 2, 4, 4, 2, 4, 4, 4, 2, 2, 4, 3, 1, 3, 2, 1, 1, 1, 3, 3, 3, 2, 3, 4, 4, 4, 3, 1, 3, 0, 1, 3, 1, 3]
 val_label: tensor([4, 4, 4, 2, 0, 4, 2, 1, 0, 4, 4, 3, 0, 3, 0, 0, 4, 1, 0, 4, 2, 4, 2, 4,
        2, 2, 4, 1, 0, 4, 2, 3, 3, 1, 3, 3, 0, 2, 3, 4, 4, 4, 2, 1, 2, 0, 1, 2,
        3, 3], device='cuda:0')
10.0
tensor(0.8061, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 2, 1, 2, 2, 2, 1, 0, 4, 2, 2, 4, 2, 3, 3, 1, 2, 4, 2, 1, 3, 3, 0, 4, 3, 3, 1, 3, 3, 2, 3, 3, 1, 4, 1, 4, 0, 1, 3, 2, 3, 

val_output: [2, 1, 4, 3, 3, 4, 4, 2, 2, 0, 3, 2, 3, 0, 3, 3, 3, 4, 3, 3, 3, 3, 4, 2, 0, 4, 3, 3, 2, 3, 4, 4, 4, 4, 4, 2, 4, 4, 2, 3, 4, 3, 3, 3, 1, 3, 2, 2, 0, 4]
 val_label: tensor([4, 1, 4, 3, 3, 4, 1, 2, 0, 0, 3, 3, 1, 0, 1, 3, 3, 4, 3, 1, 2, 1, 4, 3,
        3, 4, 0, 2, 2, 3, 4, 3, 0, 3, 4, 2, 0, 0, 3, 1, 3, 3, 3, 3, 1, 4, 2, 2,
        0, 4], device='cuda:0')
10.0
tensor(1.2985, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 3, 4, 0, 3, 1, 3, 2, 0, 2, 4, 3, 3, 2, 4, 4, 1, 1, 3, 1, 2, 3, 3, 2, 2, 2, 2, 3, 1, 3, 2, 2, 4, 2, 4, 4, 3, 3, 2, 2, 1, 1, 2, 4, 1, 2, 3, 3, 3]
 val_label: tensor([0, 1, 2, 2, 0, 1, 0, 0, 2, 0, 2, 0, 3, 1, 2, 4, 4, 1, 1, 3, 1, 3, 1, 3,
        2, 2, 2, 3, 4, 3, 3, 1, 2, 3, 2, 4, 0, 3, 3, 3, 2, 1, 1, 2, 4, 1, 2, 3,
        2, 2], device='cuda:0')
10.0
tensor(0.9772, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 2, 4, 3, 1, 0, 4, 2, 3, 3, 3, 4, 3, 3, 3, 3, 1, 3, 2, 2, 3, 3, 3, 2, 4, 3, 0, 4, 3, 4, 4, 2, 2, 3, 4, 1, 0, 3, 2, 3, 3, 

tensor(1.0429, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 3, 1, 4, 1, 2, 2, 2, 3, 4, 4, 2, 4, 3, 2, 2, 4, 4, 3, 1, 3, 2, 2, 3, 4, 2, 1, 2, 2, 4, 1, 1, 2, 1, 1, 3, 3, 2, 2, 2, 4, 3, 3, 0, 3, 1, 1, 2, 2]
 val_label: tensor([1, 3, 1, 1, 4, 3, 2, 1, 2, 2, 0, 4, 3, 4, 1, 3, 1, 1, 0, 3, 1, 1, 4, 1,
        1, 4, 2, 1, 3, 2, 3, 2, 1, 2, 1, 1, 2, 2, 4, 2, 2, 0, 3, 2, 0, 3, 0, 2,
        0, 2], device='cuda:0')
10.0
tensor(1.2317, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 4, 3, 4, 2, 3, 4, 1, 2, 2, 3, 3, 4, 3, 3, 3, 2, 4, 4, 2, 3, 1, 2, 3, 1, 0, 3, 4, 1, 2, 4, 2, 3, 4, 2, 3, 3, 2, 3, 2, 3, 1, 3, 2, 3, 0, 3, 2, 3, 3]
 val_label: tensor([1, 0, 3, 4, 3, 3, 4, 1, 2, 2, 3, 1, 0, 3, 0, 2, 2, 0, 0, 2, 4, 1, 3, 3,
        2, 0, 1, 2, 3, 2, 4, 2, 2, 3, 2, 3, 3, 4, 2, 2, 4, 1, 4, 4, 1, 0, 3, 2,
        3, 2], device='cuda:0')
10.0
tensor(1.0389, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 2, 3, 1, 2, 1, 1, 1, 3, 2, 3, 1, 0, 4, 2, 4, 3, 3, 3, 3, 

val_output: [1, 4, 2, 2, 4, 3, 1, 3, 3, 1, 3, 3, 3, 2, 0, 2, 4, 4, 2, 3, 3, 4, 3, 3, 1, 4, 4, 4, 3, 2, 4, 2, 2, 4, 3, 1, 0, 3, 3, 4, 3, 3, 4, 3, 0, 3, 3, 3, 4, 3]
 val_label: tensor([0, 4, 2, 1, 0, 0, 1, 1, 1, 1, 3, 3, 2, 4, 0, 2, 3, 4, 2, 3, 1, 4, 3, 2,
        1, 4, 4, 3, 3, 2, 3, 2, 2, 0, 2, 1, 0, 0, 2, 4, 0, 2, 4, 0, 0, 2, 3, 3,
        1, 3], device='cuda:0')
10.0
tensor(1.0885, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 3, 2, 3, 3, 3, 4, 0, 1, 2, 1, 1, 2, 2, 3, 2, 4, 0, 4, 1, 1, 1, 2, 3, 1, 4, 2, 0, 4, 4, 3, 2, 3, 3, 1, 3, 3, 4, 2, 3, 3, 0, 4, 3, 1, 1, 3, 4, 2, 3]
 val_label: tensor([1, 3, 1, 3, 3, 3, 4, 3, 2, 2, 1, 1, 2, 4, 1, 2, 4, 0, 3, 1, 2, 1, 2, 4,
        1, 2, 3, 0, 4, 4, 3, 2, 1, 3, 0, 1, 1, 4, 2, 3, 3, 0, 2, 3, 1, 1, 0, 4,
        0, 3], device='cuda:0')
10.0
tensor(1.0032, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 4, 3, 3, 3, 3, 0, 4, 1, 2, 1, 1, 1, 3, 2, 1, 3, 4, 4, 3, 0, 1, 4, 4, 1, 1, 3, 0, 3, 4, 2, 4, 3, 4, 3, 2, 2, 3, 3, 4, 1, 

val_output: [3, 1, 3, 2, 1, 1, 0, 3, 0, 3, 1, 0, 3, 4, 1, 3, 3, 3, 3, 2, 0, 3, 1, 4, 4, 3, 4, 1, 3, 1, 4, 3, 1, 1, 3, 0, 3, 2, 2, 1, 4, 4, 3, 3, 2, 0, 0, 1, 4, 3]
 val_label: tensor([3, 3, 0, 3, 2, 3, 0, 2, 0, 0, 1, 0, 3, 4, 2, 1, 1, 2, 4, 2, 1, 3, 1, 4,
        4, 3, 4, 1, 1, 1, 4, 0, 0, 3, 1, 0, 3, 2, 0, 1, 3, 4, 4, 3, 2, 0, 0, 1,
        4, 4], device='cuda:0')
10.0
tensor(1.1066, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 3, 2, 4, 2, 0, 2, 3, 2, 4, 4, 1, 2, 3, 4, 3, 4, 3, 0, 3, 4, 1, 4, 1, 2, 4, 2, 3, 1, 4, 3, 4, 3, 4, 1, 3, 3, 2, 3, 1, 1, 3, 3, 3, 0, 1, 3, 4, 0]
 val_label: tensor([3, 2, 3, 3, 4, 2, 0, 3, 3, 2, 3, 4, 1, 2, 1, 3, 3, 4, 4, 3, 2, 2, 1, 3,
        1, 3, 0, 0, 1, 1, 0, 4, 4, 3, 4, 1, 3, 3, 2, 4, 2, 3, 3, 3, 1, 0, 0, 1,
        4, 3], device='cuda:0')
10.0
tensor(0.9637, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 3, 1, 3, 3, 3, 2, 1, 2, 3, 3, 2, 2, 4, 3, 3, 3, 4, 2, 4, 4, 1, 1, 3, 1, 4, 2, 0, 2, 1, 3, 3, 4, 2, 1, 4, 1, 3, 1, 2, 

tensor(1.0544, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 4, 3, 4, 4, 2, 4, 3, 1, 3, 3, 4, 1, 4, 4, 4, 4, 2, 3, 4, 2, 2, 2, 1, 3, 2, 2, 4, 4, 3, 2, 3, 2, 2, 2, 0, 3, 4, 1, 3, 2, 1, 2, 1, 1, 3, 1, 4, 4, 3]
 val_label: tensor([4, 4, 3, 4, 4, 2, 4, 2, 1, 3, 0, 4, 3, 2, 3, 4, 4, 2, 3, 2, 2, 4, 2, 1,
        3, 2, 2, 0, 3, 3, 2, 3, 3, 2, 2, 0, 3, 4, 1, 3, 2, 3, 2, 2, 1, 0, 1, 0,
        4, 1], device='cuda:0')
10.0
tensor(0.9793, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 0, 3, 3, 3, 3, 2, 4, 2, 4, 3, 3, 3, 2, 3, 1, 3, 3, 2, 2, 1, 4, 2, 3, 4, 3, 0, 1, 4, 4, 4, 4, 3, 4, 0, 2, 1, 2, 1, 4, 3, 1, 0, 3, 2, 4, 4, 3, 0, 4]
 val_label: tensor([0, 0, 0, 3, 1, 1, 2, 2, 2, 4, 0, 3, 4, 0, 3, 1, 0, 3, 2, 2, 3, 4, 3, 1,
        4, 2, 0, 0, 4, 4, 4, 3, 3, 4, 0, 2, 1, 2, 1, 3, 3, 1, 0, 2, 1, 4, 4, 2,
        1, 4], device='cuda:0')
10.0
tensor(1.0384, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 0, 2, 3, 4, 3, 4, 3, 3, 3, 1, 0, 0, 2, 3, 4, 3, 2, 4, 1, 

val_output: [0, 3, 4, 1, 4, 4, 0, 4, 4, 1, 3, 2, 3, 0, 1, 2, 3, 1, 3, 4, 3, 3, 2, 3, 3, 0, 4, 2, 4, 4, 2, 4, 2, 4, 2, 1, 2, 0, 3, 2, 4, 4, 4, 4, 3, 4, 3, 3, 3, 3]
 val_label: tensor([1, 3, 3, 1, 3, 4, 0, 4, 4, 2, 4, 0, 4, 0, 1, 1, 1, 3, 3, 0, 2, 0, 3, 4,
        3, 0, 4, 2, 4, 3, 2, 4, 4, 3, 1, 1, 1, 0, 3, 3, 3, 4, 4, 4, 3, 4, 3, 3,
        3, 3], device='cuda:0')
10.0
tensor(1.0999, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 1, 4, 1, 3, 2, 4, 1, 3, 3, 3, 4, 2, 4, 4, 4, 3, 3, 4, 3, 3, 4, 0, 3, 4, 2, 3, 2, 4, 3, 4, 3, 0, 2, 3, 3, 3, 3, 1, 4, 0, 3, 0, 0, 1, 2, 0, 3, 3, 2]
 val_label: tensor([2, 0, 3, 1, 3, 0, 2, 1, 3, 3, 3, 4, 4, 3, 0, 4, 1, 1, 4, 0, 3, 4, 0, 3,
        4, 2, 0, 3, 4, 0, 0, 3, 0, 2, 1, 0, 3, 2, 3, 2, 0, 0, 0, 0, 1, 3, 0, 2,
        3, 2], device='cuda:0')
10.0
tensor(1.0990, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 2, 4, 1, 3, 3, 3, 4, 4, 2, 3, 3, 4, 3, 3, 3, 4, 4, 3, 1, 3, 2, 4, 3, 3, 3, 1, 3, 3, 2, 1, 3, 3, 2, 3, 2, 4, 1, 0, 3, 

val_output: [3, 2, 2, 1, 2, 2, 2, 3, 1, 2, 3, 2, 4, 2, 1, 3, 2, 3, 1, 2, 3, 3, 4, 4, 0, 4, 3, 2, 1, 1, 3, 1, 0, 2, 3, 2, 3, 4, 2, 3, 4, 3, 0, 1, 3, 3, 3, 4, 3, 2]
 val_label: tensor([4, 0, 2, 3, 2, 2, 1, 3, 3, 3, 1, 2, 4, 4, 3, 3, 2, 3, 1, 2, 3, 3, 3, 4,
        0, 4, 3, 4, 3, 0, 3, 0, 0, 2, 4, 2, 0, 4, 1, 4, 4, 3, 0, 1, 3, 3, 1, 4,
        1, 2], device='cuda:0')
10.0
tensor(1.0149, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 1, 3, 3, 2, 1, 3, 3, 3, 1, 1, 2, 1, 0, 4, 2, 1, 3, 1, 3, 1, 1, 0, 3, 2, 4, 1, 3, 4, 4, 1, 4, 3, 0, 4, 3, 3, 1, 0, 3, 2, 3, 4, 3, 4, 1, 4, 0, 1, 4]
 val_label: tensor([4, 0, 3, 4, 3, 1, 3, 2, 3, 4, 1, 2, 4, 1, 4, 3, 1, 0, 1, 1, 3, 1, 0, 1,
        2, 4, 1, 0, 4, 4, 1, 4, 3, 0, 4, 3, 4, 3, 0, 1, 2, 1, 4, 1, 4, 2, 4, 0,
        1, 4], device='cuda:0')
10.0
tensor(1.1034, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 3, 1, 4, 0, 2, 3, 1, 0, 2, 4, 3, 4, 4, 3, 1, 4, 2, 2, 2, 3, 3, 3, 4, 1, 4, 4, 2, 1, 3, 3, 1, 2, 3, 1, 2, 3, 2, 1, 2, 

val_output: [2, 3, 2, 2, 1, 2, 2, 2, 2, 4, 2, 3, 2, 4, 2, 3, 4, 1, 1, 3, 3, 3, 2, 2, 3, 3, 1, 3, 0, 3, 2, 2, 2, 4, 4, 2, 3, 1, 3, 2, 4, 2, 3, 3, 1, 2, 3, 2, 3, 4]
 val_label: tensor([2, 3, 4, 3, 2, 0, 3, 0, 2, 2, 2, 4, 0, 4, 2, 1, 2, 0, 2, 3, 3, 4, 2, 2,
        3, 4, 3, 0, 0, 3, 4, 2, 2, 4, 4, 2, 0, 0, 4, 2, 1, 2, 4, 4, 3, 3, 2, 2,
        0, 4], device='cuda:0')
10.0
tensor(1.2661, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 3, 3, 3, 3, 1, 3, 3, 3, 1, 4, 3, 3, 4, 3, 2, 2, 4, 2, 0, 3, 2, 3, 0, 4, 0, 3, 1, 2, 4, 4, 4, 3, 2, 3, 4, 3, 2, 1, 3, 2, 3, 1, 2, 2, 1, 2, 3, 1, 4]
 val_label: tensor([1, 2, 0, 3, 3, 1, 2, 3, 3, 1, 4, 3, 2, 4, 3, 2, 3, 2, 2, 0, 3, 2, 0, 3,
        4, 0, 2, 0, 2, 3, 4, 4, 3, 0, 1, 2, 3, 2, 1, 3, 2, 3, 1, 2, 2, 0, 1, 3,
        3, 4], device='cuda:0')
10.0
tensor(0.9547, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [0, 4, 1, 3, 2, 4, 2, 2, 2, 3, 2, 0, 1, 3, 4, 1, 1, 0, 4, 3, 3, 4, 1, 1, 2, 0, 2, 2, 4, 4, 2, 2, 4, 3, 2, 3, 3, 3, 1, 3, 1, 

val_output: [4, 2, 4, 1, 4, 3, 2, 2, 3, 3, 4, 0, 3, 3, 4, 2, 3, 4, 3, 3, 3, 2, 3, 4, 3, 2, 3, 4, 1, 2, 1, 4, 1, 1, 4, 3, 2, 4, 3, 2, 3, 4, 4, 1, 2, 3, 1, 1, 0, 0]
 val_label: tensor([1, 2, 4, 1, 3, 1, 2, 3, 3, 3, 4, 0, 3, 3, 3, 2, 3, 3, 2, 3, 3, 3, 0, 2,
        4, 4, 2, 3, 0, 2, 1, 4, 1, 1, 2, 3, 4, 3, 4, 2, 3, 3, 4, 4, 2, 1, 1, 1,
        0, 1], device='cuda:0')
10.0
tensor(1.1356, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [0, 3, 3, 3, 3, 4, 3, 3, 1, 1, 2, 3, 2, 4, 4, 3, 4, 2, 3, 2, 3, 1, 3, 4, 0, 3, 3, 1, 4, 1, 3, 3, 4, 3, 3, 3, 3, 2, 3, 2, 4, 3, 1, 3, 1, 1, 1, 2, 3, 4]
 val_label: tensor([0, 1, 2, 2, 2, 4, 2, 1, 1, 3, 3, 3, 2, 4, 4, 3, 2, 1, 3, 2, 3, 3, 3, 1,
        0, 4, 2, 1, 3, 1, 3, 3, 1, 3, 0, 0, 1, 1, 1, 2, 3, 3, 1, 3, 1, 3, 1, 4,
        3, 4], device='cuda:0')
10.0
tensor(1.2667, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 2, 2, 3, 4, 3, 4, 3, 1, 3, 1, 0, 4, 4, 2, 3, 1, 3, 4, 2, 1, 3, 4, 2, 4, 4, 3, 1, 0, 1, 4, 3, 1, 1, 4, 4, 1, 3, 3, 2, 

val_output: [4, 1, 3, 4, 3, 2, 4, 1, 3, 3, 1, 2, 3, 3, 4, 1, 2, 2, 2, 3, 2, 4, 3, 4, 4, 0, 1, 2, 3, 4, 2, 3, 3, 2, 4, 1, 1, 4, 1, 4, 2, 4, 4, 1, 1, 2, 1, 3, 4, 1]
 val_label: tensor([4, 2, 3, 1, 3, 1, 4, 1, 3, 3, 1, 2, 3, 0, 4, 2, 2, 2, 2, 2, 2, 0, 4, 3,
        2, 2, 1, 2, 3, 4, 2, 2, 3, 2, 4, 1, 0, 4, 1, 4, 0, 4, 3, 2, 0, 4, 1, 0,
        3, 1], device='cuda:0')
10.0
tensor(1.2138, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 4, 4, 0, 3, 2, 3, 4, 2, 4, 1, 0, 0, 1, 2, 3, 3, 4, 2, 1, 2, 1, 4, 4, 2, 3, 3, 3, 2, 3, 1, 1, 4, 3, 3, 3, 1, 2, 2, 0, 1, 2, 2, 1, 1, 0, 3, 2, 1, 3]
 val_label: tensor([4, 4, 4, 0, 1, 4, 3, 4, 1, 4, 0, 0, 0, 1, 3, 3, 0, 4, 2, 1, 2, 1, 0, 4,
        0, 3, 3, 3, 4, 4, 3, 0, 2, 3, 3, 3, 4, 1, 2, 0, 0, 2, 2, 1, 1, 0, 3, 2,
        1, 4], device='cuda:0')
10.0
tensor(1.0687, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 2, 3, 4, 3, 2, 4, 4, 3, 3, 1, 4, 3, 3, 3, 4, 4, 2, 0, 2, 4, 2, 4, 3, 3, 1, 3, 0, 3, 1, 4, 4, 3, 2, 2, 4, 3, 4, 2, 4, 

val_output: [3, 3, 3, 3, 3, 4, 3, 3, 4, 2, 3, 1, 2, 4, 3, 2, 1, 3, 2, 3, 1, 3, 4, 1, 3, 4, 2, 3, 1, 4, 3, 0, 1, 4, 3, 0, 3, 1, 3, 1, 2, 2, 3, 1, 2, 1, 3, 1, 4, 0]
 val_label: tensor([3, 3, 3, 3, 1, 4, 2, 1, 4, 2, 3, 1, 2, 4, 2, 2, 3, 3, 2, 3, 1, 4, 2, 1,
        3, 0, 2, 2, 3, 4, 0, 0, 1, 2, 1, 0, 3, 1, 4, 1, 2, 2, 0, 4, 0, 0, 3, 3,
        1, 0], device='cuda:0')
10.0
tensor(1.0584, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [0, 3, 1, 3, 3, 3, 1, 3, 3, 3, 2, 3, 3, 3, 1, 0, 4, 2, 1, 2, 3, 3, 3, 4, 1, 3, 3, 2, 0, 1, 2, 2, 2, 1, 4, 3, 0, 4, 1, 3, 4, 0, 4, 2, 3, 2, 3, 0, 0, 3]
 val_label: tensor([0, 3, 1, 4, 3, 0, 1, 1, 4, 0, 0, 3, 0, 3, 1, 0, 4, 2, 1, 2, 4, 1, 0, 4,
        3, 0, 1, 2, 0, 0, 2, 2, 2, 1, 4, 2, 4, 3, 4, 1, 4, 0, 4, 2, 3, 2, 3, 0,
        2, 3], device='cuda:0')
10.0
tensor(1.0211, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 4, 4, 3, 4, 4, 3, 3, 4, 4, 4, 4, 4, 4, 3, 2, 3, 1, 3, 3, 3, 2, 2, 3, 3, 3, 4, 4, 3, 2, 4, 3, 2, 4, 2, 0, 4, 2, 3, 4, 4, 

val_output: [3, 1, 4, 1, 2, 2, 2, 1, 2, 2, 3, 0, 3, 3, 2, 3, 1, 4, 2, 1, 2, 4, 3, 1, 2, 1, 4, 4, 3, 3, 2, 2, 3, 1, 2, 3, 2, 3, 2, 1, 1, 1, 3, 3, 4, 1, 2, 2, 4, 2]
 val_label: tensor([3, 3, 4, 1, 2, 2, 3, 1, 2, 2, 0, 0, 1, 3, 2, 4, 1, 4, 2, 1, 2, 4, 3, 0,
        2, 1, 3, 2, 3, 3, 3, 0, 2, 3, 2, 3, 2, 1, 2, 3, 1, 0, 3, 3, 4, 3, 2, 2,
        3, 0], device='cuda:0')
10.0
tensor(1.0760, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 3, 3, 0, 1, 3, 3, 2, 2, 4, 3, 3, 2, 0, 1, 3, 3, 3, 1, 4, 2, 3, 0, 4, 3, 0, 4, 4, 1, 1, 1, 4, 2, 4, 4, 1, 3, 3, 2, 1, 0, 3, 3, 2, 3, 2, 2, 3, 4, 1]
 val_label: tensor([4, 3, 4, 3, 0, 3, 4, 3, 3, 4, 3, 3, 2, 0, 1, 3, 1, 4, 1, 1, 4, 4, 3, 4,
        1, 0, 3, 3, 1, 1, 1, 3, 2, 4, 4, 3, 3, 3, 2, 1, 3, 3, 4, 2, 0, 2, 2, 1,
        2, 1], device='cuda:0')
10.0
tensor(1.1867, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 0, 0, 3, 2, 3, 3, 2, 3, 3, 3, 4, 4, 3, 0, 2, 4, 0, 4, 3, 4, 4, 3, 1, 4, 4, 4, 4, 2, 3, 3, 3, 1, 3, 4, 3, 3, 4, 2, 0, 0, 

val_output: [2, 3, 0, 0, 3, 3, 3, 3, 4, 4, 0, 1, 2, 4, 2, 4, 3, 1, 1, 3, 3, 3, 3, 3, 3, 3, 2, 3, 4, 2, 2, 1, 1, 1, 1, 4, 3, 3, 2, 2, 3, 3, 3, 4, 1, 3, 2, 2, 4, 2]
 val_label: tensor([0, 1, 0, 0, 0, 1, 3, 3, 4, 4, 0, 1, 1, 4, 2, 4, 3, 0, 0, 1, 0, 3, 4, 0,
        3, 4, 2, 1, 3, 2, 1, 1, 1, 1, 4, 4, 3, 2, 2, 1, 4, 3, 0, 4, 3, 3, 2, 3,
        4, 2], device='cuda:0')
10.0
tensor(1.0840, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 1, 1, 3, 2, 3, 3, 4, 3, 4, 3, 1, 1, 3, 3, 3, 2, 2, 4, 3, 3, 3, 0, 4, 1, 3, 2, 4, 0, 3, 3, 3, 2, 1, 2, 2, 3, 3, 0, 3, 3, 3, 3, 1, 2, 2, 4, 1, 1, 3]
 val_label: tensor([2, 4, 1, 3, 2, 4, 2, 1, 3, 4, 3, 1, 1, 3, 1, 2, 2, 4, 4, 3, 3, 3, 0, 4,
        1, 3, 4, 4, 0, 2, 3, 4, 4, 1, 2, 2, 3, 2, 0, 3, 3, 3, 4, 0, 3, 1, 4, 1,
        0, 3], device='cuda:0')
10.0
tensor(0.9103, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 4, 1, 3, 3, 2, 2, 4, 0, 2, 1, 2, 2, 3, 4, 3, 4, 2, 3, 2, 4, 0, 4, 2, 1, 3, 3, 3, 2, 1, 2, 3, 3, 3, 1, 3, 4, 4, 4, 4, 2, 

val_output: [4, 4, 3, 3, 3, 3, 1, 4, 3, 2, 2, 3, 2, 2, 2, 1, 2, 3, 1, 2, 2, 0, 3, 0, 2, 1, 4, 4, 4, 2, 3, 2, 3, 3, 4, 4, 0, 4, 0, 1, 3, 2, 4, 3, 3, 3, 3, 3, 3, 2]
 val_label: tensor([2, 4, 3, 1, 3, 3, 3, 3, 3, 2, 0, 3, 4, 2, 2, 0, 2, 1, 0, 2, 2, 0, 2, 0,
        2, 1, 4, 0, 0, 3, 4, 2, 3, 3, 4, 4, 0, 4, 0, 1, 0, 2, 0, 3, 2, 3, 3, 3,
        3, 4], device='cuda:0')
10.0
tensor(1.0184, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 4, 4, 4, 1, 2, 0, 3, 2, 3, 2, 2, 3, 3, 0, 3, 0, 3, 0, 3, 4, 3, 4, 4, 0, 2, 1, 4, 3, 4, 2, 1, 2, 3, 4, 2, 3, 3, 2, 3, 1, 3, 2, 1, 4, 3, 3, 4, 3, 4]
 val_label: tensor([3, 3, 4, 4, 1, 2, 3, 2, 4, 3, 2, 0, 3, 0, 1, 4, 0, 4, 0, 0, 4, 3, 3, 3,
        3, 4, 1, 3, 2, 3, 2, 0, 2, 3, 4, 2, 0, 2, 2, 1, 1, 1, 1, 1, 2, 3, 1, 4,
        1, 4], device='cuda:0')
10.0
tensor(1.2118, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 1, 1, 1, 3, 1, 2, 4, 2, 4, 3, 4, 3, 3, 3, 2, 0, 3, 4, 3, 4, 4, 1, 4, 3, 3, 4, 1, 4, 3, 1, 4, 0, 1, 2, 3, 2, 2, 1, 3, 

val_output: [1, 1, 1, 3, 3, 3, 4, 3, 0, 3, 2, 2, 4, 3, 1, 0, 3, 4, 4, 4, 3, 3, 4, 3, 4, 3, 3, 3, 3, 4, 4, 3, 2, 2, 3, 3, 2, 3, 0, 3, 3, 3, 4, 1, 4, 4, 2, 3, 3, 2]
 val_label: tensor([1, 1, 1, 4, 3, 3, 3, 1, 0, 3, 2, 4, 4, 4, 1, 0, 3, 4, 0, 3, 3, 1, 3, 1,
        4, 3, 0, 3, 0, 3, 0, 4, 3, 2, 2, 3, 1, 4, 0, 3, 1, 1, 4, 0, 3, 3, 2, 1,
        1, 0], device='cuda:0')
10.0
tensor(1.2208, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 3, 0, 2, 4, 4, 1, 4, 1, 3, 3, 3, 1, 3, 1, 4, 2, 3, 2, 3, 3, 3, 2, 2, 3, 2, 2, 3, 3, 3, 2, 3, 4, 2, 4, 1, 4, 3, 0, 0, 3, 0, 2, 3, 3, 4, 0, 3, 3, 3]
 val_label: tensor([4, 3, 4, 2, 2, 1, 1, 4, 0, 3, 4, 3, 1, 3, 1, 2, 1, 3, 3, 3, 3, 1, 2, 3,
        3, 3, 3, 4, 0, 4, 2, 4, 4, 2, 4, 0, 2, 3, 0, 0, 3, 0, 2, 1, 3, 4, 0, 4,
        4, 3], device='cuda:0')
10.0
tensor(1.0701, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 4, 2, 3, 0, 4, 4, 4, 3, 2, 3, 4, 2, 3, 3, 2, 2, 1, 2, 1, 1, 4, 4, 3, 2, 3, 1, 4, 3, 3, 3, 4, 3, 3, 1, 3, 3, 4, 4, 4, 4, 

val_output: [4, 4, 2, 4, 2, 2, 3, 3, 3, 0, 3, 1, 3, 1, 2, 2, 3, 2, 3, 3, 4, 1, 3, 3, 1, 4, 3, 3, 1, 2, 3, 2, 3, 2, 2, 4, 2, 2, 2, 2, 3, 2, 4, 3, 1, 2, 2, 0, 2, 2]
 val_label: tensor([0, 4, 4, 4, 3, 2, 1, 3, 3, 0, 2, 1, 4, 2, 2, 1, 3, 2, 3, 3, 4, 1, 1, 3,
        1, 4, 3, 1, 1, 4, 0, 3, 4, 2, 2, 4, 2, 2, 3, 3, 4, 1, 4, 4, 1, 1, 0, 0,
        2, 2], device='cuda:0')
10.0
tensor(0.9367, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 3, 2, 2, 3, 2, 2, 1, 2, 2, 2, 3, 1, 3, 4, 2, 2, 3, 3, 4, 3, 1, 4, 2, 3, 3, 0, 3, 3, 2, 2, 3, 1, 3, 3, 3, 2, 1, 2, 3, 3, 0, 2, 1, 3, 2, 3, 1, 4, 3]
 val_label: tensor([2, 3, 0, 2, 2, 3, 2, 1, 0, 2, 2, 4, 1, 3, 4, 2, 2, 1, 3, 4, 2, 1, 3, 2,
        1, 3, 0, 0, 3, 3, 2, 3, 2, 3, 3, 3, 2, 3, 4, 0, 3, 0, 2, 3, 3, 2, 2, 1,
        4, 3], device='cuda:0')
10.0
tensor(1.0068, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 2, 1, 0, 3, 4, 2, 4, 3, 4, 2, 0, 3, 3, 3, 2, 2, 2, 3, 3, 2, 3, 3, 1, 2, 1, 1, 1, 4, 3, 2, 1, 2, 1, 3, 1, 3, 4, 4, 4, 2, 

val_output: [2, 3, 4, 3, 4, 4, 2, 4, 3, 2, 3, 2, 3, 4, 2, 4, 4, 1, 2, 3, 2, 3, 3, 3, 4, 2, 3, 3, 2, 3, 2, 2, 2, 2, 0, 4, 0, 3, 3, 3, 3, 3, 2, 3, 4, 4, 3, 3, 4, 4]
 val_label: tensor([2, 1, 4, 3, 4, 2, 3, 4, 3, 2, 3, 4, 1, 4, 1, 4, 4, 1, 2, 0, 2, 3, 3, 1,
        2, 3, 3, 3, 3, 3, 2, 3, 4, 4, 0, 4, 0, 3, 3, 4, 1, 4, 2, 1, 4, 2, 3, 3,
        4, 1], device='cuda:0')
10.0
tensor(1.0312, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 3, 1, 3, 2, 2, 4, 4, 4, 0, 2, 4, 0, 3, 1, 2, 2, 2, 2, 4, 1, 2, 2, 1, 2, 2, 1, 3, 2, 3, 4, 0, 0, 2, 4, 2, 2, 3, 2, 4, 3, 2, 3, 3, 3, 4, 3, 3, 2, 4]
 val_label: tensor([1, 1, 3, 3, 3, 0, 4, 4, 2, 0, 2, 0, 0, 2, 1, 2, 3, 2, 2, 1, 3, 2, 2, 1,
        4, 2, 1, 2, 3, 3, 4, 3, 0, 2, 0, 2, 2, 3, 2, 4, 2, 2, 3, 1, 3, 4, 2, 4,
        2, 4], device='cuda:0')
10.0
tensor(0.9591, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 3, 4, 2, 4, 3, 4, 0, 4, 2, 1, 1, 3, 3, 2, 1, 2, 1, 0, 3, 3, 3, 1, 4, 3, 3, 3, 3, 3, 2, 2, 3, 4, 4, 1, 4, 2, 4, 2, 3, 

val_output: [1, 2, 2, 4, 0, 1, 3, 4, 2, 3, 4, 1, 3, 3, 3, 3, 1, 4, 2, 2, 3, 2, 3, 3, 4, 4, 3, 3, 4, 3, 1, 0, 4, 4, 3, 3, 4, 2, 4, 2, 3, 3, 2, 2, 1, 4, 4, 4, 3, 2]
 val_label: tensor([1, 1, 2, 4, 0, 1, 2, 0, 0, 3, 3, 0, 4, 3, 1, 3, 1, 1, 2, 0, 3, 2, 3, 4,
        4, 4, 3, 1, 4, 4, 2, 1, 2, 4, 3, 0, 4, 0, 3, 2, 3, 1, 2, 3, 3, 4, 4, 2,
        0, 3], device='cuda:0')
10.0
tensor(1.1606, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 1, 4, 3, 0, 4, 4, 3, 3, 1, 3, 1, 3, 0, 1, 2, 3, 1, 3, 4, 2, 0, 3, 4, 4, 1, 1, 4, 3, 3, 4, 2, 2, 3, 0, 1, 4, 3, 3, 4, 1, 2, 0, 2, 3, 4, 3, 0, 3, 3]
 val_label: tensor([2, 1, 3, 4, 4, 1, 4, 4, 3, 1, 2, 3, 3, 0, 0, 0, 2, 1, 3, 2, 4, 0, 3, 4,
        3, 1, 3, 3, 3, 1, 4, 1, 0, 1, 0, 2, 4, 4, 3, 4, 0, 0, 3, 2, 3, 3, 3, 1,
        3, 3], device='cuda:0')
10.0
tensor(1.3782, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [0, 3, 0, 0, 1, 1, 0, 4, 4, 2, 0, 4, 1, 2, 0, 3, 1, 1, 3, 1, 0, 2, 3, 1, 3, 3, 1, 1, 3, 2, 3, 1, 4, 2, 4, 4, 2, 4, 3, 3, 1, 

val_output: [2, 3, 1, 2, 4, 4, 4, 1, 2, 3, 3, 1, 3, 4, 2, 4, 3, 4, 1, 2, 2, 1, 3, 2, 3, 4, 4, 4, 2, 3, 4, 3, 3, 3, 2, 4, 3, 3, 2, 3, 4, 0, 3, 2, 1, 2, 3, 3, 4, 1]
 val_label: tensor([2, 3, 3, 2, 4, 4, 2, 0, 2, 0, 4, 1, 1, 4, 2, 4, 3, 4, 1, 2, 1, 1, 3, 4,
        3, 4, 3, 4, 2, 3, 1, 4, 2, 2, 2, 2, 1, 0, 2, 3, 4, 2, 3, 2, 1, 1, 3, 0,
        4, 3], device='cuda:0')
10.0
tensor(1.0423, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 2, 2, 1, 2, 2, 2, 1, 4, 3, 1, 1, 0, 4, 1, 1, 4, 2, 3, 3, 2, 3, 4, 4, 2, 3, 2, 3, 3, 1, 3, 3, 3, 2, 3, 3, 3, 3, 4, 2, 1, 4, 3, 3, 4, 0, 3, 3, 3, 3]
 val_label: tensor([1, 3, 3, 1, 0, 2, 3, 1, 3, 3, 0, 0, 3, 2, 0, 1, 4, 2, 3, 3, 4, 3, 4, 4,
        0, 3, 1, 3, 3, 1, 3, 3, 2, 2, 4, 3, 3, 2, 3, 4, 3, 4, 1, 3, 4, 0, 2, 3,
        4, 3], device='cuda:0')
10.0
tensor(1.1117, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 3, 2, 3, 0, 1, 3, 3, 2, 0, 3, 1, 2, 3, 2, 3, 2, 4, 3, 4, 2, 4, 4, 2, 2, 1, 2, 2, 2, 0, 3, 2, 2, 1, 3, 4, 4, 4, 1, 1, 4, 

val_output: [1, 0, 4, 3, 2, 3, 1, 3, 1, 2, 2, 2, 3, 3, 3, 4, 1, 1, 1, 2, 4, 3, 3, 2, 3, 3, 1, 3, 4, 2, 3, 2, 1, 2, 2, 1, 3, 3, 2, 2, 2, 2, 4, 4, 1, 3, 3, 3, 1, 4]
 val_label: tensor([0, 0, 4, 2, 2, 3, 0, 1, 1, 2, 3, 3, 3, 1, 3, 4, 3, 1, 3, 2, 4, 3, 1, 2,
        3, 2, 3, 2, 4, 2, 1, 2, 1, 2, 2, 1, 3, 3, 2, 2, 2, 2, 4, 4, 1, 2, 3, 1,
        1, 4], device='cuda:0')
10.0
tensor(0.9065, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [0, 0, 2, 1, 2, 3, 3, 2, 0, 4, 4, 4, 4, 2, 2, 3, 3, 2, 2, 4, 3, 1, 3, 3, 4, 4, 3, 1, 4, 2, 3, 0, 2, 4, 3, 1, 1, 4, 0, 3, 0, 4, 4, 3, 4, 2, 2, 3, 3, 4]
 val_label: tensor([0, 0, 3, 1, 3, 3, 4, 1, 0, 4, 4, 4, 3, 2, 2, 3, 3, 3, 2, 4, 3, 1, 3, 4,
        4, 4, 2, 1, 1, 2, 0, 3, 2, 4, 3, 1, 0, 4, 0, 1, 0, 0, 4, 3, 1, 2, 2, 4,
        2, 2], device='cuda:0')
10.0
tensor(1.0434, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 2, 4, 4, 4, 4, 3, 2, 2, 0, 3, 4, 4, 3, 3, 1, 4, 4, 2, 4, 1, 0, 3, 4, 2, 3, 0, 1, 3, 1, 1, 3, 4, 3, 3, 1, 2, 4, 3, 4, 3, 

val_output: [1, 3, 3, 2, 4, 2, 2, 1, 3, 3, 4, 3, 1, 1, 1, 3, 3, 2, 3, 2, 4, 2, 2, 2, 4, 1, 2, 4, 4, 2, 1, 4, 3, 1, 3, 1, 1, 4, 2, 4, 4, 4, 3, 3, 3, 4, 3, 0, 2, 2]
 val_label: tensor([2, 3, 3, 2, 4, 2, 3, 1, 3, 1, 4, 3, 0, 1, 3, 3, 3, 2, 0, 2, 4, 1, 2, 0,
        1, 3, 2, 2, 4, 2, 1, 3, 1, 1, 1, 1, 1, 3, 1, 4, 4, 4, 3, 3, 3, 4, 2, 3,
        3, 2], device='cuda:0')
10.0
tensor(1.0858, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 3, 1, 2, 0, 0, 2, 2, 3, 4, 3, 3, 4, 0, 0, 2, 2, 2, 3, 1, 3, 4, 3, 1, 2, 2, 4, 2, 0, 4, 3, 3, 3, 3, 1, 3, 4, 2, 2, 3, 4, 0, 1, 2, 1, 4, 1, 2, 2, 4]
 val_label: tensor([3, 3, 1, 0, 1, 0, 1, 3, 3, 3, 4, 3, 3, 0, 0, 1, 0, 1, 2, 0, 2, 3, 1, 1,
        2, 2, 4, 2, 4, 1, 3, 4, 3, 4, 1, 4, 4, 3, 3, 3, 4, 0, 1, 3, 1, 3, 1, 4,
        3, 3], device='cuda:0')
10.0
tensor(1.2496, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 3, 2, 2, 3, 3, 4, 2, 3, 2, 2, 4, 3, 1, 4, 4, 2, 4, 3, 3, 4, 4, 1, 2, 3, 3, 4, 1, 1, 3, 0, 3, 0, 2, 0, 2, 4, 1, 1, 4, 4, 

val_output: [0, 4, 4, 0, 4, 1, 3, 4, 2, 3, 0, 1, 0, 2, 3, 1, 3, 2, 3, 1, 4, 1, 1, 3, 4, 0, 4, 2, 3, 3, 2, 4, 3, 2, 3, 4, 2, 3, 3, 3, 2, 4, 3, 2, 0, 4, 4, 4, 3, 3]
 val_label: tensor([0, 0, 4, 0, 4, 1, 1, 1, 3, 1, 0, 1, 0, 2, 2, 3, 1, 2, 3, 1, 0, 3, 1, 4,
        4, 0, 3, 2, 3, 1, 4, 4, 4, 0, 3, 4, 2, 4, 3, 2, 2, 4, 3, 3, 0, 3, 4, 2,
        3, 0], device='cuda:0')
10.0
tensor(1.0956, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 4, 2, 3, 1, 2, 3, 4, 3, 3, 4, 4, 3, 3, 1, 0, 2, 0, 1, 1, 1, 3, 2, 3, 4, 4, 3, 4, 4, 3, 3, 3, 3, 4, 1, 2, 2, 4, 2, 3, 4, 2, 4, 3, 4, 3, 3, 0, 2, 2]
 val_label: tensor([3, 4, 2, 3, 3, 2, 3, 0, 3, 3, 4, 4, 4, 3, 1, 0, 2, 0, 2, 1, 1, 4, 3, 2,
        4, 4, 3, 1, 3, 3, 3, 3, 2, 4, 1, 4, 2, 3, 2, 3, 3, 4, 4, 4, 1, 3, 3, 0,
        2, 3], device='cuda:0')
10.0
tensor(1.0108, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 1, 3, 1, 2, 2, 1, 4, 4, 3, 2, 2, 3, 3, 2, 1, 3, 2, 2, 4, 1, 3, 3, 1, 4, 2, 3, 4, 3, 1, 1, 1, 1, 4, 3, 3, 2, 2, 3, 4, 4, 

val_output: [3, 4, 2, 3, 3, 3, 4, 2, 3, 3, 4, 3, 1, 2, 2, 3, 4, 3, 4, 3, 2, 1, 3, 4, 4, 4, 4, 3, 4, 4, 2, 2, 4, 2, 2, 0, 4, 2, 0, 3, 3, 3, 4, 3, 2, 4, 2, 2, 3, 3]
 val_label: tensor([1, 4, 4, 3, 4, 3, 1, 2, 2, 3, 4, 1, 0, 1, 3, 1, 3, 4, 4, 3, 2, 3, 3, 3,
        2, 3, 4, 3, 4, 1, 2, 2, 4, 2, 3, 4, 1, 2, 0, 3, 3, 3, 3, 0, 0, 3, 3, 1,
        3, 3], device='cuda:0')
10.0
tensor(1.2170, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 1, 3, 1, 1, 4, 3, 3, 0, 1, 1, 3, 3, 2, 4, 1, 2, 3, 3, 3, 3, 4, 1, 3, 2, 2, 1, 1, 3, 4, 2, 2, 1, 4, 3, 3, 3, 1, 3, 3, 2, 1, 0, 0, 4, 3, 3, 2, 4, 2]
 val_label: tensor([4, 3, 1, 1, 1, 4, 3, 4, 0, 1, 1, 4, 3, 2, 4, 1, 3, 2, 3, 3, 3, 3, 3, 3,
        0, 2, 1, 1, 1, 3, 2, 2, 1, 3, 3, 3, 0, 0, 4, 3, 2, 1, 0, 0, 4, 3, 2, 4,
        4, 2], device='cuda:0')
10.0
tensor(0.9690, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 3, 1, 4, 2, 1, 2, 2, 2, 4, 2, 2, 2, 1, 2, 3, 2, 3, 4, 2, 0, 1, 0, 4, 3, 0, 1, 3, 3, 2, 3, 1, 0, 2, 4, 1, 3, 3, 3, 3, 4, 

val_output: [4, 2, 4, 4, 2, 3, 3, 3, 1, 4, 3, 3, 2, 2, 2, 3, 3, 2, 4, 2, 3, 3, 3, 1, 3, 3, 4, 3, 3, 4, 4, 4, 2, 3, 2, 2, 1, 2, 0, 4, 3, 4, 4, 1, 3, 4, 4, 1, 3, 3]
 val_label: tensor([4, 3, 2, 3, 2, 4, 3, 4, 1, 1, 3, 4, 2, 2, 2, 4, 1, 4, 4, 2, 4, 1, 3, 2,
        1, 1, 3, 1, 1, 3, 4, 4, 3, 4, 2, 2, 1, 2, 0, 3, 2, 4, 4, 0, 2, 4, 3, 1,
        0, 3], device='cuda:0')
10.0
tensor(1.1756, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 3, 3, 2, 2, 1, 2, 4, 3, 4, 4, 4, 2, 0, 1, 2, 2, 3, 3, 4, 4, 1, 3, 2, 4, 2, 3, 1, 0, 1, 4, 1, 2, 0, 2, 4, 2, 3, 3, 1, 4, 0, 2, 4, 3, 3, 2, 3, 3, 1]
 val_label: tensor([4, 3, 3, 2, 3, 1, 2, 3, 3, 4, 2, 3, 2, 0, 1, 0, 1, 1, 3, 4, 2, 0, 3, 2,
        0, 4, 3, 3, 4, 1, 0, 0, 4, 0, 3, 3, 4, 3, 4, 2, 3, 0, 2, 2, 0, 3, 3, 3,
        0, 1], device='cuda:0')
10.0
tensor(1.2169, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [0, 2, 3, 3, 4, 0, 2, 3, 4, 2, 4, 1, 4, 4, 4, 0, 4, 3, 4, 3, 4, 4, 1, 4, 3, 4, 0, 2, 2, 3, 4, 2, 3, 3, 2, 1, 0, 4, 2, 1, 1, 

val_output: [3, 1, 4, 3, 3, 0, 0, 1, 4, 2, 3, 2, 3, 4, 0, 0, 0, 2, 4, 3, 2, 3, 4, 4, 2, 2, 4, 1, 3, 3, 2, 3, 4, 3, 0, 3, 4, 3, 0, 3, 4, 2, 3, 3, 1, 1, 1, 2, 3, 3]
 val_label: tensor([3, 3, 4, 3, 2, 0, 3, 1, 2, 2, 0, 3, 2, 4, 0, 3, 0, 2, 4, 3, 2, 4, 3, 4,
        2, 2, 3, 1, 1, 3, 2, 1, 4, 0, 0, 3, 4, 3, 0, 0, 3, 2, 2, 0, 1, 0, 1, 2,
        2, 1], device='cuda:0')
10.0
tensor(0.9443, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 3, 4, 3, 4, 4, 4, 2, 2, 3, 3, 2, 2, 2, 2, 3, 3, 3, 3, 2, 4, 1, 4, 2, 3, 2, 2, 2, 0, 3, 1, 3, 2, 1, 2, 3, 4, 2, 1, 3, 2, 2, 4, 1, 3, 3, 4, 4, 2, 4]
 val_label: tensor([2, 2, 4, 1, 0, 4, 0, 4, 3, 1, 0, 2, 2, 4, 2, 3, 3, 2, 2, 4, 4, 1, 4, 2,
        3, 2, 3, 3, 0, 3, 1, 4, 2, 1, 1, 3, 0, 2, 1, 4, 4, 2, 4, 2, 3, 4, 3, 4,
        3, 2], device='cuda:0')
10.0
tensor(1.0975, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 4, 1, 2, 3, 2, 1, 0, 2, 1, 4, 1, 4, 2, 2, 0, 4, 3, 2, 4, 0, 2, 2, 2, 3, 3, 4, 1, 4, 4, 3, 3, 3, 3, 3, 3, 3, 2, 4, 4, 3, 

val_output: [2, 1, 4, 3, 3, 2, 3, 3, 4, 1, 2, 2, 2, 0, 2, 0, 2, 1, 1, 0, 2, 0, 4, 4, 4, 4, 1, 3, 3, 4, 3, 2, 1, 1, 2, 4, 2, 0, 2, 2, 3, 1, 3, 0, 2, 1, 4, 1, 3, 2]
 val_label: tensor([3, 1, 4, 3, 1, 4, 4, 4, 3, 2, 2, 2, 2, 0, 2, 0, 0, 0, 3, 0, 2, 0, 1, 4,
        3, 4, 1, 3, 2, 2, 3, 2, 1, 0, 2, 3, 3, 0, 2, 2, 3, 1, 3, 0, 2, 0, 2, 1,
        1, 3], device='cuda:0')
10.0
tensor(0.9689, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 2, 4, 1, 3, 1, 0, 3, 3, 3, 2, 4, 1, 2, 3, 2, 3, 1, 4, 4, 2, 2, 3, 4, 2, 4, 1, 3, 3, 3, 1, 2, 1, 3, 4, 3, 4, 3, 3, 4, 4, 1, 2, 0, 3, 2, 4, 3, 3]
 val_label: tensor([0, 3, 2, 3, 1, 2, 0, 0, 2, 4, 3, 2, 3, 1, 4, 3, 3, 4, 0, 4, 3, 2, 0, 3,
        4, 2, 4, 2, 3, 1, 3, 1, 2, 0, 4, 4, 3, 4, 3, 3, 1, 4, 1, 2, 1, 1, 2, 4,
        4, 3], device='cuda:0')
10.0
tensor(1.0492, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [0, 4, 3, 3, 2, 2, 2, 4, 4, 3, 2, 2, 4, 3, 3, 2, 4, 4, 4, 3, 3, 3, 3, 3, 1, 4, 1, 1, 4, 3, 1, 3, 2, 2, 3, 3, 1, 0, 3, 1, 3, 

val_output: [3, 1, 4, 3, 1, 2, 4, 2, 1, 4, 1, 1, 1, 3, 4, 1, 2, 1, 1, 1, 3, 2, 2, 4, 0, 2, 2, 2, 4, 3, 2, 2, 2, 0, 3, 0, 0, 4, 4, 1, 4, 3, 3, 2, 3, 4, 3, 4, 3, 2]
 val_label: tensor([1, 1, 4, 0, 0, 2, 1, 1, 0, 3, 1, 0, 0, 4, 4, 0, 2, 1, 0, 0, 2, 4, 2, 4,
        0, 2, 2, 2, 4, 1, 2, 2, 2, 0, 3, 1, 0, 4, 4, 1, 4, 2, 1, 2, 3, 4, 3, 4,
        3, 2], device='cuda:0')
10.0
tensor(1.0288, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 4, 2, 1, 4, 2, 2, 3, 4, 4, 4, 3, 4, 0, 3, 3, 3, 1, 2, 4, 4, 3, 1, 2, 3, 1, 2, 2, 3, 4, 2, 4, 4, 2, 4, 1, 2, 2, 4, 1, 4, 2, 3, 1, 2, 1, 3, 4, 3, 4]
 val_label: tensor([1, 3, 2, 1, 4, 0, 2, 3, 4, 3, 4, 2, 4, 0, 2, 4, 3, 1, 3, 4, 4, 3, 1, 2,
        4, 1, 4, 1, 3, 2, 0, 3, 0, 3, 0, 1, 1, 2, 4, 1, 0, 0, 0, 3, 3, 4, 3, 2,
        3, 4], device='cuda:0')
10.0
tensor(1.2083, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 4, 2, 2, 3, 3, 3, 4, 3, 2, 2, 2, 3, 3, 0, 2, 4, 2, 3, 3, 3, 3, 4, 1, 2, 3, 3, 1, 2, 3, 1, 2, 3, 4, 4, 2, 3, 4, 3, 4, 3, 

val_output: [3, 3, 3, 2, 3, 2, 3, 0, 3, 4, 3, 3, 3, 2, 3, 2, 4, 3, 0, 3, 3, 3, 0, 2, 1, 2, 4, 2, 1, 4, 0, 3, 2, 2, 2, 1, 4, 2, 1, 1, 4, 3, 3, 3, 0, 4, 4, 1, 2, 1]
 val_label: tensor([2, 4, 3, 3, 3, 0, 3, 0, 3, 4, 2, 2, 0, 3, 3, 2, 3, 3, 3, 3, 4, 0, 0, 0,
        1, 2, 4, 2, 1, 3, 0, 4, 2, 2, 2, 1, 4, 1, 0, 0, 4, 4, 3, 4, 3, 3, 4, 3,
        2, 1], device='cuda:0')
10.0
tensor(1.0474, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [0, 3, 4, 2, 4, 4, 3, 4, 1, 0, 3, 3, 4, 1, 3, 3, 3, 2, 4, 3, 4, 2, 4, 1, 3, 1, 3, 3, 3, 2, 3, 1, 2, 1, 2, 4, 3, 3, 3, 4, 3, 2, 3, 2, 2, 2, 4, 1, 2, 1]
 val_label: tensor([0, 3, 4, 3, 4, 4, 3, 4, 1, 0, 3, 4, 3, 1, 3, 3, 3, 2, 4, 3, 4, 2, 4, 1,
        2, 0, 0, 3, 3, 2, 1, 1, 2, 3, 2, 4, 3, 3, 3, 2, 4, 2, 3, 2, 2, 3, 4, 1,
        2, 1], device='cuda:0')
10.0
tensor(0.7370, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 4, 3, 4, 4, 4, 3, 2, 3, 3, 3, 4, 3, 2, 0, 3, 4, 4, 3, 0, 1, 0, 3, 4, 3, 4, 3, 2, 1, 3, 4, 3, 3, 2, 1, 4, 2, 4, 4, 4, 

val_output: [3, 3, 4, 4, 4, 4, 1, 4, 3, 4, 1, 4, 1, 2, 1, 4, 2, 3, 0, 2, 4, 3, 4, 3, 2, 4, 3, 1, 1, 3, 4, 4, 3, 4, 0, 3, 4, 3, 3, 3, 1, 3, 2, 4, 4, 3, 3, 4, 0, 3]
 val_label: tensor([3, 4, 3, 4, 3, 4, 0, 4, 3, 3, 1, 4, 1, 2, 1, 2, 2, 2, 0, 3, 4, 0, 4, 0,
        2, 4, 3, 1, 1, 2, 3, 4, 3, 4, 3, 0, 0, 0, 3, 3, 1, 1, 2, 0, 4, 2, 3, 1,
        0, 1], device='cuda:0')
10.0
tensor(1.0012, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 4, 3, 3, 2, 4, 2, 2, 0, 3, 2, 1, 3, 2, 3, 0, 3, 2, 3, 4, 3, 3, 3, 0, 3, 1, 1, 1, 4, 3, 4, 1, 1, 4, 3, 2, 4, 1, 1, 3, 0, 2, 3, 3, 2, 4, 3, 2, 0, 3]
 val_label: tensor([3, 4, 3, 3, 2, 0, 3, 2, 0, 3, 2, 1, 3, 2, 3, 0, 3, 2, 3, 4, 2, 0, 1, 0,
        3, 0, 0, 1, 4, 3, 4, 1, 1, 4, 3, 2, 4, 1, 1, 2, 0, 2, 2, 2, 2, 4, 0, 1,
        0, 1], device='cuda:0')
10.0
tensor(0.8521, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 1, 0, 4, 3, 3, 2, 1, 4, 2, 2, 3, 3, 1, 0, 2, 3, 2, 2, 0, 4, 4, 3, 4, 3, 2, 0, 4, 4, 4, 4, 2, 3, 3, 3, 4, 2, 3, 3, 2, 

val_output: [4, 1, 2, 2, 1, 2, 3, 3, 2, 4, 4, 0, 4, 3, 2, 1, 2, 2, 2, 4, 2, 3, 2, 3, 3, 3, 3, 3, 3, 3, 4, 3, 4, 2, 4, 2, 3, 1, 4, 4, 4, 4, 3, 3, 2, 3, 2, 1, 4, 4]
 val_label: tensor([4, 3, 2, 2, 1, 4, 3, 0, 2, 0, 4, 0, 4, 1, 3, 1, 2, 4, 0, 4, 3, 3, 3, 0,
        3, 2, 2, 4, 3, 1, 4, 3, 4, 2, 3, 3, 3, 1, 4, 3, 2, 4, 2, 3, 2, 3, 0, 1,
        2, 2], device='cuda:0')
10.0
tensor(1.1111, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [0, 4, 3, 3, 1, 3, 4, 2, 4, 4, 4, 2, 1, 3, 3, 4, 4, 2, 2, 1, 4, 3, 3, 3, 3, 2, 4, 3, 2, 2, 3, 3, 2, 3, 3, 4, 3, 2, 3, 2, 2, 1, 2, 3, 4, 4, 2, 1, 4, 3]
 val_label: tensor([0, 4, 3, 2, 1, 0, 4, 2, 0, 4, 4, 2, 1, 2, 3, 3, 4, 2, 2, 1, 1, 3, 2, 2,
        3, 2, 4, 0, 4, 2, 1, 4, 2, 1, 4, 3, 3, 2, 2, 3, 2, 1, 4, 4, 3, 3, 2, 1,
        4, 3], device='cuda:0')
10.0
tensor(1.0015, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 2, 4, 4, 0, 3, 2, 1, 3, 1, 1, 4, 3, 3, 0, 4, 4, 1, 3, 2, 4, 4, 4, 2, 0, 4, 4, 2, 3, 4, 2, 4, 0, 4, 0, 4, 1, 3, 1, 4, 1, 

val_output: [3, 4, 2, 2, 1, 4, 3, 4, 3, 4, 1, 3, 4, 3, 2, 3, 3, 4, 2, 1, 4, 2, 1, 1, 4, 1, 2, 3, 4, 1, 3, 3, 0, 4, 3, 3, 3, 4, 4, 3, 1, 2, 3, 3, 1, 4, 4, 4, 0, 4]
 val_label: tensor([1, 2, 3, 2, 1, 4, 4, 3, 1, 3, 3, 4, 1, 3, 2, 3, 3, 4, 3, 0, 4, 3, 1, 1,
        4, 1, 3, 4, 4, 1, 3, 3, 0, 4, 4, 0, 3, 4, 3, 4, 1, 4, 1, 3, 3, 2, 3, 4,
        0, 3], device='cuda:0')
10.0
tensor(0.9416, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 3, 2, 3, 2, 1, 4, 3, 2, 3, 4, 4, 2, 3, 3, 3, 3, 2, 3, 3, 4, 0, 2, 3, 0, 2, 1, 4, 4, 2, 2, 2, 4, 3, 2, 2, 0, 3, 4, 3, 4, 3, 4, 3, 3, 4, 2, 1, 4, 1]
 val_label: tensor([2, 4, 0, 3, 4, 3, 2, 2, 2, 4, 4, 3, 3, 3, 4, 0, 0, 0, 0, 3, 4, 0, 1, 3,
        3, 1, 1, 4, 4, 2, 2, 1, 3, 2, 2, 2, 0, 3, 0, 1, 3, 3, 3, 3, 3, 0, 2, 1,
        1, 1], device='cuda:0')
10.0
tensor(1.3323, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 3, 2, 4, 3, 2, 3, 4, 4, 2, 4, 4, 3, 3, 3, 3, 3, 4, 3, 4, 3, 4, 3, 4, 2, 3, 1, 1, 4, 3, 2, 1, 3, 4, 1, 2, 0, 3, 0, 0, 

val_output: [3, 1, 0, 0, 3, 2, 2, 2, 3, 3, 1, 4, 3, 4, 3, 2, 4, 4, 2, 4, 1, 3, 4, 0, 1, 3, 2, 4, 2, 3, 3, 3, 2, 4, 4, 4, 0, 1, 1, 2, 4, 4, 4, 2, 2, 2, 0, 3, 3, 3]
 val_label: tensor([3, 2, 0, 3, 3, 2, 2, 1, 2, 3, 1, 1, 3, 2, 3, 3, 4, 4, 2, 2, 1, 3, 4, 0,
        1, 0, 2, 0, 3, 2, 3, 1, 2, 4, 4, 2, 0, 1, 0, 2, 4, 4, 4, 2, 3, 4, 0, 0,
        2, 3], device='cuda:0')
10.0
tensor(1.0182, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 2, 4, 4, 1, 2, 0, 4, 4, 2, 1, 1, 4, 0, 0, 2, 3, 3, 3, 3, 3, 3, 0, 2, 3, 3, 4, 2, 3, 3, 3, 3, 1, 4, 3, 4, 4, 1, 3, 1, 4, 3, 3, 3, 3, 1, 4, 4, 3]
 val_label: tensor([3, 4, 2, 4, 4, 0, 2, 3, 3, 4, 2, 4, 1, 4, 0, 0, 2, 3, 3, 3, 0, 3, 4, 0,
        0, 1, 1, 4, 2, 4, 3, 3, 3, 1, 4, 2, 4, 1, 4, 1, 2, 3, 4, 3, 4, 4, 1, 4,
        2, 2], device='cuda:0')
10.0
tensor(1.1691, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 3, 2, 1, 1, 3, 3, 0, 4, 0, 1, 3, 0, 3, 2, 3, 1, 4, 3, 2, 1, 3, 3, 4, 3, 3, 0, 2, 4, 3, 2, 3, 2, 1, 1, 4, 3, 2, 2, 3, 0, 

val_output: [4, 0, 0, 1, 2, 2, 1, 1, 1, 4, 4, 2, 2, 0, 2, 1, 4, 3, 0, 4, 2, 1, 1, 3, 1, 3, 1, 3, 4, 3, 3, 3, 2, 3, 2, 3, 3, 3, 4, 4, 0, 2, 2, 2, 2, 0, 1, 2, 2, 3]
 val_label: tensor([1, 0, 0, 1, 2, 2, 1, 1, 0, 4, 3, 2, 2, 2, 2, 1, 4, 3, 0, 1, 3, 1, 1, 4,
        1, 4, 1, 0, 0, 1, 3, 3, 2, 4, 2, 3, 4, 1, 4, 3, 0, 2, 2, 1, 2, 0, 1, 2,
        2, 3], device='cuda:0')
10.0
tensor(1.0575, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 3, 3, 4, 1, 3, 0, 2, 2, 3, 3, 1, 4, 3, 3, 2, 3, 3, 1, 0, 3, 3, 2, 2, 3, 3, 0, 1, 3, 4, 2, 2, 2, 3, 2, 4, 3, 3, 4, 2, 2, 4, 1, 3, 1, 4, 3, 2, 2, 3]
 val_label: tensor([3, 0, 3, 4, 0, 2, 1, 2, 4, 3, 4, 1, 4, 2, 1, 2, 1, 2, 3, 3, 1, 1, 2, 0,
        4, 1, 0, 1, 3, 3, 2, 2, 2, 1, 0, 4, 3, 3, 0, 2, 2, 4, 1, 0, 1, 1, 1, 3,
        0, 1], device='cuda:0')
10.0
tensor(1.2732, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 0, 3, 2, 2, 3, 2, 3, 3, 3, 0, 2, 4, 4, 4, 3, 3, 0, 4, 3, 2, 3, 1, 2, 3, 4, 4, 2, 4, 3, 3, 3, 2, 4, 4, 3, 4, 1, 3, 2, 

val_output: [3, 4, 2, 3, 3, 2, 1, 0, 3, 3, 3, 2, 3, 3, 0, 1, 2, 3, 4, 3, 3, 4, 3, 1, 2, 2, 3, 3, 2, 4, 3, 3, 2, 4, 3, 3, 3, 3, 4, 3, 3, 1, 4, 2, 2, 2, 4, 3, 4, 1]
 val_label: tensor([0, 3, 3, 3, 4, 2, 1, 0, 3, 1, 3, 2, 3, 0, 3, 3, 2, 2, 1, 3, 1, 3, 2, 1,
        3, 0, 3, 3, 3, 2, 3, 3, 2, 4, 2, 0, 1, 1, 4, 3, 3, 1, 4, 3, 2, 2, 3, 3,
        4, 1], device='cuda:0')
10.0
tensor(1.1072, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 3, 3, 1, 0, 1, 2, 4, 1, 4, 4, 3, 3, 1, 3, 2, 3, 2, 3, 3, 0, 4, 2, 4, 3, 2, 0, 2, 4, 3, 3, 2, 0, 4, 2, 1, 2, 1, 4, 3, 4, 1, 3, 3, 2, 3, 2, 3, 3, 1]
 val_label: tensor([1, 1, 1, 3, 0, 0, 3, 4, 0, 4, 4, 0, 2, 1, 2, 0, 4, 2, 3, 4, 3, 3, 2, 4,
        1, 3, 0, 4, 4, 2, 4, 2, 0, 4, 2, 0, 2, 1, 1, 0, 0, 1, 4, 3, 2, 3, 3, 2,
        3, 1], device='cuda:0')
10.0
tensor(1.1966, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 0, 3, 1, 3, 3, 3, 3, 3, 0, 4, 0, 1, 2, 2, 3, 3, 3, 3, 1, 3, 3, 4, 4, 4, 3, 3, 1, 3, 1, 4, 0, 0, 3, 4, 1, 0, 3, 4, 3, 3, 

val_output: [2, 2, 1, 3, 3, 4, 4, 3, 1, 2, 0, 3, 3, 4, 2, 3, 3, 1, 3, 0, 0, 4, 4, 3, 3, 4, 2, 3, 4, 1, 3, 1, 1, 0, 2, 3, 3, 2, 0, 3, 3, 4, 3, 1, 2, 3, 3, 3, 3, 4]
 val_label: tensor([1, 2, 0, 3, 3, 4, 3, 1, 1, 3, 0, 0, 0, 4, 3, 2, 0, 2, 1, 0, 0, 1, 2, 3,
        3, 0, 2, 3, 0, 1, 4, 1, 0, 0, 2, 1, 1, 2, 0, 3, 4, 3, 1, 0, 2, 3, 3, 2,
        3, 1], device='cuda:0')
10.0
tensor(1.1568, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 1, 4, 2, 3, 2, 2, 0, 4, 3, 3, 3, 3, 1, 4, 0, 1, 2, 0, 1, 3, 2, 3, 2, 2, 3, 4, 1, 2, 4, 4, 4, 2, 3, 0, 4, 4, 2, 3, 3, 3, 4, 3, 2, 3, 4, 3, 3, 2, 3]
 val_label: tensor([3, 1, 0, 3, 3, 2, 2, 0, 0, 2, 3, 3, 2, 1, 0, 0, 1, 1, 0, 0, 4, 3, 3, 2,
        3, 3, 3, 0, 2, 1, 2, 0, 2, 3, 0, 4, 4, 2, 3, 3, 1, 0, 1, 4, 3, 4, 4, 3,
        2, 2], device='cuda:0')
10.0
tensor(1.1490, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 4, 2, 2, 4, 2, 4, 3, 3, 4, 3, 3, 2, 3, 1, 2, 2, 2, 3, 3, 3, 3, 4, 1, 3, 2, 3, 2, 1, 1, 3, 3, 2, 4, 3, 4, 2, 0, 4, 4, 4, 

val_output: [4, 3, 3, 3, 3, 3, 3, 2, 2, 3, 4, 2, 2, 0, 3, 4, 3, 2, 1, 3, 3, 4, 4, 2, 4, 1, 2, 4, 4, 3, 2, 4, 0, 3, 4, 2, 1, 0, 3, 3, 3, 1, 3, 0, 4, 3, 3, 4, 4, 4]
 val_label: tensor([4, 3, 2, 0, 4, 4, 3, 0, 1, 2, 4, 0, 3, 0, 3, 4, 3, 2, 1, 3, 3, 2, 4, 3,
        3, 1, 2, 4, 2, 4, 3, 0, 3, 3, 3, 0, 1, 0, 3, 4, 3, 4, 1, 0, 1, 2, 3, 4,
        4, 4], device='cuda:0')
10.0
tensor(1.2681, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 2, 3, 4, 1, 3, 3, 2, 2, 2, 2, 1, 4, 2, 3, 1, 3, 1, 2, 3, 3, 4, 4, 4, 3, 1, 0, 1, 4, 0, 4, 4, 3, 3, 3, 1, 4, 2, 4, 1, 1, 3, 1, 0, 4, 2, 2, 2, 4]
 val_label: tensor([0, 2, 2, 3, 4, 1, 2, 3, 2, 4, 2, 2, 1, 3, 2, 4, 1, 4, 1, 2, 1, 3, 4, 3,
        4, 3, 1, 0, 0, 3, 0, 0, 3, 3, 1, 4, 3, 2, 3, 4, 1, 1, 4, 1, 0, 3, 2, 4,
        2, 4], device='cuda:0')
10.0
tensor(0.9584, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 2, 3, 3, 4, 0, 4, 2, 1, 3, 4, 3, 0, 2, 4, 2, 1, 3, 2, 1, 3, 3, 2, 2, 4, 4, 3, 3, 0, 3, 4, 3, 3, 3, 4, 1, 3, 0, 3, 4, 4, 

val_output: [1, 4, 3, 3, 3, 2, 3, 4, 2, 4, 4, 0, 4, 3, 3, 1, 2, 2, 3, 1, 3, 4, 2, 4, 0, 1, 2, 4, 3, 3, 0, 4, 1, 1, 2, 0, 4, 3, 4, 3, 1, 3, 3, 2, 2, 3, 4, 4, 1, 4]
 val_label: tensor([1, 4, 1, 3, 1, 0, 1, 4, 2, 0, 1, 0, 3, 1, 4, 0, 1, 1, 3, 0, 3, 0, 3, 0,
        3, 1, 2, 4, 3, 1, 0, 3, 3, 1, 3, 2, 3, 3, 3, 3, 1, 3, 1, 2, 2, 4, 3, 3,
        1, 4], device='cuda:0')
10.0
tensor(1.2821, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 2, 1, 3, 2, 4, 3, 2, 4, 1, 2, 3, 4, 3, 4, 1, 1, 4, 3, 2, 2, 1, 2, 3, 2, 2, 0, 4, 1, 1, 3, 3, 1, 3, 4, 4, 3, 2, 2, 1, 2, 2, 4, 3, 3, 1, 2, 3, 1, 4]
 val_label: tensor([0, 1, 1, 3, 4, 4, 3, 4, 4, 1, 0, 3, 4, 0, 4, 1, 1, 4, 3, 2, 2, 1, 2, 2,
        0, 2, 1, 4, 1, 1, 3, 4, 1, 4, 4, 4, 3, 2, 2, 1, 2, 4, 3, 2, 3, 4, 2, 0,
        0, 4], device='cuda:0')
10.0
tensor(1.1693, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 1, 1, 3, 1, 2, 3, 0, 3, 3, 4, 3, 4, 0, 3, 3, 3, 3, 1, 4, 4, 0, 2, 3, 3, 4, 4, 3, 2, 2, 2, 1, 3, 3, 1, 1, 3, 1, 4, 4, 3, 

val_output: [2, 1, 2, 1, 3, 2, 4, 4, 3, 3, 1, 1, 3, 3, 4, 4, 0, 1, 2, 3, 3, 1, 4, 3, 4, 2, 3, 4, 2, 4, 3, 4, 3, 1, 0, 4, 4, 3, 3, 3, 3, 3, 2, 1, 4, 3, 3, 0, 2, 4]
 val_label: tensor([2, 1, 2, 0, 4, 2, 3, 4, 3, 1, 0, 3, 4, 3, 4, 1, 0, 1, 2, 3, 4, 1, 0, 3,
        4, 2, 4, 4, 2, 4, 2, 0, 2, 1, 0, 4, 4, 3, 1, 3, 3, 3, 3, 3, 3, 3, 2, 3,
        2, 3], device='cuda:0')
10.0
tensor(1.0243, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 4, 3, 1, 0, 1, 3, 3, 4, 4, 4, 1, 1, 3, 3, 4, 3, 3, 2, 0, 0, 3, 0, 2, 2, 3, 3, 0, 1, 2, 4, 1, 3, 4, 4, 3, 0, 1, 1, 4, 1, 2, 1, 0, 4, 4, 3, 2, 4]
 val_label: tensor([2, 3, 0, 4, 4, 0, 1, 2, 1, 3, 3, 1, 1, 3, 3, 0, 3, 2, 1, 1, 0, 4, 3, 0,
        2, 4, 3, 1, 0, 1, 2, 4, 1, 3, 0, 1, 2, 0, 1, 1, 1, 0, 2, 0, 0, 4, 4, 3,
        2, 0], device='cuda:0')
10.0
tensor(1.2446, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 0, 3, 3, 3, 3, 4, 2, 3, 0, 3, 2, 2, 4, 3, 3, 4, 4, 0, 3, 4, 3, 3, 4, 4, 0, 1, 3, 1, 2, 2, 4, 3, 3, 3, 3, 3, 0, 1, 0, 1, 

val_output: [1, 4, 1, 4, 4, 2, 1, 2, 4, 3, 0, 4, 2, 3, 3, 1, 3, 0, 3, 2, 4, 4, 4, 2, 4, 4, 3, 2, 3, 4, 1, 2, 4, 3, 4, 4, 3, 4, 4, 2, 0, 1, 4, 3, 3, 4, 3, 2, 4, 3]
 val_label: tensor([1, 4, 0, 1, 4, 3, 1, 1, 4, 3, 0, 3, 2, 3, 3, 1, 4, 0, 3, 0, 4, 4, 4, 2,
        4, 4, 2, 1, 1, 1, 1, 2, 0, 3, 1, 4, 3, 3, 4, 2, 0, 1, 4, 2, 3, 0, 1, 2,
        4, 3], device='cuda:0')
10.0
tensor(1.0267, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 2, 2, 3, 4, 4, 4, 4, 3, 3, 3, 1, 1, 3, 1, 3, 2, 1, 2, 3, 2, 4, 2, 4, 2, 4, 3, 4, 2, 3, 2, 1, 2, 2, 2, 2, 3, 3, 3, 0, 3, 3, 1, 2, 4, 3, 2, 4, 4, 4]
 val_label: tensor([4, 1, 2, 1, 0, 4, 4, 3, 3, 0, 3, 3, 3, 3, 1, 3, 2, 1, 2, 3, 3, 1, 2, 3,
        4, 0, 3, 3, 3, 3, 2, 1, 1, 2, 2, 4, 4, 3, 2, 0, 3, 3, 1, 3, 4, 1, 3, 3,
        4, 3], device='cuda:0')
10.0
tensor(1.0277, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 4, 4, 3, 2, 1, 2, 1, 3, 4, 3, 1, 3, 3, 2, 1, 0, 4, 4, 4, 3, 0, 4, 3, 2, 0, 1, 4, 4, 3, 3, 3, 2, 2, 1, 2, 1, 2, 2, 1, 3, 

val_output: [4, 2, 3, 4, 2, 2, 0, 1, 1, 4, 3, 4, 3, 1, 3, 2, 2, 2, 2, 4, 4, 4, 1, 3, 2, 0, 3, 2, 3, 3, 4, 1, 4, 3, 2, 2, 2, 2, 3, 0, 2, 4, 3, 1, 3, 1, 3, 2, 2, 2]
 val_label: tensor([4, 4, 1, 4, 3, 4, 0, 0, 0, 4, 2, 1, 1, 3, 3, 2, 2, 2, 2, 3, 4, 4, 1, 3,
        3, 0, 3, 3, 3, 3, 4, 1, 4, 3, 3, 2, 2, 2, 3, 0, 2, 4, 2, 1, 0, 1, 3, 2,
        2, 2], device='cuda:0')
10.0
tensor(0.8933, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 3, 2, 3, 2, 4, 4, 3, 3, 1, 1, 1, 2, 4, 4, 3, 1, 3, 2, 4, 4, 3, 4, 3, 3, 2, 2, 4, 2, 3, 0, 3, 4, 3, 2, 2, 2, 4, 2, 1, 1, 4, 4, 4, 4, 3, 2, 3, 2]
 val_label: tensor([3, 2, 4, 2, 2, 3, 4, 3, 3, 0, 1, 1, 3, 2, 3, 3, 3, 2, 0, 2, 3, 4, 1, 4,
        3, 4, 0, 2, 2, 3, 3, 0, 1, 4, 3, 4, 2, 4, 1, 4, 3, 1, 3, 3, 4, 4, 4, 4,
        3, 4], device='cuda:0')
10.0
tensor(1.2516, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 4, 3, 3, 4, 2, 3, 1, 3, 0, 1, 2, 4, 4, 3, 4, 0, 4, 2, 1, 3, 2, 1, 3, 1, 4, 0, 3, 2, 2, 3, 3, 1, 1, 2, 2, 2, 2, 4, 4, 4, 

val_output: [3, 0, 3, 2, 3, 4, 2, 3, 0, 3, 4, 4, 2, 4, 2, 4, 1, 0, 1, 1, 4, 4, 4, 2, 2, 3, 3, 3, 3, 0, 3, 0, 4, 2, 2, 4, 4, 4, 2, 4, 3, 2, 3, 4, 1, 3, 0, 2, 0, 4]
 val_label: tensor([3, 0, 3, 2, 4, 4, 2, 4, 0, 1, 4, 4, 0, 2, 2, 4, 1, 1, 3, 4, 2, 4, 2, 2,
        2, 0, 3, 2, 3, 0, 1, 0, 3, 4, 3, 2, 3, 3, 2, 0, 3, 2, 0, 4, 0, 3, 0, 1,
        0, 3], device='cuda:0')
10.0
tensor(1.1494, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 3, 3, 3, 3, 3, 1, 1, 4, 4, 2, 2, 3, 1, 3, 4, 4, 1, 4, 2, 1, 3, 3, 3, 3, 2, 3, 1, 1, 4, 4, 2, 3, 3, 4, 1, 2, 4, 3, 2, 4, 2, 4, 1, 0, 4, 3, 2, 2]
 val_label: tensor([3, 3, 3, 3, 3, 3, 2, 1, 4, 4, 3, 3, 3, 3, 1, 3, 4, 4, 1, 2, 2, 0, 3, 4,
        3, 3, 1, 3, 1, 3, 4, 2, 2, 3, 2, 4, 1, 2, 3, 3, 2, 4, 3, 4, 1, 0, 4, 4,
        3, 3], device='cuda:0')
10.0
tensor(0.8491, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 3, 3, 4, 4, 4, 3, 1, 4, 3, 2, 0, 1, 4, 4, 3, 2, 3, 2, 1, 3, 0, 2, 2, 4, 4, 2, 2, 4, 4, 0, 3, 3, 1, 3, 2, 3, 3, 2, 0, 2, 

val_output: [4, 2, 4, 2, 4, 2, 3, 3, 3, 2, 2, 3, 3, 3, 0, 2, 1, 2, 2, 3, 0, 1, 1, 1, 1, 3, 0, 4, 4, 2, 3, 4, 3, 3, 3, 0, 3, 2, 4, 4, 3, 1, 3, 3, 2, 2, 2, 3, 1, 2]
 val_label: tensor([0, 3, 4, 2, 1, 3, 2, 4, 2, 2, 1, 3, 0, 3, 0, 2, 1, 3, 3, 3, 0, 0, 1, 1,
        2, 3, 0, 4, 4, 0, 3, 4, 2, 4, 3, 0, 3, 2, 4, 3, 3, 1, 3, 1, 4, 0, 2, 3,
        1, 2], device='cuda:0')
10.0
tensor(1.1693, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 3, 2, 4, 4, 4, 3, 1, 4, 2, 3, 0, 4, 3, 2, 3, 1, 1, 1, 3, 2, 3, 3, 1, 4, 3, 3, 1, 3, 1, 2, 3, 4, 2, 3, 3, 3, 3, 4, 2, 3, 4, 1, 1, 1, 2, 4, 4, 4, 3]
 val_label: tensor([4, 3, 3, 4, 3, 4, 3, 0, 4, 2, 3, 0, 4, 3, 4, 3, 0, 0, 1, 0, 2, 3, 0, 1,
        4, 4, 3, 0, 3, 1, 0, 3, 0, 2, 3, 3, 3, 0, 0, 1, 3, 1, 1, 0, 0, 1, 1, 3,
        3, 3], device='cuda:0')
10.0
tensor(1.1502, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 2, 1, 2, 0, 4, 3, 3, 1, 4, 2, 4, 4, 0, 4, 3, 4, 4, 4, 3, 1, 1, 2, 3, 2, 2, 0, 3, 1, 3, 4, 3, 3, 4, 3, 4, 2, 0, 3, 4, 

val_output: [3, 0, 2, 3, 3, 4, 3, 3, 4, 4, 2, 3, 4, 1, 3, 3, 1, 0, 4, 4, 4, 2, 4, 0, 0, 1, 3, 3, 3, 2, 4, 1, 2, 1, 2, 4, 4, 2, 2, 2, 0, 4, 4, 3, 3, 3, 3, 4, 2, 3]
 val_label: tensor([2, 0, 3, 2, 1, 0, 3, 2, 3, 4, 2, 4, 0, 1, 2, 3, 1, 0, 4, 4, 4, 0, 0, 0,
        0, 1, 4, 1, 1, 3, 4, 1, 2, 1, 2, 4, 4, 2, 1, 4, 0, 4, 4, 3, 4, 0, 3, 4,
        2, 3], device='cuda:0')
10.0
tensor(0.9502, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 4, 4, 4, 3, 3, 3, 3, 3, 4, 0, 1, 1, 3, 3, 3, 2, 0, 3, 4, 3, 4, 2, 1, 1, 3, 4, 4, 3, 3, 2, 2, 2, 0, 3, 3, 2, 1, 3, 4, 2, 1, 4, 0, 1, 1, 4, 0, 1, 2]
 val_label: tensor([4, 4, 2, 4, 3, 3, 1, 1, 4, 1, 0, 1, 1, 1, 4, 0, 2, 0, 3, 3, 3, 0, 2, 0,
        1, 3, 4, 4, 2, 1, 2, 3, 2, 0, 1, 3, 1, 1, 3, 2, 3, 3, 4, 0, 1, 2, 4, 0,
        1, 2], device='cuda:0')
10.0
tensor(1.0871, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 1, 0, 2, 1, 2, 3, 4, 0, 3, 3, 0, 2, 0, 1, 2, 3, 2, 3, 3, 3, 3, 4, 3, 2, 3, 3, 3, 2, 3, 2, 3, 3, 2, 0, 1, 4, 3, 1, 0, 1, 

val_output: [2, 3, 2, 2, 3, 1, 1, 1, 1, 3, 3, 4, 3, 2, 3, 2, 1, 1, 4, 3, 3, 3, 4, 2, 2, 2, 4, 3, 2, 0, 3, 4, 3, 2, 1, 1, 3, 4, 3, 1, 3, 4, 1, 3, 3, 3, 3, 4, 0, 3]
 val_label: tensor([4, 0, 2, 2, 2, 0, 0, 2, 1, 1, 3, 2, 3, 4, 1, 0, 0, 1, 4, 1, 3, 3, 3, 1,
        3, 2, 4, 4, 2, 0, 2, 3, 1, 2, 3, 3, 4, 4, 2, 0, 4, 4, 1, 3, 3, 3, 0, 1,
        0, 3], device='cuda:0')
10.0
tensor(1.2923, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 2, 2, 1, 3, 3, 4, 4, 1, 1, 2, 2, 3, 1, 4, 3, 1, 3, 1, 3, 3, 3, 4, 4, 2, 1, 2, 4, 4, 2, 2, 4, 2, 2, 2, 3, 3, 3, 4, 4, 0, 2, 4, 0, 3, 2, 3, 1, 2]
 val_label: tensor([3, 1, 2, 2, 1, 3, 3, 1, 4, 1, 3, 2, 2, 3, 0, 3, 3, 1, 3, 2, 3, 3, 1, 3,
        4, 2, 0, 2, 4, 4, 2, 2, 3, 2, 3, 2, 3, 2, 4, 4, 0, 0, 2, 4, 2, 1, 2, 1,
        2, 3], device='cuda:0')
10.0
tensor(0.9523, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 4, 3, 1, 4, 3, 3, 4, 2, 3, 2, 1, 4, 3, 1, 2, 4, 4, 1, 3, 3, 0, 2, 4, 4, 3, 1, 4, 2, 3, 4, 3, 2, 3, 1, 3, 3, 2, 4, 3, 0, 

val_output: [0, 2, 4, 2, 1, 2, 4, 3, 3, 4, 2, 4, 4, 2, 4, 4, 3, 1, 3, 4, 3, 1, 4, 4, 4, 4, 0, 0, 2, 3, 2, 3, 3, 1, 3, 3, 3, 2, 4, 1, 3, 4, 3, 4, 3, 3, 1, 3, 4, 3]
 val_label: tensor([0, 2, 1, 3, 1, 3, 4, 3, 1, 4, 2, 2, 4, 2, 4, 3, 4, 3, 2, 4, 1, 1, 4, 1,
        4, 4, 0, 2, 0, 3, 2, 3, 0, 1, 1, 3, 3, 2, 4, 0, 3, 3, 3, 3, 4, 3, 1, 3,
        4, 3], device='cuda:0')
10.0
tensor(1.0101, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 2, 4, 0, 0, 2, 4, 4, 3, 2, 4, 2, 3, 3, 2, 3, 2, 4, 4, 1, 3, 2, 3, 4, 2, 4, 4, 4, 3, 4, 3, 1, 3, 2, 2, 4, 4, 3, 4, 1, 4, 4, 3, 2, 4, 3, 2, 0, 3, 3]
 val_label: tensor([0, 2, 2, 0, 0, 2, 0, 0, 1, 0, 4, 2, 0, 4, 0, 3, 2, 2, 4, 0, 2, 2, 3, 4,
        4, 4, 4, 4, 1, 4, 2, 1, 2, 3, 2, 4, 3, 3, 4, 1, 0, 3, 3, 4, 4, 1, 2, 0,
        3, 4], device='cuda:0')
10.0
tensor(1.2071, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 2, 3, 3, 3, 1, 3, 4, 4, 2, 3, 3, 2, 4, 3, 3, 3, 2, 3, 0, 3, 3, 4, 4, 1, 4, 2, 4, 3, 3, 3, 4, 3, 1, 3, 4, 0, 1, 3, 1, 3, 

val_output: [0, 3, 1, 2, 3, 3, 2, 3, 3, 3, 2, 3, 1, 3, 4, 1, 2, 3, 2, 0, 4, 3, 3, 1, 3, 1, 1, 1, 2, 2, 0, 3, 1, 2, 2, 3, 4, 3, 1, 1, 4, 1, 3, 4, 0, 4, 1, 3, 3, 0]
 val_label: tensor([0, 3, 1, 2, 3, 2, 4, 1, 3, 3, 2, 3, 1, 2, 4, 1, 0, 4, 1, 0, 4, 2, 3, 1,
        0, 4, 1, 1, 2, 2, 0, 4, 1, 3, 2, 4, 4, 2, 1, 0, 2, 1, 3, 4, 0, 2, 3, 2,
        3, 3], device='cuda:0')
10.0
tensor(1.1266, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 4, 3, 2, 4, 4, 3, 4, 4, 3, 4, 4, 2, 4, 3, 2, 4, 3, 2, 2, 1, 1, 4, 0, 3, 4, 2, 1, 3, 3, 4, 3, 2, 2, 2, 4, 3, 1, 0, 2, 1, 3, 3, 2, 4, 3, 4, 2, 1, 4]
 val_label: tensor([2, 4, 3, 2, 3, 3, 3, 4, 4, 2, 3, 4, 2, 0, 4, 2, 3, 3, 2, 1, 1, 1, 4, 0,
        2, 4, 4, 1, 3, 3, 1, 3, 1, 2, 3, 4, 4, 1, 0, 2, 3, 2, 3, 1, 4, 1, 3, 3,
        3, 4], device='cuda:0')
10.0
tensor(0.9066, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 4, 4, 0, 3, 2, 1, 2, 3, 1, 3, 0, 4, 4, 1, 3, 4, 3, 4, 2, 2, 0, 2, 4, 1, 1, 2, 3, 3, 3, 3, 4, 1, 0, 4, 0, 4, 3, 2, 3, 2, 

val_output: [2, 1, 1, 4, 3, 3, 2, 0, 4, 4, 4, 3, 4, 3, 4, 3, 2, 2, 3, 4, 2, 3, 2, 4, 0, 0, 2, 2, 2, 4, 2, 2, 0, 2, 1, 4, 3, 1, 3, 2, 4, 3, 3, 3, 1, 3, 3, 1, 3, 1]
 val_label: tensor([1, 3, 3, 3, 1, 4, 2, 0, 4, 4, 4, 1, 3, 1, 4, 1, 2, 0, 1, 4, 2, 4, 2, 3,
        0, 1, 4, 2, 2, 3, 0, 2, 0, 2, 0, 2, 3, 1, 3, 1, 0, 2, 3, 3, 1, 4, 1, 1,
        4, 1], device='cuda:0')
10.0
tensor(1.2523, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 3, 3, 3, 3, 2, 3, 3, 0, 3, 4, 1, 4, 0, 2, 2, 1, 3, 0, 3, 2, 3, 3, 3, 4, 3, 4, 1, 1, 4, 2, 1, 3, 1, 4, 1, 1, 1, 2, 4, 2, 1, 2, 3, 3, 2, 3, 4, 1, 1]
 val_label: tensor([4, 1, 3, 3, 1, 2, 0, 3, 0, 0, 4, 1, 4, 0, 2, 0, 1, 3, 0, 0, 0, 4, 4, 3,
        4, 3, 4, 1, 3, 4, 2, 1, 3, 2, 0, 3, 1, 1, 2, 4, 4, 3, 2, 3, 1, 2, 4, 4,
        1, 1], device='cuda:0')
10.0
tensor(0.9259, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 2, 4, 3, 2, 2, 3, 2, 3, 3, 0, 3, 0, 3, 2, 3, 4, 3, 4, 2, 3, 0, 3, 4, 1, 2, 4, 3, 1, 3, 2, 1, 4, 3, 3, 4, 3, 3, 2, 3, 

val_output: [3, 4, 0, 0, 3, 0, 2, 4, 3, 2, 3, 1, 3, 2, 1, 3, 1, 3, 2, 1, 2, 3, 2, 4, 4, 2, 2, 2, 4, 2, 0, 4, 2, 3, 3, 1, 0, 3, 0, 2, 4, 4, 3, 3, 3, 2, 1, 4, 4, 0]
 val_label: tensor([0, 3, 0, 0, 4, 0, 2, 4, 0, 1, 1, 1, 1, 2, 1, 3, 3, 0, 0, 0, 3, 1, 2, 4,
        4, 2, 3, 2, 4, 2, 0, 4, 2, 2, 4, 0, 1, 4, 1, 4, 2, 0, 3, 1, 3, 2, 1, 0,
        4, 1], device='cuda:0')
10.0
tensor(1.0101, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 3, 1, 0, 3, 2, 3, 3, 3, 4, 4, 1, 3, 2, 1, 4, 3, 3, 2, 1, 1, 0, 3, 4, 4, 0, 2, 4, 0, 2, 4, 2, 3, 3, 3, 0, 3, 4, 3, 3, 4, 4, 4, 0, 1, 3, 3, 0, 1]
 val_label: tensor([3, 3, 1, 3, 0, 3, 2, 4, 3, 3, 4, 2, 1, 3, 3, 1, 0, 3, 3, 2, 1, 1, 0, 2,
        4, 4, 1, 4, 1, 3, 2, 4, 2, 3, 0, 4, 3, 2, 2, 2, 4, 4, 4, 4, 0, 1, 3, 1,
        0, 1], device='cuda:0')
10.0
tensor(0.9726, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 2, 2, 4, 1, 2, 2, 0, 3, 3, 4, 3, 2, 3, 3, 3, 2, 2, 2, 2, 3, 3, 2, 1, 0, 1, 3, 0, 4, 3, 0, 1, 3, 3, 3, 1, 1, 3, 3, 2, 3, 

val_output: [3, 2, 4, 2, 3, 4, 0, 4, 3, 3, 2, 0, 3, 0, 2, 0, 1, 1, 4, 3, 3, 2, 3, 4, 3, 4, 4, 4, 3, 1, 4, 4, 3, 4, 1, 4, 2, 4, 3, 1, 0, 2, 3, 3, 2, 3, 1, 3, 4, 3]
 val_label: tensor([3, 2, 0, 3, 3, 4, 0, 4, 3, 0, 2, 0, 3, 0, 3, 0, 1, 1, 3, 3, 1, 2, 3, 3,
        2, 4, 3, 3, 3, 1, 4, 2, 3, 0, 1, 3, 4, 3, 3, 1, 0, 4, 3, 1, 2, 4, 1, 4,
        4, 3], device='cuda:0')
10.0
tensor(0.9237, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 4, 2, 1, 1, 3, 1, 1, 0, 1, 4, 4, 1, 4, 4, 1, 3, 3, 1, 3, 4, 3, 2, 3, 1, 3, 1, 2, 1, 2, 4, 1, 4, 2, 4, 3, 2, 4, 1, 2, 4, 2, 1, 3, 1, 3, 3, 2, 3]
 val_label: tensor([2, 2, 4, 2, 0, 3, 3, 1, 1, 0, 1, 3, 4, 3, 4, 0, 1, 1, 3, 1, 2, 3, 3, 2,
        2, 1, 0, 1, 2, 0, 2, 4, 1, 4, 4, 4, 3, 3, 4, 0, 2, 0, 2, 1, 1, 1, 3, 3,
        2, 1], device='cuda:0')
10.0
tensor(1.0167, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 0, 4, 3, 4, 3, 2, 2, 2, 3, 3, 4, 1, 3, 4, 4, 4, 2, 2, 3, 3, 4, 2, 1, 3, 4, 0, 4, 4, 2, 4, 2, 4, 4, 4, 2, 0, 1, 2, 4, 4, 

val_output: [4, 3, 3, 3, 2, 4, 2, 2, 2, 1, 4, 4, 1, 1, 2, 2, 3, 4, 1, 1, 2, 0, 3, 3, 3, 2, 0, 4, 1, 4, 2, 0, 4, 4, 3, 3, 3, 3, 3, 3, 2, 3, 1, 2, 3, 3, 1, 4, 3, 1]
 val_label: tensor([4, 2, 2, 2, 1, 3, 2, 3, 1, 1, 4, 4, 1, 1, 4, 2, 1, 3, 1, 1, 3, 0, 3, 4,
        2, 1, 0, 4, 1, 3, 0, 3, 4, 4, 3, 3, 3, 3, 4, 4, 2, 3, 1, 4, 1, 3, 1, 3,
        1, 1], device='cuda:0')
10.0
tensor(1.0834, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 3, 3, 4, 2, 2, 2, 1, 4, 1, 3, 3, 4, 1, 4, 3, 4, 3, 1, 3, 4, 4, 1, 1, 2, 2, 2, 4, 1, 3, 3, 4, 3, 2, 1, 4, 4, 3, 3, 3, 2, 3, 2, 3, 0, 3, 3, 4, 1]
 val_label: tensor([2, 3, 3, 3, 4, 2, 2, 1, 1, 4, 0, 3, 2, 4, 1, 0, 1, 4, 3, 1, 1, 1, 4, 1,
        1, 3, 2, 2, 4, 1, 3, 4, 3, 3, 4, 1, 4, 2, 2, 1, 4, 1, 3, 2, 1, 1, 3, 1,
        3, 1], device='cuda:0')
10.0
tensor(1.2783, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 4, 4, 3, 2, 4, 2, 3, 4, 0, 3, 3, 3, 2, 2, 1, 2, 2, 4, 3, 2, 3, 0, 2, 4, 2, 1, 1, 3, 4, 4, 3, 2, 4, 2, 0, 3, 3, 2, 2, 0, 

val_output: [3, 0, 3, 1, 3, 3, 2, 4, 1, 1, 3, 2, 3, 4, 4, 1, 3, 2, 0, 4, 2, 3, 3, 0, 2, 4, 3, 4, 2, 3, 2, 1, 3, 0, 4, 2, 1, 3, 3, 1, 1, 1, 4, 4, 4, 3, 1, 2, 4, 3]
 val_label: tensor([4, 0, 1, 0, 4, 4, 4, 4, 0, 0, 3, 3, 3, 2, 4, 0, 3, 2, 0, 4, 2, 3, 4, 0,
        3, 4, 1, 0, 2, 3, 1, 1, 3, 0, 4, 1, 1, 4, 2, 1, 0, 4, 4, 3, 4, 1, 1, 2,
        4, 3], device='cuda:0')
10.0
tensor(1.1360, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 4, 3, 4, 2, 2, 3, 3, 0, 2, 3, 2, 4, 1, 2, 3, 3, 2, 4, 4, 3, 3, 3, 4, 3, 3, 1, 4, 4, 3, 3, 3, 1, 3, 3, 0, 4, 4, 4, 4, 3, 4, 3, 2, 3, 0, 1, 2, 3]
 val_label: tensor([0, 3, 3, 2, 2, 2, 2, 2, 3, 0, 2, 4, 0, 2, 1, 2, 3, 3, 2, 4, 4, 3, 4, 2,
        3, 2, 1, 3, 2, 3, 3, 3, 3, 0, 3, 2, 2, 2, 4, 3, 4, 3, 3, 2, 1, 1, 2, 1,
        3, 2], device='cuda:0')
10.0
tensor(1.2290, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 3, 0, 1, 3, 3, 3, 1, 3, 4, 3, 2, 3, 4, 2, 3, 3, 2, 2, 3, 3, 4, 1, 3, 3, 4, 4, 2, 3, 3, 1, 1, 4, 3, 1, 4, 2, 2, 3, 3, 4, 

val_output: [3, 3, 4, 3, 1, 1, 4, 3, 3, 1, 3, 1, 3, 3, 3, 4, 3, 2, 1, 2, 3, 3, 4, 3, 1, 2, 2, 4, 1, 4, 4, 3, 2, 1, 1, 0, 3, 3, 4, 4, 3, 4, 3, 3, 3, 2, 4, 3, 2, 3]
 val_label: tensor([3, 2, 4, 3, 3, 1, 4, 2, 3, 0, 4, 0, 3, 2, 2, 4, 3, 3, 1, 2, 2, 3, 4, 1,
        1, 2, 2, 2, 1, 4, 4, 3, 2, 1, 2, 3, 3, 3, 4, 2, 3, 4, 3, 2, 3, 2, 4, 4,
        0, 3], device='cuda:0')
10.0
tensor(1.0552, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 4, 1, 3, 3, 0, 3, 1, 2, 3, 4, 0, 1, 2, 4, 3, 2, 0, 2, 2, 2, 4, 2, 2, 3, 2, 2, 1, 3, 3, 3, 3, 2, 1, 2, 3, 1, 1, 3, 1, 2, 3, 1, 1, 0, 4, 4, 2, 2, 2]
 val_label: tensor([3, 3, 3, 4, 3, 0, 2, 1, 1, 3, 3, 0, 1, 4, 4, 1, 1, 0, 2, 3, 3, 4, 3, 2,
        3, 2, 4, 0, 4, 1, 3, 3, 3, 1, 3, 3, 1, 0, 2, 1, 2, 3, 0, 3, 0, 4, 3, 4,
        3, 2], device='cuda:0')
10.0
tensor(1.0449, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 4, 4, 1, 3, 4, 0, 0, 3, 2, 2, 3, 4, 4, 4, 2, 1, 2, 4, 3, 4, 4, 2, 1, 2, 3, 0, 2, 2, 1, 1, 3, 2, 1, 1, 4, 3, 3, 2, 3, 1, 

val_output: [2, 4, 2, 3, 4, 1, 4, 0, 3, 3, 3, 3, 1, 1, 4, 1, 4, 3, 4, 3, 3, 1, 4, 4, 4, 1, 2, 2, 3, 0, 4, 3, 3, 4, 2, 2, 3, 4, 3, 4, 2, 3, 3, 1, 2, 4, 2, 4, 4, 3]
 val_label: tensor([1, 1, 2, 3, 3, 0, 3, 0, 3, 2, 3, 2, 1, 3, 4, 1, 4, 2, 1, 3, 3, 2, 4, 4,
        3, 1, 2, 0, 0, 0, 4, 2, 3, 3, 0, 4, 2, 0, 2, 4, 3, 0, 3, 1, 1, 4, 2, 4,
        4, 3], device='cuda:0')
10.0
tensor(1.1971, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 4, 4, 2, 4, 1, 0, 4, 0, 3, 3, 3, 1, 0, 2, 2, 1, 3, 4, 2, 3, 1, 3, 3, 3, 4, 2, 4, 4, 1, 1, 2, 1, 3, 2, 2, 4, 3, 1, 4, 2, 2, 2, 4, 2, 2, 3, 3, 3, 2]
 val_label: tensor([4, 4, 4, 2, 4, 1, 3, 3, 0, 3, 3, 3, 1, 0, 4, 2, 1, 3, 4, 2, 3, 0, 0, 3,
        4, 4, 3, 3, 4, 1, 0, 2, 0, 3, 2, 2, 4, 3, 1, 4, 3, 3, 2, 4, 2, 1, 1, 3,
        4, 2], device='cuda:0')
10.0
tensor(0.8188, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 2, 3, 3, 3, 4, 2, 4, 2, 0, 2, 3, 2, 3, 1, 3, 3, 4, 0, 0, 3, 2, 3, 3, 3, 4, 3, 2, 3, 4, 3, 3, 4, 0, 1, 4, 4, 3, 1, 4, 4, 

val_output: [2, 4, 2, 4, 1, 4, 2, 3, 2, 4, 3, 2, 4, 0, 2, 0, 2, 2, 2, 4, 3, 3, 2, 4, 3, 2, 2, 3, 3, 3, 2, 0, 4, 3, 2, 3, 2, 2, 3, 2, 4, 3, 4, 3, 3, 3, 1, 4, 2, 0]
 val_label: tensor([2, 4, 1, 4, 2, 3, 0, 2, 2, 4, 1, 2, 4, 3, 2, 0, 2, 4, 2, 4, 2, 3, 2, 3,
        4, 3, 0, 3, 3, 3, 2, 0, 3, 3, 2, 2, 4, 4, 3, 3, 3, 3, 4, 4, 3, 0, 1, 3,
        3, 0], device='cuda:0')
10.0
tensor(1.0486, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 3, 4, 4, 4, 1, 2, 1, 3, 3, 1, 3, 3, 3, 1, 3, 0, 1, 0, 1, 1, 4, 3, 3, 1, 2, 1, 2, 4, 4, 0, 3, 3, 4, 0, 2, 1, 2, 1, 1, 2, 3, 2, 1, 3, 3, 4, 3, 3, 4]
 val_label: tensor([4, 3, 3, 4, 3, 1, 0, 1, 3, 2, 1, 2, 3, 3, 1, 3, 0, 1, 3, 1, 4, 4, 1, 3,
        1, 3, 1, 3, 1, 3, 0, 0, 1, 4, 0, 0, 0, 3, 2, 1, 2, 3, 3, 1, 3, 3, 3, 3,
        1, 4], device='cuda:0')
10.0
tensor(1.0311, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 2, 4, 4, 1, 0, 0, 1, 4, 4, 3, 2, 0, 1, 2, 4, 4, 3, 2, 2, 3, 2, 3, 4, 3, 3, 3, 3, 2, 4, 3, 1, 2, 3, 3, 1, 3, 1, 4, 3, 3, 

val_output: [3, 3, 3, 2, 2, 2, 4, 3, 2, 0, 1, 2, 3, 4, 4, 3, 3, 2, 4, 3, 1, 1, 3, 1, 2, 3, 3, 3, 3, 2, 4, 3, 4, 3, 3, 1, 4, 4, 1, 2, 3, 3, 2, 3, 0, 3, 3, 3, 2, 4]
 val_label: tensor([3, 1, 1, 2, 0, 2, 1, 1, 2, 1, 1, 4, 3, 4, 0, 4, 3, 1, 4, 2, 1, 1, 4, 1,
        2, 1, 3, 0, 1, 2, 3, 3, 4, 3, 4, 1, 3, 1, 1, 2, 1, 1, 2, 2, 0, 2, 3, 2,
        2, 3], device='cuda:0')
10.0
tensor(1.3409, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 2, 3, 2, 1, 2, 1, 1, 4, 3, 3, 0, 1, 2, 2, 2, 3, 2, 4, 4, 2, 3, 1, 0, 1, 1, 4, 0, 2, 3, 2, 2, 2, 3, 3, 3, 2, 2, 3, 2, 3, 4, 1, 0, 4, 2, 3, 4, 0, 3]
 val_label: tensor([2, 4, 3, 2, 1, 3, 1, 0, 4, 2, 1, 0, 4, 3, 2, 3, 3, 2, 4, 4, 3, 3, 1, 0,
        0, 1, 3, 0, 3, 0, 2, 2, 4, 2, 2, 4, 2, 3, 2, 1, 3, 4, 1, 0, 4, 2, 3, 3,
        1, 1], device='cuda:0')
10.0
tensor(1.0565, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 3, 2, 3, 2, 3, 4, 3, 3, 0, 4, 2, 1, 3, 4, 2, 4, 3, 3, 3, 1, 4, 4, 2, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3, 1, 4, 4, 3, 3, 1, 

val_output: [2, 3, 2, 2, 4, 2, 3, 4, 0, 2, 3, 4, 2, 3, 3, 2, 0, 3, 3, 3, 4, 4, 3, 2, 4, 2, 3, 0, 2, 2, 3, 4, 3, 3, 2, 4, 0, 3, 3, 4, 4, 4, 4, 4, 3, 3, 2, 1, 3, 2]
 val_label: tensor([2, 3, 2, 2, 4, 3, 1, 4, 0, 3, 3, 4, 2, 3, 2, 2, 0, 3, 3, 3, 4, 4, 1, 2,
        0, 2, 3, 0, 2, 2, 4, 4, 1, 3, 2, 4, 0, 1, 3, 4, 4, 4, 2, 3, 3, 3, 2, 1,
        3, 0], device='cuda:0')
10.0
tensor(0.8510, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 1, 4, 4, 3, 0, 2, 4, 3, 4, 4, 0, 4, 4, 3, 4, 3, 0, 4, 2, 2, 1, 1, 3, 1, 3, 2, 2, 2, 3, 3, 4, 2, 2, 3, 4, 2, 3, 3, 4, 4, 3, 1, 4, 3, 4, 3, 4, 2, 3]
 val_label: tensor([4, 1, 1, 4, 1, 0, 2, 0, 1, 4, 4, 0, 0, 4, 3, 4, 0, 0, 4, 2, 2, 1, 1, 1,
        1, 3, 2, 3, 2, 3, 3, 4, 2, 2, 3, 4, 2, 2, 3, 1, 4, 3, 0, 3, 3, 4, 3, 3,
        2, 4], device='cuda:0')
10.0
tensor(0.9016, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 4, 1, 4, 4, 2, 3, 2, 2, 4, 4, 4, 4, 2, 0, 3, 1, 3, 2, 3, 2, 1, 4, 1, 2, 3, 3, 3, 3, 3, 3, 4, 2, 1, 3, 3, 2, 2, 3, 4, 

val_output: [2, 2, 3, 2, 4, 1, 0, 4, 3, 2, 3, 2, 2, 3, 3, 0, 3, 2, 1, 2, 2, 3, 4, 3, 3, 2, 4, 2, 2, 3, 3, 3, 2, 4, 2, 0, 2, 2, 3, 4, 1, 2, 0, 3, 3, 4, 2, 2, 4, 3]
 val_label: tensor([2, 0, 3, 1, 3, 1, 1, 1, 3, 3, 2, 2, 2, 4, 3, 0, 2, 3, 3, 2, 2, 4, 3, 3,
        1, 3, 1, 2, 3, 3, 3, 0, 4, 4, 4, 0, 2, 2, 1, 4, 1, 2, 0, 4, 3, 1, 2, 2,
        4, 4], device='cuda:0')
10.0
tensor(1.0840, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 3, 4, 1, 3, 0, 3, 2, 4, 4, 4, 3, 2, 4, 3, 2, 1, 2, 2, 4, 3, 0, 1, 1, 0, 2, 3, 3, 4, 3, 4, 1, 0, 4, 1, 4, 2, 2, 3, 2, 2, 3, 1, 2, 3, 4, 3, 1, 3, 0]
 val_label: tensor([0, 0, 3, 0, 2, 3, 3, 0, 3, 2, 4, 3, 2, 2, 2, 2, 0, 2, 1, 0, 3, 0, 1, 3,
        0, 2, 3, 2, 0, 2, 4, 1, 0, 4, 1, 4, 3, 2, 3, 3, 3, 3, 1, 3, 1, 4, 3, 0,
        4, 1], device='cuda:0')
10.0
tensor(1.3184, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 1, 2, 3, 2, 4, 4, 4, 2, 4, 4, 2, 3, 4, 2, 3, 4, 3, 4, 2, 3, 4, 3, 3, 4, 1, 3, 4, 1, 3, 1, 4, 3, 3, 3, 3, 0, 2, 3, 4, 4, 

tensor(1.0806, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 0, 3, 2, 3, 4, 4, 4, 4, 4, 0, 3, 2, 3, 0, 3, 3, 3, 4, 4, 3, 4, 0, 2, 2, 4, 4, 2, 4, 4, 4, 3, 1, 4, 3, 3, 4, 3, 3, 3, 3, 1, 4, 2, 4, 0, 3, 1, 1, 4]
 val_label: tensor([1, 0, 4, 2, 2, 4, 4, 4, 3, 4, 0, 3, 3, 3, 0, 4, 3, 4, 4, 3, 2, 3, 2, 4,
        0, 4, 2, 3, 4, 4, 4, 3, 1, 3, 3, 3, 3, 4, 4, 0, 2, 1, 4, 3, 4, 0, 0, 1,
        2, 4], device='cuda:0')
10.0
tensor(1.0245, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 3, 3, 2, 1, 4, 4, 3, 4, 4, 1, 2, 1, 2, 2, 3, 3, 4, 0, 2, 4, 3, 1, 4, 0, 3, 2, 2, 1, 3, 3, 3, 3, 3, 2, 3, 3, 1, 4, 3, 4, 4, 2, 2, 0, 3, 3, 2, 3, 2]
 val_label: tensor([3, 3, 0, 3, 1, 0, 1, 4, 3, 4, 0, 0, 1, 2, 2, 2, 3, 4, 0, 4, 3, 2, 1, 3,
        0, 4, 2, 3, 1, 3, 1, 1, 3, 3, 3, 3, 0, 0, 4, 4, 1, 4, 2, 3, 1, 2, 1, 3,
        3, 2], device='cuda:0')
10.0
tensor(1.3292, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 2, 3, 0, 3, 4, 4, 1, 4, 3, 2, 3, 2, 0, 3, 1, 4, 4, 4, 1, 

val_output: [1, 3, 3, 4, 1, 0, 2, 1, 3, 2, 3, 0, 2, 3, 1, 2, 4, 2, 4, 2, 1, 1, 4, 1, 3, 0, 2, 0, 4, 2, 3, 3, 3, 4, 4, 1, 4, 3, 2, 2, 4, 2, 3, 4, 3, 4, 0, 1, 4, 2]
 val_label: tensor([1, 4, 3, 2, 1, 0, 2, 2, 1, 0, 3, 2, 2, 1, 1, 2, 0, 3, 3, 2, 1, 0, 4, 1,
        4, 0, 2, 0, 0, 2, 2, 3, 3, 3, 3, 1, 4, 3, 2, 4, 4, 2, 3, 4, 3, 1, 4, 1,
        2, 4], device='cuda:0')
10.0
tensor(1.2489, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 1, 4, 4, 2, 3, 2, 3, 3, 4, 3, 3, 2, 1, 2, 2, 1, 3, 2, 2, 0, 4, 3, 3, 2, 2, 2, 3, 3, 2, 3, 1, 3, 3, 4, 2, 3, 3, 1, 4, 4, 4, 3, 3, 1, 1, 1, 2, 3, 1]
 val_label: tensor([3, 3, 4, 4, 2, 3, 0, 1, 3, 3, 3, 4, 3, 0, 2, 2, 3, 4, 2, 2, 0, 4, 2, 3,
        2, 3, 3, 3, 2, 2, 3, 1, 3, 2, 2, 2, 3, 3, 0, 4, 0, 4, 4, 0, 0, 2, 1, 2,
        1, 1], device='cuda:0')
10.0
tensor(0.9883, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 1, 3, 3, 2, 3, 3, 0, 3, 2, 1, 0, 3, 4, 3, 4, 4, 1, 1, 2, 3, 0, 2, 2, 4, 3, 2, 3, 3, 3, 4, 4, 3, 4, 3, 2, 3, 2, 4, 3, 2, 

val_output: [1, 1, 4, 0, 1, 3, 4, 1, 4, 4, 3, 1, 1, 1, 1, 1, 4, 3, 4, 2, 2, 0, 2, 4, 3, 4, 4, 0, 1, 1, 2, 4, 2, 4, 4, 3, 3, 4, 3, 3, 4, 1, 3, 2, 2, 2, 4, 2, 3, 3]
 val_label: tensor([0, 1, 4, 0, 1, 3, 3, 1, 4, 4, 1, 1, 3, 2, 3, 2, 4, 3, 2, 1, 1, 0, 2, 4,
        3, 3, 4, 0, 3, 4, 4, 4, 2, 1, 1, 3, 3, 4, 2, 3, 4, 1, 3, 2, 3, 3, 3, 2,
        3, 4], device='cuda:0')
10.0
tensor(1.0866, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 3, 3, 1, 4, 3, 2, 1, 3, 3, 4, 4, 1, 2, 3, 3, 2, 3, 2, 0, 3, 3, 4, 1, 2, 2, 3, 1, 3, 3, 2, 3, 3, 0, 3, 4, 3, 4, 2, 4, 2, 3, 3, 4, 0, 4, 4, 3, 4]
 val_label: tensor([3, 3, 2, 3, 1, 3, 3, 2, 1, 3, 3, 0, 4, 1, 2, 4, 2, 2, 3, 2, 0, 3, 3, 4,
        1, 2, 2, 1, 1, 3, 1, 2, 2, 2, 3, 3, 4, 3, 4, 2, 0, 2, 4, 2, 4, 0, 3, 0,
        1, 3], device='cuda:0')
10.0
tensor(0.8607, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 3, 4, 2, 2, 3, 3, 3, 3, 3, 3, 4, 3, 4, 2, 4, 1, 1, 3, 4, 4, 3, 1, 1, 1, 2, 4, 3, 4, 4, 3, 3, 2, 1, 0, 4, 1, 0, 4, 3, 

val_output: [3, 0, 0, 3, 3, 3, 2, 3, 4, 4, 2, 4, 2, 2, 2, 2, 3, 4, 1, 4, 4, 4, 3, 3, 4, 2, 2, 1, 1, 3, 4, 4, 4, 1, 3, 1, 3, 1, 1, 2, 1, 3, 3, 4, 0, 4, 0, 3, 1, 2]
 val_label: tensor([3, 0, 0, 3, 2, 2, 2, 1, 4, 4, 2, 4, 2, 2, 2, 2, 3, 1, 1, 4, 0, 4, 2, 3,
        0, 2, 3, 0, 2, 1, 2, 1, 4, 1, 3, 1, 2, 1, 1, 4, 1, 0, 3, 4, 0, 4, 0, 3,
        1, 2], device='cuda:0')
10.0
tensor(0.9469, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 2, 1, 1, 3, 2, 3, 3, 4, 1, 4, 4, 1, 2, 4, 1, 3, 4, 3, 4, 2, 4, 1, 2, 4, 0, 4, 4, 3, 2, 0, 1, 4, 3, 2, 3, 1, 1, 4, 3, 3, 0, 0, 3, 1, 3, 2, 1, 3, 4]
 val_label: tensor([4, 2, 1, 1, 3, 0, 3, 3, 3, 2, 4, 4, 0, 2, 4, 4, 3, 2, 2, 3, 3, 3, 1, 3,
        4, 0, 4, 3, 2, 2, 0, 2, 2, 3, 2, 1, 0, 1, 3, 4, 3, 0, 0, 1, 2, 4, 3, 1,
        1, 2], device='cuda:0')
10.0
tensor(1.0466, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 1, 2, 3, 1, 2, 3, 3, 3, 2, 3, 4, 0, 3, 2, 1, 3, 4, 4, 2, 4, 2, 3, 3, 1, 2, 0, 3, 0, 4, 1, 3, 3, 3, 3, 3, 4, 4, 4, 3, 1, 

val_output: [4, 4, 1, 2, 1, 3, 0, 2, 3, 4, 3, 2, 2, 1, 2, 4, 4, 2, 4, 3, 3, 0, 3, 3, 1, 3, 4, 3, 2, 3, 1, 1, 2, 2, 4, 4, 1, 1, 4, 4, 2, 3, 3, 4, 0, 3, 4, 2, 3, 1]
 val_label: tensor([4, 3, 0, 1, 1, 3, 0, 2, 0, 4, 3, 2, 2, 1, 3, 3, 4, 2, 4, 3, 3, 0, 4, 3,
        0, 3, 1, 3, 3, 0, 1, 1, 4, 2, 4, 4, 1, 4, 4, 3, 2, 0, 3, 3, 0, 3, 4, 2,
        3, 3], device='cuda:0')
10.0
tensor(0.9329, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 3, 1, 3, 4, 2, 3, 3, 4, 3, 3, 3, 3, 1, 2, 2, 3, 4, 2, 0, 2, 1, 0, 2, 4, 3, 3, 2, 4, 2, 3, 3, 2, 3, 2, 2, 1, 1, 2, 3, 1, 1, 1, 4, 4, 4, 4, 2, 2, 2]
 val_label: tensor([1, 3, 1, 2, 4, 1, 3, 3, 0, 3, 3, 2, 3, 0, 2, 0, 4, 4, 2, 0, 3, 1, 1, 0,
        4, 0, 2, 3, 4, 3, 1, 3, 2, 4, 2, 3, 2, 1, 0, 3, 1, 1, 1, 0, 2, 3, 1, 2,
        2, 2], device='cuda:0')
10.0
tensor(1.2561, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 2, 2, 3, 4, 3, 3, 2, 3, 4, 2, 1, 1, 3, 2, 4, 1, 2, 3, 2, 3, 0, 3, 2, 4, 3, 1, 4, 3, 4, 2, 3, 0, 4, 3, 1, 3, 4, 3, 3, 3, 

val_output: [3, 2, 4, 3, 3, 3, 4, 4, 3, 2, 3, 3, 3, 1, 4, 3, 4, 3, 3, 3, 4, 2, 4, 0, 4, 2, 1, 1, 4, 4, 4, 3, 0, 1, 3, 3, 0, 3, 1, 1, 2, 2, 3, 2, 3, 4, 3, 3, 3, 0]
 val_label: tensor([3, 0, 4, 3, 2, 1, 2, 4, 4, 1, 3, 4, 4, 1, 3, 2, 1, 3, 2, 1, 4, 0, 4, 1,
        4, 2, 1, 1, 2, 0, 4, 3, 0, 1, 1, 2, 3, 3, 1, 1, 2, 2, 3, 2, 4, 1, 1, 4,
        2, 0], device='cuda:0')
10.0
tensor(1.1813, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 1, 2, 2, 3, 3, 3, 0, 4, 4, 3, 3, 3, 0, 1, 2, 2, 3, 0, 2, 2, 2, 4, 4, 0, 0, 3, 1, 0, 4, 3, 3, 4, 3, 4, 1, 1, 2, 3, 4, 3, 3, 4, 3, 3, 3, 4, 4, 0]
 val_label: tensor([3, 1, 1, 2, 2, 3, 0, 3, 3, 2, 3, 3, 1, 0, 0, 1, 0, 3, 3, 0, 4, 1, 2, 4,
        2, 2, 0, 3, 1, 0, 4, 1, 3, 4, 3, 4, 1, 3, 1, 3, 0, 3, 3, 4, 2, 4, 0, 3,
        4, 0], device='cuda:0')
10.0
tensor(1.0747, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 4, 3, 4, 4, 3, 0, 2, 4, 2, 3, 3, 1, 3, 3, 4, 3, 3, 2, 2, 4, 4, 1, 3, 2, 4, 3, 4, 2, 3, 2, 3, 3, 0, 4, 4, 2, 3, 3, 4, 2, 

val_output: [1, 2, 4, 2, 4, 1, 0, 1, 3, 4, 4, 3, 2, 4, 3, 2, 4, 1, 1, 2, 4, 3, 2, 3, 1, 3, 1, 3, 3, 4, 4, 4, 4, 4, 3, 2, 1, 2, 4, 2, 1, 4, 1, 4, 3, 0, 4, 3, 3, 3]
 val_label: tensor([1, 3, 3, 1, 4, 1, 0, 3, 2, 4, 3, 4, 4, 0, 3, 2, 2, 1, 1, 3, 4, 3, 3, 3,
        3, 3, 3, 3, 4, 1, 3, 4, 4, 4, 1, 3, 0, 2, 1, 2, 1, 4, 0, 0, 2, 3, 0, 1,
        3, 3], device='cuda:0')
10.0
tensor(1.1149, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [0, 3, 2, 2, 3, 3, 3, 4, 3, 3, 3, 3, 3, 1, 1, 0, 4, 2, 1, 0, 2, 2, 1, 3, 3, 4, 2, 0, 4, 2, 2, 4, 4, 0, 4, 1, 3, 2, 3, 3, 1, 3, 0, 4, 2, 2, 2, 3, 3, 2]
 val_label: tensor([0, 3, 2, 2, 4, 4, 1, 4, 3, 1, 3, 3, 3, 0, 1, 0, 4, 3, 1, 0, 2, 4, 3, 3,
        3, 4, 3, 0, 3, 2, 0, 4, 4, 0, 4, 0, 1, 1, 4, 2, 1, 2, 0, 4, 3, 2, 2, 3,
        4, 4], device='cuda:0')
10.0
tensor(1.0068, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 4, 1, 4, 2, 3, 2, 2, 3, 3, 1, 3, 3, 1, 2, 4, 3, 4, 1, 3, 3, 2, 1, 4, 1, 3, 2, 2, 2, 3, 3, 1, 3, 3, 4, 3, 4, 4, 3, 2, 1, 

val_output: [3, 0, 3, 3, 1, 4, 3, 3, 2, 2, 3, 1, 3, 2, 4, 1, 1, 3, 4, 4, 2, 3, 1, 3, 4, 3, 4, 4, 3, 2, 4, 1, 2, 4, 4, 3, 3, 3, 4, 4, 4, 0, 1, 4, 3, 3, 1, 0, 2, 3]
 val_label: tensor([1, 0, 3, 3, 0, 3, 3, 3, 2, 4, 3, 1, 3, 2, 3, 3, 0, 3, 4, 1, 2, 4, 1, 3,
        3, 2, 4, 4, 4, 2, 4, 1, 2, 4, 4, 2, 3, 3, 4, 4, 4, 0, 0, 3, 2, 3, 3, 0,
        2, 1], device='cuda:0')
10.0
tensor(0.9347, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 3, 0, 4, 2, 3, 3, 1, 3, 2, 2, 1, 2, 2, 3, 4, 2, 4, 1, 4, 3, 4, 4, 2, 3, 4, 4, 3, 3, 4, 4, 2, 2, 2, 3, 2, 2, 0, 3, 4, 0, 0, 2, 3, 4, 3, 4, 4, 1, 3]
 val_label: tensor([1, 4, 0, 2, 2, 3, 3, 1, 3, 4, 2, 3, 2, 2, 2, 4, 3, 4, 1, 4, 0, 2, 4, 2,
        1, 4, 4, 0, 3, 4, 4, 2, 2, 2, 2, 2, 2, 0, 3, 4, 0, 0, 2, 3, 3, 4, 3, 3,
        1, 2], device='cuda:0')
10.0
tensor(0.8743, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 3, 4, 2, 2, 3, 3, 4, 3, 4, 3, 3, 2, 3, 1, 2, 1, 3, 3, 3, 4, 1, 3, 3, 0, 4, 4, 2, 1, 4, 3, 0, 2, 0, 0, 2, 4, 3, 4, 4, 3, 

val_output: [1, 3, 1, 3, 3, 3, 3, 2, 3, 4, 3, 3, 3, 2, 2, 2, 4, 3, 2, 4, 1, 0, 3, 1, 1, 3, 2, 3, 2, 3, 3, 1, 3, 3, 3, 3, 2, 2, 3, 2, 4, 4, 4, 3, 3, 4, 3, 3, 3, 3]
 val_label: tensor([3, 4, 1, 0, 2, 3, 3, 2, 4, 0, 4, 3, 3, 2, 2, 2, 4, 2, 3, 4, 1, 1, 1, 1,
        3, 3, 2, 2, 2, 3, 3, 1, 3, 3, 2, 3, 2, 2, 3, 2, 3, 4, 3, 3, 3, 2, 2, 3,
        3, 4], device='cuda:0')
10.0
tensor(0.8971, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 3, 2, 2, 2, 3, 0, 4, 1, 1, 3, 3, 3, 4, 4, 3, 4, 4, 3, 4, 4, 3, 2, 3, 0, 4, 2, 3, 2, 3, 3, 1, 3, 4, 1, 2, 3, 4, 3, 4, 3, 3, 1, 2, 1, 3, 2, 1, 4, 3]
 val_label: tensor([2, 3, 2, 2, 2, 0, 0, 1, 1, 4, 2, 3, 1, 2, 4, 1, 4, 3, 3, 4, 4, 2, 1, 3,
        0, 4, 3, 4, 2, 3, 3, 1, 0, 0, 0, 2, 1, 3, 1, 4, 1, 1, 1, 2, 1, 4, 4, 1,
        4, 1], device='cuda:0')
10.0
tensor(1.1458, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 3, 1, 1, 4, 3, 4, 3, 4, 1, 3, 4, 3, 4, 2, 1, 3, 0, 3, 3, 3, 4, 3, 0, 4, 4, 4, 1, 3, 0, 3, 4, 0, 4, 4, 2, 4, 2, 1, 4, 1, 

val_output: [3, 4, 1, 2, 1, 3, 2, 3, 3, 3, 1, 4, 3, 1, 3, 3, 4, 4, 2, 1, 4, 2, 1, 3, 0, 1, 2, 1, 2, 3, 1, 4, 2, 4, 3, 3, 4, 1, 3, 2, 3, 3, 4, 0, 2, 4, 3, 1, 1, 3]
 val_label: tensor([1, 4, 1, 2, 0, 1, 2, 2, 3, 3, 1, 4, 3, 1, 4, 3, 2, 3, 3, 1, 0, 2, 1, 4,
        0, 3, 2, 0, 3, 0, 0, 4, 3, 4, 4, 1, 0, 3, 4, 2, 3, 1, 4, 0, 0, 4, 3, 3,
        0, 1], device='cuda:0')
10.0
tensor(1.0156, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 1, 3, 3, 4, 1, 1, 1, 0, 4, 3, 1, 4, 4, 3, 3, 1, 4, 2, 2, 2, 4, 2, 4, 0, 2, 4, 3, 1, 2, 2, 2, 3, 2, 4, 3, 1, 3, 2, 3, 4, 1, 2, 4, 3, 4, 3, 4, 1, 4]
 val_label: tensor([2, 1, 1, 3, 0, 2, 3, 1, 0, 4, 1, 1, 0, 4, 3, 3, 1, 4, 0, 2, 1, 3, 2, 2,
        0, 3, 3, 3, 1, 3, 2, 1, 4, 2, 4, 4, 0, 3, 2, 3, 4, 1, 1, 3, 4, 4, 3, 4,
        1, 4], device='cuda:0')
10.0
tensor(0.9563, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 1, 3, 1, 2, 3, 1, 2, 2, 4, 4, 3, 3, 2, 3, 4, 2, 1, 2, 3, 1, 4, 4, 2, 4, 4, 3, 4, 4, 0, 2, 2, 0, 3, 3, 3, 4, 4, 3, 3, 1, 

val_output: [0, 3, 3, 2, 3, 3, 1, 2, 3, 1, 2, 1, 4, 4, 2, 4, 4, 3, 4, 0, 1, 2, 4, 3, 2, 3, 3, 2, 4, 1, 4, 3, 3, 4, 1, 1, 3, 0, 4, 0, 3, 2, 4, 3, 3, 3, 1, 2, 2, 3]
 val_label: tensor([0, 2, 0, 2, 1, 1, 1, 1, 2, 1, 2, 1, 4, 4, 2, 0, 4, 3, 4, 0, 1, 4, 4, 3,
        2, 3, 1, 1, 4, 0, 4, 0, 3, 3, 1, 1, 2, 0, 4, 2, 4, 3, 4, 3, 4, 3, 0, 4,
        2, 4], device='cuda:0')
10.0
tensor(1.0983, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 4, 3, 4, 3, 4, 3, 3, 3, 3, 2, 3, 3, 2, 3, 4, 4, 4, 1, 2, 1, 2, 3, 4, 3, 1, 3, 1, 3, 4, 3, 4, 4, 4, 1, 1, 4, 4, 0, 0, 2, 1, 4, 2, 3, 1, 2, 3, 1, 2]
 val_label: tensor([3, 4, 3, 4, 1, 4, 3, 4, 3, 0, 3, 2, 3, 2, 3, 4, 3, 4, 1, 1, 3, 3, 3, 4,
        1, 1, 4, 1, 4, 4, 3, 4, 4, 3, 1, 1, 4, 4, 1, 1, 3, 3, 3, 2, 2, 1, 0, 4,
        1, 2], device='cuda:0')
10.0
tensor(0.8820, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 4, 1, 4, 1, 2, 4, 1, 1, 3, 3, 3, 1, 2, 4, 2, 2, 0, 3, 3, 4, 4, 3, 3, 0, 2, 3, 2, 3, 3, 3, 4, 2, 3, 2, 3, 3, 3, 3, 4, 4, 

val_output: [4, 3, 3, 2, 4, 2, 2, 3, 1, 3, 3, 2, 4, 4, 2, 1, 1, 2, 3, 3, 3, 3, 3, 3, 4, 2, 0, 4, 4, 3, 2, 0, 3, 1, 4, 3, 0, 0, 4, 1, 3, 3, 1, 1, 4, 1, 3, 4, 2, 1]
 val_label: tensor([4, 1, 1, 2, 0, 3, 3, 3, 1, 3, 3, 4, 4, 1, 2, 1, 0, 2, 3, 1, 1, 4, 1, 3,
        4, 2, 0, 4, 4, 2, 2, 3, 3, 1, 1, 4, 0, 0, 4, 1, 3, 3, 1, 3, 4, 1, 1, 4,
        2, 1], device='cuda:0')
10.0
tensor(1.0071, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 3, 4, 4, 4, 2, 3, 4, 1, 2, 3, 2, 2, 3, 2, 1, 3, 0, 0, 0, 3, 4, 3, 3, 4, 3, 4, 4, 3, 3, 3, 2, 2, 1, 3, 3, 2, 2, 3, 4, 4, 2, 3, 0, 2, 4, 4, 3, 3, 1]
 val_label: tensor([2, 3, 4, 4, 2, 2, 0, 4, 1, 2, 3, 3, 0, 3, 2, 3, 3, 4, 0, 0, 3, 3, 3, 2,
        4, 3, 4, 4, 3, 1, 3, 2, 2, 1, 4, 0, 4, 1, 2, 0, 4, 3, 3, 3, 3, 4, 0, 3,
        4, 1], device='cuda:0')
10.0
tensor(1.0368, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 3, 3, 4, 4, 1, 0, 4, 2, 4, 4, 3, 3, 2, 1, 4, 4, 1, 3, 4, 2, 2, 2, 2, 0, 3, 3, 0, 3, 3, 3, 3, 3, 3, 4, 3, 2, 2, 2, 4, 3, 

val_output: [3, 4, 4, 0, 3, 4, 2, 3, 3, 2, 3, 1, 3, 1, 2, 0, 1, 4, 3, 0, 1, 1, 1, 2, 2, 3, 0, 3, 3, 3, 4, 2, 3, 2, 0, 2, 2, 1, 2, 3, 3, 4, 4, 0, 3, 1, 2, 1, 2, 4]
 val_label: tensor([3, 4, 3, 0, 1, 0, 2, 3, 4, 1, 3, 0, 4, 0, 2, 0, 1, 1, 3, 3, 1, 1, 3, 2,
        2, 3, 3, 3, 3, 1, 4, 1, 1, 2, 3, 2, 3, 1, 4, 3, 1, 4, 4, 0, 1, 1, 2, 3,
        2, 1], device='cuda:0')
10.0
tensor(1.0361, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 0, 1, 2, 1, 4, 3, 1, 4, 1, 3, 3, 4, 3, 3, 3, 1, 2, 3, 1, 2, 4, 4, 3, 1, 4, 2, 0, 3, 4, 3, 3, 4, 3, 3, 4, 4, 1, 2, 3, 2, 3, 3, 3, 3, 2, 3, 2, 1, 3]
 val_label: tensor([3, 0, 1, 0, 1, 2, 3, 3, 4, 1, 3, 3, 1, 1, 3, 1, 1, 2, 2, 1, 2, 3, 3, 0,
        1, 3, 2, 0, 3, 0, 3, 3, 3, 3, 0, 4, 0, 1, 2, 3, 2, 3, 3, 4, 2, 3, 3, 2,
        1, 3], device='cuda:0')
10.0
tensor(1.1407, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 3, 3, 3, 1, 2, 3, 3, 1, 1, 1, 3, 2, 4, 1, 3, 4, 2, 3, 0, 3, 2, 4, 4, 2, 3, 1, 2, 3, 3, 0, 2, 4, 1, 1, 2, 3, 2, 4, 2, 3, 

val_output: [4, 4, 3, 4, 2, 3, 1, 3, 4, 0, 0, 2, 2, 2, 4, 0, 2, 3, 3, 3, 2, 3, 2, 4, 4, 1, 1, 4, 1, 1, 4, 0, 0, 3, 3, 3, 3, 2, 4, 3, 1, 1, 2, 3, 1, 2, 1, 0, 4, 2]
 val_label: tensor([4, 4, 3, 3, 2, 2, 0, 1, 4, 0, 0, 2, 2, 2, 0, 0, 2, 3, 0, 3, 4, 3, 2, 3,
        4, 3, 1, 4, 1, 1, 4, 0, 0, 3, 4, 3, 0, 3, 3, 3, 2, 0, 2, 3, 0, 2, 2, 0,
        3, 2], device='cuda:0')
10.0
tensor(1.0146, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 3, 1, 4, 2, 4, 3, 2, 1, 0, 2, 4, 0, 3, 3, 3, 3, 3, 3, 4, 3, 3, 4, 4, 2, 2, 3, 3, 1, 2, 3, 2, 1, 3, 0, 2, 4, 3, 2, 1, 3, 2, 4, 3, 2, 1, 3, 2, 4, 2]
 val_label: tensor([4, 3, 1, 4, 2, 4, 4, 2, 3, 1, 1, 4, 0, 4, 3, 3, 3, 3, 2, 4, 3, 0, 0, 4,
        4, 2, 3, 4, 1, 3, 3, 4, 1, 2, 0, 2, 4, 2, 2, 1, 3, 2, 2, 4, 2, 1, 4, 3,
        4, 3], device='cuda:0')
10.0
tensor(0.9505, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [2, 0, 1, 1, 2, 1, 4, 0, 3, 1, 3, 3, 2, 4, 2, 3, 1, 1, 2, 4, 1, 3, 3, 2, 2, 2, 4, 3, 4, 3, 3, 2, 4, 1, 1, 2, 3, 3, 4, 0, 3, 

val_output: [2, 3, 2, 4, 2, 4, 2, 2, 3, 3, 4, 2, 4, 3, 4, 1, 4, 0, 2, 2, 3, 4, 4, 1, 1, 3, 4, 3, 1, 4, 4, 4, 1, 4, 3, 4, 2, 2, 4, 2, 0, 3, 4, 3, 4, 3, 2, 1, 3, 2]
 val_label: tensor([2, 3, 1, 4, 2, 4, 2, 3, 1, 3, 4, 2, 4, 3, 4, 2, 4, 0, 3, 2, 3, 0, 4, 1,
        0, 3, 0, 3, 3, 1, 3, 4, 1, 4, 1, 4, 2, 4, 4, 4, 0, 3, 4, 3, 2, 3, 2, 1,
        3, 1], device='cuda:0')
10.0
tensor(0.8846, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 4, 3, 3, 3, 2, 3, 3, 2, 2, 3, 3, 4, 3, 0, 3, 3, 4, 4, 0, 3, 4, 3, 0, 3, 0, 2, 1, 2, 3, 3, 2, 0, 0, 2, 0, 2, 4, 0, 3, 3, 3, 3, 3, 3, 1, 3, 3, 3, 1]
 val_label: tensor([4, 4, 0, 2, 3, 2, 4, 3, 2, 3, 1, 2, 4, 3, 0, 3, 3, 3, 4, 0, 4, 4, 1, 0,
        4, 0, 3, 1, 2, 4, 4, 1, 0, 0, 3, 0, 3, 4, 0, 4, 1, 3, 3, 3, 0, 1, 3, 3,
        4, 0], device='cuda:0')
10.0
tensor(1.0238, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 4, 2, 3, 4, 3, 0, 1, 2, 2, 2, 1, 1, 0, 4, 2, 1, 4, 4, 3, 1, 4, 3, 1, 4, 4, 3, 3, 2, 3, 0, 3, 3, 4, 1, 3, 4, 4, 2, 1, 3, 

val_output: [2, 4, 3, 2, 4, 3, 4, 2, 2, 2, 3, 4, 3, 2, 1, 1, 3, 3, 4, 4, 3, 2, 2, 4, 3, 3, 1, 4, 0, 2, 1, 4, 4, 1, 4, 4, 1, 0, 3, 1, 2, 4, 0, 2, 2, 1, 4, 2, 4, 4]
 val_label: tensor([1, 0, 3, 1, 3, 3, 4, 4, 2, 2, 3, 3, 3, 4, 4, 1, 3, 1, 3, 4, 3, 2, 0, 4,
        3, 3, 3, 4, 4, 3, 1, 3, 4, 1, 0, 4, 3, 0, 3, 1, 2, 0, 0, 2, 2, 1, 4, 1,
        4, 0], device='cuda:0')
10.0
tensor(1.1297, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 0, 3, 4, 2, 3, 4, 0, 3, 1, 4, 3, 3, 2, 0, 3, 0, 2, 3, 3, 3, 2, 1, 4, 0, 2, 0, 1, 1, 4, 0, 0, 0, 4, 3, 4, 4, 4, 2, 4, 1, 4, 4, 0, 2, 4, 3, 3, 3, 3]
 val_label: tensor([3, 0, 3, 4, 2, 4, 4, 0, 3, 0, 4, 4, 0, 2, 0, 3, 1, 3, 1, 1, 1, 2, 1, 4,
        0, 3, 0, 1, 3, 4, 0, 0, 0, 1, 3, 4, 0, 4, 3, 4, 1, 4, 4, 0, 2, 4, 3, 1,
        0, 3], device='cuda:0')
10.0
tensor(0.9598, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [0, 2, 4, 2, 1, 4, 3, 3, 3, 0, 4, 4, 3, 4, 3, 3, 3, 2, 3, 3, 2, 3, 3, 2, 1, 4, 1, 2, 4, 2, 4, 0, 4, 4, 3, 3, 2, 2, 3, 4, 2, 

val_output: [3, 2, 3, 3, 2, 2, 3, 2, 2, 2, 4, 3, 3, 3, 3, 3, 4, 2, 4, 2, 2, 3, 3, 4, 2, 2, 4, 3, 4, 3, 0, 2, 4, 2, 4, 3, 3, 2, 4, 3, 4, 3, 3, 4, 2, 3, 2, 3, 2, 2]
 val_label: tensor([3, 2, 0, 1, 2, 3, 3, 3, 2, 3, 4, 1, 3, 4, 3, 4, 4, 2, 4, 1, 2, 3, 4, 3,
        4, 3, 4, 4, 4, 3, 0, 0, 4, 3, 1, 3, 2, 3, 4, 3, 4, 1, 4, 2, 4, 2, 2, 1,
        2, 2], device='cuda:0')
8.0
tensor(1.0754, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 4, 3, 2, 3, 1, 0, 4, 1, 1, 3, 2, 3, 1, 3, 4, 2, 2, 4, 4, 4, 4, 4, 3, 4, 2, 1, 3, 4, 3, 1, 3, 0, 2, 0, 4, 0, 2, 2, 3, 3, 1, 2, 3, 4, 1, 0, 1, 3, 1]
 val_label: tensor([4, 4, 3, 2, 0, 3, 0, 0, 4, 0, 3, 2, 2, 0, 4, 3, 1, 2, 3, 4, 4, 3, 4, 1,
        4, 3, 0, 0, 4, 2, 1, 1, 0, 2, 2, 4, 0, 2, 2, 0, 4, 3, 2, 1, 4, 1, 0, 1,
        2, 1], device='cuda:0')
10.0
tensor(1.2352, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 4, 2, 1, 2, 1, 2, 2, 4, 2, 2, 4, 2, 1, 0, 3, 1, 3, 4, 2, 4, 1, 3, 2, 1, 4, 3, 2, 3, 2, 3, 3, 3, 0, 3, 1, 3, 1, 4, 3, 0, 4

val_output: [0, 4, 2, 2, 2, 1, 0, 2, 3, 3, 1, 3, 0, 3, 4, 3, 3, 1, 2, 4, 3, 2, 4, 1, 2, 4, 1, 0, 4, 3, 0, 0, 2, 4, 3, 3, 1, 3, 1, 4, 4, 3, 3, 3, 0, 2, 3, 3, 1, 2]
 val_label: tensor([0, 4, 2, 1, 2, 1, 0, 2, 4, 3, 0, 3, 0, 3, 3, 4, 0, 3, 4, 4, 3, 2, 4, 3,
        2, 4, 1, 1, 4, 4, 1, 0, 2, 4, 4, 3, 1, 0, 1, 3, 4, 3, 1, 1, 0, 2, 3, 1,
        3, 3], device='cuda:0')
10.0
tensor(1.0116, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [0, 1, 2, 1, 3, 1, 4, 1, 2, 4, 3, 3, 4, 1, 4, 4, 4, 4, 1, 3, 4, 2, 4, 4, 4, 2, 1, 3, 2, 2, 0, 0, 1, 3, 1, 3, 3, 3, 2, 1, 0, 3, 3, 1, 3, 2, 2, 2, 4, 2]
 val_label: tensor([1, 1, 0, 1, 2, 1, 3, 0, 2, 4, 4, 3, 4, 1, 4, 4, 4, 1, 1, 3, 4, 1, 3, 0,
        4, 2, 1, 2, 2, 4, 0, 0, 1, 2, 1, 3, 3, 3, 2, 3, 0, 1, 3, 0, 3, 2, 3, 3,
        4, 2], device='cuda:0')
10.0
tensor(1.0529, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [3, 4, 3, 4, 0, 3, 3, 3, 1, 1, 2, 2, 3, 4, 3, 3, 1, 4, 3, 2, 4, 3, 2, 4, 3, 4, 0, 3, 4, 2, 2, 3, 3, 4, 0, 1, 3, 2, 4, 3, 4, 

val_output: [2, 4, 3, 4, 3, 3, 0, 4, 3, 2, 3, 2, 3, 3, 0, 4, 3, 3, 1, 2, 3, 4, 2, 3, 4, 3, 2, 3, 4, 1, 2, 2, 3, 4, 3, 4, 2, 3, 3, 3, 2, 4, 1, 2, 2, 4, 4, 3, 2, 0]
 val_label: tensor([1, 4, 1, 4, 1, 3, 0, 2, 0, 2, 4, 3, 3, 3, 0, 3, 0, 3, 1, 2, 1, 4, 2, 1,
        3, 3, 2, 1, 2, 0, 3, 2, 3, 4, 3, 4, 2, 3, 3, 1, 2, 2, 3, 2, 3, 4, 4, 0,
        3, 0], device='cuda:0')
10.0
tensor(1.1597, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [0, 3, 3, 3, 4, 1, 0, 2, 3, 3, 2, 4, 4, 1, 3, 3, 0, 2, 3, 1, 2, 4, 2, 2, 4, 2, 3, 3, 2, 4, 3, 3, 1, 4, 0, 4, 1, 2, 1, 4, 4, 2, 3, 4, 2, 2, 1, 4, 3, 4]
 val_label: tensor([0, 3, 2, 0, 4, 1, 0, 2, 3, 3, 0, 4, 1, 1, 3, 4, 0, 3, 3, 1, 2, 3, 2, 2,
        4, 3, 2, 2, 1, 3, 3, 3, 1, 4, 0, 4, 3, 1, 1, 3, 4, 2, 1, 4, 2, 2, 1, 4,
        2, 2], device='cuda:0')
10.0
tensor(0.8924, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [4, 1, 3, 4, 3, 2, 2, 3, 2, 0, 2, 3, 3, 1, 3, 3, 4, 4, 4, 3, 2, 1, 3, 3, 2, 2, 4, 4, 2, 2, 3, 3, 4, 3, 2, 4, 2, 2, 3, 2, 3, 

val_output: [2, 4, 1, 2, 1, 2, 4, 4, 1, 1, 1, 1, 3, 3, 0, 0, 2, 0, 4, 4, 3, 3, 4, 3, 1, 4, 3, 2, 3, 3, 3, 0, 3, 1, 1, 0, 3, 4, 2, 3, 4, 4, 3, 2, 3, 4, 2, 3, 0, 4]
 val_label: tensor([2, 4, 1, 2, 0, 2, 4, 0, 1, 1, 4, 1, 3, 0, 0, 0, 4, 3, 1, 4, 4, 3, 2, 3,
        1, 4, 2, 0, 3, 0, 3, 2, 3, 1, 1, 1, 3, 4, 2, 3, 3, 3, 0, 2, 3, 4, 4, 4,
        2, 3], device='cuda:0')
10.0
tensor(1.0645, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [0, 4, 1, 2, 3, 3, 3, 2, 3, 2, 3, 3, 3, 3, 4, 4, 3, 4, 0, 4, 1, 4, 2, 0, 4, 3, 4, 4, 4, 2, 1, 2, 1, 0, 3, 4, 0, 4, 3, 4, 4, 4, 3, 3, 2, 3, 1, 4, 0, 3]
 val_label: tensor([0, 4, 1, 4, 0, 3, 3, 2, 3, 2, 3, 1, 3, 4, 4, 4, 3, 4, 0, 4, 1, 4, 4, 0,
        4, 3, 1, 4, 3, 2, 2, 1, 0, 0, 0, 4, 0, 0, 4, 3, 4, 4, 2, 2, 3, 4, 0, 4,
        0, 3], device='cuda:0')
10.0
tensor(0.9257, device='cuda:0', grad_fn=<NllLossBackward0>)
val_output: [1, 3, 4, 4, 3, 2, 2, 4, 3, 3, 2, 1, 3, 4, 4, 4, 2, 3, 1, 4, 3, 4, 1, 1, 3, 2, 2, 2, 3, 3, 1, 4, 3, 3, 2, 3, 2, 3, 3, 0, 1, 

## Making Predictions
### Prediction 
Okay, now that you have a trained model, try it on some new twits and see if it works appropriately. Remember that for any new text, you'll need to preprocess it first before passing it to the network. Implement the `predict` function to generate the prediction vector from a message.

In [51]:
def predict(text, model, vocab):
    """ 
    Make a prediction on a single sentence.

    Parameters
    ----------
        text : The string to make a prediction on.
        model : The model to use for making the prediction.
        vocab : Dictionary for word to word ids. The key is the word and the value is the word id.

    Returns
    -------
        pred : Prediction vector of log-probabilities, one entry per label.
               Apply torch.exp to it to get probabilities between 0 and 1.
    """    
    
    # TODO Implement
    
    tokens = preprocess(text)
    # Filter non-vocab words (filtered_words is defined earlier in the notebook)
    tokens = [word for word in tokens if word in filtered_words]
    # Convert words to ids; the outer list makes this a batch of one message
    tokens = [[vocab[word] for word in tokens if word in vocab]]

    # Run on whichever device the model currently lives on
    model_device = next(model.parameters()).device

    # inference_loader adds the batch dimension; one message yields one batch
    for text_input in inference_loader(tokens, 1, sequence_length, False):
        # Tensor.to() returns a new tensor, so the result has to be assigned
        text_input = text_input.to(model_device)
        hidden = tuple(each.to(model_device) for each in model.init_hidden(1))
        # Get the NN output
        logps, _ = model(text_input, hidden)
        # Return the log-probabilities; the caller takes the exponent
        # to get a range of 0 to 1 for each label.
        return logps

In [53]:
text = "Google is working on self driving cars, I'm bullish on $goog"
model.eval()
model.to("cpu")
torch.exp(predict(text, model, vocab))

tensor([[0.1156, 0.1770, 0.2216, 0.3068, 0.1790]], grad_fn=<ExpBackward0>)
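
The call above wraps `predict` in `torch.exp`, which suggests that the network outputs log-probabilities. The training log shows an `NllLoss` gradient function, which is normally paired with a log-softmax output layer, but that pairing is an assumption here. As a minimal sketch in plain Python, using made-up logits and a hypothetical helper `log_softmax` that is not part of the notebook, exponentiating log-softmax values gives a vector that sums to 1:

```python
import math

def log_softmax(logits):
    """Log-softmax of a list of raw scores (hypothetical helper, not the notebook's model code)."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    return [x - log_sum for x in logits]

# Made-up raw scores for the five sentiment classes (illustration only)
logits = [-0.5, 0.1, 0.3, 0.7, 0.2]
log_probs = log_softmax(logits)

# Exponentiating the log-probabilities recovers ordinary probabilities
probs = [math.exp(lp) for lp in log_probs]

print(round(sum(probs), 6))
```

Every log-probability is negative, which is why the raw output of `predict` is hard to read until it has been passed through the exponent.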

### Questions: What is the prediction of the model? What is the uncertainty of the prediction?
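The model predicts class 3 on the 0-4 scale, which corresponds to positive sentiment, but with a probability of only about 31%. Summing classes 3 and 4, the model is about 49% sure that the twit is positive; summing classes 0 and 1, it is about 29% sure that it is negative. The remaining 22% falls on neutral, so the prediction is fairly uncertain.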
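
The arithmetic behind this reading can be checked with a short sketch. The probabilities are copied from the output above. The mapping of class 0 to very negative and class 4 to very positive is an assumption, based on the -2 to 2 labels being shifted to 0-4.

```python
# Probabilities copied from the prediction output above.
# Assumed class order: 0 = very negative ... 4 = very positive.
probs = [0.1156, 0.1770, 0.2216, 0.3068, 0.1790]

# The predicted class is the index with the largest probability
predicted_class = max(range(len(probs)), key=lambda i: probs[i])

# Group the five classes into negative, neutral, and positive sentiment
p_negative = probs[0] + probs[1]
p_neutral = probs[2]
p_positive = probs[3] + probs[4]

print(predicted_class)
print(round(p_negative, 4), round(p_neutral, 4), round(p_positive, 4))
```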

Now that we have a trained model, we can make predictions. We can use the model to track the sentiment of various stocks by predicting the sentiment of twits as they come in. Given a stream of twits, pull out the stocks mentioned in each twit and keep track of their sentiment. Remember that in the twits, ticker symbols are written as a dollar sign followed by 2-4 capital letters, like $AAPL. Ideally, you'd want to track the sentiment of the stocks in your universe and use this as a signal in your larger model(s).
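
The ticker format described above maps directly onto a regular expression. The following is a minimal sketch; the names `TICKER_PATTERN` and `extract_tickers` are hypothetical, and the trailing `\b` is an addition in this sketch that rejects runs of more than four letters instead of matching their first four.

```python
import re

# A dollar sign followed by 2-4 capital letters.
# The trailing \b (an addition in this sketch) rejects longer runs such as "$GOOGL".
TICKER_PATTERN = re.compile(r'\$[A-Z]{2,4}\b')

def extract_tickers(text):
    """Return the ticker symbols mentioned in a twit, in order of appearance."""
    return TICKER_PATTERN.findall(text)

print(extract_tickers("Long $AAPL and $FB, watching $GOOGL and $goog"))
```

Lowercase mentions such as `$goog` are not matched, so a twit that does not follow the all-caps convention contributes no symbols.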

## Testing
### Load the Data 

In [54]:
with open(os.path.join('..', '..', 'data', 'project_6_stocktwits', 'test_twits.json'), 'r') as f:
    test_data = json.load(f)

### Twit Stream

In [55]:
def twit_stream():
    for twit in test_data['data']:
        yield twit

next(twit_stream())

{'message_body': '$JWN has moved -1.69% on 10-31. Check out the movement and peers at  https://dividendbot.com?s=JWN',
 'timestamp': '2018-11-01T00:00:05Z'}

Let's apply the `predict` function to a stream of twits.

In [62]:
def score_twits(stream, model, vocab, universe):
    """ 
    Given a stream of twits and a universe of tickers, return sentiment scores for tickers in the universe.
    """
    for twit in stream:

        # Get the message text
        text = twit['message_body']
        # Raw string so that the backslash reaches the regex engine unchanged
        symbols = re.findall(r'\$[A-Z]{2,4}', text)
        score = torch.exp(predict(text, model, vocab))

        for symbol in symbols:
            if symbol in universe:
                yield {'symbol': symbol, 'score': score, 'timestamp': twit['timestamp']}

In [63]:
universe = {'$BBRY', '$AAPL', '$AMZN', '$BABA', '$YHOO', '$LQMT', '$FB', '$GOOG', '$BBBY', '$JNUG', '$SBUX', '$MU'}
score_stream = score_twits(twit_stream(), model, vocab, universe)

next(score_stream)

{'symbol': '$AAPL',
 'score': tensor([[0.1081, 0.1647, 0.2310, 0.2948, 0.2015]], grad_fn=<ExpBackward0>),
 'timestamp': '2018-11-01T00:00:18Z'}
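
To use these scores as a signal, each probability vector can be reduced to a single number and averaged per symbol. The sketch below is one possible approach, not part of the project. The helper names and the `SENTIMENT_VALUES` mapping of classes 0-4 back to the -2 to 2 labels are assumptions, and plain lists stand in for the tensors.

```python
from collections import defaultdict

# Assumed mapping of class indices 0-4 back to the -2..2 sentiment labels
SENTIMENT_VALUES = [-2, -1, 0, 1, 2]

def expected_sentiment(probs):
    """Probability-weighted average sentiment, between -2 and 2."""
    return sum(p * v for p, v in zip(probs, SENTIMENT_VALUES))

def average_by_symbol(scored):
    """Average the expected sentiment per symbol over (symbol, probs) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for symbol, probs in scored:
        totals[symbol] += expected_sentiment(probs)
        counts[symbol] += 1
    return {symbol: totals[symbol] / counts[symbol] for symbol in totals}

# The $AAPL vector is copied from the output above; the $FB vector is made up.
scored = [
    ('$AAPL', [0.1081, 0.1647, 0.2310, 0.2948, 0.2015]),
    ('$FB', [0.2, 0.2, 0.2, 0.2, 0.2]),
]
signal = average_by_symbol(scored)
print(signal)
```

With the real stream, each `item['score']` is a tensor of shape `(1, 5)`, so it would first need to be converted with something like `item['score'].squeeze(0).tolist()`.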

That's it. You have successfully built a model for sentiment analysis! 

## Submission
Now that you're done with the project, it's time to submit it. Click the submit button in the bottom right. One of our reviewers will give you feedback on your project with a pass or not passed grade. You can continue to the next section while you wait for feedback.