## Weighted K-Fold Resnet34 with fast.ai

- The first notebook of this experiment was a simple baseline using resnet34: [Fast Resnet34 with Fastai](https://www.kaggle.com/code/fmussari/fast-resnet34-with-fastai)  
- In this notebook we explore that same simple model with K-fold cross-validation to see how it affects the results.
  


### Acknowledgements

**fastai course:**
- [Practical Deep Learning for Coders (a UQ collaboration with fast.ai)](https://itee.uq.edu.au/event/2022/practical-deep-learning-coders-uq-fastai)  

**Jeremy's Notebook Series:**
- [First Steps: Road to the Top, Part 1](https://www.kaggle.com/code/jhoward/first-steps-road-to-the-top-part-1)
- [Small models: Road to the Top, Part 2](https://www.kaggle.com/code/jhoward/small-models-road-to-the-top-part-2)
- [Scaling Up: Road to the Top, Part 3](https://www.kaggle.com/code/jhoward/scaling-up-road-to-the-top-part-3)
- [Multi-target: Road to the Top, Part 4](https://www.kaggle.com/code/jhoward/multi-target-road-to-the-top-part-4)

### K-Fold

- K-Fold is a technique in which we divide our data into K parts (folds). In this case we are going to divide the dataset into 5.
- We will run 5 trainings, each one with a different validation set: each time we hold out one fold for validation and train on the other four.
- In this way we end up with 5 models trained on slightly different data and validated on completely disjoint sets.
- We don't know which of them is the best model, but we can assume that averaging their predictions will generalize better than any single one.
- We are going to use [KFold from sklearn](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html).
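To make the splitting idea concrete, here is a minimal plain-Python sketch of K-fold index splitting. It is a hand-rolled illustration, not sklearn's implementation, and `kfold_indices` is a hypothetical helper name.

```python
import random

def kfold_indices(n_samples, n_splits=5, seed=42):
    """Yield (train_idx, valid_idx) pairs; for illustration only."""
    idxs = list(range(n_samples))
    random.Random(seed).shuffle(idxs)
    # Spread the remainder over the first folds so sizes differ by at most 1
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for k in range(n_splits):
        size = base + (1 if k < extra else 0)
        valid = idxs[start:start + size]
        train = idxs[:start] + idxs[start + size:]
        yield train, valid
        start += size

folds = list(kfold_indices(23, n_splits=5))
# Every sample is used for validation exactly once across the 5 folds
all_valid = sorted(i for _, valid in folds for i in valid)
```

Each sample appears in exactly one validation fold and in the training set of the other four, which is why the five validation sets are disjoint.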

### Installing the libraries


In [4]:
# fastkaggle allows you to work locally and then submit the results and notebook to Kaggle

try: import fastkaggle

except ModuleNotFoundError:
    !pip install -Uq fastkaggle

from fastkaggle import *
from sklearn.model_selection import KFold

In [6]:
competition = 'paddy-disease-classification'
path = setup_comp(competition, install='fastai')

from fastai.vision.all import *
from scipy.special import softmax, log_softmax
#set_seed(42)

### Setting data paths

In [3]:
# train images
train_path = path / 'train_images'
train_files = get_image_files(train_path)

# test images
test_path = path/'test_images'
test_files = get_image_files(test_path).sorted()

# sample submission
sample_submission = pd.read_csv(path/'sample_submission.csv')

# train labels
train_df = pd.read_csv(path / 'train.csv')


#### Let's see the target distribution

In [4]:
train_df.label.value_counts()

normal                      1764
blast                       1738
hispa                       1594
dead_heart                  1442
tungro                      1088
brown_spot                   965
downy_mildew                 620
bacterial_leaf_blight        479
bacterial_leaf_streak        380
bacterial_panicle_blight     337
Name: label, dtype: int64

### K-Fold
- If we apply sklearn's KFold to the whole dataset, we could end up with 5 folds whose target distribution differs from that of the full dataset.
- We could use [StratifiedKFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.StratifiedKFold.html#sklearn.model_selection.StratifiedKFold) instead to ensure that each fold preserves the percentage of samples of each target class.
- Alternatively, we can apply K-Fold to each target class separately. If we split each class into 5 random parts, every fold ends up with the same class distribution as the full dataset.
- To keep track of which image belongs to which fold, we create a field called `kfold` and initialize it with -1:

In [6]:
train_df['kfold'] = -1

# Number of Splits
n_split=5

fold = -1
kfold = 1

# For each label we are going to create 5 folds
for i, label in enumerate(train_df.label.unique()):
    
    kf = KFold(
        n_splits=n_split, 
        random_state=42+i, 
        shuffle=True
    )
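    # Boolean mask for the current label (avoids shadowing the built-in `filter`)
    mask = train_df.label == label
    # Indices for each label
    label_idxs = train_df[mask].index
    
    # Alternate the direction in which fold numbers are assigned (1..5, then 5..1, ...),
    # so the slightly larger folds that KFold produces first are spread over all fold numbers
    fold *= -1
    if kfold==n_split+1: kfold=n_split
    if kfold==0: kfold=1

    for _, valid_index in kf.split(label_idxs):
    
        df_index = label_idxs[valid_index]
        train_df.loc[df_index, 'kfold'] = kfold
        kfold += fold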

In [7]:
train_df.kfold.value_counts()

4    2083
3    2082
5    2081
2    2081
1    2080
Name: kfold, dtype: int64

In [9]:
train_df.sample(10)

Unnamed: 0,image_id,label,variety,age,kfold
192,107251.jpg,bacterial_leaf_blight,ADT45,60,2
3525,107270.jpg,brown_spot,ADT45,70,4
7183,105392.jpg,hispa,AtchayaPonni,65,3
5920,100725.jpg,downy_mildew,Zonal,55,2
193,108257.jpg,bacterial_leaf_blight,ADT45,60,2
3147,108557.jpg,brown_spot,ADT45,55,3
7801,100152.jpg,normal,ADT45,60,1
4108,100136.jpg,dead_heart,ADT45,70,4
2514,107002.jpg,blast,KarnatakaPonni,75,5
3433,104167.jpg,brown_spot,ADT45,70,4


- Now we have a field that specifies the fold each image belongs to.
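The per-class assignment can be sketched in plain Python on toy labels. This is only an illustration of the idea under simplified assumptions (round-robin assignment instead of sklearn's `KFold`); `assign_folds` and the toy label counts are hypothetical.

```python
import random
from collections import Counter, defaultdict

def assign_folds(labels, n_splits=5, seed=42):
    """Return a fold number (1..n_splits) per sample, splitting class by class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [None] * len(labels)
    for i, (label, idxs) in enumerate(by_class.items()):
        random.Random(seed + i).shuffle(idxs)
        # Alternate the fold order per class so leftover samples
        # do not always land in the lowest-numbered folds
        order = list(range(1, n_splits + 1))
        if i % 2 == 1:
            order.reverse()
        for pos, idx in enumerate(idxs):
            folds[idx] = order[pos % n_splits]
    return folds

labels = ["normal"] * 17 + ["blast"] * 13 + ["hispa"] * 11
folds = assign_folds(labels)
per_fold = Counter(folds)                     # total samples per fold
per_class_fold = Counter(zip(labels, folds))  # samples per (class, fold)
```

Within each class the fold sizes differ by at most one sample, so every fold keeps roughly the class proportions of the full dataset.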

### Dataloaders for fastai training


In [10]:
img2valid = []

for fold in train_df.kfold.unique():
    # Mark the images of the current fold as validation
    train_df['is_valid'] = train_df.kfold == fold
    
    # Map image file name -> is_valid for this fold
    img2valid.append(dict(zip(train_df.image_id, train_df.is_valid)))

In [11]:
len(img2valid)

5

In [12]:
#def get_split(p):
#    return img2valid[i_fold][p.name]

In [13]:
def custom_loss(inp, disease): 
    # `label_smoothing` is read from the notebook's global scope; it is set before each training run
    return F.cross_entropy(inp, disease, label_smoothing=label_smoothing)

In [14]:
def get_datablock(i_fold):
    
    def get_split(p):
        return img2valid[i_fold][p.name]
    
    dblock = DataBlock(
        blocks=(ImageBlock, CategoryBlock),
        get_items=get_image_files,
        get_y=parent_label,
        # pass get_split function
        splitter = FuncSplitter(get_split),
        item_tfms=Resize(480, method='squish'),
        batch_tfms=aug_transforms(size=224, min_scale=0.75)
    )
    return dblock.dataloaders(train_path)
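As a rough picture of what a function-based splitter does with `get_split`, the sketch below applies a predicate to each item and collects training and validation positions. It mimics the idea in plain Python and is not fastai's implementation; `split_by_func` and `toy_img2valid` are hypothetical names.

```python
from pathlib import Path

# Toy stand-in for one entry of `img2valid`: file name -> is it in the validation fold?
toy_img2valid = {
    "100001.jpg": False,
    "100002.jpg": True,
    "100003.jpg": False,
    "100004.jpg": True,
}

def split_by_func(items, is_valid):
    """Return (train_positions, valid_positions) for a list of items."""
    flags = [bool(is_valid(item)) for item in items]
    train = [i for i, flag in enumerate(flags) if not flag]
    valid = [i for i, flag in enumerate(flags) if flag]
    return train, valid

items = [Path("train_images/blast") / name for name in toy_img2valid]
# Same lookup as `get_split`: only the file name is used as the key
train_pos, valid_pos = split_by_func(items, lambda p: toy_img2valid[p.name])
```

Because the lookup uses only `p.name`, the mapping works regardless of which class folder the image sits in.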

In [15]:
predictions = []
tta_predictions = []

### Train Fold 0

In [16]:
dls = get_datablock(i_fold=0)

In [17]:
label_smoothing = 0
learn = vision_learner(dls, resnet34, metrics=error_rate, loss_func=custom_loss).to_fp16()

In [18]:
learn.fine_tune(20, 0.005)

epoch,train_loss,valid_loss,error_rate,time
0,1.776091,1.036593,0.338462,01:17


epoch,train_loss,valid_loss,error_rate,time
0,0.730779,0.466977,0.156731,01:21
1,0.438077,0.310089,0.095673,01:19
2,0.372175,0.302429,0.094231,01:17
3,0.310673,0.283511,0.0875,01:17
4,0.332722,0.46728,0.126923,01:17
5,0.295306,0.355379,0.100481,01:17
6,0.252319,0.296746,0.081731,01:18
7,0.217448,0.18664,0.05625,01:27
8,0.180768,0.242256,0.070673,01:18
9,0.186903,0.210754,0.065385,01:19


In [20]:
dls.valid.items

[Path('paddy-disease-classification/train_images/bacterial_leaf_blight/100162.jpg'),
 Path('paddy-disease-classification/train_images/bacterial_leaf_blight/100248.jpg'),
 Path('paddy-disease-classification/train_images/bacterial_leaf_blight/100330.jpg'),
 Path('paddy-disease-classification/train_images/bacterial_leaf_blight/100523.jpg'),
 Path('paddy-disease-classification/train_images/bacterial_leaf_blight/100541.jpg'),
 Path('paddy-disease-classification/train_images/bacterial_leaf_blight/100582.jpg'),
 Path('paddy-disease-classification/train_images/bacterial_leaf_blight/100760.jpg'),
 Path('paddy-disease-classification/train_images/bacterial_leaf_blight/100883.jpg'),
 Path('paddy-disease-classification/train_images/bacterial_leaf_blight/100967.jpg'),
 Path('paddy-disease-classification/train_images/bacterial_leaf_blight/101004.jpg'),
 Path('paddy-disease-classification/train_images/bacterial_leaf_blight/101013.jpg'),
 Path('paddy-disease-classification/train_images/bacterial_leaf_b

In [26]:
probs, _ = learn.get_preds(dl=dls.test_dl(test_files))

In [27]:
preds, _ = learn.tta(dl=dls.test_dl(test_files))

In [28]:
predictions.append(probs)
tta_predictions.append(preds)

In [29]:
dls.vocab[preds.argmax(axis=1)]

(#3469) ['hispa','normal','blast','blast','blast','brown_spot','dead_heart','brown_spot','hispa','normal'...]

In [30]:
sample_submission.label = dls.vocab[preds.argmax(axis=1)]
sample_submission.to_csv('22jul14-Fold0.csv', index=False)

In [31]:
#torch.from_numpy(
#    np.take(np.eye(10), probs.argmax(axis=1), axis=0)[0]
#)

### Train Fold 1

In [32]:
dls = get_datablock(i_fold=1)

In [33]:
label_smoothing = 0
learn = vision_learner(dls, resnet34, metrics=error_rate, loss_func=custom_loss).to_fp16()

In [34]:
learn.fine_tune(16, 0.005)

epoch,train_loss,valid_loss,error_rate,time
0,1.751425,0.988502,0.309467,01:14


epoch,train_loss,valid_loss,error_rate,time
0,0.756291,0.427743,0.126382,01:22
1,0.463508,0.288871,0.083614,01:21
2,0.398856,0.365157,0.110043,01:34
3,0.362747,0.27464,0.078328,01:24
4,0.322052,0.295719,0.085536,01:26
5,0.248449,0.393384,0.112446,01:22
6,0.237211,0.222533,0.068717,01:26
7,0.171213,0.267059,0.063431,01:20
8,0.155278,0.221697,0.060548,01:20
9,0.119536,0.176405,0.049015,01:20


In [35]:
probs, _ = learn.get_preds(dl=dls.test_dl(test_files))

In [36]:
preds, _ = learn.tta(dl=dls.test_dl(test_files))

In [37]:
predictions.append(probs)
tta_predictions.append(preds)

In [41]:
kfolds_preds = {
    'predictions': predictions,
    'tta_predictions': tta_predictions
}
save_pickle('22jul_kfold_preds_dict.pkl', kfolds_preds)
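Saving the predictions after each fold lets us continue in a later session. The same save, reload, append pattern can be sketched with the standard library alone; `save_preds`, `load_preds` and the toy values below are hypothetical stand-ins for the helpers and tensors used in this notebook.

```python
import pickle
import tempfile
from pathlib import Path

def save_preds(path, preds_dict):
    with open(path, "wb") as f:
        pickle.dump(preds_dict, f)

def load_preds(path):
    with open(path, "rb") as f:
        return pickle.load(f)

# Toy values standing in for the per-fold prediction tensors
kfolds_preds_demo = {"predictions": [[0.1, 0.9]], "tta_predictions": [[0.2, 0.8]]}

with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "kfold_preds_demo.pkl"
    save_preds(ckpt, kfolds_preds_demo)
    # Later session: reload, append the next fold's predictions, save again
    restored = load_preds(ckpt)
    restored["predictions"].append([0.7, 0.3])
    restored["tta_predictions"].append([0.6, 0.4])
    save_preds(ckpt, restored)
    final = load_preds(ckpt)
```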

In [38]:
sample_submission.label = dls.vocab[preds.argmax(axis=1)]
sample_submission.to_csv('22jul14-Fold1.csv', index=False)

### Train Fold 2

In [25]:
kfolds_preds = load_pickle('22jul_kfold_preds_dict.pkl')

In [13]:
dls = get_datablock(i_fold=2)

In [14]:
dls.valid.items[:5]

[Path('paddy-disease-classification/train_images/bacterial_leaf_blight/100133.jpg'),
 Path('paddy-disease-classification/train_images/bacterial_leaf_blight/100169.jpg'),
 Path('paddy-disease-classification/train_images/bacterial_leaf_blight/100234.jpg'),
 Path('paddy-disease-classification/train_images/bacterial_leaf_blight/100382.jpg'),
 Path('paddy-disease-classification/train_images/bacterial_leaf_blight/100516.jpg')]

In [15]:
label_smoothing = 0
learn = vision_learner(dls, resnet34, metrics=error_rate, loss_func=custom_loss).to_fp16()

In [16]:
learn.fine_tune(16, 0.005)

epoch,train_loss,valid_loss,error_rate,time
0,1.766622,1.05305,0.341499,01:28


epoch,train_loss,valid_loss,error_rate,time
0,0.771126,0.468777,0.158982,01:20
1,0.477258,0.325693,0.097983,01:19
2,0.39292,0.361818,0.106148,01:17
3,0.348329,0.419306,0.115274,01:17
4,0.326039,0.62573,0.154659,01:18
5,0.249776,0.299724,0.091739,01:19
6,0.231993,0.325945,0.097022,01:17
7,0.164781,0.251472,0.067243,01:17
8,0.150395,0.184289,0.049952,01:17
9,0.101905,0.169694,0.042747,01:18


In [17]:
probs, _ = learn.get_preds(dl=dls.test_dl(test_files))

In [18]:
preds, _ = learn.tta(dl=dls.test_dl(test_files))

In [29]:
sample_submission.label = dls.vocab[preds.argmax(axis=1)]
sample_submission.to_csv('22jul14-Fold2.csv', index=False)

In [33]:
kfolds_preds['predictions'].append(probs)
kfolds_preds['tta_predictions'].append(preds)

In [36]:
save_pickle('22jul_kfold_preds_dict.pkl', kfolds_preds)

### Train Fold 3

In [37]:
kfolds_preds = load_pickle('22jul_kfold_preds_dict.pkl')

In [38]:
dls = get_datablock(i_fold=3)

In [39]:
dls.valid.items[:5]

[Path('paddy-disease-classification/train_images/bacterial_leaf_blight/100445.jpg'),
 Path('paddy-disease-classification/train_images/bacterial_leaf_blight/100513.jpg'),
 Path('paddy-disease-classification/train_images/bacterial_leaf_blight/100622.jpg'),
 Path('paddy-disease-classification/train_images/bacterial_leaf_blight/100632.jpg'),
 Path('paddy-disease-classification/train_images/bacterial_leaf_blight/100733.jpg')]

In [40]:
label_smoothing = 0
learn = vision_learner(dls, resnet34, metrics=error_rate, loss_func=custom_loss).to_fp16()

In [41]:
learn.fine_tune(16, 0.005)

epoch,train_loss,valid_loss,error_rate,time
0,1.764098,1.070464,0.337818,01:14


epoch,train_loss,valid_loss,error_rate,time
0,0.755913,0.434233,0.138395,01:19
1,0.440407,0.300845,0.093224,01:20
2,0.373542,0.317165,0.090341,01:22
3,0.347945,0.311864,0.102355,01:20
4,0.310837,0.361959,0.111965,01:18
5,0.268365,0.212954,0.069678,01:19
6,0.240314,0.230592,0.070639,01:20
7,0.193237,0.296935,0.084094,01:19
8,0.162659,0.191717,0.05382,01:19
9,0.10935,0.14916,0.03604,01:19


In [42]:
probs, _ = learn.get_preds(dl=dls.test_dl(test_files))

In [43]:
preds, _ = learn.tta(dl=dls.test_dl(test_files))

In [44]:
sample_submission.label = dls.vocab[preds.argmax(axis=1)]
sample_submission.to_csv('22jul14-Fold3.csv', index=False)

In [46]:
kfolds_preds['predictions'].append(probs)
kfolds_preds['tta_predictions'].append(preds)

In [47]:
save_pickle('22jul_kfold_preds_dict.pkl', kfolds_preds)

### Train Fold 4

In [48]:
kfolds_preds = load_pickle('22jul_kfold_preds_dict.pkl')

In [49]:
dls = get_datablock(i_fold=4)

In [50]:
dls.valid.items[:5]

[Path('paddy-disease-classification/train_images/bacterial_leaf_blight/100023.jpg'),
 Path('paddy-disease-classification/train_images/bacterial_leaf_blight/100049.jpg'),
 Path('paddy-disease-classification/train_images/bacterial_leaf_blight/100148.jpg'),
 Path('paddy-disease-classification/train_images/bacterial_leaf_blight/100268.jpg'),
 Path('paddy-disease-classification/train_images/bacterial_leaf_blight/100289.jpg')]

In [51]:
label_smoothing = 0
learn = vision_learner(dls, resnet34, metrics=error_rate, loss_func=custom_loss).to_fp16()

In [52]:
learn.fine_tune(16, 0.005)

epoch,train_loss,valid_loss,error_rate,time
0,1.761463,0.978713,0.31109,01:24


epoch,train_loss,valid_loss,error_rate,time
0,0.794363,0.401969,0.12722,01:29
1,0.463785,0.31099,0.087374,01:30
2,0.378451,0.349881,0.098896,01:29
3,0.359347,0.291711,0.086894,01:33
4,0.306631,0.283286,0.080173,01:22
5,0.269499,0.320978,0.091215,01:23
6,0.21544,0.197314,0.057609,01:31
7,0.189763,0.245345,0.06385,01:23
8,0.156215,0.16961,0.052808,01:28
9,0.106465,0.194469,0.052808,01:26


In [53]:
probs, _ = learn.get_preds(dl=dls.test_dl(test_files))

In [54]:
preds, _ = learn.tta(dl=dls.test_dl(test_files))

In [55]:
sample_submission.label = dls.vocab[preds.argmax(axis=1)]
sample_submission.to_csv('22jul14-Fold4.csv', index=False)

In [56]:
kfolds_preds['predictions'].append(probs)
kfolds_preds['tta_predictions'].append(preds)

In [57]:
save_pickle('22jul_kfold_preds_dict.pkl', kfolds_preds)

### Ensemble

In [60]:
kfolds_preds['tta_predictions'][0][0]

TensorBase([ 0.0596, -2.4420, -3.0153, -3.1522,  1.4467, -2.5806,  0.6302, 13.7930,
         0.2020, -1.6764])

In [65]:
softmax(kfolds_preds['tta_predictions'][0], axis=1)

TensorBase([[1.0856e-06, 8.8968e-08, 5.0151e-08,  ..., 9.9999e-01, 1.2517e-06,
         1.9132e-07],
        [3.8872e-08, 4.8012e-08, 1.7431e-08,  ..., 7.2525e-07, 1.0000e+00,
         8.7696e-08],
        [2.8856e-09, 1.4733e-06, 3.4506e-07,  ..., 8.0411e-06, 1.4984e-06,
         8.8227e-08],
        ...,
        [1.3997e-07, 2.1311e-07, 1.2382e-07,  ..., 1.1558e-06, 1.0000e+00,
         4.0590e-07],
        [1.1088e-08, 1.0000e+00, 6.5214e-08,  ..., 1.6343e-07, 1.9208e-07,
         2.7024e-07],
        [1.7107e-09, 1.6672e-11, 5.6055e-09,  ..., 4.5147e-11, 4.6310e-11,
         1.2087e-11]])

In [67]:
tta_predictions = [softmax(each, axis=1) for each in kfolds_preds['tta_predictions']]

In [72]:
avg_pr = torch.stack(tta_predictions).mean(0)

In [76]:
avg_pr.shape

torch.Size([3469, 10])

In [77]:
sample_submission.label = dls.vocab[avg_pr.argmax(axis=1)]
sample_submission.to_csv('22jul14-Stacked.csv', index=False)
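The saved predictions look like raw scores (the first row printed above contains values such as 13.7930), so each model's output is passed through softmax before averaging. Here is a small plain-Python sketch of that ensemble step on made-up numbers; the logits, the three-class `toy_vocab` and the helper names are all hypothetical.

```python
import math

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def average_probs(per_model_logits):
    """per_model_logits[m][i] holds the logits of model m for sample i."""
    n_models = len(per_model_logits)
    n_samples = len(per_model_logits[0])
    avg = []
    for i in range(n_samples):
        probs = [softmax(per_model_logits[m][i]) for m in range(n_models)]
        avg.append([sum(col) / n_models for col in zip(*probs)])
    return avg

toy_vocab = ["blast", "hispa", "normal"]
per_model_logits = [
    [[2.0, 0.0, 0.0], [0.0, 0.0, 3.0]],  # model trained on fold A
    [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],  # model trained on fold B
    [[1.5, 0.0, 0.0], [0.0, 2.0, 0.0]],  # model trained on fold C
]
avg = average_probs(per_model_logits)
labels_out = [toy_vocab[row.index(max(row))] for row in avg]
```

Averaging probabilities rather than hard labels lets a confident model outweigh a hesitant one: in the toy data the models disagree on both samples, yet the averaged prediction follows the more confident votes.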

In [None]:
## NON TTA

In [78]:
predictions = [softmax(each, axis=1) for each in kfolds_preds['predictions']]

In [79]:
non_tta_avg_pr = torch.stack(predictions).mean(0)

In [80]:
non_tta_avg_pr.shape

torch.Size([3469, 10])

In [81]:
sample_submission.label = dls.vocab[non_tta_avg_pr.argmax(axis=1)]
sample_submission.to_csv('22jul14-Stacked-noTTA.csv', index=False)

In [84]:
# `predictions` and `tta_predictions` already hold softmax probabilities, so we combine them directly
tta_and_notta_preds = predictions + tta_predictions
tta_and_non_tta_avg_pr = torch.stack(tta_and_notta_preds).mean(0)
tta_and_non_tta_avg_pr.shape

torch.Size([3469, 10])

In [85]:
sample_submission.label = dls.vocab[tta_and_non_tta_avg_pr.argmax(axis=1)]
sample_submission.to_csv('22jul14-Stacked-noTTAandTTA.csv', index=False)

## The experiment ended here


In [219]:
from scipy.special import softmax
softmax(preds2[0])

TensorBase([0.1625, 0.1188, 0.0447, 0.1711, 0.1148, 0.1117, 0.1167, 0.0518, 0.0614,
        0.0466])

In [224]:
softmax(preds2, axis=1)[0]

TensorBase([0.1625, 0.1188, 0.0447, 0.1711, 0.1148, 0.1117, 0.1167, 0.0518, 0.0614,
        0.0466])

In [157]:
def train(arch, lr=0.01, size=224, item_tfms=Resize(480, method='squish'), accum=1, finetune=True, epochs=12, i_fold=0):
      
    # Validation split for the requested fold (same lookup as in get_datablock)
    def get_split(p):
        return img2valid[i_fold][p.name]
    
    dblock = DataBlock(
        blocks=(ImageBlock, CategoryBlock),
        get_items=get_image_files,
        get_y=parent_label,
        
        #splitter=RandomSplitter(0.2, seed=42),
        splitter=FuncSplitter(get_split),
        item_tfms=item_tfms,
        batch_tfms=aug_transforms(size=size, min_scale=0.75)
    )
    dls = dblock.dataloaders(train_path, bs=64//accum)
    
    # Defined for experimentation only; it is not passed to the learner below
    def combine_loss(inp, disease): 
        return F.cross_entropy(inp, disease, label_smoothing=1)
        
    cbs = GradientAccumulation(64) if accum!=1 else []
    
    learn = vision_learner(dls, arch, metrics=error_rate, cbs=cbs).to_fp16()
    
    if finetune:
        learn.fine_tune(epochs, lr)
    else:
        learn.unfreeze()
        learn.fit_one_cycle(epochs, lr)
    # Return the learner in both branches
    return learn#.tta(dl=dls.test_dl(tst_files))

In [158]:
learn = train(
    'resnet34',
    lr=0.01,
    epochs=2,
)

epoch,train_loss,valid_loss,error_rate,time
0,3.158795,2.252595,0.765978,01:33


epoch,train_loss,valid_loss,error_rate,time
0,2.623206,2.098186,0.728976,01:26
1,2.454458,2.040464,0.709755,01:24


### Create a learner and train

### Predictions and Test Time Augmentation

Let's compare the error rates on the validation set obtained with the normal prediction function and with a technique called Test Time Augmentation (TTA). As you'll see, TTA is easy with fastai.

In [9]:
# Get predictions on validation set
probs, target = learn.get_preds(dl=dls.valid)
error_rate(probs, target)

TensorBase(0.0255)

In [10]:
# Get TTA predictions on validation set
probs, target = learn.tta(dl=dls.valid)
error_rate(probs, target)

TensorBase(0.0231)

So TTA gives a small boost: the error rate drops from 0.0255 to 0.0231.
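Conceptually, TTA predicts on several augmented versions of each image and combines those predictions with the prediction on the un-augmented image. The sketch below shows one way to blend them, assuming a linear interpolation with weight `beta=0.25` on the plain prediction, which is my understanding of fastai's defaults; `tta_blend` and the numbers are hypothetical.

```python
def tta_blend(plain_pred, aug_preds, beta=0.25):
    """Blend the plain prediction with the mean of the augmented predictions."""
    n = len(aug_preds)
    aug_mean = [sum(col) / n for col in zip(*aug_preds)]
    # Linear interpolation: aug_mean + beta * (plain - aug_mean)
    return [a + beta * (p - a) for p, a in zip(plain_pred, aug_mean)]

plain = [1.0, 0.0]                   # prediction on the original image
augmented = [[0.0, 1.0], [0.5, 0.5]]  # predictions on two augmented copies
blended = tta_blend(plain, augmented)
```

The augmented views can outvote a single overconfident prediction, which is one plausible reason for the lower validation error.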

### Predictions on test set

In [11]:
# TTA predictions from test images
probs, _ = learn.tta(dl=dls.test_dl(test_files))

In [12]:
# get the index with the greater probability
preds = probs.argmax(dim=1)

In [13]:
dls.vocab[preds]

(#3469) ['hispa','normal','blast','blast','blast','brown_spot','dead_heart','brown_spot','hispa','normal'...]

### Submission

In [14]:
sample_submission.label = dls.vocab[preds]
sample_submission.to_csv('submission.csv', index=False)

### Conclusions

* I found this model to be a good baseline, with good accuracy for its speed and cost.
* You can try different numbers of epochs, learning rates, or even a different seed, and see what happens when you submit the results.
* Then you can apply some of the techniques that Jeremy used in his series.
* And keep trying.



In [15]:
# Pushing the notebook from my home PC to Kaggle

if not iskaggle:
    push_notebook(
        'fmussari', 
        'fast-resnet34-with-fastai',
        title='Fast Resnet34 with Fastai',
        file='2022-07. Fast and Agile Resnet34 with Fastai.ipynb',
        competition=competition, 
        private=True, 
        gpu=True
    )

Kernel version 1 successfully pushed.  Please check progress at https://www.kaggle.com/code/fmussari/fast-resnet34-with-fastai


## Cross Entropy

In [1]:
F.CrossEntropyLoss?

Object `F.CrossEntropyLoss` not found.


In [59]:
X = torch.tensor([
    [4.2, -2.4], 
    [1.6, -0.6], 
    [3.6, 1.2], 
    [-0.5, 0.5], 
    [-0.25, 1.7]
])
X

tensor([[ 4.2000, -2.4000],
        [ 1.6000, -0.6000],
        [ 3.6000,  1.2000],
        [-0.5000,  0.5000],
        [-0.2500,  1.7000]])

In [60]:
target = torch.tensor([0,1,1,0,0])
target

tensor([0, 1, 1, 0, 0])

In [61]:
F.cross_entropy(X, target)

tensor(1.6379)

In [19]:
softmax(X, axis=1)

tensor([[0.9986, 0.0014],
        [0.9002, 0.0998],
        [0.9168, 0.0832],
        [0.2689, 0.7311],
        [0.1246, 0.8754]])

In [21]:
F.cross_entropy(X, target)

tensor(1.6379)
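To see where 1.6379 comes from, we can redo the computation by hand: take the log-softmax of each row, pick the entry of the true class, negate it, and average over the batch. This is a plain-Python sketch of the default mean reduction; the helper names are my own.

```python
import math

def log_softmax(row):
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def cross_entropy(logits, targets):
    # Negative log-probability of the true class, averaged over the batch
    losses = [-log_softmax(row)[t] for row, t in zip(logits, targets)]
    return sum(losses) / len(losses)

X_rows = [[4.2, -2.4], [1.6, -0.6], [3.6, 1.2], [-0.5, 0.5], [-0.25, 1.7]]
targets = [0, 1, 1, 0, 0]
loss = cross_entropy(X_rows, targets)
print(round(loss, 4))  # 1.6379, matching F.cross_entropy(X, target) above
```

Only the first row is predicted confidently and correctly, so its contribution is close to zero; the other four rows put most of the probability on the wrong class and dominate the mean.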

In [104]:
X = torch.tensor(
    [[-4.88522478044709, 2.59747282063147, 0.591664975702642, -2.06894452227226, -4.56867917386799]]
)
X

tensor([[-4.8852,  2.5975,  0.5917, -2.0689, -4.5687]])

In [105]:
target = torch.tensor([1])
target

tensor([1])

In [106]:
softmax(X)

tensor([[4.9135e-04, 8.7314e-01, 1.1748e-01, 8.2127e-03, 6.7432e-04]])

In [107]:
F.cross_entropy(X, target)

tensor(0.1357)

In [108]:
F.nll_loss(X, target)

tensor(-2.5975)
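The negative value above is a hint that `F.nll_loss` expects log-probabilities, not raw logits: with raw logits it simply returns minus the score of the target class. A plain-Python sketch of the difference (the `nll` helper is hypothetical):

```python
import math

logits = [-4.88522478044709, 2.59747282063147, 0.591664975702642,
          -2.06894452227226, -4.56867917386799]
target = 1

def nll(scores, target):
    """Negative log-likelihood for one sample: minus the score of the target class."""
    return -scores[target]

# Log-softmax turns the raw logits into log-probabilities
lse = math.log(sum(math.exp(x) for x in logits))
log_probs = [x - lse for x in logits]

raw = nll(logits, target)        # about -2.5975: not a valid loss
proper = nll(log_probs, target)  # about 0.1357: equals the cross entropy
```

In other words, cross entropy is log-softmax followed by negative log-likelihood.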

In [75]:
X = torch.tensor([
    [4.2, -2.4], 
    [1.6, -0.6], 
    [3.6, 1.2], 
    [-0.5, 0.5], 
    [-0.25, 1.7]
])
X

tensor([[ 4.2000, -2.4000],
        [ 1.6000, -0.6000],
        [ 3.6000,  1.2000],
        [-0.5000,  0.5000],
        [-0.2500,  1.7000]])

In [76]:
y = torch.tensor([0,1,1,0,0])
y

tensor([0, 1, 1, 0, 0])

In [80]:
F.cross_entropy(X, y, label_smoothing=0.01)

tensor(1.6370)

In [109]:
F.cross_entropy??

https://amaarora.github.io/2020/07/18/label-smoothing.html

In [93]:
# Helper functions from fastai
def reduce_loss(loss, reduction='mean'):
    return loss.mean() if reduction=='mean' else loss.sum() if reduction=='sum' else loss

In [115]:
# Implementation from fastai https://github.com/fastai/fastai2/blob/master/fastai2/layers.py#L338
class LabelSmoothingCrossEntropy(nn.Module):
    def __init__(self, ε:float=0.1, reduction='mean'):
        super().__init__()
        self.ε,self.reduction = ε,reduction
    
    def forward(self, output, target):
        # number of classes
        c = output.size()[-1]
        print(f"output: {output}")
        log_preds = F.log_softmax(output, dim=-1)
        
        print(f"log_preds: {log_preds}")
        
        loss = reduce_loss(-log_preds.sum(dim=-1), self.reduction)
        nll = F.nll_loss(log_preds, target, reduction=self.reduction)
        # (1-ε)* H(q,p) + ε*H(u,p)
        return (1-self.ε)*nll + self.ε*(loss/c) 

In [112]:
# X: model logits or outputs, y: true labels
X = torch.tensor([
    [4.2, -2.4], 
    [1.6, -0.6], 
    [3.6, 1.2], 
    [-0.5, 0.5], 
    [-0.25, 1.7]
])
y = torch.tensor([0,1,1,0,0])

#out
tensor([[ 4.2000, -2.4000],
        [ 1.6000, -0.6000],
        [ 3.6000,  1.2000],
        [-0.5000,  0.5000],
        [-0.2500,  1.7000]]) 

tensor([[ 4.2000, -2.4000],
        [ 1.6000, -0.6000],
        [ 3.6000,  1.2000],
        [-0.5000,  0.5000],
        [-0.2500,  1.7000]])

In [113]:
LabelSmoothingCrossEntropy(ε=0.1, reduction='none')(X,y)

log_preds: tensor([[-1.3595e-03, -6.6014e+00],
        [-1.0508e-01, -2.3051e+00],
        [-8.6836e-02, -2.4868e+00],
        [-1.3133e+00, -3.1326e-01],
        [-2.0830e+00, -1.3302e-01]])


tensor([0.3314, 2.1951, 2.3668, 1.2633, 1.9855])
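The per-sample values can be reproduced by hand with the formula from the comment in the class, $(1-\varepsilon)\,H(q,p) + \varepsilon\,H(u,p)$, where $H(q,p)$ is the usual negative log-probability of the true class and $H(u,p)$ is the mean of $-\log p_c$ over all classes. A plain-Python sketch with helper names of my own:

```python
import math

def log_softmax(row):
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def label_smoothing_ce(logits, targets, eps=0.1):
    """Per-sample label-smoothed cross entropy (no reduction)."""
    losses = []
    for row, t in zip(logits, targets):
        log_p = log_softmax(row)
        nll = -log_p[t]                     # H(q, p): true class only
        uniform = -sum(log_p) / len(log_p)  # H(u, p): average over all classes
        losses.append((1 - eps) * nll + eps * uniform)
    return losses

X_rows = [[4.2, -2.4], [1.6, -0.6], [3.6, 1.2], [-0.5, 0.5], [-0.25, 1.7]]
y_rows = [0, 1, 1, 0, 0]
per_sample = label_smoothing_ce(X_rows, y_rows, eps=0.1)
# Mean with eps=0.01, to compare with F.cross_entropy(X, y, label_smoothing=0.01)
mean_small_eps = sum(label_smoothing_ce(X_rows, y_rows, eps=0.01)) / len(X_rows)
```

Rounded to four decimals, `per_sample` matches the tensor above, and `mean_small_eps` matches the 1.6370 returned earlier by `F.cross_entropy` with `label_smoothing=0.01`.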

In [123]:
X = torch.tensor(
    [[-4.88522478044709, 2.59747282063147, 0.591664975702642, -2.06894452227226, -4.56867917386799]]
)
target = torch.tensor([1])
X

tensor([[-4.8852,  2.5975,  0.5917, -2.0689, -4.5687]])

In [146]:
total = sum([math.exp(xi) for xi in X[0]])
[math.log(math.exp(xi)/total) for xi in X[0]]

[-7.618357767044846,
 -0.13566004174882512,
 -2.141467977940384,
 -4.802077402054611,
 -7.301812280594651]

In [127]:
F.log_softmax(X, dim=-1)

tensor([[-7.6184, -0.1357, -2.1415, -4.8021, -7.3018]])

In [119]:
LabelSmoothingCrossEntropy(ε=0, reduction='none')(X,target)

output: tensor([[-4.8852,  2.5975,  0.5917, -2.0689, -4.5687]])
log_preds: tensor([[-7.6184, -0.1357, -2.1415, -4.8021, -7.3018]])


tensor([0.1357])

In [122]:
F.log_softmax??