#Applied Deep Learning & Computer Vision using PyTorch (Intro)
#Application: Classify Damaged RAM Cones (damage vs non-damage) in the National Ignition Facility (NIF) optics recycle loop

*RAM cones: protocol used to mitigate damage on NIF optics in order to recycle them and extend the life of the optics that deliver laser beam energy to the dense center (this creates nuclear fusion)*

This notebook discusses the main steps needed to apply a computer vision deep learning model (e.g. ResNet or ConvNeXt) to predict damaged RAM cones (see slide):


 1. prepare data,
 2. prepare dataloaders,
 3. train setup,
 4. test/evaluation setup,
 5. model setup



A lot of machine learning work is **data preprocessing** (1): before actually training and testing the model, we prepare inputs that the model can make good use of.
- NOTE: In this case we assume that the preprocessed data is already stored in a NumPy array (data preprocessing can be discussed in another notebook)
  - preprocessing (for this specific app) involved many steps, e.g.:
    - read images from the metrology server
    - extract damage metadata
    - extract image features of the (center) cone in question (e.g. Hough circle finding/counting, segmentation)
- we therefore apply the dataloader function directly to the labels and the data, saved as a CSV file and a NumPy array, respectively
  - labels refer to the class in this case (Damaged: 1 or Non-Damaged RAM cone: 0)
  - images refer to the multidimensional image data (750x750x3 in this case)
    - NumPy arrays are a common storage format because they can be converted to tensors quickly
  - these files need to be accessible to this code (e.g. on the machine this code runs on, or on mounted storage such as Google Drive)
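A minimal sketch of that loading step, assuming a hypothetical CSV column named `label` and small random stand-in images (the real arrays are 750x750x3): the images come from `np.load`, the labels from `pd.read_csv`, and both become tensors with `torch.from_numpy`.

```python
import os
import tempfile

import numpy as np
import pandas as pd
import torch

# Hypothetical stand-in files: 4 small RGB images and a CSV with an assumed "label" column
# (1 = damaged, 0 = non-damaged).
tmp_dir = tempfile.mkdtemp()
ims_path = os.path.join(tmp_dir, "images.npy")
meta_path = os.path.join(tmp_dir, "labels.csv")
np.save(ims_path, np.random.rand(4, 8, 8, 3).astype(np.float32))
pd.DataFrame({"label": [0, 0, 1, 0]}).to_csv(meta_path, index=False)

# Load the stored data: images are (N, H, W, C), labels are one integer per row.
ims = np.load(ims_path)
labels = pd.read_csv(meta_path)["label"].to_numpy()

# torch.from_numpy shares memory with the NumPy array (no copy is made);
# permute reorders to (N, C, H, W), the layout PyTorch conv layers expect.
x = torch.from_numpy(ims).permute(0, 3, 1, 2)
y = torch.from_numpy(labels)
print(x.shape, y.tolist())
```

In the notebook itself, `ToTensor` inside `collate_fn` performs the (H, W, C) to (C, H, W) conversion one image at a time.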


#SETUP environment
- imports
  - good to include sklearn and/or torchmetrics (analysis, calculation and presentation of results) and common transforms, in addition to torch (deep learning), cv2 (computer vision) and torchvision
  - other bookkeeping tools, e.g. `time`/`datetime` to record when a training run happened, and `ConfigParser` to read the config file where we store the configuration for the current training run
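The main cell reads its settings with `ConfigParser` from the sections `DATA`, `MODEL` and `PARAMS`. The sketch below shows a config in that layout; the section and key names are taken from the main cell, but every value is a hypothetical placeholder.

```python
from configparser import ConfigParser

# Hypothetical values; only the section and key names mirror what the main cell reads.
example_config = """
[DATA]
train_ims_path = data/train.npy
test_ims_path = data/test.npy
train_meta_path = data/train.csv
test_meta_path = data/test.csv
split = 0.15
k_fold = 2

[MODEL]
model = MultiInputSimpleV
states = states/weights/last.pt

[PARAMS]
batch_size = 16
epochs = 50
learning_rate = 0.001
"""

configParser = ConfigParser()
# configParser.read("drp2.txt") reads the same format from a file
configParser.read_string(example_config)

params = configParser["PARAMS"]
batch_size = params.getint("batch_size")          # typed getters on a section
learning_rate = params.getfloat("learning_rate")
print(configParser["MODEL"]["model"], batch_size, learning_rate)
```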





In [None]:
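import csv
import os
import random
import time
from configparser import ConfigParser
from datetime import datetime

import cv2
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.utils.data as data
import torchmetrics
import torchvision
from sklearn.metrics import accuracy_score, confusion_matrix
from torch.optim import SGD
from torchmetrics.classification import BinaryROC
from torchvision.transforms import (CenterCrop,
                                    Compose,
                                    ColorJitter,
                                    Normalize,
                                    RandomRotation,
                                    RandomAffine,
                                    RandomHorizontalFlip,
                                    RandomVerticalFlip,
                                    RandomApply,
                                    RandomCrop,
                                    ToTensor,
                                    Resize)
# HuggingFace model used in the main cell
from transformers import ConvNextForImageClassification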

now = datetime.now()

year = now.strftime("%Y")
month = now.strftime("%m")
day = now.strftime("%d")
dtime = now.strftime("%H%M%S")

# can be used to save model weights/states
new = "states/" + month + day + year + dtime + "/"

# os.mkdir(new)

date_time = now.strftime("%m/%d/%Y, %H:%M:%S")
print("Train/test date and time:", date_time)




##Transforms

  - transforms can be used for data augmentation: random transforms artificially increase the size and variety of the training data
    - more variation in the training examples makes the model more robust to new data
  - transforms can also help with class imbalance (as in this application, where only ~5% of the RAM cones are damaged) because we can add more transformed versions of just the minority class
    - other ways to handle this class imbalance include adjusting the decision threshold (e.g. requiring higher confidence before predicting non-damage) and increasing the sampling rate of the minority class (in the dataloader)
  - we may also need general transforms, like resize, applied to both the test and training data
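One way to increase the minority sampling rate is PyTorch's `WeightedRandomSampler`. This sketch uses a hypothetical 95/5 label split (mirroring the ~5% damage rate) and weights each sample by the inverse of its class count, so damaged and non-damaged examples are drawn about equally often. Sampling is with replacement, so minority examples repeat within an epoch, which is where random augmentation helps.

```python
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

# Hypothetical imbalanced labels: 95 non-damage (0) and 5 damage (1).
labels = torch.tensor([0] * 95 + [1] * 5)
dataset = [(torch.zeros(3, 8, 8), int(lab)) for lab in labels]

# Weight each sample by the inverse of its class count.
class_counts = torch.bincount(labels)                  # tensor([95, 5])
sample_weights = 1.0 / class_counts[labels].float()
sampler = WeightedRandomSampler(sample_weights, num_samples=len(labels), replacement=True)

# The sampler replaces shuffle=True (DataLoader does not accept both).
loader = DataLoader(dataset, batch_size=20, sampler=sampler)

torch.manual_seed(0)
drawn = torch.cat([batch_labels for _, batch_labels in loader])
print("fraction of damage samples drawn in one epoch:", drawn.float().mean().item())
```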

In [None]:
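# ToTensor: (H, W, C) NumPy array -> (C, H, W) tensor, the format torch models expect
# (uint8 arrays are scaled to [0, 1]; float arrays keep their values)
t = ToTensor()
t0 = RandomRotation((-90,-90))   # always rotate by -90 degrees
t1 = RandomRotation((90,90))     # always rotate by +90 degrees
t2 = RandomVerticalFlip(p=1.0)   # always flip vertically
t3 = RandomHorizontalFlip(p=1.0) # always flip horizontally
t4 = RandomAffine(45,scale=(0.8,1.5))  # random rotation in [-45, 45] degrees plus random scaling
t5 = RandomAffine(45,shear = 20)        # random rotation in [-45, 45] degrees plus random shear
# with probability 0.8, apply all three: brightness/contrast jitter, affine scaling, affine shear
t6 = RandomApply([ColorJitter(brightness=(0.8, 1.2), contrast=(0.8, 1.2)), RandomAffine(45, scale=(0.8,1.5)), RandomAffine(45, shear=20)], p = 0.8)
# with probability 0.2, apply t0, t1, t2, t3 in sequence
t7 = RandomApply([t0, t1, t2, t3], p = 0.2)
t8 = CenterCrop(752)   # the images are 750x750, so this pads 1 pixel on each side
t9 = Resize(520)       # resize so the smaller edge is 520 pixels (alternative size: 736)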

#Prepare the dataloaders from stored datasets
- we first write a collate function, passed as the `collate_fn` argument of torch's `data.DataLoader`
  - we will see this in the data prep function described next
- the dataset given to a DataLoader can simply be a list of tuples holding the data and its labels, converted to float or int (in this case images and "Damage": 1 or "Non-Damage": 0)
 - e.g. (image, int)
- during training/testing, the collate function takes a batch of these tuples and stacks each field into its own tensor (first dimension = batch size), a format that can be fed to the model layers
 - transforms can also be applied in the collate function (but it is better to apply the training transforms on the GPU)
 - we can also add more metadata to each tuple if there are multiple inputs to learn from (e.g. we may want to use a combination of scalar and image data as model input)
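A small self-contained sketch of the same pattern: hypothetical random 3x8x8 images in tuples shaped like this notebook's (image, label, protocol, count), a collate function that stacks each field, and a `DataLoader` that returns dicts of batched tensors.

```python
import torch
from torch.utils.data import DataLoader

# Hypothetical examples shaped like the tuples in this notebook: (image, label, protocol, count).
examples = [(torch.rand(3, 8, 8), i % 2, 1, float(i)) for i in range(6)]

def simple_collate(batch):
    # batch is a list of tuples; stack each field into its own batched tensor
    pixel_values = torch.stack([ex[0] for ex in batch])   # (B, C, H, W)
    labels = torch.tensor([ex[1] for ex in batch])        # (B,) int64
    protocol = torch.tensor([ex[2] for ex in batch])      # (B,)
    count = torch.tensor([ex[3] for ex in batch])         # (B,) float32
    return {"pixel_values": pixel_values, "labels": labels, "protocol": protocol, "count": count}

loader = DataLoader(examples, batch_size=4, collate_fn=simple_collate, shuffle=False)
first = next(iter(loader))
print(first["pixel_values"].shape, first["labels"].tolist())
```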


In [None]:
# used as collate_fn by the DataLoader: turns a list of (image, label, protocol, count) tuples
# into a dict of batched tensors
def collate_fn(examples):
    # ToTensor -> CenterCrop(752) -> Resize(520) on each image, then stack to (B, C, H, W)
    pixel_values = torch.stack([t9(t8(t(example[0]))) for example in examples])
    labels = torch.tensor([example[1] for example in examples])
    p = torch.tensor([example[2] for example in examples])  # protocol
    c = torch.tensor([example[3] for example in examples])  # count (must be numeric)
    return {"pixel_values": pixel_values, "labels": labels, "count": c, "protocol": p}

This `prep_data_load` function
1. reads the data into NumPy arrays and lists (pandas is also handy for large metadata):
 - images from NumPy arrays (`.npy`)
  - NumPy arrays can usually store large image data
 - labels from CSV files
  - CSV is a good way to store metadata and labels

2. divides the data into train/validation/test sets
 - a 70/15/15 % split is usually a good start
  - we can also rotate which part of the data is held out (k-fold cross-validation: hold out a different fold of the same size each time, train on everything else, and average the results)

3. converts the images to float32 and computes the per-channel mean/std for a `Normalize` transform (applied later in the train/test transforms); converts labels to integers

4. creates (image, label, protocol, count) tuples from the NumPy arrays and metadata

5. passes these, along with the collate function, to `data.DataLoader`; the loaders are returned for use in the train and evaluation functions
  - always `shuffle` the train set; `drop_last=True` drops the final incomplete batch when the dataset size is not divisible by the batch size (here it is set on the train loader, which also avoids a size-1 batch that `BatchNorm1d` cannot handle in training mode)
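The index arithmetic in `prep_data_load` holds out fold `k`: a contiguous block of `round(N * split)` samples starting at `k * eval_num`. A sketch of that split as a standalone function (the name `contiguous_fold_split` is hypothetical), with the block end clipped to `N`, plus a loop over folds for k-fold cross-validation:

```python
def contiguous_fold_split(n_samples, split, k):
    """Train/validation indices for fold k (mirrors the index arithmetic in prep_data_load).

    A contiguous block of round(n_samples * split) samples starting at k * eval_num is
    held out for validation; everything else is training data. The block end is clipped
    to n_samples so no out-of-range indices are produced.
    """
    eval_num = round(n_samples * split)
    start, stop = k * eval_num, min((k + 1) * eval_num, n_samples)
    eval_inds = list(range(start, stop))
    train_inds = list(range(start)) + list(range(stop, n_samples))
    return train_inds, eval_inds

# Usage: with 10 samples and split = 0.3, folds k = 0, 1, 2 each hold out 3 samples.
n, split = 10, 0.3
folds = [contiguous_fold_split(n, split, k) for k in range(3)]
for k, (train_inds, eval_inds) in enumerate(folds):
    print(k, "valid:", eval_inds, "train size:", len(train_inds))
```

Because the folds are contiguous, each one follows the order in which the data chunks were concatenated; with `split = 0.3`, the three full folds cover only 9 of the 10 samples.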

In [None]:
# read in meta-data from CSV and corresponding image data from .npy
# load up the data-loaders (which create the train/valid/test batches)
# Note: we have a separate csv/npy for the test set
def prep_data_load(batch_size, ims1_path, meta1_path, test_ims_path, test_meta_path, cols, split = 0.3, k = 2):
  # combine the train/valid metadata chunks (DataFrame.append was removed in pandas 2.0)
  df1 = pd.concat([pd.read_csv(path) for path in meta1_path], ignore_index=True)
  df1.to_csv("drp2.csv", index=False)

  # combine the train/valid image chunks along the sample axis: (N, H, W, C)
  ims = np.concatenate([np.load(path) for path in ims1_path], axis=0)
  test_ims = np.load(test_ims_path)

  # read the combined metadata back as rows of strings (the same file written above)
  with open("drp2.csv", 'r') as f:
    reader = csv.reader(f, delimiter=',')
    attributes = next(reader)  # header row
    meta1 = list(reader)

  with open(test_meta_path, 'r') as f:
    reader = csv.reader(f, delimiter=',')
    attributes = next(reader)
    test_meta = list(reader)

  train, valid, test = [], [], []
  ims = ims.astype(np.float32)
  test_ims = test_ims.astype(np.float32)

  # fold k: a contiguous block of eval_num samples is held out for validation
  eval_num = round(len(meta1)*split)
  tot = ims.shape[0]
  test_inds = list(range(test_ims.shape[0]))
  eval_inds = set(range(k*eval_num, (k+1)*eval_num))  # set for fast membership tests
  train_inds = list(range(k*eval_num)) + list(range((k+1)*eval_num, tot))

  print("Original train length", len(train_inds), "eval length", len(eval_inds), "test length", len(test_inds))
  # per-channel mean/std over all (N, H, W, C) images, used to build a Normalize transform
  immn = [ims[:,:,:,ch].mean() for ch in range(3)]
  imstd = [ims[:,:,:,ch].std() for ch in range(3)]
  imnorm = Normalize(mean = immn, std = imstd)
  print("immn, imstd, imnorm", immn, imstd, imnorm)

  for i in range(ims.shape[0]):
    label = int(meta1[i][-4])
    im1 = ims[i]
    # assumes the text after 'RAM' is numeric, so collate_fn can turn it into a tensor
    c = float(meta1[i][-3].split('RAM')[-1])
    p = int(meta1[i][1])
    if i in eval_inds:
      valid.append((im1, label, p, c))
    else:
      train.append((im1, label, p, c))

  for i in range(test_ims.shape[0]):
    im1 = test_ims[i]
    c = float(test_meta[i][-1].split('RAM')[-1])
    p = int(test_meta[i][cols[2]])
    test.append((im1, 0, p, c))  # test labels are not used here, so 0 is a placeholder

  print("check train and valid lengths:", len(train), len(valid))
  test_loader = data.DataLoader(test, collate_fn=collate_fn, batch_size=1, shuffle = False)
  eval_loader = data.DataLoader(valid, collate_fn=collate_fn, batch_size=1, shuffle = False)
  train_loader = data.DataLoader(train, collate_fn=collate_fn, batch_size=batch_size, shuffle = True, drop_last = True)
  return train_loader, test_loader, eval_loader, imnorm

###Notes about data prepping and loading

- data may be too large to apply the transforms before training (storing augmented copies may not fit on disk or in memory), so augmentation is applied on the fly during training
- normalizing the training data is standard practice in machine learning
  - puts the input channels on a similar scale without losing information
  - usually speeds up convergence, which reduces training time
  - provides stability during training (e.g. weight updates are applied more evenly across inputs) and reduces the risk of exploding or vanishing gradients
- data may be stored in chunks that need to be combined when it cannot all be stored in one file
- we probably have a separate test set
  - e.g. to test on new data

#Models setup/define
- wrap any newly defined model in a class that inherits from `torch.nn.Module`
 -  otherwise torchvision, HuggingFace `transformers`, etc. provide predefined (often pretrained) models that load with one line of code
- models usually define their layers in `__init__` and a `forward` method that applies those layers to the input data when the model is called

below are some simple models I created for this application

Model `FinalModelWrapper` below is an example of test-time augmentation (test-time transforms)
 - it averages the predictions over different rotations/flips of one image
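The wrapper's views can also be generated with a loop over the 8 symmetries of a square (the dihedral group D4: 4 rotations, each optionally followed by a flip). A sketch with a hypothetical toy "model" that is not rotation invariant; it checks the property that makes this averaging useful: the averaged output is the same for a rotated or transposed input.

```python
import torch

def dihedral_views(x):
    """The 8 rotations/reflections (dihedral group D4) of a (B, C, H, W) batch."""
    views = []
    for k in range(4):                           # rotations by 0, 90, 180, 270 degrees
        rotated = torch.rot90(x, k, (2, 3))
        views.append(rotated)
        views.append(torch.flip(rotated, (3,)))  # each rotation followed by a horizontal flip
    return views

def tta_predict(model, x):
    """Test-time augmentation: average the model output over all 8 views."""
    with torch.no_grad():
        return torch.stack([model(v) for v in dihedral_views(x)]).mean(dim=0)

def top_row_sum(v):
    # hypothetical stand-in model: sum of the top image row (changes under rotation)
    return v[:, :, 0, :].sum(dim=(1, 2))

x = torch.arange(16.0).reshape(1, 1, 4, 4)
views = dihedral_views(x)
print(tta_predict(top_row_sum, x), tta_predict(top_row_sum, torch.rot90(x, 1, (2, 3))))
```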


In [None]:
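# test-time augmentation: averages predictions over the 8 rotations/flips (dihedral group D4) of each image
class FinalModelWrapper(torch.nn.Module):
  def __init__(self, model):
    super().__init__()
    self.model = model
  def forward(self, x):
    # x: (batch, channels, H, W); dims 2 and 3 are the image axes
    y = self.model(x)                       # identity
    xmod = torch.rot90(x, 1, (2, 3))
    y = torch.add(y, self.model(xmod))      # rotate 90
    xmod = torch.flip(xmod, (3,))
    y = torch.add(y, self.model(xmod))      # rotate 90 + horizontal flip (anti-diagonal reflection)
    xmod = torch.flip(xmod, (2,))
    y = torch.add(y, self.model(xmod))      # rotate 270
    xmod = torch.flip(x, (3,))
    y = torch.add(y, self.model(xmod))      # horizontal flip
    xmod = torch.flip(xmod, (2,))
    y = torch.add(y, self.model(xmod))      # rotate 180
    xmod = torch.flip(x, (2,))
    y = torch.add(y, self.model(xmod))      # vertical flip
    # transpose (main-diagonal reflection); rotating the vertical flip by 90 would
    # repeat the anti-diagonal reflection above instead of adding the 8th distinct view
    xmod = torch.transpose(x, 2, 3)
    y = torch.add(y, self.model(xmod))
    y = y/8.0

    return y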

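- `TFModel`: wrapper for a model from a different library (HuggingFace `transformers`); these models return an output object (with the logits at index 0 when no labels are passed) rather than a plain tensor like torch models, so the wrapper returns just the logits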

In [None]:
class TFModel(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, x):
        y = self.model(x)[0]
        return y

- `TwoModelWrapper`: averages the outputs of two different models to increase accuracy (an ensemble method)
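The same averaging generalizes to any number of models. A hypothetical sketch (`MeanEnsemble` is not part of this notebook); unlike `TwoModelWrapper`, it assumes every member already returns a plain logits tensor of the same shape (e.g. HuggingFace models wrapped in `TFModel`).

```python
import torch
import torch.nn as nn

class MeanEnsemble(nn.Module):
    """Average the logits of any number of member models (hypothetical helper)."""
    def __init__(self, *models):
        super().__init__()
        self.models = nn.ModuleList(models)   # registers members so .to(device)/.eval() reach them

    def forward(self, x):
        return torch.stack([m(x) for m in self.models]).mean(dim=0)

# Usage with two toy linear "models" that each produce one logit.
torch.manual_seed(0)
a, b = nn.Linear(4, 1), nn.Linear(4, 1)
ens = MeanEnsemble(a, b)
x = torch.randn(2, 4)
out = ens(x)
print(torch.allclose(out, (a(x) + b(x)) / 2))
```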

In [None]:
class TwoModelWrapper(torch.nn.Module):
    def __init__(self, model0, model1):
        super().__init__()
        self.model0 = model0
        self.model1 = model1

    def forward(self, x):
        y = self.model1(x)
        y = torch.add(y, self.model0(x).logits)
        y = y / 2.0

        return y

- `MultiInputSimpleV`: model that includes metadata in the prediction
- e.g. we can include the number of RAMEOs in this DL application
 - RAMEOs are other RAM cones that overlap the RAM in the center (the one we are predicting)
 - overlapping RAMEOs may increase the probability of damage, so this information could help the model decide
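A toy version of the same idea, with a hypothetical stand-in backbone (`nn.Flatten` + `nn.Linear`) in place of ConvNeXt. In `MultiInputSimpleV` the head's input size appears to be 128 = 126 backbone outputs (`num_labels=126`) + 2 scalars; here it is 6 + 2. The comments show the shapes before and after `unsqueeze` and `torch.cat`.

```python
import torch
import torch.nn as nn

class ImagePlusScalars(nn.Module):
    """Toy version of MultiInputSimpleV: image features concatenated with two scalar metadata values."""
    def __init__(self, n_image_features=6):
        super().__init__()
        # hypothetical stand-in backbone: flatten a 3x4x4 image to n_image_features
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 4 * 4, n_image_features))
        self.fc = nn.Linear(n_image_features + 2, 1)   # +2 for the count and protocol scalars

    def forward(self, x, c, p):
        feats = self.backbone(x)                       # (B, n_image_features)
        c = c.unsqueeze(1)                             # (B,) -> (B, 1)
        p = p.unsqueeze(1)                             # (B,) -> (B, 1)
        combined = torch.cat((c, p, feats), dim=1)     # (B, n_image_features + 2)
        return self.fc(combined)                       # (B, 1) logit

model = ImagePlusScalars()
x = torch.rand(5, 3, 4, 4)
count = torch.tensor([0., 1., 2., 0., 3.])
protocol = torch.tensor([1., 1., 0., 0., 1.])
print(model(x, count, protocol).shape)   # torch.Size([5, 1])
```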

In [None]:
class MultiInputSimpleV(torch.nn.Module):
  def __init__(self, modelb):
    super().__init__()
    self.modela = modelb
    self.fc = nn.Sequential(
      #nn.Dropout(0.2),
      nn.Linear(128, 64), #
      nn.PReLU(),
      nn.BatchNorm1d(64),
      nn.Dropout(0.3),
      nn.Linear(64, 1),
    )
  def forward(self, x1, c, p):

    o1 = self.modela(x1)
    c = c.unsqueeze(1) #add dim 1
    p = p.unsqueeze(1)
    combined = torch.cat((c, p, o1), dim = 1)
    out = self.fc(combined)
    return out

###Flattened Model layer before Fully Connected layers

- the last layer of the backbone produces a flat feature vector (most predefined models take a `num_classes`/`num_labels` argument that sets its size, e.g. `num_labels=126` here)
- this flat output is concatenated (`torch.cat`) with the scalar metadata
- `unsqueeze(1)` turns each batch of scalars from shape (B,) into (B, 1), so the dimensions match the (B, features) image output for concatenation along dim 1 (`squeeze` removes such a size-1 dimension)

###Fully Connected Model Layers

- Linear: fully connected nodes
  - `torch.nn.Linear(in_features, out_features, bias=True, device=None, dtype=None)`
- `PReLU` (Parametric Rectified Linear Unit): ReLU (y = max(0,x)) is an activation function that introduces non-linearity into the network
 - we need a model that can represent non-linear functions
 - `PReLU` is a variation (y = max(0,x) + a*min(0,x)) where the slope a is learned
  - certain activation functions, like the sigmoid function, squish a large input space into a small output range between 0 and 1
  - PReLU and Leaky ReLU allow a small negative output (Leaky ReLU uses a fixed slope a, PReLU learns it)
- `BatchNorm1d`: normalizes each feature across the batch (then applies a learned scale and shift), which can help with training stability

- `Dropout`: randomly drops (zeroes) nodes during training; provides regularization, reduces overfitting and co-dependence between nodes

- finally, the last `Linear` layer reduces the dimension to the number of outputs (here one logit for the binary damage class, turned into a probability by a sigmoid)
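A quick numeric check of the PReLU formula, plus a shape check of the same head layer stack used in `MultiInputSimpleV` (the input values and the batch of 4 random 128-feature rows are hypothetical):

```python
import torch
import torch.nn as nn

# PReLU: y = max(0, x) + a * min(0, x), with one learnable slope a (PyTorch initializes it to 0.25).
x = torch.tensor([-2.0, -0.5, 0.0, 1.5])
prelu = nn.PReLU(init=0.25)
a = prelu.weight.item()
manual = torch.clamp(x, min=0) + a * torch.clamp(x, max=0)
print(prelu(x).detach().tolist())   # [-0.5, -0.125, 0.0, 1.5]
print(torch.allclose(prelu(x), manual))

# The same layer stack as the MultiInputSimpleV head, checked for shapes.
head = nn.Sequential(
    nn.Linear(128, 64),
    nn.PReLU(),
    nn.BatchNorm1d(64),
    nn.Dropout(0.3),
    nn.Linear(64, 1),   # one logit for the binary damage class
)
head.eval()             # eval mode: Dropout is disabled, BatchNorm uses its running statistics
out = head(torch.randn(4, 128))
print(out.shape)        # torch.Size([4, 1])
```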

#Test and evaluation function
- uses the validation and test loaders
- validation is done during training in order to see how well the model is doing on new data
 - e.g. we can see when the model **converges**
 - be sure not to overfit the model by training too long (or by tuning it until it adapts to the test set)
  - **bias vs variance** trade-off
    - bias (underfitting) represents the error due to overly simplistic assumptions
    - variance (overfitting): the model follows the training data too closely and generalizes poorly to new data
- testing is done only after the model is fully trained and the best model is saved
 - usually we use the best or the last model, chosen by looking at the validation set accuracy (or class-averaged accuracy)
- validation is done during training, e.g. every 1, 5 or 10 epochs

- note that no weights are updated here: no loss is backpropagated, we only run the saved model weights forward
- still best to do this on the GPU and/or in batches if the test set is large
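A minimal sketch of the inference pattern those notes describe (`model.eval()`, `torch.no_grad()`, batches moved to the device, sigmoid on the logits), with a hypothetical stand-in classifier; it also confirms that the weights are unchanged afterwards.

```python
import torch
import torch.nn as nn

def predict_probs(model, batches, device="cpu"):
    """Inference only: no loss, no backpropagation, no weight updates; returns P(damage) per sample."""
    model.to(device)
    model.eval()                          # Dropout off, BatchNorm uses running statistics
    probs = []
    with torch.no_grad():                 # no autograd graph is built, saving memory and time
        for x in batches:
            logits = model(x.to(device))  # (B, 1) logits
            probs += torch.sigmoid(logits)[:, 0].cpu().tolist()
    return probs

# Usage with a hypothetical stand-in classifier and two batches of 2x2 RGB "images".
torch.manual_seed(0)
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 2 * 2, 1))
before = [p.detach().clone() for p in toy_model.parameters()]   # to check the weights afterwards
batches = [torch.rand(4, 3, 2, 2), torch.rand(2, 3, 2, 2)]
probs = predict_probs(toy_model, batches)
print(len(probs))   # 6
```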

##analysis of results:
- classification problem, so we create a **confusion matrix** (sklearn and/or pandas are good tools for this and other accuracy metrics)
- also save the best model to a **.pt** file for use after training
 - production code usually reads in a **JIT-traced model** (so we may save this as well)
- binary problem, so we apply a **sigmoid** and round (i.e. threshold at 0.5)
  - the threshold may vary (e.g. in cases where we have class imbalance)
  - multiclass would use a softmax instead
  - the sigmoid maps each output to a value between 0 and 1, and the softmax maps the outputs to values that sum to 1, so we can threshold (binary) or take the max (multiclass) to get the prediction
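A sketch of the metrics used in `evaluate`, on hypothetical probabilities for 8 validation images: threshold the sigmoid outputs, build the confusion matrix, and take the mean per-class recall (the class-averaged accuracy, which sklearn calls `balanced_accuracy_score`). Lowering the threshold here trades one false positive for catching the missed damage case.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, confusion_matrix

# Hypothetical sigmoid outputs and true labels for 8 validation images.
probs = np.array([0.10, 0.40, 0.35, 0.80, 0.05, 0.60, 0.20, 0.30])
y_true = np.array([0, 0, 1, 1, 0, 1, 0, 0])

def scores(thresh):
    y_pred = (probs > thresh).astype(int)                 # same rule as evaluate(): conf > thresh -> damage
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1])  # rows: actual, columns: predicted
    recall = np.diag(cm) / cm.sum(axis=1)                 # per-class recall
    return cm, recall.mean()                              # mean recall = class-averaged accuracy

for thresh in (0.5, 0.3):
    cm, class_avg = scores(thresh)
    print(thresh, cm.tolist(), class_avg)

print(balanced_accuracy_score(y_true, (probs > 0.5).astype(int)))   # matches scores(0.5)[1]
```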


In [None]:
# evaluate classification accuracy of the binary model and check if the score improved;
# if yes, update the best score and show the new confusion matrix (weights are not saved here)
def evaluate(model, test_loader, eval_loader, testforms, score = 0.0, test = False, thresh = 0.5):

  device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
  y_true, y_pred, dats, confs = [], [], [], []

  model.to(device)
  model.eval()

  if test == True:
      eval_loader = test_loader
      with open(test_meta_path, 'r') as f:  # test_meta_path is the global set in the main cell
        # read the test-set names (column 0) so they can be written to the results CSV
          reader = csv.reader(f, delimiter=',')
          attributes = next(reader)
          test_meta = list(reader)  # np.array(list(reader))
  with torch.no_grad():
    for i, batch in enumerate(eval_loader):
      x = batch['pixel_values'].float()
      y = batch['labels']
      p = batch['protocol'].float()
      c = batch['count'].float()
      x, p, c = x.to(device), p.to(device), c.to(device)
      x = testforms(x)
      #out = model(x)
      out = model(x, p, c)
      # adjust out activation function if mc
      if test == True:
        conf = torch.sigmoid(out).detach().cpu().numpy()[:,0].tolist()
        pred = torch.round(torch.sigmoid(out)).detach().cpu().numpy()[:,0].tolist()
        name = test_meta[i][0]
        y_true = y.numpy().tolist()
        if (y_true[0] == 1) and (y_true[0] != pred[0]):

            print(i)#, name)#, pred[0], y_true[0])
        if pred[0] == 0:
            conf = 1 - conf[0]
        else:
            conf = conf[0]
        dats.append([name, pred[0], conf ])
      else:

        conf0 = torch.sigmoid(out).detach().cpu().numpy()[:,0].tolist()
        confs += conf0
        pred = [1 if x > thresh else 0 for x in conf0 ]
        #pred = torch.round(torch.sigmoid(out)).detach().cpu().numpy()[:, 0].tolist()

        y_pred += pred

        true = y.numpy().tolist()
        y_true += true

    if test == False:
        print("Accuracy score: ", accuracy_score(y_true, y_pred))

        # confusion matrix: rows = actual class, columns = predicted class
        cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
        recall = np.diag(cm) / np.sum(cm, axis = 1)  # per-class recall
        #precision = np.diag(cm) / np.sum(cm, axis = 0)
        class_avg = np.sum(recall)/2  # class-averaged accuracy (mean recall over the 2 classes)

        print("Class averaged accuracy (np.sum(recall)/2)", class_avg)
        if score < class_avg:#accuracy_score(y_true, y_pred):
          y_true = pd.Series(y_true, name = "Actual")
          y_pred = pd.Series(y_pred, name = "Predicted")
          df_confusion = pd.crosstab(y_true, y_pred)
          print(df_confusion)

          print("IMPROVED class averaged accuracy (np.sum(recall)/2)", class_avg)
          score = class_avg
  if test == True:

      with open("results_test_meta.csv", 'w', newline='') as csvfile:
          writer = csv.writer(csvfile)
          writer.writerow(["Server_Path", "Pred", "Confidence"])
          writer.writerows(dats)

      print(" meta (path, pred, conf) saved with length:", len(dats))

  return score

#Training the model

- usually need a GPU for large data
- send the model to the GPU once (`model.to(device)`), then iterate through each epoch, sending each batch of data to the GPU (`x.to(device)`)
 - an epoch is one full pass through all of the training data
 - the number of batches per epoch depends on the batch size
- apply the model
- calculate the loss
- zero/clear the previous gradients
- backpropagate the loss
- take an optimizer step
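A minimal, runnable sketch of those steps on a hypothetical toy problem (a single `nn.Linear` layer on random features instead of an image model), using the same loss and optimizer family as the main cell:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

torch.manual_seed(0)
# Hypothetical toy data: 64 samples with 10 features; label = 1 when the first feature is positive.
x_all = torch.randn(64, 10)
y_all = (x_all[:, 0] > 0).float()
loader = DataLoader(list(zip(x_all, y_all)), batch_size=16, shuffle=True)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = nn.Linear(10, 1).to(device)            # send the model to the GPU once
loss_func = nn.BCEWithLogitsLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

def full_loss():
    with torch.no_grad():
        return loss_func(model(x_all.to(device)), y_all.to(device).unsqueeze(1)).item()

start_loss = full_loss()
for epoch in range(20):                         # one epoch = one pass over every batch
    model.train()
    for x, y in loader:
        x, y = x.to(device), y.to(device)       # send each batch to the same device as the model
        out = model(x)                          # apply the model -> (B, 1) logits
        loss = loss_func(out, y.unsqueeze(1))   # calculate the loss
        optimizer.zero_grad()                   # clear gradients from the previous step
        loss.backward()                         # backpropagate
        optimizer.step()                        # update the weights
final_loss = full_loss()
print(start_loss, final_loss)
```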

In [None]:
# train with hyper-params from the config file and evaluate on the validation set after each epoch
# accumulate run time; if it reaches max_time hours, stop early and return the model
def train_eval(model, train_loader, test_loader, eval_loader, trainforms, testforms, loss_func, optimizer, epochs):
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print("Device", device)
    score = 0.0
    model.to(device)
    score = evaluate(model, test_loader, eval_loader, testforms, score)
    t0 = time.time()
    max_time = 20  # maximum training time in hours
    write_ims = False
    for epoch in range(epochs):
        model.train()
        print("Epoch", epoch + 1)
        for step, batch in enumerate(train_loader):
            x = batch['pixel_values'].float()
            y = batch['labels'].float()
            p = batch['protocol'].float()
            c = batch['count'].float()
            # x = trainforms(x)
            x, y, p, c = x.to(device), y.to(device), p.to(device), c.to(device)
            x = trainforms(x)
            #out = model(x)
            out = model(x, p, c)

            loss = loss_func(out, y.unsqueeze(1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            t1 = time.time()
            hours = (t1 - t0) / 3600
        if (hours >= max_time):
            write_ims = True
            epochs_passed = epoch + 1
            print("Hours passed:", hours)
            return model, epochs_passed

        score = evaluate(model, test_loader, eval_loader, testforms, score)
    print("Final score: ", score)
    score = evaluate(model, test_loader, eval_loader, testforms, score, test=False)
    return model, epochs

##main function to call functions above and setup
- different optimizers and loss functions:

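Optimizers: used to tune the parameters of a neural network in order to minimize the **cost function**
- 2 broad kinds: adaptive methods and (plain) gradient descent
  - adaptive: adapt the step size per parameter, so they usually need less learning-rate tuning
    - e.g. Adam (often a strong default; it combines the exponentially decaying average of past squared gradients used by Adadelta and RMSprop with an exponentially decaying average of past gradients, similar to momentum)
  - gradient descent:
    - batch (entire dataset per update), stochastic (1 sample per update), mini-batch (1 batch per update); the main cell below uses mini-batch SGD with momentum

    - update rule, with learning rate $\lambda$ and cost $J$: $\theta \leftarrow \theta - \lambda \nabla_\theta J(\theta)$

- best choice depends on the type of problem (e.g. regression vs classification)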
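To make the gradient descent update rule concrete, a sketch with a hypothetical cost $J(\theta) = \sum \theta^2$ (gradient $2\theta$): one step of `torch.optim.SGD` without momentum or weight decay performs exactly $\theta \leftarrow \theta - \lambda \nabla_\theta J(\theta)$.

```python
import torch

# Hypothetical parameters and cost J(theta) = sum(theta ** 2).
theta = torch.tensor([1.0, -2.0], requires_grad=True)
lam = 0.1                                       # learning rate (lambda in the update rule)

optimizer = torch.optim.SGD([theta], lr=lam)    # plain SGD: no momentum, no weight decay
J = (theta ** 2).sum()
optimizer.zero_grad()
J.backward()                                    # grad J(theta) = 2 * theta = [2, -4]
optimizer.step()                                # theta <- theta - lam * grad J(theta)
print(theta.detach().tolist())                  # approximately [0.8, -1.6]
```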

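Loss functions: compute the distance of a single prediction from its actual value; the cost function is typically the average loss over a batch (or the whole dataset). This notebook uses `nn.BCEWithLogitsLoss` for the binary problem (the commented-out `nn.CrossEntropyLoss` is the multiclass option).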
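A short sketch (hypothetical logits and targets) of two properties of `nn.BCEWithLogitsLoss`: it equals a sigmoid followed by `nn.BCELoss`, and its `pos_weight` argument up-weights the positive (damage) class, another option for the ~5% class imbalance. The 19.0 weight is a hypothetical negatives/positives ratio.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[2.0], [-1.0], [0.5]])
targets = torch.tensor([[1.0], [0.0], [0.0]])

# BCEWithLogitsLoss applies the sigmoid internally (numerically more stable than sigmoid + BCELoss).
loss_logits = nn.BCEWithLogitsLoss()(logits, targets)
loss_manual = nn.BCELoss()(torch.sigmoid(logits), targets)
print(torch.allclose(loss_logits, loss_manual))   # True

# pos_weight multiplies the loss on positive (damage) targets.
pos_weight = torch.tensor([19.0])                 # hypothetical: 95 negatives / 5 positives
weighted = nn.BCEWithLogitsLoss(pos_weight=pos_weight)(logits, targets)
print(loss_logits.item(), weighted.item())
```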

- hyperparameters:
 - number of epochs: 1 epoch is one full pass of all the data through the network (i.e. all samples have been passed through once; iterations per epoch = number of samples / batch size)
  - depends on batch size and data size; maybe start with a number that should be enough and then check that the model converges
 - train/test/eval batch sizes: smaller batch sizes may take longer to train (and may be hard to converge if too small, because of noisy gradient fluctuations)

 - learning rate: how much each weight is updated in proportion to the gradient of the loss (i.e. how big a step is taken toward a minimum)
  - too high causes issues (can overshoot minima or diverge)
  - too low causes issues (converges too slowly to reach a good minimum within the training budget)
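Iterations per epoch follow directly from the dataset size and batch size, and `len(DataLoader)` reports them. A sketch with hypothetical sizes, showing how `drop_last` (set on the train loader in `prep_data_load`) changes the count:

```python
import math

from torch.utils.data import DataLoader

n_samples, batch_size = 1000, 16        # hypothetical dataset and batch size
dataset = list(range(n_samples))

# drop_last=True drops the final partial batch: floor(1000 / 16) = 62 iterations.
# drop_last=False keeps it: ceil(1000 / 16) = 63 iterations.
print(len(DataLoader(dataset, batch_size=batch_size, drop_last=True)), math.floor(n_samples / batch_size))
print(len(DataLoader(dataset, batch_size=batch_size, drop_last=False)), math.ceil(n_samples / batch_size))
```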


In [None]:

if __name__ == '__main__':
    # new = "states/" + month + day + year + dtime

    # os.mkdir(new)

    configParser = ConfigParser()
    configParser.read('drp2.txt')

    get_data = configParser["DATA"]
    train_ims_path = get_data['train_ims_path']
    test_ims_path = get_data['test_ims_path']
    train_meta_path = get_data["train_meta_path"]
    test_meta_path = get_data['test_meta_path']
    train_label_col = -2#get_data['train_label_col']
    test_label_col = 0#get_data['test_label_col']
    meta_col = -1#get_data['meta_col']
    #test meta col

    cols = [int(train_label_col), int(test_label_col), int(meta_col)]
    split = float(get_data["split"])
    k = int(get_data["k_fold"])


    print("Train, Valid Images array: ", train_ims_path)
    print("Train, Valid Labels file: ", train_meta_path)
    print("cols [train_label_col, test_label_col, meta_col]: ", cols)
    print("k_fold, split:", k, split)

    get_model = configParser["MODEL"]
    model = get_model["model"]
    states = get_model["states"]
    print("Model used: ", model, ", states: ", states)
    # add in model configs

    get_params = configParser["PARAMS"]  # batch size, LR/schedule, optimizer, epochs, DA
    batch_size = int(get_params["batch_size"])
    epochs = int(get_params["epochs"])
    learning_rate = float(get_params["learning_rate"])
    print("Hyperparameters (batch size, epochs, learning rate): ", batch_size, epochs, learning_rate)

    model0 = TFModel(ConvNextForImageClassification.from_pretrained("states/weights/convnext/", ignore_mismatched_sizes=True, num_labels=126))#= torchvision.models.resnet18()
    #model0.fc = torch.nn.Linear(512, 1)
    # note: train_eval/evaluate call model(x, p, c), which matches MultiInputSimpleV;
    # for single-input models switch those calls to the commented-out model(x)
    if model == "res18":
      model1 = model0
    elif model == "FMW":
      model1 = FinalModelWrapper(model0)
    elif model == "res-convnext":
      model1 = ConvNextForImageClassification.from_pretrained("states/weights/convnext/", ignore_mismatched_sizes=True, num_labels=1)
      model1 = TwoModelWrapper(model1, model0)
    elif model == "MultiInputSimpleV":
      #model0.fc = torch.nn.Linear(512, 63)
      model1 = MultiInputSimpleV(model0)
    elif model == "convnext":
      model1 = TFModel(ConvNextForImageClassification.from_pretrained("states/weights/convnext/", ignore_mismatched_sizes=True, num_labels=126))
    else:
      raise ValueError("Unknown model in config: " + model)
    # input mc labels or data paths if specified
    #train_loader, test_loader, eval_loader, imnorm = prep_data_load(batch_size, train_ims_path, train_meta_path, test_ims_path, test_meta_path, cols, split, k)
    train_loader, test_loader, eval_loader, imnorm = prep_data_load(batch_size, ['data/new2.npy', 'data/1260160.npy', 'data/711444_715276_715324_716189.npy', 'data/extractedTest1260s.npy'], ['data/new2.csv', 'data/1260160.csv', 'data/711444_715276_715324_716189.csv', 'data/1260_meta.csv'] , test_ims_path, test_meta_path, cols, split, k)
    #  transforms used in current AMH prod code so supposably helpful (but we should double check)
    t0 = RandomApply([ColorJitter(brightness=(0.8, 1.2), contrast=(0.8, 1.2)), RandomAffine(45, scale=(0.8, 1.5)),
                      RandomAffine(45, shear=20)], p=0.8)
    # update transforms if specified in config
    trainforms = nn.Sequential(
        imnorm,
        # RandomCrop(480),
        # t0
    )
    testforms = nn.Sequential(
        imnorm,
        # CenterCrop(480)
    )

    print("Transforms used: ", trainforms)

    # optimizer = torch.optim.Adam(model1.parameters(), lr=learning_rate)
    optimizer = SGD(model1.parameters(), lr=learning_rate, momentum=0.9, weight_decay=0.0001)
    # scheduler = lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)

    #ckpt = torch.load(states)
    #model1.load_state_dict(ckpt['model_state_dict'])

    #optimizer.load_state_dict(ckpt['optimizer_state_dict'])

    #epochs0 = epochs + ckpt['epochs_passed']
    #print("Overall total epochs that should be trained after this: ", epochs0)

    # scheduler = lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)
    print("Optimizer used: ", optimizer)

    # update loss function if mc input
    # loss_func = nn.CrossEntropyLoss()
    loss_func = nn.BCEWithLogitsLoss()
    '''
    w_name = new + month + day + "epoch" + str(ckpt['epochs_passed']) + "-first.pt"
    torch.save({'model_state_dict': model1.state_dict(),
      'epochs_passed': ckpt['epochs_passed'],
      'optimizer_state_dict': optimizer.state_dict()}, w_name)
    print("First model saved to", w_name)
    '''
    model, epochs_passed = train_eval(model1, train_loader, test_loader, eval_loader, trainforms, testforms, loss_func,
                                      optimizer, epochs)
    #epochs_passed += ckpt['epochs_passed']
    '''
    w_name = new + "epoch_" + str(epochs_passed) + "-last.pt"
    torch.save({'model_state_dict': model.state_dict(),
      'epochs_passed': epochs_passed,
      'optimizer_state_dict': optimizer.state_dict()}, w_name)
    print("Last model saved to", w_name)
    '''
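The commented-out blocks above save checkpoints as a dict with `model_state_dict`, `epochs_passed` and `optimizer_state_dict`. A sketch of saving and resuming in that format, plus a TorchScript trace via `torch.jit.trace` for production use; a small hypothetical `nn.Sequential` stands in for `model1`. For a multi-input model such as `MultiInputSimpleV`, the example inputs would be passed as a tuple, e.g. `torch.jit.trace(model, (x, c, p))`.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical small model and optimizer standing in for model1/optimizer in the main cell.
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
out_dir = tempfile.mkdtemp()

# Checkpoint with the same keys as the commented-out torch.save calls.
w_name = os.path.join(out_dir, "epoch_10-last.pt")
torch.save({'model_state_dict': model.state_dict(),
            'epochs_passed': 10,
            'optimizer_state_dict': optimizer.state_dict()}, w_name)

# Resume: rebuild the same architecture, then load the saved states.
ckpt = torch.load(w_name)
model.load_state_dict(ckpt['model_state_dict'])
optimizer.load_state_dict(ckpt['optimizer_state_dict'])

# A traced (TorchScript) model can be loaded without the Python class definition.
model.eval()
example = torch.rand(2, 4)
traced = torch.jit.trace(model, example)
traced_path = os.path.join(out_dir, "model_traced.pt")
traced.save(traced_path)
loaded = torch.jit.load(traced_path)
print(torch.allclose(loaded(example), model(example)))   # True
```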


[7 tips to choose the best optimizer](https://towardsdatascience.com/7-tips-to-choose-the-best-optimizer-47bb9c1219e)