***
<div class="header" style="
  padding: 20px;
  background: black;">
    <h1 style="font-family:Copperplate, Papyrus, fantasy;
               font-size:50px;
               font-style:bold;
               color:white;">
        ManGanda
    </h1>
***
    <h2 style="font-family:Copperplate, Papyrus, fantasy;
               font-size:30px;
               font-style:bold;
               color:white;">
        Regressive Approach in Rating Mangas thru Sample Art
    </h2>
</div>

***
by : JP Fabrero

***
<div class="header" style="
  padding: 20px;
  background: black;">
    <h2 style="font-family:Copperplate, Papyrus, fantasy;
               font-size:30px;
               font-style:bold;
               color:white;">
        Introduction
    </h2>
</div>

***
We're diving into the marvelous world of manga, where things have been blowing up like never before. With its origins in Japan, manga has transcended cultural boundaries and captivated readers of all ages and backgrounds. It's a multidimensional art form, encompassing storytelling, character development, pacing, thematic depth, and stunning art.

Now, here's the thing: with the manga industry booming and countless series hitting the shelves every day, it can be a real challenge to figure out which ones are worth your time and money. That's where ManGanda comes in. It's on a mission to help both publishers and readers make better-informed decisions about which manga series to invest in and promote.

We'll take out all the literary elements of a manga and take a close look at the key visual and storytelling elements that make a manga shine. This approach is all about that art style, character design, and panel composition. By analyzing these factors, ManGanda aims and attempts to provide a sneak peek of a manga's potential rating.

***
<div class="header" style="
  padding: 20px;
  background: black;">
    <h2 style="font-family:Copperplate, Papyrus, fantasy;
               font-size:30px;
               font-style:bold;
               color:white;">
        Methodology
    </h2>
</div>

***
The implementation of this model is pretty straight-forward. The biggest challenge is building up the custom dataset and defining the mechanism for dataloader. Used Transer Learning for impact and performance. This is a run-down of the steps made towards ManGanda's goals.

* Data Collection and Annotation:
    - Obtain a list of top-rated manga titles and their corresponding ratings as ground labels from MangaList: https://myanimelist.net/topmanga.php?type=manga.
    - Using the obtained list of mangas, download availabel manga panel samples from MangaFreak, https://w15.mangafreak.net/, to create a diverse dataset representing a wide range of art styles and genres.
    - Create an annotations library that includes file paths, ratings, and titles for each panel sample, establishing a mapping between the panel images and their associated manga ratings.
<br></br>
* Data Preprocessing and EDA:
    - Utilize the Torch library to create custom datasets.
    - Preprocess the panel samples by using the transformation functions and achieve the following: get the maximum square crop from the panel, grayscaled image samples, randomly, horizontally flips samples, and resized panels compatible with the ResNet model.
    - Split the annotated dataset into training, validation, and testing sets in preparation for training and evaluation.
<br></br>
* Dataloader Customization:
    - Utilize the Torch library to create dataloaders.
    - Employed a simple outlier detection mechanism in the collate function to reject loading outliers found in the loaded data.
<br></br>
* Pretrained Model Selection:
    - For transfer learning, choosing a suitable pretrained model is crucial to leverage its trained capabilities and powerful feature extraction. In this case, employ the pretrained ResNet18 model trained on the Danbooru2018 dataset, developed by Matthew Baas. Danbooru2018 consists of millions of annotated images collected from the Danbooru community, a popular imageboard site for anime and manga enthusiasts.[3]
<br></br>
* Model Modification and Training:
    - Adapt the selected pretrained model for regression application, as the goal is to predict manga ratings.
    - Replace the final classification layer of the ResNet18 model with a custom regression layer that outputs a continuous value representing the predicted rating.
    - Employ a suitable loss function, mean squared error (MSE), to measure the disparity between predicted ratings and ground truth ratings.
    - Perform model training using Adam algorithm to optimize and update the model parameters and minimize the loss.
<br></br>
* Model Evaluation:
     - Evaluate and apply the trained ManGanda model to predict the ratings of new manga panel samples and assess its performance.

<div class="header" style="
  padding: 20px;
  background: black;">
    <h3 style="font-family:Copperplate, Papyrus, fantasy;
               font-size:22px;
               font-style:bold;
               color:white;">
        Importing Libraries
    </h3>
</div>

***

In [1]:
import os
os.environ['SKIMAGE_DATADIR'] = '/tmp/.skimage_cache'
os.environ['XDG_CACHE_HOME'] = '/home/msds2023/jfabrero/.cache'

import re, time, shutil, copy
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from PIL import Image
import  urllib.parse as urlp

from tqdm.notebook import tqdm, trange
from pyjanitor import auto_toc
from pickling import *
toc = auto_toc()

import torch
from torchvision import transforms
from torch.utils.data import DataLoader, Dataset, Subset
from sklearn.model_selection import train_test_split
from torch import nn, optim
import torch.nn.functional as F
from torchsummary import summary
from IPython.display import display, HTML

import warnings
warnings.filterwarnings('ignore')

<div class="header" style="
  padding: 20px;
  background: black;">
    <h3 style="font-family:Copperplate, Papyrus, fantasy;
               font-size:22px;
               font-style:bold;
               color:white;">
        Data Collection and Annotation
    </h3>
</div>

***

Contains functions to collect mangalist and manga sample panels. See code runs for implementation.

#### Functions

In [2]:
def get_attrs(url):
    """Get titles, urls, and ratings in the rankings page"""
    soup = bs4.BeautifulSoup(requests.get(url).content)
    mangas = soup.select('tr[class="ranking-list"]')
    
    # Titles
    titles = [
        manga.select_one('h3[class="manga_h3"]').text for manga in mangas
    ]
    
    # URLs
    urls = [
        manga.select_one('h3[class="manga_h3"] a').get('href') 
        for manga in mangas
    ]
    
    # Ratings
    ratings = [
        float(manga.select_one('td[class="score ac fs14"]').text.strip())
        for manga in mangas
    ]
    
    # Check Results
    if not len(titles) == len(urls) == len(ratings):
        raise Error
    
    return titles, urls, ratings


def get_details(url):
    """Get n_faves, genres, and authors in the manga page"""
    soup = bs4.BeautifulSoup(requests.get(url).content)
    
    # Number of Favorites
    try:
        n_fav = [list(n_fav.parent.children)[1].text
                 for n_fav in soup.select('div[class="spaceit_pad"] span')
                 if n_fav.text == 'Favorites:'][0]
        n_fav = int(re.sub(r'\D', '', n_fav))
    except:
        n_fav = None
    
    # Genre
    try:
        genre = [genre.text for genre in 
                 soup.select('div[class="spaceit_pad"]'
                             ' span[itemprop="genre"]')
                 if genre.parent.select_one('span').text == 'Genres:']
    except:
        genre = []
    
    # Authors
    try:
        author = [author.text for author in 
                  soup.select('div[class="spaceit_pad"] a')
                  if author.parent.select_one('span').text == 'Authors:']
    except:
        author = []
        
    return n_fav, genre, author


def get_data(limit):
    """Crawl `myanimelist.net` for the list of mangas and details"""
    # Load data if available
    try:
        df = load_pkl(f'df_{limit}')
        
    except:
        # Instatiate Features
        titles  = []
        urls    = []
        ratings = []

        # Crawl Ranking Pages (https://myanimelist.net/topmanga.php?limit=0)
        for start in trange(0,limit,50):
            url = f'https://myanimelist.net/topmanga.php?limit={start}'
            try:
                results = get_attrs(url)
            except:
                print(f'Crawling Ended Prematurely, Stopped @ {start}')
                break

            # Update
            titles.extend(results[0])
            urls.extend(results[1])
            ratings.extend(results[2])

        # Instatiate Other Details
        n_favs  = []
        genres  = []
        authors = []

        # Scrape each Manga Page (https://myanimelist.net/manga/2/Berserk)
        for url in tqdm(urls):
            details = get_details(url)

            # Update
            n_favs.append(details[0])
            genres.append(details[1])
            authors.append(details[2])

        df = pd.DataFrame({
            'titles': titles,
            'urls': urls,
            'ratings': ratings,
            'n_favs': n_favs,
            'genres': genres,
            'authors': authors,
        })
        save_pkl(df, f'df_{limit}')
        df.to_csv(f'df_{limit}.csv', index=False)
                 
    return df

In [3]:
def download_samples(title, n=3):
    """Download manga pages"""
    # Format title
    title_url = re.sub(r'\W', '_', title.title())
    url = f'https://w15.mangafreak.net/Read1_{title_url}_1'
    soup = bs4.BeautifulSoup(requests.get(url, timeout=2).content)
    
    # Check if manga is found
    home_title = 'Read Free Manga Online - MangaFreak'
    if soup.select_one('title').text == home_title:
        return False # Skip if not
        
    else:
        # Check for folder
        if not os.path.exists(f'./data/{title_url.lower()}'):
            os.mkdir(f'./data/{title_url.lower()}')
        
        # Check for contents
        if len(os.listdir(f'./data/{title_url.lower()}')) < 5*n:

            # Get Chapters
            chapters = [re.findall(r'_(\d+)', chapter.get('value'))[0]
                        for chapter in soup.select('option')
                        if 'Read1' in chapter.get('value')]

            # Remove folder if no valid chapters
            if len(set(chapters)) < 5:
                shutil.rmtree(f'./data/{title_url.lower()}',
                              ignore_errors=True)
                return False

            # Get Random Chapters
            for chapter in np.random.choice(chapters, 5, replace=False):
                c_url = (f'https://w15.mangafreak.net/'
                         f'Read1_{title_url}_{chapter}')
                chap_soup = bs4.BeautifulSoup(
                    requests.get(c_url, timeout=2).content
                )

                # Get Number of Pages
                try:
                    pages = int(re.findall(r'(\d+) pages',
                                           chap_soup.text)[0])
                except:
                    try:
                        pages = int([re.findall(r'Page (\d+)',
                                                page.get('alt'))[0]
                                     for page in chap_soup.select('img')
                                     if 'Page' in page.get('alt')][-1])
                    except:
                        pages = 10

                title_lower = title_url.lower()

                # Download images
                half = pages // 2

                for page in range(half-int(n/2), half+round(n/2)):
                    url = (f'https://images.mangafreak.net/mangas'
                           f'/{title_lower}/{title_lower}_{chapter}'
                           f'/{title_lower}_{chapter}_{page}.jpg')
                    !wget -q '{url}' -P './data/{title_lower}' 
    
    # If all is successful
    return True

def download_mangas(df, n=3):
    """Download images samples for the listed mangas"""
    # Create folder for data
    if not os.path.exists('./data'):
        os.mkdir('./data')
    
        # Get list of mangas with downloadable images
        mangas = []

        # For each manga
        for title in tqdm(df['titles'].tolist()):

            try:
                # Download samples
                if download_samples(title, n=3):
                    # Append the title to mangas if filled with images
                    mangas.append(title)
            except:
                continue

        # Get the relevant mangas
        df['check'] = (
            df['titles'].apply(lambda x: re.sub(r'\W', '_', x.lower()))
        )
        df.set_index('titles', inplace=True)

        df_manga = df.loc[mangas]

        print('Downloading Samples: Success!')
        
    else:
        mangas = os.listdir('./data')
        
        df['check'] = (
            df['titles'].apply(lambda x: re.sub(r'\W', '_', x.lower()))
        )
        df = df.sort_values('ratings').drop_duplicates('titles', keep='last')
        df.set_index('titles', inplace=True)
        df_manga = df[df['check'].isin(mangas)]
        
        print('Samples are already downloaded!')
    
    print(f'Download Rate: {(len(df_manga)/len(df)*100):.2f}%')
    return df_manga

In [4]:
def get_metadata(df_data):
    """Get annotations for dataset"""
    if os.path.exists('metadata.csv'):
        pass
    else:
        df_data = df_data.reset_index().set_index('check')

        data_dir = './data'
        mangas = os.listdir(data_dir)


        df_metadata = pd.DataFrame()
        for manga in mangas:
            if manga[0] == '.':
                continue
            manga_path = os.path.join(data_dir, manga)
            paths = [os.path.join(manga_path, x)
                     for x in os.listdir(manga_path)]

            # Check data
            verified = []
            for path in paths:
                try:
                    transforms.ToTensor()(Image.open(path))
                    verified.append(path)
                except:
                    continue

            # Save good data
            df_manga = pd.DataFrame({'paths': verified})
            df_manga['rating'] = [df_data.loc[manga, 'ratings']]*len(verified)
            df_manga['title'] = [df_data.loc[manga, 'titles']]*len(verified)

            df_metadata = pd.concat([df_metadata, df_manga])

        df_metadata.to_csv('metadata.csv', index=False)

#### Code Runs

In [5]:
df_top = get_data(5_000) # Top 5_000

In [6]:
df_dl = download_mangas(df_top) # Dataframe of sampled manga details

Samples are already downloaded!
Download Rate: 25.65%


In [7]:
get_metadata(df_dl) # Write annotations based from sampled mangas

#### Section Notes

* *Limited the list of mangas to only the top 5,000 highly rated mangas in `mangalist`*
* *Only sampled mangas available in the `mangafreak` website. Sampling was done by choosing random chapters and getting the panels near the middle in order to avoid colored covers.*
* *Handled duplicates in the manga lists. Retained the higher available rating.*
* *Skipped corrupted image files.*

<div class="header" style="
  padding: 20px;
  background: black;">
    <h3 style="font-family:Copperplate, Papyrus, fantasy;
               font-size:22px;
               font-style:bold;
               color:white;">
        Data Preprocessing and EDA
    </h3>
</div>

***

Contains functions for customizing the dataset and printing out analysis on data. See code runs for implementation.

#### Functions

In [8]:
# Custom Manga Dataset
class MangaDataset(Dataset):
    """Custom Dataset for mangas"""
    def __init__(self):
        """Get annotations of mangas"""
        self.metadata = pd.read_csv('metadata.csv')
                
    def __len__(self):
        return len(self.metadata)

    def __getitem__(self, index):
        """Get manga item and apply preset transformations"""
        path = self._get_manga_path(index)
        label = self._get_manga_label(index)
        title = self._get_manga_title(index)
        
        image = transforms.ToTensor()(Image.open(path))
        if image.shape[0] == 1:
            image = image.repeat(3, 1, 1)

        dim = min(torch.tensor(image.shape[1:])).item()

        self.transform = transforms.Compose([
            transforms.Grayscale(3),
            transforms.CenterCrop(dim),
            transforms.Resize(224, antialias=True),
            transforms.RandomHorizontalFlip(p=0.5),
        ])
        
        image = self.transform(image)
        
        return image, label, path, title

    def _get_manga_path(self, index):
        return self.metadata.iloc[index, 0]

    def _get_manga_label(self, index):
        return self.metadata.iloc[index, 1]
    
    def _get_manga_title(self, index):
        return self.metadata.iloc[index, 2]

In [9]:
def plot_ratings(dataset):
    """Plot the histogram of ratings"""
    plot = (
        dataset.metadata
        .groupby(['title'])['rating'].mean()
    )
    fig, ax = plt.subplots(figsize=(15, 7))
    ax.hist(plot, bins=60, color='k', alpha=0.8)
    ax.set_ylabel('Number of Mangas')
    ax.set_xlabel('Ratings')
    toc.add_fig('Distribution of Ratings', width=100)

In [10]:
def get_baseline(dataset):
    """Print the baseline value to beat"""
    baseline = dataset.metadata.rating.std()

    display(HTML(
        f'<b>An acceptable model performance will be MSE < '
        f'{baseline:0.4f}.</b>'
    ))
    
    return baseline

#### Code Runs

In [11]:
dataset = MangaDataset()

In [12]:
toc.add_table(dataset.metadata.sample(10),
              'Data Annotations')

paths,rating,title
./data/denpa_kyoushi/denpa_kyoushi_43_10.jpg,7.66,Denpa Kyoushi
./data/fantasista/fantasista_56_9.jpg,7.65,Fantasista
./data/shuukan_shounen_hachi/shuukan_shounen_hachi_4_11.jpg,7.28,Shuukan Shounen Hachi
./data/hidamari_ga_kikoeru/hidamari_ga_kikoeru_4_17.jpg,8.18,Hidamari ga Kikoeru
./data/boku_no_hero_academia/boku_no_hero_academia_111_10.jpg,8.11,Boku no Hero Academia
./data/ojojojo/ojojojo_54_2.jpg,7.47,Ojojojo
./data/spirit_fingers/spirit_fingers_72_16.jpg,8.22,Spirit Fingers
./data/livingstone/livingstone_10_15.jpg,7.43,Livingstone
./data/nobunaga_no_chef/nobunaga_no_chef_47_10.jpg,7.74,Nobunaga no Chef
./data/dantalian_no_shoka/dantalian_no_shoka_23_19.jpg,7.5,Dantalian no Shoka


In [13]:
plot_ratings(dataset)

In [14]:
baseline = get_baseline(dataset)

In [15]:
stages = ['train', 'val', 'test']

trainval, test = train_test_split(dataset.metadata.index.tolist(),
                                  stratify=dataset.metadata.rating,
                                  test_size=0.1,
                                  random_state=143)
train, val = train_test_split(trainval,
                              stratify=dataset.metadata.loc[trainval].rating,
                              test_size=1/6,
                              random_state=143)

# Size of datasets
dataset_sizes = {}
for stage in stages:
    exec(f'dataset_sizes["{stage}"] = len({stage})')

# Subsetting of datasets for each stage/phase
datasets = {}
for stage in stages:
    exec(f'datasets["{stage}"] = Subset(dataset, {stage})')

#### Section Notes

* *Since the list of mangas were limited to the top 5,00, the distribution of ratings seemingly follows the right-tail end of a bell curve*
* *Added grayscaling transformation as added layer of ensuring uniformity.*
* *Stratified the splitting in attempt to help training.*

<div class="header" style="
  padding: 20px;
  background: black;">
    <h3 style="font-family:Copperplate, Papyrus, fantasy;
               font-size:22px;
               font-style:bold;
               color:white;">
        Dataloader Customization
    </h3>
</div>

***

Contains functions for collation and employing outlier detection. See code runs for implementation and instatiation of the dataloaders.

#### Functions

In [16]:
def collate_fn(batch):
    """Collating function for the dataset"""
    X, Y = [], []

    # Gather in lists, and encode labels as indices
    for x, y, _, _ in batch:
        
        # Outlier Detection
        if (x.mean().item() >= dataset._mean_thresh and 
            x.std().item() >= dataset._std_thresh):

            X += [torch.tensor(x)]
            Y += [torch.tensor(y)]

    # Group the list of tensors into a batched tensor
    X = torch.stack(X)
    Y = torch.stack(Y)
    
    return X, Y

#### Code Runs

In [17]:
# Getting Training Distribution
try:
    means = load_pkl('means')
    stds = load_pkl('stds')
except:
    means = []
    stds = []
    for img_t, *_ in datasets['train']:
        means += [img_t.mean().item()]
        stds += [img_t.std().item()]
        
dataset._mean_thresh = np.quantile(means, 0.01)
dataset._std_thresh = np.quantile(stds, 0.01)

In [18]:
# Building the DataLoader
dataloaders = {}
for stage in stages:
    shuffle = stage != 'test'
    exec(f"""dataloaders["{stage}"] = DataLoader(datasets["{stage}"],
                                                 batch_size=24,
                                                 shuffle={shuffle},
                                                 collate_fn=collate_fn,
                                                 num_workers=1)""")

#### Section Notes

* *Employed a simple Out-of-Distribution Detection and Rejection mechanism via collate function. The statistics were based from the training data.*

<div class="header" style="
  padding: 20px;
  background: black;">
    <h3 style="font-family:Copperplate, Papyrus, fantasy;
               font-size:22px;
               font-style:bold;
               color:white;">
        Pretrained Model Selection
    </h3>
</div>

***
Loading and downloading the pretrained model.

#### Code Runs

In [19]:
# Get the device for loading the model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Instatiate the model
model = torch.hub.load('RF5/danbooru-pretrained', 'resnet18',
                       pretrained=False)

# Load the pretrained weights
checkpoint = torch.hub.load_state_dict_from_url(
    'https://github.com/RF5/danbooru-pretrained/releases/download/v0.1/resnet18-3f77756f.pth',
    map_location=device.type
)
state_dict = {key.replace("module.", ""): value 
              for key, value in checkpoint.items()}

model.load_state_dict(state_dict)

Using cache found in /home/msds2023/jfabrero/.cache/torch/hub/RF5_danbooru-pretrained_master


<All keys matched successfully>

#### Section Notes
* *This model was chosen primarily due to its training with anime and manga images.*

<div class="header" style="
  padding: 20px;
  background: black;">
    <h3 style="font-family:Copperplate, Papyrus, fantasy;
               font-size:22px;
               font-style:bold;
               color:white;">
        Model Modification and Training
    </h3>
</div>

***

#### Functions

In [20]:
def train_model(model, criterion, optimizer, num_epochs=25):
    """A simple training loop"""
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_loss = 1_000_000

    for epoch in trange(num_epochs):
        log = f'Epoch {epoch+1:2d}/{num_epochs}  |'
        # Each epoch has a training and validation phase
        for stage in ['train', 'val']:
            if stage == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode
                
            running_loss = 0.0

            # Iterate over data.
            for X, y in tqdm(dataloaders[stage]):
                X = X.to(device)
                y = y.float().to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(stage == 'train'):
                    out = model(X)
                    loss = criterion(out, y)
                    
                    # backward + optimize only if in training phase
                    if stage == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * X.size(0)
           
            epoch_loss = running_loss / dataset_sizes[stage]
            
            if stage == 'val':
                stage = 'validation'
            log += (f'  {stage.title()} Loss: {epoch_loss:.4f}  |')

            # deep copy the model
            if stage == 'validation' and epoch_loss < best_loss:
                best_loss = epoch_loss
                best_model_wts = copy.deepcopy(model.state_dict())

        print(log)

    time_elapsed = time.time() - since
    print(f'Training complete in {time_elapsed // 60:.0f}m'
          f'{time_elapsed % 60:.0f}s')
    print(f'Best Validation Loss: {best_loss:4f}')

    # load best model weights
    model.load_state_dict(best_model_wts)
    
    torch.save(model.state_dict(), 'MangaModel.pth')
    
    return model

#### Code Runs

In [21]:
# Freezing the weights of the pretrained model
for param in model.parameters():
    param.requires_grad = False           

In [22]:
# Change the Last Layer and adding linear layers
model[1][8] = nn.Linear(512, 128)
if len(model[1]) != 14:
    model[1].append(nn.ReLU())
    model[1].append(nn.Dropout(0.2))
    model[1].append(nn.Linear(128, 32))
    model[1].append(nn.ReLU())
    model[1].append(nn.Linear(32, 1))
    model[1].append(nn.Threshold(0, 10))

In [23]:
# Preview or summary of the modified model for regression
details = summary(model, (3, 224, 224))

Layer (type:depth-idx)                   Output Shape              Param #
├─Sequential: 1-1                        [-1, 512, 7, 7]           --
|    └─Conv2d: 2-1                       [-1, 64, 112, 112]        (9,408)
|    └─BatchNorm2d: 2-2                  [-1, 64, 112, 112]        (128)
|    └─ReLU: 2-3                         [-1, 64, 112, 112]        --
|    └─MaxPool2d: 2-4                    [-1, 64, 56, 56]          --
|    └─Sequential: 2-5                   [-1, 64, 56, 56]          --
|    |    └─BasicBlock: 3-1              [-1, 64, 56, 56]          (73,984)
|    |    └─BasicBlock: 3-2              [-1, 64, 56, 56]          (73,984)
|    └─Sequential: 2-6                   [-1, 128, 28, 28]         --
|    |    └─BasicBlock: 3-3              [-1, 128, 28, 28]         (230,144)
|    |    └─BasicBlock: 3-4              [-1, 128, 28, 28]         (295,424)
|    └─Sequential: 2-7                   [-1, 256, 14, 14]         --
|    |    └─BasicBlock: 3-5              [-1, 256, 

In [24]:
model.to(device)

# Set the loss function for Regression
criterion = nn.MSELoss()

# Only the parameters of the regressor are being optimized
optimizer = optim.Adam(model[1].parameters(), lr=0.001)

In [25]:
# Load the retrained model, else, train
try:
    model.load_state_dict(torch.load('MangaModel_v2.pth'))
    MangaModel = model
except:
    MangaModel = train_model(model, criterion, optimizer, num_epochs=50)

#### Section Notes

* *Minimized training cost by freezing most of the model weights and changing/appending linear layers*
* *Added a threshold to discourage model from sacrificing certain datapoints to achieve lower loss, resulting in poor generalizability*

<div class="header" style="
  padding: 20px;
  background: black;">
    <h3 style="font-family:Copperplate, Papyrus, fantasy;
               font-size:22px;
               font-style:bold;
               color:white;">
        Model Evaluation
    </h3>
</div>

***

Contains functions to print out evaluation metrics and sample ratings. See code runs for implementation.

#### Functions

In [26]:
def eval_model(model, baseline):
    """Evaluate model on test set"""
    model.eval()   # Set model to evaluate mode
    mse = 0.0
    mae = nn.L1Loss()
    mae_losses = 0.0
    
    # Iterate over data.
    for X, y in tqdm(dataloaders['test']):
        X = X.to(device)
        y = y.float().to(device)
        out = model(X)
        loss = criterion(out, y)
        mae_loss = mae(out, y)
        
        # statistics
        mse += loss.item() * X.size(0)
        mae_losses += mae_loss.item() * X.size(0)
        
    test_loss = mse / dataset_sizes['test']
    test_mae_loss = mae_losses / dataset_sizes['test']
    
    if test_loss < baseline:
        evaluation = 'ManGanda is performing better than the set baseline!'
    else:
        evaluation = 'ManGanda needs more fine-tuning'
        
    display(HTML(
        '<b>'
        f'Test MSE Loss - {test_loss:.2f}<br>'
        f' Baseline - {baseline:.2f}<br><br>'
        'Other Metrics:<br>'
        f'Test MAE Loss - {test_mae_loss:.2f}<br>'
        '----------------------------------------------------------------<br>'
        f'{evaluation}'
        '</b>'
    ))

In [27]:
def plot_samples(model, dataset, n=3):
    """Plot sample images and print prediction"""
    df = dataset.metadata
    mangas = np.random.choice(df.title.unique(), n, replace=False)
    
    for i, manga in enumerate(mangas):
        panels = np.random.choice(df[df['title'] == manga].index,
                                  3,
                                  replace=False)

        fig, ax = plt.subplots(1, 3, figsize=(15, 5))

        sample_data = []
        for j, idx in enumerate(panels):
            data = dataset[idx]
            sample_data.append(data)
            ax[j].imshow(transforms.ToPILImage()(data[0]))
            ax[j].axis('off')

        x, y = collate_fn(sample_data)
        
        if y.shape[0] < 1:
            print('All sampled images are outliers')
        x = x.to(device)
        out = model(x)

        toc.add_fig(f'Sample Prediction # {i+1} - {manga}', width=100)
        display(HTML(
            '<center><b>'
            f'Average Model Prediction - {out.mean().item():.2f}<br>'
            f'   Actual Rating - {y.mean().item():.2f}'
            '</b><center><br><br><br>'
        ))

#### Code Runs

In [28]:
eval_model(MangaModel, baseline)

  0%|          | 0/37 [00:00<?, ?it/s]

In [29]:
plot_samples(MangaModel, dataset)

***
<div class="header" style="
  padding: 20px;
  background: black;">
    <h2 style="font-family:Copperplate, Papyrus, fantasy;
               font-size:30px;
               font-style:bold;
               color:white;">
        Conclusion
    </h2>
</div>

***

Using different techniques for handling images, ManGanda was born. As evaluated using a test set, our model was able to achieve Test MSE of $0.16$ and Test MAE of $0.31$. With this, ManGanda is able to perform well enough to predict manga ratings, off only by $\pm0.31$ on average.

The performance of this model implies that manga ratings, however multidimensional, are dominantly influenced by the level of artistry. Such thought-provoking findings!

Over-all, ManGanda has more room to improve and can be further leverage for XAI in cracking down the key visual elements and guide artists of what works or don't.

***
<div class="header" style="
  padding: 20px;
  background: black;">
    <h2 style="font-family:Copperplate, Papyrus, fantasy;
               font-size:30px;
               font-style:bold;
               color:white;">
        References
    </h2>
</div>

***

[1] Matthew Baas. (2019). Danbooru2018 pretrained resnet models for PyTorch. GitHub. https://github.com/RF5/danbooru-pretrained