# MobileNetV2 Fine-Tuning
- Fine-tune of [google/mobilenet_v2_1.0_224](https://huggingface.co/google/mobilenet_v2_1.0_224)
- Inspired by the blog post ["Making ML-powered web games with Transformers.js"](https://huggingface.co/blog/ml-web-games)

In [1]:
pip -q install accelerate datasets evaluate transformers'[torch]' scikit-learn numpy 

Note: you may need to restart the kernel to use updated packages.


# Packages
- Hugging Face `datasets` API for loading and preparing the dataset
- Hugging Face `transformers` API for loading, training, and testing the model
- `evaluate` for computing the accuracy metric
- `numpy` for converting images to arrays before they become PyTorch tensors
- `torch` for actually running the model


In [5]:
from datasets import load_dataset, DatasetDict, Dataset
from transformers import MobileNetV2ImageProcessor, TrainingArguments, MobileNetV2ForImageClassification, Trainer, AutoModelForImageClassification
import numpy as np
import evaluate
import torch 

## Dataset Preparation

- The dataset is a subset of the Google QuickDraw dataset: [Xenova/quickdraw-small](https://huggingface.co/datasets/Xenova/quickdraw-small)
- Hugging Face handles a lot of the hard work for us and returns everything as a `DatasetDict` object
- The dictionary is divided into three splits:
    - `train` (4.5 million images, 90%)
    - `test` (250,000 images, 5%)
    - `valid` (250,000 images, 5%)
- Images are stored as PIL images


In [6]:
dataset: DatasetDict = load_dataset('Xenova/quickdraw-small')
dataset

DatasetDict({
    train: Dataset({
        features: ['image', 'label'],
        num_rows: 4500000
    })
    test: Dataset({
        features: ['image', 'label'],
        num_rows: 250000
    })
    valid: Dataset({
        features: ['image', 'label'],
        num_rows: 250000
    })
})

In [13]:
image_processor_config = {
    "crop_size": {
        "height": 28,
        "width": 28
    },
    "do_center_crop": True,
    "do_normalize": True,
    "do_rescale": True,
    "do_resize": True,
    "image_mean": [0.5],
    "image_processor_type": "MobileNetV2FeatureExtractor",
    "image_std": [0.5],
    "resample": 2,
    "rescale_factor": 0.00392156862745098,
    "size": {
        "shortest_edge": 28
    }
}

new_processor = MobileNetV2ImageProcessor(**image_processor_config)
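
To make the configuration above more concrete, the sketch below reproduces by hand what `do_rescale` and `do_normalize` should do to a single 8-bit pixel, assuming the processor computes `pixel * rescale_factor` followed by `(value - image_mean) / image_std`. The helper `preprocess_pixel` is a hypothetical name used only for illustration; it is not part of transformers.

```python
# Hand-rolled illustration of the rescale + normalize steps configured above.
RESCALE_FACTOR = 0.00392156862745098  # 1 / 255, as in the config
IMAGE_MEAN = 0.5
IMAGE_STD = 0.5

def preprocess_pixel(value: int) -> float:
    """Map an 8-bit pixel value (0..255) to the normalized range."""
    rescaled = value * RESCALE_FACTOR           # 0..255 -> 0..1
    return (rescaled - IMAGE_MEAN) / IMAGE_STD  # 0..1 -> -1..1

# Black, mid-gray, and white pixels.
for v in (0, 128, 255):
    print(v, round(preprocess_pixel(v), 4))
```

With a mean and standard deviation of 0.5, black pixels end up near -1 and white pixels near 1, so the inputs are roughly centered around zero.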

We'll also define a `transform` function, which will be used to transform the batches of our dataset into the correct format. This includes taking the list of PIL images and extracting their pixel values. One important thing to note is that we have to add the channel dimension (of 1) back before running the processor.

In [14]:
def transform(example_batch):
    inputs = new_processor([
        # [h, w] -> [c, h, w]
        # => [28, 28] -> [1, 28, 28]
        np.expand_dims(np.array(x), 0)
        for x in example_batch['image']
    ], return_tensors='pt')

    # Don't forget to include the labels!
    inputs['label'] = example_batch['label']
    return inputs

transformed_dataset = dataset.with_transform(transform)

Taking the first 2 samples of the transformed dataset, we see that the shape is as expected!
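
The cell that produced this check is not included here. As a stand-in, the sketch below reproduces the shape bookkeeping with NumPy only, assuming each image decodes to a 28x28 grayscale array. The blank `fake_images` are placeholders rather than real QuickDraw samples, and the image processor itself is not run.

```python
import numpy as np

# Two placeholder 28x28 grayscale "images"
# (assumption: real samples decode to this shape).
fake_images = [np.zeros((28, 28), dtype=np.uint8) for _ in range(2)]

# Same step as in `transform`: [h, w] -> [c, h, w] with c = 1.
with_channel = [np.expand_dims(img, 0) for img in fake_images]

# Stacking the per-image arrays gives [batch, channels, height, width].
batch = np.stack(with_channel)
print(batch.shape)  # (2, 1, 28, 28)
```

On the real data, the equivalent check would presumably be something like `transformed_dataset['train'][0:2]['pixel_values'].shape`, which should report the same four dimensions.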

## Training and Evaluation

Before we start training, let's set up our data collator and evaluation metric.



[Data collators](https://huggingface.co/docs/transformers/main_classes/data_collator) are objects that will form a batch by using a list of dataset elements as input. In our case, it's as simple as stacking the pixel values of the inputs (in the batch dimension) and creating a tensor of the labels in the batch.

In [15]:
def collate_fn(batch):
    return {
        'pixel_values': torch.stack([x['pixel_values'] for x in batch]),
        'labels': torch.tensor([x['label'] for x in batch])
    }
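
As a quick sanity check, the collator can be tried on a handful of made-up examples before handing it to the `Trainer`. This is only a sketch: `fake_examples` uses random tensors and arbitrary labels instead of real dataset items, and `collate_fn` is repeated so the snippet runs on its own.

```python
import torch

def collate_fn(batch):
    # Same logic as the cell above, repeated so this snippet is self-contained.
    return {
        'pixel_values': torch.stack([x['pixel_values'] for x in batch]),
        'labels': torch.tensor([x['label'] for x in batch]),
    }

# Four placeholder examples: random [1, 28, 28] tensors with made-up labels.
fake_examples = [
    {'pixel_values': torch.rand(1, 28, 28), 'label': i % 2}
    for i in range(4)
]

batch = collate_fn(fake_examples)
print(batch['pixel_values'].shape)  # torch.Size([4, 1, 28, 28])
print(batch['labels'])              # tensor([0, 1, 0, 1])
```

Note that the per-example key `label` becomes `labels` in the batch, which is the argument name the model's forward pass expects.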

In [16]:
metric = evaluate.load("accuracy")
def compute_metrics(p):
    return metric.compute(predictions=np.argmax(p.predictions, axis=1), references=p.label_ids)
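
To see what `compute_metrics` does with the model output, here is a small NumPy-only sketch. The logits and labels are made up for illustration, and the accuracy is computed by hand instead of through `evaluate`, so treat it as an approximation of the metric's behavior, not its implementation.

```python
import numpy as np

# Made-up logits for 4 examples over 3 classes (rows = examples, columns = classes).
logits = np.array([
    [2.0, 0.1, 0.3],   # argmax -> 0
    [0.2, 1.5, 0.1],   # argmax -> 1
    [0.3, 0.2, 0.9],   # argmax -> 2
    [1.1, 0.4, 0.2],   # argmax -> 0
])
label_ids = np.array([0, 1, 2, 1])  # the last example is misclassified

# Pick the highest-scoring class per example, then compare with the labels.
predicted = np.argmax(logits, axis=1)
accuracy = float(np.mean(predicted == label_ids))
print(predicted.tolist(), accuracy)  # [0, 1, 2, 0] 0.75
```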

Now, we could start training a new model from scratch (starting with random weights), but a far better approach is to [fine-tune a pretrained model](https://huggingface.co/docs/transformers/training). In our case, we will fine-tune [google/mobilenet_v2_1.0_224](https://huggingface.co/google/mobilenet_v2_1.0_224), which has been pretrained on ImageNet-1k. This will greatly reduce our training time, as the pretrained model has already learnt some important concepts that can be used to classify other types of images (like edge detection). We can do this using `MobileNetV2ForImageClassification.from_pretrained`, and then specifying some of our custom settings.

In [2]:
labels = transformed_dataset['train'].features['label'].names
model = MobileNetV2ForImageClassification.from_pretrained(
    'google/mobilenet_v2_1.0_224',
    num_labels=len(labels),
    id2label={str(i): c for i, c in enumerate(labels)},
    label2id={c: str(i) for i, c in enumerate(labels)},
    ignore_mismatched_sizes=True,
    num_channels=1,
    image_size=28,
)

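
The `id2label` and `label2id` arguments are plain dictionaries built from the class names. The sketch below shows their shape using a hypothetical three-class list; the real names come from the dataset's `label` feature, not from this example.

```python
# Hypothetical class names; the real list comes from the dataset's `label` feature.
labels = ['airplane', 'apple', 'banana']

# Same comprehensions as in the cell above.
id2label = {str(i): c for i, c in enumerate(labels)}
label2id = {c: str(i) for i, c in enumerate(labels)}

print(id2label)  # {'0': 'airplane', '1': 'apple', '2': 'banana'}
print(label2id)  # {'airplane': '0', 'apple': '1', 'banana': '2'}

# The two mappings are inverses of each other.
assert all(label2id[name] == idx for idx, name in id2label.items())
```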

We can specify the parameters to use for training using the `TrainingArguments` dataclass. Feel free to adjust the settings to your liking: you can find the list [here](https://huggingface.co/docs/transformers/v4.30.0/en/main_classes/trainer#transformers.TrainingArguments).

In [56]:
training_args = TrainingArguments(
  output_dir="./doodle_mvit-small50/",
  per_device_train_batch_size=512,
  per_device_eval_batch_size=512,
  evaluation_strategy="steps",
  logging_strategy="steps",
  num_train_epochs=10,
  fp16=True,
  save_steps=5000,
  eval_steps=5000,
  logging_steps=1000,
  learning_rate=8e-4,
  save_total_limit=3,
  remove_unused_columns=False,
  push_to_hub=False,
  report_to='none',
  dataloader_num_workers=2,
)
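
It helps to translate these settings into step counts before starting a long run. The arithmetic below is a rough estimate that assumes a single device, no gradient accumulation, and that the last partial batch of each epoch is kept; with several GPUs the numbers would differ.

```python
import math

TRAIN_EXAMPLES = 4_500_000   # size of the train split shown earlier
BATCH_SIZE = 512             # per_device_train_batch_size
EPOCHS = 10                  # num_train_epochs
EVAL_EVERY = 5000            # eval_steps (same value as save_steps)

# Assumption: one device, no gradient accumulation, last partial batch kept.
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
evaluations = total_steps // EVAL_EVERY

print(steps_per_epoch, total_steps, evaluations)  # 8790 87900 17
```

Under these assumptions, the model is evaluated and checkpointed less than twice per epoch, and `save_total_limit=3` keeps only the most recent checkpoints on disk.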


We can now create a new `Trainer` object, which will contain everything we just created:

In [57]:
trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=collate_fn,
    compute_metrics=compute_metrics,
    train_dataset=transformed_dataset["train"],
    eval_dataset=transformed_dataset["valid"],
    tokenizer=new_processor,
)




... and finally, we can start training! Go grab a cup of coffee... you've earned it! I'll see you in a couple of hours...

In [58]:
train_results = trainer.train()
trainer.save_model()
trainer.log_metrics("train", train_results.metrics)
trainer.save_metrics("train", train_results.metrics)
trainer.save_state()


Once the model has finished training, we can evaluate it on the test set:

In [None]:
metrics = trainer.evaluate(transformed_dataset['test'])
trainer.log_metrics("test", metrics)
trainer.save_metrics("test", metrics)

***** test metrics *****
  epoch                   =        7.0
  eval_accuracy           =     0.7093
  eval_loss               =     1.1665
  eval_runtime            = 0:00:16.41
  eval_samples_per_second =  15226.188
  eval_steps_per_second   =     29.782
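
The throughput figures in this log can be cross-checked against the dataset size. The sketch below assumes the evaluation ran on a single device with the eval batch size of 512 from the training arguments; if the run used a different setup, the step count would not match.

```python
import math

TEST_EXAMPLES = 250_000          # size of the test split
EVAL_BATCH_SIZE = 512            # per_device_eval_batch_size (assumed single device)
SAMPLES_PER_SECOND = 15226.188   # eval_samples_per_second from the log
STEPS_PER_SECOND = 29.782        # eval_steps_per_second from the log

# Number of batches needed to cover the test split.
eval_steps = math.ceil(TEST_EXAMPLES / EVAL_BATCH_SIZE)

# Two independent estimates of the evaluation runtime, in seconds.
runtime_from_samples = TEST_EXAMPLES / SAMPLES_PER_SECOND
runtime_from_steps = eval_steps / STEPS_PER_SECOND

print(eval_steps)                      # 489
print(round(runtime_from_samples, 2))  # 16.42
print(round(runtime_from_steps, 2))    # 16.42
```

Both estimates land close to the logged `eval_runtime` of about 16.4 seconds, which suggests the reported rates are consistent with a 250,000-image test split.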


If you want, you can upload your model to the Hugging Face Hub. Note that in order to push to the hub, you must have git-lfs installed and be logged into your Hugging Face account (which can be done via `huggingface-cli login`).

In [None]:
from huggingface_hub import login
login()

In [None]:
trainer.push_to_hub('uno')