# **Microsoft Vision classification example**

This example shows a simple way to use the Microsoft Vision model with PyTorch for transfer learning: the pretrained model serves as a feature extractor.

A small fully connected neural network is then plugged in on top of the vision model and trained on the features it extracts from the data.

Start with the necessary imports.

In [1]:
import time
import torch
import numpy as np
import torch.nn as nn
from torch import Tensor
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
from torchvision.datasets import CIFAR10
import torchvision.transforms as transforms
from progressbar import progressbar

## Install Microsoft Vision

In [2]:
!pip install microsoftvision

import microsoftvision

Collecting microsoftvision
  Downloading https://files.pythonhosted.org/packages/71/db/65a4aebd1eac4c5920ac5fcf7c964f9834675b129ef82871435ea902b393/microsoftvision-1.0.5-py3-none-any.whl
Collecting azure-identity
  Downloading https://files.pythonhosted.org/packages/2a/35/64e29615e7709c10c4f1d4310a8c13a6770142e9fcb9358fb8fa4d9b1578/azure_identity-1.5.0-py2.py3-none-any.whl (103kB)
Collecting azure-storage-blob
  Downloading https://files.pythonhosted.org/packages/09/14/4ca417a9c92b0fb93516575dd7be9b058bf13d531dcc21239b5f8f216a69/azure_storage_blob-12.8.0-py2.py3-none-any.whl (341kB)
Collecting msal<2.0.0,>=1.6.0
  Downloading https://files.pythonhosted.org/packages/e6/69/83ffc3004a19140a3c5d7151d7f79c280ac1b40a425fe5308b879eefcf25/msal-1.10.0-py2.py3-none-any.whl (60kB)
Collecting azure-core<2.0.

## Preprocess the Input Images

The Microsoft Vision model expects images in BGR format, hence the channel swap at the end of preprocessing.

In [3]:
class Preprocess:
    def __init__(self):
        self.preprocess = transforms.Compose([
                                           transforms.Resize(224),
                                           transforms.CenterCrop(224),
                                           transforms.ToTensor(),
                                           transforms.Normalize(mean=[0.406, 0.456, 0.485], std=[0.225, 0.224, 0.229])])

    def __call__(self, x):
        return self.preprocess(x)[[2,1,0],:,:]
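
The channel swap in `__call__` relies on indexing the first (channel) dimension with a list of integers. A minimal sketch on a toy tensor, where the values 0, 1 and 2 stand in for the R, G and B planes (the tensor `rgb` is made up for illustration and is not part of the notebook):

```python
import torch

# Toy 3x2x2 "image": channel 0 is all 0s (R), channel 1 all 1s (G), channel 2 all 2s (B).
rgb = torch.stack([torch.full((2, 2), float(c)) for c in range(3)])

# Indexing the channel dimension with [2, 1, 0] reverses the channel order (RGB -> BGR).
bgr = rgb[[2, 1, 0], :, :]

print(bgr[:, 0, 0])  # tensor([2., 1., 0.])
```

The spatial dimensions are untouched; only the order of the three planes changes.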

Load the CIFAR-10 dataset, split into train and test sets. Any other image dataset can be substituted; the rest of the code stays the same as long as the number of classes matches the 10 outputs of the classifier defined later.

For example, `torchvision.datasets.ImageFolder` can be used instead. It is a generic data loader for images arranged in this way:

>root/dog/xxx.png<br>
>root/dog/xxy.png<br>
>root/dog/[...]/xxz.png<br>
>root/cat/123.png<br>
>root/cat/nsdf3.png<br>
>root/cat/[...]/asd932_.png<br>

In [4]:
train_dataset = CIFAR10('path', download=True, train=True, transform=Preprocess())
test_dataset = CIFAR10('path', download=True, train=False, transform=Preprocess())

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to path/cifar-10-python.tar.gz


Extracting path/cifar-10-python.tar.gz to path
Files already downloaded and verified


## Loading Microsoft Vision pretrained model

In [5]:
model = microsoftvision.models.resnet50(pretrained=True)

Loading Microsoft Vision pretrained model
Downloading model.


  0%|          | 0/23 [00:00<?, ?MB/s]

Model size: 89 MB


100%|██████████| 23/23 [00:36<00:00,  1.61s/MB]

Model saved to MicrosoftVision.ResNet50.tar





In [6]:
# Switch to inference mode and move the model to the GPU to speed up computation
model.eval()
model.cuda()

ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 
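
The cell above assumes a CUDA device is present. On a machine without a GPU, a device-agnostic variant may be more convenient. The sketch below uses a small `nn.Linear` named `backbone` as a hypothetical stand-in for the pretrained model, so it runs anywhere:

```python
import torch
import torch.nn as nn

# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# `backbone` is a hypothetical stand-in for the pretrained model loaded above.
backbone = nn.Linear(4, 2)
backbone.eval()      # inference mode: BatchNorm uses running statistics, Dropout is disabled
backbone.to(device)  # moves the parameters in place and returns the module

print(next(backbone.parameters()).device.type == device.type)  # True
```

Input tensors then need the matching `.to(device)` call instead of `.cuda()`.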

## Feature Extraction

In [8]:
def get_features(dataset, model):
    all_features = []
    all_labels = []

    with torch.no_grad():
        for images, labels in progressbar(DataLoader(dataset, batch_size=128, num_workers=8)):
            images = images.cuda()
            labels = labels.cuda()
            features = model(images)

            all_features.append(features)
            all_labels.append(labels)

    return torch.cat(all_features).cpu(), torch.cat(all_labels).cpu()
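
`get_features` runs the frozen model batch by batch and concatenates the per-batch outputs into a single tensor. The same pattern can be checked on the CPU with a toy setup; `toy_dataset` and `toy_model` below are hypothetical stand-ins (the real model appears to produce 2048 features per image, judging by the input size of the classifier defined later):

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins: 10 random "images" and a tiny model that emits 8 features per image.
toy_dataset = TensorDataset(torch.randn(10, 3, 4, 4), torch.arange(10))
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 4 * 4, 8)).eval()

features, labels = [], []
with torch.no_grad():  # no gradients are needed because the backbone is not trained
    for images, targets in DataLoader(toy_dataset, batch_size=4):
        features.append(toy_model(images))
        labels.append(targets)

# Batches of 4, 4 and 2 rows are concatenated back into one tensor per output.
features, labels = torch.cat(features), torch.cat(labels)
print(features.shape, labels.shape)  # torch.Size([10, 8]) torch.Size([10])
```

Because the loader does not shuffle, row `i` of `features` still corresponds to sample `i` of the dataset.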

In [9]:
train_features, train_labels = get_features(train_dataset, model)
test_features, test_labels = get_features(test_dataset, model)

100% (391 of 391) |######################| Elapsed Time: 0:02:30 Time:  0:02:30
100% (79 of 79) |########################| Elapsed Time: 0:00:31 Time:  0:00:31


In [100]:
# Prepare the training features

# Labels are stored as float here and cast back to torch.long inside the training loop
train_labels = train_labels.to(dtype=torch.float)
train = TensorDataset(train_features, train_labels)

# Create a data loader from the features
train_loader = DataLoader(train, batch_size=30, shuffle=True)

In [101]:
# Prepare the test features

test_labels = test_labels.to(dtype=torch.float)
# The features and labels are already float tensors, so no extra Tensor() wrapper is needed
test = TensorDataset(test_features, test_labels)

# Create a data loader from the features
test_loader = DataLoader(test, batch_size=10)
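
The batch sizes chosen here determine how many steps each pass takes: CIFAR-10 has 50,000 training and 10,000 test images, so batch sizes of 30 and 10 give 1667 and 1000 batches. A quick check with dummy tensors (the feature width of 4 is arbitrary and only keeps the example light):

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy tensors with CIFAR-10's split sizes; the feature width of 4 is arbitrary.
train_toy = TensorDataset(torch.zeros(50000, 4), torch.zeros(50000))
test_toy = TensorDataset(torch.zeros(10000, 4), torch.zeros(10000))

train_batches = len(DataLoader(train_toy, batch_size=30, shuffle=True))
test_batches = len(DataLoader(test_toy, batch_size=10))

# With drop_last left at its default (False), the last partial batch is kept.
print(train_batches, math.ceil(50000 / 30))  # 1667 1667
print(test_batches)                          # 1000
```

These counts match the totals shown by the progress bars during training and testing.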

## Custom Fully Connected Model

In [137]:
import torch.nn as nn
import torch.nn.functional as F

class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.fc = nn.Linear(2048, 1024)
        self.dropout_layer = nn.Dropout(p=0.5)
        self.out = nn.Linear(1024, 10)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.dropout_layer(self.relu(self.fc(x)))
        x = self.out(x)
        return x

# Instantiate the network and move it to the GPU
network = Network().cuda()
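
To sanity-check the classifier's input and output shapes without a GPU, the same layer sizes can be written with `nn.Sequential`. This is a sketch that should be equivalent in layout to `Network`, not a replacement for it; `head` and `batch` are illustrative names:

```python
import torch
import torch.nn as nn

# Same layer sizes as the Network class, written with nn.Sequential for brevity.
head = nn.Sequential(
    nn.Linear(2048, 1024),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(1024, 10),
)

batch = torch.randn(5, 2048)  # five made-up feature vectors
logits = head(batch)
n_params = sum(p.numel() for p in head.parameters())

print(logits.shape)  # torch.Size([5, 10])
print(n_params)      # 2108426
```

The parameter count is (2048 × 1024 + 1024) + (1024 × 10 + 10) = 2,108,426, which is small enough to train in seconds on the precomputed features.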

In [138]:

# Initialize the cross-entropy loss function
criterion = nn.CrossEntropyLoss()

# Use Adam as the optimizer
optimizer = optim.Adam(network.parameters(), lr=0.0001)
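
`nn.CrossEntropyLoss` takes raw, unnormalized scores (logits) and integer class indices of dtype `torch.long`, which is why the network has no softmax layer and why the training loop casts the labels to `torch.long`. A small sketch with made-up logits and targets shows that it matches log-softmax followed by the negative log-likelihood loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # raw scores for 4 samples and 10 classes
targets = torch.tensor([3, 0, 9, 1])  # class indices, dtype torch.long

loss = nn.CrossEntropyLoss()(logits, targets)

# The same value, computed as log-softmax followed by negative log-likelihood.
manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss, manual))  # True
```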



In [139]:
dataset_sizes = {'train':len(train_labels),'test':len(test_labels)}

In [140]:
device = 'cuda'
def train_model(model, criterion, optimizer, num_epochs=20):
    since = time.time()

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        model.train()  # Set model to training mode


        running_loss = 0.0
        running_corrects = 0

        # Iterate over data.
        for inputs, labels in progressbar(train_loader):
            inputs, labels = inputs.to(device), labels.to(device)
            labels = labels.to(dtype=torch.long)

            # zero the parameter gradients
            optimizer.zero_grad()

            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # statistics: loss.item() is the batch mean, so weight it by the batch size
            running_loss += loss.item() * inputs.size(0)
            running_corrects += torch.sum(preds == labels).item()

        epoch_loss = running_loss / dataset_sizes['train']
        epoch_acc = running_corrects / dataset_sizes['train']

        print(' Loss: {:.4f} Acc: {:.4f}'.format(
             epoch_loss, epoch_acc))



    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    return model
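
The accuracy bookkeeping inside `train_model` takes the index of the largest logit per row as the predicted class and counts how many predictions equal the labels. A minimal sketch with made-up logits for three samples and four classes:

```python
import torch

# Made-up logits for 3 samples and 4 classes, with their true labels.
outputs = torch.tensor([[0.1, 2.0, 0.3, 0.0],
                        [1.5, 0.2, 0.1, 0.4],
                        [0.0, 0.1, 0.2, 3.0]])
labels = torch.tensor([1, 2, 3])

# torch.max along dim 1 returns (values, indices); the indices are the predicted classes.
_, preds = torch.max(outputs, 1)
running_corrects = torch.sum(preds == labels).item()

print(preds.tolist())                  # [1, 0, 3]
print(running_corrects / len(labels))  # 0.6666666666666666
```

Here the second sample is misclassified (predicted 0, label 2), so two of three predictions are correct.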

In [141]:
model_ft = train_model(network, criterion, optimizer, num_epochs=5)

Epoch 0/4
----------
100% (1667 of 1667) |####################| Elapsed Time: 0:00:03 Time:  0:00:03
 Loss: 0.2625 Acc: 0.9115
Epoch 1/4
----------
100% (1667 of 1667) |####################| Elapsed Time: 0:00:03 Time:  0:00:03
 Loss: 0.1985 Acc: 0.9304
Epoch 2/4
----------
100% (1667 of 1667) |####################| Elapsed Time: 0:00:03 Time:  0:00:03
 Loss: 0.1784 Acc: 0.9378
Epoch 3/4
----------
100% (1667 of 1667) |####################| Elapsed Time: 0:00:03 Time:  0:00:03
 Loss: 0.1659 Acc: 0.9414
Epoch 4/4
----------
100% (1667 of 1667) |####################| Elapsed Time: 0:00:03 Time:  0:00:03
 Loss: 0.1543 Acc: 0.9459
Training complete in 0m 19s


## Testing the model on the test dataset

In [142]:
correct = 0
total = 0
model_ft.eval()  # switch Dropout to evaluation mode before measuring accuracy
with torch.no_grad():
    for images, labels in progressbar(test_loader):
        images, labels = images.to(device), labels.to(device)
        outputs = model_ft(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the network on the test images: %d %%' % (
    100 * correct / total))

100% (1000 of 1000) |####################| Elapsed Time: 0:00:00 Time:  0:00:00


Accuracy of the network on the test images: 93 %
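
`train_model` leaves the network in training mode (`model.train()`), and `nn.Dropout` behaves differently in training and evaluation mode, so the mode matters when measuring test accuracy. A minimal sketch of the difference on a vector of ones (the tensor `x` is illustrative):

```python
import torch
import torch.nn as nn

dropout = nn.Dropout(p=0.5)
x = torch.ones(1000)

dropout.train()
train_out = dropout(x)  # entries are randomly zeroed; survivors are scaled by 1 / (1 - p) = 2

dropout.eval()
eval_out = dropout(x)   # identity in evaluation mode

print(torch.equal(eval_out, x))  # True
print(sorted(train_out.unique().tolist()))
```

In training mode the output is random from call to call, so predictions made in that mode can vary between runs.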
