# Universidade Federal do Rio Grande do Norte


## Programa de Pós-Graduação em Engenharia Elétrica e de Computação
## EEC1509 - Aprendizagem de Máquina (Machine Learning)


# Group

## João Lucas Correia Barbosa de Farias

## Júlio Freire Peixoto Gomes


# Project 2 - Traffic Sign Recognition


## About the Project
This project is divided into six files, including this one, each representing one step in the process of deploying a machine learning model. We chose a neural network as the classifier. The goal is to explore learning, generalization, and batch-normalization techniques and compare their results.

The dataset has more than 50,000 images of traffic signs. The task is to predict which sign a given image shows.


### Dataset Details

The German Traffic Sign Benchmark is a multi-class, single-image classification challenge held at the International Joint Conference on Neural Networks (IJCNN) 2011.

*   Single-image, multi-class classification problem
*   More than 40 classes
*   More than 50,000 images in total
*   Large, lifelike database

For more information, visit:

https://www.kaggle.com/datasets/meowmeowmeowmeowmeow/gtsrb-german-traffic-sign

Each class also has a shape ID, a color ID, and a sign ID, described as follows:

1.   Shape ID
  *   0: triangle
  *   1: circle
  *   2: diamond
  *   3: hexagon
  *   4: inverse-triangle
2.   Color ID
  *   0: red
  *   1: blue
  *   2: yellow
  *   3: white
3.   Sign ID
  *   float: value according to the Ukrainian Traffic Rule
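
The ID scheme above can be read as a pair of lookup tables. The sketch below is only an illustration, assuming shape IDs index the shapes and color IDs index the colors; the names `SHAPES`, `COLORS`, and `describe_sign` are hypothetical and do not come from the dataset or the project code.

```python
# Hypothetical lookup tables for the metadata IDs (names are illustrative).
SHAPES = {0: "triangle", 1: "circle", 2: "diamond", 3: "hexagon", 4: "inverse-triangle"}
COLORS = {0: "red", 1: "blue", 2: "yellow", 3: "white"}


def describe_sign(shape_id, color_id):
    """Return a readable description such as 'red circle'."""
    return f"{COLORS[color_id]} {SHAPES[shape_id]}"


print(describe_sign(shape_id=1, color_id=0))  # → red circle
```

An unknown ID raises a `KeyError`, which makes unexpected metadata values visible early instead of passing silently.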

## The dataset was taken from Kaggle:
https://www.kaggle.com/datasets/meowmeowmeowmeowmeow/gtsrb-german-traffic-sign

# 1.0 Install and Load Libraries


In [None]:
%%capture
# install wandb
!pip install wandb

In [None]:
%%capture
# install pytest
!pip install pytest pytest-sugar

In [None]:
import wandb

# 2.0 Data Check

After the preprocessing stage, we check that the data matches our expectations: dataset size, number of image channels, pixel value range, and label set.

## 2.1 Login to Weights & Biases

In [None]:
# login to wandb
!wandb login --relogin

[34m[1mwandb[0m: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize
[34m[1mwandb[0m: Paste an API key from your profile and hit enter, or press ctrl+c to quit: 
[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


## 2.2 Write a .py file to run pytest on

In [None]:
%%file test_data.py
import pytest
import wandb
import numpy as np
import h5py

# This is global so all tests are collected under the same run
run = wandb.init(project="traffic_sign_recognition", job_type="data_checks")

@pytest.fixture(scope="session")
def data():
    """Download the preprocessed train set and its labels; return [images, labels]."""
    train_local_path = run.use_artifact("ppgeec-ml-jj/traffic_sign_recognition/preprocessed_data_train.h5:latest").file()
    labels_local_path = run.use_artifact("ppgeec-ml-jj/traffic_sign_recognition/preprocessed_data_train_labels.csv:latest").file()

    # Each key in the HDF5 file holds one image
    image_data = []
    with h5py.File(train_local_path, 'r') as hf:
        for key in hf.keys():
            image_data.append(np.array(hf[key]))
    image_data = np.array(image_data)

    labels = np.loadtxt(labels_local_path, delimiter=',')

    return [image_data, labels]

def test_data_length(data):
    """
    Here we test if the train set has more than 5000 images
    """
    assert len(data[0]) > 5000


def test_size_of_imagem(data):
    """
    Here we test if all images have 3 channels (RGB).
    """

    for idx, img in enumerate(data[0]):
        assert img.shape[2] == 3, f'Image number {idx} has {img.shape[2]} channels, expected 3'

def test_column_ranges(data):
    """
    Here we test if all pixel values lie in the range [0, 255].
    """
    for idx, img in enumerate(data[0]):
        # Assert on the condition itself; a bare non-empty string is always truthy
        assert np.min(img) >= 0, f'Image number {idx} has values less than 0'
        assert np.max(img) <= 255, f'Image number {idx} has values greater than 255'

def test_num_labels(data):
    """
    Here we test if the labels cover exactly 43 classes, numbered 0 to 42.
    """
    labels = data[1]

    assert len(np.unique(labels)) == 43
    assert int(np.min(labels)) == 0
    assert int(np.max(labels)) == 42

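@pytest.fixture(scope="session", autouse=True)
def finish_run():
    """Close the W&B run after the last test instead of at import time."""
    yield
    run.finish()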

Overwriting test_data.py
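
A range check only protects the data when the `assert` evaluates a condition: asserting on a non-empty string always passes, because such a string is truthy. The sketch below shows one way to check pixel ranges image by image. It runs on small synthetic arrays, not on the project artifact, and `find_out_of_range` is a hypothetical helper name.

```python
import numpy as np


def find_out_of_range(images, low=0, high=255):
    """Return the indices of images with any pixel outside [low, high]."""
    bad = []
    for idx, img in enumerate(images):
        img = np.asarray(img)
        if img.min() < low or img.max() > high:
            bad.append(idx)
    return bad


# Synthetic stand-ins for real images; their sizes are allowed to differ.
good = np.zeros((4, 4, 3), dtype=np.int16)
too_bright = np.full((2, 2, 3), 300, dtype=np.int16)
negative = np.full((3, 3, 3), -1, dtype=np.int16)

print(find_out_of_range([good, too_bright, good, negative]))  # → [1, 3]
```

Inside a test, `assert find_out_of_range(images) == []` fails as soon as any image is out of range, and the returned indices identify the offending images.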


In [None]:
# running pytest
!pytest . -vv

Test session starts (platform: linux, Python 3.7.13, pytest 3.6.4, pytest-sugar 0.9.5)
cachedir: .pytest_cache
rootdir: /content, inifile:
plugins: typeguard-2.7.1, sugar-0.9.5

 test_data.py::test_data_length ✓                                 25% ██▌
 test_data.py::test_size_of_imagem ✓                              50% █████
 test_data.py::test_column_ranges ✓                               75% ███████▌
 test_data.py::test_num_labels ✓                                 100% ██████████
test_data.py::test_data_length
    image_data = np.array(image_data)


Results (12.43s