# **Cats and Dogs Classify**

***

## Environment setup

### Creating the *data/* folder

This folder will hold the entire dataset.

In [85]:
!test ! -d data && mkdir data

### Downloading the compressed data file

If the data has already been downloaded into the *data/* folder, it is not downloaded again.

In [69]:
!test ! -d data/images && ! -f data/images.tar.gz && wget -P data https://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz

### Extracting the compressed data into the *data/images/* folder

The extraction only runs if the compressed data file exists inside the *data/* folder.

In [70]:
!test -f data/images.tar.gz && tar xf data/images.tar.gz -C data

### Removing the compressed data file from the *data/* folder

In [84]:
!test -f data/images.tar.gz && rm data/images.tar.gz
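
The shell cells above can also be expressed in Python, which avoids depending on `wget` and `tar` being installed. The following is a minimal sketch rather than the notebook's own method: the function name `prepare_data` and its `fetch` parameter are assumptions introduced here for illustration. It mirrors the same conditions: download only when neither the images folder nor the archive exists, extract only when the archive exists, then delete the archive.

```python
import tarfile
import urllib.request
from pathlib import Path

# Same URL used by the wget cell above
DATA_URL = "https://www.robots.ox.ac.uk/~vgg/data/pets/data/images.tar.gz"


def prepare_data(data_dir="data", url=DATA_URL, fetch=urllib.request.urlretrieve):
    """Create data_dir, download the archive if needed, extract it, remove it.

    `fetch` is a hypothetical hook (url, filename) -> None so the download
    step can be replaced, e.g. in tests.
    """
    data_dir = Path(data_dir)
    archive = data_dir / "images.tar.gz"
    images_dir = data_dir / "images"

    # Equivalent of: test ! -d data && mkdir data
    data_dir.mkdir(parents=True, exist_ok=True)

    # Download only if neither the extracted folder nor the archive is present
    if not images_dir.is_dir() and not archive.is_file():
        fetch(url, str(archive))

    # Extract and clean up only if the archive is present
    if archive.is_file():
        with tarfile.open(archive) as tar:
            tar.extractall(data_dir)
        archive.unlink()

    return images_dir
```

Calling `prepare_data()` with no arguments would reproduce the four cells in one step; calling it a second time does nothing, because `data/images` already exists.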

### Creating and activating the virtual environment (*venv*) and installing the required Python libraries

If the virtual environment already exists, it is only activated.

In [72]:
%%bash

venv_name="cats_and_dogs"

if [ -d "$venv_name" ]; then
    # The environment already exists: just activate it
    echo "The environment $venv_name already exists, activating."

    source "$venv_name"/bin/activate
else
    # First run: create the environment and install the dependencies
    pip install virtualenv

    virtualenv -p python3.10 "$venv_name"

    chmod +x "$venv_name"/bin/activate

    source "$venv_name"/bin/activate

    pip install -r requirements.txt
fi

The environment cats_and_dogs already exists, activating.
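
One caveat: a `%%bash` cell runs in its own subprocess, so the `source .../activate` call above only affects that subprocess and not the Python kernel that executes the later cells. A quick way to see which interpreter the kernel is really using is sketched below. The helper name `in_virtualenv` is an assumption for illustration, and the check relies on `sys.prefix` differing from `sys.base_prefix` inside an environment created by `venv` or a recent `virtualenv`.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    return sys.prefix != sys.base_prefix


# Path of the interpreter that is actually running this cell
print("interpreter:", sys.executable)
print("inside a virtual environment:", in_virtualenv())
```

If this reports `False`, the libraries imported later come from whatever environment the notebook server was started in, not from `cats_and_dogs`.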


### Libraries used

In [73]:
import os
import glob
import random
import numpy as np
import pandas as pd
import tensorflow as tf

***

## Creating the *dataset*

### Setting the *seed*

In [74]:
seed_value = 42

os.environ['PYTHONHASHSEED'] = str(seed_value)

random.seed(seed_value)

np.random.seed(seed_value)
    
tf.random.set_seed(seed_value)
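
Fixing the seed means every run draws the same pseudo-random sequence. The sketch below illustrates that property with the standard-library `random` module only, so it runs without NumPy or TensorFlow; `draw_sample` is a hypothetical helper and not part of the notebook.

```python
import random


def draw_sample(seed_value, n=5):
    """Reseed the generator, then draw n integers in [0, 99]."""
    random.seed(seed_value)
    return [random.randint(0, 99) for _ in range(n)]


first = draw_sample(42)
second = draw_sample(42)

# Reseeding with the same value replays the same sequence
print(first == second)  # True
```

The same idea applies to `np.random.seed` and `tf.random.set_seed` in the cell above: each library keeps its own generator, which is why each one is seeded separately.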

### Extracting the path of each image

In [75]:
images_path_list = glob.glob(os.path.join('data/images','*.jpg'))

images_path_list

['data/images/wheaten_terrier_192.jpg',
 'data/images/saint_bernard_135.jpg',
 'data/images/boxer_128.jpg',
 'data/images/keeshond_38.jpg',
 'data/images/pomeranian_180.jpg',
 'data/images/Siamese_14.jpg',
 'data/images/British_Shorthair_101.jpg',
 'data/images/japanese_chin_197.jpg',
 'data/images/Birman_19.jpg',
 'data/images/american_bulldog_56.jpg',
 'data/images/leonberger_51.jpg',
 'data/images/Siamese_130.jpg',
 'data/images/english_cocker_spaniel_5.jpg',
 'data/images/Abyssinian_176.jpg',
 'data/images/Siamese_254.jpg',
 'data/images/wheaten_terrier_150.jpg',
 'data/images/Sphynx_61.jpg',
 'data/images/great_pyrenees_79.jpg',
 'data/images/miniature_pinscher_165.jpg',
 'data/images/american_bulldog_102.jpg',
 'data/images/Ragdoll_5.jpg',
 'data/images/shiba_inu_10.jpg',
 'data/images/British_Shorthair_148.jpg',
 'data/images/pomeranian_142.jpg',
 'data/images/english_setter_46.jpg',
 'data/images/boxer_92.jpg',
 'data/images/British_Shorthair_90.jpg',
 'data/images/pug_135.jpg',
 ...]
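
`glob.glob` returns paths in whatever order the file system lists them, which is why the output above is not alphabetical and may differ between machines. If a stable order matters (for instance, so that a seeded shuffle produces the same result everywhere), sorting the result is one option. A minimal sketch, with `list_images` as a hypothetical helper name:

```python
import glob
import os


def list_images(images_dir="data/images", pattern="*.jpg"):
    """Return the matching image paths in a deterministic (sorted) order."""
    return sorted(glob.glob(os.path.join(images_dir, pattern)))
```

With this helper, `images_path_list = list_images()` would replace the direct `glob.glob` call.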

### Extracting the cat and dog breed names

In [76]:
images_path = [image_path.split('/')[-1] for image_path in images_path_list]

images_name_list = [image_name[:image_name.rfind('_')] for image_name in images_path]

images_name_list

['wheaten_terrier',
 'saint_bernard',
 'boxer',
 'keeshond',
 'pomeranian',
 'Siamese',
 'British_Shorthair',
 'japanese_chin',
 'Birman',
 'american_bulldog',
 'leonberger',
 'Siamese',
 'english_cocker_spaniel',
 'Abyssinian',
 'Siamese',
 'wheaten_terrier',
 'Sphynx',
 'great_pyrenees',
 'miniature_pinscher',
 'american_bulldog',
 'Ragdoll',
 'shiba_inu',
 'British_Shorthair',
 'pomeranian',
 'english_setter',
 'boxer',
 'British_Shorthair',
 'pug',
 'scottish_terrier',
 'german_shorthaired',
 'basset_hound',
 'boxer',
 'boxer',
 'samoyed',
 'saint_bernard',
 'yorkshire_terrier',
 'newfoundland',
 'wheaten_terrier',
 'Sphynx',
 'german_shorthaired',
 'Sphynx',
 'Russian_Blue',
 'Bombay',
 'basset_hound',
 'shiba_inu',
 'Maine_Coon',
 'saint_bernard',
 'basset_hound',
 'scottish_terrier',
 'Egyptian_Mau',
 'leonberger',
 'english_setter',
 'boxer',
 'samoyed',
 'american_bulldog',
 'american_bulldog',
 'Sphynx',
 'english_setter',
 'english_setter',
 'japanese_chin',
 'chihuahua',
 ...]
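
The two list comprehensions above strip the folder, then cut the file name at its last underscore. The same steps can be wrapped in a single function that also handles other path separators. This is a sketch under the assumption that every file is named `<breed>_<number>.jpg`, as in the outputs shown; `breed_from_path` is a hypothetical name.

```python
import os


def breed_from_path(image_path):
    """Return the breed part of a path such as 'data/images/pug_135.jpg'."""
    file_name = os.path.basename(image_path)       # 'pug_135.jpg'
    stem, _extension = os.path.splitext(file_name)  # 'pug_135'
    # Split once, from the right, so multi-word breeds keep their underscores
    return stem.rsplit('_', 1)[0]


print(breed_from_path('data/images/British_Shorthair_101.jpg'))  # British_Shorthair
print(breed_from_path('data/images/pug_135.jpg'))                # pug
```

Splitting from the right matters because breed names such as `British_Shorthair` contain underscores themselves.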

Note that cat breed names start with an **uppercase** letter, while dog breed names are entirely **lowercase**.

In [77]:
[image_name for image_name in images_name_list if image_name[0].isupper()][:10]

['Siamese',
 'British_Shorthair',
 'Birman',
 'Siamese',
 'Abyssinian',
 'Siamese',
 'Sphynx',
 'Ragdoll',
 'British_Shorthair',
 'British_Shorthair']

In [78]:
[image_name for image_name in images_name_list if not image_name[0].isupper()][:10]

['wheaten_terrier',
 'saint_bernard',
 'boxer',
 'keeshond',
 'pomeranian',
 'japanese_chin',
 'american_bulldog',
 'leonberger',
 'english_cocker_spaniel',
 'wheaten_terrier']

### Defining a numeric identifier for each **class**

* 1 for cat
* 0 for dog

In [79]:
species_id_list = [1 if image_name[0].isupper() else 0 for image_name in images_name_list]

species_id_list

[0,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 1,
 0,
 0,
 1,
 0,
 1,
 1,
 0,
 1,
 0,
 0,
 0,
 1,
 0,
 1,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 1,
 1,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 1,
 1,
 1,
 1,
 0,
 0,
 0,
 1,
 0,
 1,
 1,
 0,
 1,
 0,
 1,
 1,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 1,
 0,
 1,
 0,
 0,
 0,
 1,
 0,
 1,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 1,
 1,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 0,
 0,
 0,
 1,
 0,
 1,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 1,
 1,
 0,
 0,
 0,
 1,
 0,
 1,
 0,
 1,
 0,
 1,
 1,
 1,
 0,
 0,
 1,
 1,
 0,
 1,
 1,
 0,
 1,
 0,
 0,
 1,
 0,
 1,
 0,
 1,
 0,
 0,
 1,
 1,
 0,
 0,
 1,
 1,
 0,
 1,
 1,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 1,
 0,
 0,
 0,
 0,
 0,
 1,
 1,
 1,
 0,
 0,
 0,
 1,
 0,
 0,
 0,
 0,
 0,
 0,
 1,
 0,
 0,
 1,
 1,
 0,
 ...]
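
The labeling rule can be made explicit with named constants and a small function. This is only a sketch: `CAT`, `DOG`, and `species_id` are names introduced here, and the rule depends entirely on this dataset's file-naming convention (capitalized cat breeds, lowercase dog breeds).

```python
CAT, DOG = 1, 0


def species_id(breed_name):
    """Map a breed name to its class id using the capitalization convention."""
    return CAT if breed_name[0].isupper() else DOG


breeds = ['wheaten_terrier', 'Siamese', 'British_Shorthair', 'pug']
print([species_id(breed) for breed in breeds])  # [0, 1, 1, 0]
```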


### Creating the data *DataFrame*

In [80]:
dataset = pd.DataFrame({'image_path': images_path_list, 'name': images_name_list, 'specie': species_id_list})

dataset

| | image_path | name | specie |
|---|---|---|---|
| 0 | data/images/wheaten_terrier_192.jpg | wheaten_terrier | 0 |
| 1 | data/images/saint_bernard_135.jpg | saint_bernard | 0 |
| 2 | data/images/boxer_128.jpg | boxer | 0 |
| 3 | data/images/keeshond_38.jpg | keeshond | 0 |
| 4 | data/images/pomeranian_180.jpg | pomeranian | 0 |
| ... | ... | ... | ... |
| 7385 | data/images/samoyed_91.jpg | samoyed | 0 |
| 7386 | data/images/Siamese_64.jpg | Siamese | 1 |
| 7387 | data/images/boxer_96.jpg | boxer | 0 |
| 7388 | data/images/Siamese_128.jpg | Siamese | 1 |


***

## Evaluating the data

Check the size of each class to detect any imbalance. To do so, a *set* containing every breed from both classes is created.

In [81]:
classes_set = set(images_name_list)

classes_set

{'Abyssinian',
 'Bengal',
 'Birman',
 'Bombay',
 'British_Shorthair',
 'Egyptian_Mau',
 'Maine_Coon',
 'Persian',
 'Ragdoll',
 'Russian_Blue',
 'Siamese',
 'Sphynx',
 'american_bulldog',
 'american_pit_bull_terrier',
 'basset_hound',
 'beagle',
 'boxer',
 'chihuahua',
 'english_cocker_spaniel',
 'english_setter',
 'german_shorthaired',
 'great_pyrenees',
 'havanese',
 'japanese_chin',
 'keeshond',
 'leonberger',
 'miniature_pinscher',
 'newfoundland',
 'pomeranian',
 'pug',
 'saint_bernard',
 'samoyed',
 'scottish_terrier',
 'shiba_inu',
 'staffordshire_bull_terrier',
 'wheaten_terrier',
 'yorkshire_terrier'}

### Number of breeds in the cat class

In [82]:
cats_set = {cat for cat in classes_set if cat[0].isupper()}

len(cats_set)

12

### Number of breeds in the dog class

In [83]:
dogs_set = {dog for dog in classes_set if not dog[0].isupper()}

len(dogs_set)

25
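
The two counts above compare the number of breeds (12 cat breeds versus 25 dog breeds), not the number of images. Counting the labels themselves gives a more direct view of class imbalance. The sketch below uses `collections.Counter`; `class_counts` is a hypothetical helper, and `sample_labels` is just the first ten entries of the `species_id_list` output shown earlier, used as toy input.

```python
from collections import Counter


def class_counts(labels):
    """Return a {label: number_of_items} dictionary."""
    return dict(Counter(labels))


# First ten labels from the species_id_list output (toy input)
sample_labels = [0, 0, 0, 0, 0, 1, 1, 0, 1, 0]

counts = class_counts(sample_labels)
total = sum(counts.values())

for label, count in sorted(counts.items()):
    print(label, count, f"{count / total:.0%}")
# 0 7 70%
# 1 3 30%
```

On the real data, passing `species_id_list` instead of `sample_labels` would report the image share of each class.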