## Step 01: Setup
Start by installing the experiment tracking library and setting up your free W&B account:

* **pip install wandb** – Install the W&B library
* **import wandb** – Import the wandb library
* **wandb login** – Log in to your W&B account so you can log all your metrics in one place

In [1]:
!pip install wandb -qU


In [2]:
import wandb
wandb.login()

ERROR:wandb.jupyter:Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.


[34m[1mwandb[0m: Appending key for api.wandb.ai to your netrc file: /root/.netrc


True
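
The `Failed to detect the name of this notebook` message above does not stop the login (the cell still returns `True`). As the message itself suggests, it can be avoided by setting the `WANDB_NOTEBOOK_NAME` environment variable before calling `wandb.login()`. A minimal sketch, where the file name `upload_raw_data.ipynb` is a hypothetical placeholder and not the real name of this notebook:

```python
import os

# Hypothetical notebook file name (an assumption); replace it with the real one.
# Set this before wandb.login() so W&B can associate the code with the run.
os.environ["WANDB_NOTEBOOK_NAME"] = "upload_raw_data.ipynb"
```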

## Step 02: Download the dataset file

In [9]:
!pip install -U --no-cache-dir gdown --pre

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting gdown
  Downloading gdown-4.5.4-py3-none-any.whl (14 kB)
Installing collected packages: gdown
  Attempting uninstall: gdown
    Found existing installation: gdown 4.4.0
    Uninstalling gdown-4.4.0:
      Successfully uninstalled gdown-4.4.0
Successfully installed gdown-4.5.4


In [10]:
!gdown --no-cookies https://drive.google.com/uc?id=1PPknAezTQrEQMss2SeDnZPRhH4wXGG49

Downloading...
From: https://drive.google.com/uc?id=1PPknAezTQrEQMss2SeDnZPRhH4wXGG49
To: /content/age_gender.csv
100% 200M/200M [00:01<00:00, 111MB/s] 


### Import packages

In [11]:
from imutils import paths
import os
import logging
import cv2
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

In [12]:
# configure logging
# reference for a logging obj
logger = logging.getLogger()

# set level of logging
logger.setLevel(logging.INFO)

# create handlers
c_handler = logging.StreamHandler()
c_format = logging.Formatter(fmt="%(asctime)s %(message)s", datefmt="%d-%m-%Y %H:%M:%S")
c_handler.setFormatter(c_format)

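# replace the default handler if one exists (as in Colab), otherwise add ours;
# indexing handlers[0] directly raises IndexError when the root logger has none
if logger.handlers:
    logger.handlers[0] = c_handler
else:
    logger.addHandler(c_handler)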

## Generating images

In [13]:
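# load the CSV; the 'pixels' column stores each image as space-separated values
df = pd.read_csv("age_gender.csv")
# parse each pixel string into a float32 vector
df['pixels'] = df['pixels'].map(lambda pixels: np.array(pixels.split(" "), dtype="float32"))
# stack the vectors and reshape to (n_images, 48, 48)
imagens = np.array(df['pixels'].to_list())
imagens = imagens.reshape(imagens.shape[0], 48, 48)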
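
To make the parsing step easier to follow, here is a small sketch of the same transformation on synthetic data. The `toy` frame and its 2x2 "images" are made up for illustration; the real file uses 48x48 images.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for age_gender.csv (an assumption): two 2x2 images.
toy = pd.DataFrame({"age": [5, 40], "pixels": ["0 64 128 255", "10 20 30 40"]})

# Same parsing as the cell above: split on spaces and cast to float32.
toy["pixels"] = toy["pixels"].map(lambda p: np.array(p.split(" "), dtype="float32"))

# Stack the per-row vectors and reshape to (n_images, height, width).
toy_images = np.array(toy["pixels"].to_list()).reshape(len(toy), 2, 2)
print(toy_images.shape)
```

Each row of the CSV becomes one 2-D array, so `toy_images[i]` can be written directly to an image file.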

In [14]:
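# one folder per age group: crianca (child), adolecente (teenager),
# adulto (adult), idoso (elderly)
# exist_ok=True lets the cell be re-run without raising FileExistsError
for grupo in ("crianca", "adolecente", "adulto", "idoso"):
  os.makedirs(os.path.join("imagens", grupo), exist_ok=True)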

In [15]:
# write each image into the folder that matches its age group
for index, imagem in enumerate(imagens):
  idade = int(df.iloc[index]['age'])
  if idade <= 11:             # child
    cv2.imwrite(f'imagens/crianca/pessoa{index}.jpeg', imagem)
  elif 12 <= idade <= 20:     # teenager
    cv2.imwrite(f'imagens/adolecente/pessoa{index}.jpeg', imagem)
  elif 21 <= idade <= 65:     # adult
    cv2.imwrite(f'imagens/adulto/pessoa{index}.jpeg', imagem)
  else:                       # elderly
    cv2.imwrite(f'imagens/idoso/pessoa{index}.jpeg', imagem)
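
The age thresholds in the loop above can also be isolated in a small function, which makes the boundaries easy to check. This is only a sketch: `age_bucket` is a hypothetical helper name that does not appear in the notebook, and the folder names are copied from the cells above (including the `adolecente` spelling).

```python
def age_bucket(idade: int) -> str:
    """Return the folder name the loop above uses for a given age."""
    if idade <= 11:
        return "crianca"      # child: up to 11
    if idade <= 20:
        return "adolecente"   # teenager: 12 to 20
    if idade <= 65:
        return "adulto"       # adult: 21 to 65
    return "idoso"            # elderly: 66 and over


# Ages on either side of each boundary.
for idade in (11, 12, 20, 21, 65, 66):
    print(idade, age_bucket(idade))
```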

## Step 03: Upload raw data

In [16]:
# since we are using a Jupyter Notebook, we replace the argument
# parsing code with hard-coded arguments and values
args = {
  "dataset": "imagens",
  "project_name": "classifier_age_gender",
  "artifact_name": "age_gender_raw_data"
}

In [17]:
imagePaths = list(paths.list_images(args["dataset"]))
print(imagePaths)

['imagens/crianca/pessoa955.jpeg', 'imagens/crianca/pessoa623.jpeg', 'imagens/crianca/pessoa162.jpeg', 'imagens/crianca/pessoa23535.jpeg', 'imagens/crianca/pessoa22734.jpeg', 'imagens/crianca/pessoa20473.jpeg', 'imagens/crianca/pessoa2848.jpeg', 'imagens/crianca/pessoa10696.jpeg', 'imagens/crianca/pessoa806.jpeg', 'imagens/crianca/pessoa1166.jpeg', 'imagens/crianca/pessoa1058.jpeg', 'imagens/crianca/pessoa21814.jpeg', 'imagens/crianca/pessoa15560.jpeg', 'imagens/crianca/pessoa712.jpeg', 'imagens/crianca/pessoa22800.jpeg', 'imagens/crianca/pessoa10642.jpeg', 'imagens/crianca/pessoa2721.jpeg', 'imagens/crianca/pessoa608.jpeg', 'imagens/crianca/pessoa23547.jpeg', 'imagens/crianca/pessoa15409.jpeg', 'imagens/crianca/pessoa137.jpeg', 'imagens/crianca/pessoa2946.jpeg', 'imagens/crianca/pessoa22786.jpeg', 'imagens/crianca/pessoa2709.jpeg', 'imagens/crianca/pessoa10747.jpeg', 'imagens/crianca/pessoa2762.jpeg', 'imagens/crianca/pessoa809.jpeg', 'imagens/crianca/pessoa2711.jpeg', 'imagens/crianc

In [18]:
run = wandb.init(entity="igordias", project=args["project_name"], job_type="fetch_data")
# give the run a readable name ("dados" means "data")
run.name = "dados"
# create an artifact for all the raw data
raw_data = wandb.Artifact(args["artifact_name"], type="raw_data")

# grab the list of images that we'll be describing
logger.info("[INFO] loading images...")
imagePaths = list(paths.list_images(args["dataset"]))

# add every image to the artifact, named "<class folder>/<file name>"
# so the class label survives in the artifact's directory layout
for img in imagePaths:
  label = img.split(os.path.sep)
  raw_data.add_file(img, name=os.path.join(label[-2], label[-1]))

# save artifact to W&B
run.log_artifact(raw_data)
run.finish()

[34m[1mwandb[0m: Currently logged in as: [33migordias[0m. Use [1m`wandb login --relogin`[0m to force relogin


26-11-2022 19:24:45 [INFO] loading images...
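
The `name=` argument passed to `add_file` above keeps only the last two path components, so each file is stored in the artifact under its class folder. A minimal sketch of that mapping, with no W&B calls involved; `artifact_entry_name` is a hypothetical helper introduced here for illustration:

```python
import os


def artifact_entry_name(path: str) -> str:
    """Keep only '<class folder>/<file name>' from a local image path."""
    parts = path.split(os.path.sep)
    return os.path.join(parts[-2], parts[-1])


# One of the paths printed earlier, rebuilt with the platform separator.
example = os.path.join("imagens", "crianca", "pessoa955.jpeg")
print(artifact_entry_name(example))
```

Dropping the leading `imagens` folder means the artifact's top level contains the four class folders directly.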
