# Dataset extension

This notebook adds new examples to the original dataset in order to improve the performance of the model trained for the technical demo.

A new directory is created with a dataset that keeps the layout of the original and adds the new examples.

~~~
./kaggle-fisheries-ext-yolo/
  |- data/
  |   |- 00000.jpg
  |   |- 00000.txt
  |   |- ...
  |- obj.names
  |- train.txt
  |- test.txt
  |- kaggle-fisheries-ext-yolo4.cfg
  |- kaggle-fisheries-ext.data
~~~
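
In this layout every image in `data/` sits next to a label file with the same base name. A quick consistency check can catch images without annotations before training. The following is a minimal sketch; `find_unlabeled_images` is a hypothetical helper, not part of the original notebook.

```python
import glob
import os


def find_unlabeled_images(data_dir):
    """Return sorted .jpg paths in data_dir that lack a same-named .txt file."""
    missing = []
    for img_path in sorted(glob.glob(os.path.join(data_dir, "*.jpg"))):
        label_path = os.path.splitext(img_path)[0] + ".txt"
        if not os.path.isfile(label_path):
            missing.append(img_path)
    return missing
```

Calling it on the dataset's `data/` directory returns an empty list when every image has a label file.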

## Procedure

In [48]:
import glob
import os
import numpy as np
import cv2

### 1. Copy the original dataset into a new directory

In [9]:
#WORKSPACE_PATH = "/home/jovyan/work/" # docker
#WORKSPACE_PATH = "/notebooks/" # another docker
WORKSPACE_PATH = "../../" # pc-invap
!ls {WORKSPACE_PATH}

ai-fisheries.yml		 env.sh
ai-for-fisheries.code-workspace  kaggle.json
applications			 libcudart.so.10.1
assets				 model-development-and-training
biblio				 planning
build.sh			 README.md
conda				 reports
data				 run-jupyter-data-science.sh
data-preparation		 run-jupyter-local.sh
doc				 TestEnvironment.ipynb
docker-compose			 tmp
dockers				 videoanalytics
draft


In [13]:
# YOLO dataset (original)
WORKSPACE_DATA_PATH = WORKSPACE_PATH+"/data/"
ORIGINAL_KAGGLE_FISHERIES_DATASET_YOLO = WORKSPACE_DATA_PATH+"/datasets/kaggle-fisheries-yolo/"
!ls {ORIGINAL_KAGGLE_FISHERIES_DATASET_YOLO}

backup		   eval-tiny.sh			   logs       train.sh
copy-to-models.sh  kaggle-fisheries.data	   obj.names  train-tiny.sh
data		   kaggle-fisheries-yolo4.cfg	   README.md  train.txt
eval.sh		   kaggle-fisheries-yolo4tiny.cfg  test.txt


In [21]:
NEW_KAGGLE_FISHERIES_DATASET_YOLO = WORKSPACE_DATA_PATH+"/datasets/kaggle-fisheries-ext-yolo/"
!mkdir -pv {NEW_KAGGLE_FISHERIES_DATASET_YOLO}
!cp  {ORIGINAL_KAGGLE_FISHERIES_DATASET_YOLO}/kaggle-fisheries-yolo4.cfg \
     {NEW_KAGGLE_FISHERIES_DATASET_YOLO}/kaggle-fisheries-ext-yolo4.cfg
!cp {ORIGINAL_KAGGLE_FISHERIES_DATASET_YOLO}/kaggle-fisheries.data \
    {NEW_KAGGLE_FISHERIES_DATASET_YOLO}/kaggle-fisheries-ext.data
!cp {ORIGINAL_KAGGLE_FISHERIES_DATASET_YOLO}/obj.names \
    {NEW_KAGGLE_FISHERIES_DATASET_YOLO}
!cp -r {ORIGINAL_KAGGLE_FISHERIES_DATASET_YOLO}/data \
    {NEW_KAGGLE_FISHERIES_DATASET_YOLO}
!cp {ORIGINAL_KAGGLE_FISHERIES_DATASET_YOLO}/train.txt \
    {NEW_KAGGLE_FISHERIES_DATASET_YOLO} 
!cp {ORIGINAL_KAGGLE_FISHERIES_DATASET_YOLO}/test.txt \
    {NEW_KAGGLE_FISHERIES_DATASET_YOLO} 
!ls {NEW_KAGGLE_FISHERIES_DATASET_YOLO}

data			   kaggle-fisheries-ext-yolo4.cfg  obj.names  train.txt
kaggle-fisheries-ext.data  kaggle-fisheries-yolo4.cfg	   test.txt


In [70]:
!cat {NEW_KAGGLE_FISHERIES_DATASET_YOLO}/obj.names

alb
bet
dol
lag
shark
yft
other

### 2. Extract new examples from the videos

In [45]:
def train_test_split_frames(cap, dataset_sz=None, test_split=0.3):
    """ Split a random sample of a video's frame indices into train and test sets.
    cap: video (OpenCV VideoCapture)
    dataset_sz: maximum number of frames to sample (default: all frames)
    test_split: fraction of the sampled frames assigned to test
    """
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Never request more frames than the video has
    dataset_sz = total_frames if dataset_sz is None else min(dataset_sz, total_frames)
    # Sample without replacement so no frame is repeated or shared between sets;
    # the result is already in random order, so no extra shuffle is needed
    data = np.random.choice(total_frames, size=dataset_sz, replace=False)
    i0 = int(dataset_sz*(1.-test_split))
    train, test = data[:i0], data[i0:]
    return train, test

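def save_images(cap, prefix, dst_path, img_frames):
    """ Extract frames from a video and store them as JPG images.
    cap: video (OpenCV VideoCapture)
    prefix: identifier included in each output file name
    dst_path: destination directory
    img_frames: array of frame indices
    """
    os.makedirs(dst_path, exist_ok=True)  # cv2.imwrite does not create directories
    for x in img_frames:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(x))
        ret, frame = cap.read()
        if not ret:
            # Frame could not be decoded; skip it
            continue
        img_filename = dst_path+"/img_{}_{}.jpg".format(prefix, x)
        cv2.imwrite(img_filename, frame)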
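
When frame indices are drawn at random, drawing with replacement can pick the same frame more than once, and a repeated frame may end up in both train and test. Drawing without replacement avoids this. The sketch below compares both strategies on plain index arrays, with no video involved; the values of `total_frames` and `dataset_sz` and the helper `count_shared` are illustrative assumptions.

```python
import numpy as np


def count_shared(train, test):
    """Number of distinct frame indices that appear in both sets."""
    return len(set(train.tolist()) & set(test.tolist()))


rng = np.random.default_rng(0)  # fixed seed, for reproducibility only
total_frames, dataset_sz, test_split = 50, 20, 0.3  # illustrative values
i0 = int(dataset_sz * (1.0 - test_split))

# With replacement: the same frame index may be drawn more than once.
with_repl = rng.integers(total_frames, size=dataset_sz)

# Without replacement: every drawn index is unique, so the split is disjoint.
without_repl = rng.choice(total_frames, size=dataset_sz, replace=False)

train, test = without_repl[:i0], without_repl[i0:]
print("shared frames (without replacement):", count_shared(train, test))
print("unique frames (with replacement):", len(np.unique(with_repl)), "of", dataset_sz)
```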

In [67]:
!mkdir -pv {NEW_KAGGLE_FISHERIES_DATASET_YOLO}/new_data
!mkdir -pv {NEW_KAGGLE_FISHERIES_DATASET_YOLO}/new_data/train
!mkdir -pv {NEW_KAGGLE_FISHERIES_DATASET_YOLO}/new_data/test
!cp {ORIGINAL_KAGGLE_FISHERIES_DATASET_YOLO}/obj.names \
    {NEW_KAGGLE_FISHERIES_DATASET_YOLO}/new_data/train
!cp {ORIGINAL_KAGGLE_FISHERIES_DATASET_YOLO}/obj.names \
    {NEW_KAGGLE_FISHERIES_DATASET_YOLO}/new_data/test
!ls {NEW_KAGGLE_FISHERIES_DATASET_YOLO}/new_data

test  train


In [68]:
INPUT_VIDEOS_PATH= WORKSPACE_DATA_PATH+"/media/fishcount/input_videos/"
input_videos = [os.path.basename(x) for x in glob.glob(INPUT_VIDEOS_PATH+"*.mp4")]
input_videos

['14.mp4',
 '19.mp4',
 '001.mp4',
 '18.mp4',
 '002.mp4',
 '13.mp4',
 '15.mp4',
 '12.mp4',
 '008.mp4',
 '007.mp4',
 '17.mp4',
 '003.mp4',
 '004.mp4',
 '009.mp4',
 '006.mp4',
 '005.mp4',
 '16.mp4',
 '10.mp4',
 '11.mp4']

In [69]:
for video_filename in input_videos:
    cap = cv2.VideoCapture(INPUT_VIDEOS_PATH+video_filename)
    total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    #print("Total:", total_frames)
    dataset_total_frames = 20
    train,test =  train_test_split_frames(cap,dataset_sz = dataset_total_frames,test_split = 0.3)
    #print("Train:", train.shape[0])
    #print("Test:", test.shape[0])
    prefix=os.path.splitext(video_filename)[0]
    save_images(cap, prefix,NEW_KAGGLE_FISHERIES_DATASET_YOLO+"/new_data/train",train)
    save_images(cap, prefix,NEW_KAGGLE_FISHERIES_DATASET_YOLO+"/new_data/test",test)
    cap.release()  # free the video handle before opening the next file

Label the images (manual step).

Note: first annotate the 'aux' image with all the classes to avoid index errors.

~~~bash
labelImg
~~~
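
In the YOLO (Darknet) annotation format, each line of a label `.txt` file starts with a zero-based class index that refers to a line in `obj.names`. After labeling, a check for out-of-range indices can flag files written against a different class list. The following is a minimal sketch, assuming one `class x_center y_center width height` entry per line; `find_bad_class_ids` is a hypothetical helper, and skipping a `classes.txt` file in the label directory is an assumption about what the labeling tool leaves behind.

```python
import glob
import os


def find_bad_class_ids(labels_dir, names_path, skip=("classes.txt",)):
    """Return (file, line_no, class_id) for class indices outside obj.names."""
    with open(names_path) as fp:
        n_classes = len([ln for ln in fp if ln.strip()])
    bad = []
    for path in sorted(glob.glob(os.path.join(labels_dir, "*.txt"))):
        if os.path.basename(path) in skip:
            continue  # not an annotation file (assumption, see lead-in)
        with open(path) as fp:
            for line_no, line in enumerate(fp, start=1):
                fields = line.split()
                if not fields:
                    continue  # ignore blank lines
                class_id = int(fields[0])
                if not 0 <= class_id < n_classes:
                    bad.append((os.path.basename(path), line_no, class_id))
    return bad
```

An empty result means every annotation refers to a class that exists in the names file. It does not prove that the indices point at the intended class.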

In [99]:
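# Entries use ./data/<file name>, so the labeled images (and their .txt files)
# are assumed to have been copied into the dataset's data/ directory
all_new_samples = sorted("./data/"+os.path.basename(x) for x in glob.glob(NEW_KAGGLE_FISHERIES_DATASET_YOLO+"/new_data/train/*.jpg"))

# List containing only the new samples
with open(NEW_KAGGLE_FISHERIES_DATASET_YOLO+"/train.onlynew.txt","w") as fp:
    for x in all_new_samples:
        print(x,file=fp)

# List containing the original samples followed by the new ones;
# train.original.txt is assumed to be a copy of the original train.txt
!cp {NEW_KAGGLE_FISHERIES_DATASET_YOLO}/train.original.txt \
    {NEW_KAGGLE_FISHERIES_DATASET_YOLO}/train.oldandnew.txt
with open(NEW_KAGGLE_FISHERIES_DATASET_YOLO+"/train.oldandnew.txt","a") as fp:
    for x in all_new_samples:
        print(x,file=fp)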
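
A combined training list should contain each image path only once. The following is a minimal sketch that builds such a list in Python from several list files, keeping the order of first appearance and dropping repeated entries; `merge_lists` is a hypothetical helper, not part of the original notebook.

```python
def merge_lists(list_paths, out_path):
    """Concatenate list files into out_path, keeping first occurrences only."""
    seen = set()
    merged = []
    for path in list_paths:
        with open(path) as fp:
            for line in fp:
                entry = line.strip()
                if entry and entry not in seen:
                    seen.add(entry)
                    merged.append(entry)
    with open(out_path, "w") as fp:
        for entry in merged:
            print(entry, file=fp)
    return merged
```

It could be called with the original list and `train.onlynew.txt` as inputs and `train.oldandnew.txt` as output.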