<a href="https://colab.research.google.com/github/natalialopezg/Monografia-EACD/blob/master/Metadata_extraction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<font size="2">Monograph - Especialización en Analítica y Ciencia de Datos - Universidad de Antioquia - 2024</font>

<font size="6"> Notebook: Metadata extraction </font>

<font size="3"> Natalia López Grisales</font>


<br />

# Abstract
This notebook loads the [CVC-ClinicDB](https://polyp.grand-challenge.org/CVCClinicDB/), [Kvasir-SEG](https://datasets.simula.no/kvasir-seg/), [CVC-300 (EndoScene)](http://adas.cvc.uab.es/endoscene) and [CVC-ColonDB](http://mv.cvc.uab.es/projects/colon-qa/cvc-colondb) datasets, then extracts and saves metadata for every image and mask: dataset, filename, height, width and array size. The extracted metadata is used to validate the data before further analysis.



  



  


# Importing libraries

In [1]:
# Data analysis and manipulation
import pandas as pd

# Computer vision
import cv2

# Progress meter for loops
from tqdm.notebook import tqdm

# Miscellaneous operating system interfaces
import os

# Metadata generation

## Reading images

In [17]:
# Parameters
directory = '/content/drive/MyDrive/Classroom/Monografia/datasets'
folders = ["kvasir-seg",
           "cvc-clinicdb",
           "cvc-colondb",
           "cvc-300"]
output_folder = '/content/drive/MyDrive/Classroom/Monografia'
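
The extraction loop expects every dataset folder to contain an `images` and a `masks` subfolder. A minimal sketch of a pre-flight check under that assumption follows; `find_missing_subfolders` is a hypothetical helper and is not part of the original notebook.

```python
import os

def find_missing_subfolders(directory, folders, subfolders=("images", "masks")):
    """Return the expected dataset subfolders that do not exist on disk.

    Hypothetical helper: assumes the layout <directory>/<folder>/<subfolder>.
    """
    missing = []
    for folder in folders:
        for sub in subfolders:
            path = os.path.join(directory, folder, sub)
            if not os.path.isdir(path):
                missing.append(path)
    return missing
```

Calling `find_missing_subfolders(directory, folders)` with the parameters above returns an empty list when every expected subfolder is present, so any returned path points to a dataset that still needs to be copied or renamed.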

Read and extract metadata from images and masks:

In [18]:
# Read images metadata from repository
for folder in tqdm(folders, desc="Datasets read", position=0, colour='DarkTurquoise'):
  images_metadata = []
  masks_metadata = []

  # Reading images
  images_path = f"{directory}/{folder}/images"
  images_list = os.listdir(images_path)
  for image_filename in tqdm(images_list, \
                           desc=f"  Images read from {folder}", \
                           position=1, colour='SlateBlue'):
    image_path = f"{images_path}/{image_filename}"
    image = cv2.imread(image_path, 1)
    images_metadata.append({'dataset':folder,
                            'filename':image_filename,
                            'height':image.shape[0],
                            'width':image.shape[1],
                            'size':image.size})
  # Reading masks
  masks_path = f"{directory}/{folder}/masks"
  masks_list = os.listdir(masks_path)
  for mask_filename in tqdm(masks_list, \
                           desc=f"  Masks read from {folder}", \
                           position=1, colour='DarkSlateBlue'):
    mask_path = f"{masks_path}/{mask_filename}"
    mask = cv2.imread(mask_path, 0)  # 0 = grayscale; read the mask file, not the last image
    masks_metadata.append({'dataset':folder,
                            'filename':mask_filename,
                            'height':mask.shape[0],
                            'width':mask.shape[1],
                            'size':mask.size})

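  # Make sure the per-dataset output folder exists before writing
  os.makedirs(f"{output_folder}/metadata/{folder}", exist_ok=True)
  pd.DataFrame(images_metadata).to_csv(f"{output_folder}/metadata/{folder}/images_metadata.txt", sep=';', index=False)
  pd.DataFrame(masks_metadata).to_csv(f"{output_folder}/metadata/{folder}/masks_metadata.txt", sep=';', index=False)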

print("\nImages and masks metadata saved successfully!")

Datasets read:   0%|          | 0/4 [00:00<?, ?it/s]

  Images read from kvasir-seg:   0%|          | 0/1000 [00:00<?, ?it/s]

  Masks read from kvasir-seg:   0%|          | 0/1000 [00:00<?, ?it/s]

  Images read from cvc-clinicdb:   0%|          | 0/612 [00:00<?, ?it/s]

  Masks read from cvc-clinicdb:   0%|          | 0/612 [00:00<?, ?it/s]

  Images read from cvc-colondb:   0%|          | 0/380 [00:00<?, ?it/s]

  Masks read from cvc-colondb:   0%|          | 0/380 [00:00<?, ?it/s]

  Images read from cvc-300:   0%|          | 0/60 [00:00<?, ?it/s]

  Masks read from cvc-300:   0%|          | 0/60 [00:00<?, ?it/s]


Images and masks metadata saved successfully!
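
Because the metadata is meant to validate the data, one natural follow-up is to confirm that every image has a mask with the same height and width. The sketch below shows one way that check could look and is not part of the original notebook. `find_mismatches` is a hypothetical helper, and it assumes that an image and its mask share the same filename without extension, which should be verified for each dataset. The toy DataFrames stand in for the saved files, which can be loaded with `pd.read_csv(path, sep=';')`.

```python
import os
import pandas as pd

def find_mismatches(images_df, masks_df):
    """Return rows where an image lacks a mask (or vice versa) or the shapes differ."""
    images = images_df.copy()
    masks = masks_df.copy()
    # Pair files by name without extension (assumed naming convention)
    images["stem"] = images["filename"].map(lambda f: os.path.splitext(f)[0])
    masks["stem"] = masks["filename"].map(lambda f: os.path.splitext(f)[0])
    merged = images.merge(masks, on=["dataset", "stem"], how="outer",
                          suffixes=("_image", "_mask"), indicator=True)
    same_shape = ((merged["height_image"] == merged["height_mask"]) &
                  (merged["width_image"] == merged["width_mask"]))
    return merged[(merged["_merge"] != "both") | ~same_shape]

# Toy metadata with the same columns the notebook writes.
# 'size' is the number of array elements: height * width * 3 for a
# colour image, height * width for a grayscale mask.
images_df = pd.DataFrame([
    {"dataset": "demo", "filename": "1.png", "height": 288, "width": 384, "size": 331776},
    {"dataset": "demo", "filename": "2.png", "height": 288, "width": 384, "size": 331776},
])
masks_df = pd.DataFrame([
    {"dataset": "demo", "filename": "1.png", "height": 288, "width": 384, "size": 110592},
    {"dataset": "demo", "filename": "2.png", "height": 500, "width": 574, "size": 287000},
])
mismatches = find_mismatches(images_df, masks_df)
```

In this toy example only the pair with stem `2` is reported, since its mask dimensions differ from the image. An empty result would indicate that all images and masks pair up with matching shapes.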


# References

1. Bernal, J., Sánchez, F. J., Fernández-Esparrach, G., Gil, D., Rodríguez, C., & Vilariño, F. (2015). WM-DOVA maps for accurate polyp highlighting in colonoscopy: Validation vs. saliency maps from physicians. Computerized Medical Imaging and Graphics, 43, 99-111. [CVC-ClinicDB](https://polyp.grand-challenge.org/CVCClinicDB/)


2. Jha, D., Smedsrud, P. H., Riegler, M. A., Halvorsen, P., de Lange, T., Johansen, D., & Johansen, H. D. (2020). Kvasir-SEG: A segmented polyp dataset. In International Conference on Multimedia Modeling (pp. 451-462). Springer. [Kvasir-SEG](https://datasets.simula.no/kvasir-seg/)
  

3. Sánchez, F. J., Bernal, J., Sánchez-Montes, C., de Miguel, C. R., & Fernández-Esparrach, G. (2017). Bright spot regions segmentation and classification for specular highlights detection in colonoscopy videos. Machine Vision and Applications, 28(8), 917-936. [CVC-300 (EndoScene)](http://adas.cvc.uab.es/endoscene)

  
4. Bernal, J., Sánchez, J., & Vilariño, F. (2012). Towards automatic polyp detection with a polyp appearance model. Pattern Recognition, 45(9), 3166-3182. [CVC-ColonDB](http://mv.cvc.uab.es/projects/colon-qa/cvc-colondb)
