# Interface to label images

## What has been done before

To use this script, all images to be labeled were first transferred into one folder ("image_folder"). Each image has a unique filename.

A table was created that lists all image names in one column ("filename") and experimental data in the others. A column "category" was added and filled with 0 (= unrated).

This dataframe was exported as a ";"-separated CSV file named "image_df_initial.csv" for use here.
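If the initial table has to be rebuilt, the filename and category columns can be generated from the folder. The sketch below is a minimal version under stated assumptions. The function name `build_initial_table` is hypothetical, and the extension set mirrors the image lookup used further below. The experimental-data columns come from elsewhere and are not created here.

```python
import os

import pandas as pd

# extensions treated as images (assumed to match the notebook's lookup table)
IMAGE_EXTENSIONS = {"jpg", "jpeg", "png", "gif"}


def build_initial_table(image_folder, out_csv="image_df_initial.csv"):
    """Hypothetical helper: one row per image, category 0 (= unrated).

    Experimental data columns would be merged in separately; they are not
    part of this sketch.
    """
    filenames = sorted(
        f for f in os.listdir(image_folder)
        if f.rsplit(".", 1)[-1].lower() in IMAGE_EXTENSIONS
    )
    df = pd.DataFrame({"filename": filenames, "category": 0})
    # ";"-separated; the index is written as the first column,
    # so it can be read back with index_col=0
    df.to_csv(out_csv, sep=";")
    return df
```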

## Goals

- Write a script that accesses the folder and pulls a random sample from the images whose category is 0
- Script: present an image, ask for input, refresh
- Add all new ratings to a dataframe and merge it with the original dataframe
- Create an exclusion list of already categorized images
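The first goal can also be met by sampling directly from the table instead of drawing files from the folder and skipping rated ones. The sketch below assumes that every image in the folder has a row in the table. It samples without replacement, so no image appears twice in one session. `sample_unrated` is a hypothetical name; `Series.sample` is standard pandas.

```python
import pandas as pd


def sample_unrated(image_df, n, seed=None):
    """Hypothetical helper: up to n distinct filenames with category 0."""
    unrated = image_df.loc[image_df["category"] == 0, "filename"]
    # without replacement: never more draws than there are unrated images
    k = min(n, len(unrated))
    return unrated.sample(n=k, random_state=seed).to_list()
```

The rating loop could then iterate over `sample_unrated(image_df, 50)` instead of calling `random_file` repeatedly.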

## Imports

```python
import os
import random
from os import listdir  # to access the folder
from random import choice  # to get a random image

import numpy as np
import pandas as pd

# display images inline; clear user input and refresh the image
from IPython.display import Image, clear_output, display
```

## Image-categorizer

```python
# show the current working directory (IPython shell command)
!pwd
```

### Displaying random images from folder

```python
# directory to sample from (note: this name shadows Python's built-in dir())
dir = "/path/image_folder/"
```

```python
# display a random image from the folder

# files with any of these extensions are treated as images
ext2conttype = {"jpg": "image/jpeg",
                "jpeg": "image/jpeg",
                "png": "image/png",
                "gif": "image/gif"}

def content_type(filename):
    """return the MIME type for the filename's extension"""
    return ext2conttype[filename[filename.rfind(".")+1:].lower()]

def isimage(filename):
    """true if the filename's extension is in the content-type lookup"""
    filename = filename.lower()
    return filename[filename.rfind(".")+1:] in ext2conttype

def random_file(dir):
    """return the filename of a randomly chosen image in dir"""
    images = [f for f in listdir(dir) if isimage(f)]
    return choice(images)

# quick test: show one random image
r = random_file(dir)
print(r)
display(Image(filename=os.path.join(dir, r)))
```

# Functional categorizer

## Key

Any number of categories are possible, as long as they can be reliably distinguished.

        0: unrated
        1: category 1
        2: category 2
        3: category 3
        4: category 4
        5: category 5
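`int(input())` accepts any integer, including typos such as 44 and the value 0, which means "unrated". The minimal sketch below checks each answer against the key before it is stored. `CATEGORY_KEY` and `parse_rating` are hypothetical names, and the labels are placeholders to be replaced with the real category names.

```python
# hypothetical mapping mirroring the key above; 0 (= unrated) is not a valid answer
CATEGORY_KEY = {1: "category 1", 2: "category 2", 3: "category 3",
                4: "category 4", 5: "category 5"}


def parse_rating(text, key=CATEGORY_KEY):
    """Return the category as int if text is a key in `key`, else None."""
    text = text.strip()
    if not text.isdigit():  # rejects empty input, letters and negative numbers
        return None
    value = int(text)
    return value if value in key else None
```

In the rating loop, `input()` can be called repeatedly until `parse_rating` returns something other than `None`.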

## Get data

```python
# first run: read "image_df_initial.csv"; later runs: read the most recent
# table written by the save step below
image_df = pd.read_csv("image_df_classification.csv", sep=";", index_col=0)
image_df.head()
```

## Data integrity and exclusion/selection list

```python
# list of images already rated (category 1 or higher)
exclusion_df = image_df[image_df["category"] >= 1]
exclusion_list = exclusion_df["filename"].to_list()
len(exclusion_list)
```

```python
# count missing values per column
image_df.isna().sum()
```
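Beyond missing values, duplicate filenames and mismatches between table and folder also break the workflow. Duplicates duplicate rows in the left join. Images that exist only in the folder get rated, but the left join drops their ratings. A minimal check is sketched below. `integrity_report` and the extension set are hypothetical and assumed to match the notebook's image lookup.

```python
import os

import pandas as pd

# assumed to match the extensions in ext2conttype
IMAGE_EXTENSIONS = {"jpg", "jpeg", "png", "gif"}


def integrity_report(image_df, image_folder):
    """Hypothetical helper: list problems that affect sampling or merging."""
    files_in_folder = {
        f for f in os.listdir(image_folder)
        if f.rsplit(".", 1)[-1].lower() in IMAGE_EXTENSIONS
    }
    filenames = image_df["filename"]
    return {
        # repeated names would duplicate rows in the left join
        "duplicate_filenames": filenames[filenames.duplicated()].to_list(),
        # rows whose image file is not in the folder can never be shown
        "missing_from_folder": sorted(set(filenames) - files_in_folder),
        # images without a row would be rated, but the left join drops them
        "missing_from_table": sorted(files_in_folder - set(filenames)),
    }
```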

## Categorizing

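```python
# show up to 50 random images; skipped draws also count toward the 50
img = []  # filenames rated in this session
cat = []  # corresponding categories

for i in range(50):
    clear_output()
    r = random_file(dir)

    # skip images rated in earlier sessions and images already rated in
    # this session (duplicate filenames would duplicate rows in the merge)
    if r in exclusion_list or r in img:
        continue

    print(r)
    display(Image(filename=os.path.join(dir, r)))

    # ask again until a whole number is entered instead of raising an error
    answer = input()
    while not answer.strip().isdigit():
        answer = input("please enter a category number: ")
    cat.append(int(answer))  # stored as int so it can later be corrected easily
    img.append(r)
```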

```python
# create a dataframe from the img and cat lists
rating_df = pd.DataFrame({"filename": img, "category_rated": cat})
# astype returns a new Series, so the result must be assigned back
rating_df["category_rated"] = rating_df["category_rated"].astype(float)
rating_df.info()
```

```python
# optional: fix typos, e.g. 44 entered instead of 4
# rating_df["category_rated"] = rating_df["category_rated"].replace(to_replace=44, value=4)
rating_df.head()
```

## Updating dataframe and exclusion list

To exclude previously categorized images, the dataframe and the exclusion list derived from it must be updated after every labeling session.
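The update has three steps: a left join, an overwrite where the old category is 0, and resetting leftover NaN values to 0. The worked example below runs these steps on toy tables. The values are made up, and the `toy_` prefix keeps the real `image_df` from being overwritten if the cell is run in the notebook.

```python
import numpy as np
import pandas as pd

# toy tables (made-up values) with the same columns as the notebook
toy_image_df = pd.DataFrame({"filename": ["a.jpg", "b.jpg", "c.jpg"],
                             "category": [2.0, 0.0, 0.0]})
toy_rating_df = pd.DataFrame({"filename": ["b.jpg"], "category_rated": [4.0]})

# left join keeps every row of the image table; unrated rows get NaN
merged = pd.merge(toy_image_df, toy_rating_df, on="filename", how="left")

# take the new rating only where the old category was 0 (= unrated)
merged["category"] = np.where(merged["category"] < 1,
                              merged["category_rated"], merged["category"])

# images that are still unrated came through as NaN; reset them to 0
updated = merged.drop(columns=["category_rated"])
updated["category"] = updated["category"].fillna(0)
print(updated)
```

Here a.jpg keeps its earlier rating of 2, b.jpg receives the new rating 4, and c.jpg stays at 0.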

```python
# merge rating_df (names and new ratings) into image_df (names and data);
# validate="one_to_one" raises an error if a filename occurs more than once
left_join_image_df = pd.merge(image_df, rating_df, on="filename",
                              how="left", validate="one_to_one")
left_join_image_df = left_join_image_df.sort_values(
    by="category_rated", ascending=False)

# if an "Unnamed: 0" column appears, drop it:
# left_join_image_df = left_join_image_df.drop(columns=["Unnamed: 0"])
left_join_image_df.head()
```

```python
# overwrite category 0 (= unrated) with the new rating where present;
# images without a new rating get NaN here and are reset to 0 in the next cell
left_join_image_df["category"] = np.where(
    left_join_image_df["category"] < 1,
    left_join_image_df["category_rated"],
    left_join_image_df["category"])

left_join_image_df.head()
```

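```python
# collect the finished new image_df
image_df = left_join_image_df.drop(columns=["category_rated"])
# assign back instead of fillna(..., inplace=True) on a column,
# which is chained assignment and unreliable in recent pandas
image_df["category"] = image_df["category"].fillna(0)
image_df.head()
```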

## Check NaNs and save to CSV

```python
image_df.isna().sum()
```

```python
# overwrites the working table; to keep older versions, save under a new
# (e.g. numbered) filename each time
image_df.to_csv("image_df_classification.csv", sep=";")
```
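To keep versions without renaming the file by hand, the next free number can be computed. The sketch below assumes a hypothetical `<stem>_<number>.csv` naming scheme that the notebook does not define. The file read in "Get data" would then have to be the highest-numbered version.

```python
import re
from pathlib import Path


def next_version_path(folder=".", stem="image_df_classification", suffix=".csv"):
    """Hypothetical helper: e.g. image_df_classification_3.csv if _1 and _2 exist."""
    pattern = re.compile(rf"^{re.escape(stem)}_(\d+){re.escape(suffix)}$")
    numbers = []
    for p in Path(folder).iterdir():
        m = pattern.match(p.name)
        if m:
            numbers.append(int(m.group(1)))
    # start at 1 when no numbered version exists yet
    return Path(folder) / f"{stem}_{max(numbers, default=0) + 1}{suffix}"
```

Usage: `image_df.to_csv(next_version_path(), sep=";")`.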

```python
# exclude previously rated images (category of 1 or higher)
exclusion_df = image_df[image_df["category"] >= 1]
exclusion_list = exclusion_df["filename"].to_list()
len(exclusion_list)
```

```python
# number of images per category (0 = unrated)
image_df["category"].value_counts()
```

Back to [Categorizing](#Categorizing) to continue labeling