# Data loading

## Contents

* [Load data](#loaddata)
    * [from list](#load_list)
    * [from Pandas.DataFrame](#load_df)
    * [from folder](#load_folder)
    * [from folder *interactively*](#load_folder_int)
* [Adding data](#add)
    * [from Data](#add_data)
* [Label](#label)
    * [from folder](#label_from_folder)
    * [from CSV file](#label_from_csv)
    * [from CSV file *interactively*](#inter_label_from_csv)
* [Getting items through indexing](#indexing)
* [To](#to)
    * [to Path list](#to_pathlist)
    * [to list](#to_list)
    * [to Array](#to_np)
    * [to Pandas.DataFrame](#to_df)
    * [to Pandas.Series](#to_series)

Loading images with cytokinin is simple, and this notebook walks through several examples. The best way to follow the demo is to run the code cells in order, since many of them depend on the output of earlier cells.

First, install cytokinin. Skip this step if you have already installed it.

In [None]:
!pwd

In [None]:
!pip uninstall cytokinin -y

In [None]:
!pip install ./../cytokinin

Workspace preparation

In [None]:
import os
import json
import numpy as np
import pandas as pd
from pathlib import Path
import logging
pil_logger = logging.getLogger('PIL')
pil_logger.setLevel(logging.INFO)

In [None]:
import cytokinin as ck

In [None]:
root = Path('./../') # a Path can be used much like a str path

In [None]:
# Set an example directory of image files
MOCKS = root.joinpath('./cytokinin/cytokinin/tests/mocks/')
IMGS = MOCKS/'imgs' # this is another Path object
os.listdir(str(IMGS))

## <div id ='loaddata'>Load data</div>

### <div id='load_list'>Load from list</div>

In [None]:
flist = []
for dirpath, dirs, files in os.walk(IMGS, topdown=False):  # dirpath, so that `root` above is not overwritten
    for f in files:
        flist.append(os.path.join(dirpath, f))
flist[:5]
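
The same kind of listing can also be built with `pathlib` alone. The following is a minimal sketch that does not depend on cytokinin; the helper name `list_files` and the throwaway folder layout are assumptions made only for illustration.

```python
import tempfile
from pathlib import Path


def list_files(folder):
    """Return every file under `folder`, recursively, as sorted strings."""
    return sorted(str(p) for p in Path(folder).rglob('*') if p.is_file())


# Build a throwaway folder tree that mimics an images directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ('dog/a.jpg', 'dog/b.jpg', 'stone/c.jpg'):
        target = Path(tmp) / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.touch()
    demo_files = list_files(tmp)
    demo_names = [Path(f).name for f in demo_files]

print(demo_names)  # ['a.jpg', 'b.jpg', 'c.jpg']
```

A list of plain strings like `demo_files` has the same form as `flist`, so it should be usable in the same way in the cells that follow.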

In [None]:
from cytokinin.data import take_data
imgs = take_data('images')
imgs

In [None]:
imgs.store_filesnames_from_list(flist) # load
imgs.filesnames.head() #show

### <div id= 'load_df'>Load from DataFrame</div>

In [None]:
df = pd.DataFrame({'files': flist})
df.head()

In [None]:
imgs = take_data('images')
imgs.store_filesnames_from_df(df, 'files') # keeps unique values only
imgs.filesnames.head()

### <div id= 'load_folder'>Load from folder</div>

In [None]:
dogs_folder = IMGS.joinpath('dog')
dogs = take_data('images').store_filesnames_from_folder(dogs_folder)
dogs.filesnames.head()

### <div id= 'load_folder_int'>Load from folder interactively</div>

Here is a convenient way to load images from the file system. Hunting for data across folders in code can be confusing and tedious, so try this interactive way instead!

Caveat: it can be a little unstable, especially when launched from Jupyter notebooks on macOS. In the worst case, just restart the kernel.

Example 1: You select a valid folder that contains images

In [None]:
dogs = take_data('images')
dogs.store_filesnames_from_folder(gui=True, include_subdirs=True)
dogs.filesnames.head()

Example 2: You select a wrong folder

In [None]:
dogs = take_data('images')
try:
    dogs.store_filesnames_from_folder(gui=True, include_subdirs=False)
except Exception as e:
    print(e)
    print(f'filesnames:\n{dogs.filesnames}')

## <div id='add'>Adding data</div>

### <div id='add_data'>add from Data</div>

You can easily stack data from different Data objects to build up your dataset and get it ready to feed your model.

In [None]:
dogs_folder = IMGS.joinpath('dog')
dogs = take_data('images').store_filesnames_from_folder(dogs_folder)
stones_folder = IMGS.joinpath('stone')
stones = take_data('images').store_filesnames_from_folder(stones_folder)

In [None]:
dogs_and_stones = dogs.copy()
print(dogs_and_stones)

In [None]:
dogs_and_stones.add_from_data(stones)
print(dogs_and_stones)

## <div id='label'>Label</div>

A supervised learning data set needs both samples and labels. Here you can see how to load the labels.

### <div id='label_from_folder'>from filesnames folder</div>

In [None]:
dogs_and_stones.label_from_folder()
# Let's see what it loaded
print(dogs_and_stones.labels.unique())
dogs_and_stones.labels.value_counts()
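
To make the idea concrete, here is a minimal sketch of folder-based labeling in plain Python. It assumes that the label is simply the name of the folder that contains each file; this document does not confirm how cytokinin does it internally. The helper `labels_from_parent` and the example paths are hypothetical.

```python
from collections import Counter
from pathlib import PurePosixPath


def labels_from_parent(paths):
    """Use the name of each file's parent folder as its label."""
    return [PurePosixPath(p).parent.name for p in paths]


# Hypothetical file names, laid out like an images folder with one
# sub-folder per class.
example_paths = [
    'imgs/dog/dog_1.jpg',
    'imgs/dog/dog_2.jpg',
    'imgs/stone/stone_1.jpg',
]
example_labels = labels_from_parent(example_paths)
print(example_labels)           # ['dog', 'dog', 'stone']
print(Counter(example_labels))  # Counter({'dog': 2, 'stone': 1})
```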

### <div id='label_from_csv'>from csv file</div>

In [None]:
# Load dogs images
dogs = take_data('images').store_filesnames_from_folder(IMGS/'dog')
# Load stones images
stones = take_data('images').store_filesnames_from_folder(IMGS/'stone')
# Merge Data set
dogs_and_stones2 = dogs.copy().add_from_data(stones)
print(f'Before:\n{dogs_and_stones2}')

# Label the resulting Data set
csv_url = MOCKS/'labels'/'dogsandstones_labes.csv'
dogs_and_stones2.label_from_csv(csv_url, col='Y')
print(f'After:\n{dogs_and_stones2}')
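
The lookup behind CSV-based labeling can be sketched with the standard `csv` module. This is only an illustration under stated assumptions: the label column is called `Y` as in the call above, but the file-name column `file`, the CSV content, and the helper `labels_from_csv` are all hypothetical and may not match the real labels file or cytokinin's own matching logic.

```python
import csv
import io
from pathlib import PurePosixPath

# Hypothetical CSV content: a file-name column plus the label column 'Y'.
csv_text = """file,Y
dog_1.jpg,dog
stone_1.jpg,stone
"""


def labels_from_csv(paths, csv_file, name_col='file', label_col='Y'):
    """Look up each path's label by its file name; None if it is not listed."""
    table = {row[name_col]: row[label_col] for row in csv.DictReader(csv_file)}
    return [table.get(PurePosixPath(p).name) for p in paths]


example_paths = [
    'imgs/dog/dog_1.jpg',
    'imgs/stone/stone_1.jpg',
    'imgs/dog/dog_9.jpg',  # not present in the CSV
]
csv_labels = labels_from_csv(example_paths, io.StringIO(csv_text))
print(csv_labels)  # ['dog', 'stone', None]
```

Returning `None` for files missing from the CSV is a design choice of this sketch; it makes unlabeled samples easy to spot before training.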

### <div id='inter_label_from_csv'>from CSV file interactively</div>

In [None]:
## Experimental! Available soon...
# dogs_and_stones2 = dogs.copy()
# dogs_and_stones2.add_from_data(stones)
# print(dogs_and_stones2)
# dogs_and_stones2.label_from_csv(csv_url, gui=True)
# print(dogs_and_stones2)

## <div id='indexing'>Getting items through indexing</div>

You can access stored items by index or by slice, as with any Python sequence.

In [None]:
dogs_and_stones[0]

In [None]:
dogs_and_stones[:4]

Each element is a tuple made of the PIL Image object and its label.

In [None]:
dogs_and_stones[3][0]
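
The indexing behavior shown here can be pictured with a toy container. This is a minimal sketch and not cytokinin's implementation: the class `LabeledItems` is hypothetical, and plain strings stand in for the PIL Image objects.

```python
class LabeledItems:
    """Toy container that returns (item, label) pairs on indexing."""

    def __init__(self, items, labels):
        if len(items) != len(labels):
            raise ValueError('items and labels must have the same length')
        self._pairs = list(zip(items, labels))

    def __len__(self):
        return len(self._pairs)

    def __getitem__(self, index):
        # An int gives one (item, label) tuple; a slice gives a list of them.
        return self._pairs[index]


toy = LabeledItems(
    ['dog_1.jpg', 'dog_2.jpg', 'stone_1.jpg'],
    ['dog', 'dog', 'stone'],
)
first = toy[0]
first_two = toy[:2]
print(first)      # ('dog_1.jpg', 'dog')
print(first_two)  # [('dog_1.jpg', 'dog'), ('dog_2.jpg', 'dog')]
print(toy[2][0])  # 'stone_1.jpg', the item half of the third pair
```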

You can also choose a default PIL color mode in which the images are opened each time they are retrieved.
Check [PIL Modes](https://github.com/python-pillow/Pillow/blob/5.1.x/docs/handbook/concepts.rst#id3) for more information about the available color modes.

In [None]:
dogs_and_stones.set_colormode('L') # L is greyscale

In [None]:
dogs_and_stones[3][0] # play with it
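
For intuition about what mode `L` means, the Pillow documentation describes the conversion from RGB to greyscale as the ITU-R 601-2 luma transform, `L = R * 299/1000 + G * 587/1000 + B * 114/1000`. The sketch below applies that formula to single pixels in plain Python. The helper `to_luma` is hypothetical, and Pillow's own integer arithmetic may round slightly differently in some cases.

```python
def to_luma(rgb):
    """Convert one (R, G, B) pixel to a single 8-bit grey level."""
    r, g, b = rgb
    # Weighted sum of the channels: green counts most, blue least.
    return round((r * 299 + g * 587 + b * 114) / 1000)


white = to_luma((255, 255, 255))
black = to_luma((0, 0, 0))
red = to_luma((255, 0, 0))
green = to_luma((0, 255, 0))
blue = to_luma((0, 0, 255))
print(white, black, red, green, blue)  # 255 0 76 150 29
```

The unequal weights explain why a pure green pixel looks much brighter than a pure blue one after the conversion.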

## Print

You can easily retrieve information about your Data object by printing it.

In [None]:
print(dogs)

## <div id='to'>To</div>

Here you can see how to export the collected filesnames in the format you need. This way you can use them however you want, after gathering all the names with the Data constructor.

### <div id= 'to_pathlist'>to Paths list</div>

In [None]:
top_3_as_pathlist = dogs.to('pathlist')[:3]
print(f'shape: {np.shape(top_3_as_pathlist)}')
top_3_as_pathlist

### <div id= 'to_list'>to list</div>

In [None]:
top_3_as_list = dogs.to('list')[:3]
print(f'shape: {np.shape(top_3_as_list)}')
top_3_as_list

### <div id='to_np'>to Array</div>

In [None]:
top_3_as_array = dogs.to('arrays')[:3] # array_mode=['rgb', 'gray', 'grey']
print(f'shape: {np.shape(top_3_as_array)}')
top_3_as_array

### <div id= 'to_df'>to Pandas.DataFrame</div>

In [None]:
top_3_as_df = dogs.to('dataframe')[:3]
print(f'shape: {np.shape(top_3_as_df)}')
top_3_as_df

### <div id='to_series'>to Pandas.Series</div>

In [None]:
top_3_as_series = dogs.to('series')[:3]
print(f'shape: {np.shape(top_3_as_series)}')
top_3_as_series