[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/keras-team/autokeras/blob/master/docs/templates/tutorial/image_regression.ipynb)

# Celebrity Ages Example

Regression tasks estimate a numeric variable, such as the price of a house or voter turnout.

This example is adapted from a [notebook](https://gist.github.com/mapmeld/98d1e9839f2d1f9c4ee197953661ed07) which estimates a person's age from their image, trained on the [IMDB-WIKI](https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/) photographs of famous people.

First, prepare your image data as a `numpy.ndarray` or a `tf.data.Dataset`. Every image must have the same shape, that is, the same width, height, and number of color channels as the other images in the set.
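You can check the same-shape requirement before training with the minimal plain-Python sketch below. It represents images as nested lists and uses two hypothetical helpers, `shape_of` and `check_uniform_shape`, which are not part of AutoKeras. With NumPy arrays you would compare each array's `.shape` instead.

```python
# Minimal sketch: confirm every image in a collection has the same shape
# before stacking them into one array. Images are modelled as nested lists
# (height x width, or height x width x channels). The helper names are
# hypothetical, not part of AutoKeras.

def shape_of(image):
    """Return the shape of a rectangular nested list as a tuple, e.g. (2, 3).

    Only the first element at each level is inspected, so the input is
    assumed to be rectangular.
    """
    shape = []
    level = image
    while isinstance(level, list):
        shape.append(len(level))
        level = level[0] if level else None
    return tuple(shape)


def check_uniform_shape(images):
    """Return the shared shape, or raise ValueError naming the first mismatch."""
    expected = shape_of(images[0])
    for index, image in enumerate(images[1:], start=1):
        found = shape_of(image)
        if found != expected:
            raise ValueError(
                f"image {index} has shape {found}, expected {expected}")
    return expected


gray_2x3 = [[0, 10, 20], [30, 40, 50]]
print(check_uniform_shape([gray_2x3, gray_2x3]))  # (2, 3)
```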

### Connect your Google Drive for Data

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


### Install AutoKeras and TensorFlow

For this tutorial, download the AutoKeras master branch as a zip file to your Google Drive and install it from there. In general, you can use `pip install autokeras`.

In [0]:
!pip install -v "/content/drive/My Drive/AutoKeras-dev/autokeras-master.zip"
!pip uninstall -y keras-tuner
!pip install git+https://github.com/keras-team/keras-tuner.git@d2d69cba21a0b482a85ce2a38893e2322e139c01

In [0]:
!pip install tensorflow==2.2

### **Import IMDB Celeb Images and Metadata**

In [0]:
!mkdir ./drive/My\ Drive/mlin/celebs

In [9]:
! wget -O ./drive/My\ Drive/mlin/celebs/imdb_0.tar https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/static/imdb_0.tar

--2020-05-19 17:46:04--  https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/static/imdb_0.tar
Resolving data.vision.ee.ethz.ch (data.vision.ee.ethz.ch)... 129.132.52.162
Connecting to data.vision.ee.ethz.ch (data.vision.ee.ethz.ch)|129.132.52.162|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 28708782080 (27G) [application/x-tar]
Saving to: ‘./drive/My Drive/mlin/celebs/imdb_0.tar’


2020-05-19 18:30:31 (10.3 MB/s) - ‘./drive/My Drive/mlin/celebs/imdb_0.tar’ saved [28708782080/28708782080]



In [0]:
! cd ./drive/My\ Drive/mlin/celebs && tar -xf imdb_0.tar
! rm ./drive/My\ Drive/mlin/celebs/imdb_0.tar

If you re-run this notebook and the images are already in your Drive, uncomment and run the cell below instead of repeating the download steps above.

In [0]:
# ! cd ./drive/My\ Drive/mlin/celebs

In [11]:
! ls ./drive/My\ Drive/mlin/celebs/imdb/

00  01	02  03	04  05	06  07	08  09


In [12]:
! wget https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/static/imdb_meta.tar
! tar -xf imdb_meta.tar
! rm imdb_meta.tar

--2020-05-20 16:53:57--  https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/static/imdb_meta.tar
Resolving data.vision.ee.ethz.ch (data.vision.ee.ethz.ch)... 129.132.52.162
Connecting to data.vision.ee.ethz.ch (data.vision.ee.ethz.ch)|129.132.52.162|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 22937600 (22M) [application/x-tar]
Saving to: ‘imdb_meta.tar’


2020-05-20 16:54:00 (8.23 MB/s) - ‘imdb_meta.tar’ saved [22937600/22937600]



### **Converting MATLAB Dates to Dates of Birth**

In [9]:
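from datetime import datetime, timedelta

def datenum_to_datetime(datenum):
    """
    Convert a MATLAB datenum into a Python datetime.

    MATLAB counts days from 1 January of year 0 and Python ordinals from
    1 January of year 1; year 0 is a leap year, hence the 366-day offset.
    Invalid datenums fall back to the placeholder date for datenum 700000.
    """
    fraction = datenum % 1  # the fractional part encodes the time of day
    hours = fraction * 24
    minutes = hours % 1 * 60
    seconds = minutes % 1 * 60
    try:
        return datetime.fromordinal(int(datenum)) \
            + timedelta(hours=int(hours)) \
            + timedelta(minutes=int(minutes)) \
            + timedelta(seconds=round(seconds)) \
            - timedelta(days=366)
    except (ValueError, OverflowError):
        # ValueError: ordinal out of range or NaN; OverflowError: before year 1.
        return datenum_to_datetime(700000)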

print(datenum_to_datetime(734963))

2012-04-04 00:00:00
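The 366-day offset in the function above comes from the two calendars' starting points. MATLAB's datenum 1 is 1 January of year 0, Python's ordinal 1 is 1 January of year 1, and year 0 is a leap year with 366 days. The sketch below is a compact variant under that assumption. Unlike the cell above, it returns `None` for invalid inputs instead of a fixed fallback date. The helper name `matlab_datenum_to_datetime` is hypothetical.

```python
from datetime import datetime, timedelta

# MATLAB datenum 1 = 1 Jan of year 0; Python ordinal 1 = 1 Jan of year 1.
# Year 0 is a leap year (366 days), hence the offset. The fractional part
# of a datenum encodes the time of day.
MATLAB_TO_PYTHON_OFFSET_DAYS = 366


def matlab_datenum_to_datetime(datenum):
    """Convert a MATLAB datenum to datetime, or return None if it is invalid."""
    try:
        whole_days = int(datenum)
        return (datetime.fromordinal(whole_days)
                + timedelta(days=datenum % 1)
                - timedelta(days=MATLAB_TO_PYTHON_OFFSET_DAYS))
    except (ValueError, OverflowError):
        # ValueError: ordinal < 1 or NaN input; OverflowError: result before year 1.
        return None


print(matlab_datenum_to_datetime(734963))    # 2012-04-04 00:00:00
print(matlab_datenum_to_datetime(734963.5))  # 2012-04-04 12:00:00
```

If bad birth dates return `None`, pandas stores them as `NaT`, and you can drop those rows with `dropna`. The alternative is to assign them a date that looks real but is wrong.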


### **Loading the MATLAB File into a Pandas DataFrame**

In [0]:
from scipy.io import loadmat
x = loadmat('imdb/imdb.mat')

In [14]:
import pandas as pd
import numpy as np 

mdata = x['imdb']  # variable in mat file
mdtype = mdata.dtype  # dtypes of structures are "unsized objects"
ndata = {n: mdata[n][0, 0] for n in mdtype.names}
columns = [n for n, v in ndata.items()]

# Build the table column by column, keyed by field name. celeb_names lists
# each celebrity once, so it is shorter than the per-photo fields; aligning
# the columns on their index pads it with NaN instead of shifting later
# fields (such as celeb_id) into the wrong column.
data = {}
for col in columns:
  values = ndata[col][0][1:]  # the first record is skipped
  if col == "dob":
    values = [datenum_to_datetime(int(val)) for val in values]
  elif col == "photo_taken":
    values = [datetime(year=int(val), month=6, day=30) for val in values]
  data[col] = pd.Series(values)

df = pd.DataFrame(data)
print(df.head())

         dob photo_taken  ...            celeb_names  celeb_id
0 1899-05-10  1970-06-30  ...  ['Weird Al' Yankovic]    6488.0
1 1899-05-10  1968-06-30  ...             [2 Chainz]    6488.0
2 1899-05-10  1968-06-30  ...              [50 Cent]    6488.0
3 1899-05-10  1968-06-30  ...           [A Martinez]    6488.0
4 1924-09-16  1991-06-30  ...           [A.D. Miles]   11516.0

[5 rows x 10 columns]
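The MATLAB struct stores the metadata column by column, so it has to be transposed into rows. According to the dataset description, not every field has one entry per photo: `celeb_names` lists each celebrity only once. If rows are built by appending values by position, a shorter field shifts every value after it into the wrong column. The minimal standard-library sketch below uses toy data and a hypothetical helper, `columns_to_rows`. It keys each value by column name and pads missing entries with `None`.

```python
# Minimal sketch (toy data, hypothetical helper): transpose column-oriented
# fields into row records. Each value is keyed by its column name, so a
# shorter column leaves None in the rows it does not cover instead of
# shifting later columns' values to the left.

def columns_to_rows(fields):
    """Turn {column: [values]} into a list of {column: value} rows."""
    n_rows = max(len(values) for values in fields.values())
    return [
        {name: (values[i] if i < len(values) else None)
         for name, values in fields.items()}
        for i in range(n_rows)
    ]


fields = {
    "face_score": [4.3, 2.6, 1.9],
    "names": ["A", "B"],      # shorter, one entry per celebrity
    "celeb_id": [1, 1, 2],
}
for row in columns_to_rows(fields):
    print(row)
```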


In [15]:
print(columns)
print(df["full_path"])

['dob', 'photo_taken', 'full_path', 'gender', 'name', 'face_location', 'face_score', 'second_face_score', 'celeb_names', 'celeb_id']
0         [01/nm0000001_rm3343756032_1899-5-10_1970.jpg]
1          [01/nm0000001_rm577153792_1899-5-10_1968.jpg]
2          [01/nm0000001_rm946909184_1899-5-10_1968.jpg]
3          [01/nm0000001_rm980463616_1899-5-10_1968.jpg]
4         [02/nm0000002_rm1075631616_1924-9-16_1991.jpg]
                               ...                      
460717    [08/nm3994408_rm761245696_1989-12-29_2011.jpg]
460718    [08/nm3994408_rm784182528_1989-12-29_2011.jpg]
460719    [08/nm3994408_rm926592512_1989-12-29_2011.jpg]
460720    [08/nm3994408_rm943369728_1989-12-29_2011.jpg]
460721    [08/nm3994408_rm976924160_1989-12-29_2011.jpg]
Name: full_path, Length: 460722, dtype: object


### **Calculating age at the time the photo was taken**

In [16]:
# Elapsed time in seconds divided by 31,558,102 s (about 365.26 days) per year.
df["age"] = (df["photo_taken"] - df["dob"]).dt.total_seconds() / 31558102
print(df["age"])

0         71.136445
1         69.137846
2         69.137846
3         69.137846
4         66.783332
            ...    
460717    21.500000
460718    21.500000
460719    21.500000
460720    21.500000
460721    21.500000
Name: age, Length: 460722, dtype: float64
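The age column is the elapsed time divided by a year length. The sketch below reproduces that arithmetic for a single photo using only the standard library. `age_in_years` is a hypothetical helper, and the constant is the one used in the cell above.

```python
from datetime import datetime

# The cell above divides elapsed time by 31,558,102 seconds per year,
# which is about 365.26 days.
SECONDS_PER_YEAR = 31_558_102


def age_in_years(dob, photo_taken):
    """Fractional age in years between two datetimes."""
    return (photo_taken - dob).total_seconds() / SECONDS_PER_YEAR


# photo_taken only records a year, so the notebook pins it to 30 June; the
# computed age can therefore be off by up to about half a year.
age = age_in_years(datetime(1899, 5, 10), datetime(1970, 6, 30))
print(round(age, 6))  # 71.136445
```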


### **Creating dataset**


* We sample 200 training images and 100 test images from folders `00` and `01`, which were included in this first download.
* Images are resized to 128x128 to standardize their shape and conserve memory.
* RGB images are converted to grayscale, so every image has a single channel.
* Ages are truncated to integers.
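Pillow documents that `convert('L')` uses the ITU-R 601-2 luma transform, L = R * 299/1000 + G * 587/1000 + B * 114/1000. The sketch below applies that formula in plain Python to show how each (R, G, B) pixel collapses to a single value, which turns an H x W x 3 image into an H x W matrix. It illustrates the formula only and is not Pillow's implementation. Pillow uses its own fixed-point rounding, so some pixels may differ by one. The helper names are hypothetical.

```python
# Minimal sketch of the ITU-R 601-2 luma transform that Pillow documents for
# Image.convert('L'). This integer version rounds to nearest; Pillow's own
# fixed-point arithmetic may differ by one on some pixels.

def rgb_to_luma(r, g, b):
    """Weighted sum of the three channels, rounded to the nearest integer."""
    return (r * 299 + g * 587 + b * 114 + 500) // 1000


def image_to_grayscale(rgb_rows):
    """Map an H x W grid of (r, g, b) tuples to an H x W grid of ints."""
    return [[rgb_to_luma(*pixel) for pixel in row] for row in rgb_rows]


rgb = [[(255, 0, 0), (0, 255, 0)],
       [(0, 0, 255), (255, 255, 255)]]
print(image_to_grayscale(rgb))  # [[76, 150], [29, 255]]
```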



In [0]:
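import numpy as np
from PIL import Image

IMAGE_ROOT = "./drive/My Drive/mlin/celebs/imdb/"

def df2numpy(subset):
  """Load, resize, and grayscale the images of a DataFrame subset."""
  images = []
  for img_path in subset["full_path"]:
    # full_path holds a one-element array such as ['01/nm0000001_....jpg']
    with Image.open(IMAGE_ROOT + img_path[0]) as raw:
      img = raw.resize((128, 128)).convert('L')
      images.append(np.asarray(img, dtype="int32"))

  image_inputs = np.array(images)  # shape: (n, 128, 128)
  ages = subset["age"].astype('int').to_numpy()  # truncate to whole years
  return image_inputs, ages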

In [0]:

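# Photos from folders 00 and 01 (paths that sort before '02').
pool = df[df["full_path"] < '02']

train_set = pool.sample(200)
train_imgs, train_ages = df2numpy(train_set)

# Drop the training rows first so no test image was also trained on.
test_set = pool.drop(train_set.index).sample(100)
test_imgs, test_ages = df2numpy(test_set)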
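If the test rows are drawn from the same pool as the training rows without excluding the training rows, a photo can land in both sets and inflate the test score. The minimal standard-library sketch below shows disjoint sampling. `disjoint_samples` is a hypothetical helper. In pandas, the equivalent is to call `.drop(train_set.index)` on the filtered DataFrame before `.sample(100)`.

```python
import random

# Minimal sketch: draw train and test indices without overlap. `pool` stands
# in for the row labels of the filtered DataFrame; the helper is hypothetical.

def disjoint_samples(pool, n_train, n_test, seed=None):
    """Sample n_train + n_test distinct items and split them into two lists."""
    rng = random.Random(seed)
    chosen = rng.sample(list(pool), n_train + n_test)
    return chosen[:n_train], chosen[n_train:]


train_idx, test_idx = disjoint_samples(range(1000), 200, 100, seed=0)
print(len(train_idx), len(test_idx), set(train_idx) & set(test_idx))  # 200 100 set()
```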

### **Training using AutoKeras**

In [0]:
import autokeras as ak

# Initialize the image regressor
reg = ak.ImageRegressor(max_trials=15) # AutoKeras tries 15 different models.

# Find the best model for the given training data
reg.fit(train_imgs, train_ages)

# Predict with the chosen model:
# predict_y = reg.predict(test_imgs)  # Uncomment if required


In [0]:
# Evaluate the chosen model with testing data
print(reg.evaluate(test_imgs, test_ages))

### **Validation Data**

By default, AutoKeras uses the last 20% of the training data as validation data. As the example below shows, you can set a different fraction with `validation_split`.

In [0]:
reg.fit(train_imgs,
        train_ages,
        # Split the training data and use the last 15% as validation data.
        validation_split=0.15,
        epochs=3)
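The validation rows come from the end of the training data, so the sample order matters. The sketch below takes the last `validation_split` share of a list. It is a plain-Python illustration, not AutoKeras or Keras code, and the library's exact rounding may differ by one sample. `train_set` was drawn with `.sample()`, so its order is already random and the last 15% is a fair hold-out.

```python
# Minimal sketch of a "last fraction" hold-out: the final validation_split
# share of the samples, in their current order, becomes validation data.
# Illustration only; the helper name is hypothetical.

def split_last_fraction(samples, validation_split):
    """Return (train, validation) with validation taken from the end."""
    n_val = round(len(samples) * validation_split)
    split_at = len(samples) - n_val
    return samples[:split_at], samples[split_at:]


train, val = split_last_fraction(list(range(200)), 0.15)
print(len(train), len(val))  # 170 30
```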

You can also pass your own validation set with `validation_data` instead of splitting it off the training data.

In [0]:
split = 160  # first 160 of the 200 sampled images train, the last 40 validate
x_val = train_imgs[split:]
y_val = train_ages[split:]
x_train = train_imgs[:split]
y_train = train_ages[:split]
reg.fit(x_train,
        y_train,
        # Use your own validation set.
        validation_data=(x_val, y_val),
        epochs=3)

### **Customized Search Space**

For advanced users, you can customize the search space by using `AutoModel` instead of `ImageRegressor`. `ImageBlock` exposes a few high-level settings:

* `block_type` sets the type of neural network to search.
* `normalize` sets whether to normalize the data.
* `augment` sets whether to apply data augmentation.

AutoKeras tunes any of these arguments you leave unspecified. See the following example for details.

In [0]:
import autokeras as ak

input_node = ak.ImageInput()
output_node = ak.ImageBlock(
    # Only search ResNet architectures.
    block_type='resnet',
    # Normalize the dataset.
    normalize=True,
    # Do not do data augmentation.
    augment=False)(input_node)
output_node = ak.RegressionHead()(output_node)
reg = ak.AutoModel(inputs=input_node, outputs=output_node, max_trials=10)
reg.fit(x_train, y_train, epochs=3)
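A toy search space shows the effect of pinning `ImageBlock` arguments. In the sketch below, each argument maps to a list of candidate values, and the number of configurations is the product of the list lengths. The choice lists are assumptions for illustration, not AutoKeras's actual options. The real search space is typically larger, because AutoKeras can also tune the chosen block's own hyperparameters.

```python
from itertools import product

# Toy model of a search space: each argument maps to the values a tuner may
# try. These choice lists are illustrative assumptions, not AutoKeras's
# actual options. Pinning an argument collapses it to a single value.
FULL_SPACE = {
    "block_type": ["resnet", "xception", "vanilla"],
    "normalize": [True, False],
    "augment": [True, False],
}


def restrict(space, **fixed):
    """Return a copy of the space with the given arguments pinned."""
    return {name: ([fixed[name]] if name in fixed else choices)
            for name, choices in space.items()}


def count_configs(space):
    """Number of distinct configurations in the space."""
    return len(list(product(*space.values())))


print(count_configs(FULL_SPACE))  # 12
print(count_configs(restrict(FULL_SPACE, block_type="resnet",
                             normalize=True, augment=False)))  # 1
```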

The usage of `AutoModel` is similar to the functional API of Keras. You are building a graph whose edges are blocks and whose nodes are the intermediate outputs of blocks. To add an edge from `input_node` to `output_node`, write `output_node = ak.[some_block]([block_args])(input_node)`.
You can also use more fine-grained blocks to customize the search space even further. See the following example.

In [0]:
import autokeras as ak

input_node = ak.ImageInput()
output_node = ak.Normalization()(input_node)
output_node = ak.ImageAugmentation(translation_factor=0.3)(output_node)
output_node = ak.ResNetBlock(version='v2')(output_node)
output_node = ak.RegressionHead()(output_node)
reg = ak.AutoModel(inputs=input_node, outputs=output_node, max_trials=10)
reg.fit(x_train, y_train, epochs=3)
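The graph-building style described above can be mimicked in a few lines of plain Python. Calling a block on a node returns a new node and records the block as the edge between the two. The `Node` and `Block` classes below are hypothetical toys, much simpler than AutoKeras's own classes. They only show why chaining calls builds a graph.

```python
# Toy illustration of the functional style: calling a block on a node
# returns a new node and records the block as the edge between them.
# These classes are hypothetical and far simpler than AutoKeras's own.

class Node:
    def __init__(self, name, parent=None, block=None):
        self.name = name
        self.parent = parent  # upstream node, None for the input
        self.block = block    # block on the edge from parent to this node


class Block:
    def __init__(self, name):
        self.name = name

    def __call__(self, node):
        return Node(f"{self.name}_output", parent=node, block=self)


def edges_to(node):
    """Walk back from an output node and list block names in order."""
    names = []
    while node.parent is not None:
        names.append(node.block.name)
        node = node.parent
    return list(reversed(names))


input_node = Node("image_input")
output_node = Block("normalization")(input_node)
output_node = Block("resnet")(output_node)
output_node = Block("regression_head")(output_node)
print(edges_to(output_node))  # ['normalization', 'resnet', 'regression_head']
```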

### **Data Format**

The AutoKeras `ImageRegressor` is quite flexible about the data format.

For the images, it accepts data both with and without a channel dimension. The grayscale images prepared above have no channel dimension: each image is a matrix with shape (128, 128). AutoKeras also accepts images with the channel dimension last, e.g., (32, 32, 3) or (28, 28, 1).

For the regression targets, AutoKeras expects numerical values, such as the integer ages in `train_ages`.

So if you prepare your data in the following way, the `ImageRegressor` should still work.

In [31]:
# Reshape the images to have the channel dimension.
train_imgs = train_imgs.reshape(train_imgs.shape + (1,))
test_imgs = test_imgs.reshape(test_imgs.shape + (1,))

print(train_imgs.shape) # (200, 128, 128, 1)
print(test_imgs.shape) # (100, 128, 128, 1)
print(train_ages[:3])

(200, 128, 128, 1)
(100, 128, 128, 1)
[52 14 49]


AutoKeras also supports the `tf.data.Dataset` format for the training data. In this case, each image must be 3-dimensional (height, width, channels), so the reshaped arrays above are used. The numeric age targets can be wrapped into the Dataset as they are.

In [0]:
import autokeras as ak
import tensorflow as tf

train_set = tf.data.Dataset.from_tensor_slices(((train_imgs, ), (train_ages, )))
test_set = tf.data.Dataset.from_tensor_slices(((test_imgs, ), (test_ages, )))

reg = ak.ImageRegressor(max_trials=15)
# Feed the tensorflow Dataset to the regressor.
reg.fit(train_set)
# Predict with the best model.
predicted_y = reg.predict(test_set)
# Evaluate the best model with testing data.
print(reg.evaluate(test_set))

## References

[Main Reference Notebook](https://gist.github.com/mapmeld/98d1e9839f2d1f9c4ee197953661ed07),
[Dataset](https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/),
[ImageRegressor](/image_regressor),
[ResNetBlock](/block/#resnetblock-class),
[ImageInput](/node/#imageinput-class),
[AutoModel](/auto_model/#automodel-class),
[ImageBlock](/block/#imageblock-class),
[Normalization](/preprocessor/#normalization-class),
[ImageAugmentation](/preprocessor/#image-augmentation-class),
[RegressionHead](/head/#regressionhead-class).
