# Purpose

The purpose of this notebook is to explore a simple regression tutorial with the [autokeras](https://autokeras.com/tutorial/structured_data_regression/) library.

# Data

This example uses the California housing data, a simple regression task, loaded with the sklearn data loader. I'll split off 20% as a test set that `autokeras` won't see during model development.

In [10]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
import pandas as pd

housing = fetch_california_housing()

X_train, X_test, y_train, y_test = train_test_split(
    housing.data, housing.target, test_size=0.2
)

pd.DataFrame(X_train, columns=housing.feature_names).describe()


| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| count | 16512.0 | 16512.0 | 16512.0 | 16512.0 | 16512.0 | 16512.0 | 16512.0 | 16512.0 |
| mean | 3.866642 | 28.600291 | 5.429939 | 1.096732 | 1424.819828 | 2.999528 | 35.630778 | -119.571148 |
| std | 1.888185 | 12.59648 | 2.564286 | 0.496689 | 1146.312518 | 4.46124 | 2.139404 | 2.006724 |
| min | 0.4999 | 1.0 | 0.846154 | 0.333333 | 3.0 | 0.692308 | 32.54 | -124.35 |
| 25% | 2.5673 | 18.0 | 4.438304 | 1.006231 | 788.0 | 2.436452 | 33.93 | -121.8 |
| 50% | 3.5417 | 29.0 | 5.233582 | 1.04878 | 1162.5 | 2.820659 | 34.25 | -118.5 |
| 75% | 4.739025 | 37.0 | 6.051005 | 1.098941 | 1724.0 | 3.286539 | 37.71 | -118.0 |
| max | 15.0001 | 52.0 | 141.909091 | 34.066667 | 35682.0 | 502.461538 | 41.95 | -114.31 |
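
As a quick sanity check on the split, the `count` row can be reproduced by hand. This is only a sketch: it assumes the full dataset has 20,640 rows (the size documented for `fetch_california_housing`) and that a fractional `test_size` is rounded up to a whole number of rows, which is how I understand `train_test_split` to behave. The variable names are my own.

```python
import math

n_total = 20640   # assumption: documented row count of the California housing data
test_size = 0.2   # same fraction passed to train_test_split

# Assumption: the test set size is the fraction of rows, rounded up
n_test = math.ceil(test_size * n_total)
n_train = n_total - n_test

print("train rows:", n_train)
print("test rows:", n_test)
```

Under those assumptions the training set has 16512 rows, which matches the `count` row of the summary.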


# StructuredDataRegressor

Now I'll use the `StructuredDataRegressor` class to automatically build the model without any architectural choices from me.

## Define and Fit

Below I'll simply point `autokeras` to a directory for its log files, give it a maximum number of search trials, and let it commence the search. You don't need to specify the number of epochs, as `autokeras` adapts the training length to the task when `epochs` is left unset.

In [11]:
import autokeras as ak

automl = ak.StructuredDataRegressor(
    overwrite=True, max_trials=20, directory="./ak-logs/intro-ca/"
)

automl.fit(X_train, y_train)


Trial 20 Complete [00h 00m 52s]
val_loss: 0.4374975860118866

Best val_loss So Far: 0.42246752977371216
Total elapsed time: 00h 19m 39s
INFO:tensorflow:Oracle triggered exit





Epoch 1/69
...
Epoch 59/69


<tensorflow.python.keras.callbacks.History at 0x7fd7341f39e8>

## Best Model

Now I'll export the best model and save it to disk before exploring and evaluating it.

In [12]:
mod = automl.export_model()
mod.save("ak-mods/intro-ca")


INFO:tensorflow:Assets written to: ak-mods/intro-ca/assets


In [15]:
mod.summary()


Model: "functional_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         [(None, 8)]               0         
_________________________________________________________________
multi_category_encoding (Mul (None, 8)                 0         
_________________________________________________________________
normalization (Normalization (None, 8)                 17        
_________________________________________________________________
dense (Dense)                (None, 256)               2304      
_________________________________________________________________
re_lu (ReLU)                 (None, 256)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 64)                16448     
_________________________________________________________________
re_lu_1 (ReLU)               (None, 64)               
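
The `Param #` column can be checked by hand, which is a handy way to read the architecture the search settled on. The sketch below assumes each dense layer holds one weight per input-unit pair plus one bias per unit, and that the normalization layer stores a mean and a variance per feature plus a single sample counter. The helper `dense_params` is my own name and not part of `autokeras`.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units


n_features = 8  # output shape of the input layer in the summary

first_dense = dense_params(n_features, 256)   # dense: 8 -> 256
second_dense = dense_params(256, 64)          # dense_1: 256 -> 64

# Assumption: per-feature mean and variance, plus one sample counter
normalization = 2 * n_features + 1

print("dense:", first_dense)
print("dense_1:", second_dense)
print("normalization:", normalization)
```

These come out to 2304, 16448, and 17, agreeing with the rows of the summary that are shown.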

## Test Model

In [16]:
from sklearn.metrics import mean_squared_error

train_predictions = mod.predict(X_train)
print("Train MSE:", mean_squared_error(y_train, train_predictions))
test_predictions = mod.predict(X_test)
print("Test MSE:", mean_squared_error(y_test, test_predictions))


Train MSE: 0.3346768740720552
Test MSE: 0.396398391449601
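
MSE is in squared target units, so it can be easier to read as a root mean squared error. The sketch below converts the two values printed above, assuming the target is the median house value in units of $100,000, as the scikit-learn dataset description states. The constant `UNIT_DOLLARS` and the helper `rmse_in_dollars` are names I made up for illustration.

```python
import math

# Values copied from the output above
train_mse = 0.3346768740720552
test_mse = 0.396398391449601

UNIT_DOLLARS = 100_000  # assumption: target is in units of $100,000


def rmse_in_dollars(mse: float, unit: int = UNIT_DOLLARS) -> float:
    """Square root of the MSE, rescaled from target units to dollars."""
    return math.sqrt(mse) * unit


for name, mse in [("Train", train_mse), ("Test", test_mse)]:
    print(f"{name} RMSE: {math.sqrt(mse):.3f} (~${rmse_in_dollars(mse):,.0f})")
```

Under that assumption the test RMSE is about 0.63, or roughly $63,000 per district, against about $58,000 on the training data.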
