# Purpose

The purpose of this notebook is to explore a simple regression tutorial with the [autokeras](https://autokeras.com/tutorial/structured_data_regression/) library.

# Data

The California housing dataset will be used for this example, which is a simple regression task. It is loaded with scikit-learn's `fetch_california_housing` loader. I'll split off 20% as a test set that `autokeras` won't see during model development.

In [2]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
import pandas as pd

housing = fetch_california_housing()

X_train, X_test, y_train, y_test = train_test_split(
    housing.data, housing.target, test_size=0.2
)

pd.DataFrame(X_train, columns=housing.feature_names).describe()

| | MedInc | HouseAge | AveRooms | AveBedrms | Population | AveOccup | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|
| count | 16512.0 | 16512.0 | 16512.0 | 16512.0 | 16512.0 | 16512.0 | 16512.0 | 16512.0 |
| mean | 3.866568 | 28.61961 | 5.439112 | 1.098945 | 1425.32007 | 3.073659 | 35.635828 | -119.569023 |
| std | 1.896199 | 12.586441 | 2.598831 | 0.502874 | 1109.155806 | 10.927191 | 2.145076 | 2.009436 |
| min | 0.4999 | 1.0 | 0.846154 | 0.333333 | 3.0 | 0.692308 | 32.54 | -124.35 |
| 25% | 2.5603 | 18.0 | 4.43814 | 1.006436 | 786.0 | 2.428851 | 33.93 | -121.8 |
| 50% | 3.5388 | 29.0 | 5.234189 | 1.048695 | 1167.0 | 2.817937 | 34.26 | -118.49 |
| 75% | 4.744 | 37.0 | 6.053664 | 1.099455 | 1731.0 | 3.286539 | 37.72 | -118.0 |
| max | 15.0001 | 52.0 | 141.909091 | 34.066667 | 28566.0 | 1243.333333 | 41.95 | -114.31 |
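
As a quick sanity check on the `count` row, the arithmetic behind the 80/20 split can be sketched in plain Python. This assumes the full dataset has 20,640 rows (the size scikit-learn documents for California housing) and that the test size is rounded up, which is my understanding of how `train_test_split` handles a fractional `test_size`. The helper name `split_sizes` is hypothetical and only for illustration.

```python
import math


def split_sizes(n_samples: int, test_percent: int) -> tuple[int, int]:
    """Return (n_train, n_test) for a percentage hold-out split.

    Assumption: the test size is rounded up and the remainder goes to training.
    """
    n_test = math.ceil(n_samples * test_percent / 100)
    n_train = n_samples - n_test
    return n_train, n_test


# Assumption: 20,640 rows in the full California housing dataset.
n_train, n_test = split_sizes(20640, 20)
print(n_train, n_test)  # 16512 4128
```

The 16,512 training rows match the `count` row, which leaves 4,128 rows held out for the final test.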


# StructuredDataRegressor

Now I'll use the `StructuredDataRegressor` class to automatically build the model without any architectural choices from me.
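
To make the idea of a trial budget concrete: a search like this tries a series of candidate models and keeps the one with the lowest validation loss. The loop below is only a toy illustration of that idea, using random sampling over a made-up search space and a made-up scoring function. It is not AutoKeras's actual search strategy, and every name in it (`SEARCH_SPACE`, `fake_val_loss`, `run_search`) is hypothetical.

```python
import random

# Hypothetical search space; AutoKeras defines its own internally.
SEARCH_SPACE = {
    "num_layers": [1, 2, 3],
    "units": [16, 32, 64, 128],
    "dropout": [0.0, 0.25, 0.5],
}


def fake_val_loss(config, rng):
    """Stand-in for training a model and measuring its validation loss."""
    base = 1.0 / (config["num_layers"] * config["units"]) ** 0.5
    return base + config["dropout"] * 0.1 + rng.uniform(0.0, 0.05)


def run_search(max_trials, seed=0):
    """Sample max_trials configurations and keep the one with the lowest loss."""
    rng = random.Random(seed)
    best_config, best_loss = None, float("inf")
    for trial in range(1, max_trials + 1):
        config = {name: rng.choice(values) for name, values in SEARCH_SPACE.items()}
        loss = fake_val_loss(config, rng)
        if loss < best_loss:
            best_config, best_loss = config, loss
        print(f"Trial {trial}: val_loss={loss:.4f} (best so far: {best_loss:.4f})")
    return best_config, best_loss


best_config, best_loss = run_search(max_trials=20)
```

The printed lines mimic the "val_loss" and "best so far" bookkeeping that a tuner reports after each trial; a real search trains an actual model in place of `fake_val_loss`.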


In [3]:
import autokeras as ak

reg = ak.StructuredDataRegressor(
    overwrite=True, max_trials=20, directory="./ak-logs/intro-ca/"
)

reg.fit(X_train, y_train)


Trial 20 Complete [00h 00m 10s]
val_loss: 0.6868752241134644

Best val_loss So Far: 0.27613016963005066
Total elapsed time: 00h 38m 14s
INFO:tensorflow:Oracle triggered exit


To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.

Epoch 1/144
...
Epoch 41/144
...

<tensorflow.python.keras.callbacks.History at 0x7f13d2f3b1d0>
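
The search log above gives enough to estimate how long a typical trial took. The sketch below parses the two durations from the log and averages the total over 20 trials. It assumes all 20 trials ran (the log ends with "Trial 20 Complete") and that the total elapsed time covers only the trials; `parse_elapsed` is a hypothetical helper, not part of AutoKeras.

```python
import re


def parse_elapsed(text: str) -> int:
    """Convert a duration string like '00h 38m 14s' to seconds."""
    match = re.fullmatch(r"(\d+)h (\d+)m (\d+)s", text.strip())
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


total_seconds = parse_elapsed("00h 38m 14s")       # total search time from the log
last_trial_seconds = parse_elapsed("00h 00m 10s")  # duration of trial 20
mean_seconds = total_seconds / 20                   # 20 trials were requested

print(f"total: {total_seconds} s")
print(f"mean per trial: {mean_seconds:.1f} s")
print(f"last trial: {last_trial_seconds} s")
```

The mean comes out near two minutes per trial while the last trial took only 10 seconds, so trial durations varied widely across the search.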

## Test Model

In [4]:
from sklearn.metrics import mean_squared_error

train_predictions = reg.predict(X_train)
print("Train MSE:", mean_squared_error(y_train, train_predictions))
test_predictions = reg.predict(X_test)
print("Test MSE:", mean_squared_error(y_test, test_predictions))


Train MSE: 0.24980264982787825
Test MSE: 0.2672614839319725
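
MSE is in squared target units, which makes it hard to read directly. A minimal sketch of converting the two numbers above to RMSE follows, assuming the target is the median house value in units of $100,000, as scikit-learn documents for this dataset. The constant name `DOLLARS_PER_UNIT` is my own.

```python
import math

# MSE values copied from the output above.
train_mse = 0.24980264982787825
test_mse = 0.2672614839319725

# Assumption: the target is expressed in units of $100,000.
DOLLARS_PER_UNIT = 100_000

for name, mse in [("Train", train_mse), ("Test", test_mse)]:
    rmse = math.sqrt(mse)  # back to the units of the target
    print(f"{name} RMSE: {rmse:.3f} (about ${rmse * DOLLARS_PER_UNIT:,.0f})")

# Relative gap between test and train error.
gap = test_mse / train_mse - 1
print(f"Test MSE is {gap:.1%} higher than train MSE")
```

Under that assumption the typical error is roughly $50,000 on the training set and $52,000 on the test set, and the small gap between the two suggests the selected model is not badly overfit.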
