# Purpose

This notebook works through a simple regression tutorial using the [autokeras](https://autokeras.com/tutorial/structured_data_regression/) library.

# Data

The California housing dataset will be used for this example, which is a simple regression task. It will be loaded with the sklearn data loader `fetch_california_housing`. I'll split off 20% into a test set, which `autokeras` won't see during model development.

In [None]:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
import pandas as pd

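# Load the dataset (features in `housing.data`, target in `housing.target`)
housing = fetch_california_housing()

# Hold out 20% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(
    housing.data, housing.target, test_size=0.2
)

# Summary statistics for the training features
pd.DataFrame(X_train, columns=housing.feature_names).describe()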
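
To make the holdout idea concrete, here is a minimal pure-Python sketch of a seeded 80/20 split. It is not how `train_test_split` is implemented; the helper name `holdout_split`, its parameters, and the toy rows are all hypothetical and only illustrate that a fixed seed yields the same split on every run.

```python
import random


def holdout_split(rows, test_fraction=0.2, seed=0):
    """Shuffle row indices with a fixed seed and hold out a fraction for testing.

    Hypothetical helper for illustration only; not sklearn's implementation.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    indices = list(range(len(rows)))
    # A dedicated Random instance keeps the shuffle reproducible for a given seed
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


# Toy stand-in for the dataset rows (not the housing data)
toy_rows = list(range(10))
train_rows, test_rows = holdout_split(toy_rows, test_fraction=0.2, seed=0)
print(len(train_rows), len(test_rows))
```

With `train_test_split` itself, passing a fixed `random_state` plays the same role as `seed` here.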

# StructuredDataRegressor

Now I'll use the `StructuredDataRegressor` class to automatically build the model without any architectural choices from me.

## Define and Fit

Below I'll simply point `autokeras` to a directory to save log files, give it a maximum number of search trials, and let it commence a search. The number of epochs doesn't need to be specified: when it is left unset, `autokeras` relies on early stopping to decide when each trial ends.

In [None]:
import autokeras as ak

automl = ak.StructuredDataRegressor(
    overwrite=True, max_trials=20, directory="./ak-logs/intro-ca/"
)

automl.fit(X_train, y_train)


## Best Model

Now I'll export the best model and save it to hard disk before exploring and evaluating.

In [None]:
mod = automl.export_model()
mod.save('ak-mods/intro-ca')

## Test Model

In [None]:
from sklearn.metrics import mean_squared_error

# Predict with the fitted AutoKeras regressor (`automl`, defined above)
train_predictions = automl.predict(X_train)
print("Train MSE:", mean_squared_error(y_train, train_predictions))
test_predictions = automl.predict(X_test)
print("Test MSE:", mean_squared_error(y_test, test_predictions))
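
An MSE value is easier to read next to a trivial baseline, such as a constant model that always predicts the mean of the training targets. The sketch below shows that comparison in plain Python. The helper names `mse` and `baseline_mse` are hypothetical, and the numbers are made-up toy values, not results from this notebook.

```python
from math import sqrt


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def baseline_mse(y_train, y_eval):
    """MSE of a constant model that always predicts the mean of y_train."""
    mean_target = sum(y_train) / len(y_train)
    return mse(y_eval, [mean_target] * len(y_eval))


# Made-up toy values (NOT results from this notebook)
toy_y_train = [1.0, 2.0, 3.0]
toy_y_test = [1.0, 3.0]
toy_predictions = [1.5, 2.5]

model_mse = mse(toy_y_test, toy_predictions)
print("Model MSE:", model_mse)
print("Model RMSE:", sqrt(model_mse))
print("Baseline MSE:", baseline_mse(toy_y_train, toy_y_test))
```

A model is only doing useful work if its test MSE sits clearly below the baseline. The RMSE is in the same units as the target, which often makes it easier to interpret than the MSE.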
