# Linear regression tutorial

WIP: adapt the BQML penguin weight tutorial to BigFrames https://cloud.google.com/bigquery-ml/docs/linear-regression-tutorial

This is an exploration of how a minimal combination of BQML and scikit-learn style might work.

In [1]:
import bigframes

session = bigframes.connect()

Let's load the table containing our source data.

In [2]:
df = session.read_gbq("bigquery-public-data.ml_datasets.penguins")
df.head()

We want to predict body_mass_g, but some measurements are missing. Let's remove those rows.

In [None]:
# Keep only rows where the label column is present (pandas-style notnull)
df = df[df['body_mass_g'].notnull()]
df.head()
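
For intuition, the filter above builds a boolean mask over the rows and keeps only those where body_mass_g is present. A minimal local sketch of the same idea in plain Python, using made-up rows in place of the real table (the species labels and values are illustrative, not taken from the dataset):

```python
# Toy rows standing in for the penguins table; values are illustrative only.
rows = [
    {"species": "A", "body_mass_g": 3750.0},
    {"species": "B", "body_mass_g": None},  # missing measurement
    {"species": "C", "body_mass_g": 5000.0},
]

# Build a boolean mask (one entry per row), then keep rows where it is True.
mask = [row["body_mass_g"] is not None for row in rows]
kept = [row for row, keep in zip(rows, mask) if keep]

print(len(kept), "of", len(rows), "rows kept")
```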

Great! Now let's configure a linear regression model to predict body mass from the other columns.

In [None]:
import bigframes.ml as ml

model = ml.LinearRegression()
model

The model is just an empty configuration at the moment; it won't create anything in BigQuery until we fit it to some training data.

In [None]:
# Select several columns with a list (double brackets), as in pandas
train_x = df[['species', 'island', 'culmen_length_mm', 'culmen_depth_mm', 'flipper_length_mm', 'sex']]
train_y = df['body_mass_g']
model.fit(train_x, train_y)
model
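
For comparison, the BQML tutorial that this notebook adapts trains the same kind of model with a `CREATE MODEL` statement. The sketch below only builds that SQL as a string so the mapping from the fit call to BQML is visible. The helper name `render_create_model` and the model name `bqml_tutorial.penguins_model` are assumptions for illustration, and this is not necessarily the statement BigFrames issues.

```python
def render_create_model(model_name, source_table, label_col, feature_cols):
    """Return a BQML CREATE MODEL statement for a linear regression.

    Hypothetical helper for illustration; it only formats a string.
    """
    select_cols = ",\n  ".join(feature_cols + [label_col])
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS (model_type = 'linear_reg', input_label_cols = ['{label_col}']) AS\n"
        f"SELECT\n  {select_cols}\n"
        f"FROM `{source_table}`\n"
        f"WHERE {label_col} IS NOT NULL"
    )


features = ["species", "island", "culmen_length_mm",
            "culmen_depth_mm", "flipper_length_mm", "sex"]
sql = render_create_model(
    "bqml_tutorial.penguins_model",  # assumed dataset and model name
    "bigquery-public-data.ml_datasets.penguins",
    "body_mass_g",
    features,
)
print(sql)
```

The `WHERE` clause plays the same role as the missing-value filter applied to the DataFrame earlier.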

BigQuery automatically managed the training data split and model evaluation for us. Let's see how well the model performed.

In [None]:
model.evaluate()
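
As an aside, here is a small local sketch of what a data split and evaluation involve. It holds out part of a toy dataset, fits a one-feature least-squares line on the rest, and scores the held-out rows with mean absolute error and R², two of the metrics BQML reports for regression models. The data, the every-fifth-row split, and the helper names are assumptions for illustration; BigQuery chooses its own split strategy.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def score(ys_true, ys_pred):
    """Return (mean_absolute_error, r2_score) for paired values."""
    n = len(ys_true)
    mae = sum(abs(t - p) for t, p in zip(ys_true, ys_pred)) / n
    mean_y = sum(ys_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(ys_true, ys_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in ys_true)
    return mae, 1 - ss_res / ss_tot


# Toy data: flipper length (mm) vs body mass (g); made-up numbers.
flipper = [180, 185, 190, 195, 200, 205, 210, 215, 220, 225]
mass = [3400, 3600, 3700, 3950, 4100, 4300, 4600, 4800, 5100, 5300]

# Deterministic split: every fifth row is held out for evaluation.
eval_idx = [i for i in range(len(flipper)) if i % 5 == 4]
train_idx = [i for i in range(len(flipper)) if i % 5 != 4]

slope, intercept = fit_line([flipper[i] for i in train_idx],
                            [mass[i] for i in train_idx])
predictions = [slope * flipper[i] + intercept for i in eval_idx]
mae, r2 = score([mass[i] for i in eval_idx], predictions)
print(f"MAE={mae:.1f} g, R2={r2:.3f}")
```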

Great! The BigQuery model works well. Because we created it without a name, it is only a temporary model that will disappear after 24 hours.

This approach looks promising, so let's fit the model again, this time specifying a name so that the fitted model is saved.

In [None]:
model.set_params(name="projectid.datasetid.modelname")
model.fit(train_x, train_y)

model

We can now use this model anywhere in BigQuery by this name. We can also load it again in our BigFrames session and evaluate it or run inference with it, without needing to retrain it:

In [None]:
model = session.read_gbq("projectid.datasetid.modelname")

model

And of course we can retrain it:

In [None]:
model.fit(train_x, train_y)

We want to productionize this model, so let's start publishing it to the Vertex AI Model Registry ([prerequisites](https://cloud.google.com/bigquery-ml/docs/managing-models-vertex#prerequisites)).

Note that while we can load models from BigQuery and change their parameters, changes are only persisted when we run .fit().

In [None]:
model.set_params(
    registry="vertex_ai",
    vertex_ai_model_version_aliases=["experimental"])
model.fit(train_x, train_y)

Now when we fit the model, we can see it published here: https://pantheon.corp.google.com/vertex-ai/models
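
For reference, BQML SQL exposes the registry settings as `CREATE MODEL` options named `model_registry` and `vertex_ai_model_version_aliases`. The sketch below renders an `OPTIONS (...)` clause from a Python dict to show how the parameters set above might map onto it. The helper `render_options` is hypothetical, and whether BigFrames performs this exact translation is an assumption.

```python
def render_options(params):
    """Render a dict as a BQML OPTIONS clause (hypothetical helper).

    Strings become single-quoted literals and lists become arrays of
    single-quoted literals; other types are not handled in this sketch.
    """
    parts = []
    for key, value in params.items():
        if isinstance(value, list):
            rendered = "[" + ", ".join(f"'{v}'" for v in value) + "]"
        else:
            rendered = f"'{value}'"
        parts.append(f"{key} = {rendered}")
    return "OPTIONS (" + ", ".join(parts) + ")"


clause = render_options({
    "model_type": "linear_reg",
    "input_label_cols": ["body_mass_g"],
    "model_registry": "vertex_ai",
    "vertex_ai_model_version_aliases": ["experimental"],
})
print(clause)
```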