# Mean Squared Error Example

The following example loads the elastic_tensor_2015 dataset and uses a preset config (debug) to create a MatPipe. The MatPipe is then used to benchmark predictions of the target property K_VRH, and the resulting predictions are used to calculate the mean squared error (MSE).


## Setting up the Dataframe

Use matminer's `load_dataset` function to load a dataset into a pandas dataframe. In this example, we load the elastic_tensor_2015 dataset.

In [None]:
from matminer.datasets.dataset_retrieval import load_dataset

df = load_dataset("elastic_tensor_2015")

Use `get_preset_config` to choose one of the pre-built configurations for a MatPipe. The options include production, default, fast, and debug; details about each config can be found in presets.py. In this example, we use the debug config to keep the runtime short. The config is a dictionary, which we unpack into `MatPipe` as keyword arguments to create a MatPipe object.

In [None]:
from automatminer.presets import get_preset_config
from automatminer.pipeline import MatPipe

debug_config = get_preset_config("debug")
pipe = MatPipe(**debug_config)
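The `**` operator unpacks the config dictionary so that each key becomes a keyword argument of `MatPipe`. The sketch below illustrates that mechanism with a hypothetical stand-in class, `ToyPipe`. The four key names (`autofeaturizer`, `cleaner`, `reducer`, `learner`) are an assumption about what the preset dictionary contains, and the string values are placeholders rather than real automatminer objects.

```python
class ToyPipe:
    """Hypothetical stand-in for MatPipe; it only records its keyword arguments."""

    def __init__(self, autofeaturizer=None, cleaner=None, reducer=None, learner=None):
        self.autofeaturizer = autofeaturizer
        self.cleaner = cleaner
        self.reducer = reducer
        self.learner = learner


# Assumed shape of a preset config: component name -> pipeline component.
# The string values are placeholders, not real automatminer objects.
toy_config = {
    "autofeaturizer": "featurizer-placeholder",
    "cleaner": "cleaner-placeholder",
    "reducer": "reducer-placeholder",
    "learner": "learner-placeholder",
}

# These two calls are equivalent: ** spreads the dict into keyword arguments.
pipe_a = ToyPipe(**toy_config)
pipe_b = ToyPipe(
    autofeaturizer="featurizer-placeholder",
    cleaner="cleaner-placeholder",
    reducer="reducer-placeholder",
    learner="learner-placeholder",
)
print(vars(pipe_a) == vars(pipe_b))  # True
```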

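Automatminer's presets use the pre-defined column names 'composition' and 'structure' to find the composition and structure columns. You can change these by editing your config. Here, we instead rename the dataset's 'formula' column to 'composition' and keep only the composition, structure, and K_VRH columns.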

In [None]:
# Rename "formula" to the expected column name and keep only the inputs and the target
df = df.rename(columns={"formula": "composition"})[["composition", "structure", "K_VRH"]]
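To see what the rename-and-select step does, here is the same pandas call applied to a tiny dataframe. The formulas, placeholder structure strings, and numeric values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for the elastic_tensor_2015 dataframe (values are made up).
toy = pd.DataFrame(
    {
        "formula": ["NaCl", "MgO"],
        "structure": ["<structure 1>", "<structure 2>"],
        "G_VRH": [1.0, 2.0],
        "K_VRH": [10.0, 20.0],
    }
)

# Rename "formula" to the column name the presets expect, then keep only
# the composition, structure, and target columns (G_VRH is dropped).
toy = toy.rename(columns={"formula": "composition"})[["composition", "structure", "K_VRH"]]
print(list(toy.columns))  # ['composition', 'structure', 'K_VRH']
```

Keeping only these three columns likely matters for the benchmark as well: the full dataset includes columns such as `K_Voigt` and `K_Reuss`, and K_VRH is their Voigt-Reuss-Hill average, so leaving them in would probably let the pipeline use near-copies of the target as input features.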

## Benchmarking

In this example, we perform an ML benchmark with MatPipe to see how well AutoML can predict a given target property, here K_VRH. Because the target is known for every row, the benchmark can use 5-fold cross-validation, so every prediction is made by a pipeline that did not see that row during training.

In [None]:
from sklearn.model_selection import KFold

kfold = KFold(n_splits=5)  # 5 folds; each is held out once as the test set
predicted = pipe.benchmark(df, "K_VRH", kfold)
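`KFold(n_splits=5)` divides the rows into five folds; each fold is held out once while the pipeline is fit on the other four, so every row gets exactly one out-of-fold prediction. The pure-Python sketch below mimics how scikit-learn's `KFold` assigns contiguous index blocks when `shuffle=False` (its default). The helper name `kfold_indices` is hypothetical.

```python
def kfold_indices(n_samples, n_splits):
    """Hypothetical helper mirroring unshuffled KFold: yields (train, test) index lists."""
    # Every fold gets n_samples // n_splits rows; the first
    # n_samples % n_splits folds get one extra row.
    fold_sizes = [n_samples // n_splits] * n_splits
    for i in range(n_samples % n_splits):
        fold_sizes[i] += 1

    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(kfold_indices(12, 5))
for train, test in folds:
    print(test)
# [0, 1, 2]
# [3, 4, 5]
# [6, 7]
# [8, 9]
# [10, 11]
```

Because unshuffled folds are contiguous blocks, any ordering in the dataframe carries over into the folds; passing `shuffle=True` (optionally with a fixed `random_state`) to `KFold` avoids that.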

## Calculating MSE

The `predicted` variable is a dataframe with several columns, including the actual K_VRH values (`K_VRH`) and the predicted values (`K_VRH predicted`). We compare these two columns to see how well the benchmark went.
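Depending on the automatminer version, `benchmark` may return a list of dataframes, one per fold, rather than a single dataframe. In that case, the folds can be stacked with `pd.concat` before extracting the columns. The sketch below uses two made-up fold dataframes with the same column names; the numbers are illustrative only.

```python
import pandas as pd

# Made-up per-fold results; the index labels mimic the original row labels,
# which do not overlap between folds.
fold_1 = pd.DataFrame(
    {"K_VRH": [100.0, 150.0], "K_VRH predicted": [110.0, 140.0]}, index=[0, 1]
)
fold_2 = pd.DataFrame({"K_VRH": [200.0], "K_VRH predicted": [190.0]}, index=[2])
folds = [fold_1, fold_2]

# Stack the folds into one dataframe, keeping the original row labels.
predicted = pd.concat(folds)
print(predicted.shape)  # (3, 2)
```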

In [None]:
y_true = predicted["K_VRH"]  # actual values
y_pred = predicted["K_VRH predicted"]  # out-of-fold predictions

Use the `mean_squared_error` function from scikit-learn to calculate the mean squared error between the actual and predicted K_VRH values.

In [None]:
from sklearn.metrics import mean_squared_error

mse = mean_squared_error(y_true, y_pred)
print(mse)
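The mean squared error averages the squared differences between actual and predicted values, MSE = (1/n) Σ (y_i - ŷ_i)². A from-scratch version is a quick sanity check on what `mean_squared_error` computes; the helper `mse` and the values below are made up for illustration. Since K_VRH in this dataset is a bulk modulus in GPa, the MSE is in GPa², and its square root (the RMSE) puts the error back in GPa.

```python
import math


def mse(y_true, y_pred):
    """Mean of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up K_VRH values in GPa, for illustration only.
y_true_toy = [100.0, 150.0, 200.0]
y_pred_toy = [110.0, 140.0, 190.0]

toy_mse = mse(y_true_toy, y_pred_toy)  # (100 + 100 + 100) / 3 = 100.0 GPa^2
toy_rmse = math.sqrt(toy_mse)  # 10.0 GPa
print(toy_mse, toy_rmse)  # 100.0 10.0
```

On the same inputs, `sklearn.metrics.mean_squared_error` should return the same value.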