# Part 3: Optimising mixtures

Our aim was to predict mixtures that lead to optimal compressive strengths, to guide experimentation. We now have a model that predicts compressive strength from mixture proportions, and the final step is to use an optimiser to find the mixture that produces the strongest concrete.

We will use the [SciPy library](https://www.scipy.org/), which contains an optimiser for this purpose. First we define a cost function to minimise, and then we run the optimiser from different starting points to find our candidate mixtures. Using different starting points is important, as a local optimiser gives no guarantee of finding the global minimum.
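The multi-start idea can be sketched on a toy one-dimensional function with two local minima. This is a minimal illustration, not the concrete model: `toy_objective` and the five starting points are our own assumptions. Each run of `scipy.optimize.minimize` with `method="Nelder-Mead"` only finds the minimum nearest its start, and we keep the best of all runs.

```python
import numpy as np
from scipy.optimize import minimize

def toy_objective(x):
    # Two local minima: near x = -1.30 (global) and near x = 1.13 (local only)
    return x[0] ** 4 - 3.0 * x[0] ** 2 + x[0]

# Hypothetical starting points spread across the search range
starting_points = np.linspace(-2.0, 2.0, 5)

# One local optimisation per starting point
results = [
    minimize(toy_objective, np.array([s]), method="Nelder-Mead")
    for s in starting_points
]

# Keep the run with the lowest objective value
best = min(results, key=lambda r: r.fun)

for s, r in zip(starting_points, results):
    print(f"start {s:+.1f} -> x = {r.x[0]:+.3f}, f = {r.fun:.3f}")
print("best x:", best.x[0])
```

Runs started on the right-hand side settle in the shallower minimum near 1.13, which is exactly why a single starting point can be misleading.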

In [None]:
import pandas as pd
import numpy as np
import pickle
from scipy.optimize import minimize

In [None]:
# Load the data and both of our models. We will focus on the GBR model,
# as it makes predictions faster.
# You can investigate the RF model in your own time.

test = pd.read_csv("processed/test_proportion.csv")

with open("/home/phil/Desktop/gradnet/trained/rf_model.pkl", "rb") as fh:
    rf_model = pickle.load(fh)

with open("/home/phil/Desktop/gradnet/trained/gbr_model.pkl", "rb") as fh:
    gbr = pickle.load(fh)

### We will look at 28-day-old samples

Age is one of the model's features, but it is not part of a mixture; it is included because concrete takes time to cure properly. As we have many observations for 28-day-old samples, we will limit our investigation to these.

In [None]:
mixture_ingredients = ['Cement','BlastFurnaceSlag','FlyAsh','Water',
                       'Superplasticizer','CoarseAggregate','FineAggregate','Age']

test_age_28 = test.loc[test.Age == 28, mixture_ingredients]

### Define a function to optimise

The optimiser needs a cost ('objective') function to work with. It will take this function and some starting values, and try to find a minimum.
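As a minimal sketch of the calling convention, `minimize` receives the function, a starting vector `x0`, and any extra fixed arguments through `args`, which must be a tuple. Those extras are passed after `x` on every call, the same way the cost function below receives `model`. The `squared_distance` function and `target` values here are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def squared_distance(x, target):
    # x is varied by the optimiser; target is a fixed extra argument from `args`
    return float(np.sum((x - target) ** 2))

target = np.array([1.0, -0.5])

# Note the trailing comma: args=(target,) is a one-element tuple
res = minimize(squared_distance, x0=np.zeros(2), args=(target,), method="Nelder-Mead")

print(res.x, res.fun)
```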

The cost function takes in n-1 of our mixture proportions and infers the last, which ensures the proportions sum exactly to 1. We then append the age (28) to the feature vector, feed it into our model, and generate a prediction. Starting from a given point, the optimiser explores mixtures to try to find an optimal value, i.e. the mixture proportions which yield the greatest compressive strength.
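The n-1 parametrisation can be sketched on its own. This helper is hypothetical (the notebook does the same work inline) and assumes the feature order `Cement, BlastFurnaceSlag, FlyAsh, Water, Superplasticizer, CoarseAggregate, FineAggregate, Age`; the six proportions below are made up for illustration.

```python
import numpy as np

AGE_DAYS = 28.0

def complete_mixture(partial, age=AGE_DAYS):
    """Hypothetical helper: infer FineAggregate as 1 - sum(partial), then append age."""
    partial = np.asarray(partial, dtype=float)
    fine_aggregate = 1.0 - partial.sum()
    return np.append(partial, [fine_aggregate, age])

# Six made-up proportions: Cement ... CoarseAggregate
partial = np.array([0.15, 0.05, 0.0, 0.08, 0.005, 0.40])
features = complete_mixture(partial)

print(features.shape)       # (8,)
print(features[:7].sum())   # ≈ 1.0, up to floating-point rounding
```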

As the optimiser is searching for a minimum, we return the negated compressive strength.
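The negation trick in isolation: maximising f is the same as minimising -f, and the maximum value is recovered as `-res.fun`. The concave `strength_like` function is a toy stand-in, not the trained model.

```python
import numpy as np
from scipy.optimize import minimize

def strength_like(x):
    # Toy concave function: maximum value 5.0 at x = 2.0
    return 5.0 - (x[0] - 2.0) ** 2

# Minimise the negated function to find the maximum
res = minimize(lambda x: -strength_like(x), x0=np.array([0.0]), method="Nelder-Mead")

print("argmax:", res.x[0], "max value:", -res.fun)
```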

In [None]:
# assume numpy array
# x[0] = Cement
# x[1] = BlastFurnaceSlag
# x[2] = FlyAsh
# x[3] = Water
# x[4] = Superplasticizer
# x[5] = CoarseAggregate

def cost_function(x, model):
    
    # Check the mixture proportions are in the valid range [0, 1];
    # 'explode' the cost function if any bound is violated
    if(x[0] < 0.0 or x[0] > 1.0): return(10**38)
    if(x[1] < 0.0 or x[1] > 1.0): return(10**38)
    if(x[2] < 0.0 or x[2] > 1.0): return(10**38)
    if(x[3] < 0.0 or x[3] > 1.0): return(10**38)
    if(x[4] < 0.0 or x[4] > 1.0): return(10**38)
    if(x[5] < 0.0 or x[5] > 1.0): return(10**38)
    
    # x[6] = FineAggregate, the proportion is 1 - the rest
    x = np.append(x, (1 - np.sum(x)))
    
    # the inferred FineAggregate proportion must not be negative
    if(x[6] < 0.0): return(10**38)
    
    # if unrealistic amount of water
    if(x[3] < 0.05): return(10**38)

    # add age = 28
    x = np.append(x, 28.0)
    
    # return the prediction from the model
    # negate as we want the largest compressive strength, 
    # and the optimiser will minimise
    return -model.predict(x[None, :])[0]
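The same penalty logic can be written more compactly with a vectorised bounds check, and tested without the pickled models by passing any object that has a `predict` method. This is a sketch under assumptions: `LinearStubModel` and its weights are hypothetical stand-ins for the GBR, and `PENALTY` is a float version of the `10**38` used above. It includes a check that the inferred FineAggregate proportion is non-negative.

```python
import numpy as np

PENALTY = 1e38  # float version of the notebook's 10**38 'explode' value

class LinearStubModel:
    """Hypothetical stand-in for the trained GBR; any object with .predict(X) works."""

    def __init__(self, weights):
        self.weights = np.asarray(weights, dtype=float)

    def predict(self, X):
        # X has shape (n_samples, 8): seven proportions followed by age
        return np.asarray(X, dtype=float) @ self.weights


def cost_function_vectorised(x, model, age=28.0, min_water=0.05):
    x = np.asarray(x, dtype=float)

    # All six free proportions must lie in [0, 1]
    if np.any((x < 0.0) | (x > 1.0)):
        return PENALTY

    # FineAggregate is inferred so that the seven proportions sum to 1
    fine_aggregate = 1.0 - x.sum()
    if fine_aggregate < 0.0:
        return PENALTY

    # x[3] is Water; reject unrealistically dry mixtures
    if x[3] < min_water:
        return PENALTY

    features = np.append(x, [fine_aggregate, age])
    # Negate so that minimising the cost maximises predicted strength
    return -model.predict(features[None, :])[0]


stub = LinearStubModel([100, 20, 10, 0, 50, 5, 5, 0.1])
valid = np.array([0.15, 0.05, 0.0, 0.08, 0.005, 0.40])
too_much = np.array([0.5, 0.3, 0.1, 0.2, 0.0, 0.1])  # sums to 1.2

print(cost_function_vectorised(valid, stub))     # ≈ -22.625
print(cost_function_vectorised(too_much, stub))  # 1e+38
```

Swapping the stub for `gbr` should give the same interface as `cost_function`, since both only call `model.predict` on a single-row array.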

### Mixture optimisation


Our strategy is to use each observation in the test set as a starting point (as we know these are realistic mixtures) and optimise from there.

Our cost function is not smooth, so we need to be careful about which method we use. Nelder-Mead does not use derivatives, so it should perform well here.

Not every starting value leads to a good solution, so we run an optimisation for each starting value and keep the mixtures which we predict will give the best results.
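Tree ensembles such as the GBR produce piecewise-constant predictions, so their gradient is zero almost everywhere. A toy illustration of why that matters: `quantised_bowl` is our own invention, a quadratic bowl rounded to steps of 0.1, not the concrete model. BFGS, which estimates gradients by finite differences, sees a flat surface and stops at the start; Nelder-Mead compares function values at the simplex vertices and keeps moving.

```python
import numpy as np
from scipy.optimize import minimize

def quantised_bowl(x):
    # Toy piecewise-constant objective: bowl centred at (1, -2), rounded to steps of 0.1
    return np.round(10.0 * ((x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2)) / 10.0

x0 = np.array([3.0, 3.0])

# Finite-difference gradients are zero on a flat step, so BFGS stops immediately
bfgs = minimize(quantised_bowl, x0, method="BFGS")

# Nelder-Mead only needs function values, so it can step across plateaus
nm = minimize(quantised_bowl, x0, method="Nelder-Mead")

print("BFGS:        x =", bfgs.x, "f =", bfgs.fun)
print("Nelder-Mead: x =", nm.x, "f =", nm.fun)
```

Nelder-Mead can still stall when its whole simplex lands on one plateau, which is one more reason to run it from many starting points.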

In [None]:
cols = ['Cement','BlastFurnaceSlag','FlyAsh','Water','Superplasticizer','CoarseAggregate']
index_vals = test_age_28.index.values

# one row per starting point: 7 proportions, age and predicted strength
res = np.empty((len(index_vals), 9))

for i, ix in enumerate(index_vals):
    
    # current starting values
    curr_values = test_age_28.loc[ix, cols].to_numpy(dtype=float)
    
    # perform the optimisation; args must be a tuple, hence the trailing comma
    opt = minimize(cost_function, curr_values, args=(gbr,), method="Nelder-Mead", options={'maxiter': 5000})
    
    if i % 25 == 0:
        print("===============")
        print("starting sample", i)
        print("compressive strength:",  -opt.fun)
        print("mixture:", opt.x)
        print()
    
    # store our mixtures and prediction of compressive strength
    res[i,:6] = opt.x
    res[i, 6] = (1.0 - opt.x.sum())
    res[i, 7] = 28.0
    res[i, 8] = -opt.fun
      

In [None]:
# let's pull out our best predictions
(
    pd.DataFrame(res, columns = mixture_ingredients + ['CompressiveStrength'])
        .sort_values(by = 'CompressiveStrength', ascending = False)
).head()
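Several starting points may converge to practically the same mixture, so the top rows can be near-duplicates. A minimal sketch of collapsing them with pandas, assuming that rounding the proportions to 3 decimal places is an acceptable tolerance (a hypothetical choice); the small `results` frame is made-up data with only two proportion columns.

```python
import pandas as pd

# Made-up optimiser output: the first two rows are essentially the same mixture
results = pd.DataFrame({
    "Cement":              [0.2001, 0.2002, 0.1500],
    "Water":               [0.0800, 0.0799, 0.0900],
    "CompressiveStrength": [71.3, 71.2, 65.0],
})

proportion_cols = ["Cement", "Water"]

# Rank by predicted strength, then keep the first (strongest) row per rounded mixture
ranked = results.sort_values("CompressiveStrength", ascending=False)
keys = ranked[proportion_cols].round(3)
distinct = ranked.loc[~keys.duplicated()]

print(distinct)
```

Rounding is a blunt tolerance: two mixtures either side of a rounding boundary will still count as distinct.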

# Summary

We have achieved our objective: we have produced a selection of mixes we can propose for further experimentation.

To expand this work, we could include other considerations, such as financial cost, by enhancing the cost function with a desirability function. For instance, if cement is very expensive, the mix with the fourth-strongest prediction would be prohibitive.
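One common form of desirability function maps each criterion onto [0, 1] and combines them with a geometric mean, so a single unacceptable criterion rules out the whole mix. A sketch under assumptions: the strength range (20 to 80) and cost range (0 to 100) are hypothetical, and no real price data is implied.

```python
import numpy as np

def desirability_larger(value, low, high):
    """Larger-is-better: 0 at or below `low`, 1 at or above `high`, linear between."""
    return float(np.clip((value - low) / (high - low), 0.0, 1.0))

def desirability_smaller(value, low, high):
    """Smaller-is-better: 1 at or below `low`, 0 at or above `high`, linear between."""
    return float(np.clip((high - value) / (high - low), 0.0, 1.0))

def overall_desirability(strength, cost,
                         strength_range=(20.0, 80.0),  # hypothetical acceptable strengths
                         cost_range=(0.0, 100.0)):     # hypothetical acceptable costs
    d_strength = desirability_larger(strength, *strength_range)
    d_cost = desirability_smaller(cost, *cost_range)
    # Geometric mean: if either desirability is 0, the overall score is 0
    return float(np.sqrt(d_strength * d_cost))

print(overall_desirability(60.0, 50.0))  # ≈ 0.577
```

In the optimiser, the cost function would then return `-overall_desirability(predicted_strength, mixture_cost)` instead of the negated strength, where `mixture_cost` would come from per-ingredient prices we do not have here.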

