# 4: FIND GLOBAL MAXIMUM OF COMPOSITE SCORE

Using Bayesian optimization, we search for the feature values that maximize the composite score predicted by the trained model.

In [17]:
import pandas as pd
import optuna
import numpy as np

pd.set_option('display.max_columns', 500)

# Load data
train = pd.read_csv('processed_data/train.csv', index_col=0).dropna()
val = pd.read_csv('processed_data/val.csv', index_col=0).dropna()
test = pd.read_csv('processed_data/test.csv', index_col=0).dropna()

data = pd.concat([train, val, test])
data.head()

tconst,primaryTitle,isAdult,startYear,averageRating,runtimeMinutes,numVotes_log,gross_income_log,composite_score,genres_TE,genres_COUNT,actor_TE,actor_COUNT,actress_TE,actress_COUNT,casting_director_TE,casting_director_COUNT,cinematographer_TE,cinematographer_COUNT,composer_TE,composer_COUNT,director_TE,director_COUNT,editor_TE,editor_COUNT,producer_TE,producer_COUNT,production_designer_TE,production_designer_COUNT,self_TE,self_COUNT,writer_TE,writer_COUNT
tt0097192,Les deux Fragonard,0,1989.0,5.5,112.0,1.623249,0.0,0.227979,0.305057,1,0.30658,7,0.253762,3,0.227979,1,0.316176,1,0.299059,1,0.300059,1,0.323704,1,0.241344,1,0.274476,1,0.294165,1,0.327709,2
tt1588425,Poisoning Paradise: Ecocide New Zealand,0,2009.0,6.5,98.0,1.568202,0.0,0.270635,0.313868,2,0.308833,1,0.309132,1,0.28311,1,0.287034,2,0.273664,1,0.271371,1,0.287034,2,0.26869,1,0.282329,1,0.270635,10,0.283174,1
tt0065656,Dorian Gray,0,1970.0,5.8,93.0,3.107549,0.0,0.290249,0.295848,2,0.36196,4,0.303848,7,0.28311,1,0.341581,1,0.262655,2,0.339354,1,0.307324,2,0.35087,1,0.282329,1,0.294165,1,0.299086,4
tt0193617,"Vzveytes sokoly, orlami!",0,1981.0,6.4,77.0,1.079181,0.0,0.250068,0.305057,1,0.245348,6,0.250068,1,0.28311,1,0.256766,1,0.238734,1,0.255078,1,0.260133,1,0.26869,1,0.242623,1,0.294165,1,0.301126,1
tt8188734,Alta Banu,0,2018.0,6.8,130.0,1.255273,0.0,0.273661,0.305057,1,0.303137,6,0.29536,4,0.28311,1,0.269048,1,0.273664,1,0.278819,1,0.278819,1,0.30237,1,0.282329,1,0.294165,1,0.282628,3


## BASELINE

The maximum found by the optimizer should exceed the highest composite score the model predicts for any row in this dataset. The hope is that there is a **better combination** of feature values that **results in an even higher composite score.**

In [14]:
# Load the pickled file
import pickle

MODEL_FILENAME = './models/2024-05-11 16-49-39_Booster.pkl'

with open(MODEL_FILENAME, 'rb') as file:
    best_model = pickle.load(file)

### Max predicted composite score

In [20]:
preds = best_model.predict(data[best_model.feature_name()])
print("Max predicted composite score: ", preds.max())

Max predicted composite score:  0.8552375029846687


## BAYESIAN OPTIMIZATION/`gp_minimize`

Source: https://towardsdatascience.com/the-beauty-of-bayesian-optimization-explained-in-simple-terms-81f3ee13b10f

The `gp_minimize` function from the Scikit-Optimize (`skopt`) library is a tool for performing Bayesian optimization using Gaussian Processes (GP). Here’s an intuitive breakdown of how it works:

1. **Surrogate Model:** It uses a Gaussian Process (GP) as a surrogate model to approximate the objective function. A GP is well suited to this because it predicts the function’s value at a point and also quantifies how uncertain that prediction is.

2. **Decision Helper:** An acquisition function helps decide where to look next in the function's domain. It balances focusing on areas where the function is predicted to be lowest (exploitation) and exploring areas where the prediction is uncertain (exploration).

3. **Learning Cycle:** Starting with a few random points, it follows a cycle:
* Update the GP model with all known data points.
* Use the acquisition function to pick a new point to check.
* Evaluate the actual function at this new point and update the data.

Repeat this cycle until a stopping condition is met (here, the `n_calls` evaluation budget passed to `gp_minimize`).
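The cycle above can be sketched on a toy problem. This is a minimal illustration and not what `gp_minimize` does internally: it assumes a hypothetical 1-D objective `toy_objective`, a fixed RBF kernel with a hand-picked `length_scale`, a lower-confidence-bound acquisition function, and a fixed grid of candidate points in place of a proper acquisition optimizer.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_new, jitter=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    k_oo = rbf_kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    k_on = rbf_kernel(x_obs, x_new)
    solved = np.linalg.solve(k_oo, k_on)           # K^-1 k*
    mean = solved.T @ y_obs
    var = 1.0 - np.sum(k_on * solved, axis=0)      # prior variance is 1
    return mean, np.sqrt(np.clip(var, 0.0, None))

def toy_objective(x):
    """Hypothetical function to minimize; its true minimum is at x = 0.3."""
    return (x - 0.3) ** 2

rng = np.random.default_rng(0)
x_obs = rng.uniform(0.0, 1.0, size=3)              # a few random starting points
y_obs = toy_objective(x_obs)
candidates = np.linspace(0.0, 1.0, 201)

for _ in range(15):
    # 1. Update the surrogate with all known data points.
    mean, std = gp_posterior(x_obs, y_obs, candidates)
    # 2. Acquisition: low predicted value (exploitation) or high uncertainty (exploration).
    lcb = mean - 2.0 * std
    x_next = candidates[np.argmin(lcb)]
    # 3. Evaluate the real function at the chosen point and record it.
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, toy_objective(x_next))

best_x = x_obs[np.argmin(y_obs)]
best_y = y_obs.min()
print(f"best x = {best_x:.3f}, best value = {best_y:.5f}")
```

The factor `2.0` in the acquisition step controls the trade-off: a larger value favors uncertain regions, a smaller one favors regions the surrogate already predicts to be low.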

In [3]:
from skopt import gp_minimize

def objective_function(params):
    """Return the negated predicted composite score for one candidate.

    `params` holds one value per dimension, in the same order as the
    search space: isAdult, runtimeMinutes, genres_TE, genres_COUNT, then
    the <role>_TE / <role>_COUNT pairs for actor, actress,
    casting_director, cinematographer, composer, director, editor,
    producer, production_designer, self and writer.
    """
    # The model expects a 2-D array with one row per sample.
    inputs = np.array([params])

    # gp_minimize minimizes, so negate the score in order to maximize it.
    return -1*best_model.predict(inputs)[0]

In [4]:
runtimeMinutes_min,runtimeMinutes_max =  data['runtimeMinutes'].min(),\
                                         data['runtimeMinutes'].quantile(0.99)

genres_TE_min,genres_TE_max =  data['genres_TE'].min(),data['genres_TE'].max()

genres_COUNT_min,genres_COUNT_max =  data['genres_COUNT'].min(),\
                                     data['genres_COUNT'].max()

actor_TE_min,actor_TE_max =  data['actor_TE'].min(),data['actor_TE'].max()

actor_COUNT_min,actor_COUNT_max =  data['actor_COUNT'].min(),\
                                   data['actor_COUNT'].max()

actress_TE_min,actress_TE_max =  data['actress_TE'].min(),\
                                 data['actress_TE'].max()

actress_COUNT_min,actress_COUNT_max =  data['actress_COUNT'].min(),\
                                       data['actress_COUNT'].max()

casting_director_TE_min,casting_director_TE_max =\
                                       data['casting_director_TE'].min(),\
                                       data['casting_director_TE'].max()

casting_director_COUNT_min,casting_director_COUNT_max =\
                                      data['casting_director_COUNT'].min(),\
                                      data['casting_director_COUNT'].max()

cinematographer_TE_min,cinematographer_TE_max =\
                                     data['cinematographer_TE'].min(),\
                                     data['cinematographer_TE'].max()

cinematographer_COUNT_min,cinematographer_COUNT_max =\
                                     data['cinematographer_COUNT'].min(),\
                                     data['cinematographer_COUNT'].max()

composer_TE_min,composer_TE_max =  data['composer_TE'].min(),\
                                   data['composer_TE'].max()

composer_COUNT_min,composer_COUNT_max = data['composer_COUNT'].min(),\
                                        data['composer_COUNT'].max()

director_TE_min,director_TE_max =  data['director_TE'].min(),\
                                   data['director_TE'].max()

director_COUNT_min,director_COUNT_max = data['director_COUNT'].min(),\
                                        data['director_COUNT'].max()

editor_TE_min,editor_TE_max =  data['editor_TE'].min(),\
                               data['editor_TE'].max()

editor_COUNT_min,editor_COUNT_max = data['editor_COUNT'].min(),\
                                    data['editor_COUNT'].max()

producer_TE_min,producer_TE_max = data['producer_TE'].min(),\
                                  data['producer_TE'].max()

producer_COUNT_min,producer_COUNT_max = data['producer_COUNT'].min(),\
                                        data['producer_COUNT'].max()

production_designer_TE_min,production_designer_TE_max =\
                                        data['production_designer_TE'].min(),\
                                        data['production_designer_TE'].max()

production_designer_COUNT_min,production_designer_COUNT_max =\
                                     data['production_designer_COUNT'].min(),\
                                     data['production_designer_COUNT'].max()

self_TE_min,self_TE_max = data['self_TE'].min(),data['self_TE'].max()

self_COUNT_min,self_COUNT_max = data['self_COUNT'].min(),\
                                data['self_COUNT'].max()

writer_TE_min,writer_TE_max =  data['writer_TE'].min(),data['writer_TE'].max()

writer_COUNT_min,writer_COUNT_max =  data['writer_COUNT'].min(),\
                                     data['writer_COUNT'].max()

Note: `gp_minimize` is designed for minimization problems. For maximization problems, a workaround is to minimize the negative of the objective function, which is why `objective_function` returns the prediction multiplied by -1.
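The sign flip can be seen on a toy problem. This sketch assumes a hypothetical `score` function to maximize and a simple grid minimizer, `minimize_on_grid`, standing in for `gp_minimize`; neither is part of the notebook.

```python
def minimize_on_grid(func, candidates):
    """Return the candidate with the smallest func value (stand-in for any minimizer)."""
    return min(candidates, key=func)

def score(x):
    """Hypothetical score to maximize; it peaks at x = 2 with value 5."""
    return 5 - (x - 2) ** 2

candidates = [i / 10 for i in range(41)]   # 0.0, 0.1, ..., 4.0

# Minimizing the negated score is the same as maximizing the score.
best_x = minimize_on_grid(lambda x: -score(x), candidates)

# Evaluate the original (un-negated) score at the point found.
best_score = score(best_x)
print(best_x, best_score)
```

With `gp_minimize`, the same flip applies to the result: the best score is `-result.fun`.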

In [5]:
from skopt.space import Real,Integer,Categorical
from skopt import gp_minimize

search_space = [Categorical([0,1],name='isAdult'),
                Real(runtimeMinutes_min,runtimeMinutes_max,'log-uniform',
                     name='runtimeMinutes'),
                Real(genres_TE_min, genres_TE_max,'uniform',
                     name='genres_TE'),
                Integer(genres_COUNT_min, genres_COUNT_max,
                        name='genres_COUNT'),
                Real(actor_TE_min, actor_TE_max,'uniform',
                     name='actor_TE'),
                Integer(actor_COUNT_min, actor_COUNT_max,
                        name='actor_COUNT'),
                Real(actress_TE_min, actress_TE_max,'uniform',
                     name='actress_TE'),
                Integer(actress_COUNT_min, actress_COUNT_max,
                        name='actress_COUNT'),
                Real(casting_director_TE_min, casting_director_TE_max,
                     'uniform',name='casting_director_TE'),
                Integer(casting_director_COUNT_min,casting_director_COUNT_max,
                        name='casting_director_COUNT'),
                Real(cinematographer_TE_min,cinematographer_TE_max,'uniform',
                     name='cinematographer_TE'),
                Integer(cinematographer_COUNT_min,cinematographer_COUNT_max,
                        name='cinematographer_COUNT'),
                Real(composer_TE_min, composer_TE_max,'uniform',
                     name='composer_TE'),
                Integer(composer_COUNT_min, composer_COUNT_max,
                        name='composer_COUNT'),
                Real(director_TE_min, director_TE_max,'uniform',
                     name='director_TE'),
                Integer(director_COUNT_min, director_COUNT_max,
                        name='director_COUNT'),
                Real(editor_TE_min, editor_TE_max,'uniform',
                     name='editor_TE'),
                Integer(editor_COUNT_min, editor_COUNT_max,
                        name='editor_COUNT'),
                Real(producer_TE_min, producer_TE_max,'uniform',
                     name='producer_TE'),
                Integer(producer_COUNT_min, producer_COUNT_max,
                        name='producer_COUNT'),
                Real(production_designer_TE_min, production_designer_TE_max,
                     'uniform',name='production_designer_TE'),
                Integer(production_designer_COUNT_min,
                        production_designer_COUNT_max,
                        name='production_designer_COUNT'),
                Real(self_TE_min,self_TE_max,'uniform',
                     name='self_TE'),
                Integer(self_COUNT_min, self_COUNT_max,
                        name='self_COUNT'),
                Real(writer_TE_min, writer_TE_max,
                     'uniform',name='writer_TE'),
                Integer(writer_COUNT_min, writer_COUNT_max,
                        name='writer_COUNT')]

result = gp_minimize(objective_function,search_space,n_calls=500,
                     n_initial_points=100,random_state=0,verbose=1)

Iteration No: 1 started. Evaluating function at random point.
Iteration No: 1 ended. Evaluation done at random point.
Time taken: 0.0012
Function value obtained: -0.5350
Current minimum: -0.5350
Iteration No: 2 started. Evaluating function at random point.
Iteration No: 2 ended. Evaluation done at random point.
Time taken: 0.0009
Function value obtained: -0.2712
Current minimum: -0.5350
Iteration No: 3 started. Evaluating function at random point.
Iteration No: 3 ended. Evaluation done at random point.
Time taken: 0.0010
Function value obtained: -0.6130
Current minimum: -0.6130
Iteration No: 4 started. Evaluating function at random point.
Iteration No: 4 ended. Evaluation done at random point.
Time taken: 0.0008
Function value obtained: -0.3909
Current minimum: -0.6130
Iteration No: 5 started. Evaluating function at random point.
Iteration No: 5 ended. Evaluation done at random point.
Time taken: 0.0008
Function value obtained: -0.4144
Current minimum: -0.6130
[...]

Iteration No: 100 ended. Evaluation done at random point.
Time taken: 0.9627
Function value obtained: -0.5681
Current minimum: -0.7859
Iteration No: 101 started. Searching for the next optimal point.
Iteration No: 101 ended. Search finished for the next optimal point.
Time taken: 0.9958
Function value obtained: -0.7168
Current minimum: -0.7859
Iteration No: 102 started. Searching for the next optimal point.
Iteration No: 102 ended. Search finished for the next optimal point.
Time taken: 1.1995
Function value obtained: -0.7649
Current minimum: -0.7859
Iteration No: 103 started. Searching for the next optimal point.
Iteration No: 103 ended. Search finished for the next optimal point.
Time taken: 1.1621
Function value obtained: -0.7143
Current minimum: -0.7859
Iteration No: 104 started. Searching for the next optimal point.
Iteration No: 104 ended. Search finished for the next optimal point.
Time taken: 1.1975
Function value obtained: -0.7479
Current minimum: -0.7859
[...]

Iteration No: 139 ended. Search finished for the next optimal point.
Time taken: 5.5058
Function value obtained: -0.8362
Current minimum: -0.8591
Iteration No: 140 started. Searching for the next optimal point.
Iteration No: 140 ended. Search finished for the next optimal point.
Time taken: 6.3450
Function value obtained: -0.8559
Current minimum: -0.8591
Iteration No: 141 started. Searching for the next optimal point.
Iteration No: 141 ended. Search finished for the next optimal point.
Time taken: 6.0062
Function value obtained: -0.8660
Current minimum: -0.8660
Iteration No: 142 started. Searching for the next optimal point.
Iteration No: 142 ended. Search finished for the next optimal point.
Time taken: 2.2705
Function value obtained: -0.8346
Current minimum: -0.8660
Iteration No: 143 started. Searching for the next optimal point.
Iteration No: 143 ended. Search finished for the next optimal point.
Time taken: 3.7893
Function value obtained: -0.8718
Current minimum: -0.8718
[...]

Iteration No: 178 ended. Search finished for the next optimal point.
Time taken: 9.5269
Function value obtained: -0.8448
Current minimum: -0.8720
Iteration No: 179 started. Searching for the next optimal point.
Iteration No: 179 ended. Search finished for the next optimal point.
Time taken: 9.8878
Function value obtained: -0.8725
Current minimum: -0.8725
Iteration No: 180 started. Searching for the next optimal point.
Iteration No: 180 ended. Search finished for the next optimal point.
Time taken: 11.1849
Function value obtained: -0.7614
Current minimum: -0.8725
Iteration No: 181 started. Searching for the next optimal point.
Iteration No: 181 ended. Search finished for the next optimal point.
Time taken: 8.0844
Function value obtained: -0.8572
Current minimum: -0.8725
Iteration No: 182 started. Searching for the next optimal point.
Iteration No: 182 ended. Search finished for the next optimal point.
Time taken: 7.1635
Function value obtained: -0.8423
Current minimum: -0.8725
[...]

Iteration No: 217 ended. Search finished for the next optimal point.
Time taken: 11.3386
Function value obtained: -0.8684
Current minimum: -0.8784
Iteration No: 218 started. Searching for the next optimal point.
Iteration No: 218 ended. Search finished for the next optimal point.
Time taken: 15.1644
Function value obtained: -0.8680
Current minimum: -0.8784
Iteration No: 219 started. Searching for the next optimal point.
Iteration No: 219 ended. Search finished for the next optimal point.
Time taken: 12.7981
Function value obtained: -0.8245
Current minimum: -0.8784
Iteration No: 220 started. Searching for the next optimal point.
Iteration No: 220 ended. Search finished for the next optimal point.
Time taken: 11.6092
Function value obtained: -0.7170
Current minimum: -0.8784
Iteration No: 221 started. Searching for the next optimal point.
Iteration No: 221 ended. Search finished for the next optimal point.
Time taken: 15.5685
Function value obtained: -0.8384
Current minimum: -0.8784
[...]

Iteration No: 256 ended. Search finished for the next optimal point.
Time taken: 12.6507
Function value obtained: -0.8693
Current minimum: -0.8807
Iteration No: 257 started. Searching for the next optimal point.
Iteration No: 257 ended. Search finished for the next optimal point.
Time taken: 13.7582
Function value obtained: -0.8691
Current minimum: -0.8807
Iteration No: 258 started. Searching for the next optimal point.
Iteration No: 258 ended. Search finished for the next optimal point.
Time taken: 17.5496
Function value obtained: -0.8403
Current minimum: -0.8807
Iteration No: 259 started. Searching for the next optimal point.
Iteration No: 259 ended. Search finished for the next optimal point.
Time taken: 15.5664
Function value obtained: -0.8040
Current minimum: -0.8807
Iteration No: 260 started. Searching for the next optimal point.
Iteration No: 260 ended. Search finished for the next optimal point.
Time taken: 13.5118
Function value obtained: -0.8678
Current minimum: -0.8807
[...]

Iteration No: 295 ended. Search finished for the next optimal point.
Time taken: 23.0794
Function value obtained: -0.8076
Current minimum: -0.8807
Iteration No: 296 started. Searching for the next optimal point.
Iteration No: 296 ended. Search finished for the next optimal point.
Time taken: 22.7698
Function value obtained: -0.8758
Current minimum: -0.8807
Iteration No: 297 started. Searching for the next optimal point.
Iteration No: 297 ended. Search finished for the next optimal point.
Time taken: 24.1790
Function value obtained: -0.8775
Current minimum: -0.8807
Iteration No: 298 started. Searching for the next optimal point.
Iteration No: 298 ended. Search finished for the next optimal point.
Time taken: 19.7700
Function value obtained: -0.8663
Current minimum: -0.8807
Iteration No: 299 started. Searching for the next optimal point.
Iteration No: 299 ended. Search finished for the next optimal point.
Time taken: 15.8169
Function value obtained: -0.8762
Current minimum: -0.8807
[...]

Iteration No: 334 ended. Search finished for the next optimal point.
Time taken: 25.7942
Function value obtained: -0.8436
Current minimum: -0.8807
Iteration No: 335 started. Searching for the next optimal point.
Iteration No: 335 ended. Search finished for the next optimal point.
Time taken: 22.5131
Function value obtained: -0.8685
Current minimum: -0.8807
Iteration No: 336 started. Searching for the next optimal point.
Iteration No: 336 ended. Search finished for the next optimal point.
Time taken: 26.0085
Function value obtained: -0.8716
Current minimum: -0.8807
Iteration No: 337 started. Searching for the next optimal point.
Iteration No: 337 ended. Search finished for the next optimal point.
Time taken: 25.8948
Function value obtained: -0.7626
Current minimum: -0.8807
Iteration No: 338 started. Searching for the next optimal point.
Iteration No: 338 ended. Search finished for the next optimal point.
Time taken: 23.3116
Function value obtained: -0.8693
Current minimum: -0.8807
[...]

Iteration No: 373 ended. Search finished for the next optimal point.
Time taken: 32.4772
Function value obtained: -0.8673
Current minimum: -0.8809
Iteration No: 374 started. Searching for the next optimal point.
Iteration No: 374 ended. Search finished for the next optimal point.
Time taken: 31.0713
Function value obtained: -0.8774
Current minimum: -0.8809
Iteration No: 375 started. Searching for the next optimal point.
Iteration No: 375 ended. Search finished for the next optimal point.
Time taken: 34.5698
Function value obtained: -0.8635
Current minimum: -0.8809
Iteration No: 376 started. Searching for the next optimal point.
Iteration No: 376 ended. Search finished for the next optimal point.
Time taken: 25.0011
Function value obtained: -0.8735
Current minimum: -0.8809
Iteration No: 377 started. Searching for the next optimal point.
Iteration No: 377 ended. Search finished for the next optimal point.
Time taken: 24.4835
Function value obtained: -0.7946
Current minimum: -0.8809
[...]

Iteration No: 412 ended. Search finished for the next optimal point.
Time taken: 29.2088
Function value obtained: -0.8668
Current minimum: -0.8816
Iteration No: 413 started. Searching for the next optimal point.
Iteration No: 413 ended. Search finished for the next optimal point.
Time taken: 30.9429
Function value obtained: -0.8773
Current minimum: -0.8816
Iteration No: 414 started. Searching for the next optimal point.
Iteration No: 414 ended. Search finished for the next optimal point.
Time taken: 29.7006
Function value obtained: -0.8723
Current minimum: -0.8816
Iteration No: 415 started. Searching for the next optimal point.
Iteration No: 415 ended. Search finished for the next optimal point.
Time taken: 34.2822
Function value obtained: -0.8688
Current minimum: -0.8816
Iteration No: 416 started. Searching for the next optimal point.
Iteration No: 416 ended. Search finished for the next optimal point.
Time taken: 31.4413
Function value obtained: -0.8766
Current minimum: -0.8816
[...]

Iteration No: 451 ended. Search finished for the next optimal point.
Time taken: 32.1446
Function value obtained: -0.8784
Current minimum: -0.8816
Iteration No: 452 started. Searching for the next optimal point.
Iteration No: 452 ended. Search finished for the next optimal point.
Time taken: 31.8146
Function value obtained: -0.8723
Current minimum: -0.8816
Iteration No: 453 started. Searching for the next optimal point.
Iteration No: 453 ended. Search finished for the next optimal point.
Time taken: 37.1877
Function value obtained: -0.8730
Current minimum: -0.8816
Iteration No: 454 started. Searching for the next optimal point.
Iteration No: 454 ended. Search finished for the next optimal point.
Time taken: 33.7997
Function value obtained: -0.8765
Current minimum: -0.8816
Iteration No: 455 started. Searching for the next optimal point.
Iteration No: 455 ended. Search finished for the next optimal point.
Time taken: 33.6617
Function value obtained: -0.8770
Current minimum: -0.8816
[...]

Iteration No: 490 ended. Search finished for the next optimal point.
Time taken: 35.1264
Function value obtained: -0.8784
Current minimum: -0.8816
Iteration No: 491 started. Searching for the next optimal point.
Iteration No: 491 ended. Search finished for the next optimal point.
Time taken: 36.0732
Function value obtained: -0.8806
Current minimum: -0.8816
Iteration No: 492 started. Searching for the next optimal point.
Iteration No: 492 ended. Search finished for the next optimal point.
Time taken: 35.3319
Function value obtained: -0.8792
Current minimum: -0.8816
Iteration No: 493 started. Searching for the next optimal point.
Iteration No: 493 ended. Search finished for the next optimal point.
Time taken: 35.7405
Function value obtained: -0.8620
Current minimum: -0.8816
Iteration No: 494 started. Searching for the next optimal point.
Iteration No: 494 ended. Search finished for the next optimal point.
Time taken: 37.2083
Function value obtained: -0.8789
Current minimum: -0.8816
[...]

The maximum found using this approach is **0.8816**, which exceeds the maximum prediction over the dataset rows (**0.8552**).
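The size of the improvement can be checked with a line of arithmetic. The sketch below hard-codes the two values printed in this notebook; in a live session, `preds.max()` and `-result.fun` would supply them instead.

```python
# Values copied from the notebook output above.
baseline_max = 0.8552375029846687   # max prediction over the dataset rows
optimized_max = 0.8816              # best value reported in the gp_minimize log (rounded)

absolute_gain = optimized_max - baseline_max
relative_gain = absolute_gain / baseline_max

print(f"absolute gain: {absolute_gain:.4f}")
print(f"relative gain: {relative_gain:.1%}")
```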

In [6]:
best_params = dict(zip([feat.name for feat in search_space], result.x))
best_params

{'isAdult': 0,
 'runtimeMinutes': 168.99999999999997,
 'genres_TE': 0.3844082717006556,
 'genres_COUNT': 3,
 'actor_TE': 0.6533680613825336,
 'actor_COUNT': 38,
 'actress_TE': 0.8189210693083288,
 'actress_COUNT': 1,
 'casting_director_TE': 0.8560667653884508,
 'casting_director_COUNT': 1,
 'cinematographer_TE': 0.0102518649082633,
 'cinematographer_COUNT': 1,
 'composer_TE': 0.8182351827308728,
 'composer_COUNT': 1,
 'director_TE': 0.8837858813422167,
 'director_COUNT': 1,
 'editor_TE': 0.8623019167399543,
 'editor_COUNT': 1,
 'producer_TE': 0.7347334584340207,
 'producer_COUNT': 1,
 'production_designer_TE': 0.9041301479025808,
 'production_designer_COUNT': 4,
 'self_TE': 0.7724457277542547,
 'self_COUNT': 30,
 'writer_TE': 0.736346111203962,
 'writer_COUNT': 3}

In [30]:
pd.DataFrame(pd.Series(best_params), columns=['optimal_value'])

,optimal_value
isAdult,0.0
runtimeMinutes,169.0
genres_TE,0.384408
genres_COUNT,3.0
actor_TE,0.653368
actor_COUNT,38.0
actress_TE,0.818921
actress_COUNT,1.0
casting_director_TE,0.856067
casting_director_COUNT,1.0


## SAVE BEST PARAMS

In [31]:
import pickle
from datetime import datetime

curr_time = datetime.today().strftime('%Y-%m-%d %H-%M-%S')

OPTIMALFEATURES_FILENAME = f'./models/{curr_time}_OPTIMALFEATURES.pkl'

# Use a context manager so the file handle is closed after writing
with open(OPTIMALFEATURES_FILENAME, 'wb') as file:
    pickle.dump(best_params, file)
print(OPTIMALFEATURES_FILENAME)

./models/2024-05-13 12-43-07_OPTIMALFEATURES.pkl
