### 0 - Imports :

In [1]:
import sys
import pandas as pd
import numpy as np
import os

In [2]:
current_path = os.getcwd()

project_directory = os.path.dirname( os.path.abspath( current_path ) )
history_data_filename = "IA BPC Teinte v2.xlsm"
requested_formulas_data_filename = "IA BPC Teinte Besoins.xlsm"
reference_formulas_data_filename = "composition des SOP avec type d'intro MP.xlsm"

In [3]:
sys.path.append(project_directory)
from model.modules import Factory , Formula , Batch
from model.outil import create_factory, upload_factory , determine_week_month_requested_formulas , check_datafiles , add_type_column_to_history_df , filter_history_df , transform_formulas_prediction ,transform_reference_df ,  determine_selected_formulas_ids

### 1 - Loading Data :

The user should enter the date of the provided data in the format "DD-MM-YYYY".

In [4]:
data_date = "19-08-2024"

Folders will then be created for temporary files, plot files, and result files specific to this date.

In [5]:
data_files_path = project_directory + "/data/data "+ data_date +"/"
temp_files_path = project_directory + "/temp files/temp files "+ data_date +"/"
plots_files_path = project_directory + "/plots files/plots files "+ data_date +"/"
results_files_path = project_directory + "/results files/results files "+ data_date +"/"

In [6]:
for path in [temp_files_path , plots_files_path , results_files_path] :
    if not os.path.exists(path):
        os.makedirs(path)
        print(f"Folder created at : {path}")
    else:
        print(f"Folder already exists at : {path}")

Folder already exists at : /Users/mohamadelosman/Library/CloudStorage/OneDrive-L'Oréal/Bureau/Smart BATCH Project/temp files/temp files 19-08-2024/
Folder already exists at : /Users/mohamadelosman/Library/CloudStorage/OneDrive-L'Oréal/Bureau/Smart BATCH Project/plots files/plots files 19-08-2024/
Folder already exists at : /Users/mohamadelosman/Library/CloudStorage/OneDrive-L'Oréal/Bureau/Smart BATCH Project/results files/results files 19-08-2024/


Reading the data files (formulas history file, requested batches file, and formula references file) specific to the specified date.

In [7]:
df = pd.read_excel(data_files_path + history_data_filename)
requested_formulas_df = pd.read_excel(data_files_path + requested_formulas_data_filename)
reference_df = pd.read_excel(data_files_path + reference_formulas_data_filename)

In [8]:
check_datafiles(df,requested_formulas_df,reference_df)

Total number of requested formulas : 332 

Number of requested formulas with both history and reference data : 217
Requested formula ids with both history and reference data : {'ZZ965978', 'ZZ9660001', 'ZZ96447502', 'ZZ96571307B', 'ZZ20100967', 'ZZ2009336', 'ZZ96564502', 'ZZ965973', 'ZZ993096', 'ZZ96435002', 'ZZ20102625', 'ZZ993778', 'ZZ96606706', 'ZZ2009963', 'ZZ2009185', 'ZZP96425508B', 'ZZ20101994', 'ZZ20097541', 'ZZ994009', 'ZZ87653707', 'ZZ9660051', 'ZZ86100124', 'ZZ993095', 'ZZ993769', 'ZZ965458', 'ZZ96564603', 'ZZ20091432', 'ZZ20091561', 'ZZ965971', 'ZZ86099524', 'ZZ993100', 'ZZ86121924', 'ZZ2009563', 'ZZ993772', 'ZZ96422721B', 'ZZ9932861', 'ZZ9657021', 'ZZ2009517', 'ZZ966015', 'ZZ965976', 'ZZ20087961', 'ZZ20101206', 'ZZ20091551', 'ZZ9660091', 'ZZ9932821', 'ZZ96571701', 'ZZ994073', 'ZZ993098', 'ZZ965963', 'ZZ9940111', 'ZZ994006', 'ZZP908348SA8', 'ZZ9659641', 'ZZ9921665', 'ZZ20097611', 'ZZ994016', 'ZZ965974', 'ZZ994087', 'ZZ2009875', 'ZZ966019', 'ZZ966055', 'ZZ993089', 'ZZ2009765

### 2 - Data Engineering :

Selecting the formula IDs that are requested in the requested batches file and are present in both the formulas history file and the formula references file.

In [9]:
df_aux = filter_history_df(df)
df_aux = add_type_column_to_history_df(df_aux , requested_formulas_df)
selected_formulas_ids = determine_selected_formulas_ids(df_aux , requested_formulas_df, reference_df )

In [10]:
print(len(selected_formulas_ids))

162


### 3 - Create / Upload factory object :

In [11]:
factory_file_name = "factory.pkl"

Creating the Factory object and saving it as a pickle file in the specified temporary folder.

In [12]:
factory = create_factory(df_aux,temp_files_path ,factory_file_name)


Processing formulas: 100%|██████████| 387/387 [00:35<00:00, 10.87it/s]


File replaced: factory.pkl


Loading the Factory object from the spcified pickle file.

In [13]:
#factory = upload_factory(temp_files_path , factory_file_name)

### 4 - Data Analysis :

Select formula IDs based on various criteria, such as high and low coefficient of variation (CV), whether they have been adjusted, and if they have non-unique raw material compositions.


In [14]:
formulas_ids_with_high_cv = factory.select_formulas_of_cv_higher_than_threshold(0.175)
formulas_ids_with_low_cv = factory.select_formulas_of_cv_lower_equal_than_threshold(0.2)
adjusted_fomulas_ids_list = factory.get_adjusted_formulas_ids()
no_unique_raw_material_composition_formulas_ids_list = factory.get_no_unique_raw_material_composition_formulas_ids()

Generate plots for the selected formulas and display them in the notebook.

In [15]:
formulas_ids_to_plot = list(selected_formulas_ids)[:5]
factory.plot_selected_formulas(formulas_ids_to_plot)

Generate plots for the selected formulas and export them to a pdf file.

In [16]:
plots_filename = "Formulas_plotted.pdf"
factory.plot_selected_formulas_to_pdf(formulas_ids_to_plot, plots_files_path + plots_filename)

Generate plots for the selected formulas with respect to the setted references and export them to a pdf file.

In [17]:
plots_filename = "Formulas_plotted_with_ref.pdf"
factory.plot_selected_formulas_with_ref_to_pdf(formulas_ids_to_plot, plots_files_path + plots_filename)

### 5 - Data Modelling :

#### 5.1 - Filtering Stable Formulas Stage :

Set stability criteria to identify stable formulas.

In [18]:
THRESHOLD_ADJUSTEMENTS_ACCEPTED = 0
NB_OF_RECENT_BATCHES_CONSIDERED = 3
THRESHOLD_CV = 0.002
MIN_ACCEPTED_BATCHES = 3

Identify stable formulas by applying specified stability criteria

In [19]:
stable_formulas = factory.search_of_stable_formulas(selected_formulas_ids = selected_formulas_ids , threshold_adjustements_accepted = THRESHOLD_ADJUSTEMENTS_ACCEPTED ,nb_of_recent_batches_considered = NB_OF_RECENT_BATCHES_CONSIDERED , threshold_cv = THRESHOLD_CV , min_accepted_batches = MIN_ACCEPTED_BATCHES)

In [20]:
stable_formulas_ids_list = []
for stable_formula in stable_formulas:
    stable_formulas_ids_list.append(list(stable_formula.keys())[0])


Remove the stable formulas from the list of selected formulas to prepare for the grid search stage.

In [21]:
selected_formulas_ids =  selected_formulas_ids - set(stable_formulas_ids_list) 

#### 5.2 - Filtering Stable Last Batch Formulas Stage :

Identify formulas with stable last batches by checking whether the last batch of each formula has been adjusted.

In [22]:
last_batch_not_adjusted_formula = []
for formula_id in factory.formulas_dict:
    formula_object = factory.formulas_dict[formula_id]
    if formula_object.batches_arr[-1].adjusted :
        last_batch_not_adjusted_formula.append(formula_id)

Remove the last batch stable formulas from the list of selected formulas to prepare for the grid search stage.


In [23]:
stable_last_batch_formulas_ids_list = set(last_batch_not_adjusted_formula) & set(selected_formulas_ids)
selected_formulas_ids =  selected_formulas_ids - stable_last_batch_formulas_ids_list 

#### 5.3 - Gridsearch Stage :

##### 5.3.1 - Setting gridsearch parameters :

The user should enter the `LIMIT_INITIALIZATION_DATE` in the format "YYYY-MM-DD" to specify the starting date for the historical data.

In [24]:
#LIMIT_INITIALIZATION_DATE = '2024-01-15'
LIMIT_INITIALIZATION_DATE = '2022-08-01'

Set the grid search parameters to generate recommendations for formulas references.

In [25]:
param_ranges = {
    'nb_batches_to_remove' : [1 , 2],
    'min_nb_batches' : [2 , 3 , 4 , 5 , 6 , 7],
    'max_nb_adjustments' : [5 , 6 , 7 , 8 , 9 , 10 , 11],
    'limit_initialization_date' : [pd.Period(LIMIT_INITIALIZATION_DATE)] ,
    'weighted' : [False , True] , 
    'initial_weight' : [1 , 1.2],
    'increase_rate': [ 0.2 , 0.25 , 0.3 , 0.35]
}

##### 5.3.2 - Running gridsearch process :

In [26]:
for percentage_filtered_out_formulas_threshold in range(0,101,5):
    passed_boolean , best_params, best_metrics , formulas_predictions , metrics_MP , formulas_ids_filtered_out , selected_batches_dict = factory.grid_search(selected_formulas_ids ,percentage_filtered_out_formulas_threshold,factory.function_metric ,param_ranges)
    if passed_boolean :
        break

Processing search: 30it [00:00, 295.52it/s]

Processing search: 1344it [00:02, 596.88it/s]
Processing search: 1344it [00:01, 699.34it/s]
Processing search: 1344it [00:02, 655.49it/s]
Processing search: 1344it [00:01, 688.42it/s]
Processing search: 1344it [00:01, 683.03it/s]
Processing search: 1344it [00:01, 696.20it/s]
Processing search: 1344it [00:01, 711.02it/s]
Processing search: 1344it [00:01, 683.02it/s]
Processing search: 1344it [00:01, 717.44it/s]
Processing search: 1344it [00:01, 692.62it/s]
Processing search: 1344it [00:01, 708.89it/s]
Processing search: 1344it [00:01, 678.58it/s]
Processing search: 1344it [00:02, 663.17it/s]
Processing search: 1344it [00:01, 696.34it/s]
Processing search: 1344it [00:01, 698.11it/s] 


In [27]:
print("Parameters used : \n")
print("Nb of batches removed : " , best_params[0] , "\t Min nb of batches : " , best_params[1] ,"\t Max nb of adjustments : " , best_params[2] ,"\t Limit initialization date : " , best_params[3] ,"\t Weighted : ",best_params[4],"\t Initial_weight : ",best_params[5] ,"\t Increase_rate : ",best_params[6], "\n")
print("Average metric : ",best_metrics[0],"\t Percentage of formulas filtered out : ",best_metrics[1],"\n")
print("List of formulas filtered out : " ,formulas_ids_filtered_out ,"\n")


Parameters used : 

Nb of batches removed :  1 	 Min nb of batches :  2 	 Max nb of adjustments :  6 	 Limit initialization date :  2022-08-01 	 Weighted :  True 	 Initial_weight :  1 	 Increase_rate :  0.35 

Average metric :  0.49821 	 Percentage of formulas filtered out :  67.30000000000001 

List of formulas filtered out :  ['ZZ87791404', 'ZZ993762', 'ZZ2010028', 'ZZ9941417', 'ZZ87647706', 'ZZ2010090', 'ZZ9941704', 'ZZ20100697', 'ZZ2010113', 'ZZ993777', 'ZZP2017092C35', 'ZZ993778', 'ZZ993775', 'ZZ2010450', 'ZZ20100796', 'ZZ201005220', 'ZZ2009963', 'ZZ966010', 'ZZ20096281', 'ZZ20100639', 'ZZ966047', 'ZZ994165', 'ZZ9658823', 'ZZ99414212', 'ZZ993785', 'ZZ87654110', 'ZZ9941733', 'ZZ2009608', 'ZZ2009563', 'ZZ2009517', 'ZZ20100547', 'ZZ96571406', 'ZZ966110', 'ZZP2010226', 'ZZ99416220', 'ZZ2009855', 'ZZ994005'] 



### 6 - Results exporting (formulas predictions compared to given formula reference):

Apply data transfromation process on the resulted formulas predictions and the given refernce formulas from the references file

In [28]:
formulas_predictions_df = transform_formulas_prediction(formulas_predictions)
reference_df_aux = transform_reference_df(reference_df , formulas_predictions)

Identify key indicators regarding the production requests for the selected formula IDs, and add these indicators to the results Excel file.

In [29]:
four_ten_days_requested_formulas_df_grouped , four_days_one_month_requested_formulas_df_grouped , earliest_date_after_four_days = determine_week_month_requested_formulas(requested_formulas_df , ref_date_str = data_date)

In [30]:
parameters_metrics_variables = {'C3': '0' , 
                                'C4': len(selected_formulas_ids) ,
                                'C7': best_params[0] ,
                                'C8': best_params[1] ,
                                'C9': best_params[2] ,
                                'C10': str(best_params[3]) ,
                                'C11': best_params[4] ,
                                'C12': best_params[5] ,
                                'C13': best_params[6] ,
                                'C16': str(round(best_metrics[1],3))+" %" ,
                                'C17': '' ,
                                'C20': best_metrics[0] ,
                                'C23': len(stable_formulas) ,
                                'C24': THRESHOLD_ADJUSTEMENTS_ACCEPTED ,
                                'C25': NB_OF_RECENT_BATCHES_CONSIDERED ,
                                'C26': MIN_ACCEPTED_BATCHES ,
                                'C27': THRESHOLD_CV 
                                }

In [31]:
parameters_metrics_variables

{'C3': '0',
 'C4': 55,
 'C7': 1,
 'C8': 2,
 'C9': 6,
 'C10': '2022-08-01',
 'C11': True,
 'C12': 1,
 'C13': 0.35,
 'C16': '67.3 %',
 'C17': '',
 'C20': 0.49821,
 'C23': 59,
 'C24': 0,
 'C25': 3,
 'C26': 3,
 'C27': 0.002}

Export the results to an Excel file with three sheets:

- **Formulas Predictions**: Contains detailed predictions of formula references and comparisons with the given formula references.
- **Formulas Indicators**: Provides detailed information about the formulas, including production requests, stability status (stable formula, last batch stable formula), grid search results (passed or filtered out), and a summary recommendation for each formula reference change.
- **Parameters and Metrics**: Details the parameters and key metrics used in the stable formulas filtering and grid search processes.


In [32]:
factory.export_predicted_and_ref_formulas_to_excel(results_files_path + "results_full_history_test.xlsx" , 2 ,formulas_predictions_df , reference_df_aux , four_ten_days_requested_formulas_df_grouped , four_days_one_month_requested_formulas_df_grouped ,earliest_date_after_four_days, formulas_ids_filtered_out, stable_formulas_ids_list , stable_last_batch_formulas_ids_list , parameters_metrics_variables)

Results excel file is created .
