### Assignment description

The algorithm should be suitable to solve the problem, and should be clearly described. It should be able
to accept the predicted **Remaining Useful Lifetime (RULs)** for all engines as input, and output a valid maintenance solution for it. 

Such a solution should provide: 
(i) a list of all machines, indicating for any maintained machine the type of the team doing the maintenance, start-date, and end-date of the maintenance, as well as the penalty costs incurred by that machine;

(ii) the total penalty costs. The concrete cost of such a solution is not important here; it just has to be feasible. Make sure your solution format is easy to read and understand.

**Input**: RUL predictions  (100 entries/rows)

**Output**: (i) a list of all machines, indicating for any maintained machine the type of the team doing the maintenance, start-date, and end-date of the maintenance, as well as the penalty costs incurred by that machine; (ii) the total penalty costs. (100 entries/rows)

## Read the predicted RUL (that is our input) 
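The file is semicolon-separated, so `pd.read_csv` with its default comma separator reads every row into a single `RUL;id` column.
The cell below splits that column into separate integer `RUL` and `id` columns; passing `sep=';'` to `pd.read_csv` would parse the two columns directly.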

In [27]:
import os
import pandas as pd

# Get the current working directory
current_directory = os.getcwd()

# Construct the relative path to prediction RUL file
rul_filename = "RUL_consultancy_predictions_A3-2.csv"
rul_path = os.path.join(current_directory, rul_filename)

# Read the CSV file
rul_df = pd.read_csv(rul_path)
print(rul_df.head(5))

# Split the 'RUL;id' column into two separate columns
rul_df[['RUL', 'id']] = rul_df['RUL;id'].str.split(';', expand=True)

# Convert values to integers
rul_df['RUL'] = rul_df['RUL'].astype(int)
rul_df['id'] = rul_df['id'].astype(int)

# Drop the original 'RUL;id' column
rul_df.drop(columns=['RUL;id'], inplace=True)

# Print the modified DataFrame
print(rul_df)


  RUL;id
0  135;1
1  125;2
2   63;3
3  100;4
4  103;5
    RUL   id
0   135    1
1   125    2
2    63    3
3   100    4
4   103    5
..  ...  ...
95  140   96
96  109   97
97   87   98
98  127   99
99   24  100

[100 rows x 2 columns]


## Lecture 6 from Hendrik Baier: evolutionary algorithms

## Input 
The `id` in the RUL file is presumably the id of the engine. Taking the first rows as an example:

RUL;id
135;1
125;2
...

Engine 1 has 135 days left until it needs to be maintained, engine 2 has 125 days, etc.

### First step in Genetic Algorithm: Population

- An individual is one possible solution
- The population is the set of possible solutions
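
To make the two terms concrete, here is a minimal sketch of one possible encoding. Everything in it is an assumption for illustration: the team labels `A1`, `A2`, `B1`, `B2`, the choice of one `(team, start_day)` gene per engine, and the 50% chance of leaving an engine unmaintained. It does not check feasibility (for example, overlapping jobs for the same team).

```python
import random

PLANNING_HORIZON = 30                 # days, as in the notebook
TEAMS = ["A1", "A2", "B1", "B2"]      # hypothetical labels: 2 type-A and 2 type-B teams

def random_individual(num_engines, rng):
    """One candidate schedule: per engine either None (not maintained)
    or a (team, start_day) gene."""
    individual = []
    for _ in range(num_engines):
        if rng.random() < 0.5:        # assumed chance of leaving an engine unmaintained
            individual.append(None)
        else:
            individual.append((rng.choice(TEAMS), rng.randint(1, PLANNING_HORIZON)))
    return individual

def random_population(size, num_engines, seed=0):
    """A population is simply a list of individuals."""
    rng = random.Random(seed)
    return [random_individual(num_engines, rng) for _ in range(size)]

population = random_population(size=20, num_engines=100)
print(len(population), len(population[0]))  # 20 individuals, 100 genes each
```

A plain list per individual keeps crossover and mutation simple, since both only need to swap or replace list elements.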



### Fitness evaluation
In our case, the fitness of an individual is its total penalty cost: the smaller the penalty cost, the better.
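
As a rough sketch, the fitness could be computed as the sum of per-engine penalties. The quadratic formula and the cap of 250 are taken from `calculate_penalty_cost` later in this notebook. Measuring lateness for a maintained engine from its maintenance end day is an assumption, not something the assignment text above states, and the function names are hypothetical.

```python
PLANNING_HORIZON = 30   # days, as in the notebook
MAX_DAILY_COST = 250    # penalty cap used in the notebook

def engine_penalty(cost, safety_due_date, end_day=None):
    """Penalty for one engine.

    end_day=None means the engine is not maintained, in which case lateness
    is measured at the end of the planning horizon (as in the notebook).
    For a maintained engine, lateness is measured at end_day (assumption).
    """
    finish = PLANNING_HORIZON if end_day is None else end_day
    days_late = max(0, finish - safety_due_date)
    return min(cost * days_late ** 2, MAX_DAILY_COST)

def fitness(schedule):
    """Total penalty of a schedule given as (cost, safety_due_date, end_day) rows."""
    return sum(engine_penalty(cost, due, end) for cost, due, end in schedule)

# Engine 1 in the notebook's example: cost 4, due date 29, not maintained.
print(engine_penalty(4, 29))   # 4
```

Because lower is better, any selection step has to minimise this value, which is what `weights=(-1.0,)` expresses in the DEAP setup further down.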



### Check termination criteria

As presented on the slide, a good termination criterion is very often unknown, so some threshold is usually introduced. From the Q&A session I found out that there is not really a termination criterion here; we simply need to run the algorithm for some fixed amount of time (as described in the assignment description).
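
A time-based stop can be sketched as below. The helper `run_for` and its parameters are hypothetical; `step` stands in for whatever one generation of the algorithm does, and the optional generation cap is only there as a safety net.

```python
import time

def run_for(step, budget_seconds, max_generations=None):
    """Call step(generation) repeatedly until the time budget is used up
    (or until max_generations is reached, if given)."""
    deadline = time.monotonic() + budget_seconds
    generation = 0
    while time.monotonic() < deadline:
        if max_generations is not None and generation >= max_generations:
            break
        step(generation)
        generation += 1
    return generation

# Example: a dummy step that only records the generation number.
calls = []
done = run_for(calls.append, budget_seconds=10, max_generations=3)
print(done, calls)
```

`time.monotonic()` is used instead of `time.time()` so that system clock adjustments cannot shorten or extend the run.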


### Selection of parents

From the Q&A I know that this choice is entirely up to us. In the DEAP package you can specify how parents are selected (this notebook later uses `tools.selTournament`).
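
For intuition, tournament selection can be sketched in plain Python as follows. This is an illustration of the idea, not DEAP's implementation: it draws each tournament without replacement, and the function name and signature are hypothetical.

```python
import random

def tournament_select(population, fitnesses, k, tournsize=3, rng=random):
    """Pick k parents; each one is the lowest-cost individual among
    tournsize randomly drawn contenders (minimisation)."""
    selected = []
    for _ in range(k):
        contenders = rng.sample(range(len(population)), tournsize)
        winner = min(contenders, key=lambda i: fitnesses[i])
        selected.append(population[winner])
    return selected

population = ["a", "b", "c", "d"]
fitnesses = [5, 1, 7, 3]          # penalty costs, lower is better
# With tournsize equal to the population size, the best one always wins.
print(tournament_select(population, fitnesses, k=3, tournsize=4))  # ['b', 'b', 'b']
```

A larger `tournsize` increases selection pressure: good individuals win more often, at the cost of less diversity among the parents.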

### Crossover
This is up to us as well; please check the DEAP package docs (this notebook later uses `tools.cxTwoPoint`).
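
As an illustration of what a two-point crossover does, here is a plain-Python sketch for list-based individuals. It is not DEAP's code; the function name is hypothetical, and it assumes both parents have at least three genes.

```python
import random

def two_point_crossover(parent_a, parent_b, rng=random):
    """Swap the segment between two random cut points; returns two new children."""
    size = min(len(parent_a), len(parent_b))
    i, j = sorted(rng.sample(range(1, size), 2))   # two distinct cut points
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b

a = [0, 0, 0, 0, 0, 0]
b = [1, 1, 1, 1, 1, 1]
print(two_point_crossover(a, b, rng=random.Random(0)))
```

With one gene per engine, each child inherits a contiguous block of engine assignments from the other parent, so a schedule that was feasible for both parents may need a repair or feasibility check afterwards.
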
### Mutation
This is up to us as well; please check the DEAP package docs (this notebook later uses `tools.mutUniformInt`).
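
The idea behind a uniform integer mutation can be sketched like this, again in plain Python rather than DEAP's own code. The function name is hypothetical; `low`, `up`, and `indpb` mirror the parameter names passed to `tools.mutUniformInt` later in this notebook.

```python
import random

def mutate_uniform_int(individual, low, up, indpb, rng=random):
    """Return a copy in which each gene is replaced, with probability indpb,
    by a random integer in [low, up]."""
    return [rng.randint(low, up) if rng.random() < indpb else gene
            for gene in individual]

genes = [1, 2, 3, 4, 1, 2, 3, 4]
print(mutate_uniform_int(genes, low=1, up=4, indpb=0.05, rng=random.Random(0)))
```

A small `indpb` such as 0.05 changes only a few genes per individual, which keeps most of a good schedule intact while still exploring nearby alternatives.
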
### New offspring
How the next generation is formed is up to us as well; please check the DEAP package docs.
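
Two common options are to replace the whole population by the offspring, or to keep the best individuals out of parents and offspring together. The second option is sketched below; the function name is hypothetical, and `fitness` is assumed to return a cost to be minimised.

```python
def next_generation(parents, offspring, fitness, size):
    """Keep the `size` lowest-cost individuals out of parents and offspring."""
    return sorted(parents + offspring, key=fitness)[:size]

parents = [[3, 3], [1, 1]]
offspring = [[2, 2], [0, 0]]
print(next_generation(parents, offspring, fitness=sum, size=2))  # [[0, 0], [1, 1]]
```

Keeping parents in the pool means the best solution found so far can never be lost, whereas pure replacement can make the best penalty cost get worse from one generation to the next.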

## Output
(i) a list of all machines, indicating for any maintained machine the type of the team doing the maintenance, start-date, and end-date of the maintenance, as well as the penalty costs incurred by that machine; (ii) the total penalty costs. (100 entries/rows)

In [28]:
# Define constants
NUM_ENGINES = len(rul_df)
NUM_TEAMS_A = 2
NUM_TEAMS_B = 2
PLANNING_HORIZON = 30
MAX_DAILY_COST = 250

# Define maintenance times for teams A and B
maintenance_times_a = [4 if i < 20 else 3 if 20 <= i < 55 else 2 if 55 <= i < 80 else 8 for i in range(1, NUM_ENGINES + 1)]
maintenance_times_b = [time_a + 1 if i < 25 else time_a + 2 if 25 <= i < 70 else time_a + 1 for i, time_a in enumerate(maintenance_times_a, start=1)]

# Define engine costs
engine_costs = [4 if i < 21 else 3 if 21 <= i < 31 else 2 if 31 <= i < 46 else 5 if 46 <= i < 81 else 6 for i in range(1, NUM_ENGINES + 1)]

# Dictionary to store the maintenance schedule
maintenance_schedule = {}

# Function to assign maintenance to an engine by a team on a specific day
def assign_maintenance(engine_index, day, team):
    if engine_index not in maintenance_schedule:
        maintenance_schedule[engine_index] = {}
    if day not in maintenance_schedule[engine_index]:
        maintenance_schedule[engine_index][day] = team
    else:
        # Handle case where another team is already assigned for the day
        print("Error: Another team is already assigned for this day.")

# Function to get the maintenance team assigned to an engine on a specific day
def get_maintenance_team(engine_index, day):
    if engine_index in maintenance_schedule and day in maintenance_schedule[engine_index]:
        return maintenance_schedule[engine_index][day]
    else:
        return None
    
    
# Function to calculate penalty cost for unmaintained engine/machine
def calculate_penalty_cost(engine_index, safety_due_date):
    print(f"Safety due date {safety_due_date} and planning horizon {PLANNING_HORIZON}")
    if safety_due_date >= PLANNING_HORIZON:
        return 0  # No penalty: the safety due date falls on or after the end of the planning horizon
    else:
        days_past_due = PLANNING_HORIZON - safety_due_date
        daily_cost = engine_costs[engine_index - 1] * (days_past_due ** 2)
        return min(daily_cost, MAX_DAILY_COST)  # Cap the daily cost at MAX_DAILY_COST


# Example usage
assign_maintenance(1, 1, "T1")
safety_due_date = 29  # Example RUL for engine 1 and Team 1
print(calculate_penalty_cost(1, safety_due_date))

Safety due date 29 and planning horizon 30
4


## First create a random solution as a starting point?

In [47]:

import random  # needed for shuffle/randint below

# Idea for defining the individual: individual = [id, penalty_cost, team, estimated_time]
def create_initial_randomized_solution():
    solution = []
    # Shuffle the engine IDs to randomize the order
    engine_ids = list(rul_df['id'])
    random.shuffle(engine_ids)
    # Initialize the counters for each team
    team_counters = {'A': 0, 'B': 0}
    next_available_day = None  # Initialize next_available_day outside the loop
    for engine_id in engine_ids:
        safety_due_date = rul_df.loc[rul_df['id'] == engine_id, 'RUL'].values[0]
        # Determine the team allocation
        team_allocated = 'A' if team_counters['A'] <= team_counters['B'] else 'B'
        team_counters[team_allocated] += 1
        # Check if the team limit is reached
        if team_counters[team_allocated] > NUM_TEAMS_A + NUM_TEAMS_B:
            # Find the next available day when one of the teams becomes available
            next_available_day = find_next_available_day(team_counters, NUM_TEAMS_A + NUM_TEAMS_B)
            # Calculate the team allocation for the next available day
            team_allocated = 'A' if team_counters['A'] <= team_counters['B'] else 'B'
            team_counters[team_allocated] += 1
        # Generate random start date within planning horizon
        start_of_the_repair = next_available_day if next_available_day else random.randint(1, PLANNING_HORIZON)
        # Calculate maintenance time based on the allocated team
        maintenance_time = maintenance_times_a[engine_id - 1] if team_allocated == 'A' else maintenance_times_b[engine_id - 1]
        # Calculate end date of repair
        end_of_the_repair = start_of_the_repair + maintenance_time - 1
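        # Calculate penalty cost value
        # NOTE: unlike calculate_penalty_cost, this does not return 0 when
        # safety_due_date >= PLANNING_HORIZON, so such engines hit the MAX_DAILY_COST cap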
        penalty_cost_value = min(engine_costs[engine_id - 1] * ((PLANNING_HORIZON - safety_due_date) ** 2), MAX_DAILY_COST)
        solution.append((engine_id, safety_due_date, penalty_cost_value, team_allocated, start_of_the_repair, end_of_the_repair))
        # Update team counters if the job is finished
        if start_of_the_repair + maintenance_time > PLANNING_HORIZON:
            team_counters[team_allocated] -= 1
    return pd.DataFrame(solution, columns=['engine_id', 'safety_due_date', 'penalty_cost_value', 'team_allocated', 'start_of_the_repair', 'end_of_the_repair'])

def find_next_available_day(team_counters, max_teams):
    available_day = None
    for day in range(1, PLANNING_HORIZON + 1):
        if sum(team_counters.values()) < max_teams:
            available_day = day
            break
        else:
            team_counters = {k: v - 1 for k, v in team_counters.items() if v > 0}
    return available_day

def calculate_penalty_cost_individual(df, column_name):
    
    # Check if the column exists in the DataFrame
    if column_name not in df.columns:
        print(f"Error: '{column_name}' column not found in the DataFrame.")
        return None
    
    # Calculate the sum of values in the specified column
    sum_value = df[column_name].sum()
    return sum_value


solution = create_initial_randomized_solution()
print(solution)
total = calculate_penalty_cost_individual(solution ,'penalty_cost_value')
print(total)

    engine_id  safety_due_date  penalty_cost_value team_allocated  \
0         100               24                 216              A   
1          36               24                  72              B   
2          54              126                 250              A   
3          48              151                 250              B   
4          22              141                 250              A   
..        ...              ...                 ...            ...   
95         65              152                 250              B   
96         73              113                 250              B   
97         16              101                 250              B   
98         17               52                 250              B   
99         91               29                   6              B   

    start_of_the_repair  end_of_the_repair  
0                     9                 16  
1                    24                 28  
2                    15             
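
Since the assignment only asks for a feasible solution, a random schedule like the one above could be checked with a small helper. The sketch below rests on assumptions that the notebook does not confirm: each team works on one engine at a time, a job must start and end within the planning horizon, and teams carry individual labels such as `A1` (the table above only distinguishes `A` and `B`). The third job in the example is made up to trigger a conflict.

```python
def find_conflicts(jobs, horizon):
    """jobs: list of (engine_id, team, start_day, end_day), days inclusive.
    Returns a list of problems; an empty list means no conflict was found."""
    conflicts = []
    by_team = {}
    for engine_id, team, start, end in jobs:
        if start < 1 or end > horizon or end < start:
            conflicts.append(("outside horizon", engine_id))
        by_team.setdefault(team, []).append((engine_id, start, end))
    for team_jobs in by_team.values():
        team_jobs.sort(key=lambda job: job[1])       # order by start day
        latest_end, latest_engine = 0, None
        for engine_id, start, end in team_jobs:
            if start <= latest_end:                  # starts before the team is free
                conflicts.append(("overlap", latest_engine, engine_id))
            if end > latest_end:
                latest_end, latest_engine = end, engine_id
    return conflicts

jobs = [(100, "A1", 9, 16), (36, "B1", 24, 28)]
print(find_conflicts(jobs, horizon=30))                          # []
print(find_conflicts(jobs + [(54, "A1", 15, 17)], horizon=30))   # [('overlap', 100, 54)]
```

Such a check could be used either to reject infeasible individuals or to add a large penalty to their fitness.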

# Starting with DEAP package
### If it is not installed, please run the command below to install the DEAP Python package

In [8]:
# pip install deap

Collecting deap
  Downloading deap-1.4.1.tar.gz (1.1 MB)
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'


In [45]:
import random
import pandas as pd
from deap import base, creator, tools, algorithms

# Define constants
NUM_ENGINES = len(rul_df)
NUM_TEAMS_A = 2
NUM_TEAMS_B = 2
PLANNING_HORIZON = 30
MAX_DAILY_COST = 250

def calculate_penalty_cost_individual(df, column_name):
    # Check if the column exists in the DataFrame
    if column_name not in df.columns:
        print(f"Error: '{column_name}' column not found in the DataFrame.")
        return None
    
    # Calculate the sum of values in the specified column
    sum_value = df[column_name].sum()
    return sum_value

# Define the DEAP problem
creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", pd.DataFrame, fitness=creator.FitnessMin)

# Define the evaluation function
def evaluate(individual):
    total_penalty_cost = calculate_penalty_cost_individual(individual, 'penalty_cost_value')
    return total_penalty_cost,

# Register the genetic operators
toolbox = base.Toolbox()
toolbox.register("attr_engine_id", random.randint, 1, NUM_ENGINES)
toolbox.register("attr_team", random.randint, 1, NUM_TEAMS_A + NUM_TEAMS_B)

# Define individual creation function
def create_individual():
    return pd.DataFrame(columns=['engine_id', 'safety_due_date', 'penalty_cost_value', 'team_allocated', 'start_of_the_repair', 'end_of_the_repair'])

toolbox.register("individual", tools.initRepeat, creator.Individual, toolbox.attr_team, n=PLANNING_HORIZON)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

toolbox.register("evaluate", evaluate)
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutUniformInt, low=1, up=NUM_TEAMS_A + NUM_TEAMS_B, indpb=0.05)
toolbox.register("select", tools.selTournament, tournsize=3)

# Define genetic algorithm parameters
population_size = 100
num_generations = 50
crossover_probability = 0.8
mutation_probability = 0.2

# Create an initial population
population = toolbox.population(n=population_size)

# Perform the genetic algorithm
for generation in range(num_generations):
    offspring = algorithms.varAnd(population, toolbox, cxpb=crossover_probability, mutpb=mutation_probability)
    fitnesses = toolbox.map(toolbox.evaluate, offspring)
    for ind, fit in zip(offspring, fitnesses):
        ind.fitness.values = fit
    population = offspring

# # Perform the genetic algorithm
# for generation in range(num_generations):
#     offspring = algorithms.varAnd(population, toolbox, cxpb=crossover_probability, mutpb=mutation_probability)
#     fitnesses = toolbox.map(toolbox.evaluate, offspring)
#     for ind, fit in zip(offspring, fitnesses):
#         ind.fitness.values = fit
#     population = toolbox.select(offspring + population, k=population_size)

# Retrieve the best individual and its fitness value
best_individual = tools.selBest(population, k=1)[0]
best_fitness = best_individual.fitness.values[0]


AttributeError: 'DataFrame' object has no attribute 'fitness'
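
A likely cause of this error is that `Individual` is derived from `pd.DataFrame`: pandas operations such as copying tend to return a plain `DataFrame`, which would drop the `fitness` attribute that DEAP attaches. DEAP's conventional setup uses a list-based individual, `creator.create("Individual", list, fitness=creator.FitnessMin)`, and a DataFrame can still be built from that list inside `evaluate`. The sketch below is a hand-written stand-in (not DEAP code, class names chosen for illustration) showing that a `list` subclass keeps its `fitness` attribute when cloned.

```python
import copy

class Fitness:
    """Minimal stand-in for a DEAP fitness object."""
    def __init__(self):
        self.values = ()

class Individual(list):
    """A list of genes that also carries a fitness attribute."""
    def __init__(self, genes=()):
        super().__init__(genes)
        self.fitness = Fitness()

ind = Individual([1, 4, 2, 3])
clone = copy.deepcopy(ind)          # cloning keeps the type and the attribute
clone.fitness.values = (42.0,)

print(type(clone).__name__, list(clone), clone.fitness.values)
print(ind.fitness.values)           # the original is unaffected: ()
```

Note also that the active loop above sets `population = offspring` without calling `toolbox.select`, so no selection pressure is applied; the commented-out variant does include the selection step.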