# MODEL TRAINING

## Libraries, Constants, Functions

Libraries, constants, and helper functions are imported from separate .py files so that the same code can be reused across multiple notebooks.

In [None]:
from libraries import *
from constants import *
from functions import *

## Preparing data for calculations

AED data, interventions data, and potential AED locations are loaded for each city and stored in dictionaries. For interventions, the train data is loaded, because the goal is to find the optimal AED coverage for each specific city.

In [None]:
aeds = {}
cards = {}
possible_locations = {}

for city in cities:
    os.chdir(data_path + clean_path)
    aeds[city] = pd.read_csv(city + "_aeds.csv")
    cards[city] = pd.read_csv(city + "_cards_train.csv")

    os.chdir(data_path + possible_locations_path)
    possible_locations[city] = pd.read_csv(city + "_possible_locations.csv")

The max coverage algorithm requires a flag for every location stating whether that location must be included in the final AED placement. Here the flag is stored in a column named "mandatory".

As privately owned AEDs cannot be relocated, their mandatory value is set to 1, indicating that these locations must be included in the final AED placement. Public AEDs can be relocated, so their mandatory value is 0, indicating optional inclusion. AEDs with a missing `public` value are treated as private.

For potential locations, the mandatory value is set to 0, as these locations are simply possibilities for AED placement.

In [None]:
predefined_lists = {}

for city in cities:
    aeds_df = aeds[city]
    possible_locations_df = possible_locations[city]
    
    # Missing 'public' values are treated as private (not relocatable)
    aeds_df['public'] = aeds_df['public'].fillna(0)
    # Private AEDs (public == 0) are mandatory; public ones are optional
    aeds_df['public'] = ~aeds_df['public'].astype(bool)
    aeds_df = aeds_df.rename(columns={'public': 'mandatory'})
    
    # Candidate locations are never mandatory
    possible_locations_df['mandatory'] = False

    predefined_list_df = pd.concat([aeds_df['mandatory'], possible_locations_df['mandatory']], ignore_index = True)
    predefined_list_np = predefined_list_df.to_numpy().flatten()
    predefined_lists[city] = predefined_list_np
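
To make the flag construction concrete, here is a minimal sketch on made-up data. The two toy frames (`toy_aeds`, `toy_candidates`) and their values are assumptions for illustration only; just the `public` and `mandatory` column names come from this notebook.

```python
import pandas as pd

# Toy stand-ins for aeds[city] and possible_locations[city] (values are made up).
toy_aeds = pd.DataFrame({"public": [1.0, None, 0.0]})
toy_candidates = pd.DataFrame({"geometry": ["a", "b"]})

# Missing 'public' values are treated as private, i.e. not relocatable.
is_public = toy_aeds["public"].fillna(0).astype(bool)
aed_mandatory = ~is_public  # private AEDs must stay where they are

# Candidate locations are always optional.
candidate_mandatory = pd.Series(False, index=toy_candidates.index)

# Existing AEDs first, candidates second: the same order as the combined locations.
toy_predefined = pd.concat(
    [aed_mandatory, candidate_mandatory], ignore_index=True
).to_numpy()
print(toy_predefined)
```

The order of the flags matters: it has to match the order in which existing AEDs and candidate locations are later concatenated, otherwise a flag would be applied to the wrong location.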

Only the latitude and longitude columns are retained, as nothing else is needed for calculating the cost matrices. The data is then combined to obtain a complete list of all possible AED locations, including both existing and potential ones.

In [None]:
combined_locations = {}

for city in cities:
    aeds[city] = aeds[city][['latitude', 'longitude']]
    possible_locations[city][['latitude', 'longitude']] = possible_locations[city]['geometry'].apply(
        lambda x: pd.Series(get_coordinates_from_geometry(x))
    )
    possible_locations[city] = possible_locations[city][['latitude', 'longitude']]
    
    cards[city] = cards[city][['latitude', 'longitude']]
    combined_locations[city] = pd.concat([aeds[city], possible_locations[city]], ignore_index=True)

## Calculating cost matrices

Because free credits for the Google API are limited, the number of distance calculations had to be reduced significantly. Instead of calculating the exact distance between each cardiac arrest and each AED, only the distances to its 10 closest AEDs are computed. First, the Euclidean distance between each pair is calculated; then the 10 closest AEDs are flagged with a value of 1, indicating that the exact calculation needs to be performed using Google's Distance Matrix API.

For each city, the user is prompted to confirm before the calculations run, to avoid unwanted costs.

In [None]:
cost_matrices = {}

for city in cities:
    print("Current city: " + city)
    
    cost_matrices[city] = get_cost_matrix(cards, combined_locations, city, CLOSEST_AEDS)
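
The pre-filtering step described above can be sketched as follows. This is not the code of `get_cost_matrix`, which lives in the functions file and is not shown here; `flag_closest` and the toy coordinates are hypothetical, and the sketch assumes `k` is at least 1 and that there is at least one AED.

```python
import numpy as np

def flag_closest(arrests, aeds, k):
    """Return a 0/1 matrix (arrests x AEDs) marking each arrest's k nearest AEDs.

    Distances are plain Euclidean distances on the raw coordinate pairs.
    They are used only for ranking, not as real travel distances.
    """
    arrests = np.asarray(arrests, dtype=float)
    aeds = np.asarray(aeds, dtype=float)
    k = min(k, len(aeds))
    # Pairwise differences via broadcasting: shape (n_arrests, n_aeds, 2).
    diff = arrests[:, None, :] - aeds[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # Indices of the k smallest distances in each row (unordered within the k).
    nearest = np.argpartition(dist, k - 1, axis=1)[:, :k]
    flags = np.zeros(dist.shape, dtype=int)
    np.put_along_axis(flags, nearest, 1, axis=1)
    return flags

# Toy coordinates (made up): 2 arrests, 4 AED locations, keep the 2 closest.
toy_arrests = [[0, 0], [10, 10]]
toy_aeds = [[0, 1], [0, 2], [9, 9], [20, 20]]
flags = flag_closest(toy_arrests, toy_aeds, k=2)
print(flags)
```

With this scheme the number of exact distance requests grows with the number of cardiac arrests times `k` rather than with the full number of arrest-location pairs.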

## Max Coverage algorithm

Employing the algorithm for each city.

Implementing it requires several arguments, including:
- Cost matrix, mandatory column, and the number of AEDs in that city, all of which were calculated in the previous part of this notebook;
- Weights, which are set to 1 for simplicity;
- Coverage radius, which is set to 150, meaning the algorithm maximizes the number of cardiac arrests that have an AED within 150 m.

The solution is exported as one true/false indicator per location in the combined dataset (current AEDs + all possible locations). The number of "true" values equals the number of current AEDs, since the existing AEDs are only being rearranged and no new ones are added.
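
To illustrate what these arguments mean, here is a minimal brute-force sketch of the maximal coverage problem on a tiny made-up instance. It is not the solver used in this notebook: `brute_force_mclp` is a hypothetical helper that simply tries every combination, which is only feasible for a handful of locations. It assumes the number of mandatory locations does not exceed `p`, and it counts a demand point as covered when its distance is at most the radius.

```python
from itertools import combinations

def brute_force_mclp(cost, mandatory, p, radius, weights=None):
    """Exhaustively pick p facilities maximizing the weighted demand covered.

    cost[i][j] is the distance from demand point i to facility j.
    mandatory[j] is True when facility j must be part of the solution.
    """
    n_demand, n_fac = len(cost), len(cost[0])
    weights = weights or [1] * n_demand
    fixed = [j for j in range(n_fac) if mandatory[j]]
    free = [j for j in range(n_fac) if not mandatory[j]]
    best_value, best_choice = -1, None
    # Mandatory facilities are always kept; only the remaining slots vary.
    for extra in combinations(free, p - len(fixed)):
        chosen = fixed + list(extra)
        covered = sum(
            w for i, w in enumerate(weights)
            if any(cost[i][j] <= radius for j in chosen)
        )
        if covered > best_value:
            best_value, best_choice = covered, sorted(chosen)
    return best_choice, best_value

# Toy instance (made-up distances in metres): 4 cardiac arrests, 4 locations.
toy_cost = [
    [100, 400, 500, 900],
    [500, 300, 140, 800],
    [900, 700, 120, 600],
    [900, 700, 600, 130],
]
toy_mandatory = [True, False, False, False]  # location 0 cannot be moved
choice, covered = brute_force_mclp(toy_cost, toy_mandatory, p=2, radius=150)
print(choice, covered)
```

In this toy instance, location 0 is forced into the solution and the single remaining slot goes to the location that covers the most additional arrests.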

In [None]:
os.chdir(data_path + optimal_indicators_path)

for city in cities:
    print("Optimizing " + city + "...\n")
    
    mclp = MCLP.from_cost_matrix(cost_matrix = cost_matrices[city].to_numpy(),
                                 predefined_facilities_arr = predefined_lists[city],
                                 weights = np.ones(cost_matrices[city].shape[0]),
                                 p_facilities = len(aeds[city]),
                                 service_radius = COVERAGE_RADIUS)

    mclp = mclp.solve(pulp.PULP_CBC_CMD(msg=False))

    # A location is selected when its binary decision variable equals 1
    facility_status = [variable.varValue == 1 for variable in mclp.fac_vars]

    optimal_indicators = pd.DataFrame({'SelectionStatus': facility_status})
    optimal_indicators.to_csv(f'{city}_optimal_indicators.csv', index = False)
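
Before relying on the exported indicators, it can be worth checking that they match the properties stated above. The following is a minimal sketch; `check_selection` is a hypothetical helper, not part of the functions file, and the toy values are made up.

```python
def check_selection(selected, mandatory, p):
    """Return a list of problems found in a solution (empty if it looks consistent)."""
    problems = []
    if len(selected) != len(mandatory):
        problems.append("length mismatch between selection and mandatory flags")
    n_selected = sum(bool(s) for s in selected)
    if n_selected != p:
        problems.append(f"expected {p} selected locations, found {n_selected}")
    # Every mandatory location has to appear in the selection.
    dropped = [j for j, (s, m) in enumerate(zip(selected, mandatory)) if m and not s]
    if dropped:
        problems.append(f"mandatory locations not selected: {dropped}")
    return problems

# Toy check (made-up values): 2 AEDs to place, location 0 is mandatory.
ok = check_selection([True, False, True, False], [True, False, False, False], p=2)
bad = check_selection([False, True, True, False], [True, False, False, False], p=2)
print(ok)
print(bad)
```

In the notebook, such a check could be called per city with the `SelectionStatus` column, the matching entry of `predefined_lists`, and the number of current AEDs.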