# Project Group - 33

Members: Amber van der Helm, Femke Tamsma, Merel Loman, Naomi Rottier, and Robin Karthaus

Student numbers: 5164303, 5122422, 4852982, 5496462, 5634563

# Research Objective

*Requires data modeling and quantitative research in Transport, Infrastructure & Logistics*

Bin packing is a classic Operations Research (OR) problem. It is related to the knapsack problem, but the two differ in their objective: the knapsack problem is a maximization problem in which a fixed-size knapsack is filled with the most valuable items, whereas in the bin packing problem all items must be assigned and the goal is to minimize the number of bins used.
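
For reference, the textbook integer-programming formulation of one-dimensional bin packing can be sketched as follows. The symbols are our own notation and are not taken from the dataset: $v_i$ is the volume of package $i$, $C$ the capacity of a van, $y_j$ indicates whether van $j$ is used, and $x_{ij}$ indicates whether package $i$ is assigned to van $j$, with $n$ packages and at most $m$ vans.

$$
\begin{aligned}
\min \quad & \sum_{j=1}^{m} y_j \\
\text{s.t.} \quad & \sum_{i=1}^{n} v_i \, x_{ij} \le C \, y_j, && j = 1, \dots, m \\
& \sum_{j=1}^{m} x_{ij} = 1, && i = 1, \dots, n \\
& x_{ij} \in \{0, 1\}, \quad y_j \in \{0, 1\}
\end{aligned}
$$

The first constraint keeps the load of every used van within its capacity, and the second ensures that every package is assigned to exactly one van.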

Bin packing problems occur throughout the supply chain, from assigning cargo to airplanes or containers to assigning packages to trucks.

The objective of this research is to build a model that optimizes the allocation of packages to vehicles. The first version assumes a single vehicle type, so all vehicles share the same characteristics. This model will then be extended to handle different vehicle types and, eventually, to take the emissions of the vehicles into account. Further extensions can make the model more realistic. One example is to split the day into time slots and allocate only the packages from selected time slots, instead of all packages of a day at once.

The "Amazon Last Mile Routing Challenge Dataset" will be used. It contains the dimensions of the packages being delivered. The data will first be imported into Python and converted from JSON to CSV. Missing values will either be deleted or replaced by the column average, depending on the outcome of the descriptive analysis. To simplify the model, the three dimensions (length, width, height) will be combined into a new column, "volume", in cm3. Since the capacity of the vans is also given in cm3, package volumes can be compared directly with van capacity in the bin packing algorithm. After these first processing steps, the data will be checked for outliers, which will be removed if necessary. Descriptive statistics and corresponding visualizations (box plots, normal distributions, histograms, scatter plots, etc.) will give a general view of the data.
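
The preprocessing steps described above could look roughly like the sketch below. It uses a small made-up table instead of the real data; the column names mirror those created in the Data Pipeline section, while `VolumeCM3` and the 1.5 * IQR outlier rule are our own assumptions and only one of several possible choices.

```python
import pandas as pd

# Toy stand-in for the package table; the values are made up.
packages = pd.DataFrame({
    "DepthCM": [30.0, 25.4, None, 40.0],
    "HeightCM": [10.0, 7.6, 5.0, 20.0],
    "WidthCM": [20.0, 17.8, 12.0, 30.0],
})

dimension_cols = ["DepthCM", "HeightCM", "WidthCM"]

# Option A: delete rows with a missing dimension.
dropped = packages.dropna(subset=dimension_cols)

# Option B: replace missing values by the column average.
filled = packages.fillna(packages[dimension_cols].mean())

# Volume in cm3 as the product of the three dimensions.
filled["VolumeCM3"] = filled[dimension_cols].prod(axis=1)

# Flag outliers with the 1.5 * IQR rule (an assumed rule, not a fixed choice).
q1, q3 = filled["VolumeCM3"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (filled["VolumeCM3"] < q1 - 1.5 * iqr) | (filled["VolumeCM3"] > q3 + 1.5 * iqr)
```

Which of the two missing-value options is appropriate depends on how many values are missing, which the descriptive analysis should show.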

Once the data is cleaned and a general view of it is formed, the algorithm will be written. It is a bin packing algorithm that optimizes the allocation of packages to vans, that is, the loading of the vans. It will be developed in steps. First, many simplifying assumptions are made to keep the algorithm as simple as possible (for example, only one van can be loaded at a time). Once the model runs correctly, it will be made more complex by adding constraints and making fewer assumptions. Each "complexity step" will be evaluated separately to show how much it improves the loading. A visualization will show the decrease in the total number of vans needed per increase in model complexity.
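
As an illustration of what the simplest step could look like, the sketch below implements first-fit decreasing, a well-known bin packing heuristic. It is not necessarily the algorithm we will end up using, it does not guarantee an optimal solution, and the function name, the example volumes, and the capacity are made up.

```python
def first_fit_decreasing(volumes, capacity):
    """Assign volumes to vans with the first-fit decreasing heuristic.

    Returns a list of vans, each given as the list of volumes loaded into it.
    """
    remaining = []  # free capacity left in each opened van
    loads = []      # volumes assigned to each opened van

    # Handle the largest packages first.
    for volume in sorted(volumes, reverse=True):
        if volume > capacity:
            raise ValueError(f"Package of volume {volume} exceeds van capacity")
        # Put the package in the first opened van that still has room.
        for idx, free in enumerate(remaining):
            if volume <= free:
                remaining[idx] -= volume
                loads[idx].append(volume)
                break
        else:
            # No opened van has room, so open a new one.
            remaining.append(capacity - volume)
            loads.append([volume])
    return loads


# Hypothetical example: six packages and vans with capacity 10.
example_volumes = [4, 8, 1, 4, 2, 1]
vans = first_fit_decreasing(example_volumes, capacity=10)
print(len(vans), vans)
```

A heuristic like this gives a quick baseline for the number of vans, against which later, more complex model steps can be compared.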

The research question of this assignment is: "Is it possible to optimize the allocation of the packages in the Amazon dataset to vans using a bin packing algorithm, where model complexity is increased in steps and the effect of these steps is shown with advanced visualization?"

# Contribution Statement

*Be specific. Some of the tasks can be coding (expect everyone to do this), background research, conceptualisation, visualisation, data analysis, data modelling*

**Author 1** (Amber): converting data to CSV

**Author 2** (Femke): visualization possibilities 

**Author 3** (Merel): converting data to CSV

**Author 4** (Naomi): checking other projects and RQs

**Author 5** (Robin): visualization possibilities

# Data Used

We use two data files from Amazon to obtain the dimensions of the packages and the capacity of the delivery vans. The documentation can be found here: https://github.com/MIT-CAVE/rc-cli/blob/main/templates/data_structures.md

In [None]:
import pandas as pd
import json

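# Load the raw package data (nested as route -> stop -> package)
json_file_path_package = 'package_data.json'

with open(json_file_path_package, 'r') as file:
    raw_packages = json.load(file)

# Load the raw route data into its own variable so the package data is not overwritten
json_file_path_route = 'route_data.json'

with open(json_file_path_route, 'r') as file:
    raw_routes = json.load(file)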

# Data Pipeline

The data pipeline is a series of steps that cleans the raw data and then processes and stores it in a form we can use for our research question. First, the JSON data of the packages is processed.

In [None]:
import pandas as pd
import json

json_file_path = 'package_data.json'

with open(json_file_path, 'r') as file:
    data = json.load(file)

# Initialize empty lists for each column
package_ids = []
scan_statuses = []
start_times = []
end_times = []
service_times = []
depths = []
heights = []
widths = []
stop_ids = []
route_ids = []

# Iterate through the JSON data to extract information
for route_id, route_data in data.items():
    for stop_id, stop_data in route_data.items():
        for package_id, package_data in stop_data.items():
            package_ids.append(package_id)
            scan_statuses.append(package_data.get("scan_status", ""))
            time_window = package_data.get("time_window", {})
            start_times.append(time_window.get("start_time_utc", ""))
            end_times.append(time_window.get("end_time_utc", ""))
            service_times.append(package_data.get("planned_service_time_seconds", ""))
            dimensions = package_data.get("dimensions", {})
            depths.append(dimensions.get("depth_cm", ""))
            heights.append(dimensions.get("height_cm", ""))
            widths.append(dimensions.get("width_cm", ""))
            stop_ids.append(stop_id)
            route_ids.append(route_id)

# Create a pandas DataFrame from the extracted data
df = pd.DataFrame({
    "PackageID": package_ids,
    "ScanStatus": scan_statuses,
    "StartTimeUTC": start_times,
    "EndTimeUTC": end_times,
    "PlannedServiceTimeSeconds": service_times,
    "DepthCM": depths,
    "HeightCM": heights,
    "WidthCM": widths,
    "StopID": stop_ids,
    "RouteID": route_ids
})

# Display the resulting DataFrame
df.head(5)


Now that the raw package data has been flattened into a table, the second JSON file is processed in the same way to obtain the route data.

In [None]:
import pandas as pd
import json

json_file_path_route = 'route_data.json'

with open(json_file_path_route, 'r') as file:
    data = json.load(file)
    
# Initialize empty lists for each column
route_ids = []
station_codes = []
dates = []
departure_times = []
executor_capacities = []
route_scores = []
stop_ids = []
lats = []
lngs = []
types = []
zone_ids = []

# Iterate through the JSON data to extract information
for route_id, route_data in data.items():
    station_code = route_data.get("station_code", "")
    date = route_data.get("date_YYYY_MM_DD", "")
    departure_time_utc = route_data.get("departure_time_utc", "")
    executor_capacity_cm3 = route_data.get("executor_capacity_cm3", "")
    route_score = route_data.get("route_score", "")
    
    stops = route_data.get("stops", {})
    for stop_id, stop_data in stops.items():
        lat = stop_data.get("lat", "")
        lng = stop_data.get("lng", "")
        stop_type = stop_data.get("type", "")
        zone_id = stop_data.get("zone_id", "")
        
        route_ids.append(route_id)
        station_codes.append(station_code)
        dates.append(date)
        departure_times.append(departure_time_utc)
        executor_capacities.append(executor_capacity_cm3)
        route_scores.append(route_score)
        stop_ids.append(stop_id)
        lats.append(lat)
        lngs.append(lng)
        types.append(stop_type)
        zone_ids.append(zone_id)

# Create a pandas DataFrame from the extracted data
df = pd.DataFrame({
    "RouteID": route_ids,
    "station_code": station_codes,
    "date_YYYY_MM_DD": dates,
    "departure_time_utc": departure_times,
    "executor_capacity_cm3": executor_capacities,
    "route_score": route_scores,
    "stop_id": stop_ids,
    "lat": lats,
    "lng": lngs,
    "type": types,
    "zone_id": zone_ids
})

# Display the resulting DataFrame
df.head(5)
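
One way the two tables might be combined is sketched below: the package volumes are summed per route and compared with the van capacity of that route. The data is made up, the column names follow the two DataFrames built above, and `VolumeCM3` and `FillRate` are hypothetical names introduced here for illustration.

```python
import pandas as pd

# Toy stand-ins for the package and route tables; the values are made up.
packages = pd.DataFrame({
    "RouteID": ["R1", "R1", "R2"],
    "DepthCM": [30.0, 40.0, 50.0],
    "HeightCM": [10.0, 20.0, 10.0],
    "WidthCM": [20.0, 30.0, 40.0],
})
routes = pd.DataFrame({
    "RouteID": ["R1", "R1", "R2"],
    "stop_id": ["AA", "AB", "AA"],
    "executor_capacity_cm3": [3_000_000.0, 3_000_000.0, 3_000_000.0],
})

# Total package volume per route.
packages["VolumeCM3"] = packages["DepthCM"] * packages["HeightCM"] * packages["WidthCM"]
volume_per_route = packages.groupby("RouteID")["VolumeCM3"].sum()

# The route table has one row per stop, so keep one capacity value per route.
capacity_per_route = (
    routes.drop_duplicates("RouteID")
    .set_index("RouteID")["executor_capacity_cm3"]
)

# Share of the van capacity that is used on each route.
fill_rate = (volume_per_route / capacity_per_route).rename("FillRate")
print(fill_rate)
```

A fill rate well below 1 on many routes would suggest that fewer vans could carry the same packages, which is what the bin packing model is meant to explore.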
