Our results on small to moderately sized problem instances (n = 10, 20, 30, 50, 70, 80, 100) are as expected and consistent with the performance characteristics of both **DP** and **GA** documented in the literature. As anticipated, the **DP** algorithm was very fast on the small instances (n = 10, 20, 30), with a runtime under 0.1 seconds for up to 50 items. This efficiency can be attributed to **DP**'s pseudo-polynomial time complexity, which remains manageable when the number of items is relatively small. In comparison, the **GA** had a runtime close to 1 second for up to 50 items, roughly 10 times longer than **DP**. This longer runtime can be attributed to the overhead inherent in the **GA** process, including population initialization and the iterative evolutionary steps of selection, crossover, and mutation. While **DP** directly computes the optimal solution, **GA** explores a population of candidate solutions over multiple generations and returns a near-optimal result.
The execution time of DP grows super-linearly with the number of items (its cost is pseudo-polynomial, not exponential): for instances close to 100 items we observed a running time close to 0.3 seconds. The runtime of GA changed much less as the number of items increased. It took on the order of a few tenths of a second for 50 items and scaled roughly linearly (e.g., ~0.5-1 second by 150 items, depending on parameters). In contrast, the runtime of DP changed significantly, growing by a factor of about 5.
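
To make the pseudo-polynomial cost concrete, here is a minimal sketch of a 0/1 knapsack DP. It is not the `solve_knapsack_dp` used in the experiments (that implementation is not shown here); the name `knapsack_dp_sketch` is hypothetical, and integer weights are assumed. The two nested loops show where the O(n · capacity) running time comes from.

```python
def knapsack_dp_sketch(weights, values, capacity):
    """Return the best total value for a 0/1 knapsack (integer weights assumed)."""
    # best[c] = best value achievable with total weight at most c
    best = [0] * (capacity + 1)
    for w, v in zip(weights, values):          # n iterations
        # Go through capacities downwards so each item is used at most once.
        for c in range(capacity, w - 1, -1):   # up to capacity iterations
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]


# Items (weight, value): (1, 1), (3, 4), (4, 5), (5, 7); capacity 7.
print(knapsack_dp_sketch([1, 3, 4, 5], [1, 4, 5, 7], 7))  # prints 9
```

The work is proportional to the number of items times the capacity, so the runtime depends on the numeric value of the capacity and not only on the number of items.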

The comparison on moderate and large problem instances (n = 200, 300, 400, 500, 600) showed a significant shift in the relative advantages of the two algorithms. The crossover occurred between 150 and 180 items, where DP’s runtime started to exceed GA’s. As the number of items grows further, DP’s curve continues to rise super-linearly, while GA’s grows more gently, in proportion to the number of fitness evaluations. For large instances with $n$ up to 600, the runtime of the DP algorithm grew rapidly and the gap reached a factor of up to 6. This means that for very large problems GA has a clear runtime advantage. In scenarios with extremely large capacity or item count (where DP might take hours or not fit in memory), a GA can still run and produce an answer within a feasible time.
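
The crossover can be made plausible with a rough cost model. This is a sketch under assumptions: the capacity $W$ is taken to be a fixed fraction $c$ of the total weight (as in the instance generator used for these experiments), $\bar{w}$ denotes the mean item weight, $P$ the population size, and $G$ the number of generations. Constant factors are ignored.

$$
T_{\mathrm{DP}} = \Theta(nW), \qquad W \approx c\,n\,\bar{w} \;\Rightarrow\; T_{\mathrm{DP}} = \Theta\!\left(c\,\bar{w}\,n^{2}\right)
$$

$$
T_{\mathrm{GA}} = \Theta(P\,G\,n), \qquad \frac{T_{\mathrm{DP}}}{T_{\mathrm{GA}}} = \Theta\!\left(\frac{c\,\bar{w}\,n}{P\,G}\right)
$$

Under this model the ratio grows linearly in $n$ for fixed $P$ and $G$, so DP is eventually overtaken by GA on runtime. Where exactly that happens depends on the constants and on the GA parameters.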
**Solution quality**
In our trials, GA usually achieved near-optimal results:
- For very small instances of up to 20 items, **GA** found the optimal solution (100% of optimum) or something very close (within 1-2% of optimum). For instances of up to 50 items, **GA** fell to within around 5% of the optimum, and the gap grew with instance size, up to 10% at 100 items.
- For moderate and larger instances (100+ items), our GA (with fixed parameters) fell short of the optimum. For example, GA might achieve ~95–98% of the optimal value on average. In one 150-item test, the best GA run reached about 97% of the optimal value, missing a few low-value items that DP managed to pack. As the problem size grows, GA’s solution quality can dip below 100% but stays high (in this illustration, above ~95% for up to 150 items).
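
The percentages above are ratios of the GA value to the DP optimum. A minimal sketch of that calculation follows; the helper names are hypothetical and not taken from the notebook.

```python
def quality_ratio(ga_value, dp_value):
    """Fraction of the DP optimum reached by one GA run (1.0 means optimal)."""
    if dp_value <= 0:
        raise ValueError("dp_value must be positive to form a ratio")
    return ga_value / dp_value


def mean_quality(ga_values, dp_value):
    """Average quality ratio over several GA runs on the same instance."""
    ratios = [quality_ratio(v, dp_value) for v in ga_values]
    return sum(ratios) / len(ratios)
```

Averaging over several runs matters because the GA is stochastic: a single run can land noticeably above or below its typical quality.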

The GA found optimal or near-optimal solutions for very small instances, but for large instances the GA solutions in our experiments fell around 20% short of the optimum. By increasing the number of generations, we could improve the odds of reaching the optimum at the cost of more runtime. Conversely, fewer generations or a smaller population might degrade solution quality. This highlights the tunable nature of GA: you can trade more time for a better solution.

In contrast, DP always finds the best solution but cannot trade off quality for speed – it either solves optimally or is infeasible to run. Additionally, GA’s heuristic nature means it doesn't guarantee feasibility at all times; however, our design giving 0 fitness to overweight solutions effectively kept the population feasible after the initial generation or two. In some runs we observed that early generations have many overweight (invalid) individuals, but they quickly die out as they have zero fitness. 
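
The zero-fitness rule for overweight solutions described above can be sketched as follows. This is an illustrative version, not the notebook's own fitness function; the name `zero_penalty_fitness` is hypothetical and individuals are assumed to be 0/1 vectors.

```python
import numpy as np


def zero_penalty_fitness(individual, weights, values, capacity):
    """Total value of the selected items, or 0 if the selection is overweight."""
    individual = np.asarray(individual)
    total_weight = int(np.dot(weights, individual))
    if total_weight > capacity:
        # Infeasible individuals get the worst possible score, so selection
        # removes them from the population within a few generations.
        return 0
    return int(np.dot(values, individual))
```

One design consequence is that all overweight individuals look equally bad, whether they exceed the capacity by one unit or by hundreds, so this rule gives the search no gradient back towards feasibility.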

The knapsack problem is NP-hard, and its decision version is NP-complete.


Population Size vs. Generations (Time budget trade‐off):
A huge population (≥ 1 000) uses up runtime quickly, but if you only run ~500–1 000 generations, you see no net quality gain compared to a smaller population run for more generations.
A medium population (≈ 500) paired with ~1 000–2 000 generations seems to be the sweet spot for the best average quality under ~1–2 seconds of runtime.
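
One way to reason about this trade-off is in terms of a fixed budget of fitness evaluations, roughly population size times generations. The sketch below is illustrative only: the helper name and the budget of one million evaluations are assumptions, not values measured in the experiments.

```python
def generations_for_budget(eval_budget, population_size):
    """Number of full generations that fit into a fixed evaluation budget."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    return eval_budget // population_size


# Hypothetical budget of 1,000,000 fitness evaluations.
for pop in (100, 500, 1000, 3000):
    print(pop, generations_for_budget(1_000_000, pop))
```

With the same budget, a population of 500 gets 2 000 generations while a population of 3 000 gets only 333, which is why very large populations tend to stop before they have converged.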

Mutation Rate (Exploration vs. Exploitation):
μ≈ 0.001–0.01 is low enough to let the GA preserve good partial solutions (i.e. it exploits).
μ≥ 0.05 quickly randomizes everything each generation, which prevents fine‐tuning and drags your quality down.
Because the runtime differences are small (0.13 s vs. 0.10 s), the low-mutation regime (∼0.001–0.01) is almost always the better choice unless you have a strong reason to keep re-diversifying every generation.
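
The effect of the mutation rate can be quantified with a short calculation. The sketch assumes independent per-bit flips with probability μ; the function names and the choice of 100 items are for illustration only.

```python
def expected_flips(n_items, mutation_rate):
    """Expected number of bits flipped in one individual per mutation step."""
    return n_items * mutation_rate


def prob_unchanged(n_items, mutation_rate):
    """Probability that an individual passes through mutation untouched."""
    return (1.0 - mutation_rate) ** n_items


for mu in (0.001, 0.01, 0.05, 0.10):
    print(mu, expected_flips(100, mu), round(prob_unchanged(100, mu), 4))
```

For 100 items, μ = 0.001 flips about 0.1 bits per individual on average and leaves roughly 90% of individuals untouched, while μ = 0.05 flips about 5 bits and leaves well under 1% untouched, which is consistent with the loss of fine-tuning described above.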

Typical “Golden Set” (for your data):
- Population ≈ 500
- Mutation ≈ 0.001–0.01
- Generations ≈ 1 000–2 000

This yields a runtime of around 0.5–1.0 seconds with average solution quality ∼0.82–0.90. Pushing the number of generations all the way to 3 000 can give you ∼0.865 if you can afford ~1.5 s of runtime.

Mutation Rate:

Very low rates (0.001–0.01) preserve good solutions → best quality (≈ 0.82–0.83 with only 500 gens).

Higher rates (0.05–0.10) inject too much noise → quality falls all the way to ~0.74–0.77, even if you give them more runtime.

Generations:

With very few generations (100–200), quality is ~0.80–0.82.

By ~500 generations, you hit ~0.83.

1 000–2 000 generations let the GA refine further → ~0.82–0.87 (with noise around 0.80–0.83 if you don’t have enough mutation/exploration).

Beyond 2 000–3 000 generations, quality still climbs, but at a slowing rate: each extra 500–1 000 generations adds only a few percent to your average.



With a population of 3 000, you are spreading your 1 000 generations over 3 000 individuals per generation (instead of 500), so an individual with a decent solution has fewer chances to propagate its genes over many iterations. As a result, beyond pop ≈ 500 you see diminishing returns in solution quality (indeed a slight dip at pop = 1 000 in your plot) before it more or less plateaus somewhere around 0.88.

*Although bigger populations give more raw material, they do not change the search space; they make each generation more expensive. If you keep the same number of generations, your algorithm effectively spreads its effort over more individuals and may not converge as tightly by the same generation count.*

*It can also be just sampling noise: at very large population sizes, you sometimes need more generations or carefully tuned crossover/mutation schedules to actually polish the top-end solutions. In your data, populations of 1 000+ didn’t yield better average results unless other parameters were adjusted as well.*

In [7]:
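import time

import numpy as np

# NOTE: solve_knapsack_dp and solve_knapsack_ga are assumed to be defined in earlier cells.

def generate_knapsack_instance(n, weight_range=(1,100), value_range=(1,100), capacity_factor=0.5):
    # Draw integer weights and values uniformly from the inclusive ranges;
    # the capacity is a fixed fraction of the total weight.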
    weights = np.random.randint(weight_range[0], weight_range[1]+1, size=n)
    values  = np.random.randint(value_range[0], value_range[1]+1, size=n)
    capacity = int(weights.sum() * capacity_factor)
    return weights, values, capacity

def run_dp(weights, values, capacity):
    t0 = time.perf_counter()
    value, selected = solve_knapsack_dp(weights, values, capacity)
    return value, selected, time.perf_counter() - t0

def run_ga(weights, values, capacity, params):
    t0 = time.perf_counter()
    value, selected = solve_knapsack_ga(
        weights, values, capacity,
        population_size=params['population_size'],
        num_generations=params['num_generations'],
        mutation_rate=params['mutation_rate'],
        tournament_size=params['tournament_size'],
        elitism=params['elitism']
    )
    return value, selected, time.perf_counter() - t0
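
Because the generator above draws from NumPy's global random state, two benchmark runs will generally see different instances. A seeded variant is one way to make DP/GA comparisons repeatable. This is a sketch under assumptions: the function name and the `seed` parameter are hypothetical additions, not part of the notebook.

```python
import numpy as np


def generate_knapsack_instance_seeded(n, weight_range=(1, 100), value_range=(1, 100),
                                      capacity_factor=0.5, seed=None):
    """Same sampling scheme as generate_knapsack_instance, but reproducible."""
    rng = np.random.default_rng(seed)
    # Generator.integers excludes the upper bound by default, hence the +1.
    weights = rng.integers(weight_range[0], weight_range[1] + 1, size=n)
    values = rng.integers(value_range[0], value_range[1] + 1, size=n)
    capacity = int(weights.sum() * capacity_factor)
    return weights, values, capacity


# The same seed yields the same instance, so both solvers can be timed on identical data.
w1, v1, c1 = generate_knapsack_instance_seeded(50, seed=42)
w2, v2, c2 = generate_knapsack_instance_seeded(50, seed=42)
print(np.array_equal(w1, w2), np.array_equal(v1, v2), c1 == c2)
```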

In [None]:
import random

import numpy as np


def solve_knapsack_ga_hy(weights, values, capacity,
                      population_size=200,
                      num_generations=500,
                      base_mutation_rate=0.05,
                      tournament_size=5,
                      elitism=True,
                      penalty_coef=1.0):
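    # Accept plain lists as well: the element-wise products below need NumPy arrays.
    weights = np.asarray(weights)
    values = np.asarray(values)
    n = len(weights)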

    def fitness(ind):
        w = (weights * ind).sum()
        v = (values * ind).sum()
        if w <= capacity:
            return v
        else:
            return v - penalty_coef * (w - capacity)

    def greedy_init():
        ind = np.zeros(n, dtype=int)
        remaining = capacity
        for i in np.argsort(-values / weights):
            if weights[i] <= remaining:
                ind[i] = 1
                remaining -= weights[i]
        return ind

    def create_random():
        return np.random.randint(0, 2, size=n)

    def tournament(pop, fits):
        aspirants = random.sample(range(len(pop)), tournament_size)
        best = max(aspirants, key=lambda i: fits[i])
        return pop[best].copy()

    def crossover(p1, p2):
        mask = np.random.rand(n) < 0.5
        return np.where(mask, p1, p2), np.where(mask, p2, p1)

    def repair(ind):
        overweight = (weights * ind).sum() - capacity
        if overweight <= 0:
            return ind
        ratios = values / weights
        for i in np.argsort(ratios):
            if ind[i] == 1:
                ind[i] = 0
                overweight -= weights[i]
                if overweight <= 0:
                    break
        return ind

    def mutate(ind, curr_rate):
        for i in range(n):
            if random.random() < curr_rate:
                ind[i] ^= 1
        return ind

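    # initialize population: one greedy, rest random
    # The random individuals are repaired as well; otherwise an overweight individual
    # with a high penalized fitness could be recorded (and returned) as the best solution.
    pop = [greedy_init()] + [repair(create_random()) for _ in range(population_size - 1)]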
    best_val = -1
    best_ind = None
    history = []

    for gen in range(num_generations):
        fits = [fitness(ind) for ind in pop]
        history.append((max(fits), np.mean(fits)))

        # update global best
        current_best = max(fits)
        if current_best > best_val:
            best_val = current_best
            best_ind = pop[int(np.argmax(fits))].copy()

        # prepare next generation
        new_pop = []
        if elitism:
            new_pop.append(best_ind.copy())

        # linearly decay mutation rate
        curr_mut_rate = base_mutation_rate * (1 - gen / num_generations)

        while len(new_pop) < population_size:
            p1 = tournament(pop, fits)
            p2 = tournament(pop, fits)
            c1, c2 = crossover(p1, p2)
            c1 = repair(mutate(c1, curr_mut_rate))
            c2 = repair(mutate(c2, curr_mut_rate))
            new_pop.extend([c1, c2])

        pop = new_pop[:population_size]

    return best_val, best_ind, history
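
The per-bit Python loop in `mutate` and the linear decay of the mutation rate can also be written in vectorized form. The sketch below is an alternative formulation for illustration, not a drop-in replacement tested against the solver above; the names `decayed_rate` and `mutate_vectorized` are hypothetical, and it uses a NumPy `Generator` instead of the `random` module.

```python
import numpy as np


def decayed_rate(base_rate, gen, num_generations):
    """Linearly decay the mutation rate from base_rate at gen 0 towards 0."""
    return base_rate * (1 - gen / num_generations)


def mutate_vectorized(ind, rate, rng):
    """Flip each bit of a 0/1 array independently with probability `rate`."""
    flip = rng.random(ind.shape[0]) < rate   # boolean mask of positions to flip
    return np.where(flip, 1 - ind, ind)      # returns a new array, input untouched


rng = np.random.default_rng(0)
individual = np.array([0, 1, 1, 0, 1, 0, 0, 1])
for gen in (0, 250, 499):
    rate = decayed_rate(0.05, gen, 500)
    print(gen, rate, mutate_vectorized(individual, rate, rng))
```

Drawing all random numbers in one call avoids n Python-level iterations per individual, which may matter when the population and the number of items are both large.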