Parse large instances with 15K or 30K nodes #117
The large instances of Arnold, Gendreau and Sörensen (2017) in CVRPLIB currently hang on the edge weight calculations. These instances are very large, ranging from 6K to 30K nodes. Let's see if we can support such large instances as well.
I performed a quick benchmark of the current distance computation against four alternatives:

```python
import timeit

import numba
import numpy as np
from tabulate import tabulate


def current(coords: np.ndarray) -> np.ndarray:
    # Broadcasting materializes an (n, n, 2) intermediate array.
    diff = coords[:, np.newaxis, :] - coords
    square_diff = diff**2
    square_dist = np.sum(square_diff, axis=-1)
    return np.sqrt(square_dist)


def faster_current(pts: np.ndarray) -> np.ndarray:
    # Uses ||u - v||^2 = ||u||^2 + ||v||^2 - 2 u.v to avoid the (n, n, 2) intermediate.
    sq_sum = np.sum(pts**2, axis=1)
    sq_diff = np.add.outer(sq_sum, sq_sum) - 2 * np.dot(pts, pts.T)
    np.fill_diagonal(sq_diff, 0)  # avoids minor numerical issues
    return np.sqrt(sq_diff)


def numpy_linalg(X: np.ndarray) -> np.ndarray:
    return np.linalg.norm(X[:, np.newaxis] - X, axis=2)


@numba.jit(nopython=True, fastmath=True)
def euclidean(u, v):
    n = len(u)
    dist = 0
    for i in range(n):
        dist += abs(u[i] - v[i]) ** 2
    return dist ** (1 / 2)


@numba.jit(nopython=True, fastmath=True)
def variant_numba(a: np.ndarray) -> np.ndarray:
    # Compute the strict lower triangle, then mirror it.
    n = a.shape[0]
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(i):
            out[i][j] = euclidean(a[i], a[j])
    return out + out.T


def variant_scipy(pts: np.ndarray) -> np.ndarray:
    from scipy.spatial import distance_matrix

    return distance_matrix(pts, pts)


if __name__ == "__main__":
    np.random.seed(1)
    headers = ["# of coords", "Function", "Time (s)"]
    NUMBER = 3

    for num_coords in [5000, 10000, 15000]:
        coords = np.random.randn(num_coords, 2)
        funcs = [
            current,
            faster_current,
            numpy_linalg,
            variant_numba,
            variant_scipy,
        ]
        results = []
        for func in funcs:
            elapsed = timeit.timeit(
                f"{func.__name__}(coords)", globals=globals(), number=NUMBER
            )
            avg = round(elapsed / NUMBER, 2)
            results.append((num_coords, func.__name__, avg))
        print(tabulate(results, headers=headers))

    # current and numpy_linalg are too slow
    for num_coords in [20000, 25000, 30000]:
        coords = np.random.randn(num_coords, 2)
        funcs = [
            faster_current,
            variant_numba,
            variant_scipy,
        ]
        results = []
        for func in funcs:
            elapsed = timeit.timeit(
                f"{func.__name__}(coords)", globals=globals(), number=NUMBER
            )
            avg = round(elapsed / NUMBER, 2)
            results.append((num_coords, func.__name__, avg))
        print(tabulate(results, headers=headers))
```

Results: `current` and `numpy_linalg` become too slow beyond 15K coordinates, which is why they are dropped from the second benchmark loop.
Another option is to write some C++ code. See this for setting up Poetry with a custom build.
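For reference, a minimal sketch of what such a custom build script might look like, assuming a pybind11 extension; the module name `_distances`, the source path, and the exact pyproject hook are placeholders, not the project's actual layout:

```python
# build.py -- the custom build script that pyproject.toml points Poetry at
# (the exact hook has varied across Poetry versions).
from pybind11.setup_helpers import Pybind11Extension, build_ext

# Hypothetical extension module; source file name is illustrative only.
ext_modules = [
    Pybind11Extension("_distances", ["src/_distances.cpp"]),
]


def build(setup_kwargs):
    # Poetry calls this hook with the keyword arguments it forwards to
    # setuptools' setup(), so we attach the extension build here.
    setup_kwargs.update(
        {"ext_modules": ext_modules, "cmdclass": {"build_ext": build_ext}}
    )
```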
The scipy variant seems to scale very well and is faster than what we have now. Looking at the implementation here, the relevant bits are just a few lines of code. We could implement our calculation similarly?
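As a rough illustration of that idea: `current` materializes an (n, n, 2) float64 intermediate, which at n = 30,000 is about 30,000² × 2 × 8 bytes ≈ 14 GB and is a likely reason the large instances hang. A blockwise version in the spirit of scipy's chunked loop could look like the sketch below; the function name and block size are made up for illustration, not scipy's actual implementation:

```python
import numpy as np


def blockwise_distance_matrix(pts: np.ndarray, block: int = 1024) -> np.ndarray:
    """Euclidean distance matrix computed in row blocks.

    Same broadcasting trick as ``current``, but the temporary is bounded
    to shape (block, n, 2) instead of (n, n, 2).
    """
    n = pts.shape[0]
    out = np.empty((n, n))
    for start in range(0, n, block):
        stop = min(start + block, n)
        diff = pts[start:stop, np.newaxis, :] - pts[np.newaxis, :, :]
        out[start:stop] = np.sqrt(np.sum(diff**2, axis=-1))
    return out
```

Each iteration only materializes a (block, n, 2) temporary, so beyond the n × n output (allocated once), peak memory grows linearly with n rather than quadratically.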
On my machine (Windows with an x86/i7 CPU from last year), the results look something like this:

[results table]
Yeah, agreed!