Usage of squareform #2

wang701 · 2020-12-04T21:36:50Z

Hello, thanks for a nice and straightforward implementation of the ST-DBSCAN algorithm!

I ended up looking at your source and tried to understand it myself. What you essentially did is that you feed a distance matrix that sets the distances that do not meet the temporal eps to doubled the spatial eps, and then call sklearn's DBSCAN on the distance matrix.

For this block of code in st_dbscan.py:

time_dist = squareform(pdist(X[:, 0].reshape(n, 1), metric=self.metric))
euc_dist = squareform(pdist(X[:, 1:], metric=self.metric))

# filter the euc_dist matrix using the time_dist
dist = np.where(time_dist <= self.eps2, euc_dist, 2 * self.eps1)

db = DBSCAN(eps=self.eps1, min_samples=self.min_samples, metric='precomputed')
db.fit(dist)

You called squareform twice to form two square matrices for computed time and spatial distances. And then dist will be a third square matrix that has the same dimension as time_dist and euc_dist. This means you will have three matrices with relatively large size in terms of memory usage. This of course depends on the data size. For my data, they all have > 50000 rows (that results approx. 50000 by 50000 matrices, my data is float64, so each matrix is over 16GB of memory) so the algorithm breaks without processing the data in chunks.

What I would do is this:

time_dist = pdist(X[:, 0].reshape(n, 1), metric=self.metric)
euc_dist = pdist(X[:, 1:], metric=self.metric)

# filter the euc_dist matrix using the time_dist
dist = np.where(time_dist <= self.eps2, euc_dist, 2 * self.eps1)

db = DBSCAN(eps=self.eps1, min_samples=self.min_samples, metric='precomputed')
db.fit(squareform(dist))

In this case, only one square matrix is needed.

The text was updated successfully, but these errors were encountered:

eren-ck · 2020-12-07T13:19:08Z

Hello Wang,
thanks for the input. You're absolutely right. I have adapted the code and updated the library on PyPi.

Cheers,
Eren

eren-ck closed this as completed Dec 7, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Usage of squareform #2

Usage of squareform #2

wang701 commented Dec 4, 2020

eren-ck commented Dec 7, 2020

Usage of squareform #2

Usage of squareform #2

Comments

wang701 commented Dec 4, 2020

eren-ck commented Dec 7, 2020