You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Hello, thanks for a nice and straightforward implementation of the ST-DBSCAN algorithm!
I ended up looking at your source and tried to understand it myself. What you essentially did is that you feed a distance matrix that sets the distances that do not meet the temporal eps to doubled the spatial eps, and then call sklearn's DBSCAN on the distance matrix.
time_dist=squareform(pdist(X[:, 0].reshape(n, 1), metric=self.metric))
euc_dist=squareform(pdist(X[:, 1:], metric=self.metric))
# filter the euc_dist matrix using the time_distdist=np.where(time_dist<=self.eps2, euc_dist, 2*self.eps1)
db=DBSCAN(eps=self.eps1, min_samples=self.min_samples, metric='precomputed')
db.fit(dist)
You called squareform twice to form two square matrices for computed time and spatial distances. And then dist will be a third square matrix that has the same dimension as time_dist and euc_dist. This means you will have three matrices with relatively large size in terms of memory usage. This of course depends on the data size. For my data, they all have > 50000 rows (that results approx. 50000 by 50000 matrices, my data is float64, so each matrix is over 16GB of memory) so the algorithm breaks without processing the data in chunks.
What I would do is this:
time_dist=pdist(X[:, 0].reshape(n, 1), metric=self.metric)
euc_dist=pdist(X[:, 1:], metric=self.metric)
# filter the euc_dist matrix using the time_distdist=np.where(time_dist<=self.eps2, euc_dist, 2*self.eps1)
db=DBSCAN(eps=self.eps1, min_samples=self.min_samples, metric='precomputed')
db.fit(squareform(dist))
In this case, only one square matrix is needed.
The text was updated successfully, but these errors were encountered:
Hello, thanks for a nice and straightforward implementation of the ST-DBSCAN algorithm!
I ended up looking at your source and tried to understand it myself. What you essentially did is that you feed a distance matrix that sets the distances that do not meet the temporal eps to doubled the spatial eps, and then call sklearn's DBSCAN on the distance matrix.
For this block of code in st_dbscan.py:
You called
squareform
twice to form two square matrices for computed time and spatial distances. And thendist
will be a third square matrix that has the same dimension astime_dist
andeuc_dist
. This means you will have three matrices with relatively large size in terms of memory usage. This of course depends on the data size. For my data, they all have > 50000 rows (that results approx. 50000 by 50000 matrices, my data is float64, so each matrix is over 16GB of memory) so the algorithm breaks without processing the data in chunks.What I would do is this:
In this case, only one square matrix is needed.
The text was updated successfully, but these errors were encountered: