Allow datasources to be queried without supplying an extent or coordinate point #816

Open
artemp opened this Issue · 2 comments

@artemp
Owner

Currently datasource plugins offer features and features_at_point as query methods for features.

We should add the ability to get all features without having to set (and know ahead of time) the maximum extent of the geometries, that is, without having to construct a mapnik::query with a bbox.

This would allow this hack to be removed: http://trac.mapnik.org/browser/trunk/bindings/python/mapnik/__init__.py#L239

This may prompt a rename of features to features_by_bbox. As a side note, features_at_point needs a tolerance, but that is covered by #503.
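To make the request concrete, here is a minimal sketch of how the two query styles could sit side by side. The types box2d, feature, query and memory_datasource_sketch are simplified, hypothetical stand-ins written for this example, not Mapnik's real classes, and the name all_features is an assumption.

```cpp
#include <vector>

// Hypothetical stand-ins for Mapnik's types; only the shape of the API matters.
struct box2d {
    double minx, miny, maxx, maxy;

    bool intersects(const box2d& o) const {
        return !(o.minx > maxx || o.maxx < minx || o.miny > maxy || o.maxy < miny);
    }
};

struct feature {
    int id;
    box2d bounds;
};

struct query {
    box2d bbox;
};

class memory_datasource_sketch {
public:
    void push(const feature& f) { features_.push_back(f); }

    // Existing style (under the possible new name): the caller supplies an extent.
    std::vector<feature> features_by_bbox(const query& q) const {
        std::vector<feature> out;
        for (const feature& f : features_) {
            if (f.bounds.intersects(q.bbox)) out.push_back(f);
        }
        return out;
    }

    // Requested style: no extent needed, every feature is returned.
    std::vector<feature> all_features() const { return features_; }

private:
    std::vector<feature> features_;
};
```

With an entry point like all_features, a caller no longer needs to know the datasource's extent before asking for everything.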

@artemp
Owner

[kkaefer]

Statistics about a column that would be useful:

  • Maximum, minimum, arithmetic/geometric mean
  • Variance/Standard deviation
  • Percentiles/quantiles/median
  • Jenks Natural Breaks
  • anything else?

What is the mean/stddev of a non-numeric column?
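For the percentile/median item, one option that always yields a value actually present in the column is the nearest-rank method. The sketch below assumes a numeric column already extracted into a std::vector<double>; the function name percentile_nearest_rank is made up for this example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Nearest-rank percentile for p in (0, 100].
// Takes the values by copy so the caller's column is not reordered.
double percentile_nearest_rank(std::vector<double> values, double p) {
    if (values.empty() || p <= 0.0 || p > 100.0) {
        throw std::invalid_argument("need a non-empty column and 0 < p <= 100");
    }
    std::sort(values.begin(), values.end());
    // Rank is 1-based: the smallest rank covering at least p percent of the values.
    const double n = static_cast<double>(values.size());
    const std::size_t rank = static_cast<std::size_t>(std::ceil(p * n / 100.0));
    return values[rank - 1];
}
```

The median is then percentile_nearest_rank(values, 50.0). This does not answer the question for non-numeric columns; the sketch simply rejects nothing but empty input and out-of-range p.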

Indexing .dbf files

.dbf files contain the attribute data (metadata) associated with a shapefile. It's a very primitive format (some old dBase version) and doesn't contain indexes. ESRI created an index file format, .atx (http://support.esri.com/en/knowledgebase/techarticles/detail/17738), but it doesn't seem to be documented and changes between versions. However, it seems pretty cheap to just loop through all entries in a column and gather statistics along the way. The results would be cached, so we could do this when Mapnik loads the map.
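The "loop through all entries and gather statistics along the way" idea can be sketched as a single pass over a numeric column. This is only an illustration: it assumes the column has already been read into a std::vector<double> (no .dbf parsing is shown), the names column_stats and gather_stats are invented here, and the running variance uses Welford's algorithm, which is a choice made for this sketch rather than anything stated in the issue.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Running statistics for one numeric column, updated one value at a time.
struct column_stats {
    std::size_t count = 0;
    double minimum = std::numeric_limits<double>::infinity();
    double maximum = -std::numeric_limits<double>::infinity();
    double mean = 0.0;
    double m2 = 0.0;  // sum of squared deviations from the running mean

    void add(double x) {
        ++count;
        if (x < minimum) minimum = x;
        if (x > maximum) maximum = x;
        // Welford's update: numerically stable and needs no second pass.
        const double delta = x - mean;
        mean += delta / static_cast<double>(count);
        m2 += delta * (x - mean);
    }

    // Population variance (divides by count, not count - 1).
    double variance() const {
        return count > 0 ? m2 / static_cast<double>(count) : 0.0;
    }

    double stddev() const { return std::sqrt(variance()); }
};

// One loop over the column gathers min, max, mean and variance together.
column_stats gather_stats(const std::vector<double>& column) {
    column_stats s;
    for (double v : column) s.add(v);
    return s;
}
```

Percentiles and Jenks natural breaks need the sorted values, so they would not fit into this one-pass structure.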

Obtaining statistics

The datasource object would get these methods:

// The column name is taken by const reference to avoid a copy. The optional
// query is a pointer, because a const reference cannot default to NULL.

// These return features that actually exist.
virtual feature_ptr max(const std::string& column, const query* q = NULL) const = 0;
virtual feature_ptr min(const std::string& column, const query* q = NULL) const = 0;
virtual featureset_ptr percentile(const std::string& column, int buckets, const query* q = NULL) const = 0;
virtual featureset_ptr jenks_natural_breaks(const std::string& column, int breaks, const query* q = NULL) const = 0;

// These create pseudo-features that don't necessarily exist in the datasource.
virtual feature_ptr arithmetic_mean(const std::string& column, const query* q = NULL) const = 0;
virtual feature_ptr geometric_mean(const std::string& column, const query* q = NULL) const = 0;
virtual feature_ptr standard_deviation(const std::string& column, const query* q = NULL) const = 0;
virtual feature_ptr variance(const std::string& column, const query* q = NULL) const = 0;

Since these are potentially expensive to compute, those methods should cache their results.
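One way the caching could look is a small memo table keyed by statistic name and column name. This is a sketch under assumptions: stats_cache and get_or_compute are hypothetical names, results are plain doubles rather than feature_ptr, and the optional query argument is ignored, so a real version would also need the query in the key.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

class stats_cache {
public:
    // Returns the cached value for (statistic, column) and calls `compute`
    // only the first time that pair is requested.
    double get_or_compute(const std::string& statistic,
                          const std::string& column,
                          const std::function<double()>& compute) {
        const std::pair<std::string, std::string> key(statistic, column);
        auto it = values_.find(key);
        if (it != values_.end()) return it->second;
        const double value = compute();
        values_.insert(std::make_pair(key, value));
        return value;
    }

    std::size_t size() const { return values_.size(); }

private:
    std::map<std::pair<std::string, std::string>, double> values_;
};
```

Because the proposed methods are const, a datasource holding such a cache would have to declare it mutable, and guard it with a lock if datasources are shared across threads.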

@artemp
Owner

[kkaefer] Each Feature object gets a reference to the datasource it was loaded from, so that it can query statistics about the current feature in relation to all features in the datasource.
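A rough sketch of that back-reference follows. Everything in it is hypothetical (feature_sketch, datasource_sketch, distance_from_max); it uses std::weak_ptr so that a feature does not keep its datasource alive, which is a design choice made for this example and not something the comment specifies.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

class datasource_sketch;

// Hypothetical feature carrying one numeric attribute and a non-owning
// reference to the datasource it was loaded from.
struct feature_sketch {
    double value;
    std::weak_ptr<const datasource_sketch> source;

    // How far this feature's value lies below the datasource maximum.
    double distance_from_max() const;
};

class datasource_sketch : public std::enable_shared_from_this<datasource_sketch> {
public:
    explicit datasource_sketch(std::vector<double> values)
        : values_(std::move(values)) {}

    // Must be called on an instance owned by a std::shared_ptr.
    feature_sketch feature_at(std::size_t i) const {
        return feature_sketch{values_.at(i), shared_from_this()};
    }

    double max_value() const {
        if (values_.empty()) throw std::runtime_error("empty datasource");
        return *std::max_element(values_.begin(), values_.end());
    }

private:
    std::vector<double> values_;
};

inline double feature_sketch::distance_from_max() const {
    std::shared_ptr<const datasource_sketch> ds = source.lock();
    if (!ds) throw std::runtime_error("datasource no longer exists");
    return ds->max_value() - value;
}
```

A feature obtained via feature_at can then answer questions relative to the whole datasource, for example how far its value is from the maximum.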
