pnorman
Contributor

@pnorman pnorman commented May 7, 2016

Work in progress, not yet complete

ref https://lists.osgeo.org/pipermail/postgis-devel/2016-May/025808.html

Notes from implementing

  • SQL functions do not need a COST assigned, PostgreSQL will compute it from the statement
  • _postgis_deprecate has a cost of 100. This means deprecating a function like ST_length_spheroid makes it 20% slower, and a quick function like ST_force_2d becomes 20x slower.
  • There's inconsistent spacing and tabs in postgis.sql.in
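To make the arithmetic in the second note concrete, here is a minimal Python sketch. It assumes the wrapper's cost of 100 is simply added on top of the wrapped function's own cost. The per-function costs of 500 and 5 are hypothetical values picked only because they reproduce the 20% and 20x figures; they are not numbers taken from this PR.

```python
# Cost of the _postgis_deprecate wrapper, as stated in the note above.
DEPRECATE_COST = 100


def relative_overhead(function_cost, wrapper_cost=DEPRECATE_COST):
    """Return the wrapper cost as a multiple of the function's own cost."""
    return wrapper_cost / function_cost


# Hypothetical costs, chosen to match the figures quoted in the note.
expensive_cost = 500  # assumed cost for a function like ST_length_spheroid
cheap_cost = 5        # assumed cost for a function like ST_force_2d

print(f"expensive: +{relative_overhead(expensive_cost):.0%}")
print(f"cheap: +{relative_overhead(cheap_cost):.0f}x")
```

The same fixed overhead is negligible for an expensive function and dominant for a cheap one, which is the point of the note.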

Costs are based on a set of 5M polygons from a UK OSM extract, based
on comparing runtimes with those of bigint addition, subtraction,
modulo, division, squaring, and square root.
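One plausible reading of this approach, sketched in Python: divide a function's runtime by the average runtime of the simple baseline operators over the same rows. The helper name `estimate_cost` and all timings below are hypothetical; the actual procedure is described in the linked mailing-list post.

```python
from statistics import mean


def estimate_cost(function_ms, baseline_ms_by_operator):
    """Estimate a COST value as function runtime over mean baseline runtime.

    Both arguments are runtimes over the same number of rows, with any
    fixed scan overhead assumed to be already subtracted.
    """
    baseline_ms = mean(baseline_ms_by_operator.values())
    return function_ms / baseline_ms


# Hypothetical timings in milliseconds; not measurements from this PR.
baseline = {"add": 10.0, "subtract": 10.0, "modulo": 12.0, "divide": 12.0}
cost = estimate_cost(function_ms=1100.0, baseline_ms_by_operator=baseline)
print(round(cost))
```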

The tests were done with PostGIS 2.1.8, so 2.2.0 and later functions
are not yet costed.

Also adds comments where costs are guessed.

@robe2
Member

robe2 commented May 16, 2016

I'm going thru these now. Some look suspicious.

e.g. ST_AsEWKT(geometry) (really costs 750 ?)

If it does that would suggest a bug or some really old code in our code.

-- ST_PointOnSurface (really costs 2500, I would expect that to be less than ST_Intersection)

@robe2
Member

robe2 commented May 16, 2016

I think I got all of them except for some of the "guessed" comments you put in for some of the existing ones.

I went through them one by one because I'm inept at dealing with conflicts with @pramsey's latest commit. There were some that I thought should have costs on them that I didn't see, but I'll revisit those later.

I also changed the ST_DistanceSpheroid and ST_Distance as I'm pretty sure ST_DistanceSpheroid is 5-20 times costlier than ST_Distance. Though we may still want to up both of them.

Details here: https://trac.osgeo.org/postgis/ticket/3557

@robe2 robe2 closed this May 16, 2016
@pnorman
Contributor Author

pnorman commented May 16, 2016

> e.g. ST_AsEWKT(geometry) (really costs 750 ?)
>
> If it does that would suggest a bug or some really old code in our code.

I just double-checked with a different dataset: st_asewkb takes 691 ms and st_asewkt takes 26958 ms, so it really is expensive. All of the st_as* functions which serialize to a text-based format are slow compared to binary.
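As a quick check of the size of that gap, this small Python snippet computes the ratio from the two runtimes reported above. It only restates those numbers and says nothing about what COST value should follow from them.

```python
# Runtimes reported in the comment above, in milliseconds.
asewkb_ms = 691
asewkt_ms = 26958

ratio = asewkt_ms / asewkb_ms
print(f"st_asewkt is about {ratio:.0f}x slower than st_asewkb")
```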

I noticed that I missed ST_AsText somehow, but it has a similar cost.

> ST_PointOnSurface (really costs 2500, I would expect that to be less than ST_Intersection)

I haven't set the ST_Intersection cost; two-geometry functions are another round of testing. I expect it will be expensive.

@pnorman
Contributor Author

pnorman commented May 16, 2016

> I also changed the ST_DistanceSpheroid and ST_Distance as I'm pretty sure ST_DistanceSpheroid is 5-20 times costlier than ST_Distance. Though we may still want to up both of them.

Those are two-geometry functions; I didn't touch their costs.

@strk
Member

strk commented May 16, 2016

ST_PointOnSurface internally uses ST_Intersection, intersecting
the geometry with an envelope diagonal, so it makes sense for it
to cost more than ST_Intersection. The difference is in the input
complexity: ST_PointOnSurface(polygon) is guaranteed to
compute the intersection between an area (the polygon) and a single
segment, whereas ST_Intersection is not constrained on the input
data types.

On the other hand ST_PointOnSurface is guaranteed to always (except
for the EMPTY input) build the inputs topology, whereas
ST_Intersection might early exit if the envelopes of inputs are known
as being disjoint.

Should COST define min, max or avg cost, btw?

@robe2
Member

robe2 commented May 16, 2016

I don't think it really matters, though I think average would be best.
It's really only important for the hierarchy of functions (it controls the order in which functions are processed).

e.g. ST_FuncA and ST_FuncB

If ST_FuncA is costlier, the planner will process ST_FuncB first and can skip ST_FuncA when ST_FuncB already decides the result.
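A minimal Python sketch of that ordering effect, assuming two conditions combined with AND: the cheaper one runs first, and a row it rejects never reaches the costlier one. ST_FuncA and ST_FuncB are the placeholder names from the example above; the costs and the stand-in predicates are hypothetical, and this is an illustration rather than the planner's actual code.

```python
def evaluate_row(row, predicates):
    """Apply AND-ed predicates cheapest first; stop at the first failure.

    `predicates` is a list of (name, cost, function) tuples. Returns the
    boolean result and the names of the predicates that actually ran.
    """
    evaluated = []
    for name, _cost, func in sorted(predicates, key=lambda p: p[1]):
        evaluated.append(name)
        if not func(row):
            return False, evaluated
    return True, evaluated


# Hypothetical costs and stand-in predicates; not real PostGIS functions.
predicates = [
    ("ST_FuncA", 1000, lambda row: row["area"] > 10),
    ("ST_FuncB", 10, lambda row: row["valid"]),
]

print(evaluate_row({"area": 50, "valid": False}, predicates))
print(evaluate_row({"area": 50, "valid": True}, predicates))
```

Only the relative order of the two costs matters here, which is why getting the hierarchy right is more important than the exact values.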

For parallelism, it matters inasmuch as it affects whether parallelization kicks in. As pramsey noted, the way costs are used there is pretty flaky, so the best we can do is follow a general rule of thumb and make sure we don't give a function a higher cost than one that is actually more expensive.

That said I think the relationship functions are most important to get right (at least hierarchy wise).

@Komzpa
Member

Komzpa commented Jan 8, 2017

@pnorman how exactly was the cost measured?

The mail thread suggests it is in units of cpu_operator_cost - but cpu_operator_cost itself is 0.0025 * seq_page_cost in default config. Due to the latest course of reverting this change back to 1 - was this multiplier taken into account?

Did you try taking a more complicated operator than a bigint addition for the base?

@pnorman
Contributor Author

pnorman commented Jan 8, 2017

> The mail thread suggests it is in units of cpu_operator_cost - but cpu_operator_cost itself is 0.0025 * seq_page_cost in default config.

Yes. This does not matter, as cpu_operator_cost cancels out in the math.
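A short Python sketch of why the multiplier drops out, assuming the planner's per-call estimate is the function's COST multiplied by cpu_operator_cost and that a simple built-in operator has a COST of 1. The helper `planner_cost` is hypothetical; 750 is the ST_AsEWKT value discussed earlier in the thread.

```python
def planner_cost(function_cost, cpu_operator_cost):
    """Per-call planner estimate: COST is a multiple of cpu_operator_cost."""
    return function_cost * cpu_operator_cost


function_cost = 750  # the ST_AsEWKT value questioned earlier in the thread
operator_cost = 1    # assumed COST of a simple built-in operator

for cpu_operator_cost in (0.0025, 0.01, 1.0):  # 0.0025 is the default
    ratio = (planner_cost(function_cost, cpu_operator_cost)
             / planner_cost(operator_cost, cpu_operator_cost))
    print(cpu_operator_cost, round(ratio, 6))
```

Whatever value cpu_operator_cost takes, the function stays the same multiple of the baseline operator, so a COST derived from a runtime ratio does not depend on that setting.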

> Due to the latest course of reverting this change back to 1 - was this multiplier taken into account?

I have not been involved in any revert.

> Did you try taking a more complicated operator than a bigint addition for the base?

See the mail thread for the operators/functions used.
