Skip to content
This repository has been archived by the owner on Oct 30, 2021. It is now read-only.

Distance floor #145

Merged
merged 5 commits into from
Apr 10, 2019
Merged

Distance floor #145

merged 5 commits into from
Apr 10, 2019

Conversation

Nmargolis
Copy link
Contributor

@Nmargolis Nmargolis commented Apr 2, 2019

Distance floor PR

Context

In calculating scoredist, the distratio had a hard-coded minimum value of 0.005. The rationale behind this was that if as distratio approaches zero, the scordist value gets exponentially higher. The cap at 0.05 was hard-coded based on a proximity radius of 200, assuming that 0.05 was the minimum reasonable value (more explanation below).

This value of 0.005 is based on the minimum reasonable value for a proximity radius closer to 200. The problem is that when the proximity radius passed in to coalesce is higher, say 400, there is meaningful differentiation in the values that are less than 0.005.

Alternatively, any distance less than 1 isn't meaningful, since something on the same tile as the proximity point will be 0, and something one tile away will have a distance of >=1.

This PR sets a minimum distance, instead of a minimum distRatio.

Summary of changes

  • Give distance a minimum value to avoid distRatios that approach zero, and exponentially high scoredists

Next steps

  • More comprehensive analysis of different zoom levels, proximity radius values, and scores
  • Potentially evaluate whether something like 0.9 would make more sense than 0.5

More in-depth explanation

zoomTileRadius converts miles into a tile unit value. For zoom 14, it would be 0.8:

    // 32 tiles is about 40 miles at z14, use this as our mile <=> tile conversion.
    return (32.0 / 40.0) / _pow(1.5, 14 - static_cast<int>(zoom)); // 0.8 / 1.5^0 = 0.8

proximityRadius for zoom 14, and radius 400 would be 320 (as opposed to 160 for radius 200):

    if (zoom >= 6 && zoom <= 14) {
        return radius * zoomRadius[14 - zoom]; // 400 * 0.8 = 320
    }

distRatio for zoom 14, distance 0, score 0, radius 400 would previously have been capped at 0.005:

Previously:

    double distRatio = distance / proximityRadius(zoom, radius); 
    // For a distance of 0 and radius of 400: 0 / 320 = 0
    // For a distance of 1 and radius of 400: 1/320 = 0.003125
    
    // Too close to 0 the values get intense. Cap it.
    if (distRatio < 0.005) {
        distRatio = 0.005;
    }
    // Both would return 0.005, resulting in no differentiation

If the feature was 1 tile away from the proximity point, with a radius of 200 (so proximityRadius of 160), the distRatio woud have been 1/160 = 0.00625. That’s why a minimum of 0.005 made sense. If the proximity radius is 400 though, the distRatio for 1/320 would be 0.003125, which is under the 0.005 threshold. This means that there was no distinction in scoredist between a distance of 0 and a distance of 1, so features on the same tile as the proximity point were not getting sorted higher than features 1 tile away.

Now:

    // Too close to 0 the scoredist values get intense. Give distance a floor.
    if (distance < 1) {
        // Something greater than 0 but less than 1, to avoid dividing by 0
        distance = 0.5;
    }
    double distRatio = distance / proximityRadius(zoom, radius); 
    
    // For a distance of 0, corrected to 0.05: 0.5 / 320 = 0.0015625
    // For a distance of 1: 1/320 = 0.003125
    // There is meaningful differentiation in these values

scoreDist:

    static const double E_POW[8] = {
        1,
        2.718281828459045,
        7.38905609893065,
        20.085536923187668,
        54.598150033144236,
        148.4131591025766,
        403.4287934927351,
        1096.6331584284585};
    
    // distratio calculations ...
    
    return ((6 * E_POW[score] / E_POW[7]) + 1) / distRatio;
    // For score 0, distRatio 0.0015625: ((6 * 1) / 1096.63) + 1) / 0.0015625 = ~643.50163
    // For score 0, distRatio 0.003125: ((6 * 1) / 1096.63) + 1) / 0.003125 = ~321.75081

Open question:

What are the limits of this for different zoom levels, scores and proximity radius values? Is it still necessary to have another cap to avoid astronomically high scoredists?

Copy link
Contributor

@miccolis miccolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Nmargolis I think we're very close, I'd like to see coalesce.test.js get tightened up just a little bit more, let me know if I can help.

src/cpp_util.cpp Show resolved Hide resolved
test/coalesce.test.js Outdated Show resolved Hide resolved
test/coalesce.test.js Show resolved Hide resolved
@Nmargolis Nmargolis merged commit 555c232 into master Apr 10, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants