This repository has been archived by the owner on Feb 1, 2024. It is now read-only.

Add vector tiling ADR #723

Merged
merged 1 commit into from Aug 12, 2019

Conversation

kellyi
Contributor

@kellyi kellyi commented Aug 8, 2019

Overview

Adds an ADR documenting our decision to add an ST_AsMVT-backed vector tile endpoint to the existing Django web app.

Connects #704

Testing Instructions

  • read the ADR & verify that it documents our decision accurately and that it is an acceptable decision.

Checklist

  • fixup! commits have been squashed
  • CI passes after rebase
  • CHANGELOG.md updated with summary of features or fixes, following Keep a Changelog guidelines

@kellyi kellyi force-pushed the ki/add-vector-tiling-adr branch 2 times, most recently from b33f34d to 8ca76e3 Compare August 8, 2019 17:00
@kellyi kellyi changed the title WIP: Still a draft Add vector tiling ADR Aug 8, 2019
@kellyi kellyi marked this pull request as ready for review August 8, 2019 17:03

While Martin, in particular, seemed like a compelling solution, we had enough
questions about using it to discourage us from taking on the complexity of
using it here.

I am very curious about Martin (et al), and I think others are too! We discussed that this would be a good R&D opportunity. Can those who are more involved with this project explain why this project in and of itself can't be that R&D opportunity? What are the limitations I don't know about, and can they be further discussed/captured?

Contributor Author

@jwalgran may amplify or correct this but I think one main reason not to have it here is that the client needs to be able to request subsets of facilities using query parameters like contributor id or facility name etc.

If we added Martin, making changes to these queries (like adding an ability to search on facility description or certifications) would require us to update both the Django queryset filter code and also the PL/pgSQL code used in Martin.
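To illustrate the duplication concern, here is a minimal sketch of keeping the query-parameter handling in one Python function that both the list endpoint and a tile endpoint could call. The parameter names, lookup names, and `build_facility_filters` are hypothetical and are not taken from the repository.

```python
from urllib.parse import parse_qs

# Hypothetical mapping from query parameter names to Django-style lookups.
PARAM_TO_LOOKUP = {
    "contributors": "contributor_id__in",
    "name": "name__icontains",
    "countries": "country_code__in",
}


def build_facility_filters(query_string):
    """Turn a raw query string into a dict of ORM-style filter kwargs."""
    params = parse_qs(query_string)
    filters = {}
    for param, lookup in PARAM_TO_LOOKUP.items():
        values = params.get(param)
        if not values:
            continue
        # "__in" lookups take every supplied value; others take the first.
        filters[lookup] = values if lookup.endswith("__in") else values[0]
    return filters


# Both endpoints derive their filters from the same code path.
list_filters = build_facility_filters("name=mill&countries=US&countries=IN")
tile_filters = build_facility_filters("name=mill&countries=US&countries=IN")
```

With a separate tiler, the equivalent mapping would also have to exist in PL/pgSQL, so a new search option would mean two changes instead of one.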

Contributor

Yep. If we didn't have to filter, Martin would be a more attractive choice. Writing our own endpoint also allows us to bail out of using ST_AsMVT if it proves to be a problem and do our tile rendering on an app server rather than in the database.
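As a rough sketch of what an `ST_AsMVT`-backed endpoint might hand to the database, the following computes the Web Mercator bounds of an XYZ tile and pairs them with a parameterized query. The `facilities` table, the `location` column, and the function names are assumptions for illustration; the SQL is not taken from the ADR or the application.

```python
WEB_MERCATOR_HALF_WIDTH = 20037508.342789244  # metres, EPSG:3857


def tile_bounds(z, x, y):
    """Return (xmin, ymin, xmax, ymax) in EPSG:3857 for an XYZ tile."""
    size = 2 * WEB_MERCATOR_HALF_WIDTH / (2 ** z)
    xmin = -WEB_MERCATOR_HALF_WIDTH + x * size
    ymax = WEB_MERCATOR_HALF_WIDTH - y * size
    return (xmin, ymax - size, xmin + size, ymax)


# Hypothetical table and column names; the %s placeholders are meant to be
# filled in by the database driver, never by string formatting.
TILE_SQL = """
SELECT ST_AsMVT(q, 'facilities')
FROM (
    SELECT id,
           ST_AsMVTGeom(
               ST_Transform(location, 3857),
               ST_MakeEnvelope(%s, %s, %s, %s, 3857)::box2d
           ) AS geom
    FROM facilities
    WHERE ST_Transform(location, 3857) && ST_MakeEnvelope(%s, %s, %s, %s, 3857)
) AS q
"""


def tile_query(z, x, y):
    """Return the SQL text and its parameters for one tile."""
    bounds = tile_bounds(z, x, y)
    # The envelope appears twice: once for clipping, once for the bbox filter.
    return TILE_SQL, bounds + bounds
```

Because the bounds are computed in the app, a move to app-server rendering would only replace the SQL half of this sketch.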

Thanks, makes sense. Consider adding the above to the ADR

##### Configuration

Martin appears fairly straightforward to configure and its documentation
encompassed most of what we'd want to do.

the power of good docs 🌦

Contributor

@jwalgran jwalgran left a comment

This is an excellent summary of our team discussion. Thanks.

@kellyi kellyi force-pushed the ki/add-vector-tiling-adr branch 3 times, most recently from 858b4ea to 59cf56a Compare August 8, 2019 17:56
Contributor

@rajadain rajadain left a comment

Nice work! Very detailed.

@@ -7,6 +7,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.

## [Unreleased]
### Added
- Add vector tile ADR [#723](https://github.com/open-apparel-registry/open-apparel-registry/pull/723)

Contributor

Consider linking to the ADR itself

Contributor Author

Good suggestion -- I'm going to leave this link as is since custom has been to link to the PR.

Contributor

@hectcastro hectcastro left a comment

This seems like a good approach given the surrounding context, and I'm interested to see how it plays out.

I don't feel strongly that any of my comments need to be integrated prior to merge, but they were an attempt to add additional clarity since the repository is open source and may be consumed by people outside of Azavea.

we also want users to be able to filter these vector tiles by query parameters
like contributor, facility name, and country, along with the map bounding box.

To accomplish this we have decided to use vector tiles generated, ultimately,

Contributor

This begins to leak details about the decision. I would consider moving this and some of the details below it into the Decision section.

Contributor Author

Makes sense -- I'll pare this down a bit here and move things into "Decision" or whatever is the relevant section.

### Reusing Existing /facilities API Endpoint

In theory we could remove the `MAX_PAGE_SIZE` limit on the `/facilities` API
endpoint. In practice this would cause performance problems as the size of the

Contributor

If you are familiar with the pitfalls of large GeoJSON payloads and the browser's ability to render them, you read those details into this sentence as you skim it. To make it clearer, I think it would be helpful to consider citing a specific reason why performance problems are expected for the HTTP request attempting to serve the large GeoJSON payload, and the associated rendering of it on the client.

For example, the large GeoJSON HTTP payload is expected to be in the MBs (speculating), which consumes bandwidth and incurs a fairly significant serialization cost. Similarly, the client-side rendering will be expensive due to the sheer number of Leaflet markers that need to be rendered.
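One way to replace the speculation with a rough number is to serialize a synthetic feature collection and measure it. The sketch below assumes a feature count of 50,000 and a minimal set of properties; both are illustrative guesses, not figures from the application.

```python
import json


def geojson_payload_bytes(feature_count):
    """Serialize a synthetic point FeatureCollection and return its size in bytes."""
    features = [
        {
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [-75.16 + i * 1e-4, 39.95]},
            # Real facilities would likely carry more properties than this.
            "properties": {"id": i, "name": f"Facility {i}"},
        }
        for i in range(feature_count)
    ]
    collection = {"type": "FeatureCollection", "features": features}
    return len(json.dumps(collection).encode("utf-8"))


small = geojson_payload_bytes(50)      # roughly one page of results
large = geojson_payload_bytes(50_000)  # assumed size of an unpaginated response
print(f"{small} bytes vs. {large / 1_000_000:.1f} MB")
```

The payload grows linearly with the number of features, and each of those features would also become a Leaflet marker on the client.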


While we could potentially use a combination of [Windshaft][windshaft] and
[Leaflet.utfgrid][leaflet-utfgrid] to render facilities, there wasn't much
enthusiasm for setting up and maintaining a Windshaft tiler.

Contributor

Enthusiasm doesn't seem like the best justification for not going with a particular solution. I would consider focusing on the aspects of the tool that lead to a lack of enthusiasm (Terence probably has the best context for this):

  • The project is not well-maintained or documented
  • It requires an amount of configuration that is excessive for this use case
  • It requires adding another service vs. reusing the Django application
  • Something about UTF grid


### Creating Static Vector Tiles

We ruled out the idea of creating a static set of vector tiles because the

Contributor

Good justification.

or search option to the web application, we'd have to write a version of the
same query in PL/pgSQL for the tiler.

##### Security

Contributor

In addition to this, I think integrating another service from a request authorization perspective is a significant con. For example, if the existing API supports API keys for authentication, a tile service would have to replicate that authorization logic, or be reverse proxied through Django to reuse it (opening the door to the same performance bottleneck issues cited as cons for ST_AsMVT).

Contributor Author

Going to add this to the Security section:

Likewise, adding PostGIS-based security just for the tiler may also compel us to
have to figure out how to duplicate features like API key authentication or
facilities-data request logging -- which we've already written once in Django.
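A minimal, framework-free sketch of the reuse argument: if the tile endpoint lives in the same application, one authentication wrapper can guard both views. The `Token` header scheme, the in-memory key set, and the view signatures are all assumptions for illustration and do not reflect the actual Django code.

```python
import functools

# Hypothetical key store; a real application would look keys up in its database.
VALID_API_KEYS = {"example-key"}


def require_api_key(view):
    """Reject requests whose Authorization header lacks a known API key."""
    @functools.wraps(view)
    def wrapper(headers, *args, **kwargs):
        scheme, _, key = headers.get("Authorization", "").partition(" ")
        if scheme != "Token" or key not in VALID_API_KEYS:
            return 401, "Unauthorized"
        return view(headers, *args, **kwargs)
    return wrapper


@require_api_key
def facilities_view(headers):
    return 200, "facility list"


@require_api_key
def tile_view(headers, z, x, y):
    # The tile endpoint reuses the same check instead of reimplementing it.
    return 200, f"tile {z}/{x}/{y}"
```

A standalone tiler would need its own version of `require_api_key`, or would have to sit behind Django as a reverse proxy.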


## Consequences

As a consequence of this decision, we will need to:

Contributor

Potentially too low-level a detail for the ADR, but we may also need to consider:

  • Tweaking the number of Gunicorn workers per Gunicorn instance, in addition to the Gunicorn worker type (synchronous vs. Gevent). This would be in service of trying to increase the concurrent request capabilities of a single Gunicorn instance vs. spinning up more of them.
  • Looking at caching HTTP headers used by Whitenoise for static assets and evaluating which make sense to replicate for /tile.
  • Extending caching behaviors in CloudFront so that they respect any caching header changes made at the Django level.
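For a sense of what the first two items could look like, here is a hedged sketch in the style of a Gunicorn Python config file, plus a helper for a tile response's caching header. The specific values and the `tile_cache_headers` helper are hypothetical; appropriate numbers would need to come from load testing.

```python
import multiprocessing

# Assumed starting point: Gunicorn's documented rule of thumb, (2 x cores) + 1.
workers = multiprocessing.cpu_count() * 2 + 1

# "sync" is Gunicorn's default worker type; "gevent" requires the gevent package.
worker_class = "gevent"

# Maximum simultaneous clients per worker; applies to async worker types.
worker_connections = 100  # hypothetical value


def tile_cache_headers(max_age_seconds=300):
    """Build a Cache-Control header for a tile response (max-age is a guess)."""
    return {"Cache-Control": f"public, max-age={max_age_seconds}"}
```

Raising `workers` or switching `worker_class` increases concurrency within one instance, while the header gives CloudFront something to respect for `/tile` responses.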

Contributor Author

Sounds good -- I'll add line items about Gunicorn configuration & caching to the list.

@kellyi kellyi force-pushed the ki/add-vector-tiling-adr branch 3 times, most recently from ad308a0 to e93ca56 Compare August 12, 2019 14:58
Contributor Author

kellyi commented Aug 12, 2019

Thanks for the thoughtful reviews!

@kellyi kellyi merged commit e37a5f8 into develop Aug 12, 2019
@kellyi kellyi deleted the ki/add-vector-tiling-adr branch August 12, 2019 15:12
application's complexity. Keeping the tile endpoint in Django does not require
adding a new service.

##### Allows Scaling by Increasing the Number of App Instances

I don't think this warrants a change to this ADR or to the decision you've arrived at, but it may be valuable to point out that this may not be a pro (it may not be a con either, but it is certainly a consequence). I understood the implication of this statement to be that you lose the ability to scale the tiling and the API endpoints separately; they must be scaled in tandem. You also lose the ability to provision the API infrastructure independently of the tiler. For example, if the tiling endpoints end up requiring higher memory allocations, you will be obligated to provision all API instances with the extra resources, and will take that hit if you need to scale on the API-only dimension.

6 participants