Implement http pmtiles #991

Merged
merged 19 commits into from
Dec 22, 2023

Conversation

nyurik
Member

@nyurik nyurik commented Nov 8, 2023

PMTiles is a web-optimized format that allows the actual file to be read with HTTP range requests. Supporting this use case lets Martin run as a Lambda executable that accesses PMTiles, without any significant investment in devops or in hosting large files.
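
As a rough illustration of what a range request looks like on the wire, here is a std-only sketch. It is not the code from this PR: the function names `range_header` and `range_request` are made up for this example, and a real client would use an HTTP library rather than assembling the request by hand.

```rust
/// Build the value of an HTTP `Range` header for `length` bytes starting at `offset`.
/// Byte ranges are inclusive on both ends, so the last byte is `offset + length - 1`.
/// (Hypothetical helper, for illustration only.)
fn range_header(offset: u64, length: u64) -> Option<String> {
    if length == 0 {
        return None; // an empty range cannot be written as `bytes=a-b`
    }
    let last = offset.checked_add(length - 1)?; // None on overflow
    Some(format!("bytes={offset}-{last}"))
}

/// Assemble a minimal HTTP/1.1 GET request that asks for a single byte range.
/// A server that supports ranges answers with `206 Partial Content`.
fn range_request(host: &str, path: &str, offset: u64, length: u64) -> Option<String> {
    let range = range_header(offset, length)?;
    Some(format!(
        "GET {path} HTTP/1.1\r\nHost: {host}\r\nRange: {range}\r\nConnection: close\r\n\r\n"
    ))
}
```

For example, `range_header(0, 16384)` produces `bytes=0-16383`, i.e. the first 16 KiB of the file, so only the bytes needed for a given lookup have to cross the network.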

The PMTiles config now also accepts http and https URLs.

# Publish PMTiles files
pmtiles:
  paths:
    # specific pmtiles file will be published as mypmtiles source
    # (use last portion of the URL without extension)
    - http://example.org/path/to/mypmtiles.pmtiles
  sources:
    # named source matching source name to a single file
    pm-src1: https://example.org/path/to/some_pmtiles.pmtiles
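
The naming rule in the `paths` comment (last portion of the URL, without extension) could be sketched roughly as below. This is an assumption-laden illustration, not Martin's actual implementation: `source_name_from_url` is a hypothetical name, and dropping the query string and fragment is a choice made for this example only.

```rust
/// Derive a source name from a URL: take the last path segment and strip its extension.
/// (Hypothetical helper; query-string and fragment handling are assumptions.)
fn source_name_from_url(url: &str) -> Option<&str> {
    // Ignore anything after `?` or `#`.
    let end = url.find(|c: char| c == '?' || c == '#').unwrap_or(url.len());
    let without_query = &url[..end];
    // Skip the scheme so that a bare host such as "http://example.org" has no path.
    let after_scheme = without_query
        .split_once("://")
        .map_or(without_query, |(_, rest)| rest);
    let (_host, path) = after_scheme.split_once('/')?;
    // Last path segment, e.g. "mypmtiles.pmtiles".
    let file = path.rsplit('/').next()?;
    // Strip the final extension, if any.
    let stem = file.rsplit_once('.').map_or(file, |(stem, _ext)| stem);
    if stem.is_empty() {
        None
    } else {
        Some(stem)
    }
}
```

With the config above, `http://example.org/path/to/mypmtiles.pmtiles` maps to `mypmtiles`, while a URL with no file name yields `None` and would need an explicit name under `sources`.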

fixes #884

@nyurik nyurik force-pushed the http-pmtiles branch 2 times, most recently from 9e7d829 to 56efa4c Compare November 9, 2023 00:46
@kyleslugg
Contributor

Update / Info: I've edited test.sh to include the fileserver container (specified in docker-compose.yml) needed to test this particular piece, and have begun revising the CI workflow to similar effect.

Most tests are now passing, but I'm still working on the Windows segment, which requires installing and configuring Nginx on the runner. More to come!

@nyurik
Member Author

nyurik commented Nov 30, 2023

@kyleslugg thanks! I just rebased this PR on top of main - please submit a PR to my branch nyurik / http-pmtiles - and i will merge it into this PR before merging it into main

@nyurik
Member Author

nyurik commented Dec 10, 2023

@kyleslugg hi, any luck with the CI for this thing? I can help if needed. Thx!

@kyleslugg
Contributor

kyleslugg commented Dec 10, 2023 via email

@nyurik
Member Author

nyurik commented Dec 12, 2023

@kyleslugg no prob, i'll try to get to it soon. Just FYI - I think github runners seem to have nginx pre-installed (see repo search) - so it might be possible to just activate the preinstalled versions as is?

@kyleslugg
Contributor

Good catch! Reworking now to use the existing installation. Updates to come!

@kyleslugg
Contributor

About ready to go (turned out to be an issue with the specification of the nginx service). Tidying up commits a bit; will create a PR to your feature branch momentarily.

@nyurik
Member Author

nyurik commented Dec 18, 2023

awesome, thank you!!! Next thing after this -- figure out how to use S3 and similar libs instead of raw HTTP - as I suspect S3 bucket access might be tricky-ish -- the signed URLs only work for a few hours I think, so we would either need to add some magical S3 URL authentication headers, or use some ready-made crates. Regardless, we may need some authentication params passable from the config file

@kyleslugg
Contributor

Sounds great! Let me do some reading/thinking -- I'll need to polish my AWS knowledge a bit before I'm of much use here.

Just thinking through this design-wise, does it make sense to you to implement a separate source definition (in the form of, e.g., a PmtS3Source struct) for this purpose? If that sounds right, I'll start tinkering in that direction.

@nyurik
Member Author

nyurik commented Dec 18, 2023

That's a great question, and I don't have a good answer to that. Technically, pmtiles is the "main" type, and connecting to a specific http source with authentication is a detail of that. Eventually we may have all sorts of http pmtiles sources, some of which use authentication, and some requiring magical credentials... If we can use some non-AWS specific authentication pattern that works on multiple cloud providers, that would be ideal... So first step i guess would be to find out what's actually needed in terms of authentication.

@kyleslugg
Contributor

Yes, that makes sense to me.

Based on some initial research, it seems that common practice for enabling semi-public access to an s3 bucket involves the type of filtering and access restriction via Lambda that you described during our convo a few weeks ago. This raises the question, for me, of what we are expecting of users during the source setup process and what we are planning to have Martin abstract away. (And, relatedly, how much s3 fluency are we expecting of users?)

At the most basic, for example, it seems like someone could drop a pmtiles file in a bucket, enable full public access, configure an index file on the s3 side, and then use the resulting URL as a source URL like any other, with no adjustment on our end. The same is true of a situation in which someone exposes a properly configured Lambda endpoint, with security again managed on the AWS end.

Alternatively, if the user can supply a sufficiently privileged set of access credentials, plus a bucket UUID and filepath, the AWS SDK for Rust could be used to generate a fresh pre-signed URL on each run, and rotate it as needed. This approach, though, is highly specific to the AWS context, and so would not be generalizable across cloud providers.
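
To make the "generate a fresh pre-signed URL and rotate it as needed" idea concrete, here is a provider-agnostic sketch. Everything in it is hypothetical: `SignedUrl`, `RotatingUrl`, and the `sign` callback are names invented for this example, and the callback merely stands in for whatever SDK call would actually produce the URL.

```rust
use std::time::{Duration, Instant};

/// A pre-signed URL together with the moment it stops being valid.
struct SignedUrl {
    url: String,
    expires_at: Instant,
}

/// Caches one pre-signed URL and re-signs it shortly before it expires.
/// `sign` is a hypothetical callback standing in for a cloud SDK call.
struct RotatingUrl<F: FnMut(Instant) -> SignedUrl> {
    sign: F,
    margin: Duration,
    current: Option<SignedUrl>,
}

impl<F: FnMut(Instant) -> SignedUrl> RotatingUrl<F> {
    fn new(sign: F, margin: Duration) -> Self {
        Self { sign, margin, current: None }
    }

    /// Return the cached URL, re-signing first if none exists yet
    /// or if the cached one expires within `margin` of `now`.
    fn get(&mut self, now: Instant) -> &str {
        let stale = match &self.current {
            Some(signed) => now + self.margin >= signed.expires_at,
            None => true,
        };
        if stale {
            self.current = Some((self.sign)(now));
        }
        &self.current.as_ref().expect("set above").url
    }
}
```

Passing `now` in explicitly keeps the rotation logic independent of the wall clock, which makes it easy to exercise without waiting for a URL to actually expire.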

Let me know what thoughts this brings up for you!

@nyurik
Member Author

nyurik commented Dec 19, 2023

thanks for all the great links! My thinking of the first (MVP) deliverables:

  • code: non-authenticated HTTP(S) access only (https is better for keepalive and http/2 efficiency)
  • code: LRU caching of the directory structures
  • docs on how to set up public unlisted readonly AWS bucket (expecting to have docs for other cloud providers too)
  • docs on how to run Martin in AWS or other Lambda services, accessing the above bucket
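
The "LRU caching of the directory structures" item can be pictured with the std-only sketch below. It is only an illustration of the idea: the commit list for this PR mentions Moka for caching, and this hand-rolled `DirCache` (a hypothetical name, keyed by an assumed directory byte offset) is not Martin's implementation.

```rust
use std::collections::{HashMap, VecDeque};

/// Tiny least-recently-used cache keyed by a directory's byte offset in the file.
/// (Hypothetical type, for illustration only.)
struct DirCache {
    capacity: usize,
    entries: HashMap<u64, Vec<u8>>,
    order: VecDeque<u64>, // front = least recently used
}

impl DirCache {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be positive");
        Self { capacity, entries: HashMap::new(), order: VecDeque::new() }
    }

    /// Mark `offset` as the most recently used key (O(n), fine for a sketch).
    fn touch(&mut self, offset: u64) {
        self.order.retain(|&o| o != offset);
        self.order.push_back(offset);
    }

    /// Return the cached bytes, or call `fetch` (e.g. an HTTP range request),
    /// evicting the least recently used entry first if the cache is full.
    fn get_or_fetch<F: FnOnce(u64) -> Vec<u8>>(&mut self, offset: u64, fetch: F) -> &[u8] {
        if !self.entries.contains_key(&offset) {
            if self.entries.len() >= self.capacity {
                if let Some(oldest) = self.order.pop_front() {
                    self.entries.remove(&oldest);
                }
            }
            self.entries.insert(offset, fetch(offset));
        }
        self.touch(offset);
        &self.entries[&offset]
    }
}
```

The point of such a cache is that repeated tile lookups reuse directories already in memory, so only a miss costs an extra round trip to the remote file.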

Let's table the authentication aspect until after we get the above, because I suspect most people will be fine with having a non-discoverable but public bucket with the pmtiles file -- this way they can easily test directly against it from their local machine, but at the same time not worry that someone else will see where that file is and download it directly.

@Libbum

Libbum commented Dec 21, 2023

FWIW, I'm quite interested in using this feature with a private, encrypted s3 bucket - not that I'm asking you to change the reasonable roadmap you have above, but adding a +1 to the mix of the more complicated integration.

For the moment I've got a server loading other files from s3, using pre-shared keys that are generated via the rust AWS SDK. It's not super nice, because like you've already said - you have to generate those PSKs, keep a cache of them and rotate the links when they expire.

Uncertain I can commit any time to assist with the further development of this PR, but can definitely beta test the solution as a 3rd party.

@kyleslugg
Contributor

That makes sense to me! Re. caching: based on the caching work you've done so far, what, in your mind, is left to be implemented before that feature is MVP-ready?

As for the documentation: I'm going to be visiting family next week, and so can't commit too much time, but I can plan on putting together a setup guide for an S3 bucket by the first week of the new year. I'll also plan on testing out deployment via Lambda, with the intent of drafting those docs as well in early January.

@Libbum

Libbum commented Dec 21, 2023

For an MVP of the cache: currently I'm using an in-memory store with Axum sessions, but that isn't perfect. I've essentially had to decouple the PSK expiry and the session (which uses Expiry::OnInactivity) since I couldn't quite figure out how to get that middleware to expire something X seconds from now.

My next step would be to replace this in memory solution with Redis - I suspect most people would be interested in that setup anyhow. I can post something on a similar time frame most likely - enjoy the new year!

nyurik and others added 13 commits December 21, 2023 18:39
* Implement http pmtiles

* Add Moka cache support

* Added action-mate step in CI workflow to troubleshoot

* Rewrote STATICS_URL in test.sh to use 0.0.0.0 instead of localhost

* Altered default test db string

* Added fileserver to services in CI actions / docker-build-test

* Reverted MARTIN_DATABASE_URL in test.sh

* Edited volume spec in fileserver service in CI workflow:

* Tried setting safe directory to false

* Reverting fileserver volume spec

* Separated fileserver startup in CI

* Propagated fileserver steps in CI workflow

* Added step to install docker on MacOS

* Adjusting docker install on MacOS runner

* Trying fileserver as service during testing

* Experimenting with manually installing Nginx on Windows runner

* Modifying Nginx install and start to specify use of PowerShell:

* Troubleshooting Windows tests

* Added line to rename versioned nginx directory to versionless:

* More messing around with Windows directory structure

* Last time with PowerShell? Let's hope so.

* Setting proper port number

* Troubleshooting Nginx startup

* Reverted use of github workspace var in nginx config

* Further tweaks

* Commenting out unnecessary tasks

* Commenting out done

* Using preinstalled Windows Nginx

* Fixed Cargo.lock post-merge

* Commented out non-Windows builds to accelerate testing

* Elevating PowerShell permissions to start Nginx

* Running Nginx as process rather than service

* Reintroducing multiplatform builds

* Set curl in test.sh to run as verbose and fail with body for troubleshooting purposes

* More Nginx troubleshooting

* More Nginx...

* Trying service...

* More troubleshooting...

* Further monitoring...

* Retrying service

* Trying with default config

* Logging all services and start types

* Error with powershell? Can't run Get-Service through the Ms ...

* Added silent error handling to Get-Service

* Updating Get-CimInstance

* Knocked out line 314 to try and run steps concurrently

* Amended Get-CimInstance call to look at services

* Knocked out build action

* Enabling nginx service

* Re-enabling build -- problem was that nginx service was disabled by default. Once manual start was enabled, port was shown to be listening.

* Restoring knocked-out workflow steps

* Removed --verbose option from test.sh curl specification

* revert spacing

* cleanup settings file

---------

Co-authored-by: Yuri Astrakhan <YuriAstrakhan@gmail.com>
@nyurik nyurik marked this pull request as ready for review December 22, 2023 04:55
@nyurik nyurik enabled auto-merge (squash) December 22, 2023 05:41
@nyurik
Member Author

nyurik commented Dec 22, 2023

@kyleslugg if you have a sec, I did a number of cleanup rounds, and all seems to be ok, except that macOS seems to be having some sporadic hiccups. Could you take a look and see if it's something I accidentally broke, or if it's some weirdness with Macs?

@nyurik nyurik merged commit 1a8e7c8 into maplibre:main Dec 22, 2023
18 checks passed
@nyurik nyurik deleted the http-pmtiles branch December 22, 2023 06:01

Successfully merging this pull request may close these issues.

Support PMTiles reading over HTTP
4 participants