Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider joining efforts with pmtiles2 #16

Open
nyurik opened this issue May 3, 2023 · 10 comments
Open

Consider joining efforts with pmtiles2 #16

nyurik opened this issue May 3, 2023 · 10 comments

Comments

@nyurik
Copy link
Collaborator

nyurik commented May 3, 2023

I just spotted arma-place/pmtiles-rs efforts, and I hope it would make sense for us all to work on this together. CCing @DerZade and @TheWillard

@DerZade
Copy link

DerZade commented May 3, 2023

Hey 🙃

It would probably make sense to join efforts (at least to to some degree).

Before I started with my implementation I took a look at this implementation, but decided against using it.
This decision had three main reasons at the time:

  1. I needed a very low-level implementation, that makes as few assumptions as possible about the environment it runs in. This why I decided to base my work in the Read and Write-Traits.
  2. According to your readme your project wasn't ready for production use (although someone could argue, that I could have contributed instead of starting from scratch, if that was the only issue)
  3. I wanted a small challenge to improve my Rust skills, since my only experience with Rust was either on very small (and unreleased) or very unfinished project so far 😂

I've recently worked on async support via the AsyncRead and AsyncWrite-traits and actually planned to hit you up and suggest joining efforts to some degree, once I've release and thoroughly tested that. So I'm happy to talk about joining efforts. We just have to agree on how exactly :)

@nyurik
Copy link
Collaborator Author

nyurik commented May 3, 2023

@DerZade, thx for the quick reply! :)

So first off, I don't think either @Seelenbinder nor myself are hard-set on the specific implementation path or architecture, we simply need a bunch of functionality (I need it for Martin tile server, and Luke needs it for his company). So as long as it has async, i'm all good. Moreover, I think you have progressed a lot further than we have in some areas, and pmtiles crate hasn't been touched for a while (it is in production in a number of places). There is one reported bug AFAIK, but I don't have a reproducable case yet.

Secondly, IMHO - for learning it might be good to hack on one code with more people because we can help each other learn Rust better - it wasn't too long ago when I started this journey too (still on it actually, and far from the end).

Lastly, but probably most importantly, I am more concerned with users being confused over which crate to pick, and all of us duplicating our efforts instead of collaborating - i.e. it splinters the already tiny community, which is a bit sad and wasteful. I am even willing to toss some of pmtiles(1) code out and merge in yours (Luke, don't kill me!) as long as we all can progress faster on a common project.

So all that said, I think you actually have a better understanding of where thing are code-wise because you have spent more time on it more recently, and could offer a way to merge our code. I think we should keep using pmtiles crate (name), and once it has all the features of pmtiles2 crate, sunset pmtiles2 to avoid confusion (and to keep the higher usage statistics on crates.io)

What do you think?

@lseelenbinder
Copy link
Member

Hey @nyurik, @DerZade,

Thanks for opening the issue. I'm not opposed, in principle.

However, before we go too far down the road, I think there is a significant difference in approaches which we'd need to resolve, namely Read & Write (sync or async) only give you half of what you need to have a good, production-read crate IMO. Without a HTTP or a mmap/local backend, you have to implement that yourself, which is non-trivial.

pmtiles2 would need those before we could port over. I'd also want to take a look at implementation details, because performance is a fundamental requirement on our side (especially with HTTP), whereas I don't know on the other side what your goals are.

I don't have the bandwidth to spearhead such an effort right now, and, while, we aren't quite using this crate in production yet, we will shortly. (Which means I'll get back to hacking on it if we discover issues.)

It is very nice you have writing! 😄

@DerZade
Copy link

DerZade commented May 5, 2023

I personally do not need a HTTP backend at all. I just want to read and write some PMTiles 😇 So what do you guys think about having two crates in the same repository (idc if here or arma-place/pmtiles-rs)? One crate for low level reading / writing and another which depends on the first one and includes the HTTP backends. This would still result in one common implementation of the format.
The base crate should then probably be named pmtiles to avoid confusion for future users. For current users this may cause confusion tho, because they may need to switch to another crate. Apart from that it would be the imho cleanest solution.

@nyurik
Copy link
Collaborator Author

nyurik commented May 5, 2023

Why not use the "features" feature for that?

@DerZade
Copy link

DerZade commented May 8, 2023

Why not use the "features" feature for that?

That would also be a possibility 🙈.
Anyway, the more pressing question, however, would be what would have to be done to merge the two crates while keeping all functionality.

Did I understand correctly that your implementation of the format is not yet complete and that you want to build mostly on my implementation? If that is the case, the first (and probably biggest) thing that needs to happen, is adapting your HTTP backends so that they use my implementation of the format. Correct?

I'll release a new version of pmtiles2, which includes async support, in the next couple of days (hopefully tonight).

@lseelenbinder
Copy link
Member

lseelenbinder commented May 8, 2023

Why not use the "features" feature for that?

That's definitely the "correct" Rust option. 😄

Did I understand correctly that your implementation of the format is not yet complete and that you want to build mostly on my implementation?

Our read-side implementation is more or less complete, and is pretty well sanity tested. There it no write-side implementation.

I think the big bad warning about production use should go away, since it seems to pass basic sniff tests for users. 😄 The big missing piece there is solid documentation.

If that is the case, the first (and probably biggest) thing that needs to happen, is adapting your HTTP backends so that they use my implementation of the format. Correct?

So that's actually my biggest concern. My initial intuition is that Read is not very similar to efficient HTTP requests (especially if you want to conflate byte-range requests, etc.), so I'd need to understand how we can maintain efficiency while wrapping your implementations. We may simply need to limit what methods of the Read trait we use (I currently implement read and read_exact; Seek is unnecessary for HTTP or mmap'ed files).

I also pretty explicitly chose to use Bytes vs Vec<u8> due to more efficient application code above that, but we could wrap the Vec returned by your code in Bytes fairly easily (I think it's zero-copy).

@lseelenbinder
Copy link
Member

(And to be clear, I'm not trying to stall the efforts, just trying wrap my head around how we could unify the two implementations without losing the advantages of both.)

@DerZade
Copy link

DerZade commented May 9, 2023

So that's actually my biggest concern. My initial intuition is that Read is not very similar to efficient HTTP requests (especially if you want to conflate byte-range requests, etc.), so I'd need to understand how we can maintain efficiency while wrapping your implementations. We may simply need to limit what methods of the Read trait we use (I currently implement read and read_exact; Seek is unnecessary for HTTP or mmap'ed files).

I choose Read because it does not make a lot of assumptions about where you get the data from. Obviously it works well with the file system, but can easily adjusted to almost any medium. The other big thing is that it represents something that is readable, but may not be fully loaded to memory yet, which would be perfect for HTTP range requests How Read is implemented is entirely up to the implementation. A very common pattern is also to wrap a standard reader in a BufReader or something similar, which I would definitely do, for our use-case, otherwise every read call would result in a HTTP request.

I haven't had a lot of time yet, to look at your code in detail, but I'm quite convinced that implementing Read to work with HTTP efficiently is doable with not too much overhead and even things like conflating byte-range requests will be possible. In order to evaluate this completely, however, I need to take a closer look at the code. I just have very little time at the moment because I have a pretty big trade fair coming up at work in a couple of weeks. Once that is through, however, I should have considerably more time.

For a bit of context:
The original plan for my crate was to read bytes from a Cloudflare R2 Bucket in a Cloudflare Worker, which has very similar workflow as HTTP requests would have, and I'm not too concerned about that 🙈 There I also have very limited RAM and CPU, so performance is also a high priority for me.

@lseelenbinder
Copy link
Member

Sounds good @DerZade!

It seems like our goals align, so once we both have more time, hopefully we can hash out what the merged result looks like.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants