to what extent could a Rust DB working group benefit our shared goals? #1

Open
spacejam opened this issue Mar 22, 2019 · 15 comments

Comments

@spacejam

commented Mar 22, 2019

Hey folks, @spacekookie pitched the idea yesterday of having something like a Rust database WG. The thing about most of the Rust database people I know is that they seem quite busy with their projects, probably more than many other types of projects, as databases are such a labor intensive endeavor. We scout out a possible architecture and then often iterate for years to get something we are comfortable with, if we get there at all.

My question is: to what extent can we share foundational libraries, testing techniques, production expertise etc... so that we can reach our performance, reliability, and user ergonomic goals faster than we would be able to achieve on our own? We seem to love writing everything ourselves, often with justifications of "that other stuff isn't fast enough" ;) And perhaps some of us have started working on databases specifically to avoid having to work with many other people, as these things tend to be low-coordination by necessity unless there is some serious capital being invested in reducing bus factor by sharing knowledge that is often high effort to share at all.

But maybe it makes sense to turn some of the components in our various systems into more accessible shared libraries for each other. Personally, I would love to have some knowledge sharing sessions where we learn about each other's approaches to testing, debugging, and optimization.

As a set of examples, I'd like to talk with more Rust DB folks about techniques for finding bugs in concurrent Rust, fuzzing approaches that actually yield bugs, possibly combining our various bespoke histogram collection libraries, and generally how we can turn our many shell and perl scripts that we use for causing bugs and performance issues to pop out into more ergonomic tooling that is accessible to more Rust users.

Databases in Rust are definitely a thing now. We often pour so much of our lives into them. Let's share the table stakes stuff so we can spend more time being creative.

Please add a reaction or text response if any of this appeals to you! Thanks for reading :)

Tagging a few folks from academia and industry, involved in building databases or adjacent tech in Rust:
@utaal @frankmcsherry
@fulmicoton @jonhoo @KodrAus @Kerollmops @cswinter @tglman @krl @hmwill @alex-shapiro @mbalex99 @davidrusu
@siddontang @ngaut @BusyJay @Hoverbear @c4pt0r

Please tag other folks who may be interested in at least being a fly on the wall :)

@jonhoo

commented Mar 22, 2019

I'll leave a link to Apache Arrow and loom here, which may both end up being relevant to this.

@jonhoo

commented Mar 22, 2019

See also this highly relevant Reddit thread. I left a comment outlining some of my thoughts around async Rust DB libraries.

@Kerollmops

commented Mar 22, 2019

I could "probably" help with improving data access patterns and that kind of thing, such as data-oriented programming...

@Hoverbear

commented Mar 22, 2019

There are definitely shared concerns for database projects.

First to mind for me: Shared ecosystem.

Many of our projects will share dependencies like RPC, monitoring, storage engines, consensus layers, clients, parsing & lexing, query planning, co-processors, MVCC models, diagnostics, logging, networking, even things like common interface patterns.

While working groups exist for many of these topics, others exist in the void. Where relevant working groups do exist, the demands of databases are not always understood, or the capacity to address them is too small.

Having some way to find, shepherd, mentor for, and/or highlight issues of concern to database projects could benefit us all.

@fulmicoton

commented Mar 23, 2019

I'm mostly interested in hearing testing tips. loom and fail-rs are on my radar, for instance. I would also love to see a convergence on best practices for error handling and logging, and to apply them to tantivy.

@pimeys

commented Mar 27, 2019

We're currently porting https://prisma.io to Rust, and hopefully our work will spawn reusable components as separate crates. So I'm all in on the WG.

We really need a DSL to abstract the SQL syntax between the databases, so for that we have prisma-query. Not optimized and constantly changing, takes a bit too much of ownership for internal reasons, but hopefully could be useful for others too. The plan is to have features first, to be correct and then be fast.

@weiznich

commented Mar 27, 2019

@pimeys Great to hear that you are trying to port prisma to rust.

> We really need a DSL to abstract the SQL syntax between the databases, so for that we have prisma-query. Not optimized and constantly changing, takes a bit too much of ownership for internal reasons, but hopefully could be useful for others too. The plan is to have features first, to be correct and then be fast.

Did you try diesel and diesel-dynamic-schema? Did you find any issues that prevent you from using them?

By the way, you probably want to have a look at wundergraph, which already provides a simple way to build a graphql schema from a given database. It builds on top of diesel. It is in a working state, but it is missing some small improvements and the documentation needs to be written/updated.

@pimeys

commented Mar 28, 2019

@weiznich We didn't try diesel-dynamic-schema, and I guess the reason is that we just didn't find it back when we evaluated our options. Yes, it would make sense to use an off-the-shelf solution, and yes, I'll put a post-it on our kanban board today to evaluate it. I don't really want to build our own DSL, so this will have priority.

The other part of the team is currently parsing the incoming graphql and they will need a schema parser too. We found wundergraph yesterday, so we'll be evaluating it when the time comes.

@weiznich

commented Mar 28, 2019

@pimeys

> The other part of the team is currently parsing the incoming graphql and they will need a schema parser too.

You may want to look at juniper for that 😉

@pimeys

commented Mar 28, 2019

It was evaluated, but turned out to be too complex for us, so we chose a graphql parser, handling the query execution by ourselves.

@SamuelMarks

commented Mar 29, 2019

Hi all, my name's Samuel, I'm loving Rust and bringing together a bunch of engineers on open-source development (LLVM, register-based VM, custom programming languages, custom consensus algorithms, &etc.).


Here is my 2¢!

In addition to @Hoverbear's great comment above, some obvious things to share [off the top of my head] are:

  • Black-box testing (my team is looking at Jepsen)
  • Benchmarking
  • Verification (my team is moving towards TLA+ for our consensus, but it may be possible to link verification with actual code, and verify the implementation's orthogonality to TLA+ [or a similar, proven language])
  • Command-line controllers (daemon libraries, common configuration libraries & global config which could be common to all databases, logrollers, &etc)

Also documentation for all the above, in an easy to grok manner, e.g.:

  • How to write a database in Rust
  • HA in Rust
  • Performance comparison of [list of protocols], and how to compare yours

Additionally, we could look at an AreWeDByet, similar to arewewebyet & areweasyncyet, but for database development in Rust.

Wouldn't hurt to have monthly working group meetings over videoconference, though we need to be careful it doesn't turn into a research/reading group.

Finally, some random resources on database engine creation, HA, clustering, tradeoffs between levels of consistency &etc. wouldn't go amiss. Including textbooks, lecture series and related. Found some of this directly through @spacejam.
Maybe link this AreWeDByet to a wiki—or just a github repo—then we can all start contributing? - And maybe an IRC channel on moznet?

Thanks for your consideration 😃

@KodrAus

commented Mar 30, 2019

It looks like we've got two ideas of who a database working group is targeted at: database integration (like making the story of connecting to databases from various frameworks nicer), and database implementation (which is where I think this issue is focused).

I'm keen to support a Rust database implementation community that can share experience and resources.

> knowledge sharing sessions where we learn about each other's approaches to testing, debugging, and optimization.

I think this is a great idea, and personally feel like it's at least a prerequisite to identifying and sharing common implementations (in storage engines particularly I think there's an eventual need to own as much of your stack as possible). Resources like Ayende's recent dive into sled could help surface some design decisions in a storage codebase from an outside perspective. Having a similar dive into tantivy would be great too.

As an example, I'm sure there are probably some common approaches that we all find we need in some form. Some things that immediately come to mind from our own storage engine:

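use std::io::Read;

// `Sized` is needed so that `Into<Self>` and the by-value `into_reader` are well-formed.
trait MemRead: Sized {
    // if `MemRead::bytes` returns `Some` they *must*
    // be exactly the same bytes yielded by `MemRead::into_reader().read_to_end()`
    type Reader: Read + Into<Self>;

    // The value's bytes, if they are held as one contiguous slice.
    fn bytes(&self) -> Option<&[u8]>;
    // Consume the value and stream its bytes instead.
    fn into_reader(self) -> Self::Reader;
}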

Which is basically std::io::Read, but lets you optimize for the case where you're holding a contiguous slice. An early decision we made was never to assume we'll always have a contiguous slice to work with, which made dealing with compression or data stored across multiple pages much more natural later.
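To make the contiguous-versus-paged distinction concrete, here is a minimal sketch of how a trait like this might be implemented and consumed. The types `Contiguous`, `Paged`, and `PagedReader`, and the helper `byte_sum`, are hypothetical names invented for illustration; they are not taken from the storage engine described above.

```rust
use std::io::{self, Cursor, Read};

// The trait from the comment above, restated so this sketch compiles on its own.
trait MemRead: Sized {
    type Reader: Read + Into<Self>;
    fn bytes(&self) -> Option<&[u8]>;
    fn into_reader(self) -> Self::Reader;
}

// Hypothetical value held as one contiguous buffer.
struct Contiguous(Vec<u8>);

impl From<Cursor<Vec<u8>>> for Contiguous {
    fn from(cursor: Cursor<Vec<u8>>) -> Self {
        Contiguous(cursor.into_inner())
    }
}

impl MemRead for Contiguous {
    type Reader = Cursor<Vec<u8>>;
    fn bytes(&self) -> Option<&[u8]> {
        Some(&self.0)
    }
    fn into_reader(self) -> Self::Reader {
        Cursor::new(self.0)
    }
}

// Hypothetical value whose bytes are spread over several pages.
struct Paged {
    pages: Vec<Vec<u8>>,
}

struct PagedReader {
    pages: Vec<Vec<u8>>,
    page: usize,
    offset: usize,
}

impl Read for PagedReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        while self.page < self.pages.len() {
            let rest = &self.pages[self.page][self.offset..];
            if rest.is_empty() {
                // Current page is exhausted (or empty): move to the next one.
                self.page += 1;
                self.offset = 0;
                continue;
            }
            let n = rest.len().min(buf.len());
            buf[..n].copy_from_slice(&rest[..n]);
            self.offset += n;
            return Ok(n);
        }
        Ok(0) // no pages left: end of stream
    }
}

impl From<PagedReader> for Paged {
    fn from(reader: PagedReader) -> Self {
        Paged { pages: reader.pages }
    }
}

impl MemRead for Paged {
    type Reader = PagedReader;
    fn bytes(&self) -> Option<&[u8]> {
        // Only a single page can be handed out as one slice.
        match self.pages.as_slice() {
            [only] => Some(only.as_slice()),
            _ => None,
        }
    }
    fn into_reader(self) -> Self::Reader {
        PagedReader { pages: self.pages, page: 0, offset: 0 }
    }
}

// A consumer that takes the slice fast path when it exists and streams otherwise.
fn byte_sum<M: MemRead>(value: M) -> io::Result<u64> {
    if let Some(slice) = value.bytes() {
        return Ok(slice.iter().map(|&b| u64::from(b)).sum::<u64>());
    }
    let mut reader = value.into_reader();
    let mut buf = [0u8; 8];
    let mut sum = 0u64;
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            return Ok(sum);
        }
        sum += buf[..n].iter().map(|&b| u64::from(b)).sum::<u64>();
    }
}
```

Callers such as `byte_sum` never need to know which layout they were given, which is presumably what makes adding compression or multi-page values later less disruptive.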

Designing with Windows in mind right from the start can save some pain down the track.

Our storage engine isn't lock-free like sled; it uses a strategy that prevents writes and maintenance from ever blocking reads, so we've spent a lot of time localizing locks, building infrastructure to verify them, and documenting how they need to be held to perform certain operations.
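The comment does not say how the engine keeps writers from blocking readers, so the following is only a sketch of one common approach, swapped in for illustration: copy-on-write snapshots behind an `RwLock<Arc<_>>`. The `Store` and `Snapshot` names are hypothetical. In this variant a reader can be delayed only for the instant the writer swaps the pointer, not for the duration of the write itself.

```rust
use std::collections::BTreeMap;
use std::sync::{Arc, Mutex, RwLock};

// Hypothetical immutable view of the data at one point in time.
type Snapshot = BTreeMap<String, String>;

struct Store {
    // Readers take this lock only long enough to clone the `Arc`.
    current: RwLock<Arc<Snapshot>>,
    // Serializes writers so that concurrent updates are not lost.
    write_lock: Mutex<()>,
}

impl Store {
    fn new() -> Self {
        Store {
            current: RwLock::new(Arc::new(Snapshot::new())),
            write_lock: Mutex::new(()),
        }
    }

    // Cheap for readers: clone a pointer, then read without holding any lock.
    fn snapshot(&self) -> Arc<Snapshot> {
        let guard = self.current.read().unwrap();
        Arc::clone(&*guard)
    }

    // Build the next version off to the side, then publish it with a pointer swap.
    fn insert(&self, key: &str, value: &str) {
        let _writer = self.write_lock.lock().unwrap();
        let mut next = (*self.snapshot()).clone();
        next.insert(key.to_owned(), value.to_owned());
        *self.current.write().unwrap() = Arc::new(next);
    }
}
```

A reader that obtained a snapshot before a write keeps seeing the old version until it asks for a new one. The cost is a full copy per write, which a real engine would likely avoid with persistent data structures or page-level versioning.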

We interact with our engine exclusively through C#, so FFI is really important to us. I've started pulling out the guts of that FFI work into a public example repo which I'm working on turning into something more compelling. A nice C ABI would probably also be useful for tantivy and other Rust databases down the track too.
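As a rough illustration of what a C ABI over a Rust store can look like, here is a minimal sketch using an opaque handle and integer status codes. Every name in it (`KvStore`, `kv_open`, `kv_put`, `kv_get`, `kv_close`, and the `KV_*` constants) is hypothetical and unrelated to the example repo mentioned above. A real `cdylib` would also apply the `no_mangle` attribute to each exported function so that the symbol names are stable, and would guard against panics crossing the FFI boundary.

```rust
use std::collections::HashMap;
use std::{ptr, slice};

// Opaque to callers: C# or C only ever sees a pointer to it.
pub struct KvStore {
    map: HashMap<Vec<u8>, Vec<u8>>,
}

pub const KV_OK: i32 = 0;
pub const KV_NOT_FOUND: i32 = 1;
pub const KV_INVALID: i32 = -1;

// Allocates a store and transfers ownership to the caller.
pub extern "C" fn kv_open() -> *mut KvStore {
    Box::into_raw(Box::new(KvStore { map: HashMap::new() }))
}

// Safety: pointers must be valid for the given lengths; `store` must come from `kv_open`.
pub unsafe extern "C" fn kv_put(
    store: *mut KvStore,
    key: *const u8,
    key_len: usize,
    val: *const u8,
    val_len: usize,
) -> i32 {
    if store.is_null() || key.is_null() || val.is_null() {
        return KV_INVALID;
    }
    let store = unsafe { &mut *store };
    let key = unsafe { slice::from_raw_parts(key, key_len) };
    let val = unsafe { slice::from_raw_parts(val, val_len) };
    store.map.insert(key.to_vec(), val.to_vec());
    KV_OK
}

// Copies at most `cap` bytes into `out` and always reports the full value length,
// so a caller with too small a buffer can retry with a larger one.
pub unsafe extern "C" fn kv_get(
    store: *const KvStore,
    key: *const u8,
    key_len: usize,
    out: *mut u8,
    cap: usize,
    out_len: *mut usize,
) -> i32 {
    if store.is_null() || key.is_null() || out_len.is_null() || (out.is_null() && cap > 0) {
        return KV_INVALID;
    }
    let store = unsafe { &*store };
    let key = unsafe { slice::from_raw_parts(key, key_len) };
    match store.map.get(key) {
        None => KV_NOT_FOUND,
        Some(value) => {
            let n = value.len().min(cap);
            if n > 0 {
                unsafe { ptr::copy_nonoverlapping(value.as_ptr(), out, n) };
            }
            unsafe { *out_len = value.len() };
            KV_OK
        }
    }
}

// Takes ownership back and frees the store; passing null is a no-op.
pub unsafe extern "C" fn kv_close(store: *mut KvStore) {
    if !store.is_null() {
        drop(unsafe { Box::from_raw(store) });
    }
}
```

Returning status codes and writing results through out-pointers keeps the surface easy to bind from C# P/Invoke or plain C, at the cost of pushing buffer management onto the caller.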

So maybe a good starting point could be to organize some knowledge sharing sessions like @spacejam suggested of our current codebases with the output being posts or conference talk content and an idea of how we could support our community better?

@spacekookie

Member

commented Mar 31, 2019

@KodrAus Yea, I brought this up in https://internals.rust-lang.org/t/kickstarting-a-database-wg/9696 as well.

I like the idea raised by bitshiftmask in that thread, having one large Database working group, with sub-teams for various aspects.

Anyway, I also just wanted to post it here that we have a zulip stream now: https://rust-lang.zulipchat.com/#narrow/stream/193127-wg-database

And I would love to send a doodle around at some point to find a time when we can meet synchronously and discuss what we want to do (in high-level terms, i.e. "I want to find X people to talk about Y").

@spacekookie

Member

commented Apr 11, 2019

Just to make sure people here get pinged too: I'd love to get started with a synchronous "kickoff meeting" soon, somewhere we can discuss who wants to or can work on what, and maybe form some sub-teams.

Generally I see a lot of energy in this space but I feel like deliberate collaboration is important.

The zulip topic is here: https://rust-lang.zulipchat.com/#narrow/stream/193127-wg-database/topic/Coordination.20meetings

@tvogt

commented Apr 15, 2019

As an answer to the original question about a DB working group:

Database abstraction is the game in every popular framework in the web world. People who install Symfony, for example, don't want to care much if they run Postgres or MySQL or MariaDB or whatever else. They think the DB abstraction layer or the ORM etc. should handle that for them.

So yes, a meta-group would be useful.
