Skip to content
Switch branches/tags
Go to file
Cannot retrieve contributors at this time
Ok, so I decided to write a database.
Because it's a really hard problem that lots of engineers
have tried to solve in the past and I think I can do it
Better how?
I have three main design principles (so far) that I think are
going to make this database better than any that have come
1. Instead of thinking about "transactions" think instead about
requests and responses, I believe this is a better separation of
concerns that will lend itself to better solutions to the concurrency
problems that one runs into when designing a database and I am hopeful
that it will lead to encountering-- perhaps not fewer, but better
2. I don't want to write a traditional database, I have no interest in
doing that, and I don't think that looking at what other people have
done and trying to improve a few things is how you do something really
extraordinary. I think that you decide what you think is important, then
steal shamelessly from people you think have done a good job
at particular parts of what you're trying to do.
What I think is important is doing things the way that computers like to
do them, in other words exhibiting "Mechanical Sympathy." I am very much
inspired by the approach taken by Alexia Massalin in her famous Synthesis
Kernel. Instead of deciding how you're going to do things and then kinda
trying to make it efficient, take a bunch of efficient things that seem
like they'll be useful and then make the thing out of those.
3. I'm a teensy bit inspired by the episode of Atlanta in which Earn
blacks out and forgets where he put something but in is blacked out
state had vouchsafed the valuable item to a friend. The friend, upon
returning it, tells him "[You] outsmarted yourself again." Since each
node runs the risk of going down(blacking out) at any time, and the
item of value can be copied, it seems reasonable to vouchsafe each request
and response with several friends before assuming that it is truly safe.
ok, so like if it's not going to be a traditional database what is it
going to be?
Good question, and I'm kinda still working on that. The first thing I'm
building as a sort of POC is basically a distributed value that can be updated
across multiple nodes. The idea is that instead of making application developers
worry about how to shard things across nodes you should instead make them
describe the things in a way that allows the database to decide how to shard them
based on the requests it's getting.
So the POC seems like a neat example of that, you have a quantity
distributed across nodes. Each node has some amount that it is in charge of,
and when client make requests to check out some of that quantity that node can
make the decision to give some away. If it needs more it can talk to one or more
of the other nodes and request some of theirs. In this way you can have a producer
client talk to one of the nodes and give it some quantity, then have consumer clients
talking to completely different nodes consuming their quantity, and have no worries
about race conditions or all of the clients ganging up on one node and clogging it up
with requests that could be more efficiently distributed across multiple nodes if things
were sharded better. This POC may seem like a toy problem that is uniquely well suited to
this approach and has no real world value. I would respond that firstly it does have
real world value (consider a distributed web crawler that is only allowed to make so
many connections to any given server at once), and secondly I believe that this approach
can be extended to other problems. My thinking is that you should be able to define
an object with multiple attributes in such a way that you can distribute the
responsibility for keeping track of those attributes in a somewhat similar
way to how this POC distributes responsibility for keeping track of smaller subquantities
of what may be considered to be one global quantity.
ok, so I think that one of the big advantages to the design of
recording requests and responses instead of transactions is going to
be that we only need to maintain the current value of things in memory
and we can just append to the log file, instead of having to index into
some big random access file every time you want to do a write.
JoshDB stuff to think about later
it seems like a database is something you would be able to verify with a merkle tree
and a merkle tree is something you could use to determine diffs, and diffs are what
you need to determine what to send to a node as it's starting up. This also means
that if everything is encrypted and signed we have a secure means of verifying database
possible implications:
multiple nodes you can read from that register for updates from
the nodes you can talk to to update values that just have to maintain
an in memory value