Simplifies querying of structured data using relational algebra
Switch branches/tags
Nothing to show
Pull request Compare This branch is 3 commits ahead, 1869 commits behind dkubb:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.



Simplifies querying of structured data using relational algebra


The purpose of this project is to expand my knowledge of relational algebra by attempting to implement a simple query system using the primitive operations defined in relational algebra.

Most of the design is heavily inspired from koios and arel. The reason I decided to write my own and not just build on top of those systems was not so much because I don't like the code/API in those projects, it's more because I wanted to gain a depth of understanding that can only be earned by trying to solve the problem myself.

I should note though that I don't plan to just re-implement those systems with a different API, or different internals, I plan to make something that surpasses them in several areas. For example, I want to be able to join information from multiple datastores and represent it as a single relation. I want to be able to insert, update or delete from that relation and have those changes propagated back to the right datastore. This is not an easy feat, since it is basically the writable view problem that RDBMS' struggle with. While I do think there are some cases where relations will become read-only, I think it will be possible to propagate writes properly in this manner. I certainly think for the current use cases in DataMapper this should work quite well.

The ability to join data from multiple datastores and have it presented in a consistent manner will solve one of the longest standing problems in DataMapper, namely how do we do cross repository joins. It should be possible to construct one query for one datastore, and then another query for another datastore and then join them. Since they are using different engines veritas will know to perform each query natively and then join the results in-memory seamlessly. It should also be possible to reorganize the queries so that as much work as possible is done natively as opposed to in-memory, which is considered the last resort.

Not only does this work nicely with associations, but it will allow DataMapper to perform mapping in a more powerful way. You'll be able to construct a join from multiple datastores, and set that as the base for your model. Each DM Resource would work as normal, but again writes could be propagated back to the appropriate datastore. You'd be able to split your data up between different datastores, but assemble it into one coherent view.

Phase 1: In-Memory Operations

The first phase of this project will be to implement all the operations listed below using in-memory data structures. I'm focusing on the API, and making sure the specs ensure the desired results are obtained from each operation.

This is approximately *90% Complete*

Phase 2: RDBMS Engines

The second phase of this project will be to add an RDBMS based engine, and move the in-memory matching to it's own engine. I'll also be working on a system where if the primary engine cannot carry out some operation, that it first look at alternate forms (eg. using a join instead of an intersection), and then fall-back to in-memory matching. I also want to look at re-arranging queries so that all the operations that can be performed natively are “pushed down” the hierarchy and then the in-memory matching is performed last.

Phase 3: DataMapper Integration

The third phase of this project will be to add a few NoSQL engines (like MongoDB and CouchDB) and then look at writing a DataMapper adapter that translates Query objects into Veritas relations. I want to make sure all the DM specs pass with this adapter and each engine, and if everything goes well I will look at updating DM to work directly on top of Veritas.


relation =[ [ :id, Integer ] ], [ [ 1 ] ])

# Relational Operators
# --------------------

# projection
new_relation = relation.project([ :a, :b, :c ])

# removal
new_relation = relation.remove([ :a, :b, :c ])

# rename
new_relation = relation.rename(:a => :b, :c => :d)

# natural join
new_relation = relation.join(other)        # OR relation + other

# product
new_relation = relation.product(other)     # OR relation * other

# intersection
new_relation = relation.intersect(other)   # OR relation & other

# union
new_relation = relation.union(other)       # OR relation | other

# difference
new_relation = relation.difference(other)  # OR relation - other

# restriction
new_relation = relation.restrict('other'))
new_relation = relation.restrict { |r| r[:a].eq('other').and(r[:b].gte(42)) }

# theta-join
new_relation = relation.join(other, relation[:a].gte(other[:b]))
new_relation = relation.join(other) { |r| r[:a].gte(r[:b]) }

# NOTE: theta-join is effectively restricting a product of the relations

# Non-Relational Operators
# ------------------------

# returns a set that represents the relation header
header = relation.header

# simplifies and normalizes the relation operations
optimized_relation = relation.optimize

# a relation is Enumerable
relation = relation.each { |tuple| ... }

# order by attribute and direction
new_relation = relation.order([ relation[:a].desc, relation[:b] ])
new_relation = relation.order { |r| [ r[:a].desc, r[:b] ] }

# limiting (only allowed if ordered)
new_relation = relation.limit(10)

# offset (only allowed if ordered)
new_relation = relation.offset(0)

# get the first n tuples (only allowed if ordered)
new_relation = relation.first     # default is 1
new_relation = relation.first(5)

# get the last n tuples (only allowed if ordered)
new_relation = relation.last      # default is 1
new_relation = relation.last(5)

# reverse the relation (only allowed if ordered)
new_relation = relation.reverse

# CRUD Operators
# --------------

# can only modify relations that project all candidate keys for the base relations
new_relation = relation.create(:a => 'test', :b => 'other', :c => 'yet another')
new_relation = relation.update(:a => 'test', :b => 'other', :c => 'yet another')
new_relation = relation.delete

Note on Patches/Pull Requests

  • Fork the project.

  • Make your feature addition or bug fix.

  • Add tests for it. This is important so I don't break it in a future version unintentionally.

  • Commit, do not mess with rakefile, version, or history. (if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)

  • Send me a pull request. Bonus points for topic branches.


Copyright © 2010 Dan Kubb. See LICENSE for details.