Goodbye serialize, hello hstore.
You need dynamic columns in your tables. What do you do?
- Create lots of tables to handle it.
Nice, now you’ll need more models and lots of additional sqls. Insertion and selection will be slow as hell.
- Use a noSQL database just for this issue.
- Create a serialized column.
Nice, insertion will be fine, and reading data from a record too. But, what if you have a condition in your select that includes serialized data? Yeah, regular expressions.
|action||serialize||hstore||serialize sql||hstore sql|
|count all with condition||10114,575||1830,444||SELECT count(*) FROM foos WHERE data ~ ‘foo: bar’;||SELECT count(*) FROM bars WHERE data @> ‘foo=>bar’;|
|count all with negative condition||18722,149||1677,948||SELECT count(*) FROM foos WHERE data !~ ‘another key: 9999990’;||SELECT count(*) FROM bars WHERE not data @> ‘“another key”=>9999990’;|
|find one with condition||17740,307||130,227||SELECT count(*) FROM foos WHERE data ~ ‘another key: 9999990’;||SELECT * FROM bars WHERE data @> ‘“another key”=>9999990’|
Benchmarks made in my local machine, with a million records. Time is in milliseconds.
Postgresql 8.4+ (also tested with 9.0) with contrib and Rails 3. (It might work on 2.3.x with minor patches…)
On Ubuntu, this is easy:
sudo apt-get install postgresql-contrib-9.0
…you are screwed. Use a VM. you should use the binary package kindly provided by EnterpriseDB
Hstore is a postgres contrib type. Check it out first:
Then, just add this to your Gemfile:
And run your bundler:
Make sure that you have the desired database, if not create it as the desired user:
Add the parameters to your database.yml (these are system dependant), e.g.:
development: adapter: postgresql host: 127.0.0.1 database: hstorage_dev encoding: unicode username: postgres password: pool: 5
Now you need to create a migration that adds hstore support for your postgresql database:
rails g hstore:setup
Finally you can create your own tables using hstore type. It’s easy:
rails g model Person name:string data:hstore
Well, not yet. Don’t forget to add indexes. Like this:
CREATE INDEX people_gist_data ON people USING GIST(data);
CREATE INDEX people_gin_data ON people USING GIN(data);
To my experience GIN is faster for searching records.
Once you have it installed, you just need to learn a little bit of new sqls for selecting stuff (creating and updating is transparent).
Find records that contains a key named ‘foo’:
Person.where("data ? 'foo'")
Find records where ‘foo’ is equal to ‘bar’:
Person.where("data -> 'foo' = 'bar'")
This same sql is at least twice as fast (using indexes) if you do it that way:
Person.where("data @> 'foo=>bar'")
Find records where ‘foo’ is not equal to ‘bar’:
Person.where("data -> 'foo' <> 'bar'")
Person.where("not data @> 'foo=>bar'")
Find records where ‘foo’ is like ‘bar’:
Person.where("data -> 'foo' LIKE '%bar%'")
or something like …
Person.where("data -> 'foo' ILIKE '%bar%'")
If you need to delete a key in a record, you can do it that way:
This way you’ll also save the record:
The destroy_key method returns ‘self’, so you can chain it:
person.destroy_key(:data, :foo).destroy_key(:data, :bar).save
But there it a shortcut for that:
person.destroy_keys(:data, :foo, :bar)
person.destroy_keys!(:data, :foo, :bar)
And finally, if you need to delete keys in many rows, you can:
and with many keys:
Person.delete_keys(:data, :foo, :bar)
You can use issues in github for that. Or else you can reach me at twitter: @joaomilho
Note on Patches/Pull Requests
- Fork the project.
- Make your feature addition or bug fix.
- Add tests for it. This is important so I don’t break it in a
future version unintentionally.
- Commit, do not mess with rakefile, version, or history.
(if you want to have your own version, that is fine but bump version in a commit by itself I can ignore when I pull)
- Send me a pull request. Bonus points for topic branches.
Copyright © 2010 Juan Maiz. See LICENSE for details.