DumpTruck
==============
DumpTruck is a document-like interface to a SQLite database.

Quick start
----------
Install, save data and retrieve it using default settings.

### Install

    pip2 install dumptruck || pip install dumptruck

### Initialize

Open the database connection by initializing a DumpTruck object.

    from dumptruck import DumpTruck

    dt = DumpTruck()

### Save
The simplest `insert` call looks like this.

    dt.insert({"firstname":"Thomas","lastname":"Levine"})

This saves a new row with "Thomas" in the "firstname" column and
"Levine" in the "lastname" column. It uses the table "dumptruck"
inside the database "dumptruck.db". It creates or alters the table
if it needs to.
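
To illustrate what "creates or alters the table if it needs to" can mean at the SQL level, here is a minimal sketch that uses only the standard `sqlite3` module. It is not DumpTruck's code; `quote` and `insert_row` are hypothetical helpers, and the sample rows are made up.

```python
import sqlite3

def quote(name):
    # Wrap an identifier in double quotes, doubling any embedded ones.
    return '"' + name.replace('"', '""') + '"'

def insert_row(conn, table, row):
    # Hypothetical helper (not part of DumpTruck): create the table or add
    # missing columns, then insert the non-empty dictionary `row`.
    existing = [info[1] for info in
                conn.execute("PRAGMA table_info({})".format(quote(table)))]
    if not existing:
        columns = ", ".join(quote(key) for key in row)
        conn.execute("CREATE TABLE {} ({})".format(quote(table), columns))
    else:
        for key in row:
            if key not in existing:
                conn.execute("ALTER TABLE {} ADD COLUMN {}".format(
                    quote(table), quote(key)))
    columns = ", ".join(quote(key) for key in row)
    marks = ", ".join("?" for _ in row)
    cursor = conn.execute(
        "INSERT INTO {} ({}) VALUES ({})".format(quote(table), columns, marks),
        list(row.values()))
    conn.commit()
    return cursor.lastrowid

conn = sqlite3.connect(":memory:")
first = insert_row(conn, "dumptruck", {"firstname": "Thomas", "lastname": "Levine"})
# The second row introduces a new column, so the table is altered first.
second = insert_row(conn, "dumptruck", {"firstname": "Ada", "occupation": "engineer"})
columns = [info[1] for info in conn.execute('PRAGMA table_info("dumptruck")')]
```

After both calls, `columns` holds the two original columns plus the new one, and the first row has NULL in the added column.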

If you insert one row, `DumpTruck.insert` returns the rowid of the row.

    dt.insert({"foo": "bar"}, "new-table") == 1

If you insert many rows, `DumpTruck.insert` returns a list of the rowids of the
new rows.

    dt.insert([{"foo": "one"}, {"foo": "two"}], "new-table") == [2, 3]

If there are UNIQUE constraints on the table (perhaps from `create_index`) then
`insert` will fail if these constraints are violated. You can use `upsert` (with
the same syntax) to replace the existing row instead.
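
The difference between a failing insert and a replacing one can be seen in plain SQLite. This sketch uses the standard `sqlite3` module; that `upsert` maps to `INSERT OR REPLACE` is an assumption made for illustration, and the index name `tools_name` is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tools (toolName TEXT, weight INTEGER)")
conn.execute("CREATE UNIQUE INDEX tools_name ON tools (toolName)")
conn.execute("INSERT INTO tools VALUES (?, ?)", ("jackhammer", 58))

# A second plain INSERT with the same toolName violates the UNIQUE index.
try:
    conn.execute("INSERT INTO tools VALUES (?, ?)", ("jackhammer", 60))
    insert_failed = False
except sqlite3.IntegrityError:
    insert_failed = True

# INSERT OR REPLACE removes the conflicting row and inserts the new one.
conn.execute("INSERT OR REPLACE INTO tools VALUES (?, ?)", ("jackhammer", 60))
rows = conn.execute("SELECT toolName, weight FROM tools").fetchall()
```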

### Retrieve
Once the database contains data, you can retrieve them.

    data = dt.dump()

The data come out as a list of ordered dictionaries,
with one dictionary per row.

Slow start
-------
### Initialize

You can specify a few keyword arguments when you initialize the DumpTruck object.
For example, if you want the database file to be `bucket-wheel-excavators.db`,
you can use this.

    dt = DumpTruck(dbname="bucket-wheel-excavators.db")

It actually takes up to four keyword arguments.

    DumpTruck(dbname="dumptruck.db", auto_commit=True, vars_table="_dumptruckvars", adapt_and_convert=True)

* `dbname` is the database file to save to; the default is dumptruck.db.
* `vars_table` is the name of the table to use for `DumpTruck.get_var`
    and `DumpTruck.save_var`; default is `_dumptruckvars`. Set it to `None`
    to disable the get_var and save_var methods.
* `auto_commit` is whether changes to the database should be automatically committed;
    if it is set to `False`, changes must be committed with the `commit` method
    or with the `commit` keyword argument.
* `adapt_and_convert` is whether types should be converted automatically; with
    this on, dates get inserted as dates, lists as lists, etc.

### Saving
As discussed earlier, the simplest `insert` call looks like this.

    dt.insert({"firstname": "Thomas", "lastname": "Levine"})

#### Different tables
By default, that saves to the table `dumptruck`. You can specify a different table;
this saves to the table `diesel-engineers`.

    dt.insert({"firstname": "Thomas", "lastname": "Levine"}, "diesel-engineers")

#### Multiple rows
You can also pass a list of dictionaries.

    data=[
        {"firstname": "Thomas", "lastname": "Levine"},
        {"firstname": "Julian", "lastname": "Assange"}
    ]
    dt.insert(data)

#### Complex objects
You can even pass nested structures; dictionaries,
sets and lists will automatically be dumped to JSON.

    data=[
        {"title":"The Elements of Typographic Style","authors":["Robert Bringhurst"]},
        {"title":"How to Read a Book","authors":["Mortimer Adler","Charles Van Doren"]}
    ]
    dt.insert(data)

Your data will be stored as JSON. When you query it, it will
come back as the original Python objects.
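
As a rough picture of that round trip, the sketch below serializes a nested value to JSON text by hand and parses it again on the way out, using only the standard library. DumpTruck does this for you; the table name `books` is an assumption for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, authors TEXT)")

book = {"title": "How to Read a Book",
        "authors": ["Mortimer Adler", "Charles Van Doren"]}

# Store the list as JSON text in an ordinary column.
conn.execute("INSERT INTO books VALUES (?, ?)",
             (book["title"], json.dumps(book["authors"])))

# Parse the JSON text back into a Python list when reading.
title, raw_authors = conn.execute("SELECT title, authors FROM books").fetchone()
restored = {"title": title, "authors": json.loads(raw_authors)}
```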

And if you have some crazy object that can't be JSONified,
you can use the dead-simple pickle interface.

    # This fails
    data = {"weirdthing": {range(100): None}}
    dt.insert(data)

    # This works
    from dumptruck import Pickle
    data = Pickle({"weirdthing": {range(100): None}})
    dt.insert(data)

It automatically pickles and unpickles your complex object for you.
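
For comparison, this is roughly what pickling into SQLite looks like when done by hand with the standard library. It is a sketch, not the `Pickle` wrapper's implementation; the table `things` and the sample object are assumptions.

```python
import json
import pickle
import sqlite3

# A dictionary with a non-string key cannot be dumped to JSON.
weird = {"weirdthing": {frozenset(range(3)): None}}
try:
    json.dumps(weird)
    json_ok = True
except TypeError:
    json_ok = False

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE things (payload BLOB)")

# Pickle the object to bytes and store the bytes as a BLOB.
conn.execute("INSERT INTO things VALUES (?)",
             (sqlite3.Binary(pickle.dumps(weird)),))

# Read the BLOB back and unpickle it.
(blob,) = conn.execute("SELECT payload FROM things").fetchone()
restored = pickle.loads(blob)
```

As with any use of pickle, only unpickle data from a database you trust.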

#### Names
Column names and table names automatically get quoted if you pass them without quotes,
so you can use bizarre table and column names, like `no^[hs!'e]?'sf_"&'`.
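
One common way to quote an SQLite identifier is to wrap it in double quotes and double any double quotes inside it. The sketch below shows that rule with the standard `sqlite3` module; `quote_identifier` is a hypothetical helper, and whether DumpTruck quotes names in exactly this way is an assumption.

```python
import sqlite3

def quote_identifier(name):
    # Hypothetical helper: double embedded double quotes, then wrap.
    return '"' + name.replace('"', '""') + '"'

# The bizarre name from the text, used as both table and column name.
odd_name = """no^[hs!'e]?'sf_"&'"""
quoted = quote_identifier(odd_name)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE {} ({} TEXT)".format(quoted, quoted))
conn.execute("INSERT INTO {} VALUES (?)".format(quoted), ("ok",))

tables = [row[0] for row in
          conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
values = conn.execute("SELECT {} FROM {}".format(quoted, quoted)).fetchall()
```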

#### Null values
`None` dictionary values are always equivalent to non-existence of the key.
That is, these insert commands are equivalent.

    dt = DumpTruck()
    dt.insert({ u'foo': 8, u'bar': None})
    dt.insert({ u'foo': 8})

Passing an empty dictionary creates a new row with all NULL values.

    # These all create a row with all NULL values.
    dt.insert({})
    dt.insert([{}])
    dt.insert({u'potato': None})

More precisely, they set the values to the default values via this SQL.

    INSERT INTO foo DEFAULT VALUES
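
The distinction between "all NULL" and "default values" shows up when a column declares a default. Here is a small sketch with the standard `sqlite3` module; the columns, including `baz` with its default of 0, are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (foo INTEGER, bar TEXT, baz INTEGER DEFAULT 0)")

# Columns without a declared default become NULL; baz takes its default.
conn.execute("INSERT INTO foo DEFAULT VALUES")
row = conn.execute("SELECT foo, bar, baz FROM foo").fetchone()
```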

Passing an empty list to `insert` inserts zero rows (rather than one);
this command does nothing.

    dt.insert([])

You can pass zero rows or empty rows to `DumpTruck.insert`, but you'll get an
error if you try passing them to `DumpTruck.create_table`.

### Retrieving

You can use normal SQL to retrieve data from the database.

    data = dt.execute('SELECT * FROM `diesel-engineers`')

The data come back as a list of dictionaries, one dictionary
per row. They are coerced to different Python types depending
on their database types.
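
To show the shape of such a result, this sketch builds one dictionary per row with a `row_factory` on a plain `sqlite3` connection. It is an illustration under assumptions, not DumpTruck's internals; the columns and values are invented.

```python
import sqlite3

def dict_factory(cursor, row):
    # Pair each column name from the cursor description with its value.
    return {column[0]: value for column, value in zip(cursor.description, row)}

conn = sqlite3.connect(":memory:")
conn.row_factory = dict_factory
conn.execute('CREATE TABLE "diesel-engineers" (firstname TEXT, engines INTEGER, rating REAL)')
conn.execute('INSERT INTO "diesel-engineers" VALUES (?, ?, ?)', ("Thomas", 3, 4.5))

data = conn.execute('SELECT * FROM "diesel-engineers"').fetchall()
```

TEXT, INTEGER and REAL columns come back as `str`, `int` and `float` respectively.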

### Individual values
It's often useful to be able to quickly and easily save one metadata value.
For example, you can record which page the last run of a script managed to get up to.

    dt.save_var('last_page', 27)
    27 == dt.get_var('last_page')

It's stored in a table that you can specify when initializing DumpTruck.
If you don't specify one, it's stored in `_dumptruckvars`.

If you want to save anything other than an int, float or string type,
use json or pickle.
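
A key-value table like this is simple to picture in SQL. The sketch below imitates the behavior with the standard `sqlite3` module; `sketch_save_var` and `sketch_get_var` are hypothetical stand-ins, and the column names `name` and `value` are assumptions rather than DumpTruck's actual schema.

```python
import sqlite3

def sketch_save_var(conn, name, value, table="_dumptruckvars"):
    # Hypothetical stand-in for save_var: one row per variable name.
    conn.execute('CREATE TABLE IF NOT EXISTS "{}" '
                 '(name TEXT PRIMARY KEY, value)'.format(table))
    conn.execute('INSERT OR REPLACE INTO "{}" (name, value) '
                 'VALUES (?, ?)'.format(table), (name, value))
    conn.commit()

def sketch_get_var(conn, name, table="_dumptruckvars"):
    # Hypothetical stand-in for get_var: raise KeyError if the name is unknown.
    row = conn.execute('SELECT value FROM "{}" WHERE name = ?'.format(table),
                       (name,)).fetchone()
    if row is None:
        raise KeyError(name)
    return row[0]

conn = sqlite3.connect(":memory:")
sketch_save_var(conn, "last_page", 27)
first_read = sketch_get_var(conn, "last_page")

# Saving again under the same name overwrites the earlier value.
sketch_save_var(conn, "last_page", 28)
second_read = sketch_get_var(conn, "last_page")
```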

### Helpers
DumpTruck provides specialized wrappers around some common commands.

`DumpTruck.tables` returns a set of all of the tables in the database.

    dt.tables()

`DumpTruck.drop` drops a table.

    dt.drop("diesel-engineers")

`DumpTruck.dump` returns the entire contents of a particular table as a list of dictionaries.

    dt.dump("coal")

It's equivalent to running this:

    dt.execute('SELECT * from `coal`;')
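
For orientation, these are plain-SQLite counterparts of the three helpers, written against the standard `sqlite3` module. Reading table names from `sqlite_master` is one way to list tables; that DumpTruck does the same is an assumption, and the sample row is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE coal (mine TEXT, tonnes INTEGER)")
conn.execute('CREATE TABLE "diesel-engineers" (firstname TEXT)')
conn.execute("INSERT INTO coal VALUES (?, ?)", ("example mine", 12))

def table_names(connection):
    # List the tables recorded in SQLite's schema table.
    query = "SELECT name FROM sqlite_master WHERE type = 'table'"
    return {row[0] for row in connection.execute(query)}

before_drop = table_names(conn)

# Counterpart of dropping a table.
conn.execute('DROP TABLE "diesel-engineers"')
after_drop = table_names(conn)

# Counterpart of dumping a whole table.
coal_rows = conn.execute("SELECT * FROM coal").fetchall()
```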

### Creating empty tables
When working with relational databases, one typically defines a schema
before populating the database. You can do this with the
`DumpTruck.create_table` method, which creates a table from an example row
without inserting that row.

For example, if the table `tools` does not exist, the following call will create the table
`tools` with the columns `toolName` and `weight`, with the types `TEXT` and `INTEGER`,
respectively, but will not insert the dictionary values ("jackhammer" and 58) into the table.

    dt.create_table({"toolName":"jackhammer", "weight": 58}, "tools")

If you are concerned about the order of the columns, pass an OrderedDict.

    from collections import OrderedDict
    dt.create_table(OrderedDict([("toolName", "jackhammer"), ("weight", 58)]), "tools")

The columns will be created in the specified order.
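
The idea of deriving a schema from an example row can be sketched with the standard library. The type mapping in `sqlite_type` and the helper `create_table_sketch` are hypothetical simplifications for illustration, not DumpTruck's actual rules.

```python
import sqlite3
from collections import OrderedDict

def sqlite_type(value):
    # Simplified, assumed mapping from Python values to SQLite column types.
    if isinstance(value, int):
        return "INTEGER"
    if isinstance(value, float):
        return "REAL"
    return "TEXT"

def create_table_sketch(conn, sample_row, table):
    # Hypothetical helper: build columns from the sample row, insert nothing.
    columns = ", ".join('"{}" {}'.format(name, sqlite_type(value))
                        for name, value in sample_row.items())
    conn.execute('CREATE TABLE IF NOT EXISTS "{}" ({})'.format(table, columns))

conn = sqlite3.connect(":memory:")
sample = OrderedDict([("toolName", "jackhammer"), ("weight", 58)])
create_table_sketch(conn, sample, "tools")

schema = [(info[1], info[2]) for info in conn.execute("PRAGMA table_info(tools)")]
row_count = conn.execute("SELECT count(*) FROM tools").fetchone()[0]
```

The table exists with both columns in the given order, and it contains no rows.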

### Indices

#### Creating
DumpTruck contains a special method for creating indices. To create an index,
first create an empty table. (See "Creating empty tables" above.)
Then, use the `DumpTruck.create_index` method.

    dt.create_index(['toolName'], 'tools')

This will create a non-unique index on the column `toolName`. To create a unique
index, use the keyword argument `unique = True`.

    dt.create_index(['toolName'], 'tools', unique = True)

You can also specify multi-column indices.

    dt.create_index(['toolName', 'weight'], 'tools')

DumpTruck names these indices according to the names of the relevant table and columns.
The index created in the previous example might be named `dt__tools_toolName_weight`.

#### Other index manipulation
DumpTruck does not implement special methods for viewing or removing indices, but here
are the relevant SQLite SQL commands.

The following command lists indices on the `tools` table.

    dt.execute('PRAGMA index_list(tools)')

The following command gives more information about the index named `dt__tools_toolName_weight`.

    dt.execute('PRAGMA index_info(dt__tools_toolName_weight)')

And this one deletes the index.

    dt.execute('DROP INDEX dt__tools_toolName_weight')
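
The same three commands can be tried on a plain `sqlite3` connection. In this sketch the index is created by hand under the name used in the text; the real name DumpTruck picks may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tools (toolName TEXT, weight INTEGER)")
conn.execute("CREATE INDEX dt__tools_toolName_weight ON tools (toolName, weight)")

# index_list rows are (seq, name, unique, ...); take the index names.
index_names = [row[1] for row in conn.execute("PRAGMA index_list(tools)")]

# index_info rows are (seqno, cid, name); take the indexed column names.
indexed_columns = [row[2] for row in
                   conn.execute("PRAGMA index_info(dt__tools_toolName_weight)")]

conn.execute("DROP INDEX dt__tools_toolName_weight")
after_drop = conn.execute("PRAGMA index_list(tools)").fetchall()
```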

For more information on indices and, particularly, the `PRAGMA` commands, check
the SQLite documentation.

### Delaying commits
By default, the `insert`, `save_var`, `drop` and `execute` methods automatically commit changes.
You can stop one of them from committing by passing `commit=False` to the method.
Commit manually with the `commit` method. For example:

    dt = DumpTruck()
    dt.insert({"name":"Bagger 293","manufacturer":"TAKRAF","height":95}, commit=False)
    dt.save_var('page_number', 42, commit=False)
    dt.commit()
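
What delaying a commit means can be observed with two plain `sqlite3` connections to the same file: a second connection does not see a row until the first one commits. This is a sketch of the underlying SQLite behavior, not of DumpTruck itself; the table name `excavators` is an assumption.

```python
import os
import sqlite3
import tempfile

def count_rows(connection):
    # fetchall() exhausts the cursor so no read lock is left behind.
    return connection.execute("SELECT count(*) FROM excavators").fetchall()[0][0]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "dumptruck.db")
    writer = sqlite3.connect(path)
    reader = sqlite3.connect(path)

    writer.execute("CREATE TABLE excavators (name TEXT, height INTEGER)")
    writer.commit()

    # The insert opens a transaction that stays open until commit().
    writer.execute("INSERT INTO excavators VALUES (?, ?)", ("Bagger 293", 95))
    before_commit = count_rows(reader)

    writer.commit()
    after_commit = count_rows(reader)

    reader.close()
    writer.close()
```

Grouping several writes before one commit also keeps them in a single transaction.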