Skip to content

Commit

Permalink
Documented insert_all() and upsert() and upsert_all()
Browse files Browse the repository at this point in the history
Also re-titled main docs page to 'Python API'
  • Loading branch information
Simon Willison committed Aug 13, 2018
1 parent 9eacd30 commit 34a5c0e
Show file tree
Hide file tree
Showing 2 changed files with 54 additions and 102 deletions.
104 changes: 5 additions & 99 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,112 +4,18 @@

*Python utility functions for manipulating SQLite databases*

This library aims to make creating a SQLite database from a collection of data as easy as possible.
This library helps create SQLite databases from an existing collection of data.

It is not intended to be a full ORM: the focus is utility helpers to make creating the initial database and populating it with data as productive as possible.

It is designed as a useful complement to `Datasette <https://github.com/simonw/datasette>`_.

Contents
--------

.. toctree::
:maxdepth: 2

table

While the full documentation is being constructed, enjoy an example:

.. code-block:: python
from sqlite_utils import Database
import sqlite3
import requests
import hashlib
import json
raw_ads = requests.get(
"https://raw.githubusercontent.com/edsu/irads/master/ads.json"
).json()
print(raw_ads[0])
# {'clicks': 32,
# 'created': '2016-11-14T04:10:27-08:00',
# 'ended': None,
# 'file': 'data/2016-11/P(1)0001720.pdf',
# 'id': 3186,
# 'impressions': 396,
# 'spend': {'amount': '1050.77', 'currency': 'RUB'},
# 'targeting': {'age': ['18 - 65+'],
# 'excluded_connections': ['Exclude people who like Black guns matter'],
# 'language': ['English (UK)', 'English (US)'],
# 'location': ['United States'],
# 'people_who_match': {'interests': ['Martin Luther King',
# 'Jr.',
# '2nd Amendment',
# 'National Rifle Association',
# 'African-American culture',
# 'African-American Civil Rights Movement (1954—68)',
# 'Gun Owners of America',
# 'African—American history',
# 'Second Amendment to the United States Constitution',
# 'Concealed carry in the United States',
# 'Firearm',
# 'Malcolm X']},
# 'placements': ['News Feed on desktop computers',
# 'News Feed on mobile devices',
# 'Right column on desktop computers']},
# 'text': 'Black American racial experience is real. We support the 2nd ammendment\nfor our safety.\n\n',
# 'url': 'https://www.facebook.com/ProtectBIackGunOwners/'}
def flatten_targeting(targeting, prefix=''):
# Convert targeting nested dictionary into list of strings
# e.g. people_who_match:interests:Martin Luther King
if isinstance(targeting, list) and all(isinstance(s, str) for s in targeting):
return ["{}:{}".format(prefix, item) for item in targeting]
elif isinstance(targeting, str):
return ["{}:{}".format(prefix, targeting)]
elif isinstance(targeting, dict):
items = []
for key, value in targeting.items():
new_prefix = "{}:{}".format(prefix, key) if prefix else key
items.extend(flatten_targeting(value, new_prefix))
return items
def hash_id(s):
return hashlib.md5(s.encode("utf8")).hexdigest()[:5]
database = Database(sqlite3.connect("/tmp/ads3.db"))
ads = database["ads"]
targets = database["targets"]
ad_targets = database["ad_targets"]
python-api

for ad in raw_ads:
ad_id = int(ad["file"].split(')')[-1].split(".")[0])
record = {
"id": ad_id,
"file": ad["file"],
"clicks": ad["clicks"],
"impressions": ad["impressions"],
"text": ad["text"],
"url": (ad["url"] or "").replace("httpszll", "https://"),
"spend_amount": ad["spend"]["amount"],
"spend_currency": ad["spend"]["currency"] or "USD",
"created": ad["created"],
"ended": ad["ended"],
}
ads.upsert(record, pk="id")
for target in flatten_targeting(ad["targeting"]):
target_id = hash_id(target)
targets.upsert({
"id": target_id,
"name": target,
"category": target.split(":")[0],
"prefix": target.rsplit(":", 1)[0]},
pk="id"
)
ad_targets.insert({
"target_id": target_id,
"ad_id": ad_id,
}, foreign_keys=(
("ad_id", "INTEGER", "ads", "id"),
("target_id", "TEXT", "targets", "id"),
))
Take a look at `this script <https://github.com/simonw/russian-ira-facebook-ads-datasette/blob/master/fetch_and_build_russian_ads.py>`_ for an example of this library in action.
52 changes: 49 additions & 3 deletions docs/table.rst → docs/python-api.rst
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
======================
Databases and Tables
======================
============
Python API
============

Database objects are constructed by passing in a SQLite3 database connection:

Expand Down Expand Up @@ -90,6 +90,52 @@ If you want to explicitly set the order of the columns you can do so using the `
You don't need to pass all of the columns to the ``column_order`` parameter. If you only pass a subset of the columns the remaining columns will be ordered based on the key order of the dictionary.

Bulk inserts
============

If you have more than one record to insert, the ``insert_all()`` method is a much more efficient way of inserting them. Just like ``insert()`` it will automatically detect the columns that should be created, but it will inspect the first 100 items to help decide what those column types should be.

Use it like this:

.. code-block:: python
dogs.insert_all([{
"id": 1,
"name": "Cleo",
"twitter": "cleopaws",
"age": 3,
"is_good_dog": True,
}, {
"id": 2,
"name": "Marnie",
"twitter": "MarnieTheDog",
"age": 16,
"is_good_dog": True,
}], pk="id", column_order=("id", "twitter", "name"))
Upserting data
==============

Upserting allows you to insert records if they do not exist and update them if they DO exist, based on matching against their primary key.

For example, given the dogs database you could upsert the record for Cleo like so:

.. code-block:: python
dogs.upsert([{
"id": 1,
"name": "Cleo",
"twitter": "cleopaws",
"age": 4,
"is_good_dog": True,
}, pk="id", column_order=("id", "twitter", "name"))
If a record exists with id=1, it will be updated to match those fields. If it does not exist it will be created.
Note that the ``pk`` and ``column_order`` parameters here are optional if you are certain that the table has already been created. You should pass them if the table may not exist at the time the first upsert is performed.
An ``upsert_all()`` method is also available, which behaves like ``insert_all()`` but performs upserts instead.
Creating views
==============
Expand Down

0 comments on commit 34a5c0e

Please sign in to comment.