# django-postgres-copy speed tests

By Ben Welsh

This notebook tests the effect of dropping database constraints and indexes prior to loading a large data file.

The official PostgreSQL documentation suggests that doing so can lead to significant speed gains.

We will test this claim by loading data from the California Civic Data Coalition, with and without dropping constraints and indexes first, via django-postgres-copy, a wrapper around the database's COPY command.

### Connect to the California Civic Data Coalition Django project

Import Python tools

In [1]:
import os
import sys

In [2]:
import warnings
warnings.simplefilter("ignore")

Add the project code to the Python path and add the Django settings module to the environment.

In [3]:
sys.path.insert(0, '/home/palewire/.virtualenvs/django-calaccess-raw-data/src/')
sys.path.insert(0, '/home/palewire/.virtualenvs/django-calaccess-raw-data/lib/python2.7/')
sys.path.insert(0, '/home/palewire/.virtualenvs/django-calaccess-raw-data/lib/python2.7/site-packages/')
sys.path.insert(0, '/home/palewire/Code/django-calaccess-raw-data/')
sys.path.insert(0, '/home/palewire/Code/django-calaccess-raw-data/example/')
sys.path.insert(0, '/home/palewire/Code/django-postgres-copy/')
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "settings")

'settings'

Verify we have the correct version of django-postgres-copy

In [4]:
import postgres_copy

In [5]:
postgres_copy.__version__

'2.1.0'

Boot the Django project

In [6]:
%%capture
import django
django.setup()

### Prep for speed tests

Import database models

In [7]:
from calaccess_raw import models 

Prep a few functions we'll use during loading.

In [8]:
def model_mapping(model):
    """Map each model field name to its CSV header, skipping fields without a db_column."""
    return dict(
        (f.name, f.db_column) for f in model._meta.fields
        if f.db_column
    )
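
To make the shape of that mapping concrete, here is a minimal sketch that runs `model_mapping` against stand-in objects instead of a real Django model. The `Field`, `FakeMeta` and `FakeModel` names and the column names are hypothetical, invented only for illustration; the real models come from `calaccess_raw`.

```python
from collections import namedtuple

# Hypothetical stand-ins for Django field and model objects, used only
# to show the shape of the mapping without needing a database.
Field = namedtuple("Field", ["name", "db_column"])


class FakeMeta(object):
    fields = [
        Field("id", None),  # no explicit db_column, so it is skipped
        Field("filing_id", "FILING_ID"),
        Field("amount", "AMOUNT"),
    ]


class FakeModel(object):
    _meta = FakeMeta


def model_mapping(model):
    # Same logic as the notebook cell above.
    return dict(
        (f.name, f.db_column) for f in model._meta.fields
        if f.db_column
    )


mapping = model_mapping(FakeModel)
print(mapping)
```

Only fields that declare a `db_column` end up in the dictionary, keyed by model field name with the CSV header as the value.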

In [9]:
def truncate(model):
    """Empty the model's table so each timed load starts from zero rows."""
    from django.db import connection
    cursor = connection.cursor()
    cursor.execute('TRUNCATE TABLE "{}";'.format(model._meta.db_table))

In [10]:
def speed_test(model, csv, drop=True):
    """Load the CSV into the model's table, optionally dropping constraints and indexes first."""
    mapping = model_mapping(model)
    model.objects.from_csv(csv, mapping, drop_constraints=drop, drop_indexes=drop)

### Test a small file

Show how many rows are in the file we'll be loading

In [11]:
!wc -l ./cvr_registration_cd.csv

52025 ./cvr_registration_cd.csv


Test how long it takes to load data when the constraints and indexes stay in place

In [12]:
%%timeit -r 3 truncate(models.CvrRegistrationCd)
speed_test(models.CvrRegistrationCd, "./cvr_registration_cd.csv", drop=False)

1 loop, best of 3: 3.13 s per loop


Test how long it takes to load data when constraints and indexes are dropped

In [13]:
%%timeit -r 3 truncate(models.CvrRegistrationCd)
speed_test(models.CvrRegistrationCd, "./cvr_registration_cd.csv", drop=True)

1 loop, best of 3: 2.61 s per loop
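
The timing cells above put `truncate(...)` on the `%%timeit` line so that it runs as untimed setup and only the load itself is measured. A rough plain-Python analogue of that pattern is sketched below. The `best_of` helper is a hypothetical name and this is not IPython's actual implementation; the lambdas stand in for the real truncate and load calls.

```python
import time


def best_of(stmt, setup, repeat=3):
    """Return the fastest of `repeat` timed runs of stmt().

    setup() runs before every timed run but is excluded from the timing.
    """
    timings = []
    for _ in range(repeat):
        setup()  # e.g. truncate the table; not timed
        start = time.perf_counter()
        stmt()  # e.g. load the CSV; timed
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in callables that record the order in which they are called.
calls = []
best = best_of(
    stmt=lambda: calls.append("load"),
    setup=lambda: calls.append("truncate"),
)
print(calls)
```

Reporting the minimum rather than the mean follows the "best of 3" convention in the outputs above, on the assumption that the fastest run is the one least disturbed by other activity on the machine.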


### Test a large file

In [14]:
!wc -l ./rcpt_cd.csv

10342161 ./rcpt_cd.csv


In [15]:
%%timeit -r 3 truncate(models.RcptCd)
speed_test(models.RcptCd, './rcpt_cd.csv', drop=False)

1 loop, best of 3: 7min 32s per loop


In [17]:
%%timeit -r 3 truncate(models.RcptCd)
speed_test(models.RcptCd, './rcpt_cd.csv', drop=True)

1 loop, best of 3: 5min 34s per loop
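
To compare the two files on the same scale, the best-of-3 timings reported above can be turned into a share of load time saved. This is a small sketch of that arithmetic; `percent_time_saved` is a hypothetical helper name, and the inputs are the figures printed by the timing cells.

```python
def percent_time_saved(baseline_seconds, dropped_seconds):
    """Share of the baseline load time saved by dropping constraints and indexes."""
    return 100.0 * (baseline_seconds - dropped_seconds) / baseline_seconds


# Best-of-3 timings from the cells above, converted to seconds.
small = percent_time_saved(3.13, 2.61)
large = percent_time_saved(7 * 60 + 32, 5 * 60 + 34)

print("small file: {:.1f}% less time".format(small))  # small file: 16.6% less time
print("large file: {:.1f}% less time".format(large))  # large file: 26.1% less time
```

On these single-machine runs, dropping constraints and indexes saved a larger share of time on the 10-million-row file than on the 52,000-row file.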
