Command to generate a test database, part 1 #1111

rowanseymour · 2017-02-20T07:41:02Z

So far this only generates non-message data: locations, orgs, users, groups, fields, labels, test contacts, contacts, contact group memberships and contact field values. Figure I'll add messages when I get onto improving message search performance.

The default settings generate a database with 100 orgs and 1,000,000 contacts. This takes about an hour and the resultant database is ~3.1GB.

The PR includes a ~7MB dump of the AdminBoundary table in Postgres's compressed custom format, taken after loading the test-data/nigeria.zip geojson file. Loading the geojson file takes half an hour, but the dump loads in seconds.
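A rough sketch of producing and loading such a single-table dump from Python by shelling out to the standard `pg_dump`/`pg_restore` client tools. The table name `locations_adminboundary` is an assumption (Django's default naming for an `AdminBoundary` model in a `locations` app), as are any database names and dump paths; the PR's own command may invoke these tools differently.

```python
import subprocess
from typing import List

# Assumption: Django's default table name for an AdminBoundary model in a "locations" app.
ADMIN_BOUNDARY_TABLE = "locations_adminboundary"


def dump_command(db_name: str, dump_path: str) -> List[str]:
    """pg_dump invocation writing a single table to a custom-format archive."""
    return [
        "pg_dump",
        "--format=custom",  # compressed archive, readable only by pg_restore
        f"--table={ADMIN_BOUNDARY_TABLE}",
        f"--file={dump_path}",
        db_name,
    ]


def restore_command(db_name: str, dump_path: str) -> List[str]:
    """pg_restore invocation loading the archived rows into an existing table."""
    return [
        "pg_restore",
        "--data-only",  # assumes migrations have already created the table
        f"--dbname={db_name}",
        dump_path,
    ]


def run(cmd: List[str]) -> None:
    """Run a Postgres client tool, raising CalledProcessError on failure."""
    subprocess.run(cmd, check=True)
```

For example, `run(restore_command("temba_test", "admin_boundaries.dump"))` would load the rows into a hypothetical `temba_test` database. Dropping `--data-only` would let the archive recreate the table definition as well.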

rowanseymour · 2017-02-20T07:46:07Z

temba/utils/management/commands/make_test_db.py

+                    c_id = base_contact_id + c
+
+                    # ensure every org gets at least one contact
+                    org = orgs[c] if c < len(orgs) else self.random_org(orgs)


This is important to ensure that contacts for different orgs aren't stored sequentially, which wouldn't resemble a real-world database and would affect query performance if an org's contacts were all bunched at the beginning or end of a table's data.
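The allocation in this hunk can be sketched with two hypothetical helpers, `assign_orgs` and `longest_run`. Unlike the command's `self.random_org`, which biases toward the first orgs, this sketch picks orgs uniformly for simplicity; it only illustrates that after the first pass consecutive contacts land in different orgs instead of forming long per-org runs.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def assign_orgs(orgs: Sequence[T], num_contacts: int, rng: random.Random) -> List[T]:
    """Return the org for each contact, in creation (and so storage) order."""
    assigned: List[T] = []
    for c in range(num_contacts):
        if c < len(orgs):
            # the first len(orgs) contacts go one to each org, so every org gets at least one
            assigned.append(orgs[c])
        else:
            # simplification: uniform pick, whereas the real command biases toward the first orgs
            assigned.append(rng.choice(orgs))
    return assigned


def longest_run(assigned: Sequence[T]) -> int:
    """Length of the longest stretch of consecutive contacts belonging to one org."""
    best = current = 0
    previous: object = object()  # sentinel that never equals a real org
    for org in assigned:
        current = current + 1 if org == previous else 1
        best = max(best, current)
        previous = org
    return best
```

With sequential allocation instead, `longest_run` would be roughly `num_contacts / len(orgs)`, which is the bunching the comment warns about.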


nicpottier · 2017-02-20T14:15:27Z

Is this 1,000,000 contacts across 100 orgs, or does each org have 1,000,000 contacts? Seems like the former is probably overkill. Our scaling issues tend to center on single large orgs, so I wonder if we could cut down on build time by just building 10 orgs instead of 100.

nicpottier · 2017-02-20T14:16:59Z

temba/utils/management/commands/make_test_db.py

+        # We want a variety of large and small orgs so when allocating content like contacts and messages, we apply a
+        # bias toward the beginning orgs. if there are N orgs, then the amount of content the first org will be
+        # allocated is (1/N) ^ (1/bias). This sets the bias so that the first org will get ~50% of the content:
+        self.org_bias = math.log(1.0 / num_orgs, 0.5)


Guess I could have just read the code. :)
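As a quick check of the comment in the diff, substituting the chosen bias $b$ into $(1/N)^{1/b}$ for $N \ge 2$ orgs gives exactly one half, independent of $N$:

```latex
\begin{aligned}
b &= \log_{0.5}\!\left(\frac{1}{N}\right) = \frac{\ln(1/N)}{\ln 0.5} = \frac{-\ln N}{-\ln 2} = \log_2 N \\
\left(\frac{1}{N}\right)^{1/b} &= \exp\!\left(\frac{-\ln N}{\log_2 N}\right)
  = \exp\!\left(\frac{-\ln N \cdot \ln 2}{\ln N}\right) = e^{-\ln 2} = \frac{1}{2}
\end{aligned}
```

For the default 100 orgs this gives $b = \log_2 100 \approx 6.64$. With a single org $b = 0$ and the exponent $1/b$ is undefined.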

rowanseymour · 2017-02-20T14:18:09Z

It's 1,000,000 contacts split across 100 orgs. Org generation doesn't take much time, so there's relatively little difference between 10 and 100 orgs. Regardless of how many orgs there are, the first org always gets ~50% of the total contacts.
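The diff shows how the bias is computed but not how `random_org` consumes it. One sampler consistent with the stated `(1/N) ** (1/bias)` share is to pick index `floor(N * u ** bias)` for `u` uniform in [0, 1). The sketch below assumes that rule (the `biased_choice` and `first_org_share` names are hypothetical) and checks empirically that the first org's share stays near 50% for different org counts.

```python
import math
import random
from collections import Counter
from typing import Sequence, TypeVar

T = TypeVar("T")


def org_bias(num_orgs: int) -> float:
    """Bias from the diff: the first org's expected share is (1/N) ** (1/bias) == 0.5."""
    if num_orgs < 2:
        raise ValueError("need at least 2 orgs (bias is 0 for a single org)")
    return math.log(1.0 / num_orgs, 0.5)


def biased_choice(seq: Sequence[T], bias: float, rng: random.Random) -> T:
    """Hypothetical sampler: pick index floor(N * u ** bias) with u uniform in [0, 1).

    With bias > 1, u ** bias is pushed toward 0, so earlier items are favoured and
    P(index == 0) = P(u < (1/N) ** (1/bias)) = (1/N) ** (1/bias).
    """
    return seq[int(len(seq) * rng.random() ** bias)]


def first_org_share(num_orgs: int, samples: int, seed: int = 0) -> float:
    """Fraction of samples allocated to the first org."""
    rng = random.Random(seed)
    orgs = list(range(num_orgs))
    bias = org_bias(num_orgs)
    counts = Counter(biased_choice(orgs, bias, rng) for _ in range(samples))
    return counts[0] / samples
```

Both `first_org_share(10, 100_000)` and `first_org_share(100, 100_000)` should come out close to 0.5, matching the claim that the org count doesn't change the first org's share.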

One thing I haven't tried yet is dropping all indexes and recreating them afterwards, but I'm trying not to over-engineer things at this point.
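If dropping and recreating indexes is ever tried, one hedged way to capture them first is to read their definitions back from Postgres's `pg_indexes` view. The query text, the `public` schema and the `index_statements` helper below are assumptions for illustration, not part of the PR.

```python
from typing import Iterable, List, Tuple

# Assumed query: a table's indexes from the pg_indexes view, skipping those backing
# PRIMARY KEY / UNIQUE / EXCLUDE constraints (DROP INDEX refuses to drop them). This
# relies on such indexes sharing their constraint's name, which is the usual case.
TABLE_INDEXES_SQL = """
    SELECT indexname, indexdef FROM pg_indexes
    WHERE schemaname = 'public' AND tablename = %s
      AND indexname NOT IN (
        SELECT conname FROM pg_constraint WHERE contype IN ('p', 'u', 'x'))
"""


def index_statements(rows: Iterable[Tuple[str, str]]) -> Tuple[List[str], List[str]]:
    """Turn (indexname, indexdef) rows into drop-before and recreate-after SQL."""
    drops: List[str] = []
    creates: List[str] = []
    for name, definition in rows:
        drops.append(f'DROP INDEX IF EXISTS "{name}";')
        # pg_indexes.indexdef already holds a complete CREATE INDEX command
        creates.append(f"{definition};")
    return drops, creates
```

With a Django cursor this might look like `cursor.execute(TABLE_INDEXES_SQL, ["contacts_contact"])` (a hypothetical table) followed by `index_statements(cursor.fetchall())`, running the drops before the bulk inserts and the creates once they finish.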

rowanseymour · 2017-02-20T14:20:53Z

A sample of a first org in a database with total 1,000,000 contacts:

nicpottier

I'd say a few hours is a good target build time for these, so agreed, let's keep it simple as long as we can stay under that. Looks good!

ericnewcomer

👍

rowanseymour added 12 commits February 16, 2017 15:55

Command to create test database (WIP)

e810333

Code to generate contacts, urns, values (WIP)

ecaef96

Load locations from dump and create random location values for contacts

012fcf8

Fix unit test

95b30e2

Calculate bias so that first org always gets 50% of content

7e4eeb8

Create test contacts and add timing of whole create process

8954e9c

Disable group count trigger whilst creating contacts

deb07e2

Make batch size 5000 and make largest orgs have largest topups

6ea7a84

Create contact group counts for system groups too

9f57b8e

Disable table triggers on all tables used during contact creation

04da805

Fix unit test by not making assumptions about ids starting from 1

c548afb

Merge branch 'master' into make_test_db

316ec0b
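Two of the commits above disable triggers during contact creation. A minimal sketch of that pattern as a context manager, assuming a DB-API style cursor; the table names are hypothetical and the `RecordingCursor` stand-in exists only so the sketch runs without a database. It uses `DISABLE TRIGGER USER`, which leaves Postgres's internal foreign-key triggers on and only needs table ownership; the PR may have disabled triggers differently.

```python
from contextlib import contextmanager
from typing import Iterator, List, Sequence


@contextmanager
def triggers_disabled(cursor, tables: Sequence[str]) -> Iterator[None]:
    """Disable user-defined triggers on the given tables for the duration of the block."""
    for table in tables:
        cursor.execute(f'ALTER TABLE "{table}" DISABLE TRIGGER USER')
    try:
        yield
    finally:
        # re-enable even if the bulk insert raised
        for table in tables:
            cursor.execute(f'ALTER TABLE "{table}" ENABLE TRIGGER USER')


class RecordingCursor:
    """Stand-in for a DB-API cursor that just records SQL, for illustration."""

    def __init__(self) -> None:
        self.statements: List[str] = []

    def execute(self, sql: str) -> None:
        self.statements.append(sql)
```

Whatever those triggers would normally maintain (for example, group counts) has to be computed separately once the block exits.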

rowanseymour self-assigned this Feb 20, 2017

rowanseymour added the review label Feb 20, 2017

rowanseymour commented Feb 20, 2017


Fix contacts not being put in system groups and patch uuid4 so objects get same uuids for same seed

0c423a4
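Patching `uuid4` for reproducibility can be sketched with the standard library alone; `seeded_uuid4_factory` and `demo` are hypothetical names, not the PR's code. Note that `mock.patch("uuid.uuid4", ...)` only affects callers that look the function up as `uuid.uuid4` at call time; modules that did `from uuid import uuid4` would need patching at their own import path.

```python
import random
import uuid
from typing import Callable, List
from unittest import mock


def seeded_uuid4_factory(seed: int) -> Callable[[], uuid.UUID]:
    """Return a uuid4 replacement that yields the same sequence for the same seed."""
    rng = random.Random(seed)

    def fake_uuid4() -> uuid.UUID:
        # version=4 sets the version and variant bits, so the result is a valid v4 UUID
        return uuid.UUID(int=rng.getrandbits(128), version=4)

    return fake_uuid4


def demo(seed: int) -> List[str]:
    """Generate a few UUIDs while uuid.uuid4 is patched with the seeded replacement."""
    with mock.patch("uuid.uuid4", seeded_uuid4_factory(seed)):
        return [str(uuid.uuid4()) for _ in range(3)]
```

Calling `demo(5)` twice gives identical lists, while the real `uuid.uuid4` is restored once the `with` block exits.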

rowanseymour requested review from nicpottier, ericnewcomer and norkans7 February 20, 2017 13:03

nicpottier reviewed Feb 20, 2017


nicpottier approved these changes Feb 20, 2017


ericnewcomer approved these changes Feb 20, 2017


rowanseymour merged commit e5c2ba3 into master Feb 20, 2017

rowanseymour deleted the make_test_db branch February 20, 2017 15:10

rowanseymour removed the review label Feb 20, 2017
