Cost of each mutation grows as more mutations are in a transaction #3046

@mooncake4132

I originally asked this on Slack, but it might be more useful to track it as an issue.

Every few days, our application needs to insert up to 3 million predicates (a number that may grow) into the database. To assess Dgraph's performance, I wrote the small Python script below to benchmark how long it takes to insert 1000, 10000, 30000, 50000, and 100000 predicates (one N-Quad each), with each batch sent as a single mutation. The results are as follows:

Updated schema in 1.824007272720337 seconds.
Mutating 1000 N-Quads took 0.0899970531463623 seconds.
Mutating 10000 N-Quads took 1.6726512908935547 seconds.
Mutating 30000 N-Quads took 11.846931219100952 seconds.
Mutating 50000 N-Quads took 27.030992031097412 seconds.
Mutating 100000 N-Quads took 111.02126455307007 seconds.

The growth in time is worrying. Why does inserting 100 thousand predicates take about 66x as long as inserting 10 thousand predicates, when the input is only 10x larger?
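To put a number on that growth, here is a minimal sketch that estimates the scaling exponent between consecutive batch sizes from the timings listed above. The helper name `scaling_exponents` is made up for illustration. An exponent of 1.0 would mean the time grows linearly with the number of N-Quads.

```python
import math

# Timings copied from the benchmark output above (with the fulltext index).
TIMINGS_WITH_INDEX = {
    1_000: 0.0899970531463623,
    10_000: 1.6726512908935547,
    30_000: 11.846931219100952,
    50_000: 27.030992031097412,
    100_000: 111.02126455307007,
}


def scaling_exponents(timings):
    """Return (smaller_n, larger_n, exponent) for each pair of consecutive sizes.

    The exponent k is the slope on a log-log plot, i.e. time ~ n ** k
    between the two measurements.
    """
    sizes = sorted(timings)
    result = []
    for small, large in zip(sizes, sizes[1:]):
        time_ratio = timings[large] / timings[small]
        size_ratio = large / small
        result.append((small, large, math.log(time_ratio) / math.log(size_ratio)))
    return result


if __name__ == '__main__':
    for small, large, exponent in scaling_exponents(TIMINGS_WITH_INDEX):
        print('{} -> {} N-Quads: exponent {:.2f}'.format(small, large, exponent))
```

With these numbers, every step comes out above 1.0, so the cost per N-Quad rises as the mutation gets larger.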

Here's the script:

#!/usr/bin/env python3
import time

import pydgraph


client_stub = pydgraph.DgraphClientStub('localhost:9080')
client = pydgraph.DgraphClient(client_stub)
client.alter(pydgraph.Operation(drop_all=True))

schema = """
test: string @index(fulltext) @lang .
"""
start_time = time.time()
client.alter(pydgraph.Operation(schema=schema))
print('Updated schema in {} seconds.'.format(time.time() - start_time))

for n in (1_000, 10_000, 30_000, 50_000, 100_000):
    rdf = '\n'.join('<_:node_{}> <test> "test" .'.format(i) for i in range(n))
    transaction = client.txn()
    start_time = time.time()
    transaction.mutate(set_nquads=rdf, commit_now=True)
    print('Mutating {} N-Quads took {} seconds.'.format(n, time.time() - start_time))

Initially, I thought it was because of the fulltext index, so I also tried without @index(fulltext). Here are the results:

Updated schema in 0.004003763198852539 seconds.
Mutating 1000 N-Quads took 0.07899928092956543 seconds.
Mutating 10000 N-Quads took 1.236546277999878 seconds.
Mutating 30000 N-Quads took 7.040283203125 seconds.
Mutating 50000 N-Quads took 16.69643545150757 seconds.
Mutating 100000 N-Quads took 59.379029989242554 seconds.

It's slightly better, but the time growth is still worrying.
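For a side-by-side view of the two runs, this small sketch divides each indexed timing by the matching non-indexed timing. The numbers are copied from the two outputs above; the function name `index_overhead` is only illustrative.

```python
# Timings copied from the two benchmark outputs above.
WITH_INDEX = {
    1_000: 0.0899970531463623,
    10_000: 1.6726512908935547,
    30_000: 11.846931219100952,
    50_000: 27.030992031097412,
    100_000: 111.02126455307007,
}
WITHOUT_INDEX = {
    1_000: 0.07899928092956543,
    10_000: 1.236546277999878,
    30_000: 7.040283203125,
    50_000: 16.69643545150757,
    100_000: 59.379029989242554,
}


def index_overhead(with_index, without_index):
    """Map each batch size to (indexed time / non-indexed time)."""
    return {n: with_index[n] / without_index[n] for n in sorted(with_index)}


if __name__ == '__main__':
    for n, factor in index_overhead(WITH_INDEX, WITHOUT_INDEX).items():
        print('{} N-Quads: {:.2f}x slower with the index'.format(n, factor))
```

The index costs less than a factor of 2 at every size, while the non-indexed run alone still goes from about 1.2 seconds to about 59 seconds for a 10x larger input, so the index does not appear to be the main cause.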

Any guidance is appreciated.

Configurations:

  • Running in Docker on Windows.
  • One Zero and one Alpha.
    Dgraph version : v1.0.11
    Commit SHA-1 : b2a09c5
    Commit timestamp : 2018-12-17 09:50:56 -0800
    Branch : HEAD
    Go version : go1.11.1

    Labels

      • area/performance: Performance related issues.
      • kind/enhancement: Something could be better.
      • priority/P1: Serious issue that requires eventual attention (can wait a bit).
      • status/accepted: We accept to investigate/work on it.
      • status/needs-attention: This issue needs more eyes on it, more investigation might be required before accepting/rejecting it.