Filtering is slow on large amount of data #2713

@makitka2007

Dgraph 1.0.9

Because every @filter condition requires an index lookup, filtering is slow on large amounts of data.

Test data: 10 million entities, each with a "sex" predicate set randomly to "m" or "f".
Since filtering requires an index, indexes are created on both the entity_key and sex predicates:

entity_key: string @index(exact) .
sex: string @index(exact) .

The 10 million entities are loaded as follows, with $x standing for the entity number and $sex for "m" or "f":

_:node$x <entity_key> "entity$x" .
_:node$x <sex> "$sex" .
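
For anyone who wants to reproduce the data set, the template above can be expanded with a small script. This is only a sketch: the function name `generate_nquads`, the seed, and numbering entities from 0 are assumptions, not taken from the original load script.

```python
import random


def generate_nquads(count, seed=0):
    """Yield two N-Quad lines per entity, following the template above.

    Assumption: entities are numbered from 0; the original numbering
    scheme is not stated in the issue.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    for x in range(count):
        sex = rng.choice(["m", "f"])
        yield f'_:node{x} <entity_key> "entity{x}" .'
        yield f'_:node{x} <sex> "{sex}" .'


# Print a tiny sample; raise the count to build the full data set.
for line in generate_nquads(3):
    print(line)
```

Writing the yielded lines to a file gives input in the same shape as the two template lines above.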

A query that fetches one entity by key takes 1 ms:

{
	get_entity(func: eq(entity_key, "entity600000")) {
		uid
	}
}

Adding a filter on the "sex" predicate slows it down to 7 seconds:

{
	get_entity(func: eq(entity_key, "entity900000")) @filter(eq(sex, "f")) {
		uid
	}
}

This is because the filter loads all ~5 million entities with sex="f" into memory.

Filters should be improved so that they do not use the index when no index exists, or when a special directive says not to.
Filtering on an edge facet instead is fast, as expected (1 ms):

{
	get_entity(func: eq(entity_key, "entity800000")) @cascade {
		uid
		attrs @facets(eq(sex, "f"))
	}
}

So predicate filters should use the same logic as edge facet filters when no index is created on the predicate, or when a special directive says not to use the index.
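
To make the difference concrete, here is a toy model of the two strategies. It is not Dgraph's implementation: the in-memory dictionaries, the function names, and the "entries read" counter are all assumptions chosen to illustrate the behaviour described in this issue.

```python
import random


def build_store(count, seed=0):
    """Build a toy value store and exact index for the "sex" predicate."""
    rng = random.Random(seed)
    values = {"sex": {}}                       # uid -> stored value
    index = {"sex": {"m": set(), "f": set()}}  # value -> uids (posting list)
    for uid in range(count):
        sex = rng.choice(["m", "f"])
        values["sex"][uid] = sex
        index["sex"][sex].add(uid)
    return values, index


def filter_via_index(candidates, index, predicate, value):
    """Index-driven filter: read the whole posting list, then intersect.

    Returns (matching uids, number of entries read).
    """
    # Copying models reading the full posting list into memory.
    posting_list = set(index[predicate].get(value, ()))
    matches = [uid for uid in candidates if uid in posting_list]
    return matches, len(posting_list)


def filter_via_value(candidates, values, predicate, value):
    """Value-driven filter: read the predicate only for candidate uids.

    Returns (matching uids, number of entries read).
    """
    stored = values[predicate]
    matches = [uid for uid in candidates if stored.get(uid) == value]
    return matches, len(candidates)


values, index = build_store(100_000)
candidates = [90_000]  # the single uid returned by the root eq() function

by_index, read_index = filter_via_index(candidates, index, "sex", "f")
by_value, read_value = filter_via_value(candidates, values, "sex", "f")
print("same result:", by_index == by_value)
print("entries read via index:", read_index)
print("entries read via value:", read_value)
```

In this model the index-driven filter reads a number of entries proportional to the whole data set, while the value-driven filter reads one entry per candidate uid, which is the behaviour requested here for predicate filters.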

Labels: area/performance, area/querylang/filter, kind/enhancement, popular, priority/P1, status/accepted