
Sum over facets is incorrect #4160

@campoy


What version of Dgraph are you using?

master

Have you tried reproducing the issue with the latest release?

yes

What is the hardware spec (RAM, OS)?

n/a

Steps to reproduce the issue (command/config used to run Dgraph).

Given the dataset generated by this mutation:

{
  set {
    _:a <name> "Anne" .
    _:b <name> "Brian" .
    
    _:jp <name> "Jurassic Park" .
    _:ij <name> "Indiana Jones" .
    
    _:a <rated> _:jp (rating=5) .
    _:a <rated> _:ij (rating=2) .
    _:b <rated> _:ij (rating=2) .
  }
}

If you run the following request:

{
  q(func: has(rated)) {
    name
    rated @facets(r as rating)
    partial_sum: sum(val(r))
  }
      
  sum() {
    total_sum: sum(val(r))
  }
}

Expected behaviour and actual result.

I'd expect partial_sum to be 7 for Anne (5 + 2) and 2 for Brian, so total_sum would be 9.
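As a quick check of those expected numbers, here is a toy Python model. It is not Dgraph code: it stores each <rated> edge as a (subject, object, rating) tuple, uses the node names as hypothetical stand-ins for the blank nodes, and sums each person's own ratings.

```python
# Toy model of the dataset: one (subject, object, rating) tuple per <rated> edge.
# Names are hypothetical stand-ins for the blank nodes _:a, _:b, _:jp, _:ij.
EDGES = [
    ("Anne", "Jurassic Park", 5),
    ("Anne", "Indiana Jones", 2),
    ("Brian", "Indiana Jones", 2),
]


def expected_partial_sums(edges):
    """Sum the rating facet over each person's own outgoing <rated> edges."""
    sums = {}
    for subject, _obj, rating in edges:
        sums[subject] = sums.get(subject, 0) + rating
    return sums


partial = expected_partial_sums(EDGES)
total = sum(partial.values())
print(partial, total)  # {'Anne': 7, 'Brian': 2} 9
```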

Instead, the result is as follows:

{
  "data": {
    "q": [
      {
        "name": "Anne",
        "rated": [
          {
            "rated|rating": 5
          },
          {
            "rated|rating": 2
          }
        ],
        "partial_sum": 9
      },
      {
        "name": "Brian",
        "rated": [
          {
            "rated|rating": 2
          }
        ],
        "partial_sum": 4
      }
    ],
    "sum": [
      {
        "total_sum": 9
      }
    ]
  }
}

I have a theory about why we're getting these weird numbers.

Value variables attach values to UIDs, but that is not the right behavior here: the value of r should be attached neither to the UID of the person nor to the UID of the movie, but to the combination of both, that is, to the edge linking them through the predicate.

You can see this artifact by querying the variable on all of the nodes:

{
  var(func: has(rated)) {
    rated @facets(r as rating)
  }
      
  sum(func: has(name)) {
    name
    val(r)
  }
}

returns

{
  "data": {
    "sum": [
      {
        "name": "Jurassic Park",
        "val(r)": 5
      },
      {
        "name": "Indiana Jones",
        "val(r)": 4
      },
      {
        "name": "Anne"
      },
      {
        "name": "Brian"
      }
    ]
  }
}

This shows that the variable r has been attached to the movie UIDs, with each movie holding the sum of the rating facets on all edges pointing to it: 5 for Jurassic Park and 2 + 2 = 4 for Indiana Jones.

Once we understand this, it makes sense that partial_sum for Anne is 9 instead of 7: it is the sum of the values attached to the two movies she rated (5 + 4). The same goes for Brian's partial_sum being 4 instead of 2, since the only movie he rated, Indiana Jones, carries 4.
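The theory can be checked with a toy Python model. This is an assumption about the observed behavior, not Dgraph's actual implementation: if r is keyed only by the movie UID and accumulates every facet value on edges pointing to that movie, summing val(r) over each person's movies reproduces the numbers returned above.

```python
# Toy model of the dataset: one (subject, object, rating) tuple per <rated> edge.
EDGES = [
    ("Anne", "Jurassic Park", 5),
    ("Anne", "Indiana Jones", 2),
    ("Brian", "Indiana Jones", 2),
]


def uid_keyed_variable(edges):
    """Assumed behavior: r keyed only by the object (movie) UID,
    accumulating every facet value on edges that point to it."""
    r = {}
    for _subject, obj, rating in edges:
        r[obj] = r.get(obj, 0) + rating
    return r


def observed_partial_sums(edges, r):
    """For each person, sum val(r) over the movies they rated."""
    sums = {}
    for subject, obj, _rating in edges:
        sums[subject] = sums.get(subject, 0) + r[obj]
    return sums


r = uid_keyed_variable(EDGES)
print(r)                               # {'Jurassic Park': 5, 'Indiana Jones': 4}
print(observed_partial_sums(EDGES, r)) # {'Anne': 9, 'Brian': 4}
print(sum(r.values()))                 # 9
```

In this model, total_sum still comes out as 9 because summing the per-movie values counts every edge exactly once; only the per-person sums are distorted.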

Fixing this might be complicated, as it might require variables to work as a map from <uid, uid> to value rather than from uid to value.
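A minimal sketch of what <uid, uid> keying would mean, again as a toy Python model rather than a proposal for Dgraph's internals. Keying r by the (subject, object) pair keeps each edge's rating separate, so the per-person sums come out as expected. The function names are hypothetical.

```python
# Same toy dataset: one (subject, object, rating) tuple per <rated> edge.
EDGES = [
    ("Anne", "Jurassic Park", 5),
    ("Anne", "Indiana Jones", 2),
    ("Brian", "Indiana Jones", 2),
]


def edge_keyed_variable(edges):
    """Hypothetical fix: key r by the (subject, object) pair, i.e. by the edge."""
    return {(subject, obj): rating for subject, obj, rating in edges}


def partial_sums(r):
    """Sum r over each subject's own edges."""
    sums = {}
    for (subject, _obj), rating in r.items():
        sums[subject] = sums.get(subject, 0) + rating
    return sums


r = edge_keyed_variable(EDGES)
print(partial_sums(r))  # {'Anne': 7, 'Brian': 2}
print(sum(r.values()))  # 9
```

Because both ends of the edge are part of the key, Indiana Jones no longer merges Anne's and Brian's ratings into a single value.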

Metadata

Assignees

No one assigned

Labels

- area/facets: Issues related to facet handling, querying, etc.
- kind/bug: Something is broken.
- priority/P1: Serious issue that requires eventual attention (can wait a bit).
- status/accepted: We accept to investigate/work on it.
