Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stale type in FieldReference after RelCopyOnWrite modifications #185

Open
vbarua opened this issue Sep 21, 2023 · 0 comments
Open

stale type in FieldReference after RelCopyOnWrite modifications #185

vbarua opened this issue Sep 21, 2023 · 0 comments

Comments

@vbarua
Copy link
Member

vbarua commented Sep 21, 2023

The FieldReference POJO includes a field to store its type, which is set based on the input rel when it is created.

When the plan tree is modified by the RelCopyOnWrite visitor, existing field refs are copied over as they are. This includes the stored type. If the visitation changes the output type of a relation, FieldReferences to changed columns are not updated.

Code Example

The test below illustrates this issue:

@Test
  void staleFieldRefStateAfterCopyOnWriteVisitation() {
    var b = new SubstraitBuilder(extensions);
    TypeCreator R = TypeCreator.of(false);
    TypeCreator N = TypeCreator.of(true);

    var plan =
        b.project(
            input -> List.of(b.fieldReference(input, 0)),
            // original named scan has type R.I64
            b.namedScan(List.of("example"), List.of("n"), List.of(R.I64)));

    var newPlanOption =
        plan.accept(
            new RelCopyOnWriteVisitor() {
              @Override
              public Optional<Rel> visit(NamedScan namedScan) throws RuntimeException {
                // replace the original named scan with one of type N.I64
                return Optional.of(b.namedScan(List.of("example"), List.of("n"), List.of(N.I64)));
              }
            });

    // decompose the new plan
    assertTrue(newPlanOption.isPresent());
    var newPlan = newPlanOption.get();
    assertInstanceOf(Project.class, newPlan);
    var newProject = (Project) newPlan;
    assertInstanceOf(NamedScan.class, newProject.getInput());
    var newNamedScan = (NamedScan) newProject.getInput();

    // the new named scan emits single column of type N.I64, as expected
    assertEquals(NamedStruct.of(List.of("n"), R.struct(N.I64)), newNamedScan.getInitialSchema());

    // retrieve the field reference
    var exprs = newProject.getExpressions();
    assertTrue(exprs.size() == 1);
    var expr = exprs.get(0);
    assertInstanceOf(FieldReference.class, expr);
    var fieldRef = (FieldReference) expr;

    // The field reference type should correspond to the type of the new named scan.
    // Instead, the type is R.I64 which is from the ORIGINAL named scan
    assertEquals(fieldRef.type(), N.I64); // <- FAILS
  }

Fixing

In the short-term, it's enough to update the RelCopyOnWrite visitor to update FieldReferences as necessary.

However, we should consider whether we should be caching these types at all. It might make sense to force type derivation of expressions to take place relative to the input relation of the expression. That is to say, we can only determine the type of a FieldReference when we now the input relation to the FieldReference.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant