Skip to content

Outer join #75

@julianhyde

Description

@julianhyde

In #72 we added join, with new join and on keywords, and in this case we add outer join, with left, right and full keywords.

Representing extra rows as option

The hardest question is how we represent the extra rows that are generated by outer joins. In SQL, these extra rows are represented by rows whose column values are all NULL. Morel does not have NULL values, so we use optional. Thus the type of

from d in depts
  left join e in emps on d.empno = e.deptno

is {d:{deptno:int, dname:string} option, e: {empno:int, ename:string, deptno:int}} list. Note that d is optional and e is not.

In the on clause neither d nor e are optional.

Writing predicates on option values is a bit verbose:

from d in depts
  left join e in emps
  where case e of SOME e2 => d.deptno = e2.deptno | _ => false

Field accessors

In a future change, we might add some syntactic sugar to make it easier to deal with optional fields. If d has type {deptno:int, dname:string} option, then we can write d.deptno or #deptno d to get a value of type int option.

Thus you could write

from d in depts
  left join e in emps
  where case e.deptno of SOME x => d.deptno = x | _ => false

Note that option is a monad, and so is list. We could generalize so that accessors can push into all monads, not just option. For example, depts.deptno would return an int list, and if z has type {x:{y:int} option} option then z.y would have type int option option.

Flattening option

If the type of from d in depts left join e in emps is {d: Dept, e: Emp option} then it would be logical that the type of from d in depts left join e in emps left join a in assignments is {d: Dept, e: Emp option, a: Assignment option option}. But the double option in Assignment option option is inconvenient, and doesn't give us much useful information. So we flatten: the type is {d: Dept, e: Emp option, a: Assignment option}.

Bushy joins

Other than these new keywords, the grammar is unchanged. The join operation is still left-associative.

If you want to create a join expression that is not left-deep, you will have to write intermediate from expressions. it's not too great a hardship. The following is a bushy join ((a . b ) . (c. d)):

from (a, b) in (from a in as left join b in bs)
  left join (c, d) in (from c in cs left join d in ds)
  where a.x = c.y

Note how we use tuple patterns (a, b) and (c, d) to so that we have four variables a, b, c, d rather than just two. The one compromise of the nested from expressions is that we can't flatten the optional wrappers: the type of d is { ... } option option.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions