Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

23b Object: Attributes #59

Merged
merged 57 commits into from
May 20, 2022
Merged

23b Object: Attributes #59

merged 57 commits into from
May 20, 2022

Conversation

ngjunsiang
Copy link
Contributor

Before starting this chapter, a very minor delayed update in our parser for parsing precedence: [dccbb97]


After we register our custom type and enable variables to be declared with this type, how do we ... you know, retrieve its attributes or update them?

Most languages use a dot(.) notation for accessing object attributes, and 9608 pseudocode is no different. Student.Name refers to the Name attribute of Student. Let's start by getting our scanner to recognise them.

Scanning attributes

Oh, looks like our scanner already recognises the . symbol and can tokenise them just fine. Moving on to the parser.

@ngjunsiang
Copy link
Contributor Author

Getting an attribute

Okay, we can do this in a number of ways. Our object is very similar to a Frame, except without outer-chaining and deletion of names. And we already have an Expr for retrieving named values from a Frame: that's our Get Expr. We could just reuse Get, maybe ...?

Or we could make a new Expr, let's call it Attr. We would need a new resolver and evaluator for it.

Hmm, without trying to plan further than I can imagine, I think I'll go with reusing a Get Expr. Resolving is gonna be a bit tricky, since we can't just insert the frame; we'll have to retrieve the object and insert that instead, and the resolver can't do that (since the resolver doesn't carry out any instantiation of objects). It will have to be the interpreter's job.

Let's clean up some things before we start parsing attribute gets:

  • updating documentation in value(): [e0e08c2]
  • initialise data structures within if: [a78a521]

@ngjunsiang ngjunsiang marked this pull request as draft May 18, 2022 09:34
@ngjunsiang
Copy link
Contributor Author

ngjunsiang commented May 18, 2022

Parsing attribute gets

Let's start with thinking through how to proceed:

# Attribute get
if match(tokens, '.'):
# Create Get Expr
# Get Exprs usually insert frame=NULL
# Here our frame will be an Object, which has to be retrieved
# from the frame
# So we'll insert a Get Expr for the Object in place of frame.
expr = makeExpr(
frame=expr,
name="???",
)

nameExpr() already starts with a Get Expr that retrieves the typedvalue with the name (presumably a custom type object, in the case of attribute gets). If we encounter a ., we make another Get Expr:

# Attribute get
if match(tokens, '.'):
# Insert Get Expr for Object
# in place of frame
name = identifier(tokens)
expr = makeExpr(
frame=expr,
name=name.name,
token=name.token(),
)

This represents an expression for Getting the attribute from the object.

@ngjunsiang
Copy link
Contributor Author

This code doesn't handle nested attributes yet (e.g. Student.Name.First, but that's a tiny fix for us: [7b7da65]

@ngjunsiang
Copy link
Contributor Author

ngjunsiang commented May 18, 2022

Now comes some major refactoring: up to this point, we've been assuming that Assign Exprs only ever need to store a name and an expr. We assume that this name is declared in the frame. But if we are assigning values to an attribute, that's a name declared in an object and not in the frame.

We'll have to refactor Assign to accept left-hand Gets.

Let's prep the stage: I didn't do a complete job of refactoring the parser to use the makeExpr() Expr factory function, so I caught a few more: [484e47e], [f2766db]

@ngjunsiang
Copy link
Contributor Author

ngjunsiang commented May 19, 2022

Assigning to attributes

We'll need to replace name in Assign Exprs with a more general term; let's call it an assignee. We'll introduce it without removing name yet, the goal being to keep the code working:

class Assign(Expr):
__slots__ = ('name', 'assignee', 'expr')
def __init__(self, name, assignee, expr, token=None):
super().__init__(token=token)
self.name = name
self.assignee = assignee
self.expr = expr

And update makeExpr() as well to accept an assignee argument:

def makeExpr(
*,
type=None, value=None,
frame=None, name=None, assignee=None, expr=None,
left=None, oper=None, right=None,
callable=None, args=None,
token=None,
):
if name is not None:
if frame is not None:
return Get(frame, name, token=token)
elif expr is not None:
if assignee is None:
assignee = name
return Assign(name, assignee, expr, token=token)

Not every makeExpr() call is going to pass an assignee, so we take it from name if there isn't one. Eventually we will remove this scaffold code, and then remove name.

@ngjunsiang
Copy link
Contributor Author

Now we turn our attention to assignment(), the first place that creates Assign Exprs:

def assignment(tokens):
name = identifier(tokens)
expectElseError(tokens, '<-', "after name")
expr = expression(tokens)
return makeExpr(name=name.name, expr=expr, token=name)

Let's make it create and pass an assignee:

def assignment(tokens):
name = identifier(tokens)
expectElseError(tokens, '<-', "after name")
expr = expression(tokens)
return makeExpr(
name=name.name,
assignee=makeExpr(
frame=NULL,
name=name.name,
token=name.token(),
),
expr=expr,
token=name,
)

@ngjunsiang
Copy link
Contributor Author

We haven't added any new functionality yet; assignment() still only parses assignment-to-name, and can't handle assignment-to-attribute. Looks like we'll have to throw in a whole chunk of code for allowing Name assignees, as well as GetAttribute assignees ... but wait! We already have nameExpr() which can parse those:

def nameExpr(tokens):
name = identifier(tokens)
expr = makeExpr(
frame=NULL,
name=name.name,
token=name.token(),
)
# Function call
if match(tokens, '('):
args = []
while not check(tokens).word in (')',):
match(tokens, ',') # ,
arg = expression(tokens)
args += [arg]
expectElseError(tokens, ')', "after '('")
expr = makeExpr(
callable=expr,
args=args,
token=name.token(),
)
# Attribute get
while match(tokens, '.'):
# Insert Get Expr for Object
# in place of frame
name = identifier(tokens)
expr = makeExpr(
frame=expr,
name=name.name,
token=name.token(),
)
return expr

Unfortunately, nameExpr() also parses Call Exprs (e.g. Func(...)). I think it's getting too big and taking on too many responsibilities; time to break up Big nameExpr()! Fortunately, we organised our code so the resolver and interpreter don't depend on parser, and the only place where nameExpr() is used is in value().

Let's reorganise it this way:

  • name() parses a Name Expr only
  • callExpr() parses a Call Expr
  • attrExpr() parses an Attr Expr (which is a Get Expr where the frame is an Object instead of a Frame)

In value():

# A name or call or attribute
elif token.type == 'name':
expr = name(tokens)
while not atEnd(tokens) and check(tokens).word in ('(', '.'):
# Function call
if match(tokens, '('):
expr = callExpr(tokens, expr)
# Attribute get
while match(tokens, '.'):
expr = attrExpr(tokens, expr)
return expr

We've just extended the functionality a bit: this will recognise chains of () and ., e.g. Name(...).Attribute and also Name.Attribute(). Not all of these will point to valid callables or attributes, but that's a problem for the interpreter.

@ngjunsiang
Copy link
Contributor Author

Finally, our name(), callExpr() and attrExpr():

def name(tokens):
name = identifier(tokens)
return makeExpr(
frame=NULL,
name=name.name,
token=name.token(),
)
def callExpr(tokens, expr):
args = []
while not check(tokens).word in (')',):
match(tokens, ',') # ,
arg = expression(tokens)
args += [arg]
expectElseError(tokens, ')', "after '('")
return makeExpr(
callable=expr,
args=args,
token=name.token(),
)
def attrExpr(tokens, expr):
name = identifier(tokens)
return makeExpr(
frame=expr,
name=name.name,
token=name.token(),
)

@ngjunsiang
Copy link
Contributor Author

And back to assignment(): Instead of just parsing name assignments, let's handle attribute assignments as well:

def assignment(tokens):
assignee = name(tokens) # Get Expr
while not atEnd(tokens) and match(tokens, '.'):
# Attribute get
assignee = attrExpr(tokens, assignee)
expectElseError(tokens, '<-', "after name")
expr = expression(tokens)
return makeExpr(
name=assignee.name,
assignee=assignee,
expr=expr,
token=name,
)

No more ParseError; the next error occurs in the resolver, and we'll get to that shortly.

Note that assignment() won't be able to handle expressions like Name(...).Attribute; I don't foresee a need for this yet, but if we run across pseudocode like that we can always come back to this and extend it.

@ngjunsiang
Copy link
Contributor Author

We can't remove name from Assign yet, not while the resolver and interpreter depend on it. So let's get to resolving next.

Resolving attribute assignments

Two things we need to handle:

  1. Get Exprs potentially having a Get Expr as a frame
  2. Assign Exprs having an additional assignee attribute

First, nested Gets:

def resolveGet(frame, expr):
"""Insert frame into Get expr"""
assert isinstance(expr, Get), "Not a Get Expr"
# frame can be a Frame, or a Get Expr (for an Object)
# If frame is a Get Expr, resolve it recursively
if isinstance(expr.frame, Get):
# Pass original frame for recursive resolving
expr.frame.accept(frame, resolveGet)
if expr.frame is NULL:
while not frame.has(expr.name):
frame = frame.lookup(expr.name)
if not frame:
raise LogicError("Undeclared", expr.token())
expr.frame = frame
return expr.frame.getType(expr.name)

We do a check for a Get Expr, and have the nested frame accept() resolveGet() recursively. When we finally hit the innermost Get Expr (which would be the first one parsed by name()), we carry out a frame insertion by checking for frame is NULL.

And we lean on the frame's typesystem to help us determine the value's type.

And now the assignee:

def resolveAssign(frame, expr):
# assignee frame might be a Frame or Get(Object)
assnType = expr.assignee.accept(frame, resolveGet)
exprType = expr.expr.accept(frame, resolve)
expectTypeElseError(
exprType, assnType, token=expr.token()
)

The assignee is going to be a Get Expr, so let's resolve it; thankfully resolveGet() now handles nested Gets (although we haven't tested this yet). Then we can do a type-check for the assignee's type and the expr's type.

@ngjunsiang
Copy link
Contributor Author

I'm not quite happy with how I handled verifyFunction and verifyProcedure in the resolver; it looks unwieldly and too detailed, and it is difficult to see the common parts between them.

In particular I don't like the return type checking being in verifyFunction; it looks like it could easily be handled in verifyStmts even if procedures won't use it. Let's upgrade verifyStmts to do that: [b9eb1d6]

and then use it in verifyFunction: [db6ec08]

together with this change, we also need another way to check for the existence of a Return statement. I do this by just checking for a statement with a 'return' rule.

@ngjunsiang
Copy link
Contributor Author

Found a bug which didn't come up in testing: verifying BYREF parameters was not being done properly. I was setting the value in the local frame's typedvalue slot instead of assigning the slot from the global frame, so I fixed that: [3ee3636]

Frame does not have a set() method to make this work, so I made one: [968a055]

@ngjunsiang
Copy link
Contributor Author

It's still not quite how I would like it. Within the resolver, it doesn't make sense to use the Expr.accept() method when we can reference the resolver we want with e.g. resolveDeclare(frame, expr). So I reformatted that for easier reading (and marginally faster execution): [d9651ae]

Oops, accidentally took out some code I needed. Undoing it: [13e91aa]

Declaring parameters shares some code with declaring variables, but can't use resolveDeclare(). But I think it should; we are doing similar things to a frame, after all, though BYREF declarations need to be handled differently. No reason why that can't be done in resolveDeclare(); we'll need access to the outer frame, which we now have.

So let's upgrade resolveDeclare(), and then use it to simplify verifyFunction and verifyProcedure: [63be4f3]

@ngjunsiang
Copy link
Contributor Author

Parameter checking and replacement is the same across both verify* functions. Let's make a helper for that: [23c4667]

@ngjunsiang
Copy link
Contributor Author

We are going to be adding a new object type in the next chapter. That means more match() and check() use; I don't like how I have to use check(tokens).word if I don't want the token to be consume()d; I want to use match() for all matching operations! I upgraded match() to make consuming optional: [8ec5c0e]

Then refactor the parser to use it in place of check(): [54a1746]

@ngjunsiang
Copy link
Contributor Author

ngjunsiang commented May 20, 2022

I also use check(tokens).type == ... in a few places, which does the same thing as match() except using the type attribute instead of word. Let's make a matchType() helper: [3ee7cae]

Since this doesn't need to consume tokens, I set advance=False by default.

Then a small fix for chaining () calls and . attribute gets: [0fa5ec8]

@ngjunsiang
Copy link
Contributor Author

Did a more thorough pass for loops in the parser to catch EOFs: [4f1a240]

Next, some really long lines stand out in the parser:

  • while not atEnd(tokens) and match(...) occurs 13 times
  • while not atEnd(tokens) and match(...) occurs 7 times
  • match(...) occurs 51 times.

This needs some reorganising ...

Let's do this:

  • match*() helpers will consume and return the token: [3df2562]
    • matchElseError()
    • matchWord() (previously match())
  • expect*() helpers will not consume the token, and return a boolean: [e9ad41a]
    • expectWord()
    • expectType()

@ngjunsiang
Copy link
Contributor Author

About atEnd(tokens) ... we want to check for this in a loop because match*() and expect*() will not consume an EOF token (unless we actively match it), so we might lock ourselves into a loop.

This feels like it might be a constant source of bugs; it relies on human consistency in using this check whenever there's a loop. I just had to add a whole bunch of them earlier. Since absolute blistering performance is not the goal here, I'd rather have safety. I'll throw in an atEnd() check to throw an error if we encounter an unexpected EOF:

def atEndThenError(tokens):
if atEnd(tokens):
raise ParseError("Unexpected EOF", check(tokens))

And use it in the match*() and expect*() helpers (except the ElseError ones): [3df0ae1]

Refactoring matchElseError() to use matchWord(): [a0198aa]

@ngjunsiang
Copy link
Contributor Author

Finally we can take out the atEnd() check in our loops: [26158d3]

Not only do we get more readable loops, we also get more safety knowing that our match and expect helpers help us to check and raise errors. This will need testing too, to ensure it's not raising Unexpected EOF unexpectedly.

@ngjunsiang
Copy link
Contributor Author

Almsot got myself into infinite recursion there: [9260d50]

Can't have expectType() and atEnd() calling each other!

@ngjunsiang
Copy link
Contributor Author

It's very clean to have match and expect helpers return a boolean, but it's quite limiting. In many places we expect() tokens, and then consume them after that. We'll be able to compress some code if our match and expect helpers return the token for a match, and return None otherwise. These are still truthy and falsy respectively; they are treated as True and False respectively when evaluated as booleans. [5538bed]

While compressing code, I accidentally also added a whole bunch of other formatting changes. Including adding a msg keyword argument to matchElseError(): [d7a7f15]

matchElseError() doesn't follow the new naming convention; it matches on words, so I renamed it to matchWordElseError(). Wordier, but more accurate: [9fc5f1b]

@ngjunsiang
Copy link
Contributor Author

And then bugfixes.

  • wrong return value: [c9a0d04]
  • We no longer need to skip line breaks since that is done before the statement: [baa91b3] (idk, this might need fixing again in future. We'll see.)
  • improved return statement detection: [d63aecd]

@ngjunsiang
Copy link
Contributor Author

At the end of the day, we have cleaner code, and still mostly working. I think we can move on to arrays.

@ngjunsiang ngjunsiang merged commit 72fe4f2 into main May 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant