Skip to content

Loading…

Bug with Structures in Lists. #15

Closed
tvarney opened this Issue · 8 comments

2 participants

@tvarney
Collaborator

There is a bug with the current way structs are handled; consider the following scenario:

(print (struct1, struct2, struct3))

where struct1, struct2, and struct3 are all symbols pointing to structures. The intended behavior is to
have the print function print out a cons with the string representation of each of the structs. However, since each of the structs is also technically a function, Crono will interpret the cons as a function application and throw a runtime exception since the second and third arguments are "wrong". Quoting the cons won't work, since Crono won't attempt to resolve the symbols and you will get the literal string "(struct1, struct2, struct3)".
This doesn't just affect the print function; any time you wish to pass a cons of structures to a function it will break. Currently the only method is to use the cons function:

(define structlist (cons struct1 (cons struct2 (cons struct3 Nil))))
(print structlist)

While this works, it breaks the intuitive flow of allowing list definitions since

(print (1 2 3))

will work in the current interpreter.

Any ideas on what we should do? It would be possible to add another exception to the Interpreter that gets checked when the arguments are wrong to a function. It would also be completely possible to have a 'resolve' function which translates symbols to their definitions and nothing else; eg:

(print (resolve (struct1 struct2 struct3)))
@mwatts15
Owner
@tvarney
Collaborator

See issue #12, when Carlo and I stopped the interpreter from interpreting the "String as a Cons" as a function application, I added in the ability to start a cons with any primitive such that the interpreter didn't attempt to evaluate it. The biggest gain from doing it this way was that it was easy to create a list that had computed elements:

(print ((+ 1 2) (+ 3 4) (+ 5 6))) % prints "(3 7 11)"

The reason for doing this was mostly convenience; instead of needing to cons each item onto a list you could just define it inline. However, if we change Strings to be arrays instead of Cons, this could be replaced with a function like list in scheme which is variadic and returns the cons of it's elements.

(print (list (+ 1 2) (+ 3 4) (+ 5 6)))) % prints "(3 7 11)"

The idea is to avoid needing large lists of cons statements like:

(print (cons (+ 1 2) (cons (+ 3 4) (cons (+ 5 6) Nil))))

I'm not sure how I feel about 'quasi-quoting' with quoting and unquoting. Just switching back to requiring that the first element in all structural cons be a function, switching Strings from Cons to Arrays, and adding a list function would solve this issue.

@mwatts15
Owner
@tvarney
Collaborator

I agree that we should revert the behavior; I was just noting that this 'bug' is solved by swapping strings to arrays instead of cons, and the connivence factor could be taken care of by adding a list function.

The 'problem' I have with quasiquoting is that it increases the work the interpreter does when it reaches a cons; instead of the interpreter just returning the quoted value, it would have to traverse the entire tree to search for unquotes. This would mean changing the Interpreter from a simple Visitor-pattern to a combined Visitor-pattern/state-machine, as each visit method would need to change it's behavior based on if it was quoted or not. There would also have to be a new unquote node, since there would be no 'good' way to do it with functions (it could be done, just not 'well' in my mind). A new node would not be too hard, but would mean another visit method on interpreters that swap the state machines state.

We could add it, and it would be another 'cool' feature to show off to Canata, but I don't want to jump on that to solve what is easier solved with smaller changes, namely reverting to the old behavior and adding a convenience function.

@mwatts15
Owner

The 'problem' I have with quasiquoting is that it increases the work the interpreter does when it reaches a cons; instead of the interpreter just returning the quoted value, it would have to traverse the entire tree to search for unquotes. This would mean changing the Interpreter from a simple Visitor-pattern to a combined Visitor-pattern/state-machine, as each visit method would need to change it's behavior based on if it was quoted or not.

I don't see that it's that serious. I've changed my mind about auto-quasiquoting since there may be utility to allowing for a stored "format string" pattern where you would quote a to-be-interpolated cons for later evaluation as a quasiquoted cons. Anyway, the quasiquoted cons just needs to be treated like a closure. The quasi-quote encloses the enivironment at the time of creation and we don't grab the unquoted values until we actually need them. No extra nodes needed.

We could add it, and it would be another 'cool' feature to show off to Canata, but I don't want to jump on that to solve what is easier solved with smaller changes, namely reverting to the old behavior and adding a convenience function.

I don't want to discount this feature for nothing, but certainly it's not an important feature. We need more to improve the construction of our abstract syntax. The string-as-list-of-char thing has some nice features like allowing us to co-opt the list processing functions for string processing without any extra code. Yes, it would require quoting them, but what of that? They are data and so would need to be quoted for that use. As for pretty-printing as "string" we would need stronger typing requirements on lists (a la Haskell) so the print part of our REPL loop knows to treat (listof Char) specially.

@tvarney
Collaborator

The string-as-list-of-char thing has some nice features like allowing us to co-opt the list processing functions for string processing without any extra code.

They will still co-opt code for manipulating them, just from a different source (arrays).

Yes, it would require quoting them, but what of that?

I don't feel that requiring a quote on data no matter where it is in the program is a trivial problem. Consider:

(print 1)

The integer 1 is data, and it doesn't need to be quoted. However, with strings as lists

(print "Hello World")

is a syntax error since 'H' isn't a function. Data doesn't need to be quoted to be used in crono, unless strings are lists, in which case there exists that one case where it would. The point is that strings-as-lists adds exceptions to the language; "You don't need to quote data, unless it's a string".

As for pretty-printing as "string" we would need stronger typing requirements on lists

Stronger typing on lists could be done, but we would probably want to switch back to the old way Cons were represented; backed up by a java linked list instead of a raw-node. It also isn't needed except as an optional feature with strings-as-arrays since Strings are a specialization of arrays.

Finally, strings as lists carry significant performance problems. Consider the size of a string in memory. If a string is represented by a list, you get the problem of each individual character requiring a node on that list, which contains the character object and a pointer to the next object in the list (or nil). The total size of a string-as-a-list then is (2 * p + c) * n bytes, where p is the size of a reference (8 bytes on 64-bit systems), and c is the size of a character (2 bytes). With strings as an array, you get the initial object overhead (8 bytes as above), plus the array of characters for a size of p + c * n bytes (technically it would be n + 1 for the null character).
Consider the case where someone wants to read in a word list and return a random word. Such a file can be upwards of 100,000 characters. So, a string-as-a-list representation of it would come out to (28 + 2) * 100,000 = 1,800,000 bytes. A string-as-an-array representation of it is only 8+2(100,000 + 1) = 200,010 bytes long. That's a big difference, and it affects not only memory; the larger an object is, the worse it's cache locality. Meaning, scanning such a string would cause far more cache misses (and therefore have worse performance) if it was represented as a list instead of an array.

For reference, here is a page about the performance of strings in Haskell. The python version is 4 times faster than the [char] haskell version, and switching to Data.ByteString causes a similar speed-up (3.1x faster).

@tvarney tvarney was assigned
@mwatts15
Owner

The string-as-list-of-char thing has some nice features like allowing us to co-opt the list processing functions for string processing without any extra code.

They will still co-opt code for manipulating them, just from a different source (arrays).

By "extra code", I mean for us. If I understand you aright, we would have to map the list operations to an array on the back end, again especially for strings. That's just a shed of a different color. If we have that strings are lists, then we don't have to write anything other than some common list functions for prelude.crisp (what do you think of the file ext, btw? Cute eh?), which we would want anyway.

Yes, it would require quoting them, but what of that?

I don't feel that requiring a quote on data no matter where it is in the program is a trivial problem.

You misunderstand me-- the quotation happens in the translation of the literal, so this:
"string"
has the value:
(quote (#\s #\t #\r #\i #\n #\g))
using the scheme character syntax. Which amounts to wrapping the string cons in a quote node, I think. We could also go the route of racket on this and have functions for turning strings into lists and lists into strings. I would still prefer to have strings implemented simply as lists, for aesthetic reasons, but it's well either way for me IF there we provide list->string / string->list functions.

The point is that strings-as-lists adds exceptions to the language; "You don't need to quote data, unless it's a string".

When you use a list as data, it is quoted, in a manner of speaking, to prevent it being evaluated as a function application. Again, the exception, if it can be called one, is on our end; users never need worry about it.

Finally, strings as lists carry significant performance problems. Consider the size of a string in memory.
[...]

Certainly, that is a concern and a motivation for supporting both list-based strings and array-based. Which we choose as default depends largely on who would be using our language and what for. At this point I guess the answer are "us" and "for a grade." I would also add, "and under a deadline": this is a minor issue, and we still need to satisfy our requirements for tracing interpretation steps.

Also: It's now only tangentially related to this ticket! We can close this when we have a (list ...) primitive.

@tvarney
Collaborator

By "extra code", I mean for us. If I understand you aright, we would have to map the list operations to an array on the back end, again especially for strings. That's just a shed of a different color. If we have that strings are lists, then we don't have to write anything other than some common list functions for prelude.crisp (what do you think of the file ext, btw? Cute eh?), which we would want anyway.
Arrays will have their own special functions on them (get, set, append, remove, insert) that support indexed positions into them. Further, append and insert will allow for appending/inserting arrays onto/into other arrays, such as:

(insert "H World" 1 "ello") % Result: "Hello World";

Arrays will not have any interpreter backend mapping to list functions like cons/car/cdr.

You misunderstand me-- the quotation happens in the translation of the literal
Point taken, it could be done that way.

it's well either way for me IF there we provide list->string / string->list functions.
That was actually planned as the (aslist *array*) and (asarray *cons*) functions.

At this point I guess the answer are "us" and "for a grade." I would also add, "and under a deadline"
Agreed, which is another reason I don't want to change it. It would require me changing a large amount of code when I'm already partially done with the current strings-as-arrays implementation. All I need to do at this point is add the functions and it should be good to go with some testing.

Also: It's now only tangentially related to this ticket! We can close this when we have a (list ...) primitive.
Yes, agreed. I'll write the list function and change the behavior of the interpreter after I push the changes I made to finish structs.

@tvarney tvarney closed this
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.