
Hedy Language Design

Felienne Hermans edited this page Jun 13, 2024 · 2 revisions

Design Goals

Learning to program is hard for newbies. The goal of the Hedy language is to start with a language that is as simple as possible, and to add more and more syntax as the levels progress. For this we have a number of clear design goals.

  1. Concepts are offered at least three times in different forms:

    Research from writing education has shown that it is best to offer a concept in different forms over a long period of time. Furthermore, it has been shown that a word needs to be read 7 times before it is stored in long-term memory.

  2. The initial offering of a concept is the simplest form possible:

    Previous research has shown that syntax can be confusing for novices. Early levels thus are as syntax-free as possible to lower cognitive load.

  3. Only one aspect of a concept changes at a time:

    In his paper on the spiral approach Shneiderman argued for small steps in teaching programming, which we follow for Hedy too. This allows us to focus the full attention of the learner on the new syntactic element.

  4. Adding syntactic elements like brackets and colons is deferred to the latest moment possible:

    Previous research in the computer science education domain has shown that operators such as == and : can be especially hard for novices. In a study with high-schoolers we found that this might be due to their pronunciation. Research from natural language acquisition also indicates that parentheses and the colon are among the last elements of punctuation that learners typically learn. Given the choice between colons and parentheses on the one hand and other elements like indentation on the other, the latter are introduced first.

  5. Learning new forms is interleaved between concepts as much as possible:

    We know that spaced repetition is a good way of memorizing, and that it takes time to learn punctuation, so we give students as much opportunity as possible to work with concepts before syntax changes.

  6. At every level it is possible to create simple but meaningful programs:

    It is important for all learners to engage in meaningful activities. Our experience in teaching high-school students (and even university CS students) is that learning syntax is not always seen as a useful activity. Students experience a large discrepancy between the computer being smart, for example by being able to multiply 1,910 and 5,671 within seconds, while simultaneously not being able to add a missing colon independently. We anticipate that when the initial syntax is simple, allowing novices to create a fun and meaningful program, they will later have more motivation to learn the details of the syntax.

Levels

In its current form, Hedy consists of 18 different levels. The levels loosely follow the lesson series Python in de klas ("Python in the classroom") in such a way that these existing lessons can be executed with Hedy instead of with Python.

Level 1: Printing and input

At the first level, students can firstly print text. For this, no syntactic elements are needed other than the keyword print followed by arbitrary text. Furthermore students can ask for input of the user using the keyword ask. Here we decided to use the keyword ask rather than input because it is more aligned with what the role of the keyword is in the code than with what it does. Input of a user can be repeated with echo, so very simple programs can be created in which a user is asked for a name or a favorite animal, fulfilling Design Goal 6.

Level 2: (Singular) variables with is

At the second level, variables are added to the syntax. Defining a variable is done with the word is rather than the equals symbol, fulfilling Design Goal 3 and Design Goal 4.

Level 3: Lists with is

In level 3 we add the option to create lists and retrieve elements, including random elements, from lists with at. Adding lists, and especially the option to select a random item from a list, allows for the creation of more interesting programs such as a guessing game, a story with random elements (an assignment from Python in de klas, "Python in the classroom"), or a customized dice. Level 3 also allows learners to add and remove list elements with a textual syntax: add animal to animals.

Level 4: Quotation marks and types

In Level 4 the first syntactic element is introduced: the use of quotation marks to distinguish between strings and text. In teaching novices we have seen that this distinction can be confusing for a long time, so offering it early might help to draw attention to the fact that computers need information about the types of variables. This level is thus an interesting combination of explaining syntax and explaining programming concepts, which underlines their interdependency. The variable syntax using is remains unchanged, meaning that learners can now use both number is 12 and name is Hedy.

Level 5: Selection with if (pressed) and else flat

In Level 5, selection with the if statement is introduced, but the syntax is 'flat', i.e. placed on one line, so that it more closely resembles a regular syntax:
if name is print
Another feature is the if pressed statement, which makes it possible to link commands to any a-z and 0-9 key on the keyboard:
if x is pressed print
Else statements are also included, and are also placed on one line, using the keyword else:
if name is print else print
This also works together with pressed.

Level 6: Calculations

In Level 6, students learn to calculate with variables. To that end, addition, multiplication, subtraction and division are introduced. While this might seem like a simple step, our experience taught us that using * for multiplication, rather than x, has a steep learning curve and should be treated as a separate learning goal.

Level 7: Repetition with repeat x times

Research with Python novices who are not native English speakers has found the keyword for to be a confusing word for repetition, especially because it sounds like the word 'four'. For our first, simplest form, in line with Design Goal 2, we opt to use the Quorum syntax repeat x times. In this initial form, like the if, the syntax is placed on one line:
repeat 5 times print
Repeat can also be used in combination with pressed, so that the program will handle a keypress multiple times before terminating.

Level 8: Code blocks with one level of nesting

After Level 7, there is a clear need to 'move on', since the body of a loop (and also that of an if) can only consist of one line, which limits the possibilities of programs that users can create. We assume this limitation will be a motivating factor for learners: rather than 'having to learn' the block structure of Python, they are motivated by the prospect of building larger and more interesting programs (Design Goal 6). The syntax of the loop remains otherwise unchanged as per Design Goal 3, so the new form is:

repeat 5 times
  print 'Hello'
  print 'I am repeated 5 times'

Level 9: Code blocks with multiple levels of nesting

To allow for enough interleaving of concepts (Design Goal 5), we defer the introduction of syntax concepts for now, and focus on more conceptual additions: the nesting of blocks. We know indentation is a hard concept for students to learn, so this warrants its own level (Design Goal 3).

Level 10: For syntax looping over list

In level 10, learners use the for syntax to loop over the values in a list with for animal in animals. This allows the customization of stories, drawings and songs.

Level 11: For syntax with in range

Once blocks are sufficiently automatized, learners will see a more Python-like form of the for loop, namely: for i in range 0 to 5. This allows for access to the loop variable i and this allows for more interesting programs, such as counting to 10. As per Design Goal 3, the change is made small, and to do so (following Design Goal 4), brackets and colons are deferred to a later level, but indentation which was introduced in Level 8 remains.

Level 12: Datatypes

Learners are now allowed to use floats and need to place quotation marks around strings to distinguish them from numbers.

Level 13: And and or

In level 13, learners learn about and and or in if statements.

Level 14: Smaller and bigger

In level 14, learners learn about <(=) and >(=) in preparation for while loops.

Level 15: While loops

In level 15, learners are introduced to the while loop. With the previous knowledge of loops and <= and >=, learners can make basic while loops.

Level 16: Adding rectangular brackets

In this level, learners encounter brackets for the first time: rectangular brackets are added for list access, which up to now was done with the keyword at, following Design Goal 2. This level also explains accessing lists with a numeric index, starting at 1. The code to access a specific value has technically been available since level 2, but there was no explanation yet of how to access a specific value and it is not used in examples (and should maybe be removed?).
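
Since Hedy counts list positions from 1 while Python counts from 0, a transpiler has to bridge the two. The following is a minimal sketch of that idea only; the helper name hedy_list_access and its error text are hypothetical and not taken from the Hedy code base.

```python
def hedy_list_access(items, hedy_index):
    """Return the element at a 1-based position, the way a learner counts.

    Hypothetical helper: it only illustrates the index shift between
    a 1-based learner view and Python's 0-based lists.
    """
    if not 1 <= hedy_index <= len(items):
        raise IndexError(
            f"There is no item {hedy_index} in a list of {len(items)} items."
        )
    # Shift by one to land on Python's 0-based position.
    return items[hedy_index - 1]


animals = ["dog", "cat", "kangaroo"]
first = hedy_list_access(animals, 1)
last = hedy_list_access(animals, 3)
```

The explicit range check matters here: without it, index 0 would silently wrap around to the last element through Python's negative indexing.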

Level 17: Learning elif and the colon

To make the step to full Python, learners will need to use the colon to denote the beginning of a block, in both loops and conditionals. Because blocks are already known and practiced over several levels, we can teach learners to use a colon before every indentation.

This level also introduces elif to allow for more exciting programs, since just adding a colon does not really create engagement.

Level 18: Adding round brackets

Level 18 adds round brackets in print and range and changes ask to input. As per Design Goal 4, these are added as late as possible.

Additional features

Comments

All levels allow for the use of comments, and it is up to the teachers to explain their different uses.

Technical Design

Grammars

Every level of Hedy is essentially a new language which requires its own grammar. Due to the gradual nature of Hedy, however, the grammar of each level is only slightly different from the grammar of the previous one. To avoid massive duplication, grammar code in Hedy is organized in the following manner:

  • A level1.lark file serving as a base grammar file.
  • A level[1-9+]-Additions.lark file for every level. Each file describes only the grammar changes compared to the previous level. Addition files can add new grammar rules or override existing ones.

To get the grammar of a concrete level, Hedy takes the grammar of level 1 and merges consecutively all the changes specified in the Addition files until the required level is reached. The final merged grammars for all levels are generated in the /grammars-Total folder.
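
The merging step can be pictured with a small sketch. It assumes a toy grammar format with one "name: definition" rule per line, which is far simpler than real Lark grammars, and the function names parse_rules and merge_grammars are hypothetical. Hedy's actual merging code may work quite differently.

```python
def parse_rules(grammar_text):
    """Parse 'name: definition' lines into a dict, keeping rule order."""
    rules = {}
    for line in grammar_text.splitlines():
        line = line.strip()
        if not line or line.startswith("//"):
            continue  # skip blank lines and comments
        name, _, definition = line.partition(":")
        rules[name.strip()] = definition.strip()
    return rules


def merge_grammars(base_text, additions_by_level, target_level):
    """Start from level 1 and apply each level's additions in order."""
    rules = parse_rules(base_text)
    for level in range(2, target_level + 1):
        # update() overrides rules that already exist and appends new ones.
        rules.update(parse_rules(additions_by_level.get(level, "")))
    return "\n".join(f"{name}: {body}" for name, body in rules.items())


# Toy rules, invented for illustration only.
BASE = 'command: print | ask\nprint: "print" text\nask: "ask" text'
ADDITIONS = {2: 'command: print | ask | assign\nassign: var "is" text'}

level1 = parse_rules(merge_grammars(BASE, ADDITIONS, 1))
level2 = parse_rules(merge_grammars(BASE, ADDITIONS, 2))
```

In this sketch the level 2 additions both override the existing command rule and add a new assign rule, mirroring the two kinds of change an Addition file can make.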

Type System

Hedy has a rudimentary type system created to provide better error messages to end users. The type system performs type inference, type validation and lookup table enrichment before transpilation happens. Note that if in the future the transpiler still does not require any of the lookup table enrichments done by the type system, type validation and transpiling can run in parallel.

The type system requires as input a lookup table containing the names of all variable definitions, which it later enriches with their inferred types. The supported types are string, integer, float, list, boolean, input, any and none. The type any is used when types cannot be inferred and is ignored in all type validations. The type input is a composite data type used to denote user input (retrieved through the ask and input commands), which means input can be multiple types depending on the value the user enters. At the moment, the user input could be interpreted as string, integer or float. The lookup table is also used by the transpiler to differentiate literals from expressions, e.g. the literal 'text' vs a variable called 'text'. Because of that the lookup table does not contain only variable definitions, but also all expressions that need to be escaped, e.g. variable access such as animals[0].
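
As an illustration of how the types any and input could behave during validation, here is a minimal sketch. The function names interpret_input and is_allowed are hypothetical, and the rule that input passes whenever string, integer or float is acceptable is an assumption made for this example, not a description of Hedy's actual checks.

```python
def interpret_input(raw):
    """Guess the most specific type name for a raw user answer."""
    try:
        int(raw)
        return "integer"
    except ValueError:
        pass
    try:
        float(raw)
        return "float"
    except ValueError:
        return "string"


def is_allowed(actual_type, allowed_types):
    """Check a type against the types a command accepts."""
    if actual_type == "any":
        # 'any' means the type could not be inferred, so it is never rejected.
        return True
    if actual_type == "input":
        # Assumption: input may turn out to be any of these at run time.
        return any(t in allowed_types for t in ("string", "integer", "float"))
    return actual_type in allowed_types


answer_type = interpret_input("12")
```

Because the real value of an input is only known when the program runs, a validator along these lines can only reject input where none of its possible types would fit.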

The lookup table is created and enriched in two separate steps. The first traversal of the abstract syntax tree puts in the lookup table the entries required by the transpiler, along with a reference to the sub-tree needed to infer their type. For example, the line a is 1 will add the following entry: {name: 'a', tree: {data: 'integer', children: ['1']}}. The second traversal of the abstract syntax tree is performed to infer the types of expressions, store the inferred types of variables in the lookup table, and perform type validation. If during the second step the type system encounters a variable whose type has not been inferred yet, it will use the tree stored in the lookup entry to infer its type. Note that there are valid scenarios in which the lookup entries will be accessed before their type is inferred. This is the case with for loops:

for i in 1 to 10
    print i

In the above case, print i is visited before the definition of i in the for loop. To mitigate the issue, the lookup entry tree is used to infer the type of i. There is a guard against cyclic definitions, e.g. b is b + 1.
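
The two steps, the on-demand inference and the cycle guard can be sketched as follows. This is a toy model under stated assumptions: trees are plain tuples rather than real syntax tree nodes, only a handful of node kinds exist, and the names build_lookup, infer, infer_variable and CyclicDefinitionError are hypothetical.

```python
class CyclicDefinitionError(Exception):
    """Raised when a variable's type depends on itself, e.g. b is b + 1."""


def build_lookup(assignments):
    """First step: store each variable with the sub-tree needed to type it."""
    return {
        name: {"name": name, "tree": tree, "type": None, "in_progress": False}
        for name, tree in assignments
    }


def infer(tree, lookup):
    """Infer the type of a toy tree such as ('integer', '1')."""
    kind = tree[0]
    if kind in ("integer", "float", "string"):
        return kind
    if kind == "var":
        return infer_variable(tree[1], lookup)
    if kind == "add":
        left, right = infer(tree[1], lookup), infer(tree[2], lookup)
        # Toy rule: matching operand types keep their type, otherwise 'any'.
        return left if left == right else "any"
    return "any"


def infer_variable(name, lookup):
    """Second step helper: infer a variable's type on demand and cache it."""
    entry = lookup.get(name)
    if entry is None:
        return "any"
    if entry["type"] is None:
        if entry["in_progress"]:
            raise CyclicDefinitionError(name)  # guard against b is b + 1
        entry["in_progress"] = True
        try:
            entry["type"] = infer(entry["tree"], lookup)
        finally:
            entry["in_progress"] = False
    return entry["type"]


lookup = build_lookup([
    ("a", ("integer", "1")),
    ("c", ("add", ("var", "a"), ("integer", "2"))),
    ("b", ("add", ("var", "b"), ("integer", "1"))),
])
c_type = infer_variable("c", lookup)
```

Asking for the type of c forces the type of a to be inferred first and cached, while asking for the type of b raises CyclicDefinitionError instead of recursing forever.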

Error messages

We try to craft our error messages along these lines.

  • (i) Structure. We aim to follow a consistent structure for all of our error messages. The first sentence informs users of the issue, while the second sentence offers a suggestion on how to resolve it. This approach helps users quickly understand where to look when something goes wrong and allows them to become accustomed to reading the error messages more efficiently due to their uniform structure.
  • (ii) Consistency. We aim to keep our error messages consistent by using uniform wording and maintaining a similar sentence structure and length. Errors should follow the same format, ensuring that the sentences start and end similarly. This consistency also aids the readability and structure of the error messages.
  • (iii) Keeping Them Short. We aim to keep our error messages concise and short while providing sufficient information about the issue and its cause. Given that people often do not read the entire error message, critical information will be presented at the beginning.
  • (iv) Specific Errors. We aim to create error messages that are specific enough to help users understand programming concepts. This is particularly important for new programmers, who require more detailed guidance compared to those with prior knowledge. To assist new users effectively, we strive to provide distinct error messages for similar issues, as a generic message such as ‘syntax error’ is not sufficiently informative.
  • (v) Language.
    • The first consideration for our language use is to maintain a positive and encouraging tone, avoiding words like ‘illegal’ and ‘invalid’. We want to blame the computer, not the user.
    • The second consideration for our language use is to maintain simplicity, minimizing the use of programming terms. Terms should be from the programmer's perspective, not the compiler's, so we avoid words like 'initialization' and use simpler terms such as 'set' or 'declare'.
    • The third consideration for our language use is to employ anthropomorphic messages for specific types of errors, implying that computers can think. For example, we might say, 'I can’t find the ... for this function.' We use this type of language in the first sentence of error messages to explain the problem. This approach is beneficial, particularly for younger users, as children tend to use correlation to learn new topics more effectively.
  • (vi) Attention. We aim to catch our users’ attention by using highlighting and different types of fonts for helpful indications, such as a missing command or a typo in one of the commands. Additionally, we sometimes specify the line containing the error, as the system highlights the line where the error occurred, making it easily visible.
  • (vii) Documentation and Examples. We improve the error messages by incorporating and documenting feedback and suggestions from users. Additionally, examples are provided at the top of the console, under 'Exercise', to guide students and help them avoid making errors.
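
A few of these guidelines lend themselves to a mechanical check. The sketch below is purely illustrative: build_error_message and DISCOURAGED_WORDS are hypothetical names, and the word list contains only the two examples mentioned above. It enforces the two-sentence structure (problem first, suggestion second) and rejects discouraged wording.

```python
DISCOURAGED_WORDS = ("illegal", "invalid")  # examples from the guidelines


def _as_sentence(text):
    """Trim the text and make sure it ends with sentence punctuation."""
    text = text.strip()
    return text if text.endswith((".", "!", "?")) else text + "."


def build_error_message(problem, suggestion, line_number=None):
    """Combine a problem sentence and a suggestion sentence, in that order."""
    for sentence in (problem, suggestion):
        lowered = sentence.lower()
        for word in DISCOURAGED_WORDS:
            if word in lowered:
                raise ValueError(f"Please avoid the word '{word}'.")
    prefix = f"Line {line_number}: " if line_number is not None else ""
    return f"{prefix}{_as_sentence(problem)} {_as_sentence(suggestion)}"


message = build_error_message(
    "I can't find the variable name",
    "Try setting it with is before you use it",
    line_number=3,
)
```

Putting the problem first keeps the critical information at the start of the message, in line with guideline (iii).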