Prevent uninitialized variables #452

Open
perlun opened this issue Apr 5, 2024 · 3 comments
Labels
bug Something isn't working as expected language Language features (or bugs)

perlun commented Apr 5, 2024

The following Perlang program:

var s: string;
print s;

...currently both compiles (when using experimental compilation, #406) and runs.

In interpreted mode

The semantics here are better, less unexpected. We probably inherited them from Lox in this case.

~/git/perlang on  fix/AsciiString-to-String-assignment [!+] 
❯ perlang docs/examples/quickstart/hello_world.per 
null

In compiled mode

Quite frankly, this is pretty horrible. 😬

~/git/perlang on  fix/AsciiString-to-String-assignment [!+] 
❯ perlang docs/examples/quickstart/hello_world.per 
1�H��ǃ�

Looking at the intermediate C++ code, this isn't very surprising. s is a completely uninitialized (stack) pointer, and reading it is undefined behavior, so what do we expect from it? In fact, we're quite lucky to even get garbage printed in this case; it could equally well have caused a SIGSEGV/core dump. 🙂

~/git/perlang on  fix/AsciiString-to-String-assignment [!+] 
❯ cat docs/examples/quickstart/hello_world.cc 
// Automatically generated code by Perlang 0.4.0-dev.45 at 2024-04-05T19:23:31.5549882Z
// Do not modify. Changes to this file might be overwritten the next time the Perlang compiler is executed.

#include <math.h> // fmod()
#include <stdint.h>

#include "bigint.hpp" // BigInt
#include "stdlib.hpp"

//
// Method definitions
//
int main();

//
// Method declarations
//
int main() {
    const char * s;
    perlang::print(s);
}
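
For comparison, here is a minimal C++ sketch of what the output could look like if the declaration were emitted with an explicit null initializer. This is an assumption for illustration only: format_string is a hypothetical stand-in and not the real perlang::print from stdlib.hpp. It just shows that a null-initialized pointer can be tested deterministically, which would match the null output of interpreted mode.

```cpp
#include <string>

// Hypothetical stand-in for perlang::print(); it returns the text it would
// print so that the behaviour is easy to check.
std::string format_string(const char* s) {
    // Testing a null pointer is well-defined; reading an uninitialized one is not.
    return s == nullptr ? "null" : std::string(s);
}

// Roughly what `var s: string; print s;` could lower to with an initializer.
std::string declared_without_value() {
    const char* s = nullptr; // explicit initializer instead of `const char * s;`
    return format_string(s);
}
```

Calling declared_without_value() yields the string "null" on every run, rather than whatever happens to be on the stack.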

Going forward

We should probably do one of the following:

  1. Use the V approach: forbid declaring uninitialized variables altogether. This feels a bit limiting though; there are perfectly valid cases where e.g. a variable is initialized in if/else blocks, for example.
  2. Use the Java/C# approach: allow uninitialized variables, but prevent them from being used before they are initialized. This is arguably more complex, but should at the same time provide more value to the user. Since we are (or will be) pretty close friends with the Java and C# languages anyway, we might as well go with this approach.

perlun added the bug and language labels on Apr 5, 2024
perlun added this to the 0.6.0 milestone on Apr 5, 2024

perlun commented Apr 5, 2024

@munificent - which option would you pick? How have you handled this in the language(s) you've designed? Would be interesting to hear some "outside thoughts" on this one.

(and yeah.. the real reason for pinging you is that I do want to brag a bit about the fact that I'm on my way to writing a full Perlang-to-C++-transpiler.. 😊 🙈 Pretty cool for a Lox-derivative, huh?)

@munificent

  • Use the V approach: forbid declaring uninitialized variables altogether. This feels a bit limiting though; there are perfectly valid cases where e.g. a variable is initialized in if/else blocks, for example.

This approach is simplest. But to make it really usable, so that it doesn't get in the user's way, it helps to have:

  1. Pattern matching or destructuring assignment or some other way to initialize multiple variables at once like:
var (a, b) = someComplexCode();
  2. Everything is an expression or at least some way to have a block of code as an expression, like:
var (a, b) = {
  some...
  complex...
  computation...
  that finally yields a and b...
}
  • Use the Java/C# approach: allow uninitialized variables, but prevent them from being used before they are initialized. This is arguably more complex, but should at the same time provide more value to the user. Since we are (or will be) pretty close friends with the Java and C# languages anyway, we might as well go with this approach.

This is the most usable path for a statement-oriented language that doesn't have the above feature. The name for it is "definite assignment analysis". It's a flavor of control-flow analysis. Doing it isn't rocket science, but it does increase the language complexity by a notch.
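
To make the idea concrete, here is a rough C++ sketch of definite assignment analysis over a tiny statement tree. Everything in it is hypothetical (the Stmt type, check and find_unassigned_uses are invented for illustration and are not part of Perlang), and it ignores loops, scoping and early returns. The core rule it shows is that after an if/else, a variable counts as definitely assigned only if it was assigned on both paths.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical mini-AST; none of these types exist in Perlang.
struct Stmt {
    enum class Kind { Declare, Assign, Use, If };
    Kind kind;
    std::string name;              // variable name (Declare/Assign/Use)
    bool initialized = false;      // Declare only: has an initializer?
    std::vector<Stmt> then_branch; // If only
    std::vector<Stmt> else_branch; // If only
};

using Assigned = std::set<std::string>;

// Walks the statements in order, tracking which variables are definitely
// assigned, and records an error for every use of an unassigned variable.
void check(const std::vector<Stmt>& body, Assigned& assigned,
           std::vector<std::string>& errors) {
    for (const Stmt& s : body) {
        switch (s.kind) {
        case Stmt::Kind::Declare:
            if (s.initialized) assigned.insert(s.name);
            else assigned.erase(s.name);
            break;
        case Stmt::Kind::Assign:
            assigned.insert(s.name);
            break;
        case Stmt::Kind::Use:
            if (assigned.count(s.name) == 0)
                errors.push_back("use of unassigned variable '" + s.name + "'");
            break;
        case Stmt::Kind::If: {
            // Analyze each branch starting from the state before the if.
            Assigned then_set = assigned, else_set = assigned;
            check(s.then_branch, then_set, errors);
            check(s.else_branch, else_set, errors);
            // Definitely assigned afterwards = assigned on BOTH paths.
            Assigned merged;
            std::set_intersection(then_set.begin(), then_set.end(),
                                  else_set.begin(), else_set.end(),
                                  std::inserter(merged, merged.begin()));
            assigned = std::move(merged);
            break;
        }
        }
    }
}

std::vector<std::string> find_unassigned_uses(const std::vector<Stmt>& program) {
    Assigned assigned;
    std::vector<std::string> errors;
    check(program, assigned, errors);
    return errors;
}
```

With this sketch, a program equivalent to var s: string; print s; produces one error, while a variable assigned in both the if and the else block can be used afterwards without complaint. The sketch assumes C++17 (aggregate initialization of Stmt and a std::vector member of the enclosing type).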


perlun commented Apr 27, 2024

Thanks a lot for your reply and input @munificent! 🙇 Much appreciated.

  1. Pattern matching or destructuring assignment or some other way to initialize multiple variables at once like:
var (a, b) = someComplexCode();

This is indeed quite useful. 👍 In the C# codebase I've started working on since switching to a new job a month or so ago, the tuple-based approach is used here and there. One thing that's particularly nice about it is that if you refactor a method to return "more data" than just a single value, changing return foo to return (foo, bar) is really simple, compared to moving something to a ref or an out parameter. Coupled with destructuring like above, it is quite a nice tool to have handy in the toolbox of the language.

Thinking out loud, if I wanted to implement something like this right now I could just add some simple C++ tuple wrapper class which the Perlang (foo, bar) code would get transformed to. And apparently, C++ already has a nice std::tuple class I could use for this, so at least the "return tuple" part would probably be quite trivial. The destructuring feels like it would be not too hard either, but as usual the devil is in the details.
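
As a sketch of what such a lowering might look like, the C++ below returns a std::tuple and destructures it with a C++17 structured binding. The function names are hypothetical stand-ins (some_complex_code mirrors someComplexCode from the example above), and this is only one possible mapping, not how Perlang actually generates code.

```cpp
#include <string>
#include <tuple>

// Hypothetical function standing in for Perlang's someComplexCode().
std::tuple<int, std::string> some_complex_code() {
    return std::make_tuple(42, std::string("answer"));
}

// Roughly what `var (a, b) = someComplexCode();` could be transformed to.
std::string describe() {
    auto [a, b] = some_complex_code(); // C++17 structured binding
    return b + "=" + std::to_string(a);
}
```

Both a and b are initialized at the point of declaration here, so neither can ever be observed in an uninitialized state. On compilers without C++17 support, std::tie with pre-declared variables is the usual fallback, but that brings back the uninitialized-declaration problem.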

  2. Everything is an expression or at least some way to have a block of code as an expression, like: [...]

This is an option too, and I've done a significant amount of Ruby coding in the past, where this pattern is prevalent. I dunno... there's something about the "implicit" part of it that I kind of dislike, I think, or the fact that code like this:

var foo = if bar {
  "baz"
} 
else {
  "zot"
}

...it just doesn't look so pretty, I think... maybe. 🤔
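
For reference, this is roughly how an if-expression like the one above could be expressed in C++, should it ever need to be transpiled. The function names are made up for illustration. The simple case maps onto the conditional operator, and a multi-statement block can be wrapped in an immediately-invoked lambda.

```cpp
#include <string>

// Simple case: the conditional operator is already an expression.
std::string pick(bool bar) {
    const std::string foo = bar ? "baz" : "zot";
    return foo;
}

// Block case: an immediately-invoked lambda yields a value from
// several statements, so foo is still initialized at its declaration.
std::string pick_with_block(bool bar) {
    const std::string foo = [&]() -> std::string {
        if (bar) {
            return "baz";
        }
        return "zot";
    }();
    return foo;
}
```

In both variants foo can be declared const, which is a side benefit of initializing at the declaration instead of assigning inside if/else blocks.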

This is the most usable path for a statement-oriented language that doesn't have the above feature. The name for it is "definite assignment analysis". It's a flavor of control-flow analysis. Doing it isn't rocket science, but it does increase the language complexity by a notch.

Thanks. 👍 I think I'll leave this on the backburner for a while. As the language (and hopefully compiler) settles, I'll hopefully come to some form of conclusion on this.
