EOL character detection #514
I have tried messing around with this:

```diff
diff --git a/py/lexer.c b/py/lexer.c
index 0360537..0706578 100644
--- a/py/lexer.c
+++ b/py/lexer.c
@@ -79,7 +79,16 @@ STATIC bool is_end(mp_lexer_t *lex) {
 }
 STATIC bool is_physical_newline(mp_lexer_t *lex) {
-    return lex->chr0 == '\n' || lex->chr0 == '\r';
+    if (lex->chr0 == '\n' && lex->chr0 != '\r')
+        return true;
+    else if (lex->chr0 == '\r' && lex->chr0 != '\n')
+        return true;
+    else if (lex->chr0 == '\n' && lex->chr0 == '\r')
+        return true;
+    else if (lex->chr0 == '\r' && lex->chr0 == '\n')
+        return true;
+    else
+        return false;
 }
 STATIC bool is_char(mp_lexer_t *lex, char c) {
@@ -149,18 +158,28 @@ STATIC void next_char(mp_lexer_t *lex) {
     int advance = 1;
-    if (lex->chr0 == '\n') {
+    if (lex->chr0 == '\n' && lex->chr1 != '\r') {
         // LF is a new line
+        printf("LF is a new line\n");
         ++lex->line;
         lex->column = 1;
-    } else if (lex->chr0 == '\r') {
+    } else if (lex->chr0 == '\r' && lex->chr1 != '\n') {
         // CR is a new line
+        printf("CR is a new line\n");
         ++lex->line;
         lex->column = 1;
-        if (lex->chr1 == '\n') {
-            // CR LF is a single new line
-            advance = 2;
-        }
+    } else if (lex->chr0 == '\n' && lex->chr1 == '\r') {
+        // LF CR is a single new line
+        printf("LF CR is a single new line\n");
+        ++lex->line;
+        lex->column = 1;
+        advance = 2;
+    } else if (lex->chr0 == '\r' && lex->chr1 == '\n') {
+        // CR LF is a single new line
+        printf("CR LF is a single new line\n");
+        ++lex->line;
+        lex->column = 1;
+        advance = 2;
     } else if (lex->chr0 == '\t') {
         // a tab
         lex->column = (((lex->column - 1 + TAB_SIZE) / TAB_SIZE) * TAB_SIZE) + 1;
@@ -310,6 +329,7 @@ STATIC void mp_lexer_next_token_into(mp_lexer_t *lex, mp_token_t *tok, bool firs
     bool had_physical_newline = false;
     while (!is_end(lex)) {
         if (is_physical_newline(lex)) {
+            printf("Found a physical new line!\n");
             had_physical_newline = true;
             next_char(lex);
         } else if (is_whitespace(lex)) {
```

But I got nowhere so far. Here is my test script:

```sh
printf '\n' | ./micropython
printf '\r' | ./micropython
printf '\n\r\n\r' | ./micropython
printf '\r\n\r\n' | ./micropython
printf 'def setup():\n\tprint("123")\n\nsetup()\n' | ./micropython
printf 'def setup():\r\tprint("123")\r\rsetup()\r' | ./micropython
printf 'def setup():\n\r\tprint("123")\n\r\n\rsetup()\n\r' | ./micropython
printf 'def setup():\r\n\tprint("123")\r\n\r\nsetup()\r\n' | ./micropython
```

I'm mostly unsure why there are two places where we attempt to detect the EOL character...
But what exactly is the subject of this report? You seem to presume that the uPy unix binary (or maybe all ports?) should behave in some particular way, without providing grounds or motivation for that. To me, this report looks about as valid as reporting that "p1r1i1n1t1(111)1" doesn't work and trying to "fix" that.
If you put your example scripts into files, then both kinds of newlines work just fine with uPy. I think the problem you have comes from piping the output of printf into micropython.
@pfalcon, I think that once
@dpgeorge, that's interesting... In my opinion there should be no difference whether it's from a file or piped. Although, I am using
@dpgeorge, now I see why: piping a script into the REPL and reading a script from a file go through different functions. Still, I would call this a bug, although perhaps a very low-priority one... @pfalcon, to your point, I have changed my test script to pipe into
Yes, to get this working you would need to detect (in unix/main.c) if stdin
But do you have a reference confirming this is supported usage? I personally have never seen anybody do that. There's the "-c" option to CPython to pass Python code to execute on the command line, and that's what I always see being used. If you can't provide evidence that such usage should behave in a particular way, then I'd argue the following account is valid: calling the python executable without "-c" or a filename starts it in interactive mode, where some input is interpreted specially. For example, you might expect "\t" to cause completion variants to be output. That doesn't happen with GNU readline, but there's no guarantee it won't happen with other readline implementations. In other words, the behavior you're trying to use is undefined, and so it's undefined for line endings too. Feel free to prove me wrong, or at least explain why you think it's worth defining it in a particular way.
@pfalcon, it's a valid point... I suppose my thinking was biased towards how this works with Ruby; there is
One thing that might prove my point is that when you feed a script to CPython through a pipe or redirection, it doesn't print the version info or the
I am not sure about your example with
Below are the key parts of
I am not saying that the Unix port of uPy should be CLI-compatible, but you asked me to prove why we should treat standard input, when it's not a TTY, the same way as a file on the filesystem. If there is some difficulty in doing that, that's a different question. My original point was actually more about what goes on in
Ok, thanks, you're right then.
No, because, as I mentioned, GNU readline is smart enough to call isatty() itself and not interpret chars from a non-tty as commands. But there are a bunch of readline replacements (https://github.com/antirez/linenoise , http://www.s11n.net/editline/) which may or may not do that. (This is not just an abstract point: GNU readline is GPL, so linking against it makes the binary GPL under a strict reading of the GPL.) Anyway, as you pointed out yourself, fixing it may be a bit tricky, and hard to do in a way that satisfies everyone. For example, my concern is that it will make uPy bigger/more complicated. One "obvious" way to "fix" it is to just skip "\r", but then one sweet day someone will come asking "why did you eat the CR inside my string literals?", etc. Well, now that it's proven to be something worth fixing, I hope you'll be able to find a good way to do that.
They are. I don't see anything wrong, or any duplicated code, in how newlines are handled in py/lexer.c. Your rewrite of is_physical_newline above is equivalent to the original (check the logic carefully). is_physical_newline just checks whether a newline is coming up; it doesn't try to parse the newline. A newline is coming up if either
The only thing that needs improving is unix/main.c: detecting whether it's connected to a tty or not.
Ok, so I found a mega simple fix for my

```diff
- mp_parse_node_t pn = mp_parse(lex, MP_PARSE_SINGLE_INPUT, &parse_error_kind);
+ mp_parse_node_t pn = mp_parse(lex, MP_PARSE_FILE_INPUT, &parse_error_kind);
```

This is not for the
The only thing I'm thinking of now is that those constants could be renamed... Maybe
These names follow the names used in the Python 3 grammar specification: https://docs.python.org/3/reference/grammar.html (see the first 3 non-comment lines on that page). I think changing them to something else would just confuse the situation for those who are already familiar with the grammar.
BTW,
Fair enough, I think this can be closed now.
fixes hardware dotstar support for 3.0 and addresses issue micropython#514
Uncomment lines in mpconfigport.h for gemma_m0 to allow dotstar access. Same issue as micropython#514, for trinket_m0.
Looks like `\n\r` is accepted, while `\r\n` isn't. Looks like the `\r\n` is the DOS EOL, while the one we accept is an odd one: