Read File node in a line at a time mode returns an extra, blank line. #3913

Open
jdotbdot opened this issue Oct 10, 2022 · 3 comments

Comments

@jdotbdot

Current Behavior

Given this data file, with each line terminated by \n (on Linux: $ printf "%s\n%s\n" "ONE" "TWO" > testdata):
ONE
TWO

The Read File node in a-message-per-line mode emits 3 messages: "ONE", "TWO" and "".

This is contrary to some other programming languages, at least on Linux. For example, C's getline() retrieves 2 lines, as do interpreted languages such as awk and perl.

If the second newline is removed ($ printf "%s\n%s" "ONE" "TWO" > testdata), Read File now emits 2 messages, "ONE" and "TWO", while the C, awk and perl versions still report 2 lines.
NB: wc -l is different; it simply counts \n characters.

If you pass the 3 messages to a Write File node that adds \n to each line, the resulting file has two newlines at the end.

Expected Behavior

Both versions of the test data file, with and without a final newline character, should result in two messages, in accordance with other programmatic ways of counting lines.
The final, empty-string message is the error.
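To make the expected behaviour concrete, here is a minimal sketch that treats \n as a line terminator rather than a separator. The helper names splitLines and countNewlines are hypothetical and not part of Node-RED; countNewlines mirrors what the report says wc -l does.

```javascript
// Hypothetical helper: treat "\n" as a line TERMINATOR, not a separator.
// A trailing "\n" ends the last line and does not start a new, empty one.
function splitLines(text) {
    const lines = text.split("\n");
    // Drop the empty item that split() leaves after a final "\n"
    // (this also makes an empty file yield zero lines).
    if (lines[lines.length - 1] === "") {
        lines.pop();
    }
    return lines;
}

// Hypothetical helper: count "\n" characters only.
function countNewlines(text) {
    return text.split("\n").length - 1;
}

console.log(splitLines("ONE\nTWO\n")); // [ 'ONE', 'TWO' ]
console.log(splitLines("ONE\nTWO"));   // [ 'ONE', 'TWO' ]
console.log(countNewlines("ONE\nTWO\n"), countNewlines("ONE\nTWO")); // 2 1
```

Under this reading both test files produce two messages, while the newline count differs between them.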

Steps To Reproduce

See discussion on NR forum

Example flow

No response

Environment

  • Node-RED version: 3.0.2
  • Node.js version: 16.17.0
  • npm version: 8.15.0
  • Platform/OS: Raspberry Pi/RPiOS
  • Browser: Firefox
@dceejay
Member

dceejay commented Oct 11, 2022

Node-RED is written in JavaScript (not awk, C or perl), and the way JavaScript splits a file on the \n character will produce an array of three items as described. Likewise, if that array is then joined using \n as the join character, it will correctly produce a string as required.
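The split and join behaviour described here can be checked with plain JavaScript strings. This is only an illustration of String.prototype.split and Array.prototype.join using the test data from the report, not the node's actual code.

```javascript
// The two versions of the test data from the report.
const withFinalNewline = "ONE\nTWO\n";
const withoutFinalNewline = "ONE\nTWO";

// split() treats "\n" as a separator, so a trailing "\n" leaves an empty last item.
const partsA = withFinalNewline.split("\n");
const partsB = withoutFinalNewline.split("\n");
console.log(partsA); // [ 'ONE', 'TWO', '' ]
console.log(partsB); // [ 'ONE', 'TWO' ]

// join() with the same separator restores each original string exactly.
const roundTripA = partsA.join("\n");
const roundTripB = partsB.join("\n");
console.log(roundTripA === withFinalNewline);    // true
console.log(roundTripB === withoutFinalNewline); // true
```

The empty third item is what lets the round trip reproduce the trailing newline.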

When we split the input into lines we add a msg.parts property (as we do with the split node), which contains the index of that line and other info so it can be rejoined later. When reading a file in per-line mode we split any input into lines and send those immediately. If we then get an end() event (file complete), we send whatever is left and also include the total line count in msg.parts. If the final line is just text, that is fine. If the "last" line ends with \n then it was already sent (and the "cursor" is on the next line), so we send an empty "" string plus the final msg.parts.
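A rough sketch of that streaming logic, assuming a simplified msg.parts with only index, count and ch. The function createLineSplitter and its send callback are hypothetical and written for illustration; this is not the Node-RED source.

```javascript
// Hypothetical sketch of per-line streaming as described in the comment above.
// Complete lines are sent as soon as they arrive; whatever is left over
// is sent on end(), together with the total count.
function createLineSplitter(send) {
    let remainder = ""; // text after the last "\n" seen so far
    let index = 0;      // index of the next line to send

    return {
        push(chunk) {
            const pieces = (remainder + chunk).split("\n");
            // The last piece has no terminating "\n" yet, so hold it back.
            remainder = pieces.pop();
            for (const line of pieces) {
                send({ payload: line, parts: { index: index++, ch: "\n" } });
            }
        },
        end() {
            // If the file ended with "\n", remainder is "" and an empty
            // payload goes out carrying the final count.
            send({
                payload: remainder,
                parts: { index: index, count: index + 1, ch: "\n" }
            });
        }
    };
}

const messages = [];
const splitter = createLineSplitter((msg) => messages.push(msg));
splitter.push("ONE\nTW"); // chunk boundaries are arbitrary
splitter.push("O\n");
splitter.end();
console.log(messages.map((m) => m.payload)); // [ 'ONE', 'TWO', '' ]
console.log(messages[2].parts.count);        // 3
```

Because only the end() message knows the total, it is the only one that can carry count, which is why an empty final message appears when the file ends with a newline.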

If this stream is fed into a join node then it will rejoin them correctly and output the "file" correctly. However if sent to a file out node set to append lines and add a new line then it results in a file with an extra linefeed at the end.

So IMHO there are two possible related fixes/enhancements, to the file out node...

  1. Check msg.parts and, if it is the last line, check whether the payload is blank; if so, don't add the final newline. This would fix the issue when in append mode.

  2. An enhancement would be to also change the behaviour of the overwrite file mode of the node. Currently this acts per input msg, so in this case per line. If this mode were to also honour the parts property, it would only start a new file if the index was 0 and would likewise close it when complete.

Note: both of these would rely on msg.parts being present and correct. If it is not present, we would append (or not) the newline as configured by the setting.
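The first option could look something like the sketch below. The function textToAppend and its appendNewline argument are hypothetical stand-ins for the file out node's append logic and its "add newline" setting; the eventual PR may well differ.

```javascript
// Hypothetical sketch of option 1: skip the newline for a blank final part.
function textToAppend(msg, appendNewline) {
    const parts = msg.parts;
    // Only trust msg.parts when it exists and carries a count.
    const isLastPart = parts !== undefined
        && typeof parts.count === "number"
        && parts.index === parts.count - 1;

    if (isLastPart && msg.payload === "") {
        return ""; // blank last line: nothing to write, no extra newline
    }
    // No usable msg.parts, or not a blank last line: follow the setting.
    return appendNewline ? msg.payload + "\n" : msg.payload;
}

// The three messages produced for a file ending in "\n".
const incoming = [
    { payload: "ONE", parts: { index: 0 } },
    { payload: "TWO", parts: { index: 1 } },
    { payload: "", parts: { index: 2, count: 3 } }
];
const written = incoming.map((m) => textToAppend(m, true)).join("");
console.log(JSON.stringify(written)); // "ONE\nTWO\n"
```

A message without msg.parts still gets its newline as configured, which matches the fallback described in the note.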

I believe the first option could be considered a bug fix and while a small change in behaviour I think it is not unreasonable and does better reflect the intended behaviour.

The second is definitely breaking so should be only considered for a major upgrade.
I will create a PR for the first.

@jdotbdot
Author

jdotbdot commented Oct 12, 2022

I just can't make sense of this interpretation.
The way that javascript splits a string is surely not a good reason to redefine the everyday understanding of a line.
But the internet contains nobody else complaining about it, so maybe it's not worth changing. Maybe even I'm wrong!

This text consists of two lines.
Two full stops does not make it 3 lines.
This text also consists of two lines.
Even though the second line has no full stop

@dceejay
Member

dceejay commented Oct 12, 2022

And wc -l on a file with just "one line" returns 0 as the answer, so maybe we need to fix that as well?
We are in JavaScript land here, and yes, we know JavaScript has its weirdness, as do all languages.
