Xed corrupts files by placing a linebreak at the end of file #195

Closed
GitHubIsToxic opened this issue Dec 3, 2017 · 24 comments

@GitHubIsToxic

Xed corrupts every saved file by adding line break characters (CR/LF) at the end of the file. This line break is not displayed within Xed editor. Even when saving a file without changes, these characters are silently added. This is anti-intuitive behavior and causes many problems, especially when editing configuration files, or files that otherwise require strict formatting. It also causes all sorts of issues with files under version control.

I fully understand the history and absurd reasoning of this 'feature', and that Xed has merely inherited this behavior from gedit (which also has this same intentional bug). I really do think it is time to fix the default behavior. If 'cat' and other command line tools have issues parsing files without ending line breaks, then that's an issue to be solved by the respective program. No need to break other applications - and worse; user data - as workaround!

Please note that this issue is not a duplicate of #61, which merely deals with displaying this line break (and the confusion it evidently causes). This ticket addresses the cause of those issues.

Question: Is there also a workaround for Xed? For gedit one could change some gnome-settings-something, somewhere to override this behavior.

FYI: Xed 1.4.6, Mint Cinnamon 18.2 (fresh install)
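For anyone who wants to check what is actually stored on disk, here is a minimal Python sketch that reports how a file ends. The function names are made up for illustration and are not part of Xed or gedit.

```python
def trailing_newline(data: bytes) -> str:
    """Classify the terminator found at the very end of a byte string."""
    if data.endswith(b"\r\n"):
        return "CRLF"
    if data.endswith(b"\n"):
        return "LF"
    if data.endswith(b"\r"):
        return "CR"
    return "none"


def inspect_file(path: str) -> str:
    """Read a file in binary mode so no newline translation hides the bytes."""
    with open(path, "rb") as handle:
        return trailing_newline(handle.read())
```

Running `inspect_file` on a file before and after saving it in the editor should show whether a terminator was added, and which kind.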

@dodona2

dodona2 commented Dec 4, 2017

I just figured out that 'cat' (cat (GNU coreutils) 8.27) doesn't have issues parsing files without ending line breaks.

@dodona2

dodona2 commented Dec 6, 2017

In the zealots' good ol' Emacs it's a configurable option (guess we should do the same):

Require Final Newline
Controls whether a final newline is ensured when the file is saved.
The value is an association list that for each language mode specifies
the value to give to ‘require-final-newline’ at mode initialization;
see that variable for details about the value. If a language isn’t
present on the association list, CC Mode won’t touch
‘require-final-newline’ in buffers for that language.

@haarp

haarp commented Dec 13, 2017

It's not a bug. Xed is a text editor. Text requires there to be a trailing newline at the end of the file, or many parsers will fail. Most text editors are happy to oblige and ensure there's a newline at the end.

If you want to edit binaries, a hex editor would be better suited.

@GitHubIsToxic
Author

GitHubIsToxic commented Dec 13, 2017

@haarp: I strongly disagree.

All line breaks, other than the one forcibly and invisibly added at the end, are editable in gedit/xed. And certainly not 'most' text editors add this extra line break. None of the text editors I've used on Windows do this (Notepad, Notepad++, PNotepad), and not all editors on Linux do it either; in fact, I used some of them to remove these line breaks. None of the major IDEs do this either (MS Visual Studio, NetBeans, etc.). It is not 'normal' behavior at all.

It is ridiculous to suggest one has to resort to hex editing to remove this line-break, every time one saves a text file. There aren't even any good hex editors for Linux (wxHexEditor is reasonable though, I personally use HexWorkshop running under Wine).

If an 'expert' user is so inclined to require the ending line-break in his files, he can trivially add this himself: it is just one freaking keystroke!! But as long as the line-break is forced on the user, one has to resort to far more time consuming efforts to remove it (hex editing, installing a second text editor).
The solution IMO is plain as day: disable this default behavior, perhaps add a preferences option, or simply leave it to user to hit the 'enter key' to manually add the line-break.

I've written a few text editors and complex text parsers myself, and it is absolutely trivial to handle EOF cases. I would not trust any tools that struggle with this. And as dodona2 commented, it appears the issue with 'cat' has been fixed by now anyway, so this lame workaround is no longer needed.

This issue should still be regarded as a bug; it is undocumented behavior, non-intuitive, invisible to the user, non-optional, and can unnoticeably corrupt user data. You just have to google for the tons of issues that users have due to this 'feature'; most of them have no idea their files even have ending line-breaks, because it is intentionally hidden from them. It personally took me weeks to find out that I could not trust gedit to display all characters stored in the file, I was looking for bugs in my own and other software that didn't exist.

By the way, here's the workaround for gedit: https://stackoverflow.com/questions/3056740/gedit-adds-line-at-end-of-file

@haarp

haarp commented Dec 13, 2017

There is a reason for the trailing newline. It's literally the POSIX standard.

http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap03.html#tag_03_206

So it should come as no surprise that many parsers have problems when the trailing newline is missing. A line not ending in a newline is not considered a line at all!
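To make the distinction concrete, here is a small Python sketch of that reading of the definition: only bytes followed by a newline count as a line, and anything after the last newline is an incomplete remainder. The function name is hypothetical; this is an illustration, not code from any parser.

```python
def split_posix_lines(data: bytes):
    """Return (complete_lines, incomplete_tail).

    A complete line is a run of bytes terminated by LF. Whatever follows
    the last LF is returned separately; it is empty when the data ends
    with LF (or is empty altogether).
    """
    parts = data.split(b"\n")
    tail = parts.pop()  # bytes after the final LF, possibly empty
    return parts, tail
```

With this reading, `b"a\nb\n"` holds two lines and no remainder, while `b"a\nb"` holds one line plus the remainder `b"b"`. Tools that count newline characters, as `wc -l` does, report the same line counts.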

It is also part of the ANSI C 1989 standard (section 2.1.1.2) and ISO C 1999 (section 5.1.1.2).

I wasn't suggesting using a hex editor to remove the newline. I was suggesting to use them on binary files, those being the only files where a trailing newline is undesirable.

The matter is not as simple as "one freaking keystroke". You will tend to forget it, often leaving broken text files in your wake. The mcedit text editor for example has the option of warning the user when he saves a file without a trailing newline.

I am not opposed to making it a setting, but it should not be the default.

@GitHubIsToxic
Author

GitHubIsToxic commented Dec 13, 2017

Ok, that sounds fair, and probably explains why this behavior is expected by some tools.

Binary files definitely aren't the only files in which trailing line breaks aren't needed. In our cross-platform project we have thousands of plain text files without them (typically originating from Windows users), and editing by some Linux (Xed/gedit) users adds the newline, causing confusion with version control.

I concur that it is easy to forget adding the newline, so a warning would be useful. Making the feature optional is best, IMO. I'm willing to live with the current behavior as the default as a compromise, but I do think the newline should always be visible in the editor view, as a reminder to the user (perhaps even rendered with a visible glyph).
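For a mixed-platform repository like the one described, a quick audit can show which files currently lack a final newline before anyone's editor changes them. This is a rough Python sketch; the function name and the default `*.txt` pattern are assumptions chosen for illustration.

```python
from pathlib import Path


def files_missing_final_newline(root, patterns=("*.txt",)):
    """List non-empty files under root whose last byte is not LF."""
    missing = []
    for pattern in patterns:
        for path in sorted(Path(root).rglob(pattern)):
            if not path.is_file():
                continue
            data = path.read_bytes()
            # Empty files are skipped: they have no last line to terminate.
            if data and not data.endswith(b"\n"):
                missing.append(path)
    return missing
```

The resulting list could be used either to normalize the files in one deliberate commit or to decide which files should only be edited with the trailing-newline behavior turned off.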

@haarp

haarp commented Dec 13, 2017

but I do think the new line should always be visible in the editor view, as a reminder to the user (perhaps even rendered with a visible glyph)

That would in fact be incorrect behavior. The final newline is the terminator of the last line, not an actual line break. Granted, many tools get confused by this distinction, and the problem isn't new either. See this comment from 2005 for example:

https://slashdot.org/comments.pl?sid=165492&cid=13808398

@GitHubIsToxic
Author

GitHubIsToxic commented Dec 13, 2017

Then explain the choice of the CR+LF characters? LF means line-feed: on old typewriters to slide the paper up, literally displaying a new line. Carriage return meant sliding the carriage back, with the caret pointing at the start of that fresh new empty line; ready for you to begin typing. But I guess this begins to expose the inadequacy of the choices of the POSIX standard a bit now...

There's absolutely no distinction between a random line break anywhere in the file and the last one forced on us. Therefore, the editor should show one empty line at the bottom of the text, same as the others, but one that cannot be deleted. I have used some editors on Windows in the past that did this (I think MSVC5 or VB4/5). It felt 'wrong' as UI behavior (especially because the newline wasn't written to the file), but it could actually be a reasonable, intuitive solution for this case.

Even better, if the forced line break is displayed visibly (with this symbol: ↵), it is absolutely clear to the user whether the behavior is enabled, and why there's a trailing line break added to the file. If it is disabled, the editor could display a warning in its place.

@haarp

haarp commented Dec 13, 2017

I can already hear the lament of users as they try to figure out why there's a symbol they can't delete :P

@GitHubIsToxic
Author

GitHubIsToxic commented Dec 13, 2017

Meh. Just display a tooltip with explanation when you hover the mouse pointer over the ↵ symbol.
I would suggest a similar symbol, with a cross through it when the behavior is disabled, with a tooltip explaining how to enable/disable it via Preferences. The symbols could be rendered the same way whitespace characters are visualized in some editors (gray-ish typically, non-selectable of course).

@GitHubIsToxic
Author

Just a heads-up: this major bug is still present in Xed 1.8.3.

@asarualim

xed 1.8.3 still does not show the last newline of a text file. I assume that POSIX does not require hiding a character from display, and hope that this is resolved soon. I got bitten writing a PHP script which had an extra newline not shown in xed: my local default PHP configuration was tolerant, but the production server had a different configuration. This is obscure as hell, and I will use another editor until this bug is resolved. It would be best to just save the file as is. People know the ENTER key and what it does.

@GitHubIsToxic
Author

GitHubIsToxic commented Feb 3, 2019

+1 for more evidence

I've also run into PHP issues several times; this bug can cause the well-known "headers already sent" PHP error because of this invisible whitespace character, e.g. in a 'config.php' file after the '?>' tag, even if you save it without making changes. I'm sure I'm not the only one who has taken down an entire website for hours due to simply saving a file in gedit/xed.

@InfoLibre

InfoLibre commented Nov 12, 2019

Geany correctly shows the end of the file, unlike Pluma/xed: #61 (comment)
In Geany, there's an option to add or not add a line break at the end of the file when saving it.

@eli-schwartz
Contributor

geany incorrectly shows a double end-of-line character, and if you try deleting the so-called empty line at the end, it still saves the file with its (correct) single newline terminator for the last line of text.

Then when you re-open the file in geany, it still has that mysterious "final empty line" which you're positive you just deleted.

This is unintuitive, broken behavior which violates the POSIX specification (and a few others) for what the actual definition of a file, or a line within a file, is. It goes contrary to how proper text editors work, and I'm having a hard time understanding what the rationale for imitating geany's lies might be. Can you please clarify what the actual use case is for this?

@InfoLibre

InfoLibre commented Nov 25, 2019

When you edit a document with LF or CRLF at the end of each line:

When you add a new line in the middle of a CRLF file with mcedit, the CR is missing at the end of that line, and mcedit saves the file like this:
image
https://midnight-commander.org/ticket/1652

When you edit a document with a CR at the end of each line:

  • Only mcedit shows the lines correctly:
    image
  • Pluma/xed incorrectly adds LF but doesn't save them.
  • Geany adds LF but doesn't save them.

So none of these editors work the same way. They all have flaws. But I prefer Geany because it shows symbols for the CR and LF characters at the end of lines, and that matches what is saved. In Geany, the cursor is positioned where new typing will go, so we know what we're doing. That's not the case with Pluma/xed. With mcedit, it's impossible to add a CR.

@eli-schwartz
Contributor

In Geany and mcedit, the cursor is at the right place. In Pluma/xed, it is at the wrong place. See #61 (comment) .

I have stated that xed is doing this correctly, since there is no such thing as an empty line at the end of that file. Saying that no, really, it is incorrect, may help you feel vindicated in your belief, but it doesn't actually answer my question, "what is the use case for wanting the text editor to lie"?

When you add a new line in the middle of a CRLF file with mcedit, the CR is missing at the end of that line, and mcedit saves the file like this:

This indicates that mcedit has opened a file formatted in the "DOS" line endings format, using the unix open mode, and is incorrectly injecting Unix line endings into a DOS-formatted file. If I were to open the file in, say, vim, it would not display confusing ^M characters that don't mean anything -- it would continue to display simply a series of lines of readable text (possibly prose sentences), but the footer, right next to the name of the file, would display the status message: [dos] indicating that the file is formatted using a line ending mode which is not native to my operating system.
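A rough Python sketch of the kind of detection an editor could do before inserting any line endings of its own. The labels "dos", "unix" and "mac" follow vim's `fileformat` names; the function names are hypothetical and this is not how mcedit or vim actually implement it.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, bare LF and bare CR in a byte string."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # LF not preceded by CR
    cr = data.count(b"\r") - crlf  # CR not followed by LF
    return {"CRLF": crlf, "LF": lf, "CR": cr}


def detect_format(data: bytes) -> str:
    """Return 'dos', 'unix', 'mac', 'mixed', or 'none'."""
    counts = count_line_endings(data)
    kinds = [name for name, n in counts.items() if n]
    if not kinds:
        return "none"
    if len(kinds) > 1:
        return "mixed"
    return {"CRLF": "dos", "LF": "unix", "CR": "mac"}[kinds[0]]
```

A file reported as "mixed" after an edit is the symptom described in the mcedit ticket: a Unix line ending injected into a DOS-formatted file.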

Only mcedit shows the lines correctly
Geany adds LF but doesn't save them.

None of that is what I'd truly call "correct". If you want your text editor to display symbols, then the non-printable (formatting-only!) ASCII bytes 0x0D 0x0A must somehow be printed to the screen anyway. In which case, my expectation is that the formatting characteristic of "beginning a new line" shall not be rendered -- show me an unbroken stream of rendered bytes.

In Geany, the cursor is positioned where new typing will go, so we know what we're doing. That's not the case with Pluma/xed. With mcedit, it's impossible to add a CR.

In xed or pluma or vim or Notepad++ or any other text editor, the cursor is positioned where new typing will go. By default that is the beginning of the first line of the file, but if I move the cursor to the end of the last line, I can append to the end of the last line. If I move the cursor to the end of the last line, then type the ENTER key, future typing will be appended to a new line (which did not exist in the file when I opened the file).

I still do not understand why any of this is a problem. But then, neither do I understand why you are anxious to add ASCII byte 0x0D (also known as the C escape sequence \r or the control character "carriage return") to a file.

  • xed is not a hex editor, you're not supposed to use it to type ASCII control codes. Is there a use case for doing so either when editing prose, or when editing source code?
  • What does the presence or absence of non-printable control characters denoted by their caret notation, have to do with whether or not xed should treat prose files ending in an LF-terminated, non-empty line, as though they had an additional LF-terminated, empty line?
  • Under what circumstances should a prose file refrain from adding the expected end-of-line LF character to files which don't end with one?
  • Under what circumstances should a source code file refrain from adding the end-of-line LF character to files which don't end with one, keeping in mind that some source code file formats consider its absence to be a non-recoverable error, and I cannot think of any case where the end-of-line LF character is explicitly considered to be incorrect?

@GitHubIsToxic
Author

GitHubIsToxic commented Dec 6, 2019

I have stated that xed is doing this correctly, since there is no such thing as an empty line at the end of that file.

Sorry, but that is factually incorrect:

CR = Carriage Return means effectively 'end of line', or more correctly, return to beginning of the (same!) line.
See also: https://en.wikipedia.org/wiki/Carriage_return

LF = Line Feed, literally means 'end of current line AND start of new line'. If no characters follow, it is by definition an empty line. Xed does not show this, but does write it to the file when saved. It even adds it to unchanged files when saved.
See also: https://en.wikipedia.org/wiki/Newline

Saying that no, really, it is incorrect, may help you feel vindicated in your belief [...]

Please don't do that sort of thing here.

I cannot think of any case where the end-of-line LF character is explicitly considered to be incorrect?

  • PHP: https://stackoverflow.com/search?q=php+header+already+sent+whitespace
  • and NO, you are not going to write PHP code with a hex editor
  • configuration files that may NOT end with linebreaks
  • single line configuration files that may not contain any linebreaks (e.g. password/hash/key)
  • version control (adding unnecessary changes to files)
  • digital archiving (not secretly changing unchanged files)

Under what circumstances should a prose file refrain from adding the expected end-of-line LF character to files which don't end with one?

When the user is in control of his/her computer. Xed takes control away from the user by adding unwanted hidden linebreaks, and 'lies' about it, by hiding it in the application.
I've never seen a keyboard without an Enter key (most keyboards have two), and have never met a computer user who does not know how to use these keys.
I mention this, because Unix and Linux is all about the user having total control.

some source code file formats consider its absence to be a non-recoverable error

Then that is a problem of that particular piece of software. Let's not add bugs to text editors to work around that problem. And as another user already commented in this thread, those issues were fixed decades ago.

If you want your text editor to display symbols, then the non-printable (formatting-only!) ASCII bytes 0x0D 0x0A must somehow be printed to the screen anyway. In which case, my expectation is that the formatting characteristic of "beginning a new line" shall not be rendered -- show me an unbroken stream of rendered bytes.

Most word processors or code editors have a feature to show hidden characters (including linebreaks): https://i.stack.imgur.com/tPKls.png

Can we please conclude that people have different preferences and opinions on this, and that Xed needs, at least, a configurable option to disable this behavior?

@eli-schwartz
Contributor

PHP: https://stackoverflow.com/search?q=php+header+already+sent+whitespace

Right, if you have a single LF end-of-line terminator followed by an additional blank line terminated by a second end-of-line terminator, that might be a problem.

xed doesn't do that, though? xed is compliant with the following policy:

PHP coding standards mandate:

All PHP files MUST use the Unix LF (linefeed) line ending only.
All PHP files MUST end with a non-blank line, terminated with a single LF.

As long as you stick strictly to end-of-line terminators, without adding additional byte sequences consisting of a blank line, terminated by an LF, php won't try to print blank lines as part of your application before you're done printing the headers.
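A simplified Python sketch of that rule, for illustration. It assumes the last `?>` in the file really is the closing tag of the final PHP block (it does not parse strings or comments), and it treats CRLF and a lone CR as a newline as well, which is my understanding of the PHP lexer rather than something stated in this thread. The function name is hypothetical.

```python
def output_after_closing_tag(source: bytes) -> bytes:
    """Return the bytes PHP would emit as literal output after the last '?>'.

    A single newline directly after the closing tag is consumed together
    with the tag; anything beyond that is sent to the client as output.
    """
    marker = b"?>"
    index = source.rfind(marker)
    if index == -1:
        return b""  # no closing tag, so nothing follows the PHP block
    rest = source[index + len(marker):]
    for newline in (b"\r\n", b"\n", b"\r"):
        if rest.startswith(newline):
            return rest[len(newline):]
    return rest
```

Under these assumptions, a file ending in `?>` plus one LF yields no stray output, while `?>` followed by two LFs, or by a space and an LF, does.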

configuration files that may NOT end with linebreaks

WHAT. Examples???

single line configuration files that may not contain any linebreaks (e.g. password/hash/key)

I'm pretty sure every such file I've ever come across has either used linebreaks or not cared about them. I'm also pretty sure I've never seen any password system ever, which permits an LF as a character of the password itself, if only due to the fact that the Enter key is used to submit the password, not add more bytes before submission.

Hashes are not permitted to contain newlines, since they are hex-encoded strings and thus the pool of permitted characters when representing a hash is [a-fA-F0-9]. base64 may contain [0-9a-zA-Z+/] and will ignore any LF characters, silently joining the lines before decoding.
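As one concrete data point, Python's standard library behaves this way: the default base64 decoder discards characters outside the alphabet, including LF, and a hex digest can be read safely by stripping surrounding whitespace first. This is only a sketch of how a tolerant consumer might read such files; other tools may be stricter. The function names are made up for the example.

```python
import base64
import binascii


def decode_base64_lenient(text: bytes) -> bytes:
    """Decode base64, ignoring LF and other non-alphabet characters.

    This relies on b64decode's default validate=False behavior.
    """
    return base64.b64decode(text)


def parse_hex_digest(text: str) -> bytes:
    """Parse a hex-encoded digest, tolerating a trailing newline."""
    return binascii.unhexlify(text.strip())
```

A consumer written like this gives the same result whether or not the editor appended a final LF.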

A keyfile is usually treated as a binary file, for example you can use a png as a keyfile. I don't recommend editing binary files in xed for any reason.

version control (adding unnecessary changes to files)

Consistently using terminating LF in all your files ensures that version control correctly accounts for your files. git will produce messages indicating unnecessary changes, via the message "No newline at end of file", indicating the missing LF is bad.

digital archiving (not secretly changing unchanged files)

Do not change digitally archived files for any reason. Open them in readonly mode, and do not click save.

Then that is a problem of that particular piece of software. Let's not add bugs to text editors to work around that problem. And as another user already commented in this thread, those issues were fixed decades ago.

The nonrecoverable error is "this file does not match the POSIX definition of a line of text", but most C compilers today do in fact include clumsy workarounds that complicate the parser, as a hack job in order to work around bugs in Windows text editors.

It is a generally problematic issue for any file format or workflow in which:

  • one file can be included into another file
  • multiple files can be concatenated together

and

  • the user has an expectation that his files will be faithfully represented as stored
  • line breaks change the meaning of the parser (anything other than C languages, and C languages also when the context is a preprocessor macro)

You may also try using cat on a file without a terminating LF on the last non-blank line.
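The effect is easy to reproduce without touching real files. A minimal Python sketch, where `concatenate` is a stand-in that joins bytes end to end the way `cat a b` does:

```python
def concatenate(*chunks: bytes) -> bytes:
    """Join byte strings end to end, with nothing inserted between them."""
    return b"".join(chunks)


def count_complete_lines(data: bytes) -> int:
    """Count LF-terminated lines, the way `wc -l` counts them."""
    return data.count(b"\n")
```

Joining `b"alpha\n"` and `b"beta\n"` gives two lines. Joining `b"alpha"` and `b"beta\n"` gives the single line `alphabeta`, because the last line of the first file was never terminated.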

Most word processors or code editors have a feature to show hidden characters (including linebreaks): https://i.stack.imgur.com/tPKls.png

Then consider showing your support for #225

Can we please conclude that people have different preferences and opinions on this, and that Xed needs, at least, a configurable option to disable this behavior?

Set the gsettings configuration key ensure-trailing-newline to false. Your file will be objectively wrong, and "make this the default" has already been rejected, but the setting does exist.
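A sketch of how that could be scripted from Python with the `gsettings` command-line tool. The schema id below is an assumption on my part and is not confirmed anywhere in this thread; check it against the output of `gsettings list-schemas` on your system before relying on it. Only the key name `ensure-trailing-newline` comes from the discussion.

```python
import subprocess

# ASSUMPTION: schema id for Xed's editor preferences; verify locally.
XED_EDITOR_SCHEMA = "org.x.editor.preferences.editor"
KEY = "ensure-trailing-newline"


def gsettings_set_command(enabled: bool, schema: str = XED_EDITOR_SCHEMA) -> list:
    """Build the argv for `gsettings set SCHEMA KEY VALUE`."""
    return ["gsettings", "set", schema, KEY, "true" if enabled else "false"]


def apply_setting(enabled: bool) -> None:
    """Run the command; raises CalledProcessError if gsettings rejects it."""
    subprocess.run(gsettings_set_command(enabled), check=True)
```

Calling `apply_setting(False)` would turn the behavior off for the current user, provided the schema id is right; dconf-editor reaches the same key through a GUI.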

It will certainly not be made mandatory...

@collinss
Member

collinss commented Dec 6, 2019

Xed's behavior seems to be consistent with other editors, though there does seem to be quite a bit of variation. After trying a number of different editors, they all defaulted to adding a new line, though some did have the option to disable it. Many of them don't show the extra line either, and some of them are inconsistent about it (Kate, for example, shows the extra line at first, allows you to delete it, and then adds the newline on save but doesn't show it until you reopen the file).

I think it's hard to argue that xed behaves 'incorrectly' since it doesn't do anything that isn't common among other editors. However, I do agree that there should be an option if possible so I tagged this as a feature request rather than a bug.

Please note that I'm not sure this is even something we control currently, and may be handled by the upstream libraries we rely on (looking through the code, I couldn't find any instance of where we add newline characters so my first guess is that it's added by some code in GtkSourceView or even Gtk itself). That doesn't mean we can't change the behavior, but it may pose additional challenges in implementing this option.

@eli-schwartz
Contributor

@collinss

<key name="ensure-trailing-newline" type="b">

Which maps to
XED_SETTINGS_ENSURE_TRAILING_NEWLINE,

It seems plausible to expose this as a preferences checkbox rather than require dconf-editor. gedit doesn't seem to have a preferences checkbox, incidentally.

The checkbox would then be rather like using :set noeol in vim.
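For readers unfamiliar with the option, the behavior such a checkbox would toggle amounts to something like the following Python sketch. This is a hypothetical illustration of the semantics, not Xed's or GtkSourceView's actual save code, and leaving an empty buffer empty is my own assumption.

```python
def text_to_write(buffer_text: str, ensure_trailing_newline: bool,
                  newline: str = "\n") -> str:
    """Return the text that would be written to disk on save.

    With the option on, a terminator is appended only when the buffer is
    non-empty and does not already end with one. With the option off, the
    buffer is written exactly as it is.
    """
    if (ensure_trailing_newline and buffer_text
            and not buffer_text.endswith(("\n", "\r"))):
        return buffer_text + newline
    return buffer_text
```

With the option off, saving an unchanged file is a byte-for-byte no-op, which is what the original report asked for.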

@collinss
Member

collinss commented Dec 6, 2019

@eli-schwartz ah, thanks! I guess I was looking in the wrong place... :) If we already have a key for it, then it should be a simple matter to implement it. It won't make it into a release for 6 months or so, but there's always dconf-editor for now. I wonder why it's not already in settings though...

@eli-schwartz
Contributor

I guess it is probably inherited design from gedit?

@GitHubIsToxic
Author

Many thanks!
