Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stuck in an unsaved changes loop #3798

Closed
afbora opened this issue Oct 12, 2021 · 17 comments · Fixed by #3887 or #3899
Closed

Stuck in an unsaved changes loop #3798

afbora opened this issue Oct 12, 2021 · 17 comments · Fixed by #3887 or #3899
Labels
type: bug 🐛 Is a bug; fixes a bug type: regression 🚨 Is a regression between versions
Milestone

Comments

@afbora
Copy link
Member

afbora commented Oct 12, 2021

Related #3758 PR and still continue unicode characters issue.

You can use following content to reproduce the issue:

[{"attrs":{"width":"full-width","mbottom":"true","bgcolor":"colorLightGray"},"columns":[{"blocks":[{"content":{"slides":[{"title":"\u0160ildymo sistemas pri\u017ei\u016brin\u010di\u0173 profesional\u0173 ir savo \u0161ildymo sistema besir\u016bpinan\u010di\u0173 vartotoj\u0173 pam\u0117gti cheminiai priedai","bg":["slide-1.png"],"text":"<ul><li><p>Skland\u017eiam ir ekonomi\u0161kam sistemos veikimui<\/p><\/li><li><p>Efektyviam \u0161ildymo problem\u0173 sprendimui<\/p><\/li><\/ul>","button":{"type":"page","value":"contacts","text":"Daugiau informacijos"}},{"title":"Priedai sistem\u0173 apsaugai nuo korozijos, purvo, nuovir\u0173","bg":["slide-2.png"],"text":"<ul><li><p>Tausoja katil\u0105, \u0161ilumos siurbl\u012f, kitus \u0161ildymo prietaisus ir \u012frang\u0105.<\/p><\/li><li><p>I\u0161lieka veiksmingi vis\u0105 sistemos eksploatavimo laik\u0105.<\/p><\/li><\/ul>","button":{"type":"page","value":"contacts","text":"Daugiau informacijos"}},{"title":"Magnetiniai \u2013 cikloniniai \u0161ildymo sistemos filtrai","bg":["slide-3.png"],"text":"<ul><li><p>Padeda pasi\u0161alinti \u012fvairiems ter\u0161alams ir apsaugo \u0161ildymo prietaisus nuo pirmalaikio gedimo.<\/p><\/li><li><p>Modeliai kompakti\u0161kai ir erdviai katilinei.<\/p><\/li><li><p>10 \u2013 25 met\u0173 garantija.<\/p><\/li><\/ul>","button":{"type":"page","value":"products","text":"Daugiau informacijos"}},{"title":"\u0160ildymo ir \u0161aldymo sistem\u0173 plovikliai","bg":["slide-4.png"],"text":"<ul><li><p>I\u0161tirpdo \u012fvairias nuos\u0117das.<\/p><\/li><li><p>R\u016bg\u0161tiniai ir ner\u016bg\u0161tiniai.<\/p><\/li><li><p>Tinkami visiems praplovimo \u012frenginiams.<\/p><\/li><\/ul>","button":{"type":"page","text":"Daugiau informacijos","value":"products\/plovikliai"}},{"title":"\u0160ildymo sistemos prot\u0117ki\u0173 hermetizavimo priemon\u0117s","bg":["slide-5.png"],"text":"<ul><li><p>Efektyvi pagalba atsiradus mikro prot\u0117kiams.<\/p><\/li><li><p>Suveikia per kelias valandas, nereikia i\u0161leisti sistemos.<\/p><\/li><\/ul>","button":{"type":"page","value":"products\/protekiu-hermetikai","text":"Per\u017ei\u016br\u0117ti produktus"}}],"height":"h-medium","align":"align-left","valign":"valign-middle","paddingleft":null,"paddingright":52,"overlaytoggle":"false","overlaytop":"#FFFFFF","overlaybottom":"#FFFFFF","overlaymobiletoggle":"true","overlaymobiletop":"#FFFFFF82","overlaymobilebottom":"#FFFFFF82","bgmobileposition":null,"titlestyle":"hstyle2","titlecolor":"#0F4C87","tagtitle":"h2","textstyle":"tstyle2","textcolor":"#0F4C87","fontsizea":"","fontweighta":""},"id":"38e799f7-99b2-41e9-8f9c-a046b8f807f6","isHidden":false,"type":"slides"}],"id":"3f93abad-add6-4267-bc58-268f7ee5f18b","width":"1\/1"}],"id":"c35e77bb-e370-4c26-b104-8be6c1184d04"},{"attrs":{"width":"container","mbottom":"false","bgcolor":"colorLightGray"},"columns":[{"blocks":[{"content":{"level":"h2","text":"FERNOX \u2014\u00a0jau 57 metus \u2014\u00a0patikimi, profesional\u016bs, aplink\u0105 tausojantys sprendimai j\u016bs\u0173 \u0161ildymo sistemai","halign":"","styleheading":"hstyle2"},"id":"7733aede-1e08-444e-93d7-0b42b0a04924","isHidden":false,"type":"heading"},{"content":{"features":[{"icon":["network.svg"],"title":"Fernox pasaulyje","text":"<p>Fernox - pasaulyje gerai \u017einomas vandens prie\u017ei\u016bros chemikal\u0173 ir \u00a0\u012frangos gamintojas. \u012emon\u0117 prad\u0117jo veikl\u0105 1964 metais Jungtin\u0117je Karalyst\u0117je, o \u0161iandien Fernox produktais prekiaujama daugiau nei 40-yje \u0161ali\u0173 Europoje, Amerikoje ir Azijoje.<\/p>"},{"icon":["science-molecule-strucutre.svg"],"title":"Produkt\u0173 panaudojimas","text":"<p>Fernox produktai skirti tiek priva\u010di\u0173, tiek pramonini\u0173 \u0161ildymo ir \u0161aldymo sistem\u0173 vandens paruo\u0161imui, vamzdyn\u0173 ir \u012frengini\u0173 praplovimui ir nukalkinimui, prot\u0117ki\u0173 hermetizavimui, apsaugai nuo u\u017e\u0161alimo. Platus magnetini\u0173\/ciklonini\u0173 \u0161ildymo sistemos filtr\u0173 pasirinkimas. Sukurta atskira produkt\u0173 grup\u0117 atsinaujinan\u010di\u0173 i\u0161tekli\u0173 \u0161ildymo sistem\u0173 prie\u017ei\u016brai.<\/p>"},{"icon":["file-iso.svg"],"title":"Kokyb\u0117","text":"<p>\u012emon\u0117 nuolat investuoja \u012f\u00a0mokslinius tyrimus ir ie\u0161ko novatori\u0161k\u0173 bei\u00a0aplink\u0105 tausojan\u010di\u0173 sprendim\u0173. Gymybos procese \u012fdiegti ISO 9001, ISO 14001, ISO\/TS 16949 ir OHSAS 18001 standartai. Produktai sertifikuoti \u012fvairiose nepriklausomose ES laboratorijose.<\/p>"}],"cols":4},"id":"1a1daaff-8bf8-4d04-9c13-6210f20843ba","isHidden":false,"type":"features"}],"id":"3a59ae10-4149-48a9-bd38-16bbfd0179b2","width":"1\/1"}],"id":"12fbd5d0-4d90-48f6-8ce4-c221e29e2f79"},{"attrs":{"width":"container","mbottom":"false","bgcolor":"colorLightGray"},"columns":[{"blocks":[{"content":{"level":"h2","text":"Norite prad\u0117ti naudotis m\u016bs\u0173 produktais?","halign":"left","styleheading":""},"id":"8134a98d-16d6-4ce5-ac11-3f42f901c194","isHidden":false,"type":"heading"}],"id":"a47bdbd3-2ab3-4793-9e11-4a2eb60e08ef","width":"1\/1"}],"id":"05e639e9-3dde-4387-a197-128c0578e341"},{"attrs":{"width":"container","mbottom":"false","bgcolor":"colorLightGray"},"columns":[{"blocks":[{"content":{"level":"h3","text":"Skambinkite:\u00a0+370 685 06 811","halign":"left","styleheading":""},"id":"fe43f682-998e-4359-a9a3-e60a931437a7","isHidden":false,"type":"heading"},{"content":{"text":"<p><strong>Atstovas Lietuvoje UAB \"Sandomus\"<\/strong><\/p><p>\u012emon\u0117s kodas: 302594355<\/p><p>PVM kodas: LT100005964216<\/p><p>Mobilus: +370 685 06 811<\/p><p>El. pa\u0161tas: info@fernox.lt<\/p><p><\/p><p>Adresas: S. \u017dukausko g. 32-7, Vilnius<\/p>"},"id":"d89ce57d-1adf-4f4a-b082-1abebdc99b7d","isHidden":false,"type":"text"}],"id":"183a9a54-8723-4a29-8dec-4fac60f1d96b","width":"1\/2"},{"blocks":[{"content":{"title":"","name":"Vardas \/ \u012emon\u0117","email":"El. pa\u0161tas","phone":"Telefonas","message":"Papildoma informacija","buttontext":"Si\u0173sti"},"id":"2e8c1a2e-51de-4c5b-bcd1-2c50ece48df8","isHidden":false,"type":"contactform"}],"id":"99b084c3-8c9b-4999-a444-f7a46820ffb5","width":"1\/2"}],"id":"90df7d47-2761-4c74-a36e-4fcb0a641f6e"}]

Kirby 3.6.0-beta.3

@afbora afbora added type: bug 🐛 Is a bug; fixes a bug type: regression 🚨 Is a regression between versions labels Oct 12, 2021
@afbora afbora added this to the 3.6.0-beta.4 milestone Oct 12, 2021
@afbora
Copy link
Member Author

afbora commented Oct 12, 2021

I noticed that the Sane class convert spaces in unicode 🤷‍♂️ My brain is burned with Sane/Dom/Parsley classes 🤯

@bastianallgeier bastianallgeier changed the title [3.6.0-beta.3] Stuck in an unsaved changes loop Stuck in an unsaved changes loop Oct 15, 2021
@bastianallgeier
Copy link
Member

@lukasbestle do you have an idea why spaces are converted?

@lukasbestle
Copy link
Member

The conversion most likely happens inside libxml (PHP DOM classes) between parsing and conversion back to a string.

Is the converted result incorrect? Or is it just about the unsaved changes loop?

@distantnative
Copy link
Member

distantnative commented Oct 31, 2021

Sounds to me like our setup (e.g. the Dom class) has issues with encodings - like it receives A in encoding X, processes it and spits out B in encoding Y. This gets passed to the Panel and browser, where automatically it converts somehow to A in X again which is compared to B in Y - it differs, so we have the changes bar. And starting from top endlessly.

I don't know the Dom class well, but I see some transformations and assumptions about very specific encodings:
https://github.com/getkirby/kirby/blob/develop/src/Toolkit/Dom.php#L73-L75

E.g. what irritates me there is that the comment says ISO-8859-1 is need, but the conversion is actually to HTML-ENTITIES.

Maybe the cause is located somewhere around there?

@lukasbestle
Copy link
Member

lukasbestle commented Oct 31, 2021

E.g. what irritates me there is that the comment says ISO-8859-1 is need, but the conversion is actually to HTML-ENTITIES.

libxml parses the HTML as ISO-8859-1 unless there is a <meta http-equiv="Content-Type" content="text/html; charset=utf-8"> tag in the input HTML (the new shorter HTML5 variant of this is not supported). So unless we inject such a meta tag and remove it manually again, we need to input code in ISO-8859-1 to the loadHTML() method. And that is done by converting everything to entities. As the entity names only use ASCII characters, the entities are valid in ISO-8859-1 and therefore parsed correctly. If we converted directly to ISO-8859-1 instead of to HTML-ENTITIES, every UTF-8 char without an equivalent would be converted to a question mark, so content would get lost.

The "inject a meta tag" route would probably be the "better" solution, but also be a hack and I'm not sure if it will cause other issues, e.g. if there's already a meta tag in the input. So I decided for entity escaping as it's also suggested in the comments in the PHP docs.

@afbora I don't fully understand which part of your test case causes which issue. Could you convert your test case into a test that directly uses the Dom class? I.e. an HTML string that is as short as possible but exhibits the wrong behavior when parsed and printed by the Dom class.

@nilshoerrmann
Copy link
Contributor

If you want to load UTF-8 content, this has worked for us pretty well over the years:

$document->loadHTML('<?xml encoding="UTF-8">' . $html);

This is how we use it:
https://github.com/hananils/kirby-tree-methods/blob/09d25aae7fadb403984c02282349b387299b4ea9/lib/tree.php#L39

@lukasbestle
Copy link
Member

Oh, that looks pretty nice. Thanks, I'll try it out!

@afbora
Copy link
Member Author

afbora commented Oct 31, 2021

@lukasbestle You can test following text on writer field:

FERNOX — jau 57 metus — patikimi, profesionalūs, aplinką tausojantys sprendimai jūsų šildymo sistemai

@afbora
Copy link
Member Author

afbora commented Oct 31, 2021

@lukasbestle FYI, also when you remove the Sane::sanitize(), works great!

https://github.com/getkirby/kirby/blob/develop/config/fields/writer.php#L32

@afbora
Copy link
Member Author

afbora commented Oct 31, 2021

If I was able to get it right, @nilshoerrmann's suggestion didn't work :((

@afbora
Copy link
Member Author

afbora commented Oct 31, 2021

Saved without sane sanitize and no stucks ✅

<p>FERNOX —&nbsp;jau 57 metus —&nbsp;patikimi, profesionalūs, aplinką tausojantys sprendimai jūsų šildymo sistemai</p>

Saved with sane sanitize and stuck 💥

<p>FERNOX — jau 57 metus — patikimi, profesionalūs, aplinką tausojantys sprendimai jūsų šildymo sistemai</p>

@lukasbestle
Copy link
Member

I tried several options with the following result (simple test with just new Dom($string)->toString(); input = expected result):

$this->doc->loadHTML($code)

--- Expected
+++ Actual
@@ @@
-'<html><body><p>TEST — jūsų šildymo sistemai</p></body></html>
+'<html><body><p>TEST &acirc;&#128;&#148;&Acirc;&nbsp;j&Aring;&laquo;s&Aring;&sup3; &Aring;&iexcl;ildymo sistemai</p></body></html>

❌ Input is parsed as ISO-8859-1, which completely destroys the Unicode multibyte sequences.

$this->doc->loadHTML(mb_convert_encoding($code, 'HTML-ENTITIES', 'UTF-8'))

--- Expected
+++ Actual
@@ @@
-'<html><body><p>TEST — jūsų šildymo sistemai</p></body></html>
+'<html><body><p>TEST &mdash;&nbsp;j&#363;s&#371; &scaron;ildymo sistemai</p></body></html>

❌ As the input is encoded as entities, the output also is encoded as entities.

$this->doc->loadHTML('<?xml encoding="UTF-8">' . $code)

--- Expected
+++ Actual
@@ @@
-'<html><body><p>TEST — jūsų šildymo sistemai</p></body></html>
+'<?xml encoding="UTF-8"><html><body><p>TEST &mdash;&nbsp;j&#363;s&#371; &scaron;ildymo sistemai</p></body></html>

❌ The input doesn't contain entities, but weirdly libxml still converts everything to entities on output. Seems like it exports as ISO-8859-1 even though the XML declaration says the document is UTF-8.

$this->doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $code)

--- Expected
+++ Actual
@@ @@
-'<html><body><p>TEST — jūsų šildymo sistemai</p></body></html>
+'<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head><body><p>TEST — jūsų šildymo sistemai</p></body></html>

✅ Promising, but we need to remove the added meta tag without removing any other tag that was already in the input.

@lukasbestle
Copy link
Member

I'm onto something here. Let's hope it will work. :)

lukasbestle added a commit that referenced this issue Oct 31, 2021
Modifications were needed in two different places because:
1. `libxml` needs to be told to parse the input as UTF-8. A `<meta>` tag would create an empty `<head>` and converting everything to entities is a hack. The XML declaration works great and is the easiest to remove right afterwards.
2. If the `<meta>` tag is used and kept in the document, additional processing (e.g. in `$dom->sanitize()`) will think that it's part of the input. So everything we add needs to be removed immediately after parsing.
3. On output we need the `<meta>` tag again as it's the only way to avoid an export as `ISO-8859-1` with entities.

Fixes #3798.
lukasbestle added a commit that referenced this issue Oct 31, 2021
Modifications were needed in two different places because:
1. `libxml` needs to be told to parse the input as UTF-8. A `<meta>` tag would create an empty `<head>` and converting everything to entities is a hack. The XML declaration works great and is the easiest to remove right afterwards.
2. If the `<meta>` tag is used and kept in the document, additional processing (e.g. in `$dom->sanitize()`) will think that it's part of the input. So everything we add needs to be removed immediately after parsing.
3. On output we need the `<meta>` tag again as it's the only way to avoid an export as `ISO-8859-1` with entities.

Fixes #3798.
lukasbestle added a commit that referenced this issue Nov 1, 2021
Modifications were needed in two different places because:
1. `libxml` needs to be told to parse the input as UTF-8. A `<meta>` tag would create an empty `<head>` and converting everything to entities is a hack. The XML declaration works great and is the easiest to remove right afterwards.
2. If the `<meta>` tag is used and kept in the document, additional processing (e.g. in `$dom->sanitize()`) will think that it's part of the input. So everything we add needs to be removed immediately after parsing.
3. On output we need the `<meta>` tag again as it's the only way to avoid an export as `ISO-8859-1` with entities.

Fixes #3798.
@afbora
Copy link
Member Author

afbora commented Nov 1, 2021

I think the issue is that the writer field's handling of the value/text conflicts with that of the sane class. But I couldn't figure out.

@lukasbestle lukasbestle removed their assignment Nov 1, 2021
@bastianallgeier bastianallgeier modified the milestones: 3.6.0-rc.3, 3.6.0-rc.4 Nov 2, 2021
@bastianallgeier
Copy link
Member

Ok, I'll try to explain the underlying problem a bit to keep track of it here.

Form changes

As soon as content gets loaded from the server to fill the form, a copy of it is stored in Vuex as original form values.

Whenever a form input updates, the updated value is also stored in Vuex in a second "changes" object. The original values object and the changes object are then compared against each other to find out of something has changed. In this case, the orange bar shows up.

This is the basic logic behind the changes bar.

Writer

The Writer initializes ProseMirror under the hood. ProseMirror receives the HTML from the server and turns it into its own JSON document format. In order to create this format, it has to be fed with all kinds of rules about allowed block level elements and inline elements. It will automatically do all the hard work and filter out unwanted shit in the HTML – kill attributes, styles, unwanted tags, etc. It is very strict when performing this task.

That JSON document format can then be turned back into clean HTML again. But that HTML might differ from the original HTML the Writer received. For example, when you give the Writer this:

<p>Some&nbsp;<b>Text</b></p>

The output after the clean-up would be:

<p>Some <strong>Text</strong></p>

This is actually really nice, but it also leads to a big problem.

When the form is loaded, the original value in Vuex would be the string from the server. We already clean up the HTML there with our Sane class and the clean up process is already very very close to what ProseMirror is doing, but it will never be 100% the same unless we'd be able to let ProseMirror run on the server, which is unfortunately not possible.

So the original already differs from the initial value once ProseMirror has been initalized. ProseMirror therefor already sends an input event and the changes object contains a value that's no longer the same than the original. That's why the changes bar appears immediately and can also never be fully removed.

@bastianallgeier
Copy link
Member

After writing this all down, I might have found a solution in the PR above.

@afbora afbora linked a pull request Nov 2, 2021 that will close this issue
@bastianallgeier
Copy link
Member

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug 🐛 Is a bug; fixes a bug type: regression 🚨 Is a regression between versions
Projects
None yet
5 participants