Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unicode characters not written to excel #743

Merged

Conversation

sandraros
Copy link
Collaborator

Fix #672 ("vietnamese") and #688 ("emoji").
It works for a recent system but unfortunately I can't test it on a old system to make sure that the new method is_support_non_xml_characters returns false.

sandraros and others added 5 commits February 11, 2021 21:14
Pull latest abap2xlsx changes
AUnit warning zcl excel reader huge file "# au  > "#au abap2xlsx#739 (abap2xlsx#742)
Fix abap2xlsx#672 ("vietnamese") and abap2xlsx#688 ("emoji").
It works for a recent system but unfortunately I can't test it on a old system to make sure that the new method `is_support_non_xml_characters` returns false.
@gregorwolf
Copy link
Collaborator

Can you extend or provide a test report to replicate this behaviour?

@sandraros
Copy link
Collaborator Author

I propose to extend the Hello world program ZDEMO_EXCEL1 to write "Hello world" in 10 most-spoken languages in the world + emoji 👋🌎, 👋🌍, 👋🌏 (waving hand + 3 parts of the world).

@gregorwolf
Copy link
Collaborator

Good suggestion. Please add it to this PR.

@sandraros
Copy link
Collaborator Author

Done.

@AndreaBorgia-Abo
Copy link
Member

@sandraros how old should the system be?
Would SAP_ABA SAPKA74003 be enough?

@sandraros
Copy link
Collaborator Author

@AndreaBorgia-Abo I tested on a 7.52 system so 7.40 is fine.

@AndreaBorgia-Abo
Copy link
Member

Ok, it returns false and the content of the test string is: 3C3F786D6C2076657273696F6E3D22312E30223F3E3C524F4F543EEDA080EDB0803C2F524F4F543E

unicodetest

The generated Excel looks like this:
2021-05-19 20_06_59-abap2xlsx Demo_ Hello World

@sandraros
Copy link
Collaborator Author

@AndreaBorgia-Abo I guess Excel doesn't say the file is corrupted. Could you force r_result = abap_true to see what happens?

@AndreaBorgia-Abo
Copy link
Member

With abap_false, the file is displayed, no warnings or anything: it just works.
Forcing abap_true on our system, I get the folowing messages:
1.
2021-05-20 09_43_16-Microsoft Excel
2.
2021-05-20 09_46_39-Ripristini a '~SAP{B7EF7487-7E8E-407B-A568-FE6CEF73FA98} xlsx'

@sandraros
Copy link
Collaborator Author

@AndreaBorgia-Abo So, that was the perfect test with SAP kernel having the bug, and correctly detected by abap2xlsx and handled the best way it can. Thanks!

@AndreaBorgia-Abo
Copy link
Member

AndreaBorgia-Abo commented May 20, 2021

Any particular notes to check or kernel version?

@sandraros
Copy link
Collaborator Author

@AndreaBorgia-Abo SKIP_NON_XML_CHARACTERS was made available in note 1750204 and Kernel must be as per note 2220720.

@AndreaBorgia-Abo
Copy link
Member

1750204: SAP_BASIS is SAPKB74003, higher than required
2220720: Kernel is 749 PL 500, also higher.
This should be ok, right?

@sandraros
Copy link
Collaborator Author

@AndreaBorgia-Abo Your system should have valid support for "non-XML characters" according to the SAP notes, but the method says your SAP system doesn't support them, and the test you did by forcing true via debug confirms it. What value do you get in lv_xstring please?

@AndreaBorgia-Abo
Copy link
Member

3C3F786D6C2076657273696F6E3D22312E30223F3E3C524F4F543EEDA080EDB0803C2F524F4F543E

@sandraros
Copy link
Collaborator Author

@AndreaBorgia-Abo Well, EDA080 and EDB080 correspond to invalid UTF-8 conversion of the two (invalid) code points U+D800 and U+DC00, but if the SAP kernel supports it correctly, it should return F0908080 (as on my 7.52 system), which corresponds to the UTF-8 code value of character U+10000. I don't know what to think. Maybe UCCP doesn't work well for surrogate code points U+D800 and U+DC00 in old systems. I have lv_u10000 hexadecimal value '00D800DC', and my system Code Page 4103/Little Endian (transaction code SNLS). What do you have?

@AndreaBorgia-Abo
Copy link
Member

immagine

immagine

I think I could copypaste the method body into a standalone report and run it across a few systems, without bothering with the rest of abap2xlsx, then write down SAP_ABA and SAP_BASIS plus the contents of those variables. Deal?

@sandraros
Copy link
Collaborator Author

sandraros commented May 21, 2021

@AndreaBorgia-Abo The current code does what I expect it to do, but I didn't expect that it wouldn't work in your system configuration. I think I have just found out why, maybe that doesn't work in any version before 7.52, that's explained in the note 2922674 - Support for Unicode Characters U+10000 to U+10FFFF in the iXML kernel library / ABAP package SIXML. Hence, maybe this completely new code might be more universal. Could you test it please?

@AndreaBorgia-Abo
Copy link
Member

The new code works :)
immagine

@gregorwolf gregorwolf merged commit 6eac426 into abap2xlsx:master May 21, 2021
@gregorwolf
Copy link
Collaborator

Thanks a lot.

@AndreaBorgia-Abo
Copy link
Member

@gregorwolf my comment refers to an out-of-PR version that @sandraros wanted me to test, I believe, before including it in the PR. As the master branch stands now, those changes are not yet present.

@gregorwolf
Copy link
Collaborator

Sorry. Do you have already a PR for the improvements?

@AndreaBorgia-Abo
Copy link
Member

Let's wait for @sandraros , it's her code and her PR.

@sandraros
Copy link
Collaborator Author

@AndreaBorgia-Abo Sorry for the delay. I just created the bug #756 that corresponds to the symptom you have seen in 7.40, and the PR #757, that does the same solution as you have tested but the code is a little bit different so that --I hope-- there's less risk of error in non-Unicode systems (+ comments). Could you test this new PR please?

gregorwolf added a commit to sandraros/abap2xlsx that referenced this pull request Jun 21, 2021
@sandraros sandraros deleted the Unicode-characters-not-written-to-Excel branch September 21, 2021 18:13
@sandraros sandraros mentioned this pull request Dec 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing Vietnamese text when output
3 participants