GAP potentially puts linebreaks between the bytes forming a UTF-8 character #5544

Open
zickgraf opened this issue Dec 15, 2023 · 6 comments

@zickgraf
Contributor

Consider the following situation:

gap> SizeScreen([80]);;
gap> Display(" →→→→→→→→→→→→→→→→→→→→→→→→→→→→");
 →→→→→→→→→→→→→→→→→→→→→→→→→�\
�→→

Observed behaviour

GAP puts a linebreak between the bytes forming the UTF-8 character →.
In particular, if this happens inside the output in a .tst file, the file is not a valid UTF-8 file anymore.
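To make the "not a valid UTF-8 file anymore" point concrete, here is a minimal sketch in Python (not GAP code, just an illustration of the byte-level effect). It assumes the wrap inserts a backslash and a newline between the bytes of one arrow, as in the output above; the helper name `is_valid_utf8` is made up for this example.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


arrow = "→".encode("utf-8")  # U+2192 is three bytes in UTF-8: E2 86 92

# Mimic a line wrap that lands after the second byte of the arrow:
# a backslash and a newline end up between the bytes of one character.
split = arrow[:2] + b"\\\n" + arrow[2:]

print(is_valid_utf8(arrow))  # the intact character
print(is_valid_utf8(split))  # the character with a wrap inside it
```

The intact bytes decode fine, while the split sequence is rejected by a strict decoder, which is what tools reading the .tst file would run into.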

Expected behaviour

The linebreak is inserted before or after the UTF-8 character.
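As a rough sketch of what "before or after the UTF-8 character" could mean at the byte level (in Python for illustration, not the GAP kernel's actual wrapping code): UTF-8 continuation bytes always have the bit pattern 10xxxxxx, so a cut position can be moved left until it no longer sits in front of one. The names `safe_cut` and `wrap_bytes` are hypothetical, and the width is counted in bytes here, not in terminal columns.

```python
def safe_cut(data: bytes, limit: int) -> int:
    """Largest cut position <= limit that is not inside a UTF-8 sequence."""
    cut = min(limit, len(data))
    # If the byte right after the cut is a continuation byte (10xxxxxx),
    # the cut would split a character, so move it one byte to the left.
    while 0 < cut < len(data) and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return cut


def wrap_bytes(data: bytes, width: int) -> list:
    """Split data into chunks of at most width bytes at character boundaries."""
    if width < 1:
        raise ValueError("width must be positive")
    chunks = []
    while len(data) > width:
        cut = safe_cut(data, width)
        if cut == 0:
            # No boundary found (malformed input): fall back to a plain cut.
            cut = width
        chunks.append(data[:cut])
        data = data[cut:]
    chunks.append(data)
    return chunks


text = (" " + "→" * 28).encode("utf-8")
for chunk in wrap_bytes(text, 78):
    print(chunk.decode("utf-8"))
```

With a byte budget of 78, the first chunk ends after the 25th arrow instead of in the middle of the 26th, so every chunk decodes on its own. A real fix would also have to decide how to count display width, which this sketch ignores.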

I expect that this is a known bug, but I could not find an open issue for this.

Copy and paste GAP banner (to tell us about your setup)

 ┌───────┐   GAP 4.13dev built on 2023-12-15 03:25:31+0100
 │  GAP  │   https://www.gap-system.org
 └───────┘   Architecture: x86_64-pc-linux-gnu-default64-kv9
 Configuration:  gmp 6.2.1, GASMAN, readline
 Loading the library and packages ...
 Packages:   AClib 1.3.2, Alnuth 3.2.1, AtlasRep 2.1.7, AutPGrp 1.11, Browse 1.8.21, CaratInterface 2.3.5, CRISP 1.4.6, Cryst 4.1.26, CrystCat 1.1.10, CTblLib 1.3.6, curlInterface 2.3.2, FactInt 1.6.3, FGA 1.5.0, Forms 1.2.9, 
             GAPDoc 1.6.6, genss 1.6.8, IO 4.8.2, IRREDSOL 1.4.4, LAGUNA 3.9.6, orb 4.9.0, Polenta 1.3.10, Polycyclic 2.16, PrimGrp 3.4.4, RadiRoot 2.9, recog 1.4.2, ResClasses 4.7.3, SmallGrp 1.5.3, Sophus 1.27, SpinSym 1.5.2, 
             StandardFF 1.0, TomLib 1.2.9, TransGrp 3.6.5, utils 0.84
 Try '??help' for help. See also '?copyright', '?cite' and '?authors'
@ChrisJefferson
Contributor

Technically, I don't think GAP promises to use UTF-8 -- someone could be using Latin-1 for example.

So, there are various things to decide -- do we want to change printing based on the terminal config, or just decide that nowadays everyone wants UTF-8?
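For the "based on terminal config" option, one possible heuristic on POSIX systems is to look at the locale environment variables. The following is only a sketch in Python, not something GAP does today as far as this thread says; the function name `locale_claims_utf8` is made up. It relies on the POSIX rule that LC_ALL takes precedence over LC_CTYPE, which takes precedence over LANG.

```python
from typing import Mapping


def locale_claims_utf8(env: Mapping[str, str]) -> bool:
    """Guess from POSIX locale variables whether UTF-8 output is expected."""
    # POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:  # unset or empty variables are skipped
            # Locale names look like language_TERRITORY.codeset@modifier.
            codeset = value.partition(".")[2].partition("@")[0]
            return codeset.lower().replace("-", "") == "utf8"
    return False


print(locale_claims_utf8({"LANG": "en_US.UTF-8"}))
print(locale_claims_utf8({"LANG": "de_DE.ISO-8859-1"}))
```

In a real program the argument would be `os.environ`. A Latin-1 user would get the old byte-wise behaviour, a UTF-8 user the character-aware one, at the cost of output that depends on the environment.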

@zickgraf
Contributor Author

Just some ideas: Maybe an efficient solution could be to not insert linebreaks at all if a string contains any characters outside of the range of printable ASCII characters. Or a partial solution could maybe restrict linebreaks to be inserted only between printable ASCII characters. But maybe that would lead to too many inconsistencies :/
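A quick sketch of both ideas, in Python for illustration only (the helper names are hypothetical and this is not how GAP currently decides where to break):

```python
def is_printable_ascii(byte: int) -> bool:
    """Printable ASCII is the range 0x20 (space) to 0x7E (tilde)."""
    return 0x20 <= byte <= 0x7E


def only_printable_ascii(data: bytes) -> bool:
    """First idea: wrap a string only if this returns True."""
    return all(is_printable_ascii(b) for b in data)


def ascii_break(data: bytes, limit: int):
    """Second idea: largest cut <= limit with printable ASCII on both sides.

    Returns None if no such position exists.
    """
    for cut in range(min(limit, len(data) - 1), 0, -1):
        if is_printable_ascii(data[cut - 1]) and is_printable_ascii(data[cut]):
            return cut
    return None


data = "ab→cd".encode("utf-8")  # bytes: a b E2 86 92 c d
print(only_printable_ascii(data))
print(ascii_break(data, 4))
print(ascii_break("→→".encode("utf-8"), 4))
```

For `ab→cd` with a limit of 4, the only admissible break is between `a` and `b`, far to the left of the limit, and for a string of arrows there is no admissible break at all. That is the kind of inconsistency the second idea would produce.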

@ChrisJefferson
Contributor

This has reminded me of a PR I never got around to finishing (I've just looked at resurrecting it, will need some poking):

#5140

This disables GAP's linebreaks entirely (the reason this is a bit less trivial than you might think is that GAP combines line breaks with indentation -- personally, I never want GAP to line break, but always want it to indent). I'm going to work on polishing it up over the next few days, then we can see if it would solve this problem, and maybe write some docs for it.

@zickgraf
Contributor Author

Ah, I wasn't aware of that PR. I like the idea very much, this would also solve other issues I have.

@ChrisJefferson
Contributor

I have now updated #5140, so it applies to master and has some basic documentation. You should be able to run SetPrintFormattingStatus("*stdout*", rec(linewrap := false, indent := true));, which should stop UTF-8 characters getting chopped, and in general stop GAP from wrapping terminal output (instead letting your terminal do its normal thing).

I'd be interested to hear whether this seems to handle UTF-8 well, or whether there are any unexpected issues.

@zickgraf
Contributor Author

Very nice, thanks a lot! I just tried out the PR: In a terminal I do not see problems with UTF-8 characters anymore :-) In a tst file, I don't think I can currently affect the formatting of the output stream (which I think is an OutputTextString), right? But I guess we could possibly introduce a new option for Test which sets the formatting once #5140 is merged? In any case, I think #5140 is a huge improvement! I will use it for my local GAP build and will report if anything weird shows up.
