Number of selected characters #745

Open
philiprbrenan opened this issue Nov 9, 2015 · 7 comments

Comments

@philiprbrenan

[Screenshots: select1, select2]
When I select one character, the selection count and the column number are correct, but when I extend the selection by one character, the selection count increases by 4, not 1, although the column number only increases by the expected value of 1. The information provided in the status bar by Geany is often very useful.

@elextr
Member

elextr commented Nov 9, 2015

The column counts in glyphs, the selection counts in octets. You do not provide information on what the second entity is, so I'm assuming it's code point U+1D41A, which has a four-octet encoding in UTF-8, which is what the buffer uses.

The manual should be updated to not describe the selection as "characters", since that only applies to ASCII.
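
As a rough illustration of the octet versus code point difference described above, here is a minimal Python sketch (plain Python, not Geany or Scintilla code). The leading `a` is an assumed stand-in for the first selected character, which the report does not identify, and U+1D41A is the guess made above.

```python
# Assumption: first character is a plain ASCII "a"; the second is U+1D41A,
# as guessed in the comment above. Neither is confirmed by the reporter.
selection = "a\U0001D41A"

code_points = len(selection)             # Python str length counts code points
octets = len(selection.encode("utf-8"))  # size of the same text in a UTF-8 buffer

print(code_points, octets)  # 2 5
```

Extending the selection by the second character adds one code point but four octets, which matches the jump of 4 seen in the status bar.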

@andreysm

Could you please show both byte and character counts in the status bar? It is essential for non-English texts.
E.g. "line: 2 col: 3 sel: 48 chars: 24".

@elextr
Member

elextr commented Dec 24, 2018

@andreysm what's a character?

@andreysm

One symbol. Let's just count how many positions they occupy, i.e. sum the selected columns for each row.

@elextr
Member

elextr commented Dec 24, 2018

So glyphs. The Scintilla editing component doesn't supply that to Geany, but if someone wrote the code to request glyph counts from Scintilla and add them up over multiple lines (counting or not counting line ends, and noting that tab characters count as multiple glyphs), then it could be another display item for the status bar.
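
To make the "sum the columns per row" idea concrete, here is a minimal Python sketch. It does not use Scintilla at all: `selected_columns`, `tab_width` and `count_line_ends` are hypothetical names, each selected line is assumed to start at column 0, and every non-tab code point is assumed to take exactly one column (which is wrong for wide or combining characters).

```python
def selected_columns(lines, tab_width=8, count_line_ends=False):
    """Sum display columns over selected lines (hypothetical helper).

    Assumes each line starts at column 0 and that every code point other
    than a tab occupies exactly one column.
    """
    total = 0
    for line in lines:
        col = 0
        for ch in line:
            if ch == "\t":
                # A tab advances to the next tab stop, so its width varies.
                col += tab_width - (col % tab_width)
            else:
                col += 1
        total += col
        if count_line_ends:
            total += 1  # count each line end as one position
    return total


print(selected_columns(["a\tb", "cd"], tab_width=4))  # 7
```

With a tab width of 4, `"a\tb"` spans 5 columns (the tab fills columns 1 to 3) and `"cd"` spans 2, so the tab alone counts as three positions.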

But it would probably be better to make a separate issue rather than hijack a three-year-old one.

@b4n
Member

b4n commented Dec 25, 2018

@elextr if it's a matter of counting columns it should be doable as Scintilla gives the column info, it's then a matter of counting how many columns are selected.

However, I'm not sure it makes much sense to count this, as some "characters" take up more than one column -- the most obvious one being the tab character, which even takes a variable amount of columns, but there are others. Counting code points might be slightly better, or rather whatever Scintilla counts as "stops", e.g. "the number of positions the caret can be at" (and this should be fairly easy to count, although probably in a fairly expensive way). Or even the number of actually composited characters. Meh, displaying symbols is so complicated.
My preference would probably go to counting the code points, regardless of their composition because that's the number of "items" stored in the file, and that's often more interesting to know in a programming context than actual rendered characters on screen; but all of this information (bytes, code points, columns, composited characters) is useful in some situations and not in others.

One fairly important point to take into account here is that Geany uses UTF-8 internally, but that does not have to be the file's encoding. This means that the byte count in Geany does not necessarily make sense in the target encoding, which suggests the current info is kind of irrelevant, yet it is often useful as well.
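
A small Python sketch of why the UTF-8 byte count may not match the target encoding. The sample text and the list of encodings are arbitrary choices for illustration, not anything taken from Geany.

```python
# Sample text with two precomposed accented letters (U+00EF and U+00E9).
selection = "na\u00efve caf\u00e9"

# Byte length of the same 10 code points in three different encodings.
sizes = {enc: len(selection.encode(enc)) for enc in ("utf-8", "latin-1", "utf-16-le")}

print(len(selection), sizes)
# 10 {'utf-8': 12, 'latin-1': 10, 'utf-16-le': 20}
```

The same selection is 12 bytes in the internal UTF-8 buffer but 10 bytes if the file is saved as Latin-1 and 20 bytes as UTF-16, so a byte count is only meaningful relative to a stated encoding.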

@elextr
Member

elextr commented Dec 25, 2018

if it's a matter of counting columns it should be doable as Scintilla gives the column info, it's then a matter of counting how many columns are selected.

That's what I said :)

some "characters" take up more than one column -- the most obvious one being the tab character, which even takes a variable amount of columns

ditto :)

My preference would probably go to counting the code points, regardless of their composition because that's the number of "items" stored in the file

But we are talking about the selection, not anything in a file

but all of this information (bytes, code points, columns, composited characters) is useful in some situations and not in others.

Yep, which is the obvious problem with this sort of thing: too many possibilities. But for the selection I'm doubtful any of them are really useful for any common use case.

Geany uses UTF-8 internally, but that does not have to be the file's encoding

or the encoding of whatever you paste the selection into; IIUC it can be re-encoded when pasted, particularly on Windows.

And counting code points is fine, but what does it give you? Don't forget those combining characters Europeans like to use for their accented characters, two code points for one glyph :)
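
For anyone unfamiliar with the combining-character case, a minimal Python sketch using the standard `unicodedata` module. The choice of "é" is only an example, and counting non-combining code points is a crude approximation of glyphs, not a full grapheme-cluster algorithm.

```python
import unicodedata

composed = "\u00e9"                                  # "é" as a single code point
decomposed = unicodedata.normalize("NFD", composed)  # "e" followed by U+0301

# Rough glyph estimate: ignore code points with a non-zero combining class.
base_count = sum(1 for ch in decomposed if not unicodedata.combining(ch))

print(len(composed), len(decomposed), base_count)  # 1 2 1
print(len(decomposed.encode("utf-8")))             # 3
```

Both strings render as one glyph, yet the decomposed form is two code points and three UTF-8 octets, so the three counts disagree even for a single accented letter.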

But if @andreysm has a specific use case that needs one or another of these counts, then it should be no trouble to accept a well-written pull request from somebody to add another % code to the status bar; nobody needs to show it if they don't want to.
