Profanity is sometimes highlighting a mention wrong. #1220

Closed
mdosch opened this issue Nov 1, 2019 · 9 comments

@mdosch
Contributor

mdosch commented Nov 1, 2019

Profanity is configured to highlight my nickname for mentions, but sometimes it highlights the wrong text.

[Screenshot: 2019-11-01-141849_scrot]

Expected Behavior

In this example it should highlight "Martin".

Current Behavior

It highlights other text and also mixes in the username "debacle", possibly because of the <200e> characters.

Context

01/11/2019 14:17:25: xmpp: DBG: RECV: <message id="e7de6887-7feb-4925-ba8a-39ae4a0c3379" type="groupchat" lang="en" to="REDACTED" from="chat@dino.im/debacle"><body>I hope, this quoting style never makes it to XMPP.
<200e>&gt; [13:00:19] <200e>Martin<200e>: Yeah, at least we get paid for that. :D
<200e>&gt; &gt; [12:58:30] <200e>Ge0rG<200e>: Martin: I know. My personal and my business behaviors in that regard are opposite as well
&gt; <200e>&gt; &gt; [12:55:25] <200e>Martin<200e>: Ge0rG: In my company they want TOFU as the expect to be able to scroll through the full history if needed...</body><origin-id id="e7de6887-7feb-4925-ba8a-39ae4a0c3379" xmlns="urn:xmpp:sid:0"/><active xmlns="http://jabber.org/protocol/chatstates"/></message>

Environment

Version 0.7.1dev.master.f71de61b
Debian Bullseye (testing)

gnome-terminal --version
# GNOME Terminal 3.34.2 using VTE 0.58.2 +BIDI +GNUTLS
@jubalh
Member

jubalh commented Nov 1, 2019

Another example:

Gajim output:

‎[04:06:29 PM] ‎jubalh‎: DebXWoody, not yet. Does it work?
‎[04:07:17 PM] ‎debacle‎: ‎[15:04:27] ‎Martin‎: debacle: Let's throw some stuff at me, let's see if it happens again.

Profanity output:
[Screenshot: issue]

XML console:

01/11/2019 16:07:17: xmpp: DBG: RECV: <message id="386f8f89-22ef-42bc-a509-c1f8722e04b7" type="groupchat" lang="en" to="martin@mdosch.de/WIXYrKnv" from="profanity@rooms.dismail.de/debacle"><body><200e>[15:04:27] <200e>Martin<200e>: debacle: Let's throw some stuff at me, let's see if it happens again.</body><origin-id id="386f8f89-22ef-42bc-a509-c1f8722e04b7" xmlns="urn:xmpp:sid:0"/><active xmlns="http://jabber.org/protocol/chatstates"/><stanza-id id="d9534fa5-7a9b-48b7-9e47-3fa910234255" by="profanity@rooms.dismail.de" xmlns="urn:xmpp:sid:0"/></message>

@mdosch
Contributor Author

mdosch commented Nov 1, 2019

It seems these <200e> are coming from Gajim. This excerpt is from the Gajim MUC:

01.11.19 14:46:49 - Martin: Does gajim insert <200e>? https://en.wikipedia.org/wiki/Left-to-right_mark
01.11.19 14:47:59 - Martin: Because I had a problem with profanity and assume it is caused by this control character
                    (see https://github.com/profanity-im/profanity/issues/1220 ) and I think the sender used Gajim (at
                    least according to the version request answer on his muc nick).
01.11.19 14:54:13 - Link Mauve: Martin, yes, it does that.
01.11.19 14:54:21 - Link Mauve: Should be only for display though.
01.11.19 15:49:07 - debacle: Martin, Link Mauve, I copy/pasted from and to Gajim. Maybe that caused the <200e>?
01.11.19 15:52:31 - Link Mauve: It indeed is in the UI.
01.11.19 15:52:41 - Link Mauve: IIRC it was a workaround to help rtl languages.
01.11.19 15:53:13 - Link Mauve: Imo it shouldn’t be present when we know everything is ltr, or everything is rtl, but
                    that’s harder to do than putting this character always.
01.11.19 16:11:34 - Martin: I think this is causing the displaying errors in profanity.
01.11.19 16:15:42 - lovetox: yes martin is related to copy paste
01.11.19 16:16:20 - lovetox: but i think this is not necessary anymore, because there is now a gtk setting that just
                    displays everything rtl
01.11.19 16:16:25 - lovetox: so im in the process of removing that

@jubalh
Member

jubalh commented Nov 1, 2019

In Gajim one needs to copy by right-clicking and then selecting "Copy" in the context menu. That way Gajim removes the special characters when copying to the clipboard. Any other way of copying includes them.

@jubalh
Member

jubalh commented Nov 1, 2019

So the question is: should we have some general filtering of incoming messages that looks for certain sequences and removes them?
What kind of sequences would that be?

@mdosch
Contributor Author

mdosch commented Nov 1, 2019 via email

@jubalh
Member

jubalh commented Nov 1, 2019

After reading https://en.wikipedia.org/wiki/ANSI_escape_code I'm still not sure. Can I just look for \\e and remove this? Seems there are various ways this can be encoded? Even depending on the used terminal?
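To make the question concrete, here is a minimal sketch in C of what removing one family of escape sequences could look like. It only handles 7-bit CSI sequences (ESC, then `[`, then parameter bytes 0x30-0x3F, intermediate bytes 0x20-0x2F, and a final byte 0x40-0x7E). The function name `strip_csi_sequences` is hypothetical and not part of Profanity. Other escape forms (OSC sequences, single-character escapes, the 8-bit CSI introducer) are deliberately left alone, which illustrates why looking for the escape byte alone is not a complete answer.

```c
#include <stddef.h>

/* Hypothetical helper (not Profanity code): removes 7-bit CSI sequences
 * of the form ESC [ params intermediates final from `s`, in place.
 * Incomplete or malformed sequences are copied through unchanged. */
void strip_csi_sequences(char *s)
{
    unsigned char *src = (unsigned char *)s;
    unsigned char *dst = src;

    while (*src != '\0') {
        if (src[0] == 0x1B && src[1] == '[') {
            unsigned char *p = src + 2;
            while (*p >= 0x30 && *p <= 0x3F) {
                p++; /* parameter bytes: digits, ';', '?', ... */
            }
            while (*p >= 0x20 && *p <= 0x2F) {
                p++; /* intermediate bytes */
            }
            if (*p >= 0x40 && *p <= 0x7E) {
                src = p + 1; /* complete sequence found: skip it */
                continue;
            }
            /* No valid final byte: fall through and keep the ESC. */
        }
        *dst++ = *src++;
    }
    *dst = '\0';
}
```

The buffer must be writable; the result is never longer than the input, so no allocation is needed.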

@jubalh
Member

jubalh commented Nov 5, 2019

This seems hard or impossible to do. Incoming messages can contain various Unicode control characters and escape sequences, and we don't know which is which.

mcabber and poezio also don't seem to manipulate this.
Will close for now. Feel free to reopen or comment with ideas.

jubalh closed this as completed Nov 5, 2019
@mdosch
Contributor Author

mdosch commented Nov 6, 2019

I created an issue at the Gajim tracker: https://dev.gajim.org/gajim/gajim/issues/9881

@lovetox

lovetox commented Nov 9, 2019

Just replace \u200E and \u200F before you send it to display.

These are two Unicode characters and they are not disallowed by the XMPP or XML standards, so you have to deal with them in some way.

You cannot depend on other clients sending you only Unicode characters that you can display.
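The replacement suggested above can be sketched in plain C. In UTF-8, U+200E (left-to-right mark) is the byte sequence E2 80 8E and U+200F (right-to-left mark) is E2 80 8F, so both can be matched at the byte level. The function name `strip_direction_marks` is hypothetical, and this is a sketch under the assumption that the message body is a NUL-terminated UTF-8 string, not the code that was eventually committed.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not Profanity code): returns a newly allocated
 * copy of the UTF-8 string `in` with every U+200E and U+200F removed.
 * Returns NULL if allocation fails. The caller frees the result. */
char *strip_direction_marks(const char *in)
{
    size_t len = strlen(in);
    char *out = malloc(len + 1); /* output is never longer than input */
    if (out == NULL) {
        return NULL;
    }

    const unsigned char *src = (const unsigned char *)in;
    char *dst = out;

    while (*src != '\0') {
        /* U+200E is E2 80 8E and U+200F is E2 80 8F in UTF-8.
         * Short-circuit evaluation keeps the reads inside the string. */
        if (src[0] == 0xE2 && src[1] == 0x80
            && (src[2] == 0x8E || src[2] == 0x8F)) {
            src += 3;
            continue;
        }
        *dst++ = (char)*src++;
    }
    *dst = '\0';
    return out;
}
```

Applied to the body from the XML console above, this would turn `<200e>[15:04:27] <200e>Martin<200e>: debacle: ...` into `[15:04:27] Martin: debacle: ...` before the text reaches the mention-highlighting code.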

jubalh reopened this Nov 11, 2019
jubalh self-assigned this Nov 11, 2019
jubalh added this to the 0.8.0 milestone Nov 11, 2019
jubalh added a commit to jubalh/profanity that referenced this issue Nov 12, 2019
Gajim sends \u200E and \u200F for RTL.
It is planned that Gajim stops doing this and uses some GTK feature to
get the same result.

However, users expressed the wish that we filter out such characters in
incoming messages before displaying them to make Profanity more robust.

I'm still not sure whether I like the solution because it means a lot of
allocating/deallocating upon every new message.

Fix profanity-im#1220
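
The allocation concern raised in the commit message could in principle be avoided by filtering in place, because removing characters never makes the string longer. The following is a minimal sketch under the assumption that the caller owns a writable buffer holding the message body. The name `strip_direction_marks_inplace` is hypothetical, and this is not necessarily how the referenced commit does it.

```c
#include <stddef.h>

/* Hypothetical helper (not Profanity code): removes U+200E and U+200F
 * from the writable UTF-8 string `s` without allocating.
 * Returns the new length in bytes, excluding the terminator. */
size_t strip_direction_marks_inplace(char *s)
{
    unsigned char *src = (unsigned char *)s;
    unsigned char *dst = src;

    while (*src != '\0') {
        /* E2 80 8E and E2 80 8F are the UTF-8 encodings of the marks. */
        if (src[0] == 0xE2 && src[1] == 0x80
            && (src[2] == 0x8E || src[2] == 0x8F)) {
            src += 3; /* skip the mark, write nothing */
        } else {
            *dst++ = *src++; /* dst never overtakes src */
        }
    }
    *dst = '\0';
    return (size_t)(dst - (unsigned char *)s);
}
```

The trade-off is that the original bytes are lost, so this only fits if nothing else needs the unfiltered message afterwards.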
jubalh closed this as completed in 87f9bac Nov 13, 2019