Arabic not rendering correctly #484
Comments
I'm looking at the diff for ChatRenderer.cs 1.50.6 -> 1.50.7 and I honestly can't figure it out. |
Found the problem. It doesn't seem like it should have, but this line change is what caused it all.
Yes, really. Fixing the CLI not using the embedded Inter font broke Arabic rendering. I don't know why. Edit: It's because Arabic support for the Inter fontface is broken. Just installed Inter from fonts.google.com and that version also renders incorrectly. |
rsms/inter#523 (comment) It looks like it's official that Inter doesn't support Arabic. |
@mohad12211 can you compile that branch and see if Arabic renders correctly when given a font that properly supports it? I'm paranoid it's the fault of my system.
I used the GitHub Actions binaries instead of compiling, assuming that it won't affect anything. Let me know if you want me to actually compile it instead of using the GitHub Actions build. rendered: rendered: chat.json: https://file.io/ANLWjGSSBf4D Edit: yes, the first problem has nothing to do with Arabic; replacing the Arabic text with hello produced this:
Github-actions build works great
This is probably an artifact of the change made in #464. Before, it would advance by the number of codepoints, but 99.9% of emojis are handled as a single char in Substring's eyes, so an index out of bounds (IOOB) was not uncommon. I personally would rather have the duplicate emojis than the IOOB.
I can see why this is a problem and how it occurs. My solution would be a rewrite of ChatRenderer.cs in OOP, because its structure as it stands can be a headache to work with and is far from optimal. For example, there is no reason to draw nonFont chars one-by-one; we could speed up rendering while also making it easier to read by drawing whole nonFont sections in one go, but doing this would require rewriting a few functions.
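To make the "draw whole nonFont sections in one go" idea concrete, here is a minimal Python sketch (not the project's C# code). It splits a message into consecutive runs of characters the main font can or cannot draw, so each run could be handed to the renderer as one string. `split_runs` and `ascii_only` are hypothetical names, and `ascii_only` is only a stand-in for a real glyph lookup.

```python
from itertools import groupby


def split_runs(text, font_supports):
    """Split text into (is_supported, run) pairs of consecutive characters."""
    return [(ok, "".join(chars)) for ok, chars in groupby(text, key=font_supports)]


def ascii_only(ch):
    # Assumption for illustration: pretend the main font only covers ASCII.
    return ord(ch) < 128


# "hi " + an Arabic word + " ok", written with escapes to keep the source unambiguous.
message = "hi \u0645\u0631\u062d\u0628\u0627 ok"
runs = split_runs(message, ascii_only)

for supported, run in runs:
    print("main font" if supported else "fallback font", repr(run))
```

With runs like these, the fallback font would receive the whole Arabic word at once instead of five separate one-character draw calls.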
I see, it's a rare occurrence anyways, for now everything seems to be working. |
what do you mean by an "invalid font"? |
In order for the arabic to render correctly in the image below, I passed TwitchDownloader/TwitchDownloaderCore/ChatRenderer.cs Lines 60 to 71 in e87319a
When I pass any other valid font, such as Both images had the exact same message input as the first message in this issue thread, I just copy-pasted it into a different chat for faster render times. |
Ohhh now I see, in that case I have the same problem. Tested on Linux. |
The thing is though, passing a font that doesn't support Arabic (or no -f argument) goes and fetches your system font to render nonFont chars with, but only in the case of Arabic (as far as I can tell) does it fail to render correctly in this specific scenario, despite it fetching the same font as it would have had you passed
I think that this is the problem, maybe... I can see that both of these images use the same font; the letters are drawn in the same style. But the letters are in the wrong form. In Arabic, the shape of a letter changes depending on whether it occurs alone, initially, medially, or finally within a word. So I assume that you normally render word by word, but in the case of nonFont chars you render them one by one. Is that the case? Are you rendering word by word normally, but char by char for nonFont chars?
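The positional forms described above can be seen in Unicode's own data. As a small Python illustration (separate from the renderer), the Arabic Presentation Forms-B block holds four precomposed shapes of the letter BEH (U+0628), and `unicodedata.decomposition` reports which position each shape is for. A text shaper normally picks the form from the neighbouring letters, which is presumably why a letter drawn in isolation comes out in its isolated form.

```python
import unicodedata

BEH = "\u0628"  # ARABIC LETTER BEH

# U+FE8F..U+FE92 are the isolated, final, initial and medial forms of BEH.
forms = {}
for cp in range(0xFE8F, 0xFE93):
    kind, base = unicodedata.decomposition(chr(cp)).split()
    forms[kind.strip("<>")] = (chr(cp), chr(int(base, 16)))

for kind, (shape, base) in forms.items():
    print(f"{kind:9} U+{ord(shape):04X} is a form of U+{ord(base):04X}")
```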
Y'know I never even thought of that. I'll test if that's the case and I guess also test my theory about the wasted memory. |
And as a side effect, rendering nonFont got slightly faster with the chat you provided Before: FINISHED. RENDER TIME: 15s SPEED: 3.81x Edit: This seems to be a flat speed increase rather than a % speed increase |
@mohad12211 Fixed this edge case. Curiously though, this only lost the duplicate emoji. I have no clue what the white thing is. Perhaps a modifier that our emoji codec doesn't support? |
Were IOOBs that common? Why weren't there more crashes reported because of this?
Aren't 99.9% of emojis multiple characters? Because the char in C# internally is UTF-16 and most emojis seem to be a high and low surrogate of these UTF-16 characters to make a UTF-32 character. EDIT: |
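The surrogate-pair point can be checked outside of C#. This Python sketch encodes a string as UTF-16 and lists its 16-bit code units, which correspond to what a C# `char` holds. The helper name `utf16_units` is made up for this example.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


letter = utf16_units("a")             # one unit, inside the Basic Multilingual Plane
grinning = utf16_units("\U0001F600")  # U+1F600 needs a high and a low surrogate

print([hex(u) for u in letter])
print([hex(u) for u in grinning])
```

Any character above U+FFFF, which includes most emoji, takes two code units, so string lengths and indexes counted in code units differ from counts in code points.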
8-30-21.iLoveKeepo69.-.emojis.mp4
Some old emoji testing I did, but it never resulted in an OOB exception for me.
I ran into them when someone sent the English flag 8 times in a row. Substring sees it as 1 char but it's made of 14 codepoints so before we would substring up 14 chars and IOOB.
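For reference, here is a Python sketch of how such a flag is built, assuming "the English flag" means the England subdivision flag emoji. That sequence is a black flag followed by six tag characters: 7 code points, or 14 UTF-16 code units, which may be where the 14 comes from. `checked_substring` is a hypothetical helper that only mimics a bounds-checked substring; it is not the project's code.

```python
# England flag: WAVING BLACK FLAG + tag letters "gbeng" + CANCEL TAG.
ENGLAND_FLAG = "\U0001F3F4\U000E0067\U000E0062\U000E0065\U000E006E\U000E0067\U000E007F"

code_points = len(ENGLAND_FLAG)                      # Python counts code points
code_units = len(ENGLAND_FLAG.encode("utf-16-le")) // 2  # what a UTF-16 string counts


def checked_substring(text, start, length):
    """Slice like a bounds-checked substring: raise instead of silently clamping."""
    if start < 0 or length < 0 or start + length > len(text):
        raise IndexError("start + length runs past the end of the string")
    return text[start:start + length]


message = ENGLAND_FLAG * 8
first_flag = checked_substring(message, 0, code_points)

print(code_points, code_units, len(message))
```

Asking `checked_substring` for more characters than remain, for example by mixing up the two counts near the end of the message, raises `IndexError`, the same class of failure as the IOOB discussed here.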
It's a mix of both. Working with Unicode has been a pain in my ass. There's technically another edge case that results in the wrong font being used for nonFont rendering because there is no way to get the Unicode range of a specific char. I looked online for like 3 days, stackoverflow always egotistically says to have the dev specify the desired Unicode range in the source code which does jack for us. |
Do you still have an example JSON or could construct one that would crash on the older versions? I did testing with flags before and I didn't notice anything like that. I feel pretty strongly that we should not be decrementing just by 1. |
I just changed it to use the length of the sequence string rather than incrementing by just 1; that's what fixed the duplicate emojis in #484 (comment). Before the patch we were incrementing by the number of codepoints. 5974546 337bc10
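The difference between the two ways of advancing can be sketched in Python. This is an illustration under assumptions, not the renderer's actual loop: `scan` is a hypothetical function, the emoji list is tiny, and Python indexes by code point where C# indexes by UTF-16 code unit. It matches the longest known emoji sequence at the current position and then either skips the whole sequence or moves forward by one.

```python
MAN, WOMAN, BOY, ZWJ = "\U0001F468", "\U0001F469", "\U0001F466", "\u200D"
FAMILY = MAN + ZWJ + WOMAN + ZWJ + BOY  # one emoji made of five code points


def scan(text, emoji_sequences, advance_by_one=False):
    """Walk text and return the pieces that would be drawn, in order."""
    ordered = sorted(emoji_sequences, key=len, reverse=True)  # prefer longest match
    pieces, i = [], 0
    while i < len(text):
        match = next((e for e in ordered if text.startswith(e, i)), None)
        if match is None:
            pieces.append(text[i])
            i += 1
        else:
            pieces.append(match)
            # Skipping the full sequence is the behaviour described as the fix.
            i += 1 if advance_by_one else len(match)
    return pieces


known = [FAMILY, MAN, WOMAN, BOY]
good = scan(FAMILY, known)
bad = scan(FAMILY, known, advance_by_one=True)
```

Here `good` contains the family emoji once, while `bad` also picks up the woman and boy that were already part of the sequence. Whether the duplicates in this issue arose in exactly this way is an assumption.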
Oh that works. I just don't see how it's functionally different than what was in before #464. Besides the obvious trim difference, I feel it would always return the same thing?
Oh you catch the OOB now. If you could paste me a string that hits the OOB I'm still really curious... |
Oh. Guess I'll add the trim back. I swear at some point we were substringing by the number of codepoints. Here's the OOB string: 🏴🏴🏴🏴🏴🏴🏴🏴🏴🏴🏴🏴
Ok, I'm running it again and it's not OOB anymore, what?? I'm so confused, man. Did I accidentally change it to codepoints length and cause the OOB myself, or did swapping to spans fix it?
No yeah, it did render on 1.51.1 alright |
Well, I've re-added the trim but left the try-catch in case it happens again. In other words, it's back to how it used to substring in a6dda93 (1.51.0/1.51.1): substring by only 1 char.
Checklist
Edition
Both
Describe your issue here
Some or all Arabic may not be rendering correctly. 1.50.7 also has the problem, but as far as I can tell 1.50.6 does not.
Source
1.50.7 - Master
1.50.6