
Utf8 Badge Fix And URL Parser International Support (initial) #6426

Merged
merged 15 commits into netdata:master from underhood:utf8-badge Jul 24, 2019

Conversation

@underhood (Contributor) commented Jul 10, 2019

Summary

Fixes #3117

Additionally, it adds support for UTF-8 in the URL parser (as it should).
Label sizes are now updated by the browser with JavaScript (although the initial guess is still calculated from verdana11_widths, with minor improvements).

Component Name

API/Badges, LibNetData/URL

Additional Information

It was found that not only does verdana11_widths need to be updated, but the URL parser also replaces international characters with spaces (one space per byte of a multibyte character).
Therefore I updated both to support international characters.
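
As a rough illustration of the parser change described here, this is a minimal, hypothetical sketch (the function and helper names are invented; this is not the code in the PR) of a percent-decoder that keeps the raw bytes of a multibyte UTF-8 character instead of turning each byte into a space:

#include <stddef.h>
#include <sys/types.h>   /* ssize_t */

/* Hypothetical helper: value of one hex digit, or -1 if it is not a hex digit. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode %XX escapes into raw bytes so that every byte of a multibyte UTF-8
   character survives. Returns the number of bytes written, or -1 on a
   malformed escape or a full destination buffer. */
static ssize_t url_percent_decode_keep_utf8(const char *s, char *d, size_t d_size) {
    size_t out = 0;
    while (*s) {
        if (out + 1 >= d_size) return -1;           /* keep room for '\0' */
        if (*s == '%') {
            int hi = hex_value(s[1]);
            int lo = (hi >= 0) ? hex_value(s[2]) : -1;
            if (hi < 0 || lo < 0) return -1;        /* malformed escape */
            d[out++] = (char)((hi << 4) | lo);      /* raw byte, possibly >= 0x80 */
            s += 3;
        }
        else {
            d[out++] = *s++;
        }
    }
    d[out] = '\0';
    return (ssize_t)out;
}

Validating the resulting byte sequence (discussed later in this thread) is a separate step.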

Screenshots Before/After

Before "v1.16.0":
OLDTAG_v1_16_0

After Firefox:
Latest_Firefox

After Chromium:
Latest_Chromium

@underhood underhood requested review from cakrit and thiagoftsm as code owners Jul 10, 2019
@CLAassistant commented Jul 10, 2019

CLA assistant check
All committers have signed the CLA.

@underhood (Contributor, Author) commented Jul 10, 2019

Better pictures for comparison:

Old:
[screenshot: old]

New:
[screenshot: new]

(review comment on libnetdata/url/url.c, resolved)
@thiagoftsm (Contributor) commented Jul 11, 2019

Hi @underhood ,

Your PR is working as expected; I tested it with both the Latin and Cyrillic alphabets and got correct results for each, so congratulations.

Now let me explain a couple of formatting points for you to adjust so that we can approve the PR. For example, in the declaration of the function url_decode_multibyte_utf8, you place the pointer asterisk in two different positions:

url_decode_multibyte_utf8(char *s, char* d, char* d_end)

At least in my view, "char* d_end" is something I would expect to see in Java code; in C we generally use the other style, so you have two patterns in the same function declaration.

Another thing I noticed: in the function verdana_width, on line 152 you have a variable "double t" that is never used, because you create another one on line 162. With today's compilers it is entirely possible that the unused variable is simply ignored and no register is allocated for it, but if we compile this file with "-Werror -Wfatal-errors -Wall" it won't compile, because the warning becomes an error.

Please make these two adjustments and I will approve it!
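
(For readers less familiar with those flags, a tiny standalone illustration of the point, not the netdata code itself:)

/* unused_demo.c - illustration only */
double verdana_width_demo(void) {
    double t = 0.0;        /* declared and initialized, but never read */
    double width = 11.0;
    return width;
}
/* gcc -Wall -c unused_demo.c
     warning: unused variable 't' [-Wunused-variable]
   gcc -Wall -Werror -Wfatal-errors -c unused_demo.c
     error: unused variable 't' [-Werror=unused-variable]  -> the build stops */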

@thiagoftsm (Contributor) left a comment

Please see the comments I left about the unused variable and the pointer formatting in the function declaration.

(review comments on web/api/badges/web_buffer_svg.c, resolved)
@underhood (Contributor, Author) commented Jul 11, 2019

Hi @thiagoftsm,

I'm not sure I completely understand your first point.

url_decode_multibyte_utf8(char *s, char *d, char *d_end)

The meaning here is:

  • s - source
  • d - destination
  • d_end - last byte of the destination

An alternative would be source, destination and destination size.
Do you object to this, or to the naming of the variables?

For the second point, the unused "double t": yes, that slipped my attention, I will remove it.
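
To make the intent of that signature concrete, here is a minimal hypothetical sketch (not the PR's implementation; the function name is invented) in which d_end marks the last usable byte of the destination buffer:

/* Copy s into d without ever writing past d_end. */
static char *copy_bounded(const char *s, char *d, char *d_end) {
    while (*s && d < d_end)     /* stop one byte early ... */
        *d++ = *s++;
    *d = '\0';                  /* ... so the terminator lands at or before d_end */
    return d;                   /* lets the caller see how far we got */
}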

@underhood underhood requested review from mfundul and vlvkobal as code owners Jul 11, 2019
@ilyam8 (Member) commented Jul 11, 2019

@underhood If you are still working on it, please add wip to the PR title.

@underhood (Contributor, Author) commented Jul 11, 2019

@underhood If you are still working on it, please add wip to the PR title.

Hi, almost done... sorry for the delay; I'm working on it only during lunch breaks and evenings.

@ilyam8 (Member) commented Jul 11, 2019

Hi, almost done... sorry for the delay; I'm working on it only during lunch breaks and evenings.

no worries, mate

The wip is needed to show that the work is in progress and that we don't need to review yet.

@underhood (Contributor, Author) commented Jul 11, 2019

Hi, almost done... sorry for the delay; I'm working on it only during lunch breaks and evenings.

no worries, mate

The wip is needed to show that the work is in progress and that we don't need to review yet.

OK, done for now with the latest push, unless anyone has new requests or sees new problems.

@cakrit (Contributor) left a comment

make dist failed in Travis; not sure what the problem with utf8.h is...

In file included from libnetdata/os.h:6:0,
                 from libnetdata/os.c:3:
libnetdata/libnetdata.h:316:25: fatal error: string/utf8.h: No such file or directory
 #include "string/utf8.h"
                         ^
compilation terminated.
In file included from ./daemon/common.h:6:0,
                 from ./web/api/web_api_v1.h:6,
                 from web/api/formatters/rrd2json.h:6,
                 from web/api/formatters/json_wrapper.h:6,
                 from web/api/formatters/json_wrapper.c:3:
./daemon/../libnetdata/libnetdata.h:316:25: fatal error: string/utf8.h: No such file or directory
 #include "string/utf8.h"
@cakrit cakrit requested a review from thiagoftsm Jul 11, 2019
@underhood (Contributor, Author) commented Jul 11, 2019

Hmmm... strange. It compiled fine here. If someone has an idea, please share it; if not, I will investigate over the weekend (tomorrow my schedule is packed :( ).

@thiagoftsm (Contributor) left a comment

Hmmm... strange. It compiled fine here. If someone has an idea, please share it; if not, I will investigate over the weekend (tomorrow my schedule is packed :( ).

Please see the comment I left on the line where you added the new header and you will understand the reason.
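
(A plausible explanation for readers, offered as an assumption rather than something stated explicitly in this thread: make dist builds the release tarball only from files listed in the Makefile.am files, so a new header can compile fine from a git checkout and still be missing from the tarball if it is not listed. A generic automake sketch of such an entry; the exact variable and path in netdata's build files may differ:)

# Hypothetical Makefile.am fragment - illustrative only
noinst_HEADERS = \
    string/utf8.h \
    $(NULL)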

(review comments on libnetdata/url/url.c, web/api/badges/web_buffer_svg.c, libnetdata/string/utf8.h and libnetdata/libnetdata.h, resolved)
@underhood underhood changed the title Utf8 Badge Fix And URL Parser International Support (initial) WIP Utf8 Badge Fix And URL Parser International Support (initial) Jul 11, 2019
@underhood (Contributor, Author) commented Jul 11, 2019

Put WIP in the title. The smaller, obvious fixes I will do ASAP, but I will not be able to get to the %c2 issue until the weekend.

@underhood underhood changed the title WIP Utf8 Badge Fix And URL Parser International Support (initial) Utf8 Badge Fix And URL Parser International Support (initial) Jul 13, 2019
@underhood (Contributor, Author) commented Jul 13, 2019

Hi guys,

So I made it actually return HTTP status 400 in case there is anything wrong with the URL (as it should). The original code tried to skip strange characters but still used the input, which was dangerous. Therefore this is an improvement in any case (it now does what it should have done in the first place, even before my changes).
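
As a rough sketch of that behavior (an illustration under assumptions, not the PR's actual handler; the names and the simplified validity check are invented):

#include <stdio.h>
#include <stddef.h>

#define HTTP_BAD_REQUEST 400
#define HTTP_OK          200

/* Placeholder check for the sketch; a real implementation validates UTF-8
   (see the discussion of overlong sequences later in this thread). */
static int looks_sane(const char *s) {
    for (; *s; s++)
        if ((unsigned char)*s < 0x20)   /* reject raw control bytes */
            return 0;
    return 1;
}

/* Hypothetical handler fragment: refuse to render a badge from bad input. */
static int handle_badge_request(const char *decoded_label, char *out, size_t out_size) {
    if (!looks_sane(decoded_label))
        return HTTP_BAD_REQUEST;        /* fail fast instead of guessing */
    snprintf(out, out_size, "<svg><!-- badge label: %s --></svg>", decoded_label);
    return HTTP_OK;
}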

@cakrit (Contributor) commented Jul 22, 2019

@underhood please sign the CLA again, we made a change to it.

underhood added 10 commits Jul 10, 2019
  • The label sent to the browser is set to the original one, because verdana11_width removed characters it did not have in its table.
  • https://www.w3.org/International/articles/idn-and-iri/
  • This allows API requests with international characters, emoticons, etc. in the Netdata badge label.
  • This is not so important anymore, since the badge size is now updated dynamically by the browser; still, I thought I would update it in case JS is disabled, although a rework is suggested.
  • This function is now used only as a backup and will be reworked in the future.
@underhood (Contributor, Author) commented Jul 22, 2019

Done

@thiagoftsm (Contributor) commented Jul 23, 2019

Hi @underhood,

I was testing your PR just now, and I saw that for short UTF-8 sequences your code is doing a good job, so I decided to push it to the limits (5 and 6 bytes), and I got the same error I was having with %c2. Please check this for us!

[screenshot: Netdata4]

@underhood (Contributor, Author) commented Jul 23, 2019

I was testing your PR just now, and I saw that for short UTF-8 sequences your code is doing a good job, so I decided to push it to the limits (5 and 6 bytes), and I got the same error I was having with %c2. Please check this for us!
Add overlong UTF-8 checking that was originally planned for the next pull request.

Well, UTF-8 characters longer than 4 bytes (not valid) are correctly rejected. Your failing case reports a 4-byte character with 4 bytes given and %c0 in the middle; that is then rejected by the browser.

I added the overlong UTF-8 checking that was originally planned for the next pull request, so this and possibly other issues are covered as well.
The function that checks this is under a good license:
http://www.cl.cam.ac.uk/~mgk25/short-license.html
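
For readers, here is a minimal sketch of the kind of check being described, in the spirit of Markus Kuhn's well-known utf8_check() (an illustration, not necessarily the exact function adopted in the PR): it rejects overlong encodings, UTF-16 surrogates, code points above U+10FFFF and the old 5/6-byte forms.

/* Return 1 if s is well-formed UTF-8, 0 otherwise. */
static int utf8_is_well_formed(const unsigned char *s) {
    while (*s) {
        if (*s < 0x80) {                               /* ASCII */
            s++;
        }
        else if ((s[0] & 0xE0) == 0xC0) {              /* 2-byte sequence */
            if ((s[1] & 0xC0) != 0x80 ||
                (s[0] & 0xFE) == 0xC0)                 /* overlong (lead byte C0/C1) */
                return 0;
            s += 2;
        }
        else if ((s[0] & 0xF0) == 0xE0) {              /* 3-byte sequence */
            if ((s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80 ||
                (s[0] == 0xE0 && (s[1] & 0xE0) == 0x80) ||    /* overlong */
                (s[0] == 0xED && (s[1] & 0xE0) == 0xA0))      /* UTF-16 surrogate */
                return 0;
            s += 3;
        }
        else if ((s[0] & 0xF8) == 0xF0) {              /* 4-byte sequence */
            if ((s[1] & 0xC0) != 0x80 || (s[2] & 0xC0) != 0x80 ||
                (s[3] & 0xC0) != 0x80 ||
                (s[0] == 0xF0 && (s[1] & 0xF0) == 0x80) ||    /* overlong */
                (s[0] == 0xF4 && s[1] > 0x8F) || s[0] > 0xF4) /* above U+10FFFF */
                return 0;
            s += 4;
        }
        else {
            return 0;   /* stray continuation byte or 5/6-byte lead byte */
        }
    }
    return 1;
}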

@thiagoftsm (Contributor) commented Jul 24, 2019

I was testing your PR just now, and I saw that for short UTF-8 sequences your code is doing a good job, so I decided to push it to the limits (5 and 6 bytes), and I got the same error I was having with %c2. Please check this for us!
Add overlong UTF-8 checking that was originally planned for the next pull request.

Well, UTF-8 characters longer than 4 bytes (not valid) are correctly rejected. Your failing case reports a 4-byte character with 4 bytes given and %c0 in the middle; that is then rejected by the browser.

I added the overlong UTF-8 checking that was originally planned for the next pull request, so this and possibly other issues are covered as well.
The function that checks this is under a good license:
http://www.cl.cam.ac.uk/~mgk25/short-license.html

Hi @underhood, of course it is very important to keep in mind whether the input is valid or not, but regarding the output shown in the image: our current parser does not show any error for a wrong string given as an argument. I know our parser is not complete yet, which is why you are working on it, but there are some behaviors that are worth keeping. In my view, at least, it is better to hide the wrong characters than to send the message "URL not valid. I don't understand you...".

@underhood (Contributor, Author) commented Jul 24, 2019

Well, in the case where you have malformed UTF-8 as input, I believe returning 400 is the correct way to go. If you just type special characters into the address bar, the browser will correctly form the URL. If you have something else, it is probably a hacking attempt (or testing). I think it is quite dangerous to try to fix it. If you have a UTF-8 character that requires 4 bytes but only 3 are there, how many bytes do you skip, 3 or 4? What if the 4th byte is the beginning of the next UTF-8 character? What if the 4th byte is the start of a URL-encoded ASCII character? I think trying to continue parsing the URL when we know it is not OK is inherently unsafe and sketchy (the user will end up with some funky stuff in the badge).

@cakrit (Contributor) commented Jul 24, 2019

Well, in the case where you have malformed UTF-8 as input, I believe returning 400 is the correct way to go. If you just type special characters into the address bar, the browser will correctly form the URL. If you have something else, it is probably a hacking attempt (or testing). I think it is quite dangerous to try to fix it. If you have a UTF-8 character that requires 4 bytes but only 3 are there, how many bytes do you skip, 3 or 4? What if the 4th byte is the beginning of the next UTF-8 character? What if the 4th byte is the start of a URL-encoded ASCII character? I think trying to continue parsing the URL when we know it is not OK is inherently unsafe and sketchy (the user will end up with some funky stuff in the badge).

This makes sense. Let's approve @thiagoftsm

@cakrit approved these changes Jul 24, 2019
@thiagoftsm thiagoftsm self-requested a review Jul 24, 2019
@thiagoftsm (Contributor) left a comment

All right @underhood, with your last comments I got your point, and I am approving it.

@cakrit cakrit merged commit 19f1bd1 into netdata:master Jul 24, 2019
9 of 12 checks passed
  • Header rules - netdata: No header rules processed
  • Pages changed - netdata: 2 new files uploaded
  • Redirect rules - netdata: No redirect rules processed
  • Codacy/PR Quality Review: Up to standards. A positive pull request.
  • LGTM analysis: C/C++: No new or fixed alerts
  • LGTM analysis: JavaScript: No code changes detected
  • LGTM analysis: Python: No code changes detected
  • Mixed content - netdata: No mixed content detected
  • Travis CI - Pull Request: Build Passed
  • WIP: Ready for review
  • license/cla: Contributor License Agreement is signed.
  • netlify/netdata/deploy-preview: Deploy preview ready!
@underhood underhood deleted the underhood:utf8-badge branch Jul 24, 2019
jacekkolasa added a commit to jacekkolasa/netdata that referenced this pull request Aug 6, 2019
jacekkolasa added a commit to jacekkolasa/netdata that referenced this pull request Aug 6, 2019
sunflowerbofh added a commit to sunflowerbofh/netdata that referenced this pull request Aug 15, 2019