Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

International Domain Names (IDN) Support #2034

Merged
merged 4 commits into from
Sep 28, 2022
Merged

International Domain Names (IDN) Support #2034

merged 4 commits into from
Sep 28, 2022

Conversation

vinibrsl
Copy link
Contributor

@vinibrsl vinibrsl commented Jul 19, 2022

Changes

This PR adds support for IDNs as discussed in #993. I replaced the latin alphanumeric validation with a more broad one, supporting other alphabets. I also made some changes to URI encoding to make sure our URLs are still good looking.

Tests

  • Automated tests have been added

Changelog

  • Entry has been added to changelog

Documentation

  • This change does not need a documentation update

Dark mode

  • The UI has been tested both in dark and light mode

@bundlemon
Copy link

bundlemon bot commented Jul 28, 2022

BundleMon

Unchanged files (7)
Status Path Size Limits
static/css/app.css
515.19KB -
static/js/dashboard.js
295.63KB -
static/js/app.js
12.13KB -
static/js/embed.host.js
5.58KB -
static/js/embed.content.js
5.06KB -
tracker/js/plausible.js
748B -
static/js/applyTheme.js
314B -

No change in files bundle size

Final result: ✅

View report in BundleMon website ➡️


Current branch size history | Target branch size history

Comment on lines 44 to 45
|> validate_format(:domain, ~r/^[a-zA-Z0-9\-\.\/\:]*$/,
|> validate_format(:domain, ~r/^[-.\\:\p{L}0-9]*$/u,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I replaced a-zA-Z with p{L} which stands for Unicode letter (in any alphabet).

lib/plausible/sites.ex Outdated Show resolved Hide resolved
@vinibrsl vinibrsl requested a review from ukutaht July 28, 2022 19:24
@vinibrsl vinibrsl marked this pull request as ready for review July 28, 2022 19:27
lib/plausible/site/schema.ex Outdated Show resolved Hide resolved
@vinibrsl vinibrsl added the feature request New feature or request label Sep 15, 2022
@vinibrsl vinibrsl requested review from a team and ukutaht September 23, 2022 15:13
@metmarkosaric
Copy link
Contributor

@vinibrsl nice! for marketing/communication purposes: when this is merged, we will support any character for the domain name? or this adds support for some specific international characters only? basically need to know what to communicate to people

@vinibrsl
Copy link
Contributor Author

@vinibrsl nice! for marketing/communication purposes: when this is merged, we will support any character for the domain name? or this adds support for some specific international characters only? basically need to know what to communicate to people

@metmarkosaric this pull request adds support for any Unicode character in the "letter" category, not any character. In other words, a letter in any language. Plus numbers, dots, and slashes.

Copy link
Contributor

@RobertJoonas RobertJoonas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

Copy link
Contributor

@ukutaht ukutaht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR looks good overall!

The new encoding function is used in a couple places but there are many many more places where we encode domains. Try running git grep 'URI.encode_www_form' in the project to see all of them.

I need some convincing in the first place that the old function isn't going to work:

iex(7)> URI.encode_www_form("wèbsite.com/wat")
"w%C3%A8bsite.com%2Fwat"

When I paste this into the browser, I get:
Screenshot 2022-09-26 at 16 43 04

Which looks correct to me. So maybe we don't need to re-do the encoding?

@ukutaht
Copy link
Contributor

ukutaht commented Sep 26, 2022

I've always thought the techincal debt we have is not using the Phoenix Route helpers which automatically url-encode your params. So instead of:

    |> redirect(to: "/#{URI.encode_www_form(site.domain)}/settings/danger-zone")

We could do

    |> redirect(to: Routes.site_path(conn, :settings_danger_zone, site.domain))

But I think the behaviour between them is exactly the same

@vinibrsl
Copy link
Contributor Author

vinibrsl commented Sep 27, 2022

The PR looks good overall!

The new encoding function is used in a couple places but there are many many more places where we encode domains. Try running git grep 'URI.encode_www_form' in the project to see all of them.

I need some convincing in the first place that the old function isn't going to work:

iex(7)> URI.encode_www_form("wèbsite.com/wat")
"w%C3%A8bsite.com%2Fwat"

When I paste this into the browser, I get: Screenshot 2022-09-26 at 16 43 04

Which looks correct to me. So maybe we don't need to re-do the encoding?

You're right, we don't need the new encoding function for the URLs to work. However, there are two places where the escaped URL is displayed: (1) dashboard visibility toggle and (2) shared links. Here's how they look without the new function:

Screen Shot 2022-09-27 at 12 29 35

At least for the visibility toggle we should have some kind of escaping. I moved the new function to be used only in that view (d49fb6a). Based on your suggestion I also changed Phoenix links and buttons to the generated Routes functions in that same view (31775fc). Here's how it's looking:

Screen Shot 2022-09-27 at 12 27 03

@ukutaht ukutaht merged commit 7489290 into plausible:master Sep 28, 2022
@vinibrsl vinibrsl deleted the idn-support branch September 28, 2022 11:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants