Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filter out invalid characters server side #4

Closed
freekmurze opened this issue Oct 27, 2017 · 11 comments
Closed

Filter out invalid characters server side #4

freekmurze opened this issue Oct 27, 2017 · 11 comments

Comments

@freekmurze
Copy link
Member

freekmurze commented Oct 27, 2017

Only alphanumeric characters and a . should be accepted. Let's just filter those out.

@brendt
Copy link
Contributor

brendt commented Oct 27, 2017

We're going to need a little more than just alphanumeric characters and a ., eg. - and _.

This post (which is based on the URI spec) lists a lot more: https://stackoverflow.com/questions/1547899/which-characters-make-a-url-invalid/1547940#1547940

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~:/?#[]@!$&'()*+,;=`.

All of these are valid URI characters. ? eg. doesn't make sense in a domain name, but people could still paste a whole URL, which should be able to be parsed. I think PHP's parse_url might actually be a good solution, because than you shouldn't worry about parsing the input at all. If it's a valid URL, parse_url($url, PHP_URL_HOST) will return the host, otherwise it will return null. http:// or https:// should be present though.

@bbashy
Copy link
Contributor

bbashy commented Oct 27, 2017

@willemvb
Copy link
Contributor

Side note: what with top level domains?
Eg. . as first character might need some extra attention:

  • be currently works
  • . currently works
  • .be fails

@brendt
Copy link
Contributor

brendt commented Oct 27, 2017

@bbashy that's perfect, no?

protected function sanitizeInput(string $input = ''): string
{
    $input = str_replace(['http://', 'https://'], '', $input);

    $input = parse_url("http://{$input}", PHP_URL_HOST);

    return strtolower($input);
}

@bbashy
Copy link
Contributor

bbashy commented Oct 27, 2017

@brendt Yeah something like that is fine apart from you need to remove http(s):// since dig doesn't accept it.

@bbashy
Copy link
Contributor

bbashy commented Oct 27, 2017

Not like #9

@brendt
Copy link
Contributor

brendt commented Oct 27, 2017

@bbashy parse_url also automatically filters out http:// and https://. You just have to make sure the protocol is there to be able to parse it: https://3v4l.org/8nqnA

@brendt
Copy link
Contributor

brendt commented Oct 27, 2017

Also @willemvb parse_url addresses that issue: https://3v4l.org/CI9pp

@bbashy
Copy link
Contributor

bbashy commented Oct 27, 2017

Ah yeah, of course!

@brendt
Copy link
Contributor

brendt commented Oct 27, 2017

@willemvb I was wrong about parse_url addressing the be issue.

To be complete:

  • .be is not a valid domain, but for better user experience, it should work and be filtered.
  • be. is a valid domain, but doesn't work with parse_url.
  • . also doesn't work, but is a valid domain (the root domain). This exception should also work.

So these three exceptions should be handled before parse_url; but maybe than parse_url might not be needed at all.

@freekmurze
Copy link
Member Author

Fixed in #9

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants