Introduce URL parser based on algorithm provided in Living URL standard #32513

Closed
poutsma opened this issue Mar 22, 2024 · 1 comment
Labels
in: web Issues in web modules (web, webmvc, webflux, websocket) type: enhancement A general enhancement

poutsma commented Mar 22, 2024

In UriComponentsBuilder::fromUriString, we use regular expressions to parse a given String into the various URI components (scheme, host, path, etc.). Regular expressions, by their very nature, are limited in what they can track. Because of these limitations, URL parsing has been a significant source of security reports recently. Additionally, the expressions have grown quite complicated over the years.

The Living URL standard provides a robust algorithm for parsing URLs. We should introduce a URL parser based on that algorithm, instead of using regular expressions.
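
To illustrate the general idea of parsing with an explicit state machine rather than a regular expression, here is a minimal sketch. It is not the algorithm from the Living URL standard and not Spring's implementation: `TinyUrlParser`, `Parts`, and the reduced set of states are hypothetical names chosen for this example, and it assumes an input of the form `scheme://host[:port][/path][?query][#fragment]` with no userinfo, IPv6 literals, percent-decoding, or relative references.

```java
import java.util.Locale;

// Hypothetical, heavily simplified sketch; not Spring's parser.
public final class TinyUrlParser {

    // Parsed components; port is -1 when absent, query/fragment are null when absent.
    public record Parts(String scheme, String host, int port,
                        String path, String query, String fragment) {}

    private enum State { SCHEME, HOST, PORT, PATH, QUERY, FRAGMENT }

    private static final int EOF = -1; // marks the end of the input

    public static Parts parse(String input) {
        State state = State.SCHEME;
        StringBuilder buf = new StringBuilder();
        String scheme = null, host = null, path = "", query = null, fragment = null;
        int port = -1;

        int i = 0;
        while (i <= input.length()) {
            int c = (i < input.length()) ? input.codePointAt(i) : EOF;
            switch (state) {
                case SCHEME -> {
                    if (c == ':' && buf.length() > 0 && input.startsWith("://", i)) {
                        scheme = take(buf).toLowerCase(Locale.ROOT);
                        i += 2; // skip "//"; the ':' is skipped by the increment below
                        state = State.HOST;
                    } else if (c == EOF || !isSchemeChar(c)) {
                        throw new IllegalArgumentException("Invalid scheme");
                    } else {
                        buf.appendCodePoint(c);
                    }
                }
                case HOST -> {
                    if (c == ':') { host = take(buf); state = State.PORT; }
                    else if (endsAuthority(c)) { host = take(buf); state = next(c, buf); }
                    else buf.appendCodePoint(c);
                }
                case PORT -> {
                    if (c >= '0' && c <= '9') {
                        buf.appendCodePoint(c);
                    } else if (endsAuthority(c)) {
                        if (buf.length() > 0) port = Integer.parseInt(buf.toString());
                        buf.setLength(0);
                        state = next(c, buf);
                    } else {
                        throw new IllegalArgumentException("Invalid port");
                    }
                }
                case PATH -> {
                    if (c == '?') { path = take(buf); state = State.QUERY; }
                    else if (c == '#') { path = take(buf); state = State.FRAGMENT; }
                    else if (c == EOF) path = take(buf);
                    else buf.appendCodePoint(c);
                }
                case QUERY -> {
                    if (c == '#') { query = take(buf); state = State.FRAGMENT; }
                    else if (c == EOF) query = take(buf);
                    else buf.appendCodePoint(c);
                }
                case FRAGMENT -> {
                    if (c == EOF) fragment = take(buf);
                    else buf.appendCodePoint(c);
                }
            }
            i += (c == EOF) ? 1 : Character.charCount(c);
        }
        return new Parts(scheme, host, port, path, query, fragment);
    }

    // Returns the buffer contents and clears the buffer.
    private static String take(StringBuilder buf) {
        String s = buf.toString();
        buf.setLength(0);
        return s;
    }

    private static boolean isSchemeChar(int c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= '0' && c <= '9') || c == '+' || c == '-' || c == '.';
    }

    private static boolean endsAuthority(int c) {
        return c == '/' || c == '?' || c == '#' || c == EOF;
    }

    // Picks the state that follows the authority; a leading '/' belongs to the path.
    private static State next(int c, StringBuilder buf) {
        if (c == '/') buf.append('/');
        return switch (c) {
            case '?' -> State.QUERY;
            case '#' -> State.FRAGMENT;
            default -> State.PATH;
        };
    }
}
```

Because each state decides explicitly what a character means in context, invalid input can be rejected at the exact position where it occurs, which is hard to express in a single regular expression.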

@poutsma poutsma added in: web Issues in web modules (web, webmvc, webflux, websocket) type: enhancement A general enhancement labels Mar 22, 2024
@poutsma poutsma added this to the 6.2.0-M1 milestone Mar 22, 2024
@poutsma poutsma self-assigned this Mar 22, 2024

poutsma commented Mar 22, 2024

Due to security considerations, this is an issue that we'd like to handle ourselves, and as such it is not open for external contributions.

@poutsma poutsma modified the milestones: 6.2.0-M1, 6.2.0-M2 Apr 9, 2024
poutsma added a commit that referenced this issue Apr 18, 2024
poutsma added a commit that referenced this issue Apr 26, 2024
poutsma added a commit that referenced this issue May 2, 2024
poutsma added a commit that referenced this issue May 3, 2024
Improvements include:
- Replace thrown exceptions with failure results in hot areas
- Verify the digits of a string before passing it to Integer::parseInt
- Initialize fields lazily
- Use LinkedList instead of ArrayList where the size is not known
  beforehand

See gh-32513
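
As an illustration of the first two items, the following sketch checks the digits of a port string up front and reports failure through an empty `OptionalInt` instead of relying on a `NumberFormatException`. The class name `PortParsing`, the method `parsePort`, and the 0 to 65535 range check are assumptions made for this example; the actual commit may be structured differently.

```java
import java.util.OptionalInt;

// Hypothetical helper; names and limits are illustrative only.
public final class PortParsing {

    // Returns the port, or empty if the input is not 1 to 5 ASCII digits within 0..65535.
    public static OptionalInt parsePort(String s) {
        if (s.isEmpty() || s.length() > 5) {
            return OptionalInt.empty();
        }
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            if (ch < '0' || ch > '9') {
                return OptionalInt.empty(); // failure result, no exception created
            }
        }
        // Safe: 1 to 5 ASCII digits always fit in an int.
        int value = Integer.parseInt(s);
        return value <= 65535 ? OptionalInt.of(value) : OptionalInt.empty();
    }
}
```

Avoiding the exception matters on hot paths because constructing an exception captures a stack trace, which is comparatively expensive when invalid input is common.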
poutsma added a commit that referenced this issue May 3, 2024
- Consistent use of codePointAt instead of charAt.
- Fix bug in domainToAscii

See gh-32513
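
The difference between `codePointAt` and `charAt` shows up for characters outside the Basic Multilingual Plane, which Java strings store as two `char` values (a surrogate pair). A small sketch, with the hypothetical class name `CodePoints`, of iterating by code point:

```java
// Hypothetical example class; not part of the referenced commit.
public final class CodePoints {

    // Counts Unicode code points by advancing one or two chars at a time.
    public static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i += Character.charCount(s.codePointAt(i))) {
            count++;
        }
        return count;
    }
}
```

For the string `"a\uD83D\uDE00b"` (the letter a, U+1F600, the letter b), `length()` is 4 while `countCodePoints` returns 3; `charAt(1)` yields only the high surrogate, whereas `codePointAt(1)` yields `0x1F600`.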