[fix] support data:image URLs in prerendering #2713
Would be nice to add a test case for this as well.
Discussion on Discord about whether we might need to do HTML parsing to make this work: https://discord.com/channels/457912077277855764/781266015601295360/904411015083606106
There is a long discussion (https://discord.com/channels/457912077277855764/781266015601295360/906016984267886682) about combining several regexes with JS logic and splitting the work into more steps. I liked the idea of using something like what Syberag suggested:

function extractData(source) {
  let getTag = /<(a|link|img|source)\b/g
  let getAttributes = /(?<name>\w+)\s*(?:=(?:"([^"]*)"|'([^']*)|([^>]+)\b))?/g
  let tags = source.matchAll(getTag)
  return Array.from(tags).map(tag => {
    let start = tag.index
    let subSource = source.substr(start + tag[0].length) // Issue is probably here, because I take the document until the end instead of just the tag
    let attributes = Array.from(subSource.matchAll(getAttributes))
    return {
      tag: tag[1],
      attributes: attributes.map(attr => ({name: attr.groups.name, value: attr[2]||attr[3]||attr[4]}))}
  })
}

Even though this doesn't work correctly, I think this approach could give us a fast, dependency-free and less buggy solution. On the other hand, jsdom is something other devs are more likely to know, so they would be able to contribute in the future with less effort to understand the code.

E: I also had one really dumb idea: after matching a tag, extract it, serialize it, and then read it character by character, keeping a boolean that records whether an attribute value is closed, so that anything inside an unclosed value is ignored.

E: It seems the parse5 solution will be good for solving this issue.
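The character-by-character idea above could look roughly like the following. This is only a sketch under assumptions: `readAttributes` is a hypothetical helper (not code from this PR or from SvelteKit), and it reuses the `getTag` regex from the suggested `extractData`. Instead of a boolean it remembers which quote character opened the current value, so a `>` inside a quoted value does not end the tag.

```javascript
// Hypothetical helper: scans the text that follows a tag name (e.g. right after "<img")
// and returns the attributes of that one tag plus the offset where the scan stopped.
function readAttributes(source, start) {
	const attributes = [];
	let i = start;

	while (i < source.length) {
		// skip whitespace and stray slashes between attributes
		while (i < source.length && /[\s\/]/.test(source[i])) i += 1;

		// an unquoted ">" (or the end of the input) ends the tag
		if (i >= source.length || source[i] === '>') break;

		// attribute name: everything up to whitespace, "=", ">" or "/"
		const nameStart = i;
		while (i < source.length && !/[\s=>\/]/.test(source[i])) i += 1;
		const name = source.slice(nameStart, i);

		// optional whitespace before "="
		while (i < source.length && /\s/.test(source[i])) i += 1;

		let value = true; // attribute without a value, e.g. crossorigin
		if (source[i] === '=') {
			i += 1;
			while (i < source.length && /\s/.test(source[i])) i += 1;

			const quote = source[i];
			if (quote === '"' || quote === "'") {
				// quoted value: runs to the matching quote, so ">" inside it is kept
				const close = source.indexOf(quote, i + 1);
				const stop = close === -1 ? source.length : close;
				value = source.slice(i + 1, stop);
				i = stop + 1;
			} else {
				// unquoted value: runs to whitespace or ">"
				const valueStart = i;
				while (i < source.length && !/[\s>]/.test(source[i])) i += 1;
				value = source.slice(valueStart, i);
			}
		}

		attributes.push({ name, value });
	}

	return { attributes, end: i };
}

// Same tag regex as the suggestion above, but attributes are read per tag
// instead of being matched against the rest of the document.
function extractData(source) {
	const getTag = /<(a|link|img|source)\b/g;
	return Array.from(source.matchAll(getTag), (tag) => {
		const { attributes } = readAttributes(source, tag.index + tag[0].length);
		return { tag: tag[1], attributes };
	});
}
```

The tag regex would still match `<a` or `<img` text inside comments, scripts or attribute values, which is one reason a real parser such as parse5 is likely the more robust choice.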
My understanding from the Discord thread is that the issue here is attributes containing the
As input, we get a string that contains HTML code... but in this string, how can we detect which
Now imagine you have this crazy input: <img data-something=" src=" crossorigin src="dog>cat.png"> crazy link </img>
It means that
And that's when there is no other tag before or after. There can even be nested attributes inside attributes... not just
If we match anything between
When we want to read the src value, we get:
Both of those values are incorrect; we want
I'm not sure how we would be able to fix it with escaping, when we still need to find where to escape, where @Conduitry answered #2742 (comment).
I think that, because regex can't cover all cases of parsing HTML, it would be good to rewrite this and use the parse5 parser instead. It would fix a lot of bugs, even ones we don't know about today. I think the parse5 solution will give us a lot of advantages, both now and in the future. It will also make the code more maintainable, as the regex is just not readable. Of course, maybe I'm wrong and it is possible to escape it; I don't totally understand this, but I still think parse5 is the better solution.
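To make the failure on that input concrete, here is a small sketch. The two regexes are illustrative assumptions written for this example; they are not the expressions SvelteKit actually uses.

```javascript
const input = '<img data-something=" src=" crossorigin src="dog>cat.png"> crazy link </img>';

// Naive approach 1: treat everything up to the first ">" as the tag.
// The ">" inside the quoted src value cuts the tag short.
const naiveTag = input.match(/<img\b[^>]*>/)[0];
console.log(naiveTag); // <img data-something=" src=" crossorigin src="dog>

// Naive approach 2: take the first src="..." in the string.
// It latches onto the text inside data-something instead of the real attribute.
const naiveSrc = input.match(/src="([^"]*)"/)[1];
console.log(JSON.stringify(naiveSrc)); // " crossorigin src="
```

Neither result is the real `src` value, `dog>cat.png`, because neither regex tracks whether it is currently inside a quoted attribute value.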
@gg187on is attempting to deploy a commit to the Svelte Team on Vercel. A member of the Team first needs to authorize it.
fixes: #2645
Before submitting the PR, please make sure you do the following:
Tests
Run the tests with pnpm test and lint the project with pnpm lint and pnpm check
Changesets
Generate a changeset by running pnpx changeset and following the prompts. All changesets should be patch until SvelteKit 1.0