Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add XML fragment parsing to xml5ever #271

Open
Ygg01 opened this issue May 5, 2017 · 12 comments
Open

Add XML fragment parsing to xml5ever #271

Ygg01 opened this issue May 5, 2017 · 12 comments
Labels

Comments

@Ygg01
Copy link
Contributor

Ygg01 commented May 5, 2017

Since I deleted the xml5ever repostiory, I've decided to move this issue here. This was a feature @nox asked, and it seems that it's a pain point in servo as well see servo/servo#11995.

The algorithm for parsing xml fragment is in HTML5 spec.

Basically, what's needed is to construct a context element with given name, a NamespaceMap, and input string.
It's advised to use parse_fragment from h5e as reference point.

@Ygg01 Ygg01 changed the title Add XML templates to xml5ever Add XML fragment parsing to xml5ever May 5, 2017
@SimonSapin
Copy link
Member

SimonSapin commented May 5, 2017

(Digression)

What? Why delete the repository? A number of existing links are gonna be broken :/

@Ygg01
Copy link
Contributor Author

Ygg01 commented May 5, 2017

(Digression)

I was doing some spring cleaning and removing repositories I didn't need. I wasn't aware there were _any_ links to it.

I'll fix any breakage if possible. The only issue I see is xml5ever Travis badge. Unless there are some invisible Git links.

@SimonSapin
Copy link
Member

SimonSapin commented May 5, 2017

(Digression)

The code and git history lives on, but a *github* repository also houses a pull request and issue tracker. These contain discussion of why things were done they way they were. For example https://github.com//issues/266 links to https://github.com/Ygg01/xml5ever/issues/30. These discussions can still be relevant later.

I wasn't aware there were any links to it.

That’s how the web works. Everything has an URL, and someone doesn’t need to tell you when they start using that URL somewhere. This is why cool URLs don’t change.

@Ygg01
Copy link
Contributor Author

Ygg01 commented May 8, 2017

(Digression)

What can I say - Mea culpa. I'll keep that in mind in future.

@SimonSapin
Copy link
Member

Sorry for going off-topic. Back to the actual issue, is an actual NamespaceMap really required or only a resolved (prefix, ns_url, local) name?

@Ygg01
Copy link
Contributor Author

Ygg01 commented May 11, 2017

(Digression)

No worries, you were right about it. It was a foolish decision on my part.
-------

If I understood spec correctly when it says:

declaring all the namespace prefixes that are in scope on that element in the DOM, as well as declaring the default namespace (if any) that is in scope on that element in the DOM.

That means function parsing XML fragment needs pairs of (prefix, ns_url) as input, plus a way to denote default namespace, so it could create documents with all QualName fields assigned.

To answer the question, I don't think it's necessary to send a NamespaceMap, we just need the tuples (prefix, ns_url), possibly using some sort of common interface like IntoIter could work.

@SimonSapin
Copy link
Member

Ah, I see. The part I was missing is that the "starting point" of parse_fragment also includes all of the context’s namespace declarations, not just the for the "context element".

@Ygg01
Copy link
Contributor Author

Ygg01 commented May 12, 2017

I think the question that remains is parse_fragments definition:

  • With NamespaceMap - I think this might be revealing too much internal implementation
fn parse_fragment<Sink>(mut sink: Sink, opts: ParseOpts,
                        context_name: StrTendril, context_namespace: NamespaceMap)
-> Parser<Sink>
  • With IntoIter<(prefix, ns_url)> with default namespace
fn parse_fragment<Sink>(mut sink: Sink, opts: ParseOpts,
                        context_name: IntoIter<(Prefix, Namespace)>, 
                        default_namespace: Option<Namespace>)
-> Parser<Sink>
  • With IntoIter<(Option<prefix>, ns_url)> with default namespace baked in IntoIter (although that leaves the nasty option of what to do when multiple (None, ns_url) pairs)
fn parse_fragment<Sink>(mut sink: Sink, opts: ParseOpts,
                        context_name: IntoIter<(Option<prefix>, ns_url)>)
-> Parser<Sink>
  • Or some other option.

@SimonSapin
Copy link
Member

SimonSapin commented May 13, 2017

In selectors, users provide a generic thing with lookup methods:

https://github.com/servo/servo/blob/master/components/selectors/parser.rs#L106-L113

    fn default_namespace(&self) -> Option<<Self::Impl as SelectorImpl>::NamespaceUrl> {
        None
    }

    fn namespace_for_prefix(&self, _prefix: &<Self::Impl as SelectorImpl>::NamespacePrefix)
                            -> Option<<Self::Impl as SelectorImpl>::NamespaceUrl> {
        None
    }

(NamespaceUrl and NamespacePrefix are also generic string types in selectors, but for xml5ever they can be concrete markup5ever::Namespace and markup5ever::Prefix.)

@SimonSapin
Copy link
Member

It's advised to use parse_fragment from h5e as reference point.

Maybe not. Servo does not use parse_fragment (or anything in the driver module), it uses TreeBuilder::new_for_fragment and tokenizer_state_for_context_elem directly.

I don’t know if there is a use for fragment parsing outside of a browser engine. To the extend that Servo is the only user, there is no need to spend too much time working on a nice API that minimize the amount of user code.

I’m considering removing html5ever::driver::parse_fragment*, which I believe nobody use. @nox what do you think?

@s6nqou
Copy link

s6nqou commented Apr 12, 2022

Is there any progress or plan?

From the above discussion, I understand that only a XmlTreeBuilder::new_for_fragment needs to be implemented, right?

Maybe I will work on this, in order to implement parse_xml_fragment in Servo. But I'm not sure how complicated it is, I need to get familiar with xml5ever.

@Ygg01
Copy link
Contributor Author

Ygg01 commented Apr 13, 2022

Dunno. I had some plans but they have fallen down to the wayside due to RL issues and other projects.

You can have a crack at it. It's not super complicated, but also not super newbie-friendly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants