Retrieving modified HTML #66

Closed
ghost opened this issue Sep 16, 2016 · 6 comments

@ghost

ghost commented Sep 16, 2016

Perhaps I'm being very silly but I cannot find a way to retrieve the modified HTML from the tree. If I print the tree by using myhtml_tree_print_node_children I can see all the modifications, but this output is of course not the HTML I want.

The only way I found to get HTML out again was using myhtml_tree_incoming_buffer_first, but this gets me the unmodified HTML, which is not useful to me.

What is the correct way of doing this?

@lexborisov
Owner

lexborisov commented Sep 16, 2016

Hi!
Do you want to serialize the final tree? See example.

Or I can implement serialization as required by the specification: something like a function that returns the fully modified HTML as a const char *.

@ghost
Author

ghost commented Sep 16, 2016

I would - of course - be very happy if you were to make a full serialization algorithm. It would save a lot of work for me traversing the whole tree to build all the attributes.
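For readers wondering what "traversing the whole tree to build all the attributes" involves, here is a minimal sketch of a hand-written serializer. It does not use the myhtml API: the `Attr` and `Node` structs and the `append` and `serialize` helpers are hypothetical and exist only for illustration. The sketch skips character escaping and void elements, both of which a specification-compliant serializer has to handle.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified types for illustration; NOT myhtml structures. */
typedef struct Attr {
    const char *key;
    const char *value;
    const struct Attr *next;   /* next attribute on the same element */
} Attr;

typedef struct Node {
    const char *tag;           /* NULL marks a text node */
    const char *text;          /* used only when tag is NULL */
    const Attr *attrs;         /* first attribute, or NULL */
    const struct Node *child;  /* first child, or NULL */
    const struct Node *next;   /* next sibling, or NULL */
} Node;

/* Append src to the NUL-terminated buf of capacity cap; excess is truncated. */
static void append(char *buf, size_t cap, const char *src)
{
    size_t used = strlen(buf);
    if (used + 1 < cap)
        strncat(buf, src, cap - used - 1);
}

/* Depth-first walk: open tag with attributes, then children, then close tag.
 * No escaping and no void-element handling (assumption: input needs neither). */
static void serialize(const Node *node, char *buf, size_t cap)
{
    for (; node != NULL; node = node->next) {
        if (node->tag == NULL) {
            append(buf, cap, node->text);
            continue;
        }
        append(buf, cap, "<");
        append(buf, cap, node->tag);
        for (const Attr *a = node->attrs; a != NULL; a = a->next) {
            append(buf, cap, " ");
            append(buf, cap, a->key);
            append(buf, cap, "=\"");
            append(buf, cap, a->value);
            append(buf, cap, "\"");
        }
        append(buf, cap, ">");
        serialize(node->child, buf, cap);
        append(buf, cap, "</");
        append(buf, cap, node->tag);
        append(buf, cap, ">");
    }
}
```

To use it, start with an empty NUL-terminated buffer, for example `char out[256] = "";`, and call `serialize(root, out, sizeof out)`. Even this stripped-down version has to handle attributes, text nodes, and nesting itself, which is the work a built-in serialization function would take off the caller.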

@lexborisov
Owner

Ok, I will create it

@lexborisov
Owner

Done. See example. But I have not had time to test this. Please test the serialization function.

@ghost
Author

ghost commented Sep 19, 2016

It seems to work well when serializing HTML that is already valid. When you parse something invalid and then serialize it, the library fixes it so that the output is valid (which is good!), but not all the elements that appear in the output can be found using myhtml_get_nodes_by_tag_id.

Observe the following - obviously incorrect - HTML code:

<!DOCTYPE html>
<html>
    <body>
        <code><a href="http://www.google.com/?q=google"></code>Google it!<code></a></code>
    </body>
</html>

This gets serialized to the following output:

<!DOCTYPE html><html><head></head><body>
        <code><a href="http://www.google.com/?q=google"></a></code><a href="http://www.google.com/?q=google">Google it!<code></code></a>


</body></html>

As you can see, the invalid code tag has caused the <a> tag to be sort of duplicated. Only the first of these tags is findable using myhtml_get_nodes_by_tag_id.

@lexborisov
Owner

Fixed!
Thanks!
