Retrieving modified HTML #66

ghost · 2016-09-16T11:39:58Z

Perhaps I'm being very silly but I cannot find a way to retrieve the modified HTML from the tree. If I print the tree by using myhtml_tree_print_node_children I can see all the modifications, but this output is of course not the HTML I want.

The only way I found to get HTML out again was using myhtml_tree_incoming_buffer_first, but this gets me the unmodified HTML, which is not useful to me.

What is the correct way of doing this?

lexborisov · 2016-09-16T12:18:17Z

Hi!
Do you want to serialize the final tree? See the example.

Or I can implement serialization as required by the specification: something like a function that returns the fully modified HTML as a const char *.

ghost · 2016-09-16T12:36:18Z

I would - of course - be very happy if you were to make a full serialization algorithm. It would save a lot of work for me traversing the whole tree to build all the attributes.
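
For readers wondering what "traversing the whole tree to build all the attributes" involves, here is a minimal sketch of a recursive serializer. It is only an illustration under stated assumptions: node_t, attr_t, out_t and every function name below are hypothetical toy definitions, not MyHTML types or MyHTML API, and the escaping is a simplified subset of what the HTML specification asks for (void elements, comments and doctypes are not handled).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy DOM for illustration only; these are NOT MyHTML types. */
typedef struct attr {
    const char *name;
    const char *value;
    struct attr *next;
} attr_t;

typedef struct node {
    const char *tag;     /* NULL marks a text node */
    const char *text;    /* used only when tag == NULL */
    attr_t *attrs;       /* first attribute, or NULL */
    struct node *child;  /* first child, or NULL */
    struct node *next;   /* next sibling, or NULL */
} node_t;

/* Caller-provided output buffer; output is silently truncated when full. */
typedef struct {
    char *data;
    size_t len;
    size_t cap;
} out_t;

static void out_put(out_t *o, const char *s)
{
    assert(o != NULL && o->cap > 0 && o->len < o->cap);
    size_t room = o->cap - o->len - 1;   /* keep one byte for the NUL */
    size_t n = strlen(s);
    if (n > room)
        n = room;
    memcpy(o->data + o->len, s, n);
    o->len += n;
    o->data[o->len] = '\0';
}

/* Simplified escaping: '&' always, '"' in attribute values, '<' and '>' in text. */
static void out_escaped(out_t *o, const char *s, int in_attr)
{
    for (; *s != '\0'; s++) {
        if (*s == '&')
            out_put(o, "&amp;");
        else if (in_attr && *s == '"')
            out_put(o, "&quot;");
        else if (!in_attr && *s == '<')
            out_put(o, "&lt;");
        else if (!in_attr && *s == '>')
            out_put(o, "&gt;");
        else {
            char c[2] = { *s, '\0' };
            out_put(o, c);
        }
    }
}

/* Serializes n and all of its following siblings, depth first. */
static void serialize_siblings(const node_t *n, out_t *o)
{
    for (; n != NULL; n = n->next) {
        if (n->tag == NULL) {            /* text node */
            out_escaped(o, n->text, 0);
            continue;
        }
        out_put(o, "<");
        out_put(o, n->tag);
        for (const attr_t *a = n->attrs; a != NULL; a = a->next) {
            out_put(o, " ");
            out_put(o, a->name);
            out_put(o, "=\"");
            out_escaped(o, a->value, 1);
            out_put(o, "\"");
        }
        out_put(o, ">");
        serialize_siblings(n->child, o); /* children between start and end tag */
        out_put(o, "</");
        out_put(o, n->tag);
        out_put(o, ">");
    }
}
```

To use it, initialise an out_t with a caller-owned char array (with data[0] set to '\0', len 0 and cap set to the array size) and pass the root node to serialize_siblings. A built-in serializer in the library removes the need for this kind of hand-written walk.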

lexborisov · 2016-09-16T13:18:35Z

OK, I will create it.

lexborisov · 2016-09-16T19:19:56Z

Done. See the example. However, I have not had time to test it, so please test the serialization function.

ghost · 2016-09-19T08:55:15Z

It seems to work well when serializing HTML that is already valid. When you parse something invalid and then serialize it, the library fixes it so that the output is valid (which is good!), but not all of the elements that appear in the output can be found using myhtml_get_nodes_by_tag_id.

Observe the following - obviously incorrect - HTML code:

<!DOCTYPE html>
<html>
    <body>
        <code><a href="http://www.google.com/?q=google"></code>Google it!<code></a></code>
    </body>
</html>

This gets serialized to the following output:

<!DOCTYPE html><html><head></head><body>
        <code><a href="http://www.google.com/?q=google"></a></code><a href="http://www.google.com/?q=google">Google it!<code></code></a>


</body></html>

As you can see, the invalid code tag has caused the <a> tag to be more or less duplicated. Only the first of these <a> tags can be found using myhtml_get_nodes_by_tag_id.

lexborisov · 2016-09-19T09:22:32Z

Fixed!
Thanks!
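
The commit message referenced in this thread describes the fix as adding inserted nodes to the index. The sketch below illustrates, under assumptions, how such a symptom can arise in general: a node that is attached to the tree but never registered in a per-tag index shows up in a tree walk (and therefore in serialized output) but not in an index lookup. All names here (node_t, tag_index_t, index_add, count_by_walk, the TAG_* ids) are hypothetical and do not reflect MyHTML's actual internals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tag ids and limits, for illustration only. */
enum { TAG_BODY = 0, TAG_CODE = 1, TAG_A = 2, TAG_COUNT = 3 };
enum { MAX_PER_TAG = 8 };

typedef struct node {
    int tag_id;
    struct node *child;  /* first child, or NULL */
    struct node *next;   /* next sibling, or NULL */
} node_t;

/* A per-tag index kept alongside the tree, as a lookup shortcut. */
typedef struct {
    node_t *nodes[TAG_COUNT][MAX_PER_TAG];
    size_t count[TAG_COUNT];
} tag_index_t;

/* Attaches child as the last child of parent; does NOT touch the index. */
static void append_child(node_t *parent, node_t *child)
{
    node_t **slot = &parent->child;
    while (*slot != NULL)
        slot = &(*slot)->next;
    *slot = child;
}

/* Registers a node in the index so that tag lookups can find it. */
static void index_add(tag_index_t *idx, node_t *n)
{
    assert(n->tag_id >= 0 && n->tag_id < TAG_COUNT);
    assert(idx->count[n->tag_id] < MAX_PER_TAG);
    idx->nodes[n->tag_id][idx->count[n->tag_id]++] = n;
}

/* Counts matching nodes by walking n, its siblings and all descendants. */
static size_t count_by_walk(const node_t *n, int tag_id)
{
    size_t c = 0;
    for (; n != NULL; n = n->next) {
        if (n->tag_id == tag_id)
            c++;
        c += count_by_walk(n->child, tag_id);
    }
    return c;
}
```

If a second <a> node is attached with append_child alone, count_by_walk reports two <a> nodes while the index still holds one. Calling index_add for every inserted node keeps the two views consistent, which is the kind of change the commit message describes.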

lexborisov added a commit that referenced this issue Sep 16, 2016

Added tree serialization by specification; Added example; #66

8dfd72f

lexborisov added a commit that referenced this issue Sep 19, 2016

Added inserting node to index for myhtml_tree_node_insert_by_node fun…

a42e067

…ction; #66 (comment)

lexborisov closed this as completed Sep 24, 2016
