Parsing encoding problem with .html() on <style> tag #1186

Closed
mino181295 opened this issue Mar 15, 2019 · 5 comments
Labels
duplicate This is a duplicate issue or root-cause of another issue

@mino181295

I have a problem reading and then setting the contents of the <style> tag.
When I call .html() and then set the string back with .html(s), the > in the CSS is replaced by the HTML entity &gt;, which does not work correctly in CSS.

When I do:

        String html = ""
                + "<html>"
                + " <head>"
                + "     <style>.outer > .inner {background-color:white;}</style>"
                + " </head>"
                + " <body>"
                + "     Example"
                + " </body>"
                + "</html>";
        Document document = Jsoup.parse(html);
        Element style = document.selectFirst("style");
        
        String s = style.html();
        //..
        style.html(s);
        
        System.out.println(document.toString());

I get:

<html>
 <head>
  <style>.outer &gt; .inner {background-color:white;}</style>
 </head>
 <body>
   Example
 </body>
</html>

But I expect the style tag to preserve the > child combinator; otherwise the CSS rule will not work.
The resulting HTML should be:

<html>
 <head>
  <style>.outer > .inner {background-color:white;}</style>
 </head>
 <body>
   Example
 </body>
</html>

I would be happy to contribute to an eventual patch if you give me some hints.

mino181295 changed the title from "Problems .html() on <style> tag" to "Parsing encoding problem with .html() on <style> tag" on Mar 15, 2019
@mino181295
Author

@jhy

@lnostdal

lnostdal commented Mar 21, 2019

Work on a clone to avoid mutating the document instance. This is essentially the same as your code, but in Clojure:

symbolicweb.core> (let [document (Jsoup/parse (str ""
                                                   "<html>"
                                                   " <head>"
                                                   "     <style>.outer > .inner {background-color:white;}</style>"
                                                   " </head>"
                                                   " <body>"
                                                   "     Example"
                                                   " </body>"
                                                   "</html>"))
                        style (.selectFirst document "style")]
                    (println (.toString (.html style (.html style))))
                    (println "\n###\n")
                    (println (.toString document)))
<style>.outer &gt; .inner {background-color:white;}</style>

###

<html>
 <head> 
  <style>.outer &gt; .inner {background-color:white;}</style> 
 </head> 
 <body>
   Example 
 </body>
</html>
nil
symbolicweb.core> 

...when we use .clone, the document itself is left untouched (the clone still contains &gt;):

symbolicweb.core> (let [document (Jsoup/parse (str ""
                                                   "<html>"
                                                   " <head>"
                                                   "     <style>.outer > .inner {background-color:white;}</style>"
                                                   " </head>"
                                                   " <body>"
                                                   "     Example"
                                                   " </body>"
                                                   "</html>"))
                        style (.clone (.selectFirst document "style"))]
                    (println (.toString (.html style (.html style))))
                    (println "\n###\n")
                    (println (.toString document)))
<style>.outer &gt; .inner {background-color:white;}</style>

###

<html>
 <head> 
  <style>.outer > .inner {background-color:white;}</style> 
 </head> 
 <body>
   Example 
 </body>
</html>
nil
symbolicweb.core> 

@eolivelli

eolivelli commented Mar 22, 2019

I have found the problem.
In @mino181295's case we do have to mutate the STYLE element (the example was not complete).

The STYLE element must not be mutated with Element.html(String), which escapes its input as HTML, but by setting a DataNode, which holds the raw CSS.

I am sending a pull request with an example in the form of a test case:
#1186
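
A minimal sketch of the DataNode approach described above, not the test case from the pull request. The class name StyleDataNodeExample and the method rewriteStyle are hypothetical, and the code assumes a jsoup version that provides the single-argument DataNode(String) constructor. It reads the raw CSS with Element.data() and writes it back as a DataNode instead of calling Element.html(String):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.DataNode;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class StyleDataNodeExample {

    // Hypothetical helper: parses the HTML, replaces the contents of the
    // first <style> element via a DataNode, and returns the serialized document.
    static String rewriteStyle(String html) {
        Document document = Jsoup.parse(html);
        Element style = document.selectFirst("style");

        // data() returns the raw, unescaped contents of the <style> element.
        String css = style.data();

        // ... modify css here as needed ...

        // Drop the existing child nodes and append the CSS as a DataNode,
        // so that it is written out as raw data rather than escaped HTML text.
        style.empty();
        style.appendChild(new DataNode(css));

        return document.outerHtml();
    }

    public static void main(String[] args) {
        String html = ""
                + "<html>"
                + " <head>"
                + "     <style>.outer > .inner {background-color:white;}</style>"
                + " </head>"
                + " <body>"
                + "     Example"
                + " </body>"
                + "</html>";
        System.out.println(rewriteStyle(html));
    }
}
```

With this approach the > combinator in the selector should be serialized as-is, because the contents of a DataNode are not entity-escaped in HTML output.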

@mino181295
Author

Thank you @eolivelli, that solved the problem.

jhy added the duplicate label on Dec 29, 2020
@jhy
Owner

jhy commented Dec 29, 2020

This was fixed with #1419

@jhy jhy closed this as completed Dec 29, 2020
jhy added a commit that referenced this issue Dec 29, 2020