<script> containing tags causes issues #104

Closed
kozura opened this issue Jun 14, 2011 · 3 comments

Labels
bug Confirmed bug that we should fix

kozura commented Jun 14, 2011

Thanks for the release. I'm using 1.6.0 now and getting issues with http://techcrunch.com. The HTML has a script tag containing tags inside JavaScript strings. The parser seems to treat those as real tag openers, creating elements and causing the closing script tag to be ignored, so the script element swallows a ton of the content that follows. I think this was working in 1.5.2.

Simplified example:

<HTML>
<body>
 <div class=vsc sig=Uga>
  <div class=before></div>
  <script type="text/javascript">
   header = jQuery('#header_features');
   if(header.length){
    header
     .prepend('<a class="prevPage browse left " />')
     .append('<a class="nextPage browse right" />');

    items
     .wrapAll('<div class="scrollable"/>')
     .wrapAll('<ul class="items"/>')
     .wrap('<li/>');
   }
   </script>
   <div class=after></div>
 </div>
</body>
</HTML>

Result: notice that the script strings become tags, and the script tag now subsumes the following div:

<html>
 <body> 
  <div class="vsc" sig="Uga"> 
   <div class="before"></div> 
   <script type="text/javascript">
   header = jQuery('#header_features');
   if(header.length){
    header
     .prepend('
    <a class="prevPage browse left ">') .append('</a>
    <a class="nextPage browse right">'); items .wrapAll('
     <div class="scrollable">
      ') .wrapAll('
      <ul class="items">
       ') .wrap('
       <li>'); }  
        <div class="after"></div> </li>
      </ul>
     </div>  </a>
   </script>
  </div>
 </body>
</html>

kozura commented Jun 14, 2011

Looking around, both script and style tags should be treated as CDATA, though I haven't seen any examples of issues with the latter.
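To illustrate what "treated as CDATA" means here, below is a minimal, self-contained sketch of a tokenizer that switches to a raw-text mode after an opening script or style tag and only leaves it at the matching close tag. This is not jsoup's code: the class name `RawTextTokenizer`, the `TAG:`/`TEXT:` token format, and the set of raw-text tags are all made up for the example, and the tag detection is deliberately naive (it does not handle `>` inside attribute values, comments, or a stray `<` in ordinary text).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch only; not jsoup's tokeniser. */
public class RawTextTokenizer {
    // Elements whose content is kept verbatim (an assumption for this sketch).
    private static final Set<String> RAW_TEXT_TAGS = Set.of("script", "style");

    /** Splits html into "TAG:<...>" and "TEXT:..." tokens. */
    public static List<String> tokenize(String html) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < html.length()) {
            int lt = html.indexOf('<', pos);
            if (lt < 0) {                       // no more tags: rest is text
                addText(tokens, html.substring(pos));
                break;
            }
            addText(tokens, html.substring(pos, lt));
            int gt = html.indexOf('>', lt);
            if (gt < 0) {                       // unterminated tag: keep as text
                addText(tokens, html.substring(lt));
                break;
            }
            String tag = html.substring(lt, gt + 1);
            tokens.add("TAG:" + tag);
            pos = gt + 1;

            String name = tagName(tag);
            if (!tag.startsWith("</") && RAW_TEXT_TAGS.contains(name)) {
                // Raw-text mode: everything up to the matching close tag is
                // data, even if it looks like markup.
                int close = indexOfIgnoreCase(html, "</" + name, pos);
                int end = close < 0 ? html.length() : close;
                addText(tokens, html.substring(pos, end));
                pos = end;
            }
        }
        return tokens;
    }

    private static void addText(List<String> tokens, String text) {
        if (!text.isEmpty()) {
            tokens.add("TEXT:" + text);
        }
    }

    // Extracts the lower-cased element name from "<name ...>" or "</name>".
    private static String tagName(String tag) {
        int i = tag.startsWith("</") ? 2 : 1;
        int start = i;
        while (i < tag.length() && Character.isLetterOrDigit(tag.charAt(i))) {
            i++;
        }
        return tag.substring(start, i).toLowerCase(Locale.ROOT);
    }

    // Case-insensitive indexOf, so "</SCRIPT>" also ends raw-text mode.
    private static int indexOfIgnoreCase(String s, String needle, int from) {
        for (int i = from; i + needle.length() <= s.length(); i++) {
            if (s.regionMatches(true, i, needle, 0, needle.length())) {
                return i;
            }
        }
        return -1;
    }
}
```

With this approach, calling `RawTextTokenizer.tokenize(...)` on the simplified example above would yield the whole script body as a single `TEXT:` token, and the `<div class=after>` that follows would be tokenized as an ordinary tag outside the script.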

/* is Gravatar's scheme to get me to join by making my default avatar a kitty? bugger em... */


jhy commented Jun 14, 2011

Thanks. That's... very odd.

ghost assigned jhy Jun 14, 2011
jhy closed this as completed in 97ecc73 Jun 14, 2011

kozura commented Jun 14, 2011

Works great now, thanks.

michael-simons pushed a commit to michael-simons/jsoup that referenced this issue Jul 12, 2011
Fixes a case in the body where the tokeniser wouldn't switch to the InScript state, which meant that data in a <script> wouldn't parse correctly.

Fixes jhy#104