Pretty print whitespace handling should be stable #1939

Closed
kicktipp opened this issue Apr 18, 2023 · 2 comments
Labels
duplicate This is a duplicate issue or root-cause of another issue

Comments

@kicktipp

kicktipp commented Apr 18, 2023

With jsoup 1.15.4, I get inconsistent results when I repeatedly parse and pretty-print a document:

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Entities;
import org.junit.jupiter.api.Test;

import java.nio.charset.StandardCharsets;

import static org.junit.jupiter.api.Assertions.assertEquals;

public class JsoupTest {


    @Test
    public void test() {
        var text = """
                <div>
                        <a> <b>Hello</b> </a>
                </div>""";
        var os = (new Document.OutputSettings())
                .syntax(Document.OutputSettings.Syntax.html)
                .indentAmount(2)
                .charset(StandardCharsets.UTF_8)
                .escapeMode(Entities.EscapeMode.base)
                .prettyPrint(true)
                .outline(false);
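        // First pass: serialize with the default output settings (not `os`)
        String text1 = Jsoup.parse(text).body().html();
        // Later passes: re-parse the previous output and serialize it with `os`
        String text2 = Jsoup.parse(text1).outputSettings(os).body().html();
        String text3 = Jsoup.parse(text2).outputSettings(os).body().html();
        String text4 = Jsoup.parse(text3).outputSettings(os).body().html();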
        System.out.println(text1 + "\n");
        System.out.println(text2 + "\n");
        System.out.println(text3 + "\n");
        System.out.println(text4 + "\n");
        // Re-parsing already pretty-printed output should not change it
        assertEquals(text3, text4);
    }
}

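This test fails, and IMHO it shouldn't: once the output has been pretty-printed with the same settings, parsing and printing it again should give identical output.

The output is: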

<div><a> <b>Hello</b> </a>
</div>

<div>
  <a> <b>Hello</b> </a>
</div>

<div><a> <b>Hello</b> </a>
</div>

<div>
  <a> <b>Hello</b> </a>
</div>
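The test above checks one specific pair of passes (`text3` vs `text4`). A slightly more general way to express the expected property is to keep applying the same round trip until the output stops changing, and to flag output that never settles, like the alternating output above. The sketch below is a hypothetical helper, not part of jsoup; `RoundTripStability` and `passesToFixedPoint` are names made up for illustration, and the demo uses toy string functions as stand-ins for the buggy and fixed behaviour.

```java
import java.util.function.UnaryOperator;

// Hypothetical helper (not part of jsoup): applies the same round trip
// repeatedly and reports when the output stops changing.
public class RoundTripStability {

    // Returns the pass at which roundTrip produced no change to its input,
    // or -1 if that never happens within maxPasses (e.g. the output keeps
    // alternating between two forms).
    public static int passesToFixedPoint(UnaryOperator<String> roundTrip, String input, int maxPasses) {
        String previous = input;
        for (int pass = 1; pass <= maxPasses; pass++) {
            String next = roundTrip.apply(previous);
            if (next.equals(previous)) {
                return pass; // stable from here on
            }
            previous = next;
        }
        return -1; // never stabilised
    }

    public static void main(String[] args) {
        // Toy stand-in for the reported bug: toggles a trailing newline on every pass.
        UnaryOperator<String> alternating =
                s -> s.endsWith("\n") ? s.substring(0, s.length() - 1) : s + "\n";
        // Toy stand-in for the expected behaviour: normalises once, then stays put.
        UnaryOperator<String> normalising = String::strip;

        System.out.println(passesToFixedPoint(alternating, "<div></div>", 10)); // -1
        System.out.println(passesToFixedPoint(normalising, "  <div></div>  ", 10)); // 2
    }
}
```

With jsoup, the round trip from the test could be passed as `s -> Jsoup.parse(s).outputSettings(os).body().html()`; a result of `-1` would then correspond to the alternating output shown above.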
@jhy
Owner

jhy commented Apr 29, 2023

Hi, thanks for the clear report. This is already fixed in the upcoming 1.16.1 release. I've made a few related fixes, but I think the relevant one is #1906.

The output now is:

<div>
 <a> <b>Hello</b> </a>
</div>

<div>
  <a> <b>Hello</b> </a>
</div>

<div>
  <a> <b>Hello</b> </a>
</div>

<div>
  <a> <b>Hello</b> </a>
</div>

@jhy jhy closed this as completed Apr 29, 2023
@jhy jhy added the duplicate This is a duplicate issue or root-cause of another issue label Apr 29, 2023
@kicktipp
Author

kicktipp commented May 2, 2023

It works! Thank you so much!
