Replacement for newlines and tabs #7

Closed
Goram opened this issue Oct 8, 2013 · 4 comments

@Goram

Goram commented Oct 8, 2013

Hi,

Is the " " as replacement correct? Newlines and tabs don't do much in HTML.

The HTML I have:

 <p>
    This is some text<br/>
    with a break in the middle
 </p>

Results in:

This is some text
 with a break in the middle

The " " before "with" is not correct there, but I can't tell whether removing it would have other side effects.

@Goram
Author

Goram commented Oct 9, 2013

After a bit more thought: in formatted text the " " is correct.

Example:

<p>
    this is some text
    and this too
</p>

There you want the " ". Can't we add an expression beforehand that matches

<br/> + \n

as an edge case?
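
A minimal sketch of that edge case, written in Python for illustration rather than in the library's own code. The helper name `strip_whitespace_after_br` and the pattern are assumptions, not part of html2text; the idea is only to drop the newline and indentation that directly follow a `<br/>` before the general newline-to-space replacement runs.

```python
import re

# Hypothetical pattern (not from html2text): a <br> tag followed by a
# line break and any indentation on the next line.
BR_THEN_WHITESPACE = re.compile(r"(<br\s*/?>)[ \t]*\r?\n[ \t]*", re.IGNORECASE)


def strip_whitespace_after_br(html: str) -> str:
    """Remove the newline and indentation that directly follow a <br>."""
    # Keep the tag itself (group 1) and discard the whitespace after it.
    return BR_THEN_WHITESPACE.sub(r"\1", html)


html = "<p>\n    This is some text<br/>\n    with a break in the middle\n</p>"
print(strip_whitespace_after_br(html))
```

Newlines that do not follow a `<br/>` are left alone, so the space between two wrapped lines of ordinary text would still be produced by the later replacement.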

@mtibben
Owner

mtibben commented Oct 10, 2013

Sure, if you can create a test case and pull request, that seems reasonable.

ckrack mentioned this issue Feb 12, 2015
@anedisi

anedisi commented Mar 27, 2015

Will this be merged?
It's kind of a silly problem, but it's annoying.

@andrewnicols
Collaborator

This behaviour gets my -1, or at least the unit test does.
There should be no difference between:

<p>Foo</p>

and

<p>
    Foo
</p>

These are equivalent as far as HTML rendering is concerned, and the Html2Text conversion should also yield identical results. If this were a <pre> tag, that would be different, but it isn't.

That said, I do feel that there is still a bug. That is to say that:

<p>
    Foo<br/>
    Bar
</p>
<p>
    Foo<br/>
    Bar
</p>
<p>Foo<br/>Bar</p>

Should all output identically:

Foo
Bar

Foo
Bar

Foo
Bar

So I see two bugs at play here:

  1. Leading whitespace is incorrectly preserved in paragraph tags; and
  2. Whitespace is incorrectly added at the end of a paragraph tag.

andrewnicols added a commit to andrewnicols/html2text that referenced this issue Oct 12, 2015
The content of paragraph tags should be treated equally. That is to say that
whitespace within the tag (newline, and spaces) should be compressed before
the tag is treated.
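
The approach in that commit message can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the code from the referenced commit: `paragraph_to_text` is a hypothetical helper that receives only the inner HTML of a single `<p>` and handles nothing but whitespace and `<br>`.

```python
import re


def paragraph_to_text(inner_html: str) -> str:
    """Hypothetical helper: convert the inside of one <p> to plain text."""
    # Compress every run of whitespace (newlines, tabs, spaces) to one space.
    text = re.sub(r"\s+", " ", inner_html)
    # Turn <br> into a newline, swallowing the single space on either side.
    text = re.sub(r" ?<br\s*/?> ?", "\n", text, flags=re.IGNORECASE)
    # Drop the whitespace left at the start and end of the paragraph.
    return text.strip()


# Three spellings of the same paragraph body (the indentation is illustrative).
variants = [
    "\n    Foo<br/>\n    Bar\n",
    "\n\tFoo<br/>\n\tBar\n",
    "Foo<br/>Bar",
]
for variant in variants:
    assert paragraph_to_text(variant) == "Foo\nBar"
```

Compressing first and trimming last addresses both bugs listed above while still keeping the single space between wrapped lines of ordinary text. Content of a `<pre>` tag would need to bypass this step.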
This was referenced Feb 22, 2016