I did test your first fix, but I don't have much time for extensive testing.
I'm glad you came back with this troubleshooting, because I found the same bug but didn't connect it to the regex changes.
So thanks, you made my day :-)
onizet wrote Dec 17, 2015 at 3:50 PM
If I'm not mistaken, I only need \bp\b, because the other tag names are very different from the other HTML tags and cannot be confused with them.
So I can keep only:
You're right.
Keeping the other \b would only be for the sake of consistent logic.
onizet wrote Dec 17, 2015 at 5:29 PM
About your statement:
"Another problem: the spaces after a flow tag (example: <b> or <i>) are deleted"
If you paste your HTML into a browser, you will see that they are deleted.
Associated with changeset 90889: this is a major commit for the RowSpan bug (#13058, #12781, #13689). It also includes the fix from giorand about spaces.
giorand wrote Dec 18, 2015 at 7:46 AM
You're right, in a browser there is a space between the words 'beautiful' and 'world'.
But if you parse it with the current DLL, the result in Word 2013 is 'beautifulworld', without a space (as you can see in the first image).
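A minimal sketch of how this kind of symptom can arise, assuming the parser trims each text run it finds between tags. This is not the library's code: the sample markup, the `text_runs` helper, and the `trim` flag are all hypothetical, and Python is used only for illustration.

```python
import re

# Hypothetical input; the HTML sample from the original report is not reproduced here.
html = "Hello <b>beautiful</b> world"

def text_runs(source, trim):
    """Return the text pieces found between tags, optionally trimmed."""
    # The capturing group keeps the tags in the split result so they can be filtered out.
    pieces = re.split(r"(<[^>]+>)", source)
    runs = [p for p in pieces if p and not p.startswith("<")]
    return [r.strip() for r in runs] if trim else runs

# Trimming every run drops the space that follows </b>, so the words merge.
print("".join(text_runs(html, trim=True)))   # Hellobeautifulworld
# Keeping the runs untouched preserves the space between the words.
print("".join(text_runs(html, trim=False)))  # Hello beautiful world
```

Under that assumption, the space is lost when each run is trimmed on its own, before the runs are joined back together.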
onizet wrote Jan 12, 2016 at 9:25 PM
Just to let you know that I'm still working on this issue, which I consider major.
[Copied from codeplex]
In the constructor of the HtmlEnumerator class, this line:
This way the regex matches whole words rather than any run of characters
(example: the word 'p' and not the word 'pre')
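A small sketch of the word-boundary effect described above. The two patterns are hypothetical and are not the ones used in HtmlEnumerator. Python's `re` module is used only for illustration, on the assumption that `\b` behaves the same way in .NET for plain ASCII tag names like these.

```python
import re

# Hypothetical patterns for illustration only; the library's real regex may differ.
# Without word boundaries, the alternative "p" also matches the start of "pre".
loose = re.compile(r"^</?(p|div)", re.IGNORECASE)
# With \b on both sides, "p" matches only when it stands as a whole word.
strict = re.compile(r"^</?(\bp\b|div)", re.IGNORECASE)

def is_match(pattern, tag_text):
    """Return True when the pattern recognises the given tag text."""
    return pattern.search(tag_text) is not None

print(is_match(loose, "<pre>"))    # True: "p" matches the first letter of "pre"
print(is_match(strict, "<pre>"))   # False: no word boundary between "p" and "r"
print(is_match(strict, "<p>"))     # True
print(is_match(strict, "</p>"))    # True
print(is_match(strict, "<div>"))   # True
```

Only the short name 'p' needs the boundaries in this sketch, because 'div' is not a prefix of another tag name in the list.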
Another problem: the spaces after a flow tag (example: <b> or <i>) are deleted.
To retain these spaces, you can modify this line of code in the MoveUntilMatch function of the HtmlEnumerator class:
modified to:
This is an HTML example to parse:
This is the behaviour now:
This is the new behaviour with the modified code:
onizet wrote Dec 17, 2015 at 2:47 PM
giorand wrote Dec 17, 2015 at 4:32 PM