-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extract hrefs from links, add to end of text and reference them. #3
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Alex, thank you for this PR, your code looks almost perfect!
I left a few comments in the review, but there's also another issue I noticed. I haven't run the code but I think that the fact that links are collected in that private List field could create an unideal situation where links in a list item or in a heading are placed inside the item/heading, instead of at the bottom of the document.
If you take a look here you can see that an HtmlTraversal
instance is created when a list item needs to be converted. To be honest, I don't remember the exact reason why I did this, but that was what I went to. So if there's a link inside the <li>
, when that list item is rendered links would be placed inline, which is not what we want.
Off the top of my mind I don't have an immediate solution to this... We probably need to refactor a bit those sections, to avoid creating a new instance of HtmlTraversal
, or at least to share the state somehow. One temporary workaround could be to have a flag to disable the links references generation, so the constructor would be: HtmlTraversal(bool appendLinks = true)
. Links in list items and headings would be then discarded, which is unideal, but...
There was a test that was commented out for nested lists which I think is where the double traversal object originally was an attempt to solve that use case but was failing. I've taken it out and put the nesting handling (adding tabs) just before the output is appended. Added some more tests with deeper testing for this. |
Thanks for the changes! I think I remember why there was that double traversal object: it was needed in order to add the The same thing happens with headings. The code needs the whole content of the heading to:
If we keep the double traversal object, one solution I can think of is to pass the links list to new Any thoughts? |
…rsion and ignore links with anchor hrefs.
I understand, but I've decided to change the dividers to fixed length which doesn't required the new object. For my use case the trade-off for slightly less "beautiful" output vs. the other features and simplicity I gain is worth it, but I'm not sure if you want to pull it in for your case or others that depend on the existing output. It's here for your taking if you want, otherwise the changes will just live on in my fork. Thanks for the basis though, saved me a chunk of time. |
Sorry, I'm very late here. I don't have much time to think about this at the moment, but I'll merge and release :) I'll create an issue to track the problems mentioned above with this approach. Thank you! |
This change allows the links to survive the text conversion by extracting the urls (href attributes) during traversal and then appending them as a numbered list to the end of the document.