Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Desktop,Cli: enex_to_md: support bold formatting in span tags #1708

Merged
merged 3 commits into from Jul 29, 2019

Conversation

JOJ0
Copy link
Contributor

@JOJ0 JOJ0 commented Jul 5, 2019

Hi, I added a draft feature to detect some styles buried inside of span tags. Evernote did that very often to my notes. E.g. most of my bold text was saved as inline styles in span tags and not as <b> or <strong> tags. This feature is far from perfect but I wanted to discuss things further before moving on. I read the contribution guidelines and tried to follow them but I didn't write a unit test yet. I will if it is necessary.

I am aware of the following issues:

  • It detects either bold or italic or monospace - bold and italic should be supported i think. monospace and bold/italic should be ignored - the most likely case is people use monospace text to show code. I think it is not possible in MD to format stuff eg bold inside of `code` sections? Right?
  • the string comparision matches only the text "bold" and not "font-weight: bold;" for a reason: Evernote sometimes uses the font-weight style but sometimes stuff like this: "font-family: TimesNewRoman,Bold;"
  • it also has problems with something like the following correctly:

<div><span style="font-weight: bold;">text bold blabla, </span>text normal blabla </div>

it produces markdown that is not correct, because there is a space missing after the closing **:

**text bold blabla, **text normal blabla 

this could happen with normal <strong> tag too. how did you handle it?

  • should span style detection move to a new seperate function, similar to isStrongTag?
  • or should isStrongTag better be extended to also support bold inside of span styles?
  • which way should i go?

Please tell me which fixes/features you'd like to see to accept this.
Thanks a lot
Jojo

@tessus tessus added the desktop All desktop platforms label Jul 5, 2019
@laurent22
Copy link
Owner

Answering all these questions is hard and as you found out there are lots of edge cases to handle, which is why it has never been implemented. This is the kind of feature that's all or nothing - either it works in all cases, or we simply don't include it (it's better to have non-formatted text than one with broken format).

The best approach I think would be to do it one step at a time: write a few tests (in the EnexToMd test unit) and confirm that with your changes they pass, and that other features have not been broken. Start with one narrow case and build on that, maybe over several PR.

@JOJ0 JOJ0 force-pushed the enex_span_styles branch 2 times, most recently from a5028c5 to 8b96ff3 Compare July 8, 2019 05:13
@JOJ0
Copy link
Contributor Author

JOJ0 commented Jul 8, 2019

Hi Laurent,
thanks a lot for the input, I am already on it and writing test html and md files :) But I am having a hard time getting the fileformat of the final .md files right, that are used for the comparing with the generated output - I am on a mac using vim and it saves a newline at each end of the file. The output that's generated by enex-to-md-gen.js doesn't have this newline so i have to manually remove it to get the test passed. I currently use the tool "truncate" to remove the last byte of the file. then the test passes. now i got to a point where truncate does not work and removes the last character as well with the newline. (I also tried dos2unix, unix2dos, mac2unix etc. programs already but they don't help)

how are you handling this? are you on an operating system that doesn't save newlines as the last character (eg. windows doesn't i think).

Sorry I did some force pushing to this branch already but I think this is not a problem for the PR. I am planning on removing some features, adding the proper tests and rebase everything to one nice commit. Therefore I'd need to force push to my branch again.

@JOJ0
Copy link
Contributor Author

JOJ0 commented Jul 8, 2019

I could fix the newline at end of file troubles with a vim setting. in case anybody is interested:
do

:set binary
:set noel

prior to saving the file or just put it into .vimrc

@JOJ0 JOJ0 closed this Jul 8, 2019
@JOJ0 JOJ0 reopened this Jul 8, 2019
JOJ0 added 2 commits July 9, 2019 07:32
* one pair for basic text formatting tags: strong, b, i, em
* and one using span tags with inline styles for bold formatting

Note: The html files include the Evernote-typical "linebreak tags inside of separate <div> tags"
to represent empty lines!
* function isSpanWithStyle() checks if further processing of a span tag
  makes sense
* function isSpanStyleBold() checks if bold formatting via styles is
  used - a similar function could be written for each span-inline-style-format
  that should be supported
@JOJ0 JOJ0 changed the title Desktop: Enex import: support some styles in span tags Desktop,Cli: enex_to_md: support bold formatting in span tags Jul 9, 2019
@JOJ0
Copy link
Contributor Author

JOJ0 commented Jul 9, 2019

Hi @laurent22
I think I got it:

  • rebased to 2 commits: one with testfiles, one with code
  • reduced functionality to just bold format for now - it's using exact matching and a regex now
  • it's done with 2 functions and can be easilty expanded to support further span-styles in the future:
    JOJ0@7031753#diff-d392630ff4e6be6e0aa61755e6a25872R397
  • I left the debugging console messages commented out, I'll need them when adding the next supported span-style - hope that's ok?
  • the bad news is: the one case I described in my first comment (trailing spaces lead to markdown that can't be interpreted correctly) is not only an issue with this new feature but also with the already existing conversion of basic text formatting html tags like <b>, <i>, etc. have a look at this test file. this is what we currently have to expect: https://github.com/laurent22/joplin/blob/440e1f1646e9c7c4808e55099347215e9231bd90/CliClient/tests/enex_to_md/text_formatting.md
    I built all the testfiles so the tests pass but this probably should be fixed in another pr

Hope you like it or suggest what could be improved :-)

pushing attributes of span tag to state object now
happens outside of isSpanWithStyle()
@laurent22 laurent22 merged commit b47cb4e into laurent22:master Jul 29, 2019
@laurent22
Copy link
Owner

Thanks @JOJ0! Hard to tell if it will work in all cases but I can see it seems fine with my old Evernote notes and your tests are good too, so let's merge.

@JOJ0
Copy link
Contributor Author

JOJ0 commented Jul 29, 2019

Hi Laurent, thanks a lot for testing and merging! I also did a lot of tests with my evernote notes and it seems good, except the cases with leading and trailing spaces I mentioned. But that's the same with the regular bold, italics and so on tags. I guess this will be a rather tricky one to fix. Thanks again and all the best!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
desktop All desktop platforms
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants