Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Desktop: Add `enex` to `html` export #1795

Merged
merged 37 commits into from Sep 17, 2019

Conversation

@devonzuegel
Copy link
Contributor

commented Aug 9, 2019

Features introduced in this PR

Adds a new ENEX – Evernote Export File (as HTML) importer (as discussed in the forum):

It takes the raw XML in the .enex file (which is very close to HTML), replaces the en-* tags with the appropriate HTML elements, and imports the resulting HTML as notes.

Refactored in this PR

Questions

I'd love your thoughts on the following:

  • Would you like me to split this out into 2-3 PRs, so that it's easier to review? I could make 1 for the refactoring, 1 for the backend importer, and 1 for the UI. I realize that this PR is a bit large.

  • Want me to squash the commits? They're a bit messy.

  • Want me to add tests for the UI or any additional cases?

  • Instead of relying on the clean-html module, which adds a new dependency, would you prefer I implement something more lean like this to obviate that additional dependency?

  • When the HTML importer sees an en-todo, it currently outputs '<input type="checkbox" onclick="return false;" />'. This is readonly in a preview pane (i.e. you have to go into the HTML in the editor to toggle a checkbox), which is not ideal since the checkboxes in the Markdown importer are toggle-able from the preview pane. That said, I don't think it's a huge problem because (a) you're most likely to be doing an import of old notes whose state you don't want to change much but rather treat more as an archive and (b) you're most likely to want the HTML importer for web clippings, which generally don't have checkboxes in the first place.

    With that background, my question is this: would you like me to make imported checkboxes clickable?

Next steps

I'd love to also add an importer that uses the HTML importer if it's a clipping and converts to Markdown if it's a user-made note. I see two ways of doing this:

  1. We could add another importer to the existing list, something like:
  2. We could consolidate the 3 .enex importers to a single ENEX - Evernote Export File option, and when that's selected have a pop-up where the user specifies whether they want clipped and personal notes imported as Markdown or HTML.

Option 2 seems more elegant to me, plus it gives us space to offer a bit more information about the consequences of the options, but it does deviate from the current import model more.

*/
lines.push(resourceUtils.audioElement({src, id: resource.id}));
} else {
// TODO: figure out if we need to handle other mime types

This comment has been minimized.

Copy link
@devonzuegel

devonzuegel Aug 11, 2019

Author Contributor

Are there other image mime types that should be handled here?

@devonzuegel devonzuegel force-pushed the devonzuegel:messy/evernote-html-export branch from 6cc1220 to 73363c0 Aug 11, 2019

@devonzuegel devonzuegel marked this pull request as ready for review Aug 11, 2019

@laurent22

This comment has been minimized.

Copy link
Owner

commented Aug 29, 2019

Thanks for the pull request @devonzuegel and sorry for the delay in replying. I was in vacation in August and didn't have time to really look into the code till now.

Would you like me to split this out into 2-3 PRs, so that it's easier to review? I could make 1 for the refactoring, 1 for the backend importer, and 1 for the UI. I realize that this PR is a bit large.

It's fine as it is as the feature is quite self contained.

Want me to squash the commits? They're a bit messy.

We always squash and merge so multiple commits are fine.

Want me to add tests for the UI or any additional cases?

The tests you've added are fine I think. As for the UI, we currently don't have any way to test it unfortunately.

Instead of relying on the clean-html module, which adds a new dependency, would you prefer I implement something more lean like this to obviate that additional dependency?

I don't quite understand the need for HTML cleaning. The Evernote HTML should already be valid, shouldn't it?

When the HTML importer sees an en-todo, it currently outputs ''. This is readonly in a preview pane (i.e. you have to go into the HTML in the editor to toggle a checkbox), which is not ideal since the checkboxes in the Markdown importer are toggle-able from the preview pane. That said, I don't think it's a huge problem because (a) you're most likely to be doing an import of old notes whose state you don't want to change much but rather treat more as an archive and (b) you're most likely to want the HTML importer for web clippings, which generally don't have checkboxes in the first place.

I agree with that, we can leave it as you did.

@@ -130,6 +139,15 @@ class InteropService {
return output;
}

newModuleFromPath_(options) {

This comment has been minimized.

Copy link
@laurent22

laurent22 Aug 29, 2019

Owner

Any reason why this function was needed to load the modules?

This comment has been minimized.

Copy link
@devonzuegel

devonzuegel Sep 8, 2019

Author Contributor

The existing newModule_ fn would load by the input format. This was fine when there was a 1-1 mapping of input formats to output formats, but now that we have 2 possible outputs for an enex input, we need to be explicit with which importer we want to use.

It might make sense to simply move all the existing formatters to the newModuleFromPath approach, so that there's only one way to do this mapping. Would you like me to do that? If so, I'd prefer to do that refactoring in a separate PR since this one is already quite large.

This comment has been minimized.

Copy link
@devonzuegel

devonzuegel Sep 8, 2019

Author Contributor

Btw I pushed a commit to update the existing function names to make this slightly more clear in the meantime.

This comment has been minimized.

Copy link
@laurent22

laurent22 Sep 9, 2019

Owner

Ok that makes sense. No need to refactor for now, it can be done if it ever becomes necessary. If you could add the explanation above as a comment for now that would be great.


const attachmentElement = ({src, attributes, id}) =>
[
`<a href='#' ${attributesToStr(attributes)} ${ipcProxySendToHost(id)}>`,

This comment has been minimized.

Copy link
@laurent22

laurent22 Aug 29, 2019

Owner

What does this code do? Is it to load audio files?

This comment has been minimized.

Copy link
@devonzuegel

devonzuegel Sep 8, 2019

Author Contributor

Given the source for a file whose type is otherwise not supported (unlike img, audio, etc), it's a helper to create an <a /> link to the file in the HTML.

@laurent22

This comment has been minimized.

Copy link
Owner

commented Aug 29, 2019

Otherwise I've checked the code and it's pretty much ready to merge. If you could just check the few comments I left that would be great.

@devonzuegel

This comment has been minimized.

Copy link
Contributor Author

commented Aug 30, 2019

No worries on the delay! Thank you for the review, and I hope you had a wonderful vacation.

Instead of relying on the clean-html module, which adds a new dependency, would you prefer I implement something more lean like this to obviate that additional dependency?

I don't quite understand the need for HTML cleaning. The Evernote HTML should already be valid, shouldn't it?

Yeah, Evernote HTML is valid, but (1) the default whitespace is ugly and hard to read and (2) even though it is technically valid, sometimes the poor rendering of whitespace means that the Markdown renderer reads some overly-indented lines as a code block when they should not be.

To illustrate point 1, that the cleaning is so that the resulting HTML is maximally nice looking for the viewer:

Before cleaning:
After cleaning:

The delta is easier to see with an external editor... Here's the same file opened in VSCode:

Before cleaning:

After cleaning:

This isn't even a particularly degenerate case! Often, the resulting valid HMTL pre-cleaning is absolutely nasty.

@laurent22

This comment has been minimized.

Copy link
Owner

commented Sep 4, 2019

Thanks for clarifying and indeed it makes sense, so we can keep this html cleaning module.

In a recent commit, I've also added html-minifier, mostly to reduce the size of imported clipped pages. Do you think you can use this one to clean the Evernote notes? I guess the purpose of these modules is a bit different though, so if not possible, it's fine to keep the clean-html module.

Also when creating these notes, I think it would be better to set the markup_language property to MARKUP_LANGUAGE_HTML. This will tell Joplin that the content is only HTML (no Markdown) and make the rendering faster and more accurate. For example, it won't wrongly interpret indented HTML as a code block. This has been done for example when importing clipped content:

output.markup_language = Note.MARKUP_LANGUAGE_HTML;

@devonzuegel

This comment has been minimized.

Copy link
Contributor Author

commented Sep 8, 2019

In a recent commit, I've also added html-minifier, mostly to reduce the size of imported clipped pages. Do you think you can use this one to clean the Evernote notes? I guess the purpose of these modules is a bit different though, so if not possible, it's fine to keep the clean-html module.

From a look through the docs, the minifier seems only to collapse the HTML, rather than displaying it as a nicely-indented, pretty-formatted output. Unless I'm missing some function that reverse-minifies, in which case I'm happy to use it!

Also when creating these notes, I think it would be better to set the markup_language property to MARKUP_LANGUAGE_HTML. This will tell Joplin that the content is only HTML (no Markdown) and make the rendering faster and more accurate. For example, it won't wrongly interpret indented HTML as a code block. This has been done for example when importing clipped content:

Oh that sounds great! Can you point me to how to do it? Since the importer just receives the final string, I'm not sure where to set he markup_language property (or should I be setting the convert_to property?). Is it somewhere in InteropService.js?

@laurent22

This comment has been minimized.

Copy link
Owner

commented Sep 8, 2019

From a look through the docs, the minifier seems only to collapse the HTML, rather than displaying it as a nicely-indented, pretty-formatted output. Unless I'm missing some function that reverse-minifies, in which case I'm happy to use it!

Right I wasn't sure, but indeed they don't really do the same thing, so let's keep the clean-html module you've added then.

Oh that sounds great! Can you point me to how to do it? Since the importer just receives the final string, I'm not sure where to set he markup_language property (or should I be setting the convert_to property?). Is it somewhere in InteropService.js?

markup_language is a property of a note, so you'd need to set it where they are created, in importEnex. The way I'd do it is for example here: https://github.com/laurent22/joplin/pull/1795/files#diff-18eba34ac5ac6744001f6fe640eb05edL218 As you've done for the note body, you could check if importOptions.outputFormat === 'html' and if it is, set markup_language to MARKUP_LANGUAGE_HTML.

Also is it possible to add a comment to explain what newModuleFromPath_ does and why it is needed? Also the other function I wasn't sure about was attachmentElement - if you could add a comment next to it too that would be great.

@devonzuegel

This comment has been minimized.

Copy link
Contributor Author

commented Sep 15, 2019

Sweet, I've incorporated those two pieces of feedback. Thanks for the help!

Please do check that I handled MARKUP_LANGUAGE_HTML as you intended, and lmk if there's anything else before this is ready to merge. 😊

@laurent22

This comment has been minimized.

Copy link
Owner

commented Sep 16, 2019

I've tested and it's all looking good. HTML and resources are imported fine and the HTML mode also seems to be working.

There's just the package-lock conflict that needs to be fixed. You could just delete the file actually and recreate it by running npm i on the CliClient dir. Once this is fixed we're good to merge.

@devonzuegel

This comment has been minimized.

Copy link
Contributor Author

commented Sep 17, 2019

Hm when I do that, it seems to create more of a mess. I've gone in and resolved the conflict by hand, but lmk if that looks wrong.

@laurent22 laurent22 merged commit 2f14832 into laurent22:master Sep 17, 2019

@laurent22

This comment has been minimized.

Copy link
Owner

commented Sep 17, 2019

I'm going to merge and to be sure I'll rebuilt package-lock once it's on master.

In any case thanks a lot for your great work @devonzuegel!

@devonzuegel devonzuegel deleted the devonzuegel:messy/evernote-html-export branch Sep 18, 2019

@devonzuegel

This comment has been minimized.

Copy link
Contributor Author

commented Sep 19, 2019

What is the process for cutting a new build for the desktop Electron app? (Again, happy to add some docs about this if helpful, once I know how it works!)

@laurent22

This comment has been minimized.

Copy link
Owner

commented Sep 19, 2019

The steps to build a release is on the Travis file: https://github.com/laurent22/joplin/blob/master/.travis.yml#L55 Normally if you run these commands you should get the macOS package in the "dist" folder.

laurent22 added a commit that referenced this pull request Sep 20, 2019
Revert "Desktop, Cli: Fixed interop service so that it still allow au…
…to-detecting importer based on format (required for Cli and for test units)"

Reverting PR #1795 due to broken MD import and other issues

This reverts commit 558b644.
laurent22 added a commit that referenced this pull request Sep 20, 2019
Revert "Desktop: Add ENEX to HTML export (#1795)"
This reverts commit 2f14832.

Reverting PR #1795 due to broken MD import and other issues
@laurent22

This comment has been minimized.

Copy link
Owner

commented Sep 20, 2019

@devonzuegel, unfortunately I had to temporarily revert this PR due to a few issues. I tried to fix some of them yesterday and I thought I was done, but today I found something else and it's harder to fix. I don't want this to block the release of the app, so that's why I'm taking it out but happy to merge it back again once it's fixed.

The first issue was that the app normally relies on auto-detection of the format based on the extension, but this capability was removed with the addition of newModuleByFormat_. The CLI app import was broken because of this, as well as the test units. I've tried to fix this here: 558b644

The second issue is a slightly bigger one and I haven't tested everything but in particular the feature to import MD directory is broken on the desktop app (and probably CLI). This is also due to newModuleByFormat_ which sets the module metadata to the function options, and not to the metadata. In particular, the fileExtensions array is missing and probably other properties. You've actually spotted this issue as you've added a TODO! Basically setMetdadata needs to be given the metadata defined in the importModules or exportModules array. It needs to come directly from that array, so that we can add more properties if needed.

I feel the simplest way would be to go back to using the newModule_() function so that we preserve the capability to auto-detect the module based on the extension. And you can add an option to disambiguate when an extension matches more than one module. That's a bit what I've started doing in my commit.

This commit reverts my changes: c7c57ab and this one reverts yours: 50b66cc

Sorry about that, I wish I had spotted all this before. I guess one issue is that we don't have any unit testing for the front-end so it's easy to miss these issues.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.