improved markdown & orgmode parsing #766
if newtag:
    if newtag.lower() not in tags:
        tags_string = (newtag + DELIM) + tags_string
tags = list(dict.fromkeys(get_org_tags(match.group('tags') or '')))
@@ -3670,34 +3651,24 @@ def import_org(filepath: str, newtag: Optional[str]):
        tag_list_cleaned.append(tag.strip())
    return tag_list_cleaned


# Supported OrgMode format: `[[url][title]] :tags:` (or `[[url]] :tags:`)
_url, _maybe_title = r'(?P<url>((?!\]\[).)+?)', r'(\]\[(?P<title>.+))?'
Regex for URL means "any string not containing `][`".
    ('foo, bar, baz', None, ',bar,baz,foo,'),
    ('foo, bar, baz', 'new tag', ',bar,baz,foo,new tag,'),
])
@pytest.mark.parametrize('title', ['Bookmark title', '', None])
[Bookmark title](…), [](…) & <…>
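The three parametrized titles correspond to the three Markdown shapes listed in that comment. Below is a minimal sketch of a pattern that accepts all three, plus the optional `<!-- TAGS: ... -->` comment that the test appends. `MD_LINE` and `parse_md_line` are hypothetical names, and the pattern is an assumption rather than buku's own.

```python
import re

# Hypothetical pattern: `[title](url)` (title may be empty) or `<url>`,
# optionally followed by the `<!-- TAGS: ... -->` comment written by the test.
MD_LINE = re.compile(
    r'(?:\[(?P<title>[^\]]*)\]\((?P<url>[^)\s]+)\)'  # [title](url) or [](url)
    r'|<(?P<bare_url>[^>\s]+)>)'                     # <url>, no title at all
    r'(?:\s*<!-- TAGS: (?P<tags>.*?) -->)?\s*$'      # optional tags comment
)


def parse_md_line(line):
    """Return (url, title, tags) for a matching line, else None."""
    m = MD_LINE.match(line.strip())
    if m is None:
        return None
    url = m.group('url') or m.group('bare_url')
    # title is '' for `[](url)` and None for `<url>`, like the test's title values
    return url, m.group('title'), m.group('tags')
```

Keeping `''` and `None` distinct lets a caller tell an explicitly empty title apart from a link that never had one.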
    ('tag1: ::tag2:tag::3:tag4:: :tag:::5: ta g::6:: ', None, ',tag1,:tag2,tag:3,tag4:,tag::5,ta g:6:,'),
    ('tag1: ::tag2:tag::3:tag4:: :tag:::5: ta g::6:: ', 'new tag', ',new tag,tag1,:tag2,tag:3,tag4:,tag::5,ta g:6:,'),
])
@pytest.mark.parametrize('title', ['Bookmark title', '', None])
[[…][Bookmark title]], [[…][]] & [[…]]
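Unlike the Markdown cases, the expected OrgMode strings keep the tags in their original order and put `new tag` first. That is consistent with the `list(dict.fromkeys(...))` idiom in the diff: dictionary keys keep insertion order (guaranteed since Python 3.7), so repeats are dropped without reordering. This minimal sketch assumes the tags have already been split (the `::` handling inside `get_org_tags` is not reproduced) and that `DELIM` is `','`. `merge_org_tags` is a hypothetical name.

```python
DELIM = ','  # assumption: the separator used in the expected test strings


def merge_org_tags(tags, newtag=None):
    """Hypothetical helper: prepend newtag, drop repeats, keep first-seen order."""
    candidates = ([newtag] if newtag else []) + list(tags)
    # dict keys preserve insertion order, so later duplicates are dropped in place
    ordered = list(dict.fromkeys(candidates))
    if not ordered:
        return DELIM  # assumption: an empty tag set is written as a bare delimiter
    return DELIM + DELIM.join(ordered) + DELIM
```

Fed the already-split tags from the test row, this reproduces both expected strings, e.g. `',new tag,tag1,:tag2,tag:3,tag4:,tag::5,ta g:6:,'` when `newtag='new tag'`.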
from buku import import_md

p = tmpdir.mkdir("importmd").join("test.md")
p.write("[text1](http://example.com)")
print(line := (f'<{url}>' if title is None else f'[{title}]({url})') +
      ('' if not tags else f' <!-- TAGS: {tags} -->'))
Printing out the line to be parsed makes it easier to figure out what went wrong when a test fails.
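The expression inside `print(line := ...)` can be lifted into a standalone helper to show the exact lines the test feeds to the parser. This is a sketch: `make_md_line` is a hypothetical name, and `http://example.com` is only sample input.

```python
def make_md_line(url, title=None, tags=None):
    """Build one Markdown bookmark line the same way the test expression does."""
    link = f'<{url}>' if title is None else f'[{title}]({url})'
    return link + ('' if not tags else f' <!-- TAGS: {tags} -->')


for title in ['Bookmark title', '', None]:
    print(make_md_line('http://example.com', title, 'foo, bar, baz'))
```

For the three parametrized titles, this prints a `[title](url)` line, a `[](url)` line and a `<url>` line, each followed by the tags comment.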
parse_tags([tags])
tags = DELIM.join(s for s in [newtag, match.group('tags')] if s)
tags = parse_tags([tags])
…Smh the output of parse_tags() was ignored before 😅
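The bug is the old bare `parse_tags([tags])` call. Its return value was never assigned, so the normalized result was discarded. The sketch below shows the fixed flow. `normalize_tags` is a hypothetical stand-in: it takes a single string rather than a list, as `parse_tags` does. It is written to reproduce the expected Markdown strings `',bar,baz,foo,'` and `',bar,baz,foo,new tag,'` from the test table. buku's real `parse_tags` may differ in details such as case handling.

```python
DELIM = ','  # assumption: the separator used in the expected test strings


def normalize_tags(tags_string):
    """Hypothetical parse_tags stand-in: split, strip, dedupe, sort, wrap."""
    tags = sorted({t.strip() for t in tags_string.split(DELIM) if t.strip()})
    return DELIM + DELIM.join(tags) + DELIM if tags else DELIM


def md_tags(newtag, parsed_tags):
    """Combine an optional extra tag with the tags parsed from a line."""
    tags = DELIM.join(s for s in [newtag, parsed_tags] if s)
    # Keep the return value; discarding it was the bug in the old code.
    tags = normalize_tags(tags)
    return tags
```

Because the result is sorted, `new tag` lands after `foo` instead of being prepended, matching the Markdown rows of the test table.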
Nice improvement, thank you!
When trying to import a generated DB file, I noticed some irregularities. I worked around the issue by converting the file to Markdown format, but I still ended up making a few improvements to the parsing code, namely:
- <url> / [[url]] (links without a title)
- Markdown import
- OrgMode import (unlike in Markdown, empty titles such as [[url][]] are explicitly invalid in OrgMode)