-
-
Notifications
You must be signed in to change notification settings - Fork 294
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
implementing search-with-markers #777
Conversation
bookmark URL with comma-separated tags | ||
(prepend tags with '+' or '-' to use fetched tags) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…Forgot to update readme in #775 😅
@@ -72,13 +73,19 @@ SKIP_MIMES = {'.pdf', '.txt'} | |||
PROMPTMSG = 'buku (? for help): ' # Prompt message string | |||
|
|||
strip_delim = lambda s, delim=DELIM, sub=' ': str(s).replace(delim, sub) | |||
taglist = lambda ss: sorted(s.lower() for s in set(ss) if s) | |||
taglist = lambda ss: sorted(set(s.lower().strip() for s in ss if (s or '').strip())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ensuring taglist()
works correctly with unstripped tags
like_escape = lambda s, c='`': s.replace(c, c+c).replace('_', c+'_').replace('%', c+'%') | ||
split_by_marker = lambda s: re.split(r'\s+(?=[.:>#*])', s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is basically how bukubrow does it
return self.value != other and ((self.value < other) == self.ascending) | ||
|
||
def __repr__(self): | ||
return ('+' if self.ascending else '-') + repr(self.value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used to apply a sort direction to each field individually (when sorting in Python)
valid = list(names) + list(names.values()) + ['tags'] | ||
_fields = [(re.sub(r'^[+-]', '', s), not s.startswith('-')) for s in (fields or [])] | ||
_fields = [(names.get(field, field), direction) for field, direction in _fields if field in valid] | ||
return _fields or [('id', True)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parsing and validating an ordering description, converting it into a simple format for internal use.
text_fields = (set() if not ignore_case else {'url', 'desc', 'title', 'tags'}) | ||
get = lambda x, k: (getattr(x, k) if not ignore_case or k not in text_fields else str(getattr(x, k) or '').lower()) | ||
order = self._ordering(fields, for_db=False) | ||
return sorted(bookmark_vars(records), key=lambda x: [SortKey(get(x, k), ascending=asc) for k, asc in order]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorting already fetched data… is not actually used anywhere in the code, but it might come in handy anytime.
"""Converts field list to SQL 'ORDER BY' parameters. (See also BukuDb._ordering().)""" | ||
text_fields = (set() if not ignore_case else {'url', 'desc', 'metadata', 'tags'}) | ||
return ', '.join(f'{field if field not in text_fields else "LOWER("+field+")"} {"ASC" if direction else "DESC"}' | ||
for field, direction in self._ordering(fields)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generates a list of ORDER BY
SQL clauses
|
||
Returns | ||
------- | ||
list | ||
A list of tuples representing bookmark records. | ||
""" | ||
|
||
return self._fetch('SELECT * FROM bookmarks', lock=lock) | ||
return self._fetch(f'SELECT * FROM bookmarks ORDER BY {self._order(order)}', lock=lock) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Default behaviour (when order
is not specified) won't be affected here
if not s: | ||
return [] | ||
tags = ([s] if regex and not markers else taglist(s.split(DELIM))) | ||
return [('metadata', deep, s), ('url', deep, s), ('desc', deep, s)] + (tags and [('tags', deep, *tags)]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Each token includes the field name, its own deep
value (to account for special tags behaviour), and one or more parameter (again, to account for tags since they cannot include ,
).
Note that for a token to be matched, all of its parameters must be matched (i.e. when searching for foo,bar,baz
in tags, all three tags must be matched for the token to match).
And for the keyword to match, any of its tokens must be matched.
param = border(field, param[0]) + re.escape(param) + border(field, param[-1]) | ||
args += [param] | ||
clauses += (_clauses if len(_clauses) < 2 else [f'({" AND ".join(_clauses)})']) | ||
return ' OR '.join(clauses), args |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is based on the original implementation of searchdb()
; regex=True
means that deep
is ignored and only one param
is expected in a token (since comma-splitting is not done in regex mode), and otherwise we rely on \b
to determine the word edge (or ,
when working with tags).
Note that I removed stripping /
from the end of the string, since it's easy to not include a slash for the user if it's not necessary, but if it's stripped then it's impossible to actually search for it correctly (e.g. specifying .com/
in URL search may not produce the desired results).
q0 = q0[:-3] + ' AS score FROM bookmarks WHERE score > 0 ORDER BY score DESC)' | ||
query = ('SELECT id, url, metadata, tags, desc, flags\nFROM (SELECT *, (' + | ||
'\n + '.join(map(_count, clauses)) + | ||
f') AS score\n FROM bookmarks WHERE score > 0 ORDER BY score DESC, {_order})') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The query-generating code itself is much more compact now.
(As for the \n
s, they make no difference to the DB engine, but reading logged queries is easier when they're formatted.)
q0 = q0[:-3] + ' AS score FROM bookmarks WHERE score > 0 ORDER BY score DESC)' | ||
query = ('SELECT id, url, metadata, tags, desc, flags\nFROM (SELECT *, (' + | ||
'\n + '.join(map(_count, clauses)) + | ||
f') AS score\n FROM bookmarks WHERE score > 0 ORDER BY score DESC, {_order})') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that since score
is the first clause of ORDER BY
, the order
value only affects search results that have the same "score".
query = ('SELECT id, url, metadata, tags, desc, flags FROM bookmarks WHERE (' + | ||
f' {search_operator} '.join("tags LIKE '%' || ? || '%'" for tag in qargs) + | ||
')' + ('' if not excluded_tags else ' AND tags NOT REGEXP ?') + | ||
f' ORDER BY {_order}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Other than replacing the ORDER BY
clause, the changes here don't really affect the produced SQL.
all_keywords: bool = False, | ||
deep: bool = False, | ||
regex: bool = False, | ||
stag: Optional[List[str]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding default values allows using these as keyword parameters.
(…And yes, stag
is not actually a string here from what I could tell – the code even uses .join()
to convert it into a string 😅)
"""Search bookmarks for entries with keywords and specified | ||
criteria while filtering out entries with matching tags. | ||
|
||
Parameters | ||
---------- | ||
keywords : list of str | ||
Keywords to search. | ||
without : list of str | ||
Keywords to exclude; ignored if empty. Default is None. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding this parameter simplifies search invocation
|
||
count = 0 | ||
out = '' | ||
if export_type == 'markdown': | ||
for row in resultset: | ||
out += '- [' + title(row, 'Untitled') + '](' + row.url + ')' | ||
_title = title(row) | ||
out += (f'- <{row.url}>' if not _title else f'- [{_title}]({row.url})') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we can handle <raw URLs>
as input, it seemed like a good idea to output them as well – thus allowing to reimport the same exact data that was exported (…sans descriptions, I suppose)
if row.tags: | ||
out += ' TAGS="' + html.escape(row.tags).encode('ascii', 'xmlcharrefreplace').decode('utf-8') + '"' | ||
out += '>\n<title>{}</title>\n</bookmark>\n'\ | ||
out += '>\n <title>{}</title>\n </bookmark>\n'\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using the same formatting as in the HTML export (…mostly for sake of appearances)
num : int | ||
Number of results to show per page. Default is 10. | ||
""" | ||
|
||
if not isinstance(obj, BukuDb): | ||
LOGERR('Not a BukuDb instance') | ||
return | ||
bdb = obj |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name is just less confusing
@@ -4623,28 +4694,31 @@ def prompt(obj, results, noninteractive=False, deep=False, listtags=False, sugge | |||
|
|||
# search ANY match with new keywords | |||
if nav.startswith('s '): | |||
results = obj.searchdb(nav[2:].split(), False, deep) | |||
keywords = (nav[2:].split() if not markers else split_by_marker(nav[2:])) | |||
results = bdb.searchdb(keywords, deep=deep, markers=markers, order=order) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Search behaviour only changes after enabling search-with-markers
else: | ||
print('Invalid input') | ||
ids and bdb.print_rec(ids, order=order) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This allows to sort the printed records according to selected order
@@ -6010,6 +6103,8 @@ POSITIONAL ARGUMENTS: | |||
elif args.unlock is not None: | |||
BukuCrypt.decrypt_file(args.unlock) | |||
|
|||
order = [s for ss in (args.order or []) for s in re.split(r'\s*,\s*', ss.strip()) if s] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spaces and/or commas can be used to separate fields in the order description
search_opted = True | ||
tags_search = bool(args.stag is not None and len(args.stag)) | ||
exclude_results = bool(args.exclude is not None and len(args.exclude)) | ||
search_results, search_opted = None, True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"not None
and not empty" is how truthiness is defined for lists anyway
else: | ||
LOGERR('no keyword') | ||
search_results = bdb.search_keywords_and_filter_by_tags( | ||
args.sany, deep=args.deep, stag=args.stag, markers=args.markers, without=args.exclude, order=order) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…Yes, all of that got replaced by a single call (which still does the same)
@@ -6273,7 +6299,7 @@ POSITIONAL ARGUMENTS: | |||
|
|||
if args.json is None and not args.format: | |||
num = 10 if not args.count else args.count | |||
prompt(bdb, search_results, oneshot, args.deep, num=num) | |||
prompt(bdb, search_results, noninteractive=oneshot, deep=args.deep, markers=args.markers, order=order, num=num) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using keyword params explicitly
@media (min-width: 1200px) { | ||
.filters .filter-op {width: 280px !important} | ||
.filters .filter-val {width: 595px !important} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For some reason, flask-admin
sets style="width: …"
on all dropdowns in the filters (matching whatever the current width happens to be); thus the need for !important
, and for reiterating the default width.
adder.onclick = () => { | ||
try { | ||
let key = bukuFilters().first().val() || 'buku_search_markers_match_all'; | ||
setTimeout(() => sync(key)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When a buku
filter is added, it's automatically matched to already existing ones (defaulting to the mode used in navbar search).
let _value = filterInput(this).val(); | ||
$(this).val(evt.val).triggerHandler('change', '$norecur$'); | ||
filterInput(this).val(_value); // retaining the last value for other filters | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When one of the buku
filter modes is changed, all others get updated as well (and their keywords are restored)
{% endblock %} | ||
|
||
{% block menu_links %} | ||
{{ super() }} | ||
<form class="navbar-form navbar-right" action="{{ url_for('bookmark.index_view') }}" method="GET"> | ||
<form class="navbar-form navbar-right" action="{{ url_for('admin.search') }}" method="POST"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Search splitting is done in the admin.search
handler, thus replacing the action=
value.
@@ -4,14 +4,16 @@ | |||
{% block head %} | |||
{{ super() }} | |||
{{ buku.close_if_popup() }} | |||
{{ buku.focus('form[action="/"]') }} | |||
{{ buku.focus('main form[action="/"]') }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both forms have the same action
now, thus specifying which one we want focused on page load.
this.title = this.title.replace(/'(.*?)'/g, `'<strong><tt>$1</tt></strong>'`).replace(/\bFULL\b/g, `<strong><em>full</em></strong>`); | ||
}).attr('data-container', 'body').attr('data-trigger', 'hover').tooltip(); | ||
</script> | ||
<style>.tooltip-inner {text-align: left; white-space: pre; max-width: 600px}</style> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Enabling (and prettifying) Bootstrap tooltips
deep, all_keywords = (x and not regex for x in [form.deep.data, form.all_keywords.data]) | ||
flt = bs_filters.BookmarkBukuFilter(deep=deep, regex=regex, markers=markers, all_keywords=all_keywords) | ||
vals = ([('', form.keyword.data)] if not markers else enumerate(buku.split_by_marker(form.keyword.data))) | ||
url = url_for('bookmark.index_view', **{filter_key(flt, idx): val for idx, val in vals}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Old behaviour without markers
, otherwise splitting into multiple keywords based on detected markers.
(…Also I removed combinations of regex
with deep
/all_keywords
since those do nothing 😅)
(1, "http://example.org", None, ",", "", 0), | ||
(2, "http://google.com", "Google", ",", "", 0), | ||
(2, "http://example.org", None, ",bar,baz,foo,", "", 0), | ||
(3, "http://google.com", "Google", ",bar,baz,foo,", "", 0), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding tags to test their export as well. (…Also fixing indices since these wouldn't be encountered in actual data)
assert split_by_marker(search_string) == [ | ||
' global substring', '.title substring', ':url substring', ':https', | ||
'> description substring', '#partial,tags:', '#,exact,tags,', '*another global substring ', | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The split condition is “one or multiple spaces followed by a marker” (with the marker being retained)
assert SortKey('foo', ascending=False) < SortKey('bar', ascending=False) | ||
assert not SortKey('foo', ascending=False) < SortKey('foo', ascending=False) # pylint: disable=unnecessary-negation | ||
assert not SortKey('foo', ascending=False) > SortKey('foo', ascending=False) # pylint: disable=unnecessary-negation | ||
assert not SortKey('foo', ascending=False) > SortKey('bar', ascending=False) # pylint: disable=unnecessary-negation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Testing <
/>
specifically here
assert not SortKey('foo', ascending=False) > SortKey('bar', ascending=False) # pylint: disable=unnecessary-negation | ||
|
||
custom_order = lambda s: (SortKey(len(s), ascending=False), SortKey(s, ascending=True)) | ||
assert sorted(['foo', 'bar', 'baz', 'quux'], key=custom_order) == ['quux', 'bar', 'baz', 'foo'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
“Sort longer strings first, ordering lexicographically when they have the same length”
" OR (tags LIKE ('%' || ? || '%') AND tags LIKE ('%' || ? || '%') AND tags LIKE ('%' || ? || '%'))"), | ||
]) | ||
def test_search_clause(bukuDb, regex, tokens, args, clauses): | ||
assert bukuDb()._search_clause(tokens, regex=regex) == (clauses, args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoroughly testing search SQL generation
ff92356
to
a4c5de9
Compare
Thank you! |
Please update the ToDO list tracker. Also, is the |
Ah, no; forgot about that. …Say; are the completion lists meant to be sorted? I think the last few options that were added to them are misplaced if that's the case 😅 |
fixes #740:
.
for title,>
for description,:
for URL,#
for tags;*
or no prefix for all fields)--markers
), interactive shell (m
), bukuserverbuku
filter mode (markers*
), bukuserver index page/navbar searchalso:
--order
), interactive shell (v
… more fitting letters are taken already 😅), bukuserver bookmarks filter (order
); + aBukuDb
method for sorting already fetched bookmarks (with matching behaviour)buku
filter modes are synchronized (without resetting them)Note: multiple order filters can be affected by an upstream bug when edited.
Screenshots
CLI
("sort records by tags in ascending order, and those with the same tags by URL in descending order")
buku --order +tags,-url --print | less
buku --order +tags,-url --print --json | less
(“search for
bar
in titles and.foo/
in URLs”)export
interactive shell
webUI – index page
(these are default parameters now since the markers don't really interfere with most searches unless you actually include them)
(navbar search applies the same defaults so it's the same search as above)
webUI – bookmarks
(this is the search that either of these forms will submit when using the query
. bar :.foo/
)webUI – filters width
(these are based on primary
@media (min-width: *)
values used by Bootstrap)(the last one is the default width which is used as the fallback value when none of the above constraints are met)