implementing search-with-markers #777

LeXofLeviafan · 2024-09-06T00:41:51Z

fixes #740:

adding an option to enable "search-with-markers", wherein each keyword can be applied to a specific field instead of all fields (based on keyword prefix: . for title, > for description, : for URL, # for tags; * or no prefix for all fields)
implementing support for it via CLI (--markers), interactive shell (m), bukuserver buku filter mode (markers*), bukuserver index page/navbar search
updating user manuals accordingly

also:

adding support for specifying the bookmarks output sorting order (as appropriate)
implementing support for it via CLI (--order), interactive shell (v… more fitting letters are taken already 😅), bukuserver bookmarks filter (order); + a BukuDb method for sorting already fetched bookmarks (with matching behaviour)
fixing the filters form width in bukuserver, and ensuring all buku filter modes are synchronized (without resetting them)
updating the Markdown/OrgMode export of untitled bookmarks to match the recent import improvements

Note: multiple order filters can be affected by an upstream bug when edited.

Screenshots

CLI

("sort records by tags in ascending order, and those with the same tags by URL in descending order")

buku --order +tags,-url --print | less

buku --order +tags,-url --print --json | less

(“search for bar in titles and .foo/ in URLs”)

buku --np --order +tags,-url --markers --sall '. bar' :.foo/ | less

export

buku --order +tags,-url --markers --sall '. bar' :.foo/ --export generated.md && less generated.md

interactive shell

v title, -desc, +id
p 1-10

m
v tags,-url
S . bar :.foo/

webUI – index page

(these are default parameters now since the markers don't really interfere with most searches unless you actually include them)

(navbar search applies the same defaults so it's the same search as above)

webUI – bookmarks

(this is the search that either of these forms will submit when using the query . bar :.foo/)

webUI – filters width

(these are based on primary @media (min-width: *) values used by Bootstrap)

(the last one is the default width which is used as the fallback value when none of the above constraints are met)

LeXofLeviafan · 2024-09-06T00:42:44Z

README.md

                           bookmark URL with comma-separated tags
+                           (prepend tags with '+' or '-' to use fetched tags)


…Forgot to update readme in #775 😅

LeXofLeviafan · 2024-09-06T00:45:56Z

buku

@@ -72,13 +73,19 @@ SKIP_MIMES = {'.pdf', '.txt'}
 PROMPTMSG = 'buku (? for help): '  # Prompt message string

 strip_delim = lambda s, delim=DELIM, sub=' ': str(s).replace(delim, sub)
-taglist = lambda ss: sorted(s.lower() for s in set(ss) if s)
+taglist = lambda ss: sorted(set(s.lower().strip() for s in ss if (s or '').strip()))


Ensuring taglist() works correctly with unstripped tags
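
To illustrate the difference, here is a small standalone comparison of the two lambdas from the diff. The names `taglist_old` and `taglist_new` and the sample input are made up for this sketch only.

```python
# Old and new definitions copied from the diff, renamed for side-by-side comparison.
taglist_old = lambda ss: sorted(s.lower() for s in set(ss) if s)
taglist_new = lambda ss: sorted(set(s.lower().strip() for s in ss if (s or '').strip()))

raw = [' Foo', 'foo ', 'BAR', '  ', '', None]  # made-up input with padding and blanks

print(taglist_old(raw))  # padded duplicates and the whitespace-only entry survive
print(taglist_new(raw))  # stripped, lowercased, deduplicated
```

With the new definition, stripping happens before deduplication, so ` Foo` and `foo ` collapse into a single `foo` tag.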

LeXofLeviafan · 2024-09-06T00:46:21Z

buku

 like_escape = lambda s, c='`': s.replace(c, c+c).replace('_', c+'_').replace('%', c+'%')
+split_by_marker = lambda s: re.split(r'\s+(?=[.:>#*])', s)


This is basically how bukubrow does it
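
As a rough illustration of how the split keywords could then be routed to fields, here is a minimal sketch. `split_by_marker` is copied from the diff; `MARKER_FIELDS` and `parse_keyword` are hypothetical names, and the prefix table follows the PR summary rather than the actual implementation.

```python
import re

# Copied from the diff.
split_by_marker = lambda s: re.split(r'\s+(?=[.:>#*])', s)

# Prefix-to-field mapping as described in the PR summary; the dict name and
# the field labels are assumptions made for this sketch.
MARKER_FIELDS = {'.': 'title', '>': 'description', ':': 'url', '#': 'tags', '*': 'all'}

def parse_keyword(keyword):
    """Return (field, text) for one keyword; no prefix means all fields."""
    if keyword[:1] in MARKER_FIELDS:
        return MARKER_FIELDS[keyword[0]], keyword[1:]
    return 'all', keyword

parsed = [parse_keyword(k) for k in split_by_marker('. bar :.foo/')]
print(parsed)  # [('title', ' bar'), ('url', '.foo/')]
```

The sketch leaves the whitespace after the marker untouched; whether the real search strips it is not visible in this hunk.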

LeXofLeviafan · 2024-09-06T00:48:28Z

buku

+        return self.value != other and ((self.value < other) == self.ascending)
+
+    def __repr__(self):
+        return ('+' if self.ascending else '-') + repr(self.value)


This is used to apply a sort direction to each field individually (when sorting in Python)
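
For context, a minimal reconstruction of such a wrapper might look like the following. Only the `__lt__` expression and `__repr__` are taken from the diff; the constructor, `__eq__`, the `_unwrap` helper and the use of `functools.total_ordering` are assumptions.

```python
from functools import total_ordering

@total_ordering
class SortKey:
    """Wraps a value so each sort field can carry its own direction."""

    def __init__(self, value, ascending=True):
        self.value, self.ascending = value, ascending

    @staticmethod
    def _unwrap(other):
        return other.value if isinstance(other, SortKey) else other

    def __eq__(self, other):
        return self.value == self._unwrap(other)

    def __lt__(self, other):
        other = self._unwrap(other)
        # "less than" is flipped for descending keys; equal values never compare as less
        return self.value != other and ((self.value < other) == self.ascending)

    def __repr__(self):
        return ('+' if self.ascending else '-') + repr(self.value)

by_length_desc_then_alpha = lambda s: (SortKey(len(s), ascending=False), SortKey(s))
result = sorted(['foo', 'bar', 'baz', 'quux'], key=by_length_desc_then_alpha)
print(result)  # ['quux', 'bar', 'baz', 'foo']
```

Wrapping each component separately is what lets a single `sorted()` call mix ascending and descending fields, which a plain `reverse=True` cannot do.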

LeXofLeviafan · 2024-09-06T00:49:50Z

buku

+        valid = list(names) + list(names.values()) + ['tags']
+        _fields = [(re.sub(r'^[+-]', '', s), not s.startswith('-')) for s in (fields or [])]
+        _fields = [(names.get(field, field), direction) for field, direction in _fields if field in valid]
+        return _fields or [('id', True)]


Parsing and validating an ordering description, converting it into a simple format for internal use.
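
A standalone sketch of that conversion, assuming a made-up alias table (the real `names` mapping is not visible in this hunk, so `NAMES` and the function name `parse_ordering` are hypothetical):

```python
import re

# Hypothetical alias table; the real mapping is not visible in this hunk.
NAMES = {'index': 'id', 'uri': 'url', 'description': 'desc', 'title': 'metadata'}

def parse_ordering(fields, names=NAMES):
    """Turn ['+tags', '-url'] into [('tags', True), ('url', False)]."""
    valid = list(names) + list(names.values()) + ['tags']
    # strip the optional sign; only a leading '-' means descending
    pairs = [(re.sub(r'^[+-]', '', s), not s.startswith('-')) for s in (fields or [])]
    # resolve aliases and silently drop unknown fields
    pairs = [(names.get(f, f), asc) for f, asc in pairs if f in valid]
    return pairs or [('id', True)]

print(parse_ordering(['+tags', '-url']))    # [('tags', True), ('url', False)]
print(parse_ordering(['title', '-bogus']))  # [('metadata', True)]
print(parse_ordering([]))                   # [('id', True)]
```

Unknown fields are dropped rather than rejected, and an empty result falls back to ascending `id`, which matches the last line of the hunk.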

LeXofLeviafan · 2024-09-06T00:51:10Z

buku

+        text_fields = (set() if not ignore_case else {'url', 'desc', 'title', 'tags'})
+        get = lambda x, k: (getattr(x, k) if not ignore_case or k not in text_fields else str(getattr(x, k) or '').lower())
+        order = self._ordering(fields, for_db=False)
+        return sorted(bookmark_vars(records), key=lambda x: [SortKey(get(x, k), ascending=asc) for k, asc in order])


Sorting already fetched data. This method is not currently used anywhere in the code, but it might come in handy.
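
For comparison, the same ordering can also be produced without a key wrapper by relying on the stability of Python's sort and doing one pass per field. This is a swapped-in technique, not the code from the PR; the `Bookmark` record and `sort_records` are simplified stand-ins.

```python
from collections import namedtuple

Bookmark = namedtuple('Bookmark', 'id url title tags')  # simplified stand-in record
TEXT_FIELDS = {'url', 'title', 'tags'}

def sort_records(records, order, ignore_case=True):
    """Sort by [(field, ascending), ...] using one stable pass per field."""
    def key_for(field):
        if ignore_case and field in TEXT_FIELDS:
            return lambda x: str(getattr(x, field) or '').lower()
        return lambda x: getattr(x, field)
    result = list(records)
    # least significant field first; stability preserves earlier passes within ties
    for field, ascending in reversed(order):
        result.sort(key=key_for(field), reverse=not ascending)
    return result

records = [
    Bookmark(1, 'http://b.example', 'B', ',x,'),
    Bookmark(2, 'http://a.example', 'A', ',x,'),
    Bookmark(3, 'http://c.example', 'C', ',a,'),
]
ordered = sort_records(records, [('tags', True), ('url', False)])
print([b.id for b in ordered])  # [3, 1, 2]
```

The multi-pass variant walks the list once per field, whereas a composite key needs a single pass; for a handful of fields either should be fine.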

LeXofLeviafan · 2024-09-06T00:53:44Z

buku

+        """Converts field list to SQL 'ORDER BY' parameters. (See also BukuDb._ordering().)"""
+        text_fields = (set() if not ignore_case else {'url', 'desc', 'metadata', 'tags'})
+        return ', '.join(f'{field if field not in text_fields else "LOWER("+field+")"} {"ASC" if direction else "DESC"}'
+                         for field, direction in self._ordering(fields))


This generates a list of ORDER BY SQL clauses
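
A minimal standalone version of that rendering step, taking the already parsed `(field, ascending)` pairs as input. The function name `order_by` and the `ignore_case=True` default are assumptions for this sketch.

```python
TEXT_FIELDS = {'url', 'desc', 'metadata', 'tags'}  # same set as in the diff

def order_by(ordering, ignore_case=True):
    """Render [(field, ascending), ...] as the body of an ORDER BY clause."""
    def column(field):
        return f'LOWER({field})' if ignore_case and field in TEXT_FIELDS else field
    return ', '.join(f'{column(f)} {"ASC" if asc else "DESC"}' for f, asc in ordering)

print(order_by([('tags', True), ('url', False)]))  # LOWER(tags) ASC, LOWER(url) DESC
print(order_by([('id', True)]))                    # id ASC
```

Because the field names are interpolated into the SQL text, they have to come from a validated list, which is what the `valid` check in `_ordering()` takes care of.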

LeXofLeviafan · 2024-09-06T00:54:20Z

buku


        Returns
        -------
        list
            A list of tuples representing bookmark records.
        """

-        return self._fetch('SELECT * FROM bookmarks', lock=lock)
+        return self._fetch(f'SELECT * FROM bookmarks ORDER BY {self._order(order)}', lock=lock)


Default behaviour (when order is not specified) won't be affected here

LeXofLeviafan · 2024-09-06T01:01:16Z

buku

+            if not s:
+                return []
+            tags = ([s] if regex and not markers else taglist(s.split(DELIM)))
+            return [('metadata', deep, s), ('url', deep, s), ('desc', deep, s)] + (tags and [('tags', deep, *tags)])


Each token includes the field name, its own deep value (to account for the special behaviour of tags), and one or more parameters (again, to account for tags, since they cannot include ,).

Note that for a token to be matched, all of its parameters must be matched (i.e. when searching for foo,bar,baz in tags, all three tags must be matched for the token to match).

And for the keyword to match, any of its tokens must be matched.
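
The any/all rule above can be sketched in plain Python as follows. This is an in-memory illustration, not the SQL that the PR generates: it always does substring matching, ignores `deep` and `regex`, and the helper names are hypothetical.

```python
DELIM = ','

def tokens_for(keyword):
    """One token per field; the tags token carries one parameter per tag."""
    if not keyword:
        return []
    tags = sorted({t.strip().lower() for t in keyword.split(DELIM) if t.strip()})
    tokens = [('metadata', keyword), ('url', keyword), ('desc', keyword)]
    return tokens + ([('tags', *tags)] if tags else [])

def token_matches(record, token):
    field, *params = token
    text = (record.get(field) or '').lower()
    return all(p.lower() in text for p in params)  # ALL parameters must match

def keyword_matches(record, keyword):
    return any(token_matches(record, t) for t in tokens_for(keyword))  # ANY token

both_tags = {'url': 'http://example.org', 'tags': ',bar,baz,foo,'}
one_tag = {'url': 'http://example.org', 'tags': ',foo,'}
print(keyword_matches(both_tags, 'foo,bar'))  # True
print(keyword_matches(one_tag, 'foo,bar'))    # False
```

The first record matches only through its tags token (both `foo` and `bar` are present), while the second one fails because `bar` is missing.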

LeXofLeviafan · 2024-09-06T01:09:36Z

buku

+                        param = border(field, param[0]) + re.escape(param) + border(field, param[-1])
+                    args += [param]
+                clauses += (_clauses if len(_clauses) < 2 else [f'({" AND ".join(_clauses)})'])
+        return ' OR '.join(clauses), args


This is based on the original implementation of searchdb(); regex=True means that deep is ignored and only one param is expected in a token (since comma-splitting is not done in regex mode), and otherwise we rely on \b to determine the word edge (or , when working with tags).

Note that I removed stripping / from the end of the string: it's easy for the user to leave out a trailing slash when it's not needed, but if it's always stripped then it's impossible to actually search for it correctly (e.g. specifying .com/ in URL search may not produce the desired results).
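
To make the word-edge logic concrete, here is a guess at how such a pattern could be assembled. The `border()` helper is not shown in this hunk, so the version below (a comma for tags, a word boundary only next to a word character) is an assumption, as is the name `word_pattern`.

```python
import re

def border(field, char):
    """Hypothetical edge helper: a comma for tags, a word boundary next to a word character."""
    if field == 'tags':
        return ','
    return r'\b' if re.match(r'\w', char) else ''

def word_pattern(field, param):
    return border(field, param[0]) + re.escape(param) + border(field, param[-1])

print(bool(re.search(word_pattern('desc', 'foo'), 'a foo b')))               # True
print(bool(re.search(word_pattern('desc', 'foo'), 'foobar')))                # False
print(bool(re.search(word_pattern('url', '.com/'), 'http://example.com/')))  # True
print(bool(re.search(word_pattern('tags', 'foo'), ',bar,foobar,')))          # False
```

Under this assumption, a keyword ending in `/` gets no trailing boundary, so `.com/` can still match a URL that ends right after the slash.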

LeXofLeviafan · 2024-09-06T01:11:54Z

buku

-            q0 = q0[:-3] + ' AS score FROM bookmarks WHERE score > 0 ORDER BY score DESC)'
+            query = ('SELECT id, url, metadata, tags, desc, flags\nFROM (SELECT *, (' +
+                     '\n    + '.join(map(_count, clauses)) +
+                     f') AS score\n  FROM bookmarks WHERE score > 0 ORDER BY score DESC, {_order})')


The query-generating code itself is much more compact now.

(As for the \ns, they make no difference to the DB engine, but reading logged queries is easier when they're formatted.)

LeXofLeviafan · 2024-09-06T01:13:28Z

buku

-            q0 = q0[:-3] + ' AS score FROM bookmarks WHERE score > 0 ORDER BY score DESC)'
+            query = ('SELECT id, url, metadata, tags, desc, flags\nFROM (SELECT *, (' +
+                     '\n    + '.join(map(_count, clauses)) +
+                     f') AS score\n  FROM bookmarks WHERE score > 0 ORDER BY score DESC, {_order})')


Note that since score is the first clause of ORDER BY, the order value only affects search results that have the same "score".
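
The tie-breaking effect is easy to reproduce with an in-memory SQLite database. The sketch below uses a reduced table and a hand-written `CASE WHEN` counter; the `_count` helper from the PR is not shown here, so that part is an assumption, and the score column is kept in the output only to make the ordering visible.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE bookmarks (id INTEGER PRIMARY KEY, url TEXT, metadata TEXT)')
conn.executemany('INSERT INTO bookmarks VALUES (?, ?, ?)', [
    (1, 'http://a.example/foo', 'bar'),    # matches both keywords -> score 2
    (2, 'http://b.example/foo', 'other'),  # matches one keyword   -> score 1
    (3, 'http://c.example/foo', 'other'),  # matches one keyword   -> score 1
    (4, 'http://d.example/', 'nothing'),   # no match              -> filtered out
])

match = "(url LIKE '%' || ? || '%' OR metadata LIKE '%' || ? || '%')"
count = f'CASE WHEN {match} THEN 1 ELSE 0 END'
query = (f'SELECT id, score FROM (SELECT id, url, ({count} + {count}) AS score FROM bookmarks) '
         'WHERE score > 0 ORDER BY score DESC, LOWER(url) DESC')
rows = conn.execute(query, ('foo', 'foo', 'bar', 'bar')).fetchall()
print(rows)  # [(1, 2), (3, 1), (2, 1)]
```

Record 1 stays on top regardless of the requested order; the descending URL order only decides between records 2 and 3, which share a score of 1.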

LeXofLeviafan · 2024-09-06T01:15:16Z

buku

+            query = ('SELECT id, url, metadata, tags, desc, flags FROM bookmarks WHERE (' +
+                     f' {search_operator} '.join("tags LIKE '%' || ? || '%'" for tag in qargs) +
+                     ')' + ('' if not excluded_tags else ' AND tags NOT REGEXP ?') +
+                     f' ORDER BY {_order}')


Other than replacing the ORDER BY clause, the changes here don't really affect the produced SQL.

LeXofLeviafan · 2024-09-06T01:17:36Z

buku

+            all_keywords: bool = False,
+            deep: bool = False,
+            regex: bool = False,
+            stag: Optional[List[str]] = None,


Adding default values allows using these as keyword parameters.

(…And yes, stag is not actually a string here from what I could tell – the code even uses .join() to convert it into a string 😅)

LeXofLeviafan · 2024-09-06T01:19:22Z

buku

        """Search bookmarks for entries with keywords and specified
        criteria while filtering out entries with matching tags.

        Parameters
        ----------
        keywords : list of str
            Keywords to search.
+        without : list of str
+            Keywords to exclude; ignored if empty. Default is None.


Adding this parameter simplifies search invocation

LeXofLeviafan · 2024-09-06T01:22:56Z

buku


    count = 0
    out = ''
    if export_type == 'markdown':
        for row in resultset:
-            out += '- [' + title(row, 'Untitled') + '](' + row.url + ')'
+            _title = title(row)
+            out += (f'- <{row.url}>' if not _title else f'- [{_title}]({row.url})')


Since we can handle <raw URLs> as input, it seemed like a good idea to output them as well, which allows reimporting exactly the same data that was exported (…sans descriptions, I suppose)
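
Pulled out of the export loop, the new line format boils down to the following (the function name `markdown_line` is made up for this sketch):

```python
def markdown_line(url, title=None):
    """Untitled bookmarks become autolinks, titled ones regular links."""
    return f'- <{url}>' if not title else f'- [{title}]({url})'

print(markdown_line('http://example.org'))           # - <http://example.org>
print(markdown_line('http://google.com', 'Google'))  # - [Google](http://google.com)
```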

LeXofLeviafan · 2024-09-06T01:23:36Z

buku

            if row.tags:
                out += ' TAGS="' + html.escape(row.tags).encode('ascii', 'xmlcharrefreplace').decode('utf-8') + '"'
-            out += '>\n<title>{}</title>\n</bookmark>\n'\
+            out += '>\n        <title>{}</title>\n    </bookmark>\n'\


Using the same formatting as in the HTML export (…mostly for sake of appearances)

LeXofLeviafan · 2024-09-06T01:24:10Z

buku

    num : int
        Number of results to show per page. Default is 10.
    """

    if not isinstance(obj, BukuDb):
        LOGERR('Not a BukuDb instance')
        return
+    bdb = obj


This name is just less confusing

LeXofLeviafan · 2024-09-06T01:25:02Z

buku

@@ -4623,28 +4694,31 @@ def prompt(obj, results, noninteractive=False, deep=False, listtags=False, sugge

        # search ANY match with new keywords
        if nav.startswith('s '):
-            results = obj.searchdb(nav[2:].split(), False, deep)
+            keywords = (nav[2:].split() if not markers else split_by_marker(nav[2:]))
+            results = bdb.searchdb(keywords, deep=deep, markers=markers, order=order)


Search behaviour only changes after enabling search-with-markers
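
A quick way to see the difference between the two modes, using the query from the screenshots. `split_by_marker` is copied from the diff; `prompt_keywords` is a hypothetical helper that mirrors the conditional in this hunk.

```python
import re

split_by_marker = lambda s: re.split(r'\s+(?=[.:>#*])', s)  # copied from the diff

def prompt_keywords(text, markers=False):
    """Hypothetical helper mirroring the keyword split in the prompt handler."""
    return split_by_marker(text) if markers else text.split()

print(prompt_keywords('. bar :.foo/'))                # ['.', 'bar', ':.foo/']
print(prompt_keywords('. bar :.foo/', markers=True))  # ['. bar', ':.foo/']
```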

LeXofLeviafan · 2024-09-06T01:26:40Z

buku

                    else:
                        print('Invalid input')
+                ids and bdb.print_rec(ids, order=order)


This allows sorting the printed records according to the selected order

LeXofLeviafan · 2024-09-06T01:27:39Z

buku

@@ -6010,6 +6103,8 @@ POSITIONAL ARGUMENTS:
    elif args.unlock is not None:
        BukuCrypt.decrypt_file(args.unlock)

+    order = [s for ss in (args.order or []) for s in re.split(r'\s*,\s*', ss.strip()) if s]


Spaces and/or commas can be used to separate fields in the order description
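
The flattening can be tried in isolation. The sketch assumes that `args.order` arrives as a list of strings (one per command-line word); `parse_order_args` is a hypothetical wrapper around the comprehension from the diff.

```python
import re

def parse_order_args(values):
    """Flatten repeated or comma-separated --order values into a list of fields."""
    return [s for ss in (values or []) for s in re.split(r'\s*,\s*', ss.strip()) if s]

print(parse_order_args(['+tags,-url']))              # ['+tags', '-url']
print(parse_order_args(['title,', '-desc', '+id']))  # ['title', '-desc', '+id']
print(parse_order_args(None))                        # []
```

Empty pieces (such as the one left by a trailing comma) are filtered out, so `title, -desc, +id` and `title -desc +id` end up as the same list.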

LeXofLeviafan · 2024-09-06T01:28:38Z

buku

-    search_opted = True
-    tags_search = bool(args.stag is not None and len(args.stag))
-    exclude_results = bool(args.exclude is not None and len(args.exclude))
+    search_results, search_opted = None, True


"not None and not empty" is how truthiness is defined for lists anyway

LeXofLeviafan · 2024-09-06T01:30:42Z

buku

-        else:
-            LOGERR('no keyword')
+            search_results = bdb.search_keywords_and_filter_by_tags(
+                args.sany, deep=args.deep, stag=args.stag, markers=args.markers, without=args.exclude, order=order)


…Yes, all of that got replaced by a single call (which still does the same)

LeXofLeviafan · 2024-09-06T01:31:08Z

buku

@@ -6273,7 +6299,7 @@ POSITIONAL ARGUMENTS:

        if args.json is None and not args.format:
            num = 10 if not args.count else args.count
-            prompt(bdb, search_results, oneshot, args.deep, num=num)
+            prompt(bdb, search_results, noninteractive=oneshot, deep=args.deep, markers=args.markers, order=order, num=num)


Using keyword params explicitly

LeXofLeviafan · 2024-09-06T01:33:49Z

bukuserver/static/bukuserver/css/list.css

+@media (min-width: 1200px) {
+  .filters .filter-op  {width: 280px !important}
+  .filters .filter-val {width: 595px !important}
+}


For some reason, flask-admin sets style="width: …" on all dropdowns in the filters (matching whatever the current width happens to be); thus the need for !important, and for reiterating the default width.

LeXofLeviafan · 2024-09-06T01:35:18Z

bukuserver/static/bukuserver/js/buku_filter.js

+  adder.onclick = () => {
+    try {
+      let key = bukuFilters().first().val() || 'buku_search_markers_match_all';
+      setTimeout(() => sync(key));


When a buku filter is added, it's automatically matched to already existing ones (defaulting to the mode used in navbar search).

LeXofLeviafan · 2024-09-06T01:36:43Z

bukuserver/static/bukuserver/js/buku_filter.js

+        let _value = filterInput(this).val();
+        $(this).val(evt.val).triggerHandler('change', '$norecur$');
+        filterInput(this).val(_value);  // retaining the last value for other filters
+      }


When one of the buku filter modes is changed, all others get updated as well (and their keywords are restored)

LeXofLeviafan · 2024-09-06T01:38:03Z

bukuserver/templates/bukuserver/home.html

 {% endblock %}

 {% block menu_links %}
 {{ super() }}
-<form class="navbar-form navbar-right" action="{{ url_for('bookmark.index_view') }}" method="GET">
+<form class="navbar-form navbar-right" action="{{ url_for('admin.search') }}" method="POST">


Search splitting is done in the admin.search handler, thus replacing the action= value.

LeXofLeviafan · 2024-09-06T01:39:12Z

bukuserver/templates/bukuserver/home.html

@@ -4,14 +4,16 @@
 {% block head %}
  {{ super() }}
  {{ buku.close_if_popup() }}
-  {{ buku.focus('form[action="/"]') }}
+  {{ buku.focus('main form[action="/"]') }}


Both forms have the same action now, thus specifying which one we want focused on page load.

LeXofLeviafan · 2024-09-06T01:40:15Z

bukuserver/templates/bukuserver/home.html

+      this.title = this.title.replace(/'(.*?)'/g, `'<strong><tt>$1</tt></strong>'`).replace(/\bFULL\b/g, `<strong><em>full</em></strong>`);
+    }).attr('data-container', 'body').attr('data-trigger', 'hover').tooltip();
+  </script>
+  <style>.tooltip-inner {text-align: left;  white-space: pre;  max-width: 600px}</style>


Enabling (and prettifying) Bootstrap tooltips

LeXofLeviafan · 2024-09-06T01:43:03Z

bukuserver/views.py

+        deep, all_keywords = (x and not regex for x in [form.deep.data, form.all_keywords.data])
+        flt = bs_filters.BookmarkBukuFilter(deep=deep, regex=regex, markers=markers, all_keywords=all_keywords)
+        vals = ([('', form.keyword.data)] if not markers else enumerate(buku.split_by_marker(form.keyword.data)))
+        url = url_for('bookmark.index_view', **{filter_key(flt, idx): val for idx, val in vals})


Old behaviour without markers, otherwise splitting into multiple keywords based on detected markers.

(…Also I removed combinations of regex with deep/all_keywords since those do nothing 😅)

LeXofLeviafan · 2024-09-06T01:45:10Z

tests/test_buku.py

-        (1, "http://example.org", None, ",", "", 0),
-        (2, "http://google.com", "Google", ",", "", 0),
+        (2, "http://example.org", None, ",bar,baz,foo,", "", 0),
+        (3, "http://google.com", "Google", ",bar,baz,foo,", "", 0),


Adding tags to test their export as well. (…Also fixing indices since these wouldn't be encountered in actual data)

LeXofLeviafan · 2024-09-06T01:46:45Z

tests/test_buku.py

+    assert split_by_marker(search_string) == [
+        ' global substring', '.title substring', ':url substring', ':https',
+        '> description substring', '#partial,tags:', '#,exact,tags,', '*another global substring ',
+    ]


The split condition is “one or multiple spaces followed by a marker” (with the marker being retained)

LeXofLeviafan · 2024-09-06T01:47:34Z

tests/test_buku.py

+    assert SortKey('foo', ascending=False) < SortKey('bar', ascending=False)
+    assert not SortKey('foo', ascending=False) < SortKey('foo', ascending=False)  # pylint: disable=unnecessary-negation
+    assert not SortKey('foo', ascending=False) > SortKey('foo', ascending=False)  # pylint: disable=unnecessary-negation
+    assert not SortKey('foo', ascending=False) > SortKey('bar', ascending=False)  # pylint: disable=unnecessary-negation


Testing </> specifically here

LeXofLeviafan · 2024-09-06T01:48:58Z

tests/test_buku.py

+    assert not SortKey('foo', ascending=False) > SortKey('bar', ascending=False)  # pylint: disable=unnecessary-negation
+
+    custom_order = lambda s: (SortKey(len(s), ascending=False), SortKey(s, ascending=True))
+    assert sorted(['foo', 'bar', 'baz', 'quux'], key=custom_order) == ['quux', 'bar', 'baz', 'foo']


“Sort longer strings first, ordering lexicographically when they have the same length”

LeXofLeviafan · 2024-09-06T01:52:12Z

tests/test_bukuDb.py

+     " OR (tags LIKE ('%' || ? || '%') AND tags LIKE ('%' || ? || '%') AND tags LIKE ('%' || ? || '%'))"),
+])
+def test_search_clause(bukuDb, regex, tokens, args, clauses):
+    assert bukuDb()._search_clause(tokens, regex=regex) == (clauses, args)


Thoroughly testing search SQL generation

jarun · 2024-09-07T02:18:25Z

Thank you!

jarun · 2024-09-07T02:23:23Z

Please update the ToDo list tracker.

Also, is the --markers option added to the auto-completion scripts?

LeXofLeviafan · 2024-09-07T05:57:06Z

> Also, is the --markers option added to the auto-completion scripts?

Ah, no; forgot about that.

…Say; are the completion lists meant to be sorted? I think the last few options that were added to them are misplaced if that's the case 😅

[jarun#740] implement search-with-markers (based on bukubrow search)

a4c5de9

LeXofLeviafan force-pushed the search-with-markers branch from ff92356 to a4c5de9 Compare September 6, 2024 01:54

LeXofLeviafan mentioned this pull request Sep 6, 2024

fixing filters being reordered after editing #778

Merged

jarun merged commit e757c8e into jarun:master Sep 7, 2024
1 check passed

LeXofLeviafan deleted the search-with-markers branch September 7, 2024 06:48

LeXofLeviafan mentioned this pull request Sep 29, 2024

ToDo list #484

Open

99 tasks
