Searching the documentation with filters #131

polibb · 2021-05-14T15:50:38Z

Searching the mathlib library contents in the documentation and filtering the resulting set of declarations.
One is now able to:

search through declaration names, module names, and the description texts (docstrings) of any declaration: theorems, axioms, definitions, etc.
filter the search results, before or after submitting the search query, by choosing values for the attributes and kind of the resulting declarations:
selected attributes are combined with OR: a declaration passes the attribute filter if any of its attributes is among the selected ones.
selected kind options are combined with OR.
if both attributes and kind options are selected, the two OR-groups are combined with AND: a declaration that fits one filter but not the other is excluded.
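A minimal sketch of these filter rules, assuming each declaration is an object with a `kind` string and an `attributes` array. The names `matchesFilters`, `selectedAttrs` and `selectedKinds` are hypothetical, and this is not the code in searchWorker.js:

```javascript
// Hypothetical sketch; the declaration shape ({ name, kind, attributes })
// is an assumption, not doc-gen's actual data format.
function matchesFilters(decl, selectedAttrs, selectedKinds) {
  const hasAttrFilter = selectedAttrs.length > 0;
  const hasKindFilter = selectedKinds.length > 0;

  // Within a group, values are OR-ed: any single match is enough.
  const attrMatch = decl.attributes.some((attr) => selectedAttrs.includes(attr));
  const kindMatch = selectedKinds.includes(decl.kind);

  // Across groups, AND; a group with nothing selected imposes no constraint.
  return (!hasAttrFilter || attrMatch) && (!hasKindFilter || kindMatch);
}

// Toy declarations, for illustration only.
const decls = [
  { name: "thm_a", kind: "theorem", attributes: ["simp"] },
  { name: "def_b", kind: "def", attributes: ["simp"] },
  { name: "thm_c", kind: "theorem", attributes: ["ext"] },
];

const names = (attrs, kinds) =>
  decls.filter((d) => matchesFilters(d, attrs, kinds)).map((d) => d.name);

// Only thm_a is both a theorem and tagged simp.
console.log(names(["simp"], ["theorem"]));
```

Keeping the per-group checks separate from the final combination means that changing the rule between groups (say, from AND to OR) only touches the `return` line.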

More information:

Results are ordered by priority, from the declarations whose contents have the most matches to those with the fewest.
The declaration name has the highest priority, followed by its description text (the doc comment in the mathlib source itself). The name of the module (file) the declaration is found in has the lowest priority.
Submitting the filters form at any point also re-submits the search query, if there is one.
At most 10 results are shown, but the UI offers an option to expand and review all results.
The first time the user searches on any given page, the results take a while to load, but subsequent searches on that page do not have to wait for this loading step again. We therefore recommend running one search on a page and then keeping that page for further searches, since this will save you time.
Keyboard navigation is not supported yet; please expect it in the near future.
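A toy sketch of the two behaviours above, weighted ranking and a load-once index, under stated assumptions: the fields `name`, `doc` and `module` and the weights 3/2/1 are made up to mirror the stated priority order, and `createSearcher`/`buildIndex` are hypothetical names. The real worker delegates indexing to a library (the thread mentions `miniSearch.addAll`), so this is not its code:

```javascript
// Hypothetical field weights mirroring: name > description > module.
const FIELD_WEIGHTS = { name: 3, doc: 2, module: 1 };

let buildCount = 0; // counts how often the expensive index build runs

// Lowercase and split on anything that is not a letter or digit.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Expensive step: tokenize every weighted field of every declaration.
function buildIndex(decls) {
  buildCount++;
  return decls.map((decl) => ({
    name: decl.name,
    tokens: Object.fromEntries(
      Object.keys(FIELD_WEIGHTS).map((field) => [field, tokenize(decl[field] ?? "")])
    ),
  }));
}

// Returns a search function that builds the index on its first call
// and reuses it for every later query.
function createSearcher(decls) {
  let index = null;
  return function search(query) {
    if (index === null) index = buildIndex(decls);
    const terms = tokenize(query);
    const scored = [];
    for (const entry of index) {
      let score = 0;
      for (const [field, weight] of Object.entries(FIELD_WEIGHTS)) {
        for (const term of terms) {
          // Every exact token match adds the weight of the field it is in.
          score += weight * entry.tokens[field].filter((t) => t === term).length;
        }
      }
      if (score > 0) scored.push({ name: entry.name, score });
    }
    // Highest score first; sort is stable, so ties keep their input order.
    return scored.sort((a, b) => b.score - a.score);
  };
}

// Toy declarations, for illustration only.
const search = createSearcher([
  { name: "foo.reverse", doc: "Turns a list around.", module: "data.foo" },
  { name: "foo.length", doc: "Length of a list.", module: "data.reverse" },
  { name: "bar.flip", doc: "Calls reverse twice.", module: "data.bar" },
]);

// A name match (3) outranks a doc match (2), which outranks a module match (1).
console.log(search("reverse").map((r) => r.name));
```

Only the first query pays for `buildIndex`; later queries on the same page reuse the cached index, which is why a second search feels instant.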

This way pylint will notice if you accidentally use the wrong key. I'm not sure whether this is officially "pythonic". It seems like more could be done by translating the JSON into actual data structures with named fields; then these lookup tables wouldn't be necessary! But this might be a lot of work for little gain.

fix build

use class instead of dict for structure field names

…search-docs

polibb · 2021-05-15T14:33:34Z

UPDATE: Fixes pushed, please pull to see them.

handled console errors caused by missing elements and by trying to call methods on them
added a loading indicator when switching from the basic result set (max 10) to "Show all", since rendering all the results into the DOM can take too long and might crash the site if abused
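One way to keep "Show all" responsive, sketched under assumptions: results are appended in chunks, yielding to the event loop between chunks so the browser can repaint and the loading indicator stays visible. `renderAllResults`, `chunkSize`, `schedule` and `renderOne` are hypothetical names; the PR may simply toggle the indicator around a single render. The `loadingEl` null check mirrors the missing-element fix above.

```javascript
// Hypothetical sketch: render `results` into `container` in chunks.
// `loadingEl` may be null, so every access to it is guarded.
function renderAllResults(results, container, loadingEl, options = {}) {
  const chunkSize = options.chunkSize ?? 100;
  // Default scheduler yields to the event loop so the page can repaint.
  const schedule = options.schedule ?? ((fn) => setTimeout(fn, 0));
  const renderOne = options.renderOne ?? ((result) => result.name);

  if (loadingEl) loadingEl.hidden = false; // show the loading indicator
  let i = 0;

  function renderChunk() {
    const end = Math.min(i + chunkSize, results.length);
    for (; i < end; i++) container.append(renderOne(results[i]));
    if (i < results.length) {
      schedule(renderChunk); // more to render: continue after a repaint
    } else if (loadingEl) {
      loadingEl.hidden = true; // all rendered: hide the indicator
    }
  }
  schedule(renderChunk);
}
```

In a browser, `container` would be a DOM element (whose `append` accepts strings or nodes) and `loadingEl` the indicator element; injecting `schedule` also makes the function easy to test synchronously.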

The script that has been running in the background for almost a year to power search, and that this PR relies on as well, is run by a SharedWorker, which is not supported by most mobile browsers, nor by Safari and IE on desktop. I understand this is not ideal; it will be improved in the future, but it is outside the scope of this PR.
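A hedged sketch of how such an improvement could look: use `SharedWorker` where the browser provides it and fall back to a dedicated `Worker` otherwise. `startSearchWorker` is a hypothetical helper, and `scope` is injected only so the check can be exercised outside a browser. A dedicated worker would rebuild the index once per tab, and the worker script would need to handle both the `onconnect` (shared) and `onmessage` (dedicated) entry points.

```javascript
// Hypothetical sketch: pick the best available worker type.
function startSearchWorker(scriptUrl, scope = globalThis) {
  if (typeof scope.SharedWorker === "function") {
    // One worker (and one index) shared by all tabs of the site.
    const shared = new scope.SharedWorker(scriptUrl);
    shared.port.start(); // needed when listening via addEventListener
    return { kind: "shared", port: shared.port };
  }
  if (typeof scope.Worker === "function") {
    // A Worker exposes postMessage/onmessage directly, like a MessagePort.
    const dedicated = new scope.Worker(scriptUrl);
    return { kind: "dedicated", port: dedicated };
  }
  return null; // no worker support: the caller could search on the main thread
}
```

Callers talk to the returned `port` through `postMessage` and `onmessage` in both cases, so the rest of the UI does not need to know which kind of worker is running.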

bryangingechen · 2021-05-15T14:38:42Z

#deploy

github-actions · 2021-05-15T14:50:02Z

This PR has been successfully deployed at http://leanprover-community.github.io/mathlib_docs_demo!

RaitoBezarius

It's super cool :)

RaitoBezarius · 2021-05-16T13:36:47Z

mathlib_data_structures.py

@@ -0,0 +1,53 @@
+class mathlibStructures:


Maybe it would make sense to make those inherit from enum.Enum as well?

I did try using Enum at first, yes, but ran into some issue; I'm not sure what it was, since I haven't looked at it in a while. Actually, none of the classes here are used at the moment: they are meant for another PR that focuses on the organizational structure of the Python script itself, so that it's easier to read and update. I should probably remove this file from this PR, and I will look into using Enums again in the next one, where they will be needed.

RaitoBezarius · 2021-05-16T13:38:37Z

searchWorker.js

+        }
+    } 
+
+    return hasKindFilter && hasAttrFilter ? 


Wouldn't it be better to split these three ternary expressions into three ifs or early returns?

The ternary operator by itself is faster than if/else when no additional computation is involved, but more importantly, I think this form is more readable and easier to change or extend to more types of filters in the long run. I could of course return before getting here: for example, where hasKindFilter is checked, return immediately if there is no hasAttrFilter; and return directly when hasAttrFilter sets isAttrFilterIncluded, if there is no hasKindFilter. But that structure mixes two different actions: checking whether the result matches any of the filters, and deciding whether that is sufficient. To be more exact, if we decide to change the logic from "you need to satisfy both the attribute filters and the kind filters, if both are present" to "you can satisfy either filter, as long as at least one of your attributes or your kind matches", we will only have to change the return statement here. Otherwise we would have to trace through the whole logic in detail, decide whether to move each return statement, and work out how they relate to the purpose of the method. To be honest, I'd leave it as it is.

gebner · 2021-05-17T07:08:09Z

#deploy

github-actions · 2021-05-17T07:19:15Z

This PR has been successfully deployed at http://leanprover-community.github.io/mathlib_docs_demo!

bryangingechen · 2021-06-08T20:58:30Z

#deploy

github-actions · 2021-06-08T21:10:07Z

This PR has been successfully deployed at http://leanprover-community.github.io/mathlib_docs_demo!

eric-wieser · 2021-06-08T22:08:50Z

searchWorker.js

-// Adapted from the default tokenizer and
-// https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=%5Cp%7BZ%7D&abb=on&c=on&esc=on&g=&i=
-const SEPARATOR = /[._\n\r \u00A0\u1680\u2000-\u200A\u2028\u2029\u202F\u205F\u3000]+/u
+req.open('GET', 'searchable_data.json', false /* blocking */);


This file is 7-8x larger than the decl.bmp we currently load; I worry this will substantially increase the load time for every search.

bryangingechen · 2021-06-09T01:32:48Z

One other thing that would be great to have before this goes live is an easy-to-find page describing all the search features. Maybe an extra section on the mostly blank index page https://leanprover-community.github.io/mathlib_docs_demo/ would do?

robertylewis · 2021-06-29T17:45:29Z

I think the only blocking issue here is the size of the JSON file, as Eric points out. Was the reason for using a .bmp before compression or caching? If we zip the JSON file when we generate it and unzip it locally, I guess browsers won't cache the unzipped version, right?

As for @bryangingechen's suggestion, I think that's a good place to describe the search features, and we could probably reuse some text from this PR.

gebner · 2021-06-29T18:47:28Z

The reason for the .bmp extension instead of .json is that GitHub will then automatically gzip it. See #125

gebner · 2021-06-29T18:50:07Z

#deploy

github-actions · 2021-06-29T19:14:22Z

This PR has been successfully deployed at http://leanprover-community.github.io/mathlib_docs_demo!

eric-wieser · 2021-07-13T15:34:01Z

The .bmp is now 1660854 bytes and loads in half a second on my machine. I think it was 15 MB before compression, so the file extension change definitely helps.

However, the call to miniSearch.addAll(indexedData) takes 28 seconds for me, versus 6 seconds on the live site.

robertylewis · 2021-08-09T14:42:07Z

I've created polibb#4 to briefly document the search.

I agree with Eric that there's still a slowdown, but it's nowhere near 28 seconds for me. The new performance seems acceptable to me, and I'm okay with merging. What do others think?

robertylewis · 2021-08-12T20:03:15Z

#deploy

github-actions · 2021-08-12T20:20:46Z

This PR has been successfully deployed at http://leanprover-community.github.io/mathlib_docs_demo!

robertylewis · 2021-08-13T19:48:29Z

Hmm, the 404 page needs to be updated as well:
https://leanprover-community.github.io/mathlib_docs_demo/find/nat.addo

robertylewis · 2021-11-11T15:17:45Z

#deploy

github-actions · 2021-11-11T15:36:19Z

This PR has been successfully deployed at http://leanprover-community.github.io/mathlib_docs_demo!

gebner · 2022-01-28T17:34:43Z

#deploy

github-actions · 2022-01-28T17:54:57Z

This PR has been successfully deployed at http://leanprover-community.github.io/mathlib_docs_demo!

polibb and others added 30 commits November 10, 2020 09:41

setup branch

6dbc6f0

Merge branch 'master' into search-docs

3b931f6

Merge branch 'master' into search-docs

353ff0d

improve code readability

c6711a7

Merge branch 'master' into search-docs

80686ab

minor fixes

e23988f

python script readability improvements

c48b293

Merge branch 'master' into search-docs

6b4bfc3

extract data structures for readability

7110fa7

fix build

ab0e46f

Merge pull request #1 from leanprover-community/search-docs

1228ce8

fix build

Merge pull request #2 from leanprover-community/search-docs-class

67287f0

use class instead of dict for structure field names

small fixes

d965133

main py script structure fixes

ac59932

Merge branch 'search-docs' of https://github.com/pokixu/doc-gen into …

7341c7d

…search-docs

Merge branch 'master' into search-docs

a932deb

Merge branch 'master' into search-docs

be8c23e

cleanup comments

15eb701

searchable data export structure

a37d600

searchable data lists per file name

ccb7abf

data structure for searching done

e7f6812

searchable json holds files together with declarations

d8212ca

Merge branch 'master' into search-docs

c21ebe6

building leanpkg increases commit #

8d4da88

searching through index; styling; WIP filters

8dddd0a

add filters to UI

5432001

show all results implementation

4c78d9c

clean up and version of mathlib commits up

3d52ff2

filter searching on top of results

0028904

polibb added 2 commits May 15, 2021 17:17

no errors on nullable elements; add loading when render all too slow

ada2c1a

Merge remote-tracking branch 'upstream/master'

8b8b257

RaitoBezarius reviewed May 16, 2021


polibb added 2 commits May 17, 2021 09:38

trim query whitespace before searching

c01f230

remove unnecessary files

14355ad

eric-wieser reviewed Jun 8, 2021


Change extension from .json to .bmp

d71a7da

Merge branch 'master' into master

e4765ef

gebner mentioned this pull request Jan 28, 2022

feat: simplest possible docstring search #156

Merged
