Custom word splitting regex #2818
I feel a regexp is too powerful a tool for configuring word-splitting behavior. @shimizukawa, for Japanese search you have already introduced three types of splitters.
@tk0miya Can you describe what you mean by an interface of splitter extensions? The main functionality I want from this PR is the ability to search for URLs and hyphenated words without having to break them up into fairly unintuitive chunks (which often don't return what they should).
The Japanese search module has a pluggable interface: https://github.com/sphinx-doc/sphinx/blob/master/sphinx/search/ja.py#L534
Hi @shimizukawa & @tk0miya. I'm taking over the custom word splitting work that Hana started, and I wanted to clarify what exactly you're asking for. I realize there are three specific splitter options for Japanese. Are you asking for a similar custom splitter for English (that is, introducing a custom splitter class that users specify by name rather than providing a specific regex pattern in `html_search_options`)? Or are you asking that we let users provide a custom regex pattern as the splitter for any of the available languages rather than just English?
Closing this PR as it needs more discussion on an issue -- making A
Added a feature to pass in a custom word splitting regex in `html_search_options` (English only).

Use it by setting `wordre` in `html_search_options` in `conf.py` (example below):

    html_search_options = { 'wordre': r'[\w\.\\:\/-]+(?u)' }

The sample regex above allows users to search for strings containing these punctuation characters: `\`, `/`, `:`, `.`, and `-`.
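To illustrate the effect of the sample regex, here is a minimal standalone sketch that compares it with a plain `\w+` splitter. It does not use Sphinx itself: `split_words`, `DEFAULT_WORD_RE`, and `CUSTOM_WORD_RE` are hypothetical names, and the `\w+` default is an assumption about the existing English splitter. The `(?u)` flag is moved to the front of each pattern here because Python 3.11 and later reject a global inline flag that is not at the start of the expression.

```python
import re

# Assumed default behavior: split on runs of word characters only.
DEFAULT_WORD_RE = re.compile(r'(?u)\w+')

# The sample pattern from this PR, with (?u) moved to the front so that
# it compiles on Python 3.11+.
CUSTOM_WORD_RE = re.compile(r'(?u)[\w\.\\:\/-]+')


def split_words(text, word_re):
    """Return every non-overlapping match of word_re in text (hypothetical helper)."""
    return word_re.findall(text)


sample = "See https://example.com/docs for the read-only flag"

print(split_words(sample, DEFAULT_WORD_RE))
# ['See', 'https', 'example', 'com', 'docs', 'for', 'the', 'read', 'only', 'flag']

print(split_words(sample, CUSTOM_WORD_RE))
# ['See', 'https://example.com/docs', 'for', 'the', 'read-only', 'flag']
```

One trade-off of this pattern is that sentence punctuation sticks to the preceding token: `"read-only."` at the end of a sentence is kept as a single term including the trailing period, so a search for `read-only` may not match it exactly.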