Compress search_index.json #1128
Conversation
This looks like a good change. Thank you. However, we are not planning to release any new version (except perhaps a bug-fix release) before 1.0, which will include the changes planned in #859. Therefore, I'm inclined to hold off on this for now and incorporate these changes into that refactor.
OK, no rush. I can handle it within our CI with sed, like this:

    #!/bin/bash
    sed -i 's/\\u00a0/ /g' "$@"
    sed -i 's,\\n\\n, ,g' "$@"
    sed -i 's,\\n , ,g' "$@"
    sed -i 's, \\n, ,g' "$@"
    sed -i 's/^[ \t]*//' "$@"
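For anyone not on a sed-friendly platform, the same pass can be sketched in Python (a hypothetical helper, not part of MkDocs; it mirrors the sed commands above, operating on the literal `\u00a0` and `\n` escape sequences inside the already-serialized JSON):

```python
import re

def compress_index_text(text):
    """Mirror the sed pass above: collapse escaped whitespace in the
    serialized search_index.json content."""
    text = text.replace("\\u00a0", " ")  # literal \u00a0 escapes (non-breaking space)
    text = text.replace("\\n\\n", " ")   # escaped blank lines
    text = text.replace("\\n ", " ")
    text = text.replace(" \\n", " ")
    # sed 's/^[ \t]*//' -- strip leading whitespace from each physical line
    return re.sub(r"^[ \t]+", "", text, flags=re.MULTILINE)
```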
    @@ -90,7 +100,7 @@ def generate_search_index(self):
             page_dicts = {
                 'docs': self._entries,
             }
    -        return json.dumps(page_dicts, sort_keys=True, indent=4)
    +        return json.dumps(page_dicts, sort_keys=True)
See the compact encoding in the docs; we should consider doing that:
json.dumps([1,2,3,{'4': 5, '6': 7}], separators=(',',':'))
From https://docs.python.org/2/library/json.html
I don't really know if it has any drawbacks or limitations.
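For a concrete comparison (illustrative data, not the real index), the three encodings differ only in inter-token whitespace, so they all parse back to the same object:

```python
import json

data = {"docs": [{"location": "index.html", "title": "Home", "text": "Welcome"}]}

pretty = json.dumps(data, sort_keys=True, indent=4)                # current output
default = json.dumps(data, sort_keys=True)                         # ', ' and ': '
compact = json.dumps(data, sort_keys=True, separators=(",", ":"))  # no spaces

# compact < default < pretty in size, identical once parsed
assert json.loads(compact) == json.loads(pretty)
```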
It shouldn't have any drawbacks as it will still be valid JSON.
Yup, I just read the Python docs. Seems good!
As a reminder, we may also want to explore creating a gzipped copy of the index. Both files should exist side-by-side, and properly configured servers will serve the gzipped version.
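A side-by-side gzipped copy could be produced with the standard library alone; `write_gzipped_copy` below is a hypothetical helper sketch, not existing MkDocs code:

```python
import gzip
import shutil

def write_gzipped_copy(path):
    """Write path + '.gz' next to the original, so a properly configured
    server (e.g. nginx with gzip_static) can serve the compressed copy."""
    with open(path, "rb") as src, gzip.open(path + ".gz", "wb") as dst:
        shutil.copyfileobj(src, dst)
```

With nginx, for example, `gzip_static on;` makes the server prefer `search_index.json.gz` whenever the client accepts gzip.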
This PR is a great help. I was able to decrease the size of my search_index.json from 1250 KB to just 980 KB.
I merged the changes made by @davidhrbac on my local installation into search_index.py (https://github.com/mkdocs/mkdocs/blob/master/mkdocs/contrib/legacy_search/search_index.py) and it works well. Could this PR be updated? (If you'd like me to open a new PR with the change, I can.)
@coliff, the entire search function is scheduled for a complete rewrite. It's likely that the rewrite will actually start from scratch, so any changes to the existing code will probably be scrapped. This PR remains open as a reminder to include a similar feature in the rewrite. As an aside, now that search is contained within a plugin, anyone is free to fork it (within the confines of the license) and make any changes they desire. For that matter, someone could build a third-party search plugin which is better than the one we have now and removes the need for us to maintain it.
Remove all unnecessary whitespace. Closes mkdocs#1128.
* Use a web worker in the browser with a fallback (fixes #859 & closes #1396).
* Optionally pre-build search index (fixes #859 & closes #1061).
* Upgrade to lunr.js 2.x (fixes #1319).
* Support search in languages other than English (fixes #826).
* Allow the user to define the word separators (fixes #867).
* Only run searches for queries of length > 2 (fixes #1127).
* Remove dependency on require.js, mustache, etc. (fixes #1218).
* Compress the search index (fixes #1128).
This is a quick solution to compress search_index.json. It replaces redundant whitespace characters (\u00a0 is the Unicode non-breaking space), and indentation is also disabled. I could shrink the file from 2248293 bytes to 1752850 bytes, which might help with search download and start-up time. Links to #1127.
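The same idea can be sketched at the entry level, before serialization (hypothetical code, not the actual patch): replace non-breaking spaces, collapse whitespace runs, and omit `indent=4`:

```python
import json

def strip_whitespace(entry):
    """Hypothetical per-entry cleanup: swap non-breaking spaces for regular
    ones and collapse runs of whitespace in the indexed text."""
    text = entry.get("text", "").replace("\u00a0", " ")
    entry["text"] = " ".join(text.split())
    return entry

entries = [{"title": "Home", "text": "Hello\u00a0world\n\n  indented"}]
# omitting indent=4 removes the pretty-printing whitespace as well
index = json.dumps({"docs": [strip_whitespace(e) for e in entries]}, sort_keys=True)
```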