Compress search_index.json #1128

Closed
wants to merge 2 commits into
base: master
from

5 participants
@davidhrbac
Contributor

davidhrbac commented Jan 20, 2017

This is a quick solution to compress search_index.json. It replaces runs of duplicated whitespace characters with a single space (\u00a0 is the Unicode no-break space) and disables indentation of the JSON output. With it I could shrink the file from 2248293 bytes to 1752850 bytes, which might help with search download and start-up time. Links to #1127.
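
For readers who want to try the idea outside this PR, here is a minimal sketch of the same whitespace clean-up done in Python before serializing. The helper names `compact_text` and `dump_index` and the sample entry are assumptions made for illustration; they are not the code in this PR or part of the MkDocs API.

```python
import json
import re


def compact_text(text):
    """Replace no-break spaces and collapse newline runs into single spaces."""
    text = text.replace("\u00a0", " ")
    # A run of newlines, with optional spaces on either side, becomes one space.
    return re.sub(r" *\n+ *", " ", text)


def dump_index(entries):
    """Serialize the entries without indentation (hypothetical helper)."""
    compacted = [
        {**entry, "text": compact_text(entry["text"])} for entry in entries
    ]
    return json.dumps({"docs": compacted}, sort_keys=True)


# Made-up sample entry, only to exercise the helpers.
entries = [
    {
        "location": "index.html",
        "title": "Home",
        "text": "Hello\u00a0world\n\n next \nline",
    }
]
print(dump_index(entries))
```

The savings depend entirely on how much redundant whitespace the indexed pages contain, so the sizes quoted above should not be expected for other sites.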

davidhrbac added some commits Jan 20, 2017

@waylan waylan added the Enhancement label Jan 20, 2017

@waylan waylan added this to the 1.0.0 milestone Jan 20, 2017

@waylan waylan self-assigned this Jan 20, 2017

@waylan

Member

waylan commented Jan 20, 2017

This looks like a good change. Thank you. However, we are not planning on releasing any new version (except perhaps a bug-fix release) before 1.0 which will have the changes planned in #859. Therefore, I'm inclined to hold off on this now and incorporate these changes into that refactor.

@davidhrbac

Contributor

davidhrbac commented Jan 20, 2017

OK, no rush. In the meantime I can do the same thing in our CI with sed, like this:

#!/bin/bash
sed 's/\\u00a0/ /g' -i "$@"
sed 's,\\n\\n, ,g' -i "$@"
sed 's,\\n , ,g' -i "$@"
sed 's, \\n, ,g' -i "$@"
sed -e 's/^[ \t]*//' -i "$@"

@@ -90,7 +100,7 @@ def generate_search_index(self):
         page_dicts = {
             'docs': self._entries,
         }
-        return json.dumps(page_dicts, sort_keys=True, indent=4)
+        return json.dumps(page_dicts, sort_keys=True)

@d0ugal

d0ugal Jan 20, 2017

Member

See the compact encoding in the docs; we should consider doing that:

json.dumps([1,2,3,{'4': 5, '6': 7}], separators=(',',':'))

From https://docs.python.org/2/library/json.html

I don't really know if it has any drawbacks or limitations.

@facelessuser

facelessuser Jan 20, 2017

Contributor

It shouldn't have any drawbacks as it will still be valid JSON.

@d0ugal

d0ugal Jan 20, 2017

Member

Yup, I just read the Python docs. Seems good!
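
To illustrate the point of this thread, the following sketch compares the three encodings discussed here (indented, default, and compact separators) and checks that all of them decode to the same data. The `page_dicts` content is a made-up sample, not a real search index.

```python
import json

# Made-up sample standing in for the real index contents.
page_dicts = {
    "docs": [
        {"location": "index.html", "text": "Hello world", "title": "Home"},
    ]
}

indented = json.dumps(page_dicts, sort_keys=True, indent=4)
default = json.dumps(page_dicts, sort_keys=True)
# separators=(",", ":") drops the space after each comma and colon.
compact = json.dumps(page_dicts, sort_keys=True, separators=(",", ":"))

# All three forms are valid JSON and decode to identical data.
assert json.loads(indented) == json.loads(default) == json.loads(compact)

print(len(indented), len(default), len(compact))
```

Only whitespace between tokens is removed, so string values inside the index are untouched.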

@waylan

Member

waylan commented Jan 20, 2017

As a reminder, we may also want to explore creating a gzipped copy of the index. Both files should exist side-by-side and properly configured servers will serve the gzipped version.
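
A minimal sketch of what writing such a side-by-side copy could look like with the standard library. The helper name `write_gzipped_copy` and the temporary paths are assumptions for illustration; whether a server actually picks up the `.gz` file depends on its configuration, which is not covered here.

```python
import gzip
import os
import shutil
import tempfile


def write_gzipped_copy(path):
    """Write `path + '.gz'` next to `path` and return the new file's path."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return gz_path


# Demonstration in a temporary directory with a made-up index file.
site_dir = tempfile.mkdtemp()
index_path = os.path.join(site_dir, "search_index.json")
original = b'{"docs":[{"location":"index.html","text":"Hello world"}]}'
with open(index_path, "wb") as f:
    f.write(original)

gz_path = write_gzipped_copy(index_path)

# Both files exist side by side, and the copy decompresses to the original.
with gzip.open(gz_path, "rb") as f:
    restored = f.read()
print(os.path.exists(index_path), os.path.exists(gz_path), restored == original)
```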

@waylan waylan added this to To Do in Refactor search. May 2, 2017

@coliff

Contributor

coliff commented May 5, 2017

This PR is a great help. I was able to decrease the size of my search_index.json from 1250 KB to just 980 KB.

@coliff

Contributor

coliff commented Nov 1, 2017

I merged the changes made by @davidhrbac into search_index.py (https://github.com/mkdocs/mkdocs/blob/master/mkdocs/contrib/legacy_search/search_index.py) on my local installation and it works well. Could this PR be updated? (I can open a new PR with the change if you'd like.)

@waylan

Member

waylan commented Nov 1, 2017

@coliff, the entire search function is scheduled for a complete rewrite. It's likely that the rewrite will actually start from scratch. Therefore, any changes to the existing code will likely be scrapped. This PR remains open as a means to remind us to include a similar feature in the rewrite.

As an aside, now that search is contained within a plugin, anyone is free to fork it (within the confines of the license) and make any changes they desire. For that matter, someone could build their own third-party search plugin which is better than and removes the need for us to maintain the one we have now.

@waylan waylan added the Plugin label Nov 1, 2017

@waylan waylan moved this from To Do to Completed in new-search branch in Refactor search. Jan 31, 2018

waylan added a commit to waylan/mkdocs that referenced this pull request Jan 31, 2018

Compress search index
Remove all unnesecary whitespace. Closes mkdocs#1128.

waylan added a commit to waylan/mkdocs that referenced this pull request Feb 27, 2018

Compress search index
Remove all unnesecary whitespace. Closes mkdocs#1128.

@waylan waylan referenced this pull request Feb 27, 2018

Merged

Refactor search plugin #1418

@waylan waylan closed this in #1418 Mar 6, 2018

Refactor search. automation moved this from In Progress to Done Mar 6, 2018

waylan added a commit that referenced this pull request Mar 6, 2018

Refactor search plugin (#1418)
* Use a web worker in the browser with a fallback (fixes #859 & closes #1396).
* Optionally pre-build search index (fixes #859 & closes #1061).
* Upgrade to lunr.js 2.x (fixes #1319).
* Support search in languages other than English (fixes #826).
* Allow the user to define the word separators (fixes #867).
* Only run searches for queries of length > 2 (fixes #1127).
* Remove dependency on require.js, mustache, etc. (fixes #1218).
* Compress the search index (fixes #1128).