Compress search_index.json #1128

Closed: davidhrbac wants to merge 2 commits

Conversation

davidhrbac
Contributor

This is a quick way to shrink search_index.json. It collapses duplicated whitespace characters (\u00a0 is the Unicode no-break space) and disables indentation of the JSON output. With these changes the file shrank from 2248293 bytes to 1752850 bytes (about 22% smaller). This might help with search download and start-up time. Related to #1127.
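
As a rough illustration of the idea (not the actual patch), the whitespace cleanup could be sketched in Python as below. The `compact_text` helper and the sample entry are hypothetical, and the sketch assumes each index entry keeps its page text in a `text` field. It collapses every whitespace run, which is somewhat more aggressive than targeted replacements.

```python
import json
import re


def compact_text(text):
    """Collapse whitespace runs (hypothetical helper, not part of MkDocs)."""
    # Treat the no-break space (U+00A0) like an ordinary space.
    text = text.replace("\u00a0", " ")
    # Collapse any run of spaces, tabs, and newlines into a single space,
    # then trim whitespace at both ends.
    return re.sub(r"\s+", " ", text).strip()


# Made-up entry, for illustration only.
entries = [{"location": "index.html", "title": "Home",
            "text": "First\u00a0line\n\n  second line \n third"}]

compacted = [dict(e, text=compact_text(e["text"])) for e in entries]

# Before: original text, pretty-printed. After: cleaned text, no indentation.
indented = json.dumps({"docs": entries}, sort_keys=True, indent=4)
flat = json.dumps({"docs": compacted}, sort_keys=True)
print(len(indented), len(flat))
```

The saving on a real index depends on how much of the file is indentation and repeated whitespace, so the ratio reported above should not be expected everywhere.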

@waylan waylan added this to the 1.0.0 milestone Jan 20, 2017
@waylan waylan self-assigned this Jan 20, 2017
@waylan
Member

waylan commented Jan 20, 2017

This looks like a good change. Thank you. However, we are not planning to release any new version (except perhaps a bug-fix release) before 1.0, which will include the changes planned in #859. Therefore, I'm inclined to hold off on this for now and incorporate these changes into that refactor.

@davidhrbac
Contributor Author

OK, no rush. In the meantime I can get the same result in our CI with sed, like this:

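#!/bin/bash
# Post-process search_index.json in place (written for GNU sed).
# Usage: pass the index file path(s) as arguments.
# Replace escaped no-break spaces with plain spaces.
sed -i 's/\\u00a0/ /g' "$@"
# Collapse escaped newline sequences (and adjacent spaces) into one space.
sed -i 's,\\n\\n, ,g' "$@"
sed -i 's,\\n , ,g' "$@"
sed -i 's, \\n, ,g' "$@"
# Strip leading indentation from every line.
sed -i 's/^[ \t]*//' "$@"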

@@ -90,7 +100,7 @@ def generate_search_index(self):
         page_dicts = {
             'docs': self._entries,
         }
-        return json.dumps(page_dicts, sort_keys=True, indent=4)
+        return json.dumps(page_dicts, sort_keys=True)
Member

See the compact encoding in the docs; we should consider doing that:

json.dumps([1,2,3,{'4': 5, '6': 7}], separators=(',',':'))

From https://docs.python.org/2/library/json.html

I don't really know if it has any drawbacks or limitations.

Contributor

It shouldn't have any drawbacks as it will still be valid JSON.

Member

Yup, I just read the Python docs. Seems good!
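
A quick way to check the point made in this thread is to compare the default and compact encodings and confirm that both parse back to the same data. This is only a sketch; the sample `page_dicts` content is made up.

```python
import json

# Made-up index content, for illustration only.
page_dicts = {"docs": [{"location": "a/", "text": "hello world", "title": "A"}]}

# Default separators are (', ', ': ') when no indent is given.
default = json.dumps(page_dicts, sort_keys=True)
# Compact separators drop the space after each comma and colon.
compact = json.dumps(page_dicts, sort_keys=True, separators=(",", ":"))

# Both encodings are valid JSON and decode to the same structure.
assert json.loads(compact) == json.loads(default) == page_dicts
print(len(default), len(compact))
```

Only the whitespace between tokens changes; spaces inside string values are left untouched.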

@waylan
Member

waylan commented Jan 20, 2017

As a reminder, we may also want to explore creating a gzipped copy of the index. Both files should exist side-by-side and properly configured servers will serve the gzipped version.
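
A minimal sketch of writing the two files side by side, using Python's standard `gzip` module, might look like this. The `write_index` helper, the temporary output directory, and the sample data are all hypothetical and are not taken from MkDocs.

```python
import gzip
import json
import os
import tempfile


def write_index(site_dir, page_dicts):
    """Write search_index.json plus a gzipped sibling (hypothetical helper)."""
    data = json.dumps(page_dicts, sort_keys=True,
                      separators=(",", ":")).encode("utf-8")
    json_path = os.path.join(site_dir, "search_index.json")
    gz_path = json_path + ".gz"
    # Plain copy, for servers and clients that do not use the gzipped file.
    with open(json_path, "wb") as f:
        f.write(data)
    # Gzipped copy with the same content, stored next to the plain file.
    with gzip.open(gz_path, "wb") as f:
        f.write(data)
    return json_path, gz_path


# Made-up, highly repetitive content, so the gzip saving is easy to see.
site_dir = tempfile.mkdtemp()
page_dicts = {"docs": [{"location": "a/", "text": "hello " * 200, "title": "A"}]}
json_path, gz_path = write_index(site_dir, page_dicts)
print(os.path.getsize(json_path), os.path.getsize(gz_path))
```

Whether the `.gz` file is actually served depends on the web server's configuration, which this sketch does not cover.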

@coliff
Contributor

coliff commented May 5, 2017

This PR is a great help. I was able to decrease the size of my search_index.json from 1250 KB to just 980 KB.

@coliff
Contributor

coliff commented Nov 1, 2017

I applied the changes made by @davidhrbac to search_index.py (https://github.com/mkdocs/mkdocs/blob/master/mkdocs/contrib/legacy_search/search_index.py) in my local installation and it works well. Could this PR be updated? (I can open a new PR with the change if you'd like.)

@waylan
Member

waylan commented Nov 1, 2017

@coliff, the entire search function is scheduled for a complete rewrite. It's likely that the rewrite will start from scratch, so any changes to the existing code will probably be scrapped. This PR remains open as a reminder to include a similar feature in the rewrite.

As an aside, now that search is contained within a plugin, anyone is free to fork it (within the confines of the license) and make any changes they desire. For that matter, someone could build their own third-party search plugin which is better than and removes the need for us to maintain the one we have now.

@waylan waylan added the Plugin label Nov 1, 2017
waylan added a commit to waylan/mkdocs that referenced this pull request Jan 31, 2018
Remove all unnesecary whitespace. Closes mkdocs#1128.
waylan added a commit to waylan/mkdocs that referenced this pull request Feb 27, 2018
Remove all unnesecary whitespace. Closes mkdocs#1128.
@waylan waylan mentioned this pull request Feb 27, 2018
@waylan waylan closed this in #1418 Mar 6, 2018
waylan added a commit that referenced this pull request Mar 6, 2018
* Use a web worker in the browser with a fallback (fixes #859 & closes #1396).
* Optionally pre-build search index (fixes #859 & closes #1061).
* Upgrade to lunr.js 2.x (fixes #1319).
* Support search in languages other than English (fixes #826).
* Allow the user to define the word separators (fixes #867).
* Only run searches for queries of length > 2 (fixes #1127).
* Remove dependency on require.js, mustache, etc. (fixes #1218).
* Compress the search index (fixes #1128).