Fix: `parse_json` also including html in titles #5970

dojutsu-user · 2019-07-20T21:08:04Z

Some titles have HTML code in them.

`parse_json` was not yielding the plain text of a title when the title contained inline literal (double-backtick) content.

humitos

Changes look good to me!

humitos · 2019-07-22T13:18:35Z

readthedocs/rtd_tests/tests/test_search_json_parsing.py

@@ -19,6 +19,9 @@ def test_h2_parsing(self):
        )
        self.assertEqual(data['path'], 'api')
        self.assertEqual(data['sections'][1]['id'], 'a-basic-api-client-using-slumber')
+
+        # In api.fjson, title is in the form: A basic API client ``using slumber``
+        self.assertEqual(data['sections'][1]['title'], 'A basic API client using slumber')


In my test, the backticks are still there:

>>> pyquery.PyQuery('A basic API client ``using slumber``').text().replace('¶', '').strip()
'A basic API client ``using slumber``'

When text like Include ``404`` page is converted to HTML by Sphinx, it becomes something like: Include <code class=\"docutils literal notranslate\"><span class=\"pre\">404</span></code> page.

We were indexing this line as-is, which is why search results show the raw HTML code, e.g. http://docs.celeryproject.org/en/latest/search.html?q=utilities&check_keywords=yes&area=default

Extracting the text with pyquery removes the markup:

>>> import pyquery
>>> temp = 'Include <code class=\"docutils literal notranslate\"><span class=\"pre\">404</span></code> page'
>>> pyquery.PyQuery(temp).text().replace('¶', '').strip()
'Include 404 page'
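For anyone reproducing this without pyquery installed, the same tag stripping can be sketched with the standard library's `html.parser`. This is a minimal sketch, not the code merged in this PR: `title_text` and `_TextCollector` are hypothetical names, and the stdlib parser is swapped in for pyquery (unlike pyquery's `.text()`, it does not collapse internal whitespace). It also shows the distinction discussed above: reST double backticks are not HTML, so they pass through untouched, while Sphinx's rendered `<code>` markup is unwrapped.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Hypothetical helper: collect only the character data, dropping all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def title_text(html_title):
    """Hypothetical helper: return the plain text of a section title.

    Roughly mirrors the pyquery call above: strip markup, drop the
    Sphinx headerlink pilcrow, and trim surrounding whitespace.
    """
    parser = _TextCollector()
    parser.feed(html_title)
    parser.close()  # flush any trailing text still buffered by the parser
    return ''.join(parser.parts).replace('\u00b6', '').strip()


if __name__ == '__main__':
    sphinx_html = (
        'Include <code class="docutils literal notranslate">'
        '<span class="pre">404</span></code> page'
    )
    raw_rst = 'A basic API client ``using slumber``'
    print(title_text(sphinx_html))  # Include 404 page
    print(title_text(raw_rst))      # A basic API client ``using slumber``
```

This is why the test fixture needs the HTML that Sphinx actually writes to the `.fjson` file: feeding the reST source form exercises nothing, because there are no tags to strip.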

Thanks for clarifying this!

parse_json fix

733cc38

dojutsu-user requested a review from ericholscher July 20, 2019 21:08

humitos approved these changes Jul 22, 2019


ericholscher merged commit 7049425 into readthedocs:master Jul 22, 2019

dojutsu-user deleted the fix-parse-json branch July 22, 2019 16:36
