[4.0] Changing format handling in Smart Search #24464
Closed
Smart Search can parse data in different formats, be it HTML, TXT or RTF, and in theory it could be extended to parse PDF or DOCX as well. Unfortunately, right now you have to pick a single format for all properties of a result item when calling index() in your finder plugin. This prevents results that combine normal HTML with, say, one file in PDF and another in DOCX format. It is a bit of a Highlander issue...
This PR changes that so that you can set the format of each property individually when telling the indexer how to index that part.
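To illustrate the idea independently of the actual Joomla code, here is a minimal conceptual sketch in Python. It is not the PR's implementation and not the Smart Search API: the names `PARSERS`, `parse_html`, `parse_txt`, `tokenize` and `index_item` are all hypothetical, and the HTML handling is deliberately crude. It only shows the format being chosen per property rather than once per item.

```python
import re
from html import unescape


def parse_txt(text):
    # Plain text needs no conversion.
    return text


def parse_html(text):
    # Crude tag stripping, good enough for a sketch; not a real HTML parser.
    return unescape(re.sub(r"<[^>]+>", " ", text))


# Hypothetical registry: one parser per format name.
PARSERS = {"txt": parse_txt, "html": parse_html}


def tokenize(text):
    # Lowercase the text and split it into word tokens.
    return re.findall(r"\w+", text.lower())


def index_item(properties):
    """Tokenize each property with the parser for its own format.

    `properties` maps a property name to a (content, format) pair,
    so the format is decided per property, not per item.
    """
    tokens = {}
    for name, (content, fmt) in properties.items():
        parser = PARSERS[fmt]
        tokens[name] = tokenize(parser(content))
    return tokens


item = {
    "title": ("Release notes", "txt"),
    "body": ("<p>Smart <b>Search</b> &amp; more</p>", "html"),
}
result = index_item(item)
```

In this sketch, supporting a further format such as PDF or DOCX would mean registering one more parser in `PARSERS`; the item itself would not need a single global format.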
How to test?
In core Joomla, this should not have any impact at all. Simply run the indexer and/or use finder as you did before; you should not notice a difference. However, I'm asking for a code review on this to make sure that my idea is sane, so I'm calling on the powers of @chrisdavenport and @wilsonge 😉
In another PR, I would also like to add the possibility to point to a file path. If we really want to allow indexing of PDF files, for example, we have to change the system a bit so that the whole file is not added to the result object as well.