Improved HTML parser for Smart Search #5340

chrisdavenport · 2014-12-07T16:46:31Z

This PR fixes several known issues with the HTML parser used by Smart Search. Most of the issues result from breaking the input string into 2Kb chunks to improve performance when saving large articles.

In particular

Correctly removes HTML tags even when part of the tag (usually a tag which has attributes) crosses the 2Kb boundary.
Correctly removes script blocks even when the block crosses the 2Kb boundary.
Removes embedded style blocks.
Also incorporates the fix from PR #5206, Fixes Issues in "Smart Search" [#5204], by @smanzi.
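
The chunk-boundary handling listed above can be illustrated with a minimal Python sketch. This is not the PHP code in this PR; the names `strip_chunked`, `BLOCK_RE`, `TAG_RE` and `OPEN_BLOCK_RE` are hypothetical, and the approach (carrying an unfinished tag or script/style block over into the next chunk) is only one plausible way to get the behaviour described.

```python
import re

CHUNK_SIZE = 2048  # the 2Kb chunk size mentioned in the description

# Complete script/style blocks are removed together with their contents.
BLOCK_RE = re.compile(r"<(script|style)\b[^>]*>.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
# Start of a script/style block whose end tag has not been seen yet.
OPEN_BLOCK_RE = re.compile(r"<(script|style)\b", re.IGNORECASE)
# Any other complete tag, including its attributes.
TAG_RE = re.compile(r"<[^>]*>")


def strip_chunked(html, chunk_size=CHUNK_SIZE):
    """Strip tags and script/style blocks while reading the input in chunks."""
    out = []
    carry = ""  # unfinished tag or block held over from the previous chunk
    for start in range(0, len(html), chunk_size):
        buf = carry + html[start:start + chunk_size]
        carry = ""
        buf = BLOCK_RE.sub(" ", buf)
        opened = OPEN_BLOCK_RE.search(buf)
        if opened:
            # A block has started but not ended: wait for the next chunk.
            carry = buf[opened.start():]
            buf = buf[:opened.start()]
        else:
            # A trailing "<" without a matching ">" is a tag cut by the boundary.
            lt = buf.rfind("<")
            if lt != -1 and ">" not in buf[lt:]:
                carry = buf[lt:]
                buf = buf[:lt]
        out.append(TAG_RE.sub(" ", buf))
    # Simplification: an unterminated tag or block left in `carry` is discarded.
    return " ".join("".join(out).split())


# A script block that starts 11 characters before the 2048 boundary.
sample = ("a" * 2030 + " before"
          + '<script type="text/javascript">var secret = 1;</script>'
          + ' some <p class="hidden">text</p>')
cleaned = strip_chunked(sample)
```

Here `cleaned` keeps the words "before", "some" and "text", while the script contents and the attribute values are left out.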

To test this PR

Make sure that Smart Search is set up and working.
Make sure you can save embedded JavaScript and stylesheet blocks: you might need to adjust your input filter settings and/or change your default editor to "none" in order to do this.
Create an article and enter some text then save it.
Check that the words in the article have been indexed (the quickest way to do this is to use the auto-completion feature on the front-end search forms).
Check that none of the HTML tag names, or any words contained in attributes of the HTML tags, have been indexed.
Try to embed some JavaScript and check that none of the words contained in the JavaScript code have been indexed. You don't need to use real JavaScript to do this; just use some dummy text.
Try to embed some stylesheet declarations and check that none of the stylesheet keywords have been indexed. You don't need to use real CSS to do this; just use some dummy text.
If you're really clever you can try the same tests with the words or embedded blocks crossing a 2Kb boundary.

smanzi · 2014-12-07T23:58:08Z

Thanks! I will test this ASAP.
If it works (and I'm quite sure it will!) this is something that should definitely go into 3.4!

dgrammatiko · 2014-12-07T23:59:13Z

@smanzi I just finished my document for the test:

<div id="navigation">
        [stuff]
    </div>
    <div id="content">
        [stuff]
    </div>
<p>You could apply styles to <span class="whatever">this text</span> or <span class="whatever">tis thext</span> using the span tag.</p>

<title>Shiny Gongs</title>

<link rel="stylesheet" type="text/css" href="default.css" />

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<meta name="keywords" content="darwin, evolution, natural selection, species, beagle, 1859" />
<meta scheme="ISBN"  name="identifier" content="0-14-043205-1" />

<h1>The main heading</h1>
<h2>A subheading</h2>
<h2>Another subheading</h2>
<h3>Another subheading</h3>
<h4>Another subheading</h4>
<h5>Another subheading</h5>
<h6>Another subheading</h6>

<p>ra ra ra ra ra</p>
<p>That's <strong>strong emphasis</strong> ladies and gentlemen.</p>
<p>That's <em>emphasis</em> ladies and gentlemen.</p>
<abbr title="HyperText Markup Language">HTML</abbr>
<acronym title="Cascading Style Sheets">CSS</acronym>
<address>77 HTML Dog Road, Ealing, London</address>
<p>The output of this <bdo dir="rtl">word</bdo> will actually be "drow".</p>
<blockquote cite="http://www.htmldog.com/reference/htmltags/blockquote/">
    <p>A large quotation. The content of a blockquote element must include block-level elements such as headings, lists, paragraphs or div's.</p>
    <p>cite can be used to specify the location (in the form of a URI) where the quote has come from.</p>
</blockquote>
<p>Bob said <q>sexy pyjamas</q> but Chris said <q>a kimono</q></p>
<p>You can use the <code><?php echo 'any errors?'; ?></code> tag to define computer code.</p>
<p>It really was <ins cite="rarara.html" datetime="20031024">very</ins> good.</p>
<p>It really was<del cite="rarara.html" datetime="20031023">n't</del> very good.</p>
<p><dfn title="Microsoft web browser">Internet Explorer</dfn> is the most popular browser used underwater.</p>
<p>Type <kbd>www.htmldog.com</kbd> into your browser.</p>

<pre>
<code>&lt;html&gt;</code>
    <code>&lt;head&gt;</code>
    <code>&lt;/head&gt;</code>
    <code>&lt;body&gt;</code>
        <code>[stuff]</code>
    <code>&lt;/body&gt;</code>
<code>&lt;/html&gt; </code>
</pre>

<p>If you select the 'champion' option, you will receive the message <samp>The monkey is not a caterpillar</samp>.</p>

<code><var>wordcount</var> = 6878;</code>

<p>some text ra ra<br />
and some more ra ra</p>

<p><a href="http://www.htmldog.com">Link to a URI</a></p>
<p><a href="#content">Link to a page anchor</a></p>

<img src="http://www.htmldog.com/images/logo.gif" alt="HTML Dog" />

<map id ="atlas">
    <area shape ="rect" coords ="0,0,115,90" href ="northamerica.html" alt="North America" />
    <area shape ="poly" coords ="113,39,187,21,180,72,141,77,117,86" href ="europe.html" alt="Europe" />
    <area shape ="poly" coords ="119,80,162,82,175,102,183,102,175,148,122,146" href ="africa.html" alt="Africa" />
</map>

<object classid="clsid:D27CDB6E-AE6D-11cf-96B8-444553540000" codebase="someplace/swflash.cab" width="200" height="300" id="penguin">
    <param name="movie" value="flash/penguin.swf" />
    <param name="quality" value="high" />
    <img src="images/penguin.jpg" width="200" height="300" alt="Penguin" />
</object>

<ul>
    <li>This</li>
    <li>That</li>
    <li>The other</li>
</ul>

<ol>
    <li>First item</li>
    <li>Second item</li>
    <li>Third item</li>
</ol>

<dl>
    <dt>Dog</dt>
    <dd>A carnivorous mammal of the family Canidae.</dd>
</dl>

<table>
    <thead>
        <tr>
            <th>Header 1</th>
            <th>Header 2</th>
            <th>Header 3</th>
        </tr>
    </thead>
    <tfoot>
        <tr>
            <td>Footer 1</td>
            <td>Footer 2</td>
            <td>Footer 3</td>
        </tr>
    </tfoot>
    <tbody>
        <tr>
            <td>Cell data 1</td>
            <td>Cell data 2</td>
            <td>Cell data 3</td>
        </tr>
        <tr>
            <td>Cell data 4</td>
            <td>Cell data 5</td>
            <td>Cell data 6</td>
        </tr>
        <tr>
            <td>Cell data 7</td>
            <td>Cell data 8</td>
            <td>Cell data 9</td>
        </tr>
    </tbody>
</table>

<table>
    <colgroup span="2" class="columns1and2"></colgroup>
    <tr>
        <th>lime</th>
        <th>lemon</th>
        <th>orange</th>
        <th>blood orange</th>
    </tr>
    <tr>
        <td>8</td>
        <td>7</td>
        <td>12</td>
        <td>5</td>
    </tr>
</table>

<form action="/somedirectory/somformprocessingscript.php" method="post">
    <div>House number: <input type="text" name="housenumber" /></div>
    <div>Street: <input type="text" name="street" /></div>
    <div><input type="submit" /></div>
</form>

<script type="text/javascript" src="somescript.js"></script>
<script type="text/javascript">
    function koala() {
        alert('KOALA! KOALA!');
    }
</script>

<noscript>
    <p>What? No JavaScript?</p>
</noscript>


<p><b>This is bold</b>, <i>this is italic</i>, <tt>this is teletype</tt>.</p>
<hr />
<p><sub>This is subscript</sub>, <sup>this is superscript</sup>, <big>this is big</big>, <small>this is small</small>.</p>

smanzi · 2014-12-08T00:02:58Z

Perfect, Dimitris, thanks, but... too neat! I'll minify this into a "one liner"...

dgrammatiko · 2014-12-08T00:08:20Z

@test success For the given input above the results are like:

Thanks Chris 👍

@smanzi you will need also this one #5336

smanzi · 2014-12-08T00:25:46Z

@dgt41 are you on latest staging? I can't run the indexer... no progress bar... stuck... only:

Starting Indexer
The indexer is being initialised. Do not close this window.

smanzi · 2014-12-08T00:26:36Z

Just to be clear the above problem is without this PR!

dgrammatiko · 2014-12-08T00:26:48Z

@smanzi check #5336

dgrammatiko · 2014-12-08T00:28:14Z

@test on a 2k boundary success

Article used:

Aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaa<script src="/nothere/notthere.js" /> some

Result:

smanzi · 2014-12-08T00:29:21Z

WTF! #5336 MUST be merged!!! 😄 thanks! 👍

smanzi · 2014-12-08T00:39:57Z

Seems to be really OK, but, being the a..hole I am, I want to test it more thoroughly and also in a multilingual environment.

In the meantime I'm going (tail between my legs) to close #5206

Thanks Chris!!

chrisdavenport · 2014-12-08T00:45:51Z

It definitely needs testing in a multi-lingual environment. I think it should be okay because all the string manipulations are done using byte offsets, but it really does need testing to make sure.
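
The byte-offset reasoning can be checked with a small Python sketch, assuming the content is UTF-8 encoded (an assumption, not something stated in this thread). In UTF-8 every byte of a multi-byte character is 0x80 or above, so the ASCII bytes for "<" and ">" can never occur inside one. The sample text and variable names below are made up for illustration.

```python
import re

text = "Ελληνικά <b>κείμενο</b> 日本語"
data = text.encode("utf-8")

# Tag removal done purely on bytes: "<" and ">" are single ASCII bytes,
# so the match can never start or end in the middle of a character.
stripped = re.sub(rb"<[^>]*>", b" ", data).decode("utf-8")

# Cutting at an arbitrary byte offset may split a character in two
# (offset 3 falls inside the second Greek letter here) ...
first, second = data[:3], data[3:]

# ... but joining the pieces again restores the original text exactly.
rejoined = (first + second).decode("utf-8")
```

The caveat is that a single chunk cut this way may not be valid UTF-8 on its own, so anything that decodes or tokenises one chunk in isolation still needs the multilingual testing requested above.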

Thanks for #5206. It was the stimulus I needed to get the new parser finished!

smanzi · 2014-12-08T00:51:01Z

ah, ah... first small issue:
I slightly modified your test as here below:

    <div id="content">
        [stuff]
    </div>
<h1Title</h1><p>testing if you could apply

Searching for "Testing" doesn't give any result, but searching for "apply" (a word close to "testing") I get this:

Got you!!! 😄

smanzi · 2014-12-08T00:53:30Z

I'm DUMB
I did disable the PR, sorry...
retesting now...

dgrammatiko · 2014-12-08T00:54:11Z

But <h1Title</h1> is not valid! Using any of the editors (other than none) this will never appear. Also, using none means you know the basics and that there is no safety net! Sorry Sergio, not a bug for me.

smanzi · 2014-12-08T00:57:35Z

I always use "editor: none" 😃
It is valid in the article; the mistake is only here. It really is:

    <div id="content">
        [stuff]
    </div>
<h1>Title</h1><p>testing if you could apply

smanzi · 2014-12-08T00:59:05Z

👍

dgrammatiko · 2014-12-08T01:02:38Z

@smanzi @chrisdavenport Confirmed that tags in series should have a space to separate the words

smanzi · 2014-12-08T01:04:07Z

@chrisdavenport Chris, about multilingual: do you know of the bug (feature?) that makes it so that if you have content flagged for "All" languages it is really searched only for the "default" language and not any other?

This is driving me crazy, because on one of my sites (bilingual) I also have pages where content is not assigned to a particular language but to "All" (It really is content for all languages!), and... I can't find it in the secondary language. But this is of course another story....

smanzi · 2014-12-08T01:05:05Z

@dgt41 really??? My bad ignorance.... sorry!

dgrammatiko · 2014-12-08T01:07:09Z

@chrisdavenport @smanzi Sorry I removed the patch earlier 😕

smanzi · 2014-12-08T01:09:02Z

@dgt41 This is generated by JCH Optimizer Pro. It might be their mistake, but...

... <a href=#>My awesome new website</a></h2><p>Welcome to our new web site!</p> ...

smanzi · 2014-12-08T01:13:08Z

@dgt41 I think only inline elements (eg. <span>...</span>) must have a space, not block elements (like <h1>...</h1>)
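
The spacing question can be made concrete with a short Python sketch. It only illustrates the two possible treatments of a removed tag; whether the parser replaces tags with a space or with nothing is not stated here, and `TAG_RE` is a hypothetical name.

```python
import re

TAG_RE = re.compile(r"<[^>]*>")

snippet = "<h1>Title</h1><p>testing if you could apply"

# Dropping tags outright fuses the heading with the first word of the paragraph.
fused = TAG_RE.sub("", snippet)

# Replacing each tag with a space keeps the two words as separate tokens.
spaced = " ".join(TAG_RE.sub(" ", snippet).split())

# The trade-off: an inline tag inside a word now splits that word.
inline = " ".join(TAG_RE.sub(" ", "un<b>believ</b>able").split())
```

With the first treatment "Title" and "testing" end up as the single token "Titletesting"; with the second they stay separate, at the cost of splitting words that contain inline markup.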

chrisdavenport · 2014-12-08T01:19:35Z

So, is there a problem? If there is, can you give me a specific test where it fails?

chrisdavenport · 2014-12-08T01:21:21Z

@smanzi Regarding the language issue, can you open a separate issue for that (assuming you haven't already)? Let's keep the issues separate. Thanks.

smanzi · 2014-12-08T01:22:09Z

No Chris, no problem! The only problem is me, that I disabled your PR when I couldn't index (because of the lack of PR #5336) and then forgot to re-enable it...

So far everything is OK!

smanzi · 2014-12-08T01:23:30Z

@chrisdavenport The other issue... I opened it back in the days of the old JTracker, and I think I also reopened it here on GitHub. Let me check...

dgrammatiko · 2014-12-08T01:24:45Z

@chrisdavenport All Good here as well @test success

smanzi · 2014-12-08T01:26:24Z

@chrisdavenport There is #5204 where I reported both issues: the one for the tags without spacing and also the multilingual search...

smanzi · 2014-12-08T01:29:24Z

@chrisdavenport Do you mind if I wait to give you the @test until I finished some more tests? Anyway... it seems REALLY OK!

dgrammatiko · 2014-12-08T01:29:45Z

@chrisdavenport Spacing problem also occurs in 2k boundary, try this:

Aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaa aaaaaaaaaaaaa<script src="/nothere/notthere.js" ><script/> some

The word some should come up as single word but it comes up as aaaaaaaaaaaaasome

chrisdavenport · 2014-12-08T09:31:37Z

Okay, I fixed a couple of bugs.

Bug reported by @dgt41 The trim in the process method was removing the separation between tokens across the 2Kb boundary, so I removed the trim as it was overkill anyway.
Duplicate substrings were being added to the output when the text contained invalid script or style blocks (that is, blocks with a start tag but no corresponding end tag).
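
The first bug can be reproduced in miniature with the Python sketch below. It assumes each chunk is cleaned separately and the results are then concatenated, and that the removed script tag leaves whitespace at the chunk edge; the function name `reassemble` and the sample chunks are hypothetical.

```python
def reassemble(processed_chunks, trim):
    """Join already-cleaned chunks, optionally trimming each one first."""
    if trim:
        processed_chunks = [chunk.strip() for chunk in processed_chunks]
    return "".join(processed_chunks)


# Tiny stand-ins for the two cleaned 2Kb chunks around the boundary.
chunks = ["aaaaaaaaaaaaa ", " some"]

with_trim = reassemble(chunks, trim=True)
without_trim = reassemble(chunks, trim=False)
```

Trimming removes the only whitespace that separated the last token of one chunk from the first token of the next, which gives the fused word reported earlier; without the trim the two tokens stay apart.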

smanzi · 2014-12-08T16:59:51Z

@dgt41 How do you make sure that a short string used for searching falls exactly at the 2k boundary inside a long string? 2k counting from where?

dgrammatiko · 2014-12-08T17:08:10Z

@smanzi count 2048 a's (or any other character) to get to the 2k boundary, then subtract a few for the tag. E.g. <script is 7 chars, so 2042 a's plus <script src="whatever"></script> ensures that the tag straddles the boundary… Hint: Word has a word count that also counts characters.
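
Instead of counting by hand, the test string can be generated. The Python sketch below assumes the boundary is counted from the start of the text handed to the parser and that every character is one byte (plain ASCII); neither is confirmed in this thread, and `boundary_payload` is a hypothetical helper.

```python
BOUNDARY = 2048  # assumed chunk size in bytes


def boundary_payload(tag='<script src="whatever"></script>', overlap=6, tail=" some"):
    """Return filler + tag + tail, with the tag starting `overlap` characters
    before the boundary so that it is cut in two by it."""
    if not 0 < overlap < len(tag):
        raise ValueError("overlap must fall strictly inside the tag")
    return "a" * (BOUNDARY - overlap) + tag + tail


payload = boundary_payload()
tag_start = payload.index("<script")
```

With the default `overlap=6` this gives the 2042-character prefix described above. Paste `payload` into an article using editor none, then check that "some" is indexed as a word of its own and that "whatever" is not indexed.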

smanzi · 2014-12-08T17:09:57Z

I have prepared a test file (an extended and slightly modified version of the @dgt41 one).
Everybody can download it from http://smz.it/test-files/test-for-com_finder-v1.zip
Unzip, copy content from the included .html file and paste it inside one or more articles using editor none (to be sure it is not modified by WYSIWYG editors...)

N.B.: it is not enough to switch editor: editor none must be the default editor...

smanzi · 2014-12-08T17:12:09Z

@test success
Thanks again, Chris! 👍

smanzi · 2014-12-08T17:34:32Z

@dgt41 Thanks Dimitris!
Ehmmm... why don't you take my "-v1" test file and make a "-v2" out of it with the 2k long string correctly set up? 😇

You can then send it to me by mail or skype...

dgrammatiko · 2014-12-08T19:09:44Z

@test success no trim on the 2k boundary!

smanzi · 2014-12-09T16:09:34Z

@dgt41 Dimitris, can you give the @test to this also in http://issues.joomla.org/tracker/joomla-cms/5340 so that this can go RTC? Thanks!

wilsonge · 2014-12-14T01:20:32Z

RTC

_{This comment was created with the J!Tracker Application at issues.joomla.org/joomla-cms/5340.}

wilsonge · 2014-12-14T01:23:10Z

And merged into staging. Thanks Chris!

Improved HMTL parser for Smart Search

6fd968c

jissues-bot added the PR-staging label Dec 7, 2014

chrisdavenport mentioned this pull request Dec 7, 2014

Fixes Issues in "Smart Search" [#5204] #5206

Closed

Prevent substring duplication when removing invalid blocks. Remove tr…

36e9735

…im so that tokens remain separated when 2Kb chunks are re-assembled.

This was referenced Dec 8, 2014

Issues in "Smart Search" (com_finder) #5204

Closed

com_finder - Issue in multilingual searches #5352

Closed

wilsonge closed this in 63e7772 Dec 14, 2014

wilsonge added this to the Joomla! 3.4.0 milestone Dec 14, 2014
