re-add default searchform docs
add blurb about simple theme

text extraction

synonyms
andrewandante authored and dhensby committed Jun 20, 2018
1 parent e08731d commit 53eb826
Showing 6 changed files with 118 additions and 74 deletions.
3 changes: 2 additions & 1 deletion docs/en/00_index.md
@@ -18,12 +18,13 @@
- Advanced configuration
- [Facets](04_advanced_configuration.md#facets)
- [Using multiple indexes](04_advanced_configuration.md#multiple-indexes)
- [Synonyms](04_advanced_configuration.md#synonyms)
- [Analyzers, tokenizers and token filters](04_advanced_configuration.md#analyzers,-tokenizers-and-token-filters)
- [Spellcheck](04_advanced_configuration.md#spell-check-("did-you-mean..."))
- [Highlighting](04_advanced_configuration.md#highlighting)
- [Boosting](04_advanced_configuration.md#boosting)
- [Indexing related objects](04_advanced_configuration.md#indexing-related-objects)
- [Subsites](04_advanced_configuration.md#subsites)
- [Custom field types](04_advanced_configuration.md#custom-field-types)
- [Text extraction](04_advanced_configuration.md#text-extraction)
- Troubleshooting
- [Gotchas](05_troubleshooting.md#common-gotchas)
15 changes: 12 additions & 3 deletions docs/en/01_getting_started.md
@@ -52,6 +52,15 @@ This will:
- Create a DefaultIndex
- Run a [Solr Configure](03_configuration.md#solr-configure) and a [Solr Reindex](03_configuration.md#solr-reindex)

You'll then need to build a search form and results display that suits the functionality of your site.

// TODO update me when https://github.com/silverstripe/silverstripe-fulltextsearch/pull/216 is merged
If you have the [CMS module](https://github.com/silverstripe/silverstripe-cms) installed, you can add
`$SearchForm` to your template to render a Solr search form. Default configuration is provided by the
[`ContentControllerExtension`](/src/Solr/Control/ContentControllerExtension.php) and an alternative
[`SearchForm`](/src/Solr/Forms/SearchForm.php). With the
[Simple theme](https://github.com/silverstripe-themes/silverstripe-simple), the form is already included in the
[`Header`](https://github.com/silverstripe-themes/silverstripe-simple/blob/master/templates/Includes/Header.ss#L10-L15)
template by default.

Ensure that you _don't_ have `SilverStripe\ORM\Search\FulltextSearchable::enable()` set in `_config.php`, as the
`SearchForm` action provided by that class will conflict.

You can override the default template with a new one at `templates/Layout/Page_results_solr.ss`.
4 changes: 2 additions & 2 deletions docs/en/03_configuration.md
@@ -13,8 +13,8 @@ Solr::configure_server([
'path' => '/solr', // The suburl the Solr service is available on
'version' => '4', // Solr server version - currently only 3 and 4 supported
'service' => 'Solr4Service', // The class that provides actual communication to the Solr server
'extraspath' => BASE_PATH .'/fulltextsearch/conf/solr/4/extras/', // Absolute path to the folder containing templates used for generating the schema and field definitions
'templates' => BASE_PATH . '/fulltextsearch/conf/solr/4/templates/', // Absolute path to the configuration default files, e.g. solrconfig.xml
'extraspath' => BASE_PATH .'/vendor/silverstripe/fulltextsearch/conf/solr/4/extras/', // Absolute path to the folder containing templates used for generating the schema and field definitions
'templates' => BASE_PATH . '/vendor/silverstripe/fulltextsearch/conf/solr/4/templates/', // Absolute path to the configuration default files, e.g. solrconfig.xml
'indexstore' => [
'mode' => NULL, // [REQUIRED] a classname which implements SolrConfigStore, or 'file' or 'webdav'
'path' => NULL, // [REQUIRED] The (locally accessible) path to write the index configurations to OR The suburl on the Solr host that is set up to accept index configurations via webdav (e.g. BASE_PATH . '/.solr')
102 changes: 101 additions & 1 deletion docs/en/04_advanced_configuration.md
@@ -114,7 +114,77 @@ SilverStripe\FullTextSearch\Search\FullTextSearch:
- CoreSearchIndex
```

## Synonyms
## Analyzers, Tokenizers and Token Filters

When a document is indexed, its individual fields are subject to the analyzing and tokenizing filters that can transform
and normalize the data in the fields. You can remove blank spaces, strip HTML, replace a particular character and much
more as described in the [Solr Wiki](http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters).

### Synonyms

To add synonym processing at query-time, you can add the `SynonymFilterFactory` as an `Analyzer`:

```php
use SilverStripe\FullTextSearch\Solr\SolrIndex;
use Page;

class MyIndex extends SolrIndex
{
    public function init()
    {
        $this->addClass(Page::class);
        $this->addField('Content');
        // Attach a synonym filter to the Content field
        $this->addAnalyzer('Content', 'filter', [
            'class' => 'solr.SynonymFilterFactory',
            // Name of the synonyms file, supplied via the extras path
            'synonyms' => 'synonyms.txt',
            'ignoreCase' => 'true',
            'expand' => 'false'
        ]);
    }
}
```

This generates the following XML schema definition:

```xml
<field name="Page_Content">
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
</field>
```

In this case, you most likely also want to define your own synonyms list. You can define a mapping in one of two ways:

* A comma-separated list of words. If the token matches any of the words, then all the words in the list are
substituted, which will include the original token.

* Two comma-separated lists of words with the symbol "=>" between them. If the token matches any word on
the left, then the list on the right is substituted. The original token will not be included unless it is also in the
list on the right.

For example:

```text
couch,sofa,lounger
teh => the
small => teeny,tiny,weeny
```
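
To make the two rule formats concrete, here is a minimal PHP sketch of how such rules map a token to its replacements. It is illustrative only: it is neither Solr's implementation nor part of this module, and the function names `parseSynonymRules()` and `expandToken()` are hypothetical. It lowercases tokens to mimic `ignoreCase="true"`. The `$expand` flag is an assumption based on Solr's documented `expand` attribute: when `true`, a comma-separated list expands to every word in it (as described above); when `false`, it is reduced to the first word in the list.

```php
<?php
// Illustrative sketch only: models the two rule formats, not Solr's implementation.

/**
 * Build a lookup of token => replacement tokens from synonym rules.
 *
 * @param string $rules  Contents of a synonyms file, one rule per line
 * @param bool   $expand Mirrors the filter's `expand` attribute (assumed semantics)
 * @return array
 */
function parseSynonymRules(string $rules, bool $expand = true): array
{
    $lookup = [];
    foreach (preg_split('/\R/', $rules) as $line) {
        $line = trim($line);
        if ($line === '' || $line[0] === '#') {
            continue; // skip blank lines and comments
        }
        if (strpos($line, '=>') !== false) {
            // Explicit mapping: words on the left are replaced by the list on the right
            [$left, $right] = explode('=>', $line, 2);
            $from = array_map('trim', explode(',', $left));
            $to = array_map('trim', explode(',', $right));
        } else {
            // Equivalent list: each word maps to the whole list, or to the first word only
            $from = array_map('trim', explode(',', $line));
            $to = $expand ? $from : [$from[0]];
        }
        foreach ($from as $word) {
            $lookup[strtolower($word)] = $to;
        }
    }
    return $lookup;
}

/**
 * Return the replacement tokens for a token, or the token itself if no rule matches.
 */
function expandToken(string $token, array $lookup): array
{
    return $lookup[strtolower($token)] ?? [$token];
}

$rules = "couch,sofa,lounger\nteh => the\nsmall => teeny,tiny,weeny";

$expanded = parseSynonymRules($rules);
echo implode(', ', expandToken('sofa', $expanded)), PHP_EOL;  // couch, sofa, lounger
echo implode(', ', expandToken('teh', $expanded)), PHP_EOL;   // the
echo implode(', ', expandToken('small', $expanded)), PHP_EOL; // teeny, tiny, weeny

$reduced = parseSynonymRules($rules, false);
echo implode(', ', expandToken('sofa', $reduced)), PHP_EOL;   // couch
```

Note that `small` is not among its own replacements, because it only appears on the left of `=>`.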

Then you should update your [Solr configuration](03_configuration.md#solr-server-parameters) to include your synonyms
file via the `extraspath` parameter, for example:

```php
use SilverStripe\FullTextSearch\Solr\Solr;

Solr::configure_server([
    'extraspath' => BASE_PATH . '/mysite/Solr/',
    'indexstore' => [
        'mode' => 'file',
        'path' => BASE_PATH . '/.solr',
    ]
]);
```

This will use `/mysite/Solr/synonyms.txt` as your synonyms list after you run a [Solr configure](03_configuration.md#solr-configure).

## Spell check ("Did you mean...")

@@ -340,3 +410,33 @@ Find the fields in your overloaded `types.ss` that you want to enable this behav
```xml
<filter class="solr.ASCIIFoldingFilterFactory"/>
```

## Text extraction

Solr provides built-in text extraction capabilities for PDF and Office documents, and numerous other formats, through
the `ExtractingRequestHandler` API (see [the Solr wiki entry](http://wiki.apache.org/solr/ExtractingRequestHandler)).
If you're using a default Solr installation, it's most likely already bundled and set up. But if you plan on running the
Solr server integrated into this module, you'll need to download the libraries and link them first. Run the following
commands from the webroot:

```
wget http://archive.apache.org/dist/lucene/solr/4.10.4/solr-4.10.4.tgz
tar -xvzf solr-4.10.4.tgz
mkdir .solr/PageSolrIndexboot/dist
mkdir .solr/PageSolrIndexboot/contrib
cp solr-4.10.4/dist/solr-cell-4.10.4.jar .solr/PageSolrIndexboot/dist/
cp -R solr-4.10.4/contrib/extraction .solr/PageSolrIndexboot/contrib/
rm -rf solr-4.10.4 solr-4.10.4.tgz
```

Create a custom `solrconfig.xml` (see [File-based configuration](03_configuration.md#file-based-configuration)).

Add the following XML configuration:

```xml
<lib dir="./contrib/extraction/lib/" />
<lib dir="./dist" />
```

Now run a [Solr configure](03_configuration.md#solr-configure). You can use Solr text extraction either directly through
the HTTP API, or through the [Text extraction module](https://github.com/silverstripe-labs/silverstripe-textextraction).
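
As a rough illustration of the direct HTTP route, the PHP sketch below posts a file to Solr and asks for the extracted text without indexing it. Several things here are assumptions, not part of this module: the helper names `buildExtractUrl()` and `extractText()` are hypothetical, the base URL `http://localhost:8983/solr` and core name `MyIndex` are placeholders, and it assumes your custom `solrconfig.xml` registers an `ExtractingRequestHandler` at `/update/extract` (the stock Solr example configuration does, but the `<lib>` entries above do not do so on their own). It also requires PHP's cURL extension.

```php
<?php
// Sketch only: base URL, core name and handler path are assumptions for illustration.

/**
 * Build the URL of the extraction handler for a given core.
 * extractOnly=true asks Solr to return the extracted text instead of indexing it.
 */
function buildExtractUrl(string $baseUrl, string $core, array $params = []): string
{
    $query = http_build_query(array_merge(['extractOnly' => 'true', 'wt' => 'json'], $params));
    return rtrim($baseUrl, '/') . '/' . rawurlencode($core) . '/update/extract?' . $query;
}

/**
 * Upload a file as multipart form data and return Solr's raw response body.
 */
function extractText(string $url, string $filePath): string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => ['file' => new CURLFile($filePath)],
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $response = curl_exec($ch);
    if ($response === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException('Solr request failed: ' . $error);
    }
    curl_close($ch);
    return $response;
}

$url = buildExtractUrl('http://localhost:8983/solr', 'MyIndex');
echo $url, PHP_EOL;

// Uncomment once a Solr server with the handler configured is running:
// echo extractText($url, '/path/to/document.pdf');
```

The request is left commented out so the snippet can be run without a Solr server; only the URL is printed.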
66 changes: 0 additions & 66 deletions docs/en/Solr.md
@@ -1,69 +1,3 @@
### Adding Analyzers, Tokenizers and Token Filters

When a document is indexed, its individual fields are subject to the analyzing and tokenizing filters that can transform and normalize the data in the fields. For example — removing blank spaces, removing html code, stemming, removing a particular character and replacing it with another
(see [Solr Wiki](http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters)).

Example: Replace synonyms on indexing (e.g. "i-pad" with "iPad")

```php
use SilverStripe\FullTextSearch\Solr\SolrIndex;

class MyIndex extends SolrIndex
{
public function init()
{
$this->addClass(Page::class);
$this->addField('Content');
$this->addAnalyzer('Content', 'filter', ['class' => 'solr.SynonymFilterFactory']);
}
}
```

Generates the following XML schema definition:

```xml
<field name="Page_Content" ...>
<filter class="solr.SynonymFilterFactory" synonyms="syn.txt" ignoreCase="true" expand="false"/>
</field>
```

### Text Extraction

Solr provides built-in text extraction capabilities for PDF and Office documents,
and numerous other formats, through the `ExtractingRequestHandler` API
(see http://wiki.apache.org/solr/ExtractingRequestHandler).
If you're using a default Solr installation, it's most likely already
bundled and set up. But if you plan on running the Solr server integrated
into this module, you'll need to download the libraries and link the first.

```
wget http://archive.apache.org/dist/lucene/solr/3.1.0/apache-solr-3.1.0.tgz
mkdir tmp
tar -xvzf apache-solr-3.1.0.tgz
mkdir .solr/PageSolrIndexboot/dist
mkdir .solr/PageSolrIndexboot/contrib
cp apache-solr-3.1.0/dist/apache-solr-cell-3.1.0.jar .solr/PageSolrIndexboot/dist/
cp -R apache-solr-3.1.0/contrib/extraction .solr/PageSolrIndexboot/contrib/
rm -rf apache-solr-3.1.0 apache-solr-3.1.0.tgz
```

Create a custom `solrconfig.xml` (see "File-based configuration").
Add the following XML configuration.

```xml
<lib dir="./contrib/extraction/lib/" />
<lib dir="./dist" />
```

Now apply the configuration:

```
vendor/bin/sake dev/tasks/Solr_Configure
```

Now you can use Solr text extraction either directly through the HTTP API,
or indirectly through the ["textextraction" module](https://github.com/silverstripe-labs/silverstripe-textextraction).

## Adding DataObject classes to Solr search

If you create a class that extends `DataObject` (and not `Page`) then it won't be automatically added to the search
2 changes: 1 addition & 1 deletion src/Solr/SolrIndex.php
@@ -137,7 +137,7 @@ public function getTypes()
*
* @param string $field
* @param string $type
* @param Array $params Parameters for the analyzer, usually at least a "class"
* @param array $params parameters for the analyzer, usually at least a "class"
*/
public function addAnalyzer($field, $type, $params)
{