package Search::Tools;
use 5.008_003;
use strict;
use warnings::register;
use Carp;
use Scalar::Util qw( openhandle );
use File::Basename;
our $VERSION = '1.002';
use XSLoader;
XSLoader::load( 'Search::Tools', $VERSION );
our $XS_DEBUG = 0;
sub parser {
    my $class = shift;
    require Search::Tools::QueryParser;
    return Search::Tools::QueryParser->new(@_);
}

sub regexp {
    my $class = shift;
    croak("as of version 0.24 you should use parser() instead of regexp()");
}

sub hiliter {
    my $class = shift;
    require Search::Tools::HiLiter;
    return Search::Tools::HiLiter->new(@_);
}

sub snipper {
    my $class = shift;
    require Search::Tools::Snipper;
    return Search::Tools::Snipper->new(@_);
}

sub transliterate {
    my $class = shift;
    require Search::Tools::Transliterate;
    return Search::Tools::Transliterate->new->convert(@_);
}

sub spellcheck {
    my $class = shift;
    require Search::Tools::SpellCheck;
    return Search::Tools::SpellCheck->new(@_);
}

sub slurp {
    my ( $self, $file ) = @_;
    my ( $buf, $fh );
    my ( $name, $path, $suffix ) = fileparse( $file, qr/\.[^.]*/ );
    $suffix = lc($suffix);
    if ( $suffix eq '.gz' ) {
        require IO::Uncompress::Gunzip;
        $fh = IO::Uncompress::Gunzip->new($file);
    }
    elsif ( $suffix eq '.bz2' ) {
        require IO::Uncompress::Bunzip2;
        $fh = IO::Uncompress::Bunzip2->new($file)
            or die "bunzip2 failed: $IO::Uncompress::Bunzip2::Bunzip2Error\n";
    }
    else {
        require IO::File;
        $fh = openhandle($file) || IO::File->new( $file, '<' );
    }
    die "Failed to open $file: $!" unless $fh;
    while ( my $ln = $fh->getline ) {
        $buf .= $ln;
    }
    return $buf;
}

1;
__END__
=pod
=head1 NAME
Search::Tools - high-performance tools for building search applications
=head1 SYNOPSIS
    use Search::Tools;
    my $string     = 'the quik brown fox';
    my $qparser    = Search::Tools->parser();
    my $query      = $qparser->parse($string);
    my $snipper    = Search::Tools->snipper(query => $query);
    my $hiliter    = Search::Tools->hiliter(query => $query);
    my $spellcheck = Search::Tools->spellcheck(query_parser => $qparser);
    my $suggestions = $spellcheck->suggest($string);
    for my $s (@$suggestions) {
        if (! $s->{suggestions}) {
            # $s->{word} was spelled correctly
        }
        elsif (@{ $s->{suggestions} }) {
            printf "Did you mean: %s\n", join(' or ', @{ $s->{suggestions} });
        }
    }
    for my $result (@search_results) {
        print $hiliter->light( $snipper->snip( $result->summary ) );
    }
=head1 DESCRIPTION
As of version 1.000 Search::Tools uses L<Moo> and L<Class::XSAccessor>.
Search::Tools is a set of utilities for building search applications.
Rather than adhering to a particular search application or framework,
the goal of Search::Tools is to provide general-purpose methods for common
search application features. Think of Search::Tools like a toolbox
rather than a hammer.
Examples include:
=over
=item
Parsing search queries for the meaningful terms
=item
Rich regular expressions for locating terms in the original
indexed documents
=item
Contextual snippets showing query terms
=item
Highlighting of terms in context
=item
Spell-checking of terms and suggestions of alternate spellings
=back
Search::Tools is derived from some of the features in HTML::HiLiter
and SWISH::HiLiter, but has been rewritten with an eye to accommodating
more general-purpose features.
=head1 METHODS
=head2 parser( I<args> )
Returns a Search::Tools::QueryParser object, passing I<args> to new().
=head2 regexp
Deprecated as of version 0.24. Calling it croaks. Use parser() instead.
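Because regexp() croaks unconditionally, legacy callers get an exception rather than a return value. The following is a minimal sketch of that behavior, not the module itself: C<Legacy::Demo> is a hypothetical stand-in package that copies the croak() call so the example runs without Search::Tools installed.

```perl
use strict;
use warnings;

# Hypothetical stand-in that mimics the deprecation pattern.
# It is NOT the real Search::Tools package.
package Legacy::Demo;
use Carp;

sub regexp {
    my $class = shift;
    croak("as of version 0.24 you should use parser() instead of regexp()");
}

package main;

# Trap the exception so legacy code can log a clear message or fall back.
my $ok  = eval { Legacy::Demo->regexp(); 1 };
my $err = $@;
print "regexp() failed: $err" unless $ok;
```

Wrapping the call in eval is only useful during a migration. New code should call parser() directly.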
=head2 hiliter( I<args> )
Returns a Search::Tools::HiLiter object, passing I<args> to new().
=head2 snipper( I<args> )
Returns a Search::Tools::Snipper object, passing I<args> to new().
=head2 transliterate( I<str> )
Same as:
    Search::Tools::Transliterate->new()->convert( $str )
=head2 spellcheck( I<args> )
Returns a Search::Tools::SpellCheck object, passing I<args> to new().
=head2 slurp( I<filename> )
Reads contents of I<filename> into a scalar variable. Similar to File::Slurp,
but will handle compressed files (.gz or .bz2) transparently
using IO::Uncompress.
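The suffix-based dispatch described above can be sketched with core modules alone. This is an illustrative approximation, not the module's own code: C<slurp_any> is a hypothetical helper name, only the .gz and plain-file branches are shown, and the sample file names are invented for the round trip.

```perl
use strict;
use warnings;
use File::Basename qw( fileparse );
use File::Temp qw( tempdir );
use IO::Compress::Gzip qw( gzip $GzipError );
use IO::Uncompress::Gunzip qw( $GunzipError );
use IO::File;

# Hypothetical helper (not part of Search::Tools): pick a reader by
# file suffix, then read the whole stream into one scalar.
sub slurp_any {
    my ($file) = @_;
    my ( $name, $path, $suffix ) = fileparse( $file, qr/\.[^.]*/ );
    my $fh;
    if ( lc($suffix) eq '.gz' ) {
        $fh = IO::Uncompress::Gunzip->new($file)
            or die "gunzip failed: $GunzipError\n";
    }
    else {
        $fh = IO::File->new( $file, '<' )
            or die "Failed to open $file: $!\n";
    }
    my $buf = '';
    while ( defined( my $ln = $fh->getline ) ) {
        $buf .= $ln;
    }
    return $buf;
}

# Round trip: write a gzipped file and a plain file, then read both back.
my $dir  = tempdir( CLEANUP => 1 );
my $text = "the quick brown fox\njumps over the lazy dog\n";
gzip( \$text => "$dir/sample.txt.gz" ) or die "gzip failed: $GzipError\n";

my $out = IO::File->new( "$dir/sample.txt", '>' ) or die "write failed: $!\n";
$out->print($text);
$out->close;

my $from_gz    = slurp_any("$dir/sample.txt.gz");
my $from_plain = slurp_any("$dir/sample.txt");
print "gz matches plain\n" if $from_gz eq $from_plain;
```

With Search::Tools installed, the equivalent call is C<< Search::Tools->slurp($file) >>, which also covers .bz2 files.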
=cut
=head1 FUNCTIONS
=head2 describe( I<object> )
XS debugging help. Same as using Devel::Peek.
=head1 REQUIREMENTS
Perl 5.8.3 or later is required. This is for full UTF-8 support.
The following modules are required (Encode and Carp ship with core Perl):
=over
=item Class::XSAccessor
=item Search::Query
=item Data::Dump
=item Encode
=item Encoding::FixLatin
=item Carp
=back
The following CPAN modules are recommended for the full set of features
and for performance.
=over
=item Text::Aspell
=back
See also the specific module documentation for individual requirements.
=head1 HISTORY
The public API has changed as of version 0.24. The following classes
are now removed:
    Search::Tools::Keywords
    Search::Tools::RegExp
    Search::Tools::RegExp::Keywords
    Search::Tools::RegExp::Keyword
The following Search::Tools method is deprecated:
    regexp()
The following classes are new as of version 0.24:
    Search::Tools::HeatMap
    Search::Tools::Query
    Search::Tools::QueryParser
    Search::Tools::RegEx
    Search::Tools::Token
    Search::Tools::TokenList
    Search::Tools::TokenListPP
    Search::Tools::TokenListUtils
    Search::Tools::TokenPP
    Search::Tools::Tokenizer
=head1 EXAMPLES
See the tests in t/ and the example scripts in example/.
=head1 AUTHOR
Peter Karman C<< <karman@cpan.org> >>
=head1 ACKNOWLEDGMENTS
The original idea and regular expression builder comes from
HTML::HiLiter by the same author, copyright 2004 by Cray Inc.
Thanks to Atomic Learning C<www.atomiclearning.com>
for sponsoring the development of some of these modules.
=head1 BUGS
Please report any bugs or feature requests to C<bug-search-tools at rt.cpan.org>, or through
the web interface at L<http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Search-Tools>.
I will be notified, and then you'll
automatically be notified of progress on your bug as I make changes.
=head1 SUPPORT
You can find documentation for this module with the perldoc command.
    perldoc Search::Tools
You can also look for information at:
=over 4
=item * RT: CPAN's request tracker
L<http://rt.cpan.org/NoAuth/Bugs.html?Dist=Search-Tools>
=item * AnnoCPAN: Annotated CPAN documentation
L<http://annocpan.org/dist/Search-Tools>
=item * CPAN Ratings
L<http://cpanratings.perl.org/d/Search-Tools>
=item * Search CPAN
L<http://search.cpan.org/dist/Search-Tools/>
=back
=head1 COPYRIGHT
Copyright 2006-2009, 2014 by Peter Karman.
This package is free software; you can redistribute it and/or modify it under the
same terms as Perl itself.
=head1 SEE ALSO
HTML::HiLiter, SWISH::HiLiter, L<Moo>, L<Class::XSAccessor>, L<Text::Aspell>
=cut