Wed Mar 16 13:37:17 GMT 2011 Richard Boulton <richard@tartarus.org>
* xappy/cachemanager/generic.py,xappy/cachemanager/xapian_manager.py,
xappy/indexerconnection.py,xappy/searchconnection.py,
xappy/unittests/multiple_caches.py: Patch from Bruno Rezende -
allow multiple caches to be applied. Closes ticket #36.
Mon Dec 20 14:24:39 GMT 2010 Richard Boulton <richard@tartarus.org>
* xappy/searchresults.py,xappy/unittests/collapse.py: Add access to
the collapse_key and collapse_count properties of search results.
Add test of these.
Fri Nov 05 13:14:23 GMT 2010 Richard Boulton <richard@tartarus.org>
* xappy/indexerconnection.py: Add documentation comment
advising that apply_cached_items() should probably be called
here.
Thu Sep 09 15:25:08 GMT 2010 Richard Boulton <richard@tartarus.org>
* xappy/cachemanager/generic.py: Add set_queryid() method, to allow
stored query ids to be set explicitly.
Thu Jun 03 18:20:47 GMT 2010 Richard Boulton <richard@tartarus.org>
* libs/get_xapian.py,utils/make_xappy_tarballs: Updated tarballs
and branchpoints. New xapian packages include a bug fix for the
matcher which was occasionally causing documents to be missed
from the result set.
Sun May 09 09:45:53 GMT 2010 Richard Boulton <richard@tartarus.org>
* libs/get_xapian.py: New packages, patched to avoid a segfault
with ValueGePostList.
Fri May 07 02:44:00 GMT 2010 Richard Boulton <richard@tartarus.org>
* libs/get_xapian.py: Updated packages - should give considerably
better performance for range queries.
Thu May 06 13:17:09 GMT 2010 Richard Boulton <richard@tartarus.org>
* external_posting_source/sortdatabase/Makefile: Fix to work with
latest xapian packages, by dropping the -1.1 suffix to
xapian-config. Thanks to Shane Evans for pointing it out.
Wed May 05 09:26:11 GMT 2010 Richard Boulton <richard@tartarus.org>
* libs/build_xapian.sh: Use xapian-config not xapian-config-1.1,
now we're using packages based on 1.2.x
Tue May 04 21:54:06 GMT 2010 Richard Boulton <richard@tartarus.org>
* libs/get_xapian.py: Use new xapian archives - in particular,
these fix a replication bug.
Tue Mar 30 17:16:10 GMT 2010 Richard Boulton <richard@tartarus.org>
* xappy/unittests/cached_searches.py: Test that replacing a
document when using store_only=True drops the document from
cached queries.
Tue Mar 30 17:08:58 GMT 2010 Richard Boulton <richard@tartarus.org>
* xappy/indexerconnection.py: If a document is replaced with
store_only set, remove it from the cache.
Tue Mar 30 16:40:31 GMT 2010 Richard Boulton <richard@tartarus.org>
* xappy/unittests/spell_correct_1.py: Enable an old test for a bug
which has long been fixed.
Wed Mar 10 15:15:41 GMT 2010 Richard Boulton <richard@tartarus.org>
* libs/get_xapian.py: Link to new version of xapian.
Thu Mar 04 18:07:38 GMT 2010 Richard Boulton <richard@tartarus.org>
* utils/verify_cache.py: Move cache verifier into
xappy.cachemanager.verify_cache, so that it'll be available in
installed versions of xappy.
Thu Mar 04 18:06:34 GMT 2010 Richard Boulton <richard@tartarus.org>
* utils/verify_cache.py: Expand verification routine to check the
cache much more thoroughly, and report errors more clearly.
Thu Mar 04 16:39:51 GMT 2010 Richard Boulton <richard@tartarus.org>
* utils/verify_cache.py: Add a utility to verify the integrity of a
cache.
Wed Feb 24 10:26:38 GMT 2010 Richard Boulton <richard@tartarus.org>
* xappy/datastructures.py,xappy/unittests/calc_hash.py: Add a
ProcessedDocument.remove_term() method, and a brief test of it.
Tue Feb 23 00:35:35 GMT 2010 Richard Boulton <richard@tartarus.org>
* xappy/searchconnection.py: Fix the backwards compatibility
fallback for setting keymaker parameters to catch the correct
exception type.
Tue Feb 23 00:17:14 GMT 2010 Richard Boulton <richard@tartarus.org>
* xappy/searchconnection.py,xappy/unittests/sort.py: Sort documents
with missing values in the slots to the end.
Mon Feb 22 23:38:17 GMT 2010 Richard Boulton <richard@tartarus.org>
* xappy/mset_search_results.py: Support both the old and new xapian
APIs for getting facet values.
Mon Feb 22 23:14:54 GMT 2010 Richard Boulton <richard@tartarus.org>
* libs/get_xapian.py: Update to latest versions of xapian (now
taken directly from the "xappy" branch in Xapian SVN, rather than
from combining all the feature branches).
Sat Feb 06 11:20:27 GMT 2010 Richard Boulton <richard@tartarus.org>
* utils/make_xappy_tarballs: Update branchpoints.
Sat Feb 06 11:19:18 GMT 2010 Richard Boulton <richard@tartarus.org>
* xappy/query.py: Basic fixes to the Query.get_facet{,s} code. No
tests for this yet, since it's not actually hooked in to running
tests.
Fri Feb 05 09:34:08 GMT 2010 Richard Boulton <richard@tartarus.org>
* xappy/doctests/searchconnection_doctest2.txt,
xappy/mset_search_results.py: Remove use of score_evenness(): use
a much simpler algorithm instead. Testing with real-world data
shows that score_evenness() doesn't return a particularly useful
value anyway, so it will be removed from Xapian soon. We're
likely to remove get_suggested_facets() at some point soonish,
too.
Thu Feb 04 15:24:01 GMT 2010 Richard Boulton <richard@tartarus.org>
* research/expand_prefixes.py: Add a little research script.
Tue Feb 02 16:48:19 GMT 2010 Richard Boulton <richard@tartarus.org>
* utils/make_xappy_tarballs: Update branchpoint.
* xappy/mset_search_results.py: Use UnbiasedNumericRanges instead
of NumericRanges if it's available.
Mon Feb 01 11:19:12 GMT 2010 Richard Boulton <richard@tartarus.org>
* xappy/searchconnection.py: Check for the correct metadata key.
Wed Jan 27 15:20:15 GMT 2010 Richard Boulton <richard@tartarus.org>
* xappy/indexerconnection.py,xappy/searchconnection.py,
xappy/unittests/cached_searches.py: Add ability to copy a cache
into the main xapian index. When this has been done, open the
cache automatically, both in future IndexerConnections and in
SearchConnections. Also, complain if an IndexerConnection which
has had a cache applied doesn't have a cache connected when a
deletion is performed.
Tue Jan 26 14:12:56 GMT 2010 Richard Boulton <richard@tartarus.org>
* xappy/unittests/cached_searches.py: Test asking for facets with a
cached query id, when none are cached.
Tue Jan 26 14:01:01 GMT 2010 Richard Boulton <richard@tartarus.org>
* xappy/doctests/searchconnection_doctest2.txt,
xappy/unittests/facets.py: Fix a couple of facet testcases which
got missed.
Tue Jan 26 13:36:59 GMT 2010 Richard Boulton <richard@tartarus.org>
* xappy/searchconnection.py: Handle case where there are no cached
facets for a query, but facets are requested. Need to add a
test for this.
Tue Jan 26 13:04:49 GMT 2010 Richard Boulton <richard@tartarus.org>
* xappy/cachemanager/generic.py,xappy/mset_search_results.py,
xappy/unittests/cached_searches.py: Revert sort order of facet
values to be by ascending order of key - putting them in
frequency order wasn't particularly helpful, and some external
code might depend on the order. Also makes it easier to merge
facet values from multiple databases.
Tue Jan 26 10:51:36 GMT 2010 Richard Boulton <richard@tartarus.org>
* xappy/cachemanager/generic.py: Add clear_facets method to base
class.
Tue Jan 26 09:43:01 GMT 2010 Richard Boulton <richard@tartarus.org>
* xappy/cache_search_results.py,xappy/cachemanager/generic.py,
xappy/doctests/searchconnection_doctest2.txt,
xappy/mset_search_results.py,xappy/searchconnection.py,
xappy/searchresults.py,xappy/unittests/cached_searches.py,
xappy/unittests/facets.py: Finish implementing support for facets
and stats served from the cache.
Thu Jan 21 16:15:28 GMT 2010 Richard Boulton <richard@tartarus.org>
* xappy/cachemanager/generic.py: Add support for setting facets and
stats individually and accumulating them.
Thu Jan 21 15:32:59 GMT 2010 Richard Boulton <richard@tartarus.org>
* xappy/searchconnection.py: Ensure that queries actually are
connected to the SearchConnection that they're returned from.
Thu Jan 21 15:32:06 GMT 2010 Richard Boulton <richard@tartarus.org>
* xappy/unittests/query_serialise.py: Test that empty queries are
connected when unserialised.
Wed Jan 20 17:01:58 GMT 2010 Richard Boulton <richard@tartarus.org>
* xappy/query.py: Add support for requesting facets to the Query
object: not yet hooked up.
Wed Jan 20 16:56:53 GMT 2010 Richard Boulton <richard@tartarus.org>
* docs/introduction.rst,xappy/unittests/db_type1.py,
xappy/unittests/weight_action.py: Update for new default backend,
and availability of brass backend. (Percentage weights are
slightly different with chert, because it stores more statistics
which allow tighter bounds.)
Mon Jan 18 16:36:14 GMT 2010 Richard Boulton <richard@tartarus.org>
* xappy/indexerconnection.py: Change default type for new databases
to chert, and add brass as an option.
Fri Jan 15 10:56:33 GMT 2010 Richard Boulton <richard@tartarus.org>
* external_posting_source/sortdatabase/sortdatabase.cc: Include
cstdio to fix warning on pickier new compilers.
Fri Jan 15 10:54:26 GMT 2010 Richard Boulton <richard@tartarus.org>
* libs/get_xapian.py,utils/make_xappy_tarballs: Update for new
version of xapian (main difference is that xapian-compact works
with chert in this version).
Mon Jan 04 16:56:05 GMT 2010 Richard Boulton <richard@tartarus.org>
* xappy/searchresults.py: Fix to stop completely ignoring the
results of _reorder_by_clusters().
Wed Dec 30 22:59:51 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/datastructures.py: Add get_terms() method to get the terms
in a given field in a ProcessedDocument.
Wed Dec 30 19:26:52 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/cache_search_results.py: Fix bug in accessing results
directly by index.
Wed Dec 30 19:26:18 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/unittests/cached_searches.py: Check accesses to search
results which didn't start with 0.
Wed Dec 30 15:37:32 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/unittests/testdata/chert_db/iamchert: Update chert format.
Wed Dec 30 15:03:45 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/cachemanager/generic.py,xappy/cachemanager/xapian_manager.py,
xappy/unittests/cachemanager.py: Add a "clear()" method to
CacheManager to delete all the items in the cache.
Wed Dec 30 13:21:14 GMT 2009 Richard Boulton <richard@tartarus.org>
* libs/get_xapian.py: New xapian packages. Includes performance
improvement for replace_document() when few changes have been
made to a document.
Wed Dec 30 12:20:46 GMT 2009 Richard Boulton <richard@tartarus.org>
* utils/make_xappy_tarballs: Update branchpoints.
Tue Dec 15 18:42:13 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/cachemanager/generic.py: Fix bug with facets for search
with cached hits - was always trying to use cached facets, too.
Tue Dec 15 18:40:42 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/unittests/cached_searches.py: Add tests of getting facets
from cached searches.
Tue Dec 15 16:42:10 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/cache_search_results.py: Remove old parameter from
get_suggested_facets().
Tue Dec 15 16:09:04 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/cache_search_results.py,xappy/mset_search_results.py,
xappy/searchconnection.py: Remove CacheResultStats and
MSetResultStats - replace them with a generic ResultStats which
takes both the mset and the cached value, and returns the cached
values if not None, and reads the mset otherwise. Adjust the
search code to perform the search if any of the cached values are
None.
* xappy/unittests/cached_searches.py: Minimal test for stats return
values.
Tue Dec 15 16:08:23 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/cachemanager/generic.py: Revert previous fix to stats
storage, but also don't store anything if all stats are None.
Tue Dec 15 14:14:27 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/cachemanager/generic.py: Return None if no stats, instead
of (None, None, None).
Tue Dec 15 14:12:03 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/searchconnection.py: Update a couple of comments.
Tue Dec 15 09:56:56 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/mset_search_results.py,xappy/searchresults.py,
xappy/unittests/facets.py: Add SearchResults.get_facets() which
returns all the facets. Refactor the facet calculation to store
results in a more useful form for returning this.
Tue Dec 15 08:55:49 GMT 2009 Richard Boulton <richard@tartarus.org>
* setup.py,xappy/__init__.py: Bump version to 0.6.0 - release won't
be for a little while yet, but this allows me to test the version
for backward compatibility.
Tue Dec 15 08:43:24 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/mset_search_results.py,xappy/searchconnection.py,
xappy/searchresults.py: Refactor facet calculation. Scoring now
happens immediately, rather than when get_suggested_facets() is
called. This requires adding a "facet_desired_num_of_categories"
parameter to SearchConnection.search(). I'm likely to remove
get_suggested_facets() entirely soon, since experience has shown
that the facet suggestion algorithm it uses isn't of much
practical use: it's probably better to just return all the valid
facets for now, and allow higher layers to choose them, until we
know a better algorithm.
* docs/introduction.rst,xappy/doctests/searchconnection_doctest2.txt:
Update to expect range facets values in a tuple rather than a
list.
Mon Dec 14 18:44:00 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/mset_search_results.py,xappy/searchconnection.py: Pass the
field types through to the facet results directly, instead of the
kwargs from the action.
Mon Dec 14 13:58:19 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/unittests/query_id.py: Test query_id for correct
serialisation.
Mon Dec 14 13:54:51 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/searchconnection.py: Fix argument serialisation for query
constructors which have no default arguments: fixes
evalable_repr() for queries created with query_id().
Mon Dec 14 13:47:26 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/cache_search_results.py,xappy/searchconnection.py: Tie in
facets and stats from the cache.
Mon Dec 14 13:45:31 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/cachemanager/generic.py: Implement get_facets() properly.
Mon Dec 14 11:41:35 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/cachemanager/generic.py: Add storage of statistics and
facets to the cache.
Thu Dec 10 11:34:07 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/highlight.py: Use thread local storage correctly - make the
"stemmers" dict on demand in each thread.
Thu Dec 10 11:28:01 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/highlight.py: Cache the stemmers, and the results of the
stemmers, in thread local storage, to allow the cache to be
reused across uses of the Highlighter. Based on a patch from
Bruno Rezende.
Wed Dec 09 13:09:35 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/searchconnection.py: Add stats_checkatleast, to allow the
value of checkatleast when calculating stats to be set.
Wed Dec 09 12:05:31 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/cache_search_results.py,xappy/mset_search_results.py,
xappy/searchresults.py: Get the start and end rank from the
ordering instead of the stats; makes a lot more sense this way.
Wed Dec 09 03:24:52 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/utils.py: Add missing import of "math".
Wed Dec 09 03:20:51 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/searchconnection.py,xappy/searchresults.py,
xappy/unittests/cached_searches.py: Refactor searchresults to use
the cache automatically if the query is appropriate.
Wed Dec 09 03:19:39 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/cache_search_results.py: Add a preliminary set of classes
for representing the results of reading the cache in a form
suitable for use in search results.
Wed Dec 09 03:18:39 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/query.py: Add support for storing the queryid for a cached
query in the Query object.
Wed Dec 09 01:04:03 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/doctests/searchconnection_doctest2.txt: Be less susceptible
to garbage collection timing.
Tue Dec 08 12:29:52 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/doctests/searchconnection_doctest2.txt,
xappy/mset_search_results.py,xappy/searchconnection.py,
xappy/searchresults.py: Pull out the mset-specific code for
search results into mset_search_results.py
Tue Dec 08 11:49:10 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/searchconnection.py,xappy/searchresults.py: Factor out
facet information to a MSetFacetResults object.
Tue Dec 08 11:17:20 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/searchconnection.py,xappy/searchresults.py: Refactor
getting result stats into a separate object.
Tue Dec 08 10:54:27 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/searchconnection.py,xappy/searchresults.py,xappy/utils.py:
Move add_to_dict_of_dicts() into utils.py. Add
SearchResultContext, used to bundle up all the resources that a
SearchResult object might need.
Tue Dec 08 09:25:54 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/: Split get_significant_digits into separate utility
function, in new "utils.py" file. Refactor SearchResults
reordering stuff to use a MSetResultOrdering or
ReorderedMSetResultOrdering object to control get_hits and iter:
removes the _mset_order member from SearchResults. Working
towards removing the _mset member, too.
Tue Dec 08 00:06:31 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/: Backport licenses from the xappy2 repository at
http://code.google.com/p/xappy2/. Also, some minor tidying up of
inline calls to doctest - these are no longer required since the
top level test script can be used instead. Also, fix
documentation comment for query.py which had been cut-and-pasted
from searchconnection.py.
Mon Dec 07 20:18:51 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/cachemanager/: Split the inverters into separate files, and
fallback to inmemory if numpy is not available. Various other
code tidyups.
Mon Dec 07 14:15:03 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/searchconnection.py,xappy/searchresults.py: Split the
SearchResults and SearchResult classes out of
searchconnection.py, as a first step to making the code a bit
more readable. Also, minimise the number of imported symbols,
get rid of the "*" in "from foo import *", and use the proper
names of various imports rather than prefixing them with an
underscore.
Sat Dec 05 23:50:15 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/doctests/indexerconnection_doctest1.txt: Be less fussy
about required text in exception message (it's changed with
latest Xapian).
Sat Dec 05 23:37:58 GMT 2009 Richard Boulton <richard@tartarus.org>
* libs/get_xapian.py,utils/make_xappy_tarballs: Link to packages
built with new version of xapian.
Wed Dec 02 01:16:00 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/cachemanager/generic.py,
xappy/cachemanager/xapian_manager.py: Use the hash of query
strings, rather than the raw strings, to avoid problems with key
length limits in xapian. Store the raw string additionally, so
we can report a cache miss in the event of a collision, and so we
can iterate through the stored query strings.
Tue Dec 01 23:54:39 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/query.py,xappy/unittests/query_serialise.py: Improve
serialisation functions: flatten nested combinations into lists,
and improve the serialised representations of these too. Unify
representation of queries joined with .compose(OP_AND) and those
joined with &, and similarly for OP_OR and |.
Tue Dec 01 23:53:22 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/searchconnection.py: Add 'Query' to the unserialisation
context. Fix an incorrect return type if a field was specified
for geolocation sorting which hadn't been indexed appropriately.
Tue Dec 01 12:42:35 GMT 2009 Richard Boulton <richard@tartarus.org>
* MANIFEST.in,setup.py: Add xappy.cachemanager to distribution.
Tue Dec 01 11:47:46 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/query.py: Improve the serialisation of the result of
norm().
Mon Nov 30 12:10:19 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/searchconnection.py: Check that only one of docid and xapid
is set in get_document() before entering the while loop. This
fixes a bug where the xapid got set later in the loop, so when a
DatabaseModifiedException error happened, the xapid was set the
next time around, triggering the error.
Thu Nov 26 17:29:47 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/query.py,xappy/unittests/cached_searches.py: Add
Query.merge_with_cached() to make applying a cached query easier.
Add a test for this.
Thu Nov 26 12:58:38 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/query.py,xappy/unittests/weight_action.py: Allow norm() to
take a maxweight parameter. Update tests to check this works.
Thu Nov 26 12:54:33 GMT 2009 Richard Boulton <richard@tartarus.org>
* utils/make_xappy_tarballs: Update revision numbers.
Thu Nov 26 01:32:29 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/cachemanager/generic.py,xappy/cachemanager/xapian_manager.py,
xappy/unittests/cachemanager.py: Add a new iter_query_strs()
method.
Thu Nov 26 00:17:57 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/cachemanager/generic.py,xappy/cachemanager/queryinvert.py:
Fully integrate the numpy-based inversion - caching its results
for faster application to many subdatabases, and factoring it out
as a MixIn.
Wed Nov 25 10:59:19 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/cachemanager/__init__.py,xappy/cachemanager/generic.py,
xappy/cachemanager/xapian_manager.py: Release these files under
the MIT/X license.
Wed Nov 25 10:52:12 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/cachemanager/queryinvert.py: Some fairly efficient code to
invert lists of docids for queries, thanks to Shane for the
original implementation.
Tue Nov 24 11:23:02 GMT 2009 Richard Boulton <richard@tartarus.org>
* libs/get_xapian.py,utils/make_xappy_tarballs: Update for new
version of Xapian; main difference is a considerable improvement
in speed with the chert backend. WARNING: there appears to be a
compiler bug in GCC 4.2 which can cause searches using chert to
fail. If using GCC, I recommend using GCC 4.3 or later to avoid
this bug.
Tue Nov 24 10:16:31 GMT 2009 Richard Boulton <richard@tartarus.org>
* xappy/indexerconnection.py: Ignore missing docids when applying
the cached items to a database - this is likely to be deliberate.
Fri Nov 20 11:16:27 GMT 2009 Richard Boulton <richard@tartarus.org>
* libs/get_xapian.py: New version of xapian - should have improved
performance for chert when values are in use.
Thu Nov 19 15:41:50 GMT 2009 Richard Boulton <richard@tartarus.org>
* libs/get_xapian.py,utils/make_xappy_tarballs,
xappy/searchconnection.py: Link to new xapian packages. These
packages have a slightly different API for doing sorting, so
update searchconnection.py accordingly: this does mean that the
latest version of xapian is required to perform custom sorting,
so xapian and xappy need to be upgraded together here.
Mon Nov 09 04:45:55 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* docs/cachedresults.rst: Tweak final phrase.
Mon Nov 09 04:35:22 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* docs/cachedresults.rst: Documentation, and lots of analysis.
Mon Nov 09 04:09:28 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/perftest/cachemanager.py: Improve log messages marginally.
Mon Nov 09 03:05:36 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/perftest/cachemanager.py: Performance test which includes
search tests, and is more careful with some of the other tests.
Mon Nov 09 03:05:05 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/indexerconnection.py: Turn off debugging print.
Mon Nov 09 03:04:30 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/cachemanager/generic.py: Turn off debugging print.
Mon Nov 09 01:58:19 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/perftest/harness.py: Fix some bugs in the timers.
Sun Nov 08 11:27:35 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/cached_searches.py: Update test for new search
interface.
Sun Nov 08 11:21:28 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Implement query_cached() to return a
query which returns the cached items for a given queryid, instead
of the additional cached_query_str() parameter to search() - much
more flexible.
Sun Nov 08 11:20:27 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/cachemanager/__init__.py,xappy/cachemanager/xapian_manager.py:
Add CacheManager implementation which uses a secondary Xapian
database to perform the inversion.
Sun Nov 08 11:17:11 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/perftest/cachemanager.py,xappy/perftest/harness.py: Add
performance test harness, and a test for the cachemanager
performance.
Sun Nov 08 11:09:09 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/indexerconnection.py: Update to pass the docids as well as
the rank estimates to the cache.
Sun Nov 08 11:07:14 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/cachemanager/generic.py: Pull out the preparations for
iter_by_docid to a separate method, and fix remove_hits to allow
the supplied ranks to be mis-estimates.
Sun Nov 08 11:06:15 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/cachemanager.py: Extend test to check what
happens when the supplied ranks are over estimates.
Sat Nov 07 21:37:52 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/cachemanager.py: Allow iter_by_docid() to return
the query ids as something other than a list.
Mon Nov 02 17:53:59 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/cachemanager/generic.py,xappy/cachemanager/xapian_manager.py,
xappy/unittests/cachemanager.py: Modify the
KeyValueStoreCacheManager to be a UserDict.DictMixin subclass,
and use the standard __getitem__, etc, accessors. It doesn't
quite match the API contract of DictMixin because all keys are
always present as far as getitem and delitem are concerned: their
value is '' if they haven't already been assigned to, and
deleting just sets the value to ''. However, it's close enough
that this is useful for debugging purposes.
Mon Nov 02 17:15:08 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/cachemanager.py,xappy/cachemanager/__init__.py,
xappy/cachemanager/generic.py,xappy/cachemanager/xapian_manager.py:
Refactor cachemanager into a submodule, and separate the
xapian-based cache into a separate file, so we can do a
conditional import of it, handling the case of Xapian not being
installed more cleanly.
Mon Nov 02 03:28:35 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Pass the xappy.Query to the
SearchResults object, not the xapian Query.
Mon Nov 02 03:14:55 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/cachemanager.py,xappy/indexerconnection.py,
xappy/searchconnection.py: Initial implementation of pre-cached
searches. Uses the value slots to hold information about the
ranks assigned by the pre-cached searches, which is relatively
inefficient on flint.
* xappy/unittests/cachemanager.py: Test of the cachemanager
implementation.
* xappy/unittests/cached_searches.py: Test that searches using the
cachemanager work correctly.
Sun Nov 01 21:10:10 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/indexerconnection.py,xappy/unittests/docids.py: Allow a
Xapian document ID to be specified to the add(), replace(), and
delete() methods.
Sun Nov 01 21:07:32 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* test.py: Skip test modules which depend on missing modules,
displaying an appropriate message.
Sun Nov 01 21:01:54 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: Update branchpoint revision numbers.
Mon Sep 07 08:13:30 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Fix wildcard support; was omitting
a "self" when looking for the wildcard flags.
Fri Sep 04 14:04:00 GMT 2009 Charlie Hull <charlie@flax.co.uk>
* libs/build_xapian_win32.bat: new file to assist building under
Win32.
Thu Sep 03 12:05:34 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Fix
SearchConnection._get_freetext_fields - not sure we want to keep
this (it's not public, or called from anywhere), but might as
well make it work while we decide.
Tue Sep 01 13:53:06 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: Updated packages with fixed windows build
system (hopefully).
Wed Aug 26 08:42:59 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: New packages with fixes for windows.
Mon Aug 24 16:02:34 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: Update branchpoint for windows
compilable tarballs.
* xappy/fieldactions.py: Fix for document.fields being an iterator.
Mon Aug 24 12:16:23 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py,utils/make_xappy_tarballs: Updated tarballs -
should build for windows.
Fri Aug 21 12:17:29 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/colour.py: Add iter_buckets_rgb() to iterate through all
the buckets. This can be used to generate a target palette for
quantization.
Fri Aug 21 09:00:41 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/colour.py: Some more utility functions for colour
conversion, to let us get back from a bucket or from lab coords
to an rgb value for display.
Fri Aug 21 07:46:32 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: Update to get new packages.
Fri Aug 21 06:47:58 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/colour.py: import xapian-dependent stuff lazily, to make it
easier to use this module standalone.
Fri Aug 21 06:44:53 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* README: Note that scipy 0.7 or later is needed.
* xappy/colour.py,xappy/unittests/colour.py: Fix off-by-one
error for buckets at the top end of the range.
Wed Aug 19 11:06:35 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Apply untested patch for wildcards.
Mon Aug 17 23:44:38 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* AUTHORS: Update license of majority of code to MIT (but note that
it's still effectively GPL for now).
Mon Aug 17 18:13:02 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs,xappy/_checkxapian.py,
xappy/searchconnection.py,xappy/unittests/general1.py: Update for
new facet support interface in xapian (now in trunk).
Fri Aug 14 09:58:37 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: Update version numbers of
branchpoints.
Wed Aug 05 05:50:57 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* README: Mention where to get colormath from.
Wed Aug 05 05:40:39 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/colour_data.py: Fix not to use defaultdict.
Wed Aug 05 05:40:17 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/fieldactions.py: Fix to work with FieldGroups again.
Wed Aug 05 01:29:36 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/colour.py: Uncomment needed imports (which I'm not
currently sure how to satisfy).
Wed Aug 05 01:28:51 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/colour.py: Various updates and fixes.
Wed Aug 05 00:18:25 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* AUTHORS,docs/image.rst,xappy/colour.py,xappy/fieldactions.py,
xappy/searchconnection.py,xappy/unittests/colour.py: Merge in
changes from colour branch.
Tue Aug 04 21:39:23 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: Update revision numbers, and add
coloursim branch.
Tue Jul 28 17:16:32 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: remove postingsources branch - now
merged to trunk.
Sun Jul 19 15:46:43 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/: Remove replaylog functionality - it never fully worked,
and it's better to either use python's trace functions, or a
debug logging build of xapian.
Sun Jul 19 15:37:19 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* docs/introduction.rst,utils/make_xappy_tarballs,
xappy/_checkxapian.py,xappy/doctests/searchconnection_doctest2.txt,
xappy/doctests/searchconnection_doctest3.txt,xappy/fieldactions.py,
xappy/searchconnection.py,xappy/unittests/: Remove TAG feature -
it can be implemented (and more efficiently) using FACET.
Sun Jul 19 10:23:05 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/indexerconnection.py,xappy/searchconnection.py: Apply
trimming of the terms in PrefixedTermIterator after removing the
prefix, instead of before. Allows the special handling for
colons to work correctly.
Fri Jun 19 11:14:51 BST 2009 Tom Mortimer <tom@flax.co.uk>
* xappy/indexerconnection.py: Allow PrefixedTermIter to take an
explicit trim length (for removing prefix).
* xappy/searchconnection.py: Added optional starts_with parameter
to iter_terms_for_field().
Tue May 26 07:54:48 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Don't recalculate all the potentials
each time - just update the one which has changed. About 50%
faster in my tests.
Mon May 25 00:39:18 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/indexerconnection.py,xappy/searchconnection.py: Display
docid in error messages.
Sat May 23 08:29:08 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Estimate the initial utilities only on
the basis of the top 100 results, so that paging through the
results further doesn't result in an unstable ordering.
Fri May 22 13:16:30 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py,xappy/unittests/diversity.py: Add
support for "collapse_max" parameter to search, to allow more
than 1 hit to be returned in each collapse category. Also, add a
new "reorder_by_collapse" to reorder the results to achieve
maximum diversity based on the hits returned.
Thu May 21 06:39:33 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* libs/build_xapian.sh,libs/get_xapian.py: Update xapian packages.
Thu May 14 19:18:52 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/datastructures.py: Fallback to importing sha instead of
hashlib, for python 2.4 compatibility.
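A minimal sketch of this kind of import fallback. The hex_digest()
helper is a hypothetical name used only for illustration, not part of
xappy:

```python
try:
    # hashlib is available from Python 2.5 onwards.
    from hashlib import sha1 as _sha1
except ImportError:
    # Python 2.4 only ships the older "sha" module.
    from sha import new as _sha1


def hex_digest(data):
    """Return the SHA-1 hex digest of a byte string (illustrative helper)."""
    return _sha1(data).hexdigest()
```

Callers use hex_digest() without needing to know which module supplied
the implementation.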
Mon May 11 10:49:56 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py,utils/make_xappy_tarballs: New branchpoint and
new tarballs.
Fri May 08 08:37:09 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: New xapian tarballs.
Fri May 08 08:22:47 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: More updates to branchpoints.
Thu May 07 23:14:23 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: Update branchpoints, and remove the
old opsynonym branch, which has now been merged to trunk.
Wed May 06 11:20:13 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/query_all.py: Add test of new weight parameter
for query all.
Wed May 06 11:11:48 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Fix serialisation of query_difference
queries, and add an optional "weight" parameter to query_all().
Mon Apr 20 12:54:07 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/fieldactions.py,xappy/unittests/field_associations.py: Add
"link_associations" parameter to STORE_CONTENT. Defaults to
True, but if False this stops associated field data being stored
as an extra item in the document contents. Use this if you don't
want to use relevant_data() with simple=False.
Wed Apr 15 13:06:41 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/highlight.py: Remove unnecessary use of enumerate - thanks
to Shane Evans for spotting it.
Wed Apr 15 10:12:35 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Improve calculation of range terms to
use a tighter bound when it can.
Wed Apr 15 10:02:00 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/highlight.py: Add internal function to highlighter to count
the number of words.
Tue Apr 07 10:13:59 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Use ranges from FACET action in
query_range(), if they're not available from the
SORT_AND_COLLAPSE action.
Tue Apr 07 10:02:31 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/range_accel.py,xappy/unittests/range_speed.py:
Update to test the new range acceleration implementation.
Tue Apr 07 10:00:47 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Improvements to the range
acceleration; use a superset of terms covering the range if
possible, and AND this with the range restriction. Also, if the
terms cover the range exactly, don't bother with the VALUE_RANGE
part.
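The decision described above might be sketched as follows. This is not
the xappy implementation: plan_range_query() is a hypothetical name,
and the sketch assumes the precomputed ranges are sorted, contiguous
and half-open, which the entry does not state:

```python
def plan_range_query(buckets, begin, end):
    """Pick precomputed range terms for the half-open query range [begin, end).

    buckets: sorted, contiguous (low, high) half-open ranges, each assumed
    to be indexed as a single term.
    Returns (selected_buckets, needs_value_range).
    """
    # Keep every bucket which overlaps the requested range.
    overlapping = [(lo, hi) for lo, hi in buckets if lo < end and hi > begin]

    covered = (bool(overlapping)
               and overlapping[0][0] <= begin
               and overlapping[-1][1] >= end)
    if not covered:
        # The terms cannot cover the range: rely on the value range alone.
        return [], True

    # If the terms match the range exactly, no second-pass check is needed.
    exact = overlapping[0][0] == begin and overlapping[-1][1] == end
    return overlapping, not exact
```

When needs_value_range is true and buckets were selected, the terms act
as a cheap superset filter and the value range check removes the extra
documents at the edges.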
Thu Apr 02 17:54:23 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/highlight.py: Add a simple cache for stemming in
highlighters - improves performance noticeably in some cases (I
measured about a 20% improvement in highlighting speed for a
database in which the documents are product descriptions,
compared to a 50% improvement if I turn stemming in highlighting
off completely).
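A rough illustration of a stemming cache of this sort. CachingStemmer
and toy_stem are hypothetical names; toy_stem merely stands in for a
real stemmer so that the sketch is self-contained:

```python
class CachingStemmer(object):
    """Wrap a stemming function with a simple dictionary cache."""

    def __init__(self, stem_func):
        self._stem = stem_func
        self._cache = {}
        self.misses = 0  # number of times the real stemmer was called

    def __call__(self, word):
        try:
            return self._cache[word]
        except KeyError:
            self.misses += 1
            result = self._cache[word] = self._stem(word)
            return result


def toy_stem(word):
    """Stand-in stemmer: lowercase the word and strip one trailing 's'."""
    word = word.lower()
    if len(word) > 1 and word.endswith("s"):
        return word[:-1]
    return word


stemmer = CachingStemmer(toy_stem)
stems = [stemmer(w) for w in "cats chase cats and dogs chase cats".split()]
```

Words repeated within or across fields are only stemmed once, which is
where the saving comes from.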
Thu Apr 02 17:25:33 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/highlight.py: Avoid parsing the query into terms repeatedly
when highlight is called for multiple fields with the same query.
Thu Apr 02 15:47:31 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Implement the allow and deny
parameters for relevant_data().
Wed Apr 01 11:07:25 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py,utils/make_xappy_tarballs: Update with new
branchpoints and tarballs.
Fri Mar 27 13:41:17 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: Update xapian-extras packages.
Fri Mar 27 12:49:36 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/doctests/indexerconnection_doctest2.txt: Explicitly access
the hit data when trying to trigger DatabaseModifiedError - data
is now read lazily, so the test was failing with the new
tarballs.
Fri Mar 27 12:37:10 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Update reset() to init() in posting
source, and handle missing field action in _get_approx_params.
Fri Mar 27 12:31:26 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Add "assume_single_value" parameter to
_cluster(), to let it know that all fields in use are
single-valued. Also, add support for doing range acceleration
searches using terms and values where the terms cover the range,
and the values are used to do a second-pass restriction.
Fri Mar 27 10:06:16 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: Update with new tarballs.
Fri Mar 27 10:04:59 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* libs/build_xapian.sh: Set program-suffix to empty when building:
possibly we should use a special xappy suffix by default, but
we're installing locally in this script, anyway.
Fri Mar 27 09:29:29 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Add support for new clustering method
for single-valued single fields, reading from a value slot; use
it automatically if index strategy makes this possible.
Fri Mar 27 09:26:34 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/cluster.py: Add test for clustering.
Fri Mar 27 08:29:08 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: Stop chasing a moving target - set the
trunk_revision to build tarballs for explicitly.
Tue Mar 24 14:48:12 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Catch DatabaseModifiedError in
_load_config() call to get_metadata(), too.
Tue Mar 24 14:44:07 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Catch DatabaseModifiedError in
get_metadata(), and reopen and retry.
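The reopen-and-retry pattern can be sketched roughly as below. The
exception class and FakeConnection are stand-ins so that the example
runs without xapian installed; retry_on_modified() and the retry limit
are assumptions, not the code in searchconnection.py:

```python
class DatabaseModifiedError(Exception):
    """Stand-in for xapian.DatabaseModifiedError."""


def retry_on_modified(conn, operation, max_retries=3):
    """Run operation(), reopening conn and retrying if the database changed."""
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except DatabaseModifiedError:
            if attempt == max_retries:
                raise  # give up after the final attempt
            conn.reopen()


class FakeConnection(object):
    """Hypothetical connection which fails a fixed number of times."""

    def __init__(self, failures):
        self.failures = failures
        self.reopens = 0

    def reopen(self):
        self.reopens += 1

    def get_metadata(self, key):
        if self.failures > 0:
            self.failures -= 1
            raise DatabaseModifiedError("database changed")
        return "value-for-" + key


conn = FakeConnection(failures=2)
value = retry_on_modified(conn, lambda: conn.get_metadata("k"))
```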
Wed Mar 11 15:28:38 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* docs/introduction.rst,xappy/unittests/general1.py,
xappy/unittests/testdata/chert_db/iamchert,
xappy/unittests/weight_action.py: Update expected weights in
remaining tests, and the format of the sample chert database, to
work with new tarballs. Allows removal of a hacky "2.0" in the
weight_action test.
Wed Mar 11 15:20:44 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* docs/queries.rst: Update with value of
get_max_possible_weight() returned by new xapian tarballs; these
provide tighter bounds.
Wed Mar 11 15:20:12 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Avoid using deprecated form of
set_sort_by_key_then_relevance.
Wed Mar 11 13:52:19 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: Update with new packages (core and bindings,
old extras packages should be compatible).
Wed Mar 11 11:37:00 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: Update branchpoint numbers.
Fri Mar 06 05:48:28 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* libs/build_xapian.sh,libs/get_xapian.py: Update with new
tarballs, and add script to build and install (locally) all the
tarballs.
Thu Mar 05 16:20:01 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/distance.py: Fix completely broken distance test.
Thu Mar 05 16:10:02 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/doctests/searchconnection_doctest2.txt,
xappy/unittests/general1.py: Update for new tarballs, and move
some tests from doctest to unittest so they can be handled more
neatly.
Thu Mar 05 16:09:44 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Update for new xappy tarballs.
Thu Mar 05 10:43:36 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: More updates to base revisions.
Thu Mar 05 10:26:02 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: Fix base revision for postingsources.
Thu Mar 05 08:29:02 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: Add postingsources branch.
Wed Mar 04 16:35:16 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/imgseek.py: Add buckets parameter, and some
useful commented out debug prints.
Wed Mar 04 16:34:22 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/fieldactions.py: add "buckets" parameter to IMGSEEK action,
to allow the number of buckets to be controlled.
Wed Mar 04 14:59:16 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/imgseek.py: Add (commented out) debugging prints.
Tue Mar 03 17:10:16 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/fieldactions.py,
xappy/indexerconnection.py,xappy/searchconnection.py,
xappy/unittests/imgseek.py: Add an alternate, approximate but
much faster, method of doing the image similarity searches.
* perftest/imgidx.py,perftest/imgsearch.py: Add some simple
performance tests for image searches.
Tue Mar 03 15:48:30 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* docs/image.rst: Documentation for the imgseek feature.
Mon Mar 02 13:48:01 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: Update branchpoints, and remove
valuemapsource branch which has been merged to trunk.
Fri Feb 27 14:23:26 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/_checkxapian.py,xappy/fieldactions.py,
xappy/searchconnection.py,xappy/unittests/imgseek.py: Add new
IMGSEEK field action, and corresponding query_image_similarity()
method. Also add test case. All requires a new set of xapian
packages, which are not yet available other than from SVN - this
feature is still under heavy development.
Fri Feb 27 13:05:03 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/testdata/sampleimages/candle.jpg,
xappy/unittests/testdata/sampleimages/cat.jpg,
xappy/unittests/testdata/sampleimages/looroll.jpg: Add some
sample data for testing image search.
Fri Feb 20 10:39:36 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/datastructures.py,xappy/unittests/calc_hash.py: Add
calc_hash() method to ProcessedDocument, returning a hash to use
to avoid reindexing documents which have not changed. Patch from
Pablo.
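One way a content hash could be used to skip unchanged documents is
sketched here. The entry does not say what calc_hash() covers, so
content_hash(), needs_reindex() and the choice of hashing (name, value)
pairs with SHA-1 are all assumptions:

```python
import hashlib


def content_hash(fields):
    """Hash a sequence of (name, value) text pairs, preserving their order."""
    h = hashlib.sha1()
    for name, value in fields:
        for part in (name, value):
            data = part.encode("utf-8")
            # Length-prefix each part so ("ab", "c") differs from ("a", "bc").
            h.update(("%d:" % len(data)).encode("ascii"))
            h.update(data)
    return h.hexdigest()


def needs_reindex(seen_hashes, doc_id, fields):
    """Return True (and remember the new hash) if the document has changed."""
    new_hash = content_hash(fields)
    if seen_hashes.get(doc_id) == new_hash:
        return False
    seen_hashes[doc_id] = new_hash
    return True
```

An indexing loop would call needs_reindex() for each incoming document
and only process those for which it returns True.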
Fri Feb 20 10:21:03 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/fieldactions.py,xappy/indexerconnection.py,
xappy/unittests/store_only.py: Patch from Pablo, with minor
tweaks from me, to add a store_only parameter to the methods
which process documents which only runs the STORE_CONTENT
actions.
Wed Feb 18 04:55:16 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py,xappy/unittests/field_groups.py: Fix to
previous patch (thanks to Pablo, again).
Tue Feb 17 19:21:53 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py,xappy/unittests/field_groups.py:
Another patch from Pablo: this one replaces the old, not very
handy, "group" parameter to relevant_data() with a
"groupnumbers" parameter, which returns the group number for each
bit of returned data.
Tue Feb 17 19:16:00 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/doctests/indexerconnection_doctest1.txt: Update doctest to
work with previous patch.
Tue Feb 17 19:13:25 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/errors.py,xappy/indexerconnection.py,
xappy/unittests/indexer_errors.py: Another patch from Pablo
Hoffman, this time to add a specific error type for reporting
that an ID has already been used.
Tue Feb 17 15:31:39 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/datastructures.py,xappy/unittests/field_groups.py: Patch
from Pablo Hoffman to return a dictionary of the groups, keyed by
group number.
Fri Feb 13 23:17:42 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* AUTHORS,xappy/datastructures.py,xappy/searchconnection.py,
xappy/unittests/field_groups.py: Patch and unittest for bug with
field groups which contain duplicated data. Thanks to Pablo
Hoffman for the fix and test.
Fri Feb 13 10:31:04 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/distance.py: Extend unittest to cover
max_range parameter.
Thu Feb 12 09:37:57 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* external_posting_source/sortdatabase/README: Fix typo.
Wed Feb 11 23:08:17 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: Update version of packages to get.
Wed Feb 11 22:35:21 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* external_posting_source/sortdatabase/README: Document the
"reversed" parameter to make_order.py
Wed Feb 11 21:23:55 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* external_posting_source/sortdatabase/README: Add instructions for
sortdatabase.
Wed Feb 11 18:59:17 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* external_posting_source/sortdatabase/Makefile,
external_posting_source/sortdatabase/make_order.py,
external_posting_source/sortdatabase/sortdatabase.cc: Much faster
sortdatabase code. Reads documents in order, in big batches,
writes the documents to batch files, then reads each batch file,
sorts it, and writes the documents in it.
Fri Feb 06 11:55:51 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* external_posting_source/sortdatabase/sortdatabase.cc: Separate
reading the documents from writing them, in the hope of a
performance boost. Doesn't seem to help much, but committing the
code in case I'm missing something.
Tue Jan 20 10:18:28 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/field_associations.py: Add test for relevant_data
when allow_field_specific is False.
Mon Jan 19 17:45:02 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* external_posting_source/sortdatabase/Makefile,
external_posting_source/sortdatabase/sortdatabase.cc,
external_posting_source/sortdatabase/test_sortdatabase.py: Add
sortdatabase command, to sort a database by an external list of
orders.
Mon Jan 19 14:18:37 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: Update versions of xapian packages.
Mon Jan 19 14:17:02 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* coverage.py: Update coverage version.
Tue Jan 13 17:10:47 GMT 2009 Tom Mortimer <tom@lemurconsulting.com>
* xappy/searchconnection.py: Switched to using SORTABLE action for
query_valuemap, avoiding format problems of FACET.
* xappy/tests/valuemapsource_1: Added test cases for simple queries
with and without default weights
Tue Jan 13 14:58:00 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: Don't forget to build SWIG before
using it!
Tue Jan 13 14:39:35 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: Replace accidentally removed line.
Tue Jan 13 14:35:39 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: Use $HOME/src/xappy if $HOME/xappy
doesn't exist, to please Tom. Include the branchpoint revision
numbers in patches to avoid confusion when those change. Make
xappylibdir if it doesn't already exist.
Tue Jan 13 13:20:23 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: Update package build script to use the
last-changed revision number on each branch for the branch diffs
file, to avoid having to refetch all of them on any commit
anywhere in the xapian tree.
Tue Jan 13 13:03:38 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Remove trailing whitespace, and tidy
some documentation comment formatting. Validate the weights
passed to query_valuemap(), and return a xappy.Query() object
instead of a xapian.Query() object from it, to keep a reference
to the postingsource alive, and also to enable query
serialisation and combination syntax.
Tue Jan 13 12:55:58 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: In _get_eterms with a non-indexed
document, ensure that the document ID is set back to its original
value even if an exception happens in doc.prepare().
Mon Jan 12 15:08:09 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: Include swig in the export. Remove
search-xapian and xapian-applications from the export, since
they're not needed for xappy.
Mon Jan 12 14:55:46 GMT 2009 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: Updated script, which doesn't require
lots of pre-existing checkouts.
Fri Dec 26 15:48:27 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/indexerconnection.py,xappy/searchconnection.py: Update to
use the close() method on xapian databases, when this is finally
implemented (it's just been added to xapian trunk, but not yet
made its way into packages).
Tue Dec 23 00:06:15 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/query_id.py,xappy/unittests/query_id_test.py:
Rename to query_id.py, and test cases where only a single ID is
supplied.
Tue Dec 23 00:04:15 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Add query_id() function to search for
documents with a particular ID.
* xappy/unittests/query_id_test.py: Add test of query_id().
Mon Dec 22 20:06:09 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/datastructures.py: Add ProcessedDocument.get_distance() to
get the distance from a document to a point, or to another
document.
* xappy/searchconnection.py: Add SearchConnection.calc_distance()
to get the distance between two points. This is a static method,
because there's nothing particular to a SearchConnection used in
it, but it seems like the most natural place for it to go at
present.
Mon Dec 22 19:27:12 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/field_associations.py: Add test that
grouped_data[0] is the same as data, when there are no groups.
Mon Dec 22 19:06:31 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/datastructures.py,xappy/searchconnection.py,
xappy/unittests/field_groups.py: Move grouped_data into
ProcessedDocument, and change it into a property, to match data.
Mon Dec 22 18:50:51 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Add doccomment for query_distance().
Mon Dec 22 17:43:35 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/fieldactions.py,xappy/searchconnection.py,
xappy/unittests/field_groups.py: Add SearchResult.grouped_data()
method, to return data grouped by FieldGroup.
Mon Dec 22 15:20:13 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: Use new xapian packages, which include
geospatial stuff.
* docs/introduction.rst,xappy/doctests/searchconnection_doctest2.txt:
Fix various tests to cover fixes in the similarity search
algorithms in xapian core - various terms which didn't really
contribute to the similarity are no longer produced by it.
* xappy/searchconnection.py: Change sort-by-distance to sort in
increasing distance order, rather than decreasing, and keep
references to the bits used by the DistancePostingSource.
* xappy/unittests/distance.py: Enable test of query_distance().
Mon Dec 22 14:16:07 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: Update branchpoints, and add
geospatial branch.
Mon Dec 22 14:14:44 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/fieldactions.py: Note a potential performance problem.
Sat Dec 06 18:28:43 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Add "group" option to relevant data,
which causes all data in a group which is partly relevant to be
returned.
* xappy/unittests/field_associations.py: Test new syntax for
building up documents.
* xappy/unittests/field_groups.py: Test group option for
relevant_data().
Sat Dec 06 18:01:20 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/datastructures.py: Allow a FieldGroup, or a set of
parameters for making a FieldGroup, to be passed to extend().
Sat Dec 06 18:00:44 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/fields.py: Correct a typo in a comment.
Fri Dec 05 11:39:41 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* setup.py: Bump version to 0.5.1, since it's significantly changed
since 0.5, and we should probably make a release soon.
* xappy/: Add GEOLOCATION action, for storing latitude-longitude
coordinates, and allowing ranking based on distance. Add sort by
distance method.
* xappy/unittests/difference.py,xappy/unittests/distance.py:
Move distance.py to difference.py, and add new tests for
geolocation, field groups, and convenience methods on
UnprocessedDocument.
Fri Nov 21 10:09:32 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/field_groups.py: Missed this file from previous
commit.
Fri Nov 21 10:03:26 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/__init__.py,xappy/datastructures.py,
xappy/doctests/indexerconnection_doctest1.txt,
xappy/doctests/indexerconnection_doctest2.txt,
xappy/doctests/indexerconnection_doctest3.txt,xappy/,
xappy/unittests/docbuild.py: Add shortcut interface for adding
fields to document (new UnprocessedDocument.append() and extend()
methods). Add support for FieldGroups, used to group field data
instances together, to allow the summarisation code to return
fields associated to that which are relevant to the search (in
relevant_data()).
Changes datastructures on disk - stored documents now have an
extra key. Should be backwards compatible (and this is tested) -
ie, new versions of xappy should be able to read old databases.
However, old versions of xappy won't be able to read databases
created by this version.
Tue Nov 18 22:40:31 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Add support for slicing the
SearchResults object, as suggested in issue #25 by user "kapilt".
* xappy/unittests/searchresults_slice.py: Add test for slicing the
SearchResults object.
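Slicing support of this kind is usually added through __getitem__. A
minimal sketch, using a hypothetical ResultList class rather than the
real SearchResults:

```python
class ResultList(object):
    """Illustrative container supporting both integer and slice indexing."""

    def __init__(self, hits):
        self._hits = list(hits)

    def __len__(self):
        return len(self._hits)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # slice.indices() resolves negative and missing bounds.
            start, stop, step = index.indices(len(self._hits))
            return [self._hits[i] for i in range(start, stop, step)]
        return self._hits[index]


results = ResultList("abcdef")
page = results[1:4]
```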
Tue Nov 18 21:51:35 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* AUTHORS: Credit Michael Elsdörfer.
* xappy/searchconnection.py: Add support for sorting by multiple
keys (inspired by the implementation supplied by Michael in issue
#24).
* xappy/unittests/sort.py: Add Michael's test of sort by multiple
(and single) keys.
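The idea of ordering by several keys can be illustrated in plain
Python. This is not how xappy passes the keys to xapian; sort_by_keys()
is a hypothetical helper, and the "+"/"-" prefixes for ascending and
descending order are an assumption about the key syntax:

```python
def sort_by_keys(docs, sort_keys):
    """Sort dictionaries by several keys, the first key being most significant.

    Each key is a field name with an optional "+" (ascending, the default)
    or "-" (descending) prefix.
    """
    result = list(docs)
    # Python's sort is stable, so sorting by the least significant key
    # first and the most significant key last gives a multi-key ordering.
    for key in reversed(sort_keys):
        descending = key.startswith("-")
        name = key.lstrip("+-")
        result.sort(key=lambda doc: doc[name], reverse=descending)
    return result


docs = [
    {"price": 2, "name": "b"},
    {"price": 1, "name": "c"},
    {"price": 2, "name": "a"},
]
ordered = sort_by_keys(docs, ["-price", "+name"])
```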
Tue Nov 18 21:45:07 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/distance.py: Add copyright header, and remove
unnecessary import of xapian.
Fri Oct 31 21:06:57 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: Update to download new packages (11600).
Fri Oct 31 21:04:30 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* external_posting_source/HOWTO: Updated and finished instructions.
Fri Oct 31 20:53:52 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* external_posting_source/HOWTO,
external_posting_source/mypostingsource.h: Add nearly working
details of how to compile an external posting source.
* utils/make_xappy_tarballs: Update version numbers.
Fri Oct 31 11:37:15 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/exact_index_terms.py: Add unittest which I forgot
to commit a while ago.
Fri Oct 31 11:27:07 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: Update to new version of packages (11584).
* xappy/searchconnection.py: Use the newly added
FixedWeightPostingSource to provide weights for
query_difference(), to fix regression introduced with the last
xapian packages (due to query_all() no longer returning non-zero
weights). This is a more efficient implementation, anyway, since
the matcher will usually not need to check for documents
existing.
Thu Oct 30 21:20:30 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py,utils/make_xappy_tarballs: Update xapian
tarballs with changes in xapian trunk to revision 11580.
Mon Oct 06 12:27:29 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* docs/queries.rst,xappy/searchconnection.py: Add
query_difference() to produce a query which ranks documents
according to how close to a particular value one of the fields
is.
* xappy/unittests/distance.py: Add tests for query_difference.
* AUTHORS: Mention that Paul did this.
Sun Sep 28 15:52:10 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: Update to use new packages.
Fri Sep 26 04:25:48 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: Updated xapian packages.
Thu Sep 25 18:04:15 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/indexerconnection.py: Expand a FIXME.
Thu Sep 25 09:07:08 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/indexerconnection.py: Fix add_subfacet() to not repeatedly
add a parent which is already there. Should have no functional
effect, but should keep the size of the configuration down.
Fri Sep 19 10:55:29 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Add checks to ensure that weighting
parameters are in the allowed ranges.
Thu Sep 18 00:55:28 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Allow getting a document based on
the xapian docid, instead of the external unique ID used by
xappy.
Tue Sep 09 17:15:59 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/field_associations.py: Extend the test a bit
further (it still passes!).
Tue Sep 09 17:01:19 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/datastructures.py: Add optional "weight" member to Field:
defaults to 1.0, not currently used anywhere other than in the
relevant_data() calculation for associated fields.
* xappy/fieldactions.py: Store the weight values specified. Also,
modify STORE_CONTENT to simply increase the stored weight if a
field is repeated multiple times, rather than storing it multiple
times. Change in database format for existing databases which
use field assocs (but this feature hasn't been in a release yet,
anyway).
* xappy/searchconnection.py: Use weights stored in field actions
to make relevant_data() return fields in a sorted (descending)
order of relevance, and to return the data within each field in
descending order of weight. Also, fix a bug with query_range()
when both endpoints were None - previously, this returned an
"all documents" query - now it returns an "all documents which
have an entry in the slot" query, which is more logically
consistent.
* xappy/unittests/field_associations.py: Test repeated, and
weighted, field associations.
Thu Sep 04 14:11:41 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* docs/introduction.rst: Basic documentation on field associations.
Thu Sep 04 13:53:18 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/testdata/chert_db/iamchert: Update test data for
latest chert format.
Thu Sep 04 13:18:47 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py,utils/make_xappy_tarballs: Update for new
versions of xapian.
Tue Sep 02 15:51:57 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/range_accel.py: Add some further tests of the
range acceleration prefix handling. Also, tidy up some
whitespace issues.
Mon Sep 01 12:03:59 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/indexerconnection.py: Fix iteration of terms in fields for
which there is an empty term.
* xappy/unittests/terms_for_field.py: Add regression test for empty
field behaviour.
Fri Aug 29 17:08:49 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: Updated tarball (fixes the bug with tags
better).
Fri Aug 29 16:32:14 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/tag_prefixes.py: Regression test for bug with the
tagspy.
Fri Aug 29 15:57:48 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Add workaround for a bug in the
tagspy.
Fri Aug 29 14:51:20 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/fieldactions.py: Handle repeated adding of sort or facet
field actions with range parameters correctly. Previously, a new
prefix was chosen each time, breaking accelerated searches for
that field.
* xappy/unittests/range_accel.py: Test repeated sort and facet
field actions.
Fri Aug 29 11:22:38 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* utils/dump_field_actions.py: Output the slot and prefix
information, too.
Fri Aug 29 10:58:45 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* utils/dump_field_actions.py: Sort the fields, and allow a
specific field to be requested.
Fri Aug 29 10:39:27 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* utils/dump_field_actions.py: Add script to dump the field actions
for a database.
* utils/replay_search_log.py: Correct copyright date.
Fri Aug 29 09:54:01 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* utils/replay_search_log.py: Add utility to replay a search log,
and time it.
Fri Aug 29 09:24:33 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/query.py: Copy the __serialised member correctly when a
Query is copied.
Fri Aug 29 09:08:06 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Fix indentation.
Thu Aug 28 18:18:53 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Add "internal" interface to allow
building ELITE_SET queries from raw xapian terms.
Thu Aug 28 17:29:24 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/query.py,xappy/unittests/query_serialise.py: Add support
and tests for serialising the result of xappy.Query().
Thu Aug 28 13:49:57 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/query.py,xappy/searchconnection.py: Add methods to allow a
query to be serialised and unserialised.
* xappy/unittests/query_serialise.py: Test the new query
serialisation stuff.
Tue Aug 26 16:59:54 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/fieldactions.py: Fix generation of field associations for
EXACT and FACET fields - was forgetting to add the term prefix.
* xappy/searchconnection.py: Avoid returning relevant data items
more than once.
Tue Aug 26 09:07:57 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/datastructures.py: Remove a bit of debugging code which
broke backwards compatibility.
Tue Aug 26 08:35:07 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/datastructures.py,
xappy/doctests/indexerconnection_doctest1.txt,
xappy/doctests/indexerconnection_doctest2.txt,
xappy/doctests/indexerconnection_doctest3.txt,xappy/,
xappy/unittests/field_associations.py: Extend Field data
structure with a new member (and optional constructor parameter)
to store "associated" data. This data will be returned in the
document data field instead of the original data, and is intended
to allow particular return data to be associated with a piece of
input data. Documentation and examples still needed. Also, add
new method to SearchResult to return a list of relevant fields,
and the corresponding piece of data for them.
Wed Aug 20 16:14:06 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Ensure that _next_docid is initialised
so that searches on empty databases with no configuration don't
fail.
* xappy/unittests/emptydb_search.py: Add regression test for this
problem.
Wed Aug 20 12:31:38 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* build.py,setup.py,test.py,xappy/,xappy/unittests/: Remove
trailing whitespace.
Wed Aug 13 16:20:35 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Change get_terms_for_field() to
iter_terms_for_field() (and make it iterate the terms, rather
than returning them in a list, too).
* xappy/unittests/get_terms_for_field.py,
xappy/unittests/terms_for_field.py: Rename get_terms_for_field.py
to terms_for_field.py and expand it a bit.
Wed Aug 13 15:46:35 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/indexerconnection.py: Add support for prefixes longer than
1 character to PrefixedTermIter.
Wed Aug 13 14:14:01 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py,xappy/unittests/get_terms_for_field.py:
Whitespace tidyup.
Wed Aug 13 12:36:58 GMT 2008 Tom Mortimer <tom@lemurconsulting.com>
* xappy/searchconnection.py: Added get_terms_for_field().
* xappy/unittests/get_terms_for_field.py: Test
get_terms_for_field().
Wed Aug 06 07:11:23 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/indexerconnection.py,xappy/searchconnection.py,
xappy/unittests/dociter.py: Add support for iterating through all
the documents to get a list directly, rather than having to call
get_document() on every item returned by iterids().
Wed Aug 06 07:10:57 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/range_accel.py,xappy/unittests/range_speed.py:
Add standard copyright headers.
Wed Aug 06 07:09:09 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* docs/queries.rst: Add a placeholder for documenting
query_external_weight().
Tue Aug 05 14:21:03 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Explain why we multiply the range
accel searches by 0.
Tue Aug 05 11:11:39 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py,xappy/unittests/similar.py: Add test
case for similarity search with a non-indexed document, and fix
the remaining problems which that threw up.
Tue Aug 05 09:35:16 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/range_speed.py: Fix test to make problem more
obvious.
Tue Aug 05 08:08:12 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/indexerconnection.py,xappy/searchconnection.py: Convert
the IndexerConnection._allocate_id() method into a function, so
that SearchConnection can use it too. Use it in SearchConnection
to allocate IDs in the temporary database correctly.
Tue Aug 05 08:02:19 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/searchconn_process.py: Add comment explaining
test case.
Tue Aug 05 07:37:49 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Support ProcessedDocument or
UnprocessedDocument objects in the list of ids for
query_similar(). As yet untested and undocumented.
Tue Aug 05 06:22:51 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/__init__.py,xappy/fieldactions.py,xappy/searchconnection.py,
xappy/unittests/searchconn_process.py,
xappy/unittests/weight_external.py: Add support for using an
external source of weight information in searches. This is a bit
slow, so probably only useful for small databases, or for
off-line testing. Also, add support for using a SearchConnection
to process a document (ie, translating an UnprocessedDocument to
a ProcessedDocument) - this can be handy in various odd
situations.
Mon Aug 04 17:19:41 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/range_speed.py: Convert to use range search
instead of facet ranges, to work with xapian SVN trunk.
Mon Aug 04 07:29:01 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/doctests/searchconnection_doctest1.txt,
xappy/doctests/searchconnection_doctest2.txt: Fixes needed for
the new xapian packages: percentage rounding has changed again,
and the error message when a database can't be found has been
changed (for the better).
Mon Aug 04 07:26:18 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: Update with details of new xapian packages.
Mon Aug 04 06:34:56 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: Update with latest branchpoints.
Sun Aug 03 20:39:26 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/range_speed.py: Add another test case - this
checks that some searches which should return the same values do
so. More usefully, parts can be commented out to display the
speed of the various kinds of range searches.
Sun Aug 03 20:38:54 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/range_accel.py: Some formatting fixes.
Sun Aug 03 12:10:42 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* AUTHORS: Update copyright date, and mention Paul Rudin.
* docs/queries.rst: Make clear that approximate range terms can be
used to speed up exact range searches.
Sun Aug 03 11:57:37 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/fieldactions.py,xappy/searchconnection.py,
xappy/unittests/range_accel.py: Add "range acceleration" terms,
to allow approximate ranges to be stored in documents. This
allows fast, but approximate, range searches to be performed, and
also allows such a search to be combined with an exact range
search to speed up the range search.
* docs/queries.rst: Document the approximate range queries.
Sun Aug 03 11:26:32 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* test.py: Add some more useful facilities to the environment that
doctests are run in.
Sun Aug 03 11:25:18 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/weight_action.py: Make the comment on
test_regression() more explanatory.
Tue Jul 29 23:09:16 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: For INDEX_EXACT, TAG and FACET term
queries, multiply the weight by 0 to tell xapian that the weight
is always zero, and allow better optimisation.
* xappy/doctests/indexerconnection_doctest2.txt: Update expected
query to correspond to this change.
* xappy/doctests/searchconnection_doctest2.txt: Modify expected
percentage weight to align with recent rounding fixes in xapian's
percentage weight calculation routines.
Fri Jul 18 12:57:56 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/query.py: If a query is empty, don't require that it has a
search connection in get_max_possible_weight() - we know that the
maximum weight of an empty query is 0, and relaxing this
restriction makes it more convenient to write code.
Thu Jul 10 13:30:12 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: Updated versions of the tarballs (difference
since the previous versions is some build system updates and a
performance improvement).
* utils/make_xappy_tarballs: Update branchpoint version numbers.
Wed Jul 09 15:12:40 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: Yet another version, since the last one
didn't fix autoreconf either.
Wed Jul 09 14:49:35 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: And another upload, this time to fix
autoreconf.
Wed Jul 09 14:14:28 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: And another set of updated packages!
Wed Jul 09 11:55:40 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: New version to fix segfault.
Wed Jul 09 08:23:21 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: Updated packages.
Fri Jul 04 17:10:46 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py,utils/make_xappy_tarballs: Point to updated
tarballs, which fix an assertion failure in some situations, and
update version numbers in tarball generation script accordingly.
Wed Jul 02 11:53:48 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Fix for issue #21:
"SearchConnection._get_prefix_from_term fails when terms contain
numbers", as supplied by Shane Evans.
Wed Jul 02 11:53:07 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/facet_hierarchy_1.py: Add regression test for
issue #21: "SearchConnection._get_prefix_from_term fails when
terms contain numbers".
Wed Jul 02 11:00:09 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* setup.py,test.py,xappy/test.py: Move test.py to top level, for
easier running.
Thu Jun 26 12:18:02 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* docs/queries.rst: Further documentation on how to use queries.
Thu Jun 26 10:44:09 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/query.py: Add Query.empty(), with the same semantics as
xapian.Query.empty().
Thu Jun 26 06:38:13 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* docs/queries.rst: Add new document describing how to use Query
objects. Some FIXMEs in sections which are still to be written.
* xappy/test.py: Add some useful symbols to the namespace in which
tests are run. Test weighting.rst and queries.rst.
* build.py: Build queries.html
* docs/introduction.rst: Change some examples to use the new method
of combining queries.
Thu Jun 26 06:35:18 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/query.py: Add methods to get the maximum possible weight,
normalise the weight of a Query, and to perform a search, to
Query().
* xappy/searchconnection.py: Supply self to newly constructed
Query() objects so they can keep track of the connection. Allow
weight_params to be specified to get_max_possible_weight(), so
that the return value is compatible with that.
Wed Jun 25 21:32:30 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/query.py: Code tidyup. Also, make queries keep track of
the connection they were obtained from: complain if an attempt to
combine queries from different connections is made. This is
preparation for allowing the search to be performed as a method
of Query.
Wed Jun 25 13:53:02 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/query.py: Allow division of queries by a number (equivalent
to multiplying by 1.0 / number).
Wed Jun 25 13:36:30 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/query.py: Add &, | and ^ operators for queries.
Wed Jun 25 12:09:21 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/: Move implementation of Query() to a separate file, and
remove the __iter__() method, since it returns raw xapian terms,
which we don't want to expose in the user interface.
Wed Jun 25 11:33:27 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/weight_action.py: Modify regression test so that
it actually tests the problem.
* xappy/searchconnection.py: Add new Query object, which keeps
track of references to necessary extra objects. Use this to
ensure that the PostingSource for a weight query doesn't get
forgotten before the query does. This class also provides an
alternative, and hopefully nicer, way to compose queries. Modify
all methods which create queries to return xappy.Query objects
instead of xapian.Query objects, and add support everywhere for
being supplied with either xapian.Query objects or xappy.Query
objects. Documentation of this, other than doccomments, is
not yet written.
* xappy/__init__.py: Add Query object to external names.
* xappy/highlight.py: Add understanding of xappy.Query() objects.
Tue Jun 24 17:47:41 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/weight_action.py: Add an attempt to reproduce a
reported segfault.
Tue Jun 24 13:28:21 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: Update with latest version numbers.
* xappy/unittests/: Move the common parts of the existing tests
into a new "xappytest" file, and tidy up the tests a bit. Also,
add license headers to each test.
Wed Jun 11 13:36:02 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/indexerconnection.py,xappy/unittests/db_type1.py,
xappy/unittests/db_type_compat1.py: Add dbtype parameter to
IndexerConnection constructor, allowing the type of newly created
databases to be specified.
Thu Jun 05 06:06:49 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Add a new parameter, "weight_params",
to the search() method, to allow the weighting system to be
modified.
* docs/weighting.rst: Document the available weighting parameters.
* build.py: Add weighting.rst to the documentation build.
* xappy/unittests/weight_action.py: Modify the path so this test
can be run standalone.
* xappy/unittests/weight_params.py: Add test of the weight_params
parameter to search().
Wed Jun 04 14:10:59 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/indexerconnection.py,xappy/unittests/db_type_compat1.py:
If a database exists, but isn't a flint database, catch the
DatabaseOpeningError and try to open it anyway. Add test for
this behaviour.
Wed Jun 04 14:08:33 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/unittests/testdata/chert_db/,
xappy/unittests/testdata/flint_db/: Add some empty databases in
chert and flint formats for testing.
Wed Jun 04 07:46:31 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/indexerconnection.py,xappy/searchconnection.py,
xappy/unittests/facet_hierarchy_1.py,
xappy/unittests/testdata/old_facet_db/: Fix bug in the backwards
compatibility code for facet hierarchies. Add a database in the
old format, and a test which uses this to check the backwards
compatibility code.
Fri May 30 11:00:59 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/indexerconnection.py,xappy/searchconnection.py,
xappy/unittests/facet_hierarchy_1.py: Allow multiple parent
facets to be defined for each child.
Mon May 12 19:58:34 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* docs/introduction.rst: Add documentation of the WEIGHT action.
* xappy/fieldactions.py: Convert field values for WEIGHT fields to
float before storing them.
Mon May 12 18:32:14 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/fieldactions.py: Fix typo.
* xappy/searchconnection.py: Add support for ranking search results
using weight fields. Also, add get_max_possible_weight(query) to
assist in balancing the weights returned by different query
components.
* xappy/unittests/weight_action.py: Add test of "WEIGHT" action.
Mon May 12 13:16:33 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py,utils/make_xappy_tarballs,xappy/_checkxapian.py,
xappy/doctests/searchconnection_doctest2.txt,xappy/fieldactions.py,
xappy/indexerconnection.py,xappy/searchconnection.py: Add links
to new xapian packages with support for ValueWeightPostingSource.
Tidy up error handling in search and indexer connections if an
error occurs while reading the configuration. Update a test case
which has slightly different behaviour with the new xapian. Add
support for "WEIGHT" action for fields when indexing (no
corresponding support at search time yet, so this isn't useful
currently).
Wed May 07 22:34:59 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* setup.py,testsuite/coverage.py,testsuite/runtests.py,xappy/test.py:
Reorganise test runners, so that "eggsetup.py test" actually
works.
Wed May 07 14:25:54 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* testsuite/coverage.py: Update with changes from latest upstream.
Wed May 07 14:13:04 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* setup.py,testsuite/runtests.py,testsuite/unittests/,xappy/,
xappy/unittests/: Move doctests and unittests to separate
directories under xappy, make unittests into a module, and
incorporate running the unittests into the main testsuite runner.
Tue May 06 15:39:03 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* secore/,testsuite/runtests.py: Remove secore backwards
compatibility layer - we're about to break compatibility anyway,
so this will shortly be completely useless.
Tue Apr 29 08:25:59 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* MANIFEST.in,setup.py: Update setup.py and MANIFEST ready for 0.5
release.
Tue Apr 29 07:58:10 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py,utils/make_xappy_tarballs: Update scripts to
get custom version of xapian to latest version.
Mon Apr 28 23:25:46 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/: Add check that xapian is at least version 1.0.6; raise
ImportError at import time if version is too old. Add checks for
a version of xapian with sufficient features to support tags and
facets, and disable those features if they're not present: an
exception will be raised when tag or facet features are used if
the xapian version is too old.
Fri Apr 25 12:23:09 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* testsuite/runtests.py: Copy fixes from xapian_1.0 branch to make
the testsuite pass on windows.
Thu Apr 17 23:07:35 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* testsuite/unittests/freetext_1.py: New unit test for the
search_by_default and allow_field_specific fields.
Thu Apr 17 22:04:34 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* docs/introduction.rst,xappy/fieldactions.py,
xappy/searchconnection.py: Add allow_field_specific and
search_by_default flags to INDEX_FREETEXT action.
Sat Mar 29 16:59:23 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Fix some "foo if bar" constructions
which broke python 2.4. Should now work with python 2.4 again.
Wed Mar 26 08:14:27 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/fieldactions.py: Tidy up the imports in this file, too.
Wed Mar 26 07:52:47 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/__init__.py,xappy/datastructures.py,
xappy/indexerconnection.py: Remove several "import *" lines from
__init__.py, replacing them by importing the specific symbols
desired. Remove the nasty renaming of imported symbols in the
files thus imported, since this was to work around polluting the
namespace when "import *" was used.
Tue Mar 25 12:46:33 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* testsuite/unittests/facet_query_type_1.py,xappy/indexerconnection.py,
xappy/searchconnection.py: More facet selection improvements from
Tom Winch: allow a set of associations between query types and
facets to be stored in the database configuration, and use these
facets to either prevent or prefer certain facets from being
chosen for a particular query type. (Query types are specified
by an additional parameter to the search() method.)
Wed Mar 19 01:42:29 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/indexerconnection.py,xappy/searchconnection.py: Remove
backwards compatibility support for reading fieldactions from a
file. It just makes the code more complex, and xappy really
needs to use a more recent version of xapian.
Tue Mar 18 21:34:29 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* AUTHORS: Add Tom Winch.
Tue Mar 18 21:29:25 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* testsuite/unittests/facet_hierarchy_1.py,xappy/indexerconnection.py,
xappy/searchconnection.py: Add support for defining a facet
hierarchy, for use when selecting facets. Not yet used, but is
stored in the database configuration, and available to both the
indexer connection and the search connection.
Mon Mar 17 16:46:33 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Fix setting of stemmer so that it
still works with replaylog enabled.
Thu Feb 21 13:12:03 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: Update to new tarballs which actually work,
this time.
Thu Feb 21 01:54:18 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: Revert to earlier tarballs - the new ones
don't work.
Thu Feb 21 01:50:17 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: Update with new tarballs.
Thu Feb 21 01:23:46 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: Update the branchpoint version
numbers.
Tue Feb 05 09:59:50 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: Upgrade version of xapian used to one which
contains database replication functionality.
Tue Feb 05 09:45:27 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/indexerconnection.py,xappy/searchconnection.py,
xappy/searchconnection_doctest3.txt: Add interface to
IndexerConnection for setting and getting metadata, and interface
to SearchConnection for getting metadata.
Mon Feb 04 00:58:05 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Cope with a facet being declared as
SORTABLE, but without a type, but of facet type float. (Treat the
facet search as a numeric range, correctly - used to fail to
serialise the numbers correctly.)
* xappy/searchconnection_doctest2.txt: Add regression test.
Sat Feb 02 17:59:24 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/indexerconnection.py: Add simple work-around for synonyms -
allow a field to be specified for the original word separately
from the synonym. Needs tidying up, but allows slightly more
flexibility in synonyms.
* xappy/searchconnection_doctest2.txt: Adjust test accordingly.
Sat Feb 02 13:14:29 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/indexerconnection.py,xappy/searchconnection.py: Backwards
compatibility fix for reading the metadata: if the config isn't
in the metadata key, or metadata isn't supported by the version
of xapian in use, read it from the file. When writing the
config, if it can't be stored in the metadata, store it in a
file. A database can now be upgraded to use the new metadata
method simply by opening an indexerconnection on it, and then
closing it.
Sat Feb 02 12:47:33 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: Tidy up tarball making script.
Sat Feb 02 12:44:59 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Add faster implementation of expand
decider, using a regexp for the prefixes.
* xappy/searchconnection_doctest2.txt: Test it.
Mon Jan 28 15:13:04 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* xappy/indexerconnection.py,xappy/searchconnection.py: Change the
storage of the settings from a file in the database directory to
be in the metadata. This change allows the forthcoming replication
support to copy databases without losing their settings, and
should also be helpful when we implement remote database support.
Wed Jan 23 22:27:50 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: Even newer xapian tarball - containing more
fixes from charlie for windows.
Wed Jan 23 16:04:12 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py,utils/make_xappy_tarballs: Update xapian
tarballs - mainly to get fixes for the build system on windows.
Thu Jan 10 00:28:26 UTC 2008 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: Update to get latest archives.
Thu Jan 10 00:04:51 UTC 2008 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: Update with new branchpoint.
Wed Jan 09 22:50:36 UTC 2008 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: Update with version numbers for latest
branches.
* libs/get_xapian.py: Update with details of latest tarballs, which
include OP_VALUE_GE and OP_VALUE_LE.
* xappy/searchconnection.py: Allow None to be specified as the
begin or end of a range query - allows half ranges to be
specified.
* xappy/searchconnection_doctest2.txt: Test passing None as the end
parameters of a range query.
Wed Jan 09 22:46:31 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: Update with version numbers for latest
branches.
* libs/get_xapian.py: Update with details of latest tarballs, which
include OP_VALUE_GE and OP_VALUE_LE.
* xappy/searchconnection.py: Allow None to be specified as the
begin or end of a range query - allows half ranges to be
specified.
* xappy/searchconnection_doctest2.txt: Test passing None as the
end parameters of a range query.
Mon Jan 07 19:47:36 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* libs/get_xapian.py: New script (taken from flax) to download the
xapian tarballs and unpack them, ready to be built.
* libs/*.tgz: Remove the tarballs from svn - they were too big to
be kept here. They're now hosted on the googlecode download
area, which should be as reliable as the googlecode svn server.
* utils/make_xappy_tarballs: Update with new version numbers.
Mon Jan 07 16:59:22 GMT 2008 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: Update version numbers for latest
branch updates, to build new tarballs.
Mon Dec 31 13:09:33 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py,xappy/searchconnection_doctest1.txt:
Test opening of a database which doesn't exist, and set _index to
None in class initialiser to avoid assertion error when calling
close() from __del__() in this situation.
Mon Dec 17 09:28:01 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Add _cluster method, and
_reorder_by_clusters() method.
Mon Dec 10 19:46:25 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: Update to apply the changes in the
clustering branch.
Mon Dec 10 19:45:30 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Add ability to restrict the reordering
to just use specific fields, and to use approximations for the
termfreqs to speed it up.
Mon Dec 10 17:19:34 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Remove accidentally committed
debugging prints.
Mon Dec 10 17:18:34 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Add (experimental)
_reorder_by_similarity() method to SearchResults.
Thu Dec 06 16:53:23 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Add a "userdata" parameter to the
closehandler callback, to make writing the callbacks easier.
* xappy/searchconnection_doctest2.txt: Test the userdata parameter.
Thu Dec 06 12:27:53 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/highlight_doctest1.txt: Don't display the output of the
highlighter - we're just testing that it returns promptly.
Thu Dec 06 12:19:55 UTC 2007 Tom Mortimer <tom@lemurconsulting.com>
* xappy/highlight.py,xappy/highlight_doctest1.txt: Simplified
regexp to work around freezing problem. Less precise now but
probably good enough temporarily. Fixed test case.
Thu Dec 06 07:38:49 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/highlight_doctest1.txt: Add testcase of a pathological
document for highlighting - the regular expression currently
takes a ridiculously long time to process this.
Thu Dec 06 07:29:28 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* testsuite/runtests.py: Fix for running with debug logging.
Wed Dec 05 15:57:41 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Improve a documentation comment.
Mon Dec 03 18:01:51 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py,xappy/searchconnection_doctest2.txt:
Add ability to set a callback on SearchConnection to be called
when the object is closed (even if this is an implicit close due
to being deleted).
Thu Nov 29 17:58:18 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Expose an API for setting the minimum
weight or percentage allowed for a result to be returned; this is
done by supplying the percentcutoff or weightcutoff parameters to
SearchConnection.search().
* xappy/searchconnection_doctest2.txt: Test the weight and
percentage cutoff parameters.
Wed Nov 28 10:18:43 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Fix returning of empty facet value,
which translates into numeric range from -inf to -inf: this is
returned when some documents do not have an entry in a numeric
range, with a count of the number of documents which matched but
didn't have a numeric facet. Just ignore this information.
Wed Nov 28 08:37:08 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Remove a typo.
Wed Nov 28 08:15:09 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* docs/introduction.rst,xappy/indexerconnection_doctest2.txt,
xappy/searchconnection.py,xappy/searchconnection_doctest2.txt:
Modify query parsing to ensure that exact matches are given a
higher weight than stemmed or synonym matches. Update testcases
accordingly.
Wed Nov 28 07:34:21 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* libs/: Update the xapian tarballs; these now include OP_SYNONYM
and use it for synonym searches, wildcards, and partial searches.
Tue Nov 27 22:39:54 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* utils/make_xappy_tarballs: Add script to update the xappy
tarballs from xapian SVN.
Mon Nov 26 14:51:24 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* testsuite/runtests.py: Call close methods on anything from xappy
which has one when cleaning up.
Mon Nov 26 14:11:46 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/indexerconnection_doctest1.txt: Windows doesn't give a
detail for why the database lock can't be obtained when a
DatabaseLockError is raised, so make the test case more flexible
there.
Mon Nov 26 12:36:38 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* testsuite/runtests.py: Delete entries in the dictionary before
calling teardown; should help avoid trying to delete open files
on Windows.
Sun Nov 18 15:45:26 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/replaylog.py: New file - allows all calls to xapian to be
logged, such that they could be replayed later for debugging.
Has rather an unpleasant implementation, but as a result has
minimal impact when not turned on - I've not been able to measure
any performance impact incurred when not logging.
* xappy/__init__.py: Expose new function "set_replay_path" used to
start logging.
* xappy/marshall.py, xappy/fieldactions.py,
xappy/datastructures.py, xappy/indexerconnection.py,
xappy/searchconnection.py: Hook into the replay logging.
Sun Nov 18 15:44:50 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* testsuite/runtests.py: Run without profiling by default - much
faster.
Thu Nov 15 08:38:18 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* testsuite/runtests.py: Allow coverage and profiling measures to
be turned on and off easily (not yet with command line options,
but now only needs a simple edit to the code).
Wed Nov 07 17:45:48 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Add extra "query" parameter to
summarise() and highlight() methods, which can be used to
override the query used as the basis of the highlighting.
Wed Nov 07 17:30:02 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/highlight.py: Fix tests to correspond to recent change.
* xappy/searchconnection.py: Add "query_none()" to get an empty
query explicitly. Can be useful as a placeholder.
Tue Nov 06 13:54:13 UTC 2007 Tom Mortimer <tom@lemurconsulting.com>
* xappy/highlight.py: Highlighter works with stemmed and unstemmed
terms. Workaround until we have proper phrase highlighting.
Thu Nov 01 14:43:34 UTC 2007 Richard Boulton <richard@lemurconsulting.com>
* libs/win32msvc.tgz: Updated build files for Windows.
Wed Oct 31 19:01:03 UTC 2007 Richard Boulton <richard@lemurconsulting.com>
* libs/xapian-bindings-xappy.tgz: Version with a concurrency
problem fixed.
Wed Oct 31 17:57:00 UTC 2007 Richard Boulton <richard@lemurconsulting.com>
* libs/matchspy.cc: Version of matchspy.cc with quick workaround to
avoid segfault.
Tue Oct 30 11:12:28 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Document the members of SearchResult.
Tue Oct 30 11:02:22 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Add weight and percent members to
SearchResult objects.
Mon Oct 29 21:21:27 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* README: Update to tell users to use the tarballs from the libs/
subdirectory.
Mon Oct 29 21:19:01 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py,xappy/searchconnection_doctest2.txt:
Fix setting of the prefix to use the correct form of add_prefix,
and fix the expected output of scale weight queries to use the
new style of output.
Mon Oct 29 20:19:27 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* libs/win32msvc.tgz,libs/xapian-bindings-xappy.tgz,
libs/xapian-core-xappy.tgz: Add tarballs containing a suitable
version of xapian to use with xappy.
Mon Oct 29 15:07:00 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/indexerconnection.py: Turn off the max_mem_use setting by
default, so we don't mess up performance of existing applications.
Mon Oct 29 14:59:26 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/indexerconnection.py: Increase estimate of amount of memory
used, based on profiling observations.
Mon Oct 29 14:14:04 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* testsuite/unittests/spell_correct_1.py: Add unittest
demonstrating problem with spelling correction.
Mon Oct 29 14:08:59 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/memutils.py: New file, which gets the total amount of
physical memory on the system (for Windows and POSIX).
* xappy/indexerconnection.py: Add set_max_mem_use(), which causes
an automatic flush if more than a certain (configurable) amount
of memory is used. This should help to avoid using all the
memory for buffered changes, resulting in swapping. The estimate
of the memory used is fairly primitive, though, so could do with
improvement.
Mon Oct 29 09:30:21 GMT 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Retry parse_query() attempt without
support for boolean operators if it fails in spell correct
routine, to match behaviour of query_parse() routine.
Fri Oct 12 09:52:31 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: If allow, deny, default_deny or
default_allow are passed as empty lists, behave as if they were
passed as None.
Wed Oct 10 22:56:19 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Update some documentation comments.
Wed Oct 10 18:21:33 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Add the default_op, default_allow, and
default_deny optional parameters to spell_correct(), so that it
takes the same arguments as query_parse().
Wed Oct 10 01:25:01 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* perftest/perftest.py: Remove facet and tags test runs - we don't
have the data needed to make them run, anyway.
Wed Oct 10 01:22:20 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* perftest/perftest.py,perftest/searcher.py: Add use_or option to
search runs, and do an "OR" run by default.
Tue Oct 09 16:03:42 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Expand documentation comment to
explain the distinction between default_{allow,deny} and
{allow,deny}.
Tue Oct 09 15:36:32 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* perftest/perftest.py: Add "--usedb" parameter - if supplied, a
ready made DB is assumed to be at that path, and no index run
will be done.
Tue Oct 09 15:10:00 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* perftest/perftest.py: If pylab isn't available, don't call the
analyse_* functions (and don't produce pretty graphs, as a
result).
Tue Oct 09 14:55:53 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* README,docs/introduction.rst: Update comments about the version
of Xapian which is required.
Tue Oct 09 02:14:24 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* docs/running_perftest.txt: More instructions.
Tue Oct 09 02:05:03 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* docs/running_perftest.txt: Add some notes on running the
performance tests, with wikipedia data.
Tue Oct 09 01:48:58 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* perftest/perftest.py: More tidying, ready for running big tests
against wikipedia.
Tue Oct 09 01:32:43 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* perftest/searcher.py: Tidy up headings.
Tue Oct 09 01:21:38 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* perftest/: Sort out search side of performance tests.
Mon Oct 08 23:46:18 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* perftest/: Tidy up, towards making automated performance tests
runnable just by running a single script. Fix graph drawing for
cases where there are few sample points.
Mon Oct 08 14:22:25 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* AUTHORS: Add second name for Bruno Rezende.
Sun Oct 07 01:56:47 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Change from using OP_MULT_WEIGHT to
use OP_SCALE_WEIGHT, to work with latest version of xapian.
Sat Oct 06 01:55:34 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* MANIFEST.in,eggsetup.py,setup.py: Basic start of distutils
packaging.
Wed Oct 03 14:00:17 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* build.py: Don't include private variables in the output of epydoc;
this makes it more useful as an API reference.
* docs/introduction.rst: Add a note on error handling.
Wed Oct 03 13:04:01 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/errors.py,xappy/errors_doctest1.txt: Export all the xapian
error types (eg, xapian.FooError) as xappy.XapianFooError. Also,
make them all subclasses of xappy.XapianError. This allows
a particular Xapian error to be caught using "except
xappy.XapianFooError", or all Xapian errors to be caught using
"except xappy.XapianError".
Wed Oct 03 12:36:27 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/highlight.py,xappy/searchconnection_doctest3.txt: Fix from
Alex Bowley to coerce maxlen into an int in highlight.py
Tue Oct 02 18:31:54 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Minor correction to a documentation
comment.
Tue Oct 02 18:03:31 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Add query_adjust(), allowing the
weights of one query to be adjusted based on the results of a
second query.
* xappy/searchconnection_doctest2.txt: Add test for query_adjust()
Mon Oct 01 15:23:11 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Add a __len__() method for
SearchResults().
* xappy/searchconnection_doctest1.txt,
xappy/searchconnection_doctest2.txt: Test it
Mon Oct 01 14:23:22 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Add SearchConnection.query_multweight,
which produces a query from a subquery by multiplying the weights
by a multiplier.
* xappy/indexerconnection_doctest2.txt: Modify test of "Cannot
specify both `allow` and `deny`" to expect new extended message.
* xappy/searchconnection_doctest2.txt: Add test of a multweight
query.
Sun Sep 30 10:37:21 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py,xappy/searchconnection_doctest2.txt:
Add "required_facets" parameter to get_suggested_facets(),
allowing certain facets to be required in the list of returned
facets.
Fri Sep 28 16:12:35 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* testsuite/coverage.py: Update copy of coverage.py to latest
version (with patches applied) to get correct results with
python2.5
Mon Sep 24 14:21:48 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py,xappy/searchconnection_doctest2.txt:
Add default_allow and default_deny parameters to query_parse.
These allow a list of field names to be specified which will be
searched by default (instead of searching all free-text fields).
Needs latest SVN version of xapian.
Sat Sep 22 09:21:13 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/searchconnection.py: Add 'rb' to another call to open that
I missed.
Thu Sep 20 15:20:41 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* AUTHORS: Start list of individuals who have contributed in any
way.
Tue Sep 18 13:23:15 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* xappy/datastructures.py,xappy/indexerconnection_doctest1.txt:
Document, and test, that it's okay to use an iterator for
UnprocessedDocument.fields.
Tue Sep 18 13:05:53 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* docs/introduction.rst: Clarify some of the documentation about
facets.
Fri Sep 07 16:51:33 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* ChangeLog: Tidy-up whitespace.
Wed Sep 05 15:33:27 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* examples/fileindex.py,examples/search.py,perftest/index_from_dump.py,
perftest/search_speed.py: Change all remaining references to
"secore" name, except in the compatibility wrapper and the tests
for that, to "xappy".
Wed Sep 05 15:29:22 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* docs/introduction.rst: Change references to secore to references
to xappy.
* xappy/fieldmappings_doctest1.txt: Add test which I wrote ages
ago, but had forgotten to commit.
Wed Sep 05 15:05:57 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/,testsuite/runtests.py: Add compatibility layer so that
old scripts can run without needing to change from "secore" to
"xappy", for now. Change testsuite to run using new names (but
use the old ones too, to test the compatibility layer).
Wed Sep 05 14:24:52 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* AUTHORS,README,build.py,secore/,xappy/__init__.py: Rename secore
to xappy. Adjust accompanying scripts and documentation
accordingly.
Wed Sep 05 02:23:46 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/searchconnection_doctest2.txt: Change expected output to
match output given by xapian SVN HEAD, once the bug in
check_at_least is resolved.
Fri Aug 17 14:41:13 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* perftest/index_from_dump.py: Remove pointless "os.stat"
Fri Aug 17 14:38:46 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/indexerconnection.py: Add a method to get a list of the
fields which have actions defined.
Thu Aug 16 17:42:30 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/indexerconnection.py: Add a check for the database not
having been closed.
* secore/searchconnection.py: Add method for getting an iterator
over all the documents in the database. Add support for
rerunning the attempted access if get_document() catches a
DatabaseModifiedError.
* secore/searchconnection_doctest2.txt: Change
"get_significant_terms" to "significant_terms()", and test it.
Thu Aug 16 17:24:06 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/searchconnection.py,secore/searchconnection_doctest2.txt:
Add new method "get_significant_terms()" which returns the most
significant terms in the set of ids specified.
Thu Aug 16 10:22:16 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/searchconnection_doctest2.txt: Add test for supplying a
document ID to query_similar() which isn't in the database.
Thu Aug 16 09:49:09 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* docs/introduction.rst: Add documentation for similarity search.
Thu Aug 16 09:15:44 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/searchconnection.py: Retry the expand until we don't get a
DatabaseModifiedError.
Thu Aug 16 09:06:44 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/searchconnection.py,secore/searchconnection_doctest2.txt:
Implementation of the query_similar() method, returning a query
to use to get a new set of results based on similarity.
Wed Aug 08 18:52:24 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/searchconnection.py,secore/searchconnection_doctest2.txt:
Add API for performing searches for similar documents. Currently
just works out which fields should be used for performing the
similarity comparison, but doesn't do the actual similarity
search.
Wed Aug 08 13:37:26 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/searchconnection.py: Fix up calculation of significant
digits to cope with extreme values (ie, 0), to round to the
nearest significant digit (previously, it rounded down), and to
use math.log10 instead of a loop to calculate the logarithm.
Wed Aug 08 13:21:41 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/searchconnection.py: Fix issue with new match estimate
rounding when match estimate is 0.
Wed Aug 08 03:17:37 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/searchconnection.py: Improve documentation comment.
Wed Aug 08 03:13:10 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/searchconnection.py: Add "matches_human_readable_estimate"
to search results - returns an estimate of the number of matching
documents, rounded according to how tight the upper and lower
bounds are.
Sat Aug 04 08:16:01 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/searchconnection.py: Add convenience methods for checking
if a particular field can be collapsed or sorted on.
Sat Aug 04 03:09:32 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* docs/introduction.rst,secore/: Reorganise field structures,
allowing us to support multiple occurrences of facets of string
type for a single document. This change requires all databases
to be rebuilt. Tests updated accordingly. Also, requires latest
SVN HEAD build of xapian.
Wed Aug 01 14:45:41 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/searchconnection_doctest2.txt: Test for multiple facet
values in a single document (currently fails, due to this not
being supported correctly yet)
Wed Aug 01 14:23:24 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/searchconnection_doctest2.txt: More tests, including a
regression test for bug with facet calculation on database with
no facet fields defined.
Wed Aug 01 12:43:57 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/searchconnection_doctest2.txt: Improve test coverage.
Wed Aug 01 12:28:48 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/searchconnection.py: Fix bug when facet calculation is
requested, but no facet fields are present in the database (used
to throw an exception when the list of suggested facets was
requested in this case - now it just returns an empty list).
Tue Jul 31 08:34:49 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/indexerconnection_doctest2.txt,secore/searchconnection.py,
secore/searchconnection_doctest3.txt: Improve test coverage - in
particular, add tests of various conditions which cause errors.
Mon Jul 30 16:10:52 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/datastructures.py,secore/datastructures_doctest1.txt:
Add test for terms which are too long, and note in the code about
why this restriction exists, and how it could be removed.
Mon Jul 30 14:29:51 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* docs/introduction.rst: Update test for displaying a facet range.
Mon Jul 30 14:29:08 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/searchconnection.py: Update documentation comment for
query_facet() method.
Mon Jul 30 12:38:25 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/searchconnection.py: Fix bug with handling of facets of
type 'float'.
Fri Jul 27 02:04:27 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/fieldactions.py,secore/searchconnection.py: Add facet
searching.
Fri Jul 27 01:13:27 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/searchconnection.py: For facet selection - don't return
facets which only have 1 or 0 values.
Thu Jul 26 17:02:33 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/datastructures.py: Warn early if a field is too long:
we might be able to replace this by hashing if necessary, but
this is better than waiting for the xapian error in this case.
Wed Jul 25 08:37:51 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* docs/introduction.rst: Add documentation on indexing and
searching facets.
Wed Jul 25 03:06:19 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/searchconnection.py: Correct small initialisation bug.
Wed Jul 25 01:53:51 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/: Change string marshalling to use xapian's stuff; cuts
down code, uses a more compact representation, and is compatible
with the facet range calculation stuff. Update tests to cover
the facet calculation stuff in more detail (but still need more
coverage). Fix bug with converting string range to a numeric
range more than once if get_suggested_facets is called
repeatedly.
Tue Jul 24 10:32:56 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/fieldactions.py,secore/searchconnection.py,
secore/searchconnection_doctest2.txt: Add the "FACET" action, to
store facets for field search and facet selection. Still
remaining is to change the serialisation of floats to match
Xapian to make the numeric range calculation work correctly,
translate the resulting numeric ranges into a more suitable
python representation, and handle multiple values for a
particular facet being specified for a single document.
Mon Jul 23 10:59:58 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* docs/introduction.rst,secore/searchconnection.py: Add special
value of -1 for checkatleast parameter, to check all matches, and
document it (and the general reason for setting the checkatleast
parameter when using get_top_tags()).
Mon Jul 23 10:40:22 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/: Add missing synonym stuff.
Tue Jul 17 13:42:39 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* docs/introduction.rst,secore/fieldactions.py,
secore/searchconnection.py: Add some documentation of tags, and
fix a couple of bugs.
Tue Jul 17 13:25:10 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/fieldactions.py,secore/searchconnection.py: Add support
for tagging - as yet, undocumented, and minimally tested.
Mon Jul 16 11:32:18 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/searchconnection.py: Check for KeyError when getting a
slot number for a range restriction, too.
Mon Jul 16 11:18:36 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/searchconnection.py: Check for KeyError when accessing
list of field actions, and behave as if an empty list was found
if the field is unknown.
Fri Jul 13 16:39:50 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/searchconnection.py: Add an option to query_filter to
return only those documents which _don't_ match the filter,
instead of those which do.
* secore/searchconnection_doctest2.txt: Add a test for using a
filter with exclude=True.
Mon Jul 09 11:23:56 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/searchconnection.py: Convert supplied sequence of queries
to list, since xapian query constructor isn't happy to take an
iterator.
Fri Jul 06 16:38:11 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/indexerconnection.py: Fix bug in replace() causing the
document data not to be stored.
Wed Jul 04 17:24:40 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* docs/introduction.rst,perftest/index_from_dump.py,secore/:
Change all occurrences of unique_id to just "id" - no need to say
the unique bit, so it's just wasted typing.
Mon Jul 02 09:08:39 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/searchconnection.py: If a search connection is opened
before the fieldmappings file has been created, give it an empty
FieldMappings object. Also, fix a bug in handling of
checkatleast.
Sun Jul 01 18:13:34 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/searchconnection.py: Add query_all(), to make a search
matching all documents in the database.
Fri Jun 29 09:36:49 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/searchconnection.py: Encode correctly spelt queries in UTF8,
for consistency with output of Xapian.
* secore/searchconnection_doctest2.txt: Test spelling correction
with unicode strings.
Fri Jun 29 09:00:14 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/indexerconnection_doctest3.txt,
secore/searchconnection_doctest2.txt: Improve test coverage.
Fri Jun 29 08:47:34 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* docs/introduction.rst,secore/: Finally, properly implement and
document the spelling correction support.
Thu Jun 28 23:33:06 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* docs/introduction.rst: Fix documentation of parameters to
INDEX_FREETEXT, and adjust an example to give expected output
after recent fix to highlighting.
* secore/fieldactions.py: Add an option allowing indexing without
positional information, and an option allowing indexing with
spelling correction.
* secore/highlight.py,secore/searchconnection_doctest1.txt: Update
output of examples to match recent fix to highlighting.
Thu Jun 28 15:26:36 BST 2007 Tom Mortimer <tom@lemurconsulting.com>
* secore/fieldactions.py: Add stopwording, and an option not to
store prefixed terms, to freetext indexing.
* secore/highlight.py: Fix a bug causing the requested maxlength to
be exceeded if the derived blocks were too big.
Fri Jun 08 15:37:21 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* AUTHORS,testdata/query_sourcewords.txt: Add source words for
generating queries to test with, and list the copyright holders
in AUTHORS.
* perftest/: Add some routines for performing performance tests,
and analysing the logs of these tests.
* perftest/parse_wikipedia/: Add some routines which convert XML
dumps of wikipedia data into scriptindex compatible forms.
Fri Jun 08 11:58:54 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* Initial import into code.google.com repository.
Wed May 16 20:03:02 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* Remove textprocessor.py in favour of using the new TermGenerator
stuff in Xapian 1.0.0. Implement sorting by date or floating
point. Add query_range() method for searching all documents in a
given range.
Wed May 16 15:44:18 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* README: Add a note on the name, and list docutils as a
dependency.
Wed May 16 10:27:22 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/indexerconnection.py: Add get_document() method, to get a
document given its unique ID.
Fri Apr 27 18:34:25 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* README: Adjust some paths.
Fri Apr 27 18:22:19 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* docs/introduction.rst: Finish documentation by documenting
collapse.
Fri Apr 27 18:17:41 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* docs/introduction.rst: And the section on sorting is done.
Fri Apr 27 18:00:32 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* docs/introduction.rst: Finished section on searching, apart from
bits on sorting and collapsing.
Fri Apr 27 17:11:16 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* docs/introduction.rst: Finished section on indexing.
Fri Apr 27 16:41:23 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* build.py: Change method of calling rst2html and epydoc to avoid
using system, and hopefully be cross-platform.
Fri Apr 27 16:05:44 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* AUTHORS,COPYING_GPL: Add files listing authors, copyrights and
license.
Fri Apr 27 16:00:08 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* build.py,secore/__init__.py,testsuite/runtests.py: Fix headers to
contain licenses and appropriate comments.
Fri Apr 27 15:50:50 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* README,build.py,docs/makedocs.py: Move makedocs.py to build.py,
and add documentation on the prerequisites needed.
Fri Apr 27 15:16:34 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* README,docs/introduction.rst: Move details of how to run
testsuite from README into new file, together with basic
introduction to secore. (Sections which need finishing are
marked with FIXME.)
* docs/makedocs.py: Add script which makes all the documentation.
* testsuite/runtests.py: Add ability to test doctests in external
documentation, and add 'docs/introduction.rst' to the tests.
Fri Apr 27 14:14:58 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* README: Add a basic README to point users to the appropriate
documentation.
Fri Apr 27 13:16:45 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* coverage.py,runtests.py,testsuite/runtests.py: Move the testsuite
into a subdirectory, to tidy up the top level.
Fri Apr 27 12:45:18 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* secore/datastructures.py,secore/errors.py: Fixes to documentation
comments to make epydoc happier.
Fri Apr 27 12:37:29 BST 2007 Richard Boulton <richard@lemurconsulting.com>
* ChangeLog: Added new file, to keep track of changes.
Current project status:
- Basic API complete and implemented.
- Tests cover all lines of code, apart from TextProcessor, which
is scheduled for removal soon anyway.