# Basic Cheshire objects and methods

This notebook documents how to use Cheshire to query the CLiC database.
It fills in gaps in the official Cheshire documentation and
provides a number of specific, hands-on examples.

Author: Johan de Joode

Dates: 10/2/2015

Database: Dickens



## Setup

In [1]:
# coding: utf-8

import os

from cheshire3.baseObjects import Session
from cheshire3.document import StringDocument
from cheshire3.internal import cheshire3Root
from cheshire3.server import SimpleServer   

session = Session()
session.database = 'db_dickens'
serv = SimpleServer(session, os.path.join(cheshire3Root, 'configs', 'serverConfig.xml'))
db = serv.get_object(session, session.database)
qf = db.get_object(session, 'defaultQueryFactory')
resultSetStore = db.get_object(session, 'resultSetStore')
idxStore = db.get_object(session, 'indexStore')

## Querying

Build a query. This does not hit the database itself. 

In [2]:
query = qf.get_query(session, 'c3.subcorpus-idx = "dickens" and/cql.proxinfo c3.chapter-idx = "fog"')

A query can be printed as CQL or as XCQL.

In [3]:
print query.toCQL()

(c3.subcorpus-idx = "dickens" and/cql.proxinfo c3.chapter-idx = "fog")


In [4]:
print query.toXCQL()

<triple xmlns="http://www.loc.gov/zing/cql/xcql/">
  <boolean>
    <value>and</value>
    <modifiers>
      <modifier>
                     <type>cql.proxinfo</type>
      </modifier>    </modifiers>
  </boolean>
  <leftOperand>
    <searchClause>
      <index>
        <value>c3.subcorpus-idx</value>
      </index>
      <relation>
        <value>=</value>
      </relation>
      <term>dickens</term>
    </searchClause>
  </leftOperand>
  <rightOperand>
    <searchClause>
      <index>
        <value>c3.chapter-idx</value>
      </index>
      <relation>
        <value>=</value>
      </relation>
      <term>fog</term>
    </searchClause>
  </rightOperand>
</triple>
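
Because XCQL is plain XML, the parsed structure of a query can be inspected programmatically. The following is a minimal sketch, not part of Cheshire: it uses the standard library's `xml.etree.ElementTree` on a shortened copy of the output above, and the helper name `search_clauses` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the XCQL output printed above.
XCQL_URI = 'http://www.loc.gov/zing/cql/xcql/'
NS = {'x': XCQL_URI}

# Shortened copy of the XCQL printed by query.toXCQL().
xcql = """<triple xmlns="http://www.loc.gov/zing/cql/xcql/">
  <boolean><value>and</value></boolean>
  <leftOperand>
    <searchClause>
      <index><value>c3.subcorpus-idx</value></index>
      <relation><value>=</value></relation>
      <term>dickens</term>
    </searchClause>
  </leftOperand>
  <rightOperand>
    <searchClause>
      <index><value>c3.chapter-idx</value></index>
      <relation><value>=</value></relation>
      <term>fog</term>
    </searchClause>
  </rightOperand>
</triple>"""


def search_clauses(xcql_string):
    """Return (index, relation, term) for every searchClause, in document order."""
    root = ET.fromstring(xcql_string)
    clauses = []
    for sc in root.iter('{%s}searchClause' % XCQL_URI):
        index = sc.find('x:index/x:value', NS).text
        relation = sc.find('x:relation/x:value', NS).text
        term = sc.find('x:term', NS).text
        clauses.append((index, relation, term))
    return clauses


for index, relation, term in search_clauses(xcql):
    print('%s %s %s' % (index, relation, term))
```

In a live session the string would come from `query.toXCQL()` instead of the hard-coded sample.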



To search the database with this query, call the `search` method on a database object. It returns a result set.

In [5]:
result_set = db.search(session, query)
result_set

<cheshire3.resultSet.SimpleResultSet at 0x7f954ca35c10>

## Handling the results

When using the chapter index, the result set is an iterable of results in which *each chapter* is one result. For the query above, there are thus 35 chapters that match the query.

In [6]:
print len(result_set)

35


Each result in the result set points to a record in the recordStore, which is where the actual chapter is stored.

In [7]:
for result in result_set:
    print result

recordStore/0
recordStore/2
recordStore/3
recordStore/4
recordStore/16
recordStore/21
recordStore/44
recordStore/92
recordStore/151
recordStore/157
recordStore/168
recordStore/170
recordStore/172
recordStore/207
recordStore/216
recordStore/221
recordStore/235
recordStore/285
recordStore/312
recordStore/345
recordStore/471
recordStore/477
recordStore/499
recordStore/539
recordStore/633
recordStore/649
recordStore/650
recordStore/659
recordStore/673
recordStore/689
recordStore/690
recordStore/719
recordStore/744
recordStore/745
recordStore/791


## Understanding the results

For each of these results a number of attributes can be accessed using the dot notation. The choices are:

* result.attributesToSerialize  
* result.id                     
* result.recordStore
* result.database
* result.diagnostic
* result.fetch_record
* result.proxInfo
* result.weight
* result.numericId
* result.resultSet
* result.occurences
* result.serialize
* result.scaledWeight 

In our current setup it seems that results are not weighted.

`proxInfo` is one of the most important attributes for our purposes.

It describes the proximity information for a hit in a particular record,
or in other words, where in a record the search string can be found.

We currently assume the following values:
* the first item is the id of the root element from which to start counting
  to find the word node: for instance, 0 for a chapter view (because the
  chapter is the root element), but 151 for a search in quotes text
* the second item in the deepest list (169, 171) is the id of the `<w>` (word) node
* the third item is the character offset: the exact character position
  (counting spaces and punctuation, which are stored in `<n>` (non-word) nodes)
  at which the search term starts
* the fourth item was assumed to be the total number of characters in the
  document; the outputs in this notebook, however, show the same value for
  every chapter matching a given term (15292 for "fog", 10255 for "dense"),
  so it more likely identifies the term than the document
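
To make the nested lists easier to read, the entries can be unpacked into named fields. This is only a sketch under the assumptions listed above: the field names (`root_id`, `word_id`, `char_offset`, `fourth`) are my own labels rather than Cheshire names, and the sample values are copied from the output for the first chapter. Entries with only three values, which some queries return, get `None` for the last field.

```python
from collections import namedtuple

# Field names are hypothetical labels for the four assumed positions.
Hit = namedtuple('Hit', ['root_id', 'word_id', 'char_offset', 'fourth'])


def unpack_prox_info(prox_info):
    """Turn proxInfo (a list of matches, each a list of entries) into Hit tuples."""
    matches = []
    for match in prox_info:
        words = []
        for entry in match:
            # Pad short entries so three-value entries can be unpacked too.
            padded = list(entry) + [None] * (4 - len(entry))
            words.append(Hit(*padded[:4]))
        matches.append(words)
    return matches


# First three hits for "fog" in the first chapter, copied from the output.
sample = [[[0, 169, 1033, 15292]], [[0, 171, 1049, 15292]], [[0, 206, 1241, 15292]]]

for match in unpack_prox_info(sample):
    first = match[0]
    print('word %d starts at character %d' % (first.word_id, first.char_offset))
```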

In [8]:
for result in result_set:
    print 'result.id: ', result.id
    print 'result.database: ', result.database
    print 'result.occurrences: ', result.occurences
    print 'result.proxInfo: ', result.proxInfo
    print "#########"

result.id:  0
result.database:  db_dickens
result.occurrences:  22
result.proxInfo:  [[[0, 169, 1033, 15292]], [[0, 171, 1049, 15292]], [[0, 206, 1241, 15292]], [[0, 216, 1295, 15292]], [[0, 247, 1471, 15292]], [[0, 183, 1112, 15292]], [[0, 211, 1267, 15292]], [[0, 223, 1344, 15292]], [[0, 237, 1415, 15292]], [[0, 264, 1574, 15292]], [[0, 283, 1671, 15292]], [[0, 312, 1836, 15292]], [[0, 314, 1846, 15292]], [[0, 336, 1955, 15292]], [[0, 392, 2248, 15292]], [[0, 433, 2499, 15292]], [[0, 449, 2586, 15292]], [[0, 556, 3190, 15292]], [[0, 727, 4181, 15292]], [[0, 2017, 11496, 15292]], [[0, 2365, 13596, 15292]], [[0, 2430, 13942, 15292]]]
#########
result.id:  2
result.database:  db_dickens
result.occurrences:  2
result.proxInfo:  [[[0, 5841, 30641, 15292]], [[0, 7479, 39482, 15292]]]
#########
result.id:  3
result.database:  db_dickens
result.occurrences:  3
result.proxInfo:  [[[0, 590, 3263, 15292]], [[0, 694, 3814, 15292]], [[0, 4848, 26290, 15292]]]
#########
result.id:  4
result.databa

In [9]:
for result in result_set:
    print result.attributesToSerialize

[('id', 0), ('numericId', None), ('recordStore', ''), ('database', ''), ('occurences', 0), ('weight', 0.5), ('scaledWeight', 0.5)]
[('id', 0), ('numericId', None), ('recordStore', ''), ('database', ''), ('occurences', 0), ('weight', 0.5), ('scaledWeight', 0.5)]
[('id', 0), ('numericId', None), ('recordStore', ''), ('database', ''), ('occurences', 0), ('weight', 0.5), ('scaledWeight', 0.5)]
[('id', 0), ('numericId', None), ('recordStore', ''), ('database', ''), ('occurences', 0), ('weight', 0.5), ('scaledWeight', 0.5)]
[('id', 0), ('numericId', None), ('recordStore', ''), ('database', ''), ('occurences', 0), ('weight', 0.5), ('scaledWeight', 0.5)]
[('id', 0), ('numericId', None), ('recordStore', ''), ('database', ''), ('occurences', 0), ('weight', 0.5), ('scaledWeight', 0.5)]
[('id', 0), ('numericId', None), ('recordStore', ''), ('database', ''), ('occurences', 0), ('weight', 0.5), ('scaledWeight', 0.5)]
[('id', 0), ('numericId', None), ('recordStore', ''), ('database', ''), ('occurence

From what I gather, a result in a resultSet is only a pointer to the document and not the document itself.
The latter needs to be fetched and is generally called a record.

Records have the following attributes (most of which seem irrelevant for our purposes and several of which
only return empty strings):

    rec.baseUri           rec.elementHash       rec.get_sax           rec.parent            rec.rights            
    rec.byteCount         rec.fetch_proxVector  rec.get_xml           rec.processHistory    rec.sax               
    rec.context           rec.fetch_vector      rec.history           rec.process_xpath     rec.size              
    rec.digest            rec.filename          rec.id                rec.recordStore       rec.status            
    rec.dom               rec.get_dom           rec.metadata          rec.resultSetItem     rec.tagName  
    rec.wordCount         rec.xml

In [10]:
for result in result_set:
    rec = result.fetch_record(session)
    print type(rec), rec

<class 'cheshire3.record.LxmlRecord'> recordStore/0
<class 'cheshire3.record.LxmlRecord'> recordStore/2
<class 'cheshire3.record.LxmlRecord'> recordStore/3
<class 'cheshire3.record.LxmlRecord'> recordStore/4
<class 'cheshire3.record.LxmlRecord'> recordStore/16
<class 'cheshire3.record.LxmlRecord'> recordStore/21
<class 'cheshire3.record.LxmlRecord'> recordStore/44
<class 'cheshire3.record.LxmlRecord'> recordStore/92
<class 'cheshire3.record.LxmlRecord'> recordStore/151
<class 'cheshire3.record.LxmlRecord'> recordStore/157
<class 'cheshire3.record.LxmlRecord'> recordStore/168
<class 'cheshire3.record.LxmlRecord'> recordStore/170
<class 'cheshire3.record.LxmlRecord'> recordStore/172
<class 'cheshire3.record.LxmlRecord'> recordStore/207
<class 'cheshire3.record.LxmlRecord'> recordStore/216
<class 'cheshire3.record.LxmlRecord'> recordStore/221
<class 'cheshire3.record.LxmlRecord'> recordStore/235
<class 'cheshire3.record.LxmlRecord'> recordStore/285
<class 'cheshire3.record.LxmlRecord'> re

The `get_dom(session)` method returns the record as parsed XML.
This is essential for our purposes.

In [11]:
for result in result_set:
    rec = result.fetch_record(session)
    print "rec.id: ", rec.id
    print 'rec.wordCount: ', rec.wordCount
    print 'rec.get_dom(session): ', rec.get_dom(session)
    print "#######"

rec.id:  0
rec.wordCount:  5008
rec.get_dom(session):  <Element div at 0x7f954c15af38>
#######
rec.id:  2
rec.wordCount:  14902
rec.get_dom(session):  <Element div at 0x7f954c15a9e0>
#######
rec.id:  3
rec.wordCount:  9279
rec.get_dom(session):  <Element div at 0x7f954c15ad88>
#######
rec.id:  4
rec.wordCount:  10716
rec.get_dom(session):  <Element div at 0x7f954c15a8c0>
#######
rec.id:  16
rec.wordCount:  10690
rec.get_dom(session):  <Element div at 0x7f954c15abd8>
#######
rec.id:  21
rec.wordCount:  9765
rec.get_dom(session):  <Element div at 0x7f954c15ac20>
#######
rec.id:  44
rec.wordCount:  10135
rec.get_dom(session):  <Element div at 0x7f954c15a440>
#######
rec.id:  92
rec.wordCount:  3524
rec.get_dom(session):  <Element div at 0x7f954c15a878>
#######
rec.id:  151
rec.wordCount:  12178
rec.get_dom(session):  <Element div at 0x7f954c15ab48>
#######
rec.id:  157
rec.wordCount:  9454
rec.get_dom(session):  <Element div at 0x7f954c15a0e0>
#######
rec.id:  168
rec.wordCount:  6952
rec

In [12]:
result_set.attributesToSerialize

[('id', ''),
 ('termid', -1),
 ('totalOccs', 0),
 ('totalRecs', 0),
 ('expires', 0),
 ('queryTerm', ''),
 ('queryFreq', 0),
 ('queryPositions', []),
 ('relevancy', 0),
 ('maxWeight', 0),
 ('minWeight', 0),
 ('termWeight', 0.0),
 ('recordStore', ''),
 ('recordStoreSizes', 0),
 ('index', None),
 ('queryTime', 0.0),
 ('query', '')]

In [13]:
result.attributesToSerialize

[('id', 0),
 ('numericId', None),
 ('recordStore', ''),
 ('database', ''),
 ('occurences', 0),
 ('weight', 0.5),
 ('scaledWeight', 0.5)]

In [14]:
for result in result_set:
    print result.serialize(session)

<item><d n="recordStore" t="str">recordStore</d><d n="database" t="str">db_dickens</d><d n="occurences" t="int">22</d><proxInfo><hit><w e="0" w="169" o="1033" t="15292"/></hit><hit><w e="0" w="171" o="1049" t="15292"/></hit><hit><w e="0" w="206" o="1241" t="15292"/></hit><hit><w e="0" w="216" o="1295" t="15292"/></hit><hit><w e="0" w="247" o="1471" t="15292"/></hit><hit><w e="0" w="183" o="1112" t="15292"/></hit><hit><w e="0" w="211" o="1267" t="15292"/></hit><hit><w e="0" w="223" o="1344" t="15292"/></hit><hit><w e="0" w="237" o="1415" t="15292"/></hit><hit><w e="0" w="264" o="1574" t="15292"/></hit><hit><w e="0" w="283" o="1671" t="15292"/></hit><hit><w e="0" w="312" o="1836" t="15292"/></hit><hit><w e="0" w="314" o="1846" t="15292"/></hit><hit><w e="0" w="336" o="1955" t="15292"/></hit><hit><w e="0" w="392" o="2248" t="15292"/></hit><hit><w e="0" w="433" o="2499" t="15292"/></hit><hit><w e="0" w="449" o="2586" t="15292"/></hit><hit><w e="0" w="556" o="3190" t="15292"/></hit><hit><w 

A record can be serialized to raw XML (in order to understand it), using
a function from lxml:

In [15]:
from lxml import etree
# `rec` is the last record fetched in the loop above; serialize its parsed DOM
rec_tostring = etree.tostring(rec.get_dom(session))
print rec_tostring

The raw string can also be used in simple Python string manipulations,
for instance to highlight something in a chapter, or to build
a concordance based on the raw string rather than an XML tree.

In that case, note that each occurrence of a term appears twice in the raw
string, because it is present in `<txt>` and in its own word node.

In [16]:
# find the first occurrence of the term love
# because that is what we are all looking for
love = rec_tostring.find('love')
if love != -1:  # find() returns -1 when the term is absent
    start = max(love - 50, 0)  # avoid a negative slice index near the start
    conc_line = rec_tostring[start : love + len('love') + 50]
    print conc_line.replace('love', 'LOVE')
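
The same string-based idea can be extended from the first occurrence to every occurrence. The sketch below is an illustration on plain text (two sentences from the chapter shown later in this notebook), not on the raw XML; the function name `kwic_lines` and the `width` parameter are hypothetical. It matches substrings, so "sun" would also match "sunny", and on raw XML every hit would show up twice for the reason given above.

```python
def kwic_lines(text, term, width=20):
    """Return (left, term, right) context tuples for every occurrence of term."""
    lines = []
    start = text.find(term)
    while start != -1:
        end = start + len(term)
        left = text[max(start - width, 0):start]   # clamp at the start of the text
        right = text[end:end + width]
        lines.append((left, term, right))
        start = text.find(term, end)                # continue after this hit
    return lines


sample = ("Thirty years ago, Marseilles lay burning in the sun, one day. "
          "A blazing sun upon a fierce August day was no greater rarity "
          "in southern France then.")

for left, term, right in kwic_lines(sample, 'sun'):
    print('%s[%s]%s' % (left, term.upper(), right))
```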

## Transforming a result

Rather than handling the XML manually like this, Cheshire has a class called a
Transformer that can perform XSL transformations on the XML of a chapter.

Transformers are defined in a configuration file. In our project they live in an
XSL file.

The following examples use a transformer that was not designed to work with our input, 
but they do illustrate how transformers can be invoked.

In [17]:
kwicTransformer = db.get_object(session, 'kwic-Txr')

In [18]:
print kwicTransformer

<cheshire3.transformer.LxmlXsltTransformer object at 0x7f954c1b3510>


In [19]:
doc = kwicTransformer.process_record(session, rec).get_raw(session)

In [66]:
print doc

<?xml version="1.0"?>
<div class="chapterDiv"><span>ID: LD.1</span><h3>Chapter 1</h3>
CHAPTER 1 Sun and Shadow

<p><span><span onclick="getCFP('Thirty')">Thirty</span> <span onclick="getCFP('years')">years</span> <span onclick="getCFP('ago')">ago</span>, <span onclick="getCFP('Marseilles')">Marseilles</span> <span onclick="getCFP('lay')">lay</span> <span onclick="getCFP('burning')">burning</span> <span onclick="getCFP('in')">in</span> <span onclick="getCFP('the')">the</span> <span onclick="getCFP('sun')">sun</span>, <span onclick="getCFP('one')">one</span> <span onclick="getCFP('day')">day</span>.</span></p>

<p><span><span onclick="getCFP('A')">A</span> <span onclick="getCFP('blazing')">blazing</span> <span onclick="getCFP('sun')">sun</span> <span onclick="getCFP('upon')">upon</span> <span onclick="getCFP('a')">a</span> <span onclick="getCFP('fierce')">fierce</span> <span onclick="getCFP('August')">August</span> <span onclick="getCFP('day')">day</span> <span onclick="getCFP('was')">wa

In [67]:
from cheshire3.transformer import XmlTransformer

In [74]:
dctxr = db.get_object(session, 'kwic-Txr')

In [76]:
dctxr

<cheshire3.transformer.LxmlXsltTransformer at 0x7f954c1b3510>

In [77]:
# `rec` is the record fetched earlier with result.fetch_record(session)
doc = dctxr.process_record(session, rec)
print doc.get_raw(session)[:1000]

## Retrieving a chapter

In [21]:
query = qf.get_query(session, 'c3.book-idx = "LD"')
result_set = db.search(session, query)

In [22]:
chapter_1 = result_set[0]
chapter_44 = result_set[43]

In [23]:
chapter_1

Ptr:recordStore/394

In [24]:
rec = chapter_1.fetch_record(session).get_dom(session)
print rec

<Element div at 0x7f954c02e5f0>


In [25]:
rec.attrib

{'book': 'LD', 'type': 'chapter', 'id': 'LD.1', 'num': '1'}

In [26]:
rec.attrib['id']

'LD.1'

In [27]:
type(rec)

lxml.etree._Element

In [28]:
print rec

<Element div at 0x7f954c02e5f0>


In [29]:
doc = kwicTransformer.process_record(session, chapter_1.fetch_record(session)).get_raw(session)

In [30]:
print doc


CHAPTER 1 Sun and Shadow

Thirty years ago, Marseilles lay burning in the sun, one day.Thirty  years  ago , Marseilles  lay  burning  in  the  sun , one  day .

A blazing sun upon a fierce August day was no greater rarity in southern France then, than at any other time, before or since.A  blazing  sun  upon  a  fierce  August  day  was  no  greater  rarity  in  southern  France  then , than  at  any  other  time , before  or  since .Everything in Marseilles, and about Marseilles, had stared at the fervid sky, and been stared at in return, until a staring habit had become universal there.Everything  in  Marseilles , and  about  Marseilles , had  stared  at  the  fervid  sky , and  been  stared  at  in  return , until  a  staring  habit  had  become  universal  there .Strangers were stared out of countenance by staring white houses, staring white walls, staring white streets, staring tracts of arid road, staring hills from which verdure was burnt away.Strangers  were  stared  out  of  c

In [31]:
articleTransformer = db.get_object(session, 'article-Txr')

In [32]:
doc = articleTransformer.process_record(session, chapter_1.fetch_record(session)).get_raw(session)

In [33]:
print doc

<?xml version="1.0"?>
<div class="chapterDiv"><span>ID: LD.1</span><h3>Chapter 1</h3>
CHAPTER 1 Sun and Shadow

<p><span><span onclick="getCFP('Thirty')">Thirty</span> <span onclick="getCFP('years')">years</span> <span onclick="getCFP('ago')">ago</span>, <span onclick="getCFP('Marseilles')">Marseilles</span> <span onclick="getCFP('lay')">lay</span> <span onclick="getCFP('burning')">burning</span> <span onclick="getCFP('in')">in</span> <span onclick="getCFP('the')">the</span> <span onclick="getCFP('sun')">sun</span>, <span onclick="getCFP('one')">one</span> <span onclick="getCFP('day')">day</span>.</span></p>

<p><span><span onclick="getCFP('A')">A</span> <span onclick="getCFP('blazing')">blazing</span> <span onclick="getCFP('sun')">sun</span> <span onclick="getCFP('upon')">upon</span> <span onclick="getCFP('a')">a</span> <span onclick="getCFP('fierce')">fierce</span> <span onclick="getCFP('August')">August</span> <span onclick="getCFP('day')">day</span> <span onclick="getCFP('was')">wa

In [36]:
#FIXME How can you query for a single chapter directly,
# rather than getting all chapters of a book first?
# --> you need to build a better index for this
# NOTE: the attempt below is rejected by the parser as a malformed query
query = qf.get_query(session, 'c3.book-idx "LD" and div.id = "LD.1"')

Diagnostic: info:srw/diagnostic/1/10 [Malformed Query]: Unprocessed tokens remain: u'div.id'

In [39]:
result_set = db.search(session, query)

In [40]:
len(result_set)

70

In [41]:
#TODO if recordStore ids are unique AND they represent chapters, it could also be possible to simply
# get a particular record from the recordStore (without querying the database again).


## Searching in a specific book

In [42]:
query = qf.get_query(session, 'c3.subcorpus-idx = "dickens" and c3.chapter-idx = "fog" and c3.book-idx = "BH"')
result_set = db.search(session, query)
len(result_set)

7

## Messing around

In [43]:
query = qf.get_query(session, 'c3.subcorpus-idx = "dickens" \
                            and/cql.proxinfo c3.chapter-idx = "dense fog" \
                            ') #and c3.chapter-idx = "dense"')

In [44]:
rs = db.search(session, query)

In [45]:
len(rs)

3

In [46]:
for result in rs:
    print result.proxInfo
    #FIXME it seems that occurences cannot be trusted?
    print result.occurences

[[[0, 391, 2242, 10255], [0, 392, 2248, 15292]]]
22
[[[0, 1075, 5840, 10255], [0, 1076, 5846, 15292]]]
1
[[[0, 2640, 14808, 10255], [0, 2641, 14814, 15292]]]
2
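
One way to look into the FIXME above is to compare `occurences` with the number of matches actually listed in `proxInfo`. The sketch below does this on the three results printed above, assuming that each top-level item in `proxInfo` is one match; the helper name `count_matches` is hypothetical. The 22 reported for the first result coincides with the number of "fog" hits printed earlier for the chapter that contains word 392, which may mean that `occurences` is carried over from one of the sub-queries, but that is a guess.

```python
# (proxInfo, occurences) pairs copied from the output above.
printed = [
    ([[[0, 391, 2242, 10255], [0, 392, 2248, 15292]]], 22),
    ([[[0, 1075, 5840, 10255], [0, 1076, 5846, 15292]]], 1),
    ([[[0, 2640, 14808, 10255], [0, 2641, 14814, 15292]]], 2),
]


def count_matches(prox_info):
    """Assumption: every top-level item in proxInfo is one match."""
    return len(prox_info)


# Collect (matches in proxInfo, reported occurences) wherever they disagree.
mismatches = [(count_matches(prox), occ)
              for prox, occ in printed
              if count_matches(prox) != occ]

print('total matches: %d' % sum(count_matches(prox) for prox, _ in printed))
print('mismatches: %r' % (mismatches,))
```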


In [47]:
query = qf.get_query(session, 'c3.subcorpus-idx = "dickens" \
                            and/cql.proxinfo c3.chapter-idx = "the" \
                            ')

In [48]:
query.addPrefix(query, 'test')

In [49]:
query.toCQL()

u'(><cheshire3.cqlParser.Triple instance at 0x7f953e94ccf8>="test" c3.subcorpus-idx = "dickens" and/cql.proxinfo c3.chapter-idx = "the")'

Phrase search

In [50]:
query = qf.get_query(session, 'c3.subcorpus-idx = "dickens" \
                            and/proxinfo c3.chapter-idx = "dense fog" \
                            ')

In [51]:
rs = db.search(session, query)

In [52]:
total = 0
for result in rs:
    total += len(result.proxInfo)

In [53]:
total

3

And search

In [54]:
query = qf.get_query(session, 'c3.subcorpus-idx = "dickens" \
                            and/cql.proxinfo c3.chapter-idx = "fog" \
                            and c3.chapter-idx = "dense"')

In [55]:
rs = db.search(session, query)

In [56]:
len(rs)

8

Or search

In [57]:
query = qf.get_query(session, 'c3.subcorpus-idx = "dickens" \
                            and/cql.proxinfo c3.chapter-idx = "fog" \
                            or c3.chapter-idx = "dense"')
rs = db.search(session, query)
len(rs)

112

In [58]:
query = qf.get_query(session, 'c3.subcorpus-idx = "dickens" \
                               and c3.book-idx = "LD"')
rs = db.search(session, query)
len(rs)

70

In [59]:
query = qf.get_query(session, 'c3.subcorpus-idx = "dickens" \
                               and c3.chapter-idx = "he" prox/distance=1/unordered c3.chapter-idx = "said" \
                               or c3.chapter-idx = "did" or c3.chapter-idx = "wanted"')
rs = db.search(session, query)
len(rs)

2035
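
As far as I understand the CQL specification, all boolean operators (including `prox`) have the same precedence and are evaluated left to right. The query above would then be read as `(((subcorpus and "he") prox "said") or "did") or "wanted"`, which may explain the large result count. Explicit parentheses make the intended grouping unambiguous. The helpers below (`clause` and `group`, both hypothetical names) are a small sketch for composing such strings; the result has not been tested against the database, and terms containing double quotes are not escaped.

```python
def clause(index, term, relation='='):
    """Return a single CQL search clause such as: c3.chapter-idx = "fog"."""
    return '%s %s "%s"' % (index, relation, term)


def group(operator, *parts):
    """Join clauses with one boolean operator and wrap them in parentheses."""
    return '(' + (' %s ' % operator).join(parts) + ')'


# "he" within one word of any of the three verbs, restricted to Dickens.
verbs = group('or',
              clause('c3.chapter-idx', 'said'),
              clause('c3.chapter-idx', 'did'),
              clause('c3.chapter-idx', 'wanted'))
he_verb = group('prox/distance=1/unordered',
                clause('c3.chapter-idx', 'he'),
                verbs)
cql = group('and', clause('c3.subcorpus-idx', 'dickens'), he_verb)

print(cql)
```

The resulting string would be passed to `qf.get_query(session, cql)` in the same way as the hand-written queries.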

In [60]:
#TODO not
#TODO wildcards

In [61]:
query = qf.get_query(session, 'c3.subcorpus-idx = "dickens" \
                               and c3.chapter-idx window/distance<5/unordered "low voice"')
rs = db.search(session, query)
len(rs)

225

In [62]:
for result in rs:
    print result.proxInfo

[[[0, 1192, 5995, 23235], [0, 1193, 5999, 42329]], [[0, 7112, 37499, 23235], [0, 7113, 37503, 42329]], [[0, 7147, 37693, 23235], [0, 7148, 37697, 42329]]]
[[[0, 2863, 15852, 23235], [0, 2864, 15856, 42329]]]
[[[0, 4664, 25227, 23235], [0, 4665, 25231, 42329]]]
[[[0, 166, 890, 23235], [0, 167, 894, 42329]]]
[[[0, 6983, 37063, 23235], [0, 6984, 37067, 42329]], [[0, 7198, 38186, 23235], [0, 7199, 38190, 42329]]]
[[[0, 324, 1866, 23235], [0, 321, 1849, 42329]]]
[[[0, 1300, 6921, 23235], [0, 1301, 6925, 42329]]]
[[[0, 4939, 26070, 23235], [0, 4941, 26084, 42329]]]
[[[0, 2033, 10839, 23235], [0, 2034, 10843, 42329]]]
[[[0, 371, 2046, 23235], [0, 372, 2050, 42329]]]
[[[0, 2869, 15413, 23235], [0, 2870, 15417, 42329]]]
[[[0, 5276, 28599, 23235], [0, 5278, 28613, 42329]]]
[[[0, 2860, 15866, 23235], [0, 2861, 15870, 42329]]]
[[[0, 3341, 18932, 23235], [0, 3342, 18936, 42329]]]
[[[0, 1318, 7394, 23235], [0, 1319, 7398, 42329]]]
[[[0, 219, 1132, 23235], [0, 220, 1136, 42329]]]
[[[0, 2683, 14283, 2
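
For window searches, each match lists one entry per query word, so the distance between the words and their order can be read off the second value of each entry. A minimal sketch, assuming that second value is the word position; the function names are hypothetical and the sample matches are copied from the output above.

```python
def word_span(match):
    """Distance in word positions between the first and last word of a match."""
    word_ids = [entry[1] for entry in match]
    return max(word_ids) - min(word_ids)


def is_in_query_order(match):
    """True when the words occur in the text in the order given in the query."""
    word_ids = [entry[1] for entry in match]
    return word_ids == sorted(word_ids)


# Three matches for "low voice" copied from the output above.
sample = [
    [[0, 1192, 5995, 23235], [0, 1193, 5999, 42329]],
    [[0, 324, 1866, 23235], [0, 321, 1849, 42329]],
    [[0, 4939, 26070, 23235], [0, 4941, 26084, 42329]],
]

for match in sample:
    print('span %d, in query order: %s' % (word_span(match), is_in_query_order(match)))
```

The second sample match has the words in reverse order ("voice" before "low"), which is what `unordered` permits.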

In [63]:
query = qf.get_query(session, 'c3.subcorpus-idx = "dickens" \
                               and c3.chapter-idx window/distance<5/unordered "voice low"')
rs = db.search(session, query)
len(rs)

225

In [64]:
query = qf.get_query(session, 'c3.subcorpus-idx = "dickens" \
                               and c3.chapter-idx window/distance<5/unordered "low high"')
rs = db.search(session, query)
len(rs)

23

In [78]:
query = qf.get_query(session, 'c3.subcorpus-idx = "dickens" \
                               and c3.chapter-idx window/distance<3 "Mr Arthur said"')
rs = db.search(session, query)
len(rs)

2

## Proximity Information

In [80]:
query = qf.get_query(session, '(c3.subcorpus-idx all "dickens" and/cql.proxinfo c3.chapter-idx any "dense fog")')
result_set = db.search(session, query)
count = 0
for result in result_set:
    record = result.fetch_record(session)
    print result.occurences, record #wordCount #.process_xpath('//w[@o=%s]' % result.proxInfo[0][1])
    for y in result.proxInfo:
        print y
        count += 1

#TODO why does proxinfo only have three values here?
# --> because the last any does not have a proxinfo value

1 recordStore/0
[[0, 391, 2242]]
1 recordStore/2
[[0, 5811, 30485]]
1 recordStore/3
[[0, 593, 3275]]
1 recordStore/4
[[0, 8, 47]]
1 recordStore/5
[[0, 4555, 25000]]
1 recordStore/16
[[0, 1988, 11147]]
1 recordStore/21
[[0, 39, 239]]
1 recordStore/39
[[0, 743, 4219]]
2 recordStore/44
[[0, 1756, 9305]]
[[0, 1847, 9807]]
1 recordStore/53
[[0, 4400, 24578]]
1 recordStore/56
[[0, 1085, 5709]]
1 recordStore/92
[[0, 1075, 5840]]
1 recordStore/97
[[0, 4203, 22964]]
1 recordStore/99
[[0, 40, 209]]
1 recordStore/114
[[0, 2971, 16432]]
1 recordStore/115
[[0, 1434, 7912]]
1 recordStore/116
[[0, 2011, 11393]]
1 recordStore/122
[[0, 915, 5058]]
1 recordStore/129
[[0, 3411, 19097]]
1 recordStore/131
[[0, 1543, 8395]]
1 recordStore/132
[[0, 737, 4115]]
1 recordStore/133
[[0, 2143, 12329]]
1 recordStore/141
[[0, 427, 2451]]
1 recordStore/145
[[0, 2033, 10924]]
1 recordStore/151
[[0, 1851, 9818]]
1 recordStore/157
[[0, 170, 914]]
1 recordStore/168
[[0, 149, 802]]
1 recordStore/170
[[0, 990, 5285]]
1 rec

## Term highlighting

In [82]:
from cheshire3.transformer import LxmlQueryTermHighlightingTransformer