# Solr Search Operations  

This is a continuation of `04-indexing-custom-docs.ipynb`. I strongly suggest you familiarize yourself with that notebook before proceeding any further!

In [8]:
import simplejson as json
from requests import request

# define Solr instance resources
base_url = 'http://localhost:8983'
core_name = 'localDocs'
# define important paths
api_endpoint = f'{base_url}/api/cores/{core_name}' # note that we are using API V2
query_endpoint = f'{api_endpoint}/query'
# set http header content
headers = {
    'Content-type':'application/json'
}

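def handle_request(endpoint=query_endpoint, method="GET", headers=headers, body=None):
    # send `body` as the JSON payload; default to an empty object (avoids a mutable default argument)
    r = request(method, endpoint, headers=headers, json={} if body is None else body)
    # parse the JSON response text into a Python dict
    return json.loads(r.text)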

In [2]:
# The most common and transparent (easy but dirty) way of firing a GET request is to append query parameters to the endpoint URL
print(json.dumps(handle_request(f'{query_endpoint}?q=author:Yvone Vera'), indent=2))

{
  "responseHeader": {
    "status": 0,
    "QTime": 24,
    "params": {
      "q": "author:Yvone Vera",
      "json": "{}"
    }
  },
  "response": {
    "numFound": 3,
    "start": 0,
    "numFoundExact": true,
    "docs": [
      {
        "id": "1",
        "title": "Opening Spaces: An Anthology of Contemporary African Women's Writing",
        "author": [
          "Yvonne Vera"
        ],
        "author_bio": [
          "EDITOR Yvonne Vera was born and raised in Bulawayo, Zimbabwe, gained her Ph.D. from York University in Canada, and was the Director of the National Gallery of Zimbabwe in Bulawayo. Yvonne Vera died at age 40 in 2005 Yvonne Vera\u2019s Without a Name and Under the Tongue both won first prize in the Zimbabwe Publishers Literary Awards of 1995 and 1997 respectively. Under the Tongue won the 1997 Commonwealth Writers Prize (Africa Region). Yvonne Vera won the Swedish literary award The Voice of Africa 1999. \n"
        ],
        "authors": [
          "Yvonne Ver

In [3]:
# But there is a neat, maintainable alternative: send the query in the JSON request body.
# We will adopt this strategy as we explore our search engine

query = {
    'query':'author:Yvonne Vera'
}

print(json.dumps(handle_request(body=query), indent=2))

{
  "responseHeader": {
    "status": 0,
    "QTime": 6,
    "params": {
      "json": "{\"query\": \"author:Yvonne Vera\"}"
    }
  },
  "response": {
    "numFound": 3,
    "start": 0,
    "numFoundExact": true,
    "docs": [
      {
        "id": "1",
        "title": "Opening Spaces: An Anthology of Contemporary African Women's Writing",
        "author": [
          "Yvonne Vera"
        ],
        "author_bio": [
          "EDITOR Yvonne Vera was born and raised in Bulawayo, Zimbabwe, gained her Ph.D. from York University in Canada, and was the Director of the National Gallery of Zimbabwe in Bulawayo. Yvonne Vera died at age 40 in 2005 Yvonne Vera\u2019s Without a Name and Under the Tongue both won first prize in the Zimbabwe Publishers Literary Awards of 1995 and 1997 respectively. Under the Tongue won the 1997 Commonwealth Writers Prize (Africa Region). Yvonne Vera won the Swedish literary award The Voice of Africa 1999. \n"
        ],
        "authors": [
          "Yvonne Ver

### NOTE: Our Schema fields definition has a direct impact on search results returned  

For instance, take field `title` of type `text_en` and `author` of type `string`. 

#### `string` field definition  

In [6]:
print(json.dumps(handle_request(method='GET', endpoint=f'{api_endpoint}/schema/fieldtypes/string'), indent=2))

{
  "responseHeader": {
    "status": 0,
    "QTime": 0
  },
  "fieldType": {
    "name": "string",
    "class": "solr.StrField",
    "sortMissingLast": true,
    "docValues": true
  }
}


#### `text_en` field definition  

In [7]:
print(json.dumps(handle_request(method='GET', endpoint=f'{api_endpoint}/schema/fieldtypes/text_en'), indent=2))

{
  "responseHeader": {
    "status": 0,
    "QTime": 2
  },
  "fieldType": {
    "name": "text_en",
    "class": "solr.TextField",
    "positionIncrementGap": "100",
    "indexAnalyzer": {
      "tokenizer": {
        "class": "solr.StandardTokenizerFactory"
      },
      "filters": [
        {
          "class": "solr.StopFilterFactory",
          "words": "lang/stopwords_en.txt",
          "ignoreCase": "true"
        },
        {
          "class": "solr.LowerCaseFilterFactory"
        },
        {
          "class": "solr.EnglishPossessiveFilterFactory"
        },
        {
          "class": "solr.KeywordMarkerFilterFactory",
          "protected": "protwords.txt"
        },
        {
          "class": "solr.PorterStemFilterFactory"
        }
      ]
    },
    "queryAnalyzer": {
      "tokenizer": {
        "class": "solr.StandardTokenizerFactory"
      },
      "filters": [
        {
          "class": "solr.SynonymGraphFilterFactory",
          "expand": "true",
          "i

While the `text_en` field type defines **index and query** `tokenizers` and `filters`, the `string` field type defines neither. A `string` value is therefore indexed verbatim as a single token, which means the `author` field must be matched in whole; otherwise the query returns no results.  

Because the `title` field is analysed, unlike the `author` field, one can search for individual words within a `title`.  
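
One caveat when matching a `string` field in whole: under Solr's standard query parser, an unquoted `author:Yvonne Vera` applies the `author:` prefix to `Yvonne` only, and `Vera` is searched in the default field. Quoting the value keeps it together as one term. The helper below is a minimal sketch of that idea; `exact_match` is a hypothetical name introduced here, not part of the original notebook.

```python
def exact_match(field, value):
    """Build a field query whose value is quoted so it is treated as one term.

    Hypothetical helper: it escapes backslashes and double quotes inside the
    value, then wraps the value in double quotes.
    """
    escaped = value.replace('\\', '\\\\').replace('"', '\\"')
    return f'{field}:"{escaped}"'


query = {'query': exact_match('author', 'Yvonne Vera')}
print(query)
# With the Solr core from this notebook running, the query could be sent with:
# print(json.dumps(handle_request(body=query), indent=2))
```

The same helper would work for any other `string` field whose values contain spaces.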


In [12]:
print(json.dumps(handle_request(body={'query':'author:Yvonne'}), indent=2))

{
  "responseHeader": {
    "status": 0,
    "QTime": 0,
    "params": {
      "json": "{\"query\": \"author:Yvonne\"}"
    }
  },
  "response": {
    "numFound": 0,
    "start": 0,
    "numFoundExact": true,
    "docs": []
  }
}


In [11]:
print(json.dumps(handle_request(body={'query':'title:Autobiography of America'}), indent=2))

{
  "responseHeader": {
    "status": 0,
    "QTime": 2,
    "params": {
      "json": "{\"query\": \"title:Autobiography of America\"}"
    }
  },
  "response": {
    "numFound": 492,
    "start": 0,
    "numFoundExact": true,
    "docs": [
      {
        "id": "198",
        "title": "Written by Herself: Autobiographies of American Women",
        "author": [
          "Jill Ker Conway"
        ],
        "authors": [
          "Jill Ker Conway"
        ],
        "title_slug": [
          "written-by-herself"
        ],
        "author_slug": [
          "jill-ker-conway"
        ],
        "isbn13": [
          9780679736332
        ],
        "isbn": "0679736336",
        "price": 15.99,
        "format": [
          "Paperback"
        ],
        "publisher": "Knopf Doubleday Publishing Group",
        "publication_date": "1992-11-01T00:00:00Z",
        "edition": [
          "1st ed"
        ],
        "subjects": [
          "Literary Figures - Women's Biography, Historical Bi

Notice we got 0 results for `author:Yvonne` but 492 results for `title:Autobiography of America`. Interestingly, the first result, **"Written by Herself: Autobiographies of American Women"**, matched our query terms despite not being an exact match, and several other results behave the same way. This is the power of `tokenizers` and `filters`: the `text_en` analysis chain lowercases terms, drops stop words, and stems what remains, so `Autobiographies` and `Autobiography` reduce to the same indexed token. More on how analysis works, and how to adjust your schema so it is reflected correctly during text analysis, can be found [here](https://solr.apache.org/guide/8_8/understanding-analyzers-tokenizers-and-filters.html).  
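
To make the effect of analysis more concrete, here is a toy sketch in plain Python. It is **not** Solr's `text_en` chain and not the Porter stemmer: the stop word list and the two suffix rules are simplifications assumed purely for illustration, and `toy_analyze` is a hypothetical name.

```python
import re

# Tiny illustrative stop word list (assumption); Solr reads its own list
# from lang/stopwords_en.txt.
STOPWORDS = {"a", "an", "and", "by", "of", "the"}


def toy_analyze(text):
    """Rough imitation of an analysis chain: tokenize, lowercase, drop stop
    words, strip possessives, then apply two crude stemming rules."""
    tokens = []
    for token in re.findall(r"[A-Za-z0-9']+", text):
        token = token.lower()
        if token in STOPWORDS:
            continue
        if token.endswith("'s"):
            token = token[:-2]            # drop the possessive suffix
        if token.endswith("ies"):
            token = token[:-2]            # "autobiographies" -> "autobiographi"
        elif token.endswith("y") and len(token) > 2:
            token = token[:-1] + "i"      # "autobiography" -> "autobiographi"
        tokens.append(token)
    return tokens


query_terms = toy_analyze("Autobiography of America")
title_terms = toy_analyze("Written by Herself: Autobiographies of American Women")
print(query_terms)
print(title_terms)
print(set(query_terms) & set(title_terms))  # terms shared by query and title
```

In this sketch the query and the title share one analysed term, which is enough for a match even though the original strings differ.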

For this lab, we will explore the Solr search API without regard to the schema definition. In the next lab, we will **combine lab 03 and lab 04** to `re-index` our documents and perform `relevant` queries.  

Let's narrow down the query above to answer the question: **Give me `Autobiography of America` documents in `Hardcover` format that are `inStock`, with `price` not exceeding $10 and at most 200 pages, sorted by `price` in ascending order.**

In [32]:
query = {
    'query':'Autobiography of America',
    'params':{
        'sort':'price asc, pages asc',
        'fq':[
            'format:Hardcover',
            'inStock:true',
            'price:[* TO 10]',
            'pages:[* TO 200]'
        ]
    }
}
print(json.dumps(handle_request(body=query), indent=2))

{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "json": "{\"query\": \"Autobiography of America\", \"params\": {\"sort\": \"price asc, pages asc\", \"fq\": [\"format:Hardcover\", \"inStock:true\", \"price:[* TO 10]\", \"pages:[* TO 200]\"]}}"
    }
  },
  "response": {
    "numFound": 0,
    "start": 0,
    "numFoundExact": true,
    "docs": []
  }
}


Surprisingly, the query returns zero documents! What could have happened? Perhaps we are looking for a very cheap item that doesn't exist, or perhaps the matching documents are out of stock.  

Let's re-evaluate our query without limiting to documents inStock.  
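
A more systematic way to find out which filter is eliminating the results is to drop the filters one at a time and compare the hit counts. The sketch below only builds the relaxed request bodies; `drop_each_filter` is a hypothetical helper, and the commented-out call assumes the `handle_request` function and the running Solr core from earlier in this notebook.

```python
def drop_each_filter(query):
    """Yield (dropped_filter, relaxed_query) pairs, removing one fq at a time.

    The original query dict is left untouched.
    """
    filters = query.get('params', {}).get('fq', [])
    for i, dropped in enumerate(filters):
        relaxed = {
            **query,
            'params': {**query['params'], 'fq': filters[:i] + filters[i + 1:]},
        }
        yield dropped, relaxed


strict = {
    'query': 'Autobiography of America',
    'params': {
        'sort': 'price asc, pages asc',
        'fq': [
            'format:Hardcover',
            'inStock:true',
            'price:[* TO 10]',
            'pages:[* TO 200]',
        ],
    },
}

for dropped, relaxed in drop_each_filter(strict):
    print(f'without {dropped!r}:', relaxed['params']['fq'])
    # With Solr running, the hit count for each variant could be read with:
    # print(handle_request(body=relaxed)['response']['numFound'])
```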



In [33]:
query = {
    'query':'Autobiography of America',
    'params':{
        'sort':'price asc, pages asc',
        'fq':[
            'format:Hardcover',
            'price:[* TO 10]',
            'pages:[* TO 200]'
        ]
    }
}
print(json.dumps(handle_request(body=query), indent=2))

{
  "responseHeader": {
    "status": 0,
    "QTime": 1,
    "params": {
      "json": "{\"query\": \"Autobiography of America\", \"params\": {\"sort\": \"price asc, pages asc\", \"fq\": [\"format:Hardcover\", \"price:[* TO 10]\", \"pages:[* TO 200]\"]}}"
    }
  },
  "response": {
    "numFound": 3,
    "start": 0,
    "numFoundExact": true,
    "docs": [
      {
        "id": "474",
        "title": "Out on the Porch: An Evocation in Words and Pictures",
        "author": [
          "Reynolds Price"
        ],
        "authors": [
          "Reynolds Price, Clifton Dowell (Editor), Reynolds Price"
        ],
        "title_slug": [
          "out-on-the-porch"
        ],
        "author_slug": [
          "reynolds-price"
        ],
        "isbn13": [
          9780945575931
        ],
        "isbn": "0945575939",
        "price": 1.99,
        "format": [
          "Hardcover"
        ],
        "publisher": "Algonquin Books of Chapel Hill",
        "publication_date": "1992-01-0

Now we have three documents returned. This confirms our search engine is working as expected.  

How about relevancy?  

Are the results fulfilling user demands? Probably **NOT**: they are out of stock.  
Do the documents detail **Autobiography of America**? Let's check each document's title, synopsis, preface, subject, etc. to examine! To some extent, the documents are a curation of American history!  
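
Scanning full documents for relevancy is tedious, so it may help to reduce each hit to a few fields of interest. The sketch below does this on the client side; `summarize` is a hypothetical helper, and `sample` is a stand-in shaped like the response above, with only the values visible in that output copied in. Solr's JSON Request API also accepts a `fields` key to limit the returned fields on the server side.

```python
def summarize(result, fields=('id', 'title', 'price', 'inStock')):
    """Reduce each returned document to the given fields.

    Fields that are absent from a document are reported as None.
    """
    docs = result.get('response', {}).get('docs', [])
    return [{field: doc.get(field) for field in fields} for doc in docs]


# Stand-in response (assumption): one document with values copied from the
# first hit shown above; fields not visible in that output are left out.
sample = {
    'response': {
        'numFound': 3,
        'docs': [
            {
                'id': '474',
                'title': 'Out on the Porch: An Evocation in Words and Pictures',
                'price': 1.99,
                'format': ['Hardcover'],
            }
        ],
    }
}

for row in summarize(sample):
    print(row)
# With Solr running: summarize(handle_request(body=query))
```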

Now let's try a wider price margin, and a bigger volume too, since in most cases autobiographies run to a good number of pages.
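
Since only the upper bounds change between these experiments, the request body can be generated from a few parameters. This is a minimal sketch; `range_filter` and `book_query` are hypothetical helpers, and the field names (`format`, `inStock`, `price`, `pages`) are the ones used in the queries of this notebook.

```python
def range_filter(field, low=None, high=None):
    """Build a range filter such as 'price:[* TO 50]'; None means unbounded."""
    lo = '*' if low is None else low
    hi = '*' if high is None else high
    return f'{field}:[{lo} TO {hi}]'


def book_query(text, max_price, max_pages, in_stock=True, book_format='Hardcover'):
    """Assemble a JSON request body with the filters used in this lab."""
    fq = [f'format:{book_format}']
    if in_stock:
        fq.append('inStock:true')
    fq.append(range_filter('price', high=max_price))
    fq.append(range_filter('pages', high=max_pages))
    return {
        'query': text,
        'params': {'sort': 'price asc, pages asc', 'fq': fq},
    }


wider = book_query('Autobiography of America', max_price=50, max_pages=500)
print(wider)
# With Solr running: print(json.dumps(handle_request(body=wider), indent=2))
```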

In [36]:
query = {
    'query':'Autobiography of America',
    'params':{
        'sort':'price asc, pages asc',
        'fq':[
            'format:Hardcover',
            'inStock:true',
            'price:[* TO 50]',
            'pages:[* TO 500]'
        ]
    }
}
print(json.dumps(handle_request(body=query), indent=2))

{
  "responseHeader": {
    "status": 0,
    "QTime": 5,
    "params": {
      "json": "{\"query\": \"Autobiography of America\", \"params\": {\"sort\": \"price asc, pages asc\", \"fq\": [\"format:Hardcover\", \"inStock:true\", \"price:[* TO 50]\", \"pages:[* TO 500]\"]}}"
    }
  },
  "response": {
    "numFound": 13,
    "start": 0,
    "numFoundExact": true,
    "docs": [
      {
        "id": "113",
        "title": "The Four Seasons: Poems",
        "author": [
          "J. D. McClatchy"
        ],
        "author_bio": [
          "J. D. McClatchy is a poet and Professor of English at Yale University. He is a Fellow of the American Academy of Arts and Sciences and a member of the American Academy of Arts and Letters. His book  Hazmat   (Alfred A. Knopf, 2002) was nominated for the 2003 Pulitzer Prize. He edits the \"Voice of the Poet\" series for Random House AudioBooks; and has written texts for musical settings, including eight opera libretti, for such composers as William Sch

Whoa! The query result now makes sense. The user of our search engine can *select* the documents they desire, add them to the cart, check out, then proceed to the coastline and enjoy the breeze while reading their favorite books.