## Pre-requisites

For this exercise, we shall use Solr in standalone mode.  
There are several ways to run Solr, including the official [Docker image](https://hub.docker.com/_/solr), but here we use the binaries from the [Solr downloads page](https://solr.apache.org/downloads.html).  
I assume you are using Linux or macOS. On Windows, every command works unmodified except the one used for indexing documents.

- Download and decompress the Solr binary release: `tar -xzf solr-{version}.tgz`
- Change into the decompressed directory: `cd solr-{version}`
- Launch Solr in standalone mode, running the process in the foreground: `bin/solr start -f`
- Create the films core: `bin/solr create -c films`. Note that we have not specified a configset for the films core, so Solr uses its `_default` configset.
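Before moving on, it can help to confirm that the core from the last step is actually loaded. The snippet below is a minimal sketch that queries Solr's CoreAdmin `STATUS` action. It assumes Solr listens on `http://localhost:8983/solr`, and the helper names (`core_status_url`, `core_is_loaded`, `fetch_core_status`) are hypothetical, introduced only for this illustration. It uses the standard library so it runs without extra packages.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR_HOST = "http://localhost:8983/solr"  # assumption: default local Solr address


def core_status_url(core_name, host=SOLR_HOST):
    """Build the CoreAdmin STATUS URL for a single core."""
    query = urlencode({"action": "STATUS", "core": core_name, "wt": "json"})
    return f"{host}/admin/cores?{query}"


def core_is_loaded(status_response, core_name):
    """Return True if a STATUS response describes `core_name`.

    Solr reports an unknown core as an empty object under "status".
    """
    return bool(status_response.get("status", {}).get(core_name))


def fetch_core_status(core_name, host=SOLR_HOST):
    """Fetch and decode the STATUS response (requires a running Solr)."""
    with urlopen(core_status_url(core_name, host), timeout=10) as resp:
        return json.load(resp)
```

With Solr running, `core_is_loaded(fetch_core_status("films"), "films")` should return `True` once the core has been created.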

In [18]:
import simplejson as json
import requests

host = 'http://localhost:8983/solr'
core = 'films'
search_url = host + '/' + core + '/select?q='

headers = {
    'Content-type':'application/json'
}

def search_query(query):
    query = requests.utils.quote(query)
    req = requests.get(search_url + query, headers=headers)
    if req.status_code == 200:
        result = json.loads(req.text)
        print(f"Matching documents count {result['response']['numFound']}")
        print(json.dumps(result['response']['docs'], indent=1))
    else:
        print(req.status_code, req.reason)

In [19]:
# set up custom schema settings
schema = {
    "add-field": 
    [
        {
            "name":"name", 
            "type":"text_general", 
            "multiValued":False, 
            "stored":True
        },
        {
            "name":"genre", 
            "type":"text_general", 
            "multiValued":True, 
            "stored":True
        }
    ]
}

r = requests.post(host+"/"+core+'/schema', json=schema)  # `core` is defined in the first cell
if r.status_code == 200:
    print(r.text)
else:
    print(r.status_code, r.reason)

{
  "responseHeader":{
    "status":0,
    "QTime":389}}



In [20]:
#  set up a "catchall field" by defining a copy field that will take all data from all fields
schema = {
    "add-copy-field": 
    {
        "source":"*",
        "dest":"_text_"
    }
}
r = requests.post(host+"/"+core+'/schema', json=schema)  # `core` is defined in the first cell
if r.status_code == 200:
    print(r.text)
else:
    print(r.status_code, r.reason)

{
  "responseHeader":{
    "status":0,
    "QTime":275}}



## Index sample films data after schema definition
- `bin/post -c films example/films/films.json*` for Linux/Mac users
- `java -jar -Dc=films -Dauto example\exampledocs\post.jar example\films\*.json` for Windows users
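As a platform-independent alternative to the two commands above, the documents can also be sent to Solr's JSON update handler from Python. This is a rough sketch, not a replacement for the post tool: the URL assumes the local `films` core used in this exercise, and `batches`, `index_documents`, and the batch size of 500 are hypothetical choices made for illustration.

```python
import json
from urllib.request import Request, urlopen

UPDATE_URL = "http://localhost:8983/solr/films/update"  # assumption: local host, films core


def batches(docs, size):
    """Yield consecutive slices of `docs` with at most `size` items each."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def index_documents(docs, batch_size=500, update_url=UPDATE_URL):
    """POST documents to Solr as JSON arrays, committing on the last batch."""
    chunks = list(batches(docs, batch_size))
    for i, chunk in enumerate(chunks):
        # Only the final request asks Solr to commit, so earlier batches stay cheap.
        commit = "true" if i == len(chunks) - 1 else "false"
        req = Request(
            f"{update_url}?commit={commit}",
            data=json.dumps(chunk).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        # urlopen raises HTTPError if Solr rejects the batch.
        with urlopen(req, timeout=60) as resp:
            resp.read()
```

To use it, load `example/films/films.json` with `json.load` (the file is a JSON array of documents) and pass the resulting list to `index_documents`.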

In [22]:
# query movies where genre involves crime fiction scenes
search_query("Crime Fiction")

Matching documents count 284
[
 {
  "id": "/en/anamorph",
  "genre": [
   "Psychological thriller",
   "Crime Fiction",
   "Thriller",
   "Mystery",
   "Crime Thriller",
   "Suspense"
  ],
  "directed_by": [
   "H.S. Miller"
  ],
  "name": "Anamorph",
  "_version_": 1695550362411335681
 },
 {
  "id": "/en/blood_work",
  "directed_by": [
   "Clint Eastwood"
  ],
  "initial_release_date": [
   "2002-08-09T00:00:00Z"
  ],
  "name": "Blood Work",
  "genre": [
   "Mystery",
   "Crime Thriller",
   "Thriller",
   "Suspense",
   "Crime Fiction",
   "Detective fiction",
   "Drama"
  ],
  "_version_": 1695550362494173186
 },
 {
  "id": "/en/brigham_city_2001",
  "directed_by": [
   "Richard Dutcher"
  ],
  "name": "Brigham City",
  "genre": [
   "Mystery",
   "Indie film",
   "Crime Fiction",
   "Thriller",
   "Crime Thriller",
   "Drama"
  ],
  "_version_": 1695550362510950401
 },
 {
  "id": "/en/brother",
  "directed_by": [
   "Takeshi Kitano"
  ],
  "name": "Brother",
  "genre": [
   "Thrill

In [23]:
# apply faceting
def query_and_facet(query, facet_field, facet_mincount=20):
    query = requests.utils.quote(query)
    url = search_url + query + '&facet=on&facet.field={}&facet.mincount={}&wt=json'.format(facet_field, facet_mincount)
    req = requests.get(url, headers=headers)
    if req.status_code == 200:
        result = json.loads(req.text)
        print(f"Matching documents count {result['response']['numFound']}")
        print("\nFacet counts\n")
        print(json.dumps(result['facet_counts']['facet_fields'], indent=1))
        print("\nSample search result\n")
        print(json.dumps(result['response']['docs'], indent=1))
    else:
        print(req.status_code, req.reason)

query_and_facet(query="Crime Fiction", facet_field='genre', facet_mincount=50)

Matching documents count 284

Facet counts

{
 "genre": [
  "fiction",
  263,
  "film",
  211,
  "crime",
  191,
  "thriller",
  154,
  "drama",
  148,
  "action",
  114,
  "science",
  82,
  "adventure",
  73,
  "comedy",
  66
 ]
}
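The facet counts above arrive as a flat list that alternates term and count. The terms are lowercase single words such as "fiction" rather than "Crime Fiction" because `genre` was defined as `text_general`, a tokenized type, so faceting counts the indexed tokens. A small helper can pair the values up for easier use; `facet_pairs` is a hypothetical name, and the sample input is copied from the first entries of the output above.

```python
def facet_pairs(flat_counts):
    """Convert a flat [term, count, term, count, ...] list into (term, count) tuples."""
    if len(flat_counts) % 2 != 0:
        raise ValueError("expected an even number of items (term/count pairs)")
    # Even positions hold terms, odd positions hold their counts.
    return list(zip(flat_counts[0::2], flat_counts[1::2]))


genre_counts = facet_pairs(["fiction", 263, "film", 211, "crime", 191])
for term, count in genre_counts:
    print(f"{term:<10}{count:>5}")
```

Inside `query_and_facet`, the same helper could be applied to `result['facet_counts']['facet_fields'][facet_field]`.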

Sample search result

[
 {
  "id": "/en/anamorph",
  "genre": [
   "Psychological thriller",
   "Crime Fiction",
   "Thriller",
   "Mystery",
   "Crime Thriller",
   "Suspense"
  ],
  "directed_by": [
   "H.S. Miller"
  ],
  "name": "Anamorph",
  "_version_": 1695550362411335681
 },
 {
  "id": "/en/blood_work",
  "directed_by": [
   "Clint Eastwood"
  ],
  "initial_release_date": [
   "2002-08-09T00:00:00Z"
  ],
  "name": "Blood Work",
  "genre": [
   "Mystery",
   "Crime Thriller",
   "Thriller",
   "Suspense",
   "Crime Fiction",
   "Detective fiction",
   "Drama"
  ],
  "_version_": 1695550362494173186
 },
 {
  "id": "/en/brigham_city_2001",
  "directed_by": [
   "Richard Dutcher"
  ],
  "name": "Brigham City",
  "genre": [
   "Mystery",
   "Indie film"