## About the Apache Solr search engine in the Smart Village Platform

> I asked my search engine for a good joke. It said, "A man walked into a bar, tavern, pole, inn, stake, tap room, stick, pub, shaft, parlour, shank, watering hole, rail, cantina, beam, alehouse, spoke, or saloon... but most likely a bar, and said ouch." 

The Smart Village application API requires an open source search engine such as Apache Solr to serve stored API objects as quickly as possible. An API backed by a search engine has several benefits over one backed only by a traditional relational database. Data in a search engine is always indexed, which keeps retrieval fast. The data and query parsing of a search engine supports fast full-text search, filtering, and sorting. Search engine data can also be grouped, faceted, and pivoted to produce analytics and statistics on the data matched by your query. 
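
As a rough illustration of those features, the sketch below sends one request that combines full-text search (`q`), a filter (`fq`), a sort order (`sort`), and a facet (`facet.field`), which are standard Solr query parameters. The `SOLR_URL` default and the field names in the example call are assumptions made for illustration; they are not taken from the computate ConfigSet.

```shell
# Assumed base URL: a service named `solr` on Solr's default port 8983.
# Replace it with the URL of your own Solr route when querying from outside the cluster.
SOLR_URL="${SOLR_URL:-http://solr:8983/solr}"

# Query one collection with full-text search, a filter, a sort order, and a facet.
#   $1 collection   $2 search text (q)   $3 filter (fq)   $4 sort   $5 facet field
solr_query() {
  curl -sS -G "$SOLR_URL/$1/select" \
    --data-urlencode "q=$2" \
    --data-urlencode "fq=$3" \
    --data-urlencode "sort=$4" \
    --data-urlencode "facet=true" \
    --data-urlencode "facet.field=$5" \
    --data-urlencode "rows=10"
}

# Example call (hypothetical field names):
# solr_query smartvillage 'traffic' 'entityType_s:sensor' 'name_s asc' 'entityType_s'
```

Passing each parameter through `--data-urlencode` lets curl handle spaces and colons in the query, so the values can be written the way they appear in the Solr documentation.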


## Deploy Solr in the OpenShift Developer Sandbox


Run the command below to deploy the default computate Solr ConfigSet to the cloud as a Kubernetes ConfigMap. The ConfigMap is loaded later, once the Solr pod is running, to initialize the Solr ConfigSet, which is the schema used for Solr Collections in the Smart Village Platform. 

In [None]:
%%bash
oc apply -k ~/smartvillage-operator/kustomize/overlays/sandbox/edgesolrs/default/configmaps/
echo DONE

If you are curious what the EdgeSolr custom resource you are deploying in the Sandbox looks like, run the command below. 

In [None]:
%%bash
cat ~/smartvillage-operator/kustomize/overlays/sandbox/edgesolrs/default/edgesolrs/default/edgesolr.yaml
echo DONE

Here are some useful things to note about the configuration of the EdgeSolr resource. 

- `name: solr` We can name the deployment, service, and route created by this EdgeSolr resource. 
- `route`: We create a public route for Solr if you want to interact with it from your own computer. 
- `replicas: 1` We'll only deploy 1 replica for this Sandbox environment to stay within the resource quota of the Developer Sandbox. 
- The `resources:` definition defines the memory and CPU requests and limits for our Solr pod. We have to keep this lower than I would recommend for a production deployment because we have limited resources available in the Developer Sandbox for the many running Smart Village services. 
- `configsets:` a list of Solr ConfigSets, each given by a name and the Kubernetes ConfigMap we deployed in the step above. Each ConfigMap is placed in the Solr pod in the directory where it can be registered as a ConfigSet. 
- `collections:` a list of Solr Collections, which is where we actually store and query data. We create two collections. The `computate` collection is used for AI/ML code generation in the Smart Village Platform. The `smartvillage` collection is where we store the Smart Village Platform data that is returned in the REST API. Both collections use the same `computate` ConfigSet, because it is very reusable, with several good wildcard fields that solve many problems. 
- `zookeeper:` the host name of the ZooKeeper service. Solr relies on ZooKeeper to be clustered and scalable. 

For more information about the EdgeSolr custom resource definition, [see the full EdgeSolr schema here](https://github.com/smartabyar-smartvillage/smartvillage-operator/blob/main/config/crd/bases/smartvillage.computate.org_edgesolrs.yaml). 
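
To find the settings listed above in the manifest without reading the whole file, a small helper like the one below can print the matching lines with their line numbers. It is only a sketch: `show_edgesolr_settings` is a hypothetical name, and it assumes each setting appears as a YAML key at the start of a line, optionally after indentation or a list dash.

```shell
# Print the lines of an EdgeSolr manifest that set the fields discussed above.
#   $1 path to the manifest
show_edgesolr_settings() {
  grep -nE '^[[:space:]-]*(name|route|replicas|resources|configsets|collections|zookeeper):' "$1"
}

# Example call, using the manifest path from the cell above:
# show_edgesolr_settings ~/smartvillage-operator/kustomize/overlays/sandbox/edgesolrs/default/edgesolrs/default/edgesolr.yaml
```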

Run the Ansible Playbook below to deploy Solr to the cloud. 

In [None]:
%%bash
ansible-playbook ~/smartvillage-operator/apply-edgesolr.yaml \
  -e ansible_operator_meta_namespace=$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace) \
  -e crd_path=~/smartvillage-operator/kustomize/overlays/sandbox/edgesolrs/default/edgesolrs/default/edgesolr.yaml
echo DONE
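
The playbook command reads the namespace from a file that Kubernetes mounts inside a pod, so it only works as written when the notebook itself runs in the cluster. The sketch below shows one way to fall back to the current `oc` project when that file is missing. `NAMESPACE_FILE` and `current_namespace` are hypothetical names introduced for this example.

```shell
# Path where Kubernetes mounts the pod's namespace (the same path used in the cell above).
NAMESPACE_FILE="${NAMESPACE_FILE:-/var/run/secrets/kubernetes.io/serviceaccount/namespace}"

# Print the namespace from the mounted file, or fall back to the current oc project.
current_namespace() {
  if [ -r "$NAMESPACE_FILE" ]; then
    cat "$NAMESPACE_FILE"
  else
    oc project -q
  fi
}

# Example: pass the result to the playbook instead of the inline `cat`:
# ansible-playbook ~/smartvillage-operator/apply-edgesolr.yaml \
#   -e ansible_operator_meta_namespace="$(current_namespace)" \
#   -e crd_path=~/smartvillage-operator/kustomize/overlays/sandbox/edgesolrs/default/edgesolrs/default/edgesolr.yaml
```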

The play recap may report failed tasks. 
This is expected, because the Solr pod is still being created. 
The final tasks in the playbook need a running Solr pod so that they can connect to it and create the Solr ConfigSets and Solr Collections used by the Smart Village application. 

Retry the playbook once the Solr pod is running. 
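
If you would rather script the retry than rerun the cell by hand, a generic helper such as the one below can repeat a command until it succeeds. This is a minimal sketch, not part of the smartvillage-operator; the `retry` function and its arguments are hypothetical.

```shell
# Retry a command until it succeeds or the attempts run out.
#   $1 maximum attempts   $2 seconds to wait between attempts   rest: the command to run
retry() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Example: wait for the Solr pod, retrying up to 5 times, 30 seconds apart:
# retry 5 30 oc wait pod -l app=solr --for=condition=Ready --timeout=2m
```

The same helper can wrap the `ansible-playbook` command, although rerunning the playbook only after the pod reports ready keeps the output easier to read.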


### View Solr pod details
After you run the Ansible Playbook, it takes about a minute for the Solr pod to start. Run the command below until the Solr pod reports `READY 1/1` and `STATUS Running`. 

In [None]:
%%bash
oc get pod -l app=solr
oc wait pod -l app=solr --for=condition=Ready --timeout=2m
oc get pod -l app=solr
echo DONE
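
To check the `READY` and `STATUS` columns in a script instead of by eye, the table printed by `oc get pod` can be piped through a small filter. The sketch below assumes the default column order (`NAME READY STATUS RESTARTS AGE`); `pods_ready` is a hypothetical helper name.

```shell
# Read `oc get pod` table output on stdin. Succeed only if at least one pod is
# listed and every pod shows READY n/n and STATUS Running.
pods_ready() {
  awk 'NR > 1 {
         seen = 1
         split($2, ready, "/")
         if (ready[1] != ready[2] || $3 != "Running") bad = 1
       }
       END {
         if (seen && !bad) exit 0
         exit 1
       }'
}

# Example:
# oc get pod -l app=solr | pods_ready && echo "Solr is ready"
```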

When the Solr pod is running, run the Ansible Playbook again. 

In [None]:
%%bash
ansible-playbook ~/smartvillage-operator/apply-edgesolr.yaml \
  -e ansible_operator_meta_namespace=$(cat /var/run/secrets/kubernetes.io/serviceaccount/namespace) \
  -e crd_path=~/smartvillage-operator/kustomize/overlays/sandbox/edgesolrs/default/edgesolrs/default/edgesolr.yaml
echo DONE
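
Once the second playbook run succeeds, you may want to confirm that the `computate` and `smartvillage` collections exist. The sketch below asks the Solr Collections API for its list (`action=LIST`) and looks for both names. The `SOLR_URL` default is an assumption, and the name match is a rough text search rather than real JSON parsing.

```shell
# Assumed base URL: a service named `solr` on Solr's default port 8983.
SOLR_URL="${SOLR_URL:-http://solr:8983/solr}"

# Fetch the collection list from the Solr Collections API.
list_collections() {
  curl -sS "$SOLR_URL/admin/collections?action=LIST&wt=json"
}

# Rough check: succeed if the quoted collection name $1 appears in the JSON on stdin.
has_collection() {
  grep -q "\"$1\""
}

# Report on the two collections named in this notebook.
check_collections() {
  response="$(list_collections)" || return 1
  status=0
  for name in computate smartvillage; do
    if printf '%s\n' "$response" | has_collection "$name"; then
      echo "found collection: $name"
    else
      echo "missing collection: $name" >&2
      status=1
    fi
  done
  return "$status"
}

# Example:
# check_collections
```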

### View Solr pod logs
If your Solr pod does not reach `STATUS Running`, run the command below to view the Solr pod logs and check for any errors that may have occurred. 

In [None]:
%%bash
oc logs -l app=solr
echo DONE
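
When a label selector is used, `oc logs` prints only the last 10 lines unless `--tail` is given, so a startup error can scroll out of view. The sketch below asks for more lines and keeps only those that look like problems. `solr_problems` is a hypothetical helper, and the patterns are an assumption about what is worth flagging in Solr's log output.

```shell
# Keep only lines on stdin that mention a warning, an error, or a Java exception.
# Succeeds even when nothing matches, so an empty result does not fail the cell.
solr_problems() {
  grep -E 'WARN|ERROR|Exception' || true
}

# Example:
# oc logs -l app=solr --tail=500 | solr_problems
```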

## Next...
I hope that answers your questions about Solr in the Smart Village Platform. 
- If you have additional questions or issues, please [create an issue for the course here](https://github.com/smartabyar-smartvillage/smartabyar-smartvillage-sandbox-course/issues). 
- Otherwise, please continue to the next notebook [10-about-smartvillage.ipynb](10-about-smartvillage.ipynb). 