# Indexing custom documents  

Solr’s basic unit of information is a `document`: a set of data that describes something. A document about a book could contain the title, author, year of publication, number of pages, and so on. Documents are composed of `fields`, which hold more specific pieces of information. Fields can contain different kinds of data: a title field, for example, is text, while a publication year could be a date or an integer. **If fields are defined correctly, Solr can interpret their values correctly**. `Field analysis` tells Solr what to do with incoming data when building an index.
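
As a rough illustration, a book document can be pictured as a plain Python dictionary with one key per field. The field names and values below are made up for this example and are not part of any schema defined in this lab.

```python
import json

# A hypothetical book document: each key is a field, each value is that field's data.
book = {
    "title": "An Example Book",   # text
    "author": ["A. Writer"],      # a field may hold more than one value
    "year_published": 2020,       # integer
    "pages": 320,                 # integer
}

# Solr accepts documents as JSON, so the dictionary serializes directly.
payload = json.dumps(book)
print(payload)
```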

## Field Data Types  

The following table lists the recommended field types available in Solr.  
![Field Data Types](field-data-types.png)

## Field Default Properties  
These properties can be specified either on a field type or on an individual field, where they override the values provided by the field type.  

![Field Type](field-properties.png)
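
The override rule can be modelled in a few lines of Python. This is only a conceptual sketch, not Solr's own code: the property values and the `effective_properties` helper are hypothetical and exist just to show that a value set on the field wins over the one inherited from its type.

```python
# Properties declared on a hypothetical field type ...
field_type_props = {"indexed": True, "stored": False, "multiValued": False}

# ... and the single property a hypothetical field sets explicitly.
field_props = {"stored": True}


def effective_properties(type_props, field_overrides):
    """Start from the field type's values, then apply the field's own values on top."""
    merged = dict(type_props)
    merged.update(field_overrides)
    return merged


effective = effective_properties(field_type_props, field_props)
print(effective)
```

Here the field keeps `indexed` and `multiValued` from its type and replaces only `stored`.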

## Hands On Lab

### Pre-requisites

For this lab, we shall use Solr in single-node mode.  
There are several ways to run Solr, including a [docker image](https://hub.docker.com/_/solr), but for this lab we shall use the binary distribution downloaded from [here](https://solr.apache.org/downloads.html).  
I assume you are using Linux or Mac. For Windows users, all commands will work without modification except the one used for indexing documents. 

- Download and extract the Solr distribution archive to a directory of your choosing  
    - `curl -LO <download-link> && tar -xzf solr-{version}.tgz`  
- Change directory to the extracted distribution directory  
    - `cd solr-{version}`  
- Launch Solr in single-node mode  
    - `bin/solr start`  
- Create the `localDocs` core   
    - `bin/solr create -c localDocs`
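
Once the steps above have run, it can be useful to confirm that the core really exists. The sketch below uses Solr's CoreAdmin `STATUS` action and assumes Solr is listening on the default `http://localhost:8983/solr`. The helper names are hypothetical, and only the standard library is used so that the snippet has no extra dependencies.

```python
import json
import urllib.parse
import urllib.request

SOLR = "http://localhost:8983/solr"  # assumed default address of a local Solr


def core_status_url(core, base=SOLR):
    """Build the CoreAdmin STATUS URL for a single core."""
    query = urllib.parse.urlencode({"action": "STATUS", "core": core})
    return f"{base}/admin/cores?{query}"


def core_exists(status_response, core):
    """True if the STATUS response carries a non-empty entry for the core."""
    return bool(status_response.get("status", {}).get(core))


def check_core(core):
    """Ask the running Solr instance whether the core is loaded."""
    with urllib.request.urlopen(core_status_url(core)) as resp:
        return core_exists(json.load(resp), core)
```

With Solr running, `check_core("localDocs")` should return `True` after the core has been created.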



In [2]:
import simplejson as json
import requests

host = 'http://localhost:8983/solr'
core = 'localDocs'  # same spelling as the core created with `bin/solr create -c localDocs`
search_url = host + '/' + core + '/select?q='
headers = {
    'Content-type':'application/json'
}
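
The `search_url` prefix ends in `?q=`, so a query string still has to be appended and URL-encoded before it is sent. A minimal sketch follows, assuming the core is named exactly as in the `bin/solr create -c localDocs` step. `build_query_url` is a hypothetical helper, not something provided by Solr or `requests`.

```python
from urllib.parse import quote

host = 'http://localhost:8983/solr'
core = 'localDocs'  # assumed to match the core created earlier
search_url = host + '/' + core + '/select?q='


def build_query_url(query):
    """Append a URL-encoded query to the select endpoint."""
    return search_url + quote(query, safe='')


print(build_query_url('*:*'))
```

With Solr running, the result can be passed to `requests.get(...)` and the JSON body read with `.json()`.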


In [None]:
# Define the schema fields for our documents. See https://solr.apache.org/guide/8_8/schema-api.html for more info.
fields = {
    "schema": {
        "add-field":[
            {
                "name":"docId",
                "type":"UUIDField",
                "default":"NEW",
                "omitTermFreqAndPositions":True
            },
            {
                "name":"title",
                "type":"StrField",
                "required":True
            },
            {
                "name":"author",
                "type":"StrField",
                "required":True,
                "multiValued":True
            },
            {
                "name":"publisher",
                "type":"StrField",
                "required":True
            },
            {
                "name":"price",
                "type":"CurrencyFieldType",
                "default":0,
                "defaultCurrency":"KES",
                "currencyConfig":"currency.xml"

            },
            {
                "name":"description",
                "type":"TextField",
                "required":True,
                "large":True,

            },
            {
                "name":"publication_date",
                "type":"DateRangeField",
                "sortMissingLast":True
            },
            {
                "name":"file",
                "type":"ExternalFileField"
            },
            {
                "name":"catchall",
                "type":"TextField"
            }
        ]
    },
    "catchall":{
        "add-copy-field": {
            "source":"*",
            "dest":"catchall"
        }
    }
}
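
A payload like this is applied by POSTing it to the core's Schema API endpoint, `/solr/<core>/schema`. The sketch below only builds the request, so it can be inspected without a running server. `schema_request` is a hypothetical helper, and the `isbn` field with type `string` is an illustrative assumption: the `type` value must name a field type that already exists in the core's schema, and `string` is assumed here to come from the default configset.

```python
import json
import urllib.request


def schema_request(host, core, commands):
    """Build (but do not send) a POST request for the core's Schema API endpoint."""
    return urllib.request.Request(
        url=f"{host}/{core}/schema",
        data=json.dumps(commands).encode("utf-8"),
        headers={"Content-type": "application/json"},
        method="POST",
    )


# A one-field payload in the same shape as the "add-field" command above.
example = {"add-field": [{"name": "isbn", "type": "string", "stored": True}]}
req = schema_request("http://localhost:8983/solr", "localDocs", example)
print(req.get_method(), req.full_url)

# To apply it against a running Solr instance:
# urllib.request.urlopen(req)
```

The same call shape works with `requests.post(url, json=commands, headers=headers)` if you prefer to stay with the `requests` library imported earlier.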