# Schema API  

The Schema API allows you to use an HTTP API to manage many of the elements of your schema. This API provides read and write access to the Solr schema for each collection (or core, when using standalone Solr). Using the API, fields, dynamic fields, field types and copyField rules may be added, removed or replaced/updated.  

When you modify the schema with the API, a core reload occurs automatically so that the changes apply immediately to documents indexed thereafter. Previously indexed documents will **not** be automatically updated; they **must** be reindexed if existing index data uses schema elements that you changed.  

**Throughout this exercise, we will use the `sandboxEnv` core, which can be created as shown below.**  

- Download and extract the Solr distribution archive to a directory of your choosing  
    - `curl -O <download-link> && tar -xzf solr-{version}.tgz`  
- Change to the extracted directory  
    - `cd solr-{version}`  
- Launch Solr in single-node mode, running the process in the foreground  
    - `bin/solr start -f`  
- In another terminal window or tab, create the `sandboxEnv` core  
    - `bin/solr create -c sandboxEnv`


In [1]:
from simplejson import loads
from requests import request

# define Solr instance resources
base_url = 'http://localhost:8983'
core_name = 'sandboxEnv'
# define important paths
api_endpoint = f'{base_url}/api/cores/{core_name}' # note that we are using API V2
schema_endpoint = f'{api_endpoint}/schema'
# set http header content
headers = {
    'Content-type':'application/json'
}

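def handle_request(method="POST", body=None, endpoint=schema_endpoint, headers=headers):
    # send the request to Solr and return the decoded JSON response
    # body defaults to None so that no mutable default is shared between calls
    r = request(method, endpoint, headers=headers, json=body)
    return loads(r.text)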
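
Every successful call in this notebook returns only a `responseHeader` with `status` 0. A failed command returns more detail, and a small helper can pull the messages out. The sketch below is not part of the original notebook: `schema_errors` is a hypothetical helper, and the shape of the failure payload (a top-level `errors` list, or `error` with `details`, each entry carrying `errorMessages`) is an assumption that may differ between Solr versions.

```python
def schema_errors(response):
    """Collect error messages from a decoded Schema API response.

    Assumption: failures appear under a top-level 'errors' list or under
    'error' -> 'details', and each entry carries an 'errorMessages' list.
    """
    error = response.get('error') or {}
    entries = list(response.get('errors') or []) + list(error.get('details') or [])
    messages = []
    for entry in entries:
        if isinstance(entry, dict):
            messages.extend(entry.get('errorMessages', []))
    # fall back to the general message when no per-command details exist
    if not messages and error.get('msg'):
        messages.append(error['msg'])
    return messages

ok = {'responseHeader': {'status': 0, 'QTime': 235}}
failed = {  # hypothetical failure payload, for illustration only
    'responseHeader': {'status': 400, 'QTime': 3},
    'error': {
        'msg': 'error processing commands',
        'details': [{'add-field': {'name': 'title'},
                     'errorMessages': ["Field 'title' already exists."]}],
    },
}

print(schema_errors(ok))
print(schema_errors(failed))
```

An empty list means no error was reported; anything else is worth printing before continuing with further schema changes.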

By default, Solr ships with predefined field types, which can be used as they are or extended when adding fields to the schema. See [Field Type Definitions and Properties](https://solr.apache.org/guide/8_8/field-type-definitions-and-properties.html) and [Field Types Included with Solr](https://solr.apache.org/guide/8_8/field-types-included-with-solr.html) for more details.  
Let's view the default field types defined in our `sandboxEnv` core.

In [2]:
# Retrieve whole schema
handle_request('GET')

{'responseHeader': {'status': 0, 'QTime': 16},
 'schema': {'name': 'default-config',
  'version': 1.6,
  'uniqueKey': 'id',
  'fieldTypes': [{'name': '_nest_path_',
    'class': 'solr.NestPathField',
    'maxCharsForDocValues': '-1',
    'omitNorms': True,
    'omitTermFreqAndPositions': True,
    'stored': False,
    'multiValued': False},
   {'name': 'ancestor_path',
    'class': 'solr.TextField',
    'indexAnalyzer': {'tokenizer': {'class': 'solr.KeywordTokenizerFactory'}},
    'queryAnalyzer': {'tokenizer': {'class': 'solr.PathHierarchyTokenizerFactory',
      'delimiter': '/'}}},
   {'name': 'binary', 'class': 'solr.BinaryField'},
   {'name': 'boolean', 'class': 'solr.BoolField', 'sortMissingLast': True},
   {'name': 'booleans',
    'class': 'solr.BoolField',
    'sortMissingLast': True,
    'multiValued': True},
   {'name': 'delimited_payloads_float',
    'class': 'solr.TextField',
    'indexed': True,
    'stored': False,
    'analyzer': {'tokenizer': {'class': 'solr.WhitespaceT

## Modify the Schema  

To add, remove or replace fields, dynamic field rules, copy field rules, or field types, send a POST request to the `/api/<collections|cores>/<name>/schema/` endpoint with a sequence of commands in JSON format. The following commands are supported:  

- `add-field`: add a new field with parameters you provide.  
- `delete-field`: delete a field.
- `replace-field`: replace an existing field with one that is differently configured.
- `add-dynamic-field`: add a new dynamic field rule with parameters you provide.
- `delete-dynamic-field`: delete a dynamic field rule.
- `replace-dynamic-field`: replace an existing dynamic field rule with one that is differently configured.
- `add-field-type`: add a new field type with parameters you provide.
- `delete-field-type`: delete a field type.
- `replace-field-type`: replace an existing field type with one that is differently configured.
- `add-copy-field`: add a new copy field rule.
- `delete-copy-field`: delete a copy field rule.  

**These commands can be issued in separate `POST` requests or in the same `POST` request. Commands are executed in the order in which they are specified.**
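
As a rough illustration of a combined request, the sketch below builds one body holding two different commands. The field names `publisher` and `summary` are hypothetical. Python dictionaries keep insertion order and `json.dumps` serializes keys in that order, so the commands appear in the body in the order they were written. A dictionary cannot hold the same key twice, so repeating a command means passing a list of definitions as its value.

```python
import json

# two commands in a single request body; field names are illustrative only
commands = {
    'add-field': [
        {'name': 'publisher', 'type': 'string'},
        {'name': 'summary', 'type': 'text_en'},
    ],
    'add-copy-field': {'source': 'summary', 'dest': ['_text_']},
}

body = json.dumps(commands)

# the serialized body keeps the commands in the order they were defined
print(list(json.loads(body)))
```

With the helper defined earlier, the same dictionary could be sent as `handle_request('POST', commands)`.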

### Add a New Field  

The `add-field` command adds a new field definition to your schema. If a field with the same name already exists, an error is thrown.  
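
One way to avoid that error is to look at the current field list first and pick the command accordingly. The sketch below is an assumption-laden illustration rather than anything from the Solr documentation: `upsert_field_command` is a hypothetical helper that takes the decoded response of the `schema/fields` endpoint and a field definition, and returns either an `add-field` or a `replace-field` command.

```python
def upsert_field_command(fields_response, definition):
    """Return an add-field or replace-field command for the given definition.

    fields_response is the decoded JSON of GET .../schema/fields.
    """
    existing = {field['name'] for field in fields_response.get('fields', [])}
    command = 'replace-field' if definition['name'] in existing else 'add-field'
    return {command: definition}

# sample response, trimmed to the keys the helper needs
fields_response = {'fields': [{'name': 'id', 'type': 'string'},
                              {'name': 'title', 'type': 'text_en'}]}

print(upsert_field_command(fields_response, {'name': 'isbn', 'type': 'string'}))
print(upsert_field_command(fields_response, {'name': 'title', 'type': 'text_en', 'required': True}))
```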

In [22]:
# define field attributes
new_field = {
    'add-field': {
        'name': 'title',
        'type': 'text_en',
        'required': True
    }
}

handle_request('POST', new_field)

{'responseHeader': {'status': 0, 'QTime': 235}}

In [21]:
# create several fields in a single request

fields = {
    'add-field':[
        {
            'name':'authors',
            'type':'string',
            'required':True,
            'multiValued':True
        },
        {
            "name":'publication_date',
            'type':'string',
            'required':True
        },
        {
            'name':'isbn',
            'type':'string'
        },
        {
            'name':'language',
            'type':'string'
        }
    ]
}

handle_request('POST', fields)

{'responseHeader': {'status': 0, 'QTime': 256}}

#### Delete a Field  

The `delete-field` command removes a field definition from your schema. If the field does not exist in the schema, or if the field is the source or destination of a `copy field rule`, an error is thrown.
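
Because a copy field rule blocks deletion, one option is to remove the rules that mention the field in the same request, ahead of the `delete-field` command. The helper below is a hypothetical sketch: it reads the decoded response of the `schema/copyfields` endpoint, matches rules by exact field name only (wildcard sources are not resolved), and assumes `delete-copy-field` accepts a list of rules in the same way `add-field` accepts a list of fields.

```python
def commands_to_delete_field(copyfields_response, name):
    """Build commands that drop copy field rules naming a field, then the field."""
    rules = [rule for rule in copyfields_response.get('copyFields', [])
             if name in (rule.get('source'), rule.get('dest'))]
    commands = {}
    if rules:
        # listed first so the rules are removed before the field itself
        commands['delete-copy-field'] = [
            {'source': rule['source'], 'dest': rule['dest']} for rule in rules
        ]
    commands['delete-field'] = {'name': name}
    return commands

# hypothetical copy field listing, for illustration only
copyfields = {'copyFields': [{'source': 'title', 'dest': '_text_'},
                             {'source': 'isbn', 'dest': 'isbn_str'}]}

print(commands_to_delete_field(copyfields, 'isbn'))
```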

In [23]:
field = {
    'delete-field': {'name':'isbn'}
}

handle_request('POST', field)

{'responseHeader': {'status': 0, 'QTime': 160}}

#### Replace a Field  

The `replace-field` command replaces a field’s definition. Note that you **must** supply the full definition for a field; this command will **not** partially modify a field’s definition. If the field does not exist in the schema, an error is thrown.
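
Since a partial definition is not accepted, a convenient pattern is to start from the field's current definition and overlay only the attributes that change. The sketch below assumes the current definition was obtained from the `schema/fields/<name>` endpoint; `build_replace_field` is a hypothetical helper, not a Solr API.

```python
def build_replace_field(current, **changes):
    """Merge changes into a copy of the current definition and wrap it in a command."""
    merged = {**current, **changes}  # the current definition is left untouched
    return {'replace-field': merged}

current = {'name': 'language', 'type': 'string'}
payload = build_replace_field(current, required=True)

print(payload)
```

The resulting payload carries the full definition, so attributes that were not mentioned in `changes` are preserved instead of being reset.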

In [25]:
# make the language field required and change the publication_date type to pdate
# if data had already been indexed, changing the publication_date type would require reindexing existing documents

field = {
    'replace-field':[
        {
            'name':'language',
            'type':'string',
            'required':True
        },
        {
            "name":'publication_date',
            'type':'pdate',
            'required':True
        }
    ]
}

handle_request('POST', field)

{'responseHeader': {'status': 0, 'QTime': 192}}

#### Add a Dynamic Field Rule   

The `add-dynamic-field` command adds a new dynamic field rule to your schema.
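
A dynamic field rule is a name pattern with a single `*` wildcard at the start or the end, and it applies to incoming field names that have no explicit field definition. The sketch below mimics that matching for illustration only. `matches_dynamic_rule` is a hypothetical helper, not Solr's implementation, and it ignores how Solr chooses between several matching rules.

```python
def matches_dynamic_rule(pattern, field_name):
    """Return True when field_name fits a leading- or trailing-wildcard pattern."""
    if pattern.startswith('*'):
        return field_name.endswith(pattern[1:])
    if pattern.endswith('*'):
        return field_name.startswith(pattern[:-1])
    # no wildcard: only an exact name matches
    return pattern == field_name

print(matches_dynamic_rule('*_latlng', 'store_latlng'))
print(matches_dynamic_rule('ignored_*', 'ignored_notes'))
print(matches_dynamic_rule('*_latlng', 'store_lat'))
```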

In [28]:
# create a dynamic field rule: every incoming field whose name ends with "_latlng"
# is stored and given the field type "location" (for spatial operations)

field = {
    'add-dynamic-field':{
        'name':'*_latlng',
        'type':'location',
        'stored':True
    }
}

handle_request('POST', field)

{'responseHeader': {'status': 0, 'QTime': 166}}

#### Delete a Dynamic Field Rule  

The `delete-dynamic-field` command deletes a dynamic field rule from your schema.  

In [3]:
field = {
    'delete-dynamic-field':{'name':'*_latlng'}
}

handle_request('POST', field)

{'responseHeader': {'status': 0, 'QTime': 284}}

#### Replace a Dynamic Field Rule
The `replace-dynamic-field` command replaces a dynamic field rule in your schema. 

In [4]:
# It is safer to work with custom rules here:
# changing Solr's default dynamic fields could cause problems later.
# So we create custom rules and then replace them in the same request; note the syntax

field = {
    'add-dynamic-field':[
        {
            'name':'*__latlng',
            'type':'location'
        },
        {
            'name':'*_list',
            'type':'string'
        },
        {
            'name':'*_json',
            'type':'string'
        }
    ],
    'replace-dynamic-field':[
        {
            'name':'*_list',
            'type':'string',
            'multiValued':True
        },
        {
            'name':'*_json',
            'type':'string',
            'multiValued':True
        }
    ]
}

handle_request('POST', field)

{'responseHeader': {'status': 0, 'QTime': 310}}

#### Add a New Field Type  

The `add-field-type` command adds a new field type to your schema. The structure of the command is a JSON mapping of the standard field type definition described in the [Solr Field Types](https://solr.apache.org/guide/8_8/solr-field-types.html#solr-field-types) section of the reference guide.

In [5]:
field = {
    'add-field-type':{
        "name":"new_txt_field",
        "class":"solr.TextField",
        "positionIncrementGap":"100",
        "analyzer" : {
            "charFilters":[{
                "class":"solr.PatternReplaceCharFilterFactory",
                "replacement":"$1$1",
                "pattern":"([a-zA-Z])\\\\1+" 
            }],
            "tokenizer":{
                "class":"solr.WhitespaceTokenizerFactory" 
            },
            "filters":[{
                "class":"solr.WordDelimiterFilterFactory",
                "preserveOriginal":"0" 
            }]
        }
    }
}

handle_request('POST',field)

{'responseHeader': {'status': 0, 'QTime': 231}}

#### Delete a Field Type  

The `delete-field-type` command removes a field type from your schema.
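
Solr rejects the deletion when a field or dynamic field rule still uses the type, so it can help to check for users first. The sketch below is a hypothetical helper that works on the decoded responses of the `schema/fields` and `schema/dynamicfields` endpoints; the sample data is illustrative only.

```python
def fields_using_type(fields_response, dynamicfields_response, type_name):
    """List names of fields and dynamic field rules that reference type_name."""
    definitions = (fields_response.get('fields', [])
                   + dynamicfields_response.get('dynamicFields', []))
    return [d['name'] for d in definitions if d.get('type') == type_name]

# sample responses, trimmed to the keys the helper needs
fields_response = {'fields': [{'name': 'title', 'type': 'text_en'},
                              {'name': 'language', 'type': 'string'}]}
dynamicfields_response = {'dynamicFields': [{'name': '*_list', 'type': 'string'}]}

print(fields_using_type(fields_response, dynamicfields_response, 'string'))
print(fields_using_type(fields_response, dynamicfields_response, 'new_txt_field'))
```

An empty list suggests the type has no remaining users among explicit fields and dynamic field rules.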

In [7]:
field = {
    'delete-field-type':{'name':'new_txt_field'}
}

handle_request('POST', field)

{'responseHeader': {'status': 0, 'QTime': 193}}

#### Replace a Field Type
The `replace-field-type` command replaces a field type in your schema.

In [8]:
field = {
    'add-field-type':{
        "name":"new_txt_field",
        "class":"solr.TextField",
        "positionIncrementGap":100,
        "analyzer" : {
            "charFilters":[{
                "class":"solr.PatternReplaceCharFilterFactory",
                "replacement":"$1$1",
                "pattern":"([a-zA-Z])\\\\1+" 
            }],
            "tokenizer":{
                "class":"solr.WhitespaceTokenizerFactory" 
            },
            "filters":[{
                "class":"solr.WordDelimiterFilterFactory",
                "preserveOriginal":"0" 
            }]
        }
    },
    'replace-field-type':{
        "name":"new_txt_field",
        "class":"solr.TextField",
        "positionIncrementGap":100,
        "analyzer" : {
            "charFilters":[{
                "class":"solr.PatternReplaceCharFilterFactory",
                "replacement":"$1$1",
                "pattern":"([a-zA-Z])\\\\1+" 
            }],
            "tokenizer":{
                "class":"solr.StandardTokenizerFactory" 
            },
            "filters":[{
                "class":"solr.WordDelimiterFilterFactory",
                "preserveOriginal":"0" 
            },
            {
                'class': 'solr.StopFilterFactory',
                'words': 'lang/stopwords_en.txt',
                'ignoreCase': True
            },
            {
                'class': 'solr.EnglishPossessiveFilterFactory'
            },
            {
                'class': 'solr.KeywordMarkerFilterFactory',
                'protected': 'protwords.txt'
            }]
        }
    }
}

handle_request('POST',field)

{'responseHeader': {'status': 0, 'QTime': 164}}

#### Add a New Copy Field Rule
The `add-copy-field` command adds a new copy field rule to your schema.
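
A copy field rule takes a `source`, one or more `dest` fields, and optionally `maxChars`, which caps the number of characters copied. The sketch below only builds the command body; `copy_field_command` is a hypothetical helper and the field names are illustrative.

```python
def copy_field_command(source, dests, max_chars=None):
    """Build an add-copy-field command for one source and one or more destinations."""
    # accept a single destination name as well as a list of names
    dest_list = [dests] if isinstance(dests, str) else list(dests)
    rule = {'source': source, 'dest': dest_list}
    if max_chars is not None:
        rule['maxChars'] = max_chars
    return {'add-copy-field': rule}

print(copy_field_command('title', '_text_'))
print(copy_field_command('title', ['_text_', 'title_str'], max_chars=256))
```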

In [19]:
# define a rule that copies every "*_list" field to the matching "*_json" field
field = {
     "add-copy-field":{
        "source":"*_list",
        "dest":["*_json"]
    }
}

handle_request('POST', field)

{'responseHeader': {'status': 0, 'QTime': 157}}

#### Delete a Copy Field Rule  
The `delete-copy-field` command deletes a copy field rule from your schema.  

In [22]:
field = {
    'delete-copy-field':{'source':'*_list', 'dest':'*_json'}
}

handle_request('POST', field)

{'responseHeader': {'status': 0, 'QTime': 202}}

## Retrieve Schema  

The schema defined above can be retrieved in its entirety or in portions, as needed.

### List Schema Fields

The `GET /api/<collections|cores>/<name>/schema/fields` endpoint lists the fields defined in the schema.
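
The listing can be narrowed with query parameters. The sketch below builds such a URL and assumes that the `fl` (comma-separated field names) and `showDefaults` parameters documented for the fields endpoint are honored on the V2 path used in this notebook. The endpoint value mirrors the one defined at the top of the notebook.

```python
from urllib.parse import urlencode

schema_endpoint = 'http://localhost:8983/api/cores/sandboxEnv/schema'

# fl limits the listing to the named fields; showDefaults adds inherited defaults
params = {'fl': 'title,authors', 'showDefaults': 'true'}

# safe=',' keeps the comma in the fl value readable instead of percent-encoding it
url = f'{schema_endpoint}/fields?{urlencode(params, safe=",")}'
print(url)
```

The resulting URL can be passed as the `endpoint` argument of `handle_request` with the `GET` method.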

In [32]:
# list document fields in the schema

handle_request(method="GET", endpoint=f'{schema_endpoint}/fields')

{'responseHeader': {'status': 0, 'QTime': 0},
 'fields': [{'name': '_nest_path_', 'type': '_nest_path_'},
  {'name': '_root_',
   'type': 'string',
   'docValues': False,
   'indexed': True,
   'stored': False},
  {'name': '_text_',
   'type': 'text_general',
   'multiValued': True,
   'indexed': True,
   'stored': False},
  {'name': '_version_', 'type': 'plong', 'indexed': False, 'stored': False},
  {'name': 'authors', 'type': 'string', 'multiValued': True, 'required': True},
  {'name': 'id',
   'type': 'string',
   'multiValued': False,
   'indexed': True,
   'required': True,
   'stored': True},
  {'name': 'language', 'type': 'string', 'required': True},
  {'name': 'publication_date', 'type': 'pdate', 'required': True},
  {'name': 'title', 'type': 'text_en', 'required': True}]}

In [35]:
# show details of a specific field

handle_request(method="GET", endpoint=f'{schema_endpoint}/fields/title')

{'responseHeader': {'status': 0, 'QTime': 0},
 'field': {'name': 'title', 'type': 'text_en', 'required': True}}

In [31]:
# list dynamic fields in the schema

handle_request('GET', endpoint=f'{schema_endpoint}/dynamicfields')

{'responseHeader': {'status': 0, 'QTime': 0},
 'dynamicFields': [{'name': '*_txt_en_split_tight',
   'type': 'text_en_splitting_tight',
   'indexed': True,
   'stored': True},
  {'name': '*_descendent_path',
   'type': 'descendent_path',
   'indexed': True,
   'stored': True},
  {'name': '*_ancestor_path',
   'type': 'ancestor_path',
   'indexed': True,
   'stored': True},
  {'name': '*_txt_en_split',
   'type': 'text_en_splitting',
   'indexed': True,
   'stored': True},
  {'name': '*_txt_sort',
   'type': 'text_gen_sort',
   'indexed': True,
   'stored': True},
  {'name': 'ignored_*', 'type': 'ignored'},
  {'name': '*_txt_rev',
   'type': 'text_general_rev',
   'indexed': True,
   'stored': True},
  {'name': '*_phon_en',
   'type': 'phonetic_en',
   'indexed': True,
   'stored': True},
  {'name': '*_s_lower', 'type': 'lowercase', 'indexed': True, 'stored': True},
  {'name': '*_txt_cjk', 'type': 'text_cjk', 'indexed': True, 'stored': True},
  {'name': '*__latlng', 'type': 'location'},

In [36]:
# details of a specific dynamic field rule

handle_request('GET', endpoint=f'{schema_endpoint}/dynamicfields/*_dts')

{'responseHeader': {'status': 0, 'QTime': 0},
 'dynamicField': {'name': '*_dts',
  'type': 'pdate',
  'multiValued': True,
  'indexed': True,
  'stored': True}}

In [44]:
# list field types  

handle_request('GET', endpoint=f'{schema_endpoint}/fieldtypes')

{'responseHeader': {'status': 0, 'QTime': 0},
 'fieldTypes': [{'name': '_nest_path_',
   'class': 'solr.NestPathField',
   'maxCharsForDocValues': '-1',
   'omitNorms': True,
   'omitTermFreqAndPositions': True,
   'stored': False,
   'multiValued': False},
  {'name': 'ancestor_path',
   'class': 'solr.TextField',
   'indexAnalyzer': {'tokenizer': {'class': 'solr.KeywordTokenizerFactory'}},
   'queryAnalyzer': {'tokenizer': {'class': 'solr.PathHierarchyTokenizerFactory',
     'delimiter': '/'}}},
  {'name': 'binary', 'class': 'solr.BinaryField'},
  {'name': 'boolean', 'class': 'solr.BoolField', 'sortMissingLast': True},
  {'name': 'booleans',
   'class': 'solr.BoolField',
   'sortMissingLast': True,
   'multiValued': True},
  {'name': 'delimited_payloads_float',
   'class': 'solr.TextField',
   'indexed': True,
   'stored': False,
   'analyzer': {'tokenizer': {'class': 'solr.WhitespaceTokenizerFactory'},
    'filters': [{'class': 'solr.DelimitedPayloadTokenFilterFactory',
      'encode

In [41]:
# details of a specific field type  

handle_request('GET', endpoint=f'{schema_endpoint}/fieldtypes/new_txt_field')

{'responseHeader': {'status': 0, 'QTime': 0},
 'fieldType': {'name': 'new_txt_field',
  'class': 'solr.TextField',
  'positionIncrementGap': '100',
  'analyzer': {'charFilters': [{'class': 'solr.PatternReplaceCharFilterFactory',
     'pattern': '([a-zA-Z])\\\\1+',
     'replacement': '$1$1'}],
   'tokenizer': {'class': 'solr.StandardTokenizerFactory'},
   'filters': [{'class': 'solr.WordDelimiterFilterFactory',
     'preserveOriginal': '0'},
    {'class': 'solr.StopFilterFactory',
     'words': 'lang/stopwords_en.txt',
     'ignoreCase': 'true'},
    {'class': 'solr.EnglishPossessiveFilterFactory'},
    {'class': 'solr.KeywordMarkerFilterFactory',
     'protected': 'protwords.txt'}]}}}

In [45]:
# list copy fields  

handle_request('GET', endpoint=f'{schema_endpoint}/copyfields')

{'responseHeader': {'status': 0, 'QTime': 0}, 'copyFields': []}

In [46]:
# show schema name  

handle_request('GET', endpoint=f'{schema_endpoint}/name')

{'responseHeader': {'status': 0, 'QTime': 0}, 'name': 'default-config'}

In [47]:
# show schema version  

handle_request('GET', endpoint=f'{schema_endpoint}/version')

{'responseHeader': {'status': 0, 'QTime': 0}, 'version': 1.6}

In [48]:
# show the unique key field

handle_request('GET', endpoint=f'{schema_endpoint}/uniquekey')

{'responseHeader': {'status': 0, 'QTime': 0}, 'uniqueKey': 'id'}

In [49]:
# show global similarity

handle_request('GET', endpoint=f'{schema_endpoint}/similarity')

{'responseHeader': {'status': 0, 'QTime': 0},
 'similarity': {'class': 'org.apache.solr.search.similarities.SchemaSimilarityFactory'}}