# StringSplitter

The goal of this presentation is to introduce the features of the `StringSplitter` processor and show how to configure it.

### The challenges

- I want to split a string of varying length, contained in a source field, into a list of its parts


given the following preprocessed log entry:

In [1]:
document = {
    "ip_addresses": "192.168.5.1, 10.10.2.1, fe80::, 127.0.0.1"
}


### Create rules and processor

create the rules:

In [2]:
import sys
sys.path.append("../../../../../")

from logprep.processor.string_splitter.rule import StringSplitterRule
rules_definitions = [
    {
        "filter": "ip_addresses",
        "string_splitter": {
            "source_fields": ["ip_addresses"],
            "target_field": "ip_addresses",
            "overwrite_target": True
        },
    }
]
rules = [StringSplitterRule._create_from_dict(rule_dict) for rule_dict in rules_definitions]
rules

[filter="ip_addresses", StringSplitterRule.Config(description='', regex_fields=[], tests=[], tag_on_failure=['_string_splitter_failure'], source_fields=['ip_addresses'], target_field='ip_addresses', delete_source_fields=False, overwrite_target=True, extend_target_list=False, delimeter=' ')]
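
The rule representation above lists a `delimeter` option whose default is a single space. The following is a minimal sketch of a rule definition that sets this option explicitly. It assumes the rule dictionary accepts the option under the same spelling that appears in the printed configuration; `comma_rule_definition` is a hypothetical name used only for illustration.

```python
# Hypothetical variant of the rule definition above.
# Assumption: the rule dictionary accepts "delimeter" under the same
# spelling that appears in the printed StringSplitterRule.Config.
comma_rule_definition = {
    "filter": "ip_addresses",
    "string_splitter": {
        "source_fields": ["ip_addresses"],
        "target_field": "ip_addresses",
        "overwrite_target": True,
        "delimeter": ", ",  # split on a comma followed by a space
    },
}
```

Such a dictionary could be appended to `rules_definitions` in the same way as the first rule, provided the assumption about the option name holds for the installed version.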

create the processor config:

In [3]:
processor_config = {
    "allmighty_string_splitter": {
        "type": "string_splitter",
        "specific_rules": ["/dev"],
        "generic_rules": ["/dev"],
    }
}


create the processor with the factory:

In [4]:
from logging import getLogger
from logprep.factory import Factory

logger = getLogger()

processor = Factory.create(processor_config, logger)
processor


string_splitter

load the rules into the processor:

In [5]:
for rule in rules:
    processor._specific_tree.add_rule(rule)
    
processor._specific_rules

[filter="ip_addresses", StringSplitterRule.Config(description='', regex_fields=[], tests=[], tag_on_failure=['_string_splitter_failure'], source_fields=['ip_addresses'], target_field='ip_addresses', delete_source_fields=False, overwrite_target=True, extend_target_list=False, delimeter=' ')]

### Process event

In [6]:
from copy import deepcopy

mydocument = deepcopy(document)
processor.process(mydocument)
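
The cell above processes a deep copy because the processor changes the event it is given instead of returning a new one. The sketch below illustrates that pattern with a hypothetical stand-in function, `split_in_place`, which is not part of logprep and only mimics in-place modification.

```python
from copy import deepcopy


def split_in_place(event: dict, field: str, delimiter: str = " ") -> None:
    """Hypothetical stand-in for a processor that mutates the event it is given."""
    event[field] = event[field].split(delimiter)


original = {"ip_addresses": "192.168.5.1, 10.10.2.1"}
working_copy = deepcopy(original)

# Only the copy is handed to the mutating function.
split_in_place(working_copy, "ip_addresses")

print(original)      # the original still holds the unsplit string
print(working_copy)  # only the copy now holds a list
```

Keeping the untouched original around makes it easy to compare input and output side by side.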


### Check results

In [7]:
document

{'ip_addresses': '192.168.5.1, 10.10.2.1, fe80::, 127.0.0.1'}

In [8]:
mydocument

{'ip_addresses': ['192.168.5.1,', '10.10.2.1,', 'fe80::,', '127.0.0.1']}
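
The trailing commas in the result come from the default delimiter, a single space (see `delimeter=' '` in the rule representation). The effect can be reproduced with plain `str.split`, which is used here only as an illustration and is not claimed to be how the processor is implemented.

```python
raw = "192.168.5.1, 10.10.2.1, fe80::, 127.0.0.1"

# Splitting on a single space keeps the commas attached to each part,
# matching the processed result shown above.
by_space = raw.split(" ")
print(by_space)
# ['192.168.5.1,', '10.10.2.1,', 'fe80::,', '127.0.0.1']

# Splitting on a comma followed by a space yields the bare addresses.
by_comma_space = raw.split(", ")
print(by_comma_space)
# ['192.168.5.1', '10.10.2.1', 'fe80::', '127.0.0.1']
```

Setting the rule's `delimeter` option to `", "` would presumably produce the second form, although that is an assumption and was not run in this notebook.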