Go package for splitting strings (enclosing bracket and quotes aware)

go-andiamo/splitter

Splitter

Overview

Go package for splitting strings (aware of enclosing braces and quotes)

The problem with the standard Go strings.Split is that it does not take into account that the string being split may contain enclosing braces and/or quotes, inside which the separator should not be treated as a split point.

Take, for example, a string representing a slice of comma-separated strings...

    str := `"aaa","bbb","this, for sanity, should not be split"`

Running strings.Split on that...

package main

import "strings"

func main() {
    str := `"aaa","bbb","this, for sanity, should not be split"`
    parts := strings.Split(str, `,`)
    println(len(parts))
}

would yield 5 instead of the desired 3.

However, with splitter, the result would be different...

package main

import "github.com/go-andiamo/splitter"

func main() {
    commaSplitter, _ := splitter.NewSplitter(',', splitter.DoubleQuotes)

    str := `"aaa","bbb","this, for sanity, should not be split"`
    parts, _ := commaSplitter.Split(str)
    println(len(parts))
}

which yields the desired 3.

Note: the variadic arguments after the first (separator) argument are the 'enclosures' (e.g. quotes, brackets) to be taken into account.

While splitting, all specified enclosures are also checked for balance (an unbalanced enclosure causes Split to return an error).

Installation

To install Splitter, use go get:

go get github.com/go-andiamo/splitter

To update Splitter to the latest version, run:

go get -u github.com/go-andiamo/splitter

Enclosures

Enclosures tell the splitter about specific start/end sequences within which the separator is not to be considered. An enclosure is one of two types: quotes or brackets.

Quote-type and bracket-type enclosures differ in how their optional escaping works:

  • Quote enclosures can be:
    • escaped by an escape prefix, e.g. a quote enclosure starting and ending with " in which \" is not seen as the end
    • escaped by doubling, e.g. a quote enclosure starting and ending with ' in which a doubled '' is not seen as the end
  • Bracket enclosures can only be:
    • escaped by an escape prefix, e.g. a bracket enclosure starting with ( and ending with ), with the escape set to \, where
      • \( is not seen as a start
      • \) is not seen as an end

Note that brackets are ignored inside quotes, but quotes can appear within brackets. When splitting, separators found within any specified quote or bracket enclosure are not treated as split points.

The Splitter provides many pre-defined enclosures:

Var Name Type Start - End Escaped end
DoubleQuotes Quote " " none
DoubleQuotesBackSlashEscaped Quote " " \"
DoubleQuotesDoubleEscaped Quote " " ""
SingleQuotes Quote ' ' none
SingleQuotesBackSlashEscaped Quote ' ' \'
SingleQuotesDoubleEscaped Quote ' ' ''
SingleInvertedQuotes Quote ` ` none
SingleInvertedQuotesBackSlashEscaped Quote ` ` \`
SingleInvertedQuotesDoubleEscaped Quote ` ` ``
SinglePointingAngleQuotes Quote ‹ › none
SinglePointingAngleQuotesBackSlashEscaped Quote ‹ › \›
DoublePointingAngleQuotes Quote « » none
LeftRightDoubleDoubleQuotes Quote none
LeftRightDoubleSingleQuotes Quote none
LeftRightDoublePrimeQuotes Quote none
SingleLowHigh9Quotes Quote none
DoubleLowHigh9Quotes Quote none
Parenthesis Brackets ( ) none
CurlyBrackets Brackets { } none
SquareBrackets Brackets [ ] none
LtGtAngleBrackets Brackets < > none
LeftRightPointingAngleBrackets Brackets none
SubscriptParenthesis Brackets none
SuperscriptParenthesis Brackets none
SmallParenthesis Brackets none
SmallCurlyBrackets Brackets none
DoubleParenthesis Brackets none
MathWhiteSquareBrackets Brackets none
MathAngleBrackets Brackets none
MathDoubleAngleBrackets Brackets none
MathWhiteTortoiseShellBrackets Brackets none
MathFlattenedParenthesis Brackets none
OrnateParenthesis Brackets ﴿ none
AngleBrackets Brackets none
DoubleAngleBrackets Brackets none
FullWidthParenthesis Brackets none
FullWidthSquareBrackets Brackets none
FullWidthCurlyBrackets Brackets none
SubstitutionBrackets Brackets none
SubstitutionQuotes Quote none
DottedSubstitutionBrackets Brackets none
DottedSubstitutionQuotes Quote none
TranspositionBrackets Brackets none
TranspositionQuotes Quote none
RaisedOmissionBrackets Brackets none
RaisedOmissionQuotes Quote none
LowParaphraseBrackets Brackets none
LowParaphraseQuotes Quote none
SquareWithQuillBrackets Brackets none
WhiteParenthesis Brackets none
WhiteCurlyBrackets Brackets none
WhiteSquareBrackets Brackets none
WhiteLenticularBrackets Brackets none
WhiteTortoiseShellBrackets Brackets none
FullWidthWhiteParenthesis Brackets none
BlackTortoiseShellBrackets Brackets none
BlackLenticularBrackets Brackets none
PointingCurvedAngleBrackets Brackets none
TortoiseShellBrackets Brackets none
SmallTortoiseShellBrackets Brackets none
ZNotationImageBrackets Brackets none
ZNotationBindingBrackets Brackets none
MediumOrnamentalParenthesis Brackets none
LightOrnamentalTortoiseShellBrackets Brackets none
MediumOrnamentalFlattenedParenthesis Brackets none
MediumOrnamentalPointingAngleBrackets Brackets none
MediumOrnamentalCurlyBrackets Brackets none
HeavyOrnamentalPointingAngleQuotes Quote none
HeavyOrnamentalPointingAngleBrackets Brackets none

Note: to make any of the above enclosures escapable, use the MakeEscapable() or MustMakeEscapable() functions.

Quote enclosures with escaping

Quotes within quotes can be handled by using an enclosure that specifies how escaping works. For example, the following uses backslash (\) prefixed escaping...

package main

import "github.com/go-andiamo/splitter"

func main() {
    commaSplitter, _ := splitter.NewSplitter(',', splitter.DoubleQuotesBackSlashEscaped)

    str := `"aaa","bbb","this, for sanity, \"should\" not be split"`
    parts, _ := commaSplitter.Split(str)
    println(len(parts))
}

Or with double escaping...

package main

import "github.com/go-andiamo/splitter"

func main() {
    commaSplitter, _ := splitter.NewSplitter(',', splitter.DoubleQuotesDoubleEscaped)

    str := `"aaa","bbb","this, for sanity, """"should,,,,"" not be split"`
    parts, _ := commaSplitter.Split(str)
    println(len(parts))
}

Separators found inside quotes or brackets (including nested ones) do not cause a split...

package main

import (
    "fmt"
    "github.com/go-andiamo/splitter"
)

func main() {
    encs := []*splitter.Enclosure{
        splitter.Parenthesis, splitter.SquareBrackets, splitter.CurlyBrackets,
        splitter.DoubleQuotesDoubleEscaped, splitter.SingleQuotesDoubleEscaped,
    }
    commaSplitter, _ := splitter.NewSplitter(',', encs...)

    str := `do(not,)split,'don''t,split,this',[,{,(a,"this has "" quotes")}]`
    parts, _ := commaSplitter.Split(str)
    println(len(parts))
    for i, pt := range parts {
        fmt.Printf("\t[%d]%s\n", i, pt)
    }
}

Options

Options define behaviours to be carried out on each part found during splitting.

An option, by virtue of the values returned from its .Apply() method, can do one of three things:

  1. return a modified string to be added to the split parts
  2. return false to indicate that the split part is not to be added to the split result
  3. return an error to indicate that the split part is unacceptable (splitting stops and the error is returned from the Split method)

Options can be added directly to the Splitter using the .AddDefaultOptions() method. These options are applied on every call to the splitter's .Split() method.

Options can also be specified when calling the splitter's .Split() method. These options apply only to that call, and are carried out after any options already added to the splitter.

Option Examples

1. Stripping empty parts

package main

import (
    "fmt"
    "github.com/go-andiamo/splitter"
)

func main() {
    s := splitter.MustCreateSplitter('/').
        AddDefaultOptions(splitter.IgnoreEmpties)

    parts, _ := s.Split(`/a//c/`)
    println(len(parts))
    fmt.Printf("%+v\n", parts)
}


2. Stripping empty first/last parts

package main

import (
    "fmt"
    "github.com/go-andiamo/splitter"
)

func main() {
    s := splitter.MustCreateSplitter('/').
        AddDefaultOptions(splitter.IgnoreEmptyFirst, splitter.IgnoreEmptyLast)

    parts, _ := s.Split(`/a//c/`)
    println(len(parts))
    fmt.Printf("%+v\n", parts)

    parts, _ = s.Split(`a//c/`)
    println(len(parts))
    fmt.Printf("%+v\n", parts)

    parts, _ = s.Split(`/a//c`)
    println(len(parts))
    fmt.Printf("%+v\n", parts)
}


3. Trimming parts

package main

import (
    "fmt"
    "github.com/go-andiamo/splitter"
)

func main() {
    s := splitter.MustCreateSplitter('/').
        AddDefaultOptions(splitter.TrimSpaces)

    parts, _ := s.Split(`/a/b/c/`)
    println(len(parts))
    fmt.Printf("%+v\n", parts)

    parts, _ = s.Split(`  / a /b / c/    `)
    println(len(parts))
    fmt.Printf("%+v\n", parts)

    parts, _ = s.Split(`/   a   /   b   /   c   /`)
    println(len(parts))
    fmt.Printf("%+v\n", parts)
}


4. Trimming spaces (and removing empties)

package main

import (
    "fmt"
    "github.com/go-andiamo/splitter"
)

func main() {
    s := splitter.MustCreateSplitter('/').
        AddDefaultOptions(splitter.TrimSpaces, splitter.IgnoreEmpties)

    parts, _ := s.Split(`/a/  /c/`)
    println(len(parts))
    fmt.Printf("%+v\n", parts)

    parts, _ = s.Split(`  / a // c/    `)
    println(len(parts))
    fmt.Printf("%+v\n", parts)

    parts, _ = s.Split(`/   a   /      /   c   /`)
    println(len(parts))
    fmt.Printf("%+v\n", parts)
}


5. Returning an error when empty parts are found

package main

import (
    "fmt"
    "github.com/go-andiamo/splitter"
)

func main() {
    s := splitter.MustCreateSplitter('/').
        AddDefaultOptions(splitter.TrimSpaces, splitter.NoEmpties)

    if parts, err := s.Split(`/a/  /c/`); err != nil {
        println(err.Error())
    } else {
        println(len(parts))
        fmt.Printf("%+v\n", parts)
    }

    if parts, err := s.Split(`  / a // c/    `); err != nil {
        println(err.Error())
    } else {
        println(len(parts))
        fmt.Printf("%+v\n", parts)
    }

    if parts, err := s.Split(`/   a   /      /   c   /`); err != nil {
        println(err.Error())
    } else {
        println(len(parts))
        fmt.Printf("%+v\n", parts)
    }

    if parts, err := s.Split(` a / b/c `); err != nil {
        println(err.Error())
    } else {
        println(len(parts))
        fmt.Printf("%+v\n", parts)
    }
}
