# `Nuqleon.StringSegment`

Provides `StringSegment`, a type that represents a view into a `System.String`. It avoids the allocations that string operations such as `Substring` would otherwise incur.

> **Note:** This type predates the introduction of `Span<T>` APIs in .NET, which may provide a valid alternative.
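
As a rough point of comparison with the span-based alternative mentioned in the note, the sketch below compares a slice of a string against an expected value without allocating a substring. It uses only `AsSpan` and `SequenceEqual` from the .NET base class library; the helper `SliceEquals` and the sample text are made up for illustration.

```csharp
using System;

// Hypothetical helper: checks whether the `length` characters of `text`
// starting at `start` equal `expected`, without allocating a substring.
static bool SliceEquals(string text, int start, int length, string expected)
{
    ReadOnlySpan<char> slice = text.AsSpan(start, length); // a view, not a copy
    return slice.SequenceEqual(expected.AsSpan());
}

string sample = "Lorem ipsum dolor sit amet.";

Console.WriteLine(SliceEquals(sample, 6, 5, "ipsum")); // True
Console.WriteLine(SliceEquals(sample, 0, 5, "ipsum")); // False
```

Because `ReadOnlySpan<char>` is a `ref struct`, it cannot be stored in a class field or used as a generic type argument in a collection such as `List<T>`. The samples in this document, by contrast, return `StringSegment` values through `IEnumerable<StringSegment>`.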

## Reference the library

### Option 1 - Use a local build

If you have built the library locally, run the following cell to load the latest build.

In [1]:
#r "bin/Debug/net50/Nuqleon.StringSegment.dll"

### Option 2 - Use NuGet packages

If you want to use the latest published package from NuGet, run the following cell.

In [1]:
#r "nuget:Nuqleon.StringSegment,*-*"

## (Optional) Attach a debugger

If you'd like to step through the library's source code while running the samples, run the following cell and follow the instructions to start a debugger (e.g. Visual Studio). Then navigate to the library's source code to set breakpoints.

In [1]:
System.Diagnostics.Debugger.Launch();

## `StringSegment`

A `StringSegment` provides an API surface similar to that of `String`, but it is backed by a view into an underlying string, defined by a start offset and a length. The example below breaks a text into a sequence of sentences, using the `StringSegment` constructor to create one segment per sentence.

In [1]:
var text = "Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.";

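// Splits the text into sentences, where each sentence ends with a period.
// Each sentence is returned as a view into `text`; no substrings are copied.
static IEnumerable<StringSegment> GetSentences(string text)
{
    int start = 0; // index of the first character of the current sentence
    int i = 0;     // current scan position

    while (i < text.Length)
    {
        char c = text[i];

        if (c == '.')
        {
            // Include the period in the sentence.
            yield return new StringSegment(text, start, i - start + 1);

            i++;

            // Skip the spaces that separate this sentence from the next one.
            while (i < text.Length && text[i] == ' ')
            {
                i++;
            }

            start = i;
        }
        else
        {
            i++;
        }
    }

    // Return any trailing text that does not end with a period.
    if (start < text.Length)
    {
        yield return new StringSegment(text, start, text.Length - start);
    }
}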

foreach (var sentence in GetSentences(text))
{
    Console.WriteLine(sentence);
}

The methods on `StringSegment` mirror those on `String`, but they accept and return `StringSegment` instances rather than `String` instances. As an example, consider the `Split` and `TrimEnd` methods, which we can use to break sentences into words, as shown below.

In [1]:
// Splits a sentence on spaces and trims trailing punctuation from each word.
static IEnumerable<StringSegment> GetWords(StringSegment sentence)
{
    return sentence.Split(' ').Select(word => word.TrimEnd(',', ';', '.'));
}

foreach (var sentence in GetSentences(text))
{
    Console.WriteLine(sentence);

    foreach (var word in GetWords(sentence))
    {
        Console.WriteLine("  " + word);
    }
}

Apart from the output produced by `Console.WriteLine`, the code above hasn't allocated a single new `String` instance. Every `StringSegment` simply refers to the original `String` and uses an offset and a length to delineate a substring.
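
To make the offset-and-length idea concrete, here is a minimal sketch of such a view type. `MiniSegment` is a hypothetical type written for this illustration and is not the library's implementation; it only shows how several views can share one underlying string.

```csharp
using System;

string sample = "Lorem ipsum. Dolor sit.";

// Two views over the same string; neither copies any characters.
var first = new MiniSegment(sample, 0, 12);   // "Lorem ipsum."
var second = new MiniSegment(sample, 13, 10); // "Dolor sit."

Console.WriteLine(first.Length);                                 // 12
Console.WriteLine(second[0]);                                    // D
Console.WriteLine(ReferenceEquals(first.Source, second.Source)); // True
Console.WriteLine(second.ToString());                            // Dolor sit.

// Hypothetical illustration type; not the library's implementation.
readonly struct MiniSegment
{
    public MiniSegment(string source, int offset, int length)
    {
        if (source is null) throw new ArgumentNullException(nameof(source));
        if (offset < 0 || length < 0 || offset > source.Length - length)
            throw new ArgumentOutOfRangeException(nameof(offset));

        Source = source;
        Offset = offset;
        Length = length;
    }

    public string Source { get; } // the shared underlying string
    public int Offset { get; }    // where the view starts in Source
    public int Length { get; }    // number of characters in the view

    // Index relative to the start of the view.
    public char this[int index] =>
        (uint)index < (uint)Length
            ? Source[Offset + index]
            : throw new IndexOutOfRangeException();

    // Materializing the view is the only operation here that allocates.
    public override string ToString() => Source.Substring(Offset, Length);
}
```

In this sketch, creating a view costs a reference and two integers, regardless of how many characters the view spans.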

Next, let's convert all words to lower case, but only where needed. If every character in a word is already lower case, we skip the call to `ToLower`, which would allocate a new `String` (and wrap it in a `StringSegment`).

In [1]:
// Only call ToLower (which allocates) for words that are not already lower case.
var words = from sentence in GetSentences(text)
            from word in GetWords(sentence)
            let x = word.All(char.IsLower) ? word : word.ToLower()
            select x;

foreach (var word in words)
{
    Console.WriteLine(word);
}

Finally, we'll look at the equality behavior of `StringSegment` by using `GroupBy` (here through the `group ... by` query syntax) to find the words that occur more than once, most frequent first.

In [1]:
var res = from word in words
          group word by word into g
          let count = g.Count()
          where count > 1
          orderby count descending
          select new { Word = g.Key, Count = count };

foreach (var word in res)
{
    Console.WriteLine(word);
}
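
Grouping like this relies on segments with the same characters comparing as equal, even when they start at different offsets. The sketch below shows one way a view type could provide that behavior, assuming ordinal, content-based equality. `WordView` is a hypothetical type for illustration, and the library's actual comparison rules may differ.

```csharp
using System;
using System.Linq;

string sample = "ut labore ut dolore ut";

// Views over each word; the three "ut" views start at different offsets.
var views = new[]
{
    new WordView(sample, 0, 2),  // ut
    new WordView(sample, 3, 6),  // labore
    new WordView(sample, 10, 2), // ut
    new WordView(sample, 13, 6), // dolore
    new WordView(sample, 20, 2), // ut
};

foreach (var g in views.GroupBy(w => w))
{
    Console.WriteLine($"{g.Key}: {g.Count()}");
}
// Prints:
// ut: 3
// labore: 1
// dolore: 1

// Hypothetical illustration type; not the library's implementation.
readonly struct WordView : IEquatable<WordView>
{
    private readonly string _source;
    private readonly int _offset;
    private readonly int _length;

    public WordView(string source, int offset, int length)
    {
        _source = source;
        _offset = offset;
        _length = length;
    }

    // Equal when the viewed characters match, wherever the views start.
    public bool Equals(WordView other) =>
        _length == other._length &&
        string.CompareOrdinal(_source, _offset, other._source, other._offset, _length) == 0;

    public override bool Equals(object obj) => obj is WordView other && Equals(other);

    // Hash the viewed characters so that equal views get equal hash codes.
    public override int GetHashCode()
    {
        var hash = new HashCode();
        for (int i = 0; i < _length; i++)
        {
            hash.Add(_source[_offset + i]);
        }
        return hash.ToHashCode();
    }

    public override string ToString() => _source.Substring(_offset, _length);
}
```

`GroupBy` uses the default equality comparer, which picks up the `IEquatable<WordView>` implementation, so the three `ut` views end up in a single group.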