You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
A long time ago it bothered me that IEnumerator<T> required two interface calls per iteration, a design choice that would cause a small but pervasive performance reduction app-wide. (Long ago I introduced Iterator<T> and IIterable<T>, which were like IEnumerator and IEnumerable but required just one call per iteration - but as most infrastructure was built on IEnumerator, they didn't provide enough value so I were removed them in mid-2013.)
When I was designing LLLPG, I knew that lexer performance was important, and to achieve it I thought it would be important to be able to access most characters from the input stream without any virtual calls. On the other hand, I didn't want to require an entire source file to be stored as a single contiguous array - it should be possible to "stream" a large file from disk, or to tokenize an IDE's gap buffer. To resolve the tension between performance and flexibility I introduced ICharSource, whose Slice method returned a UString structure (it used to be StringSlice but never mind that) which represents part of a string (a tuple of string, start, length - like the new Memory<T>):
public interface ICharSource : IListSource<char>
{
new UString Slice(int startIndex, int length);
}
The lexer then uses Slice to read small blocks (currently 512 characters), which are cached in the lexer. Since UString is a struct, accessing individual characters involves no virtual calls (though it did still require a range check). Note that UString itself implements ICharSource, and a string converts implicitly to UString so lexers can easily accept a string input. If you're reading data from a file you can use StreamCharSource instead which also implements ICharSource.
Let's consider a more general problem: bulk operations in which data is moving from one collection to another. You can easily implement methods like this on any collection type using a foreach loop:
You might be able to accelerate a list-insert operation by shifting elements after index only once (instead of once per item inserted). What you can't do is accelerate access to the input collection.
Okay, so what's the solution? Obviously, it's some sort of bulk read access based on arrays. Two approaches come to mind: either the sink collection / client code asks to read a certain-size block of data (probably a fixed size), or the source collection chooses a block size itself on a per-block basis. The second choice may be faster in some cases, because the source collection may be divided into nodes, each having an array it can return without copying. The first choice may have a performance advantage too, as the sink collection may be divided into nodes and only the sink knows how much space is available in each node. A compromise: the sink specifies a maximum size and the source provides something that large or less.
Hence:
// I don't think the compiler will accept "out T" here but there must be some way to // upcast, e.g. using an extension method that returns a decorator...publicinterfaceIScanner<outT>{// Returns an empty span (only) when the end of collection is reachedReadOnlySpan<T>Read(intmaxSize=int.MaxValue);}publicinterfaceIMScanner<T>:IScanner<T>// mutable version{// When the caller calls Read(), changes to the previous span are saved?Span<T>Read(intmaxSize=int.MaxValue);}publicinterfaceIScannable<T>{IScanner<T>Scan();}publicinterfaceICollectionEx<T>:ICollectionImpl<T>,IIsEmpty{voidAddRange(IScannable<T>e);}
....
Here you can see I'm thinking of changing existing interfaces like ICollectionEx so that instead of supporting boring-old AddRange(IEnumerable<T>) it only supports AddRange(IScannable<T>). AddRange(IEnumerable<T>) cannot accelerate access very much, so there's little point in offering it directly on the interface (it may as well be an extension method).
The text was updated successfully, but these errors were encountered:
A long time ago it bothered me that
IEnumerator<T>
required two interface calls per iteration, a design choice that would cause a small but pervasive performance reduction app-wide. (Long ago I introducedIterator<T>
andIIterable<T>
, which were likeIEnumerator
andIEnumerable
but required just one call per iteration - but as most infrastructure was built onIEnumerator
, they didn't provide enough value so I were removed them in mid-2013.)When I was designing LLLPG, I knew that lexer performance was important, and to achieve it I thought it would be important to be able to access most characters from the input stream without any virtual calls. On the other hand, I didn't want to require an entire source file to be stored as a single contiguous array - it should be possible to "stream" a large file from disk, or to tokenize an IDE's gap buffer. To resolve the tension between performance and flexibility I introduced
ICharSource
, whoseSlice
method returned aUString
structure (it used to beStringSlice
but never mind that) which represents part of a string (a tuple of string, start, length - like the newMemory<T>
):The lexer then uses Slice to read small blocks (currently 512 characters), which are cached in the lexer. Since
UString
is a struct, accessing individual characters involves no virtual calls (though it did still require a range check). Note thatUString
itself implementsICharSource
, and astring
converts implicitly toUString
so lexers can easily accept a string input. If you're reading data from a file you can useStreamCharSource
instead which also implementsICharSource
.Let's consider a more general problem: bulk operations in which data is moving from one collection to another. You can easily implement methods like this on any collection type using a
foreach
loop:You might be able to accelerate a list-insert operation by shifting elements after
index
only once (instead of once per item inserted). What you can't do is accelerate access to the input collection.Okay, so what's the solution? Obviously, it's some sort of bulk read access based on arrays. Two approaches come to mind: either the sink collection / client code asks to read a certain-size block of data (probably a fixed size), or the source collection chooses a block size itself on a per-block basis. The second choice may be faster in some cases, because the source collection may be divided into nodes, each having an array it can return without copying. The first choice may have a performance advantage too, as the sink collection may be divided into nodes and only the sink knows how much space is available in each node. A compromise: the sink specifies a maximum size and the source provides something that large or less.
Hence:
Here you can see I'm thinking of changing existing interfaces like
ICollectionEx
so that instead of supporting boring-oldAddRange(IEnumerable<T>)
it only supportsAddRange(IScannable<T>)
.AddRange(IEnumerable<T>)
cannot accelerate access very much, so there's little point in offering it directly on the interface (it may as well be an extension method).The text was updated successfully, but these errors were encountered: