Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Random improvements #416

Merged
merged 20 commits into from Apr 18, 2020
Merged

Random improvements #416

merged 20 commits into from Apr 18, 2020

Conversation

MihaZupan
Copy link
Collaborator

@MihaZupan MihaZupan commented Apr 7, 2020

  • Dropping support for net35, net40. Now multi-targeting netstandard2.0, netcoreapp2.1, netcoreapp3.1. 馃帀
  • Removed helpers such as ArrayHelper, MethodImplOptionPortable, ExcludeFromCodeCoverage that were needed for framework target compat.
  • Unsealed MarkdownParserContext (@patriksvensson I don't suppose you somehow rely on it being sealed? I figure having it unsealed adds more value)
  • Removed some allocations (b090f24, 84ec832, bdfaa27, 30d0924), 2% overall.
  • Improved codegen for some StringSlice helpers.
  • Used ThrowHelper everywhere, hopefully more things can inline now.
  • Further optimized CharacterMap in 256b4d3. @craigwi while not the reason I did it, this may also workaround the UWP issue.

Overall ~6% parsing throughput increase. Allocation-wise we are now very close to noise-free. The vast majority of allocations are now the AST itself and some strings.

@KrisVandermotten
Copy link
Contributor

You changed BitVector128, a 16 byte struct, to BoolVector128, a 512 byte struct. Are you sure you want to do that?

@MihaZupan
Copy link
Collaborator Author

The allocation overhead of 128 bytes is negligible as we don't construct character maps often. Calls to CharacterMap methods don't copy the struct either so I don't see a downside.

@craigwi
Copy link

craigwi commented Apr 7, 2020

Lots of interesting changes. My recommendation is to split the PR into about a half of a dozen smaller PR, each of which focuses on a specific change to make it easier to review.

@MihaZupan
Copy link
Collaborator Author

MihaZupan commented Apr 7, 2020

I tried to keep changes isolated between commits and I don't think they should be too contentious (except perhaps a99e650). Is there any particular change you would find easier to review if I isolate it into a separate PR?

Edit:
A lot of changes depend on a few others like adding System.Memory / would cause merge conflicts with others like using ThrowHelper / removing compat helpers. The aim of grouping multiple changes into a single PR is an egoistic one, to save myself time resolving all conflicts / waiting for tiny changes before opening PRs, considering that there are not many other changes happening to Markdig that would conflict with this.

@KrisVandermotten
Copy link
Contributor

KrisVandermotten commented Apr 7, 2020

The allocation overhead of 128 bytes is negligible as we don't construct character maps often. Calls to CharacterMap methods don't copy the struct either so I don't see a downside.

Well, the struct no longer fits in a single cache line.

Note that going from a 128 bool (512 bytes) array to a 4 uint (16 bytes) struct resulted in a 3% throughput improvement for the parsing.

Is there any particular change you would find easier to review if I isolate it into a separate PR?

This might be such a change yes. It would be interesting to see before and after benchmark results.

@MihaZupan
Copy link
Collaborator Author

This might be such a change yes. It would be interesting to see before and after benchmark results.

You got it, I'll move it out. The biggest chunk of throughput improvement from this PR comes from character map changes.

@MihaZupan
Copy link
Collaborator Author

Should we not ensure that the array cannot grow without bounds?

We should, looks like I over-simplified that part.

@patriksvensson
Copy link
Sponsor Contributor

@MihaZupan Just an old habit of sealing everything until there's a reason for it to be unsealed.

@MihaZupan
Copy link
Collaborator Author

@KrisVandermotten I've extracted it to MihaZupan#1, you can see the difference in the generated asm.

@KrisVandermotten
Copy link
Contributor

@MihaZupan I'm sorry I did not make myself clear. I'm curious about the effect is of only the change from the 16 byte struct to 512 byte struct. I have no doubt that some of the other changes you did there are beneficial.

FYI, I may find some time later this week to take a look myself as well.

@MihaZupan
Copy link
Collaborator Author

I'm curious about the effect is of only the change from the 16 byte struct to 512 byte struct

Ah I see. The struct is 128 bytes, not 512. I can run a benchmark on just this change but I would expect the bool lookup to still be (not much) faster.

FYI, I may find some time later this week to take a look myself as well.

Glad to hear it!

@KrisVandermotten
Copy link
Contributor

The struct is 128 bytes, not 512.

I stand corrected. Somehow I was convinced that sizeof(bool) == 4. It goes without saying that the difference is significant.

@MihaZupan
Copy link
Collaborator Author

cc @Sergio0694

@Sergio0694
Copy link

@MihaZupan NICE! 馃嵒馃殌

A couple questions:

  1. Why don't you also add the netstandard.2.1 target while you're at it? Have you checked if there's some overlap with the .NET Core 3.1 APIs you're using and .NET Standard 2.1, so that those new features could also be available in the portable version?
  2. Regarding the ThrowHelpers, not sure if taking an external dependency is out of the questions, but if isn't, you might be interested in referencing Microsoft.Toolkit when thre 6.1 update is out, and checking out the new Guard APIs (PR here), using those could simplify a lot of these conditional checks and remove some more code from markdig. They're already pretty optimized with respect to the codegen paths. Let me know if that helps! 馃槃

int end = End;
int i = Start;

while (i <= end && (uint)i < (uint)text.Length && text[i].IsWhitespace())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something, but why not leverage the ReadOnlySpan<T>.Trim extension (from MemoryExtensions) here instead, which is also automatically vectorized? Same for the other trim methods from this type.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would expect the 99% case here to be 0 - 2 spaces, so creating the span would be more overhead.

I have played around with other unsafe helpers (that could be used in Match for example), that would take ref char to avoid the cost of a span creation, but that's for a different issue/pr.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, that makes perfect sense, didn't know the expected leading space was so small. Good call, thanks for the additional info! 馃槉

@MihaZupan
Copy link
Collaborator Author

Why don't you also add the netstandard.2.1 target while you're at it? Have you checked if there's some overlap with the .NET Core 3.1 APIs you're using and .NET Standard 2.1, so that those new features could also be available in the portable version?

We can target netstandard2.1 as well, I added core 3.1 for the string.GetPinnableReference API, I don't know of an alternative to get ref char from a string (without going through Span / pinning).

Regarding the ThrowHelpers, not sure if taking an external dependency is out of the questions, but if isn't, you might be interested in referencing Microsoft.Toolkit

Considering the simplicity of those methods, I wouldn't add a dependency. See #296 (comment).

@Sergio0694
Copy link

Sergio0694 commented Apr 7, 2020

We can target netstandard2.1 as well, I added core 3.1 for the string.GetPinnableReference API, I don't know of an alternative to get ref char from a string (without going through Span / pinning).

If you don't mind doing dirty tricks, there's this 馃檲

This also happens to be faster than GetPinnableReference, since it can skip the bounds check, though technically it is only guaranteed to work on CoreCLR-like runtimes. But, since you're using that inside the #if NETCOREAPP3_1 directive anyway, it should be fine. I understand if you didn't want to have stuff like this in the repo for maintanability though 馃槄

Considering the simplicity of those methods, I wouldn't add a dependency. See #296 (comment).

Yup, figured as much, makes perfect sense! 馃憤

@MihaZupan
Copy link
Collaborator Author

string's GetPinnableReference doesn't do any validation afaik (sharplab).

If you don't mind doing dirty tricks, there's this 馃檲

馃憖 Thanks for sharing this. While I don't mind unsafe, I would rather have a preprocessor directive here in this case while we support targets without the API 馃槃

@Sergio0694
Copy link

Sergio0694 commented Apr 7, 2020

馃憖 Well thank you for sharing that too!

I was sure string.GetPinnableReference() was doing a check as well, but apparently I was wrong and that only happens for Span<T>.GetPinnableReference() (see here). That's a bit weird, I wonder why do those APIs act so differently (assuming it's not just sharplab.io acting up). I'll need to go find the source in CoreCLR, hopefully there will be some comments there that will shed some light on this! 馃槃

Well what do you know! Here's the source from CoreCLR.

This is great, thanks again! TIL 馃嵒

EDIT: here's Span<T>.GetPinnableReference() instead, source from CoreCLR (with bounds check).

@Sergio0694
Copy link

Small note on that string.GetPinnableReference(), while it's great to have it built-in, looks like it unfortunately suffers from a very similar codegen issue to dotnet/runtime#12880.

In particular, we get this in x64:

public static ref char GetReference(this string text)
{
    return ref Unsafe.AsRef(in text.GetPinnableReference());   
}
C.GetReference(System.String)
    L0000: mov eax, [rcx]
    L0002: lea rax, [rcx+0xc]
    L0006: ret

That first mov eax, [rcx] looks unnecessary, since the value in eax is immediately overwritten and never actually used. Also I'm not even sure what that was supposed to be in the first place, since it's apparently reading a 32bit value (in eax), from the address of... The method table pointer? The first char is located at [rcx + 0xc] since that's 8 bytes for the method table pointer (x64) + 4 bytes for the int representing the length of the string. Might be a codegen leftover for the bounds check 馃

Anyway, hopefully it gets fixed in a future RyuJit update, thought it might be interesting to share 馃槃

@xoofx
Copy link
Owner

xoofx commented Apr 7, 2020

L0000: mov eax, [rcx]

It should be a check for NullReferenceException so that access to text is still safe.

@Sergio0694
Copy link

@xoofx Oh I see, thanks! Is that because if the address was null, trying to dereference [rcx+0xc] would have resulted in an access violation instead? As in, since that address would've been invalid but not exactlu null, so I'm not sure if the runtime is able to handle that the same? 馃

@xoofx
Copy link
Owner

xoofx commented Apr 7, 2020

@xoofx Oh I see, thanks! Is that because if the address was null, trying to dereference [rcx+0xc] would have resulted in an access violation instead? As in, since that address would've been invalid but not exactlu null, so I'm not sure if the runtime is able to handle that the same?

    L0002: lea rax, [rcx+0xc]

This instruction doesn't dereference anything but just calculate an offset. That means that if the first instruction mov eax, [rcx] was not emitted, you would return rax = 12 without any error in this function. So the problem is that you really want to know that text.GetPinnableReference() failed on a NullReferenceException not much later when accessing the pointer returned by GetReference

An access to the memory behind 0xc or 0x0 should generate a NullReferenceException, usually because it is in the first range of a 4K page fault that are mapped/protected to generate such exception.

@Sergio0694
Copy link

Ah, gotcha, that makes perfect sense. Also, good to know that memory accesses in that initial range would still properly trigger a NullReferenceException instead of a nasty AV, that's nice 馃槃

I guess if one really wanted to also skip that null check there, the solution would just be to either use the trick with the mapping type (sample here), or to just wait for .NET 5 (maybe?) to add some GetRawDataReference API or equivalent for the string type. Something like GetRawData (dotnet/runtime#28001), which for now is only for arrays as far as I know (GetArrayDataReference).

Of course these are arguably just very minor details, but mostly I was just interested in learning more about this in general. Thanks for taking the time to chime in, really appreciate it! 馃槉

@KrisVandermotten
Copy link
Contributor

KrisVandermotten commented Apr 13, 2020

I've been benchmarking the difference between the 128 bool array and the 16 byte array.

It seems that the bool array is indeed slightly faster (at least on my computer, .NET Core 3.1.3) on this benchmark:

public class Program
{
    private readonly string text = File.ReadAllText("spec.md");

    [Benchmark]
    public void TestMarkdig()
    {
        Markdown.Parse(text);
    }

    private static void Main(string[] args)
    {
        BenchmarkRunner.Run<Program>();
    }
}

@KrisVandermotten
Copy link
Contributor

LGTM

@xoofx
Copy link
Owner

xoofx commented Apr 18, 2020

Could you re-merge back master to your branch @MihaZupan?, the CI was broken and I had to fix it on master. You might have also to rerun the specs after merging. Thanks!

@MihaZupan
Copy link
Collaborator Author

Thanks for fixing up the CI! I rebased and also added a commit to cross-target standard2.1 as Sergio suggested

Copy link

@Sergio0694 Sergio0694 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not an actual review, just a couple notes out of curiosity on my part 馃槃
Great work with the PR!

@@ -10,45 +10,46 @@ namespace Markdig.Helpers
[ExcludeFromCodeCoverage]
internal static class ThrowHelper
{
public static void ArgumentNullException(string paramName) => throw new ArgumentNullException(paramName);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be missing something, but shouldn't all these methods be marked as [MethodImpl(MethodImplOptions.NoInlining)]?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They could be, but the Jit will never find inlining methods that always throw profitable afaik

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see, you're relying more on the JIT to do that for you. I guess I might just be overly cautious to always add that manually, just in case the code ends up running on some runtime with an older or less optimized JIT. But what you said makes perfect sense! 馃憤

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add them to document that that is the intent

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it should be handled automatically in CoreCLR now (see dotnet/coreclr#6103), but I guess it wouldn't hurt considering the lib targets .NET Standard, so it could be running on everything from .NET Framework to other runtimes as well 馃槉

@@ -6,7 +6,7 @@
<NeutralLanguage>en-US</NeutralLanguage>
<VersionPrefix>0.18.3</VersionPrefix>
<Authors>Alexandre Mutel</Authors>
<TargetFrameworks>net35;net40;netstandard2.0;netcoreapp2.1</TargetFrameworks>
<TargetFrameworks>netstandard2.0;netstandard2.1;netcoreapp2.1;netcoreapp3.1</TargetFrameworks>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Out of curiosity, what .NET Core 2.1 specific APIs are you using in the lib?
Wouldn't .NET Standard 2.0 be enough to replace that target too?

And as for .NET Core 3.1, I assume that's because you have some vectorized paths using SSE/AVX/etc.?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some APIs like TextWriter.Write(Span) aren't in standard 2.0 so that's why I added core2.1 before.
Core 3.1 is used only for string.GetPinnableReference in CharacterMap rn

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right now there's no manual hw intrinsics code in Markdig, but that can change now that we're refreshing build targets

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, targeting .NET Core 2.1 directly as a workaround to be able to use those TextWriter.Write(Span<T>) overloads makes sense, yeah. As for .NET Core 3.1 though, if it's just for that single string.GetPinnableReference call, wouldn't it make more sense to drop that target, just target .NET Standard 2.1, and pass ReadOnlySpan<char> values around (instead of string-s), and then just use MemoryMarshal.GetReference() to get a mutable ref from there? This would also allow you to just split/tokenize over ReadOnlySpan<T> instead of allocating substrings, in case you're doing that too.

Just a thought! 馃槃

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StringSlice is the type plummed throughout Markdig to avoid substring allocations. It provides good utility like the NextChar method that you would otherwise have to replace with slices/more manual index bookkeeping.

I'm not too worried about having the extra target since I can see us adding more code relying on new APIs in the future

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, yeah that makes perfect sense. I was just curious, thank you for taking the time to provide more info, I appreciate it! 馃槉

@xoofx xoofx merged commit 5c52a72 into xoofx:master Apr 18, 2020
@xoofx
Copy link
Owner

xoofx commented Apr 18, 2020

Amazing job @MihaZupan !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants