Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async IO support #231

Closed
wants to merge 19 commits into from
Closed

Conversation

naeemkhedarun
Copy link

Currently a work in progress, please do not merge.

Synchronous implementation

Unfortunately it was difficult trying to have two code branches for sync and async. All the synchronous methods have been made async as things have bubbled up. We can can make these synchronous again using .Wait() and .Result. The drawback to this (besides performance) is that we'll need to unwrap aggregateexceptions on the wait out.

Locking on SyncRoot

I've had to replace lock(SyncRoot) with a SemaphoreSlim implementation (see EngineLock), so now code looks like:

return SyncRoot.StartAsync(() => NoOp (cancellationToken));

If you want to remove the lambda call for something else for performance let me know what you have in mind.

Yield return and IEnumerable

Unfortunately these are not compatible with async/await. We can offer a blocking implementation based on .Result but I think it would be better to offer consumers an async callback when we get each page and avoid IEnumerable. Alternatively we can use an IObservable instead. Thoughts?

Unsafe functions

I've used the blocking .Wait and Results in some unsafe functions since its not compatible with async. We can try to break apart the unsafe bits, but for now this is a little bit beyond me, perhaps you could take a look and offer some thoughts?

@@ -301,7 +302,7 @@ protected MailService ()
/// <exception cref="ProtocolException">
/// A protocol error occurred.
/// </exception>
public virtual Task ConnectAsync (string host, int port = 0, SecureSocketOptions options = SecureSocketOptions.Auto, CancellationToken cancellationToken = default (CancellationToken))
public virtual async Task ConnectAsync (string host, int port = 0, SecureSocketOptions options = SecureSocketOptions.Auto, CancellationToken cancellationToken = default (CancellationToken))
{
if (host == null)
throw new ArgumentNullException ("host");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now if host is null exception will be propagated on task

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are correct we don't need async here, I've removed it. The ArgumentNullException will be raised immediately instead of on await.

@naeemkhedarun naeemkhedarun changed the title Async 2 Async IO support Aug 20, 2015
}
}, cancellationToken, TaskCreationOptions.None, TaskScheduler.Default);
}
await SyncRoot.StartAsync(async () => await Connect(host, port, options, cancellationToken));
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can avoid closure here but that will require a bunch of generic overloads of StartAsync

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SyncRoot.StartAsync(
    this,
    host,
    port,
    options,
    cancellationToken,
    async (thisRef, h, p, o, token) => await thisRef.Connect(h, p, o, token));

It's more code to write but it doesn't allocate closure.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is slightly indeed slightly longer, but I agree removing the closure would be good. Perhaps remove the lambda altogether?

Would the below be safe and efficient?

using(var synclock = SyncLock.Take()){
    await Connect(host, port, options, cancellationToken)
}

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems 🆗

@jstedfast
Copy link
Owner

Locking on SyncRoot

Probably best to just say we no longer guarantee locking and leave it to the caller.

Yield return and IEnumerable

Do you have a list of these? Are these just APIs that return IEnumerable<T>? If so, I don't think any of them actually return IEnumerable, I bet they all return a List<T>. Just change the API to return IList<T> and be done with it.

That's probably what I did with my ImapFolder.cs async port a few months ago before I gave up on porting ImapClient.cs.

If you mean that the IMailFolder and IMailSpool interfaces implementing IEnumerable is causing problems, I would vote to just drop that interface. I doubt anyone actually uses it anyway.

/// </exception>
public void Flush (CancellationToken cancellationToken)
{
Write(buffer, offset, count, CancellationToken.None).Wait();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using .GetAwaiter().GetResult(). The difference between it and .Wait()/.Result in that if Task faulted, result exception will be thrown directly instead of wrapper AggregateException.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Maxwe11 that's a great tip! I had no idea. I'll use that everywhere I was forced to use a Wait / Result.

@naeemkhedarun
Copy link
Author

Sorry about the formatting, we have fairly different editor settings. I've done my best to undo what was done and minimise the whitespace noise in the diff. e70fa3f

The primitives have been put back to their keyword counterparts. Everytime I refactored signatures and propagated the async around it would automatically change then. Unfortunately I hadn't even realised these were changing. Fixed: 3296c2b

Locking on SyncRoot

I would be happy to refactor the EngineLock to avoid using closures, and add a feature toggle to allow people to use their own locking outside. Or I can remove the locks altogether, I don't mind on this one.

Yield return and IEnumerable

With very large folders I understand that yield returning one element at a time will stream the messages bit by bit through the IEnumerable interface rather than having to wait for all items to load. But I'm not 100% on this. I think if that is correct, then I agree we should drop the interface.

If you can think of another way of providing people with a simple way to iterate through folder messages, I'd be happy to take some time to implement it.

@jstedfast
Copy link
Owner

before you spend much more time on this - since I'd like to continue supporting .NET 4.0... does this asyncification really make that much of an improvement over the old code? Or is this just checking off a checkbox for no real gain (but a quite substantial loss in terms of no longer being able to support .NET 4.0)?

One of the reasons I gave up on my async port was that I got some emails back from the people who originally requested this feature that they had decided to give the existing releases a try and discovered that all of their performance concerns were null and void.

Here's my expectation:

I suspect that the new async code will be slower than the old sync code, but that the new async code might be faster than the old async code.

For those that want to use the pure async API, this might seem like a gain, but realistically, things would be more performant if you manage your own threads and largely use the sync API because you generally want to chain certain commands anyway.

@naeemkhedarun
Copy link
Author

Honestly it depends on the use case. Our intended use case is a backend cluster which needs to process thousands of IMAP accounts.

Our previous implementation used to use synchronous calls and the result was significant resource use, thread pool size and contention. After refactoring to an async implementation we could support an order of magnitude more work on a single box (we are now CPU constrained).

I understand the changes might not be ideal for all use cases. I will complete this port so I can run a PoC on our systems against the current library and see where we go from there.

Our current library is inefficient with memory (which is hurting our GC) and I was hoping to add a byte memory pool like https://github.com/Microsoft/Microsoft.IO.RecyclableMemoryStream to MailKit to help alleviate that.

@naeemkhedarun
Copy link
Author

http://blogs.msdn.com/b/bclteam/archive/2013/04/17/microsoft-bcl-async-is-now-stable.aspx Suggests that there might be a way to maintain a build for .net 4.0.

@@ -1,4 +1,4 @@
<?xml version="1.0" encoding="utf-8"?>
<?xml version="1.0" encoding="utf-8"?>
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems BOM overwritten

@naeemkhedarun
Copy link
Author

As an update, I am having quite a bit of difficulty making the IO reads within MimeParser async. Any thoughts or suggestions would be welcome! Async + Unsafe really don't go together.

I'm trying to work around it by manually using ContinueWiths to return tasks but honestly I don't think its the right way to go.

@Maxwe11
Copy link

Maxwe11 commented Aug 24, 2015

As an update, I am having quite a bit of difficulty making the IO reads within MimeParser async. Any thoughts or suggestions would be welcome! Async + Unsafe really don't go together.

@naeemkhedarun could you point the code please?

@naeemkhedarun
Copy link
Author

I haven't checked it in yet, here is some horrible jiggery pokery:

        unsafe Task<bool> StepByteOrderMark(byte* inbuf)
        {
            int bomIndex = 0;

            var tcs = new TaskCompletionSource<Boolean>();

            Task tx = null;
            tx = ReadAhead(ReadAheadSize, 0).ContinueWith(t =>
            {
                if (t.Result <= 0)
                {
                    // failed to read any data... EOF
                    inputIndex = inputEnd;
                    tcs.SetResult(false);
                }
                byte* inptr = inbuf + inputIndex;
                byte* inend = inbuf + inputEnd;

                while (inptr < inend && bomIndex < UTF8ByteOrderMark.Length && *inptr == UTF8ByteOrderMark[bomIndex])
                {
                    bomIndex++;
                    inptr++;
                }

                inputIndex = (int) (inptr - inbuf);

                if (inputIndex == inputEnd)
                    tx.Start();

                tcs.SetResult(true);
            }, token);

            return tcs.Task;
        }

This is allowing some tests to pass but I'm not sure how safe it is and whether I should push through...

@naeemkhedarun
Copy link
Author

None of the tests exercise the condition if (inputIndex == inputEnd) and I'm pretty sure the tx.Start() isn't correct.

@Maxwe11
Copy link

Maxwe11 commented Aug 24, 2015

So just wrap only unsafe lines

async Task Foo()
{
    await ReadAhead(...);
    unsafe
    {
        byte* inptr = inbuf + inputIndex;
        byte* inend = inbuf + inputEnd;
        ...
    }
    ...
}

@naeemkhedarun
Copy link
Author

Unfortunately since the function takes a pointer as a parameter the entire function has to be marked unsafe.

@naeemkhedarun
Copy link
Author

Couple more questions:

  1. Is it OK to hold a reference to inbuf from an unsafe fields rather than passing it through function parameters?
  2. How can we solve the last bit of the puzzle (remove the blocking GetResult):
public async Task<MimeMessage> ParseMessage (CancellationToken cancellationToken = default (CancellationToken))
{
    token = cancellationToken;

    unsafe {
        fixed (byte* inbuf = input)
        {
            _inbuf = inbuf;
            return ParseMessageAsync().GetAwaiter().GetResult();
        }
    }
}

@Maxwe11
Copy link

Maxwe11 commented Aug 24, 2015

Is it OK to hold a reference to inbuf from an unsafe fields rather than passing it through function parameters?

Don't think it's a good idea

How can we solve the last bit of the puzzle (remove the blocking GetResult):

needs more context: point original code

@jstedfast
Copy link
Owner

The reason that the MimeParser pases around a byte* inbuf is for 2 reasons:

  1. it avoids having to write more fixed blocks to get that pointer reference
  2. a lot of the smaller methods called from inner loops were causing performance problems when they did originally have their own fixed blocks due to register allocation.

One thing to note is that the inbuf parameter always points to the beginning of the input buffer and is never moved around. Since the input buffer is readonly, I think it should be safe to assume that the pointer is fine.

@naeemkhedarun
Copy link
Author

Thanks for the details guys, I've been reading more into this to better understand it.

It looks like my options are:

  1. Use GCHandle.Alloc to pin the array and get a pointer, then ensure this is unpinned once we exit.
  2. Copy the array into unmanaged memory using Marshal.AllocHGlobal which avoids the GC altogether but incurs the cost of the copy.

I can try both and see what the performance penalties are, would you be open to either of those as solutions?

@naeemkhedarun
Copy link
Author

I decided to use GCHandle.Alloc and made some further progress on this. The IMAP tests are passing!

I'll continue to fix the remainder of the tests and fix code style issues.

@jstedfast
Copy link
Owner

I've got a new async branch in the works so closing this.

@jstedfast jstedfast closed this Nov 24, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants