Unhandled Exception: System.Collections.Generic.KeyNotFoundException: The given key was not present in the dictionary. #331

Closed
vikingcodes opened this issue Jun 17, 2018 · 15 comments
@vikingcodes

Puppeteer Sharp throws the following exception:

Unhandled Exception: System.Collections.Generic.KeyNotFoundException: The given key was not present in the dictionary.


The main problem is that all of the Puppeteer Sharp code runs, but the program crashes afterwards.

@kblok
Member

kblok commented Jun 17, 2018

Could you post this issue following the issue template?
Thanks!

@vikingcodes
Author

I don't understand. Could you please clarify?

@kblok
Member

kblok commented Jun 17, 2018

When you create a new issue, you'll find a template to follow.
To help you, we need you to help us by following that template: tell us how to reproduce the problem, which version you are running, etc.

@Goregakalack

I have had the same issue with the network manager. After some investigation, it looks like the issue is caused by contention over two dictionaries in the NetworkManager class, “_requestIdToRequest” and “_interceptionIdToRequest”. Both dictionaries were accessed by multiple threads, which caused the KeyNotFoundException. Changing them to ConcurrentDictionary, so that access to them is thread-safe, seems to have solved the issue.

This also required similar changes to the “_responses” dictionary in Connection class and the “_callbacks” dictionary in CDPSession class.
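
To illustrate the kind of change described above, here is a minimal sketch of a request map that many threads can write to and read from at once. It is not PuppeteerSharp's actual code: `PendingRequest`, `RequestRegistryDemo`, and the `req-N` ids are hypothetical names made up for this example. Only `ConcurrentDictionary` and its `TryAdd`/`TryRemove` methods come from the .NET base class library.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a tracked request; not a PuppeteerSharp type.
public sealed class PendingRequest
{
    public PendingRequest(string id) => Id = id;
    public string Id { get; }
}

public static class RequestRegistryDemo
{
    // Registers `count` requests from many threads, then resolves them from
    // many threads, and returns how many lookups succeeded.
    public static int RegisterAndResolve(int count)
    {
        var requests = new ConcurrentDictionary<string, PendingRequest>();

        // Writers: simulate request events arriving on thread-pool threads.
        Parallel.For(0, count, i =>
        {
            var id = "req-" + i;
            requests.TryAdd(id, new PendingRequest(id));
        });

        // Readers: simulate responses arriving concurrently.
        var resolved = 0;
        Parallel.For(0, count, i =>
        {
            // TryRemove reports a missing key through its return value
            // instead of throwing KeyNotFoundException.
            if (requests.TryRemove("req-" + i, out _))
            {
                Interlocked.Increment(ref resolved);
            }
        });

        return resolved;
    }

    public static void Main()
    {
        Console.WriteLine(RegisterAndResolve(10000)); // prints 10000
    }
}
```

A plain `Dictionary<TKey, TValue>` is documented as unsafe for concurrent writes, which is consistent with both the KeyNotFoundException and the IndexOutOfRangeException reported in this thread. The `Try*` methods also let the caller decide what to do when a key is missing rather than crashing the process.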

kblok added the bug label Aug 23, 2018
@kblok
Member

kblok commented Aug 23, 2018

@Goregakalack Could you share a piece of code to reproduce that?

@vikingcodes
Author

@Goregakalack @kblok Have you fixed this issue?

@kblok
Member

kblok commented Aug 23, 2018

@virenderkverma Could you share a piece of code to reproduce the issue?

@Goregakalack

Goregakalack commented Sep 7, 2018

Sorry it took me so long to get back to you; I had quite a few deadlines to meet. Anyway, here is code to reproduce the issue. It may take a few attempts to trigger.

Exception thrown:

System.IndexOutOfRangeException
HResult=0x80131508
Message=Index was outside the bounds of the array.
Source=mscorlib
StackTrace:
at System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add)
at PuppeteerSharp.CDPSession.d__30.MoveNext()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.ValidateEnd(Task task)
at PuppeteerSharp.Request.d__58.MoveNext()
at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
at System.Runtime.CompilerServices.TaskAwaiter.GetResult()
at Puperteer_Sharp_Issues.Program.<>c__DisplayClass0_1.<b__1>d.MoveNext() in

```csharp
// Runs inside an async method. Needs usings for System, System.Collections.Generic,
// System.IO, System.Net, System.Threading.Tasks and PuppeteerSharp.
var fetcher = new BrowserFetcher();
await fetcher.DownloadAsync(BrowserFetcher.DefaultRevision);

var launcher = new Launcher();
var browser = await launcher.LaunchAsync(new LaunchOptions
{
    Headless = true
});

var urls = new List<string>
{
    "http://www.google.co.uk",
    "https://uk.yahoo.com",
    "https://www.youtube.com/",
    "https://www.bbc.co.uk",
    "https://www.bbc.co.uk/news",
    "https://www.bbc.co.uk/iplayer",
    //"https://www.microsoft.com/en-gb",
    "https://visualstudio.microsoft.com/"
};
var page = await browser.NewPageAsync();

await page.SetRequestInterceptionAsync(true);

page.Request += (s, o) =>
{
    // Each intercepted request is handled on its own thread-pool task,
    // so many RespondAsync calls run concurrently.
    Task.Run(async () =>
    {
        // Fetch the resource outside the browser.
        var httpRequest = WebRequest.CreateHttp(o.Request.Url);
        var response = (HttpWebResponse)httpRequest.GetResponse();

        var bytes = new byte[0];
        using (var ms = new MemoryStream())
        {
            using (var stream = response.GetResponseStream())
            {
                stream.CopyTo(ms);
            }

            bytes = ms.ToArray();
        }

        // Copy the response headers into the shape ResponseData expects.
        var headers = new Dictionary<string, object>();
        foreach (var key in response.Headers.Keys)
        {
            var strKey = key.ToString();
            headers.Add(strKey, response.Headers[strKey]);
        }

        // Continue
        await o.Request.RespondAsync(new ResponseData()
        {
            BodyData = bytes,
            ContentType = "UTF-8",
            Headers = headers
        });
    });
};

foreach (var url in urls)
{
    await page.GoToAsync(url, new NavigationOptions());

    var screenShotData = await page.ScreenshotDataAsync(new ScreenshotOptions()
    {
        Type = ScreenshotType.Jpeg,
        FullPage = true,
        OmitBackground = false
    });

    var guid = Guid.NewGuid();
    File.WriteAllBytes(@"D:\Projects\SandBox\SandBox\Screenshots\" + guid.ToString() + ".jpeg", screenShotData);
}

await page.CloseAsync();
page.Dispose();
await browser.CloseAsync();
```

@DominicBoettger
Contributor

Same here

System.Collections.Generic.KeyNotFoundException
  HResult=0x80131577
  Message=The given key '495CD48F501579A07D44F24EA34AD7D8' was not present in the dictionary.
  Source=System.Private.CoreLib
  StackTrace:
   at System.ThrowHelper.ThrowKeyNotFoundException[T](T key)
   at System.Collections.Generic.Dictionary`2.get_Item(TKey key)
   at PuppeteerSharp.NetworkManager.OnRequest(RequestWillBeSentPayload e, String interceptionId)
   at PuppeteerSharp.NetworkManager.<OnRequestInterceptedAsync>d__37.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at PuppeteerSharp.NetworkManager.<Client_MessageReceived>d__33.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()

@kblok
Member

kblok commented Oct 26, 2018

@DominicBoettger Do you have a piece of code to test?

@DominicBoettger
Contributor

DominicBoettger commented Oct 26, 2018

using System;
using System.IO;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Newtonsoft.Json.Linq;
using NDesk.Options;
using PuppeteerSharp;
using System.Reflection;
using System.Collections;

namespace com.inspirationlabs.prerenderer
{
    class Prerenderer
    {
        static string Host = "http://localhost:2015";
        static int Threads = Environment.ProcessorCount * 20;
        static string Jsonurl = "https://my.urls";
        static string OutputPath = Path.GetDirectoryName(Assembly.GetEntryAssembly().Location) + Path.DirectorySeparatorChar + "output";
        static DirectoryInfo Cwd;
        static void Main(string[] args)
        {
            var p = new OptionSet() {
               { "host=", "Set the hostname", v => Host = v },
               { "threads=", "Set the number of parallel threads", (int v) => Threads = v },
               { "jsonurl=", "Set the endpoint url to get the url list", v => Jsonurl = v},
               { "outputpath=", "Set the path to output the contents", v => OutputPath = v }
            };
            List<string> extra = p.Parse(args);
            
            try
            {
                // delete outputpath if it exists
                if(OutputPath.Length > 0 && Directory.Exists(OutputPath))
                {
                    Directory.Delete(OutputPath, true);
                }
                if(OutputPath.Length > 0)
                {
                    Console.WriteLine("Creating outputpath " + OutputPath);
                    Cwd = Directory.CreateDirectory(OutputPath);
                }
            } catch(Exception e)
            {
                Console.WriteLine(e.Message);
            }
            // wait for MainTask (async)
            Maintask().Wait();
        }

        static async Task Maintask()
        {
            try
            {
                HttpClient client = new HttpClient();
                HttpResponseMessage response = await client.GetAsync(Jsonurl);
                response.EnsureSuccessStatusCode();

                string responseBody = await response.Content.ReadAsStringAsync();
                JObject jObject = JObject.Parse(responseBody);
                JArray urldata = (JArray)jObject["data"];

                var fetcher = await new BrowserFetcher().DownloadAsync(BrowserFetcher.DefaultRevision);
                await DownloadAsync(urldata);
            } catch(Exception e)
            {
                Console.WriteLine(e.Message);
            }
        }
        
        // download data
        static async Task DownloadAsync(JArray urls)
        {
            Processing processing = new Processing();
            Queue<Page> qt = new Queue<Page>();


            using (SemaphoreSlim semaphore = new SemaphoreSlim(Threads))
            using (Browser browser = await Puppeteer.LaunchAsync(new LaunchOptions
            {
                Headless = false,
                Args =  new[] { "--no-sandbox", "--disable-setuid-sandbox" }
            }))
            {
                for (int i = 0; i <= Threads+50; i++)
                {
                    qt.Enqueue(await browser.NewPageAsync());
                }

                var tasks = urls.Select(async (urldata) =>
                {
                    if ((bool)urldata.SelectToken("published") && (bool)urldata.SelectToken("indexed"))
                    {
                        await semaphore.WaitAsync();
                        Page page = qt.Dequeue();
                        try
                        {
                            page.DefaultNavigationTimeout = 120000;
                            var setIsServer = @"
                            Object.defineProperty(window, 'isServer', {
                                get() {
                                    return true
                                }
                            });
                        ";
                            await page.EvaluateOnNewDocumentAsync(setIsServer);
                            await page.SetRequestInterceptionAsync(true);
                            page.Request += (sender, e) =>
                            {
                                string resType = e.Request.ResourceType.ToString();
                                if (resType == "Image" || resType == "Font")
                                {
                                    e.Request.AbortAsync();
                                }
                                else
                                {
                                    e.Request.ContinueAsync();
                                }
                            };
                            string path = (string)urldata.SelectToken("url");
                            string url = Host + path;
                            await page.GoToAsync(url, new NavigationOptions { WaitUntil = new[] { WaitUntilNavigation.Networkidle0 } });
                            string content = await page.GetContentAsync();

                            // put the result on the processing pipeline
                            processing.QueueItemAsync(content, path, OutputPath);
                        }
                        finally
                        {
                            qt.Enqueue(page);
                            semaphore.Release();
                        }
                    }

                });

                await Task.WhenAll(tasks.ToArray());
                // await processing.WaitForCompleteAsync();
            }
        }
    }
}

The problem occurs after roughly 2,000 rendered pages.

@kblok
Member

kblok commented Oct 26, 2018

@DominicBoettger I'm not getting that error. I created an array of 50 calls to Amazon.
Could you try awaiting e.Request.AbortAsync() and e.Request.ContinueAsync()?
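
For readers unsure what awaiting those calls would look like, here is a minimal sketch. So that it compiles without PuppeteerSharp or a browser, it uses hypothetical stand-ins (`FakeRequest`, `FakeResourceType`, `AwaitedHandlerDemo`) in place of the library's types; none of these names exist in PuppeteerSharp. The point is only the shape of the handler: the abort/continue task is awaited inside a `try` block, so a failure is observed where it happens instead of being dropped.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-ins; these are not PuppeteerSharp types.
public enum FakeResourceType { Document, Image, Font, Script }

public sealed class FakeRequest
{
    public FakeRequest(FakeResourceType type) => ResourceType = type;
    public FakeResourceType ResourceType { get; }
    public string Outcome { get; private set; } = "pending";

    public async Task AbortAsync() { await Task.Yield(); Outcome = "aborted"; }
    public async Task ContinueAsync() { await Task.Yield(); Outcome = "continued"; }
}

public static class AwaitedHandlerDemo
{
    // Awaits the abort/continue call so that any exception it throws
    // surfaces in the catch block below.
    public static async Task HandleAsync(FakeRequest request)
    {
        try
        {
            if (request.ResourceType == FakeResourceType.Image ||
                request.ResourceType == FakeResourceType.Font)
            {
                await request.AbortAsync();
            }
            else
            {
                await request.ContinueAsync();
            }
        }
        catch (Exception ex)
        {
            Console.WriteLine("Request handling failed: " + ex.Message);
        }
    }

    public static async Task Main()
    {
        var image = new FakeRequest(FakeResourceType.Image);
        var script = new FakeRequest(FakeResourceType.Script);
        await HandleAsync(image);
        await HandleAsync(script);
        Console.WriteLine(image.Outcome + ", " + script.Outcome); // prints "aborted, continued"
    }
}
```

With the real library, the equivalent would presumably be an `async` lambda subscribed to `page.Request` that awaits `e.Request.AbortAsync()` or `e.Request.ContinueAsync()`. Awaiting makes failures visible, but it does not by itself make shared dictionaries thread-safe.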

@DominicBoettger
Contributor

DominicBoettger commented Oct 26, 2018

I tried this, but it did not work either. I also only get the error after more than 2,000 requests.
I removed:

await page.SetRequestInterceptionAsync(true);
page.Request += (sender, e) =>
{
    string resType = e.Request.ResourceType.ToString();
    if (resType == "Image" || resType == "Font")
    {
        e.Request.AbortAsync();
    }
    else
    {
        e.Request.ContinueAsync();
    }
};

After that my code works.

@kblok
Member

kblok commented Oct 28, 2018

@DominicBoettger I just published v1.9. Could you give it a try?

@kblok
Member

kblok commented Nov 29, 2018

Fixed in #720.

kblok closed this as completed Nov 29, 2018
4 participants