Memory Access Violation when using CBFDeserializer #2964

Closed
mjmckp opened this issue Feb 16, 2018 · 6 comments

Comments

@mjmckp

mjmckp commented Feb 16, 2018

The fully self-contained C# code below reproduces an access violation caused by CBFDeserializer.

Loading the file and doing a single pass through the data works without error (i.e., when MaxSweeps == 1). However, when MaxSweeps > 1, the following access violation occurs at the start of the second pass:

Exception thrown at 0x00007FFF2D9CC3C7 (vcruntime140.dll) in MSBuild.exe: 0xC0000005: Access violation reading location 0x000001AA4460C844. occurred
>	vcruntime140.dll!memcpy() Line 137	Unknown	Symbols loaded.
 	Cntk.Composite-2.4.dll!00007fff19c2a834()	Unknown	No symbols loaded.
 	Cntk.Composite-2.4.dll!00007fff19c2a03a()	Unknown	No symbols loaded.
 	Cntk.Composite-2.4.dll!00007fff19c18603()	Unknown	No symbols loaded.
 	Cntk.Core-2.4.dll!00007ffede550e4f()	Unknown	No symbols loaded.
 	Cntk.Core-2.4.dll!00007ffede557b1b()	Unknown	No symbols loaded.
 	Cntk.Core-2.4.dll!00007ffede5558d3()	Unknown	No symbols loaded.
 	Cntk.Core-2.4.dll!00007ffede553f7a()	Unknown	No symbols loaded.
 	Cntk.Core-2.4.dll!00007ffede557e52()	Unknown	No symbols loaded.
 	Cntk.Core-2.4.dll!00007ffede545dbd()	Unknown	No symbols loaded.
 	Cntk.Core-2.4.dll!00007ffede5457ed()	Unknown	No symbols loaded.
 	msvcp140.dll!Concurrency::details::`anonymous namespace'::_Task_scheduler_callback(_TP_CALLBACK_INSTANCE * _Pci, void * _Args, _TP_WORK * __formal) Line 158	C++	Symbols loaded.
 	ntdll.dll!TppWorkpExecuteCallback()	Unknown	Symbols loaded.
 	ntdll.dll!TppWorkerThread()	Unknown	Symbols loaded.
 	kernel32.dll!BaseThreadInitThunk()	Unknown	Symbols loaded.
 	ntdll.dll!RtlUserThreadStart()	Unknown	Symbols loaded.

The access violation occurs with both CPU and GPU devices. Note also that this error does not occur with other data sets where the binary files have been generated by the same method.

Code to reproduce the exception:

using System;
using System.IO;
using System.Text;
using System.Collections.Generic;
using System.Linq;

namespace CNTK.AccessViolation
{
    class Program
    {
        static void Main(string[] args)
        {
            //
            // Download this file from the following link, then edit the path below:
            //    https://www.dropbox.com/s/6hcaj886zan37lm/j2j5t1nu.3ts?dl=0
            //
            var file = @"C:\temp\j2j5t1nu.3ts";
            Run(file, 1); // Works fine
            Run(file, 2); // Access violation at start of second sweep!
        }

        /// <summary>
        ///  Iterates through the data in the CBF file maxSweeps times.
        /// </summary>
        /// <param name="file"></param>
        /// <param name="maxSweeps"></param>
        static void Run(string file, int maxSweeps)
        {
            Console.WriteLine("*** Running with MaxSweeps = {0} ****", maxSweeps);
            var batchSize = 500u;
            Dictionary<string, StreamConfiguration> streams = null;
            using (var src = GetMinibatchSource(file, true, maxSweeps, out streams))
            {
                var streamInfo = src.StreamInfo(streams.Keys.First());
              //using (var device = DeviceDescriptor.GPUDevice(0)) // note: access violation occurs with both CPU and GPU devices
                using (var device = DeviceDescriptor.CPUDevice)
                {
                    var mbcount = 0;
                    var sweeps = 0;
                    while (sweeps < maxSweeps)
                    {
                        mbcount++;
                        using (var mb = src.GetNextMinibatch(batchSize, device))
                        {
                            var value = mb[streamInfo];
                            if (value.sweepEnd)
                                sweeps++;
                            if (mbcount % 1000 == 0 || value.sweepEnd)
                                Console.WriteLine("Minibatch {0}: NSamples={1} SweepEnd={2}", mbcount, value.numberOfSamples, value.sweepEnd);
                        }
                    }
                }
            }
            Console.WriteLine("done");
        }

        /// <summary>
        /// Creates a MinibatchSource for the given CBF file
        /// </summary>
        /// <param name="file"></param>
        /// <param name="randomise"></param>
        /// <param name="maxSweeps"></param>
        /// <param name="streams"></param>
        /// <returns></returns>
        static MinibatchSource GetMinibatchSource(string file, bool randomise, int maxSweeps, out Dictionary<string, StreamConfiguration> streams)
        {
            streams = InspectCBFFile(file);
            var streamConfigurations = new StreamConfigurationVector();
            foreach (var stream in streams.Values)
            {
                streamConfigurations.Add(stream);
            }
            var deserialiser = CNTKLib.CBFDeserializer(file, streamConfigurations);
            var config = new MinibatchSourceConfig(new DictionaryVector(new CNTKDictionary[] { deserialiser }), randomise);
            config.MaxSweeps = (ulong) maxSweeps;
            return CNTKLib.CreateCompositeMinibatchSource(config);
        }

        /// <summary>
        /// Prints out a summary of the data contained in a CNTK binary file, returns StreamConfigurations
        /// </summary>
        /// <param name="file"></param>
        static Dictionary<string, StreamConfiguration> InspectCBFFile(string file)
        {
            using (var strm = File.Open(file, FileMode.Open, FileAccess.Read, FileShare.Read))
            {
                using (var reader = new BinaryReader(strm))
                {
                    if (reader.ReadUInt64() != 0x636e746b5f62696e)
                        throw (new Exception("Not a CBF file"));
                    if (reader.ReadUInt32() != 1)
                        throw (new Exception("Unexpected CBF version"));
                    strm.Seek(-8L, SeekOrigin.End);
                    var headerStart = reader.ReadInt64();
                    if (headerStart <= 0L || headerStart >= strm.Position)
                        throw (new Exception("Invalid header start"));
                    strm.Seek(headerStart, SeekOrigin.Begin);
                    if (reader.ReadUInt64() != 0x636e746b5f62696e)
                        throw (new Exception("Corrupt header"));
                    var numChunks = reader.ReadUInt32();
                    var numStreams = reader.ReadUInt32();
                    Console.WriteLine("File contains {0} chunks and {1} streams", numChunks, numStreams);
                    var streams = new Dictionary<string, StreamConfiguration>();
                    for (var i = 0; i < numStreams; i++)
                    {
                        var dense = (reader.ReadByte() == 0);
                        var bytes = reader.ReadBytes((int)reader.ReadUInt32());
                        var name = new String((new ASCIIEncoding()).GetChars(bytes));
                        var elementType = (reader.ReadByte() == 0) ? "float" : "double";
                        var dim = reader.ReadUInt32();
                        Console.WriteLine("stream: {0} storage: {1} name: {2} type: {3} dim: {4}", i, (dense ? "dense" : "sparse"), name, elementType, dim);
                        var config = new StreamConfiguration(name, dim, !dense, name);
                        streams.Add(name, config);
                    }

                    var totalSamples = 0;
                    for (var i = 0; i < numChunks; i++)
                    {
                        var offset = reader.ReadInt64();
                        var numSeq = reader.ReadUInt32();
                        var numSamples = reader.ReadUInt32();
                        Console.WriteLine("chunk: {0} offset: {1} numSeq: {2} numSamples: {3}", i, offset, numSeq, numSamples);
                        totalSamples += (int)numSamples;
                    }
                    Console.WriteLine("total samples: {0}", totalSamples);

                    return streams;
                }
            }
        }

    }
}

FYI, I am using the CNTK.GPU NuGet package version 2.4.0 on Windows 10.

@bencherian
Contributor

I believe this is the same issue as what I reported in #2905. The BlockRandomizer has a logic bug which causes memory that holds samples from sweep N to be freed when crossing into sweep N+1. When you have a minibatch that crosses sweep boundaries you will get access violation errors. You can use the diff in the PR #2906 I submitted (and then closed because it breaks existing tests) to build a version that prevents minibatches from crossing sweep boundaries.
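
To make the workaround described above concrete, here is a minimal, purely illustrative sketch of what "preventing minibatches from crossing sweep boundaries" means. It is not CNTK's BlockRandomizer code and not the diff from PR #2906; the names `SweepBatching` and `BatchSizes` are hypothetical, and the sketch only computes minibatch sizes rather than touching any sample memory.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only (not part of CNTK).
static class SweepBatching
{
    // Returns the size of each minibatch when iterating totalSweeps times over
    // samplesPerSweep samples, truncating the last minibatch of every sweep so
    // that no minibatch mixes samples from sweep N and sweep N+1.
    public static List<int> BatchSizes(int samplesPerSweep, int batchSize, int totalSweeps)
    {
        if (samplesPerSweep <= 0 || batchSize <= 0 || totalSweeps < 0)
            throw new ArgumentOutOfRangeException();

        var sizes = new List<int>();
        for (var sweep = 0; sweep < totalSweeps; sweep++)
        {
            var remaining = samplesPerSweep;
            while (remaining > 0)
            {
                // Never take more than what is left in the current sweep.
                var take = Math.Min(batchSize, remaining);
                sizes.Add(take);
                remaining -= take;
            }
        }
        return sizes;
    }
}
```

With 1200 samples per sweep, a minibatch size of 500 and 2 sweeps, this yields minibatches of 500, 500, 200, 500, 500, 200 samples. Without the truncation, the third minibatch would take 200 samples from the first sweep and 300 from the second, which is the boundary-crossing case described above.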

@haixpham

Same as #2479. It has not been fixed yet.

@mjmckp
Author

mjmckp commented Feb 18, 2018

This is quite a major bug that would have been easily found with the most basic unit tests in the first place, and it's been known about for 4 months?

@ke1337

ke1337 commented Feb 21, 2018

Thanks for the report and detailed analysis, we'll work on a fix.

ke1337 pushed four commits that referenced this issue Feb 23, 2018
@ke1337 ke1337 closed this as completed Feb 23, 2018
@mjmckp
Author

mjmckp commented Feb 23, 2018

Thanks for the prompt fix! When are you next planning to update the NuGet packages to pick up the changes?

@ke1337

ke1337 commented Feb 23, 2018

I should thank you for providing the repro :). The NuGet package will be updated with the next release. In the meantime, please try building from source, or wait until a nightly build is publicly available.

main76 pushed a commit to main76/CNTK that referenced this issue Mar 7, 2018