RavenDB Data Correction Tool

A data correction tool for RavenDB v3.5. The source code of the console application that uses these functions is in the Raven.DataCorrector project.

Problem #1

Not so long ago, while I was at my previous company, we encountered a serious problem on one of our RavenDB instances. Some records couldn't be read from the Voron store, and the logs contained the following exception:

System.IO.InvalidDataException: Failed to de-serialize metadata of document documentCollection/3c93ae96-aa62-42b0-afa3-4bb693cbcdd4 ---> System.IO.EndOfStreamException: Attempted to read past the end of the stream.
   at Raven.Abstractions.Extensions.StreamExtensions.ReadEtag(Stream stream) in c:\Builds\RavenDB-Stable-3.0\Raven.Abstractions\Extensions\StreamExtensions.cs:line 162
   at Raven.Database.Storage.Voron.StorageActions.DocumentsStorageActions.ReadDocumentMetadata(String normalizedKey, Slice sliceKey, Int32& size)
   --- End of inner exception stack trace ---
   at Raven.Database.Storage.Voron.StorageActions.DocumentsStorageActions.ReadDocumentMetadata(String normalizedKey, Slice sliceKey, Int32& size)
   at Raven.Database.Storage.Voron.StorageActions.DocumentsStorageActions.DocumentByKey(String key)
   at Raven.Database.Storage.Voron.StorageActions.DocumentsStorageActions.<GetDocumentsWithIdStartingWith>d__15.MoveNext()
   at Raven.Database.Actions.DocumentActions.<>c__DisplayClassf.<GetDocumentsWithIdStartingWith>b__d(IStorageActionsAccessor actions)
   at Raven.Storage.Voron.TransactionalStorage.ExecuteBatch(Action`1 action)
   at Raven.Storage.Voron.TransactionalStorage.Batch(Action`1 action)
   at Raven.Database.Actions.DocumentActions.GetDocumentsWithIdStartingWith(String idPrefix, String matches, String exclude, Int32 start, Int32 pageSize, CancellationToken token, Int32& nextStart, Action`1 addDoc, String transformer, Dictionary`2 transformerParameters, String skipAfter)
   at Raven.Database.Actions.DocumentActions.GetDocumentsWithIdStartingWith(String idPrefix, String matches, String exclude, Int32 start, Int32 pageSize, CancellationToken token, Int32& nextStart, String transformer, Dictionary`2 transformerParameters, String skipAfter)
   at Raven.Database.Server.Controllers.DocumentsController.DocsGet()
   at lambda_method(Closure , Object , Object[] )
   ...

A big chunk of the exception stack has been removed above, since I considered it irrelevant.

As a result, the indexes for the affected collection were skipping the corrupt records, and replication stopped completely. It was an operational database, and we had to bring it back up as soon as we could.

The first thing we did was open an issue in the RavenDB Google group. Oren Eini responded and immediately assumed a hard disk failure (which we also believe was the case), due to which the documents had been written to the Voron store incompletely. He suggested exporting all data and importing it into a fresh RavenDB instance. Unfortunately, exporting wasn't an option: the exporter reads all records in etag order, and on failing to read a record it comes to a halt. I had to come up with a way to make our database exportable, so I started digging into the dark depths of the RavenDB source code.

Solution

By following the chain of calls in the stack trace I ended up in the DocumentsStorageActions.ReadDocumentMetadata() method. It turned out that the metadata for those documents was either missing entirely or only partially present, and couldn't be deserialized. The only option I could think of was to somehow force new metadata into the document and make it readable again, and DocumentsStorageActions.TouchDocument() turned out to be exactly the function I needed. Normally it simply updates the document's etag, i.e. "touches" it without really changing anything. To make it work for my case I had to modify it slightly, so that when the metadata turns out to be unreadable the function overwrites it with a new metadata object. I called my modification TouchCorruptDocumentPub(), where the Pub suffix signifies that the function is publicly accessible.

public void TouchCorruptDocumentPub(string key, out Etag preTouchEtag, out Etag afterTouchEtag, Etag seekAfterEtag)
{
    if (string.IsNullOrEmpty(key))
        throw new ArgumentNullException("key");

    var normalizedKey = CreateKey(key);
    var normalizedKeySlice = (Slice)normalizedKey;

    if (!tableStorage.Documents.Contains(Snapshot, normalizedKeySlice, writeBatch.Value))
    {
        if (logger.IsDebugEnabled)
            logger.Debug("Document with dataKey='{0}' was not found", key);
        preTouchEtag = null;
        afterTouchEtag = null;
        return;
    }

    int _;
    JsonDocumentMetadata metadata;
    try
    {
        // The happy path: the metadata is intact and deserializes normally.
        metadata = ReadDocumentMetadata(normalizedKey, normalizedKeySlice, out _);

        Console.WriteLine($"Metadata loaded for key '{key}', {metadata.Etag}");
    }
    catch (Exception)
    {
        // The metadata is corrupt: rebuild it from scratch, recovering the
        // document's etag by scanning the KeyByEtag index.
        Console.WriteLine($"Metadata failed to load for key '{key}'");
        Console.WriteLine($"Looking for an etag for key '{key}', starting at {seekAfterEtag ?? Etag.Empty}");

        metadata = new JsonDocumentMetadata()
        {
            Etag = FindEtagByKey(normalizedKey, seekAfterEtag),
            Key = key,
            Metadata = new RavenJObject()
        };

        Console.WriteLine($"Found etag {metadata.Etag}");
    }

    var newEtag = uuidGenerator.CreateSequentialUuid(UuidType.Documents);
    Console.WriteLine($"Generated new etag {newEtag}");

    afterTouchEtag = newEtag;
    preTouchEtag = metadata.Etag;
    metadata.Etag = newEtag;

    // Write the (possibly rebuilt) metadata back, then re-point the
    // KeyByEtag index from the old etag to the new one.
    WriteDocumentMetadata(metadata, normalizedKeySlice, shouldIgnoreConcurrencyExceptions: true);

    var keyByEtagIndex = tableStorage.Documents.GetIndex(Tables.Documents.Indices.KeyByEtag);

    keyByEtagIndex.Delete(writeBatch.Value, preTouchEtag);
    keyByEtagIndex.Add(writeBatch.Value, newEtag, normalizedKey);

    documentCacher.RemoveCachedDocument(normalizedKey, preTouchEtag);
    etagTouches.Add(preTouchEtag, afterTouchEtag);

    if (logger.IsDebugEnabled) { logger.Debug("TouchDocument() - document with key = '{0}'", key); }
}
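
For illustration, here is roughly how the patched function might be invoked from the console application. This is only a sketch under assumptions: the storage variable (an initialized Voron TransactionalStorage), the cast of accessor.Documents to DocumentsStorageActions, and the sample document key are mine, not code from this repository.

// Hedged usage sketch: `storage` and the cast below are assumptions.
storage.Batch(accessor =>
{
    var documents = (DocumentsStorageActions)accessor.Documents;

    Etag preTouchEtag, afterTouchEtag;
    documents.TouchCorruptDocumentPub(
        "documentCollection/3c93ae96-aa62-42b0-afa3-4bb693cbcdd4",
        out preTouchEtag,
        out afterTouchEtag,
        seekAfterEtag: Etag.Empty); // scan for the lost etag from the very beginning

    Console.WriteLine($"etag before: {preTouchEtag}, after: {afterTouchEtag}");
});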

This fixed the problem for individual documents. Next I had to figure out how to fix all of the affected documents, so here is the function that traverses the etag tree and attempts to load each corresponding document and its metadata:

public IEnumerable<Tuple<Etag, string, bool, bool, Exception>> GetKeysAfterWithIdStartingWithPub(
    Etag etag,
    int take = int.MaxValue,
    Etag untilEtag = null,
    TimeSpan? timeout = null,
    Reference<bool> earlyExit = null,
    Action<List<DocumentFetchError>> failedToGetHandler = null,
    bool includeMetadataCanBeReadFlag = false,
    bool includeDocumentCanBeReadFlag = false)
{
    if (earlyExit != null)
        earlyExit.Value = false;
    if (take < 0)
        throw new ArgumentException("must have zero or positive value", "take");

    if (take == 0)
        yield break;

    if (string.IsNullOrEmpty(etag))
        throw new ArgumentNullException("etag");

    Stopwatch duration = null;
    if (timeout != null)
        duration = Stopwatch.StartNew();

    using (var iterator = tableStorage.Documents.GetIndex(Tables.Documents.Indices.KeyByEtag)
        .Iterate(Snapshot, writeBatch.Value))
    {
        var slice = (Slice)etag.ToString();
        if (iterator.Seek(slice) == false)
            yield break;

        if (iterator.CurrentKey.Equals(slice)) // need gt, not ge
        {
            if (iterator.MoveNext() == false)
                yield break;
        }

        int fetchedDocumentCount = 0;
        Etag docEtag = etag;

        var errors = new List<DocumentFetchError>();
        var skipDocumentGetErrors = failedToGetHandler != null;

        do
        {
            docEtag = Etag.Parse(iterator.CurrentKey.ToString());

            // We can skip many documents, so the timeout check has to run
            // at the start of every iteration.
            if (timeout != null)
            {
                if (duration.Elapsed > timeout.Value)
                {
                    if (earlyExit != null)
                        earlyExit.Value = true;
                    break;
                }
            }

            if (untilEtag != null)
            {
                // This is not a failure, we are just ahead of where we expected to be.
                if (EtagUtil.IsGreaterThan(docEtag, untilEtag))
                    break;
            }

            var key = GetKeyFromCurrent(iterator);

            int metadataSize = -1;
            int documentSize = -1;
            var sliceKey = (Slice)key;
            Exception exception = null;
            if (includeDocumentCanBeReadFlag || includeMetadataCanBeReadFlag)
            {
                try
                {
                    // Attempt to deserialize the metadata and, optionally, the
                    // document body; the sizes stay at -1 when a read fails.
                    var metadata = ReadDocumentMetadata(key, sliceKey, out metadataSize);
                    if (includeDocumentCanBeReadFlag)
                        ReadDocumentData(key, sliceKey, metadata.Etag, metadata.Metadata, out documentSize);
                }
                catch (Exception ex)
                {
                    exception = ex;
                    // ignored, surfaced through the yielded tuple instead
                }
            }

            yield return Tuple.Create(docEtag, key, metadataSize > 0, documentSize > 0, exception);

            fetchedDocumentCount++;
            if (fetchedDocumentCount >= take)
            {
                if (untilEtag != null && earlyExit != null)
                    earlyExit.Value = true;
                break;
            }
        } while (iterator.MoveNext());

        if (skipDocumentGetErrors && errors.Count > 0)
        {
            failedToGetHandler(errors);
        }
    }
}

It yields a Tuple with five items:

  • Item1 = the document etag
  • Item2 = the document key
  • Item3 = a flag indicating that the metadata could be read
  • Item4 = a flag indicating that the document data could be read
  • Item5 = the exception, in case of an error

Note that Item3 and Item4 are only meaningful when includeMetadataCanBeReadFlag and includeDocumentCanBeReadFlag are set, respectively.

The results can be stored in a CSV file or any other human-readable format. That helped us identify all broken records and later pass them to TouchCorruptDocumentPub() for correction.
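
For illustration, a full scan might look roughly like the following. It is a sketch under the same assumptions as the previous one (the storage instance and the cast are mine), and the report file name is arbitrary.

// Hedged scan sketch: enumerate every key/etag pair starting from the very
// first etag, check that metadata and data are readable, and dump a CSV report.
storage.Batch(accessor =>
{
    var documents = (DocumentsStorageActions)accessor.Documents;

    using (var csv = new StreamWriter("scan-report.csv"))
    {
        csv.WriteLine("etag,key,metadataOk,documentOk,error");

        foreach (var doc in documents.GetKeysAfterWithIdStartingWithPub(
            Etag.Empty,
            includeMetadataCanBeReadFlag: true,
            includeDocumentCanBeReadFlag: true))
        {
            csv.WriteLine($"{doc.Item1},{doc.Item2},{doc.Item3},{doc.Item4},{doc.Item5?.Message}");
        }
    }
});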

We established the following process, which helped us extract all data and then import it into a fresh instance, as Oren suggested:

  1. Bring the db offline
  2. Make a copy of the database folder
  3. Get the list of all keys/etags
  4. Find the corrupt records
    • Correct them (make sure DocumentTouchCorruptSeekAfterEtag is set to either Empty or an etag lower than the first record in the list); see the sketch after this list
  5. Export all corrupt records into a JSON file
  6. Delete all corrupt records
  7. Find duplicate records (in the sense of unique constraints, if any)
    • Correct them (this requires bringing the db online)
  8. Verify the db for corrupt records again
  9. If no corruption is found, bring the db online
  10. Start the export
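
To give an idea of what the correction in step 4 looks like in code, here is a hedged sketch that reads the scan report produced earlier and touches every record whose metadata could not be read. The CSV layout and the storage instance are assumptions carried over from the previous sketches.

// Hedged correction sketch: pick the rows whose metadata flag is false
// and force-touch each of them with the modified function.
var corruptKeys = File.ReadAllLines("scan-report.csv")
    .Skip(1)                            // skip the CSV header
    .Select(line => line.Split(','))
    .Where(cols => cols[2] == "False")  // metadataOk == false
    .Select(cols => cols[1])            // the document key
    .ToList();

storage.Batch(accessor =>
{
    var documents = (DocumentsStorageActions)accessor.Documents;
    foreach (var key in corruptKeys)
    {
        Etag before, after;
        // Etag.Empty corresponds to DocumentTouchCorruptSeekAfterEtag = Empty
        documents.TouchCorruptDocumentPub(key, out before, out after, Etag.Empty);
    }
});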

This is how this tool was born; later I continued adding more functions to it. For example, you can make a rough comparison between two databases by exporting all etags/document keys from each, saving them as CSV files, and then letting the tool output a report with the discovered differences.
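
A rough sketch of such a comparison, using nothing but standard .NET collections; the file names and the etag,key line format are my assumptions:

// Hedged diff sketch: load `etag,key` lines exported from two databases
// and report the keys that exist on one side only.
var left = new HashSet<string>(File.ReadAllLines("db-a.csv").Select(l => l.Split(',')[1]));
var right = new HashSet<string>(File.ReadAllLines("db-b.csv").Select(l => l.Split(',')[1]));

foreach (var key in left.Except(right))
    Console.WriteLine($"only in A: {key}");
foreach (var key in right.Except(left))
    Console.WriteLine($"only in B: {key}");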

Problem #2

Later that year we encountered another problem. The following exception was logged:

---> System.IO.InvalidDataException: Failed to de-serialize a document: documentCollection/480fcbd7-f051-444d-8c9e-cd19c774d8e7 ---> System.IO.IOException: Encrypted stream is not correctly salted with the document key. 
---> System.IO.InvalidDataException: The encrypted stream's salt was different than the expected salt.
   at Raven.Database.Bundles.Encryption.CodecSaltExtensions.ReadSalt(Stream stream, String key) in C:\Builds\RavenDB-3.5-Patch\Raven.Database\Bundles\Encryption\Codec.cs:line 213

As before, we contacted Oren and got these responses back:

  • What I think happened is that you have an encrypted db, and at some point someone reset the user password. RavenDB uses DPAPI to store the secret key, so if you reset the password, that provides a different key and may result in a silent key change for RavenDB. Later documents are fine, but older documents are using the old (lost) key.
  • We found a bug that may cause this issue on Voron systems, and we have a hotfix for that (a stable release is in the works). Since you have a replica that works, you can switch it over, make the secondary the primary, and replicate back from it.

In this case there was of course nothing we could do, but the tool helped us quickly identify all affected documents and take the appropriate measures.

Helpers:

  • ="""" & TRIM(A1) & """: """ & TRIM(B1) & ""","  (an Excel formula that turns a key column A and a value column B into "key": "value", JSON lines)
  • RavenJObject.Parse(data)  (parses the assembled JSON text back into a document object)
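
To show how I believe the two helpers fit together, here is a small hedged sketch; the sample field names and values are made up:

using Raven.Json.Linq;

// Lines like these come out of the Excel formula above, one per spreadsheet row.
var lines = new[]
{
    "\"FirstName\": \"John\",",
    "\"LastName\": \"Doe\",",
};

// Wrap the generated lines in braces, drop the trailing comma, and parse
// the result back into a RavenJObject.
var data = "{" + string.Join("\n", lines).TrimEnd(',') + "}";
var document = RavenJObject.Parse(data);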
