Issue on my side with replay #127

Closed
JakubKad opened this issue Jan 16, 2023 · 3 comments
Labels
question Further information is requested

Comments

@JakubKad

Hello,

I am here again to plead for help. I have a problem with the replay. I will go through the steps I have taken, then describe the problem and the solution I have in mind.

  1. Take a DB backup
  2. Take transaction log backups to get to the state where the capture starts
  3. Capture the workload into the SQLite file
  4. Restore the DB backup and transaction logs
  5. Reseed the identity columns (see the sketch after this list)
  6. Start the replay
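
A minimal sketch of what the reseed step (5) might look like, using standard T-SQL identity reseeding; the table name and seed value below are placeholders, not taken from the actual environment:

```sql
-- Report the current identity value without changing it
DBCC CHECKIDENT ('dbo.OrderItems', NORESEED);

-- Reseed so the next generated identity value matches the capture-time state
-- (1837421 is a placeholder value)
DBCC CHECKIDENT ('dbo.OrderItems', RESEED, 1837421);
```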

(screenshots attached to the original issue)

Errors

2022-12-14 11:57:36.0902 - Info - WorkloadTools.Consumer.Replay.ReplayConsumer : 1301000 events queued for replay ( 28% )
2022-12-14 11:57:36.8018 - Info - WorkloadTools.Consumer.Replay.ReplayConsumer : 1302000 events queued for replay ( 28% )
2022-12-14 11:57:37.7756 - Info - WorkloadTools.Consumer.Replay.ReplayConsumer : 1303000 events queued for replay ( 28% )
2022-12-14 11:57:38.6316 - Info - WorkloadTools.Consumer.Replay.ReplayWorker : Worker [209] - 6000 commands executed.
2022-12-14 11:57:38.6316 - Info - WorkloadTools.Consumer.Replay.ReplayWorker : Worker [209] - 90000 commands pending.
2022-12-14 11:57:38.6350 - Info - WorkloadTools.Consumer.Replay.ReplayWorker : Worker [209] - Last Event Sequence: 176656
2022-12-14 11:57:38.6350 - Info - WorkloadTools.Consumer.Replay.ReplayWorker : Worker [209] - 205 commands per second.
2022-12-14 11:57:38.8016 - Warn - WorkloadTools.Consumer.Replay.ReplayWorker : Worker [84] - Sequence[383713] - Error: The INSERT statement conflicted with the FOREIGN KEY constraint "FK_RevenueRecognition_OrderItems". The conflict occurred in database "DB", table "dbo.OrderItems", column 'OrderItem_ID'. The statement has been terminated.

Description

The server is in continuous operation and cannot be put into a state where nothing is running on it; it cannot be shut down even for a while. I am able to get the database to a state identical to the point when the capture started. The problem I am unable to resolve is the continuous stream of data to the server: when I start the capture, the server is still running queries that started earlier and have not yet completed, so they cannot be recorded in the SQLite file.

Because of this, I am not able to handle errors that may occur during replay, as the replayed statements may work with data that is recorded neither in the capture nor in the backup/logs. How should I address this issue? I was thinking of leaving a larger time window so I can be sure that all in-flight queries have finished, then going through the SQLite file and removing the queries whose changes are already in the database (a very inconvenient and time-consuming way of solving it).

When I tried the tool on a local instance to get familiar with it, I didn't have these problems, of course, and everything worked fine.

Thanks for any advice.

@spaghettidba
Owner

I'm not really sure that you can avoid this type of error completely. There will always be a number of events that you will not be able to replay, especially at the beginning of the replay itself.
The ideal solution depends on what you are trying to achieve: are you trying to compare production with test, or test with test? What is the purpose of the replay?
Depending on the answer, you could:

  1. Ignore the errors. If you are comparing a first replay on test with a second replay on test, both replays will hit the same errors, so the comparison is still fair.
  2. Remove the offending events from the captured workload. If you need the replay to contain zero errors, you could open the source .sqlite file with a SQLite client such as DB Browser for SQLite and delete the events that are causing trouble; you could even decide to delete the first 5 or more minutes of events (see the sketch below). If this solution suits you, I could add a property to make the replay skip a certain number of minutes or events before enqueueing events for replay.
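
A minimal sketch of option 2, assuming the captured events live in a table named Events with start_time and event_sequence columns; these names are assumptions, so check the real schema in DB Browser for SQLite before running anything:

```sql
-- Assumption: an Events table with a datetime-parseable start_time column
-- and an event_sequence column; verify against the actual capture schema first.

-- Drop everything captured in the first 5 minutes of the capture
DELETE FROM Events
WHERE start_time < datetime((SELECT MIN(start_time) FROM Events), '+5 minutes');

-- Or drop a single offending event by its sequence number
-- (383713 is the sequence reported in the warning above)
DELETE FROM Events
WHERE event_sequence = 383713;
```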

Hope this helps

@spaghettidba added the question label on Jan 17, 2023
@JakubKad
Author

JakubKad commented Mar 26, 2023

Hello,

After a while we reached a conclusion. We used a marked transaction (BEGIN TRANSACTION ... WITH MARK) to get a reference point. Using this, we restored the backup and logs to the MARK point and started the replay from the beginning (when the capture started); a sketch of the approach is below.
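
A sketch of the marked-transaction approach; the database name, file paths, helper table, and mark name are placeholders, not the actual commands used:

```sql
-- On the production server, immediately before starting the capture.
-- The transaction must perform at least one logged write in the marked
-- database for the mark to be recorded; dbo.ReplayMarker is a hypothetical
-- helper table used only to generate that write.
BEGIN TRANSACTION CaptureStart WITH MARK 'Workload capture start';
    UPDATE dbo.ReplayMarker SET MarkedAt = SYSDATETIME();
COMMIT TRANSACTION CaptureStart;

-- On the replay server, restore up to (and including) the marked transaction.
RESTORE DATABASE DB FROM DISK = N'C:\Backup\DB.bak' WITH NORECOVERY;
RESTORE LOG DB FROM DISK = N'C:\Backup\DB_log.trn'
    WITH STOPATMARK = 'CaptureStart', RECOVERY;
-- Use STOPBEFOREMARK instead if the marked transaction itself should be excluded.
```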

A quick little question at the end: the capture creates a big TEMP file, and these small cache files are left behind after the capture has concluded. Do they contain data from the main TEMP file and are therefore used to inject some small samples, or are they just used to check the capture (sqlite) file?

Edit: I know there is a CACHE option in the JSON config (we have not defined it), but I am not entirely sure of the purpose of these files. There were around 100 of them before we shut everything down for good, because the console was no longer writing anything and only these CACHE files kept appearing (we used the timer option in the JSON).

(screenshot attached to the original comment)

@spaghettidba
Owner

Glad you sorted it out!
Regarding the cache files, those are for caching events to disk before processing them. The events queue is a memory-mapped file and needs the cache to avoid using all the memory on the host. I'm surprised those files don't get deleted, though; this is not the intended behavior. Thanks for reporting it, I'll have a look at the code.
