Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix debug assert in CachingReader #2309

Merged
merged 11 commits into from
Oct 3, 2019
Merged

Conversation

daschuer
Copy link
Member

@daschuer daschuer commented Oct 1, 2019

I have finally managed to dig to the ground of the assertion and created a solution with a lower impact on the way. Sorry for the concurrent PR, but IMHO this is a best way the get a clear view on the issue for both of us. The consecutive track load fix of your PR is not included here.

The issue was that the engine stops calling process when the deck is ejected. This is why the m_stateFIFO is not purged as originally assumed after unloading the old track but after loading the new track. This does not matter in the track change case, but is an issue when ejecting the track first.
in this case the idle signal is delayed and overwrites the loading state leading to the assert violation.

The fix now uses TRACK_LOADED and TRACK_UNLOADED as a border in the m_stateFIFO fifo to distinguish old chunks from new.

I have also moved the freeAllChunks to the engine thread and call it whenever such a border appears.

@uklotzde
Copy link
Contributor

uklotzde commented Oct 2, 2019

How about emitting and forwarding signals through direct connections between different threads???

@Be-ing
Copy link
Contributor

Be-ing commented Oct 2, 2019

FWIW, I just played a 3.5 hour set with this and the track not having sound output bug never occurred.

@uklotzde
Copy link
Contributor

uklotzde commented Oct 2, 2019

So you prefer a hacky solution for 2.2 relying on some undefined behavior?

Btw, the Xenial build is broken.

@daschuer
Copy link
Member Author

daschuer commented Oct 2, 2019

How about emitting and forwarding signals through direct connections between different threads???

This is an unrelated detail here.

I agree that these callbacks, called form a different thread than other function in a class can be supprising and should be avoided if posdible. We need to make sure that it is at least well documented but for now I did not want to touch it.

@daschuer
Copy link
Member Author

daschuer commented Oct 2, 2019

The Xenial error is interesting, because I have successfully build it here on Xenial.

A solution would be to replace the m_status with a qAtomicInt

src/engine/cachingreader.cpp:4:
/usr/include/c++/4.8/atomic: In member function ‘virtual void CachingReader::newTrack(TrackPointer)’:
/usr/include/c++/4.8/atomic:199:9: error: invalid memory model for ‘__atomic_store’
       { __atomic_store(&_M_i, &__i, _m); }

@uklotzde
Copy link
Contributor

uklotzde commented Oct 2, 2019

I will not make any compromises regarding quality and won't take responsibility for this hack.

@daschuer
Copy link
Member Author

daschuer commented Oct 2, 2019

I think we have no other chance to do so here IMHO, without starting a major refactoring of the engine.

So you prefer a hacky solution for 2.2 relying on some undefined behavior?

At least the behaviour id well defined.

The direct connection was never part of the issue we have experienced and hopefully solve here. And IMHO callbscks from different threads to change atomic member variables is a normal pattern in a lock free solutions and have sometimes less hassle than a cued connection from the main thread like you propose.

@uklotzde
Copy link
Contributor

uklotzde commented Oct 2, 2019

The queued connection goes from a worker thread (without an event queue) to the main thread and not vice versa.

What's even worse is that we forward directly connected signals from CachingReaderWorker (running an engine worker thread) through CachingReader into the unknown.

@daschuer
Copy link
Member Author

daschuer commented Oct 2, 2019

Not into the unknown, into the engine.

The signal forwarding, in this case a simple callback with some Qt suggar, is part of the legacy where caching reader and worker where one class.

I think there is some room for improvments, but the point here is that the design was not recently introduced and is not part of the issue we solve here.

As explained, the original issue was, that the engine stops calling process after eject.

The second issue you have discovered with the consequtive track loads is sill pending, though, but a new, hopeful no real live issue. But worth to solve since we are aware now.

@Be-ing
Copy link
Contributor

Be-ing commented Oct 2, 2019

I would prefer a hacky solution to going down a rabbit hole of refactoring on a stable branch.

@uklotzde
Copy link
Contributor

uklotzde commented Oct 2, 2019

I now agree that invoking freeAllChunks() in newTrack() was wrong.

We should annotate all functions with the thread they are supposed to be called and document the restrictions carefully. This is class is a dangerous beast.

@uklotzde
Copy link
Contributor

uklotzde commented Oct 2, 2019

We should replace the signals with regular function calls to avoid further confusion. Then separate the functions into groups associated with threads.

auto pChunk = update.takeFromWorker();
if (pChunk) {
DEBUG_ASSERT(m_state != State::Idle);
Copy link
Contributor

@uklotzde uklotzde Oct 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to move the debug assertion. The condition must be valid while looping, independent of the update message. Ignore my comment. But I still don't understand when and why this might happen.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the original position it happen when you have ejected the track.
In this case the "idle" state has overwritten the "loading" state because a null track was loaded. After loading a new non null track we receive the "loaded" signal while in idle state.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should receive TRACK_LOADED/UNLOADED messages only while the state is Loading. The Idle state is set after we have received TRACK_UNLOADED. We should not receive any messages from the worker until the state has been changed from Idle to Loading again.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that the cue is not processed after ejecting the track. That is why the idle message is still in the queue and overwrite the loading message.
This is not much of an issue, but I see how it is surprising.
I will look how to solve this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not an issue to overwrite the loading message because the very next message is the loaded message and the cue was purged until the loading state was there. Purging is stopped by the idle message.

Can we solve this by renaming?

@daschuer
Copy link
Member Author

daschuer commented Oct 2, 2019

hmm odd. std::atomic m_state; works on my xenial system but not on Travis.
QAtomicInteger m_state; dos not work because std::is_integral fails for enums.
even if i put ": int" to the definition.

@daschuer daschuer force-pushed the 2.2_cachingreader branch 2 times, most recently from 2dcaef6 to ff61eeb Compare October 3, 2019 09:22
<< "Cannot load new track while loading a track";
return;
}
m_worker.newTrack(std::move(pTrack));
Copy link
Contributor

@uklotzde uklotzde Oct 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The final line m_worker.workReady(); got lost along the way Ok, I see you moved it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from your template :-)

@uklotzde
Copy link
Contributor

uklotzde commented Oct 3, 2019

I'm convinced that the code now works as expected! My previous approach didn't consider all the intended side effects and was wrong. Thanks for stopping me ;)

The design is still a mess, but I'm not going to rewrite it. At least not in C++. This code could be written much easier, safer, and cleaner in Rust. Maybe an option for the future.

freeAllChunks();
auto newState = pTrack ? STATE_TRACK_LOADING : STATE_TRACK_UNLOADING;
auto oldState = m_state.fetchAndStoreAcquire(newState);
VERIFY_OR_DEBUG_ASSERT(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this can currently actually happen in case the worker is stalled for some reasons.
We need to prevent it in the engine that this case is really impossible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, from the caching reader perspective this is not longer an issue, since CachingReaderWorker::loadTrack handles the loading and is not interrupted. It is an engine issue, however we cannot judge from m_state, because this is delayed due to the fifos.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ups, the issue is the slotLoadingTrack(TrackPointer pNewTrack, TrackPointer pOldTrack) everywhere.
New Track is only a wild guess if we allow another new track loading. In most cases pNewTrack is not used but not in every case. We need to clean up this which is risky due to the string based signal slot connections.

So lets stick with this assertion, until we manage to make it happen, but not as critical.
I will add a comment.

@daschuer
Copy link
Member Author

daschuer commented Oct 3, 2019

OK, we must not assert possible things, so I replaced it with a warning().
For the caching reader it is no problem, because no chunks are cached between two loads.
I have improved the assertion for that.

It "only" lead to inconsistency GUI, no reason for quit Mixxx and stop the party.

I have manually tested it a bit with a sleep in loadTrack() and nothing "critical" happens.

We should target this in 2.3 if it turns out to be a real live issue.

I am ready here.

Idle,
TrackLoading,
TrackLoaded,
enum State {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why was this changed from an enum class to an enum? Please change it back to an enum class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we need integers for the atomic. I didn't check if this still works with an enum class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So cast it to an int and back when needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually had a version with a static_cast() one every use. But @uklotzde version with a plain enum conviced me, because it is less distracting. In this case we have no namespace pollution issue, because the enum is class private.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this special case the enum literals are just private integer constants and a classical enum is a good fit.

@uklotzde uklotzde added this to the 2.2.3 milestone Oct 3, 2019
// but the newTrack may change if we load a new track while the previous one
// is still loading. This leads to inconsistent states for example a different
// track in the Mixx Title and the Deck label.
if (oldState != STATE_TRACK_LOADING ||
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Condition needs to be inverted: == && ==

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ups .. yes!

@@ -218,6 +223,7 @@ void CachingReader::process() {
ReaderStatusUpdate update;
while (m_readerStatusUpdateFIFO.read(&update, 1) == 1) {
auto pChunk = update.takeFromWorker();
qDebug() << "CachingReader::process()" << update.status;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This debug log will be printed too frequently, not a good idea.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ups a leftover.

@@ -143,6 +143,9 @@ void CachingReaderWorker::loadTrack(const TrackPointer& pTrack) {
// Emit that a new track is loading, stops the current track
emit trackLoading();

// Slows down track loading for debug
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either be more specific "for debugging race conditions while loading multiple tracks in a row" or delete the comment.

@daschuer
Copy link
Member Author

daschuer commented Oct 3, 2019

Done.

@uklotzde
Copy link
Contributor

uklotzde commented Oct 3, 2019

I'm confident that we ironed out all the undefined behavior while avoiding trapdoors. LGTM.

@uklotzde uklotzde merged commit b9de27a into mixxxdj:2.2 Oct 3, 2019
@Be-ing
Copy link
Contributor

Be-ing commented Oct 3, 2019

Great, thank you both for your hard work on this critical issue. I think we should wait at least a week or two before releasing 2.2.3 to see if anyone reports bugs from these changes.

I just merged the 2.2 branch to master.

@daschuer daschuer deleted the 2.2_cachingreader branch September 26, 2021 17:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants