Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#7476 fix logic and syntax of queue fully_acked method #7524

Closed
wants to merge 2 commits into from

Conversation

original-brownbear
Copy link
Member

@original-brownbear original-brownbear commented Jun 23, 2017

Fixes #7476

  • Syntax error in wrapped_acked_queue.rb

@colinsurprenant
Copy link
Contributor

Thanks @original-brownbear for looking into this.

The fix for the empty? syntax error + test looks good.

For the isFullyAcked() method, I am not sure about this: I see why you want isFullyAcked() to return true for the empty? method but I wonder if semantically it makes sense to say that an empty page is fully acked?

Also, won't that break this condition here where a new head page will be then considered fully acked and immediately transformed into a tail page?

if (this.headPage.isFullyAcked()) {
// purge the old headPage because its full and fully acked
// there is no checkpoint file to purge since just creating a new TailPage from a HeadPage does
// not trigger a checkpoint creation in itself
TailPage tailPage = new TailPage(this.headPage);
tailPage.purge();

@original-brownbear
Copy link
Member Author

original-brownbear commented Jun 27, 2017

@colinsurprenant npnp :)

method but I wonder if semantically it makes sense to say that an empty page is fully acked?

I'd go with yes on this one, because "nothing is unacknowledged" must mean "fully acknowledged".
Anything else is really hard to make into consistent algebra imo.

Also, won't that break this condition here where a new head page will be then considered fully acked and immediately transformed into a tail page?

if (this.headPage.isFullyAcked()) {
// purge the old headPage because its full and fully acked
// there is no checkpoint file to purge since just creating a new TailPage from a HeadPage does
// not trigger a checkpoint creation in itself
TailPage tailPage = new TailPage(this.headPage);
tailPage.purge();

Counter question :) Why does an empty page fail the free space check wrapping that logic?

if (! this.headPage.hasSpace(data.length)) {

            if (! this.headPage.hasSpace(data.length)) {

@colinsurprenant
Copy link
Contributor

@original-brownbear

I'd go with yes on this one, because "nothing is unacknowledged" must mean "fully acknowledged".
Anything else is really hard to make into consistent algebra imo.

Interesting - yeah empty set algebra is weird. So both "all elements are acknowledged" and "all elements are unacknowledged" would be true at the same time on an empty set.

Maybe we should probably rename that method to reflect the initial intention of looking for all elements on a non empty queue?

Counter question :) Why does an empty page fail the free space check wrapping that logic?

I'll let you look into this - seems important to figure before merging.

@original-brownbear
Copy link
Member Author

@colinsurprenant

Maybe we should probably rename that method to reflect the initial intention of looking for all elements on a non empty queue?

+1 in general, it would be less confusing. What I really don't like about it is that we have to negate every call to it we make at the moment, which really messes up the git history. Not sure we want to do that while we fix a bug?

So both "all elements are acknowledged" and "all elements are unacknowledged" would be true at the same time on an empty set.

Yea we aren't asking if all elements are unacknowledged anywhere in our code, this seems like a metaphysics question. If we would ask both questions, we should probably just throw on a call to this method for an empty page and gate all use cases by a count or isEmpty check (imo).

@colinsurprenant
Copy link
Contributor

+1 in general, it would be less confusing. What I really don't like about it is that we have to negate every call to it we make at the moment, which really messes up the git history. Not sure we want to do that while we fix a bug?

Agree. Can I then suggest you change the empty? to both check for the empty page case and the is_fully_acked? case?

for the isFullyAcked() method, either we keep your change or not - I would argue maybe not, unless we prove that this is not/will not create problems elsewhere, and defer the renaming/semantic change to another PR?

this seems like a metaphysics question.

Oh, totally :P it was just to illustrate the empty set algebra metaphysical confusion.

@original-brownbear
Copy link
Member Author

@colinsurprenant

Can I then suggest you change the empty? to both check for the empty page case and the is_fully_acked? case?

But then we add 2 synchronized Ruby to Java calls + we don't have any empty check implemented in the Queue. Do we really want to add another method to Queue? Worse yet, you'd have to lock across both calls to get a correct answer?

@colinsurprenant
Copy link
Contributor

I am just pointing out the potential problem of changing the semantic of isFullyAcked() and maybe for the purpose of this bug fix it might be better to check for actual queue emptiness in the Ruby Empty? method instead of introducing that in isFullyAcked().

@original-brownbear
Copy link
Member Author

@colinsurprenant

check for actual queue emptiness

I don't think we want to do that, at least with my understanding of PQ. If the PQ says that "there is nothing else to process", shouldn't that mean that all reads from the Queue were acknowledged? Otherwise restarting LS may lead to repeated processing (not that it will in practice probably), but that's what this change would make possible as an accidental side effect of changing pipeline code.

@colinsurprenant
Copy link
Contributor

Oh, I meant to somehow move the check this.elementCount == 0 into the Ruby empty? method instead of adding it in the isFullyAcked() method.

@original-brownbear
Copy link
Member Author

original-brownbear commented Jun 27, 2017

@colinsurprenant Yes, but would have to then lock over the two operations since they're not atomic:
(this.elementCount == 0and isFullyAcked())

Plus, you'd have to add a new isEmpty() to java somehow in my opinion, calling count isn't really what you want to do on this kind of datastructure (since it either requires indexing or what we currently have, the full read of a queue before processing it).

Also our count methods are currently not neither synchronized nor atomic

    public long getAckedCount() {
        return headPage.ackedSeqNums.cardinality() + tailPages.stream()
                .mapToLong(page -> page.ackedSeqNums.cardinality())
                .sum();
    }

    public long getUnackedCount() {
        long headPageCount = (headPage.getElementCount() - headPage.ackedSeqNums.cardinality());
        long tailPagesCount = tailPages.stream()
                .mapToLong(page -> (page.getElementCount() - page.ackedSeqNums.cardinality())).sum();
        return headPageCount + tailPagesCount;
    }

and hence not correct when called without holding org.logstash.ackedqueue.Queue#lock. (this fact should probably get a new issue either way)

I don't think I can do this kind of thing (count calls in a concurrent situation) without a bunch of potential issues.

@colinsurprenant
Copy link
Contributor

Ok. But adding the check into isFullyAcked() is IMO dangerous at this point for the purpose of this bug fix. I guess there are many ways this could be fixed if you agree with the idea of deferring the change to isFullyAcked() ? I would be ok to create extra specific code for this if required until we refactor isFullyAcked().

@original-brownbear
Copy link
Member Author

@colinsurprenant idk man :), we only use that function in 4 production code spots. Do we really want to add more tech debt here, when we already have a bug that resulted from undocumented/inconsistent behavior?
I guess this is more of a question for @suyograo or @jordansissel (not sure who manages the deadlines :D), if we have to have this fixed now without the risk of any side effect I'm on board with some hacky workaround like adding yet another function.
But if we have some time ... I really wouldn't pile onto things here.

@original-brownbear
Copy link
Member Author

@colinsurprenant also note that all tests still pass + one more that reproduced a bug. Maybe you can suggest some tests we could add to rule out potential problems you see? Better to invest the time into that compared to investing it into hacks?

@jordansissel
Copy link
Contributor

not sure who manages the deadlines

We don't really have deadlines for code in most cases. If a change is not ready when a release is cut, then that change can go into the next release.

@colinsurprenant
Copy link
Contributor

again my concern is to minimize the risk of introducing a change that could potentially have a unwanted side-effect. We have essentially 3 choices:

  • we abort mission of this one and defer the fix into a better refactor
  • we somehow find a fix for this now while minimizing potential side-effects
  • we move on with the proposed change in isFullyAcked() but also prove that all its usages are safe

I am ok with any of these. Since you introduced this PR I'll let you choose what you prefer.

@original-brownbear
Copy link
Member Author

@jordansissel so how does that tie in with the specific issue here? I mean if something fixes a failing test, but passes all tests and we have no deadline it seems wrong to add tech debt instead of it on the grounds of avoiding the risk of introducing bugs?

@original-brownbear
Copy link
Member Author

@colinsurprenant

but also prove that all its usages are safe

sure let's do this, as I asked above, what tests would prove this?

@colinsurprenant
Copy link
Contributor

oh, and BTW, passing tests is only an indication and not a proof that a code change is correct.

@original-brownbear
Copy link
Member Author

@colinsurprenant I don't think we'll achieve algorithmic prove in this code, tests will have to do? :D

@colinsurprenant
Copy link
Contributor

sure let's do this, as I asked above, what tests would prove this?

Not sure - haven't looked into that really. Do you want to do it?

@original-brownbear
Copy link
Member Author

@colinsurprenant

Not sure - haven't looked into that really. Do you want to do it?

happy to, but I need an actionable goal, I'd like to avoid this one going stale like #6998 sorry :)

@colinsurprenant
Copy link
Contributor

not sure what is missing here? I will try to recap my concern: you are changing the behaviour of isFullyAcked() which I think can have side effects. I would prefer we not change the behaviour of isFullyAcked() for this bug fix unless it is verified that the change in behaviour does not have any unwanted side effects. I specifically pointed at a possible side effect in the Queue.write() method.

@original-brownbear
Copy link
Member Author

original-brownbear commented Jun 28, 2017

@colinsurprenant

. I specifically pointed at a possible side effect in the Queue.write() method.

Yes but I need some clear cut scenario you want to see a test for/verified to work. Otherwise I'll have to take a guess here and we'll keep going in circles if I'm wrong :(
Also I'm not even sure that the scenario exists as it would simply imply the existence of another bug in the free space method?

@colinsurprenant
Copy link
Contributor

@original-brownbear I think there is a misunderstanding on our respective roles here. You took the initiative to write this code to propose a fix. As the reviewer I am pointing at a possible issue. I am expecting you will try to receive that comment as something constructive toward improving that code and look into it and possibly propose solutions. I don't think you need any more clear cut scenarios and I am sure you can apply the same level of initiative to work toward improving this. If I knew exactly what the problem was, how to solve it and what tests are missing I would have said it already. For me to get to that point would require I spend more time digging into this. Since this is your code change proposal I am expecting you will move it forward while trying to honestly consider any reviewer comment that are made here.

As I said there are many ways this can move forward, be it a temp fix with possible tech debt, a bigger refactor, further investigation into a potential extra bug in the free space method, etc. There is no right or wrong.

@original-brownbear
Copy link
Member Author

@colinsurprenant I get you point really :) but look:

If I knew exactly what the problem was, how to solve it and what tests are missing I would have said it already

So is there even an actionable bug/issue here at all, if there isn't one how would I prove there isn't?
My problem is that I'm asked to add tests unspecified issues and could invest hours only to learn that my guesses as to the nature of them, were wrong (again according to unspecified criteria?)?

@colinsurprenant
Copy link
Contributor

Okay. I'll circle back to try and move this to some closure at some point.

@colinsurprenant colinsurprenant removed their request for review June 28, 2017 13:31
@colinsurprenant colinsurprenant removed their assignment Jun 28, 2017
@original-brownbear
Copy link
Member Author

@suyograo fixed just the syntax and removed the test and will assign @colinsurprenant to the actual issue then as discussed.

@colinsurprenant
Copy link
Contributor

LGTM - this will solve the first part of #7568.

@elasticsearch-bot
Copy link

Armin Braun merged this into the following branches!

Branch Commits
5.x 1e0ed35, d6262d1
master c307b18, 1000794

elasticsearch-bot pushed a commit that referenced this pull request Jun 29, 2017
elasticsearch-bot pushed a commit that referenced this pull request Jun 29, 2017
Fixes #7524
elasticsearch-bot pushed a commit that referenced this pull request Jun 29, 2017
Fixes #7524
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants