#7476 fix logic and syntax of queue fully_acked method #7524

original-brownbear · 2017-06-23T11:52:57Z

Syntax error in wrapped_acked_queue.rb

colinsurprenant · 2017-06-27T17:46:45Z

Thanks @original-brownbear for looking into this.

The fix for the empty? syntax error + test looks good.

For the isFullyAcked() method, I am not sure about this: I see why you want isFullyAcked() to return true for the empty? method but I wonder if semantically it makes sense to say that an empty page is fully acked?

Also, won't that break this condition here where a new head page will be then considered fully acked and immediately transformed into a tail page?

logstash/logstash-core/src/main/java/org/logstash/ackedqueue/Queue.java

Lines 335 to 340 in b3b9e60

    
    if (this.headPage.isFullyAcked()) {
        // purge the old headPage because its full and fully acked
        // there is no checkpoint file to purge since just creating a new TailPage from a HeadPage does
        // not trigger a checkpoint creation in itself
        TailPage tailPage = new TailPage(this.headPage);
        tailPage.purge();

original-brownbear · 2017-06-27T18:01:38Z

@colinsurprenant npnp :)

method but I wonder if semantically it makes sense to say that an empty page is fully acked?

I'd go with yes on this one, because "nothing is unacknowledged" must mean "fully acknowledged".
Anything else is really hard to make into consistent algebra imo.

Also, won't that break this condition here where a new head page will be then considered fully acked and immediately transformed into a tail page?

logstash/logstash-core/src/main/java/org/logstash/ackedqueue/Queue.java

Lines 335 to 340 in b3b9e60

    if (this.headPage.isFullyAcked()) {
        // purge the old headPage because its full and fully acked
        // there is no checkpoint file to purge since just creating a new TailPage from a HeadPage does
        // not trigger a checkpoint creation in itself
        TailPage tailPage = new TailPage(this.headPage);
        tailPage.purge();

Counter question :) Why does an empty page fail the free space check wrapping that logic?

logstash/logstash-core/src/main/java/org/logstash/ackedqueue/Queue.java

Line 329 in b3b9e60

    if (! this.headPage.hasSpace(data.length)) {
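
To make the counter question concrete, here is a toy model of the quoted `write()` shape. It is a sketch under stated assumptions, not the real `Queue`/`HeadPage` code: `HeadPageSketch`, its fields, `wouldPurgeOnWrite`, and both `isFullyAcked*` variants are hypothetical, and the "strict" variant is only a guess at the pre-change behaviour. In this model the purge branch is reachable for an empty page only when a single element is larger than the page capacity.

```java
public class HeadPageSketch {
    // Hypothetical fields; the real page classes track more state than this.
    private final int capacityBytes;
    private int usedBytes = 0;
    private int elementCount = 0;
    private int ackedCount = 0;

    public HeadPageSketch(int capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    public boolean hasSpace(int byteSize) {
        return usedBytes + byteSize <= capacityBytes;
    }

    // Proposed semantics: an empty page counts as fully acked.
    public boolean isFullyAckedVacuous() {
        return ackedCount >= elementCount;
    }

    // Assumed stricter semantics: an empty page is never fully acked.
    public boolean isFullyAckedStrict() {
        return elementCount > 0 && ackedCount >= elementCount;
    }

    // Mirrors the shape of the quoted logic: the fully-acked check
    // is only consulted when the head page has no room for the element.
    public static boolean wouldPurgeOnWrite(HeadPageSketch head, int byteSize, boolean vacuous) {
        if (!head.hasSpace(byteSize)) {
            return vacuous ? head.isFullyAckedVacuous() : head.isFullyAckedStrict();
        }
        return false;
    }

    public static void main(String[] args) {
        HeadPageSketch empty = new HeadPageSketch(100);
        System.out.println(wouldPurgeOnWrite(empty, 10, true));   // false: the element fits
        System.out.println(wouldPurgeOnWrite(empty, 200, true));  // true: oversized element, vacuous check
        System.out.println(wouldPurgeOnWrite(empty, 200, false)); // false: oversized element, strict check
    }
}
```

Whether the oversized-element case can actually occur in the real queue is not established in this thread.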

colinsurprenant · 2017-06-27T18:25:13Z

@original-brownbear

I'd go with yes on this one, because "nothing is unacknowledged" must mean "fully acknowledged".
Anything else is really hard to make into consistent algebra imo.

Interesting - yeah empty set algebra is weird. So both "all elements are acknowledged" and "all elements are unacknowledged" would be true at the same time on an empty set.
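
The empty-set point can be checked directly in Java. A minimal sketch, not Logstash code: the class and method names below are hypothetical, and only `Stream.allMatch` and `Stream.noneMatch` are real JDK calls. Both predicates hold for an empty list.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class VacuousTruthDemo {
    // True when every element is acked; vacuously true for an empty list.
    public static boolean allAcked(List<Boolean> acked) {
        return acked.stream().allMatch(a -> a);
    }

    // True when every element is unacked; also vacuously true for an empty list.
    public static boolean allUnacked(List<Boolean> acked) {
        return acked.stream().noneMatch(a -> a);
    }

    public static void main(String[] args) {
        List<Boolean> empty = Collections.<Boolean>emptyList();
        System.out.println(allAcked(empty));   // true
        System.out.println(allUnacked(empty)); // true

        List<Boolean> mixed = Arrays.asList(true, false);
        System.out.println(allAcked(mixed));   // false
        System.out.println(allUnacked(mixed)); // false
    }
}
```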

Maybe we should rename that method to reflect the initial intention of looking for all elements on a non-empty queue?

Counter question :) Why does an empty page fail the free space check wrapping that logic?

I'll let you look into this - seems important to figure out before merging.

original-brownbear · 2017-06-27T18:58:50Z

@colinsurprenant

Maybe we should rename that method to reflect the initial intention of looking for all elements on a non-empty queue?

+1 in general, it would be less confusing. What I really don't like about it is that we have to negate every call to it we make at the moment, which really messes up the git history. Not sure we want to do that while we fix a bug?

So both "all elements are acknowledged" and "all elements are unacknowledged" would be true at the same time on an empty set.

Yea we aren't asking if all elements are unacknowledged anywhere in our code, this seems like a metaphysics question. If we would ask both questions, we should probably just throw on a call to this method for an empty page and gate all use cases by a count or isEmpty check (imo).
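
The "throw on an empty page and gate callers with an isEmpty check" idea could look roughly like this. A hedged sketch only: `GuardedPageSketch` and all of its methods are hypothetical stand-ins, not the actual page classes.

```java
import java.util.BitSet;

public class GuardedPageSketch {
    private final BitSet ackedSeqNums = new BitSet();
    private int elementCount = 0;

    public void append() {
        elementCount++;
    }

    public void ack(int index) {
        if (index < 0 || index >= elementCount) {
            throw new IllegalArgumentException("no element at index " + index);
        }
        ackedSeqNums.set(index);
    }

    public boolean isEmpty() {
        return elementCount == 0;
    }

    // Refuses to answer for an empty page instead of picking a convention.
    public boolean isFullyAcked() {
        if (isEmpty()) {
            throw new IllegalStateException("isFullyAcked() called on an empty page");
        }
        return ackedSeqNums.cardinality() >= elementCount;
    }

    public static void main(String[] args) {
        GuardedPageSketch page = new GuardedPageSketch();
        // Callers gate the question with isEmpty() first.
        if (!page.isEmpty()) {
            System.out.println(page.isFullyAcked());
        }
        page.append();
        page.ack(0);
        System.out.println(page.isFullyAcked()); // true
    }
}
```

The cost is that every call site needs the guard, which is the same kind of churn raised above for a rename.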

colinsurprenant · 2017-06-27T19:21:39Z

+1 in general, it would be less confusing. What I really don't like about it is that we have to negate every call to it we make at the moment, which really messes up the git history. Not sure we want to do that while we fix a bug?

Agree. Can I then suggest you change the empty? to both check for the empty page case and the is_fully_acked? case?

for the isFullyAcked() method, either we keep your change or not - I would argue maybe not, unless we prove that this is not/will not create problems elsewhere, and defer the renaming/semantic change to another PR?

this seems like a metaphysics question.

Oh, totally :P it was just to illustrate the empty set algebra metaphysical confusion.

original-brownbear · 2017-06-27T19:37:27Z

@colinsurprenant

Can I then suggest you change the empty? to both check for the empty page case and the is_fully_acked? case?

But then we add 2 synchronized Ruby to Java calls + we don't have any empty check implemented in the Queue. Do we really want to add another method to Queue? Worse yet, you'd have to lock across both calls to get a correct answer?

colinsurprenant · 2017-06-27T19:48:07Z

I am just pointing out the potential problem of changing the semantics of isFullyAcked(), and maybe for the purpose of this bug fix it might be better to check for actual queue emptiness in the Ruby empty? method instead of introducing that in isFullyAcked().

original-brownbear · 2017-06-27T19:53:48Z

@colinsurprenant

check for actual queue emptiness

I don't think we want to do that, at least with my understanding of PQ. If the PQ says that "there is nothing else to process", shouldn't that mean that all reads from the Queue were acknowledged? Otherwise restarting LS may lead to repeated processing (not that it will in practice probably), but that's what this change would make possible as an accidental side effect of changing pipeline code.

colinsurprenant · 2017-06-27T20:04:12Z

Oh, I meant to somehow move the check this.elementCount == 0 into the Ruby empty? method instead of adding it in the isFullyAcked() method.

original-brownbear · 2017-06-27T20:13:16Z

@colinsurprenant Yes, but we would then have to lock over the two operations since they're not atomic:
(this.elementCount == 0 and isFullyAcked())

Plus, you'd have to add a new isEmpty() to Java somehow in my opinion; calling count isn't really what you want to do on this kind of data structure (since it either requires indexing or what we currently have, the full read of a queue before processing it).

Also, our count methods are currently neither synchronized nor atomic:

    public long getAckedCount() {
        return headPage.ackedSeqNums.cardinality() + tailPages.stream()
                .mapToLong(page -> page.ackedSeqNums.cardinality())
                .sum();
    }

    public long getUnackedCount() {
        long headPageCount = (headPage.getElementCount() - headPage.ackedSeqNums.cardinality());
        long tailPagesCount = tailPages.stream()
                .mapToLong(page -> (page.getElementCount() - page.ackedSeqNums.cardinality())).sum();
        return headPageCount + tailPagesCount;
    }

and hence not correct when called without holding org.logstash.ackedqueue.Queue#lock. (this fact should probably get a new issue either way)

I don't think I can do this kind of thing (count calls in a concurrent situation) without a bunch of potential issues.
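
The locking concern can be sketched as follows, assuming a single `ReentrantLock` guards all counters. `LockedQueueSketch`, its fields, and `isEmptyOrFullyAcked()` are hypothetical; the real `Queue` tracks pages rather than two plain counters. The point is only that both reads happen under one lock acquisition, so no writer can change the state between them.

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockedQueueSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private long elementCount = 0;
    private long ackedCount = 0;

    public void write() {
        lock.lock();
        try {
            elementCount++;
        } finally {
            lock.unlock();
        }
    }

    public void ack() {
        lock.lock();
        try {
            if (ackedCount < elementCount) {
                ackedCount++;
            }
        } finally {
            lock.unlock();
        }
    }

    // Compound check done atomically: both counters are read while holding the lock.
    public boolean isEmptyOrFullyAcked() {
        lock.lock();
        try {
            return elementCount == 0 || ackedCount >= elementCount;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        LockedQueueSketch queue = new LockedQueueSketch();
        System.out.println(queue.isEmptyOrFullyAcked()); // true
        queue.write();
        System.out.println(queue.isEmptyOrFullyAcked()); // false
        queue.ack();
        System.out.println(queue.isEmptyOrFullyAcked()); // true
    }
}
```

Calling two separately locked methods from Ruby would not give the same guarantee, since the lock is released between the calls.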

colinsurprenant · 2017-06-27T20:21:56Z

Ok. But adding the check into isFullyAcked() is IMO dangerous at this point for the purpose of this bug fix. I guess there are many ways this could be fixed if you agree with the idea of deferring the change to isFullyAcked() ? I would be ok to create extra specific code for this if required until we refactor isFullyAcked().

original-brownbear · 2017-06-27T20:53:17Z

@colinsurprenant idk man :), we only use that function in 4 production code spots. Do we really want to add more tech debt here, when we already have a bug that resulted from undocumented/inconsistent behavior?
I guess this is more of a question for @suyograo or @jordansissel (not sure who manages the deadlines :D), if we have to have this fixed now without the risk of any side effect I'm on board with some hacky workaround like adding yet another function.
But if we have some time ... I really wouldn't pile onto things here.

original-brownbear · 2017-06-27T20:54:55Z

@colinsurprenant also note that all tests still pass + one more that reproduced a bug. Maybe you can suggest some tests we could add to rule out potential problems you see? Better to invest the time into that compared to investing it into hacks?

jordansissel · 2017-06-27T20:57:58Z

not sure who manages the deadlines

We don't really have deadlines for code in most cases. If a change is not ready when a release is cut, then that change can go into the next release.

colinsurprenant · 2017-06-27T21:01:34Z

Again, my concern is to minimize the risk of introducing a change that could potentially have an unwanted side effect. We have essentially 3 choices:

- we abort mission of this one and defer the fix into a better refactor
- we somehow find a fix for this now while minimizing potential side-effects
- we move on with the proposed change in isFullyAcked() but also prove that all its usages are safe

I am ok with any of these. Since you introduced this PR I'll let you choose what you prefer.

original-brownbear · 2017-06-27T21:01:58Z

@jordansissel so how does that tie in with the specific issue here? I mean, if something fixes a failing test and still passes all other tests, and we have no deadline, it seems wrong to add tech debt instead on the grounds of avoiding the risk of introducing bugs?

original-brownbear · 2017-06-27T21:03:04Z

@colinsurprenant

but also prove that all its usages are safe

sure let's do this, as I asked above, what tests would prove this?

colinsurprenant · 2017-06-27T21:03:08Z

oh, and BTW, passing tests is only an indication and not a proof that a code change is correct.

original-brownbear · 2017-06-27T21:10:16Z

@colinsurprenant I don't think we'll achieve algorithmic proof in this code, tests will have to do? :D

colinsurprenant · 2017-06-27T21:39:59Z

sure let's do this, as I asked above, what tests would prove this?

Not sure - haven't looked into that really. Do you want to do it?

original-brownbear · 2017-06-27T22:34:17Z

@colinsurprenant

Not sure - haven't looked into that really. Do you want to do it?

happy to, but I need an actionable goal, I'd like to avoid this one going stale like #6998 sorry :)

colinsurprenant · 2017-06-27T23:58:11Z

not sure what is missing here? I will try to recap my concern: you are changing the behaviour of isFullyAcked() which I think can have side effects. I would prefer we not change the behaviour of isFullyAcked() for this bug fix unless it is verified that the change in behaviour does not have any unwanted side effects. I specifically pointed at a possible side effect in the Queue.write() method.

original-brownbear · 2017-06-28T07:57:52Z

@colinsurprenant

I specifically pointed at a possible side effect in the Queue.write() method.

Yes but I need some clear cut scenario you want to see a test for/verified to work. Otherwise I'll have to take a guess here and we'll keep going in circles if I'm wrong :(
Also I'm not even sure that the scenario exists as it would simply imply the existence of another bug in the free space method?

colinsurprenant · 2017-06-28T12:39:03Z

@original-brownbear I think there is a misunderstanding on our respective roles here. You took the initiative to write this code to propose a fix. As the reviewer I am pointing at a possible issue. I am expecting you will try to receive that comment as something constructive toward improving that code and look into it and possibly propose solutions. I don't think you need any more clear cut scenarios and I am sure you can apply the same level of initiative to work toward improving this. If I knew exactly what the problem was, how to solve it and what tests are missing I would have said it already. For me to get to that point would require I spend more time digging into this. Since this is your code change proposal I am expecting you will move it forward while trying to honestly consider any reviewer comments that are made here.

As I said there are many ways this can move forward, be it a temp fix with possible tech debt, a bigger refactor, further investigation into a potential extra bug in the free space method, etc. There is no right or wrong.

original-brownbear · 2017-06-28T13:05:45Z

@colinsurprenant I get your point really :) but look:

If I knew exactly what the problem was, how to solve it and what tests are missing I would have said it already

So is there even an actionable bug/issue here at all, if there isn't one how would I prove there isn't?
My problem is that I'm asked to add tests for unspecified issues and could invest hours only to learn that my guesses as to their nature were wrong (again according to unspecified criteria?).

colinsurprenant · 2017-06-28T13:31:07Z

Okay. I'll circle back to try and move this to some closure at some point.

original-brownbear · 2017-06-28T15:53:32Z

@suyograo fixed just the syntax and removed the test and will assign @colinsurprenant to the actual issue then as discussed.

colinsurprenant · 2017-06-29T17:44:23Z

LGTM - this will solve the first part of #7568.

elasticsearch-bot · 2017-06-29T17:48:38Z

Armin Braun merged this into the following branches!

Branch	Commits
5.x	`1e0ed35`, `d6262d1`
master	`c307b18`, `1000794`

Fixes #7524

elasticsearch-bot self-assigned this Jun 23, 2017

original-brownbear force-pushed the 7476 branch from 830c702 to b6f6a42 Compare June 25, 2017 17:16

suyograo requested a review from colinsurprenant June 25, 2017 17:18

suyograo assigned colinsurprenant and unassigned elasticsearch-bot Jun 25, 2017

colinsurprenant removed their request for review June 28, 2017 13:31

colinsurprenant removed their assignment Jun 28, 2017

original-brownbear added 2 commits June 28, 2017 17:52

elastic#7476 fix syntax of queue fully_acked method

add4cf6

bck

c0b724f

original-brownbear force-pushed the 7476 branch from b6f6a42 to c0b724f Compare June 28, 2017 15:52

original-brownbear assigned suyograo Jun 28, 2017

This was referenced Jun 29, 2017

ReadClient#empty? is broken #7568

Closed

isFullyAcked() semantic #7570

Closed

colinsurprenant mentioned this pull request Jun 29, 2017

queue.drain Setting is Broken #7476

Closed

elasticsearch-bot pushed a commit that referenced this pull request Jun 29, 2017

#7476 fix syntax of queue fully_acked method

1e0ed35

Fixes #7524

elasticsearch-bot pushed a commit that referenced this pull request Jun 29, 2017

bck

d6262d1

Fixes #7524

elasticsearch-bot closed this in c307b18 Jun 29, 2017

elasticsearch-bot pushed a commit that referenced this pull request Jun 29, 2017

bck

1000794

Fixes #7524
