
Backfilling progress times out most of the time #1999

Closed
neumino opened this issue Feb 24, 2014 · 15 comments
@neumino
Member

neumino commented Feb 24, 2014

The progress data I retrieve from the servers seems to time out most of the time when servers are backfilling (/ajax/progress).

It used to work better (at least from what I can remember).

{
  "dc85ca94-ad17-49b4-880b-e43a83515be0": {
    "16d74cb4-fee3-4538-8a0e-4419493cce09": {
      "[\"\",null]": [
        "Timeout"
      ]
    },
    "34367f93-7d84-43ac-ada6-1bdb5544a67f": {
      "[\"\",null]": [
        "Timeout"
      ]
    },
    "3bc111e0-f520-4207-b3c3-14970acf6eb7": {
      "[\"\",null]": [
        "Timeout"
      ]
    },
    "68cd0df2-602c-478e-9a7a-161283baa92d": {
      "[\"\",null]": [
        "Timeout"
      ]
    },
    "71602467-b935-4cd6-ac33-071289b53a5d": {
      "[\"\",null]": [
        "Timeout"
      ]
    },
    "86b6ebee-9ba6-4b85-9382-f89cba98d1aa": {
      "[\"\",null]": [
        "Timeout"
      ]
    },
    "89fc1154-d81d-491a-b47a-7afa6014059b": {
      "[\"\",null]": [
        "Timeout"
      ]
    },
    "f2c51e26-9e6c-458f-a8a7-e6e657e53f61": {
      "[\"\",null]": [
        "Timeout"
      ]
    }
  },
  "id": "720507fe-80ae-4f53-9e5e-98df6e0934c1"
} 

Putting this in the backlog because the web UI displays the number of blocks only when it is available, and relies on the number of ready replicas to display progress.

@neumino neumino added this to the backlog milestone Feb 24, 2014
@danielmewes
Member

Can you describe what server configuration you saw this with?

I used /ajax/progress a lot recently to check on backfills and never saw timeouts. I was testing on rotational drives though (on magneto and electro). Maybe that avoided timeouts because it was slowing down backfilling...

@neumino
Member Author

neumino commented Feb 24, 2014

I started the two instances on my local machine (3rd-gen i7, 12 GB RAM, SSD).

They are both in debug mode, so maybe that's why? I'll try in release mode later.

@coffeemug coffeemug modified the milestones: subsequent, backlog Feb 24, 2014
@coffeemug
Contributor

I'd like to move this to subsequent. I think this functionality is really important. It's been somewhat flaky before because both the numerator and the denominator change, and we'll have to make that work better for ops people, but I don't think we should just accept progress timing out.

@coffeemug
Contributor

Also, the numerator and the denominator both change, which makes the progress feature unusable in real life scenarios.

@danielmewes
Member

Also, the numerator and the denominator both change, which makes the progress feature unusable in real life scenarios.

I don't fully follow on this one. Is the problem that the progress value is not monotonic? Or that the denominator changes? I know this is based on an actual user issue, but maybe you have more insight since you know the context better.

The latter doesn't seem like an issue. If you are not interested in the exact block estimates, you can simply convert the value to a decimal number.

The solution to the former would be to artificially force the fraction to never decrease, though that would be arguably worse. It's in the nature of predictions based on incomplete information that they change as more data becomes available.

@neumino
Member Author

neumino commented Jul 7, 2014

From what I've heard (I never read the code), we somehow recurse into the B-tree and estimate the number of blocks left to copy by keeping an average branching factor.
Every time we read a new block, we find new blocks (which increases the denominator).
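A toy model of that estimation strategy (hypothetical names, not the actual backfill code): project the unvisited levels of the tree using the average branching factor observed so far.

```python
def estimate_total_blocks(blocks_seen, avg_branching, depth_remaining):
    """Rough total-block estimate: the blocks already visited, plus a
    projection of the levels below the current frontier, assuming each
    level fans out by the observed average branching factor. As traversal
    reveals real children, avg_branching shifts and the estimate (the
    progress denominator) moves with it."""
    projected = blocks_seen
    frontier = blocks_seen
    for _ in range(depth_remaining):
        frontier *= avg_branching
        projected += frontier
    return projected
```

With two levels left and an average branching factor of 2, a single visited block projects to 1 + 2 + 4 = 7 total blocks; revise the branching factor and the denominator jumps, which is the behavior described above.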

@danielmewes
Member

From what I've heard (I never read the code), we somehow recurse into the B-tree and estimate the number of blocks left to copy by keeping an average branching factor.

Yes. We don't always increase the denominator though, but start with an estimate for the total number of blocks that gets refined during backfilling.

I just wonder which part of this is the actual problem.

Making the block count precise isn't that easy (in the presence of resharding). It is essentially the same problem as keeping track of the number of documents in a certain range of the tree #152 .

@neumino
Member Author

neumino commented Jul 7, 2014

One problem is that if you compute a percentage, it may decrease.

I think sending a percentage that never decreases is enough for most use cases. It's basically what the web interface was doing until I replaced the progress bar with the number of replicas ready / total number of replicas.
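A minimal sketch of such a never-decreasing percentage (an illustration, not the web UI's actual code): clamp the reported value to the running maximum of the raw fraction.

```python
class MonotoneProgress:
    """Wrap a raw progress fraction so the reported value never decreases,
    even when the underlying numerator/denominator estimates are revised
    downward mid-backfill."""

    def __init__(self):
        self._best = 0.0

    def report(self, done, total):
        """Return the highest fraction seen so far; ignore 0/0 placeholders."""
        if total > 0:
            self._best = max(self._best, done / total)
        return self._best
```

If the denominator estimate grows from 100 to 200 blocks, the raw fraction drops, but `report` keeps returning the previous high-water mark until real progress overtakes it.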

@danielmewes
Member

Actually, another problem is that backfills that are being held back because there are already too many concurrent backfills going on don't report any estimates. I think their progress is just 0/0 or -1/-1 or something like that. That's technically fine, but it makes it harder to get a good overall progress estimate. (You could, for example, assume that those shards are approximately as big as the other shards, and use the average number of estimated blocks from the other shards as the block count for the throttled ones.)
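The fill-in idea in the parenthetical can be sketched like this (hypothetical helper, not RethinkDB code): a throttled shard with no estimate is treated as an average-sized shard with zero blocks done.

```python
def overall_progress(shards):
    """Combine per-shard (blocks_done, blocks_total) estimates into one
    overall fraction. A throttled shard that reports no estimate is passed
    as None (standing in for the 0/0 or -1/-1 placeholder) and is assumed
    to be as big as the average reporting shard, with 0 blocks done."""
    known = [s for s in shards if s is not None]
    if not known:
        return 0.0
    avg_total = sum(total for _, total in known) / len(known)
    done = sum(done for done, _ in known)
    total = sum(total for _, total in known) + avg_total * (len(shards) - len(known))
    return done / total
```

For two reporting shards at 50/100 and 150/300 plus one throttled shard, the throttled shard is assumed to hold 200 blocks, giving 200/600 overall.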

@coffeemug
Contributor

I think the current implementation has a general problem of usability + perception. There are a couple of solutions I can think of (though I don't know if some of these are possible or how much work they are):

  • Only return a ratio and artificially ensure it's monotonically increasing. That would be better than what we have now because it would solve many perception issues for people.
  • The problem with a monotonically increasing ratio is that it might be stuck at a low point (say, .1) and then suddenly jump to .9. This is better than what we have now, but still not ideal. Preferably the ratio would increase gradually to actually represent the state of the backfill, otherwise it isn't very useful.
  • The ratio is ok, but it's really nice to be able to tell how much data transfer we're actually doing in blocks/megabytes. If we do this, we should make sure the denominator doesn't actually change. I don't know if this is doable -- if not, I think we should stick with returning just the ratio.

@danielmewes
Member

How about we expose the actual number of blocks transferred, but not the estimate for the total number of blocks?

We would have:

  • number of blocks transferred for a given shard
  • percentage of backfill complete for a given shard

Optionally we can make sure that the percentage never decreases.

@coffeemug
Contributor

That makes sense, but then couldn't I divide one by the other to get an estimate of the total number of blocks? :)

(Also, I don't think making sure percentage doesn't decrease is optional, otherwise the feature isn't very useful)
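The reconstruction described above is plain division; with hypothetical numbers:

```python
# Hypothetical values for one shard: both quantities would be exposed.
blocks_transferred = 4_200
fraction_complete = 0.35  # the reported completion percentage, as a fraction

# Dividing one by the other recovers the hidden total-block estimate.
estimated_total = blocks_transferred / fraction_complete  # ~12,000 blocks
```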

@danielmewes
Member

but then couldn't I divide one by the other to get an estimate of the total number of blocks? :)

Hmm, you are right. Though this endeavour would be partially hindered if we forced the percentage to be monotonic.
We could randomly perturb either the percentage or the number of processed blocks to make it more difficult to reconstruct the estimated total with high precision. We would have to be careful to pick the perturbation at the beginning of a backfill, so people don't query the progress 1,000 times in a row and reconstruct the unperturbed value by averaging.
Also, there is the problem of known-plaintext attacks. Let's say we add a random number to the percentage. Then at the very beginning and at the very end of the process, users could reconstruct the random number because they know the unperturbed percentage must be 0 and 100 respectively. We would have to do something smarter than addition, though some type of statistical attack would almost always still be possible (as far as I can think).
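The known-plaintext point can be made concrete with a toy model (hypothetical names; nothing here was ever implemented): with a fixed additive offset, one query at the known-zero start of the backfill leaks the offset.

```python
import random


def perturbed_progress(true_pct, offset):
    """Toy model of the scheme above: report the true percentage plus a
    fixed offset chosen once per backfill."""
    return true_pct + offset


# Offset picked once at the start of the backfill (hypothetical range).
offset = random.uniform(-5.0, 5.0)

# At the very start the true percentage is known to be 0, so a single
# query reveals the offset exactly -- no averaging required.
recovered = perturbed_progress(0.0, offset)
```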

@coffeemug
Contributor

😀

@danielmewes
Member

This is outdated. We now have r.db('rethinkdb').table('stats'), and as far as I can see that took care of these issues.
