
Ongoing pruning #4429

Merged: lutter merged 26 commits into master from lutter/prune on Mar 28, 2023

Conversation

@lutter (Collaborator) commented Mar 7, 2023

This is still work in progress, and I am still testing correctness and performance, but for greater visibility I am opening the PR already.

This PR adds the ability to perform pruning on an ongoing basis; each deployment now has an optional setting history_blocks that indicates how much history the deployment should retain. While the deployment is processing blocks, it checks whether it currently has more history than configured and prunes itself if that is the case. Pruning happens in line with normal block processing, so processing can get blocked by pruning, though the details depend on how full the write queue gets.

The frequency with which pruning is performed is set with GRAPH_STORE_HISTORY_SLACK_FACTOR: once a deployment has more than history_blocks * GRAPH_STORE_HISTORY_SLACK_FACTOR blocks of history, pruning is kicked off.
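
In rough pseudocode, the trigger amounts to the following (the function and argument names are illustrative, not the actual code in the store):

fn should_prune(latest_block: i32, earliest_block: i32, history_blocks: i32, slack_factor: f64) -> bool {
    // History the deployment currently retains, in blocks
    let retained = latest_block - earliest_block;
    // Only kick off pruning once we exceed history_blocks by the slack factor
    retained as f64 > history_blocks as f64 * slack_factor
}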

Pruning can now use two different strategies, copying and deleting, and selects a strategy based on how much of each table is historical and therefore has to be removed. For large removals, copying is used; for smaller removals, we delete (configured with GRAPH_STORE_HISTORY_COPY_THRESHOLD and GRAPH_STORE_HISTORY_DELETE_THRESHOLD).
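
A sketch of that per-table decision, assuming the thresholds have already been parsed into floats; the enum, the function, and the "skip the table" case are illustrative, not the actual implementation:

enum PruningStrategy { Copy, Delete, Skip }

fn choose_strategy(historical_fraction: f64, copy_threshold: f64, delete_threshold: f64) -> PruningStrategy {
    if historical_fraction >= copy_threshold {
        // Most of the table would be removed: copy the remaining rows instead
        PruningStrategy::Copy
    } else if historical_fraction >= delete_threshold {
        // A smaller share is removable: delete those rows in place
        PruningStrategy::Delete
    } else {
        // Assumed here: too little to be worth touching this table on this run
        PruningStrategy::Skip
    }
}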

The amount of data we are likely to remove is estimated, since getting precise numbers would be too slow as it involves counting the entries in each table. The estimate is based on how much history the table retains and the ratio of entities to entity versions. When that ratio is high, it's likely that pruning will remove only a few rows, and when it is low, it's likely that pruning will remove many rows. Similarly, if the ratio of history_blocks to the number of blocks for which a table has data is high, we expect to remove only a few blocks since a lot of rows are still current. The number of blocks for which a table has data is determined from table_stats.last_pruned_block; since a table will not necessarily have data removed on every pruning run, that block can be substantially before the deployment's earliest block.
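
A very rough sketch of such an estimate; the inputs are the ones described above, but the exact formula and names are assumptions, not the estimator in this PR (it also assumes non-zero inputs):

fn estimated_removable_fraction(
    entities: f64,        // distinct entities in the table
    entity_versions: f64, // total rows (entity versions)
    history_blocks: f64,  // history we want to retain
    table_blocks: f64,    // blocks since table_stats.last_pruned_block
) -> f64 {
    // High entities/versions ratio -> mostly current rows -> little to remove
    let version_signal = 1.0 - (entities / entity_versions).min(1.0);
    // High history_blocks/table_blocks ratio -> most blocks retained -> little to remove
    let block_signal = 1.0 - (history_blocks / table_blocks).min(1.0);
    version_signal * block_signal
}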

Pruning is controlled with graphman prune; besides performing an initial prune, it now also sets the deployment's history_blocks by default, which can be turned off by passing --once.

@lutter lutter force-pushed the lutter/prune branch 2 times, most recently from e62d491 to 970224b on March 7, 2023 19:32
@lutter lutter marked this pull request as ready for review March 14, 2023 16:12
@lutter (Collaborator Author) commented Mar 14, 2023

I've run this in the integration cluster without pruning any subgraphs yet, and at least for that, it does not cause POI differences. I will do another run with pruned subgraphs, but at least as long as we don't prune, this code is OK.

@lutter (Collaborator Author) commented Mar 16, 2023

I've now run this in the integration cluster with all subgraphs pruned to 10,000 blocks. The initial pruning took almost 24hrs, but it was doing one subgraph at a time. Most subgraphs took under 5 minutes for the initial pruning, but there were a handful (< 10) where the initial pruning took 30-90 minutes.

So far, there have been no POI differences, except in cases where integer is so far behind fraction that fraction doesn't have a POI any more, i.e., these are more "we can't determine whether there is a POI diff" than an actual difference.

@lutter lutter requested review from leoyvens and mangas March 16, 2023 00:44
@leoyvens (Collaborator):

Just from reading the PR description, my biggest question would be: why block indexing rather than doing this in a concurrent task that could be spawned by the indexing process?

Another thing, could history_blocks have a value that means "as tight as possible"? The value could be null or 0, and could eventually be made the default for new deployments. Initially this could mean using REORG_THRESHOLD. And if in the future we have a finer notion of finality, possibly a dynamic one, we would have the flexibility to switch to that.

@lutter (Collaborator Author) commented Mar 16, 2023

> Just from reading the PR description, my biggest question would be: why block indexing rather than doing this in a concurrent task that could be spawned by the indexing process?

There are a few reasons: (1) it's simpler in terms of behavior and implementation; (2) doing this in a background task will mean that indexing a subgraph sometimes needs 2 connections, which can lead to bursty connection requirements; (3) we will run even more threads, and I am a little concerned that we are already using a very large number of threads.

None of this means we should never do this, but I would first like to see how things work out with the simpler processing model of non-concurrent prunes.

The main factor for how much this will slow down indexing is the setting of GRAPH_STORE_HISTORY_SLACK_FACTOR; if that is set, e.g., to 1.2, and a deployment retains 10,000 blocks, we will attempt to prune every 2,000 blocks. Basically, the time to prune is amortized over those 2,000 blocks. Whether that will turn out to be an issue in practice, I don't know, and I feel we first need to get some experience with the performance in concrete cases.
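
As a quick sanity check of that arithmetic, a small sketch (illustrative only, not code from this PR):

fn prune_interval(history_blocks: i32, slack_factor: f64) -> i32 {
    // Blocks of extra history we allow to accumulate between prune runs
    (history_blocks as f64 * (slack_factor - 1.0)).round() as i32
}

fn main() {
    // 10,000 retained blocks with a slack factor of 1.2 -> prune roughly every 2,000 blocks
    assert_eq!(prune_interval(10_000, 1.2), 2_000);
}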

> Another thing, could history_blocks have a value that means "as tight as possible"? The value could be null or 0, and could eventually be made the default for new deployments. Initially this could mean using REORG_THRESHOLD. And if in the future we have a finer notion of finality, possibly a dynamic one, we would have the flexibility to switch to that.

For 'as tight as possible', users can just set that to the REORG_THRESHOLD manually when they set history_blocks for a deployment. The idea behind all of this is that we still need a policy for deciding what the right history_blocks is and how that gets decided for each deployment; at the end of the day something needs to translate that into a concrete number of blocks. This PR just punts on that and assumes that's decided elsewhere.

I think once we have a different notion of finality, it will be much easier and clearer to adapt the pruning behavior to that rather than trying to anticipate what that might look like.

@leoyvens (Collaborator):

> (1) it's simpler in terms of behavior and implementation; (2) doing this in a background task will mean that indexing a subgraph sometimes needs 2 connections, which can lead to bursty connection requirements; (3) we will run even more threads, and I am a little concerned that we are already using a very large number of threads.

The advantage of not blocking indexing seems stronger than these downsides. (1) the possibility of a concurrent prune through graphman already exists; (2) the total time holding a connection should be the same, so there is no change in total load even if it's burstier, and the threshold can be calculated in the main task so the overhead is only incurred when pruning will really happen; (3) this adds more tasks, which doesn't change the maximum number of threads.

If we agree that the ideal design is to do it concurrently, then imo we should do it now, so we can already start discussing and gaining experience with the intended design rather than a provisional one.

@lutter (Collaborator Author) commented Mar 16, 2023

> (1) it's simpler in terms of behavior and implementation; (2) doing this in a background task will mean that indexing a subgraph sometimes needs 2 connections, which can lead to bursty connection requirements; (3) we will run even more threads, and I am a little concerned that we are already using a very large number of threads.

> The advantage of not blocking indexing seems stronger than these downsides. (1) the possibility of a concurrent prune through graphman already exists; (2) the total time holding a connection should be the same, so there is no change in total load even if it's burstier, and the threshold can be calculated in the main task so the overhead is only incurred when pruning will really happen; (3) this adds more tasks, which doesn't change the maximum number of threads.

Agreed on (1) and (2); for (3): we ultimately have to run the db interaction on the blocking pool which means another thread.

> If we agree that the ideal design is to do it concurrently, then imo we should do it now, so we can already start discussing and gaining experience with the intended design rather than a provisional one.

Another concern is error propagation: right now, if pruning fails, errors are handled just like any other error that happens during transact_block_operations. If we do pruning in another task, we need to figure out how errors make it back to the subgraph.

@lutter (Collaborator Author) commented Mar 16, 2023

@leoyvens The change you are asking for would be easy enough: it would just require replacing this line with something like this:

async fn run(
    store: Arc<DeploymentStore>,
    reporter: Box<OngoingPruneReporter>,
    site: Arc<Site>,
    req: PruneRequest,
) -> Result<Box<dyn PruneReporter>, StoreError> {
    store.prune(reporter, site, req).await
}

let _handle = graph::spawn(run(this, reporter, site, req));

For error propagation, we would need to do something smart with _handle, like store it somewhere and check it periodically; but this should be all that's needed to run pruning in the background, though I still have the concerns I mentioned above.

@leoyvens (Collaborator) left a comment

Amazing work! I tried it out and it worked very well. I like the polish, such as the reporter trait for ongoing vs. CLI pruning. Left some comments.

@@ -160,4 +180,42 @@ pub struct InnerStore {
    write_queue_size: usize,
    #[envconfig(from = "GRAPH_STORE_BATCH_TARGET_DURATION", default = "180")]
    batch_target_duration_in_secs: u64,
    #[envconfig(from = "GRAPH_STORE_HISTORY_COPY_THRESHOLD", default = "0.5")]
    copy_threshold: ZeroToOneF64,
    #[envconfig(from = "GRAPH_STORE_HISTORY_COPY_THRESHOLD", default = "0.05")]
@leoyvens (Collaborator):

Suggested change:
- #[envconfig(from = "GRAPH_STORE_HISTORY_COPY_THRESHOLD", default = "0.05")]
+ #[envconfig(from = "GRAPH_STORE_HISTORY_DELETE_THRESHOLD", default = "0.05")]

@lutter (Collaborator Author):

Woops .. fixed

fn from_str(s: &str) -> Result<Self, Self::Err> {
    let f = s.parse::<f64>()?;
    if f < 1.01 {
        bail!("invalid value: {s} must be bigger than 1.01");
@leoyvens (Collaborator):

Why can't the slack factor be set to 1? If that can cause numerical instability somewhere, imo we should fix that there and allow this to be set to 1.

@lutter (Collaborator Author):

It's mostly to avoid doing a pruning run on every block: with a slack factor of 1, you would try to prune on every block, which is just very wasteful. The lower bound of allowing 1% of slack is somewhat arbitrary though.

@@ -170,14 +197,16 @@ impl TablePair {
// The conditions on `block_range` are expressed redundantly
// to make more indexes useable
sql_query(format!(
"insert into {dst}({column_list}) \
"/* controller=prune,phase=nonfinal,start_vid={next_vid},next_vid={batch_size} */ \
@leoyvens (Collaborator):

Here and in a previous query there is next_vid=batch_size, which doesn't make sense to me.

@lutter (Collaborator Author):

I fixed the names in the comments to make this less confusing.

"the delete threshold must be between 0 and 1 but is {delete_threshold}"
));
}
if history_blocks < reorg_threshold {
@leoyvens (Collaborator):

I believe this should be <=; I tried it with equal values and nothing gets pruned, and then setting history_blocks fails because it checks for <=.

@lutter (Collaborator Author):

Good catch - fixed.

let mut req = PruneRequest::new(
    &deployment,
    history,
    ENV_VARS.reorg_threshold,
@leoyvens (Collaborator):

A minor inconsistency is that this env value might be different between the index node and graphman, but improving this will require a general rethink of finality configuration.

@lutter (Collaborator Author):

Yeah, the global reorg_threshold is something that I'd love for us to improve on. I feel that it should be settable for each chain individually (e.g., in graph-node.toml).

));
}

if history_blocks <= 0 {
@leoyvens (Collaborator):

The preceding if makes this dead code, assuming reorg_threshold >= 0.

@lutter (Collaborator Author):

Removed

.unwrap()
.remove(&site.id)
.unwrap();
match graph::block_on(reap(handle)) {
@leoyvens (Collaborator):

Since we know the task is finished, we can use now_or_never https://docs.rs/futures/latest/futures/future/trait.FutureExt.html#method.now_or_never here instead of block_on.
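
For reference, a minimal sketch of that swap, assuming the handle is a tokio JoinHandle; the names are illustrative, not the code in this PR:

use futures::FutureExt; // provides now_or_never()

#[tokio::main]
async fn main() {
    // Stand-in for the prune task whose handle was stored in the map above
    let handle = tokio::spawn(async { "pruned" });
    // Give the task a chance to finish, mirroring the situation in the review
    tokio::task::yield_now().await;

    // A single poll instead of graph::block_on; None means it had not finished
    match handle.now_or_never() {
        Some(Ok(result)) => println!("prune task finished: {result}"),
        Some(Err(e)) => eprintln!("prune task panicked or was cancelled: {e}"),
        None => eprintln!("task not finished yet; would still need to await it"),
    }
}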

@lutter (Collaborator Author):

I had no idea that existed. Changed.


const COPY: Scope = Scope { id: 1 };
const WRITE: Scope = Scope { id: 2 };
const PRUNE: Scope = Scope { id: 3 };
@leoyvens (Collaborator):

Is there no need for PRUNE and COPY to be mutually exclusive operations?

@lutter (Collaborator Author):

Good question; I think it's ok if the source subgraph gets pruned simultaneously:

  • we might end up with data that's been pruned from the source, but that's harmless since the earliest_block of the destination will be set to that from the source after data copying has finished and will make that data unreachable
  • we might also try to copy data that has been pruned after we determined which data to copy, which is fine for the same reason

If the destination subgraph gets pruned while the copy is still in progress, things are ok for the raw data, but I think we might get into trouble with these steps:

  • copy process starts and copies data
  • pruning does its thing and sets earliest_block
  • copying data finishes and sets metadata, including earliest_block from the source which might be before what was pruned

We could fix this by never allowing earliest_block to go back, by changing deployment::set_earliest_block to:

pub fn set_earliest_block(
    conn: &PgConnection,
    site: &Site,
    earliest_block: BlockNumber,
) -> Result<(), StoreError> {
    use subgraph_deployment as d;

    update(d::table.filter(d::id.eq(site.id)))
        .set(d::earliest_block_number.eq(earliest_block))
        .filter(d::earliest_block_number.lt(earliest_block)) // this is the change
        .execute(conn)?;
    Ok(())
}

@lutter (Collaborator Author), Mar 23, 2023:

There's one more interaction between copying and pruning that I am not clear on: as things are right now, when you copy/graft a deployment that has history_blocks set, that setting gets reset to the default, i.e., pruning is disabled for the copy. I think copying/grafting should retain that setting.

I just added a commit that does this

@leoyvens (Collaborator):

Nice, thanks for thinking through this; it's worth documenting some of that analysis somewhere in this file.

@lutter (Collaborator Author), Mar 27, 2023:

> • we might end up with data that's been pruned from the source, but that's harmless since the earliest_block of the destination will be set to that from the source after data copying has finished and will make that data unreachable

I just looked through the code again, and that's actually not true. I filed #4496 to capture this. I'll merge this PR as-is and work on another PR to address this problem.

@lutter (Collaborator Author) commented Mar 23, 2023

Addressed/replied to all review comments

@lutter (Collaborator Author) commented Mar 23, 2023

Rebased to latest master

Pass in how many blocks of history to keep instead of the earliest block;
use the subgraph's setting if the caller doesn't specify history

The store also needs access to it, and this avoids making the store
dependent on graph::chain::ethereum

We used to perform pruning in two stages, by first copying final entities
for all tables, then copying nonfinal entities and switching for all
tables.

It is better to do this loop the other way around: we now go
table-by-table, and for each of them do the nonfinal copy, then the final
copy. This makes an ongoing prune operation less visible, since the
subgraph writer can write in between the final copying for each table.

Change the PruneReporter trait to produce reasonable output for the new
pruning flow

Prune the subgraph periodically while transacting blocks. Select the right
strategy (copying or deleting) depending on how much history we are
removing.

We used to determine the total number of blocks in a subgraph based on its
latest and earliest blocks. With ongoing pruning, the earliest block is
updated every time we prune, even though the logic in PruneRequest.strategy
might have us actually not do anything. That leads to a situation where we
think the subgraph contains much fewer blocks than it really does, and we
therefore underestimate how much of its data is historical.

We now remember for each table the block at which we actually pruned, which
might be long before the subgraph's earliest block, and use that to
determine how many blocks are present.

As an example, assume we want to keep 100 blocks of history, in a subgraph
that is at block 1000 and earliest block 800 and a table that was last
pruned at block 500. Previously, we would have estimated that 50% of the
table is historical, when in reality 80% is historical.
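
The arithmetic from that example, as a small sketch (illustrative only, not the actual estimator code):

fn historical_fraction(latest_block: i64, history_floor: i64, history_blocks: i64) -> f64 {
    // Blocks for which the table (or deployment) has data
    let blocks_present = (latest_block - history_floor) as f64;
    // Fraction of those blocks that fall outside the retained history
    (blocks_present - history_blocks as f64) / blocks_present
}

fn main() {
    // Old estimate, based on the deployment's earliest block (800): 50% historical
    assert_eq!(historical_fraction(1000, 800, 100), 0.5);
    // New estimate, based on the table's last pruned block (500): 80% historical
    assert_eq!(historical_fraction(1000, 500, 100), 0.8);
}
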
@lutter (Collaborator Author) commented Mar 27, 2023

Rebased to latest master.

I've also completed a run in the integration cluster with this branch; that did not cause any PoI differences. There were some differences reported, but they were all because fraction didn't have the block anymore that we were checking as the latest common block, and the test tool therefore reported an unknown POI for fraction.

@lutter lutter merged commit 9dd4828 into master Mar 28, 2023
6 checks passed
@lutter lutter deleted the lutter/prune branch March 28, 2023 00:39