Tiering with storage class support - TTL based worker + TMFS blocks migration #7313

Merged

@tangledbytes tangledbytes commented May 29, 2023

Explain the changes

Context: In its write flow, NooBaa marks an object for migration so that TMFS can migrate it to the second tier, and NooBaa's read flow can recall the block from that tier using TMFS calls. However, this leaves a bunch of objects on the disk (cache).

This PR adds a BG worker which iterates over the chunks associated with a bucket (the bucket needs to have at least 2 tiers and, in the case of TMFS, the second tier HAS to share the same pool), selects the chunks which haven't been read for longer than 30 minutes (configurable), moves them to the second tier, and evicts the associated blocks.
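The selection step can be pictured roughly as below. This is only a minimal sketch, not the worker's actual code: the chunk shape (`tier`, `last_read_ms`), the function name `select_chunks_to_migrate`, and the constant `DEFAULT_TTL_MS` are all assumptions made for illustration.

```javascript
'use strict';

// Hypothetical default: 30 minutes, mirroring the configurable TTL described above.
const DEFAULT_TTL_MS = 30 * 60 * 1000;

/**
 * Return the chunks that are still in the first tier and have not been
 * read for longer than ttl_ms. The chunk shape is an assumption.
 */
function select_chunks_to_migrate(chunks, first_tier_id, now_ms, ttl_ms = DEFAULT_TTL_MS) {
    return chunks.filter(chunk =>
        chunk.tier === first_tier_id &&
        now_ms - chunk.last_read_ms > ttl_ms
    );
}

const now = Date.UTC(2023, 5, 1, 12, 0, 0);
const chunks = [
    { id: 'c1', tier: 'tier1', last_read_ms: now - 45 * 60 * 1000 }, // stale
    { id: 'c2', tier: 'tier1', last_read_ms: now - 5 * 60 * 1000 }, // recently read
    { id: 'c3', tier: 'tier2', last_read_ms: now - 90 * 60 * 1000 }, // already in second tier
];

const selected = select_chunks_to_migrate(chunks, 'tier1', now);
console.log(selected.map(chunk => chunk.id)); // [ 'c1' ]
```

Filtering on the first tier keeps chunks that were already moved out of later passes, so the worker does not revisit them.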

Tested in the following way

  1. Upload 2 objects, testobject1 and testobject2 of sizes 2G and 4G respectively.
  2. Let the BG worker do its job.
  3. Run `find storage/backingstores -name '*.data' -type f | xargs -n1 -I{} sh -c "echo 'file: {} => '; xattr {}; echo"`, which correctly shows `user._trigger.migrate` being set on the files.
  4. Run `psql -d nbcore -U postgres -c "SELECT data FROM datachunks;"`, which shows the second tier as the tier selected for the data chunks.
  5. The TTL worker does not iterate over the blocks which are already in the second tier.
  • Doc added/updated
  • Tests added

@tangledbytes tangledbytes force-pushed the utkarsh-pro/add/tier2-cache-eviction branch from e9a0116 to 3d07817 Compare May 30, 2023 08:13
@tangledbytes tangledbytes force-pushed the utkarsh-pro/add/tier2-cache-eviction branch 3 times, most recently from be65336 to 8bf9d19 Compare June 1, 2023 10:01
@tangledbytes tangledbytes marked this pull request as ready for review June 1, 2023 10:25
@tangledbytes tangledbytes changed the title [WIP]: Add block evictor for Tier2 FS Add block evictor for Tier2 FS Jun 1, 2023

@guymguym guymguym left a comment

@Utkarsh-pro The structure looks good.
Notice the comment on adding a time condition to the evictor.

src/server/bg_workers.js Outdated
src/server/bg_services/tier2_block_evicter.js Outdated
@tangledbytes tangledbytes force-pushed the utkarsh-pro/add/tier2-cache-eviction branch 2 times, most recently from b225f63 to 2e513b9 Compare June 7, 2023 14:30
@guymguym guymguym requested a review from jackyalbo June 8, 2023 20:53

@guymguym guymguym left a comment

Hey @Utkarsh-pro
This PR is superb, but I raised several changes which are important (+ some cosmetic).
I'd love to discuss it further.
!!

@@ -222,6 +222,13 @@ export AWS_SECRET_ACCESS_KEY=$(npm run api -- account read_account '{}' --json |
aws --endpoint http://localhost:6001 s3 mb s3://testbucket
```

Add additional second tier to the bucket so that TTL worker can do the cache eviction. Note that here:
Member

Ultimately we would want tier 2 creation to be a system configuration rather than requiring every bucket to be configured specifically. I would leave it like this for this PR, but this will follow soon.

docs/standalone.md Outdated
src/server/bg_services/tier_ttl_worker.js Outdated
src/server/bg_services/tier_ttl_worker.js Outdated
src/server/object_services/md_store.js Outdated
src/server/object_services/md_store.js Outdated
config.js Outdated
src/api/block_store_api.js Outdated
src/server/bg_workers.js Outdated
src/server/bg_services/tier_ttl_worker.js Outdated
@tangledbytes

@guymguym lemme try to summarize the proposed changes (specifically storage class) here:

  • The idea is to have the construct of storage classes in our tiers. Each tier can have <= 1 storage class specified.
  • Storage class will just communicate some additional details to the underlying block store. This detail will be used by functions like move_blocks_to_storage_class.
  • move_blocks_to_storage_class must consume block ids and a storage class, and should try to move the blocks to that storage class by doing something (here that something would be different for each block store which plans to support this).
  • TTL worker should invoke map_client.run() which will do all of the heavy lifting.
  • Read flow already moves the chunks across the tiers. We will need to modify flow (maybe map_client.read_object_mapping?) such that it also starts invoking move_blocks_to_storage_class.
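To make the first three points concrete, here is a rough sketch of how block ids could be grouped by the storage class of their chunk's tier before being handed to something like move_blocks_to_storage_class. Everything in it is hypothetical: the function name `group_blocks_by_storage_class`, the `storage_class` field on a tier, and the chunk shape are assumptions for illustration, not the project's data model.

```javascript
'use strict';

/**
 * Group block ids by the storage class of the tier their chunk is assigned to.
 * A tier carries at most one storage class; tiers without one are skipped.
 */
function group_blocks_by_storage_class(chunks, tiers_by_id) {
    const groups = new Map();
    for (const chunk of chunks) {
        const tier = tiers_by_id[chunk.tier];
        const storage_class = tier && tier.storage_class;
        if (!storage_class) continue; // nothing extra to tell the block store
        const block_ids = groups.get(storage_class) || [];
        block_ids.push(...chunk.block_ids);
        groups.set(storage_class, block_ids);
    }
    return groups;
}

const tiers_by_id = {
    tier1: { name: 'tier1' },
    tier2: { name: 'tier2', storage_class: 'GLACIER' },
};

const planned = group_blocks_by_storage_class([
    { id: 'c1', tier: 'tier2', block_ids: ['b1', 'b2'] },
    { id: 'c2', tier: 'tier1', block_ids: ['b3'] },
    { id: 'c3', tier: 'tier2', block_ids: ['b4'] },
], tiers_by_id);

for (const [storage_class, block_ids] of planned) {
    console.log(storage_class, block_ids); // GLACIER [ 'b1', 'b2', 'b4' ]
}
```

Each group could then be passed to the block store in a single call, leaving the store to decide what moving to that class means for it.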

Clarifications

  1. Are storage classes transparent to S3? By "transparent" I mean that S3 responses will always mention StorageClass: Standard?

@guymguym

guymguym commented Jun 9, 2023

@Utkarsh-pro I think you got it, just two notes:

  • TTL worker should invoke map_client.run() which will do all of the heavy lifting.
  • Read flow already moves the chunks across the tiers. We will need to modify flow (maybe map_client.read_object_mapping?) such that it also starts invoking move_blocks_to_storage_class.

Both TTL worker and TTF worker already call build_chunks and therefore invoke map_client, so there is no need to add such a call in the workers, only to make this call also handle the storage class move as I suggested in this comment - #7313 (comment)

  1. Are storage classes transparent to S3? By "transparent" I mean that S3 responses will always mention StorageClass: Standard?

Yes. For now we only handle the internal mappings. Exposing tiers to the clients is a whole new challenge.

@tangledbytes tangledbytes force-pushed the utkarsh-pro/add/tier2-cache-eviction branch from 2e513b9 to 05db415 Compare June 12, 2023 13:03
src/server/object_services/md_store.js Outdated
src/agent/block_store_services/block_store_fs.js Outdated
src/sdk/map_client.js Outdated
src/sdk/map_client.js Outdated
src/api/common_api.js Outdated
@tangledbytes tangledbytes force-pushed the utkarsh-pro/add/tier2-cache-eviction branch from 05db415 to 2e4f554 Compare June 19, 2023 07:06
@tangledbytes tangledbytes force-pushed the utkarsh-pro/add/tier2-cache-eviction branch from 2e4f554 to 61264fb Compare June 19, 2023 11:25
src/test/unit_tests/test_map_client.js
src/agent/block_store_services/block_store_fs.js Outdated
src/api/common_api.js Outdated
config.js Outdated
src/sdk/map_client.js Outdated
src/server/bg_services/tier_ttl_worker.js
src/server/bg_services/tier_ttl_worker.js
@tangledbytes tangledbytes force-pushed the utkarsh-pro/add/tier2-cache-eviction branch from 61264fb to 56fccd8 Compare June 21, 2023 21:04
@tangledbytes tangledbytes force-pushed the utkarsh-pro/add/tier2-cache-eviction branch from 56fccd8 to 6fcaa6f Compare June 21, 2023 21:48
Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

add evicter specific delays

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

restructure to use FSWrapper functions - ensure single fd

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

add support for last read based block eviction

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

replace naive evictor with TTL evictor

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

rename worker entities and add docs

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

add storage class construct

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

fix typos; add support for chunk-tier mapping in builds chunks; add tests for map_builder and map_client

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

fix subtle bug in compare_unordered

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

integrate storage classes completely

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

fix unit test

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

fix undefined key bug and allow running map_builder tests

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

rebase master + rename tier2 to tmfs

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>
@tangledbytes tangledbytes force-pushed the utkarsh-pro/add/tier2-cache-eviction branch from 6fcaa6f to ec502f5 Compare June 22, 2023 06:23
Comment on lines +270 to +272
if (storage_class === 'GLACIER') {
    return this._move_blocks_to_glacier(block_ids);
}
Member

Just to avoid confusion, I would not give these block_store_fs methods names containing *glacier, because while the storage class is called glacier, we are translating it to TMFS. So I think I would do something like this here, and if TMFS is not enabled we should probably continue and throw the error (unsupported storage class) because we don't have a glacier tier to move to. WDYT?

Suggested change
- if (storage_class === 'GLACIER') {
-     return this._move_blocks_to_glacier(block_ids);
- }
+ if (storage_class === 'GLACIER') {
+     if (config.BLOCK_STORE_FS_TMFS_ENABLED) {
+         return this._move_blocks_to_tmfs(block_ids);
+     }
+ }
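For readers following along, the fall-through behaviour described in this comment could look roughly like the sketch below. It is an illustration only: `BlockStoreSketch`, the argument order of `move_blocks_to_storage_class`, the error message, and the in-memory `moved_to_tmfs` list are assumptions, and the real TMFS migration is replaced by a placeholder.

```javascript
'use strict';

class BlockStoreSketch {
    constructor(config) {
        // Hypothetical stand-in for the real config module.
        this.config = config;
        this.moved_to_tmfs = [];
    }

    move_blocks_to_storage_class(block_ids, storage_class) {
        if (storage_class === 'GLACIER' && this.config.BLOCK_STORE_FS_TMFS_ENABLED) {
            return this._move_blocks_to_tmfs(block_ids);
        }
        // Reached for unknown classes, and for GLACIER when TMFS is disabled.
        throw new Error(`Unsupported storage class: ${storage_class}`);
    }

    _move_blocks_to_tmfs(block_ids) {
        // Placeholder: real code would trigger the TMFS migration here.
        this.moved_to_tmfs.push(...block_ids);
        return block_ids.length;
    }
}

const tmfs_store = new BlockStoreSketch({ BLOCK_STORE_FS_TMFS_ENABLED: true });
console.log(tmfs_store.move_blocks_to_storage_class(['b1', 'b2'], 'GLACIER')); // 2

const plain_store = new BlockStoreSketch({ BLOCK_STORE_FS_TMFS_ENABLED: false });
try {
    plain_store.move_blocks_to_storage_class(['b1'], 'GLACIER');
} catch (err) {
    console.log(err.message); // Unsupported storage class: GLACIER
}
```

Folding the config check into the same condition means a disabled TMFS and an unknown storage class share one error path, which is the behaviour suggested above.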

const current_attached_pools = tier_before_move.mirrors.map(mirror => mirror.spread_pools.map(pool => String(pool._id))).flat();

if (
!js_utils.compare_unordered(target_attached_pools, current_attached_pools, true) ||
Member

how about using _.xor() instead of creating a special function for this esoteric case?

Member Author

I .. did not know about xor
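For context, lodash's `_.xor(a, b)` returns the symmetric difference of its arrays, so an empty result means both arrays hold the same unique values regardless of order. The dependency-free sketch below illustrates the same idea; the helper names `symmetric_difference` and `same_members` and the pool ids are made up for the example. Note that it treats the arrays as sets, so duplicate entries are ignored, which may or may not match what compare_unordered did.

```javascript
'use strict';

/** Symmetric difference of two arrays, treated as sets. */
function symmetric_difference(a, b) {
    const set_a = new Set(a);
    const set_b = new Set(b);
    return [
        ...[...set_a].filter(item => !set_b.has(item)),
        ...[...set_b].filter(item => !set_a.has(item)),
    ];
}

/** True when both arrays hold the same unique values, in any order. */
function same_members(a, b) {
    return symmetric_difference(a, b).length === 0;
}

const target_pools = ['pool-b', 'pool-a'];
const current_pools = ['pool-a', 'pool-b'];
console.log(same_members(target_pools, current_pools)); // true
console.log(symmetric_difference(['pool-a'], ['pool-a', 'pool-c'])); // [ 'pool-c' ]
```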


@guymguym guymguym left a comment

@Utkarsh-pro All in all this is absolutely fantastic! I added a couple of small comments, but they can also be addressed in follow-up PRs. Godspeed.

@guymguym guymguym changed the title Add block evictor for Tier2 FS Tiering with storage class support - TTL based worker + TMFS blocks migration Jun 22, 2023
@tangledbytes tangledbytes merged commit 2eac264 into noobaa:master Jun 22, 2023
6 checks passed
tangledbytes added a commit to tangledbytes/noobaa-core that referenced this pull request Jun 22, 2023
Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>
tangledbytes added a commit to tangledbytes/noobaa-core that referenced this pull request Jun 22, 2023
Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

add auto tier 2 creation & assignment to bucket if configured

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

fix issues

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>
tangledbytes added a commit to tangledbytes/noobaa-core that referenced this pull request Jun 22, 2023
Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

add auto tier 2 creation & assignment to bucket if configured

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

fix issues

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

fix docs

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>
tangledbytes added a commit to tangledbytes/noobaa-core that referenced this pull request Jun 22, 2023
Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

add auto tier 2 creation & assignment to bucket if configured

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

fix issues

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

fix docs

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

fix deepscan

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>
tangledbytes added a commit to tangledbytes/noobaa-core that referenced this pull request Jun 22, 2023
Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

add auto tier 2 creation & assignment to bucket if configured

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

fix issues

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

fix docs

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

fix deepscan

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

cleanup auto_setup_tier2

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>
tangledbytes added a commit to tangledbytes/noobaa-core that referenced this pull request Jun 23, 2023
Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

add auto tier 2 creation & assignment to bucket if configured

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

fix issues

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

fix docs

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

fix deepscan

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

cleanup auto_setup_tier2

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>

cleanup auto_setup_tier2 and reduce agent_blocks_reclaimer log verbosity

Signed-off-by: Utkarsh Srivastava <srivastavautkarsh8097@gmail.com>
tangledbytes added a commit that referenced this pull request Jun 23, 2023
…ket-tmfs-autoconf

[TMFS] Address minors from #7313 & support auto second tier creation & assignment if configured
4 participants