Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fast Dedup: DDT Prefetch #15890

Merged
merged 1 commit into from
Jul 26, 2024
Merged

Conversation

allanjude
Copy link
Contributor

Motivation and Context

After pool import, it takes a long time for a dedup table to naturally be loaded into the cache, which reduces performance until enough of the table is in memory.

In production we have seen significantly reduces performance for at least 72 hours on a large DDT, reduces to minutes using this feature.

Description

This change adds a new zpool prefetch -t ddt $pool command which causes a pool's DDT to be loaded into the ARC. The primary goal is to remove the need to "warm" a pool's cache before deduplication stops slowing write performance. It may also provide a way to reload portions of a DDT if they have been flushed due to inactivity.

Replaces #9464

How Has This Been Tested?

Tests added.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Feb 15, 2024
lib/libzfs_core/libzfs_core.c Outdated Show resolved Hide resolved
module/zfs/dmu.c Outdated Show resolved Hide resolved
module/zfs/dmu.c Outdated Show resolved Hide resolved
module/zfs/dmu.c Outdated Show resolved Hide resolved
module/zfs/dmu.c Outdated Show resolved Hide resolved
module/zfs/dmu.c Outdated Show resolved Hide resolved
@don-brady
Copy link
Contributor

Address review feedback
  • Make ioct for prefetch generic so future types can be added
    
  • Cleanup dmu_object_cached_size()
    
  • Fix panic due to incorrect prefetch max value
    
  • Rewrote dmu_prefetch_wait to avoid decompressing data during prefetch
    

module/zfs/dmu.c Fixed Show fixed Hide fixed
lib/libzfs/libzfs_pool.c Outdated Show resolved Hide resolved
module/zfs/dmu.c Outdated Show resolved Hide resolved
module/zfs/dmu.c Outdated Show resolved Hide resolved
module/zfs/dmu.c Outdated Show resolved Hide resolved
module/zfs/dmu.c Outdated Show resolved Hide resolved
module/zfs/dmu.c Outdated Show resolved Hide resolved
module/zfs/dmu.c Outdated Show resolved Hide resolved
@don-brady don-brady force-pushed the fdt-rel-prefetch branch 2 times, most recently from 0ad925e to 503c190 Compare April 11, 2024 04:18
module/zfs/dmu.c Outdated Show resolved Hide resolved
module/zfs/dmu.c Outdated Show resolved Hide resolved
module/zfs/dmu.c Outdated Show resolved Hide resolved
module/zfs/zap_micro.c Show resolved Hide resolved
module/zfs/zfs_ioctl.c Outdated Show resolved Hide resolved
@don-brady don-brady force-pushed the fdt-rel-prefetch branch 2 times, most recently from d923ec7 to a25b0b7 Compare April 12, 2024 04:53
module/zfs/dmu.c Outdated Show resolved Hide resolved
module/zfs/dmu.c Outdated Show resolved Hide resolved
module/zfs/dmu.c Outdated Show resolved Hide resolved
@don-brady
Copy link
Contributor

rebased after zap shrinking was landed

@shodanshok
Copy link
Contributor

Maybe I missed something, but I have two questions:

  • what happens if DDT overflows available ARC memory?
  • can this be extended for generic pool metadata prefetch?

Thanks.

@amotin
Copy link
Member

amotin commented May 1, 2024

* what happens if DDT overflows available ARC memory?

Right now it will evict something prefetched earlier. Not very efficient...

* can this be extended for generic pool metadata prefetch?

As I pointed earlier, it should be done for BRT also, since they and their problems due to random access are very similar. "Genertic pool metadata" though is pretty vague, you should not want to prefetch all existing space maps, indirect blocks, extended attributes and other things you likely never need in a typical operation.

@robn
Copy link
Member

robn commented May 15, 2024

[Fast dedup stack rebased to master 3c941d1]

cmd/zpool/zpool_main.c Outdated Show resolved Hide resolved
lib/libzfs/libzfs_pool.c Outdated Show resolved Hide resolved
module/zfs/dmu.c Outdated Show resolved Hide resolved
module/zfs/dmu.c Outdated Show resolved Hide resolved
module/zfs/spa.c Show resolved Hide resolved
@robn
Copy link
Member

robn commented Jun 14, 2024

[Fast dedup stack rebased to master c98295e]

@robn robn force-pushed the fdt-rel-prefetch branch 2 times, most recently from 1026417 to 1bcea4c Compare June 17, 2024 05:30
@robn robn force-pushed the fdt-rel-prefetch branch 2 times, most recently from 186c360 to 3779c22 Compare June 25, 2024 01:32
@allanjude allanjude force-pushed the fdt-rel-prefetch branch 2 times, most recently from c937bc8 to e76b46c Compare July 23, 2024 13:18
@tonyhutter
Copy link
Contributor

@allanjude can you take a look at the checkstyle errors?

@behlendorf behlendorf removed the Status: Code Review Needed Ready for review and testing label Jul 25, 2024
This change adds a new `zpool prefetch -t ddt $pool` command which
causes a pool's DDT to be loaded into the ARC. The primary goal is to
remove the need to "warm" a pool's cache before deduplication stops
slowing write performance. It may also provide a way to reload portions
of a DDT if they have been flushed due to inactivity.

Sponsored-by: iXsystems, Inc.
Sponsored-by: Catalogics, Inc.
Sponsored-by: Klara, Inc.

Co-authored-by: Will Andrews <will.andrews@klarasystems.com>
Co-authored-by: Allan Jude <allan@klarasystems.com>
Co-authored-by: Don Brady <don.brady@klarasystems.com>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Will Andrews <will.andrews@klarasystems.com>
Signed-off-by: Fred Weigel <fred.weigel@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
@allanjude
Copy link
Contributor Author

@allanjude can you take a look at the checkstyle errors?

It was the file you had me remove, needed to be removed from the Makefile.am as well.
Should be good to go now.

@behlendorf behlendorf added the Status: Accepted Ready to integrate (reviewed, tested) label Jul 26, 2024
@behlendorf behlendorf merged commit 62e7d3c into openzfs:master Jul 26, 2024
22 of 23 checks passed
@lundman
Copy link
Contributor

lundman commented Sep 4, 2024

Hellooo!

So this commit BSODs on Windows, it turns out to be:

https://github.com/KlaraSystems/zfs/blob/master/module/zfs/spa.c#L553
Here, err is not initialised, so Windows deem it worthy of $rand.
We skip the alloc as nvp is not NULL.

Before we get here, nvp is set to outnvl in the call to:
https://github.com/KlaraSystems/zfs/blob/master/module/zfs/zfs_ioctl.c#L7934

We leave somewhat early due to

 .... spa->spa_pool_props_object == 0)
		goto out;

and in out we do:

out:
	mutex_exit(&spa->spa_props_lock);
	dsl_pool_config_exit(dp, FTAG);
	if (err && err != ENOENT) {
		nvlist_free(*nvp);
		*nvp = NULL;
		return (err);

So nvp is now freed, we return error. We bubble up through:
https://github.com/KlaraSystems/zfs/blob/master/module/zfs/zfs_ioctl.c#L3051-L3057
here, nvp is free, but nothing in outnvl reflects that, so finally:

https://github.com/KlaraSystems/zfs/blob/master/module/zfs/zfs_ioctl.c#L7974

Boom, we are rogering around with the freed outnvl.

So, first fix is err = 0 naturally, but we should probably also handle the error case where we have freed nvp.

I could do a PR, but I have 332 commits to pull in still, so if someone gets one in before me, I will be happy :)

@robn
Copy link
Member

robn commented Sep 4, 2024

@lundman wow, good find. I've got half a patch worked up already; I'll finish and post it tonight.

lundman pushed a commit to openzfsonwindows/openzfs that referenced this pull request Sep 4, 2024
This change adds a new `zpool prefetch -t ddt $pool` command which
causes a pool's DDT to be loaded into the ARC. The primary goal is to
remove the need to "warm" a pool's cache before deduplication stops
slowing write performance. It may also provide a way to reload portions
of a DDT if they have been flushed due to inactivity.

Sponsored-by: iXsystems, Inc.
Sponsored-by: Catalogics, Inc.
Sponsored-by: Klara, Inc.
Reviewed-by: Alexander Motin <mav@FreeBSD.org>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Allan Jude <allan@klarasystems.com>
Signed-off-by: Will Andrews <will.andrews@klarasystems.com>
Signed-off-by: Fred Weigel <fred.weigel@klarasystems.com>
Signed-off-by: Rob Norris <rob.norris@klarasystems.com>
Signed-off-by: Don Brady <don.brady@klarasystems.com>
Co-authored-by: Will Andrews <will.andrews@klarasystems.com>
Co-authored-by: Don Brady <don.brady@klarasystems.com>
Closes openzfs#15890
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants