
Option to set special_small_blocks higher than 128k - why not allow it to be higher? #9131

Closed
recklessnl opened this issue Aug 6, 2019 · 9 comments
Labels
good first issue, Type: Feature

Comments

@recklessnl

System information

Type Version/Name
Distribution Name Debian
Distribution Version 10
Linux Kernel 5.0.15
Architecture x86
ZFS Version 0.8.1
SPL Version 0.8.1

Describe the problem you're observing

I would like to set the special_small_blocks property for a small, dedicated dataset to something higher than 128K. The reason is that this dataset has a lot of small files between 1KB and ~750KB, but even if I added them all up they still wouldn't be more than ~700GB total. The special vdev I have is a fast flash SSD pool in RAID10 with very fast random read speeds compared to the main HDD pool. I'd like to make better use of the SSDs in the overall storage pool by having the files under 1MB in this particular dataset placed on the special vdev. It would greatly improve performance for my usage.

If anything, shouldn't this be a tunable so that advanced users can opt into it? As long as users are aware of the risks, the onus is on them to select a proper value for each dataset. I am aware of the risk of setting this value higher, but for a small dataset and a big special vdev this is not a problem.
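
For context, a rough sketch of the kind of layout described above (pool, dataset, and device names here are placeholders, not the actual configuration):

# HDD pool plus an SSD special allocation class (placeholder devices)
$ zpool create tank mirror sda sdb mirror sdc sdd \
    special mirror nvme0n1 nvme1n1 mirror nvme2n1 nvme3n1
$ zfs create tank/smallfiles
# 128K is currently the largest value accepted; ~512K is what I'm after
$ zfs set special_small_blocks=128K tank/smallfiles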

behlendorf added the Type: Feature label on Aug 21, 2019
@behlendorf
Contributor

One thing which perhaps isn't entirely clear is that special_small_blocks refers to the record size used for the dataset, not a maximum file size. So as long as you set recordsize=128k (or less) and special_small_blocks=128k, then regardless of how large the file is it will be stored in the special devices.

That said, increasing the allowed maximum special_small_blocks isn't unreasonable but we wanted to initially set a conservative maximum size.
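
As a minimal illustration of that point (assuming an existing dataset named tank/fs):

# With both properties at 128K, every block written to the dataset is
# <= 128K, so all of them qualify for the special allocation class.
$ zfs set recordsize=128K tank/fs
$ zfs set special_small_blocks=128K tank/fs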

@recklessnl
Author

recklessnl commented Aug 22, 2019

@behlendorf thanks for the reply - I'm more confused now though. In the manual, it says the following about the special_small_blocks size:

This value represents the threshold block size for including small file blocks into the special allocation class. Blocks smaller than or equal to this value will be assigned to the special allocation class while greater blocks will be assigned to the regular class.

So I don't fully follow you when you say "then regardless of how large the file is it will be stored in the special devices." Could you explain it with a concrete example? What I'm trying to achieve is to have small files (less than 512K) stored on the special device (4 striped mirrors of fast SSDs) in order to speed up the entire dataset. I'm thinking of setting the dataset recordsize to either 512K or 1M. My thinking is that files bigger than 1M can be read efficiently from the hard disks, while the smaller files will be read from the much faster SSDs (the special vdev).

That said, increasing the allowed maximum special_small_blocks isn't unreasonable but we wanted to initially set a conservative maximum size.

I understand that completely, but as with other values in ZFS, I feel this should be a tunable setting that users can change if they wish (just as the maximum recordsize can be tuned to > 1M). In my case I'd like to set the threshold to 512K, and it would be great if this were tunable.
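
Concretely, the configuration I'm after would look something like this (dataset name is a placeholder; the second command is what the current 128K cap rejects):

$ zfs set recordsize=1M tank/smallfiles
$ zfs set special_small_blocks=512K tank/smallfiles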

@recklessnl
Author

recklessnl commented Sep 4, 2019

@behlendorf could you help clarify this? Assuming a dataset recordsize of 1M and special_small_blocks of 128K, what happens with files under 128K and those above 128K?

Additionally, would you be able to tell me whether this (a tunable option for the maximum size of special_small_blocks) is going to be added in the next ZFS release?

@behlendorf
Contributor

Assuming a dataset recordsize of 1M and special_small_blocks of 128K, what happens with files under 128K and those above 128K?

Good question. Let's take your concrete example of special_small_blocks=128K and recordsize=1M for an assortment of file sizes. The important thing to keep in mind is that every file in ZFS is composed of a number of identically sized, power-of-two blocks. For small files the block size may be less than the recordsize, but it will never exceed it.

So, for example, a small 4K file will be stored using a single 4K block. Similarly, a 3K file will be rounded up to use a 4K block. If this block size is less than or equal to special_small_blocks, the block will be stored on the special device, provided space is available. You can use zdb <dataset> <inode-number> to dump the details of a given file.

$ zdb tank/fs 137
Dataset tank/fs [ZPL], ID 90, cr_txg 12, 9.72M, 24 objects

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
       137    1   128K     4K     4K     512     4K  100.00  ZFS plain file
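
(The object number passed to zdb is just the file's inode number, so something like ls -i will find it; the path below is only a placeholder:)

$ ls -i /tank/fs/smallfile
137 /tank/fs/smallfile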

Increasing the file size to 512K, you'll see that a single 512K block is used; since that is larger than the 128K cutoff, it will be stored in the primary pool storage.

$ zdb tank/fs 134
Dataset tank/fs [ZPL], ID 90, cr_txg 12, 9.72M, 24 objects

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
       134    1   128K   512K   512K     512   512K  100.00  ZFS plain file

An 8M file will be constructed from 8 individual 1M blocks, all of which will be stored in the primary pool storage.

$ zdb tank/fs 11
Dataset tank/fs [ZPL], ID 90, cr_txg 12, 9.72M, 24 objects

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
        11    2   128K     1M  8.00M     512     8M  100.00  ZFS plain file

You could force that same 8M file to be stored on the special devices by decreasing the recordsize to 128K and recreating the file (copying it would do the trick; see the command sketch after the zdb output below). The file would then be constructed from 64 individual 128K blocks, each at or below the special_small_blocks threshold.

$ zdb tank/fs 14
Dataset tank/fs [ZPL], ID 90, cr_txg 12, 17.7M, 27 objects

    Object  lvl   iblk   dblk  dsize  dnsize  lsize   %full  type
        14    2   128K   128K  8.01M     512     8M  100.00  ZFS plain file
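
In command form, that recreation step would be roughly the following (file name is a placeholder; recordsize only affects newly written blocks, so the file has to be rewritten):

$ zfs set recordsize=128K tank/fs
$ cp /tank/fs/bigfile /tank/fs/bigfile.new
$ mv /tank/fs/bigfile.new /tank/fs/bigfile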

Hopefully this helps!

@recklessnl
Author

That makes a lot of sense. Thanks for the explanation @behlendorf, very insightful.

As for the tunable to set special_small_blocks to a higher value that we discussed above, is this something that could be added in the next ZFS version?

@recklessnl
Author

@behlendorf is the option to tune this to a higher value coming to a future version of ZFS? Would be really great!

behlendorf added the good first issue label on Sep 24, 2019
behlendorf added a commit to behlendorf/zfs that referenced this issue Sep 24, 2019
There may be circumstances where it's desirable that all blocks
in a specified dataset be stored on the special device.  Relax
the artificial 128K limit and allow the special_small_blocks
property to be set up to 1M.  When blocks >1MB have been enabled
via the zfs_max_recordsize module option, this limit is increased
accordingly.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#9131
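
For reference, the zfs_max_recordsize module option mentioned in that commit can be inspected, and on most systems adjusted at runtime, through sysfs; the 16M value below is only an example:

$ cat /sys/module/zfs/parameters/zfs_max_recordsize
1048576
$ echo 16777216 > /sys/module/zfs/parameters/zfs_max_recordsize
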
@behlendorf
Contributor

@recklessnl I've opened #9355 with the proposed change to increase this limit for feedback and review.

@recklessnl
Author

@behlendorf Thank you for the quick turnaround time, much appreciated! I'm looking forward to seeing this feature implemented. I'll go ahead and close this issue now. Thanks again.

behlendorf added a commit that referenced this issue Dec 3, 2019
There may be circumstances where it's desirable that all blocks
in a specified dataset be stored on the special device.  Relax
the artificial 128K limit and allow the special_small_blocks
property to be set up to 1M.  When blocks >1MB have been enabled
via the zfs_max_recordsize module option, this limit is increased
accordingly.

Reviewed-by: Don Brady <don.brady@delphix.com>
Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #9131
Closes #9355