Added gufi_stats -c direntries-log2-bins #139

Closed

johnbent
Contributor

@johnbent johnbent commented Oct 4, 2023

Added direntries and dirsize cumulative queries to gufi_stats, following the logic of the newly added filesize query. Items of size 0 and 1 now show in the 0 and 2 bins, respectively. Changed floor to ceil so that 5 shows in the 8 bin and not the 4 bin, which is more consistent with the handling of 0, 1, 2, 3, 4, etc.

I tested it with the super simple bash script attached.
test.sh.txt

The output is as follows:

[jbent@hpe1 build]$ bash /tmp/test.sh 
Index building log in /tmp/dir2index_log.63409. Consult if errors
Initial direntries query
0 774
2 1092
4 515
8 518
Create 10 dirs with 0 files each and reindex.
0 784
2 1092
4 515
8 518
Create 10 dirs with 1 files each and reindex.
0 784
2 1102
4 515
8 518
Create 10 dirs with 2 files each and reindex.
0 784
2 1112
4 515
8 518
Create 10 dirs with 3 files each and reindex.
0 784
2 1112
4 525
8 518
Create 10 dirs with 4 files each and reindex.
0 784
2 1112
4 535
8 518
Create 10 dirs with 5 files each and reindex.
0 784
2 1112
4 535
8 528
Initial filesize query
0 2056
2 97
4 143
8 135
Create 10 files with size 0 each and reindex.
0 2066
2 97
4 143
8 135
Create 10 files with size 1 each and reindex.
0 2066
2 107
4 143
8 135
Create 10 files with size 2 each and reindex.
0 2066
2 117
4 143
8 135
Create 10 files with size 3 each and reindex.
0 2066
2 117
4 153
8 135
Create 10 files with size 4 each and reindex.
0 2066
2 117
4 163
8 135
Create 10 files with size 5 each and reindex.
0 2066
2 117
4 163
8 145
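
For readers following along, here is a minimal Python sketch of the binning rule that the output above appears to follow. It is not the gufi_stats implementation (which builds SQL); the names `log2_bin` and `histogram` are made up for illustration, and the rule itself (0 goes to 0, 1 goes to 2, everything else to the smallest power of two that is at least n) is inferred from the test output rather than taken from the code.

```python
from collections import Counter


def log2_bin(n):
    """Return the bin label for a non-negative integer n.

    Assumed rule, inferred from the test output: 0 -> 0, 1 -> 2, and every
    other n -> the smallest power of two that is >= n.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return 0
    if n == 1:
        return 2  # special case: keep 1 out of the 0 bin
    # (n - 1).bit_length() equals ceil(log2(n)) for n >= 2, with no float rounding
    return 2 ** (n - 1).bit_length()


def histogram(values):
    """Count how many values fall into each bin, ordered by bin label."""
    counts = Counter(log2_bin(v) for v in values)
    return dict(sorted(counts.items()))


# Ten items each of size 0, 1, 2, 3, 4 and 5, mirroring the steps in the test output.
sample = [n for n in range(6) for _ in range(10)]
print(histogram(sample))
```

Under this assumed rule the sample adds 10, 20, 20 and 10 items to the 0, 2, 4 and 8 bins, which matches the total growth of each bin between the initial and final queries above.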

Comment on lines 1106 to 1111
['dirsize-log2-bins', dirsize_log2_bins],
['dirsize-log1024-bins', dirsize_log1024_bins],
['filesize-log2-bins', filesize_log2_bins],
['filesize-log1024-bins', filesize_log1024_bins],
['direntries-log2-bins', direntries_log2_bins],
['direntries-log1024-bins', direntries_log1024_bins],
Collaborator

@calccrypto calccrypto Oct 4, 2023

Group the dir* entries together, in the same order as they appear above.

Contributor Author

As discussed, I'll just get rid of dirsize.

@@ -1016,10 +1016,13 @@ def size_bins(args, base, type): # pylint: disable=redefined-builtin
     pinode_create = ['{0} INTEGER'.format(pinode_col)] if args.recursive else []
     pinode = [pinode_col] if args.recursive else []
 
+    # use log and power to put things into bins. Note that 0 must be handled specially because 2^0 is 1. Also 1 needs to be handled specially because log(N,1) is 0 but it shouldn't be included with 0
+    # so this will return items of size zero in a 0 bin and will return items of size 1 in the next bin
+    # use ceil so that 3 and 4 go into the 4 bin and 5 6 7 8 go into the 8 bin, etc.
Collaborator

which bin ends do we want?

Contributor Author

I don't have a preference. I'll back this out and test with your zero-fix.
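
To make the bin-end question concrete, the following sketch compares the two conventions side by side. Both helper names are hypothetical and neither is taken from gufi_stats; the special handling of 1 described in the code comment under review is deliberately left out so that only the floor/ceil difference shows.

```python
def floor_bin(n):
    """Label n with the largest power of two <= n (lower bin edge); 0 stays 0."""
    return 0 if n == 0 else 2 ** (n.bit_length() - 1)


def ceil_bin(n):
    """Label n with the smallest power of two >= n (upper bin edge); 0 stays 0."""
    return 0 if n == 0 else 2 ** (n - 1).bit_length()


# Print n, its floor-style label, and its ceil-style label.
for n in range(9):
    print(n, floor_bin(n), ceil_bin(n))
```

With floor, a bin labeled 4 holds 4 through 7; with ceil, it holds 3 and 4. Exact powers of two get the same label under both conventions.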

Comment on lines 1062 to 1066
def direntries_log2_bins(_config, args, _where):
    return size_bins(args, 2, 'd', 'totfiles')

def direntries_log1024_bins(_config, args, _where):
    return size_bins(args, 1024, 'd', 'totfiles')
Collaborator

maybe rename to dirfilecount_* since you are pulling totfiles and not totfiles + totlinks

Contributor Author

Can add multiple functions:

  1. dirfilecount for just files
  2. dirlinkcount for just links

Cannot do direntrycount because it needs totsubdirs. I'll submit an Issue asking for totsubdirs.

@@ -993,7 +993,7 @@ def uid_size(_config, args, where):
 def gid_size(_config, args, where):
     return uidgid_size(args, 'gid', where)
 
-def size_bins(args, base, type): # pylint: disable=redefined-builtin
+def size_bins(args, base, type, field='size'): # pylint: disable=redefined-builtin
Collaborator

Let's not have default arguments. The only other place with default arguments is build_where, and they aren't even used.

Comment on lines 1068 to 1072
def dirsize_log2_bins(_config, args, _where):
    return size_bins(args, 2, 'd')

def dirsize_log1024_bins(_config, args, _where):
    return size_bins(args, 1024, 'd')
Collaborator

what does dirsize tell us?

Contributor Author

will remove

@johnbent
Contributor Author

johnbent commented Oct 4, 2023

Also need to update docs and tests. Follow what was done here:
74a01e9#diff-0f23589fd21049cea7039aa237fb7d0c21c592bb24f81ba4f1f5e48c29c012b9

@johnbent
Contributor Author

johnbent commented Oct 4, 2023

As discussed, I'll also change 'bits' to 'exponent' as the variable name.
