-
Notifications
You must be signed in to change notification settings - Fork 76
zfs file system compatibility checks added. #67 #70
zfs file system compatibility checks added. #67 #70
Conversation
dedekind
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This patch looks OK to me, thanks!
bmaptools/BmapHelpers.py
Outdated
| # normalize,resolve and get absolute path to the file object | ||
| f = os.path.realpath(filepath) | ||
| cmd = "df -T '%s' | awk 'NR==2 {print $2}'" % f | ||
| proc = subprocess.Popen(cmd, stdout=PIPE, stderr=PIPE, shell=True, universal_newlines=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to avoid running external programs for this? Less of this is generally better, less dependencies. Did you look at other solutions, like just scanning /proc/mounds, or using psutils, here is some helpful stackoverflow discussion: https://stackoverflow.com/questions/7718411/determine-device-of-filesystem-in-python?lq=1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whilst psutil looks like a good approach, it looks like it comes with at a heavy price, the user would need to install gcc and python3-dev on linux and Xcode on mac. Am i correct in thinking mac support is not a thing yet for bmap-tools and hence I dont need to worry about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably we do not want psutil. I never worried about Windows or Mac.
I guess 'df' is OK, but please, do not pipe to awk, python has enough tools for parsing strings. I think all you need will be output.split(" ")[0].strip().
Also, please handle errors. Check how / if they are handled in different places in this project. I went through evolution of how I handle exception in my programs, and I do not remember how I did it in this project, but just be consistent with the rest of the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if mac support is required then, "df -T '%s' | awk 'NR==2 {print $2}" can be replaced with a single call to "mount" and the output of this would need to be parsed and correlated with a python function to determine the mount point that the image file resides on
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cmd = "df -T '%s' | awk 'NR==2 {print $2}'" % f
This is vulnerable to shell injection. If the filename is '$(rm -fr /) (including the single quote, to escape from the single quotes in the format string) then running bmaptool will remove all your files. In general, avoid using shell=True with subprocess.
The best and most direct way to get this information would be os.statvfs(f).f_type and comparing with the filesystem's magic number, but Python doesn't seem to implement that struct field...
Using subprocess.Popen(['stat', '-f', '--printf=%t', '--', f], ...) (without shell=True), and comparing with the magic number of a ZFS filesystem (possibly 0x2f52f5?), might be a less bad way to do this with external commands; or maybe %T instead of %t and comparing with zfs, or similar.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a quick test, however is resulting in stat output not accurately identifying the filesystem type being used. testing on files on an ext4 file system results in ext2/ext3 being returned for stat whereas df returns ext4.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ext2, ext3 and ext4 all have the same magic number, so it's expected that different tools will label them differently. What matters for this PR is surely whether ZFS can be detected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given that these are support helper functions, I was aiming for this particular function to be accurate and reusable in-case the file system needs to be known for some other reasons in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shell=true should no longer be needed with the removal of the pipe as per @dedekind request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requirements-test.txt
Outdated
| six | ||
| backports.tempfile | ||
| nose | ||
| mock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not something needed for the new test, right? It is just something missing. If so, would be great to have it as a separate commit.
Then, I suggest to aggregate all the trivial patches that are sort of independent of this ZFS check into a separate pull request (this one, the .gitignore change, the first commit which cleans up the test). Then we could go ahead and merge those quickly and independently.
|
|
||
| #vscode | ||
| .vscode/ | ||
| .noseids |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be great to have this as independent commit.
a9eedf1 to
a111233
Compare
|
latest refactoring now includes requested changes |
dedekind
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did not review everything, but done partial review. Mostly nitpicks and stylistical, but it is improtant to keep the code consistent in my opinion. Would you please consider addressing those nitpicks? Thanks!
bmaptools/BmapHelpers.py
Outdated
| def get_mount_point(filepath): | ||
| """ | ||
| Return the mount point the file object file resides on. | ||
| """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The convention is to have a newline between the docstring and the code. Also for sort comments the quotes can be on the same line.
bmaptools/BmapHelpers.py
Outdated
|
|
||
| def get_mount_point(filepath): | ||
| """ | ||
| Return the mount point the file object file resides on. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"the file object file" - not very well phrased. The input is path, the output is the mount point. I suggest:
Return the mount point for path in 'filepath'.
Also, there is not reason to limit this to files, it'll work for non-files too, right? Then just call the variable 'path'.
bmaptools/BmapHelpers.py
Outdated
| """ | ||
| Return the mount point the file object file resides on. | ||
| """ | ||
| f = os.path.realpath(filepath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pylint complains about one-letter variables. Call it "part" or something, something along the lines of "partial path", or "path prefix", or "path component". But "f" is not a good choice for a variable name. Have it self-documenting, at least e little bit.
bmaptools/BmapHelpers.py
Outdated
| """ | ||
| f = os.path.realpath(filepath) | ||
| while f != os.path.sep: | ||
| if os.path.ismount(f): return f |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, keep the return statement on a separate line. It may look unimportant, and it is indeed not a big deal, but I am a fan of consistency and I find code much easier to read when it is uniform. So let's mimic the existing code (and this is a general rule of thumb for contributions - mimic existing code, if willing to improve something, improve globally).
bmaptools/BmapHelpers.py
Outdated
|
|
||
| return "%.1f %s" % (size, 'EiB') | ||
|
|
||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you add these newlines? I think the unspoken convention in this code is to have double newlines between classes and only single between functions. It may be silly, and not big deal, but it is what it is. It is better to be consistent, so the whole code base would look coherent and thus, easier to follow.
If you like 2 newlines, you can suggest a separate pull request cleaning this up globally, not just in one file. All or nothing.
bmaptools/BmapHelpers.py
Outdated
| stdout, stderr = proc.communicate() | ||
| ftype = None | ||
| if stdout: | ||
| t = stdout.splitlines() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
't' is a bad name. Could you please improve it? E.g., use 'entries', or 'split_line', or something else that is logical and at least somewhat self-documenting. Also, it would be nice to add a small comment with an example of the output string that you are parsing here, makes the code easier to read. Right now I have no clue what is going on and I have to run that 'df -T' and check how the output looks like. A comment would help.
bmaptools/BmapHelpers.py
Outdated
| ftype = t[1].lower() | ||
|
|
||
| if not ftype: | ||
| raise Error("no file system type was found: %s" % stderr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surely the error message would be much more helpful if it actually included the path for which the file-system type was not found?
bmaptools/BmapHelpers.py
Outdated
| Returns path to required zfs kernel compatibility param. | ||
| """ | ||
|
|
||
| return '/sys/module/zfs/parameters/zfs_dmu_offset_next_sync' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove the function please, it brings no value. If this path is used only once, have a local variable. Otherwise, introduce a global variable instead.
bmaptools/BmapHelpers.py
Outdated
| return f | ||
|
|
||
|
|
||
| def get_file_system_type(filepath): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, in python we use leading underscore to communicate that a global symbol is private to this module and should not be used outside of the module. I do not know if this entire project is careful about it, but at least for this and other functions you add, mark them as private by using this name: get_file_system_type().Those functions that you do want to be visible outside of this module (BmapHelpers) should start with a non-.
bmaptools/BmapHelpers.py
Outdated
| Returns whether hosts zfs configuration is compatible for our usage. | ||
| """ | ||
|
|
||
| fp = get_zfs_compat_param_path() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess 'fp' is file path? Surely this is poor name, because it is not just a file path, this is a path to a specific ZFS config file, and name of the variable could be improved, right? Even 'cfg_path' is much better. Or 'zfs_param_path' or 'param_path' or something.
dedekind
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I've done some more partial review and suggested some more improvements.
Is it normal practice in github to address review comments with an incremental patch? In my practice people just rework the patch and re-submit the whole thing instead. Why project history should include all the revisions with all the issues fixed? Who'd be interested in that?
Unless you have a good reason for addressing every review cycle with a separate patch, please, squash the changes. Thanks!
bmaptools/BmapHelpers.py
Outdated
|
|
||
| return False | ||
|
|
||
| def get_mount_point(path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is marked as public (does not start with "_"), is this intentional? I did not spot the user outside this module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually, this method is not actually used so this can be removed
| part = os.path.realpath(os.path.join(part, os.pardir)) | ||
| return part | ||
|
|
||
| def get_file_system_type(path): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto (private vs public function).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given that this is a helpers module I feel this particular method could be quite useful going forward for any future work that may need to know the file system type of a path, even though its only currently used internally. If you would prefer it to be private I can change it.
bmaptools/BmapHelpers.py
Outdated
| # normalize,resolve and get absolute path to the file object | ||
| f = os.path.realpath(filepath) | ||
| proc = subprocess.Popen(['df', '-T', '--', f], stdout=PIPE, stderr=PIPE) | ||
| # Normalize,resolve and get absolute path of the supplied path. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find this comment a bit confusing - is it explaining what's 'os.path.realpath()' ? If so, I spent more time reading it and trying to understand the conveyed message. Let's remove it in that case. If I misunderstood it, may be we can improve it?
|
|
||
| # Parse the output of subprocess, for example: | ||
| # Filesystem Type 1K-blocks Used Available Use% Mounted on | ||
| # rpool/USERDATA/foo_5ucog2 zfs 456499712 86956288 369543424 20% /home/foo |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very helpful, thanks. But this comment basically explains the block of code below, so make it be adjacent to that block, remove the empty newline below.
bmaptools/BmapHelpers.py
Outdated
| # Parse the output of subprocess, for example: | ||
| # Filesystem Type 1K-blocks Used Available Use% Mounted on | ||
| # rpool/USERDATA/foo_5ucog2 zfs 456499712 86956288 369543424 20% /home/foo | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This newline should go I think. Not big deal, but improves the readability.
bmaptools/BmapHelpers.py
Outdated
|
|
||
| if not ftype: | ||
| raise Error("no file system type was found: %s" % stderr) | ||
| raise Error("no file system type was found for '%s': %s" % (path, stderr)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to include both stdout and stderr in the error message. Will be much more helpful in case this actually happen. How about something like this:
raise Error("failed to find file system type for path at '%s'.\nHere is the 'df -T' output.\nstdout:\n%s\nstderr:\n%s" % (path, stdout, stderr))
| """ | ||
| Returns whether hosts zfs configuration is compatible for our usage. | ||
| """ | ||
| """Return if hosts zfs configuration is compatible.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Naming: is_zfs_compatible sounds a bit like "is ZFS-compatible", as if we were checking if something is compatible with ZFS.
May be this name would be a tiny bit better: is_zfs_configuration_compatible() ? (or config for brevity).
Yes it is normal on larger Guthub projects, alot of them i have worked on insist on this method and only request a squash once all changes have been approved or have bots that auto-squash and commit them. |
361ae47 to
c61e762
Compare
|
Some more stylistic nitpicks, otherwise looks good to me, thanks. Please, just to make sure, do run it on non-ZFS to verify that there are no regressions. Also I assume you tested it on both "correctly" configured ZFS and "incorrectly" configured ZFS (correctly/incorrectly from bmap-tools point of view). Am I right in my assumption? Thank you! |
|
Hi Dell, would you mind to do those final polishing things and we can merge this one? |
Hi still here, sorry, had higher priority work this week to attend to. I should be able to take a look at these at the weekend. :) |
Signed-off-by: Dell Green <dell.green@ideaworks.co.uk>
c61e762 to
f1cd6ec
Compare
|
I spotted few minor things, but I think it is good enough. Could you please confirm that you did run pylint on the files you changed and addressed new complaints, and that you did test this patch on ZFS with and without the feature, as well as on non-ZFS. If yes, we can merge it. If not, would you please do those things? Thank you very much! |
so we should be good to merge :) |
|
Thank you for the contribution! |
This is my first draft implementation. It has been tested on python 2.7 and 3.4.
All tests have been ran on a zfs file system (with and without fix applied) and an ext4 system.