New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error parsing kernel release number in ifcfg module #5087

Closed
mariospr opened this Issue Apr 15, 2016 · 12 comments

Comments

Projects
None yet
3 participants
@mariospr
Contributor

mariospr commented Apr 15, 2016

Summary

When I install ka-lite in my ARM device, I can't get it to start without crashing, due to a parsin problem in the ifcfg module, which is printing this error instead:

Running 'kalite start' as daemon (system service)

Stand by while the server loads its data...

[WARNING] [2016-04-15 15:45:35,777] kalite: Channel khan does not exist.
Traceback (most recent call last):
  File "/usr/share/kalite/kalitectl.py", line 925, in <module>
    port=arguments["--port"]
  File "/usr/share/kalite/kalitectl.py", line 527, in start
    addresses = get_ip_addresses(include_loopback=False)
  File "/usr/share/kalite/python-packages/fle_utils/internet/functions.py", line 94, in get_ip_addresses
    ips = [iface.get("inet") for iface in ifcfg.interfaces().values()]
  File "/usr/share/kalite/python-packages/ifcfg/__init__.py", line 62, in interfaces
    parser = get_parser()
  File "/usr/share/kalite/python-packages/ifcfg/__init__.py", line 42, in get_parser
    if float(kernel) < 3.3:
ValueError: invalid literal for float(): 3.10-trunk-meson8

Looking at the relevant code it seems to me that there are two issues:

  • The code assumes that the kernel's release number contains the minor revision, which is true in all the other environments I tried (e.g. it's '4.3.3-301.fc23.x86_64' on my Intel laptop), but not necessarily true everywhere (it's '3.10-trunk-meson8' in my ARM device).
  • The comparison to determine whether the kernel is old is broken, since for instance float('3.10') will evaluate to the 3.1 number, resulting in the wrong conclusion.

Branch or installer method

The crash is happening both in the stable version installable with pip install ka-lite-static and the one from the master branch on Linux machines, as long as the output of uname -r returns a release number string where the major revision number part is not a valid number (e.g. containing non-digits).

The wrong comparison against the 3.3 version seems to be wrong regardless of the underlying system.

How to reproduce

  1. Run kalite start on Linux where uname -r reports a non-integer major revision number string
  2. See how it crashes with the backtrace mentioned above

Alternatively replace this line with something full_kernel = "3.10-trunk-meson8" to replicate my environment and you should get the crash too.

mariospr added a commit to mariospr/ka-lite that referenced this issue Apr 15, 2016

Fixes learningequality#5087: Properly parse kernel's release number s…
…tring in ifcfg module

Do not assume that kernel release number strings come always in a 'x.y.z'
format and only use the first two digits (kernel's version and major revision),
sanitizing the second one to make sure that we parse its numeric value properly.

Also, compare the kernel's version and major revision separately, to avoid
jumping into the wrong conclusion if float('ver.major_rev') does not result
in the expected value (e.g. float('3.10') -> 3.1).
@MCGallaspy

This comment has been minimized.

Show comment
Hide comment
@MCGallaspy

MCGallaspy Apr 15, 2016

Contributor

Hi @mariospr thanks for the very detailed bug report!

Contributor

MCGallaspy commented Apr 15, 2016

Hi @mariospr thanks for the very detailed bug report!

@mariospr

This comment has been minimized.

Show comment
Hide comment
@mariospr

mariospr Apr 15, 2016

Contributor

No problem, happy to help! For the record, I've just closed the pull request and proposed the same change in the python-ifcfg module instead, following the recommendation from #5088 (comment).

Would rather keep this open while KA Lite is not actually fixed, though, just to keep track of it properly.

Contributor

mariospr commented Apr 15, 2016

No problem, happy to help! For the record, I've just closed the pull request and proposed the same change in the python-ifcfg module instead, following the recommendation from #5088 (comment).

Would rather keep this open while KA Lite is not actually fixed, though, just to keep track of it properly.

@MCGallaspy

This comment has been minimized.

Show comment
Hide comment
@MCGallaspy

MCGallaspy Apr 15, 2016

Contributor

Would rather keep this open while KA Lite is not actually fixed, though, just to keep track of it properly.

Agreed.

Contributor

MCGallaspy commented Apr 15, 2016

Would rather keep this open while KA Lite is not actually fixed, though, just to keep track of it properly.

Agreed.

@benjaoming benjaoming self-assigned this Apr 17, 2016

@benjaoming benjaoming added this to the 0.16.4 milestone Apr 17, 2016

@benjaoming

This comment has been minimized.

Show comment
Hide comment
@benjaoming

benjaoming Apr 17, 2016

Member

Thanks @mariospr! a new version of ifcfg has been released on pypi.

Would you be able to confirm that it works? I'm making a PR now to have it included in the next point release of KA Lite (0.16.4).

Member

benjaoming commented Apr 17, 2016

Thanks @mariospr! a new version of ifcfg has been released on pypi.

Would you be able to confirm that it works? I'm making a PR now to have it included in the next point release of KA Lite (0.16.4).

@mariospr

This comment has been minimized.

Show comment
Hide comment
@mariospr

mariospr Apr 18, 2016

Contributor

@benjaoming Not very rigorous (but hopefully good enough), I've did this to confirm that the fix works:

  1. Install the current (unpatched) version of ka-lite-static with sudo pip install ka-lite-static
  2. Run kalite startp -> Check that it crashes with the aforementioned backtrace
  3. Manually remove the ifconfig bundled module: sudo rm -rf /usr/share/kalite/python-packages/ifcfg
  4. Install patched ifcfg from pypi: sudo pip install ifcfg -> installs patched 0.9.3 version
  5. Run kalite start again -> This time the the daemon is started and web UI works

Additionally, I added a couple of prints in /var/local/lib/python2.7/dist-packages/ifcfg/__init__.py to be completely sure, just in cas, that it was that code what was being executed when the daemon gets started, which turned out to be true, so I think it's all good.

@MCGallaspy It looks like the #5090 Pull Request is all that is left from your suggestion in #5088 (comment), so I think this can probably be closed as soon as the new release is out and tested to work fine in a more rigorous way (which I volunteer for helping with)

Contributor

mariospr commented Apr 18, 2016

@benjaoming Not very rigorous (but hopefully good enough), I've did this to confirm that the fix works:

  1. Install the current (unpatched) version of ka-lite-static with sudo pip install ka-lite-static
  2. Run kalite startp -> Check that it crashes with the aforementioned backtrace
  3. Manually remove the ifconfig bundled module: sudo rm -rf /usr/share/kalite/python-packages/ifcfg
  4. Install patched ifcfg from pypi: sudo pip install ifcfg -> installs patched 0.9.3 version
  5. Run kalite start again -> This time the the daemon is started and web UI works

Additionally, I added a couple of prints in /var/local/lib/python2.7/dist-packages/ifcfg/__init__.py to be completely sure, just in cas, that it was that code what was being executed when the daemon gets started, which turned out to be true, so I think it's all good.

@MCGallaspy It looks like the #5090 Pull Request is all that is left from your suggestion in #5088 (comment), so I think this can probably be closed as soon as the new release is out and tested to work fine in a more rigorous way (which I volunteer for helping with)

@benjaoming

This comment has been minimized.

Show comment
Hide comment
@benjaoming

benjaoming Apr 18, 2016

Member

Thanks @mariospr ! I see that the PR also passes automated tests, so I'm quite confident it's a safe merge for a patch release.

Could you weigh in about how you see the severity of the issue? I think there's gonna be a discussion about whether or not to immediately release a patch version. This error has been around forever so the real question is if the sort of Linux Kernel version string is very common or perhaps has become so :)

Member

benjaoming commented Apr 18, 2016

Thanks @mariospr ! I see that the PR also passes automated tests, so I'm quite confident it's a safe merge for a patch release.

Could you weigh in about how you see the severity of the issue? I think there's gonna be a discussion about whether or not to immediately release a patch version. This error has been around forever so the real question is if the sort of Linux Kernel version string is very common or perhaps has become so :)

@mariospr

This comment has been minimized.

Show comment
Hide comment
@mariospr

mariospr Apr 18, 2016

Contributor

Thanks @mariospr ! I see that the PR also passes automated tests, so I'm quite confident it's a safe merge for a patch release.

Cool, thanks for your help and (extremely) timely response!

Could you weigh in about how you see the severity of the issue? I think there's gonna be a discussion about whether or not to immediately release a patch version. This error has been around forever so the real question is if the sort of Linux Kernel version string is very common or perhaps has become so :)

I'm not really a kernel expert so I can't tell how common is to have kernels reporting only the VERSION and MAJOR REVISION numbers as their release number string via the uname -r command. I did a quick check in a few other ARM machines that I have around (RPi2 and a NAS) and they both reported the MINOR REVISION number, so perhaps it's not that common after all. Actually, even in the machine where I found the issue, the uname -a command reports the full kernel version in a very detailed way:

uname -ar
Linux endless 3.10-trunk-meson8b #1 SMP Debian 3.10.33.20150928+dev610.b21aada-6bem1 (2015-12-01) armv7l armv7l armv7l GNU/Linux

However, I've checked this with one of our Kernel guys and he explained to me that "the minor version is something used by people in several non-standard ways" and so that it can't really be trusted (actually, it seems that ubuntu does a poor job reporting that number), so it's probably wise not to rely on it at all, as if it was always being reported as part of the kernel's release number (output of uname -r). This, apparently, is the reasoning behind some kernels (like ours) not even bothering to report it, and the source of the main problem I've observed here.

But on top of that, the patch also fixed a wrong use of the float() casting which would wrongly report any existing 3.x kernel >= 3.10 as older than 3.3, and that would be incorrect for every single platform, not just our particular ARM one, so I think it's worth including it anyway.

All of this being a long way of saying that... the issues fixed are important, but only if you either run ifcfg in a OS not reporting the kernel version (which will crash) or if you run it in OS with a 3.x kernel >= 3.10 (which will treat as < 3.3). Still, we (Endless) would particularly appreciate if it could make it to the next stable release, for obvious reasons :-).

Contributor

mariospr commented Apr 18, 2016

Thanks @mariospr ! I see that the PR also passes automated tests, so I'm quite confident it's a safe merge for a patch release.

Cool, thanks for your help and (extremely) timely response!

Could you weigh in about how you see the severity of the issue? I think there's gonna be a discussion about whether or not to immediately release a patch version. This error has been around forever so the real question is if the sort of Linux Kernel version string is very common or perhaps has become so :)

I'm not really a kernel expert so I can't tell how common is to have kernels reporting only the VERSION and MAJOR REVISION numbers as their release number string via the uname -r command. I did a quick check in a few other ARM machines that I have around (RPi2 and a NAS) and they both reported the MINOR REVISION number, so perhaps it's not that common after all. Actually, even in the machine where I found the issue, the uname -a command reports the full kernel version in a very detailed way:

uname -ar
Linux endless 3.10-trunk-meson8b #1 SMP Debian 3.10.33.20150928+dev610.b21aada-6bem1 (2015-12-01) armv7l armv7l armv7l GNU/Linux

However, I've checked this with one of our Kernel guys and he explained to me that "the minor version is something used by people in several non-standard ways" and so that it can't really be trusted (actually, it seems that ubuntu does a poor job reporting that number), so it's probably wise not to rely on it at all, as if it was always being reported as part of the kernel's release number (output of uname -r). This, apparently, is the reasoning behind some kernels (like ours) not even bothering to report it, and the source of the main problem I've observed here.

But on top of that, the patch also fixed a wrong use of the float() casting which would wrongly report any existing 3.x kernel >= 3.10 as older than 3.3, and that would be incorrect for every single platform, not just our particular ARM one, so I think it's worth including it anyway.

All of this being a long way of saying that... the issues fixed are important, but only if you either run ifcfg in a OS not reporting the kernel version (which will crash) or if you run it in OS with a 3.x kernel >= 3.10 (which will treat as < 3.3). Still, we (Endless) would particularly appreciate if it could make it to the next stable release, for obvious reasons :-).

@benjaoming

This comment has been minimized.

Show comment
Hide comment
@benjaoming

benjaoming Apr 18, 2016

Member

Cool, thanks for explaining it thoroughly! I think it's fair to aim for a 0.16.3 within the next ~1 week's time to accommodate the need! I'll let you know as soon as it's done, and this ticket should remain until we're resolved on when we get the fixed released.

Member

benjaoming commented Apr 18, 2016

Cool, thanks for explaining it thoroughly! I think it's fair to aim for a 0.16.3 within the next ~1 week's time to accommodate the need! I'll let you know as soon as it's done, and this ticket should remain until we're resolved on when we get the fixed released.

@MCGallaspy

This comment has been minimized.

Show comment
Hide comment
@MCGallaspy

MCGallaspy Apr 18, 2016

Contributor

Yeah, we'll have to discuss internally when to make the next patch release, but I don't anticipate any reason to delay it for very long.

Contributor

MCGallaspy commented Apr 18, 2016

Yeah, we'll have to discuss internally when to make the next patch release, but I don't anticipate any reason to delay it for very long.

@mariospr

This comment has been minimized.

Show comment
Hide comment
@mariospr

mariospr Apr 18, 2016

Contributor

Great! Happy to see my very first contribution to this project so quickly and efficiently dealt with :-). Let me know if you need anything from my side. Otherwise I'll wait until the release is done for testing it again in my (problematic) ARM hardware. Thanks!

Contributor

mariospr commented Apr 18, 2016

Great! Happy to see my very first contribution to this project so quickly and efficiently dealt with :-). Let me know if you need anything from my side. Otherwise I'll wait until the release is done for testing it again in my (problematic) ARM hardware. Thanks!

@benjaoming

This comment has been minimized.

Show comment
Hide comment
@benjaoming

benjaoming Apr 25, 2016

Member

Merged into 0.16.x and I am releasing it now in 0.16.3 -- therefore, closing this. Stay tuned for a new .deb in the PPA during today.

Member

benjaoming commented Apr 25, 2016

Merged into 0.16.x and I am releasing it now in 0.16.3 -- therefore, closing this. Stay tuned for a new .deb in the PPA during today.

@benjaoming benjaoming closed this Apr 25, 2016

@mariospr

This comment has been minimized.

Show comment
Hide comment
@mariospr

mariospr Apr 25, 2016

Contributor

Thanks a lot. JFTR, I'm confirming that I've uninstalled my previous installation of KA Lite in my arm device and re-installed again with pip and everything worked out of the box this time, as expected.

Contributor

mariospr commented Apr 25, 2016

Thanks a lot. JFTR, I'm confirming that I've uninstalled my previous installation of KA Lite in my arm device and re-installed again with pip and everything worked out of the box this time, as expected.

benjaoming added a commit to benjaoming/ka-lite that referenced this issue Jun 12, 2016

Merge tag '0.16.x' into develop
Final 0.16.3

 - Fixes learningequality#5087 related to certain Linux kernel version labels
 - Few docs changes

Conflicts:
	data/version.yml
	kalite/version.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment