The first half of this lesson advocates using Python's assert statement to do data validation. While this may be common (I don't know), it is not correct, and can potentially be very dangerous (explanation to follow).
By default if a Python program is run from the command line, assertions will be evaluated; however, there are two flags, -O and -OO, which run Python programs in optimisation mode and disable assertions:
-O : remove assert and __debug__-dependent statements; add .opt-1 before .pyc extension; also PYTHONOPTIMIZE=x
-OO : do -O changes and also discard docstrings; add .opt-2 before .pyc extension
If assert statements are used for data validation, this means that none of that validation will take place when the program is run in one of these modes, which, depending on the program, will result in surprising errors at best, and at worst, could be a security risk.
assert statements should be used as debugging aids, not to handle run-time errors.
A more suitable way of validating the data in the first example would be:
for num in numbers:
if num > 0:
total += 1
raise ValueError('Data should only contain positive values')
This data validation step will always run.
NB: I would be happy to work on, or contribute to, changing this lesson so that it reflects best practices, but as it is a larger change, though raising it in an issue first was the best approach. — Bailey
The text was updated successfully, but these errors were encountered:
changed the title
Episode 10 Defensive — Using the assert statement to validate data is bad practiceOct 3, 2020
We have similarly had people bring up the trouble with assertions in issues #622, #773 and propose to bring in exception handling instead. The maintainers agree that it makes sense to modify this part of the episode, and we would welcome a PR from you. Thank you, Bailey!
I've had a look through those issue threads and I don't see any current pull requests related to them, so I'll start on a new one, trying to keep the discussion that has gone before in mind.
As it is, I think the main consideration is whether to demonstrate the try...except(...else...finally) statements, or stick with simply raising an appropriate error type. I'll look into both to see how they fit with the rest of the lesson(s). Cheers!
I was also taking a look at this session and wasn't so keen to see the use of assert without any clarification that it should only be used in unreleased settings (testing mainly).
Similarly, I agree with the other comments that it would be useful to have an introduction to exception handling in place of some of the existing material - particularly since exceptions raised by built-in modules are introduced in the previous session, it would then be nice to see how to raise your own and, if there's enough time in the session, how to handle them too (easier to ask for forgiveness than permission, and all that :-)). Definitely aware that there's a balance between the material, duration, and level of the course though.
@baileythegreen, would be great to collaborate on a PR once you've got started, if you'd like!
@baileythegreen I do completely agree with all your comments and concerns about the use of assert for data validation, leading to bad coding practice. I thought exactly the same when I saw the episode, and so I landed on this issue :)
This is to say, very happy to collaborate with you on any PR that may be coming, if you think there is anything I can still help you with?