# Validation, Exceptions, and Error Handling
*aka Developing Robust Code*



So far, we have followed the "happy" path through our code, where everything just works and we assume that our input values are correct.  If something went wrong, we either re-ran our code (e.g., after a bad input value from the user) or made a code fix and then re-ran the code.

Software needs to be reliable and robust. That means that it needs to prevent errors when possible, detect situations when errors do occur, and then recover from errors as appropriate.  

Throughout these notebooks, we have seen many errors and their associated messages - sometimes caused intentionally, sometimes accidentally, and sometimes through incorrect logic (semantic errors).

Now we need to make a few decisions:
1. How do we want to validate input?
2. How do we want to prevent errors?
3. How do we want to react to and recover from errors?

There are no clear, fixed solutions for these questions.  As with many things in system design, the answers will depend upon the context.

For scripts that we write for ourselves, ignoring errors may be fine in some circumstances.  However, for code or results that will be used by others, we do need to ensure our code performs in a robust manner.  

Input validation depends upon the source of the data, the potential harm of using the data unchecked, and the ability of the exception handling to detect and handle the problem.  Input validation, while often annoying and time-consuming, is the best way to produce robust code that has a minimal number of security flaws.

<div style="border: 3px solid black;padding: 10px; border-radius: 10px;">
    <b>Software Security Flaws</b><br>
    The <a href="https://owasp.org">Open Web Application Security Project</a> has run an awareness project over the past 
    20 years to identify the top security issues facing developers in web applications.  Looking through the
    <a href='https://web.archive.org/web/20220511193851/https://www.hahwul.com/cullinan/history-of-owasp-top-10/'>
        history</a>
    of these categories, many are directly related to the lack of input validation or sanitization (sanitization is
    the process of removing illegal characters from input or replacing potential dangerous character sequences
    with safe ones): Buffer Overflow, Cross Site Request Forgery (CSRF), Cross Site Scripting (XSS), 
    Injection, Injection Flaws, Server-Side Request Forgery, Unvalidated Input, and Unvalidated Redirects.
    Most of these now have been put into the "injection" category.
    <p> Two potentially dangerous built-in functions are <code>eval()</code> and <code>exec()</code>.
        <code>eval()</code> evaluates a string, assuming it is an expression. Remember - a function call is an
        expression! <code>exec()</code> executes the contents of the string as if it represents one or 
        more Python statements.  While legitimate use cases exist for both of these functions, extreme care
        must be taken to ensure any string values passed to these functions are safe to execute.
    <p>As an example, create a code cell and run the following:
    <pre>eval('exec("import os; print(os.listdir(\'.\'))")')
    </pre>
    Listing a directory's contents seems innocuous, but it could give valuable information to an attacker.  And if
    someone could execute that code, they more than likely could execute far more malicious code.
</div>
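When the goal is only to turn text into a plain Python value (a number, a list, a dictionary, and so on), the standard library's `ast.literal_eval()` is one safer alternative to `eval()`: it accepts literals and rejects function calls. The following is a minimal sketch under that assumption; the helper name `is_plain_literal` is hypothetical and only for illustration.

```python
import ast

def is_plain_literal(text):
    """Return True if text can be parsed by ast.literal_eval (hypothetical helper)."""
    try:
        ast.literal_eval(text)
        return True
    except (ValueError, SyntaxError):
        # literal_eval refuses anything that is not a literal, e.g. a function call
        return False

print(is_plain_literal("[1, 2, 3]"))
print(is_plain_literal("__import__('os').listdir('.')"))
```

This does not make arbitrary input safe in general; it only narrows what a string is allowed to express.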

As we detect errors, we also need to determine who is responsible for recovering from them.  Is recovery performed within the current routine / function, or is the error passed back up the call stack?  How is the user informed?  How does this differ among command-line tools, local GUIs, and web applications?

## Revisiting User Input
In the code snippet below from the [Iteration Notebook](11-Iteration.ipynb), the user is asked to enter grades until they signal that they are done by entering a negative number.  We had already added some error checking to see if the user entered at least one grade before calculating the average - this prevented a division by zero.  However, what occurs if they enter a value that's not an integer?  Let's try ...

In [None]:
total = 0
num_entries = 0

while True:
    grade = int(input("Enter a grade: "))
    if grade < 0:
        break
    total += grade
    num_entries += 1

if num_entries > 0:
    print("Average:",total/num_entries)
else:
    print("no grades entered")

Whether you entered text such as `abc` or a floating-point value such as `3.5`, you should have received a `ValueError`, which occurs when Python cannot convert the string returned by `input()` into an integer.  

Another error that could have occurred, if we were running this script from a shell session, is `EOFError`, raised when the input stream is closed (e.g., by the user typing ctrl+d). You may need to run this code from the command line rather than a Jupyter notebook to receive this error.

First let's look at different possibilities to validate that a string actually does represent an integer.

One possibility is to create some custom logic to ensure that each character in the string is a valid digit between 0 and 9. We would also need to allow for the leading character being a negative sign. If we chose this approach, we would want to create a function so that other parts of our code (or even other programs) could reuse this logic.
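As a rough sketch of that custom logic, the function below checks each character by hand. The name `looks_like_integer` is hypothetical, and the function deliberately accepts only the ASCII digits 0 through 9 with an optional leading `-`.

```python
def looks_like_integer(text):
    """Return True if text is an optional leading '-' followed by one or more digits 0-9."""
    # Drop a single leading minus sign, if present
    digits = text[1:] if text.startswith("-") else text
    # An empty string (or a lone '-') is not a number
    if not digits:
        return False
    # Every remaining character must be one of the ten ASCII digits
    return all(ch in "0123456789" for ch in digits)

for s in ["42", "-2", "0.124", "life42", "-", ""]:
    print("{:>10}".format(s), looks_like_integer(s))
```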

When possible, though, we should try to reuse code. Are there any suitable methods in Python's [string class](https://docs.python.org/3/library/stdtypes.html#string-methods)?  Looking at that documentation, several possibilities exist: `isdecimal()`, `isdigit()`, and `isnumeric()`.  However, reading the documentation reveals some interesting particulars for each of these:

`isdecimal()` returns `True` as long as the string is non-empty and every character belongs to the ['Unicode General Category Nd'](https://www.fileformat.info/info/unicode/category/Nd/list.htm).

In [None]:
test_strings = ['65536','00123','-2','0.124','life42','42life','\u00BD','\u1C43','\u2460']
for s in test_strings:
    print("{:>10}".format(s), s.isdecimal())

Overall, that does pretty well - although it can't handle a negative number.  We could still deal with that, though, by stripping the leading `-`.
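Stripping the leading `-` before calling `isdecimal()` might look like the sketch below. The function name `is_decimal_integer` is hypothetical; only one leading minus sign is removed, so a string such as `--2` is still rejected.

```python
def is_decimal_integer(text):
    """Return True if text is decimal digits with an optional single leading '-'."""
    if text.startswith("-"):
        text = text[1:]  # remove exactly one leading minus sign
    # isdecimal() is False for the empty string, so a lone '-' is rejected
    return text.isdecimal()

for s in ["65536", "-2", "--2", "-", "0.124", "\u1C43"]:
    print("{:>10}".format(s), is_decimal_integer(s))
```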

`isdigit()` still doesn't handle negative numbers.  It also accepts characters such as ① that Unicode classifies as digits but that are not decimal digits, so `int()` cannot convert them.

In [None]:
for s in test_strings:
    print("{:>10}".format(s), s.isdigit())

In [None]:
int('\u2460')

`isnumeric()` is even further away from the right solution. The function accepts just about anything that can represent a number - including fractions.  And, no, it does not handle negative numbers. 

In [None]:
for s in test_strings:
    print("{:>10}".format(s), s.isnumeric())
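One way to judge the three methods is to compare each against what `int()` itself accepts. The sketch below does that for a few of the test strings; the helper `int_accepts` is hypothetical and relies on `try`/`except`, which the Exceptions section of this notebook introduces.

```python
def int_accepts(text):
    """Return True if int(text) succeeds (hypothetical helper)."""
    try:
        int(text)
        return True
    except ValueError:
        return False

samples = ['65536', '-2', '\u00BD', '\u1C43', '\u2460']
print("{:>10} {:>9} {:>7} {:>9} {:>5}".format(
    "string", "isdecimal", "isdigit", "isnumeric", "int()"))
for s in samples:
    # !s converts each bool to 'True'/'False' before the alignment is applied
    print("{:>10} {!s:>9} {!s:>7} {!s:>9} {!s:>5}".format(
        s, s.isdecimal(), s.isdigit(), s.isnumeric(), int_accepts(s)))
```

None of the three methods lines up with `int()` on every row: `-2` is rejected by all three, while ① passes `isdigit()` and `isnumeric()` even though `int()` refuses it.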

Another possibility, covered in a later notebook, is to use a regular expression to see if the string matches a particular pattern. The regular expression `^[-+]?[0-9]+$` could be used as a pattern to check for integer values. Here's a brief description as to how the expression works:
- `^` means to match the start of the string
- `[-+]` is a character class consisting of either the minus `-` sign or the plus `+` sign.
- `?` makes that previous character class optional
- `[0-9]` means any character from 0 to 9.  As digits are consecutively defined in Unicode, this translates to 0,1,2,3,4,5,6,7,8,9
- `+` means that the previous character (here, any member of the character class `[0-9]`) must be present one or more times, i.e., at least once
- `$` means to match the end of the string

One downside to this expression is the inability to handle decimal numbers written in non-Arabic numerals, such as `'\u1C43'` from the test strings, which `int()` does accept.

From this regular expression, we can see that a string converted to an integer may have an arbitrary number of leading zeros. Integer literals in Python source code, however, cannot: a nonzero literal such as `0123` is a syntax error.

In [None]:
import re
for s in test_strings:
    print("{:>10}".format(s), bool(re.match(r"^[-+]?[0-9]+$",s)))
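For reuse, the pattern could be compiled once and wrapped in a function. The sketch below uses `re.fullmatch()`, which requires the whole string to match, so the `^` and `$` anchors are not needed. One subtle difference: `$` also matches just before a trailing newline, so `"42\n"` passes the `re.match` version above but not this one. The names `INTEGER_PATTERN` and `matches_integer` are hypothetical.

```python
import re

# Compiled once so that repeated checks do not re-parse the pattern
INTEGER_PATTERN = re.compile(r"[-+]?[0-9]+")

def matches_integer(text):
    """Return True if the entire string is an optionally signed run of digits 0-9."""
    return INTEGER_PATTERN.fullmatch(text) is not None

for s in ["65536", "00123", "-2", "+7", "0.124", "42life"]:
    print("{:>10}".format(s), matches_integer(s))
```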

The final solution to examine goes back to that pesky `ValueError`.  Fortunately, Python allows us to capture and handle these types of errors.

## Exceptions
An exception is an error that occurs as a program executes, causing the normal execution sequence to stop and control to pass to the nearest block designated to handle that type of error. By default, if no such handler is present, the Python interpreter will print a stack trace and stop the program.



### Handling Errors
Python provides the `try`/`except` statement to handle errors.  The `try` block is used to contain code in which an error may occur.  The `except` block provides the necessary error handling.  (You may still need more code outside of the `except` block to appropriately recover from the error.)

In [None]:
try:
    s = "hello"
    i = int(s)
except:
    print("'{:s}' is not a valid number.".format(s))

With no other details on the `except` line, that `except` block is a catch-all for any error type.

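An `except` clause can name the exception type to catch and bind the exception object to a variable: `except ExceptionType as variable_name`.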
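A small sketch of naming the exception type and binding the exception object to a variable follows. The function name `describe_conversion` is hypothetical; the bound name `err` holds the exception object, whose type and message can then be inspected.

```python
def describe_conversion(text):
    """Try int(text) and return a short message describing the outcome."""
    try:
        return "converted to {}".format(int(text))
    except ValueError as err:
        # err is the exception object; formatting it yields its message
        return "{}: {}".format(type(err).__name__, err)

print(describe_conversion("42"))
print(describe_conversion("hello"))
```

Because only `ValueError` is named, any other kind of error would still propagate to the caller.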

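A `try` statement can have multiple `except` clauses, each handling a different exception type.

A single `except` clause can also handle multiple exception types when they are listed in a tuple.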
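Returning to the grade-entry loop, the sketch below uses two `except` clauses: a `ValueError` re-prompts the user, while an `EOFError` ends the input. The function names and the `read` parameter are assumptions made so the loop can be exercised without a keyboard; `read` defaults to the built-in `input()`. A second helper shows the tuple form.

```python
def read_grades(read=input):
    """Collect integer grades until a negative number or the end of input."""
    grades = []
    while True:
        try:
            grade = int(read("Enter a grade: "))
        except ValueError:
            # Not an integer: tell the user and ask again
            print("Please enter a whole number.")
            continue
        except EOFError:
            # Input stream closed (e.g., ctrl+d): treat as finished
            break
        if grade < 0:
            break
        grades.append(grade)
    return grades

def to_float_or_none(value):
    """Return float(value), or None if value cannot be converted."""
    try:
        return float(value)
    except (TypeError, ValueError):  # one clause, two exception types
        return None

# Simulated user responses in place of typing at the keyboard
responses = iter(["90", "eighty", "85", "-1"])
print(read_grades(lambda prompt: next(responses)))
print(to_float_or_none("2.5"), to_float_or_none(None), to_float_or_none("abc"))
```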

The bound variable can then be queried for more details about the error.

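The optional `else` clause runs only when the `try` block completes without raising an exception.

The optional `finally` clause always runs, whether or not an exception occurred, which makes it a natural place for cleanup code.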
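To see which clauses run in which situation, the sketch below records each step in a list. The function name `safe_average` and the `steps` list are hypothetical and exist only to make the control flow visible.

```python
def safe_average(values):
    """Return (average, steps), where steps records which clauses ran."""
    steps = []
    average = None
    try:
        steps.append("try")
        average = sum(values) / len(values)
    except ZeroDivisionError:
        steps.append("except")   # runs only when the division fails
    else:
        steps.append("else")     # runs only when no exception occurred
    finally:
        steps.append("finally")  # runs in both cases
    return average, steps

print(safe_average([80, 90]))
print(safe_average([]))
```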

We can also raise exceptions ourselves with the `raise` statement.
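As an illustration of raising an exception, the sketch below rejects grades outside an assumed range of 0 to 100. Both the range and the function name `validate_grade` are assumptions for the example, not rules from the grade-entry code.

```python
def validate_grade(grade):
    """Return grade unchanged if it is within 0..100; otherwise raise ValueError."""
    if not 0 <= grade <= 100:
        # Signal the problem to the caller instead of silently continuing
        raise ValueError("grade must be between 0 and 100, got {}".format(grade))
    return grade

try:
    validate_grade(250)
except ValueError as err:
    print("Rejected:", err)
```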

We can create our own exception types (getting ahead of ourselves before we discuss creating custom classes).
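A custom exception is usually a small class that inherits from a built-in exception. In the sketch below, the class `InvalidGradeError`, its `grade` attribute, and the 0 to 100 range are all hypothetical. Because the class derives from `ValueError`, existing `except ValueError` handlers would still catch it.

```python
class InvalidGradeError(ValueError):
    """Raised when a grade is outside the accepted range (hypothetical example type)."""

    def __init__(self, grade):
        super().__init__("invalid grade: {}".format(grade))
        self.grade = grade  # keep the offending value for the handler

def check_grade(grade):
    """Return grade if it is within 0..100; otherwise raise InvalidGradeError."""
    if not 0 <= grade <= 100:
        raise InvalidGradeError(grade)
    return grade

try:
    check_grade(101)
except InvalidGradeError as err:
    print(err, "| offending value:", err.grade)
```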

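We need to read stack traces: they show the chain of calls that led to the error, with the exception type and message on the last line.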
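To practice reading a stack trace without stopping the program, the standard library's `traceback.format_exc()` returns the trace of the exception currently being handled as a string. The functions `parse` and `load` below are hypothetical and exist only to create a two-level call chain.

```python
import traceback

def parse(text):
    return int(text)      # the error originates here

def load(text):
    return parse(text)    # ... after being called from here

try:
    load("hello")
except ValueError:
    # Capture the same text the interpreter would have printed
    trace_text = traceback.format_exc()
    print(trace_text)
```

Reading from the bottom up is often quickest: the last line names the exception and its message, and the lines above it walk back through `parse` and `load` to the original call.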


Best practices on exceptions:
- add exception handling anywhere an exception might occur
- at the very highest level of the program, we may want one handler
- within a server processing loop (e.g., a web server), don't show error messages to the users
- log these errors
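The last two points might be combined as in the sketch below: a processing loop catches any exception, writes the details to a log with the standard `logging` module, and gives the user only a generic message. The logger name, `handle_request`, `process`, and the reply wording are all hypothetical.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("grades")  # hypothetical logger name

def handle_request(text):
    """Hypothetical request handler: convert text to an integer grade."""
    return int(text)

def process(requests):
    """Handle each request; log failures and reply with a generic message."""
    replies = []
    for text in requests:
        try:
            replies.append("OK: {}".format(handle_request(text)))
        except Exception:
            # Full details, including the stack trace, go to the log, not the user
            logger.exception("request failed: %r", text)
            replies.append("Sorry, something went wrong. Please try again.")
    return replies

print(process(["90", "oops"]))
```

Catching `Exception` this broadly is reasonable only at a top-level boundary like this loop, where one bad request should not stop the rest.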

In [None]:
error messages

One of the things we do need to do is to be kind to our user community ...


Launcher Error Cannot read properties of undefined (reading 'path')
Error Invalid response: 400 Bad Request

## Discussion ...

The check of `num_entries` is the right thing to have - it's a simple check and the corresponding try 

## Exercises
