via Udacity
## Student
- Don't be daunted by the large problem of "get rid of all bugs".
- Split into smaller problems, apply patterns.
- Test input -> Software under test -> Test outputs.
- Outputs OK? Yes, else debug.
- Test output, what do we do when we check "OK?"
- Running an experiment, but when it is OK we don't learn that much.
- If not OK, and it's a bug in SUT.
- Might be bug in acceptability test.
- Might be bug in specification.
- Might be bug in OS, compiler, libraries, hardware.
- Else...don't know!
- All these answers help us create software in the future.
- Orbiter crashed.
- Miscommunication between Lockheed Martin and NASA.
- NASA expected metric, m/s, but Lockheed Martin programmed in english, ft/s.
- Not a bug in SUT.
- Software did what it was specified to do.
- Not a bug in acceptability test.
- If we argue this then all bugs are acceptability test bugs because now we're saying acceptability tests must catch all bugs.
- Rather want to argue the bug isn't here; the software was acceptable as per specifications.
- Bug in specification.
- Miscommunication between engineers.
- Not a bug in OS, compiler, libraries, hardware.
- No evidence for this.
- FIFO order.
- Statically sized and allocated.
- Operations.
- enqueue.
- Return True if it fits, False if not.
- dequeue.
- Return value if present, None if not.
- enqueue.
- Python implementation
- For performance use
array.array
, statically typed, 10 times faster than list. array.array('i'), range(size_max))
- Tracks
head
,tail
,size
. - Never actually delete. On dequeue just increment head pointer.
- On enqueue increment tail pointer.
- For performance use
`` q = Queue(2) r1 = enqueue(6) r2 = enqueue(7) r3 = enqueue(8) r4 = dequeue() r5 = dequeue() r6 = dequeue()
(r1, r2, r3, r4, r5, r6) = (True, True, False, 6, 7, None) ``
- How useful is any given test case?
- How to make useful test cases?
enqueue(7) x = dequeue() if x == 7: print "success!" else: print "error!"
- If this test case passes:
- Our code passes this test case.
- Our code doesn't necessarily pass any test case where we replace 7 with a different integer.
- e.g. a massive integer that can't fit in memory.
- e.g. storing shorts, but try to put 2^32 - 1.
- Our code does pass many test cases where we replace 7 with a different integer.
- Replace 7 with any reasonably sized integer.
- How do we write tests that represent a large space of inputs?
- Want to argue that a single test case is representative of a whole class of executions types for the software under test.
- Mapping single point in the input space to single point in output space.
- Want an intuition that we've mapped many inputs to many outputs.
- Clearly large integers different from smaller ones.
- Maybe Python garbage collector bug such that large integer gets garbage collected, to sleeping before dequeuing it would fail.
- Maybe Python runtime has bug with truncating large integers, so it wouldn't be the same when we dequeue it.
- But note the above arguments are about the surrounding system and not the software under test.
- Hence we can make arguments that increase test coverage but must be very careful about assumptions we make.
- Test basic assumptions in every test, duplicate tests.
- Be introspective, test internal pointers.
- Test1: basic enqueue and dequeue.
- Test2
- Queue size 2, enqueue 3 times and check
q.tail != 0
.
- Queue size 2, enqueue 3 times and check
- Test3
- Queue size 1
- dequeue when empty
- enqueue then dequeue.
- Check value correct and q.head != 0.
- Write clean code.
- Refactor code whenever necessary.
- For any module, be able to - clearly describe what it does. - clearly describe how it interacts with other code.
- No extra threads.
- No swamp of global variables. - Implicit inputs to all functions.
- No pointer soup.
- Modules should have unit tests.
- When applicable, add fault injection. - Inject bad inputs. - Deliberately malform your code. - Check your assertions trigger.
- Assertions, assertions, assertions.
-
Assertion: executable check for a property that must be true.
def sqrt(arg): … compute … assert result >= 0 return result
-
We know that by definition result must be 0 or bigger.
-
Rules 1. Assertions are not for error handling.
- e.g. want an exception to check arg >= 0, not an assertion.
- Asserting our sanity, not behaviour of others. 2. No side effects.
- e.g. assert calls a function that modifies a global variable. 3. No silly assertions.
- e.g. assert (1+1) = 2
- check non-trivial properties that imply a fault with our logic.
-
For data structures often write a checkRep. Validate invariants that must be true.
-
There could be deeply buried problems in our own code.
-
When they occur they affect others, infecting them.
-
Want to fail fast, and checkRep as tight as possible and catch bugs before they affect others.
-
Want to assert
self.head
,self.tail
, andself.size
are consistent.def checkRep(self): assert self.size >= 0 and self.size <= self.max if self.tail > self.head: assert (self.tail - self.head) == self.size if self.tail < self.head: assert (self.head - self.tail) == (self.max - self.size) if self.tail == self.head: assert (self.size == 0) or (self.size == self.max)
#### 1.13: Why assertions
- Make code self-checking, leading to more effective testing.
- Make code fail early, closer to bug.
- Assign blame.
- Document and actively check: - Assumptions. - Preconditions. - Postconditions. - Invariants.
- Assertions are used in GCC and LLVM.
- Do you use them in production?
- Advantages of disabling.
- Code runs faster.
- Code keeps going. (Always good?) - Disadvantages of disabling.
- Code relies on side-effects of assertions (whoops!)
- Even in production, may be better to fail early. - Generally worth even the run-time cost of leaving assertions enabled in production. - At NASA, left assertions on except when lander was landing - only have one shot!
- SUT provides some service over some API.
- Want to test API. e.g. sqrt.
- Just writing and test cases help refine specifications.
- Domain: set of inputs.
- Range: set of outputs.
- e.g. sqrt in Python - Domain: all floating point number (F) - Range: positive floating point, unioned with exception. - Ideally only want domain to be positive floating point, but we accept all and throw exceptions.
- Want to test over full domain, even if it results in exception.
- Often domain specification is implicit; too many exceptions make code unwieldy.
- For operating systems especially want to test over the full domain. - Must be strong against any kind of bad input.
- crashme. Generate garbage, then jump into the middle and ask the kernel to execute it.
- If kernel dies then that's a bug.
- Interfaces that span trust boundaries are special and must be tested on the full range of representable values.
- Browsers offer APIs (e.g. GUI), but use a lot of APIs (network, cookies onto disk).
- If e.g. disk out of space and we try to store a cookie don't want to crash the browser.
- But how to test this? Difficult to mimic OS API conditions, e.g. out of disk space.
- Think of e.g. UNIX call to
read()
. - Read can be asked to read 0 bytes. - read() can return -1, for at least nine different reasons. - There is no easy or magical answer.
-
Use API that are predictable and return easy to understand error codes and behaviour. - C
read()
vs. Pythonread()
. -
So call a stub function.
file = open("/tmp/foo", 'w')
file = my_open("/tmp/foo", 'w')
-
Fault injection: alter abstract function calls to other API.
-
Faults injected into a SUT should be faults that we want our code to be robust to. - Don't need to be all possible faults, not feasible.
- Read/request cycle of API isn't the only problem.
- What is SUT cares about timing of API usage?
- e.g. double click vs. single click.
- Too fast vs. too slow is simple to think about.
- Race condition: different rates of execution of different parts of program cause problem. - Typing slowly, Therac 25 is fine. Typical for beginners. - Typing too fast, trigger race conditions. Typical for experts.
- Do we need to care about timing? - For browsers and OS kernels, yes. - For sqrt, hopefully not!
- Care about timing: - Hardware interfaces. - Network interfaces. - Multi-threaded.
- Not testing APIs the SUT provides.
- Not testing APIs the SUT uses.
- Examples.
- Context switches. How multiple threads execute in what order. Timing is under the control of the OS.
- Testing very reliable network switches.
- Open up a switch and run a key over electrical contacts.
- Testing a previously inaccessible part of domain.
- White box: tester uses detailed knowledge about internals.
- Black box: Don't use internal knowledge, just behavioural knowledge.
- Unit testing: test small units of code. - Typically by developer. - Can be white or black box. - No hypothesis about behaviour, so test over full range of domain. - Mock objects: pretend to be other modules.
- Integration testing: testing already-tested modules in combination.
- Very hard, even with tight specification.
- System testing: does the system, as a whole, meet specifications? - Internal knowledge may not be helpful. - Don't really care about internals, just checking if it meets goals. - May not care about all possible use cases, just important use cases.
- Differential testing - Deliver same input to two different versions of SUT.
- Stress testing - Push limits of domain, its timing, its sizes, number of API calls, etc. - Testing reliability.
- Random testing
- Exploring full domain space.
- e.g. crashme.
- Testing and development are different. - Developer: "I want this code to succeed." - Tester: "I want this code to fail." - Double think!
- Learn to test creatively.
- Don't ignore weird stuff.
- Fun! Profit!
- Lessons learned.
1. Test maximums and minimums of interfaces.
- If can have a Queue of size x, what about x = 2 ** 16?
- If can hold any integer, what about holding 2 ** 32? 2. Methods that should be idempotent might not be.
- queue.empty() actually dequeues!
- Write tests that check that query methods are idempotent. 3. Test interfaces strictly.
- What do they promise to return when?
- False != None. 4. Test asssumptions.
- If a queue claims to have a size, how do you know? Fill it up then dequeue everything!