# Debugging Hard Problems
by Alex Gaynor

[watch the talk here](https://www.youtube.com/watch?v=ij99SGGEX34)

[slides are here](https://speakerdeck.com/alex)


## What is debugging?
- When does this bug occur?
    - when I do x, my server crashes
- Why does it happen?
- How do we fix it?



Hard problem: the stuff I normally do isn't working

Not a hard problem: a test failing

## Things that lead to hard problems
 - timing or order dependent
 - crosses module boundaries
     - imprecise API details
 - threads
 - independently "safe" failures conspiring
     - a big bug made of many small bugs

## Ground rules
- Everything is in scope
    - the os
    - the stdlib
    - ... stuff you don't normally think about
    - __but__ if the bug really is in one of these, chances are you're not the first person to find it
- Read __all__ of the source
    - documentation is good, but can be out of sync with the code
    - read your source code
    - read your dependencies' source code
- Trust nothing
    - verify your assumptions
        - e.g. don't assume your file is actually closed after you call `close()`
            - maybe someone monkey-patched it, so step into the call in pdb instead of skipping over it
        - don't be arrogant
- Write down what you do while debugging
    - here's what I tried already
        - here's what happened
    - commit experiments to branches
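
One way to act on "trust nothing" for the `close()` example is to ask the operating system whether the descriptor is really gone. This is a minimal sketch, not a method from the talk. `verify_close` and `LeakyFile` are hypothetical names, and the check assumes no other thread opens a file between the `close()` and the `os.fstat()` call (otherwise the descriptor number could be reused).

```python
import os


def verify_close(file_obj):
    """Close file_obj, then check that the OS agrees the descriptor is gone."""
    fd = file_obj.fileno()   # remember the descriptor before closing
    file_obj.close()
    if not file_obj.closed:  # Python-level flag
        return False
    try:
        os.fstat(fd)         # OS-level check on the old descriptor
    except OSError:
        return True          # descriptor is invalid: really closed
    return False             # descriptor still open: close() did nothing


class LeakyFile:
    """Hypothetical wrapper whose close() was patched into a no-op."""

    closed = True  # claims to be closed regardless of the real state

    def __init__(self, real_file):
        self._real_file = real_file

    def fileno(self):
        return self._real_file.fileno()

    def close(self):
        pass  # the monkey patch: never closes the underlying file
```

With an ordinary file object `verify_close` returns `True`. With `LeakyFile` wrapped around an open file it returns `False`, even though the object reports itself as closed.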

## Tools to use
- Your debugger!
- OS tracing (strace, etc.) to see the system calls your program makes
    - lsof
    - netstat
    - htop
    - iotop
    - /proc
    - osquery (released by Facebook): exposes what your system is doing as a database you can query
- An editor you like reading code in
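
As a small example of what `/proc` offers, the sketch below lists a process's open file descriptors, which is a subset of what `lsof` reports. It assumes a Linux system with `/proc` mounted, and `open_file_descriptors` is a hypothetical helper name.

```python
import os


def open_file_descriptors(pid="self"):
    """Map each open descriptor of a process to its target (Linux only)."""
    fd_dir = f"/proc/{pid}/fd"
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            # Each entry is a symlink to the file, socket, or pipe behind the fd.
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink().
            continue
    return targets
```

Calling `open_file_descriptors()` before and after a suspicious operation and comparing the two results is a quick way to spot a leaked file or socket. The listing can include the descriptor that `os.listdir()` itself briefly uses to read the directory.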

## Techniques
- Pair debugging
    - Talk to someone (or a rubber duck) about the problem
    - Helps spot misunderstandings or bad assumptions
- Minimization
    - Find the simplest example that can reproduce the problem
        - e.g. replace a call with a hard-coded result
    - be careful with intermittent bugs, though! If the bug stops appearing after a change, that alone doesn't prove you found the cause
- Proximate cause
    - Look for the recent changes
        - but don't overlook the environment and how it interacts with the older code
    - git bisect is a good tool here
- Don't get distracted
    - while taking a close look, you're likely to find other issues
        - write these down, and continue your original task!
- Don't debug in production
    - Get the bug to happen in your dev environment
        - If you can't get this to happen, that's a big hint: something about the production environment matters
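
The search that `git bisect` automates can be sketched as a binary search over history. This is a simplified illustration, not how git is implemented: it assumes a linear list of commits where the first is known good, the last is known bad, and every commit after the first bad one is also bad. `first_bad_commit` and `is_bad` are hypothetical names.

```python
def first_bad_commit(commits, is_bad):
    """Return the earliest commit for which is_bad(commit) is true.

    Assumes commits[0] is good, commits[-1] is bad, and that once a
    commit is bad every later commit is bad too.
    """
    good, bad = 0, len(commits) - 1  # indexes of a known-good and a known-bad commit
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # the regression is at mid or earlier
        else:
            good = mid  # the regression is after mid
    return commits[bad]
```

Here `is_bad` stands in for whatever reproduces the bug, such as running a test script against a checkout. Because each step halves the range, about ten checks are enough to search a thousand commits.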