# One of the reasons for crashing: trying to access invalid memory

- **accessing invalid memory** means that the process tried to access a portion of the system's memory that wasn't assigned to it
    - programming errors can lead to a process trying to read or write a memory address outside of its valid range
        - the OS will raise an error such as a **segmentation fault** or a **general protection fault**
        - typically happens in low-level languages such as C or C++
            - **pointers** (variables that store memory addresses in those languages) can point to invalid memory; the null pointer **0x0** is never a valid address and is often used to signal the end of a data structure
            - **Examples of what causes a segmentation fault (segfault)**: forgetting to initialize a variable, accessing a list outside of its range, etc.
            - a **debugger** is the easiest way to pinpoint the problem; for that, the program needs to be **compiled with debugging symbols**, and the same applies to any library whose functions are involved in the crash; Microsoft compilers store the symbols in separate **.PDB** files
        - **Tools**: Valgrind (macOS, Linux), Dr. Memory (Windows, Linux)
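
As a rough illustration of an invalid memory access, the sketch below uses Python's `ctypes` to read from address 0 (a null pointer). The read runs in a child process so that the crash does not take down the script itself. The helper name `run_null_read` is hypothetical, and the exact exit status depends on the platform: on Linux it is usually the negative number of the `SIGSEGV` signal, while on Windows `ctypes` typically reports the access violation as an `OSError`.

```python
import subprocess
import sys

# Code executed in the child: read a C string starting at address 0 (a null pointer).
CHILD_CODE = "import ctypes; ctypes.string_at(0)"


def run_null_read():
    """Run the null-pointer read in a separate interpreter and return the result."""
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", CHILD_CODE],
        capture_output=True,
        text=True,
    )


result = run_null_read()
# A non-zero return code means the child did not exit cleanly.
print("return code:", result.returncode)
# With faulthandler enabled, stderr usually contains a low-level traceback.
print(result.stderr.strip())
```

The `-X faulthandler` option asks the child interpreter to dump a traceback when it receives a fatal signal, which plays a similar role to a backtrace taken from a core file.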
            
            
#### Undefined behaviour: code that does something that is not valid in the programming language

The actual outcome can differ depending on:

- the system (Linux vs. Windows)
- the versions of the installed libraries, etc.
    
- **printf debugging** - the name comes from the `printf` function of the C programming language, but the technique of printing diagnostic information can be applied everywhere (for example with `echo` in Bash or `print` in Python)
- **logging module** - lets us choose how much detail is emitted, so that debug messages show up while we investigate a failure and stay hidden otherwise
- **core files** - store all the information related to the crash so that we, or someone else, can debug what's going on
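
A minimal sketch of how the `logging` module can take the place of scattered print statements. The function `parse_price` and the logger name `price_parser` are hypothetical, chosen only for illustration; the records are written to an in-memory stream so the result is easy to inspect.

```python
import io
import logging

# Send log records to an in-memory stream so the output is easy to inspect;
# a real script would usually log to stderr or to a file instead.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(message)s"))

logger = logging.getLogger("price_parser")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False  # keep records out of the root logger


def parse_price(text):
    """Convert a price string to a float, logging what happens along the way."""
    logger.debug("parsing %r", text)
    try:
        return float(text)
    except ValueError:
        logger.error("could not parse %r", text)
        return None


parse_price("19.99")
parse_price("n/a")
print(stream.getvalue())
```

Changing the level to `logging.WARNING` would hide the debug lines without removing them from the code, which is the main advantage over plain print statements.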


### Debugging C

```bash

# execute the buggy program (it crashes, but no core file is written yet)
./example
# ulimit -c unlimited removes the size limit on core files for this shell,
# so the next crash stores its information in a core file
ulimit -c unlimited
./example
# check the core file that was generated by the second run
ls -l core
# pass the core file and the executable to gdb, so it knows which program crashed;
# gdb loads the core dump and stops where the failure was recorded
gdb -c core example
# inside gdb: backtrace shows a summary of the function calls that led to the point of failure
(gdb) backtrace
# inside gdb: list shows the source lines around the line being reviewed in the backtrace
(gdb) list
```

### Debugging python

```bash
# run the crashing program
./update_products.py new_products.csv
# run it again under the debugger
pdb3 update_products.py new_products.csv
# 'next' runs the instructions one line at a time, but this script is too long to step through that way

# 'continue' executes the program until it finishes or crashes
(Pdb) continue
# after the crash, inspect the row that was being processed
(Pdb) print(row)

# the output contains \ufeff: the Byte Order Mark (BOM), used to tell little endian from big endian in UTF-16
# fix: open the file with encoding 'utf-8-sig', which strips the BOM:
#   with open(options.filename, 'r', encoding='utf-8-sig')
./update_products.py new_products.csv
# success
```
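
The effect of the BOM on CSV parsing can be reproduced in a few lines. This is a sketch under assumptions: the file contents and the helper `first_row` are made up for illustration and are not taken from `update_products.py`.

```python
import csv
import os
import tempfile

# Write a small CSV that starts with a UTF-8 byte order mark, as some
# spreadsheet tools do when exporting.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(b"\xef\xbb\xbfname,price\nwidget,9.99\n")
    path = tmp.name


def first_row(encoding):
    """Read the first data row of the CSV as a dict using the given encoding."""
    with open(path, "r", encoding=encoding, newline="") as f:
        return next(csv.DictReader(f))


plain = first_row("utf-8")      # the BOM stays attached to the first column name
clean = first_row("utf-8-sig")  # the BOM is stripped while decoding

print(list(plain))
print(list(clean))

os.remove(path)  # clean up the temporary file
```

With plain `utf-8`, the first key is `'\ufeffname'` rather than `'name'`, so a lookup such as `row['name']` fails with a `KeyError`. With `utf-8-sig` the keys come out as expected.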

### Quick summary:
- **breakpoints** let the code run until a certain line of code is reached, then pause execution
  - **watchpoints** pause execution when a variable or expression changes
- common causes of a *segmentation fault*:
   - reading past the end of an array
   - **stack overflow** - a run-time bug where a program attempts to use more space than is available on the run-time stack, which typically results in a program crash
   - **wild pointers** - uninitialized pointers, so called because they point to some arbitrary memory location and may cause a program to crash or behave badly; a pointer `p` that points to a known variable is not a wild pointer
- **off-by-one error** - a common error when iterating through arrays or other collections, usually caused by forgetting that indexing starts at 0 (for example, writing `i <= 50` where `i < 50` is needed)
- **printf debugging** - a very common method of debugging is to add print statements to our code that display information, such as contents of variables, custom error statements, or return values of functions
- **core files**: when a process crashes, the operating system may generate a core file (or core dump file) that records an image and the status of the process in memory; the developer can use it later to determine the cause of the crash
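
The off-by-one error from the summary can be sketched in Python as follows. The function names `sum_buggy` and `sum_fixed` are hypothetical. Note one difference from C: Python checks list bounds, so the extra index raises an `IndexError` instead of reading invalid memory and possibly causing a segmentation fault.

```python
values = [10, 20, 30]


def sum_buggy(items):
    """Off-by-one: range(len(items) + 1) yields one index too many."""
    total = 0
    for i in range(len(items) + 1):
        total += items[i]
    return total


def sum_fixed(items):
    """Correct bounds: indices 0 through len(items) - 1."""
    total = 0
    for i in range(len(items)):
        total += items[i]
    return total


try:
    sum_buggy(values)
except IndexError as err:
    print("buggy version failed:", err)

print("fixed version:", sum_fixed(values))
```

In everyday Python, iterating directly with `for item in items` avoids index arithmetic altogether, which removes this class of bug.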

# Complex Systems

If the system crashes after a new update is rolled out:

 - the easy way is to do a **rollback** (if the system makes that easy), which either confirms the update as the cause or rules it out
 - communicate with everyone involved
 - document what we are doing: the operations performed
 - contact the Incident Commander or Incident Controller (if the problem is big)
    
Team: 

 - **Incident Controller**: in charge of delegating the different tasks to team members.
 - **Communications lead**: provides timely updates on the incident and answers questions from users.
 
Effective Postmortems:
 - postmortems = documents that describe the details of incidents to help us learn from our mistakes; they should not name or blame the person(s) who caused the issue -> main idea: what we can do better
 

### Fixing small problems in Practice



# Extra read:

#### 1. Info about [Byte Order Mark (BOM)](https://stackoverflow.com/questions/17912307/u-ufeff-in-python-string/17912811)
#### 2. Check out the following links for more information:

- https://realpython.com/python-concurrency/
- https://hackernoon.com/threaded-asynchronous-magic-and-how-to-wield-it-bba9ed602c32
- https://stackoverflow.com/questions/33047452/definitive-list-of-common-reasons-for-segmentation-faults
- https://sites.google.com/a/case.edu/hpcc/home/important-notes-for-new-users/debugging-segmentation-faults