# 2.1 Printing and Debugging
If you already read through the beginner section, you've seen me use the `print()` function a lot. Printing is a really good way to get feedback from your program as the code executes. Sometimes the printed error message isn't enough to track down where an error is coming from, and other times there is no error message at all because the problem is a logic error. Generally speaking, logic errors are your worst enemy because they aren't something the Python interpreter can catch. Here's a common example: if you programmed a banking system and the function that applies interest to an account used the wrong equation, the program would run just fine but customers' account balances would be incorrect. So how would you track down an issue like that? The best answer is a whole lot of printing.

Now, as with most things in programming, there are a number of ways to add all these print statements to your code. I'll show a few ways that I commonly debug my programs and give a little blurb about when I would use each one. To be clear, though, there is no *wrong* way to debug a program: if it helps you find the error and fix it, then it's as good an option as any of these.

Before we get into the code, I think it's also important to discuss what you should be printing in these debug statements. Generally speaking, start by printing the values of important variables. It's also sometimes useful to print something like `inside for loop` to help keep track of where in the code the program is currently executing.


When debugging a program, a few things affect which method I choose for adding debug statements. If it's a large project, I often use the first example. It's a more permanent debugging solution that requires a bit more initial setup, but it's useful when problems pop up later because only the value of the `DEBUG` variable needs to change for the print statements to work. For small projects this is a bit overkill, because you'll likely fix the problem and never need to debug that part of the code again. Now let's get into some code, shall we?

In [1]:
DEBUG = True

if DEBUG:
    print('Hello World')
    
# this option is best for big projects where you might need to
# debug the same portion of your code multiple times over the course 
# of development. 

Hello World


In [2]:
print('Program running')

a = 5
# print('Value of a=%d' %(a))    # notice this doesn't print 

# for simple programs this is a better option. If you think you might
# need to come back to a print statement, you can comment it out by
# preceding it with a `#`, and when you're done you can find all the
# prints with your editor's search (usually [CTRL] + [F]) and delete them

Program running


It's hard to see the pros and cons of these two options from just the examples above, so I encourage you to play around with the next two examples to see how easy it is to enable and disable the print statements with each style. The more you work with code and design your own debugging systems, the more you'll understand why the first is better in some situations and the second is better in others.

In [8]:
DEBUG = True
def isUniqueChars(st): 
  
    # String length cannot be more than 
    # 256. 
    if len(st) > 256: 
        if DEBUG:
            print('String length larger than 256')
        return False
  
    # Initialize occurrences of all characters (this assumes every
    # character has a code below 256, i.e. plain or extended ASCII)
    char_set = [False] * 256
  
    # For every character, check if it exists 
    # in char_set 
    for i in range(0, len(st)): 
  
        # Find ASCII value and check if it 
        # exists in set. 
        val = ord(st[i]) 
        if DEBUG:
            print('val=%d' %(val))
        if char_set[val]: 
            if DEBUG:
                print('Found duplicate character')
            return False
  
        char_set[val] = True
  
    return True

# comment/uncomment these (or make your own) to see how the debug output changes
isUniqueChars('dasndlkjsankljdsanl')
#isUniqueChars('abcde')
#isUniqueChars('change debug to false if you dont want to see those pesky print statements when you run this')

val=100
val=97
val=115
val=110
val=100
Found duplicate character


False

In [9]:
def isUniqueChars(st): 
  
    # String length cannot be more than 
    # 256. 
    if len(st) > 256: 
        print('String length larger than 256')
        return False
  
    # Initialize occurrences of all characters (this assumes every
    # character has a code below 256, i.e. plain or extended ASCII)
    char_set = [False] * 256
  
    # For every character, check if it exists 
    # in char_set 
    for i in range(0, len(st)): 
  
        # Find ASCII value and check if it 
        # exists in set. 
        val = ord(st[i]) 
        print('val=%d' %(val))
        if char_set[val]: 
            print('Found duplicate character')
            return False
  
        char_set[val] = True
  
    return True

# comment/uncomment these (or make your own) to see how the debug output changes
isUniqueChars('dasndlkjsankljdsanl')
#isUniqueChars('abcde')
#isUniqueChars('change debug to false if you dont want to see those pesky print statements when you run this')

val=100
val=97
val=115
val=110
val=100
Found duplicate character


False
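Going back to the question of what to print, here is a small sketch of two tricks that can make debug prints less tedious. The helper name `debug()` is made up for this example (it isn't a built-in), and the `f'{a=}'` form assumes Python 3.8 or newer. The helper checks the `DEBUG` flag in one place, so you don't need an `if DEBUG:` around every print.

```python
DEBUG = True

def debug(*args):
    """Print the arguments, prefixed with 'DEBUG:', only when DEBUG is True."""
    if DEBUG:
        print('DEBUG:', *args)

a = 5
debug('value of a =', a)    # prints: DEBUG: value of a = 5

# Python 3.8+: an '=' inside the f-string braces shows the variable name and its value
debug(f'{a=}')              # prints: DEBUG: a=5
name = 'checking'
debug(f'{name=}')           # prints: DEBUG: name='checking' (strings are shown with quotes)

# Flip the flag once and every debug() call goes quiet
DEBUG = False
debug('this message is never shown')
```

Because `debug()` looks up `DEBUG` each time it is called, changing the flag anywhere in your program immediately turns all of the debug output on or off.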

# 2.2 Reading and Writing Files
Being able to read and write files with Python code will certainly improve your workflow when you're working with real data. In addition to the basic methods I've provided here, there are many, many libraries with functions that read data in and format it into useful structures, as well as output Python data structures into various file formats. In the basics section we worked a little with pandas dataframes, which have a built-in function (`to_csv()`) for outputting to a CSV file. In case you are unfamiliar, CSV files can be opened with Excel and preserve the table-like structure of the dataframe.

We will start by reading a file into your program. To do this you need to create a file object (we will talk more about objects in the advanced section of this guide). To create a file object, follow the format below:
```python
my_file = open('filepath', 'mode')
```
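The `filepath` parameter tells Python where to look for the file you are trying to open. A relative path like the ones below is looked up starting from the current working directory, which is typically the folder you launched Python or Jupyter from. For that reason it's simplest to keep your data files in the same folder as your Python script or in a subdirectory of it, and to run the script from that folder. The following examples show how you would open the specified file with several different directory structures to give you an idea.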

```text
C:/
|
├─── My_Project/
        |
        ├─── mycode.py
        |
        ├─── data.txt
```

```python
my_data = open('data.txt', 'r')
```

```text
C:/
|
├─── My_Project/
        |
        ├─── mycode.py
        |
        ├─── Data/
              |
              ├─── data.txt
```

```python
my_data = open('Data/data.txt', 'r')
```

```text
C:/
|
├─── My_Project/
        |
        ├─── mycode.py
        |
        ├─── Data/
              |
              ├─── Text/
              |      |
              |      ├─── Group1/
              |      |      |
              |      |      ├─── Samples/
              |      |             |
              |      |             ├─── sample_data.txt
              |      |
              |      ├─── Group2/
              |      |      |
              |      |      ├─── group_data.txt
              |      |
              |      ├─── Group3/
              |             |
              |             ├─── g3data.txt
              |
              ├─── CSV/
                    |
                    ├─── Samples/
                    |       |
                    |       ├─── csv_sample.csv
                    |
                    ├─── New/
                    |     |
                    |     ├─── my_data.csv
                    |
                    ├─── Old/
                          |
                          ├─── old_data.csv
```

```python
group_1_sample = open('Data/Text/Group1/Samples/sample_data.txt', 'r')
group_2_data = open('Data/Text/Group2/group_data.txt', 'r')
group_3_data = open('Data/Text/Group3/g3data.txt', 'r')
csv_sample = open('Data/CSV/Samples/csv_sample.csv', 'r')
csv_new = open('Data/CSV/New/my_data.csv', 'r')
csv_old = open('Data/CSV/Old/old_data.csv', 'r')
```
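If your data file is outside of the directory your Python code is in, you can either use `..` in a relative path to move up one folder (for example `'../other_folder/data.txt'`) or use the absolute path to the file. For more information on absolute and relative paths look [here](https://automatetheboringstuff.com/chapter8/).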
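If you're ever unsure where a relative path is pointing, a quick sketch like the one below can help. It uses only the standard `os` module: `os.getcwd()` shows the folder relative paths start from, `os.path.join()` builds a path with the right separator for your operating system, and `os.path.abspath()` shows the full path Python would actually try to open. The folder and file names are just the ones from the directory tree above; they won't exist on your machine unless you create them.

```python
import os

# Relative paths are looked up starting from the current working directory
print(os.getcwd())

# os.path.join() glues folder names together with the separator your OS uses
relative = os.path.join('Data', 'Text', 'Group2', 'group_data.txt')
print(relative)                   # Data/Text/Group2/group_data.txt on macOS/Linux

# os.path.abspath() shows the absolute path that open() would try to use
print(os.path.abspath(relative))

# os.path.exists() is a quick way to check the path before calling open()
print(os.path.exists(relative))
```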

The `mode` parameter tells Python what you plan to do with the file: read it, write it, or both (plus a few other options). Reading the file means that Python can use the data in the file as the code executes, but no content can be added to or removed from the file. Writing the file refers to changing its content, whether that's adding new information or removing existing information. Although you could always open a file with both read and write privileges, it's better to open it with only the access you need so that nothing accidentally gets overwritten or deleted. The following table shows the various modes available, the character you would pass, and a short description of what each does. Characters can be combined, for example `'rb'` to read a file in binary mode or `'r+'` to read and write.

| mode   | character | meaning |
|---     |---        |---      |
| read   | 'r'       | open for reading (default) |
| write  | 'w'       | open for writing; creates the file if needed and erases any existing content first |
| create | 'x'       | open to create a new file; fails if a file with that name already exists |
| append | 'a'       | open for writing; new content is added to the end of the file (the file is created if it doesn't exist) |
| binary | 'b'       | binary mode |
| text   | 't'       | text mode (default) |
| update | '+'       | open for reading and writing (combined with another mode, e.g. 'r+') |
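The difference between `'w'`, `'a'` and `'x'` is easiest to see by trying them. This is a minimal sketch that works on a throwaway file in a temporary folder (created with the standard `tempfile` module), so it won't touch any of your own files; the file name `modes.txt` is made up.

```python
import os
import tempfile

# A throwaway file in a fresh temporary folder
path = os.path.join(tempfile.mkdtemp(), 'modes.txt')

with open(path, 'w') as f:   # 'w' creates the file
    f.write('first\n')
with open(path, 'w') as f:   # opening with 'w' again erases 'first'
    f.write('second\n')
with open(path, 'a') as f:   # 'a' keeps what's there and adds to the end
    f.write('third\n')

with open(path) as f:        # 'r' is the default mode
    contents = f.read()
print(contents)              # second, then third, each on its own line

try:
    with open(path, 'x') as f:   # 'x' refuses to open a file that already exists
        f.write('never written')
    created = True
except FileExistsError:
    created = False
print('x mode created the file:', created)   # False
```

Notice that `'first'` is gone: this is why it's worth double-checking you meant `'w'` and not `'a'` before running code on a file you care about.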


Now that we know how to open a file, let's try doing some reading and writing. If you look at the table above you'll notice that text mode is the default, and for your purposes you'll really only be using text mode. This means everything you read from the file starts out as a string. This is important to remember if you are working with data, because Python will not automatically parse the data into numbers that you can do math with; you have to convert them yourself, for example with `int()` or `float()`. Binary mode, on the other hand, reads the bytes of data in your file directly without attempting to interpret them as text. If you had a file containing a single `a`, reading it in binary mode would return the bytes object `b'a'`, whose one byte has the value 97 (`01100001` in binary), the ASCII code for 'a'. In contrast, if you read the same file in text mode the returned value would be the string `'a'`. (Note: single and double quotation marks are interchangeable in Python; both denote a string literal.)
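Here is a small sketch of the text vs binary difference described above, plus the string-to-number conversion you'll need for data. It writes to a made-up temporary file (`one_letter.txt`) so it doesn't depend on anything in the data/ directory.

```python
import os
import tempfile

# A throwaway file containing a single 'a'
path = os.path.join(tempfile.mkdtemp(), 'one_letter.txt')
with open(path, 'w') as f:
    f.write('a')

with open(path, 'r') as f:     # text mode (the default) gives you a str
    as_text = f.read()
with open(path, 'rb') as f:    # binary mode gives you a bytes object
    as_bytes = f.read()

print(repr(as_text))               # 'a'
print(as_bytes)                    # b'a'
print(as_bytes[0])                 # 97, the ASCII code for 'a'
print(format(as_bytes[0], '08b'))  # 01100001

# Text read from a file is always a string, so convert it before doing math
with open(path, 'w') as f:
    f.write('42')
with open(path) as f:
    number = int(f.read())
print(number + 1)                  # 43
```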

## Reading from a file
If you want to read the entire contents of the file all at once, you can use `file_contents = file_object.read()`, which calls the `read()` function on the file object you created with `open()`. In many cases reading the entire file as one giant blob isn't the best way to accomplish a task, which is why other reading functions are available.

Sometimes it's better for each line in the file to be a separate string. `file_lines = file_object.readlines()` returns a list in which each entry is one line of the file (each line still ends in its `\n` newline character, except possibly the last one). The final option is the `readline()` function, which reads a single line each time it's called and returns it as a string. It works best in a loop, so that you can read and process a file one line at a time rather than all at once.

Before we look at some working code, we need to cover the `close()` function. When you are done with a file, close it with `file_object.close()`. Python may hold data you've written in memory and only save it to the file when the file is closed, so forgetting to close a file you wrote to can leave it incomplete. If it helps, you can think of it like saving your work before you shut down your computer: if you don't save, your work won't all be there when you return. Alternatively, you can avoid needing to remember the `close()` call by using a little Python magic with the `with` keyword. The following example creates a file named `data.txt`, writes `Hello World` to it, and automatically closes the file when the indented block ends. Because it uses the `'x'` (create) mode, it raises a `FileExistsError` if `data.txt` already exists.

```python
with open('data.txt', 'x+') as file:
    file.write('Hello World')

print('File closed when this line reached')
```
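The examples below use `readline()` only once, so here is a minimal sketch of the loop pattern mentioned above. It relies on the fact that `readline()` returns an empty string `''` once the end of the file is reached. The file it reads is a made-up temporary file, so the example runs on its own.

```python
import os
import tempfile

# A throwaway three-line file for the example
path = os.path.join(tempfile.mkdtemp(), 'lines.txt')
with open(path, 'w') as f:
    f.write('first line\nsecond line\nthird line\n')

line_count = 0
with open(path) as f:
    line = f.readline()
    # readline() returns '' (an empty string) only at the end of the file
    while line != '':
        line_count += 1
        print(line.rstrip('\n'))   # drop the newline so print() doesn't double-space
        line = f.readline()

print('lines read:', line_count)   # lines read: 3
```

A blank line in the middle of a file comes back as `'\n'`, not `''`, so this loop won't stop early on empty lines.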

Now let's look at some real examples using the sample data in the data/ directory.

In [5]:
# example using all 3 read methods
with open('data/data1.txt') as myfile:
    all_content = myfile.read()
    print('----- Value of read() function: -----')
    print(all_content)
    
print('\n')
with open('data/data1.txt') as myfile:
    single_line = myfile.readline()
    print('----- Value of readline() function: -----')
    print(single_line)
    
print('\n')
with open('data/data1.txt') as myfile:
    all_lines = myfile.readlines()
    print('----- Value of readlines() function -----')
    print(all_lines)
    

----- Value of read() function: -----
hello world

this is line 3
let's parse some data
here's another line
the next line is all whitespace
            
this is another line
wow so creative 
this is the last line in the file


----- Value of readline() function: -----
hello world



----- Value of readlines() function -----
['hello world\n', '\n', 'this is line 3\n', "let's parse some data\n", "here's another line\n", 'the next line is all whitespace\n', '            \n', 'this is another line\n', 'wow so creative \n', 'this is the last line in the file']


In [7]:
# example iterating over a file line by line
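with open('data/data1.txt') as mf:
    # enumerate() counts the lines for us, starting at 0
    for line_num, line in enumerate(mf):
        # each line still ends in '\n', and print() adds another one, which is why the output is double-spaced
        print('{}: {}'.format(line_num, line))  # format() is another way to insert variables into a string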

0: hello world

1: 

2: this is line 3

3: let's parse some data

4: here's another line

5: the next line is all whitespace

6:             

7: this is another line

8: wow so creative 

9: this is the last line in the file


In [16]:
# example parsing comma separated numbers - notice how the last number doesn't get converted
with open('data/data2.txt') as mf:
    mf.readline()          # read (and ignore) the first line of the file
    line = mf.readline()   # the second line holds the numbers
    # split the line each time ', ' occurs in the line
    data = line.split(', ')
    # parse the strings of numbers into python numbers
    for i, val in enumerate(data):
        if val.isdigit():
            data[i] = int(val)
        # uncomment the following code to fix the last number not being converted
        # ('23\n' fails isdigit() because of the newline; rstrip() returns a copy without it)
        #else:
        #    data[i] = int(val.rstrip())
    print(data)

[10, 15, 5, 7, 6, 3, 2, 3, 12, 2, 78, 8, 0, 10, '23\n']
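Another common way to handle that trailing `\n` is to strip it off the whole line before splitting, then convert every piece at once. This sketch uses a made-up line in the same `', '`-separated style rather than reading `data2.txt`, and a list comprehension (a compact for loop that builds a list).

```python
# A made-up line in the same ', '-separated style, ending in a newline
line = '10, 15, 5, 7, 23\n'

# strip() removes the trailing '\n' (and any other surrounding whitespace) first,
# then int() converts each piece
numbers = [int(val) for val in line.strip().split(', ')]
print(numbers)       # [10, 15, 5, 7, 23]
print(sum(numbers))  # 60
```

If your data contains decimals, swap `int()` for `float()`.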


## Writing to a file
As you saw in the `with` example in the reading section, writing to a file is super easy. There's only one function you need, and that is `write()`. The example you saw earlier used the string literal `'Hello World'`, but you can also pass variables to it, as long as they are strings (use `str()` to convert numbers first). You'll likely want to format the data you write to a file in some way, which means you'll need to make use of escape characters. An escape character is a backslash followed by a character, which tells the Python interpreter to insert a special character that's hard to type directly, usually some kind of whitespace. If you want a new line after your sentence, add a `\n` to the end of the string. Similarly, if you want a tab between two numbers, use `\t`. For a full list of escape characters in Python look [here](http://python-ds.com/python-3-escape-sequences).

In [5]:
with open('data/data3.txt', 'w') as nf: # 'w' creates the file if it doesn't exist and erases its contents if it does
    nf.write('This is a test')
    my_string = 'this is stored in a variable'
    nf.write(my_string)
    my_num = 3
    nf.write(str(my_num))

with open('data/data3.txt') as myfile:
    print(myfile.read())
    
# it's important to be aware that python will not automatically add a new line each time write()
# is called; see the next example for how to format the above code so that each write() statement
# ends up on its own line

This is a testthis is stored in a variable3


In [7]:
# in this example we've added '\n' to tell python we want to add a newline after our text
# if you want a tab instead of a newline you can try replacing the \n with \t 
with open('data/data4.txt', 'w') as mf:
    mf.write('This is a test\n')
    my_string = 'this is stored in a variable\n'
    mf.write(my_string)
    my_num = 3
    mf.write(str(my_num))
    
with open('data/data4.txt') as file:
    print(file.read())

This is a test
this is stored in a variable
3
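Putting `\t` and `\n` together, here is a minimal sketch of writing a small table to a tab-separated text file and reading it back. The rows and the temporary file name `scores.txt` are made up for illustration.

```python
import os
import tempfile

# Made-up rows of data; in practice these might come from a calculation
rows = [('name', 'score'), ('ana', 91), ('ben', 78)]

path = os.path.join(tempfile.mkdtemp(), 'scores.txt')
with open(path, 'w') as f:
    for name, score in rows:
        # write() only accepts strings, so each value goes through str();
        # '\t' separates the columns and '\n' ends each row
        f.write(str(name) + '\t' + str(score) + '\n')

with open(path) as f:
    contents = f.read()

print(contents)
```

Tab-separated files like this one can also be opened in Excel, which will put each value in its own cell.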


# 2.3 String Manipulation
By now you've seen quite a few examples using functions available to strings in Python. Earlier in this section we used the `format()` function to insert variables in place of `{}`, and the commented-out fix in the example that parsed numbers from a file used the `rstrip()` function to remove the `\n` from the last number in the line. To understand strings better, we first need to look at how Python treats them. In Python a string is an ordered sequence of characters, so it can be indexed just like a list. This means if you wanted to get just the 'w' from the string `'hello world'`, you could access it like a list item (remember that the first item is at index 0), as in the following example.
```python
my_str = 'hello world'
only_w = my_str[6]
# starting with index 0 count each letter including whitespace and you'll find the 'w' at index 6
```
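Because strings are sequences, we can also use the `len()` function to get the length of a string. Unlike lists, though, strings can't be changed in place, so functions like `rstrip()` return a new string instead of modifying the original. Knowing the length is often useful if you are looping over a string's characters in search of something specific, since you can set up your for loop like the one below: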
```python
my_str = 'abcdefg'
for i in range(len(my_str)):
    print(my_str[i])
```
Or alternatively you could just loop over the string like we did with lists:
```python
my_str = 'zxywvu'
for character in my_str:
    print(character)
```
Seems pretty straightforward right? Let's take a look at some new functions we haven't messed with along with some examples for each of them.

If you want to count the number of occurrences of a certain character or pattern of characters, use the `count()` function. The only parameter you need to pass to this function is the character(s) you are trying to count. Let's look at an example:
```python
s = ' a a b a b b a a b a'
s.count('a')   # returns 6
s.count('b')   # returns 4
s.count(' ')   # returns 10
```

If you want to find the index of a letter or word you can use the `find()` function. Much like the `count()` function, the only argument you need is what you are trying to find. Let's look at an example:
```python
s = 'hello world'
s.find('w')    # returns 6 
s[6]           # returns 'w'
s.find('world') # also returns 6 because that is the index of the first character of the pattern that matches
s.find('burger') # returns -1 because burger is not in the string 'hello world'
```

If you want to get part of a string from a bigger string, you'll need to slice it. At first slicing seems complicated and confusing, but it's fairly simple. The following table shows the different ways you can slice a string depending on what information you have and what information you need. In the table we assume the string you are slicing is stored in the variable named `s`.

|Code          |Meaning                                                                             |
|---           |---                                                                                 |
|`s[start:end]`| returns all content between start index and end-1                                  |
|`s[start:]`   | returns all content starting at the specified index through the rest of the string |
|`s[:end]`     | returns all content from the beginning of the string until end-1 index             |
|`s[:]`        | returns the entire string as a copy                                                |

Example:
```python
s = 'hello world'
hello = s[:5]   # returns 'hello'
world = s[6:]   # returns 'world'
hello_world = s[:]  # returns 'hello world'
o_w = s[4:7]    # returns 'o w'
```

Sometimes it's more useful to split a string wherever a certain delimiter appears. For this we use the `split()` function, which takes the delimiter you are splitting the string by as a parameter. For example, if you had a list of numbers separated by commas and wanted each one in its own string, you could use `my_string.split(',')`, which would return a list of strings.
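A few quick examples of `split()`, using made-up strings. The middle one shows a detail worth knowing: if you call `split()` with no argument, it splits on any run of whitespace (spaces, tabs, newlines) and drops the empty pieces.

```python
csv_line = '10,15,5,7'
parts = csv_line.split(',')
print(parts)      # ['10', '15', '5', '7']

# No argument: split on any run of whitespace, including the trailing '\n'
words = 'hello   big  world\n'.split()
print(words)      # ['hello', 'big', 'world']

# The delimiter can be more than one character, like the ', ' used when parsing data2.txt
pairs = 'a, b, c'.split(', ')
print(pairs)      # ['a', 'b', 'c']
```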

The functions above are some of the most commonly used ones, but there are many, many more, so I'll direct you to either search Google for the specific function you need or check [here](https://programminghistorian.org/en/lessons/manipulating-strings-in-python) for a pretty extensive guide.

# 2.4 Converting Files from Wide to Long Format
You most certainly could take a crack at writing your own program to convert datasets from wide format (one row per subject, with a separate column for each measurement) to long format (one row per subject per measurement), but luckily for you pandas has a function for it. In this short section I'll show you how to use the `wide_to_long()` function provided by the pandas library.

In [16]:
import pandas as pd

df = pd.read_csv('data/data.csv', index_col=False)
print(df)

# strip any stray spaces from the column names (e.g. ' OT_MISS' -> 'OT_MISS')
df.columns = df.columns.str.strip()

# wide_to_long() needs a column that uniquely identifies each row
df['ID'] = df.index

# stubnames: the column prefixes to gather up
# sep: the character between a prefix and its suffix
# suffix: a regex for the suffixes; the default only matches digits, so \w+ is needed for MISS/HIT/CR/FA
# j: the name of the new column that will hold the suffixes (a single string, not a list)
long_df = pd.wide_to_long(df, stubnames=['OT', 'NT'], i='ID', j='measure', sep='_', suffix=r'\w+')
print(long_df.reset_index())


   PID   OT_MISS   OT_HIT   OT_CR   OT_FA   NT_MISS   NT_HIT   NT_CR   NT_FA
0  111       0.1     0.23    0.56    0.11       0.9      1.0    0.92    0.68
1  121       0.1     0.23    0.56    0.11       0.9      1.0    0.92    0.68
2  212       0.1     0.23    0.56    0.11       0.9      1.0    0.92    0.68
3  321       0.1     0.23    0.56    0.11       0.9      1.0    0.92    0.68
4  423       0.1     0.23    0.56    0.11       0.9      1.0    0.92    0.68
5  534       0.1     0.23    0.56    0.11       0.9      1.0    0.92    0.68
6  621       0.1     0.23    0.56    0.11       0.9      1.0    0.92    0.68
7  721       0.1     0.23    0.56    0.11       0.9      1.0    0.92    0.68
8  812       0.1     0.23    0.56    0.11       0.9      1.0    0.92    0.68
9  922       0.1     0.23    0.56    0.11       0.9      1.0    0.92    0.68
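If you don't have `data/data.csv` handy, this small self-contained sketch shows what `wide_to_long()` does, using a made-up two-participant table with the same `OT_`/`NT_` naming style. Since `PID` already identifies each row here, it is passed directly as `i`; the new column name `measure` is just a label chosen for this example.

```python
import pandas as pd

# Made-up wide data: one row per participant, one column per condition/outcome pair
wide = pd.DataFrame({
    'PID': [111, 121],
    'OT_MISS': [0.1, 0.2],
    'OT_HIT': [0.9, 0.8],
    'NT_MISS': [0.3, 0.4],
    'NT_HIT': [0.7, 0.6],
})

# The OT_* columns are gathered into one 'OT' column and the NT_* columns into one 'NT' column;
# the suffixes (MISS, HIT) end up in the new 'measure' column
long_df = pd.wide_to_long(wide, stubnames=['OT', 'NT'], i='PID', j='measure',
                          sep='_', suffix=r'\w+').reset_index()
print(long_df)
```

The result has one row per participant per measure (2 x 2 = 4 rows), with the OT and NT values for each measure side by side.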