# PYTHON FUNDAMENTALS, Part 2 (Workbook)

[CC BY 4.0](https://creativecommons.org/licenses/by/4.0/)

## Credits

Most code snippets and explanatory texts from: 
- Charles Russel Severance's [Python for Everybody](https://www.py4e.com/) lecture slides
  -  Unless annotated as indicated below, text and code came from this source. Additionally, they could be explicitly indicated by the marker `[PES]`

and
- Charles Russel Severance's [Python for Everybody - Online HTML Book](https://www.py4e.com/html3/)
  - Notes and code from this source are indicated by the marker `[PEW]`

Additional code and comments from:

- Antonio C. Briza
  - code cells marked by `#あ` at the first or last line
  - markdown and raw cells marked by `[あ]` at the end

- J.R. Johansson's [Introduction to Scientific Computing with Python](http://github.com/jrjohansson/scientific-python-lectures) 
  - Notes and code from this source are indicated by the marker `[JRJ]`

# Before you begin

First, make a copy of this notebook so that you can make changes as you please. Run and edit the copy instead of this original notebook. To copy this notebook, go to `File|Make a Copy`.

If this notebook is already a copy of the original, clear all outputs in this notebook. Go to the Menu and click on `Cell| All Outputs| Clear`. Once this is done, you're ready to go.

# Suggestions for Learning

- Code snippets in Raw Cells are meant to be written by Beginners.
- Code snippets in Code Cells are for illustration purposes. They are meant to be executed by both Beginners and more experienced Learners. (Beginners may also wish to type them, if they so choose.)

For a better learning experience, the following are suggested:

### For Beginners
1. Create a new Code Cell below the code snippets inside Raw Cells (Insert Cell below then convert the cell type to Code Cell). 
2. Go to your newly-created Code Cell and re-type what you see in the code snippet (don't copy-paste) 
3. Execute (and experiment) on the Code Cell. 
4. Remember to learn by doing (not just by reading or seeing)

### For Coders
1. If you are not yet familiar with the concept, follow Steps 1&2 of the Instructions for Beginners
2. If you are already familiar with the concept being presented, convert the Raw Cell into a Code Cell.
3. Execute (and experiment) on the Code Cell. 
4. Learn the "adjacent concepts", e.g. read related documentation.
5. Help your classmates, because teaching is a wonderful way to learn. 

### For All
* Make this your personal notebook.
    * Add your own text annotations in Markdown Cells.
    * Add comments to parts of code that you find difficult to understand
    * Break down difficult code into several small pieces (maybe, several Code Cells) that are easier to understand

<a id='contents'></a>

# TABLE OF CONTENTS

[Chapter 6 Strings](#chapter6)<br>
[Chapter 7 Files](#chapter7)<br>


<a id='chapter6'></a>

# CHAPTER 6  - `Strings`

## The `String` Data Type

A string is a sequence of characters.

Recall:
    Python strings can be enclosed in single quotes (`'`) or double quotes (`"`). The back quote (the character that shares the key with tilde `~`) cannot be used to identify strings. [あ] 

In [12]:
str1 = "Hello"
str1

'Hello'

In [13]:
str2 = "there"
str2

'there'

In [16]:
str1, str2

('Hello', 'there')

When a string contains numbers, it is still a string, so adding a number to it raises a `TypeError`.

str3 = '123'
str3 = str3 + 1   # TypeError: a str and an int cannot be added

We can convert numbers in a string into a number using `int()` ...

In [26]:
str3 = "123"
x = int(str3) + 1
print(x)

124


or `float()` [あ]

In [27]:
x = float(str3) + 1
print(x)

124.0


## A string is a sequence of characters

- We can get at any single character in a string using an **index** specified in __square brackets__
- The index value must be an integer and __starts at zero__

In [28]:
fruit = 'banana'
letter = fruit[0] #あ remember: index starts at 0
print(letter)

b


The index value can be an expression that is computed. 

In [29]:
x = 3
w = fruit[x - 1]
print(w)

n


[PEW]
You can use any expression, including variables and operators, as an index, but the value of the index has to be an integer. 

letter = fruit[1.5]   # TypeError: the index must be an integer

Indexing with a negative number counts from the end of the string. The last element of the string is indexed at `-1`.  [あ] 

In [31]:
fruit[-1]

'a'

In [32]:
fruit[-2]

'n'

### `String index out of range` error

You will get a Python error if you attempt to index beyond the end of a string.

In [34]:
connection = 'wifi'
print(connection[-2])

#あ

f
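For contrast, here is a minimal sketch of what happens when the index really does go past the end of the string. `'wifi'` has four characters, so its valid non-negative indexes are 0 to 3. The `try`/`except` wrapper and the `message` variable are only there so that the cell prints the error text instead of stopping.

```python
connection = 'wifi'
try:
    print(connection[4])   # index 4 is one past the last valid index (3)
except IndexError as err:
    message = str(err)
    print('IndexError:', message)   # IndexError: string index out of range
```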


Remember that string indexes start at 0, so the last letter of this four-letter string is accessed using index 3: [あ]

In [37]:
connection[3]

'i'

### String Length

`len` is a built-in function that returns the number of characters in a string:

In [38]:
fruit = 'banana'
print(len(fruit))

6


How do we access the end of the string?

In [47]:
length = len(fruit)
last = fruit[length-1]
last

#あ

'a'

An easier way is to use the negative indexing introduced earlier. [あ]

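Be careful not to use `len(fruit)` itself as an index. The value below is one more than the last valid index, so `fruit[last]` would raise an `IndexError`: [あ]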

In [50]:
last = len(fruit)
last

6

### Looping through strings using `for`

You can access each character in a string using a `for` loop

In [53]:
fruit = 'banana'
for letter in fruit:
    print(letter)

b
a
n
a
n
a


## More String Operations

### Slicing Strings

A segment of a string is called a slice. 

- Slices are defined using a colon (`:`) operator. 
- If `s` is a string, `s[n:m]` defines a segment from the `n`th index **up to**, but __not including__, the `m`th index
- mnemonic that could be used: the right `slice` of bread looks like `[)`

[あ]

Let's get the string index of each character in the string [あ]

In [None]:
s = 'Monty Python'   # example string; any string works here
i = 0
print('i  char')
print('-------')
for char in s:
    print(i, ' ' + char)
    i = i + 1
    
#あ

Now let's produce some string slices [あ]
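A few basic slices, as a minimal sketch. The string `'Monty Python'` is an example value chosen for illustration; the comments show what each slice gives.

```python
s = 'Monty Python'   # example string, chosen for illustration
print(s[0:4])    # Mont
print(s[6:7])    # P
print(s[6:20])   # Python (a stop index past the end is allowed in a slice)
```

Unlike indexing a single character, a slice whose stop index goes beyond the end of the string does not raise an error; it simply stops at the end.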

If we leave off the **first number** of the slice, it is assumed to be the **beginning** of the string.

If we leave off the **last number** of the slice, it is assumed to be the **end** of the string.

If we leave both the **first number and last number** of the slice, we will get the  **whole** string. [あ]
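The three cases of omitted slice numbers can be sketched as follows, again using the example string `'Monty Python'` (an illustrative value).

```python
s = 'Monty Python'   # example string, chosen for illustration
print(s[:2])   # Mo            (first number omitted: start at the beginning)
print(s[8:])   # thon          (last number omitted: go to the end)
print(s[:])    # Monty Python  (both omitted: the whole string)
```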

Slices also work with negative indexing [あ]
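A short sketch of slicing with negative indexes, where `-1` refers to the last character. The example string is an illustrative value.

```python
s = 'Monty Python'   # example string, chosen for illustration
print(s[-6:])     # Python  (the last six characters)
print(s[-6:-2])   # Pyth    (from the sixth-to-last up to, not including, the second-to-last)
print(s[:-7])     # Monty   (everything except the last seven characters)
```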

### Strings are Immutable

It is not possible to give a new value to a character in a string.

[あ]


Nor is it possible to assign a new value to a slice of a string [あ]

To "modify" a string in python, a new string has to be made: [あ]
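The points above about immutability can be sketched in one cell. The variable names and the greeting text are illustrative; the `try`/`except` is only there so that the cell keeps running after the error.

```python
greeting = 'Hello, world!'
try:
    greeting[0] = 'J'          # item assignment is not allowed on strings
except TypeError as err:
    print('TypeError:', err)

# Build a new string from a new first letter plus a slice of the old one
new_greeting = 'J' + greeting[1:]
print(new_greeting)   # Jello, world!
print(greeting)       # Hello, world! (the original is unchanged)
```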

### String Concatenation

When the  `+`  operator is applied to strings, it means “concatenation”

In [63]:
a = 'Hello'
b = a + 'There' + a
print(b)

HelloThereHello


In [64]:
c = a + ' ' + 'There'
print(c)

Hello There


## String Library

- Python has a number of string functions (methods) that belong to the built-in `str` type, often called the string library
- These functions are already built into every string - we invoke them by _appending the function to the string variable_
- These functions __do not modify__ the original string, instead they return a new string that has been altered

Below are examples of some of the useful string functions.

#### lower case with function `lower()` 

In [65]:
text = 'Enero, Febrero, Marzo'
spanish_months = text.lower()
print(spanish_months)

#あ

enero, febrero, marzo


#### UPPER CASE with function `upper()`

In [66]:
greeting = 'Hello Bob'
greeting.upper()

'HELLO BOB'

#### Stripping Whitespace with functions `lstrip()`, `rstrip()`, and `strip()` 

In [67]:
greeting = '   Hello Bob  '

In [68]:
greeting.lstrip()

'Hello Bob  '

In [69]:
greeting.rstrip()

'   Hello Bob'

In [70]:
greeting.strip()

'Hello Bob'

#### Splitting the words with `split()`

This operation returns a **list** data type containing the words in the string. We will talk about lists in a later chapter. [あ]

In [71]:
quotation = 'One ring to rule them all, one ring to find them.'
quotation.split()

['One',
 'ring',
 'to',
 'rule',
 'them',
 'all,',
 'one',
 'ring',
 'to',
 'find',
 'them.']

### Formatting Strings

#### Formatting strings with `format()`

A simple, intuitive way of printing a combination of string and variable values is to use the `format` string function.

Steps
- Create a string with desired text
- Put **curly braces** in positions where you'd like variable values to appear.  
- Invoke the `format` operation, and pass the variables you want to be displayed. The variable values will be displayed in the sequence with which they are passed to the `format` function.

[あ]
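The steps above can be sketched as follows. The names `name`, `age`, `template`, and `message` and their values are hypothetical, chosen only for illustration.

```python
name = 'Bob'      # hypothetical values for illustration
age = 25
template = 'My name is {} and I am {} years old.'   # curly braces mark the slots
message = template.format(name, age)                # values fill the slots in order
print(message)    # My name is Bob and I am 25 years old.
```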

The code below accomplishes the same thing, without the extra assignment statement. [あ]
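A sketch of that shorter form: `format` is called directly on the string literal inside `print`, so no intermediate variable is needed. The names and values are hypothetical.

```python
name = 'Bob'      # hypothetical values for illustration
age = 25
print('My name is {} and I am {} years old.'.format(name, age))
```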

#### Other useful string operations

    startswith()
    endswith()
    capitalize()
    center()
    find()
    replace()
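A quick tour of the methods listed above, as a minimal sketch on an example sentence (the sentence itself is an illustrative value). Each call returns a new value and leaves `line` unchanged.

```python
line = 'please have a nice day'
print(line.startswith('please'))        # True
print(line.endswith('night'))           # False
print(line.capitalize())                # Please have a nice day
print(line.find('nice'))                # 14 (index where 'nice' begins)
print(line.replace('nice', 'great'))    # please have a great day
print('[' + 'hi'.center(6) + ']')       # [  hi  ]
```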

#### Strings and the `print` function

The `print` function joins its arguments with a space [JRJ]

The `print` function converts all arguments to strings  [JRJ]
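Both behaviours can be seen in a short sketch. The argument values are arbitrary examples.

```python
print('str1', 'str2', 'str3')   # str1 str2 str3
print('str1', 1.0, False)       # str1 1.0 False
print('str1' + 'str2')          # str1str2 (the + operator joins without a space)
```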

## Getting Help on Strings

#### `dir()` function to get a list of string operations

You can get a list of functions (or methods) on strings by using the `dir()` function on a string literal. Functions that start and end with double underscores (`__`) are special methods used by Python. Disregard them for the meantime. [あ]

Below is a listing of string methods with the special methods removed. [あ]

    ['capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
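One way to produce such a listing is sketched below. It uses a list comprehension, a construct this workbook has not covered yet, so treat it as a preview. Newer Python versions may list a few more methods than shown above.

```python
# Keep only the names that do not start with a double underscore
methods = [name for name in dir('') if not name.startswith('__')]
print(methods)
```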

#### Using the `help()` function

A way to get help is by calling the `help()` function. To use it, just pass `s.method` as the argument to the function (where `s` is a string variable and `method` is the method you want to get help on).
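A minimal sketch of such a call, using `'banana'` as an example string and `upper` as the method of interest. Note that the method name is passed without parentheses.

```python
s = 'banana'
help(s.upper)    # prints the documentation for the upper method
```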

#### `?` in  Jupyter notebooks

In Jupyter, you can get help by putting a question mark before or after the name of a method, without the call parentheses (for example `s.upper?`, where `s` is a string variable). [あ]

#### **Python reference**  Lookup 

You could also get help by going to the official Python reference (go to Jupyter's Menu `Help|Python Reference`)

[あ]

#### Internet Lookup

A quick way to get help is to search the web. This is especially useful if you want sample code snippets to reuse. A useful site is `Stack Overflow`. 

Many times, the web results and Stackoverflow will point you to portions of the official reference.

When using solutions from the web, make sure you understand what pieces of code are useful for you, how they work, and how you could adapt them to your own program (if needed).

[あ]




## Final Words

- String manipulation is important for reading data,  especially data that still needs cleaning
- Slicing is an important skill to learn as it is also applicable to vectors, matrices, Series and DataFrames
- A big part of problem solving is done by identifying what tool to use and how to use it. For this, knowing how and where to get help is essential. 
- The portion of this chapter on getting help on strings is also applicable to the other Python modules we'll learn in the upcoming lessons

[あ]

## EXERCISES

#### Exercise 1

Given the code below    

    text = 'Business Analytics'

What will the following expressions produce?
    
    text[4]
    text[2:4]
    text[:4]
    text[4:]
    text[-2]
    text[-4:-2]
    text[-2:]
    text[:-2]
    
[あ]

In [75]:
text = 'Business Analytics'
print(
    text[4],
    text[2:4],
    text[:4],
    text[4:],
    text[-2],
    text[-4:-2],
    text[-2:],
    text[:-2]
)

n si Busi ness Analytics c ti cs Business Analyti


#### Exercise 2

Given the code below:
    
    phrase = 'One Ring to rule them all'

What string slicing expression gives out the following words:

    'One '
    'Ring'
    'all'
    'them'
    'One Ring to rule them all'
    
[あ]

In [111]:
phrase = 'One Ring to rule them all'
words = phrase.split()
for i in words:
    print(i)
      

One
Ring
to
rule
them
all
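The exercise asks for slicing expressions rather than `split()`. One possible set of answers is sketched below; other index combinations (for example negative indexes for `'them'`) work equally well.

```python
phrase = 'One Ring to rule them all'
print(phrase[:4])      # 'One ' (includes the trailing space)
print(phrase[4:8])     # Ring
print(phrase[-3:])     # all
print(phrase[17:21])   # them
print(phrase[:])       # One Ring to rule them all
```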



## CHALLENGE


Write a program which repeatedly reads a **full name**, then does the following:
- strips the name of any trailing spaces
- capitalizes the first letter
- puts the rest into lowercase letters 

[あ]

In [104]:
fullname = ' EDWIN LACAP'
fullname = fullname.strip()
fullname = fullname.capitalize()
fullname

'Edwin lacap'
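The challenge asks for the steps to be repeated for several names. A minimal sketch of the repetition is shown below. To keep the cell runnable without typing, it loops over a hypothetical list of sample names instead of calling `input()`; the names and the `cleaned` list are assumptions made for illustration.

```python
# Stand-in for names typed by a user; with input(), the loop could
# instead run until an agreed word such as 'done' is entered.
raw_names = [' EDWIN LACAP', 'maria clara  ', 'jOSE rIZAL']

cleaned = []
for raw in raw_names:
    name = raw.strip()          # remove leading and trailing spaces
    name = name.capitalize()    # first letter upper case, the rest lower case
    cleaned.append(name)
    print(name)
```

If every word of the name should start with a capital letter, the `title()` method from the listing of string methods is an option worth trying.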

[TABLE OF CONTENTS](#contents)

<a id='chapter7'></a>

# CHAPTER  7 - Files

Computations in a computer are done in the Main Memory. Data and computations in the main memory are lost when the computer's power is turned off. (This is why you have to reload jupyter notebooks and re-run cells after you have turned off your computer.)

**Files** let us save data so that it can be recovered even after the computer's power is turned off. Files are saved on your computer's hard drive, on a DVD, or on a USB flash drive. 

[あ]

![computer_architechture.png](attachment:computer_architechture.png)

There are many kinds of files, but in this chapter, we will only talk about text files. 

You could consider a text file as a sequence of strings.

[あ]

## Opening Files

- Before we can read the contents of the file, we must tell Python which file we are going to work with and what we will be doing with the file
- This is done with the `open()` function
- `open()` returns a **file handle** - a variable used to perform operations on the file
- Similar to “File | Open” in a Word Processor

##  The `open()` function and the file handle

To open a file, you write code similar to the one below [あ]

    handle = open(filename, mode)

- returns a `handle` used to manipulate the file
- `filename` is a string
- `mode` is optional and should be `'r'` if we are planning to read the file and `'w'` if we are going to write to the file

In [112]:
file_handle = open('words.txt', 'r')
print(file_handle)

<_io.TextIOWrapper name='words.txt' mode='r' encoding='UTF-8'>


In [117]:
createFile = open('words2.txt', 'w')
print(createFile)

<_io.TextIOWrapper name='words2.txt' mode='w' encoding='UTF-8'>


The file handle is not the actual data contained in the file, but instead it is a "handle" that we can use to read the data. You are given a handle if the requested file exists and you have the proper permissions to read the file.

![file_handle.png](attachment:file_handle.png)

If the file does not exist, open will fail with a traceback and you will not get a handle to access the contents of the file: [PEW]

In [120]:
file_handle = open('missingfile.txt', 'r')   # raises FileNotFoundError if this file does not exist

##  The newline character (`'\n'`)

A text file can be thought of as a sequence of lines. To break the file into lines, there is a special character that represents the "end of the line" called the newline character.

In Python, we represent the newline character as a `backslash-n` in string constants. Even though this looks like two characters, it is actually a single character. When we look at the variable by entering "stuff" in the interpreter, it shows us the `\n` in the string, but when we use print to show the string, we see the string broken into two lines by the newline character.[PEW]

In [122]:
stuff = 'Hello\nWorld!'
print(stuff)

Hello
World!


In [123]:
stuff = 'X\nY'
print(stuff)

X
Y


In [124]:
len(stuff)

3
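The difference between looking at the value and printing it can be sketched with `repr()`, which returns the same text the interpreter shows when you enter a variable name on its own.

```python
stuff = 'X\nY'
print(repr(stuff))   # 'X\nY'  (the newline is shown as backslash-n)
print(stuff)         # X and Y appear on two separate lines
print(len(stuff))    # 3: 'X', the newline character, and 'Y'
```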

##  Reading Files

- A file handle open for reading can be treated as a sequence of strings where each line in the file is a string in the sequence
- We can use the for statement to iterate through a sequence
- Remember - a sequence is an ordered set

In [135]:
f = open('words.txt')
for line in f:
    line = line.strip()
    print(line)

This is a sample file
This is second line
This is 3rd line


#### Why are there blank lines?

If each line is printed without stripping it first, a blank line appears after every line of the file. Each line from the file has a newline at the end, and the `print` function adds another newline to each line.
We can strip the whitespace from the right-hand side of the string using `rstrip()` from the string library. The newline is considered “white space” and is stripped.

In [136]:
f = open('words.txt')
for line in f:
    line = line.rstrip()
    print(line)

This is a sample file
This is second line
This is 3rd line
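To see the blank lines for yourself, the sketch below first creates a small two-line file and then reads it twice, once without and once with `rstrip()`. The file name `demo_lines.txt` is hypothetical, and the sketch uses file writing, which is covered later in this chapter.

```python
# Create a small two-line file to read back (hypothetical file name)
fout = open('demo_lines.txt', 'w')
fout.write('first line\n')
fout.write('second line\n')
fout.close()

for line in open('demo_lines.txt'):
    print(line)            # the line's own '\n' plus print's '\n' gives a blank line

for line in open('demo_lines.txt'):
    print(line.rstrip())   # newline removed first, so no blank line
```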


#### Reading the whole file

We can read the entire file in one go with the `read()` method

In [141]:
f = open('words.txt')
text = f.read()
print(len(text))

#あ

67


### Searching through a file

Our example file for this exercise is [mbox-short.txt](https://www.py4e.com/code3/mbox-short.txt), a text file that records mail activity for a project. Try viewing the contents of `mbox-short.txt` using your terminal.
Our task is to extract all the email addresses from the text file.[あ]

When you are searching through data in a file, it is a very common pattern to read through a file, ignoring most of the lines and only processing lines which meet a particular condition. We can combine the pattern for reading a file with string methods to build simple search mechanisms.

In [140]:
fhand = open('mbox-short.txt')
for line in fhand:
    if line.startswith('From:'):
        line = line.rstrip()   # drop the trailing newline before printing
        print(line)

We can use the `find` string method to simulate a text editor search that finds lines where the search string is anywhere in the line. `find` looks for an occurrence of a string within another string and returns either the position of the string or `-1` if the string was not found. The loop below uses this to show only the lines of `words.txt` that contain the string **"Edwin"**. Run against `mbox-short.txt` with the search string **"@uct.ac.za"**, the same loop would show the lines that come from the University of Cape Town in South Africa.

In [149]:
fhand = open('words.txt')
for line in fhand:
    line = line.rstrip()
    if line.find('Edwin') == -1: 
        continue
    print(line)

This is a sample file from Edwin Lacap


### Using `try`, `except`, and `open`

File handling is prone to runtime errors, especially when the file you are trying to open does not exist. To keep such an error from stopping your program, it is good practice to put the `open` call inside a `try-except` block. [あ]

Execute the code below, entering a name of a file that does not exist.[あ]

In [154]:
filename = input('Enter a filename:')
try:
    f = open(filename,'r')
    text = f.read()
    print('{} has {} characters.'.format(filename, len(text)))
except FileNotFoundError:
    print(filename+ ' does not exist.')


Enter a filename:missingfile.txt
missingfile.txt does not exist.


The code below is the same as the one above. Now execute it, entering the name of a file that you are sure exists. [あ]

In [153]:
filename = input('Enter a filename:')
try:
    f = open(filename,'r')
    text = f.read()
    print('{} has {} characters.'.format(filename, len(text)))
except FileNotFoundError:
    print(filename+ ' does not exist.')


Enter a filename:words.txt
words.txt has 84 characters.


### Writing files

To write a file, you have to open it with mode `'w'` as a second parameter. When you are done writing, you have to `close` the file to make sure that the last bit of data is physically written to the disk so it will not be lost if the power goes off.

In [158]:
fout = open('output.txt', 'w')
fout.write('First line\n')
fout.write('Second line\n')
line3 = '1 Petabyte is {}^{} ({}) bytes\n'.format(10,15, 10**15)
fout.write(line3)
fout.close()

We could close the files which we open for read as well, but we can be a little sloppy if we are only opening a few files since Python makes sure that all open files are closed when the program ends. When we are writing files, we want to explicitly close the files so as to leave nothing to chance.
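One common way to leave nothing to chance is Python's `with` statement, which closes the file automatically when the indented block ends. This is an addition to the approach shown above, not a replacement prescribed by the source; the file name `output_demo.txt` is hypothetical.

```python
# The file name is hypothetical; 'with' closes the file when the block ends
with open('output_demo.txt', 'w') as fout:
    fout.write('First line\n')
    fout.write('Second line\n')

print(fout.closed)   # True: the file was closed without calling close()

# Read the file back to check what was written
with open('output_demo.txt') as fin:
    text = fin.read()
print(text)
```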

## Final Words

- File handling skills are important for manipulating data directly.
- However, many python modules (such as **pandas**) already provide methods for handling different types of files, providing many additional options on how to read and interpret the file.
- The skills you have learned in this chapter will remove a bit of the mystery behind those file-handling methods provided by the other python modules

[あ]

## EXERCISES

#### Exercise 1  

Write a program to read through a file and print the contents of the file (line by line) all in upper case.

from [PEW] E8-1

In [162]:
f = open('words.txt')
for line in f:
    line = line.strip()
    line = line.upper()
    print(line)

THIS IS A SAMPLE FILE FROM EDWIN LACAP
THIS IS SECOND LINE
THIS IS 3RD LINE
4TH LINE


#### Exercise 2  

Write a program to read through a file and output the number of lines in a file.

adapted from [PEW] 

In [169]:
f = open('words.txt')
count = 0
for line in f:
    count = count + 1   # one more line read
print(count)

4


## CHALLENGE

from [PEW] E8-2

Write a program to prompt for a file name, and then read through the file and look for lines of the form `X-DSPAM-Confidence: 0.8475`.

In [189]:
filename = input('Enter a filename:')

fhand = open(filename, 'r')
for line in fhand:
    line = line.rstrip()
    if line.find('X-DSPAM-Confidence: 0.8475') == -1: 
        continue
    print(line)
    

Enter a filename:mbox.txt
X-DSPAM-Confidence: 0.8475
X-DSPAM-Confidence: 0.8475


When you encounter a line that starts with "X-DSPAM-Confidence:" pull apart the line to extract the floating-point number on the line. Count these lines and then compute the total of the spam confidence values from these lines. When you reach the end of the file, print out the average spam confidence.

In [None]:
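filename = input('Enter a filename:')

fhand = open(filename, 'r')
count = 0
total = 0.0
for line in fhand:
    if not line.startswith('X-DSPAM-Confidence:'):
        continue
    # the number is the text after the colon; float() ignores surrounding whitespace
    value = float(line[line.find(':') + 1:])
    count = count + 1
    total = total + value
fhand.close()
if count > 0:
    print('Average spam confidence:', total / count)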
    

<a id='chapter8'></a>

[TABLE OF CONTENTS](#contents)