Up until now, we have worked with small amounts of data that we either created ourselves or asked the user to enter, and we stored that data in variables or lists. In this lecture you will be introduced to file handling in Python.

After this lecture you will be able to:

- Open/Close files
- Read/Write files
- Utilize text files in Python
- Perform string processing in Python

---

## Text File vs Binary File

A __text file__ is a file containing characters, structured as individual lines of text. In addition to printable characters, text files also contain the nonprinting newline character, ```\n```, to denote the end of each text line. 

Text files can be directly viewed and created using a text editor.

In contrast, __binary files__ can contain various types of data, such as numerical values, and are therefore not structured as lines of text. Such files can only be read and written via a computer program.
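The difference can be seen by reading the same file in two ways. The following is a minimal sketch, assuming a temporary file named `demo.txt` (a hypothetical name used only here); the `open` function it relies on is introduced in the next section.

```python
import os
import tempfile

# Hypothetical temporary file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')

# Write two lines of text. newline='' turns off newline translation,
# so the same bytes are stored on every operating system.
output_file = open(path, 'w', newline='')
output_file.write('Line one\nLine two\n')
output_file.close()

# Text mode ('r'): the contents come back as a string.
text_file = open(path, 'r')
as_text = text_file.read()
text_file.close()

# Binary mode ('rb'): the contents come back as raw bytes.
binary_file = open(path, 'rb')
as_bytes = binary_file.read()
binary_file.close()

print(repr(as_text))   # 'Line one\nLine two\n'
print(as_bytes)        # b'Line one\nLine two\n'
```

In both cases the newline character is part of the data; only the type of the returned value differs.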


## Handling Files

Fundamental operations of all types of files include _opening_ a file, _reading_ from a file, _writing_ to a file, and _closing_ a file. Next we discuss each of these operations when using text files in Python.

All files must first be opened before they can be used. In Python, when a file is opened, a file object is created that provides methods for accessing the file.

The ```open``` function opens the given file. Passing ```'r'``` as the second argument opens it for reading.

In [1]:
# FileNotFoundError is raised because the path to the file is not correct.
input_file = open('sample.txt', 'r') 

FileNotFoundError: [Errno 2] No such file or directory: 'sample.txt'
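If a missing file should not stop the program, the error can be caught. The sketch below is one possible way to do this, assuming exception handling with `try`/`except` is available to you; the file name `no_such_file.txt` and the variable `opened` are hypothetical and chosen only so that the open fails.

```python
import os
import tempfile

# Hypothetical path inside a fresh, empty folder, so the file cannot exist.
file_name = os.path.join(tempfile.mkdtemp(), 'no_such_file.txt')

try:
    input_file = open(file_name, 'r')
except FileNotFoundError:
    # Reached only when the file does not exist at the given path.
    opened = False
    print('Could not find', file_name)
else:
    # Reached only when the open succeeded.
    opened = True
    input_file.close()
```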

In [3]:
input_file = open('sample_data/sample.txt', 'r')  # Correct path, including the folder name.
print(input_file)
input_file.close()

<_io.TextIOWrapper name='sample_data/sample.txt' mode='r' encoding='UTF-8'>


Printing the file object does not display the contents of the file. It only shows a description of the object (its name, mode, and encoding). The object itself is what later statements use to access the file.

--- 


To open a file for writing, the ```open``` function is used with the parameter ```'w'```:

In [4]:
# The path starts with sample_data/ so that the file is saved in the sample_data folder.
output_file = open('sample_data/mynewfile.txt', 'w') 

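Opening a file for writing does not raise an error when the file does not exist, because Python creates it. If the file already exists, its previous contents are erased. An error can still come from the system, for example when the hard disk is full. After we are done with the file, we have to close it with the ```close()``` method.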

In [5]:
output_file.close()
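It is easy to forget the call to `close()`. As an alternative, Python's `with` statement closes the file automatically when the indented block ends. The sketch below assumes a temporary folder so that it can run anywhere; the path is hypothetical and not part of the lecture files.

```python
import os
import tempfile

# Hypothetical location for the new file, used only in this example.
path = os.path.join(tempfile.mkdtemp(), 'mynewfile.txt')

with open(path, 'w') as output_file:
    output_file.write('Hello\n')
# The file is closed here, even though close() was never called explicitly.

print(output_file.closed)   # True
```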

### Reading Files

The ```readline``` method returns the next line of a text file as a string, including the end-of-line character, ```\n```. When the end of the file is reached, it returns an empty string:

In [7]:
input_file = open('sample_data/sample.txt', 'r')
empty_str = ''
line = input_file.readline() 
while line != empty_str:
    print(line)
    line = input_file.readline()

input_file.close()

Line one

Line two

Line three



The ```while``` loop above shows the logic behind reading a file line by line. A ```for``` loop does the same thing more elegantly:

In [9]:
input_file = open('sample_data/sample.txt', 'r')
for line in input_file:
    print(line)

input_file.close()

Line one

Line two

Line three
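The blank lines in the output appear because each line read from the file already ends with `\n`, and `print` adds a second newline. One way to avoid this is to remove the trailing newline before printing. The sketch below assumes a temporary file with the same three lines as the sample file; the path and the list `shown` are hypothetical names used only here.

```python
import os
import tempfile

# Build a small hypothetical file with the same three lines as the sample.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
output_file = open(path, 'w')
output_file.write('Line one\nLine two\nLine three\n')
output_file.close()

shown = []   # keeps what was printed, so the result can be inspected
input_file = open(path, 'r')
for line in input_file:
    clean = line.rstrip('\n')   # drop the newline that came from the file
    shown.append(clean)
    print(clean)
input_file.close()
```

Calling `print(line, end='')` is another option: it keeps the newline from the file and tells `print` not to add its own.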



### Writing Files

The ```write``` method is used to write strings to a file:

In [10]:
empty_str= ''
input_file = open('sample_data/sample.txt', 'r')
output_file = open('sample_data/newfile.txt', 'w')
line = input_file.readline()

while line != empty_str:
    output_file.write(line)
    line = input_file.readline()
    
output_file.close()

The ```write``` method does not add a newline character to the output string. A newline character is written only if it is part of the string being written. In the example above, each value of the ```line``` variable already ends with ```\n```, so the lines in the new file stay separate.
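The effect of a missing newline is easy to see in a small experiment. This is a sketch using a hypothetical temporary file; the strings written are arbitrary.

```python
import os
import tempfile

# Hypothetical file used only for this experiment.
path = os.path.join(tempfile.mkdtemp(), 'newfile.txt')

output_file = open(path, 'w')
output_file.write('first')       # no newline: the next write continues on the same line
output_file.write('second\n')
output_file.write('third\n')
output_file.close()

input_file = open(path, 'r')
contents = input_file.read()
input_file.close()

print(repr(contents))   # 'firstsecond\nthird\n'
```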

## String Processing

The information in a text file, as with all information, is most likely going to be searched, analyzed, and/or updated. Collectively, the operations performed on strings are called __string processing__.

We have learned some basic operations on strings such as 

- accessing elements: ```name[k]```
- getting the length: ```len(str)```

__String Traversal__: The characters in a string can be easily traversed, without the use of an explicit index variable, using the ```for ch in string``` form of the for statement.

In [11]:
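space = ' '
num_spaces = 0
# input_file is still open from the previous cell but is already at end-of-file,
# so readline() returns '' and the count below stays 0.
line = input_file.readline()
input_file.close()
for ch in line:
    if ch == space:
        num_spaces = num_spaces + 1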

In [12]:
num_spaces

0
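To see the traversal count something, it helps to run it on a line that is known to contain spaces. The sketch below uses a literal string in place of a line read from a file; the text is an arbitrary example.

```python
line = 'It was the best of times'
space = ' '

num_spaces = 0
for ch in line:            # ch takes the value of each character in turn
    if ch == space:
        num_spaces = num_spaces + 1

print(num_spaces)          # 5
print(line.count(space))   # 5, the built-in count method gives the same answer
```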

There are a number of methods specific to strings in addition to the general sequence operations.

### Checking the Contents of a String


In [13]:
s = 'Hello World!'

__```str.isalpha()```:__ Returns True if ```str``` contains only letters.

In [14]:
s.isalpha()  # False: the space and '!' are not letters

False

__```str.isdigit()```:__ Returns True if str contains only digits.

In [15]:
s.isdigit()

False

In [16]:
"1".isdigit()

True

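__```str.islower()```__ and __```str.isupper()```__: Return True if all the letters in str are lowercase/uppercase and str contains at least one letter. Characters that are not letters, such as spaces, digits, and punctuation, are ignored.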

In [17]:
s.islower()

False

In [18]:
s

'Hello World!'

In [19]:
s.isupper()

False

In [20]:
"HELLO WORLD".isupper()

True
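These checks treat non-letter characters differently, which can be surprising. The following sketch collects a few cases side by side; the variable names are hypothetical and exist only so the results can be printed and compared.

```python
# islower/isupper look only at the letters and ignore everything else.
lower_with_punct = 'hello world!'.islower()   # True: every letter is lowercase
mixed_case = 'Hello World!'.islower()         # False: 'H' and 'W' are uppercase
no_letters = '123!'.islower()                 # False: there is no letter to check
upper_with_digits = 'ABC 123'.isupper()       # True: every letter is uppercase

# isalpha is stricter: every character must be a letter.
with_space = 'hello world'.isalpha()          # False: the space is not a letter
letters_only = 'helloworld'.isalpha()         # True

print(lower_with_punct, mixed_case, no_letters, upper_with_digits)
print(with_space, letters_only)
```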

__```str.lower()```__ and __```str.upper()```__: Return a lowercase/uppercase copy of str.

In [21]:
s

'Hello World!'

In [22]:
s.upper()

'HELLO WORLD!'

In [23]:
s.lower()

'hello world!'

In [24]:
s  # s itself does not change; assign the result to a new variable or overwrite s

'Hello World!'

In [25]:
s = s.lower()

In [26]:
s

'hello world!'

### Searching the Contents of a String


__```str.find(w)```:__ Returns the index of the first occurrence of w in str, or -1 if w is not found.


In [27]:
s

'hello world!'

In [28]:
s.find('d')

10

In [29]:
s.find('x')

-1
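`find` also accepts an optional second argument, the index at which to start searching, which makes it possible to locate later occurrences. The sketch below shows this along with the `in` operator for a simple yes/no test; the variable names are hypothetical.

```python
s = 'hello world!'

first_o = s.find('o')                 # 4
second_o = s.find('o', first_o + 1)   # 7, the search starts just after the first 'o'
has_world = 'world' in s              # True, when only presence matters

# Collect the index of every 'l' by searching again after each match.
positions = []
k = s.find('l')
while k != -1:
    positions.append(k)
    k = s.find('l', k + 1)

print(first_o, second_o, has_world)
print(positions)                      # [2, 3, 9]
```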

### Replacing the Contents of a String

__```str.replace(w,t)```:__ Returns a copy of str with all occurrences of w replaced with t.

In [30]:
s

'hello world!'

In [31]:
s.replace("l", "*")

'he**o wor*d!'
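Two variations are often useful: limiting the number of replacements with an optional third argument, and replacing with the empty string to delete characters. The following is a small sketch; the variable names are hypothetical.

```python
s = 'hello world!'

only_first = s.replace('l', '*', 1)   # replace at most one occurrence
removed = s.replace('l', '')          # replacing with '' deletes every 'l'

print(only_first)   # he*lo world!
print(removed)      # heo word!
print(s)            # hello world!  (the original string is unchanged)
```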

### Removing the Contents of a String

__```str.strip(w)```:__ Returns a copy of str with all leading and trailing characters that appear in w removed.

In [32]:
s

'hello world!'

In [33]:
s.strip('!')

'hello world'
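Note that the argument to `strip` is treated as a set of characters, not as a word, and that only the two ends of the string are affected. Called with no argument, `strip` removes whitespace, including the `\n` at the end of a line read from a file. The sketch below illustrates these cases with arbitrary example strings.

```python
line = '  Line one\n'

no_whitespace = line.strip()          # 'Line one': spaces and newline removed from both ends
no_newline = line.rstrip('\n')        # '  Line one': only the trailing newline removed

# Any character listed in the argument is removed from either end.
both_ends = 'hello world!'.strip('!h')   # 'ello world'

# Characters in the middle are never removed.
middle_kept = 'a!b!'.strip('!')          # 'a!b'

print(repr(no_whitespace), repr(no_newline))
print(both_ends, middle_kept)
```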

### Splitting a String

__```str.split(w)```:__ Returns a list containing all strings in str delimited by w:

In [34]:
s

'hello world!'

In [35]:
s.strip('!').split(" ")

['hello', 'world']
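The choice of delimiter matters when a line contains repeated spaces or a trailing newline. With an explicit delimiter, every single occurrence splits the string, which can produce empty strings. With no argument, `split` treats any run of whitespace as one separator. The sketch below compares the two on an arbitrary example line.

```python
line = ' one  two \n'

by_space = line.split(' ')   # every single space is a delimiter
by_any = line.split()        # runs of whitespace (including '\n') act as one delimiter

print(by_space)   # ['', 'one', '', 'two', '\n']
print(by_any)     # ['one', 'two']

# The same rule applies to other delimiters.
fields = 'a,b,,c'.split(',')
print(fields)     # ['a', 'b', '', 'c']
```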

In [36]:
s[:-4]

'hello wo'

### Apply It!

<p style="color:#008080">
Write a program that removes all occurrences of the letter ‘e’ from a text file. To get a text file to work with, copy a paragraph from the internet and paste it into a file. The output should be similar to this:
</p>
        
        
        This program will display the contents of a provided text file
        with all occurrences of the letter 'e' removed
        Enter file name (including file extension): sample_data/totc_1.txt


        Th Priod
        
        It was th bst of tims, it was th worst of tims,
        it was th ag of wisdom, it was th ag of foolishnss,
        it was th poch of blif, it was th poch of incrdulity,
        it was th sason of Light, it was th sason of Darknss,
        it was th spring of hop, it was th wintr of dspair,
        w had vrything bfor us, w had nothing bfor us,
        w wr all going dirct to Havn, w wr all going dirct
        th othr way--in short, th priod was so far lik th prsnt
        priod, that som of its noisist authoritis insistd on its
        bing rcivd, for good or for vil, in th suprlativ dgr
        of comparison only.

        Thr wr a king with a larg jaw and a qun with a plain fac,
        on th thron of ngland; thr wr a king with a larg jaw and
        a qun with a fair fac, on th thron of Franc.  In both
        countris it was clarr than crystal to th lords of th Stat
        prsrvs of loavs and fishs, that things in gnral wr
        sttld for vr.

    
        379 occurrences of the letter 'e' removed
        Percentage of data lost: 6%
        Modified text in file sample_data/totc_1_e.txt


---
## Programming Exercises

__P1:__ 

    Write a Python function called reduceWhitespace that is given a line read from a text file and returns the line with all extra whitespace characters between words removed,
    
        ‘This  line  has  extra  space  characters’ ➝ ‘This line has extra space characters’
    

__P2:__

    Write a Python function named extractTemp that is given a line read from a text file and displays the one number (integer) found in the string,
    
                                ‘The high today will be 75 degrees’ ➝ 75

__P3:__ 

    Write a Python function named checkQuotes that is given a line read from a text file and returns True if each quote character in the line has a matching quote (of the same type), otherwise returns False.
    
                            ‘Today’s high temperature will be 75 degrees’ ➝ False

__P4:__

    Write a Python function named countAllLetters that is given a line read from a text file and returns a list containing every letter in the line and the number of times that each letter appears (with upper/lower case letters counted together),
    
        ‘This is a short line’ ➝ [('t', 2), ('h', 2), ('i', 3), ('s', 3), ('a', 1), ('o', 1), ('r', 1), ('l', 1), ('n', 1), ('e', 1)]

__P5:__

    Write a Python function named interleaveChars that is given two lines read from a text file, and returns a single string containing the characters of each string interleaved,
    
                                       ‘Hello’, ‘Goodbye’ ➝ 'HGeololdobye' 

__P6:__

    Write a program segment that opens and reads a text file and displays how many lines of text are in the file.

__P7:__

       Write a program segment that reads a text file named original_text, and writes every other line, starting with the first line, to a new file named half_text.

__P8:__

    Write a program segment that reads a text file named original_text, and displays how many times the letter ‘e’ occurs.

---
## Modification Problem

Modify the Sparse Text program you wrote in the ```Apply It!``` section so that, instead of the letter ‘e’ being removed, the user is prompted for the letter to remove.

---
## Development Problems


__D1: Sentence, Word, and Character Count Program__ 

    Develop and test a Python program that reads in any given text file and displays the number of lines, words, and total number of characters there are in the file, including spaces and special characters, but not the newline character, '\n'.

__D2: Variation on a Sparsity Program__

    Develop and test a program that reads the text in a given file, and produces a new file in which only the first occurrence of a vowel in each word is removed, unless the removal would leave an empty word (for example, for the word “I”). Consider how readable the results are for various sample texts.
    

__D3: Message Encryption/Decryption Program__

    Develop and test a Python program that reads messages contained in a text file and encodes them, saving the encoded messages in a new file. For encoding messages, a simple substitution key should be used. 
    
    
![Encryption](https://raw.githubusercontent.com/NAU-ACM/IntroductionToPython/master/images/encrpytion_key.png)
    
    Each letter in the left column is substituted with the corresponding letter in the right column when encoding. Thus, to decode, the letters are substituted the opposite way. Unencrypted message files will be simple text files with file extension .txt. Encrypted message files will have the same file name, but with file extension .enc. For each message encoded, a new substitution key should be randomly generated and saved in a file with the extension '.key'. Your program should also be able to decrypt messages given a specific encoded message and the corresponding key.

__D4: Universal Product Code Check Digit Verification Program__

    A check digit is a digit added to a string of digits that is derived from other digits in the string. Check digits provide a form of redundancy of information, used for determining if any of the digits in the string are incorrect or misread.
    
    The Universal Product Code on almost all purchase items utilizes a bar code to allow for the scanning of items. Below the bar code is the sequence of digits that the bar code encodes, as illustrated below.
    
![](http://2.bp.blogspot.com/_YsgOkAVMc0A/TCX7SQEV_5I/AAAAAAAACFo/ZxvLKZvdmLM/s1600/UPC-A-036000291452.png)

    The last digit of the product code (2) is a check digit computed as follows,
    
    1. Add up all digits in the odd numbered positions (first, third, fifth, etc., starting with the leftmost digit) excluding the last check digit, and multiply the result by 3:
    
                                        0 + 6 + 0 + 2 + 1 + 5 = 14
                                                14 * 3 = 42
                                                
    2. Add up all digits in the even numbered positions (second, fourth, etc.) excluding the last check digit,
                                            
                                          3 + 0 + 0 + 9 + 4 = 16
    
    3. Take the sum of the two previous results mod 10,
                                          
                                           (42 + 16) mod 10 = 8
    
    4. Subtract the result from 10 to get the check digit. If the result of step 3 is 0, the check digit is 0.
                                                
                                                10 - 8 = 2
                                                                                       
      
    Develop and test a Python program that verifies the check digit of Universal Product Codes.