# Lecture 13

###   Flag-Controlled Loops; File Objects; More on Reading Files; File Validation; Thing Explaining

# 1. Flag-Controlled Loops

Recall that a *flag* is a fancy name for a `True`-`False` (or "yes-no") variable.  Sometimes, the easiest way to control a loop is using a flag, something like:

`while keep_going == True:`

where `keep_going` is a `bool` variable.  This variable will start out as `True`, but perhaps somewhere along the way something will happen that signals that the program should stop repeating -- at that moment, `keep_going` will be reset to be `False`.  

(BTW: instead of writing `while keep_going == True:`, you could just write

`while keep_going:`

because `keep_going == True` is just going to be evaluated, producing the same value as `keep_going` itself.)
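Here is a minimal sketch of the pattern, using made-up data (the list `numbers` and the "stop at a negative number" rule are assumptions for illustration only). The flag starts out `True` and is lowered at the moment the loop should stop repeating.

```python
# Hypothetical data: add numbers until a negative one shows up.
numbers = [4, 8, 15, -1, 16, 23]

total = 0
index = 0
keep_going = True  # the flag: "should the loop run again?"

while keep_going:
    if index == len(numbers) or numbers[index] < 0:
        # Nothing left to add, or we hit the stop signal: lower the flag.
        keep_going = False
    else:
        total += numbers[index]
        index += 1

print(total)  # 4 + 8 + 15 = 27
```

Writing `while keep_going == True:` on the loop line would behave identically.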

<br><br><br><br><br><br><br><br><br><br>


Here's an example: a famous guessing game.


In [None]:
# EXAMPLE 1a: Guess the number
# With a halving strategy, this takes about log base 2 of n guesses (here n = 100)

import random

# Choose the random number
secret_number = random.randrange(1,101)

# This holds the answer to the question "Should we keep guessing?"
# In the beginning, we certainly should
keep_guessing = True

print("I'm thinking of a number between 1 and 100! Can you guess it?")

while keep_guessing:
    guess = int(input("Enter a guess: "))
    if guess > secret_number:
        print("Too high!")
    elif guess < secret_number:
        print("Too low!")
    else:
        # Must be equal! Now is the time for the program to stop.
        print("You got it!")
        keep_guessing = False

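One way to read the "log base 2 of n" remark: a player who always guesses the middle of the remaining range throws away half of the possibilities with each guess. The sketch below is not part of the game itself; `count_guesses` is a hypothetical helper that plays that strategy automatically (no `input()`), so we can check the worst case for every secret number from 1 to 100.

```python
import math

def count_guesses(secret_number, low=1, high=100):
    """Return how many midpoint guesses it takes to find secret_number."""
    guesses = 0
    keep_guessing = True
    while keep_guessing:
        guess = (low + high) // 2
        guesses += 1
        if guess > secret_number:
            high = guess - 1   # "Too high!" -- discard the upper half
        elif guess < secret_number:
            low = guess + 1    # "Too low!" -- discard the lower half
        else:
            keep_guessing = False
    return guesses

worst_case = max(count_guesses(n) for n in range(1, 101))
print(worst_case)                   # 7
print(math.ceil(math.log2(100)))    # 7
```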

<br><br><br><br><br><br><br><br><br><br>

# 2. File Objects

Entering data and storing program output by hand can be tedious, especially if there is a lot of it.  Fortunately, Python provides a way to read from and write to files directly. I'm talking about the files you know and love: Word files, Excel files, PDFs, MP3s, JPEGs, web documents, etc. Most of the file types I just mentioned are a little difficult to read directly, since in addition to whatever text you see, there is also formatting data you don't see.  That's why we'll mostly use plain text files, which end with the extension .txt: for those files, what you see is mostly the same as the actual raw content of the file.  (You can open and edit these with Notepad on Windows and with TextEdit on Macs -- in TextEdit, you may need to go to the Format menu and select "Make Plain Text".)

Let's start with writing.


In [None]:
BASICS OF FILE OBJECTS SYNTAX (WRITING)

OPEN A FILE FOR WRITING:

<fileobj var> = open("<actual file name>", "w")

CLOSE A FILE:

<fileobj var>.close()

WRITE A STRING INTO A (WRITING) FILE:
    
<fileobj var>.write(<SINGLE string var>)    

A file object is a variable which is associated with a file.  When you `open()` a file in writing mode (`"w"`), you erase whatever it previously contained, so be careful! (If no file exists with the given name, one will be created.) Also, you need the full name of the file you want to write to, extension included -- and, at least for now, you should probably put your file in the same folder as your Python script.

When you `.write()` a string to a file object, that string is added to the end of whatever has been written so far.  You can write several strings to a file; they will be placed in the file one after another, with no spaces or line breaks in between unless you write them yourself (a line break is `"\n"`).
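To see the "one after another" behavior for yourself, here is a small sketch. The file name `demo.txt` and the throwaway folder are assumptions made so that the sketch cannot overwrite a real file; the file is read back at the end (with `.read()`, covered a little further on) only to check what landed in it.

```python
import os
import tempfile

# Work in a throwaway folder so no real file gets overwritten.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "demo.txt")  # hypothetical file name

out_file = open(path, "w")
out_file.write("abc")
out_file.write("def")    # lands right after "abc": no space, no new line
out_file.write("\n")     # a line break only appears if you write one
out_file.write("ghi\n")
out_file.close()

# Read the file back to check its contents.
in_file = open(path, "r")
contents = in_file.read()
in_file.close()

print(repr(contents))  # 'abcdef\nghi\n'
```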


<br><br><br><br><br><br><br><br><br><br>

Let's write 

`1 Alice`

`2 Bob`

`3 Charlie`

...

`10 Jake`

into a file named `newfile.txt`.


In [10]:
# EXAMPLE 2a: Basics of files

names = ["Alice", "Bob", "Charlie", "David", "Eve", "Frank", "George", "Howard", "Irene", "Jake"]

# Create a file object, associated to newfile.txt, opened in writing mode.
word_file = open("newfile.txt", "w")

# Let's write to it.

for i in range(len(names)):
    # format() builds a single string out of the pieces it is given.
    # (The for loop advances i on its own, so there is no need to add 1 to it.)
    word_file.write("{0} {1}\n".format(i + 1, names[i]))
    
    
# Be sure to close any files you open!!!!!!
word_file.close()


<br><br><br><br><br><br><br><br><br><br>

You can also read from files using file objects.  

In [None]:
BASICS OF FILE OBJECTS SYNTAX (READING)

OPEN A FILE FOR READING:

<fileobj var> = open("<actual file name>", "r")

CLOSE A FILE:

<fileobj var>.close()

READ n CHARACTERS FROM A FILE:
    
<string var> = <fileobj>.read(<n>)

READ ALL CHARACTERS FROM A FILE:
    
<string var> = <fileobj>.read()

File objects opened in reading mode (`"r"`) are actually rather more complicated. First, they should always be associated with a file that actually already exists.


<br><br><br><br><br><br><br><br><br><br>

A file object opened in reading mode should be thought of kind of like a pitcher.  In the beginning, the file object is filled with the entire contents of the file, in order.  Then, when you call functions that `.read()` from the file object, this "pours out" the contents (usually into some variable), starting from the beginning of the file.  Then, subsequent `.read()`s will continue with the first character that hasn't already been poured out.



In [1]:
# EXAMPLE 2b: Basics of .read()

# Create a file object, associated to evans_file.txt, opened in reading mode.
my_file = open("evans_file.txt", "r")

# Let's read 6 characters, then skip 3 characters, then read 2 more.
string_var = my_file.read(6)
print(string_var)
my_file.read(3) # This pours out 3 characters -- but doesn't print or store them. So it basically skips them.

print(my_file.read(2))
# Notice that the newline counts as a character! (One of the 3 skipped characters was a newline.)

print("Here comes the entire rest of the file: ")
print(my_file.read()) # This pours everything left out of the file all at once.

my_file.close()

abc de
hi
Here comes the entire rest of the file: 
 jkl
mno pqr
stu vwx
yz

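If you don't have `evans_file.txt` handy, the sketch below recreates the experiment from scratch. The file contents are an assumption, reconstructed from the output above, and the file is placed in a throwaway folder. The last read shows what happens once the pitcher is empty: you get the empty string.

```python
import os
import tempfile

# Assumed contents, reconstructed from the output shown above.
assumed_text = "abc def\nghi jkl\nmno pqr\nstu vwx\nyz\n"

path = os.path.join(tempfile.mkdtemp(), "evans_file.txt")
out_file = open(path, "w")
out_file.write(assumed_text)
out_file.close()

my_file = open(path, "r")
first = my_file.read(6)      # "abc de"
skipped = my_file.read(3)    # "f", a newline, and "g"
second = my_file.read(2)     # "hi"
rest = my_file.read()        # everything that is left
after_end = my_file.read()   # nothing left to pour: ""
my_file.close()

print(repr(first), repr(skipped), repr(second))
print(repr(rest))
print(repr(after_end))
```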


<br><br><br><br><br><br><br><br><br><br>

# 3. More on Reading Files

Reading a file can be a bit tricky, because it is frequently necessary for the programmer to take apart the contents.  The three most common ways to read a file are

* word-by-word
* line-by-line
* and character-by-character.

For example, suppose you want to do a word count of a file.  How can you accomplish that?

In [14]:
# EXAMPLE 3a: Word Count 

file = open("report.txt", "r")

print("The number of words in the file is: ")

# This creates a string containing the whole file
whole_file = file.read()
# This turns it into a list of individual words
word_list = whole_file.split()
# And so this prints the word count:
print(len(word_list))
file.close()

# OR: you could combine all of that into one line.
file = open("report.txt", "r")
print( len(file.read().split()) )
file.close()

The number of words in the file is: 
10
10

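Why does `.split()` work for this? Called with no arguments, it cuts the string at every run of whitespace, whether that is a space, several spaces, a tab, or a newline. A quick sketch, with a made-up string standing in for the result of `file.read()`:

```python
# Hypothetical stand-in for the string returned by file.read()
whole_file = "The quick  brown fox\njumps over\tthe lazy dog\n"

word_list = whole_file.split()
print(word_list)
print(len(word_list))  # 9
```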


<br><br><br><br><br><br><br><br><br><br>

So that's how you can read a file word-by-word.  But what if you want to read a file character-by-character, or line-by-line?  


In [None]:
READ A SINGLE LINE FROM A FILE:
    
<string> = <fileobj>.readline()
    
READ EVERY LINE FROM A FILE:
    
for <line> in <fileobj>:
    <process line>
    
READ EVERY CHARACTER FROM A FILE:
    
while True:
    <string> = <fileobj>.read(1)
    if not <string>:
        break
    <use string>

The first of the above reads the next unread line of `<fileobj>` in its entirety, newline character included, and stores it in `<string>`.  Subsequent reads from the file continue on the next line.  The second pulls every remaining line out of `<fileobj>` one at a time, storing each into `<line>`, until the end of the file is reached.  The third, which reads one character at a time, is discussed further below.
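A small sketch of the first two patterns working together, on a made-up three-line file (the name `lines.txt` and its contents are assumptions for illustration). Notice that the `for` loop picks up where `.readline()` left off.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")  # hypothetical file
out_file = open(path, "w")
out_file.write("first line\nsecond line\nthird line\n")
out_file.close()

in_file = open(path, "r")
heading = in_file.readline()        # the whole first line, "\n" included
remaining = []
for line in in_file:                # continues with the second line
    remaining.append(line.strip())  # strip() removes the trailing newline
in_file.close()

print(repr(heading))   # 'first line\n'
print(remaining)       # ['second line', 'third line']
```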


<br><br><br><br><br><br><br><br><br><br>

For example, look at the file `table.txt`.  It contains a heading line, followed by 9 lines, each containing several entries, separated by spaces: a name, a vertical line, and then some scores.  What if we wanted to add up all the scores, keeping a running total as we go from line to line? 

If you swallow the whole file as one string, it is hard to identify where each line begins or ends. Instead, it is easier to read in just one line, find what you need from that line, and then continue on to the next line.  And indeed you can do that!

In [36]:
# EXAMPLE 3b: Sum The Scores 

file = open("table.txt", "r")

# First, clear out the heading line.  Notice how we don't write the line to a variable.
file.readline()
runsum = 0

# Now, read each subsequent line, and process it
for line in file:
    values = line.split()
    numbers = values[2:]
    for i in range(len(numbers)):
        runsum += int(numbers[i])
    print("{0:} : {1:^4}".format(values[0],runsum))
file.close()

Alice : 248 
Bob : 396 
Carol : 753 
David : 965 
Edward : 1343
Frances : 1422
George : 1618
Harold : 1831
Iris : 2038

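The program above prints a *running* total: each number includes the scores from all the earlier lines. If you also want each person's own total, reset a second accumulator on every line. The sketch below shows both side by side; the two lines of data are made up (they are not the contents of `table.txt`), and a plain list stands in for the file so the sketch runs anywhere.

```python
# Hypothetical stand-in for the lines of a table file (after the heading line)
lines = [
    "Ann | 10 20 30\n",
    "Ben | 5 15\n",
]

running_total = 0
line_totals = []
for line in lines:
    values = line.split()      # e.g. ["Ann", "|", "10", "20", "30"]
    line_total = 0             # starts over on every line
    for entry in values[2:]:   # skip the name and the vertical line
        line_total += int(entry)
    running_total += line_total
    line_totals.append(line_total)
    print("{0} : {1} (running total: {2})".format(values[0], line_total, running_total))
```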


<br><br><br><br><br><br><br><br><br><br>

Other times, reading line-by-line or word-by-word is too coarse.  In those cases, you can read character-by-character:

In [None]:
READ EVERY CHARACTER FROM A FILE:
    
while True:
    <string> = <fileobj>.read(1)
    if not <string>:
        break
    <use string>


Some words on what's going on in that loop. First, `while True:` means an infinite loop -- of course, it won't actually run forever, because eventually we will hit a `break`. 

Each pass through attempts to pull one character out of the file, and stores it into `<string>`.  However, eventually we will hit the end of the file, where there is nothing left to read and `<fileobj>.read(1)` hands back the empty string `""`.  That's what `if not <string>` is for -- the `break` will execute if and only if the read came back empty.

(Technically, what's going on is an *implicit type conversion*.  When Python sees `if not <string>`, it needs a `bool` value.  So it converts `<string>` to a `bool` -- a non-empty string produces the value `True`, and the empty string produces the value `False`.)
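You can check the conversion directly with `bool()`. In the sketch below, `io.StringIO` stands in for an opened file (an assumption made so that no file on disk is needed); it supports `.read()` in the same way.

```python
import io

print(bool("h"))   # True: a non-empty string
print(bool(""))    # False: the empty string

# io.StringIO behaves like a text file opened for reading.
fake_file = io.StringIO("hi")

chars = []
while True:
    char = fake_file.read(1)
    if not char:       # char is "" once the end is reached
        break
    chars.append(char)

print(chars)                     # ['h', 'i']
print(repr(fake_file.read(1)))   # '' -- still nothing left to read
```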

For example, consider the following program, which reads a file letter by letter until it either hits the letter `x` or reaches the end of the file.

In [None]:
# EXAMPLE 3c: Read until "x"

filename = input("Enter file name: ")

file = open(filename, "r")

while True:
    char = file.read(1) # reads a single character
    if not char:
        # In other words, if the read came back empty -- which means we're at the end!
        print("No x's!")
        break
    if char == "x":
        # This is the stop character -- if we see this, the program should stop
        print("Found an x!")
        break

file.close()


<br><br><br><br><br><br><br><br><br><br>

# 4. File Validation

While we're talking about this: how do you make your program behave gracefully when it tries to read a file it can't find?  

We've seen *exceptions* before: when we have code that is susceptible to errors, we encase it in a `try` block. If an exception (i.e., an error) is encountered, the program abandons the rest of the `try` block and executes the `except` block instead.

In [40]:
# EXAMPLE 4a: Files and exceptions
filename = input("Enter file name: ")

# Remember exceptions?  The file opening line is prone to errors, because files may not be found.  So we try it,
# and provide an alternative behavior if an error does indeed occur.
try:
    file = open(filename, "r")
    print(file.read())
    file.close()
except FileNotFoundError:
    print("No file by that name here! I'm not going to ask again because I'm lazy.")
    
print("All good!")

Enter file name: evansfile
No file by that name here! I'm not going to ask again because I'm lazy.
All good!

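A less lazy program would keep asking until it gets a file it can open, which is a natural job for a flag-controlled loop. Here is one possible sketch. So that it runs without anyone typing, the names come from a list rather than from `input()`; the helper `first_readable` and the file names are assumptions for illustration, and the files live in a throwaway folder.

```python
import os
import tempfile

def first_readable(candidate_names):
    """Return (name, contents) for the first name that opens, or None."""
    result = None
    position = 0
    keep_asking = True
    while keep_asking:
        if position == len(candidate_names):
            keep_asking = False          # ran out of names to try
        else:
            name = candidate_names[position]
            position += 1
            try:
                file = open(name, "r")
                result = (name, file.read())
                file.close()
                keep_asking = False      # success: lower the flag
            except FileNotFoundError:
                print("No file by that name here! Trying the next one.")
    return result

# Set up one file that exists and one name that does not.
folder = tempfile.mkdtemp()
good_name = os.path.join(folder, "evans_file.txt")
missing_name = os.path.join(folder, "evansfile")

out_file = open(good_name, "w")
out_file.write("abc def\n")
out_file.close()

found = first_readable([missing_name, good_name])
print(found[1])
```

In an interactive version, the line that picks `name` from the list would be replaced by a call to `input()`.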


<br><br><br><br><br><br><br><br><br><br>

# 5. Thing Explaining



it is hard to explain a thing when you only use the ten hundred words that people use the most

i am trying to do it right now

but i am not sure if i am doing it right

i would like a computer to help me check if i am doing it right




In [None]:
PSEUDOCODE:

# First, it would be a good idea to read the dictionary file's words into a list
dictionary = open("smallwords.txt", "r")
dic_list = dictionary.read().split()

text = # I'll put a long string variable here
text_list = text.split()


for each word in text_list:
    figure out if that word is in the dic_list
    if it is not:
        print out the word
        break

if every word in the text is in the dictionary:
    print "every word is good"


<br><br><br><br><br><br><br><br><br><br>

Ok, time for some real code.

In [46]:
# EXAMPLE 5a: Ten Hundred Most Common Words

# First, it would be a good idea to read the dictionary file's words into a list
my_dictionary = open("smallwords.txt", "r")
dic_list = my_dictionary.read().split()
my_dictionary.close()

# The text!
text = """it is hard to explain a thing when you only use the ten hundred words that people use the most
i am trying to do it right now
but i am not sure if i am doing it right
i would like a computer to help me check if i am doing it right"""
text_list = text.split()

# This variable will contain the answer to the question: 
# "out of the words in the text so far, is every one in the dictionary?"
all_good = True

# Go through every word in the text
for text_word in text_list:
    found_yet = False # Have we found text_word in the dictionary yet?
    
    # Go through each word in dic_list, and compare it to text_word.
    # If they are the same, set found_yet = True and stop searching through dic_list.
    for dic_word in dic_list:
        if text_word == dic_word:
            found_yet = True
            break
            
    if not found_yet:
        # If you never found text_word in the dictionary, that's a problem! So:
        all_good = False
        print("\"" + text_word + "\" is not in the dictionary!!")
        
        
# Finally, if all_good is still True, that means every word in our text is in the dictionary.
if all_good:
    print("Every word was in the dictionary.")

"words" is not in the dictionary!!
"i" is not in the dictionary!!
"trying" is not in the dictionary!!
"i" is not in the dictionary!!
"i" is not in the dictionary!!
"doing" is not in the dictionary!!
"i" is not in the dictionary!!
"i" is not in the dictionary!!
"doing" is not in the dictionary!!

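As an aside, Python's `in` operator can do the inner search for you, so the `found_yet` flag and the inner loop can be replaced by a single test. The sketch below is one possible variation, not the program above: the tiny word list is a made-up stand-in for `smallwords.txt`, and lowercasing each word and stripping punctuation are extra assumptions about how you might want to clean up the text.

```python
# Hypothetical stand-in for the words read from smallwords.txt
dic_list = ["a", "am", "i", "it", "is", "hard", "to", "explain", "thing"]
text = "It is hard to explain a THING, I am sure"

allowed = set(dic_list)   # a set makes each "in" test fast
bad_words = []

for text_word in text.split():
    # Assumption: treat "THING," the same as "thing"
    cleaned = text_word.lower().strip(".,!?")
    if cleaned not in allowed:
        bad_words.append(cleaned)

if bad_words:
    print("Not in the dictionary:", bad_words)
else:
    print("Every word was in the dictionary.")
```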