# Files

<img src="./img/fileProcessing.png" width="400" height="400">

Files are a major part of computing.
They let you save data for long-term storage and load it back for use in your program.

Python uses file objects to interact with external files on your computer. A file object can represent any kind of file: an audio file, a text file, an email, an Excel document, and so on. Note: for many of these file types you will need to install additional libraries or modules, which are easy to obtain. (We will cover installing modules later in the course.)

Python has a built-in open() function that lets us open and work with basic file types. First, though, we need some content to put in a file.

In [30]:
# let's write some text using a triple-quoted string (''' ''')
poem = '''
Programming is fun
When the work is done
if you wanna make your work also fun:
    use Python!
'''

In [31]:
print(poem)


Programming is fun
When the work is done
if you wanna make your work also fun:
    use Python!



## Python - File Processing
Basic file processing is straightforward. You need to:

1. open a file in writing mode,
2. write something, and then
3. close it.

Below is a complete example of these three steps.

In [32]:
# open a file in writing mode
# Open for 'w'riting
f = open('./files/poem.txt', 'w')
f.write(poem)  # Write text to file
f.close()  # Close the file

## Opening a file
The open() function is most often called with two arguments:

1. the file location and name, and
2. the mode in which the file is opened (see below for the available modes).

<img src="./img/file-openMethods.png" width="800" height="800">

In addition, the mode can specify whether the file should be handled in binary or text mode:

<img src="./img/file-BinaryTextMode.png" width="800" height="800">
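
As a rough illustration of how the second argument to open() changes its behavior, the sketch below writes and reads a small file. The file name `demo.txt` and the temporary directory are assumptions made for this example; they are not part of the course files.

```python
import os
import tempfile

# Hypothetical scratch location so the example does not touch real files
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.txt")

# 'w' creates the file (or empties an existing one) for writing text
with open(path, "w") as f:
    f.write("abc")

# 'r' (the default) opens an existing file for reading text
with open(path, "r") as f:
    text = f.read()

# 'rb' reads the same file as raw bytes instead of a str
with open(path, "rb") as f:
    raw = f.read()

# 'x' creates a new file and fails if the file already exists
try:
    open(path, "x")
    already_exists = False
except FileExistsError:
    already_exists = True

print(repr(text), repr(raw), already_exists)
```

Text mode returns `str` objects, while binary mode returns `bytes`, which is what modules such as pickle expect.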

## Reading a file

Python file objects provide three methods for reading a file:

<img src="./img/readMethod.png" width="800" height="800">

### The read() method
By default the read() method returns the whole text, but you can also specify how many characters you want it to return:

In [34]:
# Open the poem.txt we made earlier
f = open("./files/poem.txt", 'r')
# We can now read the file
f.read()

'\nProgramming is fun\nWhen the work is done\nif you wanna make your work also fun:\n    use Python!\n'

In [35]:
# returns the first 11 characters of the file (the leading newline counts as one)
f = open("./files/poem.txt", 'r')
print(f.read(11))


Programmin


In [37]:
# But what happens if we try to read again once the whole file has been read?
f.read()

''

This happens because the reading "cursor" sits at the end of the file once everything has been read, so there is nothing left to read. We can reset the "cursor" like this:

In [38]:
# Seek to the start of file (index 0)
f.seek(0)

0

In [39]:
# Now read again
f.read()

'\nProgramming is fun\nWhen the work is done\nif you wanna make your work also fun:\n    use Python!\n'
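
The cursor behavior can be observed directly with tell(), which reports the current position. The sketch below uses `io.StringIO` as an in-memory stand-in for a text file (an assumption made so the example runs without any file on disk); the variable names are illustrative.

```python
import io

# io.StringIO behaves like a text file opened with open(), but lives in memory
f = io.StringIO("Programming is fun\n")

first = f.read(11)      # read 11 characters: "Programming"
position = f.tell()     # the cursor now sits just after those characters
rest = f.read()         # read everything that is left
at_end = f.read()       # nothing left, so this is an empty string

f.seek(0)               # move the cursor back to the start
again = f.read(11)      # the same 11 characters as before

print(repr(first), position, repr(rest), repr(at_end), repr(again))
```

For real text files the value returned by tell() should be treated as an opaque bookmark to pass back to seek(), not as a character count.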

### The readline() method
The readline() method returns one complete line, including the newline character at the end of the line.
When it returns an empty string, we have reached the end of the file, so we 'break' out of the loop.

In [41]:
# If no mode is specified,
# 'r'ead mode is assumed by default
f = open('./files/poem.txt')
while True:
    line = f.readline()
    # Zero length indicates EOF
    if len(line) == 0:
        break
    # The `line` already has a newline
    # at the end of each line
    # since it is reading from a file.
    print(line, end = '')
# close the file
f.close()


Programming is fun
When the work is done
if you wanna make your work also fun:
    use Python!


### The readlines() method
To avoid resetting the cursor every time, we can also use the readlines() method, which returns all the lines of the file at once. Use it with caution for large files, since everything will be held in memory. We will learn how to iterate over large files later in the course.

In [10]:
# Readlines returns a list of the lines in the file.
f = open('./files/poem.txt')
f.readlines()

['\n',
 'Programming is fun\n',
 'When the work is done\n',
 'if you wanna make your work also fun:\n',
 '    use Python!\n']
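
Each string returned by readlines() keeps its trailing newline. If that is unwanted, one possible approach is to strip it while looping over the file. The sketch below uses `io.StringIO` as an in-memory stand-in for a file, which is an assumption made to keep the example self-contained.

```python
import io

# In-memory stand-in for a small text file
f = io.StringIO("Programming is fun\nWhen the work is done\n")

raw_lines = f.readlines()           # every line keeps its "\n"

f.seek(0)                           # rewind before reading again
clean_lines = [line.rstrip("\n") for line in f]   # strip the newline from each line

print(raw_lines)
print(clean_lines)
```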

## The with statement
You can also work with file objects using the with statement. It provides cleaner syntax and exception handling when you are working with files, which is why it is good practice to use the with statement where applicable.

One bonus of using this method is that **any files opened will be closed automatically** after you are done. This leaves less to worry about during cleanup. 

In [45]:
# returns the first 2 lines of the file
with open("./files/poem.txt") as myfile:
    firstNlines = myfile.readlines()[0:2] #put here the interval you want
    print(firstNlines)
    

['\n', 'Programming is fun\n']


In [12]:
# returns the last 2 lines of the file
with open("./files/poem.txt") as myfile:
    lastNlines = myfile.readlines()[-2:]  # put here the interval you want
    print(lastNlines)
    

['if you wanna make your work also fun:\n', '    use Python!\n']
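
The automatic closing can be checked through the file object's `closed` attribute. The following is a minimal sketch; the scratch file in a temporary directory is an assumption made so that the example does not depend on the course folders.

```python
import os
import tempfile

# Hypothetical scratch file for the demonstration
path = os.path.join(tempfile.mkdtemp(), "poem.txt")

with open(path, "w") as f:
    f.write("Programming is fun\n")
    open_inside = f.closed      # False: the file is still open inside the block

closed_after = f.closed         # True: it was closed when the block ended

# The file is closed even if the block raises an exception
try:
    with open(path) as g:
        raise ValueError("something went wrong")
except ValueError:
    closed_after_error = g.closed

print(open_inside, closed_after, closed_after_error)
```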


## Writing to a File

By default, the open() function only allows us to read the file. To write to it, we need to pass the argument 'w', which creates the file if it does not exist and overwrites its content if it does. For example:

In [48]:
# Add a second argument to the function, 'w' which stands for write
my_file = open('./files/test.txt','w')

In [49]:
# Write to the file
my_file.write('This is a new line')
my_file.close()


In [50]:
# Read the file
my_file = open('./files/test.txt')
my_file.read()

'This is a new line'
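
Because 'w' starts from an empty file each time, adding to an existing file requires append mode ('a') instead. The sketch below contrasts the two; the scratch file name is an assumption made for this example.

```python
import os
import tempfile

# Hypothetical scratch file for the demonstration
path = os.path.join(tempfile.mkdtemp(), "test.txt")

with open(path, "w") as f:
    f.write("first line\n")

with open(path, "w") as f:      # 'w' again: the previous content is discarded
    f.write("second line\n")

with open(path, "a") as f:      # 'a': new text is added at the end
    f.write("third line\n")

with open(path) as f:
    content = f.read()

print(content)
```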

## Iterating through a File

Let's get a quick preview of a for loop by iterating over a text file. First, let's make a new text file with some Jupyter magic:

In [53]:
%%writefile ./files/test.txt
First Line
Second Line
Third Line

Overwriting ./files/test.txt


Now we can use a little bit of flow control to tell the program to loop through every line of the file and do something:

In [54]:
for line in open('./files/test.txt'):
    print(line, end = '')

First Line
Second Line
Third Line

## Iterating through a large file
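To process a large file, iterate over the file object with a for loop. This reads one line at a time instead of loading the whole file into memory. See the example below.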

In [None]:
# PSEUDO-CODE - DO NOT RUN IT
with open("log.txt") as infile:
    for line in infile:
        do_something_with(line)
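
A runnable variant of this pattern might look like the sketch below. The log file, its contents, and the `count_matching` helper are all made up for illustration; the point is only that the loop never holds more than one line in memory.

```python
import os
import tempfile

# Build a hypothetical log file so the example has something to read
path = os.path.join(tempfile.mkdtemp(), "log.txt")
with open(path, "w") as f:
    for i in range(1000):
        level = "ERROR" if i % 10 == 0 else "INFO"
        f.write(f"{level} event {i}\n")

def count_matching(path, needle):
    """Count the lines containing needle, reading one line at a time."""
    count = 0
    with open(path) as infile:
        for line in infile:
            if needle in line:
                count += 1
    return count

errors = count_matching(path, "ERROR")
print(errors)
```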

## File Handling with a While Loop

In [55]:
# file handling example - a while loop
toDoFile = open("toDoList.txt", "w")

toDoList = ""

toDoItem = input("Enter a to do list item: ")

while toDoItem != "exit":
    toDoList = toDoList + toDoItem + " \n"
    toDoItem = input("Enter a to do list item: ")
    
toDoFile.write(toDoList)
toDoFile.close()

Enter a to do list item: do homework
Enter a to do list item: mown the lawn
Enter a to do list item: watch tv
Enter a to do list item: take out garbage
Enter a to do list item: exit


In [57]:
# reading the toDoList file
toDoList = open("toDoList.txt")

for line in toDoList:
    print(line, end = '')
    
toDoList.close()

do homework 
mown the lawn 
watch tv 
take out garbage 
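
The same while loop can be tried without typing anything by feeding it items from a list. In the sketch below, the `answers` iterator stands in for input() and the scratch file replaces toDoList.txt; both are assumptions made for this example. It also writes each item as soon as it is entered instead of building one long string first.

```python
import os
import tempfile

# Stand-in for the user's keyboard input; "exit" ends the loop
answers = iter(["do homework", "mow the lawn", "exit"])

# Hypothetical scratch file for the demonstration
path = os.path.join(tempfile.mkdtemp(), "toDoList.txt")

with open(path, "w") as todo_file:
    item = next(answers)
    while item != "exit":
        todo_file.write(item + "\n")   # one item per line
        item = next(answers)

with open(path) as todo_file:
    saved = todo_file.read()

print(saved, end="")
```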


## The Pickle Module
Python provides a standard module called pickle which you can use to store any plain Python object in a file and then get it back later. This is called storing the object persistently.


In [58]:
import pickle

# The name of the file where we will store the object
shoplistfile = './data/shoplist.data'
# The list of things to buy
shoplist = ['apple', 'mango', 'carrot']

# Write to the file
f = open(shoplistfile, 'wb') # write in binary mode
pickle.dump(shoplist, f) # Dump the object to a file
f.close()

# Destroy the shoplist variable
del shoplist

# Read back from the storage
f = open(shoplistfile, 'rb')
storedlist = pickle.load(f) # Load the object from the file
print(storedlist)
f.close()

['apple', 'mango', 'carrot']


How does it work?

To store an object in a file, we have to first open the file in write binary mode and then call the dump function of the pickle module. This process is called pickling.

Next, we retrieve the object using the load function of the pickle module which returns the object. This process is called unpickling.
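
The pickling round trip can also be written with the with statement, as in this minimal sketch. The scratch file location is an assumption made for the example.

```python
import os
import pickle
import tempfile

# Hypothetical scratch file for the demonstration
path = os.path.join(tempfile.mkdtemp(), "shoplist.data")
shoplist = ["apple", "mango", "carrot"]

with open(path, "wb") as f:         # pickle needs binary mode
    pickle.dump(shoplist, f)

with open(path, "rb") as f:
    storedlist = pickle.load(f)

# Unpickling builds a new, equal object rather than returning the original one
same_value = storedlist == shoplist
same_object = storedlist is shoplist

print(storedlist, same_value, same_object)
```

Keep in mind that unpickling can execute arbitrary code, so pickle.load should only be used on files you trust.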

## JSON files
JSON (**J**ava**S**cript **O**bject **N**otation) is a text format, and Python's json module allows you to dump simple Python data structures into a file and load the data from that file the next time the program runs.

You can also use JSON to share data between different Python programs. 

Even better, the JSON data format is not specific to Python, so you can share data you store in the JSON format with people who work in many other programming languages. It’s a useful and **portable format**, and it’s easy to learn.  

The JSON file format is often used by web applications (e.g., Facebook, Twitter, Yahoo, Google, Tumblr, Wikipedia).

See example below. 

In [59]:
# this script will create a json file from a python dictionary.
import json

data = {}  
data['people'] = []  
data['people'].append({  
    'name': 'Scott',
    'website': 'yahoo.com',
    'from': 'Nebraska'
})
data['people'].append({  
    'name': 'Larry',
    'website': 'google.com',
    'from': 'Michigan'
})
data['people'].append({  
    'name': 'Tim',
    'website': 'apple.com',
    'from': 'Alabama'
})

with open('./data/data.json', 'w') as outfile:  
    json.dump(data, outfile)


In [25]:
# reading the json file just created
import json

with open('./data/data.json') as json_file:  
    data = json.load(json_file)
    for p in data['people']:
        print('Name: ' + p['name'])
        print('Website: ' + p['website'])
        print('From: ' + p['from'])
        print('')

Name: Scott
Website: yahoo.com
From: Nebraska

Name: Larry
Website: google.com
From: Michigan

Name: Tim
Website: apple.com
From: Alabama



NOTE: json.load is the important function to note here. It reads the text from the file, parses the JSON data, populates a Python dict with the data, and returns it to you.
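
The json module also offers string-based counterparts, json.dumps and json.loads, which are handy for inspecting what would be written to a file. The sketch below is a small illustration with made-up data.

```python
import json

data = {"people": [{"name": "Scott", "website": "yahoo.com", "from": "Nebraska"}]}

# dumps returns the JSON text instead of writing it to a file;
# indent=2 makes the text easier for humans to read
text = json.dumps(data, indent=2)

# loads parses JSON text back into Python objects
restored = json.loads(text)

# JSON has no tuple type, so a tuple comes back as a list
round_trip = json.loads(json.dumps({"point": (1, 2)}))

print(text)
print(restored == data, round_trip)
```

This is why JSON is described as suitable for simple data structures: only dicts, lists, strings, numbers, booleans, and None survive the round trip unchanged.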

## Unicode
So far, when we have been writing and using strings, or reading and writing to a file, we have used simple English characters only. Both English and non-English characters can be represented in Unicode (please see the articles at the end of this section for more info), and Python 3 by default stores string variables (think of all that text we wrote using single or double or triple quotes) in Unicode.

NOTE: If you are using Python 2 and want to read and write non-English text, you need to use the unicode type, whose literals start with the character u, e.g. u"hello world".

**Articles**:

["The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets"](https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/)

["Python Unicode Howto"](https://docs.python.org/3/howto/unicode.html)


When data is sent over the Internet, we need to send it as bytes, which is what your computer works with natively. The rules for translating Unicode (which is what Python uses when it stores a string) to bytes are called an encoding. A popular encoding is UTF-8. We can read and write in UTF-8 by passing the encoding keyword argument to the open function.

NOTE: UTF-8 (from Universal Character Set Transformation Format, 8-bit) is a character encoding capable of encoding every code point in Unicode. The encoding is variable-length: each code point takes between one and four 8-bit code units.
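
The translation between text and bytes can be seen without any file by calling str.encode and bytes.decode. The sample strings in this sketch are chosen only for illustration.

```python
text = "français 我"

encoded = text.encode("utf-8")      # str -> bytes
decoded = encoded.decode("utf-8")   # bytes -> str

# UTF-8 is variable-length: different characters need different byte counts
sizes = {ch: len(ch.encode("utf-8")) for ch in "aç我"}

print(len(text), len(encoded))      # 10 characters, 13 bytes
print(sizes)                        # {'a': 1, 'ç': 2, '我': 3}
print(decoded == text)              # True
```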

See example below.


In [64]:
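# Pass encoding='utf-8' when writing too, so the result
# does not depend on the platform's default encoding
with open('./files/my_file.txt', 'w', encoding='utf-8') as f:
    f.write(u"je suis française")
    f.write("\n")
    f.write(u"我是法国人")
    f.write("\n")
# No f.close() is needed: the with statement closes the file

with open("./files/my_file.txt", encoding="utf-8") as f:
    text = f.read()
print(text)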

je suis française
我是法国人



In [87]:
# other method using writelines method.
line1 = u"je suis française, "
line2 = u"Ben fransız, "
line3 = u"我是法国人, "
line4 = u"Я французский, "
line5 = u"أنا فرنسية"

# writelines does not add line separators, so all strings end up on one line
with open('./files/abc.txt', 'w', encoding='utf-8') as f:
    f.writelines([line1, line2, line3, line4, line5])
# No f.close() is needed: the with statement closes the file

with open("./files/abc.txt", encoding="utf-8") as f:
    text = f.read()
    print(text)


je suis française, Ben fransız, 我是法国人, Я французский, أنا فرنسية
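
Since writelines writes the strings exactly as given, placing each one on its own line means adding the newline yourself. The following sketch shows one way to do that; the scratch file name is an assumption made for this example.

```python
import os
import tempfile

# Hypothetical scratch file for the demonstration
path = os.path.join(tempfile.mkdtemp(), "abc.txt")

lines = ["je suis française", "Ben fransız", "我是法国人"]

with open(path, "w", encoding="utf-8") as f:
    # Append "\n" to each string so every sentence gets its own line
    f.writelines(line + "\n" for line in lines)

with open(path, encoding="utf-8") as f:
    read_back = f.read().splitlines()

print(read_back)
```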
