# Session 3: File Input/Output

## File I/O: Basic

* Data on a computer is usually stored in **files**
* From the view of the operating system, a file is just a sequence of
    bits that is given a name
* What data is stored in a file and how exactly it is stored in a file
    is defined in a **file format**
* The file format defines what the bits mean to the program that is reading/writing
    the file
* ***Note:*** The **file extension** (e.g. whether the name of a file ends
    in **.txt** or **.doc**) does not determine the file format (it is
    just part of the name) -- but it makes sense to name files according to their format

## File I/O: Writing to a Text File

* A very common and useful file format is one where the sequence of bits
    is interpreted as a sequence of characters
* This conversion is performed with respect to a character set
    (such as ASCII or UTF-8, but let's not worry about that here...)
* In Python, such **text files** can be manipulated very easily, by
    reading their contents into strings and writing strings to them
* Using the `open()` function one can obtain a reference to a
    **file** object that provides methods for reading and writing (e.g.
    `read()` and `write()`)
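
To make the role of the character set a bit more concrete, here is a small sketch (the example word is our own choice, and UTF-8 is assumed as the character set). It shows that the number of characters in a string and the number of bytes stored on disk need not be the same:

```python
text = "caf\u00e9"   # the word "café"; \u00e9 is the character "é"

# Encoding turns characters into bytes; decoding reverses it.
utf8_bytes = text.encode("utf-8")

print(len(text))          # number of characters
print(len(utf8_bytes))    # number of bytes: "é" needs two bytes in UTF-8
print(utf8_bytes.decode("utf-8") == text)
```

When a file is opened in text mode, Python performs this conversion for you behind the scenes.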

## File I/O: Text Files

### File I/O: Writing to a text file:

Opening a text file for writing

The Python built-in function `open()` takes two inputs: the filename and the mode (`'w'` for writing, `'r'` for reading, `'a'` for appending).

In [39]:
f = open('my_first_file.txt', 'w')

In [40]:
f.write('Hello world!')

12

In [41]:
f.close()

We can now read this file again:

In [42]:
f = open('my_first_file.txt', 'r')

In [43]:
line = f.readline()

In [44]:
print(line)

Hello world!


In [45]:
f.close()

Let's open our file in append mode to add something:

In [46]:
f = open('my_first_file.txt', 'a')
f.write('\nsuch a nice file that I am')
f.close()
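
The difference between the `'w'` and `'a'` modes is easy to miss, so here is a small sketch that puts them side by side. The file name `modes_demo.txt` and the scratch directory are only there for the demonstration:

```python
import os
import shutil
import tempfile

tmp_dir = tempfile.mkdtemp()                     # scratch directory for the demo
path = os.path.join(tmp_dir, "modes_demo.txt")  # hypothetical file name

f = open(path, "w")   # "w" creates the file, or empties an existing one
f.write("first\n")
f.close()

f = open(path, "w")   # opening with "w" again discards "first"
f.write("second\n")
f.close()

f = open(path, "a")   # "a" keeps the content and adds to the end
f.write("third\n")
f.close()

f = open(path, "r")
content = f.read()
f.close()
shutil.rmtree(tmp_dir)  # clean up the scratch directory

print(repr(content))
```

Only `second` and `third` survive: the second `'w'` threw away what was written before.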

Let's consider the difference between `read()` and `readline()`: `read()` returns the whole file as one string, while `readline()` returns only the next line.

In [58]:
f = open('my_first_file.txt','r')
read = f.read()
f.close()
print(read)

Hello world!
such a nice file that I am


In [59]:
f = open('my_first_file.txt','r')
readline = f.readline()
f.close()
print(readline)

Hello world!



In [54]:
f = open('my_first_file.txt','r')
for i in f:
    print(i)
f.close()

Hello world!

such a nice file that I am


It is good practice to close a file as soon as you are done with it. The **with** statement does this for you: it automatically closes the file when the indented block ends.

In [71]:
with open('e_science.txt','w') as g:
    g.write('new way of opening a file')
    g.write('\nsuch a nice way to open a file')

Opening the file to read

In [72]:
with open('e_science.txt','r') as g:
    for line in g:
        print(line)

new way of opening a file

such a nice way to open a file


We didn't need to type close() at the end, wasn't that great?
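
If you want to convince yourself that the file really is closed, you can look at the `closed` attribute of the file object. The sketch below (file name and scratch directory are made up for the demo) also shows that the file gets closed when the block is left because of an error:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "with_demo.txt")  # hypothetical file name

    with open(path, "w") as g:
        g.write("some text")
        inside = g.closed   # False: still open inside the block
    after = g.closed        # True: closed as soon as the block ends

    # The file is closed even if the block is left through an exception.
    try:
        with open(path, "r") as h:
            raise ValueError("something went wrong")
    except ValueError:
        closed_after_error = h.closed

print(inside, after, closed_after_error)
```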

`write()` can be called multiple times to write more data:

In [2]:
f = open("animals.txt", "w")

In [3]:
for animal in ["Animal\tFood","Sloth\tLeaves", "Chicken\tCorn", "Ant_eater\tAnts", "Penguin\tFish", "Armadillo\tIce_cream\n"]:
    f.write("%s\n" % animal)

In [4]:
f.close()

## File I/O: Reading from a Text File:

#### Reading the content of a text file using the `readlines()` function:

The `readlines()` function reads an entire text file into a list of strings, where each list entry corresponds to a line in the file

In [6]:
f = open("animals.txt", "r")

In [7]:
lines = f.readlines()

In [8]:
print(lines)

['Animal\tFood\n', 'Sloth\tLeaves\n', 'Chicken\tCorn\n', 'Ant_eater\tAnts\n', 'Penguin\tFish\n', 'Armadillo\tIce_cream\n', '\n']


In [9]:
len(lines)

7

Because the entire file is first read into memory, this can be slow or
  infeasible for large files.

Now print each line:

In [10]:
for l in lines:
    print(l)

Animal	Food

Sloth	Leaves

Chicken	Corn

Ant_eater	Ants

Penguin	Fish

Armadillo	Ice_cream





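The `rstrip()` method returns a copy of the string with trailing whitespace (spaces, tabs, newlines) removed from the end of the string; `strip()` does the same at both ends. If a string of characters is passed as an argument, those characters are removed instead of whitespace.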

In [15]:
string = 'e-science 20190000000'
print(string.rstrip())

e-science 20190000000


In [17]:
string0 = '    '
print(len(string0))
print(len(string0.strip()))

4
0


In [19]:
string1 = '\n'
print(string1)
print(len(string1))
print(len(string1.strip()))



1
0


In [11]:
for l in lines:
    print(l.rstrip())

Animal	Food
Sloth	Leaves
Chicken	Corn
Ant_eater	Ants
Penguin	Fish
Armadillo	Ice_cream



The `print()` function automatically appends a `\n` to its output. Without removing the already present `\n` characters with `rstrip()`, we end up with empty lines!
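
Here is a short sketch of the different ways to deal with the extra newline. The example line is made up; `repr()` is used so that the invisible characters show up in the output:

```python
line = "  Sloth\tLeaves \n"   # hypothetical line with extra spaces

no_trailing = line.rstrip()        # trailing whitespace removed (space and newline)
no_newline = line.rstrip("\n")     # only trailing newlines removed
both_ends = line.strip()           # whitespace removed from both ends
no_zeros = "20190000".rstrip("0")  # a different set of characters: trailing zeros

print(repr(no_trailing))
print(repr(no_newline))
print(repr(both_ends))
print(repr(no_zeros))

# Alternative to stripping: tell print() not to add its own newline.
print(line, end="")
```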

#### Reading the content of a text file line by line:

Because processing each line in a file is such a common operation,
  Python provides the following simple syntax

In [21]:
f = open("animals.txt", "r")

In [22]:
for line in f:
    print(line.rstrip())


Animal	Food
Sloth	Leaves
Chicken	Corn
Ant_eater	Ants
Penguin	Fish
Armadillo	Ice_cream



In [23]:
f.close()

This iterates over the file line by line instead of reading in the whole content in the beginning!
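
As a rough illustration of why this matters, the sketch below processes a file one line at a time and only keeps two running numbers around. The file `numbers.txt` is created just for this demo; a real data file could be far larger:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "numbers.txt")  # hypothetical file name
    with open(path, "w") as f:
        for n in range(1000):
            f.write("%d\n" % n)

    line_count = 0
    total = 0
    with open(path, "r") as f:
        for line in f:          # lines are read one after another, never as one big list
            line_count += 1
            total += int(line)  # int() ignores the trailing newline

print(line_count, total)
```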

#### And because Python makes your life easy, here is an even shorter version:

In [24]:
with open("animals.txt", "r") as infile:
    for line in infile:
        print(line.rstrip())

Animal	Food
Sloth	Leaves
Chicken	Corn
Ant_eater	Ants
Penguin	Fish
Armadillo	Ice_cream



Using `with` removes the necessity to call the `close()` function on your file object!

## File I/O: Transforming a File:

* When working with data provided by other programs (and/or other
    people), it is often necessary to convert data from one format to another


The file that we wrote contained columns separated by tabs; what if we
    need commas?

The `join()` method provides a flexible way to concatenate strings. It is called on a separator string and joins the elements of an iterable of strings (such as a list or tuple) into one string, placing the separator between the elements.

The `split()` method returns a list of strings obtained by breaking the given string at the specified separator; without an argument, it splits at any run of whitespace.

In [41]:
text = 'geeks for geeks'
  
# Splits at space 
print(text.split()) 


['geeks', 'for', 'geeks']
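
The next sketch shows `join()` and `split()` working together, and what changes when a separator is given explicitly. The row with a space inside a field is a made-up example:

```python
row = "Ant eater\tAnts"   # hypothetical row with a space inside a field

by_whitespace = row.split()      # any whitespace: the space splits the name too
by_tab = row.split("\t")         # tab only: the two columns stay intact
print(by_whitespace)
print(by_tab)

csv_row = ",".join(by_tab)       # the string "," is placed between the elements
print(csv_row)
print(csv_row.split(",") == by_tab)  # splitting on "," reverses the join
```

Splitting on `"\t"` is the safer choice whenever a field might itself contain spaces.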


In [42]:
with open("animals.txt", "r") as infile:
    with open("animals.csv", "w") as outfile:
        for line in infile:
            # split() breaks the line at whitespace (here: the tabs)
            outfile.write(",".join(line.split()))
            outfile.write('\n')

Let's check that everything worked...

In [43]:
with open("animals.csv", "r") as infile:
    for line in infile:
        print(line.rstrip())

Animal,Food
Sloth,Leaves
Chicken,Corn
Ant_eater,Ants
Penguin,Fish
Armadillo,Ice_cream



Looking good!

## File I/O: Pickling:

* Text files are convenient when data needs to be exchanged with other
    programs
* However, getting the data in/out of text files can be tedious
* If we know we only need the data within Python, there is a very easy
    way to write arbitrary Python data structures to compact binary files
* This is generally referred to as **serialization**, but in
    Python-lingo it's called **pickling**
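* The **pickle** module provides two functions, `dump()` and `load()`, that
    allow writing and reading arbitrary Python objects (the separate, faster
    **cPickle** module existed only in Python 2; in Python 3, `pickle` uses
    its fast C implementation automatically when it is available)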

In [46]:
from pickle import dump, load

In [47]:
l = ["a", "list", "with", "stuff", [42, 23, 3.14], True]

In [48]:
with open("my_list.pkl", "wb") as f:
    dump(l, f)

In [57]:
with open("my_list.pkl", "rb") as f:
    l = load(f)
l

['a', 'list', 'with', 'stuff', [42, 23, 3.14], True]
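
If no file is needed, the `pickle` module also offers `dumps()` and `loads()`, which convert to and from a `bytes` object in memory. The following sketch uses a made-up dictionary to show that nested structures and their types survive the round trip:

```python
import pickle

# Hypothetical record, mixing several built-in types.
record = {"name": "Sloth", "foods": ["Leaves"], "weight_kg": 4.5, "tags": ("slow", "cute")}

data = pickle.dumps(record)      # bytes, not human-readable text
restored = pickle.loads(data)

print(type(data))
print(restored == record)
print(type(restored["tags"]))    # the tuple comes back as a tuple
```

One word of caution from the Python documentation: only unpickle data that you trust, because loading a pickle can execute arbitrary code.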

## File I/O: Checking for Existence:

* Sometimes a program needs to check whether a file exists
* The `os.path` module provides the `exists()` function

The `os` module in Python provides a way of using operating-system-dependent functionality. The functions that the `os` module provides allow you to interface with the underlying operating system that Python is running on, be that Windows, Mac or Linux.

In [50]:
from os.path import exists

In [51]:
if exists("lockfile"):
    print("Lockfile exists!")
else:
    print("No lockfile found!")

No lockfile found!


In general, the `os` and `os.path` modules provide functions for manipulating the file system. Don't try to reinvent the wheel - most things exist already in the Python standard library!
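
To give a flavour of what else is available, here is a small sketch that uses a few more functions from `os` and `os.path`. It works in a temporary scratch directory so that nothing in your working directory is touched:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "lockfile")   # builds a path with the right separator

    before = os.path.exists(path)
    with open(path, "w") as f:
        f.write("locked")
    created = os.path.exists(path)

    size = os.path.getsize(path)      # size in bytes
    name = os.path.basename(path)     # last component of the path
    is_file = os.path.isfile(path)    # True for files, False for directories

    os.remove(path)                   # delete the file again
    after = os.path.exists(path)

print(before, created, after)
print(name, size, is_file)
```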

## File I/O: Reading from the Web:

* In Python, there are several other objects that behave much like
    files
* One particularly useful one provides file-like access to resources on
    the web: the `urlopen()` function in the `urllib.request` module
    (called `urllib2` in Python 2)

A **URL** is the address of a resource on the Internet.

In [53]:
from urllib.request import urlopen

In [29]:
URL = "http://www.gutenberg.org/cache/epub/28885/pg28885.txt"

In [58]:
if not exists("alice.txt"):
    f = urlopen(URL)
    with open("alice.txt", "wb") as outfile:
        outfile.write(f.read())
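
The download above needs a network connection. To try out the file-like behaviour of `urlopen()` offline, one option is to point it at a `file://` URL instead. The sketch below does that with a small local file that stands in for a web resource (the file name and its content are made up):

```python
import os
import pathlib
import tempfile
from urllib.request import urlopen

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "local_page.txt")  # hypothetical stand-in for a web resource
    with open(path, "w", encoding="utf-8") as f:
        f.write("first line\nsecond line\n")

    url = pathlib.Path(path).as_uri()   # turns the path into a file:// URL
    with urlopen(url) as response:
        raw = response.read()           # bytes, just like a file opened with "rb"

text = raw.decode("utf-8")              # assuming the resource is UTF-8 encoded
print(type(raw))
print(text.splitlines())
```

Note that `read()` hands back `bytes`, which is why the downloaded book was written to disk in `"wb"` mode.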

In [63]:
print(''.join(open("alice.txt").readlines()[970:975]))

middle of one! There ought to be a book written about me, that there
ought! And when I grow up, I'll write one--but I'm grown up now," she
added in a sorrowful tone; "at least there's no room to grow up any more
_here_."




In [65]:
with open("alice.txt", "rb") as infile:
    book = infile.readlines()
    print("".join(book[1000:1005]))

TypeError: sequence item 0: expected str instance, bytes found

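A **TypeError** is raised because the file was opened in binary mode (`"rb"`), so `readlines()` returns `bytes` objects instead of strings. `"".join()` only accepts strings; any non-string value in the iterable, including a `bytes` object, triggers this error.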
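
There are two simple ways around this, sketched below with two made-up lines in place of the book. Both assume that the bytes are UTF-8 encoded text:

```python
# What readlines() returns for a file opened in "rb" mode (made-up content).
byte_lines = [b"Alice was beginning\n", b"to get very tired\n"]

try:
    "".join(byte_lines)
except TypeError as err:
    print("TypeError:", err)

# Option 1: decode each bytes object before joining.
text = "".join(line.decode("utf-8") for line in byte_lines)

# Option 2: join the bytes with a bytes separator, then decode once.
same_text = b"".join(byte_lines).decode("utf-8")

print(text == same_text)
print(text)
```

Of course, opening the file in text mode (`"r"`) in the first place avoids the problem altogether.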

## File I/O: Multiple Files:

The `glob` module provides an easy way to find all files with certain names (e.g. all files with names that end in `.txt`)

In [66]:
import glob

In [67]:
text_files = glob.glob("*.txt")

In [68]:
for t in text_files:
    print(t)

alice.txt
animals.txt
e_science.txt
library_data.txt
my_first_file.txt
response_time.txt
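
The pattern does not have to refer to the current directory. The sketch below creates a few made-up files in a scratch directory and then looks for the `.txt` files among them. Since `glob` makes no promise about the order of its results, they are sorted here:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ["a.txt", "b.txt", "c.csv"]:      # hypothetical file names
        with open(os.path.join(tmp_dir, name), "w") as f:
            f.write(name)

    pattern = os.path.join(tmp_dir, "*.txt")       # directory + wildcard pattern
    matches = sorted(glob.glob(pattern))           # full paths of the matching files
    names = [os.path.basename(m) for m in matches]

print(names)
```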


## File I/O: Terminal streams:

* The terminal input/output streams can also be accessed like files, using the `stdin` and `stdout` objects from the `sys` module

In [69]:
import sys

In [70]:
sys.stdout.write("Another way to print!\n")

Another way to print!


So, what is the difference between sys.stdout.write and print? 

**"print"** first converts each object to a string (if it is not already a string). It also puts a space between multiple arguments and a newline character at the end.

When using `sys.stdout.write()`, you need to convert the object to a string yourself (by calling "str", for example) and no newline character is added.

In [71]:
print(23)

23


In [75]:
sys.stdout.write(str(23) + '\n')

23
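
The separator and the line ending that `print()` uses can be changed with its `sep` and `end` arguments, and its `file` argument accepts any object with a `write()` method. A short sketch, using an in-memory `io.StringIO` buffer as a stand-in for a file so that we can inspect exactly what was written:

```python
import io
import sys

print("a", "b", 23)                       # arguments are converted and joined with " "
print("a", "b", 23, sep="-", end="!\n")   # custom separator and line ending

# print() can write to any file-like object, not just the terminal.
buffer = io.StringIO()
print("x", 1, file=buffer)
captured = buffer.getvalue()
print(repr(captured))

# The same line written by hand:
sys.stdout.write("x " + str(1) + "\n")
```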
