# Exercise: read a text file

The first step in working with any kind of data is usually reading it from whatever media it resides on into the memory of your computer so you can do something with it.

At the lowest level, the data is just a stream of binary values: zeroes and ones.

Most computer languages provide facilities for reading data from files and transferring it to memory.

Usually the options are to treat the data as binary, with no structure assumed, or to treat it as a stream of binary-encoded characters (text).

We will experiment with reading a file as text.  

In [None]:
import numpy as np
import scipy as sc
import pandas as pd
import matplotlib as mp
%matplotlib inline

## Open a file in read mode

Use the function:

f=open('some_file_name','r')

The file you want to open is called:

text.txt

The file is in the parent directory of this notebook.
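
As a minimal sketch of how a relative path reaches a file in a parent directory, the example below builds a throwaway directory tree and opens a file through `..`. The names `notebooks` and `sample.txt` are hypothetical stand-ins chosen for illustration, not the exercise's actual layout.

```python
import os
import tempfile

# Build a throwaway directory tree so the example runs anywhere.
# 'notebooks' and 'sample.txt' are hypothetical names.
with tempfile.TemporaryDirectory() as parent:
    child = os.path.join(parent, "notebooks")
    os.mkdir(child)
    with open(os.path.join(parent, "sample.txt"), "w") as out:
        out.write("hello\n")

    # Seen from the child directory, '..' refers to the parent directory.
    relative_path = os.path.join(child, "..", "sample.txt")
    f = open(relative_path, "r")
    opened_mode = f.mode            # the mode the file was opened with
    closed_before = f.closed        # False while the file is open
    f.close()
    closed_after = f.closed         # True once close() has been called
```

The file object remembers its mode and whether it has been closed, which can be handy when checking the state of `f` between cells.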

In [None]:
#

## Read the contents of the file

If f is an open file object, the function

f.read()

will return the entire contents of the file as a single string, because the file was opened in text mode.
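
To illustrate the difference between the text and binary options mentioned earlier, here is a small sketch that reads the same file both ways. The file name `sample.txt` and its contents are made up for the example.

```python
import os
import tempfile

# Write a small sample file (hypothetical name and content).
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.txt")
    with open(sample_path, "w", encoding="utf-8") as out:
        out.write("abc\n")

    # Text mode ('r'): read() decodes the bytes and returns a str.
    f = open(sample_path, "r", encoding="utf-8")
    as_text = f.read()
    f.close()

    # Binary mode ('rb'): read() returns the raw bytes with no decoding.
    f = open(sample_path, "rb")
    as_bytes = f.read()
    f.close()

print(type(as_text), type(as_bytes))
```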

Use this function to read the file.

In [None]:
#

## Assign the result to a name

Since we did not assign the result of the function to anything, the notebook displays it as the output of the cell.

Usually this is not what we want, and for a large file it can cause the notebook to hang for quite a while or even crash.

Try this again, but assign the result to the name 'tx'.

In [None]:
#

## Show the type of the result

Use the type() function to print the type of the result.

In [None]:
#

## Determine the length of the result

Use the len() function to determine the length of the result.

In [None]:
#

## Why the length is zero

The open() function performs a number of housekeeping tasks, one of which is to set a pointer to the beginning of the data.  

Each subsequent call to read() advances this pointer.  

If the pointer is already at the end of the data, read() returns an empty string. That is what happened here: the first read() consumed the whole file, so the second one had nothing left to return.

One fix is to reopen the file, but first we should use the close() function to close it.

Closing the file releases the resources held by the file object, which might otherwise cause problems.
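
The pointer behaviour described above can be sketched with a small temporary file. The file name `sample.txt` and its contents are hypothetical; only open(), read(), and close() are used.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.txt")
    with open(sample_path, "w") as out:
        out.write("some text\n")

    f = open(sample_path, "r")
    first = f.read()     # reads everything; the pointer is now at the end
    second = f.read()    # nothing left to read, so this is an empty string
    f.close()            # release the file before reopening it

    f = open(sample_path, "r")   # reopening puts the pointer back at the start
    third = f.read()
    f.close()

print(len(first), len(second), len(third))   # prints: 10 0 10
```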

In [None]:
#

## Now reopen the file

In [None]:
#

## And reread the contents

This time, assign the result to the name 'tx' right away.

In [None]:
#

## Display the type and length of the result

In [None]:
#

## Display the first 25 characters in the file

Don't use print. Just reference the elements as tx[start:end], with appropriate integer values for start and end.

In [None]:
#

## Now display the first 25 characters using print

Why are the results different?

In [None]:
#

## Line by line processing

A giant string containing the entire file is often not the most convenient form for processing.

Often text data logically consists of a series of lines.

In this case, processing the file line by line makes more sense than treating it as one long string of characters.

Like most sequence objects in Python, the string returned by read() can be iterated over.

Write a for loop to iterate over the result of the read() function and print each element.

The general form of a for loop that does this is:

for name1 in name2:
    print(name1)

In [None]:
#

## Reading lines

OK, it looks like the problem is that our result is a string, which Python treats as a sequence of characters, not lines.

We can produce a result structured as lines by using readlines() instead of read().
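
As a rough illustration of the difference, the sketch below reads a small temporary file once with read() and once with readlines(). The file name `sample.txt` and its two lines are invented for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.txt")
    with open(sample_path, "w") as out:
        out.write("first line\nsecond line\n")

    f = open(sample_path, "r")
    whole = f.read()          # one str holding the entire file
    f.close()

    f = open(sample_path, "r")
    lines = f.readlines()     # a list with one str per line
    f.close()

chars = list(whole)[:5]       # iterating a str yields single characters
print(chars)                  # prints: ['f', 'i', 'r', 's', 't']
print(lines)                  # prints: ['first line\n', 'second line\n']
```

Note that each element returned by readlines() keeps its trailing newline character.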

First, though, we need to close and reopen the file to reset the pointer and clean up.

Start by using the close() function to close the file.

In [None]:
#

## Try closing the file a second time

Issue another f.close() statement and see what happens.

Does this make sense?

In [None]:
#

## Now open the file as before

In [None]:
#

## Read the file line by line

Now read the file with the readlines() function.

Assign the result to the name 'tl'.

In [None]:
#

## Print the type and length of the result

In [None]:
#

## Show the length of each line read

Write a for loop to iterate over the result and print the length of each line.

What are the shortest lines?

How would you interpret them?

In [None]:
#

## Print the lines in the file

Now write a for loop to iterate over the result, printing each line.

In [None]:
#

## Finally, close the file

In [None]:
#

## Better practice for working with files

Python developers often speak of the "pythonic" way to accomplish a task, or describe one method as "more pythonic" than another.

In that spirit, the "more pythonic" way to open a file and read it is to use 'with' and 'as'.  

The general syntax is:

with open('name','r') as f:
    result = f.readlines()
    
The advantage of this method is that it opens the file, reads it into 'result', and closes the file automatically when the indented block ends, even if an error occurs inside it.
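
The automatic closing can be checked with the file object's `closed` attribute. The sketch below uses a temporary file with a hypothetical name and contents rather than the exercise file.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.txt")
    with open(sample_path, "w") as out:
        out.write("alpha\nbeta\n")

    with open(sample_path, "r") as f:
        result = f.readlines()
        closed_inside = f.closed   # False: still open inside the block

    closed_after = f.closed        # True: closed when the block ended

print(result, closed_inside, closed_after)
```

No explicit f.close() call is needed; leaving the `with` block is what closes the file.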

Write code that opens and reads our file using this method.

In [None]:
#