### A. Text Files and Lines

Recall that a Python string can be thought of as a sequence of characters. In a similar way, a text file can be thought of as a sequence of lines.

For example, consider the following sample of a text file:

    From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
    Return-Path: <postmaster@collab.sakaiproject.org>
    Date: Sat, 5 Jan 2008 09:12:18 -0500
    To: source@collab.sakaiproject.org
    From: stephen.marquard@uct.ac.za
    Subject: [sakai] svn commit: r39772 - content/branches/
    Details: http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772

This sample comes from a file in mbox format, a standard format for storing multiple mail messages in a single file. Lines that start with “From ” separate the messages, and lines that start with “From:” are part of the messages. For more information about the mbox format, see en.wikipedia.org/wiki/Mbox.

To break the file into lines, there is a special character that represents the “end of the line” called the newline character.

### B. Newline

In Python, the newline character is represented by \n.

*(Even though this looks like two characters, it is actually a single character.)*

In [None]:
mystr = "A\nB"
print(mystr)

In [None]:
len(mystr)

**Note:** 
*So when we look at the lines in a file, we need to imagine that there is a special invisible character called the newline at the end of each line that marks the end of the line.*
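
As a small sketch of this idea, the cell below uses a made-up two-line string in place of a real file. `repr` makes the otherwise invisible newline characters visible, and `splitlines` breaks the text apart at each of them.

```python
# A made-up sample standing in for the contents of a text file.
text = "From: stephen.marquard@uct.ac.za\nSubject: [sakai] svn commit\n"

# repr() shows the newline characters that print() would render as line breaks.
shown = repr(text)
print(shown)

# Each "\n" counts as exactly one character.
newline_count = text.count("\n")
print(newline_count)

# splitlines() breaks the text at the newlines and drops them.
lines = text.splitlines()
print(lines)
```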

### C. Reading Files

In [1]:
import os

In [None]:
pwd

In [2]:
os.chdir(r'/Users/muruganyuvaraaj/Python notebooks')

In [3]:
#File handle
fhand = open("mbox-short.txt")

**Note:**
*A file handle does not contain the file's data; it is an object through which the data can be read.*
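
One way to see this is sketched below. The example writes a tiny throwaway file (the name `sample.txt` and its contents are invented here so the cell does not depend on mbox-short.txt), opens it, and inspects what `open` returned before reading anything.

```python
import os
import tempfile

# Write a tiny throwaway file; the name and contents are made up.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(path, "w") as out:
    out.write("first line\nsecond line\n")

# open() returns a handle object, not the text itself.
with open(path) as fhand:
    handle_type = type(fhand).__name__
    # The data is only fetched when we read from the handle.
    first = fhand.readline()

print(handle_type)
print(repr(first))
```

The `with` statement used here closes the file automatically when the block ends, which the plain `open` calls in this notebook leave to Python.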

**1. Reading the data using a loop**

We can easily construct a for loop to read through each of the lines in a file, and then print, count, or select them:

In [None]:
for line in fhand:
    print(line, end='')
fhand.seek(0)  # Move the read position back to the start of the file. Not needed if the file is reopened before the next loop.
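
The effect of `seek(0)` can be sketched with `io.StringIO`, which behaves like an open text file but lives in memory. The two-line contents are invented for illustration.

```python
import io

# An in-memory stand-in for an open text file.
fhand = io.StringIO("line one\nline two\n")

first_pass = [line for line in fhand]    # reads both lines
second_pass = [line for line in fhand]   # the handle is exhausted, so this is empty

fhand.seek(0)                            # move back to the start of the data
third_pass = [line for line in fhand]    # both lines are available again

print(len(first_pass), len(second_pass), len(third_pass))
```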

In [None]:
fhand = open("mbox-short.txt")

for line in fhand:
    line = line.strip()
    print(line)

In [None]:
#Reading first 10 lines:
fhand = open("mbox-short.txt")
count = 0

for line in fhand:
    line = line.strip()
    print(line)
    count += 1
    if count == 10:
        break

In [None]:
#Reading first 10 lines:
fhand = open("mbox-short.txt")
count = 0

for line in fhand:
    count += 1
    if count <= 10:
        line = line.strip()
        print(line)

In [None]:
#Print the lines between the 10th line and the 20th line
fhand = open("mbox-short.txt")
count = 0

for line in fhand:
    count += 1
    if 10 < count <= 20:
        line = line.strip()
        print(str(count),'.',line)
        

In [None]:
#Total number of lines present in the file:
fhand = open("mbox-short.txt")
count = 0

for line in fhand:
    count += 1
    
print(count)

In [None]:
fhand = open("mbox-short.txt")
count = 0
index = [1,3,9]
print(len(index))
print(index[len(index)-1])
print(max(index))

In [None]:
fhand = open("mbox-short.txt")
count = 0
index = [1,3,9]
for line in fhand:
    count += 1
    if count > max(index):
        break        # past the last wanted line, so stop reading
    if count not in index:
        continue     # not one of the wanted lines
    line = line.strip()
    print(line)

In [None]:
fhand = open("mbox-short.txt")
count = 0
index = [1,3,9]
for line in fhand:
    count += 1
    if count in index:
        line = line.rstrip()
        print(line)
    if count > index[len(index)-1]:
        break

In [None]:
fhand = open("mbox-short.txt")
index = [16,9,3,1]
count = 0
d = {}
for line in fhand:
    count += 1
    if count in index:
        line = line.rstrip()
        d[count] = line
    if count > max(index):
        break
print(d)       
for i in index:
    print(d[i])

In [None]:
fhand = open('mbox-short.txt')
index = [16,9,3,1]
count = 0
d = {}
print(type(d))
for i in fhand:
    count += 1
    if count in index:
        i = i.strip()
        d[count] = i
    if count > max(index):
        break
for i in index:
    print(d[i])

In [None]:
fhand = open("mbox-short.txt")
count = 0
index = [1,3,5]
for i in fhand:
    count += 1
    if count > max(index):
        break            # check this first; after a continue it would never run
    if count not in index:
        continue
    print(i, end='')     # the line already ends with a newline
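
The manual counter in the cells above can also be replaced by `enumerate`. The following is a minimal sketch of that variation, using an in-memory stand-in for the file with invented contents.

```python
import io

# In-memory stand-in for a file; the contents are invented for illustration.
fhand = io.StringIO("a\nb\nc\nd\ne\nf\n")

wanted = {1, 3, 5}      # 1-based line numbers to keep
last = max(wanted)      # computed once, outside the loop
picked = []

for number, line in enumerate(fhand, start=1):
    if number > last:
        break           # nothing of interest beyond this point
    if number in wanted:
        picked.append(line.rstrip())

print(picked)
```

Storing the wanted line numbers in a set keeps the membership test cheap, and `start=1` makes the numbering match the way lines are usually counted.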

**2. Reading data using the read method for files**

In [None]:
#Reading data using read() method:
fhand = open("mbox-short.txt")

data = fhand.read()

In [None]:
#length of the data
len(data)

In [None]:
#Let's see how the data is read
data[3:50]

**Disadvantage:**

Remember that read() loads the entire file into a single string, so it should only be used if the file data will fit comfortably in the main memory of your computer. If the file is too large to fit in main memory, you should write your program to read the file in chunks using a for or while loop.
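
Reading in chunks with a while loop might look like the sketch below. The helper name `count_chars_in_chunks` and the chunk sizes are assumptions chosen for this example, and an in-memory `io.StringIO` stands in for a large file.

```python
import io

def count_chars_in_chunks(fhand, chunk_size=1024):
    """Count characters by reading chunk_size characters at a time."""
    total = 0
    while True:
        chunk = fhand.read(chunk_size)
        if not chunk:        # read() returns an empty string at end of file
            break
        total += len(chunk)
    return total

# 2500 characters read in chunks of 1000, 1000, and 500.
sample = io.StringIO("x" * 2500)
total = count_chars_in_chunks(sample, chunk_size=1000)
print(total)
```

Only one chunk is held in memory at a time, so the same loop would work with a handle returned by `open` on a file of any size.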

### D. Letting the user choose the file name

In [None]:
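import os
os.chdir(r'/Users/muruganyuvaraaj/Python notebooks')
file = input('Enter file name: ')
try:
    fhand = open(file)
except FileNotFoundError:
    print('File cannot be opened:', file)
else:
    # Show the first line to confirm that the right file was opened
    for line in fhand:
        print(line.rstrip())
        break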

In [None]:
pwd

### E. Using try, except and open
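
A minimal sketch of the pattern, assuming we want to count the lines of a file whose name may be wrong. The function name `count_lines` and the file name used in the call are hypothetical. Only the `open` call sits inside `try`, so errors raised later by the loop are not hidden.

```python
def count_lines(file_name):
    """Return the number of lines in the file, or None if it cannot be opened."""
    try:
        fhand = open(file_name)
    except OSError:                  # covers FileNotFoundError and permission errors
        print('File cannot be opened:', file_name)
        return None
    with fhand:                      # close the file when counting is done
        return sum(1 for _ in fhand)

# A name that is assumed not to exist in the working directory.
result = count_lines('no-such-file-for-this-example.txt')
print(result)
```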

### F. Searching through the file

a) For example, suppose we want to read a file and print only the lines that start with the prefix “From:”:

In [None]:
fhand = open("mbox-short.txt")
count = 0
for line in fhand:
    if line.startswith('From:'):
        print(line.rstrip())
        count += 1
print(count)

b) Can we extract the list of email addresses?

In [None]:
fhand = open("mbox-short.txt")
x = ":"
for line in fhand:
    if line.startswith("From:"):
        for i in range(0, len(line)):
            if line[i] == x:
                result = line[i+1:].strip()  # drop the leading space and trailing newline
                print(result)

In [None]:
fhand = open("mbox-short.txt")

for line in fhand:
    if line.startswith('From:'):
        line = line.rstrip()
        print(line[line.find(" ")+1:])

In [None]:
fhand = open("mbox-short.txt")

for line in fhand:
    if line.startswith('From:'):
        line = line.rstrip()
        print(line[len('From: '):])  # slice off the prefix; lstrip() would also eat leading F, r, o, m

In [None]:
fhand = open("mbox-short.txt")

for line in fhand:
    if line.startswith('From:'):
        print(line.rstrip()[len('From: '):])
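
One detail worth checking when removing a prefix: `str.lstrip` treats its argument as a set of characters to strip, not as a literal prefix. The sketch below compares it with two alternatives on a single sample header line.

```python
line = "From: ray@media.berkeley.edu\n"   # a sample header line

# lstrip() removes any leading F, r, o, m, ':' or space, so the "r" of "ray" goes too.
stripped = line.rstrip().lstrip("From: ")

# Slicing off exactly the length of the prefix keeps the address intact.
sliced = line.rstrip()[len("From: "):]

# Splitting on whitespace and taking the second piece also works.
second_word = line.split()[1]

print(stripped)
print(sliced)
print(second_word)
```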
        

c) Extract lines which contain the string “@uct.ac.za” (i.e., they come from the University of Cape Town in South Africa):

In [None]:
import os
os.chdir(r'/Users/muruganyuvaraaj/Python notebooks')
fhand = open('mbox-short.txt')

In [None]:
fhand = open('mbox-short.txt')
for line in fhand:
    if line.startswith('From:'):
        if line.find('@uct.ac.za') != -1:   # find() returns -1 when the string is absent
            print(line.rstrip())
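
The same test can be written with the `in` operator. A brief sketch on a few sample lines (a short list standing in for the file) follows.

```python
# A few sample lines standing in for the file.
sample_lines = [
    "From: stephen.marquard@uct.ac.za\n",
    "From: louis@media.berkeley.edu\n",
    "Subject: [sakai] svn commit\n",
    "From: david.horwitz@uct.ac.za\n",
]

# Keep only the lines that contain the string anywhere in them.
matches = [line.rstrip() for line in sample_lines if "@uct.ac.za" in line]
print(matches)
```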

d) How many emails were received from the University of Cape Town?

In [23]:
fhand = open("mbox-short.txt")

for line in fhand:
    if line.startswith('From:'):
        print(line.rstrip()[len('From: '):])

stephen.marquard@uct.ac.za
louis@media.berkeley.edu
zqian@umich.edu
rjlowe@iupui.edu
zqian@umich.edu
rjlowe@iupui.edu
cwen@iupui.edu
cwen@iupui.edu
gsilver@umich.edu
gsilver@umich.edu
zqian@umich.edu
gsilver@umich.edu
wagnermr@iupui.edu
zqian@umich.edu
antranig@caret.cam.ac.uk
gopal.ramasammycook@gmail.com
david.horwitz@uct.ac.za
david.horwitz@uct.ac.za
david.horwitz@uct.ac.za
david.horwitz@uct.ac.za
stephen.marquard@uct.ac.za
louis@media.berkeley.edu
louis@media.berkeley.edu
ray@media.berkeley.edu
cwen@iupui.edu
cwen@iupui.edu
cwen@iupui.edu
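
The cell above lists every sender but does not yet answer the question. A minimal sketch of the counting step follows. The function name `count_from_domain` is hypothetical, and a short sample list stands in for the file.

```python
def count_from_domain(lines, domain):
    """Count the 'From:' header lines whose address ends with the given domain."""
    count = 0
    for line in lines:
        if line.startswith("From:") and line.rstrip().endswith(domain):
            count += 1
    return count

# A short sample; the second entry is a "From " separator line, not a header.
sample = [
    "From: stephen.marquard@uct.ac.za\n",
    "From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008\n",
    "From: louis@media.berkeley.edu\n",
    "From: david.horwitz@uct.ac.za\n",
]
uct_count = count_from_domain(sample, "@uct.ac.za")
print(uct_count)
```

Because the function only iterates over its first argument, an open file handle can be passed in place of the list. Judging by the output listed above, in which six addresses end in @uct.ac.za, calling it with `open("mbox-short.txt")` should return 6.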
