# Files In Python

<h2 id="read">Reading Text Files</h2>


Use Python’s built-in `open` function to read or write **.txt** files by specifying the file path and name. It returns a **File object** with methods to handle the file.

<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%204/images/ReadOpen.png" width="500" />


The mode argument is optional and defaults to **r**. We start with two modes:

<ul>
    <li>**r**: Read mode for reading files </li>
    <li>**w**: Write mode for writing files</li>
</ul>
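
As a minimal sketch of the default mode, the snippet below creates a small scratch file and opens it with and without the mode argument. The file name `mode_demo.txt` and the temporary directory are assumptions made for illustration and are not part of this lab.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this illustration
demo_path = os.path.join(tempfile.mkdtemp(), "mode_demo.txt")

# "w" creates the file (or overwrites it if it already exists)
with open(demo_path, "w") as f:
    f.write("hello")

# Omitting the mode argument is the same as passing "r"
with open(demo_path) as f:
    default_mode = f.mode
    content = f.read()

print(default_mode)
print(content)
```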


We open the file **Example1.txt** in read mode:


In [15]:
# Read the Example1.txt
example1 = "Example1.txt"
file1 = open(example1, "r")


#### Attributes of the file.


Name of file:

In [16]:
# Print the path/name of file

file1.name

'Example1.txt'

The mode the file object is in:


In [17]:
# Print the mode of file, either 'r' or 'w'

file1.mode

'r'

Reading the file and assigning it to a variable :


In [18]:
# Read the file

FileContent = file1.read()
FileContent

'This is line 1 \nmeow meow meow \nThis is line 3\nokay bye '

Print the file:


In [19]:
# Print the file with '\n' as a new line

print(FileContent)

This is line 1 
meow meow meow 
This is line 3
okay bye 


The file content is stored as a string:


In [20]:
# Type of file content

type(FileContent)

str

It is very important to close the file when you are finished with it. This frees up resources and ensures consistent behaviour across different Python versions.


In [21]:
# Close file after finish

file1.close()

Opening a file using the <code>with</code> statement is better practice because it automatically closes the file, even if the code encounters an exception. The code runs everything in the indented block and then closes the file object.


In [22]:
# Open file using with

with open(example1, "r") as file1:
    FileContent = file1.read()
    print(FileContent)

This is line 1 
meow meow meow 
This is line 3
okay bye 


The file object is now closed. You can verify this by running the following cell:


In [23]:
# Verify if the file is closed

file1.closed

True
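
The claim that <code>with</code> closes the file even when an exception occurs can be checked with a small sketch. The scratch file name `with_demo.txt` and the deliberately raised `ValueError` are assumptions used only for this illustration.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this illustration
demo_path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")
with open(demo_path, "w") as f:
    f.write("This is line 1\n")

try:
    with open(demo_path, "r") as demo_file:
        first_line = demo_file.readline()
        # Simulate a failure in the middle of the indented block
        raise ValueError("simulated failure inside the with block")
except ValueError as err:
    print("Caught:", err)

# The with statement closed the file before the exception left the block
closed_after_error = demo_file.closed
print(closed_after_error)
```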

The syntax can be a little confusing because the file object appears after the <code>as</code> keyword, and we never close the file explicitly. The figure below summarizes the steps:


<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%204/images/ReadWith.png" width="500" />


We don’t have to read the entire file. For example, we can read the first 4 characters by passing 4 as an argument to the method **.read()**:


In [34]:
# Read first four characters

with open(example1, "r") as file1:
    print(file1.read(4))
  

This


Once the method <code>.read(4)</code> is called, the first 4 characters are read. If we call the method again, the next 4 characters are read. The output of the following cell demonstrates the process for different inputs to the method <code>read()</code>:


In [35]:
# Read certain amount of characters

with open(example1, "r") as file1:
    print(file1.read(4))
    print(file1.read(4))
    print(file1.read(7))
    print(file1.read(15))

This
 is 
line 1 

meow meow meow


Here is an example using the same file, but instead we read 16, 5, and then 9 characters at a time:


In [39]:
# Read certain amount of characters

with open(example1, "r") as file1:
    print(file1.read(16))
    print(file1.read(5))
    print(file1.read(9))

This is line 1 

meow 
meow meow


We can also read one line of the file at a time using the method <code>readline()</code>:


In [37]:
# Read one line

with open(example1, "r") as file1:
    print("first line: " + file1.readline())

first line: This is line 1 



We can pass an argument to <code>readline()</code> to specify the maximum number of characters to read. Unlike <code>read()</code>, <code>readline()</code> reads at most one line.


In [38]:
with open(example1, "r") as file1:
    print(file1.readline(20)) # does not read past the end of line
    print(file1.read(20)) # Returns the next 20 chars


This is line 1 

meow meow meow 
This


Loop to iterate through each line:


In [43]:
# Iterate through the lines

with open(example1, "r") as file1:
    for i, line in enumerate(file1):
        print("Iteration", i, ": ", line)

Iteration 0 :  This is line 1 

Iteration 1 :  meow meow meow 

Iteration 2 :  This is line 3

Iteration 3 :  okay bye 
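
The blank lines in the output above appear because each line read from the file keeps its trailing newline and <code>print()</code> adds another one. A minimal sketch of one way to avoid this, assuming a hypothetical scratch file `loop_demo.txt` created just for the illustration:

```python
import os
import tempfile

# Hypothetical scratch file, created only for this illustration
demo_path = os.path.join(tempfile.mkdtemp(), "loop_demo.txt")
with open(demo_path, "w") as f:
    f.write("This is line 1\nThis is line 2\nThis is line 3\n")

cleaned = []
with open(demo_path, "r") as f:
    # enumerate supplies the counter for each line
    for i, line in enumerate(f):
        text = line.rstrip("\n")  # drop only the trailing newline
        cleaned.append(text)
        print("Iteration", i, ":", text)
```

Passing <code>end=""</code> to <code>print()</code> is another option when the line itself should stay unchanged.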


We can use the method <code>readlines()</code> to save the lines of the text file to a list:


In [48]:
# Read all lines and save as a list

with open(example1, "r") as file1:
    FileasList = file1.readlines()

In [52]:
# Each element of the list corresponds to a line of text:

# Print the first line
FileasList[0]

# Print the third line
FileasList[2]

'This is line 3\n'
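
As the output shows, <code>readlines()</code> keeps the <code>\n</code> at the end of each element. If the line breaks are not wanted, one possible alternative is to read the whole file and split it with the string method <code>splitlines()</code>. The sketch below compares the two, using a hypothetical scratch file `lines_demo.txt`:

```python
import os
import tempfile

# Hypothetical scratch file, created only for this illustration
demo_path = os.path.join(tempfile.mkdtemp(), "lines_demo.txt")
with open(demo_path, "w") as f:
    f.write("This is line 1\nThis is line 2\n")

# readlines() keeps the newline at the end of every element
with open(demo_path, "r") as f:
    with_newlines = f.readlines()

# read() followed by splitlines() drops the line breaks
with open(demo_path, "r") as f:
    without_newlines = f.read().splitlines()

print(with_newlines)
print(without_newlines)
```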

<h2 id="write">Writing Files</h2>


We can open a file object and use the method <code>write()</code> to save text to a file. To write to a file, the mode argument must be set to **w**. Let’s write a file **Example2.txt** with the line: **“This is line AAA”**


In [56]:
# Write line to file
exmp2 = 'Example2.txt'
with open(exmp2, 'w') as writefile:
    writefile.write("This is line AAA")

In [57]:
# Read file

with open(exmp2, 'r') as testwritefile:
    print(testwritefile.read())

This is line AAA


Write multiple lines:


In [60]:
# Write lines to file

with open(exmp2, 'w') as writefile:
    writefile.write("This is line A\n")
    writefile.write("This is line B\n")

The method <code>.write()</code> works similarly to the method <code>.readline()</code>, except that instead of reading a line it writes one. Note that <code>.write()</code> does not add a line break on its own, which is why each string above ends with <code>\n</code>. The process is illustrated in the figure. The different colour coding of the grid represents a new line added to the file after each method call.



<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%204/images/WriteLine.png" width="500" />
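
To illustrate how <code>.write()</code> treats line breaks, here is a small sketch. It writes one string without <code>\n</code> and one with it, and also captures the return value of <code>.write()</code>, which is the number of characters written. The scratch file name `write_demo.txt` is an assumption for this example only.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this illustration
demo_path = os.path.join(tempfile.mkdtemp(), "write_demo.txt")

with open(demo_path, "w") as f:
    # No "\n" here, so the next write continues on the same line
    count_a = f.write("This is line A")
    # The "\n" ends the line explicitly
    count_b = f.write("This is line B\n")

with open(demo_path, "r") as f:
    content = f.read()

print(count_a, count_b)
print(repr(content))
```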



In [61]:
# Check whether the lines were written to the file

with open(exmp2, 'r') as testwritefile:
    print(testwritefile.read())

This is line A
This is line B



We can write a list of strings to a **.txt** file as follows:


In [64]:
# Sample list of text

Lines = ["This is line A\n", "This is line B\n", "This is line C\n"]

# Write the strings in the list to text file

with open('Example2.txt', 'w') as writefile:
    for line in Lines:
        print(line)
        writefile.write(line)

This is line A

This is line B

This is line C



In [65]:
# Verify if writing to file is successfully executed

with open('Example2.txt', 'r') as testwritefile:
    print(testwritefile.read())

This is line A
This is line B
This is line C
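
The loop above can also be expressed with the file method <code>writelines()</code>, which writes every string in a list without adding line breaks of its own. This is a sketch of an alternative, not the approach used in this lab; the scratch file name `writelines_demo.txt` is hypothetical.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this illustration
demo_path = os.path.join(tempfile.mkdtemp(), "writelines_demo.txt")

lines = ["This is line A\n", "This is line B\n", "This is line C\n"]

with open(demo_path, "w") as f:
    # Each string is written as-is, so the "\n" must already be present
    f.writelines(lines)

with open(demo_path, "r") as f:
    content = f.read()

print(content)
```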



However, note that setting the mode to **w** overwrites all the existing data in the file.


In [66]:
with open('Example2.txt', 'w') as writefile:
    writefile.write("Overwrite\n")
with open('Example2.txt', 'r') as testwritefile:
    print(testwritefile.read())

Overwrite



<hr>


### Appending Files

We can write to files without losing any of the existing data by setting the mode argument to append: **a**. You can append new lines as follows:


In [68]:
# Write a new line to text file

with open('Example2.txt', 'a') as testwritefile:
    testwritefile.write("This is line C\n")
    testwritefile.write("This is line D\n")
    testwritefile.write("This is line E\n")
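
To confirm that append mode keeps the existing data, the following sketch writes one line in **w** mode, appends another in **a** mode, and reads the result back. It uses a hypothetical scratch file `append_demo.txt` so that the lab's own files are left untouched.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this illustration
demo_path = os.path.join(tempfile.mkdtemp(), "append_demo.txt")

# Start with a single line
with open(demo_path, "w") as f:
    f.write("Overwrite\n")

# Append mode adds to the end and keeps what is already there
with open(demo_path, "a") as f:
    f.write("This is line C\n")

with open(demo_path, "r") as f:
    lines = f.readlines()

print(lines)
```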

<hr>
<h2 id="add">Additional modes</h2> 


It's fairly inefficient to open the file in **a** or **w** and then reopen it in **r** to read any lines. Luckily, we can access the file in the following modes:

*   **r+** : Reading and writing. Does not truncate the file when it is opened.
*   **w+** : Writing and reading. Truncates the file when it is opened.
*   **a+** : Appending and reading. Creates a new file if none exists.
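
The three modes listed above can be compared side by side with a short sketch. It assumes a hypothetical scratch file `modes_demo.txt` and records what each mode sees when the file is opened.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this illustration
demo_path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")
with open(demo_path, "w") as f:
    f.write("original\n")

# r+ keeps the existing content and starts at the beginning of the file
with open(demo_path, "r+") as f:
    r_plus_content = f.read()

# a+ keeps the existing content; new text goes to the end of the file
with open(demo_path, "a+") as f:
    f.write("appended\n")
    f.seek(0)  # move back to the start before reading
    a_plus_content = f.read()

# w+ empties the file as soon as it is opened
with open(demo_path, "w+") as f:
    w_plus_content = f.read()

print(repr(r_plus_content))
print(repr(a_plus_content))
print(repr(w_plus_content))
```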

You don't have to dwell on the specifics of each mode for this lab.


In [70]:
#a+ mode

with open('Example2.txt', 'a+') as testwritefile:
    testwritefile.write("This is line E\n")
    print(testwritefile.read())




There were no errors but <code>read()</code> also did not output anything. This is because of our location in the file.

Most of the file methods we've looked at work at a certain location in the file. <code>.write()</code> writes at a certain location in the file, <code>.read()</code> reads at a certain location in the file, and so on. You can think of this as moving your cursor around in a text editor to make changes at a specific location.


Opening the file in **w** is akin to opening the .txt file, moving your cursor to the beginning of the text file, writing new text and deleting everything that follows.
Opening the file in **a**, in contrast, is similar to opening the .txt file, moving your cursor to the very end and then adding the new pieces of text. <br>
It is often very useful to know where the 'cursor' is in a file and to be able to control it. The following methods allow us to do precisely this:

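*   <code>.tell()</code> - returns the current position in bytes
*   <code>.seek(offset,from)</code> - changes the position by 'offset' bytes with respect to 'from'. 'from' can take the values 0, 1, or 2, corresponding to the beginning of the file, the current position, and the end of the file. For files opened in text mode, as in this lab, a nonzero offset is only supported relative to the beginning of the file (from = 0).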
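
A minimal sketch of these two methods is shown below. It assumes a hypothetical scratch file `seek_demo.txt` that contains plain ASCII text without line breaks, so the positions are the same on every platform.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this illustration
demo_path = os.path.join(tempfile.mkdtemp(), "seek_demo.txt")
with open(demo_path, "w") as f:
    f.write("hello world")

with open(demo_path, "r") as f:
    start = f.tell()       # position right after opening the file
    first = f.read(5)      # reading moves the position forward
    after_read = f.tell()

    f.seek(0, 2)           # jump to the end of the file
    end = f.tell()

    f.seek(0, 0)           # jump back to the beginning
    again = f.read(5)

print(start, after_read, end)
print(first, again)
```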


Let's revisit **a+**:


In [71]:
with open('Example2.txt', 'a+') as testwritefile:
    print("Initial Location: {}".format(testwritefile.tell()))

    data = testwritefile.read()
    if not data:  # empty strings are falsy in Python
        print('Read nothing')
    else:
        print(data)  # print what was already read; a second read() would return nothing

    testwritefile.seek(0, 0)  # move 0 bytes from the beginning

    print("\nNew Location : {}".format(testwritefile.tell()))
    data = testwritefile.read()
    if not data:
        print('Read nothing')
    else:
        print(data)

    print("Location after read: {}".format(testwritefile.tell()))

Initial Location: 75
Read nothing

New Location : 0
Overwrite
This is line C
This is line D
This is line E
This is line E

Location after read: 75


Finally, a note on the difference between **w+** and **r+**. Both modes allow access to the read and write methods; however, opening a file in **w+** overwrites it and deletes all pre-existing data. <br>
To work with the existing data in a file, use **r+** or **a+**. When using **r+**, it can be useful to call the <code>.truncate()</code> method after writing your data. This reduces the file to your data and deletes everything that follows. <br>
In the following code block, run the code as it is first and then run it again with the <code>.truncate()</code> line uncommented.


In [72]:
with open('Example2.txt', 'r+') as testwritefile:
    data = testwritefile.readlines()
    testwritefile.seek(0,0) #write at beginning of file
   
    testwritefile.write("Line 1" + "\n")
    testwritefile.write("Line 2" + "\n")
    testwritefile.write("Line 3" + "\n")
    testwritefile.write("finished\n")
    #Uncomment the line below
    #testwritefile.truncate()
    testwritefile.seek(0,0)
    print(testwritefile.read())
    

Line 1
Line 2
Line 3
finished
 line D
This is line E
This is line E
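
The leftover text at the end of the output above is what <code>.truncate()</code> removes. The sketch below shows the effect on a hypothetical scratch file `truncate_demo.txt`, so that **Example2.txt** is not modified.

```python
import os
import tempfile

# Hypothetical scratch file, created only for this illustration
demo_path = os.path.join(tempfile.mkdtemp(), "truncate_demo.txt")
with open(demo_path, "w") as f:
    f.write("old line 1\nold line 2\nold line 3\n")

with open(demo_path, "r+") as f:
    # r+ starts at the beginning, so this overwrites the first characters
    f.write("new\n")
    # Cut the file off at the current position, discarding the old remainder
    f.truncate()
    f.seek(0)
    content = f.read()

print(repr(content))
```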



<hr>


Let's copy the file **Example2.txt** to the file **Example3.txt**:


In [73]:
# Copy file to another

with open('Example2.txt', 'r') as readfile:
    with open('Example3.txt', 'w') as writefile:
        for line in readfile:
            writefile.write(line)
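
The nested <code>with</code> statements above can also be written as a single <code>with</code> statement that manages both files. The sketch below shows that form next to the standard library function <code>shutil.copyfile</code>, which copies a file without an explicit loop. All file names here are hypothetical scratch files, and neither variant is presented as the lab's intended solution.

```python
import os
import shutil
import tempfile

# Hypothetical scratch files, created only for this illustration
folder = tempfile.mkdtemp()
src = os.path.join(folder, "source_demo.txt")
dst_loop = os.path.join(folder, "copy_loop_demo.txt")
dst_shutil = os.path.join(folder, "copy_shutil_demo.txt")

with open(src, "w") as f:
    f.write("Line 1\nLine 2\n")

# Both files are managed by a single with statement
with open(src, "r") as readfile, open(dst_loop, "w") as writefile:
    for line in readfile:
        writefile.write(line)

# shutil.copyfile copies the contents in one call
shutil.copyfile(src, dst_shutil)

with open(dst_loop, "r") as f:
    copy_loop = f.read()
with open(dst_shutil, "r") as f:
    copy_shutil = f.read()

print(copy_loop == copy_shutil)
```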

Besides reading files, we can also write data into files and save them in different file formats such as **.txt, .csv, .xls (for Excel files)**, etc. You will come across these in further examples.


**NOTE:** If you wish to open and view the `Example3.txt` file, download this lab [here](https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-PY0101EN-SkillsNetwork/labs/Module%204/PY0101EN-4-2-WriteFile.ipynb) and run it locally on your machine. Then go to the working directory to check that the `Example3.txt` file exists and contains the data that we wrote.


Your local university's Raptors fan club maintains a register of its active members in a .txt document. Every month they update the file by removing the members who are not active. You have been tasked with automating this with your Python skills. <br>
Given the file `currentMem`, remove each member with a 'no' in their Active column. Keep track of each of the removed members and append them to the `exMem` file. Make sure that the format of the original files is preserved. (*Hint: Do this by reading/writing whole lines and ensuring the header remains.*) <br>
Run the code block below prior to starting the exercise. The skeleton code has been provided for you. Edit only the `cleanFiles` function.


In [74]:
#Run this prior to starting the exercise
from random import randint as rnd

memReg = 'members.txt'
exReg = 'inactive.txt'
fee =('yes','no')

def genFiles(current,old):
    with open(current,'w+') as writefile: 
        writefile.write('Membership No  Date Joined  Active  \n')
        data = "{:^13}  {:<11}  {:<6}\n"

        for rowno in range(20):
            date = str(rnd(2015,2020))+ '-' + str(rnd(1,12))+'-'+str(rnd(1,25))
            writefile.write(data.format(rnd(10000,99999),date,fee[rnd(0,1)]))


    with open(old,'w+') as writefile: 
        writefile.write('Membership No  Date Joined  Active  \n')
        data = "{:^13}  {:<11}  {:<6}\n"
        for rowno in range(3):
            date = str(rnd(2015,2020))+ '-' + str(rnd(1,12))+'-'+str(rnd(1,25))
            writefile.write(data.format(rnd(10000,99999),date,fee[1]))


genFiles(memReg,exReg)


Now that you've run the prerequisite code cell above, which prepared the files for this exercise, you are ready to move on to the implementation.

#### **Exercise:** Implement the cleanFiles function in the code cell below.


In [75]:
  
def cleanFiles(currentMem, exMem):
    '''
    The two arguments for this function are the files:
        - currentMem: File containing list of current members
        - exMem: File containing list of old members

    This function removes all rows from currentMem containing 'no'
    in the 'Active' column and appends them to exMem.
    '''
    # TODO: Open the currentMem file in r+ mode
    with open(currentMem, 'r+') as writeFile:
        # TODO: Open the exMem file in a+ mode
        with open(exMem, 'a+') as appendFile:
            # TODO: Read each member in the currentMem file (1 member per row) into a list.
            # Hint: Recall that the first line in the file is the header.
            writeFile.seek(0)
            members = writeFile.readlines()

            # Remove the header
            header = members[0]
            members.pop(0)

            # TODO: Iterate through the members list.
            # If a member is inactive, add them to exMem, otherwise write them into currentMem
            inactive = [member for member in members if ('no' in member)]
            '''
            The above is the same as

            inactive = []
            for member in members:
                if 'no' in member:
                    inactive.append(member)
            '''
            # Go to the beginning of the currentMem file
            writeFile.seek(0)
            writeFile.write(header)
            for member in members:
                if (member in inactive):
                    appendFile.write(member)
                else:
                    writeFile.write(member)
            # Remove any leftover rows beyond the rewritten data
            writeFile.truncate()
                


# The code below is to help you view the files.
# Do not modify this code for this exercise.
memReg = 'members.txt'
exReg = 'inactive.txt'
cleanFiles(memReg,exReg)


headers = "Membership No  Date Joined  Active  \n"
with open(memReg,'r') as readFile:
    print("Active Members: \n\n")
    print(readFile.read())
    
with open(exReg,'r') as readFile:
    print("Inactive Members: \n\n")
    print(readFile.read())

Active Members: 


Membership No  Date Joined  Active  
    49396      2020-10-24   yes   
    86390      2015-7-22    yes   
    65127      2016-11-9    yes   
    38519      2016-3-10    yes   
    78625      2019-5-23    yes   
    30171      2018-3-3     yes   
    68202      2018-6-12    yes   
    20722      2019-6-21    yes   
    47335      2017-12-3    yes   
    12274      2015-6-5     yes   
    10350      2015-6-2     yes   

Inactive Members: 


Membership No  Date Joined  Active  
    24552      2018-1-17    no    
    59526      2016-10-3    no    
    74684      2017-8-2     no    
    53564      2016-3-9     no    
    36303      2019-3-20    no    
    78123      2018-3-6     no    
    27339      2020-6-13    no    
    49644      2019-4-15    no    
    13700      2019-4-22    no    
    81544      2015-8-9     no    
    31682      2020-8-24    no    
    98843      2015-6-19    no    



The code cell below is to verify your solution. Please do not modify the code and run it to test your implementation of `cleanFiles`.


In [76]:
def testMsg(passed):
    if passed:
       return 'Test Passed'
    else :
       return 'Test Failed'

testWrite = "testWrite.txt"
testAppend = "testAppend.txt" 
passed = True

genFiles(testWrite,testAppend)

with open(testWrite,'r') as file:
    ogWrite = file.readlines()

with open(testAppend,'r') as file:
    ogAppend = file.readlines()

try:
    cleanFiles(testWrite,testAppend)
except:
    print('Error')

with open(testWrite,'r') as file:
    clWrite = file.readlines()

with open(testAppend,'r') as file:
    clAppend = file.readlines()
        
# checking if total no of rows is same, including headers

if (len(ogWrite) + len(ogAppend) != len(clWrite) + len(clAppend)):
    print("The number of rows do not add up. Make sure your final files have the same header and format.")
    passed = False
    
for line in clWrite:
    if  'no' in line:
        passed = False
        print("Inactive members in file")
        break
    else:
        if line not in ogWrite:
            print("Data in file does not match original file")
            passed = False
print ("{}".format(testMsg(passed)))
    



Test Passed
