
In [None]:
#@title Walkthrough Video
from IPython.display import HTML
HTML("""<video width="720" height="500" controls>
<source src="https://cdn.exec.talentsprint.com/content/5_Python_Files.mp4">
</video>""")

In [None]:
#@title Run this cell to download the text files
!wget https://cdn.iisc.talentsprint.com/mbox.txt
!wget https://cdn.iisc.talentsprint.com/mbox-short.txt

**Files:** Python programs use files to read data from, and write data to, storage outside the program, such as text files, CSV files, or binary files. Because a file outlives the program that created it, working with files lets you store and retrieve information persistently, which is the basis for data processing, analysis, and storage.

**File Processing:** A text file can be thought of as a sequence of lines.

### Opening a File

Before we can read from or write to a file, we need to open it with the built-in **open()** function. Its two most commonly used arguments are the file path and the optional mode. The mode determines whether you want to read, write, or append to the file. Some common modes include:

`'r'`: Read mode (default). Opens the file for reading.

`'w'`: Write mode. Opens the file for writing. If the file already exists, its contents are erased (truncated) first. If it doesn't exist, a new file is created.

`'a'`: Append mode. Opens the file for appending: data is written at the end of the file, after any existing contents. If the file doesn't exist, it is created.

`'x'`: Exclusive creation mode. Creates a new file, but raises a `FileExistsError` if the file already exists.

**Syntax:** `file_object = open(file_path, mode)`
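To see how the four modes differ, here is a minimal sketch that applies each one to the same file. It works inside a temporary directory created with the standard `tempfile` module so that nothing real is overwritten; the file name `notes.txt` is hypothetical.

```python
import os
import tempfile

# Work inside a throwaway directory so no real file is overwritten.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "notes.txt")  # hypothetical file name

    fout = open(path, "x")   # 'x': create a new file
    fout.write("first line\n")
    fout.close()

    fout = open(path, "w")   # 'w': the existing contents are erased
    fout.write("second line\n")
    fout.close()

    fout = open(path, "a")   # 'a': new data goes after what is already there
    fout.write("third line\n")
    fout.close()

    fin = open(path, "r")    # 'r': read (the default mode)
    contents = fin.read()
    fin.close()

    try:
        open(path, "x")      # 'x' again: the file now exists
        x_failed = False
    except FileExistsError:
        x_failed = True

print(contents)
print("Second 'x' open failed:", x_failed)
```

After the `'w'` step only `"second line"` survives, because `'w'` truncates the file; `'a'` then adds `"third line"` after it.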


In [None]:
fhand = open("mbox.txt", "r")
print(fhand)

If the file does not exist, `open()` does not return a file object; instead it raises a `FileNotFoundError`, as the next cell shows.


In [None]:
fhand = open("stuff.txt")  # stuff.txt does not exist, so this raises FileNotFoundError
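Because a missing file raises an exception rather than returning a value, you can catch that specific exception. The following is a minimal sketch; it looks for `stuff.txt` inside a fresh temporary directory (an assumption made only to guarantee the file is absent) and inspects the `filename` and `errno` attributes that the exception carries.

```python
import errno
import os
import tempfile

caught = None
# A fresh, empty temporary directory guarantees that stuff.txt is missing.
with tempfile.TemporaryDirectory() as tmpdir:
    missing = os.path.join(tmpdir, "stuff.txt")
    try:
        fh = open(missing)
        fh.close()
    except FileNotFoundError as err:
        caught = err
        # The exception records the path that was requested and the OS error code.
        print('Could not open:', os.path.basename(err.filename))
        print('Error code is ENOENT:', err.errno == errno.ENOENT)
```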

### The newline Character

In Python, the "newline" character refers to a special character that represents the end of a line in a text file or a string. It is represented by the escape sequence `'\n'`.

The newline character creates line breaks in text files and strings: wherever it appears, the text that follows begins on a new line.

In [None]:
stuff = 'Hello\nWorld!'
stuff

In the above code, the variable `stuff` is assigned the string value `'Hello\nWorld!'`. Here's what each part of the string represents:

`'Hello'`: Regular characters that spell the word "Hello".

`'\n'`: The newline escape sequence. It represents a line break, indicating that a new line should start.

`'World!'`: Regular characters that spell the word "World!".

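When the string is passed to `print()`, the newline character (`'\n'`) produces a real line break between "Hello" and "World!". When the notebook instead displays the bare expression `stuff`, as in the cell above, it shows the string's representation, in which the newline appears as the escape sequence `\n`.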

In [None]:
print(stuff)
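The two ways of showing `stuff` can be compared directly. This minimal sketch uses the built-in `repr()` to produce the same escaped form a notebook shows for a bare expression, and `splitlines()` to split the string at its line break.

```python
stuff = 'Hello\nWorld!'

# print() writes the characters themselves, so '\n' becomes a real line break.
print(stuff)

# repr() gives the literal form that a notebook shows for a bare expression;
# there the newline appears as the two-character escape sequence \n.
shown = repr(stuff)
print(shown)

# splitlines() breaks the string apart at the newline.
parts = stuff.splitlines()
print(parts)
```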

In [None]:
stuff = 'X\nY'
print(stuff)

The `len()` function returns the length of a string, i.e., the number of characters it contains. Consider the `stuff` variable, which now has the value `'X\nY'`:

In [None]:
len(stuff)

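`len(stuff)` returns `3`: the string contains `'X'`, the newline character `'\n'`, and `'Y'`. Although `'\n'` is written as two characters in source code, it is stored as a single character.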
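A quick way to confirm that the newline is a single character is to measure it on its own and to list the characters of `stuff`. A minimal sketch:

```python
stuff = 'X\nY'

# '\n' is typed as two characters in source code but stored as one character.
print(len('\n'))      # 1
print(len(stuff))     # 3
print(list(stuff))    # ['X', '\n', 'Y']
print(ord(stuff[1]))  # 10, the code point of the newline character
```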

### File Handle as a Sequence

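In Python, a file handle (file object) can be used as an iterable: a `for` loop over it yields the file's lines one at a time, each still ending with its newline character. This works because the file object implements the iterator protocol. A file object is not a true sequence, however: it does not support indexing (`xfile[0]` raises a `TypeError`), and once a loop has reached the end of the file, a second loop over the same handle yields no lines. To access specific lines by position, read them into a list first, for example with `readlines()`.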

In [None]:
xfile = open('mbox.txt')
for cheese in xfile:  # Iterating over the file handle; the loop variable name is arbitrary
    print(cheese)
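The difference between iterating over a file and indexing it can be checked directly. This minimal sketch writes a small hypothetical file, `sample.txt`, into a temporary directory, loops over it, tries to index it, loops again, and finally uses `readlines()` for positional access.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.txt")  # hypothetical sample file
    with open(path, "w") as fout:  # `with` closes the file automatically
        fout.write("line one\nline two\nline three\n")

    handle = open(path)
    # Iterating works: each step yields one line, still ending in '\n'.
    first_pass = [line for line in handle]

    # Indexing does not: a file object is an iterator, not a sequence.
    try:
        handle[0]
        indexing_works = True
    except TypeError:
        indexing_works = False

    # The handle is already at the end of the file, so a second pass is empty.
    second_pass = [line for line in handle]
    handle.close()

    # For positional access, read all lines into a list first.
    with open(path) as fin:
        lines = fin.readlines()

print(first_pass)
print('Indexing works:', indexing_works)
print('Lines seen on second pass:', len(second_pass))
print('Second line:', lines[1].rstrip())
```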

### Counting Lines in a File

To count the number of lines in a file using Python, you can iterate over the file handle and increment a counter variable for each line encountered.

In [None]:
fhand = open('mbox.txt')
count = 0
for line in fhand:
    count = count + 1
print('Line Count:', count)

The above code counts the number of lines in the file `'mbox.txt'`. The file is opened using `open()`, the file handle is assigned to the variable `fhand`, and the variable `count` is initialized to `0`.

The `for` loop iterates over each line in the file handle `fhand`, incrementing `count` by `1` for each line encountered.

Finally, the total line count is printed using the `print()` statement.
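The same counting loop can be packaged as a function. The sketch below is one possible version, not the notebook's method: the helper name `count_lines` is hypothetical, it uses a `with` statement so the file is closed automatically (the cells above leave their files open), and it replaces the explicit counter with `sum()` over a generator expression. It is demonstrated on a small temporary file instead of `mbox.txt`.

```python
import os
import tempfile


def count_lines(path):
    """Return the number of lines in the text file at path."""
    # `with` closes the file automatically when the block ends, even on errors.
    with open(path) as fhand:
        # The generator yields 1 per line, so only the running total is kept.
        return sum(1 for _ in fhand)


# Demonstrated on a small temporary file; in the notebook you would pass 'mbox.txt'.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "three_lines.txt")
    with open(path, "w") as fout:
        fout.write("a\nb\nc")  # no trailing newline: the last line still counts
    demo_count = count_lines(path)

print('Line Count:', demo_count)
```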






### Reading the “Whole” File

In [None]:
fhand = open('mbox-short.txt')
inp = fhand.read()

The `open()` function opens the file `'mbox-short.txt'` in the default read mode, and the resulting file object is assigned to the variable `fhand`. Because `'mbox-short.txt'` is a relative path, Python looks for it in the current working directory; an absolute path, by contrast, is used as-is.

The `read()` method is then called on `fhand`. It reads the entire contents of the file and returns them as a single string, including the newline characters (`'\n'`), which is assigned to the variable `inp`.

After executing these lines, the variable `inp` will hold the complete content of the file `'mbox-short.txt'` as a string.

When using the `read()` method to read the entire file at once, consider the size of the file: the whole contents are loaded into memory as one string, which can be inefficient or impossible for very large files. In such cases, it is better to process the file line by line with a `for` loop, to read one line at a time with `readline()`, or to read a fixed number of characters at a time with `read(size)`. Note that `readlines()` does not help here, because it also loads the entire file into memory, as a list of lines.
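Reading a fixed amount at a time might look like the following minimal sketch. The function name `count_chars_in_chunks` and the 4096-character default are illustrative assumptions; the key point is that `read(size)` returns at most `size` characters and returns an empty string once the end of the file is reached, so at most one chunk is held in memory at any moment.

```python
import os
import tempfile


def count_chars_in_chunks(path, chunk_size=4096):
    """Count the characters in a text file, reading at most chunk_size at a time."""
    total = 0
    with open(path) as fhand:
        while True:
            chunk = fhand.read(chunk_size)  # '' once the end of the file is reached
            if not chunk:
                break
            total = total + len(chunk)
    return total


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "chunks.txt")  # made-up sample file
    with open(path, "w") as fout:
        fout.write("abcdefghij\n" * 3)  # 33 characters in total
    demo_total = count_chars_in_chunks(path, chunk_size=5)

print(demo_total)
```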

In [None]:
print(len(inp))

In [None]:
print(inp[:20])

`inp[:20]` retrieves the substring starting from the beginning (index 0) up to, but not including, index 20. This means it retrieves the first 20 characters of the string stored in inp.

Finally, the `print()` function is used to display the retrieved substring.

Executing this code therefore prints the first 20 characters of `'mbox-short.txt'` (or the whole text, if the file is shorter than that).

It's important to note that Python uses zero-based indexing, so the character at index `0` is the first character, the character at index `1` is the second character, and so on. Therefore, `inp[:20]` retrieves characters from index `0` to index 19.
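The relationship between `read()`, `len()`, and slicing is easier to check on content whose exact text is known. This minimal sketch writes a made-up two-line sample (not taken from `mbox-short.txt`) to a temporary file, reads it back in one call, and slices off the first 20 characters.

```python
import os
import tempfile

sample = "From: alice@example.com\nSubject: hello\n"  # made-up sample content

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.txt")
    with open(path, "w") as fout:  # `with` closes the file automatically
        fout.write(sample)
    fin = open(path)
    text = fin.read()  # the whole file as one string
    fin.close()

print(len(text))   # counts every character, including the two '\n' characters
print(text[:20])   # characters at indices 0 through 19
```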






### Searching Through a File


In [None]:
fhand = open('mbox-short.txt')
for line in fhand:
    if line.startswith('From:'):
        print(line)

In the above code, the `open()` function is used to open the file `'mbox-short.txt'` in the default read mode, and the resulting file object is assigned to the variable `fhand`.


A `for` loop is used to iterate over each line in the file `fhand`. The loop goes through the file one line at a time, assigning each line to the variable `line`.

Within the loop, an `if` statement uses the `startswith()` method to check whether the current line starts with the string `'From:'`. If it does, the `print()` function displays the line. Each line read from the file still ends with a newline character (`'\n'`), and `print()` adds a newline of its own, so the output shows a blank line after every matching line.

### Searching Through a File (fixed)

In [None]:
fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if line.startswith('From:'):
        print(line)

In the above code, the `open()` function opens the file `'mbox-short.txt'` in the default read mode and assigns the resulting file object to the variable `fhand`.

The `for` loop iterates over each line in the file `fhand`, assigning the current line to the variable `line`. The `rstrip()` method returns a copy of the line with any trailing whitespace (including the newline character) removed from the right end. The `if` statement then checks whether the stripped line starts with the string `'From:'`, and if it does, `print()` displays the line.

Because the newline has already been stripped, the line break added by `print()` is the only one, so the blank lines from the previous version disappear. Note that `rstrip()` removes only *trailing* whitespace; leading whitespace is left in place, so a line with spaces before `'From:'` would not match.

When executed, this code reads the file `'mbox-short.txt'` line by line, removes trailing whitespace from each line, and prints only the lines that start with the string 'From:'.
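The difference between `rstrip()` and `strip()` matters for prefix checks. In this minimal sketch the sample line is made up, with extra spaces added on both ends:

```python
raw = "  From: bob@example.com  \n"   # made-up line with spaces on both ends

right_only = raw.rstrip()   # removes trailing spaces and the '\n' only
both_ends = raw.strip()     # removes whitespace from both ends

print(repr(right_only))
print(repr(both_ends))

# Leading spaces survive rstrip(), so the prefix check fails on right_only.
print(right_only.startswith('From:'))
print(both_ends.startswith('From:'))
```

Use `strip()` instead of `rstrip()` when leading whitespace might appear before the prefix you are looking for.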

### Skipping with “continue”

In [None]:
fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if not line.startswith('From:'):
        continue
    print(line)

In the above code, the `open()` function opens the file `'mbox-short.txt'` in the default read mode and assigns the resulting file object to the variable `fhand`.

The `for` loop iterates over each line in the file `fhand`, assigning the current line to the variable `line`. The `rstrip()` method removes any trailing whitespace, including the newline character, from the right end of the line.

The `if` statement checks whether the stripped line does **not** start with the string `'From:'`. If that condition is true, the `continue` statement runs: it skips the rest of the current iteration and moves on to the next line, so the current line is ignored.

If the condition is false, meaning the line does start with `'From:'`, execution reaches the `print()` call, which displays the line. Note that `rstrip()` removes only trailing whitespace, so a line with spaces before `'From:'` would still be skipped.

When executed, this code reads the file `'mbox-short.txt'` line by line, removes trailing whitespace from each line, and prints only the lines that start with the string 'From:'. Any lines that do not start with `'From:'` are skipped and not printed.
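The skip-with-`continue` pattern can be wrapped in a reusable function. The sketch below is illustrative: `lines_starting_with` is a hypothetical helper, and it is fed an `io.StringIO` object (an in-memory, file-like stand-in) with made-up lines instead of `mbox-short.txt`, so it runs anywhere.

```python
import io


def lines_starting_with(fhand, prefix):
    """Return the lines from fhand that start with prefix, trailing whitespace removed."""
    matches = []
    for line in fhand:
        line = line.rstrip()
        if not line.startswith(prefix):
            continue  # skip the rest of this iteration and go to the next line
        matches.append(line)
    return matches


# io.StringIO wraps a string in a file-like object, so no real file is needed.
fake_file = io.StringIO(
    "From: a@example.com\n"
    "To: b@example.com\n"
    "From: c@example.com\n"
)
found = lines_starting_with(fake_file, 'From:')
print(found)
```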

### Using “in” to Select Lines

In [None]:
fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if '@uct.ac.za' not in line:
        continue
    print(line)

The `in` operator checks whether one string appears anywhere inside another, so `'@uct.ac.za' in line` is `True` when the line contains that text. The `if` statement negates this test: it checks whether `'@uct.ac.za'` is **not** found in the stripped line.

If the condition is `True`, meaning `'@uct.ac.za'` is not present in the line, the `continue` statement skips the rest of the current iteration and moves on to the next line, effectively ignoring the current line.

If the condition is `False`, meaning `'@uct.ac.za'` is present in the line, the `print()` function displays the line.
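Because `in` performs a substring test, it is worth checking its behaviour on a few sample strings. In this minimal sketch all the addresses are made up; only the domain `@uct.ac.za` comes from the example above.

```python
# `in` checks whether the left string occurs anywhere inside the right one.
print('@uct.ac.za' in 'From: someone@uct.ac.za')     # True
print('@uct.ac.za' in 'From: someone@example.com')   # False

# The test is case-sensitive.
print('@uct.ac.za' in 'From: someone@UCT.AC.ZA')     # False

sample_lines = [
    'From: someone@uct.ac.za',
    'To: other@example.com',
    'Cc: third@uct.ac.za',
]
# `not in` is the idiomatic spelling of the negated test.
skipped = [line for line in sample_lines if '@uct.ac.za' not in line]
selected = [line for line in sample_lines if '@uct.ac.za' in line]
print(selected)
```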

### Prompt for File Name

In [None]:
fname = input('Enter the file name:  ')
fhand = open(fname)
count = 0
for line in fhand:
    if line.startswith('Subject:'):
        count = count + 1
print('There were', count, 'subject lines in', fname)

In the above code, `input()` prompts the user to enter a file name, and the value the user types is stored in the variable `fname`. The `open()` function then opens the file named by `fname` in the default read mode and assigns the resulting file object to the variable `fhand`.

The variable `count` is initialized to 0. It will be used to keep track of the number of lines that start with the string `'Subject:'`.

The `for` loop iterates over each line in the file fhand. For each iteration, the current line is assigned to the variable line. The if statement checks if the line starts with the string `'Subject:'`. If it does, the count variable is incremented by 1.

After all the lines in the file have been processed, the final `print()` call displays the number of subject lines (`count`) and the file name (`fname`).
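Separating the counting logic from the `input()` prompt makes it easier to reuse and test. The following is a minimal sketch, not the notebook's own code: the function name `count_prefixed_lines` and its `prefix` parameter are hypothetical, and the demonstration uses a made-up temporary file rather than asking for a name.

```python
import os
import tempfile


def count_prefixed_lines(fname, prefix='Subject:'):
    """Count the lines in the file fname that start with prefix."""
    count = 0
    with open(fname) as fhand:  # `with` closes the file when the loop is done
        for line in fhand:
            if line.startswith(prefix):
                count = count + 1
    return count


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "mail.txt")  # made-up sample file
    with open(path, "w") as fout:
        fout.write("Subject: a\nFrom: x@example.com\nSubject: b\n")
    subjects = count_prefixed_lines(path)
    senders = count_prefixed_lines(path, prefix='From:')

print('There were', subjects, 'subject lines')
```

In the notebook you could then call `count_prefixed_lines(fname)` on the name returned by `input()`.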

### Bad File Names

In [None]:
import sys

fname = input('Enter the file name: ')
try:
    fhand = open(fname)
except OSError:  # covers FileNotFoundError, PermissionError, and similar errors
    print('File cannot be opened:', fname)
    sys.exit()

count = 0
for line in fhand:
    if line.startswith('Subject:'):
        count = count + 1
print('There were', count, 'subject lines in', fname)

In the above code the `try-except` block is used to handle the case where the file cannot be opened.

Inside the `try` block, the `open()` function is used to open the specified file. If the file can be opened successfully, the file object is assigned to the variable `fhand`. If opening fails, Python raises an `OSError` (or one of its subclasses, such as `FileNotFoundError` or `PermissionError`), and the code jumps to the `except` block. Naming `OSError` instead of writing a bare `except:` keeps unrelated problems, such as a misspelled variable name, from being reported as a bad file name.

In the `except` block, the error message `'File cannot be opened:'` is printed, followed by the file name (`fname`). Then `sys.exit()` stops the program; in a notebook it stops the current cell by raising `SystemExit`. (`quit()` also ends the program, but the Python documentation reserves it for the interactive interpreter shell.)

After successfully opening the file, the variable `count` is initialized to `0`. The `for` loop iterates over each line in the file `fhand`. For each line, the `if` statement checks whether the line starts with the string `'Subject:'`. If it does, `count` is incremented by 1.
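In a larger program, exiting on a bad file name is often too drastic. The sketch below shows one alternative under that assumption: the hypothetical helper `try_count_subjects` catches `OSError` (the parent class of `FileNotFoundError` and `PermissionError`), reports the problem using the exception's `strerror` text, and returns `None` so the caller can decide what to do next.

```python
import os
import tempfile


def try_count_subjects(fname):
    """Return the number of 'Subject:' lines in fname, or None if it cannot be opened."""
    try:
        fhand = open(fname)
    except OSError as err:  # FileNotFoundError, PermissionError, IsADirectoryError, ...
        print('File cannot be opened:', fname, '-', err.strerror)
        return None
    count = 0
    with fhand:  # close the file once counting is finished
        for line in fhand:
            if line.startswith('Subject:'):
                count = count + 1
    return count


with tempfile.TemporaryDirectory() as tmpdir:
    good = os.path.join(tmpdir, "mail.txt")  # made-up sample file
    with open(good, "w") as fout:
        fout.write("Subject: one\nFrom: x@example.com\nSubject: two\n")
    subject_total = try_count_subjects(good)
    missing_result = try_count_subjects(os.path.join(tmpdir, "no_such_file.txt"))

print(subject_total, missing_result)
```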