# Methods I: Programming and Data Analysis

## Session 08: Advanced Formatting; File Formats; Finishing the Basics

### Gerhard Jäger

#### (based on Johannes Dellert's slides)

December 14, 2021

Advanced formatting using `format()`:

-   **formatting templates** are stored in a specialized string format

-   basic principles:

    -   every object to be rendered is represented by a block in curly
        brackets

    -   symbols outside curly brackets are copied into the output

-   useful basic features:

    -   **padding** to a desired length $k$ using
        `:k` (default alignment: left for strings, right for numbers),
        `:<k` (left alignment), `:^k` (center alignment), or `:>k` (right
        alignment)

    -   **truncating** to a desired length $k$ using `.k` (combines with
        padding)

    -   **integer format** `:d` which allows padding

    -   **float format** `:f` which allows padding and specifying a
        precision,
        e.g. `{:06.2f}` to output 3.14159 as `"003.14"`

-   more information at `https://pyformat.info`
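
Truncation and integer padding can be tried out with a minimal sketch like the following. The example word and the variable names are chosen only for illustration.

```python
# Truncation: keep at most 5 characters of the string.
truncated = '{:.5}'.format('xylophone')

# Truncation combined with padding: cut to 5 characters, then pad to width 10.
truncated_padded = '{:10.5}'.format('xylophone')

# Integer format with padding; numbers are right-aligned unless told otherwise.
int_padded = '{:4d}'.format(42)
int_zero_padded = '{:04d}'.format(42)

print(repr(truncated))         # 'xylop'
print(repr(truncated_padded))  # 'xylop     '
print(repr(int_padded))        # '  42'
print(repr(int_zero_padded))   # '0042'
```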

### Advanced Formatting

Example of `format()` usage:

- to format a sequence of objects, apply the `format()` method on the
template string with the objects as arguments:

## base case

In [1]:
'{} {}'.format('one', 'two')


'one two'

In [2]:
'{} {}'.format(1, 2)


'1 2'

## numbered placeholders

In [3]:
'{1} {0}'.format('one', 'two')

'two one'

## padding

In [4]:
'{:10}'.format('test')


'test      '

In [5]:
'{:^10}'.format('test')


'   test   '

In [6]:
'{:>10}'.format('test')


'      test'

In [7]:
'{:_<10}'.format('test')


'test______'

## specifying data type

In [8]:
'{:d}'.format(42)

'42'

In [9]:
'{:f}'.format(42)

'42.000000'

In [10]:
'{:d}'.format("test")

ValueError: Unknown format code 'd' for object of type 'str'

In [11]:
'{:s}'.format("test")

'test'

## precision for floats

In [12]:
'{:6.2f}'.format(3.141592653589793)


'  3.14'

In [13]:
'{:06.2f}'.format(3.141592653589793)


'003.14'

- templates can be stored in a variable


In [14]:
result_template = "{:20s}: {:6.2f} {:6.2f}"

In [15]:
result_template.format("Pruta Govvom", 2.5, 4.7)

'Pruta Govvom        :   2.50   4.70'

- `format()` expects the parts as separate arguments, so a sequence object must be **unpacked** using `*` before it is passed in:

In [16]:
results = [("Pruta Govvom", 2.5, 4.7), ("Prokanayardan Tum", 0.4, 3), ("Mara Tsirpalandani", 1.15, 0.01)]

In [17]:
for result in results:
    print(result_template.format(*result))

Pruta Govvom        :   2.50   4.70
Prokanayardan Tum   :   0.40   3.00
Mara Tsirpalandani  :   1.15   0.01


### File Input and Output: Motivation

Why we need to deal with files:

-   processing large amounts of data is the main application of
    programming, and the true source of its usefulness

-   permanent data storage in files (byte sequences on a storage medium)

-   for processing, we need to get the data from files into memory
    (Python objects), and write the results back into a different file

-   to work with a file, we need to open and navigate inside it

-   opening a file creates a **file handle**

-   a file handle contains a pointer that indicates a particular place
    in the file (it "remembers" where in the file we are)

-   initial placement of the pointer depends on the mode in which the
    file is opened (e.g. at the start or the end of the file)
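
The behaviour of the pointer can be observed in a small sketch. It uses `open()` and the reading and writing methods that are introduced in the following sections. The file name and the temporary directory are assumptions made only so that the example is self-contained.

```python
import os
import tempfile

# Hypothetical demo file, created in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "pointer_demo.txt")
fw = open(path, "w")
fw.write("first line\nsecond line\nthird line\n")
fw.close()

fr = open(path)            # reading mode: the pointer starts at the beginning
line1 = fr.readline()      # reads the first line and moves the pointer forward
line2 = fr.readline()      # continues where the previous call stopped
rest = fr.read()           # everything from the pointer to the end of the file
nothing_left = fr.read()   # the pointer is at the end, so nothing is left
fr.close()

fa = open(path, "a")       # appending mode: the pointer starts at the end
fa.write("fourth line\n")
fa.close()

fr = open(path)
all_text = fr.read()
fr.close()
print(all_text)
```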

### File I/O: Structure of Text Files

Basic facts about text files:

-   text files consist of lines of text

-   at the end of each line there is a **newline** symbol

-   in Python strings, a newline is represented by `"\n"`

-   different operating systems use slightly different ways of
    representing linebreaks:

    -   Windows: 00001101 00001010 ("carriage return" + "linefeed",
        i.e. `"\r\n"`)

    -   Unix derivatives (including macOS and Linux): 00001010 (`"\n"`)

-   within Python, the difference is abstracted away (the implementation
    of the functions for reading from and writing to a file will handle
    this)
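
As a rough illustration of this abstraction, the sketch below writes raw bytes containing both kinds of line break and reads them back in text mode. It uses the binary mode `"wb"` only to control the exact bytes in the file. The file name is hypothetical.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "newline_demo.txt")

# Write raw bytes: a Windows-style line break ("\r\n") after the first line
# and a Unix-style line break ("\n") after the second.
fb = open(path, "wb")
fb.write(b"one\r\ntwo\n")
fb.close()

# Reading in text mode (the default) translates both conventions to "\n".
ft = open(path)
text = ft.read()
ft.close()

print(repr(text))  # 'one\ntwo\n'
```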

### File Input: Syntax

Basic elements of file input:
-   the function `open()` creates and returns a file handle
    -   obligatory argument: a file name (or a complete path)
    -   optional: a string representing the **mode** in which the file is opened
    -   default mode if none is specified: `"r"` ("reading")
-   core functionality of a file handle in reading mode:
    -   the `read()` method returns the whole document in a single string
    -   the `readline()` method returns the next unprocessed line of the file opened by the handle (or the empty string `""` if the end of the file was reached)
    -   the `readlines()` method returns a list of strings, containing each line of the file as a string in the original order
    -   the `close()` method cancels or finishes the processing of the file, ensuring that the file is closed correctly, and returning control to the operating system
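
The three reading methods can be compared side by side in a minimal sketch. The demo file is created first so that the example runs on its own. Its name and content are assumptions.

```python
import os
import tempfile

# Hypothetical demo file with three lines.
path = os.path.join(tempfile.mkdtemp(), "read_demo.txt")
fw = open(path, "w")
fw.write("line 1\nline 2\nline 3\n")
fw.close()

# read(): the whole document in a single string
fr = open(path)
whole = fr.read()
fr.close()

# readline(): one line per call, the empty string at the end of the file
fr = open(path)
first = fr.readline()
second = fr.readline()
third = fr.readline()
at_end = fr.readline()
fr.close()

# readlines(): a list with one string per line (newlines included)
fr = open(path)
lines = fr.readlines()
fr.close()

print(repr(whole))
print(repr(at_end))
print(lines)
```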

### File Input: Example



In [18]:
fr = open("some_text.txt", "r")
header = fr.readline().strip()
print("Loading text \"" + header + "\"...")
the_text = fr.read()
fr.close()
print("Done.")


Loading text "more text"...
Done.


### File Output: Syntax

Basic elements of file output:

-   use `open(file_name,"w")` to create a file handle in writing mode

-   core functionality of a file handle in writing mode:

    -   the `write()` method writes a string to the file, and returns
        the number of characters written

    -   the `writelines()` method takes a list of strings, and writes
        them to the file one after the other (no newlines are added, so
        each string must already end in `"\n"` to form a line of its own)

    -   the `flush()` method forces the buffer to be flushed (see next slide)

    -   the `close()` method finishes the creation of the file, and
        ensures that the file is closed correctly
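
A short sketch of the writing methods, assuming a hypothetical file in a temporary directory. It shows the return value of `write()` and that `writelines()` does not add any newlines itself.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "write_demo.txt")

fw = open(path, "w")
n_chars = fw.write("hello\n")          # returns the number of characters
fw.writelines(["a", "b\n", "c\n"])     # "a" and "b" end up on the same line
fw.close()

fr = open(path)
content = fr.read()
fr.close()

print(n_chars)        # 6
print(repr(content))  # 'hello\nab\nc\n'
```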

### File Output: Buffering

Basic facts about buffering and flushing:

-   changes made to a file are not stored immediately

-   Python and the operating system **buffer** the changes in order to
    make file operations more efficient (important because access to a
    storage device is much slower than access to memory)

-   by default, buffers are only **flushed** to the file when needed
    (e.g. when the buffer is full)

-   both `flush()` and `close()` cause a flush

-   buffers are always flushed when a Python program ends normally

-   but if the program crashes, buffers might not be flushed\
    ($\Rightarrow$ not all the output up to the crash will end up in the
    file)
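
The effect of flushing can be made visible with a second file handle that reads what has actually reached the file. This is only a sketch with a hypothetical file name. What the first `read()` returns depends on the buffer size, although for such a short text it is typically the empty string.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "buffer_demo.txt")

fw = open(path, "w")
fw.write("buffered text\n")    # goes into the buffer first

fr = open(path)
before_flush = fr.read()       # typically still empty at this point
fr.close()

fw.flush()                     # force the buffer to be written to the file

fr = open(path)
after_flush = fr.read()        # now the text has arrived in the file
fr.close()

fw.close()

print(repr(before_flush))
print(repr(after_flush))
```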

### File Input and Output: Example


In [19]:
fp = open("pc_writetest.txt", "w")
while True:
    text = input("Enter a line to write to the file, or press ENTER to quit! ")
    if text == "":
        break
    fp.write(text+"\n")
fp.close()

print("Here is what was written into the file:")
print("---------------------------------------")
fp = open("pc_writetest.txt")
content = fp.read()
fp.close()
print(content)


Enter a line to write to the file, or press ENTER to quit! 
Here is what was written into the file:
---------------------------------------



### File Input and Output: Modes

List of all possibilities for the second argument of `open()`,
configuring the mode in which a file is opened:

-   `"r"` for reading (default in absence of second argument)

-   `"w"` for writing (will replace the content of the file if it
    exists!)

-   `"a"` for appending at the end of an existing file

-   `"x"` for creating a new file with that name (fails with a
    `FileExistsError` if the file already exists)

-   `"r+"` for updating (reading and writing, difficult!)

The default mode assumes you are writing and reading text. The letter
`"b"` is used for binary files, e.g. `"wb"` mode to write to a binary
file (beyond the scope of this course).
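
The difference between the writing modes can be sketched as follows. The file name is hypothetical, and the `try` construct used to catch the `FileExistsError` is explained in the sections on exception handling.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

f = open(path, "w")        # creates the file
f.write("old\n")
f.close()

f = open(path, "w")        # replaces the existing content
f.write("new\n")
f.close()

f = open(path, "a")        # keeps the content and appends at the end
f.write("appended\n")
f.close()

try:
    f = open(path, "x")    # refuses to open a file that already exists
    f.close()
    x_failed = False
except FileExistsError:
    x_failed = True

f = open(path)             # default mode "r"
content = f.read()
f.close()

print(repr(content))       # 'new\nappended\n'
print(x_failed)            # True
```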

### Exception Handling

File I/O is where **exception handling** becomes important:

-   files might not exist (`FileNotFoundError`)

-   files might not have the correct permissions (`PermissionError`)

-   the storage device might run out of space (`OSError`; `IOError` is
    an alias for it in Python 3)

A good program will catch all of these **exceptions**, and fail
gracefully!

### Exception Handling: The `try` construct

Syntax of the `try` construct:

-   all the statements which could raise exceptions are wrapped in a
    `try` block, which prevents crashes when an exception occurs

-   following `except` blocks can catch different pre-defined error
    types, and define how each type of error is handled

-   a `finally` block contains the statements which will be executed in
    any case, whether the `try` block was exited via an exception or not


-   Example: (from https://www.geeksforgeeks.org/try-except-else-and-finally-in-python/)

In [20]:
def divide(x, y):
    try:
        # Floor division: gives only the integer part
        # of the quotient (rounded down) as the answer
        result = x // y
    except ZeroDivisionError:
        print("Sorry! You are dividing by zero")
    else:
        # this block is only executed if no exception was raised
        print("Yeah! Your answer is:", result)
    finally:
        # this block is always executed
        # regardless of exception generation.
        print('This is always executed')

In [21]:
divide(1,10)

Yeah! Your answer is: 0
This is always executed


In [22]:
divide(1, 0)

Sorry! You are dividing by zero
This is always executed


- Exception handling is important in connection with file operations. 
- Whenever you open a file, you need to close it no matter what!
- a `try`-`except`-`finally` construction with unspecified exception type can guarantee that

In [23]:
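f = None
try:
    f = open("some_text.txt")
    # the file is opened for reading, so write() raises an exception
    text = f.write("whatever")
except:
    print("something bad happened")
finally:
    print("this is executed regardless")
    # close the file only if open() succeeded
    if f is not None:
        f.close()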


something bad happened
this is executed regardless
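
Instead of an unspecified exception type, the specific exceptions mentioned earlier can be caught separately. The following is one possible sketch, not the only way to structure this. The function `count_lines` and the file names are hypothetical and only serve the illustration.

```python
import os
import tempfile

def count_lines(path):
    """Return the number of lines in a text file, or None if it cannot be opened."""
    try:
        f = open(path)
    except FileNotFoundError:
        print("File not found:", path)
        return None
    except PermissionError:
        print("No permission to read:", path)
        return None
    try:
        return len(f.readlines())
    finally:
        # executed even if readlines() raises an exception
        f.close()

# Hypothetical demo files: one that exists and one that does not.
demo_dir = tempfile.mkdtemp()
existing = os.path.join(demo_dir, "three_lines.txt")
fw = open(existing, "w")
fw.write("a\nb\nc\n")
fw.close()
missing = os.path.join(demo_dir, "does_not_exist.txt")

print(count_lines(existing))
print(count_lines(missing))
```

Opening the file in its own `try` block keeps `f.close()` from being called on a handle that was never created.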


### Exception Handling: The `with` construct

Another option is to open files in a `with` block:

-   the clean-up procedure of the object (e.g. a file handle) will be
    executed whether an exception is raised or not

-   in the case of a file handle, the clean-up procedure includes
    `close()`

-   exceptions will not be handled, but percolated up to any surrounding
    `try` blocks (terminating the program in absence of matching
    `except`)

-   example:

In [24]:
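with open("some_text.txt") as f:
    # the file is opened for reading, so write() raises an exception;
    # the with block still closes the file before the exception percolates up
    f.write("more text")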


UnsupportedOperation: not writable
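
To see both effects at once, the `with` block can be wrapped in a surrounding `try` block. The sketch below assumes a hypothetical demo file in a temporary directory. It catches the percolated exception and then checks that the file handle was closed anyway.

```python
import io
import os
import tempfile

# Hypothetical demo file.
path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")
fw = open(path, "w")
fw.write("some text\n")
fw.close()

caught = False
try:
    with open(path) as f:        # opened for reading
        f.write("more text")     # raises io.UnsupportedOperation
except io.UnsupportedOperation as err:
    caught = True
    print("caught:", err)

# The with block has closed the file although an exception was raised.
print(f.closed)   # True
```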