<font color="white">.</font> | <font color="white">.</font> | <font color="white">.</font>
-- | -- | --
![NASA](http://www.nasa.gov/sites/all/themes/custom/nasatwo/images/nasa-logo.svg) | <h1><font size="+3">ASTG Python Courses</font></h1> | ![NASA](https://www.nccs.nasa.gov/sites/default/files/NCCS_Logo_0.png)

---
<center>
<H1 style="color:red">
File Input and Output (IO)
</H1>
</center>


## <font color='red'>Types of Files</font>

![file](https://raw.githubusercontent.com/astg606/py_materials/master/input_output/fileFormats.jpg)

## <font color="red"> Text Files</font>

A **text file** stores human-readable characters (encoded, for example, as ASCII or UTF-8) and can be opened by a standard text editor without any special handling. Every text file follows a few rules:

* Text files have to be readable as is.
* Data in a text file is organized by lines. 
* Every line in a text file ends with an invisible character that tells the text editor where a new line starts. You can take advantage of this character when processing a file. In Python, it is written as `"\n"` (the newline character).
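To make the hidden newline characters visible, we can write a short text to a file and print the `repr()` of what we read back. This is a minimal sketch: the file name `newline_demo.txt` (in a temporary directory) is hypothetical, and `write()`/`read()` are covered in more detail below.

```python
import os
import tempfile

# Hypothetical file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "newline_demo.txt")

# Two lines of text: each one is terminated by "\n"
with open(path, "w") as fid:
    fid.write("first line\nsecond line\n")

# repr() shows the otherwise invisible newline characters
with open(path) as fid:
    contents = fid.read()

print(repr(contents))         # 'first line\nsecond line\n'
print(contents.count("\n"))   # 2 (one newline per line)
```

In text mode, Python translates the platform's line ending (for example `\r\n` on Windows) to `\n` when reading, so the result is the same on every operating system.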

## <font color="red">Reading Text Files</font>

* Before you can read (or write) a file, you have to open it using Python's built-in `open()` function. 
* The `open()` function returns a file object, whose methods are then used to read from or write to the file.

In [None]:
help(open)

**The Basic Syntax**

```python
file_object = open(file_name[, access_mode][, buffering])
```

* `file_name`: a string containing the name (or path) of the file you want to access.
* `access_mode`: determines how the file is opened, e.g., for reading, writing, or appending. This parameter is optional; the default access mode is read (`'r'`).
* `buffering`: if set to 0, no buffering takes place (allowed only in binary mode). If set to 1, line buffering is performed (only meaningful in text mode). If set to an integer greater than 1, buffering is performed with that buffer size. If negative (the default, -1), the system default buffering policy is used.
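In Python 3 the `buffering` value interacts with the mode: `buffering=0` is accepted only for binary files, while `buffering=1` (line buffering) applies to text files. A minimal sketch checking this, using a hypothetical temporary file `buffering_demo.txt`:

```python
import os
import tempfile

# Hypothetical file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "buffering_demo.txt")

# Line buffering (buffering=1) in text mode: output is flushed at each newline
with open(path, mode="w", buffering=1, encoding="utf-8") as fid:
    fid.write("hello\n")

# Unbuffered access (buffering=0) is allowed in binary mode
with open(path, mode="rb", buffering=0) as fid:
    raw = fid.read()
print(raw)

# Requesting unbuffered *text* I/O raises ValueError
error_message = None
try:
    open(path, mode="r", buffering=0)
except ValueError as err:
    error_message = str(err)
print("ValueError:", error_message)
```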


**Summary of `open()` file access modes**

| Mode | Description |
| --- | --- |
| r | Opens a file for reading only. The file must exist. This is the default mode. |
| rb | Opens a file for reading only in binary format. |
| r+ | Opens a file for both reading and writing. The file pointer starts at the beginning of the file. |
| rb+ | Opens a file for both reading and writing in binary format. |
| w | Opens a file for writing only. Overwrites the file if it exists; creates a new file if it does not exist. |
| wb | Opens a file for writing only in binary format. Overwrites or creates the file, like `w`. |
| w+ | Opens a file for both writing and reading. Overwrites or creates the file, like `w`. |
| wb+ | Opens a file for both writing and reading in binary format. Overwrites or creates the file, like `w`. |
| a | Opens a file for appending. The file pointer is at the end of the file if it exists; creates a new file if it does not exist. |
| ab | Opens a file for appending in binary format. |
| a+ | Opens a file for both appending and reading. |
| ab+ | Opens a file for both appending and reading in binary format. |
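The difference between `'w'` and `'a'` is easy to check. A minimal sketch with a hypothetical file `modes_demo.txt` in a temporary directory: opening with `'w'` a second time discards the old contents, while `'a'` keeps them and writes at the end.

```python
import os
import tempfile

# Hypothetical file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

with open(path, "w") as fid:    # 'w' creates the file
    fid.write("first\n")
with open(path, "w") as fid:    # 'w' again: the old contents are discarded
    fid.write("second\n")
with open(path) as fid:
    after_w = fid.read()
print(repr(after_w))            # 'second\n'

with open(path, "a") as fid:    # 'a' keeps the contents and writes at the end
    fid.write("third\n")
with open(path) as fid:
    after_a = fid.read()
print(repr(after_a))            # 'second\nthird\n'
```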

Obtain the remote file `demo.txt`:

In [None]:
import urllib.request
url = "https://raw.githubusercontent.com/astg606/py_materials/master/input_output/"
file_name = "demo.txt"
urllib.request.urlretrieve(url+file_name, file_name)

In [None]:
file_object = open('demo.txt', 'r') # 'r' is default

A file object can be iterated over like a sequence of strings, one string per line.

#### What type of object is file_object?

In [None]:
print("file_object is of type: ", type(file_object))

In [None]:
# file_object.<TAB>
dir(file_object) # attributes and methods of file objects

Extract the file name:

In [None]:
print(file_object.name)

Determine the access mode:

In [None]:
print(file_object.mode)

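Determine if the file is closed, then close it (a file opened without `with` must be closed explicitly):

In [None]:
print(file_object.closed)   # False: the file is still open
file_object.close()         # release the file when you are done with it
print(file_object.closed)   # True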

### Print all the lines and count the number of lines

In [None]:
fid = open('demo.txt', 'r')
count = 0
for line in fid: # treating fid as a sequence of strings
    count = count + 1
    print(line)
    
print(f"My file has {count} lines.")    
fid.close()

#### Note: Each line includes a non-printing character called the newline character `\n`.
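This is why `print(line)` in the loop above shows an empty line between lines: `print()` adds its own newline after the one already stored in `line`. A minimal sketch of two common fixes, using a hypothetical temporary file: strip the trailing newline with `rstrip("\n")`, or tell `print()` not to add one with `end=""`.

```python
import os
import tempfile

# Hypothetical three-line file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "lines_demo.txt")
with open(path, "w") as fid:
    fid.write("alpha\nbeta\ngamma\n")

# Option 1: remove the trailing newline from each line
with open(path) as fid:
    stripped = [line.rstrip("\n") for line in fid]
print(stripped)   # ['alpha', 'beta', 'gamma']

# Option 2: keep the newline but stop print() from adding a second one
with open(path) as fid:
    for line in fid:
        print(line, end="")
```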

### Automatically closing files

Another way to open and close a file is the `with` statement, which closes the file automatically:

In [None]:
count = 0
with open('demo.txt', 'r') as fid:
    for line in fid:
        count += 1
        print(line)
    print(' (IN)--> Is file closed? ', fid.closed)
    # file will be closed after exiting this block of code
    
print(f"My file has {count} lines.")    

In [None]:
print('(OUT)<-- Is file closed? ', fid.closed)

- We do not have to close the file object explicitly.
- As soon as execution leaves the `with` block, the file is closed automatically.
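The file is also closed when an exception is raised inside the `with` block. A short sketch that raises an error on purpose while a hypothetical temporary file is open:

```python
import os
import tempfile

# Hypothetical file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")
with open(path, "w") as fid:
    fid.write("some data\n")

try:
    with open(path) as fid:
        first_line = fid.readline()
        # Simulate an error while the file is open
        raise RuntimeError("something went wrong")
except RuntimeError as err:
    print("Caught:", err)

print("Is file closed?", fid.closed)   # True: the with block closed it anyway
```

Without `with`, the same guarantee requires a `try`/`finally` block that calls `fid.close()` in the `finally` clause.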

### <font color="blue">Breakout 1</font> 
Read the text file `demo.txt` and count the number of lines excluding empty lines.

<details><summary><b><font color="red">Click here to access the solution</font></b></summary>
<p>
    
```python
count = 0
with open('demo.txt', 'r') as fid:
    for line in fid:
        if line.strip():
            count += 1
            print(line)
    
print(f"My file has {count} lines.")   
``` 
</p>
</details>

### Reading the entire file at once

In [None]:
with open('demo.txt','r') as fid:
    # read() reads the _entire_ file, returns a string object
    data = fid.read()           
    print("Contents of file are of type: ", type(data))

The contents of the file are now stored in memory in `data` (a string), so we can still use them after the file has been closed:

In [None]:
heading = "Contents of file"
print("\n" + heading + "\n" + "-"*len(heading))
print(data)
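Because `read()` returns an ordinary string, all string methods are available for processing. A minimal sketch with a hypothetical two-line file that counts characters, lines and words:

```python
import os
import tempfile

# Hypothetical file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "read_demo.txt")
with open(path, "w") as fid:
    fid.write("one two three\nfour five\n")

with open(path) as fid:
    text = fid.read()          # the whole file as one string

# The file is closed, but the string is still available
lines = text.splitlines()      # list of lines without the "\n"
words = text.split()           # split on any whitespace
print(len(text), "characters,", len(lines), "lines,", len(words), "words")
```

Keep in mind that `read()` loads the whole file into memory, so for very large files reading in chunks or line by line is usually preferable.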

### Read file chunks

In [None]:
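num_chars = 64   # in text mode, read(n) reads at most n characters (bytes in binary mode)

with open('demo.txt', 'r') as fid:
    while True:
        data = fid.read(num_chars)   # read the next chunk of up to num_chars characters
        if not data:                 # an empty string means the end of the file
            break
        print(data)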
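In text mode, the size passed to `read()` counts characters, while in binary mode (`'rb'`) it counts bytes. The two differ as soon as the file contains non-ASCII characters. A minimal sketch with a hypothetical UTF-8 file:

```python
import os
import tempfile

# Hypothetical file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "chunk_demo.txt")
with open(path, "w", encoding="utf-8") as fid:
    fid.write("héllo")                  # 5 characters, but 'é' takes 2 bytes in UTF-8

with open(path, encoding="utf-8") as fid:
    text_chunk = fid.read(5)           # text mode: at most 5 characters
with open(path, "rb") as fid:
    byte_chunk = fid.read(5)           # binary mode: at most 5 bytes

print(text_chunk)   # héllo
print(byte_chunk)   # b'h\xc3\xa9ll'
```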

### Read one line at a time: `readline()` method

Read the first line:

In [None]:
with open('demo.txt') as fid:
    data = fid.readline()
    print(data)

Read all the lines, one line at a time:

In [None]:
with open('demo.txt') as fid:
    while True:
        data = fid.readline()   # returns '' only at the end of the file
        if not data:
            break
        print(data)
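Reading line by line with `readline()` works because it returns an empty string `''` only at the end of the file; a blank line inside the file comes back as `'\n'`, which is not empty. A small sketch with a hypothetical file whose second line is blank:

```python
import os
import tempfile

# Hypothetical file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "readline_demo.txt")
with open(path, "w") as fid:
    fid.write("first\n\nthird\n")      # the second line is blank

results = []
with open(path) as fid:
    for _ in range(4):                 # one more call than there are lines
        results.append(fid.readline())

print(results)   # ['first\n', '\n', 'third\n', '']
```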

### Read all the lines at once: `readlines()` method

In [None]:
with open('demo.txt') as fid:
    data = fid.readlines()

Note that `data` is a list of lines:

In [None]:
print(type(data))

In [None]:
print(data)  
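Note that each element returned by `readlines()` still ends with `'\n'`. A minimal sketch with a hypothetical file comparing `readlines()`, `list(fid)` (which gives the same list) and `read().splitlines()` (which drops the newlines):

```python
import os
import tempfile

# Hypothetical file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "readlines_demo.txt")
with open(path, "w") as fid:
    fid.write("x\ny\n")

with open(path) as fid:
    with_newlines = fid.readlines()              # ['x\n', 'y\n']
with open(path) as fid:
    as_list = list(fid)                          # same result as readlines()
with open(path) as fid:
    without_newlines = fid.read().splitlines()   # ['x', 'y']

print(with_newlines, as_list, without_newlines)
```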

### <font color="blue">Breakout 2</font>
Read the text file `demo.txt`, then print and count all the lines that contain the word "Luke".

<details><summary><b><font color="red">Click here to access the solution</font></b></summary>
<p>
    
```python
with open('demo.txt') as fid:
    data = fid.readlines()

count = 0
for line in data:
    if "Luke" in line:
        count += 1
        print(line)
        
print(f"There are {count} lines with the word Luke.")  
``` 
</p>
</details>

## <font color="red">Writing Text Files</font>

* The `write()` method writes a string to an open file.
* The `write()` method does not add a newline character (`'\n'`) to the end of the string.
* The `writelines()` method takes a list (or any iterable) of strings and writes them one after another. Like `write()`, it does not add newlines: each entry ends up on its own line only if it already ends with `'\n'`.

In [None]:
with open('elements.txt', 'w') as fid: # 'w' creates a new file
    fid.write('Noble gases: ')              
    fid.writelines(['He', 'Ne', 'Ar'])  # no newlines are added: this writes 'HeNeAr'

In [None]:
!cat elements.txt

**Note: Python does not write `'\n'` for you.**
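A common way to get one entry per line is to add the `'\n'` yourself, for example with a generator expression passed to `writelines()`. A minimal sketch with a hypothetical file name; it also shows that `write()` returns the number of characters written:

```python
import os
import tempfile

# Hypothetical file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "gases_demo.txt")
gases = ["He", "Ne", "Ar"]

with open(path, "w") as fid:
    num_chars_written = fid.write("Noble gases:\n")   # write() returns the number of characters
    fid.writelines(f"{gas}\n" for gas in gases)       # add "\n" to each entry ourselves

with open(path) as fid:
    contents = fid.read()

print(num_chars_written)   # 13
print(repr(contents))      # 'Noble gases:\nHe\nNe\nAr\n'
```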

#### 'a+' vs 'r+'

`'a'` is append mode and does not allow reading, so the following cell raises `io.UnsupportedOperation: not readable`:

In [None]:
with open('elements.txt', 'a') as fid:
    contents = fid.read()

`'a+'` is for appending and reading:
- The file is opened for reading and writing
- The file is created if it does not exist.
- The file pointer is at the end of the file.

In [None]:
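with open('elements.txt', 'a+') as fid:
    contents = fid.read()                  # '' because the pointer starts at the end of the file
    print("File position: ", fid.tell())   # position of the end of the file
    fid.write('Kr\n')                      # appended at the end of the file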

`fid.tell()` shows that the file pointer is at the end of the file (EOF), which is why `fid.read()` returned an empty string and `'Kr\n'` was appended at that position.

In [None]:
!cat elements.txt
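Because `'a+'` starts at the end of the file, reading the existing contents requires moving the file pointer back first with `seek(0)`. A minimal sketch with a hypothetical file:

```python
import os
import tempfile

# Hypothetical file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "append_demo.txt")
with open(path, "w") as fid:
    fid.write("He\n")

with open(path, "a+") as fid:
    at_end = fid.read()      # '': the pointer starts at the end of the file
    fid.seek(0)              # move back to the beginning
    existing = fid.read()    # 'He\n' (the pointer is now at EOF again)
    fid.write("Ne\n")        # appended at the end of the file

with open(path) as fid:
    final = fid.read()
print(repr(at_end), repr(existing), repr(final))
```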

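`'r+'` is for reading and writing:
- Opens a file for both reading and writing. The file must already exist.
- The file pointer will be at the beginning of the file.
- The file is not truncated: new text overwrites the existing characters starting at the current position.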

In [None]:
with open('elements.txt', 'r+') as fid:
    print("File position: ", fid.tell()) # file pointer is at 'beginning of file'
    fid.write('Halogens:\n')          
    fid.writelines(['F\n', 'Cl\n'])

In [None]:
with open('elements.txt') as fid:
    print(fid.readlines())

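`fid.tell()` showed that the file pointer was at the beginning of the file (position 0), so the new text was written starting there, overwriting the characters that were already in the file.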

In [None]:
!cat elements.txt
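In `'r+'` mode, text written at the current position replaces the characters already there, and everything after it is kept. A small sketch with a hypothetical file; it also uses `seek(0, os.SEEK_END)` (an end-relative seek with offset 0, which text files allow) to add text at the end:

```python
import os
import tempfile

# Hypothetical file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "overwrite_demo.txt")
with open(path, "w") as fid:
    fid.write("0123456789\n")

with open(path, "r+") as fid:
    fid.write("AB")               # overwrites '0' and '1'; the rest is kept
    fid.seek(0, os.SEEK_END)      # jump to the end of the file
    fid.write("end\n")            # this text is appended

with open(path) as fid:
    result = fid.read()
print(repr(result))   # 'AB23456789\nend\n'
```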

### <font color="blue">Breakout 3</font>
Write a program that reads the file `demo.txt` and writes out a new file with the lines in reversed order (i.e., the first line of the old file becomes the last line of the new file).

<details><summary><b><font color="red">Click here to access the solution</font></b></summary>
<p>
    
```python
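with open('demo.txt') as fid:
    data = fid.readlines()

# The last line of a file may lack a trailing '\n'; add one so that
# no two lines get merged once the order is reversed
data = [line if line.endswith('\n') else line + '\n' for line in data]
data.reverse()
with open('demo_reversed.txt', 'w') as fid:
    fid.writelines(data)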
``` 
</p>
</details>

## Summary of basic file IO functions and methods

<table style="width:100%">
  <tr>
    <th>Methods and functions</th>
    <th>Description</th> 
  </tr>
  <tr>
    <td>open()</td>
    <td>Returns a file object and is most commonly used with two arguments: open(filename, mode)</td> 
  </tr>
  <tr>
    <td>file.close()</td>
    <td>Close the file.</td> 
  </tr>
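  <tr>
    <td>file.read([size])</td>
    <td>Read the entire file and return it as a single string. If size is specified, read at most size characters (bytes in binary mode).</td> 
  </tr>
  <tr>
    <td>file.readline([size])</td>
    <td>Read one line from the file. If size is specified, read at most size characters (bytes in binary mode).</td> 
  </tr>
  <tr>
    <td>file.readlines([hint])</td>
    <td>Read all the lines from the file and return them as a list. If hint is specified, stop reading further lines once the total size of the lines read so far exceeds hint.</td> 
  </tr>
  <tr>
    <td>file.tell()</td>
    <td>Returns the file object's current position in the file.</td> 
  </tr>
  <tr>
    <td>file.seek(offset)</td>
    <td>Changes the file object's current position to offset. For text files, offset should be 0 or a value returned by tell().</td> 
  </tr>
  <tr>
    <td>file.write(string)</td>
    <td>Writes the contents of string to the file and returns the number of characters written.</td> 
  </tr>
  <tr>
    <td>file.writelines(lines)</td>
    <td>Writes each string in lines to the file. No newline characters are added.</td> 
  </tr>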
</table>

# Extra material

### OS dependent functions

In [None]:
import os

The `os` module provides functions for many file-processing operations, such as renaming (`os.rename()`) and deleting (`os.remove()`) files, as well as low-level file IO (`os.open()`, `os.read()`, `os.write()`, `os.close()`).
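A minimal sketch of renaming and deleting a file with `os`, using hypothetical file names in a temporary directory:

```python
import os
import tempfile

folder = tempfile.mkdtemp()
old_path = os.path.join(folder, "old_name.txt")   # hypothetical file names
new_path = os.path.join(folder, "new_name.txt")

with open(old_path, "w") as fid:
    fid.write("temporary data\n")

os.rename(old_path, new_path)      # rename (move) the file
print(os.path.exists(old_path), os.path.exists(new_path))   # False True
print(os.listdir(folder))          # ['new_name.txt']

os.remove(new_path)                # delete the file
print(os.path.exists(new_path))    # False
```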

In [None]:
help(os.read)

In [None]:
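fd = os.open('demo.txt', os.O_RDONLY)   # returns an integer file descriptor
ret = os.read(fd, 35)                   # read at most 35 bytes; returns a bytes object
print('Result from os.read:' + '\n' + 20*'-' + '\n', ret)
os.close(fd)                            # file descriptors must be closed explicitly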
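Unlike the built-in `open()`, `os.open()` returns an integer file descriptor, and `os.read()`/`os.write()` work with `bytes`, not `str`. A minimal round-trip sketch with a hypothetical file, encoding the text before writing and decoding it after reading:

```python
import os
import tempfile

# Hypothetical file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "os_demo.txt")

# Create (or truncate) the file for writing; 0o644 sets the permissions on POSIX systems
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
written = os.write(fd, "Noble gases\n".encode("utf-8"))   # os.write() needs bytes
os.close(fd)

fd = os.open(path, os.O_RDONLY)
raw = os.read(fd, 100)              # read at most 100 bytes
os.close(fd)

print(type(fd), written)            # fd is an int; written is the number of bytes written
print(raw.decode("utf-8"))          # convert the bytes back to a str
```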

### File position

In [None]:
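with open('demo.txt') as f:
    # seek(offset) changes the file object's position
    # (for text files, only 0 or a value returned by tell() is guaranteed to be valid)
    f.seek(5)
    data = f.readline()   # read from position 5 to the end of that line
    print(data)

In [None]:
with open('demo.txt') as f:
    f.seek(5)
    data = f.readline()
    print(data)
    k = f.tell()          # tell() returns the current position in the file
    print(k)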
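In text mode, the safest way to jump back to a position is to save the value returned by `tell()` and pass it to `seek()` later, like a bookmark. A minimal sketch using a hypothetical three-line file:

```python
import os
import tempfile

# Hypothetical file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "seek_demo.txt")
with open(path, "w") as fid:
    fid.write("line 1\nline 2\nline 3\n")

with open(path) as fid:
    fid.readline()              # skip the first line
    bookmark = fid.tell()       # remember where the second line starts
    second = fid.readline()     # 'line 2\n'
    fid.seek(bookmark)          # go back to the saved position
    again = fid.readline()      # reads 'line 2\n' a second time

print(second == again)   # True
```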

### Using `print()` to automatically add newlines

In [None]:
with open('elementsWithNewLine.txt', 'w') as f:
    print('Noble gases', file=f)       # print automatically adds a newline
    for gas in ['He', 'Ne', 'Ar', 'Kr']:
        print(gas, file=f)

In [None]:
!cat elementsWithNewLine.txt
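`print()` also accepts the `sep` and `end` keyword arguments, so a whole list can be written with a single call. A minimal sketch with a hypothetical file name: `sep="\n"` puts each item on its own line and the default `end="\n"` terminates the last one.

```python
import os
import tempfile

# Hypothetical file in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "print_demo.txt")
gases = ["He", "Ne", "Ar", "Kr"]

with open(path, "w") as fid:
    print("Noble gases", file=fid)
    print(*gases, sep="\n", file=fid)                  # one gas per line
    print("He", "Ne", sep=", ", end="!\n", file=fid)   # custom separator and ending

with open(path) as fid:
    contents = fid.read()
print(contents)
```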