In [1]:
import os

To get the list of files in a directory, use `os.listdir`. You pass an absolute or relative path of a directory as the argument to the function.

In [3]:
help(os.listdir)

Help on built-in function listdir in module posix:

listdir(path=None)
    Return a list containing the names of the files in the directory.
    
    path can be specified as either str, bytes, or a path-like object.  If path is bytes,
      the filenames returned will also be bytes; in all other circumstances
      the filenames returned will be str.
    If path is None, uses the path='.'.
    On some platforms, path may also be specified as an open file descriptor;\
      the file descriptor must refer to a directory.
      If this functionality is unavailable, using it raises NotImplementedError.
    
    The list is in arbitrary order.  It does not include the special
    entries '.' and '..' even if they are present in the directory.



In [4]:
os.listdir('.') # relative path

['.bashrc',
 '.bash_logout',
 '.profile',
 '.ipython',
 '.local',
 '.ipynb_checkpoints',
 '.cache',
 'postBuild',
 '.jovianrc',
 '.config',
 'python-os-and-filesystem.ipynb',
 '.git']

In [5]:
os.listdir('/usr') # absolute path

['bin', 'share', 'lib', 'games', 'sbin', 'local', 'src', 'include']
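Since `os.listdir` returns names in arbitrary order and mixes files with directories, it often helps to sort and filter the result. Here is a minimal sketch; the temporary directory and the file names `a.txt`, `b.txt` and `subdir` are made up for illustration.

```python
import os
import tempfile

# Build a throwaway directory so the example does not depend on your machine
with tempfile.TemporaryDirectory() as tmp_dir:
    os.makedirs(os.path.join(tmp_dir, 'subdir'))
    for name in ['b.txt', 'a.txt']:
        with open(os.path.join(tmp_dir, name), 'w') as f:
            f.write('hello')

    # os.listdir gives names in arbitrary order, so sort for a stable result
    entries = sorted(os.listdir(tmp_dir))

    # os.listdir returns bare names, so join them with the directory
    # before checking whether each one is a regular file
    files_only = [name for name in entries
                  if os.path.isfile(os.path.join(tmp_dir, name))]

print(entries)
print(files_only)
```

Note that the names returned by `os.listdir` are relative to the directory you listed, which is why the sketch joins them with `tmp_dir` before calling `os.path.isfile`.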

A new directory can be created using `os.makedirs`. Let's create a new directory called `data`, where we'll later download some files.

In [6]:
os.makedirs('./data', exist_ok=True)

In [8]:
help(os.makedirs)

Help on function makedirs in module os:

makedirs(name, mode=511, exist_ok=False)
    makedirs(name [, mode=0o777][, exist_ok=False])
    
    Super-mkdir; create a leaf directory and all intermediate ones.  Works like
    mkdir, except that any intermediate path segment (not just the rightmost)
    will be created if it does not exist. If the target directory already
    exists, raise an OSError if exist_ok is False. Otherwise no exception is
    raised.  This is recursive.



Can you figure out what the argument `exist_ok` does? Try using the `help` function or [read the documentation](https://docs.python.org/3/library/os.html#os.makedirs).
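If you'd like to check your answer, here is a small experiment you could run. It is only a sketch: the nested path `data/raw` inside a temporary directory is a made-up example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    nested = os.path.join(tmp_dir, 'data', 'raw')

    # Creates both 'data' and 'data/raw' in one call
    os.makedirs(nested)

    # The directory already exists, but exist_ok=True suppresses the error
    os.makedirs(nested, exist_ok=True)

    # With the default exist_ok=False, the same call raises an exception
    try:
        os.makedirs(nested)
        raised = False
    except FileExistsError:
        raised = True

    created = os.path.isdir(nested)

print(created, raised)
```

`FileExistsError` is a subclass of `OSError`, which matches the description in the help text above.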

Let's verify that the directory was, in fact, created and is currently empty.

In [9]:
'data' in os.listdir('.')

True

In [10]:
os.listdir('./data')

[]

Let us download some files into the `data` directory using the `urllib` module.

In [3]:
url1 = 'https://hub.jovian.ml/wp-content/uploads/2020/08/loans1.txt'
url2 = 'https://hub.jovian.ml/wp-content/uploads/2020/08/loans2.txt'
url3 = 'https://hub.jovian.ml/wp-content/uploads/2020/08/loans3.txt'

In [1]:
import urllib.request

In [None]:
urllib.request.urlretrieve(url1, './data/loans1.txt')

In [14]:
urllib.request.urlretrieve(url2, './data/loans2.txt')

('./data/loans2.txt', <http.client.HTTPMessage at 0x7fbb70093150>)

In [15]:
urllib.request.urlretrieve(url3, './data/loans3.txt')

('./data/loans3.txt', <http.client.HTTPMessage at 0x7fbb70093ad0>)

Let's verify that the files were downloaded.

In [16]:
os.listdir('./data')

['loans2.txt', 'loans1.txt', 'loans3.txt']
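As the outputs above suggest, `urlretrieve` returns a tuple containing the path of the saved file and the response headers. The sketch below shows how you might unpack it. To avoid depending on a network connection, it uses a `file://` URL pointing at a made-up local file instead of a real web address.

```python
import os
import tempfile
import urllib.request
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical source file standing in for a file hosted on the web
    source = Path(tmp_dir) / 'source.txt'
    source.write_text('amount,duration\n1000,12\n')
    target = os.path.join(tmp_dir, 'copy.txt')

    # as_uri() turns the absolute path into a file:// URL
    saved_path, headers = urllib.request.urlretrieve(source.as_uri(), target)

    with open(saved_path) as f:
        downloaded = f.read()

print(downloaded)
```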

### Reading from a file 

To read the contents of a file, we first need to open the file using the built-in `open` function. The `open` function returns a file object, which provides several methods for interacting with the contents of the file.

In [17]:
file1 = open('./data/loans1.txt', mode='r')

The `open` function also accepts a `mode` argument that specifies how we can interact with the file. The following options are supported:

```
    ========= ===============================================================
    Character Meaning
    --------- ---------------------------------------------------------------
    'r'       open for reading (default)
    'w'       open for writing, truncating the file first
    'x'       create a new file and open it for writing
    'a'       open for writing, appending to the end of the file if it exists
    'b'       binary mode
    't'       text mode (default)
    '+'       open a disk file for updating (reading and writing)
    'U'       universal newline mode (deprecated)
    ========= ===============================================================
```
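To get a feel for the difference between the modes, here is a short sketch that writes, appends to, overwrites and re-reads a made-up file called `notes.txt` in a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'notes.txt')  # hypothetical file name

    # 'w' creates the file (or empties it if it already exists)
    with open(path, 'w') as f:
        f.write('first\n')

    # 'a' keeps the existing contents and adds to the end
    with open(path, 'a') as f:
        f.write('second\n')

    with open(path, 'r') as f:
        after_append = f.read()

    # Opening with 'w' again truncates the file before writing
    with open(path, 'w') as f:
        f.write('replaced\n')

    with open(path) as f:  # 'r' is the default mode
        after_overwrite = f.read()

    # 'x' refuses to open a file that already exists
    try:
        open(path, 'x')
        x_failed = False
    except FileExistsError:
        x_failed = True

print(repr(after_append), repr(after_overwrite), x_failed)
```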

To view the contents of the file we can use the `read` method of the file object.

In [18]:
file1_contents = file1.read()

In [19]:
print(file1_contents)

amount,duration,rate,down_payment
100000,36,0.08,20000
200000,12,0.1,
628400,120,0.12,100000
4637400,240,0.06,
42900,90,0.07,8900
916000,16,0.13,
45230,48,0.08,4300
991360,99,0.08,
423000,27,0.09,47200


The file contains information about loans. It is a set of comma-separated values (CSV). 

> **CSVs**: A comma-separated values (CSV) file is a delimited text file that uses a comma to separate values. Each line of the file is a data record. Each record consists of one or more fields, separated by commas. A CSV file typically stores tabular data (numbers and text) in plain text, in which case each line will have the same number of fields. (Wikipedia)

The first line of the file is the header, which indicates what each of the numbers on the remaining lines represents. Each of the remaining lines provides information about a loan. Thus, the second line `100000,36,0.08,20000` represents a loan with:

* an *amount* of `$100000`, 
* *duration* of `36` months, 
* *rate of interest* of `8%` per annum, and 
* a down payment of `$20000`

CSV is a common file format used for sharing data for analysis and visualization. Over the course of this tutorial, we will read the data from these CSV files, process it, and write the results back to files. Before we continue, let's close the file using the `close` method (otherwise the file stays open and continues to hold on to operating system resources).

In [20]:
file1.close()

Once a file is closed, it can no longer be read.

In [21]:
file1.read()

ValueError: I/O operation on closed file.

### Closing files automatically using `with`

To make it easy to automatically close a file once you are done processing it, you can open it using the `with` statement.

In [22]:
with open('./data/loans2.txt') as file2:
    file2_contents = file2.read()
    print(file2_contents)

amount,duration,rate,down_payment
828400,120,0.11,100000
4633400,240,0.06,
42900,90,0.08,8900
983000,16,0.14,
15230,48,0.07,4300



Once the statements within the `with` block are executed, the `.close` method on `file2` is automatically invoked. Let's verify this by trying to read from the file object again.

In [23]:
file2.read()

ValueError: I/O operation on closed file.
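A useful property of `with` is that the file is closed even if an error occurs inside the block. The sketch below checks this using the `closed` attribute of the file object; the file `sample.txt` and the simulated `RuntimeError` are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.txt')  # hypothetical file name
    with open(path, 'w') as f:
        f.write('amount,duration\n1000,12\n')

    try:
        with open(path) as f:
            first_line = f.readline()
            # Simulate something going wrong while processing the file
            raise RuntimeError('simulated failure')
    except RuntimeError:
        pass

    # The with statement closed the file before the exception propagated
    closed_after_error = f.closed

print(closed_after_error)
```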

### Reading a file line by line


File objects provide a `readlines` method to read a file line-by-line. 

In [2]:
with open('D:/python-os-and-filesystem-andrewt/Book1.txt', 'r') as file3:
    file3_lines = file3.readlines()

In [3]:
file3_lines

['VESTMENT\tSPEED\tPRINCIPAL\tRATE\tNAME\n',
 '0\tfast\t10000.00\t0.43\tjohnson\n',
 '0.1\tslow\t15000.00\t0.44\tstevens\n',
 '0.2\tfast\t20000.00\t0.45\tcrammlin\n',
 '0.3\tslow\t25000.00\t0.46\tjustof\n',
 '0.4\tfast\t30000.00\t0.47\tbjorn\n',
 '0.5\tslow\t35000.00\t0.48\tcarnag\n',
 '0.6\tfast\t40000.00\t0.49\tpobly\n',
 '0.7\tslow\t45000.00\t0.5\tsz\n',
 '0.8\tfast\t50000.00\t0.51\tjjoff\n',
 '0.9\tslow\t55000.00\t0.52\takarn\n',
 '1\tfast\t60000.00\t0.53\tsklaaa\n',
 '1.1\tslow\t65000.00\t0.54\tpyopi\n',
 '1.2\tfast\t70000.00\t0.55\tjames\n',
 '1.3\tslow\t75000.00\t0.56\tseaborn\n',
 '1.4\tfast\t80000.00\t0.57\twhynot\n',
 '1.5\tslow\t85000.00\t0.58\tnotsure\n',
 '1.6\tfast\t90000.00\t0.59\tjoe\n',
 '1.7\tslow\t95000.00\t0.6\tdirt\n',
 '1.8\tfast\t100000.00\t0.61\tbanana\n']
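`readlines` loads every line into a list at once. For large files, you can also loop over the file object directly, which yields one line at a time. Here is a minimal sketch using a made-up file `sample.txt`; it also removes the trailing newline from each line.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.txt')  # hypothetical sample file
    with open(path, 'w') as f:
        f.write('amount,duration\n1000,12\n2000,24\n')

    # Iterating over the file object yields one line at a time
    cleaned_lines = []
    with open(path) as f:
        for line in f:
            # Each line still ends with '\n', so strip it off
            cleaned_lines.append(line.rstrip('\n'))

print(cleaned_lines)
```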

### Processing data from files

Before performing any operations on the data stored in a file, we need to convert the contents of the file from one large string into Python data types. For the file `loans1.txt` containing information about loans in a CSV format, we can do the following:

* Read the file line by line
* Parse the first line to get list of the column names or headers
* Split each remaining line and convert each value into a float
* Create a dictionary for each loan using the headers as keys
* Create a list of dictionaries to keep track of all the loans

Since we will perform the same operations for multiple files, it would be useful to define a function `read_csv` to do this. We'll also define some helper functions to build up the functionality step by step. 

Let's start by defining a function `parse_headers` which takes a line as input and returns a list of column headers.

In [1]:
def parse_headers(header_line):
    return header_line.strip().split('\t')

The `strip` method removes any leading and trailing whitespace, including the newline character `\n`, and the `split` method breaks a string into a list using the given separator (the tab character `\t` in this case).

In [46]:
file3_lines[0]

'VESTMENT\tSPEED\tPRINCIPAL\tRATE\tNAME\n'

In [47]:
headers = parse_headers(file3_lines[0])

In [48]:
headers

['VESTMENT', 'SPEED', 'PRINCIPAL', 'RATE', 'NAME']

Next, let's define a function `parse_values` which takes a line containing some data, and returns a list of floating point numbers.

In [4]:
def parse_values(data_line):
    values = []
    digits = ['1', '2', '3', '4', '5', '6', '7', '8', '9']
    for value in data_line.strip().split('\t'):
        isFloat = False
        for digit in digits:
            if digit in value:
                values.append(float(value))
                isFloat = True
                break
        if not isFloat:
            values.append(value)
    return values
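Note that the digit check above has a couple of blind spots: the list does not include `'0'`, so a value like `'0'` stays a string, and a text value that happens to contain a digit would make `float` raise a `ValueError`. One possible alternative, sketched below under the hypothetical name `parse_values_safe`, is to simply attempt the conversion and fall back to the original string if it fails.

```python
def parse_values_safe(data_line, separator='\t'):
    """Convert each field to a float where possible, else keep the string.

    This is an illustrative variant, not the version used in the
    rest of this tutorial.
    """
    values = []
    for item in data_line.strip().split(separator):
        try:
            values.append(float(item))
        except ValueError:
            # Not a number (e.g. a name), so keep the text as-is
            values.append(item)
    return values


parsed = parse_values_safe('0\tfast\t10000.00\t0.43\tjohnson\n')
print(parsed)
```

One caveat with this approach: `float` also accepts strings such as `'nan'` and `'inf'`, so text values spelled that way would be converted to floats.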

In [130]:
file3_lines[3]

'0.2\tfast\t20000.00\t0.45\tcrammlin\n'

In [131]:
parsed_line = parse_values(file3_lines[3])

In [132]:
parsed_line

[0.2, 'fast', 20000.0, 0.45, 'crammlin']

In [133]:
for value in parsed_line:
    print(type(value))

<class 'float'>
<class 'str'>
<class 'float'>
<class 'float'>
<class 'str'>


The numeric values were converted to floating point numbers and the text values were kept as strings, as expected. Let's try it on another line from the file.

In [134]:
file3_lines[2]

'0.1\tslow\t15000.00\t0.44\tstevens\n'

In [135]:
parse_values(file3_lines[2])

[0.1, 'slow', 15000.0, 0.44, 'stevens']

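This line is parsed correctly too. However, if a line has a missing value (as in `loans1.txt`, where some loans have no down payment), the corresponding field is the empty string `''`, and calling `float` on it raises a `ValueError`. For comma-separated files that contain only numbers, we can write a version of `parse_values` that handles this *edge case* by treating missing values as `0.0`.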

In [None]:
def parse_values(data_line):
    values = []
    for item in data_line.strip().split(','):
        if item == '':
            values.append(0.0)
        else:
            values.append(float(item))
    return values

In [None]:
file3_lines[2]

In [136]:
parse_values(file3_lines[2])

[0.1, 'slow', 15000.0, 0.44, 'stevens']

Next, let's define a function `create_loan_dict` which takes a list of headers and a list of values as inputs, and returns a dictionary with the values associated with their respective headers as keys.

In [5]:
def create_loan_dict(headers, values):
    loan_dict = {}
    for header, value in zip(headers, values):
        loan_dict[header] = value
    return loan_dict

Can you figure out what the Python built-in function `zip` does? Try out an example, or [read the documentation](https://docs.python.org/3.3/library/functions.html#zip).

In [40]:
for item in zip([1,2,3], ['a', 'b', 'c']):
    print(item)

(1, 'a')
(2, 'b')
(3, 'c')
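Because `zip` produces pairs, its result can also be passed straight to `dict` to build a dictionary. Here is a small sketch with made-up headers and values; it also shows that `zip` stops at the end of the shorter input.

```python
headers = ['amount', 'duration', 'rate']
values = [100000.0, 36.0, 0.08]

# Each (header, value) pair becomes one key-value entry
loan = dict(zip(headers, values))

# zip stops as soon as the shorter list runs out, so extra headers are dropped
short = dict(zip(headers, [1.0]))

print(loan)
print(short)
```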


Let's try out `create_loan_dict` with a couple of examples.

In [138]:
file3_lines[1]

'0\tfast\t10000.00\t0.43\tjohnson\n'

In [139]:
values1 = parse_values(file3_lines[1])
create_loan_dict(headers, values1)

{'VESTMENT': '0',
 'SPEED': 'fast',
 'PRINCIPAL': 10000.0,
 'RATE': 0.43,
 'NAME': 'johnson'}

In [140]:
file3_lines[2]

'0.1\tslow\t15000.00\t0.44\tstevens\n'

In [142]:
values2 = parse_values(file3_lines[2])
create_loan_dict(headers, values2)

{'VESTMENT': 0.1,
 'SPEED': 'slow',
 'PRINCIPAL': 15000.0,
 'RATE': 0.44,
 'NAME': 'stevens'}

In [6]:
def read_csv(path):
    loanBook = []

    # Read all the lines, then close the file
    with open(path, 'r') as inFile:
        file_contents = inFile.readlines()

    # The first line contains the column headers
    headers = parse_headers(file_contents[0])

    # Every remaining line describes one loan
    for i in range(1, len(file_contents)):
        values = parse_values(file_contents[i])
        loan = create_loan_dict(headers, values)
        loanBook.append(loan)

    return loanBook

As expected, the values and headers are combined to create a dictionary with the appropriate key-value pairs.

We are now ready to put it all together and define the `read_csv` function.

In [78]:
def read_csv(path):
    result = []
    # Open the file in read mode
    with open(path, 'r') as f:
        # Get a list of lines
        lines = f.readlines()
        # Parse the header
        headers = parse_headers(lines[0])
        # Loop over the remaining lines
        for data_line in lines[1:]:
            # Parse the values
            values = parse_values(data_line)
            # Create a dictionary using values & headers
            item_dict = create_loan_dict(headers, values)
            # Add the dictionary to the result
            result.append(item_dict)
    return result

Let's try it out!

In [5]:
with open('D:/python-os-and-filesystem-andrewt/Book1.txt') as file2:
    print(file2.read())

VESTMENT	SPEED	PRINCIPAL	RATE	NAME
0	fast	10000.00	0.43	johnson
0.1	slow	15000.00	0.44	stevens
0.2	fast	20000.00	0.45	crammlin
0.3	slow	25000.00	0.46	justof
0.4	fast	30000.00	0.47	bjorn
0.5	slow	35000.00	0.48	carnag
0.6	fast	40000.00	0.49	pobly
0.7	slow	45000.00	0.5	sz
0.8	fast	50000.00	0.51	jjoff
0.9	slow	55000.00	0.52	akarn
1	fast	60000.00	0.53	sklaaa
1.1	slow	65000.00	0.54	pyopi
1.2	fast	70000.00	0.55	james
1.3	slow	75000.00	0.56	seaborn
1.4	fast	80000.00	0.57	whynot
1.5	slow	85000.00	0.58	notsure
1.6	fast	90000.00	0.59	joe
1.7	slow	95000.00	0.6	dirt
1.8	fast	100000.00	0.61	banana



In [11]:
readout = read_csv('D:/python-os-and-filesystem-andrewt/Book1.txt')

In [12]:
readout

[{'VESTMENT': '0',
  'SPEED': 'fast',
  'PRINCIPAL': 10000.0,
  'RATE': 0.43,
  'NAME': 'johnson'},
 {'VESTMENT': 0.1,
  'SPEED': 'slow',
  'PRINCIPAL': 15000.0,
  'RATE': 0.44,
  'NAME': 'stevens'},
 {'VESTMENT': 0.2,
  'SPEED': 'fast',
  'PRINCIPAL': 20000.0,
  'RATE': 0.45,
  'NAME': 'crammlin'},
 {'VESTMENT': 0.3,
  'SPEED': 'slow',
  'PRINCIPAL': 25000.0,
  'RATE': 0.46,
  'NAME': 'justof'},
 {'VESTMENT': 0.4,
  'SPEED': 'fast',
  'PRINCIPAL': 30000.0,
  'RATE': 0.47,
  'NAME': 'bjorn'},
 {'VESTMENT': 0.5,
  'SPEED': 'slow',
  'PRINCIPAL': 35000.0,
  'RATE': 0.48,
  'NAME': 'carnag'},
 {'VESTMENT': 0.6,
  'SPEED': 'fast',
  'PRINCIPAL': 40000.0,
  'RATE': 0.49,
  'NAME': 'pobly'},
 {'VESTMENT': 0.7,
  'SPEED': 'slow',
  'PRINCIPAL': 45000.0,
  'RATE': 0.5,
  'NAME': 'sz'},
 {'VESTMENT': 0.8,
  'SPEED': 'fast',
  'PRINCIPAL': 50000.0,
  'RATE': 0.51,
  'NAME': 'jjoff'},
 {'VESTMENT': 0.9,
  'SPEED': 'slow',
  'PRINCIPAL': 55000.0,
  'RATE': 0.52,
  'NAME': 'akarn'},
 {'VESTMENT': 1

The file is read and converted to a list of dictionaries, as expected. The `read_csv` function is generic enough that it can parse any CSV file of numeric values, with any number of rows or columns. Here's the full code for `read_csv` along with the helper functions:

In [None]:
def parse_headers(header_line):
    return header_line.strip().split(',')

def parse_values(data_line):
    values = []
    for item in data_line.strip().split(','):
        if item == '':
            values.append(0.0)
        else:
            values.append(float(item))
    return values

def create_item_dict(values, headers):
    result = {}
    for value, header in zip(values, headers):
        result[header] = value
    return result

def read_csv(path):
    result = []
    # Open the file in read mode
    with open(path, 'r') as f:
        # Get a list of lines
        lines = f.readlines()
        # Parse the header
        headers = parse_headers(lines[0])
        # Loop over the remaining lines
        for data_line in lines[1:]:
            # Parse the values
            values = parse_values(data_line)
            # Create a dictionary using values & headers
            item_dict = create_item_dict(values, headers)
            # Add the dictionary to the result
            result.append(item_dict)
    return result

Try to create small, generic and reusable functions whenever possible, as they will likely be useful beyond just the problem at hand, and save you a lot of effort in the future.
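It is also worth knowing that Python's standard library includes a `csv` module that handles much of this parsing. The sketch below is an alternative to the hand-written functions above, not a replacement used elsewhere in this tutorial. The function name `read_csv_with_module` and the in-memory sample data are made up for illustration.

```python
import csv
import io

# A small in-memory sample in the same shape as the loans files
sample = (
    'amount,duration,rate,down_payment\n'
    '100000,36,0.08,20000\n'
    '200000,12,0.1,\n'
)


def read_csv_with_module(file_obj):
    """Read numeric CSV data into a list of dictionaries."""
    loans = []
    # DictReader uses the first line as the keys for every row
    for row in csv.DictReader(file_obj):
        # Treat empty fields as 0.0, as parse_values does above
        loans.append({key: float(value) if value else 0.0
                      for key, value in row.items()})
    return loans


loans = read_csv_with_module(io.StringIO(sample))
print(loans)
```

`io.StringIO` stands in for an open file here; with a real file you would pass the object returned by `open` instead.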

In the [previous tutorial](https://jovian.ml/aakashns/python-functions-and-scope), we defined a function to calculate the equal monthly installments for a loan. Here's what it looked like:

In [None]:
import math

def loan_emi(amount, duration, rate, down_payment=0):
    """Calculates the equal monthly installment (EMI) for a loan.
    
    Arguments:
        amount - Total amount to be spent (loan + down payment)
        duration - Duration of the loan (in months)
        rate - Rate of interest (monthly)
        down_payment (optional) - Optional initial payment (deducted from amount)
    """
    loan_amount = amount - down_payment
    try:
        emi = loan_amount * rate * ((1+rate)**duration) / (((1+rate)**duration)-1)
    except ZeroDivisionError:
        emi = loan_amount / duration
    emi = math.ceil(emi)
    return emi
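To see the function in action on a single loan before applying it to a whole file, here is a short self-contained sketch. It repeats the same formula in a compact form and tries it on the first loan from `loans1.txt`, plus a made-up interest-free loan that exercises the `ZeroDivisionError` branch.

```python
import math


def loan_emi(amount, duration, rate, down_payment=0):
    """Same formula as above; rate is the monthly rate of interest."""
    loan_amount = amount - down_payment
    try:
        growth = (1 + rate) ** duration
        emi = loan_amount * rate * growth / (growth - 1)
    except ZeroDivisionError:
        # With a rate of 0, the loan is simply split into equal parts
        emi = loan_amount / duration
    return math.ceil(emi)


# First loan from loans1.txt: the file stores a yearly rate, so divide by 12
first_loan_emi = loan_emi(100000, 36, 0.08 / 12, 20000)

# Hypothetical interest-free loan: 1200 repaid over 12 months
interest_free_emi = loan_emi(1200, 12, 0)

print(first_loan_emi, interest_free_emi)
```

The interest-free case works out to exactly `100` per month, which is an easy way to check that the fallback branch behaves sensibly.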

We can use this function to calculate EMIs for all the loans in a file.

In [None]:
loans2 = read_csv('./data/loans2.txt')

In [None]:
loans2

In [None]:
for loan in loans2:
    loan['emi'] = loan_emi(loan['amount'], 
                           loan['duration'], 
                           loan['rate']/12, # the CSV contains yearly rates
                           loan['down_payment'])

In [None]:
loans2

You can see that each loan now has a new key `emi`, which provides the EMI for the loan. We can extract this logic into a function, so that it can be used for other files too.

In [None]:
def compute_emis(loans):
    for loan in loans:
        loan['emi'] = loan_emi(
            loan['amount'], 
            loan['duration'], 
            loan['rate']/12, # the CSV contains yearly rates
            loan['down_payment'])

### Writing to files

Now that we have performed some processing on the data, it would be a good idea to write the results back to a file in the CSV format. We can do this by creating/opening a file in write mode with `open` and using the `.write` method of the file object. The string `format` method will be useful for building each line of the output.

In [None]:
loans2 = read_csv('./data/loans2.txt')

In [None]:
compute_emis(loans2)

In [None]:
loans2

In [None]:
with open('./data/emis2.txt', 'w') as f:
    for loan in loans2:
        f.write('{},{},{},{},{}\n'.format(
            loan['amount'], 
            loan['duration'], 
            loan['rate'], 
            loan['down_payment'], 
            loan['emi']))

Let's verify that the file was created and written to as expected.

In [None]:
os.listdir('data')

In [None]:
with open('./data/emis2.txt', 'r') as f:
    print(f.read())

Great, looks like the loan details (along with the computed EMIs) were written into the file.

Let's define a generic function `write_csv` which takes a list of dictionaries and writes it to a file, one item per line. We will also include the column headers in the first line. Note that the version below joins the values with a tab character (`\t`) rather than a comma.

In [13]:
def write_csv(items, path):
    # Open the file in write mode
    with open(path, 'w') as f:
        # Return if there's nothing to write
        if len(items) == 0:
            return
        
        # Write the headers in the first line
        headers = list(items[0].keys())
        f.write('\t'.join(headers) + '\n')
        
        # Write one item per line
        for item in items:
            values = []
            for header in headers:
                values.append(str(item.get(header, "")))
            f.write('\t'.join(values) + "\n")

In [16]:
num_items = len(readout)
writeout = write_csv(readout, 'D:/python-os-and-filesystem-andrewt/test_write.txt')

In [17]:
writeout

Do you understand how the function works? If not, try executing each statement line by line in separate cells to figure out how it works. 
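For comparison, the standard library's `csv` module offers `csv.DictWriter`, which can write a list of dictionaries directly. The sketch below is one possible alternative, not the function used in the rest of this tutorial; the name `write_csv_with_module` and the sample items are made up.

```python
import csv
import io


def write_csv_with_module(items, file_obj):
    """Write a list of dictionaries as comma-separated lines."""
    # Nothing to write for an empty list
    if not items:
        return
    # Use the keys of the first item as the column headers
    writer = csv.DictWriter(file_obj,
                            fieldnames=list(items[0].keys()),
                            lineterminator='\n')
    writer.writeheader()
    writer.writerows(items)


# io.StringIO stands in for a file opened in write mode
buffer = io.StringIO()
write_csv_with_module([{'amount': 1000.0, 'emi': 90},
                       {'amount': 2000.0, 'emi': 180}], buffer)
output = buffer.getvalue()
print(output)
```

When writing to a real file with the `csv` module, the documentation recommends opening it with `newline=''` so that line endings are not translated twice.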

Let's try it out!

In [None]:
loans3 = read_csv('./data/loans3.txt')

In [None]:
compute_emis(loans3)

In [None]:
write_csv(loans3, './data/emis3.txt')

In [None]:
with open('./data/emis3.txt', 'r') as f:
    print(f.read())

With just 4 lines of code, we can now read each downloaded file, calculate the EMIs, and write the results back to new files:

In [None]:
for i in range(1,4):
    loans = read_csv('./data/loans{}.txt'.format(i))
    compute_emis(loans)
    write_csv(loans, './data/emis{}.txt'.format(i))

In [None]:
os.listdir('./data')

Isn't that wonderful? Once all the functions are defined, we can calculate EMIs for thousands or even millions of loans across many files with just a few lines of code, in a few seconds. Now we're starting to see the real power of using a programming language like Python for processing data!

## Summary and Further Reading

With this we complete our discussion of reading from and writing to files in Python. We've covered the following topics in this tutorial:

* Interacting with the filesystem using the `os` module
* Downloading files from URLs using the `urllib` module
* Opening files using the `open` built-in function
* Reading the contents of a file using `.read`
* Closing files automatically using `with`
* Reading a file line by line using `readlines`
* Processing data from a CSV file using our own functions
* Using helper functions to build more complex functions
* Writing data to a file using `.write`



This is by no means an exhaustive or comprehensive tutorial on working with files in Python. Following are some more resources you should check out:

* Python Tutorial at W3Schools: https://www.w3schools.com/python/
* Practical Python Programming: https://dabeaz-course.github.io/practical-python/Notes/Contents.html
* Python official documentation: https://docs.python.org/3/tutorial/index.html

You are ready to move on to the next tutorial: ["Object-oriented programming using classes in Python"](https://jovian.ml/aakashns/python-object-oriented-programming).