# Reading from and Writing to Files using Python

![](https://i.imgur.com/rv8wZ7l.png)

## Interacting with the OS and filesystem

The `os` module in Python provides many functions for interacting with the OS and the filesystem.

In [1]:
import os

In [2]:
# present working directory..
os.getcwd()

'c:\\Users\\sahilsharma\\Desktop\\pdf\\python code\\practice👀'

In [3]:
# list of files in pwd
os.listdir()

['Condition Stat and Loop.ipynb',
 'dataType_variables.ipynb',
 'Function in python.ipynb',
 'Reading and Writing to files.ipynb']

In [7]:
print('relative path....')
os.listdir('.')
print('absolute path..')
#os.listdir('/usr')

relative path....
absolute path..


In [8]:
# make a new dir 
os.makedirs('./data', exist_ok=True)
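
The effect of `exist_ok=True` is easiest to see by calling `os.makedirs` twice on the same path. The following is a minimal sketch, assuming a throwaway temporary directory (created with the standard `tempfile` module) instead of the real working directory; the names `base_dir` and `data_dir` are illustrative only.

```python
import os
import tempfile

# Work inside a temporary directory so the real filesystem is left untouched.
with tempfile.TemporaryDirectory() as base_dir:
    # os.path.join builds the path with the separator of the current OS.
    data_dir = os.path.join(base_dir, 'data')

    os.makedirs(data_dir, exist_ok=True)
    # The second call would raise FileExistsError without exist_ok=True.
    os.makedirs(data_dir, exist_ok=True)

    dir_exists = os.path.isdir(data_dir)
    entries = os.listdir(base_dir)

print(dir_exists)  # True
print(entries)     # ['data']
```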

Let us download some files into the `data` directory using the `urllib` module.

In [9]:
url1 = 'https://gist.githubusercontent.com/aakashns/257f6e6c8719c17d0e498ea287d1a386/raw/7def9ef4234ddf0bc82f855ad67dac8b971852ef/loans1.txt'
url2 = 'https://gist.githubusercontent.com/aakashns/257f6e6c8719c17d0e498ea287d1a386/raw/7def9ef4234ddf0bc82f855ad67dac8b971852ef/loans2.txt'
url3 = 'https://gist.githubusercontent.com/aakashns/257f6e6c8719c17d0e498ea287d1a386/raw/7def9ef4234ddf0bc82f855ad67dac8b971852ef/loans3.txt'

In [10]:
# The urllib module can be used to download files onto the system
from urllib.request import urlretrieve

In [11]:
urlretrieve(url1, './data/loans1.txt')
urlretrieve(url2, './data/loans2.txt')
urlretrieve(url3, './data/loans3.txt')

('./data/loans3.txt', <http.client.HTTPMessage at 0x1c2c351f440>)

In [12]:
# verify the new files in the new folder
os.listdir('./data/')

['loans1.txt', 'loans2.txt', 'loans3.txt']

## Reading from a file 

To read the contents of a file, we first need to open it using the built-in `open` function. The `open` function returns a file object, which provides several methods for interacting with the file's contents.

In [13]:
file1 = open('./data/loans1.txt', mode='r')

The `open` function also accepts a `mode` argument that specifies how we can interact with the file. The following options are supported:

```
    ========= ===============================================================
    Character Meaning
    --------- ---------------------------------------------------------------
    'r'       open for reading (default)
    'w'       open for writing, truncating the file first
    'x'       create a new file and open it for writing
    'a'       open for writing, appending to the end of the file if it exists
    'b'       binary mode
    't'       text mode (default)
    '+'       open a disk file for updating (reading and writing)
    'U'       universal newline mode (deprecated; removed in Python 3.11)
    ========= ===============================================================
```
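
The difference between the `'w'`, `'a'`, and `'x'` modes can be seen in a small experiment. This is a sketch only: it writes to a temporary directory, and the file name `notes.txt` is a made-up example rather than one of the files used in this notebook.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'notes.txt')  # hypothetical file name

    with open(path, 'w') as f:   # 'w' creates the file (or empties an existing one)
        f.write('first\n')

    with open(path, 'a') as f:   # 'a' keeps the existing content and appends to it
        f.write('second\n')

    with open(path, 'r') as f:
        after_append = f.read()

    with open(path, 'w') as f:   # opening with 'w' again discards the old content
        f.write('replaced\n')

    with open(path) as f:        # mode defaults to 'r' (text mode)
        after_truncate = f.read()

    try:
        open(path, 'x')          # 'x' refuses to open a file that already exists
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

print(repr(after_append))    # 'first\nsecond\n'
print(repr(after_truncate))  # 'replaced\n'
print(exclusive_failed)      # True
```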

To view the contents of the file, we can use the `read` method of the file object.

In [20]:
file1_contents = file1.read()


In [21]:
print(file1_contents)




In [22]:
file1.close()
# Once a file is closed, you can no longer read from it.
# Calling file1.read() now raises a ValueError.

## Closing files automatically using `with`

To close a file automatically after you've processed it, you can open it using the `with` statement.

In [23]:
with open('./data/loans2.txt') as file2:
    file2_contents = file2.read()
    print(file2_contents)

amount,duration,rate,down_payment
828400,120,0.11,100000
4633400,240,0.06,
42900,90,0.08,8900
983000,16,0.14,
15230,48,0.07,4300


Once the statements within the `with` block are executed, the `.close` method on `file2` is automatically invoked. Let's verify this by trying to read from the file object again.


In [24]:
file2.read()

ValueError: I/O operation on closed file.
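
File objects also expose a `closed` attribute, which lets you check the state without triggering an error. The sketch below assumes a small temporary file (`sample.txt` is an illustrative name) rather than the downloaded loan files.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.txt')  # hypothetical file name
    with open(path, 'w') as f:
        f.write('amount,duration\n')

    with open(path) as sample_file:
        closed_inside = sample_file.closed  # still open inside the block
        contents = sample_file.read()

    closed_after = sample_file.closed       # closed as soon as the block ends

    try:
        sample_file.read()
        error_message = None
    except ValueError as e:
        error_message = str(e)

print(closed_inside)  # False
print(closed_after)   # True
print(error_message)
```

The file is closed even if an exception is raised inside the `with` block, which is the main reason to prefer `with` over calling `.close` manually.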

## Reading a file line by line


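File objects provide a `readlines` method, which reads the entire file and returns its lines as a list of strings. Each string keeps its trailing newline character `\n`, except possibly the last one.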

In [25]:
with open('./data/loans3.txt', 'r') as file3:
    file3_lines = file3.readlines()

In [26]:
file3_lines

['amount,duration,rate,down_payment\n',
 '45230,48,0.07,4300\n',
 '883000,16,0.14,\n',
 '100000,12,0.1,\n',
 '728400,120,0.12,100000\n',
 '3637400,240,0.06,\n',
 '82900,90,0.07,8900\n',
 '316000,16,0.13,\n',
 '15230,48,0.08,4300\n',
 '991360,99,0.08,\n',
 '323000,27,0.09,4720010000,36,0.08,20000\n',
 '528400,120,0.11,100000\n',
 '8633400,240,0.06,\n',
 '12900,90,0.08,8900']
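
Since `readlines` loads every line into memory at once, an alternative for large files is to loop over the file object directly, which yields one line at a time. The sketch below uses `io.StringIO` as a stand-in for an open text file, with made-up sample data, so that it runs without any file on disk.

```python
import io

# io.StringIO behaves like a text file opened for reading (sample data is illustrative).
sample_file = io.StringIO('amount,duration\n45230,48\n883000,16')

line_count = 0
longest_line = ''
for line in sample_file:            # yields one line per iteration
    text = line.rstrip('\n')        # drop the trailing newline, if any
    line_count += 1
    if len(text) > len(longest_line):
        longest_line = text

print(line_count)    # 3
print(longest_line)  # amount,duration
```

The same `for line in f:` loop works on an object returned by `open`.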

### Processing data from files

In [27]:
def parse_headers(header_line):
    return header_line.strip().split(',')

The `strip` method removes leading and trailing whitespace, including the newline character `\n`. The `split` method breaks a string into a list using the given separator (`,` in this case).

In [32]:
file3_lines[0]

'amount,duration,rate,down_payment\n'

In [34]:
headers = parse_headers(file3_lines[0])
headers

['amount', 'duration', 'rate', 'down_payment']

In [36]:
def parse_values(data_line):
    values = []
    for item in data_line.strip().split(','):
        values.append(float(item))
    return values

file3_lines[1]

'45230,48,0.07,4300\n'

In [37]:
parse_values(file3_lines[1])

[45230.0, 48.0, 0.07, 4300.0]
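
The `parse_values` function above works for a complete row, but several rows in the file end with a comma because the `down_payment` value is missing. A quick check, using one such line copied from the output of `readlines`, shows what happens in that case:

```python
line = '883000,16,0.14,\n'

items = line.strip().split(',')
print(items)  # ['883000', '16', '0.14', '']

# The trailing comma produces an empty string, which float() cannot convert.
try:
    float(items[-1])
    conversion_failed = False
except ValueError:
    conversion_failed = True

print(conversion_failed)  # True
```

Empty fields therefore need to be handled explicitly before calling `float`.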

In [39]:
def parse_headers(header_line):
    return header_line.strip().split(',')

def parse_values(data_line):
    values = []
    for item in data_line.strip().split(','):
        if item == '':
            values.append(0.0)
        else:
            try:
                values.append(float(item))
            except ValueError:
                values.append(item)
    return values

def create_item_dict(values, headers):
    result = {}
    for value, header in zip(values, headers):
        result[header] = value
    return result

def read_csv(path):
    result = []
    # Open the file in read mode
    with open(path, 'r') as f:
        # Get a list of lines
        lines = f.readlines()
        # Parse the header
        headers = parse_headers(lines[0])
        # Loop over the remaining lines
        for data_line in lines[1:]:
            # Parse the values
            values = parse_values(data_line)
            # Create a dictionary using values & headers
            item_dict = create_item_dict(values, headers)
            # Add the dictionary to the result
            result.append(item_dict)
    return result

## Writing to files

Now that we have performed some processing on the data, it would be good to write the results back to a CSV file. We can create/open a file in `w` mode using `open` and write to it using the `.write` method. The string `format` method will come in handy here.

In [40]:
loans2 = read_csv('./data/loans2.txt')
loans2

[{'amount': 828400.0,
  'duration': 120.0,
  'rate': 0.11,
  'down_payment': 100000.0},
 {'amount': 4633400.0, 'duration': 240.0, 'rate': 0.06, 'down_payment': 0.0},
 {'amount': 42900.0, 'duration': 90.0, 'rate': 0.08, 'down_payment': 8900.0},
 {'amount': 983000.0, 'duration': 16.0, 'rate': 0.14, 'down_payment': 0.0},
 {'amount': 15230.0, 'duration': 48.0, 'rate': 0.07, 'down_payment': 4300.0}]

In [44]:
with open('./data/emis2.txt', 'w') as f:
    for loan in loans2:
        f.write('{},{},{},{}\n'.format(
            loan['amount'], 
            loan['duration'], 
            loan['rate'], 
            loan['down_payment']))
        
os.listdir('data')        

['emis2.txt', 'loans1.txt', 'loans2.txt', 'loans3.txt']

In [45]:
with open('./data/emis2.txt', 'r') as f:
    print(f.read())

828400.0,120.0,0.11,100000.0
4633400.0,240.0,0.06,0.0
42900.0,90.0,0.08,8900.0
983000.0,16.0,0.14,0.0
15230.0,48.0,0.07,4300.0
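
Note that the file written above has no header row. The writing logic can be wrapped in a reusable function that also writes the headers. The following is a minimal sketch of such a helper; the name `write_csv`, its signature, and the sample data are assumptions made for illustration, and it writes to a temporary directory.

```python
import os
import tempfile

def write_csv(items, path):
    """Write a list of dictionaries to `path` as comma-separated text."""
    if len(items) == 0:
        return
    # Use the keys of the first dictionary as the column headers.
    headers = list(items[0].keys())
    with open(path, 'w') as f:
        f.write(','.join(headers) + '\n')
        for item in items:
            # Missing keys are written as empty fields.
            values = [str(item.get(header, '')) for header in headers]
            f.write(','.join(values) + '\n')

sample_loans = [
    {'amount': 828400.0, 'rate': 0.11},
    {'amount': 42900.0, 'rate': 0.08},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, 'sample_loans.txt')  # hypothetical file name
    write_csv(sample_loans, out_path)
    with open(out_path) as f:
        written = f.read()

print(written)
```

This prints the header line `amount,rate` followed by one line per loan.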



## Using Pandas to Read and Write CSVs

There are some limitations to the `read_csv` function and the file-writing code shown above:

* The `read_csv` function fails to create a proper dictionary if any of the values in the CSV file contain commas, because it splits every line on `,`
* The writing code fails to produce a valid CSV if any of the values to be written contain commas, because it joins values with `,` without quoting them
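
Both problems come from treating every comma as a separator. As a point of comparison, Python's standard `csv` module understands quoted fields. The sketch below is not part of the original walkthrough; it uses in-memory sample text via `io.StringIO` to contrast a plain `split(',')` with `csv.DictReader` and `csv.writer`.

```python
import csv
import io

raw = (
    'title,description\n'
    'Fast & Furious,"A movie, a race, a franchise"\n'
    'Memento,A guy forgets everything every 15 minutes\n'
)

# A plain split cuts the quoted description at every comma.
naive = raw.splitlines()[1].split(',')
print(len(naive))  # 4

# csv.DictReader keeps the quoted field together and strips the quotes.
rows = list(csv.DictReader(io.StringIO(raw)))
print(rows[0]['description'])  # A movie, a race, a franchise

# csv.writer adds quotes around values that contain commas.
buffer = io.StringIO()
csv.writer(buffer).writerow(['Fast & Furious', 'A movie, a race, a franchise'])
written = buffer.getvalue().strip()
print(written)  # Fast & Furious,"A movie, a race, a franchise"
```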



In [47]:
movies_url = "https://gist.githubusercontent.com/aakashns/afee0a407d44bbc02321993548021af9/raw/6d7473f0ac4c54aca65fc4b06ed831b8a4840190/movies.csv"

In [48]:
urlretrieve(movies_url, 'data/movies.csv')

('data/movies.csv', <http.client.HTTPMessage at 0x1c2c36bca40>)

In [50]:
movies = read_csv('data/movies.csv')

movies

[{'title': 'Fast & Furious', 'description': '"A movie'},
 {'title': 'The Dark Knight', 'description': '"Gotham'},
 {'title': 'Memento',
  'description': 'A guy forgets everything every 15 minutes'}]

As you can see above, the movie descriptions weren't parsed properly: splitting each line on `,` cut the quoted descriptions at their first comma.

To read this CSV properly, we can use the `pandas` library.

In [52]:
!pip install pandas --upgrade --quiet

import pandas as pd




In [54]:
movies_dataframe = pd.read_csv('data/movies.csv')

movies_dataframe

|   | title | description |
|---|---|---|
| 0 | Fast & Furious | A movie, a race, a franchise |
| 1 | The Dark Knight | Gotham, the "Batman", and the Joker |
| 2 | Memento | A guy forgets everything every 15 minutes |


A dataframe can be converted into a list of dictionaries using the `to_dict` method.

In [56]:
movies = movies_dataframe.to_dict('records')

movies

[{'title': 'Fast & Furious', 'description': 'A movie, a race, a franchise'},
 {'title': 'The Dark Knight',
  'description': 'Gotham, the "Batman", and the Joker'},
 {'title': 'Memento',
  'description': 'A guy forgets everything every 15 minutes'}]

In [58]:
# If you don't pass the argument 'records', you get a dictionary of dictionaries instead (column name -> {row index: value}).
movies_dict = movies_dataframe.to_dict()
movies_dict

{'title': {0: 'Fast & Furious', 1: 'The Dark Knight', 2: 'Memento'},
 'description': {0: 'A movie, a race, a franchise',
  1: 'Gotham, the "Batman", and the Joker',
  2: 'A guy forgets everything every 15 minutes'}}
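
Pandas can also write a dataframe back to CSV using the `to_csv` method, which quotes values containing commas. The following sketch assumes a small dataframe built from sample records and writes to an in-memory buffer (`io.StringIO`) instead of a file; passing a path such as `'data/movies_copy.csv'` (a hypothetical name) would write to disk instead.

```python
import io
import pandas as pd

records = [
    {'title': 'Fast & Furious', 'description': 'A movie, a race, a franchise'},
    {'title': 'Memento', 'description': 'A guy forgets everything every 15 minutes'},
]
movies_df = pd.DataFrame(records)

# index=False leaves out the row index (0, 1, ...) so only the real columns are written.
buffer = io.StringIO()
movies_df.to_csv(buffer, index=False)
csv_lines = buffer.getvalue().splitlines()
print(csv_lines[1])  # Fast & Furious,"A movie, a race, a franchise"

# Reading the text back recovers the original records.
round_trip = pd.read_csv(io.StringIO(buffer.getvalue())).to_dict(orient='records')
print(round_trip == records)  # True
```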