# File input-output



## Unstructured text files

Generate a text file with some arbitrary contents.

Below we use a Jupyter notebook magic (special command). Alternatively, you can copy and paste using a text editor.

In [2]:
%%writefile inputfile.txt
Lorem ipsum dolor sit amet,
consectetur adipiscing elit,
sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua.

Writing inputfile.txt


### Opening a file
* To open a file, use the `open()` function, which returns a file object.
* File opening modes:
    * `"r"` for reading (default)
    * `"w"` for writing (destroys existing content)
    * `"a"` for writing (appends to the end of file)

In [3]:
f = open("inputfile.txt")

### Reading text from a file
File object methods for reading:
* `f.read()` returns the entire contents as a single string.
* `f.readlines()` returns a list containing every line in the file.
* `f.readline()` returns the next line each time it is called.

In [4]:
f.read()

'Lorem ipsum dolor sit amet,\nconsectetur adipiscing elit,\nsed do eiusmod tempor incididunt\nut labore et dolore magna aliqua.\n'

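The file should be closed after use to release the operating-system resources it holds (and, for files opened for writing, to make sure buffered data is written to disk).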

In [5]:
f.close()

Using the `.readlines()` method:

In [6]:
f = open("inputfile.txt")
lines = f.readlines()
f.close()

lines

['Lorem ipsum dolor sit amet,\n',
 'consectetur adipiscing elit,\n',
 'sed do eiusmod tempor incididunt\n',
 'ut labore et dolore magna aliqua.\n']
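The `.readline()` method listed above is not demonstrated in the cells so far. A minimal sketch, assuming a small two-line sample file (the name `readline_demo.txt` and its contents are made up for illustration):

```python
import os
import tempfile

# Sample file for this sketch; the name "readline_demo.txt" is hypothetical.
path = os.path.join(tempfile.mkdtemp(), "readline_demo.txt")
f = open(path, "w")
f.write("alpha\nbeta\n")
f.close()

f = open(path)
first = f.readline()   # first line, including its trailing newline
second = f.readline()  # second line
end = f.readline()     # empty string: nothing is left to read
f.close()

print(repr(first), repr(second), repr(end))
```

An empty string (as opposed to `"\n"`, which is a blank line) is how `.readline()` signals the end of the file.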

### Using context managers
A *context manager* can be used to close a file automatically when the block is done, even if an error occurs inside it.

This is the preferred way.

In [13]:
with open("inputfile.txt") as f:
    lines = f.readlines()

In [14]:
lines

['Lorem ipsum dolor sit amet,\n',
 'consectetur adipiscing elit,\n',
 'sed do eiusmod tempor incididunt\n',
 'ut labore et dolore magna aliqua.\n']

In [15]:
f.closed # check that the file is actually closed

True
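To see what the `with` statement saves us from writing, the sketch below spells out a roughly equivalent `try`/`finally` pattern. The file name `with_demo.txt` and its contents are assumptions made only so the example is self-contained.

```python
import os
import tempfile

# Hypothetical sample file, created only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")
with open(path, "w") as f:
    f.write("one\ntwo\n")

# Roughly what `with open(path) as f:` does for us:
f = open(path)
try:
    lines = f.readlines()
finally:
    f.close()  # runs even if readlines() raises an exception

print(lines)
print(f.closed)
```

The `with` form is shorter and makes it harder to forget the `close()` call.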

### Iterating over the lines in a file
* For large files, objects returned by `read()` or `readlines()` may not fit into memory.
* The file object is an *iterator*: looping over it yields one line at a time, so the whole file never has to be loaded at once.

In [9]:
with open("inputfile.txt") as f:
    for line in f:
        print(line, end="")

Lorem ipsum dolor sit amet,
consectetur adipiscing elit,
sed do eiusmod tempor incididunt
ut labore et dolore magna aliqua.


Each line can then be processed as needed (searching for substrings, splitting into words, converting to numeric values, etc.).
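As one possible illustration of per-line processing, the sketch below splits each line into fields and converts them to numbers. The file `numbers_demo.txt` and its whitespace-separated contents are hypothetical, chosen only for this example.

```python
import os
import tempfile

# Hypothetical sample file with whitespace-separated numbers.
path = os.path.join(tempfile.mkdtemp(), "numbers_demo.txt")
with open(path, "w") as f:
    f.write("10 20 30\n4 5 6\n")

row_sums = []
with open(path) as f:
    for line in f:
        fields = line.split()                 # split on whitespace; the newline is dropped too
        numbers = [float(x) for x in fields]  # convert each field from str to float
        row_sums.append(sum(numbers))

print(row_sums)  # [60.0, 15.0]
```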

### Writing text to a file

To write to a file, the file must be opened in `"w"` or `"a"` mode.

In [16]:
with open("outputfile.txt","w") as f:
    f.write("First line\n")  # newline characters must be explicitly given
    f.write("Second line\n")
    f.write("Third line\n")

Read it back:

In [17]:
with open("outputfile.txt") as f:
    print(f.read())

First line
Second line
Third line
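Since `f.write()` does not add newlines, an alternative worth knowing is `print()` with its `file` argument, which appends `"\n"` automatically. A minimal sketch, assuming a hypothetical file name `print_demo.txt`:

```python
import os
import tempfile

# Hypothetical output file, created in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "print_demo.txt")

items = ["First line", "Second line", "Third line"]
with open(path, "w") as f:
    for item in items:
        print(item, file=f)  # print() appends the newline for us

with open(path) as f:
    content = f.read()

print(content)
```

Both approaches produce the same file; `print()` is convenient when the values are not already strings, because it converts them.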



### Appending text to a file

To add new lines without overwriting existing content:

In [18]:
with open("outputfile.txt","a") as f:
    f.write("Fourth line\n")

In [19]:
with open("outputfile.txt") as f:
    print(f.read())

First line
Second line
Third line
Fourth line
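The difference between the `"w"` and `"a"` modes can be checked side by side. The sketch below uses a hypothetical file `modes_demo.txt` in a temporary directory so that no existing file is affected.

```python
import os
import tempfile

# Hypothetical file used only to compare the two modes.
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

with open(path, "w") as f:  # "w" creates the file
    f.write("first\n")

with open(path, "w") as f:  # "w" again: the earlier content is discarded
    f.write("second\n")

with open(path, "a") as f:  # "a": existing content is kept, new text goes at the end
    f.write("third\n")

with open(path) as f:
    content = f.read()

print(content)
```

Only `second` and `third` survive, because the second `"w"` open emptied the file before writing.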



## Tabular files

* Comma-separated value (CSV) is a format for *tabular data*.
* Each row occupies one line, ending with `\n`.
* Columns are separated by a comma `,` (other characters can be used).
* Readable by spreadsheet software

Generate a sample CSV file:

In [20]:
%%writefile records.csv
"Potter, H",37,"London, UK"
"Granger, H",36,"Sydney, Australia"
"Weasley, Bill",45,"Bucharest, Romania"

Writing records.csv


### Reading CSV files

In [21]:
import csv

with open("records.csv", newline="") as f:  # newline="" is recommended by the csv module
    reader = csv.reader(f)
    for line in reader:
        print(line)

['Potter, H', '37', 'London, UK']
['Granger, H', '36', 'Sydney, Australia']
['Weasley, Bill', '45', 'Bucharest, Romania']
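Note that `csv.reader` returns every field as a string, including the ages. A minimal sketch of converting a column to numbers, assuming a hypothetical file `ages_demo.csv` with the same three-column layout as above:

```python
import csv
import os
import tempfile

# Hypothetical sample file with the same layout as records.csv.
path = os.path.join(tempfile.mkdtemp(), "ages_demo.csv")
with open(path, "w", newline="") as f:
    f.write('"Potter, H",37,"London, UK"\n')
    f.write('"Granger, H",36,"Sydney, Australia"\n')

ages = []
with open(path, newline="") as f:
    for name, age, city in csv.reader(f):  # unpack the three columns of each row
        ages.append(int(age))              # fields arrive as str, so convert explicitly

print(ages)                   # [37, 36]
print(sum(ages) / len(ages))  # 36.5
```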


### Reading CSV files with other separators

Generate a file that uses a space as the column separator and `/` instead of double quotes as the quote character.

In [22]:
%%writefile records2.csv
/Potter, H/ 37 /London, UK/
/Granger, H/ 36 /Sydney, Australia/
/Weasley, Bill/ 45 /Bucharest, Romania/

Writing records2.csv


In [23]:
with open("records2.csv", newline="") as f:  # newline="" is recommended by the csv module
    reader = csv.reader(f, delimiter=" ", quotechar="/")
    for line in reader:
        print(line)

['Potter, H', '37', 'London, UK']
['Granger, H', '36', 'Sydney, Australia']
['Weasley, Bill', '45', 'Bucharest, Romania']


### Writing to CSV files

In [24]:
with open("records3.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(['Potter, H', '37', 'London, UK'])
    writer.writerow(['Granger, H', '36', 'Sydney, Australia'])

In [25]:
with open("records3.csv") as f:
    print(f.read())


"Potter, H",37,"London, UK"
"Granger, H",36,"Sydney, Australia"

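When the rows are already collected in a list, `writer.writerows()` writes them in one call. The sketch below also reads the file back to check the round trip; the file name `rows_demo.csv` is an assumption for illustration.

```python
import csv
import os
import tempfile

# Hypothetical output file, created in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "rows_demo.csv")

rows = [
    ["Potter, H", 37, "London, UK"],
    ["Granger, H", 36, "Sydney, Australia"],
]

with open(path, "w", newline="") as f:
    csv.writer(f).writerows(rows)  # fields containing commas are quoted automatically

with open(path, newline="") as f:
    read_back = list(csv.reader(f))

print(read_back)
```

The numbers come back as strings (`'37'`, `'36'`), since a CSV file stores no type information.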


Sample CSV files to analyze: https://www.kaggle.com/datasets/samybaladram/multidisciplinary-csv-datasets-collection