# File Input and Output

1. [File modes](#modes)
1. [File object attributes](#attributes)
1. [Basic file writing](#writing)
    1. Writing a string to a file
    1. Writing multiple lines to a file 
1. [Basic file reading](#reading)
    1. Reading the entire file at once
    1. Reading line by line
    1. Reading all lines into a list
1. [Some simple file operations](#)
    1. Using different encodings
    1. Working with CSV
    1. Working with JSON
1. [Some advanced file operations](#)
    1. Reading and Processing Large Files Efficiently
    1. CSV File Processing with Custom Delimiters
    1. JSON File Operations with Error Handling
    1. Binary File Operations with Struct Packing
    1. Memory-Mapped File for Large Binary Files
    1. File Locking for Concurrent Access
    1. Temporary File Operations
## Summary

When a file is opened, you get a file object that you can use to read from and/or write to the underlying file.

## <a id="modes"></a>Modes for opening a file
The file mode determines which operations can be performed on the file object.

Each mode character has the following meaning:
-   `r`	open for reading (default)
-   `t`	text mode (default)
-   `w`	open for writing, truncating the file first
-   `x`	create a new file and open it for writing.   
    This fails with `FileExistsError` if the file already exists.
-   `a`	open for writing, appending to the end of the file if it exists
-   `b`	binary mode
-   `+`	open a disk file for updating (reading and writing)

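The default mode is `'rt'`: open for reading in text mode. When no `encoding` argument is given, the text encoding is platform-dependent in most Python versions (the locale encoding, which is UTF-8 on most Linux and macOS systems but often not on Windows), so pass `encoding='utf-8'` explicitly for portable code.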
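
The effect of the `x`, `a` and `w` modes can be checked on a scratch file. The following is a minimal sketch: the directory comes from `tempfile.TemporaryDirectory()` and the file name `demo.txt` is an arbitrary choice for illustration, not a file used elsewhere in this notebook.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')        # hypothetical scratch file

    with open(path, 'x', encoding='utf-8') as f:   # 'x' creates the file
        f.write('first\n')

    try:
        with open(path, 'x', encoding='utf-8'):    # the file exists now, so 'x' fails
            pass
        x_failed = False
    except FileExistsError:
        x_failed = True

    with open(path, 'a', encoding='utf-8') as f:   # 'a' keeps the existing content
        f.write('second\n')
    with open(path, encoding='utf-8') as f:
        after_append = f.read()

    with open(path, 'w', encoding='utf-8') as f:   # 'w' truncates before writing
        f.write('third\n')
    with open(path, encoding='utf-8') as f:
        after_write = f.read()

print(x_failed)
print(repr(after_append))
print(repr(after_write))
```

The scratch directory and everything in it are removed when the outer `with` block ends.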

## <a id="attributes"></a>Exploring some attributes of the file object

In [None]:
f = open('Data/person.txt', 'wt')
print(f'    name: {f.name}')
print(f'    mode: {f.mode}')
print(f'encoding: {f.encoding}')
print(f'  closed: {f.closed}')
print(f'  fileno: {f.fileno()}')
print(f'    tell: {f.tell()}')

f.close()
print(f'  closed: {f.closed}')

### <a id='writing'></a> Basic file writing  
1. Writing a string to a file using `write()`
1. Writing multiple lines to a file using `writelines()`

In [None]:
f = open('Data/person.txt', 'wt')       # if the file exists, it is overwritten
f.write('Created on jan-18-2025 by Narendra\n')
f.write('12 times table\n')
f.write('--------------\n')
for x in range(1, 13):
    f.write(f'{x} x 12 = {x * 12}\n')
f.close()

If the file is opened in append mode, `write()` adds the text at the end of the existing file.

In [None]:
f = open('Data/person.txt', 'at')
f.write('this should be at the bottom of the file')
f.close()

In [None]:
data = ['Created on jan-18-2025 by Narendra\n',
        '12 times table\n', 
        '--------------\n']

table = [f'{x:>3} x {12} = {x * 12} \n' for x in range(1, 13) ]

data.extend(table) 
# notice that every item in the list is a string that ends with '\n',
# so we can use writelines() to write all the lines at once
f = open('Data/table.txt', 'wt')
f.writelines(data)
f.close()
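
Despite its name, `writelines()` does not add line separators; it writes each item exactly as given. The sketch below illustrates this with `io.StringIO`, an in-memory object used here as a stand-in for a text file (an assumption made so the example needs no file on disk).

```python
import io

lines = ['alpha', 'beta', 'gamma']          # items without trailing newlines

buffer = io.StringIO()                      # in-memory stand-in for a text file
buffer.writelines(lines)                    # nothing is inserted between items
joined = buffer.getvalue()

buffer = io.StringIO()
buffer.writelines(line + '\n' for line in lines)   # add the newlines yourself
separated = buffer.getvalue()

print(repr(joined))
print(repr(separated))
```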


In [None]:
# f = open('person.dat', 'wb')
# f.write(b'Created on jan-18-2025 by Narendra\n')
# f.write(b'12 times table\n')
# f.write(b'--------------\n')
# for x in range(1, 13):
#     line = f'{x} x 12 = {x * 12}'
#     f.write(bytearray(line, encoding ='utf-8'))
#     f.write(b'\n')
# f.close()

### <a id='reading'></a>Basic file reading  
1. `read(size=-1)` reads the entire file and returns it as a single string.   
       It also takes an optional int argument, the maximum   
       number of characters to read.
1. `readline(size=-1)` reads one line, including its trailing newline.
1. `readlines(hint=-1)` reads the entire file and returns it as a list of  
       strings. Each line is an item in the list, with its  
       newline character included.   
       The optional `hint` is an approximate size limit, not a line count:  
       no further lines are read once the total length read so far exceeds `hint`.   
1. `tell()` gives the current stream position in the file.
1. `seek(offset, whence=0)` moves the stream position to `offset`.
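
The size arguments of these methods are easy to confuse, so here is a small sketch of how they behave. It uses `io.StringIO` as an in-memory stand-in for an open text file (an assumption of this example); a file opened with `open()` offers the same methods.

```python
import io

# In-memory stand-in for a text file opened for reading
f = io.StringIO('one\ntwo\nthree\nfour\n')

head = f.read(2)              # at most 2 characters
rest = f.readline()           # the remainder of the current line
limited = f.readlines(2)      # hint=2 is a size limit, so reading stops after one line
f.seek(0)                     # move back to the start
everything = f.readlines()    # all lines, newlines included

print(repr(head), repr(rest))
print(limited)
print(everything)
```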

In [None]:
f = open('Data/person.txt')
print(f'      first read: {f.read(10)}')            # read first 10 characters
print(f'current position: {f.tell()}')              # current position in the file
print(f'     the balance: {f.read()}')
print('\n\nresetting to the start of the file')
f.seek(0)                                           # move back to the start of the file
print(f.read())                                       # read the entire file
f.close()

In [None]:
f = open('Data/person.txt')
print(f'{f.readline(20)}')
f.close()

In [None]:
f = open('Data/person.txt')
print(f'{f.readlines()}')
f.close()

Below is the preferred way of reading a file: iterate over the file object, which yields one line at a time.

In [None]:
with open('Data/person.txt') as f:      # the file is closed when the block ends
    for line in f:
        print(line.strip())             # strip() removes surrounding whitespace, including the trailing newline

### Working with Binary Files
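Binary files can be more compact than text files and are not directly readable in a text editor, which is a slight deterrent to prying eyes.   
In binary mode (`'b'`), only bytes-like objects such as `bytes` or `bytearray` can be written; a `str` must be encoded first.   
Reading returns `bytes`.   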



In [None]:
# Writing binary data
data = bytes([0, 1, 2, 3, 4, 5])
with open('Data/binary.bin', 'wb') as file:
    # file.write('this is some plain text\n')           # a str raises TypeError in binary mode
    file.write(b'this is some binary text\n')
    file.write('this is a binary file\n'.encode('utf-8'))
    file.write(data)

In [None]:
# Reading binary data

with open('Data/binary.bin', 'rb') as file:
    binary_data = file.read()
    print(binary_data)

In [None]:
print(b'hello world')

In [None]:
with open('Data/person.txt') as f:
    print('file contents:')
    print(f'{f.readline()}')
    print(f'{f.read()}')

### Using Different Encodings
In Python, encoding in file writing refers to the way text characters are converted into bytes before being stored in a file.

When you write to a file in text mode, Python takes your string (which internally uses Unicode) and encodes it into a sequence of bytes according to the specified character encoding scheme — like UTF-8, UTF-16, ASCII, etc.

#### Why encoding matters
Computers store files as bytes, but human-readable text is made of characters.    
Different encodings map characters to bytes differently.    
If you choose the wrong encoding when writing, the file might not display correctly when read later.
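
To make the mismatch concrete, the sketch below writes a short string as UTF-8 and reads it back twice, once with the same encoding and once with Latin-1. The scratch file name `note.txt` and the temporary directory are assumptions made for this example only.

```python
import os
import tempfile

text = 'café'

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'note.txt')            # hypothetical scratch file

    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)

    with open(path, 'rb') as f:                        # the raw bytes on disk
        raw = f.read()
    with open(path, encoding='utf-8') as f:            # same encoding: round-trips
        same_encoding = f.read()
    with open(path, encoding='latin-1') as f:          # different encoding: garbled
        other_encoding = f.read()

print(raw)
print(same_encoding)
print(other_encoding)
```

The `é` is stored as two bytes in UTF-8; Latin-1 decodes each byte as a separate character, which is why the second read shows two wrong characters instead of one right one.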

#### Common encodings
**UTF-8**: universal, supports all Unicode characters, and is the default on most Linux and macOS systems.    
**ASCII**: very limited (only English letters, digits and basic symbols).    
**UTF-16 / UTF-32**: use 2-byte or 4-byte code units; mostly encountered when exchanging files with software that expects them. 
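
As a rough illustration of how these encodings differ, the sketch below encodes one sample string (chosen arbitrarily for this example) with each of them and compares the number of bytes produced. Python's `utf-16` and `utf-32` codecs prepend a byte order mark, which is included in the counts.

```python
sample = 'héllo'

# Number of bytes each encoding needs for the same five characters
sizes = {name: len(sample.encode(name)) for name in ('utf-8', 'utf-16', 'utf-32')}

try:
    sample.encode('ascii')          # 'é' has no ASCII representation
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

print(sizes)
print(ascii_ok)
```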

#### Suggestion:
When you write a file with a specific encoding, read it later with the same encoding.   
UTF-8 is almost always the safest choice.

In [None]:
# Writing with specific encoding
with open('Data/utf8_example.txt', 'w', encoding='utf-8') as file:
    file.write("Some text with special characters: ñ, é, ü\n")

# Reading with specific encoding
with open('Data/utf8_example.txt', 'r', encoding='utf-8') as file:
    print(file.read())

In [None]:
#another example
with open('Data/utf8_example.txt', 'w', encoding='utf-8') as file:
    file.write("Hello, world! — こんにちは — Привет")

# Reading with specific encoding
with open('Data/utf8_example.txt', 'r', encoding='utf-8') as file:
    print(file.read())

### 1. Working with CSV Files
CSV files, short for Comma-Separated Values files, are a simple and widely used format for storing tabular data, like what you'd see in a spreadsheet.

#### What is a CSV file?
A CSV file is a plain text file where:

-   Each line represents a row of data.
-   Each value in the row is separated by a comma (or sometimes another delimiter like a semicolon or tab).


#### Why are CSV files important?
Here are some key reasons:

1.  Simplicity & Universality   
CSV files are easy to create, read and edit. Almost every data tool (such as Excel, Google Sheets and databases) and programming language (such as Python and R) supports them.

1.  Lightweight Format    
Since they’re plain text, CSV files are small in size and quick to load, process or transfer.

1.  Easy Data Exchange   
They’re ideal for sharing data between different systems or platforms—especially when exporting or importing data from databases, spreadsheets, or web applications.

1.  Human-Readable    
You can open a CSV file in any text editor and understand the data without needing special software.

1.  Automation-Friendly    
CSVs are often used in data pipelines, machine learning workflows, and automated reporting systems because they’re easy to parse and manipulate programmatically.

1.  Integration with analytics tools   
Many data analysis tools (Pandas, R, Tableau, Power BI) directly read CSVs.

#### Limitations of CSV Files
-   No support for formatting (colors, formulas, multiple sheets like Excel)
-   Can’t handle complex data relationships (like a database)
-   Risk of parsing errors if fields that contain commas, quotes or line breaks are not quoted correctly
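
The last limitation is the one the `csv` module exists to handle. The sketch below writes rows whose fields contain a comma and a line break, then compares a naive `split(',')` with `csv.reader`. The sample rows are made up for illustration, and `io.StringIO` stands in for a file opened with `newline=''`.

```python
import csv
import io

rows = [['name', 'note'],
        ['Alice', 'likes tea, coffee'],        # field containing a comma
        ['Bob', 'line one\nline two']]         # field containing a line break

buffer = io.StringIO(newline='')               # in-memory stand-in for a CSV file
csv.writer(buffer).writerows(rows)             # the writer quotes the awkward fields
text = buffer.getvalue()

# Splitting on commas by hand breaks the quoted field into two pieces
naive = text.splitlines()[1].split(',')

# csv.reader understands the quoting and restores the original fields
parsed = list(csv.reader(io.StringIO(text, newline='')))

print(naive)
print(parsed)
```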

### Writing to a CSV File

In [None]:
import csv

data = [
    ['Name', 'Age', 'City'],
    ['Alice', 30, 'Toronto'],
    ['Bob', 25, 'Vancouver']
]

with open('Data/people.csv', 'w', newline='') as file:
    writer = csv.writer(file)
    writer.writerows(data)


### Reading from a CSV File

In [None]:
with open('Data/people.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        print(row)
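
Note that `csv.reader` returns every value as a string, so the ages written above come back as `'30'` and `'25'`. One possible way to get named fields and convert types is `csv.DictReader`; the sketch below assumes the same three columns as the example above and uses an in-memory copy of the data instead of `Data/people.csv`.

```python
import csv
import io

# Same content as the people.csv example, held in memory for this sketch
csv_text = 'Name,Age,City\r\nAlice,30,Toronto\r\nBob,25,Vancouver\r\n'

reader = csv.DictReader(io.StringIO(csv_text, newline=''))

# Each row is a dict keyed by the header; convert Age from str to int
people = [dict(row, Age=int(row['Age'])) for row in reader]

for person in people:
    print(person)
```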

### 2. Working with JSON files

Writing to a JSON File

In [None]:
import json

data = {
    'name': 'Alice',
    'age': 30,
    'city': 'Toronto'
}

with open('Data/data.json', 'w') as file:
    json.dump(data, file, indent=4)


Reading a JSON file

In [None]:
with open('Data/data.json', 'r') as file:
    data = json.load(file)
    print(data)
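
A JSON round trip does not always give back exactly the Python object that was saved, because JSON has fewer types. The sketch below uses `json.dumps` and `json.loads`, the string counterparts of `dump` and `load`, on a made-up dictionary to show the most common conversions.

```python
import json

original = {'point': (1, 2), 1: 'one', 'missing': None}

text = json.dumps(original)      # tuple -> array, int key -> string, None -> null
restored = json.loads(text)

print(text)
print(restored)
```

After the round trip the tuple is a list and the integer key is a string, so `restored` is not equal to `original`.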


## Advanced File I/O Examples in Python
Here are some more complex file operations that demonstrate practical scenarios:


### 1. Reading and Processing Large Files Efficiently


In [None]:
def process_large_file(filename):
    """Process a large file line by line without loading it all into memory"""
    with open(filename, 'r') as file:
        for i, line in enumerate(file, 1):
            # Process each line (example: count words)
            words = line.strip().split()
            print(f"Line {i}: {len(words)} words")
            # You could yield lines for streaming processing
            # yield process_line(line)

# Usage:
# process_large_file('big_data.txt')
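
The commented-out `yield` above hints at a streaming variant. A minimal sketch of that idea follows; the function name `count_words` is hypothetical, and it accepts any iterable of lines, so an open file object can be passed in directly. Here `io.StringIO` stands in for a large file.

```python
import io

def count_words(lines):
    """Yield (line_number, word_count) pairs, one line at a time."""
    for number, line in enumerate(lines, 1):
        yield number, len(line.split())

# In-memory stand-in for a file; with a real file use: with open(path) as f: ...
sample = io.StringIO('the quick brown fox\n\njumps over\n')
counts = list(count_words(sample))

print(counts)
```

Because the generator only holds one line at a time, the caller decides whether to collect the results, as done here, or to consume them one by one.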

### 2. CSV File Processing with Custom Delimiters


In [None]:
import csv

def process_csv(input_file, output_file):
    with open(input_file, 'r', newline='') as infile, open(output_file, 'w', newline='') as outfile:
        reader = csv.reader(infile, delimiter='|')
        writer = csv.writer(outfile, delimiter='\t')
        
        header = next(reader)  # Read the header row (assumes the file is not empty)
        writer.writerow([h.upper() for h in header])
        
        for row in reader:
            # Process each row (example: convert second column to uppercase)
            if len(row) >= 2:
                row[1] = row[1].upper()
            writer.writerow(row)

# Usage:
# process_csv('input.csv', 'output.tsv')

### 3. JSON File Operations with Error Handling

In [None]:
import json
from datetime import datetime

def log_to_json(filename, message, level="INFO"):
    try:
        # Try to read existing logs
        try:
            with open(filename, 'r') as file:
                logs = json.load(file)
        except (FileNotFoundError, json.JSONDecodeError):
            logs = []
        
        # Add new log entry
        logs.append({
            "timestamp": datetime.now().isoformat(),
            "level": level,
            "message": message
        })
        
        # Write back to file
        with open(filename, 'w') as file:
            json.dump(logs, file, indent=2)
            
    except Exception as e:
        print(f"Failed to write log: {e}")

# Usage:
# log_to_json('app_logs.json', 'Application started')

### 4. Binary File Operations with Struct Packing

In [None]:
import struct

# '<' selects little-endian with no padding, so every record is exactly 18 bytes
RECORD_FORMAT = '<if10s'        # 4-byte int, 4-byte float, 10-byte string
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)

def write_binary_data(filename, data):
    """Write a list of (int, float, str) tuples as fixed-size binary records"""
    with open(filename, 'wb') as file:
        for number, value, name in data:
            # '10s' pads short names with null bytes and truncates longer ones
            packed = struct.pack(RECORD_FORMAT, number, value, name.encode('ascii'))
            file.write(packed)

def read_binary_data(filename):
    """Read binary data back into Python objects"""
    results = []
    with open(filename, 'rb') as file:
        while True:
            chunk = file.read(RECORD_SIZE)
            if len(chunk) < RECORD_SIZE:
                break       # end of file (or a truncated final record)
            number, value, name = struct.unpack(RECORD_FORMAT, chunk)
            # The float comes back with 32-bit precision, so 3.14 is only approximate
            results.append((number, value, name.decode('ascii').rstrip('\x00')))
    return results

# Usage:
# data = [(1, 3.14, 'circle'), (2, 2.71, 'square')]
# write_binary_data('shapes.bin', data)
# print(read_binary_data('shapes.bin'))
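
Two details of this record layout are worth checking by hand: the record size and what happens to the values on the way through. The sketch below packs and unpacks a single record in memory. It uses the format `'<if10s'`, where the `<` prefix (an assumption of this sketch) pins little-endian byte order with no padding.

```python
import struct

fmt = '<if10s'                       # int, 32-bit float, 10-byte string
size = struct.calcsize(fmt)          # size of one packed record in bytes

packed = struct.pack(fmt, 1, 3.14, b'circle')
number, value, name = struct.unpack(fmt, packed)

# A name longer than the field is silently cut to fit
truncated = struct.unpack('<3s', struct.pack('<3s', b'circle'))[0]

print(size, len(packed))
print(number, value, name)
print(truncated)
```

The string comes back padded with null bytes, and the float is only approximately 3.14 because it was stored in 32 bits.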

### 5. Memory-Mapped File for Large Binary Files

In [None]:
import mmap

def search_in_large_file(filename, search_term):
    """Search for a string in a very large file using memory mapping"""
    with open(filename, 'rb') as file:
        # Memory-map the file
        with mmap.mmap(file.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Search for the term (convert to bytes if needed)
            if isinstance(search_term, str):
                search_term = search_term.encode('utf-8')
            
            index = mm.find(search_term)
            if index != -1:
                # Show context around the found term
                start = max(0, index - 20)
                end = min(len(mm), index + len(search_term) + 20)
                print(f"Found at position {index}: {mm[start:end].decode('utf-8', errors='replace')}")

# Usage:
# search_in_large_file('large_file.bin', 'important data')

### 6. File Locking for Concurrent Access

In [None]:
import fcntl  # Unix only; this module is not available on Windows
import time

def safe_write(filename, data, retries=5, delay=1):
    """Append to a file under an exclusive advisory lock, retrying while it is busy"""
    with open(filename, 'a') as file:
        for _ in range(retries):
            try:
                # Try to acquire an exclusive lock without blocking
                fcntl.flock(file, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except BlockingIOError:
                print("File is locked by another process. Retrying...")
                time.sleep(delay)
                continue

            # Critical section
            file.write(f"{time.time()}: {data}\n")
            file.flush()  # push the data out before the lock is released
            # The lock is released automatically when the file is closed
            return True
    return False  # gave up after the configured number of retries

# Usage:
# safe_write('concurrent.log', 'Processed item 42')

### 7. Temporary File Operations

In [None]:
import tempfile

def process_with_temp_file(data):
    """Create and use a temporary file that auto-deletes when done"""
    with tempfile.NamedTemporaryFile(mode='w+', delete=True) as temp_file:
        # Write to the temp file
        temp_file.write(data)
        temp_file.flush()  # Ensure data is written to disk
        
        # Get the file path (just for demonstration)
        print(f"Using temp file: {temp_file.name}")
        
        # Process the temp file (example: read it back)
        temp_file.seek(0)
        processed = temp_file.read().upper()
        
        # File is automatically deleted when context exits
    return processed

# Usage:
# result = process_with_temp_file("Some temporary content")
# print(result)

### Other file I/O 
1. Processing Excel, XML and SQLite files
2. Reading and writing Parquet files

### Conclusion
CSV files are essential because they provide a universal, lightweight, and easy way to store and exchange structured data across different platforms. Whether you're a programmer, data analyst, or business user, CSVs make data handling simple and efficient.