# Reading CSV Files Line by Line in Python

This notebook demonstrates several ways to read and process CSV files line by line in Python, from the built-in `csv` module and plain file iteration to pandas, and notes which approach suits which file sizes and use cases.

## 1. Import Required Libraries

First, let's import the necessary libraries we'll use throughout this notebook.

In [1]:
import csv
import pandas as pd
from io import StringIO
from typing import Iterator, List, Dict
import itertools

## 2. Read CSV File Using csv Module

The `csv` module is Python's built-in solution for reading CSV files. It handles quoting, embedded delimiters, and dialect differences, and both `csv.reader` and `csv.DictReader` yield one row at a time as you iterate over them.

In [2]:
# Example CSV content
sample_csv = """date,symbol,price,shares
2025-01-01,AAPL,190.50,100
2025-01-02,MSFT,375.25,50
2025-01-03,GOOGL,140.75,75"""

# Create a CSV file-like object
csv_file = StringIO(sample_csv)

# Method 1: Using csv.reader
print("Method 1: csv.reader")
csv_file.seek(0)  # Reset file pointer to start
csv_reader = csv.reader(csv_file)
header = next(csv_reader)  # Read header row
print(f"Headers: {header}")

for row in csv_reader:
    date, symbol, price, shares = row
    print(f"Processing trade: {shares} shares of {symbol} @ ${price} on {date}")

# Method 2: Using csv.DictReader
print("\nMethod 2: csv.DictReader")
csv_file.seek(0)  # Reset file pointer
dict_reader = csv.DictReader(csv_file)

for row in dict_reader:
    print(f"Processing trade: {row['shares']} shares of {row['symbol']} @ ${row['price']} on {row['date']}")

Method 1: csv.reader
Headers: ['date', 'symbol', 'price', 'shares']
Processing trade: 100 shares of AAPL @ $190.50 on 2025-01-01
Processing trade: 50 shares of MSFT @ $375.25 on 2025-01-02
Processing trade: 75 shares of GOOGL @ $140.75 on 2025-01-03

Method 2: csv.DictReader
Processing trade: 100 shares of AAPL @ $190.50 on 2025-01-01
Processing trade: 50 shares of MSFT @ $375.25 on 2025-01-02
Processing trade: 75 shares of GOOGL @ $140.75 on 2025-01-03
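
The cells above read from an in-memory `StringIO` object. For a file on disk, the `csv` documentation recommends opening it with `newline=''` so the module can handle line endings itself. The following is a minimal sketch under assumptions: the file name `trades.csv` and the helper `read_trades` are hypothetical, and the file is written to a temporary directory only so the example is self-contained.

```python
import csv
import tempfile
from pathlib import Path


def read_trades(path):
    """Yield one dict per data row, converting numeric fields as we go."""
    # newline='' lets the csv module handle line endings inside quoted fields
    with open(path, newline='', encoding='utf-8') as f:
        for row in csv.DictReader(f):
            row['price'] = float(row['price'])
            row['shares'] = int(row['shares'])
            yield row


# Hypothetical file, created here only so the example runs on its own
tmp_dir = tempfile.mkdtemp()
trades_path = Path(tmp_dir) / 'trades.csv'
trades_path.write_text(
    'date,symbol,price,shares\n'
    '2025-01-01,AAPL,190.50,100\n'
    '2025-01-02,MSFT,375.25,50\n',
    encoding='utf-8',
)

trades = list(read_trades(trades_path))
for trade in trades:
    print(trade['symbol'], trade['price'] * trade['shares'])
```

Because `read_trades` is a generator, rows are parsed only as the caller asks for them. The `list(...)` call here is just for demonstration and would defeat the purpose on a large file.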


## 3. Read CSV with File Object

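For simple CSV files, you can also use basic file operations. This approach gives you full control over parsing, but splitting on commas breaks as soon as a quoted field contains a comma or a newline, so use it only when you know the data is free of them.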

In [None]:
# Reset our sample file
csv_file.seek(0)

# Read line by line with basic file operations
header = csv_file.readline().strip()  # Read and store header
print(f"Headers: {header}")

for line in csv_file:
    # Skip empty lines
    if not line.strip():
        continue
        
    # Parse the line (split by comma and strip whitespace)
    date, symbol, price, shares = [field.strip() for field in line.split(',')]
    print(f"Processing trade: {shares} shares of {symbol} @ ${price} on {date}")

# Example of a generator function for memory-efficient reading
def read_csv_generator(file_obj, skip_header=True):
    """Yield each non-empty line of file_obj as a list of stripped fields."""
    if skip_header:
        next(file_obj, None)  # Skip header row; the default avoids an error on an empty file
    for line in file_obj:
        if line.strip():  # Skip empty lines
            yield [field.strip() for field in line.split(',')]

# Use the generator
print("\nUsing generator function:")
csv_file.seek(0)  # Reset file pointer
for row in read_csv_generator(csv_file):
    date, symbol, price, shares = row
    print(f"Processing trade: {shares} shares of {symbol} @ ${price} on {date}")
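
To see where plain `split(',')` parsing stops being safe, compare it with `csv.reader` on a row with a quoted field. This is a small sketch: the row below is hypothetical and is not part of the sample data used in this notebook.

```python
import csv
from io import StringIO

# Hypothetical row: the company name contains a comma, so it is quoted
quoted_line = '2025-01-04,"Berkshire Hathaway, Inc.",450.00,10\n'

# Naive parsing splits inside the quoted field and keeps the quote characters
naive_fields = [field.strip() for field in quoted_line.split(',')]

# csv.reader treats the quoted text as a single field
parsed_fields = next(csv.reader(StringIO(quoted_line)))

print(len(naive_fields), naive_fields)
print(len(parsed_fields), parsed_fields)
```

The naive split produces five fields instead of four, so the four-way unpacking used in the loop above would raise a `ValueError` on this row.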

## 4. Process CSV with Pandas

Pandas provides powerful tools for reading CSV files. By default, `pd.read_csv` loads the entire file into memory; passing the `chunksize` parameter makes it return an iterator of DataFrames instead, so a large file can be processed one piece at a time.

In [3]:
# Read the entire CSV into a DataFrame
csv_file.seek(0)
df = pd.read_csv(csv_file)
print("Full DataFrame:")
print(df)

# Process row by row using iterrows()
print("\nProcessing with iterrows():")
for index, row in df.iterrows():
    print(f"Processing trade: {row['shares']} shares of {row['symbol']} @ ${row['price']} on {row['date']}")

# Read in chunks (useful for large files)
print("\nProcessing with chunks:")
csv_file.seek(0)
chunk_size = 2  # Small chunk size for demonstration
for chunk in pd.read_csv(csv_file, chunksize=chunk_size):
    print("\nProcessing chunk:")
    print(chunk)

Full DataFrame:
         date symbol   price  shares
0  2025-01-01   AAPL  190.50     100
1  2025-01-02   MSFT  375.25      50
2  2025-01-03  GOOGL  140.75      75

Processing with iterrows():
Processing trade: 100 shares of AAPL @ $190.5 on 2025-01-01
Processing trade: 50 shares of MSFT @ $375.25 on 2025-01-02
Processing trade: 75 shares of GOOGL @ $140.75 on 2025-01-03

Processing with chunks:

Processing chunk:
         date symbol   price  shares
0  2025-01-01   AAPL  190.50     100
1  2025-01-02   MSFT  375.25      50

Processing chunk:
         date symbol   price  shares
2  2025-01-03  GOOGL  140.75      75
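
Two refinements are worth sketching here. `iterrows()` builds a `Series` for every row, which is slow on large frames and is why `190.50` prints as `190.5` above; `itertuples()` is usually a faster alternative. Chunked reading is most useful when each chunk is reduced to a small result rather than printed. The example below is an illustrative sketch on the same sample data, and the variable `total_value` is introduced only for this illustration.

```python
from io import StringIO

import pandas as pd

sample_csv = """date,symbol,price,shares
2025-01-01,AAPL,190.50,100
2025-01-02,MSFT,375.25,50
2025-01-03,GOOGL,140.75,75"""

# itertuples() yields lightweight namedtuples instead of a Series per row
df = pd.read_csv(StringIO(sample_csv))
for trade in df.itertuples(index=False):
    print(f"{trade.shares} shares of {trade.symbol} @ ${trade.price:.2f} on {trade.date}")

# Reduce each chunk to a running total so only one chunk is in memory at a time
total_value = 0.0
for chunk in pd.read_csv(StringIO(sample_csv), chunksize=2):
    total_value += (chunk['price'] * chunk['shares']).sum()

print(f"Total traded value: {total_value:.2f}")
```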


## 5. Handle Large CSV Files

When working with large CSV files, memory efficiency becomes crucial. Here are some techniques for processing large files without loading them entirely into memory.

In [None]:
# Memory-efficient CSV reader with generator
def read_csv_efficient(file_obj, chunk_size=1000):
    """Skip the header row, then yield lists of up to chunk_size rows."""
    reader = csv.reader(file_obj)
    next(reader, None)  # Skip the header row; the default avoids an error on an empty file
    
    # Create chunks using itertools.islice
    while True:
        chunk = list(itertools.islice(reader, chunk_size))
        if not chunk:
            break
        yield chunk

# Example usage with our sample data
print("Processing large file in chunks:")
csv_file.seek(0)  # Reset to the start; read_csv_efficient skips the header row itself

for chunk in read_csv_efficient(csv_file, chunk_size=2):
    print("\nProcessing chunk:")
    for row in chunk:
        date, symbol, price, shares = row
        print(f"Processing trade: {shares} shares of {symbol} @ ${price} on {date}")

# Using pandas for large files with specific columns
print("\nReading specific columns with pandas:")
csv_file.seek(0)
selected_columns = ['date', 'symbol']  # Only read these columns
for chunk in pd.read_csv(csv_file, 
                        usecols=selected_columns,
                        chunksize=2):
    print("\nChunk with selected columns:")
    print(chunk)
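
Chunking is not always necessary: when the goal is an aggregate, you can stream rows one at a time and keep only the running result in memory. The following is a minimal sketch of that idea. The helper `total_shares_by_symbol` and the repeated-symbol data are hypothetical and are included only to show the accumulation.

```python
import csv
from collections import defaultdict
from io import StringIO


def total_shares_by_symbol(file_obj):
    """Stream rows and keep only a running total per symbol in memory."""
    totals = defaultdict(int)
    for row in csv.DictReader(file_obj):
        totals[row['symbol']] += int(row['shares'])
    return dict(totals)


# Hypothetical data with a repeated symbol to show the accumulation
stream = StringIO(
    "date,symbol,price,shares\n"
    "2025-01-01,AAPL,190.50,100\n"
    "2025-01-02,MSFT,375.25,50\n"
    "2025-01-03,AAPL,191.00,25\n"
)

totals = total_shares_by_symbol(stream)
print(totals)
```

Memory use here grows with the number of distinct symbols, not with the number of rows, so the same function works unchanged on a file object opened from disk.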

## Summary

We've covered several methods to read CSV files line by line:

1. Using `csv.reader` and `csv.DictReader` from the built-in `csv` module
2. Basic file operations with `split()` and generators
3. Pandas methods including `iterrows()` and chunked reading
4. Memory-efficient techniques for large files

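Choose the method that best fits your needs:
- Use the `csv` module for general-purpose CSV processing, including files with quoted fields
- Use basic file operations only when the data is simple and you need full control over parsing
- Use pandas for data analysis and transformation
- Use generators and chunked reading for files too large to fit in memory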