#### File I/O Basics

Data Types :

1. Text : Unicode characters (e.g '12345' stored as characters in UTF-8/ASCII)

2. Binary : Raw bytes (e.g the number 12345 stored directly as bytes)
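
To make the distinction concrete, here is a small sketch that stores 12345 both ways. The 2-byte big-endian layout is only an assumption chosen for this example; real binary formats pick their own sizes and byte order.

```python
# Text representation: the characters '1','2','3','4','5' encoded as UTF-8
text_bytes = '12345'.encode('utf-8')
print(text_bytes, len(text_bytes))      # five bytes, one per character

# Binary representation: the integer packed into 2 bytes
# (big-endian and 2 bytes are assumptions made for this example)
raw_bytes = (12345).to_bytes(2, 'big')
print(raw_bytes, len(raw_bytes))        # two bytes

# Converting the raw bytes back gives the original number
number = int.from_bytes(raw_bytes, 'big')
print(number)
```

The text form needs one byte per digit, while the binary form depends only on the chosen integer size.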

File Types :

1. Text Files : Human readable (e.g source code,config files)

2. Binary Files : Not human readable (e.g images,multimedia)

Process :

1. Open : Connects program to a file
2. Read/Write : Handles data based on the type
3. Close : Completes the operation,frees resources



In [None]:
Writing to a File --> '.txt' extension (notepad)

In [1]:
# Case 1 file not present
f = open('sample.txt','w')
f.write('Hello world')
f.close()

# Creates file in the current directory

In [2]:
# Error file closed
f.write('Hello')

ValueError: I/O operation on closed file.

In [3]:
# write multiline strings to a file

f = open('sample1.txt','w')
f.write('Hello world')
f.write('\n how are you?')
f.close()

In [4]:
# Case 2 file overwrite in write mode

f = open('sample.txt','w')
f.write('salman khan')
f.close()

# Note : Opening in w mode replaces all existing content in sample.txt

#### How open() works in Python

Handles file I/O ; connects the program to a file on disk

Example : f = open('sample.txt','w') : Opens sample.txt in write mode

File Access & RAM interaction : open() returns a file object backed by an in-memory (RAM) buffer ; the data itself lives on disk (persistent storage,not ROM)

File Operation and Modes : Modes (e.g 'w' for write) determine file interactions (f.write('Salman') writes to the RAM buffer first)

Data integrity : f.close() flushes buffered changes to disk and releases the file

In [None]:
'open()' --> File object with a RAM buffer

'write()' --> Modify RAM buffer

'close()' --> Flush buffer to disk

Source : Python Documentation
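
The buffering described above can be observed with a rough sketch like the one below. The scratch file name `buffer_demo.txt` and the temporary folder are assumptions for illustration. The size seen before flushing depends on buffering details, so it is printed but not relied upon.

```python
import os
import tempfile

# Hypothetical scratch file, so the example does not touch sample.txt
path = os.path.join(tempfile.mkdtemp(), 'buffer_demo.txt')

f = open(path, 'w')
f.write('Salman')                     # goes into the in-memory buffer first
size_before = os.path.getsize(path)   # often 0 here: nothing flushed yet
f.flush()                             # push the buffer to the operating system
size_after = os.path.getsize(path)    # the 6 characters are now in the file
f.close()                             # close() also flushes before releasing the file

print(size_before, size_after)
```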

In [5]:
# Problem with 'w' mode : Overwrite file content
# To preserve existing content ,use 'a' append mode

f = open('sample1.txt','a')
f.write('\n I am fine')
f.close()

In [7]:
# Write multiple lines to a File

L = ['hello\n','hi\n','how are you\n','I am fine']

f = open('sample.txt','w')
f.writelines(L) # Writes each string in the list ; newlines are not added automatically
f.close()


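When you use f.close() to close a file,it serves 2 main purposes

1. Data integrity :

- Flushes buffered writes to disk
- Unflushed data can be lost if the program exits abnormally

2. Resource management :

- Releases the file handle and its buffer
- Crucial for large/multiple files (the OS limits how many files a process can keep open)

Always call f.close() after file operations so that data is saved and resources are released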
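
One common way to guarantee that close() runs even when a write fails is a try/finally block. This is a minimal sketch; the scratch file `close_demo.txt` is a hypothetical name used only here.

```python
import os
import tempfile

# Hypothetical scratch file for the demonstration
path = os.path.join(tempfile.mkdtemp(), 'close_demo.txt')

f = open(path, 'w')
try:
    f.write('Hello world')
finally:
    f.close()          # runs whether or not write() raised an exception

print(f.closed)        # the file object reports that it is closed

# The data reached the disk because close() flushed the buffer
check = open(path, 'r')
content = check.read()
check.close()
print(content)
```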

#### Reading from files

1. read() : Reads the entire contents of a file into a single string. Efficient for small files
Pros : Simple Cons : Memory heavy for large files

2. readline() : Reads one line at a time. Good for large files and sequential processing
Pros : Memory efficient Cons : Slower for full content access

In [8]:
# read() usage

f = open('sample.txt','r')
s = f.read()
print(s)
f.close()

# Note : In text mode,read() always returns a string
# A .txt file stores text only ; other data types must be converted to/from strings

hello
hi
how are you
I am fine


In [9]:
# read upto n chars
f = open('sample.txt','r')
s = f.read(10)
print(s)
f.close()

hello
hi
h


In [11]:
# using readline()
f = open('sample.txt','r')
print(f.readline(),end='') # Avoid auto new line
print(f.readline(),end='')
f.close()

hello
hi


In [None]:
'read()' : Method

smaller Files --> loads entire content

Immediate access --> full data available

Memory use --> risky for large files

'readline()' : Method

large files --> process line by line

Memory-efficient --> avoids full file load

Handles datasets --> prevents overflow

In [13]:
# Print the file line by line --> readline() returns '' at end of file,which ends the loop

f = open('sample.txt','r')
while True:
    data = f.readline()
    if data == '':
        break
    else:
        print(data,end='')
f.close()

hello
hi
how are you
I am fine
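
A file object can also be iterated directly, which yields one line at a time without calling readline() by hand. The sketch below counts lines that way; the scratch file `lines_demo.txt` is an assumption so that sample.txt is left untouched.

```python
import os
import tempfile

# Hypothetical scratch file with the same four lines as sample.txt
path = os.path.join(tempfile.mkdtemp(), 'lines_demo.txt')
f = open(path, 'w')
f.writelines(['hello\n', 'hi\n', 'how are you\n', 'I am fine'])
f.close()

line_count = 0
f = open(path, 'r')
for line in f:           # the file object yields one line per iteration
    line_count += 1
f.close()

print(line_count)
```

Only one line is held in memory at a time, so the same loop works for files much larger than this one.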

#### Context Manager : the with statement

Efficient resource management(e.g files)

with ensures auto cleanup,no manual file close needed

Purpose of 'with' statement :

1. File management : Handles file operations (read/write)

2. Resource Release : Auto closes files,freeing system resources

Avoids :

1. Resource leaks : A forgotten manual close() leaves the file open

2. File locking : Prevents locking issues

Benefits :

1. Automated cleanup : Ensures auto-closure of files
2. Exception handling : Closes files if exception occurs
3. Readability : Clarifies file access scopes
4. Reliability : Reduces bugs,ensures robust resource management
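
The exception-handling benefit can be checked with a short sketch. The RuntimeError is raised on purpose and the scratch file `with_demo.txt` is hypothetical; both exist only for the demonstration.

```python
import os
import tempfile

# Hypothetical scratch file for the demonstration
path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')

try:
    with open(path, 'w') as f:
        f.write('salman bhai')
        raise RuntimeError('simulated failure')   # deliberate error inside the block
except RuntimeError:
    pass

print(f.closed)    # the file was closed even though the block raised

with open(path, 'r') as f2:
    content = f2.read()
print(content)     # the text written before the error was still flushed
```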




In [14]:
# with statement

with open('sample1.txt','w') as f:
    f.write('salman bhai')

In [15]:
f.write('hello')

ValueError: I/O operation on closed file.

In [17]:
# f.read(n) called repeatedly

with open('sample.txt','r') as f:
    print(f.read(10)) # First 10 chars
    print(f.read(10)) # Next 10 chars
    print(f.read(10)) # Next 10 chars
    print(f.read(10)) # Next 10 chars
    # Each print reads next 10 chars sequentially

# The file object tracks the current position ; each read() resumes where the previous one stopped

hello
hi
h
ow are you

I am fine



#### File processing Strategy for large files

Crucial for files larger than available RAM

Chunk-based processing

- Process in chunks,not all at once. e.g 10 GB file,8 GB RAM --> 2000 chars/chunk

Advantages :

1. Memory efficiency : RAM used for 1 chunk only
2. Scalability : Handles files > RAM
3. Performance : avoids system slowdowns
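
The chunk-based strategy can be sketched as a small generator. The helper name `read_in_chunks`, the 2000-character default and the scratch file are assumptions for illustration, not part of any library.

```python
import os
import tempfile

def read_in_chunks(path, chunk_size=2000):
    """Yield successive chunks of at most chunk_size characters (hypothetical helper)."""
    with open(path, 'r') as f:
        while True:
            chunk = f.read(chunk_size)
            if chunk == '':          # empty string means end of file
                break
            yield chunk

# Build a scratch file: 'hello world' repeated 999 times (10989 characters)
path = os.path.join(tempfile.mkdtemp(), 'chunk_demo.txt')
with open(path, 'w') as f:
    f.write('hello world' * 999)

total_chars = 0
chunk_count = 0
for chunk in read_in_chunks(path):
    total_chars += len(chunk)        # only one chunk is in memory at a time
    chunk_count += 1

print(chunk_count, total_chars)
```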

In [18]:
# Purpose : create a large sample file for the chunked read below

big_L = ['hello world' for i in range(1,1000)] # 999 strings,no newlines

with open('big.txt','w') as f:
    f.writelines(big_L)

In [19]:
with open('big.txt','r') as f:
    chunk_size = 10
    while True:
        chunk = f.read(chunk_size) # read each chunk exactly once
        if chunk == '':            # empty string --> end of file
            break
        print(chunk,end='***')

# Handles large files,processes in chunks,avoiding memory overload
# Libraries like pandas,keras use chunk based data processing

hello worl***dhello wor***ldhello wo***rldhello w***orldhello ***worldhello*** worldhell***o worldhel***lo worldhe***llo worldh***ello world***hello worl*** ... (output truncated)

In [21]:
# seek and tell function

with open('sample.txt','r') as f:
    f.seek(15)  # Move to offset 15 from the start (0-based)
    print(f.read(10)) # Read 10 chars
    print(f.tell()) # Position after read
    print(f.read(10)) # Read next 10 chars
    print(f.tell()) # New position

e you
I am
25
 fine
30


In [None]:
'seek' --> Moves the file cursor to a given offset from the start of the file
        --> Like dragging the YouTube red progress line for precise navigation
        --> The next read/write starts from that position

'tell'  --> Returns the current cursor position
        --> Acts as a marker indicating where the next read/write will happen
        --> Provides feedback without changing position

# seek : navigates to points (You tube red line analogy)
# tell : shows current position/status
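
A compact sketch of how the two calls work together. Binary mode is used here as an assumption so that offsets are plain byte counts on every platform, and the scratch file `seek_demo.bin` is hypothetical.

```python
import os
import tempfile

# Hypothetical scratch file containing 11 bytes
path = os.path.join(tempfile.mkdtemp(), 'seek_demo.bin')
with open(path, 'wb') as f:
    f.write(b'hello world')

with open(path, 'rb') as f:
    start = f.tell()         # cursor starts at offset 0
    f.seek(6)                # jump past 'hello '
    word = f.read(5)         # reads b'world'
    after_read = f.tell()    # cursor moved forward by the 5 bytes read
    f.seek(0)                # back to the beginning
    first = f.read(5)        # reads b'hello'

print(start, word, after_read, first)
```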

In [22]:
# seek during write

with open('sample.txt','w') as f:
    f.write('Hello')
    f.seek(0)   # cursor to start
    f.write('Xa')   # Overwrite 'He' --> 'Xa' ; file now contains 'Xallo'

#### Limitations of Text Mode

* Binary files : Text mode cannot decode non text data (e.g images,executables)
* Data Type Efficiency : Non text data types (integers,floats,lists and tuples) must be converted to and from strings

Binary files :

* Contain non textual binary data
* Text Mode cannot process these effectively

Non textual data:

* Incompatible with Text Mode
* Requires specific methods for management

Structured Data:

* Struggles with types like integers,floats,lists,tuples
* Needs specialised handling

In [2]:
# Read Binary file

with open('screenshots1.png','r') as f:
    f.read()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x89 in position 0: invalid start byte

In [4]:
# Binary File I/O

with open('screenshots1.png','rb') as f: # read binary
    with open('screenshots1_copy.png','wb') as wf: # write binary
        wf.write(f.read())

In [5]:
# working with a large binary file
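
One possible way to handle a large binary file is to copy it in fixed-size chunks instead of a single read(). In this sketch the helper `copy_in_chunks`, the 64 KB chunk size and the randomly generated source file are all assumptions made for the example.

```python
import os
import tempfile

def copy_in_chunks(src, dst, chunk_size=64 * 1024):
    """Copy src to dst one chunk at a time (hypothetical helper)."""
    with open(src, 'rb') as rf, open(dst, 'wb') as wf:
        while True:
            chunk = rf.read(chunk_size)
            if not chunk:            # b'' means end of file
                break
            wf.write(chunk)

# Create a stand-in "large" binary file of 200000 random bytes
folder = tempfile.mkdtemp()
src = os.path.join(folder, 'source.bin')
dst = os.path.join(folder, 'copy.bin')
with open(src, 'wb') as f:
    f.write(os.urandom(200000))

copy_in_chunks(src, dst)

# Verify that the copy is byte-for-byte identical
with open(src, 'rb') as a, open(dst, 'rb') as b:
    same = a.read() == b.read()
print(same)
```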

In [6]:
# Working with different data types

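with open('sample.txt','w') as f:
    f.write(5) # passing an int instead of a string

# Error : write() accepts only strings in text mode ; convert the data first (e.g str(5))

TypeError: write() argument must be str, not int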

In [7]:
with open('sample.txt','w') as f:
    f.write('5')

In [9]:
with open('sample.txt','r') as f:
    print(int(f.read()) + 5) # convert read output to int

10


In [11]:
# More complex data

d = {
    'name':'Girish',
    'age':30,
    'gender':'male'
}

with open('sample.txt','w') as f:
    f.write(str(d))

In [13]:
with open('sample.txt','r') as f:
    print(dict(f.read()))

ValueError: dictionary update sequence element #0 has length 1; 2 is required

In [None]:
# Text based limitations for Complex data storage:

1. Storage --> Plain Text Files are ideal for simple textual data.
                Complex Data (e.g Python Dicts) is structured,with key-value pairs.

2. Conversion   --> Saving Dicts with 'write()' requires converting them to strings first.
                    {'name':'John','age':30} --> "{'name':'John','age':30}"
                    This flattening loses the structure and the value types.

3. Retrieval    --> Reading returns a string;it must be parsed to reconstruct the original dict.
                    Hand-written parsing is error-prone.

# Note : for simple data use text files;for complex data use serialization libraries or binary formats
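
To illustrate the parsing step from point 3, the standard library's `ast.literal_eval` can rebuild a dict from its str() form. This is only a sketch of one parsing option; it works for plain Python literals and is not a general storage format.

```python
import ast

d = {'name': 'Girish', 'age': 30, 'gender': 'male'}

text = str(d)                  # what write(str(d)) would store in the file
print(type(text))              # the structure is now just characters

# dict(text) fails, but parsing the literal recovers the dictionary
restored = ast.literal_eval(text)
print(restored == d, type(restored))
```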


#### JSON Serialization & Deserialization

Serialization

Convert Python data --> JSON

json.dumps()

Human readable & machine parsable

Deserialization:

Convert JSON --> Python

json.loads()

Manipulate JSON data in Python
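
A minimal in-memory round trip with the two functions named above. The sample dictionary is made up for the example.

```python
import json

data = {'name': 'Girish', 'marks': [23, 14, 34]}   # sample data for illustration

text = json.dumps(data)        # Python object --> JSON string
print(text)                    # {"name": "Girish", "marks": [23, 14, 34]}

restored = json.loads(text)    # JSON string --> Python object
print(restored == data)
```

json.dumps() and json.loads() work with strings, while json.dump() and json.load() do the same job directly on an open file.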

#### What is JSON ?

JavaScript Object Notation

Widely adopted in web apps,APIs and data interchange

Simple syntax ; supports key-value pairs,arrays,nested objects

{
    "d": {
        "results": [
            {
                "_metadata": {
                    "type": "Employee details,Employee"
                },
                "UserID": "E12012",
                "RoleCode": "35"
            }
        ]
    }
}

JSON is a widely used text format across languages

In [14]:
# JSON serialization

# List to JSON

import json

L = [1,2,3,4]

with open('demo.json','w') as f:
    json.dump(L,f) # serialize L to demo.json

In [15]:
# Dict to JSON

d = {
    'name':'Girish',
    'age':49,
    'gender':'male'
}

with open('demo.json','w') as f:
    json.dump(d,f,indent=4) # serialize dict d with indentation

In [16]:
# Deserialization

import json
with open('demo.json','r') as f:
    d = json.load(f)
    print(d)
    print(type(d))

{'name': 'Girish', 'age': 49, 'gender': 'male'}
<class 'dict'>


Serialization & Deserialization : Convert complex data (lists,dicts,nested dicts,tuples) to/from JSON. Sets are not supported by the json module directly and must be converted (e.g to a list) first

Serialization : Complex --> JSON (for storage)

Deserialization : JSON --> Original (for retrieval)

Handles complex data efficiently,overcoming string based limitations

In [17]:
# Serialize/Deserialize Tuple

import json
t = (1,2,3,4,5)

with open('demo.json','w') as f:
    json.dump(t,f)

In [None]:
#Note Serialization/Deserialization

Serialize Tuple --> list (using 'dump')

Deserialize     --> List (not tuple)

Need Tuple later --> Explicit conversion required
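
The tuple behaviour in the note can be confirmed with a short in-memory round trip. This sketch uses json.dumps()/json.loads() rather than a file.

```python
import json

t = (1, 2, 3, 4, 5)

restored = json.loads(json.dumps(t))   # JSON has arrays only, so the tuple becomes a list
print(restored, type(restored))        # [1, 2, 3, 4, 5] <class 'list'>

as_tuple = tuple(restored)             # explicit conversion when a tuple is needed
print(as_tuple == t)
```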

In [18]:
# Serialize nested dict

d = {
    'student':'girish',
    'marks':[23,14,34,45,56]
}

with open('demo.json','w') as f:
    json.dump(d,f)

#### Serializing and Deserializing custom Objects

In [19]:
class Person:

    def __init__(self,fname,lname,age,gender):
        self.fname = fname
        self.lname = lname
        self.age = age
        self.gender = gender

# Print format:
# Name: {fname} {lname}
# Age: {age}
# Gender: {gender}

In [20]:
person = Person('Girish','Mohite',49,'Male')

The json module serializes built in types natively (e.g dicts)

Custom classes need custom (explicit) serialization

In [21]:
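# Serialize a custom object via its string representation

import json
def show_object(person):
    # Called by json.dump for objects it cannot serialize itself
    if isinstance(person,Person):
        return "{} {} age -> {} gender -> {}".format(person.fname,person.lname,person.age,person.gender)
    raise TypeError("Cannot serialize object of type {}".format(type(person).__name__))

with open('demo.json','w') as f:
    json.dump(person,f,default=show_object) # demo.json holds a single JSON string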
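
A string representation cannot easily be turned back into an object. One alternative, sketched below, is to serialize the attributes as a dict and rebuild the object after loading. The helper names `person_to_dict` and `dict_to_person` are hypothetical, and the class is repeated so the example runs on its own.

```python
import json

class Person:
    def __init__(self, fname, lname, age, gender):
        self.fname = fname
        self.lname = lname
        self.age = age
        self.gender = gender

def person_to_dict(obj):
    """default= hook (hypothetical helper): turn a Person into a plain dict."""
    if isinstance(obj, Person):
        return {'fname': obj.fname, 'lname': obj.lname,
                'age': obj.age, 'gender': obj.gender}
    raise TypeError('Cannot serialize object of type ' + type(obj).__name__)

def dict_to_person(d):
    """Rebuild a Person from the dict produced above (hypothetical helper)."""
    return Person(d['fname'], d['lname'], d['age'], d['gender'])

person = Person('Girish', 'Mohite', 49, 'Male')

text = json.dumps(person, default=person_to_dict)   # object --> JSON string
restored = dict_to_person(json.loads(text))         # JSON string --> object

print(text)
print(restored.fname, restored.lname, restored.age, restored.gender)
```

Because the attributes are stored as key-value pairs, the types (e.g the integer age) survive the round trip.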