# os module – File & Directory Operations

**Q:1 Create, Read, Rename, Remove files and folder**

**Q:1 Get the current working directory and navigate to a sibling directory**

**Q:2 Recursively list all files in a directory using only os.**

**Q:3 Check if a given path is a file, directory, or doesn't exist.**

**Q:4 Create a deeply nested folder structure like**

**Q:5 Delete all empty directories from a given folder tree.**

**Q:6 Count the number of .txt files in a directory using os.listdir().**

**Q:7 Move files from one folder to another, creating the destination if needed.**

**Q:8 Rename all .log files to .log.bak within a folder.**

**Q:9 Print the total size of all files in a directory in MB.**

**Q:10 Print the directory tree with indentation (like the tree command).**

**Q:11 Write a function that synchronizes the structure of two directory trees (mirror mode).**

**Q:12 Implement a safe folder deletion function that first moves the folder to a Trash directory.**

**Q13: Find and print the most recently modified file in a directory recursively.**

**Q:14 Generate a directory report (file count, total size, subfolders) in JSON format.**

**Q:15 Track changes (additions/removals) in a directory over time using file snapshots.**



In [None]:
**Q:1 Create, Read, Rename, Remove files and folder**

In [73]:
import os 
os.mkdir('my_folder1') #creating a folder 



In [5]:
with open('my_folder1/sample.txt' , 'w') as f:
    f.write("hello world")        #creatig a sample.txt file in my_folder 



In [6]:
with open('my_folder1/sample.txt' , 'r') as f:
    content = f.read()        #reading the contents of sample.txt file of my_folder
    print(content)

hello world


In [11]:
os.rename('my_folder1/rename.txt' , 'my_folder1/new_file.txt') #renaming file 

In [12]:
os.rename('my_folder1' , 'new_folder')      #renaming folder 

In [13]:
with open('new_folder/new_file.txt' , 'r') as f:
    content = f.read()                                #checking by using new names of the file and folder 
    print(content)

hello world


In [14]:
os.remove('new_folder/new_file.txt')   # deleting file 

In [None]:
os.rmdir('new_folder')         # only works to remove folder that is empty 

In [20]:
#use shutil to remove a non-empty folder
#for creating a dummy file and folder 
os.mkdir('folder')

In [22]:
with open('folder/raw.txt' , 'w') as f: f.write('Hello')

In [23]:
with open('folder/raw.txt' , 'r') as f: print(f.read())

Hello


In [24]:
#removing non-empty folder using shutil 
import shutil
shutil.rmtree('folder')

In [None]:
**Q:1 Get the current working directory and navigate to a sibling directory**

In [29]:
import os 
current_directory = os.getcwd()      #printing the current directory 
print("current directory" , current_directory)

current directory C:\Users\wastech\Downloads


In [35]:
sibling = 'Documents'
parent_directory = os.path.dirname(current_directory)   #this will return the path C:\Users|wastech
sibling_path = os.path.join(parent_directory , sibling)   #this will join the path of parent(C:\Users|wastech) and sibling(Documents) retruning the sibling
print(sibling_path)

C:\Users\wastech\Documents


In [None]:
**Q:2 Recursively list all files in a directory using only os.**

In [53]:
def list_of_file(directory):     
 for dirpath , dirnames , filenames in os.walk(directory):  
    for filename in filenames:
        print(os.path.join(dirpath , filename))

start = 'bw'
print("list of all files in directory" , list_of_file(start))


   '''- dirpath   → the path of that current folder - dirnames  → list of subfolders inside it
                                      - filenames → list of files inside it'''

bw\bw videos\v (1).mp4
bw\bw videos\v (10).mp4
bw\bw videos\v (11).mp4
bw\bw videos\v (12).mp4
bw\bw videos\v (2).mp4
bw\bw videos\v (3).mp4
bw\bw videos\v (4).mp4
bw\bw videos\v (5).mp4
bw\bw videos\v (6).mp4
bw\bw videos\v (7).mp4
bw\bw videos\v (8).mp4
bw\bw videos\v (9).mp4
list of all files in directory None


In [None]:
**Q:3 Check if a given path is a file, directory, or doesn't exist.**

In [47]:
def check(content):
    if os.path.isfile(content):
        print(f"'{content}' is a file")
    elif os.path.isdir(content):
        print(f"'{content}' is a director")
    elif not os.path.exists(content):
        print(f"'{content}' not exist")
    else:
        print(f"'{content} exist but a regular file or directory")

path = 'numbers'       
print('check is a file , directory or doesnt exist' , check(path))     #here i am in directory downloads

'numbers' is a director
check is a file , directory or doesnt exist None


In [None]:
**Q:4 Create a deeply nested folder structure like**

In [62]:
nested_folders = os.path.join("Folder1" , "folder1.1" , "folder1.2" , "folder1.3" , "folder1.4")
os.makedirs(nested_folders , exist_ok = True)
print("nested folders" , nested_folders)

nested folders Folder1\folder1.1\folder1.2\folder1.3\folder1.4


In [None]:
**Q:5 Delete all empty directories from a given folder tree.**

In [64]:
def delete(directory):
    for dirpath , dirnames , filenames in os.walk(directory , topdown = False):
        if not dirnames and not filenames:
            os.rmdir(dirpath)
            print(f"Deleted empty folders {dirpath}")

empty_directories = 'Folder1'
print("deleting all empty directories " , delete(empty_directories))

Deleted empty folders Folder1\folder1.1\folder1.2\folder1.3
deleting all empty directories  None


In [None]:
**Q:6 Count the number of .txt files in a directory using os.listdir().**

In [186]:
import os
directory = 'C:/Users/wastech/Documents/pf'

# Count .txt files
txt_count = len([f for f in os.listdir(directory) if f.endswith('.txt')])

print(f"Number of .txt files: {txt_count}")

Number of .txt files: 8



# glob module – Pattern Matching

**Q:1 List all .csv and .json files in the current directory.**

**Q:2 Recursively find all .jpg files in nested folders.**

**Q:3 Use glob to count files grouped by extension.**

**Q:4 Find files with names matching pattern report_*.txt.**

**Q:5 Replace spaces with underscores in filenames found via glob.**

**Q:6 Return all files with a date in the format 2025-06-*.log.**

**Q:7 List all files with numeric names only (e.g., 123.txt).**

**Q:8 Use glob to sort files by last modified time.**

**Q:9 Find all .txt files larger than 100KB using glob and os.**

**Q:10 Batch rename files with a custom suffix _archived.**

**Q:11 Create a utility that indexes all media files and stores the paths in a SQLite DB.**

**Q:12 Find duplicate filenames (regardless of path) across a directory tree.**

**Q:13 Generate a file manifest with relative paths and hash (MD5) of contents.**

**Q14: Use glob patterns dynamically to extract weekly reports (e.g., week_01.json, week_02.json).**

**Q:15 Write a recursive file crawler that ignores folders listed in a .ignore file.**


In [None]:
Q:1 List all .csv and .json files in the current directory.

In [86]:
import glob
csv = glob.glob("*.csv")
json = glob.glob("*json")
all_files = csv+json
print("all csv and json files ")
for i in all_files:
    print(i)


all csv and json files 
imdb_top_1000.csv
LCWU - Student Portal Accounts.csv
sentiment_dataset_150.csv


In [None]:
Q:2 Recursively find all .jpg files in nested folders.

In [93]:
#jpg = glob.glob("**/*.jpg" , recursive= True)    ... for searching the jpg files 
import glob 
import os 
path = "C:/Users/wastech/Documents/MSCS"
jpeg = glob.glob(os.path.join(path , "**" , "*jpeg") , recursive =True)     #search inside the target folder
print("all jpg files")
for i in jpeg:
    print(i)

all jpg files
C:/Users/wastech/Documents/MSCS\challan.jpeg
C:/Users/wastech/Documents/MSCS\WhatsApp Image 2024-09-16 at 8.19.08 PM (1).jpeg


In [None]:
Q:3 Use glob to count files grouped by extension.

In [99]:
import glob 
import os 
path =  "C:/Users/wastech/Documents/MSCS"
files = glob.glob(os.path.join(path , "**" ,"*") , recursive = True)
count = 0
for i in files:
    count +=1
print(count)

82


In [124]:
#Q4
import os 
os.chdir(r'C:\Users\wastech\Documents\pf')
import glob 
matching_files = glob.glob('report_*.txt')
for i in matching_files:
    print(i)

In [126]:
#Q5
import os 
os.chdir(r'C:\Users\wastech\Documents\pf')
import glob 
filename = 'New Text'
for filename in glob.glob(".txt"):
    if " " in run:
        new_filename = filename.replace(" " ,"_")
        os.rename(filename , new_filename)
        print(f"Renamed: {filename} to {new_filename}")

In [129]:
#**Q:6 Return all files with a date in the format 2025-06-*.log.**
import glob 
matching_file =  glob.glob('2025-06-*.log')
for i in matching_file:
    print(i)

In [140]:
#**Q:7 List all files with numeric names only (e.g., 123.txt).**
import glob
import os

files = glob.glob("*.txt") + glob.glob("*.txt.txt")

for f in files:
    name = os.path.splitext(os.path.splitext(f)[0])[0]
    if name.isdigit():
        print(f" Numeric file: {f}")

 Numeric file: 12.txt.txt
 Numeric file: 98.txt.txt
 Numeric file: 12.txt.txt
 Numeric file: 98.txt.txt


In [141]:
#**Q:8 Use glob to sort files by last modified time.**
import glob 
import os 
files = glob.glob("*.txt")
files.sort(key=os.path.getmtime)
for i in files:
    print(i)

extra.txt
raw.txt
New Text.txt
12.txt.txt
98.txt.txt


In [144]:
#**Q:9 Find all .txt files larger than 100KB using glob and os.**
import glob 
import os 
files = glob.glob("*.txt")
for i in files:
    if os.path.getsize(i) > 100 * 1024:   #100kb = 102400 bytes
        print(f"{i} is greater than 100kb ")


# File Handling – Text & Binary Files

**Q:1 Count the number of lines in a file without loading it entirely.**

**Q:2 Replace a specific word in a file and save it to a new file.**

**Q:3 Append data to an existing file with a timestamp.**

**Q:4 Read and print the first 10 lines of a file.**

**Q:5 Write a list of dictionaries as CSV manually (without csv module).**

**Q:6 Copy a binary file in chunks (e.g., image or PDF).**

**Q:7 Write a function to compare two files and print the differing lines.**

**Q:8 Safely read a file that may not exist using try-except.**

**Q:9 Read a file using a specific encoding (e.g., UTF-16).**

**Q:10 Detect and skip empty lines when reading a file.**

**Q:11 Implement a log rotation mechanism: create log.txt, log_1.txt, etc. when size exceeds 1MB.**

**Q:12 Build a file-based key-value store using JSON per line.**

**Q:13 Implement version control: on every write, back up the previous version with a timestamp.**

**Q:14 Create a reader that detects encoding using chardet or fallback encoding.**

**Q:15 Convert a large log file into separate files per date based on timestamps in each line.**


In [158]:
line_count = 0 
with open("extra.txt" , "r") as f:
    for i in file:
        line_count +=1
    print(f"total number of lines {line_count}")
    

total number of lines 5


In [160]:
old_word ='iqra'
new_word = 'world'
with open("extra.txt" , 'r') as f:
    content =f.read()
updation = content.replace(old_word , new_word)
with open("new_file.txt" , 'w') as file:
    file.write(updation)

In [None]:
#q3 
from datetime import datetime
new_data = "This is new log data."
timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
with open('log.txt', 'a') as file:
    file.write(f"[{timestamp}] {new_data}\n")

In [None]:
#q4 
with open('myfile.txt', 'r') as file:
    for i in range(10):
        line = file.readline()
        if not line:  
            break
        print(line.strip()) 


In [None]:
#q5
data = [
    {"name": "Ali", "age": 20},
    {"name": "Sara", "age": 22},
    {"name": "Zain", "age": 19}
]
with open('output.csv', 'w') as file:
    headers = data[0].keys()
    file.write(','.join(headers) + '\n')
    for item in data:
        values = [str(item[key]) for key in headers]
        file.write(','.join(values) + '\n')

In [None]:
#q6
chunk_size = 1024  # 1 KB
with open('source_file.pdf', 'rb') as src:
    with open('copy_file.pdf', 'wb') as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)

print("Binary file copied in chunks successfully.")



# JSON Handling – json module

**Q:1 Load JSON from a file and print a nested field (e.g., data["user"]["name"]).**

**Q:2 Write a Python dict to a file with pretty formatting.**

**Q:3 Merge multiple JSON objects into a single file.**

**Q:4 Convert a JSON array into CSV format.**

**Q:5 Update a nested key inside a loaded JSON.**

**Q:6 Create a function to pretty-print JSON from string input.**

**Q:7 Safely load malformed JSON with exception handling.**

**Q:8 Remove a key from each item in a JSON list and re-save.**

**Q:9 Convert an object with datetime to a JSON string using a custom encoder.**

**Q:10 Search for all values associated with a key in nested JSON.**

**Q:11 Write a function to flatten deeply nested JSON into a flat dictionary.**

**Q:12 Build a recursive JSON validator for required schema keys.**

**Q:13 Convert a nested JSON into a pandas DataFrame with normalized columns.**

**Q:14 Create a diff tool that compares two JSON files and shows key-level changes.**

**Q:15 Handle and fix trailing commas in malformed JSON before parsing.**



In [None]:
#q1
import json
with open('data.json', 'r') as file:
    data = json.load(file)
print(data["user"]["name"])

In [None]:
data1 = {
    "name": "Alice",
    "age": 30,
    "skills": ["Python", "Data Analysis", "Machine Learning"]
}
with open('output.json', 'w') as f:       # Write to file with pretty formatting
    json.dump(data, f, indent=4)

print("Dictionary written to 'output.json' with pretty formatting.")

In [None]:
with open('file1.json', 'r') as f1:
    data1 = json.load(f1)
with open('file2.json', 'r') as f2:
    data2 = json.load(f2)
merged_data = [data1, data2]
with open('merged.json', 'w') as f_out:
    json.dump(merged_data, f_out, indent=4)

In [None]:
#q5 

with open('data.json', 'r') as file:
    data = json.load(file)
data['person']['address']['city'] = 'Karachi'   # For example, change the "city" inside "address"
with open('data.json', 'w') as file:
    json.dump(data, file, indent=4)


In [None]:
#q6
def pretty_print_json(json_string):
    # Convert string to Python dictionary
    data = json.loads(json_string)

    # Pretty-print the JSON with indent
    pretty_json = json.dumps(data, indent=4)
    print(pretty_json)

# Example usage
json_input = '{"name": "Ali", "age": 20, "city": "Lahore"}'
pretty_print_json(json_input)

In [None]:
#q7 
def load_json_safely(json_string):
    try:
        data = json.loads(json_string)
        print("JSON loaded successfully!")
        return data
    except json.JSONDecodeError as e:
        print("Error: Malformed JSON!")
        print("Details:", e)
        return None
# Example: Malformed JSON (missing quotes around keys)
bad_json = '{name: "Ali", age: 20}'
# Try to load it
load_json_safely(bad_json)

    
# Regular Expressions – re module

**Q:1 Extract email addresses from a string using re.findall().**

**Q:2 Validate a US phone number using regex.**

**Q:3 Extract hashtags from a tweet-like string.**

**Q:4 Replace all numbers with # in a paragraph.**

**Q:5 Match filenames with extension .pdf, .docx, or .xlsx.**

**Q:6 Split a paragraph into sentences using regex.**

**Q:7 Match a date in the format DD-MM-YYYY or YYYY/MM/DD.**

**Q:8 Extract quoted strings from text (e.g., "like this").**

**Q:9 Clean a text by removing special characters except alphanumerics and spaces.**

**Q:10 Capture repeated words like the the, is is in a sentence.**

**Q:11 Write a regex that extracts values from key-value pairs (key: value) even if keys contain spaces.**

**Q:12 Extract nested parentheses using recursive regex (advanced feature).**

**Q:13 Create a regex to detect and fix malformed URLs in a text block.**

**Q:14 Build a pattern to extract address-like strings (e.g., 123 Main St, City, ZIP).**

**Q:15 Tokenize a log line into timestamp, level, and message using regex groups.**

In [167]:
#Q:1 Extract email addresses from a string using re.findall().
import re

text = "Contact us at support@example.com ."

# Define a regex pattern for email addresses
pattern = r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,7}\b'

# Extract all email addresses
emails = re.findall(pattern, text)

print(emails)

['support@example.com']


In [169]:
import re

# Sample phone numbers
numbers = [
    "(123) 456-7890",
    "123-456-7890",
    "123.456.7890",
    "1234567890",
    "+1 123 456 7890",
    "456-7890"
]

# Regex pattern for US phone numbers
pattern = re.compile(r'^(\+1\s?)?(\(?\d{3}\)?[\s.-]?)?\d{3}[\s.-]?\d{4}$')

# Validate each number
for number in numbers:
    if pattern.match(number):
        print(f" Valid:   {number}")
    else:
        print(f" Invalid: {number}")

 Valid:   (123) 456-7890
 Valid:   123-456-7890
 Valid:   123.456.7890
 Valid:   1234567890
 Valid:   +1 123 456 7890
 Valid:   456-7890


In [170]:
#Q:3 Extract hashtags from a tweet-like string
tweet = "Loving the vibes at the beach! #sunset #relaxation #vacaymode"

# Regex pattern to match hashtags
hashtags = re.findall(r'#(\w+)', tweet)

print(hashtags)

['sunset', 'relaxation', 'vacaymode']


In [None]:
#Q:4 Replace all numbers with # in a paragraph.

text = "My phone number is 123-456-7890 and I was born in 1995."

# Replace all digits with '#'
masked = re.sub(r'\d', '#', text)

print(masked)

In [174]:
#q5Q:5 Match filenames with extension .pdf, .docx, or .xlsx.
import re

filenames = [
    "report.pdf", "summary.docx", "data.xlsx",
    "image.png", "notes.txt", "presentation.pptx"
]

pattern = r'^.+\.(pdf|docx|xlsx)$' 

matches = [f for f in filenames if re.match(pattern, f, re.IGNORECASE)]

print(matches)

['report.pdf', 'summary.docx', 'data.xlsx']


In [175]:
# **Q:6 Split a paragraph into sentences using regex.**
text = "Hello there! How are you doing today? I hope everything's great. Let's grab coffee."
# Split on punctuation followed by space
sentences = re.split(r'(?<=[.!?])\s+', text)
print(sentences)

['Hello there!', 'How are you doing today?', "I hope everything's great.", "Let's grab coffee."]


In [176]:
#**Q:7 Match a date in the format DD-MM-YYYY or YYYY/MM/DD.**
text = "Today's date is 25-06-2025 and the report was filed on 2025/06/24."
# Regex pattern for both formats
pattern = r'\b(\d{2}-\d{2}-\d{4}|\d{4}/\d{2}/\d{2})\b'

# Find all matching dates
dates = re.findall(pattern, text)

print(dates)

['25-06-2025', '2025/06/24']


In [177]:
#**Q:8 Extract quoted strings from text (e.g., "like this").**
text = 'She said, "Hello there!" and then added, "How are you?"'

# Extract text inside double quotes
quotes = re.findall(r'"(.*?)"', text)

print(quotes)

['Hello there!', 'How are you?']


In [178]:
#**Q:9 Clean a text by removing special characters except alphanumerics and spaces.**
text = "Hello! This is @example #1: Clean it up, please :)"

# Remove everything except letters, numbers, and spaces
cleaned = re.sub(r'[^a-zA-Z0-9\s]', '', text)

print(cleaned)

Hello This is example 1 Clean it up please 


In [None]:
#**Q:10 Capture repeated words like the the, is is in a sentence.**

In [None]:
#**Q:11 Write a regex that extracts values from key-value pairs (key: value) even if keys contain spaces.**