# os module – File & Directory Operations

**Q:1 Create, read, rename, and remove files and folders**

**Q:1 Get the current working directory and navigate to a sibling directory**

**Q:2 Recursively list all files in a directory using only os.**

**Q:3 Check if a given path is a file, directory, or doesn't exist.**

**Q:4 Create a deeply nested folder structure like**

**Q:5 Delete all empty directories from a given folder tree.**

**Q:6 Count the number of .txt files in a directory using os.listdir().**

**Q:7 Move files from one folder to another, creating the destination if needed.**

**Q:8 Rename all .log files to .log.bak within a folder.**

**Q:9 Print the total size of all files in a directory in MB.**

**Q:10 Print the directory tree with indentation (like the tree command).**

**Q:11 Write a function that synchronizes the structure of two directory trees (mirror mode).**

**Q:12 Implement a safe folder deletion function that first moves the folder to a Trash directory.**

**Q:13 Find and print the most recently modified file in a directory recursively.**

**Q:14 Generate a directory report (file count, total size, subfolders) in JSON format.**

**Q:15 Track changes (additions/removals) in a directory over time using file snapshots.**



In [27]:
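import os

# Create the folder and write a file into it.
os.makedirs('os-working', exist_ok=True)
with open(os.path.join('os-working', 'file.txt'), 'w') as f:
    f.write('Hello, world!')

# Read the file back.
with open(os.path.join('os-working', 'file.txt')) as f:
    content = f.read()

# Rename the file.
os.rename(os.path.join('os-working', 'file.txt'), os.path.join('os-working', 'renamed.txt'))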


In [12]:
os.remove(os.path.join('os-working', 'renamed.txt'))

In [19]:
cwd = os.getcwd()
print(f"Current working directory: {cwd}")

Current working directory: c:\My-Data\2nd-week-tasks


In [20]:
parent = os.path.dirname(cwd)
print("Parent directory path:", parent)
sibling = os.path.join(parent, 'Ghulam_Ahmad')
print("Sibling directory path:", sibling)

Parent directory path: c:\My-Data
Sibling directory path: c:\My-Data\Ghulam_Ahmad
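
The cell above only builds the sibling's path. Actually moving there takes `os.chdir`. A minimal sketch, assuming the sibling directory already exists; `go_to_sibling` is a hypothetical helper name, not part of the original notebook.

```python
import os

def go_to_sibling(name):
    """Change the working directory to a sibling of the current one."""
    parent = os.path.dirname(os.getcwd())
    sibling = os.path.join(parent, name)
    if not os.path.isdir(sibling):
        # Fail early with a clear message instead of letting chdir raise.
        raise FileNotFoundError(f"No such sibling directory: {sibling}")
    os.chdir(sibling)
    return os.getcwd()
```

Calling `go_to_sibling('Ghulam_Ahmad')` from the directory shown above would switch to the sibling path printed in the output, provided that folder exists.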


In [30]:
path = "os-working/renamed.txt"
if os.path.isfile(path):
    print("This is a file.")
elif os.path.isdir(path):
    print("This is a directory.")
else:
    print("This is neither a file nor a directory.")


This is a file.


In [None]:
os.makedirs('os-working/folder1/folder2', exist_ok=True)
print("Nested Folders created successfully.")

Nested Folders created successfully.


In [None]:
for root, dirs, files in os.walk('.', topdown=False):
    for d in dirs:
        full_path = os.path.join(root, d)
        print("Checking directory:", full_path)
        if not os.listdir(full_path):
            os.rmdir(full_path)
            print("Removed empty directory:", full_path)

Checking directory: .\os-working\folder1\folder2
Removed empty directory: .\os-working\folder1\folder2
Checking directory: .\os-working\folder1
Removed empty directory: .\os-working\folder1
Checking directory: .\os-working


In [65]:
count = 0
for file in os.listdir('os-working'):
    if file.endswith('.txt'):
        count += 1
print(f"Number of .txt files in 'os-working': {count}")


Number of .txt files in 'os-working': 1


In [None]:
import shutil

source_folder = 'os-working'
dest_folder = 'os-2-working'
os.makedirs(dest_folder, exist_ok=True)  # create the destination if needed
for f in os.listdir(source_folder):
    src_file = os.path.join(source_folder, f)
    dst_file = os.path.join(dest_folder, f)
    shutil.move(src_file, dst_file)
print("Files moved to dest_folder.")

Files moved to dest_folder.


In [75]:
for f in os.listdir('os-2-working'):
    if f.endswith('.log'):
        os.rename( os.path.join('os-2-working', f), os.path.join('os-2-working', f + '.bak'))
        print(f"Renamed {f} to {f + '.bak'}")

Renamed main.log to main.log.bak


In [81]:
total = 0
for f in os.listdir('.'):
    full_path = os.path.join('.', f)
    if os.path.isfile(full_path):
        size = os.path.getsize(full_path)  # size in bytes
        total += size
        print(f"Size of {f}: {size / (1024 * 1024):.2f} MB")  # bytes -> MB
print(f"Total size: {total / (1024 * 1024):.2f} MB")

Size of 1_result.jpg: 0.10 MB
Size of Filing.ipynb: 0.02 MB
Size of Numpy.ipynb: 0.03 MB
Size of Rain.txt: 0.00 MB
Size of task-2.txt: 0.00 MB
Total size: 0.16 MB
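
For Q:14, the same size arithmetic can feed a small JSON report. This is only a sketch: `directory_report` and its key names are hypothetical choices, and it assumes the whole tree is readable.

```python
import json
import os

def directory_report(path):
    """Summarize a directory tree: file count, total size, and subfolders."""
    file_count = 0
    total_bytes = 0
    subfolders = []
    for root, dirs, files in os.walk(path):
        for d in dirs:
            # Store subfolders relative to the starting path.
            subfolders.append(os.path.relpath(os.path.join(root, d), path))
        for name in files:
            full_path = os.path.join(root, name)
            if os.path.isfile(full_path):  # skips broken symlinks
                file_count += 1
                total_bytes += os.path.getsize(full_path)
    return {
        "path": os.path.abspath(path),
        "file_count": file_count,
        "total_size_bytes": total_bytes,
        "total_size_mb": round(total_bytes / (1024 * 1024), 2),
        "subfolders": sorted(subfolders),
    }

def report_as_json(path):
    """Return the report as a pretty-printed JSON string."""
    return json.dumps(directory_report(path), indent=4)
```

`print(report_as_json('.'))` prints the report for the current directory.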


In [85]:
# Print the directory tree with indentation (like the tree command).
def print_tree(path, indent=''):
    for item in os.listdir(path):
        full_path = os.path.join(path, item)
        print(indent + '|-- ' + item)
        if os.path.isdir(full_path):
            print_tree(full_path, indent + '    ')
print_tree('.')

|-- 1_result.jpg
|-- Filing.ipynb
|-- Numpy.ipynb
|-- os-2-working
    |-- main.log.bak
    |-- main.py
    |-- renamed.txt
|-- os-working
|-- Rain.txt
|-- task-2.txt
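
Q:2 asks for a flat recursive listing rather than an indented tree. A minimal sketch using only `os`; `list_files` is a hypothetical helper name.

```python
import os

def list_files(path):
    """Return every file under `path`, recursively, as a sorted list of paths."""
    found = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            found.append(os.path.join(root, name))
    return sorted(found)
```

`for p in list_files('.'): print(p)` prints one path per line.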



# glob module – Pattern Matching

**Q:1 List all .csv and .json files in the current directory.**

**Q:2 Recursively find all .jpg files in nested folders.**

**Q:3 Use glob to count files grouped by extension.**

**Q:4 Find files with names matching pattern report_*.txt.**

**Q:5 Replace spaces with underscores in filenames found via glob.**

**Q:6 Return all files with a date in the format 2025-06-*.log.**

**Q:7 List all files with numeric names only (e.g., 123.txt).**

**Q:8 Use glob to sort files by last modified time.**

**Q:9 Find all .txt files larger than 100KB using glob and os.**

**Q:10 Batch rename files with a custom suffix _archived.**

**Q:11 Create a utility that indexes all media files and stores the paths in a SQLite DB.**

**Q:12 Find duplicate filenames (regardless of path) across a directory tree.**

**Q:13 Generate a file manifest with relative paths and hash (MD5) of contents.**

**Q:14 Use glob patterns dynamically to extract weekly reports (e.g., week_01.json, week_02.json).**

**Q:15 Write a recursive file crawler that ignores folders listed in a .ignore file.**


In [98]:
import glob
csv_files = glob.glob("*.csv")
json_files = glob.glob("*.json")
print("CSV files:", csv_files)
print("JSON files:", json_files)

CSV files: ['main.csv']
JSON files: ['main.json']


In [106]:
x = glob.glob("**/*.jpg", recursive=True)
print("JPG files:", x)

JPG files: ['1_result.jpg', 'os-2-working\\2.jpg']


In [112]:
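from collections import Counter

# Group files by extension; files without one are reported as '(no extension)'.
ext_counts = Counter(os.path.splitext(f)[1] for f in glob.glob('*') if os.path.isfile(f))
for ext, n in sorted(ext_counts.items()):
    print(f"{ext or '(no extension)'}: {n}")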


In [115]:
files = glob.glob("report_*.txt")
print(files)

['report_.txt']


In [116]:
import os

# Match only names that contain a space, then swap spaces for underscores.
for file in glob.glob("* *"):
    new_name = file.replace(" ", "_")
    os.rename(file, new_name)


In [117]:
files = glob.glob("2025-06-*.log")
print(files)

['2025-06-1.log', '2025-06-2.log']


In [119]:
import os

files = glob.glob("*")
files.sort(key=os.path.getmtime)
print(files)

['Rain.txt', 'task-2.txt', 'Numpy.ipynb', '1_result.jpg', 'os-working', 'main.csv', 'main.json', 'os-2-working', 'report_.txt', '2025-06-2.log', '2025-06-1.log', 'Filing.ipynb']


In [126]:
print(os.path.getmtime('Filing.ipynb'))

1751285626.0250368
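
The value returned by `os.path.getmtime` is seconds since the epoch, which is hard to read at a glance. A sketch that sorts glob matches by that value and formats it with `datetime.fromtimestamp` (local time); `files_by_mtime` is a hypothetical helper name.

```python
import glob
import os
from datetime import datetime

def files_by_mtime(pattern="*", newest_first=True):
    """Return (path, 'YYYY-MM-DD HH:MM:SS') pairs sorted by modification time."""
    # Keep regular files only; folders also match "*".
    paths = [p for p in glob.glob(pattern) if os.path.isfile(p)]
    paths.sort(key=os.path.getmtime, reverse=newest_first)
    return [
        (p, datetime.fromtimestamp(os.path.getmtime(p)).strftime("%Y-%m-%d %H:%M:%S"))
        for p in paths
    ]
```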


In [131]:
txt_files = glob.glob('*.txt')
for file in txt_files:
    size_kb = os.path.getsize(file) / 1024
    label = "Large file" if size_kb > 100 else "File"
    print(f"{label}: {file}, Size: {size_kb:.2f} KB")

File: Rain.txt, Size: 0.50 KB
File: report_.txt, Size: 0.00 KB
File: task-2.txt, Size: 0.46 KB


In [None]:
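for file in glob.glob("*.*"):
    base, ext = os.path.splitext(file)
    # Skip folders and anything already archived, so reruns are safe.
    if not os.path.isfile(file) or base.endswith("_archived"):
        continue
    print(f"Base name: {base}, Extension: {ext}")
    new_name = f"{base}_archived{ext}"
    os.rename(file, new_name)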


Base name: 1_result, Extension: .jpg
Base name: 2025-06-1, Extension: .log
Base name: 2025-06-2, Extension: .log
Base name: Filing, Extension: .ipynb
Base name: main, Extension: .csv
Base name: main, Extension: .json
Base name: Numpy, Extension: .ipynb
Base name: Rain, Extension: .txt
Base name: report_, Extension: .txt
Base name: task-2, Extension: .txt
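
Q:7 (numeric names only) cannot be expressed in glob alone: glob has no "one or more digits" quantifier, so `[0-9]*` matches a digit followed by anything. One possible approach is to narrow with glob and then check the stem with `str.isdigit`. `numeric_named_files` is a hypothetical helper name.

```python
import glob
import os

def numeric_named_files(folder="."):
    """Return files whose name (without extension) consists only of digits."""
    matches = []
    # glob.escape protects against special characters in the folder name.
    pattern = os.path.join(glob.escape(folder), "[0-9]*")
    for path in glob.glob(pattern):
        stem, _ext = os.path.splitext(os.path.basename(path))
        if os.path.isfile(path) and stem.isascii() and stem.isdigit():
            matches.append(path)
    return sorted(matches)
```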



# File Handling – Text & Binary Files

**Q:1 Count the number of lines in a file without loading it entirely.**

**Q:2 Replace a specific word in a file and save it to a new file.**

**Q:3 Append data to an existing file with a timestamp.**

**Q:4 Read and print the first 10 lines of a file.**

**Q:5 Write a list of dictionaries as CSV manually (without csv module).**

**Q:6 Copy a binary file in chunks (e.g., image or PDF).**

**Q:7 Write a function to compare two files and print the differing lines.**

**Q:8 Safely read a file that may not exist using try-except.**

**Q:9 Read a file using a specific encoding (e.g., UTF-16).**

**Q:10 Detect and skip empty lines when reading a file.**

**Q:11 Implement a log rotation mechanism: create log.txt, log_1.txt, etc. when size exceeds 1MB.**

**Q:12 Build a file-based key-value store using JSON per line.**

**Q:13 Implement version control: on every write, back up the previous version with a timestamp.**

**Q:14 Create a reader that detects encoding using chardet or fallback encoding.**

**Q:15 Convert a large log file into separate files per date based on timestamps in each line.**
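
No cells accompany this section, so the following is only a sketch of Q:1, Q:4, Q:6, and Q:10. All four function names are hypothetical, and UTF-8 is an assumed default encoding.

```python
from itertools import islice

def count_lines(path, encoding="utf-8"):
    """Count lines by iterating the file object, which reads lazily."""
    with open(path, "r", encoding=encoding) as f:
        return sum(1 for _ in f)

def head(path, n=10, encoding="utf-8"):
    """Return the first n lines without their trailing newlines."""
    with open(path, "r", encoding=encoding) as f:
        return [line.rstrip("\n") for line in islice(f, n)]

def non_empty_lines(path, encoding="utf-8"):
    """Return all lines that contain something other than whitespace."""
    with open(path, "r", encoding=encoding) as f:
        return [line.rstrip("\n") for line in f if line.strip()]

def copy_in_chunks(src, dst, chunk_size=64 * 1024):
    """Copy a binary file piece by piece so it never sits in memory whole."""
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:  # empty bytes means end of file
                break
            fout.write(chunk)
```

For example, `for line in head('Rain.txt'): print(line)` prints at most the first 10 lines of that file.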



# JSON Handling – json module

**Q:1 Load JSON from a file and print a nested field (e.g., data["user"]["name"]).**

**Q:2 Write a Python dict to a file with pretty formatting.**

**Q:3 Merge multiple JSON objects into a single file.**

**Q:4 Convert a JSON array into CSV format.**

**Q:5 Update a nested key inside a loaded JSON.**

**Q:6 Create a function to pretty-print JSON from string input.**

**Q:7 Safely load malformed JSON with exception handling.**

**Q:8 Remove a key from each item in a JSON list and re-save.**

**Q:9 Convert an object with datetime to a JSON string using a custom encoder.**

**Q:10 Search for all values associated with a key in nested JSON.**

**Q:11 Write a function to flatten deeply nested JSON into a flat dictionary.**

**Q:12 Build a recursive JSON validator for required schema keys.**

**Q:13 Convert a nested JSON into a pandas DataFrame with normalized columns.**

**Q:14 Create a diff tool that compares two JSON files and shows key-level changes.**

**Q:15 Handle and fix trailing commas in malformed JSON before parsing.**



In [28]:
import json
with open('main.json', 'r') as f:
    data = json.load(f)
    print(data["user"]["name"])
    print(data['user']['email'])


Ahmad Raza
ahmad@example.com


In [33]:
settings = {
    "volume": 75
}
def save_pretty_json(data, filename):
    with open(filename, 'w') as f:
        json.dump(data, f, indent=4)
save_pretty_json(settings, 'config.json')
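
`json.dump` raises `TypeError` on values it cannot serialize, such as `datetime` objects. For Q:9, one common approach is a `json.JSONEncoder` subclass that converts them to ISO 8601 strings. `DateTimeEncoder` and `to_json` are hypothetical names used for illustration.

```python
import json
from datetime import date, datetime

class DateTimeEncoder(json.JSONEncoder):
    """Encode datetime and date values as ISO 8601 strings."""

    def default(self, obj):
        if isinstance(obj, (datetime, date)):
            return obj.isoformat()
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(obj)

def to_json(data):
    """Serialize data to a pretty-printed JSON string."""
    return json.dumps(data, cls=DateTimeEncoder, indent=4)
```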

In [None]:
def merge_json_files(file_list, output_file):
    merged = []
    for fname in file_list:
        with open(fname, 'r') as f:
            merged.append(json.load(f))
    with open(output_file, 'w') as out:
        json.dump(merged, out, indent=4)
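
Merged or otherwise nested data can be awkward to inspect. For Q:11, a recursive sketch that flattens nested dicts and lists into a single dict with dotted keys; `flatten_json` and the `.` separator are assumptions, and empty containers are dropped.

```python
def flatten_json(data, parent_key="", sep="."):
    """Flatten nested dicts and lists into one dict with joined keys."""
    items = {}
    if isinstance(data, dict):
        for key, value in data.items():
            new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
            items.update(flatten_json(value, new_key, sep))
    elif isinstance(data, list):
        for index, value in enumerate(data):
            # List positions become key segments: "tags.0", "tags.1", ...
            new_key = f"{parent_key}{sep}{index}" if parent_key else str(index)
            items.update(flatten_json(value, new_key, sep))
    else:
        items[parent_key] = data
    return items
```

With the structure loaded earlier, `flatten_json(data)` would expose the name under the key `user.name`.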

    
# Regular Expressions – re module

**Q:1 Extract email addresses from a string using re.findall().**

**Q:2 Validate a US phone number using regex.**

**Q:3 Extract hashtags from a tweet-like string.**

**Q:4 Replace all numbers with # in a paragraph.**

**Q:5 Match filenames with extension .pdf, .docx, or .xlsx.**

**Q:6 Split a paragraph into sentences using regex.**

**Q:7 Match a date in the format DD-MM-YYYY or YYYY/MM/DD.**

**Q:8 Extract quoted strings from text (e.g., "like this").**

**Q:9 Clean a text by removing special characters except alphanumerics and spaces.**

**Q:10 Capture repeated words like the the, is is in a sentence.**

**Q:11 Write a regex that extracts values from key-value pairs (key: value) even if keys contain spaces.**

**Q:12 Extract nested parentheses using recursive regex (advanced feature).**

**Q:13 Create a regex to detect and fix malformed URLs in a text block.**

**Q:14 Build a pattern to extract address-like strings (e.g., 123 Main St, City, ZIP).**

**Q:15 Tokenize a log line into timestamp, level, and message using regex groups.**

In [1]:
import re
mail='ghulam.ahmad.uet@gmail.com'
def extract_emails(text):
    return re.findall(r'\b[\w.-]+@[\w.-]+\.\w+\b', text)
print(extract_emails(mail))

['ghulam.ahmad.uet@gmail.com']


In [8]:
def is_valid_us_phone(number):
    # US numbers: 3-digit area code, 3-digit exchange, 4-digit line number.
    return bool(re.fullmatch(r'\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}', number))
is_valid_us_phone("555-123-4567")

True

In [12]:
def extract_hashtags(text):
    return re.findall(r'#\w+', text)

extract_hashtags('#oppenhiemer')

['#oppenhiemer']

In [14]:
def replace_numbers(text):
    return re.sub(r'\d+', '#', text)
replace_numbers('numbers are 123-456-7890')

'numbers are #-#-#'

In [15]:
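def find_filenames(text):
    # A non-capturing group makes findall return the whole filename, not just the extension.
    return re.findall(r'\b\w+\.(?:pdf|docx|xlsx)\b', text)
text = "Here are some files: report.pdf, data.xlsx, notes.docx"
print(find_filenames(text))

['report.pdf', 'data.xlsx', 'notes.docx']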


In [18]:
def split_into_sentences(text):
    return re.split(r'(?<=[.!?])\s+', text)
text = "Hello world! How are you? I hope you're doing well."
split_into_sentences(text)

['Hello world!', 'How are you?', "I hope you're doing well."]

In [19]:
def find_dates(text):
    return re.findall(r'\b(?:\d{2}-\d{2}-\d{4}|\d{4}/\d{2}/\d{2})\b', text)
find_dates("Today's date is 12-05-2023 and tomorrow's date is 2023/05/13.")

['12-05-2023', '2023/05/13']

In [20]:
def extract_quoted(text):
    return re.findall(r'"(.*?)"', text)
text = 'He said, "Hello, world!" and then left.'
print(extract_quoted(text))

['Hello, world!']


In [24]:
def clean_text(text):
    # \w would also keep underscores, so list the allowed characters explicitly.
    return re.sub(r'[^A-Za-z0-9\s]', '', text)
clean_text("This is a test! Let's clean it up.")

'This is a test Lets clean it up'

In [26]:
def find_repeated_words(text):
    return re.findall(r'\b(\w+)\s+\1\b', text, re.IGNORECASE)
text = "This is a test test to find find repeated words words."
find_repeated_words(text)

['test', 'find', 'words']
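
For Q:15, named groups keep the pieces of a log line labelled. The line layout here (`YYYY-MM-DD HH:MM:SS`, an uppercase level with optional brackets, then the message) is an assumed format, and `parse_log_line` is a hypothetical helper; real logs may need a different pattern.

```python
import re

# Assumed layout: "2025-06-30 12:00:00 [ERROR] Disk full"
LOG_PATTERN = re.compile(
    r'^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+'
    r'\[?(?P<level>[A-Z]+)\]?\s+'
    r'(?P<message>.*)$'
)

def parse_log_line(line):
    """Return a dict with timestamp, level, and message, or None if no match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None
```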