# Week 7 Lab: Experimenting with loading files

## Description
You have joined a data analytics team that works with a variety of file formats. Your first task is to read several types of files stored in the `dataset` directory, extract basic information from each, and print a short summary.

## Step 0 - Import modules/packages

In [30]:
import os
import pandas as pd
import json
from openpyxl import load_workbook
from PIL import Image
import pdfplumber

## Step 1 - Understanding the Data Directory
- List all files in the directory and print their names.

In [2]:
directory_path = "dataset"

In [3]:
try:
    files = os.listdir(directory_path)
    for file in files:
        print(file)
except FileNotFoundError:
    print("Directory not found.")

data.csv
data.json
data.txt
data.xlsx
document.pdf
image.jpg
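
`os.listdir` returns files and subdirectories alike, in arbitrary order. As a minimal sketch (the helper name `list_files` is an assumption, not part of the lab), the same listing can be done with `pathlib`, keeping only regular files, sorting them by name, and reporting each file's size:

```python
from pathlib import Path


def list_files(directory):
    """Return (name, size_in_bytes) for each regular file, sorted by name."""
    root = Path(directory)
    if not root.is_dir():
        # Covers a missing path and a path that exists but is not a directory.
        print("Directory not found.")
        return []
    return sorted(
        (entry.name, entry.stat().st_size)
        for entry in root.iterdir()
        if entry.is_file()  # skip subdirectories
    )


for name, size in list_files("dataset"):
    print(f"{name}\t{size} bytes")
```

Sorting makes the listing reproducible across operating systems, which is convenient when comparing lab outputs.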


## Steps 2 & 3 - Opening and Reading Files and Handling Errors Gracefully

- Text File (`.txt`) → Read and print the first 5 lines.

In [9]:
file_name = "data.txt"
file_path = os.path.join(directory_path, file_name)
try:
    with open(file_path, "r", encoding="utf-8") as file:
        for _ in range(5):
            line = file.readline()
            if not line:
                break
            print(line.strip())
except FileNotFoundError:
    print("File not found.")

I wrote an email to my friend and he praised me for my writing skills.
Я написал электронное письмо своему другу, и он похвалил меня за мои писательские навыки.
मैंने अपने मित्र को एक ईमेल लिखा और उसने मेरी लेखन कौशल की प्रशंसा की।
我给我的朋友写了一封电子邮件，他称赞我的写作技巧。
لقد كتبت بريدًا إلكترونيًا إلى صديقي وأشاد بي على مهاراتي في الكتابة.
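
The "first 5 lines" logic above can also be written with `itertools.islice`, which stops at the requested count or at end of file, whichever comes first. The following is a sketch only; the helper name `read_head` and the extra `UnicodeDecodeError` branch are assumptions added for illustration:

```python
from itertools import islice


def read_head(path, n=5, encoding="utf-8"):
    """Return up to the first n lines of a text file, without line endings."""
    try:
        with open(path, "r", encoding=encoding) as handle:
            # islice stops after n lines or at end of file, whichever is first.
            return [line.rstrip("\n") for line in islice(handle, n)]
    except FileNotFoundError:
        print("File not found.")
    except UnicodeDecodeError:
        print(f"File is not valid {encoding}.")
    return []


for line in read_head("dataset/data.txt"):
    print(line)
```

Returning a list instead of printing inside the loop keeps the reading step separate from the display step, so the same helper can feed a summary later.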


- CSV File (`.csv`) → Use `pandas` package to load and display the first 3 rows.

In [34]:
file_name = "data.csv"
file_path = os.path.join(directory_path, file_name)
try:
    df = pd.read_csv(file_path, encoding="utf-8")
    print(df.head(3))
except FileNotFoundError:
    print("File not found.")
except pd.errors.EmptyDataError:
    print("CSV file is empty.")
except pd.errors.ParserError:
    print("Error parsing CSV file.")

     Map My Agent Teammate1 Teammate2 Teammate3 Teammate4  Victory
0  Split     Raze      Tejo      Omen      Sage    Cypher    False
1  Lotus    Reyna      Sage     Clove      Tejo      Raze     True
2  Split     Sage      Omen      Jett      Yoru      Tejo     True
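
The lab asks for `pandas` here. For comparison, the sketch below reads the header and the first 3 data rows with the standard-library `csv` module instead, which is a different technique from the cell above. The helper name `csv_head` is an assumption, and unlike `pandas` it returns every value as a string (the `Victory` column would come back as `"False"`/`"True"` rather than booleans):

```python
import csv
from itertools import islice


def csv_head(path, n=3, encoding="utf-8"):
    """Return (header, rows) with at most n data rows from a CSV file."""
    try:
        # newline="" is what the csv module documentation recommends for files.
        with open(path, "r", encoding=encoding, newline="") as handle:
            reader = csv.reader(handle)
            header = next(reader, None)  # None when the file has no rows
            if header is None:
                print("CSV file is empty.")
                return [], []
            return header, list(islice(reader, n))
    except FileNotFoundError:
        print("File not found.")
    except csv.Error:
        print("Error parsing CSV file.")
    return [], []


header, rows = csv_head("dataset/data.csv")
print(header)
for row in rows:
    print(row)
```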


- JSON File (`.json`) → Use `json` package to parse and display key-value pairs.

In [35]:
file_name = "data.json"
file_path = os.path.join(directory_path, file_name)
try:
    with open(file_path, "r", encoding="utf-8") as file:
        data = json.load(file)
        for key, value in data.items():
            print(f"{key}: {value}")
except FileNotFoundError:
    print("File not found.")
except json.JSONDecodeError:
    print("Error decoding JSON file.")

reports_made: {'AFK': 2, 'DISRESPECTFUL_BEHAVIORS': 1, 'SABOTAGING_TEAM': 1}
reports_received: {'AFK': 1, 'CHEATING': 1, 'COMMS_ABUSE_TEXT': 1, 'DISRESPECTFUL_BEHAVIORS': 1, 'SABOTAGING_TEAM': 5}
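
The values printed above are themselves dictionaries, so each top-level key shows a whole nested object on one line. One possible way to display every leaf as its own key-value pair is a small recursive generator. This is a sketch; the name `flatten` and the dotted-key format are assumptions, and the sample below is a subset of the output shown above:

```python
def flatten(value, prefix=""):
    """Yield (dotted_key, leaf_value) pairs from nested dicts and lists."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{prefix}[{index}]")
    else:
        yield prefix, value


# Subset of the parsed data.json content printed above.
sample = {"reports_made": {"AFK": 2, "SABOTAGING_TEAM": 1}}
for key, value in flatten(sample):
    print(f"{key}: {value}")
# prints:
# reports_made.AFK: 2
# reports_made.SABOTAGING_TEAM: 1
```

Because `flatten` accepts any parsed JSON value, it also works when the top level of the file is a list, a case in which calling `data.items()` would raise `AttributeError`.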


- Excel File (`.xlsx`) → Use `openpyxl` to read and display the first 3 rows.

In [36]:
file_name = "data.xlsx"
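file_path = os.path.join(directory_path, file_name)
try:
    workbook = load_workbook(file_path)
    sheet = workbook.active  # the workbook's currently active sheet

    # values_only=True yields plain cell values instead of Cell objects
    for row in sheet.iter_rows(min_row=1, max_row=3, values_only=True):
        print(row)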
except FileNotFoundError:
    print("File not found.")
except Exception as e:
    print(f"Error reading Excel file: {e}")

('Map', 'My Agent', 'Teammate1', 'Teammate2', 'Teammate3', 'Teammate4')
('Split', 'Sage', 'Breach', 'Omen', 'Iso', 'Chamber')
('Fracture', 'Omen', 'Reyna', 'Deadlock', 'Gekko', 'Sage')


- Image File (`.jpg`) → Use `PIL` to open and display the image.

In [37]:
file_name = "image.jpg"
file_path = os.path.join(directory_path, file_name)
try:
    # The context manager closes the underlying file once the image is shown
    with Image.open(file_path) as image:
        image.show()
except FileNotFoundError:
    print("File not found.")
except Exception as e:
    print(f"Error opening image file: {e}")

- PDF File (`.pdf`) → Use `pdfplumber` to extract and print the first 5 lines of text from each of the first 2 pages.

In [38]:
file_name = "document.pdf"
file_path = os.path.join(directory_path, file_name)
try:
    with pdfplumber.open(file_path) as pdf:
        num_pages = min(2, len(pdf.pages))
        for i in range(num_pages):
            text = pdf.pages[i].extract_text()
            if text:
                lines = text.split("\n")
                for line in lines[:5]:
                    print(line)
            else:
                print("No text found on this page.")
except FileNotFoundError:
    print("File not found.")
except Exception as e:
    print(f"Error reading PDF file: {e}")

DA108 | Lab 07 Assignment
Objective A: File Management System for a Research Lab
You are working as a data manager in a research lab that collects and organizes data from various
sources. Your task is to automate the organization of files stored in a messy directory containing text files
(.txt), CSV datasets (.csv), JSON records (.json), images (.jpg, .png), and log files (.log).
● Image file (image.jpg)
● PDF file (document.pdf)
Write a script to list all files in the directory and print their names.
Step 2 - Opening and Reading Files
Implement functions to open and read each file type:
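
Every cell in Steps 2 & 3 follows the same shape: build a path, call a reader, and report a failure instead of crashing. One way this pattern might be factored is a registry that maps file extensions to reader functions behind a single error-handling wrapper. The names `READERS`, `read_text`, `read_json`, and `safe_read` are hypothetical, and only the two standard-library readers are filled in:

```python
import json
import os


def read_text(path):
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


def read_json(path):
    with open(path, "r", encoding="utf-8") as handle:
        return json.load(handle)


# Hypothetical registry: extension -> reader. Readers for .csv, .xlsx, .jpg
# and .pdf would wrap the pandas, openpyxl, PIL and pdfplumber calls above.
READERS = {".txt": read_text, ".json": read_json}


def safe_read(path, readers=READERS):
    """Return (content, error_message); error_message is None on success."""
    extension = os.path.splitext(path)[1].lower()
    reader = readers.get(extension)
    if reader is None:
        return None, f"No reader registered for '{extension}'."
    try:
        return reader(path), None
    except FileNotFoundError:
        return None, "File not found."
    except (OSError, ValueError) as error:
        # json.JSONDecodeError and UnicodeDecodeError both subclass ValueError.
        return None, f"Error reading file: {error}"


content, error = safe_read(os.path.join("dataset", "data.json"))
print(error if error else content)
```

Returning the error message rather than printing it leaves the caller free to decide how to report failures, for example by collecting them into a per-file summary.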
