# Laboratory on Data Carving
_Digital Forensics and Biometrics_, A.Y. 2025/2026

Lecturer: prof. **Simone Milani** (simone.milani@dei.unipd.it)

Teaching Assistants: **Mattia Tamiazzo** (mattia.tamiazzo@studenti.unipd.it); **Luca Domeneghetti** (luca.domeneghetti@studenti.unipd.it)

### Prerequisites
This laboratory requires no prior knowledge of digital forensics, although basic familiarity with the Unix environment will make the deletion and recovery process easier to follow.

In general, the following aspects are expected to be known and will not be covered in detail:
- Unix paradigm of files, directories and devices
- Basic Shell usage
- `Ext4` filesystem layout: inodes, block sectors, journaling
- Calculations with hexadecimal byte offsets

### Contents
The goal of this laboratory is to provide the basic theoretical and practical notions of data carving and file recovery. The details of filesystem structure and I/O mechanisms, although crucial for a successful data recovery, are out of scope and are left to the student as further reading.

At the end of the laboratory the student will have acquired the following skills:
- Perform a forensic disk copy using Unix imaging tools (`dcfldd`)
- Analyze the partition layout of a disk image
- List the files within a filesystem
- **Retrieve deallocated/deleted files**
- **Perform data carving** (manually and using dedicated tools)
___
## Theoretical aspects of filesystem forensics
Before diving into practice, a brief theoretical detour is needed to understand how a recovery procedure is carried out.

Knowing how a specific filesystem works helps to understand how deletion works, how files are allocated logically and physically, and, most importantly, where to look when searching for deallocated data.

### The Ext filesystem
This laboratory focuses on the Ext family of filesystems, which is widely deployed and well tested in Linux environments. Its relative simplicity makes it a suitable starting point for learning filesystem forensics.

#### Inodes
An **inode** (index node) is a data structure used by Unix filesystems (including the Ext family) to store metadata about files. Each inode stores the following information:

- File type (e.g., regular file, directory, symlink)
- Permissions and ownership (UID, GID)
- Timestamps (created, modified, accessed)
- File size
- Link count
- Pointers to data blocks

Inodes do **not** store the file name or its path: these are kept in directory entries, which map names to inode numbers. This separation is critical in forensic analysis and data carving, because an inode may remain allocated even if the directory structures that reference it are damaged or missing (this is the case for _orphan files_).
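
The separation between names and inodes can be observed on any Unix system without touching the raw disk. The following is a minimal sketch, assuming a filesystem that supports hard links; the helper names `describe_inode` and `hard_link_demo` and the file names are hypothetical and chosen only for illustration.

```python
import os
import stat
import tempfile


def describe_inode(path):
    """Collect the inode-level metadata that os.stat() exposes for `path`."""
    st = os.stat(path)
    return {
        "inode": st.st_ino,
        "is_regular": stat.S_ISREG(st.st_mode),
        "permissions": stat.filemode(st.st_mode),
        "uid": st.st_uid,
        "gid": st.st_gid,
        "size": st.st_size,
        "links": st.st_nlink,
        "modified": st.st_mtime,
    }


def hard_link_demo(workdir):
    """Create one file with two names and describe it through each name."""
    original = os.path.join(workdir, "report.txt")
    alias = os.path.join(workdir, "alias.txt")
    with open(original, "w") as handle:
        handle.write("evidence")
    os.link(original, alias)  # adds a second directory entry for the same inode
    return describe_inode(original), describe_inode(alias)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        first, second = hard_link_demo(workdir)
        print("same inode:", first["inode"] == second["inode"])
        print("link count:", first["links"])
```

Note that none of the returned fields contains the file name: both names resolve to the same inode number, and the link count records how many directory entries point to it.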

Within an Ext partition, inodes are located in inode tables. Each inode is marked as allocated or deallocated by using inode bitmaps.
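
As a worked example of how an inode number maps to a position in the inode tables and bitmaps, the sketch below applies the usual Ext arithmetic: inode numbers start at 1, so the block group is `(inode - 1) // inodes_per_group` and the index inside that group is the remainder. The values used for `inodes_per_group` and `inode_size` are assumptions for illustration; on a real image they are read from the superblock. The function names are hypothetical.

```python
def locate_inode(inode_number, inodes_per_group, inode_size):
    """Return (block group, index in group, byte offset inside the inode table)."""
    if inode_number < 1:
        raise ValueError("inode numbers start at 1")
    group, index = divmod(inode_number - 1, inodes_per_group)
    return group, index, index * inode_size


def is_allocated(inode_bitmap, index):
    """Test bit `index` of a group's inode bitmap (least significant bit first)."""
    return bool((inode_bitmap[index // 8] >> (index % 8)) & 1)


if __name__ == "__main__":
    # Assumed superblock values: 8192 inodes per group, 256-byte inodes.
    group, index, offset = locate_inode(8195, 8192, 256)
    print(group, index, hex(offset))

    # Toy bitmap: only the first three inodes of the group are in use.
    bitmap = bytes([0b00000111])
    print(is_allocated(bitmap, 2), is_allocated(bitmap, 3))
```

A deleted file shows up as an inode whose bit is cleared in the bitmap while its table slot may still hold residual metadata.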

[![](https://www.virtualcuriosities.com/wp-content/uploads/2025/03/linux-diagram-hard-links-inodes-20250326.webp)](https://www.virtualcuriosities.com/articles/4507/how-hard-links-and-inodes-work-on-linux)

#### Directories
In Ext filesystems, directories are special files (file type code `0x2`) that store a list of **directory entries**, each mapping a filename to an **inode number**. These entries are stored sequentially in data blocks and are laid out as follows:

| Offset | Size                | Name                   | Description                                               |
|--------|---------------------|------------------------|-----------------------------------------------------------|
| 0x0    | __le32              | inode                  | Number of the inode that this directory entry points to.  |
| 0x4    | __le16              | rec_len                | Length of this directory entry.                           |
| 0x6    | __u8                | name_len               | Length of the file name.                                  |
| 0x7    | __u8                | file_type              | File type code (e.g. `0x1` regular file, `0x2` directory). |
| 0x8    | char\[255]          | name                   | File name.                                                |

This structure allows multiple filenames (hard links) to point to the same inode.

By default, directory entries are stored in a linear list, which becomes inefficient as directories grow. **HTree indexing**, a hashed B-tree-like structure available as an optional feature since Ext3 and enabled by default in Ext4, improves lookup performance in large directories.
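
The entry layout in the table above can be decoded with a few lines of code. The following is a minimal sketch, assuming a linear (non-HTree) directory block; it builds a small synthetic 64-byte block instead of reading a real image, and the helper names `pack_entry` and `parse_entries` are hypothetical.

```python
import struct

# inode (__le32), rec_len (__le16), name_len (__u8), file_type (__u8)
DIRENT_HEADER = struct.Struct("<IHBB")


def pack_entry(inode, name, file_type, rec_len=None):
    """Build one synthetic directory entry, padded to `rec_len` bytes."""
    raw_name = name.encode()
    minimal = (DIRENT_HEADER.size + len(raw_name) + 3) & ~3  # 4-byte aligned
    rec_len = rec_len or minimal
    entry = DIRENT_HEADER.pack(inode, rec_len, len(raw_name), file_type) + raw_name
    return entry.ljust(rec_len, b"\x00")


def parse_entries(block):
    """Walk a linear directory block and return (inode, name, file_type) tuples."""
    entries = []
    offset = 0
    while offset + DIRENT_HEADER.size <= len(block):
        inode, rec_len, name_len, file_type = DIRENT_HEADER.unpack_from(block, offset)
        if rec_len < DIRENT_HEADER.size:
            break  # corrupt entry: stop instead of looping forever
        if inode != 0:  # inode 0 marks an unused entry
            start = offset + DIRENT_HEADER.size
            name = block[start:start + name_len].decode(errors="replace")
            entries.append((inode, name, file_type))
        offset += rec_len  # rec_len points to the next entry
    return entries


if __name__ == "__main__":
    block = (
        pack_entry(2, ".", 2)
        + pack_entry(2, "..", 2)
        + pack_entry(12, "notes.txt", 1, rec_len=40)  # last entry fills the block
    )
    for entry in parse_entries(block):
        print(entry)
```

Because `rec_len` may be larger than the space the name needs, the slack inside an entry can still hold the bytes of a previously removed entry, which is one of the places worth inspecting during recovery.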

### NTFS and FAT

Aside from Ext filesystems, NTFS and FAT filesystems are frequently employed in Windows environments and on external storage (e.g. USB drives). Although they share some similarities with Ext, understanding the differences is key to a successful file recovery.

- **Metadata Handling**:  
  Ext uses **inodes** to store file metadata separately from directory entries. In contrast, **NTFS** stores metadata in the **Master File Table (MFT)**, with each file represented as a record. **FAT**, being simpler, uses a **File Allocation Table** and directory entries that contain both metadata and file location info.

- **File Deletion Behavior**:  
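  In **Ext**, when a file is deleted, its directory entry and inode may persist until overwritten, which aids data recovery; note, however, that Ext3 and Ext4 typically clear the block pointers (or extent information) stored in the inode on deletion, so the journal or carving is often needed to locate the data. **NTFS** flags the MFT record as no longer in use but often retains significant metadata. **FAT** replaces the first byte of the filename with the marker `0xE5` and frees the file's cluster chain in the FAT.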

- **Journaling**:  
  **Ext3/4** and **NTFS** are journaling filesystems, enhancing data integrity but sometimes complicating recovery due to overwrites. **FAT** lacks journaling, making it more vulnerable to corruption but also leaving raw data more directly accessible for carving.

- **File Name and Path Storage**:  
  Ext separates names (in directory entries) from inodes, while NTFS stores file name attributes directly within MFT records. FAT embeds the file name directly in the directory entry.
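
To make the FAT behaviour concrete, the sketch below decodes a 32-byte short (8.3) directory entry and checks the deletion marker in its first byte. It is an illustration under stated assumptions: the entry is synthetic, the FAT32 field layout is assumed (on FAT12/16 the high cluster word is not used), long file names are ignored, and the function name `parse_fat_entry` is hypothetical.

```python
import struct

DELETED_MARKER = 0xE5
# name, extension, attributes, 8 skipped bytes, cluster high word,
# write time, write date, cluster low word, file size
FAT_ENTRY = struct.Struct("<8s3sB8xHHHHI")


def parse_fat_entry(raw):
    """Decode one 32-byte FAT short directory entry."""
    if len(raw) != FAT_ENTRY.size:
        raise ValueError("a FAT directory entry is exactly 32 bytes")
    name, ext, attributes, high, _time, _date, low, size = FAT_ENTRY.unpack(raw)
    deleted = name[0] == DELETED_MARKER
    if deleted:
        # The original first character is lost: show a placeholder instead.
        stem = "?" + name[1:].decode("ascii", "replace")
    else:
        stem = name.decode("ascii", "replace")
    return {
        "name": stem.rstrip() + "." + ext.decode("ascii", "replace").rstrip(),
        "deleted": deleted,
        "attributes": attributes,
        "first_cluster": (high << 16) | low,
        "size": size,
    }


if __name__ == "__main__":
    live = FAT_ENTRY.pack(b"REPORT  ", b"TXT", 0x20, 0, 0, 0, 5, 1234)
    removed = bytes([DELETED_MARKER]) + live[1:]
    print(parse_fat_entry(live))
    print(parse_fat_entry(removed))
```

In the deleted entry the starting cluster and the size are still readable, which is why recently deleted, unfragmented FAT files are often recoverable from the directory entry alone.
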
___

## Exercise 01: perform a forensic disk imaging
When in possession of a digital device under investigation (e.g. a hard disk, USB drive, internal SSD...), the first crucial step is to make a copy of it, so as to avoid any accidental modification of the original.

This process is called **disk imaging** and abides by the following principles:
1. the copy is an exact duplicate of the original device
2. the original device remains unaltered by the process
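
The two principles can be illustrated with a block-by-block copy. This is only a sketch of the idea, not a replacement for a dedicated imaging tool such as `dcfldd`: the function name `image_device` and the paths are hypothetical, and opening the source read-only in software does not by itself guarantee that the device stays unaltered.

```python
import os
import tempfile


def image_device(source, destination, block_size=1024 * 1024):
    """Copy `source` to `destination` block by block; return the bytes copied."""
    copied = 0
    # "rb": the source is only ever read.
    # "xb": refuse to overwrite an image file that already exists.
    with open(source, "rb") as src, open(destination, "xb") as dst:
        while True:
            chunk = src.read(block_size)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        source = os.path.join(workdir, "fake_device.bin")  # stands in for a disk
        image = os.path.join(workdir, "evidence.img")
        with open(source, "wb") as handle:
            handle.write(os.urandom(10_000))
        print(image_device(source, image, block_size=4096), "bytes copied")
```

On a real case the source would be a block device path and the copy would be run with sufficient privileges; the small file here only keeps the example self-contained.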

To detect any unwanted (or intentional!) modification introduced during subsequent procedures, it is important to record the **hash signature** (cryptographic digest) of the original device. This step must be performed during the first acquisition of the digital media.
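
A minimal sketch of this check, assuming SHA-256 as the hash function (the text does not prescribe a specific algorithm) and using the hypothetical helper names `sha256_of` and `verify_image`:

```python
import hashlib


def sha256_of(path, block_size=1024 * 1024):
    """Hash a file or device in fixed-size blocks to keep memory usage constant."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(block_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_image(original, image):
    """Return True when the image hashes to the same digest as the original."""
    return sha256_of(original) == sha256_of(image)
```

The digest recorded at acquisition time can later be compared with a freshly computed one, for example `verify_image("/dev/sdb", "evidence.img")` (both paths are placeholders): any single changed byte yields a different digest.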