# FileManager - Workings

This notebook is for testing and working through processes related to the FileManager project.

The assumed directory/file structure for this notebook is as follows:

- QuizWorkings.ipynb  
- assets/
  - images/
    - IMG_20180703_205212.jpg

## Table Of Contents

* **0.** [Dependencies and Settings](#0-Dependencies-and-Settings)  
* **1.** [Looking at Metadata](#1-Looking-at-Metadata)  

## 0 Dependencies and Settings

Using `pathlib` as it is cross-platform:

In [1]:
from pathlib import Path

Using `platform` to check the user's current system:

In [2]:
import platform

Using `time` and `datetime` for waiting and conversion:

In [14]:
import time
from datetime import datetime

`math` for simple operations:

In [31]:
import math

Current working directory for ease of use:

In [3]:
cwd = Path.cwd()
cwd

WindowsPath('C:/Users/seani/Documents/Projects/FileManager')

## 1 Looking at Metadata

In this section we will investigate methods to look at file metadata.

Main images directory:

In [4]:
img = cwd / "assets" / "images"
img

WindowsPath('C:/Users/seani/Documents/Projects/FileManager/assets/images')

### 1.1 Single image file

First, let's obtain a single image file: `IMG_20180703_205212.jpg`

In [5]:
path = img / "IMG_20180703_205212.jpg"
path

WindowsPath('C:/Users/seani/Documents/Projects/FileManager/assets/images/IMG_20180703_205212.jpg')

We can look at its stat data:

In [6]:
statdata = path.stat()
statdata

os.stat_result(st_mode=33206, st_ino=5629499534378696, st_dev=2457206766, st_nlink=1, st_uid=0, st_gid=0, st_size=3855920, st_atime=1687521588, st_mtime=1687474906, st_ctime=1687474906)
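
The `st_mode` value in this output packs the file type and the permission bits into a single integer. As a brief sketch, the standard-library `stat` module can decode it; the literal `33206` is copied from the output above and the variable names are purely illustrative.

```python
import stat

# st_mode value copied from the stat output above
mode = 33206

# octal form separates the file-type bits from the permission bits
mode_octal = oct(mode)

# True when the mode describes a regular file (not a directory or link)
is_regular = stat.S_ISREG(mode)

# ls-style rendering of the type and permission bits
mode_string = stat.filemode(mode)

print(mode_octal, is_regular, mode_string)
```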

There are numerous timestamps to look at:

- Time of last access (`st_atime`)
- Time of last metadata change on Unix, or time of creation on Windows (`st_ctime`)
- Time of last modification (`st_mtime`)


Depending on the OS, these timestamps can mean different things. On Windows, `st_ctime` generally holds the creation time. For image or video files, which are rarely edited after being saved, `ctime` and `mtime` are therefore likely to be the same, whereas for files that are modified regularly (e.g. `.txt` files) the two will usually differ. On Unix-like systems, `st_ctime` is instead the time of the last metadata change. On Mac, the `st_birthtime` attribute gives the creation timestamp. On Linux, creation dates are more difficult to obtain, so the best estimate may be `mtime`. *An explanation can be found in [this Stack Overflow answer](https://stackoverflow.com/questions/237079/how-do-i-get-file-creation-and-modification-date-times/39501288#39501288).*

*NOTE: the `st_ino` field holds the inode number of the file. An explanation of inodes is not important here, but note that every file on a Unix file system has an inode, which stores the file's metadata.*

As a quick aside to test this, let's create a text file, wait 10 seconds, modify it, wait another 10 seconds, then access it. We can then check these timestamps to see what's different. First, let's define a function to convert Unix timestamps to `YYYY-MM-DD HH:MM:SS` format:

In [24]:
from datetime import timezone

def unix_to_readable_timestamp(timestamp):
    '''
    Converts a Unix timestamp (seconds since 00:00:00 UTC on 1 Jan 1970) to a readable UTC string.
    '''
    
    # convert from Unix time to a timezone-aware UTC datetime
    # (datetime.utcfromtimestamp is deprecated since Python 3.12)
    converted = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    # format in readable time
    formatted = converted.strftime('%Y-%m-%d %H:%M:%S')
    
    return formatted
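
The function above always reports UTC. If a local or fixed-offset time is more convenient, a minimal sketch along the following lines could be used. The name `unix_to_readable_in_zone` and its `tz` parameter are hypothetical additions, not part of the notebook, and the UTC+1 offset is chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone

def unix_to_readable_in_zone(timestamp, tz=timezone.utc):
    '''
    Hypothetical variant: formats a Unix timestamp in the given timezone (UTC by default).
    '''
    # build a timezone-aware datetime directly in the requested zone
    converted = datetime.fromtimestamp(timestamp, tz=tz)
    return converted.strftime('%Y-%m-%d %H:%M:%S')

# st_ctime of the image file, taken from the stat output earlier
example = 1687474906
utc_string = unix_to_readable_in_zone(example)
# a fixed UTC+1 offset, chosen purely for illustration
plus_one_string = unix_to_readable_in_zone(example, timezone(timedelta(hours=1)))
print(utc_string, plus_one_string)
```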

In [27]:
# new file path string
test_text_path = 'test_text_file.txt'

# create file and write data to it
with open(test_text_path, 'w') as f:
    f.write('some data to be written to the file')

print('File created.\n')

# get pathlib reference to file
f = Path(test_text_path)
# get stats
f_stat = f.stat()

# print timestamp
print(f'timestamp: {time.time():10.7f} --> {unix_to_readable_timestamp(time.time())}')
print(f'atime:     {f_stat.st_atime:10.7f} --> {unix_to_readable_timestamp(f_stat.st_atime)}')
print(f'ctime:     {f_stat.st_ctime:10.7f} --> {unix_to_readable_timestamp(f_stat.st_ctime)}')
print(f'mtime:     {f_stat.st_mtime:10.7f} --> {unix_to_readable_timestamp(f_stat.st_mtime)}')

# wait 10 seconds
print('\nModifying file...\n')
time.sleep(10)

# modify file by opening it again in append mode
with open(test_text_path, 'a') as f:
    f.write('\nsome more data to be written to the file')

# get pathlib reference to file
f = Path(test_text_path)
# get stats again
f_stat = f.stat()

# print timestamp
print(f'timestamp: {time.time():10.7f} --> {unix_to_readable_timestamp(time.time())}')
print(f'atime:     {f_stat.st_atime:10.7f} --> {unix_to_readable_timestamp(f_stat.st_atime)}')
print(f'ctime:     {f_stat.st_ctime:10.7f} --> {unix_to_readable_timestamp(f_stat.st_ctime)}')
print(f'mtime:     {f_stat.st_mtime:10.7f} --> {unix_to_readable_timestamp(f_stat.st_mtime)}')

# wait 10 seconds
print('\nAccessing file...\n')
time.sleep(10)

# access file by opening again
with open(test_text_path, 'r') as f:
    f.readlines()

# get pathlib reference to file
f = Path(test_text_path)
# get stats again
f_stat = f.stat()

# print timestamp
print(f'timestamp: {time.time():10.7f} --> {unix_to_readable_timestamp(time.time())}')
print(f'atime:     {f_stat.st_atime:10.7f} --> {unix_to_readable_timestamp(f_stat.st_atime)}')
print(f'ctime:     {f_stat.st_ctime:10.7f} --> {unix_to_readable_timestamp(f_stat.st_ctime)}')
print(f'mtime:     {f_stat.st_mtime:10.7f} --> {unix_to_readable_timestamp(f_stat.st_mtime)}')

# delete file
f.unlink()
print('\nFile deleted.')

File created.

timestamp: 1687623071.9826910 --> 2023-06-24 16:11:11
atime:     1687623071.9816937 --> 2023-06-24 16:11:11
ctime:     1687623071.9807999 --> 2023-06-24 16:11:11
mtime:     1687623071.9816937 --> 2023-06-24 16:11:11

Modifying file...

timestamp: 1687623081.9936018 --> 2023-06-24 16:11:21
atime:     1687623081.9936018 --> 2023-06-24 16:11:21
ctime:     1687623071.9807999 --> 2023-06-24 16:11:11
mtime:     1687623081.9936018 --> 2023-06-24 16:11:21

Accessing file...

timestamp: 1687623092.0017669 --> 2023-06-24 16:11:32
atime:     1687623092.0017669 --> 2023-06-24 16:11:32
ctime:     1687623071.9807999 --> 2023-06-24 16:11:11
mtime:     1687623081.9936018 --> 2023-06-24 16:11:21

File deleted.


As we can see, modifying the file changed `mtime`, and both modifying and accessing the file changed `atime`. Importantly, `ctime` remained constant throughout, which is consistent with it being the creation time on Windows.
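
The independence of `atime` and `mtime` can also be checked without waiting, since `os.utime` sets both explicitly. The following is a sketch only, assuming a writable temporary directory; the file name and the two timestamps are arbitrary.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / 'utime_demo.txt'
    demo.write_text('some data')

    # set atime and mtime to arbitrary, distinct whole-second values
    new_atime, new_mtime = 1_600_000_000, 1_500_000_000
    os.utime(demo, (new_atime, new_mtime))

    # read the stats back before the temporary directory is removed
    after = demo.stat()

atime_set = int(after.st_atime) == new_atime
mtime_set = int(after.st_mtime) == new_mtime
print(atime_set, mtime_set)
```

On Unix-like systems this call would also update `st_ctime`, because changing timestamps is itself a metadata change; on Windows `st_ctime` should stay at the creation time.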

A cross-platform way to get the creation timestamp is as follows:

In [9]:
def get_creation_timestamp(path_to_file):
    """
    Try to get the Unix timestamp that a file was created, falling back to when it was
    last modified if that isn't possible.
    See http://stackoverflow.com/a/39501288/1709587 for explanation.
    
    path_to_file: string of the path to the file
    """
    
    # get path variable
    path = Path(path_to_file)
    # get its stats
    statdata = path.stat()
    
    # if windows, simply the ctime
    if platform.system() == 'Windows':
        return statdata.st_ctime
    # if not windows, try Mac method
    else:
        try:
            return statdata.st_birthtime
        except AttributeError:
            # We're probably on Linux. No easy way to get creation timestamps here,
            # so we'll settle for when its content was last modified.
            return statdata.st_mtime

Testing this:

In [29]:
unix_to_readable_timestamp(get_creation_timestamp("assets/images/IMG_20180703_205212.jpg"))

'2023-06-22 23:01:46'

We can get the filetype by looking at the suffix:

In [64]:
path.suffix

'.jpg'

Check if it is a file:

In [65]:
path.is_file()

True

We can define a function that returns whether a path is a directory, image, video, or audio file by looking at its extension:

In [67]:
def get_path_type(path_string):
    '''
    Returns a string of either 'directory', 'image', 'audio', 'video', or 'other' depending on the file's extension.
    '''
    
    # define lists of extensions
    audio_extensions = ['.mp3', '.ogg']
    video_extensions = ['.mp4', '.mkv']
    image_extensions = ['.jpg', '.jpeg', '.png']
    
    # get as a path reference
    path = Path(path_string)
    
    # check if a file
    if path.is_file():
        # get suffix and lower it
        extension = path.suffix.lower()
        # check if an audio file
        if extension in audio_extensions:
            return 'audio'
        # check if a video file
        elif extension in video_extensions:
            return 'video'
        # check if an image file
        elif extension in image_extensions:
            return 'image'
        # otherwise (the original omitted `return`, so None was returned here)
        else:
            return 'other'
    # anything that is not a file is treated as a directory
    else:
        return 'directory'

Trying on the image file:

In [68]:
get_path_type(str(path))

'image'
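
An alternative way to express the same classification is a single dictionary from extension to type, which makes adding a format a one-line change. This is a sketch only: `EXTENSION_TYPES` and `classify_path` are hypothetical names, and unlike the function above it returns `'other'` for paths that do not exist.

```python
import tempfile
from pathlib import Path

# hypothetical lookup table: extension -> type label
EXTENSION_TYPES = {
    '.mp3': 'audio', '.ogg': 'audio',
    '.mp4': 'video', '.mkv': 'video',
    '.jpg': 'image', '.jpeg': 'image', '.png': 'image',
}

def classify_path(path_string):
    '''
    Hypothetical variant of get_path_type using a dictionary lookup.
    '''
    path = Path(path_string)
    if path.is_dir():
        return 'directory'
    if path.is_file():
        # unknown extensions fall back to 'other'
        return EXTENSION_TYPES.get(path.suffix.lower(), 'other')
    # path does not exist (or is neither a file nor a directory)
    return 'other'

# try it on a temporary directory containing two empty files
with tempfile.TemporaryDirectory() as tmp:
    image = Path(tmp) / 'photo.JPG'
    image.touch()
    notes = Path(tmp) / 'notes.txt'
    notes.touch()
    results = [classify_path(tmp), classify_path(image), classify_path(notes)]

print(results)
```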

Now, let's check the size of the image file, also from the stat data:

In [30]:
statdata.st_size

3855920

This is the size in bytes. For ease of use, let's define a function that returns the size in kilobytes, megabytes, or gigabytes depending on the file size:

In [38]:
def get_readable_filesize(filesize):
    '''
    Returns the filesize (given in bytes) as a readable string in B, KB, MB, or GB depending on the size.
    '''
    
    # compare against thresholds directly; math.log10 would raise for an empty (0 byte) file
    # check if fits GB
    if filesize >= 1e9:
        # convert to GB, rounding up to 3 decimal places
        filesize_format = math.ceil(filesize / 1e6) / 1e3
        filesize_string = f'{filesize_format:0.3f} GB'
    # check if fits MB
    elif filesize >= 1e6:
        # convert to MB, rounding up to 3 decimal places
        filesize_format = math.ceil(filesize / 1e3) / 1e3
        filesize_string = f'{filesize_format:0.3f} MB'
    # check if fits KB
    elif filesize >= 1e3:
        # convert to KB, rounding up to a whole number of bytes first
        filesize_format = math.ceil(filesize) / 1e3
        filesize_string = f'{filesize_format:0.3f} KB'
    else:
        # return as bytes string
        filesize_string = f'{filesize} B'
    
    return filesize_string

Testing this:

In [39]:
get_readable_filesize(statdata.st_size)

'3.856 MB'
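
The function above uses decimal units (1 KB = 1000 bytes). Some tools report binary units instead (1 KiB = 1024 bytes). A rough sketch of that variant follows; `get_binary_filesize` is a hypothetical name, and it rounds to the nearest value rather than up.

```python
def get_binary_filesize(filesize):
    '''
    Hypothetical variant: formats a size in bytes using binary units (powers of 1024).
    '''
    units = ['B', 'KiB', 'MiB', 'GiB']
    size = float(filesize)
    index = 0
    # divide by 1024 until the value fits the current unit or units run out
    while size >= 1024 and index < len(units) - 1:
        size /= 1024
        index += 1
    # whole bytes need no decimals
    if index == 0:
        return f'{filesize} B'
    return f'{size:0.3f} {units[index]}'

sizes = [get_binary_filesize(n) for n in (0, 1536, 1024 ** 2, 5 * 1024 ** 3)]
print(sizes)
```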

### 1.2 Multiple files of various types