# Filesystem

This page covers tools for working with the filesystem in Linux.

## List contents (ls)

The `ls` command lists the files and directories in a folder.

The following Python code creates some random files and folders so that we can see how they appear in the output of `ls`.

```bash
python3 << 'EOF'
import os
import random
import string

def random_name(length=8):
    """Create a random lowercase file/directory name."""
    letters = string.ascii_lowercase
    return ''.join(random.choice(letters) for _ in range(length))

experimental_path = "/tmp/ls"
# exist_ok=True lets the cell be re-run without FileExistsError
os.makedirs(experimental_path, exist_ok=True)

for _ in range(10):
    new_path = os.path.join(experimental_path, random_name())
    # Randomly create either an empty directory or a small text file
    if random.choice([True, False]):
        os.mkdir(new_path)
    else:
        with open(new_path, "w") as f:
            f.write("some content")
EOF
```

Plain `ls` lists the names in alphabetical order, but it doesn't show which entries are directories and which are files (apart from terminal colouring), nor their modification times, owners, and so on.

```bash
ls /tmp/ls
```

```
bgvnkllg  gcpxmhxi  qikvjpad  uezijtkv  uzrkogvi
fzktcbrl  kwtzoaja  rprudeil  uvyfqpnk  vggltcfs
```


The `ls -l` command prints additional information in a detailed, table-like format. From left to right, the columns are:

- The file type and permissions: the first character is `d` for a directory and `-` for a regular file, followed by the read (`r`), write (`w`), and execute (`x`) permissions for the owner, the group, and all other users.
- The number of hard links to the item.
- The user who owns the item and the Unix group it belongs to (third and fourth columns).
- The size of the item in bytes.
- The time of the last modification.
- The name of the item.

The `total` line at the top is the disk space used by the listed entries, counted in blocks (1 KiB each by default in GNU `ls`).

```bash
ls -l /tmp/ls
```

```
total 40
-rw-rw-r-- 1 f-kobak-distance-desctop f-kobak-distance-desctop   12 Dec 29 19:01 bgvnkllg
-rw-rw-r-- 1 f-kobak-distance-desctop f-kobak-distance-desctop   12 Dec 29 19:01 fzktcbrl
-rw-rw-r-- 1 f-kobak-distance-desctop f-kobak-distance-desctop   12 Dec 29 19:01 gcpxmhxi
drwxrwxr-x 2 f-kobak-distance-desctop f-kobak-distance-desctop 4096 Dec 29 19:01 kwtzoaja
drwxrwxr-x 2 f-kobak-distance-desctop f-kobak-distance-desctop 4096 Dec 29 19:01 qikvjpad
-rw-rw-r-- 1 f-kobak-distance-desctop f-kobak-distance-desctop   12 Dec 29 19:01 rprudeil
-rw-rw-r-- 1 f-kobak-distance-desctop f-kobak-distance-desctop   12 Dec 29 19:01 uezijtkv
-rw-rw-r-- 1 f-kobak-distance-desctop f-kobak-distance-desctop   12 Dec 29 19:01 uvyfqpnk
-rw-rw-r-- 1 f-kobak-distance-desctop f-kobak-distance-desctop   12 Dec 29 19:01 uzrkogvi
drwxrwxr-x 2 f-kobak-distance-desctop f-kobak-distance-desctop 4096 Dec 29 19:01 vggltcfs
```
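The fields shown by `ls -l` come from each file's metadata (the `stat` system call). As an illustration, the following Python sketch rebuilds similar rows with `os.lstat` and `stat.filemode`. `long_listing`, `owner_name` and `group_name` are hypothetical helpers, the date format only approximates what `ls` prints, and the `pwd`/`grp` modules make the sketch Unix-only.

```python
import grp
import os
import pwd
import stat
import tempfile
import time


def owner_name(uid):
    """Map a numeric user id to a name, falling back to the number."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return str(uid)


def group_name(gid):
    """Map a numeric group id to a name, falling back to the number."""
    try:
        return grp.getgrgid(gid).gr_name
    except KeyError:
        return str(gid)


def long_listing(path):
    """Hypothetical helper: return `ls -l`-style rows for the entries of `path`."""
    rows = []
    for name in sorted(os.listdir(path)):
        st = os.lstat(os.path.join(path, name))  # lstat: do not follow symlinks
        rows.append((
            stat.filemode(st.st_mode),  # type + permissions, e.g. 'drwxrwxr-x'
            st.st_nlink,                 # number of hard links
            owner_name(st.st_uid),       # owning user
            group_name(st.st_gid),       # owning group
            st.st_size,                  # size in bytes
            time.strftime("%b %d %H:%M", time.localtime(st.st_mtime)),  # last modification
            name,
        ))
    return rows


# Demo on a throwaway directory with one file and one subdirectory
demo_dir = tempfile.mkdtemp()
os.mkdir(os.path.join(demo_dir, "folder"))
with open(os.path.join(demo_dir, "file"), "w") as f:
    f.write("some content")

rows = long_listing(demo_dir)
for row in rows:
    print(*row)
```

The 12-byte size of `file` matches the 12-byte files in the listing above, since both contain the string `some content`.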


## Find

The `find` command searches for files in a directory tree. Its general form is `find <directory-to-search> <criteria> <action>`, where:

- `<directory-to-search>`: Specifies the directory where you want to begin the search.
- `<criteria>`: Defines the properties of the files you are searching for. This can include the file name, size, modification date, permissions, and more.
- `<action>`: Specifies what to do with the found files. By default, it prints the path to the files, but it can also execute other commands on them.
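For intuition, a basic `find <dir> -name <pattern>` search can be approximated in Python by walking the tree with `os.walk` and matching each base name against a shell-style glob. `find_by_name` is a hypothetical helper for illustration only: unlike `find`, it does not test the starting directory itself and supports no other criteria or actions.

```python
import fnmatch
import os
import tempfile


def find_by_name(start, pattern):
    """Hypothetical helper: rough analogue of `find <start> -name <pattern>`.

    As with -name, the glob is matched case-sensitively against the base name only.
    """
    matches = []
    for root, dirs, files in os.walk(start):
        for name in dirs + files:
            if fnmatch.fnmatchcase(name, pattern):
                matches.append(os.path.join(root, name))
    return matches


# Demo: a small tree with text.txt hidden two levels deep
demo_dir = tempfile.mkdtemp()
nested = os.path.join(demo_dir, "a", "b")
os.makedirs(nested)
with open(os.path.join(nested, "text.txt"), "w") as f:
    f.write("Message to aliens")

found = find_by_name(demo_dir, "*.txt")
print(found)
```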

The following Python code creates a random tree of folders and puts `text.txt` in a random place.

```python
import os
import random
import string

def random_directory_name(length=8):
    """Create a random lowercase directory name."""
    letters = string.ascii_lowercase
    return ''.join(random.choice(letters) for _ in range(length))

root = "linux_files/find"
# makedirs also creates the parent "linux_files" folder if it is missing
os.makedirs(root, exist_ok=True)
folders = [root]

# Each new directory is nested inside a randomly chosen existing one
for _ in range(10):
    new_dir = os.path.join(random.choice(folders), random_directory_name())
    os.mkdir(new_dir)
    folders.append(new_dir)

# Hide text.txt in a random directory of the tree
with open(os.path.join(random.choice(folders), "text.txt"), "w") as f:
    f.write("Message to aliens")
```

As a result, we have the following file tree.

```bash
tree linux_files/find
```

```
linux_files/find
└── gptiiiab
    ├── iipubngm
    ├── ngvixpsi
    ├── pluqbiln
    └── yfjphojg
        ├── bosqqrcn
        ├── fopmjtfu
        └── rctkvqsm
            └── kegxfokz
                ├── text.txt
                └── xsvcplwo

10 directories, 1 file
```


To get the full path of `text.txt`, use `-name text.txt` as the criterion. The cell also removes the experimental tree afterwards.

```bash
find linux_files/find -name text.txt
rm -r linux_files/find
```

```
linux_files/find/gptiiiab/yfjphojg/rctkvqsm/kegxfokz/text.txt
```


## Disk usage (du)

The `du` command is used to check disk usage by different paths in the filesystem. It provides information about how much space is being used by files and directories.

---

The following cell creates several folders and files. Notably, `linux/du_example/megabytes_file` is exactly 2.5 MiB (2,621,440 bytes): `dd` first writes 2 MiB of zeros and then appends another 512 KiB. In contrast, `linux/du_example/folder/small_file` contains only a single short line, which makes it an extremely small file.

```bash
mkdir -p linux/du_example/folder

# Two 1 MiB blocks of zeros...
dd if=/dev/zero of=linux/du_example/megabytes_file bs=1M count=2 &> /dev/null
# ...plus one 512 KiB block appended to the end of the same file
dd if=/dev/zero of=linux/du_example/megabytes_file bs=512K count=1 oflag=append conv=notrunc &> /dev/null

echo "this is a short message" > linux/du_example/folder/small_file
```

Now let's try the `du` command with the following options:

- `-a`: prints files as well as directories (by default `du` lists only directories).
- `-h`: displays sizes in a human-readable format (`K`, `M`, `G`).

In my opinion, these options are really useful.

```bash
du -ah linux/du_example/
```

```
2,5M	linux/du_example/megabytes_file
4,0K	linux/du_example/folder/small_file
8,0K	linux/du_example/folder
2,6M	linux/du_example/
```

Note that `small_file` is reported as `4,0K` although it holds only a few dozen bytes: `du` reports the disk space allocated to a file, and the filesystem allocates space in whole blocks (4 KiB here). The comma in `2,5M` is the decimal separator of the system locale.
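To make the difference between a file's apparent size and the space allocated to it concrete, here is a rough Python sketch. `disk_usage` is a hypothetical helper; it relies on `st_blocks`, which on Linux counts 512-byte units, and it only sums the entries `os.walk` reports as files, so unlike `du` it ignores the space taken by directories themselves and counts hard-linked files more than once.

```python
import os
import tempfile


def disk_usage(path):
    """Hypothetical helper: return (apparent_bytes, allocated_bytes) for files under `path`."""
    apparent = 0
    allocated = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            st = os.lstat(os.path.join(root, name))
            apparent += st.st_size           # bytes of content, as `ls -l` shows
            allocated += st.st_blocks * 512  # space reserved on disk, as `du` shows by default
    return apparent, allocated


demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "small_file"), "w") as f:
    f.write("this is a short message\n")

apparent, allocated = disk_usage(demo_dir)
print(f"apparent: {apparent} bytes, allocated: {allocated} bytes")
```

On a typical ext4 filesystem the allocated figure for this 24-byte file is often 4096 bytes, in line with the `4,0K` that `du` reports; GNU `du` can show the apparent size instead with `--apparent-size`.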


Finally, don't forget to remove the folder used for the experiments.

```bash
rm -r linux/du_example
```

## Archiving

This section covers Linux utilities for archiving, that is, bundling a set of files into a single (usually compressed) file and extracting the original files from it. Below is a list of popular archiving and compression utilities:


| Utility  | Description                                                                                 |
|----------|---------------------------------------------------------------------------------------------|
| `tar`    | A widely-used tool for creating, extracting, and managing tarball archives (e.g., `.tar`, `.tar.gz`). |
| `gzip`   | Compresses files using the GNU zip algorithm, typically creating `.gz` files.               |
| `bzip2`  | Compresses files using the Burrows-Wheeler algorithm, typically creating `.bz2` files.      |
| `xz`     | Compresses files with high compression efficiency, typically creating `.xz` files.          |
| `zip`    | Creates compressed archives in `.zip` format, commonly used for cross-platform compatibility. |
| `unzip`  | Extracts `.zip` files.                                                                      |
| `7z`     | A high-compression utility for `.7z` format and other archive types, part of the p7zip package. |
| `ar`     | Creates, modifies, and extracts archives, often used for `.deb` packages in Debian-based systems. |
| `rar`    | Creates RAR archives, known for good compression ratios; proprietary software.               |
| `unrar`  | Extracts RAR files.                                                                         |
| `lzma`   | Compresses files using LZMA (Lempel-Ziv-Markov chain algorithm), predecessor to `xz`.       |
| `tar` + `lzma` | Combines tar archiving and LZMA compression, resulting in `.tar.lzma` files.          |
| `tar` + `xz`   | Combines tar archiving and XZ compression, resulting in `.tar.xz` files.              |
| `zstd`   | Compresses files with high speed and efficiency, creating `.zst` files.                     |
| `cpio`   | Archives files for use with tape backups or streams.                                        |
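To get a feel for the single-file compressors in the table, the sketch below compresses the same input with Python's standard `gzip`, `bz2` and `lzma` modules, which produce the same `.gz`, `.bz2` and `.xz` formats as the command-line tools. The input (`b"a" * 1_000_000`) is an assumption chosen for illustration; real files usually compress far less.

```python
import bz2
import gzip
import lzma

# Highly repetitive input chosen for illustration; real data compresses far less
data = b"a" * 1_000_000

compressors = {
    "gzip": (gzip.compress, gzip.decompress),
    "bzip2": (bz2.compress, bz2.decompress),
    "xz": (lzma.compress, lzma.decompress),  # lzma.compress defaults to the .xz container
}

sizes = {}
for name, (compress, decompress) in compressors.items():
    packed = compress(data)
    # Every format is lossless: decompressing gives back the exact input
    assert decompress(packed) == data
    sizes[name] = len(packed)
    print(f"{name:6} {len(data)} -> {len(packed)} bytes")
```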

Find out more on the [dedicated page](filesystem/archiving.ipynb).

---

As an example, the following cell generates a file to archive.

```bash
# Write the letter "a" 1,000,000 times (no trailing newline)
head -c 1000000 /dev/zero | tr '\0' 'a' > /tmp/archive_me

du -h /tmp/archive_me
```

```
980K	/tmp/archive_me
```


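The file consists of the letter `a` repeated 1,000,000 times, so it occupies `980K` on disk (1,000,000 bytes rounded up to whole 4 KiB filesystem blocks: 245 × 4 KiB = 980 KiB).

The next cell archives the file with `tar` and compresses it with `xz`: `-c` creates a new archive, `-J` filters it through `xz`, `-v` lists the files as they are processed, and `-f` gives the archive file name.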

```bash
tar -cJvf /tmp/archive.tar.xz /tmp/archive_me
du -h /tmp/archive.tar.xz
```

```
tar: Removing leading `/' from member names
/tmp/archive_me
4,0K	/tmp/archive.tar.xz
```


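The archive takes only `4,0K` on disk (a single filesystem block), because a long run of one repeated byte compresses extremely well.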
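The same `tar -cJf` step can be sketched with Python's standard `tarfile` module, whose `"w:xz"` mode writes an xz-compressed tar archive. This is an illustration under stated assumptions: it keeps the archive in an in-memory buffer instead of `/tmp/archive.tar.xz` and names the member `archive_me` directly.

```python
import io
import tarfile

data = b"a" * 1_000_000  # same content as /tmp/archive_me

buffer = io.BytesIO()
# "w:xz" = create a tar archive compressed with xz, like `tar -cJf`
with tarfile.open(fileobj=buffer, mode="w:xz") as tar:
    info = tarfile.TarInfo(name="archive_me")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))

compressed = buffer.getvalue()
print(f"original: {len(data)} bytes, archive: {len(compressed)} bytes")

# Read the member list back, like `tar -tJf`
with tarfile.open(fileobj=io.BytesIO(compressed), mode="r:xz") as tar:
    names = tar.getnames()
print(names)
```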

The following cell restores the original file: `-x` extracts the archive and `-C` sets the directory to extract into.

```bash
mkdir /tmp/unarchived
tar -xJvf /tmp/archive.tar.xz -C /tmp/unarchived
```

```
tmp/archive_me
```


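This is the file tree we got after unarchiving. Note the extra `tmp` directory: `tar` stored the member as `tmp/archive_me`, without the leading `/`.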

```bash
tree /tmp/unarchived
```

```
/tmp/unarchived
└── tmp
    └── archive_me

1 directory, 1 file
```
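Python's `tarfile` also strips the leading `/` from absolute paths, and its `arcname` parameter sets the name stored in the archive explicitly. With GNU `tar`, a similar effect is usually obtained by changing directory first, for example `tar -cJf /tmp/archive.tar.xz -C /tmp archive_me`. A small sketch, where `member_names` is a hypothetical helper and a temporary file stands in for `/tmp/archive_me`:

```python
import io
import os
import tarfile
import tempfile

# Stand-in for /tmp/archive_me, created under a fresh temporary directory
work_dir = tempfile.mkdtemp()
source = os.path.join(work_dir, "archive_me")
with open(source, "w") as f:
    f.write("a" * 1000)


def member_names(path, arcname=None):
    """Hypothetical helper: archive `path` in memory and return the stored member names."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:xz") as tar:
        tar.add(path, arcname=arcname)
    buffer.seek(0)
    with tarfile.open(fileobj=buffer, mode="r:xz") as tar:
        return tar.getnames()


default_names = member_names(source)                      # absolute path, leading '/' stripped
short_names = member_names(source, arcname="archive_me")  # explicit name inside the archive
print(default_names)
print(short_names)
```

Extracting an archive built the second way would place `archive_me` directly in the target directory, without the intermediate folders.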


Finally, check that the contents of the file have been restored correctly.

In [None]:
head -c 100 /tmp/unarchived/tmp/archive_me

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
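`head -c 100` only looks at the first 100 bytes. A stricter check compares the whole files, for example with `cmp` or by comparing `sha256sum` output in the shell. Below is the same idea as a Python sketch using `hashlib`; `sha256_of_file` and `files_identical` are hypothetical helpers, and the temporary files are stand-ins for `/tmp/archive_me` and `/tmp/unarchived/tmp/archive_me`.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_identical(path_a, path_b):
    """Hypothetical helper: True when both files have the same SHA-256 digest."""
    return sha256_of_file(path_a) == sha256_of_file(path_b)


work_dir = tempfile.mkdtemp()
original = os.path.join(work_dir, "original")
restored = os.path.join(work_dir, "restored")
corrupted = os.path.join(work_dir, "corrupted")  # differs only in its last byte
for path, content in [(original, "a" * 1000), (restored, "a" * 1000), (corrupted, "a" * 999 + "b")]:
    with open(path, "w") as f:
        f.write(content)

print(files_identical(original, restored))   # prints True
print(files_identical(original, corrupted))  # prints False
```

The corrupted file would pass the `head -c 100` check, since its first 100 bytes are unchanged, but the full-content hash catches the difference.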
