# File handling in Python

Python has been around for a long time, so many different file-handling approaches have accumulated over its history. Here we deal with only one modern solution: `pathlib`!

## Path object

The modern, recommended way to work with files and directories is the `pathlib.Path` object, which represents a given directory or file. We don't need to worry about differences between operating systems (Windows, for instance, uses `\` as the path separator), because `Path` overloads the division operator `/` for joining path components, so we simply use that.

In [None]:
# import the Path object
from pathlib import Path

# current working directory
cwd = Path.cwd()
print("We are here now:", cwd)

# user's home directory
home = Path.home()
print("This is the user's home directory:", home)

# specific path
p = Path("adatok/meresek.txt")

# joining paths: using the / operator
data_dir = Path("adatok")
file = data_dir / "meresek.txt"

file  # Its type will be PosixPath now because we're on Linux
# if you do the same on Windows, it will be WindowsPath
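
To see the separator difference without switching machines, here is a minimal sketch using the pure path classes from `pathlib`. They only manipulate path strings and never touch the filesystem, so both flavours can be tried on any OS. The variable names are just for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The same join, once with POSIX rules and once with Windows rules
posix_file = PurePosixPath("adatok") / "meresek.txt"
windows_file = PureWindowsPath("adatok") / "meresek.txt"

print(posix_file)    # adatok/meresek.txt
print(windows_file)  # adatok\meresek.txt
```

In everyday code we keep using plain `Path`, which picks the right flavour for the machine it runs on.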


With pathlib we can quickly inspect files. Among other things, we can check:
```python
p.exists()      # exists?
p.is_file()     # is it a file?
p.is_dir()      # is it a directory?
p.name          # filename with extension: "meresek.txt"
p.stem          # without extension: "meresek"
p.suffix        # what is the extension: ".txt"
p.parent        # parent directory?
```
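
A small sketch of how these attributes behave in a less obvious case, a file name with two extensions. The file names are hypothetical and do not need to exist, because `name`, `stem`, `suffix` and `parent` are computed from the path string alone; only methods such as `exists()` look at the filesystem.

```python
from pathlib import Path

p = Path("adatok/meresek.tar.gz")  # hypothetical name, the file need not exist

print(p.name)      # meresek.tar.gz
print(p.suffix)    # .gz  (only the last extension)
print(p.suffixes)  # ['.tar', '.gz']
print(p.stem)      # meresek.tar

# with_suffix() builds a new Path object; nothing is renamed on disk
csv_path = Path("adatok/meresek.txt").with_suffix(".csv")
print(csv_path.name)  # meresek.csv
```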

## File operations

We have opened files before; `Path` objects can do the same, and for most tasks we need little else.

Let's write some data to a file!

In [None]:
# super valuable parabola data!
data = [x**2+2 for x in range(30)]

In [None]:
from pathlib import Path

# create a Path (file)
p = Path("kimenet/eredmeny.txt")

# if necessary, create the (parent) directory!
# if it already exists, that's fine
p.parent.mkdir(parents=True, exist_ok=True)

# write out the data ("w" indicates we want to write)
with p.open("w", encoding="utf-8") as f:
    for x in data:
        f.write(f"{x}\n")  # one number per line, terminated by a newline


In [None]:
# the file is now on the filesystem:
!cat kimenet/eredmeny.txt

In [None]:
# now let's see how to load it back!

from pathlib import Path

p = Path("kimenet/eredmeny.txt")
# if we only need it as text:
content = p.read_text(encoding="utf-8")
print(content)


In [None]:
# or if we'd rather read the lines one by one:
# "r" indicates that we want to read!
with p.open("r", encoding="utf-8") as f:
    for line in f:
        print(line.strip())
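
Printing the lines is rarely the end goal; usually we want the numbers back. The following sketch does the full round trip, assuming the one-number-per-line layout used above. It works in a temporary directory so that it does not touch the real `kimenet` folder.

```python
import tempfile
from pathlib import Path

data = [x**2 + 2 for x in range(30)]

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "kimenet" / "eredmeny.txt"
    p.parent.mkdir(parents=True, exist_ok=True)

    # write one number per line
    with p.open("w", encoding="utf-8") as f:
        for x in data:
            f.write(f"{x}\n")

    # read the lines back and convert each one to an int;
    # int() ignores the trailing newline, blank lines are skipped
    with p.open("r", encoding="utf-8") as f:
        loaded = [int(line) for line in f if line.strip()]

print(loaded == data)  # True
```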


### Binary files

In [None]:
# download the YBL logo
!curl 'https://ybl.uni-obuda.hu/wp-content/themes/yblszm/img/oe_ybl_hu.png' > ybl.png

In [None]:
# Not every file is text. Images and other binary data can be read as raw bytes
p = Path("ybl.png")

data_bytes = p.read_bytes()      # bytes
# ... processing ...
p2 = Path("logo_masolat.png")
p2.write_bytes(data_bytes)


We will rarely need to do this by hand. Binary files are typically processed with a dedicated package (e.g. images with PIL, databases with a database package, numerical datasets with numpy), and those usually provide their own file-opening functions.
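
For the occasional case where raw bytes are enough, here is a minimal sketch that needs no download. It writes a small stand-in file (not a valid image, just the 8-byte PNG signature followed by padding), checks the signature, and copies the bytes. The file names are hypothetical.

```python
import tempfile
from pathlib import Path

# The 8-byte signature that every PNG file starts with
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "minta.bin"  # hypothetical stand-in, not a real image
    src.write_bytes(PNG_SIGNATURE + b"\x00" * 4)

    raw = src.read_bytes()
    looks_like_png = raw.startswith(PNG_SIGNATURE)

    copy = Path(tmp) / "masolat.bin"
    written = copy.write_bytes(raw)  # returns the number of bytes written
    identical = copy.read_bytes() == raw

print(looks_like_png, written, identical)  # True 12 True
```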

### Rename and delete files

In [None]:
# file handling is straightforward

# create a path
p = Path("info.txt")

# write something into it
p.write_text("Important message!")

# rename it
new_path = Path("uj_nev.txt")
p.rename(new_path)

# delete it, but only if it exists
if new_path.exists():
    new_path.unlink()
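
A slightly more compact variant of the same steps, as a sketch that assumes Python 3.8 or newer: `rename()` returns the new `Path`, and `unlink(missing_ok=True)` replaces the explicit `exists()` check. It runs in a temporary directory so nothing in the working directory is touched.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / "info.txt"
    p.write_text("Important message!", encoding="utf-8")

    # rename() returns the new Path, so we can keep working with it
    new_path = p.rename(Path(tmp) / "uj_nev.txt")
    old_exists = p.exists()         # the old name is gone
    new_exists = new_path.exists()  # the new name is there

    # missing_ok=True makes unlink() silent if the file does not exist
    new_path.unlink(missing_ok=True)
    new_path.unlink(missing_ok=True)  # second call raises no error
    still_there = new_path.exists()

print(old_exists, new_exists, still_there)  # False True False
```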


## Directory management

We can traverse a folder and analyze the files and directories inside it:

In [None]:
# full traversal of a folder
folder = Path(".")  # the current working directory

for item in folder.iterdir():
    if item.is_file():
        print("File:", item.name)
    elif item.is_dir():
        print("Directory:", item.name)


Even more practical is the `glob()` method, which returns a generator of the paths matching a pattern and can even traverse the directory tree recursively!

In [None]:
folder = Path("kimenet")

# All .txt files
for file in folder.glob("*.txt"):
    print(file)            # full path
    print(file.name)       # only name


kimenet/eredmeny.txt
eredmeny.txt


In the example above, `*` matches any sequence of characters (possibly empty). Other wildcards are available too:

* `*` – 0 or more arbitrary characters
* `?` – exactly 1 arbitrary character
* `[abc]` – exactly 1 character from those listed in the brackets

For example:

* `*.txt` → any file whose name ends in .txt
* `adat_*.csv` → any csv file whose name starts with adat_

Wildcards can also be combined within one pattern:
```python
folder.glob("adat_??.txt")      # e.g. adat_01.txt, adat_ab.txt
folder.glob("[ab]*.log")        # .log files starting with 'a' or 'b'
```
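
To check what these patterns really match, here is a small sketch that creates a few empty files in a temporary directory and runs both patterns on them. The file names are made up for the demonstration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    names = ["adat_01.txt", "adat_ab.txt", "adat_123.txt",
             "a1.log", "b2.log", "c3.log"]
    for name in names:
        (folder / name).touch()  # create an empty file

    # glob() yields paths in arbitrary order, so we sort the names
    two_chars = sorted(f.name for f in folder.glob("adat_??.txt"))
    a_or_b = sorted(f.name for f in folder.glob("[ab]*.log"))

print(two_chars)  # ['adat_01.txt', 'adat_ab.txt']
print(a_or_b)     # ['a1.log', 'b2.log']
```

Note that `adat_123.txt` is not matched, because `??` stands for exactly two characters.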

If we want to search in every subfolder, we have two options:

In [None]:
root = Path(".")

for file in root.glob("**/*.txt"):
    print(file)


In [None]:
# or:
for file in root.rglob("*.txt"):
    print(file)
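
The two forms find the same files. A quick sketch to confirm this on a small, made-up directory tree built in a temporary directory:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a" / "b").mkdir(parents=True)
    for rel in ["top.txt", "a/mid.txt", "a/b/deep.txt", "a/skip.csv"]:
        (root / rel).touch()

    # collect paths relative to root, with / separators, in sorted order
    via_glob = sorted(f.relative_to(root).as_posix()
                      for f in root.glob("**/*.txt"))
    via_rglob = sorted(f.relative_to(root).as_posix()
                       for f in root.rglob("*.txt"))

print(via_glob)               # ['a/b/deep.txt', 'a/mid.txt', 'top.txt']
print(via_glob == via_rglob)  # True
```

Both patterns also match files directly in `root`, since `**` stands for zero or more directory levels.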



## ZipFile - handling compressed archives

Data often arrives compressed. In Python we can work with such archives without extracting them every time. The `zipfile` module from the standard library helps here, and it handles compressed archives in a way similar to `pathlib`.
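
As a first taste, here is a minimal sketch: it builds a small archive in a temporary directory, lists its members, and reads them without extracting anything to disk. The archive and member names are made up for the example, and `zipfile.Path` assumes Python 3.8 or newer.

```python
import tempfile
import zipfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "adatok.zip"  # hypothetical archive name

    # create the archive and add two small text members
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("meresek/elso.txt", "1\n2\n3\n")
        zf.writestr("olvass.txt", "hello")

    with zipfile.ZipFile(archive) as zf:
        names = zf.namelist()  # members in the order they were added
        first = zf.read("meresek/elso.txt").decode("utf-8")

        # zipfile.Path gives a pathlib-like view of a member
        member = zipfile.Path(zf, "olvass.txt")
        greeting = member.read_text(encoding="utf-8")

print(names)     # ['meresek/elso.txt', 'olvass.txt']
print(greeting)  # hello
```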