<a href="https://colab.research.google.com/github/jonaslindemann/compute-course-public/blob/master/general/2025/Python_built_in_functions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<img src="https://github.com/jonaslindemann/compute-course-docs/blob/2b815899b4ff728c42dd330e0853412efb12e075/source/images/builtin_functions.png?raw=true" alt="Alt text" width="600">

# Python Built-in Functions and Runtime Library

This notebook provides a comprehensive guide to Python's built-in functions and runtime library. You'll learn how to interact with the operating system, manage files and directories, execute external programs, handle logging, and work with various data formats.

## Table of Contents

1. **System Functions**
   - Environment Variables
   - System Path Management
   - Working Directory Operations
   - File and Directory Listing
   - Directory Manipulation
   - File Information Queries
   - Path Management with `os.path` and `pathlib`
   - Temporary Files

2. **Process Management**
   - Executing External Programs
   - Subprocess Module
   - Process Communication and Control

3. **Logging**
   - Basic Logging
   - Custom Loggers
   - Log Formatting

4. **Data Serialization**
   - JSON Format
   - Pickle Format
   - Data Persistence Best Practices

5. **Data Archiving and Compression**
   - TAR Files
   - ZIP Files
   - Compression Techniques

6. **Special File Formats**
   - Configuration Files (INI)
   - CSV Files
   - Advanced File Processing

## Learning Objectives

By the end of this notebook, you will be able to:
- Interact with the operating system using Python
- Manage files and directories programmatically
- Execute and control external processes
- Implement structured logging in applications
- Serialize and deserialize data in various formats
- Work with compressed archives and special file formats


---
# System functions

## Environment variables

Environment variables are variables defined in the operating environment. Users can modify them to control the behavior of applications and software. In Python, the dictionary-like object `os.environ` contains these variables.

To access `os.environ` safely we use the `.get()` method, which returns a default value instead of raising `KeyError` when the variable is missing.

In [None]:
import os

print("PATH:", os.environ.get("PATH", "PATH not found"))

PATH: /opt/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/tools/node/bin:/tools/google-cloud-sdk/bin


We can iterate over the `os.environ` dictionary just like any other dictionary.

In [None]:
print("\nFirst 5 environment variables:")

for i, (variable, value) in enumerate(os.environ.items()):
    if i < 5:
        print(f"{variable} = {value[:50]}{'...' if len(value) > 50 else ''}")
    else:
        break


First 5 environment variables:
SHELL = /bin/bash
NV_LIBCUBLAS_VERSION = 12.5.3.2-1
NVIDIA_VISIBLE_DEVICES = all
COLAB_JUPYTER_TRANSPORT = ipc
NV_NVML_DEV_VERSION = 12.5.82-1


We can use the Python `len()` function to query the number of environment variables.

In [None]:
print(f"\nTotal environment variables: {len(os.environ)}")


Total environment variables: 104


We can also define new environment variables in the current process in the same way as we assign new items to a dictionary.

In [None]:
# Setting and getting custom environment variables
os.environ["PYTHON_COURSE_DEMO"] = "Hello from Python!"
print(f"\nCustom variable: {os.environ.get('PYTHON_COURSE_DEMO')}")


Custom variable: Hello from Python!


It is good practice to check if a variable exists before accessing it. We use the `in` operator in Python to accomplish this.

In [None]:
if "HOME" in os.environ:
    print(f"Home directory: {os.environ['HOME']}")
elif "USERPROFILE" in os.environ:  # Windows equivalent
    print(f"User profile: {os.environ['USERPROFILE']}")
else:
    print("Home directory not found in environment variables")

Home directory: /root
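
Values in `os.environ` are always strings, so numeric settings have to be converted before use. The sketch below combines `.get()` with a fallback value. The helper `get_int_env` and the variable name `DEMO_NUM_THREADS` are made up for illustration and are not part of the `os` module.

```python
import os


def get_int_env(name, default):
    """Return environment variable `name` as an int, or `default` if unset or invalid."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        # The variable exists but does not contain a valid integer
        return default


os.environ["DEMO_NUM_THREADS"] = "4"  # hypothetical variable name
threads_when_set = get_int_env("DEMO_NUM_THREADS", 1)

os.environ.pop("DEMO_NUM_THREADS", None)  # remove it again
threads_when_unset = get_int_env("DEMO_NUM_THREADS", 1)

print(threads_when_set, threads_when_unset)
```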


## Accessing the system path

The system path controls where the operating system searches for executables in the file system. Python has a platform-neutral way of querying these directories: `os.get_exec_path()`.

In [None]:
import os

exe_path_list = os.get_exec_path()

for path in exe_path_list:
    print(path)

/opt/bin
/usr/local/nvidia/bin
/usr/local/cuda/bin
/usr/local/sbin
/usr/local/bin
/usr/sbin
/usr/bin
/sbin
/bin
/tools/node/bin
/tools/google-cloud-sdk/bin
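
A common reason for inspecting the system path is to find out whether a program is available. As a small sketch, the standard library function `shutil.which()` performs this search for us. The wrapper name `find_executable` and the program names are only examples.

```python
import shutil
import sys


def find_executable(name):
    """Return the full path of `name` if it is found on the system path, else None."""
    return shutil.which(name)


print("Interpreter in use:", sys.executable)
print("python3 found at:", find_executable("python3"))
print("Missing program:", find_executable("no-such-program-12345"))
```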


## Changing and querying the working directory

The working directory is the directory against which relative paths are resolved. Initially it is the directory where your application was started. Python has functions for changing and querying the working directory.

In [None]:
import os

cwd = os.getcwd()
print(cwd)

os.chdir("..")
print(os.getcwd())
os.chdir(cwd)
print(os.getcwd())

/content
/
/content
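
The pattern of saving the working directory, changing it, and restoring it afterwards can be wrapped in a context manager so that the restore step is never forgotten. This is a minimal sketch, and the name `working_directory` is our own. Python 3.11 and later also ship `contextlib.chdir`, which serves the same purpose.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def working_directory(path):
    """Temporarily change the working directory, restoring it on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Runs even if the body of the with-block raises an exception
        os.chdir(previous)


start = os.getcwd()

with working_directory(tempfile.gettempdir()):
    inside = os.getcwd()
    print("Inside the block:", inside)

after = os.getcwd()
print("After the block :", after)
```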


## Listing files in a directory

For many applications it can be important to query what files and directories are available. The `os.listdir()` function implements this in Python.

In [None]:
# Create a test file using Python (cross-platform)
with open("testfile", "w") as f:
    f.write("This is a test file created by Python")

In [None]:
import os

for item in os.listdir():
    print(item)

for item in os.listdir():
    if os.path.isdir(item):
        print("Directory:", item)
    if os.path.isfile(item):
        print("File     :", item)

.config
testfile
sample_data
Directory: .config
File     : testfile
Directory: sample_data


## Directory manipulation

Python also has a lot of functions for creating and manipulating directories.

In [None]:
import os
from pathlib import Path

# Store current directory for cleanup
cwd = os.getcwd()

try:
    # Create directory if it doesn't exist

    test_dir = Path("demo_directory")
    test_dir.mkdir(exist_ok=True)

    # Change to the new directory

    os.chdir(test_dir)
    print(f"Changed to: {os.getcwd()}")

    # Create subdirectory and file

    subdir = Path("testdir")
    subdir.mkdir(exist_ok=True)

    test_file = Path("testfile.txt")
    test_file.write_text("This is test content")

    print("Directory contents after creation:")
    for item in os.listdir():
        item_path = Path(item)
        if item_path.is_dir():
            print(f"  📁 Directory: {item}")
        else:
            print(f"  📄 File: {item} ({item_path.stat().st_size} bytes)")

    # Rename the file

    new_name = Path("renamed_testfile.txt")
    test_file.rename(new_name)

    print("\nDirectory contents after rename:")
    for item in os.listdir():
        print(f"  {item}")

    # Cleanup

    new_name.unlink()  # Remove file
    subdir.rmdir()     # Remove empty directory

    print("\nDirectory contents after cleanup:")
    print(f"  Items remaining: {len(os.listdir())}")

finally:
    # Always return to original directory

    os.chdir(cwd)

    # Remove the demo directory if empty

    try:
        test_dir.rmdir()
        print("Demo directory removed")
    except OSError as e:
        print(f"Could not remove demo directory: {e}")

Changed to: /content/demo_directory
Directory contents after creation:
  📁 Directory: testdir
  📄 File: testfile.txt (20 bytes)

Directory contents after rename:
  testdir
  renamed_testfile.txt

Directory contents after cleanup:
  Items remaining: 0
Demo directory removed
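
`rmdir()` only removes empty directories. As a complementary sketch, nested directories can be created in one call with `mkdir(parents=True)`, and a whole directory tree can be removed with `shutil.rmtree()`. The directory names below are arbitrary, and the example works inside a fresh temporary directory so that nothing else is touched.

```python
import shutil
import tempfile
from pathlib import Path

# Work inside a fresh temporary directory
base = Path(tempfile.mkdtemp(prefix="nested_demo_"))

# parents=True creates all missing intermediate directories
nested = base / "level1" / "level2" / "level3"
nested.mkdir(parents=True, exist_ok=True)
(nested / "data.txt").write_text("hello")

was_created = nested.is_dir()
print("Nested directory created:", was_created)

# rmtree removes the directory together with everything below it
shutil.rmtree(base)
still_exists = base.exists()
print("Base directory still exists:", still_exists)
```

Use `shutil.rmtree()` with care, since it deletes everything below the given directory without asking.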


## Listing and querying file information

The `os.scandir()` function can be used to query more detailed information about a file or directory.

In [None]:
import os

with os.scandir() as items:
    for entry in items:
        print("------------------------")
        print("name", entry.name)
        print("path", entry.path)
        print("is_dir", entry.is_dir())
        print("is_file", entry.is_file())

------------------------
name .config
path ./.config
is_dir True
is_file False
------------------------
name testfile
path ./testfile
is_dir False
is_file True
------------------------
name sample_data
path ./sample_data
is_dir True
is_file False
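
Each entry returned by `os.scandir()` also has a `.stat()` method. As an illustration, the sketch below sums the sizes of the regular files directly inside a directory. The function name `total_file_size` and the demo files are assumptions made for this example.

```python
import os
import tempfile


def total_file_size(directory="."):
    """Sum the sizes in bytes of the regular files directly inside `directory`."""
    total = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            # Skip subdirectories and do not follow symbolic links
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total


with tempfile.TemporaryDirectory() as temp_dir:
    with open(os.path.join(temp_dir, "a.bin"), "wb") as f:
        f.write(b"abc")      # 3 bytes
    with open(os.path.join(temp_dir, "b.bin"), "wb") as f:
        f.write(b"12345")    # 5 bytes
    os.mkdir(os.path.join(temp_dir, "subdir"))  # ignored by the function

    size = total_file_size(temp_dir)

print("Total size:", size, "bytes")
```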


## Walking directories with os.walk()

The `os.walk()` function lets you traverse a directory tree in a `for` loop. Each iteration yields the current directory, a list of the subdirectories it contains, and a list of the files it contains.

In [None]:
import os

for root, dirs, files in os.walk("."):
    print("--->")
    print(root)
    print(dirs)
    print(files)
    print("<---")

--->
.
['.config', 'sample_data']
['testfile']
<---
--->
./.config
['configurations', 'logs']
['hidden_gcloud_config_universe_descriptor_data_cache_configs.db', '.last_opt_in_prompt.yaml', '.last_survey_prompt.yaml', 'active_config', 'default_configs.db', '.last_update_check.json', 'gce', 'config_sentinel']
<---
--->
./.config/configurations
[]
['config_default']
<---
--->
./.config/logs
['2025.08.18']
[]
<---
--->
./.config/logs/2025.08.18
[]
['13.38.17.023788.log', '13.38.07.073430.log', '13.38.15.728895.log', '13.37.46.544538.log', '13.38.25.411240.log', '13.38.26.129653.log']
<---
--->
./sample_data
[]
['README.md', 'anscombe.json', 'mnist_train_small.csv', 'mnist_test.csv', 'california_housing_test.csv', 'california_housing_train.csv']
<---
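
A typical use of `os.walk()` is collecting all files of a certain type below a directory. The following sketch assumes a helper named `find_files` (our own name) and builds a small temporary tree to search in.

```python
import os
import tempfile


def find_files(top, extension):
    """Return a sorted list of all files below `top` whose names end with `extension`."""
    matches = []
    for root, dirs, files in os.walk(top):
        for name in files:
            if name.endswith(extension):
                matches.append(os.path.join(root, name))
    return sorted(matches)


with tempfile.TemporaryDirectory() as temp_dir:
    os.mkdir(os.path.join(temp_dir, "sub"))
    for relative in ("a.csv", os.path.join("sub", "b.csv"), os.path.join("sub", "c.txt")):
        with open(os.path.join(temp_dir, relative), "w") as f:
            f.write("demo")

    found = find_files(temp_dir, ".csv")
    # Show paths relative to the search root for readability
    names = [os.path.relpath(path, temp_dir) for path in found]

print(names)
```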


## Querying file information

The `os.stat()` function returns detailed information on files and directories. Some of the fields are only meaningful on Unix-based platforms.

In [None]:
import os

with open("testfile", "w") as f:
    f.write("testfile")

os.makedirs("testdir2", exist_ok=True)  # does not fail if the cell is run twice

statinfo_file = os.stat("testfile")
statinfo_dir = os.stat("testdir2")

print(statinfo_file)
print(statinfo_dir)
print(statinfo_file.st_size)
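
The raw `os.stat()` result is not very readable. As a sketch, the `stat` and `time` modules can turn its fields into something friendlier. The function name `describe` and the demo file are assumptions made for this example.

```python
import os
import stat
import tempfile
import time


def describe(path):
    """Return (kind, size, modified, permissions) for `path` in readable form."""
    info = os.stat(path)
    kind = "directory" if stat.S_ISDIR(info.st_mode) else "file"
    modified = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(info.st_mtime))
    permissions = stat.filemode(info.st_mode)  # e.g. '-rw-r--r--' on Unix
    return kind, info.st_size, modified, permissions


with tempfile.TemporaryDirectory() as temp_dir:
    file_path = os.path.join(temp_dir, "example.txt")
    with open(file_path, "w") as f:
        f.write("hello")

    file_info = describe(file_path)
    dir_info = describe(temp_dir)

print(file_info)
print(dir_info)
```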

## Path information

The `os.path` module contains even more functions for querying path and file information.

In [None]:
import os

print(os.path.abspath('.')) # Expands relative path to absolute
print(os.path.basename('/home/user/test.txt')) # Filename part of path
print(os.path.dirname('/home/user/test.txt'))  # Directory part of path

# Create a test file

with open("testfile_path", "w") as f:
    f.write("testfile")

# Check if file exists

if os.path.exists('/home/user/test.txt'):
    print('test.txt is valid')
else:
    print('test.txt is not valid')

# Expand user directory ~ (tilde)

print(os.path.expanduser('~'))

# Query file meta data

print(os.path.getatime('testfile_path')) # Access time
print(os.path.getmtime('testfile_path')) # Modification time
print(os.path.getctime('testfile_path')) # Metadata change time on Unix, creation time on Windows
print(os.path.getsize('testfile_path'))  # Size of file

# Check if a path is absolute or not

if os.path.isabs('testfile_path'):
    print('Absolute path')
else:
    print('No absolute path')

if os.path.isabs("C:/Users/jonas/Development/python_book/examples/rtl/ospath1.py"):
    print('Absolute path')
else:
    print('No absolute path')

# Check if a path is a file

if os.path.isfile('testfile_path'):
    print('ospath1.py is a file')
else:
    print('ospath1.py is not a file')

# Check if a path is directory

if os.path.isdir('testfile_path'):
    print('ospath1.py is a directory')
else:
    print('ospath1.py is not a directory')

# Combining directory and filename in a platform-neutral way

dir_name = 'c:\\Users\\jonas'
file_name = 'test.txt'

file_path = os.path.join(dir_name, file_name)
print(file_path)

# Use os.path.split() to split a path in a platform neutral way

print(os.path.split(file_path))
dir_name, file_name = os.path.split(file_path)
print(dir_name)
print(file_name)

# Extract drive letter if exists (Windows only)

print(os.path.splitdrive(file_path))

# Extract file extension

print(os.path.splitext(file_path))

/content
test.txt
/home/user
test.txt is not valid
/root
1755673347.2420638
1755673347.243064
1755673347.243064
8
No absolute path
No absolute path
ospath1.py is a file
ospath1.py is not a directory
c:\Users\jonas/test.txt
('c:\\Users\\jonas', 'test.txt')
c:\Users\jonas
test.txt
('', 'c:\\Users\\jonas/test.txt')
('c:\\Users\\jonas/test', '.txt')


## Path management using pathlib

`pathlib` is a module that provides object-oriented file and path management.

In [None]:
import pathlib as pl

# Create a path object

p = pl.Path('/contents')

print("p =",p)

# Appending a file/directory to a path using the / operator

p = p / "testfile"
print("p after append", p)

# Check if a path object exists in the file system

print("p.exists() returns", p.exists())

# Make path absolute

q = p.resolve()
print("q after p.resolve()", q)

# Extract parts of path

print("q.parts =", q.parts)
print("q.drive =", q.drive)  # Windows only

# Create a path for current working directory

r = pl.Path.cwd()
print("r = pl.Path.cwd() =>", r)

print("r.exists()", r.exists())
print("r.is_dir()", r.is_dir())
print("r.is_file()", r.is_file())

# Get a path object representing the home dir

s = pl.Path.home()
print("s = pl.Path.home() =>", s)

p = /contents
p after append /contents/testfile
p.exists() returns False
q after p.resolve() /contents/testfile
q.parts = ('/', 'contents', 'testfile')
q.drive = 
r = pl.Path.cwd() => /content
r.exists() True
r.is_dir() True
r.is_file() False
s = pl.Path.home() => /root
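
Path objects also replace the `os.path` functions for splitting a path into its components. The sketch below uses `PurePosixPath`, which only manipulates the path text and never touches the file system, so the results are the same on every platform. The path itself is just an example.

```python
from pathlib import PurePosixPath

p = PurePosixPath("/home/user/test.txt")

print("name   =", p.name)     # Filename part, like os.path.basename()
print("stem   =", p.stem)     # Filename without extension
print("suffix =", p.suffix)   # File extension, like os.path.splitext()
print("parent =", p.parent)   # Directory part, like os.path.dirname()

# Build a new path with a different extension
backup = p.with_suffix(".bak")
print("backup =", backup)
```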


## Iterating with pathlib

One really nice thing about `Path` objects is that you can iterate over the contents of a directory using the `.iterdir()` method. In the example we iterate through the directory and check whether the entries are files or directories.

In [None]:
import pathlib as pl

p = pl.Path(".")

for x in p.iterdir():
    if x.is_dir():
        print(x,'- directory')
    else:
        print(x,'- file')

.config - directory
testfile - file
testfile_path - file
sample_data - directory
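
Besides `.iterdir()`, `Path` objects offer pattern matching with `.glob()` for a single directory and `.rglob()` for a whole directory tree. A small sketch, using a temporary directory with made-up file names:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as temp_dir:
    base = Path(temp_dir)
    (base / "sub").mkdir()
    (base / "a.txt").write_text("a")
    (base / "b.csv").write_text("b")
    (base / "sub" / "c.txt").write_text("c")

    # glob() only looks in the directory itself
    top_level = sorted(p.name for p in base.glob("*.txt"))

    # rglob() also searches all subdirectories
    recursive = sorted(p.name for p in base.rglob("*.txt"))

print("glob :", top_level)
print("rglob:", recursive)
```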


## Changing current path with pathlib

`Path` objects in `pathlib` don't change the actual working directory. To do this you need to call a system function such as `os.chdir(p)`, where `p` is a `Path` object.

In [None]:
import os
import pathlib as pl

# Get the path of the directory above the current dir

p = pl.Path('..')
print("p =", p)
print("p.resolve() = ", p.resolve())

# Change to this directory

os.chdir(p)

# Get the current working dir using the .cwd() method

q = pl.Path.cwd()
print("pl.Path.cwd() =", q)

p = ..
p.resolve() =  /
pl.Path.cwd() = /


In [None]:
import os
import pathlib as pl

# Make sure we are in our home dir

os.chdir(os.path.expanduser('~'))

new_path = pl.Path('..')
old_path = pl.Path.cwd()

print("new_path =", new_path.resolve())
print("old_path =", old_path)

os.chdir(new_path)

print("pl.Path.cwd() =>", pl.Path.cwd())

os.chdir(old_path)

print("pl.Path.cwd() => (after chdir to old_path)", pl.Path.cwd())

new_path = /
old_path = /root
pl.Path.cwd() => /
pl.Path.cwd() => (after chdir to old_path) /root


## Temporary files

In many applications you need to create temporary files that are required while the program is running, but don't need to be stored afterwards. In Python there are several functions for securely creating temporary files.

### Using mkstemp

The `mkstemp(...)` function in the `tempfile` module securely creates a temporary file in the system-designated location for temporary files. It returns a file descriptor (fd) and the full path to the file. You can write directly to the file descriptor, but in most cases it is more convenient to create a Python file object using the `os.fdopen(...)` function, as in the code below. Please note that the temporary file is not automatically removed after use. In the code below we remove the file in the `finally` section.

In [None]:
import os
import tempfile
from pathlib import Path

# Method 1: Using mkstemp (lower-level, more control)

print("=== Method 1: Using mkstemp ===")
temp_fd, temp_path = tempfile.mkstemp(suffix='.txt', prefix='python_demo_')

try:
    print(f'Temporary file created: {temp_path}')
    print(f'File exists: {os.path.isfile(temp_path)}')
    print(f'File permissions: {oct(os.stat(temp_path).st_mode)[-3:]}')

    # Write to the temporary file using the file descriptor

    with os.fdopen(temp_fd, 'w+t') as temp_file:
        temp_file.write('This is written to the temporary file\n')
        temp_file.write('Line 2 of content\n')

        # Read back the content

        temp_file.seek(0)
        content = temp_file.read()
        print(f'File content:\n{content}')

finally:
    # Important: Always clean up temporary files
    if os.path.exists(temp_path):
        os.remove(temp_path)
        print(f'Temporary file cleaned up: {not os.path.exists(temp_path)}')

=== Method 1: Using mkstemp ===
Temporary file created: /tmp/python_demo_e6nkfkbf.txt
File exists: True
File permissions: 600
File content:
This is written to the temporary file
Line 2 of content

Temporary file cleaned up: True


### Using TemporaryFile class (recommended)

A more automatic and easier way to create temporary files in Python is to use the `TemporaryFile` class in `tempfile`. When you instantiate this class it creates a temporary file object which you can use like any other file object in Python. The temporary file is deleted automatically as soon as it is closed. The best way of using the class is in a `with` statement, as shown in the example below:


In [None]:
print("\n=== Method 2: Using TemporaryFile (recommended) ===")
# Method 2: Using TemporaryFile (automatically cleaned up)
with tempfile.TemporaryFile(mode='w+t', suffix='.txt') as temp_file:
    temp_file.write('This content will be automatically cleaned up\n')
    temp_file.write('No manual cleanup required!\n')

    # Read back
    temp_file.seek(0)
    content = temp_file.read()
    print(f'Content: {content.strip()}')

    # File is automatically deleted when exiting the with block


=== Method 2: Using TemporaryFile (recommended) ===
Content: This content will be automatically cleaned up
No manual cleanup required!


### Using NamedTemporaryFile class

If you need to know the actual name of the temporary file and still want the benefits of the `TemporaryFile` class, you can instead use the `NamedTemporaryFile` class, which provides you with the file name through the `.name` attribute. See the example below:

In [None]:
print("\n=== Method 3: Using NamedTemporaryFile ===")
# Method 3: NamedTemporaryFile (has a name but still auto-cleaned)
with tempfile.NamedTemporaryFile(mode='w+t', suffix='.txt', delete=True) as temp_file:
    print(f'Named temporary file: {temp_file.name}')
    temp_file.write('Named temporary file content\n')
    temp_file.flush()  # Ensure content is written

    # You can access the file by name while it's open
    print(f'File size: {Path(temp_file.name).stat().st_size} bytes')


=== Method 3: Using NamedTemporaryFile ===
Named temporary file: /tmp/tmpnrs3jo6n.txt
File size: 29 bytes
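
Sometimes the file has to outlive the `with` block, for example when its name is handed to another program that opens it later. One possible approach, sketched here under that assumption, is to pass `delete=False` and remove the file yourself when it is no longer needed.

```python
import os
import tempfile

# delete=False keeps the file on disk after the with-block has closed it
with tempfile.NamedTemporaryFile(mode="w", suffix=".txt", delete=False) as temp_file:
    temp_file.write("Data handed over by name\n")
    temp_name = temp_file.name

try:
    # The file is closed now, so it can be reopened by name
    with open(temp_name) as f:
        content = f.read()
    print(content.strip())
finally:
    # With delete=False, cleanup is our own responsibility
    os.remove(temp_name)

print("File removed:", not os.path.exists(temp_name))
```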


### Using TemporaryDirectory class

In some cases your application needs to create files in a temporary directory. This can be done using the `TemporaryDirectory` class. This class creates a temporary directory and removes the directory and all of its contents when it is no longer needed. Just like the previous `tempfile` classes, it is a good idea to use it in a `with` statement.

In [None]:
print("\n=== Temporary Directory ===")
# Bonus: Temporary directories
with tempfile.TemporaryDirectory(prefix='python_demo_') as temp_dir:
    print(f'Temporary directory: {temp_dir}')

    # Create files in the temporary directory
    temp_file_path = Path(temp_dir) / 'example.txt'
    temp_file_path.write_text('File in temporary directory')

    print(f'Files in temp dir: {list(Path(temp_dir).iterdir())}')
    # Directory and all contents automatically cleaned up


=== Temporary Directory ===
Temporary directory: /tmp/python_demo_6is_b1t7
Files in temp dir: [PosixPath('/tmp/python_demo_6is_b1t7/example.txt')]


---
# Process management (subprocess)

One major use of Python is as a workflow or glue language for running and controlling other applications. To do this we need tools for starting applications and querying their status and output. In the following chapter we will go through some of the functions that exist in Python for this purpose.

## Using subprocess.run()

`subprocess.run()` is the recommended way of running another application. The function takes several arguments, typically:

* **args** - Path or command that should be run, usually given as a list of strings (called `cmd` in the example below).
* **shell** - Set to `True` if the command requires a shell to run. This can be a security risk, so beware.
* **capture_output** - If set to `True`, output is stored in the return value as `stdout` and `stderr`.
* **text** - If set to `True`, output is returned as strings instead of bytes.
* **timeout** - Time limit in seconds to wait for the external process to finish.

When the process terminates, its return status is available in the `returncode` attribute of the returned object.

The function can also raise exceptions such as `TimeoutExpired`, raised when the timeout expires, and `SubprocessError`, the base class for the exceptions defined in the `subprocess` module. The example below shows the usage of the `run()` function.

In [None]:
import subprocess
import sys
import platform

# Cross-platform directory listing command

if platform.system() == "Windows":
    cmd = ['dir']
    shell_needed = True
else:
    cmd = ['ls', '-la']
    shell_needed = False

print(f"Running on: {platform.system()}")
print(f"Command: {' '.join(cmd) if not shell_needed else cmd[0]}")

try:

    # Using subprocess.run() - the recommended modern approach

    result = subprocess.run(
        cmd,
        shell=shell_needed,
        capture_output=True,  # Capture both stdout and stderr
        text=True,            # Return strings instead of bytes
        timeout=10            # Prevent hanging
    )

    print(f"Return code: {result.returncode}")

    if result.returncode == 0:
        print("✅ Process completed successfully")
        print(f"Output:\n{result.stdout}")
    else:
        print("❌ Process failed")
        print(f"Error output:\n{result.stderr}")

except subprocess.TimeoutExpired:
    print("❌ Process timed out")
except subprocess.SubprocessError as e:
    print(f"❌ Subprocess error: {e}")
except Exception as e:
    print(f"❌ Unexpected error: {e}")

# Demonstrate with Python command (cross-platform)

print("\n" + "="*50)
print("Running Python version check:")

try:
    result = subprocess.run(
        [sys.executable, '--version'], # Python interpreter executable
        capture_output=True,
        text=True,
        timeout=5
    )

    if result.returncode == 0:
        print(f"✅ {result.stdout.strip()}")
    else:
        print(f"❌ Failed: {result.stderr}")

except Exception as e:
    print(f"❌ Error checking Python version: {e}")

Running on: Linux
Command: ls -la
Return code: 0
✅ Process completed successfully
Output:
total 72
drwx------ 1 root root 4096 Aug 20 06:57 .
drwxr-xr-x 1 root root 4096 Aug 20 06:56 ..
-r-xr-xr-x 1 root root 1071 Jan  1  2000 .bashrc
drwxr-xr-x 1 root root 4096 Aug 18 13:53 .cache
drwxr-xr-x 1 root root 4096 Aug 18 13:53 .config
drwxr-xr-x 5 root root 4096 Aug 18 13:52 .ipython
drwxr-xr-x 1 root root 4096 Aug 18 13:24 .julia
drwx------ 1 root root 4096 Aug 18 13:52 .jupyter
drwxr-xr-x 2 root root 4096 Aug 20 06:57 .keras
drwx------ 3 root root 4096 Aug 18 13:15 .launchpadlib
drwxr-xr-x 1 root root 4096 Aug 18 13:24 .local
drwxr-xr-x 4 root root 4096 Aug 18 13:52 .npm
-rw-r--r-- 1 root root  161 Jul  9  2019 .profile
-r-xr-xr-x 1 root root  254 Jan  1  2000 .tmux.conf
-rw-r--r-- 1 root root  211 Aug 18 13:52 .wget-hsts


Running Python version check:
✅ Python 3.12.11
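
Instead of checking `returncode` by hand, `subprocess.run()` can raise an exception for a non-zero return code when `check=True` is passed. The sketch below shows this variant. The wrapper name `run_checked` is our own, and the child processes are small Python one-liners so that the example runs on any platform.

```python
import subprocess
import sys


def run_checked(args):
    """Run a command and return its stdout, or None if it fails or is missing."""
    try:
        completed = subprocess.run(
            args,
            capture_output=True,
            text=True,
            timeout=10,
            check=True,  # raise CalledProcessError on a non-zero return code
        )
        return completed.stdout
    except subprocess.CalledProcessError as e:
        print(f"Command failed with return code {e.returncode}")
        return None
    except FileNotFoundError:
        print("Command not found")
        return None


ok_output = run_checked([sys.executable, "-c", "print('hi')"])
failed_output = run_checked([sys.executable, "-c", "import sys; sys.exit(3)"])

print(repr(ok_output), repr(failed_output))
```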


## Using Popen with .wait()

In [None]:
import subprocess

p = subprocess.Popen(['ls', '-la'])

# Other processing here...

# Wait for process to complete

p.wait()

if p.returncode == 0:
    print('The process returned 0')
else:
    print('The process returned the error code =', p.returncode)

The process returned 0


## Using Popen with .poll()

In [None]:
import subprocess, time

p = subprocess.Popen(['sleep', '5'])

while p.poll() is None:
    print('Waiting...')
    time.sleep(1)

if p.returncode == 0:
    print('The process returned 0')
else:
    print('The process returned the error code =', p.returncode)

Waiting...
Waiting...
Waiting...
Waiting...
Waiting...
The process returned 0
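
A polling loop like the one above also makes it possible to give up on a process that takes too long. The following sketch adds a deadline and terminates the process when it is exceeded. The function name `wait_with_deadline` and the timing values are assumptions chosen for the example.

```python
import subprocess
import sys
import time


def wait_with_deadline(args, deadline_seconds, poll_interval=0.1):
    """Run `args`; return its return code, or None if it was terminated at the deadline."""
    p = subprocess.Popen(args)
    start = time.monotonic()

    while p.poll() is None:
        if time.monotonic() - start > deadline_seconds:
            p.terminate()  # ask the process to stop
            p.wait()       # collect it so no zombie process is left behind
            return None
        time.sleep(poll_interval)

    return p.returncode


fast = wait_with_deadline([sys.executable, "-c", "pass"], deadline_seconds=10)
slow = wait_with_deadline(
    [sys.executable, "-c", "import time; time.sleep(30)"], deadline_seconds=0.5
)

print("fast:", fast)
print("slow:", slow)
```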


## Using Popen with polling and .communicate()

In [None]:
import subprocess, time

p = subprocess.Popen('ls -la; sleep 4', shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)

while p.poll() is None:
    print('Waiting...')
    time.sleep(1)

stdout, stderr = p.communicate()

if p.returncode == 0:
    print('The process returned 0')
    print('standard output:')
    print(stdout)
    print('standard error:')
    print(stderr)
else:
    print('The process returned the error code =', p.returncode)

Waiting...
Waiting...
Waiting...
Waiting...
Waiting...
The process returned 0
standard output:
total 72
drwx------ 1 root root 4096 Aug 20 06:57 .
drwxr-xr-x 1 root root 4096 Aug 20 06:56 ..
-r-xr-xr-x 1 root root 1071 Jan  1  2000 .bashrc
drwxr-xr-x 1 root root 4096 Aug 18 13:53 .cache
drwxr-xr-x 1 root root 4096 Aug 18 13:53 .config
drwxr-xr-x 5 root root 4096 Aug 18 13:52 .ipython
drwxr-xr-x 1 root root 4096 Aug 18 13:24 .julia
drwx------ 1 root root 4096 Aug 18 13:52 .jupyter
drwxr-xr-x 2 root root 4096 Aug 20 06:57 .keras
drwx------ 3 root root 4096 Aug 18 13:15 .launchpadlib
drwxr-xr-x 1 root root 4096 Aug 18 13:24 .local
drwxr-xr-x 4 root root 4096 Aug 18 13:52 .npm
-rw-r--r-- 1 root root  161 Jul  9  2019 .profile
-r-xr-xr-x 1 root root  254 Jan  1  2000 .tmux.conf
-rw-r--r-- 1 root root  211 Aug 18 13:52 .wget-hsts

standard error:



In [None]:
import subprocess

with subprocess.Popen('ls -la', shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True) as p:
    stdout, stderr = p.communicate()

    if p.returncode == 0:
        print('The process returned 0')
        print('standard output:')
        print(stdout)
        print('standard error:')
        print(stderr)
    else:
        print('The process returned the error code =', p.returncode)

The process returned 0
standard output:
total 72
drwx------ 1 root root 4096 Aug 20 06:57 .
drwxr-xr-x 1 root root 4096 Aug 20 06:56 ..
-r-xr-xr-x 1 root root 1071 Jan  1  2000 .bashrc
drwxr-xr-x 1 root root 4096 Aug 18 13:53 .cache
drwxr-xr-x 1 root root 4096 Aug 18 13:53 .config
drwxr-xr-x 5 root root 4096 Aug 18 13:52 .ipython
drwxr-xr-x 1 root root 4096 Aug 18 13:24 .julia
drwx------ 1 root root 4096 Aug 18 13:52 .jupyter
drwxr-xr-x 2 root root 4096 Aug 20 06:57 .keras
drwx------ 3 root root 4096 Aug 18 13:15 .launchpadlib
drwxr-xr-x 1 root root 4096 Aug 18 13:24 .local
drwxr-xr-x 4 root root 4096 Aug 18 13:52 .npm
-rw-r--r-- 1 root root  161 Jul  9  2019 .profile
-r-xr-xr-x 1 root root  254 Jan  1  2000 .tmux.conf
-rw-r--r-- 1 root root  211 Aug 18 13:52 .wget-hsts

standard error:
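
`communicate()` returns all output at once when the process has finished. For long-running programs it can be more useful to handle the output line by line while the process is still running. The sketch below shows one way of doing this by iterating over `p.stdout`. The function name `stream_lines` is our own, and the child process is a Python one-liner so that the example is platform independent.

```python
import subprocess
import sys


def stream_lines(args):
    """Run `args` and collect its standard output line by line as it arrives."""
    lines = []
    with subprocess.Popen(args, stdout=subprocess.PIPE, text=True) as p:
        for line in p.stdout:
            line = line.rstrip("\n")
            print(">", line)  # handle each line as soon as it is produced
            lines.append(line)
    # Leaving the with-block waits for the process, so returncode is set here
    return p.returncode, lines


return_code, output_lines = stream_lines(
    [sys.executable, "-c", "print('first'); print('second')"]
)
print("Return code:", return_code)
```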



## Wrapping process management in a function

In [None]:
import subprocess


def execute_with_output(cmd):

    with subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, universal_newlines=True) as p:
        stdout, _ = p.communicate()

        if p.returncode == 0:
            return stdout
        else:
            return None


if __name__ == "__main__":

    output = execute_with_output('ls -la')

    if output is not None:

        lines = output.split("\n")

        for line in lines:
            print('>' + line)
    else:
        print('No output was returned.')

>total 72
>drwx------ 1 root root 4096 Aug 20 06:57 .
>drwxr-xr-x 1 root root 4096 Aug 20 06:56 ..
>-r-xr-xr-x 1 root root 1071 Jan  1  2000 .bashrc
>drwxr-xr-x 1 root root 4096 Aug 18 13:53 .cache
>drwxr-xr-x 1 root root 4096 Aug 18 13:53 .config
>drwxr-xr-x 5 root root 4096 Aug 18 13:52 .ipython
>drwxr-xr-x 1 root root 4096 Aug 18 13:24 .julia
>drwx------ 1 root root 4096 Aug 18 13:52 .jupyter
>drwxr-xr-x 2 root root 4096 Aug 20 06:57 .keras
>drwx------ 3 root root 4096 Aug 18 13:15 .launchpadlib
>drwxr-xr-x 1 root root 4096 Aug 18 13:24 .local
>drwxr-xr-x 4 root root 4096 Aug 18 13:52 .npm
>-rw-r--r-- 1 root root  161 Jul  9  2019 .profile
>-r-xr-xr-x 1 root root  254 Jan  1  2000 .tmux.conf
>-rw-r--r-- 1 root root  211 Aug 18 13:52 .wget-hsts
>


---
# Logging

For larger applications there is often a need to create log entries in a more structured way. The `print()` function is not suitable for this. The `logging` module in Python implements this functionality. The module provides the functions `info()` for logging informational messages, `warning()` for logging unusual states in your application, and `error()` for logging errors occurring in your application. There is also a `debug()` function for logging details that help you find problems in your application. Debug messages are not displayed by default.

What is displayed in the log can be controlled by using the `setLevel()` method on the logger as shown below:

    logging.getLogger().setLevel(logging.INFO)

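Setting the level to `INFO` displays all log messages at the `INFO` level and above, that is `INFO`, `WARNING`, and `ERROR`. Setting it to `ERROR` displays only messages at the `ERROR` level and above. The `DEBUG` level is lower than `INFO`, so debug messages are only displayed when the level is set to `DEBUG`.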
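
The ordering of the levels can be checked directly, since each level name in the `logging` module is an integer constant. A message is displayed when its level is greater than or equal to the level set on the logger. The list name `levels` is only used for this illustration.

```python
import logging

levels = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]

for name in levels:
    # Each level name is an integer constant in the logging module
    print(f"{name:<8} = {getattr(logging, name)}")
```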

It is also possible to customise the logging output by changing the log format. This is also illustrated in the example below.

In [None]:
import logging
import sys
from datetime import datetime

# --- This is the normal logging setup

# Configure logging with a better format

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s | %(levelname)-8s | %(name)s | %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S'
)

# --- BELOW is required for Colab/Notebooks ----

# Remove any handlers that Jupyter/Colab already set up
for handler in logging.root.handlers[:]:
    logging.root.removeHandler(handler)

# Now configure logging as you want
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s | %(levelname)-8s | %(name)s | %(message)s",
    datefmt="%Y-%m-%d %H:%M:%S",
    handlers=[logging.StreamHandler(sys.stdout)]
)

def simulate_application_workflow():
    """Simulate a real application with different logging scenarios"""
    logger = logging.getLogger('MyApplication')

    logger.info("🚀 Application starting...")

    # Simulate processing some data
    try:
        logger.debug("Processing user data...")

        # Simulate different scenarios
        users = ['alice', 'bob', 'charlie']

        for user in users:
            logger.info(f"Processing user: {user}")

            if user == 'charlie':
                logger.warning(f"⚠️  User {user} has incomplete profile")

        logger.info("✅ All users processed successfully")

    except Exception as e:
        logger.error(f"❌ Error processing users: {e}")
        logger.exception("Full traceback:")  # This includes the stack trace

    logger.info("🏁 Application workflow completed")

print("=== Logging with different levels ===")

# Test different logging levels
print("\n1. INFO level and above:")
logging.getLogger().setLevel(logging.INFO)
simulate_application_workflow()

print("\n2. WARNING level and above:")
logging.getLogger().setLevel(logging.WARNING)
simulate_application_workflow()

print("\n3. ERROR level only:")
logging.getLogger().setLevel(logging.ERROR)
simulate_application_workflow()

# Reset to INFO for subsequent examples
logging.getLogger().setLevel(logging.INFO)

=== Logging with different levels ===

1. INFO level and above:
2025-08-20 11:37:36 | INFO     | MyApplication | 🚀 Application starting...
2025-08-20 11:37:36 | INFO     | MyApplication | Processing user: alice
2025-08-20 11:37:36 | INFO     | MyApplication | Processing user: bob
2025-08-20 11:37:36 | INFO     | MyApplication | Processing user: charlie
2025-08-20 11:37:36 | INFO     | MyApplication | ✅ All users processed successfully
2025-08-20 11:37:36 | INFO     | MyApplication | 🏁 Application workflow completed


3. ERROR level only:
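
Log messages do not have to go to the screen. As a sketch of a custom logger, the example below attaches a `FileHandler` with its own format to a named logger. The logger name `demo.file`, the file name `demo.log`, and the helper `make_file_logger` are assumptions made for this illustration, and the log file is written to a temporary directory.

```python
import logging
import os
import tempfile


def make_file_logger(name, path, level=logging.DEBUG):
    """Create a named logger that writes to the file `path`."""
    logger = logging.getLogger(name)
    logger.setLevel(level)
    logger.propagate = False  # keep these messages out of the root logger's output

    handler = logging.FileHandler(path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(levelname)s | %(name)s | %(message)s"))
    logger.addHandler(handler)
    return logger, handler


with tempfile.TemporaryDirectory() as temp_dir:
    log_path = os.path.join(temp_dir, "demo.log")
    logger, handler = make_file_logger("demo.file", log_path)

    logger.debug("Detailed diagnostic message")
    logger.warning("Something unusual happened")

    # Close the handler before reading the file and removing the directory
    logger.removeHandler(handler)
    handler.close()

    with open(log_path, encoding="utf-8") as f:
        log_text = f.read()

print(log_text)
```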


---
# Data serialization and deserialization

Most applications need to be able to store variables to disk. The easiest way is to use the Python `open()` function to create a file and write to it. However, in this case you need to define the structure of your data and your own file format yourself. Python offers better ways of reading and writing data to and from disk: the built-in serialisation and deserialisation modules `pickle` and `json`. These modules know how to store the standard Python data structures and read them back again. In the following sections we will go through how you use them.

## Storing variables and data structures using Pickle

Pickle is the default mechanism for storing Python data structures to disk. To use Pickle we import the pickle module.

In [None]:
import pickle

To pickle a data structure to disk we use the `dump()` function. This function takes the data you want to write and a file object opened for binary writing (`"wb"`). In the example below we write our dictionary, `my_data`, to the file `mydata.pkl`.

In [None]:
my_data = {"a number": 42, "a list": list(range(1000)), "a dict": {'a': 1, 'b': 2}}

with open("mydata.pkl", "wb") as my_file:
    pickle.dump(my_data, my_file)

Pickle files store the Python data structures in a compact binary form that is not human readable.

We can read the file back using the `load()` function.

In [None]:
with open('mydata.pkl', 'rb') as my_file:
    my_data_copy = pickle.load(my_file)

print(my_data_copy)

{'a number': 42, 'a list': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 21

As you can see, we read the file back into a new data structure that contains exactly the same data as `my_data`.

It is also possible to write multiple data structures to the same file using multiple calls to `dump()`.

In [None]:
section_1 = {"A": 42, "a list": list(range(1000)), "a dict": {'a': 1, 'b': 2}}
section_2 = {"B": 84, "a list": list(range(1000)), "a dict": {'a': 1, 'b': 2}}

with open("sections.pkl", "wb") as my_file:
    pickle.dump(section_1, my_file)
    pickle.dump(section_2, my_file)

It is however important that the data is read back in the same order as it was written.

In [None]:
with open('sections.pkl', 'rb') as my_file:
    section_1_copy = pickle.load(my_file)
    section_2_copy = pickle.load(my_file)

print(section_1_copy)
print(section_2_copy)

{'A': 42, 'a list': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217,
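If the number of pickled objects in a file is not known in advance, one option is to keep calling `load()` until it raises `EOFError`. The sketch below assumes a hypothetical helper, `load_all()`, which is not part of the pickle module, and writes to a temporary directory so it does not touch the files created above.

```python
import pickle
import tempfile
from pathlib import Path


def load_all(path):
    """Yield every object pickled into the file at path, in write order.

    Hypothetical helper for illustration; not part of the pickle module.
    """
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                # pickle.load() raises EOFError when no data is left
                return


records = [{"A": 42}, {"B": 84}, [1, 2, 3]]

with tempfile.TemporaryDirectory() as tmp_dir:
    pkl_path = Path(tmp_dir) / "records.pkl"

    # Write the objects one after another into the same file
    with open(pkl_path, "wb") as f:
        for record in records:
            pickle.dump(record, f)

    # Read them back without knowing how many there are
    records_copy = list(load_all(pkl_path))

print(records_copy)
```

The objects still come back in the order they were written; the loop only removes the need to know the count.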

Data can also be serialised to an in-memory byte string instead of a file. This can be useful if you want to send the data over a network connection or similar. In this case you call the `dumps()` function, which converts the data structure to a `bytes` object that can then be transferred.

In the example below we pickle our data structure to a byte string which we then compress and decompress using the `zlib` module. This is a good module if you want to compress your pickle data before writing to disk.

In [None]:
import pickle, zlib

my_data = {"a number": 42, "a list": list(range(1000)), "a dict": {'a': 1, 'b': 2}}

# Dump data structure to a byte string

my_data_dump = pickle.dumps(my_data)
uncompressed_length = len(my_data_dump)

# Compress data using zlib

my_data_compressed = zlib.compress(my_data_dump)
compressed_length = len(my_data_compressed)

# Uncompress data again

my_data_uncompressed = zlib.decompress(my_data_compressed)

# Convert the uncompressed bytes to a Python data structure again

my_data_copy = pickle.loads(my_data_uncompressed)
print(my_data_copy)

print(f"Uncompressed length: {uncompressed_length} bytes")
print(f"Compressed length: {compressed_length} bytes")
print('Ratio:', round(uncompressed_length / compressed_length, 2))

{'a number': 42, 'a list': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 21
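When the compressed data should end up in a file, the `gzip` module from the standard library can wrap the file object so that `dump()` and `load()` work unchanged. The following is a minimal sketch; the file names are hypothetical and live in a temporary directory.

```python
import gzip
import pickle
import tempfile
from pathlib import Path

my_data = {"a number": 42, "a list": list(range(1000)), "a dict": {"a": 1, "b": 2}}

with tempfile.TemporaryDirectory() as tmp_dir:
    plain_path = Path(tmp_dir) / "mydata.pkl"
    gz_path = Path(tmp_dir) / "mydata.pkl.gz"

    # Uncompressed pickle, written only for the size comparison
    with open(plain_path, "wb") as f:
        pickle.dump(my_data, f)

    # gzip.open() returns a file object that compresses on write ...
    with gzip.open(gz_path, "wb") as f:
        pickle.dump(my_data, f)

    # ... and decompresses on read
    with gzip.open(gz_path, "rb") as f:
        my_data_copy = pickle.load(f)

    plain_size = plain_path.stat().st_size
    gz_size = gz_path.stat().st_size

print(my_data_copy == my_data)
print(f"Plain: {plain_size} bytes, gzip: {gz_size} bytes")
```

How much space is saved depends on the data; repetitive content compresses well, while already compressed content such as JPEG images barely shrinks.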

## Storing variables and data structures in JSON

If readability is important, JavaScript Object Notation (JSON) can be used as the storage format. In Python this functionality is found in the **json** module, whose functions closely mirror those of the pickle module.

To use json we first import the module:

In [None]:
import json

To write our data in JSON format to disk we use the `dump()` function in the same way as for pickle, except that the file is opened in text mode (`"w"`) instead of binary mode.

In [None]:
my_data = {"a number": 42, "a list": list(range(1000)), "a dict": {'a': 1, 'b': 2}}

with open("mydata.json", "w") as my_file:
    json.dump(my_data, my_file)

The main difference between pickle and json is that JSON is a human-readable text format. Looking at the generated file we get:

In [None]:
!cat mydata.json

{"a number": 42, "a list": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 21

We can see here that the JSON file format is very close to how our Python data structures are defined in our Python source files. We can make the JSON output a little more readable by passing some additional parameters to the `dump()` function:

In [None]:
my_data = {"a number": 42, "a list": list(range(10)), "a dict": {'a': 1, 'b': 2}}

with open("mydata_formatted.json", "w") as my_file:
    json.dump(my_data, my_file, sort_keys=True, indent=4)

Which looks like this:

In [None]:
!cat mydata_formatted.json

{
    "a dict": {
        "a": 1,
        "b": 2
    },
    "a list": [
        0,
        1,
        2,
        3,
        4,
        5,
        6,
        7,
        8,
        9
    ],
    "a number": 42
}

Loading JSON files is done in a similar way as with pickle, using the `load()` function.

In [None]:
with open('mydata.json', 'r') as my_file:
    my_data_copy = json.load(my_file)

print(my_data_copy)

{'a number': 42, 'a list': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 21
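One thing to keep in mind when loading is that JSON has fewer types than Python, so a round trip is not always exact. The small sketch below, with made-up sample data, shows two common conversions: dictionary keys that are not strings become strings, and tuples come back as lists.

```python
import json

original = {1: "integer key", "point": (3, 4), "nothing": None, "flag": True}

restored = json.loads(json.dumps(original))

print(restored)
# {'1': 'integer key', 'point': [3, 4], 'nothing': None, 'flag': True}

# The round trip is not exact: the integer key became a string
# and the tuple became a list.
print(restored == original)  # False
```

If exact Python types matter, either convert the values back after loading or use pickle instead.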

With the json module we usually call `dump()` only once per file, because a JSON file must contain a single value, for example one list or one dictionary. If you want to store multiple structures in a JSON file, wrap them in an outer list or dictionary as in the following example:

In [None]:
section_1 = {"A": 42, "a list": list(range(10)), "a dict": {'a': 1, 'b': 2}}
section_2 = {"B": 84, "a list": list(range(10)), "a dict": {'a': 1, 'b': 2}}

# Store the sections in a list which we pass to dump().

with open("sections.json", "w") as my_file:
    json.dump([section_1, section_2], my_file)

We then read back the entire structure in a single `load()`.

In [None]:
# Read the complete JSON file

with open('sections.json', 'r') as my_file:
    sections = json.load(my_file)

# The sections are stored in the outer list

print(sections)

# Extract the individual sections

section_1_copy = sections[0]
section_2_copy = sections[1]

print(section_1_copy)
print(section_2_copy)

[{'A': 42, 'a list': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'a dict': {'a': 1, 'b': 2}}, {'B': 84, 'a list': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'a dict': {'a': 1, 'b': 2}}]
{'A': 42, 'a list': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'a dict': {'a': 1, 'b': 2}}
{'B': 84, 'a list': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'a dict': {'a': 1, 'b': 2}}


Just like pickle, the json module can serialise to memory using the `dumps()` function. The result is an ordinary string (`str`) rather than a `bytes` object.

In [None]:
import json

my_data = {"a number": 42, "a list": list(range(10)), "a dict": {'a': 1, 'b': 2}}

# Dump my_data to a JSON string

my_data_dump = json.dumps(my_data, sort_keys=True, indent=4)

print("JSON string content")
print(my_data_dump)

# Read back data structure from JSON string

print("Python data structure copy")
my_data_copy = json.loads(my_data_dump)
print(my_data_copy)

JSON string content
{
    "a dict": {
        "a": 1,
        "b": 2
    },
    "a list": [
        0,
        1,
        2,
        3,
        4,
        5,
        6,
        7,
        8,
        9
    ],
    "a number": 42
}
Python data structure copy
{'a dict': {'a': 1, 'b': 2}, 'a list': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'a number': 42}
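Objects that json cannot encode, such as `datetime`, `Path` or `set`, make `dumps()` raise a `TypeError`. One way around this is the `default` parameter, which accepts a function that is called for every object the encoder does not recognise. The converter `to_jsonable()` and the sample data below are assumptions made for illustration.

```python
import json
from datetime import datetime
from pathlib import Path


def to_jsonable(obj):
    """Fallback converter passed to json.dumps(default=...).

    Hypothetical helper: it is only called for objects json cannot encode.
    """
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, Path):
        return obj.as_posix()
    if isinstance(obj, set):
        return sorted(obj)
    raise TypeError(f"Cannot serialise {type(obj).__name__}")


report = {
    "created": datetime(2025, 8, 20, 11, 37, 36),
    "source": Path("data") / "mydata.json",
    "tags": {"beta", "alpha"},
}

report_json = json.dumps(report, default=to_jsonable, sort_keys=True)
print(report_json)
# {"created": "2025-08-20T11:37:36", "source": "data/mydata.json", "tags": ["alpha", "beta"]}
```

The conversion is one-way: `loads()` returns plain strings and lists, so the reading side has to rebuild the original types itself.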


## Pickle / JSON Comparison

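The two formats differ in speed, file size and the data types they support. The example below measures save time, load time and file size for both formats on similar test data. The timings come from a single run, so they will vary between runs and machines.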

In [None]:
import pickle
import json
import time
from pathlib import Path

# Create test data of different types
test_data = {
    "integers": list(range(10000)),
    "strings": [f"string_{i}" for i in range(1000)],
    "nested_dict": {
        "level1": {
            "level2": {
                "data": [1, 2, 3, 4, 5] * 50
            }
        }
    },
    "mixed_types": [1, "string", [1, 2, 3], {"key": "value"}]
}

print("=== Pickle vs JSON Comparison ===")
print(f"Data complexity: {len(str(test_data))} characters when converted to string")

# Test Pickle
print("\n1. PICKLE FORMAT:")
start_time = time.time()

# Save with pickle
with open("test_data.pkl", "wb") as f:
    pickle.dump(test_data, f)

pickle_save_time = time.time() - start_time
pickle_size = Path("test_data.pkl").stat().st_size

# Load with pickle
start_time = time.time()
with open("test_data.pkl", "rb") as f:
    pickle_loaded = pickle.load(f)
pickle_load_time = time.time() - start_time

print(f"  ✅ Save time: {pickle_save_time:.4f} seconds")
print(f"  ✅ Load time: {pickle_load_time:.4f} seconds")
print(f"  📦 File size: {pickle_size} bytes")
print(f"  🔍 Data integrity: {'✅ OK' if pickle_loaded == test_data else '❌ FAILED'}")

# Test JSON (only for JSON-compatible data)
json_compatible_data = {
    "integers": list(range(10000)),
    "strings": [f"string_{i}" for i in range(1000)],
    "nested_dict": {
        "level1": {
            "level2": {
                "data": [1, 2, 3, 4, 5] * 50
            }
        }
    }
}

print("\n2. JSON FORMAT:")
start_time = time.time()

# Save with JSON
with open("test_data.json", "w") as f:
    json.dump(json_compatible_data, f)

json_save_time = time.time() - start_time
json_size = Path("test_data.json").stat().st_size

# Load with JSON
start_time = time.time()
with open("test_data.json", "r") as f:
    json_loaded = json.load(f)
json_load_time = time.time() - start_time

print(f"  ✅ Save time: {json_save_time:.4f} seconds")
print(f"  ✅ Load time: {json_load_time:.4f} seconds")
print(f"  📦 File size: {json_size} bytes")
print(f"  🔍 Data integrity: {'✅ OK' if json_loaded == json_compatible_data else '❌ FAILED'}")

print("\n=== SUMMARY ===")
print(f"🏎️  Speed - Save: {'Pickle' if pickle_save_time < json_save_time else 'JSON'} is faster")
print(f"🏎️  Speed - Load: {'Pickle' if pickle_load_time < json_load_time else 'JSON'} is faster")
print(f"💾 Size: {'Pickle' if pickle_size < json_size else 'JSON'} produces smaller files")
print(f"📖 Human readable: JSON ✅ | Pickle ❌")
print(f"🔧 Python-specific types: Pickle ✅ | JSON ❌")
print(f"🌐 Cross-language compatibility: JSON ✅ | Pickle ❌")

print("\n=== BEST PRACTICES ===")
print("🎯 Use PICKLE when:")
print("   • Working exclusively with Python")
print("   • Need to preserve exact Python object types")
print("   • Performance is critical")
print("   • Working with complex Python objects (classes, functions, etc.)")

print("\n🎯 Use JSON when:")
print("   • Need human-readable format")
print("   • Sharing data with other languages/systems")
print("   • Working with web APIs")
print("   • Data is relatively simple (dicts, lists, strings, numbers)")

# Cleanup
Path("test_data.pkl").unlink(missing_ok=True)
Path("test_data.json").unlink(missing_ok=True)

=== Pickle vs JSON Comparison ===
Data complexity: 73667 characters when converted to string

1. PICKLE FORMAT:
  ✅ Save time: 0.0013 seconds
  ✅ Load time: 0.0009 seconds
  📦 File size: 43305 bytes
  🔍 Data integrity: ✅ OK

2. JSON FORMAT:
  ✅ Save time: 0.0055 seconds
  ✅ Load time: 0.0017 seconds
  📦 File size: 73608 bytes
  🔍 Data integrity: ✅ OK

=== SUMMARY ===
🏎️  Speed - Save: Pickle is faster
🏎️  Speed - Load: Pickle is faster
💾 Size: Pickle produces smaller files
📖 Human readable: JSON ✅ | Pickle ❌
🔧 Python-specific types: Pickle ✅ | JSON ❌
🌐 Cross-language compatibility: JSON ✅ | Pickle ❌

=== BEST PRACTICES ===
🎯 Use PICKLE when:
   • Working exclusively with Python
   • Need to preserve exact Python object types
   • Performance is critical
   • Working with complex Python objects (classes, functions, etc.)

🎯 Use JSON when:
   • Need human-readable format
   • Sharing data with other languages/systems
   • Working with web APIs
   • Data is relatively simple (dicts, lists
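The point about Python-specific types can be illustrated with a few built-in types that have no JSON equivalent. The sketch below uses made-up sample data: pickle restores the values with their exact types, while `json.dumps()` raises a `TypeError`.

```python
import json
import pickle

python_only = {
    "unique ids": {1, 2, 3},   # set
    "point": (1.5, 2.5),       # tuple
    "impedance": 3 + 4j,       # complex number
    "raw": b"\x00\x01",        # bytes
}

# Pickle round-trips all of these with their exact types
restored = pickle.loads(pickle.dumps(python_only))
print(restored == python_only)            # True
print(type(restored["point"]).__name__)   # tuple

# json refuses types it has no representation for
try:
    json.dumps(python_only)
    json_error = None
except TypeError as e:
    json_error = str(e)

print("json error:", json_error)
```

This flexibility has a cost: unpickling can execute arbitrary code, so `pickle.load()` should only be used on data from a source you trust.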

---
# Data archiving and compression

In many applications there is a need to package files and directories into archives such as tar or zip files. The runtime library of Python has built-in support for creating these types of archives, so no external tools are needed.

## Tarfiles

Tar files, or Tape Archive files, are a very common way of archiving a directory structure with its files into a single file. A tar file is also often compressed with gzip. This is fully supported by the `tarfile` module.

To be able to create some tar archives we first download some example files.

In [None]:
!wget https://fossbytes.com/wp-content/uploads/2016/10/commodore64.jpg -O bild1.jpg
!wget https://ichef.bbci.co.uk/news/640/media/images/68628000/jpg/_68628283_apple-1.jpg -O bild2.jpg

--2025-08-20 20:05:39--  https://fossbytes.com/wp-content/uploads/2016/10/commodore64.jpg
Resolving fossbytes.com (fossbytes.com)... 104.21.89.140, 172.67.160.54, 2606:4700:3037::6815:598c, ...
Connecting to fossbytes.com (fossbytes.com)|104.21.89.140|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 73687 (72K) [image/jpeg]
Saving to: ‘bild1.jpg’


2025-08-20 20:05:40 (257 KB/s) - ‘bild1.jpg’ saved [73687/73687]

--2025-08-20 20:05:40--  https://ichef.bbci.co.uk/news/640/media/images/68628000/jpg/_68628283_apple-1.jpg
Resolving ichef.bbci.co.uk (ichef.bbci.co.uk)... 23.220.188.130, 2600:1402:8000:a86::f33, 2600:1402:8000:a84::f33, ...
Connecting to ichef.bbci.co.uk (ichef.bbci.co.uk)|23.220.188.130|:443... connected.
HTTP request sent, awaiting response... 200 OK
Cookie coming from ichef.bbci.co.uk attempted to set domain to bbc.co.uk
Length: 45249 (44K) [image/jpeg]
Saving to: ‘bild2.jpg’


2025-08-20 20:05:40 (2.67 MB/s) - ‘bild2.jpg’ saved [45249/45249]



To use the tarfile module we import it under the short alias `tf`, following the same convention as when importing NumPy.

In [None]:
import tarfile as tf

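To create an archive we create an instance of the `TarFile` class. This is very similar to how you open a file in Python: we give a filename and a mode that determines whether the archive should be written or read. We can then add files to the archive using the `.add()` method. The `arcname` parameter sets the name a file gets inside the archive, so a name with a directory prefix such as `mydir/` places the file in that directory in the archive. Note that `TarFile` itself always writes an uncompressed archive, whatever the file extension says; gzip compression is selected by calling `tf.open()` with the mode `"w:gz"` instead.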

In [None]:
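# TarFile with mode "w" writes an uncompressed archive, whatever the extension;
# tf.open("myarchive.tar.gz", "w:gz") would apply gzip compression.
with tf.TarFile("myarchive.tar.gz", "w") as mytar:

    # Add a single file

    mytar.add("bild1.jpg")

    # Add bild2.jpg; arcname stores it as mydir/bild1.jpg inside the archive

    mytar.add("bild2.jpg", arcname="mydir/bild1.jpg")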

Opening an archive is done in a similar way by creating a `TarFile` instance in a with-statement. The filenames can be queried with the `.getnames()` method. More detailed information on the stored files can be found using the `.getmembers()` method, which returns a list of `TarInfo` objects.

Files can be extracted with the `.extract()` method. Please note the `filter="data"` parameter. As archives are potentially unsafe, this filter rejects special files such as devices as well as members and links that would end up outside the destination directory. It is the default from Python 3.14. You can extract all files in the archive using the `.extractall()` method. The first parameter is the directory to unpack the archive in.

In [None]:
with tf.TarFile("myarchive.tar.gz", "r") as mytar:

    # Query information about the archive

    print(mytar.getnames())
    print(mytar.getmembers())

    # Extract a single file into the "mytar" directory

    mytar.extract("bild1.jpg", "mytar", filter="data")

    # Extract the entire archive in mytar_all

    mytar.extractall("mytar_all", filter="data")
    mytar.list(verbose=True)

['bild1.jpg', 'mydir/bild1.jpg']
[<TarInfo 'bild1.jpg' at 0x78d590552680>, <TarInfo 'mydir/bild1.jpg' at 0x78d590552440>]
?rw-r--r-- root/root      73687 2016-10-01 12:24:24 bild1.jpg 
?rw-r--r-- root/root      45249 2025-08-20 20:05:40 mydir/bild1.jpg 
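For a gzip-compressed archive, `tarfile.open()` can be used with a mode that names the compression. The sketch below is self-contained: it creates a hypothetical sample file in a temporary directory, writes it to a `.tar.gz` archive with mode `"w:gz"`, and reads one member back without unpacking it to disk.

```python
import tarfile as tf
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    tmp = Path(tmp_dir)

    # Hypothetical sample file, created here so the sketch is self-contained
    note = tmp / "note.txt"
    note.write_bytes(b"hello archive\n" * 100)

    archive = tmp / "notes.tar.gz"

    # "w:gz" selects gzip compression; plain "w" writes an uncompressed tar
    with tf.open(archive, "w:gz") as tar:
        tar.add(note, arcname="note.txt")
        tar.add(note, arcname="docs/note_copy.txt")

    # A gzip stream always starts with the bytes 0x1f 0x8b
    is_gzip = archive.read_bytes()[:2] == b"\x1f\x8b"

    # "r:gz" reads a gzip-compressed archive
    with tf.open(archive, "r:gz") as tar:
        names = tar.getnames()
        # extractfile() reads a member without unpacking it to disk
        content = tar.extractfile("docs/note_copy.txt").read().decode()

print(names)    # ['note.txt', 'docs/note_copy.txt']
print(is_gzip)  # True
```

When reading, the mode `"r"` on its own also works with `tarfile.open()`, since it detects the compression automatically.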


If we list the current directory we get:

In [None]:
!ls -la

total 276
drwxr-xr-x 1 root root   4096 Aug 20 20:11 .
drwxr-xr-x 1 root root   4096 Aug 20 19:55 ..
-rw-r--r-- 1 root root  73687 Oct  1  2016 bild1.jpg
-rw-r--r-- 1 root root  45249 Aug 20 20:05 bild2.jpg
drwxr-xr-x 4 root root   4096 Aug 19 13:37 .config
-rw-r--r-- 1 root root 133120 Aug 20 20:10 myarchive.tar.gz
drwxr-xr-x 2 root root   4096 Aug 20 20:11 mytar
drwxr-xr-x 3 root root   4096 Aug 20 20:11 mytar_all
drwxr-xr-x 1 root root   4096 Aug 19 13:38 sample_data


## Zip-files

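Zip files are archives with a background on Windows. The `zipfile` module works much like `tarfile`: files are added with `.write()`, listed with `.namelist()` and unpacked with `.extract()` or `.extractall()`. A file in the archive can also be opened directly with `.open()`, which the example below uses to read an image without unpacking it first.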

In [None]:
import zipfile as zf
import matplotlib.pyplot as plt

with zf.ZipFile("myarchive.zip", "w") as myzip:
    myzip.write("bild1.jpg")
    myzip.write("bild2.jpg")

with zf.ZipFile("myarchive.zip", "r") as myzip:
    print(myzip.namelist())
    print(myzip.getinfo("bild1.jpg"))
    myzip.extract("bild2.jpg", "myzip")
    myzip.extractall("myzip_all")
    myzip.printdir()
    #with myzip.open("bild1.jpg") as myfile:
    #    image1 = plt.imread(myfile)
    with myzip.open("bild2.jpg") as myfile:
        image2 = plt.imread(myfile)
    #plt.imshow(image1)
    plt.imshow(image2)
    plt.show()
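By default `ZipFile` stores files without compressing them. The sketch below, which uses a hypothetical sample file in a temporary directory, turns on compression with `ZIP_DEFLATED`, checks the archive with `.testzip()` and reads a member directly with `.read()`.

```python
import tempfile
import zipfile as zf
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    tmp = Path(tmp_dir)

    # Hypothetical, highly repetitive sample file so compression is visible
    log_file = tmp / "sample.log"
    log_file.write_bytes(b"INFO | all good\n" * 500)

    archive = tmp / "logs.zip"

    # ZipFile stores files uncompressed unless a compression method is given
    with zf.ZipFile(archive, "w", compression=zf.ZIP_DEFLATED) as myzip:
        myzip.write(log_file, arcname="logs/sample.log")

    with zf.ZipFile(archive, "r") as myzip:
        # testzip() returns the name of the first corrupt member, or None
        first_bad_member = myzip.testzip()

        info = myzip.getinfo("logs/sample.log")
        original_size = info.file_size
        stored_size = info.compress_size

        # read() returns the member's content as bytes without extracting it
        first_line = myzip.read("logs/sample.log").splitlines()[0]

print("Corrupt member:", first_bad_member)
print(f"{original_size} bytes stored as {stored_size} bytes")
print(first_line)
```

For files that are already compressed, such as the JPEG images above, `ZIP_DEFLATED` gains very little.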

---
# Special file formats

## Configuration files

In [None]:
config_file = """[DEFAULT]
ServerAliveInterval = 45
Compression = yes
CompressionLevel = 9
ForwardX11 = yes

[bitbucket.org]
User = hg

[topsecret.server.com]
Port = 50022
ForwardX11 = no"""


with open("config1.ini", "w") as f:
    f.write(config_file)

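import configparser

# Read the INI file into a ConfigParser instance
config = configparser.ConfigParser()
config.read("config1.ini")

# sections() lists all sections except [DEFAULT]
sections = config.sections()
print(sections)

# Keys are case-insensitive: "User" in the file is read as "user"
print(config["bitbucket.org"]["user"])

# Iterating over a section also yields the keys inherited from [DEFAULT]
for section in config.sections():
    print("section =", section)
    keys = config[section].keys()
    for key in keys:
        print(key, "=", config[section][key])

# Change a value and write the modified configuration to a new file
config["bitbucket.org"]["user"] = "jonas"
print(config["bitbucket.org"]["user"])

with open("config2.ini", "w") as f:
    config.write(f)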
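`configparser` stores every value as a string. The following sketch shows the typed getters `getint()` and `getboolean()` together with the `fallback` argument. It parses a shortened, hypothetical variant of the configuration above from a string with `read_string()`, and uses the name `parser` so it does not overwrite `config`.

```python
import configparser

ini_text = """[DEFAULT]
ServerAliveInterval = 45
Compression = yes

[topsecret.server.com]
Port = 50022
Compression = no
"""

parser = configparser.ConfigParser()
parser.read_string(ini_text)

server = parser["topsecret.server.com"]

# All values are stored as strings ...
print(repr(server["Port"]))                    # '50022'

# ... so use the typed getters to convert them
port = server.getint("Port")
compression = server.getboolean("Compression")

# Keys missing from the section are looked up in [DEFAULT]
interval = server.getint("ServerAliveInterval")

# fallback= supplies a value when the key exists nowhere
timeout = server.getint("Timeout", fallback=30)

print(port, compression, interval, timeout)    # 50022 False 45 30
```

`getboolean()` accepts `yes`/`no`, `on`/`off`, `true`/`false` and `1`/`0`, and raises a `ValueError` for anything else.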

In [None]:

import configparser

config = configparser.ConfigParser()

config["DEFAULT"] = {
        "Rating":"No rating",
        "Length":"No length"
        }

config["Dr Who"] = {"Rating":"9/9"}
config["Firefly"] = {"Length":"Too long"}

with open("config3.ini", "w") as config_file:
    config.write(config_file)


In [None]:
!cat config3.ini

## Comma separated files - CSV

In [None]:
!rm -f example1.csv
!wget https://raw.githubusercontent.com/jonaslindemann/guide_to_python/master/chapters/kapitel4/notebook/example1.csv

In [None]:
import csv

# newline='' is recommended for files used with the csv module
with open('example1.csv', 'r', newline='') as csv_file:
    csv_data = csv.reader(csv_file, delimiter=',')
    for row in csv_data:
        print(row)

# Write one row at a time with writerow()
with open('example2.csv', 'w', newline='') as csv_file:
    csv_writer = csv.writer(csv_file, delimiter=',')
    csv_writer.writerow(['Item', 'Count'])
    csv_writer.writerow(['Cucumber', '2'])
    csv_writer.writerow(['Tomato', '4'])

rows = [["Item", "Count"], ["Cucumber", "2"], ["Tomato", "4"]]

# Write a whole list of rows in one call with writerows()
with open('example3.csv', 'w', newline='') as csv_file:
    csv_writer = csv.writer(csv_file, delimiter=',')
    csv_writer.writerows(rows)

In [None]:
!cat example1.csv
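When a CSV file has a header line, `csv.DictWriter` and `csv.DictReader` let you work with rows as dictionaries instead of lists. The sketch below uses made-up data and a hypothetical file name in a temporary directory.

```python
import csv
import tempfile
from pathlib import Path

rows = [
    {"Item": "Cucumber", "Count": 2},
    {"Item": "Tomato", "Count": 4},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "example_dict.csv"

    # newline="" lets the csv module control line endings itself
    with open(csv_path, "w", newline="") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=["Item", "Count"])
        writer.writeheader()
        writer.writerows(rows)

    with open(csv_path, "r", newline="") as csv_file:
        # Each row comes back as a dict keyed by the header line
        rows_copy = list(csv.DictReader(csv_file))

print(rows_copy)

# Every value is read back as a string; convert explicitly where needed
total = sum(int(row["Count"]) for row in rows_copy)
print("Total:", total)  # Total: 6
```

Accessing columns by name keeps the code working even if the column order in the file changes.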

# 🎯 Practical Exercises and Challenges

Now that you've learned about Python's built-in functions and runtime library, here are some practical exercises to reinforce your understanding:

## Exercise 1: System Information Tool 🖥️

Create a Python script that gathers and displays system information:

**Requirements:**
- Display current working directory
- Show environment variables (HOME/USERPROFILE, PATH, PYTHONPATH)
- List files in current directory with their sizes
- Show Python version and executable path
- Display disk usage for current directory

**Bonus:** Save the information to a JSON file with timestamp.

In [None]:
# Exercise 1 Solution Template
import os
import sys
import json
import platform
import tempfile
from datetime import datetime
from pathlib import Path

def create_system_info_tool():
    """Create a comprehensive system information tool"""

    system_info = {
        "timestamp": datetime.now().isoformat(),
        "system": {
            "platform": platform.system(),
            "platform_version": platform.version(),
            "architecture": platform.architecture()[0],
            "processor": platform.processor() or "Unknown"
        },
        "python": {
            "version": sys.version,
            "executable": sys.executable,
            "path": sys.path[:3]  # First 3 entries to avoid clutter
        },
        "directories": {
            "current_working_directory": os.getcwd(),
            "home_directory": os.path.expanduser("~"),
            "temp_directory": tempfile.gettempdir()
        },
        "environment": {
            "PATH": os.environ.get("PATH", "Not found")[:100] + "...",  # Truncated
            "PYTHONPATH": os.environ.get("PYTHONPATH", "Not set"),
            "USER": os.environ.get("USER") or os.environ.get("USERNAME", "Unknown")
        },
        "current_directory_contents": []
    }

    # TODO: Complete this function
    # Add code to:
    # 1. List files in current directory with sizes
    # 2. Calculate total directory size
    # 3. Save to JSON file

    print("📊 System Information Tool")
    print("=" * 40)
    print("🖥️  System:", system_info["system"]["platform"])
    print("🐍 Python:", sys.version.split()[0])
    print("📁 Working Dir:", system_info["directories"]["current_working_directory"])

    return system_info

# Run the tool
info = create_system_info_tool()

# Your task: Complete the implementation!

## Exercise 2: Log File Analyzer 📋

Create a program that analyzes log files and generates reports:

**Requirements:**
- Read log files with different severity levels
- Count occurrences of each log level
- Find the most common error messages
- Generate a summary report
- Save results in both JSON and CSV formats

**Sample log format:**
```
2025-08-19 10:30:15 | INFO     | User login successful: alice
2025-08-19 10:31:22 | WARNING  | Slow query detected: SELECT * FROM users
2025-08-19 10:32:45 | ERROR    | Database connection failed
```

## Exercise 3: Backup Utility 💾

Build a simple backup utility:

**Requirements:**
- Create compressed archives of specified directories
- Support both ZIP and TAR formats
- Include timestamp in backup filename
- Log backup operations
- Verify backup integrity after creation
- Clean up old backups (keep only last N backups)

## Exercise 4: Configuration Manager ⚙️

Create a configuration management system:

**Requirements:**
- Read configuration from INI files
- Provide default values for missing settings
- Validate configuration values
- Support environment variable overrides
- Save modified configurations back to file
- Support both development and production configs

## Challenge: Process Monitor 🔍

**Advanced Challenge:** Create a process monitoring tool that:

1. Lists running processes (use `subprocess` with system commands)
2. Monitors CPU and memory usage
3. Logs process information periodically
4. Sends alerts when processes exceed thresholds
5. Stores historical data in JSON format
6. Generates summary reports

**Bonus Features:**
- Web dashboard using simple HTTP server
- Email notifications for critical alerts
- Configuration via INI files
- Automatic log rotation

# 📚 Summary and Best Practices

## Key Takeaways

### 🏗️ **System Operations**
- **Always use try-except blocks** when working with file systems
- **Prefer `pathlib.Path`** over `os.path` for modern Python (3.4+)
- **Use context managers** (`with` statements) for file operations
- **Check for cross-platform compatibility** when using system commands

### 🔧 **Process Management**
- **Use `subprocess.run()`** for simple command execution
- **Always set timeouts** to prevent hanging processes
- **Capture both stdout and stderr** for proper error handling
- **Use `text=True`** to work with strings instead of bytes
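The points above can be combined in a single `subprocess.run()` call. In this minimal sketch the external program is the current Python interpreter (`sys.executable`), chosen only so that the example runs on any platform.

```python
import subprocess
import sys

# Run the current Python interpreter as the external program so the
# sketch works on any platform
command = [sys.executable, "-c", "print('hello from child')"]

try:
    result = subprocess.run(
        command,
        capture_output=True,  # collect both stdout and stderr
        text=True,            # decode output to str instead of bytes
        timeout=10,           # give up after 10 seconds
    )
except subprocess.TimeoutExpired:
    print("Command timed out")
else:
    # Always check the return code; 0 conventionally means success
    if result.returncode == 0:
        print("stdout:", result.stdout.strip())
    else:
        print("failed:", result.stderr.strip())
```

Passing the command as a list rather than a single string avoids going through the shell, which also sidesteps quoting problems with user-supplied arguments.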

### 📝 **Logging Best Practices**
- **Configure logging early** in your application
- **Use appropriate log levels**: DEBUG < INFO < WARNING < ERROR < CRITICAL
- **Include meaningful context** in log messages
- **Use structured logging** with consistent formatting
- **Consider log rotation** for production applications
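Log rotation is available in the standard library through `logging.handlers.RotatingFileHandler`. The sketch below uses deliberately tiny limits and a hypothetical logger name, `rotation_demo`, so the rotation is visible after only a few messages; a real application would use far larger values for `maxBytes`.

```python
import logging
import logging.handlers
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    log_path = Path(tmp_dir) / "app.log"

    # Start a new file once app.log would exceed 500 bytes and keep
    # at most 2 old files (app.log.1 and app.log.2)
    rotating_handler = logging.handlers.RotatingFileHandler(
        log_path, maxBytes=500, backupCount=2
    )
    rotating_handler.setFormatter(
        logging.Formatter("%(asctime)s | %(levelname)-8s | %(name)s | %(message)s")
    )

    # Hypothetical logger name, kept separate from other loggers
    demo_logger = logging.getLogger("rotation_demo")
    demo_logger.setLevel(logging.INFO)
    demo_logger.propagate = False
    demo_logger.addHandler(rotating_handler)

    for i in range(50):
        demo_logger.info("Processing item %d", i)

    # Detach and close the handler so the files can be removed
    demo_logger.removeHandler(rotating_handler)
    rotating_handler.close()

    log_files = sorted(p.name for p in Path(tmp_dir).iterdir())

print(log_files)  # ['app.log', 'app.log.1', 'app.log.2']
```

Older messages beyond the two backups are discarded, which bounds the disk space the logs can use.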

### 💾 **Data Serialization Guidelines**

| Format | Use When | Pros | Cons |
|--------|----------|------|------|
| **JSON** | Web APIs, config files, cross-language | Human readable, widely supported | Limited data types |
| **Pickle** | Python-only, complex objects | Preserves Python types, fast | Not human readable, Python-specific, unsafe to load from untrusted sources |
| **CSV** | Tabular data, Excel compatibility | Simple, widely supported | Limited structure |
| **INI** | Configuration files | Human readable, simple | Limited nesting |

### 🗜️ **Compression and Archives**
- **Use ZIP** for cross-platform compatibility
- **Use TAR.GZ** for better compression on Unix systems
- **Always verify** archive integrity after creation
- **Consider compression level** vs. speed trade-offs

## 🚨 Common Pitfalls to Avoid

1. **Not handling exceptions** when working with files/processes
2. **Forgetting to close resources** (use `with` statements!)
3. **Hard-coding file paths** (use `os.path.join()` or `pathlib`)
4. **Not validating user input** before passing to system commands
5. **Ignoring return codes** from subprocess operations
6. **Not setting timeouts** for external processes
7. **Using legacy functions** (`os.system()` instead of the recommended `subprocess`)
8. **Not considering security** when executing external commands

## 🎯 Production Readiness Checklist

- [ ] **Error handling**: All operations wrapped in try-except
- [ ] **Logging**: Comprehensive logging with appropriate levels
- [ ] **Configuration**: Externalized configuration files
- [ ] **Security**: Input validation and safe command execution
- [ ] **Performance**: Timeouts and resource management
- [ ] **Monitoring**: Process monitoring and health checks
- [ ] **Documentation**: Clear docstrings and comments
- [ ] **Testing**: Unit tests for critical functionality

## 🔗 Further Learning Resources

- **Official Documentation**: [Python Standard Library](https://docs.python.org/3/library/)
- **PEP 8**: Python Style Guide
- **Real Python**: Tutorials on system programming
- **Automate the Boring Stuff**: Practical Python programming
- **Effective Python**: Advanced best practices

---

## 🎉 Congratulations!

You've completed the Python Built-in Functions and Runtime Library guide! You now have the knowledge to:

- ✅ Interact with the operating system programmatically
- ✅ Manage files, directories, and processes
- ✅ Implement robust logging and error handling
- ✅ Work with various data formats and compression
- ✅ Build production-ready Python applications

**Next Steps:** Try the exercises above and start building your own system utilities!