# **Python Overview: Parallelism & Functional Programming**

*21.02.2026*

## **Programs and Processes**

When you run a program, the operating system creates a ***process***.

It uses system resources ***(CPU, memory, disk space)*** and data structures in the operating system’s kernel ***(file and network connections, usage statistics, and so on)***.

Each process is isolated from other processes: it cannot directly access their memory or interfere with their execution.

The ***operating system (OS)*** keeps track of all running processes and schedules them by giving each process a small slice of CPU time before switching to another. This process scheduling allows the system to remain responsive while sharing resources fairly among programs.

You can observe running processes using:

  * **Task Manager** on Windows,
  * **Activity Monitor** on macOS,
  * the **top** command on Linux.

You can also access process-related information from Python. The standard library’s **os module** provides a portable interface to many operating system features.

In [1]:
import os

print(f"The os name is: {os.name}") # nt - Windows New Technology
# print(f"\nThe current file's path is: {os.getcwd()}")
print(f"\nThe current process id is: {os.getpid()}")
# print(f"\nThe current login is: {os.getlogin()}")

The os name is: nt

The current process id is: 1140


In [52]:
print(f"Logical Processors: {os.cpu_count()}")

Logical Processors: 8


## **Parallelism in Python: Multithreading vs Multiprocessing**

### **Sequential vs. Parallel Computing**

- **Sequential computing** is the traditional programming approach in which instructions are executed **one after another on a single processor**. At any given time, only one instruction is executed and the overall speed of execution is limited by the capabilities of that single processor.
- **Parallel computing** uses **multiple processors working simultaneously on different parts of a task**. The problem is divided into independent sub-tasks that can be executed at the same time, reducing total execution time.

_____________________________________________________________________
**Advantages of Parallel Computing**
- Improved performance
- Ability to solve larger and more complex problems

_____________________________________________________________________
**Challenges of Parallel Computing**
- Increased programming complexity
- Task dependencies and waiting time

### **Concurrency**

**Concurrency** is a programming concept where multiple tasks make progress during the same time period. These tasks may execute out of order or partially overlap, as long as the final result remains correct.

Parallel computing always implies concurrency,
but concurrency does NOT necessarily imply parallel computing.

_____________________________________________________________________

**Process**

A ***process*** is an instance of a running program.
In Python, a process typically corresponds to a running instance of the **Python interpreter**.

Each process:

- Has its **own memory space**
- Is **isolated** from other processes
- Contains at least one thread, called the **main thread**

_____________________________________________________________________
**Thread**

A ***thread*** is a lightweight unit of execution within a process.
Threads represent how a program executes inside a single process.

Because threads belong to the same process:

- They **share memory**
- They can **access the same data and state**
- They **require synchronization** to avoid race conditions
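The following minimal sketch shows why shared memory needs synchronization. Several threads increment one shared counter, and a `threading.Lock` makes each read-modify-write step atomic. The names `counter`, `increment`, `NUM_THREADS` and `INCREMENTS` are illustrative.

```python
import threading

counter = 0                  # shared state, visible to every thread
lock = threading.Lock()      # guards access to counter
NUM_THREADS = 4
INCREMENTS = 10_000

def increment(times: int) -> None:
    global counter
    for _ in range(times):
        # counter += 1 is a read, an addition and a write; the lock ensures
        # no other thread can interleave between those steps and lose an update
        with lock:
            counter += 1

threads = [threading.Thread(target=increment, args=(INCREMENTS,)) for _ in range(NUM_THREADS)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until every thread has finished

print(counter)  # 40000
```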

#### **Concurrency in Python**

**Python supports concurrency using two main approaches:**

- **Multithreading**
    - Multiple threads within a single process
    - Shared memory
    - Limited by the Global Interpreter Lock (GIL)
    - *threading.Thread*

- **Multiprocessing**
    - Multiple independent processes
    - Separate memory spaces
    - True parallel execution on multiple CPU cores
    - *multiprocessing.Process*


In CPython:
- Multithreading is concurrent but not parallel (for CPU-bound work)
- Multiprocessing is both concurrent and parallel

_____________________________________________________________________

**Shared Memory vs Inter-Process Communication**

Since **threads** share the same memory space, they can easily share state. This model is known as **shared memory**.

**Processes**, however, **do not share memory**. To exchange data between processes, Python uses **inter-process communication (IPC)** mechanisms such as:

- multiprocessing.Queue
- multiprocessing.Pipe

These mechanisms require data to be **serialized and transmitted** between processes.
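The sketch below shows what "serialized and transmitted" means in practice, using `multiprocessing.Pipe`. For simplicity both ends stay in one process. In real code, one end would be passed to a child `Process`. The message contents are made up for illustration.

```python
import pickle
from multiprocessing import Pipe

# With duplex=False, the first connection can only receive and the second can only send
receiverEnd, senderEnd = Pipe(duplex=False)

message = {"task": "square", "values": [1, 2, 3]}
senderEnd.send(message)          # the object is pickled and written to the pipe
received = receiverEnd.recv()    # the bytes are read and unpickled into a new object

print(received == message)       # True: same content
print(received is message)       # False: a copy, not shared memory

# Objects that pickle cannot serialize, such as a lambda, cannot be sent
try:
    senderEnd.send(lambda x: x * 2)
    sendFailed = False
except (pickle.PicklingError, AttributeError, TypeError):
    sendFailed = True
print(sendFailed)                # True

senderEnd.close()
receiverEnd.close()
```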

_____________________________________________________________________

**The Global Interpreter Lock (GIL)**

The Global Interpreter Lock (GIL) is a mechanism in CPython that allows only one thread at a time to execute Python bytecode.

As a result:
- Multithreading does not provide true parallelism for CPU-bound tasks
- Threads are still useful for I/O-bound operations
- The GIL exists per process, not across processes

Because each process has its own interpreter and GIL, multiprocessing is not limited by the GIL.
_____________________________________________________________________

**Choosing Between Threads and Processes**

We choose based on the task type:

- I/O-bound tasks -> use threads
- CPU-bound tasks -> use processes
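As a sketch of the I/O-bound case, the code below starts five threads that each "wait" for half a second. `fake_download` is a hypothetical stand-in, and `time.sleep()` simulates waiting on a network. Because a sleeping thread releases the GIL, the waits overlap, so the total time stays close to a single wait rather than the sum of all five.

```python
import threading
import time

def fake_download(name: str, results: dict) -> None:
    """Hypothetical I/O-bound task: time.sleep() stands in for waiting on a network."""
    time.sleep(0.5)            # the GIL is released while the thread waits
    results[name] = "done"

results = {}
names = [f"file{i}" for i in range(5)]

start = time.perf_counter()
threads = [threading.Thread(target=fake_download, args=(n, results)) for n in names]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# expect roughly 0.5 s, not 5 x 0.5 = 2.5 s
print(f"{len(results)} tasks in {elapsed:.2f} s")
```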

### **The multiprocessing Library: Parallel Processes and a Speed-up Demo**

The multiprocessing library in Python allows a program to run multiple processes in parallel, each with its own memory space and process ID. Unlike multithreading, multiprocessing bypasses Python’s Global Interpreter Lock (GIL), making it especially effective for CPU-bound tasks.

Multiprocessing is commonly used to:

- Speed up CPU-intensive computations

- Run tasks truly in parallel on multiple CPU cores

- Isolate workloads for better stability and safety

You can run a Python function as a separate process, or even create multiple independent processes with the multiprocessing module.

**Creating a new process**

In [53]:
from multiprocessing import Process

process = Process()
print(f"Process info: {process}")

Process info: <Process name='Process-5' parent=16928 initial>


We created a new `Process` object. Its status is `initial`, which means the object exists in Python but no operating-system process has been started yet.

In [54]:
process.start()

print(f"Process info: {process}")

print(f"Check if the process is running: {process.is_alive()}")

Process info: <Process name='Process-5' pid=13188 parent=16928 started>
Check if the process is running: True


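Calling `start()` asks the OS to create the process and run it alongside the main program. No target function was passed to `Process()`, so the child runs the default `run()` method. That method does nothing, so the child ends on its own shortly after starting.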

In [55]:
process.terminate()
print(f"Check if the process is running: {process.is_alive()}")

Check if the process is running: True


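Finally, we explicitly stop the process with `terminate()`, which asks the operating system to end it. `terminate()` returns immediately without waiting for the process to finish. As a result, `is_alive()` can still report `True` right afterwards, as in the output above. Calling `process.join()` after `terminate()` waits until the process has actually exited.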

In this example, we demonstrate multiprocessing by running the same function in multiple independent processes.

In [56]:
import os
import multiprocessing as mp

def process_id(message):
    print(f"The current python process id is={os.getpid()} |\n{message}")

if __name__ == "__main__":
    process_id("This is the main program.")

    processes = []
    for n in range(3):
        p = mp.Process(target=process_id, args=(f"Hello from child #{n}",))
        p.start()
        processes.append(p)

    for p in processes:
        print(p)
        print(f"The process is running before join: {p.is_alive()}")
        p.join()
        print(f"The process is running after join: {p.is_alive()}")

    print("All child processes finished.")

The current python process id is=16928 |
This is the main program.
<Process name='Process-6' pid=8096 parent=16928 started>
The process is running before join: True
The process is running after join: False
<Process name='Process-7' pid=10388 parent=16928 started>
The process is running before join: True
The process is running after join: False
<Process name='Process-8' pid=9156 parent=16928 started>
The process is running before join: True
The process is running after join: False
All child processes finished.


When we call start(), the operating system creates a new process with its own memory space and its own process ID.

After starting the processes, we store them in a list so we can manage them later.

`join()` makes the main process wait until that child process has finished.
This is why `is_alive()` reports `True` before `join()` and `False` after it.

Finally, once all child processes are finished, the main process prints a completion message.

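The children's "Hello from child" messages do not appear in the notebook output. Each child is a separate process whose `print()` writes to the operating system's standard output (for example, the terminal that launched Jupyter). It does not write to the cell output that Jupyter captures for the kernel process. In addition, on Windows the children are started with the *spawn* method and must import `process_id()` from `__main__`, which does not work for functions defined interactively in a notebook. Running the same code as a script (`python script.py`) prints all four messages.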

In [57]:
# Benchmark: sequential execution vs multiprocessing.Pool on a CPU-bound task.
# Uncomment and run as a standalone script (e.g. python benchmark.py): Pool workers
# must import task() from __main__, which does not work for functions defined
# interactively in a notebook on Windows.

# import time
# import multiprocessing as mp


# def task(n: int) -> int:
#     """CPU-bound task: sum of the squares of the numbers 0..n-1."""
#     total = 0
#     for i in range(n):
#         total += i ** 2
#     return total


# def run_sequential(numTasks: int, n: int):
#     """Run tasks sequentially in a single process."""
#     start = time.perf_counter()
#     results = [task(n) for _ in range(numTasks)]
#     elapsed = time.perf_counter() - start
#     return elapsed, results


# def run_parallel_pool(numTasks: int, n: int, workers: int = 4):
#     """Run tasks in parallel using a multiprocessing Pool."""
#     start = time.perf_counter()  # also measures the cost of starting the workers
#     with mp.Pool(processes=workers) as pool:
#         results = pool.map(task, [n] * numTasks)
#     elapsed = time.perf_counter() - start
#     return elapsed, results


# if __name__ == "__main__":
#     numTasks = 8
#     nLarge = 4_000_000
#     nSmall = 2
#     runs = 5

#     for run in range(1, runs + 1):  # runs 1..5
#         print(f"\n--- Run {run} ---")

#         # Large task: the work outweighs the process start-up overhead
#         timeSeqLarge, resSeqLarge = run_sequential(numTasks, nLarge)
#         print(f"Sequential (large task): {timeSeqLarge: .5f} s")

#         timeParLarge, resParLarge = run_parallel_pool(numTasks, nLarge, workers=4)
#         print(f"Multiprocessing (large task): {timeParLarge: .5f} s")

#         # Very small task: start-up overhead dominates, so the Pool is typically slower
#         timeSeqSmall, resSeqSmall = run_sequential(numTasks, nSmall)
#         print(f"Sequential (small task): {timeSeqSmall: .5f} s")

#         timeParSmall, resParSmall = run_parallel_pool(numTasks, nSmall, workers=4)
#         print(f"Multiprocessing (small task): {timeParSmall: .5f} s")

## **Functional Programming Basics in Python**

**Functional programming** is a **programming paradigm** that emphasizes computing results by applying **functions** to data, rather than modifying program state step by step. Programs are built by combining small, well-defined functions that transform input values into output values.

A central concept in functional programming is the **pure function**. A pure function's result depends only on its input values, so the same inputs always produce the same output. It also has **no side effects**: it does not modify external variables, write to files or interact with databases. This predictability makes functional programs easier to reason about, test and parallelize.

Because of these properties, functional programming is widely used in:
   - ML, DL and AI
   - Large-scale data analysis and processing
   - Data engineering and streaming pipelines

Functional programming in Python often goes hand in hand with lazy evaluation. Tools like map and filter do not process data immediately; instead, they create lazy iterators that generate values only when needed. This allows data to flow through a chain of transformations efficiently, without creating unnecessary intermediate results.

**Functional programming describes what should happen to the data, and lazy evaluation ensures it happens only when necessary.**

### **a. Lazy evaluation (iterators)**

| Term | Definition |
|-----|-----------|
| **Lazy Evaluation** | An evaluation strategy where values are only computed when they are needed, also known as *call-by-need*. |
| **Iterable** | A Python object that can be looped over (iterated). Examples include lists, sets, tuples, dictionaries, strings, etc. |
| **Iterator** | An object that can be iterated upon and produces a sequence of values one at a time. Iterators maintain state and are exhausted after one pass. |
| **Iterator Protocol** | A set of rules requiring an object to implement `__iter__()` and `__next__()` to be considered an iterator in Python. |
| **`iter()`** | A built-in function that returns an iterator from an iterable object. |
| **`next()`** | A built-in function that returns the next item from an iterator, raising `StopIteration` when no items remain. |
| **Generator** | A function that contains `yield`; calling it returns a generator object, an iterator that produces values lazily, one at a time. |
| **`yield`** | A Python keyword used inside functions to produce a generator; it yields values one at a time instead of returning a single result. |
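To make the iterator protocol from the table concrete, here is a minimal sketch of a hand-written iterator class. `Countdown` is a hypothetical example. It implements `__iter__()` and `__next__()` and raises `StopIteration` when it runs out of values.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start: int):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration  # signals that the iterator is exhausted
        value = self.current
        self.current -= 1
        return value

countdown = Countdown(3)
print(list(countdown))  # [3, 2, 1]
print(list(countdown))  # []: exhausted after one pass
```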


#### **Iterators and Iterables**

In [58]:
it = iter([1, 2, 3])
# it = iter({'a': 1, 'b': 2, 'c': 3}.items())

In [59]:
type(it)

list_iterator

In [60]:
a = it.__next__()

In [61]:
print(f"a value is: {a}, iterator's next value is: {it.__next__()}")

a value is: 1, iterator's next value is: 2


In [62]:
dir(it)

['__class__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__length_hint__',
 '__lt__',
 '__ne__',
 '__new__',
 '__next__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__setstate__',
 '__sizeof__',
 '__str__',
 '__subclasshook__']

Not everything can be iterated: `iter()` only accepts iterables, so passing an integer raises a `TypeError`.

In [63]:
it = iter(5)

TypeError: 'int' object is not iterable
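A `for` loop is built on `iter()` and `next()`. It obtains an iterator, calls `next()` repeatedly and stops when `StopIteration` is raised. The loop below is a sketch of that behaviour written out by hand; the variable names are illustrative.

```python
numbers = [10, 20, 30]
collected = []

iterator = iter(numbers)      # what `for x in numbers` does first
while True:
    try:
        x = next(iterator)    # fetch the next value
    except StopIteration:     # no values left: the loop ends
        break
    collected.append(x)

print(collected)  # [10, 20, 30]
```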

#### **Range**

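**range** is a built-in Python type that represents an immutable sequence of integers.

**range** is an iterable that computes each number on demand instead of storing all of them in memory. It is not itself an iterator: it defines `__iter__` but not `__next__`, as the checks below show.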

In [64]:
r = range(3)

hasattr(r, "__iter__")

True

In [65]:
hasattr(r, "__next__")

False
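Because `range` is an iterable rather than an iterator, it can be looped over many times. Each `iter()` call creates a fresh iterator, whereas an iterator is used up after one pass. A short sketch:

```python
r = range(3)

# range is reusable: each iter() call creates a fresh iterator
print(list(r), list(r))       # [0, 1, 2] [0, 1, 2]

# an iterator is single-use
it = iter(r)
print(list(it), list(it))     # [0, 1, 2] []

# range also supports len(), indexing and fast membership tests
print(len(r), r[-1], 2 in r)  # 3 2 True
```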

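Testing the memory efficiency of `range` vs `list`. Note that `sys.getsizeof()` reports only the size of the container object itself (for a list, its internal array of references). It does not include the integer objects the list refers to, so the real difference is even larger.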

In [66]:
r = range(100_000)
l = list(range(100_000))

In [67]:
import sys

print(f"The memory usage of r in bytes: {sys.getsizeof(r)}. \nThe memory usage of l in bytes: {sys.getsizeof(l): ,}")

The memory usage of r in bytes: 48. 
The memory usage of l in bytes:  800,056


In [68]:
round(800_056 / 48)

16668

In [69]:
del r

In [70]:
del l

#### **Generator function**

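A **generator function** is a function that contains `yield`. Calling it does not run its body immediately and does not return a single value. Instead, it returns a **generator object**, an iterator that produces the values one at a time, on demand. A generator is both an iterable and an iterator, and it is not reusable: once exhausted, it must be created again.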

In [71]:
def squares(n):
    for i in range(n):
        yield i ** 2

In [72]:
g = squares(3)

In [73]:
type(g)

generator

In [74]:
g.__next__()

0
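The sketch below shows what "not reusable" means. After the last value has been produced, `next()` raises `StopIteration`, and iterating again requires calling the generator function again.

```python
def squares(n):
    for i in range(n):
        yield i ** 2

g = squares(3)
print(next(g), next(g), next(g))  # 0 1 4

try:
    next(g)                       # the generator is now exhausted
except StopIteration:
    print("StopIteration: create a new generator to iterate again")

print(list(squares(3)))           # [0, 1, 4] from a fresh generator
```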

In [75]:
genExpr = (x ** 2 for x in range(5_000_000))
listCompr = [x ** 2 for x in range(5_000_000)]

In [76]:
genExpr

<generator object <genexpr> at 0x00000225F2BFB440>

In [77]:
listCompr[:5]

[0, 1, 4, 9, 16]

In [78]:
genExpr.__next__()

0

In [79]:
type(genExpr), type(listCompr)

(generator, list)

In [80]:
import sys

print(f"The memory usage of genExpr in bytes: {sys.getsizeof(genExpr)}. \nThe memory usage of listCompr in bytes: {sys.getsizeof(listCompr): ,}")

The memory usage of genExpr in bytes: 200. 
The memory usage of listCompr in bytes:  43,947,864


In [81]:
round(43_947_864 / 200)

219739

In [82]:
del genExpr

In [83]:
del listCompr

### **b. Lambda functions**

A **lambda** is a small anonymous function whose body is a single expression. Its syntax is:

**lambda arguments: expression**

A regular function defined with `def`:

In [84]:
def average(values):
    averageValue = sum(values) / len(values)
    return averageValue

In [85]:
average(range(10))

4.5

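The same computation written as a lambda. Assigning a lambda to a name, as done here, works. However, PEP 8 recommends `def` for named functions; lambdas are most useful inline, as arguments to functions such as `map`, `filter` or `sorted`.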

In [86]:
averageLambda = lambda x: sum(x) / len(x)

In [87]:
averageLambda(range(10))

4.5
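A typical inline use is the `key` parameter of `sorted()`, which accepts any one-argument function. This is a small sketch; the student data is made up for illustration.

```python
students = [("Ana", 91), ("Ben", 78), ("Cara", 85)]  # illustrative data

# lambda used inline as the key function: sort by the score (index 1)
byScore = sorted(students, key=lambda s: s[1], reverse=True)
print(byScore)  # [('Ana', 91), ('Cara', 85), ('Ben', 78)]
```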

### **c. Functional transformations and aggregations using `map`, `filter`, `reduce` and chained operations**

#### **map()**

The built-in **map** function applies a function to every element of an iterable (a list, tuple, set, the keys of a dict, etc.). The function can be any callable, such as a built-in or a function defined with `def`. A lambda is convenient for short, one-off transformations.

map returns a **map object**, which is an iterator.

**map(function, iterable) -> map object**

In [88]:
power = lambda x, y: x ** y

In [89]:
power(2, 3)

8

In [90]:
map(lambda x: x ** 2, [1, 2, 3])

<map at 0x225f27a6a10>

In [91]:
a = map(lambda x: x ** 2, [1, 2, 3])

In [92]:
a.__next__()

1

In [93]:
squaredValues = list(map(lambda x: x ** 2, [1, 2, 3]))  # avoids reusing the name of the squares() generator

In [94]:
squaredValues

[1, 4, 9]
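The two-argument `power` lambda defined above can also be used with `map`. `map` accepts several iterables and passes one element from each to the function at every step. It stops when the shortest iterable is exhausted. A minimal sketch with illustrative inputs:

```python
power = lambda x, y: x ** y

bases = [2, 3, 4]
exponents = [3, 2, 1]

# each step calls power(bases[i], exponents[i])
results = list(map(power, bases, exponents))
print(results)  # [8, 9, 4]
```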

#### **filter()**

The **filter** function keeps only the elements of an iterable for which the given function returns a truthy value and discards the rest.

filter returns a **filter object**, which is an iterator.

**filter(function, iterable) -> filter object**

In [95]:
filter(lambda x: x % 2 == 0, [1, 2, 7, 8])

<filter at 0x225f27a47c0>

In [96]:
a = filter(lambda x: x % 2 == 0, [1, 2, 7, 8])

In [97]:
a.__next__()

2

In [98]:
list(filter(lambda x: x % 2 == 0, [1, 2, 7, 8]))

[2, 8]

**map** and **filter** are lazy because they **return iterators, not results**.

They describe a computation but do not perform it until values are requested.
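One way to observe this laziness is to record each time the mapped function actually runs. In the sketch below, `double` and `calls` are illustrative. `double` deliberately has a side effect (appending to `calls`) purely for instrumentation, so it is not a pure function.

```python
calls = []

def double(x):
    calls.append(x)  # record each time the function actually runs
    return x * 2

lazy = map(double, [1, 2, 3])
print(calls)         # []: nothing has been computed yet

print(next(lazy))    # 2
print(calls)         # [1]: only the first element was processed

print(list(lazy))    # [4, 6]
print(calls)         # [1, 2, 3]
```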

#### **reduce()**

The **reduce** function repeatedly applies a two-argument function to the elements of an iterable, accumulating them into a single value. **Unlike map() and filter(), reduce is not lazy: it consumes all elements immediately and returns the final result.**

**from functools import reduce**

**reduce(function, iterable, initial (optional))**

In [99]:
from functools import reduce

reduce(lambda x, y: x + y,  (1, 2, 3))

6

Step 1: x = 1, y = 2 -> 1 + 2 = 3

Step 2: x = 3, y = 3 -> 3 + 3 = 6

In [100]:
reduce(lambda x, y: x + y, [1, 2, 3], 10)

16

Step 1: x = 10, y = 1 -> 10 + 1 = 11

Step 2: x = 11, y = 2 -> 11 + 2 = 13

Step 3: x = 13, y = 3 -> 13 + 3 = 16
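The step-by-step traces above correspond to a simple loop. The following is a simplified sketch of how `reduce` behaves. `my_reduce` is a hypothetical helper for illustration, not the actual `functools` implementation.

```python
_MISSING = object()  # sentinel: distinguishes "no initial value" from initial=None

def my_reduce(function, iterable, initial=_MISSING):
    """Hypothetical, simplified re-implementation of functools.reduce for illustration."""
    iterator = iter(iterable)
    if initial is _MISSING:
        try:
            accumulator = next(iterator)  # first element is the starting x
        except StopIteration:
            raise TypeError("reduce() of empty iterable with no initial value") from None
    else:
        accumulator = initial             # the initial value is the starting x
    for element in iterator:              # each remaining element becomes y
        accumulator = function(accumulator, element)
    return accumulator

print(my_reduce(lambda x, y: x + y, (1, 2, 3)))      # 6
print(my_reduce(lambda x, y: x + y, [1, 2, 3], 10))  # 16
```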

In [101]:
reduce(lambda x, y: x if x > y else y, [5, 9, 4, 15], 10)

15

In [102]:
from functools import reduce

words = ["Data", "Science", "is", "fun"]

result = reduce(lambda x, y: x + " " + y, words)
print(result)

Data Science is fun
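For joining strings specifically, `str.join()` is the idiomatic choice. It produces the same result and builds the string in one pass, whereas repeated `+` inside `reduce` creates a new intermediate string at every step. A short comparison:

```python
from functools import reduce

words = ["Data", "Science", "is", "fun"]

joined = " ".join(words)  # separator.join(iterable of strings)
print(joined)             # Data Science is fun
print(joined == reduce(lambda x, y: x + " " + y, words))  # True
```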


#### **Chaining functions (functional pipeline)**

Each function does one tiny job and passes the result to the next one.

In [103]:
from functools import reduce

data = range(10)

result = reduce(
    lambda x, y: x + y,
    map(lambda x: x * 2,
        filter(lambda x: x % 2 == 0, data))
)
result

40

data -> filter (even numbers) -> map (double values) -> reduce (sum)

Nothing is computed until reduce consumes the iterator.

**Functional programming emphasizes**:

- Small pure functions

- No side effects

- Immutable data

- Transformations instead of mutations

Functional programming encourages building data transformation pipelines where each function performs a small, predictable transformation. These functions are chained together so that data flows through them step by step, producing a final result without modifying the original data.

**Pipeline thinking means**:

- Think in transformations

- Avoid intermediate state mutation

- Compose small reusable functions
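As a sketch of composing small reusable functions, the hypothetical helper `pipe` below uses `reduce` to feed a value through a sequence of functions, from left to right. The stage functions are illustrative. Together they rebuild the filter, map and reduce pipeline from the previous example.

```python
from functools import reduce

def pipe(*functions):
    """Hypothetical helper: pipe(f, g, h)(x) returns h(g(f(x))), applying functions left to right."""
    return lambda value: reduce(lambda acc, fn: fn(acc), functions, value)

def keep_evens(xs):
    return filter(lambda x: x % 2 == 0, xs)

def double_all(xs):
    return map(lambda x: x * 2, xs)

def total(xs):
    return reduce(lambda x, y: x + y, xs, 0)

evenDoubleSum = pipe(keep_evens, double_all, total)
print(evenDoubleSum(range(10)))  # 40
```

Each stage stays small and testable on its own, and new pipelines can be assembled by reordering or swapping stages.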

In [104]:
data = range(10)

evens = filter(lambda x: x % 2 == 0, data)
doubled = map(lambda x: x * 2, evens)
result = reduce(lambda x, y: x + y, doubled)

result

40

In [105]:
sum(x * 2 for x in range(10) if x % 2 == 0)

40

## **Appendix 1: External OS processes with subprocess (optional)**

Python can start and manage external programs using the standard library’s **subprocess** module.

The recommended way to invoke subprocesses is the **run()** function, which covers most use cases.

For more advanced use cases, such as interacting with a running process, the **Popen** interface can be used.

A simple way to run a subprocess is shown below:

In [2]:
import subprocess

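subprocess1 = subprocess.run(
    ["cmd", "/c", "cd"],  # Windows shell; `cd` with no arguments prints the current directory
    capture_output=True,  # capture standard output (stdout) and standard error (stderr)
    text=True,            # decode the output as text instead of bytes
)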

print(subprocess1.stdout)

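In this case, `subprocess.run()`:

*1. Starts a Windows shell (`cmd.exe`)*

*2. Executes the `cd` command, which prints the current working directory*

*3. Waits for it to finish (`run()` blocks until the child exits)*

*4. Lets the shell exit as soon as the command completes, then returns a `CompletedProcess` whose `stdout` holds the captured output*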
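The `cmd` example only works on Windows. A cross-platform sketch of the same idea runs the current Python interpreter (`sys.executable`) as the child process. `check=True` makes `run()` raise `CalledProcessError` if the child exits with a non-zero code. The printed message is illustrative.

```python
import subprocess
import sys

completed = subprocess.run(
    [sys.executable, "-c", "print('hello from a child process')"],
    capture_output=True,  # collect stdout and stderr
    text=True,            # decode bytes to str
    check=True,           # raise CalledProcessError on a non-zero exit code
)

print(completed.returncode)      # 0
print(completed.stdout.strip())  # hello from a child process
```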

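`run()` returns only after the child has exited, and the `CompletedProcess` it returns does not record the process ID. To access the **process ID (PID)** of a running subprocess, we use `Popen`, which starts the process and returns immediately: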

In [3]:
subprocess2 = subprocess.Popen(
    ["cmd", "/c", "cd"],
    stdout=subprocess.PIPE,  # redirect standard output so Python can capture it
    stderr=subprocess.PIPE,  # redirect standard error output
    text=True,
)

print("The new subprocess id is:", subprocess2.pid)

# Wait for the subprocess to finish execution and
# collect both standard output and standard error
stdout, stderr = subprocess2.communicate()

print(stdout)

This entire process completes extremely quickly, so it is difficult to observe it in Task Manager.

**Long-Running Subprocess Example**

In [108]:
subprocess3 = subprocess.Popen(
    ["cmd", "/c", "ping google.com -t"],
    stdout=subprocess.PIPE,
    stderr=subprocess.PIPE,
    text=True
)

print("The new subprocess id is:", subprocess3.pid)

The new subprocess id is: 9440


Here, the command *ping google.com -t* runs indefinitely due to the -t flag.
Because the command never completes, the shell process does not exit automatically, allowing us to observe it in Task Manager using its process ID.

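Note that `terminate()` ends only the process whose PID we hold, which here is the `cmd.exe` shell. Windows does not automatically end a process's children, so `ping.exe` may keep running. Starting `["ping", "google.com", "-t"]` directly, without the intermediate shell, avoids this. We can explicitly stop the subprocess from Python: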

In [109]:
subprocess3.terminate()
subprocess3.wait()  # wait until the process has actually exited
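A cross-platform sketch of the same start, check and stop cycle uses a child Python process that would otherwise sleep for 60 seconds. `poll()` returns `None` while the child is running. `wait()` blocks until it has exited and returns its exit code. On POSIX systems, a process stopped by `terminate()` reports `-15` (killed by SIGTERM); on Windows, the exit code is non-zero.

```python
import subprocess
import sys

# A child that would run for 60 seconds if left alone
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])

print(child.pid)                      # the child's process ID
print(child.poll())                   # None: still running

child.terminate()                     # ask the OS to stop it
returnCode = child.wait(timeout=10)   # block until it has actually exited
print(returnCode)                     # non-zero (-15 on POSIX)
```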

## **Appendix 2: Programming Paradigms (optional)**

**What is a paradigm?**

- A standard, a perspective or a set of ideas for operating
- A way of thinking or approaching a problem

**What is a programming paradigm?**

- **Program**: set of written instructions that tells a computer what to do
- **Programming paradigm**: style or a way of approaching writing programs
    - Different programming paradigms are used for different tasks
    - Examples:
        - **Procedural programming**
        - **Functional programming**
        - **Object-oriented programming**



**Imperative and declarative programming**

- **Imperative**
    - Exact instructions for the program to follow, step-by-step
    - How the program should work
    - Includes procedural and object-oriented programming
- **Declarative**
    - The logic of the program without details of the exact steps
    - What the program should accomplish
    - Includes functional programming
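To make the contrast concrete, the sketch below computes the same result in both styles: the sum of the squares of the even numbers in a small illustrative list. The imperative version spells out each step and mutates an accumulator. The declarative version describes the result.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Imperative: spell out each step and mutate an accumulator
totalImperative = 0
for n in numbers:
    if n % 2 == 0:
        totalImperative += n ** 2

# Declarative / functional: describe the result instead of the steps
totalDeclarative = sum(n ** 2 for n in numbers if n % 2 == 0)

print(totalImperative, totalDeclarative)  # 56 56
```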

Operating within a specific programming paradigm means having a standard approach to a common problem, which saves time and effort on the part of the programmer.

One of the common goals among all programming paradigms is to have code that is modular and easily reusable.

- Modular code: code that is broken up into reusable sections that can be re-run or reused in different contexts
- Avoid rewriting identical code
- Separation of responsibilities: certain sections of code have distinct responsibilities and don't duplicate logic
- Important from the very beginning, not just when the code becomes too large

***Each of the paradigms listed above handles the separation of responsibilities differently.***

**Procedural (procedures), Functional (functions) and Object-Oriented (objects) programming paradigms** are named after the units that they are broken into.

**Procedural Programming**

Procedural programming organizes programs as a sequence of step-by-step instructions and procedures. It is widely used in general-purpose programming and is well suited for tasks that can be clearly broken down into ordered steps, such as data ingestion, cleaning, transformation, and automation scripts.

- **Key characteristics**:

    - Focus on how the program operates step by step

    - Code organized into procedures and functions

    - Shared, globally accessible data is common

- **Advantages**:

    - Simple and intuitive, especially for beginners

    - Encourages modular code through functions

    - Effective for many practical, task-oriented problems

- **Limitations**:

    - Data is often globally visible, reducing encapsulation

    - Code reuse across projects is limited

    - Less suitable when data integrity and scalability are primary concerns

**Functional Programming**

Functional programming structures programs as compositions of functions that transform data. It emphasizes immutability, stateless computation, and declarative logic, making it well suited for parallel and scalable data processing.

- **Key characteristics**:

    - Functions are first-class objects

    - Emphasis on immutability and statelessness

    - Computation expressed as transformations of data

- **Advantages**:

    - Easier to reason about and test

    - Naturally supports parallelism and lazy evaluation

    - Highly suitable for large-scale data processing pipelines

- **Limitations**:

    - Can be less intuitive for beginners

    - Debugging chained transformations may be harder

    - Not always ideal for problems requiring complex state management

**Object-Oriented Programming (OOP)**

Object-oriented programming organizes code around objects that combine data and behavior. It is commonly used for large, complex systems where modeling real-world entities and maintaining long-term code structure is important.

- **Key characteristics**:

    - Code structured using classes and objects

    - Emphasis on encapsulation, inheritance, and polymorphism

    - Data and behavior are tightly coupled

- **Advantages**:

    - Promotes code reuse and extensibility

    - Improves data protection through encapsulation

    - Well suited for large, long-lived software systems

- **Limitations**:

    - More complex design and boilerplate

    - Can introduce unnecessary abstraction for simple tasks

    - Less natural for data-parallel transformations

## **Reference Reading and Resources**

1. Chapters 15-18 "Introducing Python, 2nd Edition", Bill Lubanovic, O'Reilly Media, Inc., November 2019
2. https://docs.python.org/3/library/multiprocessing.html
3. https://docs.python.org/3/library/threading.html#module-threading
4. https://www.datacamp.com/tutorial/python-multiprocessing-tutorial
5. https://docs.python.org/3/library/subprocess.html
6. https://www.geeksforgeeks.org/python/dunder-magic-methods-python/
7. https://www.datacamp.com/tutorial/python-iterators-generators-tutorial