### Using `concurrent.futures` for Parallel Processing in Python

#### 1. Introduction to `concurrent.futures`
- `concurrent.futures` provides a high-level interface for asynchronously executing callables.
- Two main classes:
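  - `ThreadPoolExecutor` for managing a pool of threads (best suited to I/O-bound work).
  - `ProcessPoolExecutor` for managing a pool of processes (best suited to CPU-bound work, since each process runs its own interpreter).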

#### 2. Basic Example of Parallel Processing
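- Using `ProcessPoolExecutor` to apply a function to each element in a list. `as_completed` yields futures as they finish, so `results` may not follow the order of `numbers`:

```python
import concurrent.futures

def process_element(element, multiplier):
    return element * multiplier

# The guard keeps worker processes from re-running this block when they
# import the module (required with the "spawn" start method, e.g. on Windows).
if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    multiplier = 3

    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = [executor.submit(process_element, n, multiplier) for n in numbers]
        results = [future.result() for future in concurrent.futures.as_completed(futures)]

    print(results)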
```
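Since `as_completed` yields futures in completion order, one common way to keep track of which result belongs to which input is a dictionary keyed by future. The following is a minimal sketch of that pattern; the helper name `run_keyed` is hypothetical, and it uses `ThreadPoolExecutor` only so that it runs unchanged in a notebook or script (the same calls exist on `ProcessPoolExecutor`).

```python
import concurrent.futures

def process_element(element, multiplier):
    return element * multiplier

def run_keyed(numbers, multiplier):
    """Return {input: result}, regardless of the order in which tasks finish."""
    with concurrent.futures.ThreadPoolExecutor() as executor:
        # Remember which input each future belongs to.
        future_to_input = {
            executor.submit(process_element, n, multiplier): n for n in numbers
        }
        results = {}
        # Iterating a dict yields its keys, i.e. the futures.
        for future in concurrent.futures.as_completed(future_to_input):
            results[future_to_input[future]] = future.result()
    return results

keyed = run_keyed([1, 2, 3, 4, 5], 3)
# Rebuild an ordered list from the keyed results.
print([keyed[n] for n in [1, 2, 3, 4, 5]])
```

The dictionary lookup restores the association that completion order discards, at the cost of requiring hashable, unique inputs.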

#### 3. Handling Multiple Arguments
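- `executor.map` accepts one iterable per function parameter, so pass `numbers` together with `itertools.repeat(multiplier)`. Avoid wrapping the call in a `lambda` with `ProcessPoolExecutor`: lambdas cannot be pickled and therefore cannot be sent to worker processes.

```python
import concurrent.futures
import itertools

def process_element(element, multiplier):
    return element * multiplier

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    multiplier = 3

    with concurrent.futures.ProcessPoolExecutor() as executor:
        # map stops at the shortest iterable, so the endless repeat() is safe.
        results = list(executor.map(process_element, numbers, itertools.repeat(multiplier)))

    print(results)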
```
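When one argument is the same for every call, `functools.partial` is another option: it binds the fixed argument up front so that `map` needs only a single iterable. This is a sketch under the assumption that the fixed argument can be passed by keyword; the helper name `triple_all` is hypothetical, and threads are used only to keep the example runnable anywhere. A `partial` of a module-level function is picklable, so the same approach should carry over to `ProcessPoolExecutor`.

```python
import concurrent.futures
import functools

def process_element(element, multiplier):
    return element * multiplier

def triple_all(numbers):
    # Bind multiplier=3 once; the resulting callable takes only `element`.
    times_three = functools.partial(process_element, multiplier=3)
    with concurrent.futures.ThreadPoolExecutor() as executor:
        return list(executor.map(times_three, numbers))

print(triple_all([1, 2, 3, 4]))
```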

#### 4. Debugging and Logging
- Adding logging to help debug and monitor the processing:

```python
import concurrent.futures
import logging
import itertools

logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)s:%(message)s')

def process_element(element, multiplier):
    logging.info(f'Processing element: {element}')
    return element * multiplier

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    multiplier = 3

    with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:
        # One iterable per parameter; a lambda would fail to pickle here.
        results = list(executor.map(process_element, numbers, itertools.repeat(multiplier)))

    print(results)
```

#### 5. Handling Exceptions
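- An exception raised inside the function does not break the pool: it is sent back to the parent and re-raised when you call `future.result()` or iterate over the results of `map`. `BrokenProcessPool` is raised when a worker process terminates abruptly (for example, it is killed by the operating system or crashes in native code). Logging inside the function before re-raising still helps record which input failed:

```python
import logging

def process_element(element, multiplier):
    try:
        logging.info(f'Processing element: {element}')
        return element * multiplier
    except Exception as e:
        # Record the failing input, then let the exception reach the caller.
        logging.error(f'Error processing element {element}: {e}')
        raise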
```
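On the caller's side, an exception raised in a task is stored on its `Future` and re-raised by `future.result()`, so each task can be handled separately. The sketch below illustrates this with hypothetical names (`risky_divide`, `collect`) and uses threads to stay runnable in any environment.

```python
import concurrent.futures

def risky_divide(numerator, denominator):
    return numerator / denominator

def collect(pairs):
    """Run every pair and separate successful results from failures."""
    successes, failures = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        future_to_pair = {
            executor.submit(risky_divide, a, b): (a, b) for a, b in pairs
        }
        for future in concurrent.futures.as_completed(future_to_pair):
            pair = future_to_pair[future]
            try:
                successes[pair] = future.result()  # re-raises the task's exception
            except ZeroDivisionError as exc:
                failures[pair] = exc
    return successes, failures

successes, failures = collect([(6, 3), (1, 0), (9, 3)])
print(successes)
print(failures)
```

One failing task does not affect the others here; the remaining futures still complete and report their results.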

#### 6. Limiting the Number of Workers
- Use the `max_workers` parameter to cap the number of threads or processes. When it is omitted, `ProcessPoolExecutor` defaults to the number of CPUs on the machine:

```python
with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
    results = list(executor.map(process_element, numbers, itertools.repeat(multiplier)))
```
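To see the cap in action, one can count how many tasks are running at the same moment. The following sketch does this with a lock-protected counter; `measure_peak_concurrency` is a hypothetical helper, and it uses threads because a shared counter is simplest to maintain within one process.

```python
import concurrent.futures
import threading
import time

def measure_peak_concurrency(max_workers, task_count=8):
    """Return the largest number of tasks observed running simultaneously."""
    lock = threading.Lock()
    running = 0
    peak = 0

    def task(_):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        time.sleep(0.01)  # simulate a short piece of work
        with lock:
            running -= 1

    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as executor:
        list(executor.map(task, range(task_count)))
    return peak

print(measure_peak_concurrency(2))  # never exceeds max_workers
```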

#### 7. Comparison with `ThreadPoolExecutor`
- `ThreadPoolExecutor` exposes the same interface, so switching usually means changing only the class name. Threads share a single interpreter, so they help most with I/O-bound tasks:

```python
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    results = list(executor.map(process_element, numbers, itertools.repeat(multiplier)))
```
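Because both executors share the same interface, the executor class itself can be passed in as a parameter, which makes it easy to try the same workload with threads and with processes. This is a minimal sketch; `run_with` is a hypothetical helper, not part of the library.

```python
import concurrent.futures
import itertools

def process_element(element, multiplier):
    return element * multiplier

def run_with(executor_cls, numbers, multiplier, max_workers=2):
    """Run the same workload on whichever executor class is supplied."""
    with executor_cls(max_workers=max_workers) as executor:
        return list(executor.map(process_element, numbers, itertools.repeat(multiplier)))

thread_results = run_with(concurrent.futures.ThreadPoolExecutor, [1, 2, 3], 3)
print(thread_results)
```

Calling `run_with(concurrent.futures.ProcessPoolExecutor, ...)` should behave the same way, provided the call sits under an `if __name__ == "__main__":` guard and the function is defined at module level.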

### Key Takeaways
- `concurrent.futures` is a standard-library module for running callables in parallel using threads or processes.
- Passing arguments correctly (one iterable per parameter, no lambdas with processes) and adding logging make parallel tasks easier to debug.
- Exceptions raised in a task surface when its result is retrieved; `BrokenProcessPool` indicates that a worker process died abruptly.
- The `max_workers` parameter controls the level of parallelism.
- Testing with `ThreadPoolExecutor` can help identify if issues are specific to process-based parallelism.


In `concurrent.futures`, both `map` and `submit` schedule tasks on a thread or process pool, but they differ in what they return and in how much control they give you over individual tasks:

### `submit`
- **Usage**: Submits a single callable (function) for execution.
- **Returns**: A `Future` object representing the execution of the callable.
- **Advantages**:
  - Fine-grained control: Allows you to manage each task individually.
  - Each submission can use a different callable.
  - Each submission can pass different positional and keyword arguments.
  - Useful when you need to handle the results or exceptions of individual tasks.

**Example**:

```python
import concurrent.futures

def process_element(element, multiplier):
    return element * multiplier

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
multiplier = 3

with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [executor.submit(process_element, n, multiplier) for n in numbers]
    results = [future.result() for future in concurrent.futures.as_completed(futures)]

print(results)
```
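The fine-grained control mentioned above includes cancelling a task that has not started yet. The sketch below queues a second task behind a blocked one on a single-worker pool and cancels it while it is still pending; all function names are hypothetical.

```python
import concurrent.futures
import threading

def cancel_queued_task():
    release = threading.Event()

    def blocker():
        release.wait()  # holds the only worker until released
        return "first done"

    def never_needed():
        return "second done"

    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        first = executor.submit(blocker)
        second = executor.submit(never_needed)  # queued behind `first`
        was_cancelled = second.cancel()         # succeeds only while still pending
        release.set()                           # let the first task finish
        first_result = first.result()
    return first_result, was_cancelled, second.cancelled()

print(cancel_queued_task())
```

`cancel()` returns `False` for a task that is already running or finished, so it is a request rather than a guarantee.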

### `map`
- **Usage**: Submits a callable to be executed for each item in an iterable.
- **Returns**: An iterator over the results in input order. The calls run concurrently and may finish out of order, but the iterator always yields them in the order of the inputs.
- **Advantages**:
  - Simpler syntax for applying a function to an iterable.
  - Automatically collects results in the order they were submitted.
  - More concise when the function and arguments fit the pattern.

**Example**:

```python
import concurrent.futures

def process_element(element, multiplier):
    return element * multiplier

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
multiplier = 3

def process_element_wrapper(args):
    return process_element(*args)

with concurrent.futures.ThreadPoolExecutor() as executor:
    results = list(executor.map(process_element_wrapper, zip(numbers, [multiplier]*len(numbers))))

print(results)
```
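To make the ordering guarantee concrete, the following sketch gives the first input a deliberate delay so that it finishes last, then compares the order of results with the order of completion. The names `slow_square` and `completion_order` are hypothetical, and the delays are arbitrary.

```python
import concurrent.futures
import threading
import time

completion_order = []
lock = threading.Lock()

def slow_square(item):
    delay, value = item
    time.sleep(delay)
    with lock:
        completion_order.append(value)  # record when each task actually finishes
    return value * value

items = [(0.2, 1), (0.0, 2), (0.0, 3)]
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as executor:
    squares = list(executor.map(slow_square, items))

print(squares)           # results follow the input order
print(completion_order)  # the delayed task (1) usually appears last
```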

### Key Differences
1. **Granularity**:
   - `submit` provides fine-grained control over each task.
   - `map` is higher-level and simpler when you need to apply a function to a sequence of inputs.

2. **Return Values**:
   - `submit` returns a `Future` object immediately.
   - `map` returns an iterator immediately; iterating it yields results in input order and blocks until the next result is ready.

3. **Order**:
   - `submit` with `as_completed` can process results as they complete, potentially out of order.
   - `map` returns results in the order the tasks were submitted.

4. **Complexity**:
   - `submit` is more flexible and can handle varying arguments and more complex scenarios.
   - `map` is easier to use for straightforward applications of a function to a list of inputs.

### When to Use Each
- Use **`submit`** when:
  - You need fine-grained control over individual tasks.
  - Tasks call different functions or take differently shaped arguments.
  - You need to handle exceptions or results individually.

- Use **`map`** when:
  - You have a single function to apply to an iterable.
  - You want results in the order they were submitted.
  - The syntax is simple and concise for your use case.
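One practical consequence for error handling: with `map`, an exception is raised at the point where the failing result would be yielded, and iteration ends there. The sketch below shows what the caller receives in that case; `reciprocal` and `consume_map` are hypothetical names.

```python
import concurrent.futures

def reciprocal(x):
    return 1 / x

def consume_map(values):
    """Collect results from map until the first task exception is raised."""
    received = []
    error = None
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        results = executor.map(reciprocal, values)
        try:
            for r in results:
                received.append(r)
        except ZeroDivisionError as exc:
            error = exc  # raised when the failing item's turn comes up
    return received, error

received, error = consume_map([1, 2, 0, 4])
print(received)
print(repr(error))
```

Results after the failing item are not delivered through this iterator, which is why `submit` is the better fit when every task's outcome must be inspected.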

**Example** (`submit` with `ThreadPoolExecutor`, results kept in input order):

```python
# Multithreading with submit

import concurrent.futures
import logging

logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)s:%(message)s')

lst_to_process = [1, 2, 3, 4, 5, 6, 7, 8]
multiplier = 5

def fun(x, multiplier):
    logging.info(f"processing element {x}")
    return x * multiplier

# max_workers caps how many threads run at the same time
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as tp_executor:
    futures = [tp_executor.submit(fun, n, multiplier) for n in lst_to_process]

    # result = [future.result() for future in concurrent.futures.as_completed(futures)]  # when order does not matter
    result = [future.result() for future in futures]  # when results must follow input order

print(result)
```

Output:

```
2024-07-21 16:56:46,849 INFO:processing element 1
2024-07-21 16:56:46,855 INFO:processing element 2
2024-07-21 16:56:46,856 INFO:processing element 3
2024-07-21 16:56:46,862 INFO:processing element 4
2024-07-21 16:56:46,866 INFO:processing element 5
2024-07-21 16:56:46,868 INFO:processing element 6
2024-07-21 16:56:46,873 INFO:processing element 8
2024-07-21 16:56:46,871 INFO:processing element 7

[5, 10, 15, 20, 25, 30, 35, 40]
```


**Example** (`map` with `ThreadPoolExecutor`):

```python
# Multithreading with map

import concurrent.futures
import itertools
import logging

logging.basicConfig(level=logging.INFO, format='%(asctime)s %(levelname)s:%(message)s')

lst_to_process = [1, 2, 3, 4, 5, 6, 7, 8]
multiplier = 5

def fun(x, multiplier):
    logging.info(f"processing element {x}")
    return x * multiplier

list_of_args = zip(lst_to_process, itertools.repeat(multiplier))

# max_workers caps how many threads run at the same time
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as tp_executor:
    # A lambda works with threads; ProcessPoolExecutor would need a picklable function
    results = list(tp_executor.map(lambda arg: fun(*arg), list_of_args))

print(results)
```

Output:

```
2024-07-21 17:21:19,243 INFO:processing element 1
2024-07-21 17:21:19,249 INFO:processing element 2
2024-07-21 17:21:19,256 INFO:processing element 4
2024-07-21 17:21:19,254 INFO:processing element 3
2024-07-21 17:21:19,259 INFO:processing element 5
2024-07-21 17:21:19,262 INFO:processing element 6
2024-07-21 17:21:19,264 INFO:processing element 7
2024-07-21 17:21:19,266 INFO:processing element 8

[5, 10, 15, 20, 25, 30, 35, 40]
```
