**Advanced Python**

- [x] Collections
- [x] Itertools 
- [ ] Lambda 
- [ ] Exceptions And Errors
- [x] Logging
- [ ] JSON 
- [x] Threading vs Multiprocessing
- [x] Multithreading 
- [x] Multiprocessing 
- [x] Shallow vs Deep Copying
- [x] Context Managers

[Advanced Python](https://www.python-engineer.com/courses/advancedpython/06-collections/)

## collections

The collections module in Python implements specialized container datatypes providing alternatives to Python’s general purpose built-in containers, dict, list, set, and tuple.

The following tools exist:

- `namedtuple` : factory function for creating tuple subclasses with named fields 
- `OrderedDict` : dict subclass that remembers the order entries were added 
- `Counter` : dict subclass for counting hashable objects 
- `defaultdict` : dict subclass that calls a factory function to supply missing values 
- `deque` : list-like container with fast appends and pops on either end

[collections doc](https://docs.python.org/3/library/collections.html)

|     Object     | Description                                                          |
|:--------------:|:----------------------------------------------------------------------|
| `namedtuple()` | factory function for creating tuple subclasses with named fields     |
| `deque`        | list-like container with fast appends and pops on either end         |
| `ChainMap`     | dict-like class for creating a single view of multiple mappings      |
| `Counter`      | dict subclass for counting hashable objects                          |
| `OrderedDict`  | dict subclass that remembers the order entries were added            |
| `defaultdict`  | dict subclass that calls a factory function to supply missing values |
| `UserDict`     | wrapper around dictionary objects for easier dict subclassing        |
| `UserList`     | wrapper around list objects for easier list subclassing              |
| `UserString`   | wrapper around string objects for easier string subclassing          |
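A few of the containers in the table can be sketched briefly. This is a minimal illustration only; the names `Point`, `groups`, `dq`, and `settings`, and the sample data, are assumptions made up for the example.

```python
from collections import ChainMap, defaultdict, deque, namedtuple

# namedtuple: a tuple subclass whose fields can be read by name
Point = namedtuple("Point", ["x", "y"])
pt = Point(x=1, y=-4)
print(pt.x, pt.y)  # 1 -4

# defaultdict: a missing key is created by calling the factory (here, list)
groups = defaultdict(list)
for word in ["apple", "avocado", "banana"]:
    groups[word[0]].append(word)
print(dict(groups))  # {'a': ['apple', 'avocado'], 'b': ['banana']}

# deque: fast appends and pops on either end
dq = deque([1, 2, 3])
dq.appendleft(0)
dq.append(4)
print(list(dq))  # [0, 1, 2, 3, 4]

# ChainMap: a single view over several mappings; earlier mappings win
defaults = {"color": "red", "user": "guest"}
overrides = {"user": "admin"}
settings = ChainMap(overrides, defaults)
print(settings["user"], settings["color"])  # admin red
```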

In [2]:
from collections import Counter
a = "aaaaabbbbcccdde"
my_counter = Counter(a)
print(my_counter)

print(my_counter.items())
print(my_counter.keys())
print(my_counter.values())

my_list = [0, 1, 0, 1, 2, 1, 1, 3, 2, 3, 2, 4]
my_counter = Counter(my_list)
print(my_counter)

# most common items
print(my_counter.most_common(1))

# Return an iterator over elements, repeating each as many times as its count.
# Elements are returned in the order first encountered (Python 3.7+).
print(list(my_counter.elements()))

Counter({'a': 5, 'b': 4, 'c': 3, 'd': 2, 'e': 1})
dict_items([('a', 5), ('b', 4), ('c', 3), ('d', 2), ('e', 1)])
dict_keys(['a', 'b', 'c', 'd', 'e'])
dict_values([5, 4, 3, 2, 1])
Counter({1: 4, 2: 3, 0: 2, 3: 2, 4: 1})
[(1, 4)]
[0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4]


---

## Itertools

The Python itertools module is a collection of tools for handling iterators. Simply put, iterators are objects that produce their items one at a time and can be used in a for loop.

**Source**

- [All possible itertools](https://docs.python.org/3/library/itertools.html)

### Infinite iterators:

| Iterator | Arguments     | Results                                        | Example                               |
|:----------|:---------------|:------------------------------------------------|:---------------------------------------|
| `count()`  | start, [step] | start, start+step, start+2*step, …             | `count(10)` --> 10 11 12 13 14 ...      |
| `cycle()`  | p             | p0, p1, … plast, p0, p1, …                     | `cycle('ABCD')` --> A B C D A B C D ... |
| `repeat()` | elem [,n]     | elem, elem, elem, … endlessly or up to n times | `repeat(10, 3)` --> 10 10 10            |

### Iterators terminating on the shortest input sequence:

| Iterator              | Arguments                   | Results                                    | Example                                                  |
|:-----------------------|:-----------------------------|:--------------------------------------------|:----------------------------------------------------------|
| `accumulate()`          | p [,func]                   | p0, p0+p1, p0+p1+p2, …                     | accumulate([1,2,3,4,5]) --> 1 3 6 10 15                  |
| `chain()`               | p, q, …                     | p0, p1, … plast, q0, q1, …                 | chain('ABC', 'DEF') --> A B C D E F                      |
| `chain.from_iterable()` | iterable                    | p0, p1, … plast, q0, q1, …                 | chain.from_iterable(['ABC', 'DEF']) --> A B C D E F      |
| `compress()`            | data, selectors             | (d[0] if s[0]), (d[1] if s[1]), …          | compress('ABCDEF', [1,0,1,0,1,1]) --> A C E F            |
| `dropwhile()`           | pred, seq                   | seq[n], seq[n+1], starting when pred fails | dropwhile(lambda x: x<5, [1,4,6,4,1]) --> 6 4 1          |
| `filterfalse()`         | pred, seq                   | elements of seq where pred(elem) is false  | filterfalse(lambda x: x%2, range(10)) --> 0 2 4 6 8      |
| `groupby()`             | iterable[, key]             | sub-iterators grouped by value of key(v)   |                                                          |
| `islice()`              | seq, [start,] stop [, step] | elements from seq[start:stop:step]         | islice('ABCDEFG', 2, None) --> C D E F G                 |
| `pairwise()`            | iterable                    | (p[0], p[1]), (p[1], p[2])                 | pairwise('ABCDEFG') --> AB BC CD DE EF FG                |
| `starmap()`             | func, seq                   | func(*seq[0]), func(*seq[1]), …            | starmap(pow, [(2,5), (3,2), (10,3)]) --> 32 9 1000       |
| `takewhile()`           | pred, seq                   | seq[0], seq[1], until pred fails           | takewhile(lambda x: x<5, [1,4,6,4,1]) --> 1 4            |
| `tee()`                 | it, n                       | it1, it2, … itn splits one iterator into n |                                                          |
| `zip_longest()`         | p, q, …                     | (p[0], q[0]), (p[1], q[1]), …              | zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D- |
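The tables above give no example for `groupby()` or `tee()`, so here is a small sketch of both, together with `count()` and `accumulate()`. The input data is invented for illustration. Note that `groupby()` only groups consecutive items, which is why the input is sorted by the same key first.

```python
from itertools import accumulate, count, groupby, islice, tee

# count() is infinite, so islice() is used to take a finite prefix
first_evens = list(islice(count(0, 2), 5))
print(first_evens)  # [0, 2, 4, 6, 8]

# accumulate() with a custom function: running maximum
running_max = list(accumulate([3, 1, 4, 1, 5], max))
print(running_max)  # [3, 3, 4, 4, 5]

# groupby() groups consecutive items with equal keys, so sort by the key first
words = sorted(["apple", "bob", "cat", "avocado", "banana"], key=len)
by_length = {k: list(g) for k, g in groupby(words, key=len)}
print(by_length)  # {3: ['bob', 'cat'], 5: ['apple'], 6: ['banana'], 7: ['avocado']}

# tee() splits one iterator into n independent iterators
it1, it2 = tee(iter([1, 2, 3]), 2)
print(list(it1), list(it2))  # [1, 2, 3] [1, 2, 3]
```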

### Combinatoric iterators:

| Iterator                          | Arguments          | Results                                                       |
|:----------------------------------|:-------------------|:--------------------------------------------------------------|
| `product()`                       | p, q, … [repeat=1] | cartesian product, equivalent to a nested for-loop            |
| `permutations()`                  | p[, r]             | r-length tuples, all possible orderings, no repeated elements |
| `combinations()`                  | p, r               | r-length tuples, in sorted order, no repeated elements        |
| `combinations_with_replacement()` | p, r               | r-length tuples, in sorted order, with repeated elements      |

| Examples                                 | Results                                         |
|:------------------------------------------|:-------------------------------------------------|
| `product('ABCD', repeat=2)`                | AA AB AC AD BA BB BC BD CA CB CC CD DA DB DC DD |
| `permutations('ABCD', 2)`                  | AB AC AD BA BC BD CA CB CD DA DB DC             |
| `combinations('ABCD', 2)`                  | AB AC AD BC BD CD                               |
| `combinations_with_replacement('ABCD', 2)` | AA AB AC AD BB BC BD CC CD DD                   |

In [1]:
from itertools import product

prod = product([1, 2], [3, 4])
print(list(prod)) # note that we convert the iterator to a list for printing

# to allow the product of an iterable with itself, specify the number of repetitions 
prod = product([1, 2], [3], repeat=2)
print(list(prod)) # note that we convert the iterator to a list for printing

[(1, 3), (1, 4), (2, 3), (2, 4)]
[(1, 3, 1, 3), (1, 3, 2, 3), (2, 3, 1, 3), (2, 3, 2, 3)]


---

## Lambda

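A lambda is a small anonymous function written as a single expression:

`lambda arguments: expression`

Lambdas are often passed to other functions, such as the built-ins `map()` and `filter()`, and `functools.reduce()` (in Python 3, `reduce()` is no longer a built-in).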
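A minimal sketch of typical lambda usage, with made-up sample data. In Python 3, `reduce` has to be imported from `functools`.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]

squares = list(map(lambda x: x * x, numbers))        # [1, 4, 9, 16, 25]
evens = list(filter(lambda x: x % 2 == 0, numbers))  # [2, 4]
total = reduce(lambda acc, x: acc + x, numbers)      # 15

# A lambda is also a common way to supply a sort key
pairs = [("b", 2), ("a", 3), ("c", 1)]
by_count = sorted(pairs, key=lambda pair: pair[1])   # [('c', 1), ('b', 2), ('a', 3)]

print(squares, evens, total, by_count)
```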

## Exceptions And Errors

## Logging

- Logging is a means of tracking events that happen when some software runs. 
- Logging is important for developing, debugging, and running software.
- If you don’t have any logging records and your program crashes, there is little chance of finding the cause of the problem.

**“Why not just use printing?”**
- When you run an algorithm and want to confirm it is doing what you expected, it is natural to add some `print()` statements at strategic locations to show the program’s state.
- Printing can help debug simpler scripts, but as your code gets more and more complex, printing lacks the flexibility and robustness that logging has.

- With logging, you can pinpoint where a logging call came from, differentiate severity between messages, and write information to a file, which printing cannot do. 
- **For example**, we can turn on and off the message from a particular module of a larger program. We can also increase or decrease the verbosity of the logging messages without changing a lot of code.
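The per-module control described above can be sketched as follows. The logger names `demo_app.db` and `demo_app.api` are hypothetical, and the output is captured in a `StringIO` buffer only so that the example is self-contained.

```python
import io
import logging

# Capture output in memory instead of writing to stderr (for illustration only)
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(name)s:%(levelname)s:%(message)s"))

# Hypothetical module-level loggers of a larger program
db_logger = logging.getLogger("demo_app.db")
api_logger = logging.getLogger("demo_app.api")
for lg in (db_logger, api_logger):
    lg.addHandler(handler)
    lg.propagate = False  # keep the root logger out of this example

db_logger.setLevel(logging.DEBUG)   # verbose for the db module
api_logger.setLevel(logging.ERROR)  # quiet for the api module

db_logger.debug("connection opened")
api_logger.debug("request received")  # suppressed: below ERROR
api_logger.error("request failed")

captured = stream.getvalue()
print(captured)
```

Changing the verbosity of one module is then a one-line `setLevel()` change, with no edits to the logging calls themselves.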

There are 5 different logging levels that indicate the severity of the logs, shown in increasing severity:

- DEBUG
- INFO
- WARNING
- ERROR
- CRITICAL

1. `debug` : Detailed information, typically of interest only when diagnosing problems.
2. `info` : Confirmation that things are working as expected.
3. `warning` : An indication that something unexpected happened, or that some problem may occur in the near future.
4. `error` : Due to a more serious problem, the software has not been able to perform some function.
5. `critical` : A serious error, indicating that the program itself may be unable to continue running.

A very simple example of logging is shown below, using the default logger or the root logger:

In [3]:
import logging
logging.debug('This is a debug message')
logging.info('This is an info message')
logging.warning('This is a warning message')
logging.error('This is an error message')
logging.critical('This is a critical message')

ERROR:root:This is an error message
CRITICAL:root:This is a critical message


These calls emit log messages of different severity. Although there are five logging calls, you will typically see only three lines of output when you run this as a script:
```
WARNING:root:This is a warning message
ERROR:root:This is an error message
CRITICAL:root:This is a critical message
```

This is because the root logger, by default, only prints the log messages of a severity level of WARNING or above. However, using the root logger this way is not much different from using the `print()` function.

In [18]:
import logging

logging.basicConfig(filename = 'file1.log',
                    level = logging.DEBUG,
                    format = '%(asctime)s:%(levelname)s:%(name)s:%(message)s',
                    force=True)

logging.debug('Debug message')
logging.info('Info message')
logging.warning('Warning message')
logging.error('Error message')
logging.critical('Critical message')

In [19]:
import logging
logging.basicConfig(filename='example.log', encoding='utf-8', level=logging.DEBUG,force=True)
logging.debug('This message should go to the log file')
logging.info('So should this')
logging.warning('And this, too')
logging.error('And non-ASCII stuff, too, like Øresund and Malmö')

In [20]:
import logging

logger = logging.getLogger(__name__)

# Create handlers
stream_handler = logging.StreamHandler()
file_handler = logging.FileHandler('file.log')

# Configure level and formatter and add it to handlers
stream_handler.setLevel(logging.WARNING) # warning and above is logged to the stream
file_handler.setLevel(logging.ERROR) # error and above is logged to a file

stream_format = logging.Formatter('%(name)s - %(levelname)s - %(message)s')
file_format = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
stream_handler.setFormatter(stream_format)
file_handler.setFormatter(file_format)

# Add handlers to the logger
logger.addHandler(stream_handler)
logger.addHandler(file_handler)

logger.warning('This is a warning') # logged to the stream
logger.error('This is an error') # logged to the stream AND the file!

__main__ - ERROR - This is an error
__main__ - ERROR - This is an error


## JSON

In [21]:
import json

person = {"name": "John", "age": 30, "city": "New York", "hasChildren": False, "titles": ["engineer", "programmer"]}

# convert into JSON:
person_json = json.dumps(person)
# use different formatting style
person_json2 = json.dumps(person, indent=4, separators=("; ", "= "), sort_keys=True)

# the result is a JSON string:
print(person_json) 
print(person_json2) 

{"name": "John", "age": 30, "city": "New York", "hasChildren": false, "titles": ["engineer", "programmer"]}
{
    "age"= 30; 
    "city"= "New York"; 
    "hasChildren"= false; 
    "name"= "John"; 
    "titles"= [
        "engineer"; 
        "programmer"
    ]
}
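The reverse direction, decoding, is done with `json.loads()`. The sketch below uses a shortened version of the `person` dictionary and shows two things worth knowing: tuples come back as lists, and the custom separators used above produce text that `json.loads()` can no longer parse.

```python
import json

person = {"name": "John", "age": 30, "hasChildren": False, "titles": ["engineer", "programmer"]}

# Round trip: dict -> JSON string -> dict
decoded = json.loads(json.dumps(person))
print(decoded == person)  # True

# Tuples have no JSON counterpart: they are written as arrays and read back as lists
round_tripped = json.loads(json.dumps({"point": (1, 2)}))
print(round_tripped)  # {'point': [1, 2]}

# Non-standard separators produce text that is no longer valid JSON
custom = json.dumps(person, separators=("; ", "= "))
try:
    json.loads(custom)
    custom_is_valid = True
except json.JSONDecodeError:
    custom_is_valid = False
print(custom_is_valid)  # False
```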


## Threading vs Multiprocessing

- The threading module uses threads, the multiprocessing module uses processes. 
- The difference is that threads run in the same memory space, while processes have separate memory. 
- This makes it a bit harder to share objects between processes with multiprocessing. Since threads use the same memory, precautions have to be taken or two threads will write to the same memory at the same time. 
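For threads, the precaution mentioned above usually means a lock. This is a minimal sketch, assuming a shared counter that several threads increment; the names `counter`, `counter_lock`, and `add_many` are made up for the example.

```python
from threading import Lock, Thread

counter = 0
counter_lock = Lock()

def add_many(times):
    """Increment the shared counter `times` times, one locked step at a time."""
    global counter
    for _ in range(times):
        with counter_lock:  # only one thread may update the counter at a time
            counter += 1

threads = [Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

Without the lock, `counter += 1` is a read-modify-write sequence that two threads can interleave, so updates may be lost.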


**Multiprocessing**

*Pros*

- Separate memory space
- Code is usually straightforward
- Takes advantage of multiple CPUs & cores
- Avoids GIL limitations for CPython
- Eliminates most needs for synchronization primitives unless you use shared memory (instead, it's more of a communication model for IPC)
- Child processes are interruptible/killable
- Python multiprocessing module includes useful abstractions with an interface much like threading.Thread
- A must with CPython for CPU-bound processing

*Cons*

- IPC a little more complicated with more overhead (communication model vs. shared memory/objects)
- Larger memory footprint


**Threading**

*Pros*

- Lightweight - low memory footprint
- Shared memory - makes access to state from another context easier
- Allows you to easily make responsive UIs
- CPython C extension modules that properly release the GIL will run in parallel
- Great option for I/O-bound applications

*Cons*

- CPython - subject to the GIL
- Not interruptible/killable
- If not following a command queue/message pump model (using the `queue` module), then manual use of synchronization primitives becomes a necessity (decisions are needed for the granularity of locking)
- Code is usually harder to understand and to get right - the potential for race conditions increases dramatically

### Process

- A process is an instance of a program, e.g. a Python interpreter. Processes are independent of each other and do not share memory.

Key facts:

- A new process is started independently from the first process
- Takes advantage of multiple CPUs and cores
- Separate memory space
- Memory is not shared between processes
- One GIL (Global Interpreter Lock) for each process, i.e. avoids the GIL limitation
- Great for CPU-bound processing
- Child processes are interruptible/killable

- *Starting a process is slower than starting a thread*
- *Larger memory footprint*
- *IPC (inter-process communication) is more complicated*

### Threads

- A thread is an entity within a process that can be scheduled for execution (also known as a "lightweight process"). A process can spawn multiple threads.
- The main difference is that all threads within a process share the same memory.

Key facts:

- Multiple threads can be spawned within one process
- Memory is shared between all threads
- Starting a thread is faster than starting a process
- Great for I/O-bound tasks
- Lightweight
- Low memory footprint

- One GIL for all threads, i.e. threads are limited by the GIL
- Multithreading gives no speedup for CPU-bound pure-Python code because of the GIL
- Not interruptible/killable -> be careful with memory leaks
- Increased potential for race conditions

### Threading in Python

Use the `threading` module.

Note: The following example usually won't benefit from multiple threads since it is CPU-bound. It is only meant to show how to use threads.

In [44]:
from threading import Thread

def square_numbers():
    for i in range(1000):
        result = i * i


if __name__ == "__main__":        
    threads = []
    num_threads = 10

    # create threads and assign a function to each thread
    for i in range(num_threads):
        thread = Thread(target=square_numbers)
        threads.append(thread)

    # start all threads
    for thread in threads:
        thread.start()

    # wait for all threads to finish
    # block the main thread until these threads are finished
    for thread in threads:
        thread.join()

In [43]:
square_numbers()

998001


### Example: **Mean of 100 Million observations**

In [4]:
# Generate random 100MM data points 
import numpy as np
n =100000000
d = np.random.rand(n)
print(d.shape)

(100000000,)


In [7]:
import time
def mean():

  # Sum using a for loop. NumPy's built-in sum operation would be much faster.
  sum = 0
  n=d.size
  for i in range(n):
    sum +=d[i]

  #Mean
  mean = sum/n
  return mean


#Time the execution
start_time = time.time()
m = mean() # compute mean of 100MM numbers.
end_time = time.time()
print (end_time-start_time)
print(m)

9.240562200546265
0.5000171647352617


### Multiprocessing

**Multi-Processing Code**

In [8]:
#Refer: https://docs.python.org/3/library/multiprocessing.html
from multiprocessing import Process, Queue
import math

def mean_MP(s, e, q ):

  # Sum using a for loop. NumPy's built-in sum operation would be much faster.
  sum = 0
  for i in range(s,e+1):
    sum +=d[i]

  #Mean
  mean = sum/(e-s+1)
  q.put(mean)
  return 

n1 = math.floor(n/2)

q = Queue() #Queues are thread and process safe. For communicating between processes and threads.

p1 = Process(target=mean_MP, args=(0, n1,q )) 
p2 = Process(target=mean_MP, args=(n1+1,n-1, q)) 


#Time the execution
start_time = time.time()

p1.start()
p2.start()

p1.join() # Wait till p1 finishes
p2.join() 

m=0;
while not q.empty():
     m += q.get()

m /= 2;
    
end_time = time.time()
print (end_time-start_time)
print(m)


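0.1225745677947998
0.0

Note: A mean of `0.0` after about 0.12 seconds suggests that the worker processes exited without putting anything on the queue, so this run does not measure a real speedup. This typically happens when the code runs in a notebook on a platform that spawns (rather than forks) child processes, because the children cannot see `mean_MP` or `d`. Running the code as a script with an `if __name__ == "__main__":` guard, and passing the data to the workers explicitly, is the usual way around this.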


### Multithreading

In [9]:
#Refer: https://docs.python.org/3/library/threading.html
from threading import Thread


means = [0,0];

def mean_MT(s, e, threadNum ):

  # Sum using a for loop. NumPy's built-in sum operation would be much faster.
  sum = 0
  for i in range(s,e+1):
    sum +=d[i]

  #Mean
  mean = sum/(e-s+1)
  means[threadNum] = mean  # means is a shared variable between the threads

  return 

n1 = math.floor(n/2)

t1 = Thread(target=mean_MT, args=(0, n1, 0))  # Third param is the thread number
t2 = Thread(target=mean_MT, args=(n1+1,n-1,1)) 

#Time the execution
start_time = time.time()

t1.start()
t2.start()

t1.join() # Wait till t1 finishes
t2.join() 

m = (means[0]+means[1])/2
    
end_time = time.time()
print (end_time-start_time)
print(m)



9.567532539367676
0.5000171647347875


### Joblib

`!pip install joblib`

- Joblib is a Python library that provides an easy-to-use interface for parallel computing.
- The machine learning library `scikit-learn` also uses `joblib` behind the scenes to run its algorithms in parallel.


#### Caching of function output values

- Separate persistence and flow-execution logic from domain logic or algorithmic code by writing the operations as a set of steps with well-defined inputs and outputs: Python functions. 
- Joblib can save their computation to disk and rerun it only if necessary:

In [3]:

#Transparent and fast disk-caching of output value
# Refer: https://joblib.readthedocs.io/en/latest/
from joblib import Memory
cachedir = './'
mem = Memory(cachedir)

import numpy as np
a = np.vander(np.arange(4)).astype(float)
square = mem.cache(np.square)
print(a)

b = square(a)    
print(b)

[[ 0.  0.  0.  1.]
 [ 1.  1.  1.  1.]
 [ 8.  4.  2.  1.]
 [27.  9.  3.  1.]]
________________________________________________________________________________
[Memory] Calling square...
square(array([[ 0.,  0.,  0.,  1.],
       [ 1.,  1.,  1.,  1.],
       [ 8.,  4.,  2.,  1.],
       [27.,  9.,  3.,  1.]]))
___________________________________________________________square - 0.0s, 0.0min
[[  0.   0.   0.   1.]
 [  1.   1.   1.   1.]
 [ 64.  16.   4.   1.]
 [729.  81.   9.   1.]]


In [4]:
c = square(a)
# The above call did not trigger an evaluation
print(c)

[[  0.   0.   0.   1.]
 [  1.   1.   1.   1.]
 [ 64.  16.   4.   1.]
 [729.  81.   9.   1.]]


#### Simple Parallel programming for Loops

Below is a list of simple steps to use "Joblib" for parallel computing.

- Wrap normal python function calls into delayed() method of joblib.
- Create Parallel object with a number of processes/threads to use for parallel computing.
- Pass the list of delayed wrapped functions to an instance of Parallel. It'll run them all in parallel and return the result as a list.

In [39]:
def calculate_time(func):

    def inner1(*args, **kwargs):
        begin = time.time()
        
        returned_value=func(*args, **kwargs)
        end = time.time()
        print("Total time taken in : ", func.__name__, end - begin)
        return returned_value
    return inner1

import time
from math import sqrt # standard-library function

@calculate_time
def f(i): 
    
    # some computations  that take time
    x=10000
    p =1;
    for j in range(x):
        for k in range(j):
            p *= k
    
    return sqrt(i ** 2);

# Find sqrt of first n numbers
n=10;

# start_time = time.time()
for i in range(n):
    print(f(i))

# end_time = time.time()
# print (end_time-start_time)

In [5]:
# Refer: https://joblib.readthedocs.io/en/latest/generated/joblib.Parallel.html

import time
from math import sqrt # standard-library function

def f(i): 
    
    # some computations  that take time
    x=10000
    p =1;
    for j in range(x):
        for k in range(j):
            p *= k
    
    return sqrt(i ** 2);

# Find sqrt of first n numbers
n=10;

start_time = time.time()

for i in range(n):
    f(i)

end_time = time.time()
print (end_time-start_time)

13.10816764831543


- Below, the sequential code above is converted to run in parallel using joblib.
- We first pass the function to joblib's `delayed`, and then call the resulting wrapper with the function's arguments.
- This creates a delayed call that is not executed immediately.

- We then create a `Parallel` object, setting `n_jobs` to the number of workers to use, for example the number of cores available on the machine.
- joblib provides a function named `cpu_count()` which returns the number of cores on the machine.
- `Parallel` then creates a pool with that many workers and runs the delayed calls in it.

**Multi-Processing** `n_jobs=2`

In [6]:
from joblib import Parallel, delayed

start_time = time.time()

a = Parallel(n_jobs=2)(delayed(f)(i) for i in range(n)) 

# Why we need delayed(): https://stackoverflow.com/questions/42220458/what-does-the-delayed-function-do-when-used-with-joblib-in-python

end_time = time.time()
print (end_time-start_time)

7.84437108039856


**Multi threading**

In [7]:
# Multi threading: GIL is an issue
start_time = time.time()

a = Parallel(n_jobs=2,prefer="threads")(delayed(f)(i) for i in range(n))

end_time = time.time()
print (end_time-start_time)

13.572847843170166


**Multi-Processing** `n_jobs=6`

In [9]:

# 6 jobs

from joblib import Parallel, delayed

start_time = time.time()

a = Parallel(n_jobs=6)(delayed(f)(i) for i in range(n)) 

# Why we need delayed(): https://stackoverflow.com/questions/42220458/what-does-the-delayed-function-do-when-used-with-joblib-in-python

end_time = time.time()
print (end_time-start_time)

4.4876134395599365


**Multi-Processing** `n_jobs=-1`

In [10]:
# -1 jobs

from joblib import Parallel, delayed

start_time = time.time()

a = Parallel(n_jobs=-1)(delayed(f)(i) for i in range(n)) 

# Why we need delayed(): https://stackoverflow.com/questions/42220458/what-does-the-delayed-function-do-when-used-with-joblib-in-python

end_time = time.time()
print (end_time-start_time)

4.683947801589966


### numba

Numba is an open source **JIT** compiler that translates a subset of Python and NumPy code into fast machine code.

`! pip install numba`

https://numba.pydata.org/

#### Matrix Multiplication: NumPy vs Numba

*Matrix multiplication without any library*

In [14]:
def mat_mult1(A, B):
    assert A.shape[1] == B.shape[0]
    res = np.zeros((A.shape[0], B.shape[1]), )
    for i in range(A.shape[0]):
        for k in range(A.shape[1]):
            for j in range(B.shape[1]):
                res[i,j] += A[i,k] * B[k,j]
    return res


start_time = time.time()
res = mat_mult1(A,B)

end_time = time.time()
print (end_time-start_time)

810.4455978870392


*Matrix multiplication via `numpy`*

In [18]:
start_time = time.time()

res = np.matmul(A,B)

end_time = time.time()
print (end_time-start_time)

3.668886184692383


*Matrix multiplication via `numba`*

In [19]:
import numpy as np
import time
from numba import njit, prange

@njit(parallel=True)
def mat_mult(A, B):
    assert A.shape[1] == B.shape[0]
    res = np.zeros((A.shape[0], B.shape[1]), )
    for i in prange(A.shape[0]):
        for k in range(A.shape[1]):
            for j in range(B.shape[1]):
                res[i,j] += A[i,k] * B[k,j]
    return res

m, n, c = 1000, 1500, 1200
A = np.random.randint(1, 50, size = (m, n))
B = np.random.randint(1, 50, size = (n, c))

start_time = time.time()

res = mat_mult(A, B)

end_time = time.time()
print (end_time-start_time)

0.8644311428070068


Time taken:

- Matrix multiplication without any library: 810.4455978870392 sec
- Matrix multiplication via numpy: 3.668886184692383 sec
- Matrix multiplication via numba: 0.8644311428070068 sec

On this run, the numba version was about 4 times faster than `np.matmul` and more than 900 times faster than the explicit Python loops.

## Shallow vs Deep Copying

**Source**
- [Shallow Copy and Deep Copy](https://www.programiz.com/python-programming/shallow-deep-copy)

- [Shallow Copy and Deep Copy](https://www.python-engineer.com/courses/advancedpython/20-copy/)

- It is tempting to think that the assignment operator `=` creates a copy of a Python object, but it does not; it only creates a binding between a target name and the object.
- When we use the assignment operator, *instead of creating a new object, it creates a new variable that shares the old object's reference.*

- Copies are helpful when we want to make changes without modifying the original object. This matters mainly when working with mutable objects.

Let's understand the following example.

In [30]:
list1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  
list2 = list1  
  
list2[1][2] = "a"  
  
print('Old List:', list1)  
print('ID of Old List:', id(list1))  
  
print('New List:', list2)  
print('ID of New List:', id(list2))  

Old List: [[1, 2, 3], [4, 5, 'a'], [7, 8, 9]]
ID of Old List: 2096689538112
New List: [[1, 2, 3], [4, 5, 'a'], [7, 8, 9]]
ID of New List: 2096689538112


- In the above output, we can see that both variable list1 and list2 share the same id `2096689538112`.

- If we make any changes in any value in list1 or list2, the change will reflect in both.

**Example: Adding `[4, 4, 4]` to old_list, using the assignment operator**

In [36]:
old_list = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
new_list = old_list

old_list.append([4, 4, 4])

print("Old list:", old_list)
print("New list:", new_list)

Old list: [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
New list: [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]


### Types of Copies in Python

- The goal is to create a copy of a Python object that we can modify without changing the original data.
- In Python, there are two ways to create copies:

    - Shallow Copy
    - Deep Copy

We will use the `copy` module to create the above copies.

### Shallow:

| Before Copy | Shallow Copying | Shallow Done |
|:-----------:|:---------------:|:------------:|
| ![](https://i.stack.imgur.com/49psq.png) | ![](https://i.stack.imgur.com/qqE2L.png) | ![](https://i.stack.imgur.com/cys27.png) |

Initially, the variables A and B refer to different areas of memory. When B is assigned to A, the two variables refer to the same area of memory.
Later modifications to the contents of either are instantly reflected in the contents of the other, as they share contents.

- Shallow copies duplicate as little as possible.
- A shallow copy of a collection is a copy of the collection structure, not the elements.
- With a shallow copy, two collections now share the individual elements.

In [33]:
# importing "copy" for copy operations   
import copy   
  
# initializing list 1   
list1 = [1, 7, [3,5], 8]   
  
# using copy to shallow copy   
list2 = copy.copy(list1)   
 
list2[2][0] = 10     
    
print('Old List:', list1)  
print('ID of Old List:', id(list1))  
  
print('New List:', list2)  
print('ID of New List:', id(list2)) 

Old List: [1, 7, [10, 5], 8]
ID of Old List: 2096687213248
New List: [1, 7, [10, 5], 8]
ID of New List: 2096690497600


**Example: Adding `[4, 4, 4]` to old_list, using shallow copy**

In [37]:
import copy

old_list = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
new_list = copy.copy(old_list)

old_list.append([4, 4, 4])

print("Old list:", old_list)
print("New list:", new_list)

Old list: [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
New list: [[1, 1, 1], [2, 2, 2], [3, 3, 3]]


- In the above program, we created a shallow copy of `old_list`. 
- The `new_list` contains references to original nested objects stored in `old_list`.
- Then we add the new list i.e `[4, 4, 4]` into `old_list`. This new sublist was not copied in `new_list`.

*However, when you change any nested objects in `old_list`, the changes appear in `new_list`.*

In [38]:
import copy

old_list = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
new_list = copy.copy(old_list)

old_list[1][1] = 'AA'

print("Old list:", old_list)
print("New list:", new_list)

Old list: [[1, 1, 1], [2, 'AA', 2], [3, 3, 3]]
New list: [[1, 1, 1], [2, 'AA', 2], [3, 3, 3]]


### Deep:

| Before Copy | Deep Copying | Deep Done |
|:-----------:|:------------:|:---------:|
| ![](https://i.stack.imgur.com/DRLn7.png) | ![](https://i.stack.imgur.com/yuURM.png) | ![](https://i.stack.imgur.com/yuURM.png) |

Initially, the variables A and B refer to different areas of memory. When B is assigned to A, the values in the memory area that A points to are copied into the memory area that B points to. Later modifications to the contents of either remain unique to A or B; the contents are not shared.

- Deep copies duplicate everything. 
- A deep copy of a collection is two collections with all of the elements in the original collection duplicated.

![](https://i.stack.imgur.com/AWKJa.jpg)

In [34]:
# importing "copy" for copy operations   
import copy   
  
# initializing list 1   
list1 = [1, 7, [3,5], 8]   
  
# using deepcopy to deep copy   
list2 = copy.deepcopy(list1)   
 
list2[2][0] = 10     
    
print('Old List:', list1)  
print('ID of Old List:', id(list1))  
  
print('New List:', list2)  
print('ID of New List:', id(list2)) 

Old List: [1, 7, [3, 5], 8]
ID of Old List: 2096690483328
New List: [1, 7, [10, 5], 8]
ID of New List: 2096690417024


In [39]:
import copy

old_list = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
new_list = copy.deepcopy(old_list)

old_list[1][0] = 'BB'

print("Old list:", old_list)
print("New list:", new_list)

Old list: [[1, 1, 1], ['BB', 2, 2], [3, 3, 3]]
New list: [[1, 1, 1], [2, 2, 2], [3, 3, 3]]


In the above program, we use the `deepcopy()` function to create a copy that looks identical to the original.

However, if you make changes to any nested objects in original object `old_list`, you’ll see no changes to the copy `new_list`.
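The difference between assignment, a shallow copy, and a deep copy can also be checked directly with the `is` operator, which compares object identity. This is a small sketch with made-up variable names.

```python
import copy

original = [[1, 2], [3, 4]]
alias = original                # assignment: same object, no copy
shallow = copy.copy(original)   # new outer list, shared inner lists
deep = copy.deepcopy(original)  # new outer list, new inner lists

print(alias is original)          # True
print(shallow is original)        # False
print(shallow[0] is original[0])  # True
print(deep[0] is original[0])     # False
```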

---

## Context Managers

**Source**

- [Python Context Managers](https://www.pythontutorial.net/advanced-python/python-context-managers/)

- [Why Use Context Managers](https://towardsdatascience.com/why-you-should-use-context-managers-in-python-4f10fe231206)

- [Context Manager](https://book.pythontips.com/en/latest/context_managers.html)

- If you have used the `with` statement in Python then chances are you’ve already used a context manager.

- A context manager usually takes care of setting up some resource, e.g. opening a connection, and automatically handles the clean up when we are done with it.

- Probably, the most common use case is opening a file.
```python
with open('/path/to/file.txt', 'r') as f:
  for line in f:
    print(line)
```



You could open a file without using `with`, but then you would have to clean up after yourself.

```python
f = open('/path/to/file.txt', 'r')
for line in f:
    print(line)
f.close()  # must remember to close f
```


Apart from the fact that this needs an extra line of code, there are a few other downsides. Namely,

- it is easy to forget to close the file,
- `f.close()` won’t be called if an exception is raised before that line is reached.

*To exactly replicate what the `with` statement does, we need even more code to make sure the file is closed even if an exception occurs.*

```python
f = open('/path/to/file.txt', 'r')
try:
  for line in f:
    print(line)
finally:
  f.close()  # runs no matter what happens
```
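To see that `with` really performs this clean-up, the file object's `closed` attribute can be inspected after an exception has interrupted the block. This is a small sketch: the temporary file and the deliberately raised `ValueError` are assumptions made for the demonstration, not part of the original example.

```python
import os
import tempfile

# Scratch file (hypothetical path) so the example does not rely on an existing file.
path = os.path.join(tempfile.mkdtemp(), 'file.txt')
with open(path, 'w') as f:
    f.write('first line\n')

try:
    with open(path, 'r') as f:
        handle = f  # keep a reference so the file object can be inspected later
        raise ValueError('something went wrong while reading')
except ValueError:
    pass

# The with statement closed the file even though the block raised.
closed_after_error = handle.closed
print(closed_after_error)  # True
```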

### How To Implement a Context Manager

There are two ways to implement a context manager. 
1. The first one is defining a class with implementations for the `__enter__` and `__exit__` methods. 
2. The second one is by creating a generator and using the `contextlib.contextmanager` decorator.

### Implementing a context manager as a class

- To support the `with` statement for our own classes, we have to implement the `__enter__` and `__exit__` methods.
- Python calls `__enter__` when execution enters the context of the `with` statement.
- This is where the resource should be acquired and returned.
- When execution leaves the context again, `__exit__` is called and the resource is freed up.

In [24]:
class ManagedFile:
    def __init__(self, filename):
        print('init', filename)
        self.filename = filename

    def __enter__(self):
        print('enter')
        self.file = open(self.filename, 'w')
        return self.file

    def __exit__(self, exc_type, exc_value, exc_traceback):
        if self.file:
            self.file.close()
        print('exit')

with ManagedFile('notes.txt') as f:
    print('doing stuff...')
    f.write('some todo2...')

init notes.txt
enter
doing stuff...
exit


#### Handling exceptions

- If an exception occurs, Python passes the type, value, and traceback to the `__exit__` method. 
- The exception can be handled there. If `__exit__` returns a true value, the exception is suppressed; if it returns anything else (including `None`, the default), the exception propagates out of the `with` statement.

In [26]:
class ManagedFile:
    def __init__(self, filename):
        print('init', filename)
        self.filename = filename

    def __enter__(self):
        print('enter')
        self.file = open(self.filename, 'w')
        return self.file

    def __exit__(self, exc_type, exc_value, exc_traceback):
        if self.file:
            self.file.close()
        print('exc:', exc_type, exc_value)
        print('exit')

# No exception
with ManagedFile('notes.txt') as f:
    print('doing stuff...')
    f.write('some todo3...')
print('continuing...')

print()

# Exception is raised, but the file can still be closed
with ManagedFile('notes2.txt') as f:
    print('doing stuff...')
    f.write('some todo3...')
    f.do_something()
print('continuing...')

init notes.txt
enter
doing stuff...
exc: None None
exit
continuing...

init notes2.txt
enter
doing stuff...
exc: <class 'AttributeError'> '_io.TextIOWrapper' object has no attribute 'do_something'
exit


AttributeError: '_io.TextIOWrapper' object has no attribute 'do_something'

We can handle the exception in the `__exit__` method and `return True` to suppress it, so that execution continues after the `with` block.

In [27]:
class ManagedFile:
    def __init__(self, filename):
        print('init', filename)
        self.filename = filename

    def __enter__(self):
        print('enter')
        self.file = open(self.filename, 'w')
        return self.file

    def __exit__(self, exc_type, exc_value, exc_traceback):
        if self.file:
            self.file.close()
        if exc_type is not None:
            print('Exception has been handled')
        print('exit')
        return True


with ManagedFile('notes2.txt') as f:
    print('doing stuff...')
    f.write('some todo...')
    f.do_something()
print('continuing...')

init notes2.txt
enter
doing stuff...
Exception has been handled
exit
continuing...
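Returning `True` unconditionally hides every error, which is rarely what you want. A more cautious variant returns a true value only for the exception types it knows how to handle. The class below is a hypothetical sketch (the name `SuppressAttributeError` and the `results` list are made up for this illustration):

```python
class SuppressAttributeError:
    """Hypothetical context manager that swallows only AttributeError."""

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, exc_traceback):
        # exc_type is None when the block finished without an exception.
        return exc_type is not None and issubclass(exc_type, AttributeError)


results = []

with SuppressAttributeError():
    results.append('before')
    object().do_something()  # raises AttributeError, which is suppressed
    results.append('never reached')
results.append('continuing')

try:
    with SuppressAttributeError():
        1 / 0  # ZeroDivisionError is not suppressed
except ZeroDivisionError:
    results.append('ZeroDivisionError propagated')

print(results)
# ['before', 'continuing', 'ZeroDivisionError propagated']
```

The standard library offers `contextlib.suppress` for this same purpose, so a hand-written class is mainly useful when suppression has to be combined with resource clean-up.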


### Implementing a context manager as a generator

- Instead of writing a class, we can also write a generator function and decorate it with the `contextlib.contextmanager` decorator.
- The decorated function can then be called in a `with` statement.
- For this approach, the function must `yield` the resource inside a `try` statement, and everything that `__exit__` would do to free the resource goes into the corresponding `finally` clause.

In [22]:
from contextlib import contextmanager

@contextmanager
def open_managed_file(filename):
    f = open(filename, 'w')
    try:
        yield f
    finally:
        f.close()

with open_managed_file('notes.txt') as f:
    f.write('some todo...')
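Exception handling works with the generator approach as well: an exception raised inside the `with` block is re-raised at the `yield`, so an `except` clause around the `yield` plays the role of `return True` in `__exit__`. The sketch below uses a plain string in place of a real resource, and the names `managed_resource` and `events` are assumptions for illustration.

```python
from contextlib import contextmanager

events = []  # records the order in which things happen


@contextmanager
def managed_resource(name):
    events.append(f'acquire {name}')
    try:
        yield name
    except AttributeError as exc:
        # Not re-raising here suppresses the exception, like `return True`.
        events.append(f'handled {type(exc).__name__}')
    finally:
        # Runs whether or not the block raised, like the body of __exit__.
        events.append(f'release {name}')


with managed_resource('demo') as r:
    events.append(f'using {r}')
    r.do_something()  # str has no such method, so AttributeError is raised
events.append('continuing')

print(events)
# ['acquire demo', 'using demo', 'handled AttributeError', 'release demo', 'continuing']
```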

--- 