# Garbage collection

--- 

In this notebook I want to give you some ideas and strategies to optimize your Python programs.

---

## 1. Garbage

When using Python you normally never have to deal with memory management directly, as you do in C or C++. Whenever you create an object, Python takes care of the memory for you.

To explain the term `Garbage`: while a Python program runs, many objects are created. At certain points these objects are no longer used, and in many cases the memory an object occupies can be freed. The internal procedure of freeing unused memory is called `Garbage collection`, and the unused objects are called `Garbage`.

---

## 2. Unused objects

The big question is: what are unused objects, or when does an object become unused?

 * at the end of a function -> all locally defined variables are freed if possible
 * when a variable is overwritten by a new value
 * when objects are removed with `del`
 
not if:
 * the object, or a part of it, is still used by other objects

In [16]:
# example

def test():
    a = 123
    b = [1,2,3]
    

test()
# after the call the names a and b are gone and the list is freed
# (CPython keeps small integers such as 123 cached, so that object stays alive)

but:

In [17]:
def test(l):
    a = 123
    
    l.append(a)
    
l = []
test(l)
# after the call a is unused in the function, but l still holds a reference to the object,
# so the object is freed only when l is deleted!
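The two cells above can be made observable with a small sketch. It assumes CPython, where an object is destroyed as soon as its last reference disappears. `Payload`, `local_only` and `stored` are hypothetical names introduced only for this illustration; a helper class is needed because plain lists and integers cannot be watched with `weakref.finalize`.

```python
import weakref


class Payload:
    """Hypothetical helper class; plain lists and ints do not support weak references."""


events = []  # filled by the callbacks below when an object is destroyed


def local_only():
    obj = Payload()
    # Register a callback that runs when obj is destroyed.
    weakref.finalize(obj, events.append, "local object freed")
    # obj goes out of scope at the end of the function; nothing else refers to it.


def stored(container):
    obj = Payload()
    weakref.finalize(obj, events.append, "stored object freed")
    container.append(obj)  # the list now holds a second reference


local_only()
freed_after_first_call = list(events)

items = []
stored(items)
freed_while_in_list = list(events)

del items  # drops the last reference to the list and therefore to the object
freed_after_del = list(events)

print(freed_after_first_call)
print(freed_while_in_list)
print(freed_after_del)
```

The object created in `local_only` is reported as freed right after the call, while the object stored in the list is reported only after `del items`.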

To demonstrate what a reference to an object is, have a look at the next cell:

In [2]:
import sys

a = []
b = a

sys.getrefcount(a)

3

As you can see, 3 references are counted for $a$ during the call: the names $a$ and $b$, plus a temporary reference created by the function call itself. Only when the counter drops to zero is the object removed from memory.
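As a minimal sketch of how the counter moves, the following cell adds and removes references and compares the counts relative to a starting value. The absolute numbers can differ between Python versions and execution contexts, so only the differences are checked; the variable names are chosen for this illustration.

```python
import sys

data = []                          # the list is referenced by the name `data`
base = sys.getrefcount(data)       # starting value, includes the temporary call reference

alias = data                       # a second name for the same list
with_alias = sys.getrefcount(data)

holder = {"key": data}             # a container holds a reference as well
with_holder = sys.getrefcount(data)

del alias                          # remove the second name
del holder                         # remove the container and with it its reference
after_del = sys.getrefcount(data)

# Print the change relative to the starting value.
print(with_alias - base, with_holder - base, after_del - base)
```

Each new name or container raises the count by one, and each `del` lowers it again.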

---

## 3. Garbage collection

In general, Python frees an object as soon as no reference to it is left. In addition, the interpreter looks from time to time for unused objects that still reference each other and frees their memory.

To be honest, there are a few additional mechanisms with which you can define strategies to free memory in other situations as well, but for most situations this is the best description.
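One such mechanism is the standard `gc` module. The following sketch assumes CPython and uses a hypothetical `Node` class to build two objects that reference each other. Reference counting alone cannot free them, but an explicit `gc.collect()` can.

```python
import gc
import weakref


class Node:
    """Hypothetical class used only to build a reference cycle."""

    def __init__(self):
        self.partner = None


was_enabled = gc.isenabled()
gc.disable()  # pause the automatic collector so the effect is visible
try:
    first, second = Node(), Node()
    first.partner = second
    second.partner = first       # first <-> second now form a cycle

    probe = weakref.ref(first)   # weak reference: does not keep the object alive

    del first, second
    alive_after_del = probe() is not None      # the cycle keeps both objects alive

    gc.collect()                 # run the cycle collector explicitly
    alive_after_collect = probe() is not None  # now the objects are gone
finally:
    if was_enabled:
        gc.enable()

print(alive_after_del, alive_after_collect)
```

In normal programs the collector runs automatically, so calling `gc.collect()` by hand is rarely necessary.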

---

## 4. Memory usage and mistakes

Of course, using Python in different situations can create different memory problems. As seen while using the animation frameworks, the same animation takes less memory when created for viewing in a notebook than when you produce a video file in `.gif` or `.mp4` format.

However, there are other difficult situations:

 * how to handle big data sets, e.g. Ice maps file sets
 * working on individual big data files

1. As seen in the Ice maps project, the data set consists of a few files with a total size of ~500 MB. The strategy is to extract values from each file individually and combine these data for a plot.
 * in many of the solutions, all data is read in one loop and kept in memory (bad idea if you have little memory)
 * better: create a loop in which you work file by file and extract only the necessary data. The data is overwritten in each loop step and the old data is freed -> less memory is needed

2.  I've seen a lot of Python programs in which, in a main program without any functions, one big file was read and analysed, then the next data set was read into a new variable, and so on -> at a certain point the memory of the server was exhausted and the program was killed by the Linux OS.
   
The solution in this case is either to put the analysis in a separate function, whose local variables are freed when it returns, or to use `del` to remove unwanted objects.
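Both points can be combined in one sketch: a function reads a single file, reduces it to the value that is needed, and returns only that value. The file names, the text format and the function `extract_mean` are assumptions made for this illustration; three tiny temporary files stand in for the real data set.

```python
import tempfile
from pathlib import Path


def extract_mean(path):
    """Read one file, reduce it to a single number, and return only that number."""
    values = [float(line) for line in path.read_text().split()]
    # `values` is a local variable and is released when the function returns.
    return sum(values) / len(values)


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Stand-in data: three tiny files instead of the real data set.
    for index in range(3):
        numbers = "\n".join(str(index * 10 + k) for k in range(5))
        (folder / f"map_{index}.txt").write_text(numbers)

    # Only one file's content is in memory at a time; just the small results accumulate.
    means = [extract_mean(path) for path in sorted(folder.glob("map_*.txt"))]

print(means)
```

The list `means` holds one number per file, so its size does not depend on how large the individual files are.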

---