# Optimizing Python Code with Profilers

A profile is a set of statistics that describes how often and for how long various parts of a program executed. These statistics can be formatted into reports with the `pstats` module.
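As a minimal sketch of how a profile becomes a report, the cell below collects statistics for a small made-up function and formats them with `pstats`. The function `busy_work` and its input size are illustrative assumptions, not part of the original notebook.

```python
import cProfile
import io
import pstats


def busy_work(n):
    """Hypothetical workload: sum of squares below n."""
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
busy_work(100_000)
profiler.disable()

# Format the collected statistics into a text report held in memory.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.strip_dirs().sort_stats("cumulative").print_stats(5)  # top 5 entries only

report = stream.getvalue()
print(report)
```

Each row of the report lists the number of calls (`ncalls`), the time spent in the function itself (`tottime`), and the time including everything it called (`cumtime`).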

## When to optimize?

- Do you need to optimize?
    - If speed is not a problem, there is no reason to optimize.
- If yes: which parts of the code should be optimized?
    - Use a profiler, such as cProfile.
    - Usually almost all of the execution time is spent in a small part of the code.
    - Optimize that code, and leave the rest alone.
- If you need even better performance:
    - Redesign the code completely.
    - This takes significant effort, so weigh the cost against the expected gain.
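One way to answer the first question in the list above is to time the code before reaching for a profiler. The sketch below uses `timeit` from the standard library; the function `build_squares` and the 50 ms budget are hypothetical values chosen only for illustration.

```python
import timeit


def build_squares(n):
    """Hypothetical function whose speed we want to check."""
    return [i * i for i in range(n)]


runs = 20
total_seconds = timeit.timeit(lambda: build_squares(10_000), number=runs)
per_call_ms = total_seconds / runs * 1000

budget_ms = 50  # assumed performance budget, not from the original notebook
needs_optimization = per_call_ms > budget_ms

print(f"{per_call_ms:.3f} ms per call, optimize: {needs_optimization}")
```

If the measured time is within the budget, there is nothing to optimize and profiling can be skipped.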

In [None]:
import cProfile
import functools
import io
import pstats

def profile(fnc):
    """A decorator that uses cProfile to profile a function."""

    @functools.wraps(fnc)  # keep the wrapped function's name and docstring
    def inner(*args, **kwargs):
        pr = cProfile.Profile()
        pr.enable()  # start collecting profiling data
        try:
            retval = fnc(*args, **kwargs)
        finally:
            pr.disable()  # stop collecting, even if fnc raises
        s = io.StringIO()  # in-memory buffer that receives the report
        sortby = 'cumulative'  # order entries by cumulative time, largest first
        ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
        ps.print_stats()  # write the report to s instead of stdout
        print(s.getvalue())
        return retval

    return inner
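When only one block of code needs profiling, a decorator may be more than is needed. As an alternative sketch, `cProfile.Profile` can be used as a context manager (this assumes Python 3.8 or newer). The function `work` is a made-up workload for illustration.

```python
import cProfile
import io
import pstats


def work():
    """Hypothetical workload: sort 50,000 short strings."""
    return sorted(str(i) for i in range(50_000))


# Profiling is enabled on entry to the block and disabled on exit.
with cProfile.Profile() as pr:
    work()

stream = io.StringIO()
pstats.Stats(pr, stream=stream).sort_stats("cumulative").print_stats(5)

report = stream.getvalue()
print(report)
```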

In [None]:
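def read_movies(src):
    """Return the lines of the file at src as a list of movie titles."""
    with open(src) as fd:
        return fd.read().splitlines()

def is_duplicate(needle, haystack):
    """Return True if needle matches any title in haystack, ignoring case."""
    for movie in haystack:
        if needle.lower() == movie.lower():
            return True
    return False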

@profile
def find_duplicate_movies(src="movies.txt"):
    movies = read_movies(src)
    movies = [movie.lower() for movie in movies]
    duplicates = []
    while movies:
        movie = movies.pop()
        if is_duplicate(movie, movies):
            duplicates.append(movie)
    return duplicates

find_duplicate_movies()
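In the cell above, `is_duplicate` scans the remaining list once for every movie, so the profile report is likely to show it (and the `lower` calls inside it) dominating the run time. One possible optimization, sketched here on an in-memory list rather than on `movies.txt`, replaces the scan with a set lookup. The names `find_duplicates_naive`, `find_duplicates_with_set`, and `sample` are hypothetical, and the naive version is a condensed stand-in for the original logic.

```python
def find_duplicates_naive(titles):
    """Condensed version of the original approach: linear scan per movie."""
    movies = [title.lower() for title in titles]
    duplicates = []
    while movies:
        movie = movies.pop()
        if movie in movies:  # scans the whole remaining list
            duplicates.append(movie)
    return duplicates


def find_duplicates_with_set(titles):
    """Same result, but each membership check is a constant-time set lookup."""
    seen = set()
    duplicates = []
    for title in titles:
        movie = title.lower()
        if movie in seen:
            duplicates.append(movie)
        else:
            seen.add(movie)
    duplicates.reverse()  # the naive version walks the list from the end
    return duplicates


sample = ["Alien", "Heat", "alien", "Up", "HEAT", "Alien"]
print(find_duplicates_with_set(sample))  # ['alien', 'heat', 'alien']
```

Profiling both versions on the same input is a reasonable way to confirm that the change actually helps before keeping it.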