What is a profiler? <br>
A profiler measures which parts of a program are executed most often and where most of the time is spent. <br>
It reports not only the total running time but also the time spent in each function and how many times each function was called, which makes it easy to see where optimization will pay off.
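As a minimal sketch of what such a report looks like, the snippet below profiles one call of a toy function with cProfile and prints the five most expensive entries sorted by cumulative time. The function `slow_sum` is a hypothetical workload, not part of the original notebook; `pstats.SortKey` assumes Python 3.7 or newer.

```python
import cProfile
import io
import pstats
from pstats import SortKey


def slow_sum(n):
    """Hypothetical toy workload: sum of squares computed in a Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


profiler = cProfile.Profile()
result = profiler.runcall(slow_sum, 100_000)  # profile a single call of slow_sum

# Write the report into a string buffer instead of stdout
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats(SortKey.CUMULATIVE).print_stats(5)  # show only the top 5 rows
report = stream.getvalue()
print(report)
```

The report has the same columns (`ncalls`, `tottime`, `percall`, `cumtime`) as the outputs in this notebook.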

We can use cProfile, as documented here: https://docs.python.org/3/library/profile.html <br>
Instead of repeating the same cProfile boilerplate in every function we want to measure, we can wrap it in a decorator, as shown below. Once the decorator function 'profile' is defined, just put @profile above a function definition and every call to that function prints its profiling statistics, including those of the functions it calls internally.

In [1]:
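import cProfile, pstats, io
import functools


def profile(fnc):
    """A decorator that uses cProfile to profile a function"""

    @functools.wraps(fnc)  # preserve the wrapped function's name and docstring
    def inner(*args, **kwargs):
        pr = cProfile.Profile()
        pr.enable()
        try:
            retval = fnc(*args, **kwargs)
        finally:
            pr.disable()  # stop profiling even if fnc raises an exception
        # Write the report into a string buffer, sorted by cumulative time
        s = io.StringIO()
        sortby = 'cumulative'
        ps = pstats.Stats(pr, stream=s).sort_stats(sortby)
        ps.print_stats()
        print(s.getvalue())
        return retval

    return inner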
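
The full listing printed by such a decorator can be long (the outputs in this notebook are cut off). One possible variation, sketched here assuming Python 3.8+ (where cProfile.Profile can be used as a context manager), is a decorator factory that lets you choose the sort key and how many rows to print. The names `profile_top` and `build_titles` are hypothetical and not part of the original notebook.

```python
import cProfile
import functools
import io
import pstats


def profile_top(sortby="cumulative", limit=10):
    """Hypothetical decorator factory: profile each call and print only the top rows."""
    def decorator(fnc):
        @functools.wraps(fnc)
        def inner(*args, **kwargs):
            # Profile.__enter__ enables profiling and __exit__ disables it,
            # even if fnc raises (requires Python 3.8+).
            with cProfile.Profile() as pr:
                retval = fnc(*args, **kwargs)
            s = io.StringIO()
            pstats.Stats(pr, stream=s).sort_stats(sortby).print_stats(limit)
            print(s.getvalue())
            return retval
        return inner
    return decorator


@profile_top(sortby="tottime", limit=5)
def build_titles(n):
    """Hypothetical workload: build and lowercase n synthetic titles."""
    return [f"Movie {i}".lower() for i in range(n)]


titles = build_titles(1000)  # prints at most 5 rows, sorted by own time
```

Sorting by `tottime` instead of `cumulative` puts the functions that spend the most time in their own code at the top, which is often the quickest way to spot a hot loop.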

Let's take an example to understand it. <br><br>

Reference: https://osf.io/upav8/ <br>

Our task is to find the duplicate movies in a given text file. I created this file from the IMDB dataset on Kaggle. <br>

We could get the desired result in many different ways, but the goal here is to profile the execution time of each function and use those numbers to find a faster solution.

In [2]:
'''In the first method, I have added a function 'read_movies' to read the text data, a function 'is_duplicate' to compare one
movie against a list of movies, and the main function, which calls read_movies once to read the data, loops through all the
movies and calls is_duplicate on every iteration of the loop.

'''


def read_movies(src):
    with open(src, encoding="utf-8") as fd:
        return fd.read().splitlines()
    
def is_duplicate(new_mov, movies):
    for mov in movies:
        if new_mov.lower() == mov.lower():
            return True
    return False
    
@profile
def find_duplicate_movies(src='movies.txt'):
    
    movies = read_movies(src)
    duplicates = []
    while movies:
        movie = movies.pop()
        if is_duplicate(movie, movies):
            duplicates.append(movie)
    return duplicates
    
find_duplicate_movies()

         23067720 function calls in 4.106 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    4.106    4.106 <ipython-input-2-aaa4458af140>:19(find_duplicate_movies)
     4803    2.509    0.001    4.103    0.001 <ipython-input-2-aaa4458af140>:13(is_duplicate)
 23058102    1.594    0.000    1.594    0.000 {method 'lower' of 'str' objects}
        1    0.000    0.000    0.001    0.001 <ipython-input-2-aaa4458af140>:8(read_movies)
        1    0.000    0.000    0.000    0.000 {method 'read' of '_io.TextIOWrapper' objects}
        1    0.000    0.000    0.000    0.000 {method 'splitlines' of 'str' objects}
     4803    0.000    0.000    0.000    0.000 {method 'pop' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {built-in method io.open}
        1    0.000    0.000    0.000    0.000 C:\Users\jainp\anaconda3\lib\codecs.py:319(decode)
        1    0.000    0.000    0.000    0.000 {bui

['Batman', 'Out of the Blue']

Analysis: <br>
The total execution time is 4.106 seconds. <br>
The stats are sorted by cumulative time. <br>
Look at the cumulative time column 'cumtime': it jumps from 0.001 seconds for read_movies to 1.594 seconds for the str method 'lower', which was called 23,058,102 times. This is because is_duplicate calls lower() on both titles in every comparison, so the same titles are lowercased again and again inside the loop.<br><br>
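A back-of-the-envelope count, derived only from the code and the table above, shows why lower() dominates. The 'pop' row reports 4803 calls, so the file holds $n = 4803$ titles. The $i$-th pop leaves $n - i$ titles to compare against, and each comparison calls lower() twice (once on each side of ==):

$$
\text{comparisons} \le \sum_{i=1}^{n} (n - i) = \frac{n(n-1)}{2} = \frac{4803 \times 4802}{2} = 11{,}532{,}003
$$

$$
\text{calls to lower()} \le 2 \times 11{,}532{,}003 = 23{,}064{,}006
$$

The measured 23,058,102 calls sit just below this bound because is_duplicate returns as soon as it finds a match. The key point is that the number of comparisons grows quadratically with the number of titles.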

Let's do it in another way.



In [3]:
'''In the second method, the functions are the same as in the first method: 'read_movies' reads the text data, 'is_duplicate'
compares one movie against the remaining movies, and the main function calls read_movies once, loops through all the movies
and calls is_duplicate on every iteration. The only difference from the first method is that instead of converting the text
to lowercase inside is_duplicate, we do it once, right after reading the data, so we don't have to call lower() again and
again in the loop.

'''


def read_movies(src):
    with open(src, encoding="utf-8") as fd:
        return fd.read().splitlines()
    
def is_duplicate(new_mov, movies):
    for mov in movies:
        if new_mov == mov:
            return True
    return False
    
@profile
def find_duplicate_movies(src='movies.txt'):
    
    movies = read_movies(src)
    movies = [movie.lower() for movie in movies]
    duplicates = []
    while movies:
        movie = movies.pop()
        if is_duplicate(movie, movies):
            duplicates.append(movie)
    return duplicates
    
find_duplicate_movies()

         14422 function calls in 0.293 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.001    0.001    0.293    0.293 <ipython-input-3-17ba09b8ce3d>:21(find_duplicate_movies)
     4803    0.289    0.000    0.289    0.000 <ipython-input-3-17ba09b8ce3d>:15(is_duplicate)
        1    0.000    0.000    0.001    0.001 <ipython-input-3-17ba09b8ce3d>:10(read_movies)
        1    0.000    0.000    0.001    0.001 <ipython-input-3-17ba09b8ce3d>:25(<listcomp>)
        1    0.000    0.000    0.000    0.000 {built-in method io.open}
     4803    0.000    0.000    0.000    0.000 {method 'lower' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'read' of '_io.TextIOWrapper' objects}
        1    0.000    0.000    0.000    0.000 {method 'splitlines' of 'str' objects}
     4803    0.000    0.000    0.000    0.000 {method 'pop' of 'list' objects}
        1    0.000    0.000    0.000    0.000 C:\Users\ja

['batman', 'out of the blue']

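Analysis: <br>
The total execution time is 0.293 seconds, roughly 14 times faster than the first method (4.106 seconds). <br>
With just one change, lowercasing each title once instead of inside the comparison loop, the number of calls to 'lower' drops from 23,058,102 to 4,803 and the execution time drops dramatically. Note that the returned titles are now lowercase as well. <br>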
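
If you want the speed-up of lowercasing only once but would rather report titles with their original capitalization, one option is to lowercase a parallel list of keys, compare the keys, and return the original strings. This is a hypothetical variant of the second method, not part of the original notebook; the name `find_duplicates_keep_case` is an assumption, and it takes a list instead of a file so it can run on its own.

```python
def find_duplicates_keep_case(titles):
    """Hypothetical variant of the second method that returns titles in their original case."""
    keys = [title.lower() for title in titles]  # lowercase each title exactly once
    titles = list(titles)  # copy so the caller's list is not emptied by pop()
    duplicates = []
    while keys:
        key = keys.pop()
        title = titles.pop()
        # Same linear scan as is_duplicate, but on the pre-lowercased keys
        for other in keys:
            if key == other:
                duplicates.append(title)
                break
    return duplicates


print(find_duplicates_keep_case(["Batman", "Heat", "BATMAN", "Alien"]))  # prints ['BATMAN']
```

Because titles are popped from the end, the reported spelling is that of the later occurrence, which matches how the original loop behaves.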


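Let's check whether we can reduce it further. <br>
The cumulative time now jumps from 0.001 seconds for read_movies to 0.289 seconds for is_duplicate, so almost all of the remaining time is spent inside is_duplicate. Its explicit Python loop can be replaced with the built-in 'in' operator, which performs the same linear search over the list in C (in CPython) and should reduce the execution time as well.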
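
The third method replaces is_duplicate with the built-in 'in' membership test on the list. To check that both do the same work, the sketch below compares them on a small hypothetical list and times them with timeit. The list size and repetition count are arbitrary assumptions, and the timings depend on the machine, so treat them only as a rough comparison.

```python
import timeit


def is_duplicate(new_mov, movies):
    """Explicit Python loop, as in the second method."""
    for mov in movies:
        if new_mov == mov:
            return True
    return False


# Hypothetical data: 2,000 distinct lowercase titles
movies = [f"movie {i}" for i in range(2000)]
target = "not in the list"  # worst case: every element gets compared

# Both approaches give the same answer...
assert is_duplicate(target, movies) == (target in movies)
assert is_duplicate("movie 1999", movies) == ("movie 1999" in movies)

# ...but 'in' runs its linear search inside the interpreter's C code
loop_time = timeit.timeit(lambda: is_duplicate(target, movies), number=200)
in_time = timeit.timeit(lambda: target in movies, number=200)
print(f"explicit loop: {loop_time:.4f} s, 'in' operator: {in_time:.4f} s")
```

Both versions are still linear searches; the gain comes from avoiding a Python-level loop iteration per element, not from a better algorithm.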

In [4]:
'''In the third method, I have kept the function 'read_movies' for reading the text data and the main function, which calls
read_movies once to read the data and then loops through all the movies. Instead of comparing strings one by one, we check
whether the last movie (the one just popped) is still in the remaining list; if it is, it is a duplicate and we store it in
the duplicates list. In this way, the is_duplicate function and its explicit comparison loop are no longer needed.

'''


def read_movies(src):
    with open(src, encoding="utf-8") as fd:
        return fd.read().splitlines()
    
    
@profile
def find_duplicate_movies(src='movies.txt'):
    
    movies = read_movies(src)
    movies = [movie.lower() for movie in movies]
    duplicates = []
    while movies:
        movie = movies.pop()
        if movie in movies:
            duplicates.append(movie)
    return duplicates
    
find_duplicate_movies()

         9619 function calls in 0.093 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.090    0.090    0.093    0.093 <ipython-input-4-7684026582ba>:15(find_duplicate_movies)
        1    0.000    0.000    0.001    0.001 <ipython-input-4-7684026582ba>:9(read_movies)
        1    0.001    0.001    0.001    0.001 <ipython-input-4-7684026582ba>:19(<listcomp>)
     4803    0.000    0.000    0.000    0.000 {method 'lower' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'read' of '_io.TextIOWrapper' objects}
        1    0.000    0.000    0.000    0.000 {built-in method io.open}
        1    0.000    0.000    0.000    0.000 {method 'splitlines' of 'str' objects}
     4803    0.000    0.000    0.000    0.000 {method 'pop' of 'list' objects}
        1    0.000    0.000    0.000    0.000 C:\Users\jainp\anaconda3\lib\codecs.py:319(decode)
        1    0.000    0.000    0.000    0.000 {built-in 

['batman', 'out of the blue']

Analysis: <br>
The total execution time is 0.093 seconds, about 3 times faster than the second method and about 44 times faster than the first. <br>


Therefore, of the three methods, the third one is the most efficient.
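
All three methods still compare each title with every remaining title, so their cost grows quadratically with the size of the file. A different technique, not profiled in this notebook, is to remember the titles already seen in a set, whose membership test takes constant time on average. The sketch below assumes the same lowercased list as the second and third methods; `find_duplicate_movies_set` and `find_duplicates_pop` are hypothetical names. Because the pop()-based loop visits titles from last to first, the set-based result is reversed so both return the duplicates in the same order.

```python
def find_duplicate_movies_set(movies):
    """Hypothetical set-based variant: one pass instead of a quadratic scan."""
    seen = set()
    duplicates = []
    for movie in movies:
        if movie in seen:
            duplicates.append(movie)  # a later occurrence of a title seen before
        else:
            seen.add(movie)
    # The pop()-based methods visit titles from last to first, so reverse
    # the result to report duplicates in the same order they do.
    return duplicates[::-1]


def find_duplicates_pop(movies):
    """The third method's loop, applied to a list instead of a file."""
    movies = list(movies)  # copy so the caller's list is left intact
    duplicates = []
    while movies:
        movie = movies.pop()
        if movie in movies:
            duplicates.append(movie)
    return duplicates


sample = ["heat", "batman", "alien", "batman", "heat", "batman"]
print(find_duplicate_movies_set(sample))  # prints ['batman', 'heat', 'batman']
assert find_duplicate_movies_set(sample) == find_duplicates_pop(sample)
```

With the notebook's own helper, a call could look like `find_duplicate_movies_set([m.lower() for m in read_movies('movies.txt')])`, and it can be wrapped with @profile like the other methods to compare its timing.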

