# All about Functions
Let us look at functions in detail. We will first focus on functions that come as part of the Python Standard Library and then see how to develop our own functions.

* Pre-defined Functions
* Special Functions
* String Manipulation Functions
* Defining Functions and Returning Values
* Function Parameters and Arguments
* Lambda Functions

## Pre-defined Functions
Like any programming language, Python comes with a robust set of pre-defined functions.

Following are the ones Data Engineers typically use:
* String Manipulation Functions
* Date and Time Manipulation Functions
* Collection Manipulation Functions

Sometimes we use functions that are available as part of core Python. However, in Data Engineering projects we often use modules such as pandas, pyspark etc.
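
As a quick illustration of the three categories listed above, here is a minimal sketch that uses only core Python and the standard library. The sample values (`status`, `order_date`, `amounts`) are made up for illustration.

```python
from datetime import datetime, timedelta

# String manipulation: strip surrounding spaces and convert to upper case
status = '  closed '.strip().upper()

# Date and time manipulation: parse a timestamp and add 7 days to it
order_date = datetime.strptime('2013-07-25 00:00:00', '%Y-%m-%d %H:%M:%S')
due_date = order_date + timedelta(days=7)

# Collection manipulation: aggregate and sort a list of amounts
amounts = [299.98, 49.99, 129.99]
total = round(sum(amounts), 2)
largest_first = sorted(amounts, reverse=True)

print(status)
print(due_date.strftime('%Y-%m-%d'))
print(total, largest_first)
```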


## Special Functions
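Special functions are methods whose names are enclosed in double underscores, such as `__eq__`.

* Special functions typically provide the implementation for operators and for some built-in functions.
* Here are some examples.
  * `in` (implemented by `__contains__`)
  * Comparison operators such as `==` (implemented by `__eq__`)
  * The built-in function `len` (implemented by `__len__`)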

In [None]:
help(str)

In [99]:
s = 'Hello'

In [95]:
str.__eq__?

Signature:      str.__eq__(self, value, /)
Call signature: str.__eq__(*args, **kwargs)
Type:           wrapper_descriptor
String form:    <slot wrapper '__eq__' of 'str' objects>
Namespace:      Python builtin
Docstring:      Return self==value.


In [100]:
s.__eq__('Hello')

True

In [101]:
s == 'Hello'

True

In [96]:
str.__contains__?

Signature:      str.__contains__(self, key, /)
Call signature: str.__contains__(*args, **kwargs)
Type:           wrapper_descriptor
String form:    <slot wrapper '__contains__' of 'str' objects>
Namespace:      Python builtin
Docstring:      Return key in self.


In [102]:
s.__contains__('e')

True

In [103]:
'e' in s

True

In [97]:
str.__len__?

Signature:      str.__len__(self, /)
Call signature: str.__len__(*args, **kwargs)
Type:           wrapper_descriptor
String form:    <slot wrapper '__len__' of 'str' objects>
Namespace:      Python builtin
Docstring:      Return len(self).


In [104]:
s.__len__()

5

In [105]:
len(s)

5
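
The same special functions can be defined in our own classes so that the operators work on our objects. Here is a minimal sketch; the `Order` class and its attributes are hypothetical and used only for illustration.

```python
class Order:
    """Hypothetical order that holds a list of item names."""

    def __init__(self, order_id, items):
        self.order_id = order_id
        self.items = list(items)

    def __eq__(self, other):
        # Invoked by the == operator
        if not isinstance(other, Order):
            return NotImplemented
        return self.order_id == other.order_id

    def __contains__(self, item):
        # Invoked by the in operator
        return item in self.items

    def __len__(self):
        # Invoked by the built-in len function
        return len(self.items)


o1 = Order(1, ['pen', 'book'])
o2 = Order(1, ['pencil'])

print(o1 == o2)     # two orders are treated as equal when the ids match
print('pen' in o1)
print(len(o1))
```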

## String Manipulation Functions
In Python, depending on the nature of the project we might manipulate strings using functions that come as part of higher level modules such as pandas, pyspark etc. However, we will go through some of the string manipulation functions to get an idea about how to use existing functions.

A string in Python is an immutable sequence of characters. We can access its elements using index based operations, just as we do with a list.

Let us understand some common string manipulations we perform in any application.
* Extracting information from fixed length strings (eg: last 4 digits of social security number). There is no dedicated substring function in Python. However, we can use index based operations (slicing) to extract information from fixed length strings.

In [1]:
# Last 4 digits of SSN
ssn = '1234567890'
# ssn[6:10]
ssn[6:]

'7890'

In [2]:
# Converting into integer
int(ssn[6:])

7890

In [None]:
# Last 4 digits of SSN (alternative)
ssn[-4:]

* Extracting information from delimited strings (eg: extract first field from record where fields are comma separated)

In [None]:
# Extracting first item from delimited string
order = '1,2013-07-25 00:00:00.0,1,CLOSED'

In [None]:
order.split(',') # Convert string into list of strings

In [None]:
order.split(',')[0] # Extracts first element
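
When the number of fields is known, the list returned by `split` can be unpacked into variables in one step. This is a small sketch; the field names (`order_id`, `order_date`, `customer_id`, `order_status`) are assumptions about what each position in the record means.

```python
order = '1,2013-07-25 00:00:00.0,1,CLOSED'

# Unpack the four fields into named variables
order_id, order_date, customer_id, order_status = order.split(',')

# split returns strings, so numeric fields need explicit conversion
order_id = int(order_id)
customer_id = int(customer_id)

# Slicing extracts the date portion of the timestamp
print(order_id, order_date[:10], customer_id, order_status)
```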

* Trimming unnecessary characters at the beginning or at the end of the string.

In [None]:
# Getting help on the string
help(str)

In [None]:
# Strip leading and trailing spaces as well as dots (.)
s = '   hello.  '
s.strip(' .')  # strip removes any of the listed characters from both ends

* Length of the string for data quality (eg: checking if telephone number is 10 digits or not)

In [4]:
phone_number = '1234567890'
len(phone_number)

10

In [5]:
len(phone_number) == 10

True

* Validating the type of content in a string (eg: checking whether a social security number passed as a string contains only digits)

In [6]:
ssn = '123456789'
ssn.isdigit()

True

In [7]:
ssn = '123 45 6789'
ssn.isdigit()

False

* Converting case of the string

In [13]:
company = 'iTVersity, inc'

In [14]:
company.upper()

'ITVERSITY, INC'

In [15]:
company.lower()

'itversity, inc'

In [16]:
company.capitalize()

'Itversity, inc'

* Replacing part of the string. Replace spaces with an empty string and validate whether the result contains only digits.

In [17]:
ssn = '123 45 6789'
ssn.isdigit()

False

In [18]:
ssn.replace(' ', '').isdigit()

True

* Counting how many times a particular substring is repeated in the main string.

In [19]:
ssn = '123 45 6789'
ssn.count(' ')

2
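
The individual string functions above are often combined into a single validation step. Here is a minimal sketch, assuming an SSN has 9 digits and may contain spaces or hyphens as separators; `normalize_ssn` is a hypothetical helper name.

```python
def normalize_ssn(raw_ssn):
    """Return the 9 digits of an SSN, or None if the input is not valid."""
    # Remove surrounding spaces, then the separators inside the value
    cleaned = raw_ssn.strip().replace(' ', '').replace('-', '')
    if len(cleaned) == 9 and cleaned.isdigit():
        return cleaned
    return None


print(normalize_ssn(' 123 45 6789 '))
print(normalize_ssn('123-45-678X'))
```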

## Defining Functions and Returning Values
Here are simple rules to define a function in Python:
* A function definition begins with the keyword `def` followed by the function name and parentheses `()`.
* While defining functions we specify the parameters within these parentheses.
* The header line ends with a colon (`:`) and the code block of the function is indented.
* The statement `return [expression]` exits a function, passing back the value of the expression to the caller. A `return` statement with no expression is the same as `return None`.
* The first statement of a function can optionally be the documentation string of the function, or docstring.
* When functions are invoked, arguments are passed by object reference: the parameter refers to the same object as the argument. Changes made to a mutable object inside the function are visible to the caller, while reassigning the parameter is not.
* A function that finishes without executing a `return` statement returns `None`.
* We can return multiple expressions in Python. They are returned as a tuple.


In [None]:
def function_name(a, b):
    """Sample function"""
    return a, b

In [None]:
help(function_name)

In [None]:
print(str(function_name(1, 2)))
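
The rules about return values and argument passing can be seen in a short sketch. All function names here (`min_max`, `log_message`, `append_zero`, `rebind`) are hypothetical and exist only to illustrate one rule each.

```python
def min_max(values):
    """Return the smallest and largest value as a tuple."""
    return min(values), max(values)


def log_message(message):
    """Print a message; there is no return statement."""
    print(message)


def append_zero(numbers):
    """Mutate the list object that the caller passed in."""
    numbers.append(0)


def rebind(numbers):
    """Rebinding the local name does not affect the caller's list."""
    numbers = [99]


lowest, highest = min_max([3, 1, 2])  # the tuple is unpacked into two names
result = log_message('hello')         # result is None

data = [1, 2]
append_zero(data)  # visible to the caller
rebind(data)       # not visible to the caller

print(lowest, highest)
print(result)
print(data)
```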

## Function Parameters and Arguments
Let us get an overview of different types of Function Parameters and Arguments supported by Python.
* A parameter is a variable in the definition of a function. An argument is the actual value that gets passed to the function for that parameter.
* However, the two terms are often used interchangeably.
* In Python, arguments can be any objects, including functions. We can pass named functions or lambda functions as arguments. We will talk about this later.
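
The difference between parameters and arguments, and the idea of passing a function as an argument, can be sketched as follows. The names `apply_to_each` and `double` are hypothetical.

```python
def apply_to_each(values, func):
    # values and func are parameters of apply_to_each
    return [func(value) for value in values]


def double(n):
    return n * 2


# [1, 2, 3] and double are arguments; double is a named function
doubled = apply_to_each([1, 2, 3], double)

# A lambda function can be passed in the same way
lengths = apply_to_each(['a', 'bb', 'ccc'], lambda s: len(s))

print(doubled)
print(lengths)
```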

### Tasks
Let us perform a few tasks to understand all aspects of parameters.

* Checking whether phone numbers of a given employee are valid - get_invalid_phone_count
  * Function should take 2 arguments, employee_id and phone_numbers (variable number)
  * Check whether each phone number has 10 digits.
  * Return employee_id and the number of phone numbers with less than 10 digits

In [38]:
def get_invalid_phone_count(employee_id, *phone_numbers):
    invalid_count = 0
    for phone_number in phone_numbers:
        if len(phone_number) < 10:
            invalid_count += 1
    return employee_id, invalid_count

In [24]:
s = 'Employee {employee_id} has {invalid_count} invalid phones'

In [54]:
employee_id, invalid_count = get_invalid_phone_count(1, '1234567890')

In [55]:
print(s.format(employee_id=employee_id, invalid_count=invalid_count))

Employee 1 has 0 invalid phones


* Adding an employee - add_employee
  * Function should take employee_id, employee_name, salary, phone_numbers (variable number) and degrees (variable keyword arguments) as arguments.
  * Degrees should be with specialization. There can be one or more degrees with specializations with keys bachelors, masters, executive, doctorate.
  * Make sure salary is defaulted to 3000. If salary is passed and it is less than 3000, raise an exception with message “Invalid Salary, Salary should be at least 3000”
  * Call get_invalid_phone_count and check if the invalid phone count is greater than 0. If it is greater than 0, raise an exception with message “One or more phone number of an employee is not valid”
  * Get the count of degrees by processing the variable keyword argument
  * If there are no exceptions print “Employee {employee_id} with {number} degrees is successfully added”

In [90]:
def add_employee(employee_id, employee_name, *phone_numbers,
                 salary=3000, **degrees):
    """Validate employee details and print a confirmation message."""
    valid_degrees = ('bachelors', 'masters', 'executive', 'doctorate')
    try:
        # Salary check required by the task description
        if salary < 3000:
            print('Invalid Salary, Salary should be at least 3000')
            raise ValueError
        _, invalid_count = get_invalid_phone_count(employee_id, *phone_numbers)
        if invalid_count != 0:
            print('One or more invalid phone numbers')
            raise ValueError
        # degrees is a dict; iterating over it yields the keyword names
        for degree in degrees:
            if degree not in valid_degrees:
                print('One or more invalid degrees')
                raise ValueError
        print('Employee {} with {} degrees is successfully added'.
              format(employee_id, len(degrees))
             )
    except ValueError:
        print('Problem with employee data')

In [91]:
add_employee(1, 'IT', '1234567890', salary=3000, bachelors='Math', masters='Math')

Employee 1 with 2 degrees is successfully added


In [92]:
add_employee(1, 'IT', '1234567890', salary=3000, bachelors='Math', some_degree='Math')

One or more invalid degrees
Problem with employee data


In [93]:
add_employee(1, 'IT', '1234567890', '1234', salary=3000, bachelors='Math', masters='Math')

One or more invalid phone numbers
Problem with employee data
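
The task description asks for exceptions that carry a message. One possible way to do that is to pass the message to `ValueError` and let the caller decide how to report it. This is only a sketch of that alternative; `validate_employee` and `try_validate` are hypothetical names and are not part of the solution above.

```python
def validate_employee(*phone_numbers, salary=3000):
    """Raise ValueError with a descriptive message if any check fails."""
    if salary < 3000:
        raise ValueError('Invalid Salary, Salary should be at least 3000')
    # Same rule as get_invalid_phone_count: less than 10 characters is invalid
    if any(len(phone_number) < 10 for phone_number in phone_numbers):
        raise ValueError('One or more phone number of an employee is not valid')


def try_validate(*phone_numbers, **options):
    """Hypothetical caller that converts the exception into a message."""
    try:
        validate_employee(*phone_numbers, **options)
        return 'OK'
    except ValueError as ve:
        # str(ve) gives back the message passed while raising the exception
        return str(ve)


print(try_validate('1234567890'))
print(try_validate('1234567890', salary=2000))
print(try_validate('1234'))
```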


## Recap
Let us recap the different types of parameters and arguments we have used in the examples.
* Required Parameters or Positional Arguments (typically at the beginning)
* Parameters with Defaults, which can also be passed as Keyword Arguments
* Variable-length Parameters or Arguments
  * A parameter that starts with `*` collects the extra positional arguments and is interpreted as a tuple in the function.
  * A parameter that starts with `**` collects the extra keyword arguments and is interpreted as a dict in the function.
* Order of Parameters
  * Required Parameters (with no defaults)
  * Parameters with defaults
  * Variable length Parameter (`*`)
  * Keyword-only Parameters, which can be passed only by name
  * Variable length Keyword Parameter (`**`)
  * In Python 2, parameters cannot be defined after the variable length parameter, as keyword-only parameters are supported only in Python 3.
* As Python programmers we need to be familiar with all types of arguments as they are used quite extensively in 3rd party libraries such as Pandas.
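
The order of parameters can be seen in a single function definition. Here is a minimal sketch; the function `describe` and its parameter names are hypothetical.

```python
def describe(required, optional='default', *args, keyword_only=True, **kwargs):
    # args is a tuple with the extra positional arguments
    # kwargs is a dict with the extra keyword arguments
    return required, optional, args, keyword_only, kwargs


# Only the required argument is passed; everything else takes its default
print(describe(1))

# 3 and 4 go into args, x and y go into kwargs
print(describe(1, 2, 3, 4, keyword_only=False, x=10, y=20))
```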


## Lambda Functions

Let us understand details related to Lambda Functions.

* A lambda function is a function without a name (an anonymous function).
* We can assign it to a variable or pass it as an argument to a function.
* There are limitations to lambda functions.
  * We cannot use the `return` statement.
  * We cannot use statements such as assignments.
  * The body must be a single expression that processes the arguments. The value of this expression is returned automatically.
* Let us take the example of the sum of integers in a range using loops, and then develop other functionality using lambda functions.
  * Sum of squares of integers between a range
  * Sum of cubes of integers between a range
  * Sum of the even numbers between a range
* Code using named functions


In [None]:
def sum_of_integers(lb, ub):
    total = 0
    for i in range(lb, ub + 1):
        total += i
    return total

sum_of_integers(1, 10)

In [None]:
def sum_of_squares(lb, ub):
    total = 0
    for i in range(lb, ub + 1):
        total += i * i
    return total

sum_of_squares(1, 10)

In [None]:
def my_sum(lb, ub, f):
    total = 0
    for i in range(lb, ub+1):
        total += f(i)
    return total

In [None]:
def i(n): return n

my_sum(5, 10, i)

In [None]:
def sqr(n): return n * n

my_sum(5, 10, sqr)

In [None]:
def cube(n): return n * n * n

my_sum(5, 10, cube)

In [None]:
def even(n): return n if (n%2 == 0) else 0

my_sum(5, 10, even)

* Code using lambda functions

In [None]:
def my_sum(lb, ub, f):
    total = 0
    for i in range(lb, ub+1):
        total += f(i)
    return total

In [None]:
my_sum(5, 10, lambda n: n)

In [None]:
my_sum(5, 10, lambda n: n * n)

In [None]:
my_sum(5, 10, lambda n: n * n * n)

In [None]:
my_sum(5, 10, lambda n: n if(n%2==0) else 0)

* Without lambda functions we would have to develop a separate named function for each calculation. By passing lambda functions as arguments, we have reduced the amount of code.
* Let us see where Lambda Functions are extensively used.
  * Standard library modules such as itertools and functools
  * Built-in functions such as map, filter etc
  * 3rd party libraries such as pandas
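
Here is a short sketch of lambda functions used with the built-in functions `map`, `filter` and `sorted`. The sample data (`numbers`, `orders`) is made up for illustration.

```python
numbers = list(range(5, 11))

# map applies the lambda function to each element
squares = list(map(lambda n: n * n, numbers))

# filter keeps the elements for which the lambda function returns True
evens = list(filter(lambda n: n % 2 == 0, numbers))

# sorted accepts a function through its key parameter
orders = ['3,CLOSED', '1,COMPLETE', '2,PENDING']
by_order_id = sorted(orders, key=lambda order: int(order.split(',')[0]))

print(squares)
print(evens)
print(by_order_id)
```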