# Python Data Science Toolbox (Part 1)
* Link: https://learn.datacamp.com/courses/python-data-science-toolbox-part-1


## Course Description

It's time to push forward and develop your Python chops even further. There are tons of fantastic functions in Python and its library ecosystem. However, as a data scientist, you'll constantly need to write your own functions to solve problems that are dictated by your data. You will learn the art of function writing in this first Python Data Science Toolbox course. You'll come out of this course being able to write your very own custom functions, complete with multiple parameters and multiple return values, along with default arguments and variable-length arguments. You'll gain insight into scoping in Python and be able to write lambda functions and handle errors in your function writing practice. And you'll wrap up each chapter by using your new skills to write functions that analyze Twitter DataFrames.

## Course Agenda - What you will learn
* **Functions**
  * Functions with and without parameters.
  * Functions with and without return values.
  * Functions with multiple arguments.
  * Functions with multiple return values.
  * Functions with default arguments.
  * Functions that accept an arbitrary number of arguments (```*args``` and ```**kwargs```).
* **Function usage in data science**
  * Code examples 
* **Other function features**
  * Nested functions
  * Lambda functions / Anonymous functions
  * Error-handling within functions

# Chapter 1 - Writing your own functions

In this chapter, you'll learn how to write simple functions, as well as functions that accept multiple arguments and return multiple values. You'll also have the opportunity to apply these new skills to questions commonly encountered by data scientists.

## User-defined functions

### Defining functions

In [1]:
# Defining functions

def square_noarg_noreturn():
    '''Case 1 - Function with no arguments and no return'''
    new_value = 4 ** 2
    print(new_value)

def square_noarg_withreturn():
    '''Case 2 - Function with no arguments and with return'''
    new_value = 4 ** 2
    return new_value

def square_witharg_withreturn(value):
    '''Case 3 - Function with 1 argument and with return'''
    new_value = value ** 2
    return new_value

def square_withStdarg_withreturn(value=1): 
    '''Case 4 - Function with default argument value and with return'''
    new_value = value ** 2
    return new_value

print("Case 1 - square_noarg_noreturn(): ")
square_noarg_noreturn()
print("Case 2 - square_noarg_withreturn():", square_noarg_withreturn())
print("Case 3 - square_witharg_withreturn(5):", square_witharg_withreturn(5))
print("Case 4.1 - square_withStdarg_withreturn():", square_withStdarg_withreturn())
print("Case 4.2 - square_withStdarg_withreturn(3):", square_withStdarg_withreturn(3))


Case 1 - square_noarg_noreturn(): 
16
Case 2 - square_noarg_withreturn(): 16
Case 3 - square_witharg_withreturn(5): 25
Case 4.1 - square_withStdarg_withreturn(): 1
Case 4.2 - square_withStdarg_withreturn(3): 9


### Docstrings
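* A docstring is a string literal placed as the first statement of a function, class, or module to document its behavior.
* Docstrings can span multiple lines and can also carry important notes about the code.
* To write a docstring, put the text between triple double quotes ```"""```.
* A triple-quoted string that is not assigned to anything is also commonly used as a multiline comment. Example: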

```
"""
This is
a multiline
comment
"""

# This is a single line comment
```
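
When the triple-quoted string is the first statement in a function body, Python stores it on the function object. A minimal sketch, using a hypothetical ```square``` function that is not part of the course material:

```python
def square(value):
    """Return the square of value."""
    return value ** 2

# The docstring is stored on the function object itself.
print(square.__doc__)  # Return the square of value.
print(square(4))       # 16
```

The same text is what ```help(square)``` displays.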

## Multiple Parameters and Return Values

* **Parameters**: A function can have multiple parameters. Each parameter has its own name, and parameters have a defined order. Assume you define a function as ```def my_func(value1, value2, value3): ...```. We can call this function in different ways.
  * Calling the function with positional arguments, as ```my_func(2, 4, 8)```, makes ```value1 = 2```, ```value2 = 4``` and ```value3 = 8```.
  * Calling the function with keyword arguments, as ```my_func(value2=2, value3=4, value1=8)```, makes ```value1 = 8```, ```value2 = 2``` and ```value3 = 4```, regardless of the order in which they are written.


### Multiple function parameters

In [None]:
# Multiple Parameters

def raise_to_power(value1, value2):
    """Raise value1 to the power of value2."""
    new_value = value1 ** value2
    return new_value

print("Example 1 - raise_to_power(3, 2):", raise_to_power(3, 2))
print("Example 2 - raise_to_power(2, 3):", raise_to_power(2, 3))
print("Example 3 - raise_to_power(value1=3, value2=2):", raise_to_power(value1=3, value2=2))
print("Example 4 - raise_to_power(value2=3, value1=2):", raise_to_power(value2=3, value1=2))

Example 1 - raise_to_power(3, 2): 9
Example 2 - raise_to_power(2, 3): 8
Example 3 - raise_to_power(value1=3, value2=2): 9
Example 4 - raise_to_power(value2=3, value1=2): 8


### Multiple Returns
* **Returns**: To return multiple values we use tuples.

#### Tuples
  * **Definition**: A tuple contains multiple values and is immutable (it can't be changed after it is created).
  * **Creating tuples**: Tuples are constructed using parentheses (). Example: ```my_tuple = (10, 20, 40)```.
  * **Accessing values, first way**: To access a tuple's values we can unpack it into several variables. Example: ```val1, val2, val3 = my_tuple```, which makes ```val1 = 10```, ```val2 = 20``` and ```val3 = 40```.
    * Note that unpacking follows the order of the values in the tuple. That's why in the example above ```val1``` has value ```10``` instead of ```20```.
  * **Accessing values, second way**: We can access a tuple's values by index, the same way we do for lists. Example: ```my_tuple[2]``` returns value ```40```.

In [None]:
# Tuples

value1 = 10
value2 = 20
value3 = 40
first_tuple = (value1, value2, value3)
print("first_tuple = (", first_tuple[0], ",", first_tuple[1], ",", first_tuple[2],")")

second_tuple = (13, 54, 30)
a, b, c = second_tuple
print("second_tuple = (", a, ",", b, ",", c,")")

first_tuple = ( 10 , 20 , 40 )
second_tuple = ( 13 , 54 , 30 )
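
The immutability mentioned in the definition can be observed directly. A small sketch, assuming the example tuple ```(10, 20, 40)``` from the notes above; ```new_tuple``` is an illustrative name:

```python
my_tuple = (10, 20, 40)

try:
    my_tuple[0] = 99  # tuples do not support item assignment
except TypeError as error:
    print("Cannot modify a tuple:", error)

# Building a new tuple is allowed; the original is unchanged.
new_tuple = (99,) + my_tuple[1:]
print(my_tuple, new_tuple)  # (10, 20, 40) (99, 20, 40)
```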


In [None]:
# Multiple Returns

def sum_diff(value1, value2):
    """Return the sum and the difference of value1 and value2."""
    my_sum = value1 + value2
    my_diff = value1 - value2

    new_tuple = (my_sum, my_diff)
    return new_tuple

result = sum_diff(10, 2)
print("sum_diff(10, 2):", result)

sum_diff(10, 2): (12, 8)
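
The returned tuple can also be unpacked at the call site, which is usually how multiple return values are consumed. A compact sketch of the same function, written without the intermediate ```new_tuple``` variable:

```python
def sum_diff(value1, value2):
    """Return the sum and the difference of value1 and value2."""
    # The comma builds a tuple; the parentheses are optional here.
    return value1 + value2, value1 - value2

my_sum, my_diff = sum_diff(10, 2)
print(my_sum, my_diff)  # 12 8
```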


## Bringing it all together

You've got your first taste of writing your own functions in the previous exercises. You've learned how to add parameters to your own function definitions, return a value or multiple values with tuples, and how to call the functions you've defined.

In this and the following exercise, you will bring together all these concepts and apply them to a simple data science problem. You will load a dataset and develop functionalities to extract simple insights from the data.

For this exercise, your goal is to recall how to load a dataset into a DataFrame. The dataset contains Twitter data and you will iterate over entries in a column to build a dictionary in which the keys are the names of languages and the values are the number of tweets in the given language. The file `tweets.csv` is available in your current directory.

In [None]:
import pandas as pd

def count_entries(df, col_name):
    # Initialize an empty dictionary
    langs_count = {} 

    for entry in df[col_name]:
        # If the language is in langs_count, add 1 
        if entry in langs_count:
            langs_count[entry] += 1
        # Else add the language to langs_count, set the value to 1
        else:
            langs_count[entry] = 1

    return langs_count

# Import Twitter data as DataFrame: df
twitter_df = pd.read_csv("/content/drive/My Drive/Colab Notebooks/Course - Python Data Science Toolbox - Part 1/tweets.csv")

result = count_entries(twitter_df, "lang")

# Print the populated dictionary
print(result)


{'en': 97, 'et': 1, 'und': 2}
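
The if/else counting pattern can be shortened with ```dict.get```, which returns a default when the key is missing. The sketch below works on a plain list so that it runs without ```tweets.csv```; ```count_items``` and ```sample_langs``` are made-up names and data, not part of the exercise:

```python
def count_items(entries):
    """Return a dictionary mapping each distinct entry to its count."""
    counts = {}
    for entry in entries:
        # get() returns 0 the first time an entry is seen.
        counts[entry] = counts.get(entry, 0) + 1
    return counts

sample_langs = ["en", "en", "et", "und", "en"]
print(count_items(sample_langs))  # {'en': 3, 'et': 1, 'und': 1}
```

Passing a DataFrame column such as ```twitter_df["lang"]``` should work the same way, since iterating over a column yields its values.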


# Chapter 2 - Default arguments, variable-length arguments and scope

In this chapter, you'll learn to write functions with default arguments so that the user doesn't always need to specify them, and variable-length arguments so they can pass an arbitrary number of arguments on to your functions. You'll also learn about the essential concept of scope.

## Scope and user-defined functions

### Overview
* Not all objects are accessible everywhere in a script
* Scope is a part of the program where an object or name may be accessible
  * **Global scope**: defined in the main body of a script
  * **Local scope**: defined inside a function
  * **Built-in scope**: names in the predefined built-ins module

In [None]:
# Testing scopes: local x global (1)

global_numerator = 10
print("Test 1 - Global Variable: ", global_numerator)

def divide_by_2(value):
    """Returns the value divided by 2"""
    local_division = value / 2
    print("Test 2 - Local Variable: ", local_division)
    return local_division

divide_by_2(global_numerator)
print("Test 3 - Global Variable: ", global_numerator)
print("Test 4 - Local Variable: ", local_division) # throws an error: the local variable is not accessible here


Test 1 - Global Variable:  10
Test 2 - Local Variable:  5.0
Test 3 - Global Variable:  10


NameError: ignored

In [None]:
# Testing scopes: local x global (2)

global_value = 10
print("Test 1 - Global Variable: ", global_value)

def reduce_by_2(global_value): # The function creates a local variable named "global_value"
    """Returns the value reduced by 2"""
    global_value = global_value - 2
    print("Test 2 - Local Variable: ", global_value) 
    return global_value

reduce_by_2(global_value)
print("Test 3 - Global Variable: ", global_value)

Test 1 - Global Variable:  10
Test 2 - Local Variable:  8
Test 3 - Global Variable:  10


In [None]:
# Testing scopes: local x global (3)

global_value = 10
print("Test 1 - Global Variable: ", global_value)

def reduce_by_2(value): 
    """Returns the value reduced by 2"""
    global global_value # We force the use of the global variable "global_value"
    global_value = value - 2
    print("Test 2 - Local Variable: ", global_value) 
    return global_value

reduce_by_2(global_value)
print("Test 3 - Global Variable: ", global_value)

Test 1 - Global Variable:  10
Test 2 - Local Variable:  8
Test 3 - Global Variable:  8
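
The ```global``` statement in test (3) works, but the same result can often be obtained without it by returning the new value and rebinding the name where the function is called. A minimal sketch of that alternative, reusing the names from the cells above:

```python
global_value = 10

def reduce_by_2(value):
    """Return value reduced by 2 without touching any global name."""
    return value - 2

# Rebind the global name explicitly at the call site.
global_value = reduce_by_2(global_value)
print(global_value)  # 8
```

This keeps the function free of side effects, at the cost of one extra assignment.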


### Python's built-in scope
* The built-in scope is provided by a module called ```builtins```.
* To inspect it, you need to ```import builtins```.
  * Interestingly, the name ```builtins``` is not itself built in, which is why the module must be imported before it can be queried.
* After executing ```import builtins``` in the IPython Shell, execute ```dir(builtins)``` to print a list of all the names in the module ```builtins```. 
* Some of the ```builtins``` are ```bool```, ```enumerate```, ```format```, ```help```, ```len```, ```map```, ```max```, ```min```, ```pow```, ```round```, ```sorted```, ```str```, ```sum```, ```tuple```, ```type``` and many more.

Have a look and you'll see a bunch of names that you'll recognize!

In [None]:
import builtins
my_dict = {"builtins": dir(builtins)}

for i, builtin in enumerate(my_dict["builtins"]):
    print("builtin", i, ":", builtin)

builtin 0 : ArithmeticError
builtin 1 : AssertionError
builtin 2 : AttributeError
builtin 3 : BaseException
builtin 4 : BlockingIOError
builtin 5 : BrokenPipeError
builtin 6 : BufferError
builtin 8 : ChildProcessError
builtin 9 : ConnectionAbortedError
builtin 10 : ConnectionError
builtin 11 : ConnectionRefusedError
builtin 12 : ConnectionResetError
builtin 14 : EOFError
builtin 15 : Ellipsis
builtin 16 : EnvironmentError
builtin 17 : Exception
builtin 18 : False
builtin 19 : FileExistsError
builtin 20 : FileNotFoundError
builtin 21 : FloatingPointError
builtin 23 : GeneratorExit
builtin 24 : IOError
builtin 25 : ImportError
builtin 27 : IndentationError
builtin 28 : IndexError
builtin 29 : InterruptedError
builtin 30 : IsADirectoryError
builtin 31 : KeyError
builtin 32 : KeyboardInterrupt
builtin 33 : LookupError
builtin 34 : MemoryError
builtin 35 : ModuleNotFoundError
builtin 36 : NameError
builtin 37 : None
builtin 38 : NotADirectoryError
builtin 39 : NotImplemented
builtin 40 : No
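
Rather than scanning the whole listing, a specific name can be checked directly. A small sketch; the three names tested are arbitrary examples:

```python
import builtins

# hasattr() reports whether the module defines the given name.
for name in ["len", "sum", "array"]:
    print(name, "in builtins:", hasattr(builtins, name))
```

Here ```len``` and ```sum``` are reported as present, while ```array``` is not, because it lives in a separate standard-library module.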

## Nested functions

* A nested function is a function defined inside another function.
* Used to avoid repeating code snippets.
* Organizes the code and makes it easier to read.
* Still on the topic of scopes, the ```nonlocal``` declaration makes explicit that a variable is not local to the inner function but belongs to the scope of an enclosing function.

### Scope search order
1. Local scope
2. Enclosing functions 
3. Global
4. Built-in

This is known as the LEGB search order (Local, Enclosing, Global, Built-in).
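
The search order can be illustrated by defining the same name at several levels and seeing which one each function finds. A sketch with made-up function names:

```python
x = "global x"

def outer():
    x = "enclosing x"

    def inner_with_local():
        x = "local x"
        return x  # found in the Local scope

    def inner_without_local():
        return x  # not local, so the Enclosing scope is used

    return inner_with_local(), inner_without_local()

def no_enclosing():
    return x  # not local, no enclosing function, so Global is used

print(outer())         # ('local x', 'enclosing x')
print(no_enclosing())  # global x
print(len("abc"))      # len is only found in the Built-in scope: 3
```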

In [None]:
# Nested functions

def outer_function(value):
    value = value + 1

    def inner_function():
        nonlocal value # Says that "value" is nonlocal
        print("Step 2 - Inside inner_function: ", value)
        value = value * value
        print("Step 3 - Inside inner_function: ", value)
        
    print("Step 1 - Before inner_function: ", value)
    inner_function()
    print("Step 4 - After inner_function: ", value)

outer_function(10)


Step 1 - Before inner_function:  11
Step 2 - Inside inner_function:  11
Step 3 - Inside inner_function:  121
Step 4 - After inner_function:  121


In [None]:
# Nested functions - returning a function

def echo(n):
    """Return the inner_echo function."""

    def inner_echo(word1):
        """Concatenate n copies of word1."""
        echo_word = word1 * n
        return echo_word

    return inner_echo

twice = echo(2) # Call echo: twice
thrice = echo(3) # Call echo: thrice
print(twice('hello'), thrice('hello'))

hellohello hellohellohello


## Default arguments
* Parameters can be given default values, which are used when the caller omits the corresponding argument.

In [None]:
# Default values
def sum_values(value1, value2=2, value3=3):
    return value1 + value2 + value3

# value1 = 5, value2 = 5, value3 = 5
print(sum_values(5, 5, 5))

# value1 = 2, value2 = 2, value3 = 3
print(sum_values(2))

15
7
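
Defaults can also be overridden selectively by naming the parameter in the call. A short sketch reusing ```sum_values``` from the cell above:

```python
def sum_values(value1, value2=2, value3=3):
    return value1 + value2 + value3

# Keep the default for value2 but override value3 by name.
print(sum_values(1, value3=10))  # 1 + 2 + 10 = 13
```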


## Flexible arguments
* A function can be written to accept any number of arguments.
* Use of ```*args```
  * Inside the function, ```args``` is a tuple, so some handling (such as a loop) is needed to access all the values.
* Use of ```**kwargs```
  * Used to pass any number of keyword arguments, each with an identifier. Inside the function, ```kwargs``` is a dictionary.

In [None]:
# Flexible arguments with *args
def sum_all_values(*args):
    sum_all = 0

    for num in args:
        sum_all += num
    return sum_all

print(sum_all_values(1))
print(sum_all_values(1, 2))
print(sum_all_values(1, 2, 3))
print(sum_all_values(1, 2, 3, 4))
print(sum_all_values(1, 2, 3, 4, 5))

1
3
6
10
15


In [None]:
# Flexible arguments with **kwargs
def print_all(**kwargs):
    """Print out key-value pairs in **kwargs."""
    
    for key, value in kwargs.items():
        print(key + ": " + value)

print_all(first_name="João", middle_name="Luiz", last_name="Gross")
print("")
print_all(country="Brazil", state="Rio Grande do Sul", city="Porto Alegre")

first_name: João
middle_name: Luiz
last_name: Gross

country: Brazil
state: Rio Grande do Sul
city: Porto Alegre
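
Both forms can appear in the same signature, with ```*args``` collecting the positional arguments and ```**kwargs``` the keyword ones. A minimal sketch; ```describe``` is a hypothetical helper used only for illustration:

```python
def describe(*args, **kwargs):
    """Return the positional and keyword arguments as received."""
    return args, kwargs

positional, named = describe(1, 2, city="Porto Alegre")
print(positional)  # (1, 2)
print(named)       # {'city': 'Porto Alegre'}
```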


## Exercise 
* Using DataFrames to exercise scope, nested functions, default arguments and flexible arguments (```*args``` and ```**kwargs```)

In [None]:
import pandas as pd

def count_entries(df, *args):
    """Return a dictionary with counts of
    occurrences as value for each key."""
    
    cols_count = {}
    for col_name in args:
        col = df[col_name]
    
        for entry in col:
            if entry in cols_count:
                cols_count[entry] += 1
            else:
                cols_count[entry] = 1

    return cols_count

twitter_df = pd.read_csv("/content/drive/My Drive/Colab Notebooks/Course - Python Data Science Toolbox - Part 1/tweets.csv")
print(count_entries(twitter_df, "lang"))
print(count_entries(twitter_df, "lang", "source"))

{'en': 97, 'et': 1, 'und': 2}
{'en': 97, 'et': 1, 'und': 2, '<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>': 24, '<a href="http://www.facebook.com/twitter" rel="nofollow">Facebook</a>': 1, '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>': 26, '<a href="http://twitter.com/download/iphone" rel="nofollow">Twitter for iPhone</a>': 33, '<a href="http://www.twitter.com" rel="nofollow">Twitter for BlackBerry</a>': 2, '<a href="http://www.google.com/" rel="nofollow">Google</a>': 2, '<a href="http://twitter.com/#!/download/ipad" rel="nofollow">Twitter for iPad</a>': 6, '<a href="http://linkis.com" rel="nofollow">Linkis.com</a>': 2, '<a href="http://rutracker.org/forum/viewforum.php?f=93" rel="nofollow">newzlasz</a>': 2, '<a href="http://ifttt.com" rel="nofollow">IFTTT</a>': 1, '<a href="http://www.myplume.com/" rel="nofollow">Plume\xa0for\xa0Android</a>': 1}


# Chapter 3 - Lambda functions and error-handling

Learn about lambda functions, which allow you to write functions quickly and on the fly. You'll also practice handling errors in your functions, which is an essential skill. Then, apply your new skills to answer data science questions.

## Lambda (Anonymous) functions

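* Lambda functions are small anonymous functions written as a single expression. They can be assigned to a variable and called like any other function. Example: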

```
raise_to_power = lambda x, y: x ** y
raise_to_power(2, 3)
> 8
```

* One way to apply a lambda function to every item of a sequence is ```map()```
  * The function ```map``` takes two arguments: ```map(func, seq)```. Example:

```
values = [2, 3, 4, 5, 6, 7]
raise_to_power2 = map(lambda x: x ** 2, values)
print(list(raise_to_power2))
> [4, 9, 16, 25, 36, 49]
```  

* Next you will see the use of functions ```map()```, ```filter()``` and ```reduce()``` with lambda functions

In [None]:
# Lambda function - map()

values = [2, 3, 4, 5, 6, 7]
raise_to_power2 = map(lambda x: x ** 2, values)
print(raise_to_power2)
print(list(raise_to_power2))

<map object at 0x7f448fd11828>
[4, 9, 16, 25, 36, 49]
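
The first line of the output shows that ```map()``` returns a lazy iterator rather than a list, which also means it can be consumed only once. A small sketch with illustrative variable names:

```python
values = [2, 3, 4]
squares = map(lambda x: x ** 2, values)

first_pass = list(squares)   # consumes the iterator
second_pass = list(squares)  # nothing is left to yield

print(first_pass)   # [4, 9, 16]
print(second_pass)  # []
```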


In [None]:
# Lambda functions - filter()
fellowship = ['frodo', 'samwise', 'merry', 'pippin', 'aragorn', 'boromir', 'legolas', 'gimli', 'gandalf']
result = filter(lambda member: len(member) > 6, fellowship)
print(list(result))

['samwise', 'aragorn', 'boromir', 'legolas', 'gandalf']


In [None]:
# Lambda function - reduce()
from functools import reduce
stark = ['robb', 'sansa', 'arya', 'brandon', 'rickon']
result = reduce(lambda item1, item2: item1 + "|" + item2, stark)
print(result)

robb|sansa|arya|brandon|rickon
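
It may help to see what ```reduce()``` does step by step: it combines the first two items, then combines that result with the third item, and so on. The sketch below compares it with an explicit loop, using made-up numbers:

```python
from functools import reduce

values = [1, 2, 3, 4]

# reduce() computes ((1 + 2) + 3) + 4
total = reduce(lambda acc, item: acc + item, values)

# The same accumulation written as an explicit loop.
running = values[0]
for item in values[1:]:
    running = running + item

print(total, running)  # 10 10
```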


## Introduction to error handling

* Used to catch specific problems or behaviors.
* During Python code execution an **exception** may occur.
  * Exceptions signal unexpected behavior, for example calling a function with an argument it cannot handle.
* Catching exceptions can be done with a **try-except clause**.
  * First we **try** to run a piece of code.
  * If an **exception** occurs, then we run another piece of code.

In [None]:
# Catching exception
def sqrt(test, x):
    """Returns the square root of a number."""
    try:    
        x ** 0.5
    except:
        print(test, '- x must be an int or float')
        return
    print(test, "- ok")
    return

sqrt("Test 1", 4) # int - ok
sqrt("Test 2", 3.2) # float - ok
sqrt("Test 3", "John") # string - error

Test 1 - ok
Test 2 - ok
Test 3 - x must be an int or float


In [None]:
# Specifying TypeError exception and raising ValueError exception
def sqrt(test, x):
    """Returns the square root of a number."""
    if x < 0:
        raise ValueError(test, '- x must be non-negative')
    try:    
        x ** 0.5
    except TypeError:
        print(test, '- x must be an int or float')
        return
    print(test, "- ok")
    return

sqrt("Test 1", 4) # int - ok
sqrt("Test 2", 3.2) # float - ok
sqrt("Test 3", -10) # negative number - error

Test 1 - ok
Test 2 - ok


ValueError: ignored
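
An exception raised inside a function can be caught by the caller with the same try-except clause, so the program keeps running. A simplified sketch of ```sqrt``` (without the ```test``` label and returning the result, which are assumptions made for brevity):

```python
def sqrt(x):
    """Return the square root of x, rejecting negative input."""
    if x < 0:
        raise ValueError("x must be non-negative")
    return x ** 0.5

for candidate in [9, -10]:
    try:
        print(candidate, "->", sqrt(candidate))
    except ValueError as error:
        # Only ValueError is handled; other exceptions still propagate.
        print(candidate, "-> error:", error)
```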

# What you’ve learned
* Write functions that accept single and multiple arguments
* Write functions that return one or many values
* Use default, flexible, and keyword arguments
* Global and local scope in functions
* Write lambda functions
* Handle errors

# Python Data Science Toolbox Part 2
* Create lists with list comprehensions
* Iterators - you’ve seen them before!
* Case studies to apply these techniques to Data Science