***
# Python Functions & Packages
***

We've already been using Python's built-in functions without calling them out: **type()** is a function, as is **print()**. What is a function? Simply put, it's a piece of reusable code that performs a particular task. The **type()** function, for example, tells us what data type we are dealing with. Using Python's functions saves us from writing that code ourselves. Let's create a new list of employee salaries.
## Functions

In [1]:
# Create a new list of employee salaries
employees = [20000, 25000, 30000, 35000, 40000, 45000, 50000]

# Use the Python function max() to find the maximum salary in our list
max(employees)

50000

We don't need to know exactly how **max()** works; we just need to know that it does. It's a bit like a black box: we pass it an argument, let it do the work, and it produces an output. More importantly, we didn't have to write the function code ourselves. We simply called it and passed an argument.

We can also assign the result of a function call to a new variable.

In [2]:
# Create a new variable: highest_paid
highest_paid = max(employees)

# Output: highest paid
highest_paid

50000

We can now use the variable **highest_paid** elsewhere in our Python programs.
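
The same pattern works for other built-in functions. As a small sketch, here are **min()** and **sum()** applied to the same salary list; the variable names `lowest_paid` and `total_payroll` are made up for this illustration.

```python
# Same salary list as in the cell above
employees = [20000, 25000, 30000, 35000, 40000, 45000, 50000]

# min() returns the smallest value in the list
lowest_paid = min(employees)

# sum() adds all the values in the list together
total_payroll = sum(employees)

print(lowest_paid)     # 20000
print(total_payroll)   # 245000
```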

Python's **round()** function takes up to two inputs: first the number that you want to round, and second the precision to round to, which simply means how many digits after the decimal point you want to keep.

Here is pi to 14 decimal places: 3.14159265358979. We may not want that many digits; we may only want 2.

In [3]:
# Use round() to round pi to 2 decimal places
pi = round(3.14159265358979, 2)

# Print pi
pi

3.14

If you are not sure how to use a function or what arguments it takes, you can always fall back on its documentation. To get help for **round()** we simply type _help(round)_, or _round?_ when working in IPython or a Jupyter notebook.

In [4]:
help(round)

Help on built-in function round in module builtins:

round(...)
    round(number[, ndigits]) -> number
    
    Round a number to a given precision in decimal digits (default 0 digits).
    This returns an int when called with one argument, otherwise the
    same type as the number. ndigits may be negative.



When reading Python function documentation, remember that an argument shown inside square brackets is optional. We can see this here: round(number**[, ndigits]**).

In [5]:
# Round 1.68 to 1 decimal place
num = round(1.68, 1)

# Print num
num

1.7

In [6]:
# Round 1.68 with only 1 input
num01 = round(1.68)

# Print num01
num01

2

In the first example we passed 1 as the second input, so 1.68 was rounded to one decimal place: 1.7. In the second example Python knows that we didn't enter a second input, so it automatically rounds to the closest integer: 2.
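
The help text also notes that ndigits may be negative. As a small sketch of what that means (the numbers and variable names below are made up for illustration), a negative value rounds to the left of the decimal point. The last two lines show a behaviour worth knowing in Python 3: a value exactly halfway between two integers is rounded to the nearest even integer.

```python
# A negative ndigits rounds to tens, hundreds, thousands, ...
nearest_hundred = round(1234.567, -2)
nearest_thousand = round(45678, -3)

print(nearest_hundred)    # 1200.0 (a float in, a float out)
print(nearest_thousand)   # 46000  (an int in, an int out)

# Halfway cases go to the nearest even integer in Python 3
print(round(2.5))   # 2
print(round(3.5))   # 4
```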

You may be wondering at this point how you are supposed to know which functions are available. This is where your own initiative is your best resource. If you want to write a piece of code to perform a task, there is a very strong chance that another coder has already tried to do something similar, so for many common tasks a function already exists. To find it, search the web or the Python documentation.

Python offers lots of functions to help make your life as a data scientist easier. We have already seen several, **print(), type(), str(), int(), bool(), float()**. 

Let's look at some other Python built-in functions.

In [7]:
# Create two new lists of employee salaries
developers = [30000, 35000, 40000, 45000, 50000, 55000]
testers = [25000, 28000]

# Combine both lists
technical_team_salary = developers + testers

# Print combined lists
print(technical_team_salary)

[30000, 35000, 40000, 45000, 50000, 55000, 25000, 28000]


As you can see from the output above, **technical_team_salary** is unsorted. A Python list keeps its elements in the order in which they were added; it does not sort them for us. To resolve this there is, as you would expect, a function available: **sorted()**. Let's take a look at its documentation using **help(sorted)**.

In [8]:
help(sorted)

Help on built-in function sorted in module builtins:

sorted(iterable, /, *, key=None, reverse=False)
    Return a new list containing all items from the iterable in ascending order.
    
    A custom key function can be supplied to customize the sort order, and the
    reverse flag can be set to request the result in descending order.



We can see from the documentation that sorted() takes three arguments: iterable, key and reverse. Only iterable is required; key and reverse have default values and are passed by name. What happens if we add the reverse argument when sorting our list?

In [9]:
# Sort the list in descending order with reverse=True
technical_team_salary_sorted = sorted(technical_team_salary, reverse=True)

# Output new list
print(technical_team_salary_sorted)

[55000, 50000, 45000, 40000, 35000, 30000, 28000, 25000]


In [10]:
# Sort the list using sorted() with no optional arguments
technical_team_salary_sorted = sorted(technical_team_salary)

# Output list
print(technical_team_salary_sorted)

[25000, 28000, 30000, 35000, 40000, 45000, 50000, 55000]
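
Two details of the help text are worth a short sketch: sorted() returns a new list and leaves the original untouched, and the key argument lets us choose what to sort by. The `team` list of names below is hypothetical and exists only for this illustration; `len` is the built-in function that returns the length of each name.

```python
technical_team_salary = [30000, 35000, 40000, 45000, 50000, 55000, 25000, 28000]

# sorted() returns a new list; the original keeps its order
ascending = sorted(technical_team_salary)
print(ascending)               # [25000, 28000, 30000, 35000, 40000, 45000, 50000, 55000]
print(technical_team_salary)   # [30000, 35000, 40000, 45000, 50000, 55000, 25000, 28000]

# key tells sorted() which value to compare for each element
team = ["Samantha", "Al", "Tony"]   # hypothetical names
by_length = sorted(team, key=len)
print(by_length)               # ['Al', 'Tony', 'Samantha']
```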


Another useful built-in Python function is **len()**, which returns the number of elements in a list. Take a look.

In [11]:
# Use len() to get the number of elements in a list
len(technical_team_salary_sorted)

8

## Python Objects
Everything in Python is an object. But what is an object? In previous code samples we've created several variables with a mix of Python data types: strings, floats and lists. Each of these data structures is a Python object.

- A string is an object
- A float is an object
- A list is an object

These objects have specific types which we already know (str, float and list), and we've seen how each type holds different values.

Python objects also come with methods. Methods are functions which belong to specific objects.

For example:

- A Python object of type str has methods such as capitalize() and replace()
- A Python object of type float has methods such as is_integer() and conjugate()
- A Python object of type list has methods such as index() and count()

Python objects have specific methods depending on their type. You cannot apply the method capitalize() to an integer. This method belongs only to string objects.
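
One way to see this for ourselves is the built-in **hasattr()** function, which reports whether an object has a method (or other attribute) with a given name. The snippet below is a minimal sketch; the variable names and values are made up for this illustration.

```python
name = "tony"      # a str object
salary = 30000     # an int object
rate = 2.0         # a float object

# capitalize() belongs to strings, not to integers
name_has_capitalize = hasattr(name, "capitalize")
salary_has_capitalize = hasattr(salary, "capitalize")
print(name_has_capitalize)     # True
print(salary_has_capitalize)   # False

# Each type brings its own methods
print(salary.bit_length())     # 15: bits needed to store 30000
print(rate.is_integer())       # True: 2.0 has no fractional part
```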

Let's look at some examples by recreating our list of developers' salaries.

In [12]:
# Create a developers salary list
developers = [30000, 35000, 40000, 45000, 50000, 55000]

We know that developers is a Python object of type list.

In [13]:
type(developers)

list

We also know that lists have a method called index(). Let's find the index of the element 40000.

In [14]:
developers.index(40000)

2

This method would work the same no matter what data types are inside our list, such as strings, floats or other lists. What we have done is call the index() method on the developers list. What happens if we use the count() method?

In [15]:
# .count() method applied to developers list
developers.count(55000)

1

Here we've used the count() method to tell us how many of our employees are on a salary of 55000. As mentioned above, lists are not the only objects with methods: floats, integers, booleans and strings are all Python objects with their own specific methods. Let's look at some string methods.

In [16]:
# New variable: lead_developer
lead_developer = "tony"

In [17]:
# Apply the method .capitalize()
lead_developer.capitalize()

'Tony'

This method returns a string where the first letter is capitalized. We can also replace parts of a string with other parts. Let's call the method **replace()** with two inputs, "y" and "i".

In [18]:
# Call the .replace() method on lead_developer
lead_developer.replace("y", "i")

'toni'

As you can see the character "y" has been replaced with "i". Let's examine other string methods.

In [19]:
# Call the .upper() method on lead_developer
lead_developer.upper()

'TONY'

In [20]:
# Use .count() to count how many times a character appears in the variable lead_developer
lead_developer.count("t")

1

At this point it's worth reminding ourselves that not all methods are applicable to all Python types. Where a method can be applied to more than one data type, such as index(), which works on both strings and lists, it behaves differently depending on the type: on a list it looks for a matching element, while on a string it looks for a matching substring.
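
As a quick sketch of that difference, here is index() called on the string and the list used earlier in this section. The variable names `position_in_list`, `position_of_char` and `position_of_substring` are made up for this illustration.

```python
lead_developer = "tony"
developers = [30000, 35000, 40000, 45000, 50000, 55000]

# On a list, index() looks for an element equal to the argument
position_in_list = developers.index(40000)
print(position_in_list)        # 2

# On a string, index() looks for a substring, which can be
# a single character or several characters
position_of_char = lead_developer.index("n")
position_of_substring = lead_developer.index("ony")
print(position_of_char)        # 2
print(position_of_substring)   # 1
```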

Before we finish our discussion of Python's functions and methods, let's look at one more method: append(). We've just hired a new genius-level developer whose salary is 70000. How do we add this new salary to our developers list?

In [21]:
# Create a list of developers salaries
developers = [30000, 35000, 40000, 45000, 50000, 55000]

# Apply .append() method to developers list with one argument
developers.append(70000)

# Output developers
developers


[30000, 35000, 40000, 45000, 50000, 55000, 70000]

Our developers list has now been extended with the integer 70000.
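
Unlike capitalize() or replace(), which return a new string, append() changes the list itself and returns nothing useful. The sketch below shows this, and contrasts it with the + operator we used earlier, which builds a new list instead. The variable `with_another_hire` and the salary 80000 are made up for this illustration.

```python
developers = [30000, 35000, 40000, 45000, 50000, 55000]

# append() modifies the list in place and returns None
result = developers.append(70000)
print(result)        # None
print(developers)    # [30000, 35000, 40000, 45000, 50000, 55000, 70000]

# The + operator creates a new list and leaves developers unchanged
with_another_hire = developers + [80000]
print(len(developers))          # 7
print(len(with_another_hire))   # 8
```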

## Packages
In the last section we saw several examples of Python's functions and methods and how powerful and time-saving they can be. Python's functions and methods allow us to leverage not only other people's code but the code of built-in functions as well.

The idea of using code that is not our own brings us nicely to the topic of packages. In Python, a package is a collection of ready-to-use scripts that we can plug into our code at the right time to produce a specific output. Each script is a module which contains functions, methods and types aimed at solving a particular kind of problem.

There are thousands of packages available to help you do all sorts of amazing things with Python. Shortly we'll be using the matplotlib package which helps when creating visualizations of our data.

Not all packages are available in Python by default. We need to install and enable some. To use any package in your programs you will first need to install it on your computer and then tell your program to use that specific package.

To install a package we need to use PIP. PIP is a package management system for Python. Go to https://pip.pypa.io/en/stable/ for more information on installing PIP for your particular operating system. It's worth pointing out here that if you downloaded Python directly from the Python website and are using Python 2 >=2.7.9 or Python 3 >=3.4, then you already have PIP installed. If you installed Python as part of the Anaconda data science platform, PIP is normally included as well; if it is missing, you will need to install it.

Go to https://pip.pypa.io/en/stable/installing/ and download get-pip.py. Open up your terminal or command prompt and run _python3 get-pip.py_ (assuming you are using Python 3). Once PIP is installed you can use it to install any Python package that you need. For example, typing _pip3 install numpy_ into your terminal or command prompt installs the NumPy package. We type "python3" and "pip3" rather than "python" and "pip" at the command line to tell our system that we are working with version 3 of Python.
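
If you are not sure whether a package is already installed, one option is to ask Python itself. The snippet below is a minimal sketch using the standard library's importlib.util.find_spec(), which returns None when no top-level module with the given name can be found. The name "no_such_package_xyz" is a made-up example of a package that does not exist.

```python
import importlib.util

# find_spec() returns None if Python cannot find the module
package_name = "numpy"
spec = importlib.util.find_spec(package_name)

if spec is None:
    print(package_name, "is not installed; try: pip3 install", package_name)
else:
    print(package_name, "is installed")

# A standard library module is always found; a made-up name is not
json_spec = importlib.util.find_spec("json")
missing_spec = importlib.util.find_spec("no_such_package_xyz")
print(json_spec is not None)   # True
print(missing_spec is None)    # True
```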

With the necessary package successfully installed we can start to use it in our Python programs. To do this we need to import the package, or one of its modules, into our programs using the import statement: **import numpy**. When working with large datasets NumPy is a package that we will be using a lot. One commonly used NumPy function is **array**. When working with NumPy we cannot simply write "array([1,2,3])"; we need to tell Python that we want to make use of the array function from NumPy. Take a look.

In [22]:
# Import the NumPy package
import numpy

In [23]:
# This approach will not work
new_array = array([1,2,3])

NameError: name 'array' is not defined

In [24]:
new_array = numpy.array([1,2,3])
new_array

array([1, 2, 3])

Having to write "numpy" every time is quickly going to become tedious particularly as our programs become larger and more complex. Instead what we can do is import a package and then refer to it with a different, shorter name. Take a look:

In [25]:
import numpy as np

array2 = np.array([1,2,3])

array2

array([1, 2, 3])

We've just extended our import statement with "as np". You will find this is a very common practice in Python. 

There will be times when we need only one specific function from a package. Suppose we only ever want to use array from NumPy. Instead of writing "import numpy" we can write "from numpy import array".

In [26]:
from numpy import array

array3 = array([1,2,3])

array3

array([1, 2, 3])

Have you noticed the different way we called the array function? Instead of using numpy.array or np.array we simply wrote "array([1,2,3])". This might seem more convenient right now when our program is three lines long, but imagine 1000 or 2000 lines of code with your array at line 1456. Will you know then that your array is a NumPy array? What if you share your program with a colleague? How will she know that this array is a NumPy array? Every time you or your colleague need to check, you will have to scroll to the top of your program to see how you imported NumPy, which is not very efficient. You will quickly lose context as your programs get longer, and for this reason _"import numpy as np"_ is the preferred option.
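
The three import forms are not specific to NumPy. As a side-by-side sketch, the snippet below uses the standard library's math module (chosen here only because it needs no installation) to show that all three forms reach the same function; only the name we type differs. The alias `m` is an arbitrary choice for this illustration.

```python
# Form 1: import the module and use its full name
import math

# Form 2: import the module under a shorter alias
import math as m

# Form 3: import one function directly into our program
from math import sqrt

print(math.sqrt(16))   # 4.0
print(m.sqrt(16))      # 4.0
print(sqrt(16))        # 4.0

# All three names refer to the very same function
same_function = math.sqrt is m.sqrt is sqrt
print(same_function)   # True
```

With the first two forms, every call site shows where sqrt comes from; with the third, that information lives only in the import line at the top of the file.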