# Python Introduction

Useful Resources
- [Python Docs 3.6](https://docs.python.org/3.6/)
- [Numpy Docs](https://docs.scipy.org/doc/numpy-1.15.1/reference/)
- [Pandas Docs](https://pandas-docs.github.io/pandas-docs-travis/index.html)
- [Python Cheatsheet I recommend](https://gto76.github.io/python-cheatsheet/)

The Python overview follows the structure of the tutorial in the [official documentation](https://docs.python.org/3.6/tutorial/introduction.html). Many of the descriptions and examples below are directly from this tutorial.

This course also offers free access to DataCamp, which provides interactive lessons on Python concepts.

In [2]:
"Hello World!"

'Hello World!'

In [3]:
print("Hello World!")

Hello World!


## Numbers


The integer numbers (e.g. `2`, `4`, `20`) have type `int`, the ones with a fractional part (e.g. `5.0`, `1.6`) have type `float`.

Some examples of basic operations:

In [4]:
100 + (12 * 3) - (4 / 10)

135.6

In [5]:
print("addition:", "2 + 2 =",             2 + 2) # addition
print("subtraction:", "7 - 2 =",          7 - 2) # subtraction
print("multiplication:", "2 * 2 =",       2 * 2) # multiplication
print("classic division:", "17 / 3 =",    17 / 3) # classic division
print("floor division:", "17 // 3 =",     17 // 3) # floor division
print("exponent:", "2 ** 4 =",            2 ** 4) # exponent
print("remainder or modulo:", "17 % 3 =", 17 % 3) # remainder or modulo

addition: 2 + 2 = 4
subtraction: 7 - 2 = 5
multiplication: 2 * 2 = 4
classic division: 17 / 3 = 5.666666666666667
floor division: 17 // 3 = 5
exponent: 2 ** 4 = 16
remainder or modulo: 17 % 3 = 2


Parentheses work as they normally do. If you are ever in doubt, write the parentheses explicitly:

In [6]:
print(5 + 4 * 1/2) # becomes 5 + 2
print((5 + 4) * (1/2)) # becomes 9 * 0.5

7.0
4.5


You can perform type conversion (similar to casting) between `int` and `float` like so:

In [7]:
print(int(3.0))
print(float(3))
print(int(3.1)) # note going from float to int drops the fractional part (truncates toward zero)

3
3.0
3
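
One detail worth checking with negative numbers: `int()` discards the fractional part (it truncates toward zero), which is not the same as rounding down. A minimal sketch comparing it with `math.floor` from the standard library:

```python
import math

# int() discards the fractional part, i.e. it truncates toward zero
truncated_pos = int(3.7)
truncated_neg = int(-3.7)

# math.floor() always rounds down, toward negative infinity
floored_neg = math.floor(-3.7)

print(truncated_pos)  # 3
print(truncated_neg)  # -3
print(floored_neg)    # -4
```

For positive numbers the two give the same result, so the difference only shows up below zero.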


The equal sign (`=`) is used to assign a value to a variable. A variable can be referenced and its value modified.

In [8]:
x = 5
y = 10
z = x + y
z

15

A variable must be assigned a value before it is referenced; otherwise Python raises a `NameError`.

In [9]:
a

NameError: name 'a' is not defined

A variable can exist without a meaningful value by assigning `None` (Python's null value) to it:

In [10]:
a = None
a

In [11]:
weight = 135 #pounds
height = 65 #inches
bmi = (weight / (height ** 2)) * 703
bmi

22.46272189349112

## Strings
Besides numbers, Python can also manipulate strings, which can be expressed in several ways. They can be enclosed in single quotes (`'...'`) or double quotes (`"..."`) with the same result. `\` can be used to escape quotes:

In [12]:
print('spam eggs')  # single quotes
print('doesn\'t')  # use \' to escape the single quote...
print("doesn't")  # ...or use double quotes instead

spam eggs
doesn't
doesn't


Strings can be concatenated (glued together) with the `+` operator, and repeated with `*`:

In [13]:
(3 * 'un') + 'ium'

'unununium'

Strings can be _indexed_ (subscripted), with the first character having index 0. There is no separate character type; a character is simply a string of size one:

In [14]:
word = 'Python'
print(word)
print(word[0])  # character in position 0
print(word[5])  # character in position 5

Python
P
n


Indices may also be negative numbers, to start counting from the right:

In [15]:
print(word)
print(word[-1])  # last character
print(word[-2])  # second-last character

Python
n
o


In addition to indexing, _slicing_ is also supported. While indexing is used to obtain individual characters, slicing allows you to obtain a substring:

In [16]:
print(word[0:2])  # characters from position 0 (included) to 2 (excluded)
print(word[2:5])  # characters from position 2 (included) to 5 (excluded)

Py
tho


Note how the start is always included, and the end always excluded. This makes sure that `s[:i] + s[i:]` is always equal to `s`. An omitted start index defaults to the beginning of the string, and an omitted end index defaults to its end.

In [17]:
print(word[:2] + word[2:])
print(word[:4] + word[4:])

Python
Python


In [18]:
len(word) # returns length of a string

6

In [19]:
3 + " blind mice" # the + operator does not work between int and str

TypeError: unsupported operand type(s) for +: 'int' and 'str'

In [20]:
print(type(3)) # the type of 3 is int
print(type(str(3))) # use the str(...) builtin function to cast an int into a str

str(3) + " blind mice" # now that both are type str, the + operator will concatenate them

<class 'int'>
<class 'str'>


'3 blind mice'

## Lists
Python knows a number of compound data types, used to group together other values. The most versatile is the list, which can be written as a list of comma-separated values (items) between square brackets. Lists might contain items of different types, but usually the items all have the same type.

Strings and lists are both sequence types, so they share many of the same behaviors.
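
As a rough illustration of those shared behaviors, the sketch below applies the same operations to a string and to a list. The variable names mirror the ones used elsewhere in this notebook, but the block stands on its own:

```python
word = 'Python'
squares = [1, 4, 9, 16, 25]

# len(), indexing, slicing, concatenation and membership tests
# are written the same way for both sequence types
print(len(word), len(squares))       # 6 5
print(word[0], squares[0])           # P 1
print(word[-2:], squares[-2:])       # on [16, 25]
print(word + '!', squares + [36])    # Python! [1, 4, 9, 16, 25, 36]
print('th' in word, 9 in squares)    # True True
```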

In [21]:
squares = [1, 4, 9, 16, 25]

print(len(squares)) # use len to count number of elements in list

5


In [22]:
print(squares[0])  # indexing returns the item
print(squares[-1]) 
print(squares[-3:])  # slicing returns a new list
print(squares + [36, 49]) # use + to concat lists
print(squares * 2) # use * to repeat a list

1
25
[9, 16, 25]
[1, 4, 9, 16, 25, 36, 49]
[1, 4, 9, 16, 25, 1, 4, 9, 16, 25]


In [23]:
print(squares)
print(squares[5])

[1, 4, 9, 16, 25]


IndexError: list index out of range

In [24]:
print(sum(squares)) # sum() adds all elements in list
print(max(squares)) # max() returns max element
print(min(squares)) # min() returns min element

55
25
1


In [25]:
# To create a list with 10 zeros
zeros = [0] * 10
print(zeros)
print(len(zeros))

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
10


In [26]:
zeros.append(36) # use append to add an element to the end of an existing list
print(zeros)

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 36]


In [27]:
zeros[-1] = "Hello"
print(zeros)
zeros[len(zeros) - 1] = 100
print(zeros)

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 'Hello']
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 100]


In [28]:
list_of_lists = [squares, zeros] # list can contain elements that are nested lists
print(list_of_lists)
list_of_lists[0][4] # use multiple brackets to reference the 5th element of the 1st nested list

[[1, 4, 9, 16, 25], [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 100]]


25

## Dictionaries

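It is best to think of a dictionary as a set of key: value pairs, with the requirement that the keys are unique. The Python 3.6 tutorial describes this set as unordered; from Python 3.7 onward, dictionaries preserve insertion order. Dictionaries are written with `{}` curly brackets containing comma-separated key: value pairs.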

In [29]:
new_dict = {} # creates an empty dict
new_dict

{}

In [30]:
area_codes = {"raliegh": 919, 
              "charlotte": 704, 
              "greensboro": 336,
              "richmond": 804,
              "washington": 202,
              "chesapeake": 757} # dict, list, etc. assignments can span multiple lines
area_codes

{'raliegh': 919,
 'charlotte': 704,
 'greensboro': 336,
 'richmond': 804,
 'washington': 202,
 'chesapeake': 757}

In [31]:
area_codes['charlotte'] # reference a key using brackets to return the associated value

704

In [32]:
area_codes['columbia'] # reference to a key not in the dict will throw an error

KeyError: 'columbia'
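
When a key might be missing, two common ways to avoid the `KeyError` are the `in` operator and `dict.get()`. The sketch below uses a small stand-in dictionary named `codes` (a hypothetical name, not part of this notebook's data):

```python
codes = {"charlotte": 704, "richmond": 804}  # small stand-in for area_codes

# check membership before indexing
if "columbia" in codes:
    print(codes["columbia"])
else:
    print("columbia is not in the dict")

# dict.get() returns None (or a supplied default) instead of raising KeyError
missing = codes.get("columbia")
fallback = codes.get("columbia", 0)
found = codes.get("charlotte")
print(missing, fallback, found)  # None 0 704
```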

In [33]:
area_codes['columbia'] = 803 # add a new key value pair by referencing a new key while assigning a value in the same line
area_codes['columbia']

803

In [34]:
keys = list(area_codes.keys()) # dict.keys() returns the keys in a dict_keys object, and list(...) casts it to a list
keys.sort() # sorts the list keys in place (i.e. sort(...) returns None and keys is now sorted)
print(keys)

values = list(area_codes.values()) #dict.values() returns the values, and again is cast to a list
print(sorted(values)) # builtin function sorted will return a sorted list

['charlotte', 'chesapeake', 'columbia', 'greensboro', 'raliegh', 'richmond', 'washington']
[202, 336, 704, 757, 803, 804, 919]


## Conditionals

In [35]:
3 < 4

True

In [36]:
3 > 4

False

In [37]:
3 == 4

False

In [38]:
print(squares)
print(4 in squares)
print(1 not in squares)

[1, 4, 9, 16, 25]
True
False


In [39]:
number = 14
if number % 2 == 0: # modulo 2 will indicate even or odd
    print('even')
else:
    print('odd')

even


In [86]:
x = int(input("Please enter an integer: ")) # takes user input, press Enter to submit input

if x < 0:
    x = 0
    print('Negative changed to zero')
elif x == 0:
    print('Zero')
elif x == 1:
    print('Single')
else:
    print('More')
print(x)

Please enter an integer: -14
Negative changed to zero
0


## Loops
Loops are used in Python (like other programming languages) to repeatedly execute a block of code. The loop contains a condition that determines how many times the loop executes. This could depend on a fixed value or on a variable that changes.

As you work with loops, you may write a condition that never becomes false, which creates an infinite loop. To stop it, go to `Toolbar > Kernel > Interrupt` to halt the execution, then fix your loop.

In [41]:
#let's count to ten!
i = 1
while i <= 10: # this condition is checked at the start of each iteration. If it is true, the loop body is executed.
    print(i)
    i += 1 # increment i so that it will eventually reach > 10 and stop the loop

1
2
3
4
5
6
7
8
9
10


In [42]:
i = 1
while True:
    if i > 10:
        break
    print(i)
    i += 1

1
2
3
4
5
6
7
8
9
10
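
Because a `while True` loop relies entirely on `break` to stop, a mistake in the exit condition produces the infinite loop described earlier. One defensive pattern, shown here only as a sketch, is to cap the number of iterations. The name `max_iterations` and its value are arbitrary choices for this example:

```python
max_iterations = 1000  # arbitrary safety cap chosen for this sketch
iterations = 0
value = 1

while value != 0:  # value only grows, so this condition alone never becomes False
    value += 1
    iterations += 1
    if iterations >= max_iterations:
        print("stopping after", iterations, "iterations")
        break
```

The cap does not fix the faulty condition; it only keeps the cell from running forever while you debug it.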


In [43]:
for element in squares:
    print(element)

1
4
9
16
25


In [44]:
print(range(10)) # a range is a sequence of numbers, useful for iterating in a for loop
print(list(range(10))) # cast as a list to view the numbers in the range
for i in range(10): # iterate through the range and print the values
    print(i)

range(0, 10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
0
1
2
3
4
5
6
7
8
9


In [45]:
for i in range(len(squares)):
    print(squares[i])

1
4
9
16
25


In [46]:
list(area_codes.items())

[('raliegh', 919),
 ('charlotte', 704),
 ('greensboro', 336),
 ('richmond', 804),
 ('washington', 202),
 ('chesapeake', 757),
 ('columbia', 803)]

In [47]:
for k,v in area_codes.items():
    print(k, 'has an area code of', v)

raliegh has an area code of 919
charlotte has an area code of 704
greensboro has an area code of 336
richmond has an area code of 804
washington has an area code of 202
chesapeake has an area code of 757
columbia has an area code of 803


## Challenge: Create list of powers of 2
Populate the variable `powers2` with the powers of 2 from 0 to 20 (i.e. [2\*\*0, 2\*\*1, ... , 2\*\*20])

In [48]:
powers2 = []
#write code here

print(powers2)
assert powers2 == [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072, 262144, 524288, 1048576]

[1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, 131072, 262144, 524288, 1048576]


## Challenge: Reverse a string
Given a string, store the reversed string in `reversed_str`

Example: input is "Hello World", output is "dlroW olleH"

In [49]:
input_str = 'Hello World'
reversed_str = ''

print(reversed_str)
assert reversed_str == 'dlroW olleH'

dlroW olleH


## Functions
Functions are a way to segment blocks of code into logical units with inputs and outputs. In Python, functions are declared with the `def` keyword, a name, and a list of arguments (which can be empty) followed by `:`. A function can omit the `return` statement, in which case it returns `None`.

In [50]:
def fib(n):
    a, b = 0, 1
    while a < n:
        print(a)
        temp = b + a
        a = b
        b = temp

In [51]:
fib(10)

0
1
1
2
3
5
8
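
`fib` prints its values but has no `return` statement, so a call to it evaluates to `None`. The sketch below contrasts printing with returning, using two hypothetical functions (`greet` and `make_greeting`) that are not part of the tutorial:

```python
def greet(name):
    print("Hello,", name)    # prints, but does not return anything

def make_greeting(name):
    return "Hello, " + name  # returns a value the caller can store

result = greet("Ada")        # prints: Hello, Ada
print(result)                # None
message = make_greeting("Ada")
print(message)               # Hello, Ada
```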


In [52]:
def fib2(n):
    results = []
    a, b = 0, 1
    while a < n:
        results.append(a)
        a, b = b, b + a
    return results

In [53]:
fib2(10)

[0, 1, 1, 2, 3, 5, 8]

In [54]:
def fib3(n=10):
    results = []
    a, b = 0, 1
    while a < n:
        results.append(a)
        a, b = b, b + a
    return results

In [55]:
fib3()

[0, 1, 1, 2, 3, 5, 8]
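
`fib3` differs from `fib2` only in its signature: `n=10` gives the argument a default value, used when the caller passes nothing. A small sketch of the same idea with a hypothetical `power` function:

```python
def power(base, exponent=2):
    # exponent falls back to 2 when the caller does not supply it
    return base ** exponent

default_result = power(5)                    # uses the default exponent
positional_result = power(5, 3)              # overrides the default by position
keyword_result = power(base=2, exponent=10)  # arguments can also be passed by name
print(default_result, positional_result, keyword_result)  # 25 125 1024
```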

### Challenge: Capitalize a word
Given a word, capitalize the first letter in the word. (Hint: use `string.upper()` to make all characters in `word` uppercase)

Example: "example" returns "Example"

Bonus: modify this function to work on sentences! (Hint: use `string.split(' ')` to split a string into a list of strings that were separated by a space)

Example: "a quick brown fox" returns "A Quick Brown Fox"

In [56]:
def capitalize(string):
    output = ''
    #your code here
    
    return output

assert capitalize('test') == 'Test'
assert capitalize('Hello') == 'Hello'
# assert capitalize('a quick brown fox') == 'A Quick Brown Fox' # uncomment this line to check bonus

### Challenge: Fizzbuzz
Write a function that takes an int (integer number) as input, and returns a string based on the [Fizzbuzz](https://en.wikipedia.org/wiki/Fizz_buzz) game.

The rules of Fizzbuzz:
- If the number is divisible by both 3 and 5 (i.e. by 15), return 'fizzbuzz'
- Otherwise, if the number is divisible by 3, return 'fizz'
- Otherwise, if the number is divisible by 5, return 'buzz'
- Otherwise, return the number that was passed in

Examples
- fizzbuzz(3) returns 'fizz'
- fizzbuzz(4) returns 4 or '4'
- fizzbuzz(5) returns 'buzz'
- fizzbuzz(13) returns 13 or '13'
- fizzbuzz(30) returns 'fizzbuzz'

In [57]:
def fizzbuzz(number):
    output = ''
    # write your code here

    return output


assert fizzbuzz(3) == 'fizz'
assert str(fizzbuzz(4)) == '4'
assert fizzbuzz(5) == 'buzz'
assert str(fizzbuzz(13)) == '13'
assert fizzbuzz(30) == 'fizzbuzz'

## NumPy

These descriptions and examples are from the [Official NumPy Quickstart](https://docs.scipy.org/doc/numpy/user/quickstart.html).

NumPy’s main object is the homogeneous multidimensional array. These arrays are called ndarrays (for n-dimensional arrays) and can perform many operations faster and more conveniently than normal Python lists. For now, this notebook covers some basic operations in NumPy, with the goal of understanding the structure of these ndarrays.

Before getting started with NumPy, the numpy package must be imported into this notebook environment. Running the `import` statement once loads the NumPy package, which can then be referenced throughout the notebook.

In [58]:
import numpy as np # using the as keyword allows numpy to be referenced as np
a = np.arange(15).reshape(3, 5) # arange creates a 1-d array from 0 to 14, and reshape turns it into a 3 row by 5 column array
a

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])

There are multiple attributes in an ndarray that provide information:

In [59]:
print('shape', a.shape) # the dimensions of the array
print('ndim', a.ndim) # the number of axes (dimensions) i.e. len(array.shape)
print('dtype name', a.dtype.name) # describes the types of elements in the array
print('size', a.size) # number of elements in the array
print('type', type(a)) # type of the array (which is ndarray)

shape (3, 5)
ndim 2
dtype name int64
size 15
type <class 'numpy.ndarray'>


Here we create an ndarray from a Python `list` and from a `np.arange` and perform simple operations on them:

In [60]:
a = np.array( [20,30,40,50] ) # here an ndarray is created from a Python list using np.array()
print('a', a)
b = np.arange( 4 ) # here an ndarray is made from arange
print('b', b)

c = a-b 
print('c', c)
print()
print('b**2', b**2)
print()
print('10*np.sin(a)', 10*np.sin(a))
print()
print('a<35', a<35)


a [20 30 40 50]
b [0 1 2 3]
c [20 29 38 47]

b**2 [0 1 4 9]

10*np.sin(a) [ 9.12945251 -9.88031624  7.4511316  -2.62374854]

a<35 [ True  True False False]
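
For comparison, the same kind of elementwise arithmetic on a plain Python list needs an explicit loop or comprehension, which is part of what makes ndarrays more convenient. A minimal sketch (the variable names are chosen for this example only):

```python
import numpy as np

values = [20, 30, 40, 50]

# plain list: operate on each element explicitly
doubled_list = [v * 2 for v in values]

# ndarray: the operation is applied to every element at once
doubled_array = np.array(values) * 2

print(doubled_list)   # [40, 60, 80, 100]
print(doubled_array)  # [ 40  60  80 100]
print(values * 2)     # list * int repeats the list instead of doubling its elements
```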


In [61]:
print(np.zeros( [3,4] )) # create an ndarray filled with zeros with the size provided (3 rows, 4 columns)
print(np.ones( [2,3] )) # create an ndarray filled with ones with the size provided (2 rows, 3 columns)

[[0. 0. 0. 0.]
 [0. 0. 0. 0.]
 [0. 0. 0. 0.]]
[[1. 1. 1.]
 [1. 1. 1.]]


In [62]:
a = np.random.random((2,3)) # creates an ndarray filled with random floats from 0 (inclusive) to 1 (exclusive)
print(a)

print('sum', a.sum())
print('min', a.min())
print('max', a.max())

[[0.78088254 0.6570408  0.32869593]
 [0.45879015 0.13661448 0.70117425]]
sum 3.063198143086362
min 0.1366144807616878
max 0.7808825354082867


In [63]:
b = np.arange(12).reshape(3,4)
print(b)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


In [64]:
print(b.sum(axis=0))            # sum of each column

[12 15 18 21]


In [65]:
print(b.min(axis=1))            # min of each row

[0 4 8]


In [66]:
print(b.cumsum(axis=1))         # cumulative sum along each row

[[ 0  1  3  6]
 [ 4  9 15 22]
 [ 8 17 27 38]]


NumPy provides some universal functions that operate elementwise on arrays:

In [67]:
B = np.arange(3)
print(B)

print('exp', np.exp(B)) # performs e**element

print('sqrt', np.sqrt(B)) # take square root of each element

C = np.array([2., -1., 4.]) 
print(np.add(B, C)) # add two ndarrays


[0 1 2]
exp [1.         2.71828183 7.3890561 ]
sqrt [0.         1.         1.41421356]
[2. 0. 6.]


To convert an ndarray to a Python list, use `array.tolist()`:

In [68]:
a = np.arange(6).reshape([2,3])
print(a)
print(a.tolist())

[[0 1 2]
 [3 4 5]]
[[0, 1, 2], [3, 4, 5]]


## Pandas Dataframes

A pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. It is the primary pandas data structure and can be thought of as a dict-like container for Series objects.

Descriptions and examples are inspired by the [Official 10 minutes to pandas](https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html) guide.
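
To make the "dict-like container for Series" description concrete, here is a small sketch that builds a DataFrame from a dictionary and shows label alignment in arithmetic. The data and the names `people`, `s1` and `s2` are made up for illustration:

```python
import pandas as pd

# each dict key becomes a column label; each column may have its own type
people = pd.DataFrame({
    "age": [30, 33, 35],
    "job": ["unemployed", "services", "management"],
})

print(type(people["age"]))  # selecting one column gives a pandas Series
print(people.shape)         # (3, 2): 3 rows, 2 columns

# arithmetic aligns on index labels, not on position
s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])
aligned = s1 + s2           # labels present in only one Series yield NaN
print(aligned)
```

Only the labels `b` and `c` exist in both Series, so only those two positions hold a numeric sum.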


First, we will import our data set. The data set we will be using is from [UCI](https://archive.ics.uci.edu/ml/datasets/Bank+Marketing).

The data relates to direct marketing campaigns (phone calls) of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit (variable `y`).

For now, we will just explore the data set to practice using DataFrames.

In [69]:
import pandas as pd

banking_data_file = project.get_file('UCI Bank Marketing Data Set.csv') # here the data asset is loaded into the notebook memory
df = pd.read_csv(banking_data_file) # pandas reads in the csv file and creates a dataframe
df.head() # look at the headers and first 5 rows

Unnamed: 0,age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y
0,30,unemployed,married,primary,no,1787,no,no,cellular,19,oct,79,1,-1,0,unknown,no
1,33,services,married,secondary,no,4789,yes,yes,cellular,11,may,220,1,339,4,failure,no
2,35,management,single,tertiary,no,1350,yes,no,cellular,16,apr,185,1,330,1,failure,no
3,30,management,married,tertiary,no,1476,yes,yes,unknown,3,jun,199,4,-1,0,unknown,no
4,59,blue-collar,married,secondary,no,0,yes,no,unknown,5,may,226,1,-1,0,unknown,no


Attribute Information:

Input variables:
Bank client data:
1. age (numeric)
2. job : type of job (categorical: "admin.","unknown","unemployed","management","housemaid","entrepreneur","student",
"blue-collar","self-employed","retired","technician","services")
3. marital : marital status (categorical: "married","divorced","single"; note: "divorced" means divorced or widowed)
4. education (categorical: "unknown","secondary","primary","tertiary")
5. default: has credit in default? (binary: "yes","no")
6. balance: average yearly balance, in euros (numeric)
7. housing: has housing loan? (binary: "yes","no")
8. loan: has personal loan? (binary: "yes","no")

Related to the last contact of the current campaign:
9. contact: contact communication type (categorical: "unknown","telephone","cellular")
10. day: last contact day of the month (numeric)
11. month: last contact month of year (categorical: "jan", "feb", "mar", ..., "nov", "dec")
12. duration: last contact duration, in seconds (numeric)

Other attributes:
13. campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
14. pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric, -1 means client was not previously contacted)
15. previous: number of contacts performed before this campaign and for this client (numeric)
16. poutcome: outcome of the previous marketing campaign (categorical: "unknown","other","failure","success")

Output variable (desired target):
17. y - has the client subscribed to a term deposit? (binary: "yes","no")

In [70]:
df.tail(3) # show the last 3 rows

Unnamed: 0,age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y
4518,57,technician,married,secondary,no,295,no,no,cellular,19,aug,151,11,-1,0,unknown,no
4519,28,blue-collar,married,secondary,no,1137,no,no,cellular,6,feb,129,4,211,3,other,no
4520,44,entrepreneur,single,tertiary,no,1136,yes,yes,cellular,3,apr,345,2,249,7,other,no


In [71]:
print(df.index) # view index
print(df.columns) # view column names

RangeIndex(start=0, stop=4521, step=1)
Index(['age', 'job', 'marital', 'education', 'default', 'balance', 'housing',
       'loan', 'contact', 'day', 'month', 'duration', 'campaign', 'pdays',
       'previous', 'poutcome', 'y'],
      dtype='object')


In [72]:
df.describe()

Unnamed: 0,age,balance,day,duration,campaign,pdays,previous
count,4521.0,4521.0,4521.0,4521.0,4521.0,4521.0,4521.0
mean,41.170095,1422.657819,15.915284,263.961292,2.79363,39.766645,0.542579
std,10.576211,3009.638142,8.247667,259.856633,3.109807,100.121124,1.693562
min,19.0,-3313.0,1.0,4.0,1.0,-1.0,0.0
25%,33.0,69.0,9.0,104.0,1.0,-1.0,0.0
50%,39.0,444.0,16.0,185.0,2.0,-1.0,0.0
75%,49.0,1480.0,21.0,329.0,3.0,-1.0,0.0
max,87.0,71188.0,31.0,3025.0,50.0,871.0,25.0


`DataFrame.to_numpy()` gives a NumPy representation of the underlying data. Note that this can be an expensive operation when your DataFrame has columns with different data types, which comes down to a fundamental difference between pandas and NumPy: **NumPy arrays have one dtype for the entire array, while pandas DataFrames have one dtype per column.**

In [73]:
df.to_numpy()

array([[30, 'unemployed', 'married', ..., 0, 'unknown', 'no'],
       [33, 'services', 'married', ..., 4, 'failure', 'no'],
       [35, 'management', 'single', ..., 1, 'failure', 'no'],
       ...,
       [57, 'technician', 'married', ..., 0, 'unknown', 'no'],
       [28, 'blue-collar', 'married', ..., 3, 'other', 'no'],
       [44, 'entrepreneur', 'single', ..., 7, 'other', 'no']],
      dtype=object)
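
The `dtype=object` in the output above is the single dtype NumPy falls back to when columns of different types must share one array. A small sketch with made-up data (the names `mixed` and `numeric` are hypothetical) shows the contrast:

```python
import pandas as pd

mixed = pd.DataFrame({"age": [30, 33], "job": ["unemployed", "services"]})
numeric = pd.DataFrame({"age": [30, 33], "balance": [1787, 4789]})

print(mixed.dtypes)              # pandas keeps one dtype per column
print(mixed.to_numpy().dtype)    # object: one dtype that can hold both ints and strings
print(numeric.to_numpy().dtype)  # an integer dtype, since every column holds integers
```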

The DataFrame can be sorted along an axis (by row or column labels) or by the values of a column:

In [74]:
df.sort_index(axis=1, ascending=False).head(10) # sorts the df by the index of the columns (since axis=1) descending (since ascending=False)

Unnamed: 0,y,previous,poutcome,pdays,month,marital,loan,job,housing,education,duration,default,day,contact,campaign,balance,age
0,no,0,unknown,-1,oct,married,no,unemployed,no,primary,79,no,19,cellular,1,1787,30
1,no,4,failure,339,may,married,yes,services,yes,secondary,220,no,11,cellular,1,4789,33
2,no,1,failure,330,apr,single,no,management,yes,tertiary,185,no,16,cellular,1,1350,35
3,no,0,unknown,-1,jun,married,yes,management,yes,tertiary,199,no,3,unknown,4,1476,30
4,no,0,unknown,-1,may,married,no,blue-collar,yes,secondary,226,no,5,unknown,1,0,59
5,no,3,failure,176,feb,single,no,management,no,tertiary,141,no,23,cellular,2,747,35
6,no,2,other,330,may,married,no,self-employed,yes,tertiary,341,no,14,cellular,1,307,36
7,no,0,unknown,-1,may,married,no,technician,yes,secondary,151,no,6,cellular,2,147,39
8,no,0,unknown,-1,may,married,no,entrepreneur,yes,tertiary,57,no,14,unknown,2,221,41
9,no,2,failure,147,apr,married,yes,services,yes,primary,313,no,17,cellular,1,-88,43


In [75]:
df.sort_values(by='duration', ascending=False).head(10) # sorts the df by the value of the column duration (since by='duration') descending (since ascending=False)

Unnamed: 0,age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y
568,59,unemployed,married,primary,no,0,no,no,cellular,30,jan,3025,2,-1,0,unknown,no
3673,36,entrepreneur,married,tertiary,no,3057,no,no,unknown,16,jun,2769,4,-1,0,unknown,yes
4123,47,blue-collar,divorced,primary,no,126,yes,no,unknown,3,jun,2456,2,-1,0,unknown,yes
980,43,management,divorced,tertiary,no,388,yes,no,unknown,8,may,2087,2,-1,0,unknown,yes
3853,54,technician,married,secondary,no,-315,no,yes,cellular,10,jul,2029,1,-1,0,unknown,yes
2875,29,technician,single,secondary,no,778,yes,no,unknown,6,jun,1994,2,-1,0,unknown,no
2827,49,services,married,secondary,no,320,no,no,telephone,9,feb,1971,4,-1,0,unknown,yes
125,34,self-employed,single,tertiary,no,462,no,no,cellular,21,aug,1877,3,-1,0,unknown,yes
429,40,management,married,tertiary,no,542,yes,no,cellular,20,nov,1816,1,-1,0,unknown,no
51,37,technician,single,secondary,no,228,yes,no,cellular,20,aug,1740,2,-1,0,unknown,no


Individual rows or columns (or subsets of these) can be selected from a DataFrame:

In [76]:
df['y'] # also df.y

0        no
1        no
2        no
3        no
4        no
5        no
6        no
7        no
8        no
9        no
10       no
11       no
12       no
13      yes
14       no
15       no
16       no
17       no
18       no
19       no
20       no
21       no
22       no
23       no
24       no
25       no
26       no
27       no
28       no
29       no
       ... 
4491     no
4492     no
4493     no
4494    yes
4495     no
4496     no
4497     no
4498     no
4499     no
4500     no
4501     no
4502     no
4503    yes
4504    yes
4505    yes
4506     no
4507     no
4508     no
4509     no
4510     no
4511    yes
4512     no
4513     no
4514     no
4515     no
4516     no
4517     no
4518     no
4519     no
4520     no
Name: y, Length: 4521, dtype: object

In [77]:
df[10:20] # slices by rows

Unnamed: 0,age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y
10,39,services,married,secondary,no,9374,yes,no,unknown,20,may,273,1,-1,0,unknown,no
11,43,admin.,married,secondary,no,264,yes,no,cellular,17,apr,113,2,-1,0,unknown,no
12,36,technician,married,tertiary,no,1109,no,no,cellular,13,aug,328,2,-1,0,unknown,no
13,20,student,single,secondary,no,502,no,no,cellular,30,apr,261,1,-1,0,unknown,yes
14,31,blue-collar,married,secondary,no,360,yes,yes,cellular,29,jan,89,1,241,1,failure,no
15,40,management,married,tertiary,no,194,no,yes,cellular,29,aug,189,2,-1,0,unknown,no
16,56,technician,married,secondary,no,4073,no,no,cellular,27,aug,239,5,-1,0,unknown,no
17,37,admin.,single,tertiary,no,2317,yes,no,cellular,20,apr,114,1,152,2,failure,no
18,25,blue-collar,single,primary,no,-221,yes,no,unknown,23,may,250,1,-1,0,unknown,no
19,31,services,married,secondary,no,132,no,no,cellular,7,jul,148,1,152,1,other,no


In [78]:
df.iloc[10] # selection by position, similar to indexing a list

age                 39
job           services
marital        married
education    secondary
default             no
balance           9374
housing            yes
loan                no
contact        unknown
day                 20
month              may
duration           273
campaign             1
pdays               -1
previous             0
poutcome       unknown
y                   no
Name: 10, dtype: object
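
Rows can also be selected by condition using a boolean mask, a technique covered in the "10 minutes to pandas" guide. The sketch below uses a small made-up DataFrame named `clients` rather than the banking data:

```python
import pandas as pd

# small stand-in DataFrame; the values are made up for illustration
clients = pd.DataFrame({
    "age": [30, 45, 52, 28],
    "housing": ["yes", "no", "yes", "yes"],
})

has_housing = clients["housing"] == "yes"  # boolean Series, one value per row
print(clients[has_housing])                # keeps only rows where the mask is True

# combine conditions with & (and) or | (or); each condition needs its own parentheses
older_with_housing = clients[(clients["age"] > 29) & (clients["housing"] == "yes")]
print(len(older_with_housing))             # 2
```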

You can iterate through a DataFrame, but modifying the DataFrame while iterating is not recommended.

In [79]:
# calculating the average age by iterating over the rows
print('expected average:', df.age.mean()) # calling mean() on a series will calculate its mean

age_sum = 0
age_count = 0

for index, row in df.iterrows():
    age_sum += row['age'] # notice values in row can be referenced by column name
    age_count += 1

print('derived average:', age_sum/age_count)

expected average: 41.17009511170095
derived average: 41.17009511170095
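
Rather than modifying a DataFrame inside a loop, it is usually simpler to operate on whole columns at once. A minimal sketch with made-up data (the `accounts` DataFrame and the `balance_thousands` column are hypothetical):

```python
import pandas as pd

# made-up data for illustration
accounts = pd.DataFrame({"balance": [1787, 4789, 1350], "age": [30, 33, 35]})

# column-wise operations act on every row at once, with no explicit loop
accounts["balance_thousands"] = accounts["balance"] / 1000
mean_age = accounts["age"].mean()

print(accounts)
print(mean_age)
```

This is the same idea as `df.age.mean()` in the cell above: the loop happens inside pandas instead of in your own code.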


#### Challenge: Determine the percentage of bank clients who subscribed to a term deposit

Example answer: `term_percentage` is 0.56 (expressed as a fraction between 0 and 1, not as a number out of 100)

In [80]:
term_percentage = 0.00

#write your code here

assert (term_percentage < 0.12 and term_percentage > 0.11)

#### Challenge: Count the number of clients who have a personal loan and subscribed to a term deposit

Count the number of rows that have `yes` in both the `loan` and `y` columns

Example: `double_count` is 126

In [83]:
double_count = 0

#write your code here

assert double_count == 43

#### Bonus Challenge: Create a dictionary that provides the percent of people in different age groups who subscribed to a term deposit

Example answer: `age_dict = {
    '0-10':None,
    '20-29':0.5,
    '30-39':0.34,
    ...,
    '90-99':0.07
}`

In [85]:
age_dict = {}