# Python for Data Analysis

Based on Greg Hamel's sessions: https://www.kaggle.com/hamelg/python-for-data-analysis-index

## 1. Getting started
### Shortcuts
* "A" to create a new cell above the current cell 
* "B" to create a new cell below the current cell 
* "M" to convert the current cell to Markdown 
* "Y" to convert the current cell to code 
* "DD" (press "D" twice) to delete the current cell

## 2. Python Arithmetic
### Operations
* Subtraction: 10 - 3
* Addition: 10 + 3
* Multiplication: 10 * 3
* Decimal division: 10 / 3
* Floor division: 10 // 3
* Exponentiation: 10 ** 3
* Modulus (produces the remainder you'd get when dividing two numbers): 100 % 75 = 25
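
The operators listed above can be tried together in one cell. This is a minimal sketch; the variable names (a, b, difference and so on) are illustrative and not part of the original session.

```python
# Apply each arithmetic operator from the list above
a, b = 10, 3

difference = a - b      # subtraction
total = a + b           # addition
product = a * b         # multiplication
quotient = a / b        # decimal division always returns a float
floored = a // b        # floor division rounds down to a whole number
power = a ** b          # exponentiation
remainder = 100 % 75    # modulus: remainder of the division

print(difference, total, product, floored, power, remainder)  # 7 13 30 3 1000 25
print(quotient)

# Floor division rounds toward negative infinity, not toward zero
negative_floored = -10 // 3
print(negative_floored)  # -4
```

Note that floor division and modulus follow the same rounding rule, so for negative numbers the results can differ from what truncation would give.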

### Math module

In [2]:
import math
# Natural logarithm (base e) of the argument
math.log(2.7182)

0.9999698965391098

In [3]:
# Add a second argument to specify the log base
math.log(100,10)

2.0

In [4]:
# Raise e to the power of its argument
math.exp(10)

22026.465794806718

In [5]:
# Take the square root of a number
math.sqrt(64)

8.0

In [7]:
# Absolute value of a number (abs() is a built-in function, not part of the math module)
abs(-30)

30

In [8]:
# Constant of pi
math.pi

3.141592653589793

In [9]:
# Round to nearest whole number
round(10.6)

11

In [10]:
# Add a second number to specify number of decimals
round(233.4678, 2)

233.47

In [11]:
# Round down to nearest whole number
math.floor(100.5)

100
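
Two rounding details are easy to miss in the cells above. The sketch below uses math.ceil(), which is not shown in the original session, as the counterpart of math.floor(), and shows how round() treats values that sit exactly halfway between two integers. The variable names are illustrative.

```python
import math

# math.ceil() rounds up to the nearest whole number
rounded_up = math.ceil(100.5)
print(rounded_up)        # 101

# math.floor() rounds toward negative infinity, also for negative numbers
rounded_down = math.floor(-100.5)
print(rounded_down)      # -101

# round() sends exact .5 ties to the nearest even integer
tie_low = round(10.5)
tie_high = round(11.5)
print(tie_low, tie_high)  # 10 12
```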

## 3. Basic data types


* Integers: whole-numbered numeric values (positive or negative)
* Floats: numbers with decimal values. The infinite values float("inf") and float("-inf") are also floats
* Booleans: True/False values that result from logical statements. bool(1) = True, bool(0) = False
* Strings: text values enclosed in single quotes ('') or double quotes ("")
* None: represents a missing value
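
The built-in type() function reports which of these data types a value belongs to. A short sketch, with an illustrative list of example values that is not part of the original session:

```python
# One example value for each basic data type listed above
examples = [12, 2.5, float("inf"), 10 > 3, "text", None]

type_names = []
for value in examples:
    # type(value).__name__ gives the short name of the value's type
    type_names.append(type(value).__name__)
    print(value, "->", type(value).__name__)

# bool() converts a value to True or False
print(bool(1), bool(0))  # True False
```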

## 4. Variables
A variable is a name you assign to a value or object.

In [1]:
x = 10
y = "Python is fun"
z = 144**0.5 == 12

print(x,y,z)

10 Python is fun True


In [3]:
# 'Tuple unpacking' extracts variables from a comma-separated sequence

x, y, z = (10, 20, 30)

print(x)
print(y)
print(z)

10
20
30


In [4]:
# Swap values of two variables

(x, y) = (y, x)

print(x)
print(y)

20
10


When you assign a variable in Python, the variable becomes a reference to a specific object in the computer's memory.
Reassigning a variable simply switches the reference to a different object in memory.
If the object a variable refers to is altered in place, every variable that refers to that object reflects the change.
All of the data types seen so far are immutable (they cannot be changed after they are created).
If an operation appears to alter an immutable object, it is actually creating an entirely new object rather than altering the existing one.

In [8]:
x = "Hello"
y = x        # Assign y the same object as x
y = y.lower() # Assign y the result of lower()

# Strings are immutable, so Python creates an entirely new string "hello" and stores it separately from "Hello"
# x and y now refer to different objects in memory

print(x)
print(y)

Hello
hello


Lists are a mutable data structure that can hold multiple objects. When you alter a list, Python doesn't create an entirely new list in memory; it changes the existing list object itself.

In [10]:
x = [1, 2, 3]  # Create a list
y = x          # Assign y the same object as x
y.append(4)    # Add 4 to the end of the list
print(x)
print(y)

# x and y have the same value, even though it may appear that 4 was only added to y

[1, 2, 3, 4]
[1, 2, 3, 4]
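
One way to check whether two names refer to the same object is the `is` operator, which is not used in the original session. The sketch below contrasts it with `==`, which compares contents; the extra list z is added for illustration only.

```python
x = [1, 2, 3]
y = x            # y refers to the same list object as x
z = [1, 2, 3]    # z is a separate list with equal contents

same_object = x is y
separate_object = x is z
equal_contents = x == z

print(same_object)      # True: one object, two names
print(separate_object)  # False: two distinct objects
print(equal_contents)   # True: the contents match

y.append(4)      # Mutates the object shared by x and y
print(x)         # [1, 2, 3, 4]
print(z)         # [1, 2, 3]
```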


## 5. Lists
### List basics
Lists are one of the most common sequence data types in Python.
* A list is a mutable, ordered collection of objects: **it can be altered after it is created**
* Lists are heterogeneous: they can hold objects of different types
* A list with no contents is an empty list, written []

In [11]:
my_list = ["Lesson", 5, "Is fun?", True]
print(my_list)

['Lesson', 5, 'Is fun?', True]


Construct a list by passing an iterable into the list() function. An **iterable** is an object you can loop through one item at a time (lists, tuples, strings...)

In [22]:
second_list = list("Life is awesome")
print(second_list)

['L', 'i', 'f', 'e', ' ', 'i', 's', ' ', 'a', 'w', 'e', 's', 'o', 'm', 'e']
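
The list() function accepts other iterables as well as strings. A brief sketch, assuming a tuple and a range object as inputs (range() does not appear in the original session) and illustrative variable names:

```python
from_tuple = list((1, 2, 3))   # Build a list from a tuple
from_range = list(range(5))    # Build a list from a range of numbers
empty = list()                 # Calling list() with no argument gives an empty list

print(from_tuple)  # [1, 2, 3]
print(from_range)  # [0, 1, 2, 3, 4]
print(empty)       # []
```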


In [13]:
# Add an item to an existing list with the list.append() method

second_list.append("!!!")
print(second_list)

['L', 'i', 'f', 'e', ' ', 'i', 's', ' ', 'a', 'w', 'e', 's', 'o', 'm', 'e', '!!!']


In [21]:
# Remove a matching item from a list with list.remove()
# It deletes the first matching item only

second_list.remove('i')
print(second_list)

['L', 'f', 'e', ' ', 'i', 's', ' ', 'a', 'w', 'e', 's', 'o', 'm', 'e', '!!!']


In [23]:
# Join two lists together with the + operator

combined_list = my_list + second_list
print(combined_list)

['Lesson', 5, 'Is fun?', True, 'L', 'f', 'e', ' ', 'i', 's', ' ', 'a', 'w', 'e', 's', 'o', 'm', 'e', '!!!']


In [24]:
# Add the items of a sequence to the end of an existing list with the list.extend() method
# A list is used here because a set such as {1, 2, 3} has no guaranteed order

combined_list.extend([1, 2, 3])
print(combined_list)

['Lesson', 5, 'Is fun?', True, 'L', 'f', 'e', ' ', 'i', 's', ' ', 'a', 'w', 'e', 's', 'o', 'm', 'e', '!!!', 1, 2, 3]


In [27]:
# Check min, max, length

num_list = [1, 3, 5, 7, 9]
print(len(num_list))  # Get length of list
print(min(num_list))  # Get min of list
print(max(num_list))  # Get max of list
print(sum(num_list))  # Get the sum of items in list
print(sum(num_list)/len(num_list))  # Get the mean

5
1
9
25
5.0


In [28]:
# Check whether a list contains a certain object with "in"

1 in num_list

True

In [29]:
# Check whether an object is not in a list with "not in"
8 not in num_list

True

In [30]:
# Count occurrences of an object in a list with list.count()
num_list.count(1)

1

In [34]:
# Reverse and sort list

new_list = [1, 2, 3, 4]
new_list.reverse()
print("Reversed list: ", new_list)

new_list.sort()
print("Sorted list: ", new_list)

Reversed list:  [4, 3, 2, 1]
Sorted list:  [1, 2, 3, 4]


### List indexing and slicing
* Lists are indexed: each position in the sequence has a corresponding number called the "index", used to look up the value at that position
* Indexing starts at 0: the first element of a sequence in Python has index 0

In [37]:
another_list = ["Hello","my","name","is","Micaela"]

print(another_list[0]) # Get the first object in list

Hello


In [38]:
# A negative index accesses items from the end of the list, counting backwards
print(another_list[-1])

Micaela


In [39]:
# IndexError happens when supplying an index outside of the list's range
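
The error described above can be reproduced as follows. This is a minimal sketch that redefines the five-item another_list; the try/except block is not part of the original session and is used only so the cell runs to completion instead of stopping at the error.

```python
another_list = ["Hello", "my", "name", "is", "Micaela"]

try:
    # Valid indexes are 0 to 4, so index 5 is out of range
    print(another_list[5])
except IndexError as error:
    message = str(error)
    print("IndexError:", message)  # IndexError: list index out of range
```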

In [40]:
# When a list contains other indexed objects (such as nested lists), you can supply additional indexes to get items within the nested objects

nested_list = [[0, 1, 2], [3, 4, 5]]
print(nested_list[0][1])

1


You can slice a list using the syntax [start:stop:step]:
* Start: starting index (included in the slice)
* Stop: ending index (not included in the slice)
* Step: controls how frequently to sample values along the slice. The default step size is one

In [41]:
another_list[0:3]

['Hello', 'my', 'name']

In [42]:
another_list[0:3:2]

['Hello', 'name']

In [47]:
# Leave the starting and ending index blank to slice from the beginning or up to the end of the list
print(another_list[:4]) # End index is 4
print(another_list[3:]) # Start index is 3
print(another_list[:])  # To slice the entire list
print(another_list[::-1]) # Slices and reverses the list

['Hello', 'my', 'name', 'is']
['is', 'Micaela']
['Hello', 'my', 'name', 'is', 'Micaela']
['Micaela', 'is', 'name', 'my', 'Hello']


In [50]:
# Replace an item by index and delete an item by index
another_list[3] = "new" # Replace the item at index 3
print(another_list)
del another_list[3]     # Delete the item at index 3
print(another_list)

['Hello', 'my', 'name', 'new', 'Micaela']
['Hello', 'my', 'name', 'Micaela']


In [52]:
# pop() function removes the final item in a list and returns it
another_list = ["Hello", "here", "I", "am"]

final_item = another_list.pop()
print(final_item)
print(another_list)

am
['Hello', 'here', 'I']


### Copying Lists

In [54]:
# Copy a list with the list.copy() method
list1 = [1, 2, 3]
list2 = list1.copy() # Copy list
list1.append(4)      # Add item to list 1
print("List1: ", list1)
print("List2: ", list2)

List1:  [1, 2, 3, 4]
List2:  [1, 2, 3]


List2 (the copy) is not affected by the append() call because list.copy() creates a separate list object. The copy is a **'shallow copy'**: a new list where each element refers to the same object as the element at the same position in the original list. Shallow copies can have *undesired effects* when they copy lists that contain mutable objects, like other lists.

In [56]:
list1 = [1, 2, 3]
list2 = ["The list", list1] # Nest list in another list
list3 = list2.copy()        # Shallow copy list2

print("Before appending to list1: ")
print("List2: ", list2)
print("List3: ", list3, "\n")

list1.append(4)
print("After appending to list1: ")
print("List2: ", list2)
print("List3: ", list3)

Before appending to list1: 
List2:  ['The list', [1, 2, 3]]
List3:  ['The list', [1, 2, 3]] 

After appending to list1: 
List2:  ['The list', [1, 2, 3, 4]]
List3:  ['The list', [1, 2, 3, 4]]


When list1 is altered, both list2 and its shallow copy list3 change, because each holds a reference to the same list1 object.
**When working with nested lists, you have to make a deep copy if you want to truly copy the nested objects in the original and avoid this behavior.**

In [57]:
import copy      # Load the copy module

list1 = [1, 2, 3]

list2 = ["List within a list", list1]   # Nest list1 into another list
list3 = copy.deepcopy(list2)            # Deep copy list 2

print("Before appending to list1:")
print("List2:", list2)
print("List3:", list3, "\n")

list1.append(4)                        # Add an item to list1
print("After appending to list1:")
print("List2:", list2)
print("List3:", list3)

Before appending to list1:
List2: ['List within a list', [1, 2, 3]]
List3: ['List within a list', [1, 2, 3]] 

After appending to list1:
List2: ['List within a list', [1, 2, 3, 4]]
List3: ['List within a list', [1, 2, 3]]


List3 isn't altered by the change in list1, because the deep copy holds its own copy of the nested list rather than a reference to list1.
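
The difference between the two kinds of copy can also be checked directly with the `is` operator, which tests whether two names refer to the same object. This is a sketch under that assumption; `is` is not used in the original session, and the names inner, outer, shallow and deep are illustrative.

```python
import copy

inner = [1, 2, 3]
outer = ["The list", inner]      # Nest inner in another list

shallow = outer.copy()           # Shallow copy: new outer list, shared nested list
deep = copy.deepcopy(outer)      # Deep copy: nested list is copied as well

print(shallow is outer)          # False: the outer list is a new object
print(shallow[1] is inner)       # True: the nested list is shared
print(deep[1] is inner)          # False: the nested list was copied too

inner.append(4)                  # Alter the original nested list
print(shallow[1])                # [1, 2, 3, 4]
print(deep[1])                   # [1, 2, 3]
```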

## 6. Tuples and Strings