# Intro to Python

As you might expect, Python supports all the common mathematical operations. One difference from some other languages is the exponentiation operator: where R, for example, uses the ^ symbol, Python uses **. What you would write as 4 ^ 2 in R is written as 4 ** 2 in Python. Be careful, because ^ is still a valid Python operator (bitwise XOR), so 4 ^ 2 runs without error but returns 6.

In [2]:
4 ** 2

16
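
As a quick illustration of the other arithmetic operators, here is a minimal sketch. The variable names are chosen only for this example.

```python
# Basic arithmetic operators in Python
total = 7 + 3         # addition
difference = 7 - 3    # subtraction
product = 7 * 3       # multiplication
quotient = 7 / 2      # true division always returns a float
floor_q = 7 // 2      # floor division drops the fractional part
remainder = 7 % 2     # modulo (remainder of the division)
power = 4 ** 2        # exponentiation

print(total, difference, product, quotient, floor_q, remainder, power)
```

Note that / returns a float even when both operands are integers, while // keeps the result an integer when both operands are integers.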

## Variables

To create a variable we do not need to declare its data type. We simply assign a value to a name with "=". Note that variable names are case sensitive, so x and X are two different variables.

In [3]:
x = 5

print(x)

5
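
To make the case-sensitivity point concrete, here is a small sketch. The second variable, X, is introduced only for illustration.

```python
# Names that differ only in case refer to different variables
x = 5
X = "five"

print(x)
print(X)

# No type was declared, so the same name can later hold another type
x = 2.5
print(type(x))
```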


To find out the type of a value or a variable you can use the type() function. 

In [4]:
type(x)

int

## Type Conversion

To convert data you use one of the following functions: 
int()
float()
bool()
str()
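
A minimal sketch of these conversion functions in use. The values and variable names are made up for the example.

```python
# str() is needed to join a number to a string
savings = 100
message = "I have " + str(savings) + " dollars"
print(message)

# float() parses a string, int() truncates a float toward zero
pi_string = "3.1415926"
pi_float = float(pi_string)
pi_int = int(pi_float)
print(pi_float, pi_int)

# bool() maps 0 and empty values to False, everything else to True
print(bool(0), bool(1), bool(""), bool("text"))
```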

## Lists

To create a list of values we use square brackets, []. A Python list can contain values of any data type, including mixed data types and even other lists (sublists).

In [5]:
# area variables (in square meters)
hall = 11.25
kit = 18.0
liv = 20.0
bed = 10.75
bath = 9.50

# Create list areas
areas = [hall, kit, liv, bed, bath]

# Print areas
print(areas)


[11.25, 18.0, 20.0, 10.75, 9.5]


A list can contain any Python type. Although it's not really common, a list can also contain a mix of Python types including strings, floats, booleans, etc.

The printout of the previous exercise wasn't really satisfying. It's just a list of numbers representing the areas, but you can't tell which area corresponds to which part of your house.

In [6]:
# area variables (in square meters)
hall = 11.25
kit = 18.0
liv = 20.0
bed = 10.75
bath = 9.50

# Adapt list areas
areas = ["hallway", hall,"kitchen", kit, "living room", liv,"bedroom", bed, "bathroom", bath]

# Print areas
print(areas)


['hallway', 11.25, 'kitchen', 18.0, 'living room', 20.0, 'bedroom', 10.75, 'bathroom', 9.5]


As a data scientist, you'll often be dealing with a lot of data, and it will make sense to group some of this data.

Instead of creating a flat list containing strings and floats, representing the names and areas of the rooms in your house, you can create a list of lists. 

In [7]:
# area variables (in square meters)
hall = 11.25
kit = 18.0
liv = 20.0
bed = 10.75
bath = 9.50

# house information as list of lists
house = [["hallway", hall],
         ["kitchen", kit],
         ["living room", liv],
         ["bedroom", bed],
         ["bathroom", bath]
         ]

# Print out house
print(house)

# Print out the type of house
print(type(house))

[['hallway', 11.25], ['kitchen', 18.0], ['living room', 20.0], ['bedroom', 10.75], ['bathroom', 9.5]]
<class 'list'>


## Subsetting lists

Subsetting Python lists is a piece of cake. We put the index of the element we want inside square brackets. Python is zero-based, that is, the first element in the list has index 0.
You can also use negative indexes: the last element in the list has index -1, the one before it has index -2, and so on.

In [8]:
# Create the areas list
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]

# Print out second element from areas
print(areas[1])

# Print out last element from areas
print(areas[-1])

# Print out the area of the living room
print(areas[5])

11.25
9.5
20.0


After you've extracted values from a list, you can use them to perform additional calculations. 

In [9]:
# Create the areas list
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]

# Sum of kitchen and bedroom area: eat_sleep_area
eat_sleep_area = areas[3] + areas[7]

# Print the variable eat_sleep_area
print(eat_sleep_area)

28.75


Selecting single values from a list is just one part of the story. It's also possible to slice your list, which means selecting multiple elements at once. To do this we use ":" to specify a range in the form start:end. What may look odd at first is that the start of the range is inclusive, but the end is not. That is, the range 1:5 returns the elements at positions 1, 2, 3 and 4, leaving out the element at position 5.
If you omit one of the indexes, Python slices from the beginning or up to the end of the list. :4 returns the elements at positions 0 to 3, and 4: returns everything from position 4 up to and including the last element.

In [None]:
# Create the areas list
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]

# Use slicing to create downstairs
downstairs = areas[:6]

# Use slicing to create upstairs
upstairs = areas[6:]

# Print out downstairs and upstairs
print(downstairs)
print(upstairs)

You saw before that a Python list can contain practically anything; even other lists! To subset lists of lists, you can use the same technique as before: square brackets.
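
As a sketch of subsetting a list of lists, the example below reuses the house data from earlier. The first pair of brackets selects a sublist and the second pair selects an element inside it. The names kitchen, kitchen_area and last_room_name are illustrative.

```python
house = [["hallway", 11.25],
         ["kitchen", 18.0],
         ["living room", 20.0],
         ["bedroom", 10.75],
         ["bathroom", 9.50]]

# One index returns a whole sublist
kitchen = house[1]

# Two indexes: first the sublist, then the element inside it
kitchen_area = house[1][1]

# Negative indexes work at every level
last_room_name = house[-1][0]

print(kitchen)
print(kitchen_area)
print(last_room_name)
```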

## List Manipulation

1. Change list elements
2. Add elements to lists
3. Remove elements from lists

To change list elements, select them by index or slice and assign the new values with "=".
To add elements we can use the "+" sign to concatenate two lists, which creates a new list.
To remove elements from a list we use the del statement. For instance, del myList[2] removes the item at index 2 from the list (del(myList[2]) also works, but del is a statement, not a function). Note that the indexes of all the elements after the removed one shift down by one.

In [19]:
# Create the areas list
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]

# Correct the bathroom area
areas[-1] = 10.50

# Change "living room" to "chill zone"
areas[4] = "chill zone"

print(areas)

# Create the areas list and make some changes
areas = ["hallway", 11.25, "kitchen", 18.0, "chill zone", 20.0,
         "bedroom", 10.75, "bathroom", 10.50]

# Add poolhouse data to areas, new list is areas_1
areas_1 = areas + ["poolhouse", 24.5]

print(areas_1)



['hallway', 11.25, 'kitchen', 18.0, 'chill zone', 20.0, 'bedroom', 10.75, 'bathroom', 10.5]
['hallway', 11.25, 'kitchen', 18.0, 'chill zone', 20.0, 'bedroom', 10.75, 'bathroom', 10.5, 'poolhouse', 24.5]
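
The cell above covers changing and adding elements. Removing elements could look like the following sketch, which assumes the poolhouse entries sit at the end of the list.

```python
areas = ["hallway", 11.25, "kitchen", 18.0, "chill zone", 20.0,
         "bedroom", 10.75, "bathroom", 10.50, "poolhouse", 24.5]

# Remove the last two elements (the poolhouse name and its area)
del areas[-2:]

print(areas)
```

Deleting a slice removes both elements in one step. Deleting them one at a time would require care, because the indexes shift after each removal.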


One important detail is that in Python, when you assign one variable to another, you only copy the reference to the list's location in memory, not the values themselves. Both names then point to the same list, so a change made through one is visible through the other.
Take a look at this code: 

In [11]:
x = ["a", "b", "c"]
y = x

print(x)
print(y)

y[1] = "z"
print(y)
print(x)

['a', 'b', 'c']
['a', 'b', 'c']
['a', 'z', 'c']
['a', 'z', 'c']


If you want to create an actual copy of the list in memory, you can use the list() function or take a slice of the whole list with [:]. Both create a shallow copy, so any nested lists inside it are still shared:

In [15]:
y = list(x)
z = x[:]

print(y)
print(z)

y[2] = "a"
z[2] = "d"

print(x)
print(y)
print(z)

['a', 'z', 'c']
['a', 'z', 'c']
['a', 'z', 'c']
['a', 'z', 'a']
['a', 'z', 'd']


## Functions

A function is a piece of reusable code that solves a particular task. As expected, Python comes with many built-in functions that we will learn over time.
Some examples are str(), int() and float(), which we have already seen, and others like max() and round(). Another very useful function is help(), which displays the documentation for a function.

In [20]:
help(round)

Help on built-in function round in module builtins:

round(number, ndigits=None)
    Round a number to a given precision in decimal digits.
    
    The return value is an integer if ndigits is omitted or None.  Otherwise
    the return value has the same type as the number.  ndigits may be negative.
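
Following the help text, here is a small sketch of round() with and without ndigits, alongside max() and len(). The input values are illustrative.

```python
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

largest = max(areas)   # largest value in the list
count = len(areas)     # number of elements in the list

# With ndigits omitted, round() returns an integer
rounded_int = round(10.75)

# With ndigits given, the result keeps the type of the input
rounded_one = round(1.68, 1)

print(largest, count, rounded_int, rounded_one)
print(type(rounded_int), type(rounded_one))
```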



In [21]:
# Create lists first and second
first = [11.25, 18.0, 20.0]
second = [10.75, 9.50]

# Paste together first and second: full
full = first + second

# Sort full in descending order: full_sorted
full_sorted = sorted(full, reverse=True)

# Print out full_sorted
print(full_sorted)

[20.0, 18.0, 11.25, 10.75, 9.5]


## Methods

Python objects also have functions attached to them, called methods, which you call with dot notation. For instance, the list type has a method called index() that returns the index of the first occurrence of a value, and the string type has methods such as capitalize() and replace().

In [22]:
# string to experiment with: place
place = "poolhouse"

# Use upper() on place: place_up
place_up = place.upper()

# Print out place and place_up
print(place)
print(place_up)

# Print out the number of o's in place
print(place.count('o'))

poolhouse
POOLHOUSE
3
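
The index(), capitalize() and replace() methods mentioned earlier can be sketched the same way. The variable names below are illustrative.

```python
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# List methods: position of a value and number of occurrences
position = areas.index(20.0)
occurrences = areas.count(9.50)
print(position, occurrences)

# String methods return new strings
room = "living room"
room_cap = room.capitalize()
room_new = room.replace("living", "chill")
print(room_cap)
print(room_new)

# Strings are immutable, so the original is unchanged
print(room)
```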


Most list methods will change the list they're called on. Examples are:

append(), that adds an element to the list it is called on,
remove(), that removes the first element of a list that matches the input, and
reverse(), that reverses the order of the elements in the list it is called on.

In [23]:
# Create list areas
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# Use append twice to add poolhouse and garage size
areas.append(24.5)
areas.append(15.45)


# Print out areas
print(areas)

# Reverse the orders of the elements in areas
areas.reverse()

# Print out areas
print(areas)

[11.25, 18.0, 20.0, 10.75, 9.5, 24.5, 15.45]
[15.45, 24.5, 9.5, 10.75, 20.0, 18.0, 11.25]
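
The remove() method is not shown in the cell above, so here is a minimal sketch of it. The duplicated 18.0 is added only to show that just the first match is removed.

```python
areas = [11.25, 18.0, 20.0, 10.75, 9.50, 18.0]

# Only the first matching element is removed
areas.remove(18.0)
print(areas)

# These methods change the list in place and return None
result = areas.reverse()
print(result)
print(areas)
```

Because these methods return None, writing areas = areas.reverse() would replace the list with None, which is a common mistake.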


## Packages

A package is a directory of Python scripts. Each script is a module. Modules define functions, methods and types. There are thousands of packages available, such as NumPy to work efficiently with arrays, Matplotlib for data visualization and scikit-learn for machine learning.

To install packages you can use a package manager such as conda, which ships with Anaconda, or pip.

After you install a package, you have to import it using the import keyword before you can use it.
When you import a package this way, you have to prefix its functions, types, etc. with the package name.
For instance:


In [26]:
import numpy
# This would fail with a NameError, because array is not defined on its own.
# Uncomment the next line to see the error.
# array([1, 2, 3])

# Correct way: 
numpy.array([1,2,3])

array([1, 2, 3])

To make our life easier we can specify an alias to our imported package: 

In [27]:
import numpy as np
np.array([1,2,3])

array([1, 2, 3])

If we only need specific functions, types, etc. from a package, we can import just those names with the following syntax:

In [28]:
from numpy import array

# If we import this way, we do not need to specify the package
array([1,2,3])


array([1, 2, 3])

## NumPy

Lists are powerful: they store a collection of values and can hold different types. However, lists do not support element-wise calculations, so we cannot perform mathematical operations over a whole collection of data quickly using lists alone.
To perform these kinds of calculations we use the NumPy package. NumPy arrays do support element-wise calculations, but be careful: an array holds values of a single data type, so mixed values are converted to a common type when the array is created. This also means that adding two arrays with the + sign performs an element-wise sum, rather than the concatenation that + performs on lists.
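
The difference between lists and arrays can be sketched as follows. The height and weight values are made up for the example, and it assumes NumPy is installed.

```python
import numpy as np

height_list = [1.73, 1.68, 1.71]
weight_list = [65.4, 59.2, 63.6]

# With lists, + concatenates
joined = height_list + weight_list
print(joined)

np_height = np.array(height_list)
np_weight = np.array(weight_list)

# With arrays, + adds element by element
summed = np_height + np_weight
print(summed)

# A whole formula is applied element-wise, here the body mass index
bmi = np_weight / np_height ** 2
print(bmi)

# Mixed values are converted to one common type (here: strings)
mixed = np.array([1.0, "is", True])
print(mixed.dtype)
```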

### Subsetting NumPy
We can subset arrays using the same techniques we learned with lists, or with more advanced methods such as element-wise comparison. myarray > 10 returns a boolean array with the comparison result for each element, and myarray[myarray > 10] returns only the elements with a value greater than 10.

In [30]:
# Create list baseball
baseball = [180, 215, 210, 210, 188, 176, 209, 200]

# Import the numpy package as np
import numpy as np

# Create a numpy array from baseball: np_baseball
np_baseball = np.array(baseball)

# Print out type of np_baseball
print(type(np_baseball))

<class 'numpy.ndarray'>
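
Subsetting with a comparison could look like the sketch below, which reuses the baseball values from the cell above. The threshold of 200 and the name tall are chosen only for illustration.

```python
import numpy as np

np_baseball = np.array([180, 215, 210, 210, 188, 176, 209, 200])

# Element-wise comparison gives a boolean array
tall = np_baseball > 200
print(tall)

# The boolean array selects only the elements where it is True
print(np_baseball[tall])

# List-style indexing and slicing still work
print(np_baseball[0])
print(np_baseball[1:3])
```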


### 2D NumPy Arrays

We can create a 2D array (a table) by placing arrays inside an array, a natural evolution of the lists inside lists we saw earlier. We can then subset it with array[row, column], much like indexing a matrix in R, except that Python indexes start at 0.


In [33]:
# Create baseball, a list of lists
baseball = [[180, 78.4],
            [215, 102.7],
            [210, 98.5],
            [188, 75.2]]

# Import numpy
import numpy as np

# Create a 2D numpy array from baseball: np_baseball
np_baseball = np.array(baseball)

# Print out the type of np_baseball
print(type(np_baseball))

# Print out the shape of np_baseball
print(np_baseball.shape)
# Remember that Python is 0-based, so the following line gets the second row, second column, which is 102.7
print(np_baseball[1,1])

<class 'numpy.ndarray'>
(4, 2)
102.7
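
Beyond single cells, the array[row, column] notation also accepts slices. A minimal sketch on the same data, with illustrative variable names:

```python
import numpy as np

np_baseball = np.array([[180, 78.4],
                        [215, 102.7],
                        [210, 98.5],
                        [188, 75.2]])

# All rows, first column
first_col = np_baseball[:, 0]

# Third row, all columns
third_row = np_baseball[2, :]

# First two rows, all columns
top_rows = np_baseball[0:2, :]

print(first_col)
print(third_row)
print(top_rows.shape)
```

A lone ":" means "everything along this axis", so [:, 0] reads as "every row, column 0".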


### NumPy: Basic Statistics

Yeah, we can't run away from statistics. We can use NumPy to generate summary statistics about our data. It provides functions such as mean, median, std and corrcoef that operate over all the items in a NumPy array.

In [38]:
# Using NumPy to generate our demo data
import numpy as np

# np.random.normal: Get values from a random normal distribution. Parameters = Distribution Mean, Distribution Standard Dev, Number of Samples
height = np.round(np.random.normal(1.75, 0.20, 5000), 2)
weight = np.round(np.random.normal(60.32, 15, 5000), 2)
np_combined = np.column_stack((height, weight))

print(np_combined)

print(np.mean(np_combined[:,0]))
print(np.median(np_combined[:,0]))


[[ 1.68 72.15]
 [ 2.04 51.6 ]
 [ 1.71 69.66]
 ...
 [ 2.07 39.53]
 [ 2.13 57.48]
 [ 1.74 59.19]]
1.751736
1.75
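
The cell above draws random samples, so its output changes from run to run. For std and corrcoef, here is a sketch on a small fixed data set instead. The four height and weight values are made up for the example.

```python
import numpy as np

height = np.array([1.60, 1.70, 1.80, 1.90])
weight = np.array([55.0, 65.0, 72.0, 85.0])

# Standard deviation of the heights
height_std = np.std(height)
print(height_std)

# Correlation matrix between height and weight:
# the diagonal is 1.0, the off-diagonal entries hold the correlation
corr = np.corrcoef(height, weight)
print(corr)
```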
