# Python Language <img src="images/python.png" width="50px" height="50px" align="left"/>
<div class="success">
  <p><strong>Welcome to Python Language Notes</strong> What do you want to know?</p>
</div>
The examples that follow are adapted from the [Python course on Datacamp](https://campus.datacamp.com/courses/intro-to-python-for-data-science/). 
We provide ready-to-run Python examples based on the official documentation, which you can find at [Python 3.7.1 documentation](https://docs.python.org/3/).
You can find Python settings in [Python Notes](Python%20Notes.ipynb). Let's start digging into the Python language. 


## Data Types
Every value in Python has a data type. Since everything is an object in Python, data types are actually classes, and variables are instances (objects) of these classes.

There are various data types in Python. Some of the important types are described below.
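
As a minimal sketch of the idea that data types are classes, the snippet below asks Python for the class behind a few values. The variable name `samples` and the chosen values are illustrative assumptions, not part of the original notes.

```python
# A few values of different built-in types (illustrative choices)
samples = [42, 3.14, "hello", True, [1, 2, 3]]

for value in samples:
    # type() returns the class that the value is an instance of
    print(repr(value), "->", type(value).__name__)

# isinstance() checks whether a value is an instance of a given class
print(isinstance(3.14, float))
```

Each value reports its own class (`int`, `float`, `str`, `bool`, `list`), which is what "every value has a data type" means in practice.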


## Lists
Python knows a number of compound data types, used to group together other values. The most versatile is the **list**, which is an ordered (each element is indexed) collection of <span class="yellow">comma-separated values (items) between square brackets</span>. Lists might contain items of different types, but usually the items all have the same type. The following examples demonstrate how to use **lists**. 


### Create a List
Create a list named *areas* that contains the area of the hallway (hall), kitchen (kit), living room (liv), bedroom (bed) and bathroom (bath), in this order. Use the predefined variables.
Print areas with the print() function.

In [2]:
# area variables (in square meters)
hall = 11.25
kit = 18.0
liv = 20.0
bed = 10.75
bath = 9.50

# Create list areas
areas = [hall, kit, liv, bed, bath]

# Print areas
print(areas)

[11.25, 18.0, 20.0, 10.75, 9.5]


### Create List with Different Types
A list can contain any Python type. Although it's not really common, a list can also contain a mix of Python types including strings, floats, booleans, etc.
Let's modify the previous code so we can have the names of the rooms and related areas. Pay attention here! For example, "bathroom" is a string, while bath is a variable that represents the float 9.50 you specified earlier.

In [3]:
# area variables (in square meters)
hall = 11.25
kit = 18.0
liv = 20.0
bed = 10.75
bath = 9.50

# Adapt list areas
areas = ["hallway", hall, "kitchen", kit, "living room", liv, "bedroom", bed, "bathroom", bath]

# Print areas
print(areas)

['hallway', 11.25, 'kitchen', 18.0, 'living room', 20.0, 'bedroom', 10.75, 'bathroom', 9.5]


### List of lists
You'll often be dealing with a lot of data, and it will make sense to group some of this data.
Instead of creating a flat list containing strings and floats, representing the names and areas of the rooms in your house, you can create a list of lists. 

In [4]:
# area variables (in square meters)
hall = 11.25
kit = 18.0
liv = 20.0
bed = 10.75
bath = 9.50

# house information as list of lists
house = [["hallway", hall],
         ["kitchen", kit],
         ["living room", liv],
         ["bedroom", bed],
         ["bathroom", bath]]

# Print out house
print(house)

# Print out the type of house
print(type(house))


[['hallway', 11.25], ['kitchen', 18.0], ['living room', 20.0], ['bedroom', 10.75], ['bathroom', 9.5]]
<class 'list'>


### Subset and Conquer
Subsetting Python lists is a piece of cake. Take the code sample below, which creates a list x and then selects "b" from it. Remember that this is the second element, so it has index 1. You can also use negative indexing.

    x = ["a", "b", "c", "d"]
    x[1]
    x[-3] # same result!

Slicing can be best visualized by considering the index to be between the elements as shown below.
![list_slicing](images/list_slicing.png)
Remember the areas list from before, containing both strings and floats? Let's do some Python subsetting.

In [5]:
# Create the areas list
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]

# Print out second element from areas
print(areas[1])

# Print out last element from areas
print(areas[-1])

# Print out the area of the living room
print(areas[5])

11.25
9.5
20.0


### Subset and Calculate
After you've extracted values from a list, you can use them to perform additional calculations. Take this example, where the second and fourth elements of a list x are extracted. The resulting strings are concatenated using the + operator:

    x = ["a", "b", "c", "d"]
    print(x[1] + x[3])


In [2]:
# Create the areas list
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]

# Sum of kitchen and bedroom area: eat_sleep_area
eat_sleep_area = areas[3] + areas[-3]

# Print the variable eat_sleep_area
print(eat_sleep_area)


28.75


### Slicing and Dicing
Selecting single values from a list is just one part of the story. It's also possible to slice your list, which means selecting multiple elements from your list. Use the following syntax:

    my_list[start:end]
The start index will be included, while the end index is not.

The code sample below shows an example. A list with "b" and "c", corresponding to indices 1 and 2, is selected from a list x:

    x = ["a", "b", "c", "d"]
    x[1:3]
The elements with index 1 and 2 are included, while the element with index 3 is not.
If you don't specify the begin index, Python figures out that you want to start your slice at the beginning of your list. If you don't specify the end index, the slice will go all the way to the last element of your list. 

In [4]:
# Create the areas list
areas = ["hallway", 11.25, "kitchen", 18.0, "living room", 20.0, "bedroom", 10.75, "bathroom", 9.50]

# Use slicing to create downstairs
downstairs = areas[:6]

# Use slicing to create upstairs
upstairs = areas[-4:]

# Print out downstairs and upstairs
print(downstairs)
print(upstairs)

['hallway', 11.25, 'kitchen', 18.0, 'living room', 20.0]
['bedroom', 10.75, 'bathroom', 9.5]


### Subsetting Lists of Lists
A Python list can contain practically anything; even other lists! To subset lists of lists, you can use the same technique as before: <span style="background-color:yellow">square brackets</span>. Try out the commands in the following code sample. 

What will house[-1][1] return for house, the list of lists that you created before? You can experiment with it in the IPython shell.

In [11]:
# Define a list of lists
x = [["a", "b", "c"],
    ["d", "e", "f"],
    ["g", "h", "i"]]
    
# Get the last list
print(x[2])

# Get first and second element of last list
print(x[2][:2])

# Get the first element of the last list
print(x[2][0])
 

['g', 'h', 'i']
['g', 'h']
g
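
The question about `house[-1][1]` can be checked with a short sketch. It rebuilds the `house` list of lists from the earlier example with the same values; the helper name `last_room` is an assumption added for illustration.

```python
# Rebuild the house list of lists from the earlier example
house = [["hallway", 11.25],
         ["kitchen", 18.0],
         ["living room", 20.0],
         ["bedroom", 10.75],
         ["bathroom", 9.50]]

# house[-1] selects the last sub-list
last_room = house[-1]
print(last_room)      # ['bathroom', 9.5]

# A second pair of square brackets selects an element inside that sub-list
print(house[-1][1])   # 9.5
```

The first index picks a row (a sub-list) and the second index picks an item inside it, so the result is the bathroom area.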


### Replace List Elements
Replacing list elements is pretty easy. Simply subset the list and assign new values to the subset. You can select single elements or you can change entire list slices at once.


In [14]:
# Define list
x = ["a", "b", "c", "d"]
print(x)
# Replace second element
x[1] = "r"
print(x)

# Replace third and last elements
x[2:] = ["s", "t"]
print(x)

# Another list
areas = ["hallway", 11.25, "kitchen", 18.0, 
         "living room", 20.0, "bedroom", 10.75, 
         "bathroom", 9.50]
print(areas)

# Correct the bathroom area
areas[9]=10.50

# Change "living room" to "chill zone"
areas[4]="chill zone"
print(areas)

['a', 'b', 'c', 'd']
['a', 'r', 'c', 'd']
['a', 'r', 's', 't']
['hallway', 11.25, 'kitchen', 18.0, 'living room', 20.0, 'bedroom', 10.75, 'bathroom', 9.5]
['hallway', 11.25, 'kitchen', 18.0, 'chill zone', 20.0, 'bedroom', 10.75, 'bathroom', 10.5]


### Extend a List
If you can change elements in a list, you sure want to be able to add elements to it, right? You can use the + operator. 

In [15]:
# Create the areas list and make some changes
areas = ["hallway", 11.25, "kitchen", 18.0, "chill zone", 20.0,
         "bedroom", 10.75, "bathroom", 10.50]

# Add poolhouse data to areas, new list is areas_1
areas_1 = areas + ["poolhouse", 24.5]
print(areas_1) 

# Add garage data to areas_1, new list is areas_2
areas_2 = areas_1 + ["garage", 15.45]
print(areas_2)

['hallway', 11.25, 'kitchen', 18.0, 'chill zone', 20.0, 'bedroom', 10.75, 'bathroom', 10.5, 'poolhouse', 24.5]
['hallway', 11.25, 'kitchen', 18.0, 'chill zone', 20.0, 'bedroom', 10.75, 'bathroom', 10.5, 'poolhouse', 24.5, 'garage', 15.45]


### Delete List Elements
Finally, you can also remove elements from your list. You can do this with the del statement:

    x = ["a", "b", "c", "d"]
    del(x[1])

Pay attention here: as soon as you remove an element from a list, the indices of the elements that come after the deleted element all change!


In [1]:
# Create the areas list and make some changes
areas = ["hallway", 11.25, "kitchen", 18.0, "chill zone", 20.0, 
         "bedroom", 10.75, "bathroom", 10.50, "poolhouse", 24.5,
         "garage", 15.45]
# Delete poolhouse entries
del(areas[-4:-2])

print(areas)

['hallway', 11.25, 'kitchen', 18.0, 'chill zone', 20.0, 'bedroom', 10.75, 'bathroom', 10.5, 'garage', 15.45]
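
The index shift mentioned above can be seen in a small sketch that reuses the list `x` from the `del` example and looks up the position of "c" before and after the deletion.

```python
x = ["a", "b", "c", "d"]

# Before the deletion, "c" sits at index 2
print(x.index("c"))   # 2

# Remove the element at index 1 ("b")
del x[1]
print(x)              # ['a', 'c', 'd']

# Every element after the deleted one has moved one position to the left
print(x.index("c"))   # 1
```

This is why deleting several elements one at a time by index is error-prone; deleting a slice, as in the cell above, avoids recomputing positions.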


## Tuples
A tuple is an ordered sequence of items, the same as a list. The only difference is that <span class="yellow">tuples are immutable</span>: once created, a tuple cannot be modified.
Tuples are <span class="yellow">used to write-protect data and are usually faster than lists because they cannot change dynamically</span>.
A tuple is <span class="yellow">defined within parentheses () where items are separated by commas</span>.
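
A minimal sketch of tuple behaviour follows. The tuple `room` and its values are illustrative assumptions borrowed from the list examples above.

```python
# A tuple holding a room name and its area
room = ("kitchen", 18.0)

# Indexing works as it does for lists
print(room[0])

# A tuple can be unpacked into separate variables
name, area = room
print(name, area)

# Assigning to an element raises TypeError because tuples are immutable
try:
    room[1] = 20.0
except TypeError as error:
    print("Cannot modify a tuple:", error)
```

The failed assignment leaves `room` unchanged, which is the write-protection described above.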

## Strings
A string is a sequence of Unicode characters. We can <span class="yellow">use single quotes or double quotes to define strings</span>. <span class="gray">Multi-line strings can be denoted using triple quotes, ''' or """</span>.
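
The quoting styles can be compared in a short sketch. The variable names `single`, `double` and `multi` are assumptions chosen for illustration.

```python
# Single and double quotes define the same kind of string
single = 'poolhouse'
double = "poolhouse"
print(single == double)        # True

# Triple quotes allow a string to span several lines
multi = """first line
second line"""
print(multi.splitlines())      # ['first line', 'second line']

# Strings are sequences, so indexing and slicing work as for lists
print(single[0], single[:4])   # p pool
```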

## Sets
A set is an unordered collection of unique items. A set is <span class="yellow">defined by comma-separated values inside braces { }</span>. 
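
As a small sketch of how sets keep only unique items, the example below builds a set from repeated position codes. The name `positions` and its values are illustrative assumptions.

```python
# Duplicates are dropped when the set is created
positions = {"GK", "M", "A", "D", "M", "GK"}
print(len(positions))          # 4

# Membership tests use the in operator
print("GK" in positions)       # True

# Empty braces create a dictionary, so an empty set needs set()
empty = set()
print(type({}).__name__, type(empty).__name__)   # dict set
```

Because a set is unordered, its items cannot be accessed by index.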



## Dictionary
A dictionary is a <span class="yellow">collection of **key-value pairs**</span>. Since Python 3.7, dictionaries preserve the order in which items were inserted.
Dictionaries are generally used to handle large amounts of data and are optimized for retrieving values by key. <span class="blue">You must know the key to retrieve a value</span>.
Dictionaries are <span class="yellow">defined using braces {} with each item being a pair in the form key:value</span>. Keys must be of a hashable (typically immutable) type such as strings, numbers or tuples; values can be of any type.



In [13]:
# Create an MLB (Major League Baseball) dictionary 
MLB_teams = { 
    'Colorado' : 'Rockies',
    'Boston'   : 'Red Sox',
    'Minnesota': 'Twins',
    'Milwaukee': 'Brewers',
    'Seattle'  : 'Mariners'
}
msg = '{0} {1}'.format("Create dictionary with direct key-value assignment: \n", str(MLB_teams))
print(msg)

Create dictionary with direct key-value assignment: 
 {'Colorado': 'Rockies', 'Boston': 'Red Sox', 'Minnesota': 'Twins', 'Milwaukee': 'Brewers', 'Seattle': 'Mariners'}


In [15]:
# Another way to define a dictionary
MLB_teams = dict(
    Colorado  = 'Rockies',
    Boston    = 'Red Sox',
    Minnesota = 'Twins',
    Milwaukee = 'Brewers',
    Seattle   = 'Mariners'
)
msg = '{0} {1}'.format("Create dictionary using the 'dict' function and key-value pairs: \n", str(MLB_teams))
print(msg)

# One more
MLB_teams = dict([
    ('Colorado','Rockies'),
    ('Boston','Red Sox'),
    ('Minnesota', 'Twins'),
    ('Milwaukee', 'Brewers'),
    ('Seattle', 'Mariners')
])
msg = '{0} {1}'.format("Create dictionary using the 'dict' function and list of key-value pairs: \n", str(MLB_teams))
print(msg)

Create dictionary using the 'dict' function and key-value pairs: 
 {'Colorado': 'Rockies', 'Boston': 'Red Sox', 'Minnesota': 'Twins', 'Milwaukee': 'Brewers', 'Seattle': 'Mariners'}
Create dictionary using the 'dict' function and list of key-value pairs: 
 {'Colorado': 'Rockies', 'Boston': 'Red Sox', 'Minnesota': 'Twins', 'Milwaukee': 'Brewers', 'Seattle': 'Mariners'}


### Access Dictionary Values
To traverse a dictionary, normally by key, you can perform the following:

In [24]:
# Define the dictionary 
MLB_teams = dict([
    ('Colorado','Rockies'),
    ('Boston','Red Sox'),
    ('Minnesota', 'Twins'),
    ('Milwaukee', 'Brewers'),
    ('Seattle', 'Mariners')
])

# Traverse the dictionary using the key
for key in MLB_teams.keys():
    value = MLB_teams[key]
    msg = '{0} {1} {2} {3}'.format("key: ", key, 'value:', value) 
    print(msg)

key:  Colorado value: Rockies
key:  Boston value: Red Sox
key:  Minnesota value: Twins
key:  Milwaukee value: Brewers
key:  Seattle value: Mariners
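
An alternative sketch of the same traversal uses the `items()` method, which yields each key together with its value so no second lookup is needed. The shortened dictionary and the loop variable names `city` and `team` are assumptions for illustration.

```python
# A shortened version of the dictionary used above
MLB_teams = {
    'Colorado': 'Rockies',
    'Boston': 'Red Sox',
    'Seattle': 'Mariners',
}

# items() yields (key, value) pairs that can be unpacked in the loop header
for city, team in MLB_teams.items():
    print("key:", city, "value:", team)
```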


### Dictionary Methods
The following is an overview of methods that apply to dictionaries.

In [40]:
# Define the dictionary 
MLB_teams = dict([
    ('Colorado','Rockies'),
    ('Boston','Red Sox'),
    ('Minnesota', 'Twins'),
    ('Milwaukee', 'Brewers'),
    ('Seattle', 'Mariners')
])
# Clear the dictionary
MLB_teams.clear()
msg = '{0} {1}'.format("MLB_teams cleared using the 'clear' method: \n", str(MLB_teams))
print(msg)

MLB_teams cleared using the 'clear' method: 
 {}


In [41]:
# Define the dictionary 
MLB_teams = dict([
    ('Colorado','Rockies'),
    ('Boston','Red Sox'),
    ('Minnesota', 'Twins'),
    ('Milwaukee', 'Brewers'),
    ('Seattle', 'Mariners')
])

# Get a value with the specified key
value = MLB_teams.get('Seattle')
msg = '{0} {1}'.format("Get the value with 'Seattle' key using the 'get' method: \n", value)
print(msg)

Get the value with 'Seattle' key using the 'get' method: 
 Mariners
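
The `get` method differs from square-bracket access when the key is missing. The sketch below uses a shortened dictionary and a key, 'Denver', that is deliberately absent; both are assumptions for illustration.

```python
MLB_teams = {'Colorado': 'Rockies', 'Seattle': 'Mariners'}

# get() returns None when the key is not present
print(MLB_teams.get('Denver'))              # None

# A second argument supplies a default value instead of None
print(MLB_teams.get('Denver', 'unknown'))   # unknown

# Square-bracket access raises KeyError for a missing key
try:
    MLB_teams['Denver']
except KeyError as error:
    print("Missing key:", error)
```

Use `get` when a missing key is an expected situation, and square brackets when a missing key should be treated as an error.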


In [42]:
# Define the dictionary 
MLB_teams = dict([
    ('Colorado','Rockies'),
    ('Boston','Red Sox'),
    ('Minnesota', 'Twins'),
    ('Milwaukee', 'Brewers'),
    ('Seattle', 'Mariners')
])

# Get list of tuples of key-value pairs
tuples = MLB_teams.items()
msg = '{0} {1}'.format("Get list of key-value pairs using the 'items' method: \n", tuples)
print(msg)

# Get the keys
keys = MLB_teams.keys()
msg = '{0} {1}'.format("Get the keys using the 'keys' method: \n", keys)
print(msg)

# Get the values
values = MLB_teams.values()
msg = '{0} {1}'.format("Get the values using the 'values' method: \n", values)
print(msg)

Get list of key-value pairs using the 'items' method: 
 dict_items([('Colorado', 'Rockies'), ('Boston', 'Red Sox'), ('Minnesota', 'Twins'), ('Milwaukee', 'Brewers'), ('Seattle', 'Mariners')])
Get the keys using the 'keys' method: 
 dict_keys(['Colorado', 'Boston', 'Minnesota', 'Milwaukee', 'Seattle'])
Get the values using the 'values' method: 
 dict_values(['Rockies', 'Red Sox', 'Twins', 'Brewers', 'Mariners'])


## Functions
Python offers built-in functions to make your life easier. You already know two such functions: `print()` and `type()`. You've also used the functions `str(), int(), bool(), float()` to switch between data types. These are built-in functions as well.

Calling a function is easy. To get the type of 3.0 and store the output as a new variable, result, you can use the following: `result = type(3.0)`
Other examples are:

In [2]:
# Create variables var1 and var2
var1 = [1, 2, 3, 4]
var2 = True

# Print out type of var1
print(type(var1))

# Print out length of var1
print(len(var1))

# Convert var2 to an integer: out2
out2 = int(var2)
print(out2)

<class 'list'>
4
1


You can find out about a function using the help function, as in `help(max)`, or, in IPython, by prefixing the function name with a question mark, as in `?max`.  

In [4]:
help(max)
?str

Help on built-in function max in module builtins:

max(...)
    max(iterable, *[, default=obj, key=func]) -> value
    max(arg1, arg2, *args, *[, key=func]) -> value
    
    With a single iterable argument, return its biggest item. The
    default keyword-only argument specifies an object to return if
    the provided iterable is empty.
    With two or more arguments, return the largest argument.



### Multiple Arguments
Square brackets around a function argument in the documentation indicate that the argument is optional. But Python also uses a different way to tell users that arguments are optional.
Have a look at the documentation of sorted() by typing help(sorted) in the IPython Shell.

In [7]:
help(sorted)

Help on built-in function sorted in module builtins:

sorted(iterable, /, *, key=None, reverse=False)
    Return a new list containing all items from the iterable in ascending order.
    
    A custom key function can be supplied to customize the sort order, and the
    reverse flag can be set to request the result in descending order.



As you can see sorted() takes three arguments: *iterable*, *key* and *reverse*. 

- key=None means that if you don't specify the key argument, it will be None. 
- reverse=False means that if you don't specify the reverse argument, it will be False.

In the following exercise, you'll only have to specify iterable and reverse, not key. The first input you pass to sorted() will be matched to the iterable argument, but what about the second input? To tell Python you want to specify reverse without changing anything about key, pass it by name: `full_sorted = sorted(full, reverse=True)`. 

In [8]:
# Create lists first and second
first = [11.25, 18.0, 20.0]
second = [10.75, 9.50]

# Paste together first and second: full
full = first + second

# Sort full in descending order: full_sorted
full_sorted = sorted(full, reverse=True)

# Print out full_sorted
print(full_sorted)

[20.0, 18.0, 11.25, 10.75, 9.5]
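
The `key` argument that was left at its default above can be sketched as follows. The list `rooms` and the helper function `room_area` are hypothetical names introduced for this illustration.

```python
# Each sub-list holds a room name and its area
rooms = [["hallway", 11.25], ["kitchen", 18.0], ["bathroom", 9.50]]

def room_area(room):
    """Return the value to sort by: the area stored at index 1."""
    return room[1]

# key tells sorted() which value to compare for each item
by_area = sorted(rooms, key=room_area)
print(by_area)

# key and reverse can be combined
largest_first = sorted(rooms, key=room_area, reverse=True)
print(largest_first)
```

`sorted()` returns a new list in both cases and leaves `rooms` unchanged.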


## Methods
Methods are functions associated with an object. For example, type `help(list)` in the Python shell (or `?list` in IPython) to see the methods associated with the list object. 

### String Methods
Strings provide a bunch of methods. Let's discover some of them. If you want to discover them in more detail, you can always type `help(str)` in the Python Shell.

In [13]:
# string to experiment with: place
place = "poolhouse"

# Use upper() on place: place_up
place_up = place.upper()

# Print out place and place_up
print(place_up)
print(place)

# Print out the number of o's in place
print(place.count('o'))

POOLHOUSE
poolhouse
3


### List Methods
Strings are not the only Python types that have methods associated with them. Lists, floats, integers and booleans are also types that come packaged with a bunch of useful methods. In this exercise, you'll be experimenting with:

- *index()*, to get the index of the first element of a list that matches its input and
- *count()*, to get the number of times an element appears in a list.


In [14]:
# Create list areas
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# Print out the index of the element 20.0
index = areas.index(20.0)
print(index)

# Print out how often 9.50 appears in areas
freq = areas.count(9.50)
print(freq)

2
1


<span class="danger">Some methods can change the object to which they apply</span>. See the following examples.

In [15]:
# Create list areas
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# Use append twice to add poolhouse and garage size
areas.append(24.5)
areas.append(15.45)

# Print out areas
print(areas)

# Reverse the orders of the elements in areas
areas.reverse()

# Print out areas
print(areas)

[11.25, 18.0, 20.0, 10.75, 9.5, 24.5, 15.45]
[15.45, 24.5, 9.5, 10.75, 20.0, 18.0, 11.25]
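
The difference between methods that return a new object and methods that change the object in place can be sketched side by side. The variable name `result` is an assumption for illustration.

```python
# upper() returns a new string and leaves the original untouched
place = "poolhouse"
place_up = place.upper()
print(place, place_up)

# reverse() changes the list in place and returns None
areas = [11.25, 18.0, 20.0]
result = areas.reverse()
print(areas, result)
```

Because in-place methods such as `reverse()` and `append()` return `None`, assigning their result back to the variable would discard the list.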


## Numeric Python - Numpy
[NumPy](https://www.numpy.org/) is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. 

In [23]:
# Create list baseball
baseball = [180, 215, 210, 210, 188, 176, 209, 200]

# Import the numpy package as np
import numpy as np

# Create a numpy array from baseball: np_baseball
np_baseball = np.array(baseball)

# Print out type of np_baseball
print(type(np_baseball))


<class 'numpy.ndarray'>


### Converting Baseball Players Height in Meters 
You are a huge baseball fan. You decide to call the MLB (Major League Baseball) and ask around for some more statistics on the height of the main players. They pass along data on more than a thousand players, which is stored as a regular Python list: height_in. The height is expressed in inches. Make a numpy array out of it and convert the units to meters.

We used a very small subset of height_in available at [MLB Statistics](http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_MLB_HeightsWeights).

In [27]:
# Players height in inches
height_in=[74, 74, 72, 72, 73, 69, 69, 71, 76, 71, 73, 73] 

# Import numpy
import numpy as np

# Create a numpy array from height_in: np_height_in
np_height_in = np.array(height_in)

# Print out np_height_in
print(np_height_in)

# Convert np_height_in to meters: np_height_m
np_height_m = np_height_in * 0.0254

# Print out np_height_m
print(np_height_m)
          

[74 74 72 72 73 69 69 71 76 71 73 73]
[1.8796 1.8796 1.8288 1.8288 1.8542 1.7526 1.7526 1.8034 1.9304 1.8034
 1.8542 1.8542]


### Calculate Baseball player's BMI
The MLB also offers to let you analyze the players' weight data. Again, both are available as regular Python lists: height_in and weight_lb. height_in is in inches and weight_lb is in pounds.

It's now possible to calculate the BMI of each baseball player. The code below converts height_in and weight_lb to numpy arrays in metric units and then computes the BMI as the weight in kilograms divided by the square of the height in meters.

In [32]:
# Players height in inches
height_in=[74, 74, 72, 72, 73, 69, 69, 71, 76, 71, 73, 73] 

# Players weight in pounds
weight_lb = [180, 215, 210, 210, 188, 176, 209, 200, 231, 180, 188, 180]

# Import numpy
import numpy as np

# Create array from height_in with metric units: np_height_m
np_height_m = np.array(height_in) * 0.0254

# Create array from weight_lb with metric units: np_weight_kg
np_weight_kg = np.array(weight_lb) * 0.453592

# Calculate the BMI: bmi
bmi = np_weight_kg/np_height_m**2

# Print bmi
print(bmi)


[23.11037639 27.60406069 28.48080465 28.48080465 24.80333518 25.99036864
 30.86356276 27.89402921 28.11789135 25.10462629 24.80333518 23.7478741 ]


### Search Light Weight Players
Let's create a boolean numpy array in which an element is True if the corresponding baseball player's BMI is below 25. You can use the < operator for this. Name the array light. Then use this array to obtain all the BMIs below 25 contained in the bmi array.


In [40]:
# Players height in inches
height_in=[74, 74, 72, 72, 73, 69, 69, 71, 76, 71, 73, 73] 

# Players weight in pounds
weight_lb = [180, 215, 210, 210, 188, 176, 209, 200, 231, 180, 188, 180]

# Import numpy
import numpy as np

# Create array from height_in with metric units: np_height_m
np_height_m = np.array(height_in) * 0.0254

# Create array from weight_lb with metric units: np_weight_kg
np_weight_kg = np.array(weight_lb) * 0.453592

# Calculate the BMI: bmi
bmi = np_weight_kg/np_height_m**2

# Print bmi array
print(bmi)

# Create the light array
light = bmi < 25

# Print light array
print(light)

# Print the BMIs below 25
print(bmi[light])


[23.11037639 27.60406069 28.48080465 28.48080465 24.80333518 25.99036864
 30.86356276 27.89402921 28.11789135 25.10462629 24.80333518 23.7478741 ]
[ True False False False  True False False False False False  True  True]
[23.11037639 24.80333518 24.80333518 23.7478741 ]


### Subsetting NumPy Arrays
You've seen it with your own eyes: Python lists and numpy arrays sometimes behave differently. Luckily, there are still certainties in this world. For example, subsetting (using the square bracket notation on lists or arrays) works exactly the same. To see this for yourself, try the following lines of code in the IPython Shell:

    x = ["a", "b", "c"]
    x[1]

    np_x = np.array(x)
    np_x[1]


In [45]:
# Height and weight are available as regular lists
# Store weight and height lists as numpy arrays
np_weight_lb = np.array(weight_lb)
np_height_in = np.array(height_in)

# Print out the weight at index 10
print(np_weight_lb[10])

# Print out sub-array of np_weight_lb: index 2 up to and including index 6
print(np_weight_lb[2:7])


188
[210 210 188 176 209]
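
To illustrate where lists and numpy arrays behave differently and where they agree, here is a minimal sketch. The list `heights` and its values are assumptions chosen for illustration.

```python
import numpy as np

heights = [1.73, 1.68, 1.71]
np_heights = np.array(heights)

# On a list, * repeats the list; on an array, it multiplies each element
doubled_list = heights * 2
doubled_array = np_heights * 2
print(doubled_list)
print(doubled_array)

# On lists, + concatenates; on arrays, it adds element by element
print(heights + heights)
print(np_heights + np_heights)

# Subsetting with square brackets selects the same element in both
print(heights[1], np_heights[1])
```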


### 2D NumPy Array
A 2D numpy array can be built from a list of lists, where each inner list represents a row in a rectangular matrix. An element (cell) is located by two indices, a row index and a column index: [row, column].

In the next exercise, baseball is a list of lists. The main list contains 4 sub-lists, and each sub-list contains the height and the weight of one of 4 baseball players.

In [49]:
# Import numpy
import numpy as np

# Create a baseball list of lists
baseball = [[180, 78.4],
            [215, 102.7],
            [210, 98.5],
            [188, 75.2]]

# Create a 2D numpy array from baseball: np_baseball
np_baseball = np.array(baseball)

# Print np_baseball
print(np_baseball)

# Print the type of np_baseball
print(type(np_baseball))

# Print the shape of np_baseball
print(np_baseball.shape)


[[180.   78.4]
 [215.  102.7]
 [210.   98.5]
 [188.   75.2]]
<class 'numpy.ndarray'>
(4, 2)


### Baseball Data in 2D Format
It makes more sense to restructure all this information in a 2D numpy array. This array should have as many rows as there are baseball players you have information on, and 2 columns (for height and weight).
In this list of lists, each sublist represents the height and weight of a single baseball player. 
Can you store the data as a 2D array to unlock numpy's extra functionality?

In [50]:
# Import numpy
import numpy as np

# Create a baseball list of lists
baseball = [[74, 180], [74, 215], [72, 210], [72, 210], [73, 188], [69, 176], 
            [71, 200], [76, 231], [71, 180], [73, 188], [73, 180], [74, 185]]

# Create a 2D numpy array from baseball: np_baseball
np_baseball = np.array(baseball)

# Print the shape of np_baseball
print(np_baseball.shape)

(12, 2)


### Subsetting 2D NumPy Arrays
If your 2D numpy array has a regular structure, i.e. each row and column has a fixed number of values, complicated ways of subsetting become very easy. Have a look at the code below where the elements "a" and "c" are extracted from a list of lists.

- Regular list of lists

        x = [["a", "b"], ["c", "d"]]
        [x[0][0], x[1][0]]

- Numpy

        import numpy as np
        np_x = np.array(x)
        np_x[:,0]

For regular Python lists, this is a real pain. For 2D numpy arrays, however, it's pretty intuitive! The indices before the comma refer to the **rows**, while those after the comma refer to the **columns**. Colon **:** is for slicing; in this example, it tells Python to include all rows.


In [54]:
# Import numpy
import numpy as np

# Create a baseball list of lists
baseball = [[74, 180], [74, 215], [72, 210], [72, 210], [73, 188], [69, 176], 
            [71, 200], [76, 231], [71, 180], [73, 188], [73, 180], [74, 185]]

# Create a 2D numpy array from baseball: np_baseball
np_baseball = np.array(baseball)


# Print the 6th row of np_baseball
print(np_baseball[5])

# Select the entire second column of np_baseball: np_weight_lb
np_weight_lb = np_baseball[:, 1]
print(np_weight_lb)

# Print height of 6th player
print(np_baseball[5,0])

[ 69 176]
[180 215 210 210 188 176 200 231 180 188 180 185]
69


### 2D Arithmetic
Remember how you calculated the Body Mass Index for all baseball players? numpy was able to perform all calculations element-wise (i.e. element by element). For 2D numpy arrays this isn't any different! You can combine matrices with single numbers, with vectors, and with other matrices.

Execute the code below in the IPython shell and see if you understand:

        import numpy as np
        np_mat = np.array([[1, 2],
                           [3, 4],
                           [5, 6]])
        np_mat * 2
        np_mat + np.array([10, 10])
        np_mat + np_mat
        

In [61]:
# Import numpy package
import numpy as np

# Create a baseball list of lists containing height, 
# weight and age of the baseball players.
baseball = [[74.0, 180.0, 22.99], [74.0, 215.0, 34.69], [72.0, 210.0, 30.78], 
            [72.0, 210.0, 35.43], [73.0, 188.0, 35.71], [69.0, 176.0, 29.39]]
            
# Create updated info list of lists containing height, 
# weight and age changes of the baseball players.
updated = [[  1.2303559,  -11.16224898,   1. ],
 [  1.02614252,  16.09732309,   1.        ],
 [  1.1544228,    5.08167641,   1.        ],
 [  1.09349925,   4.23890778,   1.        ],
 [  0.82285669, -17.78200035,   1.        ],
 [  0.99484223,   8.14402711,   1.        ]]

# Create a baseball array
np_baseball = np.array(baseball)

# Add np_baseball and updated 
results = np_baseball + updated

# Print the results
print(results)

# Create numpy array: conversion
conversion = np.array([0.0254, 0.453592, 1])

# Multiply np_baseball and conversion
results = np_baseball * conversion

# Print the results
print(results)

[[ 75.2303559  168.83775102  23.99      ]
 [ 75.02614252 231.09732309  35.69      ]
 [ 73.1544228  215.08167641  31.78      ]
 [ 73.09349925 214.23890778  36.43      ]
 [ 73.82285669 170.21799965  36.71      ]
 [ 69.99484223 184.14402711  30.39      ]]
[[ 1.8796   81.64656  22.99    ]
 [ 1.8796   97.52228  34.69    ]
 [ 1.8288   95.25432  30.78    ]
 [ 1.8288   95.25432  35.43    ]
 [ 1.8542   85.275296 35.71    ]
 [ 1.7526   79.832192 29.39    ]]
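
The three operations from the code sample at the start of this section can be run with their results inspected. This is a minimal sketch that reuses `np_mat` from that sample; the names `times_two`, `plus_vector` and `plus_itself` are assumptions for illustration.

```python
import numpy as np

np_mat = np.array([[1, 2],
                   [3, 4],
                   [5, 6]])

# A single number is applied to every element
times_two = np_mat * 2
print(times_two)

# A vector with one value per column is applied to every row
plus_vector = np_mat + np.array([10, 10])
print(plus_vector)

# Two matrices of the same shape are combined element by element
plus_itself = np_mat + np_mat
print(plus_itself)
```

The multiplication by `conversion` in the cell above follows the second pattern: a vector with one value per column is applied to each row of `np_baseball`.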


### Data Statistics - Average vs Median
You now know how to use numpy functions to get a better feeling for your data. It basically comes down to importing numpy and then calling several simple functions on the numpy arrays:

        import numpy as np
        x = [1, 4, 8, 10, 12]
        np.mean(x)
        np.median(x)

The baseball data is available as a 2D numpy array with 3 columns (height, weight, age) and 6 rows (one per player). The name of this numpy array is np_baseball. After restructuring the data, however, you notice that some height values are abnormally high. Follow the instructions and discover which summary statistic is best suited if you're dealing with so-called outliers. 
<span style="background-color:lightblue">Notice that the **mean** is the average, while the **median** is the middle point of a set of values.</span> 

In [76]:
# Import numpy package
import numpy as np

# Create a baseball list of lists containing height, 
# weight and age of the baseball players.
baseball = [[74.0, 180.0, 22.99], [74.0, 215.0, 34.69], [572.0, 210.0, 30.78], 
            [72.0, 210.0, 35.43], [173.0, 188.0, 35.71], [69.0, 176.0, 29.39]]

# Create a baseball array
np_baseball = np.array(baseball)
            
# Create np_height_in from np_baseball first column
np_height_in = np_baseball[:,0]

# Print the mean of np_height_in
np_mean_height = np.mean(np_height_in)
mean ='{0} {1}'.format('Average:', str(np_mean_height))
print(mean)
                       
# Print the median of np_height_in
np_median_height = np.median(np_height_in)
median ='{0} {1}'.format('Median:', str(np_median_height))
print(median)

# Print standard deviation of np_height_in
np_stdv_height = np.std(np_height_in)
stdv ='{0} {1}'.format('Deviation:', str(np_stdv_height))
print(stdv)

# Print out the correlation between the first and second columns
np_corr_height_weight = np.corrcoef(np_baseball[:,0], np_baseball[:,1])
corr ='{0} {1}'.format('Correlation:', str(np_corr_height_weight))
print(corr)


Average: 172.33333333333334
Median: 74.0
Deviation: 182.4907912440759
Correlation: [[1.         0.34740086]
 [0.34740086 1.        ]]
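
To see why the median is more robust than the mean, the sketch below compares the two statistics on a small set of heights with and without a single outlier. The arrays `heights` and `with_outlier` are illustrative assumptions, not the data used above.

```python
import numpy as np

# Six plausible heights in inches
heights = np.array([74.0, 74.0, 72.0, 72.0, 73.0, 69.0])

# The same data with one value mistyped as 572
with_outlier = np.array([74.0, 74.0, 572.0, 72.0, 73.0, 69.0])

# The mean is pulled far away by the single outlier
print(np.mean(heights), np.mean(with_outlier))

# The median barely moves
print(np.median(heights), np.median(with_outlier))
```

One bad value more than doubles the mean, while the median shifts by a single inch, which is why the median is the safer summary when outliers are present.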


### Soccer Statistics
In the last few exercises you've learned everything there is to know about heights and weights of baseball players. Now it's time to dive into another sport: soccer.

You've contacted FIFA for some data and they handed you two lists. The lists are the following:

        positions = ['GK', 'M', 'A', 'D', ...]
        heights = [191, 184, 185, 180, ...]
        
Each element in the lists corresponds to a player. The first list, positions, contains strings representing each player's position. The possible positions are: 'GK' (goalkeeper), 'M' (midfield), 'A' (attack) and 'D' (defense). The second list, heights, contains integers representing the height of the player in cm. The first player in the lists is a goalkeeper and is pretty tall (191 cm).


In [79]:
positions = ['GK', 'M', 'A', 'D', 'M', 'D', 'M', 'M', 'M', 
             'A', 'M', 'M', 'A', 'A', 'A', 'M', 'D', 'A', 
             'D', 'M', 'GK', 'D', 'D', 'M', 'M', 'M', 'M', 
             'D', 'M', 'GK', 'D', 'GK', 'D', 'D', 'M', 'A', 
             'M', 'D', 'M', 'GK', 'M', 'GK', 'A', 'D', 'GK', 
             'A', 'GK', 'GK', 'GK', 'GK', 'A', 'D', 'A', 'D', 
             'D', 'M', 'D', 'M', 'D', 'D', 'GK', 'GK', 'D', 'M', 
             'M', 'GK', 'M', 'D', 'M', 'M', 'D', 'D', 'M']

heights = [191, 184, 185, 180, 181, 187, 170, 179, 183, 186, 185, 
           170, 187, 183, 173, 188, 183, 180, 188, 175, 193, 180, 
           185, 170, 183, 173, 185, 185, 168, 190, 178, 185, 185, 
           193, 183, 184, 178, 180, 177, 188, 177, 187, 186, 183, 
           189, 179, 196, 190, 189, 188, 188, 188, 182, 185, 184, 
           178, 185, 193, 188, 179, 189, 188, 180, 178, 186, 188, 
           180, 185, 172, 179, 180, 174, 183]

# Import numpy
import numpy as np

# Convert positions and heights to numpy arrays: np_positions, np_heights
np_positions = np.array(positions)
np_heights = np.array(heights)

# Heights of the goalkeepers: gk_heights
gk_heights = np_heights[np_positions == 'GK']

# Heights of the other players: other_heights
other_heights = np_heights[np_positions != 'GK']

# Print out the median height of goalkeepers
np_gk_median_height = np.median(gk_heights)
print("Median height of goalkeepers: " + str(np_gk_median_height))

# Print out the median height of other players
np_other_median_height = np.median(other_heights)
print("Median height of other players: " + str(np_other_median_height))


Median height of goalkeepers: 189.0
Median height of other players: 183.0
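
The selection `np_heights[np_positions == 'GK']` relies on boolean indexing: the comparison produces one `True`/`False` value per player, and that mask picks the matching heights. Here is a small sketch of the mechanism, assuming a made-up five-player sample (`sample_positions` and `sample_heights` are hypothetical names and values, not the FIFA lists above).

```python
import numpy as np

# Hypothetical five-player sample, not the FIFA data
sample_positions = np.array(['GK', 'M', 'A', 'GK', 'D'])
sample_heights = np.array([191, 184, 185, 189, 180])

# Comparing an array with a string yields one boolean per element
is_gk = sample_positions == 'GK'
print(is_gk)

# Indexing with the mask keeps only the elements where the mask is True
sample_gk_heights = sample_heights[is_gk]
print(sample_gk_heights)

# ~ inverts the mask, which selects the same elements as sample_positions != 'GK'
sample_other_heights = sample_heights[~is_gk]
print(sample_other_heights)
```

The mask and the array being indexed must have the same length, which is why both lists need one entry per player.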


## References

- [A lot of Examples - Python Central](https://www.pythoncentral.io/)
- [Lorem Ipsum Generator](https://www.lipsum.com/feed/html)
- [Python Cookbook](https://d.cxcore.net/Python/Python_Cookbook_3rd_Edition.pdf)
- [Python Examples](https://www.programiz.com/python-programming/examples)
- [Python 3.7.1 Documentation](https://docs.python.org/3/)
- [Python Coding Guidelines](http://jaynes.colorado.edu/PythonGuidelines.html)
- [Python Courses by DataCamp](https://www.datacamp.com/courses/tech:python/?utm_campaign=intro-to-python&utm_medium=pop-up&utm_source=campus)
- [Conda](https://conda.io/docs/index.html)
- [Conda Cheatsheet](conda-cheatsheet.pdf)
- [Python Formatting](https://pyformat.info/)
- [ActiveState Code Recipes](https://github.com/ActiveState)
- [Mean Median Average](https://www.vocabulary.com/articles/chooseyourwords/mean-median-average/)


