# Python List

#### Slicing and dicing 
Selecting single values from a list is just one part of the story. It's also possible to slice your list, which means selecting multiple elements from it. Use the following syntax, where the element at index start is included and the element at index end is not:

    my_list[start:end]

However, ***it's also possible to omit these indexes***. If you omit the start index, the slice begins at the first element of your list. If you omit the end index, the slice runs all the way through the last element of your list. To experiment with this:

In [1]:
x = ["a", "b", "c", "d"]

print(x[:2])
print(x[2:])
print(x[:])

['a', 'b']
['c', 'd']
['a', 'b', 'c', 'd']


In [2]:
# Create the areas list
areas = [
    "hallway",
    11.25,
    "kitchen",
    18.0,
    "living room",
    20.0,
    "bedroom",
    10.75,
    "bathroom",
    9.50,
]

# Use slicing to create a list, downstairs, that contains the first 6 elements of areas
downstairs = areas[0:6]

# Alternative slicing to create a new variable, upstairs, that contains the last 4 elements of areas
upstairs = areas[-4:]

# Print out downstairs and upstairs
print(f"{downstairs}\n{upstairs}")

['hallway', 11.25, 'kitchen', 18.0, 'living room', 20.0]
['bedroom', 10.75, 'bathroom', 9.5]
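
The slicing rules above can be checked on a small list. This is a minimal sketch; the list `letters` and the other variable names are made up for illustration and are not part of the exercise.

```python
# Illustrative list; the names here are assumptions for this sketch
letters = ["a", "b", "c", "d", "e", "f"]

# The start index is included, the end index is excluded
first_three = letters[0:3]      # elements at indexes 0, 1 and 2
print(first_three)              # ['a', 'b', 'c']

# Negative indexes count from the end of the list
last_two = letters[-2:]
print(last_two)                 # ['e', 'f']

# With non-negative indexes inside the list, the slice has end - start elements
middle = letters[1:4]
print(len(middle))              # 3
```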


#### Replace list elements 
Replacing list elements is pretty easy. Simply subset the list and assign new values to the subset. You can select single elements or you can change entire list slices at once.

In [3]:
# Update the area of the bathroom to be 10.50 square meters instead of 9.50
areas[-1] = 10.50

# Change "living room" to "chill zone"
areas[-6] = "chill zone"

# Print the new list
print(areas)

['hallway', 11.25, 'kitchen', 18.0, 'chill zone', 20.0, 'bedroom', 10.75, 'bathroom', 10.5]
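
The cell above only replaces single elements. Replacing an entire slice at once could look like the following sketch, which uses a shortened, made-up list named `rooms` rather than the full areas list.

```python
# Shortened illustrative list (an assumption for this sketch)
rooms = ["hallway", 11.25, "kitchen", 18.0]

# Assigning to a slice replaces all selected elements in one step
rooms[2:4] = ["pantry", 6.0]

print(rooms)  # ['hallway', 11.25, 'pantry', 6.0]
```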


#### Extend a list 
If you can change elements in a list, you will also want to be able to add elements to it. You can use the + operator, which builds a new list from its two operands:

In [4]:
# Add poolhouse data to areas and assign the resulting new list back to areas
areas = areas + ["poolhouse", 24.5]

# Print the extended list
print(areas)

['hallway', 11.25, 'kitchen', 18.0, 'chill zone', 20.0, 'bedroom', 10.75, 'bathroom', 10.5, 'poolhouse', 24.5]
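
As a point of comparison, the sketch below contrasts the + operator with the built-in list method extend(). The variable names `base` and `combined` are made up for illustration.

```python
# Illustrative names; not part of the exercise
base = ["hallway", 11.25]

# + returns a new list and leaves base untouched
combined = base + ["poolhouse", 24.5]
print(base)      # ['hallway', 11.25]
print(combined)  # ['hallway', 11.25, 'poolhouse', 24.5]

# extend() adds the elements to the existing list in place
base.extend(["garage", 15.45])
print(base)      # ['hallway', 11.25, 'garage', 15.45]
```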


#### Inner workings of lists 
The Python code below creates a list named areas, an explicit copy named areas_copy, and a second name for the same list, areas_copy_without_list_func, which is assigned directly with =. Next, the first element of areas_copy_without_list_func is changed and both areas and areas_copy are printed out. If you run the code you'll see that, although you've changed areas_copy_without_list_func, the change also takes effect in the areas list. That's because areas and areas_copy_without_list_func **point** to the same list.

***If you want to prevent changes*** in a copy from also taking effect in areas, you'll have to make a more explicit copy of the areas list. You can do this with **list()** or by using **[:]**, as is done for areas_copy.

In [5]:
# Create list areas
areas = [11.25, 18.0, 20.0, 10.75, 9.50]

# Create areas_copy, an explicit copy, with [:] (list(areas) works as well)
areas_copy = areas[:]

# Create areas_copy_without_list_func, a second name for the same list
areas_copy_without_list_func = areas

# Change areas_copy_without_list_func
areas_copy_without_list_func[0] = 2.0

# Print areas and areas_copy
print(f"areas: {areas}")
print(f"areas_copy: {areas_copy}")

areas: [2.0, 18.0, 20.0, 10.75, 9.5]
areas_copy: [11.25, 18.0, 20.0, 10.75, 9.5]
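
Whether two names point to the same list can be checked with the `is` operator. The sketch below also shows the list() variant of the copy; the names `same_list`, `copy_a` and `copy_b` are made up for illustration.

```python
areas = [11.25, 18.0, 20.0]

same_list = areas      # a second name for the same list object
copy_a = list(areas)   # explicit copy with list()
copy_b = areas[:]      # explicit copy with a full slice

print(same_list is areas)  # True: both names refer to one object
print(copy_a is areas)     # False: a separate list with equal contents
print(copy_b is areas)     # False

# A change made through same_list is visible through areas, but not in the copies
same_list[0] = 2.0
print(areas)   # [2.0, 18.0, 20.0]
print(copy_a)  # [11.25, 18.0, 20.0]
```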


#### List methods 
- **index()** returns the index of the first element of a list that matches its input.
- **count()** returns the number of times an element appears in a list.
- **append()** adds an element to the end of the list it is called on.
- **remove()** removes the first element of a list that matches its input.
- **reverse()** reverses the order of the elements in the list it is called on.

In [6]:
numbers = [11.25, 18.0, 20.0, 10.75, 9.50]

# Print out the index of the element 20.0
print(f"index of 20: {numbers.index(20.0)}")

# Print out how often 9.50 appears in numbers
print(f"occurrences of 9.50: {numbers.count(9.5)}")

# Use append and print the list
numbers.append(10)
print(f"appended: {numbers}")

# Reverse the order of the elements and print the list
numbers.reverse()
print(f"reversed: {numbers}")

index of 20: 2
occurrences of 9.50: 1
appended: [11.25, 18.0, 20.0, 10.75, 9.5, 10]
reversed: [10, 9.5, 10.75, 20.0, 18.0, 11.25]
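
remove() is listed above but not used in the cell. A minimal sketch of its behavior, using a made-up list named `values` that contains a duplicate:

```python
# Illustrative list with a repeated value (an assumption for this sketch)
values = [11.25, 18.0, 20.0, 18.0, 9.50]

# Only the first matching element is removed
values.remove(18.0)
print(values)  # [11.25, 20.0, 18.0, 9.5]

# remove() raises a ValueError when the element is not in the list
try:
    values.remove(99)
    missing = False
except ValueError:
    missing = True
print(missing)  # True
```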


# NumPy

#### NumPy Array 
In this chapter, we're going to dive into the world of baseball. Along the way, you'll get comfortable with the basics of NumPy, a powerful package for data science.

The list baseball is defined below. It represents the heights of some baseball players in centimeters.

In [7]:
# Import the numpy package as np
import numpy as np

# Create list baseball
baseball = [180, 215, 210, 210, 188, 176, 209, 200]

# Create a numpy array from baseball: np_baseball
np_baseball = np.array(baseball)

# Print out type of np_baseball
print(type(np_baseball))

<class 'numpy.ndarray'>


NumPy is great for doing vector arithmetic. Compared with regular Python lists, however, some things behave differently.

First of all, NumPy arrays cannot contain elements with different types. If you try to build such an array, some of the elements' types are changed to end up with a homogeneous array. This is known as type **coercion**.

Second, the typical arithmetic operators, such as +, -, * and /, have a different meaning for regular Python lists and NumPy arrays.

Have a look at this line of code, in which True and False are coerced to 1 and 0 before the two arrays are added element by element:

In [8]:
np.array([True, 1, 2]) + np.array([3, 4, False])

array([4, 5, 2])
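
Both points can be seen side by side in the following sketch. All variable names are made up for illustration.

```python
import numpy as np

# Coercion: the int and the bool are converted to float to match 2.5
coerced = np.array([1, 2.5, True])
print(coerced.dtype)     # float64

list_a, list_b = [1, 2, 3], [4, 5, 6]
arr_a, arr_b = np.array(list_a), np.array(list_b)

# + concatenates lists but adds arrays element by element
print(list_a + list_b)   # [1, 2, 3, 4, 5, 6]
print(arr_a + arr_b)     # [5 7 9]

# * repeats a list but multiplies every array element
print(list_a * 2)        # [1, 2, 3, 1, 2, 3]
print(arr_a * 2)         # [2 4 6]
```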

Python lists and NumPy arrays sometimes behave differently. Luckily, there are still certainties in this world. For example, basic subsetting (using the square bracket notation on lists or arrays) uses the same syntax for both.

In [9]:
# np_baseball = [180, 215, 210, 210, 188, 176, 209, 200]
np_baseball[:3]

array([180, 215, 210])
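
One difference is worth keeping in mind even though the syntax matches: slicing a list produces a new list, while a basic slice of a NumPy array is a view on the original data. A small sketch with made-up names:

```python
import numpy as np

heights_list = [180, 215, 210, 210]
heights_arr = np.array(heights_list)

list_slice = heights_list[:2]   # a new, independent list
arr_slice = heights_arr[:2]     # a view on heights_arr

list_slice[0] = 0
arr_slice[0] = 0

print(heights_list[0])  # 180: the original list is unchanged
print(heights_arr[0])   # 0: the change went through to the array

# .copy() gives an independent array when that is what you want
independent = heights_arr[:2].copy()
independent[1] = -1
print(heights_arr[1])   # 215
```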

#### 2D NumPy Array 
Before working on the actual data, let's try to create a 2D NumPy array from a small list of lists.

In this exercise, baseball_2d_list is a list of lists. The main list contains 4 elements, one for each of 4 baseball players. Each of these elements is a list containing the height and the weight of one player, in this order.

In [10]:
# Create baseball_2d_list, a list of lists
baseball_2d_list = [[180, 78.4], [215, 102.7], [210, 98.5], [188, 75.2]]

# Create a 2D numpy array from baseball_2d_list: np_baseball_2d_list
np_baseball_2d_list = np.array(baseball_2d_list)

# Print out the type of np_baseball_2d_list
print(type(np_baseball_2d_list))

# Print out the shape of np_baseball_2d_list
print(f"(rows, columns): {np_baseball_2d_list.shape}")

<class 'numpy.ndarray'>
(rows, columns): (4, 2)


#### Transforming Baseball Data 
The MLB was, again, very helpful and passed you the data for more than a thousand players. Here the data is read from a CSV file into a pandas DataFrame. The Height and Weight columns are then selected and converted to a 2D array named baseball_2d, in which each row holds the height and weight of a single baseball player.

In [11]:
import pandas as pd

df_baseball = pd.read_csv("../data/baseball.csv")

# Select the height and weight column from the DataFrame and convert to 2D Array
baseball_2d = df_baseball[["Height", "Weight"]].values

# Create a 2D numpy array from baseball_2d: np_baseball
np_baseball = np.array(baseball_2d)

# Print out the shape of np_baseball
print(f"(rows, columns): {np_baseball.shape}")

(rows, columns): (1015, 2)


#### Subsetting 2D NumPy Arrays 
If your 2D numpy array has a regular structure, i.e. each row and column has a fixed number of values, complicated ways of subsetting become very easy. Have a look at the code below where the elements "a" and "c" are extracted from a list of lists.

For regular Python lists, this is a real pain. For 2D numpy arrays, however, it's pretty intuitive! The indexes before the comma refer to the rows, while those after the comma refer to the columns. The : is for slicing; in this example, it tells Python to include all rows.

In [12]:
# regular list of lists
x = [["a", "b"], ["c", "d"]]
print([x[0][0], x[1][0]])

# numpy
np_x = np.array(x)
print(np_x[:, 0])

['a', 'c']
['a' 'c']


In [13]:
# Print out the 50th row of np_baseball
print(f"50th row: {np_baseball[49, :]}")

# Print the entire second column of np_baseball (the weights)
print(f"entire second column (weight col.): {np_baseball[:, 1]}")

# Print out height of 124th player
print(f"124th player's height: {np_baseball[123, 0]}")

50th row: [ 70 195]
entire second column (weight col.): [180 215 210 ... 205 190 195]
124th player's height: 75
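
Because the cells above depend on the CSV file, here is a self-contained sketch of the same row, column indexing. It reuses the four height and weight pairs from the earlier small example; the array name `players` and the other variable names are made up.

```python
import numpy as np

# Four players, columns are height and weight (values from the small example)
players = np.array([[180, 78.4], [215, 102.7], [210, 98.5], [188, 75.2]])

second_row = players[1, :]      # all columns of the second row
heights = players[:, 0]         # all rows of the first column
one_value = players[2, 1]       # weight of the third player
middle_rows = players[1:3, :]   # rows 2 and 3, all columns

print(second_row)
print(heights)
print(one_value)
print(middle_rows.shape)        # (2, 2)
```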


#### NumPy: Basic Statistics 

In [14]:
# Create np_height_in from np_baseball
np_height_in = np_baseball[:, 0]

# Print out the mean of np_height_in
print(f"mean: {np.mean(np_height_in)}")

# Print out the median of np_height_in
print(f"median: {np.median(np_height_in)}")

# Print out the standard deviation of np_height_in
print(f"standard deviation: {np.std(np_height_in)}")

# Print out correlation between first and second column of baseball data
print(f"correlation: {np.corrcoef(np_baseball[:, 0], np_baseball[:, 1])}")

mean: 73.6896551724138
median: 74.0
standard deviation: 2.312791881046546
correlation: [[1.         0.53153932]
 [0.53153932 1.        ]]
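
Two details of these functions are easy to miss: np.corrcoef returns a full correlation matrix rather than a single number, and np.std computes the population standard deviation unless ddof=1 is passed. The sketch below shows both on a small made-up sample (the four players from the earlier example); the variable names are illustrative.

```python
import numpy as np

heights = np.array([180, 215, 210, 188])
weights = np.array([78.4, 102.7, 98.5, 75.2])

mean_height = np.mean(heights)
median_height = np.median(heights)
print(mean_height)    # 198.25
print(median_height)  # 199.0

# Population standard deviation (default) versus sample standard deviation
std_population = np.std(heights)
std_sample = np.std(heights, ddof=1)

# corrcoef returns a 2x2 matrix; the off-diagonal entry is the correlation
corr_matrix = np.corrcoef(heights, weights)
r = corr_matrix[0, 1]
print(corr_matrix.shape)  # (2, 2)
```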


#### Compare Arrays 

In [15]:
my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# my_house greater than or equal to 18
print(my_house >= 18)

# my_house less than your_house
print(my_house < your_house)

# my_house greater than 18.5 or smaller than 10
print(np.logical_or(my_house > 18.5, my_house < 10))

# Both my_house and your_house smaller than 11
print(np.logical_and(my_house < 11, your_house < 11))

[ True  True False False]
[False  True  True False]
[False  True False  True]
[False False False  True]
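
The boolean arrays produced by these comparisons can be used to select elements. A minimal sketch using the same two arrays; the names `mask`, `selected` and `either` are made up for illustration.

```python
import numpy as np

my_house = np.array([18.0, 20.0, 10.75, 9.50])
your_house = np.array([14.0, 24.0, 14.25, 9.0])

# Boolean array: True where my room is smaller than yours
mask = my_house < your_house

# Indexing with the boolean array keeps only the True positions
selected = my_house[mask]
print(selected.tolist())   # [20.0, 10.75]

# True counts as 1, so summing the mask counts the matches
print(int(mask.sum()))     # 2

# | and & are element-wise alternatives to logical_or and logical_and;
# the parentheses around each comparison are required
either = (my_house > 18.5) | (my_house < 10)
print(either.tolist())     # [False, True, False, True]
```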
