## List comprehension

Just think of it as a one-line way to create a list from another list.

Here it squares each number in `my_list`, adds 10, and stores the results in `compre_list`.

In [None]:
my_list = [1,2,3,4,5]

compre_list = [i**2+10 for i in my_list]
print(my_list)
print(compre_list)
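
To see what the comprehension does under the hood, here is a minimal sketch of the same computation written as an explicit `for` loop. The name `loop_list` is introduced only for this comparison.

```python
my_list = [1, 2, 3, 4, 5]

# The comprehension: square each item, then add 10
compre_list = [i**2 + 10 for i in my_list]

# The same result built step by step with an explicit loop
loop_list = []
for i in my_list:
    loop_list.append(i**2 + 10)

print(compre_list)               # [11, 14, 19, 26, 35]
print(compre_list == loop_list)  # True
```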

In [None]:
my_list = ['Notre Dame', 'Michigan', 'Notre Dame','Michigan','Michigan']

count = 0
for i in my_list:
    if i == 'Notre Dame':
        count += 1
        
print("there are {} items that are Notre Dame".format(count))
    

In [None]:
my_list = ['Notre Dame', 'Michigan', 'Notre Dame','Michigan','Michigan']

mendoza_indicator = [1 if x=='Notre Dame' else 0 for x in my_list]

print(sum(mendoza_indicator))

print(my_list)
print(mendoza_indicator)
print("Notre Dame is listed {} times".format(sum(mendoza_indicator)))
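
The `1 if ... else 0` part above is a conditional expression placed *before* `for`, so every item produces one output value. Putting an `if` clause *after* the `for` filters items instead. A short sketch comparing the two styles of counting (the variable names are illustrative):

```python
my_list = ['Notre Dame', 'Michigan', 'Notre Dame', 'Michigan', 'Michigan']

# Conditional expression before `for`: one output per input item
indicator = [1 if x == 'Notre Dame' else 0 for x in my_list]
print(indicator)                 # [1, 0, 1, 0, 0]

# Filter clause after `for`: only matching items are kept
notre_dame_only = [x for x in my_list if x == 'Notre Dame']
print(len(notre_dame_only))      # 2

# A generator expression inside sum() counts without building a list
count = sum(1 for x in my_list if x == 'Notre Dame')
print(count)                     # 2
```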

## Fixed-Type Arrays in Python

In [None]:
# the array module is part of Python's standard library, but it must be imported
import array

In [None]:
basic_list = list(range(10))
print(basic_list)
basic_array = array.array('i',basic_list)
basic_array

In [None]:
# force list items into floats

basic_list = list(range(10))
basic_array_of_floats = array.array('f',basic_list)
basic_array_of_floats

In [None]:
basic_array_of_floats.append(42)
print(basic_array_of_floats)
basic_array_of_floats.append('wat?')
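
The last line above fails by design: an `array.array` with type code `'f'` accepts only numbers, so appending a string raises a `TypeError`. A minimal sketch that catches the error so the rest of the cell keeps running (the `try`/`except` wrapper and the `caught_error` name are additions for illustration):

```python
import array

floats = array.array('f', range(5))
floats.append(42)                # an int is converted to a float on the way in

caught_error = None
try:
    floats.append('wat?')        # a str cannot be stored in a float array
except TypeError as err:
    caught_error = err
    print("TypeError:", err)

print(floats.typecode)           # f
print(floats.tolist())           # [0.0, 1.0, 2.0, 3.0, 4.0, 42.0]
```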

# NumPy - Numerical Python 

## Importing NumPy

In [None]:
import numpy as np
np.__version__

Note: Some of the examples and content are from our textbook, PDSH (Python Data Science Handbook), Ch. 2.

## Fundamentals of NumPy Arrays


### Creating NumPy array from a list

In [None]:
list_of_ints = [0, 1, 2, 3, 4, 5]
np_ints = np.array(list_of_ints)

In [None]:
print(np_ints)

### Creating NumPy array from scratch

In [None]:
np.zeros(10)

In [None]:
np.full(10, 3.1415)

In [None]:
np.arange(5,25, 5)

In [None]:
np.arange?
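
Like Python's `range`, `np.arange` stops *before* its `stop` value. When you want a fixed number of evenly spaced points with the endpoint included, `np.linspace` is the usual alternative. A quick sketch comparing the two, along with `np.ones`:

```python
import numpy as np

# arange(start, stop, step): `stop` is excluded, as with range()
stepped = np.arange(5, 25, 5)
print(stepped)                   # [ 5 10 15 20]

# linspace(start, stop, num): `num` evenly spaced values, `stop` included
spaced = np.linspace(0, 1, 5)
print(spaced)                    # 0.0, 0.25, 0.5, 0.75 and 1.0

# ones() works like zeros(); the dtype can be set explicitly
ones = np.ones(3, dtype=int)
print(ones)                      # [1 1 1]
```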

## Accessing NumPy Array

In the last two classes we used Python's built-in `list`; now we are using the array type provided by the NumPy package.

<div class="alert alert-block alert-info">
The actual name of the NumPy array type is `ndarray`. So, when we refer to a NumPy array or an `ndarray`, we are talking about the same thing.</div> 

One similarity between the two types is how you retrieve the elements they contain.

For either type, you access an element by adding square brackets after the object name, with the index of the element you want inside the brackets.

<div class="alert alert-block alert-warning">
Just remember that the first element is at index `0`, not `1`. Forget this and you'll have all sorts of problems.
</div> 


In [None]:
numeric_array = np.array(range(1, 11))
numeric_array

In [None]:
print(numeric_array[5])
print(numeric_array[0])
print(numeric_array[3:8])
print(numeric_array[-1])

In [None]:
print(numeric_array[10])
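
The last cell raises an `IndexError`: `numeric_array` holds 10 elements, so its valid indices run from 0 to 9. Negative indices count back from the end. A small sketch (the `try`/`except` and the `caught_error` name are only there so the example runs to completion):

```python
import numpy as np

numeric_array = np.array(range(1, 11))   # values 1..10, valid indices 0..9

first = numeric_array[0]                 # 1
last = numeric_array[-1]                 # 10: -1 is the last element
third_from_end = numeric_array[-3]       # 8
print(first, last, third_from_end)

caught_error = None
try:
    numeric_array[10]                    # one past the last valid index
except IndexError as err:
    caught_error = err
    print("IndexError:", err)
```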

## Array Slicing: Accessing Subarrays <a name="slicing"></a>

Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the *slice* notation. 

The NumPy slicing syntax follows that of the standard Python `list`; to access a slice of an array `x`, use the `start:stop:step` notation inside of brackets:
``` python
x[start:stop:step]
```

If any of these are unspecified, they default to the values: 
* `start=0`
* `stop=`*`size of dimension`*
* `step=1`.

#### Sample Slice Notations
Here are some sample slice notations and their meanings.
* `1:5:1`: Return the elements at indices 1 through 4 (the 2nd through 5th elements) in normal order. (Remember that with 0-based indexing, index 1 is the second element, and the `stop` index itself is excluded.)
* `:8:1`: Return the elements at indices 0 through 7 (the first eight elements) in normal order. Since the `start` parameter is left out, it takes its default value.
* `::-1`: Return all elements (`start` and `stop` are left out) in reverse order.

Let's try some slice notations.

In [None]:
numeric_array = np.array(range(1, 11))
numeric_array

In [None]:
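# appending more elements to a NumPy array
# numeric_array.append(9)  # does not work: an ndarray has no append method
# np.append returns a NEW array; numeric_array itself is unchanged
np.append(numeric_array, [11, 12, 13])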

In [None]:
# Return the first four elements (indices 0 through 3)
numeric_array[:4]

In [None]:
# Return the 4th through 8th elements, skipping every other one (indices 3, 5, 7)
numeric_array[3:8:2]

In [None]:
# Go in reverse
numeric_array[::-1]
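
Here are the sample slice notations from the list above applied to a small array, so you can check each description against real output. With a negative `step` the defaults flip: `start` defaults to the last element and the slice runs toward the front.

```python
import numpy as np

x = np.arange(10)                # [0 1 2 3 4 5 6 7 8 9]

print(x[1:5:1])                  # [1 2 3 4]: indices 1..4, stop excluded
print(x[:8:1])                   # [0 1 2 3 4 5 6 7]
print(x[::-1])                   # [9 8 7 6 5 4 3 2 1 0]
print(x[5::-1])                  # [5 4 3 2 1 0]: start at 5, walk backward

# A Python list follows the same rules
print(list(range(10))[1:5])      # [1, 2, 3, 4]
```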

## `ndarray` Slices are Segments, not Copies <a name="subcopy"></a>


When we use slice notation on an `ndarray`, it returns a *view* of the original array rather than a copy. This means that if you change the elements of a slice, you also change them in the original array.

<div class="alert alert-block alert-danger">
This is especially important for experienced Python programmers to note, since it is the opposite of what happens when you slice a `list` object.
</div> 

While this may seem like a bad thing at first, particularly for those of us who are used to slicing `list` objects, it can actually help when we want to process a large dataset in small chunks without copying them.

Let's demonstrate:

In [None]:
# assignment (=) copies the reference, not the data, just as with lists

na_1 = np.random.randint(10, size=10)
na_2 = na_1
na_2[0] = 42
na_1

In [None]:
numeric_array = np.random.randint(100, size=10)  
numeric_array

In [None]:
part_of_it = numeric_array[3:6]
part_of_it

In [None]:
part_of_it[0:2] = [1000, 2000]

# And re-output `numeric_array`
numeric_array
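
To make the contrast with `list` concrete, here is a side-by-side sketch: the list slice is an independent copy, the `ndarray` slice is a view, and `.copy()` gives you an independent array when you need one. `np.shares_memory` reports whether two arrays overlap in memory. The small arrays here are illustrative, not the random ones above.

```python
import numpy as np

# Slicing a list copies the selected elements
py_list = [0, 1, 2, 3, 4, 5]
list_part = py_list[1:4]
list_part[0] = 99
print(py_list)        # [0, 1, 2, 3, 4, 5]  (unchanged)

# Slicing an ndarray returns a view onto the same data
arr = np.arange(6)
arr_part = arr[1:4]
arr_part[0] = 99
print(arr)            # [ 0 99  2  3  4  5]  (changed through the view)

# Call .copy() when you want an independent subarray
arr_copy = arr[1:4].copy()
arr_copy[0] = -1
print(arr)            # [ 0 99  2  3  4  5]  (the copy did not touch arr)
print(np.shares_memory(arr, arr_part), np.shares_memory(arr, arr_copy))  # True False
```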

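### NumPy Arrays are Homogeneous
A NumPy array holds only one data type, but it behaves a little differently than Python's `array.array`: instead of raising an error on mixed input, NumPy converts all the elements to a common type.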

In [None]:
random_list = [1, 2.2, "three"]
random_nparray = np.array(random_list)
random_nparray

In [None]:
floaty_array = np.array([1.1,2.2,3.3])
floaty_array

In [None]:
wordy_array = np.array(['one', 'two', 'three'])
np.append(floaty_array, wordy_array)
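
When NumPy meets mixed input it picks one type that can hold every value: integers mixed with floats become floats, and numbers mixed with a string become strings. A short sketch that checks the chosen `dtype` (a `dtype.kind` of `'U'` means Unicode string):

```python
import numpy as np

ints_and_floats = np.array([1, 2, 3.5])
print(ints_and_floats.dtype)     # float64: the ints were upcast
print(ints_and_floats)           # [1.  2.  3.5]

mixed = np.array([1, 2.2, "three"])
print(mixed.dtype.kind)          # U: every element is now a string
print(mixed[0] == "1")           # True: the int 1 became the text "1"
```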

### Forcing an Array's Data Type
You also have the option of forcing all NumPy array elements to have a specific data type.  Here I'll take a `list` of `float` objects and coerce them into `int` objects by specifying the **`dtype`** parameter on the `np.array` function.

In [None]:
list_of_floats = [11.2, 57.6, 95.1]
forced_int_array = np.array(list_of_floats, dtype=int)
forced_int_array


In [None]:
# ...but the forced dtype isn't preserved: np.append builds a new array whose type is inferred from both inputs

np.append(forced_int_array, wordy_array)

<div class="alert alert-block alert-info">
<p>The <code>dtype</code> parameter can take many different values in addition to these standard Python types: <code>int, float, bool, complex</code>.</p>

<p>For a complete list, you can visit <a href="https://docs.scipy.org/doc/numpy/user/basics.types.html" target="_blank">NumPy website.</a>
</div>
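
One detail worth knowing about `dtype=int`: converting floats to integers drops the fractional part rather than rounding, so 57.6 becomes 57. If you want rounding, round first and then convert with `.astype`, which returns a new array of the requested type. A short sketch using the same list of floats:

```python
import numpy as np

list_of_floats = [11.2, 57.6, 95.1]

truncated = np.array(list_of_floats, dtype=int)
print(truncated)                 # [11 57 95]: fractional parts dropped

rounded = np.round(list_of_floats).astype(int)
print(rounded)                   # [11 58 95]

as_float32 = truncated.astype(np.float32)
print(as_float32.dtype)          # float32
```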

### Arrays can be Multi-Dimensional
NumPy arrays can be multi-dimensional. We'll start with a two-dimensional array, which can be thought of as a grid with x and y coordinates.

In [None]:
# Don't focus on how this is being created. Just focus on the output.
two_dimensional_array = np.array([range(i, i + 3) for i in [2, 4, 6]])
two_dimensional_array

#### Accessing Multi-Dimensional Arrays

In [None]:
# Get the row at index 0 (only specify an "x" coordinate)
print(two_dimensional_array[0])

# Get the 3rd row (index 2)
print(two_dimensional_array[2])

# Get the 3rd row ([2]) and then the element at column index 1 ([1]) in that row
print(two_dimensional_array[2][1])


In [None]:
print(two_dimensional_array, "\n")

print('\nFrom the second ([1]) and third ([2]) rows, collect the first element of each')
print("Rows and column are selected together, so this returns part of a column")
print(two_dimensional_array[1:3, 0])

print('\nSlice out the second ([1]) and third ([2]) rows, then return the first of those rows')
print("The trailing [0] indexes the sliced rows, so this returns a whole row")
print(two_dimensional_array[1:3][0])

In [None]:
two_dimensional_array

In [None]:
# All three rows (indices 0 to 2), column index 1
two_dimensional_array[:3,1]

In [None]:
# ':' selects every row; ':2' then keeps the first two columns of each row
# this returns a 2D array
two_dimensional_array[:,:2]

In [None]:
two_dimensional_array

In [None]:
# For this 3-row array, equivalent to two_dimensional_array[0:3, 1]: column index 1 of every row
two_dimensional_array[:,1]

In [None]:
# Same as above, but column index 0
two_dimensional_array[:,0]

In [None]:
# Every row (':'), but only the first column (':1'); the result stays 2D
two_dimensional_array[:,:1]

In [None]:
# The first two rows (0:2), then the first two columns (:2) of those rows
two_dimensional_array[0:2,:2]

**NOTE**: See the distinction between `two_dimensional_array[:,0]`, which gets you the column at index 0 as a one-dimensional array, and `two_dimensional_array[:,:1]`, which gets you the same column as a two-dimensional array.
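
A quick way to see that distinction is to compare `shape` values. This sketch rebuilds the same `two_dimensional_array` and also checks that `[2][1]` and `[2, 1]` reach the same element:

```python
import numpy as np

two_dimensional_array = np.array([range(i, i + 3) for i in [2, 4, 6]])
# [[2 3 4]
#  [4 5 6]
#  [6 7 8]]

print(two_dimensional_array[2][1], two_dimensional_array[2, 1])   # 7 7

col_1d = two_dimensional_array[:, 0]    # column index 0 as a 1D array
col_2d = two_dimensional_array[:, :1]   # the same values kept as a 2D column
print(col_1d.shape)   # (3,)
print(col_2d.shape)   # (3, 1)
```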

## NumPy Array Attributes
Like all objects, NumPy arrays have a set of attributes. Let's take a moment to discuss some that you may find particularly helpful:

1. `ndim`
1. `shape`
1. `size`
1. `dtype`

To get started, let's create 3 arrays.

In [None]:
x1 = np.random.randint(10, size=6)  # One-dimensional array
x2 = np.random.randint(10, size=(3, 4))  # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5))  # Three-dimensional array

print("One Dimensional Array",   x1, sep='\n')
print("Two Dimensional Array",   x2, sep='\n')
print("Three Dimensional Array", x3, sep='\n')

### `ndim` Attribute
Returns the number of dimensions a given array has.

In [None]:
print(x1.ndim, x2.ndim, x3.ndim)

### `shape` Attribute
Returns a tuple with the number of elements along each dimension. For our `x2` array we'd get this output:

```python
x2.shape
(3, 4)
```

If you look above, you'll see that the full representation of our `x2` array looked something like this (your random values will differ):

```
[[6 7 9 1]
 [3 4 1 2]
 [6 7 0 7]]
```

The `(3, 4)` returned by `x2.shape` refers to the fact that our array has three "rows" with four "columns" in each row.

In [None]:
print(x2)
print(x2.shape)

In [None]:
print(x3.shape)

In [None]:
row,col = x2.shape
print(row)
print(col)

In [None]:
# tuple unpacking needs one name per dimension: x3 has three, so this raises a ValueError
row,col = x3.shape
print(row)
print(col)
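
The fix is to supply one name per dimension, or to use star-unpacking to collect the remaining sizes. A sketch with an array of the same shape as `x3` (the names `depth`, `rows`, `cols`, `first`, and `rest` are illustrative):

```python
import numpy as np

x3 = np.zeros((3, 4, 5))

depth, rows, cols = x3.shape      # one name per dimension
print(depth, rows, cols)          # 3 4 5

first, *rest = x3.shape           # star-unpacking collects the remaining sizes
print(first, rest)                # 3 [4, 5]

caught_error = None
try:
    row, col = x3.shape           # two names for three values
except ValueError as err:
    caught_error = err
    print("ValueError:", err)
```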

### `size` Attribute
Returns the total number of elements across all dimensions of an array.

So, for our one-dimensional array, `size` will be 6.  
Our two-dimensional array has dimensions of 3 and 4, so its `size` will be 12.  
Our three-dimensional array has dimensions of 3, 4, and 5, so its `size` will be 60.

In [None]:
print("x1 size:", x1.size)
print("x2 size:", x2.size)
print("x3 size:", x3.size)

### `dtype` Attribute
Previously, we passed a `dtype` argument to the `np.array` constructor function to coerce the data type of the array's elements.

It comes up again here, but this time as an attribute of every NumPy array. Unsurprisingly, it tells you the data type of the elements held inside the array.

In [None]:
float_array = np.array([1, 2, 3.1])
float_array.dtype

In [None]:
int_array = np.array([1, 2, 3])  # all ints, so NumPy picks an integer dtype
int_array.dtype

In [None]:
string_array = np.array("Python is something else".split())
print(string_array)
string_array.dtype

### `itemsize` and `nbytes` Attributes
Other attributes include ``itemsize``, which lists the size (in bytes) of each array element, and ``nbytes``, which lists the total size (in bytes) of the array:

In [None]:
print("array size:", x3.size)
print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")
print("calculated nbytes:", x3.itemsize * x3.size)

In general, we expect that ``nbytes`` is equal to ``itemsize`` times ``size``.
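
A small sketch that checks both relationships on an array with a fixed 8-byte integer type (`cube` is an illustrative name with the same shape as `x3`; the random `x3` above uses the platform's default integer type, so its `itemsize` may differ):

```python
import numpy as np

cube = np.zeros((3, 4, 5), dtype=np.int64)      # same shape as x3, fixed 8-byte ints

print(cube.size, np.prod(cube.shape))           # 60 60: size is the product of the shape
print(cube.itemsize)                            # 8 bytes per int64 element
print(cube.nbytes, cube.itemsize * cube.size)   # 480 480

# Same shape with a 1-byte type: the memory footprint shrinks accordingly
cube_small = cube.astype(np.int8)
print(cube_small.itemsize, cube_small.nbytes)   # 1 60
```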

# NumPy Array Sorting, Merging, Reshaping and Splitting

## ND Football roster data

The Notre Dame football athletes data is publicly accessible and provides information about the athletes: their jersey number, name, position, height, weight, class, and hometown.

To get started, we will work with an array of football player weights from your tutorial dataset.

In [None]:
# We will learn about each of these packages as the course moves along; for now, let's import them
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8')  # this style was named 'seaborn' before matplotlib 3.6
%matplotlib inline

### Make sure this Jupyter file (and all others) and the 'data' folder are in the same folder. Also make sure the 'data' folder contains the data file 'nd-football-2018-roster.csv' downloaded from Sakai (or Google Drive)

In [None]:
# At this point you don't have to know the details of following data loading. 
# However, understand that it is loading the weights of all the athletes
nd_player_weights = np.array(pd.read_csv('./data/nd-football-2018-roster.csv')['Weight'])

In [None]:
nd_player_weights

## Activity:

Compute the following details

* Number of players in the dataset
* The average weight of the players
* The weight of the lightest player
* The weight of the heaviest player

In [None]:
# number of players using basic Python and/or NumPy's .size attribute


In [None]:
# average weight using basic Python and/or NumPy's .size attribute


In [None]:
# lightest weight using Python


In [None]:
# heaviest weight using python


## Array Sorting
A very common use case for real world datasets is that they aren't going to be sorted in the ways we want. This is the case with our football player weights array.  Take a look at the messy order it is in right now.

In [None]:
nd_player_weights

In [None]:
# And look how that would be visualized...You don't have to know the details!
plt.axes().plot(nd_player_weights)

### np.sort
The `np.sort` function returns a sorted copy of an array. Let's go ahead and use that to get a sorted version of our players' weights.

In [None]:
sorted_player_weights = np.sort(nd_player_weights)
print(sorted_player_weights)

In [None]:
axes = plt.axes()
axes.plot(sorted_player_weights) 

In [None]:
axes = plt.axes()
axes.plot(nd_player_weights) 

<div class="alert alert-block alert-info">
<p>You can also sort an array by calling the `sort` method on the array itself. For example, `nd_player_weights.sort()` would also get you a sorted array.</p>
<p>There is one important difference however. If you call `sort` on the array itself like this, it sorts the array **"in-place"**. This means that the original array elements are rearranged. Using `np.sort` on the other hand returns a new sorted copy of the original array.
</p>
</div> 
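
Here is a sketch of that difference using a small hypothetical weights array (the values are made up for illustration): `np.sort` leaves the original alone, while the `.sort()` method rearranges it and returns `None`.

```python
import numpy as np

weights = np.array([250, 190, 310, 205])   # hypothetical values

sorted_copy = np.sort(weights)   # new sorted array
print(weights)                   # [250 190 310 205]: original order kept
print(sorted_copy)               # [190 205 250 310]

result = weights.sort()          # sorts in place
print(result)                    # None: the method returns nothing
print(weights)                   # [190 205 250 310]
```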

In [None]:
nd_player_weights

In [None]:
nd_player_weights.sort()
print(nd_player_weights)

In [None]:
nd_player_weights

## Activity: 

Print the weights of the five lightest and the five heaviest players.

In [None]:
# 5 lightest


In [None]:
# 5 heaviest


### Activity on heights

* Tallest Height
* Shortest Height
* 3 Tallest Heights
* 3 Shortest Heights

In [None]:
nd_player_heights = np.array(pd.read_csv('./data/nd-football-2018-roster.csv')['Height'])
nd_player_heights

In [None]:
# tallest


In [None]:
# shortest


In [None]:
# sort numbers


In [None]:
# 3 tallest


In [None]:
# 3 shortest


## Array Merging
Now let's explore how we can piece multiple arrays together with the `np.concatenate`, `np.hstack`, and `np.vstack` functions.

### `np.concatenate`


For some reason, the weight information for the football team has been sent to you in two different emails. Now you've got two separate arrays that you need to put together in order to generate the correct statistics for the team.

In [None]:
# Here are your two arrays
# One has 25 data points and the other has the remaining 91
nd_player_weights_1 = nd_player_weights[:25]
nd_player_weights_2 = nd_player_weights[25:]
print(len(nd_player_weights_1), len(nd_player_weights_2))

In [None]:
# It is super easy to concatenate them together with `np.concatenate`
merged_array = np.concatenate([nd_player_weights_1, nd_player_weights_2])
print(len(merged_array))
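
The cells above only use `np.concatenate`. Here is a small sketch of `np.hstack` and `np.vstack` on two tiny illustrative arrays: for 1D input `hstack` joins end to end like `concatenate`, `vstack` stacks the arrays as rows, and `concatenate` with `axis=1` joins 2D arrays side by side.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

flat = np.concatenate([a, b])
side_by_side = np.hstack([a, b])
print(flat)              # [1 2 3 4 5 6]
print(side_by_side)      # [1 2 3 4 5 6]: same as concatenate for 1D arrays

grid = np.vstack([a, b])
print(grid)              # stacked as rows:
# [[1 2 3]
#  [4 5 6]]

wide = np.concatenate([grid, grid], axis=1)
print(wide.shape)        # (2, 6): joined along the columns
```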

## Array Reshaping

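If you recall, every array has a `shape` attribute that conveys the dimensions of the array. The `reshape` method returns the same data arranged in a new shape, as long as the new shape holds the same total number of elements.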

In [None]:
# Reshape our 116 records into 29 rows with 4 columns
nd_player_weights_grid = nd_player_weights.reshape(29, 4)
nd_player_weights_grid

In [None]:
# Or back into a one-dimensional array of 116 elements
nd_player_weights_grid.reshape(116)
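
Two `reshape` details in a small sketch on an illustrative 12-element array: passing `-1` for one dimension lets NumPy infer it, and a shape whose total size does not match raises a `ValueError`.

```python
import numpy as np

values = np.arange(12)

print(values.reshape(3, 4).shape)    # (3, 4)
inferred = values.reshape(-1, 6)
print(inferred.shape)                # (2, 6): -1 lets NumPy infer that dimension

caught_error = None
try:
    values.reshape(5, 3)             # 15 slots for 12 elements
except ValueError as err:
    caught_error = err
    print("ValueError:", err)
```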

## Array Splitting

Now let's turn our attention to how we can split arrays into smaller chunks using the `np.split`, `np.array_split`, `np.hsplit`, and `np.vsplit` functions.

### `np.split`

In [None]:
# You can split a single dimensional array with `np.split` 
# by specifying the index values on which to split the array.

# Here we will use it to split the array into 6 arrays (5 split points give 6 pieces).
np.split(nd_player_weights, indices_or_sections=[10, 30, 60, 70, 80])

<div class="alert alert-block alert-info">
<p>It is important to note that the index values you provide will become the first element of the new arrays rather than the last element of the previous split.</p> 
<p>So, in this case, the elements at indexes 10, 30, 60, 70, 80 become the first elements of the new arrays.</p>
</div> 
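
A small sketch that makes the rule visible on an array whose values equal their indices: split points 3 and 7 become the first elements of the second and third pieces.

```python
import numpy as np

x = np.arange(10)

# Split points 3 and 7 start the second and third pieces
parts = np.split(x, [3, 7])
for part in parts:
    print(part)
# [0 1 2]
# [3 4 5 6]
# [7 8 9]
```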

### `np.array_split`

In [None]:
# If you want to split an array into equal parts
# (or as close to equal as possible), use `np.array_split`

# Here we will divide our team into 6 parts.
np.array_split(nd_player_weights, 6)
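
The difference from `np.split` shows up when the sizes don't divide evenly: given a number of sections, `np.split` insists on equal pieces, while `np.array_split` lets the first pieces take the extra elements. The sketch below also shows `np.hsplit` (splits columns) and `np.vsplit` (splits rows) on an illustrative 4x4 grid.

```python
import numpy as np

x = np.arange(10)

# np.split with a number of sections requires an exact division
caught_error = None
try:
    np.split(x, 3)               # 10 elements do not divide into 3 equal parts
except ValueError as err:
    caught_error = err
    print("ValueError:", err)

# np.array_split allows unequal sizes; the first pieces get the extra elements
sizes = [len(part) for part in np.array_split(x, 3)]
print(sizes)                     # [4, 3, 3]

# np.hsplit splits the columns and np.vsplit splits the rows of a 2D array
grid = np.arange(16).reshape(4, 4)
left, right = np.hsplit(grid, [2])
top, bottom = np.vsplit(grid, [1])
print(left.shape, right.shape)   # (4, 2) (4, 2)
print(top.shape, bottom.shape)   # (1, 4) (3, 4)
```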

### MORE INFORMATION: Slicing with Multi-Dimensional Arrays

In [None]:
two_dim_array = np.random.randint(10, size=(5, 3))  
two_dim_array

In [None]:
# Slice the first 4 rows and 1st column of each row
two_dim_array[:4, :1]

In [None]:
# Slice the first 4 rows and then select the last one
print(two_dim_array[:4][-1])
print(two_dim_array[3])


In [None]:
# Slice the first 4 rows and collect the last element of each
two_dim_array[:4,-1]

<div class="alert alert-block alert-danger">
<h5>Warning</h5>
<p>When we were selecting individual elements of arrays, we demonstrated that there were two ways of getting to a specific element of a multi-dimensional array.
</p>
<ul>
    <li><code>two_dim_array[1, 1]</code></li>
    <li><code>two_dim_array[1][1]</code></li>
</ul>

You can <strong>not</strong> use the second form to slice rows and columns at the same time.
<code>two_dim_array[:4, :1]</code> will give very different results than <code>two_dim_array[:4][:1]</code>

</div> 
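
Comparing shapes shows why. In this sketch (using a deterministic 5x3 array instead of random values), `[:4, :1]` slices rows and columns together, while `[:4][:1]` slices rows twice: the second `[:1]` applies to the already-sliced rows, not to the columns.

```python
import numpy as np

two_dim_array = np.arange(15).reshape(5, 3)

# For a single element, both forms agree
print(two_dim_array[1, 1] == two_dim_array[1][1])   # True

a = two_dim_array[:4, :1]   # first 4 rows, first column of each
b = two_dim_array[:4][:1]   # first 4 rows, then the first of *those rows*
print(a.shape)              # (4, 1)
print(b.shape)              # (1, 3)
```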