### What is the difference between a NumPy array and a list?

1. What is the difference between a NumPy array and a list?

    We'll cover some common questions about scientific computing in Python. We'll start with NumPy arrays and compare them to Python lists.

2. NumPy array

    It is a special data structure from the NumPy module, a fundamental package for scientific computing with Python. The easiest way to create an array is to pass a list of values to the array() constructor. At first glance, there isn't a big difference from Python's native lists.
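A minimal sketch of creating an array from a list. The values are illustrative only.

```python
import numpy as np

# Illustrative sample values.
num_list = [1, 2, 3, 4, 5]
num_array = np.array(num_list)  # build an array from a list

print(num_list)         # [1, 2, 3, 4, 5]
print(num_array)        # [1 2 3 4 5]
print(type(num_array))  # <class 'numpy.ndarray'>
```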

3. Similarities between an array and a list

    Both data structures are Iterables. In both cases we can use indexing to access elements. 

    Moreover, NumPy arrays and Python lists can be modified similarly. What's so special about NumPy arrays then?

    Compared to lists, NumPy arrays are optimized for high-efficiency computations. How? First of all, a NumPy array only stores data of a single type.
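The similarities, and the single-type restriction, can be sketched as follows. The values are illustrative assumptions.

```python
import numpy as np

num_list = [1, 2, 3]
num_array = np.array([1, 2, 3])

# Both can be iterated over and indexed.
print([x * 2 for x in num_list])        # [2, 4, 6]
print([int(x) * 2 for x in num_array])  # [2, 4, 6]
print(num_list[0], num_array[0])        # 1 1

# Both can be modified in place.
num_list[0] = 10
num_array[0] = 10

# A list keeps mixed types, while an array converts them to one common type.
mixed_array = np.array([1, 2.5, 3])
print(mixed_array.dtype)                # float64
```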

4. dtype property
    
    When we have an array, we can retrieve the data type it stores by accessing its .dtype property. In this case, our array stores integers represented by 64 bits.
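A short sketch of reading the data type. The default integer type depends on the platform and NumPy version, so this example sets it explicitly to match the 64-bit case described above.

```python
import numpy as np

# The dtype is set explicitly because the default integer size can vary.
num_array = np.array([1, 2, 3, 4, 5], dtype=np.int64)

print(num_array.dtype)           # int64
print(num_array.dtype.itemsize)  # 8 (bytes per element, i.e. 64 bits)
```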

8. Changing the data type of an element
    
    Unlike with lists, if we try to assign a value that cannot be converted to the array's data type (a non-numeric string into an integer array, for example), we'll get a ValueError.
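The sketch below contrasts the two behaviors with illustrative values. It also shows a case that is easy to miss: a value that can be converted, such as a float, is silently cast instead of raising an error.

```python
import numpy as np

num_list = [1, 2, 3]
num_list[0] = 'a'        # fine: a list accepts any type

num_array = np.array([1, 2, 3])
raised = False
try:
    num_array[0] = 'a'   # 'a' cannot be converted to an integer
except ValueError as err:
    raised = True
    print('ValueError:', err)

# A float can be converted, so it is truncated rather than rejected.
num_array[1] = 2.7
print(num_array)         # [1 2 3]
```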

9. Specifying the data type explicitly
    
    Actually, we can explicitly specify the data type when we create an array using the dtype keyword argument.

    Regardless of the list we pass, we can specify other data types, such as a string. The output we see here, '<U1', means a Unicode string of at most one character.

        num_array = np.array([1,2,3,4,5], dtype=np.dtype('str'))
        num_array.dtype
        > dtype('<U1')

11. Object as a data type

    If we want an array to behave like a list with respect to modification, we can use the dtype equal to 'O' which stands for Object. In this case, we can mix data types. However, we also limit the set of operations we can apply to such an array.

        num_array = np.array([1,2,3,4,5], dtype = np.dtype('O'))
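A sketch of both sides of this trade-off, using an illustrative array named obj_array: mixing types works, but arithmetic falls back to Python's operators for each element and can fail.

```python
import numpy as np

obj_array = np.array([1, 2, 3, 4, 5], dtype=np.dtype('O'))
obj_array[0] = 'one'     # mixing types is now allowed
print(obj_array.dtype)   # object

# Arithmetic is applied element by element with Python operators,
# so it fails as soon as one element does not support the operation.
failed = False
try:
    obj_array + 1        # 'one' + 1 is not defined
except TypeError as err:
    failed = True
    print('TypeError:', err)
```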
    
12. Difference between an array and a list - Accessing items
    
    The second property of NumPy arrays is that they offer a special way to access their elements.
    
    Let's assume we have this two-dimensional list. As an array it can be defined like this. To retrieve a single item, say the 8 in this case, both lists and arrays provide similar options.

        array2d = np.array([
            [1,2,3,4,5],
            [6,7,8,9,10],
            [11,12,13,14,15]
        ])

14. Accessing items

    With arrays though, it's not necessary to specify additional square brackets.

        # Retrieve 8
        array2d[1,2]
        > 8

    But how do we retrieve an entire data block?

    The solution for a list can be tricky.

    An array provides a more elegant and efficient way via slicing.

        array2d[0:2,1:4]
        > array([[2,3,4],
                 [7,8,9]])
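For comparison, here is a sketch of the same two lookups on a nested list and on an array. The name list2d is an assumption introduced for the list version.

```python
import numpy as np

list2d = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
]
array2d = np.array(list2d)

# Single item: a list needs one pair of brackets per level.
print(list2d[1][2])   # 8
print(array2d[1, 2])  # 8

# Rows 0-1 and columns 1-3: the list needs a comprehension,
# the array takes both slices in a single index expression.
list_block = [row[1:4] for row in list2d[0:2]]
array_block = array2d[0:2, 1:4]
print(list_block)     # [[2, 3, 4], [7, 8, 9]]
```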

18. Difference between an array and a list

    Third, operations work differently on arrays. For simplicity, we'll focus only on numeric arrays.

19. Operations +, -, *, / with lists

    Let's recall that, given two lists, most of the simple mathematical operations will result in TypeError. Addition is an exception; it concatenates given lists.
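The list behavior can be checked with a short sketch. The list names and values are illustrative.

```python
import operator

num_list1 = [1, 2, 3]
num_list2 = [10, 20, 30]

# Addition concatenates the lists.
print(num_list1 + num_list2)  # [1, 2, 3, 10, 20, 30]

# The other operators raise TypeError for two lists.
failed = []
for symbol, op in [('-', operator.sub), ('*', operator.mul), ('/', operator.truediv)]:
    try:
        op(num_list1, num_list2)
    except TypeError:
        failed.append(symbol)
print(failed)                 # ['-', '*', '/']
```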

    In case of NumPy arrays, operations are performed element-wise. As a result, we get a new array.

        num_array1 = np.array([1,2,3])
        num_array2 = np.array([10,20,30])
        num_array1 + num_array2
        > array([11,22,33])
        num_array1*num_array2
        > array([10,40,90])

    The same applies to multidimensional arrays.
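A small two-dimensional sketch with illustrative values. Note that * is the element-wise product here, not a matrix product.

```python
import numpy as np

array_a = np.array([[1, 2], [3, 4]])
array_b = np.array([[10, 20], [30, 40]])

total = array_a + array_b
product = array_a * array_b  # element-wise, not a matrix product

print(total.tolist())        # [[11, 22], [33, 44]]
print(product.tolist())      # [[10, 40], [90, 160]]
```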

22. Conditional operations

    Conditional operations are especially useful. Applying them on an array returns a new array of booleans indicating whether the condition is satisfied or not. The cool part is that we can use these conditions to filter our arrays. This operation takes much more effort with lists.
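To make the intermediate step visible, this sketch prints the boolean mask before using it as a filter, and shows a list comprehension as one possible list-based equivalent.

```python
import numpy as np

num_array = np.array([-5, -4, -3, 0, 3, 4, 5])

mask = num_array < 0  # new array of booleans
print(mask.tolist())  # [True, True, True, False, False, False, False]

negatives = num_array[mask]
print(negatives)      # [-5 -4 -3]

# With a list, the condition has to be applied item by item.
num_list = num_array.tolist()
negatives_list = [x for x in num_list if x < 0]
print(negatives_list) # [-5, -4, -3]
```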

        num_array = np.array([-5,-4,-3,0,3,4,5])
        num_array[num_array <0]
        > array([-5,-4,-3])

23. Broadcasting

    Another important feature of arrays is broadcasting. It describes how operations work on arrays of different shapes. For example, what happens if we multiply this array by 3? We certainly know the answer for lists: they get extended. In the case of arrays, each element is multiplied by 3, resulting in a new array. The same applies to other operations. We say that 3 is broadcast to all the array elements, meaning that the operation with 3 is applied to each element separately.
        
        num_list = [1,2,3]
        num_list * 3
        > [1,2,3,1,2,3,1,2,3]
        
        num_array = np.array([1,2,3])
        num_array * 3
        > array([3,6,9])


24. Broadcasting with multidimensional arrays

    We can do broadcasting with multidimensional arrays. For this example, the one-dimensional array broadcasts itself to all three rows of the two-dimensional array. A one-dimensional array is matched against the last axis, so by default it is applied to each row. If we want to broadcast along columns instead, we need to reshape our one-dimensional array into a column vector.

        array1d = np.array([[1], [2], [3]])
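A sketch of both cases, assuming a 3x3 array with illustrative values. The names row and column are introduced here for clarity.

```python
import numpy as np

array2d = np.array([[1, 2, 3],
                    [4, 5, 6],
                    [7, 8, 9]])
row = np.array([1, 2, 3])           # shape (3,)
column = np.array([[1], [2], [3]])  # shape (3, 1)

# The 1-D array is applied to every row.
by_rows = array2d * row
print(by_rows)
# [[ 1  4  9]
#  [ 4 10 18]
#  [ 7 16 27]]

# The column vector is applied to every column.
by_columns = array2d * column
print(by_columns)
# [[ 1  2  3]
#  [ 8 10 12]
#  [21 24 27]]
```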
        
26. Let's practice

In [4]:
import numpy as np

In [5]:
# What is the type of the following array?
np.array([1,(2,3),4]).dtype

ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part.
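The error above comes from the nested tuple, which makes the input ragged. One way to still build such an array, assuming a recent NumPy release, is to request the object data type explicitly. The name ragged is illustrative.

```python
import numpy as np

# Assumption: a recent NumPy release, where ragged input
# needs dtype=object to be accepted.
ragged = np.array([1, (2, 3), 4], dtype=object)

print(ragged.dtype)  # object
print(ragged.shape)  # (3,)
```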

#### Accessing subarrays

Let's access elements in NumPy arrays! Your task is to convert a square two-dimensional array square of size size to a list created by following a spiral pattern:

[Figure: traversing the matrix in a spiral pattern]

Rather than simply accessing certain slices, you will define a more general solution using a for loop (the solution should work for all the square two-dimensional arrays of odd size).

The module numpy is already imported as np.

You will need the reversed() function, which reverses an Iterable.

In [7]:
square = np.array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25]])

In [10]:
spiral = []
size = len(square)

for i in range(0, size):
    # Convert each part marked by a red arrow to a list
    spiral += list(square[i, i:size-i])
    # Convert each part marked by a green arrow to a list
    spiral += list(square[i+1:size-i, size-i-1])
    # Convert each part marked by a blue arrow to a list
    spiral += list(reversed(square[size-i-1, i:size-i-1]))
    # Convert each part marked by a magenta arrow to a list
    spiral += list(reversed(square[i+1:size-i-1, i]))
        
print(spiral)

[1, 2, 3, 4, 5, 10, 15, 20, 25, 24, 23, 22, 21, 16, 11, 6, 7, 8, 9, 14, 19, 18, 17, 12, 13]


#### Operations with NumPy arrays

The following blocks of code create new lists from the input lists input_list1, input_list2, and input_list3 (you can check their values in the console). Suppose you had analogous NumPy arrays input_array1, input_array2, and input_array3 with the same values (you can check these in the console as well). How would you create similar output as NumPy arrays, using what you know about broadcasting, accessing elements in NumPy arrays, and performing element-wise operations?

Block 1

list(map(lambda x: [5*i for i in x], input_list1))

Block 2

list(filter(lambda x: x % 2 == 0, input_list2))

Block 3

[[i*i for i in j] for j in input_list3]

In [13]:
input_array1 = np.array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
input_array2 = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
input_array3 = np.array([[1, 2],
       [3, 4],
       [5, 6]])

input_list1 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
input_list2 = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
input_list3 = [[1, 2], [3, 4], [5, 6]]

In [14]:
# Replace the code in block 1 with an equivalent that uses input_array1
output_array1 = input_array1 * 5 
print(list(map(lambda x: [5*i for i in x], input_list1)))
print(output_array1)

[[5, 10, 15], [20, 25, 30], [35, 40, 45]]
[[ 5 10 15]
 [20 25 30]
 [35 40 45]]


In [15]:
# Replace the code in block 2 with an equivalent that uses input_array2
output_array2 = input_array2[ input_array2 % 2 == 0]
print(list(filter(lambda x: x % 2 == 0, input_list2)))
print(output_array2)

[0, 2, 4, 6, 8]
[0 2 4 6 8]


In [16]:
# Replace the code in block 3 with an equivalent that uses input_array3
output_array3 = input_array3 * input_array3
print([[i*i for i in j] for j in input_list3])
print(output_array3)

[[1, 4], [9, 16], [25, 36]]
[[ 1  4]
 [ 9 16]
 [25 36]]


### How to use the .apply() method on a DataFrame?

1. How to use the .apply() method on a DataFrame?

    Let's move to DataFrames! We'll cover one of the most frequently used methods, .apply().

2. Dataset

    First, let's pick a dataset. We'll work with data on 100 students and their performance on different subjects. Each performance score varies between 0 and 100.

3. Default .apply()

    Let's use the .apply() method. It requires one argument: a function that, by default, is applied to each column of a DataFrame. However, the type of the output may differ. For example, applying the sqrt() function results in a DataFrame with the square roots of the original values.

4. Default .apply()

    However, using the mean() function returns a Series. Why?

5. Default .apply()

    The columns we apply the function to are passed as pandas Series. When we use sqrt(), we simply modify each value in a column and return an object of the same size. When we use mean(), we summarize each Series with a single value.
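A minimal sketch of the two outcomes. The scores DataFrame and its column names are assumptions: a tiny stand-in for the student dataset, not the real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the student scores (column names assumed).
scores = pd.DataFrame({
    'math': [64.0, 81.0, 100.0],
    'reading': [49.0, 36.0, 100.0],
    'writing': [16.0, 25.0, 100.0],
})

roots = scores.apply(np.sqrt)  # same size as the input: DataFrame
means = scores.apply(np.mean)  # one value per column: Series

print(type(roots).__name__, roots.shape)  # DataFrame (3, 3)
print(type(means).__name__, means.shape)  # Series (3,)
```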

6. Default .apply(): own functions

    For example, let's define a function that halves our scores. We get a modified DataFrame because passing each column to our function results in an object of the same size.

7. Default .apply(): own functions

    By contrast, if we return only one value (for example, a perfect score), we summarize each column by a single value. Therefore, we get a pandas Series.

8. Lambda expressions

    Of course, our functions can be replaced with lambda expressions! This simplifies our code without changing the output.
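The sketch below shows own functions next to their lambda equivalents. The scores DataFrame and the function names halve and perfect_score are hypothetical; the source does not name them.

```python
import pandas as pd

# Hypothetical stand-in for the student scores (column names assumed).
scores = pd.DataFrame({
    'math': [64.0, 81.0, 100.0],
    'reading': [49.0, 36.0, 100.0],
    'writing': [16.0, 25.0, 100.0],
})

def halve(col):
    # Series in, Series of the same size out.
    return col / 2

def perfect_score(col):
    # Series in, single value out.
    return 100

halved = scores.apply(halve)            # DataFrame
perfect = scores.apply(perfect_score)   # Series

# The same calls written with lambda expressions.
halved_lambda = scores.apply(lambda col: col / 2)
perfect_lambda = scores.apply(lambda col: 100)

print(halved.equals(halved_lambda))     # True
print(perfect.equals(perfect_lambda))   # True
```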

10. Additional arguments: axis

    Let's have a look at additional arguments we can pass to the .apply() method. We'll start with the axis argument, which can be either 0 (the default) or 1.

13. Additional arguments: axis

    0 means that the function is applied to each column of a DataFrame; 1 means it is applied to each row. Specifying this argument is useful for functions that return a single value, like mean().

14. Additional arguments: axis

    Setting axis to 0 is no different from the default behavior: we get the mean of each column.

15. Additional arguments: axis

    Setting axis to 1 averages the values in each row instead.
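Both settings can be compared in a short sketch. The scores DataFrame is again a hypothetical stand-in with assumed column names.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the student scores (column names assumed).
scores = pd.DataFrame({
    'math': [64.0, 81.0, 100.0],
    'reading': [49.0, 36.0, 100.0],
    'writing': [16.0, 25.0, 100.0],
})

col_means = scores.apply(np.mean, axis=0)  # one mean per column
row_means = scores.apply(np.mean, axis=1)  # one mean per row

print(list(col_means.index))  # ['math', 'reading', 'writing']
print(list(row_means.index))  # [0, 1, 2]
```

The index of the result shows which way the function was applied: column labels for axis=0, row labels for axis=1.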

16. Additional arguments: result_type

    The next argument we'll discuss is result_type. We'll consider only some of the values it can take. The first one is 'expand'. To understand it, let's define a function, span(), that returns a list with the minimum and the maximum value of its input. When we apply the function to the DataFrame, we get a pandas Series with the corresponding summary for each column. Notice that the list returned by span() is treated as a single value summarizing our input, even though it has two elements. Therefore, the .apply() method results in a pandas Series.

17. Additional arguments: result_type

    Specifying result_type='expand' unwraps our list, resulting in a DataFrame.

18. Additional arguments: result_type

    Adding the axis argument and setting it to 1 applies the span() function row-wise and unfolds the list for each row.
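A sketch of the row-wise case. The pandas documentation states that result_type only acts when axis=1, so this example uses axis=1 throughout; what axis=0 returns may depend on the pandas version. The scores DataFrame is a hypothetical stand-in, and the body of span() is an assumption based on the description above.

```python
import pandas as pd

# Hypothetical stand-in for the student scores (column names assumed).
scores = pd.DataFrame({
    'math': [64.0, 81.0, 100.0],
    'reading': [49.0, 36.0, 100.0],
    'writing': [16.0, 25.0, 100.0],
})

def span(x):
    # Return the minimum and maximum of the input as a list.
    return [x.min(), x.max()]

# Without result_type, each row is summarized by one list.
as_series = scores.apply(span, axis=1)

# With 'expand', each list item becomes its own column.
expanded = scores.apply(span, axis=1, result_type='expand')

print(type(as_series).__name__)  # Series
print(expanded.shape)            # (3, 2)
```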

19. Additional arguments: result_type

    The second useful value for result_type is 'broadcast'. To understand it, let's consider applying the mean() function again.

20. Additional arguments: result_type

    Specifying result_type='broadcast' results in a DataFrame of the original size where each column is filled with the corresponding output from the mean() function.
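A hedged sketch of broadcasting. It uses axis=1, since the pandas documentation describes result_type as acting only in that case, so here each row (rather than each column) is filled with its own mean. The scores DataFrame is a hypothetical stand-in, and float values are used so that the means are not forced into an integer type.

```python
import pandas as pd

# Hypothetical stand-in for the student scores (column names assumed).
scores = pd.DataFrame({
    'math': [64.0, 81.0, 100.0],
    'reading': [49.0, 36.0, 100.0],
    'writing': [16.0, 25.0, 100.0],
})

# Each row mean is repeated across that row, keeping the original shape.
broadcasted = scores.apply(lambda row: row.mean(), axis=1,
                           result_type='broadcast')

print(broadcasted.shape)          # (3, 3)
print(list(broadcasted.columns))  # ['math', 'reading', 'writing']
```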

21. More than one argument in a function

    So far, the functions we used with .apply() had only one argument.

22. More than one argument in a function

    But what if we have more arguments, including keyword arguments? For example, let's take a function that, by default, checks whether the calculated mean lies within a certain interval. If the keyword argument is set to False, it checks the opposite: whether the mean lies outside the interval.

23. Applying the function

    Let's use .apply() with our function. We get a TypeError because we didn't specify its arguments!

24. Additional arguments: args

    They can be specified in the args argument of the .apply() method. It's a sequence (documented in pandas as a tuple) containing the positional arguments for our function. Let's try it now. It works! Notice that the values should be in the same order as the function's arguments. We didn't specify the 'inside' keyword argument, so the function executes with its default value. What if we want to pass another value?

25. Additional arguments: args

    We can simply pass it as a keyword argument after args. As expected, setting it to False produces an inverted result.
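The three calls can be sketched as follows. The function name check_mean, its parameter names a and b, the interval (40, 70), and the scores DataFrame are all assumptions; only the 'inside' keyword comes from the description above.

```python
import pandas as pd

# Hypothetical stand-in for the student scores (column names assumed).
scores = pd.DataFrame({
    'math': [64.0, 81.0, 100.0],
    'reading': [49.0, 36.0, 100.0],
    'writing': [16.0, 25.0, 100.0],
})

def check_mean(x, a, b, inside=True):
    # Check whether the mean of x lies inside (or outside) [a, b].
    within = bool(a <= x.mean() <= b)
    return within if inside else not within

# Without the extra arguments, the call fails.
missing_args = False
try:
    scores.apply(check_mean)
except TypeError as err:
    missing_args = True
    print('TypeError:', err)

# Positional arguments go into args, in the function's own order.
inside = scores.apply(check_mean, args=(40, 70))

# Keyword arguments are passed directly to .apply().
outside = scores.apply(check_mean, args=(40, 70), inside=False)

print(inside.tolist())   # [False, True, True]
print(outside.tolist())  # [True, False, False]
```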

26. Let's practice!