<div class="alert alert-block alert-info">
<h2>NumPy Arrays</h2>
<ul>
<li>Lists and dictionaries have their limitations: for example, they are not well suited to statistical modelling and data manipulation. Two more advanced data structures, <b>Arrays</b> and <b>Data Frames</b>, are better suited to numerical data manipulation.</li>
<li>Python does not have built-in support for these kinds of arrays and data frames. For that, we'll work with the <b>NumPy</b> and <b>pandas</b> packages, which must first be installed on your machine and then imported into Jupyter Notebook or any other coding environment.</li>
<li>Arrays and lists share some similarities, but the differences between them determine what you can do with your data.</li>
<li>On the similarity side, both arrays and lists can 1) store data; 2) store different kinds of data (numbers, characters, etc.); 3) be indexed and iterated through. One key difference: a list can mix types freely, whereas all elements of a NumPy array share a single data type (<code>dtype</code>).</li>
<li>However, most of the arithmetic operations that can be performed on NumPy arrays cannot be performed directly on Python lists.</li>
</ul>
</div>
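The single-data-type point above can be checked directly. This is a minimal sketch, assuming NumPy is installed; the variable names are illustrative and not part of the notebook. The exact integer dtype (e.g. `int64` or `int32`) depends on the platform.

```python
import numpy as np

# A Python list can hold values of different types side by side
mixed_list = [1, 2.5, "three"]
print([type(x).__name__ for x in mixed_list])  # ['int', 'float', 'str']

# A NumPy array stores all elements with one shared dtype,
# so mixed inputs are converted to a common type
ints = np.array([4, 8, 12])
floats = np.array([4, 8.5, 12])
strings = np.array([4, "eight", 12])

print(ints.dtype)     # an integer dtype (platform-dependent)
print(floats.dtype)   # float64: the integers were upcast to floats
print(strings.dtype)  # a Unicode string dtype, e.g. <U21
print(strings)        # ['4' 'eight' '12']: the numbers became text
```

This is why arithmetic on an array is predictable: every element is known to have the same type.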

In [1]:
# Let's create a new list with numerical values and store it in the variable called myList.

myList = [4, 8, 12]
myList

[4, 8, 12]

In [2]:
# But see what happens when you try to perform some math, e.g. divide each value by 2

myList = myList/2
print(myList)

TypeError: unsupported operand type(s) for /: 'list' and 'int'

In [3]:
# One way around the problem is to loop through each item in the list and divide each item by 2

for i in myList:
    print(i/2)

# But then we also need to assemble each divided item back into a list

2.0
4.0
6.0


In [4]:
# To address the above problem, write a loop but also start with an empty list to collect the new values

myList2 = []

for i in myList:
    x = int(i/2) # / always returns a float, so int() converts the result back to an integer
    myList2.append(x)
    
myList2

[2, 4, 6]

In [5]:
# As we saw in In [2], dividing a list by a number throws an error; NumPy arrays avoid that problem.
# Unlike lists, NumPy arrays have no literal syntax of their own: we create them with the np.array() function.
# np.array() takes a list as its argument, placed inside the parentheses.
# Let's take the values of myList and create an array, but first import the numpy package to use its array functionality.
# The alias 'np' gives us a shorter way to refer to the functions in the numpy module.

import numpy as np

myArray = np.array([4, 8, 12])

myArray

array([ 4,  8, 12])

In [6]:
# Use the type() function to check what data structure myArray is

type(myArray)

# NumPy's array type is ndarray, short for n-dimensional array

numpy.ndarray
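Since the type is an n-dimensional array, every ndarray also records how many dimensions it has and how large each one is. A short sketch of those attributes on the same values as `myArray`:

```python
import numpy as np

myArray = np.array([4, 8, 12])

# Every ndarray carries metadata describing its layout
print(myArray.ndim)   # 1: number of dimensions
print(myArray.shape)  # (3,): length along each dimension
print(myArray.size)   # 3: total number of elements
print(myArray.dtype)  # the element type, e.g. int64 (platform-dependent)
```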

In [7]:
# Once we convert a Python list into a NumPy array, the division applies to every element in one expression: no loop needed, and typically faster

myArray2 = myArray/2

myArray2

array([2., 4., 6.])
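The same element-wise behaviour applies to the other arithmetic operators. Note that `/` always produces floats, as in the output above; floor division `//` keeps integers, which for these non-negative values gives the same result as the `int(i/2)` loop in In [4]. A minimal sketch:

```python
import numpy as np

myArray = np.array([4, 8, 12])

# Arithmetic operators apply to every element at once (no loop needed)
print(myArray / 2)        # true division: always floats, [2. 4. 6.]
print(myArray // 2)       # floor division: stays integer, [2 4 6]
print(myArray * 10)       # every value multiplied by 10
print(myArray + myArray)  # two same-shape arrays combine element by element
```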

In [8]:
# Like lists, arrays can be iterated over

for i in myArray2:
    print(i)

2.0
4.0
6.0


In [9]:
# Arrays may have multiple dimensions.

# myArray above is an example of a 1-dimensional array.

# We can use nested lists as 2-dimensional or n-dimensional arrays, for example:

ListAsArray = [[4, 8, 12, 16, 20, 2, 50], [5, 7, 30, 4, 1, 15, 100]]
ListAsArray

# Note that, in terms of data type, this is still a Python list (a list of lists), not an array

[[4, 8, 12, 16, 20, 2, 50], [5, 7, 30, 4, 1, 15, 100]]
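To turn the nested list into a genuine 2-d NumPy array, it can be passed to `np.array()`. A short sketch; the name `arr` is illustrative:

```python
import numpy as np

ListAsArray = [[4, 8, 12, 16, 20, 2, 50], [5, 7, 30, 4, 1, 15, 100]]

print(type(ListAsArray))  # <class 'list'>: still a plain Python list

# Converting the nested list gives a real 2-d array
arr = np.array(ListAsArray)
print(type(arr))   # <class 'numpy.ndarray'>
print(arr.ndim)    # 2
print(arr.shape)   # (2, 7): 2 rows, 7 columns
```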

<div class="alert alert-block alert-info">
<p>This is the view of the 2-d array created in the code cell above. It has two rows (running horizontally) and seven columns (running vertically). We can query and manipulate data values in this 2-d array by rows and by columns.</p><br>
<img src="array.JPG">
</div>
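Querying by rows and columns can be sketched with NumPy's `[row, column]` indexing on the same data converted to an array (the name `arr` is illustrative). Indices start at 0, and `:` selects every row or column along that axis.

```python
import numpy as np

arr = np.array([[4, 8, 12, 16, 20, 2, 50], [5, 7, 30, 4, 1, 15, 100]])

# Indexing is zero-based: arr[row, column]
print(arr[0])       # first row: [ 4  8 12 16 20  2 50]
print(arr[:, 1])    # second column, all rows: [8 7]
print(arr[1, 6])    # second row, seventh column: 100

# A single value can also be changed in place
arr[1, 6] = 99
print(arr[1])       # [ 5  7 30  4  1 15 99]
```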

In [10]:
# We can calculate the mean of the values contained in the 2-d array with NumPy's mean() function.

np.mean(ListAsArray)

19.571428571428573

In [11]:
# We can search for the smallest value in the array with NumPy's min() function.

np.min(ListAsArray)

1

In [12]:
# We can identify the biggest value with NumPy's max() function, no matter how big our data is.

np.max(ListAsArray)

100

In [13]:
# If the axis argument is set to 1, the minimum is computed across the columns of each row, giving one value per row.
# Thus we can pick out the lowest value in each row of our 2-dimensional array.

np.min(ListAsArray, axis=1)

array([2, 1])

<img src="arrayTwo.JPG">

In [14]:
# If the axis is set to 0, the minimum is computed down each column, giving one value per column.
# The code below returns the smallest value found in each column.

np.min(ListAsArray, axis=0)

array([ 4,  7, 12,  4,  1,  2, 50])

<img src="arrayThree.JPG">
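The `axis` argument works the same way for the other statistics, such as `np.mean()` from In [10]. A sketch on the same data: without `axis` we get one number for the whole array; `axis=1` gives one value per row and `axis=0` one value per column.

```python
import numpy as np

arr = np.array([[4, 8, 12, 16, 20, 2, 50], [5, 7, 30, 4, 1, 15, 100]])

# No axis: one value for the whole array
print(np.mean(arr))          # about 19.57, as in In [10]

# axis=1 collapses the columns, giving one mean per row (2 values)
row_means = np.mean(arr, axis=1)
print(row_means)             # 16.0 and about 23.14

# axis=0 collapses the rows, giving one mean per column (7 values)
col_means = np.mean(arr, axis=0)
print(col_means)             # 4.5, 7.5, 21.0, 10.0, 10.5, 8.5, 75.0
```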

<div class="alert alert-block alert-info">
<p>Although we've seen above that Python nested lists can serve the purpose of multi-dimensional arrays, Python lists are comparatively slow to process for numerical work.</p>

<p>NumPy aims to provide an array object that is up to 50x faster than traditional Python lists.</p>
</div>

In [15]:
# Let's create a 2-dimensional NumPy array using the np.array() function.

NpArray = np.array([[4, 8, 12, 16, 20, 2, 50], [5, 7, 30, 4, 1, 15, 100]])
NpArray

array([[  4,   8,  12,  16,  20,   2,  50],
       [  5,   7,  30,   4,   1,  15, 100]])
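The speed claim in the note above can be tested on your own machine. This is a rough sketch using the standard-library `timeit` module; it uses `np.arange()` and a list comprehension, neither of which appears earlier in this notebook, to build and halve the data. The measured ratio will vary with hardware, data size and operation, so treat "up to 50x" as an upper range rather than a guarantee.

```python
import timeit

import numpy as np

n = 100_000
py_list = list(range(n))
np_arr = np.arange(n)

# Halve every element: loop-based list comprehension vs. one vectorized expression
list_time = timeit.timeit(lambda: [x / 2 for x in py_list], number=20)
array_time = timeit.timeit(lambda: np_arr / 2, number=20)

print(f"list:  {list_time:.4f} s")
print(f"array: {array_time:.4f} s")
print(f"the array was roughly {list_time / array_time:.0f}x faster on this run")
```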

In [16]:
# We can concatenate, in other words join together, several NumPy arrays with np.concatenate()
# The arrays we want to combine must have the same number of dimensions and matching sizes on every axis except the one we join along

NpArray2 = np.array([[220, 10, 300, 160, 56, 2, 135], [15, 37, 300, 42, 41, 415, 12]])

NpArray3 = np.concatenate((NpArray,NpArray2))

# The resulting array now has 4 rows but is still 2-dimensional: its shape is (4, 7)

NpArray3

array([[  4,   8,  12,  16,  20,   2,  50],
       [  5,   7,  30,   4,   1,  15, 100],
       [220,  10, 300, 160,  56,   2, 135],
       [ 15,  37, 300,  42,  41, 415,  12]])
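By default `np.concatenate()` joins along `axis=0`, stacking rows as in the output above. A sketch of the alternatives: `axis=1` joins the arrays side by side, and mismatched sizes on the other axis raise a `ValueError`. The names `stacked_rows` and `side_by_side` are illustrative.

```python
import numpy as np

NpArray = np.array([[4, 8, 12, 16, 20, 2, 50], [5, 7, 30, 4, 1, 15, 100]])
NpArray2 = np.array([[220, 10, 300, 160, 56, 2, 135], [15, 37, 300, 42, 41, 415, 12]])

# Default axis=0 stacks rows: (2, 7) + (2, 7) -> (4, 7), still 2 dimensions
stacked_rows = np.concatenate((NpArray, NpArray2))
print(stacked_rows.shape, stacked_rows.ndim)  # (4, 7) 2

# axis=1 joins the arrays side by side: (2, 7) + (2, 7) -> (2, 14)
side_by_side = np.concatenate((NpArray, NpArray2), axis=1)
print(side_by_side.shape)  # (2, 14)

# Sizes must match on every axis except the one being joined
try:
    np.concatenate((NpArray, NpArray2[:, :5]))  # 7 columns vs 5 columns
except ValueError as err:
    print("ValueError:", err)
```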

In [17]:
# We can reshape arrays, i.e. change their dimensions, with the reshape() method.
# The above array has 28 elements (4 rows of 7), so we can reshape it into 2 rows of 14 elements each

NpArray4 = NpArray3.reshape(2, 14)


NpArray4

array([[  4,   8,  12,  16,  20,   2,  50,   5,   7,  30,   4,   1,  15,
        100],
       [220,  10, 300, 160,  56,   2, 135,  15,  37, 300,  42,  41, 415,
         12]])
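Any new shape works as long as its dimensions multiply to the array's size (28 here). A sketch of a few options, including `-1`, which asks NumPy to infer one dimension, and an incompatible shape that raises a `ValueError`:

```python
import numpy as np

NpArray3 = np.array([[4, 8, 12, 16, 20, 2, 50],
                     [5, 7, 30, 4, 1, 15, 100],
                     [220, 10, 300, 160, 56, 2, 135],
                     [15, 37, 300, 42, 41, 415, 12]])

# Any shape works as long as the dimensions multiply to 28
print(NpArray3.reshape(7, 4).shape)     # (7, 4)
print(NpArray3.reshape(2, 2, 7).shape)  # (2, 2, 7): a 3-d array

# -1 lets NumPy work out that dimension: 28 / 14 = 2
print(NpArray3.reshape(14, -1).shape)   # (14, 2)

# 28 cannot be split into rows of 5, so this fails
try:
    NpArray3.reshape(5, -1)
except ValueError as err:
    print("ValueError:", err)
```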

In [18]:
# We can also reshape a multidimensional array into a 1-d array; passing -1 lets NumPy infer the length (here 28)
# Converting an n-dimensional array into a 1-d array is called flattening

NpArray5 = NpArray4.reshape(-1)
NpArray5

array([  4,   8,  12,  16,  20,   2,  50,   5,   7,  30,   4,   1,  15,
       100, 220,  10, 300, 160,  56,   2, 135,  15,  37, 300,  42,  41,
       415,  12])
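NumPy also offers two dedicated methods for flattening, `flatten()` and `ravel()`. They differ in whether the result shares memory with the original: `flatten()` always returns an independent copy, while `ravel()` returns a view when it can. A sketch on the same data:

```python
import numpy as np

NpArray4 = np.array([[4, 8, 12, 16, 20, 2, 50, 5, 7, 30, 4, 1, 15, 100],
                     [220, 10, 300, 160, 56, 2, 135, 15, 37, 300, 42, 41, 415, 12]])

flat_copy = NpArray4.flatten()  # always a new, independent copy
flat_view = NpArray4.ravel()    # a view of the same data when possible

print(flat_copy.shape, flat_view.shape)  # (28,) (28,)

# Changing the copy leaves the original untouched
flat_copy[0] = -1
print(NpArray4[0, 0])  # 4

# This array is stored contiguously, so ravel() gave a view and the edit shows up
flat_view[0] = -1
print(NpArray4[0, 0])  # -1
```

If you plan to modify the flattened values without touching the original, `flatten()` is the safer choice.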

In [19]:
# We can search and filter the values of arrays by using loops and conditionals, as we did with lists
# But with NumPy arrays, we can achieve it in a more straightforward and faster way

# Let's filter the flattened NpArray5 to return only values higher than 100

Filter = NpArray5 > 100

filteredArr = NpArray5[Filter]

filteredArr

# Plain Python lists do not support this kind of boolean-mask indexing; with a list you would need a loop

array([220, 300, 160, 135, 300, 415])
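Boolean masks can be combined and reused. A sketch on the same flattened data: `&` (and) and `|` (or) join conditions, each wrapped in its own parentheses; summing a mask counts the matches because `True` counts as 1; and `np.where(condition, x, y)` keeps values where the condition is false and substitutes `x` elsewhere. The names `between` and `capped` are illustrative.

```python
import numpy as np

NpArray5 = np.array([4, 8, 12, 16, 20, 2, 50, 5, 7, 30, 4, 1, 15, 100,
                     220, 10, 300, 160, 56, 2, 135, 15, 37, 300, 42, 41, 415, 12])

# Combine conditions with & (and) or | (or); each condition needs its own parentheses
between = NpArray5[(NpArray5 > 10) & (NpArray5 < 50)]
print(between)  # [12 16 20 30 15 15 37 42 41 12]

# Count how many values pass a condition
print(np.sum(NpArray5 > 100))  # 6

# Replace every value above 100 with 100, keep the rest unchanged
capped = np.where(NpArray5 > 100, 100, NpArray5)
print(capped.max())  # 100
```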

In [20]:
# Just as we did with the nested lists above, we can compute summary statistics on NumPy arrays

# NumPy's min() function returns the smallest value in a NumPy array.

np.min(filteredArr)

135
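`min()` is one of many summary functions NumPy provides. A sketch of a few others on the same filtered values. Note that `np.std()` computes the population standard deviation by default (`ddof=0`), and `np.argmin()` returns the position of the smallest value rather than the value itself.

```python
import numpy as np

filteredArr = np.array([220, 300, 160, 135, 300, 415])

print(np.max(filteredArr))     # 415
print(np.sum(filteredArr))     # 1530
print(np.mean(filteredArr))    # 255.0
print(np.median(filteredArr))  # 260.0: average of the two middle values, 220 and 300
print(np.std(filteredArr))     # population standard deviation (ddof=0 by default)
print(np.argmin(filteredArr))  # 3: the index of the smallest value, 135
```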