# Getting started with `numpy`
Prepared by: Gregory J. Bott, Ph.D. 

This notebook provides an introduction to the `numpy` package. The content borrows heavily from multiple sources:
* *Python Data Science Handbook* by Jake VanderPlas (one of our textbooks), available at https://jakevdp.github.io/PythonDataScienceHandbook/ (accessed 12/17/2019).
* Content from Dr. Nick Freeman (https://github.com/nkfreeman/Python_Tutorials)
* SciPy 2018 Conference session: Intro to Numerical Computing with NumPy by Alex Chabot-Leclerc (https://www.youtube.com/watch?v=V0D2mhVt7NE&list=PL2fyLI4jtOlCvb6QchwsZ0xm51PP5G0wf&index=17&t=147s)

NumPy (short for <em>**Num**erical **Py**thon</em>) provides efficient storage and manipulation of numerical arrays, which makes it an integral package for scientific computing. Its origins can be traced back to two earlier array packages, Numeric and Numarray. NumPy, whose 1.0 version was released in 2006, unified these two libraries. It handles large, multidimensional arrays efficiently and offers a large number of functions for manipulating arrays and performing mathematical calculations.

## The `numpy` package
NumPy is a foundational Python library, written in Python and C, that provides fast, efficient support for multi-dimensional arrays, mathematical functions, slicing and selection, and broadcasting. It is available as a standalone package or as part of the Anaconda distribution.

In [None]:
%conda list | grep numpy

## Installing `numpy`
If `numpy` is not installed, use either of the following commands to install it:
```shell
conda install numpy
pip install numpy
```
Then import the library. Although you can use a different alias, by convention NumPy is imported as `np`.
```python
import numpy as np
```

In [None]:
import numpy as np
np.version.version

## Importing `numpy`


Users can import available packages and modules using Python's `import` statement. Two forms of import expressions are commonly used.
1. The first common import expression takes the form **import mypackage as mp**. This statement imports a package named *mypackage*, and assigns it to the alias *mp*. Suppose that *mypackage* contains the definition for a function named *myfunction*. If this were true, we would call *myfunction* using the syntax `mp.myfunction(*args)`, where `*args` is a placeholder for any function arguments.<br>

2. The second common import expression takes the form **from mypackage import myname**. This statement imports a specific name (a submodule, function, or class) from a package named *mypackage* directly into the current namespace. Since there is no package prefix, the imported name is used exactly as it is written. For example, if the submodule *mysubmodule* includes a function called *myfunction*, then after `from mypackage.mysubmodule import myfunction` we would call *myfunction* using the syntax `myfunction(*args)`, where `*args` is a placeholder for any function arguments. If we instead import the submodule itself with `from mypackage import mysubmodule`, we would call `mysubmodule.myfunction(*args)`.

<div class="alert alert-block alert-danger">
    <b>Name conflicts with <i>from - import</i> approach:</b> When using the <i>from - import</i> approach specified in item 2, it is important to make sure that the names you import from a package or module do not conflict with names defined in the importing code. For example, if we import a function called <i>myfunction</i> from a submodule named <i>mysubmodule</i>, but we also define a function named <i>myfunction</i> in the importing code, there will be a naming conflict: whichever definition runs last silently replaces the other.
</div>
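
As a small illustration of both import forms and of the name conflict described above, the sketch below uses NumPy's real `sqrt` function. The locally defined `sqrt` replacement is purely hypothetical and exists only to show the shadowing.

```python
import numpy as np          # form 1: import the package under an alias
from numpy import sqrt      # form 2: import one name directly

print(np.sqrt(16.0))        # called through the alias
print(sqrt(16.0))           # called without any prefix

# Hypothetical local function that reuses the imported name.
def sqrt(x):
    return "local sqrt called"

# The local definition now shadows the function imported with from - import.
shadowed_result = sqrt(16.0)
print(shadowed_result)

# The aliased form is unaffected by the conflict.
alias_result = np.sqrt(16.0)
print(alias_result)
```

Because the alias keeps every NumPy name behind the `np.` prefix, the *import - as* form avoids this kind of accidental replacement.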

The following code block uses the *import - as* approach to import NumPy. The alias *np* is a standard convention.

In [None]:
import numpy as np

# Why use `numpy`?
Before we look at specific details of the `numpy` package, it is important to understand its motivation. `numpy` was developed to support scientific computations via the efficient implementation of a multi-dimensional array. In addition to an efficient array implementation, NumPy also includes functions for performing operations on NumPy arrays that are optimized for computational efficiency. The following code block illustrates the substantial increase in efficiency that `numpy` provides in comparison to a standard Python list. Specifically, the example considers the task of adding two vectors of a specified size using both standard Python lists and `numpy` arrays. The time taken by the addition and the size of the resulting objects are reported for comparison purposes.

The core object in the NumPy library is the ndarray (n-dimensional array) object. In this section we'll examine data types, how to create arrays, learn about the basic attributes of arrays, how to access elements within an array, and then how to slice, reshape, concatenate, and split arrays. 

> <div class="alert alert-block alert-info">
    <b>The <i>del</i> statement:</b> The <i>del</i> statement is a Python keyword (often written with parentheses, as in <i>del(my_var)</i>) that deletes a name. For example, <i>del(my_var)</i> deletes a Python variable named <i>my_var</i>; the memory used by the object can be reclaimed once no other references to it remain. <i>del</i> can take multiple targets. For example, <i>del(my_var1, my_var2)</i> deletes the Python variables named <i>my_var1</i> and <i>my_var2</i>. If you pass a name to <i>del</i> that does not correspond to an existing Python variable, a <i>NameError</i> will be raised.
</div>
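
The behavior of `del` can be sketched with a few lines. The variable names here are illustrative only. The example also shows that deleting a name does not destroy the object while another name still refers to it.

```python
my_var = [1, 2, 3]
other_ref = my_var      # a second name bound to the same list
del my_var              # removes only the name my_var

try:
    print(my_var)       # the name no longer exists
    name_still_exists = True
except NameError as err:
    name_still_exists = False
    print("NameError:", err)

# The list itself survives because other_ref still refers to it.
print(other_ref)
```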

>The **np.arange()** function is an array creation routine that creates an instance of an ndarray with evenly spaced values. It takes the form:
```Python
numpy.arange([start, ]stop, [step, ]dtype=None)
```

In [None]:
import time
import sys
import matplotlib.pyplot as plt
 
# 10 Million
SIZE = 10000000

# Build real Python lists (range() alone returns a lazy range object)
list1 = list(range(SIZE))
list2 = list(range(SIZE))

start = time.time()

# Loop through every pair of items produced by zip() and add them
result = [(x+y) for x,y in zip(list1,list2)]
list_time = (time.time() - start)*1000
print(f"Using Python lists, the addition took {list_time:.0f} milliseconds.")
# Note: for a list, sys.getsizeof() counts only the list's references, not the integer objects
print("The size of the result object based on Python lists is",sys.getsizeof(result),"bytes.\n")

del(list1, list2, result)

nparray1 = np.arange(SIZE)
nparray2 = np.arange(SIZE)
start = time.time()

# No need to loop through every item using NumPy. Simply use the '+' operator.
result = nparray1 + nparray2
array_time = (time.time() - start)*1000
print(f"Using NumPy arrays, the addition took {array_time:.0f} milliseconds.")
print("The size of the result object based on NumPy arrays is",sys.getsizeof(result),"bytes.\n")

del(nparray1, nparray2, result)

objects = ('ndarray', 'list')
y_pos = np.arange(len(objects))
performance = [array_time, list_time]

plt.barh(y_pos, performance, align='center', alpha=.9, height=.6)
plt.yticks(y_pos, objects)
plt.xlabel('milliseconds')
plt.title('Speed (shorter is faster)')

plt.show()

In addition to demonstrating the substantial performance gains offered by NumPy, the previous code block also illustrates some of the subtle differences of working with Python lists versus NumPy arrays.

- The `time.time()` function, from the `time` module, returns the current system time in seconds. Saving the current time in a variable `start` and then computing `time.time() - start` gives the number of seconds that elapsed between the two calls. Multiplying by 1000 converts the elapsed time to milliseconds. Another option for timing a line of code within an IPython notebook is the magic command `%timeit`.


- The built-in `range()` function produces a sequence of integers that starts at zero and stops just before the argument passed to `range()`. In our example, we pass the variable `SIZE`, so the sequence is 0, 1, ..., `SIZE`-2, `SIZE`-1. In Python 3, `range()` returns a lazy `range` object rather than a list; wrapping it in `list()` creates an actual list.


- The `np.arange()` function returns a NumPy array of integers that starts at zero and stops just before the argument passed to `np.arange()`. In our example, we pass the variable `SIZE`, so the sequence stored in the NumPy array is 0, 1, ..., `SIZE`-2, `SIZE`-1.


- The `sys.getsizeof()` function, from the `sys` module, returns the size of an object in bytes. For a list, this covers only the list container (its references), not the objects it refers to, so the two reported sizes are not directly comparable.


- The `zip()` function pairs up the elements of two or more iterables (like zipping up a jacket), which allows element-wise operations to be performed on Python lists inside a loop or comprehension.


- When working with NumPy arrays, there is no need to *zip* arrays. Instead, element-wise operations are performed using standard mathematical operators.
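
The last two points can be seen on a tiny example. This is a minimal sketch with made-up values. It also prints the array's `nbytes` attribute, which reports the size of the stored data directly.

```python
import numpy as np

small_list1 = [1, 2, 3]
small_list2 = [10, 20, 30]

# Python lists: pair the elements with zip() and add them one by one.
list_sum = [x + y for x, y in zip(small_list1, small_list2)]

# NumPy arrays: the + operator works element-wise on whole arrays.
array_sum = np.array(small_list1) + np.array(small_list2)

print(list_sum)
print(array_sum)

# nbytes is the number of elements times the bytes used per element.
print(array_sum.nbytes, "bytes of array data")
```

Note that applying `+` to two Python lists concatenates them instead of adding their elements, which is why the list version needs the explicit loop.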

### Jupyter's timeit Magic Command
<div class="alert alert-block alert-info">
    <b>Jupyter's <i>timeit</i> magic command:</b> Another approach for timing operations that is <b>specific to Jupyter notebooks</b> is the <i>timeit</i> magic command. This command can be used with syntax that follows the form <b>%timeit [-n &lt;N&gt; -r &lt;R&gt; [-t|-c] -q -p &lt;P&gt; -o] statement</b>, where

<li> -n &lt;N&gt;: specifies to execute the given statement &lt;N&gt; times in a loop. If &lt;N&gt; is not provided, &lt;N&gt; is determined so as to get sufficient accuracy.</li>

<li> -r &lt;R&gt;: specifies the number of repeats &lt;R&gt;, each consisting of &lt;N&gt; loops, and take the best result. Default: 7</li>

<li> -t: specifies to use time.time to measure the time, which is the default on Unix. This function measures wall time, i.e., elapsed real time.</li>

<li> -c: specifies to use time.clock to measure the time, which is the default on Windows and measures wall time. On Unix, resource.getrusage is used instead and returns the CPU user time.</li>

<li> -p &lt;P&gt;: specifies to use a precision of &lt;P&gt; digits to display the timing result. Default: 3</li>

<li> -q: specifies quiet calculation, where no results are printed.</li>
</div>

The following block performs the timing check using the `timeit` magic command, with 5 repeats of 10 executions. Note that by performing the calculations multiple times, the `timeit` magic is able to provide estimates of the variability in computational time.

In [None]:
import matplotlib.pyplot as plt; plt.rcdefaults()
SIZE = 100000

# Build real Python lists (range() alone returns a lazy range object)
list1 = list(range(SIZE))
list2 = list(range(SIZE))

print("Time statistics for Python lists:")
%timeit -n 10 -r 5 [(x+y) for x,y in zip(list1,list2)]

del(list1,list2)

nparray1 = np.arange(SIZE)
nparray2 = np.arange(SIZE)

print("\nTime statistics for NumPy arrays (micro seconds = milli/1000):")
%timeit -n 10 -r 5 nparray1 + nparray2

del(nparray1, nparray2)

## How to find useful `numpy` objects
`numpy` contains hundreds of objects that provide powerful options for the data scientist. Use the `lookfor()` function to search by docstring keywords (all words must be present, but can be in any order). For example, if your solution requires taking the standard deviation, use `lookfor()` to display all `numpy` objects that have the keywords "standard deviation" in their docstring. Note that `np.lookfor()` was removed in NumPy 2.0; on newer versions, search the online NumPy documentation instead.

In [None]:
# Using the lookfor() function to find functions related to standard deviation (if error, run again)
import numpy as np
np.lookfor("standard deviation")

## Getting help
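Remember that help is available via the `help()` function. Also remember that you must provide the proper context: refer to an object through a name you have imported (for example, `np.linspace` rather than `linspace`).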

In [None]:
# ERROR - no context: raises NameError because linspace is not defined without the np prefix
help(linspace)

In [None]:
help(np.linspace)

# `numpy` data types
In contrast to Python lists, which can hold items of different types, every value in a `numpy` array must have the same type. Mixing types (e.g., a float among integers) results in an upcast of the whole array (e.g., to float64 rather than an integer type).
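
The following sketch shows how a data type can be inspected, requested explicitly, and converted. The array names are illustrative. The default integer type is platform dependent, so no particular value is assumed for it.

```python
import numpy as np

ints = np.array([1, 2, 3])                 # default integer type (platform dependent)
mixed = np.array([1, 2.5, 3])              # one float upcasts the whole array
small = np.array([1, 2, 3], dtype=np.int8) # request a specific type

print(ints.dtype)
print(mixed.dtype)
print(small.dtype, "uses", small.itemsize, "byte per element")

# np.iinfo reports the range an integer type can represent.
int8_info = np.iinfo(np.int8)
print(int8_info.min, int8_info.max)

# astype() returns a converted copy; the original array is unchanged.
as_float = small.astype(np.float32)
print(as_float.dtype)
```

Choosing a smaller type such as `int8` saves memory, but values outside its range cannot be stored, so the default types are usually the safer choice.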

<table style="border-collapse: collapse;border-top: 0.5pt solid ; border-bottom: 0.5pt solid ; border-left: 0.5pt solid ; border-right: 0.5pt solid ; "><colgroup><col class="tcol1 align-left"><col class="tcol2 align-left"></colgroup><thead><tr><th style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">Data Type</p></th><th style="border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">Description</p></th></tr></thead><tbody><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">
                          <span class="EmphasisFontCategoryNonProportional ">bool_</span>
                        </p></td><td style="border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">Boolean (true or false) stored as a byte</p></td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">
                          <span class="EmphasisFontCategoryNonProportional ">int_</span>
                        </p></td><td style="border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">Default integer type (same as C long; normally either <span class="EmphasisFontCategoryNonProportional ">int64</span> or <span class="EmphasisFontCategoryNonProportional ">int32</span>)</p></td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">
                          <span class="EmphasisFontCategoryNonProportional ">intc</span>
                        </p></td><td style="border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">Identical to C int (normally <span class="EmphasisFontCategoryNonProportional ">int32</span> or <span class="EmphasisFontCategoryNonProportional ">int64</span>)</p></td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">
                          <span class="EmphasisFontCategoryNonProportional ">intp</span>
                        </p></td><td style="border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">Integer used for indexing (same as C <span class="EmphasisFontCategoryNonProportional ">ssize_t</span>; normally either <span class="EmphasisFontCategoryNonProportional ">int32</span> or <span class="EmphasisFontCategoryNonProportional ">int64</span>)</p></td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">
                          <span class="EmphasisFontCategoryNonProportional ">int8</span>
                        </p></td><td style="border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">Byte (–128 to 127)</p></td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">
                          <span class="EmphasisFontCategoryNonProportional ">int16</span>
                        </p></td><td style="border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">Integer (–32768 to 32767)</p></td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">
                          <span class="EmphasisFontCategoryNonProportional ">int32</span>
                        </p></td><td style="border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">Integer (–2147483648 to 2147483647)</p></td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">
                          <span class="EmphasisFontCategoryNonProportional ">int64</span>
                        </p></td><td style="border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">Integer (–9223372036854775808 to 9223372036854775807)</p></td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">
                          <span class="EmphasisFontCategoryNonProportional ">uint8</span>
                        </p></td><td style="border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">Unsigned integer (0 to 255)</p></td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">
                          <span class="EmphasisFontCategoryNonProportional ">uint16</span>
                        </p></td><td style="border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">Unsigned integer (0 to 65535)</p></td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">
                          <span class="EmphasisFontCategoryNonProportional ">uint32</span>
                        </p></td><td style="border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">Unsigned integer (0 to 4294967295)</p></td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">
                          <span class="EmphasisFontCategoryNonProportional ">uint64</span>
                        </p></td><td style="border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">Unsigned integer (0 to 18446744073709551615)</p></td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">
                          <span class="EmphasisFontCategoryNonProportional ">float_</span>
                        </p></td><td style="border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">Shorthand for <span class="EmphasisFontCategoryNonProportional ">float64</span></p></td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">
                          <span class="EmphasisFontCategoryNonProportional ">float16</span>
                        </p></td><td style="border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">Half precision float: sign bit, 5-bit exponent, 10-bit mantissa</p></td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">
                          <span class="EmphasisFontCategoryNonProportional ">float32</span>
                        </p></td><td style="border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">Single precision float: sign bit, 8-bit exponent, 23-bit mantissa</p></td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">
                          <span class="EmphasisFontCategoryNonProportional ">float64</span>
                        </p></td><td style="border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">Double precision float: sign bit, 11-bit exponent, 52-bit mantissa</p></td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">
                          <span class="EmphasisFontCategoryNonProportional ">complex_</span>
                        </p></td><td style="border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">Shorthand for complex128</p></td></tr><tr><td style="border-right: 0.5pt solid ; border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">
                          <span class="EmphasisFontCategoryNonProportional ">complex64</span>
                        </p></td>
    <td style="border-bottom: 0.5pt solid ; text-align: left;"><p class="SimplePara">Complex number, represented by two 32-bit floats (real and imaginary components)</p></td></tr><tr><td style="border-right: 0.5pt solid ; text-align: left;"><p class="SimplePara">
                          <span class="EmphasisFontCategoryNonProportional ">complex128</span>
                        </p></td><td style="text-align: left;"><p class="SimplePara">Complex number, represented by two 64-bit floats (real and imaginary components)</p></td></tr></tbody></table>

# Creating `numpy` arrays
There are many different ways to create `numpy` arrays.

## Using np.array

In [None]:
import numpy as np
# Using a list to create a one-dimensional (1-d) array
array_1 = np.array([5, 10, 15, 20])
print(array_1)

In [None]:
# The type of array is a NumPy N-Dimensional array
type(array_1)

In [None]:
# Display data type (varies by system)
print(array_1.dtype)

In [None]:
# If any member is a float, the array will be upcast (float64, system-dependent)
array_3 = np.array([1, 2.0, 3, 4])
print(array_3.dtype)
print(array_3)

In [None]:
array_2 = np.array([1, 3, 7, 9])

print("Addition:")
print(array_1, "+", array_2)
print(array_1 + array_2)

In [None]:
# Works with all other operators (multiplication, division, exponentiation, etc.)
print(array_1, "*", array_2)
print(array_1 * array_2)

In [None]:
# Display the number of dimensions and the
#   length of the array along each dimension
print(array_1.ndim)
print(array_1.shape)

## Using np.arange()
The **np.arange()** function is a commonly used array creation routine that creates an instance of an ndarray with evenly spaced values. It takes the form:
```Python
numpy.arange([start, ]stop, [step, ]dtype=None)
```
1. **start** is the first value in the array (optional, default is 0)
2. **stop** is the end of the interval (required); the array stops before this value, so **stop** itself is not included
3. **step** defines the incremental difference between each consecutive number in the range (optional, default is 1)
4. **dtype** specifies the output array data type (optional; inferred from the other arguments by default)
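
A few short calls illustrate how the four parameters interact. This is a minimal sketch; the variable names are illustrative.

```python
import numpy as np

evens = np.arange(2, 11, 2)            # start=2, stop=11 (excluded), step=2
countdown = np.arange(10, 0, -2)       # a negative step counts down toward stop
quarters = np.arange(0, 1, 0.25)       # a float step produces a float array
as_floats = np.arange(3, dtype=float)  # dtype forces the output data type

print(evens)
print(countdown)
print(quarters)
print(as_floats)
```

In every case the stop value itself is left out of the result, which mirrors the behavior of Python's built-in `range()`.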

In [None]:
# Create an array: start = 0 (default), stop = 10, 
#   step = 1 (default)
my_array = np.arange(10)
print(my_array)

In [None]:
# Create an array of even numbers 2 to 100
my_array = np.arange(2, 101, 2)
my_array

## Creating 2-d arrays
By default, `numpy` stores array data in row-major order, and shape is given as (rows, columns). So a 2 x 4 array has two rows and four columns.

> Note the additional set of square brackets in the working example. The `array` function expects a single array-like object (such as a list of lists) as its first argument. Passing the rows as separate arguments raises an exception, because the second argument is interpreted as the data type.

Although arrays are often created by reading from disk or by calling other functions, they can also be created manually using a list of lists.

In [None]:
# ERROR: Why?
array_2 = np.array([1,2,3,4],
                   [5,6,7,8])
print(array_2)
print(f"The type of this array is {type(array_2)} and its number of dimensions is {array_2.ndim}.")
print(f"The shape of this array is {array_2.shape}")

In [None]:
# Using a list of lists to create a two-dimensional (2-d) array
array_2 = np.array([[1,2,3,4],
                    [5,6,7,8]])
print(array_2)
print(f"The type of this array is {type(array_2)} and its number of dimensions is {array_2.ndim}.")
print(f"The shape of this array is {array_2.shape}")

In [None]:
# Use arange() to create a 2D (10 x 6) array 0 to 295
array_50 = np.arange(0,300,5).reshape(10,6)
print(array_50)

## Creating arrays using functions


In [None]:
import numpy as np

### Using the zeros Function

In [None]:
# Creating a two-dimensional (2-d) array of size 4 x 5 (rows X columns)
# that is filled with zeros
np.zeros((4,5))

### Using the ones Function

In [None]:
# Creating a two-dimensional (2-d) array of size 3 x 4 (rows X columns)
# that is filled with ones
np.ones((3,4))

### Identity Matrix
When the identity matrix is the product of two square matrices, the two matrices are said to be the inverse of each other.
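
To make the inverse relationship concrete, the sketch below multiplies a small matrix by its inverse and checks that the product is the identity matrix. The 2 x 2 matrix is an arbitrary invertible example chosen for illustration, and `np.allclose` is used because floating-point results are rarely exact.

```python
import numpy as np

# An arbitrary invertible 2 x 2 matrix (its determinant is 1).
matrix = np.array([[2.0, 1.0],
                   [1.0, 1.0]])

inverse = np.linalg.inv(matrix)   # compute the matrix inverse
product = matrix @ inverse        # matrix multiplication

print(inverse)
print(product)

# Compare with the identity matrix, allowing for rounding error.
print(np.allclose(product, np.eye(2)))
```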

In [None]:
# Creating a two-dimensional 5 x 5 identity matrix
identity_matrix = np.eye(5)
random_2 = np.random.randint(5,100,size=(5,5))
print("Random 5 x 5 matrix")
print(random_2,"\n")
print("This is the identity matrix provided by the NumPy eye() method.")
print(identity_matrix)
print("\nA given matrix multiplied by an identity matrix equals itself.")

print(np.dot(random_2,identity_matrix))

### Random Number Generator

In [None]:
# Create 1-, 2-, and 3-dimensional random arrays using NumPy's random number generator.

np.random.seed(0) # use seed() to produce the same numbers each time

# Generate a one-dimensional array between 2 and 9 with six elements
random_1 = np.random.randint(2, 10, size=6)
print("One-dimensional random array:")
print(random_1)

# Generate a two-dimensional (3 x 4) array (0 to 9)
random_2 = np.random.randint(10, size=(3,4))
print("\nTwo-dimensional random array:")
print(random_2)

# Generate a three-dimensional (3 x 4 x 5) array (0 to 9)
random_3 = np.random.randint(10, size=(3,4,5))
print("\nThree-dimensional random array:")
print(random_3)

In [None]:
# Creating a two-dimensional 5 x 3 matrix filled with
# values randomly drawn from a continuous uniform distribution
# that ranges from 0.0 to 100.0
np.random.uniform(low = 0.0, high = 100.0, size = (5, 3))

In [None]:
# Creating a two-dimensional 3 x 3 matrix filled with
# values randomly drawn from a normal distribution with
# a mean of 10.0 and standard deviation of 2.0 
np.random.normal(loc = 10.0, scale = 2.0, size = (3,3))

## Using the linspace Function
Use the `linspace` function to create equally spaced elements in an array. The function commonly takes three arguments: start, stop, and num, where num is the number of evenly spaced values to create between start and stop. By default the stop value is included; pass `endpoint=False` to exclude it.
```Python
# linspace(start, stop, num)
my_array = np.linspace(1, 10, 40)
```
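
The effect of the `endpoint` argument is easiest to see with a small number of points. The sketch below also uses the optional `retstep=True` argument, which makes `linspace` return the spacing along with the array.

```python
import numpy as np

# Five values from 0 to 1, with the stop value included (the default).
with_endpoint = np.linspace(0, 1, 5)

# Five values from 0 toward 1, with the stop value excluded.
without_endpoint = np.linspace(0, 1, 5, endpoint=False)

# retstep=True also returns the spacing between consecutive values.
values, step = np.linspace(0, 1, 5, retstep=True)

print(with_endpoint)
print(without_endpoint)
print(step)
```

Unlike `np.arange()`, where you choose the step, `np.linspace()` lets you choose how many values you want and works out the step for you.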

In [None]:
my_array = np.linspace(1, 10, 40, endpoint=False)
print(my_array)

# `numpy` Array attributes
In this section we'll discuss useful array attributes.

In [None]:
import numpy as np
array_1 = np.array([True, False, False, True, True])

print(array_1, "\n")

# Display the number of dimensions, size, shape, number of bytes
print(f"Shape = {array_1.shape}")
print(f"Dimensions = {array_1.ndim}")
print(f"Size = {array_1.size}")
print(f"Item size = {array_1.itemsize}")
print(f"Bytes = {array_1.nbytes}")

In [None]:
# Mixing strings and numbers in one array coerces every element to a string
exam_scores = np.array([["exam1","exam2","exam3"],[99.5,100,67]])
print(exam_scores,"\n")
print(f"Shape = {exam_scores.shape}")
print(f"Dimensions = {exam_scores.ndim}")
print(f"Size = {exam_scores.size}")
print(f"Item size = {exam_scores.itemsize}")
print(f"Bytes = {exam_scores.nbytes}")

## Array indexing: Accessing single elements

Arrays are zero-based. 

In [None]:
my_array = np.array([7,2,4,6,8,100,])

# arrays are zero-based
print(f"The first element is {my_array[0]}")


Like lists, dictionaries, and arrays in Python, NumPy arrays are mutable.

In [None]:
# Change my_array[5] to 10
print(f"Current value of my_array[5] = {my_array[5]}")
my_array[5] = 10
print(f"New value of my_array[5] = {my_array[5]}")

### Type Coercion
Beware of type coercion--assigning a float to an element of an integer array truncates the decimal part. The float 2.5 becomes the integer 2.
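
One way to avoid losing the decimal part is to convert the array to a float type before assigning. The sketch below contrasts the two outcomes; the array names are illustrative.

```python
import numpy as np

int_array = np.array([0, 0, 1])

# Assigning into the integer array keeps only the integer part.
truncated = int_array.copy()
truncated[1] = 2.5

# Converting to float first preserves the full value.
float_array = int_array.astype(float)
float_array[1] = 2.5

print(truncated)
print(float_array)
```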

In [None]:
import numpy as np
a = np.array([0,0,1])
a[1] = 2.5
print(a)

In [None]:
# Assigning a float to an element of the integer array my_array stores only the integer part (3)
my_array[0] = 3.141
print(my_array)

### Deleting Portions of Arrays

In [None]:
# Create an array 0 to 59 and shape it into 6 rows by 10 columns
a = np.arange(0,60).reshape(6,10)
print(a)
print()
# Delete the last four columns (6 through 9) (see Array Slicing)
# axis=1 removes whole columns, so the result is already 6 x 6
a = np.delete(a, np.s_[6:], axis=1)
print(a)

## Array slicing: Accessing subarrays
```Python
var[lower:upper:step]
```
Slicing extracts a subset of an array by specifying a lower bound, an upper bound, and (optionally) a step value. As with the `range()` and `np.arange()` functions, the lower-bound element is included in the result, but the upper-bound element is not.
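
The sketch below applies the `lower:upper:step` form to a small array. It also illustrates a point that goes beyond the text above: a NumPy slice is a view of the original data, not a copy, so changing the slice changes the original array unless you call `.copy()`.

```python
import numpy as np

numbers = np.arange(10)

middle = numbers[2:8:2]            # indices 2, 4, 6 (8 is excluded)
print(middle)
print(numbers[::-1])               # a negative step reverses the array

# A slice is a view: writing to it modifies the original array.
view = numbers[:3]
view[0] = 99
print(numbers)

# Use copy() when an independent subarray is needed.
independent = numbers[:3].copy()
independent[1] = -1
print(numbers)                     # numbers[1] is unchanged
```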

In [None]:
# Print slice of first five elements (0 - 4)
my_array = np.array([0,1,2,3,4,5,6,7,8,9])
new_array = my_array[0:5] # The element at index 5 is not included
print(new_array)

In [None]:
# Access the last element with a negative index
my_array[-1]

In [None]:
# Display the last two elements
my_array[-2:]

In [None]:
# Display every other element
# ::step
my_array[::2]

### Slicing 2D Array

In [None]:
# 2D array from Deletion example
print(a)
print("Reshape array into 6 x 6")
a = a.reshape(6,6)
print(a)

![image](images_numpy/array6x6.jpg)

In [None]:
# First row = 0, show elements 3 up to 5, but not including 5
yellow = a[0, 3:5]
print("yellow =", yellow)

In [None]:
# include all rows (":"), limit to column 2
red = a[:, 2]
print("red = ", red)

In [None]:
# start at row 2 and step 2
# then start at column 0 and step 2
dark_blue = a[2::2, ::2]
print("dark blue = \n", dark_blue)

In [None]:
# start at row 4 and column 4 and go to the end of each
blue = a[4:, 4:]
print("blue =", blue)


## Transposing Arrays
Use T to transpose arrays. Transpose swaps the order of the axes.
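
A small example makes the axis swap easier to follow. In this sketch the arrays are illustrative: a 2 x 3 array becomes 3 x 2, and for a three-dimensional array the order of all axes is reversed.

```python
import numpy as np

table = np.arange(6).reshape(2, 3)   # 2 rows, 3 columns
print(table)
print(table.T)                       # 3 rows, 2 columns

# The element at [row, column] moves to [column, row].
print(table[1, 2], table.T[2, 1])

# For more than two dimensions, T reverses the order of the axes.
cube = np.zeros((3, 4, 5))
print(cube.T.shape)
```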

In [None]:
# original array
print(random_3.shape)
print(random_3)

#Transpose array
print(random_3.T.shape)
print(random_3.T)

## Array Concatenation and Splitting

In [None]:
# Use np.concatenate to join arrays
x = np.array([1, 2, 3]) 
y = np.array([3, 2, 1]) 
np.concatenate([x, y])

In [None]:
# np.concatenate enables you to join more than two arrays
z = [99, 99, 99] 
print(np.concatenate([x, y, z]))

In [None]:
grid = np.array([[1, 2, 3], [4, 5, 6]])
# concatenate along the first axis 
np.concatenate([grid, grid])

In [None]:
# concatenate along the second axis (zero-indexed) 
np.concatenate([grid, grid], axis=1)

### Joining arrays with mixed dimensions

You may find it more straightforward to use vstack and hstack to join arrays.

In [None]:
x = np.array([1, 2, 3]) 
grid = np.array([[9, 8, 7], [6, 5, 4]])
print(grid)
print(x.shape)
print(grid.shape)

In [None]:
# vertically stack the arrays 
np.vstack([x, grid])

In [None]:
# horizontally stack the arrays 
y = np.array([[99], [99]])
np.hstack([grid, y])

## Splitting Arrays
Use np.split, np.vsplit, and np.hsplit to split an array. For each function, pass a list of indices giving the split points.

In [None]:
x = [1, 2, 3, 99, 99, 3, 2, 1] 
x1, x2, x3 = np.split(x, [2, 6]) 
print(x1, x2, x3)

Notice that N split points lead to N + 1 subarrays. The related functions np.hsplit and np.vsplit are similar.

In [None]:
grid = np.arange(16).reshape((4, 4)) 
grid

In [None]:
upper, lower = np.vsplit(grid, [2]) 
print(upper) 
print()
print(lower)

In [None]:
print(grid)
print()
left, right = np.hsplit(grid, [1]) 
print(left) 
print()
print(right)

In [None]:
# Display documentation for a specific ufunc (e.g., sine)
np.info(np.sin)

## Motivation for Ufuncs

In [None]:
# Example: Compute the reciprocal of each value in the array using a non-vectorized approach
import numpy as np 
np.random.seed(0)

def compute_reciprocals(values): 
    output = np.empty(len(values)) 
    for i in range(len(values)): 
        output[i] = 1.0 / values[i] #Python examines the object type at each loop (SLOW!)
    return output

values = np.random.randint(1, 10, size=5) 
compute_reciprocals(values)

In [None]:
# Benchmark the non-vectorized operation
big_array = np.random.randint(1, 100, size=1000000) 
%timeit compute_reciprocals(big_array)

NumPy provides a convenient interface to statically typed, compiled routines, which avoids the slowness caused by type-checking on each pass through a loop.

These routines are known as *vectorized* operations. A vectorized routine or function works not just on a single value, but on a whole vector of values at once.

Computations that use vectorization through ufuncs are nearly always more efficient than their counterparts implemented with Python loops, especially as the arrays grow in size.
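
Outside a notebook, where the `%timeit` magic is not available, the same comparison can be sketched with Python's standard `timeit` module. The function name `loop_reciprocals` and the array size are illustrative choices, and the measured times will vary from machine to machine.

```python
import timeit
import numpy as np

sample_values = np.arange(1, 100_001)

def loop_reciprocals(values):
    """Compute reciprocals one element at a time (non-vectorized)."""
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

# Both approaches produce the same numbers.
loop_result = loop_reciprocals(sample_values)
vectorized_result = 1.0 / sample_values
print(np.allclose(loop_result, vectorized_result))

# Time three runs of each approach.
loop_seconds = timeit.timeit(lambda: loop_reciprocals(sample_values), number=3)
vector_seconds = timeit.timeit(lambda: 1.0 / sample_values, number=3)
print(f"Loop:       {loop_seconds:.4f} seconds")
print(f"Vectorized: {vector_seconds:.4f} seconds")
```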

In [None]:
print(compute_reciprocals(values)) 
print(1.0 / values)

In [None]:
%timeit (1.0 / big_array)

## Array arithmetic
NumPy’s ufuncs feel very natural to use because they make use of Python’s native arithmetic operators. The standard addition, subtraction, multiplication, and division can all be used:
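
Each arithmetic operator is a wrapper around a named ufunc, so the operator form and the function form give the same result. A minimal sketch of a few of these pairs:

```python
import numpy as np

x = np.arange(4)

# Each pair compares the operator with its equivalent ufunc.
add_matches = np.array_equal(x + 5, np.add(x, 5))
multiply_matches = np.array_equal(x * 2, np.multiply(x, 2))
power_matches = np.array_equal(x ** 2, np.power(x, 2))
negative_matches = np.array_equal(-x, np.negative(x))
floor_matches = np.array_equal(x // 2, np.floor_divide(x, 2))

print(add_matches, multiply_matches, power_matches,
      negative_matches, floor_matches)
```

The function form is useful when you need the extra options a ufunc accepts, such as writing the result into an existing array with the `out` argument.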

In [None]:
x = np.arange(4) 
print(x)

In [None]:
print("x=", x)
print("x + 5 =", x + 5) 
print("x - 5 =", x - 5) 
print("x * 2 =", x * 2)
print("x / 2 =", x / 2) 
print("x // 2 =", x // 2) # floor division
print("-x= ", -x)
print("x ** 2 = ", x ** 2) 
print("x % 2 = ", x % 2)

## Ufunc trigonometry example - Sine

In [None]:
import matplotlib.pyplot as plt

# Sine of 0, 30, 45, 60, and 90 degrees (converted to radians)
print(np.sin(np.array((0., 30., 45., 60., 90.)) * np.pi / 180. ))

x = np.linspace(-np.pi, np.pi, 201)
plt.plot(x, np.sin(x))
plt.xlabel('Angle [rad]')
plt.ylabel('sin(x)')
plt.axis('tight')
plt.show()

In [None]:
# You could use a simple selection sort
import numpy as np
def selection_sort(x): 
    for i in range(len(x)): 
        swap = i + np.argmin(x[i:]) 
        (x[i], x[swap]) = (x[swap], x[i])
    return x

In [None]:
x = np.array([2, 1, 4, 3, 5]) 
selection_sort(x)

In [None]:
x = np.array([2, 1, 4, 3, 5]) 
np.sort(x)

# Using `np.polyfit`
`np.polyfit` performs a least squares polynomial fit; that is, it returns a vector of coefficients (highest power first) that best fits the given data by minimizing the squared error.
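
The order of the returned coefficients is easy to check on data that lie exactly on a known line. The sketch below uses made-up points on y = 2x + 1, so a degree-1 fit should recover a slope of 2 and an intercept of 1, up to rounding error.

```python
import numpy as np

# Made-up data that lie exactly on the line y = 2x + 1.
x_exact = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_exact = 2.0 * x_exact + 1.0

# For degree 1, polyfit returns [slope, intercept] (highest power first).
slope, intercept = np.polyfit(x_exact, y_exact, 1)
print(slope, intercept)

# np.poly1d turns the coefficients into a callable polynomial.
line = np.poly1d([slope, intercept])
prediction = line(5.0)
print(prediction)
```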

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


msba_students = {'hours_studied': [24,49,23,19,30,22,47,40,34,32,16,20,17,41,25,34,14,47,41,23,8,22,11,41,42,24,48,44,21,21,10,14,11,19,51],
            'test_score':         [48.98,91.13,34.73,31.58,54.23,35.73,84.93,76.68,58.28,62.38,25.43,33.73,33.63,70.63,48.03,68.48,19.28,87.03,72.63,37.83,15.13,37.83,16.28,73.68,81.88,47.98,97.18,78.78,35.68,34.68,7.08,19.28,19.28,30.63,97.23]}

df_student_data = pd.DataFrame(data=msba_students)

In [None]:
df_student_data

In [None]:
x = df_student_data['hours_studied']
y = df_student_data['test_score']

In [None]:
plt.scatter(x,y)

In [None]:
model = np.polyfit(x, y, 1)

In [None]:
print("np.polyfit result: ",model)
print(model[0], ' is m (slope) ', model[1], ' is b (intercept).')

In [None]:
# To fit a regression line to the data

x_predict = np.arange(0, 51) # Predict for 0 hours to 50 hours

# Equation for a line (y = mx + b)
y_predict = x_predict * model[0] + model[1]


#  This is equivalent to the previous line (y = mx + b)
#  You can comment out the previous line and uncomment these two lines
#   and the graph will remain the same.
#predict = np.poly1d(model)
#y_predict = predict(x_predict)

plt.scatter(x, y)
plt.plot(x_predict, y_predict, c = 'r')
