
<p><img align="left" src="https://www.cqf.com/themes/custom/creode/logo.svg" style="vertical-align: top; padding-top: 23px;" width="10%"/>
<img align="right" src="https://upload.wikimedia.org/wikipedia/commons/c/c3/Python-logo-notext.svg" style="vertical-align: middle;" width="12%"/>
<font color="#306998"><h1><center>Python Labs</center></h1></font></p>
<p></p><h1><center>Introduction to NumPy</center></h1>
<center><h3>Kannan Singaravelu</h3></center>
<center>kannan.singaravelu@fitchlearning.com</center>



<h2 id="Numerical-Python">Numerical Python<a class="anchor-link" href="#Numerical-Python">¶</a></h2><p>Python's standard library offers only basic tools for mathematical and scientific computation, so we rely on third-party libraries for analysis. NumPy is one of the most important of these and underpins data analysis, machine learning and scientific computing in Python. It is a core Python library and a fundamental building block of packages such as Scikit-Learn, SciPy, Pandas and TensorFlow.</p>
<p>More than 4900 packages have NumPy as a dependency. It is fair to say that NumPy is one of the main reasons for the success of machine learning in Python.</p>
<h3 id="NumPy">NumPy<a class="anchor-link" href="#NumPy">¶</a></h3><p>NumPy's features fall into three broad groups:</p>
<ul>
<li>mathematical functions</li>
<li><code>random</code> submodule</li>
<li><code>ndarray</code> object</li>
</ul>
<p>Datasets come from a wide range of sources and formats, but NumPy encourages us to think of all data fundamentally as arrays of numbers. The first step in data preprocessing is therefore to make the data analyzable by transforming it into arrays of numbers.</p>
<p>NumPy is known for its:</p>
<p><strong>Syntax</strong>: a compact, vectorized syntax that can express, for example, 100,000 calculations in a single line of code.<br/>
<strong>Speed</strong>: fast execution, because most of its core is implemented in C.</p>
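<p>Here is a minimal illustration of the syntax and speed points above. The sketch squares 100,000 numbers twice: once with a Python loop and once with a single vectorized expression. Both give the same result, but the vectorized version runs inside NumPy's compiled code. The variable names are illustrative.</p>

```python
import numpy as np

# 100,000 integers; int64 is set explicitly so squaring cannot overflow
values = np.arange(100_000, dtype=np.int64)

# Pure-Python loop: one interpreted multiplication per element
loop_result = [v * v for v in values]

# Vectorized: a single expression, executed element-wise in compiled code
vec_result = values ** 2

# Both approaches produce the same numbers
print(np.array_equal(loop_result, vec_result))  # True
```

In a notebook, prefixing each approach with `%timeit` is one way to compare their run times on your machine.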



<p><em>Note: To run all of the code cells in this example, select <strong>Run All</strong> from the <strong>Cell</strong> menu.</em></p>



<h2 id="The-Basics">The Basics<a class="anchor-link" href="#The-Basics">¶</a></h2><p>Data manipulation in Python is almost always equated with NumPy array manipulation. NumPy arrays are <strong>homogeneous</strong>: all of their elements are of the <strong>same type</strong>. In NumPy, dimensions are called <em>axes</em>.</p>
<p><br/></p><center><strong> A NumPy array generalizes a matrix to any number of dimensions </strong></center><br/>
$$\begin{bmatrix}
    x_{11} &amp; x_{12} &amp; x_{13} &amp; \dots &amp; x_{1n} \\
    x_{21} &amp; x_{22} &amp; x_{23} &amp; \dots &amp; x_{2n} \\
    \vdots &amp; \vdots &amp; \vdots &amp; \ddots &amp; \vdots \\
    x_{d1} &amp; x_{d2} &amp; x_{d3} &amp; \dots &amp; x_{dn}
\end{bmatrix}$$
<br/><center>A 2D array (matrix) with d rows and n columns</center>
<p><br/></p><center><strong> 1D array </strong></center><br/>
$$\begin{bmatrix}
    x_{1} &amp; x_{2} &amp; x_{3}
\end{bmatrix}$$
<p><br/></p><center>The above array has 1 axis with 3 elements in it, so we say it has a length of 3.</center>
<p><br/></p><center><strong> 2D array </strong></center><br/>
$$\begin{bmatrix}
    x_{11} &amp; x_{12} &amp; x_{13} \\
    x_{21} &amp; x_{22} &amp; x_{23}
\end{bmatrix}$$
<p><br/></p><center>The above array has 2 axes. The first axis has a length of 2 and the second axis has a length of 3.</center>
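<p>You can check the axis counts and lengths described above with an array's <code>ndim</code> and <code>shape</code> attributes. In this minimal sketch the element values are placeholders; only the layout matters.</p>

```python
import numpy as np

# 1D array: one axis holding 3 elements
a1 = np.array([1, 2, 3])
print(a1.ndim, a1.shape)   # 1 (3,)

# 2D array: the first axis has length 2, the second axis has length 3
a2 = np.array([[1, 2, 3],
               [4, 5, 6]])
print(a2.ndim, a2.shape)   # 2 (2, 3)
```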


<h3 id="Installation">Installation<a class="anchor-link" href="#Installation">¶</a></h3>



<p>We'll install the required libraries that we'll use in this example.</p>


In [None]:

# Install the NumPy library (uncomment the next line to run it)
# ! pip install numpy




<h3 id="Importing">Importing<a class="anchor-link" href="#Importing">¶</a></h3><p>We'll import the required libraries that we'll use in this example.</p>


In [None]:

# Import required libraries
import numpy as np



In [None]:

# Check the version
np.__version__




<h3 id="Builtin-Documentation">Builtin Documentation<a class="anchor-link" href="#Builtin-Documentation">¶</a></h3><p>One of the most useful features of the Jupyter platform is its efficient help and documentation search, which shortens the learning curve. The following are the various ways to access a <em>docstring</em> and related information.</p>
<ul>
<li>accessing documentation with <code>?</code><br/><br/></li>
<li>accessing source code with <code>??</code><br/><br/></li>
<li>exploring modules with Tab-Completion</li>
</ul>
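<p>The <code>?</code>, <code>??</code> and Tab shortcuts are IPython/Jupyter features. Outside a notebook, you can get roughly the same information in plain Python (this is our own illustration, not part of the original lab). The standard <code>inspect.getdoc</code> function returns a docstring, and the built-in <code>dir()</code> lists attribute names, which is roughly what Tab-completion searches.</p>

```python
import inspect
import numpy as np

# Illustrative plain-Python counterpart of `np.linspace?`:
# retrieve the cleaned-up docstring and show its summary line
doc = inspect.getdoc(np.linspace)
print(doc.splitlines()[0])

# Illustrative counterpart of typing `np.lin` + Tab:
# list the NumPy attribute names that start with "lin"
matches = [name for name in dir(np) if name.startswith("lin")]
print(matches)
```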



<h2 id="Array-Creation">Array Creation<a class="anchor-link" href="#Array-Creation">¶</a></h2><p>We'll create arrays both from Python lists and from scratch using built-in NumPy routines. NumPy arrays are also called <code>ndarray</code>s (N-dimensional arrays). An <code>ndarray</code> is a multidimensional container of items of the same type and size. First, we can use <code>np.array</code> to create arrays from lists.</p>
<div class="highlight"><pre><span></span><span class="c1"># Creating an Array</span>
<span class="n">In</span> <span class="p">[</span> <span class="p">]</span> <span class="p">:</span>  <span class="n">np</span><span class="o">.</span><span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span><span class="mi">2</span><span class="p">,</span><span class="mi">3</span><span class="p">,</span><span class="mi">4</span><span class="p">,</span><span class="mi">5</span><span class="p">])</span>
<span class="n">Out</span><span class="p">[</span> <span class="p">]</span> <span class="p">:</span>  <span class="n">array</span><span class="p">([</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">])</span>
</pre></div>



<h4 id="Arrays-from-list">Arrays from list<a class="anchor-link" href="#Arrays-from-list">¶</a></h4>


In [None]:

# Array of integers
np.array([1,2,3,4,5])



In [None]:

# NumPy upcasts the integers to floats so that all elements share one type
np.array([1,2,3.5,4,5])



In [None]:

# Explicit specification of datatype
np.array([1, 2, 3, 4, 5], dtype='float32')



In [None]:

# Create a matrix (2D array) from a nested list
np.array([[1,2],[3,4],[5,6]])
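<p>This short sketch confirms the type rules described above by inspecting the <code>dtype</code> and <code>shape</code> of arrays built from lists. The default integer width depends on the platform, so the sketch only checks that the first array holds integers.</p>

```python
import numpy as np

ints = np.array([1, 2, 3, 4, 5])
mixed = np.array([1, 2, 3.5, 4, 5])               # one float forces an upcast
explicit = np.array([1, 2, 3, 4, 5], dtype='float32')
matrix = np.array([[1, 2], [3, 4], [5, 6]])       # nested lists become a 2D array

print(ints.dtype, mixed.dtype, explicit.dtype)
print(mixed)         # every element is now a float
print(matrix.shape)  # (3, 2): 3 rows, 2 columns
```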




<h4 id="Arrays-from-scratch">Arrays from scratch<a class="anchor-link" href="#Arrays-from-scratch">¶</a></h4><p>For large arrays, it is usually more efficient to create them from scratch with NumPy than to build them from Python lists. NumPy provides functions such as <code>ones()</code>, <code>zeros()</code> and <code>random.random()</code> to initialize the values of an array.</p>


In [None]:

# Create an uninitialized array of length 5 (its values are whatever was already in memory)
np.empty(5)



In [None]:

# Create a length-5 integer array filled with zeros
np.zeros(5, dtype=int)



In [None]:

# Create a length-5 integer array filled with ones
np.ones(5, dtype=int)



In [None]:

# Create a 3x5 array filled with ones
np.ones((3,5))



In [None]:

# Create a 3x5 array filled with 5
np.full((3,5), 5)



In [None]:

# Creating a linear sequence array
np.arange(0,10,2) # start, stop (exclusive), step



In [None]:

# Create 5 evenly spaced values from 0 to 1 (both endpoints included)
np.linspace(0,1,5)



In [None]:

# Create an array of 3 uniformly distributed random numbers in [0, 1)
np.random.random(3)



In [None]:

# Create a 3x3 array of uniformly distributed random numbers in [0, 1)
np.random.random((3,3))



In [None]:

# Create a 3x3 array of normally distributed random numbers (mean 0, standard deviation 1)
np.random.normal(0,1,(3,3))



In [None]:

# Create a 3x3 identity matrix
np.eye(3)
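<p>Two details of the routines above are easy to mix up: <code>np.arange</code> excludes its stop value, while <code>np.linspace</code> includes both endpoints by default. This minimal sketch checks both, along with the shapes produced by <code>np.full</code> and <code>np.eye</code>.</p>

```python
import numpy as np

steps = np.arange(0, 10, 2)      # the stop value 10 is excluded
grid = np.linspace(0, 1, 5)      # 5 points, with both 0 and 1 included

print(steps)   # [0 2 4 6 8]
print(grid)    # values 0, 0.25, 0.5, 0.75 and 1.0

fives = np.full((3, 5), 5)
identity = np.eye(3)
print(fives.shape, identity.shape)   # (3, 5) (3, 3)
```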




<h2 id="Array-Attributes">Array Attributes<a class="anchor-link" href="#Array-Attributes">¶</a></h2><p>The following table lists the important attributes of an <code>ndarray</code> object:</p>
<table>
<thead><tr>
<th>Attributes</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>ndarray.ndim</code></td>
<td>Number of axes (dimensions) of the array</td>
</tr>
<tr>
<td><code>ndarray.shape</code></td>
<td>Tuple giving the length of the array along each axis</td>
</tr>
<tr>
<td><code>ndarray.size</code></td>
<td>Number of elements of the array</td>
</tr>
<tr>
<td><code>ndarray.dtype</code></td>
<td>Type of the elements in the array</td>
</tr>
<tr>
<td><code>ndarray.itemsize</code></td>
<td>Size in bytes of each element of the array</td>
</tr>
<tr>
<td><code>ndarray.nbytes</code></td>
<td>Total size in bytes of the array</td>
</tr>
</tbody>
</table>
<p>We'll now use NumPy's random number generator to create three random integer arrays (1D, 2D and 3D) and determine the above attributes for them.</p>


In [None]:

# Seed for reproducibility
np.random.seed(1)

# 1D array of 5 random integers from 0 to 9
x1 = np.random.randint(10, size=5)

# 2D array with 2 rows and 3 columns
x2 = np.random.randint(10, size=(2, 3))

# 3D array with 2 layers of 3 rows and 5 columns
x3 = np.random.randint(10, size=(2, 3, 5))



In [None]:

x1



In [None]:

x2



In [None]:

x3



In [None]:

# Print out the array attributes
print("The Array Attributes are:")
print("-"*25)
print(f"x3 ndim:      {x3.ndim}")
print(f"x3 shape:     {x3.shape}")
print(f"x3 size:      {x3.size}")
print(f"x3 dtype:     {x3.dtype}")
print(f"x3 itemsize:  {x3.itemsize}")
print(f"x3 nbytes:    {x3.nbytes}")




<p>The dimensions of a 3D array are described by the number of layers it contains and the number of rows and columns in each layer. All layers must have the same number of rows and columns. For an array with l layers, n rows and m columns, the shape is <code>(l,n,m)</code>.</p>
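<p>This sketch checks the layer, row and column reading of a 3D shape on an array of zeros. Zeros keep the result deterministic. It also verifies that <code>nbytes</code> equals <code>size</code> multiplied by <code>itemsize</code>. The name <code>arr3d</code> is illustrative.</p>

```python
import numpy as np

# 2 layers, each with 3 rows and 5 columns, stored as 8-byte integers
arr3d = np.zeros((2, 3, 5), dtype=np.int64)

print(arr3d.ndim)      # 3
print(arr3d.shape)     # (2, 3, 5)
print(arr3d.size)      # 2 * 3 * 5 = 30 elements
print(arr3d.nbytes)    # 30 elements * 8 bytes each = 240
print(arr3d[0].shape)  # a single layer: (3, 5)
```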



<h2 id="Array-Manipulation">Array Manipulation<a class="anchor-link" href="#Array-Manipulation">¶</a></h2><p>We'll cover a few categories of basic array manipulations. These include <code>indexing</code>, <code>slicing</code>, <code>reshaping</code> and <code>joining &amp; splitting</code> of arrays.<br/><br/></p>
<center><img src="https://www.oreilly.com/library/view/python-for-data/9781449323592/httpatomoreillycomsourceoreillyimages2172112.png" style="width:400px; height:300px"/></center>
<center>A two-dimensional representation of indexing elements in a NumPy array</center>



<h3 id="Array-Indexing">Array Indexing<a class="anchor-link" href="#Array-Indexing">¶</a></h3><p>Array indexing is used to access single elements of an array. As with Python lists, indexing is zero-based, and 1D arrays can be indexed much like lists.</p>


In [None]:

x1



In [None]:

# Access first element 
x1[0]




<p>To index from the end of the array, we can use negative indices.</p>


In [None]:

# Access last element using negative indexing
x1[-1]




<p>In a multidimensional array, we use a comma-separated tuple of indices to access an element.</p>


In [None]:

x2



In [None]:

# Access first element in first row
x2[0,0]



In [None]:

# Access second element in first row
x2[0,1]



In [None]:

# Access last element in last row
x2[-1,-1]




<h3 id="Array-Slicing">Array Slicing<a class="anchor-link" href="#Array-Slicing">¶</a></h3><p>Array slicing is used to access subarrays. We slice an array using the colon (<code>:</code>) character, following the same syntax as standard Python list slicing.</p>
<div class="highlight"><pre><span></span><span class="n">x</span><span class="p">[</span><span class="n">start</span><span class="p">:</span><span class="n">stop</span><span class="p">:</span><span class="n">step</span><span class="p">]</span>
</pre></div>
<p>The default values are <code>start=0</code>, <code>stop=</code><em><code>size of dimension</code></em>, <code>step=1</code>. We'll now access subarrays in one dimension and in multiple dimensions.</p>


In [None]:

x = np.arange(10)
x



In [None]:

x[:5]  # first five elements



In [None]:

x[5:7]  # middle subarray



In [None]:

x[::2]  # every other element



In [None]:

x[1::2]  # every other element, starting at index 1




<p>Slicing using a negative step value.</p>


In [None]:

x[::-1]  # all elements, reversed



In [None]:

x[::-2]  # every other element in reverse 
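<p>You can check the slicing rules above against an ordinary Python list, which uses the same <code>start:stop:step</code> semantics. With a negative step, the defaults swap: the slice starts at the last element and runs towards the beginning. This is a minimal sketch; <code>lst</code> is an illustrative name.</p>

```python
import numpy as np

x = np.arange(10)
lst = list(range(10))

# NumPy slices select the same elements as list slices
print(x[1::2])     # [1 3 5 7 9]
print(lst[1::2])   # [1, 3, 5, 7, 9]

# Negative step: start defaults to the last element
print(x[::-2])     # [9 7 5 3 1]
print(x[7:2:-1])   # [7 6 5 4 3]: the stop index 2 is excluded
```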




<p>Multidimensional slices work in the same way, with one slice per axis separated by commas.</p>


In [None]:

x2



In [None]:

# Accessing first row
x2[0,:]



In [None]:

# Accessing first column
x2[:,0]



In [None]:

x2[:2, :3] # first two rows, first three columns



In [None]:

x2[:, ::2] #all rows, every other column



In [None]:

x2[::-1, ::-1] # reverse both the rows and the columns
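<p>For a 2D array, each comma-separated slice applies to one axis. This sketch uses a fixed 3x4 array, named <code>m</code> for illustration, so the selected rows, columns and shapes are predictable.</p>

```python
import numpy as np

m = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

print(m[0, :])           # first row: [0 1 2 3]
print(m[:, 0])           # first column: [0 4 8]
print(m[:2, :3].shape)   # (2, 3): first two rows, first three columns
print(m[:, ::2])         # every other column
print(m[::-1, ::-1])     # rows and columns both reversed
```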




<blockquote><p><em>Unlike Python list slices, array slices return views rather than copies of the array data, so modifying a slice also modifies the original array. Use the <code>copy()</code> method to create an independent copy of an array.</em></p>
</blockquote>


In [None]:

# Create a subarray or slice of original array
x2_sub = x2[:2,:2]
print(x2_sub)



In [None]:

# Change first element to 10
x2_sub[0,0] = 10
print(x2_sub)



In [None]:

# The original array x2 has been modified as well
x2
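<p>To make the view behaviour concrete, this sketch modifies both a slice and a copy of the same array. It then uses <code>np.shares_memory</code> to check which one still refers to the original data. The variable names are illustrative.</p>

```python
import numpy as np

base = np.arange(6).reshape(2, 3)

view = base[:2, :2]                 # slice: a view on base's memory
independent = base[:2, :2].copy()   # explicit copy: new memory

view[0, 0] = 99          # writes through to base
independent[1, 1] = -1   # leaves base untouched

print(base)
print(np.shares_memory(base, view))         # True
print(np.shares_memory(base, independent))  # False
```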




<h3 id="Reshaping-Arrays">Reshaping Arrays<a class="anchor-link" href="#Reshaping-Arrays">¶</a></h3><p>Changing the shape of an array is done using the <code>reshape()</code> method. The new shape must contain the same number of elements as the original.</p>


In [None]:

x4 = np.arange(1,10)
x4



In [None]:

x5 = x4.reshape((3,3))
x5



In [None]:

x6 = np.array([1,2,3]) # 1D array with shape (3,)
x6



In [None]:

# Row vector: 2D array with shape (1, 3)
x6.reshape(1,3)



In [None]:

# Column vector: 2D array with shape (3, 1)
x6.reshape(3,1)
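<p>A reshape must keep the total number of elements unchanged. This minimal sketch shows two things: one dimension can be given as <code>-1</code> so that NumPy infers it, and an incompatible shape raises a <code>ValueError</code>. The variable names are illustrative.</p>

```python
import numpy as np

v = np.arange(1, 10)          # 9 elements

grid = v.reshape(3, -1)                    # -1 lets NumPy infer the second axis (3)
col = np.array([1, 2, 3]).reshape(-1, 1)   # column vector, shape (3, 1)
print(grid.shape, col.shape)               # (3, 3) (3, 1)

try:
    v.reshape(2, 5)           # 2 * 5 = 10 elements, but v has only 9
except ValueError as err:
    print("ValueError:", err)
```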




<h3 id="Transposing-Arrays">Transposing Arrays<a class="anchor-link" href="#Transposing-Arrays">¶</a></h3><p>Transposing is a special form of reshaping that returns a view on the underlying data without copying it.</p>


In [None]:

arr = np.arange(15).reshape((3,5))
arr



In [None]:

arr.T
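<p>This sketch checks the two properties stated above. First, the transpose swaps the axes, so element <code>[i, j]</code> of <code>arr.T</code> is element <code>[j, i]</code> of <code>arr</code>. Second, it shares memory with the original rather than copying it.</p>

```python
import numpy as np

arr = np.arange(15).reshape((3, 5))
t = arr.T

print(arr.shape, t.shape)         # (3, 5) (5, 3)
print(t[4, 1] == arr[1, 4])       # True: the axes are swapped
print(np.shares_memory(arr, t))   # True: a view, not a copy
```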




<h3 id="Sorting-Arrays">Sorting Arrays<a class="anchor-link" href="#Sorting-Arrays">¶</a></h3><p>We'll use NumPy <code>sort</code> and <code>argsort</code> functions to perform sorting operations.</p>


In [None]:

arr = np.random.randn(10)
arr



In [None]:

np.sort(arr)



In [None]:

arr2 = np.random.random((5,3))
arr2



In [None]:

# Sort each column independently, i.e., along axis 0 (down the rows)
np.sort(arr2, axis=0)



In [None]:

# Indices of the sorted elements
np.argsort(arr)



In [None]:

np.argsort(arr2, axis=0)
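<p><code>np.argsort</code> returns the indices that would sort the array, so indexing the original array with them reproduces <code>np.sort</code>. Note also that <code>np.sort</code> returns a sorted copy, whereas the <code>sort()</code> method sorts the array in place. The values below are fixed for illustration.</p>

```python
import numpy as np

arr = np.array([3.2, -1.0, 0.5, 2.7])

order = np.argsort(arr)
print(order)          # [1 2 3 0]
print(arr[order])     # same result as np.sort(arr)

sorted_copy = np.sort(arr)   # arr itself is unchanged
arr.sort()                   # now arr is sorted in place
print(np.array_equal(arr, sorted_copy))  # True
```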




<h3 id="Array-Concatenation">Array Concatenation<a class="anchor-link" href="#Array-Concatenation">¶</a></h3><p>We'll now combine multiple arrays into one using array concatenation. Concatenation, or joining, of arrays is done using <code>np.concatenate</code>, <code>np.vstack</code> and <code>np.hstack</code>.</p>


In [None]:

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
np.concatenate([x, y])



In [None]:

# Concatenate more than two arrays at once:
z = [10, 100, 1000]
print(np.concatenate([x, y, z]))



In [None]:

arr = np.array([[1, 2, 3],
                 [4, 5, 6]])



In [None]:

# Concatenate along the first axis
np.concatenate([arr, arr])



In [None]:

x = np.array([1, 2, 3])
arr = np.array([[9, 8, 7],
                 [6, 5, 4]])

# Vertically stack the arrays
np.vstack([x, arr])



In [None]:

# Horizontally stack the arrays
y = np.array([[55],
              [55]])
np.hstack([arr, y])
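<p>For 2D arrays, <code>np.concatenate</code> joins along axis 0 (rows) by default, and along the columns when given <code>axis=1</code>. This sketch checks that <code>np.vstack</code> and <code>np.hstack</code> give the same results for these 2D inputs.</p>

```python
import numpy as np

arr = np.array([[1, 2, 3],
                [4, 5, 6]])

rows = np.concatenate([arr, arr])          # axis=0: shape (4, 3)
cols = np.concatenate([arr, arr], axis=1)  # axis=1: shape (2, 6)
print(rows.shape, cols.shape)              # (4, 3) (2, 6)

print(np.array_equal(rows, np.vstack([arr, arr])))  # True
print(np.array_equal(cols, np.hstack([arr, arr])))  # True
```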




<h3 id="Splitting-of-arrays">Splitting of arrays<a class="anchor-link" href="#Splitting-of-arrays">¶</a></h3><p>The opposite of concatenation is splitting, which is implemented by the functions <code>np.split</code>, <code>np.hsplit</code> and <code>np.vsplit</code>. For each of these, we can pass either the number of equal sections or a list of indices giving the split points.</p>


In [None]:

x = np.arange(9)
x1, x2, x3 = np.split(x, 3)
print(x1, x2, x3)



In [None]:

x1, x2, x3 = np.split(x, [3,5])
print(x1, x2, x3)



In [None]:

# Vertically split the arrays
arr = np.arange(16).reshape((4, 4))
arr



In [None]:

upper, lower = np.vsplit(arr, [2])
print(upper)
print(lower)



In [None]:

# Horizontally split the arrays
left, right = np.hsplit(arr, [2])
print(left)
print(right)
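<p><code>np.split</code> accepts either a number of equal sections or a list of split points, and this sketch shows both forms. It also shows that an integer which does not divide the length evenly raises a <code>ValueError</code>. In that case, <code>np.array_split</code> allows unequal sections.</p>

```python
import numpy as np

x = np.arange(9)

equal = np.split(x, 3)          # three sections of 3 elements each
points = np.split(x, [3, 5])    # cut before index 3 and before index 5
print([part.tolist() for part in points])  # [[0, 1, 2], [3, 4], [5, 6, 7, 8]]

try:
    np.split(x, 4)              # 9 elements cannot form 4 equal sections
except ValueError as err:
    print("ValueError:", err)

uneven = np.array_split(x, 4)   # unequal sections are allowed here
print([len(part) for part in uneven])      # [3, 2, 2, 2]
```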




<h2 id="UFuncs">UFuncs<a class="anchor-link" href="#UFuncs">¶</a></h2><p>A universal function, or <code>ufunc</code>, is a function that performs element-wise operations on ndarrays. Ufuncs are fast, vectorized wrappers for simple functions. They fall into two categories: <code>unary ufuncs</code>, which operate on a single input, and <code>binary ufuncs</code>, which operate on two inputs.</p>
<blockquote><p><code>Unary ufuncs  : abs, sqrt, square, exp, log, sign, ceil, floor, cos, sin, tan</code><br/>
<code>Binary ufuncs : add, subtract, multiply, divide, power, maximum, minimum, mod</code></p>
</blockquote>
<p>NumPy defines more than 60 ufuncs covering a wide variety of operations. See the list of <a href="https://numpy.org/devdocs/reference/ufuncs.html">available ufuncs</a>.</p>


In [None]:

arr = np.arange(10)
arr



In [None]:

np.sqrt(arr)



In [None]:

np.exp(arr)



In [None]:

x = np.random.randn(10)
y = np.random.randn(10)



In [None]:

x



In [None]:

y



In [None]:

np.maximum(x,y)



In [None]:

np.minimum(x,y)
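<p>Python's arithmetic operators on arrays call the corresponding binary ufuncs, and unary ufuncs apply element by element. This sketch checks a few of these equivalences on small fixed arrays chosen for illustration.</p>

```python
import numpy as np

x = np.array([1.0, -2.0, 3.0])
y = np.array([0.5, 4.0, -1.0])

print(np.array_equal(x + y, np.add(x, y)))        # True
print(np.array_equal(x * y, np.multiply(x, y)))   # True
print(np.maximum(x, y))   # element-wise larger value: [1. 4. 3.]
print(np.abs(x))          # unary ufunc: [1. 2. 3.]
```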




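<p>An advanced ufunc feature is the <code>out</code> argument, which writes the result directly into an existing array instead of creating a temporary one. This can save memory in large calculations.</p>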


In [None]:

x = np.arange(5)
y = np.empty(5)
np.multiply(x,10, out=y)
print(y)
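<p>The <code>out</code> argument also accepts a view, so you can write a result into part of an existing array. This minimal sketch writes powers of two into every other element of a preallocated array.</p>

```python
import numpy as np

x = np.arange(5)
y = np.zeros(10)

# Write 2**x directly into every other slot of y; no temporary array is needed
np.power(2, x, out=y[::2])
print(y)   # powers of 2 at even indices, zeros elsewhere
```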




<h3 id="Aggregates-for-ufuncs">Aggregates for ufuncs<a class="anchor-link" href="#Aggregates-for-ufuncs">¶</a></h3><p>Binary ufuncs also provide methods that compute aggregates directly. The <code>reduce</code> method repeatedly applies the operation to the elements until a single result remains, <code>accumulate</code> keeps the intermediate results, and <code>outer</code> applies the operation to every pair of elements from two inputs.</p>


In [None]:

x = np.arange(1,6)
x




<h3 id="Reduce">Reduce<a class="anchor-link" href="#Reduce">¶</a></h3>


In [None]:

np.add.reduce(x)



In [None]:

np.multiply.reduce(x)




<h3 id="Accumulate">Accumulate<a class="anchor-link" href="#Accumulate">¶</a></h3>


In [None]:

# Store intermediate results
np.add.accumulate(x)



In [None]:

np.multiply.accumulate(x)
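<p>For addition and multiplication, these ufunc methods match familiar aggregation functions. <code>reduce</code> corresponds to <code>np.sum</code> and <code>np.prod</code>, and <code>accumulate</code> corresponds to <code>np.cumsum</code> and <code>np.cumprod</code>. This sketch checks the correspondence for <code>x = [1, 2, 3, 4, 5]</code>.</p>

```python
import numpy as np

x = np.arange(1, 6)   # [1 2 3 4 5]

print(np.add.reduce(x), np.sum(x))        # 15 15
print(np.multiply.reduce(x), np.prod(x))  # 120 120

print(np.add.accumulate(x))        # [ 1  3  6 10 15]
print(np.multiply.accumulate(x))   # [  1   2   6  24 120]
```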




<h3 id="Outer">Outer<a class="anchor-link" href="#Outer">¶</a></h3>


In [None]:

# Outer products
np.multiply.outer(x, x)
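<p><code>np.multiply.outer(x, x)</code> builds a table whose entry <code>[i, j]</code> is <code>x[i] * x[j]</code>. This sketch confirms that it matches the dedicated <code>np.outer</code> function.</p>

```python
import numpy as np

x = np.arange(1, 6)
table = np.multiply.outer(x, x)   # 5x5 multiplication table

print(table.shape)    # (5, 5)
print(table[2, 3])    # 12, i.e. x[2] * x[3] = 3 * 4
print(np.array_equal(table, np.outer(x, x)))  # True
```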




<h2 id="Array-Aggregation">Array Aggregation<a class="anchor-link" href="#Array-Aggregation">¶</a></h2><p>NumPy makes it easy to compute summary statistics for data through its built-in aggregation functions for arrays.</p>


In [None]:

# Arithmetic operations
x = np.arange(5)

print("x      =", x)
print("x + 5  =", x + 5)
print("x - 5  =", x - 5)
print("x * 2  =", x * 2)
print("x / 2  =", x / 2)
print("x // 2 =", x // 2)  # floor division



In [None]:

numbers = np.random.rand(100000)



In [None]:

print(f'Minimum: {np.min(numbers):.4f}, Maximum: {np.max(numbers):.4f}')



In [None]:

# Equivalent, shorter method syntax
numbers.min(), numbers.max(), numbers.sum()



In [None]:

# Compare Python's built-in min() with NumPy's function and method versions
%timeit min(numbers)
%timeit np.min(numbers)
%timeit numbers.min()




<center>List of key aggregation functions available in NumPy</center>
<table>
<thead><tr>
<th>Function Name</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>np.sum</code></td>
<td>Compute sum of elements</td>
</tr>
<tr>
<td><code>np.prod</code></td>
<td>Compute product of elements</td>
</tr>
<tr>
<td><code>np.mean</code></td>
<td>Compute mean of elements</td>
</tr>
<tr>
<td><code>np.std</code></td>
<td>Compute standard deviation</td>
</tr>
<tr>
<td><code>np.var</code></td>
<td>Compute variance</td>
</tr>
<tr>
<td><code>np.min</code></td>
<td>Find minimum value</td>
</tr>
<tr>
<td><code>np.max</code></td>
<td>Find maximum value</td>
</tr>
<tr>
<td><code>np.argmin</code></td>
<td>Find index of minimum value</td>
</tr>
<tr>
<td><code>np.argmax</code></td>
<td>Find index of maximum value</td>
</tr>
<tr>
<td><code>np.median</code></td>
<td>Compute median of elements</td>
</tr>
<tr>
<td><code>np.percentile</code></td>
<td>Compute the q-th percentile of elements</td>
</tr>
</tbody>
</table>
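<p>You can check the functions in the table on a small array whose statistics are easy to compute by hand. Note that <code>np.std</code> and <code>np.var</code> compute the population versions by default (<code>ddof=0</code>). Pass <code>ddof=1</code> for the sample versions.</p>

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])

print(np.sum(a), np.prod(a), np.mean(a))     # 10.0 24.0 2.5
print(np.var(a), np.var(a, ddof=1))          # 1.25 (population) and 5/3 (sample)
print(np.std(a))                             # square root of 1.25
print(np.argmin(a), np.argmax(a))            # 0 3
print(np.median(a), np.percentile(a, 50))    # 2.5 2.5
```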



<h3 id="Multidimensional-Aggregation">Multidimensional Aggregation<a class="anchor-link" href="#Multidimensional-Aggregation">¶</a></h3><p>In a multidimensional array, we can aggregate over the whole array or along a single axis (axis 0 or axis 1 for a 2D array).</p>


In [None]:

arr = np.random.random((3,4))
arr



In [None]:

arr.sum()



In [None]:

# Sum along axis 0: one result per column
arr.sum(axis=0)



In [None]:

# Sum along axis 1: one result per row
arr.sum(axis=1)
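<p>The <code>axis</code> argument names the axis that is collapsed. For a 3x4 array, summing along <code>axis=0</code> leaves one value per column, and summing along <code>axis=1</code> leaves one value per row. Both sets of partial sums add up to the overall total. This sketch uses a fixed array instead of random values so the sums are predictable.</p>

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

col_sums = arr.sum(axis=0)   # one value per column
row_sums = arr.sum(axis=1)   # one value per row

print(col_sums)   # [12 15 18 21]
print(row_sums)   # [ 6 22 38]
print(col_sums.sum() == row_sums.sum() == arr.sum())  # True
```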




<h2 id="Boolean">Boolean<a class="anchor-link" href="#Boolean">¶</a></h2><p>We'll now use Boolean masks to examine and count array values. The operators used to combine Boolean masks and their equivalent ufuncs are listed below:</p>
<table>
<thead><tr>
<th>Operator</th>
<th>Equivalent ufunc</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>&amp;</code></td>
<td>np.bitwise_and</td>
</tr>
<tr>
<td><code>|</code></td>
<td>np.bitwise_or</td>
</tr>
<tr>
<td><code>^</code></td>
<td>np.bitwise_xor</td>
</tr>
<tr>
<td><code>~</code></td>
<td>np.bitwise_not</td>
</tr>
</tbody>
</table>


In [None]:

x = np.arange(1,6)
x



In [None]:

# Less than
x < 3



In [None]:

# Greater than
x > 3



In [None]:

# Equal 
x == 3



In [None]:

# Not equal 
x != 3



In [None]:

# Count values less than 4
np.count_nonzero(x<4)



In [None]:

# Count values less than 4 (True counts as 1, False as 0)
np.sum(x<4)



In [None]:

# Are there any values > 8?
np.any(x>8)



In [None]:

# Are all values < 8?
np.all(x<8)



In [None]:

# Count values greater than 2 and less than 5
np.sum((x>2) & (x<5))
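<p>You can also use a Boolean mask as an index to select the matching elements. This sketch does so and checks that the <code>&amp;</code> operator gives the same mask as <code>np.bitwise_and</code> from the table above. The parentheses around each comparison are required because <code>&amp;</code> binds more tightly than <code>&gt;</code> and <code>&lt;</code>.</p>

```python
import numpy as np

x = np.arange(1, 6)   # [1 2 3 4 5]

mask = (x > 2) & (x < 5)
print(mask)       # [False False  True  True False]
print(x[mask])    # [3 4]: only the elements where the mask is True

print(np.array_equal(mask, np.bitwise_and(x > 2, x < 5)))  # True
print(x[~mask])   # [1 2 5]: ~ inverts the mask
```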




<h2 id="File-Input-&amp;-Output-with-Arrays">File Input &amp; Output with Arrays<a class="anchor-link" href="#File-Input-&amp;-Output-with-Arrays">¶</a></h2><p>NumPy can save <code>ndarray</code> objects to disk files and load them back. The functions <code>save</code> and <code>load</code> handle binary files with the <code>.npy</code> extension, while <code>savetxt</code> and <code>loadtxt</code> handle plain text files.</p>


In [None]:

# Save input array with .npy extension
np.save('outfile', x)



In [None]:

# Save input array as a text file
np.savetxt('outfile.txt', x)



In [None]:

# Load and reconstruct the array from outfile.npy
print(np.load('outfile.npy'))



In [None]:

# Load and reconstruct the array from outfile.txt (values are read back as floats)
print(np.loadtxt('outfile.txt'))
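<p>Here is a self-contained round trip. It writes to a temporary directory, which is our own choice so that the example leaves no files behind. The sketch shows that <code>.npy</code> files preserve the dtype, while <code>np.loadtxt</code> returns floats by default.</p>

```python
import os
import tempfile
import numpy as np

x = np.arange(1, 6)

# Temporary directory (illustrative choice) so no files are left behind
with tempfile.TemporaryDirectory() as tmp:
    npy_path = os.path.join(tmp, "outfile.npy")
    txt_path = os.path.join(tmp, "outfile.txt")

    np.save(npy_path, x)       # binary format: dtype and shape are stored
    np.savetxt(txt_path, x)    # text format: values written as floats by default

    from_npy = np.load(npy_path)
    from_txt = np.loadtxt(txt_path)

print(from_npy.dtype == x.dtype)    # True
print(from_txt.dtype)               # float64
print(np.array_equal(from_txt, x))  # True: same values, different dtype
```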




<h1 id="References">References<a class="anchor-link" href="#References">¶</a></h1><ul>
<li><p>NumPy documentation <a href="https://docs.scipy.org/doc/numpy/">https://docs.scipy.org/doc/numpy/</a></p>
</li>
<li><p>Jake VanderPlas (2016), Python Data Science Handbook: Essential Tools for Working with Data</p>
</li>
<li><p>Wes McKinney (2018), Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython</p>
</li>
<li><p>Python Resources <a href="https://github.com/kannansingaravelu/PythonResources">https://github.com/kannansingaravelu/PythonResources</a></p>
</li>
</ul>
