# Introduction to NumPy

NumPy is a fundamental package for scientific computing in Python. It provides support for arrays, matrices, and many mathematical functions to operate on these data structures efficiently. NumPy is essential for numerical computations and serves as the foundation for many other scientific libraries, including SciPy, Pandas, and scikit-learn. 

It would take years to cover all there is to know about NumPy, so here we'll focus on understanding the basics and building the tools to find and use the functions we need as we need them. If you want your basics straight from the NumPy team, check out their [beginner's guide](https://numpy.org/doc/stable/user/absolute_beginners.html).

To start, recall how we import a package. For this notebook, we'll use the traditional np alias.

In [None]:
import numpy as np

?np

The first thing we get from NumPy is a set of new and improved math functions and constants.

In [None]:
print(np.sqrt(5))
print(np.pi)
print(np.average([1,2,3]))
print(np.sum([1,2,3]))

But that's not all that impressive. NumPy's biggest feature is the introduction of the ndarray class. Arrays are much like lists, but have far greater functionality. Unlike lists, arrays are meant to contain a single data type, usually something numeric. This is not enforced, but much or all of the added functionality of NumPy over lists is lost when an array holds heterogeneous data types.

Let's create our first array by using NumPy's array() function, which attempts to convert its argument into an array. We'll use a list since these are almost always convertible to arrays.

In [None]:
arr = np.array([1,2,3])

print(type(arr))
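To make the "single data type" idea concrete, here is a small sketch of how np.array() settles on one dtype for the whole array. The variable names are only illustrative, and the exact integer and string dtypes printed can differ between platforms and NumPy versions.

```python
import numpy as np

# All integers: NumPy picks an integer dtype (its width depends on the platform).
ints = np.array([1, 2, 3])

# Integers mixed with a float: every element is promoted to float64.
mixed_numeric = np.array([1, 2.5, 3])

# Numbers mixed with a string: every element becomes a string,
# so numeric operations on this array no longer make sense.
mixed_text = np.array([1, "two", 3.0])

print(ints.dtype, mixed_numeric.dtype, mixed_text.dtype)
```

The dtype attribute used here reports the type NumPy chose for the elements.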

There are many other ways to initialize arrays using a variety of functions kindly offered by NumPy.

In [None]:
zeros = np.zeros(10)
ones = np.ones(6)
empty = np.empty(8)

counting = np.arange(4)
intervals = np.linspace(10, 20, 5)

imitate0 = np.zeros_like(counting)
imitate1 = np.ones_like(intervals)
imitate_empty = np.empty_like(zeros)

print(zeros, '\n',
      ones, '\n',
      empty, '\n',
      counting, '\n',
      intervals, '\n',
      imitate0, '\n',
      imitate1, '\n',
      imitate_empty)
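A few of these functions have details that are easy to miss, so here is a short sketch of them. The names evens, points, and scratch are made up for illustration.

```python
import numpy as np

# arange(start, stop, step) excludes the stop value, just like range().
evens = np.arange(0, 10, 2)

# linspace(start, stop, num) includes both endpoints by default.
points = np.linspace(10, 20, 5)

# empty() only reserves memory; its contents are arbitrary until assigned,
# so always fill the array before reading from it.
scratch = np.empty(3)
scratch[:] = 7.0

print(evens)
print(points)
print(scratch)
```

This is why the empty and imitate_empty arrays printed above may show seemingly random values.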

ndarrays have a number of attributes we can access using the '.' syntax.

In [None]:
a = np.arange(9)

print(a.size)
print(a.shape)
print(a.dtype)

# Arrays also behave well with the len() function
print(len(a))
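For a 1-D array, size, shape, and len() all report the same number, so the difference between them is easier to see in two dimensions. The sketch below assumes the reshape() method, which has not been introduced yet, to turn six numbers into 2 rows and 3 columns.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)   # 2 rows, 3 columns

print(grid.shape)   # (2, 3): the length along each dimension
print(grid.ndim)    # 2: the number of dimensions
print(grid.size)    # 6: the total number of elements
print(len(grid))    # 2: the length of the first dimension only
```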

That might seem well and good, but you may still be wondering why NumPy is the backbone of numerical work in Python. Aside from the excellent additions to default Python that NumPy provides, NumPy performs much, *much* better than default Python. That is because, in the background, NumPy's core is implemented in C, which allows it to far outpace the notoriously slow Python interpreter. Let's take a look at an example to see what I mean. For the next couple of cells, we'll be using the cell magic %%timeit; don't worry about the details. All it does is time the execution of the cell over several runs.

In this simple example, we'll create an array/list with 1 million numbers (0-999,999). Then we'll update it by multiplying each number by 5. Note that neither implementation below is the most efficient way to achieve the final result using lists or arrays, but it is a fair comparison because they do the same operations in the same order.

In [None]:
%%timeit
# (%%timeit is a cell magic, so it has to be the first line of the cell.)

# To do this in default Python, we need to use loops.
# Our first loop will create the list by sequentially appending each number.
# Our second loop will access each element and multiply its value by 5.

# At this point, you should be fairly comfortable with this code and
#  could even write it yourself. In fact, think about how you might
#  improve it!

x = []
for i in range(int(1e6)):
    x.append(i)

for i in range(len(x)):
    x[i] *= 5

35.1 ms ± 562 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)


In [None]:
%%timeit
# (%%timeit is a cell magic, so it has to be the first line of the cell.)

# For the NumPy implementation, we don't need to create a loop explicitly
#  (NumPy is doing it in the background in C). Instead, we'll use the
#  arange() function to generate the array, and take advantage of the property
#  of ndarrays whereby simple math operations are applied elementwise to the
#  whole array.

# We imported NumPy under the alias np, so we call np.arange(), not numpy.arange().
y = np.arange(int(1e6))
y *= 5

344 µs ± 6.92 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)


Notice the time difference! The exact numbers are machine dependent, but you should see something like a factor of 100 speedup. The advantage typically widens as the lengths get larger, so NumPy single-handedly makes large numerical operations in Python viable!
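The %%timeit magic only exists inside IPython and Jupyter. If you want to repeat the comparison in a plain Python script, one rough option is the standard-library timeit module, sketched below. The function names scale_list and scale_array and the smaller size n are assumptions chosen to keep the run short, and the timings you see will depend on your machine.

```python
import timeit

import numpy as np

n = 100_000  # smaller than the 1 million above, to keep this quick


def scale_list():
    # Pure Python: build a list, then multiply each element by 5.
    values = list(range(n))
    return [v * 5 for v in values]


def scale_array():
    # NumPy: build an array, then multiply the whole array at once.
    values = np.arange(n)
    return values * 5


# Run each function 10 times and report the total time in seconds.
list_time = timeit.timeit(scale_list, number=10)
array_time = timeit.timeit(scale_array, number=10)

print(f"list:  {list_time:.4f} s")
print(f"array: {array_time:.4f} s")
```

Both functions produce the same numbers; only the container type and the time taken differ.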

## Challenge

To drive home how much easier it is to create and manipulate data in NumPy, let's do one more example. Our goal will be to generate a 2-D list containing the numbers 1-100.
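If you get stuck, here is one possible approach, not the only one. It assumes the target is 10 rows of 10 numbers each (the shape is not specified above) and uses the reshape() method for the NumPy version. The names grid_list and grid_array are made up for illustration.

```python
import numpy as np

# Plain Python: nested loops build a list of 10 rows with 10 numbers each.
grid_list = []
for row in range(10):
    current_row = []
    for col in range(10):
        current_row.append(row * 10 + col + 1)
    grid_list.append(current_row)

# NumPy: generate 1 through 100, then reshape into 10 rows and 10 columns.
grid_array = np.arange(1, 101).reshape(10, 10)

print(grid_list[0], grid_list[-1])
print(grid_array[0], grid_array[-1])
```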

In [17]:
x = []
for i in range(int(1e6)):
    x.append(i)
    
for i in range(len(x)):
    x[i] *= 5
    
y = np.arange(int(1e6))
y *= 5

print(x[:10], type(x))
print(y[:10], type(y))

[0, 5, 10, 15, 20, 25, 30, 35, 40, 45] <class 'list'>
[ 0  5 10 15 20 25 30 35 40 45] <class 'numpy.ndarray'>


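numpy or np? Because we imported the package with import numpy as np, only the alias np is bound. The name numpy is defined only if import numpy has also been run in the session.

In [None]:
import numpy as np

print(np.sum([1,2,3]))
# The next line raises a NameError unless `import numpy` has also been run.
print(numpy.sum([1,2,3]))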
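A small sketch of what is going on with the two names: when both import forms are run, numpy and np are simply two names for the same module object, so either one can be used to call its functions.

```python
import numpy
import numpy as np

# Both names refer to the very same module object.
print(np is numpy)

# So both calls run the same function and give the same result.
print(np.sum([1, 2, 3]))
print(numpy.sum([1, 2, 3]))
```

Sticking to the np alias alone is the common convention, and it is what the rest of this notebook uses.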

The full list of NumPy routines is in the [NumPy reference](https://numpy.org/doc/stable/reference/routines.html).

In [19]:
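# Any valid name works as an alias, but this one shadows the built-in print().
import numpy as print

In [None]:
# Raises a TypeError: print now refers to the numpy module, which is not callable.
print("hi")

In [None]:
# Works, because print is now just another name for numpy.
print.arange(10)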
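The same effect can be reproduced safely by confining the alias to a function, as in the sketch below. The function name alias_demo is hypothetical; the point is only that the alias replaces whatever the name meant before, within the scope where the import runs.

```python
import numpy


def alias_demo():
    # Inside this function, `print` is a local name bound to the numpy module,
    # so the built-in print() is hidden until the function returns.
    import numpy as print
    points_to_numpy = print is numpy
    still_callable = callable(print)
    return points_to_numpy, still_callable


points_to_numpy, still_callable = alias_demo()

# Out here the built-in print() is untouched.
print(points_to_numpy, still_callable)   # True False
```

In a notebook where the alias was created at the top level, running del print removes the alias and makes the built-in print() visible again.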