# Introducing Python Libraries

## Introduction

A library (or a module/package) is a pre-written piece of software that you can re-use rather than having to write that functionality yourself. So instead of having to write the code from scratch to plot a bar chart, you can use the Matplotlib library.

In this lesson, we will look at some of the key libraries used in Python for Data Science. 

## Objectives
You will be able to:
- Explain the purpose of common Python libraries in the data science toolkit


## Python Libraries for Data Science

## Scientific Computation

One of the key requirements for a Data Scientist is to be able to convert data into an easy-to-process format. Large datasets can become too big to process efficiently with Python's native lists and dictionaries and its built-in methods. The following libraries add scientific computation abilities to Python for working efficiently with larger data sets. 

### NumPy 

In Python, the most fundamental package used for scientific computation is **NumPy** (Numerical Python). It provides lots of useful functionality for mathematical operations on vectors and matrices in Python. Matrix computation is the primary strength of NumPy. 

<img src="images/numpy.jpeg" width="250">

The library provides these mathematical operations using the NumPy **array** data type, which enhances performance and speeds up execution when compared to Python's default methods and data types. It contains among other things:

* A powerful N-dimensional array object
* Sophisticated (broadcasting) functions
* Tools for integrating C/C++ and Fortran code
* Useful linear algebra, Fourier transform, and random number capabilities
* Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

NumPy is also used as a foundation for other, more advanced libraries, as we shall see below.
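
As a minimal sketch of the matrix-oriented functionality described above, the example below multiplies a matrix by a vector and solves a small linear system. The matrix and vector values are made up purely for illustration.

```python
import numpy as np

# A 2x2 matrix and a length-2 vector (illustrative values)
A = np.array([[2.0, 0.0],
              [0.0, 4.0]])
b = np.array([1.0, 2.0])

# Matrix-vector product
product = A @ b

# Solve the linear system A x = b with NumPy's linear algebra module
x = np.linalg.solve(A, b)

print(product)  # [2. 8.]
print(x)        # [0.5 0.5]
```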

### SciPy

In the Data Science domain, Python’s SciPy stack (a collection of software specifically designed for scientific computing) is used heavily for conducting scientific experiments. The SciPy library is an integral part of this stack.

<img src="images/scipy.png" width="150">

SciPy is a library of software for engineering and science applications and contains functions for **linear algebra**, **optimization**, **integration**, and **statistics**. 

The functionality of the SciPy library is built on top of NumPy, and its data structures make heavy use of NumPy. It provides efficient numerical computational routines and comes packaged with a number of specific submodules. The following are a few modules from this library which are very commonly applied to Data Science experiments: 


* `stats`: Statistical functions
* `linalg`: Linear algebra routines
* `fftpack`: Discrete Fourier transform algorithms (a legacy module; newer code generally uses `scipy.fft`)
* `signal`: Signal processing tools
* `optimize`: Optimization algorithms including linear programming

### Statsmodels 
Statsmodels is a Python library for exploring data, estimating statistical models, and performing statistical tests and analysis.

<img src="images/statsmodels-logo-300.png" width="250">

One of the many useful features it provides is a comprehensive set of descriptive statistics. The library provides insights when diagnosing issues with linear regression models, generalized linear models, discrete choice models, robust linear models, and time series analysis models with various estimators.

The library also provides extensive plotting functions that are designed specifically for statistical analysis and are optimized for good performance with large data sets.

### Pandas

Pandas is a Python package designed to work with “relational” data, and it helps replicate the functionality of relational databases in a simple and intuitive way. Pandas is a great tool for data wrangling. It is designed for quick and easy data cleansing, manipulation, aggregation, and visualization.

<img src="images/pandas-300x300.jpg" width="200">

There are two main data structures in the library: 

1. “Series” - one-dimensional
2. “DataFrames” - two-dimensional

These data types can be manipulated in a number of ways for analytical needs. Here are a few ways in which Pandas may come in handy:

* Easily add and delete columns in a DataFrame
* Convert data structures to DataFrame objects
* Handle missing data and outliers
* Powerful grouping and aggregation functionality
* Offers visualization functionality to plot complex statistical visualizations on the go
* The data structures in Pandas are highly compatible with most of the other libraries 
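
A few of the operations listed above can be sketched as follows. This is only an illustrative example: the column names and values are invented, and it assumes Pandas is installed and imported under its conventional alias `pd`.

```python
import numpy as np
import pandas as pd

# A small DataFrame with one missing value (illustrative data)
df = pd.DataFrame({
    "city": ["A", "A", "B", "B"],
    "sales": [10.0, np.nan, 30.0, 50.0],
})

# Handle missing data: replace NaN with 0
df["sales"] = df["sales"].fillna(0)

# Add a column, then delete it again
df["sales_doubled"] = df["sales"] * 2
df = df.drop(columns="sales_doubled")

# Group and aggregate: mean sales per city
mean_sales = df.groupby("city")["sales"].mean()
print(mean_sales)
```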


## Data Visualization

Data visualization is one of the most common tasks that Data Scientists have to perform. Traditionally, drawing visualizations would involve providing pixel level details and complex mathematical functions to create the plots. Luckily, Python has good library support for data visualization from plotting routine visualizations in Matplotlib, to developing graphical dashboards in Plotly and Bokeh. In this course, we will cover the following graphical packages: 

### Matplotlib


Matplotlib is another SciPy stack package and a library that is tailored for the generation of simple and powerful visualizations. It is a sophisticated package which is making Python (with the help of NumPy, SciPy, and Pandas) an industry standard analytics tool. 

<img src="images/matplotlib.png" width="300">


Matplotlib is a flexible plotting library for creating interactive 2D and 3D plots that can also be saved as manuscript-quality figures. The API in many ways reflects that of MATLAB, easing the transition of MATLAB users to Python. Many examples, along with the source code to re-create them, are available in the Matplotlib gallery. With a bit of effort you can make just about any visualization, including:

* Line plots
* Scatter plots
* Bar charts and histograms
* Pie charts
* Stem plots
* Contour plots
* Quiver plots
* Spectrograms

There are also facilities for creating labels, grids, legends, and many other formatting entities with Matplotlib. Basically, everything is customizable.

The library, however, is pretty low-level which means that you will need to write more code for advanced visualizations and will generally need more effort.

### Seaborn

Seaborn is complementary to Matplotlib and specifically targets statistical data visualizations, which may be more time-consuming to implement using Matplotlib alone. Seaborn extends the functionality of Matplotlib, which is how it addresses two of the biggest issues with Matplotlib: the quality of plots and parameter defaults. A full overview of Seaborn's capabilities (such as the examples in the image below) can be found [here](https://seaborn.pydata.org/examples/index.html).

<img src="images/seaborn_2.png" width="500">

> If Matplotlib "tries to make easy things easy and hard things possible," then Seaborn tries to make a well-defined set of hard things easy too.

Since Seaborn complements and extends Matplotlib, if you know Matplotlib, you’ll already have most of Seaborn down. Your plots with Seaborn will be more attractive, need less time to create, and will reveal more information. 

## Machine Learning 

### Scikit-Learn 

Scikits are scientific "kits" built on top of the SciPy stack. They are designed to add specific functionality to SciPy, such as image processing and machine learning. For machine learning, one of the most heavily used packages is **scikit-learn**. The package makes heavy use of the mathematical operations provided by NumPy and SciPy to model and test complex computational algorithms.

<img src="images/sklearn.png" width="200">

Scikit-learn (sometimes abbreviated to sklearn) offers a consistent interface to common Machine Learning (ML) algorithms, making it simple to bring ML into production systems. The library combines quality code and good documentation, ease of use and high performance, and has become industry standard for machine learning with Python. The image below highlights the key machine learning algorithms that come packaged with sklearn for problems in classification, regression, clustering, and dimensionality reduction. You can find an interactive version of the machine learning map below [here](https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html).

<img src="images/drop_shadows_background.png" width="850">


## Deep Learning  (Keras / TensorFlow)

For Deep Learning, one of the most popular and convenient libraries for Python is Keras, which builds on top of TensorFlow.

### TensorFlow

Developed by a team of ML experts at Google, TensorFlow is an open-source library of data flow graph computations, which are fine-tuned for heavy duty Machine Learning. TensorFlow was designed to meet the performance requirements of Google for training Deep Neural Networks in order to analyze visual and textual data. However, TensorFlow isn't limited to scientific use - it is general enough to use in a variety of real-world applications.

<img src="images/tf.png" width="320">

The key feature of TensorFlow is its multi-layered nodes system that enables quick training of artificial neural networks on big data. This is the library that powers Google’s voice recognition and object recognition in real time. 


### Keras

Keras is an open-source library for building Neural Networks with a high level of interface abstraction. The Keras library is written in Python, so Python developers find it much easier to start coding for deep networks in Keras than in TensorFlow, which demands a deeper understanding of graph computation. Keras is much more minimalistic and straightforward while still being highly extensible. Under the hood, it has historically been able to use either Theano (another deep learning library) or TensorFlow.


<img src="images/keras.jpg" width="320">

Keras is really easy to get started with and well suited to quick prototyping, since it is highly modular and extensible. Notwithstanding its ease, simplicity, and high-level orientation, Keras is still deep and powerful enough for serious modeling. In the deep learning section of our course, we will introduce you to Keras to help you dive into deep neural networks.

## Summary 

A big part of your journey as a Data Scientist will be building comfort and familiarity with the key Python Data Science libraries that we've outlined in this lesson. As the course progresses, you'll get plenty of hands-on experience with each one of them!

# Introduction to NumPy

## Introduction

In this section, we'll take a more formal look at *NumPy*. Besides being ubiquitous in Data Science, NumPy also provides us with blisteringly fast and efficient list-like data types called N-dimensional arrays, **ndarrays**, or more simply, arrays. This list-like data type is effectively a lighter-weight version of a Python **list**: it uses less of your computer's memory, which makes it more efficient, especially when dealing with large datasets. Don't worry if that seems a little vague. We will take a closer look at NumPy and how its arrays work in this lesson.

An important note: *Pandas* was actually built on top of *NumPy*! So many of the functionalities that *NumPy* has, are also part of *Pandas*. It is still important to cover *NumPy* separately as *NumPy* arrays are very important building blocks in many Data Science applications!

## Objectives
You will be able to: 

- Describe why NumPy is used at times over standard Python 
- Instantiate a NumPy array with specified values 
- Use broadcasting to perform a math operation on an entire NumPy array 

## Getting Started With NumPy

Just like with any other library, we need to first install the library with a package manager like `pip`. We have already installed this library in the background, so, no need to worry about this step. Next, we need to import the dependency into our code. 

The conventional method to import NumPy is by aliasing it as `np`, like this:

In [1]:
import numpy as np

That was easy! Now we can use any functions from the NumPy library by simply typing `np.function_name()`. For example, if we wanted to create a NumPy array containing the values 1, 2, 3, and 4, we would write `np.array([1,2,3,4])`. Let's try it out below:

In [2]:
numpy_arr = np.array([1, 2, 3, 4])
print('Here is a NumPy array:', numpy_arr)
print('You know it is a NumPy array because its type is:', type(numpy_arr))

Here is a NumPy array: [1 2 3 4]
You know it is a NumPy array because its type is: <class 'numpy.ndarray'>


While we'll focus on one-dimensional arrays in this lesson, it is important to mention that NumPy is very famous for its multi-dimensional arrays, like:

In [3]:
numpy_ndarr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
numpy_ndarr

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

The benefits of using NumPy here will become more obvious in our math-heavier, linear algebra / matrices sections!

## Performing array operations

So why would you want to use a NumPy array instead of a list? Because compared to a list, NumPy makes it very easy to perform array operations, like adding, multiplying, and otherwise operating on each element of the array. 

First, let's take a look at a simple example. We have a list of integers, and we want to add 3 to each element in the list. One might try the following:

In [4]:
list_of_integers = [0, 1, 2, 3]

In [5]:
# Add 3 to each element
list_of_integers + 3

TypeError: can only concatenate list (not "int") to list

You'll see that this doesn't work, because the `+` operator on a list expects another list. And even if you wrap the integer 3 in a list, you won't get the desired result: the two lists are simply concatenated.

In [6]:
# Add 3 to each element
list_of_integers + [3]

[0, 1, 2, 3, 3]

Let's see what happens if we convert our list to a *NumPy* array!

In [7]:
# Convert to NumPy Array
array_of_integers = np.array(list_of_integers)
# Add 3
array_of_integers + 3

array([3, 4, 5, 6])

It worked this time! What actually happens behind the scenes here is referred to as *broadcasting*. The term broadcasting describes how NumPy treats objects with different shapes during arithmetic operations. In this context, it means that the value "3" is *reused* for every element of the array when performing the addition. This might seem trivial, but lists don't support this behavior!
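
To make the idea of "reusing" a value concrete, here is a small sketch comparing a broadcast addition with the same addition written out explicitly. The variable names (`arr`, `grid`, `row`) are chosen just for this illustration.

```python
import numpy as np

arr = np.array([0, 1, 2, 3])

# Broadcasting a scalar: 3 is treated as if it were [3, 3, 3, 3]
broadcast_sum = arr + 3
explicit_sum = arr + np.full(arr.shape, 3)

print(broadcast_sum)                                # [3 4 5 6]
print(np.array_equal(broadcast_sum, explicit_sum))  # True

# Broadcasting also works between arrays of compatible shapes:
# a (2, 3) array plus a (3,) array adds the row to every row
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
row = np.array([10, 20, 30])
print(grid + row)
# [[11 22 33]
#  [14 25 36]]
```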

So, we see that NumPy can apply an operation to each element of an array. But NumPy can *also* combine two arrays element by element. This is useful when you have two sets of related data that are commonly combined into a statistic. For example, the population and area of a given city or state together give us population density (i.e. `nyc_population_density = nyc_population / nyc_square_miles`).
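
The population density idea could be sketched like this. The figures below are hypothetical round numbers, not real census data, and the variable names are made up for the example.

```python
import numpy as np

# Hypothetical populations and land areas (square miles) for three cities
populations = np.array([8_000_000, 4_000_000, 2_500_000])
square_miles = np.array([300, 500, 250])

# Elementwise division pairs up values by position
densities = populations / square_miles
print(densities)
```

Each element of `densities` is the population at that position divided by the area at the same position, so both arrays need to list the cities in the same order.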
 
Suppose we have a friend who is trying to figure out the square footage of their apartment. They've measured the length of each room and put those into a list for us, and then made another list for the width of each room. Instead of trying to untangle the bizarre way our friend grouped their data, let's use NumPy to create an array with the area in square feet for each room.

In [8]:
lengths_of_each_room = np.array([10, 12, 20, 5])
widths_of_each_room = np.array([13, 15, 16, 4])
areas_of_each_room = lengths_of_each_room * widths_of_each_room
print('Here is an array with the square footages for each room:', areas_of_each_room)

Here is an array with the square footages for each room: [130 180 320  20]


### A Temperature Conversion Example

Now, let's imagine we have a list of temperatures that represent the average high temperatures for each month of the year in NYC. Currently, this list has all the temperatures in Fahrenheit. However, since NYC has such a large international presence and population, it would be great to have these numbers in Celsius as well. Without NumPy, we would have to access each element individually, get its value, convert the value to Celsius, and add the new value to a new list. With NumPy, we can just apply the conversion formula to the whole array at once.

The formula for converting Fahrenheit to Celsius is below: 
```
T(°C) = (T(°F) - 32) × 5/9
```
Let's see an example of how we would perform this conversion with a Python list and a NumPy array.

In [9]:
# Average temps in NYC from January -> December (in fahrenheit)
nyc_avg_temps_f = [39, 42, 50, 62, 72, 80, 85, 84, 76, 65, 54, 44]

# ----- Without NumPy -----
nyc_avg_temps_c = list(range(0,12))
nyc_avg_temps_c[0] = (nyc_avg_temps_f[0] - 32) * (5/9)
nyc_avg_temps_c[1] = (nyc_avg_temps_f[1] - 32) * (5/9)
nyc_avg_temps_c[2] = (nyc_avg_temps_f[2] - 32) * (5/9)
nyc_avg_temps_c[3] = (nyc_avg_temps_f[3] - 32) * (5/9)
nyc_avg_temps_c[4] = (nyc_avg_temps_f[4] - 32) * (5/9)
nyc_avg_temps_c[5] = (nyc_avg_temps_f[5] - 32) * (5/9)
nyc_avg_temps_c[6] = (nyc_avg_temps_f[6] - 32) * (5/9)
nyc_avg_temps_c[7] = (nyc_avg_temps_f[7] - 32) * (5/9)
nyc_avg_temps_c[8] = (nyc_avg_temps_f[8] - 32) * (5/9)
nyc_avg_temps_c[9] = (nyc_avg_temps_f[9] - 32) * (5/9)
nyc_avg_temps_c[10] = (nyc_avg_temps_f[10] - 32) * (5/9)
nyc_avg_temps_c[11] = (nyc_avg_temps_f[11] - 32) * (5/9)
# -------------------------

# ------ With NumPy -------
np_nyc_avg_temps_f = np.array(nyc_avg_temps_f)
np_nyc_avg_temps_c = (np_nyc_avg_temps_f - 32) * (5/9)
# -------------------------

print('1. Without NumPy:', nyc_avg_temps_c)
print('2. With NumPy:', np_nyc_avg_temps_c)

1. Without NumPy: [3.8888888888888893, 5.555555555555555, 10.0, 16.666666666666668, 22.22222222222222, 26.666666666666668, 29.444444444444446, 28.88888888888889, 24.444444444444446, 18.333333333333336, 12.222222222222223, 6.666666666666667]
2. With NumPy: [ 3.88888889  5.55555556 10.         16.66666667 22.22222222 26.66666667
 29.44444444 28.88888889 24.44444444 18.33333333 12.22222222  6.66666667]


Whoa! Okay, we can see that in the first example, without NumPy, it took us **thirteen (13)** lines of code to accomplish the conversion from Fahrenheit to Celsius. With a NumPy array, we condensed that operation to **two (2)** lines of code. 

Let's break this down. Essentially the problem was to operate on each number in the list of NYC average monthly temperatures. The operation was to convert the number in Fahrenheit to Celsius. To do this, without NumPy, we must access each value from the `nyc_avg_temps_f` list separately, use the value to convert it to Celsius, and assign the converted value to the `nyc_avg_temps_c` list. *With* NumPy, we just need to use the variable name for the array, as if it were a single element, within the operation. NumPy then quickly performs the operation on each element and returns a **new** array.
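
For a fair comparison, plain Python can also express the per-element conversion more compactly with a list comprehension. The sketch below reuses the temperature list from above. It is shorter than the line-by-line version, but it still loops over the values one at a time in Python, whereas the NumPy version applies the formula to the whole array.

```python
nyc_avg_temps_f = [39, 42, 50, 62, 72, 80, 85, 84, 76, 65, 54, 44]

# One pass over the list, applying the conversion formula to each value
nyc_avg_temps_c = [(temp_f - 32) * (5 / 9) for temp_f in nyc_avg_temps_f]

print(nyc_avg_temps_c[2])  # 10.0
```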

Don't worry too much about how this is implemented behind the scenes. The key takeaway is that when we have large datasets that we want to operate on, NumPy can usually greatly simplify our code as well as make it more performant, which we will learn about later!

## Performance Benefits of NumPy Arrays

Another benefit of NumPy arrays, as we mentioned earlier, is that they use less memory and store their values in a compact, uniform layout, which lets NumPy perform operations on them much faster. However, this performance benefit is only really noticeable when dealing with very large datasets. So, for now, the performance benefits of NumPy are purely educational, and we do not need to worry about them just yet. 
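
As a rough sketch of the memory difference, the snippet below compares a list of one million integers with a NumPy array holding the same values. It assumes the standard CPython interpreter, where `sys.getsizeof` reports an object's own size in bytes, so the list total is only an approximation.

```python
import sys
import numpy as np

n = 1_000_000
python_list = list(range(n))
numpy_array = np.arange(n, dtype=np.int64)

# The list stores references to separate int objects; getsizeof counts only
# the list's own reference table, so the int objects are added on top
list_bytes = sys.getsizeof(python_list) + sum(sys.getsizeof(i) for i in python_list)

# The array stores the raw 8-byte integers in one contiguous buffer
array_bytes = numpy_array.nbytes

print('Approximate list size in bytes:', list_bytes)
print('Array data size in bytes:', array_bytes)  # 8000000
```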

Let's take a look at an example. We will perform a simple operation on two sets of data. One is a regular list and the other is a NumPy array. Don't worry about the code. We are only focusing on the time difference between how long it takes us to perform the same operation with and without NumPy.

In [10]:
import time

# Using 1 million integers
huge_list_of_integers = list(range(0, 1000000))
huge_np_array_of_integers = np.array(huge_list_of_integers)

def add_one(list_of_ints):
    return [num + 1 for num in list_of_ints]


start_time = time.perf_counter() # Time when operation starts
add_one(huge_list_of_integers) # Adds 1 to each number in the list of integers above
end_time = time.perf_counter() # Time when operation finishes
total_time = (end_time - start_time) # Total time for operation


start_time_with_np = time.perf_counter() # Time when operation starts
huge_np_array_of_integers + 1 # Adds 1 to each number in the array of integers
end_time_with_np = time.perf_counter() # Time when operation finishes
total_time_with_np = (end_time_with_np - start_time_with_np) # Total time for operation

print('Time it takes to add 1 to each element in a list without NumPy:', total_time)
print('Time it takes to add 1 to each element in a list with NumPy:', total_time_with_np)

percent_faster = int((((total_time - total_time_with_np)/total_time)*100))
print('NumPy completes the operation', percent_faster, '% faster than a traditional list')

Time it takes to add 1 to each element in a list without NumPy: 0.061647489000002054
Time it takes to add 1 to each element in a list with NumPy: 0.002932601000001256
NumPy completes the operation 95 % faster than a traditional list
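
A single timing like the one above can vary from run to run. One possible alternative, sketched here with the standard library's `timeit` module, repeats each measurement and keeps the fastest run. The helper names `add_one_list` and `add_one_numpy` are made up for this example, and the exact timings will depend on your machine.

```python
import timeit
import numpy as np

values = list(range(1_000_000))
array = np.array(values)

def add_one_list():
    # Adds 1 to each number with a plain Python loop
    return [num + 1 for num in values]

def add_one_numpy():
    # Adds 1 to each number with a single vectorized operation
    return array + 1

# Run each version several times and keep the fastest run
list_time = min(timeit.repeat(add_one_list, number=1, repeat=5))
numpy_time = min(timeit.repeat(add_one_numpy, number=1, repeat=5))

print('List comprehension (best of 5):', list_time)
print('NumPy (best of 5):', numpy_time)
```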


## Simulations with NumPy

To conclude this lesson, it is important to mention that NumPy is a very useful library to perform random sampling. What this means is that, given that we know what a certain population looks like, we can use `numpy.random` to essentially "produce" random samples given the population information.

Don't worry if this doesn't make sense right now. We'll explore this NumPy functionality later on.
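
As a small preview, here is a minimal sketch of drawing a random sample with `numpy.random`. The population parameters (a normal distribution with mean 170 and standard deviation 10) are hypothetical and chosen only for illustration.

```python
import numpy as np

# A seeded generator makes the "random" draws reproducible
rng = np.random.default_rng(seed=42)

# Assume a normally distributed population with mean 170 and
# standard deviation 10 (hypothetical numbers), and draw 1,000 samples
sample = rng.normal(loc=170, scale=10, size=1000)

print(sample.shape)   # (1000,)
print(sample.mean())  # close to 170, but not exactly 170
```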

## Summary

In this lesson, we introduced using NumPy to create arrays in Python. NumPy is a library that is very commonly used in Python when performing scientific computing operations. We looked at how NumPy can greatly reduce the amount of code we write while keeping our code very clear and concise. Next, we briefly looked at an example of the performance benefits of NumPy compared to a traditional list in Python. Finally, we touched upon NumPy's random number generating capabilities.

# Getting Started with NumPy

## Introduction

NumPy is one of the main libraries for performing scientific computing in Python. Using NumPy, you can create high-performance multi-dimensional arrays and use several tools to work with these arrays. 

A NumPy array can store a grid of values. All the values must be of the same type. NumPy arrays are n-dimensional, and the number of dimensions is denoted by the *rank* of the NumPy array. The shape of an array is a tuple of integers which holds the size of the array along each of the dimensions.
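
The terms above map onto array attributes. The following sketch shows the number of dimensions, the shape, and what happens when values of different types are mixed. The variable names are chosen just for this example.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

print(grid.ndim)   # 2 (the number of dimensions, i.e. the rank)
print(grid.shape)  # (2, 3) (2 rows, 3 columns)

# Mixing ints and floats: NumPy converts everything to one common type
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)  # float64
```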

For more information on NumPy, refer to http://www.numpy.org/.


## Objectives

You will be able to:
  
- Use broadcasting to perform a math operation on an entire NumPy array
- Perform vector and matrix operations with NumPy
- Access the shape of a NumPy array
- Use indexing with NumPy arrays




## NumPy array creation and basic operations

First, remember that it is customary to import NumPy as `np`.

In [1]:
import numpy as np

One easy way to create a NumPy array is from a Python list. The two are similar in a number of ways, but NumPy is optimized for performing mathematical operations and includes a number of built-in methods that will be extraordinarily useful.

In [2]:
x = np.array([1, 2, 3])
print(type(x))

<class 'numpy.ndarray'>


## Broadcasting Mathematical Operations

Notice right off the bat how basic mathematical operations will be applied elementwise in a NumPy array versus a literal interpretation with a Python list:

In [3]:
# Multiplies each element by 3
x * 3 

array([3, 6, 9])

In [4]:
# Returns the list 3 times
[1, 2, 3] * 3 

[1, 2, 3, 1, 2, 3, 1, 2, 3]

In [5]:
# Adds two to each element
x + 2 

array([3, 4, 5])

In [6]:
# Returns an error; different data types
[1, 2, 3] + 2 

TypeError: can only concatenate list (not "int") to list

## Even more math!

### Scalar Math

| Function | Description |
|---|---|
|`np.add(arr,1)` | Add 1 to each array element  |
|`np.subtract(arr,2)`  | Subtract 2 from each array element  |
|`np.multiply(arr,3)`  | Multiply each array element by 3 |
|`np.divide(arr,4)`    | Divide each array element by 4 |
|`np.power(arr,5)`     | Raise each array element to the 5th power | 
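
A quick sketch of a few of these functions, shown next to the equivalent operator form. The array `arr` is just an example.

```python
import numpy as np

arr = np.array([1, 2, 3])

# The function form and the operator form give the same result
print(np.add(arr, 1))        # [2 3 4]
print(arr + 1)               # [2 3 4]
print(np.multiply(arr, 3))   # [3 6 9]
print(np.power(arr, 2))      # [1 4 9]

# Division produces floats, even for an integer array
quotient = np.divide(arr, 4)
print(quotient.dtype)        # float64
```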




### Vector Math

| Function | Description |
|---|---|
|`np.add(arr1,arr2)` | Elementwise add arr2 to arr1  |
|`np.subtract(arr1,arr2)`  | Elementwise subtract arr2 from arr1  |
|`np.multiply(arr1,arr2)`  | Elementwise multiply arr1 by arr2 |
|`np.divide(arr1,arr2)`    | Elementwise divide arr1 by arr2 |
|`np.power(arr1,arr2)`     | Elementwise raise arr1 to the power of arr2 | 
|`np.array_equal(arr1,arr2)`| Returns True if the arrays have the same elements and shape |
|`np.sqrt(arr)`            |  Square root of each element in the array                    |
|`np.sin(arr)`             |  Sine of each element in the array                           |
|`np.log(arr)`             |  Natural log of each element in the array                    |
|`np.abs(arr)`             |  Absolute value of each element in the array                 |
|`np.ceil(arr)`            |  Rounds each element up to the nearest integer               |
|`np.floor(arr)`           |  Rounds each element down to the nearest integer             |
|`np.round(arr)`           |  Rounds each element to the nearest integer                  |
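
The sketch below tries a few of the functions from the table on small example arrays. Note that the rounding functions return floats rather than Python ints, and that `np.round` rounds values exactly halfway between two integers to the nearest even one.

```python
import numpy as np

arr1 = np.array([1.0, 4.0, 9.0])
arr2 = np.array([1.0, 4.0, 9.0])

print(np.array_equal(arr1, arr2))  # True
print(np.sqrt(arr1))               # [1. 2. 3.]

values = np.array([1.2, 2.5, 3.7])
print(np.ceil(values))   # [2. 3. 4.]
print(np.floor(values))  # [1. 2. 3.]
print(np.round(values))  # [1. 2. 4.]  (2.5 rounds to the even value 2)
```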

### Here are a few more examples from the tables above

In [7]:
# Adding raw lists just concatenates them
[1, 2, 3] + [4, 5, 6] 

[1, 2, 3, 4, 5, 6]

In [8]:
# Adds elements
np.array([1, 2, 3]) + np.array([4, 5, 6]) 

array([5, 7, 9])

In [9]:
# Same as above with built-in method
x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
np.add(x, y)

array([5, 7, 9])



## Multidimensional Arrays
NumPy arrays are also very useful for storing multidimensional data such as matrices. Notice how NumPy tries to nicely align the elements.

In [10]:

# An ordinary nested list
y = [[1, 2], [3, 4]]
print(type(y))
y

<class 'list'>


[[1, 2], [3, 4]]

In [11]:
# Reformatted as a NumPy array
y = np.array([[1, 2], [3, 4]])
print(type(y))
y

<class 'numpy.ndarray'>


array([[1, 2],
       [3, 4]])

## The Shape Attribute
One of the most important attributes to understand is the shape of a NumPy array.

In [12]:
y.shape

(2, 2)

In [13]:
y = np.array([[1, 2, 3],[4, 5, 6]])
print(y.shape)
y

(2, 3)


array([[1, 2, 3],
       [4, 5, 6]])

In [14]:


y = np.array([[1, 2],[3, 4],[5, 6]])
print(y.shape)
y

(3, 2)


array([[1, 2],
       [3, 4],
       [5, 6]])

### We can also work with higher-dimensional data, such as 3-dimensional data
<img src="images/Image_195_3D array.png" width=500>

In [15]:
y = np.array([[[1, 2],[3, 4],[5, 6]],
             [[1, 2],[3, 4],[5, 6]]
             ])
print(y.shape)
y

(2, 3, 2)


array([[[1, 2],
        [3, 4],
        [5, 6]],

       [[1, 2],
        [3, 4],
        [5, 6]]])

## Built-in Methods for Creating Arrays
NumPy also has several built-in methods for creating arrays that are useful in practice. These methods are particularly useful:
* `np.zeros(shape)` 
* `np.ones(shape)`
* `np.full(shape, fill)`

In [16]:
# One dimensional; 5 elements
np.zeros(5)  

array([0., 0., 0., 0., 0.])

In [17]:
# Two dimensional; 2x2 matrix
np.zeros([2, 2]) 

array([[0., 0.],
       [0., 0.]])

In [18]:
# 2 dimensional;  3x5 matrix
np.zeros([3, 5]) 

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [19]:
# 3 dimensional; 3 4x5 matrices
np.zeros([3, 4, 5]) 

array([[[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]],

       [[0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]]])

### Similarly the `np.ones()` method returns an array of ones

In [20]:
np.ones(5)

array([1., 1., 1., 1., 1.])

In [21]:
np.ones([3, 4])

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

### The `np.full()` method allows you to create an array of arbitrary values

In [22]:
# Create a 1d array with 5 elements, all of which are 3
np.full(5, 3) 

array([3, 3, 3, 3, 3])

In [23]:
# Create a 1d array with 5 elements, filling them with the values 0 to 4
np.full(5, range(5)) 

array([0, 1, 2, 3, 4])

In [24]:
# Sadly this trick won't work for multidimensional arrays
np.full([2, 5], range(10))

ValueError: could not broadcast input array from shape (10) into shape (2,5)
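
One possible workaround, sketched below, is to build the values as a flat array with `np.arange()` and then give it the desired shape with `reshape()`. This is just one way to get a 2x5 array holding the values 0 to 9.

```python
import numpy as np

# Build the values 0..9 as a flat array, then reshape into 2 rows x 5 columns
grid = np.arange(10).reshape(2, 5)
print(grid)
# [[0 1 2 3 4]
#  [5 6 7 8 9]]
```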

In [25]:
# NumPy also has useful built-in mathematical numbers
np.full([2, 5], np.pi) 

array([[3.14159265, 3.14159265, 3.14159265, 3.14159265, 3.14159265],
       [3.14159265, 3.14159265, 3.14159265, 3.14159265, 3.14159265]])

## NumPy array subsetting

You can subset NumPy arrays very similarly to list slicing in Python.

In [26]:
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]])
print(x.shape)
x

(4, 3)


array([[ 1,  2,  3],
       [ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

In [27]:
# Retrieving the first row
x[0] 

array([1, 2, 3])

In [28]:
# Retrieving all rows after the first row
x[1:] 

array([[ 4,  5,  6],
       [ 7,  8,  9],
       [10, 11, 12]])

### This becomes particularly useful in multidimensional arrays when we can slice on multiple dimensions

In [29]:
# All rows, column 0
x[:,0] 

array([ 1,  4,  7, 10])

In [30]:
# Rows at index 2 and 3, columns at index 1 and 2 (slice end points are excluded)
x[2:4,1:3] 

array([[ 8,  9],
       [11, 12]])

### Notice that you can't slice in multiple dimensions naturally with built-in lists

In [31]:
x = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
x

[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]

In [32]:
x[0]

[1, 2, 3]

In [33]:
x[:,0]

TypeError: list indices must be integers or slices, not tuple

In [34]:
# To slice along a second dimension with lists we must verbosely use a list comprehension
[i[0] for i in x]

[1, 4, 7, 10]

In [35]:
# Doing this in multiple dimensions with lists
[i[1:3] for i in x[2:4]]

[[8, 9], [11, 12]]

### 3D Slicing

In [36]:
# With an array
x = np.array([
              [[1,2,3], [4,5,6]],
              [[7,8,9], [10,11,12]]
             ])
x

array([[[ 1,  2,  3],
        [ 4,  5,  6]],

       [[ 7,  8,  9],
        [10, 11, 12]]])

In [37]:
x.shape

(2, 2, 3)

In [38]:
x[:,:,-1]

array([[ 3,  6],
       [ 9, 12]])
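
To unpack what the last slice does, the sketch below repeats it on the same array and rebuilds the result with explicit loops over a nested list for comparison. The names `last_elements` and `manual` are made up for this example.

```python
import numpy as np

x = np.array([
    [[1, 2, 3], [4, 5, 6]],
    [[7, 8, 9], [10, 11, 12]],
])

# x[:, :, -1] keeps every index on the first two axes and takes the
# last element along the third axis, so the result has shape (2, 2)
last_elements = x[:, :, -1]
print(last_elements.shape)  # (2, 2)

# The same values, gathered with explicit loops for comparison
manual = [[row[-1] for row in block] for block in x.tolist()]
print(manual)  # [[3, 6], [9, 12]]
```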

## Summary

Great! You learned about a bunch of NumPy commands. Now, let's move over to the lab to put your new skills into practice!