In [31]:
from IPython.display import HTML
from IPython.display import display

# Taken from https://stackoverflow.com/questions/31517194/how-to-hide-one-specific-cell-input-or-output-in-ipython-notebook
tag = HTML('''<script>

//https://jupyter-notebook.readthedocs.io/en/latest/examples/Notebook/JavaScript%20Notebook%20Extensions.html
$('#run_all_cells, #run_all_cells_above, #run_all_cells_below').click(function() {
    setTimeout(function() {
        // Find running cell and click the first one
        if ($('.running').length > 0) {
            $('.running')[0].click();
        }
    }, 250);
});

code_show=true; 
function code_toggle() {
    if (code_show){
        $('div.cell.code_cell.rendered.selected div.input').hide();
    } else {
        $('div.cell.code_cell.rendered.selected div.input').show();
    }
    code_show = !code_show
} 
$( document ).ready(code_toggle);

</script>
<style>
    @import url('https://fonts.googleapis.com/css?family=Raleway&display=swap');
    
    div.text_cell_render h1 { /* Main titles bigger, centered */
        font-size: 2.2em;
        line-height:1.4em;
        text-align:center;
        color: #00090d;
    }
    div.text_cell_render h2 { /*  Parts names nearer from text */
        font-size: 1.8em;
        color:#f2f2f2;;
        border-radius: 3px;
        background: #2b916a;
        padding: 15px;
        width: 99%;
        height: 2em;
    }
    div.text_cell_render h3 { /*  Parts names nearer from text */
        font-size: 1.5em;
        color:#f2f2f2;
        background: #1eb4a6;
        border-radius: 3px;
        padding: 15px;
        width: 99%;
        height: 2em;
    }
    div.text_cell_render h4 { /*  Parts names nearer from text */
        font-size: 1.2em; 
        font-style: normal;
        color:#f2f2f2;;
        border-radius: 3px;
        background: #008874;
        padding: 5px;
        display: inline-block;
    }
    div.text_cell_render h5 { /*  Parts names nearer from text */
        font-size: 1em;
        font-style: normal;
        color:#f2f2f2;;
        border-radius: 3px;
        background: #0070b8;
        padding: 5px;
        display: inline-block;
    }
    div.text_cell_render h6 { /*  Parts names nearer from text */
        font-size: 1em;
        color: #0082a3;
        font-style: normal;
    }
    
    /* Customize text cells */
    div.text_cell_render { 
        font-family: 'Raleway', sans-serif;
        text-align: justify;
    }    
    
    p,li,span {
        color:#0f0f0f;
        text-align: justify;
    }
       
    .text_cell_render,.rendered_html {
        font-style: normal;
        text-align: justify;
    }
    
    .link_background {
        color: #f2f2f2;
    }

    .box {
      border-radius: 3px;
      border: 2px solid #60b985;
      padding: 20px;
      width: 99%;
      height: 2em;
    }

    .key {
        color: #fdfdfd;
        background-color: #d37a7a;
        padding: 3px;
    }
    .highlight{
        color:#ca5e5e;
        background-color: #fbfacb;
        padding: 3px;
    }
    .mark{
        background-color: #f1d3d3;
        padding: 3px;
    }
    .note{
        background-color: #d9f3d8;
        color: #333333;
        padding: 3px;
    }
    
    .grow{
        font-size: 2em;
        font-weight: bold;
        color: #b31919;
    }
</style>
To show/hide this cell's raw code input, click <a href="javascript:code_toggle()">here</a>.''')
display(tag)

############### Write code below ##################

# Introduction to NumPy

This chapter outlines techniques for effectively loading, storing, and manipulating in-memory data in Python.
The topic is very broad: datasets can come from a wide range of sources and in a wide range of formats, including collections of documents, collections of images, collections of sound clips, collections of numerical measurements, or nearly anything else.
Despite this apparent heterogeneity, it will help us to think of all data fundamentally as arrays of numbers.

For example:
- Images, particularly digital images, can be thought of as simply two-dimensional arrays of numbers representing pixel brightness across the area.
- Sound clips can be thought of as one-dimensional arrays of intensity versus time.
- Text can be converted in various ways into numerical representations, perhaps binary digits representing the frequency of certain words or pairs of words.

No matter what the data are, the first step in making them analyzable will be to transform them into arrays of numbers.
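
As a rough illustration of this idea, the sketch below builds tiny arrays standing in for an image, a sound clip, and a piece of text. The values and the names ``image``, ``sound``, ``vocabulary``, and ``word_counts`` are made up for illustration; real data would be loaded from files.

```python
import numpy as np

# A tiny 2x3 grayscale "image": each entry is a pixel brightness (0-255)
image = np.array([[0, 128, 255],
                  [64, 192, 32]], dtype=np.uint8)

# A short "sound clip": intensity sampled at successive points in time
sound = np.array([0.0, 0.5, 1.0, 0.5, 0.0, -0.5])

# A toy text encoding: how often each word of a fixed vocabulary appears
vocabulary = ["data", "array", "python"]
text = "python array data array"
word_counts = np.array([text.split().count(word) for word in vocabulary])

print(image.shape)    # shape of the two-dimensional image array
print(sound.ndim)     # the sound clip has a single dimension
print(word_counts)    # one count per vocabulary word
```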

#### What is NumPy?

NumPy (short for *Numerical Python*) provides an efficient interface to store and operate on dense data buffers.
In some ways, NumPy arrays are like Python's built-in ``list`` type, but NumPy arrays provide much more efficient storage and data operations as the arrays grow larger in size.

If you already have installed the Anaconda stack, you already have NumPy installed and ready to go.
If you're more the do-it-yourself type, you can go to http://www.numpy.org/ and follow the installation instructions found there, or read https://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/ for how to install packages from within a Jupyter notebook.

##### How to use Conda from the Jupyter Notebook

In [None]:
# Install a conda package in the current Jupyter kernel
import sys
!conda install --yes --prefix {sys.prefix} numpy

That bit of extra boilerplate makes certain that conda installs the package in the environment of the currently running Jupyter kernel.

##### How to use Pip from the Jupyter Notebook

In [None]:
# Install a pip package in the current Jupyter kernel
import sys
!{sys.executable} -m pip install numpy

If you are installing it for the first time, the log will look something like this:

```
Collecting numpy
  Downloading numpy-1.18.1-cp38-cp38-macosx_10_9_x86_64.whl (15.2 MB)
     |████████████████████████████████| 15.2 MB 5.9 MB/s eta 0:00:01
Installing collected packages: numpy
Successfully installed numpy-1.18.1
```

That bit of extra boiler-plate makes certain that you are running the pip version associated with the current Python kernel, so that the installed packages can be used in the current notebook. This is related to the fact that, even setting Jupyter notebooks aside, it's better to install packages using:

```
$ python -m pip install <package>
```

Once you do, you can import NumPy and double-check the version:

In [10]:
import numpy
numpy.__version__

'1.18.1'

By convention, you'll find that most people in the SciPy/PyData world will import NumPy using ``np`` as an alias:

In [3]:
# import numpy library
import numpy as np

A NumPy array is usually fixed in size, and each element is of the same type. Once NumPy is imported, a Python list can be converted to a NumPy array with ``np.array``.

## Understanding Data Types in Python

Effective data-driven science and computation requires understanding how data is stored and manipulated.
This section outlines and contrasts how arrays of data are handled in the Python language itself, and how NumPy improves on this.
Understanding this difference is fundamental to understanding much of the material throughout the rest of the book.

Users of Python are often drawn in by its ease of use, one piece of which is dynamic typing.
While a statically-typed language like C or Java requires each variable to be explicitly declared, a dynamically-typed language like Python skips this specification. For example, in C you might specify a particular operation as follows:

```C
/* C code */
int result = 0;
for(int i=0; i<100; i++){
    result += i;
}
```

While in Python the equivalent operation could be written this way:

```python
# Python code
result = 0
for i in range(100):
    result += i
```

Notice the main difference: in C, the data types of each variable are explicitly declared, while in Python the types are dynamically inferred. This means, for example, that we can assign any kind of data to any variable:

```python
# Python code
x = 4
x = "four"
```

Here we've switched the contents of ``x`` from an integer to a string. The same thing in C would lead (depending on compiler settings) to a compilation error or other unintended consequences:

```C
/* C code */
int x = 4;
x = "four";  // FAILS
```

This sort of flexibility is one piece that makes Python and other dynamically-typed languages convenient and easy to use.
Understanding *how* this works is an important piece of learning to analyze data efficiently and effectively with Python.
But what this type-flexibility also points to is the fact that Python variables are more than just their value; they also contain extra information about the type of the value.
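
A minimal sketch of this point: the type information travels with the object that a name refers to, not with the name itself. The names ``first_type`` and ``second_type`` are only illustrative.

```python
# Rebinding a name to values of different types: the type belongs
# to the object, not to the name.
x = 4
first_type = type(x)     # the int class

x = "four"
second_type = type(x)    # the str class

print(first_type.__name__, second_type.__name__)
```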

#### A Python Integer Is More Than Just an Integer

The standard Python implementation is written in C.
This means that every Python object is simply a cleverly-disguised C structure, which contains not only its value, but other information as well. For example, when we define an integer in Python, such as ``x = 10000``, ``x`` is not just a "raw" integer. It's actually a pointer to a compound C structure, which contains several values.
Looking through the Python 3.x source code, we find that the integer (long) type definition effectively looks like this (once the C macros are expanded):

```C
struct _longobject {
    long ob_refcnt;
    PyTypeObject *ob_type;
    size_t ob_size;
    long ob_digit[1];
};
```

A single integer in Python 3.x actually contains four pieces:

- ``ob_refcnt``, a reference count that helps Python silently handle memory allocation and deallocation
- ``ob_type``, which encodes the type of the variable
- ``ob_size``, which specifies the size of the following data members
- ``ob_digit``, which contains the actual integer value that we expect the Python variable to represent.

This means that there is some overhead in storing an integer in Python as compared to an integer in a compiled language like C, as illustrated in the following figure:

<img src='../../resources/images/cint_vs_pyint.png' width='25%;' height='25%' style='float:left;'>

Here ``PyObject_HEAD`` is the part of the structure containing the reference count, type code, and other pieces mentioned before.

<b>Notice the difference here:</b> <br>
- A C integer is essentially a label for a position in memory whose bytes encode an integer value.
- A Python integer is a pointer to a position in memory containing all the Python object information, including the bytes that contain the integer value.

This extra information in the Python integer structure is what allows Python to be coded so freely and dynamically.
All this additional information in Python types comes at a cost, however, which becomes especially apparent in structures that combine many of these objects.
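
One way to get a feel for this overhead is to compare the size of a raw C ``int`` with the size of a Python integer object. The sketch below uses the standard-library ``ctypes`` and ``sys`` modules; the exact numbers depend on the platform and Python version, so treat them as indicative only.

```python
import ctypes
import sys

# Size in bytes of a raw C int on this platform
c_int_size = ctypes.sizeof(ctypes.c_int)

# Size in bytes of the full Python integer object (header plus digits)
py_int_size = sys.getsizeof(10000)

print(c_int_size, py_int_size)
```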

Let’s say you had the following code that defines the variable ``x``:

In **C** language
```C
int x = 1;
```
This one line of code has several, distinct steps when executed:
1. Allocate enough memory for an integer
1. Assign the value 1 to that memory location
1. Indicate that x points to that value

Shown in a simplified view of memory, it might look like this:

<img src='../../resources/images/mem_alloc_1.png' width='15%;' height='15%' style='float:left;'>

Here, you can see that the variable ``x`` has a fake memory location of ``0x7f1`` and the value ``1``. If, later in the program, you want to change the value of ``x``, you can do the following:

In **C** language
```C
x = 2;
```
The above code assigns a new value (``2``) to the variable ``x``, thereby overwriting the previous value. This means that the variable ``x`` is mutable. The updated memory layout shows the new value:

<img src='../../resources/images/mem_alloc_2.png' width='15%;' height='15%' style='float:left;'>

Notice that the location of ``x`` didn’t change, just the value itself. This is a significant point. It means that ``x`` is the memory location, not just a name for it.

If we write the code below:
```C
int y = x;
```
This code creates a new box called ``y`` and copies the value from ``x`` into the box. Now the memory layout will look like this:

<img src='../../resources/images/mem_alloc_2.png' width='15%;' height='15%' style='float:left;'><img src='../../resources/images/mem_alloc_3.png' width='15%;' height='15%'>

Notice the new location ``0x7f5`` of ``y``. Even though the value of ``x`` was copied to ``y``, the variable ``y`` owns some new address in memory. Therefore, you could overwrite the value of ``y`` without affecting ``x``:

```C
y = 3;
```
Now the memory layout will look like this:

<img src='../../resources/images/mem_alloc_2.png' width='15%;' height='15%' style='float:left;'><img src='../../resources/images/mem_alloc_4.png' width='15%;' height='15%'>

Again, you have modified the value at ``y``, but not its location. In addition, you have not affected the original ``x`` variable at all. 

Python does not have variables. It has names. Yes, this is a pedantic point, and you can certainly use the term variables as much as you like. It is important to know that there is a difference between variables and names.

Let’s take the equivalent code from the above C example and write it in Python:

In [12]:
x = 1

Much like in C, the above code is broken down into several distinct steps during execution:
1. Create a PyObject
1. Set the typecode to integer for the PyObject
1. Set the value to 1 for the PyObject
1. Create a name called x
1. Point x to the new PyObject
1. Increase the refcount of the PyObject by 1

**Note:** The PyObject is not the same as Python’s object. It’s specific to CPython and represents the base structure for all Python objects.

PyObject is defined as a C struct, so if you’re wondering why you can’t access typecode or refcount directly, it’s because you don’t have access to the structures directly. Functions like ``sys.getrefcount()`` can expose some of these internals.
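
As a small sketch of what ``sys.getrefcount()`` reports in CPython, the example below binds a second name to an object and checks that the count grows by one. The names ``obj`` and ``alias`` are illustrative, and the absolute counts include a temporary reference created by the call itself, so only the difference is meaningful.

```python
import sys

obj = [1, 2, 3]                  # a fresh object with one name bound to it
before = sys.getrefcount(obj)    # includes a temporary reference held by the call

alias = obj                      # bind a second name to the same object
after = sys.getrefcount(obj)

print(after - before)            # difference caused by the extra name
```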

In memory, it might look something like this:

<img src='../../resources/images/mem_alloc_5.png' width='35%;' height='35%' style='float:left;'>

You can see that the memory layout is vastly different than the C layout from before. Instead of ``x`` owning the block of memory where the value ``1`` resides, the newly created Python object owns the memory where ``1`` lives. The Python name ``x`` doesn’t directly own any memory address in the way the C variable ``x`` owned a static slot in memory.

If you were to try to assign a new value to ``x``, you could try the following:

In [13]:
x = 2

What’s happening here is different than the C equivalent, but not too different from the original bind in Python.

This code:
1. Creates a new PyObject
1. Sets the typecode to integer for the PyObject
1. Sets the value to 2 for the PyObject
1. Points x to the new PyObject
1. Increases the refcount of the new PyObject by 1
1. Decreases the refcount of the old PyObject by 1

Now in memory, it would look something like this:

<img src='../../resources/images/mem_alloc_6.png' width='35%;' height='35%' style='float:left;'>

This diagram helps illustrate that x points to a reference to an object and doesn’t own the memory space as before. It also shows that the ``x = 2`` command is not an assignment, but rather binding the name ``x`` to a reference.

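In addition, the previous object (which held the ``1`` value) now has one fewer reference. In this simplified picture its ref count drops to ``0`` and it gets cleaned up by the garbage collector; in practice, CPython keeps small integers such as ``1`` cached, so that particular object stays alive.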

You could introduce a new name, ``y``, to the mix as in the C example:

In [14]:
y = x

<img src='../../resources/images/mem_alloc_7.png' width='35%;' height='35%' style='float:left;'>

Now you can see that a new Python object has not been created, just a new name that points to the same object. Also, the object’s refcount has increased by one. You could check for object identity equality to confirm that they are the same:

In [15]:
y is x

True
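
Continuing the comparison with C, the following sketch shows that rebinding ``x`` afterwards leaves ``y`` untouched, because ``x = 3`` points the name ``x`` at a different object rather than overwriting a shared memory slot. The names ``same_before`` and ``same_after`` are illustrative.

```python
x = 2
y = x                    # y is bound to the same object as x
same_before = y is x

x = 3                    # rebinds x to a different object; y is untouched
same_after = y is x

print(same_before, same_after, y)
```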

**Refer:**
- https://realpython.com/pointers-in-python/

#### A Python List Is More Than Just a List

Let's consider now what happens when we use a Python data structure that holds many Python objects.
The standard mutable multi-element container in Python is the list.
We can create a list of integers as follows:

In [None]:
L = list(range(10))
L

In [None]:
type(L[0])

Or, similarly, a list of strings:

In [None]:
L2 = [str(c) for c in L]
L2

In [None]:
type(L2[0])

Because of Python's dynamic typing, we can even create heterogeneous lists:

In [None]:
L3 = [True, "2", 3.0, 4]
[type(item) for item in L3]

But this flexibility comes at a cost: to allow these flexible types, each item in the list must contain its own type info, reference count, and other information–that is, each item is a complete Python object.
In the special case that all variables are of the same type, much of this information is redundant: it can be much more efficient to store data in a fixed-type array.
The difference between a dynamic-type list and a fixed-type (NumPy-style) array is illustrated in the following figure:

<img src='../../resources/images/array_vs_list.png' width='50%;' height='50%' style='float:left;'>
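
To make the storage difference concrete, here is a rough sketch comparing the memory used by a list of integers with that of an equivalent fixed-type NumPy array. The list figure is only an estimate (CPython shares some small integer objects between references), and the names ``list_bytes`` and ``array_bytes`` are illustrative.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# List: the container of pointers plus one full Python object per element
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(item) for item in py_list)

# Array: one contiguous buffer of fixed-size values (array header not included)
array_bytes = np_array.nbytes

print(list_bytes, array_bytes)
```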

**Refer:**
- https://www.python-course.eu/numpy.php
- https://www.slideshare.net/nnja/memory-management-in-python-the-basics
- https://towardsdatascience.com/how-fast-numpy-really-is-e9111df44347

## Fixed-Type Arrays in Python

Python offers several different options for storing data in efficient, fixed-type data buffers.
<span class='note'>The built-in ``array`` module can be used to create dense arrays of a uniform type:</span>

In [2]:
import array
L = list(range(10))
A = array.array('i', L)
A

array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

<span class='note'>Here ``'i'`` is a type code indicating the contents are integers.</span>
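
Because the type code fixes the element type, the array rejects values that do not fit it. The sketch below, using only the standard library, appends an integer successfully and then catches the error raised when a float is appended; the flag name ``rejected`` is illustrative.

```python
import array

A = array.array('i', [0, 1, 2])
A.append(3)                  # fine: 3 is an integer

try:
    A.append(1.5)            # a float does not fit the 'i' type code
    rejected = False
except TypeError:
    rejected = True

print(A.typecode, A.itemsize, rejected)
```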

Much more useful, however, is the ``ndarray`` object of the NumPy package.
While Python's ``array`` object provides efficient storage of array-based data, NumPy adds to this efficient *operations* on that data.
We will explore these operations in later sections; here we'll demonstrate several ways of creating a NumPy array.

We'll start with the standard NumPy import, under the alias ``np``:

In [11]:
import numpy as np

#### Creating Arrays from Python Lists

First, we can use ``np.array`` to create arrays from Python lists:

In [12]:
# integer array:
np.array([1, 4, 2, 5, 3])

array([1, 4, 2, 5, 3])

<span class='mark'>Remember that unlike Python lists, NumPy is constrained to arrays that all contain the same type.
If types do not match, NumPy will upcast if possible (here, integers are up-cast to floating point):</span>

In [13]:
np.array([3.14, 4, 2, 3])

array([3.14, 4.  , 2.  , 3.  ])
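
The upcasting can be checked by inspecting the ``dtype`` attribute of each result. In this small sketch the names ``ints``, ``mixed``, and ``with_complex`` are illustrative; the integer dtype printed for ``ints`` depends on the platform.

```python
import numpy as np

ints = np.array([1, 4, 2])               # all integers: integer dtype
mixed = np.array([3.14, 4, 2, 3])        # integers and a float: float dtype
with_complex = np.array([1, 2.0, 3j])    # a complex value: complex dtype

print(ints.dtype, mixed.dtype, with_complex.dtype)
```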

If we want to explicitly set the data type of the resulting array, we can use the ``dtype`` keyword:

In [14]:
np.array([1, 2, 3, 4], dtype='float32')

array([1., 2., 3., 4.], dtype=float32)

Finally, unlike Python lists, NumPy arrays can explicitly be multi-dimensional; here's one way of initializing a multidimensional array using a list of lists:

In [15]:
range(2, 2 + 3)

range(2, 5)

In [18]:
# nested lists result in multi-dimensional arrays
np.array([range(i, i + 3) for i in [2, 4, 6]])

array([[2, 3, 4],
       [4, 5, 6],
       [6, 7, 8]])

The inner lists are treated as rows of the resulting two-dimensional array.
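
A quick way to confirm how the nested lists were interpreted is to look at the ``ndim`` and ``shape`` attributes of the result. The name ``grid`` below is illustrative.

```python
import numpy as np

grid = np.array([[2, 3, 4],
                 [4, 5, 6],
                 [6, 7, 8]])

print(grid.ndim)     # number of dimensions
print(grid.shape)    # (rows, columns)
print(grid[0])       # the first inner list became the first row
```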

#### Creating Arrays from Scratch

Especially for larger arrays, it is more efficient to create arrays from scratch using routines built into NumPy.
Here are several examples:

In [20]:
# Create a length-10 integer array filled with zeros
np.zeros(10, dtype=int)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

In [21]:
# Create a 3x5 floating-point array filled with ones
np.ones((3, 5), dtype=float)

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [22]:
# Create a 3x5 array filled with 3.14
np.full((3, 5), 3.14)

array([[3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14],
       [3.14, 3.14, 3.14, 3.14, 3.14]])

In [23]:
# Create an array filled with a linear sequence
# Starting at 0, ending at 20, stepping by 2
# (this is similar to the built-in range() function)
np.arange(0, 20, 2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [24]:
# Create an array of five values evenly spaced, between 0 and 1
np.linspace(0, 1, 5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])
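
The two routines above differ in how they treat the end point: ``np.arange`` takes a step size and excludes the stop value, while ``np.linspace`` takes a number of points and, by default, includes it. A minimal side-by-side sketch, with illustrative names ``stepped`` and ``spaced``:

```python
import numpy as np

stepped = np.arange(0, 1, 0.25)    # fixed step; the stop value 1 is excluded
spaced = np.linspace(0, 1, 5)      # fixed count; the stop value 1 is included

print(stepped)
print(spaced)
```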

In [25]:
# Create a 3x3 array of uniformly distributed random values, between 0 and 1
np.random.random((3, 3))

array([[0.06253477, 0.36196913, 0.49807541],
       [0.3920686 , 0.6111534 , 0.40388863],
       [0.07751704, 0.63335595, 0.51501987]])

In [26]:
# Create a 3x3 array of normally distributed random values, with mean 0 and standard deviation 1
np.random.normal(0, 1, (3, 3))

array([[-1.21955594, -0.24483796,  0.03531857],
       [-0.96542248,  0.02339453, -1.53552933],
       [ 0.44853537,  1.52854889,  0.12850929]])

In [27]:
# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))

array([[8, 8, 9],
       [2, 3, 0],
       [2, 8, 3]])
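
The random routines return different values on every run. If reproducible output is needed, one option is to seed the generator first with ``np.random.seed``; the seed value ``0`` and the names ``first`` and ``second`` below are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)                          # fix the generator state
first = np.random.randint(0, 10, (3, 3))

np.random.seed(0)                          # reset to the same state
second = np.random.randint(0, 10, (3, 3))

print(np.array_equal(first, second))       # same seed, same values
```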

In [28]:
# Create a 3x3 identity matrix
np.eye(3)

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

In [29]:
# Create an uninitialized array of three values (floating point by default)
# The values will be whatever happens to already exist at that memory location
np.empty(3)

array([1., 1., 1.])
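
Note that ``np.empty`` produces floating-point values unless a ``dtype`` is given. The sketch below checks only the dtype and shape, since the contents of an uninitialized array are arbitrary; the names ``default_empty`` and ``int_empty`` are illustrative.

```python
import numpy as np

default_empty = np.empty(3)              # no dtype given: floating point
int_empty = np.empty(3, dtype=int)       # request integers explicitly

# Only the dtype and shape are meaningful; the contents are arbitrary
print(default_empty.dtype, int_empty.dtype)
```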

## NumPy Standard Data Types

NumPy arrays contain values of a single type, so it is important to have detailed knowledge of those types and their limitations.
Because NumPy is built in C, the types will be familiar to users of C, Fortran, and other related languages.

The standard NumPy data types are listed in the following table.
Note that when constructing an array, they can be specified using a string:

```python
np.zeros(10, dtype='int16')
```

Or using the associated NumPy object:

```python
np.zeros(10, dtype=np.int16)
```

| Data type     | Description |
|---------------|-------------|
| ``bool_``     | Boolean (True or False) stored as a byte |
| ``int_``      | Default integer type (same as C ``long``; normally either ``int64`` or ``int32``)| 
| ``intc``      | Identical to C ``int`` (normally ``int32`` or ``int64``)| 
| ``intp``      | Integer used for indexing (same as C ``ssize_t``; normally either ``int32`` or ``int64``)| 
| ``int8``      | Byte (-128 to 127)| 
| ``int16``     | Integer (-32768 to 32767)|
| ``int32``     | Integer (-2147483648 to 2147483647)|
| ``int64``     | Integer (-9223372036854775808 to 9223372036854775807)| 
| ``uint8``     | Unsigned integer (0 to 255)| 
| ``uint16``    | Unsigned integer (0 to 65535)| 
| ``uint32``    | Unsigned integer (0 to 4294967295)| 
| ``uint64``    | Unsigned integer (0 to 18446744073709551615)| 
| ``float_``    | Shorthand for ``float64``.| 
| ``float16``   | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa| 
| ``float32``   | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa| 
| ``float64``   | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa| 
| ``complex_``  | Shorthand for ``complex128``.| 
| ``complex64`` | Complex number, represented by two 32-bit floats| 
| ``complex128``| Complex number, represented by two 64-bit floats| 
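
The limits in this table can also be queried from NumPy itself with ``np.iinfo`` (integer types) and ``np.finfo`` (floating-point types). A short sketch, with illustrative variable names:

```python
import numpy as np

# Integer ranges, as reported by NumPy
int8_info = np.iinfo(np.int8)
uint16_info = np.iinfo(np.uint16)
print(int8_info.min, int8_info.max)
print(uint16_info.min, uint16_info.max)

# Number of mantissa bits for each floating-point type
mantissa_bits = {float_type.__name__: np.finfo(float_type).nmant
                 for float_type in (np.float16, np.float32, np.float64)}
print(mantissa_bits)
```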

More advanced type specification is possible, such as specifying big or little endian numbers; for more information, refer to the [NumPy documentation](http://numpy.org/).
NumPy also supports compound data types, which will be covered later under the topic "Structured Data: NumPy's Structured Arrays".
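
As a brief sketch of byte-order specification, a dtype string can carry a ``>`` (big-endian) or ``<`` (little-endian) prefix. The example below stores the same value both ways and prints the raw bytes; the names ``big`` and ``little`` are illustrative.

```python
import numpy as np

big = np.array([1], dtype='>i4')       # 32-bit integer, big-endian
little = np.array([1], dtype='<i4')    # 32-bit integer, little-endian

print(big.tobytes())       # most significant byte first
print(little.tobytes())    # least significant byte first
```

Both arrays hold the same numerical value; only the in-memory byte layout differs.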