# NumPy

## Understanding Data Types in Python

Consider a loop that sums the integers from 0 to 99. In C, it might be written like this:
```C
/* C code */
int result = 0;
for(int i=0; i<100; i++){
    result += i;
}
```

In Python, the equivalent operation can be written this way:
```python
# Python code
result = 0
for i in range(100):
    result += i
```

Python is dynamically typed, so the same variable can be reassigned to values of different types:

In [1]:
x = 4

In [2]:
x = 'four'

In [3]:
x = True

In C, the same reassignment fails, because `x` was declared as an `int` and cannot hold a string:
```C
/* C code */
int x = 4;
x = "four";  // FAILS
```

### A Python Integer Is More Than Just an Integer
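
The standard Python implementation (CPython) is written in C, so every Python object is ultimately a C structure. In Python 3.4, the integer type is defined, once the C macros are expanded, roughly like this: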
```C
struct _longobject {
    long ob_refcnt;
    PyTypeObject *ob_type;
    size_t ob_size;
    long ob_digit[1];
};
```

A single integer in Python 3.4 actually contains four pieces:
- `ob_refcnt`, a reference count that helps Python silently handle memory allocation and deallocation
- `ob_type`, which encodes the type of the variable
- `ob_size`, which specifies the size of the following data members
- `ob_digit`, which contains the actual integer value that we expect the Python variable to represent

<img src="https://jakevdp.github.io/PythonDataScienceHandbook/figures/cint_vs_pyint.png" alt="Integer Memory Layout">
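To see this overhead in practice, here is a minimal sketch that compares the size of a Python `int` object, as reported by `sys.getsizeof`, with the size of one element of a NumPy `int64` array. The exact Python figure depends on the interpreter version and platform (typically 28 bytes on 64-bit CPython), so treat the numbers as illustrative.

```python
import sys

import numpy as np

# Bytes used by one Python int object: header fields plus digit storage.
py_int_size = sys.getsizeof(1)

# Bytes used by one element of an int64 NumPy array: only the raw value.
np_int_size = np.dtype(np.int64).itemsize

print(f"Python int: {py_int_size} bytes, NumPy int64 element: {np_int_size} bytes")
```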

### A Python List Is More Than Just a List


In [5]:
L = list(range(10))
L

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [7]:
type(L[0])

int

In [8]:
L2 = [str(c) for c in L]

In [9]:
L2

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In [10]:
type(L2[0])

str

In [11]:
L3 = [True, "2", 3.0, 4]
[type(item) for item in L3]

[bool, str, float, int]


<img src="https://jakevdp.github.io/PythonDataScienceHandbook/figures/array_vs_list.png" alt="Array Memory Layout">
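The figure contrasts the two layouts: a NumPy array holds a single pointer to one contiguous block of raw values, while a list holds a block of pointers, each pointing to a full Python object. A rough sketch of the cost adds up the list's pointer block and its element objects and compares the total with the array's data buffer. It is only an approximation (CPython caches small integers, so some element objects are shared and get counted more than once), but the gap is clear.

```python
import sys

import numpy as np

values = list(range(1000))

# The list object itself: a header plus one pointer per element.
list_pointer_block = sys.getsizeof(values)
# Each element is a separate Python int object (overcounts cached small ints).
list_element_objects = sum(sys.getsizeof(v) for v in values)

arr = np.arange(1000, dtype=np.int64)
# nbytes counts only the contiguous data buffer: 1000 elements * 8 bytes.
print("list (approx.):", list_pointer_block + list_element_objects, "bytes")
print("ndarray data:  ", arr.nbytes, "bytes")
```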

### Fixed-Type Arrays in Python


In [12]:
import array

L = list(range(10))
A = array.array('i', L)
A

array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
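Here `'i'` is the type code for a signed C `int`. Because every element must have that type, the array stores plain values in one buffer and rejects anything that cannot be stored as an integer. A small sketch:

```python
import array

A = array.array('i', range(10))
# Type code and bytes per element (itemsize is platform dependent, usually 4).
print(A.typecode, A.itemsize)

# A float cannot be stored in an array of C ints.
rejected = False
try:
    A.append(3.5)
except TypeError as exc:
    rejected = True
    print("rejected:", exc)
```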

## How Vectorization Makes Code Faster

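Python is a high-level language. Before it runs, Python source code is translated into bytecode, which the interpreter then executes instruction by instruction. This makes programs faster to write, but gives less control over performance than a low-level language such as C:
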
<p><img alt="Translating Python code to bytecode" src="https://s3.amazonaws.com/dq-content/289/bytecode.svg"></p>
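The standard library's `dis` module can display this bytecode. In the sketch below, `sum_rows` is a hypothetical helper that performs the same row sums as the loop later in this section; disassembling it prints the instructions the interpreter steps through. Exact opcode names vary between Python versions.

```python
import dis


def sum_rows(rows):
    """Hypothetical helper: add the two values in each row."""
    sums = []
    for row in rows:
        sums.append(row[0] + row[1])
    return sums


# Print the bytecode instructions the interpreter executes for sum_rows.
dis.dis(sum_rows)

print(sum_rows([[6, 5], [1, 3], [5, 6]]))  # [11, 4, 11]
```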


<table>
<thead>
<tr>
<th>Language Type</th>
<th>Example</th>
<th>Time taken to write program</th>
<th>Control over program performance</th>
</tr>
</thead>
<tbody>
<tr>
<td>High-Level</td>
<td>Python</td>
<td>Low</td>
<td>Low</td>
</tr>
<tr>
<td>Low-Level</td>
<td>C</td>
<td>High</td>
<td>High</td>
</tr>
</tbody>
</table>


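For example, summing the two values in each row of a list of lists with a `for` loop handles one row per iteration: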
<p><img alt="For loop to sum rows" src="https://s3.amazonaws.com/dq-content/289/for_loop.svg"></p>

In [13]:
my_numbers = [[6,5],[1,3],[5,6]]

sums = []

for row in my_numbers:
    row_sum = row[0] + row[1]
    sums.append(row_sum)
    
print(sums)

[11, 4, 11]


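Without vectorization, one value is processed per step, as in the first diagram. With vectorization, a single operation is applied to several values at once, as in the second: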
<p><img alt="Unvectorized operation" src="https://s3.amazonaws.com/dq-content/289/unvectorized.svg"></p>

<p><img alt="Vectorized operation" src="https://s3.amazonaws.com/dq-content/289/vectorized.svg"></p>
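With NumPy, the row sums computed by the loop above can be written as one vectorized expression: select the first and second columns and add them, and NumPy applies the addition to every row. A minimal sketch (column selection with `[:, 0]` is covered in more detail further down):

```python
import numpy as np

my_numbers = np.array([[6, 5], [1, 3], [5, 6]])

# Add column 0 and column 1 element-wise; no explicit Python loop.
sums = my_numbers[:, 0] + my_numbers[:, 1]
print(sums)  # [11  4 11]
```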



## NumPy

In [14]:
import numpy as np

### NumPy ndarrays



<p><img alt="Dimensional Arrays" src="https://s3.amazonaws.com/dq-content/289/dimensional_arrays.svg"></p>



#### Create an array



In [16]:
list1 = [6,7.5,78,45,9,6,58]
arr1 = np.array(list1)
print(arr1)

[ 6.   7.5 78.  45.   9.   6.  58. ]


In [17]:
type(arr1)

numpy.ndarray

In [18]:
type(list1)

list

In [19]:
data2 = [[1,2,3,4],[5,6,7,8]]
arr2 = np.array(data2)

In [23]:
print(arr2)

[[1 2 3 4]
 [5 6 7 8]]


In [24]:
# Ones
np.ones((3,5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [26]:
# Arange
np.arange(0,20,2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [28]:
# Zeros
np.zeros((10,6))

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [29]:
# Linspace: 5 evenly spaced values from 0 to 1 (both ends included)
np.linspace(0,1,5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [30]:
#random.randint
np.random.randint(0, 10, (4,4))

array([[6, 4, 4, 6],
       [8, 1, 3, 4],
       [7, 4, 2, 5],
       [8, 5, 3, 1]])

In [31]:
#random.random
np.random.random((3,3))

array([[0.37913246, 0.07178432, 0.04763755],
       [0.59336497, 0.78205293, 0.82733929],
       [0.91378039, 0.51960217, 0.97452465]])

In [33]:
#eye
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [34]:
#full
np.full((5,6), 8)

array([[8, 8, 8, 8, 8, 8],
       [8, 8, 8, 8, 8, 8],
       [8, 8, 8, 8, 8, 8],
       [8, 8, 8, 8, 8, 8],
       [8, 8, 8, 8, 8, 8]])

In [44]:
#empty: allocates memory without initializing it, so the values are arbitrary
np.empty((3,3))

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])
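The `np.random.randint` and `np.random.random` calls above use NumPy's legacy global random state, so their output changes on every run. NumPy 1.17 and later also provide a `Generator` created with `np.random.default_rng`; passing a seed makes results reproducible. A minimal sketch (the seed value is arbitrary):

```python
import numpy as np

# Two generators with the same seed produce identical sequences.
rng_a = np.random.default_rng(seed=42)
rng_b = np.random.default_rng(seed=42)

ints_a = rng_a.integers(0, 10, size=(4, 4))   # like np.random.randint(0, 10, (4, 4))
ints_b = rng_b.integers(0, 10, size=(4, 4))
floats = rng_a.random((3, 3))                 # like np.random.random((3, 3))

print(ints_a)
print(floats)
```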

#### Understanding NumPy ndarrays

In [46]:
data3 = np.random.randint(0, 10, (4,7))

In [47]:
data3

array([[5, 5, 7, 7, 9, 8, 0],
       [0, 2, 7, 6, 5, 6, 8],
       [9, 7, 2, 3, 4, 4, 3],
       [1, 3, 8, 8, 7, 3, 2]])

In [48]:
#ndim
data3.ndim

2

In [49]:
#shape
data3.shape #(rows, columns)

(4, 7)

In [51]:
#size
data3.size

28

In [52]:
#itemsize: bytes per element
data3.itemsize

8

In [53]:
#nbytes: total bytes of the data (size * itemsize)
data3.nbytes

224
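These attributes are related: `nbytes` equals `size * itemsize` (28 * 8 = 224 above), and `itemsize` depends on the array's data type. A short sketch that converts the same values to 32-bit integers:

```python
import numpy as np

data3 = np.random.randint(0, 10, (4, 7))

# Total bytes = number of elements * bytes per element.
print(data3.nbytes == data3.size * data3.itemsize)  # True

# The same values stored as 32-bit integers take 4 bytes per element.
data3_small = data3.astype(np.int32)
print(data3_small.itemsize, data3_small.nbytes)  # 4 112
```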

#### Selecting and Slicing Rows and Items from ndarrays

<p><img alt="Selecting rows from a 2D ndarray" src="https://s3.amazonaws.com/dq-content/289/selection_rows.svg"></p>



This is how we select a single item from a 2D ndarray:

<p><img alt="Selecting a single item from a 2D ndarray" src="https://s3.amazonaws.com/dq-content/289/selection_item.svg"></p>


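The general form is `ndarray[row, column]`, where each index can be:

- an integer, e.g. `5`
- a slice, e.g. `0:5` or `5:`
- a full slice, `:`, which selects everything along that dimension
- a list of indices, e.g. `[1, 5, 9]`
- a boolean array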

In [55]:
test_arr = np.random.randint(10, size=(5,5))

In [56]:
test_arr

array([[8, 5, 4, 8, 3],
       [3, 2, 2, 4, 6],
       [4, 7, 7, 7, 2],
       [2, 8, 6, 5, 0],
       [4, 0, 4, 3, 6]])

In [61]:
#first row
first_row = test_arr[0]
first_row

array([8, 5, 4, 8, 3])

In [68]:
#last row
test_arr[-1]

array([4, 0, 4, 3, 6])

In [64]:
# rows 2 and 3 (indices 1 and 2)
# equivalent: test_arr[[1, 2]]
test_arr[1:3]

array([[3, 2, 2, 4, 6],
       [4, 7, 7, 7, 2]])

In [65]:
#rows 2 and 4 (indices 1 and 3)
test_arr[[1,3]]

array([[3, 2, 2, 4, 6],
       [2, 8, 6, 5, 0]])

In [66]:
#from row 2 to the end
test_arr[1:]

array([[3, 2, 2, 4, 6],
       [4, 7, 7, 7, 2],
       [2, 8, 6, 5, 0],
       [4, 0, 4, 3, 6]])

#### Selecting Columns and Custom Slicing ndarrays

Let's continue by learning how to select one or more columns of data:

<p><img alt="Selecting columns from a 2D ndarray" src="https://s3.amazonaws.com/dq-content/289/selection_columns.svg"></p>



If we want to select a partial 1D slice of a row or column, we can combine a single value for one dimension with a slice for the other dimension:

<p><img alt="Selecting partial 1D slices from a 2D ndarray" src="https://s3.amazonaws.com/dq-content/289/selection_1darray.svg"></p>

Lastly, if we want to select a 2D slice, we can use slices for both dimensions:

<p><img alt="Selecting a 2D slice from a 2D ndarray" src="https://s3.amazonaws.com/dq-content/289/selection_2darray.svg"></p>



In [69]:
test_arr2 = np.random.randint(10, size=(5,5))

In [70]:
test_arr2

array([[3, 9, 3, 8, 9],
       [9, 1, 5, 2, 5],
       [1, 8, 4, 6, 2],
       [4, 2, 5, 2, 0],
       [1, 3, 8, 2, 8]])

In [73]:
# column 2 (index 1)
test_arr2[:, 1]

array([9, 1, 8, 2, 3])

In [76]:
# columns 1 and 2
test_arr2[:,:2]

array([[3, 9],
       [9, 1],
       [1, 8],
       [4, 2],
       [1, 3]])

In [77]:
# columns 2, 4 and 5
test_arr2[:,[1,3,4]]

array([[9, 8, 9],
       [1, 2, 5],
       [8, 6, 2],
       [2, 2, 0],
       [3, 2, 8]])

In [78]:
test_arr2

array([[3, 9, 3, 8, 9],
       [9, 1, 5, 2, 5],
       [1, 8, 4, 6, 2],
       [4, 2, 5, 2, 0],
       [1, 3, 8, 2, 8]])

In [82]:
#row 3, elements in columns 2 to 4
test_arr2[2, 1:4]

array([8, 4, 6])

In [83]:
# rows 2 to 4, columns 1 to 3
test_arr2[1:4, :3]

array([[9, 1, 5],
       [1, 8, 4],
       [4, 2, 5]])
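One subtlety: passing lists for *both* dimensions does not select a block. NumPy pairs the lists element by element, so `test_arr2[[1, 3], [0, 2]]` returns the two items at `(1, 0)` and `(3, 2)`. To get every combination of the chosen rows and columns, `np.ix_` can be used. This sketch hard-codes the values of `test_arr2` shown above:

```python
import numpy as np

test_arr2 = np.array([[3, 9, 3, 8, 9],
                      [9, 1, 5, 2, 5],
                      [1, 8, 4, 6, 2],
                      [4, 2, 5, 2, 0],
                      [1, 3, 8, 2, 8]])

# Paired indexing: picks (1, 0) and (3, 2), i.e. two single items.
print(test_arr2[[1, 3], [0, 2]])  # [9 5]

# np.ix_ builds a grid: rows 1 and 3 crossed with columns 0 and 2.
print(test_arr2[np.ix_([1, 3], [0, 2])])
# [[9 5]
#  [4 5]]
```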

#### Modifying values in an ndarray



In [84]:
test_arr2

array([[3, 9, 3, 8, 9],
       [9, 1, 5, 2, 5],
       [1, 8, 4, 6, 2],
       [4, 2, 5, 2, 0],
       [1, 3, 8, 2, 8]])

In [85]:
test_arr2[0,0] = 125

In [86]:
test_arr2

array([[125,   9,   3,   8,   9],
       [  9,   1,   5,   2,   5],
       [  1,   8,   4,   6,   2],
       [  4,   2,   5,   2,   0],
       [  1,   3,   8,   2,   8]])

In [87]:
test_arr2[0,1] = 136.589

In [88]:
test_arr2

array([[125, 136,   3,   8,   9],
       [  9,   1,   5,   2,   5],
       [  1,   8,   4,   6,   2],
       [  4,   2,   5,   2,   0],
       [  1,   3,   8,   2,   8]])

Assigning a float to an element of an integer array truncates the decimal part: `136.589` was stored as `136`.
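
The array's `dtype` does not change on assignment; the new value is converted to fit it. A minimal sketch of keeping the decimals by first converting the array with `astype`, which returns a new float array (the small array here is arbitrary):

```python
import numpy as np

int_arr = np.array([[3, 9], [9, 1]])
int_arr[0, 1] = 136.589                  # stored as 136: the dtype stays integer
print(int_arr.dtype, int_arr[0, 1])

float_arr = int_arr.astype(np.float64)   # new array with a float dtype
float_arr[0, 1] = 136.589                # the decimals are kept now
print(float_arr.dtype, float_arr[0, 1])
```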

#### Datatypes

[More about data types](https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html)

[List of scalars](https://docs.scipy.org/doc/numpy/reference/arrays.scalars.html#arrays-scalars-built-in)

In [89]:
x = np.array([1,2])
print(x.dtype)

int64


In [90]:
x = np.array([1.0,2.0])
print(x.dtype)

float64


In [92]:
np.zeros(10, dtype=np.int16)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)

In [93]:
np.zeros(10, dtype='int16')

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)

<div class="text_cell_render border-box-sizing rendered_html">
<table>
<thead><tr>
<th>Data type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>bool_</code></td>
<td>Boolean (True or False) stored as a byte</td>
</tr>
<tr>
<td><code>int_</code></td>
<td>Default integer type (same as C <code>long</code> before NumPy 2.0 and as <code>intp</code> since; normally either <code>int64</code> or <code>int32</code>)</td>
</tr>
<tr>
<td><code>intc</code></td>
<td>Identical to C <code>int</code> (normally <code>int32</code> or <code>int64</code>)</td>
</tr>
<tr>
<td><code>intp</code></td>
<td>Integer used for indexing (same as C <code>ssize_t</code>; normally either <code>int32</code> or <code>int64</code>)</td>
</tr>
<tr>
<td><code>int8</code></td>
<td>Byte (-128 to 127)</td>
</tr>
<tr>
<td><code>int16</code></td>
<td>Integer (-32768 to 32767)</td>
</tr>
<tr>
<td><code>int32</code></td>
<td>Integer (-2147483648 to 2147483647)</td>
</tr>
<tr>
<td><code>int64</code></td>
<td>Integer (-9223372036854775808 to 9223372036854775807)</td>
</tr>
<tr>
<td><code>uint8</code></td>
<td>Unsigned integer (0 to 255)</td>
</tr>
<tr>
<td><code>uint16</code></td>
<td>Unsigned integer (0 to 65535)</td>
</tr>
<tr>
<td><code>uint32</code></td>
<td>Unsigned integer (0 to 4294967295)</td>
</tr>
<tr>
<td><code>uint64</code></td>
<td>Unsigned integer (0 to 18446744073709551615)</td>
</tr>
<tr>
<td><code>float_</code></td>
<td>Shorthand for <code>float64</code> (alias removed in NumPy 2.0).</td>
</tr>
<tr>
<td><code>float16</code></td>
<td>Half precision float: sign bit, 5 bits exponent, 10 bits mantissa</td>
</tr>
<tr>
<td><code>float32</code></td>
<td>Single precision float: sign bit, 8 bits exponent, 23 bits mantissa</td>
</tr>
<tr>
<td><code>float64</code></td>
<td>Double precision float: sign bit, 11 bits exponent, 52 bits mantissa</td>
</tr>
<tr>
<td><code>complex_</code></td>
<td>Shorthand for <code>complex128</code> (alias removed in NumPy 2.0).</td>
</tr>
<tr>
<td><code>complex64</code></td>
<td>Complex number, represented by two 32-bit floats</td>
</tr>
<tr>
<td><code>complex128</code></td>
<td>Complex number, represented by two 64-bit floats</td>
</tr>
</tbody>
</table>

</div>
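
Rather than memorizing these ranges, you can query them with `np.iinfo` (integer types) and `np.finfo` (floating-point types). A short sketch:

```python
import numpy as np

int16_info = np.iinfo(np.int16)
print(int16_info.min, int16_info.max)       # -32768 32767

uint8_info = np.iinfo(np.uint8)
print(uint8_info.min, uint8_info.max)       # 0 255

float32_info = np.finfo(np.float32)
print(float32_info.bits, float32_info.eps)  # 32 and the float32 machine epsilon
```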

### Computation on NumPy Arrays: Universal Functions


#### The Slowness of Loops
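
Python loops are slow because every iteration goes back through the interpreter, which repeats type checking and function dispatch for each element. The sketch below times a hypothetical loop-based helper, `compute_reciprocals`, against the equivalent vectorized expression `1.0 / values`. Timings depend on the machine, but the vectorized version is typically many times faster.

```python
import timeit

import numpy as np


def compute_reciprocals(values):
    """Hypothetical loop-based helper: 1 / x for every element."""
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output


rng = np.random.default_rng(0)
values = rng.integers(1, 10, size=100_000)

loop_seconds = timeit.timeit(lambda: compute_reciprocals(values), number=3)
vectorized_seconds = timeit.timeit(lambda: 1.0 / values, number=3)
print(f"loop: {loop_seconds:.3f} s, vectorized: {vectorized_seconds:.5f} s")
```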



#### Introducing UFuncs (Universal functions)

[Docs](https://docs.scipy.org/doc/numpy/reference/ufuncs.html)
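
A ufunc applies an operation element by element across whole arrays. Arithmetic operators on arrays are wrappers around ufuncs, so `a + b` and `np.add(a, b)` give the same result. A minimal sketch with arbitrary values, showing a binary ufunc, an array-scalar operation, and a unary ufunc:

```python
import numpy as np

a = np.array([1, 4, 9])
b = np.array([10, 20, 30])

print(np.add(a, b))       # [11 24 39], same as a + b
print(np.multiply(a, 2))  # [ 2  8 18], array times a scalar
print(np.sqrt(a))         # [1. 2. 3.]
```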



### Importing Real Data

The data set has the following columns:

- Column 1 is RatecodeID
- Column 2 is PULocationID
- Column 3 is DOLocationID
- Column 4 is passenger_count
- Column 5 is trip_distance
- Column 6 is fare_amount
- Column 7 is extra
- Column 8 is mta_tax
- Column 9 is tip_amount
- Column 10 is tolls_amount
- Column 11 is improvement_surcharge
- Column 12 is total_amount
- Column 13 is payment_type
- Column 14 is trip_type

### Vector Math
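
Arithmetic between two ndarrays of the same shape is vectorized. A minimal sketch with arbitrary values, adding two 1D arrays:

```python
import numpy as np

vector_a = np.array([3, 5, 2])
vector_b = np.array([1, 4, 6])

vector_sum = vector_a + vector_b
print(vector_sum)  # [4 9 8]
```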




When two ndarrays of the same shape are added, NumPy adds the values position by position:

<p><img alt="Vectorized Addition" src="https://s3.amazonaws.com/dq-content/289/vectorized_addition.svg"></p>


- `vector_a + vector_b` - Addition
- `vector_a - vector_b` - Subtraction
- `vector_a * vector_b` - Multiplication, element by element (this is unrelated to the vector multiplication used in linear algebra)
- `vector_a / vector_b` - Division
- `vector_a % vector_b` - Modulus (find the remainder when vector_a is divided by vector_b)
- `vector_a ** vector_b` - Exponent (raise vector_a to the power of vector_b)
- `vector_a // vector_b` - Floor Division (divide vector_a by vector_b, rounding down to the nearest integer)
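
A quick sketch of each operator, using two small arrays of arbitrary values:

```python
import numpy as np

vector_a = np.array([7, 8, 9])
vector_b = np.array([2, 3, 4])

print(vector_a + vector_b)   # [ 9 11 13]
print(vector_a - vector_b)   # [5 5 5]
print(vector_a * vector_b)   # [14 24 36]
print(vector_a / vector_b)   # true division always returns floats
print(vector_a % vector_b)   # [1 2 1]
print(vector_a ** vector_b)  # [  49  512 6561]
print(vector_a // vector_b)  # [3 2 2]
```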


<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The following table lists the arithmetic operators implemented in NumPy:</p>
<table>
<thead><tr>
<th>Operator</th>
<th>Equivalent ufunc</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>+</code></td>
<td><code>np.add</code></td>
<td>Addition (e.g., <code>1 + 1 = 2</code>)</td>
</tr>
<tr>
<td><code>-</code></td>
<td><code>np.subtract</code></td>
<td>Subtraction (e.g., <code>3 - 2 = 1</code>)</td>
</tr>
<tr>
<td><code>-</code></td>
<td><code>np.negative</code></td>
<td>Unary negation (e.g., <code>-2</code>)</td>
</tr>
<tr>
<td><code>*</code></td>
<td><code>np.multiply</code></td>
<td>Multiplication (e.g., <code>2 * 3 = 6</code>)</td>
</tr>
<tr>
<td><code>/</code></td>
<td><code>np.divide</code></td>
<td>Division (e.g., <code>3 / 2 = 1.5</code>)</td>
</tr>
<tr>
<td><code>//</code></td>
<td><code>np.floor_divide</code></td>
<td>Floor division (e.g., <code>3 // 2 = 1</code>)</td>
</tr>
<tr>
<td><code>**</code></td>
<td><code>np.power</code></td>
<td>Exponentiation (e.g., <code>2 ** 3 = 8</code>)</td>
</tr>
<tr>
<td><code>%</code></td>
<td><code>np.mod</code></td>
<td>Modulus/remainder (e.g., <code>9 % 4 = 1</code>)</td>
</tr>
</tbody>
</table>
<p>Additionally there are Boolean/bitwise operators; we will explore these in <a href="02.06-boolean-arrays-and-masks.html">Comparisons, Masks, and Boolean Logic</a>.</p>

</div>
</div>

[Mathematical expressions](https://docs.scipy.org/doc/numpy-1.14.0/reference/routines.math.html#arithmetic-operations)

### Calculating Statistics For 1D ndarrays
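
ndarrays have methods for common statistics, and NumPy also offers functions such as `np.median`. For a 1D array, each one reduces all the values to a single number. A small sketch with arbitrary values:

```python
import numpy as np

values = np.array([2, 4, 6, 8])

print(values.min())       # 2
print(values.max())       # 8
print(values.sum())       # 20
print(values.mean())      # 5.0
print(np.median(values))  # 5.0
print(values.std())       # population standard deviation, about 2.236
```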



### Calculating Statistics For 2D ndarrays



<p><img alt="Array method without axis parameter" src="https://s3.amazonaws.com/dq-content/289/array_method_axis_none.svg"></p>



<p><img alt="Array method with axis=1" src="https://s3.amazonaws.com/dq-content/289/array_method_axis_1.svg"></p>



<p><img alt="Array method with axis=0" src="https://s3.amazonaws.com/dq-content/289/array_method_axis_0.svg"></p>



<p><img alt="The axis parameter" src="https://s3.amazonaws.com/dq-content/289/axis_param.svg"></p>
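
As the diagrams show, for a 2D array the `axis` parameter sets the direction of the calculation: with no `axis`, the whole array is reduced to one value; `axis=1` gives one result per row; `axis=0` gives one result per column. A sketch reusing the row-sum data from earlier:

```python
import numpy as np

my_numbers = np.array([[6, 5], [1, 3], [5, 6]])

print(my_numbers.sum())        # 26: all values
print(my_numbers.sum(axis=1))  # [11  4 11]: one sum per row
print(my_numbers.sum(axis=0))  # [12 14]: one sum per column
```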



### Adding Rows and Columns to ndarrays
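
One way to do this is `np.concatenate`, which joins arrays along an existing axis. The arrays must match in every dimension except the one being joined, and the result is a new array, leaving the originals unchanged. A minimal sketch:

```python
import numpy as np

arr = np.array([[1, 2],
                [3, 4]])

# Add a row: join along axis 0 (the new row must be 2D with 2 columns).
with_row = np.concatenate([arr, np.array([[5, 6]])], axis=0)

# Add a column: join along axis 1 (the new column must be 2D with 2 rows).
with_col = np.concatenate([arr, np.array([[9], [9]])], axis=1)

print(with_row)
print(with_col)
```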


### Sorting ndarrays
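
`np.sort` returns a sorted copy, while the `ndarray.sort` method sorts the array in place; `np.argsort` returns the indices that would sort it. For 2D arrays the default is to sort along the last axis (within each row), and `axis=0` sorts within each column. A short sketch with arbitrary values:

```python
import numpy as np

values = np.array([3, 1, 2])

sorted_copy = np.sort(values)  # returns a new, sorted array
order = np.argsort(values)     # indices that would sort the array
print(sorted_copy, order)      # [1 2 3] [1 2 0]

values.sort()                  # sorts in place, returns None
print(values)                  # [1 2 3]

grid = np.array([[3, 1], [2, 4]])
print(np.sort(grid, axis=0))   # sort within each column
```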


### Reading CSV files with NumPy
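
`np.genfromtxt` reads delimited text into an ndarray. The sketch below uses `io.StringIO` with a few made-up lines in place of a real file so that it runs on its own; with a real file you would pass its path instead. `skip_header=1` skips the line of column names, and values are read as floats by default.

```python
import io

import numpy as np

# Made-up CSV content standing in for a real file path.
csv_text = io.StringIO("a,b,c\n1,2.5,3\n4,5.0,6\n")

data = np.genfromtxt(csv_text, delimiter=",", skip_header=1)
print(data.shape)  # (2, 3)
print(data)
```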

### Boolean Arrays





Comparison operators follow the same vectorized pattern as arithmetic: the 'less than five' operation is applied to each value in the array, producing an array of Boolean values. The diagram below shows this step by step:

<p><img alt="Vectorized boolean operation" src="https://s3.amazonaws.com/dq-content/290/vectorized_bool.svg"></p>
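
A minimal sketch: comparing an array with a scalar returns a Boolean array of the same shape (the values are arbitrary).

```python
import numpy as np

values = np.array([2, 4, 6, 8])
is_small = values < 5
print(is_small)        # [ True  True False False]
print(is_small.dtype)  # bool
```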

### Boolean Indexing with 1D ndarrays




<p><img alt="Boolean indexing 1D ndarrays 1" src="https://s3.amazonaws.com/dq-content/290/1d_bool_1.svg"></p>



<p><img alt="Boolean indexing 1D ndarrays 2" src="https://s3.amazonaws.com/dq-content/290/1d_bool_2.svg"></p>
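
A short sketch of the idea in the diagrams: using a Boolean array of the same length as the index keeps only the elements where it is `True`.

```python
import numpy as np

values = np.array([2, 4, 6, 8])
mask = values < 5

print(values[mask])        # [2 4]
print(values[values > 5])  # [6 8]: the comparison can go directly inside []
```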




### Boolean Indexing with 2D ndarrays


<p><img alt="Boolean indexing 2D ndarrays" src="https://s3.amazonaws.com/dq-content/290/bool_dims.svg"></p>
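
With a 2D array, a 1D Boolean array selects along one dimension: in the row position it keeps whole rows, and in the column position it keeps whole columns. Its length must match that dimension. A sketch reusing the row-sum data from earlier:

```python
import numpy as np

my_numbers = np.array([[6, 5], [1, 3], [5, 6]])

# Keep rows whose first column is greater than 4: rows [6, 5] and [5, 6].
row_mask = my_numbers[:, 0] > 4
print(my_numbers[row_mask])

# Keep columns whose sum is greater than 12: only column 1 (sum 14).
col_mask = my_numbers.sum(axis=0) > 12
print(my_numbers[:, col_mask])
```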


### Assigning Values in ndarrays
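
Assignment works with the same kinds of index used for selection: a single element, a whole row or column, or a Boolean mask. A scalar on the right-hand side is repeated to fill the selection. A minimal sketch:

```python
import numpy as np

grid = np.array([[6, 5], [1, 3], [5, 6]])

grid[0, 0] = 0        # a single element
grid[:, 1] = 9        # every value in column 1
grid[grid == 5] = -1  # every element equal to 5
print(grid)
```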

### Subarrays as no-copy views
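
Slicing an ndarray returns a *view*: a new array object that shares the original data instead of copying it, so changing the view changes the original too. (Indexing with a list or a Boolean array, by contrast, returns a copy.) A minimal sketch that confirms the sharing with `np.shares_memory`:

```python
import numpy as np

original = np.arange(6).reshape(2, 3)   # [[0 1 2] [3 4 5]]
sub = original[:, :2]                   # a view of the first two columns

sub[0, 0] = 99                          # modify through the view
print(original[0, 0])                   # 99: the original changed too
print(np.shares_memory(original, sub))  # True
```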



### Copying Data
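
To get a subarray that can be changed without affecting the original, call its `copy()` method. A minimal sketch:

```python
import numpy as np

original = np.arange(6).reshape(2, 3)
sub_copy = original[:, :2].copy()            # independent data

sub_copy[0, 0] = 99
print(original[0, 0])                        # 0: the original is unchanged
print(np.shares_memory(original, sub_copy))  # False
```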
