# NumPy

## Understanding Data Types in Python


```C
/* C code */
int result = 0;
for(int i=0; i<100; i++){
    result += i;
}
```

The C code above declares the type of every variable explicitly. In Python, types are inferred dynamically, and the equivalent operation could be written this way:
```python
# Python code
result = 0
for i in range(100):
    result += i
```


In [1]:
x = 4

In [2]:
x = 'four'

In [3]:
x = True

```C
/* C code */
int x = 4;
x = "four";  // FAILS
```

### A Python Integer Is More Than Just an Integer

```C
struct _longobject {
    long ob_refcnt;
    PyTypeObject *ob_type;
    size_t ob_size;
    long ob_digit[1];
};
```

A single integer in Python 3.4 actually contains four pieces:
- `ob_refcnt`, a reference count that helps Python silently handle memory allocation and deallocation
- `ob_type`, which encodes the type of the variable
- `ob_size`, which specifies the size of the following data members
- `ob_digit`, which contains the actual integer value that we expect the Python variable to represent

<img src="https://jakevdp.github.io/PythonDataScienceHandbook/figures/cint_vs_pyint.png" alt="Integer Memory Layout">
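One rough way to see this overhead is to compare the size of a Python `int` object with the size of one fixed-width NumPy integer. This is a sketch that assumes CPython on a 64-bit platform; the exact byte count of a Python `int` is an implementation detail and may differ between versions.

```python
import sys

import numpy as np

# Size in bytes of a complete Python int object (header fields plus digits).
py_int_bytes = sys.getsizeof(4)

# Size in bytes of one raw 64-bit integer as stored inside a NumPy array.
np_int_bytes = np.dtype(np.int64).itemsize

print(py_int_bytes, np_int_bytes)
```

The Python object is larger because it carries the reference count, type pointer, and size field alongside the value itself.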

### A Python List Is More Than Just a List


In [5]:
L  = list(range(10))
L

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [7]:
type(L[0])

int

In [8]:
L2 = [str(c) for c in L]

In [9]:
L2

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In [10]:
type(L2[0])

str

In [11]:
L3 = [True, "2", 3.0, 4]
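To confirm that each element of such a list carries its own type information, the element types can be inspected directly. This small sketch reuses the `L3` list from the cell above; `element_types` is a name introduced here for illustration.

```python
L3 = [True, "2", 3.0, 4]

# Each list item is a full Python object with its own type.
element_types = [type(item) for item in L3]
print(element_types)
```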


<img src="https://jakevdp.github.io/PythonDataScienceHandbook/figures/array_vs_list.png" alt="Array Memory Layout">

### Fixed-Type Arrays in Python


In [12]:
import array

L = list(range(10))
A = array.array('i', L)
A

array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
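The fixed type is enforced on every insertion. As a minimal sketch (the `appended` flag is only there to make the outcome visible), appending a float to an integer-typed array is rejected:

```python
import array

A = array.array('i', range(10))

try:
    # 'i' arrays store C ints only, so a float is refused.
    A.append(2.5)
    appended = True
except TypeError:
    appended = False

print(appended, len(A))
```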

## How Vectorization Makes Code Faster



<p><img alt="Translating Python code to bytecode" src="https://s3.amazonaws.com/dq-content/289/bytecode.svg"></p>


<table>
<thead>
<tr>
<th>Language Type</th>
<th>Example</th>
<th>Time taken to write program</th>
<th>Control over program performance</th>
</tr>
</thead>
<tbody>
<tr>
<td>High-Level</td>
<td>Python</td>
<td>Low</td>
<td>Low</td>
</tr>
<tr>
<td>Low-Level</td>
<td>C</td>
<td>High</td>
<td>High</td>
</tr>
</tbody>
</table>



<p><img alt="For loop to sum rows" src="https://s3.amazonaws.com/dq-content/289/for_loop.svg"></p>

In [13]:
my_numbers = [[6,5],[1,3],[5,6]]

sums = []

for row in my_numbers:
    row_sum = row[0] + row[1]
    sums.append(row_sum)
    
print(sums)

[11, 4, 11]



<p><img alt="Unvectorized operation" src="https://s3.amazonaws.com/dq-content/289/unvectorized.svg"></p>

<p><img alt="Vectorized operation" src="https://s3.amazonaws.com/dq-content/289/vectorized.svg"></p>
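The row sums computed with the `for` loop above can also be written as a single vectorized expression. This sketch assumes the same `my_numbers` data, converted to a NumPy array:

```python
import numpy as np

my_numbers = np.array([[6, 5], [1, 3], [5, 6]])

# Add the first column to the second column for all rows at once.
sums = my_numbers[:, 0] + my_numbers[:, 1]
print(sums)
```

Here the addition is applied to whole columns, so no explicit Python loop over the rows is needed.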



## NumPy

In [14]:
import numpy as np

### NumPy ndarrays



<p><img alt="Dimensional Arrays" src="https://s3.amazonaws.com/dq-content/289/dimensional_arrays.svg"></p>



#### Create an array



In [16]:
list1 = [6,7.5,78,45,9,6,58]
arr1 = np.array(list1)
print(arr1)

[ 6.   7.5 78.  45.   9.   6.  58. ]


In [17]:
type(arr1)

numpy.ndarray

In [18]:
type(list1)

list

In [19]:
data2 = [[1,2,3,4],[5,6,7,8]]
arr2 = np.array(data2)

In [23]:
print(arr2)

[[1 2 3 4]
 [5 6 7 8]]


In [24]:
# Ones
np.ones((3,5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [26]:
# Arange
np.arange(0,20,2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [28]:
# Zeros
np.zeros((10,6))

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [29]:
# linspace
np.linspace(0,1,5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [30]:
#random.randint
np.random.randint(0, 10, (4,4))

array([[6, 4, 4, 6],
       [8, 1, 3, 4],
       [7, 4, 2, 5],
       [8, 5, 3, 1]])

In [31]:
#random.random
np.random.random((3,3))

array([[0.37913246, 0.07178432, 0.04763755],
       [0.59336497, 0.78205293, 0.82733929],
       [0.91378039, 0.51960217, 0.97452465]])

In [33]:
#eye
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [34]:
#full
np.full((5,6), 8)

array([[8, 8, 8, 8, 8, 8],
       [8, 8, 8, 8, 8, 8],
       [8, 8, 8, 8, 8, 8],
       [8, 8, 8, 8, 8, 8],
       [8, 8, 8, 8, 8, 8]])

In [44]:
#empty (allocates memory without initializing it, so the values shown are arbitrary)
np.empty((3,3))

array([[1., 0., 0.],
       [0., 1., 0.],
       [0., 0., 1.]])

#### Understanding NumPy ndarrays

In [46]:
data3 = np.random.randint(0, 10, (4,7))

In [47]:
data3

array([[5, 5, 7, 7, 9, 8, 0],
       [0, 2, 7, 6, 5, 6, 8],
       [9, 7, 2, 3, 4, 4, 3],
       [1, 3, 8, 8, 7, 3, 2]])

In [48]:
#ndim
data3.ndim

2

In [49]:
#shape
data3.shape #(rows, columns)

(4, 7)

In [51]:
#size
data3.size

28

In [52]:
data3.itemsize

8

In [53]:
data3.nbytes

224
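The attributes above are related: `nbytes` is `size` multiplied by `itemsize`. The sketch below uses a hypothetical array named `grid` with an explicit `int64` dtype so that the numbers do not depend on the platform's default integer type.

```python
import numpy as np

grid = np.zeros((4, 7), dtype=np.int64)

# 28 elements of 8 bytes each.
total_bytes = grid.size * grid.itemsize
print(grid.ndim, grid.shape, grid.size, grid.itemsize, grid.nbytes)
```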

#### Selecting and Slicing Rows and Items from ndarrays

<p><img alt="Selecting rows from a 2D ndarray" src="https://s3.amazonaws.com/dq-content/289/selection_rows.svg"></p>



This is how we select a single item from a 2D ndarray:

<p><img alt="Selecting a single item from a 2D ndarray" src="https://s3.amazonaws.com/dq-content/289/selection_item.svg"></p>


In [None]:
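# ndarray[row, column]

Each of `row` and `column` can be one of the following:

- an integer, e.g. `5`
- a slice, e.g. `0:5` or `5:`
- a bare colon `:`, which selects everything along that dimension
- a list of indices, e.g. `[1,5,9]`
- a boolean array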

In [55]:
test_arr = np.random.randint(10, size=(5,5))

In [56]:
test_arr

array([[8, 5, 4, 8, 3],
       [3, 2, 2, 4, 6],
       [4, 7, 7, 7, 2],
       [2, 8, 6, 5, 0],
       [4, 0, 4, 3, 6]])

In [61]:
# first row
first_row = test_arr[0]
first_row

array([8, 5, 4, 8, 3])

In [68]:
# last row
test_arr[-1]

array([4, 0, 4, 3, 6])

In [64]:
# rows 2 and 3
#test_arr[[1,2]]
test_arr[1:3]

array([[3, 2, 2, 4, 6],
       [4, 7, 7, 7, 2]])

In [65]:
# rows 2 and 4
test_arr[[1,3]]

array([[3, 2, 2, 4, 6],
       [2, 8, 6, 5, 0]])

In [66]:
# row 2 to the end
test_arr[1:]

array([[3, 2, 2, 4, 6],
       [4, 7, 7, 7, 2],
       [2, 8, 6, 5, 0],
       [4, 0, 4, 3, 6]])

#### Selecting Columns and Custom Slicing ndarrays

Let's continue by learning how to select one or more columns of data:

<p><img alt="Selecting columns from a 2D ndarray" src="https://s3.amazonaws.com/dq-content/289/selection_columns.svg"></p>



If we wanted to select a partial 1D slice of a row or column, we can combine a single value for one dimension with a slice for the other dimension:

<p><img alt="Selecting partial 1D slices from a 2D ndarray" src="https://s3.amazonaws.com/dq-content/289/selection_1darray.svg"></p>

Lastly, if we wanted to select a 2D slice, we can use slices for both dimensions:

<p><img alt="Selecting a 2D slice from a 2D ndarray" src="https://s3.amazonaws.com/dq-content/289/selection_2darray.svg"></p>



In [69]:
test_arr2 = np.random.randint(10, size=(5,5))

In [70]:
test_arr2

array([[3, 9, 3, 8, 9],
       [9, 1, 5, 2, 5],
       [1, 8, 4, 6, 2],
       [4, 2, 5, 2, 0],
       [1, 3, 8, 2, 8]])

In [73]:
# column 2
test_arr2[:, 1]

array([9, 1, 8, 2, 3])

In [76]:
# columns 1 and 2
test_arr2[:,:2]

array([[3, 9],
       [9, 1],
       [1, 8],
       [4, 2],
       [1, 3]])

In [77]:
# columns 2, 4 and 5
test_arr2[:,[1,3,4]]

array([[9, 8, 9],
       [1, 2, 5],
       [8, 6, 2],
       [2, 2, 0],
       [3, 2, 8]])

In [78]:
test_arr2

array([[3, 9, 3, 8, 9],
       [9, 1, 5, 2, 5],
       [1, 8, 4, 6, 2],
       [4, 2, 5, 2, 0],
       [1, 3, 8, 2, 8]])

In [82]:
# row 3, elements in columns 2 to 4
test_arr2[2, 1:4]

array([8, 4, 6])

In [83]:
test_arr2[1:4, :3]

array([[9, 1, 5],
       [1, 8, 4],
       [4, 2, 5]])
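Because `test_arr2` is random, the selections above are hard to check by eye. The following sketch repeats the same selection patterns on a hypothetical array named `grid` built with `np.arange`, where every value equals its position in the flattened array.

```python
import numpy as np

grid = np.arange(25).reshape(5, 5)

second_column = grid[:, 1]           # every row, column index 1
picked_columns = grid[:, [1, 3, 4]]  # every row, column indices 1, 3 and 4
row_slice = grid[2, 1:4]             # row index 2, column indices 1 to 3
block = grid[1:4, :3]                # row indices 1 to 3, column indices 0 to 2

print(second_column)
print(row_slice)
print(block)
```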

#### Modify values in ndarray



In [84]:
test_arr2

array([[3, 9, 3, 8, 9],
       [9, 1, 5, 2, 5],
       [1, 8, 4, 6, 2],
       [4, 2, 5, 2, 0],
       [1, 3, 8, 2, 8]])

In [85]:
test_arr2[0,0] = 125

In [86]:
test_arr2

array([[125,   9,   3,   8,   9],
       [  9,   1,   5,   2,   5],
       [  1,   8,   4,   6,   2],
       [  4,   2,   5,   2,   0],
       [  1,   3,   8,   2,   8]])

In [87]:
test_arr2[0,1] = 136.589

In [88]:
test_arr2

array([[125, 136,   3,   8,   9],
       [  9,   1,   5,   2,   5],
       [  1,   8,   4,   6,   2],
       [  4,   2,   5,   2,   0],
       [  1,   3,   8,   2,   8]])

In [None]:
# the decimal part is truncated
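The truncation happens because the array keeps its integer dtype when a value is assigned. A minimal sketch of one way to keep the fractional part, assuming it is acceptable to convert the whole array to floats first (`ints` and `as_float` are names introduced here):

```python
import numpy as np

ints = np.array([125, 9, 3])
ints[1] = 136.589              # stored as 136 because the dtype is integer

as_float = ints.astype(np.float64)  # astype returns a converted copy
as_float[1] = 136.589          # the float array keeps the decimals

print(ints, as_float)
```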

#### Datatypes

[More about data types](https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html)

[List of scalars](https://docs.scipy.org/doc/numpy/reference/arrays.scalars.html#arrays-scalars-built-in)

In [89]:
x = np.array([1,2])
print(x.dtype)

int64


In [90]:
x = np.array([1.0,2.0])
print(x.dtype)

float64


In [92]:
np.zeros(10, dtype=np.int16)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)

In [93]:
np.zeros(10, dtype='int16')

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)

<div class="text_cell_render border-box-sizing rendered_html">
<table>
<thead><tr>
<th>Data type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>bool_</code></td>
<td>Boolean (True or False) stored as a byte</td>
</tr>
<tr>
<td><code>int_</code></td>
<td>Default integer type (same as C <code>long</code>; normally either <code>int64</code> or <code>int32</code>)</td>
</tr>
<tr>
<td><code>intc</code></td>
<td>Identical to C <code>int</code> (normally <code>int32</code> or <code>int64</code>)</td>
</tr>
<tr>
<td><code>intp</code></td>
<td>Integer used for indexing (same as C <code>ssize_t</code>; normally either <code>int32</code> or <code>int64</code>)</td>
</tr>
<tr>
<td><code>int8</code></td>
<td>Byte (-128 to 127)</td>
</tr>
<tr>
<td><code>int16</code></td>
<td>Integer (-32768 to 32767)</td>
</tr>
<tr>
<td><code>int32</code></td>
<td>Integer (-2147483648 to 2147483647)</td>
</tr>
<tr>
<td><code>int64</code></td>
<td>Integer (-9223372036854775808 to 9223372036854775807)</td>
</tr>
<tr>
<td><code>uint8</code></td>
<td>Unsigned integer (0 to 255)</td>
</tr>
<tr>
<td><code>uint16</code></td>
<td>Unsigned integer (0 to 65535)</td>
</tr>
<tr>
<td><code>uint32</code></td>
<td>Unsigned integer (0 to 4294967295)</td>
</tr>
<tr>
<td><code>uint64</code></td>
<td>Unsigned integer (0 to 18446744073709551615)</td>
</tr>
<tr>
<td><code>float_</code></td>
<td>Shorthand for <code>float64</code>.</td>
</tr>
<tr>
<td><code>float16</code></td>
<td>Half precision float: sign bit, 5 bits exponent, 10 bits mantissa</td>
</tr>
<tr>
<td><code>float32</code></td>
<td>Single precision float: sign bit, 8 bits exponent, 23 bits mantissa</td>
</tr>
<tr>
<td><code>float64</code></td>
<td>Double precision float: sign bit, 11 bits exponent, 52 bits mantissa</td>
</tr>
<tr>
<td><code>complex_</code></td>
<td>Shorthand for <code>complex128</code>.</td>
</tr>
<tr>
<td><code>complex64</code></td>
<td>Complex number, represented by two 32-bit floats</td>
</tr>
<tr>
<td><code>complex128</code></td>
<td>Complex number, represented by two 64-bit floats</td>
</tr>
</tbody>
</table>

</div>
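The integer ranges in the table do not need to be memorized, since `np.iinfo` reports them for any integer dtype. A short sketch:

```python
import numpy as np

int8_info = np.iinfo(np.int8)
uint16_info = np.iinfo(np.uint16)

# Smallest and largest values each type can represent.
print(int8_info.min, int8_info.max)
print(uint16_info.min, uint16_info.max)
```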

### Computation on NumPy Arrays: Universal Functions


#### The Slowness of Loops



In [2]:
import numpy as np

In [3]:
def comute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

In [4]:
np.random.seed(0)

values = np.random.randint(1,10, size=5)

In [5]:
values

array([6, 1, 4, 4, 8])

In [6]:
comute_reciprocals(values)

array([0.16666667, 1.        , 0.25      , 0.25      , 0.125     ])

In [7]:
big_array = np.random.randint(1, 100, size=1000000)

In [8]:
%timeit comute_reciprocals(big_array)

2.55 s ± 24.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


#### Introducing UFuncs (Universal functions)

[Docs](https://docs.scipy.org/doc/numpy/reference/ufuncs.html)



In [9]:
l1 = [1,2,3]
l2 = [4,5,6]

In [10]:
l1+l2

[1, 2, 3, 4, 5, 6]

In [11]:
1/l1

TypeError: unsupported operand type(s) for /: 'int' and 'list'

In [12]:
print(comute_reciprocals(values))

[0.16666667 1.         0.25       0.25       0.125     ]


In [13]:
1.0/values

array([0.16666667, 1.        , 0.25      , 0.25      , 0.125     ])

In [14]:
%timeit (1.0/big_array)

1.24 ms ± 20.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
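The `%timeit` magic only works inside IPython. Outside a notebook, a rough comparison can be sketched with `time.perf_counter`. The function name `reciprocals_loop` and the smaller sample size are choices made for this illustration, and the measured times will vary from machine to machine.

```python
import time

import numpy as np


def reciprocals_loop(values):
    """Compute 1 / x for each element with an explicit Python loop."""
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output


rng = np.random.default_rng(0)
sample = rng.integers(1, 100, size=10_000)

start = time.perf_counter()
looped = reciprocals_loop(sample)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
vectorized = 1.0 / sample
vectorized_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.6f} s, vectorized: {vectorized_seconds:.6f} s")
```

Both approaches produce the same values; only the time taken differs.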


In [16]:
np.arange(5)

array([0, 1, 2, 3, 4])

In [17]:
np.arange(1,6)

array([1, 2, 3, 4, 5])

In [15]:
np.arange(5) / np.arange(1,6)

array([0.        , 0.5       , 0.66666667, 0.75      , 0.8       ])

In [20]:
np.arange(9)

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

In [18]:
x = np.arange(9).reshape((3,3))

In [19]:
x

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [21]:
2 ** x

array([[  1,   2,   4],
       [  8,  16,  32],
       [ 64, 128, 256]])

### Importing Real Data


In [22]:
!head data/DATA_taxi_data.csv

VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type

2,2017-01-01 00:01:15,2017-01-01 00:11:05,N,1,42,166,1,1.71,9,0,0.5,0,0,,0.3,9.8,2,1
2,2017-01-01 00:03:34,2017-01-01 00:09:00,N,1,75,74,1,1.44,6.5,0.5,0.5,0,0,,0.3,7.8,2,1
2,2017-01-01 00:04:02,2017-01-01 00:12:55,N,1,82,70,5,3.45,12,0.5,0.5,2.66,0,,0.3,15.96,1,1
2,2017-01-01 00:01:40,2017-01-01 00:14:23,N,1,255,232,1,2.11,10.5,0.5,0.5,0,0,,0.3,11.8,2,1
2,2017-01-01 00:00:51,2017-01-01 00:18:55,N,1,166,239,1,2.76,11.5,0.5,0.5,0,0,,0.3,12.8,2,1
2,2017-01-01 00:00:28,2017-01-01 00:13:31,N,1,179,226,1,4.14,15,0.5,0.5,0,0,,0.3,16.3,1,1
2,2017-01-01 00:02:39,2017-01-01 00:26:28,N,1,74,167,1,4.22,19,0.5,0.5,0,0,,0.3,20.3,2,1
2,2017-01-01 00:15:21,2017-01-01 00:28:06,N,1,112,37,1,2.83,11,0.5,0.5,0,0,,0.3,12.3,2,1


In [23]:
import csv

In [24]:
with open('data/DATA_taxi_data.csv', 'r') as f:
    taxi_list = list(csv.reader(f))

In [27]:
taxi_list = taxi_list[2:]

In [29]:
#taxi_list[:3]

In [30]:
taxi_list[2]

['2',
 '2017-01-01 00:04:02',
 '2017-01-01 00:12:55',
 'N',
 '1',
 '82',
 '70',
 '5',
 '3.45',
 '12',
 '0.5',
 '0.5',
 '2.66',
 '0',
 '',
 '0.3',
 '15.96',
 '1',
 '1']

In [32]:
converted_taxi_list = []

for row in taxi_list:
    converted_row = []
    for item in row:
        try:
            converted_row.append(float(item))
        except ValueError:
            # skip values that are not numeric (dates, flags, empty strings)
            continue
    converted_taxi_list.append(converted_row)        

In [35]:
#converted_taxi_list[:2]

In [36]:
taxi = np.array(converted_taxi_list)

In [37]:
taxi

array([[  2.  ,   1.  ,  42.  , ...,   9.8 ,   2.  ,   1.  ],
       [  2.  ,   1.  ,  75.  , ...,   7.8 ,   2.  ,   1.  ],
       [  2.  ,   1.  ,  82.  , ...,  15.96,   1.  ,   1.  ],
       ...,
       [  1.  ,   1.  , 228.  , ...,  80.3 ,   1.  ,   1.  ],
       [  1.  ,   1.  ,   7.  , ...,  17.3 ,   1.  ,   1.  ],
       [  1.  ,   1.  , 255.  , ...,  12.8 ,   1.  ,   1.  ]])

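The non-numeric columns (the two datetime columns, `store_and_fwd_flag`, and the empty `ehail_fee`) were dropped during conversion, so the columns of `taxi` are indexed as follows:

- Column 0 is VendorID
- Column 1 is RatecodeID
- Column 2 is PULocationID
- Column 3 is DOLocationID
- Column 4 is passenger_count
- Column 5 is trip_distance
- Column 6 is fare_amount
- Column 7 is extra
- Column 8 is mta_tax
- Column 9 is tip_amount
- Column 10 is tolls_amount
- Column 11 is improvement_surcharge
- Column 12 is total_amount
- Column 13 is payment_type
- Column 14 is trip_type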
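The column positions can be checked rather than counted by hand. The sketch below applies the same "keep only what converts to `float`" rule to the header and first data row shown earlier; `kept_names` is a name introduced here for illustration.

```python
import csv
import io

header_line = (
    "VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,"
    "RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,"
    "fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,"
    "improvement_surcharge,total_amount,payment_type,trip_type"
)
data_line = (
    "2,2017-01-01 00:01:15,2017-01-01 00:11:05,N,1,42,166,1,1.71,9,0,0.5,"
    "0,0,,0.3,9.8,2,1"
)

header, row = csv.reader(io.StringIO(header_line + "\n" + data_line + "\n"))

# Keep the name of every column whose value converts to a float.
kept_names = []
for name, item in zip(header, row):
    try:
        float(item)
    except ValueError:
        continue
    kept_names.append(name)

for index, name in enumerate(kept_names):
    print(index, name)
```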

### Vector Math



In [38]:
x = np.array([1,2,3])
y = np.array([4,5,6])

In [39]:
x+y

array([5, 7, 9])

In [40]:
x-y

array([-3, -3, -3])

In [41]:
x*y

array([ 4, 10, 18])

In [42]:
x/y

array([0.25, 0.4 , 0.5 ])

In [43]:
x**2

array([1, 4, 9])

In [44]:
x.dot(y)

32

In [45]:
z = np.array([y, y**2])

In [46]:
z

array([[ 4,  5,  6],
       [16, 25, 36]])

In [47]:
z.shape

(2, 3)

In [51]:
a = z.T

In [53]:
a

array([[ 4, 16],
       [ 5, 25],
       [ 6, 36]])

In [52]:
a.shape

(3, 2)

In [54]:
col1 = taxi[:,6]
col2 = taxi[:,8]
col3 = taxi[:,11]

In [55]:
sums = col1 + col2 + col3 

In [56]:
sums

array([ 9.8,  7.3, 12.8, ..., 61.8, 15.3, 10.8])

In [57]:
# the shapes must match!
np.arange(3) + np.arange(4)

ValueError: operands could not be broadcast together with shapes (3,) (4,) 


Here's what happened behind the scenes:

<p><img alt="Vectorized Addition" src="https://s3.amazonaws.com/dq-content/289/vectorized_addition.svg"></p>


- `vector_a + vector_b` - Addition
- `vector_a - vector_b` - Subtraction
- `vector_a * vector_b` - Multiplication (this is unrelated to the vector multiplication used in linear algebra).
- `vector_a / vector_b` - Division
- `vector_a % vector_b` - Modulus (find the remainder when vector_a is divided by vector_b)
- `vector_a ** vector_b` - Exponent (raise vector_a to the power of vector_b)
- `vector_a // vector_b` - Floor Division (divide vector_a by vector_b, rounding down to the nearest integer)
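The less familiar operators in the list above can be tried on two small vectors. The values below are arbitrary and chosen only so that the results are easy to verify by hand.

```python
import numpy as np

vector_a = np.array([7, 8, 9])
vector_b = np.array([2, 3, 4])

remainder = vector_a % vector_b   # element-wise remainder
floored = vector_a // vector_b    # element-wise floor division
powered = vector_a ** vector_b    # element-wise exponentiation

print(remainder, floored, powered)
```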

In [58]:
taxi.shape

(19998, 15)

In [59]:
trip_distance = taxi[:, 5]
trip_price = taxi[:,12]

In [60]:
price_per_mile = trip_price/trip_distance


In [63]:
price_per_mile

array([5.73099415, 5.41666667, 4.62608696, ..., 3.70046083, 4.11904762,
       5.12      ])

In [64]:
np.array([1,2,3])/0


array([inf, inf, inf])
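Dividing by zero yields `inf` (or `nan` for `0/0`) and emits a runtime warning. One possible way to avoid both, sketched here with made-up prices and distances, is to pass a `where` mask to `np.divide` so that the division is skipped wherever the divisor is zero:

```python
import numpy as np

price = np.array([9.8, 7.8, 12.0])
distance = np.array([1.71, 0.0, 3.0])

# Start from NaN so that skipped positions are clearly marked as missing.
per_mile = np.full_like(price, np.nan)
np.divide(price, distance, out=per_mile, where=distance != 0)

print(per_mile)
```

Whether `nan` is the right placeholder depends on the analysis; it is used here only because it is easy to filter out later with `np.isnan`.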

In [65]:
# same as price_per_mile = trip_price/trip_distance
price_per_mile2 = np.divide(trip_price, trip_distance)



<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The following table lists the arithmetic operators implemented in NumPy:</p>
<table>
<thead><tr>
<th>Operator</th>
<th>Equivalent ufunc</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>+</code></td>
<td><code>np.add</code></td>
<td>Addition (e.g., <code>1 + 1 = 2</code>)</td>
</tr>
<tr>
<td><code>-</code></td>
<td><code>np.subtract</code></td>
<td>Subtraction (e.g., <code>3 - 2 = 1</code>)</td>
</tr>
<tr>
<td><code>-</code></td>
<td><code>np.negative</code></td>
<td>Unary negation (e.g., <code>-2</code>)</td>
</tr>
<tr>
<td><code>*</code></td>
<td><code>np.multiply</code></td>
<td>Multiplication (e.g., <code>2 * 3 = 6</code>)</td>
</tr>
<tr>
<td><code>/</code></td>
<td><code>np.divide</code></td>
<td>Division (e.g., <code>3 / 2 = 1.5</code>)</td>
</tr>
<tr>
<td><code>//</code></td>
<td><code>np.floor_divide</code></td>
<td>Floor division (e.g., <code>3 // 2 = 1</code>)</td>
</tr>
<tr>
<td><code>**</code></td>
<td><code>np.power</code></td>
<td>Exponentiation (e.g., <code>2 ** 3 = 8</code>)</td>
</tr>
<tr>
<td><code>%</code></td>
<td><code>np.mod</code></td>
<td>Modulus/remainder (e.g., <code>9 % 4 = 1</code>)</td>
</tr>
</tbody>
</table>
<p>Additionally there are Boolean/bitwise operators; we will explore these in <a href="02.06-boolean-arrays-and-masks.html">Comparisons, Masks, and Boolean Logic</a>.</p>

</div>
</div>

[Mathematical expressions](https://docs.scipy.org/doc/numpy-1.14.0/reference/routines.math.html#arithmetic-operations)

### Calculating Statistics For 1D ndarrays



In [66]:
price_min = taxi[:,12].min()

In [67]:
price_min

-60.0

In [68]:
price_max = taxi[:,12].max()

In [69]:
price_max

240.0

In [70]:
# function
np.max(taxi[:,12])

240.0

In [71]:
# method
taxi[:,12].max()

240.0

In [73]:
taxi[:,12].mean()

15.860465046504654

### Calculating Statistics For 2D ndarrays



<p><img alt="Array method without axis parameter" src="https://s3.amazonaws.com/dq-content/289/array_method_axis_none.svg"></p>



<p><img alt="Array method with axis 1" src="https://s3.amazonaws.com/dq-content/289/array_method_axis_1.svg"></p>



<p><img alt="Array method with axis 0" src="https://s3.amazonaws.com/dq-content/289/array_method_axis_0.svg"></p>



<p><img alt="The axis parameter" src="https://s3.amazonaws.com/dq-content/289/axis_param.svg"></p>



In [74]:
taxi_first_20 = taxi[:20]

In [75]:
cene_skupaj = taxi_first_20[:, 6:12]

In [76]:
cene_skupaj_preracunano = taxi_first_20[:, 12]

In [77]:
cena_sestevek = cene_skupaj.sum(axis=1)

In [80]:
cena_sestevek.round()

array([10.,  8., 16., 12., 13., 16., 20., 12.,  6., 15., 12., 23., 31.,
        8., 26., 11.,  3., 12., 22., 12.])

In [81]:
cene_skupaj_preracunano.round()

array([10.,  8., 16., 12., 13., 16., 20., 12.,  6., 15., 12., 23., 31.,
        8., 26., 11.,  3., 12., 22., 12.])
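The effect of the `axis` parameter is easiest to see on a tiny array. In this sketch, `table` is a hypothetical 2x3 array:

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

total = table.sum()              # no axis: one value for the whole array
column_sums = table.sum(axis=0)  # collapse the rows: one value per column
row_sums = table.sum(axis=1)     # collapse the columns: one value per row

print(total, column_sums, row_sums)
```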

### Adding Rows and Columns to ndarrays


In [82]:
ones = np.ones((2,3))

In [83]:
ones

array([[1., 1., 1.],
       [1., 1., 1.]])

In [84]:
zeros = np.zeros(3)

In [85]:
zeros

array([0., 0., 0.])

In [86]:
combined = np.concatenate([ones, zeros], axis=0)

ValueError: all the input arrays must have same number of dimensions

In [87]:
ones.shape

(2, 3)

In [88]:
zeros.shape

(3,)

In [89]:
zeros_2d = np.expand_dims(zeros, axis=0)

In [91]:
zeros

array([0., 0., 0.])

In [90]:
zeros_2d

array([[0., 0., 0.]])

In [92]:
zeros_2d.shape

(1, 3)

In [94]:
combined = np.concatenate([ones, zeros_2d], axis=0)

In [95]:
combined

array([[1., 1., 1.],
       [1., 1., 1.],
       [0., 0., 0.]])

In [96]:
price_per_mile.shape

(19998,)

In [99]:
price_per_mile

array([5.73099415, 5.41666667, 4.62608696, ..., 3.70046083, 4.11904762,
       5.12      ])

In [97]:
price_per_mile_2d = np.expand_dims(price_per_mile, axis=1)

In [98]:
price_per_mile_2d

array([[5.73099415],
       [5.41666667],
       [4.62608696],
       ...,
       [3.70046083],
       [4.11904762],
       [5.12      ]])

In [100]:
price_per_mile_2d.shape

(19998, 1)

In [101]:
taxi = np.concatenate([taxi, price_per_mile_2d], axis=1)

In [102]:
taxi[:,-1]

array([5.73099415, 5.41666667, 4.62608696, ..., 3.70046083, 4.11904762,
       5.12      ])
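`np.expand_dims` followed by `np.concatenate` is one of several ways to add a row or column. As a sketch of some alternatives that NumPy also provides, `np.newaxis` adds a dimension through indexing, while `np.vstack` and `np.column_stack` accept 1D inputs directly:

```python
import numpy as np

ones = np.ones((2, 3))
zeros = np.zeros(3)

# Indexing with np.newaxis gives the same (1, 3) shape as expand_dims(axis=0).
zeros_2d = zeros[np.newaxis, :]

# vstack treats the 1D array as a new row.
row_added = np.vstack([ones, zeros])

# column_stack treats a 1D array as a new column.
col_added = np.column_stack([ones, np.zeros(2)])

print(zeros_2d.shape, row_added.shape, col_added.shape)
```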

### Sorting ndarrays


In [103]:
sadje = np.array(['pomaranca', 'banana', 'jabolka', 'grozdje', 'cesnja'])

In [104]:
sadje

array(['pomaranca', 'banana', 'jabolka', 'grozdje', 'cesnja'], dtype='<U9')

In [105]:
sadje[2]

'jabolka'

In [107]:
print(sadje[[2,1]])

['jabolka' 'banana']


In [108]:
sorted_order = np.argsort(sadje)

In [109]:
sorted_order

array([1, 4, 3, 2, 0])

In [110]:
sortirano_sadje = sadje[sorted_order]

In [112]:
print(sortirano_sadje)

['banana' 'cesnja' 'grozdje' 'jabolka' 'pomaranca']


In [113]:
#2d array

In [115]:
int_square = np.random.randint(10, size=(5,5))
int_square

array([[0, 3, 1, 8, 6],
       [9, 9, 7, 6, 0],
       [9, 1, 1, 3, 9],
       [2, 6, 1, 1, 7],
       [8, 9, 5, 9, 6]])

Desired output (rows sorted by the last column; rows that tie on that value may appear in either order):

```
[9, 9, 7, 6, 0]
[8, 9, 5, 9, 6]
[0, 3, 1, 8, 6]
[2, 6, 1, 1, 7]
[9, 1, 1, 3, 9]
```

In [116]:
last_column = int_square[:,4]
last_column

array([6, 0, 9, 7, 6])

In [117]:
sorted_order = np.argsort(last_column)

In [118]:
sorted_order

array([1, 0, 4, 3, 2])

In [119]:
last_column_sorted = last_column[sorted_order]

In [120]:
last_column_sorted

array([0, 6, 6, 7, 9])

In [121]:
int_square_sorted = int_square[sorted_order]

In [124]:
# all in one line
int_square_sorted = int_square[np.argsort(int_square[:,4])]

In [125]:
int_square_sorted

array([[9, 9, 7, 6, 0],
       [0, 3, 1, 8, 6],
       [8, 9, 5, 9, 6],
       [2, 6, 1, 1, 7],
       [9, 1, 1, 3, 9]])

In [123]:
int_square

array([[0, 3, 1, 8, 6],
       [9, 9, 7, 6, 0],
       [9, 1, 1, 3, 9],
       [2, 6, 1, 1, 7],
       [8, 9, 5, 9, 6]])

In [122]:
int_square_sorted

array([[9, 9, 7, 6, 0],
       [0, 3, 1, 8, 6],
       [8, 9, 5, 9, 6],
       [2, 6, 1, 1, 7],
       [9, 1, 1, 3, 9]])

In [126]:
# taxi example
sorted_order = np.argsort(taxi[:,-1])

In [128]:
taxi_sorted = taxi[sorted_order]

In [129]:
taxi_sorted[:20, -1]

array([         -inf,          -inf,          -inf,          -inf,
                -inf,          -inf,          -inf,          -inf,
                -inf,          -inf,          -inf, -693.33333333,
       -380.        , -380.        , -380.        , -192.5       ,
       -190.        , -138.94736842, -126.66666667,  -95.        ])
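The infinite values sort to the ends of the array and crowd out the meaningful results. A minimal sketch of one way to handle this, using made-up values, filters with `np.isfinite` before sorting and reverses the `argsort` result to get descending order:

```python
import numpy as np

values = np.array([3.0, -np.inf, 1.5, np.inf, 2.0])

# Keep only entries that are neither infinite nor NaN.
finite = values[np.isfinite(values)]

# argsort is ascending, so reversing it gives descending order.
descending = finite[np.argsort(finite)[::-1]]

print(descending)
```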

### Reading CSV files with NumPy

In [130]:
!head data/DATA_taxi_data.csv

VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type

2,2017-01-01 00:01:15,2017-01-01 00:11:05,N,1,42,166,1,1.71,9,0,0.5,0,0,,0.3,9.8,2,1
2,2017-01-01 00:03:34,2017-01-01 00:09:00,N,1,75,74,1,1.44,6.5,0.5,0.5,0,0,,0.3,7.8,2,1
2,2017-01-01 00:04:02,2017-01-01 00:12:55,N,1,82,70,5,3.45,12,0.5,0.5,2.66,0,,0.3,15.96,1,1
2,2017-01-01 00:01:40,2017-01-01 00:14:23,N,1,255,232,1,2.11,10.5,0.5,0.5,0,0,,0.3,11.8,2,1
2,2017-01-01 00:00:51,2017-01-01 00:18:55,N,1,166,239,1,2.76,11.5,0.5,0.5,0,0,,0.3,12.8,2,1
2,2017-01-01 00:00:28,2017-01-01 00:13:31,N,1,179,226,1,4.14,15,0.5,0.5,0,0,,0.3,16.3,1,1
2,2017-01-01 00:02:39,2017-01-01 00:26:28,N,1,74,167,1,4.22,19,0.5,0.5,0,0,,0.3,20.3,2,1
2,2017-01-01 00:15:21,2017-01-01 00:28:06,N,1,112,37,1,2.83,11,0.5,0.5,0,0,,0.3,12.3,2,1


In [131]:
taxi = np.genfromtxt('data/DATA_taxi_data.csv', 
                    delimiter=',',
                    skip_header=2)

In [132]:
taxi

array([[ 2.  ,   nan,   nan, ...,  9.8 ,  2.  ,  1.  ],
       [ 2.  ,   nan,   nan, ...,  7.8 ,  2.  ,  1.  ],
       [ 2.  ,   nan,   nan, ..., 15.96,  1.  ,  1.  ],
       ...,
       [ 1.  ,   nan,   nan, ..., 80.3 ,  1.  ,  1.  ],
       [ 1.  ,   nan,   nan, ..., 17.3 ,  1.  ,  1.  ],
       [ 1.  ,   nan,   nan, ..., 12.8 ,  1.  ,  1.  ]])

In [134]:
taxi.shape

(19998, 19)
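Unlike the manual conversion earlier, `np.genfromtxt` keeps all 19 columns and fills the ones it cannot parse as numbers with `nan`. The sketch below reproduces this on a tiny in-memory CSV (a reduced set of columns taken from the file header) and then drops the columns that are entirely `nan`; `numeric_cols` and `clean` are names introduced for this illustration.

```python
import io

import numpy as np

text = (
    "VendorID,lpep_pickup_datetime,store_and_fwd_flag,total_amount\n"
    "2,2017-01-01 00:01:15,N,9.8\n"
    "1,2017-01-01 00:03:34,N,7.8\n"
)

data = np.genfromtxt(io.StringIO(text), delimiter=',', skip_header=1)

# A column is numeric if it is not NaN in every row.
numeric_cols = ~np.isnan(data).all(axis=0)
clean = data[:, numeric_cols]

print(data.shape, clean.shape)
print(clean)
```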

### Boolean Arrays





In [135]:
False

False

In [136]:
True

True

In [137]:
3 < 10

True

In [138]:
np.array([1,2,3,4]) + 10

array([11, 12, 13, 14])

In [139]:
np.array([2,4,6,8]) < 5

array([ True,  True, False, False])

A similar pattern occurs: the 'less than five' operation is applied to each value in the array. The diagram below shows this step by step:

<p><img alt="Vectorized boolean operation" src="https://s3.amazonaws.com/dq-content/290/vectorized_bool.svg"></p>

In [140]:
a = np.array([1,2,3,4,5])
a_bool = a<3
a_bool

array([ True,  True, False, False, False])

In [141]:
b = np.array(['blue', 'red', 'black'])
b_bool = b == 'red'

b_bool

array([False,  True, False])

### Boolean Indexing with 1D ndarrays




<p><img alt="Boolean indexing 1D ndarrays 1" src="https://s3.amazonaws.com/dq-content/290/1d_bool_1.svg"></p>



<p><img alt="Boolean indexing 1D ndarrays 2" src="https://s3.amazonaws.com/dq-content/290/1d_bool_2.svg"></p>




In [142]:
passenger_count = taxi[:,7]

In [143]:
passenger_count

array([1., 1., 5., ..., 2., 1., 1.])

In [148]:
passenger_count.shape

(19998,)

In [163]:
three_pass_bool = passenger_count == 3

In [164]:
three_passengers = passenger_count[three_pass_bool]

In [167]:
#three_passengers

In [166]:
three_passengers.shape

(450,)
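When only the number of matches is needed, the boolean array can be summed directly, since `True` counts as 1. Conditions can also be combined with `&`. The sketch below uses a small made-up array named `counts` in place of the taxi column.

```python
import numpy as np

counts = np.array([1., 1., 5., 2., 1., 3., 3.])

is_three = counts == 3
n_three = is_three.sum()             # number of True values

# Parentheses are required around each comparison when combining with &.
two_or_three = counts[(counts >= 2) & (counts <= 3)]

print(n_three, two_or_three)
```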

### Boolean Indexing with 2D ndarrays


<p><img alt="Boolean indexing 1D ndarrays 2" src="https://s3.amazonaws.com/dq-content/290/bool_dims.svg"></p>


In [168]:
arr = np.random.randint(10, size=(4,3))

In [169]:
arr

array([[5, 3, 6],
       [8, 0, 7],
       [0, 6, 5],
       [5, 8, 9]])

In [172]:
bool_1 = [True, False, True, True]
bool_2 = [True, False, True]

In [173]:
arr[:,bool_2]

array([[5, 6],
       [8, 7],
       [0, 5],
       [5, 9]])

In [171]:
arr[:,bool_1]

IndexError: boolean index did not match indexed array along dimension 1; dimension is 3 but corresponding boolean dimension is 4
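The boolean array must have the same length as the dimension it indexes: four values for the rows of this array, three for its columns. In practice the mask is usually computed from the data itself. The sketch below reuses the values of `arr` shown above and selects rows based on a condition on one column.

```python
import numpy as np

arr = np.array([[5, 3, 6],
                [8, 0, 7],
                [0, 6, 5],
                [5, 8, 9]])

bool_1 = np.array([True, False, True, True])

rows_by_mask = arr[bool_1]           # length-4 mask applied to the 4 rows
rows_by_value = arr[arr[:, 1] > 5]   # rows whose second column exceeds 5

print(rows_by_mask)
print(rows_by_value)
```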

### Assigning Values in ndarrays

In [184]:
b = np.array(['blue', 'red', 'black'])

In [185]:
b

array(['blue', 'red', 'black'], dtype='<U5')

In [186]:
b[1] = 'orange'

In [183]:
b

array(['blue', 'orang', 'black'], dtype='<U5')

In [187]:
b[1:] = 'pink'

In [188]:
b

array(['blue', 'pink', 'pink'], dtype='<U5')
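The `<U5` dtype means each element holds at most five characters, so a longer string is silently truncated on assignment. A minimal sketch of the effect, and of one possible workaround that declares a wider dtype up front (the width of 10 is an arbitrary choice):

```python
import numpy as np

colors = np.array(['blue', 'red', 'black'])   # width inferred as 5 characters
colors[1] = 'orange'                          # stored as 'orang'

wide = np.array(['blue', 'red', 'black'], dtype='<U10')
wide[1] = 'orange'                            # fits within 10 characters

print(colors, wide)
```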

In [189]:
ones = np.ones((3,5))

In [190]:
ones

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [191]:
ones[1,2] = 99

In [192]:
ones

array([[ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1., 99.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.]])

In [193]:
ones[0] = 42

In [194]:
ones

array([[42., 42., 42., 42., 42.],
       [ 1.,  1., 99.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.]])
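Assignment also works with whole columns and with boolean masks. The sketch below uses a hypothetical array named `grid` so that it does not modify `ones` from the cells above.

```python
import numpy as np

grid = np.ones((3, 5))
grid[1, 2] = 99      # single item
grid[:, 0] = 7       # entire first column
grid[grid > 50] = 0  # every item matching a condition

print(grid)
```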

### Subarrays as no-copy views



In [197]:
r = np.ones((4,4))

In [198]:
r

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [199]:
r2 = r[:2,:2]

In [200]:
r2

array([[1., 1.],
       [1., 1.]])

In [201]:
r2[:] = 0

In [202]:
r2

array([[0., 0.],
       [0., 0.]])

In [203]:
r

array([[0., 0., 1., 1.],
       [0., 0., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

### Copying Data


In [204]:
r3 = np.random.randint(10, size=(5,5))

In [205]:
r3

array([[3, 5, 7, 3, 4],
       [8, 3, 5, 7, 0],
       [5, 8, 3, 2, 7],
       [0, 2, 8, 0, 6],
       [2, 1, 6, 3, 1]])

In [206]:
r3_sub_copy = r3[:2,:2].copy()

In [207]:
r3_sub_copy

array([[3, 5],
       [8, 3]])

In [208]:
r3_sub_copy[:] = 0

In [209]:
r3_sub_copy

array([[0, 0],
       [0, 0]])

In [210]:
r3

array([[3, 5, 7, 3, 4],
       [8, 3, 5, 7, 0],
       [5, 8, 3, 2, 7],
       [0, 2, 8, 0, 6],
       [2, 1, 6, 3, 1]])
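Whether two arrays refer to the same underlying data can be checked with `np.shares_memory`. This sketch contrasts a slice, which is a view, with an explicit copy; `base`, `view`, and `copied` are names introduced here.

```python
import numpy as np

base = np.ones((4, 4))
view = base[:2, :2]            # slicing returns a view of the same data
copied = base[:2, :2].copy()   # copy() allocates independent data

view_shares = np.shares_memory(base, view)
copy_shares = np.shares_memory(base, copied)

copied[:] = 0                  # leaves base untouched
view[:] = 5                    # writes through to base

print(view_shares, copy_shares)
print(base)
```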