# NumPy

## Understanding Data Types in Python


```C
/* C code */
int result = 0;
for(int i=0; i<100; i++){
    result += i;
}
```

While in Python the equivalent operation could be written this way:
```python
# Python code
result = 0
for i in range(100):
    result += i
```


In [None]:
x = 4

In [None]:
x = 'four'

In [None]:
x = True

```C
/* C code */
int x = 4;
x = "four";  // FAILS
```

### A Python Integer Is More Than Just an Integer

```C
struct _longobject {
    long ob_refcnt;
    PyTypeObject *ob_type;
    size_t ob_size;
    long ob_digit[1];
};
```

A single integer in Python 3.4 actually contains four pieces:
- ob_refcnt, a reference count that helps Python silently handle memory allocation and deallocation
- ob_type, which encodes the type of the variable
- ob_size, which specifies the size of the following data members
- ob_digit, which contains the actual integer value that we expect the Python variable to represent.

<img src="https://jakevdp.github.io/PythonDataScienceHandbook/figures/cint_vs_pyint.png" alt="Integer Memory Layout">
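
The overhead described above can be observed directly. The following is a minimal sketch, assuming CPython on a 64-bit platform; the exact byte counts are implementation details and may differ on other interpreters or versions.

```python
import sys

import numpy as np

# Size in bytes of a complete Python integer object (header fields plus digits).
py_int_size = sys.getsizeof(1)

# Size in bytes of one fixed-width integer stored inside a NumPy array.
np_int_size = np.dtype(np.int64).itemsize

print(py_int_size, np_int_size)
```

The Python object is larger because it carries the reference count, type pointer, and size field in addition to the value itself.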

### A Python List Is More Than Just a List


In [2]:
L = list(range(10))
L

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [3]:
type(L[0])

int

In [4]:
L2 = [str(c) for c in L]
L2

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In [5]:
type(L2[0])

str

In [7]:
L3 = [True]


<img src="https://jakevdp.github.io/PythonDataScienceHandbook/figures/array_vs_list.png" alt="Array Memory Layout">

### Fixed-Type Arrays in Python


In [9]:
import array

L = list(range(10))
A = array.array('i', L)
A

array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
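
Unlike a list, an `array.array` only accepts values of its declared type. A small sketch of what happens when a float is appended to an integer array (the variable `rejected` is introduced here purely for illustration):

```python
import array

A = array.array('i', range(10))  # 'i' is the type code for a signed C int

try:
    A.append(1.5)  # a float does not fit the declared integer type
    rejected = False
except TypeError:
    rejected = True

print(rejected, len(A))
```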

## How Vectorization Makes Code Faster



<p><img alt="Translating Python code to bytecode" src="https://s3.amazonaws.com/dq-content/289/bytecode.svg"></p>


<table>
<thead>
<tr>
<th>Language Type</th>
<th>Example</th>
<th>Time taken to write program</th>
<th>Control over program performance</th>
</tr>
</thead>
<tbody>
<tr>
<td>High-Level</td>
<td>Python</td>
<td>Low</td>
<td>Low</td>
</tr>
<tr>
<td>Low-Level</td>
<td>C</td>
<td>High</td>
<td>High</td>
</tr>
</tbody>
</table>



<p><img alt="For loop to sum rows" src="https://s3.amazonaws.com/dq-content/289/for_loop.svg"></p>

In [10]:
my_numbers = [[6,5],[1,3],[5,6]]
sums = []

for row in my_numbers:
    row_sum = row[0] + row[1]
    sums.append(row_sum)
    
print(sums)
    


[11, 4, 11]



<p><img alt="Unvectorized operation" src="https://s3.amazonaws.com/dq-content/289/unvectorized.svg"></p>

<p><img alt="Vectorized operation" src="https://s3.amazonaws.com/dq-content/289/vectorized.svg"></p>
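
As a rough sketch of the vectorized approach, the row sums computed by the loop earlier can be obtained without an explicit Python loop. This assumes the same `my_numbers` values, stored in a NumPy array instead of a list of lists.

```python
import numpy as np

my_numbers = np.array([[6, 5], [1, 3], [5, 6]])

# Add the first column to the second column in one vectorized operation.
sums = my_numbers[:, 0] + my_numbers[:, 1]

print(sums)
```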



## NumPy

In [11]:
import numpy as np

### NumPy ndarrays



<p><img alt="Dimensional Arrays" src="https://s3.amazonaws.com/dq-content/289/dimensional_arrays.svg"></p>



#### Create an array



In [12]:
list1 = [6,7.5,78,45,9,6,58] # all elements are converted to one common data type
arr1 = np.array(list1)
print(arr1)

[ 6.   7.5 78.  45.   9.   6.  58. ]


In [13]:
type(arr1)

numpy.ndarray

In [14]:
type(list1)

list

In [16]:
data2 = [[1,2,3,4],[5,6,7,8]]
arr2 = np.array(data2)
print(arr2)

[[1 2 3 4]
 [5 6 7 8]]


In [17]:
print(data2)

[[1, 2, 3, 4], [5, 6, 7, 8]]


In [18]:
# ones: an array filled with 1s
np.ones((3,5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [19]:
# arange: similar to Python's built-in range
np.arange(0,20,2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [29]:
# zeros: float type by default
np.zeros(10)

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [20]:
np.zeros((10,6))

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [21]:
# linspace: 5 evenly spaced points on the interval 0 to 1, both endpoints included
np.linspace(0,1,5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])
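
To make the spacing rule concrete, here is a short sketch. With both endpoints included, the step is `(stop - start) / (num - 1)`; the `endpoint=False` variant is shown only for comparison and is not used elsewhere in these notes.

```python
import numpy as np

points = np.linspace(0, 1, 5)                      # both endpoints included
step = points[1] - points[0]                       # (1 - 0) / (5 - 1)
half_open = np.linspace(0, 1, 5, endpoint=False)   # stop value excluded

print(step)
print(half_open)
```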

In [22]:
np.random.randint(0,10,(4,4)) # random integers from 0 to 9 (10 is excluded), 4x4 matrix

array([[3, 6, 0, 4],
       [2, 1, 8, 1],
       [4, 7, 1, 5],
       [1, 1, 2, 4]])

In [23]:
# random.random: uniformly distributed floats in [0, 1)
np.random.random((3,3))

array([[0.33852159, 0.26471471, 0.68072035],
       [0.0394094 , 0.87350926, 0.7699223 ],
       [0.867346  , 0.93337412, 0.27522545]])

In [24]:
# eye: identity matrix (1s on the diagonal)
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [25]:
# full: an array of the given shape filled with the constant 8
np.full((5,6), 8)

array([[8, 8, 8, 8, 8, 8],
       [8, 8, 8, 8, 8, 8],
       [8, 8, 8, 8, 8, 8],
       [8, 8, 8, 8, 8, 8],
       [8, 8, 8, 8, 8, 8]])

In [38]:
# empty: an uninitialized array, so the values shown are arbitrary
np.empty(6)

array([3.31033942e-033, 4.57664832e-071, 4.26205720e-086, 3.35758758e-143,
       6.01433264e+175, 6.93885958e+218])

In [26]:
np.empty((3,3)) # not initialized: holds whatever was previously stored in that memory

array([[0.33852159, 0.26471471, 0.68072035],
       [0.0394094 , 0.87350926, 0.7699223 ],
       [0.867346  , 0.93337412, 0.27522545]])

#### Understanding NumPy ndarrays

In [27]:
data3 = np.random.randint(0,10,(4,7))
data3

array([[6, 5, 2, 9, 0, 8, 0],
       [0, 3, 9, 7, 6, 4, 0],
       [5, 2, 3, 3, 7, 4, 9],
       [0, 7, 5, 4, 8, 8, 6]])

In [28]:
# ndim: number of dimensions
data3.ndim

2

In [29]:
# shape: size of the array along each dimension
data3.shape

(4, 7)

In [30]:
# size: total number of elements
data3.size

28

In [31]:
data3.itemsize # size of one array element in bytes

8

In [32]:
data3.nbytes # total bytes: itemsize * size = 8 * 28

224
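
The item size follows from the array's data type, so the same shape can occupy very different amounts of memory. A brief sketch with explicitly chosen types (the default integer type produced by `np.random.randint` may differ between platforms, so the types are fixed here):

```python
import numpy as np

a64 = np.zeros((4, 7), dtype=np.int64)
a16 = np.zeros((4, 7), dtype=np.int16)

print(a64.itemsize, a64.nbytes)   # 8 bytes per element
print(a16.itemsize, a16.nbytes)   # 2 bytes per element
```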

#### Selecting and Slicing Rows and Items from ndarrays

<p><img alt="Selecting rows from a 2D ndarray" src="https://s3.amazonaws.com/dq-content/289/selection_rows.svg"></p>



This is how we select a single item from a 2D ndarray:

<p><img alt="Selecting a single item from a 2D ndarray" src="https://s3.amazonaws.com/dq-content/289/selection_item.svg"></p>


In [None]:
# ndarray[row, column]

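Each of `row` and `column` can be one of the following:

- an integer, e.g. `5`
- a slice, e.g. `0:5` or `5:`
- a bare colon `:`, which selects everything along that dimension
- a list of indices, e.g. `[1,5,9]`
- a boolean array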

In [33]:
test_array = np.random.randint(10, size=(5,5))
test_array

array([[7, 6, 5, 2, 8],
       [4, 2, 2, 9, 7],
       [4, 5, 4, 5, 8],
       [4, 4, 9, 9, 0],
       [5, 6, 8, 3, 3]])

In [34]:
# first row
first_row = test_array[0] # index 0 selects every element of the first row
first_row

array([7, 6, 5, 2, 8])

In [35]:
# last row
test_array[-1,:] # test_array[-1] works too, without the colon

array([5, 6, 8, 3, 3])

In [36]:
# rows 2 and 3
# alternative: test_array[[1,2]]
test_array[1:3]

array([[4, 2, 2, 9, 7],
       [4, 5, 4, 5, 8]])

In [37]:
# rows 2 and 4
test_array[[1,3]]

array([[4, 2, 2, 9, 7],
       [4, 4, 9, 9, 0]])

In [38]:
# row 2 to the end
test_array[1:]

array([[4, 2, 2, 9, 7],
       [4, 5, 4, 5, 8],
       [4, 4, 9, 9, 0],
       [5, 6, 8, 3, 3]])
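
Because `test_array` is random, the selections above differ on every run. The sketch below repeats each kind of index on a deterministic array; `grid` and the other names are hypothetical and chosen only for this illustration.

```python
import numpy as np

grid = np.arange(25).reshape(5, 5)   # 5x5 array holding 0..24, row by row

single = grid[1, 3]                  # integers for both row and column
row_slice = grid[0:2]                # slice: rows 0 and 1
first_col = grid[:, 0]               # bare colon: every row, column 0
picked = grid[[1, 3]]                # list of indices: rows 1 and 3
mask = np.array([True, False, False, False, True])
masked = grid[mask]                  # boolean array: rows where mask is True

print(single)
print(first_col)
```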

#### Selecting Columns and Custom Slicing ndarrays

Let's continue by learning how to select one or more columns of data:

<p><img alt="Selecting columns from a 2D ndarray" src="https://s3.amazonaws.com/dq-content/289/selection_columns.svg"></p>



If we want to select a partial 1D slice of a row or column, we can combine a single value for one dimension with a slice for the other dimension:

<p><img alt="Selecting partial 1D slices from a 2D ndarray" src="https://s3.amazonaws.com/dq-content/289/selection_1darray.svg"></p>

Lastly, if we want to select a 2D slice, we can use slices for both dimensions:

<p><img alt="Selecting a 2D slice from a 2D ndarray" src="https://s3.amazonaws.com/dq-content/289/selection_2darray.svg"></p>



In [39]:
test_arr2 = np.random.randint(10, size=(5,5))
test_arr2

array([[3, 1, 7, 2, 8],
       [8, 2, 4, 4, 1],
       [4, 4, 6, 9, 9],
       [1, 0, 0, 2, 2],
       [9, 7, 4, 2, 4]])

In [40]:
# column 2
test_arr2[:,1]

array([1, 2, 4, 0, 7])

In [41]:
# columns 1 and 2
test_arr2[:, 0:2] # [:, :2] works as well

array([[3, 1],
       [8, 2],
       [4, 4],
       [1, 0],
       [9, 7]])

In [42]:
# columns 2, 4, and 5
test_arr2[:,[1,3,4]]

array([[1, 2, 8],
       [2, 4, 1],
       [4, 9, 9],
       [0, 2, 2],
       [7, 2, 4]])

In [43]:
# row 3, columns 2 through 4
test_arr2[2, 1:4]

array([4, 6, 9])

In [44]:
test_arr2[1:4, :3] # rows 2 through 4, first 3 columns

array([[8, 2, 4],
       [4, 4, 6],
       [1, 0, 0]])

#### Modify values in ndarray



In [45]:
test_arr2

array([[3, 1, 7, 2, 8],
       [8, 2, 4, 4, 1],
       [4, 4, 6, 9, 9],
       [1, 0, 0, 2, 2],
       [9, 7, 4, 2, 4]])

In [46]:
test_arr2[0,0] = 125
test_arr2

array([[125,   1,   7,   2,   8],
       [  8,   2,   4,   4,   1],
       [  4,   4,   6,   9,   9],
       [  1,   0,   0,   2,   2],
       [  9,   7,   4,   2,   4]])

In [47]:
test_arr2[0,1] = 136.589 # the fractional part is truncated, not rounded
test_arr2

array([[125, 136,   7,   2,   8],
       [  8,   2,   4,   4,   1],
       [  4,   4,   6,   9,   9],
       [  1,   0,   0,   2,   2],
       [  9,   7,   4,   2,   4]])
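
The truncation happens because an integer array cannot store fractional values. One possible way around it, sketched here with hypothetical names, is to convert the array to a float type before assigning:

```python
import numpy as np

ints = np.array([1, 2, 3])
ints[0] = 136.589             # stored as 136: the fractional part is dropped

floats = ints.astype(float)   # astype returns a new array with a float dtype
floats[0] = 136.589           # the full value is kept

print(ints)
print(floats)
```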

#### Datatypes

[More about data types](https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html)

[List of scalars](https://docs.scipy.org/doc/numpy/reference/arrays.scalars.html#arrays-scalars-built-in)

In [48]:
#x = np.array([1,2])
x = np.array([1.0,2.0])
print(x.dtype)

float64


In [49]:
#np.zeros(10, dtype=np.int16) # or
np.zeros(10, dtype='int16')

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)

<div class="text_cell_render border-box-sizing rendered_html">
<table>
<thead><tr>
<th>Data type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>bool_</code></td>
<td>Boolean (True or False) stored as a byte</td>
</tr>
<tr>
<td><code>int_</code></td>
<td>Default integer type (same as C <code>long</code>; normally either <code>int64</code> or <code>int32</code>)</td>
</tr>
<tr>
<td><code>intc</code></td>
<td>Identical to C <code>int</code> (normally <code>int32</code> or <code>int64</code>)</td>
</tr>
<tr>
<td><code>intp</code></td>
<td>Integer used for indexing (same as C <code>ssize_t</code>; normally either <code>int32</code> or <code>int64</code>)</td>
</tr>
<tr>
<td><code>int8</code></td>
<td>Byte (-128 to 127)</td>
</tr>
<tr>
<td><code>int16</code></td>
<td>Integer (-32768 to 32767)</td>
</tr>
<tr>
<td><code>int32</code></td>
<td>Integer (-2147483648 to 2147483647)</td>
</tr>
<tr>
<td><code>int64</code></td>
<td>Integer (-9223372036854775808 to 9223372036854775807)</td>
</tr>
<tr>
<td><code>uint8</code></td>
<td>Unsigned integer (0 to 255)</td>
</tr>
<tr>
<td><code>uint16</code></td>
<td>Unsigned integer (0 to 65535)</td>
</tr>
<tr>
<td><code>uint32</code></td>
<td>Unsigned integer (0 to 4294967295)</td>
</tr>
<tr>
<td><code>uint64</code></td>
<td>Unsigned integer (0 to 18446744073709551615)</td>
</tr>
<tr>
<td><code>float_</code></td>
<td>Shorthand for <code>float64</code>.</td>
</tr>
<tr>
<td><code>float16</code></td>
<td>Half precision float: sign bit, 5 bits exponent, 10 bits mantissa</td>
</tr>
<tr>
<td><code>float32</code></td>
<td>Single precision float: sign bit, 8 bits exponent, 23 bits mantissa</td>
</tr>
<tr>
<td><code>float64</code></td>
<td>Double precision float: sign bit, 11 bits exponent, 52 bits mantissa</td>
</tr>
<tr>
<td><code>complex_</code></td>
<td>Shorthand for <code>complex128</code>.</td>
</tr>
<tr>
<td><code>complex64</code></td>
<td>Complex number, represented by two 32-bit floats</td>
</tr>
<tr>
<td><code>complex128</code></td>
<td>Complex number, represented by two 64-bit floats</td>
</tr>
</tbody>
</table>

</div>
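
The ranges listed in the table do not need to be memorized. As a small sketch, `np.iinfo` and `np.finfo` report the limits and bit widths of integer and floating-point types:

```python
import numpy as np

int8_info = np.iinfo(np.int8)
uint16_info = np.iinfo(np.uint16)
float32_bits = np.finfo(np.float32).bits

print(int8_info.min, int8_info.max)
print(uint16_info.min, uint16_info.max)
print(float32_bits)
```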

### Computation on NumPy Arrays: Universal Functions


#### The Slowness of Loops



#### Introducing UFuncs (Universal functions)

[Docs](https://docs.scipy.org/doc/numpy/reference/ufuncs.html)



In [50]:
import numpy as np

In [53]:
np.arange(9)

array([0, 1, 2, 3, 4, 5, 6, 7, 8])

### Importing Real Data


In [54]:
!head data/taxi_data.csv

VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type

2,2017-01-01 00:01:15,2017-01-01 00:11:05,N,1,42,166,1,1.71,9,0,0.5,0,0,,0.3,9.8,2,1
2,2017-01-01 00:03:34,2017-01-01 00:09:00,N,1,75,74,1,1.44,6.5,0.5,0.5,0,0,,0.3,7.8,2,1
2,2017-01-01 00:04:02,2017-01-01 00:12:55,N,1,82,70,5,3.45,12,0.5,0.5,2.66,0,,0.3,15.96,1,1
2,2017-01-01 00:01:40,2017-01-01 00:14:23,N,1,255,232,1,2.11,10.5,0.5,0.5,0,0,,0.3,11.8,2,1
2,2017-01-01 00:00:51,2017-01-01 00:18:55,N,1,166,239,1,2.76,11.5,0.5,0.5,0,0,,0.3,12.8,2,1
2,2017-01-01 00:00:28,2017-01-01 00:13:31,N,1,179,226,1,4.14,15,0.5,0.5,0,0,,0.3,16.3,1,1
2,2017-01-01 00:02:39,2017-01-01 00:26:28,N,1,74,167,1,4.22,19,0.5,0.5,0,0,,0.3,20.3,2,1
2,2017-01-01 00:15:21,2017-01-01 00:28:06,N,1,112,37,1,2.83,11,0.5,0.5,0,0,,0.3,12.3,2,1


In [55]:
import csv

In [56]:
with open('data/taxi_data.csv', 'r') as f:
    taxi_list = list(csv.reader(f))


In [57]:
taxi_list = taxi_list[2:]  # skip the header row and the blank line that follows it

In [59]:
taxi_list[2]

['2',
 '2017-01-01 00:04:02',
 '2017-01-01 00:12:55',
 'N',
 '1',
 '82',
 '70',
 '5',
 '3.45',
 '12',
 '0.5',
 '0.5',
 '2.66',
 '0',
 '',
 '0.3',
 '15.96',
 '1',
 '1']

In [60]:
converted_taxi_list = []

for row in taxi_list:
    converted_row = []
    for item in row:
        try:
            converted_row.append(float(item))
        except ValueError:
            # skip fields that are not numbers (dates, flags, empty strings)
            continue
    converted_taxi_list.append(converted_row)

In [61]:
#converted_taxi_list[:2]

In [63]:
taxi = np.array(converted_taxi_list)

In [64]:
taxi

array([[  2.  ,   1.  ,  42.  , ...,   9.8 ,   2.  ,   1.  ],
       [  2.  ,   1.  ,  75.  , ...,   7.8 ,   2.  ,   1.  ],
       [  2.  ,   1.  ,  82.  , ...,  15.96,   1.  ,   1.  ],
       ...,
       [  1.  ,   1.  , 228.  , ...,  80.3 ,   1.  ,   1.  ],
       [  1.  ,   1.  ,   7.  , ...,  17.3 ,   1.  ,   1.  ],
       [  1.  ,   1.  , 255.  , ...,  12.8 ,   1.  ,   1.  ]])

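The conversion drops the non-numeric fields (the two datetime columns, `store_and_fwd_flag`, and the empty `ehail_fee`), so the columns of `taxi` are indexed as follows:

- Column 0 is VendorID
- Column 1 is RatecodeID
- Column 2 is PULocationID
- Column 3 is DOLocationID
- Column 4 is passenger_count
- Column 5 is trip_distance
- Column 6 is fare_amount
- Column 7 is extra
- Column 8 is mta_tax
- Column 9 is tip_amount
- Column 10 is tolls_amount
- Column 11 is improvement_surcharge
- Column 12 is total_amount
- Column 13 is payment_type
- Column 14 is trip_type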
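
The column positions can also be derived rather than counted by hand. The sketch below assumes the first data row is representative of the whole file and reuses the header and first row shown earlier; `is_number` and `column_index` are hypothetical helper names.

```python
header = ("VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,"
          "RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,"
          "fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,"
          "improvement_surcharge,total_amount,payment_type,trip_type").split(",")
first_row = ("2,2017-01-01 00:01:15,2017-01-01 00:11:05,N,1,42,166,1,1.71,"
             "9,0,0.5,0,0,,0.3,9.8,2,1").split(",")


def is_number(text):
    """Return True if text can be converted to a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


# Keep only the names whose value survives the float conversion.
kept_names = [name for name, value in zip(header, first_row) if is_number(value)]
column_index = {name: i for i, name in enumerate(kept_names)}

print(column_index["trip_distance"], column_index["total_amount"])
```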

### Vector Math



In [65]:

x = np.array([1,2,3])

y = np.array([4,5,6])

x + y

array([5, 7, 9])

In [67]:
x*y
x/y
x**y
x.dot(y)
z = np.array([y, y**2])
z.shape

(2, 3)

In [68]:
col1 = taxi[:,6]
col2 = taxi[:,8]
col3 = taxi[:,11]

In [69]:
sums = col1 + col2 + col3

In [70]:
sums

array([ 9.8,  7.3, 12.8, ..., 61.8, 15.3, 10.8])

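In [75]:
np.arange(3) + np.arange(9)  # shapes (3,) and (9,) are incompatible

ValueError: operands could not be broadcast together with shapes (3,) (9,) 


When both arrays have the same shape, as `x` and `y` do above, NumPy adds them element by element. Here's what happens behind the scenes: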

<p><img alt="Vectorized Addition" src="https://s3.amazonaws.com/dq-content/289/vectorized_addition.svg"></p>


- `vector_a + vector_b` - Addition
- `vector_a - vector_b` - Subtraction
- `vector_a * vector_b` - Multiplication (this is unrelated to the vector multiplication used in linear algebra).
- `vector_a / vector_b` - Division
- `vector_a % vector_b` - Modulus (find the remainder when vector_a is divided by vector_b)
- `vector_a ** vector_b` - Exponent (raise vector_a to the power of vector_b)
- `vector_a // vector_b` - Floor Division (divide vector_a by vector_b, rounding down to the nearest integer)
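
A brief sketch of a few of these operators on two small arrays with hypothetical values; each operation is applied element by element:

```python
import numpy as np

vector_a = np.array([10, 20, 30])
vector_b = np.array([3, 4, 5])

remainder = vector_a % vector_b     # [1, 0, 0]
floor_div = vector_a // vector_b    # [3, 5, 6]
quotient = vector_a / vector_b      # true division always yields floats
elementwise = vector_a * vector_b   # [30, 80, 150], not a dot product

print(remainder, floor_div, elementwise)
```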

In [76]:
taxi.shape

(19998, 15)

In [77]:
trip_distance = taxi[:,5]
trip_price = taxi[:,12]

In [78]:
price_per_mile = trip_price/trip_distance  # a zero trip_distance yields inf or nan and triggers a RuntimeWarning


In [79]:
price_per_mile

array([5.73099415, 5.41666667, 4.62608696, ..., 3.70046083, 4.11904762,
       5.12      ])

In [80]:
np.array([1,2,3])/0  # dividing by zero yields inf and emits a RuntimeWarning

array([inf, inf, inf])
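
If the infinities are unwanted, the division can be restricted to nonzero denominators. This is one possible approach, sketched with hypothetical values: `np.errstate` silences the warning, and the `where` argument of `np.divide` skips the zero entries and leaves the prefilled `nan` in place.

```python
import numpy as np

price = np.array([9.8, 7.8, 12.0])
distance = np.array([1.71, 0.0, 2.0])

# Plain division, with the divide-by-zero warning suppressed.
with np.errstate(divide="ignore", invalid="ignore"):
    raw = price / distance

# Divide only where the distance is nonzero; other positions keep nan.
safe = np.divide(price, distance,
                 out=np.full_like(price, np.nan),
                 where=distance != 0)

print(raw)
print(safe)
```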

In [81]:
price_per_mile2 = np.divide(trip_price, trip_distance)
# same as price_per_mile = trip_price / trip_distance



<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The following table lists the arithmetic operators implemented in NumPy:</p>
<table>
<thead><tr>
<th>Operator</th>
<th>Equivalent ufunc</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>+</code></td>
<td><code>np.add</code></td>
<td>Addition (e.g., <code>1 + 1 = 2</code>)</td>
</tr>
<tr>
<td><code>-</code></td>
<td><code>np.subtract</code></td>
<td>Subtraction (e.g., <code>3 - 2 = 1</code>)</td>
</tr>
<tr>
<td><code>-</code></td>
<td><code>np.negative</code></td>
<td>Unary negation (e.g., <code>-2</code>)</td>
</tr>
<tr>
<td><code>*</code></td>
<td><code>np.multiply</code></td>
<td>Multiplication (e.g., <code>2 * 3 = 6</code>)</td>
</tr>
<tr>
<td><code>/</code></td>
<td><code>np.divide</code></td>
<td>Division (e.g., <code>3 / 2 = 1.5</code>)</td>
</tr>
<tr>
<td><code>//</code></td>
<td><code>np.floor_divide</code></td>
<td>Floor division (e.g., <code>3 // 2 = 1</code>)</td>
</tr>
<tr>
<td><code>**</code></td>
<td><code>np.power</code></td>
<td>Exponentiation (e.g., <code>2 ** 3 = 8</code>)</td>
</tr>
<tr>
<td><code>%</code></td>
<td><code>np.mod</code></td>
<td>Modulus/remainder (e.g., <code>9 % 4 = 1</code>)</td>
</tr>
</tbody>
</table>
<p>Additionally there are Boolean/bitwise operators; we will explore these in <a href="02.06-boolean-arrays-and-masks.html">Comparisons, Masks, and Boolean Logic</a>.</p>

</div>
</div>

[Mathematical expressions](https://docs.scipy.org/doc/numpy-1.14.0/reference/routines.math.html#arithmetic-operations)

### Calculating Statistics For 1D ndarrays



In [82]:
price_min = taxi[:,12].min()

In [83]:
price_min

-60.0

In [84]:
price_max = taxi[:,12].max()

In [85]:
# function
np.max(taxi[:,12])

240.0

In [86]:
# method
taxi[:,12].max()

240.0

In [87]:
taxi[:,12].mean()

15.860465046504654

### Calculating Statistics For 2D ndarrays

Next, we'll look at how to calculate statistics for two-dimensional ndarrays. If we use the array methods without additional parameters, they return a single value, just as they do with a 1D array:

<p><img alt="Array method without axis parameter" src="https://s3.amazonaws.com/dq-content/289/array_method_axis_none.svg"></p>

But what if we wanted to find the maximum value of each row? For that, we need to use the axis parameter, and specify a value of 1, which indicates we want to calculate values for each row.

<p><img alt="Array method with axis 1" src="https://s3.amazonaws.com/dq-content/289/array_method_axis_1.svg"></p>

If we want to find the maximum value of each column, we use an axis value of 0:

<p><img alt="Array method with axis 0" src="https://s3.amazonaws.com/dq-content/289/array_method_axis_0.svg"></p>

To help you remember which is which, think of the first axis as rows and the second axis as columns, just as when indexing a 2D NumPy array with `ndarray[row,column]`. Then think about which axis you want to apply the method along. The tricky part is that applying the method along one axis gives you results in the other axis. Here is an illustration of that:

<p><img alt="The axis parameter" src="https://s3.amazonaws.com/dq-content/289/axis_param.svg"></p>
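
The three cases can be compared side by side on a small array. This is a minimal sketch with hypothetical values:

```python
import numpy as np

table = np.array([[1, 9, 3],
                  [7, 2, 8]])

overall_max = table.max()      # a single value for the whole array
row_max = table.max(axis=1)    # one value per row
col_max = table.max(axis=0)    # one value per column

print(overall_max, row_max, col_max)
```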



In [88]:
taxi_first_20 = taxi[:20]

In [89]:
cene_skupaj = taxi_first_20[:,6:12] # all rows, columns 6 through 11 (the individual fare components)

In [90]:
cene_skupaj_preracunano = taxi_first_20[:,12]

In [91]:
cene_sestevek = cene_skupaj.sum(axis=1)

In [92]:
cene_sestevek.round()

array([10.,  8., 16., 12., 13., 16., 20., 12.,  6., 15., 12., 23., 31.,
        8., 26., 11.,  3., 12., 22., 12.])

In [93]:
cene_skupaj_preracunano.round()

array([10.,  8., 16., 12., 13., 16., 20., 12.,  6., 15., 12., 23., 31.,
        8., 26., 11.,  3., 12., 22., 12.])

### Adding Rows and Columns to ndarrays


In [94]:
ones = np.ones((2,3))

In [95]:
ones

array([[1., 1., 1.],
       [1., 1., 1.]])

In [96]:
zeros = np.zeros(3)

In [97]:
zeros

array([0., 0., 0.])

In [98]:
combined = np.concatenate([ones, zeros], axis=0)

ValueError: all the input arrays must have same number of dimensions

In [99]:
ones.shape

(2, 3)

In [100]:
zeros.shape

(3,)

In [101]:
zeros_2d = np.expand_dims(zeros, axis=0)

In [103]:
zeros_2d.shape

(1, 3)

In [104]:
combined = np.concatenate([ones, zeros_2d], axis=0)

In [105]:
combined

array([[1., 1., 1.],
       [1., 1., 1.],
       [0., 0., 0.]])
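
`np.expand_dims` is not the only way to add a dimension. The sketch below shows two equivalent spellings that are commonly used, plus the column-shaped variant; all names other than `zeros` are hypothetical.

```python
import numpy as np

zeros = np.zeros(3)

as_row_a = np.expand_dims(zeros, axis=0)   # shape (1, 3)
as_row_b = zeros[np.newaxis, :]            # same result using np.newaxis
as_row_c = zeros.reshape(1, -1)            # same result using reshape
as_col = zeros[:, np.newaxis]              # shape (3, 1), a single column

print(as_row_a.shape, as_col.shape)
```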

In [106]:
price_per_mile.shape

(19998,)

In [107]:
price_per_mile_2d = np.expand_dims(price_per_mile, axis=1)

In [108]:
price_per_mile_2d.shape

(19998, 1)

In [109]:
taxi = np.concatenate([taxi, price_per_mile_2d], axis=1)

In [110]:
taxi[:,-1]

array([5.73099415, 5.41666667, 4.62608696, ..., 3.70046083, 4.11904762,
       5.12      ])

### Sorting ndarrays


In [116]:
sadje = np.array(['pomaranca', 'banana', 'jabolka', 'grozdje', 'cesnja'])
sadje
sadje[2]
print(sadje[[2,1]])
sorted_order = np.argsort(sadje)
print(sorted_order)
sortirano_sadje = sadje[sorted_order]
print(sortirano_sadje)

['jabolka' 'banana']
[1 4 3 2 0]
['banana' 'cesnja' 'grozdje' 'jabolka' 'pomaranca']


In [119]:
int_square = np.random.randint(10,size=(5,5))
int_square

array([[1, 6, 9, 6, 9],
       [3, 5, 3, 8, 7],
       [1, 6, 1, 2, 7],
       [2, 6, 3, 2, 1],
       [6, 2, 7, 8, 2]])

In [120]:
#int_square = np.random.randint(10,size=(5,5))
#int_square
last_column = int_square[:,4]
last_column

array([9, 7, 7, 1, 2])

In [123]:
#int_square = np.random.randint(10,size=(5,5))
#int_square
#last_column = int_square[:,4]
#last_column
sorted_order = np.argsort(last_column)
sorted_order

array([3, 4, 1, 2, 0])

In [125]:
#int_square = np.random.randint(10,size=(5,5))
#int_square
last_column = int_square[:,4]
last_column
sorted_order = np.argsort(last_column)
sorted_order
last_column_sorted = last_column[sorted_order]
last_column_sorted
int_square
int_square_sorted = int_square[np.argsort(int_square[:,4])] # everything in a single line
int_square_sorted

array([[2, 6, 3, 2, 1],
       [6, 2, 7, 8, 2],
       [3, 5, 3, 8, 7],
       [1, 6, 1, 2, 7],
       [1, 6, 9, 6, 9]])

In [126]:
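# taxi example: sort the rows by the last column (price per mile)
sorted_order = np.argsort(taxi[:,-1])
taxi_sorted = taxi[sorted_order]
taxi_sorted[:20,-1]  # the 20 smallest values; -inf comes from a negative total divided by a zero distance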

array([         -inf,          -inf,          -inf,          -inf,
                -inf,          -inf,          -inf,          -inf,
                -inf,          -inf,          -inf, -693.33333333,
       -380.        , -380.        , -380.        , -192.5       ,
       -190.        , -138.94736842, -126.66666667,  -95.        ])
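
Infinite values dominate the ends of a sorted array. A possible way to handle them, sketched on hypothetical values, is to keep only finite entries with `np.isfinite` before sorting; reversing the result gives descending order.

```python
import numpy as np

ratios = np.array([5.7, -np.inf, 3.1, np.inf, -2.0])

finite = ratios[np.isfinite(ratios)]     # drops inf, -inf, and nan
ascending = finite[np.argsort(finite)]
descending = ascending[::-1]

print(ascending)
print(descending)
```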

### Reading CSV files with NumPy

In [127]:
taxi = np.genfromtxt('data/taxi_data.csv',
                    delimiter=',',
                    skip_header=2)
taxi

array([[ 2.  ,   nan,   nan, ...,  9.8 ,  2.  ,  1.  ],
       [ 2.  ,   nan,   nan, ...,  7.8 ,  2.  ,  1.  ],
       [ 2.  ,   nan,   nan, ..., 15.96,  1.  ,  1.  ],
       ...,
       [ 1.  ,   nan,   nan, ..., 80.3 ,  1.  ,  1.  ],
       [ 1.  ,   nan,   nan, ..., 17.3 ,  1.  ,  1.  ],
       [ 1.  ,   nan,   nan, ..., 12.8 ,  1.  ,  1.  ]])

In [128]:
taxi.shape

(19998, 19)
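
Unlike the manual conversion, `np.genfromtxt` keeps every column and stores `nan` wherever a field cannot be read as a number, which is why the shape now has 19 columns. A small sketch using an in-memory, hypothetical three-column file:

```python
import io

import numpy as np

csv_text = "id,flag,amount\n1,N,9.8\n2,Y,7.8\n"   # hypothetical miniature file

sample = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

# True for columns in which every value failed to parse as a number.
nan_columns = np.isnan(sample).all(axis=0)

print(sample)
print(nan_columns)
```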

### Boolean Arrays





In [129]:
False

False

In [130]:
True

True

In [131]:
3 < 10

True

In [132]:
np.array([1,2,3,5])+10

array([11, 12, 13, 15])

In [133]:
np.array([2,4,6,8]) < 5

array([ True,  True, False, False])

A similar pattern occurs: the 'less than five' operation is applied to each value in the array. The diagram below shows this step by step:

<p><img alt="Vectorized boolean operation" src="https://s3.amazonaws.com/dq-content/290/vectorized_bool.svg"></p>

In [134]:
a = np.array([1,2,3,4,5])
a_bool = a<3
a_bool

array([ True,  True, False, False, False])

In [135]:
b = np.array(['blue','red','black'])
b_bool = b=='red'
b_bool

array([False,  True, False])

### Boolean Indexing with 1D ndarrays




<p><img alt="Boolean indexing 1D ndarrays 1" src="https://s3.amazonaws.com/dq-content/290/1d_bool_1.svg"></p>



<p><img alt="Boolean indexing 1D ndarrays 2" src="https://s3.amazonaws.com/dq-content/290/1d_bool_2.svg"></p>




In [136]:
passenger_count = taxi[:,7]
passenger_count
two_pass_bool = passenger_count == 2
two_passengers = passenger_count[two_pass_bool]
two_passengers
two_passengers.shape

(1801,)

In [138]:
two_pass_bool = passenger_count == 2
two_passengers = passenger_count[two_pass_bool]
two_passengers

array([2., 2., 2., ..., 2., 2., 2.])
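
Since `True` counts as 1, summing a boolean array gives the number of matches without building the filtered array first. The sketch below uses hypothetical passenger counts and also combines two conditions with `|`; the parentheses around each comparison are required.

```python
import numpy as np

passenger_count = np.array([1, 2, 5, 2, 1, 6])   # hypothetical values

two_pass_bool = passenger_count == 2
n_two = two_pass_bool.sum()                       # number of True entries

# Element-wise "or" of two conditions.
small_or_large = passenger_count[(passenger_count == 1) | (passenger_count >= 5)]

print(n_two)
print(small_or_large)
```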

### Boolean Indexing with 2D ndarrays


<p><img alt="Boolean indexing 2D ndarrays" src="https://s3.amazonaws.com/dq-content/290/bool_dims.svg"></p>


In [145]:
arr = np.random.randint(10,size=(4,3))
arr
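bool_1 = [True, False, True, True]  # 4 values: matches the 4 rows
bool_2 = [True, False, True]        # 3 values: matches the 3 columns
arr[:, bool_1]  # raises IndexError: 4 boolean values cannot index the 3 columns
arr[:,bool_2]   # never reached; on its own it would select columns 1 and 3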

IndexError: boolean index did not match indexed array along dimension 1; dimension is 3 but corresponding boolean dimension is 4
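
The rule is that a boolean index must be as long as the dimension it is applied to. A deterministic sketch of the working and failing cases (the names `bool_rows`, `bool_cols`, and `mismatch_raised` are hypothetical):

```python
import numpy as np

arr = np.arange(12).reshape(4, 3)                 # 4 rows, 3 columns

bool_rows = np.array([True, False, True, True])   # length 4: one per row
bool_cols = np.array([True, False, True])         # length 3: one per column

rows = arr[bool_rows]       # keeps rows 0, 2, and 3
cols = arr[:, bool_cols]    # keeps columns 0 and 2

try:
    arr[:, bool_rows]       # length 4 applied to a dimension of size 3
    mismatch_raised = False
except IndexError:
    mismatch_raised = True

print(rows.shape, cols.shape, mismatch_raised)
```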

### Assigning Values in ndarrays

In [149]:
b = np.array(['blue','red','black'])
b
b[1]='orange'  # stored as 'orang': dtype '<U5' holds at most 5 characters
b

array(['blue', 'orang', 'black'], dtype='<U5')

In [150]:
b = np.array(['blue','red','blackee'])
b
b[1]='orange'
b

array(['blue', 'orange', 'blackee'], dtype='<U7')

In [151]:
b = np.array(['blue','red','black'])
b
b[1]='orange'
b
b[1:]='pink'
b

array(['blue', 'pink', 'pink'], dtype='<U5')
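
String arrays have a fixed width, taken from the longest string at creation time, so longer values are silently cut off. One way to avoid this, shown as a sketch, is to request a wider dtype up front; the width of 10 is an arbitrary choice for this example.

```python
import numpy as np

b = np.array(['blue', 'red', 'black'])    # width 5, set by 'black'
b[1] = 'orange'                           # silently truncated

wide = np.array(['blue', 'red', 'black'], dtype='<U10')
wide[1] = 'orange'                        # fits within 10 characters

print(b)
print(wide)
```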

In [152]:
ones = np.ones((3,5))
ones
ones[1,2]=99
ones

array([[ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1., 99.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.]])

In [153]:
ones = np.ones((3,5))
ones
ones[1,2]=99
ones[0]=42
ones

array([[42., 42., 42., 42., 42.],
       [ 1.,  1., 99.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.]])

### Subarrays as no-copy views



In [154]:
r = np.ones((4,4))

In [155]:
r

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [156]:
r2 = r[:2,:2]
r2

array([[1., 1.],
       [1., 1.]])

In [157]:
r2[:]=0
r2

array([[0., 0.],
       [0., 0.]])

In [158]:
r

array([[0., 0., 1., 1.],
       [0., 0., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

### Copying Data


In [160]:
r3 = np.random.randint(10,size=(5,5))
r3

array([[1, 0, 4, 9, 9],
       [3, 3, 3, 8, 5],
       [1, 9, 8, 2, 1],
       [7, 4, 2, 1, 8],
       [9, 8, 0, 8, 7]])

In [161]:
r3_sub_copy = r3[:2,:2].copy()  # the parentheses call the method and return a new array
r3_sub_copy

array([[1, 0],
       [3, 3]])
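
A copy made by calling `.copy()` (with parentheses, so that the method is actually invoked) is independent of the original. A minimal sketch on a deterministic array, used here in place of the random `r3`:

```python
import numpy as np

r3 = np.arange(25).reshape(5, 5)

r3_sub_copy = r3[:2, :2].copy()     # calling copy() allocates new memory
r3_sub_copy[:] = -1                 # modify only the copy

independent = not np.shares_memory(r3, r3_sub_copy)
print(independent)
print(r3[:2, :2])                   # the original values are unchanged
```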