# Introduction to NumPy

## Datatypes in Python

```C
// Example in C:
int x = 4;
x = "string";  // invalid: x was declared as an int
```

In [2]:
# Example in Python:
x = 4
print(type(x), x)

x = "string"
print(type(x), x)

<class 'int'> 4
<class 'str'> string


## Integers in Python

```C
// Python 3.4 implementation of the long integer
struct _longobject {
    long ob_refcnt;
    PyTypeObject *ob_type;
    size_t ob_size;
    long ob_digit[1];
};
```

![Int in C vs int in py](./images/cint_vs_pyint.png)
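
The overhead of the Python integer object can be observed directly. The following is a minimal sketch, assuming CPython on a 64-bit platform; the exact byte count of a Python `int` is an implementation detail and may differ on other interpreters.

```python
import sys
import numpy as np

# A Python int is a full object: reference count, type pointer, size, digits.
py_int_bytes = sys.getsizeof(4)

# A NumPy int64 array stores only the raw 8-byte values, back to back.
arr = np.array([4], dtype=np.int64)
np_item_bytes = arr.itemsize

print(py_int_bytes, np_item_bytes)
```

The first number is several times larger than the second, which is the per-element cost that fixed-type arrays avoid.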

## Lists in Python

In [3]:
l = [1, "string", 2.3, False]
print(type(l), l)
datatypes = [type(i) for i in l]
print(datatypes)

<class 'list'> [1, 'string', 2.3, False]
[<class 'int'>, <class 'str'>, <class 'float'>, <class 'bool'>]


![array_vs_list](./images/array_vs_list.png)

In [4]:
l = [0,1,2,3,4,5,6]
print(type(l), l)
print(type(l[0]), l[0])

<class 'list'> [0, 1, 2, 3, 4, 5, 6]
<class 'int'> 0


### Fixed-Type Arrays in Python

In [5]:
import array

L = list(range(10))
A = array.array("i", L)
print(A)

A[0] = 2.3

array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


TypeError: 'float' object cannot be interpreted as an integer
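
The error above comes from the fixed type code of the array. A small sketch of how the `array` module enforces one type for every element and how an explicit conversion avoids the error; `error_message` is only an illustrative variable name.

```python
import array

A = array.array("i", range(10))

# Every element has the same type code and occupies the same number of bytes.
print(A.typecode, A.itemsize)

error_message = None
try:
    A[0] = 2.3  # a float does not fit the "i" (signed int) type code
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

A[0] = int(2.3)  # an explicit conversion is accepted and truncates to 2
print(A[0])
```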

# Vectorization

In [7]:
my_numbers = [
    [6,5],
    [1,3],
    [5,6],
    [3,7],
    [1,4],
    [5,8],
    [3,5],
    [8,4]
]

sums = []
for row in my_numbers:
    row_sum = row[0] + row[1]
    sums.append(row_sum)
print(sums)

[11, 4, 11, 10, 5, 13, 8, 12]


![numpy_for_gif](./images/numpy_for.gif)

![numpy_vectorized](./images/numpy_vectorized.gif)
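
The same row sums can be written without an explicit loop. This is a sketch of the vectorized form using NumPy, which is introduced in the next section; it reuses the `my_numbers` data from the loop example.

```python
import numpy as np

my_numbers = np.array([
    [6, 5], [1, 3], [5, 6], [3, 7],
    [1, 4], [5, 8], [3, 5], [8, 4],
])

# Add the first column to the second column in a single operation.
sums = my_numbers[:, 0] + my_numbers[:, 1]
print(sums)

# Equivalent: sum across the columns of every row.
sums_axis = my_numbers.sum(axis=1)
print(sums_axis)
```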

# Numpy Library

[Documentation](http://www.numpy.org/)

```bash
$ pip install numpy
```

In [8]:
import numpy as np

np.__version__

'1.23.5'

## Speed comparison

### %timeit

In [9]:
n = 100_000
sum([1. / i**2 for i in range(1, n)])

1.6449240667982423

In [10]:
%timeit -r 5 -n 30 sum([1. / i**2 for i in range(1, n)])

73 ms ± 10.1 ms per loop (mean ± std. dev. of 5 runs, 30 loops each)


In [11]:
%%timeit -r 5 -n 30
s = 0.
for i in range(1, n):
    s += 1. / i**2

52.1 ms ± 4.2 ms per loop (mean ± std. dev. of 5 runs, 30 loops each)


In [12]:
import numpy as np
size = 5_000_000

list1 = [i for i in range(size)]
list2 = [i for i in range(size)]

array1 = np.arange(size)
array2 = np.arange(size)

print(list1[:5])
print(array1[:5])

[0, 1, 2, 3, 4]
[0 1 2 3 4]


In [13]:
%%timeit -n 10 -r 1
# Python code

sums = []
for el1, el2 in zip(list1, list2):
    sums.append(el1 + el2)
print(sums[:5])

[0, 2, 4, 6, 8]
[0, 2, 4, 6, 8]
[0, 2, 4, 6, 8]
[0, 2, 4, 6, 8]
[0, 2, 4, 6, 8]
[0, 2, 4, 6, 8]
[0, 2, 4, 6, 8]
[0, 2, 4, 6, 8]
[0, 2, 4, 6, 8]
[0, 2, 4, 6, 8]
1.18 s ± 0 ns per loop (mean ± std. dev. of 1 run, 10 loops each)


In [14]:
%%timeit -n 10 -r 1
sums = array1 + array2
print(sums[:5])

[0 2 4 6 8]
[0 2 4 6 8]
[0 2 4 6 8]
[0 2 4 6 8]
[0 2 4 6 8]
[0 2 4 6 8]
[0 2 4 6 8]
[0 2 4 6 8]
[0 2 4 6 8]
[0 2 4 6 8]
29.9 ms ± 0 ns per loop (mean ± std. dev. of 1 run, 10 loops each)
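
`%timeit` and `%%timeit` are IPython magics, so they only work inside a notebook or an IPython shell. In a plain Python script the standard-library `timeit` module allows a similar comparison. A rough sketch, using a smaller `size` than above and the hypothetical helper names `add_lists` and `add_arrays`; absolute timings will vary by machine.

```python
import timeit
import numpy as np

size = 100_000  # smaller than in the cells above so the sketch runs quickly
list1 = list(range(size))
array1 = np.arange(size)

def add_lists():
    # Element-wise addition with a Python-level loop.
    return [x + y for x, y in zip(list1, list1)]

def add_arrays():
    # Element-wise addition executed inside NumPy.
    return array1 + array1

# number=10 mirrors "-n 10": each callable runs ten times.
list_seconds = timeit.timeit(add_lists, number=10)
array_seconds = timeit.timeit(add_arrays, number=10)
print(f"list: {list_seconds:.4f} s, array: {array_seconds:.4f} s")
```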


<hr>

## Ndarray

![1-dimensional array](./images/one_dim.svg)

<a href="https://numpy.org/doc/stable/reference/generated/numpy.array.html#numpy.array">numpy.array() - documentation</a>

In [15]:
l = [6,1,5,2,3,5,6,4]
ndarray = np.array(l)
print(type(ndarray))
print(ndarray)

<class 'numpy.ndarray'>
[6 1 5 2 3 5 6 4]


In [16]:
print(ndarray[0])
print(ndarray[4])
print(ndarray[-1])

6
3
4


![two dimensional array](./images/Two_Dim.svg)

In [17]:
list_ = [
    [6,5],
    [1,3],
    [2,3],
    [4,3],
    [6,5],
    [2,3]
]

ndarray = np.array(list_)
print(type(ndarray))
print(ndarray)

<class 'numpy.ndarray'>
[[6 5]
 [1 3]
 [2 3]
 [4 3]
 [6 5]
 [2 3]]


---

# Learning through a real example

## Importing data

* `pickup_year` - the year of the trip
* `pickup_month` - the month of the trip; `1` is January, `12` is December
* `pickup_day` - the day of the month of the trip
* `pickup_dayofweek` - the day of the week; `1` is Monday, `7` is Sunday
* `pickup_time` - the time window in which the trip started:
    - `0` - 0:00am-3:59am.
    - `1` - 4:00am-7:59am.
    - `2` - 8:00am-11:59am.
    - `3` - 12:00pm-3:59pm.
    - `4` - 4:00pm-7:59pm.
    - `5` - 8:00pm-11:59pm.
* `pickup_location_code` - the borough or airport where the trip started:
    - `0` - Bronx.
    - `1` - Brooklyn.
    - `2` - JFK Airport.
    - `3` - LaGuardia Airport.
    - `4` - Manhattan.
    - `5` - Newark Airport.
    - `6` - Queens.
    - `7` - Staten Island.
* `dropoff_location_code` - the borough or airport where the trip ended
* `trip_distance` - the distance of the trip, in miles
* `trip_length` - the duration of the trip, in seconds
* `fare_amount` - the base fare of the trip, in dollars
* `fees_amount` - the additional fees charged for the trip
* `tolls_amount` - the tolls paid during the trip
* `tip_amount` - the tip left by the passenger
* `total_amount` - the total amount paid by the passenger, in dollars
* `payment_type` - the payment method:
    - `1` - Credit card.
    - `2` - Cash.
    - `3` - No charge.
    - `4` - Dispute.
    - `5` - Unknown.
    - `6` - Voided trip.
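
Because a NumPy array has no column names, columns are addressed by position. One way to keep later indexing readable is a small name-to-index lookup. The sketch below is an illustration only: the `COLUMNS` list and the `col` helper are hypothetical names that do not appear in the notebook, and the column order is taken from the CSV header.

```python
# Column order as listed in the CSV header of nyc_taxis.csv.
COLUMNS = [
    "pickup_year", "pickup_month", "pickup_day", "pickup_dayofweek",
    "pickup_time", "pickup_location_code", "dropoff_location_code",
    "trip_distance", "trip_length", "fare_amount", "fees_amount",
    "tolls_amount", "tip_amount", "total_amount", "payment_type",
]

def col(name):
    """Return the positional index of a column name (hypothetical helper)."""
    return COLUMNS.index(name)

print(col("trip_distance"), col("trip_length"), col("total_amount"))
```

With such a helper, `taxi[:, col("trip_distance")]` would select the same column as `taxi[:, 7]`.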

In [18]:
import csv
import numpy as np

with open("data/nyc_taxis.csv") as f:
    taxi_list = list(csv.reader(f))
    
for row in taxi_list[:5]:
    print(row)

['pickup_year', 'pickup_month', 'pickup_day', 'pickup_dayofweek', 'pickup_time', 'pickup_location_code', 'dropoff_location_code', 'trip_distance', 'trip_length', 'fare_amount', 'fees_amount', 'tolls_amount', 'tip_amount', 'total_amount', 'payment_type']
['2016', '1', '1', '5', '0', '2', '4', '21.00', '2037', '52.00', '0.80', '5.54', '11.65', '69.99', '1']
['2016', '1', '1', '5', '0', '2', '1', '16.29', '1520', '45.00', '1.30', '0.00', '8.00', '54.30', '1']
['2016', '1', '1', '5', '0', '2', '6', '12.70', '1462', '36.50', '1.30', '0.00', '0.00', '37.80', '2']
['2016', '1', '1', '5', '0', '2', '6', '8.70', '1210', '26.00', '1.30', '0.00', '5.46', '32.76', '1']


In [19]:
taxi = taxi_list[1:]
taxi = np.array(taxi, dtype="float")
print(taxi.shape)
print(type(taxi))
for row in taxi[:4]:
    print(row)

(89560, 15)
<class 'numpy.ndarray'>
[2.016e+03 1.000e+00 1.000e+00 5.000e+00 0.000e+00 2.000e+00 4.000e+00
 2.100e+01 2.037e+03 5.200e+01 8.000e-01 5.540e+00 1.165e+01 6.999e+01
 1.000e+00]
[2.016e+03 1.000e+00 1.000e+00 5.000e+00 0.000e+00 2.000e+00 1.000e+00
 1.629e+01 1.520e+03 4.500e+01 1.300e+00 0.000e+00 8.000e+00 5.430e+01
 1.000e+00]
[2.016e+03 1.000e+00 1.000e+00 5.000e+00 0.000e+00 2.000e+00 6.000e+00
 1.270e+01 1.462e+03 3.650e+01 1.300e+00 0.000e+00 0.000e+00 3.780e+01
 2.000e+00]
[2.016e+03 1.000e+00 1.000e+00 5.000e+00 0.000e+00 2.000e+00 6.000e+00
 8.700e+00 1.210e+03 2.600e+01 1.300e+00 0.000e+00 5.460e+00 3.276e+01
 1.000e+00]


---

[numpy.genfromtxt](https://numpy.org/doc/stable/reference/generated/numpy.genfromtxt.html)

In [20]:
import numpy as np
taxi = np.genfromtxt("data/nyc_taxis.csv", delimiter=",")
print(taxi.shape)

(89561, 15)


In [21]:
taxi.dtype

dtype('float64')

In [22]:
# taxi = np.genfromtxt("data/nyc_taxis.csv", delimiter=",", dtype="float32")
taxi = np.genfromtxt("data/nyc_taxis.csv", delimiter=",", dtype=np.float64)
print(taxi.dtype)

float64


In [23]:
for row in taxi[:5]:
    print(row)

[nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan]
[2.016e+03 1.000e+00 1.000e+00 5.000e+00 0.000e+00 2.000e+00 4.000e+00
 2.100e+01 2.037e+03 5.200e+01 8.000e-01 5.540e+00 1.165e+01 6.999e+01
 1.000e+00]
[2.016e+03 1.000e+00 1.000e+00 5.000e+00 0.000e+00 2.000e+00 1.000e+00
 1.629e+01 1.520e+03 4.500e+01 1.300e+00 0.000e+00 8.000e+00 5.430e+01
 1.000e+00]
[2.016e+03 1.000e+00 1.000e+00 5.000e+00 0.000e+00 2.000e+00 6.000e+00
 1.270e+01 1.462e+03 3.650e+01 1.300e+00 0.000e+00 0.000e+00 3.780e+01
 2.000e+00]
[2.016e+03 1.000e+00 1.000e+00 5.000e+00 0.000e+00 2.000e+00 6.000e+00
 8.700e+00 1.210e+03 2.600e+01 1.300e+00 0.000e+00 5.460e+00 3.276e+01
 1.000e+00]


In [24]:
taxi = taxi[1:]
print(taxi[:4])

[[2.016e+03 1.000e+00 1.000e+00 5.000e+00 0.000e+00 2.000e+00 4.000e+00
  2.100e+01 2.037e+03 5.200e+01 8.000e-01 5.540e+00 1.165e+01 6.999e+01
  1.000e+00]
 [2.016e+03 1.000e+00 1.000e+00 5.000e+00 0.000e+00 2.000e+00 1.000e+00
  1.629e+01 1.520e+03 4.500e+01 1.300e+00 0.000e+00 8.000e+00 5.430e+01
  1.000e+00]
 [2.016e+03 1.000e+00 1.000e+00 5.000e+00 0.000e+00 2.000e+00 6.000e+00
  1.270e+01 1.462e+03 3.650e+01 1.300e+00 0.000e+00 0.000e+00 3.780e+01
  2.000e+00]
 [2.016e+03 1.000e+00 1.000e+00 5.000e+00 0.000e+00 2.000e+00 6.000e+00
  8.700e+00 1.210e+03 2.600e+01 1.300e+00 0.000e+00 5.460e+00 3.276e+01
  1.000e+00]]


In [25]:
taxi = np.genfromtxt("data/nyc_taxis.csv", delimiter=",", skip_header=1)
print(taxi)

[[2.016e+03 1.000e+00 1.000e+00 ... 1.165e+01 6.999e+01 1.000e+00]
 [2.016e+03 1.000e+00 1.000e+00 ... 8.000e+00 5.430e+01 1.000e+00]
 [2.016e+03 1.000e+00 1.000e+00 ... 0.000e+00 3.780e+01 2.000e+00]
 ...
 [2.016e+03 6.000e+00 3.000e+01 ... 5.000e+00 6.334e+01 1.000e+00]
 [2.016e+03 6.000e+00 3.000e+01 ... 8.950e+00 4.475e+01 1.000e+00]
 [2.016e+03 6.000e+00 3.000e+01 ... 0.000e+00 5.484e+01 2.000e+00]]
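
The extra row of `nan` values and the effect of `skip_header=1` can be reproduced without the data file. A minimal sketch, assuming a made-up two-column CSV held in memory (`csv_text` is a hypothetical stand-in for `data/nyc_taxis.csv`):

```python
import io
import numpy as np

# Hypothetical three-line CSV standing in for data/nyc_taxis.csv.
csv_text = "pickup_year,trip_distance\n2016,21.00\n2016,16.29\n"

# Without skip_header the header row is parsed as numbers and becomes nan.
with_header = np.genfromtxt(io.StringIO(csv_text), delimiter=",")

# skip_header=1 drops the first line before parsing.
without_header = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)

print(with_header.shape, without_header.shape)
print(with_header[0])
```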


---

## Datatypes

<div class="text_cell_render border-box-sizing rendered_html">
<table>
<thead><tr>
<th>Data type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>bool_</code></td>
<td>Boolean (True or False) stored as a byte</td>
</tr>
<tr>
<td><code>int_</code></td>
<td>Default integer type (same as C <code>long</code>; normally either <code>int64</code> or <code>int32</code>)</td>
</tr>
<tr>
<td><code>intc</code></td>
<td>Identical to C <code>int</code> (normally <code>int32</code> or <code>int64</code>)</td>
</tr>
<tr>
<td><code>intp</code></td>
<td>Integer used for indexing (same as C <code>ssize_t</code>; normally either <code>int32</code> or <code>int64</code>)</td>
</tr>
<tr>
<td><code>int8</code></td>
<td>Byte (-128 to 127)</td>
</tr>
<tr>
<td><code>int16</code></td>
<td>Integer (-32768 to 32767)</td>
</tr>
<tr>
<td><code>int32</code></td>
<td>Integer (-2147483648 to 2147483647)</td>
</tr>
<tr>
<td><code>int64</code></td>
<td>Integer (-9223372036854775808 to 9223372036854775807)</td>
</tr>
<tr>
<td><code>uint8</code></td>
<td>Unsigned integer (0 to 255)</td>
</tr>
<tr>
<td><code>uint16</code></td>
<td>Unsigned integer (0 to 65535)</td>
</tr>
<tr>
<td><code>uint32</code></td>
<td>Unsigned integer (0 to 4294967295)</td>
</tr>
<tr>
<td><code>uint64</code></td>
<td>Unsigned integer (0 to 18446744073709551615)</td>
</tr>
<tr>
<td><code>float_</code></td>
<td>Shorthand for <code>float64</code>.</td>
</tr>
<tr>
<td><code>float16</code></td>
<td>Half precision float: sign bit, 5 bits exponent, 10 bits mantissa</td>
</tr>
<tr>
<td><code>float32</code></td>
<td>Single precision float: sign bit, 8 bits exponent, 23 bits mantissa</td>
</tr>
<tr>
<td><code>float64</code></td>
<td>Double precision float: sign bit, 11 bits exponent, 52 bits mantissa</td>
</tr>
<tr>
<td><code>complex_</code></td>
<td>Shorthand for <code>complex128</code>.</td>
</tr>
<tr>
<td><code>complex64</code></td>
<td>Complex number, represented by two 32-bit floats</td>
</tr>
<tr>
<td><code>complex128</code></td>
<td>Complex number, represented by two 64-bit floats</td>
</tr>
</tbody>
</table>

</div>

In [26]:
x = np.array([1,2])
print(x)
print(x.dtype)

x = np.array([1.2, 2.3])
print(x)
print(x.dtype)

[1 2]
int32
[1.2 2.3]
float64


In [27]:
x = np.array([1,2], dtype=np.int8)
print(x)
print(x.dtype)

x = np.array([12,3], dtype=np.float32)
print(x)
print(x.dtype)

[1 2]
int8
[12.  3.]
float32


In [28]:
x = np.array([1,2,128], dtype=np.int8)
print(x.dtype)
print(x)

int8
[   1    2 -128]
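
The value `128` turned into `-128` because `int8` can only hold values from -128 to 127. A minimal sketch of checking the range with `np.iinfo` before narrowing a type; `values` and `fits` are illustrative names. Newer NumPy releases may raise an error instead of wrapping when an out-of-range Python integer is passed to `np.array(..., dtype=np.int8)`, so the sketch uses `astype` to reproduce the wrap-around.

```python
import numpy as np

info = np.iinfo(np.int8)
print(info.min, info.max)  # smallest and largest representable values

values = np.array([1, 2, 128])

# Check the range before narrowing the type.
fits = bool((values >= info.min).all() and (values <= info.max).all())
print(fits)

# astype performs an unchecked cast, so 128 wraps around.
wrapped = values.astype(np.int8)
print(wrapped)
```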



---

In [29]:
print(taxi[:4])

[[2.016e+03 1.000e+00 1.000e+00 5.000e+00 0.000e+00 2.000e+00 4.000e+00
  2.100e+01 2.037e+03 5.200e+01 8.000e-01 5.540e+00 1.165e+01 6.999e+01
  1.000e+00]
 [2.016e+03 1.000e+00 1.000e+00 5.000e+00 0.000e+00 2.000e+00 1.000e+00
  1.629e+01 1.520e+03 4.500e+01 1.300e+00 0.000e+00 8.000e+00 5.430e+01
  1.000e+00]
 [2.016e+03 1.000e+00 1.000e+00 5.000e+00 0.000e+00 2.000e+00 6.000e+00
  1.270e+01 1.462e+03 3.650e+01 1.300e+00 0.000e+00 0.000e+00 3.780e+01
  2.000e+00]
 [2.016e+03 1.000e+00 1.000e+00 5.000e+00 0.000e+00 2.000e+00 6.000e+00
  8.700e+00 1.210e+03 2.600e+01 1.300e+00 0.000e+00 5.460e+00 3.276e+01
  1.000e+00]]


In [30]:
np.set_printoptions(precision=2, suppress=True)
print(taxi[:4])

[[2016.      1.      1.      5.      0.      2.      4.     21.   2037.
    52.      0.8     5.54   11.65   69.99    1.  ]
 [2016.      1.      1.      5.      0.      2.      1.     16.29 1520.
    45.      1.3     0.      8.     54.3     1.  ]
 [2016.      1.      1.      5.      0.      2.      6.     12.7  1462.
    36.5     1.3     0.      0.     37.8     2.  ]
 [2016.      1.      1.      5.      0.      2.      6.      8.7  1210.
    26.      1.3     0.      5.46   32.76    1.  ]]


---

## Ndarray shape

In [31]:
print(taxi)

[[2016.      1.      1.   ...   11.65   69.99    1.  ]
 [2016.      1.      1.   ...    8.     54.3     1.  ]
 [2016.      1.      1.   ...    0.     37.8     2.  ]
 ...
 [2016.      6.     30.   ...    5.     63.34    1.  ]
 [2016.      6.     30.   ...    8.95   44.75    1.  ]
 [2016.      6.     30.   ...    0.     54.84    2.  ]]


In [32]:
data = [[1, 2, 3],
       [50, 100, 150]]

ndarray = np.array(data)
print(ndarray.shape)

(2, 3)


In [33]:
print(taxi.shape)

(89560, 15)


In [34]:
ndarray = np.array(data)
print(ndarray.shape)

(2, 3)


In [35]:
ndarray.size

6

In [36]:
ndarray.ndim

2

## Selecting elements and Slicing

![slicing](./images/selection_rows.svg)

In [37]:
data = [[5,2,8,1,8],
       [9,6,4,0,1],
       [9,5,2,4,4],
       [5,0,8,5,5],
       [2,4,7,1,7]]
ndarray = np.array(data)
print(ndarray)

[[5 2 8 1 8]
 [9 6 4 0 1]
 [9 5 2 4 4]
 [5 0 8 5 5]
 [2 4 7 1 7]]


In [38]:
print(data[2])
print(ndarray[2])

[9, 5, 2, 4, 4]
[9 5 2 4 4]


In [39]:
print(data[2:])
print(ndarray[2:])

[[9, 5, 2, 4, 4], [5, 0, 8, 5, 5], [2, 4, 7, 1, 7]]
[[9 5 2 4 4]
 [5 0 8 5 5]
 [2 4 7 1 7]]


In [40]:
print(data[1][4])
print(ndarray[1][4])

1
1


In [41]:
print(data[1][2:])
print(ndarray[1][2:])

[4, 0, 1]
[4 0 1]



```python
ndarray[row_index , column_index]
```

In [42]:
print(ndarray)

second_row = ndarray[1]
print(second_row)

print(ndarray[1][4])
print(ndarray[1, 4])

[[5 2 8 1 8]
 [9 6 4 0 1]
 [9 5 2 4 4]
 [5 0 8 5 5]
 [2 4 7 1 7]]
[9 6 4 0 1]
1
1


In [43]:
print(ndarray)
print()
print(ndarray[:, 3])

[[5 2 8 1 8]
 [9 6 4 0 1]
 [9 5 2 4 4]
 [5 0 8 5 5]
 [2 4 7 1 7]]

[1 0 4 5 1]


In [44]:
print(ndarray[1:4, 2:4])

[[4 0]
 [2 4]
 [8 5]]


In [45]:
print(ndarray)
print()

print(ndarray[[1,3,4], 2])

[[5 2 8 1 8]
 [9 6 4 0 1]
 [9 5 2 4 4]
 [5 0 8 5 5]
 [2 4 7 1 7]]

[4 8 7]


In [46]:
print(ndarray)

print(ndarray[[False, True, False, False, True], 2:])

[[5 2 8 1 8]
 [9 6 4 0 1]
 [9 5 2 4 4]
 [5 0 8 5 5]
 [2 4 7 1 7]]
[[4 0 1]
 [7 1 7]]


## Vectorized Operations

In [47]:
ndarray1 = np.array([6,1,5,2,3,5])
ndarray2 = np.array([5,7,6,3,8,7])

data = ndarray1 + ndarray2
print(data)

[11  8 11  5 11 12]


In [48]:
data = ndarray1 - ndarray2
print(data)

[ 1 -6 -1 -1 -5 -2]


In [49]:
data = ndarray1 * ndarray2
print(data)

[30  7 30  6 24 35]


In [50]:
data = ndarray1 / ndarray2
print(data)

[1.2  0.14 0.83 0.67 0.38 0.71]


In [51]:
ndarray1 = np.array([1,2,3])
ndarray2 = np.array([1,2,3,4])

data = ndarray1 + ndarray2
print(data)

ValueError: operands could not be broadcast together with shapes (3,) (4,) 
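
The error message refers to broadcasting, the rule NumPy uses to combine arrays of different shapes. A short sketch of two shape combinations that are accepted and the one from the cell above that is not; `column`, `table` and `broadcast_ok` are illustrative names.

```python
import numpy as np

a = np.array([1, 2, 3])

# A scalar is stretched to match every element.
print(a + 10)

# Shapes (3, 1) and (3,) are compatible: the result has shape (3, 3).
column = a.reshape(3, 1)
table = column + a
print(table.shape)

# Shapes (3,) and (4,) are not compatible, as in the cell above.
try:
    a + np.array([1, 2, 3, 4])
    broadcast_ok = True
except ValueError:
    broadcast_ok = False
print(broadcast_ok)
```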

In [52]:
ndarray1 = np.array([6,1,5,2,3,5])
ndarray2 = np.array([5,7,6,3,8,7])

data = np.add(ndarray1, ndarray2)
print(data)

[11  8 11  5 11 12]


In [53]:
trip_distance = taxi[:, 7]
print(trip_distance)

[21.   16.29 12.7  ... 17.48 12.76 17.54]


In [54]:
trips_length_seconds = taxi[:, 8]
trips_length_hours = trips_length_seconds / (60 *60)

In [55]:
trips_mph = trip_distance / trips_length_hours
trips_mph

array([37.11, 38.58, 31.27, ..., 22.3 , 42.42, 36.9 ])

---

## Calculating Statistics

[ndarray.min()](https://numpy.org/doc/stable/reference/generated/numpy.ndarray.min.html)

In [63]:
trips_mph.min()

0.0

<ul>
<li><a target="_blank" href="https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.ndarray.min.html#numpy.ndarray.min"><code>ndarray.min()</code> computes the minimum value</a></li>
<li><a target="_blank" href="https://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.ndarray.max.html"><code>ndarray.max()</code> computes the maximum value</a></li>
<li><a target="_blank" href="https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.ndarray.mean.html#numpy.ndarray.mean"><code>ndarray.mean()</code> computes the mean value</a></li>
<li><a target="_blank" href="https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.ndarray.sum.html#numpy.ndarray.sum"><code>ndarray.sum()</code> computes the sum of the values</a></li>
</ul>

In [59]:
trips_mph.max()

82800.0

In [64]:
trips_mph.mean()

32.24258580925573

In [65]:
np.min(trips_mph)

0.0

In [66]:
np.max(trips_mph)

82800.0

In [67]:
np.mean(trips_mph)

32.24258580925573

In [71]:
trips_mph.median()

AttributeError: 'numpy.ndarray' object has no attribute 'median'

In [72]:
np.median(trips_mph)

22.542601629863185

![](./images/array_method_axis_none.svg)

In [79]:
data = np.random.randint(0, 10, size=[5,5])
data

array([[8, 6, 4, 3, 1],
       [1, 0, 8, 9, 6],
       [2, 7, 2, 1, 8],
       [0, 6, 8, 5, 0],
       [6, 3, 4, 5, 1]])

In [80]:
data.max()

9

In [81]:
data.min()

0

![](./images/array_method_axis_1.svg)

In [82]:
print(data)

[[8 6 4 3 1]
 [1 0 8 9 6]
 [2 7 2 1 8]
 [0 6 8 5 0]
 [6 3 4 5 1]]


In [83]:
data.max(axis=1) # maximum value of each row

array([8, 9, 8, 8, 6])

In [84]:
data.min(axis=1)

array([1, 0, 1, 0, 1])

![](./images/array_method_axis_0.svg)

In [85]:
print(data)

[[8 6 4 3 1]
 [1 0 8 9 6]
 [2 7 2 1 8]
 [0 6 8 5 0]
 [6 3 4 5 1]]


In [86]:
data.max(axis=0)

array([8, 7, 8, 9, 8])

In [87]:
data.min(axis=0)

array([0, 0, 2, 1, 0])

![](./images/axis_param.svg)

---

In [88]:
taxi_10rows = taxi[:10, :]
taxi_10rows

array([[2016.  ,    1.  ,    1.  ,    5.  ,    0.  ,    2.  ,    4.  ,
          21.  , 2037.  ,   52.  ,    0.8 ,    5.54,   11.65,   69.99,
           1.  ],
       [2016.  ,    1.  ,    1.  ,    5.  ,    0.  ,    2.  ,    1.  ,
          16.29, 1520.  ,   45.  ,    1.3 ,    0.  ,    8.  ,   54.3 ,
           1.  ],
       [2016.  ,    1.  ,    1.  ,    5.  ,    0.  ,    2.  ,    6.  ,
          12.7 , 1462.  ,   36.5 ,    1.3 ,    0.  ,    0.  ,   37.8 ,
           2.  ],
       [2016.  ,    1.  ,    1.  ,    5.  ,    0.  ,    2.  ,    6.  ,
           8.7 , 1210.  ,   26.  ,    1.3 ,    0.  ,    5.46,   32.76,
           1.  ],
       [2016.  ,    1.  ,    1.  ,    5.  ,    0.  ,    2.  ,    6.  ,
           5.56,  759.  ,   17.5 ,    1.3 ,    0.  ,    0.  ,   18.8 ,
           2.  ],
       [2016.  ,    1.  ,    1.  ,    5.  ,    0.  ,    4.  ,    2.  ,
          21.45, 2004.  ,   52.  ,    0.8 ,    0.  ,   52.8 ,  105.6 ,
           1.  ],
       [2016.  ,    1.  ,    1.  ,    5.

In [91]:
fare_costs = taxi_10rows[:, 9:13]
fare_costs

array([[ 52.  ,   0.8 ,   5.54,  11.65],
       [ 45.  ,   1.3 ,   0.  ,   8.  ],
       [ 36.5 ,   1.3 ,   0.  ,   0.  ],
       [ 26.  ,   1.3 ,   0.  ,   5.46],
       [ 17.5 ,   1.3 ,   0.  ,   0.  ],
       [ 52.  ,   0.8 ,   0.  ,  52.8 ],
       [ 24.5 ,   1.3 ,   0.  ,   6.45],
       [ 21.5 ,   1.3 ,   0.  ,   0.  ],
       [109.5 ,   0.8 ,  11.08,  10.  ],
       [ 36.  ,   1.3 ,   0.  ,   0.  ]])

In [93]:
fare_costs_total = fare_costs.sum(axis=1)
print(fare_costs_total)
print(taxi_10rows[:, 13])

[ 69.99  54.3   37.8   32.76  18.8  105.6   32.25  22.8  131.38  37.3 ]
[ 69.99  54.3   37.8   32.76  18.8  105.6   32.25  22.8  131.38  37.3 ]


# Boolean Indexing

## Boolean Arrays

In [104]:
bool_array = [True, False, True, False]
data = np.array([1,2,3,4])
data[bool_array]

array([1, 3])

In [106]:
data = np.random.randint(10, size=(5,))
print(data.shape, type(data), data)

data = data + 10
print(data.shape, type(data), data)

(5,) <class 'numpy.ndarray'> [3 3 7 5 5]
(5,) <class 'numpy.ndarray'> [13 13 17 15 15]


In [107]:
data = np.random.randint(10, size=(5,))
print(data.shape, type(data), data)

bool_array = data < 5
print(bool_array.shape, type(bool_array), bool_array)

(5,) <class 'numpy.ndarray'> [4 4 5 4 2]
(5,) <class 'numpy.ndarray'> [ True  True False  True  True]


In [108]:
data = np.random.randint(10, size=(5,))
print(data.shape, type(data), data)

bool_array = data > 5
print(bool_array.shape, type(bool_array), bool_array)

(5,) <class 'numpy.ndarray'> [3 8 5 9 2]
(5,) <class 'numpy.ndarray'> [False  True False  True False]


In [111]:
data = np.random.randint(10, size=(5,))
print(data.shape, type(data), data)

bool_array = data == 5
print(bool_array.shape, type(bool_array), bool_array)

(5,) <class 'numpy.ndarray'> [8 1 7 8 4]
(5,) <class 'numpy.ndarray'> [False False False False False]


## Boolean Indexing

In [116]:
data = np.random.randint(10, size=(5,))
print(data.shape, type(data), data)

filter_ = data > 5
print(filter_)

slice_ = data[filter_]
print(slice_)

(5,) <class 'numpy.ndarray'> [1 9 9 6 2]
[False  True  True  True False]
[9 9 6]


## Boolean Indexing 2D ndarrays

In [117]:
data = np.random.randint(10, size=(5,5))
print(data)

[[0 3 7 4 6]
 [4 4 5 3 1]
 [6 2 5 0 2]
 [2 6 8 4 9]
 [2 5 0 6 0]]


In [118]:
filter_rows = np.array([True, False, False, False, True])
slice_ = data[filter_rows, :]
print(slice_)

[[0 3 7 4 6]
 [2 5 0 6 0]]


In [122]:
filter_cols = np.array([True, False, False, True, False])
slice_ = data[:, filter_cols]
print(slice_)

[[0 4]
 [4 3]
 [6 0]
 [2 4]
 [2 6]]


In [124]:
print(data)
filter_ = np.array([[True, False, False, True, False],
                   [True, False, True, True, False],
                   [True, False, False, True, False],
                   [False, True, False, True, False],
                   [True, False, False, True, True]])
slice_ = data[filter_]
print(slice_)

[[0 3 7 4 6]
 [4 4 5 3 1]
 [6 2 5 0 2]
 [2 6 8 4 9]
 [2 5 0 6 0]]
[0 4 4 5 3 6 0 6 4 2 6 0]


In [125]:
print(data.shape)
print(data)

filter_ = data > 5
print(filter_.shape)
print(filter_)

slice_ = data[filter_]
print(slice_.shape)
print(slice_)

(5, 5)
[[0 3 7 4 6]
 [4 4 5 3 1]
 [6 2 5 0 2]
 [2 6 8 4 9]
 [2 5 0 6 0]]
(5, 5)
[[False False  True False  True]
 [False False False False False]
 [ True False False False False]
 [False  True  True False  True]
 [False False False  True False]]
(7,)
[7 6 6 6 8 9 6]


---

In [127]:
months = taxi[:, 1]
print(months.shape)
print(months)

(89560,)
[1. 1. 1. ... 6. 6. 6.]


In [129]:
filter_ = months == 1
print(filter_.shape)
print(filter_)

(89560,)
[ True  True  True ... False False False]


In [133]:
januar_rides = taxi[filter_]
print(januar_rides.shape)
print(januar_rides[:5])

(13481, 15)
[[2016.      1.      1.      5.      0.      2.      4.     21.   2037.
    52.      0.8     5.54   11.65   69.99    1.  ]
 [2016.      1.      1.      5.      0.      2.      1.     16.29 1520.
    45.      1.3     0.      8.     54.3     1.  ]
 [2016.      1.      1.      5.      0.      2.      6.     12.7  1462.
    36.5     1.3     0.      0.     37.8     2.  ]
 [2016.      1.      1.      5.      0.      2.      6.      8.7  1210.
    26.      1.3     0.      5.46   32.76    1.  ]
 [2016.      1.      1.      5.      0.      2.      6.      5.56  759.
    17.5     1.3     0.      0.     18.8     2.  ]]


----

In [136]:
for month in range(1, 13):
    month_array = taxi[:, 1]
    filter_ = month_array == month
    month_rides = taxi[filter_]
    print(f"Month {month}: {month_rides.shape[0]}")

Month 1: 13481
Month 2: 13333
Month 3: 15547
Month 4: 14810
Month 5: 16650
Month 6: 15739
Month 7: 0
Month 8: 0
Month 9: 0
Month 10: 0
Month 11: 0
Month 12: 0
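
The loop above builds one boolean filter per month. The same counts can be obtained in a single pass with `np.unique(..., return_counts=True)`, which only reports the months that actually occur. A sketch on a tiny made-up `months` array standing in for `taxi[:, 1]`:

```python
import numpy as np

# Hypothetical stand-in for the month column, taxi[:, 1].
months = np.array([1., 1., 2., 6., 6., 6.])

# One pass over the data instead of one boolean filter per month.
values, counts = np.unique(months, return_counts=True)
for month, count in zip(values, counts):
    print(f"Month {int(month)}: {count}")
```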


----

In [138]:
print(trips_mph.shape)
print(trips_mph)

(89560,)
[37.11 38.58 31.27 ... 22.3  42.42 36.9 ]


In [139]:
filter_ = trips_mph > 20_000
print(filter_.shape)

(89560,)


In [140]:
fast_trips = taxi[filter_, :]
print(fast_trips.shape)
print(fast_trips)

(7, 15)
[[2016.      1.     22.      5.      3.      2.      2.     23.      1.
     2.5     0.8     0.      0.      3.3     2.  ]
 [2016.      2.     13.      6.      3.      2.      2.     19.6     1.
     2.5     0.8     0.      0.      3.3     2.  ]
 [2016.      3.     23.      3.      2.      2.      2.     16.7     2.
    52.      0.8     0.     10.55   63.35    1.  ]
 [2016.      3.     28.      1.      4.      3.      3.     17.8     2.
     2.5     1.8     0.      0.      4.3     2.  ]
 [2016.      3.     30.      3.      4.      2.      2.     17.2     2.
     2.5     1.8     0.      0.      4.3     2.  ]
 [2016.      4.     24.      7.      5.      3.      3.     16.9     3.
    52.      0.8     0.      0.     52.8     3.  ]
 [2016.      6.     30.      4.      3.      2.      2.     27.1     4.
    75.      0.8     0.      0.     75.8     2.  ]]


In [141]:
fast_trips[:, [7, 8]]

array([[23. ,  1. ],
       [19.6,  1. ],
       [16.7,  2. ],
       [17.8,  2. ],
       [17.2,  2. ],
       [16.9,  3. ],
       [27.1,  4. ]])

---

# Changing Values

In [144]:
# Changing single value
data = np.random.randint(10, size=(10,))
print(data)

data[0] = 100
print(data)

[1 1 0 0 1 5 6 1 5 3]
[100   1   0   0   1   5   6   1   5   3]


In [146]:
# Changing values with slicing for multiple elements
data = np.random.randint(10, size=(10,))
print(data)

data[:5] = 100
print(data)

[6 2 1 3 2 1 6 6 8 5]
[100 100 100 100 100   1   6   6   8   5]


In [148]:
# Changing values for 2D
data = np.random.randint(10, size=(5,5))
print(data)

data[0][3]= 100
print(data)

[[1 8 1 1 1]
 [3 3 5 6 5]
 [1 9 2 2 6]
 [3 5 6 9 7]
 [3 0 0 9 0]]
[[  1   8   1 100   1]
 [  3   3   5   6   5]
 [  1   9   2   2   6]
 [  3   5   6   9   7]
 [  3   0   0   9   0]]


In [149]:
# Changing whole row
data = np.random.randint(10, size=(5,5))
print(data)

data[0] = 100
print(data)

[[9 4 4 0 9]
 [4 2 4 6 6]
 [8 2 3 6 3]
 [4 4 9 1 0]
 [3 2 2 3 6]]
[[100 100 100 100 100]
 [  4   2   4   6   6]
 [  8   2   3   6   3]
 [  4   4   9   1   0]
 [  3   2   2   3   6]]


In [151]:
# Changing whole columns
data = np.random.randint(10, size=(5,5))
print(data)

data[:, 3] = 100
print(data)

[[3 3 5 1 0]
 [3 9 8 5 7]
 [5 5 4 6 0]
 [4 0 2 0 4]
 [1 7 1 9 4]]
[[  3   3   5 100   0]
 [  3   9   8 100   7]
 [  5   5   4 100   0]
 [  4   0   2 100   4]
 [  1   7   1 100   4]]


In [160]:
# Changing values using a boolean array
data = np.random.randint(10, size=(5,5))
print(data)

filter_ = data > 5
data[filter_] = 100
print(data)

[[6 5 4 6 4]
 [6 8 2 3 0]
 [0 6 2 7 2]
 [5 8 6 5 8]
 [8 5 1 9 6]]
[[100   5   4 100   4]
 [100 100   2   3   0]
 [  0 100   2 100   2]
 [  5 100 100   5 100]
 [100   5   1 100 100]]


----

In [164]:
total_amount = taxi[:, -2]

filter_ = total_amount < 0
taxi[filter_, 13] = 0

print(taxi[:4])

[[2016.      1.      1.      5.      0.      2.      4.     21.   2037.
    52.      0.8     5.54   11.65   69.99    1.  ]
 [2016.      1.      1.      5.      0.      2.      1.     16.29 1520.
    45.      1.3     0.      8.     54.3     1.  ]
 [2016.      1.      1.      5.      0.      2.      6.     12.7  1462.
    36.5     1.3     0.      0.     37.8     2.  ]
 [2016.      1.      1.      5.      0.      2.      6.      8.7  1210.
    26.      1.3     0.      5.46   32.76    1.  ]]


## Ndarray copy

In [169]:
# Slicing doesn't create a copy of the data; it returns a view
data = np.random.randint(10, size=(5,5))
print(data)

data_slice = data[1:4, 1:4]
print(data_slice)

data_slice[0, 0] = -10
print("Data slice")
print(data_slice)

print("Original data")
print(data)

[[9 2 8 9 8]
 [0 9 3 2 7]
 [2 7 1 9 6]
 [2 7 4 2 5]
 [4 5 7 6 4]]
[[9 3 2]
 [7 1 9]
 [7 4 2]]
Data slice
[[-10   3   2]
 [  7   1   9]
 [  7   4   2]]
Original data
[[  9   2   8   9   8]
 [  0 -10   3   2   7]
 [  2   7   1   9   6]
 [  2   7   4   2   5]
 [  4   5   7   6   4]]


In [171]:
# Creating a copy of data
data = np.random.randint(10, size=(5,5))
print(data)

data_slice = data[1:4, 1:4].copy()
print(data_slice)

data_slice[0, 0] = -10
print("Data slice")
print(data_slice)

print("Original data")
print(data)

[[6 4 5 5 6]
 [9 6 7 9 6]
 [7 0 8 5 6]
 [8 7 7 7 4]
 [8 9 3 5 2]]
[[6 7 9]
 [0 8 5]
 [7 7 7]]
Data slice
[[-10   7   9]
 [  0   8   5]
 [  7   7   7]]
Original data
[[6 4 5 5 6]
 [9 6 7 9 6]
 [7 0 8 5 6]
 [8 7 7 7 4]
 [8 9 3 5 2]]
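
Whether two arrays refer to the same memory can be checked with `np.shares_memory` instead of modifying values and comparing printouts. A small sketch; `view`, `copied` and `masked` are illustrative names.

```python
import numpy as np

data = np.arange(25).reshape(5, 5)

view = data[1:4, 1:4]            # basic slicing returns a view
copied = data[1:4, 1:4].copy()   # .copy() allocates new memory
masked = data[data > 20]         # boolean indexing also returns a copy

print(np.shares_memory(data, view))
print(np.shares_memory(data, copied))
print(np.shares_memory(data, masked))
```

Only the plain slice shares memory with `data`, so it is the only one through which the original array can be changed.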


In [172]:
# Original data that the two analyses below start from
data = np.random.randint(10, size=(5,5))
print(data)

[[7 2 2 9 4]
 [0 5 1 1 8]
 [1 6 0 3 2]
 [2 2 4 7 1]
 [8 9 3 1 8]]


In [173]:
# Analysis 1
data1 = data.copy()
data1[0] = 100
print(data1)

[[100 100 100 100 100]
 [  0   5   1   1   8]
 [  1   6   0   3   2]
 [  2   2   4   7   1]
 [  8   9   3   1   8]]


In [174]:
# Analysis 2
data2 = data.copy()
data2[0] = 200
print(data)

[[7 2 2 9 4]
 [0 5 1 1 8]
 [1 6 0 3 2]
 [2 2 4 7 1]
 [8 9 3 1 8]]


In [176]:
print(data)

[[7 2 2 9 4]
 [0 5 1 1 8]
 [1 6 0 3 2]
 [2 2 4 7 1]
 [8 9 3 1 8]]


---

# Changing ndarrays

## Extending rows and columns

[numpy.concatenate()](https://numpy.org/doc/stable/reference/generated/numpy.concatenate.html)

In [190]:
nd1 = np.zeros(shape=3)
print(nd1.shape, nd1)

nd2 = np.ones(shape=3)
print(nd2.shape, nd2)

(3,) [0. 0. 0.]
(3,) [1. 1. 1.]


In [193]:
data = np.concatenate([nd1, nd2], axis=0)
print(data.shape)
print(data)

(6,)
[0. 0. 0. 1. 1. 1.]


In [185]:
nd1 = np.zeros(shape=(2,3))
print(nd1.shape)
print(nd1)

nd2 = np.ones(shape=(3))
print(nd2.shape)
print(nd2)

data = np.concatenate([nd1, nd2], axis=0)
print(data.shape)
print(data)

(2, 3)
[[0. 0. 0.]
 [0. 0. 0.]]
(3,)
[1. 1. 1.]


ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)

[numpy.reshape()](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html)

In [187]:
print(nd2.shape)

nd2 = np.reshape(nd2, newshape=(1,3))
print(nd2.shape)
print(nd2)

(3,)
(1, 3)
[[1. 1. 1.]]


In [189]:
data = np.concatenate([nd1, nd2], axis=0)
print(data.shape)
print(data)

(3, 3)
[[0. 0. 0.]
 [0. 0. 0.]
 [1. 1. 1.]]


---

In [201]:
nd1 = np.zeros(shape=3)
print(nd1.shape, nd1)

nd2 = np.ones(shape=3)
print(nd2.shape, nd2)

data = np.concatenate([nd1, nd2], axis=0)
print("Data before reshape")
print(data)

data = np.reshape(data, newshape=(2,3))
#data = np.reshape(data, newshape=(2,-1)) # NumPy infers the missing dimension on its own
print("Data after reshape")
print(data)

(3,) [0. 0. 0.]
(3,) [1. 1. 1.]
Data before reshape
[0. 0. 0. 1. 1. 1.]
Data after reshape
[[0. 0. 0.]
 [1. 1. 1.]]


In [199]:
nd1 = np.zeros(shape=(1,3))
print(nd1.shape, nd1)

nd2 = np.ones(shape=(1,3))
print(nd2.shape, nd2)

data = np.concatenate([nd1, nd2], axis=0)
print(data)

(1, 3) [[0. 0. 0.]]
(1, 3) [[1. 1. 1.]]
[[0. 0. 0.]
 [1. 1. 1.]]


In [203]:
nd1 = np.zeros(shape=4)
print(nd1.shape, nd1)

nd2 = np.ones(shape=3)
print(nd2.shape, nd2)

data = np.concatenate([nd1, nd2], axis=0)
print("Data before reshape")
print(data)

data = np.reshape(data, newshape=(2,-1)) # NumPy infers the missing dimension on its own
print("Data after reshape")
print(data)

(4,) [0. 0. 0. 0.]
(3,) [1. 1. 1.]
Data before reshape
[0. 0. 0. 0. 1. 1. 1.]


ValueError: cannot reshape array of size 7 into shape (2,newaxis)
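
The reshape fails because 7 elements cannot be split evenly into 2 rows. A sketch of checking divisibility before reshaping with `-1`; `rows` and `can_reshape` are illustrative names, and dropping the last element is just one possible way to make the size fit.

```python
import numpy as np

data = np.arange(7)

# -1 asks NumPy to infer one dimension, which only works
# when the total size is divisible by the known dimensions.
rows = 2
can_reshape = data.size % rows == 0
print(can_reshape)

# Dropping one element makes the size divisible again.
reshaped = data[:6].reshape(rows, -1)
print(reshaped.shape)
```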

## How shapes are handled

In [210]:
a = np.arange(12)
print(a)

[ 0  1  2  3  4  5  6  7  8  9 10 11]


<pre class="lang-py s-code-block"><code class="hljs language-python">┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘
</code></pre>

In [211]:
a.__array_interface__["data"]

(2218900872016, False)

In [212]:
a.shape

(12,)

<pre class="lang-py s-code-block"><code class="hljs language-python">i= 0    1    2    3    4    5    6    7    8    9   10   11
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘
</code></pre>

In [213]:
a[2]

2

In [214]:
b = a.reshape((3,4))
print(b)

[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]


In [215]:
b.__array_interface__["data"]

(2218900872016, False)

<pre class="lang-py s-code-block"><code class="hljs language-python">i= 0    0    0    0    1    1    1    1    2    2    2    2
j= 0    1    2    3    0    1    2    3    0    1    2    3
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘
</code></pre>

In [216]:
b[0, 2]

2
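The cells above show that `a` and `b` report the same data address, so reshaping only changes how indices map onto one buffer. A short sketch of that mapping, assuming the default row-major (C-order) layout: the strides say how far to step in memory per index, and `b[i, j]` lands on flat position `i * 4 + j`. The names `row_step`, `col_step`, and `flat_index` are illustrative.

```python
import numpy as np

a = np.arange(12)
b = a.reshape((3, 4))

# Both arrays view the same underlying buffer
print(np.shares_memory(a, b))

# Strides are given in bytes; divide by itemsize to get steps in elements
row_step, col_step = (s // b.itemsize for s in b.strides)
print(row_step, col_step)

# b[i, j] corresponds to flat position i * row_step + j * col_step
i, j = 0, 2
flat_index = i * row_step + j * col_step
print(flat_index, a[flat_index] == b[i, j])

# Writing through one view is visible through the other
b[2, 3] = -1
print(a[11])
```
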

---

# Which airport is the most visited

In [220]:
jfk_filter = taxi[:, 6] == 2
jfk_subset = taxi[jfk_filter]
# print(jfk_filter.sum())
print(jfk_subset.shape[0])

11832


In [222]:
laguardia_filter = taxi[:, 6] == 3
laguardia_subset = taxi[laguardia_filter]
# print(laguardia_filter.sum())
print(laguardia_subset.shape[0])

16602


In [224]:
newark_filter = taxi[:, 6] == 5
newark_subset = taxi[newark_filter]
# print(newark_filter.sum())
print(newark_subset.shape[0])

63
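The three cells above repeat the same filter with a different location code each time (2, 3, and 5). A possible sketch of doing the comparison in one pass is shown below. It runs on a small made-up array `codes` standing in for `taxi[:, 6]`, so the counts are not the real ones, and the `airports` dictionary simply restates the codes used above.

```python
import numpy as np

# Made-up stand-in for taxi[:, 6]; NOT the real data
codes = np.array([2, 3, 3, 5, 2, 3, 1, 0])

# Location codes as used in the cells above
airports = {"JFK": 2, "LaGuardia": 3, "Newark": 5}

# A boolean mask sums to the number of matching rows
counts = {name: int((codes == code).sum()) for name, code in airports.items()}
print(counts)

# Airport with the largest count in this toy array
busiest = max(counts, key=counts.get)
print(busiest)
```

With the real `taxi` array, replacing `codes` by `taxi[:, 6]` should give the same counts as the three separate cells.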


---

# Average fare per mile for airport trips

In [226]:
filter_ = taxi[:, 6] == 2
subset = taxi[filter_]

total_amount_mean = subset[:, 13].mean()
distance_mean = subset[:, 7].mean()
avg_dollar_per_mile = total_amount_mean / distance_mean
print(f"JFK - Avg. $ / mile: {avg_dollar_per_mile:.2f}")

JFK - Avg. $ / mile: 3.76


In [227]:
filter_ = taxi[:, 6] == 3
subset = taxi[filter_]

total_amount_mean = subset[:, 13].mean()
distance_mean = subset[:, 7].mean()
avg_dollar_per_mile = total_amount_mean / distance_mean
print(f"La Guardia - Avg. $ / mile: {avg_dollar_per_mile:.2f}")

La Guardia - Avg. $ / mile: 4.29


In [228]:
filter_ = taxi[:, 6] == 5
subset = taxi[filter_]

total_amount_mean = subset[:, 13].mean()
distance_mean = subset[:, 7].mean()
avg_dollar_per_mile = total_amount_mean / distance_mean
print(f"Newark - Avg. $ / mile: {avg_dollar_per_mile:.2f}")

Newark - Avg. $ / mile: 4.23
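The three cells above differ only in the location code, so the calculation can be wrapped in a function. The sketch below assumes the same column positions as above (6 for the location code, 7 for distance, 13 for total amount) and runs on a tiny synthetic array `toy`, not on the real data. The function name and its parameters are hypothetical.

```python
import numpy as np

def avg_dollar_per_mile(trips, code, code_col=6, dist_col=7, total_col=13):
    """Mean total amount divided by mean distance for rows matching `code`."""
    subset = trips[trips[:, code_col] == code]
    return subset[:, total_col].mean() / subset[:, dist_col].mean()

# Synthetic data: 3 trips, 14 columns, only the columns used here are filled in
toy = np.zeros((3, 14))
toy[:, 6] = [2, 2, 3]            # location codes
toy[:, 7] = [10.0, 20.0, 5.0]    # distances in miles
toy[:, 13] = [40.0, 50.0, 25.0]  # total amounts in dollars

for name, code in [("JFK", 2), ("La Guardia", 3)]:
    print(f"{name} - Avg. $ / mile: {avg_dollar_per_mile(toy, code):.2f}")
```

Dividing the mean amount by the mean distance is the same as total dollars over total miles. It is not the mean of the per-trip ratios, which would weight short trips more heavily and break on zero-distance rows.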


---