# NumPy

## Understanding Data Types in Python


```C
/* C code */
int result = 0;
for(int i=0; i<100; i++){
    result += i;
}
```

While in Python the equivalent operation could be written this way:
```python
# Python code
result = 0
for i in range(100):
    result += i
```


In [1]:
x = 4 

In [2]:
x = 'four'

```C
/* C code */
int x = 4;
x = "four";  // FAILS
```

### A Python Integer Is More Than Just an Integer

```C
struct _longobject {
    long ob_refcnt;
    PyTypeObject *ob_type;
    size_t ob_size;
    long ob_digit[1];
};
```

A single integer in Python 3.4 actually contains four pieces:
- ob_refcnt, a reference count that helps Python silently handle memory allocation and deallocation
- ob_type, which encodes the type of the variable
- ob_size, which specifies the size of the following data members
- ob_digit, which contains the actual integer value that we expect the Python variable to represent.
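
The overhead of these extra fields can be observed from Python itself. The snippet below is a small sketch, assuming CPython (the exact byte counts vary between Python versions and platforms). It uses the standard `sys.getsizeof` function to show that even a small integer object takes more space than a plain 8-byte C integer.

```python
import sys

# Size in bytes of a small Python integer object (value plus object header).
small_int_size = sys.getsizeof(4)

# A much larger integer needs more digits, so the object grows.
big_int_size = sys.getsizeof(10 ** 100)

print(small_int_size, big_int_size)
```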

<img src="https://jakevdp.github.io/PythonDataScienceHandbook/figures/cint_vs_pyint.png" alt="Integer Memory Layout">

### A Python List Is More Than Just a List


In [6]:
l = list(range(10))
l

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [7]:
type(l)

list

In [8]:
l2 = [str(c) for c in l]
l2

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

For every element, Python stores a good deal of extra information alongside the value itself.
This is what allows a list to contain elements of different types:

In [9]:
l3 = [1, "kk", True]
l3

[1, 'kk', True]

NumPy creates an object that stores this information only once, rather than separately for each element, which makes it more efficient and faster.
The drawback is that a NumPy array can hold only elements of the same type.
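
A rough way to see the difference is to compare memory use. This sketch assumes CPython and an explicit `int64` dtype; `sys.getsizeof` only approximates the true footprint of a list, so the numbers should be read as illustrative.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
np_arr = np.arange(n, dtype=np.int64)

# A list stores one pointer per element, and every element is a full Python object.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)

# An array stores the raw values back to back; the type is recorded once.
array_bytes = np_arr.nbytes

print(list_bytes, array_bytes)
```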


<img src="https://jakevdp.github.io/PythonDataScienceHandbook/figures/array_vs_list.png" alt="Array Memory Layout">

### Fixed-Type Arrays in Python


## How Vectorization Makes Code Faster



<p><img alt="Translating Python code to bytecode" src="https://s3.amazonaws.com/dq-content/289/bytecode.svg"></p>


<table>
<thead>
<tr>
<th>Language Type</th>
<th>Example</th>
<th>Time taken to write program</th>
<th>Control over program performance</th>
</tr>
</thead>
<tbody>
<tr>
<td>High-Level</td>
<td>Python</td>
<td>Low</td>
<td>Low</td>
</tr>
<tr>
<td>Low-Level</td>
<td>C</td>
<td>High</td>
<td>High</td>
</tr>
</tbody>
</table>



<p><img alt="For loop to sum rows" src="https://s3.amazonaws.com/dq-content/289/for_loop.svg"></p>

How can we speed up our code? In Python we have little control over memory, but programs are quick to write. In C it is the reverse: writing takes longer, but we have fine control over how the program runs. NumPy gives us both: we still write Python, while the data is stored and processed efficiently in memory. The technique behind this speed is called vectorization.

EXAMPLE: We sum the elements of each row in a list of lists. Python goes through the loop one iteration at a time, adding element by element.

In [13]:
my_numbers = [[6,5],[1,3],[5,6]]
sums = []
for row in my_numbers:
    row_sum= row[0] + row[1]
    sums.append(row_sum)
    
print(sums)

[11, 4, 11]


With vectorization, the same operation is applied to several data points at once rather than one at a time. The processor feature that makes this possible is called SIMD (Single Instruction, Multiple Data).
Vectorization is already built into the NumPy library, which is why NumPy is much faster than plain Python code.
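
The row-sum loop from the example above can be rewritten without an explicit Python loop. A minimal sketch using the same `my_numbers` data:

```python
import numpy as np

my_numbers = np.array([[6, 5], [1, 3], [5, 6]])

# Add the first column to the second column in a single vectorized operation.
sums = my_numbers[:, 0] + my_numbers[:, 1]

print(sums)
```

`my_numbers.sum(axis=1)` gives the same result and works for any number of columns.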


<p><img alt="Unvectorized operation" src="https://s3.amazonaws.com/dq-content/289/unvectorized.svg"></p>

<p><img alt="Vectorized operation" src="https://s3.amazonaws.com/dq-content/289/vectorized.svg"></p>



## NumPy

NumPy is short for Numerical Python; it is used for numerical computing in Python.
It is also the foundation for other libraries (pandas, for example, is built on NumPy).

NumPy includes tools for linear algebra, reading data, generating random numbers, and more.
It was designed for processing large amounts of data, so its operations can be tens of times faster than in plain Python and use far less RAM.

NumPy has very good online documentation.

In [16]:
# first we need to import the numpy library so that we can use it
import numpy as np

### NumPy ndarrays

An ndarray is a multidimensional (N-dimensional) array.



<p><img alt="Dimensional Arrays" src="https://s3.amazonaws.com/dq-content/289/dimensional_arrays.svg"></p>



#### Create an array



In [19]:
# create a list
list1 = [6,7.4,8,56,77]

# convert it to a NumPy array
arr1 = np.array(list1)
print(arr1)

[ 6.   7.4  8.  56.  77. ]


NOTE: all elements of an ndarray must be of the same type.
If one element is an integer and another is a float, NumPy converts all of them to float.
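
This conversion can be checked through the array's `dtype` attribute. A short sketch (the variable names are only illustrative):

```python
import numpy as np

ints_only = np.array([6, 8, 56])
mixed = np.array([6, 7.4, 8])

# Integers alone stay integers; one float is enough to upcast the whole array.
print(ints_only.dtype.kind)  # 'i' means signed integer
print(mixed.dtype)           # float64
```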

In [21]:
type(list1)

list

In [22]:
type(arr1)

numpy.ndarray

In [23]:
# multidimensional ndarray
data2 = [[2,4,5.6],[3,7,9]]
arr2 = np.array(data2)
print(arr2)

[[2.  4.  5.6]
 [3.  7.  9. ]]


There are several ways to create new ndarrays:

In [25]:
# ones: creates a matrix of ones with shape (n, m)
np.ones((3,5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [27]:
# arange: similar to Python's built-in range function
np.arange(0,20,4)

array([ 0,  4,  8, 12, 16])

In [30]:
# zeros
# one-dimensional array:
np.zeros(10)

# two-dimensional ndarray
np.zeros((3,5))

array([[0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0.]])

In [34]:
# linspace: we define an interval and how many evenly spaced points (endpoints included) we want from it
np.linspace(0,1,5)

array([0.  , 0.25, 0.5 , 0.75, 1.  ])

In [40]:
# randint: random integers from 0 to 9 (the upper bound 10 is excluded), in a 4x4 matrix
np.random.randint(0,10,(4,4))

array([[5, 3, 7, 2],
       [9, 4, 2, 5],
       [4, 4, 6, 0],
       [9, 8, 0, 4]])
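
Because the upper bound of `np.random.randint` is exclusive, the value 10 never appears in the result. A quick check on a larger sample:

```python
import numpy as np

sample = np.random.randint(0, 10, (100, 100))

# Every value lies in the half-open interval [0, 10).
print(sample.min() >= 0, sample.max() < 10)
```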

In [43]:
# random.random: random numbers in the interval [0, 1), in a 3x3 matrix
np.random.random((3,3))

array([[0.94205927, 0.04114966, 0.29969938],
       [0.76715343, 0.70233411, 0.68278897],
       [0.47694932, 0.87846478, 0.55794307]])

In [44]:
# eye: identity matrix, with ones on the diagonal and zeros elsewhere
np.eye(4)

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [45]:
# full: creates a matrix of the given shape filled with a constant value
np.full((2,5),7)

array([[7, 7, 7, 7, 7],
       [7, 7, 7, 7, 7]])

In [49]:
# empty: allocates the array without initializing it, so the values are whatever
# happens to be in that memory (here probably left over from np.eye(4)); do not rely on them
np.empty((4,4))

array([[1., 0., 0., 0.],
       [0., 1., 0., 0.],
       [0., 0., 1., 0.],
       [0., 0., 0., 1.]])

In [50]:
np.empty((3,3))

array([[0.94205927, 0.04114966, 0.29969938],
       [0.76715343, 0.70233411, 0.68278897],
       [0.47694932, 0.87846478, 0.55794307]])
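
Since `np.empty` leaves the memory uninitialized, a common pattern is to allocate the array and then overwrite every element before reading any of them. A minimal sketch:

```python
import numpy as np

# Allocate without initializing; the contents are arbitrary at this point.
out = np.empty((2, 3))

# Overwrite every element before using the array.
out[:] = 7.0

print(out)
```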

#### Understanding NumPy ndarrays

In [56]:
data3 = np.random.randint(0,10,(4,7))
print(data3)

[[1 9 9 9 4 1 0]
 [7 2 5 9 4 2 0]
 [7 5 4 0 7 8 9]
 [8 7 9 2 6 4 0]]


In [53]:
# ndim: the number of dimensions of the ndarray
data3.ndim

2

In [54]:
# shape: the size along each dimension (the tuple has one entry per dimension)
data3.shape

(4, 7)

In [59]:
# size: the total number of elements in the array
data3.size

28

In [63]:
# itemsize: how much memory one element takes up, in bytes
data3.itemsize

8

In [62]:
# nbytes: the total number of bytes the array takes up
data3.nbytes

224
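
These attributes are related to one another. The sketch below fixes the dtype to `int64` explicitly (the default integer type depends on the platform) so that the numbers match the output above:

```python
import numpy as np

arr = np.zeros((4, 7), dtype=np.int64)

# size is the product of the shape; nbytes is size times itemsize.
print(arr.ndim, arr.shape, arr.size, arr.itemsize, arr.nbytes)
```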

#### Selecting and Slicing Rows and Items from ndarrays

<p><img alt="Selecting rows from a 2D ndarray" src="https://s3.amazonaws.com/dq-content/289/selection_rows.svg"></p>



This is how we select a single item from a 2D ndarray:

<p><img alt="Selecting a single item from a 2D ndarray" src="https://s3.amazonaws.com/dq-content/289/selection_item.svg"></p>


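ndarray[n, m]: selects the element in row n and column m (indices start at 0)

Ways of selecting elements:
- an integer: 5
- a slice: 0:5, 5:
- a colon on its own (:), which selects everything along that dimension
- a list of indices: [1,5,8]
- a boolean array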
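
Each of these selection styles is shown once in the sketch below, on a small array built with `np.arange` so that the results are easy to verify by hand (the variable names are only illustrative):

```python
import numpy as np

a = np.arange(25).reshape(5, 5)   # 5x5 array holding the values 0..24

single = a[2, 3]          # integer indices: one element
rows = a[0:2]             # slice: the first two rows
column = a[:, 1]          # colon: every row, column 1
picked = a[[0, 2, 4]]     # list: rows 0, 2 and 4
large = a[a > 20]         # boolean array: elements where the condition holds

print(single, column, large)
```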

In [66]:
test_array = np.random.randint(10, size = (5,5))
test_array

array([[8, 9, 7, 2, 6],
       [5, 9, 0, 7, 5],
       [3, 8, 0, 6, 1],
       [7, 9, 6, 0, 4],
       [0, 3, 3, 5, 2]])

In [77]:
# first row
first_row = test_array[0]
first_row

array([8, 9, 7, 2, 6])

In [78]:
# last row
last_row = test_array[-1]
last_row

array([0, 3, 3, 5, 2])

In [82]:
# 2nd and 3rd rows (two equivalent ways; the end index of a slice is exclusive)
test_array[[1,2]]
test_array[1:3]

array([[5, 9, 0, 7, 5],
       [3, 8, 0, 6, 1]])

In [83]:
# from the 2nd row to the end
test_array[1:]

array([[5, 9, 0, 7, 5],
       [3, 8, 0, 6, 1],
       [7, 9, 6, 0, 4],
       [0, 3, 3, 5, 2]])

#### Selecting Columns and Custom Slicing ndarrays

Let's continue by learning how to select one or more columns of data:

<p><img alt="Selecting columns from a 2D ndarray" src="https://s3.amazonaws.com/dq-content/289/selection_columns.svg"></p>



If we want to select a partial 1D slice of a row or column, we can combine a single value for one dimension with a slice for the other dimension:

<p><img alt="Selecting partial 1D slices from a 2D ndarray" src="https://s3.amazonaws.com/dq-content/289/selection_1darray.svg"></p>

Lastly, if we want to select a 2D slice, we can use slices for both dimensions:

<p><img alt="Selecting a 2D slice from a 2D ndarray" src="https://s3.amazonaws.com/dq-content/289/selection_2darray.svg"></p>
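
The three combinations described above can be sketched on a small array whose values are easy to check by hand (an `np.arange` array is assumed here purely for illustration):

```python
import numpy as np

a = np.arange(25).reshape(5, 5)

row_part = a[2, 1:4]      # row 2, columns 1 to 3
col_part = a[1:4, 3]      # rows 1 to 3, column 3
block = a[1:4, :3]        # rows 1 to 3, columns 0 to 2

print(row_part, col_part, block.shape)
```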



In [84]:
test_array2 = np.random.randint(10, size = (5,5))
test_array2

array([[1, 2, 4, 7, 7],
       [2, 7, 4, 5, 3],
       [6, 8, 6, 5, 3],
       [5, 4, 4, 7, 9],
       [0, 0, 6, 8, 7]])

In [85]:
# 2nd column
test_array2[:,1]

array([2, 7, 8, 4, 0])

In [87]:
# 1st and 2nd columns (two equivalent ways)
test_array2[:, 0:2]
test_array2[:, :2]

array([[1, 2],
       [2, 7],
       [6, 8],
       [5, 4],
       [0, 0]])

In [89]:
# columns 2, 4 and 5
test_array2[:, [1,3,4]]

array([[2, 7, 7],
       [7, 5, 3],
       [8, 5, 3],
       [4, 7, 9],
       [0, 8, 7]])

In [94]:
# row 3, columns 2 and 3 (the end index of a slice is exclusive)
test_array2[2, 1:3]

array([8, 6])

In [None]:
# elements in the row from 1???
test_array2[]

#### Modify values in ndarray



In [95]:
test_array2

array([[1, 2, 4, 7, 7],
       [2, 7, 4, 5, 3],
       [6, 8, 6, 5, 3],
       [5, 4, 4, 7, 9],
       [0, 0, 6, 8, 7]])

In [97]:
# change the element at position (1,1)
test_array2[1,1] = 125
test_array2

array([[  1,   2,   4,   7,   7],
       [  2, 125,   4,   5,   3],
       [  6,   8,   6,   5,   3],
       [  5,   4,   4,   7,   9],
       [  0,   0,   6,   8,   7]])

In [102]:
# NOTE: because this is an array of integers, assigning a float to one of its elements truncates the decimal part
# a NumPy array has a single fixed type, which is set when the array is created
test_array2[0,1] = 16.544
test_array2

array([[  1,  16,   4,   7,   7],
       [  2, 125,   4,   5,   3],
       [  6,   8,   6,   5,   3],
       [  5,   4,   4,   7,   9],
       [  0,   0,   6,   8,   7]])
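
If the decimal part needs to be kept, the array can be converted to a float type first. A sketch using `astype`, which returns a new array and leaves the original untouched:

```python
import numpy as np

ints = np.array([1, 2, 3])
ints[0] = 16.544                   # stored as 16: the array is integer-typed

floats = ints.astype(np.float64)   # new array with a float dtype
floats[1] = 16.544                 # now the decimal part is kept

print(ints, floats)
```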

#### Datatypes

[More about data types](https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html)

[List of scalars](https://docs.scipy.org/doc/numpy/reference/arrays.scalars.html#arrays-scalars-built-in)

In [101]:
# read up on NumPy data types

In [103]:
# dtype tells us which NumPy type is used
x = np.array([1,2])
print(x.dtype) 

int64


In [104]:
x = np.array([1.0,2.6])
print(x.dtype)

float64


In [105]:
# by default the array is created with a float type
np.zeros(10) 

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.])

In [108]:
# specify the data type of the array elements (two equivalent ways)
np.zeros(10, dtype = np.int16)
np.zeros(10, dtype= 'int16')

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)
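
Choosing a smaller dtype reduces the memory an array needs. A short sketch comparing the default float type with `int16`:

```python
import numpy as np

default_zeros = np.zeros(10)                  # float64 by default
small_zeros = np.zeros(10, dtype=np.int16)

# 10 elements at 8 bytes each versus 10 elements at 2 bytes each.
print(default_zeros.nbytes, small_zeros.nbytes)

# The string and the NumPy type object name the same dtype.
same_dtype = np.dtype('int16') == np.dtype(np.int16)
print(same_dtype)
```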

<div class="text_cell_render border-box-sizing rendered_html">
<table>
<thead><tr>
<th>Data type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>bool_</code></td>
<td>Boolean (True or False) stored as a byte</td>
</tr>
<tr>
<td><code>int_</code></td>
<td>Default integer type (same as C <code>long</code>; normally either <code>int64</code> or <code>int32</code>)</td>
</tr>
<tr>
<td><code>intc</code></td>
<td>Identical to C <code>int</code> (normally <code>int32</code> or <code>int64</code>)</td>
</tr>
<tr>
<td><code>intp</code></td>
<td>Integer used for indexing (same as C <code>ssize_t</code>; normally either <code>int32</code> or <code>int64</code>)</td>
</tr>
<tr>
<td><code>int8</code></td>
<td>Byte (-128 to 127)</td>
</tr>
<tr>
<td><code>int16</code></td>
<td>Integer (-32768 to 32767)</td>
</tr>
<tr>
<td><code>int32</code></td>
<td>Integer (-2147483648 to 2147483647)</td>
</tr>
<tr>
<td><code>int64</code></td>
<td>Integer (-9223372036854775808 to 9223372036854775807)</td>
</tr>
<tr>
<td><code>uint8</code></td>
<td>Unsigned integer (0 to 255)</td>
</tr>
<tr>
<td><code>uint16</code></td>
<td>Unsigned integer (0 to 65535)</td>
</tr>
<tr>
<td><code>uint32</code></td>
<td>Unsigned integer (0 to 4294967295)</td>
</tr>
<tr>
<td><code>uint64</code></td>
<td>Unsigned integer (0 to 18446744073709551615)</td>
</tr>
<tr>
<td><code>float_</code></td>
<td>Shorthand for <code>float64</code>.</td>
</tr>
<tr>
<td><code>float16</code></td>
<td>Half precision float: sign bit, 5 bits exponent, 10 bits mantissa</td>
</tr>
<tr>
<td><code>float32</code></td>
<td>Single precision float: sign bit, 8 bits exponent, 23 bits mantissa</td>
</tr>
<tr>
<td><code>float64</code></td>
<td>Double precision float: sign bit, 11 bits exponent, 52 bits mantissa</td>
</tr>
<tr>
<td><code>complex_</code></td>
<td>Shorthand for <code>complex128</code>.</td>
</tr>
<tr>
<td><code>complex64</code></td>
<td>Complex number, represented by two 32-bit floats</td>
</tr>
<tr>
<td><code>complex128</code></td>
<td>Complex number, represented by two 64-bit floats</td>
</tr>
</tbody>
</table>

</div>
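
The ranges in the table do not have to be memorized: NumPy can report them through `np.iinfo` (integer types) and `np.finfo` (floating-point types). A brief sketch:

```python
import numpy as np

int8_info = np.iinfo(np.int8)
uint16_info = np.iinfo(np.uint16)
float32_info = np.finfo(np.float32)

print(int8_info.min, int8_info.max)   # -128 127
print(uint16_info.max)                # 65535
print(float32_info.bits)              # 32
```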

### Computation on NumPy Arrays: Universal Functions


#### The Slowness of Loops



#### Introducing UFuncs (Universal functions)

[Docs](https://docs.scipy.org/doc/numpy/reference/ufuncs.html)
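
As a small illustration of the idea, the sketch below computes reciprocals twice: once with an explicit Python loop and once with a single array expression, which NumPy evaluates through a ufunc in compiled code. The function name `reciprocals_loop` is made up for this example.

```python
import numpy as np

def reciprocals_loop(values):
    """Compute 1/x for every element with an explicit Python loop."""
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

values = np.array([1, 2, 4, 5, 8])

looped = reciprocals_loop(values)
vectorized = 1.0 / values   # the division ufunc runs over the whole array at once

print(vectorized)
```

Both versions give the same numbers; on large arrays the vectorized form is usually much faster because the loop runs inside NumPy instead of in the Python interpreter.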



### Importing Real Data


- Row 1 is RatecodeID
- Row 2 is PULocationID
- Row 3 is DOLocationID
- Row 4 is passenger_count
- Row 5 is trip_distance
- Row 6 is fare_amount
- Row 7 is extra
- Row 8 is mta_tax
- Row 9 is tip_amount
- Row 10 is tolls_amount
- Row 11 is improvement_surcharge
- Row 12 is total_amount
- Row 13 is payment_type
- Row 14 is trip_type

### Vector Math

