# NumPy

## Understanding Data Types in Python


```C
/* C code */
int result = 0;
for(int i=0; i<100; i++){
    result += i;
}
```

In Python, the equivalent operation could be written this way:
```python
# Python code
result = 0
for i in range(100):
    result += i
```


In [None]:
x = 4

In [None]:
x = 'four'

In [None]:
x = True

```C
/* C code */
int x = 4;
x = "four";  // FAILS
```

### A Python Integer Is More Than Just an Integer

```C
struct _longobject {
    long ob_refcnt;
    PyTypeObject *ob_type;
    size_t ob_size;
    long ob_digit[1];
};
```

A single integer in Python 3.4 actually contains four pieces:
- ob_refcnt, a reference count that helps Python silently handle memory allocation and deallocation
- ob_type, which encodes the type of the variable
- ob_size, which specifies the size of the following data members
- ob_digit, which contains the actual integer value that we expect the Python variable to represent.
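
The overhead of these extra fields can be observed directly. The following is a minimal sketch, assuming CPython and an installed NumPy; the variable names are illustrative, and the exact size of a Python `int` differs between Python versions and platforms, so no specific value is claimed for it.

```python
import sys
import numpy as np

# Size in bytes of a full Python int object (header fields plus the digits)
py_int_size = sys.getsizeof(4)

# Size in bytes of one element inside a NumPy int64 array (the raw value only)
np_int_size = np.array([4], dtype=np.int64).itemsize

print(py_int_size, np_int_size)
```

The Python object is larger because it carries the reference count, type pointer, and size field alongside the value.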

<img src="https://jakevdp.github.io/PythonDataScienceHandbook/figures/cint_vs_pyint.png" alt="Integer Memory Layout">

### A Python List Is More Than Just a List


In [12]:
L2 = list(range(10))
L2

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [11]:
type(L2[0])

int

In [13]:
L3 = ['a','b','c']
L3

['a', 'b', 'c']

In [16]:
import numpy as np
L4 = np.array(range(10))
L4

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])


<img src="https://jakevdp.github.io/PythonDataScienceHandbook/figures/array_vs_list.png" alt="Array Memory Layout">

### Fixed-Type Arrays in Python


In [23]:
import array

B=array.array('i',L2)
B


array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

## How Vectorization Makes Code Faster



<p><img alt="Translating Python code to bytecode" src="https://s3.amazonaws.com/dq-content/289/bytecode.svg"></p>


<table>
<thead>
<tr>
<th>Language Type</th>
<th>Example</th>
<th>Time taken to write program</th>
<th>Control over program performance</th>
</tr>
</thead>
<tbody>
<tr>
<td>High-Level</td>
<td>Python</td>
<td>Low</td>
<td>Low</td>
</tr>
<tr>
<td>Low-Level</td>
<td>C</td>
<td>High</td>
<td>High</td>
</tr>
</tbody>
</table>



<p><img alt="For loop to sum rows" src="https://s3.amazonaws.com/dq-content/289/for_loop.svg"></p>

In [21]:
my_numbers = [[6,5],[1,3],[5,6]]
sums =[]
for row in my_numbers:
    row_sum = row[0]+row[1]
    sums.append(row_sum)
print(sums)

[11, 4, 11]
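
For comparison, the same row sums can be computed without an explicit Python loop. This is a small sketch that reuses the `my_numbers` data from the loop above, converted to an ndarray; it is one possible vectorized form, not the only one.

```python
import numpy as np

my_numbers = np.array([[6, 5], [1, 3], [5, 6]])

# Add the first column to the second column in a single vectorized operation
sums = my_numbers[:, 0] + my_numbers[:, 1]
print(sums)  # [11  4 11]
```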


<p><img alt="Unvectorized operation" src="https://s3.amazonaws.com/dq-content/289/unvectorized.svg"></p>

<p><img alt="Vectorized operation" src="https://s3.amazonaws.com/dq-content/289/vectorized.svg"></p>



## NumPy
Numerical Python

In [24]:
import numpy as np

### NumPy ndarrays



<p><img alt="Dimensional Arrays" src="https://s3.amazonaws.com/dq-content/289/dimensional_arrays.svg"></p>



#### Create an array



In [26]:
list1 = [2,7.3,4435,454,3443,235]
arr1 = np.array(list1)
print(arr1)

[2.000e+00 7.300e+00 4.435e+03 4.540e+02 3.443e+03 2.350e+02]


In [27]:
type(arr1)

numpy.ndarray

In [28]:
type(list1)

list

In [29]:
podatki = [[1,2,3],[44,3,5]]
arr2 = np.array(podatki)

In [30]:
print(arr2)

[[ 1  2  3]
 [44  3  5]]


In [31]:
np.ones((3,5))

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [33]:
# arange (similar to the built-in range function)
np.arange(0,20,2)

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

In [34]:
# Zeros
np.zeros((10,6))

array([[0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0.]])

In [39]:
# linspace: evenly spaced values over an interval
np.linspace(0,1000,5)

array([   0.,  250.,  500.,  750., 1000.])

In [41]:
np.random.randint(0, 10, (4,4))

array([[4, 9, 4, 8],
       [7, 6, 6, 1],
       [6, 8, 8, 6],
       [0, 9, 4, 9]])

#### Understanding NumPy ndarrays

In [42]:
np.random.random((3,2))

array([[0.5294731 , 0.44145311],
       [0.57488104, 0.08955476],
       [0.41966459, 0.49310907]])

In [43]:
np.eye(5)

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [44]:
# full
np.full((5,6),8)


array([[8, 8, 8, 8, 8, 8],
       [8, 8, 8, 8, 8, 8],
       [8, 8, 8, 8, 8, 8],
       [8, 8, 8, 8, 8, 8],
       [8, 8, 8, 8, 8, 8]])

In [46]:
# empty: allocates the array without initializing it, so the values shown are arbitrary
np.empty((3,3))

array([[ 111.11111111,  222.22222222,  333.33333333],
       [ 444.44444444,  555.55555556,  666.66666667],
       [ 777.77777778,  888.88888889, 1000.        ]])

In [47]:
pod1 = np.random.randint(0,10,(4,7))

In [48]:
print(pod1)

[[8 3 4 1 0 0 6]
 [1 5 2 8 0 5 1]
 [7 0 8 4 9 7 9]
 [5 1 4 6 8 5 8]]


In [49]:
pod1.ndim

2

In [50]:
pod1.shape  # (rows, columns)

(4, 7)

In [51]:
# size
pod1.size  # number of elements

28

In [52]:
pod1.itemsize  # size of one array element in bytes

8

In [53]:
pod1.nbytes  # size of the whole array in bytes

224
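
These attributes are related: `nbytes` equals `size * itemsize`. The sketch below assumes an explicit `int64` dtype, because the default integer dtype depends on the platform; `grid` is an illustrative name.

```python
import numpy as np

# A 4x7 array with an explicit 8-byte integer dtype
grid = np.zeros((4, 7), dtype=np.int64)

print(grid.ndim)      # 2
print(grid.shape)     # (4, 7)
print(grid.size)      # 28
print(grid.itemsize)  # 8
print(grid.nbytes)    # 224

# Total bytes = number of elements * bytes per element
total_bytes = grid.size * grid.itemsize
```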

#### Selecting and Slicing Rows and Items from ndarrays

<p><img alt="Selecting rows from a 2D ndarray" src="https://s3.amazonaws.com/dq-content/289/selection_rows.svg"></p>



This is how we select a single item from a 2D ndarray:

<p><img alt="Selecting a single item from a 2D ndarray" src="https://s3.amazonaws.com/dq-content/289/selection_item.svg"></p>


In [None]:
# ndarray[row, column]

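Each index (`row` or `column`) can be:
- an integer, e.g. `3`
- a slice, e.g. `0:5` or `5:`
- a bare colon `:`, which selects everything along that axis
- a list of integers, e.g. `[1,5,9]`
- a boolean array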
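
A short sketch of each index type in use. The array `grid` and the variable names are illustrative assumptions, chosen so that the results are easy to check by hand.

```python
import numpy as np

grid = np.arange(25).reshape(5, 5)  # illustrative data: values 0..24

row = grid[3]               # integer: the row at index 3
first_rows = grid[0:2]      # slice: rows 0 and 1
everything = grid[:]        # colon: all rows
picked = grid[[1, 3, 4]]    # list of integers: rows 1, 3 and 4
mask = grid[:, 0] > 10      # boolean array with one entry per row
masked = grid[mask]         # only the rows where the mask is True

print(row)     # [15 16 17 18 19]
print(mask)    # [False False False  True  True]
```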

In [55]:
test_array=np.random.randint(10,size=(5,5))
test_array

array([[7, 3, 5, 7, 8],
       [6, 0, 3, 7, 4],
       [8, 0, 8, 7, 1],
       [4, 5, 2, 8, 9],
       [2, 5, 8, 3, 0]])

In [57]:
first_row=test_array[0]
first_row

array([7, 3, 5, 7, 8])

In [60]:
last_row=test_array[-1]
last_row

array([2, 5, 8, 3, 0])

In [64]:
# rows 2 and 3
test_array[1:3]  # from index 1 up to, but not including, index 3
# or
#test_array[[1,2]]

array([[6, 0, 3, 7, 4],
       [8, 0, 8, 7, 1]])

In [65]:
# from index 1 to the end
test_array[1:]

array([[6, 0, 3, 7, 4],
       [8, 0, 8, 7, 1],
       [4, 5, 2, 8, 9],
       [2, 5, 8, 3, 0]])

#### Selecting Columns and Custom Slicing ndarrays

Let's continue by learning how to select one or more columns of data:

<p><img alt="Selecting columns from a 2D ndarray" src="https://s3.amazonaws.com/dq-content/289/selection_columns.svg"></p>



If we wanted to select a partial 1D slice of a row or column, we can combine a single value for one dimension with a slice for the other dimension:

<p><img alt="Selecting partial 1D slices from a 2D ndarray" src="https://s3.amazonaws.com/dq-content/289/selection_1darray.svg"></p>

Lastly, if we wanted to select a 2D slice, we can use slices for both dimensions:

<p><img alt="Selecting a 2D slice from a 2D ndarray" src="https://s3.amazonaws.com/dq-content/289/selection_2darray.svg"></p>
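
The three kinds of selection described above can be sketched as follows. The array `grid` is an illustrative assumption, not data from this notebook.

```python
import numpy as np

grid = np.arange(25).reshape(5, 5)  # illustrative data: values 0..24

second_column = grid[:, 1]      # every row, column index 1
some_columns = grid[:, [1, 3]]  # every row, columns 1 and 3
partial_row = grid[2, 1:4]      # row 2, columns 1 up to (not including) 4
block = grid[1:3, 0:2]          # rows 1-2 and columns 0-1 as a 2D slice

print(second_column)  # [ 1  6 11 16 21]
print(partial_row)    # [11 12 13]
```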



In [77]:
test_array2=np.random.randint((10), size=(5,5))
test_array2


array([[0, 4, 1, 2, 1],
       [1, 8, 8, 3, 3],
       [4, 0, 5, 1, 2],
       [8, 1, 3, 1, 1],
       [1, 5, 3, 4, 6]])

In [66]:
# column 2
test_array2[:,1]

array([4, 8, 0, 1, 5])

In [74]:
# columns 2, 4 and 5
test_array2[:,[1,3,4]]

array([[4, 2, 1],
       [8, 3, 3],
       [0, 1, 2],
       [1, 1, 1],
       [5, 4, 6]])

In [76]:
# row 3, columns 2 through 4
test_array2[2,1:4]

array([0, 5, 1])

In [78]:
test_array2[0,0] = 125

In [79]:
test_array2

array([[125,   4,   1,   2,   1],
       [  1,   8,   8,   3,   3],
       [  4,   0,   5,   1,   2],
       [  8,   1,   3,   1,   1],
       [  1,   5,   3,   4,   6]])

In [80]:
# Careful: test_array2 has an integer dtype, so the decimal part of an assigned float is cut off (this belongs under "Modify values in ndarray")
test_array2[0,1] = 125.212
test_array2

array([[125, 125,   1,   2,   1],
       [  1,   8,   8,   3,   3],
       [  4,   0,   5,   1,   2],
       [  8,   1,   3,   1,   1],
       [  1,   5,   3,   4,   6]])
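
If the decimal part must be kept, one option is to convert the array to a floating-point dtype before assigning. This is a minimal sketch with illustrative names (`ints`, `floats`), not a step from the original notebook.

```python
import numpy as np

ints = np.array([1, 2, 3])      # integer dtype inferred from the values
ints[0] = 125.212               # the decimal part is silently dropped

floats = ints.astype(np.float64)  # astype returns a converted copy
floats[1] = 125.212               # the full value is kept

print(ints)    # [125   2   3]
print(floats)  # [125.    125.212   3.   ]
```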

#### Modify values in ndarray



#### Datatypes

[More about data types](https://docs.scipy.org/doc/numpy/reference/arrays.dtypes.html)

[List of scalars](https://docs.scipy.org/doc/numpy/reference/arrays.scalars.html#arrays-scalars-built-in)

In [82]:
x = np.array([1,2])
print(x.dtype)

int64


In [83]:
x = np.array([1.0,2.0])
print(x.dtype)

float64


In [84]:
np.zeros(10,dtype=np.int16)

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0], dtype=int16)

<div class="text_cell_render border-box-sizing rendered_html">
<table>
<thead><tr>
<th>Data type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>bool_</code></td>
<td>Boolean (True or False) stored as a byte</td>
</tr>
<tr>
<td><code>int_</code></td>
<td>Default integer type (same as C <code>long</code>; normally either <code>int64</code> or <code>int32</code>)</td>
</tr>
<tr>
<td><code>intc</code></td>
<td>Identical to C <code>int</code> (normally <code>int32</code> or <code>int64</code>)</td>
</tr>
<tr>
<td><code>intp</code></td>
<td>Integer used for indexing (same as C <code>ssize_t</code>; normally either <code>int32</code> or <code>int64</code>)</td>
</tr>
<tr>
<td><code>int8</code></td>
<td>Byte (-128 to 127)</td>
</tr>
<tr>
<td><code>int16</code></td>
<td>Integer (-32768 to 32767)</td>
</tr>
<tr>
<td><code>int32</code></td>
<td>Integer (-2147483648 to 2147483647)</td>
</tr>
<tr>
<td><code>int64</code></td>
<td>Integer (-9223372036854775808 to 9223372036854775807)</td>
</tr>
<tr>
<td><code>uint8</code></td>
<td>Unsigned integer (0 to 255)</td>
</tr>
<tr>
<td><code>uint16</code></td>
<td>Unsigned integer (0 to 65535)</td>
</tr>
<tr>
<td><code>uint32</code></td>
<td>Unsigned integer (0 to 4294967295)</td>
</tr>
<tr>
<td><code>uint64</code></td>
<td>Unsigned integer (0 to 18446744073709551615)</td>
</tr>
<tr>
<td><code>float_</code></td>
<td>Shorthand for <code>float64</code>.</td>
</tr>
<tr>
<td><code>float16</code></td>
<td>Half precision float: sign bit, 5 bits exponent, 10 bits mantissa</td>
</tr>
<tr>
<td><code>float32</code></td>
<td>Single precision float: sign bit, 8 bits exponent, 23 bits mantissa</td>
</tr>
<tr>
<td><code>float64</code></td>
<td>Double precision float: sign bit, 11 bits exponent, 52 bits mantissa</td>
</tr>
<tr>
<td><code>complex_</code></td>
<td>Shorthand for <code>complex128</code>.</td>
</tr>
<tr>
<td><code>complex64</code></td>
<td>Complex number, represented by two 32-bit floats</td>
</tr>
<tr>
<td><code>complex128</code></td>
<td>Complex number, represented by two 64-bit floats</td>
</tr>
</tbody>
</table>

</div>
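
The ranges in the table can be queried at runtime with `np.iinfo` (integers) and `np.finfo` (floats). A brief sketch; the variable names are illustrative.

```python
import numpy as np

int16_info = np.iinfo(np.int16)
uint8_info = np.iinfo(np.uint8)
float32_info = np.finfo(np.float32)

print(int16_info.min, int16_info.max)  # -32768 32767
print(uint8_info.min, uint8_info.max)  # 0 255
print(float32_info.bits)               # 32
```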

### Computation on NumPy Arrays: Universal Functions


#### The Slowness of Loops



We will look at fast functions for fast computation.

In [1]:
import numpy as np

In [18]:
def compute_reciprocals(values):
    # Loop over the input and store 1/x for each element
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output


In [19]:
np.random.seed(0)
values = np.random.randint(1, 10, size=5)

In [20]:
values

array([6, 1, 4, 4, 8])

In [21]:
compute_reciprocals(values)

array([0.16666667, 1.        , 0.25      , 0.25      , 0.125     ])

In [24]:
big_array = np.random.randint(1, 100, size=1000000)

In [25]:
%timeit compute_reciprocals(big_array)

14.7 s ± 360 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [26]:
L1=[1,2,3]
L2=[4,5,6]

In [27]:
L1+L2

[1, 2, 3, 4, 5, 6]

In [28]:
1.0/values

array([0.16666667, 1.        , 0.25      , 0.25      , 0.125     ])

In [31]:
%timeit (1.0/big_array)

8.62 ms ± 76 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


In [33]:
np.arange(5)/np.arange(1,6)

array([0.        , 0.5       , 0.66666667, 0.75      , 0.8       ])

In [35]:
x=np.arange(9).reshape((3,3))

In [36]:
x

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [37]:
2**x

array([[  1,   2,   4],
       [  8,  16,  32],
       [ 64, 128, 256]])

#### Introducing UFuncs (Universal functions)

[Docs](https://docs.scipy.org/doc/numpy/reference/ufuncs.html)



### Importing Real Data


In [41]:
!head data/DATA_taxi_data.csv

VendorID,lpep_pickup_datetime,lpep_dropoff_datetime,store_and_fwd_flag,RatecodeID,PULocationID,DOLocationID,passenger_count,trip_distance,fare_amount,extra,mta_tax,tip_amount,tolls_amount,ehail_fee,improvement_surcharge,total_amount,payment_type,trip_type

2,2017-01-01 00:01:15,2017-01-01 00:11:05,N,1,42,166,1,1.71,9,0,0.5,0,0,,0.3,9.8,2,1
2,2017-01-01 00:03:34,2017-01-01 00:09:00,N,1,75,74,1,1.44,6.5,0.5,0.5,0,0,,0.3,7.8,2,1
2,2017-01-01 00:04:02,2017-01-01 00:12:55,N,1,82,70,5,3.45,12,0.5,0.5,2.66,0,,0.3,15.96,1,1
2,2017-01-01 00:01:40,2017-01-01 00:14:23,N,1,255,232,1,2.11,10.5,0.5,0.5,0,0,,0.3,11.8,2,1
2,2017-01-01 00:00:51,2017-01-01 00:18:55,N,1,166,239,1,2.76,11.5,0.5,0.5,0,0,,0.3,12.8,2,1
2,2017-01-01 00:00:28,2017-01-01 00:13:31,N,1,179,226,1,4.14,15,0.5,0.5,0,0,,0.3,16.3,1,1
2,2017-01-01 00:02:39,2017-01-01 00:26:28,N,1,74,167,1,4.22,19,0.5,0.5,0,0,,0.3,20.3,2,1
2,2017-01-01 00:15:21,2017-01-01 00:28:06,N,1,112,37,1,2.83,11,0.5,0.5,0,0,,0.3,12.3,2,1


In [44]:
import csv

In [47]:
with open('data/DATA_taxi_data.csv','r') as f:
    taxi_list=list(csv.reader(f))

In [48]:
taxi_list=taxi_list[2:]

In [68]:
converted_taxi_list=[]
for row in taxi_list:
    converted_row=[]
    for item in row:
        try:
            converted_row.append(float(item))
        except ValueError:
            # skip values that are not numbers (dates, flags, empty fields)
            continue
    converted_taxi_list.append(converted_row)

In [70]:
taxi=np.array(converted_taxi_list)

Column indexes in `taxi` (the non-numeric columns were dropped during conversion):

- Column 0 is VendorID
- Column 1 is RatecodeID
- Column 2 is PULocationID
- Column 3 is DOLocationID
- Column 4 is passenger_count
- Column 5 is trip_distance
- Column 6 is fare_amount
- Column 7 is extra
- Column 8 is mta_tax
- Column 9 is tip_amount
- Column 10 is tolls_amount
- Column 11 is improvement_surcharge
- Column 12 is total_amount
- Column 13 is payment_type
- Column 14 is trip_type

### Vector Math



In [53]:
x=np.array([1,2,3])
y=np.array([4,5,6])

In [54]:
x-y

array([-3, -3, -3])

In [55]:
x*y

array([ 4, 10, 18])

In [57]:
x**2

array([1, 4, 9])

In [58]:
x.dot(y)

32

In [60]:
z=np.array([y,y**2])


Here's what happened behind the scenes:

<p><img alt="Vectorized Addition" src="https://s3.amazonaws.com/dq-content/289/vectorized_addition.svg"></p>


- `vector_a + vector_b` - Addition
- `vector_a - vector_b` - Subtraction
- `vector_a * vector_b` - Multiplication (this is unrelated to the vector multiplication used in linear algebra).
- `vector_a / vector_b` - Division
- `vector_a % vector_b` - Modulus (find the remainder when vector_a is divided by vector_b)
- `vector_a ** vector_b` - Exponent (raise vector_a to the power of vector_b)
- `vector_a // vector_b` - Floor Division (divide vector_a by vector_b, rounding down to the nearest integer)
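
A short sketch of a few of these operators, including the difference between element-wise multiplication and the dot product. The values are illustrative and match the `x` and `y` arrays used earlier in this section.

```python
import numpy as np

vector_a = np.array([1, 2, 3])
vector_b = np.array([4, 5, 6])

elementwise = vector_a * vector_b  # multiplies matching elements
dot_product = vector_a @ vector_b  # linear-algebra dot product (a scalar)
floor_div = vector_b // vector_a   # element-wise floor division
remainder = vector_b % vector_a    # element-wise remainder

print(elementwise)  # [ 4 10 18]
print(dot_product)  # 32
print(floor_div)    # [4 2 2]
print(remainder)    # [0 1 0]
```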

In [61]:
a=z.T
a

array([[ 4, 16],
       [ 5, 25],
       [ 6, 36]])

In [62]:
z.shape

(2, 3)

In [71]:
taxi=np.array(converted_taxi_list)

In [72]:
col1=taxi[:,6]
col2=taxi[:,8]
col3=taxi[:,11]
sums=col1+col2+col3

In [73]:
taxi.shape

(19998, 15)

In [74]:
trip_distance=taxi[:,5]  # calculate the price of each taxi trip separately
trip_price=taxi[:,12]

In [77]:
price_per_mile=trip_price/trip_distance  # dividing by 0 only triggers a warning; NumPy continues and stores inf or nan for those entries


In [76]:
price_per_mile

array([5.73099415, 5.41666667, 4.62608696, ..., 3.70046083, 4.11904762,
       5.12      ])

In [78]:
np.array([1,2,3])/0


array([inf, inf, inf])
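
Division by zero can also be handled explicitly. The sketch below uses small made-up arrays (`price`, `distance`) rather than the taxi data: `np.errstate` silences the warnings for a block of code, and the `where` and `out` arguments of `np.divide` skip the zero entries and leave a chosen fill value (here `nan`) in their place.

```python
import numpy as np

# Illustrative values, not taken from the taxi dataset
price = np.array([10.0, 5.0, 0.0])
distance = np.array([2.0, 0.0, 0.0])

# Plain division: x/0 gives inf and 0/0 gives nan; warnings are suppressed here
with np.errstate(divide="ignore", invalid="ignore"):
    raw = price / distance

# Divide only where distance is non-zero; other entries keep the fill value
safe = np.divide(price, distance,
                 out=np.full_like(price, np.nan),
                 where=distance != 0)

print(raw)   # [ 5. inf nan]
print(safe)  # [ 5. nan nan]
```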

In [80]:
price_per_mile2=np.divide(trip_price,trip_distance)



<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The following table lists the arithmetic operators implemented in NumPy:</p>
<table>
<thead><tr>
<th>Operator</th>
<th>Equivalent ufunc</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>+</code></td>
<td><code>np.add</code></td>
<td>Addition (e.g., <code>1 + 1 = 2</code>)</td>
</tr>
<tr>
<td><code>-</code></td>
<td><code>np.subtract</code></td>
<td>Subtraction (e.g., <code>3 - 2 = 1</code>)</td>
</tr>
<tr>
<td><code>-</code></td>
<td><code>np.negative</code></td>
<td>Unary negation (e.g., <code>-2</code>)</td>
</tr>
<tr>
<td><code>*</code></td>
<td><code>np.multiply</code></td>
<td>Multiplication (e.g., <code>2 * 3 = 6</code>)</td>
</tr>
<tr>
<td><code>/</code></td>
<td><code>np.divide</code></td>
<td>Division (e.g., <code>3 / 2 = 1.5</code>)</td>
</tr>
<tr>
<td><code>//</code></td>
<td><code>np.floor_divide</code></td>
<td>Floor division (e.g., <code>3 // 2 = 1</code>)</td>
</tr>
<tr>
<td><code>**</code></td>
<td><code>np.power</code></td>
<td>Exponentiation (e.g., <code>2 ** 3 = 8</code>)</td>
</tr>
<tr>
<td><code>%</code></td>
<td><code>np.mod</code></td>
<td>Modulus/remainder (e.g., <code>9 % 4 = 1</code>)</td>
</tr>
</tbody>
</table>
<p>Additionally there are Boolean/bitwise operators; we will explore these in <a href="02.06-boolean-arrays-and-masks.html">Comparisons, Masks, and Boolean Logic</a>.</p>

</div>
</div>
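
Each operator in the table gives the same result as its ufunc. A small check, with illustrative variable names:

```python
import numpy as np

x = np.arange(4)  # [0 1 2 3]

# Each comparison is True when the operator and the ufunc agree element by element
same_add = np.array_equal(x + 2, np.add(x, 2))
same_negative = np.array_equal(-x, np.negative(x))
same_power = np.array_equal(x ** 2, np.power(x, 2))
same_mod = np.array_equal(x % 2, np.mod(x, 2))
same_floor = np.array_equal(x // 2, np.floor_divide(x, 2))

print(same_add, same_negative, same_power, same_mod, same_floor)
```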

[Mathematical expressions](https://docs.scipy.org/doc/numpy-1.14.0/reference/routines.math.html#arithmetic-operations)

### Calculating Statistics For 1D ndarrays



In [81]:
price_min=taxi[:,12].min()

In [82]:
price_min

-60.0

In [85]:
price_max=taxi[:,12].max()

In [86]:
price_max

240.0

In [88]:
# function
np.max(taxi[:,12])

240.0

In [87]:
# method
taxi[:,12].max()

240.0

### Calculating Statistics For 2D ndarrays

Next, we'll look at how to calculate statistics for two-dimensional ndarrays. If we use the array methods without additional parameters, they return a single value, just like they do with a 1D array:

<p><img alt="Array method without axis parameter" src="https://s3.amazonaws.com/dq-content/289/array_method_axis_none.svg"></p>

But what if we wanted to find the maximum value of each row? For that, we need to use the axis parameter, and specify a value of 1, which indicates we want to calculate values for each row.

<p><img alt="Array method with axis 1" src="https://s3.amazonaws.com/dq-content/289/array_method_axis_1.svg"></p>

If we want to find the maximum value of each column, we use an axis value of 0:

<p><img alt="Array method with axis 0" src="https://s3.amazonaws.com/dq-content/289/array_method_axis_0.svg"></p>

To help you remember which is which, you can think of the first axis as rows, and the second axis as columns, just in the same way as when we're indexing a 2D NumPy array we use ndarray[row,column]. Then you think about which axis you want to apply the method along. The tricky part is to remember that when you apply the method along one axis, you get results in the other axis. Here is an illustration of that:

<p><img alt="The axis parameter" src="https://s3.amazonaws.com/dq-content/289/axis_param.svg"></p>
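
The effect of the `axis` parameter can be sketched on a small array. The data below is illustrative, not from the taxi dataset.

```python
import numpy as np

data = np.array([[1, 5, 3],
                 [4, 2, 6]])

overall_max = data.max()     # no axis: a single value for the whole array
row_max = data.max(axis=1)   # axis=1: one result per row
col_max = data.max(axis=0)   # axis=0: one result per column

print(overall_max)  # 6
print(row_max)      # [5 6]
print(col_max)      # [4 5 6]
```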



In [89]:
taxi_first_20=taxi[:20]

In [90]:
cene_skupaj=taxi_first_20[:,6:12]


In [91]:
cene_skupaj_prepracunano=taxi_first_20[:,12]

In [92]:
cene_sestevek=cene_skupaj.sum(axis=1)

In [95]:
cene_sestevek.round()

array([10.,  8., 16., 12., 13., 16., 20., 12.,  6., 15., 12., 23., 31.,
        8., 26., 11.,  3., 12., 22., 12.])

In [96]:
cene_skupaj_prepracunano.round()

array([10.,  8., 16., 12., 13., 16., 20., 12.,  6., 15., 12., 23., 31.,
        8., 26., 11.,  3., 12., 22., 12.])

### Adding Rows and Columns to ndarrays


In [97]:
ones=np.ones((2,3))

In [98]:
ones

array([[1., 1., 1.],
       [1., 1., 1.]])

In [99]:
zeros=np.zeros(3)

In [100]:
zeros

array([0., 0., 0.])

In [101]:
combined=np.concatenate([ones,zeros], axis=0)

ValueError: all the input arrays must have same number of dimensions

In [102]:
ones.shape

(2, 3)

In [103]:
zeros.shape

(3,)

In [105]:
zeros_2d=np.expand_dims(zeros, axis=0)
zeros_2d

array([[0., 0., 0.]])

In [106]:
zeros

array([0., 0., 0.])

In [107]:
combined=np.concatenate([ones,zeros_2d],axis=0)
combined

array([[1., 1., 1.],
       [1., 1., 1.],
       [0., 0., 0.]])

In [108]:
price_per_mile.shape

(19998,)

In [110]:
price_per_mile_2d=np.expand_dims(price_per_mile,axis=1)
price_per_mile_2d

array([[5.73099415],
       [5.41666667],
       [4.62608696],
       ...,
       [3.70046083],
       [4.11904762],
       [5.12      ]])

In [111]:
price_per_mile_2d.shape

(19998, 1)

In [112]:
taxi=np.concatenate([taxi,price_per_mile_2d],axis=1)

In [113]:
taxi[:,-1]

array([5.73099415, 5.41666667, 4.62608696, ..., 3.70046083, 4.11904762,
       5.12      ])
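
There are several equivalent ways to append a 1D array as a new column. The sketch below compares `np.expand_dims`, indexing with `np.newaxis`, and `np.column_stack` on illustrative data (`table` and `new_column` are made-up names).

```python
import numpy as np

table = np.ones((3, 2))
new_column = np.array([10.0, 20.0, 30.0])  # shape (3,)

# Option 1: make the column 2D first, then concatenate along axis 1
with_expand = np.concatenate([table, np.expand_dims(new_column, axis=1)], axis=1)

# Option 2: the same reshaping written with np.newaxis
with_newaxis = np.concatenate([table, new_column[:, np.newaxis]], axis=1)

# Option 3: column_stack treats 1D inputs as columns
with_stack = np.column_stack([table, new_column])

print(with_stack.shape)  # (3, 3)
```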

### Sorting ndarrays


In [115]:
sadje=np.array(['pomaranca','banana','jabolka','grozdje','cesnja'])

In [116]:
sadje[2]

'jabolka'

In [118]:
sadje[[2,1]]

array(['jabolka', 'banana'], dtype='<U9')

In [120]:
sortred_order=np.argsort(sadje)  # argsort returns the indices that would sort the array, not the sorted fruit itself
sortred_order

array([1, 4, 3, 2, 0])

In [122]:
sortirano_sadje=sadje[sortred_order]
sortirano_sadje

array(['banana', 'cesnja', 'grozdje', 'jabolka', 'pomaranca'], dtype='<U9')

### 2D Array

In [124]:
int_square=np.random.randint(10,size=(5,5))
int_square

array([[3, 4, 1, 9, 0],
       [2, 1, 0, 1, 0],
       [7, 2, 2, 5, 5],
       [3, 1, 3, 6, 8],
       [4, 0, 7, 3, 6]])

The goal is to sort the rows of this array by the values in its last column.

In [127]:
last_column=int_square[:,4]
last_column

array([0, 0, 5, 8, 6])

In [128]:
sorted_order=np.argsort(last_column)
sorted_order

array([0, 1, 2, 4, 3])

In [129]:
last_column_sorted=last_column[sorted_order]
last_column_sorted

array([0, 0, 5, 6, 8])

In [131]:
int_square_sorted=int_square[np.argsort(int_square[:,4])]
int_square_sorted

array([[3, 4, 1, 9, 0],
       [2, 1, 0, 1, 0],
       [7, 2, 2, 5, 5],
       [4, 0, 7, 3, 6],
       [3, 1, 3, 6, 8]])

In [133]:
# back to the taxi dataset: sort the rows by the price-per-mile column (argsort orders from smallest to largest)
sorted_order=np.argsort(taxi[:,-1])

In [134]:
taxi_sorted=taxi[sorted_order]
taxi_sorted[:20,-1]

array([         -inf,          -inf,          -inf,          -inf,
                -inf,          -inf,          -inf,          -inf,
                -inf,          -inf,          -inf, -693.33333333,
       -380.        , -380.        , -380.        , -192.5       ,
       -190.        , -138.94736842, -126.66666667,  -95.        ])
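
`np.argsort` always sorts in ascending order, so infinite values end up at the edges. To get a largest-to-smallest order, one option is to drop non-finite values and reverse the sorted indices. This is a sketch on made-up values, not the notebook's result.

```python
import numpy as np

# Illustrative values, including entries produced by division by zero
price_per_mile_demo = np.array([3.0, -np.inf, 7.5, np.nan, 1.0])

# Keep only finite numbers (drops inf, -inf and nan)
finite = price_per_mile_demo[np.isfinite(price_per_mile_demo)]

# Reverse the ascending order to sort from largest to smallest
descending_order = np.argsort(finite)[::-1]
sorted_desc = finite[descending_order]

print(sorted_desc)  # [7.5 3.  1. ]
```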

###  Reading CSV files with NumPy

In [137]:
taxi=np.genfromtxt('data/DATA_taxi_data.csv',
                  delimiter=',',
                  skip_header=2)
taxi  # values that cannot be parsed as numbers (such as strings) are stored as nan

array([[ 2.  ,   nan,   nan, ...,  9.8 ,  2.  ,  1.  ],
       [ 2.  ,   nan,   nan, ...,  7.8 ,  2.  ,  1.  ],
       [ 2.  ,   nan,   nan, ..., 15.96,  1.  ,  1.  ],
       ...,
       [ 1.  ,   nan,   nan, ..., 80.3 ,  1.  ,  1.  ],
       [ 1.  ,   nan,   nan, ..., 17.3 ,  1.  ,  1.  ],
       [ 1.  ,   nan,   nan, ..., 12.8 ,  1.  ,  1.  ]])

In [136]:
taxi.shape

(19998, 19)
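
The `nan` behavior can be reproduced without the data file. This sketch feeds a tiny in-memory CSV (made-up content) to `np.genfromtxt` through `io.StringIO`; the column names are hypothetical.

```python
import io
import numpy as np

# Hypothetical three-column CSV with one text column
csv_text = "id,flag,fare\n1,N,9.5\n2,Y,6.0\n"

data = np.genfromtxt(io.StringIO(csv_text),
                     delimiter=",",
                     skip_header=1)  # skip the header line

print(data.shape)  # (2, 3)
print(data)        # the text column is read as nan
```

Unlike the manual `csv` conversion earlier, the non-numeric columns are kept as `nan` rather than dropped, which is why the column indexes differ between the two versions of `taxi`.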

###  Boolean Arrays





In [None]:
# e.g. select all values less than 5 or equal to 3 ... using comparison operators.

In [138]:
3<5

True

In [139]:
np.array([1,2,3,4])+10  # adds 10 to every element

array([11, 12, 13, 14])

In [141]:
np.array([2,4,6,8])<5  # compares each element with 5 and reports whether the comparison is true

array([ True,  True, False, False])

In [142]:
a=np.array([1,2,3,4,5])
a_bool=a<3
a_bool

array([ True,  True, False, False, False])

In [143]:
b=np.array(['blue','red','black'])
b_bool=b =='red'
b_bool

array([False,  True, False])

A similar pattern occurs: the 'less than five' operation is applied to each value in the array. The diagram below shows this step by step:

<p><img alt="Vectorized boolean operation" src="https://s3.amazonaws.com/dq-content/290/vectorized_bool.svg"></p>

### Boolean Indexing with 1D ndarrays




<p><img alt="Boolean indexing 1D ndarrays 1" src="https://s3.amazonaws.com/dq-content/290/1d_bool_1.svg"></p>



<p><img alt="Boolean indexing 1D ndarrays 2" src="https://s3.amazonaws.com/dq-content/290/1d_bool_2.svg"></p>




In [147]:
passenger_count = taxi[:,7]
passenger_count

array([1., 1., 5., ..., 2., 1., 1.])

In [148]:
two_pass_bool = passenger_count == 2

In [149]:
two_passengers = passenger_count[two_pass_bool]
two_passengers

array([2., 2., 2., ..., 2., 2., 2.])
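
The same pattern on a small made-up array, which also shows that summing a boolean mask counts the matching elements (each `True` counts as 1). The names are illustrative.

```python
import numpy as np

passenger_demo = np.array([1, 2, 5, 2, 1, 2])  # illustrative values

two_mask = passenger_demo == 2        # boolean array, True where the value is 2
two_only = passenger_demo[two_mask]   # keep only the matching elements
two_count = two_mask.sum()            # number of True entries

print(two_only)   # [2 2 2]
print(two_count)  # 3
```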

### Boolean Indexing with 2D ndarrays


<p><img alt="Boolean indexing 1D ndarrays 2" src="https://s3.amazonaws.com/dq-content/290/bool_dims.svg"></p>


In [152]:
arr=np.random.randint(10,size=(4,3))
arr

array([[5, 1, 5],
       [8, 6, 9],
       [2, 6, 4],
       [8, 4, 3]])

In [154]:
bool_1=[True,False,True,True]
bool_2=[True,False,True,True]

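In [None]:
# bool_1 has four entries, so it can only index the four rows;
# arr[:,bool_1] would raise an IndexError because arr has only three columns
arr[bool_1]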
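
A boolean array must have the same length as the axis it indexes. The sketch below uses an illustrative 4x3 array with one mask per axis; `row_mask` and `col_mask` are made-up names.

```python
import numpy as np

grid = np.arange(12).reshape(4, 3)  # 4 rows, 3 columns

row_mask = np.array([True, False, True, True])  # one entry per row
col_mask = np.array([True, False, True])        # one entry per column

rows = grid[row_mask]     # rows 0, 2 and 3
cols = grid[:, col_mask]  # columns 0 and 2

print(rows.shape)  # (3, 3)
print(cols.shape)  # (4, 2)
```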

### Assigning Values in ndarrays

In [157]:
b=np.array(['blue','red','blackee'])
b

array(['blue', 'red', 'blackee'], dtype='<U7')

In [158]:
b[1]='orange'  # careful: the array keeps at most as many characters as its longest string at creation (here 7), so longer values are cut off
b

array(['blue', 'orange', 'blackee'], dtype='<U7')

In [160]:
b[1:]='pink'
b

array(['blue', 'pink', 'pink'], dtype='<U7')
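
The truncation is easier to see when the new value is longer than every original string. A minimal sketch with illustrative names; converting to a wider string dtype first is one way to avoid the cut.

```python
import numpy as np

colors = np.array(['blue', 'red'])  # longest string has 4 characters
colors[1] = 'orange'                # only the first 4 characters fit

wide = colors.astype('U10')         # copy with room for 10 characters
wide[1] = 'orange'                  # now stored in full

print(colors)  # ['blue' 'oran']
print(wide)    # ['blue' 'orange']
```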

### Subarrays as no-copy views



In [162]:
r=np.ones((4,4))

In [163]:
r2=r[:2,:2]

In [164]:
r2[:]=0
r2

array([[0., 0.],
       [0., 0.]])

In [165]:
r  # r has changed too, because r2 is a view of r and not a separate object

array([[0., 0., 1., 1.],
       [0., 0., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

### Copying Data


In [166]:
r3=np.random.randint(10,size=(5,5))
r3

array([[4, 5, 9, 8, 5],
       [8, 2, 7, 3, 6],
       [6, 7, 2, 5, 0],
       [1, 6, 4, 5, 0],
       [6, 8, 1, 1, 8]])

In [168]:
r3_sub_copy=r3[:2,:2].copy()
r3_sub_copy

array([[4, 5],
       [8, 2]])

In [169]:
r3

array([[4, 5, 9, 8, 5],
       [8, 2, 7, 3, 6],
       [6, 7, 2, 5, 0],
       [1, 6, 4, 5, 0],
       [6, 8, 1, 1, 8]])
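
Whether two arrays share data can be checked with `np.shares_memory`. The sketch below contrasts a slice (a view) with an explicit copy; `base`, `view`, and `sub_copy` are illustrative names.

```python
import numpy as np

base = np.ones((4, 4))

view = base[:2, :2]             # slicing returns a view of the same data
sub_copy = base[:2, :2].copy()  # .copy() allocates independent data

sub_copy[:] = 5  # does not affect base
view[:] = 0      # writes through to base

print(np.shares_memory(base, view))      # True
print(np.shares_memory(base, sub_copy))  # False
print(base.sum())                        # 12.0
```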