# Introduction to NumPy


## Understanding Data Types in Python



```C
/* C code */
int result = 0;
for(int i=0; i<100; i++){
    result += i;
}
```


```python
# Python code
result = 0
for i in range(100):
    result += i
```


### A Python Integer Is More Than Just an Integer



```C
struct _longobject {
    long ob_refcnt;
    PyTypeObject *ob_type;
    size_t ob_size;
    long ob_digit[1];
};
```



<img src="https://jakevdp.github.io/PythonDataScienceHandbook/figures/cint_vs_pyint.png" alt="Integer Memory Layout">
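The overhead of the object header can be observed directly. The following is a minimal sketch that compares the size of a Python integer object with the size of a plain C `int`, using the standard `ctypes` module as a stand-in for C; the exact byte counts depend on the Python build and platform, so no output is shown.

```python
import ctypes
import sys

# A Python int is a full object: reference count, type pointer, size, digits.
py_int_size = sys.getsizeof(1)

# A C int is only the raw value.
c_int_size = ctypes.sizeof(ctypes.c_int)

print(py_int_size, c_int_size)
```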

### A Python List Is More Than Just a List



In [2]:
L = list(range(10))

In [3]:
L

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [4]:
type(L[0])

int

In [5]:
L2 = [str(c) for c in L]

In [6]:
L2

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']


<img src="https://jakevdp.github.io/PythonDataScienceHandbook/figures/array_vs_list.png" alt="Array Memory Layout">
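Because every list element is a complete Python object, a single list can also hold values of different types. A short sketch (the name `L3` is chosen here only for illustration):

```python
# Each element carries its own type information, so mixing types is allowed.
L3 = [True, "2", 3.0, 4]
types = [type(item) for item in L3]
print(types)
```

This flexibility is what a fixed-type array gives up in exchange for compact storage.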

### Fixed-Type Arrays in Python



In [7]:
import array

In [9]:
L = list(range(10))
A = array.array('i', L)

In [10]:
A

array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
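The type code fixes the element type for the whole array. The sketch below, using only the standard library, shows that every element has the same size and that a value of another type is rejected rather than stored as a separate object.

```python
import array

A = array.array('i', range(10))   # 'i' is the type code for a signed C int

# Every element occupies the same fixed number of bytes.
print(A.itemsize)

# A float cannot be stored in an integer array.
try:
    A.append(1.5)
    rejected = False
except TypeError:
    rejected = True

print(rejected)
```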

## How Vectorization Makes Code Faster


<p><img alt="Translating Python code to bytecode" src="https://s3.amazonaws.com/dq-content/289/bytecode.svg"></p>


<table>
<thead>
<tr>
<th>Language Type</th>
<th>Example</th>
<th>Time taken to write program</th>
<th>Control over program performance</th>
</tr>
</thead>
<tbody>
<tr>
<td>High-Level</td>
<td>Python</td>
<td>Low</td>
<td>Low</td>
</tr>
<tr>
<td>Low-Level</td>
<td>C</td>
<td>High</td>
<td>High</td>
</tr>
</tbody>
</table>



<p><img alt="For loop to sum rows" src="https://s3.amazonaws.com/dq-content/289/for_loop.svg"></p>

In [3]:
my_numbers = [[6,5], [1,3], [5,6]]

sums = []

for row in my_numbers:
    row_sum = row[0] + row[1]
    sums.append(row_sum)
    
print(sums)    

[11, 4, 11]




<p><img alt="Unvectorized operation" src="https://s3.amazonaws.com/dq-content/289/unvectorized.svg"></p>


<p><img alt="Vectorized operation" src="https://s3.amazonaws.com/dq-content/289/vectorized.svg"></p>



## NumPy Library

[Documentation](http://www.numpy.org/)

In [11]:
import numpy as np

## Introduction to Ndarrays

<img alt="Dimensional Arrays" src="./images/one_dim.svg">

In [12]:
data_ndarray = np.array([5, 10, 15, 20])

In [13]:
data_ndarray

array([ 5, 10, 15, 20])

In [14]:
type(data_ndarray)

numpy.ndarray

<img alt="Dimensional Arrays" src="./images/Two_Dim.svg">

In [15]:
data2 = [[1,2,3,4],[5,6,7,8]]
arr2 = np.array(data2)

## Preparing the Data

<div>

<ul>
<li><code>pickup_year</code>: The year of the trip.</li>
<li><code>pickup_month</code>: The month of the trip (January is <code>1</code>, December is <code>12</code>).</li>
<li><code>pickup_day</code>: The day of the month of the trip.</li>
<li><code>pickup_location_code</code>: The airport or <a target="_blank" href="https://en.wikipedia.org/wiki/Boroughs_of_New_York_City">borough</a> where the trip started.</li>
<li><code>dropoff_location_code</code>: The airport or borough where the trip finished.</li>
<li><code>trip_distance</code>: The distance of the trip in miles.</li>
<li><code>trip_length</code>: The length of the trip in seconds.</li>
<li><code>fare_amount</code>: The base fare of the trip, in dollars.</li>
<li><code>total_amount</code>: The total amount charged to the passenger, including all fees, tolls and tips.</li>
</ul>

</div>

    pickup_year,pickup_month,pickup_day,pickup_dayofweek
    2016,1,1,5
    2016,1,1,5
    2016,1,1,5
    2016,1,1,5

In [16]:
!head -n 7 data/nyc_taxis.csv

pickup_year,pickup_month,pickup_day,pickup_dayofweek,pickup_time,pickup_location_code,dropoff_location_code,trip_distance,trip_length,fare_amount,fees_amount,tolls_amount,tip_amount,total_amount,payment_type
2016,1,1,5,0,2,4,21.00,2037,52.00,0.80,5.54,11.65,69.99,1
2016,1,1,5,0,2,1,16.29,1520,45.00,1.30,0.00,8.00,54.30,1
2016,1,1,5,0,2,6,12.70,1462,36.50,1.30,0.00,0.00,37.80,2
2016,1,1,5,0,2,6,8.70,1210,26.00,1.30,0.00,5.46,32.76,1
2016,1,1,5,0,2,6,5.56,759,17.50,1.30,0.00,0.00,18.80,2
2016,1,1,5,0,4,2,21.45,2004,52.00,0.80,0.00,52.80,105.60,1


    # our list of lists is stored as data_list
    data_ndarray = np.array(data_list)

In [18]:
import csv
import numpy as np

with open('data/nyc_taxis.csv') as f:
    taxi_list = list(csv.reader(f))



In [19]:
print(taxi_list[:3])

[['pickup_year', 'pickup_month', 'pickup_day', 'pickup_dayofweek', 'pickup_time', 'pickup_location_code', 'dropoff_location_code', 'trip_distance', 'trip_length', 'fare_amount', 'fees_amount', 'tolls_amount', 'tip_amount', 'total_amount', 'payment_type'], ['2016', '1', '1', '5', '0', '2', '4', '21.00', '2037', '52.00', '0.80', '5.54', '11.65', '69.99', '1'], ['2016', '1', '1', '5', '0', '2', '1', '16.29', '1520', '45.00', '1.30', '0.00', '8.00', '54.30', '1']]


In [20]:
taxi_list = taxi_list[1:]

In [21]:
converted_taxi_list = []

for row in taxi_list:
    converted_row = []
    for item in row:
        converted_row.append(float(item))
    converted_taxi_list.append(converted_row)

In [22]:
print(converted_taxi_list[:3])

[[2016.0, 1.0, 1.0, 5.0, 0.0, 2.0, 4.0, 21.0, 2037.0, 52.0, 0.8, 5.54, 11.65, 69.99, 1.0], [2016.0, 1.0, 1.0, 5.0, 0.0, 2.0, 1.0, 16.29, 1520.0, 45.0, 1.3, 0.0, 8.0, 54.3, 1.0], [2016.0, 1.0, 1.0, 5.0, 0.0, 2.0, 6.0, 12.7, 1462.0, 36.5, 1.3, 0.0, 0.0, 37.8, 2.0]]


In [24]:
taxi = np.array(converted_taxi_list)
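The nested conversion loop can usually be folded into the array construction, since `np.array` parses numeric strings when a float `dtype` is requested. A minimal sketch on a hypothetical three-line sample (not the real file), with `io.StringIO` standing in for the opened CSV:

```python
import csv
import io
import numpy as np

# Hypothetical sample in the same layout as a few columns of nyc_taxis.csv.
sample = io.StringIO(
    "pickup_year,pickup_month,trip_distance\n"
    "2016,1,21.00\n"
    "2016,1,16.29\n"
)

rows = list(csv.reader(sample))[1:]          # drop the header row
sample_taxi = np.array(rows, dtype=float)    # strings are parsed to float64

print(sample_taxi.shape, sample_taxi.dtype)
```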

In [29]:
data2 = [[1,2,3,4], [5,6,7,8]]
data2 = np.array(data2)
data2.shape

(2, 4)

In [30]:
taxi.shape

(89560, 15)

In [31]:
89560*15  # number of elements

1343400

In [32]:
taxi.size

1343400

In [33]:
taxi.ndim

2

In [34]:
taxi.itemsize

8

In [35]:
taxi.nbytes

10747200

In [36]:
taxi.nbytes/1024/1024

10.24932861328125
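These attributes are related: `size` is the product of the entries in `shape`, and `nbytes` is `size` multiplied by `itemsize`. A small sketch on a hypothetical array with the same number of columns as `taxi`:

```python
import numpy as np

arr = np.zeros((4, 15))   # float64 by default, like the taxi array

# size is the product of the shape; nbytes is size times bytes per element.
assert arr.size == arr.shape[0] * arr.shape[1]
assert arr.nbytes == arr.size * arr.itemsize

print(arr.size, arr.itemsize, arr.nbytes)
```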

## Array Shapes

In [27]:
print(taxi)

[[2.016e+03 1.000e+00 1.000e+00 ... 1.165e+01 6.999e+01 1.000e+00]
 [2.016e+03 1.000e+00 1.000e+00 ... 8.000e+00 5.430e+01 1.000e+00]
 [2.016e+03 1.000e+00 1.000e+00 ... 0.000e+00 3.780e+01 2.000e+00]
 ...
 [2.016e+03 6.000e+00 3.000e+01 ... 5.000e+00 6.334e+01 1.000e+00]
 [2.016e+03 6.000e+00 3.000e+01 ... 8.950e+00 4.475e+01 1.000e+00]
 [2.016e+03 6.000e+00 3.000e+01 ... 0.000e+00 5.484e+01 2.000e+00]]


In [28]:
type(taxi)

numpy.ndarray

<div class="alert alert-block alert-info">
<b>Exercise:</b> Assign the shape of taxi to taxi_shape. Print the result.</div>

## Selecting and Slicing Rows and Items from ndarrays

<img alt="Dimensional Arrays" src="./images/selection_rows.svg">

    ndarray[row_index,column_index]

    # or if you want to select all
    # columns for a given set of rows
    ndarray[row_index]

<img alt="Dimensional Arrays" src="./images/selection_item.svg">

In [10]:
import numpy as np

In [11]:
test = np.random.randint(0, 10, (5,5))

In [12]:
test

array([[5, 1, 7, 6, 3],
       [4, 0, 6, 5, 4],
       [1, 0, 0, 0, 1],
       [1, 5, 7, 0, 7],
       [1, 6, 0, 5, 8]])

In [13]:
test[0]

array([5, 1, 7, 6, 3])

In [14]:
test[-1]

array([1, 6, 0, 5, 8])

In [15]:
test[1:3]

array([[4, 0, 6, 5, 4],
       [1, 0, 0, 0, 1]])

In [16]:
row2_3 = test[1:3]

In [17]:
row2_3

array([[4, 0, 6, 5, 4],
       [1, 0, 0, 0, 1]])
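Selecting a single item follows the same `ndarray[row_index, column_index]` pattern. The sketch below uses a hypothetical 5x5 array with known contents so the selected values are easy to check.

```python
import numpy as np

grid = np.arange(25).reshape(5, 5)   # hypothetical 5x5 array holding 0..24

item = grid[1, 3]        # row index 1, column index 3
same_item = grid[1][3]   # same value, but indexes twice
rows = grid[1:3]         # rows 1 and 2; the stop index is excluded

print(item, same_item, rows.shape)
```

Because the stop index of a slice is excluded, selecting rows 391 to 500 inclusive requires the slice `391:501`.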

<div class="alert alert-block alert-info">
<b>Exercise:</b> From the taxi ndarray:
- Select the row at index 0. Assign it to row_0.
- Select every column for the rows at indexes 391 to 500 inclusive. Assign them to rows_391_to_500.
- Select the item at row index 21 and column index 5. Assign it to row_21_column_5.</div>

In [25]:
row_0 = taxi[0]

In [26]:
row_0

array([2.016e+03, 1.000e+00, 1.000e+00, 5.000e+00, 0.000e+00, 2.000e+00,
       4.000e+00, 2.100e+01, 2.037e+03, 5.200e+01, 8.000e-01, 5.540e+00,
       1.165e+01, 6.999e+01, 1.000e+00])

In [28]:
rows_391_to_500 = taxi[391:501]

In [29]:
rows_391_to_500

array([[2.016e+03, 1.000e+00, 2.000e+00, ..., 0.000e+00, 2.630e+01,
        2.000e+00],
       [2.016e+03, 1.000e+00, 2.000e+00, ..., 3.000e+00, 3.030e+01,
        1.000e+00],
       [2.016e+03, 1.000e+00, 2.000e+00, ..., 6.670e+00, 4.001e+01,
        1.000e+00],
       ...,
       [2.016e+03, 1.000e+00, 2.000e+00, ..., 4.960e+00, 2.976e+01,
        1.000e+00],
       [2.016e+03, 1.000e+00, 2.000e+00, ..., 0.000e+00, 3.284e+01,
        2.000e+00],
       [2.016e+03, 1.000e+00, 2.000e+00, ..., 7.050e+00, 4.239e+01,
        1.000e+00]])

In [30]:
# index with taxi[row, column]; a bare [21, 5] only builds a Python list
row_21_column_5 = taxi[21, 5]

In [31]:
row_21_column_5

## Selecting Columns and Custom Slicing ndarrays

<img alt="Dimensional Arrays" src="./images/selection_columns_updated.svg">

<img alt="Dimensional Arrays" src="./images/selection_1darray_updated.svg">

<img alt="Dimensional Arrays" src="./images/selection_2darray_updated.svg">

In [32]:
column_test = np.random.randint(0, 10, (5,5))

In [33]:
column_test

array([[3, 0, 7, 9, 5],
       [5, 9, 2, 1, 5],
       [2, 0, 6, 7, 7],
       [0, 9, 0, 1, 0],
       [6, 3, 2, 4, 0]])

In [34]:
column_test[:,3]

array([9, 1, 7, 1, 4])

In [35]:
column_test[:, 1:3]

array([[0, 7],
       [9, 2],
       [0, 6],
       [9, 0],
       [3, 2]])

In [36]:
column_test[:, [1, 3, 4]]

array([[0, 9, 5],
       [9, 1, 5],
       [0, 7, 7],
       [9, 1, 0],
       [3, 4, 0]])

In [37]:
column_test[2, 1:4]

array([0, 6, 7])

In [38]:
column_test[1:, 4]

array([5, 7, 0, 0])

In [39]:
column_test[1:4, :3]

array([[5, 9, 2],
       [2, 0, 6],
       [0, 9, 0]])

<div class="alert alert-block alert-info">
<b>Exercise:</b> From the taxi ndarray:
- Select every row for the columns at indexes 1, 4, and 7. Assign them to columns_1_4_7.
- Select the columns at indexes 5 to 8 inclusive for the row at index 99. Assign them to row_99_columns_5_to_8.
- Select the rows at indexes 100 to 200 inclusive for the column at index 14. Assign them to rows_100_to_200_column_14.</div>

In [40]:
columns_1_4_7 = taxi[:, [1, 4, 7]]

In [41]:
columns_1_4_7

array([[ 1.  ,  0.  , 21.  ],
       [ 1.  ,  0.  , 16.29],
       [ 1.  ,  0.  , 12.7 ],
       ...,
       [ 6.  ,  5.  , 17.48],
       [ 6.  ,  5.  , 12.76],
       [ 6.  ,  5.  , 17.54]])

In [46]:
row_99_columns_5_to_8 = taxi[99, 5:9]

In [47]:
row_99_columns_5_to_8

array([   2.  ,    4.  ,   20.91, 1744.  ])

In [48]:
rows_100_to_200_column_14 = taxi[100:201, 14]

In [49]:
rows_100_to_200_column_14

array([2., 1., 1., 1., 1., 1., 2., 1., 1., 2., 1., 1., 1., 2., 2., 2., 1.,
       2., 1., 2., 1., 1., 2., 2., 2., 1., 1., 2., 1., 2., 1., 1., 2., 2.,
       1., 1., 2., 2., 1., 1., 1., 2., 1., 1., 1., 2., 2., 2., 2., 2., 1.,
       4., 2., 1., 2., 1., 2., 2., 2., 2., 1., 1., 2., 1., 2., 2., 2., 2.,
       1., 2., 2., 1., 2., 1., 2., 1., 2., 2., 1., 1., 1., 1., 2., 1., 1.,
       2., 2., 1., 1., 2., 2., 1., 1., 2., 1., 1., 1., 1., 1., 2., 2.])

## Vector Math

In [56]:
import numpy as np

In [57]:
my_numbers = [[6,5], [9,1], [2,4], [7,14], [8,6]]

In [51]:
sums = []
for row in my_numbers:
    row_sums = row[0] + row[1]
    sums.append(row_sums)
    
print(sums)

[11, 10, 6, 21, 14]


Vectorized version:

In [58]:
my_numbers = np.array(my_numbers)

In [59]:
my_numbers

array([[ 6,  5],
       [ 9,  1],
       [ 2,  4],
       [ 7, 14],
       [ 8,  6]])

In [60]:
sums = my_numbers[:,0] + my_numbers[:,1]

In [61]:
sums

array([11, 10,  6, 21, 14])

<div class="alert alert-block alert-info">
<b>Exercise:</b> 
Use vector addition to add fare_amount and fees_amount. Assign the result to fare_and_fees.</div>

In [62]:
# taxi column at index 9
fare_amount = taxi[:,9]

In [63]:
# column at index 10
fees_amount = taxi[:,10]

In [65]:
fare_and_fees = taxi[:,9] + taxi[:,10]

In [66]:
fare_and_fees

array([52.8, 46.3, 37.8, ..., 52.8, 35.8, 49.3])

In [None]:
miles_per_hour = distance_in_miles / length_in_hours

In [67]:
distance_in_miles = taxi[:,7]
trip_length_sec = taxi[:,8]

trip_length_hours = trip_length_sec / 3600

In [68]:
trip_mph = distance_in_miles / trip_length_hours

In [69]:
trip_mph

array([37.11340206, 38.58157895, 31.27222982, ..., 22.29907867,
       42.41551247, 36.90473407])



<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The following table lists the arithmetic operators implemented in NumPy:</p>
<table>
<thead><tr>
<th>Operator</th>
<th>Equivalent ufunc</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>+</code></td>
<td><code>np.add</code></td>
<td>Addition (e.g., <code>1 + 1 = 2</code>)</td>
</tr>
<tr>
<td><code>-</code></td>
<td><code>np.subtract</code></td>
<td>Subtraction (e.g., <code>3 - 2 = 1</code>)</td>
</tr>
<tr>
<td><code>-</code></td>
<td><code>np.negative</code></td>
<td>Unary negation (e.g., <code>-2</code>)</td>
</tr>
<tr>
<td><code>*</code></td>
<td><code>np.multiply</code></td>
<td>Multiplication (e.g., <code>2 * 3 = 6</code>)</td>
</tr>
<tr>
<td><code>/</code></td>
<td><code>np.divide</code></td>
<td>Division (e.g., <code>3 / 2 = 1.5</code>)</td>
</tr>
<tr>
<td><code>//</code></td>
<td><code>np.floor_divide</code></td>
<td>Floor division (e.g., <code>3 // 2 = 1</code>)</td>
</tr>
<tr>
<td><code>**</code></td>
<td><code>np.power</code></td>
<td>Exponentiation (e.g., <code>2 ** 3 = 8</code>)</td>
</tr>
<tr>
<td><code>%</code></td>
<td><code>np.mod</code></td>
<td>Modulus/remainder (e.g., <code>9 % 4 = 1</code>)</td>
</tr>
</tbody>
</table>

</div>
</div>
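Each operator in the table is shorthand for the listed ufunc, which can be checked element by element. A minimal sketch on a small hypothetical array:

```python
import numpy as np

x = np.arange(1, 5)   # array([1, 2, 3, 4])

# Each operator produces the same result as its ufunc equivalent.
assert np.array_equal(x + 2, np.add(x, 2))
assert np.array_equal(x - 2, np.subtract(x, 2))
assert np.array_equal(-x, np.negative(x))
assert np.array_equal(x * 2, np.multiply(x, 2))
assert np.array_equal(x / 2, np.divide(x, 2))
assert np.array_equal(x // 2, np.floor_divide(x, 2))
assert np.array_equal(x ** 2, np.power(x, 2))
assert np.array_equal(x % 2, np.mod(x, 2))

print(x // 2, x % 2)
```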


In [70]:
trip_mph = np.divide(distance_in_miles, trip_length_hours)

In [71]:
trip_mph

array([37.11340206, 38.58157895, 31.27222982, ..., 22.29907867,
       42.41551247, 36.90473407])

## Calculating Statistics For 1D ndarrays

In [72]:
trip_mph.min()

0.0

In [77]:
np.min(trip_mph)

0.0


<p></p><center><img alt="Method syntax" src="https://s3.amazonaws.com/dq-content/289/Method_syntax.svg"></center><p></p>


<div class="alert alert-block alert-info">
<b>Exercise:</b> Use the ndarray.max() method to calculate the maximum value of trip_mph. Assign the result to mph_max.
Use the ndarray.mean() method to calculate the average value of trip_mph. Assign the result to mph_mean.</div>

In [73]:
mph_max = trip_mph.max()

In [74]:
mph_max

82800.0

In [75]:
mph_mean = trip_mph.mean()

In [76]:
mph_mean

32.24258580925573

<div>

<table>
<thead>
<tr>
<th>Calculation</th>
<th>Function Representation</th>
<th>Method Representation</th>
</tr>
</thead>
<tbody>
<tr>
<td>Calculate the minimum value of <code>trip_mph</code></td>
<td><code>np.min(trip_mph)</code></td>
<td><code>trip_mph.min()</code></td>
</tr>
<tr>
<td>Calculate the maximum value of <code>trip_mph</code></td>
<td><code>np.max(trip_mph)</code></td>
<td><code>trip_mph.max()</code></td>
</tr>
<tr>
<td>Calculate the <a target="_blank" href="https://en.wikipedia.org/wiki/Mean">mean average</a> value of <code>trip_mph</code></td>
<td><code>np.mean(trip_mph)</code></td>
<td><code>trip_mph.mean()</code></td>
</tr>
<tr>
<td>Calculate the <a target="_blank" href="https://en.wikipedia.org/wiki/Median">median average</a> value of <code>trip_mph</code></td>
<td><code>np.median(trip_mph)</code></td>
<td>There is no ndarray median method</td>
</tr>
</tbody>
</table>
</div>
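The function and method forms in the table return the same values; only the median lacks a method form. A short sketch with hypothetical sample values:

```python
import numpy as np

speeds = np.array([10.0, 20.0, 30.0, 100.0])   # hypothetical sample values

assert speeds.min() == np.min(speeds)
assert speeds.max() == np.max(speeds)
assert speeds.mean() == np.mean(speeds)

# The median is available only as a function.
median = np.median(speeds)
has_median_method = hasattr(speeds, "median")

print(median, has_median_method)
```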

## Calculating Statistics For 2D ndarrays

<img alt="Dimensional Arrays" src="./images/array_method_axis_none.svg">

<img alt="Dimensional Arrays" src="./images/array_method_axis_1.svg">

<img alt="Dimensional Arrays" src="./images/array_method_axis_0.svg">



<p><img alt="The axis parameter" src="https://s3.amazonaws.com/dq-content/289/axis_param.svg"></p>


In [78]:
np.ones((4,1))

array([[1.],
       [1.],
       [1.],
       [1.]])

In [80]:
np.ones((1,4))

array([[1., 1., 1., 1.]])

In [81]:
test = np.random.randint(0,9,(5,5))

In [82]:
test

array([[0, 6, 0, 3, 4],
       [1, 5, 2, 4, 7],
       [0, 2, 2, 7, 3],
       [5, 0, 4, 0, 3],
       [8, 3, 5, 6, 7]])

In [83]:
test.max()

8

In [90]:
test.max(axis=0)

array([8, 6, 5, 7, 7])

In [91]:
test.max(axis=1)

array([6, 7, 7, 5, 8])

In [85]:
taxi_first_five = taxi[:5]

In [86]:
fare_components = taxi_first_five[:, 9:13]

In [87]:
fare_components

array([[52.  ,  0.8 ,  5.54, 11.65],
       [45.  ,  1.3 ,  0.  ,  8.  ],
       [36.5 ,  1.3 ,  0.  ,  0.  ],
       [26.  ,  1.3 ,  0.  ,  5.46],
       [17.5 ,  1.3 ,  0.  ,  0.  ]])

In [88]:
fare_components.sum(axis=1)

array([69.99, 54.3 , 37.8 , 32.76, 18.8 ])

In [89]:
taxi_first_five[:,13]

array([69.99, 54.3 , 37.8 , 32.76, 18.8 ])
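The printed row sums and the `total_amount` column look identical, but floating-point sums are better compared with a tolerance than with `==`. The sketch below rebuilds the five rows shown above as a literal array (the name `first_five` is chosen here for illustration) and compares them with `np.allclose`.

```python
import numpy as np

# Columns 9 to 13 of the first five rows: fare, fees, tolls, tip, total.
first_five = np.array([
    [52.0, 0.8, 5.54, 11.65, 69.99],
    [45.0, 1.3, 0.0, 8.0, 54.3],
    [36.5, 1.3, 0.0, 0.0, 37.8],
    [26.0, 1.3, 0.0, 5.46, 32.76],
    [17.5, 1.3, 0.0, 0.0, 18.8],
])

row_sums = first_five[:, :4].sum(axis=1)   # one sum per row
col_max = first_five.max(axis=0)           # one maximum per column

# Compare with a tolerance instead of exact equality.
matches = np.allclose(row_sums, first_five[:, 4])
print(matches)
```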

## Reading CSV files with NumPy

<p>Below is information about selected columns from the data set:</p>
<ul>
<li><code>pickup_year</code>: The year of the trip.</li>
<li><code>pickup_month</code>: The month of the trip (January is <code>1</code>, December is <code>12</code>).</li>
<li><code>pickup_day</code>: The day of the month of the trip.</li>
<li><code>pickup_location_code</code>: The airport or <a target="_blank" href="https://en.wikipedia.org/wiki/Boroughs_of_New_York_City">borough</a> where the trip started.</li>
<li><code>dropoff_location_code</code>: The airport or borough where the trip finished.</li>
<li><code>trip_distance</code>: The distance of the trip in miles.</li>
<li><code>trip_length</code>: The length of the trip in seconds.</li>
<li><code>fare_amount</code>: The base fare of the trip, in dollars.</li>
<li><code>total_amount</code>: The total amount charged to the passenger, including all fees, tolls and tips.</li>
</ul>


In [92]:
taxi = np.genfromtxt('data/nyc_taxis.csv', delimiter = ',')

In [93]:
taxi_shape = taxi.shape

In [94]:
taxi_shape

(89561, 15)

In [95]:
taxi.dtype

dtype('float64')

In [96]:
print(taxi)

[[      nan       nan       nan ...       nan       nan       nan]
 [2.016e+03 1.000e+00 1.000e+00 ... 1.165e+01 6.999e+01 1.000e+00]
 [2.016e+03 1.000e+00 1.000e+00 ... 8.000e+00 5.430e+01 1.000e+00]
 ...
 [2.016e+03 6.000e+00 3.000e+01 ... 5.000e+00 6.334e+01 1.000e+00]
 [2.016e+03 6.000e+00 3.000e+01 ... 8.950e+00 4.475e+01 1.000e+00]
 [2.016e+03 6.000e+00 3.000e+01 ... 0.000e+00 5.484e+01 2.000e+00]]


In [97]:
# either drop the NaN row with taxi = taxi[1:], or skip the header while reading
taxi = np.genfromtxt('data/nyc_taxis.csv', delimiter = ',', skip_header = 1)

In [98]:
print(taxi)

[[2.016e+03 1.000e+00 1.000e+00 ... 1.165e+01 6.999e+01 1.000e+00]
 [2.016e+03 1.000e+00 1.000e+00 ... 8.000e+00 5.430e+01 1.000e+00]
 [2.016e+03 1.000e+00 1.000e+00 ... 0.000e+00 3.780e+01 2.000e+00]
 ...
 [2.016e+03 6.000e+00 3.000e+01 ... 5.000e+00 6.334e+01 1.000e+00]
 [2.016e+03 6.000e+00 3.000e+01 ... 8.950e+00 4.475e+01 1.000e+00]
 [2.016e+03 6.000e+00 3.000e+01 ... 0.000e+00 5.484e+01 2.000e+00]]
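The effect of `skip_header` can be reproduced without the data file. This is a minimal sketch in which a hypothetical two-row sample wrapped in `io.StringIO` stands in for `data/nyc_taxis.csv`:

```python
import io
import numpy as np

# Hypothetical sample; a StringIO object stands in for the CSV file.
text = "pickup_year,pickup_month,trip_distance\n2016,1,21.00\n2016,1,16.29\n"

with_header = np.genfromtxt(io.StringIO(text), delimiter=',')
without_header = np.genfromtxt(io.StringIO(text), delimiter=',', skip_header=1)

# The header strings cannot be parsed as floats, so that row becomes NaN.
print(np.isnan(with_header[0]).all())
print(with_header.shape, without_header.shape)
```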


## Datatypes

In [99]:
x = np.array([1,2])
print(x.dtype)
print(x.nbytes)

int64
16


In [100]:
x = np.array([1.0,2.0])
print(x.dtype)
print(x.nbytes)

float64
16


In [102]:
x = np.array([1,2], dtype=np.int32)
print(x.dtype)
print(x.nbytes)

int32
8


In [103]:
x = np.array([1,2], dtype=np.int8)
print(x.dtype)
print(x.nbytes)

int8
2


In [104]:
# alternative notation: the dtype given as a string
x = np.array([1,2], dtype='int8')
print(x.dtype)
print(x.nbytes)

int8
2


In [105]:
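# astype is used because recent NumPy versions raise an error when
# np.array(..., dtype=np.int8) is given values outside the int8 range
x = np.array([189, 22, 128, -129]).astype(np.int8)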

In [106]:
x

array([ -67,   22, -128,  127], dtype=int8)

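Values outside the int8 range (-128 to 127) roll over: 189 becomes 189 - 256 = -67, 128 becomes -128, and -129 becomes -129 + 256 = 127.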

<div class="text_cell_render border-box-sizing rendered_html">
<table>
<thead><tr>
<th>Data type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>bool_</code></td>
<td>Boolean (True or False) stored as a byte</td>
</tr>
<tr>
<td><code>int_</code></td>
<td>Default integer type (same as C <code>long</code>; normally either <code>int64</code> or <code>int32</code>)</td>
</tr>
<tr>
<td><code>intc</code></td>
<td>Identical to C <code>int</code> (normally <code>int32</code> or <code>int64</code>)</td>
</tr>
<tr>
<td><code>intp</code></td>
<td>Integer used for indexing (same as C <code>ssize_t</code>; normally either <code>int32</code> or <code>int64</code>)</td>
</tr>
<tr>
<td><code>int8</code></td>
<td>Byte (-128 to 127)</td>
</tr>
<tr>
<td><code>int16</code></td>
<td>Integer (-32768 to 32767)</td>
</tr>
<tr>
<td><code>int32</code></td>
<td>Integer (-2147483648 to 2147483647)</td>
</tr>
<tr>
<td><code>int64</code></td>
<td>Integer (-9223372036854775808 to 9223372036854775807)</td>
</tr>
<tr>
<td><code>uint8</code></td>
<td>Unsigned integer (0 to 255)</td>
</tr>
<tr>
<td><code>uint16</code></td>
<td>Unsigned integer (0 to 65535)</td>
</tr>
<tr>
<td><code>uint32</code></td>
<td>Unsigned integer (0 to 4294967295)</td>
</tr>
<tr>
<td><code>uint64</code></td>
<td>Unsigned integer (0 to 18446744073709551615)</td>
</tr>
<tr>
<td><code>float_</code></td>
<td>Shorthand for <code>float64</code> (alias removed in NumPy 2.0).</td>
</tr>
<tr>
<td><code>float16</code></td>
<td>Half precision float: sign bit, 5 bits exponent, 10 bits mantissa</td>
</tr>
<tr>
<td><code>float32</code></td>
<td>Single precision float: sign bit, 8 bits exponent, 23 bits mantissa</td>
</tr>
<tr>
<td><code>float64</code></td>
<td>Double precision float: sign bit, 11 bits exponent, 52 bits mantissa</td>
</tr>
<tr>
<td><code>complex_</code></td>
<td>Shorthand for <code>complex128</code> (alias removed in NumPy 2.0).</td>
</tr>
<tr>
<td><code>complex64</code></td>
<td>Complex number, represented by two 32-bit floats</td>
</tr>
<tr>
<td><code>complex128</code></td>
<td>Complex number, represented by two 64-bit floats</td>
</tr>
</tbody>
</table>

</div>
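The integer ranges in the table can be queried with `np.iinfo`, which helps when choosing a smaller dtype. The sketch below also shows how values outside the range wrap around when a wider integer array is converted with `astype`.

```python
import numpy as np

info = np.iinfo(np.int8)
print(info.min, info.max)

# Converting with astype keeps only the low 8 bits,
# so out-of-range values wrap around.
wide = np.array([189, 22, 128, -129])
narrow = wide.astype(np.int8)
print(narrow)
```

Checking the range first avoids silently corrupted values when memory is saved by downcasting.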

## Boolean Indexing

### Boolean Arrays

In [107]:
True, False

(True, False)

In [108]:
type(3.5) == float

True

In [109]:
5 < 3

False

In [110]:
np.array([2,4,5,9]) + 10

array([12, 14, 15, 19])

In [111]:
np.array([2,4,5,9]) < 5

array([ True,  True, False, False])

<div class="alert alert-block alert-info">
Use vectorized boolean operations to:
<li> Evaluate whether the elements in array a are less than 3. Assign the result to a_bool.</li> 
<li> Evaluate whether the elements in array b are equal to "blue". Assign the result to b_bool.</li> 
<li>  Evaluate whether the elements in array c are greater than 100. Assign the result to c_bool.</li> </div>

In [112]:
a = np.array([1,2,3,4,5])

In [113]:
a_bool = a < 3

In [114]:
a_bool

array([ True,  True, False, False, False])

In [117]:
b = np.array(["blue", "blue", "red", "blue"])

In [118]:
b == 'blue'

array([ True,  True, False,  True])

In [121]:
c = np.array([80.0, 103.4, 96.9, 200.3])

In [122]:
c > 100

array([False,  True, False,  True])

### Boolean Indexing with 1D ndarrays

In [123]:
c = np.array([80.0, 103.4, 96.9, 200.3])

In [124]:
c_bool = c > 100

In [125]:
c_bool

array([False,  True, False,  True])

In [126]:
result = c[c_bool]

In [127]:
result

array([103.4, 200.3])

In [128]:
month = taxi[:,1]

In [129]:
january_bool = month == 1

In [130]:
january = month[january_bool]

In [133]:
january.shape[0]

13481

<div class="alert alert-block alert-info">
Calculate the number of rides in the taxi ndarray that are from February:
<li> Create a boolean array, february_bool, that evaluates whether the items in pickup_month are equal to 2.</li> 
<li> Use the february_bool boolean array to index pickup_month. Assign the result to february.</li> 
<li> Use the ndarray.shape attribute to find the number of items in february. Assign the result to february_rides.</li> </div>

In [135]:
february_bool = month == 2
february = month[february_bool]
february_rides = february.shape[0]
february_rides

13333

In [136]:
pickup_month = taxi[:,1]
rides = []

for month in range(1,13):
    month_bool = pickup_month == month
    month_array = pickup_month[month_bool]
    month_rides = month_array.shape[0]
    rides.append(month_rides)
print(rides)

[13481, 13333, 15547, 14810, 16650, 15739, 0, 0, 0, 0, 0, 0]
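The same counts can be obtained without building an intermediate array for each month. The following sketch uses a hypothetical month column in place of `taxi[:, 1]`; summing a boolean array counts its `True` values, and `np.unique` with `return_counts=True` counts every month present in one call.

```python
import numpy as np

# Hypothetical month column standing in for taxi[:, 1].
pickup_month = np.array([1.0, 1.0, 2.0, 6.0, 2.0, 1.0])

# True counts as 1 when summed, so this is the number of February rides.
february_rides = (pickup_month == 2).sum()

# Count every month that occurs in the column.
months, counts = np.unique(pickup_month, return_counts=True)

print(february_rides, months, counts)
```

Unlike the loop over `range(1, 13)`, `np.unique` reports only the months that actually occur in the data.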


### Boolean Indexing with 2D ndarrays

<img alt="Dimensional Arrays" src="./images/bool_dims_updated.svg">

In [137]:
trip_mph = taxi[:,7] / (taxi[:,8]/3600)

In [138]:
trip_mph.max()

82800.0

In [139]:
trips_over_20000 = taxi[trip_mph > 20000, 5:9]

In [140]:
trips_over_20000

array([[ 2. ,  2. , 23. ,  1. ],
       [ 2. ,  2. , 19.6,  1. ],
       [ 2. ,  2. , 16.7,  2. ],
       [ 3. ,  3. , 17.8,  2. ],
       [ 2. ,  2. , 17.2,  2. ],
       [ 3. ,  3. , 16.9,  3. ],
       [ 2. ,  2. , 27.1,  4. ]])

<div class="alert alert-block alert-info">
<b>Exercise: </b>Create a boolean array, tip_bool, that determines which rows have values for the tip_amount column of more than 50. Use the tip_bool array to select all rows from taxi with tip amounts of more than 50, and the columns from indexes 5 to 13 inclusive. Assign the resulting array to top_tips. </div>

In [142]:
tip_amount = taxi[:,12]

In [144]:
top_tips = taxi[tip_amount > 50, 5:14]

In [145]:
top_tips[:2]

array([[4.000e+00, 2.000e+00, 2.145e+01, 2.004e+03, 5.200e+01, 8.000e-01,
        0.000e+00, 5.280e+01, 1.056e+02],
       [3.000e+00, 4.000e+00, 9.200e+00, 1.041e+03, 2.700e+01, 1.300e+00,
        5.540e+00, 6.000e+01, 9.384e+01]])

## Assigning Values

### Assigning Values in ndarrays

    ndarray[location_of_values] = new_value

In [154]:
a = np.array(['red','blue','black','blue','purple'])


In [155]:
a[0] = 'orange'

In [156]:
a

array(['orange', 'blue', 'black', 'blue', 'purple'], dtype='<U6')

In [157]:
a[3:] = 'pink'

In [158]:
a

array(['orange', 'blue', 'black', 'pink', 'pink'], dtype='<U6')

In [160]:
ones = np.ones((3,5))

In [161]:
ones

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])

In [162]:
ones[1, 2] = 99

In [163]:
ones

array([[ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1., 99.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.]])

In [164]:
ones[:,2] = 0

In [165]:
ones

array([[1., 1., 0., 1., 1.],
       [1., 1., 0., 1., 1.],
       [1., 1., 0., 1., 1.]])

In [166]:
ones[0] = 42

In [167]:
ones

array([[42., 42., 42., 42., 42.],
       [ 1.,  1.,  0.,  1.,  1.],
       [ 1.,  1.,  0.,  1.,  1.]])

<div class="alert alert-block alert-info">
<b>Exercise: </b>To help you practice without making changes to our original array, we have used the ndarray.copy() method to make taxi_modified, a copy of our original for these exercises.
<li> The value at column index 5 (pickup_location) of row index 28214 is incorrect. Use assignment to change this value to 1 in the taxi_modified ndarray.</li> 
<li> The first column (index 0) contains year values as four digit numbers in the format YYYY (2016, since all trips in our data set are from 2016). Use assignment to change these values to the YY format (16) in the taxi_modified ndarray.</li> 
<li> The values at column index 7 (trip_distance) of rows index 1800 and 1801 are incorrect. Use assignment to change these values in the taxi_modified ndarray to the mean value for that column.</li> </div>

In [168]:
taxi_modified = taxi.copy()

In [169]:
taxi_modified[28214,5] = 1

In [170]:
taxi_modified[:,0] = 16

In [171]:
taxi_modified[1800:1802,7] = taxi_modified[:,7].mean()

### Assignment Using Boolean Arrays

In [172]:
a2 = np.array([1,2,3,4,5])

In [173]:
a2[a2>2] = 99

In [174]:
a2

array([ 1,  2, 99, 99, 99])

Boolean arrays can be combined element-wise with these operators:

- `&`: and
- `|`: or
- `~`: not
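A short sketch of combining conditions with these operators on a hypothetical array. Each comparison needs its own parentheses, because `&` and `|` bind more tightly than the comparison operators.

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6])

# Parentheses around each comparison are required.
middle = values[(values > 2) & (values < 6)]
edges = values[(values <= 2) | (values >= 6)]
not_even = values[~(values % 2 == 0)]

print(middle, edges, not_even)
```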

<div class="alert alert-block alert-info">
<b>Exercise: </b>We again used the ndarray.copy() method to make taxi_copy, a copy of our original for this exercise.
<li> Select the fourteenth column (index 13) in taxi_copy. Assign it to a variable named total_amount.</li> 
<li> For rows where the value of total_amount is less than 0, use assignment to change the value to 0.</li> 
 </div>

In [175]:
taxi_copy = taxi.copy()

In [176]:
total_amount = taxi_copy[:,13]

In [177]:
total_amount[total_amount < 0] = 0

<hr>

In [181]:
b = np.linspace(1,9, num = 9, dtype = int)  # np.int was removed from NumPy; the built-in int is equivalent
b

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [185]:
b = np.linspace(1,9, num = 9, dtype = int)
b = np.reshape(b, (3,3))
c = b.copy()
b

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [186]:
b[b>4] = 99

In [187]:
b

array([[ 1,  2,  3],
       [ 4, 99, 99],
       [99, 99, 99]])

In [188]:
c[c[:,1] > 2, 1] = 99

In [189]:
c

array([[ 1,  2,  3],
       [ 4, 99,  6],
       [ 7, 99,  9]])

    array[array[:, column_for_comparison] == value_for_comparison, column_for_assignment] = new_value

<div class="alert alert-block alert-info">
<b>Exercise: </b>We have created a new copy of our taxi dataset, taxi_modified with an additional column containing the value 0 for every row. In our new column at index 15, assign the value 1 if the pickup_location_code (column index 5) corresponds to an airport location, leaving the value as 0 otherwise by performing these three operations:
<li> For rows where the value for the column index 5 is equal to 2 (JFK Airport), assign the value 1 to column index 15.</li> 
<li>For rows where the value for the column index 5 is equal to 3 (LaGuardia Airport), assign the value 1 to column index 15.</li> 
<li> For rows where the value for the column index 5 is equal to 5 (Newark Airport), assign the value 1 to column index 15.</li> </div>
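One possible way to approach this exercise is sketched below on a small hypothetical array (`sample`, 4 rows and 15 columns) rather than the real data; how the course actually built the extra column is not shown in the source, so the `np.concatenate` step is an assumption.

```python
import numpy as np

# Hypothetical stand-in: 4 rows, 15 columns, pickup codes in column 5.
sample = np.zeros((4, 15))
sample[:, 5] = [2, 4, 3, 5]

# Append a column of zeros, which becomes column index 15.
zeros_col = np.zeros((sample.shape[0], 1))
sample_modified = np.concatenate([sample, zeros_col], axis=1)

# One assignment per airport code: 2 (JFK), 3 (LaGuardia), 5 (Newark).
sample_modified[sample_modified[:, 5] == 2, 15] = 1
sample_modified[sample_modified[:, 5] == 3, 15] = 1
sample_modified[sample_modified[:, 5] == 5, 15] = 1

print(sample_modified[:, 15])
```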

## Adding Rows and Columns to ndarrays

In [190]:
ones = np.ones((2,3))

In [191]:
ones

array([[1., 1., 1.],
       [1., 1., 1.]])

In [192]:
zeros = np.zeros(3)

In [193]:
zeros

array([0., 0., 0.])

In [194]:
combined = np.concatenate([ones, zeros], axis=0)

ValueError: all the input arrays must have same number of dimensions, but the array at index 0 has 2 dimension(s) and the array at index 1 has 1 dimension(s)

In [195]:
ones.shape

(2, 3)

In [196]:
zeros.shape

(3,)

To concatenate along axis 0, zeros needs the shape (1, 3):

In [197]:
zeros_2d = np.expand_dims(zeros, axis=0)

In [198]:
zeros_2d

array([[0., 0., 0.]])

In [199]:
zeros_2d.shape

(1, 3)

In [203]:
combined = np.concatenate([ones, zeros_2d], axis=0)

In [204]:
combined

array([[1., 1., 1.],
       [1., 1., 1.],
       [0., 0., 0.]])

## Computation on NumPy Arrays: Universal Functions


### The Slowness of Loops



In [206]:
np.random.seed(0)

def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output
        
values = np.random.randint(1, 10, size=5)
compute_reciprocals(values)

array([0.16666667, 1.        , 0.25      , 0.25      , 0.125     ])

In [207]:
big_array = np.random.randint(1,100, size=1000000)
%timeit compute_reciprocals(big_array)

3.44 s ± 688 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


### Introducing UFuncs (Universal functions)


In [208]:
print(compute_reciprocals(values))
print(1.0 / values)

[0.16666667 1.         0.25       0.25       0.125     ]
[0.16666667 1.         0.25       0.25       0.125     ]


In [212]:
%timeit (1.0 / big_array)

2.71 ms ± 323 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


## Subarrays as no-copy views



In [213]:
r = np.ones((4,4))

In [214]:
r

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [215]:
r2 = r[:2, :2]

In [216]:
r2

array([[1., 1.],
       [1., 1.]])

In [217]:
r2[:] = 0 

In [218]:
r2

array([[0., 0.],
       [0., 0.]])

In [219]:
r

array([[0., 0., 1., 1.],
       [0., 0., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

## Copying Data



In [220]:
r = np.ones((4,4))

In [221]:
r3 = r[:2, :2].copy()

In [222]:
r3

array([[1., 1.],
       [1., 1.]])

In [223]:
r3[:] = 2578

In [224]:
r3

array([[2578., 2578.],
       [2578., 2578.]])

In [225]:
r

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

## Example: Which is the most popular airport?

In [226]:
jfk = taxi[taxi[:,6] == 2]
jfk.shape[0]

11832

In [227]:
laguardia = taxi[taxi[:,6] == 3]
laguardia.shape[0]

16602

In [228]:
newark = taxi[taxi[:,6] == 5]
newark.shape[0]

63

## Example: Calculating Statistics for Trips on Clean Data

In [229]:
trip_mph = taxi[:,7] / (taxi[:,8] / 3600)

In [230]:
cleaned_taxi = taxi[trip_mph < 100]

In [231]:
# mean trip distance (miles)
cleaned_taxi[:,7].mean()

12.666396599932893

In [232]:
# mean trip length (minutes)
cleaned_taxi[:,8].mean() / 60

37.325060955150434

In [233]:
#mean total amount
cleaned_taxi[:,13].mean()

48.98131853260262
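The filter `trip_mph < 100` would also drop trips with a recorded length of 0 seconds, if the data contains any, because dividing by zero yields `inf` or `nan`, and neither compares as less than 100. A minimal sketch with hypothetical values (not taken from the data set):

```python
import numpy as np

# Hypothetical values: the last two trips have a recorded length of 0 seconds.
distance = np.array([21.0, 16.29, 23.0, 0.0])
seconds = np.array([2037.0, 1520.0, 0.0, 0.0])

# Silence the divide-by-zero and invalid-value warnings for this block only.
with np.errstate(divide='ignore', invalid='ignore'):
    mph = distance / (seconds / 3600)
    keep = mph < 100   # inf and nan both compare as False

print(mph, keep)
```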