# Introduction to NumPy


## Understanding Data Types in Python



```C
/* C code */
int result = 0;
for(int i=0; i<100; i++){
    result += i;
}
```


```python
# Python code
result = 0
for i in range(100):
    result += i
```


In [1]:
x = 4

In [2]:
x = "example"

```c
int x = 4;
x = "four"; // this does not work: x was declared as an int
```
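
In Python, a variable name is not bound to a type. The object the name currently refers to carries the type. Here is a minimal illustration using only the built-in `type()`:

```python
# The same name can be rebound to objects of different types
x = 4
print(type(x))  # <class 'int'>

x = "four"
print(type(x))  # <class 'str'>
```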

### A Python Integer Is More Than Just an Integer



```C
struct _longobject {
    long ob_refcnt;
    PyTypeObject *ob_type;
    size_t ob_size;
    long ob_digit[1];
};
```



<img src="https://jakevdp.github.io/PythonDataScienceHandbook/figures/cint_vs_pyint.png" alt="Integer Memory Layout">
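
Every Python integer is a full object like the struct above, so it occupies far more memory than the value alone. The quick check below assumes a 64-bit CPython build; the exact byte count of a Python int can vary between versions.

```python
import sys
import numpy as np

# One Python int object: reference count, type pointer, size field and digits
python_int_size = sys.getsizeof(1)

# One element of a NumPy int64 array: only the 8-byte value itself
numpy_int_size = np.array([1], dtype=np.int64).itemsize

print(python_int_size)  # typically 28 on 64-bit CPython
print(numpy_int_size)   # 8
```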

### A Python List Is More Than Just a List



In [3]:
L = list(range(10))

In [4]:
L

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [5]:
type(L[0])

int

In [6]:
L2 = [str(c) for c in L]

In [8]:
L2

['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']

In [9]:
type(L2[0])

str

In [10]:
L2[4] = True

In [11]:
L2

['0', '1', '2', '3', True, '5', '6', '7', '8', '9']


<img src="https://jakevdp.github.io/PythonDataScienceHandbook/figures/array_vs_list.png" alt="Array Memory Layout">
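
Each list slot holds a pointer to a separate Python object, so a single list can freely mix types. The list contents in this small illustration are arbitrary:

```python
# Every element of a list references an independent Python object,
# so the elements do not need to share a type
mixed = [True, "2", 3.0, 4]
types = [type(item) for item in mixed]
print(types)  # [<class 'bool'>, <class 'str'>, <class 'float'>, <class 'int'>]
```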

### Fixed-Type Arrays in Python



In [13]:
import array

L = list(range(10))
A = array.array('i', L)

In [14]:
A

array('i', [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
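
The price of this compactness is that every element must match the declared type code. The sketch below uses the standard-library `array` module, where `'i'` is a signed C int (typically 4 bytes):

```python
import array

A = array.array('i', range(10))

# Every element is stored as a raw C int of the same size
print(A.itemsize)  # typically 4

# A value of another type is rejected instead of being stored as an object
try:
    A[4] = "four"
except TypeError as err:
    print("rejected:", err)
```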

## How Vectorization Makes Code Faster


<p><img alt="Translating Python code to bytecode" src="https://s3.amazonaws.com/dq-content/289/bytecode.svg"></p>


<table>
<thead>
<tr>
<th>Language Type</th>
<th>Example</th>
<th>Time taken to write program</th>
<th>Control over program performance</th>
</tr>
</thead>
<tbody>
<tr>
<td>High-Level</td>
<td>Python</td>
<td>Low</td>
<td>Low</td>
</tr>
<tr>
<td>Low-Level</td>
<td>C</td>
<td>High</td>
<td>High</td>
</tr>
</tbody>
</table>



<p><img alt="For loop to sum rows" src="https://s3.amazonaws.com/dq-content/289/for_loop.svg"></p>

In [15]:
my_numbers = [[6,5], [1,3], [5,6]]

sums = []

for row in my_numbers:
    row_sum = row[0] + row[1]
    sums.append(row_sum)
    
print(sums)    

[11, 4, 11]


<p><img src="./images/numpy_for.gif"></p>

<p><img src="./images/numpy_vectorized.gif"></p>
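
You can check the two animations above with a rough timing experiment. This sketch doubles one million numbers twice: once with a Python list comprehension and once with a vectorized NumPy expression. Absolute times depend on the machine, but the vectorized version is usually much faster.

```python
import timeit
import numpy as np

values = list(range(1_000_000))
arr = np.array(values)

def loop_double():
    # Python-level loop: one interpreted operation per element
    return [v * 2 for v in values]

def vectorized_double():
    # One NumPy operation: the loop over elements runs in compiled code
    return arr * 2

loop_time = timeit.timeit(loop_double, number=5)
vec_time = timeit.timeit(vectorized_double, number=5)
print(f"loop: {loop_time:.3f} s, vectorized: {vec_time:.3f} s")
```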

## The NumPy Library

[Documentation](http://www.numpy.org/)

In [17]:
import numpy as np

In [18]:
np.__version__

'1.21.2'

## Introduction to Ndarrays

<img alt="Dimensional Arrays" src="./images/one_dim.svg">

In [19]:
data_ndarray = np.array([5, 10, 15, 20])

In [20]:
type(data_ndarray)

numpy.ndarray

In [23]:
data_ndarray

array([ 5, 10, 15, 20])

In [22]:
print(data_ndarray)

[ 5 10 15 20]


<img alt="Dimensional Arrays" src="./images/Two_Dim.svg">

In [25]:
arr2 = np.array([[1,2,3,4], [5,6,7,8]])

In [26]:
arr2

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [27]:
print(arr2)

[[1 2 3 4]
 [5 6 7 8]]
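
Unlike a Python list, an ndarray holds values of a single data type. When the input mixes types, NumPy upcasts everything to a common type. In this small example, the integers become floats:

```python
import numpy as np

mixed = np.array([1, 2.5, 3])
print(mixed)        # [1.  2.5 3. ]
print(mixed.dtype)  # float64
```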


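## Preparing the Data

<div>
<p>Below is information about selected columns from the NYC taxi trip data set (<code>data/nyc_taxis.csv</code>):</p>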

<ul>
<li><code>pickup_year</code>: The year of the trip.</li>
<li><code>pickup_month</code>: The month of the trip (January is <code>1</code>, December is <code>12</code>).</li>
<li><code>pickup_day</code>: The day of the month of the trip.</li>
<li><code>pickup_location_code</code>: The airport or <a target="_blank" href="https://en.wikipedia.org/wiki/Boroughs_of_New_York_City">borough</a> where the trip started.</li>
<li><code>dropoff_location_code</code>: The airport or borough where the trip finished.</li>
<li><code>trip_distance</code>: The distance of the trip in miles.</li>
<li><code>trip_length</code>: The length of the trip in seconds.</li>
<li><code>fare_amount</code>: The base fare of the trip, in dollars.</li>
<li><code>total_amount</code>: The total amount charged to the passenger, including all fees, tolls and tips.</li>
</ul>

</div>

    pickup_year,pickup_month,pickup_day,pickup_dayofweek
    2016,1,1,5
    2016,1,1,5
    2016,1,1,5
    2016,1,1,5

    import numpy as np
    data_list = [[2016, 1, 1, 5], [2016, 1, 1, 5]]  # a list of lists, e.g. the CSV rows above
    data_ndarray = np.array(data_list)

In [28]:
import csv
import numpy as np

with open("data/nyc_taxis.csv", "r") as f:
    taxi_list = list(csv.reader(f))

In [30]:
# remove the header row
taxi_list = taxi_list[1:]

In [34]:
# convert every value to float
converted_taxi_list = []
for row in taxi_list:
    converted_row = []
    for item in row:
        converted_row.append(float(item))
    converted_taxi_list.append(converted_row)
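
You can write the nested loop more compactly. The sketch below uses a small hypothetical sample shaped like `taxi_list` (rows of strings) and shows two equivalent options: a nested list comprehension, and letting NumPy cast the strings with `astype(float)`.

```python
import numpy as np

# Hypothetical sample with the same structure as taxi_list: every value is a string
rows = [["2016", "1", "1", "5"], ["2016", "1", "2", "6"]]

# Option 1: nested list comprehension, equivalent to the double for loop
converted = [[float(item) for item in row] for row in rows]

# Option 2: build a string array, then cast it to float64
converted_array = np.array(rows).astype(float)
print(converted_array)
```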

In [36]:
print(converted_taxi_list[:3])

[[2016.0, 1.0, 1.0, 5.0, 0.0, 2.0, 4.0, 21.0, 2037.0, 52.0, 0.8, 5.54, 11.65, 69.99, 1.0], [2016.0, 1.0, 1.0, 5.0, 0.0, 2.0, 1.0, 16.29, 1520.0, 45.0, 1.3, 0.0, 8.0, 54.3, 1.0], [2016.0, 1.0, 1.0, 5.0, 0.0, 2.0, 6.0, 12.7, 1462.0, 36.5, 1.3, 0.0, 0.0, 37.8, 2.0]]


In [40]:
taxi = np.array(converted_taxi_list)

In [41]:
taxi

array([[2.016e+03, 1.000e+00, 1.000e+00, ..., 1.165e+01, 6.999e+01,
        1.000e+00],
       [2.016e+03, 1.000e+00, 1.000e+00, ..., 8.000e+00, 5.430e+01,
        1.000e+00],
       [2.016e+03, 1.000e+00, 1.000e+00, ..., 0.000e+00, 3.780e+01,
        2.000e+00],
       ...,
       [2.016e+03, 6.000e+00, 3.000e+01, ..., 5.000e+00, 6.334e+01,
        1.000e+00],
       [2.016e+03, 6.000e+00, 3.000e+01, ..., 8.950e+00, 4.475e+01,
        1.000e+00],
       [2.016e+03, 6.000e+00, 3.000e+01, ..., 0.000e+00, 5.484e+01,
        2.000e+00]])

In [42]:
type(taxi)

numpy.ndarray

## Array Shapes

In [44]:
taxi.shape
# the first value is the number of rows
# the second value is the number of columns

(89560, 15)

In [45]:
# number of elements in the array
taxi.size

1343400

In [46]:
# number of dimensions (axes) of the array
taxi.ndim

2

In [47]:
# size in bytes of a single element (float64 uses 8 bytes)
taxi.itemsize

8

In [48]:
# total bytes = number of elements * bytes per element
taxi.size * taxi.itemsize

10747200

In [49]:
# total size of the ndarray in bytes
taxi.nbytes

10747200

In [50]:
# in MiB (1 MiB = 1024 * 1024 bytes)
taxi.nbytes / 1024 / 1024

10.24932861328125

In [52]:
# for comparison: converted_taxi_list holds the same data as a Python list of lists
print(type(converted_taxi_list))

<class 'list'>


In [53]:
import sys
# counts only the outer list (its pointers), not the row lists or floats it references
sys.getsizeof(converted_taxi_list)

732808

In [56]:
import sys
import gc

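def actualsize(input_obj):
    """Approximate memory used by input_obj plus every object it references."""
    memory_size = 0
    ids = set()  # ids of objects that have already been counted
    objects = [input_obj]
    while objects:
        new = []
        for obj in objects:
            if id(obj) not in ids:
                ids.add(id(obj))
                memory_size += sys.getsizeof(obj)
                new.append(obj)
        # go one level deeper: everything referenced by the newly counted objects
        objects = gc.get_referents(*new)
    return memory_size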

In [59]:
# the original list of lists of strings
actualsize(taxi_list)

58641031

In [60]:
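# the list of lists of floats in MiB (compare with taxi.nbytes, about 10.25 MiB)
actualsize(converted_taxi_list) / 1024 / 1024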

47.16248321533203

## Selecting and Slicing Rows and Items from ndarrays

<img alt="Dimensional Arrays" src="./images/selection_rows.svg">

    ndarray[row_index,column_index]

    # or if you want to select all
    # columns for a given set of rows
    ndarray[row_index]

<img alt="Dimensional Arrays" src="./images/selection_item.svg">

In [61]:
test = np.random.randint(0,10,(5,5))

In [62]:
test

array([[8, 4, 8, 0, 7],
       [5, 8, 0, 6, 1],
       [1, 4, 3, 0, 0],
       [2, 4, 2, 4, 2],
       [9, 0, 0, 5, 3]])

In [64]:
first_row = test[0] # first row
first_row

array([8, 4, 8, 0, 7])

In [65]:
# last row
test[-1]

array([9, 0, 0, 5, 3])

In [67]:
# second and third rows, selected with a list of row indices
test[[1,2]]

array([[5, 8, 0, 6, 1],
       [1, 4, 3, 0, 0]])

In [71]:
# second and third rows, selected with a slice (the stop index 3 is excluded)
test[1:3]

array([[5, 8, 0, 6, 1],
       [1, 4, 3, 0, 0]])

In [75]:
# every row from the third one onward
test[2:]

array([[1, 4, 3, 0, 0],
       [2, 4, 2, 4, 2],
       [9, 0, 0, 5, 3]])

In [76]:
# the item at row index 2, column index 1
test[2, 1]

4

<div class="alert alert-block alert-info">
<b>Exercise:</b> From the taxi ndarray:
- Select the row at index 0. Assign it to row_0.
- Select every column for the rows at indexes 391 to 500 inclusive. Assign them to rows_391_to_500.
- Select the item at row index 21 and column index 5. Assign it to row_21_column_5.</div>

In [77]:
row_0 = taxi[0]

In [79]:
rows_391_to_500 = taxi[391:501]

In [78]:
row_21_column_5 = taxi[21,5]

## Selecting Columns and Custom Slicing ndarrays

<img alt="Dimensional Arrays" src="./images/selection_columns_updated.svg">

<img alt="Dimensional Arrays" src="./images/selection_1darray_updated.svg">

<img alt="Dimensional Arrays" src="./images/selection_2darray_updated.svg">

In [81]:
columns_test = np.random.random((5,5))
columns_test

array([[0.22552788, 0.40459469, 0.73101106, 0.60064906, 0.67276544],
       [0.17015996, 0.75034464, 0.36457446, 0.36417411, 0.2251699 ],
       [0.52924206, 0.41571065, 0.12254248, 0.92304617, 0.03355196],
       [0.51817065, 0.95136416, 0.21932917, 0.85732475, 0.40774051],
       [0.32779333, 0.13522074, 0.26968764, 0.17248259, 0.74067299]])

In [88]:
columns_test[2:, 2:]

array([[0.12254248, 0.92304617, 0.03355196],
       [0.21932917, 0.85732475, 0.40774051],
       [0.26968764, 0.17248259, 0.74067299]])

In [87]:
columns_test[2, 1:4]

array([0.41571065, 0.12254248, 0.92304617])

In [82]:
# select the fourth column (index 3)
columns_test[:, 3]

array([0.60064906, 0.36417411, 0.92304617, 0.85732475, 0.17248259])

In [84]:
# third and fourth columns, selected with a slice
columns_test[:, 2:4]

array([[0.73101106, 0.60064906],
       [0.36457446, 0.36417411],
       [0.12254248, 0.92304617],
       [0.21932917, 0.85732475],
       [0.26968764, 0.17248259]])

In [85]:
# third and fourth columns, selected with a list of column indices
columns_test[:, [2,3]]

array([[0.73101106, 0.60064906],
       [0.36457446, 0.36417411],
       [0.12254248, 0.92304617],
       [0.21932917, 0.85732475],
       [0.26968764, 0.17248259]])

<div class="alert alert-block alert-info">
<b>Exercise:</b> From the taxi ndarray:
- Select every row for the columns at indexes 1, 4, and 7. Assign them to columns_1_4_7.
- Select the columns at indexes 5 to 8 inclusive for the row at index 99. Assign them to row_99_columns_5_to_8.
- Select the rows at indexes 100 to 200 inclusive for the column at index 14. Assign them to rows_100_to_200_column_14.</div>
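
Here is one possible solution. It runs against a hypothetical random stand-in, `taxi_demo` (300 rows, 15 columns), so it works without the CSV file and does not overwrite `taxi`. Slice ends are exclusive, so an "inclusive" range needs a stop index one higher.

```python
import numpy as np

# Hypothetical stand-in with the same number of columns as taxi
taxi_demo = np.random.random((300, 15))

# Every row, columns 1, 4 and 7 (a list of column indices)
columns_1_4_7 = taxi_demo[:, [1, 4, 7]]

# Row 99, columns 5 to 8 inclusive (stop index 9 is excluded)
row_99_columns_5_to_8 = taxi_demo[99, 5:9]

# Rows 100 to 200 inclusive, column 14
rows_100_to_200_column_14 = taxi_demo[100:201, 14]
```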

## Vector Math

In [89]:
my_numbers = [[6,5], [9,1], [2,4], [7, 14], [8,6]]

In [90]:
sums = []
for row in my_numbers:
    row_sums = row[0] + row[1]
    sums.append(row_sums)
    
print(sums)

[11, 10, 6, 21, 14]


In [91]:
my_numbers = np.array(my_numbers)

In [92]:
my_numbers

array([[ 6,  5],
       [ 9,  1],
       [ 2,  4],
       [ 7, 14],
       [ 8,  6]])

In [93]:
col1 = my_numbers[:, 0]
col2 = my_numbers[:, 1]

In [94]:
col1 + col2

array([11, 10,  6, 21, 14])

In [95]:
my_numbers[:, 0] + my_numbers[:, 1]

array([11, 10,  6, 21, 14])

<div class="alert alert-block alert-info">
<b>Exercise:</b>
Use vector addition to add fare_amount (column index 9) and fees_amount (column index 10). Assign the result to fare_and_fees.</div>

In [96]:
# add fare_amount (column 9) and fees_amount (column 10)
fare_and_fees = taxi[:, 9] + taxi[:, 10]

In [97]:
trip_distance = taxi[:, 7]
trip_length_seconds = taxi[:, 8]
trip_length_hours = trip_length_seconds / 3600
trip_mph = trip_distance / trip_length_hours

In [98]:
trip_mph[:10]

array([37.11340206, 38.58157895, 31.27222982, 25.88429752, 26.3715415 ,
       38.53293413, 32.81553398, 35.95075239, 51.00702576, 33.20207254])



<div class="inner_cell">
<div class="text_cell_render border-box-sizing rendered_html">
<p>The following table lists the arithmetic operators implemented in NumPy:</p>
<table>
<thead><tr>
<th>Operator</th>
<th>Equivalent ufunc</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>+</code></td>
<td><code>np.add</code></td>
<td>Addition (e.g., <code>1 + 1 = 2</code>)</td>
</tr>
<tr>
<td><code>-</code></td>
<td><code>np.subtract</code></td>
<td>Subtraction (e.g., <code>3 - 2 = 1</code>)</td>
</tr>
<tr>
<td><code>-</code></td>
<td><code>np.negative</code></td>
<td>Unary negation (e.g., <code>-2</code>)</td>
</tr>
<tr>
<td><code>*</code></td>
<td><code>np.multiply</code></td>
<td>Multiplication (e.g., <code>2 * 3 = 6</code>)</td>
</tr>
<tr>
<td><code>/</code></td>
<td><code>np.divide</code></td>
<td>Division (e.g., <code>3 / 2 = 1.5</code>)</td>
</tr>
<tr>
<td><code>//</code></td>
<td><code>np.floor_divide</code></td>
<td>Floor division (e.g., <code>3 // 2 = 1</code>)</td>
</tr>
<tr>
<td><code>**</code></td>
<td><code>np.power</code></td>
<td>Exponentiation (e.g., <code>2 ** 3 = 8</code>)</td>
</tr>
<tr>
<td><code>%</code></td>
<td><code>np.mod</code></td>
<td>Modulus/remainder (e.g., <code>9 % 4 = 1</code>)</td>
</tr>
</tbody>
</table>

</div>
</div>


In [99]:
np.divide(trip_distance, trip_length_hours)

array([37.11340206, 38.58157895, 31.27222982, ..., 22.29907867,
       42.41551247, 36.90473407])
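
Each operator in the table is shorthand for the listed ufunc, so the two spellings give identical results. Here is a quick check on two small arrays:

```python
import numpy as np

a = np.array([7, 8, 9])
b = np.array([2, 3, 4])

# Each pair computes the same element-wise result with an operator and a ufunc
pairs = {
    "+": (a + b, np.add(a, b)),
    "-": (a - b, np.subtract(a, b)),
    "unary -": (-a, np.negative(a)),
    "*": (a * b, np.multiply(a, b)),
    "/": (a / b, np.divide(a, b)),
    "//": (a // b, np.floor_divide(a, b)),
    "**": (a ** b, np.power(a, b)),
    "%": (a % b, np.mod(a, b)),
}
for name, (with_operator, with_ufunc) in pairs.items():
    print(name, with_operator, np.array_equal(with_operator, with_ufunc))
```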

## Calculating Statistics For 1D ndarrays

In [100]:
# as a method
trip_mph.min()

# as a function; only the value of the last expression is displayed below
np.min(trip_mph)

0.0

In [101]:
trip_mph.max()

82800.0

In [102]:
trip_mph.mean()

32.24258580925573


<p></p><center><img alt="Method syntax" src="https://s3.amazonaws.com/dq-content/289/Method_syntax.svg"></center><p></p>


<div class="alert alert-block alert-info">
<b>Exercise:</b> Use the ndarray.max() method to calculate the maximum value of trip_mph. Assign the result to mph_max.
Use the ndarray.mean() method to calculate the average value of trip_mph. Assign the result to mph_mean.</div>

<div>

<table>
<thead>
<tr>
<th>Calculation</th>
<th>Function Representation</th>
<th>Method Representation</th>
</tr>
</thead>
<tbody>
<tr>
<td>Calculate the minimum value of <code>trip_mph</code></td>
<td><code>np.min(trip_mph)</code></td>
<td><code>trip_mph.min()</code></td>
</tr>
<tr>
<td>Calculate the maximum value of <code>trip_mph</code></td>
<td><code>np.max(trip_mph)</code></td>
<td><code>trip_mph.max()</code></td>
</tr>
<tr>
<td>Calculate the <a target="_blank" href="https://en.wikipedia.org/wiki/Mean">mean average</a> value of <code>trip_mph</code></td>
<td><code>np.mean(trip_mph)</code></td>
<td><code>trip_mph.mean()</code></td>
</tr>
<tr>
<td>Calculate the <a target="_blank" href="https://en.wikipedia.org/wiki/Median">median average</a> value of <code>trip_mph</code></td>
<td><code>np.median(trip_mph)</code></td>
<td>There is no ndarray median method</td>
</tr>
</tbody>
</table>
</div>
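
The median is useful for data like `trip_mph`, whose maximum of 82800 mph is clearly a bad record. A single extreme value pulls the mean a long way but barely moves the median. This sketch uses hypothetical speeds:

```python
import numpy as np

# Hypothetical speeds in mph; the last value is an obvious data error
speeds = np.array([30.0, 32.0, 35.0, 28.0, 82800.0])

print(speeds.mean())      # 16585.0, dominated by the outlier
print(np.median(speeds))  # 32.0, the middle value after sorting
```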

## Calculating Statistics For 2D ndarrays

<img alt="Dimensional Arrays" src="./images/array_method_axis_none.svg">

<img alt="Dimensional Arrays" src="./images/array_method_axis_1.svg">

<img alt="Dimensional Arrays" src="./images/array_method_axis_0.svg">



<p><img alt="The axis parameter" src="https://s3.amazonaws.com/dq-content/289/axis_param.svg"></p>


In [104]:
taxi_5 = taxi[:5]

In [106]:
# columns 9 to 12 of the first 5 rows (fare, fees, tolls and tip amounts)
taxi_izbira = taxi_5[:, 9:13]

In [107]:
taxi_izbira

array([[52.  ,  0.8 ,  5.54, 11.65],
       [45.  ,  1.3 ,  0.  ,  8.  ],
       [36.5 ,  1.3 ,  0.  ,  0.  ],
       [26.  ,  1.3 ,  0.  ,  5.46],
       [17.5 ,  1.3 ,  0.  ,  0.  ]])

In [108]:
# sum of all elements
taxi_izbira.sum()

213.65

In [109]:
# axis=0: sum down each column, one value per column
taxi_izbira.sum(axis=0)

array([177.  ,   6.  ,   5.54,  25.11])

In [110]:
# axis=1: sum across each row, one value per row
taxi_izbira.sum(axis=1)

array([69.99, 54.3 , 37.8 , 32.76, 18.8 ])

## Reading CSV files with NumPy

<p>Below is information about selected columns from the data set:</p>
<ul>
<li><code>pickup_year</code>: The year of the trip.</li>
<li><code>pickup_month</code>: The month of the trip (January is <code>1</code>, December is <code>12</code>).</li>
<li><code>pickup_day</code>: The day of the month of the trip.</li>
<li><code>pickup_location_code</code>: The airport or <a target="_blank" href="https://en.wikipedia.org/wiki/Boroughs_of_New_York_City">borough</a> where the trip started.</li>
<li><code>dropoff_location_code</code>: The airport or borough where the trip finished.</li>
<li><code>trip_distance</code>: The distance of the trip in miles.</li>
<li><code>trip_length</code>: The length of the trip in seconds.</li>
<li><code>fare_amount</code>: The base fare of the trip, in dollars.</li>
<li><code>total_amount</code>: The total amount charged to the passenger, including all fees, tolls and tips.</li>
</ul>


In [113]:
taxi = np.genfromtxt("data/nyc_taxis.csv", delimiter=",", skip_header=1)

In [114]:
taxi

array([[2.016e+03, 1.000e+00, 1.000e+00, ..., 1.165e+01, 6.999e+01,
        1.000e+00],
       [2.016e+03, 1.000e+00, 1.000e+00, ..., 8.000e+00, 5.430e+01,
        1.000e+00],
       [2.016e+03, 1.000e+00, 1.000e+00, ..., 0.000e+00, 3.780e+01,
        2.000e+00],
       ...,
       [2.016e+03, 6.000e+00, 3.000e+01, ..., 5.000e+00, 6.334e+01,
        1.000e+00],
       [2.016e+03, 6.000e+00, 3.000e+01, ..., 8.950e+00, 4.475e+01,
        1.000e+00],
       [2.016e+03, 6.000e+00, 3.000e+01, ..., 0.000e+00, 5.484e+01,
        2.000e+00]])

In [115]:
taxi.shape

(89560, 15)
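
`np.genfromtxt` also accepts file-like objects, which makes it easy to see what `delimiter` and `skip_header` do on a tiny input. The CSV text below is the header and the first rows shown earlier in this notebook:

```python
import io
import numpy as np

csv_text = "\n".join([
    "pickup_year,pickup_month,pickup_day,pickup_dayofweek",
    "2016,1,1,5",
    "2016,1,1,5",
])

# delimiter="," splits each line on commas; skip_header=1 drops the column names
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", skip_header=1)
print(data.shape)  # (2, 4)
print(data.dtype)  # float64, the default dtype
```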

## Data Types

In [118]:
x = np.array([1,2])
print(x.dtype)
print(x.nbytes)

int64
16


In [119]:
x = np.array([1.0,2.0])
print(x.dtype)
print(x.nbytes)

float64
16


In [120]:
x = np.array([1,2], dtype=np.int32)
print(x.dtype)
print(x.nbytes)

int32
8


In [121]:
x = np.array([1,2], dtype=np.int8)
print(x.dtype)
print(x.nbytes)

int8
2


In [122]:
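# out-of-range values (189, -129) wrap around silently in NumPy 1.x, as in the output below;
# NumPy 2.x raises OverflowError for this line instead
x = np.array([189, 22, -129], dtype=np.int8)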
print(x)
print(x.dtype)
print(x.nbytes)

[-67  22 127]
int8
3
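
A version-independent way to see the same wraparound is to create the values with a wider integer type first and then cast them with `astype`, which performs an unchecked cast. `np.iinfo` reports the valid range of an integer type:

```python
import numpy as np

info = np.iinfo(np.int8)
print(info.min, info.max)  # -128 127

# astype does not check the range, so values wrap modulo 256
wrapped = np.array([189, 22, -129]).astype(np.int8)
print(wrapped)  # [-67  22 127]
```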


<div class="text_cell_render border-box-sizing rendered_html">
<table>
<thead><tr>
<th>Data type</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td><code>bool_</code></td>
<td>Boolean (True or False) stored as a byte</td>
</tr>
<tr>
<td><code>int_</code></td>
<td>Default integer type (same as C <code>long</code>; normally either <code>int64</code> or <code>int32</code>)</td>
</tr>
<tr>
<td><code>intc</code></td>
<td>Identical to C <code>int</code> (normally <code>int32</code> or <code>int64</code>)</td>
</tr>
<tr>
<td><code>intp</code></td>
<td>Integer used for indexing (same as C <code>ssize_t</code>; normally either <code>int32</code> or <code>int64</code>)</td>
</tr>
<tr>
<td><code>int8</code></td>
<td>Byte (-128 to 127)</td>
</tr>
<tr>
<td><code>int16</code></td>
<td>Integer (-32768 to 32767)</td>
</tr>
<tr>
<td><code>int32</code></td>
<td>Integer (-2147483648 to 2147483647)</td>
</tr>
<tr>
<td><code>int64</code></td>
<td>Integer (-9223372036854775808 to 9223372036854775807)</td>
</tr>
<tr>
<td><code>uint8</code></td>
<td>Unsigned integer (0 to 255)</td>
</tr>
<tr>
<td><code>uint16</code></td>
<td>Unsigned integer (0 to 65535)</td>
</tr>
<tr>
<td><code>uint32</code></td>
<td>Unsigned integer (0 to 4294967295)</td>
</tr>
<tr>
<td><code>uint64</code></td>
<td>Unsigned integer (0 to 18446744073709551615)</td>
</tr>
<tr>
<td><code>float_</code></td>
<td>Shorthand for <code>float64</code> (removed in NumPy 2.0; use <code>float64</code>)</td>
</tr>
<tr>
<td><code>float16</code></td>
<td>Half precision float: sign bit, 5 bits exponent, 10 bits mantissa</td>
</tr>
<tr>
<td><code>float32</code></td>
<td>Single precision float: sign bit, 8 bits exponent, 23 bits mantissa</td>
</tr>
<tr>
<td><code>float64</code></td>
<td>Double precision float: sign bit, 11 bits exponent, 52 bits mantissa</td>
</tr>
<tr>
<td><code>complex_</code></td>
<td>Shorthand for <code>complex128</code> (removed in NumPy 2.0; use <code>complex128</code>)</td>
</tr>
<tr>
<td><code>complex64</code></td>
<td>Complex number, represented by two 32-bit floats</td>
</tr>
<tr>
<td><code>complex128</code></td>
<td>Complex number, represented by two 64-bit floats</td>
</tr>
</tbody>
</table>

</div>

## Boolean Indexing

### Boolean Arrays

<div class="alert alert-block alert-info">
<b>Exercise:</b> Use vectorized boolean operations to:
<ul>
<li>Evaluate whether the elements in array a are less than 3. Assign the result to a_bool.</li>
<li>Evaluate whether the elements in array b are equal to "blue". Assign the result to b_bool.</li>
<li>Evaluate whether the elements in array c are greater than 100. Assign the result to c_bool.</li>
</ul></div>
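
The arrays `a`, `b` and `c` are not defined in this section, so the sketch below gives them hypothetical values. Each comparison is applied element by element and returns a boolean array of the same shape:

```python
import numpy as np

# Hypothetical input arrays
a = np.array([1, 2, 3, 4, 5])
b = np.array(["blue", "blue", "red", "blue"])
c = np.array([80.0, 103.4, 96.9, 200.3])

a_bool = a < 3
b_bool = b == "blue"
c_bool = c > 100
print(a_bool)  # [ True  True False False False]
```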

### Boolean Indexing with 1D ndarrays

<div class="alert alert-block alert-info">
<b>Exercise:</b> Calculate the number of rides in the taxi ndarray that are from February (pickup_month is column index 1):
<ul>
<li>Create a boolean array, february_bool, that evaluates whether the items in pickup_month are equal to 2.</li>
<li>Use the february_bool boolean array to index pickup_month. Assign the result to february.</li>
<li>Use the ndarray.shape attribute to find the number of items in february. Assign the result to february_rides.</li>
</ul></div>
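
The three steps can be sketched on a short hypothetical stand-in for the `pickup_month` column (`taxi[:, 1]` in the real data):

```python
import numpy as np

# Hypothetical stand-in for pickup_month = taxi[:, 1]
pickup_month = np.array([1, 2, 2, 3, 2, 6])

february_bool = pickup_month == 2       # True where the month is February
february = pickup_month[february_bool]  # keep only the February values
february_rides = february.shape[0]      # number of February rides
print(february_rides)  # 3
```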

### Boolean Indexing with 2D ndarrays

<img alt="Dimensional Arrays" src="./images/bool_dims_updated.svg">

<div class="alert alert-block alert-info">
<b>Exercise:</b> Create a boolean array, tip_bool, that determines which rows have a tip_amount value greater than 50. Use tip_bool to select all rows of taxi with tip amounts greater than 50, together with the columns at indexes 5 to 13 inclusive. Assign the resulting array to top_tips. </div>
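
Here is one way to do this, shown on a small hypothetical stand-in with 15 columns. It assumes that `tip_amount` is column index 12. The boolean array selects rows, and the slice `5:14` selects columns 5 to 13:

```python
import numpy as np

# Hypothetical stand-in for taxi: 4 rows, 15 columns of zeros
taxi_demo = np.zeros((4, 15))
taxi_demo[:, 12] = [10.0, 60.0, 0.0, 75.0]  # assumed tip_amount column

tip_bool = taxi_demo[:, 12] > 50      # one True/False value per row
top_tips = taxi_demo[tip_bool, 5:14]  # matching rows, columns 5 to 13 inclusive
print(top_tips.shape)  # (2, 9)
```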

## Assigning Values

### Assigning Values in ndarrays

In [None]:
a = np.array(['red','blue','black','blue','purple'])
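
Assignment uses the same indexing syntax as selection. Here is a short sketch on the array above. A string array has a fixed maximum length (here `<U6`, set by the longest input `'purple'`), so longer strings are silently truncated:

```python
import numpy as np

a = np.array(['red', 'blue', 'black', 'blue', 'purple'])

a[0] = 'blue'            # assign a single element
a[a == 'blue'] = 'pink'  # assign to every element where the mask is True
print(a)  # ['pink' 'pink' 'black' 'pink' 'purple']

a[4] = 'turquoise'       # longer than 6 characters, so it is cut to 'turquo'
print(a.dtype, a[4])     # <U6 turquo
```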


<div class="alert alert-block alert-info">
<b>Exercise:</b> So that you can practice without changing the original array, we used the ndarray.copy() method to make taxi_modified, a copy of the original, for these exercises.
<ul>
<li>The value at column index 5 (pickup_location_code) of row index 28214 is incorrect. Use assignment to change this value to 1 in the taxi_modified ndarray.</li>
<li>The first column (index 0) contains years in the four-digit format YYYY (2016, since all trips in our data set are from 2016). Use assignment to change these values to the two-digit format YY (16) in the taxi_modified ndarray.</li>
<li>The values at column index 7 (trip_distance) of row indexes 1800 and 1801 are incorrect. Use assignment to change these values in the taxi_modified ndarray to the mean value of that column.</li>
</ul></div>
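
Here is a possible solution, sketched on a hypothetical stand-in named `taxi_modified_demo`. It has 30000 rows so that index 28214 exists. The year column is set to 2016, and the trip distances are arbitrary.

```python
import numpy as np

# Hypothetical stand-in for taxi_modified
taxi_modified_demo = np.zeros((30000, 15))
taxi_modified_demo[:, 0] = 2016
taxi_modified_demo[:, 7] = np.arange(30000) % 10  # arbitrary trip distances

# 1. Fix a single value: row 28214, column 5
taxi_modified_demo[28214, 5] = 1

# 2. Assign one scalar to a whole column: 2016 -> 16
taxi_modified_demo[:, 0] = 16

# 3. Replace rows 1800 and 1801 of column 7 with the column mean
taxi_modified_demo[1800:1802, 7] = taxi_modified_demo[:, 7].mean()
```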

### Assignment Using Boolean Arrays

<div class="alert alert-block alert-info">
<b>Exercise:</b> We again used the ndarray.copy() method to make taxi_copy, a copy of the original, for this exercise.
<ul>
<li>Select the fourteenth column (index 13) in taxi_copy. Assign it to a variable named total_amount.</li>
<li>For rows where the value of total_amount is less than 0, use assignment to change the value to 0.</li>
</ul></div>
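
Here is a sketch with a hypothetical stand-in for `taxi_copy`. Selecting a column with basic slicing (`[:, 13]`) returns a view, not a copy. Assigning through `total_amount` therefore also changes the underlying array:

```python
import numpy as np

# Hypothetical stand-in for taxi_copy with some negative totals in column 13
taxi_copy_demo = np.zeros((4, 15))
taxi_copy_demo[:, 13] = [69.99, -5.0, 37.8, -0.5]

total_amount = taxi_copy_demo[:, 13]  # a view into taxi_copy_demo
total_amount[total_amount < 0] = 0    # set negative totals to 0
print(taxi_copy_demo[:, 13])          # the negative totals are now 0 here too
```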

<div class="alert alert-block alert-info">
<b>Exercise:</b> We created a new copy of the taxi data set, taxi_modified, with an additional column that contains the value 0 for every row. In this new column at index 15, assign the value 1 if the pickup_location_code (column index 5) corresponds to an airport, leaving the value as 0 otherwise, by performing these three operations:
<ul>
<li>For rows where the value in column index 5 is equal to 2 (JFK Airport), assign the value 1 to column index 15.</li>
<li>For rows where the value in column index 5 is equal to 3 (LaGuardia Airport), assign the value 1 to column index 15.</li>
<li>For rows where the value in column index 5 is equal to 5 (Newark Airport), assign the value 1 to column index 15.</li>
</ul></div>
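
Here is a sketch of the three assignments on a hypothetical 16-column stand-in, where the extra column 15 starts at 0. Each step pairs a boolean mask for the rows with a column index for the target. The `np.isin` line in the comment is an equivalent one-step alternative:

```python
import numpy as np

# Hypothetical stand-in for taxi_modified: 6 rows, 16 columns of zeros
taxi_modified_demo = np.zeros((6, 16))
taxi_modified_demo[:, 5] = [1, 2, 3, 4, 5, 2]  # pickup_location_code

taxi_modified_demo[taxi_modified_demo[:, 5] == 2, 15] = 1  # JFK Airport
taxi_modified_demo[taxi_modified_demo[:, 5] == 3, 15] = 1  # LaGuardia Airport
taxi_modified_demo[taxi_modified_demo[:, 5] == 5, 15] = 1  # Newark Airport

# Equivalent single step:
# taxi_modified_demo[np.isin(taxi_modified_demo[:, 5], [2, 3, 5]), 15] = 1
print(taxi_modified_demo[:, 15])
```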

## Adding Rows and Columns to ndarrays
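
An ndarray has a fixed size, so adding a row or column means building a new array. Here is a minimal sketch with `np.concatenate`, which joins arrays along an existing axis; all other axes must match in size. The arrays are hypothetical:

```python
import numpy as np

arr = np.array([[1, 2],
                [3, 4]])

# Add a row: the new row must be 2D with the same number of columns
new_row = np.array([[5, 6]])
with_row = np.concatenate([arr, new_row], axis=0)

# Add a column: shape (2, 1) so that it lines up with the 2 existing rows
new_col = np.zeros((2, 1), dtype=arr.dtype)
with_col = np.concatenate([arr, new_col], axis=1)

print(with_row)
print(with_col)
```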

## Computation on NumPy Arrays: Universal Functions


### The Slowness of Loops



In [None]:
np.random.seed(0)

def compute_reciprocals(values):
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output
        
values = np.random.randint(1, 10, size=5)
compute_reciprocals(values)

### Introducing UFuncs (Universal functions)
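
A ufunc applies an operation to every element in compiled code. A single division can replace the loop in `compute_reciprocals` above; NumPy dispatches that division to its divide ufunc. The sketch repeats the loop version so that the two results can be compared:

```python
import numpy as np

def compute_reciprocals(values):
    # Element-by-element Python loop, as in the cell above
    output = np.empty(len(values))
    for i in range(len(values)):
        output[i] = 1.0 / values[i]
    return output

np.random.seed(0)
values = np.random.randint(1, 10, size=5)

loop_result = compute_reciprocals(values)
ufunc_result = 1.0 / values  # vectorized: one ufunc call over the whole array

print(np.allclose(loop_result, ufunc_result))  # True
```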


## Subarrays as no-copy views
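
Basic slicing returns a view that shares memory with the original array, so changing the slice also changes the original. Here is a minimal illustration:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

sub = x[:2, :2]  # a view of the top-left 2x2 block; no data is copied
sub[0, 0] = 99   # writing through the view...
print(x[0, 0])   # ...changes the original array: 99

print(np.shares_memory(x, sub))  # True
```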



## Copying Data
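
When you need an independent subarray, `copy()` allocates new memory, so changes no longer propagate back to the original. This is why the exercises above work on a `copy()` of `taxi`. Fancy indexing with a list of indices also returns a copy:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)

sub_copy = x[:2, :2].copy()  # independent copy of the top-left 2x2 block
sub_copy[0, 0] = 99          # changes only the copy
print(x[0, 0])               # 0: the original is unchanged

fancy = x[[0, 1]]            # fancy indexing (a list of row indices) also copies
fancy[0, 0] = -1
print(x[0, 0])               # still 0

print(np.shares_memory(x, sub_copy))  # False
```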



## Example: Which Is the Most Popular Airport?
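
Here is a minimal sketch of one approach: count the trips whose location code matches each airport code, then take the largest count. It uses the airport codes from the exercise above (2 = JFK, 3 = LaGuardia, 5 = Newark). It counts drop-offs in a short hypothetical stand-in for `taxi[:, 6]` (`dropoff_location_code`); counting pick-ups would use column 5 instead.

```python
import numpy as np

# Hypothetical stand-in for the dropoff_location_code column (taxi[:, 6])
dropoff = np.array([2, 3, 2, 5, 1, 2, 3])

airports = {"JFK": 2, "LaGuardia": 3, "Newark": 5}

# A boolean comparison followed by sum() counts the True values per airport
counts = {name: int((dropoff == code).sum()) for name, code in airports.items()}
most_popular = max(counts, key=counts.get)

print(counts)        # {'JFK': 3, 'LaGuardia': 2, 'Newark': 1}
print(most_popular)  # JFK
```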

## Example: Calculating Statistics for Trips on Clean Data
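
Here is a sketch of the idea: compute the speed, keep only plausible trips, and then compute statistics on the kept values. The 100 mph cutoff is an assumption chosen for illustration. The first three distances and durations come from the first three rows printed earlier. The last two are hypothetical, and the last one is an obviously bad record. The names use a `sample_` prefix so that they do not overwrite `trip_mph`.

```python
import numpy as np

sample_distance = np.array([21.0, 16.29, 12.7, 5.0, 0.5])         # miles
sample_seconds = np.array([2037.0, 1520.0, 1462.0, 600.0, 1.0])   # trip length

sample_mph = sample_distance / (sample_seconds / 3600)

# Boolean mask of plausible trips (assumed cutoff: under 100 mph)
cleaned_bool = sample_mph < 100
cleaned_mph = sample_mph[cleaned_bool]

print(cleaned_bool)        # [ True  True  True  True False]
print(cleaned_mph.mean())  # mean speed of the plausible trips only
```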