## Introduction to Ndarrays

In [1]:
import numpy as np

In [2]:
data_ndarray = np.array([10, 20, 30])

## Understanding Vectorization

Replacing explicit `for` loops with operations that are applied to many data points at once is called `vectorization`, and ndarrays are what make `vectorization` possible.

![jupyter](./unvectorized.gif)

![Jupyter](./vectorized.gif)
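As a minimal sketch of the difference, the example below adds two columns of a small array, first with a `for` loop and then with a single vectorized expression. The array `sample` and its values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical data: each row holds two numbers we want to add together
sample = np.array([[6, 5], [1, 3], [5, 6], [1, 4]])

# Unvectorized: visit the rows one at a time
loop_sums = []
for row in sample:
    loop_sums.append(int(row[0] + row[1]))

# Vectorized: add the two columns in one operation
vector_sums = sample[:, 0] + sample[:, 1]

print(loop_sums)
print(vector_sums)
```

Both approaches give the same values; the vectorized form hands the whole column to NumPy instead of looping in Python.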

## NYC Taxi-Airport Data

In [4]:
import csv
import numpy as np

# import nyc_taxis.csv as a list of lists
with open("nyc_taxis.csv", "r") as f:
    taxi_list = list(csv.reader(f))

# remove the header row
taxi_list = taxi_list[1:]

# convert all values to floats
converted_taxi_list = []
for row in taxi_list:
    converted_row = []
    for item in row:
        converted_row.append(float(item))
    converted_taxi_list.append(converted_row)

# start writing your code below this comment
taxi = np.array(converted_taxi_list) 

## Array Shapes

In [5]:
taxi_shape = taxi.shape
print(taxi_shape)

(89560, 15)
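The tuple returned by `ndarray.shape` gives the length of each dimension, so for a 2D ndarray it reads as (rows, columns). A small sketch with hypothetical arrays `small_2d` and `small_1d`:

```python
import numpy as np

# Hypothetical arrays used only to illustrate .shape
small_2d = np.array([[1, 2, 3], [4, 5, 6]])  # 2 rows, 3 columns
small_1d = np.array([10, 20, 30])            # a single dimension of length 3

print(small_2d.shape)
print(small_1d.shape)
```

Note that a 1D ndarray has a one-element shape tuple, not a (1, n) or (n, 1) pair.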

## Selecting and Slicing Rows and Items from ndarrays

![Jupyter](./selection_rows.svg)

![Jupyter](./selection_item.svg)

In [9]:
row_0 = taxi[0]
rows_391_to_500 = taxi[391:501, :]
row_21_column_5 = taxi[21,5]
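The same selection patterns can be tried on a tiny array where the results are easy to check by eye. This is a sketch only; `grid` is a hypothetical 4x3 array, not part of the taxi data.

```python
import numpy as np

# Hypothetical 4x3 array holding the values 0 through 11
grid = np.arange(12).reshape(4, 3)

first_row = grid[0]         # a single row, returned as a 1D ndarray
rows_1_to_2 = grid[1:3, :]  # the slice end is exclusive, so this is rows 1 and 2
item = grid[2, 1]           # the single value at row 2, column 1

print(first_row)
print(rows_1_to_2)
print(item)
```

Because the end of a slice is excluded, selecting rows 391 through 500 of `taxi` requires `391:501`.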

## Selecting Columns and Custom Slicing ndarrays

![Jupyter](./selection_columns_updated.svg)

![Jupyter](./selection_1darray_updated.svg)

![Jupyter](./selection_2darray_updated.svg)

In [17]:
columns_1_4_7 = taxi[:,[1,4,7]]
row_99_columns_5_to_8 = taxi[99, 5:9]
rows_100_to_200_column_14 = taxi[100:201, 14]
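A short sketch of the same column and mixed selections on a small array, assuming a hypothetical 4x3 array named `grid`:

```python
import numpy as np

# Hypothetical 4x3 array holding the values 0 through 11
grid = np.arange(12).reshape(4, 3)

cols_0_and_2 = grid[:, [0, 2]]     # all rows, columns 0 and 2 (a 2D result)
row_1_cols_1_to_2 = grid[1, 1:3]   # one row, a slice of columns (a 1D result)
rows_0_to_2_col_2 = grid[0:3, 2]   # a slice of rows, one column (a 1D result)

print(cols_0_and_2.shape)
print(row_1_cols_1_to_2)
print(rows_0_to_2_col_2)
```

Selecting a single row or a single column with an integer index drops that dimension, which is why the last two results are 1D.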

## Vector Math

![Jupyter](./vectorized_addition.svg)

The result of adding two 1D ndarrays is a 1D ndarray with the same shape (or dimensions) as the originals. In this context, ndarrays can also be called `vectors`, a term taken from a branch of mathematics called linear algebra. Adding two vectors together, as shown above, is called `vector addition`.

In [21]:
fare_amount = taxi[:,9]
fees_amount = taxi[:,10]

fare_and_fees = fare_amount + fees_amount
fare_and_fees

array([52.8, 46.3, 37.8, ..., 52.8, 35.8, 49.3])

## Vector Math Continued

We can actually use any of the standard Python numeric operators with vectors, including:

* `vector_a + vector_b` - Addition
* `vector_a - vector_b` - Subtraction
* `vector_a * vector_b` - Multiplication (this is element-wise and is **unrelated to the vector multiplication used in linear algebra**).
* `vector_a / vector_b` - Division
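The four operators can be sketched on two short vectors. The names `vector_a` and `vector_b` follow the list above, and the values are hypothetical. For contrast, the last line uses `np.dot`, which computes the dot product from linear algebra and returns a single number instead of a vector.

```python
import numpy as np

# Hypothetical vectors of equal length
vector_a = np.array([10.0, 20.0, 30.0])
vector_b = np.array([2.0, 4.0, 5.0])

added = vector_a + vector_b       # element-wise addition
subtracted = vector_a - vector_b  # element-wise subtraction
multiplied = vector_a * vector_b  # element-wise multiplication
divided = vector_a / vector_b     # element-wise division

# Not element-wise: the dot product sums the pairwise products
dot_product = np.dot(vector_a, vector_b)

print(added, subtracted, multiplied, divided, dot_product)
```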

In [22]:
trip_distance_miles = taxi[:,7]
trip_length_seconds = taxi[:,8]

trip_length_hours = trip_length_seconds / 3600 # 3600 seconds is one hour

In [23]:
trip_mph = trip_distance_miles/trip_length_hours

## Calculating Statistics For 1D ndarrays

NumPy ndarrays have methods for many different calculations. A few key methods are:

* `ndarray.min()` to calculate the minimum value
* `ndarray.max()` to calculate the maximum value
* `ndarray.mean()` to calculate the mean or average value
* `ndarray.sum()` to calculate the sum of the values

You can see the full list of ndarray methods in the [NumPy ndarray documentation](https://docs.scipy.org/doc/numpy-1.14.0/reference/arrays.ndarray.html#calculation).

In [25]:
mph_min = trip_mph.min()

In [28]:
mph_max = trip_mph.max()
mph_max

82800.0

In [29]:
mph_mean = trip_mph.mean()
print(mph_mean)

32.24258580925573


## Calculating Statistics For 1D ndarrays Continued

| Calculation | Function Representation | Method Representation|
| :-----| :---- | :---- |
| Calculate the minimum value of `trip_mph`    | `np.min(trip_mph)` | `trip_mph.min()` |
| Calculate the maximum value of `trip_mph`     | `np.max(trip_mph)` | `trip_mph.max()` |
| Calculate the mean average value of `trip_mph`  | `np.mean(trip_mph)` | `trip_mph.mean()` |
| Calculate the median average value of `trip_mph`  | `np.median(trip_mph)` | There is no ndarray median method |

Anything that starts with `np` (e.g. `np.mean()`) is a function, and anything expressed with an object (or variable) name first (e.g. `trip_mph.mean()`) is a method.
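A brief sketch of the function and method forms side by side, using a hypothetical array `speeds` in place of `trip_mph`:

```python
import numpy as np

# Hypothetical values standing in for trip_mph
speeds = np.array([10.0, 20.0, 30.0, 100.0])

# Function form and method form give the same result
min_function = np.min(speeds)
min_method = speeds.min()
mean_function = np.mean(speeds)
mean_method = speeds.mean()

# The median is available only as a function
median_function = np.median(speeds)
has_median_method = hasattr(speeds, "median")

print(min_function, min_method)
print(mean_function, mean_method)
print(median_function, has_median_method)
```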

## Calculating Statistics For 2D ndarrays

In [30]:
# we'll compare against the first 5 rows only
taxi_first_five = taxi[:5]
# select these columns: fare_amount, fees_amount, tolls_amount, tip_amount
fare_components = taxi_first_five[:,9:13]

In [40]:
fare_components.shape

(5, 4)

In [47]:
fare_sums = np.sum(fare_components, axis=1)
fare_sums

array([69.99, 54.3 , 37.8 , 32.76, 18.8 ])

In [50]:
fare_totals = taxi_first_five[:,13]

In [51]:
print(fare_totals)
print(fare_sums)

[69.99 54.3  37.8  32.76 18.8 ]
[69.99 54.3  37.8  32.76 18.8 ]
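The `axis` argument decides which dimension is collapsed. A minimal sketch on a hypothetical 2x3 array named `table` shows both directions:

```python
import numpy as np

# Hypothetical 2x3 array
table = np.array([[1, 2, 3],
                  [4, 5, 6]])

row_sums = np.sum(table, axis=1)     # one result per row
column_sums = np.sum(table, axis=0)  # one result per column
total = np.sum(table)                # no axis: sum of every value

print(row_sums)
print(column_sums)
print(total)
```

With `axis=1` the calculation runs across the columns of each row, which is why `fare_sums` has one value for each of the five rows.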


## Summarization

In this mission we learned:

* How vectorization makes our code faster.
* About n-dimensional arrays, and NumPy's ndarrays.
* How to select specific items, rows, columns, 1D slices, and 2D slices from ndarrays.
* How to apply simple calculations to entire ndarrays.
* How to use vectorized methods to perform calculations across either axis of ndarrays.