# Import csv as a list of lists and convert it to a NumPy ndarray
Use Python's csv module to import the nyc_taxis.csv file and convert it to a list of lists containing float values. <br>Then convert the `converted_taxi_list` variable to a NumPy ndarray and assign the result to the variable `taxi`.

In [1]:
import csv
import numpy as np

# import nyc_taxis.csv as a list of lists (the with block closes the file afterwards)
with open("datasets/nyc_taxis.csv", "r") as f:
    taxi_list = list(csv.reader(f))

# remove the header row
taxi_list = taxi_list[1:]

# convert all values to floats
converted_taxi_list = []
for row in taxi_list:
    converted_row = []
    for item in row:
        converted_row.append(float(item))
    converted_taxi_list.append(converted_row)

# start writing your code below this comment
taxi = np.array(converted_taxi_list)

print(converted_taxi_list[0:4])
print(taxi[:4,:])

[[2016.0, 1.0, 1.0, 5.0, 0.0, 2.0, 4.0, 21.0, 2037.0, 52.0, 0.8, 5.54, 11.65, 69.99, 1.0], [2016.0, 1.0, 1.0, 5.0, 0.0, 2.0, 1.0, 16.29, 1520.0, 45.0, 1.3, 0.0, 8.0, 54.3, 1.0], [2016.0, 1.0, 1.0, 5.0, 0.0, 2.0, 6.0, 12.7, 1462.0, 36.5, 1.3, 0.0, 0.0, 37.8, 2.0], [2016.0, 1.0, 1.0, 5.0, 0.0, 2.0, 6.0, 8.7, 1210.0, 26.0, 1.3, 0.0, 5.46, 32.76, 1.0]]
[[2.016e+03 1.000e+00 1.000e+00 5.000e+00 0.000e+00 2.000e+00 4.000e+00
  2.100e+01 2.037e+03 5.200e+01 8.000e-01 5.540e+00 1.165e+01 6.999e+01
  1.000e+00]
 [2.016e+03 1.000e+00 1.000e+00 5.000e+00 0.000e+00 2.000e+00 1.000e+00
  1.629e+01 1.520e+03 4.500e+01 1.300e+00 0.000e+00 8.000e+00 5.430e+01
  1.000e+00]
 [2.016e+03 1.000e+00 1.000e+00 5.000e+00 0.000e+00 2.000e+00 6.000e+00
  1.270e+01 1.462e+03 3.650e+01 1.300e+00 0.000e+00 0.000e+00 3.780e+01
  2.000e+00]
 [2.016e+03 1.000e+00 1.000e+00 5.000e+00 0.000e+00 2.000e+00 6.000e+00
  8.700e+00 1.210e+03 2.600e+01 1.300e+00 0.000e+00 5.460e+00 3.276e+01
  1.000e+00]]


# Numpy  

NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of non-negative integers.

### Useful documentation
A [Link](https://docs.scipy.org/doc/numpy-1.13.0/reference/arrays.ndarray.html) with ndarray attributes and methods.  
[Numpy Quickstart tutorial](https://docs.scipy.org/doc/numpy/user/quickstart.html#).  
[NumPy manual contents](https://docs.scipy.org/doc/numpy/contents.html).  
A [Link](https://docs.scipy.org/doc/numpy-1.14.0/reference/routines.math.html#arithmetic-operations) with numpy mathematical functions over arrays.  
[NumPy Reference](https://docs.scipy.org/doc/numpy-1.14.0/reference/index.html) at SciPy.org  
SciPy.org [search engine](https://docs.scipy.org/doc/numpy-1.13.0/search.html)  
[General index](https://docs.scipy.org/doc/numpy-1.13.0/genindex.html)  

## Array creation  

There are several ways to create arrays. For example, you can create an array from a regular Python list or tuple using the array function. The type of the resulting array is deduced from the type of the elements in the sequences.

**numpy.array(**object, dtype=None, copy=True, order='K', subok=False, ndmin=0**)**:  
Creates an array. `object` is any object exposing the array interface, and the optional `dtype` is the desired data type for the array.
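
As a minimal sketch of how the type is deduced from the elements (the variable names `ints`, `floats` and `forced` are only illustrative), the following compares the default deduction with an explicit `dtype`:

```python
import numpy as np

ints = np.array([2, 3, 4])                       # all ints: an integer dtype (width depends on the platform)
floats = np.array([2, 3.5, 4])                   # a single float makes the whole array float64
forced = np.array([2, 3, 4], dtype=np.float64)   # an explicit dtype overrides the deduction

print(ints.dtype.kind)   # i  (signed integer)
print(floats.dtype)      # float64
print(forced)            # [2. 3. 4.]
```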

In [2]:
import numpy as np
a = np.array([2,3,4])
a

array([2, 3, 4])

In [3]:
b = np.array([[2,3,4],[1,2,3]])
b

array([[2, 3, 4],
       [1, 2, 3]])

An array can have more than two dimensions:

In [4]:
c = np.array([[[2,3,4],[1,2,3]],[[0,1,-1],[1,0,0]]])
c

array([[[ 2,  3,  4],
        [ 1,  2,  3]],

       [[ 0,  1, -1],
        [ 1,  0,  0]]])

**numpy.zeros()**: Return a new array of given shape and type, filled with zeros.

In [5]:
c = np.zeros((2,2,3))
c

array([[[0., 0., 0.],
        [0., 0., 0.]],

       [[0., 0., 0.],
        [0., 0., 0.]]])

**numpy.ones()**: Return a new array of given shape and type, filled with ones.
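
A short sketch of `numpy.ones()`, mirroring the `numpy.zeros()` example; the names `d` and `e` are illustrative, and the `dtype` argument is optional:

```python
import numpy as np

d = np.ones((2, 3))                    # the default dtype is float64
e = np.ones((2, 3), dtype=np.int16)    # the dtype can also be set explicitly

print(d)
print(e.dtype)   # int16
```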

**numpy.arange()**: Return evenly spaced values within a given interval (analogous to range but returns arrays instead of lists).

In [6]:
np.arange(4)

array([0, 1, 2, 3])

In [7]:
np.arange(2,6)

array([2, 3, 4, 5])

In [8]:
np.arange(0,8,2)

array([0, 2, 4, 6])

In [9]:
np.arange(12).reshape(4,3) 

array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])

## Basic array attributes

**ndarray.ndim**: the number of axes (dimensions) of the array.

In [10]:
a = np.array([2,3,4])
a.ndim

1

In [11]:
b = np.array([[2,3,4],[1,2,3]])
b.ndim

2

In [12]:
c = np.array([[[2,3,4],[1,2,3]],[[0,1,-1],[1,0,0]]])
c.ndim

3

**ndarray.shape**: the dimensions of the array.

In [13]:
a = np.array([2,3,4])
a.shape

(3,)

In [14]:
b = np.array([[2,3,4],[1,2,3]])
b.shape

(2, 3)

In [15]:
c = np.array([[[2,3,4],[1,2,3]],[[0,1,-1],[1,0,0]]])
c.shape

(2, 2, 3)

**ndarray.size**: the total number of elements of the array. This is equal to the product of the elements of shape.

In [16]:
b = np.array([[2,3,4],[1,2,3]])
b.size

6

**ndarray.dtype**: an object describing the type of the elements in the array. One can create or specify dtype’s using standard Python types. Additionally NumPy provides types of its own. numpy.int32, numpy.int16, and numpy.float64 are some examples.

In [17]:
a.dtype

dtype('int32')

In [18]:
a = a.astype('int8')
a.dtype

dtype('int8')

## Numpy array indexing and selection  

* An integer, indicating a specific location, e.g. `ndarray[2]` (ndim 1) or `ndarray[3,0]` (ndim 2).
* A slice, indicating a range of locations, e.g. `ndarray[0:3]` (ndim 1) or `ndarray[0:5,6:]` (ndim 2).
* A colon, indicating every location, e.g. `ndarray[:,2]`.
* A list of values, indicating specific locations, e.g. `ndarray[[0,1,3,4],0]`.
* A boolean array, indicating specific locations - we'll look at this method in detail in the second mission of this course.
* Or any combination of the above.
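
The selection styles listed above can be sketched on a small made-up array (`m` and the other names here are hypothetical, chosen only for illustration). The boolean case is shown only briefly:

```python
import numpy as np

m = np.arange(20).reshape(5, 4)        # 5 rows, 4 columns, values 0..19

rows_first_col = m[[0, 1, 3, 4], 0]    # list of row indices combined with a single column
block = m[1:3, 2:]                     # slices on both axes
mask = m[:, 0] > 4                     # boolean array with one entry per row
selected = m[mask, :]                  # keeps only the rows where mask is True

print(rows_first_col)   # [ 0  4 12 16]
print(block)
print(selected)
```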

**One-dimensional** arrays can be indexed, sliced and iterated over, much like lists and other Python sequences.  

In [19]:
a = np.arange(10)**2
a

array([ 0,  1,  4,  9, 16, 25, 36, 49, 64, 81], dtype=int32)

In [20]:
a[0:5]    #first half of the array

array([ 0,  1,  4,  9, 16], dtype=int32)

In [21]:
a[0:10:2]   #even indices

array([ 0,  4, 16, 36, 64], dtype=int32)

In [22]:
a[::2]   #same as before

array([ 0,  4, 16, 36, 64], dtype=int32)

In [23]:
a[::-1]  #reversed a

array([81, 64, 49, 36, 25, 16,  9,  4,  1,  0], dtype=int32)

Indexing also works for assignments.

In [24]:
a[::2] = 0  #even indices elements replaced by 0.
a

array([ 0,  1,  0,  9,  0, 25,  0, 49,  0, 81], dtype=int32)

**Multidimensional** arrays can have one index per axis. These indices are given separated by commas.

In [25]:
b = np.fromfunction(lambda x, y: 10*x+y ,(5,4),dtype=int)
b

array([[ 0,  1,  2,  3],
       [10, 11, 12, 13],
       [20, 21, 22, 23],
       [30, 31, 32, 33],
       [40, 41, 42, 43]])

In [26]:
b[2,3] #element in position (2,3)

23

In [27]:
b[:, 1]                       # each row in the second column of b

array([ 1, 11, 21, 31, 41])

When fewer indices are provided than the number of axes, the missing indices are considered complete slices `:`.

In [28]:
b[-1]

array([40, 41, 42, 43])

Iterating over multidimensional arrays is done with respect to the first axis:

In [29]:
for row in b:
    print(row)

[0 1 2 3]
[10 11 12 13]
[20 21 22 23]
[30 31 32 33]
[40 41 42 43]


However, to perform an operation on each element of the array, one can use the `flat` attribute, which is an iterator over all the elements of the array.
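
A minimal sketch of element-by-element iteration with `flat`, using a small illustrative array rather than the `b` defined earlier:

```python
import numpy as np

small = np.arange(6).reshape(2, 3)

# small.flat visits every element in row-major order, whatever the shape
for element in small.flat:
    print(element)

flat_values = [int(x) for x in small.flat]
```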

## Numpy array element-wise operations  

Arithmetic operators on arrays apply elementwise. A new array is created and filled with the result.

* `vector_a + vector_b` - Addition
* `vector_a - vector_b` - Subtraction
* `vector_a * vector_b` - Multiplication (element-wise multiplication, see the `numpy.dot()` function for inner product or matrix multiplication).
* `vector_a / vector_b` - Division
* `vector_a % vector_b` - Modulus (the remainder of the element-wise division)
* `vector_a ** vector_b` - Exponent
* `vector_a // vector_b` - Floor Division (rounded down to the nearest integer)
* `vector_a += vector_b` - Same as addition but with in place modification of `vector_a`
* `vector_a *= vector_b` - Same as multiplication but with in place modification of `vector_a`  

When operating with arrays of different types, the type of the resulting array corresponds to the more general or precise one (a behavior known as upcasting).
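
The sketch below, with illustrative arrays `x`, `y`, `ints` and `floats`, contrasts element-wise multiplication with `numpy.dot()` and shows upcasting when an integer array meets a float array:

```python
import numpy as np

x = np.array([[1, 1], [0, 1]])
y = np.array([[2, 0], [3, 4]])

elementwise = x * y              # multiplies matching positions: [[2, 0], [0, 4]]
matrix_product = np.dot(x, y)    # matrix multiplication:         [[5, 4], [3, 4]]

ints = np.ones(3, dtype=np.int32)
floats = np.linspace(0, 1, 3)    # float64 values 0.0, 0.5, 1.0
mixed = ints + floats            # the result is upcast to float64

print(elementwise)
print(matrix_product)
print(mixed.dtype)   # float64
```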

**IMPORTANT**: These operators also work between an ndarray and a scalar value instead of two ndarrays of the same size: the scalar is applied to every element (see the division by 3600 below).

In [30]:
# Example of vector selection and element-wise operation
trip_distance_miles = taxi[:,7]
trip_length_seconds = taxi[:,8]

trip_length_hours = trip_length_seconds / 3600 # 3600 seconds is one hour

trip_mph = trip_distance_miles/trip_length_hours

print(trip_distance_miles[0:5], trip_length_seconds[0:5], trip_length_hours[0:5], trip_mph[0:5])

[21.   16.29 12.7   8.7   5.56] [2037. 1520. 1462. 1210.  759.] [0.56583333 0.42222222 0.40611111 0.33611111 0.21083333] [37.11340206 38.58157895 31.27222982 25.88429752 26.3715415 ]


## Numpy array methods  

**Many unary operations**, such as computing the sum of all the elements in the array, **are implemented as methods of the ndarray class**. By default, these operations apply to the array as though it were a list of numbers, regardless of its shape. However, by specifying the axis parameter you can apply an operation along the specified axis of an array.

In [31]:
a = np.random.random((2,3))
a

array([[0.38846303, 0.27033545, 0.3539479 ],
       [0.90813304, 0.93578024, 0.4298039 ]])

In [32]:
a.sum()

3.2864635703478364

In [33]:
a.min()

0.27033544908984786

In [34]:
a.max()

0.935780242319545

In [35]:
b = np.arange(12).reshape(3,4)
b

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])

In [36]:
b.sum(axis=0)                            # sum of each column

array([12, 15, 18, 21])

In [37]:
b.min(axis=1)                            # min of each row

array([0, 4, 8])

In [38]:
b.cumsum(axis=1)                         # cumulative sum along each row

array([[ 0,  1,  3,  6],
       [ 4,  9, 15, 22],
       [ 8, 17, 27, 38]], dtype=int32)

## Shape Manipulation  

The shape of an array can be changed with various commands. Note that the following three commands all return a modified array, but do not change the original array:

In [39]:
a = np.floor(10*np.random.random((3,4)))
a

array([[5., 6., 9., 9.],
       [1., 5., 5., 1.],
       [1., 8., 5., 1.]])

In [40]:
np.array(a.flat)

array([5., 6., 9., 9., 1., 5., 5., 1., 1., 8., 5., 1.])

In [41]:
a.T

array([[5., 1., 1.],
       [6., 5., 8.],
       [9., 5., 5.],
       [9., 1., 1.]])

In [42]:
a.reshape(6,2)

array([[5., 6.],
       [9., 9.],
       [1., 5.],
       [5., 1.],
       [1., 8.],
       [5., 1.]])

If a dimension is given as -1 in a reshaping operation, the other dimensions are automatically calculated:

In [43]:
a.reshape(6,-1)

array([[5., 6.],
       [9., 9.],
       [1., 5.],
       [5., 1.],
       [1., 8.],
       [5., 1.]])
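
As a quick check that these commands leave the original untouched, here is a small sketch on a deterministic array (the name `grid` is illustrative, used instead of the random `a` above):

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)

reshaped = grid.reshape(2, -1)   # -1 is inferred as 12 / 2 = 6
transposed = grid.T

print(grid.shape)         # (3, 4)  the original is unchanged
print(reshaped.shape)     # (2, 6)
print(transposed.shape)   # (4, 3)
```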

## Array concatenation  

Several arrays can be stacked together along different axes, but:

1. All the arrays must have the same number of dimensions, and `axis` selects the axis along which they are joined (for 2D arrays, `axis=0` appends rows and `axis=1` appends columns).
2. `numpy.concatenate()` takes a sequence of array_like elements, which must have the same shape except in the dimension corresponding to `axis`.
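
A small sketch of these shape rules, using made-up arrays `p`, `q` and `r` (not part of the taxi data): joining works when the shapes agree on every axis except the one being joined, and raises a `ValueError` otherwise.

```python
import numpy as np

p = np.array([[1, 2], [3, 4]])   # shape (2, 2)
q = np.array([[5, 6]])           # shape (1, 2)
r = np.array([[7], [8]])         # shape (2, 1)

rows_added = np.concatenate((p, q), axis=0)   # shapes agree on axis 1, result (3, 2)
cols_added = np.concatenate((p, r), axis=1)   # shapes agree on axis 0, result (2, 3)

# p has 2 rows and q has 1, so they cannot be joined side by side
try:
    np.concatenate((p, q), axis=1)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

print(rows_added.shape, cols_added.shape, mismatch_raised)
```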

In [44]:
print(trip_mph.shape)

# Insert a new axis that will appear at the axis position in the expanded array shape
trip_mph_2d = np.expand_dims(trip_mph, axis=1)
print(trip_mph_2d.shape)

print(taxi.shape)
taxi = np.concatenate((taxi, trip_mph_2d), axis=1)
print(taxi.shape)

(89560,)
(89560, 1)
(89560, 15)
(89560, 16)


## Array sorting and searching basics ([ref](https://docs.scipy.org/doc/numpy-1.14.0/reference/routines.sort.html))

* `sort(a[, axis, kind, order])`	Return a sorted copy of an array.
* `argsort(a[, axis, kind, order])`	Returns the indices that would sort an array.
* `ndarray.sort([axis, kind, order])`	Sort an array, in-place.
* `argmax(a[, axis, out])`	Returns the indices of the maximum values along an axis.
* `argmin(a[, axis, out])`	Returns the indices of the minimum values along an axis.
* `nonzero(a)`	Return the indices of the elements that are non-zero.
* `count_nonzero(a[, axis])`	Counts the number of non-zero values in the array a.
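
The functions listed above can be tried on a tiny illustrative vector (`v` is a made-up example, not taken from the taxi data):

```python
import numpy as np

v = np.array([3, 0, 2, 0, 5])

sorted_copy = np.sort(v)         # sorted copy, v itself is unchanged
order = np.argsort(v)            # indices that would sort v
by_order = v[order]              # same values as sorted_copy

print(sorted_copy)               # [0 0 2 3 5]
print(np.argmax(v))              # 4
print(np.argmin(v))              # 1  (first occurrence of the minimum)
print(np.nonzero(v))             # (array([0, 2, 4]),)
print(np.count_nonzero(v))       # 3

largest_index = int(np.argmax(v))
nonzero_count = int(np.count_nonzero(v))

v.sort()                         # sorts v in place and returns None
```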

In [45]:
# sorting example using numpy.argsort()

trip_mph_sortarg = np.argsort(taxi[:,15])
taxi_sorted = taxi[trip_mph_sortarg,:]

print(taxi[:4,15])
print(taxi[-4:,15])

print(taxi_sorted[:4,15])
print(taxi_sorted[-4:,15])

[37.11340206 38.58157895 31.27222982 25.88429752]
[30.10135135 22.29907867 42.41551247 36.90473407]
[0. 0. 0. 0.]
[30960. 32040. 70560. 82800.]
