# Import csv directly to a numpy ndarray using np.genfromtxt()
Use the numpy.genfromtxt() function to import the nyc_taxis.csv file directly as a NumPy ndarray. <br>Every entry is imported with the same type, in this case float64. <br>A float64 array cannot store text, so the headers in the 1st row are imported as NaN. It would, however, be possible to have a NumPy array containing only string elements.

In [1]:
import numpy as np

taxi = np.genfromtxt('datasets/nyc_taxis.csv', delimiter=',')

# Data is imported directly as a numpy ndarray
print(type(taxi))

# Every entry of an ndarray must be of the same type, in this case float64.
# Text (in our case the headers in the 1st row) is imported as NaN
print(taxi.dtype)
print(taxi[0,:])


# Remove headers
taxi = taxi[1:,:]
print(taxi[0,:])

<class 'numpy.ndarray'>
float64
[nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan]
[2.016e+03 1.000e+00 1.000e+00 5.000e+00 0.000e+00 2.000e+00 4.000e+00
 2.100e+01 2.037e+03 5.200e+01 8.000e-01 5.540e+00 1.165e+01 6.999e+01
 1.000e+00]
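
The header row can also be skipped at read time instead of being sliced off afterwards. The following is a minimal sketch, assuming a small in-memory CSV in place of nyc_taxis.csv; the column names and values are made up for illustration. `skip_header` is a documented parameter of `np.genfromtxt()`.

```python
import io
import numpy as np

# Hypothetical three-column sample standing in for the real file
csv_text = "pickup_year,pickup_month,fare_amount\n2016,1,69.99\n2016,2,54.3\n"

# skip_header=1 drops the first line, so no row of NaN values is produced
sample = np.genfromtxt(io.StringIO(csv_text), delimiter=',', skip_header=1)

print(sample.shape)
print(sample.dtype)
print(np.isnan(sample).any())
```

With the real file, the same argument would presumably replace the `taxi = taxi[1:,:]` step, provided the file has exactly one header line.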


# Scalar operation on vector
An arithmetic or comparison operation between a NumPy ndarray and a scalar is applied to every element of the array.

In [2]:
a_array = np.array([1, 2, 3, 4, 5])
b_array = np.array(["blue", "blue", "red", "blue"])
c_array = np.array([80.0, 103.4, 96.9, 200.3])
d_array = np.array([[1, 2, 3, 4, 5],[6, 7, 8, 9, 10]])

a_result = a_array < 3
b_result = b_array == "blue"
c_result = c_array > 100

# It also works when the ndarray has more than one dimension. 
d_result = d_array + 100

print(a_result)
print(b_result)
print(c_result)
print(d_result)

[ True  True False False False]
[ True  True False  True]
[False  True False  True]
[[101 102 103 104 105]
 [106 107 108 109 110]]


# Boolean selection

**Boolean selection example 1**
<br><br>Calculate the number of rides in the taxi ndarray that are from January:
* Create a boolean array, january_bool, that evaluates whether the items in pickup_month are equal to 1.
* Use the january_bool boolean array to index pickup_month, and assign the result to january.
* Use the ndarray.shape attribute to find the number of items in january and assign the result to january_rides.

In [3]:
pickup_month = taxi[:,1]

january_bool = pickup_month == 1
january = pickup_month[january_bool]
january_rides = january.shape[0]

print(taxi.shape[0])
print(january.shape[0])

89560
13481
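
Because `True` counts as 1 when a boolean array is summed, the same count can be obtained without building the intermediate array. This is a sketch on a small made-up `pickup_month` vector, not the real data:

```python
import numpy as np

# Hypothetical month values standing in for taxi[:, 1]
pickup_month = np.array([1.0, 2.0, 1.0, 3.0, 1.0])

january_bool = pickup_month == 1

# Index with the mask, then read the length of the result
january_rides = pickup_month[january_bool].shape[0]

# Summing the mask directly gives the same number
january_rides_alt = int(january_bool.sum())

print(january_rides, january_rides_alt)
```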


**Boolean selection example 2**
* Create a boolean array, `tip_bool`, that determines which rows have a value of more than 50 in the `tip_amount` column.
* Use the `tip_bool` array to select all rows from `taxi` with tip amounts of more than 50, and the columns from indexes 5 to 13 inclusive. Assign the resulting array to `top_tips`.

In [4]:
tip_amount = taxi[:,12]
tip_bool = tip_amount > 50
top_tips = taxi[tip_bool,5:14]
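
The cell above prints nothing, so here is a small sketch of the same pattern (boolean mask for the rows, slice for the columns). It uses a hypothetical 4x3 array whose last column plays the role of `tip_amount`; the values are invented.

```python
import numpy as np

# Hypothetical data; column 2 stands in for tip_amount
rides = np.array([[1.0, 10.0, 60.0],
                  [2.0, 20.0,  5.0],
                  [3.0, 30.0, 75.0],
                  [4.0, 40.0,  0.0]])

tip_bool = rides[:, 2] > 50

# Rows where the mask is True, columns 0 and 1 (the slice end is exclusive)
top = rides[tip_bool, 0:2]

print(top)
print(top.shape)
```

The slice end is exclusive, which is why `5:14` is needed above to include the column at index 13.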

# Boolean assignment

1) Create the boolean vector first and then use the vector to select the elements to which the assignment applies.

`bool_mask = array[:, column_for_comparison] == value_for_comparison`  
`array[bool_mask, column_for_assignment] = new_value`  

2) Introduce the selection condition directly inside brackets for assignment in one single step.

`array[array[:, column_for_comparison] == value_for_comparison, column_for_assignment] = new_value`

In [5]:
a = np.array([1, 2, 3, 4, 5])
a[a > 2] = 99
print(a)

# It also works when the ndarray has more than one dimension.
b = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
b[b > 4] = 99
print(b)

# Or we can restrict the assignment to a single column
c = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
c[c[:,1] > 2, 1] = 99
print(c)

[ 1  2 99 99 99]
[[ 1  2  3]
 [ 4 99 99]
 [99 99 99]]
[[ 1  2  3]
 [ 4 99  6]
 [ 7 99  9]]
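
The two forms described at the start of this section should give identical results. A minimal sketch to check this, using a made-up 3x2 array; the copies are there because boolean assignment modifies the array in place:

```python
import numpy as np

# Hypothetical data: column 0 is compared, column 1 is assigned
data = np.array([[1, 10],
                 [2, 20],
                 [1, 30]])

# Form 1: build the mask first, then assign through it
two_step = data.copy()
mask = two_step[:, 0] == 1
two_step[mask, 1] = 0

# Form 2: write the condition directly inside the brackets
one_step = data.copy()
one_step[one_step[:, 0] == 1, 1] = 0

print(np.array_equal(two_step, one_step))
print(two_step)
```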
