## Exercise 2: NumPy Data Exploration

Complete the following using data from the file traffic.csv. The file contains data from a study of the effect that warning signs have on speeding patterns. The city council undertaking the study considered 14 pairs of locations, paired to account for factors such as traffic volume and type of road. At one site in each pair, a sign was erected warning of the dangers of speeding and asking drivers to slow down. No action was taken at the second site. Three sets of speed measurements were taken at each site: before the sign was erected, shortly after it was erected, and again after it had been in place for some time.

This data set contains the following columns:

1. <b>Speed:</b> Speeds of cars (in miles per hour).

2. <b>Period:</b> A numeric column indicating the time that the reading was taken. A value of 1 indicates a reading taken before the sign was erected, a 2 indicates a reading taken shortly after erection of the sign and a 3 indicates a reading taken after the sign had been in place for some time.

3. <b>Warning:</b> A numeric column indicating whether the location of the reading was chosen to have a warning sign erected. A value of 1 indicates presence of a sign and a value of 2 indicates that no sign was erected.

4. <b>Pair:</b> A numeric column giving the pair number at which the reading was taken. Pairs were numbered from 1 to 14.
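Since the exercises index the array by column position, naming those positions can make the masks easier to read. The snippet below is a minimal sketch: the constant names and the three-row `sample` array are hypothetical, and it assumes the columns appear in the order listed above.

```python
import numpy as np

# Column positions, assuming the CSV columns appear in the order listed above.
SPEED, PERIOD, WARNING, PAIR = range(4)

# Tiny hypothetical sample with the same layout as traffic.csv (not real data).
sample = np.array([
    [30.0, 1.0, 1.0, 1.0],
    [28.0, 2.0, 1.0, 1.0],
    [33.0, 1.0, 2.0, 1.0],
])

# Speeds recorded at sites with a warning sign (Warning == 1).
sign_speeds = sample[sample[:, WARNING] == 1, SPEED]
print(sign_speeds)
```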

#### Question 1

Load the data in the file traffic.csv into a NumPy array and display it.

In [3]:
import numpy as np

In [4]:
# Read the header row as strings, then load the numeric rows (skipping the header)
columns = np.loadtxt('../data/traffic.csv', delimiter=',', max_rows=1, dtype=str)
data = np.loadtxt('../data/traffic.csv',delimiter = ',', skiprows=1)

In [5]:
columns



In [6]:
# Display the first five rows here
data[:5]

array([[26.,  1.,  1.,  1.],
       [26.,  1.,  1.,  1.],
       [26.,  1.,  1.,  1.],
       [26.,  1.,  1.,  1.],
       [27.,  1.,  1.,  1.]])
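The two-step load (header as strings, body as floats) can be tried without the file by reading from an in-memory buffer. This is only a sketch: `csv_text` is a hypothetical stand-in for traffic.csv, and its header names are assumed from the column list above.

```python
import io
import numpy as np

# Hypothetical in-memory CSV with the same four columns (not the real data).
csv_text = "Speed,Period,Warning,Pair\n26,1,1,1\n31,2,2,1\n"

# Read the header row as strings, then the remaining rows as floats.
header = np.loadtxt(io.StringIO(csv_text), delimiter=",", max_rows=1, dtype=str)
values = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

print(header)        # column names
print(values.shape)  # (rows, columns)
```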

#### Question 2

Confirm that the data has 14 pairs, numbered from 1 to 14.

In [5]:
# We have 14 pairs
np.unique(data[:,3])

array([ 1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9., 10., 11., 12., 13.,
       14.])
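Instead of reading the printed array by eye, the same check can be expressed as a single boolean. A minimal sketch, using a hypothetical `pair_column` in place of `data[:,3]`:

```python
import numpy as np

# Hypothetical pair column: 14 pairs, each repeated a few times (not real data).
pair_column = np.repeat(np.arange(1, 15, dtype=float), 3)

# True only if the distinct values are exactly 1, 2, ..., 14.
has_all_pairs = np.array_equal(np.unique(pair_column), np.arange(1, 15))
print(has_all_pairs)
```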

#### Question 3

Confirm that each pair contains one site with a warning sign and one without.

**Hint:** A for loop makes this much easier.

In [6]:
for i in range(1,15):
    print(f'Pair:{i}\tWarnings:{np.unique(data[data[:,3]==i,2])}')
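As an alternative to the loop, `np.unique` with `axis=0` can list the distinct (Warning, Pair) rows in one call. The sketch below uses a small hypothetical `warning_pair` array rather than the real data. In the exercise, `data[:, 2:]` would play that role.

```python
import numpy as np

# Hypothetical rows of (Warning, Pair) for two pairs (not real data).
warning_pair = np.array([
    [1.0, 1.0], [2.0, 1.0], [1.0, 1.0],
    [1.0, 2.0], [2.0, 2.0],
])

# Distinct (Warning, Pair) combinations that occur in the data.
combos = np.unique(warning_pair, axis=0)

# Each pair should contribute exactly two combinations: Warning 1 and 2.
pairs, counts = np.unique(combos[:, 1], return_counts=True)
print(dict(zip(pairs.tolist(), counts.tolist())))
```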



#### Question 4

Confirm that each pair/warning combination has readings from every time period (1, 2, and 3).

In [7]:
for i in range(1,15):
    for j in range(1,3):
        print(f'Pair: {i} \tWarning: {j}\tTime Periods: {np.unique(data[(data[:,3]==i)&(data[:,2]==j),1])}')
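The nested loop prints 28 lines to inspect. A more compact check is to count the distinct (Period, Warning, Pair) rows and compare against the expected total. This is a sketch on a hypothetical `design` array. With the real data, `data[:, 1:]` would take its place.

```python
import itertools
import numpy as np

# Hypothetical (Period, Warning, Pair) rows covering every combination twice.
combos = list(itertools.product([1.0, 2.0, 3.0], [1.0, 2.0], range(1, 15)))
design = np.array(combos + combos)

# 3 periods x 2 warning states x 14 pairs = 84 distinct combinations expected.
n_distinct = len(np.unique(design, axis=0))
print(n_distinct == 3 * 2 * 14)
```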



#### Question 5

What is the average speed for each time period? We only care about the impact of the warning, so look only at speeds recorded at sites where a warning sign was placed. Did the signs help?

In [8]:
for t in range(1,4):
    print(f'Period: {t}\tAvg Speed: {np.mean(data[(data[:,1]==t)&(data[:,2]==1),0])}')

Period: 1	Avg Speed: 36.51
Period: 2	Avg Speed: 35.769285714285715
Period: 3	Avg Speed: 37.66226138032305
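To judge whether a change is due to the sign, it may help to compute the same averages for the sites without a sign and compare the two rows. The sketch below is illustrative only: the `rows` array is made-up data, not the study's measurements, and the comparison goes slightly beyond what the question asks.

```python
import numpy as np

# Hypothetical rows of (Speed, Period, Warning); not the real measurements.
rows = np.array([
    [36.0, 1.0, 1.0], [34.0, 2.0, 1.0], [35.0, 3.0, 1.0],
    [38.0, 1.0, 1.0], [36.0, 2.0, 1.0], [37.0, 3.0, 1.0],
    [40.0, 1.0, 2.0], [41.0, 2.0, 2.0], [40.0, 3.0, 2.0],
])

# means[w - 1, t - 1] holds the average speed for Warning w and Period t.
means = np.array([
    [rows[(rows[:, 2] == w) & (rows[:, 1] == t), 0].mean() for t in (1, 2, 3)]
    for w in (1, 2)
])
print(means)
```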


#### Question 6

Which pair had the slowest speed recorded?  The fastest?

In [13]:
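# Method 1: locate the rows holding the overall minimum and maximum speed
print(columns)

# np.argmin / np.argmax return a row index, not a speed
min_row = np.argmin(data[:,0])
print(data[min_row])

max_row = np.argmax(data[:,0])
print(data[max_row])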

[19.  2.  1.  7.]
[67.  2.  2. 13.]


In [13]:
# Method 2
lowest_speed = np.inf
highest_speed = -np.inf
slowest_pair = 0
fastest_pair = 0

for i in range(1,15):
    
    min_speed = np.min(data[data[:,3]==i,0])
    max_speed = np.max(data[data[:,3]==i,0])
    
    
    if min_speed<lowest_speed:
        lowest_speed = min_speed
        slowest_pair = i
        
    if max_speed>highest_speed:
        highest_speed = max_speed
        fastest_pair = i
        
    
print(f'Slowest pair was {slowest_pair} with a speed of {lowest_speed}\nFastest pair was {fastest_pair} with a speed of {highest_speed}')

Slowest pair was 7 with a speed of 19.0
Fastest pair was 13 with a speed of 67.0
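Both methods report a single pair even if several pairs share the extreme speed (`np.argmin` and `np.argmax` return the first match). A boolean mask lists every tied pair. This is a sketch on a hypothetical `rows` array of (Speed, Pair) values, not the real data.

```python
import numpy as np

# Hypothetical (Speed, Pair) rows where two pairs share the minimum speed.
rows = np.array([[19.0, 7.0], [25.0, 3.0], [19.0, 4.0], [67.0, 13.0]])

speeds, pairs = rows[:, 0], rows[:, 1]

# Every pair that recorded the minimum (or maximum) speed, ties included.
slowest_pairs = np.unique(pairs[speeds == speeds.min()])
fastest_pairs = np.unique(pairs[speeds == speeds.max()])
print(slowest_pairs, fastest_pairs)
```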


#### Question 7

Which pair had the biggest drop off in average speed from time period 1 to 2?

In [14]:
biggest_drop_off = -np.inf
pair = 0

for i in range(1,15):
    
    # Mean speed in period 1 minus mean speed in period 2 for this pair
    drop_off = np.mean(data[(data[:,3]==i)&(data[:,1]==1),0])-np.mean(data[(data[:,3]==i)&(data[:,1]==2),0])
    
    
    if drop_off>biggest_drop_off:
        biggest_drop_off = drop_off
        pair = i
        
    
print(f'Biggest drop off was {biggest_drop_off} for pair {pair}')

Biggest drop off was 3.5150000000000006 for pair 9
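The running-maximum bookkeeping can be replaced by collecting one drop-off per pair and calling `np.argmax`. The sketch below assumes the same definition of drop-off (period 1 mean minus period 2 mean). The `rows` array and the `period_mean` helper are hypothetical.

```python
import numpy as np

# Hypothetical (Speed, Period, Pair) rows for three pairs; not real data.
rows = np.array([
    [40.0, 1.0, 1.0], [38.0, 2.0, 1.0],
    [35.0, 1.0, 2.0], [30.0, 2.0, 2.0],
    [33.0, 1.0, 3.0], [34.0, 2.0, 3.0],
])

pair_ids = np.unique(rows[:, 2])

def period_mean(pair, period):
    """Average speed for one pair in one period."""
    mask = (rows[:, 2] == pair) & (rows[:, 1] == period)
    return rows[mask, 0].mean()

# Drop-off = period 1 mean minus period 2 mean, one entry per pair.
drops = np.array([period_mean(p, 1) - period_mean(p, 2) for p in pair_ids])

best = pair_ids[np.argmax(drops)]
print(best, drops.max())
```

Keeping all the drop-offs in one array also makes it easy to inspect pairs whose speed went up (negative entries).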
