### Case Study - Statistics of Wind Data of India from 2001 to 2017

**The data in 'wind.data' has the following format:**

2017  1  1 15.04 14.96 13.17  9.29 13.96  9.87 13.67 10.25 10.83 12.58 18.50 15.04
2017  1  2 14.71 16.88 10.83  6.50 12.62  7.67 11.50 10.04  9.79  9.67 17.54 13.83
2017  1  3 18.50 16.88 12.33 10.13 11.17  6.17 11.25  8.04  8.50  7.67 12.75 12.71

- The first three columns are year, month and day.  
- The remaining 12 columns are average windspeeds in knots at 12 locations in India on that day.

Note - Use the 'loadtxt' function from numpy to read the data into an array.
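As a minimal sketch of what `loadtxt` does with this layout, the three sample rows shown above can be parsed from an in-memory buffer instead of the real file. The names `sample`, `sample_data`, `dates` and `speeds` are illustrative only and do not appear in the rest of the notebook.

```python
import io
import numpy as np

# The three sample rows shown above, standing in for the real 'wind.data' file.
sample = io.StringIO(
    "2017  1  1 15.04 14.96 13.17  9.29 13.96  9.87 13.67 10.25 10.83 12.58 18.50 15.04\n"
    "2017  1  2 14.71 16.88 10.83  6.50 12.62  7.67 11.50 10.04  9.79  9.67 17.54 13.83\n"
    "2017  1  3 18.50 16.88 12.33 10.13 11.17  6.17 11.25  8.04  8.50  7.67 12.75 12.71\n"
)

# loadtxt splits on whitespace and returns a 2-D float array by default.
sample_data = np.loadtxt(sample)

dates = sample_data[:, :3].astype(int)  # year, month, day
speeds = sample_data[:, 3:]             # the 12 location columns

print(sample_data.shape)
print(dates[0])
print(speeds.shape)
```

Splitting the date columns from the wind columns early keeps the year, month and day values out of the statistics.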

#### Perform the following operations
1. Load the data using the `loadtxt` function and verify it.

2. Calculate the min, max and mean windspeeds and standard deviation of the windspeeds over all the locations and all the times (a single set of numbers for the entire dataset).

3. Calculate the min, max and mean windspeeds and standard deviations of the windspeeds at each location over all the days (a different set of numbers for each location).

4. Calculate the min, max and mean windspeeds and standard deviations of the windspeeds across all the locations for each day (a different set of numbers for each day).

5. Find the location which has the greatest windspeed on each day (an integer column number for each day).

6. Find the year, month and day on which the greatest windspeed was recorded.

7. Find the average windspeed in January for each location.

8. Calculate the mean windspeed for each month in the dataset.  Treat
   January 2017 and January 2016 as *different* months.

9. Calculate the min, max and mean windspeeds and standard deviations of the windspeeds across all locations for each week.

10. Calculate the mean windspeed for each month without using a for loop. (Hint: look at `searchsorted` and `add.reduceat`.)


**Note: You should be able to perform all of these operations without using a for loop or other looping construct.**

In [1]:
from numpy import (loadtxt, arange, searchsorted, add, zeros, unravel_index,
                   where)

#### 1. Load data

In [2]:
wind_data = loadtxt('wind.data')

# Print the shape
wind_data.shape

(6574, 15)

In [3]:
# Second row (index 1) - year, month, day, wind speed at 12 locations - 15 cols
print(wind_data[1,:]) # loadtxt converts every value to float

[  2.01700000e+03   1.00000000e+00   2.00000000e+00   1.47100000e+01
   1.68800000e+01   1.08300000e+01   6.50000000e+00   1.26200000e+01
   7.67000000e+00   1.15000000e+01   1.00400000e+01   9.79000000e+00
   9.67000000e+00   1.75400000e+01   1.38300000e+01]


#### 2. Calculate the min, max and mean windspeeds and standard deviation of the windspeeds over all the locations and all the times (a single set of numbers for the entire dataset).

In [4]:
# We should ignore first 3 cols - year, month, day

# Get only wind data
data = wind_data[:, 3:]

print('2. Statistics over all values')
print('  min:', data.min())
print('  max:', data.max())
print('  mean:', data.mean())
print('  standard deviation:', data.std())
print()

2. Statistics over all values
  min: 0.0
  max: 42.54
  mean: 10.2283737704
  standard deviation: 5.6038401811



#### 3. Calculate the min, max and mean windspeeds and standard deviations of the windspeeds at each location over all the days (a different set of numbers for each location).

In [5]:
# Each column belongs to one location, so compute the statistics
# column-wise (min, max, ...)

# axis=0 collapses the rows (days), leaving one value per column (location)

# 12 locations - 12 mins, 12 maxs, 12 means, 12 std values

data = wind_data[:, 3:]

print('3. Statistics over all days at each location')
print('  min:', data.min(axis=0))
print('  max:', data.max(axis=0))
print('  mean:', data.mean(axis=0))
print('  standard deviation:', data.std(axis=0))
print()

3. Statistics over all days at each location
  min: [ 0.67  0.21  1.5   0.    0.13  0.    0.    0.    0.    0.04  0.13  0.67]
  max: [ 35.8   33.37  33.84  28.46  37.54  26.16  30.37  31.08  25.88  28.21
  42.38  42.54]
  mean: [ 12.36371463  10.64644813  11.66010344   6.30627472  10.45688013
   7.09225434   9.7968345    8.49442044   8.49581838   8.70726803
  13.121007    15.59946152]
  standard deviation: [ 5.61918301  5.26820081  5.00738377  3.60513309  4.93536333  3.96838126
  4.97689374  4.49865783  4.16746101  4.50327222  5.83459319  6.69734719]
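
The `axis` argument is easy to misread, so here is a small sketch on a made-up array (the name `toy` and its values are hypothetical, not taken from `wind.data`). It shows that `axis=0` reduces over the rows and returns one value per column, while `axis=1` reduces over the columns and returns one value per row.

```python
import numpy as np

# Hypothetical toy array: 2 days (rows) x 3 locations (columns).
toy = np.array([[1.0, 5.0, 3.0],
                [4.0, 2.0, 6.0]])

per_location = toy.max(axis=0)  # collapse rows    -> one value per column
per_day = toy.max(axis=1)       # collapse columns -> one value per row
overall = toy.max()             # no axis          -> a single scalar

print(per_location)
print(per_day)
print(overall)
```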



#### 4. Calculate the min, max and mean windspeeds and standard deviations of the windspeeds across all the locations for each day (a different set of numbers for each day).

In [6]:
# Each row holds one day's readings for the 12 locations.
# axis=1 collapses the columns, leaving one value per row (day)

# i.e. the minimum wind speed over all 12 locations on every day,
# so we get 6574 min values

print('4. Statistics over all locations for each day')
print('  min:', data.min(axis=1))
print('  max:', data.max(axis=1))
print('  mean:', data.mean(axis=1))
print('  standard deviation:', data.std(axis=1))
print()

# Check the dimension of min array
data.min(axis=1).shape

4. Statistics over all locations for each day
  min: [ 9.29  6.5   6.17 ...,  8.71  9.13  9.59]
  max: [ 18.5   17.54  18.5  ...,  29.58  28.79  27.29]
  mean: [ 13.09666667  11.79833333  11.34166667 ...,  14.89        15.3675      15.4025    ]
  standard deviation: [ 2.5773188   3.28972854  3.50543348 ...,  5.51175108  5.30456427
  5.45971172]



(6574,)

#### 5. Find the location which has the greatest windspeed on each day (an integer column number for each day).

In [8]:
print('5. Location of daily maximum')
print('  daily max location:', data.argmax(axis=1))  #Return index of column with max wind
print()

5. Location of daily maximum
  daily max location: [10 10  0 ..., 11 11  2]



In [9]:
# Print all the locations with max wind speed for each day

#for loc in data.argmax(axis=1):
#    print(loc)

#### 6. Find the year, month and day on which the greatest windspeed was recorded.

In [10]:
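# Search the wind-only array `data` for the overall maximum. `data` and
# `wind_data` share row numbers, so the row found here also indexes the
# date columns of `wind_data`.

max_row, max_col = where(data == data.max())

# max_row --> row number(s) where the maximum wind speed is present
# max_col --> column number(s) where the maximum wind speed is present
# Both are arrays, because `where` returns every matching position.

print(max_row, max_col)

print()
print('6. Day of maximum reading')
# Take the first match explicitly; calling int() on a one-element array
# is deprecated in recent NumPy versions.
print('  Year:', int(wind_data[max_row[0], 0])) # col0 - year
print('  Month:', int(wind_data[max_row[0], 1])) # col1 - month
print('  Day:', int(wind_data[max_row[0], 2]))  # col2 - day
print()

# Select the January rows (month column == 1); used in step 7 below
january_indices = wind_data[:, 1] == 1
january_data = data[january_indices]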



[2161] [11]

6. Day of maximum reading
  Year: 2013
  Month: 12
  Day: 2
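
An alternative sketch for this step uses `unravel_index`, which the import cell already brings in. The array `toy_wind` below is a hypothetical stand-in for `wind_data` with only three locations. If the maximum occurs more than once, `argmax` reports only the first occurrence.

```python
import numpy as np

# Hypothetical stand-in for wind_data: year, month, day, then 3 locations.
toy_wind = np.array([[2017, 1, 1, 10.0, 12.5,  9.0],
                     [2017, 1, 2, 11.0, 30.2,  8.5],
                     [2017, 1, 3, 14.0, 13.0, 15.5]])
toy_speeds = toy_wind[:, 3:]

# argmax() returns a position in the flattened array;
# unravel_index converts it back into (row, column) coordinates.
row, col = np.unravel_index(toy_speeds.argmax(), toy_speeds.shape)

# The same row number selects the date columns of the full array.
year, month, day = toy_wind[row, :3].astype(int)

print(row, col)
print(year, month, day)
```

This returns plain scalar indices rather than the one-element arrays produced by `where`.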



#### 7. Find the average windspeed in January for each location.

In [11]:
print('  mean:', january_data.mean(axis=0))

  mean: [ 14.86955197  12.92166667  13.29962366   7.19949821  11.67571685
   8.05483871  11.81935484   9.5094086    9.54320789  10.05356631
  14.55051971  18.02876344]


### Exercise 1
#### 8. Calculate the mean windspeed for each month in the dataset. Treat January 2017 and January 2016 as different months.

In [12]:
# compute the month number for each day in the dataset, counting from
# the earliest year present so that the same calendar month in different
# years gets a different number
months = (wind_data[:, 0] - wind_data[:, 0].min()) * 12 + wind_data[:, 1] - 1


In [13]:
# convert the month numbers to integers so they compare exactly
months = months.astype(int)

# get the unique months in ascending order
month_values = sorted(set(months))

# initialize an array to hold the result
monthly_means = zeros(len(month_values))

for i, month in enumerate(month_values):
    # find the rows that correspond to the current month
    day_indices = where(months == month)[0]

    # extract the data for the current month using fancy indexing
    month_data = data[day_indices]

    # find the mean
    monthly_means[i] = month_data.mean()

# In fact the whole for loop could reduce to the following one-liner
# (it needs `array` imported from numpy as well)
# monthly_means = array([data[months==month].mean() for month in month_values])

print(" mean:", monthly_means)
print()

#### 9.  Calculate the min, max and mean windspeeds and standard deviations of the windspeeds across all locations for each week.

In [45]:
# Extract the data for the first 52 weeks. Then reshape the array to put
# on the same line 7 days worth of data for all locations. Let Numpy
# figure out the number of lines needed to do so
weekly_data = data[:52 * 7].reshape(-1, 7 * 12)

print('  Weekly statistics over all locations')
print('  min:', weekly_data.min(axis=1))
print('  max:', weekly_data.max(axis=1))
print('  mean:', weekly_data.mean(axis=1))
print('  standard deviation:', weekly_data.std(axis=1))
print()

  Weekly statistics over all locations
  min: [ 1.79  0.5   1.04  2.17  3.63  8.08  3.42  2.21  5.66  1.71  2.75  2.58
  1.46  3.21  1.54  0.83  1.38  3.83  1.04  3.33  1.63  3.29  3.21  1.58
  2.88  4.42  3.54  2.67  1.46  2.17  2.25  2.5   6.83  3.96  1.13  1.25
  4.17  1.46  3.21  1.04  2.96  3.75  2.21  1.71  1.33  0.63  2.88  1.92
  3.13  5.46  0.58  0.42]
  max: [ 18.5   20.71  20.79  27.63  27.71  26.38  28.62  29.63  25.8   22.71
  22.95  21.54  22.5   18.29  16.17  21.09  17.5   28.08  26.63  15.96
  20.96  17.96  19.83  25.25  24.71  21.87  21.29  22.5   21.42  25.37
  20.25  14.58  24.3   22.29  24.71  20.25  33.09  20.96  23.21  19.62
  21.04  33.45  30.88  23.58  20.41  32.71  22.58  23.75  29.33  25.62
  24.41  29.33]
  mean: [ 10.30154762   8.895        9.29952381  14.92047619  12.7902381
  16.03654762  13.69488095  11.7597619   13.05642857  10.07535714
  12.7502381    9.80142857  11.27690476   8.75619048   7.65988095
   9.45642857   7.72511905  11.66607143   9.49797619 
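
The cell above covers only the first 52 weeks. As a hedged sketch of how the same reshape could cover every complete week, the example below uses a made-up array (`toy`, 16 days by 2 locations) and treats a "week" as any consecutive block of 7 rows starting at the first row, not a calendar week. Leftover days at the end are dropped.

```python
import numpy as np

# Hypothetical data: 16 days x 2 locations, values 0..31.
toy = np.arange(32, dtype=float).reshape(16, 2)
n_locations = toy.shape[1]

# Number of complete 7-day blocks; the 2 leftover days are ignored.
n_weeks = toy.shape[0] // 7

# One row per week, holding 7 days x n_locations readings.
weekly = toy[:n_weeks * 7].reshape(n_weeks, 7 * n_locations)

weekly_min = weekly.min(axis=1)
weekly_max = weekly.max(axis=1)
weekly_mean = weekly.mean(axis=1)
weekly_std = weekly.std(axis=1)

print(weekly.shape)
print(weekly_min, weekly_max, weekly_mean)
```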

### Exercise 2
#### 10. Calculate the mean windspeed for each month without using a for loop. (Hint: look at searchsorted and add.reduceat.)
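
One possible approach following the hint, shown on a small made-up array rather than the real file. It assumes the rows are in chronological order, so the month numbers are already sorted. That assumption is not verified against `wind.data` here. The names `toy_wind`, `speeds`, `starts` and `days_per_month` are illustrative.

```python
import numpy as np

# Hypothetical stand-in: 5 days spanning 3 months, 2 locations.
toy_wind = np.array([[2016, 12, 30, 1.0,  3.0],
                     [2016, 12, 31, 2.0,  4.0],
                     [2017,  1,  1, 5.0,  7.0],
                     [2017,  1,  2, 6.0,  8.0],
                     [2017,  2,  1, 9.0, 11.0]])
speeds = toy_wind[:, 3:]

# Month number counted from the first year, so the same calendar month
# in different years gets a different number.
months = ((toy_wind[:, 0] - toy_wind[0, 0]) * 12 + toy_wind[:, 1] - 1).astype(int)

# Because `months` is sorted, searchsorted finds the first row of each month.
month_values = np.unique(months)
starts = np.searchsorted(months, month_values)

# Sum the readings of each day, then sum the days within each month.
daily_sums = speeds.sum(axis=1)
monthly_sums = np.add.reduceat(daily_sums, starts)

# Divide by the number of readings in each month (days x locations).
days_per_month = np.diff(np.append(starts, len(months)))
monthly_means = monthly_sums / (days_per_month * speeds.shape[1])

print(starts)
print(monthly_means)
```

`add.reduceat` sums each slice between consecutive start indices, which replaces the explicit loop over months.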