# Wind-dependent Variables: Predict Wind Speeds of Tropical Storms

By Gilad Shtern 15-Jan-2020

## Introduction

In this challenge, you will estimate the wind_speed of a storm in knots at a given point in time using satellite imagery. The training data consist of single-band satellite images from 494 different storms in the Atlantic and East Pacific Oceans, along with their corresponding wind speeds. These images are captured at various times throughout the life cycle of each storm. Your goal is to build a model that outputs the wind speed associated with each image in the test set.

For each storm in the training and test sets, you are given a time-series of images with their associated relative time since the beginning of the storm. Your models may take advantage of the temporal data provided for each storm up to the point of prediction. Keep in mind that the goal of this competition is to produce an operational model that uses recent images to estimate future wind speeds.
Link: https://www.drivendata.org/competitions/72/predict-wind-speeds/page/275/

Links:
- https://www.nhc.noaa.gov/aboutsshws.php
- https://www.unidata.ucar.edu/data/NGCS/lobjects/chp/structure/
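
Because only images up to the point of prediction may be used, any temporal feature has to look backwards within a storm. A minimal sketch of that idea follows, on a tiny made-up table: the `storm_id` column, the new column names, and all values are assumptions for illustration, while `image_id` and `relative_time` are column names that appear in the test data later in this notebook.

```python
import pandas as pd

# Hypothetical toy data: two storms with a few images each
demo = pd.DataFrame({
    'image_id': ['abc_000', 'abc_001', 'abc_002', 'xyz_000', 'xyz_001'],
    'storm_id': ['abc', 'abc', 'abc', 'xyz', 'xyz'],
    'relative_time': [0, 1800, 5400, 0, 3600],
})

# Order images chronologically within each storm
demo = demo.sort_values(['storm_id', 'relative_time']).reset_index(drop=True)

# Look backwards only: the previous image of the same storm and the time gap to it
demo['prev_image_id'] = demo.groupby('storm_id')['image_id'].shift(1)
demo['time_since_prev'] = demo.groupby('storm_id')['relative_time'].diff()

print(demo)
```

The first image of each storm has no predecessor, so both new columns are missing there and would need a default before being fed to a model.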

In [None]:
#Step1 - Load train dataset
import os
import shutil
import cv2
import numpy as np
import json
import pandas as pd

# Project root; every other path is built from it
dirPath = 'G:/DataScienceProject/Drivendata-Predict-Wind-Speeds-of-Tropical-Storms'

trainDF = pd.read_csv(dirPath + '/training_set_features.csv')
trainLabelDF = pd.read_csv(dirPath + '/training_set_labels.csv')
# Attach the labels by row index (assumes both files list the images in the same order)
trainDF['wind_speed'] = trainLabelDF['wind_speed']

testDF = pd.read_csv(dirPath + '/test_set_features.csv')

trainDF.head()

In [None]:
#Step2 - Check column types, NA counts, and unique values
df = pd.DataFrame(columns = ['Col', 'Type', 'NA', '%NA', 'UniqLen'])
colList = list(trainDF)

for i, value in enumerate(colList):
    naCount = trainDF[value].isna().sum()
    # '%NA' holds the fraction of missing rows (0 to 1); NaN counts as a unique value
    df.loc[i] = [value, trainDF[value].dtype, naCount, naCount/len(trainDF), trainDF[value].nunique(dropna=False)]

df

In [None]:
#Step3 - Resize all images to 224x224 (overwrites the original files in place)
#Create img list
trainImgList = os.listdir(dirPath + '/train/')
testImgList = os.listdir(dirPath + '/test/')

# dsize
dsize = (224, 224)

for folder, imgList in [('/train/', trainImgList), ('/test/', testImgList)]:
    for value in imgList:
        origImg = dirPath + folder + value
        img = cv2.imread(origImg)
        if img is None:
            # cv2.imread returns None for files it cannot decode; skip them
            continue

        output = cv2.resize(img, dsize)
        cv2.imwrite(origImg, output)

In [None]:
print("Min wind speed: ", trainDF['wind_speed'].min())
print("Median wind speed: ", trainDF['wind_speed'].median())
print("Max wind speed: ", trainDF['wind_speed'].max())

In this section, I use pre-trained models built as follows. Wind speed is first converted into category labels in two ways:
- Model 1 uses a relative scale: (image wind speed - lowest wind speed) / wind speed range, multiplied by 10.
- Model 2 uses an absolute scale: (wind speed - 3) / 7, where the offset of 3 is arbitrary.

Each model was then trained with Fastai. Model 1 reached 96.8% accuracy and model 2 reached 93%.
My current goal is to improve accuracy by combining the two models.
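
The two label schemes can be sketched as below. This is a minimal sketch, not the code used for training: the constants 15 (lowest wind speed) and 170 (wind speed range) are inferred from the `17 * (WindCat1 - 1) + 15` formula in Step 7, and the floor rounding and the 1-based start of the relative scheme are assumptions. The function names are hypothetical.

```python
def relative_category(wind_speed, lowest=15, speed_range=170):
    """Model-1 style label: position inside the overall range, split into 10 bins."""
    # (wind_speed - lowest) / speed_range * 10, floored, then shifted so the first bin is 1
    return (wind_speed - lowest) * 10 // speed_range + 1


def absolute_category(wind_speed, offset=3, width=7):
    """Model-2 style label: fixed-width bins of 7 knots after subtracting 3."""
    return (wind_speed - offset) // width


for ws in (20, 35, 60):
    print(ws, relative_category(ws), absolute_category(ws))
```

Under these assumptions a 35-knot image gets label 2 from the relative scheme and label 4 from the absolute one, which Step 7 maps back to the intervals 32 to 49 and 31 to 38 knots.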


In [1]:
#Step4 - Fastai Prediction
import fastai
from fastai.metrics import error_rate
from fastai.vision import *
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import os
import ctypes

In [2]:
#Step5 - Predict by Model-1
ctypes.cdll.LoadLibrary('caffe2_nvrtc.dll')
learn = load_learner('G:/DataScienceProject/Drivendata-Predict-Wind-Speeds-of-Tropical-Storms/', 'fastai.pkl')
testDF = pd.read_csv('G:/DataScienceProject/Drivendata-Predict-Wind-Speeds-of-Tropical-Storms/test_clean.csv')
path = 'G:/DataScienceProject/Drivendata-Predict-Wind-Speeds-of-Tropical-Storms/test/'
windCat1 = []
for i in range(0, len(testDF)):
    file = path + testDF['image_id'].iloc[i] + '.jpg'
    img = open_image(file)
    pred_class, pred_idx, output = learn.predict(img)
    windCat1.append(str(pred_class))

# Assign the whole column at once; chained testDF[col].iloc[i] = ... may write to a copy
testDF['WindCat1'] = windCat1


In [3]:
#Step6 - Predict by Model-2
ctypes.cdll.LoadLibrary('caffe2_nvrtc.dll')
learn = load_learner('G:/DataScienceProject/Drivendata-Predict-Wind-Speeds-of-Tropical-Storms/', '2.pkl')
path = 'G:/DataScienceProject/Drivendata-Predict-Wind-Speeds-of-Tropical-Storms/test/'
windCat2 = []
for i in range(0, len(testDF)):
    file = path + testDF['image_id'].iloc[i] + '.jpg'
    img = open_image(file)
    pred_class, pred_idx, output = learn.predict(img)
    windCat2.append(str(pred_class))

# Assign the whole column at once; chained testDF[col].iloc[i] = ... may write to a copy
testDF['WindCat2'] = windCat2

testDF.to_csv('G:/DataScienceProject/Drivendata-Predict-Wind-Speeds-of-Tropical-Storms/test_pred_combined_model.csv', index=False)

In [2]:
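#Step7 - Prepare Submission
import pandas as pd
testDF = pd.read_csv('G:/DataScienceProject/Drivendata-Predict-Wind-Speeds-of-Tropical-Storms/test_pred_combined_model.csv')
# Convert each predicted category back to a wind speed interval in knots
# Model-1 bins are 17 knots wide (category 1 starts at 15); model-2 bins are 7 knots wide (category 0 starts at 3)
testDF['Min1'] = 17 * (testDF['WindCat1'] - 1) + 15
testDF['Min2'] = 7 * testDF['WindCat2'] + 3
testDF['Max1'] = 17 * testDF['WindCat1'] + 15
testDF['Max2'] = testDF['Min2'] + 7
testDF['wind_speed'] = 0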

In [3]:
# Combine the two intervals row by row: take the larger of the two lower bounds
# and the larger of the two upper bounds, then predict the midpoint
Lower = testDF[['Min1', 'Min2']].max(axis=1)
Upper = testDF[['Max1', 'Max2']].max(axis=1)
testDF['wind_speed'] = (Lower + Upper)/2

testDF.head()


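| Unnamed: 0 | image_id | relative_time | ocean | WindCat | WindCat1 | WindCat2 | Min1 | Min2 | Max1 | Max2 | wind_speed |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | acd_123 | 304198 | 1 | | 2 | 4 | 32 | 31 | 49 | 38 | 40.5 |
| 1 | acd_124 | 305998 | 1 | | 2 | 4 | 32 | 31 | 49 | 38 | 40.5 |
| 2 | acd_125 | 307798 | 1 | | 1 | 3 | 15 | 24 | 32 | 31 | 28.0 |
| 3 | acd_126 | 309598 | 1 | | 1 | 4 | 15 | 31 | 32 | 38 | 34.5 |
| 4 | acd_127 | 313198 | 1 | | 1 | 3 | 15 | 24 | 32 | 31 | 28.0 |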
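
The arithmetic in Step 7 can be checked one row at a time. The sketch below repeats the same formulas in a standalone function (`decode_wind_speed` is a hypothetical name, not part of the notebook) and reproduces the `wind_speed` values in the output above.

```python
def decode_wind_speed(wind_cat1, wind_cat2):
    """Map the two predicted categories to one wind speed in knots, as in Step 7."""
    # Interval implied by model 1: 17-knot bins, category 1 starts at 15
    min1 = 17 * (wind_cat1 - 1) + 15
    max1 = 17 * wind_cat1 + 15
    # Interval implied by model 2: 7-knot bins, category 0 starts at 3
    min2 = 7 * wind_cat2 + 3
    max2 = min2 + 7
    # Same combination rule as the notebook: larger lower bound, larger upper bound
    lower = max(min1, min2)
    upper = max(max1, max2)
    return (lower + upper) / 2


print(decode_wind_speed(2, 4))  # 40.5
print(decode_wind_speed(1, 3))  # 28.0
print(decode_wind_speed(1, 4))  # 34.5
```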


In [6]:
# astype truncates toward zero, so a midpoint such as 40.5 is submitted as 40
testDF['wind_speed'] = testDF['wind_speed'].astype("int32")
submit = testDF.drop(['relative_time', 'ocean', 'WindCat', 'WindCat1', 'WindCat2', 'Min1', 'Min2', 'Max1', 'Max2'], axis=1)
submit.to_csv('G:/DataScienceProject/Drivendata-Predict-Wind-Speeds-of-Tropical-Storms/submit2.csv', index=False)