# Where is the limit?
In this notebook we evaluate the model that classifies images as TC or x-TC. The goal is to find where, according to the model, the boundary between the two phenomena lies. We hope this sheds some light on how these events are told apart, a decision that is usually made subjectively by a human.

In [9]:
import sys
sys.path.insert(0, '../..')
from os.path import join
import json

from pyphoon.db.pd_manager import PDManager
from pyphoon.db.data_extractor import DataExtractor
from pyphoon.app.preprocess import DefaultImagePreprocessor

import numpy as np

## Which sequences?
Not every sequence is suitable for evaluating how well our model performs. We make sure that any sequence used here belongs to either the training set or the validation set, so that test data is never used as feedback to improve the model.

In [2]:
with open('../../tasks/tcxtc/traintest_split_tcxtc.json') as f:
    info = json.load(f)
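
The rule stated above can be turned into a small guard. This is a minimal sketch and not part of `pyphoon`: the helper `usable_for_feedback` and the toy split are hypothetical, and only the `'valid'` key is confirmed by this notebook (the `'train'` and `'test'` keys are assumed).

```python
def usable_for_feedback(seq_no, split):
    """Return True if seq_no may be used to tune the model.

    Assumes `split` maps 'train', 'valid' and 'test' to lists of
    sequence numbers (only 'valid' is used elsewhere in this notebook).
    """
    allowed = set(split.get('train', [])) | set(split.get('valid', []))
    held_out = set(split.get('test', []))
    return seq_no in allowed and seq_no not in held_out


# Hypothetical split, for illustration only
toy_split = {
    'train': ['199906'],
    'valid': ['201711', '200306'],
    'test': ['000001'],
}

print(usable_for_feedback('201711', toy_split))
print(usable_for_feedback('000001', toy_split))
```

With the real file, the same check could be run as `usable_for_feedback(seq_no, info)`, provided the key names match.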

Let us now select a typhoon sequence from the validation set.

**Note**: Interesting sequences are 199906, 201711 and 200306, which contain both TC and x-TC images.

In [3]:
# Pick sequence
import random
seq_no = random.sample(info['valid'], 1)[0]
print("Typhoon selected:", seq_no)

Typhoon selected: 201524
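
A purely random draw may return a sequence without any x-TC images. As a sketch of an alternative, the selection can prefer the sequences listed in the note above and fall back to the whole validation set otherwise. The helper `pick_sequence`, its `seed` argument and the toy validation list are hypothetical, and the sequence numbers are assumed to be stored as strings.

```python
import random

# Sequences mentioned in the note above
INTERESTING = ['199906', '201711', '200306']


def pick_sequence(valid_ids, preferred=INTERESTING, seed=None):
    """Pick one validation sequence, favouring the preferred ones."""
    rng = random.Random(seed)  # local generator, so the draw is reproducible
    preferred_set = set(preferred)
    candidates = [s for s in valid_ids if s in preferred_set]
    pool = candidates if candidates else list(valid_ids)
    return rng.choice(pool)


# Hypothetical validation list, for illustration only
toy_valid = ['201524', '201711']
print("Typhoon selected:", pick_sequence(toy_valid, seed=0))
```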


## Load Sequence
Let us load a sequence from the dataset (more precisely, from its corrected version). To this end, we first create a `PDManager` object, which acts as the bridge to the dataset. With it, we define a `DataExtractor`, which lets us extract, for instance, a specific typhoon sequence given its sequence number. We also need a preprocessor so that the loaded data is in the form our model expects.

In [4]:
# Paths to source data
orig_images_dir = '/root/fs9/datasets/typhoon/wnp/image/'
besttrack_dir = '/root/fs9/datasets/typhoon/wnp/jma'

# Path where corrected images are to be stored
corrected_dir = '/root/fs9/grishin/database/corrected'

# Path to new database files
db_dir = '/root/fs9/grishin/database'
# Pickle files (used to store dataframes)
images_pkl_path = join(db_dir, 'images.pkl')
corrected_pkl_path = join(db_dir, 'corrected.pkl')
besttrack_pkl_path = join(db_dir, 'besttrack.pkl')
missing_pkl_path = join(db_dir, 'missing.pkl')

In [5]:
# Create pd_man
man = PDManager()
man.load_original_images(images_pkl_path)
man.load_besttrack(besttrack_pkl_path)
man.load_corrected_images(corrected_pkl_path)

# Preprocess algorithm
preprocessor = DefaultImagePreprocessor(mean=269.15, std=24.14, resize_factor=2, reshape_mode='keras')

# Define data extractor
de = DataExtractor(original_images_dir=orig_images_dir, corrected_images_dir=corrected_dir, pd_manager=man)
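
To give an idea of what a preprocessing step with these parameters might do, here is a rough NumPy sketch. It is an assumption, not the actual `DefaultImagePreprocessor` implementation: we guess that the image is subsampled by `resize_factor`, standardized with `mean` and `std`, and given a trailing channel axis for Keras. The function name and the 512x512 input size are hypothetical.

```python
import numpy as np


def preprocess_sketch(image, mean=269.15, std=24.14, resize_factor=2):
    """Illustrative stand-in for an image preprocessor (assumed behaviour)."""
    image = np.asarray(image, dtype=np.float32)
    # Naive subsampling: keep every `resize_factor`-th pixel on both axes
    small = image[::resize_factor, ::resize_factor]
    # Standardize with the dataset-level statistics
    standardized = (small - mean) / std
    # Append a channel axis (channels-last layout)
    return standardized[..., np.newaxis]


# Hypothetical frame whose pixels all equal the mean value
frame = np.full((512, 512), 269.15)
out = preprocess_sketch(frame)
print(out.shape)
```

The real preprocessor may resize by interpolation rather than by subsampling, so this should only be read as an outline of the idea.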

Load the sequence.

In [17]:
images, images_ids, features = de.read_seq(seq_no, preprocessor.apply, ['class'])
X = np.array(images)
# Binary ground truth: class 6 maps to x-TC (1), any other class to TC (0)
Y = ground_truth = [1 if label == 6 else 0 for label in features['class']]

In [18]:
# Class distribution of the selected sequence
print("Number of samples:")
print(" * TC:", Y.count(0))
print(" * x-TC:", Y.count(1))

Number of samples:
 * TC: 223
 * x-TC: 0
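
In the run shown above, the selected sequence contains no x-TC samples, so it cannot show a TC to x-TC transition in the ground truth. The sketch below repeats the label mapping and adds a check for that situation. Only the rule "class 6 is x-TC" comes from this notebook; the helper names and the toy labels are hypothetical.

```python
from collections import Counter

X_TC_CLASS = 6  # taken from the label mapping used in this notebook


def binarize(labels):
    """Map raw class labels to 0 (TC) or 1 (x-TC)."""
    return [1 if label == X_TC_CLASS else 0 for label in labels]


def class_counts(binary_labels):
    """Count TC and x-TC samples; missing classes are reported as 0."""
    counts = Counter(binary_labels)
    return {'TC': counts[0], 'x-TC': counts[1]}


# Hypothetical raw labels, for illustration only
toy_labels = [2, 3, 4, 5, 6, 6]
toy_counts = class_counts(binarize(toy_labels))
print(toy_counts)

# A sequence is only useful for studying the boundary if both classes occur
has_both = all(toy_counts.values())
print("Contains both classes:", has_both)
```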


## Load model

In [13]:
from keras.models import load_model

Using TensorFlow backend.


In [14]:
# Load the model
model = load_model('../../tasks/tcxtc/model_tcxtc_1.h5')

Once the model has been loaded, we obtain its predictions for the loaded sequence.

In [20]:
predictions = model.predict(X)[:,0]
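
One possible way to read a boundary off such scores is to look for the first frame at which the score stays above a threshold for several consecutive frames. This is only a sketch under assumptions: the scores are interpreted as the probability of x-TC, the frames are in chronological order, and the threshold of 0.5 and the run length of 3 are arbitrary choices. The function name and the toy scores are hypothetical.

```python
def first_sustained_crossing(scores, threshold=0.5, min_run=3):
    """Index of the first frame starting `min_run` consecutive scores
    at or above `threshold`, or None if there is no such run."""
    run = 0
    for i, score in enumerate(scores):
        run = run + 1 if score >= threshold else 0
        if run == min_run:
            return i - min_run + 1
    return None


# Hypothetical scores with one isolated spike before the real transition
toy_scores = [0.1, 0.2, 0.7, 0.3, 0.6, 0.8, 0.9, 0.95]
print(first_sustained_crossing(toy_scores))
```

Requiring a run of frames rather than a single crossing keeps isolated spikes, such as the third toy score, from being reported as the boundary.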

In [19]:
# Get complete report
from sklearn.metrics import classification_report
y_pred = model.predict_classes(X)
print(classification_report(Y, y_pred))

             precision    recall  f1-score   support

          0       1.00      0.90      0.95       223
          1       0.00      0.00      0.00         0

avg / total       1.00      0.90      0.95       223
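
The report above has zero support for class 1, so recall for that class is undefined, and the 0.00 shown for it should not be read as a model failure. The sketch below computes per-class recall by hand and makes the undefined case explicit by returning `None`. The helper and the toy labels are hypothetical and independent of scikit-learn.

```python
def per_class_recall(y_true, y_pred, labels=(0, 1)):
    """Recall per class; None when a class has no true samples."""
    recalls = {}
    for label in labels:
        support = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred)
                   if t == label and p == label)
        recalls[label] = hits / support if support else None
    return recalls


# Hypothetical example: ten TC frames, one of them predicted as x-TC
toy_true = [0] * 10
toy_pred = [0] * 9 + [1]
print(per_class_recall(toy_true, toy_pred))
```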



