**Note: The code in this notebook is meant to be run locally, NOT in Kaggle's cloud!**

My approach was to take the Inception v3 model, which is pre-trained on a very broad range of images, and re-train only its final layer to classify images as invasive or non-invasive.

There is a really good [Codelab from Google][1] that goes over the whole procedure of retraining Inception v3.

If you follow the Codelab, you will eventually get to the optional step "Training on your own categories". That step prompted me to look for a dataset, and I found this one, which even has a real-life purpose. Cool!

In this notebook I show the few things I had to implement to adapt this dataset to the retrain workflow covered in the Codelab:

  [1]: https://codelabs.developers.google.com/codelabs/tensorflow-for-poets/#0

In [None]:
# First I needed to sort the photos into folders. 
# I named them "invasive" and "noninvasive". 
# I used the train_labels.csv file to do that:

import os
import pandas

labels = pandas.read_csv('train_labels.csv')

# os.rename fails if the target folder is missing, so create both first
for folder in ('invasive', 'noninvasive'):
    os.makedirs(folder, exist_ok=True)

for index, row in labels.iterrows():
    name = str(row['name'])
    # pick the subfolder from the label: 1 means invasive
    folder = 'invasive' if row['invasive'] == 1 else 'noninvasive'
    # move the file with that name into the chosen subfolder
    os.rename(name + '.jpg', os.path.join(folder, name + '.jpg'))
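
After the move, a quick count per folder confirms that every labelled image landed somewhere. The following is a minimal sketch using only the standard library; the helper name `count_images_per_class` is hypothetical, and it assumes the class folders sit directly under the directory you pass in.

```python
import os
import tempfile


def count_images_per_class(image_dir):
    """Return {subfolder name: number of .jpg files} for each class folder."""
    counts = {}
    for entry in sorted(os.listdir(image_dir)):
        class_dir = os.path.join(image_dir, entry)
        if not os.path.isdir(class_dir):
            continue  # ignore loose files such as train_labels.csv
        counts[entry] = sum(
            1 for f in os.listdir(class_dir) if f.lower().endswith('.jpg')
        )
    return counts


# Demo on a throwaway directory so the sketch runs without the real dataset.
demo_dir = tempfile.mkdtemp()
for folder, names in (('invasive', ['1', '2']), ('noninvasive', ['3'])):
    os.makedirs(os.path.join(demo_dir, folder))
    for name in names:
        open(os.path.join(demo_dir, folder, name + '.jpg'), 'w').close()

demo_counts = count_images_per_class(demo_dir)
print(demo_counts)
```

On the real data, the two counts should add up to the number of rows in train_labels.csv.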

The next step is simply to call retrain.py!
I used this command, but you might need to change the folder names accordingly (--image_dir must point at the folder that contains the "invasive" and "noninvasive" subfolders):

    python retrain.py \
      --bottleneck_dir=bottlenecks \
      --model_dir=inception \
      --summaries_dir=training_summaries/invasion \
      --output_graph=retrained_graph.pb \
      --output_labels=retrained_labels.txt \
      --image_dir=train_photos

BTW: I highly recommend looking at the retrain.py source, it is very interesting. At the bottom of the file you will also see the flags you can set for hyper-parameter tuning (e.g. learning_rate, testing_percentage, random_brightness etc.)
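
If you want to experiment with those flags, one option is to build the command in Python so that the settings live in one place. This is a sketch, not part of the Codelab: `build_retrain_command` is a hypothetical helper, the flag names are the ones mentioned above plus those from my command, and the values are purely illustrative.

```python
def build_retrain_command(image_dir, **hyper_params):
    """Build the argument list for retrain.py from keyword arguments."""
    command = [
        'python', 'retrain.py',
        '--bottleneck_dir=bottlenecks',
        '--model_dir=inception',
        '--summaries_dir=training_summaries/invasion',
        '--output_graph=retrained_graph.pb',
        '--output_labels=retrained_labels.txt',
        '--image_dir=' + image_dir,
    ]
    # Each keyword becomes one --flag=value pair, sorted for a stable order.
    for flag, value in sorted(hyper_params.items()):
        command.append('--%s=%s' % (flag, value))
    return command


# Illustrative values only; they are not tuned recommendations.
command = build_retrain_command(
    'train_photos',
    learning_rate=0.005,
    testing_percentage=15,
    random_brightness=10,
)
print(' '.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` from the folder that contains retrain.py.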

As in the Codelab, this will first create the bottleneck files. Afterwards it trains only the last layer of the Inception model.
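
The reason for the bottleneck step is that the frozen part of the network produces the same output for an image every time, so it only needs to run once per image. A minimal sketch of that caching idea follows; `fake_frozen_network` is a hypothetical stand-in for the pre-trained layers, and the comma-separated text file is an assumption about the cache format, not a copy of what retrain.py does.

```python
import os
import tempfile


def get_or_create_bottleneck(image_name, cache_dir, compute_fn):
    """Return the cached feature vector, computing it only on a cache miss."""
    cache_path = os.path.join(cache_dir, image_name + '.txt')
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return [float(x) for x in f.read().split(',')]
    values = compute_fn(image_name)
    with open(cache_path, 'w') as f:
        f.write(','.join(str(x) for x in values))
    return values


calls = []  # records how often the expensive function actually runs


def fake_frozen_network(image_name):
    """Hypothetical stand-in for the frozen Inception layers."""
    calls.append(image_name)
    return [0.25, 1.5, 3.0]


cache_dir = tempfile.mkdtemp()
first = get_or_create_bottleneck('1.jpg', cache_dir, fake_frozen_network)
second = get_or_create_bottleneck('1.jpg', cache_dir, fake_frozen_network)
print(first, second, len(calls))
```

The second lookup is served from disk, which is why re-running the training with different hyper-parameters is much faster than the first run.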

You can start tensorboard to monitor the progress:

    tensorboard --logdir training_summaries

Next up after training the model: running the predictions on the test images. The following code does just that and writes submission_invasive.csv at the end.
I saved it as 'create_invasive_submission.py' and ran it with this command:

     python create_invasive_submission.py test_photos/

(It's again shamelessly copied and modified from the Codelab I mentioned in the intro.)

In [None]:
# To actually create the submission file, 
# we need to classify the test images 
# and write the predictions into a .csv 
# using the format seen in sample_submission.csv.

import os, sys
import pandas

# set before importing tensorflow so the log level reliably takes effect
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import tensorflow as tf

# folder with the test images, passed on the command line
folder_path = sys.argv[1]

# rows of [name, invasive score]
predictions_list = []

# get a list of all files in the folder
images_list = os.listdir(folder_path)

# Unpersists graph from file
with tf.gfile.FastGFile("retrained_graph.pb", 'rb') as f:
    graph_def = tf.GraphDef()
    graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name='')

with tf.Session() as sess:
    # output of the retrained final layer
    softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')

    for image_file in images_list:
        # file name without the extension, as expected in the 'name' column
        name = os.path.splitext(image_file)[0]
        try:
            # Read in the image_data (os.path.join works with or without a trailing slash)
            with tf.gfile.FastGFile(os.path.join(folder_path, image_file), 'rb') as image:
                image_data = image.read()

            # Feed the image_data as input to the graph and get the prediction
            predictions = sess.run(softmax_tensor,
                                   {'DecodeJpeg/contents:0': image_data})

            # invasive is the first value (check the order in retrained_labels.txt),
            # so we access index 0 of the node
            score = predictions[0][0]
            predictions_list.append([name, score])

            print('%s (score = %.5f)' % (name, score))

        except Exception as error:
            # skip files that cannot be read or decoded, e.g. non-jpg files
            print('%s skipped: %s' % (image_file, error))

predictions_table = pandas.DataFrame(predictions_list, columns=['name', 'invasive'])

predictions_table.to_csv('submission_invasive.csv', index=False)
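
The hard-coded index 0 in the script depends on the order of the labels. A safer variant looks the index up in the labels file. This sketch assumes that retrained_labels.txt holds one label per line in the same order as the output vector, which is my understanding of how the Codelab's labelling script reads it; `find_label_index` is a hypothetical helper name.

```python
import os
import tempfile


def find_label_index(labels_path, label):
    """Return the position of `label` in a one-label-per-line file."""
    with open(labels_path) as f:
        labels = [line.strip() for line in f if line.strip()]
    return labels.index(label)


# Demo with a throwaway labels file instead of the real retrained_labels.txt.
demo_path = os.path.join(tempfile.mkdtemp(), 'retrained_labels.txt')
with open(demo_path, 'w') as f:
    f.write('invasive\nnoninvasive\n')

invasive_index = find_label_index(demo_path, 'invasive')
print(invasive_index)
```

In the prediction loop, `predictions[0][invasive_index]` would then take the place of `predictions[0][0]`.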


Then you've got the submission file (submission_invasive.csv) ready to be entered! With exactly this approach and no hyper-parameter tuning I got a public score of 0.97578.
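
Before uploading, it can be worth a quick check that the file has the shape the competition expects. The sketch below assumes the two-column `name,invasive` layout that the script writes and that the second column holds probabilities between 0 and 1; `check_submission` is a hypothetical helper.

```python
import csv
import os
import tempfile


def check_submission(path):
    """Validate header and probability range; return the number of data rows."""
    with open(path, newline='') as f:
        reader = csv.reader(f)
        header = next(reader)
        if header != ['name', 'invasive']:
            raise ValueError('unexpected header: %r' % header)
        row_count = 0
        for name, probability in reader:
            if not 0.0 <= float(probability) <= 1.0:
                raise ValueError('probability out of range for %s' % name)
            row_count += 1
    return row_count


# Demo on a tiny hand-written file; on real data pass 'submission_invasive.csv'.
demo_path = os.path.join(tempfile.mkdtemp(), 'submission_invasive.csv')
with open(demo_path, 'w', newline='') as f:
    f.write('name,invasive\n1,0.98\n2,0.03\n')

row_count = check_submission(demo_path)
print(row_count)
```

The returned row count should match the number of images in the test folder.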

This is it! As I'm still pretty new to TensorFlow, I'd love any feedback. Thanks for reading!