<a href="https://colab.research.google.com/github/ritalulu/wids_2019/blob/master/WiDS2019_kaggle_template.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [0]:
# Import libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import tensorflow as tf

# Downloading Data for WiDS Datathon from Kaggle
* Run all the steps below to obtain the data

In [0]:
!pip install kaggle

In [0]:
!mkdir -p ~/.kaggle

* The cell below will prompt you to choose the kaggle.json file downloaded from your Kaggle account to your computer.
* Run the cell, then select the file. The cell finishes running once the upload is complete.

In [0]:
# Run this cell and select the kaggle.json file downloaded
# from the Kaggle account settings page.
from google.colab import files
files.upload()

In [0]:
# Let's make sure the kaggle.json file is present.
!ls -lha kaggle.json

In [0]:
# The Kaggle API client expects this file to be in ~/.kaggle,
# so move it there.
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/

# This permissions change avoids a warning on Kaggle tool startup.
!chmod 600 ~/.kaggle/kaggle.json
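
* Optional: if the download step fails with an authentication error, it can help to confirm that the token file is in place and readable. The sketch below is only an illustration, not part of the Kaggle tooling: `check_kaggle_token` is a hypothetical helper, and it assumes the token is a JSON object with `username` and `key` fields.

```python
import json
from pathlib import Path


def check_kaggle_token(path):
    """Return True if `path` is a JSON file with non-empty 'username' and 'key' fields.

    Hypothetical helper for this notebook; the field names are an assumption
    about the layout of kaggle.json.
    """
    path = Path(path).expanduser()
    if not path.is_file():
        return False
    try:
        token = json.loads(path.read_text())
    except (OSError, ValueError):
        # unreadable file or invalid JSON
        return False
    return isinstance(token, dict) and bool(token.get("username")) and bool(token.get("key"))


# True means the token file looks usable
print(check_kaggle_token("~/.kaggle/kaggle.json"))
```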

In [0]:
# Download the data for the WiDS datathon
!kaggle competitions download -c widsdatathon2019 -p /content

In [0]:
# Unzip the files
!unzip \*.zip

In [0]:
# Look at what files are available
!ls

In [0]:
# Read a file
train_labels = pd.read_csv('traininglabels.csv')
train_labels.head()

# Now we are done with downloading data! 
* Try building a model inside this notebook by creating additional cells below with code to specify and fit the model
* If you are fitting large neural nets, make sure this Google Colab notebook is running on a GPU
* Check Edit --> Notebook settings --> Hardware accelerator: GPU
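
* One way to confirm from code that a GPU is attached is sketched below. `gpu_device_name` is a hypothetical wrapper written for this notebook; it relies on `tf.test.gpu_device_name()`, which returns an empty string when TensorFlow sees no GPU. The import is wrapped in `try`/`except` only so that the cell also runs where TensorFlow is not installed.

```python
def gpu_device_name():
    """Return the name of the GPU TensorFlow sees, or '' if there is none.

    Hypothetical helper; also returns '' when TensorFlow is not installed.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return ""
    return tf.test.gpu_device_name()


device = gpu_device_name()
if device:
    print("GPU found:", device)
else:
    print("No GPU found; check the hardware accelerator setting")
```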

In [0]:
# Start building a model here

# Once you have generated predictions on the holdout and test data, create a submission file
* The submission file must have 6534 rows, the number of samples in the holdout and test sets combined
* The submission file must have the column headings "image_id" and "has_oilpalm"
* The row names (index) should be the image ids
* Below is an example of how to do it


In [0]:
# Let's take a look at the sample submission file provided by Kaggle
sample_submission = pd.read_csv('SampleSubmission.csv', index_col=0)
sample_submission.head()

In [0]:
# helper function for creating submission file
# create a dataframe containing image names in specified folder
from os import listdir
from os.path import isfile, join
def make_dataframe_with_imagefile_names(mypath):
  onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
  df = pd.DataFrame(onlyfiles, columns = ['image_id'])
  return df

In [0]:
# create a dataframe with image ids found in the holdout set
holdout_df = make_dataframe_with_imagefile_names('./leaderboard_holdout_data/')
# create a dataframe with image ids found in the test set
test_df = make_dataframe_with_imagefile_names('./leaderboard_test_data/')

* Here we're making fake predictions on the holdout and test sets by setting everything to 0
* But you will have real predictions based on your model, so please submit those instead
* Each prediction should be a score for one image, indicating which class ("has_oilpalm") it most likely belongs to

In [0]:
# here we're making fake predictions on the holdout and test sets
# set everything to 0
pred_holdout = np.zeros(len(holdout_df))
pred_test = np.zeros(len(test_df))
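
* With a real model, the zeros above would be replaced by one score per image. The sketch below assumes a hypothetical classifier that outputs two class probabilities per image, ordered as [no oil palm, oil palm]; your model's output layout may differ. `oilpalm_scores` and the example numbers are made up for illustration.

```python
import numpy as np


def oilpalm_scores(class_probs):
    """Pick the positive-class score from an (n_samples, 2) array of probabilities.

    Assumes column 1 holds the 'has oil palm' probability (an assumption
    about the model, not something fixed by the competition).
    """
    class_probs = np.asarray(class_probs, dtype=float)
    if class_probs.ndim != 2 or class_probs.shape[1] != 2:
        raise ValueError("expected an array of shape (n_samples, 2)")
    return class_probs[:, 1]


# made-up probabilities for three images
example_probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
example_scores = oilpalm_scores(example_probs)
print(example_scores)
```

The resulting one-dimensional array has one entry per image, the same shape as `pred_holdout` and `pred_test`.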

In [0]:
# combine image ids and predictions from the holdout and test sets
image_names = pd.concat([holdout_df, test_df], ignore_index=True)
predicted_labels = np.concatenate((pred_holdout, pred_test))
# create a submission dataframe
submissions = image_names[['image_id']].copy()
submissions['has_oilpalm'] = predicted_labels
# set image_id to be the row names
submissions = submissions.set_index('image_id')
# reorder the rows to match SampleSubmission.csv
# so that the order of images is consistent
submissions = submissions.loc[sample_submission.index.values, :]
submissions.head()

In [0]:
# Save the file
submissions.to_csv('/content/submission.csv')
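
* Before uploading, it may be worth checking the file against the requirements listed earlier (row count and headings). The check below is a minimal sketch: `check_submission` is a hypothetical helper, and the extra checks for duplicate ids and missing values are additions that the requirements above do not state explicitly.

```python
import pandas as pd


def check_submission(df, expected_rows=6534):
    """Return a list of problems found in a submission DataFrame (empty if none).

    Expects image ids in the index (named 'image_id') and a single
    'has_oilpalm' column, as in the example submission built above.
    """
    problems = []
    if len(df) != expected_rows:
        problems.append(f"expected {expected_rows} rows, found {len(df)}")
    if df.index.name != "image_id":
        problems.append(f"index should be named 'image_id', found {df.index.name!r}")
    if list(df.columns) != ["has_oilpalm"]:
        problems.append(f"columns should be ['has_oilpalm'], found {list(df.columns)}")
    if not df.index.is_unique:
        problems.append("duplicate image ids")
    if "has_oilpalm" in df.columns and df["has_oilpalm"].isna().any():
        problems.append("missing predictions")
    return problems


# Usage in this notebook (uncomment after saving the file):
# print(check_submission(pd.read_csv('/content/submission.csv', index_col=0)))
```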

In [0]:
# Download the file to your computer
from google.colab import files
files.download('/content/submission.csv')