# Recitation 0D - Colab
11-785 - Introduction to Deep Learning, Carnegie Mellon University

Author: Haoxuan Zhu (haoxuanz@andrew.cmu.edu)

Spring 2021

Welcome to 11-785! This tutorial is supplementary material for Recitation 0D. It contains a **fun** toy example to help you get started with Colab, a powerful tool that will be of great help throughout your IDL adventure.

You can also download the Recitation 0D handouts [here](https://hxzhu9819.github.io/homepage/posts/11785_rc0e_colab).

In this notebook, we will cover

* How to request and check the GPU that the Colab allocates for you
* How to mount your Google Drive and why it's important
* How to set up Kaggle
* A **mandatory** toy example

## Colab Basics

One advantage of Colab (a hosted Jupyter Notebook) is that you do not need to worry much about compilation or environment setup: you can write code and execute it directly. Let's start by importing some packages. Press `shift+enter` to execute a block.

In [None]:
import numpy as np
import random

print("Hello world!")

### Requesting a GPU
In most cases, you will need a GPU instance (this is the main reason we use Colab). To request one, go to **Runtime > Change runtime type** and set **Hardware accelerator** to **GPU**. You can then verify the allocation by running the following command, which reports the GPU Colab has assigned to you.

In [None]:
!nvidia-smi

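If you would rather check for a GPU from Python (for example, to fail early before a long training run), one minimal sketch is to call `nvidia-smi` through `subprocess`. It assumes only that `nvidia-smi` is on the `PATH` when a GPU runtime is active; the helper name `gpu_name` is hypothetical, not part of Colab.

```python
import shutil
import subprocess

def gpu_name():
    """Return the name of the first GPU reported by nvidia-smi, or None if none is visible."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA tools installed: most likely a CPU-only runtime
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None  # the tool exists but could not talk to a GPU
    names = [line.strip() for line in result.stdout.splitlines() if line.strip()]
    return names[0] if names else None

print(gpu_name() or "No GPU visible. Check Runtime > Change runtime type.")
```
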
### Mounting your Google Drive

Let's begin by mounting your Google Drive to your current instance.

1. Run the following block. You will be given a URL; follow the instructions there to get an authorization code.
2. Enter the authorization code and press `Enter`
3. On success, you will see `Mounted at /content/gdrive`

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

You can navigate into any folder with the `%cd` magic command.

In [None]:
%cd /content/gdrive/MyDrive

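**Tip:** you can run shell commands directly from Colab by prefixing them with `!`. However, `!cd gdrive` will not take you into the gdrive folder: each `!` command runs in its own short-lived subshell, so the directory change is lost as soon as that command finishes. Use `%cd` to change directories, and make it a habit to check where you are with `!pwd`.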

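The difference between `!cd` and `%cd` can be reproduced in plain Python. In this sketch, `subprocess.run(..., shell=True)` plays the role of a `!` command (a child shell that exits right away), while `os.chdir` is roughly what `%cd` does to the notebook's own process. The temporary directory is just an arbitrary stand-in target.

```python
import os
import subprocess
import tempfile

start = os.getcwd()
target = tempfile.gettempdir()

# Like `!cd <dir>`: the cd runs in a child shell that exits immediately,
# so the notebook's working directory is unchanged afterwards
subprocess.run(f'cd "{target}"', shell=True, check=True)
after_bang_cd = os.getcwd()

# Like `%cd <dir>`: changes the working directory of the Python process itself
os.chdir(target)
after_magic_cd = os.getcwd()

os.chdir(start)  # go back to where we started

print(after_bang_cd == start)                    # True
print(os.path.samefile(after_magic_cd, target))  # True
```
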
Now, let's check where we are. 

In [None]:
!pwd

You can also check which packages Colab has already installed for you. Do you remember the command?

In [None]:
# TODO: Check what packages have already been installed

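Besides the shell command this exercise asks for, the same information is available from Python's standard library. The sketch below uses `importlib.metadata` (Python 3.8+) to list installed distributions; `installed_packages` is a hypothetical helper name, and its output may differ slightly from what pip reports.

```python
from importlib.metadata import distributions

def installed_packages():
    """Return sorted (name, version) pairs for the distributions this interpreter can see."""
    pairs = set()
    for dist in distributions():
        meta = dist.metadata
        if meta is None:
            continue  # skip broken installs with no metadata at all
        name, version = meta.get("Name"), meta.get("Version")
        if name and version:
            pairs.add((name, version))
    return sorted(pairs, key=lambda pair: pair[0].lower())

for name, version in installed_packages()[:10]:  # first 10, alphabetically
    print(f"{name}=={version}")
```
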

In the following assignments, **you are strongly encouraged to save your models to your Google Drive instead of the instance itself.** All files saved on the instance are deleted together with the VM once you disconnect. Feeling skeptical? Well, let's try!

On the left, there is a Files tab. By default, we are in `/content`. Let's try putting our precious files there.

In [None]:
# TODO: Get back to /content if you are in gdrive


In [None]:
# Save files to the instance /content (hint: use !pwd to check where you are)
outputs = [1,1,7,8,5]
with open('super_hard_to_get_result.npy', 'wb') as f:
  np.save(f, outputs)

Refresh the Files tab and you will see `super_hard_to_get_result.npy` there. Now, suppose your network suddenly goes down. You can simulate this scenario by terminating the session.

Steps:

1. Locate the RAM/Disk status bar in the top-right corner.
2. Click it to get a list of `Active sessions`.
3. Choose the current session and terminate it.


Once the session is terminated, you should see a `Reconnect` button in the top-right corner.

Click it to get back online. Check the Files tab, and you will find that `super_hard_to_get_result.npy` is gone (just imagine it were your model after 5 hours of training).

Let's try saving this file to our Google Drive instead. Since reconnecting gives us a brand-new instance, we need to mount the drive again.

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

%cd /content/gdrive/MyDrive

Now we are in `/content/gdrive/MyDrive`, where you can see all the files in your Google Drive. Think of it as a portal to your Drive: everything you store here is kept in your Drive (i.e., it will not disappear with the instance). Now let's save our `super_hard_to_get_result.npy` to our Drive.

In [None]:
# Save files to Google Drive. Please make sure you are in /content/gdrive/MyDrive
outputs = [1,1,7,8,5]
with open('super_hard_to_get_result.npy', 'wb') as f:
  np.save(f, outputs)

Now, even if the network goes down, your files remain in your Google Drive. All you need to do is connect to a new instance, re-mount your Google Drive, and load the files you need from there.

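Loading the file back mirrors saving it. In this sketch, `drive_dir` would be `/content/gdrive/MyDrive` in Colab; here a temporary directory stands in for the mounted Drive so the snippet runs anywhere. Building an absolute path also means the code works no matter where `%cd` left you.

```python
import os
import tempfile

import numpy as np

# Assumption: in Colab, set drive_dir = "/content/gdrive/MyDrive" after mounting
drive_dir = tempfile.mkdtemp()
path = os.path.join(drive_dir, "super_hard_to_get_result.npy")

# Save using an absolute path, independent of the current working directory
with open(path, "wb") as f:
    np.save(f, [1, 1, 7, 8, 5])

# After reconnecting and re-mounting, load it back from the same path
with open(path, "rb") as f:
    restored = np.load(f)

print(restored)  # [1 1 7 8 5]
```
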
Let's get back to `/content` for now.

In [None]:
# TODO: Get back to /content


### Kaggle
Kaggle is the platform we use to host competitions (assignments), release datasets, and collect your submissions.

If you have not done so, please sign up for an account here:
www.kaggle.com

**Mandatory: Please fill out this form**: https://forms.gle/UqKXaPTYgs6nuTRh8

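For most assignments, the data is too large to download to your own machine and then upload somewhere else. The best option is to download the data directly from Kaggle every time you start the notebook, which is easy to do with the Kaggle API. The cells below install the API and store your credentials; you can find your username and key in the `kaggle.json` file downloaded by clicking **Create New API Token** on your Kaggle account page.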

In [None]:
!pip install kaggle
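!mkdir -p /content/.kaggle  # -p: no error if the folder already exists
!mkdir -p /root/.kaggle     # the Kaggle CLI reads /root/.kaggle/kaggle.json; the cp below needs this folder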

In [None]:
import json
token = {"username": "your_username", "key": "your_key"}  # replace with your own Kaggle credentials
with open('/content/.kaggle/kaggle.json', 'w') as file:
    json.dump(token, file)

In [None]:
!chmod 600 /content/.kaggle/kaggle.json
!cp /content/.kaggle/kaggle.json /root/.kaggle/
!kaggle config set -n path -v /content

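The three cells above boil down to writing `kaggle.json` where the Kaggle CLI looks for it (by default `~/.kaggle`, which is `/root/.kaggle` on Colab) and making it readable only by you. Here is a minimal sketch of the same steps in pure Python; the helper name `write_kaggle_token` is hypothetical, and the credentials in the usage comment are placeholders.

```python
import json
import os
import stat

def write_kaggle_token(username, key, config_dir=None):
    """Write kaggle.json with owner-only permissions (like `chmod 600`) and return its path."""
    config_dir = config_dir or os.path.expanduser("~/.kaggle")
    os.makedirs(config_dir, exist_ok=True)
    path = os.path.join(config_dir, "kaggle.json")
    with open(path, "w") as f:
        json.dump({"username": username, "key": key}, f)
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)  # 0o600: read/write for the owner only
    return path

# In Colab (placeholders, replace with your own credentials):
# write_kaggle_token("your_username", "your_key")
```
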
## Toy Example - Does s/he like me?
Welcome to this toy example! In this section, you will get to try all the Colab tricks and methods mentioned above. If you can finish this toy example, you are good to go. Don't worry if you are new to the world of *machine learning*: everything you need in this section is covered by the prerequisite courses. And trust me, you will be an expert in deep learning by the time you finish the course.

### Problem description
When you have a crush on someone, don't you want to know how s/he feels? In this example, we will use **Bayes' Theorem** to help our friend Bob.

Bob tells you that he has a crush on Alice. Even though Alice often smiles when she meets Bob, he still does not know whether Alice likes him back. He asks you whether he should propose to her when she smiles at him.

You happen to have a magical machine, designed by Haoxuan Inc., that can tell you:

* **P(A)**: Probability of Alice liking Bob
* **P(B|A)**: Probability of Alice smiling at Bob given that she likes Bob
* **P(B)**: Probability of Alice smiling at Bob

(**A**: Alice likes Bob, **B**: Alice smiles at Bob)


In [None]:
# Output of the magical machine is a vector in the format of [P(A), P(B|A), P(B)]
screctAlice = [0.7, 0.5, 0.4]

THRESHOLD = 0.8


However, **the machine warns you that you must never reveal any of the data directly to Bob**. Hence, you decide to calculate the posterior probability *P(A|B)* (the probability that Alice likes Bob given that she smiles at him), encourage him to propose if *P(A|B) > 80%*, and discourage him otherwise.

Hint:

$$
P(A|B) = P(A)\frac{P(B|A)}{P(B)}
$$

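Plugging in Alice's numbers `[0.7, 0.5, 0.4]` makes the hint concrete. It also shows why the machine's outputs cannot be arbitrary: since $P(A)P(B|A) = P(A \cap B) \le P(B)$, any consistent triple gives $P(A|B) \le 1$.

$$
P(A|B) = P(A)\frac{P(B|A)}{P(B)} = 0.7 \times \frac{0.5}{0.4} = \frac{0.35}{0.4} = 0.875 > 0.8
$$
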
In [None]:
def calculatePosteriorProbability(token):
  """
    @param [in] token: A 1d array in the form of [P(A), P(B|A), P(B)]

    return the posterior probability P(A|B)
  """

  # Use Bayes Theorem to calculate Posterior Probability
  # Hint: P(A|B) = P(A) * (P(B|A) / P(B))
  Pab = 0  # TODO: implement Bayes Theorem

  return Pab

In [None]:
# Calculate the posterior Probability
Pab = calculatePosteriorProbability(screctAlice)


# Make decision based on the probability that you have calculated
if True:  # TODO: Change the if statement according to the request
  print("Go for it Bob! P(A|B) = {}".format(Pab))
else:
  print("Hold your thought! P(A|B)={}".format(Pab))

Since P(A|B) = 0.875, you encourage Bob to propose. A few days later, Bob comes to you and officially tells you that he is no longer single.

### The real deal
After that, you become so famous that tons of people come to you for advice. Since you are very busy working on IDL assignments, you hire IDL staff to pre-process the data for you. We have prepared all the data on Kaggle, and now it's your turn to download the data, process it, generate results, and submit your predictions back to Kaggle!

In [None]:
import numpy as np
import random

#### Connecting to Kaggle
First, let's install Kaggle and set up the environment. If you are lost, please refer to the recitation recording and the handout.

Kaggle competition link:
http://www.kaggle.com/c/11785-spring2021-rc0e-colab

**Mandatory: Please fill out this form:**
https://forms.gle/UqKXaPTYgs6nuTRh8

In [None]:
# TODO: Install Kaggle in the Colab VM

# TODO: Setup the Kaggle keys

# TODO: Set token permissions


Now let's download the dataset from [Kaggle](https://www.kaggle.com/c/11785-spring2021-rc0e-colab/data). You can either download it manually from the Kaggle website and upload it to your instance, or use the Kaggle API.

In [None]:
# Ignore this block if you have already uploaded the dataset to the instance

# TODO: Kaggle API: !kaggle competitions download -c [competition name...]
# Hint: if you see 403 Forbidden, double-check that you have joined the competition



In [None]:
# TODO: Unzip the dataset (check where the dataset is downloaded)


# TODO: Load the dataset. hint: np.load() is for npy files


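As an alternative to `!unzip`, the standard-library `zipfile` module can extract the archive the Kaggle CLI downloads. This sketch assumes the `.zip` file sits directly in the directory you pass in (check where your download actually landed); `unzip_all` is a hypothetical helper.

```python
import glob
import os
import zipfile

def unzip_all(directory):
    """Extract every .zip file found directly in `directory` into that directory.

    Returns the list of member names that were extracted.
    """
    extracted = []
    for archive in sorted(glob.glob(os.path.join(directory, "*.zip"))):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(directory)
            extracted.extend(zf.namelist())
    return extracted

# Example (assumed download location): unzip_all("/content")
```
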

Check the dataset and make sure you understand how the data are organized.

We have prepared 100,000 data points for you. If you load the dataset correctly, you will get a 2D array of shape (100000, 3). Each row corresponds to one person's info, in the order [P(A), P(B|A), P(B)].

In [None]:
# Check your dataset


# TODO: Does the array have the right shape?


# TODO: Are all elements in the range of [0, 1]? Why should they be in this range?


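The checks in the cell above can be bundled into one helper. This is a hypothetical sketch, not part of the starter code; beyond shape and range, it also flags rows with $P(B) = 0$ (where Bayes' theorem divides by zero) and rows where $P(A)P(B|A) > P(B)$, which would make the posterior exceed 1.

```python
import numpy as np

def validate_dataset(data, expected_rows=100_000, tol=1e-9):
    """Return a list of problems in an (N, 3) array of rows [P(A), P(B|A), P(B)]."""
    data = np.asarray(data, dtype=float)
    if data.ndim != 2 or data.shape[1] != 3:
        return [f"expected shape (N, 3), got {data.shape}"]
    problems = []
    if data.shape[0] != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {data.shape[0]}")
    if np.any((data < 0) | (data > 1)):
        problems.append("some values lie outside [0, 1], so they cannot be probabilities")
    p_a, p_b_given_a, p_b = data[:, 0], data[:, 1], data[:, 2]
    if np.any(p_b <= 0):
        problems.append("some rows have P(B) = 0, so P(A|B) is undefined")
    if np.any(p_a * p_b_given_a > p_b + tol):
        problems.append("some rows have P(A)P(B|A) > P(B), which would give P(A|B) > 1")
    return problems

print(validate_dataset([[0.7, 0.5, 0.4]], expected_rows=1))  # []
```
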

#### Do your magic
Now, it's your turn to calculate P(A|B) for each person and generate your suggestions. For simplicity, return 

* 1 if P(A|B) $>$ 80%
* 0 if P(A|B) $\le$ 80%

In [None]:
result = []  # the array that stores your results

# TODO: append your answers to the result array




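With 100,000 rows, a Python loop works, but NumPy can apply Bayes' theorem to every row at once. A minimal sketch, assuming `data` is the (N, 3) array described above; `decide` is a hypothetical helper, and the two example rows are made up (the first is Alice's from the toy example).

```python
import numpy as np

THRESHOLD = 0.8

def decide(data, threshold=THRESHOLD):
    """Vectorized Bayes rule over rows [P(A), P(B|A), P(B)]; 1 = propose, 0 = hold off."""
    p_a, p_b_given_a, p_b = data[:, 0], data[:, 1], data[:, 2]
    posterior = p_a * p_b_given_a / p_b  # P(A|B) for every row at once
    return (posterior > threshold).astype(int)

rows = np.array([[0.7, 0.5, 0.4],    # Alice: P(A|B) = 0.875
                 [0.3, 0.6, 0.5]])   # hypothetical person: P(A|B) = 0.36
print(decide(rows))  # [1 0]
```

`decide(data).tolist()` gives a plain list that can be assigned to `result`. The strict `>` matches the rule above, so a posterior of exactly 0.8 maps to 0.
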

Now format your results as a CSV file following the provided `sample.csv`.

Hint: You can use `pandas`

In [None]:
# Format your results (There are other ways to accomplish this. Consider it as a gift from us~)
import pandas as pd
df = pd.DataFrame({'id':np.arange(len(result)), "label":result})
df.to_csv(r"propose_prediction.csv", index=False)

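Before spending one of your daily submissions, it is worth checking the file against `sample.csv`. A minimal sketch using the standard-library `csv` module; it compares only the header and row count and assumes the label is the last column. `check_submission` is a hypothetical helper.

```python
import csv

def check_submission(pred_path, sample_path):
    """Compare a prediction CSV with the sample CSV; return a list of mismatches (empty if OK)."""
    with open(pred_path, newline="") as f:
        pred = list(csv.reader(f))
    with open(sample_path, newline="") as f:
        sample = list(csv.reader(f))
    if not pred or not sample:
        return ["one of the files is empty"]
    problems = []
    if pred[0] != sample[0]:
        problems.append(f"header {pred[0]} does not match sample header {sample[0]}")
    if len(pred) != len(sample):
        problems.append(f"{len(pred) - 1} prediction rows vs {len(sample) - 1} sample rows")
    bad = [row for row in pred[1:] if row[-1:] not in (["0"], ["1"])]
    if bad:
        problems.append(f"{len(bad)} rows do not end with a 0/1 label")
    return problems

# In the notebook: print(check_submission("propose_prediction.csv", "sample.csv"))
```
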
#### Submit your result to Kaggle
Warning: since you are so famous, you do not want to ruin your reputation (or others' happiness, of course). Hence, you need to double-check your results before submitting. In the coming assignments, **you can only submit 10 times per day**, so use your submissions wisely.

In [None]:
# Use Kaggle API to submit your result
# Hint: !kaggle competitions submit -c [competition name] -f [file] -m [message]


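The Kaggle CLI submits a file with `kaggle competitions submit -c <competition> -f <file> -m <message>`. This sketch only builds and prints that command from Python so you can double-check it before running; actually submitting requires the token set up earlier and that you have joined the competition. `submit_command` is a hypothetical helper.

```python
import shlex

def submit_command(competition, csv_path, message):
    """Build the argv list for `kaggle competitions submit`."""
    return ["kaggle", "competitions", "submit",
            "-c", competition, "-f", csv_path, "-m", message]

cmd = submit_command("11785-spring2021-rc0e-colab", "propose_prediction.csv", "first try")
print(shlex.join(cmd))

# To actually submit (this uses one of your daily submissions):
# import subprocess; subprocess.run(cmd, check=True)
```
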
## Reflection
Congrats, you made it! Using Colab to tackle the real assignments should now feel easy and handy. Note that this toy example does not reflect the difficulty of the coming assignments (it does not even contain any deep learning material).

After finishing the example, you should feel confident that you can:
* Mount your Google Drive
* Set up Kaggle
* Implement your design (DL models will be covered later)
* Submit your predictions/results to Kaggle

If you still have concerns, please post them on Piazza. We are always here to help.