<a href="https://colab.research.google.com/github/hogo56/BertQA/blob/master/BERT_for_Humans_Baseline.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# BERTjoint Question Answering Contest

This code is designed to run on Google Colab. Because we also want to submit the kernel to the Kaggle QA competition, it needs to run in either location. This is handled by Python variables for the input and output directories, set according to the platform when the FLAGS are defined:

* Kaggle &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; *(cwd = /kaggle/working/)*<br>
  \$indir = /kaggle/input<br>
  \$outdir = /kaggle/working<br>
* Colab &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; *(cwd = /content/)*<br>
  \$indir = /content/data<br>
  \$outdir = /content/output<br>
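
The two layouts above could be selected at runtime with a small helper. This is only a sketch: `LAYOUTS` and `resolve_dirs` are hypothetical names that do not exist in the notebook, detection by working directory is an assumption, and the Colab input path uses `/content/data`, the directory that the setup cells further down create.

```python
import os

# Directory layout from the list above, keyed by the working directory
# that identifies each platform (an assumed way of telling them apart).
LAYOUTS = {
    "/kaggle/working": ("/kaggle/input", "/kaggle/working"),
    "/content": ("/content/data", "/content/output"),
}


def resolve_dirs(isdir=os.path.isdir):
    """Return (indir, outdir) for the first platform whose cwd exists.

    `isdir` is injectable so the function can be exercised off-platform.
    """
    for workdir, dirs in LAYOUTS.items():
        if isdir(workdir):
            return dirs
    raise RuntimeError("Cannot continue without determining file locations")
```

Calling `indir, outdir = resolve_dirs()` once near the top of the notebook would give the later cells a single place to read the paths from.
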

#### - Required Libraries

#### - Required Data

#### - Inputs
   * \$indir/tensorflow2-question-answering/simplified-nq-test.jsonl

#### - Outputs
   * \$outdir/predictions.json
   * \$outdir/submission.csv<br>
   * \$outdir/eval.tf_record<br>
   * \$outdir/.ipynb_checkpoints/<br>


#### - Credit
The code for this notebook is taken from the [translated version](https://www.kaggle.com/dimitreoliveira/using-tf-2-0-w-bert-on-nq-translated-to-tf2-0) posted by [Dimitre Oliveira](https://www.kaggle.com/dimitreoliveira).

Dimitre updated the baseline [philculliton script](https://www.kaggle.com/philculliton/using-tensorflow-2-0-w-bert-on-nq) to TensorFlow 2.0, so that we can take part in the TF2 prizes and can use that version to improve the work.

The code may have originated from: [Google BERTjoint](https://github.com/google-research/language/tree/master/language/question_answering/bert_joint)

In [0]:
## Set file locations    (these variables are not implemented in the FLAGS code yet)
# if colab:
basedir = '/content'
indir = '/content/data'
outdir = '/content/output'
# elif kaggle:
#     basedir =
#     indir =
#     outdir =
# else:
#     print("Cannot continue without determining file locations")
#     assert False

# ============= Machine Spinup =============

In [0]:
! zdump PST
! pwd
import os

def list_files(startpath):
    for root, dirs, files in os.walk(startpath):
        level = root.replace(startpath, '').count(os.sep)
        indent = ' ' * 4 * (level)
        print('{}{}/'.format(indent, os.path.basename(root)))
        subindent = ' ' * 4 * (level + 1)
        for f in files:
            print('{}{}'.format(subindent, f))
list_files('/content')

In [0]:
## Reset kernel without removing downloaded data files and libs
#%reset
#! rm -i /content/output/*

## -- Main System Config --
<Details><Summary>Global Config</Summary>
Put any global system configuration here.</Details>

In [0]:
%%bash
zdump PST               # 'PST' is not a tz database zone name, so this likely prints UTC (about 8 hrs ahead of Pacific)
mkdir -p /content/lib
mkdir -p /content/data
mkdir -p /content/output       # Maybe symlink to Google Drive for permanence
rm -rf /content/sample_data

In [0]:
import os, sys
sys.path.append('/content/lib')

### Runtime Parameters
<Details><Summary>Global Variables</Summary>
EnableAllCode - There are code blocks here that should not be run with "Run All". By default EnableAllCode is set to False and those blocks are skipped. If you want to run them individually for some reason, set EnableAllCode = True.<p>
DownloadBigFiles - There are GBs of files to download to make this run. If you just want to spin up the Colab so you can SSH into it, set DownloadBigFiles = False, then use Runtime -> Run after.</Details>

In [0]:
EnableAllCode = False                # Prevent codeblocks that should not execute on Run All
DownloadBigFiles = True

## -- Setup --

### Google Drive
<Details>There are several ways to provide access to your Google Drive from Colab. (What about the Drive FUSE wrapper?)<br>
I am not sure if this is the best one. It mounts your Drive into the machine.<br>
I expect there will be a folder in the Drive that we all share.</Details>

In [0]:
## File link to Google Drive
from google.colab import drive
drive.mount('/content/gdrive', force_remount=False)   # set True to reread the drive
# Create a shorter shared directory name than one with a space
# (-f and -n replace an existing link, so the cell can be rerun safely)
! ln -sfn '/content/gdrive/My Drive/bertqa' /content/bertqa

In [0]:
if EnableAllCode:
    ## Flush and unmount Google Drive
    # You probably won't do this, but if you want to at some point, click the play button
    drive.flush_and_unmount()

### Kaggle API
<Details>You will need Kaggle API token to link the Colab instance to your Kaggle account to get data, etc.<br>
Go to: https://www.kaggle.com/yourID/account and click on the "Create New API Token" button to get a file named kaggle.json.<p>You can put your kaggle.json file in your Google Drive at My Drive/colab/kaggle.json.<br>
Alternatively, you can store it on your local machine and the script will ask you to upload it.</Details>
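
Before pointing `KAGGLE_CONFIG_DIR` at a folder, it can help to confirm that the token file is actually usable. A minimal sketch, assuming the usual `kaggle.json` layout with a `username` and a `key` field; `has_kaggle_token` is a hypothetical helper, not part of the Kaggle package:

```python
import json
import os


def has_kaggle_token(config_dir):
    """Return True if config_dir/kaggle.json holds a username and a key."""
    path = os.path.join(config_dir, "kaggle.json")
    try:
        with open(path) as fh:
            token = json.load(fh)
    except (OSError, ValueError):
        # Missing or unreadable file, or content that is not valid JSON.
        return False
    return (
        isinstance(token, dict)
        and bool(token.get("username"))
        and bool(token.get("key"))
    )
```

For example, `has_kaggle_token("/content/gdrive/My Drive/colab")` could decide between using the Drive copy and falling back to an upload.
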

In [0]:
## Link to Kaggle
from google.colab import files

# see if there is a kaggle.json file in gdrive
try:
    # see if auth file is in gdrive
    open("/content/gdrive/My Drive/colab/kaggle.json").close()   # raises IOError if the file is missing
    os.environ['KAGGLE_CONFIG_DIR'] = "/content/gdrive/My Drive/colab/"
    ! ls -l "/content/gdrive/My Drive/colab/kaggle.json"
except IOError:
    # Have user upload file
    ! rm /content/kaggle.json  2> /dev/null
    print('Upload kaggle.json.')
    # The files.upload() command is failing sporadically with:
    #   TypeError: Cannot read property '_uploadFiles' of undefined (just run again)
    files.upload()
    os.environ['KAGGLE_CONFIG_DIR'] = "/content/"
    ! ls -l /content/kaggle.json

import kaggle

# =========== Project Specific Stuff ===========

## -- Project Setup --

### Download Dataset and Support Files

Kaggle Competition Files

In [0]:
## Competition Dataset  (5GB zipped)
if DownloadBigFiles:
    if not os.path.exists("/content/data/compdata.flag"):
        print("Downloading Competition Data\n")        # ! kaggle competitions list
        # ! kaggle competitions download -c tensorflow2-question-answering -p /content/data
        # ! mv /content/data/sample_submission.csv /content/output/
        # ! unzip /content/data/simplified-nq-test.jsonl.zip -d /content/data/
        # ! rm /content/data/simplified-nq-test.jsonl.zip
        # ! unzip /content/data/simplified-nq-train.jsonl.zip -d /content/data/
        # ! rm /content/data/simplified-nq-train.jsonl.zip
        ! touch /content/data/compdata.flag
    else:
        print("Competition Data already exists. Not downloading.\n")
        !ls -l /content/data

Bert-Joint files from: 
https://github.com/google-research/language/tree/master/language/question_answering/bert_joint


In [0]:
if DownloadBigFiles:
    if not os.path.exists("/content/data/bertdata.flag"):
        print("Downloading BERT-joint Data\n")        # ! kaggle competitions list
        ! gsutil cp -R gs://bert-nq/bert-joint-baseline /content/data
        ! touch "/content/data/bertdata.flag"
    else:
        print("BERT-joint Data already exists. Not downloading.\n")
        !ls -l /content/data/bert-joint-baseline/

Bert files from: https://github.com/google-research/bert<br>
(Not the model we are using at the moment)

In [0]:
## get BERT (this is unlikely to be the BERT-joint files needed for competition)
# this version of BERT won't import as is. On line 88 of lib/bert/optimization.py
#    change   tf.train.Optimizer to tf.keras.optimizers.Optimizer
if DownloadBigFiles and False:
    ! git clone https://github.com/google-research/bert.git
    ! mv bert lib

    # get some pretrained models  (I really  have no idea what these are or if useful)
    ! wget https://storage.googleapis.com/bert_models/2018_10_18/cased_L-12_H-768_A-12.zip
    ! unzip cased_L-12_H-768_A-12.zip
    ! rm cased_L-12_H-768_A-12.zip

<Details><Summary>BERT tf.compat.v1 Notes</Summary>
baseline_w_bert_translated_to_tf2_0 (next code block) comes from /dimitreoliveira with this warning:<br>
This baseline uses code that was migrated from TF1.x. Be aware that it contains use of tf.compat.v1, which is not permitted to be eligible for TF2.0 prizes in this competition. It is intended to be used as a starting point, but we're excited to see how much better you can do using TF2.0!<br>
https://www.kaggle.com/dimitreoliveira/using-tf-2-0-w-bert-on-nq-translated-to-tf2-0</Details>

### Library Setup

In [0]:
## Copy lib files over from Google Drive
! cp -a /content/bertqa/lib/* lib/

In [0]:
## Load Libraries
# magic to make Colab use TensorFlow 2.x
%tensorflow_version 2.x 
import tensorflow as tf
print("TensorFlow", tf.__version__)

import numpy as np
import pandas as pd

import bert_modeling as modeling                    # from philculliton
import bert_optimization as optimization            # from philculliton
import bert_tokenization as tokenization            # from philculliton

# import tf2_0_baseline_w_bert as tf2baseline # old script from philculliton
import tf2_0_baseline_w_bert_translated_to_tf2_0 as tf2baseline # from dimitreoliveira

import json
import absl
from zipfile import ZipFile

In [0]:
%%bash
zdump PST

In [0]:
# assert False                ### Stop Execution

## -- Code Implementation in Tensorflow 2.0 --

**A few notes:**
- If you want to keep using **flags** and **logging** you will have to use the **absl** lib (this is recommended by the TF team).
- Since we won't use it with the kernels, Dimitre removed most of the **TPU** related stuff to reduce complexity.
- TensorFlow 2 doesn't let us use global variables **(tf.compat.v1.trainable_variables())**.

In this notebook, we'll be using the Bert baseline for Tensorflow to create predictions for the Natural Questions test set. Note that this uses a model that has already been pre-trained - we're only doing inference here. A GPU is required, and this should take between 1 and 2 hours to run.

The original script can be found [here](https://github.com/google-research/language/blob/master/language/question_answering/bert_joint/run_nq.py).<br>
The supporting modules were drawn from the [official Tensorflow model repository](https://github.com/tensorflow/models/tree/master/official).<br>
The bert-joint-baseline data is described [here](https://github.com/google-research/language/tree/master/language/question_answering/bert_joint).

**Note:** This baseline uses code that was migrated from TF1.x. Be aware that it contains use of tf.compat.v1, which is not permitted to be eligible for [TF2.0 prizes in this competition](https://www.kaggle.com/c/tensorflow2-question-answering/overview/prizes).

In [0]:
%%bash
zdump PST
ls -l /content/data/
ls -l /content/data/bert-joint-baseline/
ls -l /content/output/

### Tensorflow Flags

Tensorflow flags are variables that can be passed around within the TF system. Every flag below has some context provided regarding what the flag is and how it's used.<p>
Most of these can be changed as desired, with the exception of the Special Flags at the bottom, which must stay as-is to work with the Kaggle back end.

In [0]:
def del_all_flags(FLAGS):
    flags_dict = FLAGS._flags()
    keys_list = [keys for keys in flags_dict]
    for keys in keys_list:
        FLAGS.__delattr__(keys)

del_all_flags(absl.flags.FLAGS)

flags = absl.flags

flags.DEFINE_string(
    "bert_config_file", "/content/data/bert-joint-baseline/bert_config.json",
    "The config json file corresponding to the pre-trained BERT model. "
    "This specifies the model architecture.")

flags.DEFINE_string("vocab_file", "/content/data/bert-joint-baseline/vocab-nq.txt",
                    "The vocabulary file that the BERT model was trained on.")

flags.DEFINE_string(
    "output_dir", "/content/output",
    "The output directory where the model checkpoints will be written.")

flags.DEFINE_string("train_precomputed_file", None,
                    "Precomputed tf records for training.")

flags.DEFINE_integer("train_num_precomputed", None,
                     "Number of precomputed tf records for training.")

flags.DEFINE_string(
    "output_prediction_file", "/content/output/predictions.json",
    "Where to print predictions in NQ prediction format, to be passed to "
    "natural_questions.nq_eval.")

flags.DEFINE_string(
    "init_checkpoint", "/content/data/bert-joint-baseline/bert_joint.ckpt",
    "Initial checkpoint (usually from a pre-trained BERT model).")

flags.DEFINE_bool(
    "do_lower_case", True,
    "Whether to lower case the input text. Should be True for uncased "
    "models and False for cased models.")

flags.DEFINE_integer(
    "max_seq_length", 384,
    "The maximum total input sequence length after WordPiece tokenization. "
    "Sequences longer than this will be truncated, and sequences shorter "
    "than this will be padded.")

flags.DEFINE_integer(
    "doc_stride", 128,
    "When splitting up a long document into chunks, how much stride to "
    "take between chunks.")

flags.DEFINE_integer(
    "max_query_length", 64,
    "The maximum number of tokens for the question. Questions longer than "
    "this will be truncated to this length.")

flags.DEFINE_bool("do_train", False, "Whether to run training.")

flags.DEFINE_bool("do_predict", True, "Whether to run eval on the dev set.")

flags.DEFINE_integer("train_batch_size", 32, "Total batch size for training.")

flags.DEFINE_integer("predict_batch_size", 8,
                     "Total batch size for predictions.")

flags.DEFINE_float("learning_rate", 5e-5, "The initial learning rate for Adam.")

flags.DEFINE_float("num_train_epochs", 3.0,
                   "Total number of training epochs to perform.")

flags.DEFINE_float(
    "warmup_proportion", 0.1,
    "Proportion of training to perform linear learning rate warmup for. "
    "E.g., 0.1 = 10% of training.")

flags.DEFINE_integer("save_checkpoints_steps", 1000,
                     "How often to save the model checkpoint.")

flags.DEFINE_integer("iterations_per_loop", 1000,
                     "How many steps to make in each estimator call.")

flags.DEFINE_integer(
    "n_best_size", 20,
    "The total number of n-best predictions to generate in the "
    "nbest_predictions.json output file.")

flags.DEFINE_integer(
    "verbosity", 1, "How verbose our error messages should be")

flags.DEFINE_integer(
    "max_answer_length", 30,
    "The maximum length of an answer that can be generated. This is needed "
    "because the start and end predictions are not conditioned on one another.")

flags.DEFINE_float(
    "include_unknowns", -1.0,
    "If positive, probability of including answers of type `UNKNOWN`.")

flags.DEFINE_bool("use_tpu", False, "Whether to use TPU or GPU/CPU.")
flags.DEFINE_bool("use_one_hot_embeddings", False, "Whether to use use_one_hot_embeddings")

absl.flags.DEFINE_string(
    "gcp_project", None,
    "[Optional] Project name for the Cloud TPU-enabled project. If not "
    "specified, we will attempt to automatically detect the GCE project from "
    "metadata.")

flags.DEFINE_bool(
    "verbose_logging", False,
    "If true, all of the warnings related to data processing will be printed. "
    "A number of warnings are expected for a normal NQ evaluation.")

flags.DEFINE_boolean(
    "skip_nested_contexts", True,
    "Completely ignore context that are not top level nodes in the page.")

flags.DEFINE_integer("task_id", 0,
                     "Train and dev shard to read from and write to.")

flags.DEFINE_integer("max_contexts", 48,
                     "Maximum number of contexts to output for an example.")

flags.DEFINE_integer(
    "max_position", 50,
    "Maximum context position for which to generate special tokens.")


## Special flags - do not change

flags.DEFINE_string(
    "predict_file", "/content/data/simplified-nq-test.jsonl",
    "NQ json for predictions. E.g., dev-v1.1.jsonl.gz or test-v1.1.jsonl.gz")
flags.DEFINE_boolean("logtostderr", True, "Logs to stderr")
flags.DEFINE_boolean("undefok", True, "it's okay to be undefined")
flags.DEFINE_string('f', '', 'kernel')
flags.DEFINE_string('HistoryManager.hist_file', '', 'kernel')

FLAGS = flags.FLAGS
FLAGS(sys.argv) # Parse the flags

**Here, we:**
1. Set up Bert
2. Read in the test set
3. Run it past the pre-built Bert model to create embeddings
4. Use those embeddings to make predictions
5. Write those predictions to `predictions.json`

Feel free to change the code below. Code for the `tf2baseline.*` functions is included in the `tf2_0_baseline_w_bert` utility script, and can be customized, whether by forking the utility script and updating it, or by creating your own non-`tf2baseline` versions in this kernel.

Note: the `tf2_0_baseline_w_bert` utility script contains code for training your own embeddings. Here that code is removed.
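
The `max_seq_length` and `doc_stride` flags describe a sliding window over each long document. The sketch below illustrates that idea in isolation; it is modelled on the usual BERT question-answering preprocessing and is not the code inside `tf2baseline`. `split_into_spans` is a hypothetical name, and in the real pipeline the window is typically `max_seq_length` minus the room taken by the question and the special tokens.

```python
def split_into_spans(num_doc_tokens, max_tokens_per_span, doc_stride):
    """Return (start, length) windows that together cover a tokenized document."""
    if max_tokens_per_span <= 0 or doc_stride <= 0:
        raise ValueError("window size and stride must be positive")
    spans = []
    start = 0
    while start < num_doc_tokens:
        length = min(num_doc_tokens - start, max_tokens_per_span)
        spans.append((start, length))
        if start + length == num_doc_tokens:
            break  # the last window reached the end of the document
        start += min(length, doc_stride)
    return spans


# A 10-token document, 4 tokens per window, stride of 2.
print(split_into_spans(10, 4, 2))  # [(0, 4), (2, 4), (4, 4), (6, 4)]
```

Because the stride is smaller than the window, neighbouring windows overlap, so an answer that straddles one window boundary can still fall entirely inside the next window.
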

In [0]:
bert_config = modeling.BertConfig.from_json_file(FLAGS.bert_config_file)

tf2baseline.validate_flags_or_throw(bert_config)
tf.io.gfile.makedirs(FLAGS.output_dir)

tokenizer = tokenization.FullTokenizer(
    vocab_file=FLAGS.vocab_file, do_lower_case=FLAGS.do_lower_case)

run_config = tf.estimator.RunConfig(
    model_dir=FLAGS.output_dir,
    save_checkpoints_steps=FLAGS.save_checkpoints_steps)

num_train_steps = None
num_warmup_steps = None

model_fn = tf2baseline.model_fn_builder(
    bert_config=bert_config,
    init_checkpoint=FLAGS.init_checkpoint,
    learning_rate=FLAGS.learning_rate,
    num_train_steps=num_train_steps,
    num_warmup_steps=num_warmup_steps,
    use_tpu=FLAGS.use_tpu,
    use_one_hot_embeddings=FLAGS.use_one_hot_embeddings)

estimator = tf.estimator.Estimator(
    model_fn=model_fn,
    config=run_config,
    params={'batch_size':FLAGS.train_batch_size})


if FLAGS.do_predict:
  if not FLAGS.output_prediction_file:
    raise ValueError(
        "--output_prediction_file must be defined in predict mode.")
    
  eval_examples = tf2baseline.read_nq_examples(
      input_file=FLAGS.predict_file, is_training=False)

  print("FLAGS.predict_file", FLAGS.predict_file)

  eval_writer = tf2baseline.FeatureWriter(
      filename=os.path.join(FLAGS.output_dir, "eval.tf_record"),
      is_training=False)
  eval_features = []

  def append_feature(feature):
    eval_features.append(feature)
    eval_writer.process_feature(feature)

  num_spans_to_ids = tf2baseline.convert_examples_to_features(
      examples=eval_examples,
      tokenizer=tokenizer,
      is_training=False,
      output_fn=append_feature)
  eval_writer.close()
  eval_filename = eval_writer.filename

  print("***** Running predictions *****")
  print(f"  Num orig examples = {len(eval_examples)}")
  print(f"  Num split examples = {len(eval_features)}")
  print(f"  Batch size = {FLAGS.predict_batch_size}")
  for spans, ids in num_spans_to_ids.items():
    print(f"  Num split into {spans} = {len(ids)}")

  predict_input_fn = tf2baseline.input_fn_builder(
      input_file=eval_filename,
      seq_length=FLAGS.max_seq_length,
      is_training=False,
      drop_remainder=False)

  all_results = []

  for result in estimator.predict(
      predict_input_fn, yield_single_examples=True):
    if len(all_results) % 1000 == 0:
      print("Processing example: %d" % (len(all_results)))

    unique_id = int(result["unique_ids"])
    start_logits = [float(x) for x in result["start_logits"].flat]
    end_logits = [float(x) for x in result["end_logits"].flat]
    answer_type_logits = [float(x) for x in result["answer_type_logits"].flat]

    all_results.append(
        tf2baseline.RawResult(
            unique_id=unique_id,
            start_logits=start_logits,
            end_logits=end_logits,
            answer_type_logits=answer_type_logits))

  print ("Going to candidates file")

  candidates_dict = tf2baseline.read_candidates(FLAGS.predict_file)

  print ("setting up eval features")

  raw_dataset = tf.data.TFRecordDataset(eval_filename)
  eval_features = []
  for raw_record in raw_dataset:
    eval_features.append(tf.train.Example.FromString(raw_record.numpy()))
    
  print ("compute_pred_dict")

  nq_pred_dict = tf2baseline.compute_pred_dict(candidates_dict, eval_features,
                                   [r._asdict() for r in all_results])
  predictions_json = {"predictions": list(nq_pred_dict.values())}

  print ("writing json")

  with tf.io.gfile.GFile(FLAGS.output_prediction_file, "w") as f:
    json.dump(predictions_json, f, indent=4)

In [0]:
%%bash
zdump PST
ls -l /content/output/

**Now, we turn `predictions.json` into a `submission.csv` file.**

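Note: In the most recent run predictions.json was not found in /content/output after running the code above. If `output_prediction_file` is a relative path such as `predictions.json`, the file is written to the current working directory (/content on Colab) rather than to the output directory.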

In [0]:
test_answers_df = pd.read_json("/content/output/predictions.json")

The Bert model produces a `confidence` score, which the Kaggle metric does not use. You, however, can use that score to determine which answers get submitted. See the limits commented out in `create_short_answer` and `create_long_answer` below for an example.

Values for `confidence` will range between `1.0` and `2.0`.
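
The thresholding idea can be tried on its own with a hand-made entry. This is a sketch only: `span_or_blank` is a hypothetical helper, the entry is made up to mimic the fields used in this notebook, and the cut-off values are arbitrary.

```python
def span_or_blank(span, score, min_score):
    """Format a token span as 'start:end', or '' when it should be withheld."""
    if score < min_score or span["start_token"] < 0:
        return ""
    return f"{span['start_token']}:{span['end_token']}"


# Made-up entry with the same field names as a predictions.json record.
entry = {
    "long_answer": {"start_token": 12, "end_token": 40},
    "long_answer_score": 1.2,
}
print(repr(span_or_blank(entry["long_answer"], entry["long_answer_score"], 1.5)))  # ''
print(repr(span_or_blank(entry["long_answer"], entry["long_answer_score"], 1.0)))  # '12:40'
```

Raising the cut-off trades recall for precision: fewer answers are submitted, but the ones that remain are those the model scored higher.
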

In [0]:
def create_short_answer(entry):
    # if entry["short_answers_score"] < 1.5:
    #     return ""
    
    answer = []    
    for short_answer in entry["short_answers"]:
        if short_answer["start_token"] > -1:
            answer.append(str(short_answer["start_token"]) + ":" + str(short_answer["end_token"]))
    if entry["yes_no_answer"] != "NONE":
        answer.append(entry["yes_no_answer"])
    return " ".join(answer)

def create_long_answer(entry):
    # if entry["long_answer_score"] < 1.5:
    #     return ""

    answer = []
    if entry["long_answer"]["start_token"] > -1:
        answer.append(str(entry["long_answer"]["start_token"]) + ":" + str(entry["long_answer"]["end_token"]))
    return " ".join(answer)

In [0]:
test_answers_df["long_answer_score"] = test_answers_df["predictions"].apply(lambda q: q["long_answer_score"])
test_answers_df["short_answer_score"] = test_answers_df["predictions"].apply(lambda q: q["short_answers_score"])

In [0]:
test_answers_df["long_answer_score"].describe()

An example of what each sample's answers look like in `predictions.json`:

In [0]:
test_answers_df.predictions.values[0]

We re-format the JSON answers to match the requirements for submission.

In [0]:
test_answers_df["long_answer"] = test_answers_df["predictions"].apply(create_long_answer)
test_answers_df["short_answer"] = test_answers_df["predictions"].apply(create_short_answer)
test_answers_df["example_id"] = test_answers_df["predictions"].apply(lambda q: str(q["example_id"]))

long_answers = dict(zip(test_answers_df["example_id"], test_answers_df["long_answer"]))
short_answers = dict(zip(test_answers_df["example_id"], test_answers_df["short_answer"]))

Then we add them to our sample submission. Recall that each sample has both a `_long` and `_short` entry in the sample submission, one for each type of answer.
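
The `_long` / `_short` pairing can be illustrated without pandas. In this sketch `fill_prediction_strings` is a hypothetical helper and the ids and answers are made up; it only shows how a row id is split back into an example id and an answer type.

```python
def fill_prediction_strings(row_ids, long_answers, short_answers):
    """Map each 'ID_long' / 'ID_short' row to its prediction string."""
    filled = {}
    for row_id in row_ids:
        # Split on the last underscore so that ids containing '-' or '_' survive.
        example_id, kind = row_id.rsplit("_", 1)
        source = long_answers if kind == "long" else short_answers
        filled[row_id] = source.get(example_id, "")
    return filled


rows = ["-123_long", "-123_short", "456_long", "456_short"]
long_answers = {"-123": "10:25", "456": ""}
short_answers = {"-123": "12:14", "456": "YES"}
print(fill_prediction_strings(rows, long_answers, short_answers))
```

Unlike a plain dictionary lookup, `.get(example_id, "")` leaves a blank prediction for any example that has no entry instead of raising a `KeyError`; whether that is preferable depends on whether a missing prediction should stop the run.
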

In [0]:
sample_submission = pd.read_csv("/content/data/tensorflow2-question-answering/sample_submission.csv")

long_prediction_strings = sample_submission[sample_submission["example_id"].str.contains("_long")].apply(lambda q: long_answers[q["example_id"].replace("_long", "")], axis=1)
short_prediction_strings = sample_submission[sample_submission["example_id"].str.contains("_short")].apply(lambda q: short_answers[q["example_id"].replace("_short", "")], axis=1)

sample_submission.loc[sample_submission["example_id"].str.contains("_long"), "PredictionString"] = long_prediction_strings
sample_submission.loc[sample_submission["example_id"].str.contains("_short"), "PredictionString"] = short_prediction_strings

And finally, we write out our submission!

In [0]:
sample_submission.to_csv("/content/output/submission.csv", index=False)
sample_submission.head()

In [0]:
%%bash
zdump PST
ls -l /content/output/

## -- Submitting Results --

In [0]:
assert True                     ## Change to False to stop "Run All" before the submission cells

In [0]:
%%bash
## View Previous Results
#kaggle competitions list
kaggle competitions submissions -c tensorflow2-question-answering

In [0]:
## Make Submission
# I am not sure if we can submit this competition from this as it has to be a kernel submission
#! kaggle competitions submit -c tensorflow2-question-answering -f $RESULT_CSV  -m 'test kaggle cli 3'

Verify submission by viewing previous results

End of Project Notebook
# ====== Please fold this stuff up and ignore =====

### SSH Setup
This is only needed if you want to log into the Colab machine. Otherwise fold it up and ignore.<br>
To use it you have to create a free account at https://ngrok.com for the SSH tunnel to work.
<Details>Thanks to Imad El Hanafi (https://imadelhanafi.com) for showing me how to do this.</Details>

In [0]:
assert False        # Make sure user does not accidentally drop into this code

In [0]:
%%bash
## Install sshd; Set to allow login and config
apt-get install -o=Dpkg::Use-Pty=0 openssh-server pwgen > /dev/null
mkdir -p /var/run/sshd
echo "PermitRootLogin yes" >> /etc/ssh/sshd_config
echo "PasswordAuthentication yes" >> /etc/ssh/sshd_config
# set host key to known value (need to test if exist)
if [ -f "/content/bertqa/colab/ssh_host_rsa_key.pub" ]; then
    cp "/content/bertqa/colab/ssh_host_rsa_key.pub" /etc/ssh/
    echo "Using ssh_host_rsa_key from gdrive"
fi
# this script will fix the login shell so Python will work
if [ -f "/content/bertqa/colab/init_shell.sh" ]; then
    echo "source /content/bertqa/colab/init_shell.sh" >> /root/.bashrc
fi

In [0]:
## setup ssh user / pass and start sshd

#Generate a random root password (secrets is intended for passwords; random is not)
import secrets, string
sshpass = ''.join(secrets.choice(string.ascii_letters + string.digits) for i in range(30))

#Set root password
! echo root:$sshpass | chpasswd

#Run sshd
get_ipython().system_raw('/usr/sbin/sshd -D &')

In [0]:
%%bash
## Get Ngrok from gdrive or try to download (see: https://ngrok.com/download)
if [ -f "/content/bertqa/colab/ngrok-stable-linux-amd64.zip" ]; then
    cp "/content/bertqa/colab/ngrok-stable-linux-amd64.zip" .
    echo "Using ngrok-stable-linux-amd64.zip from gdrive"
else
    wget -q -c -nc https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
fi
unzip -qq -n ngrok-stable-linux-amd64.zip
rm ngrok-stable-linux-amd64.zip

In [0]:
## Get user to enter auth token from ngrok and start tunnel

# Get token from ngrok for the tunnel
print("Get your authtoken from https://dashboard.ngrok.com/auth")
import getpass
authtoken = getpass.getpass()

#Create tunnel
get_ipython().system_raw('./ngrok authtoken $authtoken && ./ngrok tcp 22 &')

#### ==============================<br>|====&nbsp;&nbsp;  SSH Login Credentials &nbsp;&nbsp;====||<br>==============================

In [0]:
#@title
print("username: root")
print("password: ", sshpass)

Get the host name and port number at: https://dashboard.ngrok.com/status

```bash
ssh root@0.tcp.ngrok.io -p [ngrok_port]
Login as: root
Server refused our key
root@0.tcp.ngrok.io's password: [see above]

(Colab):/content$
```


Install vim

In [0]:
! apt-get install vim > /dev/null

If you need to kill Ngrok run this cell

In [0]:
if EnableAllCode and False:
    !kill $(ps aux | grep './ngrok' | awk '{print $2}')

## -- Misc Notes --

### Prevent Disconnects
Colab periodically disconnects the browser.<br>
You have to save model checkpoints to Google Drive so you don't lose work<br>
See: https://mc.ai/google-colab-drive-as-persistent-storage-for-long-training-runs/<br>
Something to try...<br>
Ctrl+Shift+i in browser and in console run this code...
```
function KeepAlive(){
    console.log("Maintaining Connection");
    document.querySelector("colab-toolbar-button#connect").click()
}
setInterval(KeepAlive,60000);
```
There have been reports of people having their GPU privileges suspended for letting processes run for over 12 hours. It seems that they may penalize you rather than just cutting you off.
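
Saving work to Google Drive, as suggested above, could be as simple as copying new or changed files out of the output directory from time to time. A minimal sketch using only the standard library; `backup_dir` is a hypothetical helper, and the destination folder in the usage note is an assumption about how the shared Drive folder is laid out.

```python
import os
import shutil


def backup_dir(src_dir, dst_dir):
    """Copy files from src_dir to dst_dir when missing or older there."""
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        if not os.path.isfile(src):
            continue  # sub-directories are skipped in this sketch
        if not os.path.exists(dst) or os.path.getmtime(src) > os.path.getmtime(dst):
            shutil.copy2(src, dst)  # copy2 keeps the modification time
            copied.append(name)
    return copied
```

With Drive mounted, something like `backup_dir('/content/output', '/content/bertqa/output')` could be run after each long step; the function returns the names of the files it copied.
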