
# **Server and workspace setup**

In [0]:
# '''  <--- un-comment this line after the first run
# AFTER running setup steps 1 and 2 for the first time,
# un-comment the line above so that the two assignments below
# become part of a string literal (closed by the ''' on the last line)
# and the flags are no longer reset to False.
googleDriveAlreadySetUp = False
sshServerAlreadySetUp = False
# '''

## **Step 1. Mount your Google Drive under */content***


---


1. The *`/content`* directory is where a Google Colab notebook reads and writes files by default. However, it is NOT persistent between connections! (All changes you make here are erased once the runtime is disconnected!)

2. Each `!` command in a Colab cell runs in its own subshell, so the notebook stays in /content and a `cd` to any other folder (inside or outside /content) does not persist:

   ```
        ! pwd >>> /content
        ! cd ../ | pwd >>> /content
        ! cd ~ | pwd >>> /content
        ! cd / | pwd >>> /content
        ! cd /content/gdrive | pwd >>> /content
   ```



3. So all newly downloaded or created files and folders **(through "`!`-commands" in this notebook interface without an absolute path)** automatically go to *`/content`*.


4. However, if you are **accessing the server using ssh** (see Step 2), you can make changes in 2 places: 
        a) /content
        b) your google drive folder (wherever it is mounted to)
        
5. The conclusion: 

    **a) only make changes in /content/gdrive**

    **b) always use absolute path in Colab notebook**
    
6. For shared drives: everything works the same as in gdrive/My\ Drive. 
    - some file-locking feature is expected to prevent simultaneous editing 
    - but this has not been tested yet (#learning_through_errors)
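
Note 2 can be reproduced outside Colab. The sketch below assumes, as in IPython, that each `!` line is handed to a separate child shell; `cwd_after_shell_cd` and `drive_path` are hypothetical helper names used only for illustration, and the default `root` is simply the mount point chosen in this notebook.

```python
import os
import posixpath
import shlex
import subprocess


def cwd_after_shell_cd(target: str) -> str:
    """Run `cd target` in a child shell, then report the parent's working directory."""
    # The child shell changes its own directory and exits immediately.
    subprocess.run(["sh", "-c", "cd " + shlex.quote(target)], check=True)
    # The parent process (the notebook kernel, in Colab) is unaffected.
    return os.getcwd()


def drive_path(*parts: str, root: str = "/content/gdrive/My Drive") -> str:
    """Build an absolute path under the mounted drive (root is an assumption)."""
    return posixpath.join(root, *parts)
```

Building every path with something like `drive_path("data", "raw zip")` keeps commands independent of the notebook's working directory, which is the point of conclusion 5b.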

In [22]:
if not googleDriveAlreadySetUp:
    from google.colab import drive
    drive.mount('/content/gdrive')
    googleDriveAlreadySetUp = True

Mounted at /content/gdrive


## **Step 2. Server setup**


---


Notes: 

1. Once set up, open a terminal of your choice, run ***`ssh root@0.tcp.ngrok.io -p [port#]`***, and enter the randomly generated password. You're in!

2. **Sometimes you will encounter the following error message:**
    ```
    Traceback (most recent call last):
    File "<string>", line 1, in <module>
    IndexError: list index out of range
    ```
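   This may (or may not) be caused by the following line, most likely when it runs before ngrok has registered the tunnel and the `tunnels` list is still empty: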
   ```
   ! curl -s http://localhost:4040/api/tunnels | python3 -c \
        "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"
   ```
   
    *#TODO rewrite this line with python, not linux command.*
   
3. **Even so, the connection has been set up. To see the port number, just run the cell again.**
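
One possible way to address the TODO in note 2 is to query the tunnel list from Python and retry until a tunnel shows up. This is only a sketch: the function names are hypothetical, the endpoint is the same `localhost:4040/api/tunnels` URL used by the `curl` line, and the retry count and delay are arbitrary assumptions.

```python
import json
import time
import urllib.request
from typing import Optional, Tuple

# Same local ngrok endpoint that the curl command in this notebook queries.
NGROK_API = "http://localhost:4040/api/tunnels"


def first_public_url(payload: dict) -> Optional[str]:
    """Return the first tunnel's public_url, or None if no tunnel is listed yet."""
    tunnels = payload.get("tunnels") or []
    return tunnels[0]["public_url"] if tunnels else None


def split_host_port(public_url: str) -> Tuple[str, int]:
    """Split a URL such as tcp://host:port into (host, port)."""
    host, port = public_url.split("://", 1)[-1].rsplit(":", 1)
    return host, int(port)


def wait_for_public_url(retries: int = 10, delay: float = 1.0) -> Optional[str]:
    """Poll the ngrok API until a tunnel appears; give up after `retries` attempts."""
    for _ in range(retries):
        try:
            with urllib.request.urlopen(NGROK_API, timeout=5) as resp:
                url = first_public_url(json.load(resp))
        except (OSError, ValueError):
            url = None  # API not reachable yet, or response was not valid JSON
        if url:
            return url
        time.sleep(delay)
    return None
```

An empty tunnel list yields `None` here instead of an `IndexError`, and `split_host_port` gives the port to pass to `ssh -p`.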

In [21]:
if not sshServerAlreadySetUp:
    #Generate root password (secrets is meant for security-sensitive random values)
    import secrets, string
    password = ''.join(secrets.choice(string.ascii_letters + string.digits) for i in range(20))

    #Download ngrok
    ! wget -q -c -nc https://bin.equinox.io/c/4VmDzA7iaHb/ngrok-stable-linux-amd64.zip
    ! unzip -qq -n ngrok-stable-linux-amd64.zip
    #Setup sshd
    ! apt-get install -qq -o=Dpkg::Use-Pty=0 openssh-server pwgen > /dev/null
    #Set root password
    ! echo root:$password | chpasswd
    ! mkdir -p /var/run/sshd
    ! echo "PermitRootLogin yes" >> /etc/ssh/sshd_config
    ! echo "PasswordAuthentication yes" >> /etc/ssh/sshd_config
    ! echo "LD_LIBRARY_PATH=/usr/lib64-nvidia" >> /root/.bashrc
    ! echo "export LD_LIBRARY_PATH" >> /root/.bashrc

    #Run sshd
    get_ipython().system_raw('/usr/sbin/sshd -D &')

    #Ask token
    print("Copy authtoken from https://dashboard.ngrok.com/auth")
    import getpass
    authtoken = getpass.getpass()

    #Create tunnel
    get_ipython().system_raw('./ngrok authtoken $authtoken && ./ngrok tcp 22 &')
    #Print root password
    print("Root password: {}".format(password))
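    #Get public address (wait briefly so ngrok can register the tunnel;
    #querying too early is a likely cause of the IndexError described in the notes)
    import time
    time.sleep(2)
    ! curl -s http://localhost:4040/api/tunnels | python3 -c \
        "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"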
    
    # finished setup
    print("ssh server already setup, do ssh root@0.tcp.ngrok.io -p [port# see below] ")
    sshServerAlreadySetUp = True # mark setup as done so that a re-run takes the else branch
    
else:
    print("ssh server already setup, do ssh root@0.tcp.ngrok.io -p [port# see below] ")   
    print("password", password)
    ! curl -s http://localhost:4040/api/tunnels | python3 -c \
        "import sys, json; print(json.load(sys.stdin)['tunnels'][0]['public_url'])"

Copy authtoken from https://dashboard.ngrok.com/auth
··········
Root password: xjUIZ5DhhLLPaKY8dFik
tcp://0.tcp.ngrok.io:17050
ssh server already setup, do ssh root@0.tcp.ngrok.io -p [port# see below] 


## Optional git/github/ssh-keygen/ssh-add setup


- Once only, run `ssh-keygen`, save the private key (id_rsa) to /content/gdrive/My\ Drive/.ssh/id_rsa, and add the public key in your GitHub settings

- Every time you connect to the server: 
    1. run `eval $(ssh-agent)`
    2. run `ssh-add /content/gdrive/My\ Drive/.ssh/id_rsa`
    3. set up the local user name & email (to be checked):
```
     git config user.name "your-user-name"
     git config user.email "your-email-addr"
```
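
Because a notebook `!` command runs in its own subshell, the variables exported by `eval $(ssh-agent)` are lost as soon as that line finishes. A possible workaround, sketched here under the assumption that `ssh-agent -s` prints Bourne-style `NAME=value; export NAME;` lines, is to parse that output and copy the variables into `os.environ` so later child processes inherit them. `parse_agent_env` and `start_agent` are hypothetical names.

```python
import os
import re
import subprocess
from typing import Dict


def parse_agent_env(output: str) -> Dict[str, str]:
    """Extract SSH_AUTH_SOCK and SSH_AGENT_PID from `ssh-agent -s` output."""
    return dict(re.findall(r"(SSH_AUTH_SOCK|SSH_AGENT_PID)=([^;]+);", output))


def start_agent() -> Dict[str, str]:
    """Start ssh-agent and export its variables into this process's environment."""
    result = subprocess.run(
        ["ssh-agent", "-s"], capture_output=True, text=True, check=True
    )
    env = parse_agent_env(result.stdout)
    os.environ.update(env)  # inherited by processes started afterwards
    return env
```

After calling `start_agent()` in a cell, a later `ssh-add` should be able to find the agent; this has not been verified on Colab itself, so running the commands over ssh remains the safer route.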



In [17]:
! eval $(ssh-agent)
# ! cat /content/gdrive/My\ Drive/.ssh/id_rsa.pub # commented out for security reasons
# Run the following command via ssh server to avoid "Could not open a connection to your authentication agent." error
# ! ssh-add /content/gdrive/My\ Drive/.ssh/id_rsa

Agent pid 1206


## **Step 3. Workspace setup and raw file preparation** (an example using the VQA task and dataset)


---



Directory path: /content/gdrive/Shared\ drives/VQA



---


Notes on file system:
1. `wget -P [path_to_directory]` specifies the directory to download to; it is created if it does not exist.

2. `unzip <zipped_file> -d [path_to_directory]` works similarly.

3. Absolute paths are required, since the current directory cannot be changed in a Colab notebook (running `pwd` always returns `/content`).
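
The download-if-missing and unzip-if-missing pattern used in the cell below can also be written in plain Python, which avoids escaping the spaces in the drive paths. This is a minimal sketch using only the standard library; `ensure_downloaded` and `ensure_unzipped` are hypothetical helpers, not part of the VQA tooling.

```python
import urllib.request
import zipfile
from pathlib import Path


def ensure_downloaded(url: str, dest_dir: str) -> Path:
    """Download url into dest_dir unless the file is already there (like `wget -P`)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    target = dest / url.rsplit("/", 1)[-1]   # keep the remote file name
    if not target.exists():
        urllib.request.urlretrieve(url, str(target))
    return target


def ensure_unzipped(zip_path: str, marker: str, out_dir: str) -> bool:
    """Extract zip_path into out_dir unless marker exists. Returns True if extracted."""
    if Path(marker).exists():
        return False  # a known output file is present, so skip the unzip
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
    return True
```

As in the shell version, `marker` is a file that is expected to exist once extraction has completed, for example the annotations JSON inside the `train` folder.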


---


Notes on data collection:


Irrelevant, omitted.


---

Notes on folder structure:



```
root@addd4560b196:/content/gdrive/Shared drives/VQA# tree -a -C -L 4 ./
./
├── data
│   ├── datahelper (https://github.com/GT-Vision-Lab/VQA)
│   │   └── ...
│   ├── raw zip
│   │   ├── Annotations_Train_abstract_v002.zip
│   │   └── Questions_Train_abstract_v002.zip
│   └── abstract scene
│       └── train
│           ├── abstract_v002_train2015_annotations.json
│           ├── MultipleChoice_abstract_v002_train2015_questions.json
│           ├── OpenEnded_abstract_v002_train2015_questions.json
│           └── images
│               ├── abstract_v002_train2015_000000000000.png
│               ├── abstract_v002_train2015_000000000001.png
│               ├── ...
│               ├── abstract_v002_train2015_000000019998.png
│               └── abstract_v002_train2015_000000019999.png
└── VQA_baseline_with_notes.ipynb

39 directories, 43 files

```
The above tree was simplified by hand after being retrieved with the `tree` command.

Note that you first have to run *`sudo apt-get install tree`* on Google Colab (over ssh).




In [24]:
# Abstract Scene (same as v1.0 release)
print("=============================================================================================")
print("\nCollecting raw training data for abstract scenes...\n")
print("---------------------------------------------------------------------------------------------")

'''Training Annotations'''

# check if Annotations_Train_abstract_v002.zip is downloaded, if not, download it
# ! test -f /content/gdrive/Shared\ drives/VQA/data/raw\ zip/Annotations_Train_abstract_v002.zip \
#     && echo "Annotations_Train_abstract_v002.zip already here, skip download" \
#     || { echo "Annotations_Train_abstract_v002.zip does not exist, start downloading..."; \
#          wget https://s3.amazonaws.com/cvmlp/vqa/abstract_v002/vqa/Annotations_Train_abstract_v002.zip \
#                 -P /content/gdrive/Shared\ drives/VQA/data/raw\ zip;}
# /

# check if Annotations_Train_abstract_v002.zip is unzipped, if not, unzip it
# ! test -f /content/gdrive/Shared\ drives/VQA/data/abstract\ scene/train/abstract_v002_train2015_annotations.json \
#     && echo "abstract_v002_train2015_annotations.json already here, skip unzip" \
#     || { echo "abstract_v002_train2015_annotations.json does not exist, start unzipping..."; \
#          unzip /content/gdrive/Shared\ drives/VQA/data/raw\ zip/Annotations_Train_abstract_v002.zip \
#                -d /content/gdrive/Shared\ drives/VQA/data/abstract\ scene/train;}
# /

''' Training Questions '''
# check if Questions_Train_abstract_v002.zip is downloaded, if not, download it
# ! test -f /content/gdrive/Shared\ drives/VQA/data/raw\ zip/Questions_Train_abstract_v002.zip \
#     && echo "Questions_Train_abstract_v002.zip already here, skip download" \
#     || { echo "Questions_Train_abstract_v002.zip does not exist, start downloading..."; \
#          wget https://s3.amazonaws.com/cvmlp/vqa/abstract_v002/vqa/Questions_Train_abstract_v002.zip \
#                 -P /content/gdrive/Shared\ drives/VQA/data/raw\ zip;}
# /

# check if Questions_Train_abstract_v002.zip is unzipped, if not, unzip it
# ! test -f /content/gdrive/Shared\ drives/VQA/data/abstract\ scene/train/OpenEnded_abstract_v002_train2015_questions.json \
#     && echo "OpenEnded(MultipleChoice)_abstract_v002_train2015_questions.json already here, skip unzip" \
#     || { echo "OpenEnded(MultipleChoice)_abstract_v002_train2015_questions.json does not exist, start unzipping..."; \
#          unzip /content/gdrive/Shared\ drives/VQA/data/raw\ zip/Questions_Train_abstract_v002.zip \
#                -d /content/gdrive/Shared\ drives/VQA/data/abstract\ scene/train;}
# /

''' Training Images '''
# check if scene_img_abstract_v002_train2015.zip is downloaded, if not, download it
# ! test -f /content/gdrive/Shared\ drives/VQA/data/raw\ zip/scene_img_abstract_v002_train2015.zip \
#     && echo "scene_img_abstract_v002_train2015.zip already here, skip download" \
#     || { echo "scene_img_abstract_v002_train2015.zip does not exist, start downloading..."; \
#          wget https://s3.amazonaws.com/cvmlp/vqa/abstract_v002/scene_img/scene_img_abstract_v002_train2015.zip \
#                 -P /content/gdrive/Shared\ drives/VQA/data/raw\ zip;}
# /

# check if scene_img_abstract_v002_train2015.zip is unzipped, if not, unzip it
# ! test -f /content/gdrive/Shared\ drives/VQA/data/abstract\ scene/train/images/abstract_v002_train2015_000000019999.png \
#     && echo "abstract scene/train/images already here, skip unzip" \
#     || { echo "abstract scene/train/images does not exist, start unzipping..."; \
#          unzip /content/gdrive/Shared\ drives/VQA/data/raw\ zip/scene_img_abstract_v002_train2015.zip \
#                -d /content/gdrive/Shared\ drives/VQA/data/abstract\ scene/train/images;}
# /
print()

print("raw data for abstract training collected.")

print("---------------------------------------------------------------------------------------------")


Collecting raw training data for abstract scenes...

---------------------------------------------------------------------------------------------
Annotations_Train_abstract_v002.zip already here, skip download
abstract_v002_train2015_annotations.json already here, skip unzip
Questions_Train_abstract_v002.zip already here, skip download
OpenEnded(MultipleChoice)_abstract_v002_train2015_questions.json already here, skip unzip
scene_img_abstract_v002_train2015.zip already here, skip download
abstract scene/train/images already here, skip unzip

raw data for abstract training collected.
---------------------------------------------------------------------------------------------
