# What is Colab?

* Colab is a free [Jupyter Notebook](https://jupyter.org/) environment hosted by Google that allows you to develop and run code and analyze data using computing resources in the cloud.  In this class, we'll typically use it for programming assignments in Python.


* Colab Notebooks (like this one) consist of "Cells" that help organize code and text. From within Colab, you can add a Code Cell or a Text Cell by clicking the "+ Code" or "+ Text" buttons on the top left. 


* Google provides lots more details and tips about Colab in their [tutorials](https://colab.research.google.com/notebooks/welcome.ipynb).  If you are new to Colab, you might find these tutorials helpful to get up to speed.


* To execute the code in a cell, press `shift+enter`. Try it out on the code cell below.

In [13]:
5 + 5

10

# Creating a Colab Notebook

* You need to be signed into a Google account in order to create, edit, and save Colab Notebooks.  You can create a new Notebook and find your saved Notebooks by going to https://colab.research.google.com/ while logged into your Google account. 


* We recommend that you use your Berkeley Gmail account rather than a personal Gmail for this class in order to avoid confusion.  When viewing a notebook, you can make sure that you're using the right Gmail account by clicking on your profile icon at the top right, and switch accounts if necessary.


* Saved notebooks will then be stored in your account on Google Drive.  

# Viewing a Colab hosted on Github

* In addition to Google Drive, Colab Notebooks can also be opened in the browser if they are stored on Github. For this class, assignments or examples will be posted on the course Github. Jupyter Notebooks are files with the extension ".ipynb". 


* If a Jupyter Notebook is posted on Github, you can open it in Colab by copying the web address of the '.ipynb' file and pointing your browser to `https://colab.research.google.com/github/{github-path-to-notebook}`, where `{github-path-to-notebook}` is the part of the address that comes after `github.com/`.


* For example, this notebook can be opened in Colab via `https://colab.research.google.com/github/dbamman/nlp20/blob/master/setup/Colab_Intro.ipynb`


* Once you've opened a Notebook in Colab, you can run the code and make edits. To save your changes, click on File->Save in the menu.  **If you are editing a copy of a Colab that you opened from Github, your edits will only be saved once you save a copy in your Google Drive.** You'll be prompted to do this if you try to save and haven't already done this step.


* For assignments, Colab notebooks will typically be posted on the course Github page and will have instructions, starter code, and space for you to add your own code. To get started on an assignment, open the starter notebook from Github in Colab and then save a copy into your Google Drive so that you can make changes to complete the assignment.
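
The URL pattern described above can be sketched in a few lines of Python. This is only an illustration: the function name `colab_url_from_github` and the constant `COLAB_GITHUB_PREFIX` are made up for this example, and the sketch assumes the notebook address is a regular `https://github.com/...` link ending in `.ipynb`.

```python
from urllib.parse import urlparse

# Prefix taken from the pattern described in the text above.
COLAB_GITHUB_PREFIX = "https://colab.research.google.com/github"


def colab_url_from_github(github_url):
    """Turn the web address of a notebook on Github into a Colab address.

    Hypothetical helper for illustration; it only rearranges the URL string
    and does not check that the notebook actually exists.
    """
    parsed = urlparse(github_url)
    if parsed.netloc != "github.com":
        raise ValueError("expected an address on github.com")
    if not parsed.path.endswith(".ipynb"):
        raise ValueError("expected the address of an .ipynb file")
    # parsed.path starts with "/", so it can be appended directly.
    return COLAB_GITHUB_PREFIX + parsed.path
```

Calling it on `https://github.com/dbamman/nlp20/blob/master/setup/Colab_Intro.ipynb` should give back the Colab address shown in the example above.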

# What is a Runtime?

* Each time you open a Notebook in Colab, you are actually connecting to a computer (a server running the Linux operating system) in the Cloud, hosted by Google. When you execute your code, it's actually running on that computer and sending back any results to display in your browser.  In Colab, this connection between a Google server and your browser is called a 'Runtime'.


* Because Google provides these servers free to Colab users, running code on Colab lets you take advantage of powerful hardware to run computations faster than you could on a laptop, for example.


* Because you need to connect to a "Runtime" in order to use Colab, you'll need to re-run any setup code (such as installing Python libraries or downloading data) each time you re-connect. This can take a few minutes, but it's a necessary trade-off in order to be able to use these free computational resources.
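
If you are curious about the computer behind your Runtime, a small sketch like the one below reports a few basic facts using only Python's standard library. The function name `describe_runtime` is invented for this example, and the values you see will depend on the machine that runs the code.

```python
import os
import platform


def describe_runtime():
    """Collect basic facts about the machine that is running this code.

    Hypothetical helper for illustration; it uses only the standard library.
    """
    return {
        "operating_system": platform.system(),        # e.g. "Linux" on a Linux server
        "python_version": platform.python_version(),  # version of the Python interpreter
        "cpu_count": os.cpu_count(),                   # number of CPUs, or None if unknown
        "working_directory": os.getcwd(),              # folder where files are read and written
    }


for name, value in describe_runtime().items():
    print(f"{name}: {value}")
```

Running the same cell on your own laptop and in Colab is one way to see that the code really executes on a different computer.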

# Installing Software Libraries

* We'll use Python 3 for programming assignments in this class. Colab Runtimes come with Python installed by default, along with some common libraries. For example, the `numpy` library is already installed, so you can import it to use in your code without any additional setup.

In [0]:
import numpy as np

* If you need to install libraries that aren't installed by default, you can often install them via [pip](https://pypi.org/project/pip/). If you have some familiarity with the command line, you can run typical command-line commands in a Code Cell by prefixing them with "!".

In [3]:
!pip install numpy



* But don't worry if you're not familiar with the command line or installing Python libraries. All of these sorts of commands for installation and setup can be stored inside a Notebook, so for assignments, the course staff will try to take care of this whenever possible.
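
As a rough sketch of how you might check whether a library is already available before installing it, the snippet below uses Python's standard `importlib` module. The helper name `is_installed` is made up for this example and is not part of Colab or pip; it is meant for top-level names such as `numpy`.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be imported.

    Hypothetical helper for illustration; it looks the module up without
    actually importing it.
    """
    return importlib.util.find_spec(module_name) is not None


# "json" ships with Python, so this check should succeed in any Runtime.
print(is_installed("json"))
```

In a Code Cell you could call `is_installed('numpy')` and only run `!pip install numpy` when it returns `False`.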

# Downloading Data

* To work with data in Colab, we need to download the data to the computer you're connecting to in your Runtime. Data can be downloaded using the `!wget` command or by connecting to files on Google Drive.


* When possible, data for course assignments will be uploaded to the course Github and then can be downloaded from within Colab. The easiest way to do this is to download the zip file of the course Github, which will contain any data files inside it.


* Again, for assignments, the course staff will try to include commands for downloading data, so don't worry if you're not familiar yet with how this works.

In [8]:
!wget https://github.com/dbamman/nlp20/archive/master.zip
!unzip master.zip
!rm master.zip

--2020-01-20 21:49:27--  https://github.com/dbamman/nlp20/archive/master.zip
Resolving github.com (github.com)... 140.82.118.3
Connecting to github.com (github.com)|140.82.118.3|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://codeload.github.com/dbamman/nlp20/zip/master [following]
--2020-01-20 21:49:27--  https://codeload.github.com/dbamman/nlp20/zip/master
Resolving codeload.github.com (codeload.github.com)... 140.82.114.10
Connecting to codeload.github.com (codeload.github.com)|140.82.114.10|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/zip]
Saving to: ‘master.zip’

master.zip              [ <=>                ]   4.65K  --.-KB/s    in 0.002s  

2020-01-20 21:49:28 (2.66 MB/s) - ‘master.zip’ saved [4763]

Archive:  master.zip
15b18d3fff31a75ca4dd600d5cf74da4920eb689
replace nlp20-master/README.md? [y]es, [n]o, [A]ll, [N]one, [r]ename: N
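
The same download, unzip, and clean-up steps can also be written in plain Python, which may be handy if you prefer not to use command-line tools. This is a minimal sketch using only the standard library; the function names `unzip_and_remove` and `download_and_unzip` are invented for this example. Note one difference from the `unzip` command: `extractall` overwrites existing files without asking.

```python
import urllib.request
import zipfile
from pathlib import Path


def unzip_and_remove(zip_path, dest_dir="."):
    """Extract a zip file into dest_dir, delete the zip, and return the extracted names."""
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        archive.extractall(dest_dir)  # overwrites existing files without prompting
    Path(zip_path).unlink()           # same effect as `rm` on the zip file
    return names


def download_and_unzip(url, dest_dir=".", zip_name="master.zip"):
    """Download a zip file from url, then extract it and remove the zip.

    Hypothetical helper for illustration; it needs network access to run.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    zip_path = dest / zip_name
    urllib.request.urlretrieve(url, str(zip_path))
    return unzip_and_remove(zip_path, dest)
```

For the course repository, the call would be `download_and_unzip('https://github.com/dbamman/nlp20/archive/master.zip')`.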


* Now we've downloaded the course Github repository into a folder called 'nlp20-master'. We can look inside it using the `ls` command. (If this command also shows you a folder called 'sample_data', that's just a default folder that Colab always provides.)

In [9]:
!ls

nlp20-master  sample_data


In [10]:
!ls nlp20-master

README.md  setup


* Let's open a file from the course Github using Python and print out the contents.

In [11]:
open('nlp20-master/setup/example_data.txt').read().split('\n')

['a, 1',
 'b, 2',
 'c, 3',
 'd, 4',
 'e, 5',
 'f, 6',
 'g, 7',
 'h, 8',
 'i, 9',
 'j, 10',
 '']
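
Once the lines are loaded, you will usually want to turn them into a Python data structure. The sketch below assumes every non-empty line has the form `key, number`, as in the output above; the function names `parse_pairs` and `load_pairs` are made up for this example. It also opens the file in a `with` block so the file is closed automatically.

```python
def parse_pairs(text):
    """Parse lines like 'a, 1' into a dictionary such as {'a': 1}."""
    pairs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines, such as the empty string at the end
        key, value = line.split(",", 1)
        pairs[key.strip()] = int(value)  # int() ignores surrounding spaces
    return pairs


def load_pairs(path):
    """Read a file of 'key, number' lines and return its contents as a dictionary."""
    with open(path) as f:
        return parse_pairs(f.read())
```

After downloading the course repository, you could call `load_pairs('nlp20-master/setup/example_data.txt')` to get the letters mapped to their numbers.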