# An Introduction to Google Colaboratory

> This notebook contains materials originally developed for a virtual session at the NC State University DELTA [Summer Shorts in Instructional Technologies 2021](https://sites.google.com/ncsu.edu/summer-shorts-2021) delivered on August 6, 2021. Summer Shorts in Instructional Technologies focuses on leveraging innovative NC State learning technologies and strategies to improve student success.

This is an interactive, Python computational notebook that provides a hands-on demonstration of using Google Colaboratory, or “Colab” for short. Colab is a computational notebook environment that allows users to run Python code in the browser without having to download or install anything.

## About the Workshop

### Agenda

1. What is Google Colab?
2. How to use a Google Colab Notebook
    - Text and code cells
    - User interface
    - Useful features in Colab
3. Accessing data in a Colab Notebook
4. Sharing Colab Notebooks
5. Opening Colab Notebooks

### Learning Objectives

After this workshop, you will be able to create and share Colab notebooks and understand how Colab can support instructional and research initiatives.


## What is a computational notebook and what does Google Colab offer?

Computational notebooks are a type of literate programming document, where you write and evaluate code, view your data and plots, and write explanations and full text all in one document. While there are many forms of computational notebooks, one of the most common today is the Jupyter notebook, which we see particularly often in data science. Jupyter started in the Python ecosystem, but can now support interactive programming in many different languages. 

Computational notebooks are a great learning tool, since they allow you to see immediate output and easily document the code you are writing. They are often used to prototype code, where you might need to repeatedly adjust and re-run chunks of code as part of an analysis. You can then export the code into a standard script file, such as a `.py` file. 

Google Colaboratory is a free-to-use, hosted version of Jupyter. It allows you to create notebooks and run them using Google's servers and processing power, and even offers free access to GPUs. Notebooks are stored in your Google Drive, can be shared with others, and can be exported to the standard open-source `.ipynb` format used by Jupyter or to plain Python files. There are limits on the memory allocated to each notebook and on how long notebooks can run, but Google provides a remarkable amount of resources at no financial cost.
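
To make the export idea concrete, here is a minimal sketch of pulling the code cells out of a notebook file using only the standard library. It assumes the common `.ipynb` layout (a top-level `cells` list whose entries have `cell_type` and `source` fields). The function name `notebook_to_script` and the tiny in-memory notebook are made up for illustration; Colab's built-in export is the simpler route in practice.

```python
import json

def notebook_to_script(notebook_json):
    """Join the source of every code cell in a notebook into one script string."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip text (markdown) cells
        source = cell.get("source", "")
        # "source" may be stored as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip("\n"))
    return "\n\n".join(chunks) + "\n"

# A tiny, hypothetical notebook with one text cell and two code cells
example = json.dumps({
    "nbformat": 4,
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["x = 1\n", "y = 2"]},
        {"cell_type": "code", "source": "print(x + y)"},
    ],
})

script = notebook_to_script(example)
print(script)
```

The text cell is dropped and the two code cells are joined with a blank line between them, which is roughly what a script export looks like.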

## A brief overview of a Python notebook and the Colab interface

### Cells

The main structure of a notebook consists of a list of cells. There are two types of cells in Colab:

1. **Text cells** for including formatted, descriptive text
1. **Code cells** for writing and executing Python code

#### Text Cells

Text cells use markdown syntax to generate formatted text that can be used to add context and organization to your notebook. **Double-click** on this text cell to enable edit mode.

Exit a text cell by selecting another cell or clicking on the "Close markdown editor" button in the top right of the cell.
<img src="https://raw.githubusercontent.com/ncsu-libraries-data-vis/intro-to-colab-DELTA-summer-shorts/main/images/colab_close_edit.PNG" alt="A pencil icon with a line through it, the close markdown editor button" style="display:inline; width: 4%"/>

**This example markdown:**

```markdown
# Section 1
## Sub-section 1.1
### Sub-section 1.1.1
# Section 2
## Sub-section 2.1

A list demonstrating:
- **Bold text**
- *Italicized text*
- [Links](https://www.lib.ncsu.edu/services/data-visualization)
- `code`
    1. Ordered lists
    2. More ordered lists
```
**Produces this formatted text:**

# Section 1
## Sub-section 1.1
### Sub-section 1.1.1
# Section 2
## Sub-section 2.1

A list demonstrating:
- **Bold text**
- *Italicized text*
- [Links](https://www.lib.ncsu.edu/services/data-visualization)
- `code`
    1. Ordered lists
    2. More ordered lists

See the [Markdown Guide](https://www.markdownguide.org/cheat-sheet/) to learn more about the Markdown syntax.

#### Code Cells

Code cells contain lines of code. Output appears just below the cell. To run a code cell, click inside of the cell and do one of the following:

* Click the **Play icon** in the left gutter of the cell
* Type **Cmd/Ctrl+Enter** to run the cell in place
* Type **Shift+Enter** to run the cell and move focus to the next cell (adding one if none exists)
* Type **Alt+Enter** to run the cell and insert a new code cell immediately below it

There are additional options for running some or all cells in the **Runtime** menu.

In [2]:
# Create a list of words
animals = ["chicken", "lizard", "tiger", "cheetah", "penguin", "crab"]

# For each word in the list
for word in animals:
    # If the word starts with the letter c
    if word.startswith("c"):
        # Print out the word
        print(word)

chicken
cheetah
crab


### Colab User Interface

Colab provides an interface for easily navigating and editing a notebook as well as helpful utilities to support code generation and access to files.

#### Table of Contents <img src="https://raw.githubusercontent.com/ncsu-libraries-data-vis/intro-to-colab-DELTA-summer-shorts/main/images/colab_tableofcontents.PNG" alt="three lines, the table of contents button" style="display:inline; width: 3%"/>

The table of contents, available on the left side of Colab, is populated using at most one section header (i.e., markdown text preceded by one or more `#`) from each text cell. Section headers can be used to organize your notebook and improve navigation for other users.
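
The "at most one section header per text cell" rule can be sketched in a few lines of Python. This is only an illustration of the idea, not how Colab builds its table of contents; the names `first_heading` and `outline` and the sample cells are hypothetical.

```python
import re

# One to six "#" characters, whitespace, then the heading text
HEADING = re.compile(r"^(#{1,6})\s+(.+?)\s*$")

def first_heading(cell_text):
    """Return (level, title) for the first header line in a text cell, or None."""
    for line in cell_text.splitlines():
        match = HEADING.match(line)
        if match:
            return len(match.group(1)), match.group(2)
    return None

def outline(text_cells):
    """Build an indented outline using at most one header per text cell."""
    lines = []
    for cell in text_cells:
        heading = first_heading(cell)
        if heading:
            level, title = heading
            lines.append("  " * (level - 1) + title)
    return lines

# Hypothetical text cells: the second has no header, the third has two
cells = [
    "# Section 1\nIntro text",
    "Just a paragraph, no header",
    "## Sub-section 1.1\n### Ignored second header",
]
print("\n".join(outline(cells)))
```

Only the first header of the third cell appears in the outline, which is why it can help to give each major section its own text cell.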

#### Code Snippets <img src="https://raw.githubusercontent.com/ncsu-libraries-data-vis/intro-to-colab-DELTA-summer-shorts/main/images/colab_snippets.PNG" alt="open and close brackets, the code snippets button" style="display:inline; width: 3%"/>

Code snippets are accessible through the angled brackets (< >) button on the left side of Colab and provide a searchable collection of pre-written code snippets that provide specific functionality such as importing data from Google Sheets, installing additional Python libraries, and saving data to Google Drive.

#### Files <img src="https://raw.githubusercontent.com/ncsu-libraries-data-vis/intro-to-colab-DELTA-summer-shorts/main/images/colab_files.PNG" alt="a file folder icon, the file manager button" style="display:inline; width: 3%"/>

You can browse the files and folders available in the Colab environment through the file button on the left side of Colab. By default, the file browser starts in a folder titled *content* containing one folder titled *sample data*. The file browser can be helpful after loading or connecting to external data storage such as Google Drive or a GitHub repository.

Through the Files interface you can upload a file from your local machine or connect to your Google Drive. Note that files uploaded into Colab will not persist if the page is closed or reloaded. If you plan to share your notebook or use it across multiple development sessions, connect to Google Drive or to an external data source instead. *Accessing various data sources is covered later in this notebook.*

### Features

Colab includes several features that are provided in Jupyter Notebooks, such as access to defined variables and functions across cells and cell output, as well as some additional code exploration and completion features.

#### Sharing code across cells

Using code cells allows you to separate logical chunks of code and run them independently. This organization makes it easier to test and troubleshoot new code one section at a time, and it is well suited for creating step-by-step learning procedures.

**Once you run a cell, you can use any variables or functions defined in that cell in other cells.**

In [9]:
event_name = "DELTA Summer Shorts"
workshop_topic = "Google Colaboratory"

In [4]:
# A function that generates a welcome message
def generate_welcome_message(event, topic):
    return f"Welcome to our {event} session on {topic}!"

In [10]:
welcome = generate_welcome_message(event_name, workshop_topic)

print(welcome)

Welcome to our DELTA Summer Shorts session on Google Colaboratory!


#### Cell output

By default, Colab displays the value of the last expression in a code cell. To show a value produced anywhere else in the cell, use the `print()` function.

In [11]:
print('This string is printed using the print function')

'This line is not printed'

'This string is printed because it is the last line in a code cell'

This string is printed using the print function


'This string is printed because it is the last line in a code cell'
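
The "last value is shown" behavior can be imitated in plain Python. The sketch below is a rough approximation, not the notebook's actual implementation: the hypothetical helper `run_cell` runs a cell's source and hands back the value of a trailing expression, if there is one.

```python
import ast

def run_cell(source, namespace):
    """Run cell source; return the value of a trailing expression, or None."""
    tree = ast.parse(source)
    last_value = None
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        # Split off the trailing expression so its value can be captured
        last = tree.body.pop()
        exec(compile(tree, "<cell>", "exec"), namespace)
        last_value = eval(compile(ast.Expression(last.value), "<cell>", "eval"), namespace)
    else:
        exec(compile(tree, "<cell>", "exec"), namespace)
    return last_value

namespace = {}
# The first string is evaluated and discarded; only the final expression is returned
cell = "'not shown'\ntotal = 2 + 3\ntotal * 2"
result = run_cell(cell, namespace)
if result is not None:
    print(repr(result))  # 10
```

A cell that ends in an assignment, such as `total = 2 + 3`, has no trailing expression, so nothing would be displayed.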

Colab also supports formatted output such as tables and visualizations.

In the following cell the [pandas](https://pandas.pydata.org/) library is used to create a DataFrame of fruits and fruit counts. When this DataFrame is printed it is automatically displayed as a formatted table.

In [12]:
import pandas as pd

# Create a table of fruits and the number of each type
fruit = pd.DataFrame({'fruit': ['strawberries', 'pineapples', 'bananas'], 'count': [20, 2, 6]})
fruit

|   | fruit | count |
|---|---|---|
| 0 | strawberries | 20 |
| 1 | pineapples | 2 |
| 2 | bananas | 6 |


In the following cell the [Altair](https://altair-viz.github.io/) visualization library is used to create a horizontal bar chart of fruit counts from the `fruit` DataFrame. Colab can display visualizations created with numerous Python visualization libraries.

In [13]:
import altair as alt

alt.Chart(fruit).mark_bar().encode(
    x="count",
    y="fruit"
).properties(
    width=200,
    height=150
)


#### Code help

Colab provides coding support in the form of easily accessible code documentation and automatic completion.

One way to look up documentation for a specific function is to type the name of the function with a ? after it and run the cell. This will bring up a help window.

In [14]:
#Run this cell to find out what len does
len?
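
The `?` suffix is a notebook convenience. As a point of comparison, the same documentation can be reached in any Python session through docstrings. The helper name `doc_summary` below is made up for this sketch.

```python
import inspect

def doc_summary(obj):
    """Return the first line of an object's docstring, or an empty string."""
    doc = inspect.getdoc(obj) or ""
    return doc.splitlines()[0] if doc else ""

def generate_welcome_message(event, topic):
    """Build a welcome message for an event and topic.

    Both arguments are plain strings.
    """
    return f"Welcome to our {event} session on {topic}!"

print(doc_summary(len))
print(doc_summary(generate_welcome_message))

# help(len) prints the full help text in any Python session
```

Writing a docstring for your own functions means they show up in the help window and pop-ups in the same way as built-in functions.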

Using the pandas library imported earlier, you can test the autocompletion functionality in the cell below by typing a **Period** (**.**) after `pd`. You can also open the completion list by pressing **Ctrl+Space**.

In [None]:
# Add a period "." after pd to view a list of methods available on the pandas module
pd

A documentation pop-up will also appear after you type the open parenthesis of a function call. You can also open the documentation pop-up by pressing **Ctrl+Shift+Space** when the cursor is between the parentheses, or by hovering over the method name.

In [None]:
# Add an open parenthesis "(" after DataFrame to view a pop-up containing documentation for the DataFrame method
pd.DataFrame

## Accessing data

There are several ways to connect to data sources within a notebook. In addition to the ability to upload files from your local machine and connect to external data sources programmatically, Colab provides simple integration with Google Drive and Google Sheets.

### Mount your Google Drive

It is possible to mount the entire contents of your Google Drive in the Colab environment. Mounting a Drive can be accomplished in two lines of code:

```python
from google.colab import drive
drive.mount('/gdrive')
```

You can access and insert these prewritten lines using the following methods:

1. In the "Files" menu in the left sidebar, click the "Mount Drive" button to automatically add a new cell into your notebook with the code to mount your Drive.
1. In the "Code snippets" menu in the left sidebar, search for "mounting" and the first result should provide the code. Copy and paste the code into a cell in your notebook.

*Note that before you mount your Drive you will have to authenticate with Google using an authentication process provided in the cell output.*

In [None]:
# Copy and paste the code snippet to mount your Google Drive in this cell and then run the cell


Once your Drive is mounted, you can explore its contents in the "Files" menu in the left sidebar by clicking the "Up one level" folder button (above the "Sample Data" folder) and selecting the "gdrive" folder. Your My Drive and any Shared Drives will be accessible. You can now connect to any data file using its file path in your code. For example, to access the file "vegetable_data.csv" with the folder structure "My Drive > test_data > vegetable_data.csv", you would use the path `/gdrive/MyDrive/test_data/vegetable_data.csv`.

*Note that the path to any file in your My Drive, even one shared with other people, is specific to your own My Drive folder arrangement. For multiple collaborators, it may be best to use a Shared Drive so that everyone can use the same path.*
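
As a small sketch of how the folder structure maps onto a file path, the hypothetical helper `drive_path` below joins folder names under the mount point. The defaults (`/gdrive` and `MyDrive`) are taken from the example path above; nothing here touches Drive itself, it only builds the string.

```python
from pathlib import PurePosixPath

def drive_path(*parts, mount_point="/gdrive", root="MyDrive"):
    """Build the path of a file under a mounted Drive folder."""
    return str(PurePosixPath(mount_point, root, *parts))

# "My Drive > test_data > vegetable_data.csv" becomes:
print(drive_path("test_data", "vegetable_data.csv"))
# /gdrive/MyDrive/test_data/vegetable_data.csv
```

Building paths from their parts keeps the mount point in one place, which helps if you later mount Drive somewhere other than `/gdrive`.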

In [None]:
# Connect to the data in "My Drive > test_data > vegetable_data.csv" using a pandas method
pd.read_csv("/gdrive/MyDrive/test_data/vegetable_data.csv")

### Connect to Google Sheets

Connecting to Google Sheets can be accomplished programmatically. This code is available in the code snippet "Importing data from Google Sheets". The next cell contains part of the code from this snippet for authenticating and connecting to Google Sheets.

*Note that before you can connect to Google Sheets you will have to authenticate with Google using an authentication process provided in the cell output.*

In [None]:
# Authenticate with and connect to Google Sheets
from google.colab import auth
auth.authenticate_user()

import gspread
from google.auth import default

# Use the credentials from the authenticated Colab session
creds, _ = default()
gc = gspread.authorize(creds)

After connecting, you have access to any Google Sheet available to you in Drive, including Sheets shared by other people and Sheets contained in Shared Drives.

In [None]:
# List the names of the spreadsheets you have access to
[f["name"] for f in gc.list_spreadsheet_files()]

You can access the data contained in a Google Sheet by referencing the file name and the sheet within that file from which you want to extract data. 

In [None]:
# Get the first sheet (sheet1) in the Google Sheet named "animals_data"
worksheet = gc.open('animals_data').sheet1

# get_all_values gives a list of rows.
rows = worksheet.get_all_values()
print(rows)

# Convert to a DataFrame and render.
pd.DataFrame.from_records(rows)
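
When the rows come back as a list of lists, the first row is often the header. The sketch below shows one way to turn such rows into records keyed by column name. The helper name `rows_to_records` and the sample rows are hypothetical, and it assumes the first row really is a header.

```python
def rows_to_records(rows):
    """Turn a list of rows into a list of dicts keyed by the first (header) row."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]

# Hypothetical rows shaped like the list of lists described above;
# cell values typically arrive as strings
sample_rows = [
    ["animal", "count"],
    ["chicken", "3"],
    ["crab", "12"],
]
records = rows_to_records(sample_rows)
print(records)
```

Passing `records` to `pd.DataFrame(records)` would give a table with named columns, whereas building the DataFrame directly from the rows treats the header row as ordinary data.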

### Connect to external data source

Any dataset that is publicly available via a URL can be loaded into Colab. This is a convenient method if your data can be publicly accessible. For example, in our Python workshops we publish our workshop activity data to a public GitHub repository for easy access by all participants.

In [None]:
# A URL pointing to the location of a text file
file_url = "https://raw.githubusercontent.com/ncsu-libraries-data-vis/introduction-to-programming-with-python/main/International_cats_of_mystery_US_chapter.txt"

# Load the data into a pandas DataFrame
pd.read_csv(file_url, delimiter=";")
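
The `delimiter` argument tells the parser which character separates the columns. To show its effect without a network connection, here is a small sketch using the standard library's `csv` module on a made-up string of semicolon-delimited text. The sample contents and the helper name `parse_delimited` are hypothetical and are not the contents of the file above.

```python
import csv
import io

# Hypothetical semicolon-delimited text standing in for a downloaded file
sample_text = "name;city;missions\nWhiskers;Raleigh;3\nMittens;Durham;5\n"

def parse_delimited(text, delimiter=";"):
    """Parse delimited text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)

rows = parse_delimited(sample_text)
print(rows[0]["city"])  # Raleigh

# With the wrong delimiter, each line is read as a single column
wrong = parse_delimited(sample_text, delimiter=",")
print(list(wrong[0].keys()))  # ['name;city;missions']
```

If a loaded table shows up as one wide column, a mismatched delimiter is a likely cause.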

## Sharing a notebook from Colab

You can share Colab notebooks much as you would files in other Google apps, like Google Docs or Google Sheets. In the top right corner of the Colab interface, there is a share button: <img src="https://raw.githubusercontent.com/ncsu-libraries-data-vis/intro-to-colab-DELTA-summer-shorts/main/images/colab_share.PNG" alt="Colab share button" style="display:inline; width: 10%; vertical-align: middle;" width="10%"/> It lets you share in two different ways.

### Add editors

If you want to add co-editors to your Colab notebook, you can enter the names and/or emails of collaborators. They will be able to edit your original version of the notebook. 


**Note**: Colab notebooks are difficult to edit simultaneously. We would not recommend trying to have multiple people edit notebooks at the same time.

<img src="https://raw.githubusercontent.com/ncsu-libraries-data-vis/intro-to-colab-DELTA-summer-shorts/main/images/colab_share_options.PNG" alt="sharing options for Google Colab" style="width: 50%"/>

### Share a link

As in other Google apps, you can create a link that is restricted to people you have added, available to NCSU affiliates (login required), or open to anyone with the link. People who open the link can be editors, commenters, or viewers.

<img src="https://raw.githubusercontent.com/ncsu-libraries-data-vis/intro-to-colab-DELTA-summer-shorts/main/images/colab_link_sharing.PNG" alt="sharing options for Google Colab" style="width: 50%"/>

People who are "viewers" on a notebook cannot make edits, but they can make their own personal copy of a notebook. That will allow them to edit and interact with the notebook without changing the original version. We use this in instructional settings where we would like to create a template or sample, then have students make a copy to practice in.

In a view-only notebook, there is a "Copy to Drive" button that saves a copy of the notebook to your personal Google Drive. <img src="https://raw.githubusercontent.com/ncsu-libraries-data-vis/intro-to-colab-DELTA-summer-shorts/main/images/colab_copy_to_drive.PNG" alt="Copy to Drive button" style="display:inline; width: 15%; vertical-align: middle;"/>

## Finding and opening notebooks in Colab

You can find and open Jupyter or Colab notebooks in Colab in several different ways, including through direct links as already mentioned. 

If you're browsing GitHub or other open Git repository hosts, you might come across repositories with an "Open in Colab" button, like the one below.

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/notebooks/intro.ipynb)

These buttons open a specific notebook inside Colab, and allow you to create a copy for your own account if you want. In the Libraries, we use these buttons in our workshop repositories to make it easier for attendees to access our materials. 
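
A badge like this is just a linked image, so the markdown for it can be generated. The sketch below assumes the commonly used link pattern for notebooks hosted on GitHub (`colab.research.google.com/github/<user>/<repo>/blob/<branch>/<path>`); the function names and the example repository are hypothetical, so check the generated link against your own repository before publishing it.

```python
# Badge image used by the "Open In Colab" button shown above
BADGE_IMAGE = "https://colab.research.google.com/assets/colab-badge.svg"

def colab_url(user, repo, path, branch="main"):
    """Build a link that opens a GitHub-hosted notebook in Colab."""
    return f"https://colab.research.google.com/github/{user}/{repo}/blob/{branch}/{path}"

def colab_badge(user, repo, path, branch="main"):
    """Build the markdown for an "Open In Colab" badge."""
    return f"[![Open In Colab]({BADGE_IMAGE})]({colab_url(user, repo, path, branch)})"

# Hypothetical user, repository, and notebook path
print(colab_badge("example-user", "example-repo", "notebooks/intro.ipynb"))
```

Pasting the printed line into a README or a text cell renders the same kind of button shown above.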

Colab also offers several other options for opening notebooks from within the application. If you go to "File" -> "Open notebook" in Colab or go straight to [colab.research.google.com](http://colab.research.google.com), you'll reach the home screen of the app. In a pop-up, you'll have the option to view example notebooks, view your own recent notebooks, open notebooks from Google Drive or GitHub, or upload Jupyter notebooks from your own computer. Opening from GitHub is notable, since it lets you identify a GitHub user, browse repositories, and open specific notebooks all within Colab.

<img alt="Colab Open Notebook Interface" src="https://raw.githubusercontent.com/ncsu-libraries-data-vis/intro-to-colab-DELTA-summer-shorts/main/images/colab-open-interface.png" style="width: 75%" width="75%"/>

## Wrapping Up

### Discussion Questions

What are potential use cases for Google Colab in instructional settings? Research settings?

What are the limitations?

### Further Resources

- Learn more about Colaboratory in general in the [Welcome Notebook](https://colab.research.google.com/notebooks/welcome.ipynb).
- Learn more about coding in Python:
    - [A Byte of Python](https://python.swaroopch.com/) is a great intro book and reference for Python
    - [Official Python documentation and tutorials](https://docs.python.org/3/)
    - [Real Python](https://realpython.com/) contains a lot of different tutorials at different levels
    - [LinkedIn Learning](https://www.lynda.com/Python-training-tutorials/415-0.html) is free with NC State accounts and contains several video series for learning Python
    - [Dataquest](https://www.dataquest.io/) is a free then paid series of courses with an emphasis on data science

### Contact Us

Questions after the session? Find us in the [Libraries Data & Visualization Services](https://www.lib.ncsu.edu/services/data-visualization).