# Science on the Cloud

## Context

Today we are going to explore the cloud platform and dig into some of the details of what it means to do science on the cloud. We aren't actually going to write much code today, but we are going to look at a few pieces of how computers work and explore some of the ways that the cloud facilitates collaboration. It's going to be a bit of a hodge-podge of skills, but together they give you some of the computer science building blocks to navigate the cloud interface.

REALLY IMPORTANT NEW IDEA!!!!

## Computer Images

When you first started the hub you had a lot of options for what setup to choose. Choosing your setup options is a bit of a "build your own computer" moment: you can choose what type of computer you want each time you start the hub! It's a bit like building a custom burrito at Chipotle.

Every time you log in to the cloud a computer gets turned on somewhere for you to use. In cloud lingo this computer is referred to as a **node**.

> When we talk about accessing a "computer" in the cloud we usually call it a **node** or a **compute node**

The two options you are choosing between on that first page are **image** and **node** size. The **image** defines the software that is available when you start the hub. The options you'll see are Python, R, and Matlab. The **node** size refers to how powerful a computer you are going to start. You'll see options ranging from about 2 GB to 32 GB of memory.

Please only ever use the smallest node size that you need. It helps save energy and money!

![Chipotle](https://www.foodandwine.com/thmb/oCl8jGErMHGWoXWBUC-jZhvKTSA=/1500x0/filters:no_upscale():max_bytes(150000):strip_icc()/chipotle-food-hacks-overhead-FT-BLOG0717-4cb0442dcddf4bc9b1d774374d00be8f.jpg)

:::{admonition} Vocabulary

**Image**: An image contains the software setup of a computer. It defines the operating system, the installed software, and the programming packages or libraries available.

:::
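
To make the idea of an image a little more concrete, here is a minimal Python sketch that reports what the image you are currently running provides: the operating system, the Python version, and a handful of installed packages. The function name `describe_environment` and its `max_packages` parameter are made up for this example, and the output will differ from image to image.

```python
import platform
from importlib import metadata


def describe_environment(max_packages=5):
    """Summarize the software that the current image provides."""
    # Names of the installed Python packages, sorted alphabetically
    packages = sorted(
        {
            dist.metadata["Name"]
            for dist in metadata.distributions()
            if dist.metadata["Name"]
        }
    )
    return {
        "operating_system": platform.platform(),
        "python_version": platform.python_version(),
        "package_count": len(packages),
        "first_packages": packages[:max_packages],
    }


summary = describe_environment()
for key, value in summary.items():
    print(f"{key}: {value}")
```

Running this in two different images (for example Python and R) should show different package lists, which is exactly what choosing an image changes.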

:::{admonition} Cloud benefit: Scalability

**Scalability** is the ability to choose the amount of computing power you need on demand and to grow what you are using as you need more.

:::

## Tour of the interface

First, let's tour the interface of JupyterHub! There are lots of exciting tools. We will look at the left and right side tab tools. On the left side you'll find:
- file explorer
- kernel manager
- dask interface
- git interface
- markdown table of contents
- extension manager
- jupyter AI interface

On the right side you'll find:

- property inspector
- kernel usage
- debugger

## Private and Shared Files

One of the major benefits of the cloud is the way you can access shared spaces with your collaborators. In Cryocloud there are both public and private spaces. The three public spaces on Cryocloud are:
- `shared`
- `shared-public`
- `shared-readwrite`

Anything in these folders is visible to everyone on Cryocloud. Any files you create outside those folders are visible only to you.

If you would like to distribute data to your group, feel free to use the `NASA-SARP` folder of `shared-public`. 

:::{warning}

Please do not delete anything you did not create in this folder, and don't add or remove anything from any of the other folders in `shared-public`. We want to be good stewards of the space!

:::

:::{admonition} Cloud benefit #2 -- Collaboration

Accessing shared resources (folders, data, environment) is a huge leap for collaboration. One way we take advantage of this is through access to shared folders.

:::

::::{tip} Checkin

Within your table, pick one person to make a folder in the shared `NASA-SARP` folder. Have each person in the group create a text file in that folder. Type something in your file, close it, then open someone else's to see what they wrote. Delete the file when you are done (but keep the folder).

Be careful -- when possible try not to access the same file at the same time. The computer can get confused about which version to save if different edits are being made at the same time.

:::{dropdown} Make it harder

Instead of a text file, create a csv file. Open the csv file in the CSV Viewer. Then right click on the file > Open With > Editor and add the following text:
```
temperature, co2, ch4
C, ppm, ppb
80, 420, 1852
82, 411, 1903
```
Save the file, close it, then open it again in the CSV Viewer.

:::
::::
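
If you later want to read a file like the one in the exercise with code, the sketch below shows one possible way using only Python's built-in `csv` module. It assumes the second row holds the units, as in the example text, and it keeps the text in memory rather than reading a real file; the variable names are illustrative.

```python
import csv
import io

# The same text as the exercise file, held in memory for this demo
text = """temperature, co2, ch4
C, ppm, ppb
80, 420, 1852
82, 411, 1903
"""

# skipinitialspace drops the space that follows each comma
rows = list(csv.reader(io.StringIO(text), skipinitialspace=True))
header, units, data = rows[0], rows[1], rows[2:]

# Pair each column name with its unit, then convert the values to numbers
columns = dict(zip(header, units))
values = [[float(v) for v in row] for row in data]

print(columns)  # {'temperature': 'C', 'co2': 'ppm', 'ch4': 'ppb'}
print(values)   # [[80.0, 420.0, 1852.0], [82.0, 411.0, 1903.0]]
```

Keeping the units row separate from the data rows matters: a tool that expects every row after the header to be numeric would otherwise fail on `C, ppm, ppb`.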

## Terminal

Working in the command line is like working on the inside of your computer. There is no visual screen, or graphical user interface (GUI), for you to use, but you are accessing the same files and programs as you can access in your File Explorer.

In general, command line commands can have multiple parts:
1. Command name (Ex. `cd`, `ncdump`)
2. Flags (Ex. `-r`, `-h`, `--help`)
3. Arguments (Ex. `file1.txt`, `outputfile.nc`)
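
As a rough illustration of these three parts, the Python sketch below splits a command line into its pieces. The helper `split_command` is hypothetical and deliberately simplistic: it treats anything that starts with `-` as a flag, which is a convention that real programs do not always follow.

```python
import shlex


def split_command(command_line):
    """Split a command line into its name, flags, and arguments."""
    parts = shlex.split(command_line)
    name = parts[0]
    # Simplification: anything starting with "-" is treated as a flag
    flags = [p for p in parts[1:] if p.startswith("-")]
    arguments = [p for p in parts[1:] if not p.startswith("-")]
    return name, flags, arguments


print(split_command("ncdump -h outputfile.nc"))
# ('ncdump', ['-h'], ['outputfile.nc'])
```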

 
### Navigating your File System
* `cd` - “change directory”; goes into a folder or directory
* `cd ..` - goes back up a level in the folder hierarchy
* `ls` - shows the contents of the current directory
* `pwd` - shows the absolute path of the current directory
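
The same moves can be made from Python, which may help connect the terminal commands to code you write later. This is a sketch using only the standard library; it works inside a temporary folder (the folder name `lessons` is made up) so that nothing in your home directory changes.

```python
import os
import tempfile
from pathlib import Path

# Work inside a throwaway directory so nothing in your home folder changes
with tempfile.TemporaryDirectory() as scratch:
    start = Path.cwd()                    # like `pwd`
    (Path(scratch) / "lessons").mkdir()   # make a folder to move into

    os.chdir(Path(scratch) / "lessons")   # like `cd lessons`
    inside = Path.cwd()

    os.chdir("..")                        # like `cd ..`
    contents = sorted(os.listdir("."))    # like `ls`

    os.chdir(start)                       # return to where we began

print(inside.name)  # lessons
print(contents)     # ['lessons']
```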



Running these commands is all fine and good, but it's easy to get lost inside of a computer if you don't have a general idea of how it is set up. Let's start building one of those by drawing a diagram for movement between folders and files.

As a note, you can also run these terminal commands from your Jupyter notebook by using a code cell that starts with `!`. For example:

    !pwd
    /home/jovyan

    !ls
    allmydata     envs               shared                  terminal
    cloud-lesson  hierarchical_data  shared-public           test
    data          Lessons            shared-readwrite        trainings
    Desktop       projects           SpectralUnmixing-0.2.3  Untitled.ipynb
    emit-utils    sarp_lessons       Sync



:::{tip} Checkin

Starting with the folder `/shared-public/NASA-SARP/`, draw a diagram that shows the structure of this folder. Start with the NASA-SARP folder at the top in a box and draw an arrow to each folder or file it contains, then repeat for each sub-folder.
:::

You likely made a diagram that has a similar shape to this:

![tree](https://upload.wikimedia.org/wikipedia/commons/thumb/5/5f/Tree_%28computer_science%29.svg/800px-Tree_%28computer_science%29.svg.png)

Notice that each of the folders or files can be traced back to a single starting point. If you draw a diagram like this for your whole file system, this starting point is called the **root** folder, and this type of organization is called a **tree** structure in computer science.

:::{admonition} Vocabulary

**Root folder**: The root folder is the folder at the very top of the tree. It is the folder in which every single folder on your computer lives.

:::

Why go into all this detailed discussion? It's easy to get mentally lost in terminal, but it helps to remember that we are always scooting around on this tree. When we are in terminal we are always _somewhere_ on this tree, and every time we change directories we are just moving to another place on the tree. That's how the inside of a computer is organized! 🌳💻
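
One way to see the tree for yourself is to let Python walk it. The sketch below builds a small, made-up folder structure in a temporary directory and prints it with indentation showing depth. `tree_lines` is a hypothetical helper written for this lesson, not part of any library, and the folder names are invented.

```python
import tempfile
from pathlib import Path


def tree_lines(folder, indent=""):
    """Return one line per folder or file below `folder`, indented by depth."""
    lines = []
    for entry in sorted(Path(folder).iterdir()):
        suffix = "/" if entry.is_dir() else ""
        lines.append(f"{indent}{entry.name}{suffix}")
        if entry.is_dir():
            # Recurse: every sub-folder is itself the top of a smaller tree
            lines.extend(tree_lines(entry, indent + "    "))
    return lines


# Build a small made-up folder structure to demonstrate on
with tempfile.TemporaryDirectory() as top:
    (Path(top) / "group_a" / "data").mkdir(parents=True)
    (Path(top) / "group_a" / "notes.txt").write_text("hello")
    (Path(top) / "group_b").mkdir()
    listing = tree_lines(top)

print("\n".join(listing))
# group_a/
#     data/
#     notes.txt
# group_b/
```

To try it on a real folder, you could pass a path such as your home directory to `tree_lines`, keeping in mind that large folders will produce a lot of output.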

Note about how Cryocloud works: When we start up a terminal from Cryocloud, the terminal is always going to initialize itself to whatever folder you have open in your file explorer window. 

::::{tip} Checkin

Create a new folder with a name of your choice (ex. lesson notebooks). `wget` a file into the folder.

:::{dropdown} Make it harder
The command for making a new folder in terminal is `mkdir`. Use `mkdir` to create a new folder instead of using the clickable interface.
:::

::::

### Moving and Copying Files
* `mv file1.txt file2.txt` - `mv` moves a file from one location to another. It is also used for renaming files.
* `rm file1.txt` - `rm` permanently deletes a file.
* `*` - the asterisk is a wildcard, so it can be used to remove all the files that match a certain pattern.
    * Ex. `rm -i *.txt` - removes all text files in a directory, asking you to confirm each one
    * Ex. `rm -i m704_2022*.nc` - removes all files that start with `m704_2022` and end with `.nc`, asking you to confirm each one
* `cp file1.txt file1_copy.txt` - `cp` copies a file to a new location.
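
For comparison, here is a hedged Python sketch of the same operations using the standard library. The file names are invented for the demo, and everything happens inside a temporary folder so no real files are touched. As with `rm`, files deleted with `unlink` do not go to a trash folder.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as scratch:
    folder = Path(scratch)
    # Create a few empty example files (names are made up for this demo)
    for name in ["file1.txt", "m704_2022_a.nc", "m704_2022_b.nc", "m704_2021.nc"]:
        (folder / name).touch()

    shutil.copy(folder / "file1.txt", folder / "file1_copy.txt")  # like `cp`
    shutil.move(folder / "file1.txt", folder / "file2.txt")       # like `mv`

    # like `rm m704_2022*.nc`: glob expands the wildcard, unlink deletes
    for match in list(folder.glob("m704_2022*.nc")):
        match.unlink()

    remaining = sorted(p.name for p in folder.iterdir())

print(remaining)  # ['file1_copy.txt', 'file2.txt', 'm704_2021.nc']
```

Note that `m704_2021.nc` survives because the pattern `m704_2022*.nc` only matches names that begin with `m704_2022`.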

## Virtual Desktop

Not all tasks are done best with code - sometimes an interface is better. For that purpose we have the Virtual Desktop. This tool allows us to access _the same files_ from a graphical user interface (GUI) that is much more similar to a laptop.

> Demo: viewing WAS data in QGIS

## Uploading and Downloading Files

Something you may want to do very commonly is upload or download data. Let's demo doing that in the Cryocloud interface.

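::::{tip} Final Checkin

Go to Earthdata Search and find one granule of any data product. You can choose any data product (maybe one you're considering using?!). If you are not sure which one to use, try MODIS chlorophyll-a. Download it and upload it to your home directory.

:::{dropdown} Make it harder
The code below shows how you would download data directly using a library called `earthaccess`.

```
import earthaccess

# Folder to save the downloads into; "data" is a placeholder name, change it as needed
data_dir = "data"

# Log in with your Earthdata credentials
earthaccess.login()

# Search for up to 8 granules of one data product within a bounding box and date range
results = earthaccess.search_data(
    short_name='ECO4ESIPTJPL',
    bounding_box=(-77, 37, -76, 38),  # (west, south, east, north) in degrees
    temporal=("2020-02-01", "2020-02-10"),
    count=8,
)

# Download every granule that was found into data_dir
earthaccess.download(
    results,
    local_path=data_dir,
)
```

:::
::::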

::::{tip} Final Checkin

Take out a piece of paper and a pen. Draw a diagram that represents what happens when you log on to Cryocloud. Pieces you might include are:
- 1+ nodes
- your computer
- the type of image
- the size of the node

:::{dropdown} Make it harder

Add to your diagram the process of uploading a data file to the cloud from your computer.

:::
::::