# Guide to HelioCloud Tutorials
S. Antunes (APL)
October 2023

There are many tutorials here in the GSFC HelioCloud instance, and this guide walks you through them. We assume you are already logged in to your account and able to run a notebook.  The guide covers an overview of SMCE, examples of reading data from AWS S3 storage, using Dask for compute power, and working in IDL.  We also include a link to a local copy of the PyHC summer school package tutorials for SunPy, SpacePy, AstroPy, HAPI, and others.

The core HelioCloud notebooks to date are:
1) basic file access of FITS, CDF and NetCDF data that is stored in AWS S3 cloud storage, in [S3-Access-Demo.ipynb](S3-Access-Demo.ipynb)
2) 'bursting' a job onto multiple temporary CPUs in [Dask-Gateway-Example.ipynb](Dask-Gateway-Example.ipynb)
3) combining accessing private or public S3 files with bursting via Dask in [S3-Dask-Demo.ipynb](S3-Dask-Demo.ipynb)
4) finding datasets and lists of data files within one or more HelioClouds, in [CloudCatalog-Demo.ipynb](CloudCatalog-Demo.ipynb)
5) extended example MMS: searching for MMS data by instrument name and time range, then analyzing and plotting them, in [MMS-Catalog-Demo.ipynb](MMS-Catalog-Demo.ipynb)
6) extended example SDO: searching for SDO data then processing a large set on multiple CPUs via Dask and gathering the results, in [HelioCloud-SDO-Demo.ipynb](HelioCloud-SDO-Demo.ipynb)

We start off with the 'Science in the Browser' approach, where the Jupyter Notebook suffices to find, analyze and plot data entirely within the cloud.  We also include additional material for power users who prefer to work in their own cloud VM or cloud console environment.

## About SMCE

A necessary overview of this SMCE AWS environment is given in [Setup/Services_README notebook](Setup/Services_README.ipynb). 

To learn how to write Jupyter notebooks that produce attractive, presentation-ready pages, read the [Additional/OutputTypes notebook](Additional/OutputTypes.ipynb).



### First: What is S3?

S3 stands for "Simple Storage Service," which provides object storage for AWS. https://aws.amazon.com/s3/

S3 stores data as objects inside buckets, which are individual storage elements. This lets people query and access data from a common location reference.

Buckets can also be made accessible to users outside of Daskhub if web access is enabled.
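
To make the "common location reference" concrete, here is a minimal sketch of how an S3 address breaks down into a bucket and an object key. The bucket and file names are hypothetical placeholders, not real HelioCloud locations, and the helper names are ours.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/path/to/object' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The bucket is the host part; the key is the path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


def to_web_url(uri):
    """Build the virtual-hosted HTTPS form of an S3 URI.

    This URL only resolves if the bucket owner has enabled public web access.
    """
    bucket, key = split_s3_uri(uri)
    return f"https://{bucket}.s3.amazonaws.com/{key}"


# Hypothetical bucket and key, for illustration only.
example = "s3://example-bucket/mission/2020/file.cdf"
bucket, key = split_s3_uri(example)
print(bucket)
print(key)
print(to_web_url(example))
```

The same bucket and key pair is what tools such as `boto3` or `s3fs` expect when you read an object from within the cloud.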

## Science Part 1: Cloud storage and using multiple CPUs in Python

Python practice examples for reading sample data in FITS, CDF or NetCDF that are stored in this cloud are in our [S3-Access-Demo notebook](S3-Access-Demo.ipynb). These make use of AstroPy for FITS files, cdflib for CDF files, and Xarray for NetCDF files.
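
As a rough sketch of how the three libraries map onto the three file types, the snippet below picks a reader from the file extension. It is not taken from the demo notebook. It assumes a recent AstroPy (5.2 or later, with `fsspec` and `s3fs` installed), a cdflib release that accepts `s3://` paths, and a NetCDF4 file readable by the `h5netcdf` engine. The `anon=True` setting assumes a public bucket; private buckets need your credentials instead.

```python
import os


def read_fits(uri):
    # AstroPy can open remote files through fsspec (assumes astropy >= 5.2).
    from astropy.io import fits
    return fits.open(uri, use_fsspec=True, fsspec_kwargs={"anon": True})


def read_cdf(uri):
    # Assumes a cdflib version that accepts s3:// paths directly.
    import cdflib
    return cdflib.CDF(uri)


def read_netcdf(uri):
    # Open the object as a file-like handle, then pass it to Xarray.
    import s3fs
    import xarray as xr
    fs = s3fs.S3FileSystem(anon=True)
    return xr.open_dataset(fs.open(uri, "rb"), engine="h5netcdf")


# Map file extensions to the reader for that format.
READERS = {
    ".fits": read_fits,
    ".fts": read_fits,
    ".cdf": read_cdf,
    ".nc": read_netcdf,
}


def pick_reader(uri):
    """Return the reader function matching the file extension of uri."""
    ext = os.path.splitext(uri)[1].lower()
    if ext not in READERS:
        raise ValueError(f"no reader registered for extension {ext!r}")
    return READERS[ext]


# Hypothetical path; call pick_reader(path)(path) to actually open the file.
path = "s3://example-bucket/mission/2020/file.cdf"
print(pick_reader(path).__name__)
```

The imports sit inside each reader so that you only need the library for the format you actually open.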

Dask is software that lets you 'burst' a job onto multiple temporary CPUs by defining and then using a cluster of CPUs to lazily parallelize jobs. An example of using Dask in a notebook is in the [Dask-Gateway-Example notebook](Dask-Gateway-Example.ipynb).
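
The submit-then-gather pattern behind 'bursting' can be sketched without a cluster. The example below uses Python's standard `ThreadPoolExecutor` as a local stand-in, not Dask itself. A Dask client offers the same `submit()` call and futures with `.result()`, so the helper should work unchanged with one. The function names are ours, and the commented Dask Gateway lines are an assumption about how your deployment is configured.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    """Stand-in for a per-file or per-chunk science calculation."""
    return x * x


def run_in_parallel(client, func, items):
    """Submit one task per item, then gather the results in input order."""
    futures = [client.submit(func, item) for item in items]
    return [future.result() for future in futures]


# On a Dask Gateway deployment the client would typically come from:
#   from dask_gateway import Gateway
#   cluster = Gateway().new_cluster()
#   cluster.scale(4)              # request 4 temporary workers
#   client = cluster.get_client()
# Here a local thread pool plays the role of that client.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = run_in_parallel(pool, square, range(5))

print(results)  # [0, 1, 4, 9, 16]
```

Note that `submit()` starts work immediately. Dask's lazy style (`dask.delayed` or Dask arrays) instead builds a task graph that only runs when you call `compute()`.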

We then combine reading from cloud S3 storage with using Dask to 'burst' the problem solving in [S3-Dask-Demo.ipynb](S3-Dask-Demo.ipynb).


## Science Part 2: Big Data Sets

Working in the cloud means never having to download datasets. Instead, you find data across multiple HelioClouds and access it directly.  The [CloudCatalog API](https://pypi.org/project/cloudcatalog/) on top of the CloudCatalog sharing standard enables finding and listing cloud-stored scientific datasets such as CDAWeb, SDO, MMS and others.

Our initial example for finding datasets and lists of data files within one or more HelioClouds is in the [CloudCatalog-Demo.ipynb](CloudCatalog-Demo.ipynb).

## Science Part 3: Using real MMS and SDO data

We present two sample tasks wherein we first query the cloud for instrument data. We then pull a list of files for a given instrument and time range, then process it, all within the cloud (not using your local storage or laptop CPU).
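
The "list of files for a given time range" step amounts to filtering a file index by time overlap. The sketch below shows that idea on a small in-memory CSV. The column names, time format and file paths are illustrative assumptions, not the CloudCatalog specification. In the notebooks the CloudCatalog API does this lookup for you.

```python
import csv
import io
from datetime import datetime

# Hypothetical index: one row per data file, with its time coverage and S3 key.
INDEX_CSV = """start,stop,datakey,filesize
2020-01-01T00:00:00Z,2020-01-01T12:00:00Z,s3://example-bucket/inst/a.cdf,1000
2020-01-01T12:00:00Z,2020-01-02T00:00:00Z,s3://example-bucket/inst/b.cdf,2000
2020-01-02T00:00:00Z,2020-01-02T12:00:00Z,s3://example-bucket/inst/c.cdf,3000
"""


def parse_time(text):
    """Parse an ISO-style UTC timestamp such as 2020-01-01T00:00:00Z."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")


def files_in_range(index_text, start, stop):
    """Return the keys of files whose time coverage overlaps [start, stop)."""
    t0, t1 = parse_time(start), parse_time(stop)
    rows = csv.DictReader(io.StringIO(index_text))
    return [
        row["datakey"]
        for row in rows
        if parse_time(row["start"]) < t1 and parse_time(row["stop"]) > t0
    ]


selected = files_in_range(INDEX_CSV, "2020-01-01T06:00:00Z", "2020-01-01T18:00:00Z")
for key in selected:
    print(key)
```

The resulting list of S3 keys is what you would then loop over serially, or hand to Dask for parallel processing.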

The serial example searches for MMS data by instrument name and time range, then analyzes and plots it, in the [MMS-Catalog-Demo notebook](MMS-Catalog-Demo.ipynb).

We then add Dask 'burst' capabilities to tackle 2TB of data rapidly.  Here we search for SDO data, then process a large set on multiple CPUs via Dask and gather the results, in the [HelioCloud-SDO-Demo notebook](HelioCloud-SDO-Demo.ipynb).

## Working with IDL

We provide an example for using IDL in the [IDL/IDL_examples notebook](IDL/IDL_examples.ipynb) and, additionally, accessing S3 in IDL in the [IDL/IDL-S3 notebook](IDL/IDL-S3.ipynb)


## Power users

Moving data to or from S3 using SFTP is covered in the [Setup/SFTP service notebook](Setup/SFTP%20service.ipynb)

A simple "hello world" in Fortran is in the [Additional/fortran_helloworld notebook](Additional/fortran_helloworld.ipynb)

Updating your personal Conda environment is in the [Setup/Conda_instructions_for_cloud notebook](Setup/Conda_instructions_for_cloud.ipynb)

Testing whether GPUs are enabled is in the [Additional/GPU-Info notebook](Additional/GPU-Info.ipynb)

# PyHC Package Tutorials

Tutorials for using core Python PyHC science packages from the PyHC 2022 Summer School include:
* [AstroPy tutorial](../summer-school/astropy-tutorial/README.md)
* [HAPI tutorial](../summer-school/hapi-tutorial/README.md)
* [Kamodo tutorial](../summer-school/kamodo-tutorial/README.md)
* [OMMBV tutorial](../summer-school/ommbv-tutorial/README.md)
* [PlasmaPy tutorial](../summer-school/plasmapy-tutorial/README.md)
* [PySat tutorial](../summer-school/pysat-tutorial/README.md)
* [PySPEDAS tutorial](../summer-school/pyspedas-tutorial/README.md)
* [SolarMACH tutorial](../summer-school/solarmach-tutorial/README.md)
* [SpacePy tutorial](../summer-school/spacepy-tutorial/README.md)
* [Speasy tutorial](../summer-school/speasy-tutorial/README.md)
* [SunPy tutorial](../summer-school/sunpy-tutorial/README.md)
* [Vires-SWARM tutorial](../summer-school/vires-swarm-tutorial/README.md)

All are available in the [PyHC 2022 Summer School GitHub repository](https://github.com/Christian-Palmroos/PyHC_summer_school_2022/blob/main/README.md)
