# Tutorial for Beginners

Welcome to the Fugue tutorials. All questions are welcome in the Slack channel.

[![Slack Status](https://img.shields.io/badge/slack-join_chat-white.svg?logo=slack&style=social)](https://join.slack.com/t/fugue-project/shared_invite/zt-jl0pcahu-KdlSOgi~fP50TZWmNxdWYQ)

Fugue is an abstraction framework that lets users write code in native Python or Pandas and then port it to Spark or Dask. The beginner tutorial covers the motivation for Fugue and the benefits of using an abstraction layer.
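
To make the idea concrete, here is a minimal sketch of the kind of logic an abstraction layer is meant to carry across engines: a plain Python function with type annotations and no Spark or Dask imports. The function name `add_total` and the column names are hypothetical, chosen only for illustration, and the block runs with the standard library alone.

```python
from typing import Any, Dict, List


def add_total(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Add a 'total' column computed from 'price' and 'qty' (hypothetical columns)."""
    return [{**row, "total": row["price"] * row["qty"]} for row in rows]


# Run locally on a small in-memory sample, as you would test any Python function.
sample = [{"price": 2.0, "qty": 3}, {"price": 1.5, "qty": 4}]
result = add_total(sample)
print(result)
```

Because the function knows nothing about the engine, it can be unit tested on its own. Recent Fugue releases provide a `transform()` function that accepts a function like this together with an output schema and an optional engine; check the documentation for your installed version before relying on the exact signature.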

## [1. Introduction](introduction.ipynb)
We'll start by going over the motivation for Fugue and the problems it solves. We'll show some basic Fugue code, and how to execute code on different ExecutionEngines (Spark or Dask). Fugue is more than a framework. It is a mindset for working on data problems and distributed compute. In this section we'll cover some of the values and beliefs of Fugue.

## [2. Fugue Interface](interface.ipynb)

## [3. Distributed Compute](distributed_compute.ipynb)

## [4. Stock Sentiment Analysis (Preprocessing)](stock_sentiment.ipynb)
A Fugue use case for NLP preprocessing. It gives a general idea of the problems Fugue is trying to solve, and of why you might add a Fugue layer instead of using Pandas directly.


## [5. Execution Graph (DAG) & Programming Interface](dag.ipynb) (MUST READ)
A deep dive into the programming interface. This tutorial covers most of its features.


## [6. COVID19 Data Exploration](example_covid19.ipynb)
Another Fugue example. This one shows how to use Fugue SQL for data analysis.


## [7. Fugue SQL](sql.ipynb) (MUST READ)
The most fun part of Fugue. You can use SQL instead of Python to express the backbone of your workflow, and you can invoke Python extensions from within this SQL-like language. The SQL mindset is well suited to distributed computing because it helps keep your logic scale agnostic. This tutorial covers all of the Fugue SQL syntax.


## [8. Extensions](extensions.ipynb)
The previous tutorials showed plenty of extension examples. This is a complete guide to the Fugue extensions.

### [Transformer](transformer.ipynb) (MUST READ)
The most useful and widely used extension

### [CoTransformer](cotransformer.ipynb)
Transform multiple dataframes partitioned in the same way
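
As a rough illustration of the idea (not Fugue's API), the sketch below groups two small datasets by the same key and then processes the matching groups together. All names (`co_partition`, `summarize`, the sample rows) are hypothetical, and plain Python lists stand in for dataframes.

```python
from collections import defaultdict
from typing import Any, Dict, Iterator, List, Tuple

Row = Dict[str, Any]


def co_partition(
    left: List[Row], right: List[Row], key: str
) -> Iterator[Tuple[Any, List[Row], List[Row]]]:
    """Yield (key value, left rows, right rows) for every key value in either input."""
    left_groups: Dict[Any, List[Row]] = defaultdict(list)
    right_groups: Dict[Any, List[Row]] = defaultdict(list)
    for row in left:
        left_groups[row[key]].append(row)
    for row in right:
        right_groups[row[key]].append(row)
    for value in sorted(set(left_groups) | set(right_groups)):
        yield value, left_groups.get(value, []), right_groups.get(value, [])


def summarize(customer: Any, orders: List[Row], payments: List[Row]) -> Row:
    """Process one pair of matching partitions together."""
    return {
        "customer": customer,
        "ordered": sum(o["amount"] for o in orders),
        "paid": sum(p["amount"] for p in payments),
    }


orders = [
    {"customer": "a", "amount": 10},
    {"customer": "b", "amount": 5},
    {"customer": "a", "amount": 7},
]
payments = [{"customer": "a", "amount": 17}]

summary = [summarize(k, l, r) for k, l, r in co_partition(orders, payments, "customer")]
print(summary)
```

The point of the sketch is that each call to `summarize` sees only the rows for one key from both inputs, so the calls are independent of one another.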

### [Creator](creator.ipynb)
Generate dataframes for a DAG

### [Processor](processor.ipynb)
Take in one or multiple dataframes and produce a single dataframe as output

### [Outputter](outputter.ipynb)
Take in one or multiple dataframes to perform final tasks such as saving and printing


## 9. Deep Dive
It's time to build a systematic understanding of the Fugue architecture.

<img src="../images/architecture.svg" width="500">

### [Data Type, Schema & DataFrames](schema_dataframes.ipynb)
Fugue data types and schemas are strictly based on [Apache Arrow](https://arrow.apache.org/docs/index.html). In Fugue, a DataFrame is an abstract concept with several built-in implementations that adapt to different underlying dataframes. This tutorial goes through the basic APIs and focuses on the most common use cases.

### [Partition](partition.ipynb) (MUST READ)
This tutorial focuses on the basic ideas of data partitioning and is less specific to Fugue. A good understanding of partitioning is key to writing high-performance code.
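
The core idea can be sketched without any framework: rows are routed to partitions by a key, and each partition is then processed independently, which is what lets an engine spread the work across machines. The helper name `hash_partition`, the sample data, and the use of `zlib.crc32` as the hash are assumptions made for this illustration; real engines use their own partitioners.

```python
import zlib
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]


def hash_partition(rows: List[Row], key: str, num_partitions: int) -> Dict[int, List[Row]]:
    """Assign each row to a partition using a stable hash of its key value."""
    partitions: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        index = zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[index].append(row)
    return dict(partitions)


rows = [
    {"ticker": "AAA", "price": 10.0},
    {"ticker": "BBB", "price": 20.0},
    {"ticker": "AAA", "price": 12.0},
    {"ticker": "CCC", "price": 5.0},
    {"ticker": "BBB", "price": 18.0},
]
parts = hash_partition(rows, "ticker", 2)

# Rows sharing a key always land in the same partition, so per-key logic
# (here, the maximum price per ticker) needs no data from other partitions.
max_price: Dict[str, float] = {}
for part in parts.values():
    for row in part:
        current = max_price.get(row["ticker"], row["price"])
        max_price[row["ticker"]] = max(current, row["price"])
print(max_price)
```

How evenly the keys spread across partitions determines how evenly the work is shared, which is why the choice of partition key matters for performance.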

### [Checkpoint](checkpoint.ipynb)
Checkpoints help advanced users keep executions robust and stateful. This section gives a bigger picture of the checkpoint concept and compares how Fugue and Spark implement it.

### [Execution Engine](execution_engine.ipynb)
The heart of Fugue. It is the layer that unifies many of the core concepts of distributed computing and separates the underlying computing frameworks from user-level logic. Normally you don't interact with execution engines directly, but it's good to understand the basics.

### [Callbacks From Transformers To Driver](rpc.ipynb)
You can provide a callback function to any transformer so that it can communicate with the driver while running.

### [Fugue Configurations](useful_config.ipynb) (MUST READ)
These configurations can have a significant impact on how Fugue workflows are built and run.

### [X-Like Objects Initialization](x-like.ipynb)
You will often see X-like objects in the Fugue API documentation. Here is a complete list of these objects and the ways to initialize them.