# FugueSQL and Dask-SQL

Pandas and Spark both have solutions that let users describe computation workflows in SQL. Dask, on the other hand, did not have a standard SQL interface until recently. [`dask-sql`](https://dask-sql.readthedocs.io/en/latest/index.html) is a relatively new project that already implements the majority of SQL keywords, and it is faster than FugueSQL on average. However, some features are still under development. Most notably, SQL `WINDOW` functions are not yet implemented.

We are collaborating so that our solutions converge into the de facto SQL interface for Dask. In the meantime, we have unified them by allowing `Fugue` to use `dask-sql` as the SQL [execution engine](../execution_engine.ipynb) for `FugueSQLWorkflow`. `dask-sql` provides a `DaskSQLExecutionEngine` (in `dask_sql.integrations.fugue`) that can be imported and passed into `FugueSQLWorkflow`.

## Sample Usage

The example below shows that when a keyword is unavailable in `dask-sql`, the FugueSQL implementation is used instead. The query uses the `RANK()` function, which is a `WINDOW` function, together with the FugueSQL `TAKE` keyword.

Together, `FugueSQL` and `dask-sql` provide a more powerful solution: we get the best of both worlds, with the speed of `dask-sql` and the completeness of `FugueSQL`. All we need to do is pass the `DaskSQLExecutionEngine` into `FugueSQLWorkflow`.

```python
from dask_sql.integrations.fugue import DaskSQLExecutionEngine
from fugue_sql import FugueSQLWorkflow

data = [
    ["A", "2020-01-01", 10],
    ["A", "2020-01-02", None],
    ["A", "2020-01-03", 30],
    ["B", "2020-01-01", 20],
    ["B", "2020-01-02", None],
    ["B", "2020-01-03", 40]
]
schema = "id:str,date:date,value:double"

# Passing DaskSQLExecutionEngine lets dask-sql run the SQL it supports,
# while FugueSQL handles the keywords dask-sql lacks (here RANK() and TAKE).
with FugueSQLWorkflow(DaskSQLExecutionEngine) as dag:
    df = dag.df(data, schema)
    dag("""
    SELECT id, date, value,
    RANK() OVER (PARTITION BY id ORDER BY date) row
    FROM df
    TAKE 2 ROWS PREPARTITION BY id PRESORT value NULLS FIRST
    PRINT
    """)
```

---
**Conflict with `SparkExecutionEngine`**

Note that `dask-sql` requires Python 3.8, which may cause errors with `SparkExecutionEngine` because Spark is more stable on Python 3.7.

---