# Fugue Roadmap (10 mins)

Fugue has already done significant work on the Python side, scaling Pandas, SQL, and Python code to Spark, Dask, and Ray. Going forward, the focus is on integrating with other open-source tools and backends.

[![Slack Status](https://img.shields.io/badge/slack-join_chat-white.svg?logo=slack&style=social)](http://slack.fugue.ai)

In [None]:
_=!mamba install -y openjdk
_=!pip install -r ../requirements.txt

## Open Source Integrations

* [whylogs](https://github.com/whylabs/whylogs) - data profiling
* [pycaret](https://github.com/pycaret/pycaret) - low-code machine learning
* [statsforecast](https://github.com/Nixtla/statsforecast/) - lightning-fast time series models
* [ipyvizzu](https://github.com/vizzuhq/ipyvizzu-story) - interactive data stories

## Polars Integration

In [1]:
import polars as pl

data = {"id": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
        "number": [10, 20, 30, 15, 25, 35, 20, 30, 40]}
df = pl.DataFrame(data)

# schema: *, diff:float
def diff(df: pl.DataFrame) -> pl.DataFrame:
    return df.with_columns(pl.col("number").diff().alias("diff"))

**DuckDB + Polars**

In [2]:
from fugue_jupyter import setup
setup()

In [3]:
%%fsql duckdb
SELECT *
  FROM df
 WHERE id IN ('B', 'C')

TRANSFORM PREPARTITION BY id USING diff 
PRINT

Unnamed: 0,id:str,number:long,diff:float
0,B,15,
1,B,25,10.0
2,B,35,10.0
3,C,20,
4,C,30,10.0
5,C,40,10.0
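In the output above, `diff` restarts at the first row of each `id` because `PREPARTITION BY id` runs the function once per partition. The following is a minimal pure-Python sketch of that partition-then-apply behavior, not Fugue's implementation. The helper names `diff_partition` and `partitioned_diff` are hypothetical, and the rows mirror the filtered result of the query.

```python
from itertools import groupby
from operator import itemgetter

# Rows that survive the WHERE clause: (id, number)
rows = [("B", 15), ("B", 25), ("B", 35), ("C", 20), ("C", 30), ("C", 40)]


def diff_partition(numbers):
    """First differences within one partition; the first row has no predecessor."""
    return [None] + [float(b - a) for a, b in zip(numbers, numbers[1:])]


def partitioned_diff(rows):
    """Group rows by id, apply diff_partition to each group, then concatenate."""
    out = []
    # groupby needs sorted input; sorted() is stable, so row order within an id is kept
    ordered = sorted(rows, key=itemgetter(0))
    for key, group in groupby(ordered, key=itemgetter(0)):
        numbers = [number for _, number in group]
        out.extend((key, n, d) for n, d in zip(numbers, diff_partition(numbers)))
    return out


result = partitioned_diff(rows)
for row in result:
    print(row)
```

Because each partition is processed independently, the same function can be handed to a distributed engine, which is free to place different `id` groups on different workers.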


**Running on a Distributed Engine**

In [4]:
import fugue.api as fa
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The result is a Spark DataFrame
res = fa.transform(df, diff, partition={"by": "id"}, engine=spark)

# Convert to Arrow, then back to Polars, for display
pl.from_arrow(fa.as_arrow(res))

id,number,diff
str,i64,f32
"B",15,
"B",25,10.0
"B",35,10.0
"C",20,
"C",30,10.0
"C",40,10.0
"A",10,
"A",20,10.0
"A",30,10.0


## BigQuery Integration

Until recently, Fugue's SQL support was limited to SQL on top of Python DataFrames. We want to make it easier to combine data warehouses with distributed computing.

```python
import pandas as pd
from typing import List, Any

# schema: *
def median(df: pd.DataFrame) -> List[List[Any]]:
    # Each call receives one partition (a single state) and returns one row
    return [[df.state.iloc[0], df.number.median()]]

fa.transform(
    ("bq", """SELECT state, number
    FROM `bigquery-public-data.usa_names.usa_1910_2013` TABLESAMPLE SYSTEM (1 PERCENT)"""),
    median,
    partition="state",
    engine="dask"
).compute().head()
```
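The BigQuery example above needs cloud credentials to run, so it can help to check the per-partition logic locally first. The sketch below reproduces the "one median row per state" behavior with only the standard library. The rows are hypothetical toy data, not values from the public dataset, and `median_by_state` is an illustrative helper rather than part of Fugue.

```python
from collections import defaultdict
from statistics import median

# Hypothetical toy rows (state, number); NOT real BigQuery data
rows = [("CA", 5), ("CA", 9), ("CA", 7), ("NY", 4), ("NY", 9)]


def median_by_state(rows):
    """Return one [state, median] row per state, like one output row per partition."""
    groups = defaultdict(list)
    for state, number in rows:
        groups[state].append(number)
    # Sort by state only to make the output order deterministic
    return [[state, median(numbers)] for state, numbers in sorted(groups.items())]


medians = median_by_state(rows)
print(medians)
```

Note that a group with an even number of rows can produce a fractional median, which is worth keeping in mind when choosing the output column type.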

## Next Plans

* Snowflake Integration - Utilize Snowflake and Databricks together
* Use DuckDB for local testing before moving workloads to data warehouses