## Fugue and DuckDB: Fast SQL Code in Python

### Why Fugue and DuckDB

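Fugue lets you run SQL directly on Python DataFrames, with DuckDB as a fast local execution engine. The same SQL code can also be scaled out by switching the engine to Spark or Dask.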

### Docs
- [FugueSQL](https://fugue-tutorials.readthedocs.io/tutorials/fugue_sql/index.html)
- [DuckDB with Fugue](https://duckdb.org/docs/guides/python/fugue)
- [Introducing FugueSQL — SQL for Pandas, Spark, and Dask DataFrames](https://towardsdatascience.com/introducing-fuguesql-sql-for-pandas-spark-and-dask-dataframes-63d461a16b27)
- [Optimize Your SQL Code with Python and DuckDB](https://medium.com/towards-data-science/fugue-and-duckdb-fast-sql-code-in-python-e2e2dfc0f8eb)


### Setup
```
pip install -U "fugue[duckdb,sql]"
```

### Use in Jupyter

In [1]:
from fugue_notebook import setup
import fugue_duckdb

setup()


#### Quick start

In [6]:
import pandas as pd 
max_n = 5
df = pd.DataFrame(
    {
        "col_int": range(max_n),
        "col_char": [chr(97+i) for i in range(max_n)]
    }
)

In [7]:
df

Unnamed: 0,col_int,col_char
0,0,a
1,1,b
2,2,c
3,3,d
4,4,e


In [9]:
%%fsql

SELECT * FROM df
PRINT

Unnamed: 0,col_int:long,col_char:str
0,0,a
1,1,b
2,2,c
3,3,d
4,4,e


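FugueSQL keywords are uppercase by default. Passing `fsql_ignore_case=True` allows lowercase keywords; see [Ignore case in FugueSQL](https://fugue-tutorials.readthedocs.io/tutorials/advanced/useful_config.html).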

In [10]:
from fugue.api import fugue_sql_flow

fugue_sql_flow("""
               select * from df
               print
               """, fsql_ignore_case=True).run();

Unnamed: 0,col_int:long,col_char:str
0,0,a
1,1,b
2,2,c
3,3,d
4,4,e


#### Data Science Topics

- https://github.com/khuyentran1401/Data-science

- https://github.com/khuyentran1401/Data-science/tree/master/productive_tools/Fugue_and_Duckdb

In [2]:
import os 
save_path = os.getcwd() + '/raw.parquet'

In [2]:
save_path

'C:\\Users\\p2p2l\\projects\\wgong\\py4kids\\lesson-14.1-db\\duckdb\\fugue/raw.parquet'

In [3]:
import pandas as pd 

In [4]:
df = pd.read_parquet(save_path)

In [5]:
df.shape, df.columns

((46674268, 12),
 Index(['Open', 'High', 'Low', 'Close', 'Volume', 'Close_time',
        'Quote_asset_volume', 'Number_of_trades', 'Taker_buy_base_asset_volume',
        'Taker_buy_quote_asset_volume', 'symbol', 'time'],
       dtype='object'))

In [6]:
df.head(5)

Unnamed: 0,Open,High,Low,Close,Volume,Close_time,Quote_asset_volume,Number_of_trades,Taker_buy_base_asset_volume,Taker_buy_quote_asset_volume,symbol,time
0,0.000436,0.000436,0.000436,0.000436,2018.0,1509494459999,0.8804,4,0.0,0.0,SNGLSETH,2017-11-01 00:00:00
1,0.000436,0.000436,0.000421,0.000425,2497.0,1509494519999,1.074549,8,893.0,0.379605,SNGLSETH,2017-11-01 00:01:00
2,0.000425,0.000428,0.000425,0.000428,2671.0,1509494579999,1.139313,3,2671.0,1.139313,SNGLSETH,2017-11-01 00:02:00
3,0.000428,0.000428,0.000428,0.000428,1773.0,1509494639999,0.758578,4,1773.0,0.758578,SNGLSETH,2017-11-01 00:03:00
4,0.000428,0.000428,0.000428,0.000428,887.0,1509494699999,0.379715,4,887.0,0.379715,SNGLSETH,2017-11-01 00:04:00


In [7]:
btcusdt = df[df.symbol == "BTCUSDT"]

In [8]:
btcusdt.shape

(207985, 12)

In [9]:
btcusdt.to_parquet('BTCUSDT.parquet', compression='snappy')

In [13]:
from pathlib import Path

In [10]:
btcusdt_path = os.getcwd() + '/BTCUSDT.parquet'

In [12]:
btcusdt_path

'C:\\Users\\p2p2l\\projects\\wgong\\py4kids\\lesson-14.1-db\\duckdb\\fugue/BTCUSDT.parquet'

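Typing a Windows path like this as a plain string literal can raise a `unicodeescape` error, because `\U` is read as the start of an escape sequence; see [this Stack Overflow question](https://stackoverflow.com/questions/37400974/error-unicode-error-unicodeescape-codec-cant-decode-bytes-in-position-2-3). Building the path with `pathlib` avoids the problem: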

In [14]:
btcusdt_path_2 = Path(os.getcwd()) / 'BTCUSDT.parquet'

In [15]:
btcusdt_path_2

WindowsPath('C:/Users/p2p2l/projects/wgong/py4kids/lesson-14.1-db/duckdb/fugue/BTCUSDT.parquet')
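
The two approaches above (string concatenation vs. `pathlib`) can be compared in a short, platform-independent sketch. The directory `C:\Users\example\data` is a made-up example, not a path from this notebook.

```python
from pathlib import Path, PureWindowsPath

# Build the path from components instead of concatenating strings,
# so the separator always matches the current operating system.
btcusdt_path = Path.cwd() / "BTCUSDT.parquet"
print(btcusdt_path.name)    # BTCUSDT.parquet
print(btcusdt_path.suffix)  # .parquet

# A raw string (r"...") keeps backslashes literal, so "\U" is not
# parsed as a unicode escape. PureWindowsPath lets this run on any OS.
win_path = PureWindowsPath(r"C:\Users\example\data") / "BTCUSDT.parquet"
print(win_path.as_posix())  # C:/Users/example/data/BTCUSDT.parquet
```

Forward-slash paths such as the `as_posix()` form are generally safe to paste into SQL strings, since they contain no backslashes to escape.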

In [17]:
%%fsql duck

LOAD 'BTCUSDT.parquet'
PRINT

Unnamed: 0,Open:double,High:double,Low:double,Close:double,Volume:double,Close_time:long,Quote_asset_volume:double,Number_of_trades:long,Taker_buy_base_asset_volume:double,Taker_buy_quote_asset_volume:double,symbol:str,time:datetime,__index_level_0__:long
0,6463.0,6463.07,6463.0,6463.06,0.471863,1960939103,3049.655132,9,0.065039,420.351608,BTCUSDT,2017-11-01 00:00:00,3215665
1,6463.07,6463.07,6421.13,6463.07,1.191819,1960999103,7683.421756,21,0.397939,2571.88376,BTCUSDT,2017-11-01 00:01:00,3215666
2,6463.0,6463.0,6421.15,6422.07,0.316035,1961059103,2030.005697,5,0.0,0.0,BTCUSDT,2017-11-01 00:02:00,3215667
3,6430.0,6430.0,6421.15,6421.15,0.837717,1961119103,5379.397652,5,0.756465,4857.375235,BTCUSDT,2017-11-01 00:03:00,3215668
4,6421.15,6421.15,6421.13,6421.13,0.029332,1961179103,188.345152,2,0.0,0.0,BTCUSDT,2017-11-01 00:04:00,3215669
5,6421.13,6421.15,6421.1,6421.15,3.747058,1961239103,24060.355411,8,1.155313,7418.437797,BTCUSDT,2017-11-01 00:05:00,3215670
6,6421.15,6421.15,6421.15,6421.15,0.0,1961299103,0.0,0,0.0,0.0,BTCUSDT,2017-11-01 00:06:00,3215671
7,6438.43,6459.94,6422.0,6422.0,2.850467,1961359103,18358.363787,7,1.954536,12604.694905,BTCUSDT,2017-11-01 00:07:00,3215672
8,6459.95,6459.96,6422.0,6422.01,0.359784,1961419103,2314.752211,5,0.111118,717.816725,BTCUSDT,2017-11-01 00:08:00,3215673
9,6422.04,6422.04,6422.04,6422.04,0.03339,1961479103,214.431916,2,0.0,0.0,BTCUSDT,2017-11-01 00:09:00,3215674
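
The extra `__index_level_0__` column in the output above is the pandas index of the filtered DataFrame. Filtering keeps the original row labels (3215665, 3215666, ...), and `to_parquet` stores a non-default index as a column. The following is a minimal sketch of the effect and two ways to avoid it, using a tiny made-up stand-in for the real data.

```python
import pandas as pd

# Tiny stand-in for the full dataset; values are illustrative only.
df = pd.DataFrame(
    {
        "symbol": ["SNGLSETH", "BTCUSDT", "SNGLSETH", "BTCUSDT"],
        "Close": [0.000436, 6463.06, 0.000425, 6463.07],
    }
)

btcusdt = df[df.symbol == "BTCUSDT"]
# The original row labels survive the filter.
print(btcusdt.index.tolist())  # [1, 3]

# Option 1: renumber the rows before saving.
clean = btcusdt.reset_index(drop=True)
print(clean.index.tolist())  # [0, 1]

# Option 2 (not run here): skip the index when writing.
# btcusdt.to_parquet("BTCUSDT.parquet", compression="snappy", index=False)
```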


### Fugue + DuckDB in Production