# Lossless -> gridded 

Grid the lossless `last_trade` data onto fixed time intervals:

- Each `BinanceLastTradesGrid` reads `symbol_date_df` from the underlying lossless dataset to determine which symbol-date pairs to process.
- Each dataset maintains a list of "validated" dates, so repeat runs are much quicker than the first.
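
To make "gridding" concrete, here is a minimal standard-library sketch of one plausible reading: trades are bucketed by a fixed interval and only the last trade in each bucket is kept. This is not `mnemosyne`'s implementation. The helper names `parse_interval` and `grid_last_trades` are hypothetical, and labelling each bucket by its start time and omitting empty buckets are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Assumed unit suffixes, matching interval strings such as '4s', '20s', '2m', '10m'.
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_interval(interval: str) -> timedelta:
    """Turn a string such as '2m' into a timedelta (hypothetical helper)."""
    value, unit = int(interval[:-1]), interval[-1]
    return timedelta(seconds=value * _UNIT_SECONDS[unit])


def grid_last_trades(trades, interval):
    """Map each bucket start to the price of the last trade inside that bucket.

    `trades` is an iterable of (timezone-aware timestamp, price) pairs.
    Buckets without trades are simply absent (no forward fill).
    """
    step = int(parse_interval(interval).total_seconds())
    grid = {}
    # Sort by timestamp so later trades overwrite earlier ones in the same bucket.
    for ts, price in sorted(trades, key=lambda trade: trade[0]):
        epoch = int(ts.timestamp())
        bucket_start = epoch - epoch % step
        grid[datetime.fromtimestamp(bucket_start, tz=timezone.utc)] = price
    return grid


utc = timezone.utc
trades = [
    (datetime(2024, 3, 17, 0, 0, 5, tzinfo=utc), 100.0),
    (datetime(2024, 3, 17, 0, 1, 59, tzinfo=utc), 101.5),
    (datetime(2024, 3, 17, 0, 2, 0, tzinfo=utc), 102.0),
    (datetime(2024, 3, 17, 0, 5, 30, tzinfo=utc), 99.0),
]
grid = grid_last_trades(trades, "2m")
for bucket, price in grid.items():
    print(bucket.isoformat(), price)
```

The first two trades fall into the same 2-minute bucket, so only the later one (101.5) survives. The real dataset may label buckets differently or fill gaps.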

In [2]:
import mnemosyne as ms 
import polars as pl

args = [
    # (ms.DatasetType.BinanceSpotTrades, '20s', 'USDC'),
    # (ms.DatasetType.BinanceSpotTrades, '2m', 'USDC'),
    # (ms.DatasetType.BinanceSpotTrades, '10m', 'USDC'),

    (ms.DatasetType.BinanceUmPerpTrades, '2m', 'USDC'),
    (ms.DatasetType.BinanceUmPerpTrades, '10m', 'USDC'),

    # (ms.DatasetType.BinanceSpotTrades, '20s', 'USDT'), 
    # (ms.DatasetType.BinanceSpotTrades, '2m', 'USDT'),
    # (ms.DatasetType.BinanceSpotTrades, '4s', 'USDT'),
    # (ms.DatasetType.BinanceSpotTrades, '10m', 'USDT'),

    (ms.DatasetType.BinanceUmPerpTrades, '2m', 'USDT'),
    (ms.DatasetType.BinanceUmPerpTrades, '10m', 'USDT'),
    (ms.DatasetType.BinanceUmPerpTrades, '4s', 'USDT'),
]

In [None]:
for dataset_type, grid_interval, peg_symbol in args:
    # Reads the underlying lossless dataset's `symbol_date_df` to determine symbol-date pairs
    dataset = ms.binance.BinanceLastTradesGrid(
        peg_symbol=peg_symbol, 
        grid_interval=grid_interval, 
        dataset_type=dataset_type, 
        parquet_names='*.parquet', # Write to a single parquet: polars defaults to "0.parquet, 1.parquet ..."
        num_workers=2, 
    )
    print(f'{dataset_type} {grid_interval} {peg_symbol}: {dataset.path}')
    dataset.compute(recompute=True, days_per_batch=30)

Check whether /data/mnemosyne/binance/grids/futures/um/last_trade/2m/peg_symbol=USDC exists.
INFO:mnemosyne.dataset.interface:Computing 658 partitions in 22 batches (30 days/batch) with 2 workers


BinanceUmPerpTrades 2m USDC: /data/mnemosyne/binance/grids/futures/um/last_trade/2m/peg_symbol=USDC


  0%|          | 0/22 [00:00<?, ?it/s]


BinanceUmPerpTrades 10m USDC: /data/mnemosyne/binance/grids/futures/um/last_trade/10m/peg_symbol=USDC


  0%|          | 0/22 [00:00<?, ?it/s]
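
The log line above reports 658 partitions split into 22 batches at 30 days per batch. The arithmetic can be sketched as follows, assuming one partition per date and consecutive chunks of `days_per_batch` dates. `batch_dates` is a hypothetical helper and the start date is made up for illustration; `mnemosyne` may form its batches differently.

```python
from datetime import date, timedelta


def batch_dates(dates, days_per_batch):
    """Split sorted dates into consecutive chunks of at most `days_per_batch`."""
    ordered = sorted(dates)
    return [
        ordered[i:i + days_per_batch]
        for i in range(0, len(ordered), days_per_batch)
    ]


# 658 hypothetical consecutive dates, matching the partition count in the log.
start = date(2024, 1, 1)
dates = [start + timedelta(days=i) for i in range(658)]

batches = batch_dates(dates, days_per_batch=30)
print(len(batches), len(batches[-1]))  # 22 batches; the last one holds 28 dates
```

Under these assumptions, 21 full batches cover 630 dates and the remaining 28 form the final batch, which agrees with the `0/22` progress bar.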

# Usage example

In [4]:
from datetime import date as Date 

peg_symbol = 'USDT'
dstype = ms.DatasetType.BinanceUmPerpTrades
grid_interval = '2m'

dataset = ms.binance.BinanceLastTradesGrid(
    peg_symbol=peg_symbol,
    grid_interval=grid_interval,
    dataset_type=dstype,
    parquet_names='*.parquet',
    num_workers=4,
)

# Reading the full dataset: a lazy scan, nothing is loaded until `.collect()`
lf = pl.scan_parquet(dataset.path / f'**/{dataset.parquet_names}')
# Symbol-date pairs covered by the dataset
dataset.universe_df

Check whether /data/mnemosyne/binance/grids/futures/um/last_trade/2m/peg_symbol=USDT exists.


symbol,date
str,date
"""AGLD""",2024-03-17
"""APE""",2024-03-17
"""REN""",2024-03-17
"""JUP""",2024-03-17
"""AXS""",2024-03-17
…,…
"""CETUS""",2024-11-24
"""ONE""",2024-11-24
"""NTRN""",2024-11-24
"""ALPACA""",2024-11-24


In [None]:
# db = pl.scan_parquet(dataset.src_path / '**/data.parquet', hive_partitioning=True)
# db.filter(pl.col('date') >= Date(2025, 1, 1)).select(pl.len()).collect()
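
The commented-out scan above relies on hive partitioning, where directory names of the form `key=value` (such as `peg_symbol=USDC` in the dataset path) carry column values. A minimal standard-library sketch of how such a path decodes is shown below. `hive_partition_values` is a hypothetical helper, and the `date=2024-03-17/0.parquet` suffix is an assumed layout for illustration, not something confirmed by the notebook.

```python
from pathlib import PurePosixPath


def hive_partition_values(path):
    """Collect `key=value` directory segments of a path into a dict."""
    values = {}
    for part in PurePosixPath(path).parts:
        key, sep, value = part.partition("=")
        if sep and key:  # only segments that actually contain `key=value`
            values[key] = value
    return values


root = "/data/mnemosyne/binance/grids/futures/um/last_trade/2m/peg_symbol=USDC"
# Hypothetical file below the dataset root; the real layout may differ.
hypothetical_file = root + "/date=2024-03-17/0.parquet"

partition_values = hive_partition_values(hypothetical_file)
print(partition_values)
```

A reader with hive partitioning enabled exposes these values as columns, which is what allows a filter on `date` to skip whole directories.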

In [1]:
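# Getting specific dates
# Requires `dataset` from the usage-example cell above; in a fresh kernel
# that cell must be run first, otherwise this one raises the NameError below.
dataset[[Date(2024, 2, 2), Date(2025, 3, 4)]].collect()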

NameError: name 'dataset' is not defined