- DECIDE ON DATASET 
- SPATIAL 

# Tutorial

Using ibis + DuckDB for large-scale (spatial) data analysis

In [12]:
import ibis as ib
from ibis import _

ib.options.interactive = True

## Introduction

ibis is the link between powerful backend infrastructure and user-friendly `python`. It lets you write code in a familiar environment, with a syntax close to that of `pandas`, while the code itself is executed in a database.

In [13]:
conn = ib.connect("duckdb://")

In [14]:
conn.list_catalogs()

['memory', 'system', 'temp']

In [15]:
conn.list_databases()

['information_schema', 'main', 'pg_catalog']

In [16]:
iris = ib.examples.iris_raw.fetch(backend=conn)

In [24]:
iris.schema()

ibis.Schema {
  Sepal.Length  float64
  Sepal.Width   float64
  Petal.Length  float64
  Petal.Width   float64
  Species       string
}

In [32]:
ib.literal(140_000).log()

┌───────────┐
│ 11.849398 │
└───────────┘

In [17]:
flights = ib.examples.nycflights13_flights.fetch(table_name="flights", backend=conn)

In [18]:
flights.describe()

In [19]:
# ib.to_sql(flights.describe())

In [20]:
flights_df = flights.execute()

In [30]:
# flights_df.describe()

In [22]:
# conn.list_tables()

## SQL, DUCKDB, IBIS
DuckDB speaks its own *flavour* (dialect) of SQL: it is largely standard SQL, with extensions that add or simplify certain functionality that traditional SQL lacks.

- https://ibis-project.org/tutorials/coming-from/pandas

### Selecting

In [23]:
# SQL
select_sql = conn.sql("""
    SELECT "Petal.Width", Species FROM iris_raw;
""")
select_sql

In [39]:
# ibis
select_expr = iris.select("Petal.Width", "Species")
select_expr

In [40]:
ib.to_sql(select_expr)

```sql
SELECT
  "t0"."Petal.Width",
  "t0"."Species"
FROM "iris_raw" AS "t0"
```

### Filtering

In [41]:
# SQL
filter_sql = conn.sql("""
    SELECT * FROM iris_raw WHERE "Petal.Width" > 0.3;
""")
filter_sql

In [46]:
# ibis
iris.filter(_["Petal.Width"] > .3)

## IBIS

In [None]:
iris.describe()

In [None]:
iris.count()

## Spatial

## Example

### Example 1 : Foursquare

## Ecosystem

Extensions and applications

- OSM data reading
- String matching
- Scalenav
- ....

### Reading OSM buffer files directly into duckdb

In [None]:
import quackosm as qo  # https://github.com/kraina-ai/quackosm

In [None]:
in_path = '../datasets/OSM/raw/africa-focus.osm.pbf'
out_path = '../datasets/OSM/processed/africa_amenity.parquet'

In [None]:
# qo.convert_pbf_to_parquet(
#     # pbf_path='../datasets/OSM/raw/africa-latest.osm.pbf',
#     pbf_path=in_path,
#     result_file_path=out_path,
#     tags_filter={
#         "amenity": True,
#     },
#     explode_tags=False,
# )

In [None]:
res_ = conn.read_parquet(out_path)
res_.head()

In [None]:
res = (
    res_
    .mutate(
        amenity=_.tags["amenity"]
    )
    .select("amenity", "geometry")
)

In [None]:
res.head()

### Global geospatial analytics

In [None]:
import scalenav.oop as snoo

## Similar tools

polars