# Ibis Demo

In [None]:
#| echo: false
!curl -OLsS 'https://storage.googleapis.com/ibis-tutorial-data/imdb/2024-03-22/imdb_title_ratings.parquet'
!curl -OLsS 'https://storage.googleapis.com/ibis-tutorial-data/imdb/2024-03-22/imdb_title_basics.parquet'
!psql < create_imdb.sql
!duckdb < load_imdb.sql

First, we import Ibis:

In [None]:
import ibis
from ibis import _  # deferred expression helper, used in the filters and sorts below

## Connect to DuckDB and create a table

In [None]:
con = ibis.duckdb.connect()

ratings = con.read_parquet("imdb_title_ratings.parquet", table_name="imdb_title_ratings")
basics = con.read_parquet("imdb_title_basics.parquet", table_name="imdb_title_basics")

We can list the names of the tables:

In [None]:
con.list_tables()

And render an abstract version of the expression, with the column names and types:

In [None]:
ratings

In [None]:
basics

Ibis works with multiple in-memory formats, including pandas and PyArrow:

In [None]:
basics.to_pandas(limit=10)

In [None]:
ratings.to_pyarrow(limit=10)

And Polars, with the `to_polars()` method, if you have Polars installed:


In [None]:
ratings.to_polars(limit=10)

## Columns with proper names, and interactive mode

Sometimes columns have messy names that need cleaning, and Ibis can help with that.
We also turn on interactive mode, so that expressions are executed and displayed as soon as they are evaluated:

In [None]:
ibis.options.interactive = True

In [None]:
basics = basics.rename("snake_case")

In [None]:
ratings = ratings.rename("snake_case")

In [None]:
ratings

In [None]:
basics
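To make the effect of `rename("snake_case")` concrete, here is a minimal sketch of the kind of conversion it performs on the original column names (`tconst`, `averageRating`, `numVotes`). The helper `to_snake_case` is hypothetical and only approximates simple camelCase names; it is not Ibis's own implementation.

```python
import re


def to_snake_case(name: str) -> str:
    """Approximate camelCase -> snake_case (hypothetical helper, not Ibis code)."""
    # Insert "_" at every lowercase/digit -> uppercase boundary, then lowercase.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


# Original column names of the ratings table, as used in the raw SQL later on.
original = ["tconst", "averageRating", "numVotes"]
renamed = [to_snake_case(c) for c in original]
print(renamed)  # ['tconst', 'average_rating', 'num_votes']
```

Names that are already lowercase, such as `tconst`, pass through unchanged.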

## `ibis.to_sql()`

When you need to see the generated SQL, use the `ibis.to_sql()` function:

In [None]:
expr = ratings.average_rating.round().cast("float64").name("round")
ibis.to_sql(expr)

We can also target a different dialect:

In [None]:
ibis.to_sql(expr, dialect="postgres")

In [None]:
ibis.options.interactive = False

## `con.sql()`

We also have the `.sql()` method, which lets you write raw SQL, because
sometimes that's what you need to do:

In [None]:
con.sql("""
    SELECT
    "tconst",
    CAST("averageRating" AS VARCHAR) AS "average_rating",
    CAST("numVotes" AS VARCHAR) AS "num_votes"
    FROM "imdb_title_ratings"
""")

In [None]:
con.sql("""
    SELECT
    "tconst",
    CAST("averageRating" AS VARCHAR) AS "average_rating",
    CAST("numVotes" AS VARCHAR) AS "num_votes"
    FROM "imdb_title_ratings"
""").to_pandas()

In [None]:
ibis.options.interactive = True

In [None]:
con.sql("""
    SELECT
    "tconst",
    CAST("averageRating" AS VARCHAR) AS "average_rating",
    CAST("numVotes" AS VARCHAR) AS "num_votes"
    FROM "imdb_title_ratings"
""")

## Other operations

In [None]:
basics.columns

To do a "GROUP BY" with `count()` we have the `value_counts()` method:

In [None]:
basics.title_type.value_counts()
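As a rough picture of what "GROUP BY with `count()`" means here, the sketch below runs the corresponding SQL shape on a tiny made-up table. It uses the standard library's `sqlite3` only so that it runs without the IMDB files; the table `titles` and its rows are assumptions for illustration, and this is not the SQL that Ibis generates.

```python
import sqlite3

# In-memory database with a few made-up title types.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (title_type TEXT)")
conn.executemany(
    "INSERT INTO titles VALUES (?)",
    [("movie",), ("short",), ("movie",), ("tvSeries",), ("movie",)],
)

# One row per distinct value, together with how often it occurs.
counts = conn.execute(
    """
    SELECT title_type, COUNT(*) AS title_type_count
    FROM titles
    GROUP BY title_type
    ORDER BY title_type
    """
).fetchall()
print(counts)  # [('movie', 3), ('short', 1), ('tvSeries', 1)]
```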

For our final analysis, we will clean a few things.

We will select only the titles with the "movie" type and discard the adult movies.

To do this, we pass a list of predicates to `filter()`; a row is kept only if all of them are true:

In [None]:
basics = (
    basics
    .filter([basics.title_type == "movie", basics.is_adult == 0])
    .select("tconst", "primary_title")
)

In [None]:
basics

Then we join the `basics` and `ratings` tables on the `tconst` column
and execute the result:

In [None]:
basics.join(ratings, "tconst").to_pandas(limit=10)

We order by `average_rating`, highest first:

In [None]:
basics.join(ratings, "tconst").order_by(_.average_rating.desc())

And keep only the titles with more than one million votes:

In [None]:
(
    basics.join(ratings, "tconst")
    .order_by(_.average_rating.desc())
    .filter(_.num_votes > 1e6)
)

## Table joins (`join`)

Finally, here is the full expression for the best-rated movies; we take the top ten when we execute it.

In [None]:
topfilms = (
    basics.join(ratings, "tconst")
    .order_by(_.average_rating.desc())
    .filter(_.num_votes > 1e6)
)

In [None]:
ibis.options.interactive = False

In [None]:
topfilms

In [None]:
topfilms.to_pandas(limit=10)
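For readers who think in SQL, the sketch below shows roughly what `topfilms` asks a backend to compute: a join on `tconst`, a filter on `num_votes`, and a descending sort on `average_rating`, limited to ten rows. It uses the standard library's `sqlite3` with made-up rows so it runs without the IMDB files; the titles and numbers are hypothetical, and the query is hand-written rather than the SQL Ibis generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE basics (tconst TEXT, primary_title TEXT);
    CREATE TABLE ratings (tconst TEXT, average_rating REAL, num_votes INTEGER);
    -- Made-up rows, for illustration only.
    INSERT INTO basics VALUES ('tt1', 'Film A'), ('tt2', 'Film B'), ('tt3', 'Film C');
    INSERT INTO ratings VALUES ('tt1', 8.1, 2000000), ('tt2', 9.0, 1500000), ('tt3', 9.5, 500);
    """
)

rows = conn.execute(
    """
    SELECT b.primary_title, r.average_rating, r.num_votes
    FROM basics AS b
    JOIN ratings AS r ON b.tconst = r.tconst   -- basics.join(ratings, "tconst")
    WHERE r.num_votes > 1000000                -- .filter(_.num_votes > 1e6)
    ORDER BY r.average_rating DESC             -- .order_by(_.average_rating.desc())
    LIMIT 10                                   -- .to_pandas(limit=10)
    """
).fetchall()
print(rows)  # [('Film B', 9.0, 1500000), ('Film A', 8.1, 2000000)]
```

`Film C` has the highest rating but too few votes, so the filter drops it.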

## Execute the same expression in Postgres!

### Connect to Postgres with Ibis

The data is already loaded in Postgres, so we only need to connect:

In [None]:
pgcon = ibis.postgres.connect()

### Check that the tables exist

In [None]:
pgcon.list_tables()

Execute the same expression in Postgres:

In [None]:
pgcon.to_pandas(topfilms)

That's it, that's Ibis!

Ibis has 20+ backends, including DuckDB and Postgres (both used here), Snowflake, and more.