So far, we have seen Ibis in interactive mode. Interactive mode (also known as eager mode) makes Ibis return the results of an operation immediately.

In most cases, the default lazy mode makes more sense than interactive mode. In lazy mode, Ibis does not execute operations automatically; instead, it builds an expression to be executed at a later time.

Let's see this in practice, starting with the same example as in previous tutorials - the geography database.

In [2]:
import os
import ibis


# Join the path components so that the path is portable across platforms
connection = ibis.sqlite.connect(os.path.join('data', 'geography.db'))
countries = connection.table('countries')

In [5]:
ibis.options.interactive = False

countries['name', 'continent', 'population'].limit(3)

r0 := AlchemyTable: countries
  iso_alpha2  string
  iso_alpha3  string
  iso_numeric int32
  fips        string
  name        string
  capital     string
  area_km2    float64
  population  int32
  continent   string

r1 := Selection[r0]
  selections:
    name:       r0.name
    continent:  r0.continent
    population: r0.population

Limit[r1, n=3]

In lazy mode, instead of obtaining results after each operation, we build an expression (a graph) of all the operations that need to be performed. Once all the operations are recorded, the graph is sent to the backend, which can perform them efficiently and move only the required data into memory.
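To make the idea of recording operations first and executing them later more concrete, here is a minimal sketch in plain Python. It is not how Ibis is implemented: the `LazyTable` class and its methods are hypothetical names invented for this illustration, and the in-memory SQLite table only holds the three rows shown in this tutorial.

```python
import sqlite3


class LazyTable:
    """Toy lazy table expression (hypothetical, NOT Ibis's implementation)."""

    def __init__(self, name, columns=None, limit_n=None):
        self.name = name
        self.columns = columns
        self.limit_n = limit_n

    def select(self, *columns):
        # Only records the operation; nothing is sent to the database.
        return LazyTable(self.name, columns, self.limit_n)

    def limit(self, n):
        # Only records the operation; nothing is sent to the database.
        return LazyTable(self.name, self.columns, n)

    def compile(self):
        # Turn the recorded operations into a single SQL query.
        cols = ", ".join(self.columns) if self.columns else "*"
        sql = f"SELECT {cols} FROM {self.name}"
        if self.limit_n is not None:
            sql += f" LIMIT {int(self.limit_n)}"
        return sql

    def execute(self, connection):
        # The database is only contacted here.
        return connection.execute(self.compile()).fetchall()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE countries (name TEXT, continent TEXT, population INTEGER)")
con.executemany(
    "INSERT INTO countries VALUES (?, ?, ?)",
    [
        ("Andorra", "EU", 84000),
        ("United Arab Emirates", "AS", 4975593),
        ("Afghanistan", "AS", 29121286),
    ],
)

# Building the expression does not run any query.
expr = LazyTable("countries").select("name", "population").limit(2)
sql = expr.compile()
rows = expr.execute(con)
print(sql)
print(rows)
```

Because the whole expression is known before anything runs, the generated query asks the database only for the two selected columns and the first two rows, instead of loading the full table and trimming it afterwards.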

In [6]:
countries_expression = countries['name', 'continent', 'population'].limit(3)
type(countries_expression)

ibis.expr.types.relations.TableExpr

The type is an Ibis TableExpr, since the result is a table (broadly speaking, you can think of it as a dataframe).

We can keep building on the expression if it is not finished yet. Once it is complete, we can request the result from the database by calling the method .execute().

In [7]:
countries_expression.execute()

                   name  continent  population
0               Andorra         EU       84000
1  United Arab Emirates         AS     4975593
2           Afghanistan         AS    29121286


We can build other types of expressions, for example, one that returns a column instead of a table.

In [10]:
population_in_millions = (countries['population'] / 1_000_000).name('population_in_millions')
population_in_millions.execute()

0       0.084000
1       4.975593
2      29.121286
3       0.086754
4       0.013254
         ...    
247    23.495361
248     0.159042
249    49.000000
250    13.460305
251    13.061000
Name: population_in_millions, Length: 252, dtype: float64

In [11]:
type(population_in_millions)

ibis.expr.types.numeric.FloatingColumn

We can use the previous expression as a column of a table expression.

In [13]:
countries['name', 'continent', population_in_millions].limit(3)

                   name  continent  population_in_millions
0               Andorra         EU                0.084000
1  United Arab Emirates         AS                4.975593
2           Afghanistan         AS               29.121286


Since we are in lazy mode (not interactive), those expressions don't request any data from the database unless explicitly requested with .execute().

In [14]:
countries['name', 'continent', population_in_millions].limit(3).execute()

                   name  continent  population_in_millions
0               Andorra         EU                0.084000
1  United Arab Emirates         AS                4.975593
2           Afghanistan         AS               29.121286


In [15]:
ibis.options.verbose = True

countries['name', 'continent', population_in_millions].limit(3).execute()

SELECT t0.name, t0.continent, t0.population / CAST(? AS REAL) AS population_in_millions 
FROM main.countries AS t0
 LIMIT ? OFFSET ?


                   name  continent  population_in_millions
0               Andorra         EU                0.084000
1  United Arab Emirates         AS                4.975593
2           Afghanistan         AS               29.121286
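The `?` markers in the logged SQL are bound parameters: the query text and its values are sent to the database separately. The sketch below runs the same query text directly with Python's built-in `sqlite3` module against a small in-memory table. The bound values `(1_000_000, 3, 0)` are an assumption based on the expression above (the divisor, the limit, and an offset of zero); the log itself does not show them.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE countries (name TEXT, continent TEXT, population INTEGER)")
con.executemany(
    "INSERT INTO countries VALUES (?, ?, ?)",
    [
        ("Andorra", "EU", 84000),
        ("United Arab Emirates", "AS", 4975593),
        ("Afghanistan", "AS", 29121286),
    ],
)

# Same query text as in the log, with placeholders instead of literal values.
query = (
    "SELECT t0.name, t0.continent, "
    "t0.population / CAST(? AS REAL) AS population_in_millions "
    "FROM main.countries AS t0 "
    "LIMIT ? OFFSET ?"
)

# Assumed bound values, in order: divisor, row limit, row offset.
rows = con.execute(query, (1_000_000, 3, 0)).fetchall()
for row in rows:
    print(row)
```

Keeping the values out of the query text is why the logged SQL shows `LIMIT ? OFFSET ?` rather than `LIMIT 3 OFFSET 0`.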


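Setting ibis.options.verbose = True, as in the cell above, makes Ibis log the query it sends to the backend. By default, the queries are logged to the terminal, but we can process them with a custom function instead. This allows us to save executed queries to a file, save them to a database, send them to a web service, etc.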

For example, to save queries to a file, we can write a custom function that takes a query and appends it to a log file.

In [25]:
from pathlib import Path


def log_query_to_file(query: str) -> None:
    r"""
    Log queries to `logs/tutorial_queries.log`.

    Each line is a query. Line breaks in the query are
    represented with the string '\n'.
    """
    dirname = Path('logs')
    # Create the log directory if it does not exist yet
    dirname.mkdir(exist_ok=True)
    fname = dirname / 'tutorial_queries.log'
    # Escape line breaks so that each query takes exactly one line
    query_in_a_single_line = query.replace('\n', r'\n')
    with fname.open(mode='a') as f:
        f.write(f'{query_in_a_single_line}\n')
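If it is useful to know when each query ran, a variant of the function can prefix every line with a timestamp. The following is a minimal sketch using only the standard library. The factory name `make_timestamped_logger` and the ` - ` separator are choices made for this illustration, and the demo writes to a temporary directory rather than to `logs/`.

```python
import datetime
import tempfile
from pathlib import Path
from typing import Callable


def make_timestamped_logger(log_path: Path) -> Callable[[str], None]:
    """Return a function that appends timestamped queries to `log_path`."""

    def log_query(query: str) -> None:
        # Create the parent directory on first use.
        log_path.parent.mkdir(parents=True, exist_ok=True)
        # Escape line breaks so that each query takes exactly one line.
        single_line = query.replace("\n", r"\n")
        timestamp = datetime.datetime.now().isoformat(timespec="seconds")
        with log_path.open(mode="a") as f:
            f.write(f"{timestamp} - {single_line}\n")

    return log_query


# Demo: log one query to a temporary location and read the line back.
demo_path = Path(tempfile.mkdtemp()) / "logs" / "tutorial_queries.log"
log_query = make_timestamped_logger(demo_path)
log_query("SELECT t0.name\nFROM main.countries AS t0")
logged_line = demo_path.read_text().splitlines()[0]
print(logged_line)
```

The returned function takes a single query string, so it has the same shape as log_query_to_file and could be assigned to ibis.options.verbose_log in the same way.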

In [26]:
import time

ibis.options.verbose_log = log_query_to_file

countries.execute()
time.sleep(1.)
countries['name', 'continent', population_in_millions].limit(3).execute()

                   name  continent  population_in_millions
0               Andorra         EU                0.084000
1  United Arab Emirates         AS                4.975593
2           Afghanistan         AS               29.121286


In [28]:
!cat -n logs/tutorial_queries.log

     1	SELECT t0.iso_alpha2, t0.iso_alpha3, t0.iso_numeric, t0.fips, t0.name, t0.capital, t0.area_km2, t0.population, t0.continent \nFROM main.countries AS t0\n LIMIT ? OFFSET ?
     2	SELECT t0.name, t0.continent, t0.population / CAST(? AS REAL) AS population_in_millions \nFROM main.countries AS t0\n LIMIT ? OFFSET ?
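Since each query is stored on one line with its line breaks written as the two characters `\n`, reading the log back requires reversing that escaping. Below is a small sketch of such a reader. The function name `read_logged_queries` is hypothetical, and the sample line is a shortened version of the queries logged above, written to a temporary file for the demo.

```python
import tempfile
from pathlib import Path
from typing import List


def read_logged_queries(log_path: Path) -> List[str]:
    """Return the logged queries with their original line breaks restored."""
    queries = []
    with log_path.open() as f:
        for line in f:
            line = line.rstrip("\n")
            if line:
                # Turn the two-character sequence backslash + n back into a newline.
                queries.append(line.replace(r"\n", "\n"))
    return queries


# Demo with a shortened sample line in the same format as the log above.
sample_log = Path(tempfile.mkdtemp()) / "tutorial_queries.log"
sample_log.write_text(
    r"SELECT t0.name \nFROM main.countries AS t0\n LIMIT ? OFFSET ?" + "\n"
)

queries = read_logged_queries(sample_log)
print(queries[0])
```

One limitation of this simple format: a query that itself contains a literal backslash followed by `n` (for example inside a string constant) cannot be told apart from an escaped line break when reading the log back.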
