https://labs.quansight.org/blog/2020/06/ibis-an-idiomatic-flavor-of-sql-for-python-programmers/

In [1]:
!pip install ibis-framework

Collecting ibis-framework
  Downloading ibis_framework-1.3.0-py3-none-any.whl (601 kB)
Collecting regex
  Downloading regex-2020.6.8-cp37-cp37m-manylinux2010_x86_64.whl (661 kB)
Collecting multipledispatch>=0.6.0
  Downloading multipledispatch-0.6.0-py3-none-any.whl (11 kB)
Installing collected packages: regex, multipledispatch, ibis-framework
Successfully installed ibis-framework-1.3.0 multipledispatch-0.6.0 regex-2020.6.8


In [2]:
%matplotlib inline
import ibis
import pathlib, requests

db_path = pathlib.Path.cwd() / 'lahmansbaseballdb.sqlite'

if not db_path.exists():          # Download the database only if it is missing
    URL = 'https://github.com/WebucatorTraining/lahman-baseball-mysql/raw/master/lahmansbaseballdb.sqlite'
    req = requests.get(URL)
    req.raise_for_status()        # Stop here on an HTTP error, before any file is created
    with open(db_path, 'wb') as f:
        f.write(req.content)

client = ibis.sqlite.connect(db_path.name) # Opens SQLite database connection

In [3]:
ls

ibis-guide.ipynb  lahmansbaseballdb.sqlite  README.md


In [4]:
tables = client.list_tables()
print(f'This database has {len(tables)} tables.')

This database has 29 tables.


In [5]:
tables

['allstarfull',
 'appearances',
 'awardsmanagers',
 'awardsplayers',
 'awardssharemanagers',
 'awardsshareplayers',
 'batting',
 'battingpost',
 'collegeplaying',
 'divisions',
 'fielding',
 'fieldingof',
 'fieldingofsplit',
 'fieldingpost',
 'halloffame',
 'homegames',
 'leagues',
 'managers',
 'managershalf',
 'parks',
 'people',
 'pitching',
 'pitchingpost',
 'salaries',
 'schools',
 'seriespost',
 'teams',
 'teamsfranchises',
 'teamshalf']
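As a cross-check on the list above, table names can also be read straight from SQLite's own catalog. The sketch below uses only the standard-library `sqlite3` module and a small in-memory database with three empty, hypothetical tables; it is not a description of how `client.list_tables()` is implemented.

```python
import sqlite3

# In-memory stand-in database; the three tables are empty placeholders
# created only for this illustration.
conn = sqlite3.connect(":memory:")
for name in ("halloffame", "appearances", "people"):
    conn.execute(f'CREATE TABLE "{name}" (id INTEGER)')

# SQLite records every table in its sqlite_master catalog.
names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
print(names)  # ['appearances', 'halloffame', 'people']
```

Pointing `sqlite3.connect` at the downloaded `lahmansbaseballdb.sqlite` file instead of `":memory:"` should let the same query be compared against the output of `client.list_tables()`.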

To work with individual tables, invoke the `table` method of the client object with the appropriate table names.

In [6]:
halloffame = client.table('halloffame', database='base')
appearances = client.table('appearances', database='base')

We can examine the contents of these Ibis table expressions using the `TableExpr.limit` or the `TableExpr.head` method. Neither runs a query by itself; each returns a new table expression.

In [12]:
sample = halloffame.head()
print(f'The object sample is of type {type(sample).__name__}')

The object sample is of type TableExpr


In [13]:
str(sample.compile())

'SELECT t0."ID", t0."playerID", t0.yearid, t0."votedBy", t0.ballots, t0.needed, t0.votes, t0.inducted, t0.category, t0.needed_note \nFROM base.halloffame AS t0\n LIMIT ? OFFSET ?'
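The compiled statement ends in `LIMIT ? OFFSET ?`: the row limit and offset are not written into the SQL text but are left as placeholders to be bound when the query runs. The following is a minimal sketch of that binding mechanism using the standard-library `sqlite3` module directly, not Ibis. The in-memory table is a cut-down stand-in with two columns and three rows copied from the `halloffame` output, and the `ORDER BY` clause is added only to make the result deterministic.

```python
import sqlite3

# Cut-down, in-memory stand-in for the halloffame table (illustration only).
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE halloffame ("playerID" TEXT, yearid INTEGER)')
conn.executemany(
    "INSERT INTO halloffame VALUES (?, ?)",
    [("cobbty01", 1936), ("ruthba01", 1936), ("wagneho01", 1936)],
)

# Same shape as the compiled query: LIMIT and OFFSET are `?` placeholders.
query = (
    'SELECT t0."playerID", t0.yearid '
    "FROM halloffame AS t0 "
    'ORDER BY t0."playerID" '
    "LIMIT ? OFFSET ?"
)

# The tuple supplies the values for the two placeholders, in order.
rows = conn.execute(query, (2, 0)).fetchall()
print(rows)  # [('cobbty01', 1936), ('ruthba01', 1936)]
```

Keeping values out of the SQL string this way lets the same statement be reused with different limits and offsets.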

Evaluating the expression in a notebook cell displays its expression tree (a DAG) in text form:

In [14]:
sample

ref_0
SQLiteTable[table]
  name: halloffame
  schema:
    ID : int32
    playerID : string
    yearid : int16
    votedBy : string
    ballots : int16
    needed : int16
    votes : int16
    inducted : string
    category : string
    needed_note : string

Limit[table]
  table:
    Table: ref_0
  n:
    5
  offset:
    0

In [15]:
result = sample.execute()
print(f'The type of result is {type(result).__name__}')
result    # Leading 5 rows of the halloffame table

The type of result is DataFrame


Unnamed: 0,ID,playerID,yearid,votedBy,ballots,needed,votes,inducted,category,needed_note
0,1,cobbty01,1936,BBWAA,226,170,222,Y,Player,
1,2,ruthba01,1936,BBWAA,226,170,215,Y,Player,
2,3,wagneho01,1936,BBWAA,226,170,215,Y,Player,
3,4,mathech01,1936,BBWAA,226,170,205,Y,Player,
4,5,johnswa01,1936,BBWAA,226,170,189,Y,Player,


A similar extraction of the leading five rows from the appearances table (in one line) gives the following table with 23 columns:

In [16]:
appearances.head().execute() 

Unnamed: 0,ID,yearID,teamID,team_ID,lgID,playerID,G_all,GS,G_batting,G_defense,...,G_2b,G_3b,G_ss,G_lf,G_cf,G_rf,G_of,G_dh,G_ph,G_pr
0,1,1871,TRO,8,,abercda01,1,1,1,1,...,0,0,1,0,0,0,0,0,0,0
1,2,1871,RC1,7,,addybo01,25,25,25,25,...,22,0,3,0,0,0,0,0,0,0
2,3,1871,CL1,3,,allisar01,29,29,29,29,...,2,0,0,0,29,0,29,0,0,0
3,4,1871,WS3,9,,allisdo01,27,27,27,27,...,0,0,0,0,0,0,0,0,0,0
4,5,1871,RC1,7,,ansonca01,25,25,25,25,...,2,20,0,1,0,0,1,0,0,0


## Filtering and selecting data

As mentioned earlier, Ibis uses familiar pandas-style syntax to build SQL queries. As an example, let's count the various kinds of entries in the `category` column of the `halloffame` table.

In [17]:
halloffame.category.value_counts().execute()

Unnamed: 0,category,count
0,Manager,74
1,Pioneer/Executive,41
2,Player,4066
3,Umpire,10
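
In SQL terms, a `value_counts` call like this one amounts to a `GROUP BY` with a row count per group. The sketch below shows that pattern with the standard-library `sqlite3` module on a tiny in-memory table; the five rows are invented for illustration, and the exact SQL that Ibis emits may differ in aliases and formatting.

```python
import sqlite3

# Tiny in-memory table with a single category column (invented rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE halloffame (category TEXT)")
conn.executemany(
    "INSERT INTO halloffame VALUES (?)",
    [("Player",), ("Player",), ("Manager",), ("Umpire",), ("Player",)],
)

# One output row per distinct category, with the number of matching rows.
counts = conn.execute(
    'SELECT category, count(*) AS "count" '
    "FROM halloffame "
    "GROUP BY category "
    "ORDER BY category"
).fetchall()
print(counts)  # [('Manager', 1), ('Player', 3), ('Umpire', 1)]
```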
