# Blaze

Blaze is a Python library that offers one query interface over a variety of data sources, so you can query them without knowing the specifics of each source.

In [1]:
import blaze as bz

## SQL databases

Load a SQLite database.

In [2]:
people = bz.Data("sqlite:///people.db")

### Looking at the data

List all tables in the database:

In [9]:
people.fields

['addresses', 'friends', 'persons']

In [3]:
people.persons

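|   | person_id | last_name | first_name | address_id | age |
|---|-----------|-----------|------------|------------|-----|
| 0 | 1 | Jansen | Jan | 1.0 | 31 |
| 1 | 2 | Fransen | Frans | 2.0 | 33 |
| 2 | 3 | Nellis | Nele | 2.0 | 29 |
| 3 | 4 | Malone | Molly | 3.0 | 48 |
| 4 | 5 | Michaels | Micky |  | 4 |
| 5 | 6 | Patricks | Pat |  | 78 |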


In [4]:
people.addresses

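|   | address_id | street | number | city |
|---|------------|--------|--------|------|
| 0 | 1 | Fifth Avenue | 1343 | New York |
| 1 | 2 | Downing Street | 10 | London |
| 2 | 3 | Avenue Louise | 203 | Brussels |
| 3 | 4 | Oxford Street | 212 | London |
| 4 | 5 | Cantersteen | 43 | Brussels |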
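
The original `people.db` file is not included here. As a minimal sketch for following along, the two tables shown above can be rebuilt with Python's standard `sqlite3` module. The column types and the helper name `build_people_db` are assumptions, and the `friends` table is left out because its contents are not shown.

```python
import sqlite3

# Rows copied from the `persons` and `addresses` outputs above.
# None stands for a missing address_id.
PERSONS = [
    (1, "Jansen", "Jan", 1, 31),
    (2, "Fransen", "Frans", 2, 33),
    (3, "Nellis", "Nele", 2, 29),
    (4, "Malone", "Molly", 3, 48),
    (5, "Michaels", "Micky", None, 4),
    (6, "Patricks", "Pat", None, 78),
]
ADDRESSES = [
    (1, "Fifth Avenue", 1343, "New York"),
    (2, "Downing Street", 10, "London"),
    (3, "Avenue Louise", 203, "Brussels"),
    (4, "Oxford Street", 212, "London"),
    (5, "Cantersteen", 43, "Brussels"),
]


def build_people_db(path=":memory:"):
    """Create both tables at `path` and return the open connection.

    Hypothetical helper; the column types are guesses based on the output.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE addresses ("
        "address_id INTEGER PRIMARY KEY, street TEXT, number INTEGER, city TEXT)"
    )
    conn.execute(
        "CREATE TABLE persons ("
        "person_id INTEGER PRIMARY KEY, last_name TEXT, first_name TEXT, "
        "address_id INTEGER, age INTEGER)"
    )
    conn.executemany("INSERT INTO addresses VALUES (?, ?, ?, ?)", ADDRESSES)
    conn.executemany("INSERT INTO persons VALUES (?, ?, ?, ?, ?)", PERSONS)
    conn.commit()
    return conn
```

Calling `build_people_db("people.db")` writes the file to disk, so that a URI such as `sqlite:///people.db` has something to point at; the default keeps everything in memory.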


### Queries

What are the distinct cities in the `addresses` table?

In [5]:
people.addresses.city.distinct()

|   | city |
|---|------|
| 0 | New York |
| 1 | London |
| 2 | Brussels |


How many addresses do we have per city?

In [6]:
people.addresses.city.count_values()

|   | city | count |
|---|------|-------|
| 0 | Brussels | 2 |
| 1 | London | 2 |
| 2 | New York | 1 |


In [7]:
bz.by(people.addresses.city, count=people.addresses.city.count())

|   | city | count |
|---|------|-------|
| 0 | Brussels | 2 |
| 1 | London | 2 |
| 2 | New York | 1 |
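
For comparison, the per-city count above corresponds roughly to a `GROUP BY` query in plain SQL. The sketch below runs such a query with the standard `sqlite3` module against an in-memory copy of the `addresses` rows. The helper name `addresses_per_city` is hypothetical, and this is hand-written SQL, not necessarily the statement Blaze generates.

```python
import sqlite3

# In-memory copy of the `addresses` rows shown earlier.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE addresses (address_id INTEGER, street TEXT, number INTEGER, city TEXT)"
)
conn.executemany(
    "INSERT INTO addresses VALUES (?, ?, ?, ?)",
    [
        (1, "Fifth Avenue", 1343, "New York"),
        (2, "Downing Street", 10, "London"),
        (3, "Avenue Louise", 203, "Brussels"),
        (4, "Oxford Street", 212, "London"),
        (5, "Cantersteen", 43, "Brussels"),
    ],
)


def addresses_per_city(connection):
    """Return (city, number_of_addresses) pairs, sorted by city name."""
    query = "SELECT city, COUNT(*) AS n FROM addresses GROUP BY city ORDER BY city"
    return connection.execute(query).fetchall()


print(addresses_per_city(conn))
# [('Brussels', 2), ('London', 2), ('New York', 1)]
```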


Which streets do we have in Brussels?

In [10]:
people.addresses[people.addresses.city == 'Brussels'].street

|   | street |
|---|--------|
| 0 | Avenue Louise |
| 1 | Cantersteen |


What is the average age of the people in the `persons` table?

In [11]:
people.persons.age.mean()

How many persons are older than 40?

In [12]:
people.persons[people.persons.age > 40].count()
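
The two aggregates above are shown without output. As a rough cross-check, they can be recomputed in plain Python from the ages listed in the `persons` table. The helper name `count_older_than` is hypothetical, and the sketch assumes the six rows shown earlier are the whole table.

```python
from statistics import mean

# Ages copied from the `persons` output above, in person_id order.
AGES = [31, 33, 29, 48, 4, 78]


def count_older_than(ages, threshold):
    """Count the ages strictly greater than `threshold`."""
    return sum(1 for age in ages if age > threshold)


print(round(mean(AGES), 2))        # 37.17
print(count_older_than(AGES, 40))  # 2
```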

Is Mr. Fransen less than 20 years old?

In [13]:
people.persons[people.persons.last_name == 'Fransen'].age < 20

|   | age |
|---|-----|
| 0 | False |


Who is less than 30 years old?

In [24]:
people.persons[people.persons.age < 30][['first_name', 'last_name']]

|   | first_name | last_name |
|---|------------|-----------|
| 0 | Nele | Nellis |
| 1 | Micky | Michaels |


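Unfortunately, joins seem to break on NaNs. The `address_id` column in `persons` has missing values, and apparently the schemas of the two join columns then no longer match.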

In [70]:
bz.join(people.persons, people.addresses, 'address_id')

TypeError: Schema's of joining columns do not match
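
To see what such a join would return, one option is to sidestep Blaze and write the join in plain SQL. The sketch below uses the standard `sqlite3` module on an in-memory copy of the relevant columns. A `LEFT JOIN` keeps the persons without an address, while an inner `JOIN` drops them. The helper name `persons_with_city` and its `keep_unmatched` parameter are hypothetical, and this is a workaround, not a fix for the Blaze error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE addresses (address_id INTEGER, city TEXT);
    CREATE TABLE persons (
        person_id INTEGER, first_name TEXT, last_name TEXT, address_id INTEGER
    );
    """
)
# Rows copied from the outputs shown earlier; None marks a missing address_id.
conn.executemany(
    "INSERT INTO addresses VALUES (?, ?)",
    [(1, "New York"), (2, "London"), (3, "Brussels"), (4, "London"), (5, "Brussels")],
)
conn.executemany(
    "INSERT INTO persons VALUES (?, ?, ?, ?)",
    [
        (1, "Jan", "Jansen", 1),
        (2, "Frans", "Fransen", 2),
        (3, "Nele", "Nellis", 2),
        (4, "Molly", "Malone", 3),
        (5, "Micky", "Michaels", None),
        (6, "Pat", "Patricks", None),
    ],
)


def persons_with_city(connection, keep_unmatched=True):
    """Join persons to addresses on address_id.

    With keep_unmatched=True, persons without an address are kept and
    their city is None; otherwise they are dropped.
    """
    join = "LEFT JOIN" if keep_unmatched else "JOIN"
    query = (
        "SELECT p.first_name, p.last_name, a.city "
        f"FROM persons AS p {join} addresses AS a "
        "ON p.address_id = a.address_id "
        "ORDER BY p.person_id"
    )
    return connection.execute(query).fetchall()


for row in persons_with_city(conn):
    print(row)
```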

## CSV files

CSV files can be used as data sources as well.

In [72]:
persons   = bz.Data('persons.csv')
addresses = bz.Data('addresses.csv')

In [73]:
persons

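|   | person_id | last_name | first_name | age | address_id |
|---|-----------|-----------|------------|-----|------------|
| 0 | 1 | Smith | John | 31 | 1 |
| 1 | 2 | Jones | Jane | 29 | 1 |
| 2 | 3 | Doe | John | 65 | 2 |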


On this data, the join succeeds.

In [75]:
contacts = bz.join(persons, addresses, 'address_id')[['first_name', 'last_name', 'city']]
contacts

|   | first_name | last_name | city |
|---|------------|-----------|------|
| 0 | John | Smith | Londen |
| 1 | Jane | Jones | Londen |
| 2 | John | Doe | Paris |


Now save the contacts into a CSV file.

In [77]:
bz.into('contacts.csv', contacts);
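
For reference, the same three contact rows can be written with Python's standard `csv` module. This is a sketch of an alternative, not a description of what `bz.into` does internally. The helper name `write_contacts` is hypothetical, and the output goes to an in-memory buffer so the result is easy to inspect.

```python
import csv
import io

# Rows copied from the `contacts` output above.
CONTACTS = [
    ("John", "Smith", "Londen"),
    ("Jane", "Jones", "Londen"),
    ("John", "Doe", "Paris"),
]


def write_contacts(rows, fileobj):
    """Write a header line followed by one CSV line per contact."""
    writer = csv.writer(fileobj)
    writer.writerow(["first_name", "last_name", "city"])
    writer.writerows(rows)


buffer = io.StringIO()
write_contacts(CONTACTS, buffer)
print(buffer.getvalue())
```

To write a real file instead, pass a file opened with `open('contacts.csv', 'w', newline='')` in place of the buffer.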