# Data, functional approaches - exercise - blaze - odo

This notebook illustrates the [Blaze](http://blaze.pydata.org/en/latest/) module.

In [None]:
%pylab inline
import matplotlib.pyplot as plt
plt.style.use('ggplot')
import pyensae
from pyquickhelper.helpgen import NbImage
from jyquickhelper import add_notebook_menu
add_notebook_menu()

Populating the interactive namespace from numpy and matplotlib


In [None]:
from actuariat_python.data import table_mortalite_euro_stat 
table_mortalite_euro_stat()
import pandas
df = pandas.read_csv("mortalite.txt", sep="\t", encoding="utf8", low_memory=False)
df.head()

Unnamed: 0,annee,valeur,age,age_num,indicateur,genre,pays
0,2009,0.0008,Y01,1.0,DEATHRATE,F,AM
1,2008,0.00067,Y01,1.0,DEATHRATE,F,AM
2,2007,0.00052,Y01,1.0,DEATHRATE,F,AM
3,2006,0.00123,Y01,1.0,DEATHRATE,F,AM
4,2013,0.00016,Y01,1.0,DEATHRATE,F,AT


### Blaze: a common interface

[Blaze](http://blaze.pydata.org/en/latest/) provides a common interface, close to that of a pandas DataFrame, for many modules such as [bcolz](http://bcolz.blosc.org/)...

* [Pandas to Blaze](http://blaze.pydata.org/en/latest/rosetta-pandas.html)

In [None]:
from blaze import Data

In [None]:
bs = Data(df)

In [None]:
bs.shape

(2760921,)

In [None]:
bs.schema

dshape("""{
  annee: int64,
  valeur: float64,
  age: ?string,
  age_num: float64,
  indicateur: ?string,
  genre: ?string,
  pays: ?string
  }""")

In [None]:
life = bs[bs.indicateur == 'LIFEXP']
life.head()

Unnamed: 0,annee,valeur,age,age_num,indicateur,genre,pays
396432,2009,76.5,Y01,1.0,LIFEXP,F,AM
396433,2008,76.4,Y01,1.0,LIFEXP,F,AM
396434,2007,76.5,Y01,1.0,LIFEXP,F,AM
396435,2006,75.9,Y01,1.0,LIFEXP,F,AM
396436,2013,83.0,Y01,1.0,LIFEXP,F,AT
396437,2012,82.8,Y01,1.0,LIFEXP,F,AT
396438,2011,83.1,Y01,1.0,LIFEXP,F,AT
396439,2010,82.8,Y01,1.0,LIFEXP,F,AT
396440,2009,82.5,Y01,1.0,LIFEXP,F,AT
396441,2008,82.5,Y01,1.0,LIFEXP,F,AT


Blaze follows a different design: filtering the data does not return an object of the same type as the source, it returns an expression (here a `Selection`).

In [None]:
type(bs[bs.indicateur == 'LIFEXP']), type(bs)

(blaze.expr.expressions.Selection, blaze.interactive._Data)

Unlike pandas, where filtering a DataFrame returns another DataFrame:

In [None]:
type(df[df.indicateur=='LIFEXP']), type(df)

(pandas.core.frame.DataFrame, pandas.core.frame.DataFrame)
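The difference in return types shown above can be illustrated without Blaze. The sketch below is a standard-library toy, not Blaze's implementation: `LazySelection` and `eager_filter` are hypothetical names, and the sample rows reuse a few values from the table above. The eager filter returns the same kind of container it received, while the deferred version returns a separate object that only describes the filter until `compute()` is called.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass(frozen=True)
class LazySelection:
    """Hypothetical stand-in for a deferred filter expression."""
    source: List[Row]
    predicate: Callable[[Row], bool]

    def compute(self) -> List[Row]:
        # The predicate is only evaluated here, when results are requested.
        return [row for row in self.source if self.predicate(row)]


def eager_filter(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    # Eager style: the filter runs immediately and returns the same container type.
    return [row for row in rows if predicate(row)]


def is_lifexp(row: Row) -> bool:
    return row["indicateur"] == "LIFEXP"


rows = [
    {"annee": 2009, "indicateur": "DEATHRATE", "valeur": 0.0008},
    {"annee": 2009, "indicateur": "LIFEXP", "valeur": 76.5},
    {"annee": 2008, "indicateur": "LIFEXP", "valeur": 76.4},
]

eager = eager_filter(rows, is_lifexp)   # a list, like its input
lazy = LazySelection(rows, is_lifexp)   # a description of the filter, not a list

print(type(eager).__name__, type(lazy).__name__)
print(lazy.compute() == eager)
```

Keeping the filter as a separate object is what lets a library decide later how and where to run it, which fits the idea of one interface over several backends.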

### Odo: conversions of all kinds

[odo](https://odo.readthedocs.org/en/latest/) converts just about anything into just about anything: files, compressed files, databases, Spark, Hadoop...

* [What sorts of URI’s does odo support?](https://odo.readthedocs.org/en/latest/uri.html#what-sorts-of-uri-s-does-odo-support)

In [None]:
from odo import odo

In [None]:
odo(df, "mortalite_compresse.csv.gz")

<odo.backends.csv.CSV at 0x23ed95f2160>

And reading it back:

In [None]:
df_lu = odo("mortalite_compresse.csv.gz", pandas.DataFrame)
df_lu.shape

(2760921, 7)
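For comparison, the same kind of round trip through a gzip-compressed CSV file can be sketched with the standard library alone. This is not how odo works internally, only an illustration of what the conversion amounts to for this one pair of formats. The file name `sample.csv.gz` and the two sample rows are assumptions made for the example, and every value comes back as a string because the `csv` module does not infer types.

```python
import csv
import gzip
import os
import tempfile

# Two sample rows; values are strings because csv does not infer types.
rows = [
    {"annee": "2009", "valeur": "76.5", "indicateur": "LIFEXP", "genre": "F", "pays": "AM"},
    {"annee": "2008", "valeur": "76.4", "indicateur": "LIFEXP", "genre": "F", "pays": "AM"},
]
fieldnames = list(rows[0])

# Hypothetical file name, written to a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "sample.csv.gz")

# Write: gzip.open in text mode compresses transparently.
with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)

# Read back: the same call in "rt" mode decompresses transparently.
with gzip.open(path, "rt", newline="", encoding="utf-8") as f:
    rows_read = list(csv.DictReader(f))

print(len(rows_read), len(fieldnames))
print(rows_read == rows)
```

With odo, the source and target types are inferred from the arguments, so the single call `odo(df, "mortalite_compresse.csv.gz")` replaces the explicit writer and compression steps.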