# Dataset reader source

Run the following cell to display the table of contents.

In [None]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')

In this notebook, we will see different ways to create a dataset from different sources.

## Initialize a context

As we saw in the previous chapter, the context is a key concept in ddapi. It lets us communicate with an environment, for example by creating a database connection.

In [None]:
from dd import DB
from dd.api.contexts import LocalContext

db = DB(dbtype='sqlite', filename=':memory:')
context = LocalContext(db)

In [None]:
context.set_default_write_options(if_exists="replace", index=False)
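
The option names if_exists and index match the parameters of pandas' DataFrame.to_sql. Whether ddapi forwards them to pandas is an assumption here, not something this notebook states. Under that assumption, the sketch below shows what the two settings mean, using plain pandas and a separate in-memory SQLite database. The table name demo and the sample values are made up for illustration.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database, independent of the dd context above.
conn = sqlite3.connect(":memory:")

first = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})
second = pd.DataFrame({"name": ["c"], "value": [3]})

# index=False: the DataFrame index is not written as an extra column.
first.to_sql("demo", conn, if_exists="replace", index=False)

# if_exists="replace": the existing table is dropped and recreated,
# so only the rows of `second` remain afterwards.
second.to_sql("demo", conn, if_exists="replace", index=False)

result = pd.read_sql_query("SELECT * FROM demo", conn)
print(result)
```

With if_exists="replace" as the default, re-running a cell that writes to the same table overwrites it instead of raising an error, which is convenient in a notebook.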

Here, we need the context to obtain a DatasetReader (through context.read), which lets us create a dataset from different sources (CSV file, database table, or dataframe).

In [None]:
dataset_reader = context.read
dataset_reader

We can now create datasets from different sources with dataset_reader (which is a DatasetReader instance).

## From a CSV file

The dd library ships with some packaged data, which can be accessed through pkg_resources (part of setuptools, not the standard library). Here, we load the Titanic dataset from a public URL instead:

In [None]:
titanic_datapath = 'https://raw.githubusercontent.com/justmarkham/DAT8/master/data/titanic.csv'

Now let's use our DatasetReader to create a dataset from this CSV file:

In [None]:
train_dataset = dataset_reader.csv(titanic_datapath, output_table='titanic_train', normalize=True)

In [None]:
type(train_dataset)

In [None]:
train_dataset.head()
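
As a rough mental model of what loading a CSV into a named table involves, here is a sketch that uses only the standard library. It does not use ddapi and makes no claim about how dataset_reader.csv is implemented. The normalize_name helper and its rule (lower-case, runs of non-alphanumeric characters become underscores) are a hypothetical stand-in for whatever normalize=True does, and the inline rows are made-up sample data, not taken from the Titanic file.

```python
import csv
import io
import re
import sqlite3

# Made-up sample standing in for a CSV file.
sample_csv = io.StringIO(
    "Passenger Id,Full Name,Age\n"
    "1,Alice,30\n"
    "2,Bob,25\n"
)


def normalize_name(name):
    """Hypothetical normalization: lower-case, non-alphanumerics -> '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


reader = csv.reader(sample_csv)
columns = [normalize_name(col) for col in next(reader)]  # header row
rows = list(reader)  # remaining data rows, all values still strings

conn = sqlite3.connect(":memory:")
column_list = ", ".join(columns)
placeholders = ", ".join("?" for _ in columns)

# Mimic if_exists="replace": drop any earlier version of the table first.
conn.execute("DROP TABLE IF EXISTS sample_table")
conn.execute(f"CREATE TABLE sample_table ({column_list})")
conn.executemany(f"INSERT INTO sample_table VALUES ({placeholders})", rows)

print(columns)
print(conn.execute("SELECT * FROM sample_table").fetchall())
```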

## From a table

At this point, we have a table named titanic_train. We will use it to create a new dataset through our DatasetReader object:

In [None]:
train_dataset_from_table = dataset_reader.table(table_name='titanic_train')

In [None]:
train_dataset_from_table.head()

As we can see, we have a new dataset.

## From a dataframe

Let's see how to create a dataset from a dataframe.

In [None]:
dataframe = train_dataset_from_table.collect()
type(dataframe)

As we can see, we have a pandas dataframe. To create a new dataset, we just have to call the dataset_reader.dataframe method:

In [None]:
dataset_from_dataframe = dataset_reader.dataframe(dataframe, output_table='dataframe_table')

In [None]:
dataset_from_dataframe.head()

Great! It works well.
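
One way to check that a table-to-dataframe-to-table round trip preserves the data is to compare the dataframes before and after. The sketch below does this with plain pandas and sqlite3 rather than ddapi, so it only illustrates the idea. The table name roundtrip_table and the sample values are made up.

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")

original = pd.DataFrame({"name": ["Alice", "Bob"], "age": [30, 25]})

# Write the DataFrame to a table, then read the table back.
original.to_sql("roundtrip_table", conn, if_exists="replace", index=False)
restored = pd.read_sql_query("SELECT * FROM roundtrip_table", conn)

# equals() compares both the values and the column dtypes.
print(original.equals(restored))
```

The same comparison could be applied to the result of collect() on two ddapi datasets, assuming both fit in memory.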