### Demo "Airbnb simplified"

In this notebook, we show step by step how to download the "Airbnb simplified" dataset, explain its structure, and sample synthetic data from it.

The original "Airbnb" dataset can be found on [Kaggle](https://www.kaggle.com/c/airbnb-recruiting-new-user-bookings/data); the version used here has been simplified. It contains two tables: "users" and "sessions".

**users**

This is the main table of the dataset. Its primary key is "id".

It contains information about each user.

| Fields                  | Type        | Subtype | Additional Properties  |
|-------------------------|-------------|---------|------------------------|
| id                      | id          | string  |                        |
| date_account_created    | datetime    |         | format: "%Y-%m-%d"     |
| timestamp_first_active  | datetime    |         | format: "%Y%m%d%H%M%S" |
| date_first_booking      | datetime    |         | format: "%Y-%m-%d"     |
| gender                  | categorical |         |                        |
| age                     | numerical   | integer |                        |
| signup_method           | categorical |         |                        |
| signup_flow             | categorical |         |                        |
| language                | categorical |         |                        |
| affiliate_channel       | categorical |         |                        |
| affiliate_provider      | categorical |         |                        |
| first_affiliate_tracked | categorical |         |                        |
| signup_app              | categorical |         |                        |
| first_device_type       | categorical |         |                        |
| first_browser           | categorical |         |                        |
| country_destination     | categorical |         |                        |
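
The format strings in the "Additional Properties" column follow Python's `strftime`/`strptime` conventions. As a minimal sketch of what they mean, the snippet below parses the three datetime fields with the standard library. The helper name `parse_user_datetimes` and the example values are hypothetical, invented for illustration; this is not how SDV itself reads the data.

```python
from datetime import datetime

# Datetime formats copied from the "users" table description above.
USER_DATETIME_FORMATS = {
    "date_account_created": "%Y-%m-%d",
    "timestamp_first_active": "%Y%m%d%H%M%S",
    "date_first_booking": "%Y-%m-%d",
}


def parse_user_datetimes(row):
    """Return a copy of `row` with its datetime fields parsed (hypothetical helper)."""
    parsed = dict(row)
    for field, fmt in USER_DATETIME_FORMATS.items():
        value = row.get(field)
        if value:  # leave missing or empty values untouched
            parsed[field] = datetime.strptime(value, fmt)
    return parsed


# Example values invented for illustration, not taken from the dataset.
example = {
    "date_account_created": "2014-01-01",
    "timestamp_first_active": "20140101000936",
    "date_first_booking": "",
}
parsed = parse_user_datetimes(example)
print(parsed["timestamp_first_active"])
```

Note that "timestamp_first_active" has no separators between its components, which is why it needs a different format string from the two date fields.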

**sessions**

This is a child table of "users". It has no primary key, and its foreign key is "user_id".

It contains information about user sessions.

| Fields                  | Type        | Subtype | Additional Properties  |
|-------------------------|-------------|---------|------------------------|
| user_id                 | id          | string  | foreign key (users.id) |
| action                  | categorical |         |                        |
| action_type             | categorical |         |                        |
| action_detail           | categorical |         |                        |
| device_type             | categorical |         |                        |
| secs_elapsed            | numerical   | integer |                        |
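
To make the parent-child relationship concrete, here is a small sketch in plain Python. Every "sessions" row points to one "users" row through "user_id", and a user can have any number of sessions. The rows and the two helper functions are hypothetical, written only to illustrate the foreign key; they are not part of SDV.

```python
# Hypothetical rows, invented for illustration only (not taken from the dataset).
users = [
    {"id": "u1", "age": 34},
    {"id": "u2", "age": 27},
]
sessions = [
    {"user_id": "u1", "action": "search", "secs_elapsed": 120},
    {"user_id": "u1", "action": "show", "secs_elapsed": 45},
    {"user_id": "u2", "action": "search", "secs_elapsed": 300},
]


def orphan_sessions(users, sessions):
    """Return the sessions whose user_id matches no users.id."""
    user_ids = {user["id"] for user in users}
    return [s for s in sessions if s["user_id"] not in user_ids]


def sessions_per_user(users, sessions):
    """Count the child rows attached to each parent row."""
    counts = {user["id"]: 0 for user in users}
    for session in sessions:
        if session["user_id"] in counts:
            counts[session["user_id"]] += 1
    return counts


print(orphan_sessions(users, sessions))
print(sessions_per_user(users, sessions))
```

An empty list from `orphan_sessions` means every foreign key resolves to an existing user, which is the property the "foreign key (users.id)" annotation describes.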

### 1. Download the demo data

To download the demo data, we will use the `load_demo` function. In this example, we will use the **airbnb-simplified** dataset. Datasets are downloaded from an Amazon S3 bucket.

In [None]:
from sdv import load_demo

metadata, tables = load_demo(dataset_name='airbnb-simplified', metadata=True)

By default, datasets are downloaded to the "data" folder within SDV. If SDV is installed via pip, the data is stored in the virtual environment. You can change the output path using the `data_path` argument.

In [None]:
from sdv import load_demo

metadata, tables = load_demo(
    dataset_name='airbnb-simplified',
    data_path='/home/josedavid/.sdv',
    metadata=True
)
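
The path in the cell above is specific to one machine. As a hedged alternative, the sketch below builds an equivalent folder path under the current user's home directory with `pathlib`. The helper name `demo_data_path` is hypothetical, and whether `load_demo` creates the folder when it is missing is not confirmed here.

```python
from pathlib import Path


def demo_data_path(folder_name='.sdv'):
    """Return a folder path inside the current user's home directory, as a string."""
    return str(Path.home() / folder_name)


data_path = demo_data_path()
print(data_path)
# The result could then be passed along as load_demo(..., data_path=data_path).
```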

### 2. Create an instance of SDV

Once the data is downloaded, we can create a new instance of SDV.

In [None]:
from sdv import SDV

sdv = SDV()

### 3. Train the model

Once the SDV object has been created, we must fit the model.

We just need to call the "fit" method with the metadata and the tables loaded from the CSV files in the previous step.

In [None]:
sdv.fit(metadata, tables=tables)

### 4. Data sampling

After fitting the model, we are ready to generate data. To create data for all the tables, we call the "sample_all" method.

In [None]:
samples = sdv.sample_all()

In [None]:
samples['users'].head()

In [None]:
samples['sessions'].head()

The "sample_all" method returns a dictionary that maps each table name in the dataset to a dataframe of sampled rows.

Alternatively, we can sample one table at a time by calling the "sample" method with the table name and the number of rows to sample.

In [None]:
sdv.sample('users', 5)

In [None]:
sdv.sample('sessions', 5)

### 5. Data evaluation

Once we have generated sample data, we may want to evaluate it.

SDV includes an evaluation package that calculates scores using different descriptors and metrics. In this example, we will use the default metrics and descriptors.

In [None]:
from sdv.evaluation import evaluate

evaluate(samples, real=tables, metadata=sdv.metadata)