### Demo "walmart"

In this notebook, we show step by step how to download the "Walmart" dataset, explain its structure, and sample data.

The "Walmart" dataset can be found on [Kaggle](https://www.kaggle.com/c/walmart-recruiting-store-sales-forecasting/data). This dataset contains 3 tables: "stores", "features" and "depts".

**stores**

It is the main table of the dataset, and its primary key is "Store".

It contains information about the type and size of each store.

| Field | Type        | Subtype | Additional Properties |
|-------|-------------|---------|-----------------------|
| Store | id          | integer |                       |
| Size  | numerical   | integer |                       |
| Type  | categorical |         |                       |

**features**

It is a child table of "stores". It has no primary key, and its foreign key is "Store".

It contains additional information about each store on a specific date.

| Field        | Type      | Subtype | Additional Properties       |
|--------------|-----------|---------|-----------------------------|
| Store        | id        | integer | foreign key (stores.Store)  |
| Date         | datetime  |         | format: "%Y-%m-%d"          |
| IsHoliday    | boolean   |         |                             |
| Fuel_Price   | numerical | float   |                             |
| Unemployment | numerical | float   |                             |
| Temperature  | numerical | float   |                             |
| CPI          | numerical | float   |                             |
| MarkDown1    | numerical | float   |                             |
| MarkDown2    | numerical | float   |                             |
| MarkDown3    | numerical | float   |                             |
| MarkDown4    | numerical | float   |                             |
| MarkDown5    | numerical | float   |                             |
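
The "Date" field is stored as text in the `%Y-%m-%d` format. As a minimal sketch of what that format means, the snippet below parses such a value with the standard library. The helper name `parse_date` and the sample strings are illustrative and not part of SDV.

```python
from datetime import date, datetime

# Format listed for the "Date" field in the table above.
DATE_FORMAT = "%Y-%m-%d"


def parse_date(text: str) -> date:
    """Parse a value such as '2010-02-05' into a datetime.date."""
    return datetime.strptime(text, DATE_FORMAT).date()


# Illustrative value written in the expected layout.
parsed = parse_date("2010-02-05")
print(parsed.year, parsed.month, parsed.day)

# A value written in another layout is rejected with ValueError.
try:
    parse_date("05/02/2010")
except ValueError as error:
    print("rejected:", error)
```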

**depts**

It is another child table of "stores". On Kaggle the file is called "train.csv", but we have renamed it to "depts". It has no primary key, and its foreign key is "Store".

It contains information about departments for dates between 2010-02-05 and 2012-11-01.

| Field        | Type      | Subtype | Additional Properties        |
|--------------|-----------|---------|------------------------------|
| Store        | id        | integer | foreign key (stores.Store)   |
| Date         | datetime  |         | format: "%Y-%m-%d"           |
| Weekly_Sales | numerical | float   |                              |
| Dept         | numerical | integer |                              |
| IsHoliday    | boolean   |         |                              |
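
Both child tables point at "stores" through the "Store" field. The following sketch shows one way to check that relationship on plain Python rows, for example the output of `DataFrame.to_dict("records")`. The `RELATIONSHIPS` mapping, the `find_orphans` helper, and the toy rows are assumptions made for illustration; they are not SDV's metadata format or API.

```python
# Child table -> (parent table, key field), as described in the tables above.
# This layout is a hypothetical convenience, not the schema SDV uses.
RELATIONSHIPS = {
    "features": ("stores", "Store"),
    "depts": ("stores", "Store"),
}


def find_orphans(parent_rows, child_rows, key):
    """Return the child rows whose key value is missing from the parent rows."""
    parent_keys = {row[key] for row in parent_rows}
    return [row for row in child_rows if row[key] not in parent_keys]


# Made-up rows, only to exercise the check.
toy_tables = {
    "stores": [
        {"Store": 1, "Size": 1000, "Type": "A"},
        {"Store": 2, "Size": 2000, "Type": "B"},
    ],
    "features": [
        {"Store": 1, "Date": "2010-02-05", "IsHoliday": False},
    ],
    "depts": [
        {"Store": 2, "Date": "2010-02-05", "Dept": 1, "IsHoliday": False},
        # Store 3 does not exist in "stores", so this row is an orphan.
        {"Store": 3, "Date": "2010-02-05", "Dept": 1, "IsHoliday": False},
    ],
}

orphans = {
    child: find_orphans(toy_tables[parent], toy_tables[child], key)
    for child, (parent, key) in RELATIONSHIPS.items()
}

for child, rows in orphans.items():
    print(child, "orphan rows:", len(rows))
```

Run against real or sampled tables, an empty list for every child table means each "Store" value has a matching parent row.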

### 1. Download the demo data

To download the demo data, we use the `load_demo` function. In this example we use the **walmart** dataset. The datasets are downloaded from an [Amazon S3 bucket](http://sdv-datasets.s3.amazonaws.com/index.html).

In [None]:
from sdv import load_demo

metadata, tables = load_demo(dataset_name='walmart', metadata=True)

### 2. Create an instance of SDV

Once the data is downloaded, we can create a new instance of SDV.

In [None]:
from sdv import SDV

sdv = SDV()

### 3. Train the model

Once the SDV object has been created, we must fit the model.

We just need to call the `fit` method with the metadata and the tables loaded from the CSV files in the previous step.

In [None]:
sdv.fit(metadata, tables=tables)

### 4. Data sampling

After fitting the model, we are ready to generate data. To create data for all the tables, we call the `sample_all` method.

In [None]:
samples = sdv.sample_all()

In [None]:
samples['stores'].head()

In [None]:
samples['features'].head()

In [None]:
samples['depts'].head()

The `sample_all` method returns a dictionary that maps each table name in the dataset to a dataframe with the sampled rows.
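
Since the result is a dictionary keyed by table name, it can be walked like any other mapping. The sketch below summarizes the shape of each table. It assumes only that every value exposes a `.shape` attribute, as pandas dataframes do. The `table_shapes` helper and the `FakeTable` stand-ins (with made-up row counts) are hypothetical, so that the snippet runs without fitting a model.

```python
from collections import namedtuple


def table_shapes(tables):
    """Map each table name to its (rows, columns) shape."""
    return {name: tuple(table.shape) for name, table in tables.items()}


# Stand-in for a dataframe: only the .shape attribute is modelled.
FakeTable = namedtuple("FakeTable", ["shape"])

# Column counts follow the tables described above; row counts are made up.
fake_samples = {
    "stores": FakeTable(shape=(5, 3)),
    "features": FakeTable(shape=(10, 12)),
    "depts": FakeTable(shape=(20, 5)),
}

for name, (rows, cols) in table_shapes(fake_samples).items():
    print(f"{name}: {rows} rows x {cols} columns")
```

With real output, `table_shapes(samples)` would be the equivalent call.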

Alternatively, we can sample one table at a time by calling the `sample` method with the table name and the number of rows to sample.

In [None]:
sdv.sample('stores', 5)

In [None]:
sdv.sample('features', 5)

In [None]:
sdv.sample('depts', 5)