# 0.0 - Retrieve Dataset

**Actions**: Before we can analyze the data, we need to retrieve it. This is largely handled by our build automation tool, the `doit` Python package.

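After installing the dependencies, run `doit` in the project's root directory. It should execute the `download_dataset` and `unpack_dataset` tasks, in that order. The cells in this notebook pass `-f "./../dodo.py"` so that `doit` can find the task file from the notebook's own folder.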

**Dependencies**: This notebook has no dependencies and can be run at the start of the analysis process.

**Targets**: This notebook outputs a `data/raw/marketing_sample_for_walmart_com-ecommerce__20191201_20191231__30k_data.csv` data file.

In [1]:
!doit -f "./../dodo.py" list

download_dataset   Download the dataset from Kaggle.
unpack_dataset     Unpack the raw dataset (.zip) as the (.csv) file.


In [2]:
!doit -f "./../dodo.py" run


2021-10-01 08:16:53,841 - __main__ - INFO - Loading envvars...

2021-10-01 08:16:53,857 - __main__ - INFO - Done loading envvars.

2021-10-01 08:16:53,857 - __main__ - INFO - Dataset exists.

2021-10-01 08:16:53,857 - __main__ - INFO - Extracting "home/sdf/marketing_sample_for_walmart_com-ecommerce__20191201_20191231__30k_data.csv" from "data/raw/walmart-product-data-2019.zip" to "data/raw"


.  download_dataset
.  unpack_dataset



2021-10-01 08:16:54,122 - __main__ - INFO - Done extracting "home/sdf/marketing_sample_for_walmart_com-ecommerce__20191201_20191231__30k_data.csv"

2021-10-01 08:16:54,122 - __main__ - INFO - Moving extracted file to "data/raw"

2021-10-01 08:16:54,160 - __main__ - INFO - Moved extracted file


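The log above shows two steps: the `.csv` member is extracted from the archive under its nested `home/sdf/` path, and is then moved up into `data/raw`. The source of `src/data/make_dataset.py` is not shown in this notebook, so the following is only a minimal sketch of that extract-then-move pattern using the standard library. The function name `extract_member` and its parameters are hypothetical and are not taken from the project.

```python
import shutil
import zipfile
from pathlib import Path


def extract_member(zip_path, member, out_dir):
    """Extract one archive member and flatten it into out_dir.

    Hypothetical helper: a sketch of the extract-then-move steps seen in
    the log, not the project's actual make_dataset.py.
    """
    out_dir = Path(out_dir).resolve()
    out_dir.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(zip_path) as archive:
        # extract() recreates the member's folders (e.g. home/sdf/) under out_dir.
        extracted = Path(archive.extract(member, path=out_dir))

    final_path = out_dir / extracted.name
    if extracted != final_path:
        # Move the file up so it sits directly inside out_dir.
        shutil.move(str(extracted), str(final_path))

        # Remove the now-empty nested folders; rmdir() refuses non-empty ones.
        parent = extracted.parent
        while parent != out_dir:
            try:
                parent.rmdir()
            except OSError:
                break
            parent = parent.parent

    return final_path
```

Under these assumptions, calling `extract_member("data/raw/walmart-product-data-2019.zip", "home/sdf/marketing_sample_for_walmart_com-ecommerce__20191201_20191231__30k_data.csv", "data/raw")` from the project root would leave the `.csv` directly in `data/raw`.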

In [3]:
# Show our current tasks.
# The with statement closes the file automatically when the block exits.
with open("./../dodo.py") as f:
    print(f.read())

# dodo.py
#
# Tasks in this file are executed by the doit cli.

DOIT_CONFIG = { 'action_string_formatting': 'both' }

def task_download_dataset():
    """
    Download the dataset from Kaggle.
    """
    return {
        'targets': [ 'data/raw/walmart-product-data-2019.zip' ],
        'actions': [ 'python src/data/download_dataset.py {targets}' ],
        'clean': True
    }

def task_unpack_dataset():
    """
    Unpack the raw dataset (.zip) as the (.csv) file.
    """
    return {
        'file_dep': [ 'data/raw/walmart-product-data-2019.zip' ],
        'targets': [ 'data/raw/home/sdf/marketing_sample_for_walmart_com-ecommerce__20191201_20191231__30k_data.csv' ],
        'actions': [ 'python src/data/make_dataset.py {dependencies} data/raw' ],
        'clean': True
    }


In [4]:
# Display the files in the data/raw folder.
%ls "./../data/raw/"

 Volume in drive C has no label.
 Volume Serial Number is F69E-1392

 Directory of C:\Users\effen\OneDrive\Documents\RIT\ISTE 780\Projects\price-clf\data\raw

10/01/2021  08:16 AM    <DIR>          .
10/01/2021  08:16 AM    <DIR>          ..
10/01/2021  12:52 AM                 0 .gitkeep
10/01/2021  08:16 AM        45,653,183 marketing_sample_for_walmart_com-ecommerce__20191201_20191231__30k_data.csv
10/01/2021  07:04 AM        14,308,028 walmart-product-data-2019.zip
               3 File(s)     59,961,211 bytes
               2 Dir(s)  70,425,075,712 bytes free
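The `%ls` output above depends on the operating system's shell (here, the Windows `dir` format). As a portable alternative, the same check can be sketched with `pathlib`. The helper name `list_files` is hypothetical, and the relative path assumes the notebook is run from its own folder, as in the cells above.

```python
from pathlib import Path


def list_files(folder):
    """Return (name, size_in_bytes) pairs for the regular files in folder, sorted by name."""
    return sorted(
        (p.name, p.stat().st_size)
        for p in Path(folder).iterdir()
        if p.is_file()
    )


raw_dir = Path("./../data/raw")  # assumed location relative to this notebook
if raw_dir.is_dir():
    for name, size in list_files(raw_dir):
        print(f"{size:>14,}  {name}")
else:
    print(f"{raw_dir} not found; run the doit tasks first.")
```

If the tasks ran successfully, the listing should include both the `.zip` archive and the extracted `.csv` file.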
