DataBake - A Simple Data Science Project Template

DataBake is a simple data science project template based on the Cookiecutter templating engine and heavily inspired by the excellent Cookiecutter Data Science.


This project template assumes that:

  • Your data science project uses Python.
  • You're working in a *nix environment. Cookiecutter will still generate the project structure on Windows, but some features of this template may not work there.



To start a new project, run:


You will be asked to provide the variables described below.

Variable          Description
project_name      Name of your project.
repo_name         Name of your repository. Defaults to the lowercase project_name with spaces and underscores replaced by hyphens.
package_name      Name of your source code package. Defaults to repo_name with hyphens replaced by underscores.
author_name       Your name, or the name of your project or organization.
description       A short description of your project.
dvc_remote_type   Remote location for DVC to use for data storage.
license           A choice of several open source licenses. Choose "None" if your project is not open source.
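
The defaults for repo_name and package_name described above can be sketched in plain Python. This is only an illustration of the stated rules, not the template's actual implementation; the function names `default_repo_name` and `default_package_name` are hypothetical.

```python
def default_repo_name(project_name: str) -> str:
    """Lowercase the project name and turn spaces and underscores into hyphens."""
    return project_name.lower().replace(" ", "-").replace("_", "-")


def default_package_name(repo_name: str) -> str:
    """Turn hyphens into underscores so the name is a valid Python package name."""
    return repo_name.replace("-", "_")


if __name__ == "__main__":
    # Hypothetical project name, used only to show the two derivations.
    repo = default_repo_name("My Cool_Project")
    print(repo)
    print(default_package_name(repo))
```

Under these assumptions, a project called "My Cool_Project" would get the repository name my-cool-project and the package name my_cool_project.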

The resulting project structure is:

├ Makefile              <- Makefile with helpful make commands.
├             <- Top-level README for project developers.
├ LICENSE               <- License file (unless no license was specified).
├ .env                  <- Secrets. DO NOT COMMIT TO SOURCE CONTROL!
├ .gitignore            <- Files to ignore.
├ pytest.ini            <- pytest configuration.
├ setup.cfg             <- Project configuration.
├ data
│   ├ external          <- Data from external sources.
│   ├ interim           <- Intermediate, transformed data.
│   ├ processed         <- Final, canonical data sets for modelling.
│   ├ raw               <- Original, immutable raw data sets.
│   ├ results           <- Results of modelling and analysis.
│   └ resources         <- Useful resources (e.g. relevant papers).
├ models                <- Trained and serialized models, model predictions, or model summaries.
├ notebooks             <- Jupyter notebooks. The suggested naming convention is of the form
│                          <step>.<version>-<initials>-<description>.ipynb,
│                          for example 01.0-PH-really-interesting-analysis.ipynb.
├ outputs               <- Generated outputs, such as figures or reports.
├ requirements.txt      <- Python requirements file for reproducing the analysis environment. 
└ src                   <- Source code for use in the project.
    ├ package_name
    │   └
    ├ tests
    │   └
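
The notebook naming convention shown in the tree, <step>.<version>-<initials>-<description>.ipynb, can be checked mechanically. The sketch below is not part of the template. It assumes that step and version are digits, initials are letters, and the description is lowercase words joined by hyphens; the convention itself does not spell out these character rules, and the helper name `parse_notebook_name` is hypothetical.

```python
import re
from typing import Dict, Optional

# Assumed character rules: digits for step and version, letters for initials,
# lowercase alphanumeric words joined by hyphens for the description.
NOTEBOOK_NAME = re.compile(
    r"^(?P<step>\d+)\.(?P<version>\d+)"
    r"-(?P<initials>[A-Za-z]+)"
    r"-(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)"
    r"\.ipynb$"
)


def parse_notebook_name(filename: str) -> Optional[Dict[str, str]]:
    """Return the parts of a notebook file name, or None if it does not match."""
    match = NOTEBOOK_NAME.match(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    # The example file name comes from the project structure above.
    print(parse_notebook_name("01.0-PH-really-interesting-analysis.ipynb"))
    print(parse_notebook_name("analysis.ipynb"))
```

A check like this could run in a pre-commit hook or a test to flag notebooks that drift from the convention.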