DataBake - A Simple Data Science Project Template
This project template makes the following assumptions:
- Your data science project uses Python.
- You're working in a *nix environment. Cookiecutter will still generate the project structure on Windows, but some features of this template may not work there.
To start a new project, run:
You will be asked to provide the variables described below.
||Name of your project.|
||Name of your repository. Defaults to lower case
||Name of your source code package. Defaults to
||Your name, or name of your project or organization.|
||A short description of your project.|
||Remote location for DVC to use for data storage.|
||A choice of several open source licenses. Choose "None" if your project is not open source.|
The resulting project structure is:

    ├ Makefile           <- Makefile with helpful make commands.
    ├ README.md          <- Top-level README for project developers.
    ├ LICENSE            <- License file (unless no license was specified).
    ├ .env               <- Secrets. DO NOT SOURCE CONTROL!
    ├ .gitignore         <- Files to ignore.
    ├ pytest.ini         <- PyTest configuration.
    ├ setup.cfg          <- Project configuration.
    ├ data
    │   ├ external       <- Data from external sources.
    │   ├ interim        <- Intermediate, transformed data.
    │   ├ processed      <- Final, canonical data sets from modelling.
    │   ├ raw            <- Original, immutable raw data sets.
    │   ├ results        <- Results of modelling and analysis.
    │   └ resources      <- Useful resources (e.g. relevant papers).
    ├ models             <- Trained and serialized models, model predictions, or model summaries.
    ├ notebooks          <- Jupyter notebooks. The suggested naming convention is of the form
    │                       <step>.<version>-<initials>-<description>.ipynb
    │                       For example, 01.0-PH-really-interesting-analysis.ipynb.
    ├ outputs            <- Generated outputs, such as figures or reports.
    ├ requirements.txt   <- Python requirements file for reproducing the analysis environment.
    └ src                <- Source code for use in the project.
        ├ package_name
        │   └ __init__.py
        ├ tests
        │   └ __init__.py
        └ setup.py
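
The suggested notebook naming convention can be checked or generated with a few lines of Python. The sketch below is not part of the template: the helper names `notebook_name` and `parse_notebook_name` are hypothetical, and the pattern assumes a two-digit step, a numeric version, uppercase initials, and a lowercase hyphen-separated description, which matches the example above but may be stricter than you need.

```python
import re

# Assumed pattern for <step>.<version>-<initials>-<description>.ipynb.
# The exact character classes are an assumption based on the example
# 01.0-PH-really-interesting-analysis.ipynb.
NOTEBOOK_NAME = re.compile(
    r"^(?P<step>\d{2})\.(?P<version>\d+)"
    r"-(?P<initials>[A-Z]+)"
    r"-(?P<description>[a-z0-9]+(?:-[a-z0-9]+)*)\.ipynb$"
)


def notebook_name(step, version, initials, description):
    """Build a notebook file name following the suggested convention."""
    # Turn free text into a lowercase, hyphen-separated slug.
    slug = "-".join(re.findall(r"[a-z0-9]+", description.lower()))
    if not slug:
        raise ValueError("description must contain letters or digits")
    return f"{step:02d}.{version}-{initials.upper()}-{slug}.ipynb"


def parse_notebook_name(filename):
    """Return the parts of a conforming file name, or None if it does not match."""
    match = NOTEBOOK_NAME.match(filename)
    return match.groupdict() if match else None


if __name__ == "__main__":
    print(notebook_name(1, 0, "ph", "Really interesting analysis"))
    print(parse_notebook_name("01.0-PH-really-interesting-analysis.ipynb"))
```

A check like this could run in a test under `src/tests` to flag notebooks in `notebooks` that drift from the convention, though whether to enforce it is a per-project choice.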