# Project Folder Structure

In this notebook, we'll explore a project folder structure adapted from the Cookiecutter Data Science project by DrivenData. The structure is designed to support collaboration, reproducibility, and efficient project management in data science. We give an overview of each directory and file, explain its purpose, and show how to set it up for your own projects.

**Table of Contents**

1. [Project Folder Structure](#1)
2. [Example Folder Structure](#2)
3. [Exercise](#3)

---
## 1. Project Folder Structure <a id="1"></a>

Here is an example folder structure:

```markdown
project/
├── LICENSE            <- Open-source license if one is chosen
├── Makefile           <- Makefile with convenience commands like `make data` or `make train`
├── README.md          <- The top-level README for developers using this project.
├── data
│   ├── external       <- Data from third party sources.
│   ├── interim        <- Intermediate data that has been transformed.
│   ├── processed      <- The final, canonical data sets for modeling.
│   └── raw            <- The original, immutable data dump.
│
├── envs
│   ├── .dev_env        <- Base environment for development
│   ├── .prod_env       <- Environment for production
│   ├── .dev2_env       <- Extra environment for development, if needed
│   ├── dev_req.txt     <- The requirements file for reproducing the base development environment
│   ├── prod_req.txt    <- The requirements file for reproducing the production environment
│   └── dev2_req.txt    <- The requirements file for reproducing the extra development environment
│
├── docs               <- A default mkdocs (www.mkdocs.org) or Sphinx (https://www.sphinx-doc.org) project.
│
├── models             <- Trained and serialized models, model predictions, or model summaries
│
├── notebooks          <- Jupyter notebooks. Naming convention is a number (for ordering),
│                         the creator's initials, and a short `-` delimited description, e.g.
│                         `1.0-jqp-initial-data-exploration`.
│
├── references         <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports            <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures        <- Generated graphics and figures to be used in reporting
│
└── src                <- Source code for use in this project.
    │
    ├── __init__.py             <- Makes src a Python package
    ├── config.py               <- Store useful variables and configuration
    ├── dataset.py              <- Scripts to download or generate data
    ├── features.py             <- Code to create features for modeling
    ├── modeling                
    │   ├── __init__.py 
    │   ├── predict.py          <- Code to run model inference with trained models          
    │   └── train.py            <- Code to train models
    └── plots.py                <- Code to create visualizations
```
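
The notebook naming convention in the tree above can be checked automatically. The following is a minimal sketch: the regular expression and the function name `follows_naming_convention` are assumptions chosen for illustration, not part of the Cookiecutter Data Science template. It assumes lowercase initials and a lowercase, `-` delimited description.

```python
import re

# Assumed pattern: "<number>-<initials>-<dash-delimited-description>",
# optionally followed by the ".ipynb" extension.
NOTEBOOK_NAME = re.compile(
    r"\d+(\.\d+)?-[a-z]+-[a-z0-9]+(-[a-z0-9]+)*(\.ipynb)?"
)


def follows_naming_convention(name: str) -> bool:
    """Return True if the whole of `name` matches the assumed pattern."""
    return NOTEBOOK_NAME.fullmatch(name) is not None


print(follows_naming_convention("1.0-jqp-initial-data-exploration"))  # True
print(follows_naming_convention("Untitled.ipynb"))                    # False
```

Adjust the pattern if your team uses uppercase initials or a different numbering scheme.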

---
## 2. Example Folder Structure  <a id="2"></a>

Let's create an example folder structure for a data science project using this layout.

```python
import os


def create_folder_structure(base_path):
    """Create the project folders and empty placeholder files under base_path."""
    folders = [
        "data/external", "data/interim", "data/processed", "data/raw",
        "envs",
        "docs",
        "models",
        "notebooks",
        "references",
        "reports/figures",
        "src",
        "src/modeling"
    ]

    # exist_ok=True makes the function safe to re-run on an existing project
    for folder in folders:
        os.makedirs(os.path.join(base_path, folder), exist_ok=True)

    files = ["LICENSE", "Makefile", "README.md",
             "envs/.dev_env", "envs/.prod_env", "envs/.dev2_env",
             "envs/dev_req.txt", "envs/prod_req.txt", "envs/dev2_req.txt",
             "src/__init__.py", "src/config.py", "src/dataset.py",
             "src/features.py", "src/modeling/__init__.py",
             "src/modeling/predict.py", "src/modeling/train.py",
             "src/plots.py"]

    # Append mode creates a missing file without truncating an existing one
    for file in files:
        open(os.path.join(base_path, file), 'a').close()


# Create the folder structure
base_path = "project"
create_folder_structure(base_path)
```
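
After running the script, it can be useful to confirm that the expected entries exist. The sketch below is one way to do that. The helper `find_missing` and the list `expected_paths` are hypothetical names introduced here, and the list covers only a small subset of the layout.

```python
from pathlib import Path


def find_missing(base_path, expected):
    """Return the entries of `expected` that do not exist under `base_path`."""
    base = Path(base_path)
    return [rel for rel in expected if not (base / rel).exists()]


# A hypothetical subset of the layout to check; extend it as needed.
expected_paths = ["data/raw", "reports/figures", "src/modeling/train.py", "README.md"]
missing = find_missing("project", expected_paths)
print("Missing:", missing if missing else "nothing")
```

An empty result means every listed folder and file is in place; anything reported as missing can be added to the script's `folders` or `files` lists.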

---
## 3. Exercise  <a id="3"></a>
Create a similar folder structure for this project by adapting the provided script to your needs. Move each existing file into the folder where it belongs. If you find it useful, document the purpose of each folder in your README.md file.
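
For the README part of the exercise, one possible starting point is to generate the folder overview from a dictionary. This is only a sketch: `folder_purposes` and `build_readme_section` are hypothetical names, and the descriptions are sample text to be replaced with your own.

```python
# Hypothetical folder descriptions; replace them with the ones that fit your project.
folder_purposes = {
    "data/raw": "The original, immutable data dump.",
    "models": "Trained and serialized models.",
    "src": "Source code for use in this project.",
}


def build_readme_section(purposes):
    """Return a markdown section with one bullet per folder."""
    lines = ["## Folder structure", ""]
    lines += [f"- `{folder}`: {purpose}" for folder, purpose in purposes.items()]
    return "\n".join(lines) + "\n"


print(build_readme_section(folder_purposes))
```

The returned string can be appended to `README.md`, for example by opening the file in append mode and writing the section to it.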

The goal is to help you understand and implement a structured layout for your data science projects, one that supports better organization, collaboration, and reproducibility.