From 44c58f26e45d0391a2bb02d26e27fcc83e1ccf3d Mon Sep 17 00:00:00 2001
From: martonvago
Date: Mon, 28 Jul 2025 15:35:47 +0100
Subject: [PATCH 1/4] docs: :memo: add template README

---
 template/README.md.jinja |  17 +----
 template/docs/README.md  | 160 +++++++++++++++++++++++++++++++++++++++
 2 files changed, 161 insertions(+), 16 deletions(-)
 create mode 100644 template/docs/README.md

diff --git a/template/README.md.jinja b/template/README.md.jinja
index af5ad06..e223305 100644
--- a/template/README.md.jinja
+++ b/template/README.md.jinja
@@ -1,16 +1 @@
-
-## Post-setup steps
-
-- Run `just list-todos` to get a list of TODO items you need to fill out.
-
-## Versioning and changelog
-
-This project uses
-[Commitizen](https://commitizen-tools.github.io/commitizen/) to update
-versions and generate changelogs. Based on the [Conventional
-Commits](https://www.conventionalcommits.org/en/v1.0.0/) message, it
-will automatically update the version in both `pyproject.toml` and
-`datapackage.json`. The [Data Package](https://datapackage.org/)
-standard suggests using their version of [Semantic
-Versioning](https://datapackage.org/recipes/data-package-version/). So
-follow these conventions when making commits to this repository.
+
diff --git a/template/docs/README.md b/template/docs/README.md
new file mode 100644
index 0000000..76e0465
--- /dev/null
+++ b/template/docs/README.md
@@ -0,0 +1,160 @@
+# A Data Package built with Seedcase packages
+
+This [Data Package](https://datapackage.org/) was generated from the
+[`template-data-package`](https://github.com/seedcase-project/template-data-package)
+Seedcase template.
+
+## Project files and folders
+
+- `docs/`: Documentation about using and developing the Data Package,
+    including this README file.
+- `scripts/`: Python scripts for creating and managing the Data
+    Package. Files describing the data will be generated here.
+- `.copier-answers.yml`: Contains the answers you gave when copying
+    the project from the template.
You should not modify this file
+    directly.
+- `.cz.toml`:
+    [Commitizen](https://commitizen-tools.github.io/commitizen/)
+    configuration file for managing versions and changelogs.
+- `.pre-commit-config.yaml`: [Pre-commit](https://pre-commit.com/)
+    configuration file for managing and running checks before each
+    commit.
+- `.typos.toml`: [typos](https://github.com/crate-ci/typos) spell
+    checker configuration file.
+- `CITATION.cff`: Structured citation metadata for your project.
+- `justfile`: [`just`](https://just.systems/man/en/) configuration
+    file for scripting project tasks.
+- `main.py`: Central script file for the Data Package. This is where
+    helper scripts are invoked and work together to create and manage
+    the Data Package.
+- `pyproject.toml`: Main Python project configuration file defining
+    metadata and dependencies.
+- `README.md`: Autogenerated description of the Data Package. Not a
+    development guide. Information on using and developing the project
+    should be included in `docs/README.md`.
+- `ruff.toml`: [Ruff](https://docs.astral.sh/ruff/) configuration file
+    for linting and formatting Python code.
+- `uv.lock`: Lockfile used by [`uv`](https://docs.astral.sh/uv/) to
+    record exact versions of installed dependencies.
+
+## How to develop your Data Package
+
+In your new project generated from the `template-data-package`, the
+first steps for creating and developing your Data Package are already
+set up in `main.py`. For more detailed instructions on using Seedcase
+Sprout to organise your Data Package, see the
+[guides](https://sprout.seedcase-project.org/docs/guide/) on Sprout's
+website. You can read more about the files and folders created by
+`main.py` on the
+[Outputs](https://sprout.seedcase-project.org/docs/design/interface/outputs)
+page of the design documentation.
+
+### Creating package properties
+
+1. Run `main.py` to create the `scripts/package_properties.py` file for
+    the properties of your Data Package.
+
+    ``` {.bash filename="Terminal"}
+    just build
+    ```
+
+    You can also run `main.py` by clicking the "Run" button in your IDE.
+
+2. Open `scripts/package_properties.py` and fill in all required
+    fields. Also fill in any optional fields you find useful. You can
+    always update these later. Make sure to save the file.
+
+3. In `main.py`, uncomment the lines referencing the
+    `package_properties` and `package_path` variables: lines 4, 17, 29,
+    34, and 36.
+
+4. Rerun `main.py` to create the `datapackage.json` and `README.md`
+    files for your Data Package.
+
+### Creating a new resource
+
+#### With data to add to the resource
+
+If you already have some data in a tidy format, load this as a Polars
+data frame into the `raw_data` variable in `main.py`.
+
+1. Uncomment lines up to and including the creation of resource
+    properties: lines 1, 18, 20, and 23-26.
+
+2. Fill in the `resource_name` argument in line 24.
+
+3. Rerun `main.py` to create the
+    `scripts/resource_properties_.py` file for the properties of
+    the new resource.
+
+4. Open `scripts/resource_properties_.py` and fill in all
+    required fields. Also fill in any optional fields you find useful.
+    You can always update these later. Make sure to save the file.
+
+5. In `package_properties.py`, import your new resource properties by
+    uncommenting line 3 and updating it with the name of your resource.
+    Also uncomment the `resources` field in lines 46-48 and update the
+    name of the resource properties in the array to match the name of
+    your new resource.
+
+6. In `main.py`, import your new resource properties by uncommenting
+    line 6 and updating it with the name of your resource.
+
+7. Uncomment all remaining lines in the file and rename the
+    `resource_properties` variable to the name of the new resource
+    properties you just imported.
+
+8. Rerun `main.py`. This will:
+
+    - Update `datapackage.json` and `README.md`.
+    - Create a `resources/` folder containing a folder for your new
+      resource.
In here, you will find a `batch/` folder with the
+      individual data batches you've uploaded for this resource and a
+      `data.parquet` file containing all resource data.
+
+#### Without data to add to the resource
+
+You can create a new resource without adding data to it using a shorter
+version of the steps above.
+
+- Step 1: uncomment only lines 23, 24, and 26.
+- Step 4: if you cannot describe columns in your data at this stage,
+    comment out the `schema` field.
+- Steps 6-7: as you have no data to add, you should skip these steps.
+
+## How to use the `justfile`
+
+The `justfile` contains scripts or "recipes" that are shorthands for
+performing common project tasks. You can get an overview of available
+recipes by running
+
+``` {.bash filename="Terminal"}
+just
+```
+
+in the project root.
+
+You can run a recipe by typing
+
+``` {.bash filename="Terminal"}
+just
+```
+
+A simple workflow would be running
+
+1. `just build` repeatedly while working on a new feature to test that
+    it's working
+2. `just run-all` before submitting your work for review to make sure
+    all checks pass
+
+## Versioning and changelog
+
+This project uses
+[Commitizen](https://commitizen-tools.github.io/commitizen/) to update
+versions and generate changelogs. Based on the [Conventional
+Commits](https://www.conventionalcommits.org/en/v1.0.0/) message, it
+will automatically update the version in both `pyproject.toml` and
+`datapackage.json`. The [Data Package](https://datapackage.org/)
+standard suggests using their version of [Semantic
+Versioning](https://datapackage.org/recipes/data-package-version/). So
+follow these conventions when making commits to this repository.

From 317c3824984a7ea4f72382555f0eaed18eaffbcc Mon Sep 17 00:00:00 2001
From: martonvago <57952344+martonvago@users.noreply.github.com>
Date: Tue, 29 Jul 2025 12:41:44 +0100
Subject: [PATCH 2/4] docs: :memo: apply suggestions from code review

Co-authored-by: Luke W.
Johnston
---
 template/docs/README.md | 46 ++++++++++++++++++++--------------------------
 1 file changed, 20 insertions(+), 26 deletions(-)

diff --git a/template/docs/README.md b/template/docs/README.md
index 76e0465..8b11877 100644
--- a/template/docs/README.md
+++ b/template/docs/README.md
@@ -11,8 +11,8 @@ Seedcase template.
 - `scripts/`: Python scripts for creating and managing the Data
     Package. Files describing the data will be generated here.
 - `.copier-answers.yml`: Contains the answers you gave when copying
-    the project from the template. You should not modify this file
-    directly.
+    the project from the template. **You should not modify this file
+    directly.**
 - `.cz.toml`:
     [Commitizen](https://commitizen-tools.github.io/commitizen/)
     configuration file for managing versions and changelogs.
@@ -31,7 +31,7 @@ Seedcase template.
     metadata and dependencies.
 - `README.md`: Autogenerated description of the Data Package. Not a
     development guide. Information on using and developing the project
-    should be included in `docs/README.md`.
+    should be included in the `docs/` folder.
 - `ruff.toml`: [Ruff](https://docs.astral.sh/ruff/) configuration file
     for linting and formatting Python code.
 - `uv.lock`: Lockfile used by [`uv`](https://docs.astral.sh/uv/) to
@@ -43,7 +43,7 @@ In your new project generated from the `template-data-package`, the
 first steps for creating and developing your Data Package are already
 set up in `main.py`. For more detailed instructions on using Seedcase
 Sprout to organise your Data Package, see the
-[guides](https://sprout.seedcase-project.org/docs/guide/) on Sprout's
+[guide](https://sprout.seedcase-project.org/docs/guide/) on Sprout's
 website. You can read more about the files and folders created by
 `main.py` on the
 [Outputs](https://sprout.seedcase-project.org/docs/design/interface/outputs)
@@ -54,7 +54,7 @@ page of the design documentation.
 1.
Run `main.py` to create the `scripts/package_properties.py` file for
     the properties of your Data Package.

-    ``` {.bash filename="Terminal"}
+    ``` bash
     just build
     ```

@@ -65,8 +65,7 @@ page of the design documentation.
     always update these later. Make sure to save the file.

 3. In `main.py`, uncomment the lines referencing the
-    `package_properties` and `package_path` variables: lines 4, 17, 29,
-    34, and 36.
+    `package_properties` and `package_path` variables.

 4. Rerun `main.py` to create the `datapackage.json` and `README.md`
     files for your Data Package.
@@ -75,13 +74,18 @@ page of the design documentation.

 #### With data to add to the resource

-If you already have some data in a tidy format, load this as a Polars
+While you can create resource properties without data, it is
+a lot more challenging. If at all possible, only create a
+resource properties object when you have data to use to
+at least pre-fill in some of the important fields.
+In order to use Sprout, the data needs to already be in a tidy format.
+When it is, load the data as a Polars
 data frame into the `raw_data` variable in `main.py`.

 1. Uncomment lines up to and including the creation of resource
-    properties: lines 1, 18, 20, and 23-26.
+    properties.

-2. Fill in the `resource_name` argument in line 24.
+2. Fill in the `resource_name` argument.

 3. Rerun `main.py` to create the
     `scripts/resource_properties_.py` file for the properties of
@@ -92,15 +96,15 @@ data frame into the `raw_data` variable in `main.py`.
     You can always update these later. Make sure to save the file.

 5. In `package_properties.py`, import your new resource properties by
-    uncommenting line 3 and updating it with the name of your resource.
-    Also uncomment the `resources` field in lines 46-48 and update the
+    uncommenting and updating it with the name of your resource.
+    Also uncomment the `resources` field and update the
     name of the resource properties in the array to match the name of
     your new resource.

 6.
In `main.py`, import your new resource properties by uncommenting
-    line 6 and updating it with the name of your resource.
+    it and updating it with the name of your resource.

-7. Uncomment all remaining lines in the file and rename the
+7. Uncomment everything else in the `main.py` file and rename the
     `resource_properties` variable to the name of the new resource
     properties you just imported.

@@ -112,23 +116,13 @@ data frame into the `raw_data` variable in `main.py`.
       individual data batches you've uploaded for this resource and a
       `data.parquet` file containing all resource data.

-#### Without data to add to the resource
-
-You can create a new resource without adding data to it using a shorter
-version of the steps above.
-
-- Step 1: uncomment only lines 23, 24, and 26.
-- Step 4: if you cannot describe columns in your data at this stage,
-    comment out the `schema` field.
-- Steps 6-7: as you have no data to add, you should skip these steps.
-
 ## How to use the `justfile`

 The `justfile` contains scripts or "recipes" that are shorthands for
 performing common project tasks. You can get an overview of available
 recipes by running

-``` {.bash filename="Terminal"}
+``` bash
 just
 ```

@@ -136,7 +130,7 @@ in the project root.

 You can run a recipe by typing

-``` {.bash filename="Terminal"}
+``` bash
 just
 ```

From 8cafba2caaa10795fbe205d1bfe30a098a46b5b2 Mon Sep 17 00:00:00 2001
From: "pre-commit-ci[bot]" <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Date: Tue, 29 Jul 2025 11:42:28 +0000
Subject: [PATCH 3/4] chore(pre-commit): :pencil2: automatic fixes

---
 template/docs/README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/template/docs/README.md b/template/docs/README.md
index 8b11877..040424a 100644
--- a/template/docs/README.md
+++ b/template/docs/README.md
@@ -75,10 +75,10 @@ page of the design documentation.
 #### With data to add to the resource

 While you can create resource properties without data, it is
-a lot more challenging.
If at all possible, only create a
+a lot more challenging. If at all possible, only create a
 resource properties object when you have data to use to
 at least pre-fill in some of the important fields.
-In order to use Sprout, the data needs to already be in a tidy format.
+In order to use Sprout, the data needs to already be in a tidy format.
 When it is, load the data as a Polars
 data frame into the `raw_data` variable in `main.py`.

From 3ea8d34a2deebaa855ac37625b32958f4d3d3d58 Mon Sep 17 00:00:00 2001
From: martonvago
Date: Tue, 29 Jul 2025 12:52:42 +0100
Subject: [PATCH 4/4] docs: :memo: format markdown

---
 template/docs/README.md | 80 ++++++++++++++++++++---------------------
 1 file changed, 39 insertions(+), 41 deletions(-)

diff --git a/template/docs/README.md b/template/docs/README.md
index 040424a..9738b16 100644
--- a/template/docs/README.md
+++ b/template/docs/README.md
@@ -6,35 +6,35 @@ Seedcase template.

 ## Project files and folders

-- `docs/`: Documentation about using and developing the Data Package,
+- `docs/`: Documentation about using and developing the Data Package,
     including this README file.
-- `scripts/`: Python scripts for creating and managing the Data
+- `scripts/`: Python scripts for creating and managing the Data
     Package. Files describing the data will be generated here.
-- `.copier-answers.yml`: Contains the answers you gave when copying
+- `.copier-answers.yml`: Contains the answers you gave when copying
     the project from the template. **You should not modify this file
     directly.**
-- `.cz.toml`:
+- `.cz.toml`:
     [Commitizen](https://commitizen-tools.github.io/commitizen/)
     configuration file for managing versions and changelogs.
-- `.pre-commit-config.yaml`: [Pre-commit](https://pre-commit.com/)
+- `.pre-commit-config.yaml`: [Pre-commit](https://pre-commit.com/)
     configuration file for managing and running checks before each
     commit.
-- `.typos.toml`: [typos](https://github.com/crate-ci/typos) spell
+- `.typos.toml`: [typos](https://github.com/crate-ci/typos) spell
     checker configuration file.
-- `CITATION.cff`: Structured citation metadata for your project.
-- `justfile`: [`just`](https://just.systems/man/en/) configuration
+- `CITATION.cff`: Structured citation metadata for your project.
+- `justfile`: [`just`](https://just.systems/man/en/) configuration
     file for scripting project tasks.
-- `main.py`: Central script file for the Data Package. This is where
+- `main.py`: Central script file for the Data Package. This is where
     helper scripts are invoked and work together to create and manage
     the Data Package.
-- `pyproject.toml`: Main Python project configuration file defining
+- `pyproject.toml`: Main Python project configuration file defining
     metadata and dependencies.
-- `README.md`: Autogenerated description of the Data Package. Not a
+- `README.md`: Autogenerated description of the Data Package. Not a
     development guide. Information on using and developing the project
     should be included in the `docs/` folder.
-- `ruff.toml`: [Ruff](https://docs.astral.sh/ruff/) configuration file
+- `ruff.toml`: [Ruff](https://docs.astral.sh/ruff/) configuration file
     for linting and formatting Python code.
-- `uv.lock`: Lockfile used by [`uv`](https://docs.astral.sh/uv/) to
+- `uv.lock`: Lockfile used by [`uv`](https://docs.astral.sh/uv/) to
     record exact versions of installed dependencies.

 ## How to develop your Data Package
@@ -51,7 +51,7 @@ page of the design documentation.

 ### Creating package properties

-1. Run `main.py` to create the `scripts/package_properties.py` file for
+1. Run `main.py` to create the `scripts/package_properties.py` file for
     the properties of your Data Package.

     ``` bash
@@ -60,58 +60,56 @@ page of the design documentation.

     You can also run `main.py` by clicking the "Run" button in your IDE.

-2.
Open `scripts/package_properties.py` and fill in all required
+2. Open `scripts/package_properties.py` and fill in all required
     fields. Also fill in any optional fields you find useful. You can
     always update these later. Make sure to save the file.

-3. In `main.py`, uncomment the lines referencing the
+3. In `main.py`, uncomment the lines referencing the
     `package_properties` and `package_path` variables.

-4. Rerun `main.py` to create the `datapackage.json` and `README.md`
+4. Rerun `main.py` to create the `datapackage.json` and `README.md`
     files for your Data Package.

 ### Creating a new resource

 #### With data to add to the resource

-While you can create resource properties without data, it is
-a lot more challenging. If at all possible, only create a
-resource properties object when you have data to use to
-at least pre-fill in some of the important fields.
-In order to use Sprout, the data needs to already be in a tidy format.
-When it is, load the data as a Polars
-data frame into the `raw_data` variable in `main.py`.
+While you can create resource properties without data, it is a lot more
+challenging. If at all possible, only create a resource properties
+object when you have data to use to at least pre-fill in some of the
+important fields. In order to use Sprout, the data needs to already be
+in a tidy format. When it is, load the data as a Polars data frame into
+the `raw_data` variable in `main.py`.

-1. Uncomment lines up to and including the creation of resource
+1. Uncomment lines up to and including the creation of resource
     properties.

-2. Fill in the `resource_name` argument.
+2. Fill in the `resource_name` argument.

-3. Rerun `main.py` to create the
+3. Rerun `main.py` to create the
     `scripts/resource_properties_.py` file for the properties of
     the new resource.

-4. Open `scripts/resource_properties_.py` and fill in all
+4. Open `scripts/resource_properties_.py` and fill in all
     required fields. Also fill in any optional fields you find useful.
     You can always update these later. Make sure to save the file.

-5.
In `package_properties.py`, import your new resource properties by
-    uncommenting and updating it with the name of your resource.
-    Also uncomment the `resources` field and update the
-    name of the resource properties in the array to match the name of
-    your new resource.
+5. In `package_properties.py`, import your new resource properties by
+    uncommenting and updating it with the name of your resource. Also
+    uncomment the `resources` field and update the name of the resource
+    properties in the array to match the name of your new resource.

-6. In `main.py`, import your new resource properties by uncommenting
-    it and updating it with the name of your resource.
+6. In `main.py`, import your new resource properties by uncommenting it
+    and updating it with the name of your resource.

-7. Uncomment everything else in the `main.py` file and rename the
+7. Uncomment everything else in the `main.py` file and rename the
     `resource_properties` variable to the name of the new resource
     properties you just imported.

-8. Rerun `main.py`. This will:
+8. Rerun `main.py`. This will:

-    - Update `datapackage.json` and `README.md`.
-    - Create a `resources/` folder containing a folder for your new
+    - Update `datapackage.json` and `README.md`.
+    - Create a `resources/` folder containing a folder for your new
       resource. In here, you will find a `batch/` folder with the
       individual data batches you've uploaded for this resource and a
       `data.parquet` file containing all resource data.
@@ -136,9 +134,9 @@ just

 A simple workflow would be running

-1. `just build` repeatedly while working on a new feature to test that
+1. `just build` repeatedly while working on a new feature to test that
     it's working
-2. `just run-all` before submitting your work for review to make sure
+2. `just run-all` before submitting your work for review to make sure
     all checks pass

 ## Versioning and changelog
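
The "Versioning and changelog" section that PATCH 1/4 adds says Commitizen derives the version update from the Conventional Commits message. As a rough illustration of that convention, here is a minimal sketch that maps a commit message to a suggested Semantic Versioning bump. It is not Commitizen's implementation: the `suggested_bump` helper, the `HEADER` pattern, and the exact mapping are assumptions based on the Conventional Commits specification, and Commitizen's real rules are configurable and may differ.

```python
import re
from typing import Optional

# Conventional Commits header: type, optional (scope), optional "!", then ": description".
# HEADER and suggested_bump are hypothetical names used only for this sketch.
HEADER = re.compile(
    r"^(?P<type>\w+)(\((?P<scope>[^)]+)\))?(?P<breaking>!)?: (?P<description>.+)$"
)


def suggested_bump(message: str) -> Optional[str]:
    """Return "major", "minor", "patch", or None for a commit message.

    Simplified mapping: breaking changes -> major, feat -> minor,
    fix -> patch, anything else -> no bump.
    """
    lines = message.strip().splitlines()
    if not lines:
        return None
    match = HEADER.match(lines[0])
    if match is None:
        # Not a Conventional Commits header, so no bump is suggested.
        return None
    body = "\n".join(lines[1:])
    has_breaking_footer = re.search(
        r"^BREAKING[ -]CHANGE: ", body, flags=re.MULTILINE
    )
    if match["breaking"] or has_breaking_footer:
        return "major"
    if match["type"] == "feat":
        return "minor"
    if match["type"] == "fix":
        return "patch"
    return None


if __name__ == "__main__":
    # The first two subjects are taken from this patch series.
    for subject in [
        "docs: :memo: add template README",
        "chore(pre-commit): :pencil2: automatic fixes",
        "feat: add a new resource",
        "fix!: rename a column",
    ]:
        print(f"{subject!r} -> {suggested_bump(subject)}")
```

Under this simplified mapping, the `docs:` and `chore(pre-commit):` subjects used in this series would not suggest a version bump on their own.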