docs: 📝 split guide into creating vs managing a Data Package #1592
Merged
46 commits

b886e09 docs: :memo: Clarify the installation steps (joelostblom)
e090964 docs: :memo: Explain what a data package is and motivate Sprout more (joelostblom)
cf74fc3 docs: :memo: Move sidenote to callout (joelostblom)
b6d0f75 docs: :memo: Reword intro (joelostblom)
287374f docs: :memo: Note differences when using the template repo (joelostblom)
ad1f51d Syntax highlight the script code in the docs (joelostblom)
a53d18c docs: :memo: Mention what the terminal command does (joelostblom)
26d67a2 docs: :memo: Divide guide into clearer sections (joelostblom)
264e0de docs: :memo: Fix capitalization and clarify a bit (joelostblom)
702a73f docs: :memo: Clarify the role of classes (joelostblom)
5cf676a Delete more complex metadata management section (joelostblom)
0ca6001 Simplify example and use the actual metadata in the guide for the wri… (joelostblom)
611be79 Show example of what datapackage.json looks like (joelostblom)
7293957 Add chapter for manaing data package metadata (joelostblom)
cf27d6e Update order to account for new page (joelostblom)
b7910a9 Change title and setup main.py file for second section (joelostblom)
98e016e Show script content with syntax highlighting (joelostblom)
410fb40 Add old text as is (joelostblom)
76ab979 Make final statement more accurate (joelostblom)
1ef3a15 Elaborate on the example description (joelostblom)
3815970 Evaluate cell so that we can use `package_properties` later (joelostblom)
b41dbcf Add more examples of how to edit the properties file (joelostblom)
55c4766 Reformat code blocks (joelostblom)
91bb115 Make easier to parse via additional section (joelostblom)
e6467c1 Fix typo (joelostblom)
7373e75 Remove import since `package_properties` is already defined (joelostblom)
a013422 Fix typo (joelostblom)
f49f405 Automate create of json output (joelostblom)
e1068fe Merge branch 'main' into docs/needed-vs-recommended (joelostblom)
d869270 Improve wording (joelostblom)
14d6450 Apply suggestions from code review to improve wording (joelostblom)
cceb4ce Turn note about file deletion into callout (joelostblom)
f668b7d Clarify title (joelostblom)
380e2a4 Rephrase for clarity (joelostblom)
09179dd Shorten paragraph (joelostblom)
dac7e02 Move heading to have intro paragraph (joelostblom)
af50649 Add explicit reference to install section (joelostblom)
25f3661 Avoid making it sound like the datapackage file is created here (joelostblom)
80d2703 Make heading more appropriate for content (joelostblom)
a2e042d Apply suggestions from code review to improve wording (joelostblom)
89efa62 chore(pre-commit): :pencil2: automatic fixes (pre-commit-ci[bot])
fb70906 Apply suggestions from code review to improve wording (joelostblom)
818843a Name files consistently (joelostblom)
5fcf102 Merge branch 'main' into docs/needed-vs-recommended (lwjohnst86)
831f7cb Reflow text (joelostblom)
0e6b17c Update links to match new file names (joelostblom)
@@ -0,0 +1,263 @@

---
title: "Adding to the package metadata"
order: 2
jupyter: python3
---

In the [previous guide](/docs/guide/package.qmd), we created a Data
Package and saw how to write a minimal set of metadata properties to
`datapackage.json`. Here, we'll take a closer look at the full
`package_properties.py` script that `create_properties_script()` creates.

::: callout-important
Before starting this section, delete any previously created
`scripts/package_properties.py`, because `create_properties_script()`
does not overwrite this file if it already exists.
:::
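
To make the callout concrete, here is a minimal, standard-library-only sketch of "create only if missing" behaviour. It does not use Sprout: `create_script_if_missing()` and the template strings are hypothetical stand-ins, and the sketch simply assumes `create_properties_script()` skips existing files in the way the callout describes.

```python
import tempfile
from pathlib import Path


def create_script_if_missing(path: Path, template: str) -> bool:
    """Hypothetical helper: write `template` to `path` only if the file is absent.

    Returns True if the file was created, False if an existing file was kept.
    """
    if path.exists():
        # An existing script is never overwritten, so earlier edits survive.
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(template)
    return True


with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "scripts" / "package_properties.py"
    first = create_script_if_missing(script, "# full template\n")
    # Simulate an older, minimal script left over from the previous guide.
    script.write_text("# old, minimal version\n")
    # Calling again keeps the old file instead of writing the full template.
    second = create_script_if_missing(script, "# full template\n")
    kept = script.read_text()
    # Deleting the file first lets the full template be written again.
    script.unlink()
    third = create_script_if_missing(script, "# full template\n")
    regenerated = script.read_text()

print(first, second, third)  # True False True
```

This is why the old script has to be removed first: as long as it exists, a skip-if-present design leaves it untouched.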

First, change your `main.py` to look like this:

```{python}
#| filename: "main.py"
import seedcase_sprout as sp


def main():
    # Create the properties script in the default location.
    sp.create_properties_script()


if __name__ == "__main__":
    main()
```

Now run the script from the Terminal:

``` {.bash filename="Terminal"}
uv run main.py
```

If you now view the created `package_properties.py` script, you'll see
that it includes a template containing many of the most commonly used
metadata properties, together with comments indicating which are
required and which are optional:

<!-- Create the script where quarto can find it for building the docs -->

{{< include _python-minimal-package-setup.qmd >}}

```{python}
#| include: false
sp.create_properties_script(package_path.root())
```

```{python}
#| echo: false
#| output: asis
#| filename: "scripts/package_properties.py"
print(
    '```python',
    package_path.properties_script().read_text(),
    '```',
    sep='\n'
)
```

As you can see, there are a lot of available metadata properties.
However, as we saw in the previous guide and as the comments in this
template script show, only a few of them are *required*. So you can
quickly get started creating a Data Package and add more metadata later
as needed.
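
One way to picture why a Data Package can start small: in a class-based design, required properties have no default value, while optional ones default to `None` and are left out when the properties are serialised. The sketch below uses a hypothetical `MiniPackageProperties` dataclass with a made-up choice of required fields; it is not Sprout's `PackageProperties` class, and Sprout's own rules for which properties are required may differ.

```python
import json
from dataclasses import asdict, dataclass, replace
from typing import Optional


@dataclass
class MiniPackageProperties:
    # Hypothetical "required" fields: no default, so they must always be given.
    name: str
    version: str
    # Hypothetical optional fields: default to None and can be filled in later.
    title: Optional[str] = None
    description: Optional[str] = None


def to_json(props: MiniPackageProperties) -> str:
    """Serialise only the properties that have actually been set."""
    data = {key: value for key, value in asdict(props).items() if value is not None}
    return json.dumps(data, indent=2)


# Start with only the required properties...
minimal = MiniPackageProperties(name="diabetes-study", version="0.1.0")
# ...and add optional ones later without touching the rest.
extended = replace(minimal, title="A Study on Diabetes")

print(to_json(minimal))
print(to_json(extended))
```

Leaving unset optional fields out of the output keeps an early, minimal metadata file clean, while the class still documents which extra properties you could add later.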

Sometimes filling out metadata properties might feel tedious, and you
might be tempted to skip creating a Data Package for your data
altogether. But it's important to remember just how vital these
metadata are. Without them, your data are simply a collection of files
without any context or meaning. The metadata (properties) are
**crucially important** for understanding and actually using the data in
your data package!

## Creating a more complex `datapackage.json` file

Because metadata are so important, Sprout encourages you to include them
by making them easier to manage through Python classes, as we saw in the
previous guide. In the template above, you can see a couple of
additional classes, `ContributorProperties` and `SourceProperties`.
Let's create a slightly more complex example that uses one of these
classes:

```{python}
#| filename: "scripts/package_properties.py"
import seedcase_sprout as sp

package_properties = sp.PackageProperties(
    name="diabetes-study",
    title="A Study on Diabetes",
    # You can write Markdown below, with the helper `sp.dedent()`.
    description=sp.dedent("""
        # Data from a 2021 study on diabetes prevalence

        This data package contains data from a study conducted in 2021 on the
        *prevalence* of diabetes in various populations. The data includes:

        - demographic information
        - health metrics
        - survey responses about lifestyle
        """),
    contributors=[
        sp.ContributorProperties(
            title="Jamie Jones",
            email="jamie_jones@example.com",
            path="example.com/jamie_jones",
            roles=["creator"],
        ),
        sp.ContributorProperties(
            title="Zdena Ziri",
            email="zdena_ziri@example.com",
            path="example.com/zdena_ziri",
            roles=["creator"],
        ),
    ],
    licenses=[
        sp.LicenseProperties(
            name="ODC-BY-1.0",
            path="https://opendatacommons.org/licenses/by",
            title="Open Data Commons Attribution License 1.0",
        )
    ],
    ## Autogenerated:
    id="8f301286-2327-45bf-bbc8-09696d059499",
    version="0.1.0",
    created="2025-11-07T11:12:56+01:00",
)
```

You can see that we included a more involved description of the package
using the helper function `sp.dedent()`, and that we used the
`ContributorProperties` class twice by setting the `contributors`
parameter to a list of the two people who co-created this example Data
Package.
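
The description is wrapped in a dedent helper because a triple-quoted string picks up the indentation of the surrounding Python code, and Markdown treats lines indented by four or more spaces as an indented code block. The sketch below uses the standard library's `textwrap.dedent()` to show the idea; we're assuming `sp.dedent()` does something similar (possibly with extra tidying), so treat this as an illustration rather than a description of Sprout's implementation.

```python
import textwrap

# The Markdown is indented to line up with the surrounding Python code.
raw = """
        # Data from a 2021 study on diabetes prevalence

        - demographic information
        - health metrics
        """

# Remove the common leading whitespace, then trim the surrounding blank lines.
cleaned = textwrap.dedent(raw).strip()
print(cleaned)
```

After dedenting, the heading and list items start at column zero, so they render as a Markdown heading and list instead of as code.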

Now edit your `main.py` file so that it again includes the
`write_properties()` function:

```{python}
#| eval: false
#| filename: "main.py"
import seedcase_sprout as sp
from scripts.package_properties import package_properties


def main():
    # Create the metadata properties script in the default location.
    sp.create_properties_script()
    # Write metadata properties from the properties script to `datapackage.json`.
    sp.write_properties(properties=package_properties)


if __name__ == "__main__":
    main()
```

```{python}
#| include: false
# Only to check that it runs.
sp.write_properties(
    properties=package_properties,
    path=package_path.properties()
)
```

Then, use uv to run the script from the Terminal:

``` {.bash filename="Terminal"}
uv run main.py
```

When you inspect the created `datapackage.json` file, you'll see that it
contains all the metadata from `scripts/package_properties.py`:

```{python}
#| echo: false
#| output: asis
#| filename: datapackage.json
print(
    '```json',
    (package_path.path / 'datapackage.json').read_text(),
    '```',
    sep='\n'
)
```

If you made a mistake and want to update the properties in the current
`datapackage.json`, remember that you never need to edit the JSON file
directly. Instead, edit `scripts/package_properties.py` and then run the
`main.py` script again to regenerate `datapackage.json`.
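
The "edit the script, then regenerate" workflow works because the JSON file is purely an output of the Python objects. As a rough sketch of that idea, nested dataclasses stand in for the properties classes here and are written out with `json.dumps()`. `Contributor`, `Package`, and `write_json()` are hypothetical, not Sprout's API, and Sprout's real `datapackage.json` contains more fields and may be formatted differently.

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class Contributor:
    title: str
    roles: list[str] = field(default_factory=list)


@dataclass
class Package:
    name: str
    contributors: list[Contributor] = field(default_factory=list)


def write_json(props: Package, path: Path) -> None:
    # asdict() recurses into the nested Contributor objects.
    # Each write replaces the whole file, so the Python object stays the single source of truth.
    path.write_text(json.dumps(asdict(props), indent=2) + "\n")


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "datapackage.json"
    props = Package(name="diabetes-study", contributors=[Contributor("Jamie Jones", ["creator"])])
    write_json(props, path)
    # Update the Python object (here: add a forgotten contributor) and regenerate,
    # instead of editing the JSON by hand.
    props.contributors.append(Contributor("Zdena Ziri", ["creator"]))
    write_json(props, path)
    result = json.loads(path.read_text())

print([c["title"] for c in result["contributors"]])
```

Because the file is rewritten from scratch on every run, any manual edits to the JSON would be lost, which is one more reason to make changes only in the Python script.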

## Creating a README of the metadata properties

Having a *human-readable* version of what is contained in the
`datapackage.json` file is useful for others who may be working with or
wanting to learn more about your data package. You can use
`as_readme_text()` to convert the properties into text that can be added
to a README file. Let's extend `main.py` so that it also writes a README
file from the properties of the data package you just created:

```{python}
#| eval: false
#| filename: "main.py"
import seedcase_sprout as sp
from scripts.package_properties import package_properties


def main():
    # Create the properties script in the default location.
    sp.create_properties_script()
    # Save the properties to `datapackage.json`.
    sp.write_properties(properties=package_properties)
    # Create text for a README of the data package.
    readme_text = sp.as_readme_text(package_properties)
    # Write the README text to a `README.md` file.
    sp.write_file(readme_text, sp.PackagePath().readme())


if __name__ == "__main__":
    main()
```

Sprout splits the README creation functionality into two steps: one to
make the text and one to write it to the file. That way, if you want to
add or manipulate the text, you can do so before writing it to the file.
This is useful if you want to add information to the README that you
don't want included in the `datapackage.json` file. We won't cover how
or why to do this in this guide.
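
Splitting text creation from writing means the README is just a string in between the two steps. The sketch below mimics that split with two hypothetical functions, `readme_text_from()` and `write_readme()`; they are not Sprout's `as_readme_text()` or `write_file()`, and the Markdown layout and example note are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def readme_text_from(props: dict) -> str:
    """Step 1 (hypothetical): turn properties into Markdown text without touching the disk."""
    lines = [f"# {props['title']}", "", props["description"]]
    return "\n".join(lines) + "\n"


def write_readme(text: str, path: Path) -> None:
    """Step 2 (hypothetical): write the text, overwriting any existing file."""
    path.write_text(text)


props = {"title": "A Study on Diabetes", "description": "Data from a 2021 study."}
text = readme_text_from(props)
# Between the two steps, append README-only notes that shouldn't go into datapackage.json.
text += "\n## Notes for collaborators\n\nAsk before sharing the raw files.\n"

with tempfile.TemporaryDirectory() as tmp:
    readme = Path(tmp) / "README.md"
    write_readme(text, readme)
    written = readme.read_text()

print(written)
```

The extra section only exists in the README text, so the metadata properties themselves stay unchanged.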

Next, run this command in the Terminal to create the README file. Note
that `write_file()` always overwrites an existing README file.

``` {.bash filename="Terminal"}
uv run main.py
```

```{python}
#| include: false
# Only to check that it runs.
readme_text = sp.as_readme_text(package_properties)
sp.write_file(
    string=readme_text,
    path=package_path.readme()
)
```

Now you can see that the `README.md` file has been created in your data
package:

```{python}
#| echo: false
print(file_tree(package_path.root()))
```

Now that you know how to create and manage metadata at the project
level, it's time to learn how to add data to the project and manage its
metadata.

```{python}
#| include: false
temp_path.cleanup()
```