-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Documentation for year input for Source methods #273
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This definitely makes the year
input clearer. Thanks!
I made a couple minor suggestions around line 1167.
"id": "762799a5-5b99-46ca-8b68-401dfdf10100", | ||
"metadata": {}, | ||
"source": [ | ||
"To include multiple years of data, you can include the years in the \"year\" parameter in the form of [Start Year, Stop Year]\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple minor suggestions:
- Maybe "To request" instead of "To include"?
- Date ranges can only be used for multi-year datasets. I suggest adding something to indicate that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it! I'll reword the statement.
"source": [ | ||
"To include multiple years of data, you can include the years in the \"year\" parameter in the form of [Start Year, Stop Year]\n", | ||
"\n", | ||
"This also works in a dataset with only a single year of data" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that you can drop the 2nd sentence
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be interested to see how the code would run if I were to query a single year dataset using a multi-year input. I notice that if I include a year that does not exist in a multi-year dataset (say 2021-2027), the output will still be from 2021-2024, without any warning on the non-exist years. So I assumed that this would work on a single year dataset?
I might need to find a single year dataset and observe its output
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way that the year input works rights now needs to be improved to be clearer. If you have any suggestions how to make it clearer or how to better explain it in our documentation, let us know. This is something that we are actively thinking about because we know it should be better.
If you request a year that does not exist (2020) in a multi-year dataset (2021-2024), you should receive in return an empty dataset unless there is a single year dataset for 2020 in which case, you will receive data from the 2020 dataset.
…npolicedata into contributing-branch Updating versio
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated my commit with your suggestion, let me know what you think!
"To include multiple years of data, you can include the years in the \"year\" parameter in the form of [Start Year, Stop Year]\n", | ||
"To request multiple years of data, you can include the years in the \"year\" parameter in the form of `[Start Year, Stop Year]`.\n", | ||
"\n", | ||
"This also works in a dataset with only a single year of data" | ||
"Date ranges can only be used for multi-year datasets" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reworded the statement with your suggestion to highlight that this parameter can only be used for multi-year data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is good. I like the rewritten 2nd line.
One more thing, can you add something like " (i.e. Year Column is "MULTIPLE" in the src.datasets
table or equivalently, coverage_start to coverage_end columns span multiple years)" at the end of the 2nd line"?
That isn't needed specifically for your issue, but I think it would generally be good to define what a multi-year dataset is because it is not currently done anywhere else.
After that, I'll merge the pull request.
"multiyear_tbl" | ||
"multiyear_tbl.table.groupby(\"Year\").size()" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To provide more clarity on the output, I updated this line to show the summarized content of the table using size() to help users understand the size of their data by each year, and also to validate that the year is present on the data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good addition!
@imrnmzri Thanks for your help on this! Your documentation update was automatically included in our "latest" documentation, which will become part of the stable documentation with the next code release which hopefully will happen in the next week |
Solves #269 by adding a documentation in the "Getting Started" guide on how to include data for multiple years., which is done by including the years as a list using the format [Start Date, End Date]