New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adjust to breaking changes on https://api.opencovid.ca/ #1452
Adjust to breaking changes on https://api.opencovid.ca/ #1452
Conversation
This restores health region-level data for Canada and results in update_prevalence.py script running succesfully to completion. Fixes microCOVID#1436
✅ Deploy Preview for microcov ready!
To edit notification comments on pull requests, go to your Netlify site settings. |
@@ -324,8 +332,6 @@ class CanadaRegionalVaccinationReports(pydantic.BaseModel): | |||
|
|||
class Report(pydantic.BaseModel): | |||
date_: date = pydantic.Field(alias="date") | |||
total_cases: int | |||
total_tests: Optional[int] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Validation started failing because total_cases
started returning only null. Perhaps api.covid19tracker.ca
had the same issue with the breaking API changes from api.opencovid.ca
. I removed these entirely, as they're not used (we only use the vaccination-related items).
@@ -270,26 +272,28 @@ class RegionInfo(pydantic.BaseModel): | |||
class CanadaOpenCovidCases(pydantic.BaseModel): | |||
SOURCE: ClassVar[ | |||
str | |||
] = "https://api.opencovid.ca/timeseries?stat=cases&loc={hr_uid}&ymd=true&before={before}&after={after}" | |||
] = "https://api.opencovid.ca/timeseries?stat=cases&loc={hr_uid}&geo=hr&ymd=true&fill=true&before={before}&after={after}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fill
parameter is associated with this documented change:
⚠️ Each time series now ends on the date the data were last updated (e.g., if case data in a province was updated on January 6 with data up to January 3, the case time series for that province would end on January 3). Previously, every time series ended on the most recent date. However, https://github.com/ccodwg/CovidTimelineCanada/issues/40—some time series end on the most recent date regardless of when the data were updated (this will be corrected in the near future). Time series data with rows up to the most recent date for every time series are available from the timeseries route of the API using the fill=true parameter. Note that this means that PT time series data cannot be simply aggregated up into a Canadian time series with additional processing. Please use the data in data/can or the API for ready-to-use Canadian time series.
Basically, any dates which are too recent for the API used to be filled in with duplicates of the most recent day, but that changed. Our logic depends on that behavior. Adding fill=true
reverts to the old semantics.
update_prevalence.py
Outdated
pop: Union[int, str] | ||
# Covid Timeline Canada region info dataset: | ||
class CovidTimelineCanadaRegion(pydantic.BaseModel): | ||
SOURCE: ClassVar[str] = "https://raw.githubusercontent.com/ccodwg/CovidTimelineCanada/main/geo/health_regions.csv" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Health region metadata is no longer hosted via an API, but the data is still around in a CSV we can pull.
province_full: str # Prince Edward Island | ||
province_short: str # PE | ||
class CovidTimelineCanadaProvinceOrTerritory(pydantic.BaseModel): | ||
SOURCE: ClassVar[str] = "https://raw.githubusercontent.com/ccodwg/CovidTimelineCanada/main/geo/pt.csv" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto above - now in CSV form
update_prevalence.py
Outdated
def get_canada_region_place(self, health_region: CovidTimelineCanadaRegion, province_or_territory: str) -> Place: | ||
return self.get_county(health_region.name_short, | ||
state=province_or_territory, | ||
country="Canada") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spelled-out province/territory name isn't provided in this API anymore, but a mapping file is provided, so this method now takes in the full province/territory name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the pt_names
and hr_names
parameters allow the PT and HR names to be returned in a variety of formats for both the timeseries
and summary
routes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jeanpaulrsoucy: Ah, useful! I'll keep that in mind. In this case, we're starting through an iteration of every health region, driven by the health region metadata CSV [here](https://raw.githubusercontent.com/ccodwg/CovidTimelineCanada/main/geo/health_regions.csv, which is static and does not have that column.
v * provincial_reports.summary[-1].cumulative_cvaccine, | ||
manufacturer, | ||
round(proportion * partially_vaccinated), | ||
round(proportion * total_fully_vaccinated), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current API data gives us the number of people who have gotten one, two, three, four, etc shots.
I don't think it'd be correct to pass that information into this function:
def get_partially_vaccinated(total_shots, total_fully_vaccinated, shots_for_full_vaccination) -> int:
return total_shots - shots_for_full_vaccination * total_fully_vaccinated
The second parameter is meant to be the total number of shots administered in the area. Perhaps that's what cumulative_cvaccine
meant (?), but it's certainly not what vaccine_coverage_dose_2
means. I believe that vaccine_coverage_dose_2
means "the number of people who have gotten two doses of the vaccine. As such, the logic is simpler:
total_fully_vaccinated = most_recent.vaccine_coverage_dose_2
partially_vaccinated = most_recent.vaccine_coverage_dose_1 - most_recent.vaccine_coverage_dose_2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that both vaccine coverage (% of population) and administration (raw dose numbers) are available. Presently, the former is an adaption PHAC vaccine coverage data (ccodwg/CovidTimelineCanada#21) whereas the latter is an adaption of the PHAC vaccine administration data (ccodwg/CovidTimelineCanada#47). The former cumulative_cvaccine
value is equivalent to vaccine_administration_dose_2
in the current dataset.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jeanpaulrsoucy: Thanks for the clarification! Is cumulative_avaccine
the same as vaccine_administration_dose_1
? i.e., the number of people who have gotten at least one shot?
The current microCOVID code assumed that cumulative_avaccine
counts the total number of shots given, not the number of people who have had at least one shot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jeanpaulrsoucy : One other question - could you clarify the meaning of vaccine_administration_total_doses
? It seems to be the sum of the vaccine_administration_dose_*
values - but if my assumption is correct about what those fields mean, that wouldn't be a number that would have any meaning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cumulative_avaccine
is equivalent to vaccine_administration_total_doses
, which is the sum of all vaccine_administration_dose_n
values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, thanks - I'll dig in on that and double check my assumptions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All right, I'm convinced that the logic as I added is correct. The old logic was correct enough until boosters started to happen, at which point the computed number of partially vaccinated people started trending under the real value, as each booster shot administered inappropriately reduced it by one.
I believe the other (existing) use of get_partially_vaccinated()
is fine, as it uses api.covid19tracker.ca's total_vaccinations
and total_vaccinated
fields, which seem like what the function was intended to use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems right to me; previously cumulative_avaccine was all we could get from the API.
|
||
canada_provinces = parse_csv( | ||
cache, CovidTimelineCanadaProvinceOrTerritory, CovidTimelineCanadaProvinceOrTerritory.SOURCE | ||
) | ||
except pydantic.error_wrappers.ValidationError as e: | ||
print_and_log_to_sentry(f"Discarding county-level data from Canada due to error: {e}") | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would like to do more here to handle more potential API issues - e.g. right now the only cases that would fail would be the CSVs above going away or columns being renamed. There are multiple additional API calls below this which are currently unhandled. I think it would confuse the diff more than is worth it for this PR, however, and there are a couple of different approaches to think through, so I'd propose to punt that for now.
@justinhaaheim @beshaya: I'm awaiting an upstream data fix at andrewthong/covid19tracker-api#118 before this can be merged, but this is otherwise ready for review. Here's what's been done to help make sure this is right:
I'd welcome suggestions on other ways to QC this. Sorry for the delay to all who were waiting to get this functionality back! It's been an interesting learning process getting familiar with microCOVID's data ingestion. |
Update: andrewthong/covid19tracker-api#118 is resolved! This is ready to merge once approved. |
v * provincial_reports.summary[-1].cumulative_cvaccine, | ||
manufacturer, | ||
round(proportion * partially_vaccinated), | ||
round(proportion * total_fully_vaccinated), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems right to me; previously cumulative_avaccine was all we could get from the API.
…iology/microCOVID into fix_canada_health_region_data
Head branch was pushed to by a user without write access
@beshaya: Looks like the build failed. Perhaps it was due to some kind of GitHub Actions outage? I see prevalence updates didn't run this morning. I don't have have the ability to re-request a build, but I pushed up a trivial change to get it to build. It will need to be reapproved, however. |
Thanks for the ping! And thanks for the fix! |
This restores health region-level data for Canada and results in
update_prevalence.py
script running successfully to completion.Fixes #1436