Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adjust to breaking changes on https://api.opencovid.ca/ #1452

Merged
merged 22 commits into from Jun 2, 2022

Conversation

apiology
Copy link
Member

@apiology apiology commented May 23, 2022

This restores health region-level data for Canada and results in update_prevalence.py script running successfully to completion.

Fixes #1436

This restores health region-level data for Canada and results in
update_prevalence.py script running succesfully to completion.

Fixes microCOVID#1436
@netlify
Copy link

netlify bot commented May 23, 2022

Deploy Preview for microcov ready!

Name Link
🔨 Latest commit 4b1d926
🔍 Latest deploy log https://app.netlify.com/sites/microcov/deploys/629934cab94f0800084d2a5c
😎 Deploy Preview https://deploy-preview-1452--microcov.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@@ -324,8 +332,6 @@ class CanadaRegionalVaccinationReports(pydantic.BaseModel):

class Report(pydantic.BaseModel):
date_: date = pydantic.Field(alias="date")
total_cases: int
total_tests: Optional[int]
Copy link
Member Author

@apiology apiology May 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Validation started failing because total_cases started returning only null. Perhaps api.covid19tracker.ca had the same issue with the breaking API changes from api.opencovid.ca. I removed these entirely, as they're not used (we only use the vaccination-related items).

@@ -270,26 +272,28 @@ class RegionInfo(pydantic.BaseModel):
class CanadaOpenCovidCases(pydantic.BaseModel):
SOURCE: ClassVar[
str
] = "https://api.opencovid.ca/timeseries?stat=cases&loc={hr_uid}&ymd=true&before={before}&after={after}"
] = "https://api.opencovid.ca/timeseries?stat=cases&loc={hr_uid}&geo=hr&ymd=true&fill=true&before={before}&after={after}"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The fill parameter is associated with this documented change:

⚠️ Each time series now ends on the date the data were last updated (e.g., if case data in a province was updated on January 6 with data up to January 3, the case time series for that province would end on January 3). Previously, every time series ended on the most recent date. However, https://github.com/ccodwg/CovidTimelineCanada/issues/40—some time series end on the most recent date regardless of when the data were updated (this will be corrected in the near future). Time series data with rows up to the most recent date for every time series are available from the timeseries route of the API using the fill=true parameter. Note that this means that PT time series data cannot be simply aggregated up into a Canadian time series with additional processing. Please use the data in data/can or the API for ready-to-use Canadian time series.

Basically, any dates which are too recent for the API used to be filled in with duplicates of the most recent day, but that changed. Our logic depends on that behavior. Adding fill=true reverts to the old semantics.

pop: Union[int, str]
# Covid Timeline Canada region info dataset:
class CovidTimelineCanadaRegion(pydantic.BaseModel):
SOURCE: ClassVar[str] = "https://raw.githubusercontent.com/ccodwg/CovidTimelineCanada/main/geo/health_regions.csv"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Health region metadata is no longer hosted via an API, but the data is still around in a CSV we can pull.

province_full: str # Prince Edward Island
province_short: str # PE
class CovidTimelineCanadaProvinceOrTerritory(pydantic.BaseModel):
SOURCE: ClassVar[str] = "https://raw.githubusercontent.com/ccodwg/CovidTimelineCanada/main/geo/pt.csv"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto above - now in CSV form

def get_canada_region_place(self, health_region: CovidTimelineCanadaRegion, province_or_territory: str) -> Place:
return self.get_county(health_region.name_short,
state=province_or_territory,
country="Canada")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spelled-out province/territory name isn't provided in this API anymore, but a mapping file is provided, so this method now takes in the full province/territory name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the pt_names and hr_names parameters allow the PT and HR names to be returned in a variety of formats for both the timeseries and summary routes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeanpaulrsoucy: Ah, useful! I'll keep that in mind. In this case, we're starting through an iteration of every health region, driven by the health region metadata CSV [here](https://raw.githubusercontent.com/ccodwg/CovidTimelineCanada/main/geo/health_regions.csv, which is static and does not have that column.

v * provincial_reports.summary[-1].cumulative_cvaccine,
manufacturer,
round(proportion * partially_vaccinated),
round(proportion * total_fully_vaccinated),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current API data gives us the number of people who have gotten one, two, three, four, etc shots.

I don't think it'd be correct to pass that information into this function:

    def get_partially_vaccinated(total_shots, total_fully_vaccinated, shots_for_full_vaccination) -> int:
        return total_shots - shots_for_full_vaccination * total_fully_vaccinated

The second parameter is meant to be the total number of shots administered in the area. Perhaps that's what cumulative_cvaccine meant (?), but it's certainly not what vaccine_coverage_dose_2 means. I believe that vaccine_coverage_dose_2 means "the number of people who have gotten two doses of the vaccine. As such, the logic is simpler:

                total_fully_vaccinated = most_recent.vaccine_coverage_dose_2
                partially_vaccinated = most_recent.vaccine_coverage_dose_1 - most_recent.vaccine_coverage_dose_2

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that both vaccine coverage (% of population) and administration (raw dose numbers) are available. Presently, the former is an adaption PHAC vaccine coverage data (ccodwg/CovidTimelineCanada#21) whereas the latter is an adaption of the PHAC vaccine administration data (ccodwg/CovidTimelineCanada#47). The former cumulative_cvaccine value is equivalent to vaccine_administration_dose_2 in the current dataset.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeanpaulrsoucy: Thanks for the clarification! Is cumulative_avaccine the same as vaccine_administration_dose_1? i.e., the number of people who have gotten at least one shot?

The current microCOVID code assumed that cumulative_avaccine counts the total number of shots given, not the number of people who have had at least one shot.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jeanpaulrsoucy : One other question - could you clarify the meaning of vaccine_administration_total_doses? It seems to be the sum of the vaccine_administration_dose_* values - but if my assumption is correct about what those fields mean, that wouldn't be a number that would have any meaning.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cumulative_avaccine is equivalent to vaccine_administration_total_doses, which is the sum of all vaccine_administration_dose_n values.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks - I'll dig in on that and double check my assumptions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All right, I'm convinced that the logic as I added is correct. The old logic was correct enough until boosters started to happen, at which point the computed number of partially vaccinated people started trending under the real value, as each booster shot administered inappropriately reduced it by one.

I believe the other (existing) use of get_partially_vaccinated() is fine, as it uses api.covid19tracker.ca's total_vaccinations and total_vaccinated fields, which seem like what the function was intended to use.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems right to me; previously cumulative_avaccine was all we could get from the API.


canada_provinces = parse_csv(
cache, CovidTimelineCanadaProvinceOrTerritory, CovidTimelineCanadaProvinceOrTerritory.SOURCE
)
except pydantic.error_wrappers.ValidationError as e:
print_and_log_to_sentry(f"Discarding county-level data from Canada due to error: {e}")
return
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to do more here to handle more potential API issues - e.g. right now the only cases that would fail would be the CSVs above going away or columns being renamed. There are multiple additional API calls below this which are currently unhandled. I think it would confuse the diff more than is worth it for this PR, however, and there are a couple of different approaches to think through, so I'd propose to punt that for now.

@apiology apiology changed the title WIP: Adjust to breaking changes on https://api.opencovid.ca/ Adjust to breaking changes on https://api.opencovid.ca/ May 28, 2022
@apiology
Copy link
Member Author

@justinhaaheim @beshaya: I'm awaiting an upstream data fix at andrewthong/covid19tracker-api#118 before this can be merged, but this is otherwise ready for review.

Here's what's been done to help make sure this is right:

  • Copied in example data from the API responses to the Pydantic definitions to help with review to ensure the right fields were being mapped
  • A couple of rounds of self-review of the diff
  • Ran ./update_prevalence.py and verified that Canada health-region-level data appears in location.ts and that it appears in the UI
  • Verified health region names in .csv files seem consistent with past health region names.
  • Worked through commentary from @jeanpaulrsoucy, owner of the upstream API in question.
  • After discovering an inconsistency in population vs total-people-vaccinated numbers, followed up upstream to find issues there, then verified other numbers in Canada are roughly in line with expectations.

I'd welcome suggestions on other ways to QC this.

Sorry for the delay to all who were waiting to get this functionality back! It's been an interesting learning process getting familiar with microCOVID's data ingestion.

@apiology
Copy link
Member Author

Update: andrewthong/covid19tracker-api#118 is resolved! This is ready to merge once approved.

@apiology apiology marked this pull request as ready for review May 30, 2022 19:22
v * provincial_reports.summary[-1].cumulative_cvaccine,
manufacturer,
round(proportion * partially_vaccinated),
round(proportion * total_fully_vaccinated),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems right to me; previously cumulative_avaccine was all we could get from the API.

@beshaya beshaya enabled auto-merge (squash) June 2, 2022 04:33
auto-merge was automatically disabled June 2, 2022 22:08

Head branch was pushed to by a user without write access

@apiology
Copy link
Member Author

apiology commented Jun 2, 2022

@beshaya: Looks like the build failed. Perhaps it was due to some kind of GitHub Actions outage? I see prevalence updates didn't run this morning. I don't have have the ability to re-request a build, but I pushed up a trivial change to get it to build. It will need to be reapproved, however.

@beshaya beshaya merged commit 517f0f0 into microCOVID:main Jun 2, 2022
@beshaya
Copy link
Member

beshaya commented Jun 2, 2022

Thanks for the ping! And thanks for the fix!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Data hasn't updated in last week
3 participants