Initial notes #1

Merged: 2 commits into main from initial-notes, Apr 17, 2024
Conversation

lewisjared (Collaborator) commented Apr 4, 2024

Description

Notes from reading through the codebase about initial code cleanup that would be useful, including some questions for @prayner.

The biggest hole for me is understanding where the files are downloaded. I'm particularly interested in how the data flows between these repos.

I can also move these notes into a GDoc if that is a better medium (I was offline when writing these notes).

CC @aethr

@lewisjared requested review from aethr and prayner on April 4, 2024 05:41
Comment on lines +35 to +36
* How does the downloading of inputs work? I see that data is fetched from `https://prior.openmethane.org`. How does data get there?
* What do you do with the outputs? Are they uploaded to `https://prior.openmethane.org`?
aethr (Contributor)

Gerard showed me yesterday that prior.openmethane.org is an R2 bucket (Cloudflare's version of S3). His understanding is that the prior doesn't change very frequently unless new data or methodology becomes available. It can be computed from (fairly) static inputs and placed in an easily accessible location for reference by the model.
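
For illustration: since the bucket is fronted by a public HTTPS endpoint, fetching one of these inputs could be as simple as the sketch below. The object key and paths are made-up placeholders, not this repo's actual code.

```python
from pathlib import Path
import urllib.request

# Public HTTPS endpoint in front of the R2 bucket.
PRIOR_URL = "https://prior.openmethane.org"

def fetch_input(key: str, dest: Path) -> Path:
    """Download one input file from the bucket's public HTTPS endpoint."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(f"{PRIOR_URL}/{key}", dest)
    return dest

# Hypothetical object key; the real keys live in this repo's configuration.
fetch_input("inputs/land-use.nc", Path("data/land-use.nc"))
```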

There is some utility in making this data, and the methods for generating it, publicly available.

It probably makes sense for the prior to be "productionised" but possibly not automated. @prayner what are your thoughts?

lewisjared (Collaborator, Author) commented Apr 4, 2024

At a glance, it looks like the data in the R2 bucket is only the semi-static data that is updated semi-annually. I'm fine with these data being updated on an ad hoc basis rather than being checked monthly and automatically used, assuming @prayner is too. The step of "productionising" those semi-static data might be as simple as documentation (public or private).

The GFAS data is downloaded during the processing of its layer and will be updated at least monthly.
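
For illustration only: GFAS is distributed through the Copernicus Atmosphere Data Store, so a monthly pull with the cdsapi client might look something like the sketch below. The dataset and variable names here are assumptions to check against this repo's actual downloader.

```python
import cdsapi

# Credentials are read from ~/.cdsapirc (pointed at the Atmosphere Data Store).
client = cdsapi.Client()
client.retrieve(
    "cams-global-fire-emissions-gfas",             # assumed ADS dataset name
    {
        "date": "2024-03-01/2024-03-31",           # one month of daily fields
        "format": "netcdf",
        "variable": ["wildfire_flux_of_methane"],  # assumed CH4 flux variable
    },
    "gfas_ch4_2024-03.nc",                         # hypothetical local target
)
```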

@aethr When you say "prior" are you referring to this repository as a whole or just the data on R2?

aethr (Contributor)

> the data in the R2 bucket is only the semi-static data that is updated semi-annually. I'm fine with these data being updated on an ad hoc basis rather than being checked monthly and automatically used, assuming @prayner is too. The step of "productionising" those semi-static data might be as simple as documentation (public or private).

@lewisjared that is my understanding as well. As I understand it, the data in the R2 bucket are all inputs to the prior which aren't known to change on a predictable cadence. @prayner will have to help us understand where these are sourced from and how we can tell when they might need to be updated.

> The GFAS data is downloaded during the processing of its layer and will be updated at least monthly.

I wasn't aware of that, and perhaps that's an argument that this should be run on a schedule, regardless of when the "semi-static" inputs are updated.

> When you say "prior" are you referring to this repository as a whole or just the data on R2?

I understood "the prior" to be a data set we compute in advance, based on semi-static data, which forms one of the inputs to the Open Methane model. The data in R2 is the input (along with GFAS), and the code in this repo transforms and combines the inputs to give us data in "our format" (i.e., the grid).

I'm not sure what the plan is for the outputs, but if this only needs to be run once each time the data changes (i.e., new GFAS), it does seem like a good idea to store/cache it somewhere the model can access.

Sorry, my comments probably aren't that helpful! Will wait for @prayner to illuminate things. :)

lewisjared (Collaborator, Author)

Comments are always helpful. We are all trying to get on the same page and use the same language.

I think your description of the prior is correct, with the addition that this is our initial guess at the methane emissions across Australia.

I think that this has to be run at least monthly to pull in the new GFAS (and maybe wetlands) data. In our meeting notes, we said that the "above steps are run daily", which included the prior calculation, so it might even need to be run at a daily cadence. The current implementation allows individual layers to be rerun as needed, so this isn't a major issue and can be heavily cached.
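
A minimal sketch of that rerun-as-needed idea, with hypothetical names; the repo's actual layer API may differ.

```python
from pathlib import Path
from typing import Callable

def run_layer(
    name: str,
    compute: Callable[[Path], None],
    cache_dir: Path = Path("cache"),
    force: bool = False,
) -> Path:
    """Run one prior layer, reusing its cached output when present."""
    out = cache_dir / f"{name}.nc"
    if out.exists() and not force:
        return out  # cached result: skip the recompute
    cache_dir.mkdir(parents=True, exist_ok=True)
    compute(out)  # the layer writes its result to `out`
    return out

# e.g. rerun only the GFAS layer each month and leave semi-static layers cached:
# run_layer("gfas", download_and_regrid_gfas, force=True)
```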

aethr (Contributor)

the "above steps are run daily" which included the prior calculation so it might even need to be run at a daily cadance.

If that is true, then I think "the prior" is probably the process (i.e., "run the prior") rather than the data set (i.e., "access the prior").

@prayner merged commit 6919c79 into main on Apr 17, 2024
@lewisjared deleted the initial-notes branch on May 30, 2024 17:18