Skip to content
This repository was archived by the owner on Sep 11, 2023. It is now read-only.

Conversation

@peterdudfield
Copy link
Contributor

@peterdudfield peterdudfield commented Feb 1, 2022

Pull Request

Description

Add id to metadata, this is optional so can be None.

Fixes #598 #600

How Has This Been Tested?

normal unittests

  • Yes

Checklist:

  • My code follows OCF's coding style guidelines
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked my code and corrected any misspellings

@peterdudfield peterdudfield self-assigned this Feb 1, 2022
@peterdudfield peterdudfield marked this pull request as ready for review February 1, 2022 12:29
@JackKelly JackKelly changed the title add if to metadata add id to metadata Feb 1, 2022
Copy link
Contributor

@JackKelly JackKelly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good.

One thought is:

In the near future (this month?), we plan to create batches across the entire geospatial extent of the satellite data (over the ocean; over Europe; over Africa; etc.). (These training examples won't have PV data, of course. Instead they'll be used to pre-train the model to predict the next few frames of satellite imagery).

And, separately, I also plan to create batches which are centered on individual PV systems (and where the model will know nothing about GSPs).

And we plan to produce start using PV data from mainland Europe and elsewhere.

In all three of these scenarios, there won't be a "GSP ID". As such, is the approach proposed in this PR is compatible with our plans to create batches which aren't associated with a GSP? When there isn't a GSP then I suppose we could just fill the id columns with NaNs or something like that, but that maybe feels like a bit of a hack?

Also, if we do stick with including the GSP ID, then I'd advocate for changing the name to gsp_id rather than just id, just to be really explicit?

@peterdudfield
Copy link
Contributor Author

This looks good.

One thought is:

In the near future (this month?), we plan to create batches across the entire geospatial extent of the satellite data (over the ocean; over Europe; over Africa; etc.). (These training examples won't have PV data, of course. Instead they'll be used to pre-train the model to predict the next few frames of satellite imagery).

And, separately, I also plan to create batches which are centered on individual PV systems (and where the model will know nothing about GSPs).

And we plan to produce start using PV data from mainland Europe and elsewhere.

In all three of these scenarios, there won't be a "GSP ID". As such, is the approach proposed in this PR is compatible with our plans to create batches which aren't associated with a GSP? When there isn't a GSP then I suppose we could just fill the id columns with NaNs or something like that, but that maybe feels like a bit of a hack?

Also, if we do stick with including the GSP ID, then I'd advocate for changing the name to gsp_id rather than just id, just to be really explicit?

Yea I agree it feels a bit of a hack. But yea that would be an option i.e set id to NaN or None.

This is all work to get the forecast MVP working. Its because there is no GSP data to load, but I think we do need the GSP id for the MVP. One option would be to have some static GSP data, and do a hack when making batches/examples.

Perhaps doing this: #600, would help both tasks. Perhaps Ill give this a go, and itll actually make things tidier

@peterdudfield
Copy link
Contributor Author

peterdudfield commented Feb 1, 2022

Would be something like this, and then Location can be used in other general places in the code

class Location(BaseModel):
    """ Location of the example"""
    
    t0_datetime_utc: pd.Timestamp = Field(
        ...,
        description="The t0 of one example ",
    )

    x_center_osgb: float = Field(
        ...,
        description="The x center of one example in OSGB coordinates",
    )

    y_center_osgb: float = Field(
        ...,
        description="The y center of one example in OSGB coordinates",
    )

    id: Optional[int] = Field(
        None,
        description="The id of the GSP or the PV system. "
        "This is optional so can be None",
    )


class Metadata(BaseModel):
    """Class to store metadata data"""

    batch_size: int = Field(
        ...,
        g=0,
        description="The size of this batch. If the batch size is 0, "
        "then this item stores one data item",
    )

    locations: List[Location]

@peterdudfield peterdudfield marked this pull request as draft February 1, 2022 17:30
@JackKelly
Copy link
Contributor

Cool beans.

But, um, isn't the GSP ID (and PV system ID) already in each batch? I haven't checked the code recently but I definitely remember feeding both the GSP ID and PV system ID into my ML models back in December 🙂 Or am I getting confused?

@peterdudfield
Copy link
Contributor Author

Cool beans.

But, um, isn't the GSP ID (and PV system ID) already in each batch? I haven't checked the code recently but I definitely remember feeding both the GSP ID and PV system ID into my ML models back in December 🙂 Or am I getting confused?

No your right! Just that means for the live models we would need the data sources: gsp in order for the system to work. Even though the model might not use gsp data. Im making good progress with #600 so hopefully itll all come together

@codecov-commenter
Copy link

codecov-commenter commented Feb 1, 2022

Codecov Report

Merging #599 (5e985ef) into main (1cc44c9) will increase coverage by 0.10%.
The diff coverage is 97.66%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main     #599      +/-   ##
==========================================
+ Coverage   93.25%   93.36%   +0.10%     
==========================================
  Files          45       45              
  Lines        2922     2982      +60     
==========================================
+ Hits         2725     2784      +59     
- Misses        197      198       +1     
Impacted Files Coverage Δ
nowcasting_dataset/consts.py 100.00% <ø> (ø)
nowcasting_dataset/manager/manager_live.py 93.93% <91.66%> (+2.76%) ⬆️
...asting_dataset/data_sources/gsp/gsp_data_source.py 91.50% <93.10%> (-0.03%) ⬇️
...ng_dataset/data_sources/metadata/metadata_model.py 98.41% <97.77%> (-1.59%) ⬇️
nowcasting_dataset/data_sources/data_source.py 87.73% <100.00%> (-0.23%) ⬇️
nowcasting_dataset/data_sources/fake/batch.py 87.17% <100.00%> (-0.06%) ⬇️
...owcasting_dataset/data_sources/fake/coordinates.py 90.32% <100.00%> (ø)
...a_sources/optical_flow/optical_flow_data_source.py 100.00% <100.00%> (ø)
...wcasting_dataset/data_sources/pv/pv_data_source.py 97.25% <100.00%> (+0.03%) ⬆️
...et/data_sources/satellite/satellite_data_source.py 94.11% <100.00%> (+0.17%) ⬆️
... and 4 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update b8e8947...5e985ef. Read the comment docs.

@peterdudfield peterdudfield marked this pull request as ready for review February 1, 2022 21:46
@peterdudfield
Copy link
Contributor Author

Comments for chat with @JackKelly and @jacobbieker
-change Location to SpaceTimeLocation

  • add id_type, to show what type of id it is

Copy link
Contributor

@jacobbieker jacobbieker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Just one small comment. Good job!

Co-authored-by: Jacob Bieker <jacob@openclimatefix.org>
@peterdudfield
Copy link
Contributor Author

@JackKelly im gona merge this, to push on the MVP, but please free feel to add comments, and ill do those changes

@peterdudfield peterdudfield merged commit 02cfe7b into main Feb 2, 2022
@peterdudfield peterdudfield deleted the issue/598-id-metadata branch February 2, 2022 13:33
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add id to metadata

5 participants