Split up overall ts files #647
Comments
Maybe a common practice from SQL databases helps here: the principle there is to make record sizes as small as possible (and as big as necessary).
Since I just had a look at the structure of all the JSON files for my parallelization efforts, here's a little documentation:
So it should be easy to split these files by obs network and variable, as is already done e.g. for the map files.
Nice, here is the structure as a pydantic model:

from __future__ import annotations

from pydantic import BaseModel, Field


class Field1610668800000(BaseModel):
    totnum: int
    num_valid: int
    refdata_mean: float
    refdata_std: float
    data_mean: float
    data_std: float
    weighted: int
    rms: float
    r: float = Field(..., alias='R')
    r_spearman: float = Field(..., alias='R_spearman')
    r_kendall: float = Field(..., alias='R_kendall')
    nmb: float
    mnmb: float
    fge: float
    num_coords_tot: int
    num_coords_with_data: int


class World(BaseModel):
    field_1610668800000: Field1610668800000 = Field(..., alias='1610668800000')


class Sconcpm25(BaseModel):
    world: World = Field(..., alias='WORLD')


class IfsOsuite(BaseModel):
    sconcpm25: Sconcpm25


class Surface(BaseModel):
    ifs_osuite: IfsOsuite = Field(..., alias='IFS-OSUITE')


class AnEeaMp(BaseModel):
    surface: Surface = Field(..., alias='Surface')


class Concpm25(BaseModel):
    an_eea_mp: AnEeaMp = Field(..., alias='AN-EEA-MP')


class Model(BaseModel):
    concpm25: Concpm25
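To make the nesting in the model above easier to see, here is a minimal sketch that walks such a file and yields one flat record per leaf. The level names in `LEVELS` (obs variable, obs network, layer, model, model variable, region, timestamp) are my reading of the sample keys and are assumptions, as is the helper name `iter_records`; the sample values are made up.

```python
from __future__ import annotations

from typing import Any, Iterator

# Assumed meaning of each nesting level, read off the sample keys
# (concpm25 / AN-EEA-MP / Surface / IFS-OSUITE / sconcpm25 / WORLD / 1610668800000).
LEVELS = ("obs_var", "obs_network", "layer", "model", "model_var", "region", "timestamp")


def iter_records(data: dict[str, Any]) -> Iterator[dict[str, Any]]:
    """Yield one flat record per statistics block in the nested dict."""

    def walk(node: dict[str, Any], depth: int, keys: tuple[str, ...]):
        if depth == len(LEVELS):
            # Reached the statistics dict (totnum, rms, R, ...).
            yield {**dict(zip(LEVELS, keys)), "stats": node}
            return
        for key, child in node.items():
            yield from walk(child, depth + 1, keys + (key,))

    yield from walk(data, 0, ())


# Made-up sample with the same shape as the model above.
sample = {
    "concpm25": {
        "AN-EEA-MP": {
            "Surface": {
                "IFS-OSUITE": {
                    "sconcpm25": {
                        "WORLD": {"1610668800000": {"totnum": 10, "rms": 1.5}}
                    }
                }
            }
        }
    }
}

records = list(iter_records(sample))
```

Each record carries all seven keys plus the statistics, so grouping by any combination of levels (e.g. obs network and variable) becomes a simple loop over the records.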
I implemented a first attempt at splitting up the timeseries files this morning, and will be testing it out this afternoon.
The files are still quite big for the cams2-83 last-seasons experiment.
I suggest also splitting by region, which would be consistent with other timeseries files (e.g. in the forecast directory:
The overall timeseries files (located in hm/ts/) should be split into multiple files, as they can be very large (>100 MB) when considering multiple models, observations, etc. All models should stay in the same file, but we could separate them at least per observation and perhaps per region.
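A rough sketch of what such a split could look like, not the actual implementation: it assumes the nesting obs variable → obs network → layer → model → model variable → region → timestamp, and writes one file per observation (network, variable, layer) and region while keeping all models together. The function names and the file naming scheme are hypothetical.

```python
from __future__ import annotations

import json
from pathlib import Path
from typing import Any


def split_timeseries(data: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Regroup one big timeseries dict into per-observation, per-region payloads.

    Returns a mapping of (hypothetical) file name -> {model: {model_var: series}}.
    """
    out: dict[str, dict[str, Any]] = {}
    for obs_var, networks in data.items():
        for obs_network, layers in networks.items():
            for layer, models in layers.items():
                for model, model_vars in models.items():
                    for model_var, regions in model_vars.items():
                        for region, series in regions.items():
                            # Hypothetical naming scheme, chosen for illustration only.
                            name = f"{obs_network}-{obs_var}_{layer}_{region}.json"
                            # All models for this observation/region share one file.
                            out.setdefault(name, {}).setdefault(model, {})[model_var] = series
    return out


def write_split(data: dict[str, Any], out_dir: str | Path) -> list[Path]:
    """Write each payload from split_timeseries to its own JSON file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, payload in split_timeseries(data).items():
        path = out_dir / name
        path.write_text(json.dumps(payload))
        paths.append(path)
    return paths
```

With two models and two regions for one observation, this produces two files, each containing both models, so a reader only has to load the observation and region it actually needs.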