|
Thanks @Chrimspie! @JoelLucaAdams Would it be possible to implement this as an optional argument to I think this is roughly how it's handled in plain Xarray, though it also has some extra options. |
Co-authored-by: Chrimspie <chrimspie@gmail.com>
|
Thanks again for writing this along with the tests @Chrimspie! (Like we discussed, documentation can come later!) @LiamPattinson That's a great suggestion to stay more in-line with how I have refactored the code (and tests) to utilise We load all the data in with the A couple of things to note:
Examples:
import sdf_xarray as sdfxr

ds = sdfxr.open_mfdataset(
    "/Users/joel/Source/sdf-xarray/tests/example_files_1D/*.sdf",
    keep_particles=True,
    data_vars=["Particles_Particles_Per_Cell_proton", "Electric_Field_Ez", "dist_fn_x_px_proton"],
)
<xarray.Dataset> Size: 143kB
Dimensions: (time: 11, X_x_px_proton: 16,
Px_x_px_proton: 100, X_Grid_mid: 16)
Coordinates:
* time (time) float64 88B 5.467e-14 ... 2.4...
* X_x_px_proton (X_x_px_proton) float64 128B 1.725e-...
* Px_x_px_proton (Px_x_px_proton) float64 800B -2.97e...
* X_Grid_mid (X_Grid_mid) float64 128B 1.725e-05 ...
Data variables:
dist_fn_x_px_proton (time, X_x_px_proton, Px_x_px_proton) float64 141kB dask.array<chunksize=(1, 16, 100), meta=np.ndarray>
Particles_Particles_Per_Cell_proton (time) float64 88B nan nan ... 120.0
Electric_Field_Ez (time, X_Grid_mid) float64 1kB dask.array<chunksize=(1, 16), meta=np.ndarray>
Attributes: (12/21)
filename: /Users/joel/Source/sdf-xarray/tests/example_files_1D/00...
file_version: 1
file_revision: 4
code_name: Epoch1d
step: 0
time: 5.466992913512341e-14
... ...
compile_machine: noether
compile_flags: unknown
defines: 0
compile_date: Mon Jul 29 09:55:15 2024
run_date: Thu Oct 17 11:08:44 2024
io_date: Thu Oct 17 11:08:44 2024 |
|
Potentially fixes #57 as you could load it in 1 distribution function at a time. FYI @LucyMeganArmitage |
LiamPattinson
left a comment
I've added a few suggestions, and the main issue that stands out to me is that I don't think the case in which separate_times == True and data_vars is not None is being handled.
Co-authored-by: Liam Pattinson <LiamPattinson@users.noreply.github.com>
LiamPattinson
left a comment
Nice, this looks really clean now 👍
|
@Chrimspie Thanks again for the original code implementation, I hope the new way it works isn't too confusing! @LiamPattinson Thanks again for reviewing it. You've reviewed so many of these PRs now that I think it's high time you add your name to the list of contributors in |
Use of open_mfdataset() can cause the machine to allocate infeasible amounts of memory to store the dataset created from the SDF files. This function avoids requesting excess memory by extracting data from the files for only a single variable, given as an argument. The function opens the SDF files one by one, extracts the requested variable, and appends it to a list of data arrays. Only after this information has been extracted from all relevant SDF files is the output dataset created. This dataset is much smaller than the one that would be created by open_mfdataset().

Example:
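As a rough illustration of the one-file-at-a-time pattern described above, here is a minimal sketch that uses plain dictionaries in place of real SDF files. It is not the function's actual code: `extract_single_variable`, `open_file`, the toy `fake_disk` data and the variable names `Ez` and `Bx` are all hypothetical, and skipping files that lack the variable is an assumption about what "relevant" files means.

```python
from typing import Callable, Iterable


def extract_single_variable(
    paths: Iterable[str],
    open_file: Callable[[str], dict],
    variable: str,
) -> dict:
    """Collect one variable from many files, holding one file at a time."""
    times = []
    values = []
    for path in sorted(paths):
        contents = open_file(path)         # only this file's data is held now
        if variable not in contents:
            continue                       # assumption: skip files without it
        times.append(contents["time"])
        values.append(contents[variable])  # keep just the requested data
        del contents                       # drop the reference to the rest
    # Build the (small) combined result only after every file was visited.
    return {"time": times, variable: values}


# Toy data standing in for real SDF output (hypothetical names and values).
fake_disk = {
    "0000.sdf": {"time": 0.0, "Ez": [0.0, 0.1], "Bx": [1.0, 1.0]},
    "0001.sdf": {"time": 1.0, "Ez": [0.2, 0.3], "Bx": [1.0, 1.0]},
    "0002.sdf": {"time": 2.0, "Bx": [1.0, 1.0]},
}

result = extract_single_variable(fake_disk.keys(), fake_disk.__getitem__, "Ez")
print(result)
```

The point of the design is that the unrequested variables (`Bx` here) never accumulate across files, so peak memory scales with one file plus the single extracted variable rather than with the full dataset.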