
Functions to write TOD in HDF5 files #139

Merged
ziotom78 merged 35 commits into master from todwrite on Dec 7, 2021
Conversation

@ziotom78 (Member) commented Oct 8, 2021

  • Add dependency on h5py
  • Use a more precise type for Observation.__init__
  • Add code to write TODs to HDF5 files
  • Add tests
  • Add documentation

For now the code uses h5py 3.1, as it is the last version supporting Python 3.6.
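As a rough illustration of the feature this PR adds (the names `obs.h5`, `tod`, and the sampling rate value are hypothetical, not taken from the actual `io.py` implementation), a TOD array can be written to an HDF5 dataset with h5py and read back like this:

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical sketch: save a TOD array plus its sampling rate
# (stored as a dataset attribute), then read both back.
tod = np.arange(10, dtype=np.float64)

path = os.path.join(tempfile.mkdtemp(), "obs.h5")
with h5py.File(path, "w") as f:
    dset = f.create_dataset("tod", data=tod)
    dset.attrs["sampling_rate_hz"] = 19.0

with h5py.File(path, "r") as f:
    recovered = f["tod"][:]
    rate = f["tod"].attrs["sampling_rate_hz"]
```

The real `write_one_observation` function writes more metadata than this, but the dataset/attribute split shown here is the standard h5py pattern for pairing bulk samples with scalar parameters.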

@sgiardie (Contributor) commented Oct 26, 2021

Hi @ziotom78, I slightly modified the function write_one_observation in io.py in the part where the attributes for each observation are written to a JSON file. I found some problems with the attribute 'sampling_rate_hz', which is a single number shared by all the detectors instead of a list with one value per detector (I corrected the problem in this way). Another problem was in the JSON serialization of NumPy objects (see for example this, for np.int, and this, for np.array). I tried to solve that issue in this way. Now the function seems to work, but please take a look and see whether those solutions can be improved. Thanks!
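The NumPy serialization problem mentioned above arises because `json.dumps` does not know how to handle NumPy scalars or arrays. A common fix (shown here as a sketch, not necessarily the exact approach adopted in io.py) is a `json.JSONEncoder` subclass that converts NumPy types to native Python ones:

```python
import json

import numpy as np

class NumpyEncoder(json.JSONEncoder):
    """Convert NumPy scalars and arrays to JSON-serializable types."""

    def default(self, obj):
        if isinstance(obj, np.integer):
            return int(obj)
        if isinstance(obj, np.floating):
            return float(obj)
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        # Fall back to the base class for anything else
        return super().default(obj)

# Hypothetical attribute dictionary mixing NumPy scalars and arrays
attrs = {"sampling_rate_hz": np.float64(19.0), "det_idx": np.arange(3)}
serialized = json.dumps(attrs, cls=NumpyEncoder)
```

Passing `cls=NumpyEncoder` keeps the call sites unchanged while handling every NumPy type in one place.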

@ziotom78 (Member, Author) commented:

Thanks @sgiardie for spotting the problem with NumPy types; I have slightly reworked your solution and now everything seems to work as expected. I have added a couple of tests to the suite, but more are needed:

  • Test the actual value of the TOD and pointings
  • Test that the HDF5 file is able to save proper MJD times and recover them

Once this is in place, we just need to add a chapter to the User's Manual.

Now that PR#136 has been merged, there is no reason to keep pinning h5py to version 3.1.
There are some peculiar situations (used mostly in unit tests) where
distribute_optimally returns a number of time spans that is smaller
than the number of MPI processes. This happens because of the way
the painter's algorithm works. Suppose that there are 12 observations
of equal length and 9 MPI processes; in this case, the painter's
algorithm comes up with 2 observations for the first 6 processes
and no observations for the last 3. This is an optimal solution
because it minimizes the maximum amount of data per process; any
alternative like 2+2+2+1+1+1+1+1+1 would leave 6 processes idle for
50% of the time and still require the same time as 2+2+2+2+2+2+0+0+0.
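The distribution described above can be illustrated with a toy version of the splitting logic (this is a sketch for intuition, not the library's actual `distribute_optimally` implementation): each span takes `ceil(num_obs / num_procs)` observations, so the spans can run out before every process gets one.

```python
import math

def split_observations(num_obs, num_procs):
    """Toy splitter: fixed-size spans of ceil(num_obs / num_procs)
    observations each; may return fewer spans than num_procs."""
    per_span = math.ceil(num_obs / num_procs)
    spans = []
    start = 0
    while start < num_obs:
        end = min(start + per_span, num_obs)
        spans.append((start, end))
        start = end
    return spans

# The example from the commit message: 12 observations, 9 MPI processes
spans = split_observations(12, 9)
```

With 12 observations and 9 processes, `per_span` is 2, so only 6 spans are produced and the last 3 processes receive no observations, which is exactly the case the failed assert did not anticipate.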

This commit fixes a failed assert that assumed that all the processes
should be filled with at least one observation.
Given that now distribute_optimally might assign no observations
to some MPI processes, this commit makes sure that
write_list_of_observations properly works in this case too.
@ziotom78 (Member, Author) commented:

The PR finally has all the features implemented, and all the tests pass. Please have a look at the documentation here and post your comments; I plan to merge this next week.

@ziotom78 (Member, Author) commented:

Warning: the tests have failed because of a failure in fetching PySM maps, but they were passing until the last commit (which only removed a few debug statements). Until PR#147 is merged, we must cope with these failures.

@ziotom78 ziotom78 merged commit 1ef63ca into master Dec 7, 2021
@ziotom78 ziotom78 deleted the todwrite branch December 7, 2021 12:24