
Make Madam work in an MPI environment #204

Merged
ziotom78 merged 11 commits into master from fix201 on Nov 1, 2022

@ziotom78 (Member) commented on Oct 17, 2022

As reported in issue #201, Madam files created in an MPI environment do not contain all the TODs. This PR fixes the problem by iterating over all the MPI processes.
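Walking over all the MPI processes requires every rank to know which observations the other ranks hold. A minimal sketch, assuming the summaries are exchanged with an mpi4py-style `allgather()` (with mpi4py this would be `MPI.COMM_WORLD.allgather`, which pickles Python objects and returns them ordered by rank). The function `gather_distribution`, the `SerialComm` stand-in and the dictionary keys are hypothetical and not taken from litebird_sim.

```python
def gather_distribution(comm, local_obs):
    """Return a list whose entry i holds the observation summaries of rank i.

    `comm` is any object with an mpi4py-style allgather() method; under
    mpirun this would be mpi4py's MPI.COMM_WORLD.
    """
    return comm.allgather(local_obs)


class SerialComm:
    """Hypothetical stand-in for a one-process communicator (no MPI needed)."""

    def allgather(self, obj):
        # With a single process, "everybody's data" is just our own data
        return [obj]


# Hypothetical summary of the observations held by this rank
local_obs = [
    {"start_time": 0.0, "duration_s": 21600.0, "det_names": ["0A"], "tod_shape": (1, 216000)},
]
distribution = gather_distribution(SerialComm(), local_obs)
for rank, observations in enumerate(distribution):
    print(f"rank {rank}: {len(observations)} observation(s)")
```

Because `allgather` delivers the same list to every rank, each process can then compute its own part of the Madam file numbering without further communication.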

The PR is large because the task is complex: Madam requires the data of each detector to be stored in distinct files, numbered with an increasing counter. Therefore, this PR implements an algorithm that walks over all the MPI processes and counts how many observations in each of them contribute to each detector.
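The per-detector counting can be sketched as follows. This is a minimal, hypothetical version that models the observations of each rank as plain lists of detector names; it is not the code of `save_simulation_for_madam`. The layout in the usage example mirrors the sample output in this PR (two observations per rank, detector 0A on the first two ranks and 0B on the last two).

```python
from __future__ import annotations

from collections import defaultdict


def count_observations_per_detector(layout: list[list[list[str]]]) -> list[dict[str, int]]:
    """For every rank, count how many of its observations contain each detector.

    `layout[rank]` is the list of observations held by that rank, and each
    observation is represented only by the names of its detectors.
    """
    counts = []
    for observations in layout:
        per_det: dict[str, int] = defaultdict(int)
        for det_names in observations:
            for det in det_names:
                per_det[det] += 1
        counts.append(dict(per_det))
    return counts


layout = [
    [["0A"], ["0A"]],  # rank #1
    [["0A"], ["0A"]],  # rank #2
    [["0B"], ["0B"]],  # rank #3
    [["0B"], ["0B"]],  # rank #4
]
print(count_observations_per_detector(layout))
# → [{'0A': 2}, {'0A': 2}, {'0B': 2}, {'0B': 2}]
```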

To make the code easier to read and litebird_sim easier to debug, I have added a new method to `Simulation`: `describe_mpi_distribution()`. It builds a «map» of all the observations in every MPI process. The map is stored in the new type `MpiDistributionDescr`, which can be printed to get a visual representation of how the TOD was split across observations and processes. Here is an example:

```text
# MPI rank #1

## Observation #0
- Start time: 0.0
- Duration: 21600.0 s
- 1 detector(s) (0A)
- TOD shape: 1×216000

## Observation #1
- Start time: 43200.0
- Duration: 21600.0 s
- 1 detector(s) (0A)
- TOD shape: 1×216000

# MPI rank #2

## Observation #0
- Start time: 21600.0
- Duration: 21600.0 s
- 1 detector(s) (0A)
- TOD shape: 1×216000

## Observation #1
- Start time: 64800.0
- Duration: 21600.0 s
- 1 detector(s) (0A)
- TOD shape: 1×216000

# MPI rank #3

## Observation #0
- Start time: 0.0
- Duration: 21600.0 s
- 1 detector(s) (0B)
- TOD shape: 1×216000

## Observation #1
- Start time: 43200.0
- Duration: 21600.0 s
- 1 detector(s) (0B)
- TOD shape: 1×216000

# MPI rank #4

## Observation #0
- Start time: 21600.0
- Duration: 21600.0 s
- 1 detector(s) (0B)
- TOD shape: 1×216000

## Observation #1
- Start time: 64800.0
- Duration: 21600.0 s
- 1 detector(s) (0B)
- TOD shape: 1×216000
```
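A descriptor like `MpiDistributionDescr` can be modeled as a small tree of dataclasses whose `__str__` renders the text format shown above. The class names (`DistributionDescr`, `ProcessDescr`, `ObservationDescr`) and their fields are hypothetical, chosen only to reproduce the printed output; they are not the litebird_sim classes. The rank label is printed exactly as stored, so it can follow the numbering used in the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ObservationDescr:
    start_time: float
    duration_s: float
    det_names: list[str]
    tod_shape: tuple[int, ...]


@dataclass
class ProcessDescr:
    rank: int  # label printed after "MPI rank #"
    observations: list[ObservationDescr] = field(default_factory=list)


@dataclass
class DistributionDescr:
    processes: list[ProcessDescr]

    def __str__(self) -> str:
        lines = []
        for proc in self.processes:
            lines += [f"# MPI rank #{proc.rank}", ""]
            for idx, obs in enumerate(proc.observations):
                shape = "×".join(str(n) for n in obs.tod_shape)
                lines += [
                    f"## Observation #{idx}",
                    f"- Start time: {obs.start_time}",
                    f"- Duration: {obs.duration_s} s",
                    f"- {len(obs.det_names)} detector(s) ({', '.join(obs.det_names)})",
                    f"- TOD shape: {shape}",
                    "",  # blank line between observations and ranks
                ]
        return "\n".join(lines).rstrip() + "\n"


descr = DistributionDescr([
    ProcessDescr(1, [ObservationDescr(0.0, 21600.0, ["0A"], (1, 216000))]),
])
print(descr)
```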

Things to do before merging this PR:

  • Implement `MpiDistributionDescr` and all ancillary classes
  • Implement `describe_mpi_distribution`
  • Modify `save_simulation_for_madam` so that it uses `describe_mpi_distribution` to walk over all the MPI processes
  • Document `describe_mpi_distribution` and `MpiDistributionDescr` in the manual
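Once the per-rank counts are known, each rank can derive the first value of the file counter for each of its detectors as an exclusive prefix sum over the lower ranks (the operation MPI exposes as `MPI_Exscan`). The sketch below assumes files are numbered per detector, first in rank order and then in local observation order, and uses a made-up `{det}_{idx:04d}.fits` template; neither the ordering nor the naming is taken from litebird_sim or Madam. If the counter must follow chronological order instead, the gathered observations would need sorting by start time first, since in the example ranks #1 and #2 interleave in time.

```python
from __future__ import annotations


def first_file_index(counts: list[dict[str, int]]) -> list[dict[str, int]]:
    """Exclusive prefix sum over ranks: per detector, the counter each rank starts from."""
    running: dict[str, int] = {}
    offsets = []
    for per_det in counts:
        # Files written by lower ranks come first
        offsets.append({det: running.get(det, 0) for det in per_det})
        for det, n in per_det.items():
            running[det] = running.get(det, 0) + n
    return offsets


def madam_file_names(counts: list[dict[str, int]], rank: int,
                     template: str = "{det}_{idx:04d}.fits") -> list[str]:
    """Hypothetical file names written by `rank`, one per (detector, observation)."""
    start = first_file_index(counts)[rank]
    return [
        template.format(det=det, idx=start[det] + i)
        for det, n in counts[rank].items()
        for i in range(n)
    ]


counts = [{"0A": 2}, {"0A": 2}, {"0B": 2}, {"0B": 2}]
print(madam_file_names(counts, 1))
# → ['0A_0002.fits', '0A_0003.fits']
```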

ziotom78 merged commit 597e23d into master on Nov 1, 2022
ziotom78 deleted the fix201 branch on November 1, 2022 at 05:19