Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement new calculate_population_statistics method. #453

Merged
merged 23 commits into from
Jun 28, 2024
Merged

Conversation

TinyMarsh
Copy link
Contributor

@TinyMarsh TinyMarsh commented Jun 19, 2024

This begins to tackle #420, but does not completely close that issue because I wanted to keep the PRs at at a reasonably-digestible size. More PRs to come if/when this gets approved. Happy to discuss this PR in person as it's a bit weird.

  • Move initialisation of factors to the header file as a data member and rename to factors_to_calculate_. Eventually this will be set elsewhere anyway.
  • Move factor_bins_ to the header file as a data member. This is because it needs to be accessed elsewhere in the class.
  • Introduces factor_bin_widths_ as a data member. This helps us calculate which bin a given factor value is in later on.
  • Overload the calculate_population_statistics function with the calculated_factors_ vector in the signature instead of the series object. We can remove the original function when we have completed implementation of this new way of calculating the analysis.

The new, overloaded calculate_population_statistics function is the main focus of this PR. At the moment all it is doing is calculating the correct index in the calculated_factors_ vector for a given entity. This is necessary because we are representing a matrix of arbitrary dimensions in a flat vector. The number of dimensions is equal to the size of factors_to_calculate_.

The calculation for fetching the correct index is based on the following idea for calculating the index in a 3D, row-major ordered matrix with dimensions of length L, M, and N.

matrix[ i ][ j ][ k ] = vector[ i*(N*M) + j*M + k ]

but extended to be able to deal with an arbitrary number of dimensions (I think I have done this right, really this is the bit of the PR that I'm most interested in being reviewed).

Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

src/HealthGPS/analysis_module.cpp Outdated Show resolved Hide resolved
src/HealthGPS/analysis_module.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

src/HealthGPS/analysis_module.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

src/HealthGPS/analysis_module.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@alexdewar alexdewar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small suggestions and a question, but otherwise looks good.

I tried reasoning through the code for calculating the indices but it made my head hurt after a while 😆. It didn't look obviously wrong but I'm not confident that I've got it right either! How about making a separate standalone function for caculating indices for an n-dimensional matrix and writing a unit test for it (using some hand-calculated values)?

src/HealthGPS/analysis_module.cpp Outdated Show resolved Hide resolved
src/HealthGPS/analysis_module.cpp Outdated Show resolved Hide resolved
src/HealthGPS/analysis_module.h Show resolved Hide resolved
src/HealthGPS/analysis_module.cpp Outdated Show resolved Hide resolved
src/HealthGPS/analysis_module.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@jamesturner246 jamesturner246 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Starting to take shape 👍. Some fixing up to do still.

src/HealthGPS/analysis_module.h Outdated Show resolved Hide resolved
src/HealthGPS/analysis_module.cpp Outdated Show resolved Hide resolved
src/HealthGPS/analysis_module.cpp Outdated Show resolved Hide resolved
src/HealthGPS/analysis_module.cpp Outdated Show resolved Hide resolved
src/HealthGPS/analysis_module.cpp Outdated Show resolved Hide resolved
src/HealthGPS/analysis_module.cpp Outdated Show resolved Hide resolved
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

src/HealthGPS/analysis_module.cpp Outdated Show resolved Hide resolved
src/HealthGPS/analysis_module.cpp Outdated Show resolved Hide resolved
src/HealthGPS/analysis_module.h Outdated Show resolved Hide resolved
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clang-tidy made some suggestions

src/HealthGPS/analysis_module.cpp Outdated Show resolved Hide resolved
src/HealthGPS/analysis_module.h Outdated Show resolved Hide resolved
@TinyMarsh
Copy link
Contributor Author

Thanks for the feedback @jamesturner246 @alexdewar . This should be good for another review now.

I don't really understand the wall of text output for why the build is failing on Windows. I think it doesn't like this:

size_t total_num_bins =
std::accumulate(factor_bins_.cbegin(), factor_bins_.cend(), 1, std::multiplies<>());

but I'm not really sure what it wants me to do. Any ideas?

@jamesturner246
Copy link
Contributor

Humm.. Try:

 size_t total_num_bins = 
     std::accumulate(factor_bins_.cbegin(), factor_bins_.cend(), 1u, std::multiplies<>()); 

Note 1u for unsigned int literal.

Copy link
Contributor

@jamesturner246 jamesturner246 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. 👍

Re. the Windows bug, looks like a pedantic casting warning, Try using 1u as mentioned.

Copy link
Contributor

@alexdewar alexdewar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (except for that weird MSVC failure).

@TinyMarsh TinyMarsh enabled auto-merge June 28, 2024 15:35
@TinyMarsh
Copy link
Contributor Author

Yep well done @jamesturner246 that was the problem, I ended up having to make the accumulator type in std::accumulate match the type of total_num_bins. MSVC seems happy now.

@TinyMarsh TinyMarsh merged commit 0a6531b into main Jun 28, 2024
5 checks passed
@alexdewar alexdewar deleted the extend_analysis branch July 1, 2024 06:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants