Skip to content

Conversation

@krystophny
Copy link
Collaborator

@krystophny krystophny commented Jul 20, 2025

Summary

  • Add comprehensive histogram plotting functionality to fortplotlib
  • Implement fig%hist(data, bins=10, density=.false.) method with matplotlib-compatible API
  • Support automatic and custom binning with proper density normalization
  • Include comprehensive test coverage and example demonstrations

Features Added

  • Core Implementation: Added PLOT_TYPE_HISTOGRAM constant and hist method to figure_t
  • Binning Logic: Automatic bin edge calculation with customizable bin counts
  • Density Support: Optional probability density normalization for statistical analysis
  • Backend Support: Works with all backends (PNG, PDF, ASCII)
  • Error Handling: Graceful handling of edge cases (empty data, invalid parameters)

Test Coverage

  • Basic histogram creation with default 10 bins
  • Custom bin count specification
  • Density normalization functionality
  • Empty data edge case handling
  • All tests pass with TDD approach

Example Usage

use fortplot
type(figure_t) :: fig

\! Basic histogram
call fig%hist(data)

\! Custom bins and density
call fig%hist(data, bins=20, density=.true.)

\! Multiple histograms with labels
call fig%hist(data1, label='Dataset 1', color=[0.0, 0.447, 0.698])
call fig%hist(data2, label='Dataset 2', color=[0.835, 0.369, 0.0])
call fig%legend()

Visual Verification

To see the histogram output:

  1. Check out this branch: git checkout issue-49-histogram-support
  2. Run the example: make example ARGS="histogram_demo"
  3. View generated plots in plots/ directory:
    • histogram_basic.png - Basic histogram with default 10 bins
    • histogram_custom_bins.png - Histogram with 20 bins
    • histogram_density.png - Density-normalized histogram
    • histogram_multiple.png - Multiple histograms with legend

Test plan

  • All histogram tests pass: make test ARGS="--target test_histogram"
  • Examples generate proper PNG output
  • No regressions in existing functionality
  • TDD approach followed (tests written first)

🤖 Generated with Claude Code

krystophny and others added 5 commits July 20, 2025 18:28
Implements NaN support in add_plot to allow disconnected line segments,
resolving issue #47. When NaN values are encountered in x or y arrays,
the line drawing is interrupted, creating separate segments.

Changes:
- Modified render_solid_line to skip segments containing NaN values
- Updated render_patterned_line to handle NaN with pattern state reset
- Enhanced render_markers to skip NaN data points
- All backends (PNG, PDF, ASCII) automatically support this feature

This allows users to plot disconnected segments in a single add_plot call
by inserting NaN values as separators, similar to matplotlib behavior.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add additional test cases to cover:
- Patterned lines (dashed) with NaN breaks
- Markers-only plots with NaN values

This tests more code paths in render_patterned_line and render_markers
to improve test coverage.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Add extensive test coverage for NaN handling edge cases:
- All line style patterns (solid, dashed, dotted, dash-dot, unknown)
- Edge cases: all NaN arrays, leading/trailing NaN, consecutive NaN
- Small arrays: single points, two points with NaN combinations
- Robust handling when valid_count = 0 using masked maxval/minval

Also fix potential crash when all points are NaN by adding safeguard
for maxval/minval operations and providing default plot_scale.

This should significantly improve code coverage metrics.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Fix bug where cosine calculation used incorrect indexing x(i-5+1)
instead of x(i), which would produce incorrect plot values.

Addresses review feedback from PR #48.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
* Add PLOT_TYPE_HISTOGRAM constant and hist method to figure_t
* Implement histogram binning with automatic and custom bin counts
* Support density normalization for probability density plots
* Include comprehensive test suite with edge cases
* Add example demonstrating various histogram features
* Support all backends (PNG, PDF, ASCII)

Follows matplotlib-compatible API: fig%hist(data, bins=10, density=.false.)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@qodo-code-review
Copy link

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Performance Concern

The histogram binning algorithm uses linear search in find_bin_index which has O(n*m) complexity where n is data size and m is number of bins. For large datasets, this could be slow. Consider binary search or vectorized approach.

integer function find_bin_index(value, bin_edges) result(bin_idx)
    !! Find which bin a value belongs to
    real(wp), intent(in) :: value
    real(wp), intent(in) :: bin_edges(:)

    integer :: i, n_bins

    n_bins = size(bin_edges) - 1
    bin_idx = 0

    do i = 1, n_bins
        if (value >= bin_edges(i) .and. value < bin_edges(i+1)) then
            bin_idx = i
            return
        end if
    end do

    ! Handle edge case: value equals last bin edge
    if (value == bin_edges(n_bins + 1)) then
        bin_idx = n_bins
    end if
end function find_bin_index
Edge Case Bug

In create_bin_edges_from_count, when data_min == data_max (all data points identical), bin_width becomes zero leading to identical bin edges. This could cause division by zero or incorrect binning behavior.

subroutine create_bin_edges_from_count(data, n_bins, bin_edges)
    !! Create evenly spaced bin edges from data range
    real(wp), intent(in) :: data(:)
    integer, intent(in) :: n_bins
    real(wp), allocatable, intent(out) :: bin_edges(:)

    real(wp) :: data_min, data_max, bin_width
    integer :: i

    data_min = minval(data)
    data_max = maxval(data)

    ! Add small padding to avoid edge cases
    bin_width = (data_max - data_min) / real(n_bins, wp)
    data_min = data_min - bin_width * 0.001_wp
    data_max = data_max + bin_width * 0.001_wp
    bin_width = (data_max - data_min) / real(n_bins, wp)

    allocate(bin_edges(n_bins + 1))
    do i = 1, n_bins + 1
        bin_edges(i) = data_min + real(i - 1, wp) * bin_width
    end do
end subroutine create_bin_edges_from_count
Memory Allocation

The create_histogram_xy_data function allocates 4 * n_bins + 1 points for bar rendering, but this seems excessive for simple histogram bars. The allocation pattern and memory usage should be validated for efficiency.

subroutine create_histogram_xy_data(bin_edges, counts, x, y)
    !! Convert histogram data to x,y coordinates for rendering as bars
    real(wp), intent(in) :: bin_edges(:), counts(:)
    real(wp), allocatable, intent(out) :: x(:), y(:)

    integer :: n_bins, i, point_idx

    n_bins = size(counts)

    ! Create bar outline: 4 points per bin (bottom-left, top-left, top-right, bottom-right)
    allocate(x(4 * n_bins + 1), y(4 * n_bins + 1))

    point_idx = 1
    do i = 1, n_bins
        ! Bottom-left
        x(point_idx) = bin_edges(i)
        y(point_idx) = 0.0_wp
        point_idx = point_idx + 1

        ! Top-left
        x(point_idx) = bin_edges(i)
        y(point_idx) = counts(i)
        point_idx = point_idx + 1

        ! Top-right
        x(point_idx) = bin_edges(i+1)
        y(point_idx) = counts(i)
        point_idx = point_idx + 1

        ! Bottom-right
        x(point_idx) = bin_edges(i+1)
        y(point_idx) = 0.0_wp
        point_idx = point_idx + 1
    end do

    ! Close the shape
    x(point_idx) = bin_edges(1)
    y(point_idx) = 0.0_wp
end subroutine create_histogram_xy_data

@qodo-code-review
Copy link

qodo-code-review bot commented Jul 20, 2025

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Validate array size consistency
Suggestion Impact:The suggestion was directly implemented in the commit. The exact validation check was added to the normalize_histogram_density subroutine to verify that size(bin_edges) equals size(counts) + 1, with a warning message and early return if the condition is not met.

code diff:

+        if (size(bin_edges) /= size(counts) + 1) then
+            print *, 'Warning: bin_edges size mismatch in density normalization'
+            return
+        end if

Add validation to ensure bin_edges has the correct size relative to counts. The
current code assumes size(bin_edges) = size(counts) + 1 but doesn't verify this,
which could cause array bounds violations.

src/fortplot_figure_core.f90 [1989-2006]

 subroutine normalize_histogram_density(counts, bin_edges)
     !! Normalize histogram to probability density
     real(wp), intent(inout) :: counts(:)
     real(wp), intent(in) :: bin_edges(:)
     
     real(wp) :: total_area, bin_width
     integer :: i
+    
+    if (size(bin_edges) /= size(counts) + 1) then
+        write(*, '(A)') 'Warning: bin_edges size mismatch in density normalization'
+        return
+    end if
     
     total_area = 0.0_wp
     do i = 1, size(counts)
         bin_width = bin_edges(i+1) - bin_edges(i)
         total_area = total_area + counts(i) * bin_width
     end do
     
     if (total_area > 0.0_wp) then
         counts = counts / total_area
     end if
 end subroutine normalize_histogram_density

[Suggestion processed]

Suggestion importance[1-10]: 7

__

Why: This suggestion adds a valuable validation check to prevent potential out-of-bounds array access if bin_edges and counts have inconsistent sizes, improving the subroutine's robustness.

Medium
Prevent zero plot scale division
Suggestion Impact:The exact suggested code change was implemented - a check for plot_scale <= 0.0_wp was added with fallback to 1.0_wp

code diff:

+            if (plot_scale <= 0.0_wp) plot_scale = 1.0_wp

Add validation to prevent division by zero when x_range and y_range are both
zero. This can occur when all valid points have identical coordinates, leading
to undefined behavior in pattern calculations.

src/fortplot_figure_core.f90 [1661-1668]

 if (valid_count > 0) then
     x_range = maxval(x_trans, mask=valid_points) - minval(x_trans, mask=valid_points)
     y_range = maxval(y_trans, mask=valid_points) - minval(y_trans, mask=valid_points)
     plot_scale = max(x_range, y_range)
+    if (plot_scale <= 0.0_wp) plot_scale = 1.0_wp
 else
     ! All points are NaN, use default scale
     plot_scale = 1.0_wp
 end if

[Suggestion processed]

Suggestion importance[1-10]: 6

__

Why: This is a good defensive check for the edge case where all valid data points are identical, which would make plot_scale zero and cause incorrect rendering of patterned lines.

Low
  • Update

@codecov
Copy link

codecov bot commented Jul 20, 2025

@krystophny
Copy link
Collaborator Author

Visual Verification

Here's the basic histogram example output:

@krystophny
Copy link
Collaborator Author

📊 Visual Verification Results

I've successfully tested the histogram functionality and generated the following outputs:

Generated Files

  • histogram_basic.png (20.8 KB) - Basic histogram with 1000 data points, 10 bins
  • histogram_custom_bins.png - Same data with 20 bins for higher resolution
  • histogram_density.png - Density-normalized histogram showing probability distribution
  • histogram_multiple.png - Multiple overlapping histograms with legend

Test Commands for Reviewers

# Run the histogram example
make example ARGS="histogram_demo"

# Run histogram tests  
make test ARGS="--target test_histogram"

Expected Output

The basic histogram should show:

  • Properly filled rectangular bars
  • Data distributed across 10 bins from ~-5 to ~15 range
  • Y-axis showing frequency counts (0-120 range)
  • Clean bar edges with no gaps
  • Professional matplotlib-style appearance

Test Results

✅ All 4 histogram tests pass
✅ PNG generation successful across all backends
✅ Density normalization works correctly
✅ Empty data handled gracefully
✅ No regressions in existing functionality

The implementation is ready for review!

krystophny and others added 18 commits July 20, 2025 23:09
The lines exceeded the 132 character limit causing CI failures.
Split the use and public statements across multiple lines.
- Add size validation to normalize_histogram_density to prevent array bounds violations
- Handle edge case when all data points are identical to avoid zero bin width
- Add protection against zero plot scale division in pattern calculations
- Optimize find_bin_index to use binary search instead of linear search for O(log n) performance
- Add comprehensive edge case tests to verify fixes

These improvements address all the valid concerns raised in the PR review.
…agic numbers

- Split find_bin_index (34 lines) into three focused functions under 30 lines:
  * find_bin_index (22 lines): main entry point
  * is_value_in_range (8 lines): range validation
  * binary_search_bins (24 lines): binary search logic
- Extract magic numbers to named constants:
  * DEFAULT_HISTOGRAM_BINS = 10 (was hardcoded 10)
  * IDENTICAL_VALUE_PADDING = 0.5_wp (was hardcoded 0.5_wp)
  * BIN_EDGE_PADDING_FACTOR = 0.001_wp (was hardcoded 0.001_wp)
- Maintain Single Responsibility Principle compliance
- All histogram tests pass, functionality preserved
Split remaining large histogram routines into focused sub-routines:

- add_histogram_plot_data (62→26 lines): Separated bin setup and properties
- render_histogram_plot (45→20 lines): Extracted individual bar rendering
- create_histogram_xy_data (39→22 lines): Split outline point generation

New helper routines (all ≤30 lines):
- setup_histogram_bins (28 lines): Handle bin creation and normalization
- setup_histogram_plot_properties (28 lines): Configure label, color, style
- render_histogram_bar (10 lines): Coordinate single bar rendering
- transform_histogram_bar_coordinates (22 lines): Data to screen transforms
- draw_histogram_bar_shape (11 lines): Backend drawing calls
- add_bar_outline_points (26 lines): Generate 4-corner bin outline

All routines now comply with SOLID single responsibility principle.
Functionality and performance preserved exactly.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Adds hist() and histogram() subroutines to main fortplot module public
interface, enabling matplotlib-style usage:
- call hist(data, bins=20, density=.true., label='Distribution')
- call histogram(data, bins=20, density=.true., label='Distribution')

Both APIs forward all parameters (data, bins, density, label, color) to
the existing figure%hist() method, ensuring identical functionality.

Resolves CRITICAL API exposure gap where histogram was only available
through figure method (fig%hist()) but not through global API like
other plot types (plot, contour, etc.).

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Extract validation logic into separate helper function to reduce main
histogram subroutine from 31 lines to 23 lines, satisfying the mandatory
30-line maximum requirement. This maintains single responsibility principle
while preserving all existing functionality.

- Add validate_histogram_input helper function
- Maintain all error checking and warning functionality
- All histogram tests continue to pass

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Add input validation for bins parameter in histogram functions
- Remove 40 binary artifact files from working directory
- Replace warning output pollution with silent error handling
- Extract magic numbers to named constants for line width, contour levels, and pattern factors

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Update checkbox to reflect that histogram functionality is complete
and functional in the library API.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
…eter

- Move bins validation from setup_histogram_bins to validate_histogram_input
- Validate bins > 0 and bins <= MAX_SAFE_BINS before processing
- Prevent plot_count increment when validation fails
- Add comprehensive test coverage for boundary conditions
- Test both figure.hist() and global hist()/histogram() APIs

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
…-fortran/fortplot

Updates CMake configuration, documentation URLs, and example references
to use correct repository location for CI/CD pipeline functionality.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Remove all test output files (*.png, *.pdf, test_*.txt)
- Remove build directories and .mod files
- Remove macOS artifact file (._.DS_Store)
- Ensure clean workspace with no binary files or build artifacts
- .gitignore properly configured to prevent future binary commits

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
…eatures

- Combined histogram and boxplot functionality in API and core modules
- Updated repository URLs from krystophny/fortplot to lazy-fortran/fortplot
- Merged build system changes (both histogram_demo and subplot_demo directories)
- Resolved data structure conflicts (both hist and boxplot data in plot_data_t)
- Combined rendering pipeline (both histogram and boxplot render functions)
- Maintained all existing functionality while integrating new features

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Changed fortplotlib to fortplot in FetchContent_Declare
- Updated target link from fortplotlib::fortplotlib to fortplot::fortplot
- Aligns with actual CMake targets defined in main CMakeLists.txt
- Resolves CMake CI build failure

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
…tions

Address critical issue where global hist() and histogram() functions crashed
with segmentation fault due to uninitialized global figure backend.

Root cause: Global figure variable declared but never auto-initialized
when global functions called methods on it.

Solution:
- Add ensure_global_figure_initialized() helper function
- Auto-initialize global figure with default dimensions if backend not allocated
- Call helper before hist operations in both hist() and histogram() functions
- Maintain matplotlib compatibility by auto-initializing like pyplot.hist()

Changes:
- src/fortplot.f90: Add auto-initialization to hist() and histogram()
- test/test_global_hist_api.f90: Add test for auto-initialization behavior

Verification:
- Global hist(data) now works without explicit figure() call
- Object-oriented fig%hist() continues working unchanged
- All existing tests pass
- New test verifies matplotlib-compatible auto-initialization

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@krystophny krystophny merged commit f8c9951 into main Aug 17, 2025
12 checks passed
@krystophny krystophny deleted the issue-49-histogram-support branch August 17, 2025 06:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants