plotGraph.py fails with ValueError in parser.py due to inhomogeneous array in pd.array(full_stats) #19

@Androidbeingx

Description

Running plotGraph.py on any collected metrics directory fails with a ValueError in treecript/parser.py at line 214. The error occurs because columns like processor_ids, core_ids, and cpu_ids in the metrics CSV files contain space-separated strings (e.g. "1 3 5"), making full_stats an inhomogeneous list. pd.array() without an explicit dtype tries to infer a uniform numeric type and fails.

The bug does not appear to be platform-specific, and it reproduces with both pandas versions tested (1.5.3 and 2.3.3).

Environment

  • OS: Ubuntu 22.04 (native Linux)
  • Python: 3.10
  • Installation: pip install -c constraints-3.10.txt git+https://github.com/inab/treecript.git@exec
  • pandas: as pinned by constraints-3.10.txt

Steps to reproduce

# Install treecript
pip install -c constraints-3.10.txt git+https://github.com/inab/treecript.git@exec

# Run plotGraph.py on any collected metrics directory
plotGraph.py path/to/metrics_dir/ path/to/output_dir/

Traceback

Traceback (most recent call last):
  File "/home/andreabsc/miniforge3/envs/Treecript/bin/plotGraph.py", line 24, in <module>
    main()
  File "/home/andreabsc/miniforge3/envs/Treecript/lib/python3.10/site-packages/treecript/plot_graph.py", line 279, in main
    plot_graphs(
  File "/home/andreabsc/miniforge3/envs/Treecript/lib/python3.10/site-packages/treecript/plot_graph.py", line 239, in plot_graphs
    pids, num_cpu_cores, sampling_period_seconds = metrics_parser(
  File "/home/andreabsc/miniforge3/envs/Treecript/lib/python3.10/site-packages/treecript/parser.py", line 214, in metrics_parser
    pids["full_stats"] = pd.array(full_stats)
  File "/home/andreabsc/miniforge3/envs/Treecript/lib/python3.10/site-packages/pandas/core/construction.py", line 384, in array
    return NumpyExtensionArray._from_sequence(data, dtype=dtype, copy=copy)
  File "/home/andreabsc/miniforge3/envs/Treecript/lib/python3.10/site-packages/pandas/core/arrays/numpy_.py", line 130, in _from_sequence
    result = np.asarray(scalars, dtype=dtype)  # type: ignore[arg-type]
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (11,) + inhomogeneous part.

Root cause

In treecript/parser.py line 214:

# current — fails
pids["full_stats"] = pd.array(full_stats)

full_stats contains rows where some fields (processor_ids, core_ids, cpu_ids) are space-separated strings while others are numeric, resulting in an inhomogeneous structure that pd.array() cannot coerce to a single dtype.

Fix

# fixed — explicitly allow mixed types
pids["full_stats"] = pd.array(full_stats, dtype=object)

This tells pandas not to infer a uniform type and simply store the elements as-is, which is the correct behaviour for a mixed-type column.
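The difference between the two calls can be checked outside treecript with a minimal sketch. The `full_stats` value below is a hypothetical stand-in, not the structure the parser actually builds: it only assumes nested rows of differing lengths that mix numbers with space-separated ID strings. Whether the first call raises also depends on the installed NumPy (releases from 1.24 onwards raise `ValueError` for ragged input; older ones only warn), so the sketch catches the error instead of assuming it.

```python
import pandas as pd

# Hypothetical sample data (an assumption, not taken from parser.py):
# rows of differing lengths mixing numbers and "1 3 5"-style ID strings.
full_stats = [
    [0.5, 1024, "1 3 5"],
    [0.7, 2048, "0 2", "0 1"],
]

# Without a dtype, pandas hands the nested list to np.asarray, which
# cannot build a regular array from rows of differing lengths.
try:
    pd.array(full_stats)
    print("inferred dtype: no error with this NumPy version")
except ValueError as exc:
    print(f"inferred dtype failed: {exc}")

# With dtype=object, each row is stored as-is as a single element.
fixed = pd.array(full_stats, dtype=object)
print(len(fixed))
```

With `dtype=object` the result is a one-dimensional array holding one Python list per row, so downstream code has to treat each element as a list and cannot rely on vectorised numeric operations.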
