Description
Running plotGraph.py on any collected metrics directory fails with a ValueError raised from treecript/parser.py, line 214. Columns such as processor_ids, core_ids, and cpu_ids in the metrics CSV files contain space-separated strings (e.g. "1 3 5"), which makes full_stats an inhomogeneous list. Called without an explicit dtype, pd.array() passes the list to np.asarray(), which cannot build a regular array from it and raises.
The failure does not appear to depend on the platform or the pandas version: it was reproduced with pandas 1.5.3 and 2.3.3.
Environment
- OS: Ubuntu 22.04 (native Linux)
- Python: 3.10
- Installation: pip install -c constraints-3.10.txt git+https://github.com/inab/treecript.git@exec
- pandas: as pinned by constraints-3.10.txt
Steps to reproduce
# Install treecript
pip install -c constraints-3.10.txt git+https://github.com/inab/treecript.git@exec
# Run plotGraph.py on any collected metrics directory
plotGraph.py path/to/metrics_dir/ path/to/output_dir/
Traceback
Traceback (most recent call last):
  File "/home/andreabsc/miniforge3/envs/Treecript/bin/plotGraph.py", line 24, in <module>
    main()
  File "/home/andreabsc/miniforge3/envs/Treecript/lib/python3.10/site-packages/treecript/plot_graph.py", line 279, in main
    plot_graphs(
  File "/home/andreabsc/miniforge3/envs/Treecript/lib/python3.10/site-packages/treecript/plot_graph.py", line 239, in plot_graphs
    pids, num_cpu_cores, sampling_period_seconds = metrics_parser(
  File "/home/andreabsc/miniforge3/envs/Treecript/lib/python3.10/site-packages/treecript/parser.py", line 214, in metrics_parser
    pids["full_stats"] = pd.array(full_stats)
  File "/home/andreabsc/miniforge3/envs/Treecript/lib/python3.10/site-packages/pandas/core/construction.py", line 384, in array
    return NumpyExtensionArray._from_sequence(data, dtype=dtype, copy=copy)
  File "/home/andreabsc/miniforge3/envs/Treecript/lib/python3.10/site-packages/pandas/core/arrays/numpy_.py", line 130, in _from_sequence
    result = np.asarray(scalars, dtype=dtype) # type: ignore[arg-type]
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (11,) + inhomogeneous part.
Root cause
In treecript/parser.py line 214:
# current — fails
pids["full_stats"] = pd.array(full_stats)
full_stats contains rows where some fields (processor_ids, core_ids, cpu_ids) are space-separated strings while others are numeric, resulting in an inhomogeneous structure that pd.array() cannot coerce to a single dtype.
Fix
# fixed — explicitly allow mixed types
pids["full_stats"] = pd.array(full_stats, dtype=object)
This tells pandas not to infer a uniform type and simply store the elements as-is, which is the correct behaviour for a mixed-type column.
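For anyone who wants to check the behaviour without a metrics directory, here is a minimal standalone sketch. It assumes that full_stats is a list of rows of differing lengths, which is what the "inhomogeneous shape" message suggests; the real structure built by metrics_parser may differ. The sample data and the to_object_array helper are hypothetical and not part of treecript.

```python
import pandas as pd


def to_object_array(full_stats):
    """Store each row as-is in a 1-D object array (the fix proposed above)."""
    return pd.array(full_stats, dtype=object)


# Hypothetical sample data, not taken from a real metrics file: rows of
# differing lengths, mixing numbers with space-separated ID strings.
sample_stats = [
    [101, 0.5, "1 3 5"],
    [102, 1.25],
    [103, 0.0, "0 2", "7"],
]

try:
    pd.array(sample_stats)  # dtype inferred, as in the failing call
    print("dtype inference succeeded on this pandas/NumPy combination")
except ValueError as exc:
    print(f"dtype inference failed: {exc}")

# With dtype=object every row is kept as a plain Python list.
stats = to_object_array(sample_stats)
print(len(stats))  # one entry per input row
```

Whether the first call raises may depend on the installed NumPy version, so the sketch catches the error instead of assuming it. The dtype=object call is the part that matters for the fix.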