Official repository for the (submitted) APMS 2026 paper: "FlowExtract: Procedural Knowledge Extraction from Maintenance Flowcharts".
Maintenance procedures in manufacturing facilities are often documented as flowcharts in static PDFs or scanned images. These documents encode procedural knowledge essential for asset lifecycle management but remain inaccessible to modern operator support systems. While Vision-Language Models (VLMs) struggle to reconstruct complex connection topologies from such diagrams, FlowExtract offers a robust, hybrid alternative.
FlowExtract is a pipeline that deliberately separates element detection from connectivity reconstruction:
- Node Detection: A single-stage object detector (YOLOv8s) localizes and classifies flowchart symbols.
- Text Extraction: Deep-learning OCR (EasyOCR) extracts node content.
- Edge Extraction: Classical line-tracing (Hough Transform) derives directed graphs from detected arrowheads.
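The connectivity-reconstruction step can be illustrated with a minimal sketch. The data structures and the nearest-box snapping heuristic below are illustrative assumptions, not the repository's actual API: given detected node bounding boxes and traced arrow segments (with the arrowhead end known), each endpoint is snapped to the nearest node to yield directed edges.

```python
from dataclasses import dataclass

@dataclass
class Node:
    node_id: str
    box: tuple  # bounding box as (x_min, y_min, x_max, y_max)

def center(box):
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)

def nearest_node(point, nodes):
    """Snap an arrow endpoint to the node with the closest box centre."""
    px, py = point
    return min(
        nodes,
        key=lambda n: (center(n.box)[0] - px) ** 2 + (center(n.box)[1] - py) ** 2,
    )

def build_edges(nodes, arrows):
    """arrows: list of (tail_point, head_point) pairs from line tracing,
    where head_point is the detected arrowhead location."""
    edges = []
    for tail, head in arrows:
        src = nearest_node(tail, nodes)
        dst = nearest_node(head, nodes)
        if src.node_id != dst.node_id:  # drop degenerate self-loops
            edges.append((src.node_id, dst.node_id))
    return edges

# Example: one arrow from a "start" box down to a "check" box
nodes = [Node("start", (0, 0, 10, 10)), Node("check", (0, 40, 10, 50))]
arrows = [((5, 10), (5, 40))]
print(build_edges(nodes, arrows))  # → [('start', 'check')]
```

The repository's `src/utils/` spatial heuristics are likely more involved (e.g. snapping to box borders rather than centres); this sketch only conveys the decoupling of detection from connectivity.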
By prioritizing high precision over forced recall, FlowExtract is explicitly designed for Human-in-the-Loop (HITL) workflows. The system provides a highly reliable structural skeleton of the standard operating procedure, allowing human validators to efficiently restore completeness without having to untangle hallucinated cross-links.
Evaluated on a dataset of real-world ISO 5807-standardized industrial troubleshooting guides, FlowExtract substantially outperforms state-of-the-art vision-language model baselines (such as Qwen2-VL-7B and Pixtral-12B) on graph extraction tasks.
- Node Detection (F1): 98.8% (vs. best VLM: 34.0%)
- Edge Detection (F1): 66.7% (vs. best VLM: 10.7%)
- Edge Precision: 85.5%
The pipeline successfully handles dense technical terminology, tightly spaced nodes, and overlapping edges, tracing multi-branching procedural paths accurately.
The original textual content within the nodes has been computationally redacted to anonymize proprietary procedural data, while preserving the structural morphology.
- Python 3.9+
- Tesseract (may be required by EasyOCR on some operating systems)
- macOS on Apple Silicon (M-series) or a CUDA-compatible GPU is recommended for YOLO inference.
1. Clone the repository:

   ```shell
   git clone https://github.com/guille-gil/FlowExtract.git
   cd FlowExtract
   ```

2. Create a virtual environment and install dependencies:

   ```shell
   python -m venv .venv
   source .venv/bin/activate  # On Windows: .venv\Scripts\activate
   pip install -r requirements.txt
   ```

3. Download the pre-trained model weights (if hosted externally) and place them in:

   ```
   runs/detect/train/weights/best.pt
   ```
```
FlowExtract/
├── docs/                  # Auxiliary documentation and paper figures
├── data/
│   ├── input/             # Raw legacy PDFs/images and YOLO annotations
│   ├── intermediate/      # Output of intermediate pipeline stages
│   └── output/            # Final JSON graphs and metric charts
├── scripts/
│   ├── train_yolo.py      # Script for fine-tuning YOLOv8s
│   └── generate_figure.py # Qualitative validation chart generation
├── src/
│   ├── pipeline/          # Modularized extraction pipeline (Stages 1-3)
│   ├── utils/             # Bounding-box spatial heuristics & visualization
│   ├── main.py            # Main operational script
│   └── evaluate.py        # End-to-end ground-truth metric evaluation
└── README.md
```
To extract a directed graph from a raw flowchart image, run the main entry point:
```shell
python src/main.py
```

This will parse the files in `data/input/images/test/` and output the structural JSON graphs to `data/intermediate/arrows/`.
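Downstream tooling can consume the extracted graphs directly. The snippet below is a generic sketch for loading such a graph into an adjacency list; the `nodes`/`edges`/`id`/`source`/`target` field names are assumptions, so check them against the JSON the pipeline actually emits.

```python
import json

def load_graph(path):
    """Load an extracted flowchart graph into an adjacency list.
    NOTE: the key names used here are illustrative, not guaranteed
    to match FlowExtract's actual output schema."""
    with open(path) as f:
        data = json.load(f)
    adjacency = {node["id"]: [] for node in data["nodes"]}
    for edge in data["edges"]:
        adjacency[edge["source"]].append(edge["target"])
    return adjacency
```

From there, standard graph libraries (e.g. networkx) can be used to traverse or query the procedure.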
To replicate the evaluation results found in the paper, execute the evaluation script. This will compare the extracted JSON graphs against the data/input/final_annotations ground truth:
```shell
python src/evaluate.py --charts
```

Evaluation metrics will be printed to stdout, and publication-ready charts (like the ones generated for APMS) will be saved to `data/output/charts/`.
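For reference, the edge-level F1 reported above follows the standard precision/recall definition over predicted versus ground-truth edge sets. This is a generic sketch of that computation, not the repository's `evaluate.py`:

```python
def edge_f1(predicted, ground_truth):
    """F1 over directed edge sets, with edges as (source, target) pairs."""
    predicted, ground_truth = set(predicted), set(ground_truth)
    tp = len(predicted & ground_truth)          # correctly recovered edges
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

A precision-biased extractor like FlowExtract keeps the `predicted` set small and clean, which is why edge precision (85.5%) exceeds edge F1 (66.7%).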
Note: The Vision-Language Model (VLM) baseline results reported in our paper were obtained on the same dataset in our prior work. If you reference those comparisons, please cite:
```bibtex
@article{gilavalle2026procedural,
  title={Procedural Knowledge Extraction from Industrial Troubleshooting Guides Using Vision Language Models},
  author={Gil de Avalle, Guillermo and Maruster, Laura and Emmanouilidis, Christos},
  journal={arXiv preprint arXiv:2601.22754},
  year={2026}
}
```

This project's source code is licensed under the MIT License, which permits commercial use and modification provided that original attribution is retained. See the LICENSE file for full details.

