StreetSign2POI is an end-to-end framework for automatically extracting and generating Point of Interest (POI) data from street view imagery. The framework combines computer vision and natural language processing in three core components: an enhanced YOLOv11s-DLKA model for storefront signboard detection, a multimodal feature-based ROI clustering algorithm that associates instances of the same signboard across views, and a POI generation pipeline that uses large language models for name extraction and 3D reconstruction for positioning. We also provide the first public benchmark dataset for this task, containing 3,536 street view images, 1,004 manually annotated signboards, and 1,078 verified OpenStreetMap POI records. Compared to existing methods, our approach achieves significant performance improvements in signboard detection and POI name extraction, offering an efficient and precise POI data generation solution for smart city applications.
The dataset and model file can be downloaded from here; the extraction code is hcxj.
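To make the cross-view association step described above more concrete, here is a minimal sketch of the general idea, not the StreetSign2POI implementation. It assumes each detected signboard ROI already has a feature vector and a text reading. The `Detection` class, the greedy cosine-similarity rule, the `0.9` threshold, and the majority-vote naming (a simple stand-in for the LLM-based name extraction) are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from math import sqrt
from typing import List


@dataclass
class Detection:
    """One signboard ROI from one street view image (hypothetical structure)."""
    image_id: str
    feature: List[float]  # placeholder for a multimodal embedding of the ROI
    text: str             # placeholder for the text read from the signboard


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_rois(detections: List[Detection], threshold: float = 0.9) -> List[List[Detection]]:
    """Greedy clustering: attach each ROI to the most similar existing cluster
    if its best member similarity reaches the (assumed) threshold, otherwise
    start a new cluster."""
    clusters: List[List[Detection]] = []
    for det in detections:
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = max(cosine(det.feature, member.feature) for member in cluster)
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([det])
        else:
            best.append(det)
    return clusters


def name_cluster(cluster: List[Detection]) -> str:
    """Pick the most frequent text reading in a cluster (stand-in for an LLM)."""
    return Counter(d.text for d in cluster).most_common(1)[0][0]


# Toy input: the same cafe sign seen in three images, plus one bookshop sign.
detections = [
    Detection("img1", [1.0, 0.0], "Cafe Luna"),
    Detection("img2", [0.98, 0.05], "Cafe Luna"),
    Detection("img2", [0.0, 1.0], "Book Nook"),
    Detection("img3", [0.99, 0.02], "Cafe Lun"),  # partially occluded reading
]
clusters = cluster_rois(detections)
names = [name_cluster(c) for c in clusters]
```

In this toy input the three cafe ROIs fall into one cluster and the majority vote recovers the full name despite the truncated reading in `img3`. A real system would replace the two-dimensional vectors with learned features and would add geometric constraints from the 3D reconstruction.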