Seesun (시선)

Seesun (시선) is an integrative AI service built as the final project of the Deep Learning based AI Engineering course at Multicampus. (Excellence Award, 2nd place ⭐ )

Result report (full content) : Result report

1. Overview

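Background

Advances in technology do not always work in our favor. As interest in and development of the Fourth Industrial Revolution accelerate every year, some groups are left behind or cut off from these technologies. For this reason, Seesun is an integrative AI service built to act as a new pair of eyes for people who are totally blind, have low vision, or are otherwise visually impaired.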

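Initial concept

In the initial planning stage, we aimed to build the service around two main functions, both usable through voice.

보여줘 ("show me") : object detection model using YOLO v3

읽어줘 ("read to me") : text recognition model using pytesseract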

Final implementation

The final service is implemented as a Flask-based web application, and commands such as 보여줘 ("show me") and 읽어줘 ("read to me") can be given by voice.
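
The keyword routing described here can be sketched as follows. This is a minimal illustration, not the project's actual code: the function name `route_command` and the mode labels `"show"` and `"read"` are hypothetical, and it assumes the speech-to-text step has already produced a transcript string.

```python
from typing import Optional

# Hypothetical mapping from spoken keyword to service mode (names are assumptions).
KEYWORDS = {
    "보여줘": "show",  # object detection
    "읽어줘": "read",  # text recognition
}


def route_command(transcript: str) -> Optional[str]:
    """Return the mode whose keyword appears in the transcript, or None."""
    # Remove whitespace so that "보여 줘" still matches "보여줘".
    normalized = "".join(transcript.split())
    for keyword, mode in KEYWORDS.items():
        if keyword in normalized:
            return mode
    return None


print(route_command("앞에 뭐가 있는지 보여줘"))
print(route_command("이거 읽어 줘"))
```

Matching on a substring rather than the whole utterance lets the user embed the keyword in a longer sentence; a transcript with neither keyword returns `None` so the caller can ask again.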

Demo video

2. Role

jw0831

Prior research review (OCR, STR, CRAFT)
Text detection modeling (pytesseract)
Text recognition modeling (pytesseract)
Text-image preprocessing (OpenCV, deskew)
Support modularization
Translator modeling (Seq2Seq, in progress)

ineed-coffee (author)

For details, see Details of my role

Image data collection (AI HUB, Roboflow.ai, Google open image dataset)
Define custom category & Image annotation work
Custom object detection modeling (YOLOv3, darknet)
Modularization & Maintenance
Speech-to-text module work (Kakao open API)
Support web application implementation (Flask)

heewonp

Prior research review (CRAFT, YOLOv5 from PyTorch)
Image data collection (AI HUB, Roboflow.ai, Google open image dataset)
Define custom category & Image annotation work
Custom object detection modeling (YOLOv3, darknet)
Video-stream module work (Flask)
Web application implementation (Flask)

cjlee0217

Prior research review (OCR, STR, EAST, CRAFT)
Text detection modeling (pytesseract)
Text recognition modeling (pytesseract)
Text-image preprocessing (OpenCV, deskew)
Translator modeling (Seq2Seq, in progress)

chloecmin

Text detection modeling (pytesseract)
Text recognition modeling (pytesseract)
Text-to-speech module work (Clova open API)
Video-stream module work (Flask)
Module QA

3. Skills & Process

Project skills

1. Language & Tool

Python 3.8
Visual Studio Code
PyCharm

2. Object detection model

Darknet framework 🔗 Link
Fine-tuning from YOLOv3 pretrained weights
opencv-dnn framework (4.4.0)
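
Detectors in the YOLO family typically emit several overlapping boxes per object, which are pruned with non-maximum suppression (NMS) before results are reported. The pure-Python sketch below shows the idea. It is an assumption that the project applies NMS in this form (OpenCV also ships `cv2.dnn.NMSBoxes`), and the function names, sample boxes, and threshold are illustrative.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], iou_threshold: float = 0.4) -> List[int]:
    """Return indices of the boxes that survive, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


# Two near-duplicate boxes and one separate box (made-up coordinates).
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 260, 260)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```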

3. Text recognition model

pytesseract (0.3.6)
deskew (0.10.3)
opencv-python (4.4.0)

Development process

2020.11.24 ~ 2020.12.23

WBS in details

4. Service Architecture

5. Main Function

【Show】 what's in front of you

Ask with the keyword "보여줘" ("show me"), and our custom YOLO model will tell you what is in front of you.

Recognizable object table

| | | | | |
|---|---|---|---|---|
| 1000 won | 10000 won | desk | chair | sunglass |
| bottle | umbrella | toy | chopstick | biker |
| car | motorcycle | cat | dog | person |
| truck | bus | traffic light (green) | traffic light (red) | traffic sign |

Example

Out : "There are 1 1000won , 1 cat , and 1 dog in front of you"
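
One way to turn a list of detected class labels into a sentence like the one above is to count the labels and join them. The sketch below is illustrative only: `summarize` is a hypothetical helper, not the project's function, and the wording of the empty case is an assumption.

```python
from collections import Counter
from typing import List


def summarize(labels: List[str]) -> str:
    """Build an English summary sentence from detected class labels."""
    if not labels:
        return "Nothing was detected in front of you"
    # Counter preserves first-seen order of the labels (Python 3.7+).
    parts = [f"{count} {name}" for name, count in Counter(labels).items()]
    if len(parts) == 1:
        listing = parts[0]
    elif len(parts) == 2:
        listing = " and ".join(parts)
    else:
        listing = ", ".join(parts[:-1]) + ", and " + parts[-1]
    verb = "is" if len(labels) == 1 else "are"
    return f"There {verb} {listing} in front of you"


print(summarize(["1000 won", "cat", "dog"]))
# There are 1 1000 won, 1 cat, and 1 dog in front of you
```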

【Read】 what's in front of you

Ask with the keyword "읽어줘" ("read to me"), and our pytesseract model will read aloud the text it recognizes in front of you.

Example

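Out : "2020년 하반기 4차산업혁명 선도인력 양성 훈련 입과를 환영합니다 multicampus" (roughly: "Welcome to the second-half 2020 Fourth Industrial Revolution leading talent training program, multicampus")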

6. Details of my role

Define custom dataset

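Selected 20 objects that users with visual impairments are likely to look for often.

For items that could not be collected from existing datasets, we took sample photos ourselves and annotated them with the open-source tool labelImg.

Items that could be collected were gathered in parts from AI Hub, Roboflow.ai, and the Google Open Images dataset.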
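
Darknet expects one text file per image with lines of the form `class_id x_center y_center width height`, all normalized to the image size. labelImg can write this format directly; when annotations arrive as pixel corner coordinates instead (as in Pascal VOC XML), a conversion like the sketch below is needed. The function name and the sample numbers are illustrative assumptions, not taken from the project.

```python
from typing import Tuple


def to_yolo_line(class_id: int,
                 box: Tuple[float, float, float, float],
                 image_size: Tuple[int, int]) -> str:
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a Darknet label line."""
    x_min, y_min, x_max, y_max = box
    img_w, img_h = image_size
    # Center and size, each divided by the image dimension to land in [0, 1].
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A made-up 200x200 px box in a 400x500 px image, class index 3.
print(to_yolo_line(3, (100, 50, 300, 250), (400, 500)))
# 3 0.500000 0.300000 0.500000 0.400000
```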

Collection results

Choosing proper framework

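Before fine-tuning, compared the available YOLO frameworks for working with the YOLO v3 model and selected one.

The final choice was AlexeyAB's Darknet framework, which was built in the local environment and used for transfer learning on the custom dataset.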
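
When fine-tuning YOLOv3 in AlexeyAB's Darknet on a custom class list, the cfg file has to be adjusted so that each `[yolo]` layer's `classes` matches the dataset and the `[convolutional]` layer immediately before it uses `filters = (classes + 5) * 3`. The snippet below computes that value for the 20 classes in this project. The helper name is illustrative, and the project's actual cfg is not reproduced here.

```python
def yolo_v3_filters(num_classes: int, anchors_per_scale: int = 3) -> int:
    """Filters for the conv layer feeding a [yolo] layer.

    Each anchor predicts 4 box coordinates, 1 objectness score,
    and one score per class.
    """
    return (num_classes + 5) * anchors_per_scale


print(yolo_v3_filters(20))  # 75
```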

Training result

Train environment

Windows 10

Visual studio 2017

CUDA 10.1

cuDNN 8.0.5

GeForce RTX 2080 SUPER (two)

Image size = 416 X 416

batch=64

subdivisions=32

iterations = 60200

learning rate = 0.0005 (using 2 GPU)
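
In Darknet, `batch` is the number of images per weight update and `subdivisions` splits that batch into smaller chunks that pass through the GPU one at a time. The arithmetic below applies this to the settings listed above. The learning-rate part is an assumption: 0.0005 would be consistent with the AlexeyAB README's multi-GPU advice of dividing the base rate by the number of GPUs, if the base rate was the 0.001 found in the stock yolov3.cfg.

```python
batch = 64         # images per weight update (from the settings above)
subdivisions = 32  # number of chunks the batch is split into
gpus = 2

# Images held on the GPU at once during a forward/backward pass.
mini_batch = batch // subdivisions

# Assumption: base rate of 0.001 (stock yolov3.cfg), divided by GPU count.
assumed_base_lr = 0.001
scaled_lr = assumed_base_lr / gpus

print(mini_batch)  # 2
print(scaled_lr)
```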

Training reached an average loss of about 0.8 and a final mAP of 45%.

custom weights file download

Modularization

The two recognition models and the speech input/output module were written as separate modules and published on PyPI : SeeSun

`Package architecture`

__init__.py
model_config
- model.cfg (model architecture file)
- model.weights (model weights file)
Detect.py
- seesunObjectDetector (class , object)
  - detect (method) : return type = string
- seesunTextDetector (class , object)
  - recognize (method) : return type = string
Speech.py
- seesunSpeech (class , object)
  - tts (method) : return type = None (audio played immediately)
  - stt (method) : return type = string

`Usage`

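for object detection

from SeeSun.Detect import seesunObjectDetector
import cv2

# Create the object detector
detector = seesunObjectDetector()

# cv2.imread returns the image as a BGR array (or None if the path is invalid)
my_img = cv2.imread('path/to/image/file')
# detect() returns a summary string
detector.detect(my_img)

OUT : '현재 앞에는 XX는 3개 , OO은 6대 있습니다.' (roughly: "In front of you there are 3 XX and 6 OO.")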

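for text recognition

from SeeSun.Detect import seesunTextDetector
import cv2

# Create the text detector
detector = seesunTextDetector()

# cv2.imread returns the image as a BGR array (or None if the path is invalid)
my_img = cv2.imread('path/to/image/file')
# recognize() returns the recognized text as a string
detector.recognize(my_img)

OUT : '코로나 3차 대유행으로 인한 지하철 운행시간 조정 안내 ... ' (roughly: "Notice of adjusted subway operating hours due to the third wave of COVID-19 ... ")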

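for speech recognition and synthesis

from SeeSun.Speech import seesunSpeech

speech = seesunSpeech()

# tts() plays the audio immediately and returns None
speech.tts('input_string')
# stt() returns the recognized text as a string
speech.stt('path/to/audio_file')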

OUT : None
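
Putting the three classes together, one request could flow from speech-to-text, through the matching detector, to text-to-speech. The sketch below uses stand-in classes so it runs without the SeeSun package, API keys, or model weights. Only the method names `stt`, `tts`, `detect`, and `recognize` come from the package layout above; `handle_request` and the stub classes are hypothetical.

```python
class StubSpeech:
    """Stand-in for seesunSpeech; records what would be spoken."""

    def __init__(self, transcript):
        self.transcript = transcript
        self.spoken = []

    def stt(self, audio_path):
        return self.transcript

    def tts(self, text):
        self.spoken.append(text)


class StubObjectDetector:
    """Stand-in for seesunObjectDetector."""

    def detect(self, image):
        return "object summary"


class StubTextDetector:
    """Stand-in for seesunTextDetector."""

    def recognize(self, image):
        return "recognized text"


def handle_request(audio_path, image, speech, object_detector, text_detector):
    """Route one voice command to a detector and speak the result."""
    command = speech.stt(audio_path).replace(" ", "")
    if "보여줘" in command:
        result = object_detector.detect(image)
    elif "읽어줘" in command:
        result = text_detector.recognize(image)
    else:
        result = None  # unknown command: nothing to speak
    if result is not None:
        speech.tts(result)
    return result


speech = StubSpeech("읽어줘")
print(handle_request("command.wav", None, speech,
                     StubObjectDetector(), StubTextDetector()))
```

Passing the speech and detector objects in as arguments keeps the routing logic testable; in the real service the stubs would be replaced by instances of the SeeSun classes.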
