maxbak753/IMGS621-Computer-Vision-Project

Automatic Synthetic Object Focusing with Shallow Depth of Field Lens

IMGS-621 Computer Vision
Max Bakalos

PDF Version of ReadMe

There are two versions of the program:

  • Artificial_DOF_Webcam.ipynb   ← livestreams from a webcam
  • Artificial_DOF_VideoRead.ipynb  ← reads a pre-recorded video file

Artificial Shallow Depth of Field

Overview

I originally proposed a method that would use an existing object detection program (YOLO) to track a moving object with a wide depth of field (DOF) camera, estimate the depth of the object, and then use camera-control software to adjust the focus distance of a second, closely spaced shallow DOF camera so that the object comes into focus. However, I ran out of time to complete the camera-control part of the project, which is mostly outside the scope of the computer vision class anyway. As an adjustment to the original idea, I instead implemented an artificial shallow depth of field effect, similar to “portrait mode” on modern smartphones.

Methods

Three separate problems are chained together to blur everything except what lies at a chosen distance from the camera: object detection, depth estimation, and selective artificial blurring. In the unused second-camera approach, calculating the angle of the focus ring on the camera lens would replace the selective artificial blurring step.

Implementation

Fig 1. Diagram of synthetic shallow depth of field implementation

I implemented my code in a Python Jupyter Notebook and used OpenCV to stream video frames from a webcam. To detect objects in the frames, I used the pretrained YOLOv26 model (PyTorch). The target object was a tennis ball, which the model classifies as a “sports ball”. The model was quite effective at finding the tennis ball; however, it struggled when the ball took up a larger share of the frame (that is, when it was closer to the camera).

To get a depth image from the monocular setup, I used the DepthAnythingV2 library. Once a depth map was acquired, the region of interest (ROI) from the object detector was cut out of the corresponding location in the depth image. Since the tennis ball is round, the corners of the square ROI bounding box held inaccurate depth information, so only a central fractional sub-ROI was taken. This sub-region’s pixel depth values were then averaged to find the approximate depth of the front center of the ball.

The final step is artificially blurring the frame. A depth error is calculated between the object and every other pixel in the frame, and this error is thresholded to segment the frame into four depth regions. The four resulting binary masks are then blurred to smooth the transitions between regions. The 1st mask has no blur, the 2nd has a bit of blur, the 3rd has medium blur, and the 4th has heavy blur. Each blurred copy of the frame is multiplied by its mask, and the results are combined into the final output image.
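The sub-ROI averaging and the four-level masked blur described above can be sketched as follows. This is a minimal illustration, not the notebook's code: the function names, the `frac`, `thresholds`, `kernels`, and `feather` values are all assumptions, the depth map is assumed to be normalized to roughly [0, 1], and a NumPy box blur stands in for whatever OpenCV blur the notebook uses so that the sketch runs without a camera or model.

```python
import numpy as np

def central_roi_depth(depth, box, frac=0.5):
    """Mean depth over the central `frac` of a box given as (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    # Shrink the box around its center so the corners (background) are excluded.
    dx = int((x2 - x1) * (1 - frac) / 2)
    dy = int((y2 - y1) * (1 - frac) / 2)
    sub = depth[y1 + dy : y2 - dy, x1 + dx : x2 - dx]
    return float(sub.mean())

def box_blur(img, k):
    """Separable box blur with odd kernel size k (stand-in for an OpenCV blur)."""
    out = np.asarray(img, dtype=np.float64)
    if k <= 1:
        return out
    pad = k // 2
    for axis in (0, 1):
        n = out.shape[axis]
        pad_width = [(0, 0)] * out.ndim
        pad_width[axis] = (pad, pad)
        padded = np.pad(out, pad_width, mode="edge")
        # Average k shifted copies of the padded image along this axis.
        out = sum(np.take(padded, np.arange(i, i + n), axis=axis) for i in range(k)) / k
    return out

def layered_blur(frame, depth, target_depth,
                 thresholds=(0.1, 0.25, 0.5),   # assumed depth-error cutoffs
                 kernels=(1, 5, 11, 21),        # none, light, medium, heavy blur
                 feather=5):                    # assumed mask-smoothing kernel
    """Blur an HxWx3 frame more strongly the further a pixel is from target_depth."""
    err = np.abs(depth - target_depth)
    # Three thresholds split the error into four levels, 0 (sharp) to 3 (heavy blur).
    level = np.digitize(err, thresholds)
    out = np.zeros(frame.shape, dtype=np.float64)
    for i, k in enumerate(kernels):
        # Blurring each binary mask smooths the transition between regions.
        # The masks partition the frame, so the blurred masks still sum to 1.
        mask = box_blur((level == i).astype(np.float64), feather)
        out += box_blur(frame, k) * mask[..., None]
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)
```

In use, `box` would come from the detector and `depth` from the depth model for the same frame, for example `layered_blur(frame, depth, central_roi_depth(depth, box))`. Adding more entries to `thresholds` and `kernels` gives more depth segments.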

Conclusion

Fig 2. (top left) image with bounding box, (top right) depth image, (bottom left) depth masks visualization, (bottom right) final artificially blurred image

Overall, the project worked well, but it could be improved in many ways. First, the depth error could be thresholded into more segments to make the blur transitions smoother, and the blur itself could follow a more physics-based approach. I would also like to implement the original idea with a second, real camera and turn its focus ring to keep the tracked object in focus. I plan to continue this project in the future.
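As one possible starting point for a more physics-based blur, the thin-lens model gives the diameter of the blur circle on the sensor for a point at a given distance. The sketch below is an assumption about what such an approach could look like, not something implemented in this project. The function name and the default lens values are hypothetical, and it requires metric depth, which a relative monocular depth map does not provide directly.

```python
def blur_diameter_mm(subject_mm, focus_mm, focal_mm=50.0, f_number=1.8):
    """Thin-lens circle-of-confusion diameter on the sensor, in millimetres.

    subject_mm: distance from the lens to the point being imaged
    focus_mm:   distance the lens is focused at (must exceed focal_mm)
    focal_mm, f_number: hypothetical lens parameters
    """
    aperture_mm = focal_mm / f_number                # entrance pupil diameter
    magnification = focal_mm / (focus_mm - focal_mm)
    return aperture_mm * magnification * abs(subject_mm - focus_mm) / subject_mm
```

A per-pixel blur radius could then be derived from this diameter and the sensor's pixel pitch, replacing the fixed per-level blur strengths.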

Part II

Core

Use an existing object detection program (YOLO) to track a moving object with a wide DOF camera, estimate the depth of the object, then use camera-control software to adjust the focus distance of a second, closely spaced shallow DOF camera to bring the object into focus.

Deliverable

A comparison of human focusing and machine focusing using a sharpness index (confusion matrix), plus a video of the camera automatically focusing on a specific object.
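The deliverable does not say which sharpness index would be used. One common choice is the variance of the image Laplacian, sketched here in NumPy as an assumption; the function name is hypothetical, and the input is assumed to be a grayscale crop of the tracked object.

```python
import numpy as np

def sharpness_index(gray):
    """Variance of a 4-neighbour discrete Laplacian; higher means sharper."""
    g = np.asarray(gray, dtype=np.float64)
    # Laplacian on interior pixels: sum of the 4 neighbours minus 4x the center.
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())
```

Computing this index on the same object crop for a human-focused frame and a machine-focused frame would give one number per frame to compare.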

References

[1] Chou, Jean-Peic. “Synthetic Depth-Aware Defocus and Controllable Bokeh for Stylized Photography.” CS231n: Deep Learning for Computer Vision, Stanford University, https://cs231n.stanford.edu/reports/2022/pdfs/90.pdf.
[2] “Object Detection.” Ultralytics YOLO Docs, Ultralytics, 20 Apr. 2026, https://docs.ultralytics.com/tasks/detect/.
[3] Yang, Lihe, et al. “Depth Anything.” CVPR, 2024, https://depth-anything.github.io/.
