Max Bakalos
There are two versions of the program:
- Artificial_DOF_Webcam.ipynb ← livestreams from a webcam
- Artificial_DOF_VideoRead.ipynb ← reads a pre-recorded video file
I implemented the code in a Python Jupyter Notebook and used OpenCV to stream video frames from a webcam. To detect objects in each frame, the pretrained YOLOv26 model (PyTorch) was used. The target object was a tennis ball, which the model classifies as a “sports ball”. The detector was quite effective at finding the tennis ball; however, it struggled when the ball occupied a large fraction of the frame (i.e., when it was close to the camera).

To get a depth image from the monocular setup, the DepthAnythingV2 library was used. Once a depth map was acquired, the region of interest (ROI) from the object detector was cut out of the corresponding location in the depth image. Because the tennis ball is round, the corners of its rectangular bounding box contain background pixels and therefore inaccurate depth values, so only a central sub-region covering a fraction of the ROI was kept. The depth values in this sub-region were then averaged to approximate the depth of the front center of the ball.

The final step was to artificially blur the frame, which requires segmenting it into four depth regions. For every pixel, the depth error between that pixel and the object is calculated, and these errors are split into four groups using thresholds. The resulting four binary masks are then blurred to smooth the transitions between regions. The first region receives no blur, the second a slight blur, the third a medium blur, and the fourth a heavy blur. Each correspondingly blurred copy of the frame is multiplied by its mask, and the products are summed to form the final output image.
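The detection and ROI-depth steps above can be sketched in NumPy. This is a minimal sketch, not the notebook's actual code: the function names are hypothetical, the sub-ROI fraction `frac=0.5` is an assumed value (the fraction actually used is not stated), and the depth map is assumed to be resized to the frame's resolution. With COCO-pretrained Ultralytics models, the inputs could come from `results[0].boxes.xyxy`, `.cls`, and `.conf` (converted with `.cpu().numpy()`); class index 32 is “sports ball” in the COCO label set.

```python
import numpy as np

SPORTS_BALL = 32  # COCO class index for "sports ball"


def select_ball_box(xyxy, cls, conf, target_class=SPORTS_BALL):
    """Return the highest-confidence box of target_class as (x1, y1, x2, y2), or None.

    Hypothetical helper: xyxy is an (N, 4) array, cls and conf are length-N arrays.
    """
    xyxy = np.asarray(xyxy, dtype=float)
    cls = np.asarray(cls)
    conf = np.asarray(conf, dtype=float)
    keep = cls == target_class
    if not np.any(keep):
        return None
    # Mask out other classes so argmax only considers target detections.
    best = int(np.argmax(np.where(keep, conf, -np.inf)))
    return tuple(xyxy[best])


def central_roi_depth(depth, box, frac=0.5):
    """Mean depth over a centered sub-box spanning `frac` of the box width and height.

    Shrinking the box discards the corners, which hold background pixels for a round object.
    frac=0.5 is an assumed default, not a value from the project.
    """
    h, w = depth.shape[:2]
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w = (x2 - x1) * frac / 2
    half_h = (y2 - y1) * frac / 2
    # Clip to the image and guarantee at least one pixel in each direction.
    c0 = int(np.clip(np.floor(cx - half_w), 0, w - 1))
    c1 = int(np.clip(np.ceil(cx + half_w), c0 + 1, w))
    r0 = int(np.clip(np.floor(cy - half_h), 0, h - 1))
    r1 = int(np.clip(np.ceil(cy + half_h), r0 + 1, h))
    return float(depth[r0:r1, c0:c1].mean())
```

Averaging only the central sub-box keeps background depth out of the estimate; with `frac=1.0` the whole box is averaged and the corners pull the value toward the background.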
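The four-region blur described above can be sketched the same way. Everything here is an assumption for illustration: the depth map is taken to be normalized to [0, 1] so that the thresholds (0.05, 0.15, 0.3) are meaningful, the kernel sizes (1, 5, 11, 21) stand in for “no, slight, medium, heavy” blur, and a simple NumPy box blur replaces an OpenCV filter so the sketch has no extra dependencies. The blurred masks are divided by their sum so the per-pixel weights add up to one.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def box_blur(img, k):
    """Separable k x k mean filter over the first two axes with edge padding (k odd)."""
    out = np.asarray(img, dtype=float)
    if k <= 1:
        return out.copy()
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    pad = k // 2
    for axis in (0, 1):
        widths = [(0, 0)] * out.ndim
        widths[axis] = (pad, pad)
        padded = np.pad(out, widths, mode="edge")
        # Each output pixel is the mean of the k samples centered on it along `axis`.
        out = sliding_window_view(padded, k, axis=axis).mean(axis=-1)
    return out


def depth_masks(depth, obj_depth, thresholds):
    """Split |depth - obj_depth| into len(thresholds) + 1 binary masks (increasing thresholds).

    Mask 0 holds pixels closest to the object's depth; the last mask holds the farthest.
    """
    err = np.abs(np.asarray(depth, dtype=float) - obj_depth)
    labels = np.digitize(err, thresholds)  # values in 0..len(thresholds)
    return [(labels == i).astype(float) for i in range(len(thresholds) + 1)]


def artificial_dof(frame, depth, obj_depth, thresholds=(0.05, 0.15, 0.3),
                   kernels=(1, 5, 11, 21), mask_kernel=9):
    """Blend progressively blurred copies of `frame` using softened depth masks."""
    masks = depth_masks(depth, obj_depth, thresholds)
    if len(masks) != len(kernels):
        raise ValueError("need one kernel size per depth region")
    frame = np.asarray(frame, dtype=float)
    # Soften mask edges so the transition between blur levels is gradual.
    soft = [box_blur(m, mask_kernel) for m in masks]
    total = np.sum(soft, axis=0)  # ~1 everywhere; dividing guards against rounding drift
    out = np.zeros_like(frame)
    for m, k in zip(soft, kernels):
        weight = m / total
        if frame.ndim == 3:
            weight = weight[..., None]  # broadcast over color channels
        out += weight * box_blur(frame, k)
    return out
```

In practice, `cv2.GaussianBlur(img, (k, k), 0)` or `cv2.blur(img, (k, k))` would be much faster than this box blur, and the float result can be converted back for display with `np.clip(out, 0, 255).astype(np.uint8)`. Passing more thresholds and kernel sizes gives a finer segmentation with smoother transitions.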
Overall, the project worked well, but it could be improved in several ways. First, the depth error could be thresholded into more segments to make the blur transitions smoother, and a more physics-based blurring approach could be used. Second, I would like to implement this with a second, real camera, driving its focus ring so that the tracked object stays in focus as its depth changes. I plan to continue this project in the future.
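One way to make the blur more physics-based would be to derive each pixel's blur size from the thin-lens circle of confusion, c = A · |S2 − S1| / S2 · f / (S1 − f), where A = f/N is the aperture diameter, f the focal length, N the f-number, S1 the focus distance, and S2 the object distance. The sketch below assumes metric distances (a relative monocular depth map would first need calibration), and the camera values in the example (50 mm lens, f/2, 5 µm pixels) are placeholders rather than the project's hardware; the function names are hypothetical.

```python
import numpy as np


def coc_diameter(s2, s1, f, n):
    """Thin-lens circle-of-confusion diameter, in the same length unit as f.

    s2: object distance(s), s1: focus distance, f: focal length, n: f-number.
    """
    s2 = np.asarray(s2, dtype=float)
    aperture = f / n
    return aperture * np.abs(s2 - s1) / s2 * f / (s1 - f)


def coc_pixels(s2, s1, f, n, pixel_pitch):
    """Circle-of-confusion diameter converted to sensor pixels."""
    return coc_diameter(s2, s1, f, n) / pixel_pitch


def coc_kernel_size(coc_px):
    """Round a blur diameter in pixels up to an odd kernel size (1 means no blur)."""
    k = int(np.ceil(coc_px))
    return k if k % 2 == 1 else k + 1


# Example with placeholder values in millimeters: focused at 2 m, background at 4 m.
blur_px = coc_pixels(4000.0, 2000.0, f=50.0, n=2.0, pixel_pitch=0.005)
kernel = coc_kernel_size(blur_px)
```

Here the blur grows smoothly with distance from the focal plane, so the hand-picked kernel sizes per region could instead be computed from each region's depth.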
Use an existing object detection program (YOLO) to track a moving object with a wide-DOF camera, estimate the depth of the object, and then use camera control software to adjust the focus of a second, closely spaced shallow-DOF camera to bring the object into focus. Planned deliverables: a comparison of human and machine focusing using a sharpness index (confusion matrix), and a video of the camera automatically focusing on a specific object.
[1] Chou, Jean-Peic. “Synthetic Depth-Aware Defocus and Controllable Bokeh for Stylized Photography.” CS231n: Deep Learning for Computer Vision, Stanford University, https://cs231n.stanford.edu/reports/2022/pdfs/90.pdf.
[2] “Object Detection.” Ultralytics YOLO Docs, Ultralytics, 20 Apr. 2026, https://docs.ultralytics.com/tasks/detect/.
[3] Yang, Lihe, et al. “Depth Anything.” CVPR, 2024, https://depth-anything.github.io/.
