In [1]:
"""Global-Local Path Networks for Monocular Depth Estimation with Vertical CutDepth

Credit/author: Mattia Gatti
https://towardsdatascience.com/generate-a-3d-mesh-from-an-image-with-python-12210c73e5cc
pip install transformers"""

import matplotlib
matplotlib.use('TkAgg')
from matplotlib import pyplot as plt
from PIL import Image
import torch
from transformers import GLPNImageProcessor, GLPNForDepthEstimation

In [2]:
feature_extractor = GLPNImageProcessor.from_pretrained("vinvino02/glpn-nyu")
model = GLPNForDepthEstimation.from_pretrained("vinvino02/glpn-nyu")

In [3]:
"""Why The Sigmoid Function Is Important In Neural Networks?

The Sigmoid As A Squashing Function
The sigmoid function is also called a squashing function because its domain is the set of all real numbers
and its range is (0, 1). Even if the input is a very large negative number
or a very large positive number, the output always lies between 0 and 1.

If a neural network uses only linear activation functions, the model can only learn linearly separable problems.
However, with the addition of just one hidden layer and a sigmoid activation function in that layer, the neural
network can learn a problem that is not linearly separable. Using a non-linear function produces non-linear boundaries,
and hence the sigmoid function can be used in neural networks for learning complex decision functions.

Activation functions are usually chosen to be monotonically increasing, which is why periodic functions such as
sin(x) or cos(x) are rarely used; this is a convention rather than a strict requirement.
An activation function should also be defined and continuous everywhere in the space of real numbers,
and differentiable (at least almost everywhere) so that gradient-based training can be applied."""

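The squashing behaviour described in the note above can be checked with a few lines of standard-library Python. This is only a minimal sketch: the function name `sigmoid` and the sample inputs are illustrative choices, not part of the original notebook. The branch on the sign of `x` is there so that `math.exp` never receives a large positive argument.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid 1 / (1 + exp(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, use the equivalent form exp(x) / (1 + exp(x)).
    z = math.exp(x)
    return z / (1.0 + z)


# Very large negative and positive inputs are squashed into [0, 1].
for x in (-1000.0, -2.0, 0.0, 2.0, 1000.0):
    print(f"sigmoid({x}) = {sigmoid(x):.6f}")
```

Mathematically the range is the open interval (0, 1); in floating point the extreme inputs above round to exactly 0.0 and 1.0.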

In [4]:
"""https://huggingface.co/docs/transformers/model_doc/glpn

The monocular depth estimation model chosen for this guide is GLPN⁴. 
It is available on the Hugging Face Model Hub. Models can be retrieved from
this hub by using the Hugging Face library Transformers."""

# load and resize the input image
image = Image.open("C:/Users/seeho/assets/raw/m2.jpg")
new_height = 480 if image.height > 480 else image.height
new_height -= (new_height % 32)
new_width = int(new_height * image.width / image.height)
diff = new_width % 32
new_width = new_width - diff if diff < 16 else new_width + 32 - diff
new_size = (new_width, new_height)
image = image.resize(new_size)

# prepare image for the model
inputs = feature_extractor(images=image, return_tensors="pt")

# get the prediction from the model
with torch.no_grad():
    outputs = model(**inputs)
    predicted_depth = outputs.predicted_depth

# remove borders
pad = 16
output = predicted_depth.squeeze().cpu().numpy() * 1000.0
output = output[pad:-pad, pad:-pad]
image = image.crop((pad, pad, image.width - pad, image.height - pad))

# visualize the prediction
fig, ax = plt.subplots(1, 2)
ax[0].imshow(image)
ax[0].tick_params(left=False, bottom=False, labelleft=False, labelbottom=False)
ax[1].imshow(output, cmap='viridis')
ax[1].tick_params(left=False, bottom=False, labelleft=False, labelbottom=False)
plt.tight_layout()
plt.pause(5)
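The resize arithmetic at the top of this cell can be isolated into a small helper to see which sizes it produces. The sketch below mirrors the cell's logic: cap the height at 480, round it down to a multiple of 32, scale the width by the aspect ratio, then round the width to the nearest multiple of 32. The function name `glpn_input_size` and its parameters are hypothetical and introduced only for illustration.

```python
def glpn_input_size(width: int, height: int,
                    max_height: int = 480, multiple: int = 32) -> tuple:
    """Return (new_width, new_height) following the resize rule used in the cell above."""
    # Cap the height, then round it down to a multiple of `multiple`.
    new_height = min(height, max_height)
    new_height -= new_height % multiple
    # Keep the aspect ratio, then round the width to the nearest multiple.
    new_width = int(new_height * width / height)
    diff = new_width % multiple
    if diff < multiple // 2:
        new_width -= diff
    else:
        new_width += multiple - diff
    return new_width, new_height


print(glpn_input_size(1920, 1080))  # (864, 480)
print(glpn_input_size(640, 480))    # (640, 480)
print(glpn_input_size(300, 200))    # (288, 192)
```

Note that an image shorter than 32 pixels would yield a height of 0 under this rule, so such inputs need separate handling.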

In [5]:
""" -Alpha shape¹, 
    -Ball pivoting², 
    -and Poisson surface reconstruction³

These methods are known as surface reconstruction algorithms."""


In [6]:
"""Point cloud.
   The following code converts the estimated depth map to an Open3D point cloud object.
   http://www.open3d.org/docs/release/getting_started.html"""
import numpy as np
import open3d as o3d

width, height = image.size

depth_image = (output * 255 / np.max(output)).astype('uint8')
image = np.array(image)

# create rgbd image
depth_o3d = o3d.geometry.Image(depth_image)
image_o3d = o3d.geometry.Image(image)
rgbd_image = o3d.geometry.RGBDImage.create_from_color_and_depth(image_o3d, depth_o3d, convert_rgb_to_intensity=False)

# camera settings
camera_intrinsic = o3d.camera.PinholeCameraIntrinsic()
camera_intrinsic.set_intrinsics(width, height, 500, 500, width/2, height/2)

"""Create a point cloud from the RGBD image (RGB image + depth image)"""
pcd = o3d.geometry.PointCloud.create_from_rgbd_image(rgbd_image, camera_intrinsic)
"""Keep a second reference to the unfiltered point cloud for debugging point cloud distances"""
pcd2 = pcd

Jupyter environment detected. Enabling Open3D WebVisualizer.
[Open3D INFO] WebRTC GUI backend enabled.
[Open3D INFO] WebRTCWindowSystem: HTTP handshake server disabled.
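The depth conversion in this cell, `(output * 255 / np.max(output)).astype('uint8')`, rescales the prediction so that the farthest pixel becomes 255 and then truncates to integers. The pure-Python sketch below reproduces that arithmetic on a tiny hand-made depth map; `quantize_depth` and the sample values are hypothetical, chosen only to make the numbers easy to follow.

```python
def quantize_depth(depth_rows):
    """Scale so the largest value maps to 255, then truncate to integers."""
    peak = max(max(row) for row in depth_rows)
    # int() truncates toward zero, like astype('uint8') does for these values.
    return [[int(d * 255 / peak) for d in row] for row in depth_rows]


depth = [[0.5, 1.0],
         [2.0, 4.0]]
print(quantize_depth(depth))  # [[31, 63], [127, 255]]

# Multiplying every value by a constant does not change the result,
# because the scaling is relative to the maximum.
scaled = [[d * 1000.0 for d in row] for row in depth]
print(quantize_depth(scaled) == quantize_depth(depth))  # True
```

Because the scaling is relative, the `* 1000.0` applied to the prediction earlier has no effect on the 8-bit depth image, and absolute distances are lost. As far as I know, `RGBDImage.create_from_color_and_depth` also divides depth by a default `depth_scale` of 1000, so the resulting point cloud is small in absolute units; treat that default as an assumption to verify against the Open3D version in use.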


In [7]:
"""An RGBD Image is simply a combination of an RGB image and its corresponding depth image. 
The PinholeCameraIntrinsic class stores what is known as the intrinsic camera matrix.
Through this matrix, Open3D can create a point cloud from an RGBD image with the correct
spacing between the points. Keep the intrinsic parameters as they are.

To visualize the point cloud, use the call below."""
# draw_geometries blocks until the viewer window is closed and returns None,
# so o3d_pointer does not hold a window handle.
o3d_pointer = o3d.visualization.draw_geometries([pcd])
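How the intrinsic matrix turns a depth pixel into a 3D point can be sketched with the standard pinhole camera model. The example below is a hand-written illustration, not Open3D's implementation: `backproject` is a hypothetical helper, the 640x480 image size is assumed, and the focal lengths of 500 and the centered principal point follow the `set_intrinsics` call in the previous cell.

```python
def backproject(u, v, z, fx, fy, cx, cy):
    """Pinhole model: pixel (u, v) with depth z -> camera-space point (x, y, z)."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)


width, height = 640, 480          # assumed image size, for illustration only
fx = fy = 500.0                   # focal lengths in pixels, as in set_intrinsics
cx, cy = width / 2, height / 2    # principal point at the image center

# The central pixel lies on the optical axis.
print(backproject(cx, cy, 0.2, fx, fy, cx, cy))        # (0.0, 0.0, 0.2)
# A pixel 100 columns to the right moves the point along x.
print(backproject(cx + 100, cy, 0.2, fx, fy, cx, cy))  # (0.04, 0.0, 0.2)
```

The lateral offset grows with depth and shrinks with focal length, which is why the intrinsics control the spacing between points in the cloud.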

In [8]:
"""Don't run the viewer above if you intend to continue to the mesh section below, or close it first.
Its resources need to be released first."""
# o3d_pointer.destroy_window()  # destroy_window() belongs to o3d.visualization.Visualizer; draw_geometries returns None

In [9]:
"""Mesh Generator Unit.
Among the various methods available in the literature for this task, the Poisson surface reconstruction
algorithm is used here because it's the one that usually gives better and smoother results.
This code generates the mesh from the point cloud obtained in the previous step.

remove_statistical_outlier removes points that are farther away from their neighbors than the average
for the point cloud.
It takes two input parameters:
- nb_neighbors specifies how many neighbors are taken into account when calculating
  the average distance for a given point.
- std_ratio sets the threshold level based on the standard deviation of the average distances
  across the point cloud. The lower this number, the more aggressive the filter will be.
"""
# outliers removal
cl, ind = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=20.0)
pcd = pcd.select_by_index(ind)

# estimate normals
pcd.estimate_normals()
pcd.orient_normals_to_align_with_direction()

"""Surface reconstruction
Finally, the algorithm is executed. The depth value defines the detail level of the mesh.
A higher depth value increases mesh quality but also increases the size of the output."""
mesh = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(pcd, depth=10, n_threads=1)[0]

# rotate the mesh
rotation = mesh.get_rotation_matrix_from_xyz((np.pi, 0, 0))
mesh.rotate(rotation, center=(0, 0, 0))

"""Save the mesh
Opening an OBJ file with a text editor
Since OBJ files are saved in plain text, you can also open them with a text editor, such as Microsoft Notepad (Windows)
or Apple TextEdit (Mac), or a source code editor. You may need to rename the .obj file extension to .txt
for the text editor to recognize it.

When opened in a text editor or source code editor, you can modify the properties of the OBJ file. 
Remember that if you incorrectly edit the file, you may inadvertently corrupt the model."""

o3d.io.write_triangle_mesh('./mesh.obj', mesh)

# visualize the mesh
o3d.visualization.draw_geometries([mesh], mesh_show_back_face=True)
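The statistical outlier filter described at the top of this cell can be illustrated with a brute-force version in plain Python. This is a sketch of the idea only, not Open3D's implementation (which uses a KD-tree and may differ in details such as how the standard deviation is computed); `statistical_outlier_mask` and the sample points are hypothetical.

```python
import math
import statistics


def statistical_outlier_mask(points, nb_neighbors=2, std_ratio=1.0):
    """Return a list of booleans: True for points to keep.

    For each point, average the distance to its nb_neighbors nearest
    neighbors. A point is dropped when that average exceeds
    mean + std_ratio * std of the averages over the whole cloud.
    """
    avg_dists = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        avg_dists.append(sum(dists[:nb_neighbors]) / nb_neighbors)
    threshold = statistics.fmean(avg_dists) + std_ratio * statistics.pstdev(avg_dists)
    return [d <= threshold for d in avg_dists]


# Eight corners of a unit cube plus one point far away from them.
points = [(x, y, z) for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)]
points.append((10.0, 10.0, 10.0))

strict = statistical_outlier_mask(points, nb_neighbors=2, std_ratio=1.0)
loose = statistical_outlier_mask(points, nb_neighbors=2, std_ratio=20.0)
print(strict.count(False))  # 1 (the far point is removed)
print(loose.count(False))   # 0 (nothing is removed)
```

The second call shows why a lower `std_ratio` is more aggressive: with a value as high as the 20.0 used in this cell, only extremely isolated points would be filtered out.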

In [None]:
#SHIFT Q isn't the correct way!

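In [None]:
"""DON'T EVEN THINK OF RUNNING IT HERE!!!

For future research: Intel RealSense camera pipeline.
This cell relies on names (rs, rs_config, connect_device, rgbdTools, chessBoard_num, cv2)
that are not defined or imported anywhere in this notebook.
"""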
pipeline1 = rs.pipeline()
rs_config.enable_device(connect_device[0])
pipeline_profile1 = pipeline1.start(rs_config)

intr1 = pipeline_profile1.get_stream(rs.stream.color).as_video_stream_profile().get_intrinsics()
pinhole_camera1_intrinsic = o3d.camera.PinholeCameraIntrinsic(intr1.width, intr1.height, intr1.fx, intr1.fy, intr1.ppx, intr1.ppy)
cam1 = rgbdTools.Camera(intr1.fx, intr1.fy, intr1.ppx, intr1.ppy)
# print('cam1 intrinsics:')
# print(intr1.width, intr1.height, intr1.fx, intr1.fy, intr1.ppx, intr1.ppy)

pipeline2 = rs.pipeline()
rs_config.enable_device(connect_device[1])
pipeline_profile2 = pipeline2.start(rs_config)

intr2 = pipeline_profile2.get_stream(rs.stream.color).as_video_stream_profile().get_intrinsics()
pinhole_camera2_intrinsic = o3d.camera.PinholeCameraIntrinsic(intr2.width, intr2.height, intr2.fx, intr2.fy, intr2.ppx, intr2.ppy)
cam2 = rgbdTools.Camera(intr2.fx, intr2.fy, intr2.ppx, intr2.ppy)
# print('cam2 intrinsics:')
# print(intr2.width, intr2.height, intr2.fx, intr2.fy, intr2.ppx, intr2.ppy)

print('Calculating Transformation Matrix:')
cam1_point = []
cam2_point = []
for view in range(chessBoard_num):
    cam1_rgb = cv2.imread('./output/cam1_color_'+str(view)+'.png')
    cam1_rgb_array = np.asanyarray(cam1_rgb)
    cam1_depth = cv2.imread('./output/cam1_depth_'+str(view)+'.png',-1)
    cam1_depth_array = np.asanyarray(cam1_depth)
    cam2_rgb = cv2.imread('./output/cam2_color_'+str(view)+'.png')
    cam2_rgb_array = np.asanyarray(cam2_rgb)
    cam2_depth = cv2.imread('./output/cam2_depth_'+str(view)+'.png',-1)
    cam2_depth_array = np.asanyarray(cam2_depth)