# EDA on OBJ file

In this notebook, I attempt to conduct some EDA on an OBJ file. OBJ files are plain-text files that, much like a CSV, store one record per line. They contain vertex and face information about a 3D object (the edges are implied by the faces), as well as other properties such as vertex normals and texture coordinates. Let's load one up and see what we can do with it.

In [2]:
import pandas as pd
import numpy as np
import altair as alt

Before loading the OBJ file into a pandas dataframe, let's briefly explore its contents.

The OBJ file looks similar to a basic CSV, except that the delimiter is a space. I see 4 main record types in the first column of data: `v`, `vn`, `vt`, and `f`. These stand for `vertex`, `vertex-normal`, `vertex-texture` and `face` respectively:

- `v`: Vertex, the individual data points (3D coordinates).
- `vn`: Vertex-Normal, the normal vector at a vertex, used for lighting and reflections.
- `vt`: Vertex-Texture, texture coordinates that map a point on the object to a location in the texture image.
- `f`: Face, a collection of 3-4 vertices (in this file), each listed with the indices of its texture coordinate and normal.

Since our data seems to consist of 4 distinct datasets, it might be best to split these out.
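Because every line starts with its record type, the split can be pictured without pandas at all. The sketch below is only an illustration using the standard library: `group_obj_records` and the tiny square model are hypothetical, not part of the foot file, and the function assumes comment lines start with `#`.

```python
from collections import defaultdict


def group_obj_records(text: str) -> dict:
    """Group the lines of an OBJ file by their record type (first token).

    Blank lines and comment lines (starting with '#') are skipped.
    Each record is stored as the list of its remaining tokens.
    """
    records = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        kind, *fields = line.split()
        records[kind].append(fields)
    return dict(records)


# A tiny hand-written OBJ snippet (illustrative, not from the foot model).
sample = """# a single square
v 0 0 0
v 1 0 0
v 1 1 0
v 0 1 0
vt 0 0
vn 0 0 1
f 1/1/1 2/1/1 3/1/1 4/1/1
"""

groups = group_obj_records(sample)
print({kind: len(rows) for kind, rows in groups.items()})
# → {'v': 4, 'vt': 1, 'vn': 1, 'f': 1}
```

Each group then has a fixed width (3 fields for `v` and `vn`, 2 for `vt` here), which is why splitting them into separate dataframes makes sense.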

## Loading the Data

Load our OBJ data file.

In [3]:
obj = pd.read_csv("../data/human-foot-in-blender.obj", delimiter=' ', names=['1','2','3','4','5'], skiprows=3, header=None)
obj.head()

|   | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 0 | v | 0.059608 | 0.383419 | -0.047925 |  |
| 1 | v | -0.030599 | -0.01601 | -0.020009 |  |
| 2 | v | 0.052395 | 0.402345 | 0.102574 |  |
| 3 | v | 0.049393 | -0.015811 | -0.022423 |  |
| 4 | v | -0.061567 | 0.421855 | -0.066727 |  |


Here I had to choose the number of columns manually: 5 columns, i.e. the record type plus up to 4 fields. The widest lines are the quad faces, which list 4 vertex references each.
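Instead of picking the column count by hand, it could be measured from the file. A minimal standard-library sketch; `max_fields` is a hypothetical helper, and the sample lines are made up to mirror the shapes seen in this file:

```python
def max_fields(text: str) -> int:
    """Return the largest number of whitespace-separated tokens on any
    non-blank, non-comment line: the number of columns a CSV reader needs."""
    widths = [
        len(line.split())
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    return max(widths, default=0)


sample = """v 0.1 0.2 0.3
vt 0.5 0.5
f 1/1/1 2/2/2 3/3/3 4/4/4
f 1/1/1 3/3/3 4/4/4
"""
print(max_fields(sample))  # → 5 (record type + 4 vertex references)
```

The resulting count could then be used to build the `names` list passed to `pd.read_csv`.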

Let's split our data into vertices, vertex-normals, vertex-textures, and faces.

In [4]:
vertices = obj[ obj['1'] == 'v' ]
vertex_textures = obj[ obj['1'] == 'vt' ]
vertex_normals = obj[ obj['1'] == 'vn' ]
faces = obj[ obj['1'] == 'f' ]

Our `vertices` dataframe contains some unnecessary columns. Let's drop those and rename the coordinate columns. Note that the mapping is not in column order: column 3 becomes Z and column 4 becomes Y.

In [None]:
col_rename = {
    "2":"X",
    "3":"Z",
    "4":"Y"
}

vertices = vertices.drop(columns=['1','5'])
vertices = vertices.rename(columns = col_rename)

vertices.head()

|   | X | Z | Y |
|---|---|---|---|
| 0 | 0.059608 | 0.383419 | -0.047925 |
| 1 | -0.030599 | -0.01601 | -0.020009 |
| 2 | 0.052395 | 0.402345 | 0.102574 |
| 3 | 0.049393 | -0.015811 | -0.022423 |
| 4 | -0.061567 | 0.421855 | -0.066727 |


In [6]:
vertex_textures = vertex_textures.drop(columns=['5', '4'])
vertex_textures.head()

|   | 1 | 2 | 3 |
|---|---|---|---|
| 1600 | vt | 0.625 | 0.627438 |
| 1601 | vt | 0.625 | 0.632199 |
| 1602 | vt | 0.625 | 0.583333 |
| 1603 | vt | 0.625 | 0.583347 |
| 1604 | vt | 0.625 | 0.5 |


In [7]:
vertex_normals = vertex_normals.drop(columns=['5'])
vertex_normals.head()

|   | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| 800 | vn | 0.8359 | 0.0076 | -0.5488 |
| 801 | vn | -0.057 | -0.9954 | 0.077 |
| 802 | vn | 0.6646 | 0.1227 | 0.737 |
| 803 | vn | 0.1375 | -0.9905 | 0.005 |
| 804 | vn | -0.7827 | 0.0011 | -0.6224 |


In [8]:
faces.head()

|   | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1723 | f | 716/1/716 | 32/2/32 | 34/3/34 | 713/4/713 |
| 1724 | f | 87/5/87 | 17/6/17 | 18/7/18 | 84/8/84 |
| 1725 | f | 60/9/60 | 796/9/796 | 797/10/797 | 68/11/68 |
| 1726 | f | 65/12/65 | 786/12/786 | 789/13/789 | 64/14/64 |
| 1727 | f | 84/8/84 | 18/7/18 | 13/15/13 | 89/16/89 |


Let's make some charts with the data. To start with, let's make the same point chart as we did [here](/vertex_coord_analysis.ipynb).

In [9]:
alt.data_transformers.enable('vegafusion')

DataTransformerRegistry.enable('vegafusion')

In [10]:
vertices = vertices.astype('double')

# Create the point chart between X and Y
# (columns use the X/Y/Z names from the rename step above)
xy_point = alt.Chart(vertices).mark_point().encode(
    x = alt.X('X').scale(domain=[-0.4,0.4]),
    y = alt.Y('Y').scale(domain=[-0.3,0.8])
).properties(
    width = 250,
    height = 350
)

# Create the point chart between X and Z
xz_point = alt.Chart(vertices).mark_point().encode(
    x = alt.X('X').scale(domain=[-0.4,0.4]),
    y = alt.Y('Z').scale(domain=[-0.1,0.8])
).properties(
    width = 250,
    height = 300
)

# Create the point chart between Y and Z
yz_point = alt.Chart(vertices).mark_point().encode(
    x = alt.X('Y').scale(domain=[-0.3,0.8]),
    y = alt.Y('Z').scale(domain=[-0.1,0.8])
).properties(
    width = 400,
    height = 300
)

In [11]:
vertex_point = (xz_point | yz_point) & xy_point
vertex_point


We see our familiar foot shape.

One thing I'm interested in seeing is the faces. In this model, a face is a group of 3-4 vertices connected by edges. These live in the `faces` dataframe we created earlier.

In [12]:
faces.head()

|   | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1723 | f | 716/1/716 | 32/2/32 | 34/3/34 | 713/4/713 |
| 1724 | f | 87/5/87 | 17/6/17 | 18/7/18 | 84/8/84 |
| 1725 | f | 60/9/60 | 796/9/796 | 797/10/797 | 68/11/68 |
| 1726 | f | 65/12/65 | 786/12/786 | 789/13/789 | 64/14/64 |
| 1727 | f | 84/8/84 | 18/7/18 | 13/15/13 | 89/16/89 |


We see that faces are stored here as quads, i.e. polygons with 4 sides (corresponding to columns 2 to 5). Each entry has the form `v/vt/vn`: the index of a vertex, followed by the index of its texture coordinate and the index of its normal vector. For the first face, we observe the following.

In [13]:
faces.iloc[0]

1            f
2    716/1/716
3      32/2/32
4      34/3/34
5    713/4/713
Name: 1723, dtype: object

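This face is defined by the vertices 716, 32, 34 and 713. The texture and normal numbers in each entry (the `1` and the second `716` in `716/1/716`) count only `vt` and `vn` lines respectively. However, our `vertex_textures` and `vertex_normals` dataframes still carry their row positions from the combined file (starting at 1600 and 800). So we need to reset the index for both. Let's do that now.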

In [14]:
vertex_normals = vertex_normals.reset_index()
vertex_textures = vertex_textures.reset_index()
vertex_normals.head()

|   | index | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| 0 | 800 | vn | 0.8359 | 0.0076 | -0.5488 |
| 1 | 801 | vn | -0.057 | -0.9954 | 0.077 |
| 2 | 802 | vn | 0.6646 | 0.1227 | 0.737 |
| 3 | 803 | vn | 0.1375 | -0.9905 | 0.005 |
| 4 | 804 | vn | -0.7827 | 0.0011 | -0.6224 |


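Good. Now `vertices`, `vertex_textures` and `vertex_normals` are all indexed from 0. One caveat: OBJ indices start at 1, so a reference `n` in a face maps to row `n - 1` (vertex 716, for example, is `vertices.loc[715]`). Let's do some EDA on the faces.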
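To make that lookup mechanical, each face entry can be split into its three indices and shifted to zero-based rows. A minimal sketch; `parse_face_vertex` is a hypothetical helper. It also accepts the `v//vn` and bare `v` forms that OBJ allows, but it does not handle negative (relative) indices, which the format also permits.

```python
def parse_face_vertex(token: str) -> tuple:
    """Split one OBJ face entry 'v/vt/vn' into zero-based row indices.

    OBJ indices are 1-based, so 1 is subtracted from each. Missing parts
    (as in 'v//vn' or a bare 'v') are returned as None.
    """
    parts = token.split("/")
    parts += [""] * (3 - len(parts))  # pad 'v' or 'v/vt' to three parts
    return tuple(int(p) - 1 if p else None for p in parts[:3])


print(parse_face_vertex("716/1/716"))  # → (715, 0, 715)
print(parse_face_vertex("12//7"))      # → (11, None, 6)
```

The three results would then index `vertices`, `vertex_textures` and `vertex_normals` respectively, e.g. `vertices.loc[715]` for the first corner of the first face.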

## Basic EDA on Faces

Let's take another look at the faces dataset.

In [15]:
faces.head()

|   | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 1723 | f | 716/1/716 | 32/2/32 | 34/3/34 | 713/4/713 |
| 1724 | f | 87/5/87 | 17/6/17 | 18/7/18 | 84/8/84 |
| 1725 | f | 60/9/60 | 796/9/796 | 797/10/797 | 68/11/68 |
| 1726 | f | 65/12/65 | 786/12/786 | 789/13/789 | 64/14/64 |
| 1727 | f | 84/8/84 | 18/7/18 | 13/15/13 | 89/16/89 |


Let's see how many faces we have in total.

In [18]:
len(faces)

805

Confirm that we don't have any unnecessary footer information.

In [19]:
faces.tail()

|   | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| 2523 | f | 798/34/798 | 94/33/94 | 111/33/111 | 637/9/637 |
| 2524 | f | 799/9/799 | 796/9/796 | 60/9/60 | 107/9/107 |
| 2525 | f | 787/118/787 | 700/17/700 | 96/14/96 | 45/14/45 |
| 2526 | f | 648/59/648 | 800/75/800 | 96/75/96 |  |
| 2527 | f | 800/75/800 | 652/59/652 | 96/75/96 |  |


It looks like we have some missing values, though. This most likely means the mesh contains both triangles and quads, with the triangles leaving the last column empty. For peace of mind, let's see how many of each we have in the model.

In [17]:
faces.info()

<class 'pandas.core.frame.DataFrame'>
Index: 805 entries, 1723 to 2527
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   1       805 non-null    object
 1   2       805 non-null    object
 2   3       805 non-null    object
 3   4       805 non-null    object
 4   5       781 non-null    object
dtypes: object(5)
memory usage: 37.7+ KB


The majority of faces are quads (781 of 805, the non-null count in column `5`), and only 24 are triangles. This leaves us with a choice: convert all faces to triangles, or all faces to quads. Converting to triangles would roughly double the number of faces, since each quad becomes two triangles (781 × 2 + 24 = 1,586). However, converting quads to triangles seems like the better option, since going the other way would likely entail creating new vertices, and that's not something I'd like to attempt right now.
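The size estimate follows from a general rule: splitting an n-sided face into triangles yields n - 2 triangles. A small sketch of that arithmetic; `triangle_count` is a hypothetical helper and the face sizes are the counts from `faces.info()` above.

```python
def triangle_count(face_sizes) -> int:
    """Number of triangles produced by triangulating polygons:
    an n-sided face becomes n - 2 triangles."""
    return sum(n - 2 for n in face_sizes)


# Face sizes observed above: 24 triangles and 781 quads.
sizes = [3] * 24 + [4] * 781
print(len(sizes))             # → 805
print(triangle_count(sizes))  # → 1586
```

With the dataframe itself, the per-face sizes could be read off as `faces[['2', '3', '4', '5']].notna().sum(axis=1)`.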

Some considerations:

My first thought for achieving this is to take each row of our `faces` dataframe and split its vertices into groups of 3, so that one quad becomes two triangles. However, which vertices we group together determines whether we actually end up with a valid triangle mesh.

We could pick three of the four vertices at random to make each triangle. However, the quad will only be split properly into two triangles if we cut along a diagonal, i.e. between two opposite vertices. The figure below shows this.

![face splitting needs to be done accurately](../img/face_splitting.png){width=80%}

In `Figure 1`, we cut the quad between vertices 2 and 4. This works well: we now have two triangles in place of the one quad. However, choosing at random runs the risk of picking two adjacent vertices. In `Figure 2`, we cut between vertices 3 and 4, which results in two triangles that overlap each other. Resolving that overlap would require a new vertex where the edges cross, and 4 new triangles instead of two.

We need to be deliberate with which vertices we choose to split on.
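One alternative worth noting: the OBJ format lists a face's vertices in order around its boundary, so for a quad `(a, b, c, d)` the diagonals are always `a-c` and `b-d`, and no distance calculation is needed. This is a swapped-in technique (fan triangulation), sketched here under the assumption that the faces in this file follow that ordering and are convex; `fan_triangulate` is a hypothetical helper.

```python
def fan_triangulate(face):
    """Split a polygon, given as vertex indices in boundary order, into
    triangles that all share the first vertex. For a quad (a, b, c, d)
    this cuts along the diagonal a-c: (a, b, c) and (a, c, d)."""
    if len(face) < 3:
        raise ValueError("a face needs at least 3 vertices")
    first = face[0]
    return [(first, face[i], face[i + 1]) for i in range(1, len(face) - 1)]


print(fan_triangulate((716, 32, 34, 713)))
# → [(716, 32, 34), (716, 34, 713)]
print(fan_triangulate((648, 800, 96)))
# → [(648, 800, 96)]
```

Both triangles keep the vertex order of the original face, so their winding direction (and therefore which side counts as the front) is preserved. For a concave quad, the `a-c` cut can fall outside the shape, which is where a geometric check becomes necessary.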

Systematically, we could compute the distance between every pair of vertices in the face and take the longest as our diagonal. We don't even need to start from every vertex: measuring from 3 of the 4 vertices to each vertex after it already covers all 6 pairs. One caveat: for near-rectangular quads the longest distance is a diagonal, but this isn't guaranteed for every quad. In a long, narrow trapezoid, one side can be longer than both diagonals.
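The longest-distance idea, including the caveat, can be checked on toy shapes before touching the real mesh. A standard-library sketch; `farthest_pair` is a hypothetical helper and the rectangle and trapezoid are made-up examples listed in boundary order.

```python
import math
from itertools import combinations


def farthest_pair(points):
    """Return (i, j), the positions of the two points that are farthest
    apart. Checks every pair: 6 pairs for a quad."""
    return max(
        combinations(range(len(points)), 2),
        key=lambda ij: math.dist(points[ij[0]], points[ij[1]]),
    )


# A 2 x 1 rectangle: the farthest pair is a diagonal.
rect = [(0, 0, 0), (2, 0, 0), (2, 1, 0), (0, 1, 0)]
print(farthest_pair(rect))  # → (0, 2)

# A long, narrow trapezoid: side 0-1 (length 10) beats both diagonals (about 6.08).
trap = [(0, 0, 0), (10, 0, 0), (6, 1, 0), (4, 1, 0)]
print(farthest_pair(trap))  # → (0, 1)
```

For the rectangle, both diagonals tie and `max` keeps the first one it meets. The trapezoid result is a side, not a diagonal, which is why the heuristic should be treated as a good guess for well-shaped quads rather than a guarantee.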

Here I define a function `edge_len()` that returns the indices of the two vertices that are farthest apart, which we treat as the diagonal.

In [None]:
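def edge_len(f: tuple):
    """Return the two vertex indices of quad `f` that are farthest apart.

    `f` holds four 1-based OBJ vertex indices, e.g. (716, 32, 34, 713).
    The farthest pair is treated as the quad's diagonal.
    """
    if len(f) != 4:
        raise ValueError("edge_len() expects a quad (4 vertex indices)")

    # OBJ indices start at 1, but the rows of `vertices` start at 0.
    # `vertices` only holds the three coordinate columns at this point.
    coords = vertices.loc[[idx - 1 for idx in f]].to_numpy(dtype=float)

    best_pair, best_dist = None, -1.0
    # Measuring from the first 3 corners to every later corner covers all 6 pairs.
    for a in range(3):
        for b in range(a + 1, 4):
            dist = np.linalg.norm(coords[a] - coords[b])
            if dist > best_dist:
                best_pair, best_dist = (f[a], f[b]), dist
    return best_pair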
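Once `edge_len()` has picked a pair of vertices, the quad still has to be turned into two triangles. A minimal sketch of that step; `split_on_diagonal` is a hypothetical helper, not part of the original notebook, and it trusts the caller that the pair really is a diagonal.

```python
def split_on_diagonal(face, diagonal):
    """Split a quad into two triangles along `diagonal`, a pair of vertex
    indices from `face`. Each triangle is the diagonal plus one of the two
    remaining vertices, listed in the face's own vertex order so the
    original winding direction is kept."""
    p, q = diagonal
    if len(face) != 4 or p == q or p not in face or q not in face:
        raise ValueError("expected a quad and two distinct vertices from it")
    others = [v for v in face if v not in (p, q)]
    return [tuple(v for v in face if v in (p, q, r)) for r in others]


print(split_on_diagonal((716, 32, 34, 713), (716, 34)))
# → [(716, 32, 34), (716, 34, 713)]
```

If an adjacent pair is passed in by mistake, the function still returns two triangles, but they overlap as in `Figure 2`; it does no geometric validation.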

In [24]:
# Row 12, third coordinate column (explicit positional access)
vertices.iloc[12, 2]

np.float64(-0.111872)