The simplest case of analysis can be done with linear regression.

In [None]:
import cv2
import sklearn.linear_model
import matplotlib.pyplot as plt
import numpy as np

# Main idea

Let's work with the letter `b-13`. It clearly has a right slant.

In [None]:
im = cv2.imread("../input/rukopys/glyphs/b-13.png")
plt.imshow(im)

The next cell builds column vectors of the $x$ and $y$ coordinates of all black pixels in the image.

In [None]:
glyph_x, glyph_y = [
    column.reshape(-1, 1) for column
    in np.where(np.sum(im, axis=2).transpose() == 0)
]
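To see what this expression does, here is a minimal sketch on a synthetic image. The 3×4 array `toy` and its two black pixels are made up for illustration and are not part of the dataset.

```python
import numpy as np

# Hypothetical 3x4 white RGB image (height 3, width 4) with two black pixels.
toy = np.full((3, 4, 3), 255, dtype=np.uint8)
toy[0, 1] = 0  # row y=0, column x=1
toy[2, 3] = 0  # row y=2, column x=3

# Summing over the channel axis gives 0 only for pure black pixels.
# After the transpose the first axis is x and the second is y,
# so np.where returns (x indices, y indices).
toy_x, toy_y = [
    column.reshape(-1, 1) for column
    in np.where(np.sum(toy, axis=2).transpose() == 0)
]

print(toy_x.ravel(), toy_y.ravel())  # [1 3] [0 2]
```

Note that only pixels whose three channels are all exactly 0 are selected, so dark gray pixels are ignored.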

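Let's fit a linear regression that predicts $x = a_1 + a_2 y$. This equation describes a vertical line when $a_2 = 0$. Regressing $x$ on $y$, rather than the other way round, keeps the slope finite for nearly vertical strokes.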

In [None]:
model = sklearn.linear_model.LinearRegression()
model.fit(glyph_y, glyph_x)
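As a sanity check, the same kind of least-squares fit can be sketched with NumPy alone on synthetic points. Here `np.polyfit` is swapped in for `LinearRegression`, and the points (all `demo_*` names) are made up so that the true coefficients are known in advance.

```python
import numpy as np

# Hypothetical pixel coordinates lying exactly on the line x = 5 + 0.5 * y.
demo_y = np.arange(10, dtype=float)
demo_x = 5.0 + 0.5 * demo_y

# Degree-1 least-squares fit of x as a function of y.
# np.polyfit returns the highest-degree coefficient first: (slope, intercept).
demo_a2, demo_a1 = np.polyfit(demo_y, demo_x, deg=1)

print(round(demo_a1, 6), round(demo_a2, 6))
```

The fit should recover an intercept close to 5 and a slope close to 0.5, which are the roles played by $a_1$ and $a_2$.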

Let's extract the model coefficients in order to draw the line.

In [None]:
# .item() extracts the single value from each one-element array
a1, a2 = model.intercept_.item(), model.coef_.item()
im_height, im_width, _ = im.shape

print(a1, a2)

The line will be drawn from the point $A$, where $y=0$ and therefore $x = a_1$, to the point $B$, where $y=\mathrm{height}_{\mathrm{image}}$ and so $x = a_1 + a_2 \cdot \mathrm{height}_{\mathrm{image}}$.

In [None]:
line = cv2.line(
    im.copy(),
    # point A
    (int(a1), 0),
    # point B
    (int(a1 + a2 * im_height), im_height),
    color=(255, 0, 0)
)
plt.imshow(line)
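The endpoint arithmetic can be checked by hand. The values of `ex_a1`, `ex_a2` and `ex_height` below are hypothetical and chosen only to make the numbers easy to follow.

```python
# Hypothetical coefficients and image height, for illustration only.
ex_a1, ex_a2, ex_height = 12.7, -0.25, 40

# Point A lies on the top row (y = 0), point B at y = image height.
# int() truncates toward zero, as in the cell above.
point_a = (int(ex_a1), 0)
point_b = (int(ex_a1 + ex_a2 * ex_height), ex_height)

print(point_a, point_b)  # (12, 0) (2, 40)
```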

$a_2$ can be used as the slant coefficient $s$. It does not lie in the $[-1, 1]$ range, but we can map it there with a scaled [sigmoid function](https://en.wikipedia.org/wiki/Sigmoid_function) and multiply the result by $-1$:

$
s = - (2\sigma(a_2) - 1) =
\dfrac
    {-2}
    {1 + e^{-a_2}}
+1
$.

Here is a plot of the transformation.

In [None]:
def sigma(x):
    # scaled and negated sigmoid: s = -(2 * sigmoid(x) - 1)
    return -2 / (1 + np.exp(-x)) + 1

x = np.linspace(-10, 10, 100)
plt.plot(x, sigma(x))
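A few spot values make the mapping concrete. The sketch below redefines the transformation under a hypothetical name, `slant_transform`, so that it runs on its own. It also checks the result against the identity $2\sigma(x) - 1 = \tanh(x/2)$, which implies $s = -\tanh(a_2/2)$.

```python
import numpy as np

def slant_transform(a2):
    # Same formula as above: s = -(2 * sigmoid(a2) - 1).
    return -2 / (1 + np.exp(-a2)) + 1

samples = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
values = slant_transform(samples)

# Zero slope maps to zero; large |a2| saturates close to +1 or -1.
print(np.round(values, 4))

# The scaled sigmoid equals tanh(x / 2), so s = -tanh(a2 / 2).
assert np.allclose(values, -np.tanh(samples / 2))
```

Because of the multiplication by $-1$, a positive $a_2$ yields a negative $s$ and vice versa.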

# Wrap it in a function

Let's create a function that takes an image and returns its slant coefficient.

In [None]:
def get_slant(im: np.ndarray, verbose=False) -> float:

    # cv2.imread returns None when the file cannot be read
    if not isinstance(im, np.ndarray):
        raise TypeError("Image is empty")

    # x and y coordinates of all black pixels
    x, y = [
        column.reshape(-1, 1) for column
        in np.where(np.sum(im, axis=2).transpose() == 0)
    ]

    model = sklearn.linear_model.LinearRegression()
    model.fit(y, x)

    # .item() extracts the single value from each one-element array
    a1, a2 = model.intercept_.item(), model.coef_.item()
    slant = sigma(a2)

    if verbose:
        # use the height of this image, not of the one loaded earlier
        im_height = im.shape[0]

        line = cv2.line(
            im.copy(),
            (int(a1), 0),
            (int(a1 + a2 * im_height), im_height),
            color=(255, 0, 0)
        )
        plt.imshow(line)

    return float(slant)

We can test it on some symbols.

In [None]:
get_slant(cv2.imread("../input/rukopys/glyphs/i-3.png"), verbose=True)

In [None]:
get_slant(cv2.imread("../input/rukopys/glyphs/a-14.png"), verbose=True)

In [None]:
get_slant(cv2.imread("../input/rukopys/glyphs/a-34.png"), verbose=True)

# Create labels

Let's use the following classification:

* If $-0.1 \leq s \leq 0.1$, then the slant is `straight`.
* If $s < -0.1$, then the slant is `right`.
* Otherwise the slant is `left`.

This choice of thresholds is just an example. The subject requires deeper research.

In [None]:
def label_slant(slant: float) -> str:
    if -0.1 <= slant <= 0.1:
        return "straight"
    elif slant < -0.1:
        return "right"
    else:
        return "left"
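Since the $\pm 0.1$ cut-off is only an example, a variant with the threshold as a parameter may make experimenting easier. The name `label_slant_with_threshold` and the `threshold` argument are assumptions for this sketch. The rules themselves are the same as above.

```python
def label_slant_with_threshold(slant: float, threshold: float = 0.1) -> str:
    # Hypothetical variant of label_slant: same rules, adjustable cut-off.
    if -threshold <= slant <= threshold:
        return "straight"
    if slant < -threshold:
        return "right"
    return "left"

# Boundary values belong to the "straight" class.
for s in (-0.5, -0.1, 0.0, 0.1, 0.5):
    print(s, label_slant_with_threshold(s))
```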

# Create new dataset

In [None]:
import pandas as pd

In [None]:
df = pd.read_csv("../input/rukopys/glyphs.csv")
df.head()

Let's apply the `get_slant` function to every filename.

In [None]:
slants = [
    get_slant(cv2.imread(
        "../input/rukopys/" + name
    ))
    for name
    in df["filename"]
]

slants[:5]

`slant_directions` will now contain the labels.

In [None]:
slant_directions = list(map(label_slant, slants))

slant_directions[:5]

We can check how it works on the original dataset.

In [None]:
df.assign(
    slant=slants, slant_direction=slant_directions
).head()

Write the output.

In [None]:
pd.DataFrame({
    "filename": df["filename"],
    "slant": slants,
    "slant_direction": slant_directions   
}).to_csv("output.csv")

# Drawbacks of this approach

This is a surprisingly simple solution that works well. However, it fails on some letters, for example those with long tails.

In [None]:
get_slant(cv2.imread("../input/rukopys/glyphs/a-42.png"), verbose=True)

This letter should actually be considered straight, but the fitted line is right-slanted. So can you do better?
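The effect of a long tail can be reproduced on a synthetic glyph. The sketch below is an illustration under made-up coordinates, not a reconstruction of `a-42`. It uses `np.polyfit` in place of `LinearRegression` to fit $x = a_1 + a_2 y$, first to a perfectly vertical stem and then to the same stem with a horizontal tail attached.

```python
import numpy as np

# Hypothetical glyph: a perfectly vertical stem plus a horizontal tail.
stem_x = np.full(20, 10.0)               # x = 10 for every stem pixel
stem_y = np.arange(20, dtype=float)      # y = 0..19
tail_x = np.arange(11, 31, dtype=float)  # x = 11..30
tail_y = np.full(20, 19.0)               # tail sits on the bottom row

def fitted_slope(xs, ys):
    # Least-squares fit of x = a1 + a2 * y; returns a2.
    a2, _a1 = np.polyfit(ys, xs, deg=1)
    return a2

stem_only = fitted_slope(stem_x, stem_y)
with_tail = fitted_slope(
    np.concatenate([stem_x, tail_x]),
    np.concatenate([stem_y, tail_y]),
)

print(stem_only, with_tail)
```

The stem alone gives a slope of about 0, while adding the tail pushes it to roughly 0.64, far enough from zero that the glyph would no longer be labeled `straight` even though its main stroke is vertical.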