![](https://www.grunge.com/img/gallery/ancient-languages-that-completely-disappeared/intro-1523990289.jpg)

# The QLattice

The QLattice is a supervised machine learning tool for symbolic regression developed by Abzu. It is inspired by Richard Feynman's path integral formulation, which is why the Python module used to access it is called Feyn and the Q in QLattice stands for Quantum.

Research has shown that QLattices perform well and yield simple, explainable models; see "Symbolic regression outperforms other models for small data sets".

Abzu provides free QLattices to anyone for non-commercial use. A free community QLattice is allocated automatically when we use Feyn without an active subscription, as we do in this notebook. Read more about how it works here: https://docs.abzu.ai/docs/guides/getting_started/community.html

The feyn Python module is not installed on Kaggle by default, so we have to pip install it first.

| Code | # of texts | Language |
|------|-----------:|----------|
| SUX | 53673 | Sumerian |
| NEA | 32966 | Neo-Assyrian |
| STB | 17817 | Standard Babylonian |
| LTB | 15947 | Late Babylonian |
| NEB | 9707 | Neo-Babylonian |
| MPB | 5508 | Middle Babylonian Peripheral |
| OLB | 3803 | Old Babylonian |

In [None]:
!pip install feyn

In [None]:
import pandas as pd
import numpy as np
df = pd.read_csv("../input/cuneiform-language-identification/train.csv")
df.head(10)

![](https://upload.wikimedia.org/wikipedia/commons/thumb/f/f9/Cuneiform_evolution_from_archaic_script.jpg/290px-Cuneiform_evolution_from_archaic_script.jpg)

In [None]:
df.tail()

In [None]:
df.lang.value_counts()

## We will do binary classification, labelling STB (Standard Babylonian) as 1 and every other language as 0

![](https://wallpaperaccess.com/full/900982.jpg)

In [None]:
# Binary target: 1 for Standard Babylonian (STB), 0 for every other language
df["lang"] = (df["lang"] == "STB").astype(int)
df.tail()

In [None]:
df.lang.value_counts()
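
Before looking at any accuracy number, it helps to know how unbalanced this binary target is. The sketch below is plain arithmetic on the counts from the table at the top of the notebook. It assumes that `train.csv` follows roughly the same distribution, which this notebook does not verify, so treat the percentages as a rough guide only.

```python
# Text counts per language, copied from the table at the top of this notebook.
# Assumption: train.csv has roughly the same class distribution.
counts = {
    "SUX": 53673, "NEA": 32966, "STB": 17817, "LTB": 15947,
    "NEB": 9707, "MPB": 5508, "OLB": 3803,
}

total = sum(counts.values())
positive_share = counts["STB"] / total    # share of rows labelled 1
majority_baseline = 1 - positive_share    # accuracy of always predicting 0

print(f"total texts: {total}")
print(f"STB share: {positive_share:.1%}")
print(f"always-0 baseline accuracy: {majority_baseline:.1%}")
```

If these proportions hold for the training data, a model that always answers 0 would already be right most of the time, so accuracy alone says little and the ROC curve and confusion matrix are worth reading carefully.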

## We make a separate column for each of the first 5 characters of the cuneiform text. Why 5? With more than 5, many columns would contain NaN values, because some texts are shorter than that

![](https://qph.fs.quoracdn.net/main-qimg-76a0015bdd44e4a356aefe4599dcdb3a.webp)

In [None]:
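# One column per character position; texts shorter than i + 1 characters get NaN
for i in range(5):
    df[f"c{i}"] = df["cuneiform"].str[i]
# The raw text is no longer needed once the character columns exist
df = df.drop(["cuneiform"], axis=1)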
df.head()
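
The choice of 5 can be checked rather than guessed. With pandas, `.str[i]` gives NaN for every text that has fewer than `i + 1` characters, so the share of missing values in column `c{i}` is simply the share of texts that are too short. The sketch below shows that calculation in plain Python on made-up strings. The `texts` list and the helper name `missing_share` are illustrative only and are not part of the dataset or of Feyn.

```python
# Made-up stand-ins for cuneiform strings; only their lengths matter here.
texts = ["abcdefg", "abc", "abcdefghij", "ab", "abcdef", "abcd"]

def missing_share(texts, position):
    """Fraction of texts that have no character at a zero-based position."""
    return sum(len(t) <= position for t in texts) / len(texts)

# The missing share can only grow as the position moves to the right.
for position in range(8):
    print(f"c{position}: {missing_share(texts, position):.0%} missing")
```

Running the same check on the real `cuneiform` column, before it is dropped, would show where the missing share starts to climb and whether 5 is a sensible cut-off.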

![](http://babylonianempire8c.weebly.com/uploads/6/0/3/7/60372241/9340231.jpg?631)

In [None]:
import feyn
from sklearn.model_selection import train_test_split
train,test = train_test_split(df, test_size = 0.2, random_state = 42)

In [None]:
ql = feyn.connect_qlattice()
ql.reset(random_seed=42)

In [None]:
# Semantic types: "c" tells the QLattice to treat each character column as categorical
stypes = {
    "c0": "c",
    "c1": "c",
    "c2": "c",
    "c3": "c",
    "c4": "c",
}

![](https://www.thegreatcoursesdaily.com/wp-content/uploads/2020/08/The-Story-of-Human-Language_Ancient-Hieroglyphs_QBS_Featured.jpg)

In [None]:
models = ql.auto_run(train, output_name="lang", kind="classification", criterion="aic", n_epochs=10, stypes=stypes)
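
The `criterion="aic"` argument asks for models to be ranked by the Akaike information criterion, which trades goodness of fit against model size. As a rough illustration of the textbook formula, AIC = 2k - 2 ln(L), here is a small sketch on toy numbers. How Feyn counts parameters and computes the likelihood internally is not shown in this notebook, so the helper names and the parameter counts below are assumptions made for illustration.

```python
import math

def bernoulli_log_likelihood(y_true, p_pred):
    """Log-likelihood of 0/1 labels under predicted probabilities of class 1."""
    return sum(
        math.log(p) if y == 1 else math.log(1 - p)
        for y, p in zip(y_true, p_pred)
    )

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood

y_true = [1, 0, 0, 1, 0]              # toy labels
p_pred = [0.8, 0.2, 0.3, 0.6, 0.1]    # toy predicted probabilities

ll = bernoulli_log_likelihood(y_true, p_pred)
print(aic(ll, n_params=3))   # smaller model
print(aic(ll, n_params=6))   # same fit with more parameters scores worse
```

With the fit held equal, every extra parameter adds 2 to the score, which is why this criterion tends to favour the simpler of two equally accurate models.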

![](https://www.holidify.com/images/cmsuploads/compressed/main-qimg-a150e96dc0ca704874a06024450df298-c_20180221161007.jpeg)

In [None]:
models[0].plot_roc_curve(train, label="train data")
models[0].plot_roc_curve(test, label="test data")
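
A ROC curve summarises how well the model ranks positives above negatives across all thresholds, which is a different thing from accuracy at one fixed threshold. The area under the curve equals the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. The sketch below computes that quantity directly on toy scores. The function name `roc_auc` is our own and is not a Feyn call.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 0, 0, 1, 0, 1]                 # toy labels
scores = [0.9, 0.4, 0.6, 0.3, 0.1, 0.7]     # toy predicted probabilities

print(f"AUC: {roc_auc(y_true, scores):.3f}")
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect one, regardless of how unbalanced the classes are.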

Accuracy of almost 80%

In [None]:
models[0].plot_confusion_matrix(test, threshold=.5)
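
To make clear what the confusion matrix at `threshold=.5` counts, here is a minimal plain-Python sketch on toy values. It assumes that a predicted probability at or above the threshold counts as class 1. Whether Feyn treats a value exactly on the threshold the same way is not something this notebook confirms, and `confusion_counts` is a hypothetical helper, not part of Feyn.

```python
def confusion_counts(y_true, p_pred, threshold=0.5):
    """Return (tp, fp, tn, fn) after thresholding predicted probabilities."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, p_pred):
        predicted = int(p >= threshold)   # assumption: >= counts as class 1
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

y_true = [1, 0, 0, 1, 0, 1]                 # toy labels
p_pred = [0.9, 0.4, 0.6, 0.3, 0.1, 0.7]     # toy predicted probabilities

tp, fp, tn, fn = confusion_counts(y_true, p_pred)
accuracy = (tp + tn) / len(y_true)
print(f"tp={tp} fp={fp} tn={tn} fn={fn} accuracy={accuracy:.2f}")
```

Accuracy is the sum of the diagonal divided by the number of rows, so it moves whenever the threshold does. Trying a few thresholds other than 0.5 shows how false positives trade against false negatives.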

This notebook is inspired by Casper Wilstrup's notebook on the QLattice:
https://www.kaggle.com/wilstrup/use-a-qlattice-to-detect-the-language/notebook

UPVOTE if you like it or fork it :)

![](https://www.ancient-origins.net/sites/default/files/field/image/Oldest-language.jpg)

# THANK YOU