# Sorani Kurdish data using mplcairo

Matplotlib's standard rendering backends support neither bidirectional text nor text that requires complex rendering.

|Matplotlib (standard) |Matplotlib (mplcairo) |
|--------------------  |--------------------- |
|<img src="img/std_matplotlib_output.png" alt="Kurdish bar chart (standard)"  style="background-color: white; width: 50%"> |<img src="img/mplcairo_output.png" alt="Kurdish bar chart (mplcairo)"  style="background-color: white; width: 50%"> |

Two things can be observed by comparing the two bar charts:

1. Arabic script text is reversed, because the bidirectional algorithm is not applied.
2. Complex rendering is not applied, so all Arabic characters appear in their isolated forms, rather than taking initial, medial or final forms as required.

A set of hacks has been used in the past for Hebrew, Arabic and Persian. To address the first problem, [python-bidi](https://github.com/MeirKriheli/python-bidi) provides a Python implementation of the Unicode bidirectional algorithm. It converts a logically ordered string into a visually ordered one, so that RTL languages can display in packages or terminals that do not have bidirectional support.
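
As a rough illustration of the difference between logical and visual order, the sketch below uses only the standard library. It is not `python-bidi`'s API and not the full bidirectional algorithm: `naive_visual_order` is a hypothetical helper that only handles a run made up entirely of RTL characters, by reversing it.

```python
import unicodedata as ud

# The word "Kurd" in logical order, i.e. the order in which it is typed and read:
# KEHEH, WAW, REH, DAL
logical = "\u06A9\u0648\u0631\u062F"

# Bidirectional class of each character; "AL" means Arabic Letter (strongly RTL).
classes = [ud.bidirectional(c) for c in logical]
print(classes)

def naive_visual_order(s):
    """Hypothetical helper: reverse a run that contains only RTL characters.

    Real text mixes directions, digits and punctuation, which is what the
    full bidirectional algorithm is for.
    """
    if not all(ud.bidirectional(c) in ("R", "AL") for c in s):
        raise ValueError("only pure RTL runs are handled in this sketch")
    return s[::-1]

# A renderer without bidi support draws code points left to right, so the
# string has to be stored reversed for the word to look right on screen.
visual = naive_visual_order(logical)
print([f"U+{ord(c):04X}" for c in visual])
```

Reordering alone does not join the letters. Each character is still drawn in its isolated form unless the renderer also performs shaping.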

The second problem, for Arabic and Persian (and a few other languages), is addressed by another hack: [arabic-reshaper](https://github.com/mpcabd/python-arabic-reshaper/). It is used alongside `python-bidi`, which reorders the string visually, while the reshaper replaces the isolated character glyphs with the Unicode presentation forms for the initial, medial and final glyphs of each character. The presentation forms are not intended to be used in production; they were added for backwards compatibility with older standards. Additionally, not all Arabic characters have presentation forms in Unicode, and most RTL scripts have none, so this approach has limited language support.

Sorani Kurdish is one of the Arabic script languages that cannot be supported by hacks using `python-bidi` and `arabic-reshaper`.
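
The gap in the presentation forms can be checked with the standard library's `unicodedata` module. The sketch below is an illustration only and does not show how `arabic-reshaper` works internally. `presentation_forms` is a hypothetical helper that scans the two Arabic Presentation Forms blocks for code points that decompose to a single given letter.

```python
import unicodedata as ud

# Arabic Presentation Forms-A and Arabic Presentation Forms-B.
PRESENTATION_RANGES = [(0xFB50, 0xFDFF), (0xFE70, 0xFEFF)]
FORM_TAGS = {"<isolated>", "<initial>", "<medial>", "<final>"}

def presentation_forms(letter):
    """Hypothetical helper: presentation forms that decompose to `letter` alone."""
    target = f"{ord(letter):04X}"
    found = []
    for start, end in PRESENTATION_RANGES:
        for cp in range(start, end + 1):
            # Decompositions look like "<initial> 0628"; ligatures list
            # several code points and are skipped here.
            parts = ud.decomposition(chr(cp)).split()
            if len(parts) == 2 and parts[0] in FORM_TAGS and parts[1] == target:
                found.append((f"U+{cp:04X}", parts[0]))
    return found

# BEH is shared with Arabic; the other three letters are used in Sorani Kurdish.
for letter in ("\u0628", "\u0695", "\u06B5", "\u06CE"):
    print(f"U+{ord(letter):04X}", ud.name(letter), presentation_forms(letter))
```

The letter BEH has four presentation forms, while the three Sorani letters checked here have none, so a reshaper that relies only on presentation forms has nothing to substitute for them.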

The text rendering issues in Matplotlib also affect other Middle Eastern scripts, as well as African, South Asian and South East Asian scripts. They also affect Latin and Cyrillic script text where variant glyphs, stylistic sets or localised features are needed for correct rendering. The existing hacks are therefore not a general solution.

Matplotlib can use alternative backends. One of the newer backends, which needs to be installed separately, is [mplcairo](https://github.com/matplotlib/mplcairo). `mplcairo` provides an alternative `cairo` backend for Matplotlib, which can make use of [Raqm](https://github.com/HOST-Oman/libraqm) for complex text layout.

In the following code, we will use `mplcairo` to create a simple bar chart containing Sorani Kurdish labels.

## Setup

In [7]:
import pandas as pd
import locale, platform
import mplcairo
import matplotlib as mpl
if platform.system() == "Darwin":
    mpl.use("module://mplcairo.macosx")
else:
    mpl.use("module://mplcairo.qt")
import matplotlib.pyplot as plt
import unicodedata as ud, regex as re

## Helper functions

In [8]:
def convert_digits(s, sep = (",", ".")):
    nd = re.compile(r'^-?\p{Nd}[,.\u066B\u066C\u0020\u2009\p{Nd}]*$')
    tsep, dsep = sep
    if nd.match(s):
        s = s.replace(tsep, "")
        s = ''.join([str(ud.decimal(c, c)) for c in s])
        if dsep in s:
            return float(s.replace(dsep, ".")) if dsep != "." else float(s)
        return int(s)
    return s

seps = ("\u066C", "\u066B")
digitsconv = lambda x: convert_digits(x.replace("-", "٠"), sep = seps)
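
To make the conversion step easier to follow, here is a simplified sketch that uses only the standard library. It is not the `convert_digits` function above: `parse_localized_number` is a hypothetical name, and it assumes the input is already known to be a number written with the Arabic thousands separator (U+066C) and decimal separator (U+066B).

```python
import unicodedata as ud

ARABIC_THOUSANDS = "\u066C"  # Arabic thousands separator
ARABIC_DECIMAL = "\u066B"    # Arabic decimal separator

def parse_localized_number(s, tsep=ARABIC_THOUSANDS, dsep=ARABIC_DECIMAL):
    """Hypothetical helper: parse a number written with any Unicode decimal digits."""
    s = s.replace(tsep, "")
    # Map each decimal digit to its ASCII value; leave other characters as they are.
    ascii_s = "".join(str(ud.decimal(c)) if c.isdecimal() else c for c in s)
    if dsep in ascii_s:
        return float(ascii_s.replace(dsep, "."))
    return int(ascii_s)

# 3,381,000 written with Eastern Arabic digits and thousands separators.
print(parse_localized_number("\u0663\u066C\u0663\u0668\u0661\u066C\u0660\u0660\u0660"))  # 3381000
# 12.5 written with Eastern Arabic digits and the Arabic decimal separator.
print(parse_localized_number("\u0661\u0662\u066B\u0665"))  # 12.5
```

Unlike `convert_digits`, this sketch does not validate its input with a regular expression and does not pass non-numeric strings through unchanged.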

## Process data and plot data

In [9]:
import pandas as pd
conv = {
    'سووریا': digitsconv,
    'عێراق': digitsconv,
    'ئێران': digitsconv,
    'تورکیا': digitsconv,
    'جیھانی': digitsconv
}
df = pd.read_table("../data/demographics.tsv", converters=conv)
df

| | `---` | جیھانی | تورکیا | ئێران | عێراق | سووریا |
|---|---|---|---|---|---|---|
| 0 | باشوور | 3381000 | 0 | 3381000 | 0 | 0 |
| 1 | ھەورامی | 54000 | 0 | 26000 | 28000 | 0 |
| 2 | سۆرانی | 1576000 | 0 | 502000 | 567000 | 0 |
| 3 | کۆیگشتی | 26712000 | 15016000 | 4398000 | 3916000 | 1661000 |
| 4 | زازایی-ئەلڤێکا | 184000 | 179000 | 0 | 0 | 0 |
| 5 | زازایی-دەملی | 1125000 | 1125000 | 0 | 0 | 0 |
| 6 | شکاکی | 49000 | 23000 | 26000 | 0 | 0 |
| 7 | ڕەوەند | 90000 | 38000 | 20000 | 33000 | 0 |
| 8 | ئەوانەیبەتورکیدەدوێن | 5732000 | 5732000 | 0 | 0 | 0 |
| 9 | کرمانجی | 14419000 | 7919000 | 443000 | 3185000 | 1661000 |


In [10]:
col_list=["تورکیا" ,"ئێران" ,"عێراق" ,"سووریا"]

total_df = df[col_list].sum(axis=0)
print(total_df)

تورکیا    30032000
ئێران      8796000
عێراق      7729000
سووریا     3322000
dtype: int64


`total_df` is a Pandas series. When using Matplotlib, we can either use the series index and values directly, or convert each to a list and use the lists.

In [11]:
y = list(total_df.values)
x = list(total_df.index)
print(x, y)

['تورکیا', 'ئێران', 'عێراق', 'سووریا'] [30032000, 8796000, 7729000, 3322000]


Using indices and values:

In [12]:
plt.rcParams.update({'font.family':'Vazirmatn'})
fig, ax = plt.subplots()
ax.bar(total_df.index, total_df.values, color='royalblue', alpha=0.7)
ax.grid(color='#95a5a6', linestyle='--', linewidth=2, axis='y', alpha=0.7)
ax.set_xlabel("ناوچە", size=14)
ax.set_ylabel(" ڕێژەی دانیشتووان", size=14)
ax.set_title('ڕێژەی دانیشتووانی کورد', size=18)
#fig.subplots_adjust(left=0.20)
plt.show()

This will generate the following plot:

![Kurdish bar chart example using mplcairo](img/mplcairo_output.png)

## Resources

Data taken from [Sorani Kurdish Wikipedia](https://ckb.wikipedia.org/wiki/%D8%AF%D8%A7%D9%86%DB%8C%D8%B4%D8%AA%D9%88%D9%88%D8%A7%D9%86%DB%8C_%DA%A9%D9%88%D8%B1%D8%AF#%DA%A9%D9%88%D8%B1%D8%AF%D8%B3%D8%AA%D8%A7%D9%86)

Refer to:

* [Eastern Arabic numerals](https://en.wikipedia.org/wiki/Eastern_Arabic_numerals)
* [mplcairo](https://github.com/matplotlib/mplcairo)
* [Raqm](https://github.com/HOST-Oman/libraqm)