In this tutorial, I am going to experiment with the `Table` class from the `datascience` module to do some exploratory analysis of the Quran. Let us get started.

First, I used a [Quran](https://www.kaggle.com/hammaadali/quran-clean-without-araab) dataset already available on Kaggle and added it to my input folder.

Below is the standard Kaggle prefix.

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

Next, I have to install the `datascience` module using `pip`. Full documentation of this module can be found [here](http://data8.org/datascience/).

In [None]:
!pip install datascience

In [None]:
from datascience import *

Here is the first step: reading the table using the `read_table` method.

In [None]:
q = Table.read_table('/kaggle/input/quran-clean-without-araab/Quran-clean-without-aarab.csv')

In [None]:
q

With `Table` we can show a certain number of rows from the top, for example:

In [None]:
q.show(3)

Similarly, we can select only certain columns:

In [None]:
q.select('Ayah')

How many rows are there in this Quranic table? In other words, how many Ayaat are there in total in the Quran?

In [None]:
q.num_rows

Let us see the last row. Rows are indexed from zero, so the last one is at index `num_rows - 1`.

In [None]:
q.row(q.num_rows-1)

## Counting Words

Let us see how you can `apply` a function to each row of the Table.

Here is a function that given a sentence will return the number of words in that sentence.

In [None]:
def count_words(item):
    return len(item.split())
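
Before applying it to the whole table, here is a quick sanity check of `count_words` on a few strings. The sample strings are made up for illustration and are not taken from the dataset.

```python
def count_words(item):
    # str.split() with no arguments splits on any run of whitespace
    return len(item.split())

samples = [
    "one two three",        # single spaces
    "  one   two  three ",  # extra spaces are ignored by split()
    "",                     # an empty string has no words
]

counts = [count_words(s) for s in samples]
print(counts)  # [3, 3, 0]
```

Because `split()` collapses repeated whitespace, stray double spaces in a verse would not inflate its word count.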

Thus, we can `apply` the function to a column of the table as follows:

In [None]:
q.apply(count_words, 'Ayah')

Given this, we can write the following handy code and assign the result to a new table called `qwc`, meaning Quran with word counts.

In [None]:
qwc = q.with_columns("words", q.apply(count_words, 'Ayah'))
qwc

Let me rename the first column from unnamed to `SrNo`.

In [None]:
qwc = qwc.relabeled(0,'SrNo')
qwc

In [None]:
qwc.row(qwc.num_rows-1)

## Find English translation

Notice that the data source we got does not have an English translation. But Kaggle has other [data sources](https://www.kaggle.com/mohamedwaelbishr2018/qurancsv) that have English translations. Sounds like a good use case for `join`ing two tables. Let us do that.

In [None]:
en = Table.read_table('/kaggle/input/qurancsv/Quran.csv')
en

As you can see, we have the English translation under a column called "EnglishTranslation". Here is how to relabel that field (which is at index 4, counting from zero) to a shorter column name.

In [None]:
en = en.relabeled(4, 'en')
en

In [None]:
en = en.select(['SrNo', 'en'])
en

I noticed a problem while checking the `SrNo` of both datasets. The first dataset `qwc` has `SrNo` starting from zero, while the second, `en`, starts from 1. So, let us create a new column `sr` in the second dataset, so that both tables share the same ID values to join on.

In [None]:
en = en.with_columns('sr', en.column('SrNo')-1).drop('SrNo')
en

In [None]:
en.row(en.num_rows-1)

Everything seems to be in place, so let us join. It works as follows: join the first table `qwc`, using `SrNo` as the ID, with the table `en`, which has the same ID under the column `sr`. This appends all remaining columns of `en` to a new table, which I chose to name `quran`.

In [None]:
quran = qwc.join('SrNo', en, 'sr')
quran
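
To make the key alignment concrete, here is a minimal plain-Python sketch of the same idea on toy data. The rows are made up, and the sketch assumes that `join` keeps only rows whose key appears in both tables; it is not how the `datascience` module implements it.

```python
# Toy stand-ins for the two tables (made-up rows, not real data).
qwc_rows = [
    {"SrNo": 0, "words": 4},
    {"SrNo": 1, "words": 4},
    {"SrNo": 2, "words": 2},
]
en_rows = [
    {"SrNo": 1, "en": "first translation"},
    {"SrNo": 2, "en": "second translation"},
    {"SrNo": 3, "en": "third translation"},
]

# Shift the 1-based key down by one so both sides are 0-based.
en_by_sr = {row["SrNo"] - 1: row["en"] for row in en_rows}

# Keep only rows whose key exists on both sides.
joined = [
    {**row, "en": en_by_sr[row["SrNo"]]}
    for row in qwc_rows
    if row["SrNo"] in en_by_sr
]

# If the keys line up, the join loses no rows.
print(len(joined) == len(qwc_rows))
for row in joined:
    print(row)
```

On the real tables, comparing `quran.num_rows` with `qwc.num_rows` is a similar check that no verse was dropped by the join.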

For convenience, let us relabel a few more of the long field names.

In [None]:
quran = quran.relabeled(1, 'sno').relabeled(2,'vno')
quran

## Display a verse

So, let us see how to find verse 2:255 in the Quran.

In [None]:
quran.where('sno', 2).where('vno',255)

Another example: find the first verse of every surah.

In [None]:
quran.where('vno', 1)

## Conditional selectivity

With the above example, we need to be mindful of the fact that the Arabic text in this dataset includes the `bismillah` as the first part of verse no. 1 of each surah, whereas the English translation does not. This has implications for the word counts.

So, let us adjust the `words` count of the first verse of each surah by discounting 4 words (which is the number of words in the Basmalah). One caveat is that we do not want to apply this discount to the first chapter of the Quran because, as we know, the first verse of Surah al-Fatiha is itself the basmalah.

The same goes for Surah at-Tawbah, which does not have the basmalah at the start and hence should not be discounted.

So, let us define a function that does exactly that.

In [None]:
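def discount_basmalah(sura_no, verse_no, words):
    # Surah 1 (the basmalah is its first verse) and surah 9 (no basmalah) stay unchanged
    if sura_no == 1 or sura_no == 9:
        return words
    # The first verse of every other surah carries 4 extra basmalah words
    if verse_no == 1:
        return words - 4
    return words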

All that remains now is to apply that function to each row of the `quran` table.

In [None]:
quran = quran.with_columns('wc', quran.apply(discount_basmalah,'sno','vno', 'words'))
quran
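
As a quick sanity check of `discount_basmalah`, the sketch below runs it on one case per branch. The word counts passed in are made-up values for illustration, not figures read from the dataset.

```python
def discount_basmalah(sura_no, verse_no, words):
    # Surah 1 and surah 9 are never discounted
    if sura_no == 1 or sura_no == 9:
        return words
    # First verse of any other surah: remove the 4 basmalah words
    if verse_no == 1:
        return words - 4
    return words

# (sura_no, verse_no, words) -> expected adjusted count
cases = [
    ((1, 1, 4), 4),      # al-Fatiha: left as-is
    ((9, 1, 10), 10),    # at-Tawbah: left as-is
    ((2, 1, 5), 1),      # first verse elsewhere: minus 4
    ((2, 255, 50), 50),  # any later verse: left as-is
]

results = [discount_basmalah(*args) for args, _ in cases]
print(results)  # [4, 10, 1, 50]
```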

## Makki or Madani

Let us bring in another interesting table about the place of revelation, i.e., Makki or Madani. I have brought in the [data](https://www.kaggle.com/abdulbaqi/quran-makki-madani), so let us join it with our table.

In [None]:
q_place = Table.read_table('/kaggle/input/quran-makki-madani/quran-toc.csv')
q_place

Let us just select the two columns that interest us when it comes to joining.

In [None]:
q_select = q_place.select(['No.','Place'])
q_select

Everything is ready, so let us do it.

In [None]:
quran = quran.join('sno',q_select,'No.')
quran

We can run a quick analysis of the verse lengths by using the `sort` function.

In [None]:
quran.sort('wc', descending=True)

The above tells us that Madani surahs generally have longer verses, and that among the Meccan surahs, verse 20 of surah al-Muzzammil (sno. 73) is the longest (and the 4th longest in the Quran).

If we are interested in the longest verses in **Meccan** surahs only, we apply the filter first and then sort, as follows.

In [None]:
quran.where('Place', 'Meccan').sort('wc', descending=True)

The `words` column in the `quran` table seems redundant, so we can drop it.

In [None]:
quran2 = quran.drop('words')
quran2

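With a little logic, we can add another column that counts the number of letters in each verse. The `len` function counts all characters in a verse, including the **whitespace**. A verse with n words separated by single spaces contains n - 1 spaces, so we subtract the number of words minus one. Note that the code below uses the adjusted `wc` column, while the Ayah text of a surah's first verse still contains the basmalah, so the letter count of those verses is slightly overstated. Let us do it.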

In [None]:
lc = quran2.apply(len, 'Ayah')-quran2.column('wc')+1

In [None]:
quran2 = quran2.with_columns('lc', lc)
quran2
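
The subtraction used for the `lc` column relies on words being separated by exactly one space. The sketch below compares it with a variant that removes all whitespace before counting. Both helper names and the sample strings are hypothetical, introduced only for this illustration.

```python
def letters_by_subtraction(text):
    # len() counts spaces too; n words imply n - 1 single spaces
    words = len(text.split())
    return len(text) - (words - 1)

def letters_by_joining(text):
    # Drop every whitespace character, then count what is left
    return len("".join(text.split()))

single_spaced = "abc de f"
print(letters_by_subtraction(single_spaced), letters_by_joining(single_spaced))  # 6 6

# A double space and a trailing space break the n - 1 assumption
messy = "abc  de f "
print(letters_by_subtraction(messy), letters_by_joining(messy))  # 8 6
```

If the two functions disagree on some verses of the real data, those verses contain extra whitespace and the joining variant is the safer count.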

As before, I am curious to know which verses have the most letters.

In [None]:
quran2.sort('lc', descending=True)

Having the letter counts would enable lots more analysis on the stylistic properties of the Quran. For example, what is the average size of a single word in the Quran?

In [None]:
np.array(quran2.column('lc')).sum()/np.array(quran2.column('wc')).sum()

If we want to be more detailed, we can repeat the same for Meccan and Medinan surahs.

In [None]:
qmeccan = quran2.where('Place','Meccan').select('wc','lc')
np.array(qmeccan.column('lc')).sum()/np.array(qmeccan.column('wc')).sum()

In [None]:
qmedinan = quran2.where('Place','Medinan').select('wc','lc')
np.array(qmedinan.column('lc')).sum()/np.array(qmedinan.column('wc')).sum()

It shows that, on average, Quranic words are around 4.3 letters long, and that Medinan surahs have slightly longer words, though the difference is small.
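
The averages above divide the total number of letters by the total number of words, which weights every word equally. Averaging the per-verse ratios instead weights every verse equally and can give a different number. Here is a small sketch with made-up counts to show the difference.

```python
from statistics import mean

# Made-up word and letter counts for three verses
wc = [2, 10, 4]
lc = [6, 50, 12]

# Ratio of sums: average letters per word over all words
overall = sum(lc) / sum(wc)

# Mean of ratios: average of each verse's own letters-per-word
per_verse = mean(l / w for l, w in zip(lc, wc))

print(overall)    # 4.25
print(per_verse)  # about 3.67
```

The long verse dominates the ratio of sums, which is the behaviour wanted when asking about the size of a typical word rather than a typical verse.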

## Saving the File

Above was just scratching the surface of what we can do with the Quranic dataset. I will leave the rest for you. Here, I am going to save the Quran Table as a csv file. 

In [None]:
quran.to_csv('quran-en-ar-place.csv')