## <p style="text-align:center" color="red"><span style="color:red">Sentiment analysis on the Quran Karim Dataset - French version</span></p>

<table align="center">
  <td>
    <a target="_blank" href="https://colab.research.google.com/github/labrijisaad/Sentiment-analysis-on-the-Quran-Karim-dataset"><img src="https://www.tensorflow.org/images/colab_logo_32px.png" />Run in Google Colab</a>
  </td>
</table>

- In this notebook, I label the sentiment of each verse in a French Quran Karim dataset using the **`pre-trained CamemBERT`** model, then carry those labels over to the parallel Wolof dataset.
- Model documentation: [**`pre-trained CamemBERT`**](https://camembert-model.fr/#:~:text=CamemBERT%3A%20a%20Tasty%20French%20Language,achieves%20strong%20results%20in%20NLI.)


> 🙌 Notebook made by [@labriji_saad](https://github.com/labrijisaad)

### Importing the requirements

In [1]:
%%capture
import pandas as pd
import numpy as np
from yaspin import yaspin

### Reading the **`Wolof Quran Karim Dataset`**

In [2]:
df_wolof = pd.read_csv("Quran Karim wolof.csv", encoding='utf-8-sig')
df_wolof

Unnamed: 0,index sourat,index verse,name sourat in wolof,verse in wolof
0,1,1,ubbiku ga saar wu jëkk wi,"Ci turu Yàlla, miy Yërëmaakoon , di Jaglewaak..."
1,1,2,ubbiku ga saar wu jëkk wi,"Xeeti cant yépp ñeel na Yàlla, miy Boroom àdd..."
2,1,3,ubbiku ga saar wu jëkk wi,"Yërëmaakoon bi, Jaglewaakoon bi,"
3,1,4,ubbiku ga saar wu jëkk wi,"Di Buur, di Boroom Bis-pénc ba."
4,1,5,ubbiku ga saar wu jëkk wi,"Yaw doŋŋ la nuy jaamu, te ci Yaw doŋŋ doŋŋ la ..."
...,...,...,...,...
6231,114,2,nit ñi,"Kiy Buurub nit ñi,"
6232,114,3,nit ñi,"Di Yàlla nit ñi,"
6233,114,4,nit ñi,"ci ayu jax-jaxali, Saytaane,"
6234,114,5,nit ñi,"kiy jax-jaxal ci biir dënni nit ñi,"


### Reading the **`French Quran Karim Dataset`**

In [3]:
df_french = pd.read_csv("Quran Karim french.csv", encoding='utf-8')
df_french

Unnamed: 0,index sourat,index verse,name sourat in french,verse in french
0,1,1,al fatiha,au nom d allah le tout miséricordieux le très...
1,1,2,al fatiha,louange à allah seigneur de l univers
2,1,3,al fatiha,le tout miséricordieux le très miséricordieux
3,1,4,al fatiha,maître du jour de la rétribution
4,1,5,al fatiha,c est toi seul que nous adorons et c est toi ...
...,...,...,...,...
6231,114,2,an nas,le souverain des hommes
6232,114,3,an nas,dieu des hommes
6233,114,4,an nas,contre le mal du mauvais conseiller furtif
6234,114,5,an nas,qui souffle le mal dans les poitrines des hom...


### Merging the two datasets

In [4]:
# The columns are combined row by row, which assumes both files list the verses in the same order.
index_sourat = df_french['index sourat'].tolist()
index_verse = df_french['index verse'].tolist()
verse_in_french = df_french['verse in french'].tolist()
name_sourat_in_french = df_french['name sourat in french'].tolist()

verse_in_wolof = df_wolof['verse in wolof'].tolist()
name_sourat_in_wolof = df_wolof['name sourat in wolof'].tolist()

df = pd.DataFrame()
df["index sourat"] = index_sourat
df["index verse"] = index_verse
df["name sourat in french"] = name_sourat_in_french
df["name sourat in wolof"] = name_sourat_in_wolof
df["verse in french"] = verse_in_french
df["verse in wolof"] = verse_in_wolof

df

Unnamed: 0,index sourat,index verse,name sourat in french,name sourat in wolof,verse in french,verse in wolof
0,1,1,al fatiha,ubbiku ga saar wu jëkk wi,au nom d allah le tout miséricordieux le très...,"Ci turu Yàlla, miy Yërëmaakoon , di Jaglewaak..."
1,1,2,al fatiha,ubbiku ga saar wu jëkk wi,louange à allah seigneur de l univers,"Xeeti cant yépp ñeel na Yàlla, miy Boroom àdd..."
2,1,3,al fatiha,ubbiku ga saar wu jëkk wi,le tout miséricordieux le très miséricordieux,"Yërëmaakoon bi, Jaglewaakoon bi,"
3,1,4,al fatiha,ubbiku ga saar wu jëkk wi,maître du jour de la rétribution,"Di Buur, di Boroom Bis-pénc ba."
4,1,5,al fatiha,ubbiku ga saar wu jëkk wi,c est toi seul que nous adorons et c est toi ...,"Yaw doŋŋ la nuy jaamu, te ci Yaw doŋŋ doŋŋ la ..."
...,...,...,...,...,...,...
6231,114,2,an nas,nit ñi,le souverain des hommes,"Kiy Buurub nit ñi,"
6232,114,3,an nas,nit ñi,dieu des hommes,"Di Yàlla nit ñi,"
6233,114,4,an nas,nit ñi,contre le mal du mauvais conseiller furtif,"ci ayu jax-jaxali, Saytaane,"
6234,114,5,an nas,nit ñi,qui souffle le mal dans les poitrines des hom...,"kiy jax-jaxal ci biir dënni nit ñi,"
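
The merge above pairs the two files by row position. As a minimal sketch of an alternative, assuming the pair (`index sourat`, `index verse`) uniquely identifies a verse in both files, the frames could instead be joined on those keys so that a reordered or missing row is caught rather than silently misaligned. The small frames below are hand-made stand-ins that reuse a few verses shown above, not the real CSV files.

```python
import pandas as pd

# Hand-made stand-ins for df_french and df_wolof (three verses copied from the tables above).
french = pd.DataFrame({
    "index sourat": [1, 1, 114],
    "index verse": [3, 4, 3],
    "name sourat in french": ["al fatiha", "al fatiha", "an nas"],
    "verse in french": [
        "le tout miséricordieux le très miséricordieux",
        "maître du jour de la rétribution",
        "dieu des hommes",
    ],
})

# The Wolof rows are deliberately listed in a different order.
wolof = pd.DataFrame({
    "index sourat": [114, 1, 1],
    "index verse": [3, 4, 3],
    "name sourat in wolof": ["nit ñi", "ubbiku ga saar wu jëkk wi", "ubbiku ga saar wu jëkk wi"],
    "verse in wolof": [
        "Di Yàlla nit ñi,",
        "Di Buur, di Boroom Bis-pénc ba.",
        "Yërëmaakoon bi, Jaglewaakoon bi,",
    ],
})

keys = ["index sourat", "index verse"]

# validate="one_to_one" raises an error if a key is duplicated on either side.
merged = french.merge(wolof, on=keys, how="inner", validate="one_to_one")

# An inner join drops unmatched verses, so compare row counts to detect gaps.
assert len(merged) == len(french), "some French verses have no Wolof counterpart"

print(merged[["index sourat", "index verse", "verse in french", "verse in wolof"]])
```

If the two files really are in the same order, this produces the same pairing as the positional merge; the key-based join only adds a safety check.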


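> We now have a Wolof-French parallel dataset. The next step is to run a pre-trained sentiment analysis model on the **`verse in french`** column. Because each French verse sits on the same row as its Wolof counterpart, the predicted label also applies to the Wolof verse: **in this way we get a labeled Wolof dataset.**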

## <p style="text-align:center" color="red"><span style="color:red">Sentiment analysis with a pre-trained CamemBERT model</span></p>


### Importing the necessary libraries

In [5]:
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification
from transformers import pipeline
from tqdm import tqdm

In [6]:
tokenizer = AutoTokenizer.from_pretrained("tblard/tf-allocine")
model = TFAutoModelForSequenceClassification.from_pretrained("tblard/tf-allocine")
nlp = pipeline('sentiment-analysis', model=model, tokenizer=tokenizer)

All model checkpoint layers were used when initializing TFCamembertForSequenceClassification.

All the layers of TFCamembertForSequenceClassification were initialized from the model checkpoint at tblard/tf-allocine.
If your task is similar to the task the model of the checkpoint was trained on, you can already use TFCamembertForSequenceClassification for predictions without further training.


In [7]:
# Sanity check: run the model on a single sentence
print(nlp("Deux choses donnent la paix en ce monde : l’honnêteté et la piété")) # sentence extracted from the dataset.

[{'label': 'POSITIVE', 'score': 0.9348116517066956}]
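
The output above is a list containing one dictionary per input text, with a `label` and a `score`. As a small sketch of how both fields can be read by key, the snippet below works on a copy of that printed output rather than calling the model. The helper name `extract_label_and_score` is hypothetical and not part of the notebook.

```python
# Copy of the result printed above: one dict per input text.
sample_output = [{'label': 'POSITIVE', 'score': 0.9348116517066956}]

def extract_label_and_score(result):
    """Return (label, score) for the first (and here only) input text."""
    first = result[0]
    return first["label"], float(first["score"])

label, score = extract_label_and_score(sample_output)
print(label, round(score, 3))
```

Keeping the score next to the label is optional, but it would allow low-confidence predictions to be reviewed separately later.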


### Sentiment analysis with the CamemBERT model

In [8]:
tqdm.pandas()

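def predict_with_CamemBERT(text):
    # The pipeline returns a list with one dict per input text,
    # e.g. [{'label': 'POSITIVE', 'score': 0.93}]. Read the label by key
    # instead of slicing the string representation of the result.
    sentiment_result = nlp(text)
    return sentiment_result[0]['label']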

df['sentiment with CamemBERT-french model'] = df['verse in french'].progress_apply(predict_with_CamemBERT)

100%|██████████████████████████████████████████████████████████████████████████████| 6236/6236 [24:08<00:00,  4.31it/s]
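
The run above calls the model once per verse. A possible variation, sketched here under the assumption that the classifier accepts a list of strings and returns one dict per string (as the Hugging Face pipeline does), is to send the verses in chunks. The function `label_in_batches` and the stub `fake_classify` are hypothetical names introduced for illustration; the stub stands in for `nlp` so the sketch runs without downloading the model. Whether this is actually faster depends on the hardware and has not been measured here.

```python
def label_in_batches(texts, classify, batch_size=32):
    """Classify texts chunk by chunk and return one label per text, in order."""
    labels = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        results = classify(batch)  # expected: one {'label': ..., 'score': ...} dict per text
        labels.extend(result["label"] for result in results)
    return labels

def fake_classify(batch):
    # Hypothetical stand-in for the real pipeline: always answers POSITIVE.
    return [{"label": "POSITIVE", "score": 1.0} for _ in batch]

# Three verses copied from the French dataset preview.
verses = [
    "dieu des hommes",
    "le souverain des hommes",
    "maître du jour de la rétribution",
]
labels = label_in_batches(verses, fake_classify, batch_size=2)
print(labels)
```

With the notebook's objects, `classify` would be `nlp` and `texts` would be `df['verse in french'].tolist()`.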


In [9]:
df

Unnamed: 0,index sourat,index verse,name sourat in french,name sourat in wolof,verse in french,verse in wolof,sentiment with CamemBERT-french model
0,1,1,al fatiha,ubbiku ga saar wu jëkk wi,au nom d allah le tout miséricordieux le très...,"Ci turu Yàlla, miy Yërëmaakoon , di Jaglewaak...",POSITIVE
1,1,2,al fatiha,ubbiku ga saar wu jëkk wi,louange à allah seigneur de l univers,"Xeeti cant yépp ñeel na Yàlla, miy Boroom àdd...",POSITIVE
2,1,3,al fatiha,ubbiku ga saar wu jëkk wi,le tout miséricordieux le très miséricordieux,"Yërëmaakoon bi, Jaglewaakoon bi,",POSITIVE
3,1,4,al fatiha,ubbiku ga saar wu jëkk wi,maître du jour de la rétribution,"Di Buur, di Boroom Bis-pénc ba.",POSITIVE
4,1,5,al fatiha,ubbiku ga saar wu jëkk wi,c est toi seul que nous adorons et c est toi ...,"Yaw doŋŋ la nuy jaamu, te ci Yaw doŋŋ doŋŋ la ...",POSITIVE
...,...,...,...,...,...,...,...
6231,114,2,an nas,nit ñi,le souverain des hommes,"Kiy Buurub nit ñi,",POSITIVE
6232,114,3,an nas,nit ñi,dieu des hommes,"Di Yàlla nit ñi,",POSITIVE
6233,114,4,an nas,nit ñi,contre le mal du mauvais conseiller furtif,"ci ayu jax-jaxali, Saytaane,",POSITIVE
6234,114,5,an nas,nit ñi,qui souffle le mal dans les poitrines des hom...,"kiy jax-jaxal ci biir dënni nit ñi,",POSITIVE
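
The preview above shows only the first and last rows, so it says little about how the labels are spread over the whole dataset. One way to check, sketched here on a made-up four-row sample rather than on the real results, is to count the values of the label column.

```python
import pandas as pd

label_column = "sentiment with CamemBERT-french model"

# Made-up sample for illustration only; these are not the notebook's actual predictions.
sample = pd.DataFrame({label_column: ["POSITIVE", "NEGATIVE", "POSITIVE", "POSITIVE"]})

counts = sample[label_column].value_counts()                # absolute counts per label
shares = sample[label_column].value_counts(normalize=True)  # fraction of rows per label

print(counts)
print(shares)
```

On the real data, the same two calls would be applied to `df[label_column]`.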


### Saving the dataset

In [10]:
# Write the labeled dataset to disk, then reload it to confirm the file reads back correctly.
df.to_csv("Quran Karim wolof-labeled.csv", encoding="utf-8-sig", index=False)
df = pd.read_csv("Quran Karim wolof-labeled.csv", encoding='utf-8-sig')
df

Unnamed: 0,index sourat,index verse,name sourat in french,name sourat in wolof,verse in french,verse in wolof,sentiment with CamemBERT-french model
0,1,1,al fatiha,ubbiku ga saar wu jëkk wi,au nom d allah le tout miséricordieux le très...,"Ci turu Yàlla, miy Yërëmaakoon , di Jaglewaak...",POSITIVE
1,1,2,al fatiha,ubbiku ga saar wu jëkk wi,louange à allah seigneur de l univers,"Xeeti cant yépp ñeel na Yàlla, miy Boroom àdd...",POSITIVE
2,1,3,al fatiha,ubbiku ga saar wu jëkk wi,le tout miséricordieux le très miséricordieux,"Yërëmaakoon bi, Jaglewaakoon bi,",POSITIVE
3,1,4,al fatiha,ubbiku ga saar wu jëkk wi,maître du jour de la rétribution,"Di Buur, di Boroom Bis-pénc ba.",POSITIVE
4,1,5,al fatiha,ubbiku ga saar wu jëkk wi,c est toi seul que nous adorons et c est toi ...,"Yaw doŋŋ la nuy jaamu, te ci Yaw doŋŋ doŋŋ la ...",POSITIVE
...,...,...,...,...,...,...,...
6231,114,2,an nas,nit ñi,le souverain des hommes,"Kiy Buurub nit ñi,",POSITIVE
6232,114,3,an nas,nit ñi,dieu des hommes,"Di Yàlla nit ñi,",POSITIVE
6233,114,4,an nas,nit ñi,contre le mal du mauvais conseiller furtif,"ci ayu jax-jaxali, Saytaane,",POSITIVE
6234,114,5,an nas,nit ñi,qui souffle le mal dans les poitrines des hom...,"kiy jax-jaxal ci biir dënni nit ñi,",POSITIVE


> - 🙌 Notebook made by [@labriji_saad](https://github.com/labrijisaad)
> - 🔗 LinkedIn [@labriji_saad](https://www.linkedin.com/in/labrijisaad/)