#Nadeem Pipeline

1. The project takes a sentence from the user that expresses the user's feelings. 
2. The sentence is inputed into a sentiment analysis model.
3. The sentiment output is taken, and a random verse (بيت شعر) is printed based on the label.
4. Finally, the meter (البحر الشعري للبيت) is predicted for the verse using a contextual language model. Namely, RoBERTa trained on classical Arabic poetry (الشعر الجاهلي) developed by the team to utilize for further poetry tasks.

![picture](https://drive.google.com/uc?export=view&id=1NhFOAC947HXuqkmUdhfV8G6HucA66x66)






#Nadeem Datasets
The Training data is available on github through this [Link](https://github.com/reemalfarwan/nadeem/tree/main/Datasets)

#Nadeem Models
1. AraRoBERTa_Poem. 
- Colab Page:  [link](https://colab.research.google.com/drive/1XIcJuiV7pXoO_G9_BnYhKeiekK0IAWio?usp=sharing)
- Available on Hugginingface: [Link](https://huggingface.co/reemalyami/AraRoBERTa_Poem)

2. AraRoBERTa_Poem_Classification

- Colab Page:  [link](https://drive.google.com/file/d/1RbdXGMt3rVn0dLnFUjMDF5VMdM1zB8o-/view?usp=sharing)
- Available on Hugginingface: [Link](https://huggingface.co/reemalyami/AraRoBERTa_Poem_classification)

###Team Members
1. Ahmed AlZoman. (Ahmadalzoman@gmail.com)
2. Reem AlFarwan. (alfarwan.reem@gmail.com)
3. Abdurahman AlShanqiti. (abdulrahmansh31@gmail.com)

###Install Dependencies 

In [None]:
# Install `transformers` from master
!pip install git+https://github.com/huggingface/transformers
!pip list | grep -E 'transformers|tokenizers'

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://github.com/huggingface/transformers
  Cloning https://github.com/huggingface/transformers to /tmp/pip-req-build-berhcdfj
  Running command git clone -q https://github.com/huggingface/transformers /tmp/pip-req-build-berhcdfj
  Installing build dependencies ... [?25l[?25hdone
  Getting requirements to build wheel ... [?25l[?25hdone
    Preparing wheel metadata ... [?25l[?25hdone
Collecting huggingface-hub<1.0,>=0.1.0
  Downloading huggingface_hub-0.7.0-py3-none-any.whl (86 kB)
[K     |████████████████████████████████| 86 kB 4.1 MB/s 
Collecting pyyaml>=5.1
  Downloading PyYAML-6.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_12_x86_64.manylinux2010_x86_64.whl (596 kB)
[K     |████████████████████████████████| 596 kB 17.0 MB/s 
Collecting tokenizers!=0.11.3,<0.13,>=0.11.1
  Downloading tokenizers-0.12.1-cp37-cp37m-manylinux_2_12_x86_64.manylinu

###Import Libraries

In [None]:
import pandas as pd

import csv
import re # for regular expression
import string
import numpy as np 

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn import svm

from sklearn.model_selection import cross_val_score, cross_val_predict, KFold
from sklearn.metrics  import confusion_matrix, classification_report
from sklearn import metrics
from sklearn.preprocessing import StandardScaler


### Load Training Data

The SVM Model is used to develop a sentiment analysis model, the process takes few seconds only :)

In [None]:
df = pd.read_csv('https://raw.githubusercontent.com/reemalfarwan/nadeem/main/Datasets/saudi_sentiment.csv')

In [None]:
df.head()

Unnamed: 0,id,tweet,label
0,1.529675e+18,علموه ان المفارق علي المشتاق شين وعلموه ان كان...,2
1,1.529671e+18,صاح قلبي وكان وده يتبعك وطاحت دموعي لان الحظ ش...,2
2,1.529662e+18,طارق الحبيب انسحب عليه لا عاد يقدم برامج ولا ...,2
3,1.52966e+18,ليش محد قال اسمي معقوله لهالدرجه شين,2
4,1.529653e+18,شف البعض منهم البعض يعميهم الريال حتي لو هو شي...,2


In [None]:
data= df['tweet']
target= df['label']

In [None]:
#conevrt the text into numeric values using TF-IDF

tfidf_vectorizer=TfidfVectorizer(use_idf=True, max_df=0.95)
text_feature_set=tfidf_vectorizer.fit_transform(data)

In [None]:
# Split the data to training and testing 
X_train, X_test, y_train, y_test = train_test_split(text_feature_set, target, test_size=0.20, random_state=0)
print(X_train.shape,X_test.shape,y_train.shape,y_test.shape)


(1602, 8061) (401, 8061) (1602,) (401,)


In [None]:
#Train the Model 

import time 
start = time.time()
classifier_svm = svm.SVC(kernel='linear', C=1, probability=True, verbose=True).fit(X_train,y_train)

print("SVM accuracy: %.2f"%classifier_svm.score(X_test, y_test))
end = time.time()
print(end-start)

[LibSVM]SVM accuracy: 0.93
1.2685844898223877


In [None]:
#Input a phrase in mind like: ياكبر الفرح في قلبي مبسوط مره

phrase = input("أكتب جملة تصف سعادتك أو حزنك: ")

أكتب جملة تصف سعادتك أو حزنك: كبر الفرح في قلبي مبسوط مره


In [None]:
#The phrase sentiment is predicted
text_feature_set=tfidf_vectorizer.transform([phrase])

result = classifier_svm.predict(text_feature_set)

In [129]:
result

array([1])

In [None]:
#Load the Verses and the meters data. Alos, the Meter prediction model (AraRoBERTa_Poem_classification)

from transformers import pipeline


df_verse = pd.read_csv('https://raw.githubusercontent.com/reemalfarwan/nadeem/main/Datasets/arabic_poems_verses.csv')#call the verse file 
df_meter_names = pd.read_csv('https://raw.githubusercontent.com/reemalfarwan/nadeem/main/Datasets/meter_labels.csv')#call the meter names file (labels) 
df_meter_names["label"] = pd.to_numeric(df_meter_names["label"])


def predict_meter(txt):
  #This function calls the finetuned model to predict the verse (البيت الشعري) metre (الوزن الشعري)
  #NOTE: The team developed both the pretrained LM and finetuned it on the classification task

  classifier = pipeline("text-classification", model='reemalyami/AraRoBERTa_Poem_classification')
  verse_predction = classifier("txt")

  #1. It returns a list similar to [{'label': 'LABEL_1', 'score': 0.4}]
  #2. To extract the number the list is parsed using regx
  predicted_label_string = verse_predction[0]['label']
  predicted_label_number = int(re.findall("\d", predicted_label_string)[0]) #it returns a list similar to ['1'] and the needed number is at the 1st index
  meter_name = df_meter_names['verse_name'].loc[df_meter_names['label']==predicted_label_number]

  print('The Suitable Verse:', txt, '\n'*2, 'The Predicted Meter of the Verse', meter_name)
  #print('***'*5, verse_name, '***'*5)

In [130]:
df_verse.head()

Unnamed: 0,verse,Label
0,قالَ السَماءُ كَئيبَةٌ وَتَجَهَّما قُلتُ اِبتَ...,1
1,قالَ الصِبا وَلّى فَقُلتُ لَهُ اِبتَسِم لَن يُ...,1
2,قُلتُ اِبتَسِم وَاِطرَب فَلَو قارَنتَها قَضَّي...,1
3,قُلتُ اِبتَسِم ما أَنتَ جالِبَ دائِها وَشِفائِ...,1
4,قُلتُ اِبتَسِم لَم يَطلُبوكَ بِذَمِّهِم لَو لَ...,1


In [128]:
#output a Verse (بيت شعر) sutiable to the user sentiment, based on the user input label. Also, the predicted verse meter (بحر بيت الشعر) 

verse = ''
if result[0] == 1:
  df_pos = df_verse.loc[df_verse['Label'] == 1]
  verse = df_pos['verse'].sample()
  predict_meter(verse)
elif result[0] == 2:
  df_neg = df_verse.loc[df_verse['Label'] == 2]
  verse = df_neg['verse'].sample()
  predict_meter(verse)
else:
  print('Check your input, please.')

The Suitable Verse: 85    فَما لِحَوادِثِ الدُنيا بَقاءُ وَكُن رَجُلاً ع...
Name: verse, dtype: object 

 The Predicted Meter of the Verse 0    الطويل
Name: verse_name, dtype: object
