# Rhetorical Roles

In [NER](https://github.com/d-saikrishna/NLP/blob/master/NER/NER_Learnings.ipynb), we saw how a word or phrase can be identified as an entity (ORG, PERSON, etc.). But sometimes an entire sentence or group of sentences plays a role in the document (FACT, ANALYSIS, etc.). Identifying these rhetorical roles (RR) is another NLP task.

## Use-Cases:
1. Extractive summarization of documents: extracting the salient sentences that summarize the document.
2. Abstractive summarization of documents: generating concise text summaries.
3. Prediction (of judgments, etc.)

These use cases can also be handled without RR, but including RR in the model input has been shown to improve summarisation and prediction performance.

# Opennyai RR
    
Opennyai defines 12 rhetorical roles, assigned at the sentence level. Every sentence of a court judgment is therefore labelled either NONE or one of: PREAMBLE, FACT, RULING BY LOWER COURT, ISSUE, ARGUMENT_PETITIONER, ARGUMENT_RESPONDENT, ANALYSIS, STATUTE, PRECEDENT_RELIED, PRECEDENT_NOT_RELIED, RATIO, RULING BY PRESENT COURT. This can be understood as a **multi-class classification problem**. Definitions of these RR classes can be found here: [Link](https://opennyai.readthedocs.io/en/latest/rr/rr_structure.html)


Currently, not every class/RR is classified equally well. PREAMBLE, ISSUE and NONE sentences are well identified, while PRECEDENT_NOT_RELIED, RATIO and ARGUMENT_RESPONDENT are poorly identified. **The overall F1 score of the model is 0.79.**
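
To make the gap between per-class and overall scores concrete, here is a minimal sketch of how per-class F1 can be computed for a multi-class labelling task. The labels below are toy values invented for illustration, not model output, and the text above does not say whether the reported 0.79 is a micro, macro or weighted average; the macro average here is just one possible choice.

```python
from collections import Counter

def per_class_f1(y_true, y_pred):
    """Return {label: F1}, computed one-vs-rest for every label seen."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted this label, but it was wrong
            fn[gold] += 1  # the true label was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores

# Toy gold and predicted labels (hypothetical, for illustration only)
y_true = ["FACT", "FACT", "ISSUE", "RATIO", "NONE", "RATIO"]
y_pred = ["FACT", "ANALYSIS", "ISSUE", "ANALYSIS", "NONE", "RATIO"]

scores = per_class_f1(y_true, y_pred)
macro_f1 = sum(scores.values()) / len(scores)  # unweighted mean over classes

for label in sorted(scores):
    print(f"{label:10s} {scores[label]:.2f}")
print(f"{'macro F1':10s} {macro_f1:.2f}")
```

A single aggregate number can hide classes that are rarely predicted correctly, which is why the per-class view matters for roles such as RATIO.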

In [6]:
from opennyai import RhetoricalRolePredictor
from opennyai.utils import Data
from opennyai import Pipeline

In [3]:
# Sample court judgments
with open('SampleTexts/sample_judgment1.txt') as f:
    text1 = f.read()
with open('SampleTexts/sample_judgment2.txt') as f:
    text2 = f.read()

# You can also load your own text files directly into this list
texts_to_process = [text1, text2]

# Create a Data object to pre-process the texts before running the ML models
data = Data(texts_to_process, preprocessing_nlp_model='en_core_web_trf')

# Other pre-processing models available: en_core_web_md, en_core_web_sm (fastest but less accurate)

ℹ Pre-processing will happen on CPU!


The code above performs the following pre-processing steps:

1. Separating preamble from judgment text
2. Sentence splitting of judgment text
3. Convert upper case words in preamble to title case
4. Replace newline characters within a sentence with space in judgment text

More on pre-processing, including chunking: [Link](https://opennyai.readthedocs.io/en/latest/preprocessing/preprocessing.html)
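
Steps 3 and 4 are plain string transformations, so they can be sketched with the standard library. This is an illustrative approximation and not Opennyai's actual implementation: the function names, the regular expressions and the sample strings are all assumptions. Steps 1 and 2 rely on the spaCy model and are not reproduced here.

```python
import re

def title_case_upper_words(preamble: str) -> str:
    """Convert fully upper-case words (two or more letters) to title case."""
    return re.sub(r"\b[A-Z]{2,}\b", lambda m: m.group(0).capitalize(), preamble)

def join_wrapped_lines(sentence: str) -> str:
    """Replace newlines (and whitespace around them) inside a sentence with one space."""
    return re.sub(r"\s*\n\s*", " ", sentence).strip()

# Hypothetical inputs, for illustration only
preamble = "IN THE HIGH COURT OF DELHI"
sentence = "The appellants filed\nan appeal before the High Court."

clean_preamble = title_case_upper_words(preamble)
clean_sentence = join_wrapped_lines(sentence)
print(clean_preamble)
print(clean_sentence)
```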

In [4]:
# If you have access to GPU then set this to True else False
use_gpu = False

In [7]:
pipeline = Pipeline(components=['Rhetorical_Role'], use_gpu=use_gpu, verbose=True)

ℹ Loading Rhetorical Role...
ℹ Rhetorical Roles will use CPU!


Some weights of the model checkpoint at bert-base-uncased were not used when initializing BertModel: ['cls.predictions.decoder.weight', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.bias', 'cls.predictions.transform.dense.bias', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight']
- This IS expected if you are initializing BertModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing BertModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Downloading: "https://storage.googleapis.com/indianlegalbert/OPEN_SOURCED_FILES/Rhetorical_Role_Benchmark/Model/model.pt" to /home/krishna/.opennyai/rhetor


In [8]:
results = pipeline(data)

json_result_doc_1 = results[0]

ℹ Preprocessing rhetorical role model input!!!


100%|█████████████████████████████████████████████| 2/2 [00:17<00:00,  8.51s/it]
Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


ℹ Processing documents with rhetorical role model!!!


100%|█████████████████████████████████████████████| 2/2 [00:28<00:00, 14.36s/it]


In [17]:
json_result_doc_1['annotations'][32]

{'start': 3693,
 'end': 3804,
 'text': 'Being aggrieved thereby, the appellants filed an appeal under Section 37 of the 1996 Act before the High Court.',
 'labels': ['FAC'],
 'id': '3560773625804cd987910559ac33ab22_32'}
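
Since each annotation is a dictionary with `text` and `labels` keys, the result can be post-processed with ordinary Python. The sketch below counts labels and pulls out the sentences for chosen roles, which is one way RR output could feed an extractive summary. The helper names are hypothetical, and `sample_doc` is a stand-in that mirrors the structure shown above: its first entry is copied from that output, the second is invented.

```python
from collections import Counter

def label_counts(doc):
    """Count how many sentences received each rhetorical-role label."""
    return Counter(label for ann in doc["annotations"] for label in ann["labels"])

def sentences_with_roles(doc, roles):
    """Return the text of every sentence whose labels include any of `roles`."""
    wanted = set(roles)
    return [ann["text"] for ann in doc["annotations"] if wanted & set(ann["labels"])]

# Stand-in for json_result_doc_1 (second entry is hypothetical)
sample_doc = {
    "annotations": [
        {
            "start": 3693,
            "end": 3804,
            "text": "Being aggrieved thereby, the appellants filed an appeal "
                    "under Section 37 of the 1996 Act before the High Court.",
            "labels": ["FAC"],
            "id": "3560773625804cd987910559ac33ab22_32",
        },
        {"text": "Heard learned counsel for the parties.", "labels": ["NONE"]},
    ]
}

counts = label_counts(sample_doc)
facts = sentences_with_roles(sample_doc, ["FAC"])
print(counts)
print(facts)
```

With real pipeline output, the same helpers would be called on `results[0]` (or any other element of `results`) instead of `sample_doc`.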

# References
1. [EkStep RR](https://github.com/Legal-NLP-EkStep/rhetorical-role-baseline)