April/May 2022
# LAW3027 - Tutorial 3: Legal NER and Text Categorization

#### Intended Learning Outcomes (ILOs)

By the end of this notebook, you will be able to:

- Extract and visualize legal entities from case law
- Classify certain provisions of case law


#### Library to be used in today's tutorial: Blackstone

Blackstone is a spaCy model and library for processing long-form, unstructured legal text. It is an experimental research project from ICLR&D, the research lab of the Incorporated Council of Law Reporting for England and Wales, and was written by Daniel Hoadley. For more information, please refer to https://github.com/ICLRandD/Blackstone . You don't need to install this library, as it is already installed in the virtual environment that was set up for Tutorial 2 on Contract Automation.

#### Virtual Environment

You don't need to create a new virtual environment for this tutorial. You can activate the virtual environment that you created in the previous tutorial using the `environment.yml` file from https://github.com/maastrichtlawtech/law3027-advanced-legal-analytics/blob/main/environment.yml. Just make sure that you remember the name of that virtual environment. Run the following commands in the terminal (macOS) or Anaconda Prompt (Windows) to activate the environment and start Jupyter Notebook:

 `conda activate name_of_the_environment`
 
 `jupyter notebook`



#### Prerequisite: Revise programming concepts: Lists, For Loops and Functions
Before you start, watch the following videos and practice their examples to refresh the concepts of lists, functions and for loops, which you will use throughout the course.

Lists: https://youtu.be/tw7ror9x32s 

Functions: https://youtu.be/NSbOtYzIQI0 

For Loops: https://youtu.be/OnDr4J2UXSA 

Using `append()` with a list: https://youtu.be/5IEhquZghp0 

## 1. Named Entity Recognition (NER)

You have learned about NER in Chapter 3 of this DataCamp course: https://app.datacamp.com/learn/courses/introduction-to-natural-language-processing-in-python

### Recognizing Legal Entities 

The NER component of the Blackstone model has been trained to detect the following entity types:

| Entity        | Name           |Examples     |
| ------------- |-------------|-------------|
| CASENAME    | Case names |e.g. Smith v Jones, In re Jones, In Jones' case |
| CITATION      | Citations (unique identifiers for reported and unreported cases)     |e.g. (2002) 2 Cr App R 123 |
| INSTRUMENT | Written legal instruments     |    e.g. Theft Act 1968, European Convention on Human Rights, CPR |
| PROVISION | Unit within a written legal instrument   | e.g. section 1, art 2(3) |
| COURT | Court or tribunal   | e.g. Court of Appeal, Upper Tribunal |
| JUDGE | References to judges |e.g. Eady J, Lord Bingham of Cornhill |


In [None]:
import spacy
# Load the blackstone model
nlp = spacy.load("en_blackstone_proto")

#### 1.1 Apply the Blackstone model to the text

In [None]:
text = """ 31 As we shall explain in more detail in examining the submission of the Secretary of State (see paras 77 and following), it is the Secretary of State’s case that nothing has been done by Parliament in the European Communities Act 1972 or any other statute to remove the prerogative power of the Crown, in the conduct of the international relations of the UK, to take steps to remove the UK from the EU by giving notice under article 50EU for the UK to withdraw from the EU Treaty and other relevant EU Treaties. The Secretary of State relies in particular on Attorney General v De Keyser’s Royal Hotel Ltd [1920] AC 508 and R v Secretary of State for Foreign and Commonwealth Affairs, Ex p Rees-Mogg [1994] QB 552; he contends that the Crown’s prerogative power to cause the UK to withdraw from the EU by giving notice under article 50EU could only have been removed by primary legislation using express words to that effect, alternatively by legislation which has that effect by necessary implication. The Secretary of State contends that neither the ECA 1972 nor any of the other Acts of Parliament referred to have abrogated this aspect of the Crown’s prerogative, either by express words or by necessary implication.
"""
# Apply the model to the text
doc = nlp(text)

#### 1.2 Print the legal entities and their corresponding labels

In [None]:
# Iterate through the entities identified by the model
for ent in doc.ents:
    print(ent.text, ent.label_)

#### 1.3 What values are returned by `ent.text` and `ent.label_`, respectively?



#### 1.4 Now let's write a function called  `perform_entity_recognition()`  to perform legal entity recognition for a given text. 

The function should take a `string` as input and return a dataframe with two columns: `entity_name` and `entity_label`. See the detailed comments below and complete the code for each comment.
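The zipping step named in the comments below can be tried out on its own first. This is only a minimal sketch: the two lists are made-up examples taken from the entity table above, not real model output, and the variable names `names`, `labels` and `pairs` are illustrative.

```python
# Toy data for illustration only; these are not real Blackstone outputs.
names = ["Theft Act 1968", "section 1", "Court of Appeal"]
labels = ["INSTRUMENT", "PROVISION", "COURT"]

# zip() pairs the i-th name with the i-th label;
# list() turns the pairs into a list of (name, label) tuples.
pairs = list(zip(names, labels))

for name, label in pairs:
    print(name, label)
```

A list of tuples like `pairs` can be passed straight to `pd.DataFrame(...)` together with a `columns=[...]` argument, which is what the last two comments in the function ask for.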

In [None]:
import pandas as pd
def perform_entity_recognition(text):
    #load the blackstone model in the nlp variable
    nlp = spacy.load("en_blackstone_proto")
    #apply the blackstone model to the input text
    
    #create an empty list called list_ents_names to store the entity names
    
    #create an empty list called list_ents_labels to store the entity labels
    
    #iterate over doc.ents and append each entity's text and label to list_ents_names and list_ents_labels respectively

    
    #zip the two lists to create a new list called list_names_labels
    
    
    #create a dataframe from the list, list_names_labels. Name the columns of the dataframe: `entity_name` and `entity_label`

    #return the dataframe
    

#### 1.5 Call the `perform_entity_recognition()` function on the above text and check the output

#### 1.6

Now take a piece of text (or an entire court decision) from any court decision on the BAILII (https://www.bailii.org/) website and identify the legal entities in it. Are all the recognized entities correct? What kinds of errors did you encounter when running the `perform_entity_recognition()` function on a real-world case from BAILII?



#### 1.7
Refer to the reading here: https://docs.microsoft.com/en-us/legal/cognitive-services/language-service/cner-characteristics-and-limitations . Identify at least one example each of a True Positive, a False Positive and a False Negative for your piece of text.
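One possible way to organize this check is to compare the entities you marked by hand with the entities the model returned, using Python sets. The sketch below rests on assumptions: both sets are hypothetical and hand-written, and the names `gold` and `predicted` are illustrative.

```python
# Hypothetical example: neither set comes from a real run of the model.
# gold      = (entity text, label) pairs you identified by reading the text
# predicted = (entity text, label) pairs returned by the model
gold = {
    ("Theft Act 1968", "INSTRUMENT"),
    ("section 1", "PROVISION"),
    ("Eady J", "JUDGE"),
}
predicted = {
    ("Theft Act 1968", "INSTRUMENT"),
    ("section 1", "PROVISION"),
    ("Parliament", "COURT"),
}

true_positives = gold & predicted    # found by the model and correct
false_positives = predicted - gold   # found by the model but wrong
false_negatives = gold - predicted   # missed by the model

print(len(true_positives), len(false_positives), len(false_negatives))
```

In this sketch a pair counts as correct only if both the text and the label match exactly, which is a strict choice; you may decide to count partial matches differently.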

### 2 Visualizing Legal Entities

spaCy ships with an excellent set of visualisers, including a visualiser for NER. Blackstone comes with a custom colour palette that makes it easier to distinguish entities in the source text when using displacy. We will visualize legal entities using spaCy's displacy visualizer. For more information please refer to: https://github.com/ICLRandD/Blackstone 


In [None]:
from spacy import displacy
from blackstone.displacy_palette import ner_displacy_options

#### 2.1 Visualize the identified legal entities in the above text

In [None]:
# Make sure the model is already in memory before applying it: nlp = spacy.load("en_blackstone_proto")
doc = nlp(text)

# Call displacy and pass `ner_displacy_options` to its `options` parameter


### 3 Text Classification or Categorization

In contrast with the NER component (which has been trained to identify tokens and series of tokens of interest), the text categoriser classifies longer spans of text, such as sentences. For more information please refer to: https://github.com/ICLRandD/Blackstone 

The Text Categoriser has been trained to classify text according to one of five mutually exclusive categories, which are as follows:

|Category     | Description |
|--------------|-------------|
|AXIOM     |The text appears to postulate a well-established principle|
|CONCLUSION|The text appears to make a finding, holding, determination or conclusion |
|ISSUE     |The text appears to discuss an issue or question |
|LEGAL_TEST| The text appears to discuss a legal test|
|UNCAT     |The text does not fall into any of the four categories above |


Blackstone's text categoriser generates a predicted categorisation for a `doc`. The `textcat` pipeline component has been designed to be applied to individual sentences rather than a single document consisting of many sentences.

We first need to create a helper function to identify the highest scoring prediction generated by the text categoriser. How will we do this?

In [None]:
def get_top_category(doc): # function takes a spaCy doc as input
    """
    Function to identify the highest scoring category
    prediction generated by the text categoriser. 
    """
    cats = doc.cats
    max_score = max(cats.values()) 
    max_cats = [k for k, v in cats.items() if v == max_score] # identify the key in the cats dictionary where the value (of dictionary) is equal to the max_score
    max_cat = max_cats[0]
    return (max_cat, max_score)
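
To see what each intermediate variable in `get_top_category()` holds, the same steps can be traced on a hand-written dictionary. This is a sketch under assumptions: the scores below are invented and only mimic the shape of `doc.cats`; they were not produced by Blackstone. The last lines show a more compact alternative, using `max()` with a `key`, that picks the same category.

```python
# Hypothetical scores, shaped like the dictionary stored in doc.cats.
# These numbers are made up and do not come from the Blackstone model.
cats = {"AXIOM": 0.10, "CONCLUSION": 0.05, "ISSUE": 0.80, "LEGAL_TEST": 0.03, "UNCAT": 0.02}

max_score = max(cats.values())                             # the highest score
max_cats = [k for k, v in cats.items() if v == max_score]  # every category with that score
max_cat = max_cats[0]                                      # the first such category
print(max_cat, max_score)

# Compact alternative: take the key whose value is largest.
top_cat = max(cats, key=cats.get)
print(top_cat, cats[top_cat])
```

Both versions return the first category in dictionary order if two categories happen to share the top score.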



#### 3.1

Call this function on a sample text and use `print()` to display the values of `cats`, `max_score`, `max_cats` and `max_cat`, so that you understand what the function does. You may repeat this with different texts until the logic of the function is completely clear.

In [None]:
text = """It is a well-established principle of law that the transactions of independent states between each other are governed by other laws than those which municipal courts administer. \
It is, however, in my judgment, insufficient to react to the danger of over-formalisation and “judicialisation” simply by emphasising flexibility and context-sensitivity. \
The question is whether on the facts found by the judge, the (or a) proximate cause of the loss of the rig was “inherent vice or nature of the subject matter insured” within the meaning of clause 4.4 of the Institute Cargo Clauses (A)."""

In [None]:
# Apply the model to the text
doc = nlp(text)

Remember that the text categoriser component of Blackstone has been designed to be applied to individual sentences rather than to a single document. So, in order to use this function, we first need to split the text into sentences. 

#### 3.2 Extract the sentences from the `text` variable above

In [None]:
# Get the sentences in the passage of text

#### 3.3

Now write a function to identify the category of each sentence in a text. 

Hint: The function takes the text (`string`) as an input argument, and you can reuse the previous code cell to split the text into sentences. Then think about how you will identify the category of each sentence: apply the `get_top_category()` function to each sentence in turn.

In [None]:
def predict_sentence_category(input_text):
    #complete the code here
    pass  # placeholder so the cell runs; replace it with your own code

### 4. Legislation Linker 

Blackstone's Legislation Linker attempts to couple a reference to a PROVISION to its parent INSTRUMENT by using the NER model to identify the presence of an INSTRUMENT and then navigating the dependency tree to identify the child provision.

Once Blackstone has identified a PROVISION:INSTRUMENT pair, it will attempt to generate target URLs to both the provision and the instrument on legislation.gov.uk.

#### 4.1 

Refer to the documentation here: https://github.com/ICLRandD/Blackstone and extract the links for various provisions in the text below.

In [None]:
import spacy
from blackstone.utils.legislation_linker import extract_legislation_relations
nlp = spacy.load("en_blackstone_proto")

text = "The Secretary of State was at pains to emphasise that, if a withdrawal agreement is made, it is very likely to be a treaty requiring ratification and as such would have to be submitted for review by Parliament, acting separately, under the negative resolution procedure set out in section 20 of the Constitutional Reform and Governance Act 2010. Theft is defined in section 1 of the Theft Act 1968"



#### 4.2

Now try to extract the legislation links from another text taken from a court decision or a piece of legislation. Check whether the identified links are correct. Count the number of True Positives, False Positives and False Negatives.