# Predicting Judicial Decisions of the European Court of Human Rights

In this notebook, we train a classification model that labels cases as 'violation' or 'non-violation'.
The cases were downloaded from HUDOC and organised by the article they fall under.

In [1]:
from sklearn.linear_model import LogisticRegression
import numpy as np
import re
import os
import copy
from pandas import DataFrame

To read our dataset, we use os.walk to traverse the directory tree and load all of our training data and labels. We skip the folder 'both', as the files inside are labelled as both violation and non-violation.
The data set is loaded into two dictionaries keyed by article: the values are lists of cases (X, our training set) or lists of labels (Y).

In [2]:
def read_dataset(PATH):
    X_dataset = {}
    Y_dataset = {}
    for path, dirs, files in os.walk(PATH):
        for filename in files:
            fullpath = os.path.join(path, filename)
            if "both" not in fullpath:
                with open(fullpath, 'r', encoding="utf8") as file:
                    X_dataset, Y_dataset = add_file_to_dataset(fullpath, X_dataset, Y_dataset, file.read())

    return X_dataset, Y_dataset       

In [3]:
def add_file_to_dataset(fullpath, x_dataset, y_dataset, file):
    article = extract_article(fullpath)
    file = preprocess(file)
    if article not in x_dataset.keys() :
        x_dataset[article] = []
        y_dataset[article] = []
    x_dataset[article] = x_dataset[article] + [file]
    label = 0 if "non-violation" in fullpath else 1
    y_dataset[article] = y_dataset[article] + [label]
    return x_dataset, y_dataset  
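
The loading logic above can be sketched end to end on a throwaway directory. This is a minimal sketch under stated assumptions: the layout `<root>/<ArticleN>/<violation|non-violation|both>/<case>.txt` and the helper name `load_cases` are hypothetical, and the checks run on the path relative to the root so that the name of the temporary directory cannot interfere.

```python
import os
import re
import tempfile


def load_cases(root):
    """Group case texts and labels by article, skipping the 'both' folder."""
    texts, labels = {}, {}
    for path, _dirs, files in os.walk(root):
        for filename in files:
            fullpath = os.path.join(path, filename)
            relpath = os.path.relpath(fullpath, root)
            if "both" in relpath:
                continue  # ambiguous cases are left out
            match = re.search(r"(Article\d+)", relpath)
            if match is None:
                continue  # file is not stored under an article folder
            article = match.group(1)
            with open(fullpath, "r", encoding="utf8") as handle:
                texts.setdefault(article, []).append(handle.read())
            # 0 = non-violation, 1 = violation, as in the notebook
            labels.setdefault(article, []).append(0 if "non-violation" in relpath else 1)
    return texts, labels


# Hypothetical layout, built only for this demonstration.
with tempfile.TemporaryDirectory() as root:
    for outcome in ("violation", "non-violation", "both"):
        folder = os.path.join(root, "Article2", outcome)
        os.makedirs(folder)
        with open(os.path.join(folder, "case.txt"), "w", encoding="utf8") as handle:
            handle.write("PROCEDURE ... " + outcome)
    texts, labels = load_cases(root)

print(sorted(labels["Article2"]))  # the case stored under 'both' is excluded
```

Using `setdefault` avoids the explicit "is this key already present" check, and appending in place avoids building a new list for every file.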

We use a regular expression to extract the article identifier (for example, 'Article2') from the full path; it is the dictionary key under which the case is stored.

In [4]:
def extract_article(path): 
    pattern = r"(Article\d+)"
    result = re.search(pattern, path)
    article = result.group(1)
    return article
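
Two details of the path handling are worth seeing on concrete inputs: `re.search` returns `None` when a path contains no article folder, and the label test must look for "non-violation" because "violation" is a substring of it. The sketch below uses hypothetical paths and hypothetical helper names (`extract_article_safe`, `label_from_path`); it is not the notebook's own code.

```python
import re


def extract_article_safe(path):
    """Return the article key (e.g. 'Article10'), or None if the path has none."""
    match = re.search(r"(Article\d+)", path)
    return match.group(1) if match else None


def label_from_path(path):
    """0 for non-violation, 1 for violation; checks the longer string first."""
    return 0 if "non-violation" in path else 1


# Hypothetical paths for illustration only.
paths = [
    "train/Article10/non-violation/001.txt",
    "train/Article2/violation/002.txt",
    "train/misc/003.txt",
]
for path in paths:
    print(path, extract_article_safe(path), label_from_path(path))
```

Testing for "violation" alone would label every file as a violation, since both folder names contain that string.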

### Preprocessing 

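As in the research paper this work is based on, we use only the PROCEDURE and THE FACTS sections of each case as training data. Later sections, such as THE LAW, contain the Court's reasoning and its conclusion, so including them would let the model read the outcome directly and bias the results.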

In [5]:
def preprocess(file): 
    file = extract_paragraphs(file)
    return file

In [6]:
def extract_paragraphs(file): 
    pat = r'((PROCEDURE|procedure\n|THE FACTS|the facts\n).+?)(III|THE LAW|the law\n|PROCEEDINGS|ALLEGED VIOLATION OF ARTICLE)'
    result = re.search(pat, file, re.S)
    return result.group(1)
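
To see what the pattern keeps, here is a small sketch on a made-up case text. It reuses the notebook's pattern unchanged; the wrapper name `extract_paragraphs_safe` and the toy text are assumptions for illustration. The wrapper returns `None` instead of raising when no heading is found.

```python
import re

PATTERN = (
    r'((PROCEDURE|procedure\n|THE FACTS|the facts\n).+?)'
    r'(III|THE LAW|the law\n|PROCEEDINGS|ALLEGED VIOLATION OF ARTICLE)'
)


def extract_paragraphs_safe(text):
    """Return the text from the first heading up to the first terminator, or None."""
    # re.S lets '.' match newlines; '.+?' is non-greedy, so it stops at the
    # first terminator it meets.
    match = re.search(PATTERN, text, re.S)
    return match.group(1) if match else None


# Made-up case text, for illustration only.
toy_case = (
    "PROCEDURE\n1. The case originated in an application.\n"
    "THE FACTS\n2. The applicant was born in 1970.\n"
    "THE LAW\n3. The Court finds a violation.\n"
)

extracted = extract_paragraphs_safe(toy_case)
print(repr(extracted))
print(extract_paragraphs_safe("no headings here"))
```

Because the match is non-greedy, any earlier occurrence of a terminator such as "III" inside the facts would cut the extract short, which is worth checking on real cases.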

### Loading the data

In [7]:
base_path = "Datasets\\Human rights dataset"

In [8]:
X_train_docs, Y_train = read_dataset(base_path + "\\train")
#X_extra_test_docs, Y_extra_test = read_dataset(base_path + "\\test_violations")

### Viewing our data

In [9]:
X_train_docs["Article2"][0][:100]

'PROCEDURE\n1.\xa0\xa0The case originated in an application (no. 12773/03) against the Republic of Bulgaria '

In [10]:
X_train_docs["Article2"][1][:100]

'PROCEDURE\n1.\xa0\xa0The case originated in an application (no. 42980/04) against the Republic of Bulgaria '

In [11]:
len(X_train_docs["Article2"])

114

In [12]:
article2 = {'Article2': X_train_docs['Article2'], 'Label': Y_train['Article2']}

In [13]:
article2_df = DataFrame(article2, columns= ['Article2', 'Label'])
article2_df.head()

Unnamed: 0,Article2,Label
0,PROCEDURE\n1. The case originated in an appli...,0
1,PROCEDURE\n1. The case originated in an appli...,0
2,PROCEDURE\n1. The case originated in an appli...,0
3,PROCEDURE\n1. The case originated in an appli...,0
4,PROCEDURE\n1. The case originated in an appli...,0


In [14]:
article2_df.sample(frac=1)

Unnamed: 0,Article2,Label
47,PROCEDURE\n1. The case originated in an appli...,0
65,PROCEDURE\n1. The case originated in an appli...,1
38,PROCEDURE\n1. The case originated in an appli...,0
26,PROCEDURE\n1. The case originated in an appli...,0
85,PROCEDURE\n1. The case originated in an appli...,1
...,...,...
98,PROCEDURE\n1. The case originated in an appli...,1
14,PROCEDURE\n1. The case originated in an appli...,0
51,PROCEDURE\n1. The case originated in an appli...,0
78,PROCEDURE\n1. The case originated in an appli...,1


### Dividing our training set into training and validation

We will divide our training set into a training and validation set (90%, 10%). 

Following Medvedeva, M., Vols, M. & Wieling, M. Artif Intell Law (2019), we also remove the articles that contain too few cases (fewer than 50 in the code below). We include Article 11 "as an estimate of how well the model performs when only very few cases are available".

In [15]:
from sklearn.model_selection import train_test_split

Article 2 example:

In [16]:
X_train_article, X_test_article, Y_train_article, Y_test_article = train_test_split(X_train_docs["Article2"], Y_train["Article2"], test_size=0.10, random_state=42)


In [17]:
X_train_article[0][:100]

'PROCEDURE\n1.\xa0\xa0The case originated in seventeen applications (nos. 24093/14, 24104/14, 24106/14, 2410'

In [18]:
len(X_train_article), len(X_test_article), len(X_train_docs["Article2"])

(102, 12, 114)
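
The sizes above follow from simple arithmetic. The sketch below assumes that a fractional test size is rounded up to a whole number of cases, which matches the 102/12 split observed here; `split_sizes` is a hypothetical helper, not a scikit-learn function.

```python
import math


def split_sizes(n_samples, test_fraction=0.10):
    """Return (n_train, n_test), assuming the test count is rounded up."""
    n_test = math.ceil(test_fraction * n_samples)
    return n_samples - n_test, n_test


n_train, n_test = split_sizes(114)
print(n_train, n_test)

# With so few validation cases, one prediction moves accuracy by a large step.
print(round(100 / n_test, 1), "percentage points per case")
```

With 12 validation cases, accuracy can only take values in steps of roughly 8 percentage points, so small differences between models on a single article should be read with caution.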

The entire dataset:

In [19]:
def divide_dataset(train_set, label_set):
    # Start from empty dicts; articles with too few cases are never added.
    divided_data_train, divided_data_test = {}, {}
    divided_labels_train, divided_labels_test = {}, {}
    
    for key in train_set.keys():
        # Skip articles with fewer than 50 cases.
        if len(train_set[key]) < 50:
            print("Key skipped over: ", key)
            continue
        print("Key included: ", key)
            
        data_train_article, data_test_article, labels_train_article, labels_test_article = train_test_split(train_set[key], label_set[key], test_size=0.10, random_state=42)
        
        divided_data_train[key] = data_train_article
        divided_data_test[key] = data_test_article
        divided_labels_train[key] = labels_train_article
        divided_labels_test[key] = labels_test_article
    return divided_data_train, divided_data_test, divided_labels_train, divided_labels_test

In [20]:
X_train_docs, X_test_docs, Y_train, Y_test = divide_dataset(X_train_docs, Y_train)

Key included:  Article10
Key included:  Article11
Key skipped over:  Article12
Key included:  Article13
Key included:  Article14
Key skipped over:  Article18
Key included:  Article2
Key included:  Article3
Key skipped over:  Article4
Key included:  Article5
Key included:  Article6
Key included:  Article7
Key included:  Article8


In [21]:
print(X_train_docs.keys())

dict_keys(['Article10', 'Article11', 'Article13', 'Article14', 'Article2', 'Article3', 'Article5', 'Article6', 'Article7', 'Article8'])


In [22]:
len(X_train_docs["Article2"]), len(X_test_docs["Article2"])

(102, 12)

### Tokenization



In [23]:
from sklearn.feature_extraction.text import CountVectorizer

#### For Article 2:

In [24]:
count_vect = CountVectorizer()

In [25]:
article2_term_doc = count_vect.fit_transform(X_train_docs["Article2"])    

This is a term-document matrix. The documents we passed to the CountVectorizer form the rows, and the columns are the unique words (our vocabulary) that appear across all the documents. It is a sparse matrix in which each entry counts how many times a given word appears in a given document.
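
The counting itself can be sketched with the standard library alone. This is an illustration of the idea rather than the CountVectorizer implementation: it assumes lowercasing and a token pattern that keeps words of two or more characters, and the helper names and toy documents are hypothetical.

```python
import re
from collections import Counter

# Tokens are runs of two or more word characters (an assumed tokenization rule).
TOKEN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(doc):
    return TOKEN.findall(doc.lower())


def fit_vocabulary(docs):
    """Sorted list of the unique tokens seen in the given documents."""
    return sorted({token for doc in docs for token in tokenize(doc)})


def count_matrix(docs, vocabulary):
    """One row per document, one column per vocabulary term."""
    rows = []
    for doc in docs:
        counts = Counter(tokenize(doc))
        rows.append([counts[term] for term in vocabulary])
    return rows


train_docs = ["The applicant complained", "The Court dismissed the applicant"]
vocabulary = fit_vocabulary(train_docs)
matrix = count_matrix(train_docs, vocabulary)

# A new document is mapped onto the existing vocabulary; unknown words are ignored.
unseen = count_matrix(["The Government appealed"], vocabulary)

print(vocabulary)
print(matrix)
print(unseen)
```

The last line shows why the vocabulary must be built on the training documents only: words that first appear in the validation documents have no column and are dropped.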

In [26]:
article2_term_doc.shape

(102, 14694)

In [27]:
print(article2_term_doc.toarray())

[[ 0  1  0  0 ...  0  0  0  0]
 [ 0  0  0  0 ...  0  0  0  0]
 [ 0 16  0  0 ...  0  0  0  0]
 [ 0  0  0  0 ...  0  0  0  0]
 ...
 [ 0  0  0  0 ...  0  0  0  0]
 [10  1  0  0 ...  0  0  0  0]
 [ 1  0  0  0 ...  0  0  0  0]
 [ 0  0  0  0 ...  0  0  0  0]]


In [28]:
# Our vocabulary
features = count_vect.get_feature_names()
features[1030:1045]

['abetted',
 'abetting',
 'abide',
 'abilities',
 'ability',
 'ablaze',
 'able',
 'ablution',
 'abnormal',
 'abode',
 'abolish',
 'abolished',
 'abolishing',
 'abolition',
 'abort']

In [29]:
len(X_train_docs["Article2"]), len(features)

(102, 14694)

#### For the entire dataset:

In [30]:
def tokenize_dataset(train_set, test_set):
    dataset_vectorizer = copy.deepcopy(train_set)
    train_term_doc_list = copy.deepcopy(train_set)
    test_term_doc_list = copy.deepcopy(train_set)
    
    for key in train_set.keys():
        vect = CountVectorizer()
        article_term_doc = vect.fit_transform(train_set[key]).toarray()    
        # Reuse the vocabulary fitted on the training documents.
        test_term_doc = vect.transform(test_set[key]).toarray()
        
        dataset_vectorizer[key] = vect
        train_term_doc_list[key] = article_term_doc
        test_term_doc_list[key] = test_term_doc
    return dataset_vectorizer, train_term_doc_list, test_term_doc_list       

In [31]:
dataset_CountVectorizer, X_train, X_test = tokenize_dataset(X_train_docs, X_test_docs)

In [32]:
features = dataset_CountVectorizer["Article2"].get_feature_names()
features[1030:1045]

['abetted',
 'abetting',
 'abide',
 'abilities',
 'ability',
 'ablaze',
 'able',
 'ablution',
 'abnormal',
 'abode',
 'abolish',
 'abolished',
 'abolishing',
 'abolition',
 'abort']

### Classification

#### For Article 2:

In [33]:
classifier = LogisticRegression()
classifier.fit(X_train["Article2"], Y_train["Article2"])



LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, l1_ratio=None, max_iter=100,
                   multi_class='warn', n_jobs=None, penalty='l2',
                   random_state=None, solver='warn', tol=0.0001, verbose=0,
                   warm_start=False)

In [34]:
# Predict the output
y_test_pred = classifier.predict(X_test["Article2"])

In [35]:
import sklearn.metrics as sm

In [36]:
# Compute performance metrics
print("Logistic regression performance:")
print("Mean absolute error =", round(sm.mean_absolute_error(Y_test["Article2"], y_test_pred), 2))
# print("Mean squared error =", round(sm.mean_squared_error(Y_test["Article2"], y_test_pred), 2)) 
# print("Median absolute error =", round(sm.median_absolute_error(Y_test["Article2"], y_test_pred), 2)) 
# print("Explain variance score =", round(sm.explained_variance_score(Y_test["Article2"], y_test_pred), 2))
# print("R2 score =", round(sm.r2_score(Y_test["Article2"], y_test_pred), 2))
print("Accuracy score =", sm.accuracy_score(Y_test["Article2"], y_test_pred))
print("F1 score =", sm.f1_score(Y_test["Article2"], y_test_pred))

df = DataFrame({'Actual': Y_test["Article2"], 'Predicted': y_test_pred})
df

Logistic regression performance:
Mean absolute error = 0.42
Accuracy score = 0.5833333333333334
F1 score = 0.4444444444444445


Unnamed: 0,Actual,Predicted
0,1,0
1,0,1
2,0,0
3,1,1
4,0,0
5,0,1
6,1,0
7,1,0
8,0,0
9,0,0
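
The three scores printed above are tied together by the confusion counts. The sketch below uses counts that are an assumption: they are inferred from the reported scores and the validation-set size of 12, not read from the notebook.

```python
# Hypothetical confusion counts for the 12 Article 2 validation cases,
# chosen to be consistent with the scores printed above.
tp, fp, fn, tn = 2, 2, 3, 5
total = tp + fp + fn + tn

accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

# For 0/1 labels, every error has absolute size 1, so the mean absolute
# error is just the fraction of misclassified cases.
mae = (fp + fn) / total

print(round(accuracy, 4), round(f1, 4), round(mae, 2))
```

Since the mean absolute error equals 1 minus the accuracy for binary labels, it adds no information beyond the accuracy score.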


#### For the entire dataset:

In [37]:
def dataset_classify_LogReg(train_set, train_label_set, test_set, test_label_set):
    accuracy = 0
    
    for key in train_set.keys():
        classifier_instance = LogisticRegression()
        classifier_instance.fit(train_set[key], train_label_set[key])
        test_pred = classifier_instance.predict(test_set[key])
        
        print("Logistic regression performance for :", key)
        print("Mean absolute error =", round(sm.mean_absolute_error(test_label_set[key], test_pred), 2))
        print("Accuracy score =", round(sm.accuracy_score(test_label_set[key], test_pred), 2))
        print("F1 score =", round(sm.f1_score(test_label_set[key], test_pred), 2))
        
        accuracy = accuracy + round(sm.accuracy_score(test_label_set[key], test_pred), 2)
    print("Average accuracy: ", accuracy / len(train_set.keys()))
            

In [38]:
dataset_LogReg = dataset_classify_LogReg(X_train, Y_train, X_test, Y_test)



Logistic regression performance for : Article10
Mean absolute error = 0.36
Accuracy score = 0.64
F1 score = 0.56

Logistic regression performance for : Article11
Mean absolute error = 0.43
Accuracy score = 0.57
F1 score = 0.67

Logistic regression performance for : Article13
Mean absolute error = 0.14
Accuracy score = 0.86
F1 score = 0.82

Logistic regression performance for : Article14
Mean absolute error = 0.28
Accuracy score = 0.72
F1 score = 0.75

Logistic regression performance for : Article2
Mean absolute error = 0.42
Accuracy score = 0.58
F1 score = 0.44

Logistic regression performance for : Article3
Mean absolute error = 0.28
Accuracy score = 0.72
F1 score = 0.7

Logistic regression performance for : Article5
Mean absolute error = 0.5
Accuracy score = 0.5
F1 score = 0.59

Logistic regression performance for : Article6
Mean absolute error = 0.3
Accuracy score = 0.7
F1 score = 0.67

Logistic regression performance for : Article7
Mean absolute error = 0.4
Accuracy score = 0.6
F1 score = 0.5

Logistic regression performance for : Article8
Mean absolute error = 0.37
Accuracy score = 0.63
F1 score = 0.56
Average accuracy:  0.6519999999999999


##### 65.2% average accuracy for a simple classification model (Logistic Regression)

In [39]:
from sklearn import svm

In [40]:
def dataset_classify_LinearSVC(train_set, train_label_set, test_set, test_label_set):
    accuracy = 0
    
    for key in train_set.keys():
        classifier_instance = svm.LinearSVC()
        classifier_instance.fit(train_set[key], train_label_set[key])
        test_pred = classifier_instance.predict(test_set[key])
        
        print("Linear SVC performance for :", key)
        print("Mean absolute error =", round(sm.mean_absolute_error(test_label_set[key], test_pred), 2))
        print("Accuracy score =", round(sm.accuracy_score(test_label_set[key], test_pred), 2))
        print("F1 score =", round(sm.f1_score(test_label_set[key], test_pred), 2))
        
        accuracy = accuracy + round(sm.accuracy_score(test_label_set[key], test_pred), 2)
    print("Average accuracy: ", accuracy / len(train_set.keys()))

In [41]:
dataset_LinearSVC = dataset_classify_LinearSVC(X_train, Y_train, X_test, Y_test)

Linear SVC performance for : Article10
Mean absolute error = 0.27
Accuracy score = 0.73
F1 score = 0.67

Linear SVC performance for : Article11
Mean absolute error = 0.57
Accuracy score = 0.43
F1 score = 0.6

Linear SVC performance for : Article13
Mean absolute error = 0.14
Accuracy score = 0.86
F1 score = 0.82

Linear SVC performance for : Article14
Mean absolute error = 0.28
Accuracy score = 0.72
F1 score = 0.75

Linear SVC performance for : Article2
Mean absolute error = 0.42
Accuracy score = 0.58
F1 score = 0.44

Linear SVC performance for : Article3
Mean absolute error = 0.37
Accuracy score = 0.63
F1 score = 0.64

Linear SVC performance for : Article5
Mean absolute error = 0.53
Accuracy score = 0.47
F1 score = 0.53

Linear SVC performance for : Article6
Mean absolute error = 0.3
Accuracy score = 0.7
F1 score = 0.67

Linear SVC performance for : Article7
Mean absolute error = 0.4
Accuracy score = 0.6
F1 score = 0.5

Linear SVC performance for : Article8
Mean absolute error = 0.37
Accuracy score = 0.63
F1 score = 0.59
Average accuracy:  0.635


##### 63.5% average accuracy for a Support Vector Machine model (Linear SVC)
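
The two averages are unweighted means of the per-article accuracies, so every article counts equally regardless of how many validation cases it has. The sketch below recomputes them from the rounded scores printed in the outputs above and lists the articles on which the two models differ; the variable names are illustrative.

```python
from statistics import mean

# Rounded per-article accuracies copied from the outputs above.
logreg = {
    "Article10": 0.64, "Article11": 0.57, "Article13": 0.86, "Article14": 0.72,
    "Article2": 0.58, "Article3": 0.72, "Article5": 0.50, "Article6": 0.70,
    "Article7": 0.60, "Article8": 0.63,
}
linear_svc = {
    "Article10": 0.73, "Article11": 0.43, "Article13": 0.86, "Article14": 0.72,
    "Article2": 0.58, "Article3": 0.63, "Article5": 0.47, "Article6": 0.70,
    "Article7": 0.60, "Article8": 0.63,
}

print("Logistic regression:", round(mean(logreg.values()), 3))
print("Linear SVC:", round(mean(linear_svc.values()), 3))

# Difference (Linear SVC minus logistic regression) where the models disagree.
differences = {
    article: round(linear_svc[article] - logreg[article], 2)
    for article in logreg
    if linear_svc[article] != logreg[article]
}
print(differences)
```

The models reach identical accuracy on six of the ten articles; the gap in the averages comes from Articles 10, 11, 3 and 5, each of which has only a small validation set.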