# Predicting Judicial Decisions of the European Court of Human Rights

In this notebook, we aim to train a classification model to classify cases as 'violation' or 'non-violation'. 
The cases were originally downloaded from HUDOC and structured based on the articles they fall under.

In [1]:
from sklearn.linear_model import LogisticRegression
import numpy as np
import re
import os
import copy
from pandas import DataFrame

To read our dataset, we use os.walk to traverse the directory tree and load all of our training cases together with their labels. We skip the folder 'both', since the files inside are labelled as both violation and non-violation.
The dataset is loaded into two dictionaries keyed by article: the values are lists of case texts (X, our training set) and lists of labels (Y).

In [2]:
def read_dataset(PATH):
    X_dataset = {}
    Y_dataset = {}
    for path, dirs, files in os.walk(PATH):
        for filename in files:
            fullpath = os.path.join(path, filename)
            if "both" not in fullpath:
                with open(fullpath, 'r', encoding="utf8") as file:
                    X_dataset, Y_dataset = add_file_to_dataset(fullpath, X_dataset, Y_dataset, file.read())

    return X_dataset, Y_dataset       

In [3]:
def add_file_to_dataset(fullpath, x_dataset, y_dataset, file):
    article = extract_article(fullpath)
    file = preprocess(file)
    # Create the per-article list on first use, then append the case text
    x_dataset.setdefault(article, []).append(file)
    # The label comes from the folder name: 0 = non-violation, 1 = violation
    label = 0 if "non-violation" in fullpath else 1
    y_dataset.setdefault(article, []).append(label)
    return x_dataset, y_dataset
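
To make the resulting data structure concrete, the sketch below builds a tiny throwaway directory tree and loads it into the same kind of per-article dictionaries. The folder layout (Article10/violation/...), the file names and the helper `load_toy_dataset` are assumptions for illustration only, and preprocessing is skipped so the example stays self-contained. Unlike `read_dataset`, it checks the path relative to the dataset root, so a parent directory that happens to contain "both" does not hide every file.

```python
import os
import re
import tempfile

def load_toy_dataset(root):
    """Walk `root` and group file contents and labels by article."""
    x_dataset, y_dataset = {}, {}
    for path, dirs, files in os.walk(root):
        dirs.sort()  # make the traversal order deterministic
        for filename in sorted(files):
            fullpath = os.path.join(path, filename)
            rel = os.path.relpath(fullpath, root)
            if "both" in rel:
                continue  # ambiguous cases are skipped
            article = re.search(r"(Article\d+)", rel).group(1)
            label = 0 if "non-violation" in rel else 1
            with open(fullpath, "r", encoding="utf8") as f:
                x_dataset.setdefault(article, []).append(f.read())
            y_dataset.setdefault(article, []).append(label)
    return x_dataset, y_dataset

# Hypothetical layout: <root>/<ArticleN>/<violation|non-violation|both>/<case>.txt
toy_files = {
    ("Article10", "violation", "case_a.txt"): "facts of case a",
    ("Article10", "non-violation", "case_b.txt"): "facts of case b",
    ("Article10", "both", "case_c.txt"): "facts of case c",
    ("Article2", "violation", "case_d.txt"): "facts of case d",
}

with tempfile.TemporaryDirectory() as root:
    for (article, outcome, name), text in toy_files.items():
        folder = os.path.join(root, article, outcome)
        os.makedirs(folder, exist_ok=True)
        with open(os.path.join(folder, name), "w", encoding="utf8") as f:
            f.write(text)
    X_toy, Y_toy = load_toy_dataset(root)

print(sorted(X_toy.keys()))  # article names become the dictionary keys
print(Y_toy["Article10"])    # one label per loaded case
```

Each key maps to two parallel lists, so `X_toy[article][i]` and `Y_toy[article][i]` always describe the same case.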

We use a regular expression to extract the article identifier (for example 'Article10') from the full path and add the file to the list under that specific article.

In [4]:
def extract_article(path): 
    pattern = r"(Article\d+)"
    result = re.search(pattern, path)
    article = result.group(1)
    return article
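
A quick check of the pattern on a few paths shows what is captured and what happens when no article is present. The paths and the wrapper `extract_article_or_none` are hypothetical; the wrapper only illustrates one way to avoid an AttributeError when `re.search` finds nothing and returns None.

```python
import re

ARTICLE_PATTERN = re.compile(r"(Article\d+)")

def extract_article_or_none(path):
    """Return e.g. 'Article10' from a path, or None if no article is found."""
    result = ARTICLE_PATTERN.search(path)
    return result.group(1) if result else None

# Hypothetical example paths
windows_style = extract_article_or_none("train\\Article10\\violation\\001-12345.txt")
posix_style = extract_article_or_none("train/Article2/non-violation/001-67890.txt")
no_article = extract_article_or_none("train/readme.txt")

print(windows_style, posix_style, no_article)
```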

### Preprocessing 

Similar to the research paper this work is based on, we only use the PROCEDURE and THE FACTS sections of the cases as our training set. The later sections, such as THE LAW, contain the Court's reasoning, so including them could leak the outcome and bias the model.

In [5]:
def preprocess(file): 
    file = extract_paragraphs(file)
    return file

In [6]:
def extract_paragraphs(file): 
    pat = r'((PROCEDURE|procedure\n|THE FACTS|the facts\n).+?)(III|THE LAW|the law\n|PROCEEDINGS|ALLEGED VIOLATION OF ARTICLE)'
    result = re.search(pat, file, re.S)
    return result.group(1)
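
The following sketch applies the same pattern to made-up judgment texts to show which span is kept. The sample texts and the wrapper name `extract_facts_or_none` are assumptions. Note that the lazy match stops at the first terminator it meets, so a Roman numeral 'III' inside the facts cuts the extract short, as the second sample shows.

```python
import re

PATTERN = re.compile(
    r'((PROCEDURE|procedure\n|THE FACTS|the facts\n).+?)'
    r'(III|THE LAW|the law\n|PROCEEDINGS|ALLEGED VIOLATION OF ARTICLE)',
    re.S,  # let '.' match newlines so the span can cover several paragraphs
)

def extract_facts_or_none(text):
    """Return the PROCEDURE / THE FACTS span, or None if the pattern does not match."""
    result = PATTERN.search(text)
    return result.group(1) if result else None

# Made-up sample in the shape of a judgment
sample = (
    "CASE OF X v. Y\n"
    "PROCEDURE\n"
    "1. The case originated in an application against Y.\n"
    "THE FACTS\n"
    "2. The applicant was born in 1970.\n"
    "THE LAW\n"
    "3. The Court considers that there has been a violation.\n"
)
extract = extract_facts_or_none(sample)
print(extract)

# A heading numbered 'III' ends the match before THE LAW is reached
cut_short = extract_facts_or_none("THE FACTS\nI. Background\nIII. Events\nTHE LAW\n")
print(repr(cut_short))
```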

### Loading the data

In [7]:
base_path = "Datasets\\Human rights dataset"

In [8]:
X_train_docs, Y_train = read_dataset(base_path + "\\train")
#X_extra_test_docs, Y_extra_test = read_dataset(base_path + "\\test_violations")

### Dividing our training set into training and validation

We will divide our training set into a training and validation set (90%, 10%). 

Also, following Medvedeva, M., Vols, M. & Wieling, M. Artif Intell Law (2019), we remove the articles which contain too few cases (here, 50 or fewer). We include Article 11 "as an estimate of how well the model performs when only very few cases are available".

In [9]:
from sklearn.model_selection import train_test_split

In [10]:
def divide_dataset(train_set, label_set):
    divided_data_train, divided_data_test = {}, {}
    divided_labels_train, divided_labels_test = {}, {}

    for key in train_set.keys():
        # Skip articles with too few cases
        if len(train_set[key]) <= 50:
            continue

        # 90/10 split with a fixed seed so the split is reproducible
        data_train_article, data_test_article, labels_train_article, labels_test_article = train_test_split(
            train_set[key], label_set[key], test_size=0.10, random_state=42)

        divided_data_train[key] = data_train_article
        divided_data_test[key] = data_test_article
        divided_labels_train[key] = labels_train_article
        divided_labels_test[key] = labels_test_article
    return divided_data_train, divided_data_test, divided_labels_train, divided_labels_test

In [11]:
X_train_docs, X_test_docs, Y_train, Y_test = divide_dataset(X_train_docs, Y_train)

In [12]:
print(X_train_docs.keys())

dict_keys(['Article10', 'Article11', 'Article13', 'Article14', 'Article2', 'Article3', 'Article5', 'Article6', 'Article8'])
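
With a 10% hold-out, the validation sets for the smaller articles are tiny, which makes per-article accuracy coarse. The sketch below works through the arithmetic, assuming train_test_split rounds a fractional test share up to the next whole case. The case counts are hypothetical, not the real sizes of this dataset.

```python
import math

def holdout_size(n_cases, test_size=0.10):
    """Number of validation cases, assuming the test share is rounded up."""
    return math.ceil(test_size * n_cases)

for n_cases in (51, 65, 455):  # hypothetical article sizes
    n_val = holdout_size(n_cases)
    step = 100 / n_val  # accuracy change caused by a single case, in points
    print(f"{n_cases} cases -> {n_val} validation cases, one case = {step:.1f} accuracy points")
```

For an article just above the 50-case threshold, a single misclassified case moves the validation accuracy by well over ten points, so differences between models on such articles should be read with caution.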


### Tokenization with CountVectorizer



In [13]:
from sklearn.feature_extraction.text import CountVectorizer
import spacy
from spacy.lang import en

In [14]:
# tokenize the doc and lemmatize its tokens
lemmatizer = spacy.lang.en.English()
def my_tokenizer(doc):
    tokens = lemmatizer(doc)
    return([token.lemma_ for token in tokens])

In [15]:
def tokenize_dataset(train_set, test_set):
    dataset_vectorizer = copy.deepcopy(train_set)
    train_term_doc_list = copy.deepcopy(train_set)
    test_term_doc_list = copy.deepcopy(train_set)
    
    for key in train_set.keys():
        vect = CountVectorizer(ngram_range=(1,2), lowercase=True, max_features=700000, tokenizer=my_tokenizer, binary=True, min_df=3)
        article_term_doc = vect.fit_transform(train_set[key]).toarray()    
        test_term_doc = vect.transform(test_set[key]).toarray()
        
        dataset_vectorizer[key] = vect
        train_term_doc_list[key] = article_term_doc
        test_term_doc_list[key] = test_term_doc
    return dataset_vectorizer, train_term_doc_list, test_term_doc_list       

In [16]:
dataset_CountVectorizer, X_train, X_test = tokenize_dataset(X_train_docs, X_test_docs)
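
Calling .toarray() turns the sparse document-term matrix into a dense one, which can be expensive with a vocabulary this large. As a back-of-the-envelope sketch, assuming 8 bytes per entry (64-bit counts) and a hypothetical 500 training cases for one article, the upper bound set by max_features=700000 gives:

```python
def dense_bytes(n_docs, n_features, bytes_per_entry=8):
    """Memory needed to store a dense n_docs x n_features matrix."""
    return n_docs * n_features * bytes_per_entry

n_docs = 500          # hypothetical number of training cases for one article
n_features = 700_000  # upper bound taken from max_features
size_gb = dense_bytes(n_docs, n_features) / 1e9
print(f"{size_gb:.1f} GB")
```

The real vocabulary may be much smaller once min_df is applied, so this is a worst case. The linear models used in this notebook also accept the sparse matrix directly, so the conversion is optional.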

In [17]:
features = dataset_CountVectorizer["Article2"].get_feature_names()
features[1030:1045]

['- 22',
 '- 27',
 '- 30',
 '- 46',
 '- 5',
 '- 6',
 '- 66',
 '- 69',
 '- 7',
 '- 73',
 '- 8',
 '- 84',
 '- a',
 '- agent',
 '- and']
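
Features such as '- 22' are bigrams: two neighbouring tokens joined by a space. The pure-Python sketch below mimics what binary unigram and bigram features with a minimum document frequency look like. It is not scikit-learn's implementation; whitespace tokenisation, the toy documents and the helper names are assumptions (the notebook itself tokenises with spaCy).

```python
from collections import Counter

def ngrams(tokens, min_n=1, max_n=2):
    """All contiguous n-grams of the token list, joined by single spaces."""
    out = []
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out

def binary_term_doc(docs, min_df=2):
    """Vocabulary and 0/1 matrix for terms that occur in at least min_df documents."""
    doc_terms = [set(ngrams(doc.lower().split())) for doc in docs]
    doc_freq = Counter(term for terms in doc_terms for term in terms)
    vocab = sorted(term for term, count in doc_freq.items() if count >= min_df)
    matrix = [[1 if term in terms else 0 for term in vocab] for terms in doc_terms]
    return vocab, matrix

docs = [
    "the applicant was arrested",
    "the applicant was released",
    "the court was informed",
]
vocab, matrix = binary_term_doc(docs, min_df=2)
print(vocab)
for row in matrix:
    print(row)
```

Terms that appear in only one document ('arrested', 'the court', ...) are dropped by the frequency threshold, and each remaining cell only records presence or absence, not how often the term occurs.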

### Classification with Ngrams and CountVectorizer

In [18]:
import sklearn.metrics as sm

In [19]:
def dataset_classify_LogReg(train_set, train_label_set, test_set, test_label_set):
    accuracy = 0
    
    for key in train_set.keys():
        
        #classifier_instance = LogisticRegression(solver = 'saga', penalty='l1')
        classifier_instance = LogisticRegression(solver = 'lbfgs')
        classifier_instance.fit(train_set[key], train_label_set[key])
        test_pred = classifier_instance.predict(test_set[key])

        print("Logistic regression performance for :", key)
        print("Mean absolute error =", round(sm.mean_absolute_error(test_label_set[key], test_pred), 3))
        print("Accuracy score =", round(sm.accuracy_score(test_label_set[key], test_pred), 3))
        print("F1 score =", round(sm.f1_score(test_label_set[key], test_pred), 3))

        accuracy = accuracy + round(sm.accuracy_score(test_label_set[key], test_pred), 3)
            
    print("Average accuracy: ", accuracy / len(train_set.keys()))
            

In [20]:
dataset_LogReg = dataset_classify_LogReg(X_train, Y_train, X_test, Y_test)

Logistic regression performance for : Article10
Mean absolute error = 0.136
Accuracy score = 0.864
F1 score = 0.857
Logistic regression performance for : Article11
Mean absolute error = 0.286
Accuracy score = 0.714
F1 score = 0.75
Logistic regression performance for : Article13
Mean absolute error = 0.091
Accuracy score = 0.909
F1 score = 0.889
Logistic regression performance for : Article14
Mean absolute error = 0.207
Accuracy score = 0.793
F1 score = 0.786
Logistic regression performance for : Article2
Mean absolute error = 0.417
Accuracy score = 0.583
F1 score = 0.286
Logistic regression performance for : Article3
Mean absolute error = 0.228
Accuracy score = 0.772
F1 score = 0.755
Logistic regression performance for : Article5
Mean absolute error = 0.4
Accuracy score = 0.6
F1 score = 0.667
Logistic regression performance for : Article6
Mean absolute error = 0.261
Accuracy score = 0.739
F1 score = 0.714
Logistic regression performance for : Article8
Mean absolute error = 0.348
Accuracy score = 0.652
F1 score = 0.619
Average accuracy:  0.7362222222222222


##### 73.6% average accuracy for a Logistic Regression model (solver = 'lbfgs') with CountVectorizer(ngram_range=(1,2), lowercase=True, max_features=700000, tokenizer=my_tokenizer, binary=True, min_df=3)
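
For 0/1 labels, the mean absolute error is simply the share of wrong predictions, so it always equals one minus the accuracy, and the reported average is an unweighted mean over articles. The sketch below checks both on toy labels and on the Logistic Regression accuracies printed above. The helper functions are illustrative re-implementations, not the sklearn.metrics code, and the toy labels are made up.

```python
def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def f1(y_true, y_pred):
    """F1 for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Toy validation set with 7 cases (made up)
y_true = [1, 0, 1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1]

acc = accuracy(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
print(round(acc, 3), round(mae, 3), round(f1(y_true, y_pred), 3))

# Per-article accuracies as printed in the output above
per_article = [0.864, 0.714, 0.909, 0.793, 0.583, 0.772, 0.6, 0.739, 0.652]
macro_average = sum(per_article) / len(per_article)
print(round(macro_average, 3))
```

Because every article counts equally in this average, a small article such as Article 11 influences the headline number as much as a large one.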

In [21]:
from sklearn import svm

In [24]:
def dataset_classify_LinearSVC(train_set, train_label_set, test_set, test_label_set):
    accuracy = 0
    
    for key in train_set.keys():
        
        classifier_instance = svm.LinearSVC(C=0.5)
        classifier_instance.fit(train_set[key], train_label_set[key])
        test_pred = classifier_instance.predict(test_set[key])

        print("LinearSVC performance for :", key)
        print("Mean absolute error =", round(sm.mean_absolute_error(test_label_set[key], test_pred), 3))
        print("Accuracy score =", round(sm.accuracy_score(test_label_set[key], test_pred), 3))
        print("F1 score =", round(sm.f1_score(test_label_set[key], test_pred), 3))

        accuracy = accuracy + round(sm.accuracy_score(test_label_set[key], test_pred), 3)
            
    print("Average accuracy: ", accuracy / len(train_set.keys()))


In [25]:
dataset_LinearSVC = dataset_classify_LinearSVC(X_train, Y_train, X_test, Y_test)

LinearSVC performance for : Article10
Mean absolute error = 0.182
Accuracy score = 0.818
F1 score = 0.8
LinearSVC performance for : Article11
Mean absolute error = 0.429
Accuracy score = 0.571
F1 score = 0.667
LinearSVC performance for : Article13
Mean absolute error = 0.091
Accuracy score = 0.909
F1 score = 0.889
LinearSVC performance for : Article14
Mean absolute error = 0.241
Accuracy score = 0.759
F1 score = 0.759
LinearSVC performance for : Article2
Mean absolute error = 0.417
Accuracy score = 0.583
F1 score = 0.286
LinearSVC performance for : Article3
Mean absolute error = 0.246
Accuracy score = 0.754
F1 score = 0.731
LinearSVC performance for : Article5
Mean absolute error = 0.4
Accuracy score = 0.6
F1 score = 0.667
LinearSVC performance for : Article6
Mean absolute error = 0.261
Accuracy score = 0.739
F1 score = 0.714
LinearSVC performance for : Article8
Mean absolute error = 0.391
Accurac

##### 70.4% average accuracy for a Support Vector Machine model (LinearSVC(C=0.5)) with CountVectorizer(ngram_range=(1,2), lowercase=True, max_features=700000, tokenizer=my_tokenizer, binary=True, min_df=3)

### Tokenization and Classification with TfIdf Vectorizer

In [26]:
from sklearn.feature_extraction.text import TfidfVectorizer

In [27]:
def tokenize_dataset(train_set, test_set):
    dataset_vectorizer = copy.deepcopy(train_set)
    train_term_doc_list = copy.deepcopy(train_set)
    test_term_doc_list = copy.deepcopy(train_set)
    
    for key in train_set.keys():
        vect = TfidfVectorizer(ngram_range=(2,4), lowercase=True, max_features=900000)
        article_term_doc = vect.fit_transform(train_set[key]).toarray()    
        test_term_doc = vect.transform(test_set[key]).toarray()
        
        dataset_vectorizer[key] = vect
        train_term_doc_list[key] = article_term_doc
        test_term_doc_list[key] = test_term_doc
    return dataset_vectorizer, train_term_doc_list, test_term_doc_list   
            

In [28]:
dataset_TfidfVectorizer, X_train, X_test = tokenize_dataset(X_train_docs, X_test_docs)

In [29]:
features = dataset_TfidfVectorizer["Article2"].get_feature_names()
features[1030:1045]

['08 it follows',
 '08 it follows from',
 '08 mikiyeva',
 '08 mikiyeva and',
 '08 mikiyeva and menchayeva',
 '08 of',
 '08 of 21',
 '08 of 21 july',
 '08 of 25',
 '08 of 25 february',
 '08 on',
 '08 on suspicion',
 '08 on suspicion of',
 '09 18',
 '09 18 december']
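
Two things change compared with the CountVectorizer features: the n-grams are now 2 to 4 words long, and the custom tokenizer is no longer passed, so the vectorizer falls back to its default token pattern, which keeps only tokens of two or more word characters. That is why punctuation features such as '- 22' no longer appear. A minimal pure-Python sketch of this tokenisation and n-gram step follows; the sample sentence and the helper `word_ngrams` are made up, and the TF-IDF weighting itself is left out.

```python
import re

# scikit-learn's default token pattern: runs of at least two word characters
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")

def word_ngrams(text, min_n=2, max_n=4):
    """Lowercase, tokenise, and build all word n-grams from min_n to max_n."""
    tokens = TOKEN_PATTERN.findall(text.lower())
    grams = []
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return tokens, grams

tokens, grams = word_ngrams("No. 08 - It follows from a report")
print(tokens)  # the dash and the one-letter word 'a' are dropped
print(grams)
```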

In [30]:
def dataset_classify_LogReg(train_set, train_label_set, test_set, test_label_set):
    accuracy = 0
    
    for key in train_set.keys():
        classifier_instance = LogisticRegression(solver = 'lbfgs')
        classifier_instance.fit(train_set[key], train_label_set[key])
        test_pred = classifier_instance.predict(test_set[key])

        print("Logistic regression performance for :", key)
        print("Mean absolute error =", round(sm.mean_absolute_error(test_label_set[key], test_pred), 3))
        print("Accuracy score =", round(sm.accuracy_score(test_label_set[key], test_pred), 3))
        print("F1 score =", round(sm.f1_score(test_label_set[key], test_pred), 3))

        accuracy = accuracy + round(sm.accuracy_score(test_label_set[key], test_pred), 3)
            
    print("Average accuracy: ", accuracy / len(train_set.keys()))
            

In [31]:
dataset_LogReg = dataset_classify_LogReg(X_train, Y_train, X_test, Y_test)

Logistic regression performance for : Article10
Mean absolute error = 0.182
Accuracy score = 0.818
F1 score = 0.75
Logistic regression performance for : Article11
Mean absolute error = 0.143
Accuracy score = 0.857
F1 score = 0.889
Logistic regression performance for : Article13
Mean absolute error = 0.136
Accuracy score = 0.864
F1 score = 0.842
Logistic regression performance for : Article14
Mean absolute error = 0.207
Accuracy score = 0.793
F1 score = 0.75
Logistic regression performance for : Article2
Mean absolute error = 0.25
Accuracy score = 0.75
F1 score = 0.667
Logistic regression performance for : Article3
Mean absolute error = 0.263
Accuracy score = 0.737
F1 score = 0.746
Logistic regression performance for : Article5
Mean absolute error = 0.367
Accuracy score = 0.633
F1 score = 0.703
Logistic regression performance for : Article6
Mean absolute error = 0.239
Accuracy score = 0.761
F1 score = 0.725
Logistic regression performance for : Article8
Mean absolute error = 0.348
Accur

##### 76.27% average accuracy for a Logistic Regression model (solver = 'lbfgs') with TfidfVectorizer(ngram_range=(2,4), lowercase=True, max_features=900000)

In [32]:
def dataset_classify_LinearSVC(train_set, train_label_set, test_set, test_label_set):
    accuracy = 0
    
    for key in train_set.keys():
        classifier_instance = svm.LinearSVC(C=0.1, max_iter=1500)
        classifier_instance.fit(train_set[key], train_label_set[key])
        test_pred = classifier_instance.predict(test_set[key])

        print("LinearSVC performance for :", key)
        print("Mean absolute error =", round(sm.mean_absolute_error(test_label_set[key], test_pred), 3))
        print("Accuracy score =", round(sm.accuracy_score(test_label_set[key], test_pred), 3))
        print("F1 score =", round(sm.f1_score(test_label_set[key], test_pred), 3))

        accuracy = accuracy + round(sm.accuracy_score(test_label_set[key], test_pred), 3)
            
    print("Average accuracy: ", accuracy / len(train_set.keys()))


In [33]:
dataset_LinearSVC = dataset_classify_LinearSVC(X_train, Y_train, X_test, Y_test)

LinearSVC performance for : Article10
Mean absolute error = 0.182
Accuracy score = 0.818
F1 score = 0.75
LinearSVC performance for : Article11
Mean absolute error = 0.143
Accuracy score = 0.857
F1 score = 0.889
LinearSVC performance for : Article13
Mean absolute error = 0.136
Accuracy score = 0.864
F1 score = 0.842
LinearSVC performance for : Article14
Mean absolute error = 0.207
Accuracy score = 0.793
F1 score = 0.75
LinearSVC performance for : Article2
Mean absolute error = 0.25
Accuracy score = 0.75
F1 score = 0.667
LinearSVC performance for : Article3
Mean absolute error = 0.263
Accuracy score = 0.737
F1 score = 0.746
LinearSVC performance for : Article5
Mean absolute error = 0.367
Accuracy score = 0.633
F1 score = 0.703
LinearSVC performance for : Article6
Mean absolute error = 0.239
Accuracy score = 0.761
F1 score = 0.725
LinearSVC performance for : Article8
Mean absolute error = 0.326
Accur

##### 76.52% average accuracy for a Support Vector Machine (LinearSVC(C=0.1, max_iter=1500)) with Tfidf vectorizer