# Predicting Judicial Decisions of the European Court of Human Rights

In this notebook, we train classifiers that label cases of the European Court of Human Rights as 'violation' or 'non-violation'.
The cases were downloaded from HUDOC and are organised by the Convention article they concern.

In [1]:
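# sklearn.linear_model.logistic was a private module and has been removed; import from the public package
from sklearn.linear_model import LogisticRegression
import numpy as np
import pandas as pd
import re
import os
import copy
from pandas import DataFrame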

To read the dataset, we use `os.walk` to traverse the directory tree and load all training documents and their labels. We skip the folder 'both', because the files inside it are labelled as both violation and non-violation.
The data is loaded into two dictionaries keyed by article: in `X_dataset` each value is the list of case texts (our training set), and in `Y_dataset` it is the matching list of labels (1 = violation, 0 = non-violation).

In [2]:
def read_dataset(PATH):
    X_dataset = {}
    Y_dataset = {}
    for path, dirs, files in os.walk(PATH):
        for filename in files:
            fullpath = os.path.join(path, filename)
            if "both" not in fullpath:
                with open(fullpath, 'r', encoding="utf8") as file:
                    X_dataset, Y_dataset = add_file_to_dataset(fullpath, X_dataset, Y_dataset, file.read())

    return X_dataset, Y_dataset       

In [3]:
def add_file_to_dataset(fullpath, x_dataset, y_dataset, file):
    article = extract_article(fullpath)
    file = preprocess(file)
    # Create the lists the first time an article is seen, then append
    x_dataset.setdefault(article, []).append(file)
    # Test "non-violation" first: it also contains the substring "violation"
    label = 0 if "non-violation" in fullpath else 1
    y_dataset.setdefault(article, []).append(label)
    return x_dataset, y_dataset
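To make the expected directory layout concrete, here is a minimal, self-contained sketch that builds a tiny tree in a temporary folder and groups files by article in the same way. The layout `<root>/<ArticleN>/<violation|non-violation|both>/<file>` and the file names are assumptions for illustration, and `index_cases` and `label_from_path` are hypothetical helpers. Unlike `read_dataset`, the sketch matches against the path relative to the root, so the name of the root folder itself cannot accidentally contain "both" or an article number.

```python
import os
import re
import tempfile


def label_from_path(path):
    # "non-violation" contains "violation", so the negative class is tested first
    return 0 if "non-violation" in path else 1


def index_cases(root):
    """Group (file name, label) pairs by article, skipping the 'both' folder."""
    dataset = {}
    for path, dirs, files in os.walk(root):
        for filename in files:
            # Match on the path relative to root so the root itself cannot interfere
            rel = os.path.relpath(os.path.join(path, filename), root)
            if "both" in rel:
                continue
            article = re.search(r"(Article\d+)", rel).group(1)
            dataset.setdefault(article, []).append((filename, label_from_path(rel)))
    return dataset


with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: <root>/<ArticleN>/<outcome>/<case file>
    for article, outcome, name in [
        ("Article6", "violation", "case1.txt"),
        ("Article6", "non-violation", "case2.txt"),
        ("Article6", "both", "case3.txt"),
        ("Article10", "violation", "case4.txt"),
    ]:
        folder = os.path.join(root, article, outcome)
        os.makedirs(folder, exist_ok=True)
        with open(os.path.join(folder, name), "w", encoding="utf8") as f:
            f.write("PROCEDURE ...")
    # os.walk order is not guaranteed, so sort each article's list
    index = {k: sorted(v) for k, v in index_cases(root).items()}

print(index)
```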

We use a regular expression to extract the article (e.g. `Article6`) from the full path, and the file is added to the list for that article.

In [4]:
def extract_article(path): 
    pattern = r"(Article\d+)"
    result = re.search(pattern, path)
    article = result.group(1)
    return article
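A quick check of the pattern on two paths, one with Windows separators and one with POSIX separators (both paths are made up). This variant is a sketch that raises a descriptive error when no article folder is present, instead of the `AttributeError` that `result.group(1)` produces on `None`.

```python
import re


def extract_article(path):
    # Same pattern as above; \d+ is greedy, so "Article10" is not cut to "Article1"
    result = re.search(r"(Article\d+)", path)
    if result is None:
        raise ValueError(f"no 'Article<number>' folder in path: {path!r}")
    return result.group(1)


windows_path = "Datasets\\Human rights dataset\\train\\Article10\\violation\\case_001.txt"
posix_path = "Datasets/Human rights dataset/train/Article6/non-violation/case_002.txt"
print(extract_article(windows_path))   # Article10
print(extract_article(posix_path))     # Article6
```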

### Preprocessing 

As in the research paper this work is based on, we use only the PROCEDURE and THE FACTS sections of each case as input. Later sections such as THE LAW contain the Court's legal reasoning and its conclusions, which would reveal the outcome and bias the model.

In [5]:
def preprocess(file): 
    file = extract_paragraphs(file)
    return file

In [6]:
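def extract_paragraphs(file): 
    # Keep the text from the first PROCEDURE / THE FACTS heading up to
    # (but not including) the first end marker, e.g. THE LAW
    pat = r'((PROCEDURE|procedure\n|THE FACTS|the facts\n).+?)(III|THE LAW|the law\n|PROCEEDINGS|ALLEGED VIOLATION OF ARTICLE)'
    result = re.search(pat, file, re.S)
    if result is None:
        raise ValueError("could not locate the PROCEDURE / THE FACTS sections")
    return result.group(1)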
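The extraction can be checked on toy case texts (invented for illustration, not taken from HUDOC). The sketch reuses the same pattern. The second example shows one behaviour worth knowing: because `III` is one of the end markers, the match also stops at any uppercase "III" inside the facts, for instance a subsection numbered III, if such headings occur in the data.

```python
import re

PATTERN = r'((PROCEDURE|procedure\n|THE FACTS|the facts\n).+?)(III|THE LAW|the law\n|PROCEEDINGS|ALLEGED VIOLATION OF ARTICLE)'


def extract_paragraphs(text):
    result = re.search(PATTERN, text, re.S)
    return result.group(1)


case = (
    "PROCEDURE\n1. The case originated in an application against the State.\n"
    "THE FACTS\n2. The applicant was born in 1970.\n"
    "THE LAW\n3. The Court holds that there has been a violation.\n"
)
kept = extract_paragraphs(case)
print(kept)   # PROCEDURE and THE FACTS only; THE LAW and the outcome are dropped

case_with_iii = (
    "THE FACTS\n2. The applicant was born in 1970.\n"
    "III. RELEVANT INTERNATIONAL MATERIAL\n4. Council of Europe texts.\n"
    "THE LAW\n5. The Court finds no violation.\n"
)
cut = extract_paragraphs(case_with_iii)
print(cut)    # stops at "III", so the subsection after it is lost
```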

### Loading the data

In [7]:
base_path = os.path.join("Datasets", "Human rights dataset")

In [8]:
X_train_docs, Y_train = read_dataset(os.path.join(base_path, "train"))
#X_extra_test_docs, Y_extra_test = read_dataset(os.path.join(base_path, "test_violations"))

Following Medvedeva, M., Vols, M. & Wieling, M. Artif Intell Law (2019), we also remove articles with too few cases: any article with 50 or fewer cases is dropped. As in that paper, Article 11 is kept "as an estimate of how well the model performs when only very few cases are available".

In [9]:
def select_articles(train_set):
    # Drop articles with 50 or fewer cases
    return {key: cases for key, cases in train_set.items() if len(cases) > 50}
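Because the labels live in a parallel dictionary, the same filter should be applied to both dictionaries so their keys stay aligned. A minimal sketch on made-up case counts; the `min_cases` parameter is a hypothetical generalisation of the fixed threshold of 50.

```python
def select_articles(dataset, min_cases=50):
    # Keep only articles with more than `min_cases` entries (<= 50 are dropped)
    return {article: items for article, items in dataset.items() if len(items) > min_cases}


# Made-up sizes: Article4 has too few cases and should disappear from both dicts
X_docs = {"Article6": ["doc"] * 120, "Article11": ["doc"] * 60, "Article4": ["doc"] * 12}
Y = {"Article6": [1] * 120, "Article11": [0] * 60, "Article4": [1] * 12}

X_docs = select_articles(X_docs)
Y = select_articles(Y)
print(sorted(X_docs), sorted(Y))   # ['Article11', 'Article6'] ['Article11', 'Article6']
```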

In [10]:
X_train_docs = select_articles(X_train_docs)
Y_train = select_articles(Y_train)

In [11]:
X_train_docs.keys()

dict_keys(['Article10', 'Article11', 'Article13', 'Article14', 'Article2', 'Article3', 'Article5', 'Article6', 'Article8'])

### Tokenization with CountVectorizer (lemmatized with spaCy)



In [12]:
from sklearn.feature_extraction.text import CountVectorizer
import spacy
from spacy.lang import en

In [13]:
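# Tokenize a document with spaCy and return the lemma of each token.
# A blank spaCy v3 English pipeline has no lemmatizer (lemma_ would be empty),
# so a lookup lemmatizer is added explicitly; this needs the spacy-lookups-data package.
lemmatizer = spacy.blank("en")
lemmatizer.add_pipe("lemmatizer", config={"mode": "lookup"})
lemmatizer.initialize()

def my_tokenizer(doc):
    tokens = lemmatizer(doc)
    return [token.lemma_ for token in tokens]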

In [18]:
def tokenize_dataset(train_set):
    train_term_doc_list = {}
    
    for key in train_set.keys():
        vect = CountVectorizer(ngram_range=(1,2), lowercase=True, tokenizer=my_tokenizer, max_features=700000, binary=True, min_df=3)
        # Keep the document-term matrix sparse: the classifiers accept it, and a dense copy is very large
        article_term_doc = vect.fit_transform(train_set[key])
        train_term_doc_list[key] = article_term_doc

        print(key)
        # vocabulary_ works in all scikit-learn versions (get_feature_names() was removed in 1.2)
        print("The number of features: " + str(len(vect.vocabulary_)))
        print()
        
    return train_term_doc_list
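To see what `ngram_range=(1, 2)`, `binary=True` and `min_df` do, here is a sketch on a three-sentence toy corpus (invented). It uses the default regex tokenizer instead of the spaCy lemmatizer and lowers `min_df` to 2 to suit the tiny corpus, so it illustrates the settings rather than reproducing the notebook's features.

```python
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the applicant was detained",
        "the applicant was released",
        "the applicant complained"]

# Default regex tokenizer (no lemmatization); min_df=2 keeps terms found in >= 2 documents
vect = CountVectorizer(ngram_range=(1, 2), lowercase=True, binary=True, min_df=2)
X = vect.fit_transform(docs)

vocab = sorted(vect.vocabulary_)
print(vocab)                 # ['applicant', 'applicant was', 'the', 'the applicant', 'was']
print(X.toarray())           # 0/1 entries because binary=True
print(sparse.issparse(X))    # fit_transform returns a sparse matrix
```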

In [19]:
X_train = tokenize_dataset(X_train_docs)

Article10
The number of features: 34393

Article11
The number of features: 14412

Article13
The number of features: 39143

Article14
The number of features: 49597

Article2
The number of features: 28021

Article3
The number of features: 79063

Article5
The number of features: 44330

Article6
The number of features: 80337

Article8
The number of features: 68252



### Classification with Logistic Regression

In [21]:
import sklearn.metrics as sm
from sklearn.model_selection import cross_validate
from sklearn.metrics import make_scorer
from scipy import stats

In [22]:
scoring = {'accuracy': make_scorer(sm.accuracy_score),
           'precision': make_scorer(sm.precision_score),
           'recall': make_scorer(sm.recall_score),
           'f1': make_scorer(sm.f1_score)}
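With a dictionary of scorers, `cross_validate` returns one array per scorer under the key `test_<name>`, with one value per fold, which is what the classification functions below average. A sketch on synthetic data standing in for one article's feature matrix (the data is generated, not from the notebook). For 0/1 labels, the built-in scorer strings 'accuracy', 'precision', 'recall' and 'f1' compute the same metrics.

```python
import sklearn.metrics as sm
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_validate

scoring = {"accuracy": make_scorer(sm.accuracy_score),
           "precision": make_scorer(sm.precision_score),
           "recall": make_scorer(sm.recall_score),
           "f1": make_scorer(sm.f1_score)}

# Synthetic stand-in for one article's feature matrix and labels
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
scores = cross_validate(LogisticRegression(max_iter=500), X, y, cv=10, scoring=scoring)

# Every scorer name appears as "test_<name>", with one score per fold
for name in scoring:
    print(name, round(scores["test_" + name].mean(), 2))
```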

In [23]:
def statistical_significance(params, probabilities, X):
    # Wald test for the coefficients of a fitted logistic regression.
    # params: intercept followed by the coefficients, e.g. np.append(clf.intercept_, clf.coef_)
    # probabilities: predicted P(y = 1) for each row of X, e.g. clf.predict_proba(X)[:, 1]
    # Only meaningful for a (nearly) unpenalized model with many more cases than
    # features; for a bag-of-words matrix X^T W X is singular and inv() fails.
    X = X.toarray() if hasattr(X, "toarray") else np.asarray(X)
    newX = np.hstack([np.ones((X.shape[0], 1)), X])
    weights = probabilities * (1 - probabilities)

    # Covariance of the estimates = inverse of the Fisher information X^T W X
    var_b = np.linalg.inv(newX.T @ (newX * weights[:, None])).diagonal()
    sd_b = np.sqrt(var_b)
    ts_b = params / sd_b

    # Wald statistics are compared to a standard normal distribution
    p_values = 2 * stats.norm.sf(np.abs(ts_b))

    myDF3 = DataFrame()
    myDF3["Coefficients"] = np.round(params, 4)
    myDF3["Standard Errors"] = np.round(sd_b, 3)
    myDF3["z values"] = np.round(ts_b, 3)
    myDF3["Probabilities"] = np.round(p_values, 3)
    print(myDF3)
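For reference, standard errors of logistic regression coefficients are usually obtained from the inverse of the Fisher information $X^\top W X$ with $W = \mathrm{diag}(p(1-p))$, and the resulting Wald statistics are compared to a standard normal distribution. The sketch below checks this on simulated data where only the first feature matters. It assumes a very large `C` is close enough to unpenalized maximum likelihood; with far more features than cases, as in the bag-of-words matrices here, the information matrix is singular and this test does not apply.

```python
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 2))                  # column 0 informative, column 1 pure noise
p_true = 1 / (1 + np.exp(-2.0 * X[:, 0]))
y = (rng.random(n) < p_true).astype(int)

# A very large C makes the L2 penalty negligible (approximate maximum likelihood)
clf = LogisticRegression(C=1e6, max_iter=1000).fit(X, y)
params = np.append(clf.intercept_, clf.coef_)
prob = clf.predict_proba(X)[:, 1]

# Wald test: Cov(beta) = (X^T W X)^-1, W = diag(p (1 - p)), design includes a constant column
design = np.hstack([np.ones((n, 1)), X])
fisher = design.T @ (design * (prob * (1 - prob))[:, None])
se = np.sqrt(np.diag(np.linalg.inv(fisher)))
z = params / se
p_values = 2 * stats.norm.sf(np.abs(z))

for name, b, s, pv in zip(["const", "x0", "x1"], params, se, p_values):
    print(f"{name}: coef={b:.3f} se={s:.3f} p={pv:.3g}")
```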

In [24]:
def dataset_classify_LogReg(train_set, train_label_set):
    accuracy = 0
    precision = 0
    recall = 0
    f1 = 0
    
    for key in train_set.keys():
        print(key)
        
        classifier_instance = LogisticRegression(solver = 'lbfgs', max_iter=500)
        #params = np.append(classifier_instance.intercept_,classifier_instance.coef_)
        scores = cross_validate(classifier_instance, train_set[key], train_label_set[key], cv=10, scoring=scoring)
        
        #print("Statistical significance: " + statistical_significance(params, ,train_set[key], train_label_set[key] ))
        
        print("Accuracy: %0.2f" % (scores["test_accuracy"].mean()))
        print("Precision: %0.2f" % (scores["test_precision"].mean()))
        print("Recall: %0.2f" % (scores["test_recall"].mean()))
        print("F1: %0.2f" % (scores["test_f1"].mean()))   
        print()
            
        accuracy = accuracy + scores["test_accuracy"].mean()
        precision = precision + scores["test_precision"].mean()
        recall = recall + scores["test_recall"].mean()
        f1 = f1 + scores["test_f1"].mean()
        
    print("Average accuracy score:  %0.3f " % (accuracy / len(train_set.keys())))
    print("Average precision score:  %0.3f " % (precision / len(train_set.keys())))
    print("Average recall score:  %0.3f " % (recall / len(train_set.keys())))
    print("Average f1 score:  %0.3f " % ( f1 / len(train_set.keys())))
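In the cells above, `tokenize_dataset` fits the vectorizer on all of an article's cases and `cross_validate` then splits the resulting matrix, so the vocabulary and the `min_df` filter are computed with the test folds included. A sketch of the leakage-free alternative wraps the vectorizer and classifier in a scikit-learn `Pipeline`, so both are refitted on the training folds of each split. The deliberately trivial toy corpus is invented, and shuffled stratified folds are an assumption (an integer `cv` gives stratified folds without shuffling).

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline

# Toy stand-in for one article's extracted texts and labels
docs = (["applicant detained without judicial review"] * 10
        + ["applicant released after prompt judicial review"] * 10)
labels = [1] * 10 + [0] * 10

# The vectorizer is fitted inside each split, on the training folds only
model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2), lowercase=True, binary=True, min_df=3),
    LogisticRegression(solver="lbfgs", max_iter=500),
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_validate(model, docs, labels, cv=cv, scoring=["accuracy", "f1"])
print(scores["test_accuracy"].mean(), scores["test_f1"].mean())
```

The same pattern works with `TfidfVectorizer` or `LinearSVC` in place of the two pipeline steps.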

In [25]:
dataset_LogReg = dataset_classify_LogReg(X_train, Y_train)

Article10
Accuracy: 0.54
Precision: 0.51
Recall: 0.58
F1: 0.52

Article11
Accuracy: 0.77
Precision: 0.82
Recall: 0.84
F1: 0.79

Article13
Accuracy: 0.76
Precision: 0.76
Recall: 0.77
F1: 0.74

Article14
Accuracy: 0.70
Precision: 0.73
Recall: 0.74
F1: 0.70

Article2
Accuracy: 0.62
Precision: 0.68
Recall: 0.59
F1: 0.59

Article3
Accuracy: 0.73
Precision: 0.75
Recall: 0.76
F1: 0.73

Article5
Accuracy: 0.64
Precision: 0.66
Recall: 0.67
F1: 0.63

Article6
Accuracy: 0.68
Precision: 0.68
Recall: 0.68
F1: 0.66

Article8
Accuracy: 0.67
Precision: 0.67
Recall: 0.69
F1: 0.67

Average accuracy score:  0.679 
Average precision score:  0.696 
Average recall score:  0.703 
Average f1 score:  0.670 


##### 67.9% average accuracy (unweighted mean over the nine articles) for a Logistic Regression model (solver='lbfgs', max_iter=500) with CountVectorizer(ngram_range=(1,2), lowercase=True, max_features=700000, tokenizer=my_tokenizer, binary=True, min_df=3) and 10-fold cross-validation

### Tokenization with CountVectorizer (without lemmatization)



In [26]:
def tokenize_dataset(train_set):
    train_term_doc_list = {}
    
    for key in train_set.keys():
        # Same settings as before, but with the default regex tokenizer (no lemmatization)
        vect = CountVectorizer(ngram_range=(1,2), lowercase=True, max_features=700000, binary=True, min_df=3)
        # Keep the document-term matrix sparse; the classifiers accept sparse input
        article_term_doc = vect.fit_transform(train_set[key])
        train_term_doc_list[key] = article_term_doc

        print(key)
        # vocabulary_ works in all scikit-learn versions (get_feature_names() was removed in 1.2)
        print("The number of features: " + str(len(vect.vocabulary_)))
        print()
        
    return train_term_doc_list

In [27]:
X_train = tokenize_dataset(X_train_docs)

Article10
The number of features: 33811

Article11
The number of features: 13257

Article13
The number of features: 39376

Article14
The number of features: 50060

Article2
The number of features: 27229

Article3
The number of features: 84557

Article5
The number of features: 45591

Article6
The number of features: 85307

Article8
The number of features: 71839



### Classification with Linear SVC

In [28]:
from sklearn import svm

In [29]:
def dataset_classify_LinearSVC(train_set, train_label_set):
    accuracy = 0
    precision = 0
    recall = 0
    f1 = 0
    
    for key in train_set.keys():
        print(key)
        
        classifier_instance = svm.LinearSVC(C=0.5)
        scores = cross_validate(classifier_instance, train_set[key], train_label_set[key], cv=10, scoring=scoring)
                
        print("Accuracy: %0.2f" % (scores["test_accuracy"].mean()))
        print("Precision: %0.2f" % (scores["test_precision"].mean()))
        print("Recall: %0.2f" % (scores["test_recall"].mean()))
        print("F1: %0.2f" % (scores["test_f1"].mean()))   
        print()
            
        accuracy = accuracy + scores["test_accuracy"].mean()
        precision = precision + scores["test_precision"].mean()
        recall = recall + scores["test_recall"].mean()
        f1 = f1 + scores["test_f1"].mean()
        
    print("Average accuracy score:  %0.3f " % (accuracy / len(train_set.keys())))
    print("Average precision score:  %0.3f " % (precision / len(train_set.keys())))
    print("Average recall score:  %0.3f " % (recall / len(train_set.keys())))
    print("Average f1 score:  %0.3f " % ( f1 / len(train_set.keys())))

In [30]:
dataset_classify_LinearSVC(X_train, Y_train)

Article10
Accuracy: 0.55
Precision: 0.49
Recall: 0.59
F1: 0.52

Article11
Accuracy: 0.75
Precision: 0.82
Recall: 0.81
F1: 0.77

Article13
Accuracy: 0.78
Precision: 0.78
Recall: 0.78
F1: 0.76

Article14
Accuracy: 0.72
Precision: 0.73
Recall: 0.78
F1: 0.74

Article2
Accuracy: 0.64
Precision: 0.71
Recall: 0.63
F1: 0.62

Article3
Accuracy: 0.73
Precision: 0.74
Recall: 0.76
F1: 0.73

Article5
Accuracy: 0.65
Precision: 0.65
Recall: 0.66
F1: 0.63

Article6
Accuracy: 0.67
Precision: 0.66
Recall: 0.69
F1: 0.65

Article8
Accuracy: 0.67
Precision: 0.67
Recall: 0.69
F1: 0.67

Average accuracy score:  0.684 
Average precision score:  0.695 
Average recall score:  0.709 
Average f1 score:  0.676 


##### 68.4% average accuracy (unweighted mean over the nine articles) for a Support Vector Machine model (LinearSVC(C=0.5)) with CountVectorizer(ngram_range=(1,2), lowercase=True, max_features=700000, binary=True, min_df=3) and 10-fold cross-validation

### Tokenization with TfidfVectorizer (1- to 4-grams)

In [31]:
from sklearn.feature_extraction.text import TfidfVectorizer

In [32]:
def tokenize_dataset(train_set):
    train_term_doc_list = {}
    
    for key in train_set.keys():
        vect = TfidfVectorizer(ngram_range=(1,4), lowercase=True, max_features=950000)
        # Keep the matrix sparse: a dense array with up to 950,000 columns per article is huge
        article_term_doc = vect.fit_transform(train_set[key])
        train_term_doc_list[key] = article_term_doc

        print(key)
        # vocabulary_ works in all scikit-learn versions (get_feature_names() was removed in 1.2)
        print("The number of features: " + str(len(vect.vocabulary_)))
        print()
        
    return train_term_doc_list
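With its default settings (`smooth_idf=True`, `norm="l2"`), `TfidfVectorizer` weights term $t$ in document $d$ by $\mathrm{tf}(t,d)\cdot\mathrm{idf}(t)$ with $\mathrm{idf}(t) = \ln\frac{1+n}{1+\mathrm{df}(t)} + 1$, where $n$ is the number of documents, and then scales each row to unit Euclidean length. A sketch that recomputes a few idf values by hand on an invented three-document corpus:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the applicant was detained",
        "the applicant was released",
        "the court found a violation"]

vect = TfidfVectorizer(lowercase=True)    # defaults: smooth_idf=True, norm="l2"
X = vect.fit_transform(docs)

n = len(docs)
# Document frequencies counted by hand for a few terms
doc_freq = {"the": 3, "applicant": 2, "detained": 1}
manual_idf = {t: np.log((1 + n) / (1 + d)) + 1 for t, d in doc_freq.items()}
sklearn_idf = {t: vect.idf_[vect.vocabulary_[t]] for t in doc_freq}
row_norms = np.linalg.norm(X.toarray(), axis=1)

print(manual_idf)
print(sklearn_idf)
print(row_norms)    # every row has unit length
```

A term that occurs in every document ("the") gets the minimum idf of 1, so frequent boilerplate phrases are down-weighted relative to rarer, more case-specific n-grams.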

In [33]:
X_train = tokenize_dataset(X_train_docs)

Article10
The number of features: 950000

Article11
The number of features: 367293

Article13
The number of features: 950000

Article14
The number of features: 950000

Article2
The number of features: 893531

Article3
The number of features: 950000

Article5
The number of features: 950000

Article6
The number of features: 950000

Article8
The number of features: 950000



### Classification with Logistic Regression

In [34]:
def dataset_classify_LogReg(train_set, train_label_set):
    accuracy = 0
    precision = 0
    recall = 0
    f1 = 0
    
    for key in train_set.keys():
        print(key)
        
        classifier_instance = LogisticRegression(solver = 'saga')
        scores = cross_validate(classifier_instance, train_set[key], train_label_set[key], cv=10, scoring=scoring)
                
        print("Accuracy: %0.2f" % (scores["test_accuracy"].mean()))
        print("Precision: %0.2f" % (scores["test_precision"].mean()))
        print("Recall: %0.2f" % (scores["test_recall"].mean()))
        print("F1: %0.2f" % (scores["test_f1"].mean()))      
        print()
            
        accuracy = accuracy + scores["test_accuracy"].mean()
        precision = precision + scores["test_precision"].mean()
        recall = recall + scores["test_recall"].mean()
        f1 = f1 + scores["test_f1"].mean()
        
    print("Average accuracy score:  %0.3f " % (accuracy / len(train_set.keys())))
    print("Average precision score:  %0.3f " % (precision / len(train_set.keys())))
    print("Average recall score:  %0.3f " % (recall / len(train_set.keys())))
    print("Average f1 score:  %0.3f " % ( f1 / len(train_set.keys())))

In [35]:
dataset_LogReg = dataset_classify_LogReg(X_train, Y_train)

Article10
Accuracy: 0.56
Precision: 0.56
Recall: 0.53
F1: 0.53

Article11
Accuracy: 0.75
Precision: 0.81
Recall: 0.81
F1: 0.76

Article13
Accuracy: 0.76
Precision: 0.77
Recall: 0.75
F1: 0.74

Article14
Accuracy: 0.67
Precision: 0.72
Recall: 0.58
F1: 0.62

Article2
Accuracy: 0.72
Precision: 0.75
Recall: 0.75
F1: 0.71

Article3
Accuracy: 0.73
Precision: 0.74
Recall: 0.77
F1: 0.73

Article5
Accuracy: 0.63
Precision: 0.66
Recall: 0.67
F1: 0.64

Article6
Accuracy: 0.73
Precision: 0.73
Recall: 0.71
F1: 0.71

Article8
Accuracy: 0.69
Precision: 0.71
Recall: 0.67
F1: 0.68

Average accuracy score:  0.694 
Average precision score:  0.717 
Average recall score:  0.693 
Average f1 score:  0.680 


##### 69.4% average accuracy (unweighted mean over the nine articles) for a Logistic Regression model (solver='saga') with TfidfVectorizer(ngram_range=(1,4), lowercase=True, max_features=950000) and 10-fold cross-validation

### Tokenization with TfidfVectorizer (2- and 3-grams)

In [36]:
def tokenize_dataset(train_set):
    train_term_doc_list = {}
    
    for key in train_set.keys():
        vect = TfidfVectorizer(ngram_range=(2,3), lowercase=True, max_features=900000, min_df=3)
        # Keep the matrix sparse; the classifiers accept sparse input
        article_term_doc = vect.fit_transform(train_set[key])
        train_term_doc_list[key] = article_term_doc

        print(key)
        # vocabulary_ works in all scikit-learn versions (get_feature_names() was removed in 1.2)
        print("The number of features: " + str(len(vect.vocabulary_)))
        print()
        
    return train_term_doc_list
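With `ngram_range=(2, 3)` the vectorizer keeps only two- and three-word phrases and drops single words entirely. A small sketch on two invented sentences shows the resulting vocabulary. In the run below, the largest article vocabulary is 168,068 features, so the `max_features=900000` cap never actually prunes anything.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the applicant was detained", "the applicant was released"]

vect = TfidfVectorizer(ngram_range=(2, 3), lowercase=True)
vect.fit(docs)

features = sorted(vect.vocabulary_)
print(features)
# ['applicant was', 'applicant was detained', 'applicant was released',
#  'the applicant', 'the applicant was', 'was detained', 'was released']
```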

In [37]:
X_train = tokenize_dataset(X_train_docs)

Article10
The number of features: 50784

Article11
The number of features: 16815

Article13
The number of features: 64655

Article14
The number of features: 82692

Article2
The number of features: 40088

Article3
The number of features: 166397

Article5
The number of features: 83255

Article6
The number of features: 168068

Article8
The number of features: 132089



### Classification with Linear SVC

In [38]:
def dataset_classify_LinearSVC(train_set, train_label_set):
    accuracy = 0
    precision = 0
    recall = 0
    f1 = 0
    
    for key in train_set.keys():
        print(key)
        
        classifier_instance = svm.LinearSVC(C=0.1, max_iter=1500)
        scores = cross_validate(classifier_instance, train_set[key], train_label_set[key], cv=10, scoring=scoring)
                
        print("Accuracy: %0.2f" % (scores["test_accuracy"].mean()))
        print("Precision: %0.2f" % (scores["test_precision"].mean()))
        print("Recall: %0.2f" % (scores["test_recall"].mean()))
        print("F1: %0.2f" % (scores["test_f1"].mean()))      
        print()
            
        accuracy = accuracy + scores["test_accuracy"].mean()
        precision = precision + scores["test_precision"].mean()
        recall = recall + scores["test_recall"].mean()
        f1 = f1 + scores["test_f1"].mean()
        
    print("Average accuracy score:  %0.3f " % (accuracy / len(train_set.keys())))
    print("Average precision score:  %0.3f " % (precision / len(train_set.keys())))
    print("Average recall score:  %0.3f " % (recall / len(train_set.keys())))
    print("Average f1 score:  %0.3f " % ( f1 / len(train_set.keys())))

In [39]:
dataset_classify_LinearSVC(X_train, Y_train)

Article10
Accuracy: 0.53
Precision: 0.49
Recall: 0.52
F1: 0.50

Article11
Accuracy: 0.75
Precision: 0.81
Recall: 0.81
F1: 0.76

Article13
Accuracy: 0.78
Precision: 0.79
Recall: 0.76
F1: 0.76

Article14
Accuracy: 0.70
Precision: 0.78
Recall: 0.67
F1: 0.68

Article2
Accuracy: 0.73
Precision: 0.76
Recall: 0.73
F1: 0.72

Article3
Accuracy: 0.75
Precision: 0.76
Recall: 0.78
F1: 0.75

Article5
Accuracy: 0.63
Precision: 0.67
Recall: 0.68
F1: 0.63

Article6
Accuracy: 0.74
Precision: 0.74
Recall: 0.72
F1: 0.72

Article8
Accuracy: 0.69
Precision: 0.71
Recall: 0.65
F1: 0.67

Average accuracy score:  0.700 
Average precision score:  0.724 
Average recall score:  0.702 
Average f1 score:  0.687 


##### 70.0% average accuracy (unweighted mean over the nine articles) for a Support Vector Machine model (LinearSVC(C=0.1, max_iter=1500)) with TfidfVectorizer(ngram_range=(2,3), lowercase=True, max_features=900000, min_df=3) and 10-fold cross-validation