# Litigation value extraction for OLDP

This demo shows how to use our open data platform and how to create your own annotations.
As an example annotation we chose the *litigation value* (Streitwert), which should be available for a large portion of cases (1:1 relation).

Before you start, read the following pages:
- General information: http://de.openlegaldata.io/pages/api/


In [24]:
import json
import re
import locale
import os
import sys

In [25]:
# API key
API_KEY = os.getenv('OLDP_API_KEY', 'ADD_YOUR_KEY_HERE')

if API_KEY == 'ADD_YOUR_KEY_HERE':
    print('Error: API_KEY is not set.\n\nYou can set OLDP_API_KEY as an environment variable or add it directly to the code.')
    print('If you do not have your own key yet, you can get one for free at: https://de.openlegaldata.io/accounts/api/')
    sys.exit()


### Install OLDP-Client

```
pip install git+https://github.com/openlegaldata/oldp-sdk-python.git
```

Test API access:

```
curl -X GET https://de.openlegaldata.io/api/annotation_labels/ -H 'Authorization: Token YOUR_API_KEY_HERE'
```
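The same check can be sketched in Python with only the standard library. This is a minimal sketch, not part of the OLDP SDK: `build_request` and `check_access` are hypothetical helper names, and the only things taken from the `curl` call above are the URL and the `Authorization: Token ...` header.

```python
import urllib.request

API_URL = 'https://de.openlegaldata.io/api/annotation_labels/'


def build_request(api_key, url=API_URL):
    """Build (but do not send) an authenticated GET request."""
    return urllib.request.Request(
        url,
        headers={'Authorization': 'Token %s' % api_key},
        method='GET',
    )


def check_access(api_key):
    """Send the request and return the HTTP status code.

    urlopen raises urllib.error.HTTPError for 4xx/5xx responses,
    for example when the key is rejected.
    """
    with urllib.request.urlopen(build_request(api_key), timeout=10) as response:
        return response.status
```

Calling `check_access(API_KEY)` should return `200` if the key is accepted.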

In [26]:
# Import client SDK
from oldp_client import ApiClient, Configuration
from oldp_client.rest import ApiException
from oldp_client.api.cases_api import CasesApi

from oldp_client.api.annotation_labels_api import AnnotationLabelsApi
from oldp_client.api.case_annotations_api import CaseAnnotationsApi

# Setup API
config = Configuration()
config.api_key['Authorization'] = API_KEY
config.api_key_prefix['Authorization'] = 'Token'

# If something goes wrong, it is recommended to enable debugging
#config.debug = True

api_client = ApiClient(config)

# Define endpoints
cases_api = CasesApi(api_client)
annotation_labels_api = AnnotationLabelsApi(api_client)
case_annotations_api = CaseAnnotationsApi(api_client)

### Annotations

List all my (private) annotation labels.

In [16]:
# Available annotation labels
res = annotation_labels_api.annotation_labels_list(private=True)

#print(res)

for item in res.results:
    print(item)
    
# ID of the annotation label to use (the label was created beforehand in the web GUI)
label_id = 1

{'annotation_value_type': 'int',
 'color': None,
 'created_at': datetime.datetime(2019, 2, 14, 9, 26, 54, 302148, tzinfo=tzlocal()),
 'id': 1,
 'many_annotations_per_label': False,
 'name': 'Streitwert',
 'owner': 'openlegaldata',
 'private': True,
 'slug': 'litigation-value',
 'trusted': 'False',
 'updated_at': datetime.datetime(2019, 2, 14, 10, 28, 23, 990808, tzinfo=tzlocal()),
 'use_marker': False}
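Instead of hard-coding the label ID, it can be looked up by its slug. The following is a minimal sketch: `find_label_id` is a hypothetical helper, and it assumes that the label objects returned by the client expose `id` and `slug` as attributes, as the printed fields above suggest. The example uses stand-in objects rather than real API results.

```python
from types import SimpleNamespace


def find_label_id(labels, slug):
    """Return the id of the first label with the given slug, or None."""
    for label in labels:
        if label.slug == slug:
            return label.id
    return None


# Stand-in objects that mimic the fields printed above (not real API results)
example_labels = [
    SimpleNamespace(id=1, slug='litigation-value', name='Streitwert'),
]
print(find_label_id(example_labels, 'litigation-value'))
```

With the real client this would be called as `find_label_id(res.results, 'litigation-value')`.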


### Load cases from dump

Reading a large number of items from the API is slow and can exceed your [throttle limits](https://oldp.readthedocs.io/en/latest/api.html#throttle-rates). Instead, we recommend downloading a data dump and processing it locally. Data dumps are available [here](https://static.openlegaldata.io/dumps/).

```
# Download and decompress
wget https://static.openlegaldata.io/dumps/de/2019-02-19_oldp_cases.json.gz
gzip -d 2019-02-19_oldp_cases.json.gz
```

Next, we can read the dump file line-by-line:

In [27]:
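# Load cases from dump file (alternatively we could get cases from the API: search for "Streitwert")
from itertools import islice

# Adjust the path to the location of your decompressed dump file
file_path = 'data/dumps/cases.json'
n = 1000  # number of lines (one case per line) to read
cases = []

with open(file_path, 'r', encoding='utf-8') as f:
    # islice stops cleanly if the file has fewer than n lines
    for case_json in islice(f, n):
        case = json.loads(case_json)
        if 'Streitwert' in case['content']:
            cases.append(case)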
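The dump can also be read without decompressing it first. The following is a minimal sketch under two assumptions: the dump contains one JSON object per line, as in the cell above, and `load_cases` is a hypothetical helper name, not part of the OLDP SDK.

```python
import gzip
import itertools
import json


def load_cases(dump_path, keyword='Streitwert', max_lines=1000):
    """Return the cases among the first `max_lines` lines whose content contains `keyword`."""
    # Pick the opener by file extension so that .json and .json.gz both work
    opener = gzip.open if dump_path.endswith('.gz') else open
    matches = []
    with opener(dump_path, 'rt', encoding='utf-8') as f:
        # islice stops cleanly if the file has fewer than max_lines lines
        for line in itertools.islice(f, max_lines):
            if not line.strip():
                continue  # skip empty lines
            case = json.loads(line)
            if keyword in case.get('content', ''):
                matches.append(case)
    return matches
```

For example, `load_cases('2019-02-19_oldp_cases.json.gz')` would read the downloaded file directly.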


In [29]:
# ALTERNATIVE APPROACH (not in use)
# Use the search API to find cases that mention `Streitwert`

# WARNING: search results contain the content only as plain text, not as HTML.

for c in cases_api.cases_search_list(text='streitwert').results:
    print(c.slug)
    # c['text']



bfh-2010-08-20-v-e-209
bfh-2011-11-17-iv-s-1510
bgh-2015-07-29-iv-zr-4515
bgh-2016-02-16-x-zr-11013
bfh-2016-07-19-iv-e-216
bfh-2012-11-29-iv-e-712
bfh-2015-01-22-iv-s-1714
bfh-2015-11-12-iv-e-815
bfh-2016-04-06-iv-e-915
bgh-2016-09-15-i-zr-2416


In [28]:
# Extract `Streitwert` from case content

clean_html_pattern = re.compile('<.*?>')

# Set German locale (for number parsing); the de_DE locale must be installed on your system
locale.setlocale(locale.LC_ALL, 'de_DE.UTF8')

#c = cases[4]
for i, c in enumerate(cases):

    print('%i - %s (id: %i)' % (i, c['slug'], c['id']))

    cc = c['content']
    cc = re.sub(clean_html_pattern, '', cc)
    # Example sentences found in rulings:
    #   Der Streitwert wird auf 2.500,- Euro festgesetzt.
    #   Der Wert des Streitgegenstandes wird auf 2.500 Euro festgesetzt.
    #   Der Streitwert wird für jedes Verfahren auf 5.000,- € festgesetzt.
    # The pattern below only covers the first form.

    # Raw string avoids invalid-escape warnings for \s
    pattern = re.compile(r'Der Streitwert wird auf ([\s0-9.,-]*?) (EUR|Euro|€) festgesetzt')

    for m in pattern.finditer(cc):
        print(m)
        print(m.group(1))
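        # Strip the trailing "-" of amounts such as "2.500,-".
        # Note: a trailing non-breaking space (\xa0) is not stripped here, so such values fail to parse.
        value = m.group(1).rstrip('-')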
        
        try:
            print(locale.atof(value))
        except ValueError:
            print('Cannot parse value: %s' % value)
            
    print('--')
#cases[0]['content']
#print(cc)

0 - vg-gelsenkirchen-2018-12-19-8-l-218418 (id: 116768)
<re.Match object; span=(364, 412), match='Der Streitwert wird auf 2.500,- Euro festgesetzt'>
2.500,-
2500.0
--
1 - lg-dusseldorf-2018-12-18-9-s-118235-c-10817 (id: 125216)
<re.Match object; span=(20469, 20515), match='Der Streitwert wird auf 4.405,96 € festgesetzt'>
4.405,96
4405.96
--
2 - vg-aachen-2018-12-14-3-l-102818 (id: 125219)
--
3 - vg-minden-2018-12-13-10-nc-318 (id: 116788)
--
4 - vg-minden-2018-12-12-10-l-103818 (id: 116796)
--
5 - vg-koln-2018-12-11-5-k-223818 (id: 116811)
--
6 - vg-munster-2018-12-06-9-l-80818 (id: 116823)
<re.Match object; span=(87, 136), match='Der Streitwert wird auf 5.000,00 Euro festgesetzt'>
5.000,00
5000.0
--
7 - vg-koln-2018-12-04-2-k-749518 (id: 116834)
--
8 - vg-koln-2018-12-04-25-k-724315 (id: 116833)
--
9 - vg-dusseldorf-2018-12-04-9-l-322218 (id: 116832)
<re.Match object; span=(191, 240), match='Der Streitwert wird auf 3.750,-\xa0 Euro festgesetzt'>
3.750,- 
Cannot parse value: 3.750,- 
--
10 
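The value `3.750,-` above cannot be parsed because the captured group ends with a non-breaking space (`\xa0`), which `rstrip('-')` does not remove. A more tolerant parser that does not depend on the system locale could look like the following. This is a minimal sketch: `AMOUNT_PATTERN`, `parse_german_amount` and `extract_litigation_values` are hypothetical names, and the pattern still covers only the "Der Streitwert wird auf ... festgesetzt" wording.

```python
import re

# Same wording as the pattern above, but whitespace before the currency
# (including non-breaking spaces) is kept out of the captured group.
AMOUNT_PATTERN = re.compile(
    r'Der Streitwert wird auf ([\s0-9.,-]*?)\s*(EUR|Euro|€) festgesetzt'
)


def parse_german_amount(raw):
    """Convert a German-formatted amount such as '2.500,-' to a float, or return None."""
    # Remove all whitespace; for str patterns \s also matches \xa0
    cleaned = re.sub(r'\s', '', raw)
    # Remove the trailing ',-' or '-' that stands for "no cents"
    cleaned = cleaned.rstrip('-').rstrip(',')
    # '.' is the thousands separator, ',' is the decimal separator
    cleaned = cleaned.replace('.', '').replace(',', '.')
    try:
        return float(cleaned)
    except ValueError:
        return None


def extract_litigation_values(text):
    """Return all litigation values found in plain text (HTML already removed)."""
    return [parse_german_amount(m.group(1)) for m in AMOUNT_PATTERN.finditer(text)]
```

Replacing the separators by hand avoids calling `locale.setlocale`, which changes process-wide state and fails if the German locale is not installed.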

In [21]:
# Send data back to API
# - save litigation value as new annotation
data = {
    'belongs_to': 116768, # Case id
    'label': label_id,
    'value_int': 2500
}

# Check whether this case already has an annotation with this label
res = case_annotations_api.case_annotations_list(belongs_to=data['belongs_to'], label=label_id, limit=1)

if len(res.results) == 1:
    # Delete the old one so that it can be replaced
    case_annotations_api.case_annotations_delete(id=res.results[0].id)

    print('Old annotation deleted!')

# Create the new annotation (any previous one has been deleted above)
res = case_annotations_api.case_annotations_create(data)

print(res)


{'belongs_to': 116768,
 'created_at': datetime.datetime(2019, 2, 18, 14, 47, 47, 635254, tzinfo=tzlocal()),
 'id': 4,
 'label': 1,
 'updated_at': datetime.datetime(2019, 2, 18, 14, 47, 47, 635311, tzinfo=tzlocal()),
 'value_int': 2500,
 'value_str': None}
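To store values for many cases, the delete-then-create steps can be wrapped in a function. The sketch below assumes only the three client methods used in the cell above (`case_annotations_list`, `case_annotations_delete`, `case_annotations_create`). `replace_annotation` is a hypothetical helper name. Because the label type is `int`, the value is rounded to whole euros, which is a simplification.

```python
def replace_annotation(api, case_id, label_id, value):
    """Delete an existing annotation for (case, label), then create a new one."""
    existing = api.case_annotations_list(belongs_to=case_id, label=label_id, limit=1)
    for old in existing.results:
        api.case_annotations_delete(id=old.id)

    return api.case_annotations_create({
        'belongs_to': case_id,
        'label': label_id,
        # The label stores integers, so cents are rounded away
        'value_int': int(round(value)),
    })
```

With the objects defined earlier, a call would look like `replace_annotation(case_annotations_api, 116768, label_id, 2500.0)`.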


 
