In [1]:
## imports
import requests
from pathlib import Path
%reload_ext restmagic

### ES analyzer for Polish documents

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "nlp-analyzer": {
          "type": "morfologik",
          "tokenizer": "standard",
          "filter": [
            "morfologik_stem",
            "synonym",
            "lowercase"
          ]
        }
      },
      "filter": {
        "synonym": {
          "type": "synonym",
          "synonyms_path": "nlp_synonimy.txt"
        }
      }
    }
  }
}

```
where the contents of the _nlp_synonimy.txt_ file are:
```
kpk => kodeks postępowania karnego
kpc => kodeks postępowania cywilnego
kk => kodeks karny
kc => kodeks cywilny
```
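
Each line in this file is an explicit mapping: the token on the left of `=>` is replaced by the token sequence on the right. A minimal Python sketch of how such rules read, assuming one replacement per rule; the `parse_synonym_rules` helper is hypothetical and only illustrates the file format, it is not how Elasticsearch loads the file:

```python
from typing import Dict, List

RULES = """\
kpk => kodeks postępowania karnego
kpc => kodeks postępowania cywilnego
kk => kodeks karny
kc => kodeks cywilny
"""


def parse_synonym_rules(text: str) -> Dict[str, List[str]]:
    """Map each left-hand token to the list of tokens that replaces it."""
    rules: Dict[str, List[str]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        left, sep, right = line.partition("=>")
        if not sep:
            continue  # lines without "=>" are not handled in this sketch
        for source in left.split(","):
            rules[source.strip()] = right.split()
    return rules


rules = parse_synonym_rules(RULES)
print(rules["kpc"])
```
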

### Create an index for Polish bills

In [2]:
%rest GET http://localhost:9200

{
  "name": "macbook-pro-filip.home",
  "cluster_name": "elasticsearch_filip",
  "cluster_uuid": "6P5ejRjlSzOlnCAg2q-hDA",
  "version": {
    "number": "7.10.1",
    "build_flavor": "default",
    "build_type": "tar",
    "build_hash": "1c34507e66d7db1211f66f3513706fdf548736aa",
    "build_date": "2020-12-05T01:00:33.671820Z",
    "build_snapshot": false,
    "lucene_version": "8.7.0",
    "minimum_wire_compatibility_version": "6.8.0",
    "minimum_index_compatibility_version": "6.0.0-beta1"
  },
  "tagline": "You Know, for Search"
}

<Response [200]>

In [18]:
%%rest PUT http://localhost:9200/ustawy
Content-Type: application/json

{
  "settings": {
    "analysis": {
      "analyzer": {
        "nlp-analyzer": {
          "type": "morfologik",
          "tokenizer": "standard",
          "filter": [
            "morfologik_stem",
            "synonym",
            "lowercase"
          ]
        }
      },
      "filter": {
        "synonym": {
          "type": "synonym",
          "synonyms_path": "nlp_synonimy.txt"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "nlp-analyzer"
      }
    }
  }
}

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "ustawy"
}

<Response [200]>
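
One thing worth checking in the request above: `"type": "morfologik"` names the plugin's prebuilt analyzer, and a prebuilt analyzer type likely does not read the `tokenizer` and `filter` keys, so the synonym filter may never run. A minimal sketch of the same settings declared as a custom analyzer, built as a Python dict. The `build_index_body` helper and the filter order (lowercasing first, stemming last) are assumptions, not what this notebook ran:

```python
import json


def build_index_body(synonyms_path: str = "nlp_synonimy.txt") -> dict:
    """Index settings with an explicitly custom analyzer (assumed variant)."""
    return {
        "settings": {
            "analysis": {
                "analyzer": {
                    "nlp-analyzer": {
                        "type": "custom",  # assumption: replaces "morfologik"
                        "tokenizer": "standard",
                        # assumed order: lowercase, expand synonyms, then stem
                        "filter": ["lowercase", "synonym", "morfologik_stem"],
                    }
                },
                "filter": {
                    "synonym": {
                        "type": "synonym",
                        "synonyms_path": synonyms_path,
                    }
                },
            }
        },
        "mappings": {
            "properties": {
                "content": {"type": "text", "analyzer": "nlp-analyzer"}
            }
        },
    }


print(json.dumps(build_index_body(), indent=2, ensure_ascii=False))
```

The resulting dict could be sent with `requests.put("http://localhost:9200/ustawy", json=build_index_body())`, after deleting the existing index.
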

In [29]:
%%rest GET http://localhost:9200/ustawy/_analyze
Content-Type: application/json
    
{
    "text": "Ustawa o nowym polskim ładzie jest do reformy"
}

{
  "tokens": [
    {
      "token": "ustawa",
      "start_offset": 0,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "o",
      "start_offset": 7,
      "end_offset": 8,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "kk",
      "start_offset": 9,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "i",
      "start_offset": 12,
      "end_offset": 13,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "nowym",
      "start_offset": 14,
      "end_offset": 19,
      "type": "<ALPHANUM>",
      "position": 4
    },
    {
      "token": "polskim",
      "start_offset": 20,
      "end_offset": 27,
      "type": "<ALPHANUM>",
      "position": 5
    },
    {
      "token": "ładzie",
      "start_offset": 28,
      "end_offset": 34,
      "type": "<ALPHANUM>",
      "position": 6
    }
  ]
}

<Response [200]>
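
The request above sends only `text`, so `_analyze` uses the index's default analyzer rather than `nlp-analyzer`, which would explain the unstemmed tokens such as `nowym` and `polskim`. A minimal sketch of a request body that names the analyzer explicitly, plus a helper that pulls the token strings out of a response. Both helper names are hypothetical, and the sample response is a trimmed copy of the output above:

```python
from typing import List


def analyze_body(text: str, analyzer: str = "nlp-analyzer") -> dict:
    """Request body for GET /ustawy/_analyze with an explicit analyzer."""
    return {"analyzer": analyzer, "text": text}


def token_strings(response: dict) -> List[str]:
    """Return just the token texts from an _analyze response."""
    return [token["token"] for token in response["tokens"]]


# Trimmed copy of the response shown above
sample_response = {
    "tokens": [
        {"token": "ustawa", "start_offset": 0, "end_offset": 6, "position": 0},
        {"token": "o", "start_offset": 7, "end_offset": 8, "position": 1},
        {"token": "kk", "start_offset": 9, "end_offset": 11, "position": 2},
    ]
}

print(analyze_body("Ustawa o nowym polskim ładzie jest do reformy"))
print(token_strings(sample_response))
```
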

### Load all bills into ES 

In [47]:
root_dir = "../../"
files = Path(root_dir, "ustawy").glob("*")

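for f in files:
    # Read the full text of one bill
    with open(f, "r", encoding="UTF-8") as ff:
        content = ff.read()

    # Use the file name without its extension as the document ID
    text_id = f.stem

    response = requests.put(f"http://localhost:9200/ustawy/_doc/{text_id}", json={"content": content})
    response.raise_for_status()  # fail loudly if a document is rejected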
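
The loop above issues one HTTP request per file. As an alternative, Elasticsearch's `_bulk` endpoint accepts many documents in a single newline-delimited JSON payload. A minimal sketch of building that payload, assuming the same `content` field; the `bulk_body` helper and the sample document text are hypothetical:

```python
import json
from typing import Iterable, Tuple


def bulk_body(docs: Iterable[Tuple[str, str]]) -> str:
    """Build an NDJSON payload: one action line plus one source line per doc."""
    lines = []
    for doc_id, content in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps({"content": content}, ensure_ascii=False))
    # The bulk API requires the payload to end with a newline
    return "\n".join(lines) + "\n"


# Hypothetical sample document, for illustration only
payload = bulk_body([("1997_629", "tekst ustawy")])
print(payload)
```

The payload would then be posted to `http://localhost:9200/ustawy/_bulk` as UTF-8 bytes with the header `Content-Type: application/x-ndjson`. For a large corpus it is probably safer to send it in batches than as one request.
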

In [31]:
root_dir = "../../"
files = Path(root_dir, "ustawy").glob("*")

In [46]:
a = next(files)
a.suffix

'.txt'

### Acts containing the word ustawa in any form

In [3]:
%%rest GET http://localhost:9200/ustawy/_search?filter_path=hits.total.value
Content-Type: application/json
    
{
  "query": {
    "match": {
      "content": "ustawa"
    }
  }
}

{
  "hits": {
    "total": {
      "value": 1178
    }
  }
}

<Response [200]>

### Search for the **ustawa** form of the word ustawa

Here I use the `_termvectors` API with an artificial document that consists of only the one word whose count I want.

In [4]:
%%rest GET http://localhost:9200/ustawy/_termvectors
Content-Type: application/json

{
  "doc" : {
    "content": "ustawa"
  },
  "offsets": false,
  "positions": false,
  "field_statistics": false,
  "term_statistics": true
}

{
  "_index": "ustawy",
  "_type": "_doc",
  "_version": 0,
  "found": true,
  "took": 42,
  "term_vectors": {
    "content": {
      "terms": {
        "ustawa": {
          "doc_freq": 1178,
          "ttf": 24934,
          "term_freq": 1
        }
      }
    }
  }
}

<Response [200]>
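
In this response, `doc_freq` is the number of documents that contain the term and `ttf` (total term frequency) is the number of its occurrences across the whole index. A minimal sketch of pulling those two numbers out of a `_termvectors` response; the `term_stats` helper is hypothetical, and the sample is a trimmed copy of this notebook's output for the word "ustaw":

```python
from typing import Dict, Tuple


def term_stats(response: dict, field: str = "content") -> Dict[str, Tuple[int, int]]:
    """Map each term to (doc_freq, ttf) for the given field."""
    terms = response["term_vectors"][field]["terms"]
    return {term: (stats["doc_freq"], stats["ttf"]) for term, stats in terms.items()}


# Trimmed copy of a _termvectors response from this notebook
sample_response = {
    "term_vectors": {
        "content": {
            "terms": {
                "ustawa": {"doc_freq": 1178, "ttf": 24934, "term_freq": 1},
                "ustawić": {"doc_freq": 378, "ttf": 913, "term_freq": 1},
            }
        }
    }
}

for term, (doc_freq, ttf) in term_stats(sample_response).items():
    print(f"{term}: in {doc_freq} documents, {ttf} occurrences in total")
```
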

### Search for the **ustaw** form of the word ustawa

In [5]:
%%rest GET http://localhost:9200/ustawy/_termvectors
Content-Type: application/json

{
  "doc" : {
    "content": "ustaw"
  },
  "offsets": false,
  "positions": false,
  "field_statistics": false,
  "term_statistics": true
}

{
  "_index": "ustawy",
  "_type": "_doc",
  "_version": 0,
  "found": true,
  "took": 1,
  "term_vectors": {
    "content": {
      "terms": {
        "ustawa": {
          "doc_freq": 1178,
          "ttf": 24934,
          "term_freq": 1
        },
        "ustawić": {
          "doc_freq": 378,
          "ttf": 913,
          "term_freq": 1
        }
      }
    }
  }
}

<Response [200]>

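The counts for **ustawa** are the same as above, which makes sense: the analyzer reduces every inflected form to its lemma, so the query is not restricted to one particular form. The form "ustaw" is ambiguous, so the analyzer also returns the lemma **ustawić**.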

### Search for the phrase **kodeks postępowania cywilnego** in exactly this order

In [6]:
%%rest GET http://localhost:9200/ustawy/_search?filter_path=hits.total.value
Content-Type: application/json

{
  "query": {
    "match_phrase": {
      "content": "kodeks postępowania cywilnego"
    }
  }
}

{
  "hits": {
    "total": {
      "value": 99
    }
  }
}

<Response [200]>

### Search for acts containing **wchodzi w życie** in any form (up to 2 additional words in the search phrase)

In [7]:
%%rest GET http://localhost:9200/ustawy/_search?filter_path=hits.total.value
Content-Type: application/json

{
  "query": {
    "match_phrase": {
      "content": {
        "query": "wchodzi w życie",
        "slop": 2
      }
    }
  }
}

{
  "hits": {
    "total": {
      "value": 1174
    }
  }
}

<Response [200]>
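
To make the effect of `slop` concrete, here is a simplified pure-Python check. It covers only in-order matches, where the phrase terms appear in sequence with at most `slop` extra tokens between them in total. Lucene's real matching also tolerates some reordering, which this sketch ignores. The `matches_with_slop` helper and the sample sentences are hypothetical, and the tokens are assumed to be already analyzed:

```python
from typing import List


def matches_with_slop(tokens: List[str], phrase: List[str], slop: int) -> bool:
    """True if `phrase` occurs in order with at most `slop` extra tokens inside it."""
    if not phrase:
        return False
    n = len(phrase)
    for start, token in enumerate(tokens):
        if token != phrase[0]:
            continue
        # The whole phrase must fit in a window of n + slop tokens
        window = iter(tokens[start:start + n + slop])
        # `term in window` consumes the iterator, so order is enforced
        if all(term in window for term in phrase):
            return True
    return False


phrase = ["wchodzi", "w", "życie"]
print(matches_with_slop("ustawa wchodzi w życie".split(), phrase, slop=0))
print(matches_with_slop("wchodzi ona w życie".split(), phrase, slop=2))
print(matches_with_slop("wchodzi z dniem ogłoszenia w życie".split(), phrase, slop=2))
```

The first two calls match, while the third does not, because three extra tokens sit inside the phrase and only two are allowed.
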

### Find the 10 documents most relevant to the word **konstytucja**

In [12]:
%%rest -q GET http://localhost:9200/ustawy/_search?filter_path=hits.hits._id,hits.hits._score
Content-Type: application/json

{
  "query": {
    "match": {
      "content": "konstytucja"
    }
  },
  "size": 10
}

<Response [200]>

In [14]:
results = _.json()["hits"]["hits"] 
results = [(d["_id"], d["_score"]) for d in results]
results.sort(key=lambda it: it[1], reverse=True)
results

[('1997_629', 6.869184),
 ('2000_443', 6.663479),
 ('1997_604', 6.632288),
 ('1996_350', 6.6273947),
 ('1997_642', 6.2522817),
 ('2001_23', 6.056855),
 ('1996_199', 5.9267144),
 ('1999_688', 5.848894),
 ('2001_1082', 5.4653444),
 ('1997_681', 5.4653444)]

In [17]:
%%rest GET http://localhost:9200/ustawy/_search?filter_path=hits.hits.highlight
Content-Type: application/json

{
  "query": {
    "match": {
      "content": "konstytucja"
    }
  },
  "highlight": {
    "fields": {
      "content": {
        "number_of_fragments": 3
      }
    }
  },
  "size": 10
}

{
  "hits": {
    "hits": [
      {
        "highlight": {
          "content": [
            "o zmianie ustawy konstytucyjnej o trybie przygotowania\n           i uchwalenia <em>Konstytucji</em> Rzeczypospolitej",
            "W ustawie  konstytucyjnej z  dnia 23 kwietnia 1992 r. o trybie przygotowania i \nuchwalenia <em>Konstytucji</em>",
            "Do zgłoszenia projektu <em>Konstytucji</em> załącza się wykaz \n                obywateli popierających zgłoszenie"
          ]
        }
      },
      {
        "highlight": {
          "content": [
            "umowy międzynarodowej i nie wypełnia przesłanek określonych w art. 89\n     ust. 1 lub art. 90 <em>Konstytucji</em>",
            "międzynarodowej lub załącznika nie\n     wypełnia przesłanek określonych w art. 89 ust. 1 lub art. 90 <em>Konstytucji</em>",
            "co do zasadności wyboru\n  trybu ratyfikacji umowy międzynarodowej, o którym mowa w art. 89 ust. 2\n  <em>Konstytucji</em>"
          ]
        }
      },
      

<Response [200]>
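
Each highlight fragment wraps the matched word in `<em>` tags, so the surface forms that matched the query can be read back from the response. A minimal sketch, assuming the `<em>` tags seen in the output above; the `highlighted_terms` helper is hypothetical, and the sample is a trimmed copy of that output:

```python
import re
from typing import List

EM_TAG = re.compile(r"<em>(.*?)</em>")


def highlighted_terms(response: dict, field: str = "content") -> List[str]:
    """Collect every highlighted surface form from a search response."""
    found: List[str] = []
    for hit in response["hits"]["hits"]:
        for fragment in hit.get("highlight", {}).get(field, []):
            found.extend(EM_TAG.findall(fragment))
    return found


# Trimmed copy of the highlight output above
sample_response = {
    "hits": {
        "hits": [
            {
                "highlight": {
                    "content": [
                        "Do zgłoszenia projektu <em>Konstytucji</em> załącza się wykaz"
                    ]
                }
            }
        ]
    }
}

print(highlighted_terms(sample_response))
```
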