# teryt
**teryt** is a library that provides an efficient search engine for the TERC, SIMC and ULIC systems.

# User guide (WIP)
## Step 1: Migrate official CSV databases
If you have not yet downloaded the official TERYT databases,
visit [this TERYT website](https://eteryt.stat.gov.pl/eTeryt/rejestr_teryt/udostepnianie_danych/baza_teryt/uzytkownicy_indywidualni/pobieranie/pliki_pelne.aspx?contrast=default)
and download the SIMC, TERC and ULIC databases in `.csv` format.

Warning: Only `.csv` files of the official versions (neither the address nor the statistical variants) are supported.

Now place the SIMC database file in `teryt/data/SIMC`, and do the same for the other systems.
Make sure there is exactly one file in every `teryt/data/<SYSTEM>` directory.

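Before initializing the systems, it can help to verify the layout described above. The following is a minimal sketch using only the standard library; `check_data_dirs` and `SYSTEMS` are hypothetical names introduced here for illustration and are not part of teryt.

```python
from pathlib import Path

# Directory names follow the `teryt/data/<SYSTEM>` layout described above.
SYSTEMS = ("SIMC", "TERC", "ULIC")


def check_data_dirs(data_root):
    """Return a dict mapping each problematic system to a description.

    An empty dict means every system directory holds exactly one .csv file.
    This helper is hypothetical; it is not provided by teryt.
    """
    problems = {}
    for system in SYSTEMS:
        directory = Path(data_root) / system
        if not directory.is_dir():
            problems[system] = "directory is missing"
            continue
        files = [p for p in directory.iterdir() if p.is_file()]
        if len(files) != 1:
            problems[system] = f"expected 1 file, found {len(files)}"
        elif files[0].suffix.lower() != ".csv":
            problems[system] = f"not a .csv file: {files[0].name}"
    return problems


# Report any problems found under the default data directory.
for system, problem in check_data_dirs("teryt/data").items():
    print(f"{system}: {problem}")
```

If nothing is printed, each directory contains exactly one `.csv` file.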
## Step 2: Initialize TERYT systems
All systems can be initialized as below:

In [1]:
import teryt
simc = teryt.Simc()  # could also be SIMC or simc
terc = teryt.TERC()
ulic = teryt.ulic()

## Step 3: Usage!

### Searching system entries
To search for entries, use the system's `search` method (referred to below as `System.search`).
For example:

In [2]:
terc.search("Warszawa")

| | index | WOJ | POW | GMI | RODZ | NAZWA | NAZWA_DOD | STAN_NA |
|---|---|---|---|---|---|---|---|---|
| 0 | 2057 | 14 | 65 | | | Warszawa | miasto stołeczne, na prawach powiatu | 2021-01-01 |
| 1 | 2058 | 14 | 65 | 1.0 | 1.0 | Warszawa | gmina miejska, miasto stołeczne | 2021-01-01 |


### Search keywords
`System.search` accepts many keyword arguments, called **fields**.
Each field represents a column of the source database (columns are called **roots**).

#### `ULIC` secname (`str`)
Second name of a street.

Example:

In [3]:
k = ulic.search(secname="Księcia")
k

| | index | WOJ | POW | GMI | RODZ_GMI | SYM | SYM_UL | CECHA | NAZWA_1 | NAZWA_2 | STAN_NA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 429 | 02 | 14 | 08 | 4 | 0987390 | 24145 | ul. | Henryka Wiernego | Księcia | 2021-02-05 |
| 1 | 3235 | 02 | 03 | 01 | 1 | 0954082 | 36208 | ul. | Jana II | Księcia | 2021-02-05 |
| 2 | 4199 | 02 | 64 | 06 | 9 | 0986952 | 10230 | ul. | Witolda | Księcia | 2021-02-05 |
| 3 | 5628 | 06 | 61 | 01 | 1 | 0922018 | 10230 | ul. | Witolda | Księcia | 2021-02-05 |
| 4 | 7603 | 04 | 61 | 01 | 1 | 0928363 | 10230 | ul. | Witolda | Księcia | 2021-02-05 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 124 | 253965 | 14 | 61 | 01 | 1 | 0966079 | 51258 | rondo | Siemowita III | Księcia | 2021-02-05 |
| 125 | 260928 | 14 | 18 | 04 | 4 | 0921438 | 52928 | ul. | Janusza I Starego | Księcia | 2021-02-05 |
| 126 | 266940 | 24 | 64 | 01 | 1 | 0930868 | 38139 | ul. | Leszka Białego | Księcia | 2021-02-05 |
| 127 | 268012 | 20 | 62 | 01 | 1 | 0957241 | 39268 | ul. | Stanisława | Księcia | 2021-02-05 |


In [4]:
i1 = k.get_entry(1)  # index 3235
i1

Street(
    name='Jana II', 
    secname='Księcia', 
    terid='0203011', 
    system=ULIC, 
    voivodship=UnitLink(code='02', value='DOLNOŚLĄSKIE', index=0), 
    powiat=UnitLink(code='03', value='głogowski', index=22), 
    gmina=UnitLink(code='01', value='Głogów', index=23), 
    gmitype=Link(code='1', value='gmina miejska'), 
    streettype='ul.', 
    id='36208', 
    integral_id='0954082', 
    date='2021-02-05', 
    index=3235
)

In [5]:
i1.fullname

'ul. Księcia Jana II'

In [6]:
i1.gmina

UnitLink(code='01', value='Głogów', index=23)

In [7]:
i1.gmina.as_unit

Unit(
    name='Głogów', 
    terid='0203011', 
    system=TERC, 
    voivodship=UnitLink(code='02', value='DOLNOŚLĄSKIE', index=0), 
    powiat=UnitLink(code='03', value='głogowski', index=22), 
    gmina=UnitLink(code='01', value='Głogów', index=23), 
    gmitype=Link(code='1', value='gmina miejska'), 
    function='gmina miejska', 
    date='2021-01-01', 
    index=23
)

In [8]:
i1.gmina.as_unit.powiat.as_unit

Unit(
    name='głogowski', 
    terid='0203', 
    system=TERC, 
    voivodship=UnitLink(code='02', value='DOLNOŚLĄSKIE', index=0), 
    powiat=UnitLink(code='03', value='głogowski', index=22), 
    function='powiat', 
    date='2021-01-01', 
    index=22
)

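In the outputs above, the gmina's `terid` (`'0203011'`) reads as the voivodship, powiat and gmina codes followed by the gmina type code, and the powiat's `terid` (`'0203'`) is its first four characters. The sketch below decomposes such an identifier. `split_terid` is a hypothetical helper, not part of teryt, and the two-character (voivodship-only) case is an assumption, since no such `terid` appears in the examples.

```python
def split_terid(terid):
    """Split a territorial ID into its component codes.

    Layout inferred from the example outputs:
    voivodship (2 chars) + powiat (2) + gmina (2) + gmina type (1).
    Hypothetical helper; not provided by teryt.
    """
    if len(terid) not in (2, 4, 7) or not terid.isdigit():
        raise ValueError(f"unexpected terid: {terid!r}")
    parts = {"voivodship": terid[:2]}
    if len(terid) >= 4:
        parts["powiat"] = terid[2:4]
    if len(terid) == 7:
        parts["gmina"] = terid[4:6]
        parts["gmitype"] = terid[6]
    return parts


print(split_terid("0203011"))
# {'voivodship': '02', 'powiat': '03', 'gmina': '01', 'gmitype': '1'}
print(split_terid("0203"))
# {'voivodship': '02', 'powiat': '03'}
```

The resulting codes match the `code` values of the `UnitLink` and `Link` objects shown for Głogów.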
#### `COMMON` date (`str`)
The "state as of" date, taken from the `STAN_NA` column.

#### `COMMON` name (`str`)
Name of the searched locality, street or unit.

Examples:

In [9]:
simc.search(name="Oś")

Locality(
    name='Oś', 
    terid='1604032', 
    system=SIMC, 
    voivodship=UnitLink(code='16', value='OPOLSKIE', index=2077), 
    powiat=UnitLink(code='04', value='kluczborski', index=2107), 
    gmina=UnitLink(code='03', value='Lasowice Wielkie', index=2114), 
    gmitype=Link(code='2', value='gmina wiejska'), 
    loctype=Link(code='01', value='wieś'), 
    cnowner=Link(code='1', value=True), 
    id='0497791', 
    integral_id='0497791', 
    date='2021-01-01', 
    index=56979
)

You can also search names with the `match`, `contains`, `startswith` and `endswith` keywords:

In [11]:
simc.search(match=".{31}")

| | index | WOJ | POW | GMI | RODZ_GMI | RM | MZ | NAZWA | SYM | SYMPOD | STAN_NA |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 13140 | 6 | 6 | 1 | 1 | 99 | 1 | Krakowskie Przedmieście-Kolonia | 930118 | 930070 | 2021-01-01 |
| 1 | 58704 | 18 | 16 | 11 | 5 | 1 | 1 | Wólka Sokołowska k. Wólki Niedźwiedzkiej | 1066490 | 1066490 | 2021-01-01 |
| 2 | 86411 | 26 | 10 | 4 | 2 | 5 | 1 | Leśniczówka Skarżysko Kościelne | 992823 | 252954 | 2021-01-01 |
| 3 | 97619 | 30 | 2 | 2 | 2 | 0 | 1 | Kuźnica Czarnkowska-Wybudowanie | 1016138 | 525062 | 2021-01-01 |

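The four name-searching modes can be pictured with plain Python string operations. This is only a sketch of the idea, not teryt's implementation: `search_names` and `NAMES` are hypothetical, and case-sensitive comparison is an assumption. The result for `match=".{31}"` above includes a 40-character name, which suggests the pattern is anchored at the start of the name but not at its end, so the sketch uses `re.match` rather than `re.fullmatch`.

```python
import re

# A few names taken from the outputs in this guide.
NAMES = [
    "Oś",
    "Krakowskie Przedmieście-Kolonia",
    "Wólka Sokołowska k. Wólki Niedźwiedzkiej",
    "Warszawa",
]


def search_names(names, *, match=None, contains=None,
                 startswith=None, endswith=None):
    """Keep the names that satisfy every keyword that was supplied.

    Hypothetical helper illustrating the four modes; not part of teryt.
    """
    result = []
    for name in names:
        # re.match anchors the pattern at the start of the name only.
        if match is not None and re.match(match, name) is None:
            continue
        if contains is not None and contains not in name:
            continue
        if startswith is not None and not name.startswith(startswith):
            continue
        if endswith is not None and not name.endswith(endswith):
            continue
        result.append(name)
    return result


print(search_names(NAMES, match=".{31}"))
print(search_names(NAMES, startswith="War"))
print(search_names(NAMES, contains="Przedmieście", endswith="Kolonia"))
```

The first call keeps the two names that are at least 31 characters long, mirroring the `simc.search(match=".{31}")` result.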


#### `SIMC` loctype (`str`)
Locality type.

#### `COMMON` gmina (`str`)
Gmina of the searched locality, street or unit.

#### `COMMON` voivodship (`str`)
Voivodship of the searched locality, street or unit.

#### `TERC` function (`str`)
Unit function.

#### `ULIC` streettype (`str`)
Street type.

#### `COMMON` powiat (`str`)
Powiat of the searched locality, street or unit.

#### `SIMC` cnowner (`bool`)
States whether a locality owns a common name.
As of 09.02, all Polish localities are "cnowners", so using this keyword
may trigger a warning that it does not narrow the results.

#### `SIMC, ULIC` id (`str`)
ID of a locality or street.

#### `SIMC, ULIC` integral_id (`str`)
Integral ID of a locality/street.

#### `COMMON` gmitype (`str`)
Gmina type of the searched locality, street or unit.

----

Column names are also accepted in place of the arguments listed above.
This means that you can pass the database's column names
(called **root names**) instead of the field names.
Fields were introduced to unify the column names across the systems' databases.

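One way to picture the relationship between fields and roots is a lookup table that accepts either kind of name. The sketch below is an illustration under assumptions: `ULIC_FIELDS` was inferred by comparing the `Street` output with the ULIC columns shown earlier, and `to_roots` is a hypothetical helper; neither is teryt's actual code.

```python
# Field -> root (column) names for ULIC, inferred from the example outputs.
ULIC_FIELDS = {
    "name": "NAZWA_1",
    "secname": "NAZWA_2",
    "voivodship": "WOJ",
    "powiat": "POW",
    "gmina": "GMI",
    "gmitype": "RODZ_GMI",
    "streettype": "CECHA",
    "id": "SYM_UL",
    "integral_id": "SYM",
    "date": "STAN_NA",
}


def to_roots(fields_to_roots, **keywords):
    """Translate search keywords into column names.

    Accepts both field names (e.g. secname) and root names (e.g. NAZWA_2).
    Hypothetical helper; not provided by teryt.
    """
    roots = set(fields_to_roots.values())
    resolved = {}
    for key, value in keywords.items():
        if key in fields_to_roots:
            column = fields_to_roots[key]
        elif key in roots:
            column = key
        else:
            raise TypeError(f"unknown field or root name: {key!r}")
        if column in resolved:
            raise TypeError(f"{column} given more than once")
        resolved[column] = value
    return resolved


print(to_roots(ULIC_FIELDS, secname="Księcia", WOJ="02"))
# {'NAZWA_2': 'Księcia', 'WOJ': '02'}
```

Rejecting a column that is given twice (once as a field, once as a root) is a design choice of this sketch, made to avoid silently ignoring one of the values.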
### Search results
The results returned by `System.search` are not actually a DataFrame.
They are an `Entry` or an `EntryGroup`, kept in sync with the fields.

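In the examples earlier in this guide, a search with one hit printed a single object, while a search with several hits printed a table from which `get_entry(1)` picked the second row (database index 3235). The sketch below models that behaviour with stand-in classes. `SketchEntry`, `SketchEntryGroup` and `wrap_results` are hypothetical names, and returning an empty group for zero rows is an assumption; the real `Entry` and `EntryGroup` classes may work differently.

```python
class SketchEntry:
    """Stand-in for a single result: one row plus its database index."""

    def __init__(self, index, row):
        self.index = index
        self.row = row


class SketchEntryGroup:
    """Stand-in for a multi-row result."""

    def __init__(self, rows):
        # rows is a list of (database_index, row_dict) pairs.
        self.rows = rows

    def __len__(self):
        return len(self.rows)

    def get_entry(self, position):
        """Return the entry at a position in the group (not a database index)."""
        index, row = self.rows[position]
        return SketchEntry(index, row)


def wrap_results(rows):
    """Return a single entry for exactly one row, otherwise a group."""
    if len(rows) == 1:
        index, row = rows[0]
        return SketchEntry(index, row)
    return SketchEntryGroup(rows)


group = wrap_results([
    (429, {"NAZWA_1": "Henryka Wiernego", "NAZWA_2": "Księcia"}),
    (3235, {"NAZWA_1": "Jana II", "NAZWA_2": "Księcia"}),
])
entry = group.get_entry(1)
print(entry.index, entry.row["NAZWA_1"])
# 3235 Jana II
```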
Here is what you can do with search results:


