# Wissensaggregator Mittelalter und frühe Neuzeit

## Strukturdaten einlesen/aktualisieren

Inhalt

- [Länder einlesen](#Länder-einlesen)
- [Verwaltungsebenen einlesen](#Verwaltungsebenen-einlesen)
- [Orte aus GeoNames einlesen](#Orte-aus-GeoNames-einlesen)
  - [Fremsprachliche Namen](#Fremdsprachliche-Namen)
  - [Deutsche Namen eintragen](#Deutsche-Namen-eintragen)

Die Funktionen, welche Daten laden und umwandeln sind im Modul `WiagDataSetup` enthalten. Dieses verwendet seinerseits folgende Module:

- MySQL
- DataFrames
- JSON
- Infiltrator

Diese werden am besten vorab direkt in einem Julia-Terminal installiert.
``` julia
cd("C:\\Users\\Georg\\Documents\\projekte\\WiagDataSetup.jl")
Pkg.activate(".")
Pkg.add("MySQL")
Pkg.add("DataFrames")
Pkg.add("JSON")
Pkg.add("Infiltrator")
```

Pfad zum Modul `WiagDataSetup`

In [1]:
wds_path="../.."

"../.."

In [2]:
cd(wds_path)

In [3]:
using Pkg

In [4]:
Pkg.activate(".")

[32m[1m  Activating[22m[39m project at `C:\Users\georg\Documents\projekte\WiagDataSetup.jl`


Nur für die Entwicklung des Moduls relevant.

In [5]:
using Revise

Modul laden

In [6]:
using WiagDataSetup; Wds=WiagDataSetup

WiagDataSetup

Verbinde die Datenbank.

Im Verlauf der Einleseschritte kann die folgende Fehlermeldung erscheinen:
*Commands out of sync; you can't run this command now*.
In diesem Fall diesen Befehl nochmal absetzen.

In [9]:
Wds.setDBWIAG(user="georg", db="wiag2")

Passwort für User georg: ········


MySQL.Connection(host="127.0.0.1", user="georg", port="3306", db="wiag2")

Verzeichnis für Basisdaten, z.b. SKOS-Schemes, Länder

In [10]:
data_path="C:\\Users\\georg\\Documents\\projekte-doc\\WiagDataSetup"

"C:\\Users\\georg\\Documents\\projekte-doc\\WiagDataSetup"

## Länder einlesen
Die Quelle enthält im Wesentlichen Deutschland und seine Nachbarn. Die numerische ID ist der numerische ISO-Code.

Quelle: https://download.geonames.org/export/dump/countryInfo.txt

In [None]:
using CSV, MySQL, DataFrames

In [None]:
country_file=joinpath(data_path, "GeoNames", "countryInfo.txt")

In [None]:
df_country = CSV.read(country_file, DataFrame);

In [None]:
first(df_country, 5)

In [None]:
df_country[end-5:end, [Symbol("ISO-Numeric"), :Country]]

In [None]:
select_cols = [
    Symbol("ISO-Numeric") => :id,
    :Country => :name,
    :ISO => :country_code,
    :Capital => :capital
]

In [None]:
df_country_db = select(df_country, select_cols...);

In [None]:
Wds.filltable!("country", df_country_db, clear_table = true)

GeoNames-IDs ergänzen

Quelle: `wiag_bundeslaender_normdaten_lhofman.xls` übertragen in `country.csv`

In [None]:
using MySQL, DataFrames, CSV

In [None]:
cy_file=joinpath(data_path, "csv", "country.csv")

In [None]:
df_cy=CSV.read(cy_file, DataFrame)

In [None]:
sql="DROP TABLE IF EXISTS country_gn_id"
DBInterface.execute(Wds.dbwiag, sql);

In [None]:
sql="CREATE TEMPORARY TABLE country_gn_id (" *
"id INT," *
"country_code VARCHAR(31)," *
"country_code_3 VARCHAR(31)," *
"geonames_id INT)"
DBInterface.execute(Wds.dbwiag, sql);

In [None]:
Wds.filltable!("country_gn_id", df_cy[!, [:id, :country_code, :country_code_3, :geonames_id]])

In [None]:
sql="UPDATE country AS cy, (SELECT id, country_code, country_code_3, geonames_id FROM country_gn_id) as gn " *
"SET cy.country_code_3 = gn.country_code_3, cy.geonames_id = gn.geonames_id " *
"WHERE cy.id = gn.id"
DBInterface.execute(Wds.dbwiag, sql);

externe IDs ergänzen  
2022-04-08: Es liegen nur externe IDs für Länder vor. Diese werden nicht übernommen; bzw. wieder gelöscht.

In [None]:
cei_file=joinpath(data_path, "csv", "country_id_external.csv")

In [None]:
df_cei=CSV.read(cei_file, DataFrame, types=[Int, String, String, String, Int, String]);

In [None]:
size(df_cei)

In [None]:
df_cei[1:7, :]

In [None]:
Wds.filltable!("place_id_external", df_cei[!, [:geonames_id, :authority_id, :value]])

externe IDs für Bundesländer ergänzen

Quelle: Quelle: `wiag_bundeslaender_normdaten_lhofman.xls` übertragen in `country_level_1_id_external.csv`

In [None]:
cei_l1_file=joinpath(data_path, "csv", "country_level_1_id_external.csv")

In [None]:
df_cei_l1=CSV.read(cei_l1_file, DataFrame, types=[Int, String, String, String, Int, String]);

In [None]:
df_cei_l1[1:7, :]

In [None]:
Wds.filltable!("place_id_external", df_cei_l1[!, [:geonames_id, :authority_id, :value]])

Kontrolle

``` sql
SELECT name, country_code, authority_id, value 
FROM place_id_external AS pei
JOIN country_level_1 AS cl1 ON pei.geonames_id = cl1.geonames_id;
```

Es fehlen Hamburg, Bremen

## Verwaltungsebenen einlesen
Die Quelle enthält die oberste Verwaltungsebene nach dem Land selbst, also für Deutschland die Bundesländer, für Frankreich die Regionen. Die Quelle umfasst alle Daten in GeoNames und wird daher gefiltert.

In [None]:
using DataFrames; using CSV

In [None]:
cl1_file=joinpath(data_path, "Geonames", "admin1CodesASCII.txt")

In [None]:
df_cl1=CSV.read(cl1_file, DataFrame, header=["code", "name", "ascii_name", "geonames_id"]);

In [None]:
first(df_cl1, 7)

Länder Code extrahieren

In [None]:
get_country_code(code)=split(code::AbstractString, ".")[1]

In [None]:
df_cl1[!, :country_code] .= get_country_code.(df_cl1[!, :code]);

In [None]:
first(df_cl1, 7)

Code der Verwaltungsebene extrahieren

In [None]:
get_admin1_code(code)=split(code::AbstractString, ".")[2]

In [None]:
df_cl1[!, :admin1_code] .= get_admin1_code.(df_cl1[!, :code]);

Länder auslesen

In [None]:
using MySQL

In [None]:
sql = "SELECT id as country_id, country_code FROM country " * 
"WHERE country_code in (SELECT distinct(country_code) FROM place)"
df_country_m = DBInterface.execute(Wds.dbwiag, sql) |> DataFrame;

In [None]:
delete!(df_country_m, 16)

In [None]:
df_ccl1 = leftjoin(df_country_m, df_cl1, on = :country_code);

Index ergänzen

In [None]:
df_ccl1[!, :id] .= 1:size(df_ccl1, 1);

In [None]:
first(df_ccl1[!, [:id, :country_id, :country_code, :name]], 5)

In [None]:
size(df_ccl1)

In [None]:
import_cols = [:id, :country_id, :country_code, :admin1_code, :name, :ascii_name, :geonames_id]

In [None]:
names(df_ccl1)

In [None]:
Wds.filltable!("country_level_1", select(df_ccl1, import_cols), clear_table=true)

## Orte aus GeoNames einlesen
Geonames stellt eine Sammlung aller Objekte (features) zur Verfügung. Alle Objekte eines Landes sind in jeweils einer Datei enthalten.
https://download.geonames.org/export/dump/

Lies Daten zu 
- Deutschland
- Nachbarländer
- Baltikum
- Kroatien
- Kaliningrad/Königsberg

Ländercodes: DE, NL, BE, FR, IT, CH, AT, DK, PL, LU, CZ, LI, LV, LT, HR, EE, RU

Pfad zu den Daten

In [None]:
gn_path = "C:\\Users\\georg\\Documents\\projekte-doc\\WiagDataSetup\\GeoNames"

Wähle aus den Features in den Daten von GeoNames nur Orte (P) aus, sowie Klöster (MSTY) und Konvente (CVNT).
Siehe http://www.geonames.org/export/codes.html.

Filterfunktion definieren

In [None]:
function filter_places(feature_class, feature_code)
    return ((!ismissing(feature_class) && feature_class == "P")
            || (!ismissing(feature_code) && feature_code in ("MSTY", "CVNT")))
end

Spaltennamen und Spaltentypen

Länder auslesen

In [None]:
using MySQL
using DataFrames
using CSV

In [None]:
df_country_m = DBInterface.execute(Wds.dbwiag, "SELECT id as country_id, country_code FROM country") |> DataFrame;

In [None]:
first(df_country_m, 5)

In [None]:
gn_header = ["geonames_id", "name", "asciiname", "alternatenames",
             "latitude", "longitude", "feature_class", "feature_code",
             "country_code", "cc2", "admin1_code", "admin2_code", "admin3_code", "admin4_code",
             "population", "elevation", "dem", "timezone", "modification_date"];

gn_types = [Int, String, String, String, Float64, Float64,
            String, String, String, String,
            String, String, String, String,
            Int, Int, Int, String, String];

In [None]:
place_cols = [:country_id, :name, :asciiname, :latitude, :longitude, 
        :feature_class, :feature_code, :country_code, :cc2, :admin1_code,
        :population, :elevation, :dem, :timezone, :modification_date, :geonames_id];

### Niederlande

In [None]:
gn_filename = joinpath(gn_path, "NL", "NL.txt")

In [None]:
gn_df = CSV.read(gn_filename, DataFrame, header=gn_header, types=gn_types);

filtern nach Art des Features

In [None]:
gn_dff = filter([:feature_class, :feature_code] => filter_places, gn_df);

In [None]:
size(gn_dff)

In [None]:
gn_dff = rightjoin(df_country_m, gn_dff, on = :country_code);

In [None]:
gn_dff[1:5, [:country_id, :name, :latitude, :longitude, :elevation, :dem, :population]]

In [None]:
Wds.filltable!("place", gn_dff[!, place_cols], clear_table = false)

### Belgien

In [None]:
gn_filename = joinpath(gn_path, "BE", "BE.txt")

In [None]:
gn_df = CSV.read(gn_filename, DataFrame, header=gn_header, types=gn_types);

filtern nach Art des Features

In [None]:
gn_dff = filter([:feature_class, :feature_code] => filter_places, gn_df);

In [None]:
gn_dff[1:5, [:geonames_id, :name, :country_code, :population]]

In [None]:
size(gn_dff)

Spalte für die Länder_ID einfügen

In [None]:
gn_dff = rightjoin(df_country_m, gn_dff, on = :country_code);

In [None]:
Wds.filltable!("place", gn_dff[!, place_cols], clear_table = false)

### Frankreich

In [None]:
gn_filename = joinpath(gn_path, "FR", "FR.txt")

In [None]:
gn_df = CSV.read(gn_filename, DataFrame, header=gn_header, types=gn_types);

filtern nach Art des Features

In [None]:
gn_dff = filter([:feature_class, :feature_code] => filter_places, gn_df);

In [None]:
gn_dff[1:5, [:geonames_id, :name, :country_code, :population]]

In [None]:
size(gn_dff)

Spalte für die Länder_ID einfügen

In [None]:
gn_dff = rightjoin(df_country_m, gn_dff, on = :country_code);

In [None]:
Wds.filltable!("place", gn_dff[!, place_cols], clear_table = false)

### Italien

In [None]:
gn_filename = joinpath(gn_path, "IT", "IT.txt")

In [None]:
gn_df = CSV.read(gn_filename, DataFrame, header=gn_header, types=gn_types);

filtern nach Art des Features

In [None]:
gn_dff = filter([:feature_class, :feature_code] => filter_places, gn_df);

In [None]:
gn_dff[1:5, [:geonames_id, :name, :country_code, :population]]

In [None]:
size(gn_dff)

Spalte für die Länder_ID einfügen

In [None]:
gn_dff = rightjoin(df_country_m, gn_dff, on = :country_code);

In [None]:
Wds.filltable!("place", gn_dff[!, place_cols], clear_table = false)

### Schweiz

In [None]:
gn_filename = joinpath(gn_path, "CH", "CH.txt")

In [None]:
gn_df = CSV.read(gn_filename, DataFrame, header=gn_header, types=gn_types);

filtern nach Art des Features

In [None]:
gn_dff = filter([:feature_class, :feature_code] => filter_places, gn_df);

In [None]:
gn_dff[1:5, [:geonames_id, :name, :country_code, :population]]

In [None]:
size(gn_dff)

Spalte für die Länder_ID einfügen

In [None]:
gn_dff = rightjoin(df_country_m, gn_dff, on = :country_code);

In [None]:
Wds.filltable!("place", gn_dff[!, place_cols], clear_table = false)

### Österreich

In [None]:
gn_filename = joinpath(gn_path, "AT", "AT.txt")

In [None]:
gn_df = CSV.read(gn_filename, DataFrame, header=gn_header, types=gn_types);

filtern nach Art des Features

In [None]:
gn_dff = filter([:feature_class, :feature_code] => filter_places, gn_df);

In [None]:
gn_dff[1:5, [:geonames_id, :name, :country_code, :population]]

In [None]:
size(gn_dff)

Spalte für die Länder_ID einfügen

In [None]:
gn_dff = rightjoin(df_country_m, gn_dff, on = :country_code);

In [None]:
Wds.filltable!("place", gn_dff[!, place_cols], clear_table = false)

### Dänemark

In [None]:
gn_filename = joinpath(gn_path, "DK", "DK.txt")

In [None]:
gn_df = CSV.read(gn_filename, DataFrame, header=gn_header, types=gn_types);

filtern nach Art des Features

In [None]:
gn_dff = filter([:feature_class, :feature_code] => filter_places, gn_df);

In [None]:
gn_dff[1:5, [:geonames_id, :name, :country_code, :population]]

In [None]:
size(gn_dff)

Spalte für die Länder_ID einfügen

In [None]:
gn_dff = rightjoin(df_country_m, gn_dff, on = :country_code);

In [None]:
Wds.filltable!("place", gn_dff[!, place_cols], clear_table = false)

### Polen

In [None]:
gn_filename = joinpath(gn_path, "PL", "PL.txt")

In [None]:
gn_df = CSV.read(gn_filename, DataFrame, header=gn_header, types=gn_types);

filtern nach Art des Features

In [None]:
gn_dff = filter([:feature_class, :feature_code] => filter_places, gn_df);

In [None]:
gn_dff[1:5, [:geonames_id, :name, :country_code, :population]]

In [None]:
size(gn_dff)

Spalte für die Länder_ID einfügen

In [None]:
gn_dff = rightjoin(df_country_m, gn_dff, on = :country_code);

In [None]:
Wds.filltable!("place", gn_dff[!, place_cols], clear_table = false)

### Deutschland

In [None]:
gn_filename = joinpath(gn_path, "DE", "DE.txt")

In [None]:
gn_df = CSV.read(gn_filename, DataFrame, header=gn_header, types=gn_types);

filtern nach Art des Features

In [None]:
gn_dff = filter([:feature_class, :feature_code] => filter_places, gn_df);

In [None]:
gn_dff[1:5, [:geonames_id, :name, :country_code, :population]]

In [None]:
size(gn_dff)

Spalte für die Länder_ID einfügen

In [None]:
gn_dff = rightjoin(df_country_m, gn_dff, on = :country_code);

In [None]:
gn_dff[1:5, :]

In [None]:
Wds.filltable!("place", gn_dff[!, place_cols], clear_table = false)

### Luxemburg

In [None]:
gn_filename = joinpath(gn_path, "LU", "LU.txt")

In [None]:
gn_df = CSV.read(gn_filename, DataFrame, header=gn_header, types=gn_types);

filtern nach Art des Features

In [None]:
gn_dff = filter([:feature_class, :feature_code] => filter_places, gn_df);

In [None]:
gn_dff[1:5, [:geonames_id, :name, :country_code, :population]]

In [None]:
size(gn_dff)

Spalte für die Länder_ID einfügen

In [None]:
gn_dff = rightjoin(df_country_m, gn_dff, on = :country_code);

In [None]:
Wds.filltable!("place", gn_dff[!, place_cols], clear_table = false)

### Tschechien

In [None]:
gn_filename = joinpath(gn_path, "CZ", "CZ.txt")

In [None]:
gn_df = CSV.read(gn_filename, DataFrame, header=gn_header, types=gn_types);

filtern nach Art des Features

In [None]:
gn_dff = filter([:feature_class, :feature_code] => filter_places, gn_df);

In [None]:
gn_dff[1:5, [:geonames_id, :name, :country_code, :population]]

In [None]:
size(gn_dff)

Spalte für die Länder_ID einfügen

In [None]:
gn_dff = rightjoin(df_country_m, gn_dff, on = :country_code);

In [None]:
Wds.filltable!("place", gn_dff[!, place_cols], clear_table = false)

### Liechtenstein

In [None]:
gn_filename = joinpath(gn_path, "LI", "LI.txt")

In [None]:
gn_df = CSV.read(gn_filename, DataFrame, header=gn_header, types=gn_types);

filtern nach Art des Features

In [None]:
gn_dff = filter([:feature_class, :feature_code] => filter_places, gn_df);

In [None]:
gn_dff[1:5, [:geonames_id, :name, :country_code, :population]]

In [None]:
size(gn_dff)

Spalte für die Länder_ID einfügen

In [None]:
gn_dff = rightjoin(df_country_m, gn_dff, on = :country_code);

In [None]:
Wds.filltable!("place", gn_dff[!, place_cols], clear_table = false)

### Estland

Die erste Zeile der Daten kann nicht gelesen werden. Sie wird manuell aus den Quelldaten gelöscht. Das ist unerheblich, weil es sich nicht um ein relevantes Feature handelt.

In [None]:
gn_filename = joinpath(gn_path, "EE", "EE-x1.txt")

In [None]:
gn_df = CSV.read(gn_filename, DataFrame, header=gn_header, types=gn_types);

filtern nach Art des Features

In [None]:
gn_dff = filter([:feature_class, :feature_code] => filter_places, gn_df);

In [None]:
size(gn_dff)

In [None]:
gn_dff[1:5, [:geonames_id, :name, :country_code, :population]]

In [None]:
size(gn_dff)

Spalte für die Länder_ID einfügen

In [None]:
gn_dff = rightjoin(df_country_m, gn_dff, on = :country_code);

In [None]:
Wds.filltable!("place", gn_dff[!, place_cols], clear_table = false)

### Lettland

In [None]:
gn_filename = joinpath(gn_path, "LV", "LV.txt")

In [None]:
gn_df = CSV.read(gn_filename, DataFrame, header=gn_header, types=gn_types);

filtern nach Art des Features

In [None]:
gn_dff = filter([:feature_class, :feature_code] => filter_places, gn_df);

In [None]:
gn_dff[1:5, [:geonames_id, :name, :country_code, :population]]

In [None]:
size(gn_dff)

Spalte für die Länder_ID einfügen

In [None]:
gn_dff = rightjoin(df_country_m, gn_dff, on = :country_code);

In [None]:
Wds.filltable!("place", gn_dff[!, place_cols], clear_table = false)

### Litauen

In [None]:
gn_filename = joinpath(gn_path, "LT", "LT.txt")

In [None]:
gn_df = CSV.read(gn_filename, DataFrame, header=gn_header, types=gn_types);

filtern nach Art des Features

In [None]:
gn_dff = filter([:feature_class, :feature_code] => filter_places, gn_df);

In [None]:
gn_dff[1:5, [:geonames_id, :name, :country_code, :population]]

In [None]:
size(gn_dff)

Spalte für die Länder_ID einfügen

In [None]:
gn_dff = rightjoin(df_country_m, gn_dff, on = :country_code);

In [None]:
Wds.filltable!("place", gn_dff[!, place_cols], clear_table = false)

### Kroatien

In [None]:
gn_filename = joinpath(gn_path, "HR", "HR.txt")

In [None]:
gn_df = CSV.read(gn_filename, DataFrame, header=gn_header, types=gn_types);

filtern nach Art des Features

In [None]:
gn_dff = filter([:feature_class, :feature_code] => filter_places, gn_df);

In [None]:
gn_dff[1:5, [:geonames_id, :name, :country_code, :population]]

In [None]:
size(gn_dff)

Spalte für die Länder_ID einfügen

In [None]:
gn_dff = rightjoin(df_country_m, gn_dff, on = :country_code);

In [None]:
Wds.filltable!("place", gn_dff[!, place_cols], clear_table = false)

### Königsberg - Russland

In [None]:
gn_filename = joinpath(gn_path, "RU", "RU.txt")

In [None]:
gn_df = CSV.read(gn_filename, DataFrame, header=gn_header, types=gn_types);

filtern nach Art des Features

In [None]:
gn_dff = filter([:feature_class, :feature_code] => filter_places, gn_df);

In [None]:
gn_dff[1:5, [:geonames_id, :name, :country_code, :population]]

In [None]:
size(gn_dff)

Extrahiere Königsberg

In [None]:
gn_dff_kb = filter(:name => isequal("Kaliningrad"), gn_dff)

Spalte für die Länder_ID einfügen

In [None]:
gn_dff_kb = rightjoin(df_country_m, gn_dff_kb, on = :country_code);

In [None]:
Wds.filltable!("place", gn_dff_kb[!, place_cols], clear_table = false)

In [None]:
DBInterface.execute(Wds.dbwiag, "SELECT COUNT(*) FROM place") |> DataFrame

### Fremdsprachliche Namen

In [17]:
using CSV, DataFrames, MySQL

In [18]:
gnl_path = "C:\\Users\\georg\\Documents\\projekte-doc\\WiagDataSetup\\GeoNames\\alternatenames"

"C:\\Users\\georg\\Documents\\projekte-doc\\WiagDataSetup\\GeoNames\\alternatenames"

In [19]:
gnl_header = ["id", "geonames_id", "lang", "label", 
              "is_preferred", "isShort", "isColloquial", "is_historic", "from", "to"];

gnl_types = [Int, Int, String, String,
             Int, Int, Int, Int, String, String];

In [20]:
lang_codes = ["la", "fr", "cz", "de", "pl", "en", "nl", "it"];

In [21]:
filter_lang(lc) = !ismissing(lc) && lc in lang_codes

filter_lang (generic function with 1 method)

In [22]:
country_codes = ["DE", "NL", "BE", "FR", "IT", "CH", "AT", "DK",
    "PL", "LU", "CZ", "LI", "LV", "LT", "HR", "EE", "RU"];

### Schleife über die Länder

Orte einlesen, um nur relevante Namen zu übernehmen

In [287]:
sql = "SELECT id as place_id, name, geonames_id " *
      "FROM place WHERE place_type_id = 1"
p_df = DBInterface.execute(Wds.dbwiag, sql) |> DataFrame;

In [290]:
size(p_df)

(391240, 3)

In [291]:
function labels_by_country(cc)
    gnl_filename = joinpath(gnl_path, cc, cc * ".txt")
    gnl_df = CSV.read(gnl_filename, DataFrame, header=gnl_header, types=gnl_types);
    gnl_df = filter(:lang => filter_lang, gnl_df);
    gnl_p_df = innerjoin(gnl_df, p_df, on = :geonames_id);
    @info cc
    n_cc = Wds.filltable!("place_label", select(gnl_p_df, Not([:isShort, :isColloquial, :from, :to, :name])))    
end
    

labels_by_country (generic function with 1 method)

In [294]:
labels_by_country.(country_codes)

┌ Info: DE
└ @ Main In[291]:6
┌ Info: Rows inserted: 4364
└ @ WiagDataSetup C:\Users\georg\Documents\projekte\WiagDataSetup.jl\src\WiagDataSetup.jl:1209
┌ Info: NL
└ @ Main In[291]:6
┌ Info: Rows inserted: 1848
└ @ WiagDataSetup C:\Users\georg\Documents\projekte\WiagDataSetup.jl\src\WiagDataSetup.jl:1209
┌ Info: BE
└ @ Main In[291]:6
┌ Info: Rows inserted: 862
└ @ WiagDataSetup C:\Users\georg\Documents\projekte\WiagDataSetup.jl\src\WiagDataSetup.jl:1209
┌ Info: FR
└ @ Main In[291]:6
┌ Info: 10000
└ @ WiagDataSetup C:\Users\georg\Documents\projekte\WiagDataSetup.jl\src\WiagDataSetup.jl:1186
┌ Info: 20000
└ @ WiagDataSetup C:\Users\georg\Documents\projekte\WiagDataSetup.jl\src\WiagDataSetup.jl:1186
┌ Info: 30000
└ @ WiagDataSetup C:\Users\georg\Documents\projekte\WiagDataSetup.jl\src\WiagDataSetup.jl:1186
┌ Info: Rows inserted: 33108
└ @ WiagDataSetup C:\Users\georg\Documents\projekte\WiagDataSetup.jl\src\WiagDataSetup.jl:1209
┌ Info: IT
└ @ Main In[291]:6
┌ Info: 10000
└ @ WiagDataSetup

17-element Vector{Int64}:
  4364
  1848
   862
 33108
 32162
  6277
   728
   669
  3859
   203
  1240
    72
   529
   787
   481
   663
    11

### Deutsche Namen eintragen

Trage in `place` für Orte in Deutschland, Österreich und der Schweiz den deutschen Namen ein.

In [None]:
using MySQL, DataFrames

In [None]:
db_exec(sql) = DBInterface.execute(Wds.dbwiag, sql) |> DataFrame

In [None]:
sql = "SELECT label, p.name FROM place_label AS pl " *
"JOIN place AS p ON pl.geonames_id = p.geonames_id " *
"WHERE pl.lang = 'de' AND p.country_code = 'DE' " *
"AND pl.label <> p.name " *
"LIMIT 12"
df_name_udt = db_exec(sql)

In [None]:
sql = "SELECT count(*) FROM place_label AS pl " *
"JOIN place AS p ON pl.geonames_id = p.geonames_id " *
"WHERE pl.lang = 'de' AND p.country_code = 'DE' " *
"AND pl.label <> p.name "
n = db_exec(sql)

Es scheint im Allgemeinen keine gute Idee zu sein, generell den deutschen Eintrag aus `place_label` zu übernehmen. Man wird einzelne Orte evtl. redaktionell bearbeiten müssen, indem man einen deutschen Namen in `place_label` als bevorzugten Namen auszeichnet.

## Neue Organisation der Orte
Unterscheide Orte nach ihren Quellen (analog zu Items) (Tabelle `place_type`)

### Tabelle `place_type`
über DbSchema

In [None]:
out_path = "C:\\Users\\georg\\Documents\\projekte-doc\\WiagDataSetup\\data_sql"

In [11]:
using DataFrames, Dates

In [12]:
df_place_type = DataFrame();

In [13]:
insertcols!(df_place_type,
    :id => [1],
    :name => ["Ort GN"],
    :note => ["Orte aus GeoNames"],
    :created_by => 7,
    :date_created => now(),
    :changed_by => 7,
    :date_changed => now(),
    :table_name => "place",
    :name_app => "place",
)

Unnamed: 0_level_0,id,name,note,created_by,date_created,changed_by
Unnamed: 0_level_1,Int64,String,String,Int64,DateTime,Int64
1,1,Ort GN,Orte aus GeoNames,7,2022-04-11T09:06:26.483,7


In [14]:
rec_place_ut = (
    id = 2,
    name = "Ort Utrecht",
    note = "Orte der Priester aus Utrecht",
    created_by = 7,
    date_created = now(),
    changed_by = 7,
    date_changed = now(),
    table_name = "place",
    name_app = "place_ut",
)

(id = 2, name = "Ort Utrecht", note = "Orte der Priester aus Utrecht", created_by = 7, date_created = DateTime("2022-04-11T09:06:29.865"), changed_by = 7, date_changed = DateTime("2022-04-11T09:06:29.865"), table_name = "place", name_app = "place_ut")

In [15]:
push!(df_place_type, rec_place_ut)

Unnamed: 0_level_0,id,name,note,created_by,date_created
Unnamed: 0_level_1,Int64,String,String,Int64,DateTime
1,1,Ort GN,Orte aus GeoNames,7,2022-04-11T09:06:26.483
2,2,Ort Utrecht,Orte der Priester aus Utrecht,7,2022-04-11T09:06:29.865


In [16]:
table_name = "place_type";
Wds.filltable!(table_name, df_place_type)

┌ Info: Rows inserted: 2
└ @ WiagDataSetup C:\Users\georg\Documents\projekte\WiagDataSetup.jl\src\WiagDataSetup.jl:1209


2

`place_type_id` nachtragen
```sql
UPDATE place SET place_type_id = 1;
```

`id_in_source` nachtragen
```sql
UPDATE place SET id_in_source = geonames_id;
```

`geonames_id` als Index ersetzen durch `place_id`

```sql
UPDATE place_label AS pll, (SELECT id, geonames_id FROM place) as p
SET pll.place_id = p.id
WHERE pll.geonames_id = p.geonames_id;
```

Einträge in `place_label` löschen, die sich auf Länder, Kantone, Bundesländer und also nicht auf Orte beziehen

```sql
DELETE FROM place_label WHERE place_id IS NULL;
```

In `place_label` die Namen aus der GeoNames Ortetabelle übernehmen, wie von bk vorgeschlagen.  
Die Einträge haben dann keine Angabe für die Sprache, weil die Angaben in der Quelle fehlt.

```sql
UPDATE place_label SET is_geonames_name = false;
```

```sql
INSERT INTO place_label (SELECT NULL, geonames_id, name, NULL, 0, 0, id, 1 FROM place where place_type_id = 1);
```