## Adding missing information to GeoJSON files

This small Jupyter Notebook adds missing information to existing GeoJSON files from our repository. It has been, and will continue to be, used in our [semi-automatic post-processing pipeline](../README.md#manual-post-correction-for-geojson-files).

Two things can be done:
* Sending requests to the GeoNames API to retrieve additional information. Here we mainly focused on obtaining the features' URLs and coordinates, but other fields can be added. Please note: GeoNames has an API rate limit of 1000 calls per hour.
* Correcting entries in the GeoJSON files. This part targets frequent errors in the files that can be corrected semi-automatically by supplying the missing information manually as a dictionary. See the function's docstring for details. An example of when this is useful:
    * In the file Z114800707.json, the feature "constantinopel" could not be found via the GeoNames API. However, manual research yields results for Istanbul (called Constantinople/Konstantinopel when it was the capital of the Roman Empire). Such results can be saved in a Python dictionary of the following form (the labels shown are further examples):
      ```python
      {"rodda": {"url": "https://www.geonames.org/350173", "coordinates": ["31.22598", "30.02255"]},
       "dsjise": {"url": "https://www.geonames.org/360995", "coordinates": ["31.20861", "30.00944"]},
       "masr el atik": {"url": "https://www.geonames.org/360630", "coordinates": ["31.24967", "30.06263"]}}
      ```
    The script will then overwrite the information for all occurrences of these labels.
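
The correction step can be sketched in isolation as follows. This is a minimal illustration rather than the notebook's own function: `apply_corrections` is a hypothetical helper, and the feature layout (`properties.source_label`, `properties.url`, `geometry.coordinates`) is assumed from the code further down in this notebook.

```python
from typing import Any


def apply_corrections(features: list[dict[str, Any]],
                      corrections: dict[str, dict[str, Any]]) -> int:
    """Overwrite URL and coordinates of every feature whose label is in `corrections`.

    Hypothetical helper for illustration; returns the number of features changed.
    """
    changed = 0
    for feature in features:
        label = feature["properties"]["source_label"]
        if label in corrections:
            feature["properties"]["url"] = corrections[label]["url"]
            feature["geometry"]["coordinates"] = corrections[label]["coordinates"]
            changed += 1
    return changed


# Two occurrences of the same label plus one unrelated feature (made-up sample data).
sample_features = [
    {"type": "Feature",
     "properties": {"source_label": "rodda", "url": "https://www.geonames.org/None"},
     "geometry": {"type": "Point", "coordinates": [None, None]}},
    {"type": "Feature",
     "properties": {"source_label": "rodda", "url": "https://www.geonames.org/None"},
     "geometry": {"type": "Point", "coordinates": [None, None]}},
    {"type": "Feature",
     "properties": {"source_label": "unknown place", "url": "https://www.geonames.org/None"},
     "geometry": {"type": "Point", "coordinates": [None, None]}},
]
sample_corrections = {
    "rodda": {"url": "https://www.geonames.org/350173",
              "coordinates": ["31.22598", "30.02255"]},
}

n_changed = apply_corrections(sample_features, sample_corrections)
print(n_changed)  # both "rodda" features are updated, the third is left untouched
```

Looking the label up in the dictionary directly, instead of looping over all keys, keeps the check to a single step per feature.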

We followed the [GeoJSON format](https://en.wikipedia.org/wiki/GeoJSON) when creating and completing the files.
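
To make the expected structure concrete, the sketch below builds a tiny feature collection and lists the labels that still carry the placeholder URL, which are the candidates for an API request or a manual correction. The field names are assumed from the code in this notebook; `unresolved_labels` and the sample data are hypothetical. Note that GeoJSON positions are ordered `[longitude, latitude]`.

```python
# Placeholder URL that the notebook code checks for before querying GeoNames.
PLACEHOLDER_URL = "https://www.geonames.org/None"


def unresolved_labels(collection: dict) -> list[str]:
    """Return the sorted, de-duplicated labels of features that still have the placeholder URL.

    Hypothetical helper for illustration only.
    """
    return sorted({
        feature["properties"]["source_label"]
        for feature in collection.get("features", [])
        if feature["properties"].get("url") == PLACEHOLDER_URL
    })


# Made-up example: one resolved feature and one that GeoNames could not resolve.
example_collection = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [31.22598, 30.02255]},
         "properties": {"source_label": "rodda",
                        "url": "https://www.geonames.org/350173"}},
        {"type": "Feature",
         "geometry": {"type": "Point", "coordinates": [None, None]},
         "properties": {"source_label": "constantinopel",
                        "url": PLACEHOLDER_URL}},
    ],
}

print(unresolved_labels(example_collection))  # ['constantinopel']
```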

In [2]:
import json
import glob
import os

import geocoder

In [1]:
METADATA_NER_PATH: str = '../data/output/titles_ner_tagged_jsons/'
TEXT_NER_PATH: str = '../data/output/text_ner/'

In [3]:
def add_info(multiple_files: bool = True,
             filename: str = "Z114800707",
             geonames_request: bool = True,
             correct_entries: bool = False,
             corrections: dict = None,
             ) -> None:
    """
    Add information to existing GeoJSON files or correct multiple entries at once.

    :param multiple_files: Whether all files should be processed at once (True) or only one (False).
    :param filename: Name (without extension) of the GeoJSON file to process; only used if multiple_files is False. Default is one of the files @sarahondraszek corrected.
    :param geonames_request: Whether to send an API request to GeoNames for features with a placeholder URL. Default is True.
    :param correct_entries: Whether entries should be corrected using `corrections`. Default is False.
    :param corrections: Dictionary of all entries that need to be corrected. Default is None, in which case a single-entry example dictionary is used.
        Format: {source_label: {"url": "XYZ", "coordinates": [longitude, latitude]}, ...}
    :return: None. Each file is saved under its original filename in '../data/output/text_ner/with_url/'.
    """

    if corrections is None:
        corrections = {"oberegypten": {"url": "https://www.geonames.org/359888", "coordinates": ["32", "26"]}}

    if multiple_files:
        file_indicator = "*"
    else:
        file_indicator = filename

    for file in glob.glob(TEXT_NER_PATH + file_indicator + '.json'):
        with open(file, 'r', encoding='utf-8') as f:
            json_file = json.load(f)

        for feature in json_file["features"]:
            feature_label = feature["properties"]["source_label"]

            # Query GeoNames only for features that still carry the placeholder URL.
            if geonames_request and feature["properties"]["url"] == "https://www.geonames.org/None":
                g = geocoder.geonames(feature_label, key='kartriert')
                # Keep the placeholder if GeoNames returns no match.
                if g.ok:
                    feature["properties"]["url"] = "https://www.geonames.org/" + str(g.geonames_id)
                    feature["geometry"]["coordinates"] = [g.lng, g.lat]

            # Manual corrections are applied last and override the API result.
            if correct_entries and feature_label in corrections:
                feature["properties"]["url"] = corrections[feature_label]["url"]
                feature["geometry"]["coordinates"] = corrections[feature_label]["coordinates"]

        # Write the file once, after all of its features have been processed.
        with open('../data/output/text_ner/with_url/' + os.path.basename(file), 'w', encoding='utf-8') as f_w:
            json.dump(json_file, f_w, indent=4)

In [4]:
# Please adjust the execution of the function according to your needs.
add_info(
    multiple_files=False,
    geonames_request=False,
    correct_entries=True,
    corrections={"mochha": {"url": "https://www.geonames.org/7086188", "coordinates": ["73.36598", "33.71494"]},
                 }
)
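
When the function is run with `geonames_request=True` over many files, the hourly GeoNames limit mentioned at the top becomes relevant. One possible way to stay below it is to enforce a minimum delay between requests. The sketch below is an assumption, not part of the notebook: `Throttle` is a hypothetical helper, and spacing calls evenly (3.6 seconds apart for 1000 calls per hour) is just one simple strategy.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum delay between successive calls (hypothetical helper)."""

    def __init__(self,
                 calls_per_hour: int = 1000,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        # Evenly spaced calls: 3600 seconds divided by the allowed number of calls.
        self.min_interval = 3600.0 / calls_per_hour
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> float:
        """Sleep until the next call is allowed; return the number of seconds slept."""
        delay = 0.0
        if self._last_call is not None:
            elapsed = self._clock() - self._last_call
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last_call = self._clock()
        return delay


# Intended use (not executed here): create one Throttle and call wait()
# immediately before every GeoNames request.
#
#     throttle = Throttle(calls_per_hour=1000)
#     throttle.wait()
#     g = geocoder.geonames(feature_label, key='kartriert')
```

The clock and sleep functions are injectable so that the helper can be checked without actually waiting.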