# Get Geographical Data of Every German County
Read the file ["Readme.ipynb"](Readme.ipynb) for more information.

## Modules
The following modules provide functionality that is not built into the core language and has already been implemented elsewhere.

In [1]:
# Used to fetch the JSON from the API and convert it into a Python dictionary
from json2xml.utils import readfromurl
import json    # to save the data in JSON format in a file
# Used to check whether there is a local file with the data or whether a new API pull is required
import os.path

## Control
Set variables to "True" to trigger the action described by the comment and the name of the variable.

In [2]:
# decides whether a new pull from the API is made or if a local backup should be used
counties_geography_new_pull_from_api = True

The variable "number_of_counties" is also set here. It specifies how many counties the data must contain. If the data contains fewer or more, the current data source is considered faulty and, if possible, another one is used. The variable can also be set by the file that calls this one; the try-except construct exists so that such a value is not overwritten.

In [3]:
try:
    number_of_counties
except NameError:
    number_of_counties = 412

### Check the Controls
Check whether the necessary file exists. If it does not, the data must be taken from somewhere else (in this case from the API).

In [4]:
if not(os.path.isfile("unpolished_data/german_counties_geography.txt")):
    counties_geography_new_pull_from_api = True

## Get the Data and Save it in a File
### Pull the Data from the API
A pull from the ["COVID-19 Datenhub"](https://npgeo-corona-npgeo-de.hub.arcgis.com/datasets/917fc37a709542548cc3be077a786c17_0?geometry=-30.805%2C46.211%2C52.823%2C55.839) is initiated if "counties_geography_new_pull_from_api" is set to "True" or if "pulling" from the local backup of an old API pull is not possible because the backup file is missing.
<br/>
<br/>
If the data does not contain as many counties as requested in the variable "number_of_counties", the program reports this to the user and "pulls" from a local backup of the API instead.
<br/>
<br/>
If the unpolished data passes this rudimentary test, it is stored unchanged in the file "german_counties_geography.txt" in the folder "unpolished_data". If the folder or the file does not yet exist, it is created.<br/>
This file can be used in further executions as a local backup of the API-pull.

In [5]:
if counties_geography_new_pull_from_api:
    # get the data - the url is kept in this long format to easily copy and check it manually
    counties_geography_raw = readfromurl("https://services7.arcgis.com/mOBPykOjAyBO2ZKk/arcgis/rest/services/RKI_Landkreisdaten/FeatureServer/0/query?where=1%3D1&outFields=*&outSR=4326&f=json")
    # data is maybe faulty: if so report to user and use local backup of old API pull
    if len(counties_geography_raw["features"]) != number_of_counties:
        counties_geography_new_pull_from_api = False
        print("The provided data from the API does not have the preset number of counties of " +
        "{}, it has {}.".format(number_of_counties, len(counties_geography_raw["features"])))
    else:    # data seems to be fine: continue
        # check if the needed directory is available - otherwise create it
        if not(os.path.isdir("unpolished_data")): os.makedirs("unpolished_data")
        with open("unpolished_data/german_counties_geography.txt", "w") as file:
            file.write(json.dumps(counties_geography_raw))
        print("Data directly from API is ready to go!")

Data directly from API is ready to go!


### "Pull" the Data from the Local API Backup
If no new API pull was requested, or if the freshly pulled data failed the check above,
the data is loaded from the local backup of an old API pull.
<br/>
The data from the backup is provided without further tests.

In [6]:
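# "counties_geography_use_polished_data" is not set in this file: treat it as False
# unless the calling file has defined it, instead of failing with a NameError
if not(counties_geography_new_pull_from_api) and not(globals().get("counties_geography_use_polished_data", False)):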
    with open("unpolished_data/german_counties_geography.txt", "r") as file:
        counties_geography_raw = json.loads(file.read())
    print("Data from old API-pull is ready to go!")
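
The two cells above (pull, check, fall back) can also be read as one small function. The following is only a sketch under assumptions: `load_raw_data`, its parameters, and the stand-in fetch functions are hypothetical names that do not appear in this notebook, and a temporary folder is used instead of "unpolished_data" so that nothing is overwritten.

```python
import json
import os
import tempfile


def load_raw_data(fetch, backup_path, expected_count):
    """Return (data, source), where source is "api" or "backup" (hypothetical helper)."""
    data = fetch()
    found = len(data["features"])
    if found == expected_count:
        # data seems to be fine: refresh the local backup and use the fresh data
        os.makedirs(os.path.dirname(backup_path) or ".", exist_ok=True)
        with open(backup_path, "w") as file:
            json.dump(data, file)
        return data, "api"
    # data is maybe faulty: report it and use the backup of an old pull
    print("Expected {} counties, got {}.".format(expected_count, found))
    with open(backup_path, "r") as file:
        return json.load(file), "backup"


# usage with stand-in fetch functions instead of a real API call
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "unpolished_data", "backup.txt")
    # first call: the count matches, so the backup file is written
    _, first_source = load_raw_data(lambda: {"features": [{}, {}]}, path, 2)
    # second call: the count is wrong, so the backup from the first call is used
    fallback, second_source = load_raw_data(lambda: {"features": [{}]}, path, 2)
```

Passing the fetch step in as a function keeps the check and the fallback testable without network access. In the notebook itself, the fetch would be the call to `readfromurl` shown above.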

## Initiate Polished Version and Find the Counties with Multiple Polygons
In the following, the required data is saved in the variable "counties_geography" and unnecessary nesting levels of the raw dictionary are discarded.
<br/>
### Extract Necessary Data
The data inside the dictionary "counties_geography_raw" is deeply nested and contains unnecessary information. Therefore the new dictionary "counties_geography" collects the following keys for each county (packaged in a dictionary that is reachable through the AdmUnitId of the county):
   - name: Name of the county
   - population: Number of inhabitants according to the last official estimate (it varies, so see the ["COVID-19 Datenhub"](https://npgeo-corona-npgeo-de.hub.arcgis.com/) for more information). These numbers are also used in the official incidence calculations.
   - area_in_m2: Area of the county in square meters. It cannot be calculated from the polygons stored in geometry.

In [7]:
counties_geography = dict()
for county in counties_geography_raw['features']:
    # convert the AdmUnitId to a string to keep uniformity:
    # after saving and loading from a file all keys are strings
    AdmUnitId = str(county['attributes']['AdmUnitId'])
    counties_geography[AdmUnitId] = dict(
        name = county['attributes']['county'],
        population = county['attributes']['EWZ'],
        area_in_m2 = county['attributes']['Shape__Area']
    )
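
To illustrate the shape of the result, the same extraction can be applied to a single made-up feature. The values below are placeholders and not real county data. Only the attribute names ("AdmUnitId", "county", "EWZ", "Shape__Area") are taken from the cell above, and all other fields of a real feature are omitted.

```python
# made-up sample in the shape the loop above expects; values are placeholders
sample_raw = {
    "features": [
        {
            "attributes": {
                "AdmUnitId": 1,
                "county": "Example County",
                "EWZ": 100000,
                "Shape__Area": 500000000.0,
            }
        }
    ]
}

sample_geography = dict()
for feature in sample_raw["features"]:
    attributes = feature["attributes"]
    # the key is converted to a string, as in the cell above
    sample_geography[str(attributes["AdmUnitId"])] = dict(
        name=attributes["county"],
        population=attributes["EWZ"],
        area_in_m2=attributes["Shape__Area"],
    )

print(sample_geography)
# {'1': {'name': 'Example County', 'population': 100000, 'area_in_m2': 500000000.0}}
```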

## Calculate the Population Density
The population density is calculated by dividing the population by the area in square meters. To express it in inhabitants per square kilometer, the result is multiplied by 1,000,000 (the number of square meters in one square kilometer).<br/>
The final result is stored in the dictionary "counties_geography".

In [8]:
for county in counties_geography.values():
    county["population_density"] = (county['population'] * 1000000)/county['area_in_m2']
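
As a quick plausibility check of the unit conversion, here is a small worked example with made-up numbers. The helper `population_density` and the constant are hypothetical names used only for this illustration.

```python
SQUARE_METERS_PER_SQUARE_KILOMETER = 1000 * 1000


def population_density(population, area_in_m2):
    """Return the number of inhabitants per square kilometer."""
    return population * SQUARE_METERS_PER_SQUARE_KILOMETER / area_in_m2


# 100,000 inhabitants on 500,000,000 square meters (= 500 square kilometers)
density = population_density(100000, 500000000)
print(density)  # 200.0
```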

## Save the Polished Data in a File
The polished data is saved in the file "german_counties_geography.txt" in the folder "polished_data", inside the current directory. If the directory does not exist, it is created.

In [9]:
if not(os.path.isdir("polished_data")): os.makedirs("polished_data")
with open("polished_data/german_counties_geography.txt", "w") as file:
    file.write(json.dumps(counties_geography))
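
When the saved file is loaded again, all dictionary keys come back as strings, which is why the AdmUnitId was converted to a string earlier. The following sketch shows this behavior of the "json" module with a made-up entry. It works in memory and does not touch the files of this notebook.

```python
import json

original = {1001: {"name": "Example County"}}    # integer key, made-up entry
restored = json.loads(json.dumps(original))      # same round trip as saving and loading

print(list(original))   # [1001]
print(list(restored))   # ['1001']
```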