# Retrieving the current dataset inventory and exporting it to Excel

As a starting point, many data owners will want to know which of their datasets are already published as OGD and how they are described. This notebook therefore retrieves all datasets and makes them available as an Excel file. We once had this via SAS, but there is currently nothing comparable.


## Load packages

In [21]:
#pip install geopandas altair fiona requests folium mplleaflet contextily seaborn datetime plotly 

In [22]:
import pandas as pd
import numpy as np
import altair as alt
import matplotlib.pyplot as plt
from datetime import datetime
import geopandas as gpd
import folium 
import requests
import urllib.request  # standard library; matches the urllib.request.urlopen call further down
import json

### SSL settings

In [23]:
SSL_VERIFY = True
# If the connection to https://www.gemeinderat-zuerich.ch fails (e.g. because of a proxy), SSL_VERIFY can be set to False
# To switch off SSL verification, uncomment the next line (remove the "#")
# SSL_VERIFY = False

In [24]:
if not SSL_VERIFY:
    import urllib3
    urllib3.disable_warnings()
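
In the cells shown here, `SSL_VERIFY` only silences warnings; none of the active requests receive the flag. A minimal sketch of how it could be honoured with the standard library follows. The helpers `make_ssl_context` and `fetch_json` are hypothetical names and not part of the original notebook.

```python
import json
import ssl
import urllib.request


def make_ssl_context(verify: bool) -> ssl.SSLContext:
    """Return an SSL context that verifies certificates only if `verify` is True."""
    context = ssl.create_default_context()
    if not verify:
        # check_hostname must be switched off before verify_mode can be relaxed
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


def fetch_json(url: str, verify: bool = True) -> dict:
    """Download `url` and parse the response body as JSON."""
    with urllib.request.urlopen(url, context=make_ssl_context(verify)) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `fetch_json(listapi, verify=SSL_VERIFY)` would then respect the setting above.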

### Set parameters

In [25]:
# Execution mode
# 1 = normal mode - with hundreds of CKAN API calls, will take a few minutes
# 2 = test mode - limited to 20 API calls
# 3 = mapping mode - no API calls, just the fast mapping and image creation
mode = 2

# general settings
now = datetime.now()
today = now.strftime("%Y-%m-%d")


# api settings
ckanurl = "https://data.stadt-zuerich.ch"
listapi = ckanurl + "/api/3/action/package_list"
showapi = ckanurl + "/api/3/action/package_show?id=" 

queryapi = ckanurl + "/api/3/action/package_search?q=&rows=999"

# file settings
pkgcsv = "pkg-list.csv"
orgcsv = "organizations.csv"
orgmapcsv = "org-mapping.csv"
err_miss_map = "error_missing-mapping.csv"
excel_out = "Report OGD Datensätze nach Organisationseinheit.xlsx"

In [26]:
print(today)
print(listapi)
print(showapi)
print(queryapi)

2021-11-17
https://data.stadt-zuerich.ch/api/3/action/package_list
https://data.stadt-zuerich.ch/api/3/action/package_show?id=
https://data.stadt-zuerich.ch/api/3/action/package_search?q=&rows=999
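
The `mode` switch is described in the comments but not applied in the cells shown here. A minimal sketch of how it could limit the number of `package_show` calls follows. `select_ids`, `build_show_urls` and `TEST_LIMIT` are hypothetical names, and the limit of 20 is taken from the test-mode comment.

```python
TEST_LIMIT = 20  # assumption: taken from the "test mode" comment; adjust as needed


def select_ids(package_ids, mode):
    """Return the package ids to query for the given execution mode."""
    if mode == 1:      # normal mode: query every package
        return list(package_ids)
    if mode == 2:      # test mode: only the first TEST_LIMIT packages
        return list(package_ids)[:TEST_LIMIT]
    if mode == 3:      # mapping mode: no API calls at all
        return []
    raise ValueError(f"unknown mode: {mode!r}")


def build_show_urls(package_ids, mode, showapi):
    """Build one package_show URL per selected package id."""
    return [showapi + package_id for package_id in select_ids(package_ids, mode)]
```

With the settings above, `build_show_urls(ids["result"], mode, showapi)` would yield at most 20 URLs in test mode.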


In [27]:
#headers = {'Accept': 'application/json'}
#r = requests.get(listapi, headers=headers, verify=SSL_VERIFY)
#params = r.json()
#params

Read all datasets into a DataFrame

In [28]:
listdata = pd.read_json(listapi) 
ids = listdata[['result']]
ids

Unnamed: 0,result
0,2014-02-13_urnengang-vom-9-februar-2014_raeuml...
1,2015-03-26_steigende-geburtenzahlen_folge-der-...
2,2015-03-26_wo-die-juengsten-wohnen
3,2015-04-21_kantonale-wahlen-12-april-2015_prof...
4,accessible-map-app-beta
...,...
839,zuripicknick-app
840,zuri-tours
841,zuri-zahlen
842,zwischenhalt-zurich


In [29]:
all_data = pd.read_json(queryapi)
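
The query above asks for at most 999 rows in a single request. CKAN's `package_search` also accepts a `start` offset and reports the total number of matches in `result.count`, so the list could be fetched page by page if the portal ever holds more datasets than fit into one response. A minimal sketch follows, assuming a hypothetical helper `fetch_all_packages` and a page size of 500 chosen purely for illustration.

```python
import json
import urllib.parse
import urllib.request


def fetch_json(url):
    """Download `url` and parse the response body as JSON."""
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_all_packages(search_url, page_size=500, fetch=fetch_json):
    """Collect all package_search results by paging with `start` and `rows`."""
    packages = []
    while True:
        query = urllib.parse.urlencode(
            {"q": "", "rows": page_size, "start": len(packages)}
        )
        result = fetch(f"{search_url}?{query}")["result"]
        packages.extend(result["results"])
        # stop when everything is collected or the server returns an empty page
        if not result["results"] or len(packages) >= result["count"]:
            return packages
```

The `fetch` parameter exists only so the paging logic can be tried without network access; `fetch_all_packages(ckanurl + "/api/3/action/package_search")` would use the real endpoint.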


In [37]:

# PHASE 1: CKAN-API GET INFORMATION ABOUT ALL DATASETS
# query all packages
#with urllib.request.urlopen(queryapi) as url:
        #data = json.loads(url.read().decode())

# prepare empty list
list_pkg = []

# loop through all the packages
for dataset in all_data["result"]["results"]:

    # we are only interested in active datasets (no harvesters or showcases)
    if dataset["type"] == "dataset" and dataset["state"] == "active":
        pkg_title = dataset["title"]
        pkg_author = dataset["author"]
        pkg_name = dataset["name"]        
        #pkg_dataqualtity = dataset["dataQuality"]
        pkg_updateInterval = dataset["updateInterval"]
        #pkg_FirstPublished = dataset["dateFirstPublished"]          
        pkg_license = dataset["license_id"]  
        pkg_metadatacreated = dataset["metadata_created"]
        pkg_spatialRelationship = dataset["spatialRelationship"]
        pkg_timeRange = dataset["timeRange"]
  

        
        # add relevant attributes to a list
        # note: split('T') turns the timestamp into a [date, time] list, not a plain date string
        element_list_pkg = [pkg_title, pkg_author, pkg_name, pkg_timeRange, pkg_updateInterval, pkg_spatialRelationship, pkg_metadatacreated.split('T'), pkg_license]
        list_pkg.append(element_list_pkg)
        
# Convert list_pkg to dataframe for further processing (merging with mappings)
data_list = pd.DataFrame(list_pkg, columns = ['title', 'author','name', 'timeRange','update_Interval', 'Raum','metadatacreated', 'license']) 
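
The loop above indexes every field directly, so a single dataset without, say, `timeRange` would stop the run with a `KeyError`. A minimal sketch of a more tolerant extraction using `dict.get` follows. `FIELDS`, `is_active_dataset`, `extract_row` and `extract_rows` are hypothetical names; the field order mirrors the loop above, and the timestamp is left unsplit.

```python
FIELDS = ("title", "author", "name", "timeRange", "updateInterval",
          "spatialRelationship", "metadata_created", "license_id")


def is_active_dataset(dataset):
    """True for active datasets; harvesters and showcases are skipped."""
    return dataset.get("type") == "dataset" and dataset.get("state") == "active"


def extract_row(dataset):
    """Return the selected fields as a list, using None for missing keys."""
    return [dataset.get(field) for field in FIELDS]


def extract_rows(datasets):
    """Build one row per active dataset."""
    return [extract_row(d) for d in datasets if is_active_dataset(d)]
```

The result of `extract_rows(all_data["result"]["results"])` could be passed to `pd.DataFrame` in the same way as `list_pkg`.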


In [38]:
data_list.head(2)    

Unnamed: 0,title,author,name,timeRange,update_Interval,Raum,metadatacreated,license
0,Daten der Verkehrszählung zum motorisierten In...,"Dienstabteilung Verkehr, Sicherheitsdepartement",sid_dav_verkehrszaehlung_miv_od2031,2012 - vorgestern,[taeglich],Stadt Zürich,"[2020-03-11, 12:06:00.811855]",cc-zero
1,Daten der automatischen Fussgänger- und Velozä...,"Tiefbauamt, Abteilung Verkehr + Stadtraum, Tie...",ted_taz_verkehrszaehlungen_werte_fussgaenger_velo,laufende Nachführung seit 2009,[taeglich],Stadt Zürich,"[2020-03-16, 12:37:39.927543]",cc-zero


In [39]:
data_list.to_csv('data_list.csv')
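
The title promises an Excel export and `excel_out` is defined in the parameters, but the cells shown here only write a CSV. A minimal sketch of the export follows, assuming the `openpyxl` package is installed so that pandas can write `.xlsx` files. `flatten_lists`, `prepare_report` and `write_report` are hypothetical helpers, and the sheet name `Datasets` is chosen for illustration. List-valued cells such as `update_Interval` are joined into strings first, because spreadsheet cells hold scalar values.

```python
import pandas as pd


def flatten_lists(value):
    """Join list cells into one string; leave all other values unchanged."""
    return ", ".join(map(str, value)) if isinstance(value, list) else value


def prepare_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Flatten list cells and sort so each data owner's datasets are adjacent."""
    report = frame.apply(lambda column: column.map(flatten_lists))
    return report.sort_values(["author", "title"]).reset_index(drop=True)


def write_report(frame: pd.DataFrame, path: str, sheet_name: str = "Datasets") -> None:
    """Write the prepared report to an .xlsx file (assumes openpyxl is available)."""
    with pd.ExcelWriter(path, engine="openpyxl") as writer:
        prepare_report(frame).to_excel(writer, sheet_name=sheet_name, index=False)
```

With the variables defined earlier, the call would be `write_report(data_list, excel_out)`.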

What does the full data look like as JSON?

In [33]:
all_data_json = all_data["result"]["results"]

In [34]:
df_all_data = pd.json_normalize(all_data_json) 
#df_all_data.to_csv('df_all_data.csv')
df_all_data.head(1)

Unnamed: 0,owner_org,maintainer,relationships_as_object,private,maintainer_email,num_tags,id,metadata_created,metadata_modified,author,...,organization.created,organization.title,organization.name,organization.is_organization,organization.state,organization.image_url,organization.revision_id,organization.type,organization.id,organization.approval_status
0,stadt-zurich,Open Data Zürich,[],False,opendata@zuerich.ch,15,6212fd20-e816-4828-a67f-90f057f25ddb,2020-03-11T12:06:00.811855,2021-11-17T06:28:36.589771,"Dienstabteilung Verkehr, Sicherheitsdepartement",...,2015-06-25T13:53:59.198168,Stadt Zürich,stadt-zurich,True,active,https://www.stadt-zuerich.ch/content/dam/stzh/...,02c59c8a-fb42-4d2a-8a11-a12a036617ee,organization,stadt-zurich,approved


In [35]:
df_all_data.dtypes

owner_org                       object
maintainer                      object
relationships_as_object         object
private                           bool
maintainer_email                object
num_tags                         int64
id                              object
metadata_created                object
metadata_modified               object
author                          object
author_email                    object
dateFirstPublished              object
state                           object
version                         object
license_id                      object
type                            object
resources                       object
num_resources                    int64
sszFields                       object
tags                            object
dataType                        object
spatialRelationship             object
dateLastUpdated                 object
groups                          object
creator_user_id                 object
updateInterval           

In [36]:
gud_data = pd.read_json("https://data.stadt-zuerich.ch/api/3/action/package_search?q=Gesundheits-+und+Umweltdepartement")
gud_data

Unnamed: 0,help,success,result
count,https://data.stadt-zuerich.ch/api/3/action/hel...,True,20
facets,https://data.stadt-zuerich.ch/api/3/action/hel...,True,{}
results,https://data.stadt-zuerich.ch/api/3/action/hel...,True,"[{'owner_org': 'stadt-zurich', 'maintainer': '..."
search_facets,https://data.stadt-zuerich.ch/api/3/action/hel...,True,{}
sort,https://data.stadt-zuerich.ch/api/3/action/hel...,True,"score desc, date_last_modified desc"
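
The search URL above encodes the department name by hand, with spaces written as `+`. A minimal sketch of building such URLs with `urllib.parse.urlencode`, which handles the escaping, follows. `build_search_url` and `CKAN_URL` are hypothetical names.

```python
from urllib.parse import urlencode

CKAN_URL = "https://data.stadt-zuerich.ch"


def build_search_url(query, rows=None, base_url=CKAN_URL):
    """Return a package_search URL for a free-text query."""
    params = {"q": query}
    if rows is not None:
        params["rows"] = rows  # optional cap on the number of returned datasets
    return f"{base_url}/api/3/action/package_search?{urlencode(params)}"


gud_url = build_search_url("Gesundheits- und Umweltdepartement")
```

Here `gud_url` reproduces the hand-written URL, and passing `rows` makes the page size explicit, which may matter because the output above reports 20 matches while the request does not say how many results to return.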
