# Open Government Data, Canton Zurich

### **Dataset**: Zürcher Ehedaten des 16. bis 18. Jahrhunderts (Zurich marriage data of the 16th to 18th centuries)

### **Description**: Contains, in standardized short form, all marriage entries recorded in the surviving church registers of the parishes of today's Canton of Zurich, from the Reformation to the year 1800. The data are available both as CSV and as 'Linked Open Data' in RDF format via the Linked Data Service (LINDAS). Six example SPARQL queries against the RDF data on LINDAS are linked under "Weitere Informationen" (further information).

*Autogenerated Jupyter Notebook and basic Python code for data set* **468@staatsarchiv-kanton-zuerich**.

## Dataset profile
- **Issued** `2019-08-26T00:00:00`
- **Modified** `2022-08-22T16:30:17`
- **Startdate** `1525-01-01`
- **Enddate** `1800-12-31`
- **Theme** `['Kultur, Medien, Informationsgesellschaft, Sport', 'Bevölkerung']`
- **Keyword** `['ehe', 'eheschliessungen', 'gemeinden', 'heiraten', 'kirchgemeinden', 'verheiratet', 'linked-data', 'sparql', 'rdf', 'ogd']`
- **Publisher** `['Staatsarchiv des Kantons Zürich']`
- **Landingpage** `https://zenodo.org/record/5827549`
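
The profile's date range begins in 1525, which lies before the earliest value that pandas' default nanosecond-resolution `Timestamp` can hold (`pd.Timestamp.min`, in September 1677). One way to handle such dates, sketched below, is to represent them as daily `Period` values instead. The two strings are the profile's start and end dates; that the CSV stores its dates in this ISO format is an assumption.

```python
import pandas as pd

# Start and end dates from the dataset profile; the ISO format is assumed.
raw_dates = ["1525-01-01", "1800-12-31"]

# Daily periods are not subject to the nanosecond Timestamp bounds.
dates = pd.PeriodIndex(raw_dates, freq="D")

print(pd.Timestamp.min)
print(dates.year.tolist())
```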


## Import Python modules

In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('ggplot')

params = {
    'text.color': (0.25, 0.25, 0.25),
    'figure.figsize': [18, 6],
   }

plt.rcParams.update(params)

import pandas as pd 

## Load data

- The dataset has **`1` distribution(s)** in CSV format.
- All available CSV distributions are listed below and can be read into a pandas dataframe.

In [2]:
# Distribution 0
# Ktzhdistid               : 787
# Title                    : Zürcher Ehedaten des 16. bis 18. Jahrhunderts als tabellarische Daten
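# Description              : Tables with data on marriage entries in the Canton of Zurich from the 16th, 17th and 18th centuries. Information included (columns): Signatur (reference code); Nachname Mann (husband's surname); Vorname Mann (husband's first name); Herkunft Mann (husband's place of origin); Nachname Frau (wife's surname); Vorname Frau (wife's first name); Herkunft Frau (wife's place of origin); Zusatzinformationen Mann (additional information on the husband); Zusatzinformationen Frau (additional information on the wife); Datum (date); Kirchgemeinde (parish); Band (volume); Webseite (web page of the entry in the archive database); ID.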
# Issued                   : 2019-08-26T17:57:34
# Modified                 : 2022-06-30T09:53:41
# Rights                   : NonCommercialAllowed-CommercialAllowed-ReferenceRequired

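# NOTE: this URL is the dataset's Zenodo landing page, not a CSV file, so pandas
# ends up parsing HTML (see the output below). Replace it with the direct
# download link of the CSV distribution offered on that page.
url = 'https://zenodo.org/record/5827549'
df = pd.read_csv(url, on_bad_lines='warn', encoding_errors='ignore')
# retry with a semicolon separator if only one column was detected
if df.shape[1] <= 1:
    df = pd.read_csv(url, sep=';', on_bad_lines='warn', encoding_errors='ignore')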

Skipping line 8: expected 1 fields, saw 2
Skipping line 19: expected 1 fields, saw 6
Skipping line 27: expected 1 fields, saw 9
Skipping line 34: expected 1 fields, saw 9
Skipping line 40: expected 1 fields, saw 9
Skipping line 91: expected 1 fields, saw 3
Skipping line 104: expected 1 fields, saw 2
Skipping line 108: expected 1 fields, saw 2
Skipping line 122: expected 1 fields, saw 4
Skipping line 123: expected 1 fields, saw 2
Skipping line 124: expected 1 fields, saw 2
Skipping line 126: expected 1 fields, saw 2
Skipping line 169: expected 1 fields, saw 2
Skipping line 177: expected 1 fields, saw 2
Skipping line 195: expected 1 fields, saw 4
Skipping line 196: expected 1 fields, saw 2
Skipping line 256: expected 1 fields, saw 2
Skipping line 258: expected 1 fields, saw 2
Skipping line 260: expected 1 fields, saw 2
Skipping line 262: expected 1 fields, saw 2
Skipping line 264: expected 1 fields, saw 2
Skipping line 289: expected 1 fields, saw 2
Skipping line 319: expected 1 fields, s

Skipping line 27: expected 1 fields, saw 9
Skipping line 34: expected 1 fields, saw 9
Skipping line 40: expected 1 fields, saw 9
Skipping line 104: expected 1 fields, saw 6
Skipping line 108: expected 1 fields, saw 4
Skipping line 110: expected 1 fields, saw 2
Skipping line 120: expected 1 fields, saw 2
Skipping line 122: expected 1 fields, saw 9
Skipping line 123: expected 1 fields, saw 4
Skipping line 124: expected 1 fields, saw 4
Skipping line 210: expected 1 fields, saw 2
Skipping line 220: expected 1 fields, saw 2
Skipping line 249: expected 1 fields, saw 2
Skipping line 304: expected 1 fields, saw 2
Skipping line 386: expected 1 fields, saw 7
Skipping line 565: expected 1 fields, saw 3
Skipping line 571: expected 1 fields, saw 25
Skipping line 576: expected 1 fields, saw 2
Skipping line 594: expected 1 fields, saw 3
Skipping line 595: expected 1 fields, saw 2
Skipping line 596: expected 1 fields, saw 2
Skipping line 603: expected 1 fields, saw 2
Skipping line 608: expected 1 fiel
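
The warnings above, with every line expected to hold a single field, suggest that the URL returned a web page rather than a CSV file. A minimal sketch of a guard that could run before parsing, assuming the first characters of the response are available as a string; `looks_like_html` is a hypothetical helper, not part of pandas, and both sample strings are made up for illustration:

```python
def looks_like_html(text: str) -> bool:
    """Return True if the text starts like an HTML document rather than CSV."""
    # Drop a possible byte-order mark and leading whitespace before checking.
    head = text.lstrip("\ufeff \t\r\n").lower()
    return head.startswith(("<!doctype html", "<html"))


# Hypothetical samples: the start of a web page and the start of a CSV file.
html_sample = '<!DOCTYPE html>\n<html lang="en" dir="ltr">\n<head>'
csv_sample = "Signatur;Nachname Mann;Vorname Mann\n"

print(looks_like_html(html_sample))
print(looks_like_html(csv_sample))
```

If the check returns `True`, the notebook could stop with a clear message instead of building a dataframe from markup.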

## Analyze data

In [3]:
# drop columns that have no values
df.dropna(how='all', axis=1, inplace=True)

In [4]:
print(f'The dataset has {df.shape[0]:,.0f} rows (observations) and {df.shape[1]:,.0f} columns (variables).')
print(f'There seem to be {df.duplicated().sum()} exact duplicates in the data.')

The dataset has 561 rows (observations) and 1 columns (variables).
There seem to be 139 exact duplicates in the data.


In [5]:
df.info(memory_usage='deep', verbose=True)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 561 entries, 0 to 560
Data columns (total 1 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   <!DOCTYPE html>  561 non-null    object
dtypes: object(1)
memory usage: 56.2 KB
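
The only column reported by `df.info()` is `<!DOCTYPE html>`, whereas the distribution description lists fourteen fields. A small check along the following lines could flag such a mismatch early. It assumes that the CSV header uses the labels exactly as they appear in the description, which the source does not confirm; `EXPECTED_COLUMNS` and `missing_columns` are illustrative names.

```python
# Labels copied from the distribution description; whether the CSV header
# matches them exactly is an assumption.
EXPECTED_COLUMNS = [
    "Signatur", "Nachname Mann", "Vorname Mann", "Herkunft Mann",
    "Nachname Frau", "Vorname Frau", "Herkunft Frau",
    "Zusatzinformationen Mann", "Zusatzinformationen Frau",
    "Datum", "Kirchgemeinde", "Band", "Webseite", "ID",
]


def missing_columns(actual, expected=EXPECTED_COLUMNS):
    """Return the expected labels absent from `actual`, keeping their order."""
    present = {str(name).strip() for name in actual}
    return [name for name in expected if name not in present]


# The column list actually produced by the load step in this notebook.
missing = missing_columns(["<!DOCTYPE html>"])
print(f"{len(missing)} of {len(EXPECTED_COLUMNS)} expected columns are missing")
```

In the notebook itself the call would be `missing_columns(df.columns)`.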


In [6]:
df.head()

Unnamed: 0,<!DOCTYPE html>
0,"<html lang=""en"" dir=""ltr"">"
1,<head>
2,"<meta charset=""utf-8"">"
3,"<meta http-equiv=""X-UA-Compatible"" content=""IE..."
4,"<meta name=""viewport"" content=""width=device-wi..."


In [7]:
# display a small random sample transposed in order to see all variables
df.sample(3).T

Unnamed: 0,406,274,452
<!DOCTYPE html>,"ng-init=""vm.citationResult = 'Staatsar...","<td>Views <i class=""fa fa-question-c...","<li><a href=""http://about.zeno..."


In [8]:
# describe non-numerical features
with pd.option_context('display.float_format', '{:,.2f}'.format):
    display(df.describe(exclude='number'))

Unnamed: 0,<!DOCTYPE html>
count,561
unique,422
top,</div>
freq,19


In [9]:
# describe numerical features
with pd.option_context('display.float_format', '{:,.2f}'.format):
    display(df.describe(include='number'))

ValueError: No objects to concatenate
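
The `ValueError` above arises because the dataframe contains no numerical columns, so `describe(include='number')` has nothing to summarize. One way to make the cell robust, sketched here with a hypothetical helper `describe_numeric`, is to select the numerical columns first and skip the summary when there are none:

```python
import pandas as pd


def describe_numeric(frame: pd.DataFrame):
    """Summarize numerical columns, or return None if the frame has none."""
    numeric = frame.select_dtypes(include="number")
    if numeric.shape[1] == 0:
        return None
    return numeric.describe()


# Toy frames for illustration only, not taken from the dataset.
text_only = pd.DataFrame({"line": ["<head>", "</div>", "</div>"]})
with_numbers = pd.DataFrame({"line": ["a", "b", "c"], "n": [1, 2, 3]})

print(describe_numeric(text_only))
print(describe_numeric(with_numbers))
```

The same `select_dtypes(include='number')` check can guard the histogram cell, which also needs at least one numerical column.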

In [None]:
# check missing values with missingno
# https://github.com/ResidentMario/missingno
import missingno as msno
msno.matrix(df, labels=True, sort='descending');

In [None]:
# plot a histogram for each numerical feature
df.hist(bins=25, layout=(-1, 5), edgecolor='black');

In [None]:
# continue your code here...

**Contact**: Staatsarchiv des Kantons Zürich | staatsarchivzh@ji.zh.ch