# Questions for the Swiss Post (die Post) Dataset

## Setup SQL

In [1]:
%load_ext sql
%config SqlMagic.autocommit=False # avoiding the error: FAILED: IllegalStateException COMMIT is not supported yet.
%sql hive://hadoop@localhost:10000/post

## Basic Checks

In [2]:
%sql SELECT * FROM plz limit 1

 * hive://hadoop@localhost:10000/post
Done.


rec_art,onrp,bfsnr,plz_typ,postleitzahl,plz_zz,gplz,ortbez18,ortbez27,sprachcode,sprachcode_abw,briefz_durch,gilt_ab_dat,plz_briefzust,plz_coff,kanton
1,2597,4252,30,4303,3,4303,Kaiseraugst Lieb.,Kaiseraugst Liebrüti,1,,2597,1997-03-05,430303,,AG


In [3]:
%sql SELECT * FROM streets limit 1

 * hive://hadoop@localhost:10000/post
Done.


rec_art,strid,onrp,strbezk,strbezl,strbez2k,strbez2l,str_lok_typ,strbez_spc,strbez_coff,str_ganzfach,str_fach_onrp
4,15007157,1095,"Torrent, place du","Torrent, place du",Place du Torrent,Place du Torrent,1,2,J,,


In [4]:
%sql SELECT * FROM bevoelkerung limit 1

 * hive://hadoop@localhost:10000/post
Done.


stichdatum,plz,anzahl,ortbez18,typ
2022-03-01,8203,237,Schaffhausen,f


In [5]:
%sql SELECT * FROM nachnamen limit 1

 * hive://hadoop@localhost:10000/post
Done.


stichdatum,plz,nachname,anzahl,rang,ortbez18,geschlecht
2022-03-01,2905,Gerber,14,1,Courtedoux,m


## Count all zip codes (PLZ) per canton and order them by size.
How many cantons are there, and what does the output tell us? Why is there a difference?

In [7]:
%%sql
select kanton, count(*) as cnt from plz group by kanton order by cnt DESC

 * hive://hadoop@localhost:10000/post
Done.


kanton,cnt
BE,716
VD,567
ZH,476
TI,386
AG,363
VS,344
GR,329
FR,312
SG,264
TG,232
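
To answer the "how many cantons" question directly, one option is to count the distinct values in `kanton` and compare the result with the 26 cantons of Switzerland. The sketch below is only an illustration: an in-memory SQLite database stands in for the Hive connection, and the table holds just the few `plz` rows that appear in the outputs of this notebook, reduced to two columns. The SQL strings themselves should also run through `%sql` against the real table, but that is an assumption, not something verified here.

```python
import sqlite3

# In-memory SQLite is a stand-in for the Hive connection (assumption).
# The rows are the few plz records visible in this notebook's outputs,
# reduced to two columns; they are not the full table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plz (postleitzahl INTEGER, kanton TEXT)")
conn.executemany(
    "INSERT INTO plz VALUES (?, ?)",
    [
        (4303, "AG"),
        (6343, "ZG"),
        (6210, "LU"),
        (4089, "BS"),
        (4089, "BS"),
        (4089, "BS"),
        (3960, "VS"),
    ],
)

# Number of distinct values in the kanton column.
distinct_cantons = conn.execute(
    "SELECT COUNT(DISTINCT kanton) FROM plz"
).fetchone()[0]

# Rows without a canton would form their own group in the GROUP BY output.
missing_canton = conn.execute(
    "SELECT COUNT(*) FROM plz WHERE kanton IS NULL OR kanton = ''"
).fetchone()[0]

print(distinct_cantons, missing_canton)
```

If the distinct count on the real table differs from 26, listing the values with `SELECT DISTINCT kanton FROM plz` is a simple next step to see which codes account for the difference.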


## Find `suurstoffi` in the `streets` table and find the corresponding record in `plz`

In [8]:
%sql select * from streets where lower(STRBEZ2L) like '%suurstoffi%'

 * hive://hadoop@localhost:10000/post
Done.


rec_art,strid,onrp,strbezk,strbezl,strbez2k,strbez2l,str_lok_typ,strbez_spc,strbez_coff,str_ganzfach,str_fach_onrp
4,76139559,3431,Suurstoffi,Suurstoffi,Suurstoffi,Suurstoffi,1,1,J,,


In [9]:
%sql select * from plz where onrp = 3431

 * hive://hadoop@localhost:10000/post
Done.


rec_art,onrp,bfsnr,plz_typ,postleitzahl,plz_zz,gplz,ortbez18,ortbez27,sprachcode,sprachcode_abw,briefz_durch,gilt_ab_dat,plz_briefzust,plz_coff,kanton
1,3431,1707,10,6343,0,6343,Rotkreuz,Rotkreuz,1,,8452,1996-03-04,633160,J,ZG
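
The two lookups above can also be combined into a single query by joining `streets` and `plz` on `onrp`. The following is a minimal sketch, assuming an in-memory SQLite database as a stand-in for Hive and using only rows that appear in this notebook's outputs, reduced to a few columns. The join itself uses standard SQL that Hive is expected to accept as well.

```python
import sqlite3

# In-memory SQLite is a stand-in for the Hive connection (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE streets (strid INTEGER, onrp INTEGER, strbez2l TEXT)")
conn.execute(
    "CREATE TABLE plz (onrp INTEGER, postleitzahl INTEGER, ortbez18 TEXT, kanton TEXT)"
)

# Rows copied from the outputs shown in this notebook (subset of columns).
conn.executemany(
    "INSERT INTO streets VALUES (?, ?, ?)",
    [(76139559, 3431, "Suurstoffi"), (15007157, 1095, "Place du Torrent")],
)
conn.executemany(
    "INSERT INTO plz VALUES (?, ?, ?, ?)",
    [(3431, 6343, "Rotkreuz", "ZG"), (2597, 4303, "Kaiseraugst Lieb.", "AG")],
)

# Join each matching street to its plz record via the shared onrp key.
query = """
SELECT s.strbez2l, p.postleitzahl, p.ortbez18, p.kanton
FROM streets s
JOIN plz p ON p.onrp = s.onrp
WHERE lower(s.strbez2l) LIKE '%suurstoffi%'
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The join avoids copying the `onrp` value by hand from one result into the next query.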


## Find the 5 most recent `gilt_ab_dat` entries in `plz`

In [18]:
%sql select * from plz order by gilt_ab_dat DESC limit 5

 * hive://hadoop@localhost:10000/post
Done.


rec_art,onrp,bfsnr,plz_typ,postleitzahl,plz_zz,gplz,ortbez18,ortbez27,sprachcode,sprachcode_abw,briefz_durch,gilt_ab_dat,plz_briefzust,plz_coff,kanton
1,10919,1103,80,6210,2,6210,Sursee Wassergrabe,Sursee Wassergrabe,1,,7217,2022-03-16,621060,,LU
1,10917,2701,80,4089,77,4089,Basel ZollExtern,Basel ZollExtern,1,,2452,2022-03-07,400202,,BS
1,10916,2701,80,4089,76,4089,Basel ZollBeschau,Basel ZollBeschau,1,,2452,2022-03-07,400202,,BS
1,10915,2701,80,4089,75,4089,Basel ZollGesperrt,Basel ZollGesperrt,1,,2452,2022-03-07,400202,,BS
1,10913,6248,80,3960,6,3960,Sierre Rue Falcon,Sierre Rue de l'Ile Falcon,2,,7249,2022-03-07,396060,,VS
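
Four of the five rows above share the date 2022-03-07. If the table holds more rows with that date, `ORDER BY gilt_ab_dat DESC LIMIT 5` alone does not determine which of them are returned. A secondary sort key makes the result reproducible. The sketch below assumes an in-memory SQLite database as a stand-in for Hive, uses only rows from this notebook's outputs, and picks `onrp` as the tie-breaker, which is an arbitrary choice for illustration.

```python
import sqlite3

# In-memory SQLite is a stand-in for the Hive connection (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plz (onrp INTEGER, ortbez18 TEXT, gilt_ab_dat TEXT)")

# Rows copied from the outputs shown in this notebook (subset of columns).
conn.executemany(
    "INSERT INTO plz VALUES (?, ?, ?)",
    [
        (3431, "Rotkreuz", "1996-03-04"),
        (2597, "Kaiseraugst Lieb.", "1997-03-05"),
        (10913, "Sierre Rue Falcon", "2022-03-07"),
        (10915, "Basel ZollGesperrt", "2022-03-07"),
        (10916, "Basel ZollBeschau", "2022-03-07"),
        (10917, "Basel ZollExtern", "2022-03-07"),
        (10919, "Sursee Wassergrabe", "2022-03-16"),
    ],
)

# ISO dates sort correctly as text; onrp breaks ties between equal dates.
query = """
SELECT onrp, gilt_ab_dat
FROM plz
ORDER BY gilt_ab_dat DESC, onrp DESC
LIMIT 5
"""
newest = [row[0] for row in conn.execute(query)]
print(newest)
```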


## Check the `bevoelkerung` table for "your" `plz`.
Do you know what `typ` means?
If not, check the [schema](https://swisspost.opendatasoft.com/explore/dataset/bevoelkerung_proplz/information/?disjunctive.plz&disjunctive.typ&disjunctive.ortbez18&sort=stichdatum) published by Swiss Post.

In [27]:
%sql select * from bevoelkerung where plz = 8640

 * hive://hadoop@localhost:10000/post
Done.


stichdatum,plz,anzahl,ortbez18,typ
2022-03-01,8640,1949,Rapperswil SG/Hurd,f
2022-03-01,8640,5257,Rapperswil SG/Hurd,m
2022-03-01,8640,5070,Rapperswil SG/Hurd,w


## Find the top three `PLZ` for `typ='f'`

In [29]:
%sql select * from bevoelkerung where typ='f' order by anzahl DESC limit 3

 * hive://hadoop@localhost:10000/post
Done.


stichdatum,plz,anzahl,ortbez18,typ
2022-03-01,6300,19836,Zugerberg/Zug,f
2022-03-01,6900,15460,Massagno/Lugano/Pa,f
2022-03-01,8001,10784,Zürich,f


## Is your `nachname` in the `nachnamen` table?
Order the results by `anzahl`.

In [31]:
%%sql

select * from nachnamen where nachname = 'Egli' and geschlecht = 'm' order by anzahl

 * hive://hadoop@localhost:10000/post
Done.


stichdatum,plz,nachname,anzahl,rang,ortbez18,geschlecht
2022-03-01,3045,Egli,6,5,Meikirch,m
2022-03-01,8425,Egli,8,5,Oberembrach,m
2022-03-01,9652,Egli,8,4,Neu St. Johann,m
2022-03-01,3020,Egli,8,5,Bern,m
2022-03-01,9643,Egli,9,4,Krummenau,m
2022-03-01,8497,Egli,9,3,Fischenthal,m
2022-03-01,6217,Egli,9,5,Kottwil,m
2022-03-01,9601,Egli,10,1,Lütisburg Station,m
2022-03-01,3312,Egli,10,4,Fraubrunnen,m
2022-03-01,8345,Egli,11,1,Adetswil,m


## What are the top 10 last names in the table `nachnamen`?

In [33]:
%%sql

select nachname, sum(anzahl) as cnt from nachnamen group by nachname order by cnt DESC limit 10

 * hive://hadoop@localhost:10000/post
Done.


nachname,cnt
Müller,45217
Meier,25388
Schmid,19500
Keller,13161
,9733
Gerber,6919
Weber,6563
Huber,5950
Meyer,5134
Schneider,5031
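
The fifth row of the result has no `nachname` at all, so it takes a slot in the top 10 without being a real last name. One way to exclude such rows is to filter them before grouping. The sketch below assumes an in-memory SQLite database as a stand-in for Hive. Whether the real table stores these entries as empty strings or as NULL is not known from the output, so the filter covers both. The rows with a missing name and their counts are hypothetical and exist only to exercise the filter.

```python
import sqlite3

# In-memory SQLite is a stand-in for the Hive connection (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nachnamen (plz INTEGER, nachname TEXT, anzahl INTEGER)")
conn.executemany(
    "INSERT INTO nachnamen VALUES (?, ?, ?)",
    [
        # Rows copied from the outputs shown in this notebook.
        (2905, "Gerber", 14),
        (8345, "Egli", 11),
        (9601, "Egli", 10),
        # Hypothetical rows with a missing name, for illustration only.
        (8640, "", 7),
        (8640, None, 3),
    ],
)

# Drop NULL and blank names before aggregating.
query = """
SELECT nachname, SUM(anzahl) AS cnt
FROM nachnamen
WHERE nachname IS NOT NULL AND trim(nachname) <> ''
GROUP BY nachname
ORDER BY cnt DESC
LIMIT 10
"""
top_names = conn.execute(query).fetchall()
print(top_names)
```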
