# Names and coordinates of Swiss cities

In this notebook, we import the names and GPS coordinates of all municipalities in Switzerland.

We download a CSV file and convert it into a dataframe.
Our source is the Swiss Federal Office of Topography, which updates the file every month.



## Import the data

In [1]:
#Import the required libraries.
import requests
from bs4 import BeautifulSoup
import pandas as ps
import zipfile
import io
import os
import csv
import numpy as np

In [26]:
#Get the connection to the website
url = 'https://www.cadastre.ch/de/services/service/registry/plz.html'

#Accessing the entire website
website = requests.get(url)

#Creating a beautiful soup object with the webpage, using the html parser
soup = BeautifulSoup(website.content, 'html.parser')

#Finding the right section
section = soup.find('div', class_= 'parsys_column row')

#Finding the section with the link to the file
link = section.find('a', string = 'CSV (Excel) WGS84 ')

#Extract the link with the desired data
data_file = link['href']

#Download the zip file and extract it into the current working directory
get_data = requests.get(data_file)
content = zipfile.ZipFile(io.BytesIO(get_data.content))
#extractall() returns None, so its result is not stored
content.extractall()

#Create a list to hold all the rows stored in the file
Data = []

#Open the CSV file and read it with the csv reader
with open('PLZO_CSV_WGS84/PLZO_CSV_WGS84.csv') as CSV_File:
    read = csv.reader(CSV_File, delimiter = ';')
    #Store every row of the file in the list
    for row in read:
        Data.append(row)

#Change the list to a numpy array, in order to be able to define the header further down in the code
Data = np.asarray(Data)

#Convert the array into a dataframe with the first row as header of the table
dataframe_1 = ps.DataFrame(Data[1:,:], columns = Data[0,:])

#Dataframe with the Information needed
dataframe = dataframe_1[['Ortschaftsname', 'Gemeindename', 'PLZ', 'E', 'N']]

#Change the column header E, N to Longitude and Latitude for a better understanding while working with the data
dataframe = dataframe.rename({'E' : 'Longitude', 'N' : 'Latitude'}, axis = 'columns')
display(dataframe)

Unnamed: 0,Ortschaftsname,Gemeindename,PLZ,Longitude,Latitude
0,Aeugst am Albis,Aeugst am Albis,8914,8.48831328535326,47.26700438726633
1,Aeugstertal,Aeugst am Albis,8914,8.49364170604486,47.282760808853396
2,Zwillikon,Affoltern am Albis,8909,8.431458619350813,47.287633089462105
3,Affoltern am Albis,Affoltern am Albis,8910,8.448945112880077,47.27916857724247
4,Bonstetten,Bonstetten,8906,8.467611445312384,47.31551041988308
...,...,...,...,...,...
4128,Ruggell,Ruggell,9491,9.528830277668794,47.24100025808611
4129,Schellenberg,Schellenberg,9488,9.54600684065143,47.23136529420569
4130,Thunersee,Thunersee,9999,7.715524563539287,46.688033279726746
4131,Brienzersee,Brienzersee,9999,7.9740967049736815,46.72850583845078


The CSV file could not be read directly into a pandas dataframe because it is not in the expected Unicode encoding.

For that reason, we first read it with the csv reader, then converted the rows into a numpy array, and finally converted that array into a pandas dataframe.
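
If the encoding of the file is known, passing it explicitly may let pandas read the file in one step. The following is a minimal sketch on a made-up in-memory sample, not on the real file. The `latin-1` encoding is an assumption; the actual encoding of `PLZO_CSV_WGS84.csv` would have to be checked first.

```python
import io

import pandas as pd

# Hypothetical sample that mimics the layout of the file:
# semicolon-separated text stored in a non-UTF-8 encoding.
sample_text = "Ortschaftsname;PLZ;E;N\nZürich;8001;8.54;47.37\n"
sample_bytes = sample_text.encode("latin-1")

# Assumption: the bytes are Latin-1 encoded. Replace "latin-1"
# with the real encoding of the downloaded file.
df = pd.read_csv(io.BytesIO(sample_bytes), sep=";", encoding="latin-1")
print(df)
```

With the right encoding, the detour through the csv reader and numpy may not be needed, and pandas also infers numeric column types on its own.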

## Data in the table

The first column (Ortschaftsname) contains the names of localities. These places are not always municipalities in their own right, but each one belongs to the municipality given in the second column.

The second column (Gemeindename) contains the names of the municipalities and lakes.

The third column (PLZ) contains the postcodes.

The fourth column contains the longitude and the fifth column the latitude.
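
Because the rows were read with the csv reader, every value in the dataframe is a string, including the coordinates. A minimal sketch of converting the coordinate columns to floating-point numbers follows. The small `sample` dataframe is a hypothetical stand-in built from two rows of the output above; on the real data the same call would be applied to `dataframe`.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's dataframe: all values are
# strings, exactly as the csv reader returns them.
sample = pd.DataFrame({
    "Ortschaftsname": ["Aeugst am Albis", "Zwillikon"],
    "PLZ": ["8914", "8909"],
    "Longitude": ["8.48831328535326", "8.431458619350813"],
    "Latitude": ["47.26700438726633", "47.287633089462105"],
})

# Convert only the coordinate columns; names and postcodes stay as strings.
coordinate_columns = ["Longitude", "Latitude"]
sample[coordinate_columns] = sample[coordinate_columns].apply(pd.to_numeric)
print(sample.dtypes)
```

Keeping the postcodes as strings is a design choice in this sketch: they are identifiers rather than quantities, so no arithmetic is expected on them.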

