# Dataframe from CSV on the web, with geoplot

by Koenraad De Smedt at UiB



---
This notebook is not related to NLP; it is simply another example of how to make a pandas dataframe from a file of *character-separated values* (.csv) on the web. It also shows how information from a dataframe can be used to make a rough geoplot.

---

In [None]:
import pandas as pd

The example is a dataset by Erik Bolstad containing the geolocations of postal zone centers in Norway. The separator between columns is a tab, and the first line is a header with the column names.

In [None]:
postnr_url = 'https://www.erikbolstad.no/postnummer-koordinatar/txt/postnummer.csv'

Read the CSV file from the URL into a dataframe. Indicate that the separator is a tab and that the header is on line 0, which is the first line. Then list the column names.

In [None]:
postnr_frame = pd.read_csv(postnr_url, sep='\t', encoding='utf-8', header=0)
postnr_frame.columns
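
To see what `header = 0` does without touching the network, here is a minimal sketch on a made-up tab-separated string. The rows are invented, and the column name `POSTNR` is an assumption for illustration; only `LAT` and `LON` are known from this notebook. The `dtype` argument is an optional extra that keeps leading zeros in postal codes.

```python
import io
import pandas as pd

# Made-up tab-separated sample; POSTNR is a hypothetical column name.
sample = "POSTNR\tLAT\tLON\n0001\t59.91\t10.75\n5003\t60.39\t5.32\n"

# header=0: the first line supplies the column names.
# dtype keeps the postal code as text so that '0001' is not read as 1.
with_header = pd.read_csv(io.StringIO(sample), sep='\t', header=0,
                          dtype={'POSTNR': str})

# header=None: the first line is treated as an ordinary data row.
no_header = pd.read_csv(io.StringIO(sample), sep='\t', header=None)

print(list(with_header.columns))
print(len(with_header), len(no_header))
```

With `header = None` the column names would end up as a data row, so selecting `LAT` and `LON` by name later on would fail.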

Summarize the numerical fields. We will be interested in LAT (latitude) and LON (longitude).

In [None]:
postnr_frame.describe()

Create a scatter plot with longitude on the x-axis and latitude on the y-axis. Notice that there is an outlier very far to the east. This seems to be an error in the data.

In [None]:
postnr_frame.plot.scatter(x='LON', y='LAT', s=4, figsize=(15,10))

Filter out rows with longitudes that are too far east and plot again.

In [None]:
postnr_frame = postnr_frame[postnr_frame.LON < 31]
postnr_frame.plot.scatter(x='LON', y='LAT', s=4, figsize=(8,10))
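
The filtering step above relies on a boolean mask: the comparison yields one True or False per row, and indexing the dataframe with that mask keeps only the True rows. A minimal sketch on a toy dataframe (all values made up, including the outlier):

```python
import pandas as pd

# Toy coordinates; the last row is an invented outlier far to the east.
toy = pd.DataFrame({'LAT': [59.91, 60.39, 69.65, 60.00],
                    'LON': [10.75, 5.32, 18.96, 95.00]})

mask = toy.LON < 31      # one boolean per row
filtered = toy[mask]     # keeps only rows where the mask is True

print(mask.tolist())
print(len(filtered))
```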

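Alternatively, filter out all rows with uncertain locations (*Usikker plassering*). Because `postnr_frame` was reassigned in the previous cell, this filter is applied on top of the longitude filter. Change some plot options as well.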

In [None]:
postnr_frame = postnr_frame[postnr_frame.DATAKVALITETSFORKLARING != 'Usikker plassering']
postnr_frame.plot.scatter(x='LON', y='LAT', c='green', marker='.',
                          s=6, alpha=0.4, figsize=(8,10),
                          title='Postal zones in Norway')
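
Before filtering on a text column, it can help to check which values it contains and how often each occurs. The following is a minimal sketch on a toy dataframe: `'Usikker plassering'` is the value used above, while `'Eksempel'` is a made-up placeholder label, not a value from the real dataset.

```python
import pandas as pd

# Toy frame; 'Eksempel' is a hypothetical placeholder label.
toy = pd.DataFrame({
    'LON': [10.75, 5.32, 18.96],
    'DATAKVALITETSFORKLARING': ['Usikker plassering', 'Eksempel', 'Eksempel'],
})

# Count how many rows carry each label.
counts = toy.DATAKVALITETSFORKLARING.value_counts()
print(counts)

# Keep only the rows that are not marked as uncertain.
certain = toy[toy.DATAKVALITETSFORKLARING != 'Usikker plassering']
print(len(certain))
```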

### Exercise

(Optional, not related to language.) The [Directorate of Fisheries](https://www.fiskeridir.no/Akvakultur/Registre-og-skjema/akvakulturregisteret) has a CSV file of fish farms in Norway. Plot their coordinates. You can also try to restrict the plot to certain fish types (*ART*) or other features. That file seems to be encoded in *iso-8859-1*, with a semicolon as separator.
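
As a starting point, here is a minimal sketch of reading semicolon-separated, *iso-8859-1* encoded data and filtering on *ART*. It uses a made-up in-memory sample instead of the real file: only the column name `ART` comes from the exercise text, while `NAVN`, `LAT`, `LON` and all rows are assumptions for illustration. The real file may be laid out differently, so inspect it first.

```python
import io
import pandas as pd

# Made-up sample standing in for the fish farm file.
# Only ART is taken from the exercise; the other names are hypothetical.
raw = ("NAVN;ART;LAT;LON\n"
       "Anlegg Sør;Laks;60.10;5.20\n"
       "Anlegg Nord;Ørret;69.50;18.90\n").encode('iso-8859-1')

# Same call as for a URL, but with a semicolon separator and another encoding.
farms = pd.read_csv(io.BytesIO(raw), sep=';', encoding='iso-8859-1')

# Restrict to one fish type with a boolean mask.
salmon = farms[farms.ART == 'Laks']

print(farms.ART.tolist())
print(len(salmon))
```

For the actual exercise, replace `io.BytesIO(raw)` with the URL or local path of the downloaded file, then plot with `plot.scatter` as above.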