WISO100303 / Johannes Schmidt & Peter Regner

# **An introduction to scientific programming**


In [None]:
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

In [None]:
plt.rcParams["figure.figsize"] = (14, 8)   # this does not seem to have any effect in datalore as of 2024-04-18
FIGSIZE = (12, 12)  # used later for changing figure size

In [None]:
# workaround: Datalore does not allow publishing attached files, so we have to download them.
def download_attached_files():
    import urllib.request
    import os.path
    fnames = {
              'numpy_broadcasting.png': 'https://files.boku.ac.at/filr/public-link/file-download/0d7483c99572915f01958920f90a6dc9/18662/-1064417480102885523/numpy_broadcasting.png',
              'vienna-map.png': 'https://files.boku.ac.at/filr/public-link/file-download/0d7483c99572915f0195892102336dfd/18659/-4041302498646712195/vienna-map.png',
              'public-toilets.csv': 'https://files.boku.ac.at/filr/public-link/file-download/0d7483c99572915f01958920fab66dd5/18658/-1977532774310129682/public-toilets.csv',
    }
    for fname, url in fnames.items():
        if not os.path.exists(fname):
            urllib.request.urlretrieve(url, filename=fname)

download_attached_files()

# Recap: Numpy arrays

Last time we used `np.array()` to create arrays. There are some other helpful functions to create arrays:

In [None]:
many_ones = np.ones([4,4])
many_ones
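Besides `np.ones()`, a few other constructors are often useful. A short sketch (the shapes and values are arbitrary examples):

```python
import numpy as np

zeros = np.zeros((2, 3))        # 2x3 array filled with 0.0
sevens = np.full((2, 2), 7.0)   # 2x2 array filled with 7.0
steps = np.arange(0, 10, 2)     # like range(): 0, 2, 4, 6, 8 (end excluded)
grid = np.linspace(0, 1, 5)     # 5 evenly spaced values from 0 to 1 (end included)
identity = np.eye(3)            # 3x3 identity matrix

print(steps)
print(grid)
```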

In [None]:
many_ones[0]

In [None]:
many_ones[0, 0]

Note that indexing can be used to assign values too:

In [None]:
many_ones[-1, -1] = 0

In [None]:
many_ones

In [None]:
many_ones[-1, :] = 0

In [None]:
many_ones
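Slices can be used for assignment as well, for example for whole columns or rectangular blocks. A small sketch on a fresh array (so `many_ones` stays unchanged); it also shows that a slice is a view on the original data, not a copy:

```python
import numpy as np

arr = np.ones((4, 4))

arr[:, 0] = 5        # whole first column
arr[1:3, 1:3] = 0    # 2x2 block in the middle (rows 1-2, columns 1-2)

# Slices are views: modifying `block` also changes `arr`
block = arr[0, 1:]
block[:] = -1

print(arr)
```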

Today we will look at more ways of indexing and slicing arrays.

# Let's do something with real data

Public toilets in Vienna:
https://www.data.gv.at/katalog/dataset/d9f5e582-3773-4f0b-8403-5d34718f6cf7

Note: we use the order (longitude, latitude) here. This is the order used in shapefiles, and it is also the (x, y) order needed for plotting.

In [None]:
import urllib.request
import os.path
from pathlib import Path
import pandas as pd

def read_public_toilets():
    """Download CSV with geocoordinates of public toilets in Vienna, parse it and return a numpy
    array of shape (N,2), where each point is (longitude_x, latitude_y)."""
    fname = 'public-toilets.csv'
    if not os.path.exists(fname):
        URI = ('https://data.wien.gv.at/daten/geo?'
               'service=WFS&request=GetFeature&version=1.1.0&'
               'typeName=ogdwien:WCANLAGEOGD&srsName=EPSG:4326&outputFormat=csv')
        urllib.request.urlretrieve(URI, filename=fname)
        
    d = pd.read_csv(fname)
    return d.SHAPE.str.extract(r'POINT \((\d+\.\d+) (\d+\.\d+)\)').astype(float).values
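The last line of `read_public_toilets()` uses a regular expression to pull the two numbers out of each entry of the `SHAPE` column. The same pattern can be tried with Python's built-in `re` module; the sample strings below are made up to match the format the pattern expects (the actual CSV content is not shown here):

```python
import re
import numpy as np

# Same pattern as in read_public_toilets(): two decimal numbers inside "POINT (...)"
POINT_PATTERN = r'POINT \((\d+\.\d+) (\d+\.\d+)\)'

# Hypothetical sample strings in the format the pattern expects
samples = ['POINT (16.372223 48.208432)', 'POINT (16.3365605 48.2364388)']

coords = np.array([[float(group) for group in re.match(POINT_PATTERN, s).groups()]
                   for s in samples])
print(coords.shape)   # (2, 2): one row per point, columns are (longitude, latitude)

# \d+ does not match a minus sign, so negative coordinates would not be found
print(re.match(POINT_PATTERN, 'POINT (-0.1 51.5)'))   # None
```

For Vienna this is fine, since both longitude and latitude are positive there.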

In [None]:
# Note: read_public_toilets() only downloads the CSV file from the official website if it does
# not exist locally, i.e. if it was deleted from the Datalore notebook files and
# download_attached_files() was not run at the beginning of the notebook
public_toilets = read_public_toilets()

In [None]:
public_toilets.shape

In [None]:
public_toilets[:5]

In [None]:
stephansplatz = np.array([16.372223, 48.208432])

In [None]:
# Approximate length (in km) of one degree of longitude/latitude at Stephansplatz:
LON_TO_KM = 74.1
LAT_TO_KM = 111.19
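Where might these numbers come from? A sketch assuming a spherical Earth with a mean radius of 6371 km (an assumption; the notebook does not say how the constants were obtained): one degree of latitude is 1/360 of a meridian, and one degree of longitude shrinks with the cosine of the latitude.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0   # assumed mean Earth radius (spherical approximation)

km_per_degree_lat = 2 * np.pi * EARTH_RADIUS_KM / 360
km_per_degree_lon = km_per_degree_lat * np.cos(np.deg2rad(48.208432))   # latitude of Stephansplatz

print(round(km_per_degree_lat, 2), round(km_per_degree_lon, 1))   # 111.19 74.1
```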

Doing calculations with longitude/latitude is difficult. Let's project the points to Cartesian coordinates in kilometers with the origin at Stephansplatz. Of course this is not a very accurate way to project longitude/latitude, but for a small area like Vienna it should be good enough:

In [None]:
def to_km(locations):
    return (locations - stephansplatz) * np.array([LON_TO_KM, LAT_TO_KM])
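`to_km()` works on a whole (N, 2) array at once because the (2,) arrays are broadcast over all N rows. A sketch of the reverse direction; `from_km` is a hypothetical helper that is not used elsewhere in this notebook, and the constants are repeated so the block runs on its own:

```python
import numpy as np

# Constants as defined above in the notebook
stephansplatz = np.array([16.372223, 48.208432])
LON_TO_KM = 74.1
LAT_TO_KM = 111.19
SCALE = np.array([LON_TO_KM, LAT_TO_KM])   # shape (2,)

def to_km(locations):
    # (N, 2) - (2,) and (N, 2) * (2,): the (2,) arrays are broadcast over all N rows
    return (locations - stephansplatz) * SCALE

def from_km(locations_km):
    """Hypothetical inverse of to_km(): (x_km, y_km) -> (longitude, latitude)."""
    return locations_km / SCALE + stephansplatz

points = np.array([[16.3365605, 48.2364388],
                   [16.372223, 48.208432]])
points_km = to_km(points)
print(points_km.shape)                            # (2, 2)
print(np.allclose(from_km(points_km), points))    # True
```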

In [None]:
public_toilets_km = to_km(public_toilets)

In [None]:
public_toilets_km[:5]

**This is not a good way to work with maps of larger areas, even if it works reasonably well on a small scale. This is just an example of how to use Numpy!**

# Numpy broadcasting rules

<!-- broken in Datalore, see cell below
<img src="numpy_broadcasting.png" width="800">
-->

<small>
Source: [scipy lecture notes](http://scipy-lectures.org/intro/numpy/operations.html#broadcasting) (CC-BY 4.0)
</small>

- operations on Numpy arrays are (mostly) element-wise
- if the shapes do not match, the smaller array is (virtually) repeated along the missing or size-1 axes

![Numpy broadcasting rules](images/numpy_broadcasting.png)

Examples:

    A      (2d array):  5 x 4
    B      (1d array):      1
    Result (2d array):  5 x 4

    A      (2d array):  5 x 4
    B      (1d array):      4
    Result (2d array):  5 x 4

    A      (3d array):  15 x 3 x 5
    B      (3d array):  15 x 1 x 5
    Result (3d array):  15 x 3 x 5
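The three examples can be checked with `np.broadcast_shapes()` (available since NumPy 1.20), which returns the resulting shape without creating any arrays:

```python
import numpy as np

print(np.broadcast_shapes((5, 4), (1,)))             # (5, 4)
print(np.broadcast_shapes((5, 4), (4,)))             # (5, 4)
print(np.broadcast_shapes((15, 3, 5), (15, 1, 5)))   # (15, 3, 5)

# The same with actual arrays: B is added to every row of A
A = np.zeros((5, 4))
B = np.arange(4)        # shape (4,)
print((A + B).shape)    # (5, 4)
```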

Full documentation:
https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html

**How does that work in detail?**

NumPy broadcasting allows arrays of different shapes to undergo element-wise operations.

Sizes of the dimensions are compared, starting with the trailing (i.e. rightmost) one:
- if the sizes match, there is nothing to do
- if one of the sizes equals 1, or if one array has no such dimension because it has fewer dimensions, that array is stretched along this dimension to match the shape of the other array (NumPy does this without actually copying the data)
- if the sizes do not match and both are >1, an error is raised: "ValueError: operands could not be broadcast together"
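The rules can be written down by hand to see how they work. A sketch; `broadcast_shape` is a hypothetical helper for illustration only (NumPy's own version is `np.broadcast_shapes()`):

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rules to two shapes by hand."""
    # A missing dimension behaves like size 1: pad the shorter shape on the left
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    result = []
    # Compare sizes starting with the trailing (rightmost) dimension
    for size_a, size_b in zip(reversed(a), reversed(b)):
        if size_a == size_b or size_b == 1:
            result.append(size_a)
        elif size_a == 1:
            result.append(size_b)
        else:
            raise ValueError(f"operands could not be broadcast together: {shape_a} {shape_b}")
    return tuple(reversed(result))

print(broadcast_shape((15, 3, 5), (15, 1, 5)))   # (15, 3, 5)
print(broadcast_shape((5, 4), (4,)))             # (5, 4)

# Mismatching trailing sizes (4 vs 3, both > 1) fail in NumPy as well:
try:
    np.ones((5, 4)) + np.ones(3)
except ValueError as err:
    print(err)
```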

**Hint:** With more than two dimensions things get a bit confusing. Consider using the library `xarray` if you need that often!

Ok, so can we plot our data set to get a better overview?

In [None]:
plt.figure(figsize=FIGSIZE)

plt.plot(public_toilets_km.T[0], public_toilets_km.T[1], 'ko', markersize=3, label='Public toilet')
plt.plot([0], [0], 'P', label='Stephansplatz', color='chocolate')

plt.gca().set_aspect('equal')
plt.legend()
plt.title('Map of public toilets in Vienna')
plt.xlabel('Longitude [km]')
plt.ylabel('Latitude [km]')
plt.grid();

# Reading and plotting images

Let's add a simple (inaccurate) map of Vienna (adapted from [this one](https://commons.wikimedia.org/wiki/Category:Maps_of_Vienna#/media/File:Gemeindebezirke_Wiens.svg), license: public domain). See also the [interactive map of public toilets](https://m.wien.gv.at/stadtplan/#base=karte&zoom=12&lat=48.2023&lon=16.4142&layer=wc).

In [None]:
vienna = plt.imread('vienna-map.png')

In [None]:
vienna.shape

The third dimension here holds the color channels: red, green and blue, plus an alpha (transparency) channel if the PNG file has one. If you have never seen RGB values before, you can [google for "color picker"](https://www.google.com/search?q=color+picker) to see a demonstration.
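Since an image is just a 3d array of shape (height, width, channels), the indexing and slicing from above apply. A sketch on a made-up 2x3 pixel RGB image rather than the Vienna map (for PNG files, `plt.imread()` typically returns floats between 0 and 1):

```python
import numpy as np

# Hypothetical 2x3 pixel image: shape (height, width, channels)
image = np.zeros((2, 3, 3))
image[0, :, 0] = 1.0   # top row: pure red
image[1, :, 2] = 1.0   # bottom row: pure blue

red = image[:, :, 0]        # 2d array containing only the red channel
print(red.shape)            # (2, 3)

gray = image.mean(axis=2)   # naive grayscale: average over the color channels
print(gray.shape)           # (2, 3)

flipped = image[::-1]       # reverse the rows, i.e. flip the image upside down
print(flipped[0, 0])        # [0. 0. 1.]  (the blue row is now on top)
```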

In [None]:
# distance from Stephansplatz to borders of the PNG file in km
left = -13.682179147809752
right = 16.639373238835283
bottom = -9.81358722568525
top = 12.989326274442831
extent = left, right, bottom, top
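`plt.imshow()` stretches the image so that its outer edges lie at `extent = (left, right, bottom, top)`, and with the default `origin='upper'` row 0 is drawn at the top. A sketch of the resulting mapping from a pixel (row, column) to the km coordinates of its center; `pixel_to_km` is a hypothetical helper, not part of Matplotlib:

```python
import numpy as np

def pixel_to_km(row, col, image_shape, extent):
    """Hypothetical helper: km coordinates of the center of pixel (row, col),
    assuming the image is drawn with plt.imshow(..., extent=extent, origin='upper')."""
    left, right, bottom, top = extent
    height, width = image_shape[:2]
    x = left + (col + 0.5) * (right - left) / width
    y = top - (row + 0.5) * (top - bottom) / height   # row 0 is at the top
    return np.array([x, y])

# Made-up example: a 100 x 200 pixel image covering 20 km x 10 km
top_left = pixel_to_km(0, 0, (100, 200), (0, 20, 0, 10))
print(top_left)   # approximately [0.05, 9.95]
```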

In [None]:
plt.figure(figsize=FIGSIZE)

plt.plot(public_toilets_km.T[0], public_toilets_km.T[1], 'ko', markersize=3, label='Public toilet')
plt.plot([0], [0], 'P', label='Stephansplatz', color='chocolate')
plt.gca().set_aspect('equal')
plt.legend()
plt.title('Map of public toilets in Vienna')
plt.xlabel('Longitude [km]')
plt.ylabel('Latitude [km]')
plt.grid()

plt.imshow(vienna, extent=extent);

There are [better ways to plot geographic maps](https://matplotlib.org/basemap/users/examples.html). This is just a simple example to play with Numpy.

# Distances between locations

In [None]:
def distance(point1, point2):
    """Calculate Euclidean distance between two points. Points are passed
    as lists or arrays of length 2. Numpy arrays of many dimensions are
    supported: axis=0 must be the dimension for x/y coordinates."""
    return ((point1[0] - point2[0])**2 + (point1[1] - point2[1])**2)**0.5

In [None]:
a = np.array([1, 1])
b = np.array([0, 0])

In [None]:
distance(a, b)

In [None]:
you_are_here = np.array([16.3365605, 48.2364388])  # (longitude, latitude)

Unfortunately, `public_toilets_km` has the wrong shape for `distance()`: the first _axis_ must be the one for the x/y coordinates:

In [None]:
public_toilets_km.shape

Transposing the array solves this problem:

In [None]:
public_toilets_km.T.shape

Why is this necessary? We can look up how `distance()` is defined:

In [None]:
distance?

In [None]:
distance??

In [None]:
public_toilets_km[:4]

In [None]:
public_toilets_km[:4].T

In [None]:
public_toilets_km[0]

In [None]:
public_toilets_km.T[0]

In [None]:
distances_to_me = distance(public_toilets_km.T, to_km(you_are_here))
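Instead of transposing, the distance can also be computed along the last axis with `np.linalg.norm(..., axis=1)`, which works directly on arrays of shape (N, 2). A sketch with made-up points in km; it gives the same result as `distance()` applied to the transposed array:

```python
import numpy as np

def distance(point1, point2):
    # Same formula as distance() above: axis 0 holds the x/y coordinates
    return ((point1[0] - point2[0])**2 + (point1[1] - point2[1])**2)**0.5

points_km = np.array([[0.0, 0.0], [3.0, 4.0], [-1.0, 1.0]])   # shape (N, 2)
me_km = np.array([0.0, 0.0])                                  # shape (2,)

via_transpose = distance(points_km.T, me_km)
via_norm = np.linalg.norm(points_km - me_km, axis=1)   # (N, 2) - (2,) is broadcast

print(via_norm)                                # distances 0, 5 and sqrt(2)
print(np.allclose(via_transpose, via_norm))    # True
```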

In [None]:
distances_to_me[:5]

There is a toilet very close, less than 500m!

In [None]:
distances_to_me.min()

Often there is another way to call Numpy functions:

In [None]:
np.min(distances_to_me)

<small>Note: In Numpy there is usually no difference between calling the function `np.min(distances_to_me)` and the method `distances_to_me.min()`. One convention says that methods should be used when the object is modified and functions when a new object is returned. However, this convention is not very widespread, so it is not necessary to stick to it.</small>
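The convention mentioned above is visible with sorting, where the two variants really do behave differently: the method `.sort()` sorts the array in place and returns `None`, while the function `np.sort()` returns a sorted copy and leaves the original untouched. A small sketch with made-up values:

```python
import numpy as np

values = np.array([3.2, 0.4, 1.7])

sorted_copy = np.sort(values)   # new array, `values` is unchanged
print(values, sorted_copy)

result = values.sort()          # sorts `values` itself
print(values, result)           # `result` is None
```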

But where is the toilet?

In [None]:
closest_idx = distances_to_me.argmin()
closest_idx
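`argmin()` returns the position (index) of the smallest value rather than the value itself, which is why the same index can be used on `public_toilets`. To get the k closest locations instead of only one, `np.argsort()` can be used; a sketch with made-up distances:

```python
import numpy as np

distances = np.array([2.3, 0.4, 1.1, 0.9, 5.0])   # hypothetical distances in km

closest = distances.argmin()            # index of the smallest value
print(closest, distances[closest])      # 1 0.4

k = 3
k_closest = np.argsort(distances)[:k]   # indices sorted by distance, keep the first k
print(k_closest)                        # [1 3 2]
print(distances[k_closest])             # [0.4 0.9 1.1]
```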

In [None]:
public_toilets[closest_idx]

In [None]:
def google_maps_link(location):
    """Return link to location (longitude, latitude) in Google Maps.
    
    See documentation:
    https://developers.google.com/maps/documentation/urls/guide
    https://stackoverflow.com/questions/47038116/google-maps-url-with-pushpin-and-satellite-basemap

    """
    xlong, ylat = location
    
    # alternative API which does not allow a marker
    # f"https://www.google.com/maps/@?api=1&map_action=map&center={ylat},{xlong}&basemap=satellite"
    
    # alternative API which does not allow a satellite view
    # f"https://www.google.com/maps/search/?api=1&query={ylat},{xlong}"
    
    # zoom level z=xxx seems to be broken somehow (?)
    return f"http://maps.google.com/maps?q=loc:{ylat}+{xlong}&z=13"

In [None]:
print(google_maps_link(public_toilets[closest_idx]))

In [None]:
print(google_maps_link(you_are_here))

In [None]:
def osm_location_link(location, zoom_level):
    """Return link to location (longitude, latitude) in OpenStreetMap."""
    xlong, ylat = location
    return f"https://www.openstreetmap.org/?mlat={ylat}&mlon={xlong}#map={zoom_level}/{ylat}/{xlong}"

In [None]:
print(osm_location_link(public_toilets[closest_idx], 15))

In [None]:
plt.figure(figsize=FIGSIZE)

plt.plot(public_toilets_km.T[0], public_toilets_km.T[1], 'ko', markersize=3, label='Public toilet');
plt.plot([0], [0], 'P', label='Stephansplatz', color='chocolate')
plt.gca().set_aspect('equal')
plt.title('Map of public toilets in Vienna')
plt.xlabel('Longitude [km]')
plt.ylabel('Latitude [km]')
plt.grid()

plt.imshow(vienna, extent=extent)

plt.plot(public_toilets_km.T[0][closest_idx], public_toilets_km.T[1][closest_idx],
         'mv', label='Closest toilet')
plt.plot(*to_km(you_are_here), '*g', label='You are here')

plt.legend();

## Exercise 1 - Find the hottest year

Find the hottest year!

<small>Data source: https://www.wien.gv.at/statistik/wetter/</small>

In [None]:
years = np.array([1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975,
                  1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990,
                  1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
                  2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013])

mean_temperature_celsius = np.array([10.2, 8.6, 8.7, 9.1, 8.6, 10.1, 10.2, 9.7, 9.2, 9.2, 9.8, 9.3, 9.6,
                                     10.2, 10.1, 9.6, 10.1, 9.1, 9.6, 8.7, 10.1, 10.0, 10.8, 9.4, 9.0, 9.6,
                                     9.3, 10.4, 10.7, 10.9, 9.7, 11.1, 10.8, 11.8, 10.4, 8.9, 10.0, 10.8,
                                     10.7, 11.7, 10.6, 11.3, 11.0, 10.4, 10.2, 10.7, 11.7, 11.4, 11.0, 9.9,
                                     11.1, 11.3, 10.9])

In [None]:
# # # # # YOUR SOLUTION GOES HERE # # # # #

In [None]:
# # # # # YOUR SOLUTION GOES HERE # # # # #

In [None]:
# # # # # YOUR SOLUTION GOES HERE # # # # #

In [None]:
# # # # # YOUR SOLUTION GOES HERE # # # # #

In [None]:
# # # # # YOUR SOLUTION GOES HERE # # # # #

In [None]:
# # # # # YOUR SOLUTION GOES HERE # # # # #