# 02 Visualization of data 

Part of ["Introduction to Data Science" course](https://github.com/kupav/data-sc-intro) by Pavel Kuptsov, [kupav@mail.ru](mailto:kupav@mail.ru)

Recommended reading for this section:

1. Vanderplas, J.T. (2016). Python data science handbook: Essential tools for working with data. Sebastopol, CA: O’Reilly Media, Inc. 

The following Python modules will be required. Make sure that you have them installed:
- `matplotlib`
- `numpy`
- `requests`

## Lesson 1

### Simple line plots

Plotting in Python is done with dedicated modules. The best known of them is Matplotlib.

First we need to import it. We will also need NumPy.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

Notice that we import not Matplotlib itself but `matplotlib.pyplot`. This is a submodule (a module that is part of another module).

The `pyplot` submodule provides an interface layer on top of the low-level commands that do the actual plotting. Another such interface is `pylab`, but its use is now discouraged.
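
Here is what "interface layer" means in practice, as a minimal sketch that assumes Matplotlib 3.x. `pyplot` keeps track of a *current* figure and axes. A state-based call such as `plt.plot` draws on whatever `plt.gca()` returns. The object-oriented style used in this lesson calls methods of `ax` directly instead.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 1, 5)

fig, ax = plt.subplots()     # creates a figure and makes its axes "current"
print(plt.gca() is ax)       # True: pyplot tracks the current axes

plt.plot(x, x**2)            # state-based style: draws on the current axes
ax.plot(x, x**3)             # object-oriented style: draws on ax explicitly

print(len(ax.get_lines()))   # 2: both curves ended up in the same axes
```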

The next step is to create a figure and axes.

There are several ways to do it. We will use the command
```python
fig, ax = plt.subplots()
```
The function name `subplots()` may sound misleading: it suggests many plots, while we want just one. However, when called without arguments it uses the defaults `nrows=1` and `ncols=1` and creates a single axes.
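
We can check what `subplots()` returns with a quick sketch (assuming Matplotlib 3.x). Without arguments it gives a single `Axes` object. The optional `squeeze=False` parameter forces a 2D array even when there is only one subplot.

```python
import matplotlib.pyplot as plt
from matplotlib.axes import Axes

fig, ax = plt.subplots()                        # defaults: nrows=1, ncols=1
print(isinstance(ax, Axes))                     # True: one Axes object, not an array

fig2, axs = plt.subplots(1, 1, squeeze=False)
print(axs.shape)                                # (1, 1): a 2D NumPy array holding one Axes
```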

In [None]:
fig, ax = plt.subplots()

The returned `fig` is an object representing the whole figure. This is a container that holds all the plot parts: axes, graphics, text, and labels.

The returned `ax` stands for the axes, the visual object seen above: a bounding box with ticks and labels that will eventually contain the plot elements.

Although any variable names can be used, `plt`, `fig`, and `ax` or `axs` are de facto standard (`axs` is used when a figure contains several subplots).
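
The relation between the two objects can be checked directly. This sketch (assuming Matplotlib 3.x) also shows that some elements belong to the whole figure, like `fig.suptitle`, while others belong to one axes, like `ax.set_title`.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

print(ax.figure is fig)    # True: every Axes knows the Figure that contains it
print(fig.axes == [ax])    # True: the Figure keeps a list of all its Axes

fig.suptitle("Figure-level title")   # text that belongs to the whole figure
ax.set_title("Axes-level title");    # text that belongs to one axes
```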

Now we are ready for the plotting.

We start with a line plot, which shows how one variable, say $y$, behaves as another one, say $x$, changes.

First we compute the data.

In [None]:
import numpy as np

x = np.linspace(-6, 6, 101)  # array of values from -6 to 6, number of values is 101
print(f"x={x}")

y = np.sin(x) # apply math function sin to each value in x array
print(f"y={y}")

Now let us plot $y$ vs $x$:

In [None]:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()

ax.plot(x, y);

One plot can contain many curves:

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

x = np.linspace(-6, 6, 101)
ax.plot(x, np.sin(x))
ax.plot(x, np.cos(x));

Grid lines often make a plot easier to read:

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

x = np.linspace(-6, 6, 101)
ax.plot(x, np.sin(x))
ax.plot(x, np.cos(x))
ax.grid();  # here we enable plotting grid lines

The size of a figure can be specified when calling `subplots`:

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(figsize=(12, 4))  # size in inches, 1 in = 2.54 cm

x = np.linspace(-6, 6, 101)
ax.plot(x, np.sin(x))
ax.plot(x, np.cos(x))
ax.grid();
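
`figsize` is always given in inches. A convenient trick is to define a conversion factor and write sizes in centimetres. The name `cm` below is just our own helper variable, not part of Matplotlib.

```python
import matplotlib.pyplot as plt
import numpy as np

cm = 1 / 2.54   # inches per centimetre (our own helper, not a Matplotlib name)

fig, ax = plt.subplots(figsize=(15 * cm, 5 * cm))   # about 15 cm x 5 cm
print(fig.get_size_inches())                        # the size Matplotlib stores, in inches

x = np.linspace(-6, 6, 101)
ax.plot(x, np.sin(x))
ax.grid();
```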

One figure can contain many plots. Here is an example:

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# sharex and sharey make the subplots share the same x and y axis ranges
fig, axs = plt.subplots(nrows=2, ncols=2, sharex=True, sharey=True, figsize=(9, 6))

x = np.linspace(-2*np.pi, 2*np.pi, 100)

axs[0,0].plot(x, np.sin(x))
axs[0,1].plot(x, np.cos(x))

axs[1,0].plot(x, np.sin(x**2))
axs[1,1].plot(x, np.cos(x**2))

# Adjust spaces between subplots
plt.tight_layout();
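
With several subplots, `axs` is a 2D NumPy array of axes. It can be indexed as `axs[row, col]` or traversed with `axs.flat`, which visits the subplots row by row. The functions plotted below are arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np

fig, axs = plt.subplots(nrows=2, ncols=2, figsize=(9, 6))
print(axs.shape)   # (2, 2): a 2D NumPy array of Axes objects

x = np.linspace(-2*np.pi, 2*np.pi, 100)

# axs.flat yields axs[0,0], axs[0,1], axs[1,0], axs[1,1] in this order
for k, ax in enumerate(axs.flat):
    ax.plot(x, np.sin((k + 1) * x))
    ax.set_title(rf"$\sin({k + 1}x)$")   # raw f-string: LaTeX backslash plus a computed number

fig.tight_layout();
```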

The created figure can be saved to a file as follows:

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

x = np.linspace(-6, 6, 101)
ax.plot(x, np.sin(x))
ax.grid(); 

plt.savefig('sin.png')  # save plot as png file
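
`plt.savefig` saves the current figure, while `fig.savefig` saves a specific one. The file format is inferred from the extension. The optional `dpi` parameter sets the resolution, and `bbox_inches="tight"` trims empty margins. A sketch with arbitrary file names:

```python
import os
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = np.linspace(-6, 6, 101)
ax.plot(x, np.sin(x))
ax.grid()

# The format is chosen from the file extension
fig.savefig("sin_hires.png", dpi=200, bbox_inches="tight")  # higher resolution, trimmed margins
fig.savefig("sin.pdf")                                       # vector format, scales without blur

print(os.path.exists("sin_hires.png"), os.path.exists("sin.pdf"))
```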

### Line colors

Line colors and styles can be adjusted. The `color` parameter sets the line color. It accepts almost any imaginable color in several forms: a name, a short code, a grayscale level, a hex code, or an RGB tuple.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

x = np.linspace(0, 6, 100)

ax.plot(x, np.sin(x - 0), color='cyan')        # specify color by name
ax.plot(x, np.sin(x - 1), color='k')           # short color code (rgbcmyk)
ax.plot(x, np.sin(x - 2), color='0.6')         # Grayscale between 0 and 1
ax.plot(x, np.sin(x - 3), color='#658999')     # Hex code (RRGGBB from 00 to FF)
ax.plot(x, np.sin(x - 4), color=(1.0,0.5,0.1)) # RGB tuple, values 0 to 1
ax.plot(x, np.sin(x - 5), color='chocolate');  # all HTML color names supported
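
Matplotlib converts all these forms to the same internal RGB(A) representation. The functions `matplotlib.colors.to_hex` and `matplotlib.colors.to_rgb` show what a given specification turns into:

```python
import matplotlib.colors as mcolors

specs = ['cyan', 'k', '0.6', '#658999', (1.0, 0.5, 0.1), 'chocolate']

for spec in specs:
    # to_hex converts any valid color specification to a '#rrggbb' string
    print(repr(spec), '->', mcolors.to_hex(spec))

print(mcolors.to_rgb('0.6'))   # a grayscale string gives equal R, G and B values
```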

If no color is specified, Matplotlib will automatically cycle through a set of default colors for multiple lines.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

x = np.linspace(0, 6, 100)

ax.plot(x, np.sin(x - 0))
ax.plot(x, np.sin(x - 1))
ax.plot(x, np.sin(x - 2))
ax.plot(x, np.sin(x - 3))
ax.plot(x, np.sin(x - 4))
ax.plot(x, np.sin(x - 5));

The default colors used in the cycle can be referred to directly by the names 'C0', 'C1', 'C2', and so on.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

x = np.linspace(0, 6, 100)

ax.plot(x, np.sin(x - 0))
ax.plot(x, np.sin(x - 1))
ax.plot(x, np.sin(x - 2))
ax.plot(x, np.sin(x - 3), color='C0')
ax.plot(x, np.sin(x - 4), color='C1')
ax.plot(x, np.sin(x - 5), color='C2');
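
The default cycle itself is stored in `plt.rcParams['axes.prop_cycle']`. This sketch assumes the default style is active. It prints the cycle colors and checks that 'C0' and 'C1' refer to the first two of them.

```python
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors

# The colors of the default property cycle, as hex strings
cycle = plt.rcParams['axes.prop_cycle'].by_key()['color']
print(cycle)

print(mcolors.to_hex('C0') == cycle[0])   # True: 'C0' is the first color of the cycle
print(mcolors.to_hex('C1') == cycle[1])   # True: 'C1' is the second one
```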

### Line styles

Similarly, the line style can be adjusted using the `linestyle` parameter.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

x = np.linspace(0, 1, 100)

ax.plot(x, x + 0, color='C0', linestyle='solid')
ax.plot(x, x + 1, color='C1', linestyle='dashed')
ax.plot(x, x + 2, color='C2', linestyle='dashdot')
ax.plot(x, x + 3, color='C3', linestyle='dotted')
ax.plot(x, x + 4, color='C0', linestyle='-')       # solid
ax.plot(x, x + 5, color='C1', linestyle='--')      # dashed
ax.plot(x, x + 6, color='C2', linestyle='-.')      # dashdot
ax.plot(x, x + 7, color='C3', linestyle=':');      # dotted
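
Besides the named styles, `linestyle` also accepts a dash pattern of the form `(offset, (on, off, ...))`. The `linewidth` parameter sets the line thickness. Dash lengths are given in points and, by default, scaled with the line width. The patterns below are arbitrary examples.

```python
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = np.linspace(0, 1, 100)

# linestyle=(offset, (on, off, on, off, ...)): segment and gap lengths in points
ax.plot(x, x + 0, color='C0', linestyle=(0, (5, 2)))         # long dashes, short gaps
ax.plot(x, x + 1, color='C1', linestyle=(0, (1, 3)))         # sparse dots
ax.plot(x, x + 2, color='C2', linestyle=(0, (5, 2, 1, 2)))   # dash-dot with custom spacing
ax.plot(x, x + 3, color='C3', linestyle='-', linewidth=3);   # thick solid line
```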

Line style and color can be combined into a single unnamed format string that must go right after the data.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

x = np.linspace(0, 1, 100)

ax.plot(x, x + 0, '-b')  # solid blue
ax.plot(x, x + 1, '--c') # dashed cyan
ax.plot(x, x + 2, '-.k') # dashdot black
ax.plot(x, x + 3, ':r'); # dotted red

Single-character color codes correspond to the standard abbreviations in the RGB (Red/Green/Blue) and CMYK (Cyan/Magenta/Yellow/blacK) color systems.
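
The exact RGB values behind the single-letter codes are stored in `matplotlib.colors.BASE_COLORS`. Note that 'g', 'c', 'm' and 'y' are somewhat darker than the pure primaries. For example, 'g' is (0, 0.5, 0) rather than (0, 1, 0). There is also 'w' for white.

```python
import matplotlib.colors as mcolors

# Single-letter color codes and their RGB tuples (components from 0 to 1)
for code, rgb in mcolors.BASE_COLORS.items():
    print(code, rgb)
```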

### Axes limits

Matplotlib sets the axis limits automatically, but they can also be controlled manually.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

x = np.linspace(-2, 2, 100)
ax.plot(x, x**3)

ax.set_xlim(-3, 3)    # x-limits
ax.set_ylim(-10, 10); # y-limits
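
The current limits can be read back with `get_xlim` and `get_ylim`. A single limit can be set by name, and the other one keeps its current value. A minimal sketch:

```python
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = np.linspace(-2, 2, 100)
ax.plot(x, x**3)

ax.set_xlim(-3, 3)
ax.set_ylim(bottom=-10)    # change only the lower y-limit; the upper one stays as it is
print(ax.get_xlim())       # (-3.0, 3.0)
print(ax.get_ylim())
```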

An axis can easily be reversed by swapping the order of its limits:

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

x = np.linspace(-2, 2, 100)
ax.plot(x, x**3)

ax.set_xlim(-3, 3)    # same x-limits as above
ax.set_ylim(10, -10); # y-limits in the reversed order
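
An alternative that keeps the automatically chosen limits is `ax.invert_yaxis()`, or `ax.invert_xaxis()` for the horizontal axis. `ax.yaxis_inverted()` reports whether the y-axis is currently reversed.

```python
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = np.linspace(-2, 2, 100)
ax.plot(x, x**3)

ax.invert_yaxis()            # flip the y-axis without choosing limits by hand
print(ax.yaxis_inverted())   # True
print(ax.get_ylim())         # the first (bottom) value is now the larger one
```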

### Labeling plots

The following methods label the whole plot and each of its axes:

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

x = np.linspace(-5, 5, 100)
ax.plot(x, np.sin(x**2))

ax.set_title("Example of plot labeling")
ax.set_xlabel("x")
ax.set_ylabel("sin(x**2)");

Labels look much better if math symbols are typeset "like in math books": not `sin(x**2)` but $\sin(x^2)$.

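LaTeX notation is used for this. Matplotlib renders it with its own built-in engine (mathtext), so a separate LaTeX installation is not required. The basic rules are: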
- enclose an expression in dollar signs: `$x+y$` looks like $x+y$
- add a leading backslash to math function names: `$\sin x$` looks like $\sin x$
- also add a leading backslash to Greek letter names: `$\beta$` looks like $\beta$
- use an underscore for subscripts: `$x_1$` looks like $x_1$
- use a caret for superscripts: `$x^2$` looks like $x^2$
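
These rules can be combined in one label. In this sketch several exponential curves $y = e^{-\lambda t}$ are labeled using Greek letters, subscripts and superscripts. The decay rates are arbitrary. Calling `fig.canvas.draw()` renders the figure, so a malformed expression would raise an error at that point.

```python
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
t = np.linspace(0, 4, 100)

for k, lam in enumerate([0.5, 1, 2], start=1):
    # raw f-string: the backslash stays literal, {k} and {lam} are filled in by Python
    ax.plot(t, np.exp(-lam * t), label=rf"$\lambda_{k} = {lam}$")

ax.set_title(r"$y = e^{-\lambda t}$")   # superscript with braces around a longer exponent
ax.set_xlabel(r"time $t$")              # plain text and math can be mixed
ax.set_ylabel(r"$y(t)$")
ax.legend()

fig.canvas.draw()   # renders the figure; a malformed expression would raise an error here
```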

Note that strings with LaTeX formatting usually need the `r` prefix: not `"$\sin x_1$"` but `r"$\sin x_1$"`.

This prefix denotes a "raw" string: backslashes are kept as ordinary characters instead of starting Python escape sequences, which avoids conflicts with LaTeX commands. For example, the string "Hello\nworld" contains the newline escape sequence `\n`, whereas r"Hello\nworld" contains just a backslash followed by the letter n.

In [None]:
print("Hello\nworld")   # \n is treated as the new line command
print(r"Hello\nworld")  # \n is just a backslash followed by the letter n
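
The difference is easy to check with `len`. It also shows why the `r` prefix matters for LaTeX. In an ordinary string `\t` is a tab, so `"$\theta$"` silently turns its `\t` into a tab character.

```python
s1 = "\n"     # one character: the newline
s2 = r"\n"    # two characters: a backslash and the letter n
print(len(s1), len(s2))                 # 1 2

print(len("$\theta$"), len(r"$\theta$"))   # 7 8: without r, "\t" became a single tab

# A raw string equals an ordinary string with the backslash doubled (escaped)
print(r"$\sin x$" == "$\\sin x$")       # True
```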

Now re-plot the above plot with pretty labels. 

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

x = np.linspace(-5, 5, 100)
ax.plot(x, np.sin(x**2))

ax.set_title("Example of plot labeling")
ax.set_xlabel(r"$x$")
ax.set_ylabel(r"$\sin(x^2)$");

When several lines are shown within a single axes, a plot legend helps to distinguish them.

To create the legend, first assign a label to each curve and then call `ax.legend()`:

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

x = np.linspace(-3, 3, 100)
y1 = np.cos(x)
y2 = np.cos(x)**2

ax.plot(x, y1, '-C0', label=r"$\cos x$")        # raw strings: "\c" is not a valid escape
ax.plot(x, y2, '-.C1', label=r"$(\cos x)^2$")

ax.set_title("Two curves and the legend")
ax.set_xlabel(r"$x$")
ax.legend();
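
`ax.legend` also accepts optional parameters, for example `loc` for the position, `ncol` for the number of columns and `title`. Curves without a `label` are left out of the legend. A sketch with arbitrarily chosen settings:

```python
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
x = np.linspace(-3, 3, 100)

ax.plot(x, np.cos(x), label=r"$\cos x$")
ax.plot(x, np.cos(x)**2, label=r"$(\cos x)^2$")
ax.plot(x, np.zeros_like(x), color='0.6')   # no label, so it does not appear in the legend

leg = ax.legend(loc='lower center', ncol=2, title='Curves')
print([t.get_text() for t in leg.get_texts()])
```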

### Line plot example: AMD vs Intel

Realistic example: AMD and Intel stock prices (from [Kaggle S&P 500 stock data](https://www.kaggle.com/camnugent/sandp500))

In [None]:
# AMD and INTEL stock prices from 2013-02-08 to 2014-09-10 (YYYY-MM-DD)
# columns: day number counted from 2013-02-08, AMD price, INTEL price
dat = [[0,2.59,21.0],[3,2.67,21.03],[4,2.77,21.19],[5,2.75,21.25],[6,2.75,21.23],[7,2.71,21.115],[11,2.82,21.085],[12,2.7,20.73],[13,2.6,20.25],[14,2.61,20.42],[17,2.53,20.23],[18,2.46,20.58],[19,2.53,20.93],[20,2.49,20.88],[21,2.42,21.03],[24,2.4,21.27],[25,2.43,21.51],[26,2.43,21.75],[27,2.55,21.89],[28,2.56,21.58],[31,2.59,21.69],[32,2.61,21.64],[33,2.6,21.655],[34,2.63,21.65],[35,2.6,21.375],[38,2.65,21.26],[39,2.67,21.14],[40,2.75,21.18],[41,2.64,21.04],[42,2.54,21.33],[45,2.51,21.15],[46,2.54,21.765],[47,2.55,21.83],[48,2.55,21.835],[52,2.44,21.43],[53,2.39,21.455],[54,2.32,21.05],[55,2.33,21.135],[56,2.29,20.94],[59,2.59,21.09],[60,2.63,21.75],[61,2.61,22.26],[62,2.52,21.825],[63,2.48,21.675],[66,2.4,21.38],[67,2.44,21.915],[68,2.4,21.93],[69,2.51,22.24],[70,2.47,22.44],[73,2.46,22.88],[74,2.53,23.375],[75,2.61,23.66],[76,2.68,23.38],[77,2.64,23.4],[80,2.68,23.76],[81,2.82,23.95],[82,3.22,23.99],[83,3.41,24.11],[84,3.6,23.96],[87,3.61,23.91],[88,3.54,24.15],[89,3.83,24.25],[90,3.86,24.36],[91,3.95,24.5],[94,4.17,24.08],[95,4.26,23.84],[96,4.38,24.2],[97,3.83,23.94],[98,4.07,24.04],[101,4.1,24.08],[102,4.02,24.15],[103,3.96,24.07],[104,4.01,24.05],[105,4.05,23.923],[109,4.05,24.08],[110,3.98,24.27],[111,4.04,24.21],[112,4.0,24.28],[115,3.96,25.24],[116,4.01,25.36],[117,3.91,24.7],[118,3.94,24.65],[119,3.91,24.59],[122,4.06,25.01],[123,3.96,24.71],[124,3.9,24.46],[125,3.95,24.99],[126,3.94,24.92],[129,4.05,25.1],[130,4.09,25.465],[131,4.07,25.0],[132,3.87,24.185],[133,4.0,24.195],[136,4.05,23.58],[137,4.15,23.88],[138,4.14,24.005],[139,4.08,24.05],[140,4.08,24.23],[143,4.1,23.886],[144,3.97,23.72],[145,4.06,23.762],[147,4.07,24.06],[150,4.0,23.185],[151,4.05,23.135],[152,3.98,23.25],[153,4.45,23.99],[154,4.32,23.9],[157,4.4,23.94],[158,4.43,24.25],[159,4.38,24.15],[160,4.64,23.244],[161,4.03,23.04],[164,3.9,22.77],[165,3.66,22.75],[166,3.63,22.93],[167,3.7,23.06],[168,3.82,23.26],[171,3.75,23.24],[172,3.82,23.38],[173,3.77,23.335],[174,3.81,23.2],[175,3.8,23.22
],[178,3.82,22.924],[179,3.72,22.8],[180,3.69,22.7],[181,3.71,22.45],[182,3.65,22.51],[185,3.65,22.64],[186,3.69,22.52],[187,3.82,22.57],[188,3.69,22.03],[189,3.66,21.915],[192,3.6,22.28],[193,3.63,22.523],[194,3.61,22.17],[195,3.63,22.26],[196,3.65,22.44],[199,3.58,22.275],[200,3.39,22.191],[201,3.42,22.285],[202,3.38,22.06],[203,3.27,21.98],[207,3.27,22.067],[208,3.31,22.635],[209,3.41,22.6],[210,3.57,22.67],[213,3.69,22.91],[214,3.87,22.985],[215,3.82,22.81],[216,3.75,22.63],[217,3.83,23.44],[220,3.82,23.39],[221,3.85,23.74],[222,3.93,23.9],[223,3.95,23.915],[224,3.83,23.769],[227,3.79,23.62],[228,3.8,23.705],[229,3.91,23.7],[230,3.89,23.41],[231,3.86,22.98],[234,3.81,22.921],[235,3.86,22.83],[236,3.9,22.885],[237,3.9,22.6],[238,3.91,22.81],[241,3.86,22.83],[242,3.72,22.48],[243,3.65,22.59],[244,3.79,23.1],[245,3.83,23.255],[248,3.97,23.45],[249,4.02,23.39],[250,4.09,23.695],[251,4.09,23.92],[252,3.53,23.875],[255,3.37,24.135],[256,3.18,24.071],[257,3.14,23.735],[258,3.23,23.78],[259,3.34,24.235],[262,3.32,24.36],[263,3.33,24.523],[264,3.3,24.495],[265,3.34,24.47],[266,3.31,24.325],[269,3.32,24.255],[270,3.33,24.03],[271,3.32,24.245],[272,3.28,24.059],[273,3.27,24.09],[276,3.34,24.169],[277,3.44,24.43],[278,3.54,24.6],[279,3.52,24.385],[280,3.5,24.52],[283,3.47,24.6],[284,3.42,24.7],[285,3.42,24.56],[286,3.37,25.23],[287,3.34,23.87],[290,3.39,23.75],[291,3.45,23.65],[292,3.56,23.9],[294,3.64,23.84],[297,3.66,23.7],[298,3.62,23.55],[299,3.57,23.74],[300,3.64,24.26],[301,3.66,24.82],[304,3.63,24.93],[305,3.72,24.82],[306,3.68,24.42],[307,3.69,24.47],[308,3.69,24.29],[311,3.59,24.45],[312,3.65,24.655],[313,3.65,25.15],[314,3.65,25.14],[315,3.69,25.055],[318,3.75,25.32],[319,3.77,25.43],[321,3.8,25.7],[322,3.78,25.6],[325,3.85,25.85],[326,3.87,25.955],[328,3.95,25.79],[329,4.0,25.78],[332,4.13,25.46],[333,4.18,25.585],[334,4.18,25.43],[335,4.09,25.31],[336,4.17,25.53],[339,4.13,25.5],[340,4.3,26.51],[341,4.47,26.67],[342,4.38,26.54],[343,4.18,25.85],[347,4.17,25.59],
[348,3.67,25.31],[349,3.62,25.13],[350,3.47,24.81],[353,3.41,24.72],[354,3.54,24.9],[355,3.48,24.68],[356,3.48,24.74],[357,3.43,24.54],[360,3.33,23.95],[361,3.37,23.82],[362,3.31,23.52],[363,3.41,23.99],[364,3.47,24.205],[367,3.63,24.29],[368,3.7,24.47],[369,3.69,24.55],[370,3.7,24.7],[371,3.69,24.755],[375,3.7,24.76],[376,3.72,24.5],[377,3.69,24.74],[378,3.69,24.42],[381,3.71,24.63],[382,3.69,24.62],[383,3.7,24.8],[384,3.71,24.76],[385,3.71,24.76],[388,3.67,24.5],[389,3.7,24.61],[390,3.71,24.5],[391,3.73,24.63],[392,3.95,24.64],[395,3.81,24.84],[396,3.85,24.73],[397,3.91,24.76],[398,3.86,24.57],[399,3.85,24.5],[402,3.8,24.7],[403,3.89,24.82],[404,3.98,25.02],[405,4.05,25.425],[406,4.04,25.17],[409,4.05,25.12],[410,4.05,25.46],[411,4.05,25.3772],[412,3.91,25.31],[413,3.88,25.62],[416,4.01,25.814],[417,4.07,25.99],[418,4.06,25.89],[419,4.0,26.41],[420,4.01,26.16],[423,3.88,26.485],[424,3.99,26.91],[425,3.98,26.98],[426,3.85,26.425],[427,3.65,26.1799],[430,3.71,26.56],[431,3.79,26.77],[432,3.76,26.93],[433,3.69,27.04],[437,4.12,26.95],[438,4.3,26.84],[439,4.25,26.75],[440,4.28,26.75],[441,4.05,26.26],[444,3.96,26.33],[445,4.03,26.48],[446,4.09,26.69],[447,4.2,26.45],[448,4.12,26.41],[451,4.09,26.17],[452,4.01,26.2],[453,3.97,26.37],[454,3.93,26.34],[455,3.87,26.3],[458,3.97,26.37],[459,4.03,26.45],[460,3.98,26.33],[461,3.96,26.01],[462,4.02,25.82],[465,4.09,26.04],[466,4.05,26.04],[467,4.1,26.2],[468,4.01,26.15],[469,4.02,26.289],[473,4.04,26.71],[474,4.0,26.88],[475,4.03,26.96],[476,4.0,27.32],[479,3.97,27.26],[480,3.94,27.66],[481,4.04,27.6],[482,4.08,27.66],[483,4.06,28.17],[486,4.07,27.91],[487,4.2,28.24],[488,4.29,27.93],[489,4.29,27.96],[490,4.28,29.87],[493,4.44,30.01],[494,4.47,29.95],[495,4.4,29.93],[496,4.36,30.09],[497,4.1,30.2],[500,4.01,30.23],[501,3.94,30.5],[502,3.96,30.88],[503,4.03,30.78],[504,4.11,30.93],[507,4.19,30.9],[508,4.22,30.98],[509,4.34,30.98],[510,4.24,31.14],[514,4.26,31.03],[515,4.22,30.79],[516,4.23,30.89],[517,4.29,31.26],[518,4.37,31.25],
[521,4.64,31.49],[522,4.53,31.71],[523,4.66,34.65],[524,4.57,33.7],[525,3.83,33.7],[528,3.78,34.06],[529,3.8,34.7901],[530,3.76,34.5],[531,3.73,34.25],[532,3.76,34.25],[535,3.73,34.23],[536,3.79,34.19],[537,3.82,34.35],[538,3.91,33.89],[539,3.97,33.745],[542,4.0,34.05],[543,4.08,32.82],[544,4.12,32.845],[545,4.1,32.68],[546,4.12,32.6],[549,4.14,33.02],[550,4.08,33.13],[551,4.11,34.1],[552,4.08,33.94],[553,4.13,34.17],[556,4.21,34.41],[557,4.27,34.34],[558,4.28,34.5],[559,4.24,35.15],[560,4.25,34.94],[563,4.18,34.81],[564,4.19,34.8],[565,4.16,34.79],[566,4.14,34.65],[567,4.17,34.92],[571,4.16,34.57],[572,4.19,34.57],[573,4.15,34.905],[574,4.15,35.0],[577,4.13,35.33],[578,4.08,34.91],[579,4.08,35.02]]

In [None]:
import matplotlib.pyplot as plt
import numpy as np

npd = np.array(dat)
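scl1 = 1   # scale for AMD; use npd[0, 1] instead to normalize by the first price
scl2 = 10  # scale for INTEL; use npd[0, 2] instead to normalize by the first price

fig, ax = plt.subplots()
ax.plot(npd[:, 0], npd[:, 1] / scl1, label='AMD')
ax.plot(npd[:, 0], npd[:, 2] / scl2, label=f'INTEL / {scl2}')
ax.set_xlabel("Days since 2013-02-08 (last day: 2014-09-10)")
ax.set_ylabel("Scaled price per share")
ax.legend();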

Notice the different scales of the price per share. To compare the curves, we either divide the Intel prices by 10, as above, or divide each price column by its first value.

The Intel price is clearly more stable: it grows fairly steadily, while the AMD price oscillates in a seemingly unpredictable way.
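
The second way of comparing, dividing each price column by its first value, can be sketched as follows. To keep the block self-contained it uses only the first four rows of `dat`. With the full dataset one would write `npd = np.array(dat)` instead.

```python
import matplotlib.pyplot as plt
import numpy as np

# First four rows of the AMD/INTEL data above: day, AMD price, INTEL price
npd = np.array([[0, 2.59, 21.0], [3, 2.67, 21.03], [4, 2.77, 21.19], [5, 2.75, 21.25]])

prices = npd[:, 1:]              # both price columns
relative = prices / prices[0]    # broadcasting: each column divided by its first value
print(relative.round(3))

fig, ax = plt.subplots()
ax.plot(npd[:, 0], relative[:, 0], label='AMD')
ax.plot(npd[:, 0], relative[:, 1], label='INTEL')
ax.set_ylabel("Price relative to the first day")
ax.legend();
```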

### Line plots with markers

Although the plot looks like a smooth curve, it actually consists of many straight line segments connecting the data points. If required, the data points themselves can be indicated by markers.

Obviously, this makes sense only if the number of data points is not very large.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2, 11)
y = x**2

fig, ax = plt.subplots()

ax.plot(x, y, marker='o')                   # line with filled circles
ax.plot(x, y+1, marker='o', linestyle="")   # markers only, no connecting line
ax.plot(x, y+2, marker='*', linestyle="")   # stars
ax.plot(x, y+3, marker='.', linestyle="")   # points
ax.plot(x, y+4, marker='d', linestyle="")   # diamonds
ax.plot(x, y+5, "r+");                      # compact form: red (r) pluses (+), no line
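
The markers can be tuned further with parameters such as `markersize`, `markerfacecolor`, `markeredgecolor` and `markevery`. The last one draws a marker only at every n-th point. The values below are arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2, 41)
y = x**2

fig, ax = plt.subplots()
line, = ax.plot(x, y, marker='o',
                markersize=8,             # marker size in points
                markerfacecolor='white',  # white fill, so the markers look hollow
                markeredgecolor='C0',     # edge in the same color as the line
                markevery=5)              # a marker at every 5th data point only
print(line.get_markersize(), line.get_markevery())
```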

### Scatter plots

If plotting individual data points is what we specifically need, a scatter plot can be used instead of a line plot. In fact, by plotting with markers above we have already produced simple scatter plots. But Matplotlib provides a dedicated function for this: `ax.scatter` instead of `ax.plot`.

Scatter plots made with `ax.scatter` give more control over the appearance of points: the size and color of each point can be specified individually.

Below is an example of setting the point sizes according to the x values:

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

x = np.linspace(0, 6, 11) # x values
y = np.sin(x)             # corresponding y values
sizes = 1 + x**4          # size of point grows as (1+x**4)

ax.scatter(x, y, marker="o", s=sizes);  # s sets the marker sizes (areas in points squared)
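
Note that `s` is the marker *area* in points squared. Its default is the square of the marker size used by `ax.plot`. If we want the marker diameter to grow linearly with some quantity, we pass the square of that quantity. The returned object stores the sizes, which can be checked with `get_sizes()`. The diameters below are arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

x = np.linspace(0, 6, 11)
y = np.sin(x)
diameters = 2 + 3 * x                    # desired marker diameters, in points

sc = ax.scatter(x, y, s=diameters**2)    # s expects areas, so square the diameters
print(np.allclose(sc.get_sizes(), diameters**2))   # True
```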

One more example that shows what can be done with scatter plots:

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

rng = np.random.default_rng()  # recall that here we create a random number generator

N = 50  # number of points to plot

x = rng.random(size=N)  # random x
y = rng.random(size=N)  # random y
sizes = 1000*rng.random(size=N)  # random sizes
colors = rng.random(size=N)  # random colors

# To draw a colorbar, take the object returned by scatter and pass
# it to the colorbar method of the figure.
im = ax.scatter(x, y, s=sizes, c=colors, alpha=0.5, edgecolor='k', cmap='hot')
fig.colorbar(im);

The named parameters above are as follows: `s` stands for sizes, `c` for colors, and `alpha` for transparency, from 0 (fully transparent) to 1 (opaque). Here a single `alpha` value is applied to all points; to give each point its own transparency, one can pass RGBA colors instead.

We did not specify a marker type, so the default one, a filled circle, is used. The parameter `edgecolor` defines the color of the marker edges.

Here each color is encoded by a real number, in this case a random value between 0 and 1. The named parameter `cmap` (short for "colormap") defines how these numbers are converted to colors. The example above uses the colormap 'hot'; Matplotlib has plenty of built-in colormaps.

The mapping from numbers to colors is displayed to the right of the plot. This is called a colorbar; it is drawn by calling the `colorbar` method of the `fig` object.
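
Two small refinements are often useful. `vmin` and `vmax` fix the range of numbers mapped onto the colormap; otherwise the minimum and maximum of `c` are used. The `label` parameter of `colorbar` adds a caption to the colorbar. A sketch with an arbitrary colormap and label:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)   # fixed seed so the picture is reproducible
N = 50
x = rng.random(size=N)
y = rng.random(size=N)
values = rng.random(size=N)

fig, ax = plt.subplots()
# vmin/vmax: 0 maps to the first colormap color, 1 to the last one
im = ax.scatter(x, y, c=values, cmap='viridis', vmin=0, vmax=1)
cbar = fig.colorbar(im, ax=ax, label='value')   # caption drawn along the colorbar
print(im.get_clim())
```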

### Scatter plot example: movies popularity

Here is a more realistic example. Below is a dataset `dat` with the release year, duration, and popularity of a set of movies.

In [None]:
# ['year', 'duration', 'popularity']
dat = [[1990,111,68],[1991,113,68],[1983,104,79],[1979,122,6],[1978,94,14],[1983,140,68],[1984,101,14],[1989,99,28],[1985,104,6],[1990,149,32],[1982,188,81],[1982,117,17],[1966,103,46],[1986,112,49],[1966,103,6],[1985,112,39],[1976,150,51],[1929,84,2],[1963,109,62],[1988,110,68],[1988,101,15],[1981,116,8],[1987,101,31],[1991,105,79],[1988,127,23],[1990,97,80],[1988,108,53],[1989,88,88],[1979,110,19],[1960,90,20],[1957,91,42],[1956,96,62],[1992,90,36],[1955,86,14],[1955,95,38],[1962,91,64],[1972,91,18],[1958,104,52],[1967,130,11],[1954,103,11],[1979,121,60],[1983,118,67],[1966,190,8],[1966,125,35],[1986,107,82],[1965,172,59],[1985,55,88],[1984,140,67],[1988,104,53],[1978,106,85],[1992,286,5],[1978,108,19],[1992,95,3],[1987,120,61],[1992,117,28],[1981,106,82],[1992,97,68],[1991,115,68],[1924,110,79],[1991,90,69],[1988,118,7],[1991,115,51],[1986,108,52],[1985,97,81],[1985,104,41],[1971,102,83],[1959,91,11],[1926,126,55],[1987,102,79],[1989,118,20],[1988,141,8],[1926,66,76],[1988,103,75],[1974,128,8],[1955,115,15],[1977,136,27],[1988,100,61],[1991,89,37],[1989,103,40],[1996,96,39],[1992,100,9],[1980,124,3],[1988,90,70],[1982,120,70],[1987,100,6],[1971,101,62],[1957,99,25],[1969,86,66],[1958,77,37],[1959,90,16],[1967,100,54],[1968,113,0],[1964,102,81],[1965,100,71],[1962,134,30],[1964,99,34],[1958,90,70],[1955,90,83],[1962,100,8],[1957,90,29],[1973,87,39],[1987,97,48],[1932,92,57],[1987,104,9],[1982,115,8],[1992,101,44],[1988,83,47],[1989,126,14],[1987,95,7],[1982,101,40],[1992,125,50],[1983,134,8],[1986,117,7],[1986,108,11],[1991,116,84],[1990,123,48],[1946,93,66],[1984,95,36],[1990,101,74],[1988,96,52],[1989,103,49],[1988,96,0],[1990,127,69],[1942,123,59],[1972,100,55],[1977,102,2],[1990,105,88],[1948,99,31],[1945,103,80],[1988,76,68],[1985,55,79],[1982,188,7],[1975,120,2],[1971,96,10],[1970,126,83],[1972,90,55],[1935,75,72],[1978,97,49],[1944,114,25],[1958,100,1],[1941,75,32],[1948,100,7],[1982,195,15],[1969,98,2],[1949,117,74],[1946,101,42],[1940,90,14],[1961,120,6],
[1956,106,24],[1945,126,31],[1937,91,32],[1938,104,49],[1935,90,88],[1939,87,26],[1938,74,19],[1956,98,33],[1952,110,34],[1953,83,57],[1954,81,69],[1950,107,69],[1969,103,67],[1989,105,16],[1990,90,16],[1987,91,71],[1982,128,45],[1991,91,88],[1970,94,31],[1982,108,84],[1983,60,20],[1984,158,6],[1973,101,54],[1988,172,5],[1972,124,65],[1970,137,0],[1973,116,10],[1952,107,66],[1976,116,35],[1969,94,36],[1991,98,2],[1974,144,81],[1964,112,77],[1977,117,29],[1942,88,12],[1968,103,38],[1973,105,57],[1982,123,69],[1990,102,72],[1987,98,45],[1985,105,49],[1993,60,13],[1974,89,83],[1993,65,62],[1958,101,67],[1981,129,86],[1928,139,44],[1992,106,6],[1985,119,10],[1990,121,48],[1973,129,83],[1958,96,50],[1986,101,45],[1990,135,76],[1948,110,73],[1987,91,66],[1965,123,9],[1980,104,58],[1988,120,17],[1978,114,64],[1988,117,29],[1988,90,13],[1973,122,28],[1975,112,82],[1982,94,70],[1952,109,4],[1954,40,62],[1989,114,18],[1990,102,6],[1971,98,1],[1983,50,28],[1989,80,19],[1987,91,40],[1984,94,75],[1984,99,63],[1989,125,73],[1963,138,80],[1935,64,33],[1988,103,71],[1966,175,63],[1974,313,28],[1977,104,54],[1983,104,51],[1936,84,50],[1986,71,40],[1972,71,50],[1987,119,14],[1990,107,61],[1982,92,65],[1977,91,34],[1989,90,15],[1970,26,22],[1943,99,87],[1992,99,85],[1989,113,86],[1987,103,6],[1987,119,8],[1979,128,32],[1984,106,72],[1986,88,16],[1989,90,77],[1975,109,69],[1985,111,43],[1965,122,44],[1963,120,63],[1931,68,66],[1931,110,22],[1967,111,50],[1968,103,57],[1989,91,12],[1987,118,7],[1980,91,0],[1988,119,77],[1990,111,73],[1991,119,74],[1946,105,65],[1987,101,20],[1948,81,39],[1962,91,37],[1989,108,51],[1992,135,87],[1966,99,15],[1986,88,49],[1991,96,72],[1961,66,4],[1990,110,66],[1962,110,11],[1989,90,20],[1991,95,31],[1989,103,27],[1992,60,13],[1977,123,82],[1987,109,3],[1988,108,7],[1954,96,48],[1957,82,57],[1986,120,71],[1979,122,67],[1989,97,81],[1974,124,81],[1990,106,22],[1980,90,74],[1990,94,35],[1971,74,36],[1969,128,77],[1942,18,75],[1948,103,72],[1965,133,68],[1946
,110,20],[1939,96,18],[1950,138,23],[1986,96,33],[1990,89,24],[1988,93,16],[1989,104,53],[1956,120,15],[1992,90,14],[1940,120,61],[1949,115,81],[1986,120,25],[1975,89,4],[1942,85,36],[1989,120,5],[1990,93,33],[1967,90,28],[1992,86,16],[1987,86,24],[1989,95,83],[1983,99,28],[1980,103,40],[1986,108,15],[1983,114,15],[1981,135,87],[1977,107,59],[1980,135,66],[1986,120,87],[1989,127,86],[1990,125,6],[1989,113,5],[1988,103,19],[1981,94,62],[1989,89,10],[1990,56,12],[1940,83,24],[1961,190,39],[1989,60,12],[1990,97,72],[1987,130,71],[1991,158,57],[1987,118,41],[1993,30,60],[1991,99,78],[1979,110,7],[1991,110,81],[1991,102,33],[1946,97,12],[1950,86,30],[1949,93,84],[1985,90,82],[1986,87,39],[1992,85,69],[1976,116,48],[1974,131,55],[1975,117,87],[1977,134,34],[1987,97,23],[1990,104,20],[1947,118,10],[1974,117,1],[1981,117,5],[1980,144,32],[1945,91,54],[1993,92,34],[1979,92,51],[1970,123,3],[1984,105,41],[1991,86,64],[1961,140,20],[1966,103,60],[1974,127,41],[1985,96,45],[1967,103,80],[1993,53,48],[1992,97,8],[1987,94,41],[1987,94,38],[1956,101,32],[1985,84,20],[1984,85,14],[1992,108,80],[1986,103,8],[1979,115,8],[1986,95,52],[1981,96,80],[1936,70,28],[1978,450,1],[1990,103,47],[1992,90,67],[1988,95,2],[1987,95,84],[1984,112,83],[1991,106,55],[1977,113,17],[1981,116,76],[1979,110,64],[1989,118,66],[1989,101,42],[1980,124,33],[1977,121,44],[1983,132,4],[1991,104,72],[1956,99,52],[1957,120,84],[1990,89,79],[1972,129,75],[1979,122,43],[1986,100,6],[1971,114,15],[1979,113,34],[1965,97,62],[1940,130,78],[1944,96,44],[1973,87,31],[1992,85,26],[1989,83,80],[1990,94,10],[1983,95,30],[1991,118,8],[1988,98,76],[1972,92,33],[1989,103,32],[1987,92,45],[1934,85,57],[1931,74,66],[1930,92,0],[1926,109,72],[1928,90,83],[1935,96,35],[1936,110,74],[1931,91,67],[1929,100,70],[1932,112,81],[1931,84,64],[1939,108,40],[1933,97,82],[1928,96,72],[1925,125,73],[1929,74,73],[1932,71,85],[1930,76,62],[1962,105,60],[1982,116,8],[1989,86,88],[1953,120,50],[1979,120,24],[1975,124,6],[1987,93,44],[1989,120
,49],[1975,111,64],[1979,89,75],[1986,122,77],[1959,102,76],[1954,108,56],[1961,134,43],[1983,109,3],[1988,100,60],[1987,102,44],[1986,120,69],[1978,117,11],[1988,134,8],[1966,95,20],[1964,51,27],[1979,180,62],[1932,66,66],[1986,120,19],[1990,126,82],[1988,115,25],[1992,133,11],[1991,76,40],[1988,99,2],[1991,136,8],[1984,108,17],[1986,105,20],[1969,125,59],[1991,186,81],[1992,99,58],[1985,100,73],[1983,82,24],[1984,93,14],[1987,107,66],[1982,117,1],[1987,126,6],[1992,111,58],[1989,89,59],[1938,298,82],[1936,89,86],[1976,99,85],[1988,88,65],[1983,93,61],[1987,96,5],[1989,87,36],[1991,96,23],[1986,110,8],[1987,93,51],[1989,104,9],[1970,94,41],[1984,100,81],[1978,112,46],[1982,109,74],[1972,109,82],[1987,112,6],[1974,103,23],[1992,102,14],[1986,106,22],[1984,100,53],[1980,102,49],[1974,109,28],[1980,110,61],[1991,115,55],[1931,95,84],[1972,78,84],[1964,132,29],[1952,98,49],[1948,87,16],[1940,81,57],[1946,110,57],[1948,98,48],[1990,105,84],[1964,130,2],[1987,85,0],[1984,90,3],[1985,95,27],[1949,84,87],[1964,170,10],[1960,123,32],[1976,106,6],[1961,109,60],[1956,121,21],[1952,95,48],[1968,134,78],[1991,132,75],[1967,108,50],[1957,153,51],[1975,107,76],[1981,109,23],[1991,101,62],[1991,116,73],[1991,145,34],[1991,115,22],[1991,113,25],[1991,112,66],[1991,113,12],[1992,95,31],[1986,114,2],[1987,112,50],[1988,163,32],[1991,99,53],[1988,87,42],[1966,120,20],[1989,83,9],[1943,265,77],[1992,88,19],[1987,95,71],[1980,99,70],[1973,102,11],[1987,94,5],[1947,56,61],[1928,148,33],[1986,90,63],[1987,105,71],[1934,80,9],[1950,104,42],[1991,89,10],[1987,94,23],[1989,114,68],[1980,110,65],[1982,136,59],[1980,106,62],[1940,127,51],[1987,112,27],[1989,121,5],[1992,102,8],[1932,65,61],[1991,108,60],[1990,93,1],[1980,129,45],[1988,97,6],[1982,120,73],[1987,89,17],[1973,103,79],[1970,129,18],[1988,89,76],[1986,100,5],[1955,108,58],[1989,90,88],[1981,132,80],[1927,60,45],[1937,59,6],[1991,83,3],[1983,90,10],[1987,86,47],[1990,131,44],[1986,100,74],[1963,99,7],[1991,117,43],[1988,102,43],
[1948,100,29],[1973,103,3],[1983,97,34],[1975,91,42],[1990,108,19],[1977,89,42],[1955,67,66],[1988,103,14],[1983,97,51],[1989,145,8],[1991,120,8],[1977,94,68],[1979,96,82],[1981,195,76],[1986,105,84],[1977,136,54],[1972,175,8],[1974,201,8],[1976,109,6],[1972,86,81],[1975,82,84],[1973,88,59],[1970,130,88],[1980,117,82],[1977,143,76],[1977,124,36],[1972,98,40],[1953,116,71],[1955,103,69],[1954,113,25],[1945,69,26],[1939,101,11],[1956,129,69],[1989,113,69],[1992,79,69],[1978,145,87],[1987,90,77],[1970,90,49],[1989,96,1],[1990,107,3],[1987,101,21],[1984,150,27],[1984,96,73],[1987,95,8],[1989,104,25],[1987,90,71],[1986,90,41],[1969,102,78],[1984,106,78],[1935,54,50],[1990,98,14],[1986,98,66],[1977,127,35],[1989,119,53],[1987,88,62],[1989,86,55],[1987,90,15],[1990,110,23],[1978,114,67],[1986,89,79],[1953,94,82],[1969,80,44],[1988,98,4],[1981,119,62],[1988,116,2],[1990,101,60],[1985,118,71],[1986,84,28],[1987,109,11],[1983,94,4],[1983,91,56],[1990,94,7],[1989,93,65],[1990,115,66],[1988,127,62],[1992,128,7],[1992,121,24],[1949,58,19],[1978,109,78],[1971,137,8],[1990,100,49],[1960,109,56],[1957,112,43],[1987,95,36],[1951,122,75],[1986,93,86],[1971,84,80],[1985,128,69],[1989,116,76],[1978,132,21],[1974,90,21],[1980,116,57],[1977,109,18],[1986,105,1],[1938,96,27],[1987,95,41],[1959,88,63],[1929,68,2],[1941,95,3],[1986,132,47],[1982,101,4],[1987,114,34],[1991,60,9],[1991,94,31],[1960,101,52],[1961,100,83],[1954,107,40],[1963,118,73],[1957,109,84],[1978,111,53],[1964,188,62],[1961,172,10],[1958,114,13],[1953,92,54],[1977,91,80],[1977,105,80],[1979,112,52],[1991,145,49],[1991,130,55],[1992,116,1],[1986,141,56],[1986,94,75],[1992,98,55],[1979,85,17],[1991,118,13],[1943,64,83],[1986,125,20],[1991,102,2],[1962,150,80],[1989,101,70],[1990,107,25],[1988,95,73],[1979,129,31],[1983,132,32],[1967,99,36],[1990,101,63],[1970,105,36],[1992,84,71],[1939,85,83],[1988,100,74],[1991,118,54],[1992,60,24],[1991,40,65],[1990,60,32],[1987,50,75],[1990,5,77],[1991,16,63],[1988,83,78],[1986,103,9],[1990,
120,70],[1974,124,50],[1988,85,33],[1980,94,4],[1949,110,72],[1981,104,44],[1989,81,15],[1955,92,70],[1986,130,77],[1986,130,61],[1977,110,6],[1991,113,62],[1981,86,68],[1991,144,8],[1992,101,38],[1986,119,6],[1986,119,20],[1989,88,65],[1980,86,66],[1989,100,69],[1992,89,0],[1990,92,28],[1990,110,18],[1992,95,71],[1967,81,27],[1990,181,8],[1987,130,81],[1988,120,41],[1975,130,69],[1988,109,75],[1991,98,6],[1988,110,71],[1989,109,30],[1983,101,2],[1987,102,57],[1986,109,8],[1985,112,59],[1988,111,6],[1984,109,50],[1983,98,51],[1988,106,18],[1965,199,26],[1989,105,11],[1981,91,17],[1957,89,47],[1964,120,39],[1975,103,35],[1988,98,34],[1990,117,3],[1991,206,78],[1991,97,80],[1991,87,8],[1987,118,3],[1970,194,81],[1973,127,40],[1962,123,85],[1989,102,9],[1988,90,17],[1960,103,88],[1972,128,59],[1981,97,79],[1976,97,70],[1977,137,8],[1989,89,41],[1980,100,29],[1991,240,3],[1950,112,77],[1992,61,60],[1953,95,65],[1991,94,25],[1987,109,25],[1986,94,61],[1989,110,51],[1984,102,10],[1990,127,6],[1986,113,66],[1982,107,73],[1948,89,35],[1952,99,50],[1974,117,66],[1970,99,29],[1955,100,80],[1968,90,33],[1976,94,70],[1987,86,45],[1988,91,72],[1974,111,73],[1941,85,38],[1990,102,29],[1940,105,88],[1955,100,28],[1987,60,44],[1963,112,2],[1951,111,5],[1987,83,50],[1956,124,11],[1992,88,61],[1970,91,75],[1991,108,58],[1988,83,27],[1980,109,25],[1991,110,11],[1989,107,87],[1991,85,78],[1986,87,57],[1987,103,37],[1990,90,40],[1950,93,65],[1992,112,45],[1958,128,10],[1987,91,47],[1946,93,25],[1941,57,85],[1940,57,87],[1941,56,85],[1992,137,26],[1950,105,64],[1957,107,29],[1939,94,75],[1971,110,68],[1992,153,74],[1983,72,0],[1952,90,14],[1969,101,62],[1953,79,26],[1929,129,65],[1978,126,40],[1981,104,32],[1986,98,47],[1982,92,67],[1985,108,62],[1955,116,8],[1969,127,72],[1987,139,45],[1969,114,39],[1991,117,6],[1993,95,53],[1988,90,10],[1976,128,0],[1986,94,23],[1984,95,75],[1990,95,8],[1986,112,36],[1984,102,21],[1992,100,88],[1986,89,47],[1978,88,27],[1989,122,12],[1979,94,22],[1980,
180,75],[1986,107,12],[1982,115,64],[1989,104,8],[1985,121,68],[1989,114,66],[1985,115,62],[1991,124,3],[1988,116,50],[1989,74,87],[1971,108,22],[1973,106,83],[1988,360,12],[1927,78,31],[1989,93,59],[1990,89,0],[1962,119,39],[1992,121,28],[1991,88,59],[1988,94,80],[1988,96,9],[1990,95,88],[1962,123,77],[1972,105,42],[1965,108,83],[1965,108,46],[1983,98,8],[1988,90,9],[1990,106,50],[1988,107,23],[1988,94,67],[1989,94,7],[1987,87,36],[1970,100,3],[1990,93,6],[1991,119,6],[1987,97,66],[1987,103,35],[1989,122,6],[1949,100,74],[1974,104,6],[1989,84,29],[1988,90,59],[1986,84,6],[1991,60,54],[1977,100,75],[1971,111,69],[1984,90,73],[1990,97,28],[1986,104,88],[1985,135,13],[1993,104,80],[1989,88,29],[1987,153,6],[1991,102,5],[1969,135,66],[1986,96,75],[1987,90,49],[1943,60,27],[1986,98,11],[1987,93,73],[1990,119,43],[1991,111,19],[1991,142,4],[1940,56,17],[1992,53,20],[1990,106,58],[1986,111,59],[1992,96,5],[1951,101,17],[1979,198,86],[1991,87,53],[1980,92,80],[1969,110,29],[1968,121,22],[1980,92,35],[1986,120,6],[1989,110,28],[1976,90,24],[1988,81,7],[1992,128,24],[1988,92,42],[1992,138,8],[1991,98,75],[1958,167,10],[1988,89,79],[1990,98,5],[1947,96,84],[1990,109,23],[1988,91,48],[1991,110,26],[1939,109,30],[1988,81,24],[1987,120,41],[1988,97,4],[1990,102,17],[1991,135,60],[1990,98,41],[1972,99,60],[1991,135,20],[1966,127,79],[1992,213,13],[1982,128,88],[1985,96,86],[1971,90,42],[1974,105,20],[1973,100,65],[1968,105,26],[1970,107,72],[1971,102,72],[1986,103,12],[1986,89,42],[1984,110,48],[1989,97,71],[1975,105,59],[1968,360,80],[1992,96,79],[1991,90,9],[1979,90,46],[1991,80,30],[1990,83,27],[1975,118,32],[1973,127,28],[1987,155,72],[1924,95,74],[1966,102,69],[1986,90,88],[1972,100,4],[1986,83,38],[1983,123,7],[1986,82,54],[1971,118,62],[1988,93,58],[1981,115,20],[1976,90,78],[1988,103,13],[1936,77,74],[1977,105,11],[1985,56,86],[1954,110,22],[1960,185,67],[1955,150,70],[1992,95,77],[1988,116,11],[1963,93,36],[1987,99,3],[1986,93,84],[1987,110,5],[1965,128,37],[1988,101,30]
,[1988,120,24],[1978,103,11],[1986,97,40],[1985,116,45],[1990,88,41],[1986,90,29],[1985,88,31],[1964,101,51],[1979,88,2],[1982,122,30],[1989,99,20],[1987,97,36],[1933,72,66],[1992,95,25],[1985,117,45],[1986,96,10],[1993,103,8],[1981,111,7],[1967,114,67],[1992,123,41],[1990,113,8],[1987,115,13],[1992,104,36],[1967,85,64],[1987,135,32],[1991,112,75],[1978,183,82],[1984,106,31],[1986,108,57],[1983,131,52],[1982,151,64],[1985,161,88],[1981,127,37],[1985,124,9],[1988,122,67],[1989,99,43],[1992,103,61],[1991,28,36],[1970,129,67],[1987,100,52],[1983,134,46],[1968,151,30],[1990,97,39],[1970,140,75],[1941,117,2],[1984,102,5],[1949,78,66],[1989,99,6],[1991,111,35],[1983,105,54],[1957,173,74],[1975,101,72],[1967,109,81],[1972,110,87],[1968,109,60],[1963,243,80],[1992,130,82],[1977,110,61],[1956,201,61],[1985,94,62],[1943,90,79],[1993,76,26],[1972,108,80],[1991,60,21],[1973,99,54],[1991,117,13],[1931,125,44],[1992,83,32],[1988,101,74],[1991,95,66],[1935,234,87],[1988,60,85],[1980,97,76],[1948,127,4],[1937,100,78],[1987,91,5],[1988,85,44],[1983,91,65],[1987,93,16],[1990,87,33],[1985,116,9],[1963,80,2],[1963,95,79],[1959,100,3],[1961,154,71],[1986,99,41],[1989,88,42],[1990,110,7],[1989,109,51],[1986,221,53],[1941,94,79],[1939,80,53],[1940,95,52],[1986,103,62],[1989,84,57],[1985,130,25],[1983,90,68],[1984,101,4],[1985,106,68],[1984,106,83],[1988,121,56],[1955,117,4],[1958,98,4],[1988,90,56],[1986,91,10],[1968,139,83],[1966,81,81],[1968,88,37],[1969,101,6],[1984,96,7],[1957,147,19],[1968,158,57],[1985,95,58],[1990,88,30],[1959,88,8],[1970,97,45],[1971,102,47],[1981,88,65],[1967,127,49],[1988,89,85],[1990,85,75],[1991,114,33],[1992,121,39],[1986,109,16],[1991,104,64],[1988,96,71],[1986,137,82],[1992,115,59],[1997,109,60],[1979,117,83],[1985,97,64],[1984,96,50],[1973,96,73],[1974,82,84],[1972,92,37],[1966,101,70],[1967,107,67],[1977,120,86],[1969,110,48],[1975,90,75],[1968,106,9],[1973,119,39],[1972,87,9],[1988,161,24],[1955,60,82],[1987,88,74],[1966,126,3],[1990,92,46],[1987,106,23]
,[1970,91,62],[1989,107,64],[1990,108,30],[1987,97,67],[1989,93,43],[1974,114,59],[1973,112,39],[1953,96,67],[1980,111,40],[1989,91,47],[1956,83,51],[1973,102,12],[1989,90,24],[1992,139,64],[1982,125,1],[1987,101,54],[1986,116,39],[1970,90,17],[1965,106,51],[1955,109,23],[1977,90,62],[1968,100,33],[1992,96,76],[1964,102,61],[1985,94,84],[1991,160,88],[1956,119,9],[1979,105,5],[1955,111,82],[1961,153,38],[1970,110,68],[1966,95,12],[1966,104,6],[1987,134,68],[1989,117,57],[1968,102,32],[1961,98,54],[1960,135,3],[1991,144,49],[1943,108,32],[1950,85,27],[1950,110,72],[1947,103,4],[1975,93,85],[1949,90,57],[1987,103,69],[1993,75,0],[1947,87,17],[1990,92,18],[1953,94,71],[1954,91,78],[1977,146,84],[1979,119,10],[1987,100,73],[1966,103,34],[1962,183,7],[1986,128,8],[1988,92,78],[1986,85,8],[1988,138,14],[1981,118,84],[1991,102,32],[1987,164,1],[1962,100,35],[1983,90,72],[1989,91,35],[1991,105,72],[1963,113,38],[1961,141,26],[1937,61,83],[1987,94,68],[1936,87,4],[1991,114,16],[1931,87,60],[1979,95,68],[1984,90,45],[1979,153,8],[1990,94,7],[1963,81,88],[1963,86,85],[1975,87,21],[1986,97,63],[1991,87,24],[1979,87,47],[1987,119,7],[1986,91,16],[1989,90,75],[1990,94,10],[1984,130,25],[1974,89,31],[1975,94,76],[1982,136,64],[1987,91,82],[1966,107,40],[1990,86,1],[1945,135,88],[1991,125,46],[1949,59,1],[1949,60,8],[1949,60,31],[1950,60,6],[1948,59,72],[1948,60,18],[1948,59,61],[1968,73,8],[1983,112,54],[1971,104,88],[1985,114,70],[1961,113,77],[1983,69,6],[1984,77,36],[1958,83,54],[1982,111,53],[1989,150,75],[1988,84,50],[1953,92,3],[1960,122,31],[1966,76,70],[1966,82,26],[1970,93,26],[1953,95,63],[1935,88,8],[1969,126,12],[1930,95,50],[1954,123,52],[1937,80,43],[1976,95,88],[1981,124,6],[1982,81,81],[1975,129,6],[1981,117,39],[1970,146,84],[1989,109,63],[1990,96,30],[1985,82,41],[1986,107,49],[1984,141,6],[1992,85,25],[1979,94,11],[1983,107,33],[1971,121,7],[1945,82,27],[1952,112,86],[1968,133,36],[1990,90,84],[1937,71,45],[1956,106,84],[1974,121,64],[1971,138,83],[1991,117,45],
[1943,82,85],[1991,97,50],[1946,110,31],[1960,152,65],[1951,166,40],[1951,83,59],[1962,96,6],[1955,200,9],[1957,110,60],[1961,110,60],[1980,161,74],[1952,134,36],[1987,90,26],[1926,139,49],[1946,106,55],[1920,137,29],[1954,90,27],[1928,130,49],[1933,120,4],[1991,95,48],[1990,129,78],[1989,30,35],[1989,55,6],[1991,130,34],[1944,139,45],[1982,93,23],[1974,109,63],[1987,120,37],[1945,94,76],[1969,161,46],[1964,105,16],[1976,92,25],[1984,83,34],[1987,85,37],[1944,100,35],[1988,75,32],[1978,90,2],[1955,87,28],[1957,97,12],[1971,88,44],[1984,92,17],[1991,89,54],[1970,89,69],[1940,90,27],[1967,85,28],[1980,99,62],[1970,111,72],[1990,135,8],[1966,123,75],[1973,91,6],[1942,101,61],[1991,87,52],[1982,92,68],[1987,90,51],[1989,118,11],[1973,93,47],[1987,90,42],[1971,115,7],[1970,117,50],[1988,90,3],[1981,94,9],[1935,60,71],[1988,92,57],[1990,97,29],[1989,94,25],[1987,116,21],[1976,139,45],[1987,73,59],[1969,144,50],[1988,92,13],[1988,100,26],[1975,112,16],[1972,116,88],[1970,112,75],[1973,122,28],[1989,86,31],[1979,108,12],[1983,86,24],[1986,140,71],[1951,102,23],[1985,92,49],[1980,123,71],[1972,92,29],[1983,102,52],[1984,96,20],[1991,60,82],[1987,98,87],[1990,96,63],[1990,103,22],[1972,128,58],[1985,95,22],[1970,170,8],[1969,123,33],[1985,131,61],[1976,112,39],[1987,86,69],[1991,60,6],[1938,55,83],[1938,55,33],[1982,92,75],[1971,109,60],[1979,112,22],[1948,88,77],[1976,132,36],[1990,126,8],[1993,90,48],[1991,193,56],[1924,123,63],[1986,120,8],[1963,89,79],[1971,100,65],[1971,88,79],[1985,104,19],[1938,96,21],[1990,59,48],[1982,150,80],[1947,61,80],[1947,56,43],[1937,60,53],[1992,54,43],[1947,58,21],[1947,53,66],[1949,66,62],[1992,53,78],[1937,60,52],[1937,60,17],[1949,59,40],[1991,102,47],[1957,73,5],[1953,79,7],[1982,136,5],[1960,164,29],[1986,91,24],[1985,109,54],[1990,128,8],[1952,93,23],[1949,119,7],[1976,176,41],[1990,98,17],[1982,111,51],[1956,97,56],[1955,57,51],[1962,182,35],[1986,60,4],[1986,60,24],[1986,59,79],[1986,58,63],[1982,101,86],[1981,127,2],[1993,108,6],
[1992,60,22],[1990,60,6],[1992,52,40],[1977,255,82],[1989,90,5],[1972,15,21],[1992,92,26],[1992,163,68],[1959,60,54],[1992,118,55],[1987,95,25],[1992,165,45],[1993,88,26],[1986,119,54],[1993,102,25],[1989,116,65],[1959,60,26],[1973,105,54],[1992,72,70],[1982,208,84],[1975,85,60],[1953,120,0],[1991,128,18],[1990,45,3],[1992,112,1],[1992,105,60],[1992,83,65],[1992,166,6],[1992,133,68],[1986,49,67],[1990,92,3],[1990,50,49],[1986,52,65],[1989,90,52],[1991,98,81],[1986,60,3],[1974,60,34],[1992,101,76],[1983,91,24],[1990,98,54],[1991,99,6],[1953,91,56],[1992,121,86],[1992,102,26],[1992,136,66],[1989,61,12],[1987,60,14],[1991,60,44],[1988,96,26],[1991,80,41],[1991,91,2],[1956,27,45],[1953,75,80],[1975,95,72],[1991,101,23],[1991,101,75],[1960,91,5],[1991,121,77],[1991,84,7],[1988,65,79],[1958,92,30],[1989,90,70],[1979,90,38],[1942,253,31],[1992,87,68],[1967,105,47],[1980,93,13],[1991,108,19],[1991,101,76],[1991,92,44],[1957,60,1],[1992,56,44],[1939,55,73],[1934,54,23],[1932,210,68],[1965,165,66],[1939,112,3],[1939,110,24],[1938,110,28],[1991,56,78],[1992,59,8],[1991,52,35],[1991,56,3],[1934,54,48],[1992,112,5],[1991,54,28],[1991,53,75],[1993,58,77],[1992,134,16],[1991,52,30],[1932,226,19],[1989,103,43],[1988,78,19],[1977,75,18],[1991,65,4]]

First we convert it to a NumPy array for convenience. 

In [None]:
import numpy as np

npd = np.array(dat)
print(npd.shape)

Now we make a scatter plot with the year along the x axis and the duration along the y axis. The color of each point represents popularity.

In [None]:
import matplotlib.pyplot as plt

year = npd[:, 0]
durat = npd[:, 1]
pop = npd[:, 2]

fig, ax = plt.subplots()

im = ax.scatter(year, durat, c=pop, cmap='cool')
ax.set_xlabel("year")
ax.set_ylabel("duration")

cb = fig.colorbar(im)
cb.set_label("popularity");

Inspecting this plot, we can draw the following conclusions:

- The most typical duration is about 100 minutes, and the popularity of such movies can be either low or high.
- Very long movies (near 400 minutes) begin to appear in recent years, and they are not popular.
- Moderately long movies (near 300 minutes) can be very popular.
- Very short movies (near 0) are rather popular.

### Histograms

Assume we analyze the ages of first-year students at some university.
We want to know how many different ages are present, which ages are
most typical, how many of the youngest students there are, and so on.

The file with the ages is stored in the course repository. First we download it, and then we analyze it.

In [None]:
# This module allows us to work with web pages
import requests

# URL of the course repository
base_url = "https://raw.githubusercontent.com/kupav/data-sc-intro/main/data/"

# We need a file with this name
file_name = "student_ages.txt"

# Here we download the file
web_data = requests.get(base_url + file_name)
assert web_data.status_code == 200

# The result is an object whose attribute `text` holds the file contents as a string
print(web_data.text)

Now we have a long string. Observe the line breaks, i.e., the symbols `'\n'` inside the string.
They are not shown explicitly, but they make the string print as many separate lines.

We need to transform it into a list of integers.

First we need to remove the line breaks. Strings in Python have a method that replaces one substring with another:
```python
s.replace(<substring_to_find>, <new_substring_to_replace_with>)
```

Then we split the string at the separator `","`.

In [None]:
# Replace every "\n" with an empty string
s = web_data.text.replace("\n", "")
print(s)

In [None]:
# Split the string by commas to obtain a list of strings
str_dat = s.split(",")
print(str_dat)

In [None]:
# Now run over the list and convert each element to int
dat = [int(n) for n in str_dat]
print(dat)
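
The three steps above can be combined into a single list comprehension. The sketch below is one possible variant, not the course's reference code. It uses a hypothetical helper, `parse_ages`, which replaces line breaks with commas instead of empty strings. Removing `"\n"` works only if every line ends with a comma; otherwise the last number of one line and the first number of the next would be glued together. The helper also skips empty pieces, which appear around doubled or trailing commas.

```python
def parse_ages(text):
    """Turn comma-separated text spread over several lines into a list of ints."""
    # Replace line breaks with commas (not with empty strings), so that numbers
    # at the end of one line and the start of the next never get glued together
    pieces = text.replace("\n", ",").split(",")
    # Skip empty pieces produced by doubled or trailing commas
    return [int(p) for p in pieces if p.strip()]

print(parse_ages("18,17,19,\n18,30,17,\n19,18,"))  # [18, 17, 19, 18, 30, 17, 19, 18]
print(parse_ages("18,17\n19,20"))                  # [18, 17, 19, 20]
```

With the downloaded data, this would be called as `parse_ages(web_data.text)`.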

Now we are ready to analyze the data. We plot a histogram.

In [None]:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.hist(dat);

A histogram shows how many values of a dataset fall into different intervals. Matplotlib histograms, however, are designed for real-valued data. The whole range of values, from the smallest to the largest, is divided into a number of bins (10 by default), and the number of values falling within each bin is counted. For integer data, it is more natural to have one bin per value.

Thus we need to tune the number of bins. First we compute how many bins we need.

In [None]:
a_min = min(dat)
a_max = max(dat)
print(f"a_min={a_min}, a_max={a_max}")
n_bins = a_max - a_min + 1
print(f"n_bins={n_bins}")

Now replot the histogram

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

ax.hist(dat, bins=n_bins)
ax.set_xlabel("Age")
ax.set_ylabel("Number of students");

We see that most students are from 17 to 19 years old, but there is also a very small number of much older ones.

Values of very different magnitudes are hard to compare on a linear scale. In situations like this, we plot the number of students on a logarithmic vertical scale. Matplotlib has a special method for it.

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

ax.hist(dat, bins=n_bins)
ax.set_xlabel("Age")
ax.set_ylabel("Number of students")
ax.set_yscale('log');  # Here we set logarithmic scale along a vertical axis

The logarithmic scale reveals a second group of students: those who are about 30 years old.
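
Besides `ax.set_yscale('log')`, `ax.hist` also accepts a `log` argument that switches the vertical axis to a logarithmic scale. Here is a short sketch on made-up ages (not the course data):

```python
import matplotlib.pyplot as plt

# Made-up ages: a few very frequent values and one rare one
ages = [17] * 40 + [18] * 120 + [19] * 60 + [30] * 2

fig, ax = plt.subplots()
# log=True sets a logarithmic vertical axis, similar to calling ax.set_yscale('log')
counts, edges, bars = ax.hist(ages, bins=14, log=True)
ax.set_xlabel("Age")
ax.set_ylabel("Number of students")
print(ax.get_yscale())  # log
```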

The appearance of the histogram can be tuned.

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

ax.hist(dat, bins=n_bins, edgecolor='k', color='g', rwidth=0.75)
ax.set_xlabel("Age")
ax.set_ylabel("Number of students")
ax.set_yscale('log');  

Here `edgecolor` sets the color of the bar edges, `color` controls the fill color, and `rwidth` is the relative width of the bars as a fraction of the bin width.
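
With `bins=n_bins`, the bins span the range from `a_min` to `a_max`. Each bin is therefore slightly narrower than 1, and the bars are not centered on the integer ages. One possible alternative is to pass explicit bin edges placed halfway between integers. The sketch below tries this on made-up ages. It uses `np.histogram`, which computes the counts without drawing anything. To our knowledge `ax.hist` uses the same binning, so `ax.hist(dat, bins=edges)` should give the same counts with bars centered on the ages.

```python
import numpy as np

# Made-up integer ages (not the course dataset)
ages = [17, 18, 18, 19, 17, 18, 22, 30, 18, 19]
age_min, age_max = min(ages), max(ages)

# Edges at age_min-0.5, age_min+0.5, ..., age_max+0.5:
# every integer age lies exactly in the middle of its own bin
edges = np.arange(age_min, age_max + 2) - 0.5

counts, _ = np.histogram(ages, bins=edges)
for age, cnt in zip(range(age_min, age_max + 1), counts):
    if cnt > 0:
        print(age, cnt)
```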

### Multiple histograms 

Two or more histograms can be plotted together in one axes.

Assume that we have the ages of first-year students from two universities and we need to compare them.

First we load both files from the repository and process them in the same way as above.

In [None]:
# This module allows us to work with web pages
import requests

# URL of the course repository
base_url = "https://raw.githubusercontent.com/kupav/data-sc-intro/main/data/"

# We need two files with these names
file_name1 = "student_ages.txt"
file_name2 = "student_ages2.txt"

# Here we download the files
web_data1 = requests.get(base_url + file_name1)
assert web_data1.status_code == 200

web_data2 = requests.get(base_url + file_name2)
assert web_data2.status_code == 200

# Replace every "\n" with an empty string
s1 = web_data1.text.replace("\n", "")
s2 = web_data2.text.replace("\n", "")

# Split the strings by commas to obtain lists of strings
str_dat1 = s1.split(",")
str_dat2 = s2.split(",")

# Now run over the lists and convert their elements to int
dat1 = [int(n) for n in str_dat1]
dat2 = [int(n) for n in str_dat2]

print(dat1)
print(dat2)

In [None]:
# We merge two lists to find their common min and max
merged_dat = dat1 + dat2

a_min = min(merged_dat)
a_max = max(merged_dat)
print(f"a_min={a_min}, a_max={a_max}")
n_bins = a_max - a_min + 1
print(f"n_bins={n_bins}")

Now we plot the histograms. The two datasets are passed together, wrapped into a two-element list: `[dat1, dat2]`.

In [None]:
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Here we divide n_bins by 2 to merge neighboring bins and make the bars wider
ax.hist([dat1, dat2], bins=n_bins//2, edgecolor='k')

# Notice how the legend is added: the labels are listed in the same order as the datasets
ax.legend(["University 1", "University 2"])

ax.set_xlabel("Age")
ax.set_ylabel("Number of students")
ax.set_yscale('log');

Observe that when two histograms are plotted together, the bars become narrower, and each bin shows two bars side by side, one for each dataset.
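
As far as we know, when Matplotlib receives a list of datasets and an integer number of bins, it computes one common set of bin edges over all datasets together. Both bars in a bin therefore describe the same age interval. The NumPy sketch below mimics this on two made-up samples using `np.histogram_bin_edges`:

```python
import numpy as np

# Two small made-up samples (not the course data)
d1 = [17, 18, 18, 19, 20]
d2 = [18, 19, 25, 30, 30]

# One set of edges computed over both samples together (list concatenation)
edges = np.histogram_bin_edges(d1 + d2, bins=4)

# Each sample is counted on the same edges
c1, _ = np.histogram(d1, bins=edges)
c2, _ = np.histogram(d2, bins=edges)
print(edges)  # [17.   20.25 23.5  26.75 30.  ]
print(c1, c2)  # [5 0 0 0] [2 0 1 2]
```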

### Bar plots

When a dataset is small, it can be represented by bars. This is basically the same as a line plot, but instead of line segments connecting the points, we draw a rectangular bar for each point.

Below is a dataset with passing rates for different fields of study at a Russian university, recorded for 10 years, from 2011 to 2020. We are going to show it as a bar plot.

In [None]:
# Year, Instrumentation Engineering, Information Security, Computer Sciences
dat = [[2011, 170, 174, 188], [2012, 150, 191, 173], [2013, 172, 215, 194], [2014, 159, 195, 194], [2015, 159, 206, 194], [2016, 154, 200, 199], [2017, 163, 202, 204], [2018, 168, 202, 203], [2019, 148, 210, 209], [2020, 144, 205, 194]]

First we plot vertical bars

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

W = 0.25

npd = np.array(dat)
ax.bar(npd[:, 0],       npd[:, 1], width=W, label="Instrumentation Engineering")
ax.bar(npd[:, 0] + W,   npd[:, 2], width=W, label="Information Security")
ax.bar(npd[:, 0] + W*2, npd[:, 3], width=W, label="Computer Sciences")
ax.legend() 
ax.set_xlabel('Year')
ax.set_ylabel('Passing rate');
#ax.legend(loc='lower center');
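
In the vertical bar plot above, the integer year positions coincide with the first bar of each group, because the other bars are shifted by `W` and `2*W`. One way to center each group on its year is to compute symmetric offsets and set the ticks explicitly with `ax.set_xticks`. The sketch below does this on the first three rows of the same dataset. The names `dat_demo`, `offsets` and `labels` are our own.

```python
import numpy as np
import matplotlib.pyplot as plt

# First three rows of the passing-rate dataset above: year and three fields of study
dat_demo = [[2011, 170, 174, 188], [2012, 150, 191, 173], [2013, 172, 215, 194]]
npd_demo = np.array(dat_demo)
years = npd_demo[:, 0]
labels = ["Instrumentation Engineering", "Information Security", "Computer Sciences"]

W = 0.25
n_bars = len(labels)
# Offsets -W, 0, +W: the middle bar of every group sits exactly on its year
offsets = (np.arange(n_bars) - (n_bars - 1) / 2) * W

fig, ax = plt.subplots()
for k, label in enumerate(labels):
    ax.bar(years + offsets[k], npd_demo[:, k + 1], width=W, label=label)

ax.set_xticks(years)  # one tick per year, under the center of each group
ax.legend()
ax.set_xlabel('Year')
ax.set_ylabel('Passing rate');
```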

Bars can also be horizontal. Observe how x and y are swapped.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

H = 0.25

npd = np.array(dat)
ax.barh(npd[:, 0],       npd[:, 1], height=H, label="Instrumentation Engineering")
ax.barh(npd[:, 0] + H,   npd[:, 2], height=H, label="Information Security")
ax.barh(npd[:, 0] + H*2, npd[:, 3], height=H, label="Computer Sciences")
ax.set_ylabel('Year')
ax.set_xlabel('Passing rate');
ax.legend(loc='upper left');

### Exercises

1\. Create a plot with graphs of two functions: $y=e^{-x^2}$ and $y=3x^2 e^{-x^2}$. Add x and y axes labels and the legend. Turn on the grid. Customize curve colors and styles.

2\. The file `"apple_vs_cabot.csv"`, which you can find at `"https://raw.githubusercontent.com/kupav/data-sc-intro/main/data/"`, contains stock prices of "Apple Inc." and "Cabot Oil & Gas" recorded from 08 Feb 2013 till 07 Feb 2018. For simplicity, the file stores a day number within this range instead of the actual date. Using the example above, read this file and create a line plot showing how these two prices vary.

In [None]:
# Total energy supply (TES) by source, World 2015
# Unit of measurement: Thousand tons of oil equivalent (ktoe)
# ["Coal", "Natural gas", "Nuclear", "Hydro", "Wind, solar, etc", "Biofuels and waste", "Oil"]
dat = [3842742,2928795,670172,334851,203821,1271235,4328233]

3\. The dataset above describes the world total energy supply by source in 2015. Create a bar plot for it. Write text labels with the values on each bar. Add x and y axes labels.

4\. The file `"moscow_temp.csv"`, which you can find at `"https://raw.githubusercontent.com/kupav/data-sc-intro/main/data/"`, contains daily records of the average temperature in Moscow for many years. Read this file and create a histogram showing how often certain temperatures are registered. The data are given in Fahrenheit; convert them into Celsius before plotting. Add appropriate axes labels.

## Lesson 2

### Text labels

We now discuss how to write text on plots.

We again take the dataset of passing rates, but consider only one specialty over the last five years. First we draw a bar plot.

In [None]:
# Year, Instrumentation Engineering, Information Security, Computer Sciences
dat = [[2011, 170, 174, 188], [2012, 150, 191, 173], [2013, 172, 215, 194], [2014, 159, 195, 194], [2015, 159, 206, 194], [2016, 154, 200, 199], [2017, 163, 202, 204], [2018, 168, 202, 203], [2019, 148, 210, 209], [2020, 144, 205, 194]]

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

npd = np.array(dat)

N = 5  # take last five years
years = npd[N:, 0]  # take column of years 
instreng = npd[N:, 1] # take column of Instrumentation Engineering

ax.bar(years, instreng)
ax.set_xlabel('Year')
ax.set_ylabel('Passing rate');

To add text labels, we need to know what to write and where to put it. This information is contained in the dataset. Let us see how we can iterate over it and take everything we need.

In [None]:
# We can iterate directly over npd
for data_row in npd:
    print(data_row)

In [None]:
# We need only 5 years
N = 5
for data_row in npd[N:,]:
    print(data_row)

In [None]:
# On each iteration we have all we need. Just take it
N = 5
for data_row in npd[N:,]:
    x = data_row[0]
    y = data_row[1]
    s = str(data_row[1])
    print(x, y, s)  # the rate is printed twice: as an integer and as text (they look the same)

Now we know what we need. Let us add this code to the plotting code.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

npd = np.array(dat)

N = 5  # take last five years
years = npd[N:, 0]  # take column of years 
instreng = npd[N:, 1] # take column of Instrumentation Engineering

ax.bar(years, instreng)
ax.set_xlabel('Year')
ax.set_ylabel('Passing rate');

# Adding text labels
for data_row in npd[N:,]:
    x = data_row[0]
    y = data_row[1]
    s = str(data_row[1])
    
    ax.text(x, y, s) # add text label s, the text starts at (x,y)

This does not look good. It would be better to center the numbers on the bars and move them slightly above or below the top of each bar.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

npd = np.array(dat)

N = 5  # take last five years
years = npd[N:, 0]  # take column of years 
instreng = npd[N:, 1] # take column of Instrumentation Engineering

ax.bar(years, instreng)
ax.set_xlabel('Year')
ax.set_ylabel('Passing rate');

# Adding text labels
for data_row in npd[N:,]:
    x = data_row[0]
    y = data_row[1]
    s = str(data_row[1])
    
    ax.text(x, y+5, s, horizontalalignment='center')  # shift the label up by 5 and center it horizontally at x

Almost good. We just need more space at the top. Also, there is a shorter alias for this named parameter: `ha` does the same as `horizontalalignment`.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

npd = np.array(dat)

N = 5  # take last five years
years = npd[N:, 0]  # take column of years 
instreng = npd[N:, 1] # take column of Instrumentation Engineering

ax.bar(years, instreng)
ax.set_xlabel('Year')
ax.set_ylabel('Passing rate');

# Adding text labels
for data_row in npd[N:,]:
    x = data_row[0]
    y = data_row[1]
    s = str(data_row[1])
    
    ax.text(x, y+5, s, ha='center')  # shift the label up by 5 and center it horizontally at x
    
ax.set_ylim(0, 190);  # here we add more space above

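Finally, we can change the font properties. This is done via the `fontdict` parameter, a dictionary of text properties. The font is set by the `family` key. You can pass either a generic font family (such as `"serif"`, `"sans-serif"` or `"monospace"`) or a specific font name.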

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

npd = np.array(dat)

N = 5  # take last five years
years = npd[N:, 0]  # take column of years 
instreng = npd[N:, 1] # take column of Instrumentation Engineering

ax.bar(years, instreng)
ax.set_xlabel('Year')
ax.set_ylabel('Passing rate');

fontdict = {"family": "serif", "size": "large", "weight": "bold", "color": "green"}

# Adding text labels
for data_row in npd[N:,]:
    x = data_row[0]
    y = data_row[1]
    s = str(data_row[1])
    
    ax.text(x, y+5, s, ha='center', fontdict=fontdict)
    
ax.set_ylim(0, 190);
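
As an alternative to the manual loop with `ax.text`, Matplotlib 3.4 and newer provide `ax.bar_label`, which writes the value of every bar next to it. The sketch below uses the last five years of Instrumentation Engineering from the dataset above. Extra keyword arguments such as `color` and `fontweight` are passed on to the text labels, and `padding` is the gap between the bar top and the label, in points.

```python
import numpy as np
import matplotlib.pyplot as plt

# Instrumentation Engineering, 2016-2020 (values from the dataset above)
years = np.array([2016, 2017, 2018, 2019, 2020])
instreng = np.array([154, 163, 168, 148, 144])

fig, ax = plt.subplots()
bars = ax.bar(years, instreng)

# One label per bar showing its height; returns the created text objects
texts = ax.bar_label(bars, padding=3, color='green', fontweight='bold')

ax.set_xlabel('Year')
ax.set_ylabel('Passing rate')
ax.set_ylim(0, 190);
```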

### Annotating: text with an arrow

Sometimes text alone is not enough, and an arrow should also point to a particular position on the plot.

To show how this can be done, we again use the AMD and Intel stock prices.

In [None]:
# AMD and INTEL stock prices from 2013-02-08 to 2014-09-10 (YYYY-MM-DD)
dat = [[0,2.59,21.0],[3,2.67,21.03],[4,2.77,21.19],[5,2.75,21.25],[6,2.75,21.23],[7,2.71,21.115],[11,2.82,21.085],[12,2.7,20.73],[13,2.6,20.25],[14,2.61,20.42],[17,2.53,20.23],[18,2.46,20.58],[19,2.53,20.93],[20,2.49,20.88],[21,2.42,21.03],[24,2.4,21.27],[25,2.43,21.51],[26,2.43,21.75],[27,2.55,21.89],[28,2.56,21.58],[31,2.59,21.69],[32,2.61,21.64],[33,2.6,21.655],[34,2.63,21.65],[35,2.6,21.375],[38,2.65,21.26],[39,2.67,21.14],[40,2.75,21.18],[41,2.64,21.04],[42,2.54,21.33],[45,2.51,21.15],[46,2.54,21.765],[47,2.55,21.83],[48,2.55,21.835],[52,2.44,21.43],[53,2.39,21.455],[54,2.32,21.05],[55,2.33,21.135],[56,2.29,20.94],[59,2.59,21.09],[60,2.63,21.75],[61,2.61,22.26],[62,2.52,21.825],[63,2.48,21.675],[66,2.4,21.38],[67,2.44,21.915],[68,2.4,21.93],[69,2.51,22.24],[70,2.47,22.44],[73,2.46,22.88],[74,2.53,23.375],[75,2.61,23.66],[76,2.68,23.38],[77,2.64,23.4],[80,2.68,23.76],[81,2.82,23.95],[82,3.22,23.99],[83,3.41,24.11],[84,3.6,23.96],[87,3.61,23.91],[88,3.54,24.15],[89,3.83,24.25],[90,3.86,24.36],[91,3.95,24.5],[94,4.17,24.08],[95,4.26,23.84],[96,4.38,24.2],[97,3.83,23.94],[98,4.07,24.04],[101,4.1,24.08],[102,4.02,24.15],[103,3.96,24.07],[104,4.01,24.05],[105,4.05,23.923],[109,4.05,24.08],[110,3.98,24.27],[111,4.04,24.21],[112,4.0,24.28],[115,3.96,25.24],[116,4.01,25.36],[117,3.91,24.7],[118,3.94,24.65],[119,3.91,24.59],[122,4.06,25.01],[123,3.96,24.71],[124,3.9,24.46],[125,3.95,24.99],[126,3.94,24.92],[129,4.05,25.1],[130,4.09,25.465],[131,4.07,25.0],[132,3.87,24.185],[133,4.0,24.195],[136,4.05,23.58],[137,4.15,23.88],[138,4.14,24.005],[139,4.08,24.05],[140,4.08,24.23],[143,4.1,23.886],[144,3.97,23.72],[145,4.06,23.762],[147,4.07,24.06],[150,4.0,23.185],[151,4.05,23.135],[152,3.98,23.25],[153,4.45,23.99],[154,4.32,23.9],[157,4.4,23.94],[158,4.43,24.25],[159,4.38,24.15],[160,4.64,23.244],[161,4.03,23.04],[164,3.9,22.77],[165,3.66,22.75],[166,3.63,22.93],[167,3.7,23.06],[168,3.82,23.26],[171,3.75,23.24],[172,3.82,23.38],[173,3.77,23.335],[174,3.81,23.2],[175,3.8,23.22
],[178,3.82,22.924],[179,3.72,22.8],[180,3.69,22.7],[181,3.71,22.45],[182,3.65,22.51],[185,3.65,22.64],[186,3.69,22.52],[187,3.82,22.57],[188,3.69,22.03],[189,3.66,21.915],[192,3.6,22.28],[193,3.63,22.523],[194,3.61,22.17],[195,3.63,22.26],[196,3.65,22.44],[199,3.58,22.275],[200,3.39,22.191],[201,3.42,22.285],[202,3.38,22.06],[203,3.27,21.98],[207,3.27,22.067],[208,3.31,22.635],[209,3.41,22.6],[210,3.57,22.67],[213,3.69,22.91],[214,3.87,22.985],[215,3.82,22.81],[216,3.75,22.63],[217,3.83,23.44],[220,3.82,23.39],[221,3.85,23.74],[222,3.93,23.9],[223,3.95,23.915],[224,3.83,23.769],[227,3.79,23.62],[228,3.8,23.705],[229,3.91,23.7],[230,3.89,23.41],[231,3.86,22.98],[234,3.81,22.921],[235,3.86,22.83],[236,3.9,22.885],[237,3.9,22.6],[238,3.91,22.81],[241,3.86,22.83],[242,3.72,22.48],[243,3.65,22.59],[244,3.79,23.1],[245,3.83,23.255],[248,3.97,23.45],[249,4.02,23.39],[250,4.09,23.695],[251,4.09,23.92],[252,3.53,23.875],[255,3.37,24.135],[256,3.18,24.071],[257,3.14,23.735],[258,3.23,23.78],[259,3.34,24.235],[262,3.32,24.36],[263,3.33,24.523],[264,3.3,24.495],[265,3.34,24.47],[266,3.31,24.325],[269,3.32,24.255],[270,3.33,24.03],[271,3.32,24.245],[272,3.28,24.059],[273,3.27,24.09],[276,3.34,24.169],[277,3.44,24.43],[278,3.54,24.6],[279,3.52,24.385],[280,3.5,24.52],[283,3.47,24.6],[284,3.42,24.7],[285,3.42,24.56],[286,3.37,25.23],[287,3.34,23.87],[290,3.39,23.75],[291,3.45,23.65],[292,3.56,23.9],[294,3.64,23.84],[297,3.66,23.7],[298,3.62,23.55],[299,3.57,23.74],[300,3.64,24.26],[301,3.66,24.82],[304,3.63,24.93],[305,3.72,24.82],[306,3.68,24.42],[307,3.69,24.47],[308,3.69,24.29],[311,3.59,24.45],[312,3.65,24.655],[313,3.65,25.15],[314,3.65,25.14],[315,3.69,25.055],[318,3.75,25.32],[319,3.77,25.43],[321,3.8,25.7],[322,3.78,25.6],[325,3.85,25.85],[326,3.87,25.955],[328,3.95,25.79],[329,4.0,25.78],[332,4.13,25.46],[333,4.18,25.585],[334,4.18,25.43],[335,4.09,25.31],[336,4.17,25.53],[339,4.13,25.5],[340,4.3,26.51],[341,4.47,26.67],[342,4.38,26.54],[343,4.18,25.85],[347,4.17,25.59],
[348,3.67,25.31],[349,3.62,25.13],[350,3.47,24.81],[353,3.41,24.72],[354,3.54,24.9],[355,3.48,24.68],[356,3.48,24.74],[357,3.43,24.54],[360,3.33,23.95],[361,3.37,23.82],[362,3.31,23.52],[363,3.41,23.99],[364,3.47,24.205],[367,3.63,24.29],[368,3.7,24.47],[369,3.69,24.55],[370,3.7,24.7],[371,3.69,24.755],[375,3.7,24.76],[376,3.72,24.5],[377,3.69,24.74],[378,3.69,24.42],[381,3.71,24.63],[382,3.69,24.62],[383,3.7,24.8],[384,3.71,24.76],[385,3.71,24.76],[388,3.67,24.5],[389,3.7,24.61],[390,3.71,24.5],[391,3.73,24.63],[392,3.95,24.64],[395,3.81,24.84],[396,3.85,24.73],[397,3.91,24.76],[398,3.86,24.57],[399,3.85,24.5],[402,3.8,24.7],[403,3.89,24.82],[404,3.98,25.02],[405,4.05,25.425],[406,4.04,25.17],[409,4.05,25.12],[410,4.05,25.46],[411,4.05,25.3772],[412,3.91,25.31],[413,3.88,25.62],[416,4.01,25.814],[417,4.07,25.99],[418,4.06,25.89],[419,4.0,26.41],[420,4.01,26.16],[423,3.88,26.485],[424,3.99,26.91],[425,3.98,26.98],[426,3.85,26.425],[427,3.65,26.1799],[430,3.71,26.56],[431,3.79,26.77],[432,3.76,26.93],[433,3.69,27.04],[437,4.12,26.95],[438,4.3,26.84],[439,4.25,26.75],[440,4.28,26.75],[441,4.05,26.26],[444,3.96,26.33],[445,4.03,26.48],[446,4.09,26.69],[447,4.2,26.45],[448,4.12,26.41],[451,4.09,26.17],[452,4.01,26.2],[453,3.97,26.37],[454,3.93,26.34],[455,3.87,26.3],[458,3.97,26.37],[459,4.03,26.45],[460,3.98,26.33],[461,3.96,26.01],[462,4.02,25.82],[465,4.09,26.04],[466,4.05,26.04],[467,4.1,26.2],[468,4.01,26.15],[469,4.02,26.289],[473,4.04,26.71],[474,4.0,26.88],[475,4.03,26.96],[476,4.0,27.32],[479,3.97,27.26],[480,3.94,27.66],[481,4.04,27.6],[482,4.08,27.66],[483,4.06,28.17],[486,4.07,27.91],[487,4.2,28.24],[488,4.29,27.93],[489,4.29,27.96],[490,4.28,29.87],[493,4.44,30.01],[494,4.47,29.95],[495,4.4,29.93],[496,4.36,30.09],[497,4.1,30.2],[500,4.01,30.23],[501,3.94,30.5],[502,3.96,30.88],[503,4.03,30.78],[504,4.11,30.93],[507,4.19,30.9],[508,4.22,30.98],[509,4.34,30.98],[510,4.24,31.14],[514,4.26,31.03],[515,4.22,30.79],[516,4.23,30.89],[517,4.29,31.26],
[518,4.37,31.25],[521,4.64,31.49],[522,4.53,31.71],[523,4.66,34.65],[524,4.57,33.7],[525,3.83,33.7],[528,3.78,34.06],[529,3.8,34.7901],[530,3.76,34.5],[531,3.73,34.25],[532,3.76,34.25],[535,3.73,34.23],[536,3.79,34.19],[537,3.82,34.35],[538,3.91,33.89],[539,3.97,33.745],[542,4.0,34.05],[543,4.08,32.82],[544,4.12,32.845],[545,4.1,32.68],[546,4.12,32.6],[549,4.14,33.02],[550,4.08,33.13],[551,4.11,34.1],[552,4.08,33.94],[553,4.13,34.17],[556,4.21,34.41],[557,4.27,34.34],[558,4.28,34.5],[559,4.24,35.15],[560,4.25,34.94],[563,4.18,34.81],[564,4.19,34.8],[565,4.16,34.79],[566,4.14,34.65],[567,4.17,34.92],[571,4.16,34.57],[572,4.19,34.57],[573,4.15,34.905],[574,4.15,35.0],[577,4.13,35.33],[578,4.08,34.91],[579,4.08,35.02]]

In [None]:
import matplotlib.pyplot as plt
import numpy as np

npd = np.array(dat)
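scl1 = 1   # AMD prices are plotted as they are
scl2 = 10  # INTEL prices are divided by 10 so that both curves fit the same vertical range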

fig, ax = plt.subplots()
ax.plot(npd[:, 0], npd[:, 1] / scl1, label='AMD')
ax.plot(npd[:, 0], npd[:, 2] / scl2, label='INTEL')
ax.set_xlabel("Day from 08-02-2013 to 10-09-2014")
ax.legend()

arrowprops = {'facecolor': 'red', 'shrink': 0.01}

ax.annotate('Something happened here', 
            xy=(90, 2.8),  xycoords='data',
            xytext=(0.3, 0.35), textcoords='axes fraction',
            arrowprops=arrowprops,
            horizontalalignment='left', verticalalignment='bottom');

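The arrow is drawn from the point `xytext`, where the text is placed, to the point `xy`. The coordinate system for each point is chosen separately. Here `xycoords='data'` means that `xy` is given in data coordinates. `textcoords='axes fraction'` means that `xytext` is given as a fraction of the axes size, with (0, 0) at the lower left corner and (1, 1) at the upper right corner. Many other coordinate systems are available.

The alignment of the text with respect to the point `xytext` is set by `horizontalalignment` and `verticalalignment`. The appearance of the arrow is defined by the `arrowprops` dictionary.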
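
To see what `'axes fraction'` means in practice, we can convert a point from axes-fraction coordinates to data coordinates with Matplotlib's transform objects `ax.transAxes` and `ax.transData`. The axis limits below are arbitrary values chosen only for this illustration:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Arbitrary axis limits, chosen only for this illustration
ax.set_xlim(0, 600)
ax.set_ylim(2, 4)

frac = (0.3, 0.35)  # the same axes-fraction point as xytext in the annotate example

# axes fraction -> display (pixel) coordinates -> data coordinates
display_xy = ax.transAxes.transform(frac)
data_xy = ax.transData.inverted().transform(display_xy)
print(data_xy)  # approximately x = 180, y = 2.7
```

With linear axes, the fraction simply interpolates between the limits: $0 + 0.3\cdot 600 = 180$ and $2 + 0.35\cdot(4-2) = 2.7$.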

### 3D Surface

XY plots on a plane, i.e., 2D plots, are often not enough to represent multidimensional data. Sometimes 3D plots can help.

A 3D surface plot represents a function of two variables. For example, consider the function $z=\sin\sqrt{x^2+y^2}$. We vary $x$ and $y$ within some range and see how $z$ changes.

First we define a function to plot. (We could also compute it in place, without a user-defined function.)

In [None]:
import numpy as np

def f(x, y):
    # Test function to demonstrate surface plotting
    return np.sin(np.sqrt(x ** 2 + y ** 2))

Now we prepare data.

To plot a 2D graph, we need a 1D array of $x$ values, and we compute a $y$ value for each $x$.

For a 3D plot, we need 2D arrays of $x$ and $y$. Each pair $(x, y)$ is a point on the plane, and for each such pair we compute $z$.

These arrays are prepared with the NumPy function `meshgrid` as follows:

In [None]:
# This is just an example. We take only a few points (5 along x and 11 along y) to make it clearer.
import numpy as np

x = np.linspace(0, 1, 5)
y = np.linspace(2, 3, 11)
X, Y = np.meshgrid(x, y)
print(f"x={x}, y={y}\n")
print(f"X=\n{X}\n\nY=\n{Y}")
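
Here is a quick check of what `meshgrid` returns. With the default `indexing='xy'`, both arrays have one row per value of `y` and one column per value of `x`. This is why evaluating a function on `X` and `Y` gives a 2D array `Z` of the same shape, with one value per grid point:

```python
import numpy as np

x = np.linspace(0, 1, 5)    # 5 points along x
y = np.linspace(2, 3, 11)   # 11 points along y
X, Y = np.meshgrid(x, y)

# Default indexing='xy': rows follow y, columns follow x
print(X.shape, Y.shape)   # (11, 5) (11, 5)

# Every row of X is a copy of x, every column of Y is a copy of y
print(np.array_equal(X[0], x), np.array_equal(Y[:, 0], y))   # True True

# A function evaluated on the grid gives one value per (x, y) point
Z = np.sin(np.sqrt(X ** 2 + Y ** 2))
print(Z.shape)   # (11, 5)
```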

Now we create a $101\times 101$ grid of points and plot the surface.

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# Make data.
X = np.linspace(-5, 5, 101)
Y = np.linspace(-5, 5, 101)
X, Y = np.meshgrid(X, Y)
Z = f(X, Y)

# {"projection": "3d"} activates using 3D plotting
fig, ax = plt.subplots(subplot_kw={"projection": "3d"})

# Plot the surface.
surf = ax.plot_surface(X, Y, Z, cmap='tab20c')  # use fancy color map

# Add a color bar which maps values to colors.
fig.colorbar(surf, shrink=0.5, aspect=10, pad=0.07);

### 3D Scatter plots

Scatter plots in 3D work similarly to the 2D version, but now we can plot one more coordinate.

The dataset below contains four parameters of 200 different cars:
- MPG, Miles Per Gallon, i.e., fuel efficiency: the higher the MPG, the less fuel the car consumes
- Horsepower of the engine
- Weight of the car
- Acceleration, i.e., the time in seconds needed to reach the speed of 60 miles per hour (about 100 km/h)

We are going to plot Horsepower, Weight, and Acceleration as x, y, and z coordinates, and also assign point colors according to MPG.

In [None]:
# This module lets us send HTTP requests, e.g., to download files
import requests

# This is the URL of the repository
base_url = "https://raw.githubusercontent.com/kupav/data-sc-intro/main/data/"

# We need this file
file_name = "cars.csv"

# Here we download the file
web_data = requests.get(base_url + file_name)
assert web_data.status_code == 200

print(web_data.text[:1000])

In [None]:
# Note the method splitlines: it handles all kinds of newline characters
raw_data = web_data.text.splitlines()
print(raw_data[:5])

In [None]:
# Drop the first two lines: the first one is a header, the second contains data types
print("hdr=", raw_data[0])
str_data = raw_data[2:]
print(str_data[:5])

In [None]:
# Split each line by the separator, which is ";"
tot_data = [line.split(";") for line in str_data]
print(tot_data[:5])

In [None]:
# Now take only the required columns and convert them to floats
# 'MPG' -> 1, 'Horsepower' -> 4, 'Weight' -> 5, 'Acceleration' -> 6
dat = [[float(d[1]), float(d[4]), float(d[5]), float(d[6])] for d in tot_data]
dat[:10]
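The manual steps above (split into lines, drop the first two rows, split by `;`, convert selected columns) can also be done with Python's built-in `csv` module, which handles the separator and quoted fields for us. A minimal sketch: the helper name `read_columns` and the tiny toy text are hypothetical and only mimic the layout of `cars.csv` (a header row, a data-type row, then `;`-separated records).

```python
import csv
import io

def read_columns(text, columns, skip=2, delimiter=";"):
    """Hypothetical helper: parse delimited text and return selected columns as floats.

    skip -- number of leading rows to drop (header and data-type row in cars.csv)
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = list(reader)[skip:]
    # Skip empty rows, convert only the requested columns
    return [[float(row[i]) for i in columns] for row in rows if row]

# Toy text that mimics the file layout only; the numbers are made up
toy = "Car;MPG;HP\nSTRING;DOUBLE;DOUBLE\nA;18.0;130.0\nB;15.0;165.0\n"
print(read_columns(toy, [1, 2]))  # [[18.0, 130.0], [15.0, 165.0]]
```

With the downloaded file, `read_columns(web_data.text, [1, 4, 5, 6])` should give the same list as `dat` above.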

Once the dataset is ready, we start plotting.

In [None]:
import matplotlib.pyplot as plt

# Without NumPy, we use list comprehensions to extract one column of data
x = [d[1] for d in dat]  # Horsepower
y = [d[2] for d in dat]  # Weight
z = [d[3] for d in dat]  # Acceleration
c = [d[0] for d in dat]  # MPG

# note how we set the size of the figure
fig, ax = plt.subplots(subplot_kw={"projection": "3d"}, figsize=(8,8))
im = ax.scatter(x, y, z, c=c, cmap='copper')
cb = fig.colorbar(im, shrink=0.5, aspect=15, pad=0.05, orientation="horizontal");
ax.set_xlabel('Horsepower')
ax.set_ylabel('Weight')
ax.set_zlabel('Acceleration')
cb.set_label('MPG')  # note how we set a label for the color bar

One can see that heavy cars have more powerful engines and smaller Acceleration values, i.e., they need less time to reach the speed. They also have low MPG, i.e., they consume a lot of fuel.
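The list comprehensions above extract one column at a time. If `dat` is first converted to a NumPy array, every column becomes a single slice, and transposing lets us unpack all columns at once. A minimal sketch on a short made-up list with the same column order as `dat` (MPG, Horsepower, Weight, Acceleration):

```python
import numpy as np

# Made-up rows in the same column order as dat: MPG, Horsepower, Weight, Acceleration
dat_demo = [[20.0, 100.0, 3000.0, 15.0],
            [30.0, 70.0, 2000.0, 18.0],
            [15.0, 150.0, 4000.0, 11.0]]

arr = np.array(dat_demo)  # shape: (number of cars, 4)
c, x, y, z = arr.T        # rows of the transposed array are the original columns

# Same values as the list-comprehension approach
print(arr.shape, x, list(x) == [d[1] for d in dat_demo])
```

With the real data this would read `c, x, y, z = np.array(dat).T`.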

### 2D Histograms

Recall that a histogram shows how often each value, or range of values, occurs in a 1D dataset, i.e., in a sequence of numbers.

In a similar manner we can analyze a 2D dataset, i.e., a sequence of pairs of numbers.

To show how it works, we plot a 2D histogram for the previous dataset, taking car weights and fuel consumption (MPG).

In [None]:
# This module lets us send HTTP requests, e.g., to download files
import requests

# This is the URL of the repository
base_url = "https://raw.githubusercontent.com/kupav/data-sc-intro/main/data/"

# We need this file
file_name = "cars.csv"

# Here we download the file
web_data = requests.get(base_url + file_name)
assert web_data.status_code == 200

# Note the method splitlines: it handles all kinds of newline characters
raw_data = web_data.text.splitlines()

# Drop the first two lines: the first one is a header, the second contains data types
print("hdr=", raw_data[0])
str_data = raw_data[2:]

# Split each line by the separator, which is ";"
tot_data = [line.split(";") for line in str_data]

In [None]:
# Now take only the required columns and convert them to floats
# 'Weight' -> 5, 'MPG' -> 1
dat = [[float(d[5]), float(d[1])] for d in tot_data]
dat[:10]

In [None]:
import matplotlib.pyplot as plt

x = [d[0] for d in dat]
y = [d[1] for d in dat]

fig, ax = plt.subplots()

# hist2d returns the bin counts, the x and y bin edges,
# and the plotted image (a QuadMesh), which is required to draw the color bar
_, _, _, im = ax.hist2d(x, y, bins=50, cmap='hot_r')
fig.colorbar(im);
ax.set_xlabel("Weight")
ax.set_ylabel("MPG");

We observe that heavy cars consume more fuel.
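`ax.hist2d` counts how many pairs fall into each rectangular bin. The same counts can be computed without plotting via `np.histogram2d`, which is handy for checking the picture numerically. A sketch on synthetic random data (the data and the seed are assumptions for illustration only):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2 * x + rng.normal(size=1000)  # y is correlated with x

# counts[i, j] = number of points with x in x-bin i and y in y-bin j
counts, xedges, yedges = np.histogram2d(x, y, bins=20)

print(counts.shape)               # (20, 20)
print(counts.sum())               # 1000.0: every point lands in exactly one bin
print(len(xedges), len(yedges))   # 21 21: bins + 1 edges per axis
```

Since `counts` is indexed as `[x-bin, y-bin]`, it has to be transposed to be drawn with x running horizontally; `hist2d` takes care of this internally.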

### Showing images

Matplotlib can open and show image files. Here is an example.

First we need some preparation. We download an image from the repository and save it in a local folder.

In [None]:
# This module lets us send HTTP requests, e.g., to download files
import requests

# This is the URL of the repository
base_url = "https://raw.githubusercontent.com/kupav/data-sc-intro/main/data/"

# We need this file
file_name = "turing.jpeg"

# Here we download the file
web_data = requests.get(base_url + file_name)
assert web_data.status_code == 200

# And save it locally (fh is a file handle; the name f is already taken by the surface function above)
with open(file_name, "wb") as fh:
    fh.write(web_data.content)

Now we read this image from the local file:

In [None]:
import matplotlib.pyplot as plt

img = plt.imread("turing.jpeg")

The image is a NumPy array. Its shape gives the height, the width, and the number of color channels of the image.

In [None]:
print(type(img))
print(img.shape)

Now finally show it:

In [None]:
fig, ax = plt.subplots()
ax.imshow(img)
ax.axis('off');
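Because `img` is an ordinary NumPy array, array operations apply to it directly. For example, slicing crops the image, and averaging over the last axis (the color channels) gives a simple grayscale version (a plain average, not a perceptually weighted luminance), which `imshow` draws with a gray colormap. The sketch below uses a small synthetic array in place of `turing.jpeg` so that it runs on its own; the sizes are arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic RGB image: 60 rows (height), 80 columns (width), 3 color channels
img_demo = np.zeros((60, 80, 3), dtype=np.uint8)
img_demo[:, :, 0] = np.linspace(0, 255, 80, dtype=np.uint8)  # red grows left to right

crop = img_demo[10:50, 20:60]   # rows 10..49, columns 20..59, all channels
gray = img_demo.mean(axis=2)    # average of R, G, B -> 2D float array

fig, axs = plt.subplots(ncols=2)
axs[0].imshow(crop)
axs[1].imshow(gray, cmap="gray")
for ax in axs:
    ax.axis("off")
print(crop.shape, gray.shape)
```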

### Customizing ticks: a simple way

Matplotlib is very good at locating and formatting tick marks and their labels along the axes. Nevertheless, sometimes we need more. For example, putting dates along an axis requires some extra effort.

The simplest way to customize the ticks is to use the functions `ax.set_xticks` and `ax.set_xticklabels`.
The first specifies the locations of the tick marks, and the second sets the string labels shown at them.
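A minimal self-contained illustration of the two calls (the data and the label texts are arbitrary): ticks are placed at x = 0, 1, 2, and each gets its own string label.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [3, 1, 2])

ax.set_xticks([0, 1, 2])                    # where the tick marks go
ax.set_xticklabels(["low", "mid", "high"])  # what is written at them

fig.canvas.draw()  # make sure the tick labels are populated
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)  # ['low', 'mid', 'high']
```

In Matplotlib 3.5 and later the two calls can also be combined as `ax.set_xticks([0, 1, 2], labels=["low", "mid", "high"])`.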

To demonstrate how it works we will plot stock prices of AMD and Intel, but now with dates along x axis.

First we obtain data from a repository.

In [None]:
# This module lets us send HTTP requests, e.g., to download files
import requests

# This is the URL of the repository
base_url = "https://raw.githubusercontent.com/kupav/data-sc-intro/main/data/"

# We need this file
file_name = "amd_vs_intel.csv"

# Here we download the file
web_data = requests.get(base_url + file_name)
assert web_data.status_code == 200

# Take a look at the data
print(web_data.text[:200])

In [None]:
# Note the method splitlines: it handles all kinds of newline characters
raw_data = web_data.text.splitlines()
print(raw_data[:10])

In [None]:
# Drop the header
print("hdr=", raw_data[0])
str_data = raw_data[1:]

# Split each line by the separator, which is ","
tot_data = [line.split(",") for line in str_data]
print(tot_data[:5])

In [None]:
# Convert the price columns to floats; dates stay as strings
dat = [[d[0], float(d[1]), float(d[2])] for d in tot_data]
dat[:10]

In [None]:
# Now extract the data columns

# Dates. Note that dates are represented as strings 'yyyy-mm-dd'
ymd = [d[0] for d in dat]
print(f"ymd={ymd[:3]}")

# AMD prices (already floats after the previous cell)
pa = [d[1] for d in dat]
print(f"pa={pa[:3]}")  # print only the first three values

# Intel prices
pi = [d[2] for d in dat]
print(f"pi={pi[:3]}")  # print only the first three values

# Actual size of the dataset
print(f"full_size={len(ymd)}")

In [None]:
# Try to plot prices vs dates as it is
import matplotlib.pyplot as plt 

fig, ax = plt.subplots()

ax.plot(ymd, pa)
ax.plot(ymd, pi);

In [None]:
# To understand what happens, replot with far fewer points
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

N = 10

ax.plot(ymd[:N], pa[:N])
ax.plot(ymd[:N], pi[:N]);

We see that Matplotlib accepts a list of strings as x values. In this case it treats the strings as categories: it places them at positions 0, 1, 2, ... along the x axis and prints the corresponding strings at these positions.

But it cannot place so many labels automatically without overlapping.

We will fix this as follows. First, we specify exactly which tick locations we want using `ax.set_xticks`.

In [None]:
# See how many dates we have
print(len(ymd))

In [None]:
# Taking every 200th date from ymd seems appropriate
tc = ymd[::200]
print(tc)

In [None]:
# Try to plot our data again
import matplotlib.pyplot as plt 

fig, ax = plt.subplots()

ax.plot(ymd, pa)
ax.plot(ymd, pi);

# Specify which tick positions we want. Since tc is a list of strings,
# Matplotlib finds the positions of the strings from tc
# within the original list ymd and places the corresponding labels there.
# This is exactly what we need
ax.set_xticks(tc);

Dates still overlap. But we can rotate them with `ax.set_xticklabels`.

This function tells Matplotlib which text labels must be shown at the tick positions. Although the current labels are already fine, we must pass them here again, because the labels are a mandatory argument. What we actually need is the named parameter `rotation`; we set `rotation=45`.

In [None]:
# Final good plot
import matplotlib.pyplot as plt 

fig, ax = plt.subplots()

ax.plot(ymd, pa)
ax.plot(ymd, pi);

ax.set_xticks(tc)

# Pass tc here just to make this function work. What we actually need is the rotation
ax.set_xticklabels(tc, rotation=45);

Now all labels are placed correctly, without overlapping.
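Passing the labels again only to rotate them is a bit redundant. An alternative is `ax.tick_params(axis="x", labelrotation=45)`, which rotates whatever labels are already there. Another option is to convert the date strings into `datetime.date` objects: then Matplotlib treats the x axis as a date axis and chooses tick positions and label formats itself. The sketch below uses made-up dates and prices in place of `amd_vs_intel.csv`; `AutoDateLocator` and `ConciseDateFormatter` come from `matplotlib.dates`.

```python
from datetime import date, timedelta

import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Made-up data: 400 consecutive days and a simple rising "price"
start = date(2013, 2, 8)
days = [start + timedelta(days=k) for k in range(400)]
price = [1.0 + 0.01 * k for k in range(400)]

# Strings in 'yyyy-mm-dd' form, like ymd, convert with date.fromisoformat
ymd_demo = [d.isoformat() for d in days]
dates = [date.fromisoformat(s) for s in ymd_demo]

fig, ax = plt.subplots()
ax.plot(dates, price)

# Let Matplotlib pick the date ticks and write compact labels
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))

# Rotation without passing the labels again
ax.tick_params(axis="x", labelrotation=45)
```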

### Advanced customizing ticks: an idea

Within each axis there are major and minor tick marks. 

Major ticks are usually bigger or more pronounced, while minor ticks are usually smaller. 

By default, Matplotlib rarely makes use of minor ticks, but one place we can see them is within logarithmic plots:

In [None]:
import matplotlib.pyplot as plt
import numpy as np
fig, ax = plt.subplots()
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlim(10,100)
ax.set_ylim(1,1e3)
ax.grid();

In the example above, the major ticks on the y axis show integer powers of ten: $10^0$, $10^1$, $10^2$ and $10^3$, and have longer marks. Minor ticks indicate intermediate values; their marks are shorter and have no labels.

On the x axis, however, both major and minor ticks are labeled.

The location of the ticks can be customized by setting a locator object.

The format of the labels can be customized by a formatter object.

Here are the default locators and formatters for the plot above:

In [None]:
# Show default locators for x
print(f"default major locator:   {ax.xaxis.get_major_locator()}")
print(f"default minor locator:   {ax.xaxis.get_minor_locator()}")
print()

# Show default formatters for x
print(f"default major formatter: {ax.xaxis.get_major_formatter()}")
print(f"default minor formatter: {ax.xaxis.get_minor_formatter()}")

The line `<matplotlib.ticker.LogLocator object at 0x7f81c44cf850>` shows the module and submodule where the locator is defined (`matplotlib.ticker`), the name of this particular locator (`LogLocator`), and its address in the computer memory (we do not need the address now; it differs from run to run).

We see that both major and minor ticks have their locations specified by a `LogLocator`.

The `LogFormatterSciNotation` formatters print numbers in scientific notation, like $2\times 10^{1}$.

Locators and formatters belong to the `ticker` submodule, so we have to import it before use:
`from matplotlib import ticker`.

Many locators and formatters are available. 
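A few of them, as an illustration (far from a complete list): `FixedLocator` puts ticks at given positions, `MultipleLocator` at multiples of a base, and `MaxNLocator` limits their number; `FormatStrFormatter` applies an old-style `%` format, and `StrMethodFormatter` a `str.format` template with the field `x`. Locators can report their positions via `tick_values`, and formatters are callables, so both can be tried out without a plot:

```python
from matplotlib import ticker

fixed = ticker.FixedLocator([0, 1, 2])
positions = [float(v) for v in fixed.tick_values(0, 5)]
print(positions)  # [0.0, 1.0, 2.0]: exactly the positions we gave it

old_style = ticker.FormatStrFormatter("%.2f")
new_style = ticker.StrMethodFormatter("{x:.1f} s")

print(old_style(3.14159))  # 3.14
print(new_style(2.0, 0))   # 2.0 s
```

On a real plot they are attached with `ax.xaxis.set_major_locator(...)` and `ax.xaxis.set_major_formatter(...)`.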

### Hiding labels and ticks

Let us try to hide either the labels only or the ticks and labels together. We need `ticker.NullLocator` and `ticker.NullFormatter`.

First, we plot a graph:

In [None]:
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-4, 4, 100)
y = (1-x**2)*np.exp(-x**2 / 2)  # curve is called 'mexican hat'

fig, ax = plt.subplots()
ax.plot(x, y);

Now use `NullFormatter` to hide tick labels

In [None]:
import matplotlib.pyplot as plt
from matplotlib import ticker  # import ticker to customize ticks
import numpy as np

x = np.linspace(-4, 4, 100)
y = (1-x**2)*np.exp(-x**2 / 2)
fig, ax = plt.subplots()
ax.plot(x, y)

ax.xaxis.set_major_formatter(ticker.NullFormatter())  # it hides x tick labels

To hide ticks themselves we use `NullLocator`.

In [None]:
import matplotlib.pyplot as plt
from matplotlib import ticker
import numpy as np

x = np.linspace(-4, 4, 100)
y = (1-x**2)*np.exp(-x**2 / 2)

fig, ax = plt.subplots()
ax.plot(x, y)

ax.xaxis.set_major_locator(ticker.NullLocator())  # it hides x ticks
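Two shortcuts give similar results without importing `ticker`: passing an empty list to `ax.set_xticks` removes the x ticks together with their labels, while `ax.tick_params(labelbottom=False)` keeps the tick marks but hides their labels. A sketch with the same Mexican-hat curve, one variant per panel:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-4, 4, 100)
y = (1 - x ** 2) * np.exp(-x ** 2 / 2)

fig, axs = plt.subplots(ncols=2)
for ax in axs:
    ax.plot(x, y)

axs[0].set_xticks([])                  # no x ticks and no labels
axs[1].tick_params(labelbottom=False)  # x ticks stay, their labels are hidden

fig.canvas.draw()
print(len(axs[0].get_xticks()), len(axs[1].get_xticks()) > 0)
```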

### Reducing or increasing the number of ticks

A common problem: smaller subplots can end up with crowded labels.

In [None]:
import matplotlib.pyplot as plt

fig, axs = plt.subplots(nrows=2, ncols=2, sharex=True, sharey=True)
for ax in axs.flat:
    ax.set_xlim(0, 0.001)
    ax.set_ylim(0, 0.001)

The simplest fix is `MaxNLocator`, which lets us specify the maximum number of intervals between ticks (so at most one more tick than that number will be displayed).

Given this maximum number, Matplotlib will use internal logic to choose the particular tick locations.

In [None]:
import matplotlib.pyplot as plt
from matplotlib import ticker

fig, axs = plt.subplots(nrows=2, ncols=2, sharex=True, sharey=True)
for ax in axs.flat:
    ax.set_xlim(0, 0.001)
    ax.set_ylim(0, 0.001)
    ax.xaxis.set_major_locator(ticker.MaxNLocator(3))  # at most 3 intervals on x
    ax.yaxis.set_major_locator(ticker.MaxNLocator(3))  # the same for y
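For the default locator, which is a subclass of `MaxNLocator`, a similar effect can be reached without importing `ticker`, via `ax.locator_params`. A sketch on the same 2×2 grid, limiting both axes at once:

```python
import matplotlib.pyplot as plt

fig, axs = plt.subplots(nrows=2, ncols=2, sharex=True, sharey=True)
for ax in axs.flat:
    ax.set_xlim(0, 0.001)
    ax.set_ylim(0, 0.001)
    # At most 3 intervals (so at most 4 ticks) on both axes
    ax.locator_params(axis="both", nbins=3)

print(axs[0, 0].get_xticks())
```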

### Tick step

A more powerful way to control the ticks is to set the step between them. The locator `MultipleLocator`
places ticks at multiples of the number provided.

Here is the plot of sine and cosine functions:

In [None]:
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()

x = np.linspace(0, 3 * np.pi, 1000)

ax.plot(x, np.sin(x), label='Sine')
ax.plot(x, np.cos(x), label='Cosine')

ax.grid()
ax.legend();

Since these functions are $2\pi$ periodic, it is more natural to space the ticks and grid lines in multiples of $\pi$. We can do this by setting a `MultipleLocator`. 

In [None]:
import matplotlib.pyplot as plt
from matplotlib import ticker
import numpy as np

fig, ax = plt.subplots()

x = np.linspace(0, 3 * np.pi, 1000)

ax.plot(x, np.sin(x), label='Sine')
ax.plot(x, np.cos(x), label='Cosine')

ax.grid()
ax.legend()

# Major ticks every pi/2, minor ticks every pi/4
ax.xaxis.set_major_locator(ticker.MultipleLocator(np.pi / 2))
ax.xaxis.set_minor_locator(ticker.MultipleLocator(np.pi / 4))

### Custom format for labels

The plot above could be better: the ticks are multiples of $\pi/2$, so it would be nicer to see the symbol $\pi$ there instead of decimal numbers.

We customize the tick formatter. The most powerful one is `FuncFormatter`, which accepts a user-defined function giving fine-grained control over the tick labels.

First we define a function that takes a float equal to $n \pi/2$ and converts it to a nice looking string:

In [None]:
def format_func(value, tick_number):
    # find number of multiples of pi/2
    N = int(np.round(value / (0.5 * np.pi)))
    if N == 0:
        return "0"
    elif N == 1:
        return r"$\pi/2$"
    elif N == 2:
        return r"$\pi$"
    elif N % 2 > 0:
        return r"${0}\pi/2$".format(N)
    else:
        return r"${0}\pi$".format(N // 2)
    
# Check how the function works
for n in range(0, 9):
    s = format_func(n*np.pi/2, None) # tick_number is not used
    print(f"n={n}, s={s}")

Now pass this function to `FuncFormatter` and draw the plot

In [None]:
import matplotlib.pyplot as plt
from matplotlib import ticker
import numpy as np

fig, ax = plt.subplots()

x = np.linspace(0, 3 * np.pi, 1000)

ax.plot(x, np.sin(x), label='Sine')
ax.plot(x, np.cos(x), label='Cosine')

ax.grid()
ax.legend()

ax.xaxis.set_major_locator(ticker.MultipleLocator(np.pi / 2))
ax.xaxis.set_minor_locator(ticker.MultipleLocator(np.pi / 4))

# Here we set the major formatter
ax.xaxis.set_major_formatter(ticker.FuncFormatter(format_func))
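`format_func` only knows multiples of $\pi/2$, so the minor ticks at $\pi/4$ stay unlabeled. One possible generalization (a sketch, not the only way) reduces `value/π` to a fraction with the standard-library `fractions.Fraction`; the name `pi_formatter` and its `denom` argument are our own choices, not a Matplotlib API.

```python
from fractions import Fraction

import matplotlib.pyplot as plt
import numpy as np
from matplotlib import ticker

def pi_formatter(value, tick_number, denom=4):
    """Label value as a reduced multiple of pi/denom, e.g. 3*pi/4 -> '$3\\pi/4$'."""
    frac = Fraction(int(np.round(value / (np.pi / denom))), denom)
    if frac == 0:
        return "0"
    num, den = frac.numerator, frac.denominator
    # Coefficient in front of pi: omit 1, keep only the sign for -1
    coef = {1: "", -1: "-"}.get(num, str(num))
    if den == 1:
        return rf"${coef}\pi$"
    return rf"${coef}\pi/{den}$"

fig, ax = plt.subplots()
x = np.linspace(0, 3 * np.pi, 1000)
ax.plot(x, np.sin(x))
ax.xaxis.set_major_locator(ticker.MultipleLocator(np.pi / 2))
ax.xaxis.set_minor_locator(ticker.MultipleLocator(np.pi / 4))
ax.xaxis.set_major_formatter(ticker.FuncFormatter(pi_formatter))
ax.xaxis.set_minor_formatter(ticker.FuncFormatter(pi_formatter))
ax.tick_params(axis="x", which="minor", labelsize=8)  # smaller labels for minor ticks
```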

### Exercises

1\. Create 3D surface plot of a function $z=x^2-y^2$. Add axes labels and color bar.

2\. Before doing this exercise, find an image on the Internet and download it to your local directory. Then create a program that shows this image. Hide the axes ticks. Add a title to your plot and show there the image height, width, and number of color channels.

3\. Take the dataset from the section where movie popularity is analyzed and create a 3D scatter plot for it. Assign marker colors according to one of the coordinates. Find the coordinate that results in the most informative picture.

4\. Again take the dataset with movie popularity. Find the 3 largest and the 3 smallest popularity values and create a new array where 0 corresponds to the three least popular movies, 2 is assigned to the three most popular movies, and all the others are marked by 1. Create a 3D scatter plot where marker colors are assigned according to this array.

5\. The file `"canberra_weather.csv"`, which you can find at `"https://raw.githubusercontent.com/kupav/data-sc-intro/main/data/"`, contains daily weather observations for Canberra, Australian Capital Territory, for June 2020. Read this file and create a line plot that shows how the minimal and maximal temperatures changed day by day. Put labels with the corresponding dates on the x axis.

6\. File `"chess_games.csv"`  that you can find at `"https://raw.githubusercontent.com/kupav/data-sc-intro/main/data/"` contains a set of just over 20,000 games collected from a selection of users on the site `Lichess.org` (taken from public dataset collection at `https://www.kaggle.com/datasnaek/chess`). 

Read this file and take the columns "turns", "white_rating" and "black_rating". Use `abs("white_rating"-"black_rating")` (i.e., the absolute value of the difference between the columns "white_rating" and "black_rating") as the x variable and the column "turns" as the y variable. Create a 2D histogram for the x and y variables that demonstrates how the number of game turns depends on the difference between the player ratings. Add reasonable axes labels.

### Clear temporary files

In [None]:
import os

temp_files = ["sin.png", "turing.jpeg"]

ask = input("Do you really want to remove temporary files? (y/n) ")
if ask.strip().lower().startswith('y'):  # also safe for an empty answer
    flag = False
    for file in temp_files:
        try:
            os.remove(file)
            print(f"Removed {file}")
            flag = True
        except FileNotFoundError:
            pass
    if not flag:
        print("No files to remove")