<h1>Data Visualization with Matplotlib</h1>

We choose <b>matplotlib</b> for data visualization. 

References<br>

https://matplotlib.org/stable/api/index.html

https://matplotlib.org/stable/contents.html

<a href="http://www.aosabook.org/en/matplotlib.html">How matplotlib works?</a>

https://matplotlib.org/stable/_modules/matplotlib/backends/backend_agg.html#RendererAgg.draw_path_collection

https://matplotlib.org/stable/api/artist_api.html?highlight=artist#module-matplotlib.artist


Main Contents today
- What is matplotlib
- Matplotlib plot style
- Anatomy of the matplotlib figure
    - Graphic primitives
    - Raster rendering vs. vector rendering
    - Dimensions & resolution
- Coordinate systems


### What is Matplotlib
- A powerful and flexible 2D/3D plotting library for Python.
- Matplotlib is a multi-platform data visualization library built on NumPy arrays, and designed to work with the broader SciPy stack.
- It was conceived by John Hunter in 2002, originally as a patch to IPython for enabling interactive MATLAB-style plotting via gnuplot from the IPython command line. 




Key Features

✔ Publication-quality plots

✔ Customizable (lines, markers, colors, fonts, etc.)

✔ Supports multiple backends (AGG, GTK, Qt, Tkinter, etc.)

✔ Various plot types (line, bar, scatter, histogram, etc.)

Importing the package:
- import matplotlib as mpl
- import matplotlib.pyplot as plt


In [None]:
import matplotlib as mpl
import matplotlib.pyplot as plt

### Matplotlib Styles

Matplotlib provides several built-in styles to quickly change the appearance of plots for different use cases (e.g., academic papers, presentations, dark mode).

- default
- ggplot
- dark_background
- bmh
- fivethirtyeight

In [None]:
print(plt.style.available)


1. Default Style ('default')
- Features: Matplotlib’s standard look since version 2.0 (the older, pre-2.0 look is available as 'classic'); clean and neutral.

- Best for: Basic plotting, maximum compatibility.

2. 'ggplot'
- Features: Mimics R’s ggplot2 library:

    - Gray background + white gridlines

    - High readability

    - Ideal for statistical visuals

- Best for: Data analysis reports.

3. 'dark_background'
- Features: Dark background + bright lines/text:

    - Great for night/dark mode

    - Works well with dark-themed slides

- Downside: May not print clearly.

4. 'bmh' (Bayesian Methods for Hackers)
- Features:

    - Light background + borderless

    - Academic-friendly

- Best for: Research papers, technical docs.

5. 'fivethirtyeight'
- Features: Inspired by FiveThirtyEight’s data journalism:

    - Bold titles + large fonts

    - "Magazine-style" plots

- Best for: Eye-catching blogs/presentations.

In [None]:
import matplotlib.pyplot as plt

plt.style.use('ggplot')  # Apply a style
plt.plot([1, 2, 3], [4, 5, 6])
plt.show()
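Note that `plt.style.use` changes the style for the rest of the session. As a minimal sketch (the variable names `before`, `inside` and `after` are only illustrative), a style can instead be applied temporarily with the `plt.style.context` context manager, which restores the previous settings on exit:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# Background color of the axes under the style that is active right now
before = plt.rcParams["axes.facecolor"]

# The style is active only inside the with-block
with plt.style.context("ggplot"):
    inside = plt.rcParams["axes.facecolor"]
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x))
    ax.set_title("Drawn with the ggplot style")

# After the block, the previous settings are back
after = plt.rcParams["axes.facecolor"]
print(before, inside, after)
plt.close(fig)
```

This is convenient when a single figure in a notebook needs a different look from all the others.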

In [None]:
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

plt.plot(x, np.sin(x))
plt.plot(x, np.cos(x))

plt.show()

### Graphic primitives


- patches (geometric shapes)
- lines
- text

Each of these graphic primitives also has many other properties, such as
color (facecolor and edgecolor), transparency (from 0 to 1), patterns (e.g.
dashes), styles (e.g. cap styles), special effects (e.g. shadows or outlines),
antialiasing (True or False), etc.
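The three primitives and a few of the properties listed above can be sketched as follows. The shapes, positions and colors are arbitrary choices made for illustration only:

```python
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
from matplotlib.patches import Circle

fig, ax = plt.subplots(figsize=(4, 4))

# A patch: facecolor, edgecolor, transparency and a hatch pattern
circle = Circle((0.5, 0.5), radius=0.3, facecolor="lightblue",
                edgecolor="navy", alpha=0.6, hatch="//")
ax.add_patch(circle)

# A line: dash pattern, cap style and antialiasing
line = Line2D([0.1, 0.9], [0.1, 0.9], color="red", linestyle="--",
              linewidth=2, dash_capstyle="round", antialiased=True)
ax.add_line(line)

# A text
label = ax.text(0.5, 0.9, "text", ha="center", va="center", color="black")

ax.set_aspect("equal")
print(len(ax.patches), len(ax.lines), len(ax.texts))
```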


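Drawing a sloped line may look like a simple operation, but until around 1980 a sloped line was rendered as a stepped (staircase) line. Antialiasing hides the steps by blending the pixels along the edge with the background.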

![image.png](attachment:image.png)
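The effect of antialiasing can be checked numerically. The following is a rough sketch, assuming the Agg raster backend is used directly; `render_line` is a hypothetical helper written for this example. It renders the same sloped black line with and without antialiasing and counts the distinct pixel intensities in each image:

```python
import numpy as np
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure


def render_line(antialiased):
    """Render a sloped black line on white and return the red channel."""
    fig = Figure(figsize=(1, 1), dpi=100)   # 100 x 100 pixels
    canvas = FigureCanvasAgg(fig)
    ax = fig.add_axes([0, 0, 1, 1])
    ax.set_axis_off()
    ax.plot([0, 1], [0, 0.3], color="black", linewidth=2,
            antialiased=antialiased)
    canvas.draw()
    rgba = np.asarray(canvas.buffer_rgba())
    return rgba[..., 0]


stepped = render_line(antialiased=False)
smooth = render_line(antialiased=True)

# Without antialiasing only pure line and background pixels are expected;
# with antialiasing, intermediate gray levels appear along the edges.
n_stepped = len(np.unique(stepped))
n_smooth = len(np.unique(smooth))
print(n_stepped, n_smooth)
```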



In [None]:
from PIL import Image
import numpy as np
# Load the image
img = Image.open('./cat.jpg')  # replace with the path to your image
img_array = np.array(img)
img_array.shape

In [None]:
img

In [None]:
inverted = 255 - np.array(img)
img_invert = Image.fromarray(inverted)
img_invert
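The cells above treat a raster image as a NumPy array of pixels. Here is a self-contained sketch of the same idea that needs no image file; the synthetic gradient and all names ending in `_demo` are assumptions made for this example:

```python
import numpy as np
from PIL import Image

# Synthetic RGB image: 64 rows x 128 columns, red channel is a gradient
height, width = 64, 128
pixels = np.zeros((height, width, 3), dtype=np.uint8)
pixels[..., 0] = np.linspace(0, 255, width, dtype=np.uint8)

img_demo = Image.fromarray(pixels)

# The array shape is (rows, columns, channels); PIL reports (width, height)
print(pixels.shape, img_demo.size)

inverted_demo = 255 - pixels                 # invert every channel
cropped_demo = pixels[16:48, 32:96]          # rows 16..47, columns 32..95
gray_demo = np.array(img_demo.convert("L"))  # one channel per pixel
```

Slicing selects rows first and columns second, which is the reverse of the (x, y) order often used for image coordinates.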

### Raster (Bitmap) Images vs. Vector Images

1. Raster (bitmap) images: composed of pixels (tiny colored squares).

- Pros:
    - Perfect for photos and complex textures.
    - Supported everywhere (JPEG, PNG, GIF).
- Cons:
    - Lose quality when scaled up (pixelation/blurring).
    - Large file sizes at high resolutions.
- Formats: JPEG, PNG, GIF, TIFF.

2. Vector images: defined by mathematical descriptions of shapes (points, lines, curves).

- Pros:
    - Scalable to any size without quality loss.
    - Small file size for simple graphics.
- Cons:
    - Cannot represent realistic photos well.
    - Require specialized software to edit (e.g., Illustrator).
- Formats: SVG, EPS, PDF, AI.


In [None]:
fig = plt.figure(figsize = (6,4))
plt.plot([1,2,3,4],[5,6,7,8])
plt.savefig('example_png.png')

In [None]:
fig = plt.figure(figsize = (6,4))
plt.plot([1,2,3,4],[5,6,7,8])
plt.savefig('example_eps.eps')
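To see the difference between the two families of formats, the same figure can be written to memory as PNG (raster) and SVG (vector). This is a minimal sketch; `saved_bytes` is a hypothetical helper, and the dpi values are arbitrary:

```python
import io

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot([1, 2, 3, 4], [5, 6, 7, 8])


def saved_bytes(fmt, dpi):
    """Save the figure to memory and return the raw file content."""
    buf = io.BytesIO()
    fig.savefig(buf, format=fmt, dpi=dpi)
    return buf.getvalue()


png_low = saved_bytes("png", 50)     # 300 x 200 pixels
png_high = saved_bytes("png", 300)   # 1800 x 1200 pixels
svg = saved_bytes("svg", 50)

# A raster file grows with the resolution
print(len(png_low), len(png_high))
# A vector file is a text description of the shapes (XML for SVG)
print(svg[:5])
```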

### Dimensions and resolution

In [None]:
fig = plt.figure(figsize=(7, 5), dpi=150)
# Pixel size = inches x dpi: 7 x 150 = 1050 px wide, 5 x 150 = 750 px high

plt.savefig("output.png")


In [None]:
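def figure(dpi):
    """Save a one-line text figure rendered at the given dpi."""
    fig = plt.figure(figsize=(4.25, .2))
    ax = plt.subplot(1, 1, 1)
    text = f"Text rendered at 10pt using {dpi} dpi"
    # fontsize is given in points, so it matches the "10pt" stated in the text
    ax.text(0.5, 0.5, text, ha="center", va="center",
            fontsize=10, fontweight="bold")
    plt.savefig(f"figure-dpi-{dpi}.png", dpi=dpi)
    plt.close(fig)  # avoid accumulating open figures in the loop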


In [None]:
for dpi in [10,50,100,200,400]:
    figure(dpi)

Exercise: create a figure that is 1024 x 256 pixels in size and save it.

In [None]:
fig = plt.figure(figsize = (8,2), dpi = 128)

In [None]:
fig.savefig('size.png', dpi = 128)
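The relation used here is pixels = inches x dpi, so 8 in x 128 dpi = 1024 px and 2 in x 128 dpi = 256 px. One possible way to wrap this in a function and verify the result is sketched below; `figure_in_pixels` is a hypothetical helper, and the size is read back from the PNG header (width and height are stored in bytes 16 to 23), which should come out as 1024 by 256:

```python
import io
import struct

import matplotlib.pyplot as plt


def figure_in_pixels(width_px, height_px, dpi=100):
    """Create a figure whose saved size is width_px x height_px pixels."""
    return plt.figure(figsize=(width_px / dpi, height_px / dpi), dpi=dpi)


fig = figure_in_pixels(1024, 256, dpi=128)

# Save to memory with the figure's own dpi
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=fig.dpi)
png = buf.getvalue()

# Read the pixel size back from the PNG header
saved_width, saved_height = struct.unpack(">II", png[16:24])
print(saved_width, saved_height)
```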

### Anatomy of the figure

A matplotlib figure is composed of a hierarchy of elements that, when put
together, form the final figure.

Basic elements of a figure:
- Figure
- Axes
- Axis
- Spines
- Artist

Figure: the most important element; the top-level container that holds everything that is drawn.

- Creation: plt.figure()

    - Specify the size
    - Specify the background color
    - Specify the title


Axes: the second most important element; it corresponds to
the actual area where your data will be rendered (also called a subplot).

- One figure can have one or many axes.

- Each axes is surrounded by four edges (left, top, right and bottom) that are called
spines.


Axis: a decorated spine is called an axis (for example the x-axis or the y-axis).

- Each of them is made of a spine, major and minor ticks, major and minor tick labels, and an
axis label.


Artist: everything on the figure, including Figure, Axes, and Axis objects, is an artist.
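This hierarchy can be explored directly on the objects. The sketch below creates a single axes and checks that each element is an instance of `matplotlib.artist.Artist`; the `elements` dictionary is just a convenience for printing:

```python
import matplotlib.pyplot as plt
from matplotlib.artist import Artist

fig, ax = plt.subplots()

# The figure keeps a list of its axes; each axes has two axis objects
# and four spines
spine_names = sorted(ax.spines.keys())
print(fig.axes, spine_names)

elements = {
    "Figure": fig,
    "Axes": ax,
    "XAxis": ax.xaxis,
    "YAxis": ax.yaxis,
    "Spine": ax.spines["left"],
}
for name, obj in elements.items():
    print(f"{name:6s} -> {type(obj).__name__}, "
          f"is an Artist: {isinstance(obj, Artist)}")
```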

### Create a figure
1. plt.figure()
2. plt.gcf()


In [None]:
plt.figure(figsize=(12, 8), facecolor='red')
plt.suptitle('test')
plt.plot([1,2,3],[4,5,6])
plt.show()

In [None]:
# Get the current figure; if there is none, a new one is created


fig = plt.gcf()
plt.plot([1,2,3],[4,5,6])
plt.show()

### Create an axes
1. plt.subplots()
2. fig.add_subplot()
3. plt.gca()

In [None]:
fig, axes = plt.subplots(nrows=2, ncols=2, figsize = (9,9)) 
ax0 = axes[0][0]
ax0.plot([1,2,3],[3,4,5])
ax1 = axes[1][0]
ax1.scatter([0,1,2],[3,4,3])

plt.show()

In [None]:
fig = plt.figure()
# One figure, 1 row x 2 columns: subplot 1 is on the left, subplot 2 on the right
ax_left = fig.add_subplot(1, 2, 1)
ax_left.plot([1, 2, 3], [4, 5, 6])
ax_right = fig.add_subplot(1, 2, 2)
ax_right.scatter([0, 1, 2], [3, 4, 3])
plt.show()


In [None]:
# Get the current axes; if there is none, a new one is created
ax = plt.gca()
x = np.linspace(0,np.pi,1000)
y = np.sin(x)
ax.plot(x,y)


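### Explicit and implicit control of the axes
ax.plot vs. plt.plot

- ax.plot (explicit): draws on the axes object ax that we specify.
- plt.plot (implicit): draws on the current axes, that is, the axes most recently created or selected.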

In [None]:
fig, axes = plt.subplots(nrows=2, ncols=2, figsize = (9,9)) 
ax0 = axes[0][0]
ax0.plot([1,2,3],[3,4,5])
plt.xlim([1,5])  # implicit: acts on the current axes (the last one created), not on ax0
ax1 = axes[1][0]
ax1.scatter([0,1,2],[3,4,3])
plt.show()
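The difference between the two interfaces can be made visible by asking pyplot which axes is current. This is a small sketch; the variable names `current` and `implicit_hit_last_axes` are only illustrative:

```python
import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(9, 9))
ax0 = axes[0][0]
ax0.plot([1, 2, 3], [3, 4, 5])

# Implicit interface: pyplot functions act on the current axes,
# which is the last axes created by plt.subplots
current = plt.gca()
plt.xlim(1, 5)
implicit_hit_last_axes = current is axes[1][1]
print(implicit_hit_last_axes, ax0.get_xlim())

# Explicit interface: call the method on the axes we mean
ax0.set_xlim(0, 4)

# Alternatively, select the current axes first and then use pyplot
plt.sca(ax0)
plt.ylim(0, 10)
```

The explicit form is usually easier to follow once a figure contains more than one axes.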

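Common commands on axes (pyplot form, with the equivalent Axes method):

- plt.title() / ax.set_title()
- plt.xlabel() / ax.set_xlabel()
- plt.ylabel() / ax.set_ylabel()
- plt.xlim() / ax.set_xlim()
- plt.ylim() / ax.set_ylim()
- plt.xticks() / ax.set_xticks()
- plt.yticks() / ax.set_yticks()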

In [None]:
import matplotlib.pyplot as plt

plt.title("Simple Title", fontsize=16, color='blue', fontweight='bold',
          bbox={'facecolor': 'yellow', 'alpha': 0.5, 'pad': 5})
plt.show()




In [None]:
fig, axes = plt.subplots(nrows=1, ncols=2, figsize = (9,9)) 
ax0 = axes[0]
ax0.plot([1,2,3],[3,4,5])
ax0.set_title('line',color = 'blue' ,pad = 50)
ax1 = axes[1]
ax1.scatter([0,1,2],[3,4,3])
ax1.set_title('scatter',color = 'black',pad = 50,loc = 'right')
plt.tight_layout()
plt.show()

In [None]:
import matplotlib.pyplot as plt

plt.xlabel('X',fontsize = 16, alpha = 0.3, color = 'black',loc = 'right',labelpad = -20 )
plt.ylabel('Y',loc = 'top')


plt.show()


#### ax.set_xlabel()

In [None]:
fig, axes = plt.subplots(nrows=1, ncols=2, figsize = (9,9)) 
ax0 = axes[0]
ax0.plot([1,2,3],[3,4,5])
ax0.set_title('line',color = 'blue' ,pad = 50)
ax0.set_xlabel('X')
ax1 = axes[1]
ax1.scatter([0,1,2],[3,4,3])
ax1.set_title('scatter',color = 'black',pad = 50,loc = 'right')
ax1.set_ylabel('Y',color = 'black',labelpad = 50,loc = 'top')

plt.tight_layout()
plt.show()

In [None]:
plt.xlabel('X',fontsize = 16, alpha = 0.3, color = 'black',loc = 'right',labelpad = -20 )
plt.ylabel('Y',loc = 'top')
plt.xlim([0,20])

plt.show()

In [None]:
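# Select a font with Chinese glyphs before drawing (SimHei must be installed on the system)
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False  # render the minus sign correctly with this font

plt.xlabel('X',fontsize = 16, alpha = 0.3, color = 'black',loc = 'right',labelpad = -20 )
plt.ylabel('Y',loc = 'top')
# Chinese numerals (0, 5, 10, 15, 20) used as tick labels
plt.xticks([0,5,10,15,20], ['零','伍','拾','拾伍','贰拾'],fontsize = 20, color = 'red', rotation=45)

plt.show()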

In [None]:
from matplotlib.ticker import MultipleLocator

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [3, 4, 5])
ax.set_xticks(
    [0, 1, 2],
    labels=['一', '二', '三'],   # Chinese numerals for 1, 2, 3
    fontproperties='SimHei'      # font with Chinese glyphs; must be installed
)

ax.tick_params(
    axis='x',           # apply to the x-axis
    which='both',       # 'major', 'minor', or 'both'
    direction='inout',  # tick direction: 'in', 'out', or 'inout'
    length=6,           # tick length
    width=2,            # tick width
    color='red',        # tick color
    pad=5,              # distance between tick and label
    labelsize=12,       # label font size
    labelcolor='green',
    grid_color='gray'   # grid line color
)

ax.xaxis.set_minor_locator(MultipleLocator(0.5))  # one minor tick every 0.5 units
ax.tick_params(axis='x', which='minor', length=3, color='black')


The pyplot notation may be confusing for beginners: sometimes it looks very simple, and sometimes rather complicated. For example:

    fig, ax = plt.subplots(figsize=(10,6))

It may be easier to understand if we rename the variables, thinking of the figure as the canvas and of the axes as the painter that draws on it:

    canvas, painter = plt.subplots(figsize=(10,6))

In [None]:
plt.plot([1.1,2,2.9],[0.9, 0.5, 0.1])

### Sharing labels and ticks

In [None]:
import matplotlib.pyplot as plt

# Create subplots with shared x-axis
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(8, 6))

ax1.plot([1, 2, 3], [4, 5, 6])
ax2.plot([1, 2, 3], [1, 2, 1])

# Only the bottom subplot shows x-axis labels
ax1.set_ylabel('Y1')
ax2.set_ylabel('Y2')
ax2.set_xlabel('Shared X-axis')

plt.tight_layout()
plt.show()

In [None]:
import numpy as np

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

data1 = np.random.rand(10, 10)
data2 = np.random.rand(10, 10) * 2

im1 = ax1.imshow(data1, cmap='viridis', vmin=0, vmax=2)  # Sync scale
im2 = ax2.imshow(data2, cmap='viridis', vmin=0, vmax=2)

# Add shared colorbar
fig.colorbar(im1, ax=[ax1, ax2], label='Shared Color Scale')
plt.show()

In [None]:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(5,2))
for label in ax.get_xaxis().get_ticklabels():
    label.set_fontweight("light")
plt.show()


### Different Types of Data Visualization
- Ranking 
- Distribution
- Proportion
- Correlation
- Evolution
- Maps


#### Ranking

- Barplot

<div>
<img src="./plots/bar.png" width="500"/>
</div>

- Spider plot

<div>
<img src="./plots/spider.png" width="500"/>
</div>

- Word cloud

<div>
<img src="./plots/wordcloud.jpg" width="500"/>
</div>

#### Distribution

- Density

<div>
<img src="./plots/density.png" width="500"/>
</div>

- Histogram

<div>
<img src="./plots/hist.png" width="500"/>
</div>

- Box plot

<div>
<img src="./plots/box.png" width="500"/>
</div>

- Violin plot

<div>
<img src="./plots/violin.png" width="250"/>
</div>

#### Proportion

- Pie chart

<div>
<img src="./plots/pie.png" width="250"/>
</div>

- Treemap

<div>
<img src="./plots/tree.jpg" width="500"/>
</div>

- Donut Chart

<div>
<img src="./plots/dou.png" width="250"/>
</div>

#### Correlation

- Scatter plot
<div>
<img src="./plots/scatter.png" width="250"/>
</div>

- Heat map 
<div>
<img src="./plots/heat.jpg" width="400"/>
</div>

#### Evolution (time series)

- Line plot

<div>
<img src="./plots/line.png" width="400"/>
</div>

#### Map

- Map
<div>
<img src="./plots/map.jpg" width="400"/>
</div>
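Several of these chart types can be produced with plain matplotlib calls. The following is a rough sketch with one representative per category; all data is synthetic (randomly generated or made up) and serves only as an illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
values = rng.normal(size=500)
x = rng.normal(size=200)
y = 0.8 * x + rng.normal(scale=0.5, size=200)
categories = ["A", "B", "C", "D"]
counts = [23, 17, 35, 29]

fig, axes = plt.subplots(2, 3, figsize=(12, 7))

axes[0, 0].bar(categories, counts)
axes[0, 0].set_title("Ranking: bar plot")

axes[0, 1].hist(values, bins=30)
axes[0, 1].set_title("Distribution: histogram")

axes[0, 2].boxplot(values)
axes[0, 2].set_title("Distribution: box plot")

axes[1, 0].pie(counts, labels=categories)
axes[1, 0].set_title("Proportion: pie chart")

axes[1, 1].scatter(x, y, s=10)
axes[1, 1].set_title("Correlation: scatter plot")

axes[1, 2].plot(np.cumsum(values))
axes[1, 2].set_title("Evolution: line plot")

fig.tight_layout()
```

Spider plots, word clouds, treemaps and maps are not covered by this sketch.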


In [None]:

cropped = img_array[100:600, 200:500]  


img_crop = Image.fromarray(cropped)
img_crop

In [None]:
gray_img_array = np.array(img.convert('L'))  # convert to grayscale
gray_img = Image.fromarray(gray_img_array)
gray_img