# Scatter Plots and Error Bars

This notebook shows how to create scatter plots and how to visualize error or uncertainty in your data.

In [2]:
import matplotlib.pyplot as plt
import numpy as np

plt.style.use('seaborn-v0_8-whitegrid')

## Simple Scatter Plots

Scatter plots visualize the relationship between two variables. Each observation is drawn as its own marker instead of being joined to its neighbors by line segments.

### Scatter Plots with `plt.plot`
You can create a scatter plot with `plt.plot` by passing a format string that contains a marker code (such as `'o'` for a circle) but no line style.

In [None]:
x = np.linspace(0, 10, 30)
y = np.sin(x)

fig, ax = plt.subplots()
ax.plot(x, y, 'o', color='black');

### Scatter Plots with `plt.scatter`
The `plt.scatter` function is more flexible. Properties such as size, color, or transparency can be set for each point individually or mapped to data. This makes it well suited to multi-dimensional data, because color and size can each encode an additional variable.

In [None]:
# Let the color and size of the points vary based on some data
rng = np.random.RandomState(0)
x = rng.randn(100)
y = rng.randn(100)
colors = rng.rand(100)
sizes = 1000 * rng.rand(100)

fig, ax = plt.subplots()
ax.scatter(x, y, c=colors, s=sizes, alpha=0.3,
           cmap='viridis')
plt.show()
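When color encodes a third variable, a color bar lets readers decode it. The following is a minimal sketch that reuses the random data from the cell above. It relies on `ax.scatter` returning a `PathCollection`, which `fig.colorbar` accepts as its mappable. The label text `'color value'` is only illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

# Same random data as the previous cell
rng = np.random.RandomState(0)
x = rng.randn(100)
y = rng.randn(100)
colors = rng.rand(100)
sizes = 1000 * rng.rand(100)

fig, ax = plt.subplots()
# Keep the returned collection so the color bar knows which colormap to show
points = ax.scatter(x, y, c=colors, s=sizes, alpha=0.3, cmap='viridis')
cbar = fig.colorbar(points, ax=ax)
cbar.set_label('color value')  # illustrative label, not from the original data
```

In a notebook, the figure is displayed when the cell finishes; in a script, add `plt.show()`.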

### `plot` Versus `scatter`: A Note on Efficiency
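For small datasets the difference hardly matters, but once a dataset grows beyond a few thousand points, `plt.plot` can be noticeably more efficient than `plt.scatter`. Because `plt.scatter` can give every point its own size and color, the renderer has to construct each marker individually. With `plt.plot`, all markers are identical, so the appearance is determined once for the whole dataset. If all your points look the same, `plt.plot` is the better choice.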
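One way to check this on your own machine is to time both calls. The sketch below is an informal measurement, not a benchmark: the helper `time_draw` and the point count of 50,000 are hypothetical choices made for illustration, and the numbers depend on the backend, hardware, and Matplotlib version.

```python
import time

import matplotlib.pyplot as plt
import numpy as np


def time_draw(draw_points, n_points=50_000):
    """Return the seconds needed to add n_points to a fresh Axes and render it."""
    rng = np.random.RandomState(0)
    x, y = rng.randn(n_points), rng.randn(n_points)
    fig, ax = plt.subplots()
    start = time.perf_counter()
    draw_points(ax, x, y)
    fig.canvas.draw()  # force a render so the drawing cost is included
    elapsed = time.perf_counter() - start
    plt.close(fig)  # free the figure so repeated runs do not accumulate memory
    return elapsed


t_plot = time_draw(lambda ax, x, y: ax.plot(x, y, 'o', color='black'))
t_scatter = time_draw(lambda ax, x, y: ax.scatter(x, y, color='black'))
print(f"plot: {t_plot:.3f} s, scatter: {t_scatter:.3f} s")
```

A single run is noisy, so repeat the measurement a few times before drawing conclusions.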

## Visualizing Errors

Visualizing the error or uncertainty in your data is a critical part of scientific plotting.

### Basic Errorbars
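The `plt.errorbar` function draws data points, optionally joined by a line, with an error bar on each one. The `yerr` argument sets the size of the vertical error, and `fmt` takes the same format codes as `plt.plot`.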

In [None]:
# Create some sample data with errors
x = np.linspace(0, 10, 50)
dy = 0.8
y = np.sin(x) + dy * np.random.randn(50)

fig, ax = plt.subplots()
ax.errorbar(x, y, yerr=dy, fmt='o', color='black',
            ecolor='lightgray', elinewidth=3, capsize=0);
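Error bars do not have to be symmetric. As a sketch, assuming each point has separate lower and upper uncertainties, `yerr` can be given as an array of shape `(2, N)`, where the first row holds the lower errors and the second row the upper errors. The `lower` and `upper` arrays here are made-up values for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(1)
x = np.linspace(0, 10, 20)
y = np.sin(x)

# Hypothetical per-point uncertainties: the upper error is larger than the lower
lower = 0.1 + 0.2 * rng.rand(20)
upper = 0.3 + 0.4 * rng.rand(20)
yerr = np.vstack([lower, upper])  # shape (2, N): row 0 = lower, row 1 = upper

fig, ax = plt.subplots()
container = ax.errorbar(x, y, yerr=yerr, fmt='o', color='black',
                        ecolor='lightgray', elinewidth=3, capsize=0)
```

Both rows must contain non-negative magnitudes; the bar for each point runs from `y - lower` to `y + upper`.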

### Continuous Errors
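Sometimes the uncertainty is defined on a continuous quantity, such as a model's predicted standard deviation at every `x`, and not only at a handful of discrete points. The `plt.fill_between` function handles this case by shading the region between a lower and an upper curve.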

In [None]:
# Example: Visualizing a Gaussian Process
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# define the model and draw some data
model = GaussianProcessRegressor(kernel=RBF(1.0))
x = np.linspace(0, 10, 30)
y = np.sin(x) + np.random.randn(30)

# fit the model
model.fit(x[:, np.newaxis], y)
xfit = np.linspace(0, 10, 1000)
yfit, std = model.predict(xfit[:, np.newaxis], return_std=True)

# Plot the result with fill_between
fig, ax = plt.subplots()
ax.plot(x, y, 'o', color='black')
ax.plot(xfit, yfit, '-', color='gray')
ax.fill_between(xfit, yfit - std, yfit + std, color='lightgray')
ax.set_title('Continuous Error (Gaussian Process)')
plt.show()
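The same shaded-band technique works without a fitted model. The sketch below assumes a hypothetical experiment in which the same curve is measured 25 times with noise. It shades one sample standard deviation around the mean at each `x`. The trial count, noise level, and `alpha` value are illustrative choices.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.RandomState(42)
x = np.linspace(0, 10, 200)

# Hypothetical data: 25 noisy repeats of the same underlying curve
trials = np.sin(x) + 0.3 * rng.randn(25, 200)
mean = trials.mean(axis=0)
std = trials.std(axis=0, ddof=1)  # sample standard deviation at each x

fig, ax = plt.subplots()
ax.plot(x, mean, '-', color='gray')
# Shade the band from (mean - std) to (mean + std)
band = ax.fill_between(x, mean - std, mean + std, color='lightgray', alpha=0.5)
ax.set_title('Mean and one standard deviation over repeated trials')
```

The arguments to `fill_between` are the x values, then the lower bound, then the upper bound.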