## Jupyter notebooks

Documentation: https://docs.jupyter.org/en/latest/

Jupyter notebooks may well be the most widely deployed type of web application in the Python ecosystem.

<img src="https://jupyter.org/assets/homepage/main-logo.svg" alt="Jupyter Logo" width="200"/>

Jupyter notebooks support three main types of cells:

- **Code Cells**: Used to write and execute code (e.g., Python, R). The output appears directly below the cell.
- **Markdown Cells**: Used for formatted text, documentation, equations (LaTeX), images, and links.
- **Raw Cells**: Contain plain text that is not executed or rendered as Markdown. Useful for notes, instructions, or content to be processed by external tools.

You can switch between cell types using the toolbar or keyboard shortcuts.
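To make the cell types concrete, here is a minimal sketch of how they appear in the notebook file itself. A `.ipynb` file is JSON, and each cell carries a `cell_type` field. The structure below is hand-built and simplified for illustration (the titles and sources are made up), so treat it as an approximation of the format rather than a complete notebook.

```python
import json

# Hand-built, simplified sketch of the notebook JSON layout (nbformat version 4).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Title"},
        {
            "cell_type": "code",
            "metadata": {},
            "source": "print('hello')",
            "execution_count": None,  # stays None until the cell has been run
            "outputs": [],            # results from the kernel are stored here
        },
        {"cell_type": "raw", "metadata": {}, "source": "left untouched"},
    ],
}

# List the type of every cell in document order.
cell_types = [cell["cell_type"] for cell in notebook["cells"]]
print(cell_types)

# The same structure serialized the way it would be written to disk.
serialized = json.dumps(notebook, indent=1)
```

Switching a cell's type in the interface amounts to changing this `cell_type` value; only code cells carry `execution_count` and `outputs`.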

#### Jupyter Server and Kernel

- **Jupyter Server**: The Jupyter server is the backend application that manages your notebooks, files, and computational resources. It provides the web interface, handles requests from your browser, and communicates with kernels to execute code. When you start Jupyter Notebook or JupyterLab, you are launching a Jupyter server.

- **Kernel**: A kernel is a separate process that runs and executes your code. Each notebook is connected to a kernel, which can be for different programming languages (e.g., Python, R, Julia). The kernel receives code from the notebook interface, executes it, and returns the output (results, plots, errors) back to the notebook.

**How they work together:**  
When you run a code cell in a notebook, the Jupyter server sends the code to the kernel. The kernel executes the code and sends the output back to the server, which then displays it in your browser.
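The request/response loop described above can be imitated with a toy class. This is only an analogy under loud assumptions: `ToyKernel` is a hypothetical name, it runs in the same process, and it skips the messaging layer entirely, whereas a real kernel is a separate process that talks to the server through the Jupyter messaging protocol. What the sketch does show is why variables defined in one cell are still available in the next.

```python
import contextlib
import io
import traceback


class ToyKernel:
    """Hypothetical stand-in for a kernel: keeps state and executes code strings."""

    def __init__(self):
        self.namespace = {}  # variables persist here between "cells"

    def execute(self, code):
        """Run one cell's code and return its captured output or error text."""
        buffer = io.StringIO()
        try:
            with contextlib.redirect_stdout(buffer):
                exec(code, self.namespace)
            return {"status": "ok", "output": buffer.getvalue()}
        except Exception:
            return {"status": "error", "output": traceback.format_exc()}


kernel = ToyKernel()
first = kernel.execute("x = 21")                    # defines state, prints nothing
second = kernel.execute("print(x * 2)")             # reuses x from the earlier cell
failed = kernel.execute("print(undefined_name)")    # errors are returned, not raised
print(first["status"], second["output"].strip(), failed["status"])
```

Restarting a kernel corresponds to throwing away `namespace`: all previously defined variables are gone until the cells are run again.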

#### Understanding the available datasets

##### 1. Penguins Dataset

- The penguins dataset contains measurements for three species of penguins (Adelie, Chinstrap, Gentoo) observed on different islands in the Palmer Archipelago, Antarctica.
- It includes features such as bill length and depth, flipper length, body mass, sex, and species.
- This dataset is commonly used for data visualization and machine learning exercises as an alternative to the classic iris dataset.
- It provides a real-world example for exploring classification, visualization, and data cleaning techniques.
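As a small illustration of the data cleaning mentioned above, the sketch below counts missing values and drops incomplete rows. The rows are made up and only mimic the column names used in this notebook (the `sex` column name is an assumption), so the numbers say nothing about the real file.

```python
import pandas as pd

# Hypothetical rows that mimic the penguins columns; not taken from the real file.
sample = pd.DataFrame(
    {
        "species": ["Adelie", "Adelie", "Gentoo", "Chinstrap"],
        "bill_length_mm": [39.1, None, 46.1, 49.0],
        "body_mass_g": [3750.0, None, 4500.0, 3800.0],
        "sex": ["Male", None, "Female", None],
    }
)

# Count missing values per column before deciding how to handle them.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# One simple option: drop rows that lack any of the numeric measurements.
cleaned = sample.dropna(subset=["bill_length_mm", "body_mass_g"])
print(len(sample), "->", len(cleaned), "rows")
```

Dropping rows is only one possible strategy; whether it is appropriate depends on how many rows are affected and on the analysis that follows.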

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# Load and explore the penguins dataset
penguins = pd.read_csv('data/penguins.csv')

# Show the first few rows
display(penguins.head())

# Show basic info
penguins.info()

# Show summary statistics
display(penguins.describe(include='all'))

In [None]:
# Histogram of bill length using seaborn
sns.histplot(penguins['bill_length_mm'], bins=30, kde=True, edgecolor='black')
plt.title('Bill Length Distribution')
plt.xlabel('Bill Length (mm)')
plt.ylabel('Frequency')
plt.show()

# Boxplot of body mass by species using seaborn
sns.boxplot(x='species', y='body_mass_g', data=penguins)
plt.title('Body Mass by Species')
plt.xlabel('Species')
plt.ylabel('Body Mass (g)')
plt.show()

# Scatter plot: bill length vs flipper length, colored by species using seaborn
sns.scatterplot(
    data=penguins,
    x='bill_length_mm',
    y='flipper_length_mm',
    hue='species',
    alpha=0.7
)
plt.title('Bill Length vs Flipper Length by Species')
plt.xlabel('Bill Length (mm)')
plt.ylabel('Flipper Length (mm)')
plt.legend(title='Species')
plt.show()

##### 2. Car Crashes Dataset

- The car crashes dataset contains traffic accident data aggregated by US state.
- The columns used in this notebook include `total`, `speeding`, `alcohol`, `ins_premium`, `ins_losses`, and the state abbreviation `abbrev`.
- This dataset is widely used for data visualization, exploratory data analysis, and statistical modeling to understand factors contributing to road accidents.
- It provides a practical example for learning about correlation, regression, and geospatial analysis in Python.
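Since correlation comes up here, a short sketch of what a Pearson correlation coefficient computes may help. The helper `pearson` and the numbers are hypothetical, chosen to be perfectly linear so the result is easy to check by hand; they are not taken from the real file.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values, not taken from the real file.
premiums = [700.0, 800.0, 900.0, 1000.0]
losses = [110.0, 130.0, 150.0, 170.0]
r = pearson(premiums, losses)
print(round(r, 3))
```

With the real data loaded into a DataFrame, `car_crashes[['ins_premium', 'ins_losses']].corr()` should give the same kind of coefficient without the manual arithmetic.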

In [None]:
# Load and explore the car crashes dataset
car_crashes = pd.read_csv('data/car_crashes.csv')

# Show the first few rows
display(car_crashes.head())

# Show basic info
car_crashes.info()

# Show summary statistics
display(car_crashes.describe(include='all'))

In [None]:
sns.histplot(car_crashes['total'], bins=15, kde=True, edgecolor='black')
plt.title('Distribution of Total Car Crashes per State')
plt.xlabel('Total Crashes')
plt.ylabel('Frequency')
plt.show()

In [None]:
# Draw a choropleth map of insurance losses by state using plotly
import plotly.express as px

fig = px.choropleth(
    car_crashes,
    locations='abbrev',
    locationmode='USA-states',
    color='ins_losses',
    color_continuous_scale='Reds',
    scope='usa',
    labels={'ins_losses': 'Insurance Losses'},
    title='Insurance Losses by State (USA)'
)
fig.show()

In [None]:
# Scatter plot: insurance premium vs insurance losses, colored by speeding rate
plt.figure(figsize=(10, 6))
sns.scatterplot(
    data=car_crashes,
    x='ins_premium',
    y='ins_losses',
    size='speeding',
    hue='speeding',
    palette='viridis',
    sizes=(20, 200),
    alpha=0.7
)
plt.title('Insurance Premium vs Losses (Size/Color: Speeding Rate)')
plt.xlabel('Insurance Premium')
plt.ylabel('Insurance Losses')
plt.legend(title='Speeding Rate', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()

In [None]:
# Pairplot to explore relationships between numeric variables
sns.pairplot(car_crashes[['total', 'speeding', 'alcohol', 'ins_premium', 'ins_losses']])
plt.suptitle('Pairplot of Car Crashes Numeric Features', y=1.02)
plt.show()

##### 3. Chlorophyll Concentration Analysis

The `data/chla_subset.csv` dataset contains chlorophyll-a (chla) predictions for various water bodies, such as lakes and reservoirs. Each row represents a measurement event, including the following columns:

- `gnis_name`: Name of the water body (e.g., "Pepacton Reservoir", "Lake Montauk").
- `comid`: Unique identifier for the water body.
- `centroid_longitude` and `centroid_latitude`: Geographic coordinates of the water body's centroid.
- `date_acquired`: Date when the measurement or prediction was made.
- `predictions`: Predicted chlorophyll-a concentration (likely in µg/L).

This dataset is useful for analyzing spatial and temporal patterns of chlorophyll-a, which is an important indicator of water quality and algal biomass.
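A rough sketch of what "spatial and temporal patterns" can mean in practice: averaging the predictions per month and per water body. The rows below are made up, the lake names are placeholders, and the ISO date format is an assumption about `date_acquired`, so adjust the parsing if the real file differs.

```python
import pandas as pd

# Hypothetical rows using the column names described above; values are made up.
sample = pd.DataFrame(
    {
        "gnis_name": ["Lake A", "Lake A", "Lake B", "Lake B"],
        "date_acquired": ["2020-06-01", "2020-07-15", "2020-06-10", "2020-07-20"],
        "predictions": [4.0, 12.0, 8.0, 10.0],
    }
)

# Assumes ISO-formatted date strings; the real file may need a different format.
sample["date_acquired"] = pd.to_datetime(sample["date_acquired"])

# Temporal pattern: mean prediction per calendar month.
monthly_mean = sample.groupby(sample["date_acquired"].dt.month)["predictions"].mean()

# Spatial pattern: mean prediction per water body.
per_lake_mean = sample.groupby("gnis_name")["predictions"].mean()

print(monthly_mean)
print(per_lake_mean)
```

The same two `groupby` calls should carry over to the full `chla` DataFrame once its date column has been parsed.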

In [None]:
# Load and describe the chla dataset
chla = pd.read_csv('data/chla_subset.csv')

# Show the first few rows
display(chla.head())

# Show basic info
chla.info()

# Show summary statistics
display(chla.describe(include='all'))

In [None]:
chla['predictions'].plot.hist(bins=30, edgecolor='black')
plt.axvline(x=10, color='red', linestyle='--', label='Acceptable Value')
plt.title('Distribution of Chlorophyll-a Predictions')
plt.xlabel('Chlorophyll-a (µg/L)')
plt.ylabel('Frequency')
plt.legend()
plt.show()

### Introducing Voila!

Voila turns Jupyter notebooks into standalone web applications by running the notebook and serving only the output (hiding the code cells). It's perfect for sharing interactive dashboards and applications with non-technical users.

#### Deploying with Voila

Now let's learn how to deploy this interactive notebook as a web application using Voila.

### What happens when you use Voila?

1. **Code cells are hidden** - Only the output (widgets and plots) is displayed
2. **Interactive widgets still work** - Users can interact with sliders, buttons, etc.
3. **No code editing** - Users can't modify or see the underlying code
4. **Clean interface** - Professional-looking web application

### Running Voila

There are several ways to run Voila, but the easiest is from the command line:

```bash
# Basic usage - serve this notebook
voila 04_jupyter_voila.ipynb

# Serve all notebooks in current directory
voila .

# Customize port and host
voila 04_jupyter_voila.ipynb --port=8867 --Voila.ip=0.0.0.0

# Strip source code from the served page (this is Voila's default behavior)
voila 04_jupyter_voila.ipynb --strip_sources=True
```
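If you would rather start Voila from a Python script, one possible approach is to build the same command line and hand it to `subprocess`. The helper `build_voila_command` and the `LAUNCH` switch are hypothetical names for this sketch; only the flags already shown above are used, and nothing is launched unless you flip the switch.

```python
import shutil
import subprocess


def build_voila_command(notebook, port=8867, ip="0.0.0.0"):
    """Return the argument list for the CLI call shown above (hypothetical helper)."""
    return ["voila", notebook, f"--port={port}", f"--Voila.ip={ip}"]


command = build_voila_command("04_jupyter_voila.ipynb")
print(" ".join(command))

# Only launch when explicitly enabled and when the voila executable is on PATH.
LAUNCH = False
if LAUNCH and shutil.which("voila"):
    server = subprocess.Popen(command)  # stop it later with server.terminate()
```

Keeping the command as a list rather than a single string avoids shell quoting problems when the notebook path contains spaces.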

## Try for yourself!