<a href="https://colab.research.google.com/github/jmarcano101/data110/blob/main/Week9_Visualizing_Proportions.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Let's start by loading the necessary packages. We'll use pandas for handling our data, matplotlib.pyplot for creating visualizations, and numpy for any numerical operations we might need. Importing these libraries at the beginning gives us the tools we need to dive into the data and its presentation.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

Next, we'll use some sample data to plot a pie chart. Pie charts are awesome for showing how different parts make up a whole. Let’s create our data to see this in action.

In [None]:


# Data
labels = ['A', 'B', 'C', 'D']
sizes = [15, 30, 45, 10]

# Plot
plt.pie(sizes, labels=labels, autopct='%1.1f%%', startangle=140)
plt.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle
plt.show()
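
As a quick check on what `autopct` prints, the percentages can be worked out by hand: each slice is its value divided by the total. A minimal sketch, using the same sample `sizes` as above:

```python
import numpy as np

labels = ['A', 'B', 'C', 'D']
sizes = np.array([15, 30, 45, 10])

# Each slice's share of the whole, in percent
shares = sizes / sizes.sum() * 100

for label, share in zip(labels, shares):
    print(f"{label}: {share:.1f}%")
```

Because the sample values happen to add up to 100, the percentages here equal the raw values.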


Now, we’ll create data representing the seat distribution in the 8th German Bundestag. Our dataset includes three parties (CDU/CSU, SPD, and FDP) along with the number of seats each holds. We'll put this information into a pandas DataFrame and draw a first pie chart with a legend.

In [None]:


# Data for the 8th German Bundestag
data = {
    'Party': ['CDU/CSU', 'SPD', 'FDP'],
    'Seats': [243, 214, 39]
}

# Create the DataFrame
bundestag_df = pd.DataFrame(data)

# Create Pie Plot
plt.pie(bundestag_df['Seats'])
plt.legend(bundestag_df['Party'], title="Parties")
plt.axis('equal')  # Equal aspect ratio ensures that pie is drawn as a circle.
plt.title('Party composition of the 8th German Bundestag, 1976–1980')
plt.show()



Next, we'll make the chart more readable by adding labels, and by passing `autopct` we display each party's share of the total directly on the chart. This makes our visualization more informative. We cap off our plot with a title that places it in the historical context of the 8th German Bundestag.

In [None]:

# Plotting the pie chart
plt.pie(bundestag_df['Seats'], labels=bundestag_df['Party'], autopct='%1.1f%%')

# Title for the pie chart
plt.title('Party composition of the 8th German Bundestag, 1976–1980')

# Display the plot
plt.show()
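
Besides a format string, `autopct` also accepts a function that receives each slice's percentage and returns the text to display. The sketch below uses this to show the seat count together with the share; the helper name `seats_and_share` is made up for this illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

bundestag_df = pd.DataFrame({
    'Party': ['CDU/CSU', 'SPD', 'FDP'],
    'Seats': [243, 214, 39],
})
total_seats = bundestag_df['Seats'].sum()

def seats_and_share(pct):
    """Hypothetical helper: turn a slice percentage into 'seats (share%)'."""
    seats = int(round(pct / 100 * total_seats))
    return f"{seats}\n({pct:.1f}%)"

fig, ax = plt.subplots()
ax.pie(bundestag_df['Seats'], labels=bundestag_df['Party'], autopct=seats_and_share)
ax.set_title('Party composition of the 8th German Bundestag, 1976–1980')
plt.show()
```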



For style, we'll apply the 'ggplot' style to our plot. We'll also set `startangle=90` so that the first slice starts at the 12 o'clock position; by default, matplotlib draws the remaining slices counterclockwise from there. A consistent starting point makes the slices easier to compare. Let's see how these tweaks change our visualization.

In [None]:


# Use the ggplot style
plt.style.use('ggplot')

# Plotting the pie chart
plt.pie(bundestag_df['Seats'], labels=bundestag_df['Party'], autopct='%1.1f%%', startangle=90)
plt.axis('equal')
plt.show()
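
To see how `startangle` interacts with the drawing direction, the sketch below puts two pies side by side: one with the default counterclockwise order and one with `counterclock=False`, which some readers may find more natural because it reads like a clock face. The axes names are chosen only for this example.

```python
import matplotlib.pyplot as plt

seats = [243, 214, 39]
parties = ['CDU/CSU', 'SPD', 'FDP']

fig, (ax_default, ax_clockwise) = plt.subplots(1, 2, figsize=(9, 4))

# Default direction: slices run counterclockwise from the 12 o'clock position
wedges_ccw, *_ = ax_default.pie(seats, labels=parties, startangle=90)
ax_default.set_title('startangle=90')

# counterclock=False: slices run clockwise from the same starting point
wedges_cw, *_ = ax_clockwise.pie(seats, labels=parties, startangle=90,
                                 counterclock=False)
ax_clockwise.set_title('startangle=90, counterclock=False')

plt.show()
```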


Let's switch to a stacked bar chart to visualize the Bundestag's party composition. Each party gets its own color (grey for CDU/CSU, red for SPD, and yellow for FDP), which makes the segments easy to tell apart. We stack the bars so that each party's seats sit on top of the previous party's, giving a clear view of the overall distribution.


In [None]:

# Use ggplot style
plt.style.use('ggplot')

# Create a figure and a set of subplots
fig, ax = plt.subplots()

# Data setup
parties = bundestag_df['Party']
seats = bundestag_df['Seats']
colors = ['grey', 'red', 'yellow']  # Example colors for the parties

# Cumulative sum to stack the bars
cumulative_seats = np.cumsum(seats)

# Create a bar for each party
for i, (party, seat) in enumerate(zip(parties, seats)):
    if i == 0:
        ax.bar('Bundestag', seat, label=party, color=colors[i])
    else:
        ax.bar('Bundestag', seat, bottom=cumulative_seats[i-1], label=party, color=colors[i])

# Adding labels and title
ax.set_ylabel('Seats')
ax.set_title('Party composition of the 8th German Bundestag, 1976–1980')
ax.legend()

# Show the plot
plt.show()
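
The stacking relies on each segment starting where the previous one ended. A small sketch of that arithmetic, using the same seat numbers, shows where the `bottom` values come from:

```python
import numpy as np

seats = np.array([243, 214, 39])  # CDU/CSU, SPD, FDP

# The running total gives the top edge of each segment
tops = np.cumsum(seats)

# Each segment starts where the previous one ended; the first starts at 0
bottoms = tops - seats

print(bottoms.tolist())  # [0, 243, 457]
print(tops.tolist())     # [243, 457, 496]
```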


Using a loop to build the chart is especially handy when there are many categories: each one gets its own segment of the stacked bar without us having to plot it by hand. This not only simplifies the code but also lets the same script handle datasets of different sizes. The version below iterates over the DataFrame rows and keeps a running total in `bottom_value`, so each party's seats are stacked on top of the previous ones.

In [None]:

plt.figure(figsize=(4,8))
# Initialize a base value for the bottom parameter of each bar
bottom_value = 0

# Plot each party's seats as a segment of a stacked bar
for index, row in bundestag_df.iterrows():
    plt.bar('Bundestag', row['Seats'], bottom=bottom_value, label=row['Party'])
    bottom_value += row['Seats']

plt.ylabel('Seats')
plt.title('Party composition of the 8th German Bundestag, 1976–1980')
plt.legend()
plt.show()
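
As an alternative to writing the loop ourselves, pandas can draw the stacked bar directly with `plot(kind='bar', stacked=True)`. This is only a sketch of one possible approach; the intermediate name `wide` and the row label 'Bundestag' are choices made for this example.

```python
import pandas as pd
import matplotlib.pyplot as plt

bundestag_df = pd.DataFrame({
    'Party': ['CDU/CSU', 'SPD', 'FDP'],
    'Seats': [243, 214, 39],
})

# Reshape to a single row with one column per party, so that
# stacked=True piles the parties on top of each other in one bar
wide = bundestag_df.set_index('Party').T.rename(index={'Seats': 'Bundestag'})

ax = wide.plot(kind='bar', stacked=True, figsize=(4, 8), rot=0)
ax.set_ylabel('Seats')
ax.set_title('Party composition of the 8th German Bundestag, 1976–1980')
plt.show()
```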


# Enumerate Function

The `enumerate` function allows you to iterate over a sequence of items and get a counter (index) for each item. It returns an enumerate object, which produces pairs of (index, value) during iteration.

Here's an example of how to use the `enumerate` function:

```python
fruits = ['apple', 'banana', 'cherry']

for i, fruit in enumerate(fruits):
  print(i, fruit)
```
This will print:
```
0 apple
1 banana
2 cherry
```
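
Two variations are worth knowing, sketched here with the party data from earlier: `start=1` changes where the counter begins, and wrapping `zip` in `enumerate` gives an index alongside paired values, which is the pattern used in the stacked bar chart loop.

```python
parties = ['CDU/CSU', 'SPD', 'FDP']
seats = [243, 214, 39]

# start=1 makes the counter begin at 1 instead of 0
for rank, party in enumerate(parties, start=1):
    print(rank, party)

# enumerate combined with zip yields (index, (party, seat)) pairs
indexed = list(enumerate(zip(parties, seats)))
print(indexed[0])  # (0, ('CDU/CSU', 243))
```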

Another way to compare proportions is a regular bar chart, with one bar per party.

In [None]:


# Use ggplot style for a cleaner look
plt.style.use('ggplot')

# Create the bar chart with customized colors and bar width
plt.bar(bundestag_df['Party'], bundestag_df['Seats'], color=['red', 'green', 'blue'], width=0.6)

# Add titles and labels for clarity
plt.title('Party Composition of the 8th German Bundestag, 1976-1980')
plt.xlabel('Party')
plt.ylabel('Number of Seats')

# Improve the layout for better readability
plt.xticks(rotation=45)  # Rotate party names for better fit
plt.tight_layout()  # Adjust layout to make room for the rotated party names

# Add a grid for easier comparison
plt.grid(axis='y', linestyle='--', alpha=0.7)

# Annotate each bar with its seat count, placed just inside the top of the bar
for i, value in enumerate(bundestag_df['Seats']):
    plt.text(i, value - 20, str(value), ha='center', va='bottom', color='white')

# Show the plot
plt.show()
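
Since the goal is to compare proportions, it can help to label the bars with each party's share rather than the raw count. The sketch below assumes matplotlib 3.4 or newer, where `Axes.bar_label` is available; on older versions the `plt.text` loop above is the fallback.

```python
import pandas as pd
import matplotlib.pyplot as plt

bundestag_df = pd.DataFrame({
    'Party': ['CDU/CSU', 'SPD', 'FDP'],
    'Seats': [243, 214, 39],
})
shares = bundestag_df['Seats'] / bundestag_df['Seats'].sum() * 100

fig, ax = plt.subplots()
bars = ax.bar(bundestag_df['Party'], bundestag_df['Seats'], width=0.6)

# bar_label places one text label per bar; here each label shows the seat share
share_labels = [f"{share:.1f}%" for share in shares]
ax.bar_label(bars, labels=share_labels, padding=3)

ax.set_ylabel('Number of Seats')
plt.show()
```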


Next, we'll create a Donut Chart, which involves first crafting a pie chart and then strategically placing a white circle in its center. This simple yet effective modification transforms the traditional pie chart into a visually appealing donut chart.

In [None]:
# Assuming bundestag_df is already defined and includes 'Party' and 'Seats'
seats = bundestag_df['Seats']
parties = bundestag_df['Party']

# Create pie chart as usual
fig, ax = plt.subplots()
ax.pie(seats, labels=parties, startangle=90, autopct='%1.1f%%')

# Draw a white circle at the center to turn the pie chart into a donut chart
centre_circle = plt.Circle((0, 0), 0.7, fc='white')
ax.add_artist(centre_circle)

plt.show()
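
A donut can also be drawn without the extra circle by giving the wedges a `width` through `wedgeprops`, so the hole stays transparent on any background. This is a sketch of that alternative, not a replacement for the approach above; the values 0.3 and 0.85 are illustrative choices.

```python
import matplotlib.pyplot as plt

seats = [243, 214, 39]
parties = ['CDU/CSU', 'SPD', 'FDP']

fig, ax = plt.subplots()
# width=0.3 keeps only the outer 30% of the radius, leaving a hole in the middle;
# pctdistance=0.85 moves the percentage labels onto the ring
wedges, texts, autotexts = ax.pie(
    seats,
    labels=parties,
    startangle=90,
    autopct='%1.1f%%',
    pctdistance=0.85,
    wedgeprops={'width': 0.3},
)
ax.set_title('Party composition of the 8th German Bundestag, 1976–1980')
plt.show()
```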


Another way to display proportions is a mosaic plot. **Mosaic plots** are a powerful visual tool for analyzing and displaying the relationship between two or more categorical variables. They provide a quick, intuitive overview of how different categories interact, making them particularly useful for spotting patterns or inconsistencies in complex datasets. In the following example, we'll use a mosaic plot to examine the relationship between two categorical variables: the city (New York or Los Angeles) and weather conditions (Sunny, Cloudy, or Rainy).

We'll start by preparing our data and converting it into a Pandas DataFrame. Then, utilizing the mosaic function from the **statsmodels** library, we'll create a mosaic plot that visually represents the distribution of weather conditions across these two cities. This method allows us to observe not only the frequency of each weather condition but also how these frequencies compare between different cities. Let's proceed with the code to generate our mosaic plot.

In [None]:
from statsmodels.graphics.mosaicplot import mosaic


# Set the style for nicer colors
plt.style.use('ggplot')

# Example data with different categories
data = {
    'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles', 'New York', 'Los Angeles', 'New York', 'Los Angeles', 'New York'],
    'Weather': ['Sunny', 'Cloudy', 'Sunny', 'Cloudy', 'Rainy', 'Sunny', 'Cloudy', 'Rainy', 'Rainy']
}

# Convert the data into a DataFrame
df = pd.DataFrame(data)

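# Create a mosaic plot of the two categorical variables.
# mosaic() opens its own figure unless it is given an axes, so pass ax explicitly
fig, ax = plt.subplots(figsize=(6, 4))
mosaic(df, ['City', 'Weather'], ax=ax)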
plt.show()
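
The tile sizes in the mosaic plot reflect a cross-tabulation of the two variables. To read the numbers behind the picture, the same table can be built with `pd.crosstab`; this sketch repeats the example data so it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'City': ['New York', 'Los Angeles', 'New York', 'Los Angeles', 'New York',
             'Los Angeles', 'New York', 'Los Angeles', 'New York'],
    'Weather': ['Sunny', 'Cloudy', 'Sunny', 'Cloudy', 'Rainy',
                'Sunny', 'Cloudy', 'Rainy', 'Rainy'],
})

# Raw counts for every (City, Weather) combination
counts = pd.crosstab(df['City'], df['Weather'])

# Share of each weather condition within a city (each row sums to 1)
row_shares = pd.crosstab(df['City'], df['Weather'], normalize='index')

print(counts)
print(row_shares.round(2))
```

In the plot, a city with more observations gets a wider column, and within each column the tiles are sized by that city's weather shares.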


**New Topic: Voronoi Diagram**

### John Snow's map

In the mid-19th century, [John Snow's](https://www.google.com/search?q=John+Snow%27s&client=safari&sca_esv=6480816f92634337&rls=en&sxsrf=ACQVn0_Ykfu2l7-KQxPNj4nsEv-MHmdmwQ%3A1712764578182&ei=orYWZqrQCsvZ5NoPrOq_2AE&ved=0ahUKEwiqqZCigbiFAxXLLFkFHSz1DxsQ4dUDCA8&uact=5&oq=John+Snow%27s&gs_lp=Egxnd3Mtd2l6LXNlcnAiC0pvaG4gU25vdydzMgoQIxiABBiKBRgnMgoQIxiABBiKBRgnMgoQABiABBgUGIcCMgcQABiABBgKMgcQABiABBgKMgcQABiABBgKMgUQABiABDIHEAAYgAQYCjIFEAAYgAQyBxAAGIAEGApIvQhQkgZYkgZwAHgCkAEAmAHgAaABswOqAQMyLTK4AQPIAQD4AQGYAgOgAvoBwgIEEAAYR5gDAIgGAZAGCJIHBTIuMC4xoAfhDw&sclient=gws-wiz-serp) groundbreaking cholera map visually linked cholera cases to water sources in London, pioneering the use of geographic mapping in epidemiology. Building on this historical example, we'll explore the creation of a Voronoi diagram, a tool that partitions a plane into regions based on distance to points in a specific subset of the plane. This concept can help us understand how geographical proximity could influence disease spread or resource accessibility.

Utilizing John Snow's map as inspiration, we'll generate a Voronoi diagram to simulate how different water sources might serve distinct areas or influence health outcomes in a modern context. This approach not only pays homage to Snow's innovative work but also demonstrates the continued relevance of geographic analysis in public health and urban planning. Let's dive into the process.


In [None]:
pump_df = pd.read_csv('https://raw.githubusercontent.com/yy/dviz-course/master/data/pumps.csv')

In [None]:
pump_df.sample(5)

In [None]:
# finding number of pumps
len(pump_df)

In [None]:
# shape also works, but it returns (rows, columns); we only need the first value, the row count
pump_df.shape  # 13 rows and 2 columns

In [None]:
# Load the deaths dataset
death_df = pd.read_csv('https://raw.githubusercontent.com/yy/dviz-course/master/data/deaths.csv')

In [None]:
plt.scatter(x=death_df['X'], y=death_df['Y'], label='Deaths', s=2, c='black')
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Scatter Plot of Deaths')
plt.legend()
plt.show()


In [None]:
# Scatter plot for deaths
plt.scatter(x=death_df['X'], y=death_df['Y'], s=2, c='black', label='Deaths')

# Scatter plot for pumps
plt.scatter(x=pump_df['X'], y=pump_df['Y'], s=8, c='red', label='Pumps')

# Adding labels and legend
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Scatter Plot of Deaths and Pumps')
plt.legend()

# Show the plot
plt.show()


In [None]:
from scipy.spatial import Voronoi, voronoi_plot_2d

In [None]:
# Get the points from the pump_df DataFrame
points = pump_df.values

# Compute Voronoi tessellation
vor = Voronoi(points)

# Plot Voronoi diagram
voronoi_plot_2d(vor, show_vertices=False)

# Scatter plot of pump locations
plt.scatter(pump_df['X'], pump_df['Y'], c='red', label='Pumps')

# Add labels and legend
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Voronoi Diagram of Pump Locations')
plt.legend()

# Show the plot
plt.show()


In [None]:


# Compute Voronoi tessellation
vor = Voronoi(points)

# Plot Voronoi diagram
voronoi_plot_2d(vor, show_vertices=False)

# Scatter plot of pump locations
plt.scatter(pump_df['X'], pump_df['Y'], c='red', label='Pumps')

# Scatter plot of death locations
plt.scatter(death_df['X'], death_df['Y'], s=2, c='black', label='Deaths')

# Add labels and legend
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Voronoi Diagram with Deaths and Pumps')
plt.legend()

# Save the plot
plt.savefig('voronoi_diagram.png')

# Show the plot
plt.show()
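
The diagram suggests a natural follow-up question: how many deaths fall inside each pump's cell? A point lies in the Voronoi cell of whichever site is nearest to it, so the count can be obtained with a nearest-neighbor lookup. The sketch below uses made-up coordinates in place of the real data so that it runs on its own.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical coordinates, standing in for the pump and death locations
sites = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.array([[1.0, 1.0], [2.0, 0.5], [9.0, 1.0], [1.0, 8.0], [0.5, 0.5]])

# A point lies in the Voronoi cell of whichever site is nearest to it
tree = cKDTree(sites)
_, nearest_site = tree.query(points)

# Tally how many points fall in each site's cell
counts = np.bincount(nearest_site, minlength=len(sites))
print(counts.tolist())  # [3, 1, 1]
```

To apply this to the notebook's data, `sites` could be replaced with `pump_df[['X', 'Y']].to_numpy()` and `points` with `death_df[['X', 'Y']].to_numpy()`.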


This is useful if you do not have a dataset and want to create a Voronoi diagram from randomly generated points.

In [None]:

# Generate random data points

# Fix the seed so the same random points are generated on every run
np.random.seed(0)
points = np.random.rand(10, 2)  # 10 random points in 2D space

# Compute Voronoi tessellation
vor = Voronoi(points)

# Plot Voronoi diagram
voronoi_plot_2d(vor, show_vertices=False)

# Scatter plot of data points
plt.scatter(points[:, 0], points[:, 1], c='blue', label='Data Points')

# Add labels and legend
plt.xlabel('X')
plt.ylabel('Y')
plt.title('Voronoi Diagram with Random Data Points')
plt.legend()

# Save the plot
plt.savefig('random_voronoi_diagram.png')

# Show the plot
plt.show()
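
Beyond plotting, the `Voronoi` object exposes the geometry it computed. The sketch below inspects a few of its attributes for the same ten random points and lists which cells are bounded; cells on the outer edge extend to infinity, which is why some lines in the plot run off the figure.

```python
import numpy as np
from scipy.spatial import Voronoi

np.random.seed(0)
points = np.random.rand(10, 2)
vor = Voronoi(points)

# vor.vertices holds the corner coordinates of the cells
print(vor.vertices.shape)

# point_region maps each input point to the index of its region in vor.regions.
# A region is a list of vertex indices; -1 marks a vertex at infinity,
# so regions containing -1 are unbounded
bounded = [
    i for i, region_idx in enumerate(vor.point_region)
    if -1 not in vor.regions[region_idx]
]
print(len(vor.point_region))  # one region per input point
print(bounded)                # indices of points whose cells are closed polygons
```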
