
#### **Pie Chart - Jupyter Notebook**
When you run this notebook, it will load a CSV file containing people’s email addresses, extract their email **domains**, count how many times each occurs, and display the results as a **pie chart**.


---


#### **Importing the Necessary Libraries**
To work with the dataset, pandas is used for handling data and matplotlib for plotting. pandas provides convenient tools for reading, cleaning, and analyzing tabular data, while matplotlib makes it possible to create flexible and informative visualizations.


In [None]:
%matplotlib inline
import pandas as pd                 # To handle csv data
import matplotlib.pyplot as plt     # To create the pie chart


#### **Load the Dataset**

The next step is to load the file `people-1000.csv`, which contains the required information. This file includes a column of email addresses that will be analyzed.

In [None]:
csv_path = r"C:\Users\CAD-PC\Desktop\GitHub - Cloned Repository\PFDA\Assignments\Week-3\people-1000.csv"  # full path to the CSV file
data = pd.read_csv(csv_path)                    # load data into a pandas DataFrame (named `data` to match later cells)
data.head()                                     # preview the first few rows


#### **Extracting Email Domains**

Here, the column that contains email addresses is detected by name. The **domain** part of each email (everything after the `@` symbol) is extracted and stored in a new column called `domain`.





In [None]:
email_col = [c for c in data.columns if 'email' in c.lower()][0]  # detect email column
data['domain'] = data[email_col].apply(lambda x: str(x).split('@')[-1])  # extract domain
data.head()  # preview the updated DataFrame
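
The `apply`-based extraction above converts every value to a string first, so a missing email becomes the literal domain `nan`, and a value without an `@` is kept whole as its own "domain". As a hedged alternative, the sketch below uses pandas' vectorized string methods and leaves such rows as `NaN`. The `sample` DataFrame and its `Email` column are made up for illustration and are not taken from people-1000.csv.

```python
import pandas as pd

# Hypothetical sample data; the real notebook reads people-1000.csv instead.
sample = pd.DataFrame(
    {"Email": ["ann@example.com", "bob@Example.org", None, "not-an-email"]}
)

# Capture everything after the last '@'. Rows with no '@' or a missing
# value yield NaN rather than a bogus domain.
sample["domain"] = (
    sample["Email"].str.extract(r"@([^@]+)$", expand=False).str.lower()
)

print(sample)
```

Lowercasing is an optional design choice here: it makes `Example.org` and `example.org` count as the same domain in the next step.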

#### **Count Unique Email Domains**
Count how many times each email domain appears using the `value_counts()` method. This gives a summary of which domains are most common in the dataset, sorted from most to least frequent.

In [None]:
domain_counts = data['domain'].value_counts()   # count occurrences of each domain
domain_counts.head(10)                          # display the top 10 most common domains
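
If percentages are more useful than raw counts, `value_counts()` accepts `normalize=True`, which returns each domain's share of the total. The snippet below is a small sketch on made-up domains, not on the notebook's actual data.

```python
import pandas as pd

# Hypothetical domains standing in for data['domain'].
domains = pd.Series(
    ["example.com", "example.org", "example.com", "example.net"]
)

# normalize=True returns fractions that sum to 1; scale them to percentages.
shares = domains.value_counts(normalize=True).mul(100).round(1)

print(shares)
```

These are the same proportions that the pie chart's `autopct` labels display.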

#### **Plotting the Pie Chart**
Visualize the distribution of email domains using a pie chart. Each slice of the pie represents a different domain, and the percentages show how common each one is in the dataset.

In [None]:
plt.figure(figsize=(8, 8))                      # set chart size
plt.pie(domain_counts, labels=domain_counts.index, autopct='%1.1f%%', startangle=140)
plt.title("Email Domain Distribution")          # add title
plt.show()                                      # display the pie chart
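
A pie chart becomes hard to read when the dataset contains many distinct domains, because every domain gets its own slice and label. One possible remedy, sketched below under that assumption, is to keep only the largest few domains and fold the remainder into a single "Other" slice. The helper `group_small_slices` and the sample counts are hypothetical and not part of the original notebook.

```python
import pandas as pd


def group_small_slices(counts, top_n=5, other_label="Other"):
    """Keep the top_n largest counts and sum the rest into one entry."""
    counts = counts.sort_values(ascending=False)
    top = counts.iloc[:top_n].copy()
    rest = counts.iloc[top_n:].sum()
    if rest > 0:
        top[other_label] = rest  # append the combined remainder
    return top


# Hypothetical counts standing in for domain_counts.
counts = pd.Series(
    {"example.com": 40, "example.org": 30, "example.net": 20,
     "a.test": 5, "b.test": 3, "c.test": 2}
)

grouped = group_small_slices(counts, top_n=3)
print(grouped)
```

The grouped Series can then be passed to `plt.pie` in place of `domain_counts`, with `grouped.index` as the labels.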


#### **Summary**
This notebook read the **people-1000.csv** dataset, extracted email domain names, counted how often each appeared, and plotted the data as a pie chart using matplotlib. The chart makes it easy to see which email providers (like Gmail, Yahoo, or Outlook) are most common among the dataset’s users.