# Assignment 5 - Weighted Stats

#### Author: Elaine R. Cazeta

## Part 1
Write a Jupyter notebook that analyses the differences between the sexes by age in Ireland.

- Weighted mean age (by sex)  
- The difference between the sexes by age  
- This part does not need to look at the regions  

In [42]:
# Importing pandas
import pandas as pd

In [43]:
# Importing the csv file
url = "https://ws.cso.ie/public/api.restful/PxStat.Data.Cube_API.ReadDataset/FY006A/CSV/1.0/en"
df = pd.read_csv(url)
df.head(5) # Visualizing first five rows

Unnamed: 0,STATISTIC,Statistic Label,TLIST(A1),CensusYear,C02199V02655,Sex,C02076V03371,Single Year of Age,C03789V04537,Administrative Counties,UNIT,VALUE
0,FY006AC01,Population,2022,2022,-,Both sexes,-,All ages,IE0,Ireland,Number,5149139
1,FY006AC01,Population,2022,2022,-,Both sexes,-,All ages,2ae19629-1492-13a3-e055-000000000001,Carlow County Council,Number,61968
2,FY006AC01,Population,2022,2022,-,Both sexes,-,All ages,2ae19629-1433-13a3-e055-000000000001,Dublin City Council,Number,592713
3,FY006AC01,Population,2022,2022,-,Both sexes,-,All ages,2ae19629-149f-13a3-e055-000000000001,Dún Laoghaire Rathdown County Council,Number,233860
4,FY006AC01,Population,2022,2022,-,Both sexes,-,All ages,2ae19629-14a0-13a3-e055-000000000001,Fingal County Council,Number,330506


In [44]:

# Removing data related to 'Both sexes' and 'All ages' 
df = df[df["Sex"] != "Both sexes"]
df = df[df["Single Year of Age"] != "All ages"]
df.head(5)

Unnamed: 0,STATISTIC,Statistic Label,TLIST(A1),CensusYear,C02199V02655,Sex,C02076V03371,Single Year of Age,C03789V04537,Administrative Counties,UNIT,VALUE
3296,FY006AC01,Population,2022,2022,1,Male,200,Under 1 year,IE0,Ireland,Number,29610
3297,FY006AC01,Population,2022,2022,1,Male,200,Under 1 year,2ae19629-1492-13a3-e055-000000000001,Carlow County Council,Number,346
3298,FY006AC01,Population,2022,2022,1,Male,200,Under 1 year,2ae19629-1433-13a3-e055-000000000001,Dublin City Council,Number,3188
3299,FY006AC01,Population,2022,2022,1,Male,200,Under 1 year,2ae19629-149f-13a3-e055-000000000001,Dún Laoghaire Rathdown County Council,Number,1269
3300,FY006AC01,Population,2022,2022,1,Male,200,Under 1 year,2ae19629-14a0-13a3-e055-000000000001,Fingal County Council,Number,2059


In [45]:
# Get a list of all column names in the dataset to understand its structure
headers = df.columns.tolist()
headers

['STATISTIC',
 'Statistic Label',
 'TLIST(A1)',
 'CensusYear',
 'C02199V02655',
 'Sex',
 'C02076V03371',
 'Single Year of Age',
 'C03789V04537',
 'Administrative Counties',
 'UNIT',
 'VALUE']

In [46]:
# Drop columns that are not needed for the analysis
drop_col_list = ['STATISTIC', 'Statistic Label', 'TLIST(A1)', 'CensusYear', 'C02199V02655', 'C02076V03371', 'C03789V04537', 'Administrative Counties', 'UNIT']
df = df.drop(columns=drop_col_list)
# In the 'Single Year of Age' column, replace 'Under 1 year' with '0' first
# (stripping non-digits from it directly would leave '1'), then remove every non-digit character
df['Single Year of Age'] = df['Single Year of Age'].str.replace('Under 1 year', '0')
df['Single Year of Age'] = df['Single Year of Age'].str.replace(r'\D', '', regex=True)
print(df.head(5))


       Sex Single Year of Age  VALUE
3296  Male                  0  29610
3297  Male                  0    346
3298  Male                  0   3188
3299  Male                  0   1269
3300  Male                  0   2059


In [47]:
# Convert the 'Single Year of Age' and 'VALUE' columns to integers
df['Single Year of Age'] = df['Single Year of Age'].astype('int64')
df['VALUE'] = df['VALUE'].astype('int64')
df.info()  # Check the dataframe structure and data types

<class 'pandas.core.frame.DataFrame'>
Index: 6464 entries, 3296 to 9791
Data columns (total 3 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   Sex                 6464 non-null   object
 1   Single Year of Age  6464 non-null   int64 
 2   VALUE               6464 non-null   int64 
dtypes: int64(2), object(1)
memory usage: 202.0+ KB


In [48]:
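# Create a pivot table to show the summed population (VALUE) by age and sex
# Each row represents an age, and columns show totals for males and females
# Note: df still holds the national 'Ireland' rows alongside every county row, so each sum
# is twice the national figure (e.g. 59220 = 2 x 29610 for males under 1 year)
df_anal = pd.pivot_table(df, values='VALUE', index='Single Year of Age', columns='Sex', aggfunc='sum')
df_anal.head()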

Single Year of Age,Female,Male
0,56372,59220
1,55090,57750
2,57948,60472
3,58966,62002
4,59638,63372
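
The pivot table puts the male and female counts side by side, so the difference between the sexes at each age is a single column subtraction. A minimal sketch of that step, using a small made-up table in place of the census data (the name `sample`, its counts, and the `Difference (M - F)` column label are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical toy counts, NOT census figures; stands in for the cleaned df
sample = pd.DataFrame({
    "Sex": ["Male", "Male", "Male", "Female", "Female", "Female"],
    "Single Year of Age": [0, 1, 2, 0, 1, 2],
    "VALUE": [100, 90, 80, 95, 92, 85],
})

# Same pivot as in the notebook: one row per age, one column per sex
by_age = pd.pivot_table(sample, values="VALUE", index="Single Year of Age",
                        columns="Sex", aggfunc="sum")

# Positive values mean more males than females at that age
by_age["Difference (M - F)"] = by_age["Male"] - by_age["Female"]
print(by_age)
```

On the notebook's own data the same subtraction would be applied to `df_anal`.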


In [51]:
# Split the data by sex once, then compute sum(age * population) / sum(population) for each
male = df[df["Sex"] == "Male"]
female = df[df["Sex"] == "Female"]
weighted_mean_male = (male["Single Year of Age"] * male["VALUE"]).sum() / male["VALUE"].sum()
weighted_mean_female = (female["Single Year of Age"] * female["VALUE"]).sum() / female["VALUE"].sum()

print("Weighted Mean Age - Male:", round(weighted_mean_male, 2))
print("Weighted Mean Age - Female:", round(weighted_mean_female, 2))

Weighted Mean Age - Male: 37.74
Weighted Mean Age - Female: 38.94
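
The two printed means differ by about 1.2 years (38.94 - 37.74), with females older on average. As a cross-check, the same weighted mean can be computed with `numpy.average`, which accepts a `weights` argument. The sketch below uses made-up counts (the `sample` frame and the helper `weighted_mean_age` are hypothetical names, not part of the notebook). It also shows that multiplying every weight by the same factor leaves the result unchanged, which is why the mean is not distorted when the data contains both the national rows and the county rows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy counts, NOT census figures
sample = pd.DataFrame({
    "Sex": ["Male", "Male", "Male", "Female", "Female", "Female"],
    "Single Year of Age": [0, 10, 20, 0, 10, 20],
    "VALUE": [1, 1, 2, 2, 1, 1],
})

def weighted_mean_age(frame, sex):
    """Mean age for one sex, weighting each age by its population count."""
    subset = frame[frame["Sex"] == sex]
    return np.average(subset["Single Year of Age"], weights=subset["VALUE"])

mean_male = weighted_mean_age(sample, "Male")
mean_female = weighted_mean_age(sample, "Female")

# Doubling every count (as happens when national and county rows are both kept)
# scales numerator and denominator alike, so the weighted mean does not move
doubled = sample.assign(VALUE=sample["VALUE"] * 2)
mean_male_doubled = weighted_mean_age(doubled, "Male")

print(mean_male, mean_female, mean_male_doubled)
```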


## Part 2
In the same notebook, make a variable that stores an age (say 35).  

Write the code that would group the people within 5 years of that age together, into one age group.   

Calculate the population difference between the sexes in that age group.  
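
One possible approach to this part is sketched below on made-up counts. It assumes "within 5 years" means ages from `age - 5` to `age + 5` inclusive; the names `sample`, `window`, `in_group`, and `group_totals` are hypothetical. On the real data, the rows would first need to be limited to a single level (for example only the national rows), otherwise the totals would be counted twice.

```python
import pandas as pd

# Hypothetical toy counts, NOT census figures
sample = pd.DataFrame({
    "Sex": ["Male"] * 5 + ["Female"] * 5,
    "Single Year of Age": [29, 30, 35, 40, 41] * 2,
    "VALUE": [10, 20, 30, 40, 50, 11, 22, 33, 44, 55],
})

age = 35      # the age of interest
window = 5    # assumed meaning of "within 5 years": age - 5 .. age + 5 inclusive

# Series.between is inclusive at both ends by default
in_group = sample["Single Year of Age"].between(age - window, age + window)

# Total population per sex inside the age group
group_totals = sample[in_group].groupby("Sex")["VALUE"].sum()

# Positive means more males, negative means more females
difference = group_totals["Male"] - group_totals["Female"]
print(group_totals)
print("Difference (Male - Female):", difference)
```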

## Part 3
In the same notebook, write the code that would work out which region in Ireland has the biggest population difference between the sexes in that age group.  
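
A sketch of how this could be done, again on made-up counts. It assumes the 'Administrative Counties' column is kept rather than dropped, that the national 'Ireland' row is excluded so it cannot win the comparison, and that "biggest difference" means the largest absolute gap. The region names and all variable names here are hypothetical.

```python
import pandas as pd

# Hypothetical toy counts and region names, NOT census figures
sample = pd.DataFrame({
    "Administrative Counties": ["Ireland", "Ireland", "Region A", "Region A",
                                "Region B", "Region B", "Region A", "Region A",
                                "Region B", "Region B"],
    "Sex": ["Male", "Female"] * 5,
    "Single Year of Age": [35, 35, 35, 35, 35, 35, 50, 50, 50, 50],
    "VALUE": [300, 330, 100, 105, 200, 225, 999, 1, 10, 10],
})

age, window = 35, 5

# Keep only the age group and drop the national total
in_group = sample["Single Year of Age"].between(age - window, age + window)
regions_only = sample["Administrative Counties"] != "Ireland"

# One row per region, one column per sex
by_region = pd.pivot_table(sample[in_group & regions_only], values="VALUE",
                           index="Administrative Counties", columns="Sex",
                           aggfunc="sum")

# Absolute gap between the sexes; idxmax returns the region label of the largest gap
by_region["Difference"] = (by_region["Male"] - by_region["Female"]).abs()
biggest_region = by_region["Difference"].idxmax()
print(by_region)
print("Biggest difference:", biggest_region)
```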

# End