# NHS A&E Data Validation & Summary Analysis

This notebook runs a quick validation and exploratory analysis of the NHS A&E dataset. I will:
- Check whether records are evenly distributed across years and months.
- Identify any outliers or unexpected trends.
- Summarize key statistics for numerical and categorical variables.


In [13]:
import pandas as pd

# Load the dataset
file_path = "nhs_ae_merged.csv"
nhs_data = pd.read_csv(file_path)

# Check unique years
print("Unique years in dataset:", nhs_data["year"].unique())

# Count records per year
print("\nRecords per year:")
print(nhs_data["year"].value_counts())


Unique years in dataset: [2024 2023 2022 2021 2020 2019 2018]

Records per year:
year
2018    3686
2019    2812
2020    2663
2021    2583
2022    2466
2023    2450
2024    2383
Name: count, dtype: int64


## Checking Data Balance

To check whether the dataset is balanced, I will:
- Count the number of records for each year.
- Count the number of records for each month.


In [14]:
# Count records per year
print("Records per year:")
print(nhs_data["year"].value_counts())

# Count records per month
print("\nRecords per month:")
print(nhs_data["month"].value_counts())
print("Unique years in dataset:", nhs_data["year"].unique())



Records per year:
year
2018    3686
2019    2812
2020    2663
2021    2583
2022    2466
2023    2450
2024    2383
Name: count, dtype: int64

Records per month:
month
0            1756
April        1526
May          1525
June         1521
August       1509
September    1507
October      1505
November     1503
December     1501
July         1305
January      1299
February     1296
March        1290
Name: count, dtype: int64
Unique years in dataset: [2024 2023 2022 2021 2020 2019 2018]
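
The counts above are not balanced: 2018 has 3,686 records against 2,383 to 2,812 for the other years, and the `month` column contains a value `0` (1,756 records) that is not a calendar month. A year-by-month cross-tabulation is one way to see where those records fall. The following is a minimal sketch, assuming only the `year` and `month` columns shown above. The `toy` frame, the helper names `month_balance` and `invalid_months`, and the `CALENDAR_MONTHS` list are illustrative and not part of the original notebook; in practice `nhs_data` would be passed in place of `toy`.

```python
import pandas as pd

# Toy stand-in for nhs_data (values are made up for illustration only).
toy = pd.DataFrame({
    "year": [2018, 2018, 2018, 2019, 2019, 2020],
    "month": ["0", "0", "April", "April", "May", "May"],
})

CALENDAR_MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def month_balance(df):
    """Return a year x month table of record counts."""
    return pd.crosstab(df["year"], df["month"].astype(str))


def invalid_months(df):
    """Count, per year, the rows whose month is not a calendar month name."""
    bad = ~df["month"].astype(str).isin(CALENDAR_MONTHS)
    return df.loc[bad].groupby("year").size()


table = month_balance(toy)
bad_counts = invalid_months(toy)

print(table)
print("\nRows with an invalid month, per year:")
print(bad_counts)
```

If the invalid values turn out to be concentrated in particular years, that would suggest a formatting change in the source files rather than randomly missing data, but that has to be confirmed against the real dataset.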


## Identifying Outliers

Next, I will examine the key numerical fields for outliers and unexpected values, so that the analysis rests on clean, reliable data.
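
The text does not say which fields or which outlier rule will be used. One common option is the interquartile-range (IQR) rule, which flags values more than 1.5 IQRs below the first quartile or above the third. The sketch below shows that rule under stated assumptions: the column name `attendances`, the helper `iqr_outliers`, and the toy values are all hypothetical and stand in for whichever numerical columns `nhs_data` actually contains.

```python
import pandas as pd


def iqr_outliers(series, k=1.5):
    """Return a boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # Comparisons with NaN evaluate to False, so missing values are not flagged.
    return (series < lower) | (series > upper)


# Hypothetical numeric column; values are made up for illustration only.
toy = pd.DataFrame({"attendances": [100, 110, 105, 95, 102, 98, 5000]})

mask = iqr_outliers(toy["attendances"])
flagged = toy[mask]
print(flagged)
```

Applied to the real data, the same function could be called once per numerical column. A flagged row is only a candidate for review: a very large trust may legitimately report far higher counts than a small one.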
