# Exploratory Data Analysis (EDA) Report
This report analyzes the dataset `a34_1_refused_tidy.csv`, which records refusal counts by inadmissibility ground, country, year, country-of-residence (COR) status, and residency type.

In [1]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import numpy as np

# Load the tidy refusal table and preview the first five rows
df = pd.read_csv('../data/processed/a34_1_refused_tidy.csv')
df.head()

|   | inadmissibility_grounds | country | year | cor_status | resident | count |
|---|---|---|---|---|---|---|
| 0 | A34(1) | Afghanistan | 2019 | COR Not Canada | Permanent Resident | 1 |
| 1 | A34(1) | Argentina | 2019 | COR Not Canada | Permanent Resident | 0 |
| 2 | A34(1) | Egypt | 2019 | COR Not Canada | Permanent Resident | 1 |
| 3 | A34(1) | Eritrea | 2019 | COR Not Canada | Permanent Resident | 0 |
| 4 | A34(1) | Haiti | 2019 | COR Not Canada | Permanent Resident | 0 |


## Field Overview
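We begin by examining each field's data type and non-null count. `df.info()` reports 3,486 rows and six columns with no missing values: four categorical fields stored as `object`, plus `year` and `count` stored as `int64`.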

In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3486 entries, 0 to 3485
Data columns (total 6 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   inadmissibility_grounds  3486 non-null   object
 1   country                  3486 non-null   object
 2   year                     3486 non-null   int64 
 3   cor_status               3486 non-null   object
 4   resident                 3486 non-null   object
 5   count                    3486 non-null   int64 
dtypes: int64(2), object(4)
memory usage: 163.5+ KB
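
`df.info()` covers data types and non-null counts but not distinct values. A per-column summary could be sketched as follows. The `field_overview` helper and the `toy` frame are hypothetical and purely illustrative (including the `COR Canada` label); they are not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for the real table; values are illustrative, not from the dataset.
toy = pd.DataFrame({
    "inadmissibility_grounds": ["A34(1)(a)", "A34(1)(f)", "A34(1)(f)", "A34(1)(c)"],
    "country": ["Egypt", "Haiti", "Egypt", None],  # one missing value on purpose
    "year": [2019, 2019, 2020, 2020],
    "cor_status": ["COR Not Canada", "COR Canada", "COR Not Canada", "COR Not Canada"],
    "resident": ["Permanent Resident", "Temporary Resident",
                 "Permanent Resident", "Temporary Resident"],
    "count": [1, 0, 3, 2],
})


def field_overview(frame: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column: dtype, missing-value count, distinct-value count."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_missing": frame.isna().sum(),
        "n_unique": frame.nunique(dropna=True),
    })


overview = field_overview(toy)
print(overview)
```

On the real data, `field_overview(df)` would give the same three columns for each of the six fields.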


## Refusal Reasons Distribution

In [11]:
refusal_counts = df.groupby("inadmissibility_grounds")["count"].sum().reset_index()
fig = px.bar(refusal_counts, x="inadmissibility_grounds", y="count", title="Total Refusals by Inadmissibility Grounds")
fig.show()

This bar chart shows total refusals for each inadmissibility ground defined in **Section 34(1) of Canada’s Immigration and Refugee Protection Act (IRPA)**. Each bar represents a specific clause (e.g., A34(1)(a), A34(1)(b)), and its height is the number of refusals recorded under that clause.

---

### Key Observations

* **A34(1)(f)** (*membership in an organization that engages in the acts covered by the other clauses*) is by far the most frequently cited ground, with around 600 refusals. This clause applies even when the individual did not personally engage in espionage, subversion, or terrorism but was a member of an organization believed to do so.

* Other commonly cited clauses include:

  * **A34(1)(a)**: Espionage against Canada or contrary to Canada’s interests.
  * **A34(1)(d)**: Being a danger to the security of Canada.
  * **A34(1)(c)**: Engaging in terrorism.
  * **A34(1)(b)**: Engaging in or instigating the subversion of a government by force.

* **A34(1)(b.1)** and **A34(1)(e)** are the least cited, suggesting that relatively few cases involve subversion of democratic institutions (b.1) or acts of violence that could endanger people in Canada (e).
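
The ranking read off the chart can also be computed directly. The sketch below sorts the grounds by total refusals and adds each ground's percentage share. `rank_grounds` and the `toy` counts are hypothetical and illustrative only.

```python
import pandas as pd

# Illustrative counts only; not the real refusal totals.
toy = pd.DataFrame({
    "inadmissibility_grounds": ["A34(1)(a)", "A34(1)(f)", "A34(1)(f)", "A34(1)(c)"],
    "count": [2, 5, 1, 2],
})


def rank_grounds(frame: pd.DataFrame) -> pd.DataFrame:
    """Total refusals per ground, largest first, with each ground's share in percent."""
    totals = (
        frame.groupby("inadmissibility_grounds")["count"]
        .sum()
        .sort_values(ascending=False)
    )
    out = totals.to_frame("total")
    out["share_pct"] = (out["total"] / out["total"].sum() * 100).round(1)
    return out


ranked = rank_grounds(toy)
print(ranked)
```

Calling `rank_grounds(df)` on the loaded data would put exact totals and shares behind the "most cited" and "least cited" statements above.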


## Total Refusals by Country

In [16]:
country_counts = df.groupby("country")["count"].sum().reset_index().sort_values(by="count", ascending=False)
fig = px.bar(country_counts.head(10), x="country", y="count", title="Top 10 Countries by Total Refusals")
fig.show()

This chart shows the ten countries with the most refusals. **Ukraine** leads by a wide margin, followed by **Syria** and **Iran**. The high totals may reflect active conflicts or perceived security risks, although the counts are not normalized by application volume. **China**, **Russia**, and **India** also appear, but with fewer cases.
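
To quantify how concentrated refusals are, one option is to compute the share of all refusals held by the top N countries. This is a minimal sketch; `top_n_share` and the `toy` frame (with placeholder country names) are assumptions for illustration.

```python
import pandas as pd

# Placeholder countries and counts; not taken from the dataset.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "C"],
    "count": [4, 2, 3, 1],
})


def top_n_share(frame: pd.DataFrame, n: int = 10):
    """Return the top-n country totals and the fraction of all refusals they cover."""
    totals = frame.groupby("country")["count"].sum().sort_values(ascending=False)
    top = totals.head(n)
    return top, float(top.sum() / totals.sum())


top, share = top_n_share(toy, n=2)
print(top)
print(f"Top 2 share: {share:.1%}")
```

With the real data, `top_n_share(df, n=10)` would report what fraction of all refusals the ten charted countries account for.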


## Refusal Trends Over Time

In [17]:
yearly_trend = df.groupby("year")["count"].sum().reset_index()
fig = px.line(yearly_trend, x="year", y="count", markers=True, title="Total Refusals Per Year")
fig.show()

Refusals dropped sharply in 2020, likely due to pandemic-related disruptions, then gradually increased. A major spike occurred in **2024**, reaching the highest level in the dataset, suggesting a possible policy change or increased enforcement.
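
The drop and the spike described above can be expressed as year-over-year changes. The following sketch assumes the same `year` and `count` columns. `yearly_change` and the `toy` values are hypothetical.

```python
import pandas as pd

# Illustrative yearly counts; not the real totals.
toy = pd.DataFrame({
    "year": [2019, 2019, 2020, 2021],
    "count": [6, 4, 4, 6],
})


def yearly_change(frame: pd.DataFrame) -> pd.DataFrame:
    """Total refusals per year with absolute and percentage change from the prior year."""
    out = frame.groupby("year")["count"].sum().sort_index().to_frame("total")
    out["abs_change"] = out["total"].diff()           # NaN for the first year
    out["pct_change"] = (out["total"].pct_change() * 100).round(1)
    return out


changes = yearly_change(toy)
print(changes)
```

The first year has no prior year, so its change columns are `NaN` by construction.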


## Heatmap: Refusals by Country and Year

In [18]:
# Restrict to the ten countries with the most refusals overall
top_countries = country_counts.head(10)["country"].tolist()
df_top = df[df["country"].isin(top_countries)]
# Sum refusals into a country x year grid; combinations with no rows become 0
heatmap_data = df_top.pivot_table(index="country", columns="year", values="count", aggfunc="sum", fill_value=0)
fig = px.imshow(heatmap_data, text_auto=True, aspect='auto', color_continuous_scale='Reds',
                labels=dict(x="Year", y="Country", color="Refusals"),
                title="Heatmap of Refusals for Top 10 Countries by Year")
fig.show()

Ukraine surges in 2024 to **131 refusals**, far more than any other country-year cell in the heatmap. Syria peaks in 2022. Most other countries show stable or low refusal counts throughout.
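
The largest cell in a country-by-year grid can be located programmatically rather than by eye. This sketch groups on both keys and takes the maximum, which finds the same cell as the largest value in the pivot. `peak_cell` and the `toy` frame are illustrative assumptions.

```python
import pandas as pd

# Placeholder countries and counts; not taken from the dataset.
toy = pd.DataFrame({
    "country": ["A", "A", "B", "B", "B"],
    "year": [2023, 2024, 2023, 2023, 2024],
    "count": [2, 7, 3, 1, 0],
})


def peak_cell(frame: pd.DataFrame):
    """Return (country, year, total) for the country-year pair with the most refusals."""
    totals = frame.groupby(["country", "year"])["count"].sum()
    country, year = totals.idxmax()  # MultiIndex label of the largest total
    return country, int(year), int(totals.max())


peak = peak_cell(toy)
print(peak)
```

If several cells tie for the maximum, `idxmax` returns the first one it encounters.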


## COR Status and Residency Analysis

In [7]:
cor_counts = df.groupby("cor_status")["count"].sum().reset_index()
fig1 = px.pie(cor_counts, names="cor_status", values="count", title="Refusals by COR Status")
fig1.show()

res_counts = df.groupby("resident")["count"].sum().reset_index()
fig2 = px.pie(res_counts, names="resident", values="count", title="Refusals by Residency Type")
fig2.show()

The majority of refusals (85.3%) involve applicants whose Country of Residence (COR) is not Canada, suggesting most cases affect individuals applying from abroad rather than from within Canada.

Refusals recorded under the Permanent Resident category (56.7%) outnumber those under the Temporary Resident category (43.3%), so both categories account for a substantial share of inadmissibility refusals.
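
The percentages shown on the pie charts can be reproduced as plain numbers. The following is a minimal sketch. `share_by` is a hypothetical helper, and the `toy` frame, including the `COR Canada` label, is illustrative rather than taken from the dataset.

```python
import pandas as pd

# Illustrative rows only; category labels and counts are assumptions.
toy = pd.DataFrame({
    "cor_status": ["COR Not Canada", "COR Not Canada", "COR Canada"],
    "resident": ["Permanent Resident", "Temporary Resident", "Permanent Resident"],
    "count": [5, 3, 2],
})


def share_by(frame: pd.DataFrame, column: str) -> pd.Series:
    """Percentage of total refusals falling in each category of `column`."""
    totals = frame.groupby(column)["count"].sum()
    return (totals / totals.sum() * 100).round(1)


cor_share = share_by(toy, "cor_status")
res_share = share_by(toy, "resident")
print(cor_share)
print(res_share)
```

On the real data, `share_by(df, "cor_status")` and `share_by(df, "resident")` would return the figures quoted in the two paragraphs above.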

## Summary
This EDA highlights patterns in refusal data across time, countries, and different grounds of inadmissibility. Most refusals are concentrated in a few countries and specific inadmissibility reasons. The dataset can be further explored for policy impact or predictive modeling.