# Exploratory Data Analysis (EDA) Report
This report analyzes the dataset `a34_1_refused_tidy.csv`, which contains refusal counts under the inadmissibility grounds of IRPA section 34(1), broken down by country, year, country-of-residence (COR) status, and residency type.

In [13]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import numpy as np

df = pd.read_csv('../data/processed/a34_1_refused_tidy.csv')
df.head()

|   | inadmissibility_grounds | country | year | cor_status | resident | count |
|---|---|---|---|---|---|---|
| 0 | A34(1) | Afghanistan | 2019 | COR Not Canada | Permanent Resident | 1 |
| 1 | A34(1) | Argentina | 2019 | COR Not Canada | Permanent Resident | 0 |
| 2 | A34(1) | Egypt | 2019 | COR Not Canada | Permanent Resident | 1 |
| 3 | A34(1) | Eritrea | 2019 | COR Not Canada | Permanent Resident | 0 |
| 4 | A34(1) | Haiti | 2019 | COR Not Canada | Permanent Resident | 0 |


## Field Overview
We begin by examining each field for data types, missing values, and unique value distributions.

In [14]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3486 entries, 0 to 3485
Data columns (total 6 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   inadmissibility_grounds  3486 non-null   object
 1   country                  3486 non-null   object
 2   year                     3486 non-null   int64 
 3   cor_status               3486 non-null   object
 4   resident                 3486 non-null   object
 5   count                    3486 non-null   int64 
dtypes: int64(2), object(4)
memory usage: 163.5+ KB
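
`df.info()` covers data types and missing values; the unique-value part of the overview can be sketched with `nunique` and `value_counts`. The helper `summarize_fields` is hypothetical, and the `toy` rows are made-up illustrative data with the same columns as `df`; in the notebook you would pass `df` itself.

```python
import pandas as pd

def summarize_fields(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: dtype, missing-value count, and distinct-value count per column."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing": frame.isna().sum(),
        "n_unique": frame.nunique(),
    })

# Made-up rows with the same columns as df (illustrative only, not real data)
toy = pd.DataFrame({
    "inadmissibility_grounds": ["A34(1)(f)", "A34(1)(f)", "A34(1)(a)"],
    "country": ["Country A", "Country B", "Country A"],
    "year": [2019, 2020, 2020],
    "cor_status": ["COR Not Canada", "COR Canada", "COR Not Canada"],
    "resident": ["Permanent Resident", "Temporary Resident", "Permanent Resident"],
    "count": [1, 0, 2],
})

overview = summarize_fields(toy)
print(overview)

# Frequency table for a single categorical field
print(toy["cor_status"].value_counts())
```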


## Refusal Reasons Distribution

In [15]:
refusal_counts = df.groupby("inadmissibility_grounds")["count"].sum().reset_index()
fig = px.bar(refusal_counts, x="inadmissibility_grounds", y="count", title="Total Refusals by Inadmissibility Grounds")
fig.show()

This bar chart shows the total number of refusals recorded under each inadmissibility ground in **Section 34(1) of Canada’s Immigration and Refugee Protection Act (IRPA)**. Each bar corresponds to one ground code in the data (e.g., A34(1)(a), A34(1)(b)), and its height is the total refusal count under that ground. The data also contain records coded simply as A34(1).

---

### Key Observations

* **A34(1)(f)** (*membership in an organization* that there are reasonable grounds to believe engages, has engaged, or will engage in the acts described in paragraphs (a), (b), (b.1), or (c)) is by far the most frequently cited ground, with around 600 refusals. This clause can apply even when the individual did not personally engage in espionage, subversion, or terrorism but was a member of such an organization.

* Other commonly cited clauses include:

  * **A34(1)(a)**: Espionage against Canada or contrary to Canada’s interests.
  * **A34(1)(d)**: Being a danger to the security of Canada.
  * **A34(1)(c)**: Engaging in terrorism.
  * **A34(1)(b)**: Engaging in or instigating the subversion by force of any government.

* **A34(1)(b.1)** and **A34(1)(e)** are the least cited, suggesting that relatively few cases involve subversion of a democratic government, institution, or process (b.1) or acts of violence that would or might endanger the lives or safety of persons in Canada (e).
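
To back the "by far the most frequent" reading with numbers rather than bar heights, here is a minimal sketch that ranks grounds by total refusals and adds each ground's share of all refusals. `grounds_share` is a hypothetical helper and the `toy` counts are made up; pass `df` to apply it to the real data.

```python
import pandas as pd

def grounds_share(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: total refusals per ground, sorted descending, plus share of all refusals."""
    totals = (
        frame.groupby("inadmissibility_grounds")["count"]
        .sum()
        .sort_values(ascending=False)
        .to_frame("total")
    )
    totals["share"] = totals["total"] / totals["total"].sum()
    return totals

# Made-up counts for illustration only
toy = pd.DataFrame({
    "inadmissibility_grounds": ["A34(1)(f)", "A34(1)(f)", "A34(1)(a)", "A34(1)(e)"],
    "count": [6, 2, 1, 1],
})
res = grounds_share(toy)
print(res)
```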


## Total Refusals by Country

In [4]:
country_counts = df.groupby("country")["count"].sum().reset_index().sort_values(by="count", ascending=False)
fig = px.bar(country_counts.head(10), x="country", y="count", title="Top 10 Countries by Total Refusals")
fig.show()

This chart shows the top 10 countries by total refusal cases. **Ukraine** leads by a wide margin, followed by **Syria** and **Iran**. The high numbers may reflect conflict zones or perceived security risks. Countries like **China**, **Russia**, and **India** also appear, but with fewer cases. 
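
How concentrated refusals are can be expressed as the fraction of all refusals held by the top N countries. This is a sketch: `top_n_share` is a hypothetical helper, the `toy` countries and counts are placeholders, and on the real data you would call `top_n_share(df, n=10)`.

```python
import pandas as pd

def top_n_share(frame: pd.DataFrame, n: int = 10) -> float:
    """Hypothetical helper: fraction of all refusals from the n countries with the most refusals."""
    per_country = frame.groupby("country")["count"].sum()
    return per_country.nlargest(n).sum() / per_country.sum()

# Placeholder countries and made-up counts
toy = pd.DataFrame({
    "country": ["Country A", "Country A", "Country B", "Country C", "Country D"],
    "count": [5, 3, 4, 2, 1],
})
print(top_n_share(toy, n=2))
```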


## Refusal Trends Over Time

In [5]:
yearly_trend = df.groupby("year")["count"].sum().reset_index()
fig = px.line(yearly_trend, x="year", y="count", markers=True, title="Total Refusals Per Year")
fig.show()

Refusals dropped sharply in 2020, likely due to pandemic-related disruptions, then gradually increased. A major spike occurred in **2024**, reaching the highest level in the dataset, suggesting a possible policy change or increased enforcement.
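
The size of the 2020 drop and the 2024 spike can be read off as year-over-year changes instead of estimated from the line chart. A minimal sketch with a hypothetical `yearly_change` helper and made-up counts; the percentage is computed with `shift` so the result does not depend on `pct_change` defaults that differ across pandas versions.

```python
import pandas as pd

def yearly_change(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: yearly totals with absolute and percentage change from the prior year."""
    yearly = frame.groupby("year")["count"].sum().sort_index().to_frame("total")
    yearly["change"] = yearly["total"].diff()
    # Percentage change relative to the previous row (first year has no predecessor, so NaN)
    yearly["pct_change"] = (yearly["total"] / yearly["total"].shift(1) - 1) * 100
    return yearly

# Made-up counts for illustration only
toy = pd.DataFrame({
    "year": [2019, 2019, 2020, 2021, 2022],
    "count": [30, 10, 20, 25, 50],
})
yc = yearly_change(toy)
print(yc)
```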


## Heatmap: Refusals by Country and Year

In [6]:
top_countries = country_counts.head(10)["country"].tolist()
df_top = df[df["country"].isin(top_countries)]
heatmap_data = df_top.pivot_table(index="country", columns="year", values="count", aggfunc="sum", fill_value=0)
fig = px.imshow(heatmap_data, text_auto=True, aspect='auto', color_continuous_scale='Reds',
                labels=dict(x="Year", y="Country", color="Refusals"),
                title="Heatmap of Refusals for Top 10 Countries by Year")
fig.show()

Ukraine shows a sharp surge in 2024 with **131 refusals**, far exceeding all other countries and years. Syria also had a peak in 2022. Most other countries show relatively stable or low refusal counts over time.
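
Peaks such as Ukraine in 2024 or Syria in 2022 can be confirmed from the same kind of pivot table that feeds the heatmap by taking each row's maximum. A sketch with a hypothetical `peak_year_by_country` helper and placeholder data; applied to `df_top`, it would list each top country's peak year.

```python
import pandas as pd

def peak_year_by_country(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: for each country, the year with the most refusals and that count."""
    pivot = frame.pivot_table(index="country", columns="year", values="count",
                              aggfunc="sum", fill_value=0)
    # idxmax(axis=1) returns the column label (the year) of each row's maximum
    return pd.DataFrame({"peak_year": pivot.idxmax(axis=1), "peak_count": pivot.max(axis=1)})

# Placeholder countries and made-up counts
toy = pd.DataFrame({
    "country": ["Country A", "Country A", "Country B", "Country B"],
    "year": [2023, 2024, 2022, 2024],
    "count": [10, 40, 12, 3],
})
peaks = peak_year_by_country(toy)
print(peaks)
```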


## COR Status and Residency Analysis

In [19]:
cor_counts = df.groupby("cor_status")["count"].sum().reset_index()
fig1 = px.pie(cor_counts, names="cor_status", values="count", title="Refusals by COR Status")
fig1.show()

res_counts = df.groupby("resident")["count"].sum().reset_index()
fig2 = px.pie(res_counts, names="resident", values="count", title="Refusals by Residency Type")
fig2.show()

The majority of refusals (85.3%) involve applicants whose Country of Residence (COR) is not Canada, suggesting most cases affect individuals applying from abroad rather than from within Canada.

Permanent residents account for a larger share of refusals (56.7%) than temporary residents (43.3%). These are shares of refusal counts, not refusal rates: without the number of applications in each group, the data cannot show which group is more likely to be refused.
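
The percentages shown in the pie charts can be reproduced from the grouped totals. A sketch with a hypothetical `share_by` helper and made-up counts; `share_by(df, "cor_status")` and `share_by(df, "resident")` would give the two sets of figures quoted above.

```python
import pandas as pd

def share_by(frame: pd.DataFrame, column: str) -> pd.Series:
    """Hypothetical helper: percentage of all refusals in each category of `column`."""
    totals = frame.groupby(column)["count"].sum()
    return (totals / totals.sum() * 100).round(1)

# Made-up counts for illustration only
toy = pd.DataFrame({
    "resident": ["Permanent Resident", "Permanent Resident", "Temporary Resident"],
    "count": [3, 3, 4],
})
shares = share_by(toy, "resident")
print(shares)
```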

## Trend of Refusals Over Time by Top 5 Countries

In [20]:
# Summarize total refusals per country per year
df_country_year = df.groupby(["country", "year"])["count"].sum().reset_index()

# Select the top 5 countries with the highest total number of refusals
top_countries_overall = df_country_year.groupby("country")["count"].sum().nlargest(5).index
df_top_country_year = df_country_year[df_country_year["country"].isin(top_countries_overall)]

# Plot line chart of refusal trends over time
fig_country_trend = px.line(
    df_top_country_year,
    x="year",
    y="count",
    color="country",
    markers=True,
    title="Trend of Refusals Over Time by Top 5 Countries",
    labels={"count": "Number of Refusals", "year": "Year"}
)

fig_country_trend.show()


This line chart illustrates the trend of refusals over time for the top 5 countries: **Ukraine**, **Syria**, **Iran**, **Eritrea**, and **Bangladesh**.
### Key Observations            

* **Ukraine**: Shows a dramatic spike in 2024, with refusals exceeding 130 cases — this could be linked to geopolitical instability or shifts in immigration policy.
* **Syria**: Peaks sharply in 2022 and then drops off, indicating strong temporal fluctuations, possibly tied to policy or humanitarian events.
* **Iran, Eritrea, and Bangladesh**: These countries show relatively stable trends with gradual increases from 2021 to 2024, and no sudden changes.
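
The distinction between sudden spikes (Ukraine, Syria) and gradual growth (Iran, Eritrea, Bangladesh) can be made numeric with per-country year-over-year changes: a single large jump marks a spike. A sketch with a hypothetical `country_yoy` helper and placeholder data; note that `diff` compares each row with the previous year present for that country, so gaps in years would be skipped over.

```python
import pandas as pd

def country_yoy(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: per-country yearly totals with change from that country's previous year."""
    out = (
        frame.groupby(["country", "year"])["count"].sum()
        .reset_index()
        .sort_values(["country", "year"])
    )
    out["change"] = out.groupby("country")["count"].diff()
    return out

# Placeholder countries and made-up counts
toy = pd.DataFrame({
    "country": ["Country A"] * 3 + ["Country B"] * 3,
    "year": [2022, 2023, 2024] * 2,
    "count": [5, 6, 30, 8, 9, 10],
})
result = country_yoy(toy)
print(result)

# Largest single-year jump per country
jumps = result.groupby("country")["change"].max()
print(jumps)
```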



## Heatmap Commentary: Refusals by Country and Inadmissibility Type


In [21]:
# Create a pivot table for the heatmap: countries as rows, inadmissibility types as columns
df_heatmap = df.groupby(["country", "inadmissibility_grounds"])["count"].sum().reset_index()
pivot_heatmap = df_heatmap.pivot(index="country", columns="inadmissibility_grounds", values="count").fillna(0)

# Plot using go.Heatmap for compatibility and full control
fig_heatmap = go.Figure(data=go.Heatmap(
    z=pivot_heatmap.values,
    x=list(pivot_heatmap.columns),
    y=list(pivot_heatmap.index),
    colorscale='Reds',
    colorbar=dict(title='Number of Refusals')
))

fig_heatmap.update_layout(
    title='Heatmap of Refusals by Country and Inadmissibility Type',
    xaxis_title='Inadmissibility Type',
    yaxis_title='Country'
)

fig_heatmap.show()


This heatmap visualizes the number of refusals by country and inadmissibility type. Each cell represents a specific country and the corresponding refusal count for a particular inadmissibility ground.

### Key Insights:

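* **A34(1)(f)** stands out with a strong concentration of refusals across many countries, especially in one row near the top of the chart (read as Uzbekistan or Uganda, though the label is hard to make out at this size and should be confirmed from `pivot_heatmap`). This suggests it is a commonly applied ground.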
* Most countries have **very low or zero counts** under grounds A34(1) through A34(1)(b.1), possibly reflecting how rarely these provisions are applied or how narrow their scope is.
* A few countries, such as **Ecuador, Iraq, Cuba**, and **Afghanistan**, have refusals recorded under several different inadmissibility types.
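
Instead of reading the darkest cell's label off the chart, the country can be looked up in the pivot table. A sketch with a hypothetical `top_country_for_ground` helper and placeholder data; on the real data, `pivot_heatmap["A34(1)(f)"].idxmax()` gives the same answer directly.

```python
import pandas as pd

def top_country_for_ground(frame: pd.DataFrame, ground: str) -> tuple[str, int]:
    """Hypothetical helper: country with the most refusals under one ground, and that count."""
    pivot = frame.pivot_table(index="country", columns="inadmissibility_grounds",
                              values="count", aggfunc="sum", fill_value=0)
    column = pivot[ground]
    return column.idxmax(), int(column.max())

# Placeholder countries and made-up counts
toy = pd.DataFrame({
    "country": ["Country A", "Country B", "Country B", "Country A"],
    "inadmissibility_grounds": ["A34(1)(f)", "A34(1)(f)", "A34(1)(a)", "A34(1)(a)"],
    "count": [3, 7, 2, 1],
})
country, n = top_country_for_ground(toy, "A34(1)(f)")
print(country, n)
```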



## Refusal Trends Over Time by Inadmissibility Types

In [22]:
# Group data by inadmissibility type and year, summing refusals
df_reason_trend = df.groupby(["inadmissibility_grounds", "year"])["count"].sum().reset_index()

# Keep every inadmissibility type so rarely cited grounds such as A34(1)(e) stay visible;
# to plot only the N most frequent types, filter on
# df_reason_trend.groupby("inadmissibility_grounds")["count"].sum().nlargest(N).index
df_top_reason_trend = df_reason_trend

# Plot line chart
fig_reason_trend = px.line(
    df_top_reason_trend,
    x="year",
    y="count",
    color="inadmissibility_grounds",
    markers=True,
    title="Refusal Trends Over Time by Inadmissibility Types",
    labels={"count": "Number of Refusals", "year": "Year", "inadmissibility_grounds": "Inadmissibility Type"}
)

fig_reason_trend.show()



### Key Observations:

* **A34(1)(f)** (membership in an organization engaged in espionage, subversion, or terrorism) dominates in absolute volume and shows a **dramatic surge in 2024**, more than doubling from the previous year. This could reflect **heightened enforcement**, **policy shifts**, or **global conflict spillovers**.
* **A34(1)(a)** and **A34(1)(d)** also rise modestly in 2023–2024, suggesting potentially increased scrutiny under these categories.
* Most other types remain **stable or very low**, including **A34(1)(e)** and **A34(1)(b.1)**, indicating these provisions are either rarely applied or narrowly defined.
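
The "more than doubling" reading can be checked as a ratio of consecutive years for each ground; a ratio above 2 means the count more than doubled. A sketch with a hypothetical `growth_ratio` helper and made-up counts; grounds with zero refusals in the start year produce `inf`.

```python
import pandas as pd

def growth_ratio(frame: pd.DataFrame, start_year: int, end_year: int) -> pd.Series:
    """Hypothetical helper: refusals in end_year divided by refusals in start_year, per ground."""
    pivot = frame.pivot_table(index="inadmissibility_grounds", columns="year",
                              values="count", aggfunc="sum", fill_value=0)
    return pivot[end_year] / pivot[start_year]

# Made-up counts for illustration only
toy = pd.DataFrame({
    "inadmissibility_grounds": ["A34(1)(f)", "A34(1)(f)", "A34(1)(a)", "A34(1)(a)"],
    "year": [2023, 2024, 2023, 2024],
    "count": [40, 90, 10, 12],
})
ratios = growth_ratio(toy, 2023, 2024)
print(ratios)
```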


## Stacked Bar Chart: Refusals by COR Status and Residency Type

In [23]:
# Group data by COR status and residency type
df_stackbar = df.groupby(["cor_status", "resident"])["count"].sum().reset_index()

# Plot stacked bar chart
fig_stackbar = px.bar(
    df_stackbar,
    x="resident",
    y="count",
    color="cor_status",
    title="Stacked Bar Chart: Refusals by COR Status and Residency Type",
    labels={
        "cor_status": "COR Status",
        "count": "Number of Refusals",
        "resident": "Residency Type"
    },
    barmode="stack"
)

fig_stackbar.show()


### Key Insights:

* **Temporary Residents**: refusals come overwhelmingly from applicants with **COR Not Canada**, with very few from **COR Canada**.
* **Permanent Residents** are more evenly split, but refusals from outside Canada still clearly outnumber those from within.
* Across both residency types, refusals are concentrated among applicants residing outside Canada. Because the dataset records refusals only, this shows where refusals occur rather than how likely refusal is; comparing likelihood would require the number of applications in each group.
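
Normalizing within each residency type turns the two stacked bars into directly comparable proportions. A sketch with a hypothetical `cor_share_within_residency` helper and made-up counts; dividing the pivot table by its column sums makes each residency column add up to 1.

```python
import pandas as pd

def cor_share_within_residency(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: share of each residency type's refusals by COR status."""
    table = frame.pivot_table(index="cor_status", columns="resident", values="count",
                              aggfunc="sum", fill_value=0)
    # DataFrame / Series aligns on columns, so each column is divided by its own total
    return table / table.sum()

# Made-up counts for illustration only
toy = pd.DataFrame({
    "cor_status": ["COR Canada", "COR Not Canada", "COR Canada", "COR Not Canada"],
    "resident": ["Permanent Resident", "Permanent Resident",
                 "Temporary Resident", "Temporary Resident"],
    "count": [2, 6, 1, 9],
})
shares = cor_share_within_residency(toy)
print(shares)
```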


## Summary
This EDA highlights patterns in refusal data across time, countries, and different grounds of inadmissibility. Most refusals are concentrated in a few countries and specific inadmissibility reasons. The dataset can be further explored for policy impact or predictive modeling.