# Sankey Charts of Refugees in the European Union, 2023
Florian Neukirchen

This notebook creates a Sankey chart with Plotly, using refugee statistics from UNHCR as example data:
https://www.unhcr.org/refugee-statistics/download/?url=IAr67y 

In [1]:
import plotly.graph_objects as go
import pandas as pd

Load the data, skipping the 14 lines that precede the header row in the downloaded CSV file:

In [2]:
df = pd.read_csv("data/population.csv", skiprows=14)
df.head()

Unnamed: 0,Year,Country of origin,Country of origin (ISO),Country of asylum,Country of asylum (ISO),Refugees under UNHCR's mandate
0,1990,Unknown,UNK,Afghanistan,AFG,50
1,1990,Palestinian,PSE,Algeria,DZA,4000
2,1990,Western Sahara,ESH,Algeria,DZA,165000
3,1990,Unknown,UNK,Algeria,DZA,110
4,1990,Dem. Rep. of the Congo,COD,Angola,AGO,9212
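
The `skiprows=14` argument only works if the header row is line 15 of the downloaded file. As a minimal sketch of that behaviour, the following uses a made-up in-memory file with a two-line preamble; the preamble text and the values are invented for illustration and are not taken from the UNHCR export:

```python
import io

import pandas as pd

# Made-up stand-in for a downloaded CSV: two preamble lines, then the table.
raw = (
    "Some export title\n"
    "Some export note\n"
    "Year,Country of origin,Country of asylum,Refugees\n"
    "2023,Ukraine,Poland,100\n"
    "2023,Syria,Germany,200\n"
)

# skiprows=2 drops the two preamble lines, so the third line becomes the header.
demo = pd.read_csv(io.StringIO(raw), skiprows=2)
print(demo.columns.tolist())
print(len(demo))
```

If the number of preamble lines in a future download differs, the column names printed by `df.head()` will look wrong, which is a quick way to notice that `skiprows` needs adjusting.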


Rename the refugee count column to `Refugees`, keep only EU member states as countries of asylum, and filter for the year 2023:

In [3]:
EU = ["AUT", "BEL", "BGR", "HRV", "CYP", "CZE", "DNK", "EST", "FIN", "FRA", "DEU", "GRC", "HUN", "IRL", "ITA", "LVA", "LTU", "LUX", "MLT", "NLD", "POL", "PRT", "ROU", "SVK", "SVN", "ESP", "SWE"]

In [4]:
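# Rename by reassignment instead of inplace=True
df = df.rename(columns={"Refugees under UNHCR's mandate": "Refugees"})
# Keep rows with an EU country of asylum in 2023;
# .copy() avoids SettingWithCopyWarning when columns are modified in later cells
df = df[df["Country of asylum (ISO)"].isin(EU) & (df["Year"] == 2023)].copy()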
df.head()

Unnamed: 0,Year,Country of origin,Country of origin (ISO),Country of asylum,Country of asylum (ISO),Refugees
96071,2023,Afghanistan,AFG,Austria,AUT,45690
96072,2023,Albania,ALB,Austria,AUT,46
96073,2023,Algeria,DZA,Austria,AUT,28
96074,2023,Angola,AGO,Austria,AUT,33
96075,2023,Egypt,EGY,Austria,AUT,230
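
The two filters can be illustrated on a toy table. This is only a sketch: the rows and the shortened country list `eu_subset` are invented, and only the column names match the notebook.

```python
import pandas as pd

# Toy rows (invented values) with the same column names as the notebook.
toy = pd.DataFrame({
    "Year": [2022, 2023, 2023, 2023],
    "Country of asylum (ISO)": ["AUT", "AUT", "TUR", "DEU"],
    "Refugees": [10, 20, 30, 40],
})
eu_subset = ["AUT", "DEU"]  # shortened stand-in for the full EU list

# Both conditions are boolean Series; `&` combines them element-wise.
mask = toy["Country of asylum (ISO)"].isin(eu_subset) & (toy["Year"] == 2023)

# .copy() makes the result an independent DataFrame.
filtered = toy[mask].copy()
print(filtered)
```

Only the rows for `AUT` and `DEU` in 2023 remain; the original row index is kept, which is why the filtered notebook output above starts at index 96071.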


Have a look at the countries of asylum and the countries of origin:

In [5]:
df.groupby("Country of asylum")["Refugees"].sum().sort_values(ascending=False)

Country of asylum
Germany                         2509506
Poland                           989877
France                           641626
Spain                            369722
Czechia                          351338
Italy                            308663
Austria                          277158
Sweden                           258117
Netherlands (Kingdom of the)     224320
Bulgaria                         189577
Greece                           169393
Belgium                          156921
Romania                          139081
Slovakia                         105104
Ireland                           99048
Finland                           80189
Lithuania                         73170
Denmark                           69337
Portugal                          59311
Hungary                           57939
Latvia                            45237
Estonia                           40371
Cyprus                            33872
Croatia                           23789
Malta                 

In [6]:
df.groupby("Country of origin")["Refugees"].sum().sort_values(ascending=False).head(25)

Country of origin
Ukraine                                 3979602
Syrian Arab Rep.                        1134946
Afghanistan                              469436
Iraq                                     228726
Eritrea                                  143288
Venezuela (Bolivarian Republic of)       143065
Unknown                                   97055
Somalia                                   91438
Iran (Islamic Rep. of)                    84293
Türkiye                                   83591
Russian Federation                        72893
Nigeria                                   48750
Stateless                                 45532
Dem. Rep. of the Congo                    43739
Sri Lanka                                 38315
Pakistan                                  36066
Sudan                                     35547
Guinea                                    35112
Serbia and Kosovo: S/RES/1244 (1999)      27018
Mali                                      26510
Palestinian           

Shorten some labels:

In [7]:
# Assign the result back: replace(..., inplace=True) on a selected column is a
# chained assignment, which recent pandas versions warn about.
df["Country of asylum"] = df["Country of asylum"].replace(
    {"Netherlands (Kingdom of the)": "Netherlands"}
)
df["Country of origin"] = df["Country of origin"].replace({
    "Syrian Arab Rep.": "Syria",
    "Venezuela (Bolivarian Republic of)": "Venezuela",
    "Iran (Islamic Rep. of)": "Iran",
    "Serbia and Kosovo: S/RES/1244 (1999)": "Serbia and Kosovo",
})

Keep only the top 10 countries on each side and group everything else as "Other" (countries of origin) or "Rest of EU" (countries of asylum):

In [8]:
top = 10

top_sources = list(df.groupby("Country of origin")["Refugees"].sum().sort_values(ascending=False).head(top).index)
top_asylum = list(df.groupby("Country of asylum")["Refugees"].sum().sort_values(ascending=False).head(top).index)

In [9]:
df["Country of origin"] = df["Country of origin"].apply(lambda x: x if x in top_sources else "Other") 
df["Country of asylum"] = df["Country of asylum"].apply(lambda x: x if x in top_asylum else "Rest of EU")

In [10]:
df = df.groupby(["Country of asylum", "Country of origin"])["Refugees"].sum().reset_index()
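
The bucketing and the final aggregation can be sketched on toy data. The values, the single-letter country names and the cut-off `top_n` are invented for this example; `Series.where` is used here as an alternative to `apply` with a lambda.

```python
import pandas as pd

# Toy flows (invented values).
toy = pd.DataFrame({
    "Country of origin": ["A", "B", "C", "D", "A"],
    "Country of asylum": ["X", "X", "X", "X", "Y"],
    "Refugees": [50, 30, 5, 3, 20],
})

top_n = 2  # hypothetical cut-off for this demo
totals = toy.groupby("Country of origin")["Refugees"].sum()
top_origins = totals.nlargest(top_n).index

# Series.where keeps values where the condition holds and substitutes
# "Other" everywhere else.
toy["Country of origin"] = toy["Country of origin"].where(
    toy["Country of origin"].isin(top_origins), "Other"
)

# After bucketing, several rows share the same (asylum, origin) pair,
# so they are summed into a single flow.
flows = (
    toy.groupby(["Country of asylum", "Country of origin"])["Refugees"]
    .sum()
    .reset_index()
)
print(flows)
```

The total number of refugees is unchanged by this step; only the number of distinct flows shrinks, which keeps the chart readable.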

## Sankey in Plotly
The list `labels` contains the country names for both sides of the chart: first the countries of origin, then the countries of asylum.

In [11]:
labels = list(df.groupby("Country of origin")["Refugees"].sum().index) + list(df.groupby("Country of asylum")["Refugees"].sum().index)

Three lists are generated for the connecting links. `source` and `target` contain the indices of the origin and asylum countries, respectively, in the list `labels`, and `value` contains the corresponding number of refugees.

In [12]:
source = []
target = []
value = []

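# Number of origin labels at the start of `labels`
n_origins = df["Country of origin"].nunique()

for _, row in df.iterrows():
    source.append(labels.index(row["Country of origin"]))
    # Search only the asylum part of `labels`, so that a name occurring
    # on both sides of the chart is mapped to the correct node
    target.append(labels.index(row["Country of asylum"], n_origins))
    value.append(row["Refugees"])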
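
As an alternative sketch, the three lists can be built with dictionary lookups instead of `list.index`. It assumes the same label order as in the notebook (origins first, then countries of asylum); the flows are invented, and "Poland" deliberately appears on both sides to show that each side gets its own node.

```python
# Toy flows (invented values): (origin, asylum, refugees).
flows = [
    ("Ukraine", "Poland", 100),
    ("Poland", "Germany", 5),
    ("Ukraine", "Germany", 80),
]

origins = sorted({o for o, _, _ in flows})
asylums = sorted({a for _, a, _ in flows})
demo_labels = origins + asylums

# Separate lookup tables per side; asylum indices are offset by len(origins).
origin_idx = {name: i for i, name in enumerate(origins)}
asylum_idx = {name: i + len(origins) for i, name in enumerate(asylums)}

demo_source = [origin_idx[o] for o, _, _ in flows]
demo_target = [asylum_idx[a] for _, a, _ in flows]
demo_value = [v for _, _, v in flows]

print(demo_labels)
print(demo_source, demo_target, demo_value)
```

Dictionary lookups are also faster than repeated `list.index` calls, although with a few dozen flows the difference is negligible.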


The Sankey chart can then be generated with `go.Sankey()`:

In [13]:
fig = go.Figure(data=[go.Sankey(
    node = dict(
        pad = 5,
        thickness = 20,
        line = dict(color = "black", width = 0.5),
        label = labels
    ),
    link = dict(
        source = source,
        target = target,
        value = value
    )
)])

fig.update_layout(title_text="Refugees in the EU, 2023", font_size=10)

# Footnote
fig.add_annotation(text='Florian Neukirchen 2024 <br>Data: UNHCR (2024)<br>Code: <a href="https://github.com/florianneukirchen/jupyter-notebooks/blob/main/refugees_sankey.ipynb">GitHub</a>', 
                    align='left',
                    showarrow=False,
                    xref='paper',
                    yref='paper',
                    x=0,
                    y=-0.22,
                    xanchor='left',
                    yanchor='bottom',
                    )

fig.show()
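
By default all links are drawn in the same color. One possible refinement, sketched here under assumptions, is to color each link by its source node. The palette and the example indices are invented; the resulting list could be passed as `color` inside the `link` dict of `go.Sankey()`.

```python
# Hypothetical palette given as RGB triples.
palette = ["31, 119, 180", "255, 127, 14", "44, 160, 44"]

# Example source indices, in the same form as the `source` list above.
demo_source = [0, 0, 1, 2, 1]

# One semi-transparent color per link, chosen by its source node;
# the modulo wraps around if there are more nodes than palette entries.
link_colors = [f"rgba({palette[s % len(palette)]}, 0.4)" for s in demo_source]
print(link_colors[0])
```

The alpha value of 0.4 keeps overlapping links distinguishable; it is a matter of taste rather than a requirement.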

In [14]:
fig.write_html("refugees.html")