# Exploring trade communities using NetworkX 

The packages we use in this tutorial:
- NetworkX: a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. [https://networkx.org/documentation/stable/index.html]
- Matplotlib: a comprehensive library for creating static, animated, and interactive visualizations in Python. [https://matplotlib.org/]
- GeoPandas: extends the datatypes used by pandas to allow spatial operations on geometric types. [https://geopandas.org/en/stable/]
- python-louvain: provides the Louvain community detection algorithm used later in this tutorial. [https://python-louvain.readthedocs.io/en/latest/]

## Install packages

This cell provides the commands needed to install the Python packages used in this tutorial. If you encounter an error about a missing package while running the tutorial, uncomment the corresponding installation line, run the cell, and restart the kernel before rerunning the rest of the code.

You might need to uncomment `!pip install networkx`, `!pip install python-louvain` and `!pip install geopandas`.

In [None]:
# !pip install networkx
# !pip install pandas
# !pip install geopandas
# !pip install matplotlib
# !pip install python-louvain

## Import the libraries

In [None]:
import pandas as pd
import networkx as nx
from community import community_louvain
import geopandas as gpd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch  # for the map legend

## Take a look at the trade data

Before beginning the analysis, it is good practice to examine the data.

This code cell loads the dataset into a pandas DataFrame and displays the first few rows. The dataset is expected to be in CSV format.

Replace `@@` with the file path to the `Trade_Flows_medi_2019.csv` file.

We use the `pd.read_csv()` function from the pandas library to read a CSV file into a DataFrame.
We use the `.head()` method on the DataFrame to return the first N rows for a quick look at the data. By default, it returns the first 5 rows.

In [None]:
flow_data = pd.read_csv(r'@@')
flow_data.head()
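Beyond `.head()`, a few quick checks can reveal problems before building the network. The sketch below runs them on a small made-up DataFrame; the column names `exporter`, `importer`, and `value` are hypothetical stand-ins and should be replaced with the actual columns of the trade file.

```python
import pandas as pd

# Hypothetical toy data standing in for the trade flow file
toy_flows = pd.DataFrame({
    "exporter": ["KEN", "KEN", "UGA", "TZA"],
    "importer": ["UGA", "TZA", "KEN", None],
    "value": [120.0, 80.0, 45.0, 10.0],
})

# Size of the table (rows, columns)
n_rows, n_cols = toy_flows.shape

# Number of missing values in each column
missing_per_column = toy_flows.isna().sum()

# Number of rows that repeat an exporter-importer pair already seen
duplicate_pairs = toy_flows.duplicated(subset=["exporter", "importer"]).sum()

print(n_rows, n_cols)
print(missing_per_column)
print(duplicate_pairs)
```

Rows with a missing exporter or importer cannot become edges, and repeated pairs suggest that the data still needs to be aggregated.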

## Create an undirected graph

Our goal is to extract communities from flow data. To use the community detection algorithm, we need to build an undirected graph. Sometimes the raw flows are directed and not yet aggregated, so we need to aggregate them by grouping on the origin and destination columns and summing the value column.

Since our data is already aggregated, there is no need to run this code.

`groupby()` groups the DataFrame by the specified columns. `agg()` aggregates the grouped data with a specified operation, here summing the 'Trade.value.US' column. `reset_index()` resets the index of the DataFrame; it is often used after grouping to turn the group keys back into columns.

In [None]:
# agg_flow_data = flow_data.groupby(['Exporter.ISO3', 'Importer.ISO3']).agg({'Trade.value.US': 'sum'}).reset_index()
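To see what this aggregation does, here is a minimal sketch on made-up data. The column names `exporter`, `importer`, and `value` are hypothetical and only illustrate the pattern.

```python
import pandas as pd

# Hypothetical toy flows: the KEN -> UGA pair appears twice
toy_flows = pd.DataFrame({
    "exporter": ["KEN", "KEN", "KEN", "UGA"],
    "importer": ["UGA", "UGA", "TZA", "KEN"],
    "value": [100.0, 20.0, 80.0, 45.0],
})

# One row per exporter-importer pair, with the values summed
agg_toy = (
    toy_flows.groupby(["exporter", "importer"])
    .agg({"value": "sum"})
    .reset_index()
)
print(agg_toy)
```

The two KEN to UGA rows collapse into a single row, while UGA to KEN stays separate because grouping keeps the direction of each flow.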

To create a graph (network), we build an undirected graph with the NetworkX library, creating its edges from a pandas DataFrame.

Replace `@@` with the name of the column that represents the exporter, `^^` with the name of the column that represents the importer, and `&&` with the name of the column that represents the trade value in the trade flow DataFrame. In this way we define the nodes and edges of the network, with the trade value stored as an edge attribute that serves as the edge weight.

`nx.Graph()` creates a new, empty undirected graph. Note that if a directed graph is required, you should use `nx.DiGraph()` instead to represent directional edges.
`nx.from_pandas_edgelist()` creates a graph from a pandas DataFrame. Its parameters specify the source node column, the target node column, and the edge attributes.

In [None]:
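# from_pandas_edgelist returns a new graph; create_using selects the undirected nx.Graph type
G = nx.from_pandas_edgelist(flow_data, '@@', '^^', edge_attr='&&', create_using=nx.Graph())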
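One caveat is worth checking: an undirected graph stores a single edge per pair of nodes, so if a file lists both A to B and B to A, the later row replaces the attribute set by the earlier one instead of adding to it. The sketch below shows this on made-up data and one possible way to sum both directions first. The column names and the helper columns `u` and `v` are hypothetical, and whether summing is the right choice depends on the analysis.

```python
import networkx as nx
import pandas as pd

# Hypothetical toy flows with both directions of the A-B pair
toy_flows = pd.DataFrame({
    "exporter": ["A", "B", "A"],
    "importer": ["B", "A", "C"],
    "value": [10.0, 5.0, 7.0],
})

# Naive build: the B -> A row overwrites the value set by the A -> B row
G_naive = nx.from_pandas_edgelist(toy_flows, "exporter", "importer", edge_attr="value")

# Sort each pair so that A -> B and B -> A share the same (u, v) key
ordered = [sorted(pair) for pair in zip(toy_flows["exporter"], toy_flows["importer"])]
pairs = toy_flows.assign(u=[p[0] for p in ordered], v=[p[1] for p in ordered])

# Sum the values of both directions, then build the undirected graph
sym = pairs.groupby(["u", "v"], as_index=False)["value"].sum()
G_sym = nx.from_pandas_edgelist(sym, "u", "v", edge_attr="value")

print(G_naive["A"]["B"]["value"], G_sym["A"]["B"]["value"])
```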

## Community Detection
This code cell detects communities within the graph using the Louvain method, a popular algorithm for community detection in large networks.
We also calculate the modularity of the detected communities, a measure of network structure that helps us judge how well separated our division of nodes into communities is.

Louvain community detection documentation: [https://python-louvain.readthedocs.io/en/latest/]

Replace `@@` with the name of the edge attribute (the trade value column used above) that holds the weights in the trade network.

We use the `community_louvain.best_partition()` function from the `community` module (installed as `python-louvain`), which applies the Louvain algorithm to find the best community partition of the graph using the specified weight attribute. Because the algorithm is randomized, the partition can vary between runs; passing a `random_state` value makes it reproducible.
`community_louvain.modularity()` calculates the modularity of the partition on the graph, using the given weight attribute. Higher modularity values indicate stronger community structure.

In [None]:
partition = community_louvain.best_partition(G, weight='@@')
modularity = community_louvain.modularity(partition, G, weight='@@')
print(f"The modularity is {modularity}")
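To make the modularity score less of a black box, the sketch below computes it by hand on a made-up graph of two triangles joined by one edge. For each community it takes the share of total edge weight that falls inside the community and subtracts the squared share of total weighted degree that the community holds. As a cross-check it uses NetworkX's own `modularity` function rather than the `python-louvain` one used in this tutorial; the graph and the partition are hypothetical.

```python
import networkx as nx
from networkx.algorithms.community import modularity

# Hypothetical toy graph: two triangles connected by a single bridge edge
G_toy = nx.Graph()
G_toy.add_weighted_edges_from([
    ("A", "B", 1.0), ("B", "C", 1.0), ("A", "C", 1.0),
    ("D", "E", 1.0), ("E", "F", 1.0), ("D", "F", 1.0),
    ("C", "D", 1.0),
])
communities = [{"A", "B", "C"}, {"D", "E", "F"}]

m = G_toy.size(weight="weight")  # total edge weight in the graph

q_manual = 0.0
for nodes in communities:
    # Edge weight that lies entirely inside this community
    internal = G_toy.subgraph(nodes).size(weight="weight")
    # Sum of weighted degrees of the community's nodes
    degree_sum = sum(d for _, d in G_toy.degree(nodes, weight="weight"))
    q_manual += internal / m - (degree_sum / (2 * m)) ** 2

# Cross-check against NetworkX's implementation
q_networkx = modularity(G_toy, communities, weight="weight")
print(round(q_manual, 4), round(q_networkx, 4))
```

Moving a node into the other triangle lowers the score, because more edge weight then crosses the community boundary.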

The `community_louvain.best_partition` function returns a dictionary mapping each node to its partition (community) id. Let's make a new DataFrame that stores the nodes and their assigned community ids and take a look at it.

In [None]:
# Use a lowercase 'community' column name, which the plotting cells below rely on
community_df = pd.DataFrame(list(partition.items()), columns=['country', 'community'])
community_df.head()
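Once the partition is in a DataFrame, it is easy to summarize. The sketch below, on a made-up partition dictionary, counts the members of each community and lists them; the country codes and ids are hypothetical.

```python
import pandas as pd

# Hypothetical partition: node -> community id
toy_partition = {"KEN": 0, "UGA": 0, "TZA": 0, "NGA": 1, "GHA": 1}
toy_df = pd.DataFrame(list(toy_partition.items()), columns=["country", "community"])

# Number of countries in each community
sizes = toy_df["community"].value_counts().sort_index()

# Members of each community as a list
members = toy_df.groupby("community")["country"].apply(list)

print(sizes)
print(members)
```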

## Visualizing

Now we visualize the network graph created earlier, using colors to represent the different communities detected in the network. We first store each node's community id as a 'community' node attribute, then extract that attribute from each node to use as the color map for visualization. Finally, we draw the network using the `matplotlib` and `networkx` libraries.

`nx.draw_networkx()` draws the network with nodes and edges. Customizable parameters include node color, node size, and whether labels are shown.

In [None]:
# Attach the community ids from `partition` to the nodes so they can be read back below
nx.set_node_attributes(G, partition, 'community')
colors = [node[1]['community'] for node in G.nodes(data=True)]

plt.figure(figsize=(10, 8))
nx.draw_networkx(G, node_color=colors, node_size=50, with_labels=True)
plt.title('Network of Medical Trade Flows between African countries')
plt.show()
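By default the node positions are recomputed on every run, so the picture changes each time. A minimal sketch of one way to keep the layout stable, assuming a seeded spring layout is acceptable for the figure, is shown below on a made-up graph; the graph and partition are hypothetical.

```python
import networkx as nx

# Hypothetical toy graph and partition
G_toy = nx.Graph([("A", "B"), ("B", "C"), ("C", "D")])
toy_partition = {"A": 0, "B": 0, "C": 1, "D": 1}

# Build the color list in the same order that the graph iterates its nodes
colors = [toy_partition[n] for n in G_toy.nodes()]

# A fixed seed gives the same node positions on every run
pos = nx.spring_layout(G_toy, seed=42)
pos_again = nx.spring_layout(G_toy, seed=42)

print(colors)
```

The resulting `pos` dictionary can be passed to `nx.draw_networkx()` through its `pos` parameter.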

## Mapping the network

To map the communities onto their geographical locations, we first load the African shapefile into a GeoDataFrame using the `geopandas` library.

Replace `@@` with the path to the `Africa_Boundaries.shp` shapefile.

Notice the 'geometry' column, which holds the shape of each feature.

In [None]:
africa_map = gpd.read_file(r'@@')
africa_map.head()

### Tabular join
Similar to mapping in ArcGIS, here we join our shapefile and the community DataFrame to create `joined_gdf`.

We use a right join here so that the result includes all records from `africa_map` (the right DataFrame) along with the matched records from `community_df` (the left DataFrame). If a record in `africa_map` has no corresponding match in `community_df`, it is still included in the resulting `joined_gdf`, but the columns from `community_df` contain NaN for that record.

Replace `@@` with the column name from `community_df` that corresponds to a geographical identifier that can be linked to the `africa_map`.
Replace `^^` with the column name from `africa_map` that matches the geographical identifier from `community_df`.

In [None]:
# Right join: keep every country in africa_map, even those without a community
joined_gdf = gpd.GeoDataFrame(pd.merge(community_df, africa_map, left_on='@@', right_on='^^', how='right'))
joined_gdf.head()
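The effect of the join type can be checked on plain DataFrames before working with the shapefile. The sketch below compares a right join with an inner join on made-up tables and uses the `indicator` option of `pd.merge` to flag unmatched rows; the column names `ISO` and `name` are hypothetical.

```python
import pandas as pd

# Hypothetical community table (left) and map attribute table (right)
toy_communities = pd.DataFrame({"country": ["KEN", "UGA"], "community": [0, 1]})
toy_map = pd.DataFrame({
    "ISO": ["KEN", "UGA", "TZA"],
    "name": ["Kenya", "Uganda", "Tanzania"],
})

# Right join keeps every row of toy_map; indicator adds a '_merge' column
right_joined = pd.merge(
    toy_communities, toy_map,
    left_on="country", right_on="ISO",
    how="right", indicator=True,
)

# Inner join keeps only the rows that match in both tables
inner_joined = pd.merge(toy_communities, toy_map, left_on="country", right_on="ISO", how="inner")

# Map rows that found no community
unmatched = right_joined.loc[right_joined["_merge"] == "right_only", "ISO"].tolist()

print(len(right_joined), len(inner_joined), unmatched)
```

A long list of unmatched rows usually points to identifiers that are spelled or coded differently in the two tables.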

### Plotting Regions

First, let's simply plot the communities (or let's now call them regions, since they are in a geographical context!), using the 'community' column in `joined_gdf`.

In [None]:
fig, ax = plt.subplots(1, 1, figsize=(15, 10))
joined_gdf.plot(column='community', ax=ax, legend=False)
plt.show()

In [None]:
# List the community ids present, to see how many colors the map needs
joined_gdf["community"].unique()

## Make a better map!

The following code cell shows the approach I used to define a color for each region and add more symbology to the map, for a better representation based on cartographic rules.

Guide for selecting colors: [https://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3]

In [None]:
# Define a dictionary of community id and color
color_dict = {0: "#8dd3c7", 1: "#e6f5c9", 2: "#bebada", 3: "#80b1d3", 4: "#fbb4ae"}

# Create a new color column in joined_gdf based on color dict
joined_gdf['Color'] = joined_gdf['community'].map(color_dict).fillna("white")

# Plot the map, set up the symbology, and create a legend
fig, ax = plt.subplots(figsize=(15, 10))
joined_gdf.plot(ax=ax, color=joined_gdf['Color'], edgecolor="#8f8c8c", linewidth=0.5)
ax.set_axis_off()
ax.set_title('Trade Regions for Medical Services in Africa, 2019', fontsize=18, fontweight="bold", verticalalignment="bottom")


legend_labels = [Patch(facecolor=color_dict[key], edgecolor='black', label=key) for key in color_dict]
ax.legend(handles=legend_labels, loc=(0.25, 0.18), title='Communities')
plt.show()
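The color dictionary above is written for five communities, but the number of communities found can differ between runs or datasets. A minimal sketch of building the dictionary from whatever ids are present is shown below; the palette and the toy id column are hypothetical, and the palette is reused cyclically if there are more communities than colors.

```python
import pandas as pd

# Hypothetical palette of hex colors
palette = ["#8dd3c7", "#ffffb3", "#bebada", "#fb8072", "#80b1d3", "#fdb462"]

# Hypothetical community column; None marks a country without a community
toy_ids = pd.Series([2, 0, 1, None, 0, 3], name="community")

# Assign one color per community id that actually occurs
unique_ids = sorted(toy_ids.dropna().unique())
auto_color_dict = {cid: palette[i % len(palette)] for i, cid in enumerate(unique_ids)}

# Countries without a community fall back to white
toy_colors = toy_ids.map(auto_color_dict).fillna("white")
print(toy_colors.tolist())
```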
