# Soybean Export Network (Paraguay) - Edge and Node Construction

This notebook constructs the edge and node lists for the Paraguayan soybean export network.
The goal is to prepare structured inputs for network analysis: edges link two exporter groups that shipped to the same destination country in the same year (co-exports), and nodes summarize exporter-level attributes aggregated across all years.

### Section 1: Edge List - Co-exports by Year and Destination

In [64]:
import pandas as pd
from itertools import combinations

In [66]:
# 1. Load the dataset
df = pd.read_csv("paraguay_soy_v1_2_6.csv")

In [68]:
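# 2. Select relevant columns for edge construction
df_edges = df[['year', 'country_of_destination', 'exporter_group', 'fob']].copy()
# Standardize exporter names so edge endpoints match the node Ids built in Section 2
df_edges['exporter_group'] = df_edges['exporter_group'].str.strip().str.upper()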

In [70]:
# 3. Drop rows with missing values
df_edges.dropna(subset=['year', 'country_of_destination', 'exporter_group', 'fob'], inplace=True)

In [72]:
# 4. Initialize an empty list to store edges
edge_list = []

In [74]:
# 5. Group data by year and destination country
for (year, country), group in df_edges.groupby(['year', 'country_of_destination']):
    # Aggregate total FOB per exporter (a Series indexed by exporter name)
    exporters = group.groupby('exporter_group')['fob'].sum()

    # 6. Generate all possible exporter pair combinations for the given year and country
    for (exp1, fob1), (exp2, fob2) in combinations(exporters.items(), 2):
        # Define edge weight as the sum of both FOB values (alternatives: average, min, max)
        weight = fob1 + fob2
        edge_list.append({
            'source': exp1,
            'target': exp2,
            'year': year,
            'country': country,
            'weight': weight
        })
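
To check the pairing logic, the same steps can be run on a toy table. The sketch below uses invented firms and FOB values (`A`, `B`, `C` and all numbers are illustrative, not taken from the dataset) and wraps the loop in a hypothetical helper, `build_edges`, that takes the weight rule as an argument so the alternatives mentioned in the comment (min, max, average) can be compared.

```python
import pandas as pd
from itertools import combinations

def build_edges(df, weight_fn=lambda a, b: a + b):
    """Hypothetical helper: one edge per exporter pair, year, and destination."""
    rows = []
    for (year, country), group in df.groupby(['year', 'country_of_destination']):
        # Total FOB per exporter within this year and destination
        totals = group.groupby('exporter_group')['fob'].sum()
        for (exp1, fob1), (exp2, fob2) in combinations(totals.items(), 2):
            rows.append({'source': exp1, 'target': exp2, 'year': year,
                         'country': country, 'weight': weight_fn(fob1, fob2)})
    return pd.DataFrame(rows)

# Illustrative data only
toy = pd.DataFrame({
    'year': [2018, 2018, 2018, 2018, 2019],
    'country_of_destination': ['ARGENTINA', 'ARGENTINA', 'ARGENTINA', 'ARGENTINA', 'BRAZIL'],
    'exporter_group': ['A', 'B', 'C', 'A', 'A'],
    'fob': [100.0, 50.0, 30.0, 20.0, 10.0],
})

edges_sum = build_edges(toy)                 # weight = fob1 + fob2
edges_min = build_edges(toy, weight_fn=min)  # weight = smaller of the two totals
print(edges_sum)
```

A destination served by a single exporter in a given year (Brazil in 2019 here) produces no edge, and a group with n exporters yields n(n-1)/2 edges.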

In [76]:
# 7. Convert the edge list into a DataFrame
edges_df = pd.DataFrame(edge_list)
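
`edges_df` holds one row per pair, year, and destination, so the same two firms can appear many times. If a single edge per pair is wanted, one option is to sum the weights across years and destinations. A minimal sketch on invented rows (firm names, weights, and the `n_coexports` column are assumptions for illustration); it relies on `groupby` returning exporters in sorted order, so each pair already appears with the same `source` and `target` every time.

```python
import pandas as pd

# Illustrative rows in the same layout as edges_df
edges_demo = pd.DataFrame({
    'source': ['A', 'A', 'A', 'B'],
    'target': ['B', 'B', 'C', 'C'],
    'year': [2018, 2019, 2018, 2018],
    'country': ['ARGENTINA', 'ARGENTINA', 'BRAZIL', 'BRAZIL'],
    'weight': [170.0, 90.0, 150.0, 80.0],
})

# One row per firm pair: total weight and number of year-destination co-exports
pair_edges = (
    edges_demo
    .groupby(['source', 'target'], as_index=False)
    .agg(weight=('weight', 'sum'), n_coexports=('weight', 'count'))
)
print(pair_edges)
```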

In [78]:
# 8. Export the edge list to a CSV file
edges_df.to_csv("soybean_coexport_edges.csv", index=False)

### Section 2: Node List - Aggregate exporter-level attributes across all years

In [81]:
# 1. Standardize exporter names
df['exporter_group'] = df['exporter_group'].str.strip().str.upper()

In [83]:
# 2. Select relevant columns
df_nodes = df[['exporter_group', 'fob', 'volume', 'land_use']].copy()

In [85]:
# 3. Drop rows missing any of the three attributes
#    (a row lacking only land_use also loses its fob and volume)
df_nodes.dropna(subset=['fob', 'volume', 'land_use'], inplace=True)
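
Dropping a row when any of the three attributes is missing also discards the values that row does have. Whether that is the intended behaviour depends on the analysis. The sketch below, on invented numbers, compares it with aggregating directly, where pandas' `sum` skips missing values column by column.

```python
import numpy as np
import pandas as pd

# Illustrative data: exporter A has one row with no land_use value
demo = pd.DataFrame({
    'exporter_group': ['A', 'A', 'B'],
    'fob': [100.0, 40.0, 10.0],
    'volume': [5.0, 2.0, 1.0],
    'land_use': [3.0, np.nan, 1.0],
})
cols = ['fob', 'volume', 'land_use']

# Row-wise drop first, then sum (the approach used in this notebook)
strict = demo.dropna(subset=cols).groupby('exporter_group')[cols].sum()

# Sum directly: NaN is skipped per column, so partial rows still contribute
lenient = demo.groupby('exporter_group')[cols].sum()

print(strict)
print(lenient)
```

In this toy case exporter A's total FOB is 100 under the strict rule and 140 under the lenient one, while its land-use total is 3 in both.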

In [87]:
# 4. Aggregate attributes by exporter group
nodes_df = (
    df_nodes
    .groupby('exporter_group')
    .agg(
        total_fob_value=('fob', 'sum'),
        total_volume=('volume', 'sum'),
        total_land_use=('land_use', 'sum')
    )
    .reset_index()
)

In [89]:
# 5. Rename identifier column for clarity
nodes_df.rename(columns={'exporter_group': 'Id'}, inplace=True)

In [91]:
# 6. Flag transnational corporations
transnational_firms = [
    "ADM", "AMAGGI", "BASF", "BAYER", "BUNGE", "CARGILL", "CHS", "COFCO",
    "LOUIS DREYFUS", "SODRUGESTVO", "SYNGENTA", "VICENTIN PARAGUAY", "GL SOUTH AMERICA"
]

# Exact match against the standardized (stripped, upper-case) names in 'Id'
nodes_df['transnational'] = nodes_df['Id'].isin(transnational_firms).map({True: 'Yes', False: 'No'})
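
Because the flag relies on exact string matches, a listed name that differs slightly from the dataset's spelling would match nothing and fail silently. A small check can surface such names. The helper `unmatched_names` and the Ids below are hypothetical, for illustration only; in the notebook the inputs would be `nodes_df['Id']` and `transnational_firms`.

```python
import pandas as pd

def unmatched_names(ids, names):
    """Hypothetical helper: listed names that match no node Id exactly."""
    return sorted(set(names) - set(ids))

# Illustrative Ids only, not taken from the dataset
demo_nodes = pd.DataFrame({'Id': ['ADM', 'CARGILL', 'LOUIS DREYFUS COMPANY', 'LOCAL COOP']})
demo_firms = ['ADM', 'CARGILL', 'LOUIS DREYFUS']

missing = unmatched_names(demo_nodes['Id'], demo_firms)
print(missing)
```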

In [93]:
# 7. Export the node list to CSV
nodes_df.to_csv("soybean_export_nodes.csv", index=False)
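
The edge and node files are meant to be used together, but they are built from different subsets of rows (the node list drops rows lacking `volume` or `land_use`), so an exporter can appear as an edge endpoint without having a node row. A minimal consistency check, assuming the column names used above; `endpoints_without_node` is a hypothetical helper and the frames are invented.

```python
import pandas as pd

def endpoints_without_node(edges, nodes):
    """Hypothetical helper: edge endpoints that have no matching node Id."""
    endpoints = set(edges['source']) | set(edges['target'])
    return sorted(endpoints - set(nodes['Id']))

# Illustrative frames in the same layout as edges_df and nodes_df
demo_edges = pd.DataFrame({'source': ['A', 'A'], 'target': ['B', 'C'], 'weight': [1.0, 2.0]})
demo_nodes = pd.DataFrame({'Id': ['A', 'B']})

orphans = endpoints_without_node(demo_edges, demo_nodes)
print(orphans)
```

An empty result means every edge endpoint can be joined to its node attributes.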