# download csv and shape files from https://hub.worldpop.org/geodata/summary?id=50577

Metadata for MigrEst_international_v7.csv
14/11/2019 Dorothea Woods, University of Southampton

The file contains estimated international migration movements between countries at the subnational level. The estimates were produced
for the project "Mapping gender-disaggregated migration movements at subnational scales in and between low- and middle-income countries".

The file contains 18,679,720 records and 8 columns, which are described below:
	ISOI - 3 letter ISO code of origin country
	NODEI - ID of origin admin unit
	ISOJ - 3 letter ISO code of destination country
	NODEJ - ID of destination admin unit
	sex - sex of migrants, 'M' or 'F'
	pred_seed1 - estimated number of migrants from the origin admin unit to the destination admin unit, calculated by iterative proportional fitting (IPF) with seed = 1
	pred_dist - estimated number of migrants from the origin admin unit to the destination admin unit, calculated by IPF with seed = distance
	pred_grav - estimated number of migrants from the origin admin unit to the destination admin unit, calculated by IPF with seed = (TotPopI * TotPopJ) / distance
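
The three prediction columns differ only in the seed matrix passed to IPF. The sketch below shows a generic two-dimensional IPF in NumPy to illustrate what a seed does. It is not the project's code: the function `ipf`, the populations, distances and margin totals are all hypothetical toy values, and the project report remains the reference for the actual procedure.

```python
import numpy as np

def ipf(seed, row_totals, col_totals, max_iter=1000, tol=1e-10):
    """Scale a positive seed matrix until its row and column sums match the targets."""
    x = np.asarray(seed, dtype=float).copy()
    row_totals = np.asarray(row_totals, dtype=float)
    col_totals = np.asarray(col_totals, dtype=float)
    for _ in range(max_iter):
        x *= (row_totals / x.sum(axis=1))[:, None]  # match row (origin) totals
        x *= col_totals / x.sum(axis=0)             # match column (destination) totals
        if np.allclose(x.sum(axis=1), row_totals, rtol=0, atol=tol):
            break
    return x

# Hypothetical toy inputs: 2 origin units, 3 destination units.
pop_i = np.array([100.0, 200.0])           # stands in for TotPopI
pop_j = np.array([50.0, 150.0, 300.0])     # stands in for TotPopJ
distance = np.array([[10.0, 20.0, 40.0],
                     [30.0, 15.0, 25.0]])
out_totals = np.array([60.0, 40.0])        # assumed known outflow per origin
in_totals = np.array([20.0, 30.0, 50.0])   # assumed known inflow per destination

# The three seeds named in the column descriptions.
seed_one = np.ones_like(distance)
seed_dist = distance
seed_grav = np.outer(pop_i, pop_j) / distance

pred_seed1 = ipf(seed_one, out_totals, in_totals)
pred_dist = ipf(seed_dist, out_totals, in_totals)
pred_grav = ipf(seed_grav, out_totals, in_totals)
```

All three results share the same margins. Only the way the totals are spread across the individual origin-destination cells changes with the seed.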

For visualization purposes the migration estimates can be joined to Flowlines_International.shp using the information in the first 4 columns ISOI, NODEI, ISOJ, NODEJ.
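
A minimal sketch of that four-column join, using plain pandas frames with made-up rows. It assumes the attribute table of Flowlines_International.shp carries the same four column names as the CSV, which this notebook has not verified. The `geometry` strings are placeholders for the real line geometries, and `flows`, `lines`, `totals` and `joined` are hypothetical names.

```python
import pandas as pd

# Toy stand-in for MigrEst_international_v7.csv (two sexes per flow).
flows = pd.DataFrame({
    "ISOI": ["AAA", "AAA", "BBB"],
    "NODEI": [1, 1, 2],
    "ISOJ": ["BBB", "BBB", "AAA"],
    "NODEJ": [2, 2, 1],
    "sex": ["M", "F", "M"],
    "pred_grav": [10.0, 12.0, 5.0],
})

# Toy stand-in for the flow-line attribute table (one row per line).
lines = pd.DataFrame({
    "ISOI": ["AAA", "BBB"],
    "NODEI": [1, 2],
    "ISOJ": ["BBB", "AAA"],
    "NODEJ": [2, 1],
    "geometry": ["LINESTRING (0 0, 1 1)", "LINESTRING (1 1, 0 0)"],
})

keys = ["ISOI", "NODEI", "ISOJ", "NODEJ"]

# Collapse the two sexes so each flow line gets exactly one estimate.
totals = flows.groupby(keys, as_index=False)["pred_grav"].sum()

# validate= raises if a key is duplicated; indicator= adds a `_merge` column
# that shows which lines found a matching estimate.
joined = lines.merge(totals, on=keys, how="left",
                     validate="one_to_one", indicator=True)
print(joined)
```

Joining on all four keys keeps one row per flow line. Joining on the origin or destination pair alone would repeat each geometry once for every matching flow.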

For details regarding the production of the estimates please refer to the project report.

In [7]:
import pandas as pd
import geopandas as gpd
import fiona

In [3]:
# File paths
csv_file_path = r"C:\Users\paude\Downloads\SexDisaggregated_Migration\SexDisaggregated_Migration\MigrationEstimates\MigrEst_international_v7.csv"  # Update with your CSV file path
shapefile_path = r"C:\Users\paude\Downloads\SexDisaggregated_Migration\SexDisaggregated_Migration\SpatialData\Flowlines_International.shp"  # Update with your shapefile path


In [4]:

# Read the CSV file
df = pd.read_csv(csv_file_path)
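
With about 18.7 million rows, the default dtypes can use a lot of memory. The sketch below passes explicit dtypes to `pd.read_csv`, shown on a two-row in-memory sample so it runs without the real file. It assumes the node IDs are integers that fit in 32 bits and that float32 precision is acceptable for the estimates. Neither assumption has been checked against the actual data.

```python
import io
import pandas as pd

# Two made-up rows with the header described in the metadata.
sample_csv = io.StringIO(
    "ISOI,NODEI,ISOJ,NODEJ,sex,pred_seed1,pred_dist,pred_grav\n"
    "AAA,1,BBB,2,M,1.5,2.5,3.5\n"
    "AAA,1,BBB,2,F,1.0,2.0,3.0\n"
)

dtypes = {
    "ISOI": "category",     # few distinct country codes, so category is compact
    "ISOJ": "category",
    "sex": "category",
    "NODEI": "int32",       # assumption: integer IDs
    "NODEJ": "int32",
    "pred_seed1": "float32",  # assumption: reduced precision is acceptable
    "pred_dist": "float32",
    "pred_grav": "float32",
}

df_small = pd.read_csv(sample_csv, dtype=dtypes)
print(df_small.dtypes)
print(df_small.memory_usage(deep=True).sum(), "bytes")
```

For the real file, the same `dtype` mapping would be passed together with `csv_file_path`.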


In [5]:
# Check for missing values in the key columns
print("Missing values in CSV key columns:")
print(df[['ISOI', 'NODEI', 'ISOJ', 'NODEJ']].isnull().sum())

Missing values in CSV key columns:
ISOI     0
NODEI    0
ISOJ     0
NODEJ    0
dtype: int64


In [6]:
# Attempt to read the shapefile using geopandas
try:
    gdf = gpd.read_file(shapefile_path)
    print("\nShapefile loaded successfully:")
    print(gdf.head())

    # Check for missing values in the shapefile key columns
    print("\nMissing values in shapefile key columns:")
    print(gdf[['ISO', 'NODE']].isnull().sum())

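    # Merge the CSV data with the shapefile based on the origin node
    # NOTE: 'ISO' and 'NODE' are assumed column names. The metadata says the flow lines
    # join on ISOI, NODEI, ISOJ, NODEJ, so check gdf.columns before relying on this merge.
    merged_gdf_origin = gdf.merge(df, left_on=['ISO', 'NODE'], right_on=['ISOI', 'NODEI'], how='inner')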

    # Check if the merge was successful
    print("\nMerged data (origin nodes) - first few rows:")
    print(merged_gdf_origin.head())

    # Merge the CSV data with the shapefile based on the destination node
    merged_gdf_destination = gdf.merge(df, left_on=['ISO', 'NODE'], right_on=['ISOJ', 'NODEJ'], how='inner')

    # Check if the merge was successful
    print("\nMerged data (destination nodes) - first few rows:")
    print(merged_gdf_destination.head())

    # Save the merged GeoDataFrames to new shapefiles (optional)
    merged_gdf_origin.to_file('merged_origin.shp')
    merged_gdf_destination.to_file('merged_destination.shp')

    print("Merged shapefiles saved as 'merged_origin.shp' and 'merged_destination.shp'")

except Exception as e:
    print(f"Error occurred while processing shapefile: {e}")

# Continue with the rest of the script...

Error occurred while processing shapefile: Invalid offset for entity 17339
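
The "Shapefile loaded successfully" message was never printed, so the failure happened inside `gpd.read_file`, before any merge ran. One possible cause, not confirmed here, is an incomplete download or extraction of the shapefile parts. The sketch below is a standard-library check under that assumption. It reads the 100-byte header of the `.shp` and `.shx` files, where bytes 24 to 27 store the file length in 16-bit words, and compares that length with the size on disk. The helper names `declared_and_actual_size` and `report` are hypothetical.

```python
import os
import struct

SHAPEFILE_CODE = 9994  # magic number in the first 4 bytes of .shp and .shx files

def declared_and_actual_size(path):
    """Return (declared_bytes, actual_bytes) for a .shp or .shx file."""
    with open(path, "rb") as f:
        header = f.read(100)  # the main file header is 100 bytes long
    if len(header) < 100:
        raise ValueError(f"{path}: shorter than the 100-byte header")
    (file_code,) = struct.unpack(">i", header[0:4])
    if file_code != SHAPEFILE_CODE:
        raise ValueError(f"{path}: unexpected file code {file_code}")
    # File length is big-endian, counted in 16-bit words, header included.
    (length_words,) = struct.unpack(">i", header[24:28])
    return length_words * 2, os.path.getsize(path)

def report(shp_path):
    """Print presence and size consistency for the mandatory shapefile parts."""
    base, _ = os.path.splitext(shp_path)
    for ext in (".shp", ".shx", ".dbf"):
        part = base + ext
        if not os.path.exists(part):
            print(f"{part}: MISSING")
        elif ext == ".dbf":
            print(f"{part}: present, {os.path.getsize(part)} bytes")
        else:
            declared, actual = declared_and_actual_size(part)
            status = "ok" if declared == actual else "SIZE MISMATCH"
            print(f"{part}: declared {declared}, on disk {actual} ({status})")
```

Calling `report(shapefile_path)` prints one line per part. A missing part or a size mismatch suggests downloading and extracting the archive again before debugging the merge code.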


In [4]:

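# NOTE: this cell needs `gdf` from the previous cell. If the shapefile failed to load there,
# `gdf` is undefined and this cell raises NameError.
# Inspect unique values to ensure they match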
print("\nUnique values in CSV 'ISOI' column:")
print(df['ISOI'].unique())

print("\nUnique values in shapefile 'ISO' column:")
print(gdf['ISO'].unique())

# If column names in shapefile are different, adjust accordingly
# Display the structure of the shapefile to understand its contents
print("\nShapefile structure:")
print(gdf.head())

# Merge the CSV data with the shapefile based on the origin node
merged_gdf_origin = gdf.merge(df, left_on=['ISO', 'NODE'], right_on=['ISOI', 'NODEI'], how='inner')

# Check if the merge was successful
print("\nMerged data (origin nodes) - first few rows:")
print(merged_gdf_origin.head())

print("\nMerged data (origin nodes) - shape:")
print(merged_gdf_origin.shape)

# Merge the CSV data with the shapefile based on the destination node
merged_gdf_destination = gdf.merge(df, left_on=['ISO', 'NODE'], right_on=['ISOJ', 'NODEJ'], how='inner')

# Check if the merge was successful
print("\nMerged data (destination nodes) - first few rows:")
print(merged_gdf_destination.head())

print("\nMerged data (destination nodes) - shape:")
print(merged_gdf_destination.shape)

# Save the merged GeoDataFrames to new shapefiles (optional)
merged_gdf_origin.to_file('merged_origin.shp')
merged_gdf_destination.to_file('merged_destination.shp')

print("Merged shapefiles saved as 'merged_origin.shp' and 'merged_destination.shp'")