# Regions Data Preparation - French Geographic Reference

**Project: Analysis of cultural accessibility and territorial inequalities in France**

---

## Dataset Information

**Source:** API Découpage Administratif (French Government)

**Name:** Régions - French Administrative Divisions (Level 1)

**Origin:** Official government API

**Year:** 2025 (continuously updated)

**Last Update:** Live API (the saved file reflects the data at download time)

**Content:** 18 French regions (13 metropolitan + 5 overseas)

**File Location:** `data/raw/geography/regions.json`

**API URL:** https://geo.api.gouv.fr/regions?fields=nom,code

**Purpose:** Reference table for region names and codes

**Key columns:**
- `code`: Region code (2-character string; overseas codes keep a leading zero, e.g. `01` for Guadeloupe)
- `nom`: Region name

**Important:** PACA region code is **93** ("Provence-Alpes-Côte d'Azur")
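The 13 + 5 split and the PACA code can be checked mechanically. This is a minimal sketch, assuming that codes stay strings and that the overseas regions are exactly those whose code starts with `0` (true for the 18 codes listed later in this notebook). The `REGIONS` list copies those name/code pairs, and `split_regions` is a hypothetical helper.

```python
# Name/code pairs as returned by the API in this notebook (codes are strings)
REGIONS = [
    ("Île-de-France", "11"),
    ("Centre-Val de Loire", "24"),
    ("Bourgogne-Franche-Comté", "27"),
    ("Normandie", "28"),
    ("Hauts-de-France", "32"),
    ("Grand Est", "44"),
    ("Pays de la Loire", "52"),
    ("Bretagne", "53"),
    ("Nouvelle-Aquitaine", "75"),
    ("Occitanie", "76"),
    ("Auvergne-Rhône-Alpes", "84"),
    ("Provence-Alpes-Côte d'Azur", "93"),
    ("Corse", "94"),
    ("Guadeloupe", "01"),
    ("Martinique", "02"),
    ("Guyane", "03"),
    ("La Réunion", "04"),
    ("Mayotte", "06"),
]


def split_regions(regions):
    """Hypothetical helper: split (name, code) pairs into metropolitan and overseas.

    Assumes overseas regions are exactly those whose code starts with "0".
    """
    metropolitan = [(name, code) for name, code in regions if not code.startswith("0")]
    overseas = [(name, code) for name, code in regions if code.startswith("0")]
    return metropolitan, overseas


metropolitan, overseas = split_regions(REGIONS)
code_by_name = dict(REGIONS)
print(len(metropolitan), len(overseas))  # 13 5
print(code_by_name["Provence-Alpes-Côte d'Azur"])  # 93
```

Keeping the codes as strings is what makes the `startswith("0")` test possible. Converted to integers, `01` would become `1`.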

In [2]:
import pandas as pd
import json


## Download from the API

In [3]:
import os
import requests

url = "https://geo.api.gouv.fr/regions?fields=nom,code"
response = requests.get(url, timeout=30)
# Stop with an explicit error on a failed request instead of silently skipping the save
response.raise_for_status()

os.makedirs('data/raw/geography', exist_ok=True)
with open('data/raw/geography/regions.json', 'w', encoding='utf-8') as f:
    json.dump(response.json(), f, ensure_ascii=False, indent=2)
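A failed or malformed download should not leave a stale or half-written `regions.json` behind, so the save step could also check the payload before writing. This is a minimal sketch. `save_regions_json` is a hypothetical helper, and the check assumes each record is a JSON object carrying the `nom` and `code` fields requested in the URL. It writes to a temporary file first and then moves that file into place with `os.replace`.

```python
import json
import os
import tempfile


def save_regions_json(payload, path):
    """Hypothetical helper: validate the API payload, then write it atomically."""
    if not isinstance(payload, list) or not payload:
        raise ValueError("expected a non-empty JSON array of regions")
    for record in payload:
        # Each record should carry the two fields requested in the URL (?fields=nom,code)
        if not isinstance(record, dict) or not {"nom", "code"} <= set(record):
            raise ValueError(f"unexpected record: {record!r}")

    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    # Write to a temporary file in the same directory, then rename it over the target,
    # so an interrupted write never leaves a truncated file at `path`
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(payload, f, ensure_ascii=False, indent=2)
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise


# Offline usage with a tiny sample in the API's shape
sample = [{"nom": "Corse", "code": "94"}, {"nom": "Mayotte", "code": "06"}]
with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "geography", "regions.json")
    save_regions_json(sample, out)
    with open(out, encoding="utf-8") as f:
        print(json.load(f) == sample)  # True
```

In the download cell, `save_regions_json(response.json(), 'data/raw/geography/regions.json')` would then take the place of the `open`/`json.dump` lines.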

## Load Data

In [4]:
# Load JSON
with open('data/raw/geography/regions.json', 'r', encoding='utf-8') as f:
    region_data = json.load(f)

df_regions = pd.DataFrame(region_data)
print(f"{len(df_regions)} regions")


18 regions
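Building the DataFrame from `json.load` keeps `code` as text. This matters because the overseas codes carry a leading zero (`01`, `06`). The sketch below uses a small inline sample in the API's shape, which is illustrative rather than the full download. If `pd.read_json` were used instead, its type inference might turn the codes into numbers. Its `dtype` parameter accepts a per-column mapping such as `{"code": str}` to prevent that.

```python
import json

import pandas as pd

# Illustrative sample in the same shape as regions.json (not the full download)
raw = '[{"nom": "Guadeloupe", "code": "01"}, {"nom": "Mayotte", "code": "06"}, {"nom": "Corse", "code": "94"}]'

df = pd.DataFrame(json.loads(raw))
print(df["code"].tolist())  # ['01', '06', '94']

# json.loads yields Python strings and pd.DataFrame keeps them as-is,
# so the leading zeros of the overseas codes survive
print(all(isinstance(code, str) for code in df["code"]))  # True
```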


In [5]:
# Rename the API fields by name, not by position: the API returns `nom` before `code`,
# so a positional assignment would put the labels on the wrong columns
df_regions = df_regions.rename(columns={"nom": "region_name", "code": "region_code"})

## Explore the Data

In [8]:
print(f"Total regions: {len(df_regions)}")
print("\nAll regions:")
for _, row in df_regions.iterrows():
    print(f"{row['region_name']} - {row['region_code']}")

Total regions: 18

All regions:
Île-de-France - 11
Centre-Val de Loire - 24
Bourgogne-Franche-Comté - 27
Normandie - 28
Hauts-de-France - 32
Grand Est - 44
Pays de la Loire - 52
Bretagne - 53
Nouvelle-Aquitaine - 75
Occitanie - 76
Auvergne-Rhône-Alpes - 84
Provence-Alpes-Côte d'Azur - 93
Corse - 94
Guadeloupe - 01
Martinique - 02
Guyane - 03
La Réunion - 04
Mayotte - 06


In [9]:
df_regions.head(5)

| | region_name | region_code |
|---|---|---|
| 0 | Île-de-France | 11 |
| 1 | Centre-Val de Loire | 24 |
| 2 | Bourgogne-Franche-Comté | 27 |
| 3 | Normandie | 28 |
| 4 | Hauts-de-France | 32 |
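Assigning column names by position (`df.columns = [...]`) depends on the order in which the API returns its fields, which here is `nom` before `code`. Renaming through an explicit mapping, and failing fast when a field is missing, removes that dependence. This is a sketch. `rename_region_columns` is a hypothetical helper, and the fixed output order (`region_name`, `region_code`) simply mirrors this notebook's table.

```python
import pandas as pd

COLUMN_MAP = {"nom": "region_name", "code": "region_code"}


def rename_region_columns(df):
    """Hypothetical helper: rename API fields by name, independent of column order."""
    missing = set(COLUMN_MAP) - set(df.columns)
    if missing:
        raise KeyError(f"missing expected API fields: {sorted(missing)}")
    # Select the columns in a fixed order so downstream code never depends on the API's order
    return df.rename(columns=COLUMN_MAP)[["region_name", "region_code"]]


# Same result whichever order the fields arrive in
a = pd.DataFrame({"nom": ["Corse"], "code": ["94"]})
b = pd.DataFrame({"code": ["94"], "nom": ["Corse"]})
print(rename_region_columns(a).equals(rename_region_columns(b)))  # True
```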


## Export to CSV

In [12]:
import os

os.makedirs('data/processed', exist_ok=True)

output_file = 'data/processed/regions_for_sql.csv'
df_regions.to_csv(output_file, index=False, encoding='utf-8')

print(f"Saved {len(df_regions)} regions to {output_file}")

Saved 18 regions to data/processed/regions_for_sql.csv
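Because the file is meant for a SQL import, `region_code` should stay a text column. Read back with default settings, `pd.read_csv` would parse `01` as the integer `1`. The sketch below round-trips a small sample through CSV with an explicit `dtype` and loads it into an in-memory SQLite table. SQLite and the `regions` table schema are assumptions, since the target database is not specified here.

```python
import io
import sqlite3

import pandas as pd

# Small sample in the exported layout (region_name, region_code)
sample = pd.DataFrame(
    {"region_name": ["Guadeloupe", "Provence-Alpes-Côte d'Azur"], "region_code": ["01", "93"]}
)

buffer = io.StringIO()
sample.to_csv(buffer, index=False)
buffer.seek(0)

# dtype keeps "01" as text instead of parsing it as the integer 1
regions = pd.read_csv(buffer, dtype={"region_code": str})

# Assumed target: SQLite, with the code stored as TEXT so leading zeros survive
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE regions (region_code TEXT PRIMARY KEY, region_name TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO regions (region_code, region_name) VALUES (?, ?)",
    regions[["region_code", "region_name"]].itertuples(index=False, name=None),
)
row = conn.execute("SELECT region_name FROM regions WHERE region_code = '01'").fetchone()
print(row)  # ('Guadeloupe',)
```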
