# NDC 3.0 Analysis 
## Building an NLP pipeline to analyse NDCs and track contributions over time.

Countries are due to submit their updated Nationally Determined Contributions (NDC 3.0) in 2025. Each round of NDCs should be more ambitious than the last, to put the world on track to limit warming to well below 2 °C. The next [Global Stocktake](https://unfccc.int/documents/631600) will take place in 2028 and is critical for checking progress toward the Paris Agreement goals.

**Given the urgency of climate action required, can we check on the progress towards emissions reductions and resilience in each commitment in real time?**

**What can we learn from NDCs to understand how to address financing gaps?**

**How do mitigation, adaptation and loss and damage (L&D) feature in NDCs?**

## Method
- Step 1: Collect the NDC Data --> Scrape the NDC Registry for links to NDCs
- Step 2: Download the NDC texts --> Download from the scraped links
- Step 3: Preprocess the Text Data --> Convert unstructured NDC text into clean, structured format for analysis.
- Step 4: Compare Ambition Over Time --> Track changes in emissions reduction targets over time.
- Step 5: Identify Key Sectors & Financial Instruments --> Extract mentions of economic sectors (e.g., energy, transport) and financial instruments (e.g., carbon pricing, green bonds).
- Step 6: Visualise the Findings on a Global Map --> Show how each country's ambition score has changed across NDC updates.
- Step 7: Automate & Scale the Pipeline --> Make this process reproducible for future NDC updates.
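
Steps 4 and 5 are not implemented in the cells that follow, so here is a minimal sketch of what they could look like. It assumes a simple rule-based approach: a regular expression for percentage figures in sentences that mention a reduction, and whole-phrase keyword counts. The function names, the keyword lists (taken from the examples in Step 5) and the sample text are all illustrative, not part of the original pipeline.

```python
import re

# Illustrative keyword lists based on the examples in Step 5; extend as needed
KEYWORDS = {
    "sectors": ["energy", "transport"],
    "financial_instruments": ["carbon pricing", "green bonds"],
}

# Matches figures such as "45%", "45 %" or "45 per cent"
PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s*(?:%|per\s?cent)", re.IGNORECASE)


def find_reduction_targets(text):
    """Return percentages found in sentences that mention a reduction."""
    targets = []
    # Naive sentence split on ., ! or ? followed by whitespace
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if re.search(r"reduc", sentence, re.IGNORECASE):
            targets.extend(float(value) for value in PERCENT.findall(sentence))
    return targets


def count_keywords(text, keywords=KEYWORDS):
    """Count whole-phrase, case-insensitive matches for each keyword."""
    lowered = text.lower()
    return {
        group: {
            term: len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
            for term in terms
        }
        for group, terms in keywords.items()
    }


# Made-up sample text, not taken from any real NDC
sample = (
    "Country X commits to reduce emissions by 45% by 2035 relative to 2005 levels. "
    "Renewable energy will supply 60 per cent of electricity. "
    "A carbon pricing scheme and green bonds will fund transport upgrades."
)
print(find_reduction_targets(sample))
print(count_keywords(sample))
```

A rule-based pass like this is easy to audit but will miss targets phrased without a percentage (for example absolute tonnage caps) and keywords in languages other than English, so it is best treated as a first filter rather than a final ambition score.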

In [2]:
!which python
import sys
print(sys.executable)

/usr/local/bin/python3


In [3]:
# Step 1: Collect the NDC Data
# Check for bulk download of NDCs
!pip install requests beautifulsoup4 pandas tqdm
import requests
from bs4 import BeautifulSoup

# Define the URL for the NDC Registry
ndc_url = "https://unfccc.int/NDCREG"

# Send a request to fetch the webpage (the timeout stops the call hanging indefinitely)
response = requests.get(ndc_url, timeout=30)

# Check if the request was successful (Status Code 200 = OK)
if response.status_code == 200:
    print("Successfully accessed the NDC Registry page!")
    soup = BeautifulSoup(response.text, "html.parser")  # Parse the HTML
else:
    print(f"Failed to access the page. Status Code: {response.status_code}")



In [6]:
!pip install selenium webdriver-manager



In [36]:
# Step 1: Scrape the NDC Registry page
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.common.by import By
import time

# Setup Selenium WebDriver
options = webdriver.ChromeOptions()
options.add_argument("--headless")  # Run in headless mode (no browser window)
options.add_argument("--disable-gpu")
options.add_argument("--no-sandbox")

# Start a new browser session
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=options)

# Open the UNFCCC NDC Registry page
driver.get("https://unfccc.int/NDCREG")
time.sleep(5)  # Wait for JavaScript to load the page

# Scroll down to load more content (if necessary)
scroll_pause_time = 2  # Pause time to allow the page to load content
scroll_height = driver.execute_script("return document.body.scrollHeight")
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(scroll_pause_time)
    new_scroll_height = driver.execute_script("return document.body.scrollHeight")
    if new_scroll_height == scroll_height:
        break
    scroll_height = new_scroll_height

# Find all <a> tags on the page
all_links = driver.find_elements(By.TAG_NAME, "a")

# Keep only the links that point to PDF documents (based on the .pdf file extension)
hrefs = (link.get_attribute("href") for link in all_links)
ndc_urls = [href for href in hrefs if href and href.endswith(".pdf")]

# Close the browser session
driver.quit()

# Show the first 10 NDC document links
print(f"Found {len(ndc_urls)} NDC document links.")
print(ndc_urls[:10])  # Display first 10 links


Found 269 NDC document links.
['https://unfccc.int/sites/default/files/2025-03/Provisional%20NDC%20Submission_Zambia_Revised%20and%20Updated_NDC_100325.pdf', 'https://unfccc.int/sites/default/files/2025-02/REPUBLICA%20DE%20CUBA%20CND3.0.pdf', 'https://unfccc.int/sites/default/files/2025-02/Maldives%E2%80%99%20Third%20Nationally%20Determined%20Contribution.pdf', 'https://unfccc.int/sites/default/files/2025-02/001_eng_NDC_Montenegro.pdf', 'https://unfccc.int/sites/default/files/2025-02/Japans%202035-2040%20NDC.pdf', 'https://unfccc.int/sites/default/files/2025-02/Canada%27s%202035%20Nationally%20Determined%20Contribution_ENc.pdf', 'https://unfccc.int/sites/default/files/2025-02/Soumission%20officielle%20de%20la%20CDN%20du%20Canada%20-%20CCNUCC%20v2fr.pdf', 'https://unfccc.int/sites/default/files/2025-02/Zimbabwe%20NDC3.0%20Country%20Statement_2025_35.pdf', 'https://unfccc.int/sites/default/files/2025-02/Singapore%20Second%20Nationally%20Determined%20Contribution.pdf', 'https://unfccc.int

In [37]:
# Extract unique PDF links
ndc_urls = list(set(ndc_urls))  # Remove duplicates by converting to a set

# Show the cleaned-up number of links
print(f"Found {len(ndc_urls)} unique NDC document links.")
print(ndc_urls[:10])  # Display first 10 links to verify


Found 242 unique NDC document links.
['https://unfccc.int/sites/default/files/NDC/2022-06/MD_Updated_NDC_final_version_EN.pdf', 'https://unfccc.int/sites/default/files/NDC/2022-06/New%20Zealand%20NDC%20November%202021.pdf', 'https://unfccc.int/sites/default/files/NDC/2022-06/CDN%20r%C3%A9vis%C3%A9e%20CMR%20finale%20sept%202021.pdf', 'https://unfccc.int/sites/default/files/NDC/2022-11/Chile_%20fortalecimiento%20NDC_nov22.pdf', 'https://unfccc.int/sites/default/files/NDC/2023-06/Egypts%20Updated%20First%20Nationally%20Determined%20Contribution%202030%20%28Second%20Update%29.pdf', 'https://unfccc.int/sites/default/files/2025-02/Canada%27s%202035%20Nationally%20Determined%20Contribution_ENc.pdf', 'https://unfccc.int/sites/default/files/2025-02/Saint%20Lucias%20Third%20Nationally%20Determined%20Contribution.pdf', 'https://unfccc.int/sites/default/files/NDC/2022-06/NDC_TAJIKISTAN_ENG.pdf', 'https://unfccc.int/sites/default/files/2025-01/20241220_Uruguay_NDC3.pdf', 'https://unfccc.int/sites/d
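
Converting to a `set` removes duplicates but also discards the order in which the links appeared on the page, as the reshuffled output above shows. If that order matters for later steps, a minimal order-preserving alternative could look like this. `unique_in_order` is a hypothetical helper name and the sample links are made up.

```python
def unique_in_order(items):
    """Drop duplicates while keeping the first occurrence of each item."""
    # dict keys are unique and keep insertion order (Python 3.7+)
    return list(dict.fromkeys(items))


# Made-up example links, for illustration only
sample_links = [
    "https://example.org/b.pdf",
    "https://example.org/a.pdf",
    "https://example.org/b.pdf",
]
print(unique_in_order(sample_links))
```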

In [38]:
# Step 2: Download the NDCs
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from webdriver_manager.chrome import ChromeDriverManager
import os
import time

# Set the download directory (modify this path to your actual folder)
download_directory = os.path.expanduser("~/Desktop/Projects/NDC/ndc_downloads")

# Ensure the directory exists
os.makedirs(download_directory, exist_ok=True)

# Configure Chrome options for automatic PDF download
chrome_options = Options()
chrome_options.add_experimental_option("prefs", {
    "download.default_directory": download_directory,  # Set default download folder
    "download.prompt_for_download": False,  # Disable download prompt
    "download.directory_upgrade": True,
    "plugins.always_open_pdf_externally": True,  # Bypass Chrome's PDF viewer
})

# Start a new browser session
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service, options=chrome_options)

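# Visit each NDC link; Chrome saves the PDF to download_directory
for pdf_url in ndc_urls:
    print(f"Downloading: {pdf_url}")
    driver.get(pdf_url)
    time.sleep(5)  # Fixed pause; large files may need longer

# Before closing, wait (up to 2 minutes) for unfinished downloads,
# which Chrome stores as *.crdownload files
deadline = time.time() + 120
while time.time() < deadline and any(
    name.endswith(".crdownload") for name in os.listdir(download_directory)
):
    time.sleep(1)

# Close the browser session
driver.quit()

print("All downloads completed.")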


Downloading: https://unfccc.int/sites/default/files/NDC/2022-06/MD_Updated_NDC_final_version_EN.pdf
Downloading: https://unfccc.int/sites/default/files/NDC/2022-06/New%20Zealand%20NDC%20November%202021.pdf
Downloading: https://unfccc.int/sites/default/files/NDC/2022-06/CDN%20r%C3%A9vis%C3%A9e%20CMR%20finale%20sept%202021.pdf
Downloading: https://unfccc.int/sites/default/files/NDC/2022-11/Chile_%20fortalecimiento%20NDC_nov22.pdf
Downloading: https://unfccc.int/sites/default/files/NDC/2023-06/Egypts%20Updated%20First%20Nationally%20Determined%20Contribution%202030%20%28Second%20Update%29.pdf
Downloading: https://unfccc.int/sites/default/files/2025-02/Canada%27s%202035%20Nationally%20Determined%20Contribution_ENc.pdf
Downloading: https://unfccc.int/sites/default/files/2025-02/Saint%20Lucias%20Third%20Nationally%20Determined%20Contribution.pdf
Downloading: https://unfccc.int/sites/default/files/NDC/2022-06/NDC_TAJIKISTAN_ENG.pdf
Downloading: https://unfccc.int/sites/default/files/2025-01/2
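
The cell above drives a full browser just to save files. As a possible alternative, here is a minimal sketch that downloads a PDF with the Python standard library only. `filename_from_url` and `download_pdf` are hypothetical helpers, not part of the original notebook, and whether the UNFCCC server accepts non-browser clients (with or without the `User-Agent` header used here) is an assumption that has not been verified.

```python
import os
import shutil
import urllib.request
from urllib.parse import unquote, urlparse


def filename_from_url(url):
    """Return the decoded file name at the end of a URL path."""
    return unquote(os.path.basename(urlparse(url).path))


def download_pdf(url, folder, timeout=60):
    """Download one PDF into `folder` and return the local path.

    Skips the download if the file already exists."""
    os.makedirs(folder, exist_ok=True)
    target = os.path.join(folder, filename_from_url(url))
    if os.path.exists(target):
        return target
    # Assumption: a browser-like User-Agent header; it may be unnecessary
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    # Write to a temporary name first so an interrupted download
    # is not mistaken for a finished file on the next run
    partial = target + ".part"
    with urllib.request.urlopen(request, timeout=timeout) as response, \
            open(partial, "wb") as out:
        shutil.copyfileobj(response, out)
    os.replace(partial, target)
    return target


# Possible usage with the variables defined in the cells above:
# for pdf_url in ndc_urls:
#     download_pdf(pdf_url, download_directory)
```

Because existing files are skipped, the loop can be rerun after a failure without fetching everything again.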

In [23]:
# Step 3: Preprocess the Text Data
!pip install pdfplumber
import pdfplumber
import os
import re
import pandas as pd


In [39]:
# Step 3A: Preprocess the Text Data --> store in list version
# Path where your NDC PDFs are saved (the Step 2 download directory)
pdf_folder = os.path.expanduser("~/Desktop/Projects/NDC/ndc_downloads")

# Create a list to hold the extracted data
ndc_data = []

# Iterate through all PDFs in the folder
for file_name in os.listdir(pdf_folder):
    if file_name.endswith(".pdf"):
        file_path = os.path.join(pdf_folder, file_name)

        # Open the PDF and extract text page by page
        with pdfplumber.open(file_path) as pdf:
            text = ""
            for page in pdf.pages:
                # Guard against pages with no extractable text
                # (extract_text() may return None), and separate pages with a newline
                text += (page.extract_text() or "") + "\n"

        # Store the raw text; "country" holds the file name for now
        ndc_data.append({"country": file_name, "text": text})

# Preview the extracted data
print(ndc_data[:2])  # Show the first 2 entries for example




In [None]:
# Step 3B: Preprocess the Text Data --> store as txt version for later use
import pdfplumber
import os

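# Folders for the source PDFs and the extracted text
# (expanduser makes the paths independent of the current working directory)
pdf_folder = os.path.expanduser("~/Desktop/Projects/NDC/ndc_downloads")
text_folder = os.path.expanduser("~/Desktop/Projects/NDC/ndc_texts")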

# Ensure the text output folder exists
os.makedirs(text_folder, exist_ok=True)

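def extract_text_from_pdf(pdf_path):
    """Extracts text from a given PDF file, one newline after each page.

    On error, returns whatever text was extracted before the failure."""
    text = ""
    try:
        with pdfplumber.open(pdf_path) as pdf:
            for page in pdf.pages:
                # Guard against pages with no extractable text
                text += (page.extract_text() or "") + "\n"
    except Exception as e:
        print(f"Error extracting {pdf_path}: {e}")
    return text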

# Process each PDF file
for pdf_file in os.listdir(pdf_folder):
    if pdf_file.endswith(".pdf"):
        pdf_path = os.path.join(pdf_folder, pdf_file)
        text = extract_text_from_pdf(pdf_path)
        
        # Save the extracted text to a .txt file with the same base name
        base_name = os.path.splitext(pdf_file)[0]
        text_file_path = os.path.join(text_folder, base_name + ".txt")
        with open(text_file_path, "w", encoding="utf-8") as text_file:
            text_file.write(text)
        
        print(f"Extracted text from {pdf_file} and saved to {text_file_path}")
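
The saved text still carries the line breaks and spacing of the PDF layout, so the "clean" part of Step 3 remains to be done. A minimal sketch of one possible cleaning pass follows. `clean_text` is a hypothetical helper, and its rules (rejoin hyphenated line breaks, unwrap single line breaks, collapse extra whitespace) are assumptions about what is useful for later analysis.

```python
import re


def clean_text(raw):
    """Normalise the whitespace of text extracted from a PDF."""
    # Rejoin words hyphenated across a line break, e.g. "adap-\ntation"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Turn single line breaks into spaces; keep blank lines as paragraph breaks
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs into one space
    text = re.sub(r"[ \t]+", " ", text)
    # Collapse three or more newlines into a single paragraph break
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()


# Made-up sample, for illustration only
print(clean_text("Adap-\ntation  and\nmitigation\n\n\nFinance"))
```

Note that the first rule also joins genuine compounds that happen to break at the hyphen (for example "low-" followed by "carbon" on the next line), so it may be worth dropping if such terms matter for keyword matching.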
