# Jupyter notebook for downloading CMIP6 data

The Coupled Model Intercomparison Project Phase 6 (CMIP6) is a project coordinated by the Working Group on Coupled Modelling (WGCM) as part of the World Climate Research Programme (WCRP). Phase 6 builds on previous phases executed under the leadership of the Program for Climate Model Diagnosis and Intercomparison (PCMDI) and relies on the Earth System Grid Federation (ESGF) and the Centre for Environmental Data Analysis (CEDA), along with numerous related activities, for implementation. The original data are hosted and partially replicated on a federated collection of data nodes.
The project includes simulations from more than 100 global climate models run by around 45 institutions and organizations worldwide.

hdl:21.14106/ef6056e5788bde823f8d5e5da965044b997c20a0

ERA5 is the global reanalysis version 5 produced by the European Centre for Medium-Range Weather Forecasts (ECMWF). A reanalysis builds upon a weather forecasting system, which means it uses a variety of satellite and in-situ datasets to estimate initial conditions ("analysis") and advances the initial state forward in time with a numerical model ("forecast"). In contrast to operational forecasts, one consistent model version is used for the entire reanalysis period (ERA5: 1940 to present), and more data can be "assimilated" because not all measurements are available in real time. ERA5 has been run at a resolution of 25 km with 137 vertical levels. Data are stored with hourly time resolution.


## Content of this notebook

This notebook explains how to get access to CMIP6 data and how to download a custom-tailored subset of this dataset using the WDC.

For the purpose of this assignment, only a specific variable and a delimited time span will be chosen. 

In general, the dataset has the following characteristics: 

* spatial domain: longitude 0 to 360, latitude -90 to 90
* temporal extent: 1850-01-01 to 2014-12-31 (proleptic_gregorian)
* format: NetCDF
* variables: temperature, humidity, wind, geopotential height, etc.

Specifically, we want to obtain the following data:

* spatial domain: global
* temporal extent: ???
* variable: geopotential height
* time resolution: daily

## How to proceed

1. Install the ESGF pyclient with pip or Anaconda, depending on your setup.
   * For Anaconda, use: conda install -c conda-forge esgf-pyclient
   * For pip, use: pip install esgf-pyclient

2. Register in the ESGF MetaGrid. Copy and store your credentials; the logon cells below ask for the username and password.

3. Browse the catalogue to find the dataset you want to download. Note the fields of the specific query.
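
The query fields noted while browsing can also be turned into a search URL by hand. The following is a minimal sketch, assuming the ESGF search REST endpoint lives under `/esg-search/search` on the DKRZ node and that the CMIP6 facets are named `variable_id` and `frequency`, with `zg` as the short name for geopotential height. The helper `build_search_url` is hypothetical and only assembles the URL; it does not contact the server.

```python
from urllib.parse import urlencode

# Assumed REST endpoint of the DKRZ search node (the notebook itself only
# uses the base address 'https://esgf-data.dkrz.de/esg-search').
SEARCH_ENDPOINT = "https://esgf-data.dkrz.de/esg-search/search"


def build_search_url(facets, limit=10):
    """Return a search URL for the given facet dictionary (hypothetical helper)."""
    params = dict(facets)
    # Ask for dataset records as JSON and cap the number of results.
    params.update({
        "type": "Dataset",
        "format": "application/solr+json",
        "limit": limit,
    })
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


# Facet names and values are assumptions; check them against the catalogue.
facets = {
    "project": "CMIP6",
    "variable_id": "zg",   # geopotential height
    "frequency": "day",    # daily time resolution
}

url = build_search_url(facets)
print(url)
```

Opening the printed URL in a browser is a quick way to confirm that the facets match what the catalogue shows before any download is attempted.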





In [2]:
import os

# --- CRITICAL FIX for 'KeyError: HOME' on Windows/Anaconda ---
# pyesgf requires the 'HOME' environment variable to be set.
# This line sets it using the existing 'USERPROFILE' variable (the standard Windows home path).
if 'HOME' not in os.environ:
    os.environ['HOME'] = os.environ['USERPROFILE']
    
print(f"Setting HOME environment variable to: {os.environ['HOME']}")

# Import pyesgf only after HOME has been set
import pyesgf.logon
from pyesgf.search import SearchConnection

Setting HOME environment variable to: C:\Users\nagib
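
The workaround above can also be wrapped in a small helper that does not fail when `USERPROFILE` is missing. This is a sketch only: the function name `ensure_home` is hypothetical, and falling back to `os.path.expanduser('~')` is an assumption about what counts as a sensible home directory.

```python
import os


def ensure_home(environ=os.environ):
    """Set HOME in the given mapping if it is missing and return its value."""
    if "HOME" not in environ:
        # Prefer the Windows profile directory; otherwise let Python resolve '~'.
        environ["HOME"] = environ.get("USERPROFILE") or os.path.expanduser("~")
    return environ["HOME"]


print(f"HOME is: {ensure_home()}")
```

As in the cell above, the helper would need to be called before `pyesgf.logon` is imported.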


In [6]:
# --- Interactive authentication ---

# 1. Initialize the logon manager
lm = pyesgf.logon.LogonManager()

# 2. Interactive logon using your ESGF OpenID credentials.
# This will now create the necessary .esg/credentials.pem file in the directory defined by HOME.
lm.logon(hostname='esgf-data.dkrz.de', interactive=True, bootstrap=True)

if lm.is_logged_on():
    print("Logon successful. Proceeding to search...")
else:
    print("Logon failed. Please check credentials and try again.")
    
# 3. Connect to the DKRZ ESGF search node
conn = SearchConnection('https://esgf-data.dkrz.de/esg-search', distrib=False)

Enter myproxy username: nagibe_mg
Enter password for nagibe_mg: ········


TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond

In [11]:
import os

# --- CRITICAL FIX for 'KeyError: HOME' on Windows/Anaconda ---
# This must remain the first step, before pyesgf is imported.
if 'HOME' not in os.environ:
    os.environ['HOME'] = os.environ['USERPROFILE']
    
print(f"Setting HOME environment variable to: {os.environ['HOME']}")

import pyesgf.logon
from pyesgf.search import SearchConnection

# 1. Initialize the logon manager
lm = pyesgf.logon.LogonManager()

# Replace the placeholders with your ESGF OpenID username and password.
# Use the short username, not the email address, and do not leave real
# credentials in a notebook that is shared or committed.
ESGF_USERNAME = 'your_username'
ESGF_PASSWORD = 'your_password'

# Try non-interactive logon
print("Attempting non-interactive logon...")
lm.logon(
    hostname='esgf-data.dkrz.de',
    username=ESGF_USERNAME,
    password=ESGF_PASSWORD
    # Note: bootstrap=True is omitted in this non-interactive call
)

if lm.is_logged_on():
    print("Logon successful (Non-interactive).")
    # 2. Connect to the DKRZ ESGF search node
    conn = SearchConnection('https://esgf-data.dkrz.de/esg-search', distrib=False)
    print("Search connection established.")

else:
    print("Logon failed. Check username/password.")

Setting HOME environment variable to: C:\Users\nagib
Attempting non-interactive logon...


TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
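
Both logon attempts above end in a `TimeoutError`, which suggests checking whether the host can be reached at all before retrying with different credentials. The sketch below is one way to do that with the standard library. It assumes the logon goes through a MyProxy service on port 7512 (the usual MyProxy default, not confirmed by this notebook) and that the search node answers on HTTPS port 443; `can_connect` is a hypothetical helper.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and DNS failures.
        return False


# Example calls (require network access; ports are assumptions):
#   can_connect('esgf-data.dkrz.de', 7512)  # MyProxy logon
#   can_connect('esgf-data.dkrz.de', 443)   # HTTPS search node
```

If the HTTPS check succeeds while the MyProxy check fails, a firewall or proxy blocking that port would be a plausible cause of the timeout; if both fail, the node itself may be unreachable from the current network.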