# Lab 0: Code Environment

## Introduction

This notebook guides you through the final steps of setting up your local Ubuntu 24.04 environment for the Computational Genetic Genealogy course. It focuses on installing specific bioinformatics tools and verifying the configuration after you have completed the initial setup described in the `README.md` file.

## Prerequisites Check

Before proceeding with this notebook, ensure you have completed the **initial setup steps outlined in the `README.md` file** under the "Ubuntu 24.04 Local Setup" section. This includes:

1.  ✅ Cloning the `computational_genetic_genealogy` repository (referred to as `PROJECT_BASE_DIR`).
2.  ✅ Manually creating the subdirectories (`data`, `results`, `references`, `utils`) within `PROJECT_BASE_DIR`.
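3.  ✅ Manually creating the `~/.env` file (in your home directory) with the correct absolute paths pointing to `PROJECT_BASE_DIR` and its subdirectories. This notebook reads them as `PROJECT_WORKING_DIR`, `PROJECT_DATA_DIR`, `PROJECT_REFERENCES_DIR`, `PROJECT_RESULTS_DIR`, and `PROJECT_UTILS_DIR`.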
4.  ✅ Installing all required system packages via `sudo apt install ...` (including Java 21, Python 3.12, build tools, R, TeX, etc.).
5.  ✅ **Building and installing HTSlib, Samtools, and BCFtools version 1.18 from source** using the commands provided in the README.
6.  ✅ Installing `poetry` via `pipx` and ensuring `~/.local/bin` is in your PATH.
7.  ✅ Running `poetry install --no-root` within the `PROJECT_BASE_DIR` directory to install Python dependencies into `./.venv`.
8.  ✅ Manually adding the `${PROJECT_BASE_DIR}/utils` directory to your `~/.bashrc` PATH and sourcing the file (e.g., `source ~/.bashrc`).
9.  ✅ Configuring the `JAVA_HOME` environment variable in `~/.bashrc`.
10. ✅ Installing JAR files (Beagle, HapIBD, RefinedIBD), LiftOver, BonsaiTree, IBIS, Ped-Sim, RFMix2, and PLINK2.

**Important Note:** This notebook assumes the `.env` file is located in your **home directory (`~/.env`)**, aligning with the Docker environment setup, and therefore **does not run** the `scripts_env/directory_setup.py` script.
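
For reference, a `~/.env` file of the kind described above might look like the sample embedded in the sketch below. The variable names are the ones this notebook reads later; the `/home/student/...` paths are hypothetical placeholders, not values from the README. The sketch parses simple `KEY=VALUE` lines with the standard library only and reports any required variable that is absent. It is a simplified illustration, not a replacement for `python-dotenv`.

```python
# Hypothetical example of ~/.env contents; replace the paths with your own.
SAMPLE_ENV = """\
PROJECT_WORKING_DIR=/home/student/computational_genetic_genealogy
PROJECT_DATA_DIR=/home/student/computational_genetic_genealogy/data
PROJECT_REFERENCES_DIR=/home/student/computational_genetic_genealogy/references
PROJECT_RESULTS_DIR=/home/student/computational_genetic_genealogy/results
PROJECT_UTILS_DIR=/home/student/computational_genetic_genealogy/utils
"""

REQUIRED_VARS = [
    "PROJECT_WORKING_DIR",
    "PROJECT_DATA_DIR",
    "PROJECT_REFERENCES_DIR",
    "PROJECT_RESULTS_DIR",
    "PROJECT_UTILS_DIR",
]


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(values, required=REQUIRED_VARS):
    """Return the required variable names that are absent or empty."""
    return [name for name in required if not values.get(name)]


parsed = parse_env_text(SAMPLE_ENV)
print(missing_vars(parsed))
```

An empty list means every required variable is present in the sample.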

## Selecting Python Interpreter in VS Code

Your Jupyter Notebook needs to know which Python environment (including installed packages) to use. Select the interpreter created by Poetry in the previous steps:

1.  Open the Command Palette in VS Code (Ctrl+Shift+P or Cmd+Shift+P).
2.  Type and select `Python: Select Interpreter`.
3.  Choose the option that points to the `.venv` directory within your project folder (`computational_genetic_genealogy`). It might look like `Python 3.12 ('computational_genetic_genealogy/.venv': Poetry)` or similar.

If you don't see the correct interpreter, ensure you successfully ran `poetry install --no-root` in the terminal within your `PROJECT_BASE_DIR`.
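
Once an interpreter is selected, one way to confirm that the kernel is really using the Poetry environment is to look at `sys.executable`. The helper below is a minimal sketch; `interpreter_in_venv` is a hypothetical name, and it assumes the virtual environment lives in `.venv` inside the project directory, as described above.

```python
import os
import sys
from pathlib import Path


def interpreter_in_venv(executable, project_dir):
    """Return True if `executable` is located under `<project_dir>/.venv`.

    Paths are normalized without resolving symlinks, because the Python
    binary inside a virtual environment is usually a symlink to the
    system interpreter.
    """
    venv_dir = Path(os.path.abspath(project_dir)) / ".venv"
    exe_path = Path(os.path.abspath(executable))
    return venv_dir in exe_path.parents


print(f"Kernel interpreter: {sys.executable}")
```

In a notebook cell you could then call `interpreter_in_venv(sys.executable, "/path/to/computational_genetic_genealogy")` with your own project path.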

## Load Environment Variables and Verify Setup

First, we load necessary libraries and the directory paths defined in your `~/.env` file.

In [None]:
import os
import sys
import shutil
from pathlib import Path
from dotenv import load_dotenv
import logging

# Configure basic logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

def load_env_file():
    """Load the .env file from the user's home directory."""
    env_path = Path.home() / '.env'
    logging.info(f"Attempting to load environment variables from: {env_path}")
    if not env_path.exists():
        logging.error(f"'.env' file not found in home directory ({Path.home()}).")
        logging.error("Please ensure you have created it as per the README instructions.")
        raise FileNotFoundError(f"'.env' file not found at {env_path}")

    try:
        # Load the .env file, overriding existing environment variables
        load_dotenv(env_path, override=True)
        logging.info(f"Successfully loaded environment variables from: {env_path}")
        return env_path
    except Exception as e:
        logging.error(f"Error loading .env file: {e}")
        raise

# --- Load and Set Environment Variables ---
try:
    env_path = load_env_file()

    # Get variables from environment (loaded from .env)
    working_directory = os.getenv('PROJECT_WORKING_DIR')
    data_directory = os.getenv('PROJECT_DATA_DIR')
    references_directory = os.getenv('PROJECT_REFERENCES_DIR')
    results_directory = os.getenv('PROJECT_RESULTS_DIR')
    utils_directory = os.getenv('PROJECT_UTILS_DIR')

    # Verify paths were loaded
    required_paths = {
        'PROJECT_WORKING_DIR': working_directory,
        'PROJECT_DATA_DIR': data_directory,
        'PROJECT_REFERENCES_DIR': references_directory,
        'PROJECT_RESULTS_DIR': results_directory,
        'PROJECT_UTILS_DIR': utils_directory
    }
    missing_vars = [name for name, path in required_paths.items() if not path]
    if missing_vars:
        logging.error(f"Missing environment variables in {env_path}: {', '.join(missing_vars)}")
        raise ValueError(f"Missing required paths in {env_path}. Please check the file.")

    # Set environment variables for shell commands (! or %%) within the notebook
    # Note: Python's os.environ affects subprocesses launched *from* this notebook process.
    os.environ["PROJECT_WORKING_DIR"] = working_directory
    os.environ["PROJECT_DATA_DIR"] = data_directory
    os.environ["PROJECT_REFERENCES_DIR"] = references_directory
    os.environ["PROJECT_RESULTS_DIR"] = results_directory
    os.environ["PROJECT_UTILS_DIR"] = utils_directory

    logging.info(f"Working Directory set to: {working_directory}")
    logging.info(f"Data Directory set to: {data_directory}")
    logging.info(f"References Directory set to: {references_directory}")
    logging.info(f"Results Directory set to: {results_directory}")
    logging.info(f"Utils Directory set to: {utils_directory}")

    # Change Notebook's Current Working Directory (CWD)
    if working_directory and Path(working_directory).is_dir():
        os.chdir(working_directory)
        logging.info(f"Changed notebook CWD to: {os.getcwd()}")
    else:
        logging.warning(f"Working directory '{working_directory}' not found or not a directory. Cannot change CWD.")
        raise NotADirectoryError(f"Working directory path not valid: {working_directory}")

except (FileNotFoundError, ValueError, NotADirectoryError) as e:
    logging.critical(f"Critical setup error: {e}. Please fix the setup and restart the kernel.")
    # Optionally, stop execution more forcefully if needed, though this can be abrupt:
    # raise SystemExit("Stopping execution due to critical setup error.")
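
The cell above confirms that all five variables are set but only checks that the working directory exists on disk. A small helper like the following could check every configured directory. This is a sketch: `find_missing_dirs` is a hypothetical name, and the demonstration uses a temporary directory rather than your real project folders.

```python
import tempfile
from pathlib import Path


def find_missing_dirs(named_paths):
    """Return the names whose path is unset or is not an existing directory."""
    return [
        name
        for name, path in named_paths.items()
        if not path or not Path(path).is_dir()
    ]


# Demonstration with a temporary directory standing in for the project folders.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "data").mkdir()
    example = {
        "PROJECT_DATA_DIR": str(Path(tmp) / "data"),
        "PROJECT_RESULTS_DIR": str(Path(tmp) / "results"),  # never created
    }
    print(find_missing_dirs(example))
```

In the notebook, the `required_paths` dictionary built in the cell above could be passed to the helper directly.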

## Verify Initial Setup

Let's check the versions of key software installed via the terminal steps in the README.

In [None]:
# Verify Java Installation
print("--- Verifying Java (Should be OpenJDK 21) ---")
!java -version
print("\n--- Verifying JAVA_HOME ---")
# Note: $JAVA_HOME might not be directly visible here if not exported correctly for the notebook kernel
# Running java -version is a more reliable check within the notebook
!echo $JAVA_HOME

# Verify Samtools/BCFtools/Tabix Installation (Should be v1.18 from source)
print("\n--- Verifying Samtools/BCFtools/Tabix (should be v1.18) ---")
!samtools --version | head -n 1
!bcftools --version | head -n 1
print("\n--- Checking Tabix --- ")
!which tabix
!tabix --version

# Verify R Installation
print("\n--- Verifying R ---")
!R --version | head -n 1
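
The version checks above are read by eye. If you prefer a programmatic comparison, the sketch below extracts the first dotted version number from a line of text and compares it with an expected value. It assumes the first output line has a form such as `samtools 1.18`; the function names are hypothetical.

```python
import re


def extract_version(first_line):
    """Return the first dotted version number in a line as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", first_line)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(0).split("."))


def matches_expected(first_line, expected=(1, 18)):
    """Compare the leading components of the parsed version with `expected`."""
    version = extract_version(first_line)
    return version is not None and version[: len(expected)] == tuple(expected)


# Assumed sample lines, for illustration only.
for sample in ["samtools 1.18", "bcftools 1.19"]:
    print(sample, "->", matches_expected(sample))
```

In the notebook, the line to test could be captured with `output = !samtools --version`, after which `output[0]` holds the first line.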

## Configure sudo for Notebook (Optional)

This step allows specific `sudo` commands (such as `apt` or `rm`) to be run within notebook cells using `!sudo ...` without requiring a password. This is convenient, but it modifies system configuration (`/etc/sudoers.d`) and means any process running under your user account can run those commands as root without a prompt. **Only do this if you understand the security implications and find it necessary.** The commands in the cell below are commented out on purpose: copy them into your Ubuntu terminal, remove the leading `#`, and enter your password there.

In [None]:
%%bash
# --- DO NOT RUN THIS CELL IN JUPYTER NOTEBOOK ---
# --- COPY AND PASTE THE COMMANDS BELOW INTO YOUR UBUNTU TERMINAL WINDOW ---

# echo "$(whoami) ALL=(ALL) NOPASSWD: /usr/bin/apt-add-repository, /usr/bin/apt, /usr/bin/apt-get, /usr/bin/dpkg, /bin/rm" | sudo tee /etc/sudoers.d/$(whoami)-notebook
# sudo chmod 0440 /etc/sudoers.d/$(whoami)-notebook
# echo "Passwordless sudo configured for specific commands in /etc/sudoers.d/$(whoami)-notebook"

# --- Then you can run commands like !sudo apt update -y in the notebook ---

## Verify Installation of Previously Installed Tools

In the README.md, you should have already installed several tools. Let's verify those installations before proceeding.

In [None]:
# Verify JAR Files Installation
print("--- Verifying JAR Files Installation ---")
utils_dir = os.getenv('PROJECT_UTILS_DIR')

# Define expected JARs
beagle_jar = "beagle.27Feb25.75f.jar"
hap_ibd_jar = "hap-ibd.jar"
refined_ibd_jar = "refined-ibd.17Jan20.102.jar"
refined_ibd_merge_jar = "merge-ibd-segments.17Jan20.102.jar"

jar_files = [beagle_jar, hap_ibd_jar, refined_ibd_jar, refined_ibd_merge_jar]

# Check each JAR file
if utils_dir:
    for jar in jar_files:
        jar_path = Path(utils_dir) / jar
        if jar_path.exists():
            print(f"✅ Found {jar} at {jar_path}")
        else:
            print(f"❌ Missing {jar}. Expected location: {jar_path}")
else:
    print("❌ Cannot verify JARs: Utils directory path not set in environment.")

# Verify LiftOver Installation
print("\n--- Verifying LiftOver Installation ---")
if utils_dir:
    liftover_bin = Path(utils_dir) / "liftOver"
    if liftover_bin.exists() and os.access(liftover_bin, os.X_OK):
        print(f"✅ Found LiftOver executable at {liftover_bin}")
    else:
        print(f"❌ LiftOver not found or not executable at {liftover_bin}")
else:
    print("❌ Cannot verify LiftOver: Utils directory path not set in environment.")

# Verify PLINK2 Installation
print("\n--- Verifying PLINK2 Installation ---")
if utils_dir:
    plink2_bin = Path(utils_dir) / "plink2"
    if plink2_bin.exists() and os.access(plink2_bin, os.X_OK):
        print(f"✅ Found PLINK2 executable at {plink2_bin}")
        !{plink2_bin} --version | head -n 1
    else:
        print(f"❌ PLINK2 not found or not executable at {plink2_bin}")
else:
    print("❌ Cannot verify PLINK2: Utils directory path not set in environment.")

# Verify Source-Built Tools
print("\n--- Verifying Source-Built Tools ---")
source_tools = [
    {"name": "IBIS", "path": "ibis/ibis"},
    {"name": "Ped-Sim", "path": "ped-sim/ped-sim"},
    {"name": "RFMix2", "path": "rfmix2/rfmix"}
]

if utils_dir:
    for tool in source_tools:
        tool_path = Path(utils_dir) / tool["path"]
        if tool_path.exists() and (os.access(tool_path, os.X_OK) or tool["name"] == "RFMix2"):
            print(f"✅ Found {tool['name']} at {tool_path}")
        else:
            print(f"❌ {tool['name']} not found or not executable at {tool_path}")
else:
    print("❌ Cannot verify source tools: Utils directory path not set in environment.")

# Verify BonsaiTree Installation
print("\n--- Verifying BonsaiTree Installation ---")
if utils_dir:
    bonsai_dir = Path(utils_dir) / "bonsaitree"
    if bonsai_dir.is_dir():
        print(f"✅ Found BonsaiTree directory at {bonsai_dir}")
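        # Verify the installation more thoroughly with a Python import.
        # A failing `!` shell command does not raise a Python exception,
        # so run the check with subprocess and inspect the return code.
        import subprocess
        import_code = "from bonsaitree.v3.bonsai import build_pedigree"
        result = subprocess.run(["poetry", "run", "python", "-c", import_code],
                                capture_output=True, text=True)
        if result.returncode == 0:
            print("✅ BonsaiTree module imported successfully.")
        else:
            print(f"⚠️ BonsaiTree directory exists but module import test failed:\n{result.stderr.strip()}")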
    else:
        print(f"❌ BonsaiTree directory not found at {bonsai_dir}")
else:
    print("❌ Cannot verify BonsaiTree: Utils directory path not set in environment.")

## Configure R Library Path

If any of the previous verification steps failed, you may need to install or reinstall some tools. First, let's set up the R library path to allow for user-specific package installation.

In [None]:
# Create a personal R library directory and configure R to use it
!mkdir -p ~/R/library/
!grep -qxF '.libPaths(c("~/R/library/", .libPaths()))' ~/.Rprofile || echo '.libPaths(c("~/R/library/", .libPaths()))' >> ~/.Rprofile
print("Contents of ~/.Rprofile:")
!cat ~/.Rprofile
# You should see: .libPaths(c("~/R/library/", .libPaths()))
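
The `grep -qxF ... || echo ... >>` line above appends the `.libPaths` setting only if that exact line is not already present, so the cell can be re-run safely. The same idea can be written in Python. This is a minimal sketch: `ensure_line` is a hypothetical helper, and the demonstration writes to a temporary file standing in for `~/.Rprofile`.

```python
import tempfile
from pathlib import Path


def ensure_line(file_path, line):
    """Append `line` to the file unless an identical line is already present.

    Returns True if the line was added, False if it was already there.
    """
    path = Path(file_path)
    text = path.read_text() if path.exists() else ""
    if line in text.splitlines():
        return False
    # Start on a fresh line if the file does not end with a newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with path.open("a") as handle:
        handle.write(prefix + line + "\n")
    return True


R_LIBPATHS_LINE = '.libPaths(c("~/R/library/", .libPaths()))'

with tempfile.TemporaryDirectory() as tmp:
    profile = Path(tmp) / ".Rprofile"  # stand-in for ~/.Rprofile
    print(ensure_line(profile, R_LIBPATHS_LINE))  # first call adds the line
    print(ensure_line(profile, R_LIBPATHS_LINE))  # second call changes nothing
```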

## Install Missing Tools (If Needed)

The following sections will help you install any tools that were missing during the verification step. If all tools were successfully verified, you can skip these sections.

### Install LiftOver (If Missing)

In [None]:
# Install UCSC LiftOver tool and hg19->hg38 chain file
utils_dir = os.getenv('PROJECT_UTILS_DIR')
refs_dir = os.getenv('PROJECT_REFERENCES_DIR')

if utils_dir and refs_dir:
    print("--- Installing LiftOver ---")
    liftover_bin = Path(utils_dir) / "liftOver"
    chain_file = Path(refs_dir) / "hg19ToHg38.over.chain.gz"

    if not liftover_bin.exists():
        print("Downloading LiftOver binary...")
        !wget -nv http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/liftOver -O "{liftover_bin}"
        !chmod +x "{liftover_bin}"
    else:
        print(f"LiftOver binary already exists: {liftover_bin}")

    if not chain_file.exists():
        print("Downloading hg19ToHg38 chain file...")
        !wget -nv http://hgdownload.cse.ucsc.edu/goldenPath/hg19/liftOver/hg19ToHg38.over.chain.gz -P "{refs_dir}"
    else:
        print(f"Chain file already exists: {chain_file}")

    # Verify executable
    if liftover_bin.exists() and os.access(liftover_bin, os.X_OK):
        print("✅ LiftOver installed successfully.")
    else:
        print("❌ LiftOver installation verification failed.")
else:
    print("❌ Cannot install LiftOver: Utils or References directory path not set in environment.")
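
The existence checks above cannot tell a complete download from an empty or truncated one. For the gzip-compressed chain file, one inexpensive extra check is to look for the two-byte gzip signature at the start of the file. The sketch below illustrates the idea; `looks_like_gzip` is a hypothetical helper, and the demonstration uses placeholder files rather than a real chain file.

```python
import gzip
import tempfile
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_like_gzip(path):
    """Return True if the file exists and starts with the gzip magic bytes."""
    file_path = Path(path)
    if not file_path.is_file():
        return False
    with file_path.open("rb") as handle:
        return handle.read(2) == GZIP_MAGIC


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "example.gz"
    with gzip.open(good, "wb") as handle:
        handle.write(b"placeholder content\n")
    empty = Path(tmp) / "empty.gz"
    empty.touch()  # simulates a failed download that left an empty file
    print(looks_like_gzip(good), looks_like_gzip(empty))
```

This confirms only the file signature, not that the archive is complete or that its contents are correct.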

### Install JAR Files (If Missing)

In [None]:
# Download Java tools (if missing)
utils_dir = os.getenv('PROJECT_UTILS_DIR')

if utils_dir:
    print("--- Installing JAR Files ---")
    # Define JARs and URLs (versions match those in Dockerfile)
    beagle_jar = "beagle.27Feb25.75f.jar"
    hap_ibd_jar = "hap-ibd.jar"
    refined_ibd_jar = "refined-ibd.17Jan20.102.jar"
    refined_ibd_merge_jar = "merge-ibd-segments.17Jan20.102.jar"

    beagle_url = f"https://faculty.washington.edu/browning/beagle/{beagle_jar}"
    hap_ibd_url = f"https://faculty.washington.edu/browning/{hap_ibd_jar}"
    refined_ibd_url = f"https://faculty.washington.edu/browning/refined-ibd/{refined_ibd_jar}"
    refined_ibd_merge_url = f"https://faculty.washington.edu/browning/refined-ibd/{refined_ibd_merge_jar}"

    jar_files = {
        beagle_jar: beagle_url,
        hap_ibd_jar: hap_ibd_url,
        refined_ibd_jar: refined_ibd_url,
        refined_ibd_merge_jar: refined_ibd_merge_url
    }

    print("Downloading JAR files (if missing)...")
    for jar, url in jar_files.items():
        target_path = Path(utils_dir) / jar
        if not target_path.exists():
            logging.info(f"Downloading {jar} from {url}...")
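            # Use !wget for download progress in notebook
            !wget --progress=dot:giga -O "{target_path}" "{url}"
            # wget -O can leave an empty file behind when a download fails, so check the size too
            if target_path.exists() and target_path.stat().st_size > 0: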
                logging.info(f"Successfully downloaded {jar}")
                # chmod +x might not be strictly necessary for jars but doesn't hurt
                !chmod +x "{target_path}"
            else:
                logging.error(f"Failed to download {jar}")
        else:
            logging.info(f"✅ File already exists: {target_path}")

    # Verify Beagle (example)
    beagle_target_path = Path(utils_dir) / beagle_jar
    if beagle_target_path.exists():
         print("\nTesting Beagle JAR execution...")
         # Running java -jar might produce non-zero exit code if no args are given, which is ok
         !java -Xmx1g -jar "{beagle_target_path}" || echo "Beagle test command finished (ignore non-zero exit code if help message shown)."
    else:
         print(f"❌ Beagle JAR not found at {beagle_target_path}")

else:
    print("❌ Cannot install JARs: Utils directory path not set in environment.")
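
A JAR file is a zip archive, so a downloaded JAR can be sanity-checked without starting Java. The sketch below uses the standard `zipfile` module. `jar_is_readable` is a hypothetical helper, and it assumes the JAR carries a `META-INF/MANIFEST.MF` entry, which is normally the case for JARs launched with `java -jar`. The demonstration builds a tiny stand-in archive instead of using a real tool.

```python
import tempfile
import zipfile
from pathlib import Path


def jar_is_readable(jar_path):
    """Return True if the file is a zip archive that contains a manifest."""
    if not zipfile.is_zipfile(jar_path):
        return False
    with zipfile.ZipFile(jar_path) as archive:
        return "META-INF/MANIFEST.MF" in archive.namelist()


with tempfile.TemporaryDirectory() as tmp:
    fake_jar = Path(tmp) / "example.jar"
    with zipfile.ZipFile(fake_jar, "w") as archive:
        archive.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
    truncated = Path(tmp) / "truncated.jar"
    truncated.write_bytes(b"")  # simulates an empty, failed download
    print(jar_is_readable(fake_jar), jar_is_readable(truncated))
```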

### Install BonsaiTree (If Missing)

In [None]:
# Install BonsaiTree (if missing)
utils_dir = os.getenv('PROJECT_UTILS_DIR')
project_dir = os.getenv('PROJECT_WORKING_DIR')

if utils_dir and project_dir:
    bonsai_dir = Path(utils_dir) / "bonsaitree"
    bonsai_repo = "https://github.com/23andMe/bonsaitree.git"
    original_dir = Path.cwd()

    print("--- Setting up BonsaiTree ---")
    if not bonsai_dir.is_dir():
        print(f"⬇️ Cloning BonsaiTree into {bonsai_dir}...")
        !git clone "{bonsai_repo}" "{bonsai_dir}"
        # Change ownership after cloning (important!)
        # Assumes user owns the directory where the repo was cloned (e.g., home dir)
        !chown -R $(whoami):$(whoami) "{bonsai_dir}"
    else:
        print(f"✅ BonsaiTree directory already exists: {bonsai_dir}")

    if bonsai_dir.is_dir():
        print(f"🛠️ Installing BonsaiTree package in {bonsai_dir}...")
        try:
            os.chdir(bonsai_dir)
            print(f"Changed CWD to: {Path.cwd()}")

            # Ensure poetry is aware of the correct python interpreter
            main_venv_path = Path(project_dir) / ".venv" / "bin" / "python"
            if main_venv_path.exists():
                 print(f"Using Python interpreter: {main_venv_path}")
                 !poetry env use "{main_venv_path}"
            else:
                 # Should not happen if README steps were followed, but fallback
                 print("Warning: Main project venv python not found, falling back to python3.12")
                 !poetry env use python3.12

            # Explicitly add dependencies via Poetry (matching Dockerfile)
            print("Adding all BonsaiTree dependencies via Poetry...")
            !poetry add Cython funcy numpy scipy six setuptools-scm wheel pandas frozendict

            # Build and install the package itself using pip --no-deps
            print("Building and installing BonsaiTree package itself (dependencies handled by Poetry)...")
            !poetry run pip install . --no-deps

            # Verify installation
            print("Verifying BonsaiTree import...")
            verify_code = "import sys; " 
            # No need to add '.' to sys.path if installed correctly via pip
            verify_code += "from bonsaitree.v3.bonsai import build_pedigree; "
            verify_code += "print('✅ BonsaiTree module imported successfully.')"
            !poetry run python -c "{verify_code}"

        except Exception as e:
            print(f"❌ An error occurred during BonsaiTree setup: {e}")
        finally:
            # IMPORTANT: Change back to the original directory
            os.chdir(original_dir)
            print(f"Returned CWD to: {Path.cwd()}")
    else:
        # This case implies git clone failed
        print(f"❌ BonsaiTree directory does not exist after attempting clone: {bonsai_dir}")
else:
    print("❌ Cannot install BonsaiTree: Utils or Working directory path not set in environment.")
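
To check from inside the running kernel whether a package can be found, without actually importing it, `importlib.util.find_spec` can be used. This is a sketch under one assumption: the kernel uses the same interpreter into which the package was installed. The cell above uses `poetry run` inside the BonsaiTree directory, so the two checks are not necessarily equivalent. `module_available` is a hypothetical helper name.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module or package can be located for import."""
    return importlib.util.find_spec(name) is not None


# "json" ships with Python; "bonsaitree" is found only if it is installed
# in the interpreter that runs this code.
for candidate in ["json", "bonsaitree"]:
    print(candidate, "->", module_available(candidate))
```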

### Install IBIS, Ped-Sim, RFMix2 (If Missing)

In [None]:
# Install tools that require compilation (if missing)
import subprocess  # used by both the clone step and the build step below

utils_dir = os.getenv('PROJECT_UTILS_DIR')
project_dir = os.getenv('PROJECT_WORKING_DIR')

if utils_dir and project_dir:
    original_dir = Path.cwd() # Store current directory

    # --- Tool Definitions ---
    tools = {
        'IBIS': {
            'repo': "https://github.com/williamslab/ibis.git",
            'dir': Path(utils_dir) / "ibis",
            'build_cmd': "make",
            'executable': "ibis",
            'clone_opts': "--recurse-submodules"
        },
        'Ped-Sim': {
            'repo': "https://github.com/williamslab/ped-sim.git",
            'dir': Path(utils_dir) / "ped-sim",
            'build_cmd': "make && chmod +x ped-sim",
            'executable': "ped-sim",
            'clone_opts': "--recurse-submodules"
        },
        'RFMix2': {
            'repo': "https://github.com/slowkoni/rfmix.git",
            'dir': Path(utils_dir) / "rfmix2",
            'build_cmd': "aclocal && autoheader && autoconf && automake --add-missing && ./configure && make",
            'executable': "rfmix",
            'clone_opts': ""
        }
    }

    # --- Installation Loop ---
    all_successful = True
    for name, config in tools.items():
        print(f"\n--- Setting up {name} ---")
        tool_dir = config['dir']
        repo_url = config['repo']
        build_cmd = config['build_cmd']
        executable = tool_dir / config['executable']
        clone_opts = config['clone_opts']

        # Clone if directory doesn't exist
        if not tool_dir.is_dir():
            print(f"⬇️ Cloning {name} repository...")
            # Build the clone command as a single shell string
            clone_command = f"git clone {clone_opts} \"{repo_url}\" \"{tool_dir}\""

            # Run the clone with subprocess and capture its output for error reporting
            import subprocess
            try:
                clone_process = subprocess.run(clone_command, shell=True, capture_output=True, text=True)
                clone_result = clone_process.stdout.splitlines() + clone_process.stderr.splitlines()
            except Exception as e:
                clone_result = [str(e)]
            
            if not tool_dir.is_dir(): # Check if clone succeeded
                print(f"❌ Git clone failed for {name}. Output:")
                for line in clone_result:
                    print(line)
                all_successful = False
                continue # Skip to next tool
            # Ensure user owns the directory
            subprocess.run(f"chown -R $(whoami):$(whoami) {tool_dir}", shell=True)
        else:
            print(f"✅ {name} directory already exists: {tool_dir}")
            # Optional: Add update logic here (e.g., git pull)

        # Build if executable doesn't exist or isn't executable
        if tool_dir.is_dir():
            if not executable.exists() or not os.access(executable, os.X_OK):
                print(f"🛠️ Building {name} in {tool_dir}...")
                current_cwd = Path.cwd() # Remember where we are before changing
                try:
                    os.chdir(tool_dir) # Go into the tool directory
                    print(f"Changed CWD to: {Path.cwd()}")
                    # Execute the build command using bash -c
                    build_script = f"""
                    set -e
                    echo 'Running build command for {name}...'
                    {build_cmd}
                    echo 'Build command finished for {name}.'
                    """
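                    # Pass the script as a list element; wrapping it in single quotes inside a
                    # shell string would break on the quotes used by the echo lines above.
                    build_process = subprocess.run(["bash", "-c", build_script], capture_output=True, text=True)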
                    build_process_result = build_process.stdout.splitlines() + build_process.stderr.splitlines()
                    # print('\n'.join(build_process_result)) # Optional: Show build output

                    # Check again if executable exists after build
                    if executable.exists() and os.access(executable, os.X_OK):
                        print(f"✅ {name} build successful.")
                    else:
                        print(f"❌ {name} build failed (executable {executable.name} not found or not executable after build attempt).")
                        # Print output if build failed
                        print("Build Output/Error:")
                        print('\n'.join(build_process_result))
                        all_successful = False

                except Exception as e:
                    print(f"❌ An error occurred during {name} build: {e}")
                    all_successful = False
                finally:
                    os.chdir(current_cwd) # Go back to where we were before cd'ing into tool dir
                    print(f"Returned CWD to: {Path.cwd()}")
            else:
                 print(f"✅ {name} executable already exists and is executable: {executable}")
        else:
             # This case should not be reached if clone check works
             print(f"❌ Cannot build {name}, directory does not exist: {tool_dir}")
             all_successful = False

    if not all_successful:
         print("\n⚠️ One or more tools failed to build. Please review the logs above.")

else:
    print("❌ Cannot install source tools: Utils or Working directory path not set in environment.")
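
An alternative way to run a build step is to pass the script to `bash -c` as a list element and let `subprocess.run` set the working directory through `cwd`. Quotes inside the script then need no extra escaping, and the notebook's own working directory is never changed. The helper below is a minimal sketch: `run_in_dir` is a hypothetical name, and the demonstration runs a harmless `echo` in a temporary directory rather than a real build.

```python
import subprocess
import tempfile


def run_in_dir(script, directory):
    """Run a shell script string with bash inside `directory`.

    `set -e` makes the script stop at the first failing command.
    Returns (return code, stdout, stderr).
    """
    completed = subprocess.run(
        ["bash", "-c", f"set -e\n{script}"],
        cwd=directory,
        capture_output=True,
        text=True,
    )
    return completed.returncode, completed.stdout, completed.stderr


with tempfile.TemporaryDirectory() as tmp:
    code, out, err = run_in_dir("echo 'build step would run here'", tmp)
    print(code, out.strip())
```

A non-zero return code signals that some command in the script failed, which is a more direct test than checking afterwards whether the executable exists.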

### Install PLINK2 Binary (If Missing)

In [None]:
# Install PLINK2 binary (if missing)
utils_dir = os.getenv('PROJECT_UTILS_DIR')

if utils_dir:
    print("--- Setting up PLINK2 ---")
    plink2_zip_url = "https://s3.amazonaws.com/plink2-assets/alpha6/plink2_linux_x86_64_20241206.zip"
    plink2_zip_file = Path(utils_dir) / "plink2_linux_x86_64_20241206.zip"
    plink2_binary = Path(utils_dir) / "plink2"

    if not plink2_binary.exists():
        print(f"⬇️ Downloading PLINK2 zip from {plink2_zip_url}...")
        !wget --progress=dot:giga "{plink2_zip_url}" -O "{plink2_zip_file}"

        if plink2_zip_file.exists():
            print("📦 Unzipping PLINK2...")
            # Use -o to overwrite if necessary, ensure it extracts directly to utils_dir
            !unzip -o "{plink2_zip_file}" plink2 -d "{utils_dir}"

            # Check if unzip created the binary directly
            if plink2_binary.exists():
                 print("✅ PLINK2 unzipped.")
                 !chmod +x "{plink2_binary}"
                 !rm "{plink2_zip_file}" # Clean up zip
            else:
                 print(f"❌ Failed to find plink2 binary directly in {utils_dir} after unzipping.")
                 # Check common issue: zip might contain parent dir; plink2 binary usually top-level in zip
                 # If unzip command above failed, this check won't help much. Check unzip output.
                 print("Please check unzip output and manually place 'plink2' in the utils directory if needed.")
        else:
            print("❌ PLINK2 zip download failed.")
    else:
        print(f"✅ PLINK2 binary already exists: {plink2_binary}")

    # Verify PLINK2
    if plink2_binary.exists() and os.access(plink2_binary, os.X_OK):
        print("\n🔍 Verifying PLINK2 version:")
        !"{plink2_binary}" --version
    else:
        print("❌ PLINK2 installation verification failed.")
else:
    print("❌ Cannot install PLINK2: Utils directory path not set in environment.")

## Copy Initial Data

The project includes some initial data (`class_data`). This step copies it from the repository's default location into the `data` directory specified in your `~/.env` file, if they are different and the source exists. This replicates logic from the `scripts_env/directory_setup.py` script.

In [None]:
import os
import shutil
from pathlib import Path
import logging # Use logging already configured

logging.info("--- Checking and copying initial data ---")

# Get paths from environment (loaded from ~/.env)
working_dir = os.getenv('PROJECT_WORKING_DIR')
user_data_dir = os.getenv('PROJECT_DATA_DIR')

if not working_dir or not user_data_dir:
    logging.error("PROJECT_WORKING_DIR or PROJECT_DATA_DIR not found in environment. Cannot copy data.")
else:
    # Define the potential source location within the cloned repository
    repo_data_source_dir = Path(working_dir) / "data" / "class_data"

    # Define the target location within the user-configured data directory
    target_data_dest_dir = Path(user_data_dir) / "class_data"

    logging.info(f"Source data location (repo): {repo_data_source_dir}")
    logging.info(f"Target data location (user): {target_data_dest_dir}")

    # Check if source exists
    if repo_data_source_dir.is_dir():
        # Use resolved paths for comparison to handle symlinks etc.
        if repo_data_source_dir.resolve() != target_data_dest_dir.resolve():
            logging.info(f"Source and target differ. Preparing to copy data...")
            try:
                # Ensure parent of target exists
                target_data_dest_dir.parent.mkdir(parents=True, exist_ok=True)
                logging.info(f"Copying tree from {repo_data_source_dir} to {target_data_dest_dir}")
                # Copy tree, allowing overwrite of existing target directory content
                shutil.copytree(repo_data_source_dir, target_data_dest_dir, dirs_exist_ok=True)
                logging.info(f"✅ Data successfully copied.")
            except Exception as e:
                logging.error(f"Error copying data: {e}")
        else:
            logging.info("Source and target data directories are the same. Skipping copy.")
    else:
        logging.warning(f"Source data directory {repo_data_source_dir} not found. Cannot copy.")
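
After a copy like the one above, it can be reassuring to confirm that every source file reached the target unchanged. The sketch below walks the source tree and compares each file byte for byte with `filecmp.cmp`. `missing_or_different` is a hypothetical helper, and the demonstration uses temporary directories instead of the real `class_data` folder.

```python
import filecmp
import shutil
import tempfile
from pathlib import Path


def missing_or_different(source, target):
    """List paths (relative to `source`) that are absent from `target` or differ in content."""
    source, target = Path(source), Path(target)
    problems = []
    for src_file in sorted(p for p in source.rglob("*") if p.is_file()):
        relative = src_file.relative_to(source)
        dst_file = target / relative
        if not dst_file.is_file() or not filecmp.cmp(src_file, dst_file, shallow=False):
            problems.append(str(relative))
    return problems


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    (src / "sub").mkdir(parents=True)
    (src / "sub" / "example.txt").write_text("placeholder")
    dst = Path(tmp) / "dst"
    shutil.copytree(src, dst, dirs_exist_ok=True)
    print(missing_or_different(src, dst))
```

An empty list means every source file has an identical counterpart in the target.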

## Exporting a Jupyter Notebook to PDF with Poetry

This guide explains how to **export a Jupyter Notebook (`.ipynb`) to a PDF** using Poetry's virtual environment. This process requires a LaTeX installation (`texlive` packages were installed in the initial setup).

### **Running the Conversion in the Terminal (Recommended)**
Running the conversion from your Ubuntu terminal is the most reliable method.

1.  **Navigate to your project's base directory:**
    ```bash
    # If you cloned to your home directory:
    cd ~/computational_genetic_genealogy
    # Or navigate to the correct PROJECT_BASE_DIR you used during setup
    ```

2.  **Run the conversion command:**
    Use `poetry run` to execute the `jupyter nbconvert` command within the project's Python environment.

    *   **Example (Save PDF in the same directory as the notebook):**
        To convert this notebook (`Lab0_Code_Environment.ipynb`), which is located in the `instructions/` subdirectory:
        ```bash
        poetry run jupyter nbconvert --to pdf instructions/Lab0_Code_Environment.ipynb
        ```
        *(The PDF will appear inside the `instructions/` folder.)*

    *   **Example (Save PDF to a specific directory, e.g., `results`):**
        Use the `--output-dir` option. You need to provide the *actual path* to your results directory.
        ```bash
        # Replace '/path/to/your/computational_genetic_genealogy/results' with the real path
        # This path should match the PROJECT_RESULTS_DIR value in your ~/.env file
        poetry run jupyter nbconvert --to pdf instructions/Lab0_Code_Environment.ipynb --output-dir='/path/to/your/computational_genetic_genealogy/results'

        # Example if your project is in the home directory:
        # poetry run jupyter nbconvert --to pdf instructions/Lab0_Code_Environment.ipynb --output-dir="$HOME/computational_genetic_genealogy/results"
        ```

---

### **Running Conversion Inside This Jupyter Notebook (Alternative)**
You can *attempt* to run the conversion from a code cell below, but be aware of potential issues:
*   **Rendering:** Complex outputs or interactive elements might not render correctly in the PDF.
*   **Environment:** Accessing the correct paths (like the results directory) requires using Python variables (e.g., `os.getenv`).

The terminal method is generally preferred for stability.
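
If you script the conversion from Python, building the command as an argument list avoids quoting problems with paths that contain spaces. The following is a sketch: `build_nbconvert_command` is a hypothetical helper that only assembles the same `poetry run jupyter nbconvert` call shown in the terminal examples and does not run it.

```python
def build_nbconvert_command(notebook_path, output_dir=None):
    """Build the argument list for converting a notebook to PDF via Poetry."""
    command = ["poetry", "run", "jupyter", "nbconvert", "--to", "pdf", str(notebook_path)]
    if output_dir:
        command.append(f"--output-dir={output_dir}")
    return command


print(build_nbconvert_command("instructions/Lab0_Code_Environment.ipynb"))
```

The resulting list can be handed to `subprocess.run(command, check=True)` from the project's base directory.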

### Convert this Notebook to PDF (Optional)

Run the following cell in this notebook to attempt the conversion. The PDF will be saved in the same directory as this notebook (`instructions/`). If you are running Ubuntu under WSL, you can reach it from Windows File Explorer by navigating to `\\wsl$\Ubuntu-24.04` (or your WSL distribution name) and then browsing to the full path of the `instructions` directory within your project.

In [None]:
# Attempt to convert this notebook to PDF using the Poetry environment
# Note: The notebook's CWD should be the project's base directory now.
print(f"Current directory for conversion: {os.getcwd()}")
notebook_path = "instructions/Lab0_Code_Environment.ipynb"

if Path(notebook_path).exists():
    print(f"Converting {notebook_path} to PDF...")
    !poetry run jupyter nbconvert --to pdf "{notebook_path}"
else:
    print(f"Error: Notebook not found at relative path {notebook_path} from CWD {os.getcwd()}")

# Example: Convert and save to results directory using Python variable
# results_dir = os.getenv('PROJECT_RESULTS_DIR')
# if results_dir and Path(notebook_path).exists():
#    print(f"\nConverting {notebook_path} and saving PDF to {results_dir}...")
#    !poetry run jupyter nbconvert --to pdf "{notebook_path}" --output-dir="{results_dir}"
# else:
#    print("Could not save to results directory (path not set or notebook not found).")

# **🚀 Start Your Labs in the Fully Configured Environment!**

---

## ✅ **Setup Completed! Your Environment is Ready.**
Your system has been successfully configured based on the project's Docker environment specification:

- 🖥️ **OS:** Ubuntu 24.04
- 🐍 **Python:** 3.12 managed by Poetry (`~/.local/bin/poetry`) in `./.venv`
- ☕ **Java:** OpenJDK 21 (`$JAVA_HOME` configured)
- 🧬 **Genomics Tools:**
  - Samtools/BCFtools/Tabix v1.18 (Built from source)
  - Beagle, HapIBD, RefinedIBD (JARs in `utils`)
  - BonsaiTree (Installed via pip in Poetry env, source in `utils`)
  - LiftOver (Binary in `utils`, Chain file in `references`)
  - IBIS, Ped-Sim, RFMix2 (Built from source in `utils`)
  - PLINK2 (Binary in `utils`)
- 📁 **Directories:** Project structure created (`data`, `results`, `references`, `utils` within `PROJECT_BASE_DIR`).
- ⚙️ **Configuration:** Environment variables loaded from `~/.env`.
- ➕ **PATH:** Includes `~/.local/bin` and `${PROJECT_BASE_DIR}/utils`.
- 🚀 **Notebook:** Ready to run analyses with the correct interpreter selected.
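
If you want to spot-check the summary above, a short script along the following lines can help. It is only a sketch: the helper names `find_tools` and `missing_directories` are hypothetical, and the executable names (for example `plink2`) are assumptions that may differ from the binaries in your `utils` directory.

```python
import os
import shutil
from pathlib import Path


def find_tools(names):
    """Map each executable name to its full path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in names}


def missing_directories(variable_names):
    """Return the variable names that are unset or do not point to a directory."""
    missing = []
    for name in variable_names:
        value = os.getenv(name)
        if not value or not Path(value).is_dir():
            missing.append(name)
    return missing


# Executable names are assumptions; adjust them to match your installation.
for tool, location in find_tools(["samtools", "bcftools", "tabix", "java", "plink2"]).items():
    print(f"{tool:10s} {location or 'NOT FOUND'}")

print("JAVA_HOME:", os.getenv("JAVA_HOME", "not set"))
# This check only works if the variables from ~/.env are loaded in this process.
print("Unset or missing directories:", missing_directories(["PROJECT_RESULTS_DIR"]))
```

A `NOT FOUND` entry usually means the tool's directory is missing from `PATH` in the shell that launched the notebook, not necessarily that the tool was never installed.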

---

✅ **Your local Ubuntu environment is fully set up. Get started on your next lab now!** 🚀

## Introduction to Code Environments

The remainder of this notebook contains educational content about code environments in general. This information is not required for setup but provides valuable context for understanding the environment you've just configured.

### What is a Code Environment?

A **code environment** is the structured setup in which software applications run, including dependencies, libraries, and configurations necessary for execution. Code environments ensure that software behaves consistently across different machines and operating systems, preventing issues related to dependency conflicts and system inconsistencies.

### Why Do Code Environments Matter?

- **Reproducibility** – Ensures that code runs the same way across different systems, facilitating collaboration and research reproducibility.
- **Dependency Management** – Prevents conflicts between different software packages by isolating dependencies.
- **System Stability** – Protects the main operating system from unnecessary installations and modifications.
- **Scalability** – Makes it easier to scale applications across multiple machines, cloud environments, or containerized deployments.
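
To make the idea concrete, the following sketch reports which interpreter and environment the current Python process is using. The function name `describe_environment` is hypothetical; the check itself relies on the standard behavior that `sys.prefix` and `sys.base_prefix` differ inside a virtual environment created with `venv` or a recent `virtualenv`.

```python
import sys


def describe_environment():
    """Summarize which Python interpreter and environment are active."""
    return {
        "executable": sys.executable,
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        # Inside a virtual environment, sys.prefix points at the environment
        # while sys.base_prefix points at the interpreter it was created from.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "prefix": sys.prefix,
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Run in a notebook that uses the project interpreter, `prefix` would be expected to point at the project's `.venv` directory.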

## Virtual Environments with Poetry

Our class is using **Poetry**, a modern dependency management tool for Python that simplifies package installation, versioning, and virtual environment creation. Poetry offers an elegant solution by combining dependency management and virtual environment creation into a single workflow.

### Why Use Poetry?

- **Automated Virtual Environments** – Poetry automatically creates and manages virtual environments for projects.
- **Simplified Dependency Management** – Uses a `pyproject.toml` file instead of a `requirements.txt`, making package tracking more structured.
- **Reproducibility** – The `poetry.lock` file ensures that everyone working on the project installs the exact same package versions.
- **Seamless Package Publishing** – Poetry simplifies the process of building and publishing Python packages.

### Key Poetry Commands

| Command | Description |
|---------|-------------|
| `poetry new my_project` | Creates a new Poetry project with a `pyproject.toml` file |
| `poetry install` | Installs dependencies and sets up the virtual environment |
| `poetry add <package>` | Adds a new package to the project |
| `poetry remove <package>` | Removes a package from the project |
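| `poetry shell` | Activates the project's virtual environment (Poetry 1.x; in Poetry 2.x this command requires the shell plugin, and `poetry env activate` is the built-in alternative) |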
| `poetry run <command>` | Runs a command inside the virtual environment |
| `poetry lock` | Locks the dependencies to exact versions for consistency |

## Docker: Containerized Code Environments

While Poetry manages dependencies within a Python project, **Docker** takes a broader approach by packaging an entire system environment, including the operating system's user space, system libraries, and installed tools, into a container. A virtual environment isolates only Python packages; a container also fixes everything beneath them, which makes it well suited to deploying applications across different systems.

### Key Features of Docker:
- **Portability** – Containers run the same way on any system with Docker installed and a compatible CPU architecture.
- **Isolation** – Each container runs independently, preventing dependency conflicts.
- **Scalability** – Facilitates cloud-based and microservices architectures.

### When to Use Poetry vs. Docker:

| Feature            | Poetry (Virtual Environment) | Docker (Containerization) |
|--------------------|---------------------------|--------------------------| 
| **Scope**         | Manages Python dependencies within a project | Encapsulates the entire OS and software stack |
| **Reproducibility** | Ensures consistent package versions | Provides full OS-level consistency |
| **Portability**   | Works across Python projects on the same system | Runs across different machines and cloud platforms |
| **Resource Usage** | Lightweight | Heavier, since each image carries its own system libraries and tools |
| **Best Use Case** | Managing dependencies for Python projects | Deploying applications in diverse environments |

## Responsibility for the Code Environment

I will maintain the **Docker environment** so that the focus of this class can stay on **running and understanding the genomic analysis code itself**. The Docker image provides a controlled and reproducible environment in which all necessary dependencies are pre-installed and configured correctly. By using Docker, you avoid most compatibility issues and can focus on the **analysis and interpretation of genomic data**.

A well-structured code environment is essential for ensuring software **stability**, **reproducibility**, and **efficiency**. In our setup:
- **Poetry** simplifies dependency and virtual environment management, ensuring consistency across Python projects.
- **Docker** provides an isolated, reproducible system environment, making it ideal for deploying applications across different machines.

Understanding when to use each tool helps streamline development workflows. **Poetry** is best suited for managing dependencies within a Python project, while **Docker** ensures complete system encapsulation for broader deployment and portability needs. By combining both tools effectively, we create an environment that supports seamless collaboration, minimal dependency conflicts, and efficient software deployment.