# Lab 0: Code Environment

# Lab 0: Code Environment

## Introduction

This notebook guides you through the final steps of setting up your local Ubuntu 24.04 environment for the Computational Genetic Genealogy course. It focuses on installing specific bioinformatics tools and verifying the configuration after you have completed the initial setup described in the `README.md` file.

## Prerequisites Check

Before proceeding with this notebook, ensure you have completed the **initial setup steps outlined in the `README.md` file** under the "Ubuntu 24.04 Local Setup" section. This includes:

1.  ✅ Cloning the `computational_genetic_genealogy` repository (referred to as `PROJECT_BASE_DIR`).
2.  ✅ Manually creating the subdirectories (`data`, `results`, `references`, `utils`) within `PROJECT_BASE_DIR`.
3.  ✅ Manually creating the `~/.env` file (in your home directory) with the correct absolute paths pointing to `PROJECT_BASE_DIR` and its subdirectories.
4.  ✅ Installing all required system packages via `sudo apt install ...` (including Java 21, Python 3.12, build tools, R, TeX, etc.).
5.  ✅ **Building and installing HTSlib, Samtools, and BCFtools version 1.18 from source** using the commands provided in the README.
6.  ✅ Installing `poetry` via `pipx` and ensuring `~/.local/bin` is in your PATH.
7.  ✅ Running `poetry install --no-root` within the `PROJECT_BASE_DIR` directory to install Python dependencies into `./.venv`.
8.  ✅ Manually adding the `${PROJECT_BASE_DIR}/utils` directory to your `~/.bashrc` PATH and sourcing the file (e.g., `source ~/.bashrc`).
9.  ✅ Configuring the `JAVA_HOME` environment variable in `~/.bashrc`.

**Important Note:** This notebook assumes the `.env` file is located in your **home directory (`~/.env`)**, aligning with the Docker environment setup, and therefore **does not run** the `scripts_env/directory_setup.py` script.

## Selecting Python Interpreter in VS Code

Your Jupyter Notebook needs to know which Python environment (including installed packages) to use. Select the interpreter created by Poetry in the previous steps:

1.  Open the Command Palette in VS Code (Ctrl+Shift+P or Cmd+Shift+P).
2.  Type and select `Python: Select Interpreter`.
3.  Choose the option that points to the `.venv` directory within your project folder (`computational_genetic_genealogy`). It might look like `Python 3.12 ('computational_genetic_genealogy/.venv': Poetry)` or similar.

If you don't see the correct interpreter, ensure you successfully ran `poetry install --no-root` in the terminal within your `PROJECT_BASE_DIR`.

## Load Environment Variables and Verify Setup

First, we load necessary libraries and the directory paths defined in your `~/.env` file.

In [None]:
import os
import sys
import shutil
from pathlib import Path
from dotenv import load_dotenv
import logging

# Configure basic logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

def load_env_file():
    """Load the .env file from the user's home directory."""
    env_path = Path.home() / '.env'
    logging.info(f"Attempting to load environment variables from: {env_path}")
    if not env_path.exists():
        logging.error(f"'.env' file not found in home directory ({Path.home()}).")
        logging.error("Please ensure you have created it as per the README instructions.")
        raise FileNotFoundError(f"'.env' file not found at {env_path}")

    try:
        # Load the .env file, overriding existing environment variables
        load_dotenv(env_path, override=True)
        logging.info(f"Successfully loaded environment variables from: {env_path}")
        return env_path
    except Exception as e:
        logging.error(f"Error loading .env file: {e}")
        raise

# --- Load and Set Environment Variables ---
try:
    env_path = load_env_file()

    # Get variables from environment (loaded from .env)
    working_directory = os.getenv('PROJECT_WORKING_DIR')
    data_directory = os.getenv('PROJECT_DATA_DIR')
    references_directory = os.getenv('PROJECT_REFERENCES_DIR')
    results_directory = os.getenv('PROJECT_RESULTS_DIR')
    utils_directory = os.getenv('PROJECT_UTILS_DIR')

    # Verify paths were loaded
    required_paths = {
        'PROJECT_WORKING_DIR': working_directory,
        'PROJECT_DATA_DIR': data_directory,
        'PROJECT_REFERENCES_DIR': references_directory,
        'PROJECT_RESULTS_DIR': results_directory,
        'PROJECT_UTILS_DIR': utils_directory
    }
    missing_vars = [name for name, path in required_paths.items() if not path]
    if missing_vars:
        logging.error(f"Missing environment variables in {env_path}: {', '.join(missing_vars)}")
        raise ValueError(f"Missing required paths in {env_path}. Please check the file.")

    # Set environment variables for shell commands (! or %%) within the notebook
    # Note: Python's os.environ affects subprocesses launched *from* this notebook process.
    os.environ["PROJECT_WORKING_DIR"] = working_directory
    os.environ["PROJECT_DATA_DIR"] = data_directory
    os.environ["PROJECT_REFERENCES_DIR"] = references_directory
    os.environ["PROJECT_RESULTS_DIR"] = results_directory
    os.environ["PROJECT_UTILS_DIR"] = utils_directory

    logging.info(f"Working Directory set to: {working_directory}")
    logging.info(f"Data Directory set to: {data_directory}")
    logging.info(f"References Directory set to: {references_directory}")
    logging.info(f"Results Directory set to: {results_directory}")
    logging.info(f"Utils Directory set to: {utils_directory}")

    # Change Notebook's Current Working Directory (CWD)
    if working_directory and Path(working_directory).is_dir():
        os.chdir(working_directory)
        logging.info(f"Changed notebook CWD to: {os.getcwd()}")
    else:
        logging.warning(f"Working directory '{working_directory}' not found or not a directory. Cannot change CWD.")
        raise NotADirectoryError(f"Working directory path not valid: {working_directory}")

except (FileNotFoundError, ValueError, NotADirectoryError) as e:
    logging.critical(f"Critical setup error: {e}. Please fix the setup and restart the kernel.")
    # Optionally, stop execution more forcefully if needed, though this can be abrupt:
    # raise SystemExit("Stopping execution due to critical setup error.")

## Verify Initial Setup

Let's check the versions of key software installed via the terminal steps in the README.

In [1]:
# Verify Java Installation
print("--- Verifying Java (Should be OpenJDK 21) ---")
!java -version
print("\n--- Verifying JAVA_HOME ---")
# Note: $JAVA_HOME might not be directly visible here if not exported correctly for the notebook kernel
# Running java -version is a more reliable check within the notebook
!echo $JAVA_HOME

# Verify Samtools/BCFtools/Tabix Installation (Should be v1.18 from source)
print("\n--- Verifying Samtools/BCFtools/Tabix (should be v1.18) ---")
!samtools --version | head -n 1
!bcftools --version | head -n 1
print("\n--- Checking Tabix --- ")
!which tabix
!tabix --version

# Verify R Installation
print("\n--- Verifying R ---")
!R --version | head -n 1

--- Verifying Java (Should be OpenJDK 21) ---
openjdk version "21.0.6" 2025-01-21
OpenJDK Runtime Environment (build 21.0.6+7-Ubuntu-124.04.1)
OpenJDK 64-Bit Server VM (build 21.0.6+7-Ubuntu-124.04.1, mixed mode, sharing)

--- Verifying JAVA_HOME ---


--- Verifying Samtools/BCFtools/Tabix (should be v1.18) ---
samtools 1.19.2
bcftools 1.19

--- Checking Tabix --- 
/usr/bin/tabix
tabix (htslib) 1.19
Copyright (C) 2023 Genome Research Ltd.

--- Verifying R ---
R version 4.3.3 (2024-02-29) -- "Angel Food Cake"


## Configure sudo for Notebook (Optional)

This step allows specific `sudo` commands (like `apt` or `rm`) to be run within notebook cells using `!sudo ...` without requiring a password. This can be convenient but modifies system configuration (`/etc/sudoers.d`). **Only run this if you understand the security implications and find it necessary.** You will need to copy and paste the command block into your Ubuntu terminal and enter your password there.

In [None]:
%%bash
# --- DO NOT RUN THIS CELL IN JUPYTER NOTEBOOK ---
# --- COPY AND PASTE THE COMMANDS BELOW INTO YOUR UBUNTU TERMINAL WINDOW ---

# echo "$(whoami) ALL=(ALL) NOPASSWD: /usr/bin/apt-add-repository, /usr/bin/apt, /usr/bin/apt-get, /usr/bin/dpkg, /bin/rm" | sudo tee /etc/sudoers.d/$(whoami)-notebook
# sudo chmod 0440 /etc/sudoers.d/$(whoami)-notebook
# echo "Passwordless sudo configured for specific commands in /etc/sudoers.d/$(whoami)-notebook"

# --- Then you can run commands like !sudo apt update -y in the notebook ---

## Install Remaining Utilities

Now we install the bioinformatics tools that were not covered by the initial system setup.

### Configure R Library Path

In [None]:
# Create a personal R library directory and configure R to use it
!mkdir -p ~/R/library/
!grep -qxF '.libPaths(c("~/R/library/", .libPaths()))' ~/.Rprofile || echo '.libPaths(c("~/R/library/", .libPaths()))' >> ~/.Rprofile
print("Contents of ~/.Rprofile:")
!cat ~/.Rprofile
# You should see: .libPaths(c("~/R/library/", .libPaths()))

### Install LiftOver

In [None]:
# Install UCSC LiftOver tool and hg19->hg38 chain file
utils_dir = os.getenv('PROJECT_UTILS_DIR')
refs_dir = os.getenv('PROJECT_REFERENCES_DIR')

if utils_dir and refs_dir:
    print("--- Installing LiftOver ---")
    liftover_bin = Path(utils_dir) / "liftOver"
    chain_file = Path(refs_dir) / "hg19ToHg38.over.chain.gz"

    if not liftover_bin.exists():
        print("Downloading LiftOver binary...")
        !wget -nv http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/liftOver -O "{liftover_bin}"
        !chmod +x "{liftover_bin}"
    else:
        print(f"LiftOver binary already exists: {liftover_bin}")

    if not chain_file.exists():
        print("Downloading hg19ToHg38 chain file...")
        !wget -nv http://hgdownload.cse.ucsc.edu/goldenPath/hg19/liftOver/hg19ToHg38.over.chain.gz -P "{refs_dir}"
    else:
        print(f"Chain file already exists: {chain_file}")

    # Verify executable
    if liftover_bin.exists() and os.access(liftover_bin, os.X_OK):
        print("✅ LiftOver installed successfully.")
    else:
        print("❌ LiftOver installation verification failed.")
else:
    print("❌ Cannot install LiftOver: Utils or References directory path not set in environment.")

### Install JAR Files (Beagle, HapIBD, RefinedIBD)

In [2]:
# Download Java tools
utils_dir = os.getenv('PROJECT_UTILS_DIR')

if utils_dir:
    print("--- Installing JAR Files ---")
    # Define JARs and URLs based on the target Dockerfile
    beagle_jar = "beagle.27Feb25.75f.jar"
    hap_ibd_jar = "hap-ibd.jar"
    refined_ibd_jar = "refined-ibd.17Jan20.102.jar"
    refined_ibd_merge_jar = "merge-ibd-segments.17Jan20.102.jar"

    beagle_url = f"https://faculty.washington.edu/browning/beagle/{beagle_jar}"
    hap_ibd_url = f"https://faculty.washington.edu/browning/{hap_ibd_jar}"
    refined_ibd_url = f"https://faculty.washington.edu/browning/refined-ibd/{refined_ibd_jar}"
    refined_ibd_merge_url = f"https://faculty.washington.edu/browning/refined-ibd/{refined_ibd_merge_jar}"

    jar_files = {
        beagle_jar: beagle_url,
        hap_ibd_jar: hap_ibd_url,
        refined_ibd_jar: refined_ibd_url,
        refined_ibd_merge_jar: refined_ibd_merge_url
    }

    print("Downloading JAR files (if missing)...")
    for jar, url in jar_files.items():
        target_path = Path(utils_dir) / jar
        if not target_path.exists():
            logging.info(f"Downloading {jar} from {url}...")
            # Use !wget for download progress in notebook
            !wget --progress=dot:giga -O "{target_path}" "{url}"
            if target_path.exists():
                logging.info(f"Successfully downloaded {jar}")
                # chmod +x might not be strictly necessary for jars but doesn't hurt
                !chmod +x "{target_path}"
            else:
                logging.error(f"Failed to download {jar}")
        else:
            logging.info(f"✅ File already exists: {target_path}")

    # Verify Beagle (example)
    beagle_target_path = Path(utils_dir) / beagle_jar
    if beagle_target_path.exists():
         print("\nTesting Beagle JAR execution...")
         # Running java -jar might produce non-zero exit code if no args are given, which is ok
         !java -Xmx1g -jar "{beagle_target_path}" || echo "Beagle test command finished (ignore non-zero exit code if help message shown)."
    else:
         print(f"❌ Beagle JAR not found at {beagle_target_path}")

else:
    print("❌ Cannot install JARs: Utils directory path not set in environment.")

NameError: name 'os' is not defined

### Install BonsaiTree

In [None]:
# Install BonsaiTree
utils_dir = os.getenv('PROJECT_UTILS_DIR')
project_dir = os.getenv('PROJECT_WORKING_DIR')

if utils_dir and project_dir:
    bonsai_dir = Path(utils_dir) / "bonsaitree"
    bonsai_repo = "https://github.com/23andMe/bonsaitree.git"
    original_dir = Path.cwd()

    print("--- Setting up BonsaiTree ---")
    if not bonsai_dir.is_dir():
        print(f"⬇️ Cloning BonsaiTree into {bonsai_dir}...")
        !git clone "{bonsai_repo}" "{bonsai_dir}"
        # Change ownership after cloning (important!)
        # Assumes user owns the directory where the repo was cloned (e.g., home dir)
        !chown -R $(whoami):$(whoami) "{bonsai_dir}"
    else:
        print(f"✅ BonsaiTree directory already exists: {bonsai_dir}")

    if bonsai_dir.is_dir():
        print(f"🛠️ Installing BonsaiTree package in {bonsai_dir}...")
        try:
            os.chdir(bonsai_dir)
            print(f"Changed CWD to: {Path.cwd()}")

            # Ensure poetry is aware of the correct python interpreter
            main_venv_path = Path(project_dir) / ".venv" / "bin" / "python"
            if main_venv_path.exists():
                 print(f"Using Python interpreter: {main_venv_path}")
                 !poetry env use "{main_venv_path}"
            else:
                 # Should not happen if README steps were followed, but fallback
                 print("Warning: Main project venv python not found, falling back to python3.12")
                 !poetry env use python3.12

            # Explicitly add dependencies via Poetry
            print("Adding all BonsaiTree dependencies via Poetry...")
            !poetry add Cython funcy numpy scipy six setuptools-scm wheel pandas frozendict

            # Build and install the package itself using pip --no-deps
            print("Building and installing BonsaiTree package itself (dependencies handled by Poetry)...")
            !poetry run pip install . --no-deps

            # Verify installation
            print("Verifying BonsaiTree import...")
            verify_code = "import sys; "
            # No need to add '.' to sys.path if installed correctly via pip
            verify_code += "from bonsaitree.v3.bonsai import build_pedigree; "
            verify_code += "print('✅ BonsaiTree module imported successfully.')"
            !poetry run python -c "{verify_code}"

        except Exception as e:
            print(f"❌ An error occurred during BonsaiTree setup: {e}")
        finally:
            # IMPORTANT: Change back to the original directory
            os.chdir(original_dir)
            print(f"Returned CWD to: {Path.cwd()}")
    else:
        # This case implies git clone failed
        print(f"❌ BonsaiTree directory does not exist after attempting clone: {bonsai_dir}")
else:
    print("❌ Cannot install BonsaiTree: Utils or Working directory path not set in environment.")

### Install IBIS, Ped-Sim, RFMix2 (Source Build)

In [None]:
# Install tools that require compilation
utils_dir = os.getenv('PROJECT_UTILS_DIR')
project_dir = os.getenv('PROJECT_WORKING_DIR')

if utils_dir and project_dir:
    original_dir = Path.cwd() # Store current directory

    # --- Tool Definitions ---
    tools = {
        'IBIS': {
            'repo': "https://github.com/williamslab/ibis.git",
            'dir': Path(utils_dir) / "ibis",
            'build_cmd': "make",
            'executable': "ibis",
            'clone_opts': "--recurse-submodules"
        },
        'Ped-Sim': {
            'repo': "https://github.com/williamslab/ped-sim.git",
            'dir': Path(utils_dir) / "ped-sim",
            'build_cmd': "make && chmod +x ped-sim",
            'executable': "ped-sim",
            'clone_opts': "--recurse-submodules"
        },
        'RFMix2': {
            'repo': "https://github.com/slowkoni/rfmix.git",
            'dir': Path(utils_dir) / "rfmix2",
            'build_cmd': "aclocal && autoheader && autoconf && automake --add-missing && ./configure && make",
            'executable': "rfmix",
            'clone_opts': ""
        }
    }

    # --- Installation Loop ---
    all_successful = True
    for name, config in tools.items():
        print(f"\n--- Setting up {name} ---")
        tool_dir = config['dir']
        repo_url = config['repo']
        build_cmd = config['build_cmd']
        executable = tool_dir / config['executable']
        clone_opts = config['clone_opts']

        # Clone if directory doesn't exist
        if not tool_dir.is_dir():
            print(f"⬇️ Cloning {name} repository...")
            # Use !git for simplicity in notebook
            clone_command = f"git clone {clone_opts} \"{repo_url}\" \"{tool_dir}\""
            clone_result = !{clone_command}
            if not tool_dir.is_dir(): # Check if clone succeeded
               print(f"❌ Git clone failed for {name}. Output:")
               print('\n'.join(clone_result))
               all_successful = False
               continue # Skip to next tool
            # Ensure user owns the directory
            !chown -R $(whoami):$(whoami) "{tool_dir}"
        else:
            print(f"✅ {name} directory already exists: {tool_dir}")
            # Optional: Add update logic here (e.g., git pull)

        # Build if executable doesn't exist or isn't executable
        if tool_dir.is_dir():
            if not executable.exists() or not os.access(executable, os.X_OK):
                print(f"🛠️ Building {name} in {tool_dir}...")
                current_cwd = Path.cwd() # Remember where we are before changing
                try:
                    os.chdir(tool_dir) # Go into the tool directory
                    print(f"Changed CWD to: {Path.cwd()}")
                    # Execute the build command using bash -c
                    build_script = f"""
                    set -e
                    echo 'Running build command for {name}...'
                    {build_cmd}
                    echo 'Build command finished for {name}.'
                    """
                    # Using subprocess might capture errors better, but !bash is simpler for now
                    build_process_result = !bash -c '{build_script}'
                    # print('\n'.join(build_process_result)) # Optional: Show build output

                    # Check again if executable exists after build
                    if executable.exists() and os.access(executable, os.X_OK):
                        print(f"✅ {name} build successful.")
                    else:
                        print(f"❌ {name} build failed (executable {executable.name} not found or not executable after build attempt).")
                        # Print output if build failed
                        print("Build Output/Error:")
                        print('\n'.join(build_process_result))
                        all_successful = False

                except Exception as e:
                    print(f"❌ An error occurred during {name} build: {e}")
                    all_successful = False
                finally:
                    os.chdir(current_cwd) # Go back to where we were before cd'ing into tool dir
                    print(f"Returned CWD to: {Path.cwd()}")
            else:
                 print(f"✅ {name} executable already exists and is executable: {executable}")
        else:
             # This case should not be reached if clone check works
             print(f"❌ Cannot build {name}, directory does not exist: {tool_dir}")
             all_successful = False

    if not all_successful:
         print("\n⚠️ One or more tools failed to build. Please review the logs above.")

else:
    print("❌ Cannot install source tools: Utils or Working directory path not set in environment.")

### Install PLINK2 Binary

In [None]:
# Install PLINK2 binary
utils_dir = os.getenv('PROJECT_UTILS_DIR')

if utils_dir:
    print("--- Setting up PLINK2 ---")
    plink2_zip_url = "https://s3.amazonaws.com/plink2-assets/alpha6/plink2_linux_x86_64_20241206.zip"
    plink2_zip_file = Path(utils_dir) / "plink2_linux_x86_64_20241206.zip"
    plink2_binary = Path(utils_dir) / "plink2"

    if not plink2_binary.exists():
        print(f"⬇️ Downloading PLINK2 zip from {plink2_zip_url}...")
        !wget --progress=dot:giga "{plink2_zip_url}" -O "{plink2_zip_file}"

        if plink2_zip_file.exists():
            print("📦 Unzipping PLINK2...")
            # Use -o to overwrite if necessary, ensure it extracts directly to utils_dir
            !unzip -o "{plink2_zip_file}" plink2 -d "{utils_dir}"

            # Check if unzip created the binary directly
            if plink2_binary.exists():
                 print("✅ PLINK2 unzipped.")
                 !chmod +x "{plink2_binary}"
                 !rm "{plink2_zip_file}" # Clean up zip
            else:
                 print(f"❌ Failed to find plink2 binary directly in {utils_dir} after unzipping.")
                 # Check common issue: zip might contain parent dir; plink2 binary usually top-level in zip
                 # If unzip command above failed, this check won't help much. Check unzip output.
                 print("Please check unzip output and manually place 'plink2' in the utils directory if needed.")
        else:
            print("❌ PLINK2 zip download failed.")
    else:
        print(f"✅ PLINK2 binary already exists: {plink2_binary}")

    # Verify PLINK2
    if plink2_binary.exists() and os.access(plink2_binary, os.X_OK):
        print("\n🔍 Verifying PLINK2 version:")
        !"{plink2_binary}" --version
    else:
        print("❌ PLINK2 installation verification failed.")
else:
    print("❌ Cannot install PLINK2: Utils directory path not set in environment.")

## Copy Initial Data

The project includes some initial data (`class_data`). This step copies it from the repository's default location into the `data` directory specified in your `~/.env` file, if they are different and the source exists. This replicates logic from the `scripts_env/directory_setup.py` script.

In [None]:
import os
import shutil
from pathlib import Path
import logging # Use logging already configured

logging.info("--- Checking and copying initial data ---")

# Get paths from environment (loaded from ~/.env)
working_dir = os.getenv('PROJECT_WORKING_DIR')
user_data_dir = os.getenv('PROJECT_DATA_DIR')

if not working_dir or not user_data_dir:
    logging.error("PROJECT_WORKING_DIR or PROJECT_DATA_DIR not found in environment. Cannot copy data.")
else:
    # Define the potential source location within the cloned repository
    repo_data_source_dir = Path(working_dir) / "data" / "class_data"

    # Define the target location within the user-configured data directory
    target_data_dest_dir = Path(user_data_dir) / "class_data"

    logging.info(f"Source data location (repo): {repo_data_source_dir}")
    logging.info(f"Target data location (user): {target_data_dest_dir}")

    # Check if source exists
    if repo_data_source_dir.is_dir():
        # Use resolved paths for comparison to handle symlinks etc.
        if repo_data_source_dir.resolve() != target_data_dest_dir.resolve():
            logging.info(f"Source and target differ. Preparing to copy data...")
            try:
                # Ensure parent of target exists
                target_data_dest_dir.parent.mkdir(parents=True, exist_ok=True)
                logging.info(f"Copying tree from {repo_data_source_dir} to {target_data_dest_dir}")
                # Copy tree, allowing overwrite of existing target directory content
                shutil.copytree(repo_data_source_dir, target_data_dest_dir, dirs_exist_ok=True)
                logging.info(f"✅ Data successfully copied.")
            except Exception as e:
                logging.error(f"Error copying data: {e}")
        else:
            logging.info("Source and target data directories are the same. Skipping copy.")
    else:
        logging.warning(f"Source data directory {repo_data_source_dir} not found. Cannot copy.")

# Exporting a Jupyter Notebook to PDF with Poetry

This guide explains how to **export a Jupyter Notebook (`.ipynb`) to a PDF** using Poetry’s virtual environment. This process requires a LaTeX installation (`texlive` packages were installed in the initial setup).

## **Running the Conversion in the Terminal (Recommended)**
Running the conversion from your Ubuntu terminal is the most reliable method.

1.  **Navigate to your project's base directory:**
    ```bash
    # If you cloned to your home directory:
    cd ~/computational_genetic_genealogy
    # Or navigate to the correct PROJECT_BASE_DIR you used during setup
    ```

2.  **Run the conversion command:**
    Use `poetry run` to execute the `jupyter nbconvert` command within the project's Python environment.

    *   **Example (Save PDF in the same directory as the notebook):**
        To convert this notebook (`Lab0_Code_Environment.ipynb`), which is located in the `instructions/` subdirectory:
        ```bash
        poetry run jupyter nbconvert --to pdf instructions/Lab0_Code_Environment.ipynb
        ```
        *(The PDF will appear inside the `instructions/` folder.)*

    *   **Example (Save PDF to a specific directory, e.g., `results`):**
        Use the `--output-dir` option. You need to provide the *actual path* to your results directory.
        ```bash
        # Replace '/path/to/your/computational_genetic_genealogy/results' with the real path
        # This path should match the PROJECT_RESULTS_DIR value in your ~/.env file
        poetry run jupyter nbconvert --to pdf instructions/Lab0_Code_Environment.ipynb --output-dir='/path/to/your/computational_genetic_genealogy/results'

        # Example if your project is in the home directory:
        # poetry run jupyter nbconvert --to pdf instructions/Lab0_Code_Environment.ipynb --output-dir="$HOME/computational_genetic_genealogy/results"
        ```

---

## **Running Conversion Inside This Jupyter Notebook (Alternative)**
You can *attempt* to run the conversion from a code cell below, but be aware of potential issues:
*   **Rendering:** Complex outputs or interactive elements might not render correctly in the PDF.
*   **Environment:** Accessing the correct paths (like the results directory) requires using Python variables (e.g., `os.getenv`).

The terminal method is generally preferred for stability.

### Convert this Notebook to PDF (Optional)

Run the following cell here in Jupyter Notebook to attempt the conversion. The PDF will be saved in the same directory as this notebook (`instructions/`). To access it from outside WSL (e.g., Windows File Explorer), navigate to `\\\\wsl$\\Ubuntu-24.04` (or your WSL distribution name) and then browse to the full path of the `instructions` directory within your project.

In [None]:
# Attempt to convert this notebook to PDF using the Poetry environment
# Note: The notebook's CWD should be the project's base directory now.
print(f"Current directory for conversion: {os.getcwd()}")
notebook_path = "instructions/Lab0_Code_Environment.ipynb"

if Path(notebook_path).exists():
    print(f"Converting {notebook_path} to PDF...")
    !poetry run jupyter nbconvert --to pdf "{notebook_path}"
else:
    print(f"Error: Notebook not found at relative path {notebook_path} from CWD {os.getcwd()}")

# Example: Convert and save to results directory using Python variable
# results_dir = os.getenv('PROJECT_RESULTS_DIR')
# if results_dir and Path(notebook_path).exists():
#    print(f"\nConverting {notebook_path} and saving PDF to {results_dir}...")
#    !poetry run jupyter nbconvert --to pdf "{notebook_path}" --output-dir="{results_dir}"
# else:
#    print("Could not save to results directory (path not set or notebook not found).")

# **🚀 Start Your Labs in the Fully Configured Environment!**

---

## ✅ **Setup Completed! Your Environment is Ready.**
Your system has been successfully configured based on the project's Docker environment specification:

- 🖥️ **OS:** Ubuntu 24.04
- 🐍 **Python:** 3.12 managed by Poetry (`~/.local/bin/poetry`) in `./.venv`
- ☕ **Java:** OpenJDK 21 (`$JAVA_HOME` configured)
- 🧬 **Genomics Tools:**
  - Samtools/BCFtools/Tabix v1.18 (Built from source)
  - Beagle, HapIBD, RefinedIBD (JARs in `utils`)
  - BonsaiTree (Installed via pip in Poetry env, source in `utils`)
  - LiftOver (Binary in `utils`, Chain file in `references`)
  - IBIS, Ped-Sim, RFMix2 (Built from source in `utils`)
  - PLINK2 (Binary in `utils`)
- 📁 **Directories:** Project structure created (`data`, `results`, `references`, `utils` within `PROJECT_BASE_DIR`).
- ⚙️ **Configuration:** Environment variables loaded from `~/.env`.
- ➕ **PATH:** Includes `~/.local/bin` and `${PROJECT_BASE_DIR}/utils`.
- 🚀 **Notebook:** Ready to run analyses with the correct interpreter selected.

---

✅ **Your local Ubuntu environment is fully set up. Get started on your next lab now!** 🚀

# Lab 0: Code Environment

## Introduction to Code Environments

### What is a Code Environment?

A **code environment** is the structured setup in which software applications run, including dependencies, libraries, and configurations necessary for execution. Code environments ensure that software behaves consistently across different machines and operating systems, preventing issues related to dependency conflicts and system inconsistencies.

### Why Do Code Environments Matter?

- **Reproducibility** – Ensures that code runs the same way across different systems, facilitating collaboration and research reproducibility.
- **Dependency Management** – Prevents conflicts between different software packages by isolating dependencies.
- **System Stability** – Protects the main operating system from unnecessary installations and modifications.
- **Scalability** – Makes it easier to scale applications across multiple machines, cloud environments, or containerized deployments.

## Virtual Environments with Poetry

Our class is using **Poetry**, a modern dependency management tool for Python that simplifies package installation, versioning, and virtual environment creation. Poetry offers an elegant solution by combining dependency management and virtual environment creation into a single workflow.

### Why Use Poetry?

- **Automated Virtual Environments** – Poetry automatically creates and manages virtual environments for projects.
- **Simplified Dependency Management** – Uses a `pyproject.toml` file instead of a `requirements.txt`, making package tracking more structured.
- **Reproducibility** – The `poetry.lock` file ensures that everyone working on the project installs the exact same package versions.
- **Seamless Package Publishing** – Poetry simplifies the process of building and publishing Python packages.

### Key Poetry Commands

| Command | Description |
|---------|-------------|
| `poetry new my_project` | Creates a new Poetry project with a `pyproject.toml` file |
| `poetry install` | Installs dependencies and sets up the virtual environment |
| `poetry add <package>` | Adds a new package to the project |
| `poetry remove <package>` | Removes a package from the project |
| `poetry shell` | Activates the project's virtual environment |
| `poetry run <command>` | Runs a command inside the virtual environment |
| `poetry lock` | Locks the dependencies to exact versions for consistency |

## Docker: Containerized Code Environments

While Poetry helps manage dependencies within Python projects, **Docker** provides an alternative approach by encapsulating an entire system environment, including the OS, into a container. Unlike virtual environments, which only manage dependencies at the application level, Docker offers a complete solution for deploying applications across different systems.

### Key Features of Docker:
- **Portability** – Containers run identically on any system with Docker installed.
- **Isolation** – Each container runs independently, preventing dependency conflicts.
- **Scalability** – Facilitates cloud-based and microservices architectures.

### When to Use Poetry vs. Docker:

| Feature            | Poetry (Virtual Environment) | Docker (Containerization) |
|--------------------|---------------------------|--------------------------| 
| **Scope**         | Manages Python dependencies within a project | Encapsulates the entire OS and software stack |
| **Reproducibility** | Ensures consistent package versions | Provides full OS-level consistency |
| **Portability**   | Works across Python projects on the same system | Runs across different machines and cloud platforms |
| **Resource Usage** | Lightweight | Slightly heavier due to system overhead |
| **Best Use Case** | Managing dependencies for Python projects | Deploying applications in diverse environments |

## Upgrading WSL2 Ubuntu to 24.04

If you are using WSL2 with an older version of Ubuntu, you should upgrade to Ubuntu 24.04 for compatibility with this course environment.

### Check Your Current Ubuntu Version

First, check your current Ubuntu version:

```bash
lsb_release -a
```

If you're not running Ubuntu 24.04, follow these steps to upgrade:

### Upgrading from Ubuntu 22.04 to 24.04

1. **Update your package lists and upgrade installed packages**:
   ```bash
   sudo apt update && sudo apt upgrade -y
   ```

2. **Install the update-manager-core package**:
   ```bash
   sudo apt install update-manager-core -y
   ```

3. **Run the release upgrade**:
   ```bash
   sudo do-release-upgrade -d
   ```

4. **Follow the prompts** during the upgrade process. The upgrade may take some time and will ask you questions along the way.

5. **Restart your WSL instance** after the upgrade completes:
   ```bash
   exec sudo /sbin/reboot
   ```
   Or from Windows PowerShell:
   ```powershell
   wsl --shutdown
   ```
   Then restart Ubuntu from your Start menu.

6. **Verify your new Ubuntu version**:
   ```bash
   lsb_release -a
   ```

### Alternative: Install Fresh Ubuntu 24.04 in WSL

If upgrading causes issues, you can install a fresh copy of Ubuntu 24.04:

1. **From Windows PowerShell (as administrator)**:
   ```powershell
   # List your WSL distributions
   wsl --list
   
   # Optionally backup your data first
   # Copy your important files to Windows
   
   # Unregister your current Ubuntu (this will remove it)
   wsl --unregister Ubuntu
   
   # Install Ubuntu 24.04
   wsl --install -d Ubuntu-24.04
   ```

2. **Set up the new Ubuntu** by following the prompts to create a username and password.

After upgrading or installing Ubuntu 24.04, continue with the setup instructions below.

## Docker Environment: Ubuntu 24.04

In our Docker setup, we are using **Ubuntu 24.04 (Noble Numbat)** as the base environment. This ensures consistency across different systems and provides a stable, long-term support (LTS) release with security updates and package support until **April 2029**. 

### Why Use Ubuntu 24.04?

- **Long-Term Support (LTS)** – Ubuntu 24.04 is an LTS release, ensuring reliability and security updates for an extended period.
- **Stability and Compatibility** – It is widely used in cloud computing, machine learning, and development environments, making it an ideal choice for reproducible research and software deployment.
- **Lightweight and Efficient** – The minimal Ubuntu image is optimized for running applications in containers without unnecessary overhead.
- **Extensive Package Support** – Ubuntu provides access to a vast software ecosystem, ensuring compatibility with necessary tools and dependencies.

By using **Ubuntu 24.04** within Docker, we establish a controlled environment that minimizes discrepancies between development, testing, and production systems, ensuring consistency and reproducibility in our work.

## Ensuring You Are Using the Latest Docker Image

To maintain consistency and take advantage of the most up-to-date dependencies and security patches, it is important to ensure you are running the latest version of the **Ubuntu 24.04**-based Docker image. This prevents issues caused by outdated packages and ensures alignment with the current development environment.

### Pulling the Latest Image

Before running a container, always pull the latest version of the image by executing:

```
docker pull lakishadavid/cgg_image:latest
```

This command fetches the most recent version of the **cgg_image**, ensuring you are using the most up-to-date environment.

### Running the Docker Container

Once you have pulled the latest image, start a container interactively with:

```
docker run -it lakishadavid/cgg_image:latest bash
```

This command:
- **Runs** a new container from the latest image.
- **Opens an interactive terminal (`-it`)** to allow direct interaction with the container.
- **Launches a Bash shell** so you can execute commands within the container.

By following these steps, you ensure that your development environment is always using the most recent and properly configured version of the image.

## Exiting the Docker Container

Once you have finished working inside the Docker container, you will need to exit properly. There are multiple ways to leave the container depending on whether you want to stop it entirely or keep it running in the background.

### Exit and Stop the Container

The most common way to exit a Docker container is by using the `exit` command:

```
exit
```

This will terminate the container and return you to your local terminal.

### When Should You Stop a Container?

Stopping a container is necessary when:
- You **no longer need** the application or environment running.
- You want to **free up system resources** being used by the container.
- You need to **apply updates** or modifications before restarting the container.
- You want to **preserve changes** made inside the container so they are available the next time you start it.

If the container is still running in the background, you can stop it from your terminal using:

```
docker stop <container_id>
```

To ensure a clean development workflow, it is good practice to stop containers when they are no longer needed, rather than letting them consume system resources indefinitely.

## Conclusion

A well-structured code environment is essential for ensuring software **stability**, **reproducibility**, and **efficiency**. In our setup:
- **Poetry** simplifies dependency and virtual environment management, ensuring consistency across Python projects.
- **Docker** provides an isolated, reproducible system environment, making it ideal for deploying applications across different machines.

Understanding when to use each tool helps streamline development workflows. **Poetry** is best suited for managing dependencies within a Python project, while **Docker** ensures complete system encapsulation for broader deployment and portability needs. By combining both tools effectively, we create an environment that supports seamless collaboration, minimal dependency conflicts, and efficient software deployment.

## Responsibility for the Code Environment

While I will maintain the **Docker environment**, the focus of this class is on **running and understanding the genomic analysis code itself**. The Docker image provides a controlled and reproducible environment, ensuring that all necessary dependencies are pre-installed and configured correctly. By using Docker, you eliminate potential compatibility issues and can focus on the **analysis and interpretation of genomic data**.

### Choosing How to Maintain Your Code Environment

Students have two options for managing their code environment:

1. **Use the Provided Docker Image**  
   - I will maintain and update the Docker image to ensure compatibility and reproducibility.
   - If you encounter any issues while using Docker, I will troubleshoot and resolve the problem.
   - Using the Docker environment ensures that you are working in the exact same setup as me and others using the image.

2. **Maintain Your Own Code Environment**  
   - If you choose **not to use Docker**, you are responsible for setting up and maintaining your own code environment.
   - You must ensure that all dependencies are correctly installed and compatible with the provided code.
   - If issues arise due to your custom environment, I will offer support, but it is ultimately your responsibility to resolve them.

### Required Setup for Non-Docker Users

If you choose to work **outside of Docker**, you must manually install the required dependencies. The following code blocks have already been completed **within the Docker image**, meaning Docker users **do not need to run them**. However, **non-Docker users must run the following setup commands themselves** to ensure their environment is configured correctly.

#### Important Notes:
- The provided code blocks assume you are running **Ubuntu 24.04**.
- If you are using a different system, you must adapt the installation steps accordingly.
- While I can answer general questions about dependencies, my responsibility is to **maintain the code base within the Docker image**. Your responsibility is to effectively use that image—**either by running it directly or by correctly configuring your own system.**

Below are the commands that **non-Docker Ubuntu users must run** to set up their environment properly:

# **Begin**

At this point, it is assumed that you have cloned the repository into your local Ubuntu envrionment, navigated to the course directory, opened VS Code, and opened Lab0_Code_Envronment on your local machine (e.g., your computer). If you haven't make sure to do that first. Then continue here with setting up your code environment.

## Install Python packages and system dependencies

Enter the next set of commands in your Ubuntu terminal window. Update where your system looks for system packages.

In [None]:
%%bash

# DO NOT RUN IN JUPYTER NOTEBOOK. COPY AND PASTE TO TERMINAL WINDOW.

sudo apt update -y && sudo apt upgrade -y
sudo apt-add-repository -y universe && sudo apt-add-repository -y multiverse && sudo apt-add-repository -y ppa:deadsnakes/ppa && sudo apt update -y

Enter the next set of commands in your Ubuntu terminal window. This will install `pipx`. Your output should be something like `1.4.3`, indicating the `pipx` version number.

In [None]:
%%bash

# DO NOT RUN IN JUPYTER NOTEBOOK. COPY AND PASTE TO TERMINAL WINDOW.

sudo apt update -y && sudo apt install pipx -y && pipx ensurepath && exec "$SHELL" && pipx --version

Enter the next set of commands in your Ubuntu terminal window. Use `pipx` to install `poetry`. The output should let you know what `poetry` version was installed, but you can check by running `poetry --version`. Then configure `poetry`.

In [None]:
%%bash

# DO NOT RUN IN JUPYTER NOTEBOOK. COPY AND PASTE TO TERMINAL WINDOW.

sudo apt install -y python3.12 && pipx install poetry && poetry config virtualenvs.in-project true

Enter the next set of commands in your Ubuntu terminal window. Install Python 3.12.

In [None]:
%%bash

# DO NOT RUN IN JUPYTER NOTEBOOK. COPY AND PASTE TO TERMINAL WINDOW.

sudo apt install -y python3.12

Enter the next set of commands in your Ubuntu terminal window. Navigate to project directory `computational_genetic_genealogy` and install Python dependencies. If you used the `git clone` command in your `HOME` directory, then the navigtion command is `cd ~/computational_genetic_genealogy`. Once you're in the project directory, run the `poetry install --no-root` command.

In [None]:
%%bash

# DO NOT RUN IN JUPYTER NOTEBOOK. COPY AND PASTE TO TERMINAL WINDOW.

cd ~/computational_genetic_genealogy
poetry install --no-root

You may get a notice indicating that VS Code noticed a new virtual environment. You can cancel (or select `yes`). We will handle virtual envrionments a little later.

## Configure sudo for Notebook

Enter the next set of commands in your Ubuntu terminal window. The next set of commands enables you to run certain commands without needing to enter your password.

In [None]:
%%bash

# DO NOT RUN IN JUPYTER NOTEBOOK. COPY AND PASTE TO TERMINAL WINDOW.

# Add rule for passwordless sudo
echo "$(whoami) ALL=(ALL) NOPASSWD: /usr/bin/apt-add-repository, /usr/bin/apt, /usr/bin/apt-get, /usr/bin/dpkg, /usr/bin/apt, /bin/rm" | sudo tee -a /etc/sudoers.d/$(whoami)

# Verify and set permissions
sudo cat /etc/sudoers.d/$(whoami) && sudo chmod 0440 /etc/sudoers.d/$(whoami)

# Apply changes
exec sudo su - $USER

✅ **You can now run system updates in Jupyter Notebook without entering a password!** 🚀

## Set Up Project Directories

Enter the next set of commands in your Ubuntu terminal window. Run the directory setup script to create necessary project folders: Navigate to project directory `computational_genetic_genealogy` and install Python dependencies. If you used the `git clone` command in your `HOME` directory, then the navigtion command is `cd ~/computational_genetic_genealogy`. Once you're in the project directory, run the `poetry...` command.

In [None]:
%%bash

# DO NOT RUN IN JUPYTER NOTEBOOK. COPY AND PASTE TO TERMINAL WINDOW.

cd ~/computational_genetic_genealogy
poetry run python scripts_env/directory_setup.py

## Selecting Python Interpreter in VS Code

Your Jupyter Notebook needs to know not only which Python to use, but what set of Python packages to use. Selecting the one described below does that by selecting the Python within the virtual envrionment you created with the `poetry install --no-root` command.

1. This first time, go to the VS Code menu and select `View` > `Command Palette` > `Python: Select Interpreter`.
2. Select `Enter interpreter path...`
3. Select `Find...`
4. Note it suggests the `computational_genetic_genealogy/instructions/` directory. Select the `..` at the top of the list to go up one level
5. Select `.venv`, `bin`, and `python`. NOTE: Select plain `python`, not one of the versions.
6. Select the button `Select Interpreter`.

After this, each time you open this Jupyter Notebook, VS Codes remembers which virtual environment to use.

## ✅ You can now run cells within the Jupyter Notebook. 🚀

## Install Required System Packages

In [None]:
# Update system packages
!sudo apt update -y
!sudo apt upgrade -y

# Install system
!sudo apt install -y --no-install-recommends \
    build-essential \
    g++ \
    gcc \
    make \
    python3.12 \
    python3.12-dev \
    python3.12-venv \
    python3-pip \
    graphviz \
    libfreetype-dev \
    pkg-config \
    libpng-dev \
    zlib1g-dev \
    libbz2-dev \
    libharfbuzz-dev \
    libcurl4-openssl-dev \
    libssl-dev \
    libxml2-dev \
    wget \
    curl \
    git \
    unzip \
    default-jre \
    gawk \
    libboost-all-dev \
    texlive-xetex \
    texlive-fonts-recommended \
    texlive-plain-generic \
    pandoc

!sudo apt-get clean
!sudo rm -rf /var/lib/apt/lists/*

### Get directory variables

Now that you ran `directory_setup.py`, you should see your .env file in your file explorer. Let's make sure the notebook can see the file. Run the following code.

In [None]:
import os
from collections import Counter
import logging
import sys
from pathlib import Path
from dotenv import load_dotenv

In [None]:
def find_comp_gen_dir():
    """Find the computational_genetic_genealogy directory by searching up from current directory."""
    current = Path.cwd()
    
    # Search up through parent directories
    while current != current.parent:
        # Check if target directory exists in current path
        target = current / 'computational_genetic_genealogy'
        if target.is_dir():
            return target
        # Move up one directory
        current = current.parent
    
    raise FileNotFoundError("Could not find computational_genetic_genealogy directory")

def load_env_file():
    """Find and load the .env file from the computational_genetic_genealogy directory."""
    try:
        # Find the computational_genetic_genealogy directory
        comp_gen_dir = find_comp_gen_dir()
        
        # Look for .env file
        env_path = comp_gen_dir / '.env'
        if not env_path.exists():
            print(f"Warning: No .env file found in {comp_gen_dir}")
            return None
        
        # Load the .env file
        load_dotenv(env_path, override=True)
        print(f"Loaded environment variables from: {env_path}")
        return env_path
        
    except FileNotFoundError as e:
        print(f"Error: {e}")
        return None

# Use the function
env_path = load_env_file()

working_directory = os.getenv('PROJECT_WORKING_DIR', default=None)
data_directory = os.getenv('PROJECT_DATA_DIR', default=None)
references_directory = os.getenv('PROJECT_REFERENCES_DIR', default=None)
results_directory = os.getenv('PROJECT_RESULTS_DIR', default=None)
utils_directory = os.getenv('PROJECT_UTILS_DIR', default=None)

os.environ["WORKING_DIRECTORY"] = working_directory
os.environ["DATA_DIRECTORY"] = data_directory
os.environ["REFERENCES_DIRECTORY"] = references_directory
os.environ["RESULTS_DIRECTORY"] = results_directory
os.environ["UTILS_DIRECTORY"] = utils_directory

print(f"Working Directory: {working_directory}")
print(f"Data Directory: {data_directory}")
print(f"References Directory: {references_directory}")
print(f"Results Directory: {results_directory}")
print(f"Utils Directory: {utils_directory}")

os.chdir(working_directory)
print(f"The current directory is {os.getcwd()}")

The path above should point to your `.env` file in your `compuational_genetic_genealogy directory`. Please notify the instructor by email with a PDF of the Jupyter Notebook (with output) if this is not the case. The next cell should read the values of your .env for use in the code.

### Install Utilities

Install R

In [None]:
!sudo apt update -y
!sudo apt install -y r-base libtirpc-dev libcurl4-openssl-dev libxml2-dev libssl-dev

In [None]:
!R --version

In [None]:
%%bash

mkdir -p ~/R/library/
grep -qxF '.libPaths(c("~/R/library/", .libPaths()))' ~/.Rprofile || echo '.libPaths(c("~/R/library/", .libPaths()))' >> ~/.Rprofile
cat ~/.Rprofile
# You should see: .libPaths(c("~/R/library/", .libPaths()))

Install liftover

In [None]:
%%bash

# Download the liftOver binary for Linux x86_64 and save it in utils_directory
wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/liftOver -O "${UTILS_DIRECTORY}/liftOver"

# Make the binary executable
chmod +x "${UTILS_DIRECTORY}/liftOver"

# Download the hg19ToHg38 chain file into the references directory
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/liftOver/hg19ToHg38.over.chain.gz -P "${REFERENCES_DIRECTORY}"

Install bcftools, samtools, and tabix.

In [None]:
!sudo apt update -y
!sudo apt-get install -y --no-install-recommends \
    libbz2-dev \
    liblzma-dev \
    zlib1g-dev \
    libgsl-dev \
    libcurl4-openssl-dev

!sudo apt-get install -y bcftools samtools tabix
!echo 'export BCFTOOLS_PLUGINS=/usr/lib/x86_64-linux-gnu/bcftools' >> ~/.bashrc

In [None]:
!bcftools --version
!samtools --version
!tabix --version

Install Java

In [None]:
%%bash

if command -v java &> /dev/null; then
    echo "Java is already installed. Version: $(java -version 2>&1 | head -n 1)"
    exit 0
fi

sudo apt-get install -y default-jdk

# Verify installation
if command -v java &> /dev/null; then
    echo "Java installation successful. Version: $(java -version 2>&1 | head -n 1)"
else
    echo "Java installation failed. Please check the log at $LOGFILE for details."
    exit 1
fi

# Verify Java Home
JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:/bin/java::")
if [ -n "$JAVA_HOME" ]; then
    echo "JAVA_HOME detected: $JAVA_HOME"
    if ! grep -q "export JAVA_HOME=$JAVA_HOME" "$HOME/.bashrc"; then
        echo "Adding JAVA_HOME to .bashrc..."
        echo "export JAVA_HOME=$JAVA_HOME" >> "$HOME/.bashrc"
        echo "export PATH=\$JAVA_HOME/bin:\$PATH" >> "$HOME/.bashrc"
        source "$HOME/.bashrc"
    else
        echo "JAVA_HOME already set in .bashrc."
    fi
else
    echo "Failed to detect JAVA_HOME. Please set it manually if required."
fi

Install beagle, bref, and unbref

In [None]:
%%bash

BEAGLE_VERSION="17Dec24.224"
BEAGLE_JAR="beagle.${BEAGLE_VERSION}.jar"
BREF3_JAR="bref3.${BEAGLE_VERSION}.jar"
UNBREF3_JAR="unbref3.${BEAGLE_VERSION}.jar"
BEAGLE_URL="https://faculty.washington.edu/browning/beagle/${BEAGLE_JAR}"
BREF3_URL="https://faculty.washington.edu/browning/beagle/${BREF3_JAR}"
UNBREF3_URL="https://faculty.washington.edu/browning/beagle/${UNBREF3_JAR}"

# Function to download a file only if it does not exist
download_if_missing() {
    local file_path="${UTILS_DIRECTORY}/$1"
    local file_url="$2"

    if [ -f "${file_path}" ]; then
        echo "✅ File already exists: ${file_path}. Skipping download."
    else
        echo "⬇️ Downloading $file_url..."
        wget -P "${UTILS_DIRECTORY}" "${file_url}"
    fi
}

# Check and download each required file
download_if_missing "$UNBREF3_JAR" "$UNBREF3_URL"
download_if_missing "$BREF3_JAR" "$BREF3_URL"
download_if_missing "$BEAGLE_JAR" "$BEAGLE_URL"

# Test Beagle installation
echo "🔍 Testing Beagle installation..."
java -jar "${UTILS_DIRECTORY}/$BEAGLE_JAR" 2>&1
if [ $? -ne 0 ]; then
    echo "❌ Beagle test run failed."
    exit 1
else
    echo "✅ Beagle installed successfully."
fi

Install bonsaitree

In [None]:
# Install BonsaiTree
utils_dir = os.getenv('PROJECT_UTILS_DIR')
project_dir = os.getenv('PROJECT_WORKING_DIR')

if utils_dir and project_dir:
    bonsai_dir = Path(utils_dir) / "bonsaitree"
    bonsai_repo = "https://github.com/23andMe/bonsaitree.git"
    original_dir = Path.cwd()

    print("--- Setting up BonsaiTree ---")
    if not bonsai_dir.is_dir():
        print(f"⬇️ Cloning BonsaiTree into {bonsai_dir}...")
        !git clone "{bonsai_repo}" "{bonsai_dir}"
    else:
        print(f"✅ BonsaiTree directory already exists: {bonsai_dir}")

    if bonsai_dir.is_dir():
        print(f"🛠️ Installing BonsaiTree package in {bonsai_dir}...")
        try:
            os.chdir(bonsai_dir)
            print(f"Changed CWD to: {Path.cwd()}")

            # Ensure poetry is aware of the correct python interpreter
            main_venv_path = Path(project_dir) / ".venv" / "bin" / "python"
            if main_venv_path.exists():
                 print(f"Using Python interpreter: {main_venv_path}")
                 !poetry env use "{main_venv_path}"
            else:
                 print("Main project venv python not found, falling back to python3.12")
                 !poetry env use python3.12

            # Install the package using pip install . within the poetry env
            # This handles setup/build/install dependencies from setup.py
            print("Running 'poetry run pip install .'...")
            !poetry run pip install .

            # Verify installation
            print("Verifying BonsaiTree import...")
            verify_code = "import sys; sys.path.append('.'); " # May not be needed now it's installed
            verify_code += "from bonsaitree.v3.bonsai import build_pedigree; "
            verify_code += "print('✅ BonsaiTree module imported successfully.')"
            !poetry run python -c "{verify_code}"

        except Exception as e:
            print(f"❌ An error occurred during BonsaiTree setup: {e}")
        finally:
            os.chdir(original_dir)
            print(f"Returned CWD to: {Path.cwd()}")
    else:
        print(f"❌ BonsaiTree directory cloning appears to have failed: {bonsai_dir}")
else:
    print("❌ Cannot install BonsaiTree: Utils or Working directory path not set in environment.")

Install Hap-IBD

In [None]:
%%bash

# Define variables
HAP_IBD_URL="https://faculty.washington.edu/browning/hap-ibd.jar"
HAP_IBD_JAR="hap-ibd.jar"

# Function to download a file only if it does not exist
download_if_missing() {
    local file_path="${UTILS_DIRECTORY}/$1"
    local file_url="$2"

    if [ -f "$file_path" ]; then
        echo "✅ File already exists: $file_path. Skipping download."
    else
        echo "⬇️ Downloading $file_url..."
        wget -P "${UTILS_DIRECTORY}" "$file_url"
    fi
}

# Check and download each required file
download_if_missing "$HAP_IBD_JAR" "$HAP_IBD_URL"

# Test installation
echo "Testing installation..."
java -jar "${UTILS_DIRECTORY}/$HAP_IBD_JAR" 2>&1
if [ $? -ne 0 ]; then
    echo "Test run failed."
    exit 1
else
    echo "Installed successfully."
fi

Install Refined IBD

In [None]:
%%bash

# Define variables
Refined_IBD_URL="https://faculty.washington.edu/browning/refined-ibd/refined-ibd.17Jan20.102.jar"
Refined_IBD_JAR="refined-ibd.17Jan20.102.jar"

# Function to download a file only if it does not exist
download_if_missing() {
    local file_path="${UTILS_DIRECTORY}/$1"
    local file_url="$2"

    if [ -f "$file_path" ]; then
        echo "✅ File already exists: $file_path. Skipping download."
    else
        echo "⬇️ Downloading $file_url..."
        wget -P "${UTILS_DIRECTORY}" "$file_url"
    fi
}

# Check and download each required file
download_if_missing "$Refined_IBD_JAR" "$Refined_IBD_URL"

# Test installation
echo "Testing installation..."
java -jar "${UTILS_DIRECTORY}/$Refined_IBD_JAR" 2>&1
if [ $? -ne 0 ]; then
    echo "Test run failed."
    exit 1
else
    echo "Installed successfully."
fi

Install Merge IBD Segments

In [None]:
%%bash

# Define variables
Refined_IBD_Merge_URL="https://faculty.washington.edu/browning/refined-ibd/merge-ibd-segments.17Jan20.102.jar"
Refined_IBD_Merge_JAR="merge-ibd-segments.17Jan20.102.jar"

# Function to download a file only if it does not exist
download_if_missing() {
    local file_path="${UTILS_DIRECTORY}/$1"
    local file_url="$2"

    if [ -f "$file_path" ]; then
        echo "✅ File already exists: $file_path. Skipping download."
    else
        echo "⬇️ Downloading $file_url..."
        wget -P "${UTILS_DIRECTORY}" "$file_url"
    fi
}

# Check and download each required file
download_if_missing "$Refined_IBD_Merge_JAR" "$Refined_IBD_Merge_URL"

# Test installation
echo "Testing installation..."
java -jar "${UTILS_DIRECTORY}/$Refined_IBD_Merge_JAR" 2>&1
if [ $? -ne 0 ]; then
    echo "Test run failed."
    exit 1
else
    echo "Installed successfully."
fi

Install IBIS

In [None]:
%%bash

IBIS_REPO="https://github.com/williamslab/ibis.git"
IBIS_DIR="${UTILS_DIRECTORY}/ibis"

# Handle existing IBIS directory
if [ -d "$IBIS_DIR" ]; then
    echo "📂 IBIS directory already exists at $IBIS_DIR."
    
    # Check if it is a valid Git repo
    if [ -d "$IBIS_DIR/.git" ]; then
        echo "🔄 Updating IBIS repository..."
        cd "$IBIS_DIR" || { echo "❌ Failed to navigate to IBIS directory."; exit 1; }
        git pull origin master
    else
        echo "⚠️ Directory exists but is not a Git repository. Consider removing it manually."
        exit 1  # Stop execution
    fi
else
    # Clone IBIS repository
    echo "⬇️ Cloning IBIS repository..."
    git clone --recurse-submodules "$IBIS_REPO" "$IBIS_DIR" || { echo "❌ Git clone failed."; exit 1; }
    
    # Navigate to IBIS directory and build
    cd "$IBIS_DIR" || { echo "❌ Failed to navigate to $IBIS_DIR."; exit 1; }
    echo "🔨 Building IBIS using make..."
    make || { echo "❌ Build failed."; exit 1; }
fi

# Verify IBIS installation
if [ -x "./ibis" ]; then
    echo "✅ IBIS installed successfully."
else
    echo "❌ IBIS executable not found. Build might have failed."
    exit 1
fi

In [None]:
%%bash

sudo apt-get install -y libboost-all-dev make

PED_SIM_REPO="https://github.com/williamslab/ped-sim.git"
PED_SIM_DIR="${UTILS_DIRECTORY}/ped-sim"

# Handle existing IBIS directory
if [ -d "$PED_SIM_DIR" ]; then
    echo "📂 Ped-Sim directory already exists at $PED_SIM_DIR."
    
    # Check if it is a valid Git repo
    if [ -d "$PED_SIM_DIR/.git" ]; then
        echo "🔄 Updating IBIS repository..."
        cd "$PED_SIM_DIR" || { echo "❌ Failed to navigate to Ped-Sim directory."; exit 1; }
        git pull origin master
    else
        echo "⚠️ Directory exists but is not a Git repository. Consider removing it manually."
        exit 1  # Stop execution
    fi
else
    # Clone Ped-Sim repository
    echo "⬇️ Cloning Ped-Sim repository..."
    git clone --recurse-submodules "$PED_SIM_REPO" "$PED_SIM_DIR" || { echo "❌ Git clone failed."; exit 1; }
    
    # Navigate to Ped-Sim directory and build
    cd "$PED_SIM_DIR" || { echo "❌ Failed to navigate to $PED_SIM_DIR."; exit 1; }
    echo "🔨 Building Ped-Sim using make..."
    make || { echo "❌ Build failed."; exit 1; }
fi

chmod +x ./ped-sim

# Verify Ped-Sim installation
if [ -x "./ped-sim" ]; then
    echo "✅ Ped-Sim installed successfully."
else
    echo "❌ Ped-Sim executable not found. Build might have failed."
    exit 1
fi

In [None]:
%%bash

# Define PLINK2 download URL and file
plink2_file_url="https://s3.amazonaws.com/plink2-assets/alpha6/plink2_linux_x86_64_20241206.zip"
plink2_zip_file="${UTILS_DIRECTORY}/plink2_linux_x86_64_20241206.zip"
plink2_binary="${UTILS_DIRECTORY}/plink2"

# Download and unzip PLINK2 if not already present
if [ ! -f "$plink2_binary" ]; then
    echo
    echo "Downloading PLINK2..."
    echo
    wget --progress=bar:force:noscroll "$plink2_file_url" -P "${UTILS_DIRECTORY}"
    
    # Ensure the file is downloaded before unzipping
    while [ ! -f "$plink2_zip_file" ]; do
        sleep 1
    done

    echo
    echo "Unzipping PLINK2..."
    echo
    unzip "$plink2_zip_file" -d "${UTILS_DIRECTORY}"

    # Remove the zip file after extraction
    rm "$plink2_zip_file"
fi

# Check if the PLINK2 binary was installed correctly
if [ -f "$plink2_binary" ] && [ -x "$plink2_binary" ]; then
    echo "PLINK2 installed successfully."
    "$plink2_binary" --version
else
    echo "Error: PLINK2 installation failed. Binary not found or not executable."
    exit 1
fi

In [None]:
%%bash

# Define RFMix2 installation directory
rfmix2_dir="${UTILS_DIRECTORY}/rfmix2"

# Install required tools (if missing)
for tool in autoconf make gcc; do
    if ! command -v $tool &> /dev/null; then
        echo "$tool not found. Installing..."
        sudo apt-get install -y $tool || {
            echo "Failed to install $tool. Exiting."
            exit 1
        }
    else
        echo "$tool is already installed."
    fi
done

# Clone RFMix2 repository
rfmix_dir="${UTILS_DIRECTORY}/rfmix2"
if [ ! -d "$rfmix_dir" ]; then
    echo "Cloning RFMix2 repository..."
    git clone https://github.com/slowkoni/rfmix.git "$rfmix_dir" || {
        echo "Failed to clone RFMix2 repository. Exiting."
        exit 1
    }

    # Navigate to RFMix2 directory
    if [ -d "$rfmix_dir" ]; then
        cd "$rfmix_dir" || { echo "Failed to enter $rfmix_dir. Exiting."; exit 1; }
    else
        echo "Error: RFMix2 directory not found. Exiting."
        exit 1
    fi

    # Step-by-step generation of configuration files
    echo "Generating build files the long way..."

    # 1. Create aclocal.m4
    echo "Running aclocal..."
    aclocal || {
        echo "Error running aclocal. Exiting."
        exit 1
    }

    # 2. Create config.h.in
    echo "Running autoheader..."
    autoheader || {
        echo "Error running autoheader. Exiting."
        exit 1
    }

    # 3. Create configure script
    echo "Running autoconf..."
    autoconf || {
        echo "Error running autoconf. Exiting."
        exit 1
    }

    # 4. Create Makefile.in
    echo "Running automake with --add-missing..."
    automake --add-missing || {
        echo "Error running automake. Exiting."
        exit 1
    }

    # 5. Configure the build system
    echo "Running ./configure..."
    ./configure || {
        echo "Error running configure. Exiting."
        exit 1
    }

    # 6. Compile the program
    echo "Compiling RFMix2..."
    make || {
        echo "Error running make. Exiting."
        exit 1
    }
    
else
    echo "RFMix2 repository already exists at $rfmix_dir"
fi

# Verify RFMix2 build
if [ -f "$rfmix_dir/rfmix" ]; then
    echo "RFMix2 built successfully and is ready to use."
else
    echo "Error: RFMix2 binary not found. Build failed."
    exit 1
fi

# Exporting a Jupyter Notebook to PDF with Poetry

This guide explains how to **export a Jupyter Notebook (`.ipynb`) to a PDF** using Poetry’s virtual environment.

## **Running the Conversion in the Terminal**
To convert a Jupyter Notebook to PDF, run the following command in the terminal:

```
poetry run jupyter nbconvert --to pdf path/to/notebook.ipynb
```

### **Example:**
If your notebook is named `Lab0_Code_Environment.ipynb` and is stored in the `instructions/` directory, run:

```
poetry run jupyter nbconvert --to pdf instructions/Lab0_Code_Environment.ipynb
```

### **Saving the PDF to a Specific Path**
By default, the PDF will be saved in the **same directory as the input notebook**. To save the output in a different location, use the `--output-dir` option:

```
poetry run jupyter nbconvert --to pdf path/to/notebook.ipynb --output-dir=path/to/save/
```

**Example:**
To save the PDF in the `results/` directory:

```
poetry run jupyter nbconvert --to pdf instructions/Lab0_Code_Environment.ipynb --output-dir=results/
```

---

## **What Happens If We Run This Inside a Jupyter Notebook?**
If you attempt to run `!poetry run jupyter nbconvert --to pdf ...` inside a Jupyter Notebook cell, you may encounter issues because **Notebook-specific variables (such as inline plots) might not be preserved.**

### **Workaround**
You can run the conversion within a Jupyter Notebook cell if you don't think that is an issue in the notebook you're using. The commnad to convert the notebook to PDF is (also in cell below)
```
!poetry run jupyter nbconvert --to pdf instructions/Lab0_Code_Environment.ipynb
```
However, **it is recommended to run this in the terminal instead** for better stability.

---
By following these steps, you can successfully convert Jupyter Notebooks into PDFs while managing dependencies with Poetry.

Run the following cell here in Jupyter Notebook. To get the PDF, go outside of Ubuntu (e.g., Windows), open file explorer, enter `\\wsl$` in the navigation bar, select Ubuntu, then navigate to the file starting with `home` and `username`.

In [None]:
!poetry run jupyter nbconvert --to pdf instructions/Lab0_Code_Environment.ipynb

# **🚀 Start Your Labs in the Fully Configured Environment!**

---

## ✅ **Setup Completed! Your Environment is Ready.**
Your system has been successfully configured with:

- 📁 **Project directory structure set up**
- 📦 **System packages updated**
- 🛠️ **`~/.local/bin` added to PATH**
- 🔧 **System dependencies installed**
- 🏗 **Poetry installed and configured**
- 🐍 **Project dependencies installed**
- 📚 **Python kernel installed for Jupyter Notebooks**

---

✅ **Your environment is fully set up. Get started on your next lab now!** 🚀
