# Lab 0: Code Environment

# Introduction to Code Environments

## What is a Code Environment?

A **code environment** is the structured setup in which software applications run, including dependencies, libraries, and configurations necessary for execution. Code environments ensure that software behaves consistently across different machines and operating systems, preventing issues related to dependency conflicts and system inconsistencies.

### Why Do Code Environments Matter?

- **Reproducibility** – Ensures that code runs the same way across different systems, facilitating collaboration and research reproducibility.
- **Dependency Management** – Prevents conflicts between different software packages by isolating dependencies.
- **System Stability** – Protects the main operating system from unnecessary installations and modifications.
- **Scalability** – Makes it easier to scale applications across multiple machines, cloud environments, or containerized deployments.

## Virtual Environments with Poetry

Our class is using **Poetry**, a modern dependency management tool for Python that simplifies package installation, versioning, and virtual environment creation. Poetry offers an elegant solution by combining dependency management and virtual environment creation into a single workflow.

### Why Use Poetry?

- **Automated Virtual Environments** – Poetry automatically creates and manages virtual environments for projects.
- **Simplified Dependency Management** – Uses a `pyproject.toml` file instead of a `requirements.txt`, making package tracking more structured.
- **Reproducibility** – The `poetry.lock` file ensures that everyone working on the project installs the exact same package versions.
- **Seamless Package Publishing** – Poetry simplifies the process of building and publishing Python packages.

### Key Poetry Commands

| Command | Description |
|---------|-------------|
| `poetry new my_project` | Creates a new Poetry project with a `pyproject.toml` file |
| `poetry install` | Installs dependencies and sets up the virtual environment |
| `poetry add <package>` | Adds a new package to the project |
| `poetry remove <package>` | Removes a package from the project |
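| `poetry shell` | Activates the project's virtual environment (Poetry 1.x; in Poetry 2.x this command requires the `poetry-plugin-shell` plugin, with `poetry env activate` as the built-in alternative) |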
| `poetry run <command>` | Runs a command inside the virtual environment |
| `poetry lock` | Locks the dependencies to exact versions for consistency |

## Docker: Containerized Code Environments

While Poetry manages dependencies within a Python project, **Docker** takes a different approach: it packages an entire system environment, including the operating system's user-space libraries and tools, into a container (the container still shares the host's kernel). Unlike virtual environments, which only manage dependencies at the application level, Docker provides a complete, self-contained environment for running and deploying applications across different systems.

### Key Features of Docker:
- **Portability** – Containers run identically on any system with Docker installed.
- **Isolation** – Each container runs independently, preventing dependency conflicts.
- **Scalability** – Facilitates cloud-based and microservices architectures.

### When to Use Poetry vs. Docker:

| Feature            | Poetry (Virtual Environment) | Docker (Containerization) |
|--------------------|---------------------------|--------------------------|
| **Scope**         | Manages Python dependencies within a project | Encapsulates the entire OS and software stack |
| **Reproducibility** | Ensures consistent package versions | Provides full OS-level consistency |
| **Portability**   | Works on any system with a compatible Python, but does not capture system-level libraries | Runs across different machines and cloud platforms |
| **Resource Usage** | Lightweight | Slightly heavier due to system overhead |
| **Best Use Case** | Managing dependencies for Python projects | Deploying applications in diverse environments |

## Docker Environment: Ubuntu 22.04

In our Docker setup, we are using **Ubuntu 22.04 (Jammy Jellyfish)** as the base environment. This ensures consistency across different systems and provides a stable, long-term support (LTS) release with security updates and package support until **April 2027**. 

### Why Use Ubuntu 22.04?

- **Long-Term Support (LTS)** – Ubuntu 22.04 is an LTS release, ensuring reliability and security updates for an extended period.
- **Stability and Compatibility** – It is widely used in cloud computing, machine learning, and development environments, making it an ideal choice for reproducible research and software deployment.
- **Lightweight and Efficient** – The minimal Ubuntu image is optimized for running applications in containers without unnecessary overhead.
- **Extensive Package Support** – Ubuntu provides access to a vast software ecosystem, ensuring compatibility with necessary tools and dependencies.

By using **Ubuntu 22.04** within Docker, we establish a controlled environment that minimizes discrepancies between development, testing, and production systems, ensuring consistency and reproducibility in our work.

## Ensuring You Are Using the Latest Docker Image

To maintain consistency and take advantage of the most up-to-date dependencies and security patches, it is important to ensure you are running the latest version of the **Ubuntu 22.04**-based Docker image. This prevents issues caused by outdated packages and ensures alignment with the current development environment.

### Pulling the Latest Image

Before running a container, always pull the latest version of the image by executing:

```
docker pull lakishadavid/cgg_image:latest
```

This command fetches the most recent version of the **cgg_image**, ensuring you are using the most up-to-date environment.

### Running the Docker Container

Once you have pulled the latest image, start a container interactively with:

```
docker run -it lakishadavid/cgg_image:latest bash
```

This command:
- **Runs** a new container from the latest image.
- **Opens an interactive terminal (`-it`)** to allow direct interaction with the container.
- **Launches a Bash shell** so you can execute commands within the container.

By following these steps, you ensure that your development environment is always using the most recent and properly configured version of the image.

## Exiting the Docker Container

Once you have finished working inside the Docker container, you will need to exit properly. There are multiple ways to leave the container depending on whether you want to stop it entirely or keep it running in the background.

### Exit and Stop the Container

The most common way to exit a Docker container is by using the `exit` command:

```
exit
```

This will terminate the container and return you to your local terminal.

### When Should You Stop a Container?

Stopping a container is necessary when:
- You **no longer need** the application or environment running.
- You want to **free up system resources** being used by the container.
- You need to **apply updates** or modifications before restarting the container.
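- You want to **preserve changes** made inside the container: a stopped (but not removed) container keeps its filesystem and can be resumed with `docker start <container_id>`, whereas `docker run` always creates a fresh container from the image.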

If the container is still running in the background, you can stop it from your terminal using:

```
docker stop <container_id>
```

To ensure a clean development workflow, it is good practice to stop containers when they are no longer needed, rather than letting them consume system resources indefinitely.
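
`docker stop` needs a container ID, which `docker ps` lists. As a rough sketch, the helper below picks out the IDs of running containers started from a given image by parsing the text produced by `docker ps --format "{{.ID}} {{.Image}}"`. The function name `containers_for_image` is hypothetical, and the sample output (including the container IDs) is made up for illustration.

```python
def containers_for_image(ps_output: str, image: str) -> list[str]:
    """Return IDs of containers whose image matches `image`.

    `ps_output` is expected to hold one "<container_id> <image>" pair per line,
    as printed by: docker ps --format "{{.ID}} {{.Image}}"
    """
    ids = []
    for line in ps_output.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) == 2 and parts[1].strip() == image:
            ids.append(parts[0])
    return ids


# Made-up sample output; real IDs will differ on your machine.
sample = """\
3f2a9c1b7d4e lakishadavid/cgg_image:latest
a1b2c3d4e5f6 postgres:16
"""
print(containers_for_image(sample, "lakishadavid/cgg_image:latest"))
```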

## Conclusion

A well-structured code environment is essential for ensuring software **stability**, **reproducibility**, and **efficiency**. In our setup:
- **Poetry** simplifies dependency and virtual environment management, ensuring consistency across Python projects.
- **Docker** provides an isolated, reproducible system environment, making it ideal for deploying applications across different machines.

Understanding when to use each tool helps streamline development workflows. **Poetry** is best suited for managing dependencies within a Python project, while **Docker** ensures complete system encapsulation for broader deployment and portability needs. By combining both tools effectively, we create an environment that supports seamless collaboration, minimal dependency conflicts, and efficient software deployment.

## Responsibility for the Code Environment

While I will maintain the **Docker environment**, the focus of this class is on **running and understanding the genomic analysis code itself**. The Docker image provides a controlled and reproducible environment, ensuring that all necessary dependencies are pre-installed and configured correctly. By using Docker, you eliminate potential compatibility issues and can focus on the **analysis and interpretation of genomic data**.

### Choosing How to Maintain Your Code Environment

Students have two options for managing their code environment:

1. **Use the Provided Docker Image**  
   - I will maintain and update the Docker image to ensure compatibility and reproducibility.
   - If you encounter any issues while using Docker, I will troubleshoot and resolve the problem.
   - Using the Docker environment ensures that you are working in the exact same setup as me and others using the image.

2. **Maintain Your Own Code Environment**  
   - If you choose **not to use Docker**, you are responsible for setting up and maintaining your own code environment.
   - You must ensure that all dependencies are correctly installed and compatible with the provided code.
   - If issues arise due to your custom environment, I will offer support, but it is ultimately your responsibility to resolve them.

### Required Setup for Non-Docker Users

If you choose to work **outside of Docker**, you must manually install the required dependencies. The following code blocks have already been completed **within the Docker image**, meaning Docker users **do not need to run them**. However, **non-Docker users must run the following setup commands themselves** to ensure their environment is configured correctly.

#### Important Notes:
- The provided code blocks assume you are running **Ubuntu 22.04**.
- If you are using a different system, you must adapt the installation steps accordingly.
- While I can answer general questions about dependencies, my responsibility is to **maintain the code base within the Docker image**. Your responsibility is to effectively use that image—**either by running it directly or by correctly configuring your own system.**

Below are the commands that **non-Docker Ubuntu users must run** to set up their environment properly:

### Select the Python Interpreter

Your Jupyter Notebook needs to know not only which Python to use, but also which set of Python packages to use. Selecting the interpreter described below does both by pointing to the Python inside the virtual environment created by the `poetry install --no-root` command.

1. The first time, go to the VS Code menu and select `View` > `Command Palette` > `Python: Select Interpreter`.
2. Select `Enter interpreter path...`
3. Select `Find...`
4. Note that it suggests the `computational_genetic_genealogy/instructions/` directory. Select the `..` at the top of the list to go up one level.
5. Select `.venv`, `bin`, and `python`.
6. Select the button `Select Interpreter`.

After this, each time you open this Jupyter Notebook, VS Code remembers which virtual environment to use.
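
To double-check the selection from inside the notebook, a small sketch like the following reports which interpreter the kernel is running and whether it lives in a `.venv` directory. The helper name `uses_project_venv` is hypothetical, and the check assumes the in-project `.venv` layout described above.

```python
import sys
from pathlib import Path


def uses_project_venv(executable: str, venv_name: str = ".venv") -> bool:
    """Return True if the interpreter path passes through a `.venv` directory."""
    return venv_name in Path(executable).parts


print(f"Interpreter: {sys.executable}")
print(f"Inside a virtual environment: {sys.prefix != sys.base_prefix}")
print(f"Inside the project's .venv: {uses_project_venv(sys.executable)}")
```

If the last line prints `False`, the notebook is probably still attached to a system-wide Python rather than the project's environment.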

# **Begin**

At this point, it is assumed that you have cloned the repository into your local Ubuntu environment, navigated to the course directory, opened VS Code, and opened `Lab0_Code_Environment` on your local machine (e.g., your computer). If you haven't, make sure to do that first. Then continue here with setting up your code environment.

# Enable Passwordless sudo apt-get for Jupyter Notebook

To run `sudo apt-get` commands in Jupyter Notebook **without being prompted for a password**, follow these steps. (See cell block below to copy and paste code into terminal window.)

---

## **Step 1: Find Your Username**
Before proceeding, you need to determine your Linux username. To do this, run the following in your terminal window:

```
whoami
```

This will return your username. **Example Output:**
```
failingbird
```

In this example, the username is `failingbird`. The user would see `$HOME` defined as `/home/failingbird`.

---

## **Step 2: Add the Rule**
Copy and paste the following command into your terminal:

```
echo "$(whoami) ALL=(ALL) NOPASSWD: /usr/bin/apt-get, /usr/bin/dpkg, /usr/bin/apt, /bin/rm" | sudo tee -a /etc/sudoers.d/$(whoami)
```

This command adds a rule that allows **your user** to run `apt-get`, `apt`, `dpkg`, and `rm` through `sudo` without entering a password.

---

## **Step 3: Verify the Change**
Run the following command to confirm the rule was added successfully:

```
sudo cat /etc/sudoers.d/$(whoami)
```

If the output includes:

```
failingbird ALL=(ALL) NOPASSWD: /usr/bin/apt-get, /usr/bin/dpkg, /usr/bin/apt, /bin/rm
```

then the setup is correct.

---

## **Step 4: (Optional) Remove the Rule**
If you ever need to **undo** this change and restore the default behavior (where a password is required for `sudo apt-get` and `sudo rm`), run:

```
sudo rm /etc/sudoers.d/$(whoami)
```

### **Example for Username `failingbird`**
```
sudo rm /etc/sudoers.d/failingbird
```

This will remove the rule and require a password for future `sudo apt-get` and `sudo rm` commands.

---


## **Run the commands in the next cell in your Ubuntu terminal window.**

In [None]:
# DO NOT RUN IN JUPYTER NOTEBOOK. COPY AND PASTE TO TERMINAL WINDOW.

# whoami
# echo "$(whoami) ALL=(ALL) NOPASSWD: /usr/bin/apt-add-repository, /usr/bin/apt, /usr/bin/apt-get, /usr/bin/dpkg, /bin/rm" | sudo tee -a /etc/sudoers.d/$(whoami)
# sudo cat /etc/sudoers.d/$(whoami) && sudo chmod 0440 /etc/sudoers.d/$(whoami)
# exec sudo su - $USER

✅ **You can now run system updates in Jupyter Notebook without entering a password!** 🚀

# Install Python packages and system dependencies

Continue to enter commands in your Ubuntu terminal window:
```
sudo apt install pipx -y
pipx ensurepath
exec "$SHELL"
```

✅ Verify pipx Works.

Enter:
```
pipx --version
```
Your output should be something like `1.4.3`, indicating the `pipx` version number.

In [None]:
# DO NOT RUN IN JUPYTER NOTEBOOK. COPY AND PASTE TO TERMINAL WINDOW.

# sudo apt update -y && sudo apt install pipx -y && pipx ensurepath && exec "$SHELL"
# pipx --version   # run this after the new shell starts; nothing chained after `exec` would run

Now, use `pipx` to install `poetry`.
```
pipx install poetry
```
The output should let you know what `poetry` version was installed, but you can check by running `poetry --version`.

Configure `poetry` by entering:
```
poetry config virtualenvs.in-project true
```

Now, **navigate into the course directory `computational_genetic_genealogy`** using the `cd` command if needed and install the needed Python packages by using the following command:
```
poetry install --no-root
```
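
Before moving on, it can help to confirm that the tools installed so far are actually reachable. The sketch below uses the standard library's `shutil.which` to report any command that is missing from `PATH`; the helper name `missing_tools` and the list of tools checked are assumptions for illustration.

```python
import shutil
from typing import Callable, Iterable, Optional


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


not_found = missing_tools(["pipx", "poetry"])
if not_found:
    print(f"Not on PATH: {', '.join(not_found)}")
else:
    print("pipx and poetry are both available.")
```

If a tool is reported missing right after installation, opening a new terminal (or re-running `exec "$SHELL"`) usually refreshes `PATH`.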

In [None]:
# DO NOT RUN IN JUPYTER NOTEBOOK. COPY AND PASTE TO TERMINAL WINDOW.

# sudo apt update -y && sudo apt upgrade -y
# sudo apt-add-repository -y universe && sudo apt-add-repository -y multiverse && sudo apt-add-repository -y ppa:deadsnakes/ppa && sudo apt update -y
# sudo apt install -y python3.12 && pipx install poetry && poetry config virtualenvs.in-project true

### Install Required System Packages

In [None]:
# Update system packages
!sudo apt update -y
!sudo apt upgrade -y

# Install system dependencies
!sudo apt install -y --no-install-recommends \
    build-essential \
    g++ \
    gcc \
    make \
    python3.12 \
    python3.12-dev \
    python3.12-venv \
    python3-pip \
    graphviz \
    libfreetype-dev \
    pkg-config \
    libpng-dev \
    zlib1g-dev \
    libbz2-dev \
    libharfbuzz-dev \
    libcurl4-openssl-dev \
    libssl-dev \
    libxml2-dev \
    wget \
    curl \
    git \
    unzip \
    default-jre \
    gawk \
    libboost-all-dev \
    texlive-xetex \
    texlive-fonts-recommended \
    texlive-plain-generic \
    pandoc

!sudo apt-get clean
!sudo rm -rf /var/lib/apt/lists/*


In [None]:

# !poetry env use python3.12
# !poetry install --no-root
# !poetry run python3.12 -m pip install --upgrade pip

This Jupyter Notebook assumes you are running it from `computational_genetic_genealogy/instructions`. It also assumes that `$HOME` is defined.

# **Run the Directory Setup Script**

Follow these steps to execute the `directory_setup.py` script. (See cell block below to copy and paste code into terminal window.)

---

## **Step 1: Open Your Terminal Window**
- **Linux/macOS**: Open **Terminal**.

---

## **Step 2: Navigate to the `computational_genetic_genealogy` Directory**
In your terminal, run the following command:

```
cd ~/computational_genetic_genealogy
```

If your project is in a different location, replace `~/computational_genetic_genealogy` with the actual path.

To confirm you are in the correct directory, run:

```
pwd
```

The output should show the full path to `computational_genetic_genealogy`.

---

## **Step 3: Run the Setup Script**
Now, execute the following command:

```
poetry run python scripts_env/directory_setup.py
```

This will:
- **Run the script to set up required directories.**
- **Ensure your environment is properly structured.**

---

✅ **You have successfully executed `directory_setup.py`!** 🚀

In [None]:
# DO NOT RUN IN JUPYTER NOTEBOOK. COPY AND PASTE TO TERMINAL WINDOW.

# cd ~/computational_genetic_genealogy
# poetry run python scripts_env/directory_setup.py

### Get directory variables

Now that you have run `directory_setup.py`, you should see your `.env` file in your file explorer. Let's make sure the notebook can see the file. Run the following code.

In [None]:
import os
from collections import Counter
import logging
import sys
from pathlib import Path
from dotenv import load_dotenv

In [None]:
def find_comp_gen_dir():
    """Find the computational_genetic_genealogy directory by searching up from current directory."""
    current = Path.cwd()
    
    # Search up through parent directories
    while current != current.parent:
        # Check if target directory exists in current path
        target = current / 'computational_genetic_genealogy'
        if target.is_dir():
            return target
        # Move up one directory
        current = current.parent
    
    raise FileNotFoundError("Could not find computational_genetic_genealogy directory")

def load_env_file():
    """Find and load the .env file from the computational_genetic_genealogy directory."""
    try:
        # Find the computational_genetic_genealogy directory
        comp_gen_dir = find_comp_gen_dir()
        
        # Look for .env file
        env_path = comp_gen_dir / '.env'
        if not env_path.exists():
            print(f"Warning: No .env file found in {comp_gen_dir}")
            return None
        
        # Load the .env file
        load_dotenv(env_path, override=True)
        print(f"Loaded environment variables from: {env_path}")
        return env_path
        
    except FileNotFoundError as e:
        print(f"Error: {e}")
        return None

# Use the function
env_path = load_env_file()

The path above should point to the `.env` file in your `computational_genetic_genealogy` directory. If it does not, please notify the instructor by email and attach a PDF of the Jupyter Notebook (with output). The next cell reads the values from your `.env` file for use in the code.

In [None]:
working_directory = os.getenv('PROJECT_WORKING_DIR', default=None)
data_directory = os.getenv('PROJECT_DATA_DIR', default=None)
references_directory = os.getenv('PROJECT_REFERENCES_DIR', default=None)
results_directory = os.getenv('PROJECT_RESULTS_DIR', default=None)
utils_directory = os.getenv('PROJECT_UTILS_DIR', default=None)

print(f"Working Directory: {working_directory}")
print(f"Data Directory: {data_directory}")
print(f"References Directory: {references_directory}")
print(f"Results Directory: {results_directory}")
print(f"Utils Directory: {utils_directory}")

os.chdir(working_directory)
print(f"The current directory is {os.getcwd()}")
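
If any of these variables is missing, `os.getenv` returns `None` and `os.chdir(None)` fails with a `TypeError`. A defensive check along the following lines can surface the problem earlier. This is only a sketch: the helper name `check_project_dirs` is hypothetical, and it assumes each variable should point to an existing directory.

```python
import os
from pathlib import Path
from typing import Mapping

REQUIRED_VARS = [
    "PROJECT_WORKING_DIR",
    "PROJECT_DATA_DIR",
    "PROJECT_REFERENCES_DIR",
    "PROJECT_RESULTS_DIR",
    "PROJECT_UTILS_DIR",
]


def check_project_dirs(env: Mapping[str, str]) -> dict[str, str]:
    """Map each problematic variable name to a short description of the problem."""
    problems = {}
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            problems[name] = "not set"
        elif not Path(value).is_dir():
            problems[name] = f"directory does not exist: {value}"
    return problems


# Report any problems found in the current process environment.
for name, problem in check_project_dirs(os.environ).items():
    print(f"{name}: {problem}")
```

An empty result means every variable is set and points to a directory that exists.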

### Install Utilities

Install R

In [None]:
!sudo apt update -y
!sudo apt install -y r-base libtirpc-dev

In [None]:
!R --version

In [None]:
%%bash

mkdir -p ~/R/library/
grep -qxF '.libPaths(c("~/R/library/", .libPaths()))' ~/.Rprofile || echo '.libPaths(c("~/R/library/", .libPaths()))' >> ~/.Rprofile
cat ~/.Rprofile
# You should see: .libPaths(c("~/R/library/", .libPaths()))
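
The `grep -qxF ... || echo ... >>` idiom above appends the line only when an identical line is not already in the file, so re-running the cell does not create duplicates. The same idea can be sketched in Python; the helper name `append_line_once` is hypothetical, and the example writes to a temporary file rather than your real `~/.Rprofile`.

```python
import tempfile
from pathlib import Path


def append_line_once(path: Path, line: str) -> bool:
    """Append `line` to `path` unless an identical line already exists.

    Returns True if the file was modified, False if the line was already there.
    """
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False
    with path.open("a") as handle:
        handle.write(line + "\n")
    return True


# Demonstrate on a throwaway file so no real configuration is touched.
with tempfile.TemporaryDirectory() as tmp:
    profile = Path(tmp) / ".Rprofile"
    rule = '.libPaths(c("~/R/library/", .libPaths()))'
    print(append_line_once(profile, rule))  # first call modifies the file
    print(append_line_once(profile, rule))  # second call is a no-op
```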

Install liftover

In [None]:
%%bash -s "$utils_directory" "$references_directory"

utils_directory=$1
references_directory=$2

# Download the liftOver binary for Linux x86_64 and save it in utils_directory
wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/liftOver -O "$utils_directory/liftOver"

# Make the binary executable
chmod +x "$utils_directory/liftOver"

# Download the hg19ToHg38 chain file into the references directory
wget http://hgdownload.cse.ucsc.edu/goldenPath/hg19/liftOver/hg19ToHg38.over.chain.gz -P "$references_directory"

Install bcftools, samtools, and tabix.

In [None]:
!sudo apt update -y
!sudo apt-get install -y --no-install-recommends \
    libbz2-dev \
    liblzma-dev \
    zlib1g-dev \
    libgsl-dev \
    libcurl4-openssl-dev

!sudo apt-get install -y bcftools samtools tabix
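# Append the plugin path to ~/.bashrc only if it is not already there
!grep -qxF 'export BCFTOOLS_PLUGINS=/usr/lib/x86_64-linux-gnu/bcftools' ~/.bashrc || echo 'export BCFTOOLS_PLUGINS=/usr/lib/x86_64-linux-gnu/bcftools' >> ~/.bashrc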

In [None]:
!bcftools --version
!samtools --version
!tabix --version

Install Java

In [None]:
%%bash

if command -v java &> /dev/null; then
    echo "Java is already installed. Version: $(java -version 2>&1 | head -n 1)"
    exit 0
fi

sudo apt-get install -y default-jdk

# Verify installation
if command -v java &> /dev/null; then
    echo "Java installation successful. Version: $(java -version 2>&1 | head -n 1)"
else
    echo "Java installation failed. Please check the apt-get output above for details."
    exit 1
fi

# Verify Java Home
JAVA_HOME=$(readlink -f /usr/bin/java | sed "s:/bin/java::")
if [ -n "$JAVA_HOME" ]; then
    echo "JAVA_HOME detected: $JAVA_HOME"
    if ! grep -q "export JAVA_HOME=$JAVA_HOME" "$HOME/.bashrc"; then
        echo "Adding JAVA_HOME to .bashrc..."
        echo "export JAVA_HOME=$JAVA_HOME" >> "$HOME/.bashrc"
        echo "export PATH=\$JAVA_HOME/bin:\$PATH" >> "$HOME/.bashrc"
        source "$HOME/.bashrc"
    else
        echo "JAVA_HOME already set in .bashrc."
    fi
else
    echo "Failed to detect JAVA_HOME. Please set it manually if required."
fi

Install beagle, bref, and unbref

In [None]:
%%bash -s "$utils_directory"

utils_directory=$1

BEAGLE_VERSION="17Dec24.224"
BEAGLE_JAR="beagle.${BEAGLE_VERSION}.jar"
BREF3_JAR="bref3.${BEAGLE_VERSION}.jar"
UNBREF3_JAR="unbref3.${BEAGLE_VERSION}.jar"
BEAGLE_URL="https://faculty.washington.edu/browning/beagle/${BEAGLE_JAR}"
BREF3_URL="https://faculty.washington.edu/browning/beagle/${BREF3_JAR}"
UNBREF3_URL="https://faculty.washington.edu/browning/beagle/${UNBREF3_JAR}"

# Function to download a file only if it does not exist
download_if_missing() {
    local file_path="${utils_directory}/$1"
    local file_url="$2"

    if [ -f "${file_path}" ]; then
        echo "✅ File already exists: ${file_path}. Skipping download."
    else
        echo "⬇️ Downloading $file_url..."
        wget -P "${utils_directory}" "${file_url}"
    fi
}

# Check and download each required file
download_if_missing "$UNBREF3_JAR" "$UNBREF3_URL"
download_if_missing "$BREF3_JAR" "$BREF3_URL"
download_if_missing "$BEAGLE_JAR" "$BEAGLE_URL"

# Test Beagle installation
echo "🔍 Testing Beagle installation..."
java -jar "${utils_directory}/$BEAGLE_JAR" 2>&1
if [ $? -ne 0 ]; then
    echo "❌ Beagle test run failed."
    exit 1
else
    echo "✅ Beagle installed successfully."
fi
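
The `download_if_missing` shell function appears in several of the install cells. For readers more comfortable with Python, roughly the same logic can be sketched with the standard library. This is an illustrative equivalent, not part of the course scripts: the Python function name simply mirrors the shell helper, and the URL in the usage comment is the Beagle URL already defined above.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def download_if_missing(file_url: str, target_dir: Path) -> Path:
    """Download `file_url` into `target_dir` unless the file is already there."""
    file_name = Path(urlparse(file_url).path).name
    file_path = target_dir / file_name
    if file_path.is_file():
        print(f"File already exists: {file_path}. Skipping download.")
    else:
        print(f"Downloading {file_url}...")
        target_dir.mkdir(parents=True, exist_ok=True)
        urlretrieve(file_url, str(file_path))
    return file_path


# Example call (not executed here):
# download_if_missing(
#     "https://faculty.washington.edu/browning/beagle/beagle.17Dec24.224.jar",
#     Path(utils_directory),
# )
```

Skipping files that already exist keeps the cell safe to re-run without downloading the same JAR again.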

Install bonsaitree

In [None]:
%%bash

bash "$PROJECT_WORKING_DIR/scripts_env/install_bonsaitree.sh"

Install Hap-IBD

In [None]:
%%bash -s "$utils_directory"

utils_directory=$1

# Define variables
HAP_IBD_URL="https://faculty.washington.edu/browning/hap-ibd.jar"
HAP_IBD_JAR="hap-ibd.jar"

# Function to download a file only if it does not exist
download_if_missing() {
    local file_path="$utils_directory/$1"
    local file_url="$2"

    if [ -f "$file_path" ]; then
        echo "✅ File already exists: $file_path. Skipping download."
    else
        echo "⬇️ Downloading $file_url..."
        wget -P "$utils_directory" "$file_url"
    fi
}

# Check and download each required file
download_if_missing "$HAP_IBD_JAR" "$HAP_IBD_URL"

# Test installation
echo "Testing installation..."
java -jar "$utils_directory/$HAP_IBD_JAR" 2>&1
if [ $? -ne 0 ]; then
    echo "Test run failed."
    exit 1
else
    echo "Installed successfully."
fi

Install Refined IBD

In [None]:
%%bash -s "$utils_directory"

utils_directory=$1

# Define variables
Refined_IBD_URL="https://faculty.washington.edu/browning/refined-ibd/refined-ibd.17Jan20.102.jar"
Refined_IBD_JAR="refined-ibd.17Jan20.102.jar"

# Function to download a file only if it does not exist
download_if_missing() {
    local file_path="$utils_directory/$1"
    local file_url="$2"

    if [ -f "$file_path" ]; then
        echo "✅ File already exists: $file_path. Skipping download."
    else
        echo "⬇️ Downloading $file_url..."
        wget -P "$utils_directory" "$file_url"
    fi
}

# Check and download each required file
download_if_missing "$Refined_IBD_JAR" "$Refined_IBD_URL"

# Test installation
echo "Testing installation..."
java -jar "$utils_directory/$Refined_IBD_JAR" 2>&1
if [ $? -ne 0 ]; then
    echo "Test run failed."
    exit 1
else
    echo "Installed successfully."
fi

Install Merge IBD Segments

In [None]:
%%bash -s "$utils_directory"

utils_directory=$1

# Define variables
Refined_IBD_Merge_URL="https://faculty.washington.edu/browning/refined-ibd/merge-ibd-segments.17Jan20.102.jar"
Refined_IBD_Merge_JAR="merge-ibd-segments.17Jan20.102.jar"

# Function to download a file only if it does not exist
download_if_missing() {
    local file_path="$utils_directory/$1"
    local file_url="$2"

    if [ -f "$file_path" ]; then
        echo "✅ File already exists: $file_path. Skipping download."
    else
        echo "⬇️ Downloading $file_url..."
        wget -P "$utils_directory" "$file_url"
    fi
}

# Check and download each required file
download_if_missing "$Refined_IBD_Merge_JAR" "$Refined_IBD_Merge_URL"

# Test installation
echo "Testing installation..."
java -jar "$utils_directory/$Refined_IBD_Merge_JAR" 2>&1
if [ $? -ne 0 ]; then
    echo "Test run failed."
    exit 1
else
    echo "Installed successfully."
fi

Install IBIS

In [None]:
%%bash -s "$utils_directory"

utils_directory=$1
IBIS_REPO="https://github.com/williamslab/ibis.git"
IBIS_DIR="$utils_directory/ibis"

# Handle existing IBIS directory
if [ -d "$IBIS_DIR" ]; then
    echo "📂 IBIS directory already exists at $IBIS_DIR."
    
    # Check if it is a valid Git repo
    if [ -d "$IBIS_DIR/.git" ]; then
        echo "🔄 Updating IBIS repository..."
        cd "$IBIS_DIR" || { echo "❌ Failed to navigate to IBIS directory."; exit 1; }
        git pull origin master
    else
        echo "⚠️ Directory exists but is not a Git repository. Consider removing it manually."
        exit 1  # Stop execution
    fi
else
    # Clone IBIS repository
    echo "⬇️ Cloning IBIS repository..."
    git clone --recurse-submodules "$IBIS_REPO" "$IBIS_DIR" || { echo "❌ Git clone failed."; exit 1; }
    
    # Navigate to IBIS directory and build
    cd "$IBIS_DIR" || { echo "❌ Failed to navigate to $IBIS_DIR."; exit 1; }
    echo "🔨 Building IBIS using make..."
    make || { echo "❌ Build failed."; exit 1; }
fi

# Verify IBIS installation
if [ -x "./ibis" ]; then
    echo "✅ IBIS installed successfully."
else
    echo "❌ IBIS executable not found. Build might have failed."
    exit 1
fi

Install Ped-Sim

In [None]:
%%bash -s "$utils_directory"

utils_directory=$1

sudo apt-get install -y libboost-all-dev make

PED_SIM_REPO="https://github.com/williamslab/ped-sim.git"
PED_SIM_DIR="$utils_directory/ped-sim"

# Handle existing Ped-Sim directory
if [ -d "$PED_SIM_DIR" ]; then
    echo "📂 Ped-Sim directory already exists at $PED_SIM_DIR."
    
    # Check if it is a valid Git repo
    if [ -d "$PED_SIM_DIR/.git" ]; then
        echo "🔄 Updating Ped-Sim repository..."
        cd "$PED_SIM_DIR" || { echo "❌ Failed to navigate to Ped-Sim directory."; exit 1; }
        git pull origin master
    else
        echo "⚠️ Directory exists but is not a Git repository. Consider removing it manually."
        exit 1  # Stop execution
    fi
else
    # Clone Ped-Sim repository
    echo "⬇️ Cloning Ped-Sim repository..."
    git clone --recurse-submodules "$PED_SIM_REPO" "$PED_SIM_DIR" || { echo "❌ Git clone failed."; exit 1; }
    
    # Navigate to Ped-Sim directory and build
    cd "$PED_SIM_DIR" || { echo "❌ Failed to navigate to $PED_SIM_DIR."; exit 1; }
    echo "🔨 Building Ped-Sim using make..."
    make || { echo "❌ Build failed."; exit 1; }
fi

chmod +x ./ped-sim

# Verify Ped-Sim installation
if [ -x "./ped-sim" ]; then
    echo "✅ Ped-Sim installed successfully."
else
    echo "❌ Ped-Sim executable not found. Build might have failed."
    exit 1
fi

Install PLINK2

In [None]:
%%bash -s "$utils_directory"

utils_directory=$1

# Define PLINK2 download URL and file
plink2_file_url="https://s3.amazonaws.com/plink2-assets/alpha6/plink2_linux_x86_64_20241206.zip"
plink2_zip_file="$utils_directory/plink2_linux_x86_64_20241206.zip"
plink2_binary="$utils_directory/plink2"

# Download and unzip PLINK2 if not already present
if [ ! -f "$plink2_binary" ]; then
    echo
    echo "Downloading PLINK2..."
    echo
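    wget --progress=bar:force:noscroll "$plink2_file_url" -P "$utils_directory" || {
        echo "Error: PLINK2 download failed."
        exit 1
    }

    # wget runs synchronously, so the archive must exist by now; fail fast if it does not
    if [ ! -f "$plink2_zip_file" ]; then
        echo "Error: $plink2_zip_file not found after download."
        exit 1
    fi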

    echo
    echo "Unzipping PLINK2..."
    echo
    unzip "$plink2_zip_file" -d "$utils_directory"

    # Remove the zip file after extraction
    rm "$plink2_zip_file"
fi

# Check if the PLINK2 binary was installed correctly
if [ -f "$plink2_binary" ] && [ -x "$plink2_binary" ]; then
    echo "PLINK2 installed successfully."
    "$plink2_binary" --version
else
    echo "Error: PLINK2 installation failed. Binary not found or not executable."
    exit 1
fi

Install RFMix2

In [None]:
%%bash -s "$utils_directory"

utils_directory=$1

# Install required tools (if missing); aclocal and automake are provided by the automake package
for tool in autoconf automake make gcc; do
    if ! command -v $tool &> /dev/null; then
        echo "$tool not found. Installing..."
        sudo apt-get install -y $tool || {
            echo "Failed to install $tool. Exiting."
            exit 1
        }
    else
        echo "$tool is already installed."
    fi
done

# Clone RFMix2 repository
rfmix_dir="$utils_directory/rfmix2"
if [ ! -d "$rfmix_dir" ]; then
    echo "Cloning RFMix2 repository..."
    git clone https://github.com/slowkoni/rfmix.git "$rfmix_dir" || {
        echo "Failed to clone RFMix2 repository. Exiting."
        exit 1
    }

    # Navigate to RFMix2 directory
    if [ -d "$rfmix_dir" ]; then
        cd "$rfmix_dir" || { echo "Failed to enter $rfmix_dir. Exiting."; exit 1; }
    else
        echo "Error: RFMix2 directory not found. Exiting."
        exit 1
    fi

    # Step-by-step generation of configuration files
    echo "Generating build files the long way..."

    # 1. Create aclocal.m4
    echo "Running aclocal..."
    aclocal || {
        echo "Error running aclocal. Exiting."
        exit 1
    }

    # 2. Create config.h.in
    echo "Running autoheader..."
    autoheader || {
        echo "Error running autoheader. Exiting."
        exit 1
    }

    # 3. Create configure script
    echo "Running autoconf..."
    autoconf || {
        echo "Error running autoconf. Exiting."
        exit 1
    }

    # 4. Create Makefile.in
    echo "Running automake with --add-missing..."
    automake --add-missing || {
        echo "Error running automake. Exiting."
        exit 1
    }

    # 5. Configure the build system
    echo "Running ./configure..."
    ./configure || {
        echo "Error running configure. Exiting."
        exit 1
    }

    # 6. Compile the program
    echo "Compiling RFMix2..."
    make || {
        echo "Error running make. Exiting."
        exit 1
    }
    
else
    echo "RFMix2 repository already exists at $rfmix_dir"
fi

# Verify RFMix2 build
if [ -f "$rfmix_dir/rfmix" ]; then
    echo "RFMix2 built successfully and is ready to use."
else
    echo "Error: RFMix2 binary not found. Build failed."
    exit 1
fi

# Exporting a Jupyter Notebook to PDF with Poetry

This guide explains how to **export a Jupyter Notebook (`.ipynb`) to a PDF** using Poetry’s virtual environment.

## **Running the Conversion in the Terminal**
To convert a Jupyter Notebook to PDF, run the following command in the terminal:

```
poetry run jupyter nbconvert --to pdf path/to/notebook.ipynb
```

### **Example:**
If your notebook is named `Lab0_Code_Environment.ipynb` and is stored in the `instructions/` directory, run:

```
poetry run jupyter nbconvert --to pdf instructions/Lab0_Code_Environment.ipynb
```

### **Saving the PDF to a Specific Path**
By default, the PDF will be saved in the **same directory as the input notebook**. To save the output in a different location, use the `--output-dir` option:

```
poetry run jupyter nbconvert --to pdf path/to/notebook.ipynb --output-dir=path/to/save/
```

**Example:**
To save the PDF in the `results/` directory:

```
poetry run jupyter nbconvert --to pdf instructions/Lab0_Code_Environment.ipynb --output-dir=results/
```
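
If you export notebooks often, the command can be assembled programmatically. The sketch below only builds the argument list (it does not run anything), using the `--to pdf` and `--output-dir` options shown above; the helper name `nbconvert_pdf_command` is hypothetical.

```python
from pathlib import Path
from typing import Optional


def nbconvert_pdf_command(notebook: str, output_dir: Optional[str] = None) -> list[str]:
    """Build the `poetry run jupyter nbconvert` command for a PDF export."""
    if Path(notebook).suffix != ".ipynb":
        raise ValueError(f"Expected a .ipynb file, got: {notebook}")
    command = ["poetry", "run", "jupyter", "nbconvert", "--to", "pdf", notebook]
    if output_dir is not None:
        command.append(f"--output-dir={output_dir}")
    return command


print(" ".join(nbconvert_pdf_command("instructions/Lab0_Code_Environment.ipynb", "results/")))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from a session where Poetry is installed.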

---

## **What Happens If We Run This Inside a Jupyter Notebook?**
If you run `!poetry run jupyter nbconvert --to pdf ...` inside a Jupyter Notebook cell, the export may be incomplete: `nbconvert` reads the `.ipynb` file as last saved on disk, so **outputs that have not been saved yet (such as recent inline plots) might not appear in the PDF.**

### **Workaround**
You can run the conversion within a Jupyter Notebook cell if you don't think that is an issue in the notebook you're using. The command to convert the notebook to PDF (also in the cell below) is:
```
!poetry run jupyter nbconvert --to pdf instructions/Lab0_Code_Environment.ipynb
```
However, **it is recommended to run this in the terminal instead** for better stability.

---
By following these steps, you can successfully convert Jupyter Notebooks into PDFs while managing dependencies with Poetry.

Run the following cell here in Jupyter Notebook. To get the PDF, go outside of Ubuntu (e.g., Windows), open file explorer, enter `\\wsl$` in the navigation bar, select Ubuntu, then navigate to the file starting with `home` and `username`.

In [None]:
!poetry run jupyter nbconvert --to pdf instructions/Lab0_Code_Environment.ipynb

# **🚀 Start Your Labs in the Fully Configured Environment!**

---

## ✅ **Setup Completed! Your Environment is Ready.**
Your system has been successfully configured with:

- 📁 **Project directory structure set up**
- 📦 **System packages updated**
- 🛠️ **`~/.local/bin` added to PATH**
- 🔧 **System dependencies installed**
- 🏗 **Poetry installed and configured**
- 🐍 **Project dependencies installed**
- 📚 **Python kernel installed for Jupyter Notebooks**

---

✅ **Your environment is fully set up. Get started on your next lab now!** 🚀
