# R and rpy2 installation guide

This notebook guides users through installing R and the Python package rpy2 in preparation for installing and running the fast_fmm_rpy2 module. It assumes the user is familiar with Python and Jupyter Notebooks.

The goal is to guide Python users to the minimum R installation needed to use rpy2.

## R installation on Mac

Due to older Macs running x86 Intel chips and newer Macs using ARM-based Apple Silicon chips, care must be taken when installing R on Macs.

### Apple Silicon's ARM Architecture vs. Legacy x86 Architecture

Apple Silicon chips (M1, M2, M3, and so on) are based on the ARM architecture, which uses a different instruction set from the x86 architecture of older Intel-based Macs. ARM designs are known for their efficiency and performance per watt, which makes them well suited to mobile and low-power devices. x86 designs, in contrast, have traditionally targeted high performance at the cost of power efficiency.

Because of these architectural differences, R binaries compiled for x86 chips do not run natively on Macs with M-series chips. Rosetta 2 can translate x86 binaries so that they run on ARM, but this can lead to performance degradation and compatibility issues. It is therefore recommended to use binaries compiled specifically for the ARM architecture.

### How to Check if Your Mac Has an Intel or ARM Chip

1. Click on the Apple logo in the top-left corner of your screen.
2. Select "About This Mac" from the dropdown menu.
3. In the window that appears, look for the "Processor" or "Chip" information:
   - If it contains the word "Intel," your Mac has an Intel x86 chip.
   - If it says something like "Apple M1", "Apple M2", or "Apple M3", your Mac has an ARM-based Apple Silicon chip.
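
The same check can be sketched from Python. This is a minimal example, not part of fast_fmm_rpy2; the helper name `describe_mac_chip` is hypothetical. It relies on `platform.machine()`, which reports the architecture the Python interpreter itself is running as, so an x86 build of Python running under Rosetta 2 on an Apple Silicon Mac would still report `x86_64`. Treat "About This Mac" as the authoritative answer.

```python
import platform


def describe_mac_chip(machine: str) -> str:
    """Map a platform.machine() value to the matching chip family."""
    if machine == "arm64":
        return "Apple silicon"
    if machine == "x86_64":
        return "Intel"
    return f"unrecognized ({machine})"


# Architecture of the running Python interpreter (see the Rosetta 2 caveat above)
print(describe_mac_chip(platform.machine()))
```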


### Downloading R PKG from CRAN

Navigate to the [R for macOS](https://cran.r-project.org/bin/macosx/) page.

#### For Apple Silicon
Please download the PKG-file under the heading **For Apple silicon (M1,2,..) Macs:**.

#### For Intel
Please download the PKG-file under the heading **For older Intel Macs:**.

#### Optional: Verify the SHA1 of the downloaded package.
Copy the SHA1 hash string for your package from the CRAN page and paste it into the code cell below to verify that the hash of the downloaded file matches the SHA1 published on the CRAN website.

#### Install PKG

Please install R using the downloaded PKG file. You may need admin permissions to install.

In [None]:
from hashlib import sha1
from pathlib import Path


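def verify_sha1(file_path: str, sha1_target: str) -> tuple[bool, str]:
    """Return whether the file's SHA1 matches sha1_target, plus the computed hash."""
    with open(file_path, 'rb') as f:
        file_hash: str = sha1(f.read()).hexdigest()
    # hexdigest() is lowercase hex, so normalize the pasted string before comparing
    return file_hash == sha1_target.strip().lower(), file_hash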

# PASTE THE SHA1 STRING FROM CRAN BELOW
sha1_target: str = "SHA1_string_from_website"
assert sha1_target != "SHA1_string_from_website", "Please paste in SHA1 string from CRAN website"

# PASTE THE FILEPATH TO THE DOWNLOADED R PKG INSTALLER 
file_path: str = "filepath_to_PKG"
assert Path(file_path).exists(), f"{file_path} was not found!"
is_valid: bool
file_hash: str
is_valid, file_hash = verify_sha1(file_path, sha1_target)


if is_valid:
    print("Installer is valid!")
else:
    print("WARNING: SHA1 string does not match!")
    print(f"Generated file hash: {file_hash}")
    print(f"Expected SHA1: {sha1_target}")


### Xcode

Xcode is Apple's integrated development environment (IDE) that includes a suite of software development tools for macOS. It is essential for compiling and building software on macOS, including R and its packages.

#### Reasons for Installing Xcode:

1. **Command Line Tools**: Xcode provides command line tools (such as `gcc`, `make`, and `clang`) that are necessary for compiling R from source and for installing R packages that contain C, C++, or Fortran code.

2. **Package Installation**: Many R packages require compilation during installation. Without Xcode's command line tools, these packages cannot be compiled and installed properly.

3. **Development**: For users who develop R packages or need to compile custom code, Xcode provides the necessary tools and libraries to support development on macOS.

To install Xcode command line tools, you can run the following command in the terminal:

```bash
xcode-select --install
```
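
To check from Python whether the command line tools are already present, one option is to call `xcode-select -p`, which prints the active developer directory and exits with a non-zero status when none is set. The sketch below is an illustration only; the function name `command_line_tools_path` is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def command_line_tools_path(command: str = "xcode-select") -> Optional[str]:
    """Return the active developer directory, or None if it cannot be determined."""
    if shutil.which(command) is None:
        return None  # not on macOS, or the command is not on PATH
    result = subprocess.run([command, "-p"], capture_output=True, text=True)
    if result.returncode != 0:
        return None  # xcode-select reports an error when no tools are installed
    return result.stdout.strip()


path = command_line_tools_path()
print(path if path else "Command line tools not found; run: xcode-select --install")
```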

### gfortran compiler

There is a "universal" compiler that works for both x86 and ARM architectures that is [recommended by R](https://cran.r-project.org/bin/macosx/tools/). Please ensure a gfortran compiler is installed prior to installing `fastFMM`.
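
A quick, rough way to see whether `gfortran` is reachable is a PATH lookup from Python. This is a sketch under assumptions: the helper name `missing_tools` is hypothetical, and a tool that is missing from PATH may still be installed in a directory that R is configured to use, so a "missing" result is a prompt to investigate rather than proof of absence.

```python
import shutil


def missing_tools(tools: list[str]) -> list[str]:
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools(["gfortran"])
if missing:
    print(f"Not found on PATH: {', '.join(missing)}")
else:
    print("gfortran found on PATH.")
```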

### XQuartz Install

XQuartz is an open-source version of the X.Org X Window System that runs on macOS. It provides a graphical environment for Unix-based applications, including R, to display graphical output. To install XQuartz, you can download it from [XQuartz's official website](https://www.xquartz.org/) and follow the installation instructions.

## fastFMM installation in R

Once R is installed, the R package fastFMM must be installed before the Python package fast_fmm_rpy2 can be used.

### fastFMM CRAN Package Installation

fastFMM is available as a [CRAN Package](https://cran.r-project.org/web/packages/fastFMM/index.html). Install it from an R session:

```R
install.packages('fastFMM', dependencies = TRUE)
```
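
To confirm from Python that R can find the package, one approach is to ask `Rscript` directly. The sketch below assumes `Rscript` is on PATH; the function name `r_package_installed` is hypothetical. It uses R's `requireNamespace()`, which returns `TRUE` or `FALSE`, and returns `None` when `Rscript` cannot be run at all.

```python
import shutil
import subprocess
from typing import Optional


def r_package_installed(package: str, rscript: str = "Rscript") -> Optional[bool]:
    """Return True/False as reported by R, or None if Rscript is unavailable."""
    if shutil.which(rscript) is None:
        return None
    # requireNamespace() returns TRUE or FALSE; cat() writes that value to stdout
    expression = f"cat(requireNamespace('{package}', quietly = TRUE))"
    result = subprocess.run([rscript, "-e", expression], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return result.stdout.strip().endswith("TRUE")


print(r_package_installed("fastFMM"))
```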

## Install fast_fmm_rpy2 package

In the project's root directory run:

using pip:

```bash
pip install .
```

using [uv](https://docs.astral.sh/uv/):
```bash
uv sync
```

using uv but installing development and pytest dependencies:
```bash
uv sync --all-extras
```

## Testing Installation

To run the pytest tests, first install pytest:

```bash
pip install pytest
```

Then, from the project root, run:

```bash
pytest ./tests
```

### Floating point from text complications

The rpy2 implementation of fast_fmm_rpy2 uses pandas to read in CSV files. The numeric strings in the CSV file are converted to floating point numbers using the round-trip converter (`float_precision='round_trip'`); see the [read_csv docs](https://pandas.pydata.org/docs/dev/reference/api/pandas.read_csv.html). This converter matched the `read.csv` function in R on a MacBook Pro with an M3 chip. It may NOT match on your machine. The test in `tests/test_read_csv.py` checks whether the floating point numbers parsed from the provided CSVs all match between pandas and R. Please use these tests to determine whether subtle differences may occur when you run FMM in R versus fast_fmm_rpy2.
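
To illustrate what "all match" means here, the sketch below compares floating point values bit for bit rather than approximately. It is not the project's test; the helper names `same_bits` and `count_mismatches` are hypothetical, and `math.nextafter` requires Python 3.9 or later. Two values that differ by a single unit in the last place look identical when printed to a few digits, yet they are different numbers.

```python
import math
import struct


def same_bits(a: float, b: float) -> bool:
    """True only if a and b have identical IEEE 754 double bit patterns."""
    return struct.pack("<d", a) == struct.pack("<d", b)


def count_mismatches(left: list[float], right: list[float]) -> int:
    """Count positions where two equally long columns differ in any bit."""
    if len(left) != len(right):
        raise ValueError("columns must have the same length")
    return sum(not same_bits(a, b) for a, b in zip(left, right))


parsed = [float(text) for text in ["0.1", "2.675", "1e-7"]]
# Nudge one value to the next representable double to mimic a parser difference
nudged = [parsed[0], math.nextafter(parsed[1], math.inf), parsed[2]]

print(count_mismatches(parsed, parsed))  # 0
print(count_mismatches(parsed, nudged))  # 1
```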




## R Installation on Windows

While the core functionality of running fui from fastFMM is possible on a Windows machine, there is a known issue with how floating point numbers are parsed from CSV files. This results in small differences between the results of fastFMM in R and those of fast_fmm_rpy2.

Download the R installer for Windows from the [R for Windows Downloads Page](https://cran.r-project.org/bin/windows/base/). Note that the download page links to the md5sum fingerprint, which can be used to verify the R Windows installer with the cell below.

In [None]:
from hashlib import md5
from pathlib import Path


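def verify_md5(file_path: str, md5_target: str) -> tuple[bool, str]:
    """Return whether the file's MD5 matches md5_target, plus the computed hash."""
    with open(file_path, 'rb') as f:
        file_hash: str = md5(f.read()).hexdigest()
    # hexdigest() is lowercase hex, so normalize the pasted string before comparing
    return file_hash == md5_target.strip().lower(), file_hash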

# PASTE THE MD5 FINGERPRINT FROM THE R WINDOWS DOWNLOAD PAGE BELOW
md5_target: str = "MD5_fingerprint_from_website"
assert md5_target != "MD5_fingerprint_from_website", "Please paste in MD5 string from R Windows Download website"

# PASTE THE FILEPATH TO THE DOWNLOADED R WINDOWS INSTALLER
file_path: str = r"filepath_to_installer"
assert Path(file_path).exists(), f"{file_path} was not found!"
is_valid: bool
file_hash: str
is_valid, file_hash = verify_md5(file_path, md5_target)


if is_valid:
    print("Installer is valid!")
else:
    print("WARNING: MD5 string does not match!")
    print(f"Generated file hash: {file_hash}")
    print(f"Expected MD5: {md5_target}")


## Alias Issue

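Note that `R` is an alias in PowerShell: `r` is a built-in alias for `Invoke-History`, and PowerShell command names are case-insensitive. Even if you add R to PATH, the `R` command will not start R unless the alias is first removed. Use `R.exe` after adding R to PATH.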

## R Windows fastFMM installation


```R
install.packages('fastFMM', dependencies = TRUE)
```

## Set R_HOME environment variable

Determine the R_HOME directory using R:

```powershell
R.exe RHOME
```

Then set the `R_HOME` environment variable in Windows to the directory printed by that command, so that rpy2 can locate the R installation.
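
As an alternative for a single session, `R_HOME` can be set from Python itself. The sketch below is an assumption-laden illustration, not part of fast_fmm_rpy2: the helper name `find_r_home` is hypothetical, it assumes R is on PATH, and the value it sets applies only to the current Python process. It should run before rpy2 is imported; a persistent setting still has to be made in the Windows environment variable settings.

```python
import os
import shutil
import subprocess
from typing import Optional


def find_r_home(r_command: str = "R") -> Optional[str]:
    """Ask R for its home directory; return None if R cannot be run."""
    executable = shutil.which(r_command)  # resolves R.exe on Windows via PATHEXT
    if executable is None:
        return None
    result = subprocess.run([executable, "RHOME"], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


r_home = find_r_home()
if r_home and "R_HOME" not in os.environ:
    # Affects only this Python process; do this before importing rpy2
    os.environ["R_HOME"] = r_home
print(os.environ.get("R_HOME", "R_HOME is not set"))
```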


## Install fast_fmm_rpy2 package

In the project's root directory run:

using pip:

```bash
pip install .
```

using pip to install the development and pytest dependencies:

```bash
pip install .[dev]
```

using [uv](https://docs.astral.sh/uv/):
```bash
uv sync
```

using uv but installing development and pytest dependencies:
```bash
uv sync --all-extras
```

## Purpose of the pytest tests

The pytest tests check that running fastFMM in R and running it in Python through the fast_fmm_rpy2 package give the same results. Specifically, they test the following:

1. The floating point numbers parsed from the provided CSVs all match between pandas and R.
2. The results from running fastFMM in R are the same as those from running fastFMM in Python using the fast_fmm_rpy2 package.

The tests are designed to be run from the root of the repository:

```bash
pytest
```

### Known Issues

On Windows, the tests may fail because of small differences in how floating point numbers are parsed from CSV files.

