Google Colab's virtual machine includes R by default, but the default runtime is set up for Python. You can still access R, for example through the `%%R` cell magic provided by rpy2 (which is pre-installed), or you can switch to an R kernel if you prefer an all-R notebook. psy-data-tool needs a Python kernel: rpy2 lets Python communicate with R, so you can use both languages and their features in the same notebook.

First, check that rpy2 is available in the remote kernel (it should be by default). You need it before you can check for R from a Python runtime; in an R-specific runtime this step is unnecessary.

In [1]:
# Check rpy2
import rpy2
print(rpy2.__version__)

# load rpy2
%load_ext rpy2.ipython

3.5.17


Loading the extension enables the `%R` (line) and `%%R` (cell) magic commands in Python cells; see the documentation about IPython magics for details. For example, the following cell should print the installed R version:


In [2]:
%%R
# Check the installed R version (a cell magic must be the first line of the cell)
R.version.string

[1] "R version 4.4.2 (2024-10-31)"
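If you prefer a plain-Python check that does not rely on magics, the sketch below reports whether rpy2 can be imported and whether an `R` executable is on the `PATH` (and, if so, the first line of `R --version`). It uses only the standard library; the helper name `preflight` is hypothetical.

```python
import importlib.util
import shutil
import subprocess


def preflight():
    """Report whether rpy2 is importable and whether R is on the PATH."""
    status = {
        "rpy2": importlib.util.find_spec("rpy2") is not None,
        "R": shutil.which("R"),  # full path to the R executable, or None
        "R_version": None,
    }
    if status["R"]:
        # `R --version` prints the version banner on its first line.
        out = subprocess.run([status["R"], "--version"],
                             capture_output=True, text=True)
        if out.returncode == 0 and out.stdout:
            status["R_version"] = out.stdout.splitlines()[0]
    return status


print(preflight())
```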


Clone the psy-data-tool repository, removing any previous copy first:

In [3]:
!rm -rf psy-data-tool
!git clone https://github.com/francesco-gariboldi/psy-data-tool.git

Cloning into 'psy-data-tool'...
remote: Enumerating objects: 148, done.
remote: Counting objects: 100% (148/148), done.
remote: Compressing objects: 100% (109/109), done.
remote: Total 148 (delta 89), reused 90 (delta 39), pack-reused 0 (from 0)
Receiving objects: 100% (148/148), 258.18 KiB | 3.00 MiB/s, done.
Resolving deltas: 100% (89/89), done.


Let's move into the cloned project directory on the remote server:

In [4]:
# Note that !cd doesn’t work for this purpose because the shell where !command
# runs is immediately discarded after executing ‘command’.
%cd psy-data-tool

/content/psy-data-tool
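The difference between `!cd` and `%cd` can be reproduced in plain Python: `subprocess.run(..., shell=True)` behaves like `!command` (a throwaway child shell), while `os.chdir` changes the directory of the kernel process itself, which is what `%cd` does. A minimal sketch using a temporary directory:

```python
import os
import subprocess
import tempfile

start = os.getcwd()
target = tempfile.mkdtemp()

# Like `!cd target`: a child shell changes *its own* directory, then exits.
subprocess.run(f'cd "{target}"', shell=True, check=True)
after_shell_cd = os.getcwd()  # unchanged: this process is still in `start`

# Like `%cd target`: change the working directory of this Python process.
os.chdir(target)
after_chdir = os.getcwd()  # now `target`

os.chdir(start)  # restore the original directory
print(after_shell_cd == start, os.path.samefile(after_chdir, target))  # True True
```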


To avoid possible conflicts on the remote machine, you could isolate the Python packages in a virtual environment. For simplicity, this notebook installs them directly into the environment that backs the Colab kernel.

Next, install all the Python packages at once from the `requirements.txt` file included in the repository.

_(While `!pip install rpy2` works too, `%pip install rpy2` is often recommended within notebooks because it ensures installation into the environment backing the current IPython kernel.)_
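`%pip` runs pip through the kernel's own interpreter (`sys.executable -m pip ...`), which is why it targets the right environment. The sketch below builds that command without running it and checks whether the kernel runs inside a virtual environment, using the standard `sys.prefix != sys.base_prefix` test for `venv`. Both helper names are hypothetical.

```python
import shlex
import sys


def in_virtualenv():
    """True when the interpreter runs inside a venv (or modern virtualenv)."""
    return sys.prefix != sys.base_prefix


def pip_install_cmd(requirements="requirements.txt"):
    """pip command bound to *this* interpreter, as %pip does."""
    return [sys.executable, "-m", "pip", "install", "-r", requirements]


print("inside a virtual environment:", in_virtualenv())
print(shlex.join(pip_install_cmd()))
# To actually run it: subprocess.run(pip_install_cmd(), check=True)
```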

In [5]:
%pip install -r requirements.txt



Now you can check which Python and pip you are using:

In [6]:
!which python
!python --version

!which pip
!pip --version

/usr/local/bin/python
Python 3.11.11
/usr/local/bin/pip
pip 24.1.2 from /usr/local/lib/python3.11/dist-packages/pip (python 3.11)
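Note that `!which python` reports the first `python` on the shell's `PATH`, which is not guaranteed to be the interpreter running the kernel. A quick in-kernel comparison using only the standard library (the pip lookup assumes pip is installed as a regular distribution):

```python
import shutil
import sys
from importlib import metadata

shell_python = shutil.which("python")  # what `!which python` would report
kernel_python = sys.executable         # interpreter actually running this kernel

try:
    pip_version = metadata.version("pip")
except metadata.PackageNotFoundError:
    pip_version = None

print("shell python :", shell_python)
print("kernel python:", kernel_python, sys.version.split()[0])
print("pip          :", pip_version)
```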


Usually the correct `R_HOME` is detected automatically. If you run into issues, you can set it explicitly from Python.

(This path may vary depending on how R was installed, but for most Debian/Ubuntu-based systems, `/usr/lib/R` is correct.)

Let's verify how `R_HOME` is set on the remote system. In IPython `!` commands, `$name` is replaced with the value of the Python variable `name`; to pass a literal `$` to the shell (and so read a shell environment variable), double it, as in `$$R_HOME`.

You can check `R_HOME` in two ways: `!echo $$R_HOME` or `%env R_HOME`.

If it is not set correctly, set it manually to `/usr/lib/R`:

In [7]:
import os
os.environ['R_HOME'] = '/usr/lib/R'

In [8]:
# Let's check if it has been correctly set
!echo $$R_HOME

/usr/lib/R
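rpy2 reads `R_HOME` when it first initializes the embedded R, so it is safest to set the variable before `rpy2.robjects` is imported or `%load_ext rpy2.ipython` is run. The sketch below (hypothetical helper `detect_r_home`) keeps an existing value, otherwise asks R itself via `R RHOME`, and only then falls back to `/usr/lib/R`.

```python
import os
import shutil
import subprocess


def detect_r_home(default="/usr/lib/R"):
    """Return R_HOME from the environment, else from `R RHOME`, else `default`."""
    if os.environ.get("R_HOME"):
        return os.environ["R_HOME"]
    r_exe = shutil.which("R")
    if r_exe:
        # `R RHOME` prints the home directory of that R installation.
        out = subprocess.run([r_exe, "RHOME"], capture_output=True, text=True)
        if out.returncode == 0 and out.stdout.strip():
            return out.stdout.strip()
    return default


# Only fill in R_HOME when it is missing; do this before rpy2 starts R.
os.environ.setdefault("R_HOME", detect_r_home())
print(os.environ["R_HOME"])
```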


In [9]:
%%R
# Get/set the library trees within which packages are looked for.
.libPaths()

[1] "/usr/local/lib/R/site-library" "/usr/lib/R/site-library"      
[3] "/usr/lib/R/library"           


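We'll install the R packages in `/usr/lib/R/site-library`. Note that when `lib` is not specified, `install.packages()` uses the first entry of `.libPaths()`, here `/usr/local/lib/R/site-library`.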
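Installing into a library tree requires write access to it (Colab sessions normally run as root, so this usually holds). A hedged sketch that reports which of the library trees printed by `.libPaths()` are writable by the current user; the paths are copied from that output and the helper name `writable_libs` is hypothetical.

```python
import os

# Library trees as printed by .libPaths() in the cell above.
R_LIBS = [
    "/usr/local/lib/R/site-library",
    "/usr/lib/R/site-library",
    "/usr/lib/R/library",
]


def writable_libs(paths):
    """Keep only existing directories the current user can write to."""
    return [p for p in paths if os.path.isdir(p) and os.access(p, os.W_OK)]


print(writable_libs(R_LIBS))
```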

In [10]:
import rpy2.robjects as robjects
import rpy2.robjects.packages as rpackages
from rpy2.robjects.vectors import StrVector
from rpy2.robjects.conversion import localconverter

Install the system libraries that Colab needs in order to compile the required R packages from source:

In [11]:
!apt-get update
!apt-get install -y \
    libcurl4-openssl-dev \
    libssl-dev \
    libxml2-dev \
    libfontconfig1-dev \
    libfreetype6-dev \
    libharfbuzz-dev \
    libfribidi-dev \
    g++

Hit:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Get:2 https://r2u.stat.illinois.edu/ubuntu jammy InRelease [6,555 B]
Hit:3 http://security.ubuntu.com/ubuntu jammy-security InRelease
Hit:4 http://archive.ubuntu.com/ubuntu jammy InRelease
Hit:5 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease
Hit:6 http://archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:7 http://archive.ubuntu.com/ubuntu jammy-backports InRelease
Hit:8 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Hit:9 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Hit:10 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Fetched 6,555 B in 2s (3,532 B/s)
Reading package lists... Done
W: Skipping acquire of configured file 'main/source/Sources' as repository 'https://r2u.stat.illinois.edu/ubuntu jammy InRelease' does not seem to provide it (sources.list entry misspelt?)
Reading package lists... D

The following step (installing the R packages) can take several minutes to complete. Running it with `verbose=TRUE` makes errors easier to debug.

In [12]:
%%R
install.packages(c('future', 'furrr', 'broom.mixed', 'gglm', 'performance'),
  repos="http://cran.r-project.org",
  type="source",
  verbose=TRUE,
  INSTALL_opts=c("--no-lock", "--no-build-vignettes")
)

Installing packages into ‘/usr/local/lib/R/site-library’
(as ‘lib’ is unspecified)
system (cmd0): /usr/lib/R/bin/R CMD INSTALL --no-lock --no-build-vignettes
trying URL 'http://cran.r-project.org/src/contrib/future_1.34.0.tar.gz'
Content type 'application/x-gzip' length 359503 bytes (351 KB)
downloaded 351 KB

trying URL 'http://cran.r-project.org/src/contrib/furrr_0.3.1.tar.gz'
Content type 'application/x-gzip' length 907668 bytes (886 KB)
downloaded 886 KB

trying URL 'http://cran.r-project.org/src/contrib/broom.mixed_0.2.9.6.tar.gz'
Content type 'application/x-gzip' length 5147791 bytes (4.9 MB)
downloaded 4.9 MB

trying URL 'http://cran.r-project.org/src/contrib/gglm_1.0.3.tar.gz'
Content type 'application/x-gzip' length 140821 bytes (137 KB)
downloaded 137 KB

trying URL 'http://cran.r-project.org/src/contrib/performance_0.13.0.tar.gz'
Content type 'application/x-gzip' length 2165084 bytes (2.1 MB)
downloaded 2.1 MB

foundpkgs: future, furrr, broom.mixed, gglm, performance, /tmp/R

In [None]:
%%R
# You can install with verbose output using the optional argument "verbose=TRUE"
# to assess potential problems
install.packages(c('lme4', 'lmerTest', 'emmeans', 'geepack',
                   'performance', 'ggplot2','gglm', 'gridExtra'),
                 lib="/usr/lib/R/site-library", repos="http://cran.r-project.org",
                 verbose=TRUE)

_(Alternatively, from Python you can run `utils = rpy2.robjects.packages.importr('utils')` and then `utils.install_packages(...)`, whichever workflow you prefer.)_
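A sketch of the Python-side workflow, assuming rpy2 3.x (`importr`, `isinstalled` and `StrVector` from `rpy2.robjects`): it skips packages that are already installed, so re-running the cell is cheap. The helpers `missing_packages` and `install_missing_r_packages` are hypothetical, and the `https://cloud.r-project.org` mirror is an assumption; rpy2 is imported lazily so the pure helper also works without it.

```python
def missing_packages(packages, is_installed):
    """Deduplicate `packages` (keeping order) and drop those already installed."""
    return [p for p in dict.fromkeys(packages) if not is_installed(p)]


def install_missing_r_packages(packages, lib=None,
                               repos="https://cloud.r-project.org"):
    """Install only the R packages that are not yet available (needs rpy2 + R)."""
    from rpy2.robjects.packages import importr, isinstalled
    from rpy2.robjects.vectors import StrVector

    todo = missing_packages(packages, isinstalled)
    if not todo:
        return []
    kwargs = {"repos": repos}
    if lib is not None:
        kwargs["lib"] = lib  # otherwise R uses .libPaths()[1]
    utils = importr("utils")
    utils.install_packages(StrVector(todo), **kwargs)
    return todo


# Usage in the notebook (same package list as the %%R cell above):
# install_missing_r_packages(
#     ['lme4', 'lmerTest', 'emmeans', 'geepack',
#      'performance', 'ggplot2', 'gglm', 'gridExtra'],
#     lib="/usr/lib/R/site-library")
```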

# Setup completed
You can now run the actual analysis code, i.e. the content of the `psy-data-tool.ipynb` file.

There are several ways to load files into the Colab environment. To keep things simple, we'll upload the data file by drag and drop.

Drag your files (e.g., `data.csv`) into the file browser pane to upload them to the Colab runtime session.

In [None]:
import os

print(os.getcwd())
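The data path used in the next cell (`../world-happiness-report-2021.csv`) depends on the working directory: uploads usually land in `/content`, one level above `/content/psy-data-tool`. A hypothetical helper that searches a few candidate directories makes the lookup independent of the current directory:

```python
import os


def find_data_file(name, search_dirs=(".", "..", "/content")):
    """Return the absolute path of the first match for `name`, else raise."""
    for d in search_dirs:
        candidate = os.path.join(d, name)
        if os.path.isfile(candidate):
            return os.path.abspath(candidate)
    raise FileNotFoundError(f"{name!r} not found in {list(search_dirs)}")


# Example: csv_path = find_data_file("world-happiness-report-2021.csv")
```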

In [None]:
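import pandas as pd

# Load your data: drag-and-drop uploads are usually stored in /content, one level
# above the current directory (/content/psy-data-tool).
df = pd.read_csv("../world-happiness-report-2021.csv")

# Standardize column names to lowercase snake_case ('Ladder score' -> 'ladder_score').
# regex=False makes pandas treat '(' and ')' as literal characters.
df.columns = (df.columns.str.replace('%', '', regex=False)
              .str.replace('(', '', regex=False).str.replace(')', '', regex=False)
              .str.strip().str.lower().str.replace(' ', '_', regex=False))
df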
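The same renaming rule applied to a single string is handy when translating a column name from the original CSV into the snake_case argument that `xplore_data` expects. A minimal sketch (the name `to_snake_case` is hypothetical) that mirrors the pandas chain above step by step:

```python
def to_snake_case(name):
    """Mirror the pandas column-renaming chain for a single column name."""
    for ch in "%()":
        name = name.replace(ch, "")
    return name.strip().lower().replace(" ", "_")


print(to_snake_case("Weighted Frequency"))  # weighted_frequency
print(to_snake_case("Ladder score"))        # ladder_score
```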

In [None]:
# Let's look at the data types
df.dtypes

Now that you have seen the dataframe, let the `xplore_data`
function do all the hard work:

In [None]:
from xplore_data import xplore_data

# Selected variables must be written in snake_case (for example, if a variable in the
# original dataframe is 'Weighted Frequency', pass it as 'weighted_frequency').

# If, for a categorical variable, our sample contains all the levels that can be
# observed in the population, the variable should enter the model as a fixed effect.
# If the levels in our dataset (e.g. identification codes) are only a sample of
# those in the entire population, it should enter the model as a random effect.
# Automatic model generation cannot make this choice a priori, but we can check
# the models that are selected at the end of the program.

# In Colab, models will be stored in "/content/psy-data-tool/models.json"

response_var, predictor_vars, best_models, df_r = xplore_data(
    df,
    response_var="ladder_score",
    predictor_vars=["freedom_to_make_life_choices",
                    "healthy_life_expectancy",
                    "regional_indicator"],
    print_r_warnings=True,
)
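To make the fixed/random distinction concrete, the sketch below builds lme4-style formula strings (lme4 is among the R packages installed above): a categorical predictor whose levels cover the whole population enters as an ordinary fixed term, while a sampled grouping factor enters as a random intercept `(1 | group)`. The helper `build_formula` is hypothetical and is not necessarily how `xplore_data` generates its models internally.

```python
def build_formula(response, fixed, random_intercepts=()):
    """Build an lme4-style formula string, e.g. 'y ~ x + (1 | g)'."""
    terms = list(fixed) + [f"(1 | {g})" for g in random_intercepts]
    if not terms:
        terms = ["1"]  # intercept-only model
    return f"{response} ~ {' + '.join(terms)}"


# regional_indicator as a fixed effect (all regions are observed):
print(build_formula("ladder_score",
                    ["freedom_to_make_life_choices", "regional_indicator"]))
# regional_indicator as a random intercept (regions treated as a sample):
print(build_formula("ladder_score", ["freedom_to_make_life_choices"],
                    random_intercepts=["regional_indicator"]))
```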