Running Python Models in R

We use the reticulate package to call Python from R.

library(reticulate)

Prerequisites

For these methods to work, you will need to point to a Python executable in a Conda environment or Virtualenv that contains all the Python packages you need. You can do this with a .Rprofile file in your project directory; the .Rprofile file in this project shows how I have done it.
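As an illustration (not necessarily the exact contents of this project's .Rprofile), a minimal setup might set the RETICULATE_PYTHON environment variable before reticulate initializes; the path below is an assumption and should point at your own environment's interpreter:

# Hypothetical .Rprofile contents: tell reticulate which Python to use
# before it initializes. The path is an assumption -- substitute the
# interpreter from your own Conda environment or Virtualenv.
Sys.setenv(RETICULATE_PYTHON = "~/miniconda3/envs/r_and_py_models/bin/python")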

Write Python functions to run on a data set in R

In the file python_functions.py I have written the required functions in Python to fit an XGBoost model on an arbitrary data set. We expect all the parameters for these functions to be in a single dict called parameters. I am now going to source these functions into R so that they become R functions expecting equivalent data structures.

source_python("python_functions.py")
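As an optional check (not part of the original workflow), you can confirm which interpreter reticulate has bound to:

reticulate::py_config()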

Example: Using XGBoost in R

We now use these Python functions on a wine dataset in R to try to learn to predict high-quality wines.

First we download data sets for white wines and red wines.

white_wines <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-white.csv",
                        sep = ";")
red_wines <- read.csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv", 
                      sep = ";")

We will create ‘white versus red’ as a new feature, and we will define ‘High Quality’ to be a quality score of seven or more.

library(dplyr)

white_wines$red <- 0
red_wines$red <- 1

wine_data <- white_wines %>% 
  bind_rows(red_wines) %>% 
  mutate(high_quality = ifelse(quality >= 7, 1, 0)) %>% 
  select(-quality)

knitr::kable(head(wine_data))
| fixed.acidity | volatile.acidity | citric.acid | residual.sugar | chlorides | free.sulfur.dioxide | total.sulfur.dioxide | density | pH | sulphates | alcohol | red | high_quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7.0 | 0.27 | 0.36 | 20.7 | 0.045 | 45 | 170 | 1.0010 | 3.00 | 0.45 | 8.8 | 0 | 0 |
| 6.3 | 0.30 | 0.34 | 1.6 | 0.049 | 14 | 132 | 0.9940 | 3.30 | 0.49 | 9.5 | 0 | 0 |
| 8.1 | 0.28 | 0.40 | 6.9 | 0.050 | 30 | 97 | 0.9951 | 3.26 | 0.44 | 10.1 | 0 | 0 |
| 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47 | 186 | 0.9956 | 3.19 | 0.40 | 9.9 | 0 | 0 |
| 7.2 | 0.23 | 0.32 | 8.5 | 0.058 | 47 | 186 | 0.9956 | 3.19 | 0.40 | 9.9 | 0 | 0 |
| 8.1 | 0.28 | 0.40 | 6.9 | 0.050 | 30 | 97 | 0.9951 | 3.26 | 0.44 | 10.1 | 0 | 0 |

Now we set our list of parameters (a named list in R converts to a dict in Python):

params <- list(
  input_cols = colnames(wine_data)[colnames(wine_data) != 'high_quality'],
  target_col = 'high_quality',
  test_size = 0.3,
  random_state = 123,
  subsample = (3:9)/10, 
  xgb_max_depth = 3:9,
  colsample_bytree = (3:9)/10,
  xgb_min_child_weight = 1:4,
  k = 3,
  k_shuffle = TRUE,
  n_iter = 10,
  scoring = 'f1',
  error_score = 0,
  verbose = 1,
  n_jobs = -1
)
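As an optional aside, you can see the Python object a named list becomes by converting it manually with reticulate's r_to_py():

# optional check: a named R list converts to a Python dict
reticulate::r_to_py(list(test_size = 0.3, k = 3))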

Now we are ready to run our XGBoost model with 3-fold cross validation. First we split the data:

split <- split_data(df = wine_data,  parameters = params)
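The element names of the resulting list depend on what split_data returns; judging from how they are used below, it holds X_train, X_test, y_train and y_test, which you can confirm with:

# element names inferred from their use later in this document
names(split)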

This produces a list, which we can feed into our scaling function:

scaled <- scale_data(split$X_train, split$X_test)

Now we can run the XGBoost algorithm with the defined parameters on our training set:

trained <- train_xgb_crossvalidated(
  scaled$X_train_scaled,
  split$y_train,
  parameters = params
)
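The parameter ranges together with n_iter and scoring suggest a randomized hyperparameter search under the hood. If the object returned by train_xgb_crossvalidated is a fitted scikit-learn search object (an assumption; the function may instead return the best estimator), reticulate exposes its Python attributes via $:

# assumption: `trained` wraps a fitted search object such as RandomizedSearchCV;
# if the function returns the best estimator instead, these attributes won't exist
trained$best_params_   # best hyperparameter combination found
trained$best_score_    # best mean cross-validated score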

Finally we can generate a classification report on our test set:

report <- generate_classification_report(trained, scaled$X_test_scaled, split$y_test)

knitr::kable(report)
|              | precision | recall    | f1-score  |
|--------------|-----------|-----------|-----------|
| 0.0          | 0.8859915 | 0.9377407 | 0.9111319 |
| 1.0          | 0.6777409 | 0.5204082 | 0.5887446 |
| accuracy     | 0.8538462 | 0.8538462 | 0.8538462 |
| macro avg    | 0.7818662 | 0.7290744 | 0.7499382 |
| weighted avg | 0.8441278 | 0.8538462 | 0.8463238 |
