<a href="https://colab.research.google.com/github/hyojin13/data_analysis/blob/main/14_ANOVA_usingR.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

How to run R within Python?

1. You can run R within Python by using the rpy2 library, which provides a bridge for running R code and using R libraries from Python.

2. Steps to set up R and its necessary packages:

    Ensure R is installed on your system. You can download it from the CRAN website.
  

 - [**Download R CRAN Mirrors**](https://cran.yu.ac.kr/)




Install any necessary R packages you intend to use.

1. Base R Packages:

    **stats:** This comes with the base R installation and includes functions for ANOVA and GLM.

2. Additional R Packages:

    i) **car:** Companion to Applied Regression, useful for more advanced ANOVA methods and diagnostic tests.

    ii) **multcomp:** Tools for multiple comparisons (post-hoc tests).

    iii) **emmeans:** Estimated marginal means (EMMs), also known as least-squares means, for various model objects.

    iv) **ggplot2:** For creating enhanced visualizations.

Steps to install and load the necessary R packages:

1. Open an R session.

2. Use the install.packages() function to install the necessary packages, then load each one with library().

## 🚘 **Directly in an R Environment**
When working directly in R, you type the commands into your R console or R script.
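As a small illustration, the sketch below builds the R commands for installing and loading the packages listed earlier. It is written in Python only to match the code used later in this notebook. The helper name `r_install_commands` is hypothetical, and the printed lines are what you would type or paste into the R console.

```python
def r_install_commands(packages):
    """Build R commands that install and then load the given packages.

    Hypothetical helper for illustration; it only produces text.
    """
    quoted = ", ".join(f'"{name}"' for name in packages)
    lines = [f"install.packages(c({quoted}))"]
    # library() loads one package per call.
    lines += [f"library({name})" for name in packages]
    return "\n".join(lines)


# Packages named in the list above (stats ships with base R, so it is omitted).
print(r_install_commands(["car", "multcomp", "emmeans", "ggplot2"]))
```

Installing several packages in one `install.packages(c(...))` call is a convenience; installing them one at a time works just as well.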

# 💊 <font color = 'red'>**WARNING**</font>

**Check for Special Characters:** Ensure there are no **non-ASCII** characters in your command. Copying and pasting code from different sources can introduce hidden or look-alike characters, such as curly quotes in place of straight quotes.
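One way to check a pasted command is to scan it for characters outside the ASCII range. The sketch below is a minimal example. The function name `find_non_ascii` and the sample command are made up for illustration.

```python
import unicodedata


def find_non_ascii(command):
    """Return (index, character, Unicode name) for every non-ASCII character."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(command)
        if ord(ch) > 127
    ]


# Hypothetical pasted command: the quotes are curly (U+201C, U+201D), not straight.
pasted = "install.packages(\u201ccar\u201d)"

for index, char, name in find_non_ascii(pasted):
    print(f"position {index}: {char!r} ({name})")

# Replace the curly quotes with straight ones before running the command in R.
cleaned = pasted.replace("\u201c", '"').replace("\u201d", '"')
print(cleaned)
```

An empty result from `find_non_ascii` means the command contains only ASCII characters.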

How to interpret the **aov()** function:

- Format: aov(dependent variable name ~ independent variable name, data = data frame in use)

- It performs an analysis of variance to test how the dependent variable is affected by the independent variable, using the data stored in the data frame passed to the data argument (dt in this example).

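To make the quantities behind a one-way analysis of variance concrete, the sketch below computes the degrees of freedom, sums of squares, mean squares, and F value by hand in plain Python. It is not how aov() is implemented. The function name `one_way_anova` and the toy numbers are invented for illustration, and the p-value is left out because it requires the F distribution.

```python
from statistics import mean


def one_way_anova(groups):
    """One-way ANOVA quantities for a dict mapping group label -> list of values."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)

    # Between-group sum of squares: how far each group mean is from the grand mean.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2 for values in groups.values()
    )
    # Within-group (residual) sum of squares: spread of values around their group mean.
    ss_within = sum(
        (v - mean(values)) ** 2 for values in groups.values() for v in values
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "df": (df_between, df_within),
        "sum_sq": (ss_between, ss_within),
        "mean_sq": (ms_between, ms_within),
        "F": ms_between / ms_within,
    }


# Toy data (made up): three groups of three observations each.
toy = {"a": [1, 2, 3], "b": [2, 3, 4], "c": [5, 6, 7]}
result = one_way_anova(toy)
print(result["df"], result["F"])
```

The first element of each pair corresponds to the factor row of an ANOVA table and the second to the Residuals row.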

### **Some symbols in R useful to know:**

the **~** symbol: It is essential for defining relationships between variables in statistical models. In a model formula, it indicates that the dependent (response) variable on the left is modeled as a function of the independent (factor) variable on the right.

the **$** symbol: It extracts a component of a list or a column of a data frame. For example, data$vot refers to the vot column of the data frame named data.


# [**R script base for one-way ANOVA**](https://raw.githubusercontent.com/ms624atyale/Data_Misc/main/R_scriptBase4_one_way_ANOVA.txt)


## 🚗 **Run R code from within Python:**

Within Python using rpy2: Python acts as the interface to R. Commands such as install.packages() and library() are passed to R as strings and executed through rpy2.

🐹 🐾 <font color = 'green'>_If you are primarily working in R, use the direct method._</font> If you prefer Python or need to integrate R code within a larger Python project, use the rpy2 approach, <font color = 'red'>_**which is not recommended**_</font>...

### The pipeline of using R within Python is as follows:
1. Install and import the necessary packages and modules, and call the necessary functions.

2. Read the CSV file for analysis.

3. Convert the pandas DataFrame to an R data frame.

4. Assign the R data frame to the R environment.

5. Perform one-way ANOVA in R within Python.

In [None]:
# Example: one-way ANOVA in R, run from Python through rpy2

!pip install rpy2
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri
import pandas as pd

# Activate the pandas2ri conversion
pandas2ri.activate()

# Function to install R packages
def install_r_package(package_name):
    try:
        robjects.r(f'install.packages("{package_name}")')
    except Exception as e:
        print(f"An error occurred: {e}")

# Install necessary packages
install_r_package('car')

# Load necessary libraries
robjects.r('library(car)')

# Load the CSV file into a pandas DataFrame
file_path = '/content/sample_data/korean_stops_vot.csv'

df = pd.read_csv(file_path)

# Convert the pandas DataFrame to an R dataframe
r_df = pandas2ri.DataFrame(df)

# Assign the R dataframe to the R environment under the name 'data'
robjects.globalenv['data'] = r_df


# Perform one-way ANOVA in R (vot is the dependent variable, stops the factor)
anova_result = robjects.r('anova_result <- aov(vot ~ stops, data = data)')
summary_result = robjects.r('summary(anova_result)')
print(summary_result)

            Df Sum Sq Mean Sq F value Pr(>F)    
stops        2  65535   32768   478.7 <2e-16 ***
Residuals   87   5955      68                   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
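The columns of this table are linked by simple arithmetic: Mean Sq is Sum Sq divided by Df, and the F value is the Mean Sq of stops divided by the Mean Sq of the residuals. The sketch below rechecks those relationships from the printed values. Because R prints rounded sums of squares, the recomputed numbers match the table only up to rounding.

```python
# Values copied from the printed summary above (already rounded by R).
df_stops, ss_stops = 2, 65535
df_resid, ss_resid = 87, 5955

# Mean Sq = Sum Sq / Df
ms_stops = ss_stops / df_stops
ms_resid = ss_resid / df_resid

# F value = Mean Sq (factor) / Mean Sq (Residuals)
f_value = ms_stops / ms_resid

# Df for the factor is (number of levels - 1); total Df is (observations - 1).
n_levels = df_stops + 1
n_obs = df_stops + df_resid + 1

print(f"Mean Sq (stops):     {ms_stops:.1f}")
print(f"Mean Sq (Residuals): {ms_resid:.1f}")
print(f"F value:             {f_value:.1f}")
print(f"levels of stops: {n_levels}, observations used: {n_obs}")
```

The Pr(>F) entry, reported as below 2e-16, is far smaller than 0.001, so the effect of stops on vot is marked with three stars in the table.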

