---
title: R to Python
---

| Task                             | `pandas` (Python)                                     | `tidyverse` (R)                                                |
| -------------------------------- | ----------------------------------------------------- | -------------------------------------------------------------- |
| **1. Select columns**            | `df[['col1', 'col2']]`                                | `df %>% select(col1, col2)`                                    |
| **2. Filter rows**               | `df[df['col'] > 10]`                                  | `df %>% filter(col > 10)`                                      |
| **3. Create new column**         | `df['new'] = df['col1'] + df['col2']`                 | `df %>% mutate(new = col1 + col2)`                             |
| **4. Group by and summarize**    | `df.groupby('group').agg({'value': 'mean'})`          | `df %>% group_by(group) %>% summarize(mean_val = mean(value))` |
| **5. Sort rows**                 | `df.sort_values('col', ascending=False)`              | `df %>% arrange(desc(col))`                                    |
| **6. Drop missing values**       | `df.dropna()`                                         | `df %>% drop_na()`                                             |
| **7. Fill missing values**       | `df.fillna({'col': 0})`                               | `df %>% replace_na(list(col = 0))`                             |
| **8. Reshape from wide to long** | `df.melt(id_vars='id', var_name='key')`               | `df %>% pivot_longer(-id, names_to='key', values_to='value')`  |
| **9. Reshape from long to wide** | `df.pivot(index='id', columns='key', values='value')` | `df %>% pivot_wider(names_from='key', values_from='value')`    |
| **10. Join tables**              | `pd.merge(df1, df2, on='id')`                         | `inner_join(df1, df2, by='id')`                                |
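
The join row is easy to get wrong because `pd.merge` performs an inner join unless `how` is given, which corresponds to dplyr's `inner_join()` rather than `left_join()`. The sketch below uses two small made-up tables (their names and values are assumptions for illustration) to show the difference.

```python
import pandas as pd

# Hypothetical toy tables; names and values are for illustration only.
df1 = pd.DataFrame({"id": [1, 2, 3], "x": ["a", "b", "c"]})
df2 = pd.DataFrame({"id": [2, 3, 4], "y": [20, 30, 40]})

# Default how="inner": keep only ids present in both tables (like inner_join).
inner = pd.merge(df1, df2, on="id")

# how="left": keep every row of df1 and fill unmatched columns with NaN (like left_join).
left = pd.merge(df1, df2, on="id", how="left")

print(inner)
print(left)
```

In the left join, the row with `id` 1 has no match in `df2`, so its `y` value is missing.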


Below is an expanded list of 20 common data manipulation tasks, comparing syntax in Python using pandas and in R using the tidyverse (primarily dplyr and tidyr).

| Task                                  | `pandas` (Python)                                     | `tidyverse` (R)                                               |
| ------------------------------------- | ----------------------------------------------------- | ------------------------------------------------------------- |
| **1. Select columns**                 | `df[['col1', 'col2']]`                                | `df %>% select(col1, col2)`                                   |
| **2. Filter rows**                    | `df[df['col'] > 10]`                                  | `df %>% filter(col > 10)`                                     |
| **3. Create new column**              | `df['new'] = df['col1'] + df['col2']`                 | `df %>% mutate(new = col1 + col2)`                            |
| **4. Group by and summarize**         | `df.groupby('group').agg({'val': 'mean'})`            | `df %>% group_by(group) %>% summarize(mean_val = mean(val))`  |
| **5. Sort rows**                      | `df.sort_values('col', ascending=False)`              | `df %>% arrange(desc(col))`                                   |
| **6. Drop missing values**            | `df.dropna()`                                         | `df %>% drop_na()`                                            |
| **7. Fill missing values**            | `df.fillna({'col': 0})`                               | `df %>% replace_na(list(col = 0))`                            |
| **8. Rename columns**                 | `df.rename(columns={'old': 'new'})`                   | `df %>% rename(new = old)`                                    |
| **9. Reshape wide to long**           | `df.melt(id_vars='id', var_name='key')`               | `df %>% pivot_longer(-id, names_to='key', values_to='value')` |
| **10. Reshape long to wide**          | `df.pivot(index='id', columns='key', values='value')` | `df %>% pivot_wider(names_from=key, values_from=value)`       |
| **11. Remove columns**                | `df.drop(['col1', 'col2'], axis=1)`                   | `df %>% select(-col1, -col2)`                                 |
| **12. Sample rows**                   | `df.sample(n=5)`                                      | `df %>% slice_sample(n = 5)`                                  |
| **13. Count unique values**           | `df['col'].nunique()`                                 | `df %>% summarize(n_distinct(col))`                           |
| **14. Count frequency of values**     | `df['col'].value_counts()`                            | `df %>% count(col, sort = TRUE)`                              |
| **15. Conditional column creation**   | `df['new'] = np.where(df['col'] > 0, 'yes', 'no')`    | `df %>% mutate(new = if_else(col > 0, 'yes', 'no'))`          |
| **16. String operations**             | `df['col'].str.lower()`                               | `df %>% mutate(col = str_to_lower(col))`                      |
| **17. Filter by multiple conditions** | `df[(df['x'] > 0) & (df['y'] < 10)]`                  | `df %>% filter(x > 0, y < 10)`                                |
| **18. Cumulative sum**                | `df['cumsum'] = df['val'].cumsum()`                   | `df %>% mutate(cumsum = cumsum(val))`                         |
| **19. Rank values**                   | `df['rank'] = df['val'].rank()`                       | `df %>% mutate(rank = rank(val))`                             |
| **20. Left join two tables**          | `pd.merge(df1, df2, on='id', how='left')`             | `df1 %>% left_join(df2, by='id')`                             |
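
Several of the rows above can be combined into one pipeline. As a rough sketch, pandas method chaining reads much like a `%>%` chain in R; the toy data frame and its column names below are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; the column names are chosen for illustration only.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b", "c"],
    "val": [5, 15, 20, 30, 8],
})

# Each step is annotated with its approximate dplyr counterpart.
result = (
    df[df["val"] > 6]                          # filter(val > 6)
    .groupby("group", as_index=False)          # group_by(group)
    .agg(mean_val=("val", "mean"))             # summarize(mean_val = mean(val))
    .sort_values("mean_val", ascending=False)  # arrange(desc(mean_val))
    .reset_index(drop=True)                    # tidy up the row labels
)

print(result)
```

Passing `as_index=False` keeps `group` as an ordinary column, which is closer to what `summarize()` returns in R.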


Below is a comparative list of **20 common plot types** using Python’s **Seaborn** and R’s **ggplot2**. Each entry includes the plot type and the canonical syntax in both libraries, assuming a data frame `df` with appropriate columns.

| Plot Type                     | **Seaborn (Python)**                                          | **ggplot2 (R)**                                                              |
| ----------------------------- | ------------------------------------------------------------- | ---------------------------------------------------------------------------- |
| **1. Scatter plot**           | `sns.scatterplot(x='x', y='y', data=df)`                      | `ggplot(df, aes(x, y)) + geom_point()`                                       |
| **2. Line plot**              | `sns.lineplot(x='x', y='y', data=df)`                         | `ggplot(df, aes(x, y)) + geom_line()`                                        |
| **3. Histogram**              | `sns.histplot(x='x', data=df)`                                | `ggplot(df, aes(x)) + geom_histogram()`                                      |
| **4. KDE plot**               | `sns.kdeplot(x='x', data=df)`                                 | `ggplot(df, aes(x)) + geom_density()`                                        |
| **5. Box plot**               | `sns.boxplot(x='group', y='value', data=df)`                  | `ggplot(df, aes(group, value)) + geom_boxplot()`                             |
| **6. Violin plot**            | `sns.violinplot(x='group', y='value', data=df)`               | `ggplot(df, aes(group, value)) + geom_violin()`                              |
| **7. Bar plot (categorical)** | `sns.barplot(x='group', y='value', data=df)`                  | `ggplot(df, aes(group, value)) + stat_summary(fun=mean, geom='bar')`         |
| **8. Count plot**             | `sns.countplot(x='category', data=df)`                        | `ggplot(df, aes(category)) + geom_bar()`                                     |
| **9. Heatmap**                | `sns.heatmap(data=corr_matrix)`                               | `ggplot(reshape2::melt(corr_matrix), aes(Var1, Var2, fill=value)) + geom_tile()` |
| **10. Pair plot**             | `sns.pairplot(df)`                                            | `GGally::ggpairs(df)`                                                        |
| **11. Swarm plot**            | `sns.swarmplot(x='group', y='value', data=df)`                | `ggplot(df, aes(group, value)) + ggbeeswarm::geom_beeswarm()`                |
| **12. Strip plot**            | `sns.stripplot(x='group', y='value', data=df)`                | `ggplot(df, aes(group, value)) + geom_jitter()`                              |
| **13. Regression plot**       | `sns.regplot(x='x', y='y', data=df)`                          | `ggplot(df, aes(x, y)) + geom_smooth(method='lm') + geom_point()`            |
| **14. Residual plot**         | `sns.residplot(x='x', y='y', data=df)`                        | `ggplot(broom::augment(lm(y ~ x, df)), aes(x, .resid)) + geom_point()`       |
| **15. Facet grid**            | `sns.FacetGrid(df, col='var').map(sns.histplot, 'x')`         | `ggplot(df, aes(x)) + geom_histogram() + facet_wrap(~var)`                   |
| **16. Joint plot**            | `sns.jointplot(x='x', y='y', data=df)`                        | `ggExtra::ggMarginal(p, type='histogram')`, where `p` is a `geom_point()` plot |
| **17. Time series plot**      | `sns.lineplot(x='date', y='value', data=df)`                  | `ggplot(df, aes(date, value)) + geom_line()`                                 |
| **18. Error bars (SD)**       | `sns.pointplot(x='x', y='y', data=df, errorbar='sd')`         | `ggplot(df, aes(x, y)) + stat_summary(fun.data=mean_sdl, fun.args=list(mult=1), geom='pointrange')` |
| **19. Categorical dot plot**  | `sns.catplot(x='group', y='value', kind='strip', data=df)`    | `ggplot(df, aes(group, value)) + geom_jitter()`                              |
| **20. Multi-plot layout**     | `sns.catplot(x='x', y='y', col='facet', kind='box', data=df)` | `ggplot(df, aes(x, y)) + geom_boxplot() + facet_wrap(~facet)`                |

### Notes:

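* `sns` is the conventional import alias for seaborn (`import seaborn as sns`).
* In ggplot2, `geom_bar(stat='identity')` (or its shorthand `geom_col()`) draws the y-values exactly as given instead of counting rows. `sns.barplot` aggregates to the group mean by default, so `stat_summary(fun=mean, geom='bar')` is the closer match when `df` has several rows per group.
* `ggExtra` and `GGally` are third-party ggplot2 extensions rather than tidyverse packages, and `broom` is not loaded by `library(tidyverse)`; install and load each one explicitly as needed.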

This comparison covers common visualization tasks across exploratory data analysis and statistical modeling workflows.
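
To make the residual plot row more concrete, here is a minimal sketch of the quantity such a plot displays: the residuals of a straight-line fit of `y` on `x`. It uses NumPy only and synthetic data generated for illustration; it is not how seaborn or broom compute things internally.

```python
import numpy as np

# Hypothetical data: a noisy straight line, generated only for illustration.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Fit y = slope * x + intercept by ordinary least squares.
slope, intercept = np.polyfit(x, y, deg=1)

# Residuals are observed minus fitted values; a residual plot shows them against x.
residuals = y - (slope * x + intercept)

print(residuals[:5])
```

For a least-squares line with an intercept, the residuals average to zero up to floating-point error, so a residual plot should scatter around the horizontal zero line when a linear model is adequate.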



::: {.panel-tabset group="language"}
## R

``` {.r}
fizz_buzz <- function(fbnums = 1:50) {
  output <- dplyr::case_when(
    fbnums %% 15 == 0 ~ "FizzBuzz",
    fbnums %% 3 == 0 ~ "Fizz",
    fbnums %% 5 == 0 ~ "Buzz",
    TRUE ~ as.character(fbnums)
  )
  print(output)
}
```

## Python

``` {.python}
def fizz_buzz(num):
    # Check divisibility by 15 first so multiples of both 3 and 5 print "FizzBuzz".
    if num % 15 == 0:
        print("FizzBuzz")
    elif num % 5 == 0:
        print("Buzz")
    elif num % 3 == 0:
        print("Fizz")
    else:
        print(num)
```
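
The R version is vectorized over `1:50`, while the Python function above handles one number per call. As a sketch of a closer counterpart, the hypothetical helper `fizz_buzz_labels` below (a name introduced here for illustration) loops over a range and returns the labels as a list of strings.

```python
def fizz_buzz_labels(nums=range(1, 51)):
    """Return the FizzBuzz label for every number in nums as a list of strings."""
    labels = []
    for num in nums:
        if num % 15 == 0:
            labels.append("FizzBuzz")
        elif num % 5 == 0:
            labels.append("Buzz")
        elif num % 3 == 0:
            labels.append("Fizz")
        else:
            # Mirror as.character(fbnums) in the R version.
            labels.append(str(num))
    return labels


print(fizz_buzz_labels(range(1, 16)))
```

Returning a list instead of printing inside the loop makes the result easier to reuse, for example as a new pandas column.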

:::
