## Google Data Analyst - [Course 7 - Data Analysis with R Programming](https://www.coursera.org/learn/data-analysis-r/supplement/Y0Vr4/course-syllabus) [[Data Analyst]] #Google-data-analyst-course

### [Week 3 - Working with data frames](https://www.coursera.org/learn/data-analysis-r/lecture/BGeQ4/working-with-data-frames)

The course (finally) gets into using R by showing the basics of loading the diamonds dataset, like this:

```
library(ggplot2)
data("diamonds")
View(diamonds)
head(diamonds)
```

I can do the same in pandas:

### **Side note: For detailed Pandas notes, check my Colab Notebooks in Google Drive!**

In [1]:
import pandas as pd
import numpy as np

url = "https://github.com/tidyverse/ggplot2/blob/main/data-raw/diamonds.csv?raw=true"

In [2]:
df_diamonds = pd.read_csv(url)

df_diamonds.head()

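| | carat | cut | color | clarity | depth | table | price | x | y | z |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.23 | Ideal | E | SI2 | 61.5 | 55.0 | 326 | 3.95 | 3.98 | 2.43 |
| 1 | 0.21 | Premium | E | SI1 | 59.8 | 61.0 | 326 | 3.89 | 3.84 | 2.31 |
| 2 | 0.23 | Good | E | VS1 | 56.9 | 65.0 | 327 | 4.05 | 4.07 | 2.31 |
| 3 | 0.29 | Premium | I | VS2 | 62.4 | 58.0 | 334 | 4.2 | 4.23 | 2.63 |
| 4 | 0.31 | Good | J | SI2 | 63.3 | 58.0 | 335 | 4.34 | 4.35 | 2.75 |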


The `str(diamonds)` function in R shows the structure of the data frame: column names, types, and the first few values.

In pandas, `info()` is the closest equivalent.

In [3]:
df_diamonds.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 53940 entries, 0 to 53939
Data columns (total 10 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   carat    53940 non-null  float64
 1   cut      53940 non-null  object 
 2   color    53940 non-null  object 
 3   clarity  53940 non-null  object 
 4   depth    53940 non-null  float64
 5   table    53940 non-null  float64
 6   price    53940 non-null  int64  
 7   x        53940 non-null  float64
 8   y        53940 non-null  float64
 9   z        53940 non-null  float64
dtypes: float64(6), int64(1), object(3)
memory usage: 4.1+ MB
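
One difference from `str()`: `info()` reports dtypes and non-null counts but no sample values. Here is a minimal sketch of a `str()`-style summary, assuming a small hand-typed frame with a few values from the `head()` output above. The helper name `str_like` is made up for illustration; it is not a pandas function.

```python
import pandas as pd

# Small stand-in for the diamonds data (values from the first three rows above).
df = pd.DataFrame({
    "carat": [0.23, 0.21, 0.23],
    "cut": ["Ideal", "Premium", "Good"],
    "price": [326, 326, 327],
})

def str_like(frame, n=3):
    """Hypothetical helper: one line per column with its dtype and first n values."""
    lines = [f"{len(frame)} obs. of {frame.shape[1]} variables:"]
    for name in frame.columns:
        sample = ", ".join(str(v) for v in frame[name].head(n))
        lines.append(f" $ {name}: {frame[name].dtype} {sample}")
    return lines

print("\n".join(str_like(df)))
```

For numeric columns only, `df.describe()` is another quick way to get a feel for the values.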


The `colnames()` function displays the column names. In pandas, use the `columns` attribute.

In [4]:
df_diamonds.columns

Index(['carat', 'cut', 'color', 'clarity', 'depth', 'table', 'price', 'x', 'y',
       'z'],
      dtype='object')
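
Note that `columns` returns an `Index` object rather than a plain vector of names like `colnames()`. A small sketch, again on a hand-typed stand-in frame rather than the full dataset, of getting a plain list and checking whether a column exists:

```python
import pandas as pd

# Stand-in frame with a few of the diamonds columns (values from the rows above).
df = pd.DataFrame({"carat": [0.23, 0.21], "cut": ["Ideal", "Premium"], "price": [326, 326]})

col_names = df.columns.tolist()   # plain Python list of column names
print(col_names)                  # ['carat', 'cut', 'price']
print("price" in df.columns)      # membership test works on the Index directly
```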

In R, you use the `mutate()` function from `dplyr` (loaded as part of the `tidyverse`) to add or change columns in a data frame:
```
library(tidyverse) # load package
mutate(diamonds, carat_2=carat*100) # add a "carat_2" column with its content being…
```

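In pandas, plain column assignment does the same job (here the new column is named `caratx100`). Note that this modifies `df_diamonds` in place, whereas `mutate()` returns a new data frame: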

In [5]:
df_diamonds['caratx100'] = df_diamonds['carat'] * 100
df_diamonds

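| | carat | cut | color | clarity | depth | table | price | x | y | z | caratx100 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.23 | Ideal | E | SI2 | 61.5 | 55.0 | 326 | 3.95 | 3.98 | 2.43 | 23.0 |
| 1 | 0.21 | Premium | E | SI1 | 59.8 | 61.0 | 326 | 3.89 | 3.84 | 2.31 | 21.0 |
| 2 | 0.23 | Good | E | VS1 | 56.9 | 65.0 | 327 | 4.05 | 4.07 | 2.31 | 23.0 |
| 3 | 0.29 | Premium | I | VS2 | 62.4 | 58.0 | 334 | 4.20 | 4.23 | 2.63 | 29.0 |
| 4 | 0.31 | Good | J | SI2 | 63.3 | 58.0 | 335 | 4.34 | 4.35 | 2.75 | 31.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 53935 | 0.72 | Ideal | D | SI1 | 60.8 | 57.0 | 2757 | 5.75 | 5.76 | 3.50 | 72.0 |
| 53936 | 0.72 | Good | D | SI1 | 63.1 | 55.0 | 2757 | 5.69 | 5.75 | 3.61 | 72.0 |
| 53937 | 0.70 | Very Good | D | SI1 | 62.8 | 60.0 | 2757 | 5.66 | 5.68 | 3.56 | 70.0 |
| 53938 | 0.86 | Premium | H | SI2 | 61.0 | 58.0 | 2757 | 6.15 | 6.12 | 3.74 | 86.0 |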
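
For something closer to `mutate()`, which returns a new data frame and leaves the original alone, pandas has `DataFrame.assign()`. A minimal sketch on a hand-typed stand-in frame (values taken from the first rows above; the name `df_small` is just for this example):

```python
import pandas as pd

# Stand-in for df_diamonds with only two of its columns.
df_small = pd.DataFrame({"carat": [0.23, 0.21, 0.23], "price": [326, 326, 327]})

# assign() returns a copy with the new column; df_small itself is unchanged.
df_new = df_small.assign(carat_2=df_small["carat"] * 100)

print(df_new.columns.tolist())    # ['carat', 'price', 'carat_2']
print(df_small.columns.tolist())  # ['carat', 'price']

# A lambda receives the frame being built, which is handy in method chains.
df_chain = df_small.assign(carat_2=lambda d: d["carat"] * 100)
```

Which form to use is mostly a matter of taste: direct assignment is shorter, while `assign()` keeps the original frame untouched, as `mutate()` does in R.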
