# DATA WRANGLING AND VISUALIZATION: COMMON CODE
---

### Viewing the Dataset:
    
#### Python:

Entire Dataset:
```python
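datasetName          # notebook: a bare name displays the DataFrame
print(datasetName)   # script: print it (large DataFrames are truncated)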
```


First 5 rows:
```python
DatasetName.head()
```
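A minimal sketch of the viewing calls on a small made-up DataFrame (the `name` and `score` columns are hypothetical). `head(n)` and `tail(n)` take an optional row count, and `shape` gives (rows, columns).

```python
import pandas as pd

# Hypothetical toy data standing in for a real dataset
datasetName = pd.DataFrame({
    "name": ["Ann", "Bo", "Cy", "Di", "Ed", "Flo", "Gus"],
    "score": [88, 92, 75, 60, 95, 70, 81],
})

print(datasetName.head())    # first 5 rows (the default)
print(datasetName.head(3))   # first 3 rows
print(datasetName.tail(2))   # last 2 rows
print(datasetName.shape)     # (number of rows, number of columns)
```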

#### R:
Entire Dataset:
```r
View(DatasetName)
```


First 6 rows (R's default; set `n` to change it, e.g. `head(DatasetName, n = 5)`):
```r
head(DatasetName)
```
---

### Viewing Columns Names:
    
#### Python:
```python
datasetName.columns
```
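`columns` is an attribute (a pandas `Index`), not a method, so it is read without parentheses. A small sketch, using a hypothetical two-column DataFrame, that turns it into a plain Python list:

```python
import pandas as pd

# Hypothetical example DataFrame
datasetName = pd.DataFrame({"title": ["A", "B"], "year": [2001, 2005]})

column_names = list(datasetName.columns)  # Index -> plain Python list
print(column_names)  # ['title', 'year']
```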

#### R:
```r
names(dataFrame)
# or
colnames(dataFrame)
```
---

### Column Values (Categories and Counts):
    
#### Python:
```python
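df['column'].value_counts()   # each distinct value and how often it appears
df['column'].unique()         # just the distinct values (like R's unique())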
```
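A short sketch on a hypothetical `genre` column showing how the two calls differ: `value_counts()` returns counts sorted from most to least frequent, while `unique()` returns the distinct values in order of first appearance. Passing `dropna=False` also counts missing values.

```python
import pandas as pd

# Hypothetical column with a repeated category and one missing value
df = pd.DataFrame({"genre": ["Action", "Drama", "Action", None, "Comedy", "Action"]})

counts = df["genre"].value_counts()                      # missing values excluded by default
counts_with_na = df["genre"].value_counts(dropna=False)  # missing values counted too
distinct = df["genre"].unique()                          # distinct values, first-seen order

print(counts)
print(counts_with_na)
print(distinct)
```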

#### R:
```r
unique(Dataset$column)   # just the distinct values
table(Dataset$column)    # counts per value (like value_counts())
```
---

### Data Structure:
    
#### Python:
```python
Dataset.info()
```
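`info()` prints column names, non-null counts, and dtypes but returns `None`; to use the types in code, read the `dtypes` attribute instead. A sketch with hypothetical columns, which also shows that a missing value turns an integer column into `float64`:

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "title": ["A", "B", "C"],       # text column
    "episodes": [12, 24, None],     # the missing value makes this float64
    "rating": [7.5, 8.1, 6.9],
})

df.info()          # printed summary; the return value is None
print(df.dtypes)   # per-column dtypes as a Series you can work with
```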

#### R:
```r
str(datasetName)
```
---

### Basic Statistics:
    
#### Python:
```python
# count, max, min, quartiles, std, mean
Dataset.describe()
```
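By default `describe()` summarizes only numeric columns; `include="all"` also reports count, unique, top, and freq for text columns. A sketch with made-up values:

```python
import pandas as pd

# Hypothetical data: one text column, one numeric column
df = pd.DataFrame({"genre": ["Action", "Drama", "Action", "Comedy"],
                   "score": [80, 90, 70, 60]})

numeric_summary = df.describe()              # only the numeric 'score' column
full_summary = df.describe(include="all")    # also summarizes 'genre'
print(numeric_summary)
print(full_summary)
```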

#### R:
```r
# min, 1st quartile, median, mean, 3rd quartile, max (plus NA count if any)
summary(datasetName)
```
---

### Removing Missing Values:
    
#### Python:
```python
dataset.dropna(inplace=True)
```
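Before dropping rows it can help to count what is missing, and `dropna()` can be limited to particular columns with `subset`. A sketch on hypothetical data; without `inplace=True`, `dropna()` returns a new DataFrame and leaves the original unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical data with gaps in two columns
dataset = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "year": [2001, np.nan, 2010, 2015],
    "studio": ["X", "Y", None, None],
})

print(dataset.isna().sum())                     # missing values per column

no_missing = dataset.dropna()                   # drop rows with any missing value
year_known = dataset.dropna(subset=["year"])    # drop rows only where 'year' is missing
```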

#### R:
```r
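newdatasetName <- na.omit(OriginalDataset)    # drop rows containing NA (or NaN)
# NaRV.omit() from the IDPmisc package also drops rows containing Inf or -Inf
library(IDPmisc)
newdatasetName <- NaRV.omit(OriginalDataset)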
```

- Refer to the [Missing Data Imputation Notebook]() for more advanced cleaning.
---

### Dropping Columns:
    
#### Python:
```python
datasetName.drop(['Column1', 'Column2', 'Column3'], axis=1, inplace=True)
```
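`drop()` also accepts a `columns=` keyword, which avoids having to remember `axis=1`, and without `inplace=True` it returns a new DataFrame. A sketch with hypothetical column names:

```python
import pandas as pd

# Hypothetical DataFrame with three columns to remove
datasetName = pd.DataFrame({"Column1": [1, 2], "Column2": [3, 4],
                            "Column3": [5, 6], "Keep": [7, 8]})

trimmed = datasetName.drop(columns=["Column1", "Column2", "Column3"])
print(list(trimmed.columns))      # ['Keep']
print(list(datasetName.columns))  # the original still has all four columns
```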

#### R:
```r
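library(dplyr)
newDSname <- ogDS %>% select(-Column1, -Column2, -Column3)  # drop these columns
# Keep only rows whose ColumnName is one of these categories, then drop rows with NAs:
newDSname <- na.omit(ogDS %>% filter(ColumnName %in% c("keep", "these", "categories", "within this column")))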
```
---

### Renaming Columns:
    
#### Python:
```python
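# Example 1: copy the old column under a new name, then drop the old one
DS["New Column Name"] = DS["Old Column Name"]
DS = DS.drop(columns=["Old Column Name"])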

# Example 2
datasetName.rename(columns={"old column name 1":"new column name 1", "old column name 2":"new column name 2"}, inplace=True)
```
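`rename()` only touches the columns named in the dictionary. To clean every column name at once, the `.str` methods on `columns` can be used; the sketch below (with hypothetical names) lowercases the names and swaps spaces for underscores.

```python
import pandas as pd

# Hypothetical DataFrame with messy column names
DS = pd.DataFrame({"Old Name": [1, 2], "Release Year": [2001, 2005]})

# Rename a single column, returning a new DataFrame
DS = DS.rename(columns={"Old Name": "New Name"})

# Clean all column names: lowercase, spaces -> underscores
DS.columns = DS.columns.str.lower().str.replace(" ", "_")
print(list(DS.columns))  # ['new_name', 'release_year']
```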

#### R:
```r
# Example 1
names(DS)[names(DS) == "old column name"]  <- "New Column Name" 
```
---

### Recoding:
    
#### Python:
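There are four different ways to recode a column. Steps 1 to 4 below are alternatives, not a sequence; first, inspect the data: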

##### Find the columns you need to recode
```python
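dataset.columns     # every column name
dataset.nunique()   # distinct values per column; few values suggests a categorical column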
```

##### Inspect the values inside each column to decide which ones to recode
```python
dataSetName.ColumnName.value_counts()
dataset.ColumnName.unique()
```

##### Step 1: Recode with a function
Write a function that maps each original category to a code, then `.apply()` it to the column:
```python
def NewFunctionName(value):
    # Called once per value in the column; codes are returned as strings ("0"), not numbers
    if value == "Original Category A":
        return "0"
    elif value == "Original Category B":
        return "1"
    elif value == "Original Category C":
        return "2"
    elif value == "Original Category D":
        return "3"
    # Any value not listed above falls through and becomes None

DataSetName["RecodeColName"] = DataSetName["OriginalColName"].apply(NewFunctionName)
```
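For a plain one-to-one mapping, `Series.map()` with a dictionary does the same job as a recoding function in fewer lines. A sketch with hypothetical categories and codes: values missing from the dictionary become `NaN`, which makes unmapped categories easy to spot.

```python
import pandas as pd

# Hypothetical column to recode
df = pd.DataFrame({"status": ["Finished", "Airing", "Upcoming", "Finished", "Cancelled"]})

status_codes = {"Finished": 0, "Airing": 1, "Upcoming": 2}  # hypothetical coding scheme
df["status_code"] = df["status"].map(status_codes)

print(df)
print(df["status_code"].isna().sum())  # 'Cancelled' was not in the dictionary
```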

##### Step 2: `.replace()` with two lists
If you don't want to write a function, `.replace()` can swap each original value for its new value in one line. The first list holds the original values and the second holds their replacements, matched by position. Here the new values are numbers (not strings), because you are converting each category to a numeric code.

```python
dataFrame['NewColumnName'] = dataFrame['OriginalColumnName'].replace(['Variable0', 'Variable1', 'Variable2'], [0, 1, 2])

dataFrame['NewColumnName'].value_counts()  # check the recoded values
 ```             
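`.replace()` only changes the values it is given; anything else in the column passes through unchanged, so it is worth checking the result for leftover text. A sketch with hypothetical size categories:

```python
import pandas as pd

# Hypothetical column that contains one value not covered by the recode
dataFrame = pd.DataFrame({"OriginalColumnName": ["Small", "Large", "Medium", "Unknown"]})

dataFrame["NewColumnName"] = dataFrame["OriginalColumnName"].replace(
    ["Small", "Medium", "Large"], [0, 1, 2]
)
# 'Unknown' was not in the first list, so it is left as-is
print(dataFrame["NewColumnName"].tolist())
```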
                                                                      
##### Step 3: `.replace()` with a nested dictionary
A nested dictionary lets you recode several columns at once: each outer key is a column name, and each inner dictionary maps that column's original values to their new values.
```python
NewVariableName = {
"ColumnName": {"variable": 0, "variable1": 1,"variable2": 2},
"ColumnName1": {"variable": 0, "variable1": 1, "variable2": 2},  
"ColumnName2": {"variable": 0, "variable1": 1, "variable2": 2}}     

DatasetName.replace(NewVariableName , inplace=True)                                                             
```                                                                      
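A sketch of the nested-dictionary form on two hypothetical columns. Each outer key limits the replacement to that column, so the same original value can be coded differently in different columns.

```python
import pandas as pd

# Hypothetical data: both columns reuse the letters S, M, L
DatasetName = pd.DataFrame({
    "size": ["S", "M", "L"],
    "priority": ["L", "M", "S"],
})

NewVariableName = {
    "size": {"S": 0, "M": 1, "L": 2},
    "priority": {"L": 10, "M": 20, "S": 30},
}
DatasetName.replace(NewVariableName, inplace=True)
print(DatasetName)
```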
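##### Step 4: Collapse several categories into one with `.replace()`
Pass a list of original values and a single new value to merge them into one category. Called on the whole DataFrame, as below, `.replace()` changes matching values in every column; call it on a single column (e.g. `df['ColumnName'].replace(...)`) to limit its scope.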
```python
df.replace(['Manga', '4-koma manga', 'Web manga', 'Digital manga'], 'Manga', inplace =True)
df.replace(['Light novel', 'Novel', 'Visual novel', 'Picture book', 'Book'], 'Book', inplace =True)
df.replace(['Game', 'Card game'], 'Game', inplace =True)
df.replace(['Original', 'Other', 'Music', 'Radio'], 'Listening', inplace =True)
```
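A sketch that collapses categories in one column only, by calling `.replace()` on that column and assigning the result back. The `source` and `title` columns and their values are hypothetical, modeled on the category lists above; `title` holds values that happen to match category names and stays untouched.

```python
import pandas as pd

# Hypothetical data
df = pd.DataFrame({
    "source": ["Web manga", "Novel", "Card game", "Radio", "Manga"],
    "title": ["Novel", "Other", "B", "C", "D"],  # must NOT be recoded
})

# New label -> original labels that should be merged into it
groups = {
    "Manga": ["Manga", "4-koma manga", "Web manga", "Digital manga"],
    "Book": ["Light novel", "Novel", "Visual novel", "Picture book", "Book"],
    "Game": ["Game", "Card game"],
    "Listening": ["Original", "Other", "Music", "Radio"],
}
for new_label, old_labels in groups.items():
    df["source"] = df["source"].replace(old_labels, new_label)

print(df)
```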

#### R:
##### Find the columns you need to recode
```r
colnames(datasetName)
```
##### Inspect the values inside each column to decide which ones to recode
```r
unique(Dataset$column)
```
##### Recode
```r
df$columnNameRecode <- NA
# Numeric column: split at a cutoff value
df$columnNameRecode[df$columnName <= cutoff] <- 0
df$columnNameRecode[df$columnName > cutoff] <- 1

# Categorical column: assign a code to each category
df$columnNameRecode[df$columnName == 'categoryA'] <- 0
df$columnNameRecode[df$columnName == 'categoryB'] <- 1
```
---