# Data Cleaning Basics: Takeaways

## Syntax
---

### Reading in a CSV with a Specific Encoding

- Reading in a CSV file using Latin-1 encoding:

In [None]:
import pandas as pd

laptops = pd.read_csv('laptops.csv', encoding='Latin-1')

- Reading in a CSV file using UTF-8:

In [None]:
laptops = pd.read_csv('laptops.csv', encoding='UTF-8')

- Reading in a CSV file using Windows-1251:

In [None]:
laptops = pd.read_csv('laptops.csv', encoding='Windows-1251')
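
When the encoding of a file is unknown, one possible approach is to try a few candidates in order. The sketch below is only an illustration: `read_csv_with_fallback` is a hypothetical helper, and the sample bytes stand in for a real file. Because Latin-1 can decode any byte sequence, it never raises an error, so it may succeed while still producing the wrong characters.

```python
import io
import pandas as pd

# Hypothetical sample data: one row containing "É", stored as Latin-1 bytes.
raw_bytes = "manufacturer,model\nAcer,Aspire Édition\n".encode("latin-1")

def read_csv_with_fallback(source_bytes, encodings=("UTF-8", "Latin-1", "Windows-1251")):
    """Try each encoding in order; return the DataFrame and the encoding that worked."""
    for encoding in encodings:
        try:
            frame = pd.read_csv(io.BytesIO(source_bytes), encoding=encoding)
            return frame, encoding
        except UnicodeDecodeError:
            # This encoding cannot decode the bytes, so move on to the next one.
            continue
    raise ValueError("None of the candidate encodings could decode the data.")

sample, used_encoding = read_csv_with_fallback(raw_bytes)
print(used_encoding)
```

Always inspect the resulting text after a fallback succeeds, since a successful decode does not guarantee the right encoding was chosen.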

---
### Modifying Columns in a DataFrame

- Renaming an Existing Column:

In [None]:
laptops.rename(columns={'MANUfacturer' : 'manufacturer'}, inplace=True)

- Converting a String Column to a Float:

In [None]:
laptops["screen_size"] = laptops["screen_size"].str.replace('"', '').astype(float)

- Converting a String Column to an Integer:

In [None]:
laptops["ram"] = laptops["ram"].str.replace('GB', '')
laptops["ram"] = laptops["ram"].astype(int)

---
### String Column Operations

- Extracting Values from Strings:

In [None]:
laptops["gpu_manufacturer"] = (laptops["gpu"]
                              .str.split()
                              .str[0])
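
To see what the chained `.str.split().str[0]` does, here is a small sketch on made-up GPU descriptions (the values are assumptions, not taken from `laptops.csv`):

```python
import pandas as pd

# Hypothetical GPU descriptions; the real "gpu" column may look different.
gpu = pd.Series([
    "Intel Iris Plus Graphics 640",
    "AMD Radeon Pro 455",
    "Nvidia GeForce MX150",
])

# Split each string on whitespace into a list of words,
# then keep only the first word of each list.
gpu_manufacturer = gpu.str.split().str[0]
print(gpu_manufacturer.tolist())
```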

---
### Fixing Values

- Replacing Values Using a Mapping Dictionary:

In [None]:
mapping_dict = {
    'Android': 'Android',
    'Chrome OS': 'Chrome OS',
    'Linux': 'Linux',
    'Mac OS': 'macOS',
    'No OS': 'No OS',
    'Windows': 'Windows',
    'macOS': 'macOS'
}
laptops["os"] = laptops["os"].map(mapping_dict)
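
`Series.map()` returns `NaN` for any value that is not a key in the dictionary, so it can help to check for unmapped values before overwriting the column. A minimal sketch, using a shortened dictionary and a made-up value (`"FreeDOS"`) that is deliberately missing from it:

```python
import pandas as pd

# Hypothetical values; "FreeDOS" is intentionally absent from the dictionary.
os_values = pd.Series(["Windows", "Mac OS", "macOS", "FreeDOS"])
mapping_dict = {"Windows": "Windows", "Mac OS": "macOS", "macOS": "macOS"}

mapped = os_values.map(mapping_dict)

# Original values that turned into NaN because they had no matching key.
unmapped = os_values[mapped.isnull()].unique()
print(unmapped)
```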

- Dropping Missing Values:

In [None]:
laptops_no_null_rows = laptops.dropna(axis=0)
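
The `axis` parameter controls whether rows or columns are dropped. The sketch below uses a small made-up DataFrame (the column names and values are assumptions) to compare both options:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in the "os_version" column.
frame = pd.DataFrame({
    "manufacturer": ["Apple", "HP", "Dell"],
    "ram": [8, 4, 16],
    "os_version": [np.nan, "10", "10"],
})

no_null_rows = frame.dropna(axis=0)  # drops the row that contains a null
no_null_cols = frame.dropna(axis=1)  # drops the column that contains a null
print(no_null_rows.shape, no_null_cols.shape)
```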

---
### Exporting Cleaned Data

- Exporting Cleaned Data:

In [None]:
laptops.to_csv('laptops_cleaned.csv', index=False)

## Concepts

- Computers, at their lowest levels, can only understand binary.

- Encodings are systems for representing all other values in binary so a computer can work with them.

- UTF-8 is the most common encoding and works well with Python 3, which uses it as the default encoding for source files.

- When converting text data to numeric data, we usually follow these steps:
    - Explore the data in the column.
    - Identify patterns and special cases.
    - Remove non-digit characters.
    - Convert the column to a numeric dtype.
    - Rename the column if required.
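
The text-to-numeric steps above can be sketched as follows. The `weight` column, its values, and the new name `weight_kg` are assumptions chosen for illustration, including one special case (`"4kgs"`):

```python
import pandas as pd

# Hypothetical column of weights stored as strings.
laptops = pd.DataFrame({"weight": ["1.37kg", "2.1kg", "4kgs", "1.83kg"]})

# 1. Explore the data in the column.
print(laptops["weight"].unique())

# 2. Pattern: a number followed by "kg"; special case: one value ends in "kgs".

# 3. Remove the non-digit characters, handling "kgs" first so no stray "s" remains.
weight = (laptops["weight"]
          .str.replace("kgs", "", regex=False)
          .str.replace("kg", "", regex=False))

# 4. Convert the column to a numeric dtype.
laptops["weight"] = weight.astype(float)

# 5. Rename the column so the unit is not lost.
laptops.rename(columns={"weight": "weight_kg"}, inplace=True)
print(laptops.dtypes)
```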