# Unifying and Fixing Data Types

The idea is:
- When numbers are stored as strings, convert them with `pd.to_numeric(column, errors="coerce")`.
- Before converting, inspect the data and do any necessary cleaning, such as removing the `$` sign.

## Example 1: unifying data types
### Let's create a messy dataset first


In [1]:
import pandas as pd

df = pd.DataFrame({
    "product": ["Laptop", "Phone", "Tablet", "Monitor", "Keyboard"],
    "price": ["1200", "800", "N/A", "350", "unknown"]
})

print("Original Data:")
print(df)

print("\nData types:")
print(df.dtypes)

print("\n notice the type of price is object, not numeric")

Original Data:
    product    price
0    Laptop     1200
1     Phone      800
2    Tablet      N/A
3   Monitor      350
4  Keyboard  unknown

Data types:
product    object
price      object
dtype: object

 notice the type of price is object, not numeric


### Let's try converting to numeric

In [None]:
# errors="coerce" is important here: any value that cannot be converted becomes NaN.

df["price"] = pd.to_numeric(df["price"], errors="coerce")

print("After Conversion:")
print(df)

print("\nData types:")
print(df.dtypes)


### Why Is This Helpful?

Every value that could not be converted is now NaN, so problematic entries are easy to detect. This helps identify:
- Data entry errors
- Unexpected values
- Values that still need cleaning

In [None]:

print("Rows with invalid price data:")
print(df[df["price"].isna()])
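
One limitation of the check above is that the original strings are gone once the column has been overwritten, so you only see `NaN`. A minimal sketch of one way around this, assuming you are free to add a helper column (the names `df_demo` and `price_raw` are hypothetical and not part of the example above):

```python
import pandas as pd

# Rebuild the messy data so this sketch runs on its own
df_demo = pd.DataFrame({
    "product": ["Laptop", "Phone", "Tablet", "Monitor", "Keyboard"],
    "price": ["1200", "800", "N/A", "350", "unknown"]
})

# Keep the original strings in a separate (hypothetical) column
df_demo["price_raw"] = df_demo["price"]
df_demo["price"] = pd.to_numeric(df_demo["price_raw"], errors="coerce")

# Show the raw text of every row that failed to convert
invalid = df_demo.loc[df_demo["price"].isna(), ["product", "price_raw"]]
print(invalid)
```

Seeing the raw text makes it easier to decide whether a value is a true missing entry or something that could be cleaned and converted.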


### What If We Didn't Use `errors="coerce"`?

This would raise an error, because the string "Unknown" cannot be converted to a number.

In [4]:

df_test = pd.DataFrame({
    "price": ["1200", "800", "Unknown"]
})

pd.to_numeric(df_test["price"])


ValueError: Unable to parse string "Unknown" at position 2
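
If you would rather keep the strict behaviour and react to the failure yourself, the error can be caught. This is only a sketch of that option; the variable `conversion_failed` is a hypothetical name used for illustration:

```python
import pandas as pd

df_test = pd.DataFrame({
    "price": ["1200", "800", "Unknown"]
})

try:
    # The default is errors="raise", so one bad value stops the conversion
    pd.to_numeric(df_test["price"])
    conversion_failed = False
except ValueError as exc:
    conversion_failed = True
    print(f"Conversion failed: {exc}")
```

The strict mode is useful when bad values should stop the pipeline instead of silently becoming NaN.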

## Example 2: removing currency symbols

Sometimes prices include a currency symbol:

In [12]:
df2 = pd.DataFrame({
    "price": ["$1200", "$800", "$350"]
})

print(df2)
print(df2.dtypes)


   price
0  $1200
1   $800
2   $350
price    object
dtype: object


### First remove `$`

In [13]:

df2["price"] = df2["price"].str.replace("$", "", regex=False)

print(df2)
print(df2.dtypes)

  price
0  1200
1   800
2   350
price    object
dtype: object


### Then convert the strings to numeric


In [14]:
df2["price"] = pd.to_numeric(df2["price"], errors="coerce")

print(df2)
print(df2.dtypes)


   price
0   1200
1    800
2    350
price    int64
dtype: object
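
The two steps can be wrapped in a small helper so they are always applied in the same order. This is a sketch under assumptions, not a general-purpose cleaner: the function name `clean_price` is hypothetical, and removing thousands separators (`,`) and surrounding spaces is an extra assumption about how messy prices might look.

```python
import pandas as pd

def clean_price(column: pd.Series) -> pd.Series:
    """Strip "$", "," and surrounding spaces, then convert to numeric."""
    cleaned = (
        column.astype(str)
        .str.strip()
        .str.replace("$", "", regex=False)   # literal "$", not a regex anchor
        .str.replace(",", "", regex=False)   # assumed thousands separator
    )
    # Anything still not numeric becomes NaN
    return pd.to_numeric(cleaned, errors="coerce")

prices = pd.Series(["$1,200", "$800", " $350 ", "unknown"])
result = clean_price(prices)
print(result)
```

Because one value cannot be converted, the result contains NaN and pandas stores the column as `float64` rather than `int64`.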
