Data Type Conversions

In [13]:
import pandas as pd

data = {
    "Name": ["Amit", "Riya", "John", "Neha"],
    "Age": ["25", "22", "27", "24"],
    "DOB": ["1998-01-01", "2001-07-15", "1996-12-05", "2000-03-21"],
    "Score": ["85.5", "91", "88", "NaN"],
    "Fees": ["₹1,200", "₹1,500", "₹950", "₹1,000"],
    "Grade": ["a", "B", "C", "d"],
    "City": ["muMBai", "delhi", "KOLKATA", "ChEnNaI"]
}

df = pd.DataFrame(data)
print(df)


   Name Age         DOB Score    Fees Grade     City
0  Amit  25  1998-01-01  85.5  ₹1,200     a   muMBai
1  Riya  22  2001-07-15    91  ₹1,500     B    delhi
2  John  27  1996-12-05    88    ₹950     C  KOLKATA
3  Neha  24  2000-03-21   NaN  ₹1,000     d  ChEnNaI


Convert the Age column to integer type

In [4]:
df["Age"] = df["Age"].astype(int)
print(df["Age"].dtype)

int64
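
`astype(int)` works here because every value in Age is a clean digit string; a single unparseable entry would raise a `ValueError`. As a minimal sketch of a more forgiving route, `pd.to_numeric` with `errors="coerce"` turns bad entries into missing values instead. The `"absent"` and `"unknown"` entries below are made-up values added for illustration and are not part of the dataset above.

```python
import pandas as pd

# Score-like strings; "absent" is a hypothetical bad entry added for illustration
scores = pd.Series(["85.5", "91", "88", "NaN", "absent"], name="Score")

# errors="coerce" converts anything unparseable to NaN instead of raising
scores_num = pd.to_numeric(scores, errors="coerce")
print(scores_num.dtype)
print(scores_num.isna().sum())

# For whole numbers with gaps, the nullable "Int64" dtype keeps integers
# while still allowing missing values
ages = pd.Series(["25", "22", "unknown"], name="Age")
ages_int = pd.to_numeric(ages, errors="coerce").astype("Int64")
print(ages_int)
```

The float result is the usual choice for a column like Score; the nullable `Int64` dtype is only worth the extra step when the column should stay integer despite missing entries.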


Convert the Grade column to uppercase

In [9]:
df["Grade"] = df["Grade"].str.upper()
print(df)

   Name  Age         DOB Score    Fees Grade     City
0  Amit   25  1998-01-01  85.5  ₹1,200     A   muMBai
1  Riya   22  2001-07-15    91  ₹1,500     B    delhi
2  John   27  1996-12-05    88    ₹950     C  KOLKATA
3  Neha   24  2000-03-21   NaN  ₹1,000     D  ChEnNaI


Convert the City column to title case

In [15]:
df["City"] = df["City"].str.title()
print(df)

   Name  Age         DOB Score    Fees Grade     City
0  Amit   25  1998-01-01  85.5  ₹1,200     A   Mumbai
1  Riya   22  2001-07-15    91  ₹1,500     B    Delhi
2  John   27  1996-12-05    88    ₹950     C  Kolkata
3  Neha   24  2000-03-21   NaN  ₹1,000     D  Chennai
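
The DOB column in the sample data is still stored as plain strings at this point. A minimal sketch of converting it with `pd.to_datetime` follows; it rebuilds the column as a standalone Series so the example runs on its own, and the explicit `format` assumes every date follows the `YYYY-MM-DD` pattern seen above.

```python
import pandas as pd

# Same date strings as the DOB column in the sample data
dob = pd.Series(["1998-01-01", "2001-07-15", "1996-12-05", "2000-03-21"], name="DOB")

# An explicit format avoids ambiguous day/month guessing
dob_dt = pd.to_datetime(dob, format="%Y-%m-%d")
print(dob_dt.dtype)

# Once parsed, date parts are available through the .dt accessor
years = dob_dt.dt.year.tolist()
print(years)  # [1998, 2001, 1996, 2000]
```

If some entries might not match the format, passing `errors="coerce"` makes them `NaT` rather than raising an error.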


Removing unwanted characters from string columns

In [None]:
import pandas as pd

data = {
    "Student": ["Amit", "Priya", "Ravi", "Neha"],
    "Fees": ["₹1000", "$800", "750 Rs", "USD 1200"]
}
df = pd.DataFrame(data)


# Extract only the numbers and convert them to integers

# Extract the first run of digits from the 'Fees' column
# (expand=False returns a Series rather than a one-column DataFrame)
df["Fees_Clean"] = df["Fees"].str.extract(r"(\d+)", expand=False)

# Convert the new column to integer
df["Fees_Clean"] = df["Fees_Clean"].astype(int)

print(df)

  Student      Fees  Fees_Clean
0    Amit     ₹1000        1000
1   Priya      $800         800
2    Ravi    750 Rs         750
3    Neha  USD 1200        1200
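
The pattern `r"(\d+)"` captures only the first unbroken run of digits, so it works for the values above but would truncate amounts that contain thousands separators, such as the Fees column in the first dataset. A small sketch of the difference, assuming the amounts are whole numbers with no decimal part:

```python
import pandas as pd

# Fee strings with thousands separators, as in the first dataset
fees = pd.Series(["₹1,200", "₹1,500", "₹950", "₹1,000"], name="Fees")

# Extracting the first digit run stops at the comma
first_run = fees.str.extract(r"(\d+)", expand=False)
print(first_run.tolist())  # ['1', '1', '950', '1']

# Removing every non-digit character first keeps the full amount
fees_int = fees.str.replace(r"\D", "", regex=True).astype(int)
print(fees_int.tolist())  # [1200, 1500, 950, 1000]
```

Stripping all non-digits would also remove a decimal point, so this variant is only safe for integer amounts.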


## Mini Challenge

Clean & Convert Student Fee Data

You're given a DataFrame with mixed fee formats and some missing values.

In [40]:
import pandas as pd

data = {
    "Student": ["Amit", "Priya", "Ravi", "Neha", "John", "Sana"],
    "Fees": ["₹1,000", "$800", "750 Rs", None, "USD 1,500", "N/A"]
}
df = pd.DataFrame(data)
print(df)

# First step of Task 1: remove thousands separators (runs after the print above)
df["Fees"] = df["Fees"].str.replace(",", "", regex=False)

  Student       Fees
0    Amit     ₹1,000
1   Priya       $800
2    Ravi     750 Rs
3    Neha       None
4    John  USD 1,500
5    Sana        N/A


Tasks:

1. Clean the "Fees" column by:

    - Removing commas

    - Extracting numeric digits (like you did earlier)

    - Replacing missing or unknown values (None, "N/A") with 0

2. Convert the cleaned fees to integer type and store in a new column called "Fees_Cleaned".

3. Print the cleaned DataFrame.

In [42]:
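# Remove commas (harmless if already done), then extract the first run of digits
df["Fees"] = df["Fees"].str.replace(",", "", regex=False).str.extract(r"(\d+)", expand=False)
# Rows with no digits (None, "N/A") are now NaN; replace them with 0
df["Fees"] = df["Fees"].fillna(0)
print(df)

# Convert to integer and store the result in a new column
df["Fees_Cleaned"] = df["Fees"].astype(int)
print(df)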

  Student  Fees
0    Amit  1000
1   Priya   800
2    Ravi   750
3    Neha     0
4    John  1500
5    Sana     0
  Student  Fees  Fees_Cleaned
0    Amit  1000          1000
1   Priya   800           800
2    Ravi   750           750
3    Neha     0             0
4    John  1500          1500
5    Sana     0             0
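
The same cleaning steps can be wrapped in a function so the original Fees column stays untouched. This is only a sketch: `clean_fees` is a hypothetical helper name, and it assumes whole-number amounts and that unknown values should become 0, as the tasks above specify.

```python
import pandas as pd

def clean_fees(fees: pd.Series) -> pd.Series:
    """Hypothetical helper: keep digits only, treat unparseable entries as 0."""
    # Drop currency symbols, commas, and unit text in one pass
    digits = fees.str.replace(r"\D", "", regex=True)
    # Empty strings and missing values become NaN, then 0
    return pd.to_numeric(digits, errors="coerce").fillna(0).astype(int)

raw = pd.Series(["₹1,000", "$800", "750 Rs", None, "USD 1,500", "N/A"], name="Fees")
cleaned = clean_fees(raw)
print(cleaned.tolist())  # [1000, 800, 750, 0, 1500, 0]
```

Assigning the result with `df["Fees_Cleaned"] = clean_fees(df["Fees"])` would leave the raw strings available for later checks.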
