## Importing Modules

In [12]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Loading Dataset 

In [13]:
df=pd.read_csv("C:/Users/DELL/OneDrive/Desktop/Power BI Projects/Dataset Used/Booking Dataset of Multi Service Business.csv")

In [14]:
df.head()

Unnamed: 0,Booking ID,Customer ID,Customer Name,Booking Type,Booking Date,Status,Class Type,Instructor,Time Slot,Duration (mins),Price,Facility,Theme,Subscription Type,Service Name,Service Type
0,279d92c6-ce26-47c0-8915-e45b77fe20e2,00901ce3-3d86-4c97-bca2-40ccac2fb99f,Customer 1,Facility,30-05-2025,Pending,,,10:00,90.0,42.74,Party Room,,,Party Room,Facility
1,415bfcbe-1a2e-4d4b-809a-4c5b606653b1,b82db986-bd52-4b07-bdd8-aa8cf2016241,Customer 2,Birthday Party,29-05-2025,Pending,,,,,182.06,Party Room,Superhero,,Party Room,Birthday Party
2,2100024b-46fc-47b5-ac1c-047d007a4723,6bbb6e83-9577-4f64-80b0-f073132d18f3,Customer 3,Birthday Party,09-05-2025,Confirmed,,,11:00,120.0,207.5,Play Area,,,Play Area,Facility
3,74936def-088f-4d34-bad1-dfa76f78b704,f16f5beb-6a7d-4493-a19e-a30dbbd206e9,Customer 4,Birthday Party,07-06-2025,Pending,,,12:00,90.0,203.2,Play Area,,,Play Area,Birthday Party
4,6272b4e7-a508-4ed7-bae0-21f7293287a8,eb297435-93d1-4e65-8dd4-6450922305cb,Customer 5,Class,13-04-2025,Pending,Art,,15:00,120.0,161.14,,,,Art,Class


## Identify Missing Data Points per Column
This code calculates the total number of missing values (NaNs) in each column of the DataFrame using the following:
- **`df.isnull()`**: Checks for missing values in the DataFrame (returns `True` for NaNs).
- **`.sum()`**: Sums up the missing values for each column.
The output will be a Series where the index represents the column names and the values represent the count of missing data points.

In [16]:
# get the number of missing data points per column
missing_values_count = df.isnull().sum()
missing_values_count

Booking ID              0
Customer ID             0
Customer Name           0
Booking Type            0
Booking Date            0
Status                  0
Class Type            672
Instructor            730
Time Slot             205
Duration (mins)       205
Price                   0
Facility              328
Theme                 727
Subscription Type    1000
Service Name            0
Service Type            0
Customer_ID_Valid       0
Booking_ID_Valid        0
dtype: int64

## Output: Missing Values Count
The output displays the number of missing data points for each column. Columns without any missing values will have a count of `0`.
This information can be used to assess data quality and decide how to handle missing values in the dataset.
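
To help decide how to handle each column, the raw counts can be turned into percentages. The following is a minimal sketch on a hypothetical three-row frame (`sample` and its values are illustrative, not the real dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the booking data
sample = pd.DataFrame({
    "Price": [42.74, 182.06, 207.5],
    "Theme": [np.nan, "Superhero", np.nan],
    "Subscription Type": [np.nan, np.nan, np.nan],
})

# Count of missing values per column, as in the cell above
missing_count = sample.isnull().sum()

# The mean of a boolean mask is the fraction of True values
missing_pct = sample.isnull().mean() * 100

summary = pd.DataFrame({"missing": missing_count, "percent": missing_pct.round(1)})
print(summary.sort_values("percent", ascending=False))
```

A column that is missing in every row is a candidate for removal, while a partly missing column is usually a candidate for filling.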

## UUID Validation for Customer ID and Booking ID

### Objective:
1. **Validate Columns:** Check if values in the `Customer ID` and `Booking ID` columns match the UUID format.
2. **Create Validity Columns:** Add two new columns (`Customer_ID_Valid` and `Booking_ID_Valid`) to indicate whether the respective values are valid UUIDs.
3. **Identify Invalid Rows:** Extract all rows where either the `Customer ID` or `Booking ID` is invalid.

### Steps:
1. **Regex for UUID Format**:
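   - The canonical UUID text format is `xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx`, where `x` is a hexadecimal digit, `M` is the version digit, and the high bits of `N` encode the variant. The regex below checks only the 8-4-4-4-12 layout of lowercase hexadecimal digits, not the `M` and `N` rules.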
2. **Validation Logic**:
   - For each row, check the format of `Customer ID` and `Booking ID` using the regex.
   - Append the results as new boolean columns (`True` for valid, `False` for invalid).
3. **Filter Invalid Rows**:
   - Extract rows where either `Customer ID` or `Booking ID` is invalid.

In [15]:
import re

# Define the UUID regex pattern
uuid_pattern = r'^[a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}$'

# Check if values match the UUID format
for column in ['Customer ID', 'Booking ID']:
    # Create a new column to store validity check
    valid_column_name = f'{column.replace(" ", "_")}_Valid'
    # fullmatch rejects any trailing characters, including a trailing newline
    df[valid_column_name] = df[column].apply(lambda x: bool(re.fullmatch(uuid_pattern, str(x))))

# View rows where either Customer ID or Booking ID is invalid
invalid_rows = df[(~df['Customer_ID_Valid']) | (~df['Booking_ID_Valid'])]
invalid_rows

Unnamed: 0,Booking ID,Customer ID,Customer Name,Booking Type,Booking Date,Status,Class Type,Instructor,Time Slot,Duration (mins),Price,Facility,Theme,Subscription Type,Service Name,Service Type,Customer_ID_Valid,Booking_ID_Valid


## Results
No rows are returned: every `Customer ID` and `Booking ID` matches the UUID format, so there are no invalid entries to handle. Had any value failed the check, its row would appear here.
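
If the version and variant rules should also be enforced, one option is Python's standard `uuid` module instead of a hand-written regex. This is a sketch under that assumption; the helper name `is_valid_uuid` is hypothetical and not part of the notebook:

```python
import uuid


def is_valid_uuid(value, version=None):
    """Return True if value is a canonical UUID string (optionally of a given version)."""
    try:
        parsed = uuid.UUID(str(value))
    except ValueError:
        return False
    # uuid.UUID also accepts braces, missing hyphens and upper case;
    # comparing with the canonical form rejects those variants.
    if str(parsed) != str(value):
        return False
    return version is None or parsed.version == version


print(is_valid_uuid("279d92c6-ce26-47c0-8915-e45b77fe20e2", version=4))
print(is_valid_uuid("279d92c6-ce26-47c0-8915"))
```

It could be applied column-wise with `df['Booking ID'].apply(is_valid_uuid)`, in the same way as the regex check.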



## Filter and Analyze Rows by Booking Type
### Objective:
1. Retrieve all rows for each **Booking Type**: `"Birthday Party"`, `"Class"`, and `"Facility"`.
2. Check these rows for missing values (`NaN`).
3. Fill the missing values with logical or random assignments where possible.

### Code Steps:
1. **Filter Rows**:
   - Use the `df.loc[]` indexer with a boolean condition to select the rows where `Booking Type` equals a given value, starting with `"Birthday Party"`.
   - Keep all columns so the full rows can be inspected.


In [17]:
# Display all columns for rows with Booking Type "Birthday Party"
birthday_party_rows = df.loc[df['Booking Type'] == 'Birthday Party']

birthday_party_rows

Unnamed: 0,Booking ID,Customer ID,Customer Name,Booking Type,Booking Date,Status,Class Type,Instructor,Time Slot,Duration (mins),Price,Facility,Theme,Subscription Type,Service Name,Service Type,Customer_ID_Valid,Booking_ID_Valid
1,415bfcbe-1a2e-4d4b-809a-4c5b606653b1,b82db986-bd52-4b07-bdd8-aa8cf2016241,Customer 2,Birthday Party,29-05-2025,Pending,,,,,182.06,Party Room,Superhero,,Party Room,Birthday Party,True,True
2,2100024b-46fc-47b5-ac1c-047d007a4723,6bbb6e83-9577-4f64-80b0-f073132d18f3,Customer 3,Birthday Party,09-05-2025,Confirmed,,,11:00,120.0,207.50,Play Area,,,Play Area,Facility,True,True
3,74936def-088f-4d34-bad1-dfa76f78b704,f16f5beb-6a7d-4493-a19e-a30dbbd206e9,Customer 4,Birthday Party,07-06-2025,Pending,,,12:00,90.0,203.20,Play Area,,,Play Area,Birthday Party,True,True
5,6fe4a4d1-8533-4981-acd5-860b9c381639,7d3d0421-66b7-434a-9357-ea2d222deab4,Customer 6,Birthday Party,10-05-2025,Confirmed,,,11:00,45.0,278.07,Party Room,Sports,,Party Room,Facility,True,True
7,8f0f2078-cbdf-48ca-81a9-c3fd6adae8ee,7fe06e42-caf2-4ac0-ae54-fb43e5c0c4d5,Customer 8,Birthday Party,23-04-2025,Confirmed,,,,,282.96,Party Room,Princess,,Party Room,Facility,True,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
982,3190e834-1099-412f-84d7-877a1664293f,858320b1-a397-46da-a6d6-590e0dab7e01,Customer 983,Birthday Party,02-06-2025,Confirmed,,,15:00,120.0,180.62,Party Room,Superhero,,Party Room,Class,True,True
983,69e74626-28eb-435a-839a-85c8292b0bda,37adf3a5-63d9-47d5-bd7a-4aaf374579db,Customer 984,Birthday Party,30-04-2025,Pending,,,09:00,90.0,37.31,Play Area,Superhero,,Play Area,Facility,True,True
989,31628d7e-4822-4baf-8977-3532bd115868,4cfe5d15-7435-4320-836d-a221ab454a82,Customer 990,Birthday Party,06-06-2025,Confirmed,,,15:00,45.0,212.87,Play Area,Princess,,Play Area,Facility,True,True
990,b9c20893-853f-4f62-91f6-ca09e484cf31,d0486a55-d60f-4fc9-baea-58aaf6a46fb9,Customer 991,Birthday Party,17-04-2025,Confirmed,,,15:00,45.0,231.22,Party Room,,,Party Room,Birthday Party,True,True


In [18]:
# Display all columns for rows with Booking Type "Class"
Class_rows = df.loc[df['Booking Type'] == 'Class']
Class_rows

Unnamed: 0,Booking ID,Customer ID,Customer Name,Booking Type,Booking Date,Status,Class Type,Instructor,Time Slot,Duration (mins),Price,Facility,Theme,Subscription Type,Service Name,Service Type,Customer_ID_Valid,Booking_ID_Valid
4,6272b4e7-a508-4ed7-bae0-21f7293287a8,eb297435-93d1-4e65-8dd4-6450922305cb,Customer 5,Class,13-04-2025,Pending,Art,,15:00,120.0,161.14,,,,Art,Class,True,True
11,fb5092f9-f7ac-4144-b477-0650c88134a4,22addcce-0a6c-4be1-b7ec-11320beb6739,Customer 12,Class,20-04-2025,Pending,Dance,,13:00,45.0,186.42,,,,Dance,Class,True,True
13,74b8102e-eef6-4774-a134-5a3d0d012c3d,6a0d66d3-24b1-4eb5-8f6f-b9893eb57701,Customer 14,Class,04-06-2025,Pending,Art,Amanda Davis,16:00,120.0,83.48,,,,Art,Class,True,True
15,e8e2f341-6a4f-43b5-aee5-0b35878474cd,489a4949-b903-4f4f-8bdb-17a0a7553d35,Customer 16,Class,10-05-2025,Confirmed,Art,Amanda Davis,10:00,90.0,30.53,,,,Art,Class,True,True
16,9bdfab08-b764-4f66-b295-e61de9f134af,41e8c079-1e43-467a-87dc-ff6b9330695c,Customer 17,Class,26-03-2025,Pending,Gymnastics,Amanda Davis,,,0.00,,,,Gymnastics,Class,True,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
991,51e641d7-b9e8-4bbe-bb4c-31ee1890141e,c09799ac-517b-40b5-a5ae-20395b8ce9e7,Customer 992,Class,21-05-2025,Confirmed,Dance,James Howard,,,236.52,,,,Dance,Class,True,True
992,47ed703d-5a43-4155-825f-6713a636877d,ba8b4ad5-9a21-41a3-8a43-c16a0b51eaae,Customer 993,Class,10-06-2025,Pending,Gymnastics,Lisa Hensley,12:00,45.0,120.10,,,,Gymnastics,Facility,True,True
993,73d48a88-6115-4fed-9b4e-a42a9d392713,b694aefb-8cbf-47fe-8d5b-aa71ce08fb59,Customer 994,Class,25-05-2025,Pending,Art,Lisa Hensley,13:00,90.0,0.00,,,,Art,Class,True,True
994,8ae357a5-5860-4a35-bd7a-8fc56e67f163,c190430a-ebf4-4951-8899-4faa1170de75,Customer 995,Class,17-05-2025,Pending,Gymnastics,Lisa Hensley,16:00,45.0,141.70,,,,Gymnastics,Class,True,True


In [19]:
# Display all columns for rows with Booking Type "Facility"
Facility_rows = df.loc[df['Booking Type'] == 'Facility']
Facility_rows

Unnamed: 0,Booking ID,Customer ID,Customer Name,Booking Type,Booking Date,Status,Class Type,Instructor,Time Slot,Duration (mins),Price,Facility,Theme,Subscription Type,Service Name,Service Type,Customer_ID_Valid,Booking_ID_Valid
0,279d92c6-ce26-47c0-8915-e45b77fe20e2,00901ce3-3d86-4c97-bca2-40ccac2fb99f,Customer 1,Facility,30-05-2025,Pending,,,10:00,90.0,42.74,Party Room,,,Party Room,Facility,True,True
6,ab8e1bb4-1268-4983-a726-99144a1fcb13,a0e06a2f-fe55-44e2-adbe-c2fa8efe3bf3,Customer 7,Facility,18-05-2025,Pending,,,,,161.82,Party Room,,,Party Room,Class,True,True
9,13e526c2-f8cc-463d-8ce5-2a6652dcc7c0,87ebdecd-4a9d-4a99-84da-fec3e0167563,Customer 10,Facility,02-06-2025,Pending,,,16:00,90.0,44.41,Play Area,,,Play Area,Facility,True,True
14,e6bcd389-c931-41d0-9dae-c3e1015057c0,1a24e417-bfa5-4dd9-b27c-cdc8c9e18b0e,Customer 15,Facility,06-04-2025,Pending,,,10:00,90.0,180.94,Play Area,,,Play Area,Facility,True,True
17,c34016f8-4a55-4b01-b3d0-90799d6f1fbb,90c40046-fa6c-4b7b-9d68-301e17a7faad,Customer 18,Facility,25-04-2025,Confirmed,,,09:00,120.0,0.00,Party Room,,,Party Room,Facility,True,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
986,262f8686-b181-4dfb-b47b-c5df0e13d83c,7d5321fe-ef1d-4219-af80-16e4c708778a,Customer 987,Facility,24-04-2025,Pending,,,09:00,120.0,102.88,Play Area,,,Play Area,Facility,True,True
987,a5030331-95a2-4f92-a325-98ee91761171,3334a11b-c786-4dc6-9380-b490c780907b,Customer 988,Facility,24-05-2025,Pending,,,,,105.40,Play Area,,,Play Area,Birthday Party,True,True
997,c552c1b5-bb2e-46a5-b5c9-43101cf39133,623b4030-64c5-46ae-935f-fd9c3beea824,Customer 998,Facility,04-04-2025,Confirmed,,,17:00,120.0,172.84,Party Room,,,Party Room,Facility,True,True
998,09fb0473-56cc-4624-9c5a-e9cd152acc41,e483e86a-cc2c-4a49-86ef-b7a0c1b6fd7b,Customer 999,Facility,11-05-2025,Pending,,,12:00,120.0,275.18,Party Room,,,Party Room,Facility,True,True


### df.loc[row_selection, column_selection]:
`df.loc` is the pandas indexer for selecting and modifying rows and columns of a DataFrame. It is primarily label-based, but it also accepts boolean conditions, which is how the filters above work.
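
The two arguments of `df.loc` can be seen on a small example. This is a minimal sketch on a hypothetical frame (`sample` and its values are made up for illustration):

```python
import pandas as pd

sample = pd.DataFrame({
    "Booking Type": ["Facility", "Birthday Party", "Class"],
    "Status": ["Pending", "Pending", "Confirmed"],
    "Price": [42.74, 182.06, 30.53],
})

# Rows chosen by a boolean condition, columns chosen by label
pending_prices = sample.loc[sample["Status"] == "Pending", ["Booking Type", "Price"]]

# A single row label and a single column label return a scalar
first_price = sample.loc[0, "Price"]

# Assigning through .loc updates only the matching rows
sample.loc[sample["Booking Type"] == "Class", "Status"] = "Pending"

print(pending_prices)
```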

## Observations:
1. Missing values (`NaN`) were found in several columns of the dataset.
2. The goal is to assign appropriate values based on the **Booking Type** to ensure completeness and logical consistency.
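
To see which columns are missing for which Booking Type, the missing-value mask can be grouped by that column. The sketch below uses a hypothetical four-row frame rather than the real data:

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "Booking Type": ["Class", "Class", "Facility", "Birthday Party"],
    "Instructor": ["Amanda Davis", np.nan, np.nan, np.nan],
    "Theme": [np.nan, np.nan, np.nan, "Superhero"],
})

# Boolean mask of missing values, summed within each Booking Type
missing_by_type = (
    sample.drop(columns="Booking Type")
    .isna()
    .groupby(sample["Booking Type"])
    .sum()
)
print(missing_by_type)
```

A table like this makes it easier to tell structurally missing values (for example, an instructor on a facility booking) from genuinely incomplete records.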

---

### Next Steps: Assign Values by Booking Type
#### **Booking Type = Birthday Party**
- **Logic**:
  - `Class Type`: Set to "Party Activity."
  - `Instructor`: Assign "No Instructor" since instructors are not relevant.
  - `Theme`: Fill randomly with options like "Superhero," "Sports," or "Princess."
- **Outcome**:
  - Ensures all rows with "Birthday Party" have consistent and meaningful assignments.

---

#### **Booking Type = Class**
- **Logic**:
  - `Instructor`: Assign randomly from the available list (e.g., "Amanda Davis," "Lisa Hensley," "James Howard").
  - `Facility`: Assign based on the `Class Type`:
    - If `Class Type = "Art"`, assign "Art Room."
    - If `Class Type = "Dance"`, assign "Dance Room."
    - If `Class Type = "Gymnastics"`, assign "Gymnastics Room."
  - `Theme`: Assign a consistent value of "No Theme."
- **Outcome**:
  - Maintains logical connections between `Class Type` and `Facility`.
  - Fills missing instructors randomly for variety.

---

#### **Booking Type = Facility**
- **Logic**:
  - `Class Type`: Set to "Facility Activity."
  - `Instructor`: Assign "No Instructor."
  - `Theme`: Assign a consistent value of "No Theme."
- **Outcome**:
  - Ensures data completeness with predefined placeholder values.

---

### **Purpose**
- These steps fill the missing `Class Type`, `Instructor`, `Facility`, and `Theme` values logically and consistently based on the **Booking Type**.
- Once the remaining columns are cleaned, the dataset is ready for further analysis, reporting, or integration into scheduling or decision-making systems.


In [20]:
import random


# Values already present in the data, collected before any placeholders are written
instructors = df['Instructor'].dropna().unique().tolist()
themes = df['Theme'].dropna().unique().tolist()

# One boolean mask per Booking Type
is_party = df['Booking Type'] == 'Birthday Party'
is_class = df['Booking Type'] == 'Class'
is_facility = df['Booking Type'] == 'Facility'

# Birthday Party: fixed placeholders, plus a random Theme only where it is missing
df.loc[is_party, 'Class Type'] = 'Party Activity'
df.loc[is_party, 'Instructor'] = 'No Instructor'
party_no_theme = is_party & df['Theme'].isna()
df.loc[party_no_theme, 'Theme'] = df.loc[party_no_theme, 'Theme'].apply(
    lambda _: random.choice(themes)
)

# Class: a random Instructor only where it is missing, Facility derived from Class Type
class_no_instructor = is_class & df['Instructor'].isna()
df.loc[class_no_instructor, 'Instructor'] = df.loc[class_no_instructor, 'Instructor'].apply(
    lambda _: random.choice(instructors)
)

room_by_class = {'Art': 'Art Room', 'Dance': 'Dance Room', 'Gymnastics': 'Gymnastics Room'}
df.loc[is_class, 'Facility'] = (
    df.loc[is_class, 'Class Type'].map(room_by_class).fillna(df.loc[is_class, 'Facility'])
)
df.loc[is_class, 'Theme'] = 'No Theme'

# Facility: fixed placeholders only
df.loc[is_facility, 'Class Type'] = 'Facility Activity'
df.loc[is_facility, 'Instructor'] = 'No Instructor'
df.loc[is_facility, 'Theme'] = 'No Theme'

# Check the result
df.head()

Unnamed: 0,Booking ID,Customer ID,Customer Name,Booking Type,Booking Date,Status,Class Type,Instructor,Time Slot,Duration (mins),Price,Facility,Theme,Subscription Type,Service Name,Service Type,Customer_ID_Valid,Booking_ID_Valid
0,279d92c6-ce26-47c0-8915-e45b77fe20e2,00901ce3-3d86-4c97-bca2-40ccac2fb99f,Customer 1,Facility,30-05-2025,Pending,Facility Activity,No Instructor,10:00,90.0,42.74,Party Room,No Theme,,Party Room,Facility,True,True
1,415bfcbe-1a2e-4d4b-809a-4c5b606653b1,b82db986-bd52-4b07-bdd8-aa8cf2016241,Customer 2,Birthday Party,29-05-2025,Pending,Party Activity,No Instructor,,,182.06,Party Room,Superhero,,Party Room,Birthday Party,True,True
2,2100024b-46fc-47b5-ac1c-047d007a4723,6bbb6e83-9577-4f64-80b0-f073132d18f3,Customer 3,Birthday Party,09-05-2025,Confirmed,Party Activity,No Instructor,11:00,120.0,207.5,Play Area,Sports,,Play Area,Facility,True,True
3,74936def-088f-4d34-bad1-dfa76f78b704,f16f5beb-6a7d-4493-a19e-a30dbbd206e9,Customer 4,Birthday Party,07-06-2025,Pending,Party Activity,No Instructor,12:00,90.0,203.2,Play Area,Sports,,Play Area,Birthday Party,True,True
4,6272b4e7-a508-4ed7-bae0-21f7293287a8,eb297435-93d1-4e65-8dd4-6450922305cb,Customer 5,Class,13-04-2025,Pending,Art,Lisa Hensley,15:00,120.0,161.14,Art Room,No Theme,,Art,Class,True,True


## Results:
The dataset has been successfully updated based on the conditions for each Booking Type. Below is a summary of the changes applied:

### **1. Booking Type = "Birthday Party"**
- `Class Type`: All rows are now updated with the value `"Party Activity"`.
- `Instructor`: All rows now have `"No Instructor"` as expected for this Booking Type.
- `Theme`: Missing values (`NaN`) are replaced with random values from the list: `["Superhero", "Sports", "Princess"]`.

### **2. Booking Type = "Class"**
- `Instructor`: Every row now has an instructor drawn from those already present in the data (e.g., "Amanda Davis", "Lisa Hensley", "James Howard").
- `Facility`: The Facility column now matches the Class Type:
  - `"Art"` → `"Art Room"`
  - `"Dance"` → `"Dance Room"`
  - `"Gymnastics"` → `"Gymnastics Room"`
- `Theme`: All rows are set to `"No Theme"` for consistency.

### **3. Booking Type = "Facility"**
- `Class Type`: All rows are updated with the value `"Facility Activity"`.
- `Instructor`: All rows now have `"No Instructor"` since instructors are irrelevant for Facility bookings.
- `Theme`: All rows are set to `"No Theme"` for consistency.

---

### Observations:
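1. **No Missing Values (`NaN`) in the Assigned Columns**:
   - `Class Type`, `Instructor`, `Facility`, and `Theme` are now filled according to the defined logic. `Time Slot`, `Duration (mins)`, and `Subscription Type` still contain `NaN` at this point.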
2. **Logical Consistency**:
   - Assignments follow the logic and conditions specified for each Booking Type.
   - Ensures meaningful and accurate data representation.
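
These observations can be checked programmatically rather than by eye. The following is a sketch on a hypothetical three-row frame (`cleaned` stands in for the updated DataFrame; the variable names are illustrative):

```python
import pandas as pd

cleaned = pd.DataFrame({
    "Booking Type": ["Facility", "Birthday Party", "Class"],
    "Class Type": ["Facility Activity", "Party Activity", "Art"],
    "Instructor": ["No Instructor", "No Instructor", "Lisa Hensley"],
    "Facility": ["Party Room", "Party Room", "Art Room"],
    "Theme": ["No Theme", "Superhero", "No Theme"],
})

# 1. The assigned columns should contain no NaN
filled_columns = ["Class Type", "Instructor", "Facility", "Theme"]
remaining_nans = cleaned[filled_columns].isna().sum()

# 2. For Class bookings, Facility should follow from Class Type
room_by_class = {"Art": "Art Room", "Dance": "Dance Room", "Gymnastics": "Gymnastics Room"}
is_class = cleaned["Booking Type"] == "Class"
expected_room = cleaned.loc[is_class, "Class Type"].map(room_by_class)
rooms_match = (cleaned.loc[is_class, "Facility"] == expected_room).all()

# 3. Non-class bookings should carry the instructor placeholder
placeholders_ok = (cleaned.loc[~is_class, "Instructor"] == "No Instructor").all()

print(remaining_nans, rooms_match, placeholders_ok)
```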

---

### Example Updated Dataset (Few Rows):
| Booking Type      | Class Type         | Instructor    | Facility          | Theme        |
|-------------------|--------------------|---------------|-------------------|--------------|
| Birthday Party    | Party Activity     | No Instructor | Party Room        | Superhero    |
| Class             | Art                | Lisa Hensley  | Art Room          | No Theme     |
| Facility          | Facility Activity  | No Instructor | Party Room        | No Theme     |
| Class             | Gymnastics         | James Howard  | Gymnastics Room   | No Theme     |
| Birthday Party    | Party Activity     | No Instructor | Party Room        | Princess     |

---


## Displaying Data Types of Columns
### Objective:
- Use `df.dtypes` to display the data types of all columns in the DataFrame.
- This is useful for understanding the structure of your dataset and identifying the types of data you're working with (e.g., `int64`, `float64`, `object`).

### Why It's Important:
1. **Data Validation**: Ensures each column has the expected data type.
2. **Processing Preparation**: Helps in deciding appropriate operations (e.g., mathematical operations on numerical columns, string manipulation on object columns).

In [21]:
df.dtypes

Booking ID            object
Customer ID           object
Customer Name         object
Booking Type          object
Booking Date          object
Status                object
Class Type            object
Instructor            object
Time Slot             object
Duration (mins)      float64
Price                float64
Facility              object
Theme                 object
Subscription Type    float64
Service Name          object
Service Type          object
Customer_ID_Valid       bool
Booking_ID_Valid        bool
dtype: object

## Output Explanation
- The output displays each column's name along with its data type.
- Common data types include:
  - `int64` for integers,
  - `float64` for decimal numbers,
  - `object` for text or mixed data types,
  - `datetime64` for date/time values.
  
### Next Steps:
- If the data types don't match your expectations (e.g., a numeric column mistakenly identified as `object`), you can use type conversions to fix them (e.g., `astype()` for conversions).
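
As an illustration of such a conversion, the sketch below fixes a numeric column that was read as text. The frame `raw` and its `"n/a"` entry are hypothetical:

```python
import pandas as pd

raw = pd.DataFrame({
    "Price": ["42.74", "182.06", "n/a"],      # numbers stored as text
    "Duration (mins)": [90.0, 120.0, 45.0],   # whole numbers stored as float
})

# to_numeric with errors="coerce" turns unparseable text into NaN instead of raising
raw["Price"] = pd.to_numeric(raw["Price"], errors="coerce")

# astype is enough when every value is already convertible
raw["Duration (mins)"] = raw["Duration (mins)"].astype("int64")

print(raw.dtypes)
```

`astype` raises an error on values it cannot convert, so `pd.to_numeric(..., errors="coerce")` is the safer choice when a column may contain stray text.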

## Filling Missing Values Using the Mode for "Time Slot" and "Duration (mins)"
### Objective:
- Replace missing values (`NaN`) in the columns `Time Slot` and `Duration (mins)` with their respective mode (most frequently occurring value).
- This ensures that the dataset is complete and consistent, particularly for categorical or frequently repeated numerical data.

### Steps:
1. **Find the Mode**:
   - Use the `.mode()` function to find the most common value in each column.
   - The `[0]` index extracts the first mode value (in case of multiple modes).
2. **Fill Missing Values**:
   - Use `.fillna()` to replace the `NaN` values in the respective columns with their mode.

In [22]:
# Fill missing values in "Time Slot" with its mode
mode_value = df['Time Slot'].mode()[0]
df['Time Slot'] = df['Time Slot'].fillna(mode_value)

# Fill missing values in "Duration (mins)" with its mode
mode_value = df['Duration (mins)'].mode()[0]
df['Duration (mins)'] = df['Duration (mins)'].fillna(mode_value)


## Output Explanation
- **Time Slot**: Missing values have been replaced with the most frequently occurring value in the column.
- **Duration (mins)**: Similarly, missing durations have been replaced with their most frequent value.
- Mode imputation suits categorical columns (like time slots) and numerical columns with few distinct values (like the durations here), but it assumes the most common value is a reasonable stand-in for each missing entry, and it inflates that value's frequency.

### Benefits:
- Ensures no missing values remain, reducing the risk of errors during subsequent processing.
- Retains consistency by using the most frequently occurring values.
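
The role of the `[0]` index is easiest to see when two values are tied for most frequent. This is a minimal sketch on a hypothetical Series of time slots:

```python
import numpy as np
import pandas as pd

slots = pd.Series(["10:00", "15:00", "10:00", "15:00", np.nan], name="Time Slot")

# mode() ignores NaN by default and returns every tied value, sorted
modes = slots.mode()

# [0] picks the first of the tied values
fill_value = modes[0]
filled = slots.fillna(fill_value)

print(modes.tolist(), fill_value)
```

With a tie, the choice of `modes[0]` is arbitrary in the sense that it depends only on sort order, which is worth keeping in mind when the filled column feeds later analysis.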

## Data Type Conversion for "Booking Date" and "Duration (mins)"
### Objective:
1. **Convert "Booking Date"**:
   - Ensure that the `Booking Date` column is properly formatted as `datetime`.
   - Use `dayfirst=True` to interpret dates in the `DD-MM-YYYY` format used by the dataset (e.g., `30-05-2025`).
   - Use `errors='coerce'` to handle invalid dates by replacing them with `NaT` (Not a Time).
2. **Convert "Duration (mins)"**:
   - Change the data type from `float` to `int` for durations to ensure consistent numeric formatting.
3. **Validate Data Types**:
   - Display the data types of all columns using `df.dtypes` to confirm successful conversions.

In [23]:
df['Booking Date'] = pd.to_datetime(df['Booking Date'],dayfirst=True, errors='coerce')
df['Duration (mins)'] = df['Duration (mins)'].astype(int)
df.dtypes

Booking ID                   object
Customer ID                  object
Customer Name                object
Booking Type                 object
Booking Date         datetime64[ns]
Status                       object
Class Type                   object
Instructor                   object
Time Slot                    object
Duration (mins)               int32
Price                       float64
Facility                     object
Theme                        object
Subscription Type           float64
Service Name                 object
Service Type                 object
Customer_ID_Valid              bool
Booking_ID_Valid               bool
dtype: object
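
The effect of `dayfirst=True` and `errors='coerce'` can be seen on a few values. The sketch below uses a hypothetical Series in which the last entry is deliberately an impossible date:

```python
import pandas as pd

dates = pd.Series(["30-05-2025", "09-05-2025", "31-02-2025"])

# dayfirst=True reads 09-05-2025 as 9 May rather than 5 September;
# errors="coerce" turns the impossible 31 February into NaT
parsed = pd.to_datetime(dates, dayfirst=True, errors="coerce")

print(parsed)
print("unparseable dates:", parsed.isna().sum())
```

Counting `NaT` values after the conversion is a quick way to confirm that no dates were silently lost.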

## Removing Columns with Null Values
### Objective:
Clean the dataset by removing the columns that still contain null values (`NaN`), so that it is more focused and concise.

---

## Implementation Details:
1. **Remove Columns with Any Null Values**:
   - Use `dropna(axis=1, how='any')` to drop every column that contains at least one `NaN`.
   - After the earlier fills, only `Subscription Type` (1000 missing values) still contains `NaN`, so it is the only column removed.
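
The `how` argument of `dropna` decides how aggressive the removal is. This is a short sketch on a hypothetical frame with one partly missing and one entirely missing column:

```python
import numpy as np
import pandas as pd

sample = pd.DataFrame({
    "Price": [42.74, 182.06],
    "Theme": ["Superhero", np.nan],          # partly missing
    "Subscription Type": [np.nan, np.nan],   # entirely missing
})

# how="any" drops a column as soon as it has a single NaN
dropped_any = sample.dropna(axis=1, how="any")

# how="all" drops a column only when every value is NaN
dropped_all = sample.dropna(axis=1, how="all")

print(list(dropped_any.columns))
print(list(dropped_all.columns))
```

`how='all'` is the more conservative option when partly filled columns should be kept.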


In [25]:
# Remove columns with any null values
df = df.dropna(axis=1, how='any')

In [26]:
# Remove the helper validity columns, which are no longer needed
df = df.drop(columns=['Customer_ID_Valid', 'Booking_ID_Valid'])

In [27]:
df.columns

Index(['Booking ID', 'Customer ID', 'Customer Name', 'Booking Type',
       'Booking Date', 'Status', 'Class Type', 'Instructor', 'Time Slot',
       'Duration (mins)', 'Price', 'Facility', 'Theme', 'Service Name',
       'Service Type'],
      dtype='object')

In [28]:
# get the number of missing data points per column
missing_values_count = df.isnull().sum()
missing_values_count

Booking ID         0
Customer ID        0
Customer Name      0
Booking Type       0
Booking Date       0
Status             0
Class Type         0
Instructor         0
Time Slot          0
Duration (mins)    0
Price              0
Facility           0
Theme              0
Service Name       0
Service Type       0
dtype: int64

In [29]:
df.to_csv('corrected_file.csv', index=False)
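
CSV files do not store column types, so a date column comes back as text unless it is parsed again on load. The sketch below shows the round trip with an in-memory buffer in place of a file; the frame `cleaned` is a hypothetical stand-in for the exported data:

```python
import io

import pandas as pd

cleaned = pd.DataFrame({
    "Booking Date": pd.to_datetime(["2025-05-30", "2025-05-29"]),
    "Duration (mins)": [90, 120],
})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
cleaned.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates restores the datetime dtype when reading the data back
reloaded = pd.read_csv(buffer, parse_dates=["Booking Date"])

print(reloaded.dtypes)
```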