# Exploratory Data Analysis (EDA) for Interview Preparation

This notebook is an EDA template for answering common interview questions about a dataset: its shape and size, data types, missing values, summary statistics, per-column extremes, duplicates, and sample rows.

## 1. Import Required Libraries
Import pandas, numpy, and any other libraries needed for data analysis.

In [2]:
import pandas as pd
import numpy as np
# import matplotlib.pyplot as plt  # Uncomment if you want to visualize
df = None  # Placeholder for the DataFrame

## 2. Load the Dataset
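Read the dataset into a pandas DataFrame. Update the path as needed. Reading `.xlsx` files with `pd.read_excel` requires an Excel engine such as `openpyxl` to be installed.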

In [3]:
# Update the file path as needed
dataset_path = '../bookings.xlsx'  # or '../revenue.xlsx' or your own file
try:
    df = pd.read_excel(dataset_path)
    print(f"Loaded dataset with shape: {df.shape}")
except Exception as e:
    print(f"Error loading dataset: {e}")

Loaded dataset with shape: (676, 32)
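
The loading cell assumes an Excel file. If the same template is pointed at a CSV, the reader has to change. The sketch below is one way to pick the reader from the file extension; `load_table` is a hypothetical helper, not part of pandas, and the toy CSV exists only to make the example runnable.

```python
import os
import tempfile
from pathlib import Path

import pandas as pd


def load_table(path):
    """Hypothetical helper: choose a pandas reader from the file extension."""
    suffix = Path(path).suffix.lower()
    if suffix in {".xlsx", ".xls"}:
        return pd.read_excel(path)  # needs an Excel engine such as openpyxl
    if suffix == ".csv":
        return pd.read_csv(path)
    raise ValueError(f"Unsupported file type: {suffix!r}")


# Toy data written to a temporary CSV, only to exercise the helper.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = os.path.join(tmp, "toy.csv")
    pd.DataFrame({"a": [1, 2], "b": [3, 4]}).to_csv(csv_path, index=False)
    toy = load_table(csv_path)

print(f"Loaded dataset with shape: {toy.shape}")
```

Raising on an unknown extension, rather than printing and continuing, makes a wrong path fail at the loading step instead of in a later cell.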


## 3. View Dataset Shape and Size
Display the number of rows and columns, and the total number of elements.

In [4]:
if df is not None:
    print(f"Rows: {df.shape[0]}, Columns: {df.shape[1]}")
    print(f"Total elements: {df.size}")

Rows: 676, Columns: 32
Total elements: 21632
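
`df.size` is simply rows times columns (here 676 × 32 = 21632). A common follow-up question is how much memory the frame uses. The sketch below shows both on a small toy frame; the column names and values are made up for illustration.

```python
import pandas as pd

# Toy frame; names and values are illustrative only.
toy = pd.DataFrame({"room": ["A", "B", "A"], "minutes": [30, 60, 15]})

n_rows, n_cols = toy.shape
total_elements = toy.size  # always n_rows * n_cols

# deep=True also counts the Python string objects held in object columns.
mem_bytes = int(toy.memory_usage(deep=True).sum())

print(f"Rows: {n_rows}, Columns: {n_cols}, Total elements: {total_elements}")
print(f"Approximate memory: {mem_bytes} bytes")
```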


## 4. Display Data Types and Null Values
Show the data type of each column and the number of missing values per column.

In [5]:
if df is not None:
    print("\nData types:")
    print(df.dtypes)
    print("\nMissing values per column:")
    print(df.isnull().sum())


Data types:
product_name                      object
property_name                     object
enterprise_name                   object
frequency                        float64
price                            float64
calculated_price                 float64
booking_start_time        datetime64[ns]
booking_end_time          datetime64[ns]
duration                          object
duration_in_mins                   int64
property_timezone                 object
discount_code                    float64
created_at                        object
updated_at                        object
is_deleted                          bool
tax                                int64
stripe_charge_id                 float64
refunded                         float64
session_processed                   bool
tax_rate_id                      float64
currency                          object
product_type                      object
creator_id                        object
payment_reference_id             float64
cou
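
Raw null counts are easier to interpret as percentages, and columns that are entirely empty are usually worth listing separately. The sketch below builds such a summary on a toy frame; the column names echo the dataset, but the values are invented.

```python
import numpy as np
import pandas as pd

# Toy frame with one partly missing and one fully missing column.
toy = pd.DataFrame({
    "price": [10.0, np.nan, 50.0, np.nan],
    "discount_code": [np.nan] * 4,
    "product_name": ["a", "b", "c", "d"],
})

is_missing = toy.isna()
missing = pd.DataFrame({
    "missing": is_missing.sum(),                   # count of nulls per column
    "percent": (is_missing.mean() * 100).round(1)  # share of nulls per column
}).sort_values("missing", ascending=False)

# Columns where every value is null.
empty_cols = toy.columns[is_missing.all()].tolist()

print(missing)
print(f"Entirely empty columns: {empty_cols}")
```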

## 5. Show Summary Statistics
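Display summary statistics for every column. With `include='all'`, numeric columns get count, mean, and quantiles, while text columns get count, unique, top, and freq; statistics that do not apply to a column show as NaN.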

In [6]:
if df is not None:
    print("\nSummary statistics:")
    display(df.describe(include='all'))


Summary statistics:


Unnamed: 0,product_name,property_name,enterprise_name,frequency,price,calculated_price,booking_start_time,booking_end_time,duration,duration_in_mins,...,creator_id,payment_reference_id,coupon_discount_amount,payment_origin,stripe_account_id,failed_transaction,wallet_credits_used,payment_gateway,surcharge,status
count,676,676,676,0.0,507.0,507.0,676,676,676,676.0,...,676,0.0,676.0,0.0,0.0,0.0,676.0,0.0,676.0,342.0
unique,41,8,66,,,,,,24,,...,91,,,,,,,,,
top,Rooftop Area (BBQ & Seating),American Realty Inc.,High Folio,,,,,,00:30:00,,...,ee97bd55-05aa-4b5c-879d-92465d525d04,,,,,,,,,
freq,156,455,451,,,,,,376,,...,378,,,,,,,,,
mean,,,,,143.537387,143.537387,2023-12-18 23:31:03.905325312,2023-12-19 00:54:01.863905280,,82.965976,...,,,0.0,,,,0.0,,0.0,0.947368
min,,,,,0.0,0.0,2022-08-31 12:00:00,2022-08-31 12:30:00,,15.0,...,,,0.0,,,,0.0,,0.0,0.0
25%,,,,,0.0,0.0,2023-03-27 14:37:30,2023-03-27 16:07:30,,30.0,...,,,0.0,,,,0.0,,0.0,1.0
50%,,,,,50.0,50.0,2024-02-02 05:30:00,2024-02-02 06:00:00,,30.0,...,,,0.0,,,,0.0,,0.0,1.0
75%,,,,,50.25,50.25,2024-06-12 09:30:00,2024-06-12 10:00:00,,60.0,...,,,0.0,,,,0.0,,0.0,1.0
max,,,,,11500.0,11500.0,2025-09-13 10:00:00,2025-09-13 11:00:00,,1425.0,...,,,0.0,,,,0.0,,0.0,1.0
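
Because `include='all'` mixes both kinds of statistics in one table, many cells are empty. A possible alternative, sketched here on invented toy data, is to describe numeric and non-numeric columns separately.

```python
import numpy as np
import pandas as pd

# Toy frame; values are illustrative only.
toy = pd.DataFrame({
    "product_name": ["room", "room", "roof"],
    "price": [0.0, 50.0, 100.0],
})

# Default describe() covers numeric columns: count, mean, std, quantiles.
numeric_stats = toy.describe()

# Excluding numeric columns leaves count, unique, top, freq for text columns.
text_stats = toy.describe(exclude=[np.number])

print(numeric_stats)
print(text_stats)
```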


## 6. Find Maximum, Minimum, and Unique Values
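For each column, display the maximum, minimum, and number of unique values. For text columns, maximum and minimum follow string sort order, so lowercase names sort after uppercase ones. Columns that are entirely missing report `nan` and 0 unique values.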

In [7]:
if df is not None:
    for col in df.columns:
        print(f"\nColumn: {col}")
        print(f"  Max: {df[col].max()}")
        print(f"  Min: {df[col].min()}")
        print(f"  Unique values: {df[col].nunique()}")


Column: product_name
  Max: test meetingroom 27-01
  Min: Affordable Bookable Space
  Unique values: 41

Column: property_name
  Max: Mohit Property
  Min: American Realty Inc.
  Unique values: 8

Column: enterprise_name
  Max: tet test
  Min: 08112023_Enterprise
  Unique values: 66

Column: frequency
  Max: nan
  Min: nan
  Unique values: 0

Column: price
  Max: 11500.0
  Min: 0.0
  Unique values: 51

Column: calculated_price
  Max: 11500.0
  Min: 0.0
  Unique values: 51

Column: booking_start_time
  Max: 2025-09-13 10:00:00
  Min: 2022-08-31 12:00:00
  Unique values: 622

Column: booking_end_time
  Max: 2025-09-13 11:00:00
  Min: 2022-08-31 12:30:00
  Unique values: 631

Column: duration
  Max: 23:45:00
  Min: 00:15:00
  Unique values: 24

Column: duration_in_mins
  Max: 1425
  Min: 15
  Unique values: 24

Column: property_timezone
  Max: Australia/Perth
  Min: Africa/Abidjan
  Unique values: 5

Column: discount_code
  Max: nan
  Min: nan
  Unique values: 0

Column: created_at
  Max
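
The per-column loop prints a long listing and can fail if a column mixes types that cannot be compared. The sketch below collects the same three values into one table and falls back to `None` when a comparison raises; `column_extremes` is a hypothetical helper and the toy frame is invented.

```python
import numpy as np
import pandas as pd


def column_extremes(frame):
    """Hypothetical helper: min, max, and distinct-value count per column."""
    rows = []
    for col in frame.columns:
        values = frame[col].dropna()
        lo = hi = None
        if not values.empty:
            try:
                lo, hi = values.min(), values.max()
            except TypeError:
                # Values of incomparable types (for example int and str).
                lo = hi = None
        rows.append({
            "column": col,
            "min": lo,
            "max": hi,
            "n_unique": frame[col].nunique(),  # NaN is not counted
        })
    return pd.DataFrame(rows).set_index("column")


# Toy frame; values are illustrative only.
toy = pd.DataFrame({
    "price": [0.0, 11500.0, np.nan],
    "frequency": [np.nan, np.nan, np.nan],
    "product_name": ["b", "a", "b"],
})

extremes = column_extremes(toy)
print(extremes)
```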

## 7. Check for Duplicates
Count the rows that exactly duplicate an earlier row across all columns.

In [8]:
if df is not None:
    duplicate_count = df.duplicated().sum()
    print(f"Number of duplicate rows: {duplicate_count}")

Number of duplicate rows: 0
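
Zero fully duplicated rows does not rule out repeats on a business key. The sketch below checks a subset of columns on invented toy data; treating `product_name` plus `booking_start_time` as the key is an assumption made for illustration, not something the dataset confirms.

```python
import pandas as pd

# Toy frame; values are illustrative only.
toy = pd.DataFrame({
    "product_name": ["room", "room", "roof"],
    "booking_start_time": pd.to_datetime(
        ["2024-01-01 09:00", "2024-01-01 09:00", "2024-01-01 09:00"]
    ),
    "creator_id": ["u1", "u2", "u3"],
})

# Rows identical in every column to an earlier row.
full_dupes = int(toy.duplicated().sum())

# Assumed key columns; keep=False marks every member of a duplicate group.
key_cols = ["product_name", "booking_start_time"]
key_dupes = toy[toy.duplicated(subset=key_cols, keep=False)]

print(f"Number of duplicate rows: {full_dupes}")
print(f"Rows sharing {key_cols}: {len(key_dupes)}")
```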


## 8. Preview Sample Rows
Display the first few and last few rows of the dataset.

In [9]:
if df is not None:
    print("\nFirst 5 rows:")
    display(df.head())
    print("\nLast 5 rows:")
    display(df.tail())


First 5 rows:


Unnamed: 0,product_name,property_name,enterprise_name,frequency,price,calculated_price,booking_start_time,booking_end_time,duration,duration_in_mins,...,creator_id,payment_reference_id,coupon_discount_amount,payment_origin,stripe_account_id,failed_transaction,wallet_credits_used,payment_gateway,surcharge,status
0,Affordable Bookable Space,Beachfront Realty,High Folio,,0.0,0.0,2024-04-16 13:00:00,2024-04-16 13:30:00,00:30:00,30,...,ee97bd55-05aa-4b5c-879d-92465d525d04,,0,,,,0,,0,
1,Multifunctional and Training Room,American Realty Inc.,Checkout User,,100.0,100.0,2024-06-24 03:30:00,2024-06-24 06:30:00,03:00:00,180,...,9edbc6dd-07a6-4f2d-bf46-8a49c93d9386,,0,,,,0,,0,1.0
2,meeting room13,Australia Realty Inc.,Zahir Shaikh16,,10.0,10.0,2025-03-15 09:00:00,2025-03-15 10:00:00,01:00:00,60,...,eb0cd214-a434-470f-a9e9-45036b680f8f,,0,,,,0,,0,1.0
3,Company Invoice Multifunctional and Training Room,American Realty Inc.,High Folio,,,,2024-06-24 13:00:00,2024-06-24 14:00:00,01:00:00,60,...,228b8380-0ed8-43a8-b9bd-bc53f7ed7a10,,0,,,,0,,0,
4,Rooftop Area (BBQ & Seating),American Realty Inc.,High Folio,,,,2022-11-14 06:00:00,2022-11-14 06:30:00,00:30:00,30,...,ee97bd55-05aa-4b5c-879d-92465d525d04,,0,,,,0,,0,



Last 5 rows:


Unnamed: 0,product_name,property_name,enterprise_name,frequency,price,calculated_price,booking_start_time,booking_end_time,duration,duration_in_mins,...,creator_id,payment_reference_id,coupon_discount_amount,payment_origin,stripe_account_id,failed_transaction,wallet_credits_used,payment_gateway,surcharge,status
671,meeting room10-09 test,GBP test,High Folio,,0.0,0.0,2024-10-11 10:00:00,2024-10-11 11:00:00,01:00:00,60,...,ee97bd55-05aa-4b5c-879d-92465d525d04,,0,,,,0,,0,
672,Multifunctional and Training Room,American Realty Inc.,High Folio,,100.0,100.0,2022-12-01 04:00:00,2022-12-01 05:00:00,01:00:00,60,...,ee97bd55-05aa-4b5c-879d-92465d525d04,,0,,,,0,,0,1.0
673,Multifunctional and Training Room,American Realty Inc.,High Folio,,200.0,200.0,2024-08-16 04:00:00,2024-08-16 05:00:00,01:00:00,60,...,ee97bd55-05aa-4b5c-879d-92465d525d04,,0,,,,0,,0,1.0
674,Rooftop Area (BBQ & Seating),American Realty Inc.,High Folio,,,,2023-02-15 11:30:00,2023-02-15 11:45:00,00:15:00,15,...,ee97bd55-05aa-4b5c-879d-92465d525d04,,0,,,,0,,0,
675,test meetingroom 27-01,Beachfront Realty,High Folio,,,,2025-01-27 09:00:00,2025-01-27 09:30:00,00:30:00,30,...,ee97bd55-05aa-4b5c-879d-92465d525d04,,0,,,,0,,0,
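
The previews above elide the middle columns with `...`, and the head and tail may not be representative of the whole file. The sketch below, on an invented wide toy frame, lifts the column display limit temporarily and draws a reproducible random sample; the seed value is arbitrary.

```python
import numpy as np
import pandas as pd

# Wide toy frame (4 rows x 30 columns); values are illustrative only.
toy = pd.DataFrame(
    np.arange(120).reshape(4, 30),
    columns=[f"col_{i}" for i in range(30)],
)

# Lift the column display limit only inside this block.
with pd.option_context("display.max_columns", None):
    limit_inside = pd.get_option("display.max_columns")
    print(toy.head(2))

# A fixed random_state makes the sample reproducible across runs.
sampled = toy.sample(n=3, random_state=42)
print(sampled)
```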
