📒 Myanmar SMEs Sample Dataset Analysis (Test_Version)
1. Introduction

This notebook explores and analyzes the Myanmar SMEs dataset.
We will:

Load and inspect the dataset

Check data quality (missing values, duplicates, column types)

Clean and preprocess data

Perform exploratory analysis

Visualize some interesting insights

In [1]:
# Core Libraries
import pandas as pd
import numpy as np


For visualization, we need the following packages:
**matplotlib.pyplot**
**seaborn**

In [1]:
import matplotlib.pyplot as plt
import seaborn as sns

For better-looking plots, we apply a Matplotlib style sheet and a Seaborn color palette:

In [2]:
# Settings for better visuals
plt.style.use("seaborn-v0_8")
sns.set_palette("Set2")

Now let's load the Myanmar SMEs dataset and look at the first few rows.

In [10]:
# Load CSV file
file_path = "Myanmar SMEs Listsed.csv"  # update path if needed
df = pd.read_csv(file_path)

# Display basic info
df.head()


Unnamed: 0,Business ID,Business Name,Industry Sector,Owner Name,Gender,Position,Contact Number,Email Address,Region or State,Business Size,Registration Status,Number of Employees,Revenue (Annual),Customer Base,Social Media Presence
0,1,Aye Myanmar Trading,Retail,Aye Aye,Female,Owner,09-123456789,aye@example.com,Yangon,Small,Registered,15,50000,Retail Shops,Yes
1,2,Golden Harvest,Agriculture,Tun Tun,Male,Manager,09-987654321,tun@example.com,Mandalay,Medium,Not Registered,50,200000,Farmers,No
2,3,Shwe Min Co.,Manufacturing,Min Min,Male,Owner,09-234567890,min@example.com,Bago,Small,Registered,10,30000,Wholesalers,Yes
3,4,Evergreen Foods,Food Processing,Hla Hla,Female,Owner,09-345678901,hla@example.com,Yangon,Medium,Not Registered,30,150000,Supermarkets,No
4,5,Royal Rubber,Rubber,Soe Moe,Male,Manager,09-456789012,soe@example.com,Mon,Micro,Registered,5,10000,Exporters,Yes


Now it is time to get to know the basic dataset information.


In [11]:
# Shape
print("Rows, Columns:", df.shape)

# Info (column types, nulls, etc.)
df.info()

# Summary statistics
df.describe(include="all").T


Rows, Columns: (20, 15)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 15 columns):
 #   Column                 Non-Null Count  Dtype 
---  ------                 --------------  ----- 
 0   Business ID            20 non-null     int64 
 1   Business Name          20 non-null     object
 2   Industry Sector        20 non-null     object
 3   Owner Name             20 non-null     object
 4   Gender                 18 non-null     object
 5   Position               20 non-null     object
 6   Contact Number         20 non-null     object
 7   Email Address          19 non-null     object
 8   Region or State        20 non-null     object
 9   Business Size          20 non-null     object
 10  Registration Status    20 non-null     object
 11  Number of Employees    20 non-null     int64 
 12  Revenue (Annual)       20 non-null     int64 
 13  Customer Base          20 non-null     object
 14  Social Media Presence  20 non-null     object
dtypes

Unnamed: 0,count,unique,top,freq,mean,std,min,25%,50%,75%,max
Business ID,20.0,,,,10.5,5.91608,1.0,5.75,10.5,15.25,20.0
Business Name,20.0,11.0,Aye Myanmar Trading,2.0,,,,,,,
Industry Sector,20.0,9.0,Agriculture,4.0,,,,,,,
Owner Name,20.0,10.0,Aye Aye,2.0,,,,,,,
Gender,18.0,2.0,Female,9.0,,,,,,,
Position,20.0,2.0,Owner,12.0,,,,,,,
Contact Number,20.0,10.0,09-123456789,2.0,,,,,,,
Email Address,19.0,10.0,aye@example.com,2.0,,,,,,,
Region or State,20.0,7.0,Yangon,6.0,,,,,,,
Business Size,20.0,3.0,Small,8.0,,,,,,,
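
The summary reports three unique values for Business Size, and the first rows show Micro, Small, and Medium. As a minimal sketch, assuming those are the only three labels and that they rank Micro < Small < Medium, the column can be stored as an ordered categorical so that sorting and plots follow size order rather than alphabetical order. The `sample` frame and the `size_order` list are illustrative names, not part of the notebook above.

```python
import pandas as pd

# Illustrative rows copied from the df.head() output (not the full dataset).
sample = pd.DataFrame({
    "Business Name": ["Aye Myanmar Trading", "Golden Harvest", "Shwe Min Co.",
                      "Evergreen Foods", "Royal Rubber"],
    "Business Size": ["Small", "Medium", "Small", "Medium", "Micro"],
    "Number of Employees": [15, 50, 10, 30, 5],
})

# Assumption: the sizes rank Micro < Small < Medium.
size_order = ["Micro", "Small", "Medium"]
sample["Business Size"] = pd.Categorical(
    sample["Business Size"], categories=size_order, ordered=True
)

# Sorting now follows the declared size order, then employee count.
sorted_sample = sample.sort_values(["Business Size", "Number of Employees"])
print(sorted_sample)
```

On the full dataset the same idea would apply to `df["Business Size"]`. Any label outside `size_order` would become NaN, so it is worth checking `df["Business Size"].unique()` first.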


Now it is time for data cleaning.


In [12]:
# Check missing values
print("Missing values per column:")
print(df.isnull().sum())

# Drop duplicates if any
df = df.drop_duplicates()

# Handle missing values (example: fill with 'Unknown' for categorical, mean for numeric)
for col in df.columns:
    if df[col].dtype == "object":
        df[col] = df[col].fillna("Unknown")
    else:
        df[col] = df[col].fillna(df[col].mean())

print("After cleaning:")
print(df.isnull().sum())


Missing values per column:
Business ID              0
Business Name            0
Industry Sector          0
Owner Name               0
Gender                   2
Position                 0
Contact Number           0
Email Address            1
Region or State          0
Business Size            0
Registration Status      0
Number of Employees      0
Revenue (Annual)         0
Customer Base            0
Social Media Presence    0
dtype: int64
After cleaning:
Business ID              0
Business Name            0
Industry Sector          0
Owner Name               0
Gender                   0
Position                 0
Contact Number           0
Email Address            0
Region or State          0
Business Size            0
Registration Status      0
Number of Employees      0
Revenue (Annual)         0
Customer Base            0
Social Media Presence    0
dtype: int64
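
One caveat about the duplicate step: the summary statistics show only 11 unique business names across 20 rows, while Business ID runs from 1 to 20. If the same business appears under different IDs, a plain `drop_duplicates()` compares whole rows and would not remove it. The sketch below shows one possible check that ignores the identifier column. The third row of `toy` is a hypothetical repeat added for illustration, and `key_cols` is an assumed choice of comparison columns.

```python
import pandas as pd

# Toy rows: the first two come from the df.head() output; the third is a
# hypothetical repeat of the first under a new Business ID.
toy = pd.DataFrame({
    "Business ID": [1, 2, 11],
    "Business Name": ["Aye Myanmar Trading", "Golden Harvest", "Aye Myanmar Trading"],
    "Owner Name": ["Aye Aye", "Tun Tun", "Aye Aye"],
    "Contact Number": ["09-123456789", "09-987654321", "09-123456789"],
})

# Whole-row comparison: the differing IDs hide the repeat.
full_row_dupes = int(toy.duplicated().sum())

# Compare on every column except the identifier instead.
key_cols = [c for c in toy.columns if c != "Business ID"]
content_dupes = int(toy.duplicated(subset=key_cols).sum())

# Keep the first occurrence of each business.
deduped = toy.drop_duplicates(subset=key_cols, keep="first")
print(full_row_dupes, content_dupes, len(deduped))
```

Whether such rows are true duplicates or separate branches of the same business is a judgment call that depends on how the dataset was collected.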


6. Exploratory Data Analysis (EDA)
6.1 Distribution of SMEs by Region/State

In [16]:
# Count SMEs per region or state, most common first
if "Region or State" in df.columns:
    plt.figure(figsize=(12, 6))
    sns.countplot(data=df, x="Region or State", order=df["Region or State"].value_counts().index)
    plt.xticks(rotation=45)
    plt.title("Distribution of SMEs by Region/State")
    plt.show()
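
Beyond counting rows per category, a grouped summary is a common next step in EDA. The sketch below, which is only one possible direction, averages annual revenue per business size. It uses the five rows from the `df.head()` output as an illustrative `sample`, so the numbers describe those rows only and not the full dataset.

```python
import pandas as pd

# Illustrative rows copied from the df.head() output (not the full dataset).
sample = pd.DataFrame({
    "Business Size": ["Small", "Medium", "Small", "Medium", "Micro"],
    "Revenue (Annual)": [50000, 200000, 30000, 150000, 10000],
})

# Number of businesses and mean annual revenue per size, largest mean first.
revenue_by_size = (
    sample.groupby("Business Size")["Revenue (Annual)"]
    .agg(["count", "mean"])
    .sort_values("mean", ascending=False)
)
print(revenue_by_size)
```

Replacing `sample` with the cleaned `df` would give the same table for all 20 rows. With so few rows per group, the median may be a more robust choice than the mean.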


Now it's your turn!