# Dataset Code Generation
- Synthetic dataset of 5,000 student records modeled on DLS-CSB's List of Accredited Student Organizations and Recognized Student Groups (AY 2024-2025)

**References Used:**
- De La Salle-College of Saint Benilde. (n.d.). *De La Salle-College of Saint Benilde Center for Student Life List of Accredited Student Organizations and Recognized Student Groups Academic Year 2024-2025*. https://www.benilde.edu.ph/wp-content/uploads/2024/11/List-of-Accredited-Student-Org-and-Recognized-Student-Groups-AY-24-25.docx.pdf
- De La Salle-College of Saint Benilde. (2019). *List of Degree Programs and Degree Codes*. https://apps1.benilde.edu.ph/Apply/docs/Degree%20Programs%20and%20Degree%20Codes.pdf
- DLSCSB Official. (2021). *List of Programs*. Benilde. https://archive.benilde.edu.ph/admissions/list-of-programs/

In [1]:
import pandas as pd
import numpy as np
import random
from datetime import datetime

# Set seed for reproducibility
random.seed(42)
np.random.seed(42)

# 🎓 Distribution for 5000 students
student_distribution = {
    '125': 1250,  # Batch 2025 - First years  
    '124': 1200,  # Batch 2024 - Second years
    '123': 1050,  # Batch 2023 - Third years
    '122': 900,   # Batch 2022 - Fourth years
    '121': 600    # Batch 2021 - Fifth years and above
}

# Complete List of Benilde Courses by School
courses = [
    # School of Deaf Education and Applied Studies
    "Bachelor in Applied Deaf Studies (BAPDST)",
    "Bachelor in Sign Language Interpretation (BSLI)",
    
    # School of Arts, Culture, and Performance
    "AB in Creative Industries Management (ABCIM)",
    "Bachelor of Performing Arts Major in Dance (BPAD)",
    "BFA in Culture-Based Arts (BFA CBA)",
    "AB in Music Production (ABMP)",
    "AB in Production Design (ABPRD)",
    "AB in Theater Arts (ABTHA)",
    
    # School of Environment Design
    "BS in Architecture (BS-ARCH)",
    "AB in Fashion Design and Merchandising (AB-FDM)",
    "BS in Industrial Design (BS-ID)",
    "BS in Interior Design (BS-IND)",
    
    # School of New Media Arts
    "Associates in Animation (AiA)",
    "AB in Animation (ABANI)",
    "AB in Film (ABFILM)",
    "AB in Multimedia Arts (ABMMA)",
    "AB in Photography (ABPHOTO)",
    
    # School of Diplomacy and Governance
    "AB in Diplomacy and International Affairs (AB-DIA)",
    "AB in Governance and Public Affairs (AB-GPA)",
    
    # School of Hotel, Restaurant, and Institution Management
    "BS in Culinary Arts Management (BS-CAM)",
    "BS in Hospitality and Luxury Management (BS-HLM)",
    "BS in International Hospitality Management (BS-IHM)",
    "BS in Tourism Management (BSTM)",
    
    # School of Management and Information Technology - LME Cluster
    "BSBA major in Business Management (BSBA-BM)",
    "BSBA major in Export and Global Business Management (BSBA-EGBM)",
    "BSBA major in Human Resource Management (BSBA-HRM)",
    "BSBA major in Marketing Management (BSBA-MM)",
    "BS in Real Estate Management (BS-REM)",
    "BS in Social Innovation and Entrepreneurship (BS-SIE)",
    
    # School of Management and Information Technology - ACI Cluster
    "BSBA in Business Intelligence and Analytics (BSBA-BIA)",
    "BSBA major in Business Solutions and Applications (BSBA-BSAA)",
    "BS in Cybersecurity (BSCSEC)",
    "BS in Information Systems (BS-IS)",
    "BS in Game Design and Development (BS-GDD)",
    
    # School of Multidisciplinary Studies
    "Bachelor in Holistic Disciplines (B-HOLD)"
]

# Student organizations from the Center for Student Life list (AY 2024-2025), mapped to their cluster or group type
organizations = {
    # Volunteer Groups
    "Kaagapay Volunteers Group (KVG)": "Volunteer Group",
    "Best Buddies": "Volunteer Group", 
    "Center for Lasallian Ministry-Student Ministers": "Volunteer Group",
    
    # Athletic Teams
    "Athletics-Men": "Athletic Team",
    "Badminton-Men": "Athletic Team",
    "Badminton-Women": "Athletic Team",
    "Basketball-Men": "Athletic Team",
    "Beach Volleyball-Men": "Athletic Team",
    "Beach Volleyball-Women": "Athletic Team",
    "Benilde Golf Team": "Athletic Team",
    "Chess": "Athletic Team",
    "Football-Men": "Athletic Team",
    "Lawn Tennis-Men": "Athletic Team",
    "Lawn Tennis-Women": "Athletic Team",
    "Pep Squad": "Athletic Team",
    "Soft Tennis-Men": "Athletic Team",
    "Soft Tennis-Women": "Athletic Team",
    "Swimming-Men": "Athletic Team",
    "Swimming-Women": "Athletic Team",
    "Table Tennis-Men": "Athletic Team",
    "Table Tennis-Women": "Athletic Team",
    "Taekwondo-Men": "Athletic Team",
    "Taekwondo-Women": "Athletic Team",
    "Volleyball-Men": "Athletic Team",
    "Volleyball-Women": "Athletic Team",
    
    # Student Artists Groups
    "Coro San Benildo": "Student Artists Group",
    "Cultural Promotions Team": "Student Artists Group",
    "Dulaang Filipino": "Student Artists Group",
    "Karilyo": "Student Artists Group",
    "Saint Benilde Romançon Dance Company-Contemporary": "Student Artists Group",
    "Saint Benilde Romançon Dance Company-Hip Hop": "Student Artists Group",
    "Stage Production Operations Team (SPOT)": "Student Artists Group",
    
    # Academic Cluster
    "Animotion": "Academic Cluster",
    "Association of Information Management (AIM)": "Academic Cluster",
    "BASTION": "Academic Cluster",
    "Benilde Arts Management (BeAM)": "Academic Cluster",
    "Benildean Deaf Association (BDA)": "Academic Cluster",
    "Benildean Industrial Designers (BIND)": "Academic Cluster",
    "Chefs in Progress (CHIP)": "Academic Cluster",
    "Computer Business Association (CBA)": "Academic Cluster",
    "Export Management Society (EMS)": "Academic Cluster",
    "Gamers Union for Innovation and Leadership Development (GUILD)": "Academic Cluster",
    "Guild Rising Interior Designers (GRID)": "Academic Cluster",
    "Human Resource Management Society (HRMS)": "Academic Cluster",
    "Hotel, Restaurant, and Institution Management Society (HRIMS)": "Academic Cluster",
    "Junior Marketing Association of Benilde (JMAB)": "Academic Cluster",
    "Leaders in Diplomacy (LEAD)": "Academic Cluster",
    "LIKHA": "Academic Cluster",
    "Mark of Designers Alliance (MODA)": "Academic Cluster",
    "Media Max (MMX)": "Academic Cluster",
    "Social and Academic Guild for Architecture (SAGA)": "Academic Cluster",
    "Society of Analytics and Business Intelligence (SABI)": "Academic Cluster",
    "Travelers in Progress (TRIP)": "Academic Cluster",
    "Vateliens In Progress (VIP)": "Academic Cluster",
    "World-class Hoteliers in Progress (WHIP)": "Academic Cluster",
    
    # Socio-Civic Cluster
    "Benildean Scholars Association (BSA)": "Socio-Civic Cluster",
    "Benilde Red Cross Youth Council (BRCYC)": "Socio-Civic Cluster",
    "Greenergy (GNY)": "Socio-Civic Cluster",
    
    # Special Interest Cluster
    "AIESEC in Benilde": "Special Interest Cluster",
    "Artelier": "Special Interest Cluster",
    "DrawInk": "Special Interest Cluster",
    "Google Developers Student Club in Benilde": "Special Interest Cluster",
    "Nihon Bunka-bu (NBB)": "Special Interest Cluster",
    "Romancon Gaming Community (RGC)": "Special Interest Cluster",
    
    # SIU Recognized Cluster
    "Benilde Student Government": "SIU Recognized Cluster",
    "Benilde Commission on Elections (COMELEC)": "SIU Recognized Cluster",
    "Benilde Committee on Student Involvement (BCSI)": "SIU Recognized Cluster",
    "Benilde International Student Emissaries (BISE)": "SIU Recognized Cluster",
    "Jiu-jitsu Benilde (JJB)": "SIU Recognized Cluster",
    "Student Trainers (STRAINS)": "SIU Recognized Cluster",
    
    # Publications Group
    "Ad Astra": "Publications Group",
    "Benildean Press Corps (BPC)": "Publications Group",
    
    # SDEAS Groups
    "SDEAS-Lasallian Ministry Program for the Deaf (LMPD) Volunteers": "Volunteer Group",
    "SDEAS-Social Responsibility and Outreach Program (SROP) Volunteers": "Volunteer Group",
    "SDEAS-Silent Steps": "Athletic Team",
    "SDEAS-Deaf Benildean Sports Team": "Athletic Team",
    "SDEAS-Deaf Festival Committee": "Special Interest Cluster",
    
    # Additional Groups
    "Primo Musi.Co": "Academic Cluster",
    "Benilde Business Leaders Society (BBLS)": "Academic Cluster"
}

# Cities and municipalities in Metro Manila (NCR) plus nearby Rizal and Cavite
municipalities = [
    "Manila", "Quezon City", "Makati", "Taguig", "Pasig", "Mandaluyong",
    "Caloocan", "San Juan", "Marikina", "Las Piñas", "Muntinlupa", 
    "Parañaque", "Pasay", "Valenzuela", "Navotas", "Malabon",
    "Antipolo", "San Mateo", "Rodriguez", "Cainta", "Bacoor", "Imus",
    "Cavite City", "Dasmariñas", "General Trias", "Trece Martires"
]

# Other demographic options
nationality_options = ['Filipino', 'Chinese', 'Korean', 'Japanese', 'Vietnamese', 'American', 'Indian', 'Thai']
nationality_weights = [92, 2, 2, 1, 1, 1, 0.5, 0.5]

civil_status_options = ['Single', 'Married', 'In a relationship']
civil_status_weights = [85, 3, 12]

year_levels = ['1st Year', '2nd Year', '3rd Year', '4th Year', '5th Year', '6th Year']
leadership_positions = ['President', 'Vice President', 'Secretary', 'Treasurer', 'Auditor', 'Public Relations Officer', 
                       'Events Coordinator', 'Membership Officer', 'Technical Officer', 'Member']
position_weights = [2, 3, 5, 5, 3, 8, 10, 8, 6, 60]

# Activity levels and engagement
activity_levels = ['Very Active', 'Active', 'Moderately Active', 'Low Activity', 'Inactive']
activity_weights = [15, 30, 35, 15, 5]

attendance_rates = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.50, 0.30]
attendance_weights = [10, 15, 20, 20, 15, 10, 5, 3, 1, 1]

# Track used student numbers
used_student_numbers = set()
def generate_unique_student_number(prefix):
    while True:
        suffix = random.randint(10000, 99999)  # 5-digit suffix for 8-digit total
        student_number = f"{prefix}{suffix}"
        if student_number not in used_student_numbers:
            used_student_numbers.add(student_number)
            return student_number

# Generate student records
students = []

for year_prefix, count in student_distribution.items():
    for _ in range(count):
        student_number = generate_unique_student_number(year_prefix)
        course = random.choice(courses)
        
        # Birth-year range per batch (earlier batches are older)
        if year_prefix == '125':  # Batch 2025
            birth_year = random.randint(2004, 2007)
        elif year_prefix == '124':  # Batch 2024
            birth_year = random.randint(2003, 2006)
        elif year_prefix == '123':  # Batch 2023
            birth_year = random.randint(2002, 2005)
        elif year_prefix == '122':  # Batch 2022
            birth_year = random.randint(2001, 2004)
        else:  # '121' and older batches
            birth_year = random.randint(1998, 2003)
            
        birthdate = datetime(birth_year, random.randint(1, 12), random.randint(1, 28))
        age = 2024 - birth_year  # approximate age in 2024 (uses birth year only, ignores month and day)
        
        # Year level mapping
        year_level_map = {'125': '1st Year', '124': '2nd Year', '123': '3rd Year', '122': '4th Year', '121': '5th Year'}
        year_level = year_level_map.get(year_prefix, '5th Year')
        
        nationality = random.choices(nationality_options, weights=nationality_weights, k=1)[0]
        civil_status = random.choices(civil_status_options, weights=civil_status_weights, k=1)[0]
        gender = random.choice(['Male', 'Female'])
        municipality = random.choice(municipalities)
        
        # Academic info (GWA drawn uniformly from 1.0-4.0; higher is better, see graduation_readiness below)
        gwa = round(random.uniform(1.0, 4.0), 2)
        units = random.choice([12, 15, 18, 21, 24])
        is_scholarship = random.random() < 0.12
        
        # Organization membership (60% chance of being in at least one org)
        has_membership = random.random() < 0.60
        
        if has_membership:
            # Number of organizations (most students are in 1-2 orgs)
            num_orgs = random.choices([1, 2, 3, 4], weights=[50, 30, 15, 5], k=1)[0]
            selected_orgs = random.sample(list(organizations.keys()), num_orgs)
            
            # Pick primary organization (first one)
            primary_org = selected_orgs[0]
            org_type = organizations[primary_org]
            
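            # Leadership position depends on year level only: some seniors get top posts,
            # some third years get mid-level posts, everyone else gets a junior post or 'Member'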
            if year_level in ['4th Year', '5th Year'] and random.random() < 0.25:
                position = random.choices(leadership_positions[:5], weights=[10, 15, 20, 20, 15], k=1)[0]
            elif year_level == '3rd Year' and random.random() < 0.15:
                position = random.choices(leadership_positions[2:8], weights=[15, 15, 10, 25, 20, 15], k=1)[0]
            else:
                position = random.choices(leadership_positions[5:], weights=[20, 25, 20, 15, 20], k=1)[0]
                
            # Membership details (join year is drawn from 2020-2024 regardless of batch)
            join_date = datetime(random.randint(2020, 2024), random.randint(1, 12), random.randint(1, 28))
            years_in_org = max(1, 2024 - join_date.year)
            
            activity_level = random.choices(activity_levels, weights=activity_weights, k=1)[0]
            attendance_rate = random.choices(attendance_rates, weights=attendance_weights, k=1)[0]
            
            events_attended = int(random.uniform(0, 20) * attendance_rate)
            volunteer_hours = random.randint(0, 100) if org_type == "Volunteer Group" else random.randint(0, 30)
            
            # Additional orgs as comma-separated string
            other_orgs = ', '.join(selected_orgs[1:]) if len(selected_orgs) > 1 else ""
            
        else:
            # No membership
            primary_org = ""
            org_type = ""
            position = ""
            join_date = None
            years_in_org = 0
            activity_level = ""
            attendance_rate = 0
            events_attended = 0
            volunteer_hours = 0
            other_orgs = ""
        
        # Skills and interests (based on course and org)
        skills = []
        course_lower = course.lower()
        
        # Business and Management skills
        if 'business' in course_lower or 'management' in course_lower or 'bsba' in course_lower:
            skills.extend(['Leadership', 'Communication', 'Project Management', 'Strategic Planning'])
        if 'marketing' in course_lower:
            skills.extend(['Digital Marketing', 'Brand Management', 'Consumer Psychology'])
        if 'human resource' in course_lower or 'hrm' in course_lower:
            skills.extend(['People Management', 'Recruitment', 'Employee Relations'])
        if 'export' in course_lower or 'international' in course_lower:
            skills.extend(['Global Trade', 'Cross-cultural Communication', 'International Relations'])
            
        # Technology and IT skills
        if 'information' in course_lower or 'cybersecurity' in course_lower or 'game' in course_lower:
            skills.extend(['Programming', 'Database Management', 'System Analysis', 'Problem Solving'])
        if 'analytics' in course_lower or 'intelligence' in course_lower:
            skills.extend(['Data Analysis', 'Statistical Modeling', 'Business Intelligence'])
        if 'cybersecurity' in course_lower:
            skills.extend(['Network Security', 'Risk Assessment', 'Ethical Hacking'])
        if 'game' in course_lower:
            skills.extend(['Game Programming', '3D Modeling', 'User Experience Design'])
            
        # Arts and Design skills
        if 'arts' in course_lower or 'design' in course_lower or 'animation' in course_lower or 'film' in course_lower:
            skills.extend(['Creative Thinking', 'Visual Design', 'Artistic Expression', 'Digital Arts'])
        if 'architecture' in course_lower:
            skills.extend(['Architectural Design', 'CAD Software', 'Spatial Planning'])
        if 'interior' in course_lower:
            skills.extend(['Space Planning', 'Color Theory', 'Furniture Design'])
        if 'fashion' in course_lower:
            skills.extend(['Fashion Illustration', 'Textile Knowledge', 'Trend Forecasting'])
        if 'photography' in course_lower:
            skills.extend(['Photo Editing', 'Lighting Techniques', 'Visual Storytelling'])
        if 'music' in course_lower:
            skills.extend(['Audio Production', 'Music Composition', 'Sound Engineering'])
        if 'theater' in course_lower or 'dance' in course_lower or 'performing' in course_lower:
            skills.extend(['Performance Skills', 'Stage Presence', 'Creative Expression'])
            
        # Hospitality and Tourism skills  
        if 'hospitality' in course_lower or 'hotel' in course_lower or 'tourism' in course_lower or 'culinary' in course_lower:
            skills.extend(['Customer Service', 'Event Management', 'Cultural Awareness'])
        if 'culinary' in course_lower:
            skills.extend(['Culinary Techniques', 'Menu Planning', 'Food Safety'])
        if 'tourism' in course_lower:
            skills.extend(['Tour Planning', 'Cultural Knowledge', 'Travel Coordination'])
            
        # Diplomacy and Governance skills
        if 'diplomacy' in course_lower or 'governance' in course_lower or 'public affairs' in course_lower:
            skills.extend(['Negotiation', 'Policy Analysis', 'Public Speaking', 'Research Skills'])
            
        # Special Programs skills
        if 'deaf' in course_lower or 'sign language' in course_lower:
            skills.extend(['Sign Language', 'Deaf Culture', 'Inclusive Communication'])
        if 'holistic' in course_lower:
            skills.extend(['Critical Thinking', 'Interdisciplinary Knowledge', 'Systems Thinking'])
        if 'real estate' in course_lower:
            skills.extend(['Property Valuation', 'Market Analysis', 'Contract Negotiation'])
        if 'social innovation' in course_lower or 'entrepreneurship' in course_lower:
            skills.extend(['Social Impact', 'Innovation', 'Startup Development'])
            
        # Organization-based skills
        if org_type == "Athletic Team":
            skills.extend(['Teamwork', 'Discipline', 'Physical Fitness', 'Competitive Spirit'])
        if org_type == "Volunteer Group":
            skills.extend(['Community Service', 'Empathy', 'Social Awareness'])
        if org_type == "Publications Group":
            skills.extend(['Writing', 'Journalism', 'Content Creation'])
        if org_type == "Student Artists Group":
            skills.extend(['Artistic Collaboration', 'Performance', 'Creative Direction'])
            
        skills_str = ', '.join(random.sample(skills, min(4, len(skills)))) if skills else ""
        
        students.append({
            "student_number": student_number,
            "course": course,
            "year_level": year_level,
            "age": age,
            "birthdate": birthdate.strftime("%Y-%m-%d"),
            "nationality": nationality,
            "civil_status": civil_status,
            "gender": gender,
            "municipality": municipality,
            "gwa": gwa,
            "units_enrolled": units,
            "has_scholarship": is_scholarship,
            "primary_organization": primary_org,
            "organization_type": org_type,
            "position_held": position,
            "join_date": join_date.strftime("%Y-%m-%d") if join_date else "",
            "years_in_organization": years_in_org,
            "activity_level": activity_level,
            "attendance_rate": round(attendance_rate, 2),
            "events_attended_last_year": events_attended,
            "volunteer_hours_completed": volunteer_hours,
            "other_organizations": other_orgs,
            "leadership_experience": 1 if position in leadership_positions[:8] else 0,  # President through Membership Officer
            "skills_developed": skills_str,
            "is_active_member": 1 if activity_level in ['Very Active', 'Active', 'Moderately Active'] else 0,
            "extracurricular_engagement_score": round(random.uniform(1.0, 10.0), 1),  # random, not derived from the fields above
            "graduation_readiness": 1 if gwa >= 2.5 and (not has_membership or activity_level != 'Inactive') else 0
        })

# 💾 Save to CSV
df = pd.DataFrame(students)
df = df.sample(frac=1).reset_index(drop=True)  # Shuffle the data
df.to_csv("benilde_student_organizations_dataset_5000.csv", index=False)

print(f"✅ benilde_student_organizations_dataset_5000.csv has been generated with {len(df):,} student records!")
print(f"📊 Dataset includes {len(organizations)} different student organizations")
print(f"🎓 Students from {len(courses)} different courses")
print(f"📈 {len([s for s in students if s['primary_organization']])} students are organization members")

# Display basic statistics
print("Dataset Overview:")
print(f"- Total Records: {len(df)}")
print(f"- Organization Members: {len(df[df['primary_organization'] != ''])}")
print(f"- Non-members: {len(df[df['primary_organization'] == ''])}")
print(f"- Leadership Positions: {len(df[df['leadership_experience'] == 1])}")
print(f"- Active Members: {len(df[df['is_active_member'] == 1])}")

print("Top 10 Most Popular Organizations:")
org_counts = df[df['primary_organization'] != '']['primary_organization'].value_counts().head(10)
for org, count in org_counts.items():
    print(f"- {org}: {count} members")

✅ benilde_student_organizations_dataset_5000.csv has been generated with 5,000 student records!
📊 Dataset includes 79 different student organizations
🎓 Students from 35 different courses
📈 2987 students are organization members
Dataset Overview:
- Total Records: 5000
- Organization Members: 2987
- Non-members: 2013
- Leadership Positions: 2059
- Active Members: 2371
Top 10 Most Popular Organizations:
- Badminton-Men: 50 members
- Chess: 49 members
- SDEAS-Deaf Festival Committee: 49 members
- Benilde Arts Management (BeAM): 48 members
- AIESEC in Benilde: 47 members
- Cultural Promotions Team: 47 members
- Basketball-Men: 47 members
- Beach Volleyball-Women: 47 members
- Karilyo: 46 members
- Best Buddies: 46 members
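The summary figures above can be sanity-checked against the generator's own probabilities. The sketch below is a back-of-the-envelope check that is not part of the original notebook: it re-derives the expected share of members with `leadership_experience == 1` from the branch weights in the loop, the expected share of active members from `activity_weights`, and the spread of member counts per primary organization under the uniform `random.sample` draw. The constants are copied by hand from the code above (assuming they are unchanged), and the helper names `expected_leadership_share` and `primary_org_spread` are hypothetical.

```python
import math

# Constants copied by hand from the generator above (assumption: unchanged).
BATCH_SIZES = {"125": 1250, "124": 1200, "123": 1050, "122": 900, "121": 600}
TOTAL = sum(BATCH_SIZES.values())

# Membership is drawn independently of batch, so members share the same mix.
SENIOR_SHARE = (BATCH_SIZES["122"] + BATCH_SIZES["121"]) / TOTAL  # 4th and 5th years
THIRD_SHARE = BATCH_SIZES["123"] / TOTAL                          # 3rd years
OTHER_SHARE = 1 - SENIOR_SHARE - THIRD_SHARE                      # 1st and 2nd years

# The default branch draws leadership_positions[5:] with weights [20, 25, 20, 15, 20]
# (PRO, Events Coordinator, Membership Officer, Technical Officer, Member).
# Only the first three fall inside leadership_positions[:8].
DEFAULT_LEADER_PROB = (20 + 25 + 20) / (20 + 25 + 20 + 15 + 20)

# activity_weights: Very Active, Active and Moderately Active count as active.
ACTIVE_PROB = (15 + 30 + 35) / (15 + 30 + 35 + 15 + 5)


def expected_leadership_share() -> float:
    """Expected fraction of members with leadership_experience == 1."""
    # Seniors: 25% draw from the top five posts, all of which count as leadership.
    senior = 0.25 + 0.75 * DEFAULT_LEADER_PROB
    # Third years: 15% draw from leadership_positions[2:8], all of which count.
    third = 0.15 + 0.85 * DEFAULT_LEADER_PROB
    return (SENIOR_SHARE * senior
            + THIRD_SHARE * third
            + OTHER_SHARE * DEFAULT_LEADER_PROB)


def primary_org_spread(members: int, n_orgs: int = 79) -> tuple[float, float]:
    """Mean and standard deviation of members per primary organization.

    primary_org is the first element of random.sample(...), so each of the
    n_orgs organizations is equally likely: counts are Binomial(members, 1/n_orgs).
    """
    p = 1 / n_orgs
    mean = members * p
    sd = math.sqrt(members * p * (1 - p))
    return mean, sd


if __name__ == "__main__":
    members = 2987  # organization members reported in the output above
    mean, sd = primary_org_spread(members)
    print(f"expected leaders: {expected_leadership_share() * members:.0f} (observed 2059)")
    print(f"expected active members: {ACTIVE_PROB * members:.0f} (observed 2371)")
    print(f"members per primary org: {mean:.1f} ± {sd:.1f}")
```

With 2,987 members this gives roughly 2,053 expected leaders (2,059 observed), about 2,390 expected active members (2,371 observed), and about 37.8 ± 6.1 members per organization. The high leadership count follows from the default branch, where only 20% of draws are `Member`. Because every organization is equally likely to become `primary_organization`, the "Top 10 Most Popular Organizations" list (46 to 50 members, roughly 1.3 to 2 standard deviations above the mean) reflects sampling noise rather than modeled popularity.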
