# **Project DREAMS: Data Analysis Assignment**
by John Mike Asuncion

## **Introduction**
This notebook presents **five research questions for Project DREAMS (DataCamp Resources & Engagement Analytics Monitoring System)**. These questions are designed to extract meaningful insights from the DataCamp Data Connector to inform workshop planning, interventions, and resource allocation for *DataCamp scholars* under *GDG On Campus PUP supervision*. Each question aligns with the **project objectives** of enhancing monitoring capabilities and developing a comprehensive analytics platform.

## **Research Questions**

### **Research Question 1:** What are the learning patterns and engagement trends across different times of day and days of the week?

**Relevance**: Understanding when scholars are most active on the platform can help optimize the scheduling of workshops, support sessions, and intervention timing. This insight allows for more strategic resource allocation and potentially higher engagement rates with support activities.

**Tables and Columns Required**:
- `exercise_fact`: `user_id`, `date_id`, `started_at`, `completed_at`, `time_spent`, `xp`
- `course_fact`: `user_id`, `date_id`, `started_at`, `completed_at`, `time_spent`, `xp`
- `user_dim`: `user_id`, `first_name`, `last_name`, `email`

**Methodology**:
1. Extract day of week and hour of day from `started_at` timestamps in both fact tables, converting to the scholars' local time zone first if the timestamps are stored in UTC
2. Aggregate metrics (time spent, XP gained, completion rates) by day of week and hour of day
3. Calculate activity concentration metrics (e.g., percentage of total learning occurring during each time slot)
4. Identify peak activity periods and potential "dead zones"
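
A minimal pandas sketch of steps 1 to 3, assuming `started_at` parses as a timestamp and `time_spent` is recorded in seconds. The sample rows and the `engagement_by_slot` helper are hypothetical and stand in for the combined fact tables; they are not real Data Connector output.

```python
import pandas as pd

# Hypothetical sample rows standing in for exercise_fact / course_fact.
activity = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3],
    "started_at": pd.to_datetime([
        "2024-03-04 19:05", "2024-03-04 20:30", "2024-03-05 19:45",
        "2024-03-09 10:00", "2024-03-09 10:20",
    ]),
    "time_spent": [600, 900, 300, 1200, 600],  # assumed to be seconds
    "xp": [50, 100, 35, 150, 70],
})


def engagement_by_slot(df):
    """Aggregate activity per (day of week, hour) and add each slot's share of total time."""
    tagged = df.assign(
        day_of_week=df["started_at"].dt.day_name(),
        hour=df["started_at"].dt.hour,
    )
    slots = tagged.groupby(["day_of_week", "hour"], as_index=False).agg(
        sessions=("user_id", "size"),
        active_users=("user_id", "nunique"),
        time_spent=("time_spent", "sum"),
        xp=("xp", "sum"),
    )
    # Activity concentration: fraction of all learning time that falls in each slot.
    slots["share_of_time"] = slots["time_spent"] / slots["time_spent"].sum()
    return slots.sort_values("share_of_time", ascending=False).reset_index(drop=True)


slots = engagement_by_slot(activity)
print(slots)
```

Sorting by `share_of_time` puts the peak slots first. Slots that never appear in the output had no recorded activity and are candidates for the "dead zones" in step 4.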

**Potential Insights and Actions**:
- If most scholars are active during evenings, schedule workshops and support sessions during these peak times
- If weekends show significantly higher activity, consider weekend workshops or support materials releases
- Identify times of low engagement to deploy targeted notifications or challenges to boost activity
- Design communication strategies aligned with scholars' natural learning rhythms

### **Research Question 2:** How does the completion rate of different technologies (Python, R, SQL) correlate with subsequent course selection and learning paths?

**Relevance**: This analysis helps understand how initial success or struggle with specific technologies influences scholars' learning journeys. It provides insights into natural learning progressions versus abandonment patterns, which can inform how learning paths should be structured.

**Tables and Columns Required**:
- `course_fact`: `user_id`, `course_id`, `started_at`, `completed_at`
- `course_dim`: `course_id`, `title`, `technology`, `topic`
- `user_dim`: `user_id`, `first_name`, `last_name`, `email`
- `track_fact`: `user_id`, `track_version_id`, `started_at`, `completed_at`
- `track_dim`: `track_version_id`, `title`, `technology`

**Methodology**:
1. Calculate completion rates by technology for each user
2. Analyze the sequence of courses taken by technology
3. Identify transition patterns (e.g., from Python to SQL, from R to Python)
4. Compare learning paths of those who complete versus abandon courses in specific technologies
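
The first three steps could be sketched as follows. This is an illustration under assumptions: a course counts as completed when `completed_at` is not null, and a "transition" is any pair of consecutive course starts by the same user where the technology changes. The sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows; a missing completed_at means the course was not finished.
course_fact = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3],
    "course_id": [10, 11, 20, 10, 30, 20],
    "started_at": pd.to_datetime([
        "2024-01-05", "2024-01-20", "2024-02-10",
        "2024-01-08", "2024-02-01", "2024-01-15",
    ]),
    "completed_at": pd.to_datetime([
        "2024-01-18", None, "2024-02-25",
        None, "2024-02-20", "2024-01-30",
    ]),
})
course_dim = pd.DataFrame({
    "course_id": [10, 11, 20, 30],
    "technology": ["Python", "Python", "SQL", "R"],
})

courses = course_fact.merge(course_dim, on="course_id", how="left")
courses["completed"] = courses["completed_at"].notna()

# Step 1: completion rate per user and technology.
completion = (
    courses.groupby(["user_id", "technology"])["completed"]
    .mean()
    .rename("completion_rate")
    .reset_index()
)

# Steps 2-3: order each user's courses by start time and pair each one with the next.
ordered = courses.sort_values(["user_id", "started_at"])
ordered["next_technology"] = ordered.groupby("user_id")["technology"].shift(-1)
switches = ordered.dropna(subset=["next_technology"])
switches = switches[switches["technology"] != switches["next_technology"]]
transitions = (
    switches.groupby(["technology", "next_technology"])
    .size()
    .rename("n_transitions")
    .reset_index()
)

print(completion)
print(transitions)
```

Joining `completion` back onto `switches` would then support step 4, for example by comparing transitions made after a completed course with those made after an abandoned one.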

**Potential Insights and Actions**:
- If scholars struggling with Python tend to switch to R, develop bridging resources or specialized support
- If successful SQL learners consistently move to Python next, formalize this path in recommended learning tracks
- Identify technologies with high abandonment rates to develop targeted interventions
- Design supplementary materials for the most challenging technology transitions

### **Research Question 3:** What is the relationship between engagement with practice exercises and performance in assessments and projects?

**Relevance**: This analysis reveals how different types of learning activities contribute to practical skill development and assessment success. Understanding this relationship helps optimize the balance of content types in learning paths and identify effective preparation strategies.

**Tables and Columns Required**:
- `practice_fact`: `user_id`, `practice_id`, `time_spent`, `xp`, `completed_at`
- `practice_dim`: `practice_id`, `title`, `technology`
- `assessment_fact`: `user_id`, `assessment_id`, `score`, `percentile`, `completed_at`
- `assessment_dim`: `assessment_id`, `title`, `technology`
- `project_fact`: `user_id`, `project_id`, `time_spent`, `completed_at`
- `project_dim`: `project_id`, `title`, `technology`, `is_guided`

**Methodology**:
1. Calculate practice engagement metrics (time spent, completion rate) before assessments
2. Analyze correlation between practice activity and assessment scores/percentiles
3. Compare project completion rates based on prior practice engagement
4. Segment users by practice activity levels and analyze differences in assessment outcomes
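
One way to sketch steps 1, 2, and 4 in pandas. For simplicity it assumes one assessment per user and hypothetical sample rows. It counts only practice completed before that user's assessment, and it uses a rank-based correlation because practice time is usually skewed.

```python
import pandas as pd

# Hypothetical sample rows (one assessment per user, time_spent assumed in seconds).
practice_fact = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3, 4],
    "time_spent": [300, 600, 200, 900, 900, 600, 400],
    "completed_at": pd.to_datetime([
        "2024-02-01", "2024-02-03", "2024-02-05",
        "2024-02-01", "2024-02-02", "2024-02-20", "2024-02-12",
    ]),
})
assessment_fact = pd.DataFrame({
    "user_id": [1, 2, 3, 4],
    "score": [110, 80, 150, 95],
    "completed_at": pd.to_datetime(
        ["2024-02-10", "2024-02-06", "2024-02-15", "2024-02-08"]
    ),
})

# Step 1: keep only practice that was completed before the user's assessment.
merged = practice_fact.merge(
    assessment_fact[["user_id", "completed_at"]],
    on="user_id",
    suffixes=("_practice", "_assessment"),
)
prior = merged[merged["completed_at_practice"] < merged["completed_at_assessment"]]
prior_time = prior.groupby("user_id")["time_spent"].sum().rename("prior_practice_time")

summary = assessment_fact.merge(prior_time.reset_index(), on="user_id", how="left")
summary["prior_practice_time"] = summary["prior_practice_time"].fillna(0)

# Step 2: Pearson correlation of the ranks, i.e. a Spearman-style rank correlation.
rank_corr = summary["prior_practice_time"].rank().corr(summary["score"].rank())

# Step 4: median split into low/high practice segments and compare mean scores.
median_time = summary["prior_practice_time"].median()
summary["practice_level"] = (summary["prior_practice_time"] > median_time).map(
    {True: "high", False: "low"}
)
by_level = summary.groupby("practice_level")["score"].mean()

print(round(rank_corr, 2))
print(by_level)
```

A correlation found this way does not show that practice causes better scores. Motivated scholars may simply do more of both, so the result is best read as a prompt for follow-up analysis.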

**Potential Insights and Actions**:
- If high practice engagement strongly correlates with assessment success, encourage more practice through gamification
- If specific practice types better prepare for assessments, prioritize these in pre-assessment recommendations
- Develop targeted practice modules for scholars struggling with specific assessment types
- Implement "practice checkpoints" before high-stakes projects or assessments

### **Research Question 4:** Which content types (courses, practice, projects) drive the highest XP accumulation and sustained engagement over time?

**Relevance**: Understanding which content types most effectively maintain scholar engagement helps optimize the mix of learning activities recommended to scholars. This insight informs content priority in workshops and helps identify engagement triggers.

**Tables and Columns Required**:
- `course_fact`: `user_id`, `time_spent`, `xp`, `started_at`, `completed_at`
- `practice_fact`: `user_id`, `time_spent`, `xp`, `started_at`, `completed_at`
- `project_fact`: `user_id`, `time_spent`, `xp`, `started_at`, `completed_at`
- `assessment_fact`: `user_id`, `time_spent`, `started_at`, `completed_at`
- `user_dim`: `user_id`, `registered_at`, `last_visit_at`, `last_time_spent_at`
- `xp_fact`: `user_id`, `event`, `xp`, `created_date`

**Methodology**:
1. Calculate XP accumulation rates by content type
2. Measure engagement persistence (time between first and last activity) by dominant content type
3. Analyze patterns of content type transitions and their relationship to ongoing engagement
4. Identify "re-engagement" patterns after periods of inactivity
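
The steps above can be sketched by stacking the fact tables into one event log. The sample rows, the `tag` helper, and the 14-day inactivity threshold are all assumptions made for illustration. "Dominant content type" is taken here to mean the type that earned a user the most XP.

```python
import pandas as pd


def tag(df, content_type):
    """Label every row of a fact table with its content type (hypothetical helper)."""
    return df.assign(content_type=content_type)


# Hypothetical sample rows for three of the fact tables.
course_fact = pd.DataFrame({
    "user_id": [1, 1, 2],
    "xp": [100, 100, 50],
    "started_at": pd.to_datetime(["2024-01-01", "2024-01-10", "2024-01-05"]),
})
practice_fact = pd.DataFrame({
    "user_id": [2, 2, 1],
    "xp": [200, 100, 30],
    "started_at": pd.to_datetime(["2024-01-06", "2024-02-15", "2024-01-03"]),
})
project_fact = pd.DataFrame({
    "user_id": [3, 3],
    "xp": [300, 250],
    "started_at": pd.to_datetime(["2024-01-02", "2024-03-02"]),
})

events = pd.concat(
    [tag(course_fact, "course"), tag(practice_fact, "practice"), tag(project_fact, "project")],
    ignore_index=True,
)

# Step 1: total XP per content type.
xp_by_type = events.groupby("content_type")["xp"].sum().sort_values(ascending=False)

# Step 2: dominant content type per user, then days between first and last activity.
xp_user_type = events.groupby(["user_id", "content_type"], as_index=False)["xp"].sum()
dominant = (
    xp_user_type.sort_values("xp", ascending=False)
    .drop_duplicates("user_id")
    .set_index("user_id")["content_type"]
    .rename("dominant_type")
)
span = events.groupby("user_id")["started_at"].agg(["min", "max"])
span["active_days"] = (span["max"] - span["min"]).dt.days
persistence = span.join(dominant).groupby("dominant_type")["active_days"].mean()

# Step 4: events that follow a gap longer than the assumed inactivity threshold.
events_sorted = events.sort_values(["user_id", "started_at"])
gap = events_sorted.groupby("user_id")["started_at"].diff()
returns = events_sorted[gap > pd.Timedelta(days=14)]

print(xp_by_type)
print(persistence)
print(returns[["user_id", "content_type", "started_at"]])
```

The `content_type` column of `returns` shows which kind of content scholars came back to after a break, which feeds the re-engagement question directly.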

**Potential Insights and Actions**:
- If projects drive the highest sustained engagement, increase project-based workshops
- If alternating between content types shows better retention, design varied learning paths
- Develop re-engagement strategies using the content types that best reactivate dormant users
- Create personalized content mix recommendations based on engagement patterns

### **Research Question 5:** How do collaborative features like DataLab workbooks impact learning outcomes and course completion rates?

**Relevance**: Understanding the impact of collaborative and applied learning features helps determine their effectiveness and informs strategies for promoting these features. This insight is critical for developing a holistic learning environment that balances structured and exploratory learning.

**Tables and Columns Required**:
- `workspace_fact`: `creator_id`, `nb_attempts_to_publish`, `nb_times_published_successfully`
- `workspace_visit_fact`: `visitor_id`, `workspace_id`, `nb_seconds`
- `workspace_dim`: `workspace_id`, `workspace_title`, `technology`
- `course_fact`: `user_id`, `course_id`, `completed_at`, `time_spent`, `xp`
- `user_dim`: `user_id`, `email`, `first_name`, `last_name`

**Methodology**:
1. Compare course completion rates between active DataLab workbook creators/users and non-users
2. Analyze the relationship between workbook creation/viewing and subsequent course engagement
3. Identify which technologies show the strongest correlation between workbook usage and course success
4. Examine temporal patterns (e.g., does workbook usage precede or follow course completion?)
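
A rough sketch of step 1, assuming a scholar counts as a workbook user if they appear as a `creator_id` or a `visitor_id`, and that a null `completed_at` means the course was not finished. The sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample rows.
course_fact = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 3, 3, 4],
    "course_id": [10, 11, 10, 20, 10, 30, 20],
    "completed_at": pd.to_datetime([
        "2024-01-10", "2024-02-01", "2024-01-15", None, None, None, "2024-03-01",
    ]),
})
workspace_fact = pd.DataFrame({"creator_id": [1, 1], "nb_times_published_successfully": [1, 0]})
workspace_visit_fact = pd.DataFrame({
    "visitor_id": [2, 1],
    "workspace_id": [100, 101],
    "nb_seconds": [420, 60],
})

# Anyone who created or visited a workbook counts as a workbook user (assumption).
workbook_users = set(workspace_fact["creator_id"]) | set(workspace_visit_fact["visitor_id"])

# Per-user completion rate: completed courses divided by started courses.
per_user = (
    course_fact.assign(completed=course_fact["completed_at"].notna())
    .groupby("user_id")["completed"]
    .agg(started="size", completed="sum")
)
per_user["completion_rate"] = per_user["completed"] / per_user["started"]
per_user["group"] = [
    "workbook_user" if uid in workbook_users else "non_user" for uid in per_user.index
]

# Compare the average per-user completion rate between the two groups.
comparison = per_user.groupby("group")["completion_rate"].mean()
print(comparison)
```

Averaging per-user rates keeps a few very active scholars from dominating the comparison. The difference between groups is descriptive only, since workbook users may differ from non-users in other ways.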

**Potential Insights and Actions**:
- If DataLab workbook usage correlates with higher course completion, develop workshop modules showcasing workbook features
- If specific technologies show stronger benefits from workbook usage, create technology-specific collaborative challenges
- Implement peer learning strategies based on workbook sharing patterns
- Design scaffolded workbook templates for technologies where scholars struggle most

## **Conclusion**

These **five research questions address key aspects of scholar engagement and learning on DataCamp**. The insights derived will **directly support Project DREAMS' objectives** by providing actionable intelligence for optimizing scholar management strategies. The findings will enable:

1. More effective workshop planning based on engagement patterns and learning preferences
2. Targeted interventions for scholars struggling with specific technologies or content types
3. Strategic resource allocation to maximize impact on scholar progress
4. Development of personalized learning path recommendations
5. Creation of a comprehensive analytics dashboard that visualizes these key metrics

By analyzing these dimensions of the DataCamp learning experience, **GDG On Campus PUP** will be able to develop a robust monitoring system that not only tracks progress but actively **contributes to improving scholar success rates and engagement**.

## **Notes**

The research questions align with **Project DREAMS objectives and requirements** as follows:

1. **Alignment with Project Objectives**:
   - The questions directly support the goal of designing "**comprehensive performance metrics**" by examining engagement patterns, completion rates, and learning outcomes
   - The insights would inform the "**interactive visualizations**" needed for the web-based analytics platform

2. **Focus on Monitoring Capabilities**:
   - Questions examine key tracking metrics (time spent, XP gained, completion rates)
   - Analysis of learning patterns provides the foundation for meaningful scholar progress monitoring

3. **Support for Resource Allocation & Workshop Planning**:
   - **Question 1** directly addresses optimal workshop timing
   - **Questions 2-5** provide insights for workshop content prioritization
   - All questions yield actionable data for targeted interventions

4. **Technical Feasibility**:
   - All analyses can be implemented using the specified technology stack (Pandas, Plotly, Streamlit)
   - Data required is available in the DataCamp Data Connector model
   - Methodologies are practical and achievable

5. **Dual-Purpose Success**:
   - Questions support both **operational needs** (better scholar management) and **learning opportunities** (practical data analysis skills)

*These research questions lay the foundation for a comprehensive monitoring system that can help GDG On Campus PUP manage its DataCamp scholars, in line with the project's stated purpose.*