# Introduction
This research examines how DataCamp scholars engage with courses, assessments, and projects to identify the factors that contribute to learning progress. We analyze the relationship between time spent on courses, completion rates, and assessment results to determine what drives higher performance. We also assess how scholars' course choices align with their intended tracks and technology focus, and measure the effect of that alignment on engagement, time spent, and XP accumulation. By drawing on these learning patterns together with performance and participation data, this study seeks to improve personalized track recommendations and strengthen support through our study jams, workshops, and guided learning paths.

---

## [1] **Research Question**:
*How can we predict successful assessment performance based on time spent on and completed courses?*

## Relevance:
Understanding the relationship between **course engagement** (time spent on courses and course completion) and **assessment performance** can help identify the **key factors that lead to higher assessment scores** and support more targeted learning-path recommendations for students with similar course engagement.

## **Tables and Columns Required**:
- course_fact: user_id, course_id, time_spent, completed_at
- course_dim: course_id, technology, title
- user_dim: user_id, email, first_name, last_name
- assessment_dim: assessment_id, technology
- assessment_fact: user_id, assessment_id, score, score_group, percentile, time_spent

## Methodology:
1. Join `course_fact` with `course_dim` on `course_id`
2. Join `assessment_fact` with `assessment_dim` on `assessment_id`
3. Join `user_dim` to associate course and assessment performance per user.
4. Aggregate course engagement per user
   - Calculate **total time spent per user** on courses.
   - Count **the number of completed courses per user**.
5. Aggregate assessment performance per user
   - Calculate average assessment score per user.
   - Identify score groups (e.g., low, medium, high performers).
6. Analyze correlation between course engagement and assessment score
   - Use statistical methods (e.g., correlation coefficients, regression models).
   - Group by technology to see if certain subjects have stronger relationships.
7. Predict assessment success based on course engagement
   - Use machine learning models (e.g., logistic regression, decision trees) to predict success based on time spent and completed courses.
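Steps 4 to 6 can be sketched in plain Python as follows. This is a minimal illustration, not the project's actual pipeline: the sample rows are invented, the column names follow the tables listed above, a non-null `completed_at` is assumed to mean the course was completed, and the helper names (`aggregate_engagement`, `average_scores`, `pearson`) are hypothetical. The prediction step (step 7) is not covered here.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical sample rows; column names follow the tables listed above.
course_fact = [
    {"user_id": 1, "course_id": 10, "time_spent": 120, "completed_at": "2024-01-05"},
    {"user_id": 1, "course_id": 11, "time_spent": 60, "completed_at": None},
    {"user_id": 2, "course_id": 10, "time_spent": 300, "completed_at": "2024-01-09"},
    {"user_id": 2, "course_id": 12, "time_spent": 200, "completed_at": "2024-02-01"},
    {"user_id": 3, "course_id": 11, "time_spent": 30, "completed_at": None},
]
assessment_fact = [
    {"user_id": 1, "assessment_id": 100, "score": 90},
    {"user_id": 2, "assessment_id": 100, "score": 140},
    {"user_id": 2, "assessment_id": 101, "score": 120},
    {"user_id": 3, "assessment_id": 100, "score": 60},
]


def aggregate_engagement(rows):
    """Total time spent and number of completed courses per user."""
    totals = defaultdict(lambda: {"total_time": 0, "completed": 0})
    for row in rows:
        agg = totals[row["user_id"]]
        agg["total_time"] += row["time_spent"]
        # Assumption: a non-null completed_at marks a completed course.
        if row["completed_at"] is not None:
            agg["completed"] += 1
    return dict(totals)


def average_scores(rows):
    """Average assessment score per user."""
    scores = defaultdict(list)
    for row in rows:
        scores[row["user_id"]].append(row["score"])
    return {user: sum(vals) / len(vals) for user, vals in scores.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


engagement = aggregate_engagement(course_fact)
avg_score = average_scores(assessment_fact)

# Only users present in both tables can be compared.
users = sorted(set(engagement) & set(avg_score))
r = pearson(
    [engagement[u]["total_time"] for u in users],
    [avg_score[u] for u in users],
)
print(f"Correlation between course time and average score: {r:.3f}")
```

With real data, the same aggregation could be repeated per technology to see whether the relationship is stronger for some subjects than others.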


## Potential Insights and Actions:
If students who spend more time on courses tend to score higher on assessments, we can encourage students to **allocate more time to learning activities** and provide structured study plans.

If users who have completed courses in a technology (Python, R, SQL) score higher in assessments for that same technology, we can recommend that similar students finish prerequisite courses before taking assessments to improve success rates.

If users in the top 10% of assessment scores follow a different learning pattern from lower-performing users, we can identify and replicate **high-performing study behaviors** to recommend effective learning strategies for others.

---


## [2] **Research Question:**
*How can we recommend a personalized and suitable data track (Data Engineering, Data Science, Machine Learning, etc.) for DataCamp scholars based on their learning patterns (time spent), performance (score, technologies), and preferences?*

## Relevance
Recommending personalized data tracks based on scholars' learning patterns, performance, and preferences ensures that they are guided toward the most suitable learning path. Understanding how scholars engage with different technologies, courses, and assessments allows for data-driven track recommendations that align with their strengths and interests. This can improve course completion rates, increase engagement, and help scholars transition effectively from foundational topics to specialized fields. Additionally, by identifying scholars who struggle with track selection or engagement, we can provide targeted mentorship, study jams, and workshops to enhance their learning experience with DataCamp.


## **Tables and Columns Required:**
- course_fact: user_id, course_id, time_spent
- course_dim: course_id, technology, title, track_id

- assessment_fact: user_id, assessment_id, score
- assessment_dim: assessment_id, technology, difficulty

- practice_fact: user_id, practice_id, time_spent
- practice_dim: practice_id, technology

- projects_fact: user_id, project_id, time_spent
- projects_dim: project_id, technology

- user_dim: user_id, email, first_name, last_name

## **Methodology**:
1. Join `course_fact` with `course_dim` → Summarize time spent per technology
2. Join `practice_fact` with `practice_dim` → Summarize practice time per technology
3. Join `projects_fact` with `projects_dim` → Summarize project engagement per technology
4. Join `assessment_fact` with `assessment_dim` → **Get average scores per technology**
5. Group by `user_id` and `technology` → Compute:
   - Total time spent per technology
   - Average assessment score per technology
   - Number of completed projects/practices
6. Recommend Suitable Tracks
   - Match dominant learning technology with course_dim.track_id
   - Rank tracks based on time spent + performance + project completion
   - Recommend top 1-2 best-matching tracks per user
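The ranking in step 6 can be sketched as below. The methodology does not specify how time spent, performance, and project completion are combined, so this sketch assumes an equal-weight sum of each metric scaled by the user's own maximum. The per-user summary, the `TECH_TO_TRACK` mapping, the track IDs, and the function name `recommend_tracks` are all hypothetical placeholders for the output of steps 1 to 5.

```python
from collections import defaultdict

METRICS = ("time_spent", "avg_score", "projects")

# Hypothetical output of steps 1-5: one summary per (user_id, technology).
summary = {
    (1, "Python"): {"time_spent": 600, "avg_score": 130, "projects": 3},
    (1, "SQL"): {"time_spent": 300, "avg_score": 150, "projects": 1},
    (1, "R"): {"time_spent": 50, "avg_score": 70, "projects": 0},
    (2, "SQL"): {"time_spent": 400, "avg_score": 110, "projects": 2},
}

# Hypothetical mapping from technology to a track_id in course_dim.
TECH_TO_TRACK = {"Python": 101, "SQL": 102, "R": 103}


def recommend_tracks(summary, tech_to_track, top_n=2):
    """Rank each user's technologies and return the top track_ids."""
    per_user = defaultdict(dict)
    for (user_id, tech), metrics in summary.items():
        per_user[user_id][tech] = metrics

    recommendations = {}
    for user_id, techs in per_user.items():
        # Scale each metric by the user's own maximum so that minutes,
        # scores, and project counts become comparable (assumed weighting).
        maxima = {m: max(v[m] for v in techs.values()) or 1 for m in METRICS}

        def combined(tech):
            return sum(techs[tech][m] / maxima[m] for m in METRICS)

        ranked = sorted(techs, key=combined, reverse=True)
        recommendations[user_id] = [tech_to_track[t] for t in ranked[:top_n]]
    return recommendations


recommendations = recommend_tracks(summary, TECH_TO_TRACK)
print(recommendations)
```

Scaling within each user keeps a heavy learner's raw minutes from dominating the score. Different weights per metric would be a reasonable variation if, for example, assessment performance should count for more than time spent.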

## **Potential Insights and Actions:**

- If scholars who spend more time on Python and SQL courses tend to perform well in assessments related to Data Science and Data Engineering, we can recommend the Data Science or Data Engineering track to these scholars based on their dominant technology.
- If high assessment scores in Machine Learning technologies correlate with completing more projects, we can encourage scholars who excel in ML assessments to engage more in ML projects for deeper practical experience.
- If scholars who engage in multiple tracks take longer to complete any single track, we can provide personalized guidance to help them focus on a primary track before diversifying into multiple fields.
- If many scholars engage with Python first before transitioning into ML or Data Engineering, we can design introductory learning paths that guide scholars from Python fundamentals to advanced tracks.

---


## [3] **Research Question**:
*To what extent does time spent on courses account for scholars starting practice exercises and projects?*

## **Relevance**:
Understanding the relationship between time spent on courses and engagement in practice exercises and projects can help identify whether scholars actively apply their learning. This insight would allow us to encourage a more hands-on learning approach.

## **Tables and Columns Required:**
- course_fact: user_id, course_id, time_spent
- course_dim: course_id, technology, title, track_id
- practice_fact: user_id, practice_id, time_spent
- practice_dim: practice_id, technology
- projects_fact: user_id, project_id, time_spent
- projects_dim: project_id, technology
- user_dim: user_id, email, first_name, last_name

## **Methodology**:
1. Join `course_fact` with `course_dim` on `course_id`
2. Join `course_fact` with `user_dim` on `user_id` to get course engagement per user.
3. Calculate total **time spent on courses per user**.
4. Join `practice_fact` with `practice_dim` on `practice_id` to link practice exercises with technologies.
5. Join `projects_fact` with `projects_dim` on `project_id` to associate projects with technologies.
6. Identify users who **started practice exercises or projects after spending time on courses**.
7. Calculate correlation between time spent on courses and time spent on practice/projects.
8. Compare the percentage of users transitioning from courses → practice → projects.
9. Identify patterns of engagement (e.g., users who only do courses, those who practice but don’t do projects, those who do all).
10. Classify users into:
	1. Course-focused learners (high course time, low practice/projects).
	2. Balanced learners (moderate time across courses, practice, and projects).
	3. Hands-on learners (low course time, high practice/project engagement).
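The transition comparison (step 8) and the classification (step 10) can be sketched as follows. The 70% and 30% cut-offs on the share of course time are illustrative assumptions, since the methodology does not fix thresholds, and the `"inactive"` label for users with no recorded time is an addition for completeness. Because the listed columns include no start timestamps, the sketch treats any non-zero time as evidence that a scholar started that activity.

```python
# Hypothetical per-user totals in minutes: (course, practice, project).
time_per_user = {
    1: (500, 20, 0),
    2: (200, 150, 100),
    3: (30, 200, 150),
    4: (400, 0, 0),
}


def classify_learner(course_time, practice_time, project_time,
                     high=0.7, low=0.3):
    """Label a learner by the share of total time spent on courses.

    The `high` and `low` thresholds are assumed values, not from the source.
    """
    total = course_time + practice_time + project_time
    if total == 0:
        return "inactive"  # hypothetical extra label for empty records
    course_share = course_time / total
    if course_share >= high:
        return "course-focused"
    if course_share <= low:
        return "hands-on"
    return "balanced"


def transition_rates(time_per_user):
    """Share of course users who practice, and of those, who do projects."""
    course_users = [u for u, (c, _, _) in time_per_user.items() if c > 0]
    practice_users = [u for u in course_users if time_per_user[u][1] > 0]
    project_users = [u for u in practice_users if time_per_user[u][2] > 0]
    to_practice = len(practice_users) / len(course_users) if course_users else 0.0
    to_projects = len(project_users) / len(practice_users) if practice_users else 0.0
    return to_practice, to_projects


labels = {user: classify_learner(*times) for user, times in time_per_user.items()}
to_practice, to_projects = transition_rates(time_per_user)

print(labels)
print(f"courses -> practice: {to_practice:.0%}, practice -> projects: {to_projects:.0%}")
```

Classifying by share of time rather than absolute minutes keeps heavy and light users comparable. Absolute thresholds would be an alternative if the goal is to separate low-activity scholars first.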

## Potential Insights and Actions
- If scholars spend a lot of time on courses but rarely start practice exercises, we might need to introduce guided practice sessions immediately after course completion.
- If scholars transition quickly from courses to practice but avoid projects, we might need to provide smaller, beginner-friendly project challenges.
- If certain technologies (e.g., Machine Learning) show low engagement in practice and projects, we might need to add more real-world case studies or interactive exercises.

---
## [4] Research Question:
*How aligned are DataCamp scholars’ course selections with their chosen learning track, and how does this alignment impact overall course engagement?*

## Relevance:
Understanding how closely DataCamp scholars follow their intended learning track can help identify patterns in course selection, engagement levels, and potential gaps in track adherence. If scholars frequently take courses outside their selected track, it may indicate a need for better track guidance, more flexible learning paths, or additional interdisciplinary options. These insights can help improve learning track recommendations and optimize course offerings for better engagement.

## Tables and Columns Required:
- user_dim: user_id, email, first_name, last_name
- track_fact: track_id, user_id, is_currently_active
- track_dim: track_id, title, technology, nb_courses, nb_projects
- course_fact: user_id, course_id, time_spent, completed_at
- course_dim: course_id, technology, title, track_id
- assessment_fact: user_id, assessment_id, score, score_group, percentile
- assessment_dim: assessment_id, technology

## Methodology
1. Join `track_fact` with `track_dim` on `track_id` to get each scholar's current active track and its details.
2. Join with `user_dim` on `user_id` to associate scholars with their chosen track.
3. Join `course_fact` with `course_dim` on `course_id` to get courses taken by each scholar and their associated track.
4. Compare `course_dim.track_id` with `track_fact.track_id` to determine if scholars are taking courses within their chosen track.
5. Calculate **percentage of courses** taken that belong to the scholar’s selected track per user.
6. Calculate **total time spent on courses inside** vs. **outside the chosen track** per user.
7. Compare **completion rates for courses aligned** vs. **not aligned with the chosen track.**
8. Join `assessment_fact` with `assessment_dim` on `assessment_id` to obtain scholars’ assessment scores by technology.
9. Compare **assessment performance of scholars** who primarily follow their track vs. those who take a mix of courses.
10. Determine if **higher track adherence** correlates with **better assessment scores.**
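Steps 4 to 6 can be sketched as below. The input shapes are assumptions: `active_track` stands in for the result of filtering `track_fact` on `is_currently_active`, and each course row is assumed to be already joined to `course_dim` so that it carries the course's `track_id`. The function name `track_adherence` and the sample values are hypothetical.

```python
from collections import defaultdict

# Hypothetical result of steps 1-2: each scholar's active track_id.
active_track = {1: 1, 2: 2}

# Hypothetical result of step 3: course rows with the course's track_id.
course_rows = [
    {"user_id": 1, "course_id": 10, "track_id": 1, "time_spent": 120},
    {"user_id": 1, "course_id": 11, "track_id": 1, "time_spent": 60},
    {"user_id": 1, "course_id": 20, "track_id": 2, "time_spent": 30},
    {"user_id": 2, "course_id": 20, "track_id": 2, "time_spent": 200},
]


def track_adherence(active_track, course_rows):
    """Per-user share of in-track courses and time inside vs. outside the track."""
    stats = defaultdict(
        lambda: {"in_courses": 0, "total_courses": 0, "in_time": 0, "out_time": 0}
    )
    for row in course_rows:
        user_id = row["user_id"]
        if user_id not in active_track:
            continue  # scholars without an active track cannot be compared
        user_stats = stats[user_id]
        user_stats["total_courses"] += 1
        if row["track_id"] == active_track[user_id]:
            user_stats["in_courses"] += 1
            user_stats["in_time"] += row["time_spent"]
        else:
            user_stats["out_time"] += row["time_spent"]
    return {
        user_id: {**s, "pct_in_track": 100 * s["in_courses"] / s["total_courses"]}
        for user_id, s in stats.items()
    }


adherence = track_adherence(active_track, course_rows)
for user_id, s in sorted(adherence.items()):
    print(user_id, f"{s['pct_in_track']:.1f}% in track,",
          f"{s['in_time']} min inside, {s['out_time']} min outside")
```

The resulting `pct_in_track` value is one possible adherence measure to relate to assessment scores in steps 9 and 10.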

## Potential Insights and Actions
- If scholars who strictly follow their track show higher engagement, we might reinforce track adherence through structured learning paths and milestone-based rewards.
- If scholars frequently take courses outside their track, we might introduce interdisciplinary study jam sessions to support broader learning interests.
- If engagement is low even when scholars stay within their track, we might provide additional motivation through gamification elements like badges or XP bonuses.
- If scholars engaged in off-track courses perform better in assessments, we might explore integrating cross-track recommendations to enhance learning flexibility.
- If scholars struggle with staying on track, we might conduct onboarding sessions to help them navigate and commit to a structured learning path.

---
## [5] Research Question
*How does the alignment between DataCamp scholars’ course selections and their intended technology focus impact their course completion rate and XP accumulation?*

## Relevance:
Understanding how well scholars’ chosen courses align with their intended technology focus can help assess whether learners stay engaged and complete their courses. If strong alignment leads to higher completion and XP rates, it suggests that learners benefit from clear, structured paths. If misalignment leads to lower completion, interventions like better course recommendations or guided tracks can improve learner retention and progress.

## **Tables and Columns Required:**
- user_dim: user_id, email, first_name, last_name
- track_fact: track_id, user_id, is_currently_active, xp
- track_dim: track_id, title, technology, nb_courses, nb_projects, xp
- course_fact: user_id, course_id, time_spent, completed_at, xp
- course_dim: course_id, technology, title, xp

## Methodology
1. Join `track_fact` with `user_dim` on `user_id` to identify scholars and their active learning tracks.
2. Join `track_fact` with `track_dim` on `track_id` to retrieve track details, including technology focus and XP.
3. Join `course_fact` with `course_dim` on `course_id` to obtain course details (technology, XP).
4. Filter records to keep only each scholar's active track (`is_currently_active`), so that every course taken is compared against that track (`track_id`).
5. Compare the `technology` column in `course_dim` with the scholar’s track technology from `track_dim`.
6. Calculate the proportion of courses taken that match the track technology (Aligned Courses % = Aligned Courses / Total Courses Taken).
7. Compute course completion rate per scholar:
$$\text{Completion Rate}=\frac{\text{Completed Courses}}{\text{Total Courses Taken}}$$

8. Compute average XP accumulation per course per scholar:
$$\text{Avg XP per Course} = \frac{\text{Total XP from Courses}}{\text{Total Courses Taken}}$$

9. Aggregate XP per scholar's learning track from `track_fact`.

10. Group scholars by high and low course-track alignment (e.g., >70% aligned vs. ≤70% aligned).

11. Compare course completion rates and XP accumulation between these groups to determine if alignment influences engagement and progress.
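The alignment percentage, completion rate, and average XP per course from steps 6 to 8, together with the grouping in step 10, can be sketched as follows. The sample data is invented, each course row is assumed to be already joined to `course_dim`, a non-null `completed_at` is assumed to mean completion, and a scholar at exactly the 70% threshold is placed in the low group. The names `scholar_metrics` and `group_by_alignment` are hypothetical.

```python
# Hypothetical inputs: each scholar's track technology and the courses taken.
track_technology = {1: "Python", 2: "SQL"}
courses_taken = {
    1: [
        {"technology": "Python", "completed_at": "2024-01-05", "xp": 4000},
        {"technology": "Python", "completed_at": "2024-01-20", "xp": 3500},
        {"technology": "Python", "completed_at": None, "xp": 1200},
        {"technology": "SQL", "completed_at": "2024-02-02", "xp": 3000},
    ],
    2: [
        {"technology": "Python", "completed_at": None, "xp": 500},
        {"technology": "R", "completed_at": None, "xp": 300},
        {"technology": "SQL", "completed_at": "2024-03-01", "xp": 4000},
    ],
}


def scholar_metrics(track_tech, courses):
    """Aligned-course share, completion rate, and average XP per course."""
    total = len(courses)
    if total == 0:
        raise ValueError("scholar has taken no courses")
    aligned = sum(1 for c in courses if c["technology"] == track_tech)
    # Assumption: a non-null completed_at marks a completed course.
    completed = sum(1 for c in courses if c["completed_at"] is not None)
    total_xp = sum(c["xp"] for c in courses)
    return {
        "aligned_pct": aligned / total,
        "completion_rate": completed / total,
        "avg_xp_per_course": total_xp / total,
    }


def group_by_alignment(metrics, threshold=0.7):
    """Split scholars into high (> threshold) and low alignment groups."""
    groups = {"high": [], "low": []}
    for user_id, m in sorted(metrics.items()):
        key = "high" if m["aligned_pct"] > threshold else "low"
        groups[key].append(user_id)
    return groups


metrics = {
    user_id: scholar_metrics(track_technology[user_id], courses)
    for user_id, courses in courses_taken.items()
}
groups = group_by_alignment(metrics)

for name, members in groups.items():
    if members:
        mean_rate = sum(metrics[u]["completion_rate"] for u in members) / len(members)
        print(f"{name} alignment: scholars {members}, mean completion {mean_rate:.2f}")
```

Comparing the group means is only a first look. With real data, a significance test or a regression on the continuous alignment percentage would give a firmer answer than a single cut-off.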


## Potential Insights and Actions
- If completion rates remain low regardless of alignment, we might need to investigate scholars' experience of course difficulty or content gaps.
- If XP accumulation is significantly higher in aligned courses, we might continue to emphasize track-based XP milestones to motivate scholars or introduce leaderboard-based incentives to encourage scholars to stick to their intended technology focus.
- If scholars with low alignment accumulate less XP, we might need to introduce personalized course suggestions and learning recommendations during mentorship sessions to help them stay on track and keep them engaged.
- If completion rates remain low regardless of alignment, we might conduct feedback surveys to identify challenges and adjust our support programs accordingly.
- If scholars with high course-track alignment show higher completion rates, we might encourage stricter learning path recommendations to improve retention.

# Conclusion
These insights will support Project DREAMS in using the DataCamp Data Connector to improve learning outcomes, track retention, and enhance scholar engagement. By identifying trends in course completion, assessment performance, and track alignment, the project can create more effective study suggestions, organized learning pathways, and focused mentoring. The results will guide workshops, study jams, and gamification techniques to improve scholar achievement and retention. Furthermore, by examining learning practices, the analytics platform will be improved to deliver actionable insights for maximizing scholar progress.