# Asking the Right Questions
Kim Audrey Magan | 3/14/2025

---

Your task is to:

- Review the DataCamp Data Connector documentation.
- Formulate 5 research questions that provide valuable insights for tracking and improving DataCamp scholar engagement.
- For each question:
    - Explain its relevance to Project DREAMS objectives.
    - Identify the tables and columns needed to answer the question.
    - Describe any joins, aggregations, or calculations required.
    - Explain how the insights could improve scholar management.

## 1. Introduction
This notebook first frames the main problem, then poses five research questions, each followed by an approach to answering it.

### Understanding the Problem

DataCamp is a popular online learning platform that allows students to study machine learning and data science. Although well-known, it lacks the ability to display student progress visually, making it difficult for administrators to track **improvements through various metrics** and **visualize patterns and trends**. Implementing a system to do so would enable the administrators to improve resource allocation and workshop planning.

**How can someone help solve the problem?**

I think we could address this problem by:
- **Identifying the metrics** that really reflect a student's ability (_what does improving really mean here?_)
    - **Qualitative and quantitative**: beyond XP, consider metrics like engagement level, time spent per module, or even peer collaboration
- Using statistical methods and machine learning to **differentiate top performers from others**, uncover trends, and predict outcomes
- **Gathering the right data** from the databank, with proper security in place
- **Integrating effective, evidence-based study techniques** into how students use DataCamp after the analysis (_are they doing spaced repetition, or a project after every course?_)
- Getting **feedback from the students** about the courses: is the instructors' pace too fast, are the courses outdated, and so on. Do their experiences at home affect their ability to learn? How can officers help when they get stuck?

## 2. Research Questions

**Format:**

Research Questions: Your 5 research questions, each including:
- The question itself.
- Relevance to Project DREAMS objectives.
    - Design and implement comprehensive performance metrics and interactive visualizations to enhance the monitoring capabilities of the system.
    - Develop a web-based analytics platform made specifically to track DataCamp scholars' progress under GDG PUP supervision.
- Tables and columns required from the DataCamp Data Connector.
- Methodology (joins, aggregations, calculations).
- Potential insights and actions.

### **Question #1:**  
**What metrics determine the true skill of students based on their chosen technology or track?**  

#### **Relevance to Project DREAMS Objectives:**  
To effectively gather insights about our students, we must first define the key metrics. These metrics will serve as the foundation for visualization and analysis.  

#### **Tables and Columns Required from the DataCamp Data Connector:**  
If the relevant metrics can be derived from DataCamp’s Data Connector, the following tables may be needed (taking time spent as an example metric):  
- `course_dim`: `course_id`, `technology`
- `course_fact`: `user_id`, `course_id`, `started_at`, `completed_at`, `time_spent`

#### **Methodology:**  
1. Join `course_dim` and `course_fact` on `course_id`
2. Group the rows by `user_id` and `technology`
3. Calculate the average time spent across the courses taken by each `user_id`
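
The three steps above can be sketched in pandas as follows. This is only a minimal sketch: the rows are made-up toy data, the column names are the ones listed in this notebook (the actual Data Connector schema may differ), and the unit of `time_spent` is an assumption.

```python
import pandas as pd

# Toy rows for illustration only; real data would come from the Data Connector.
course_dim = pd.DataFrame({
    "course_id": [1, 2, 3],
    "technology": ["Python", "Python", "SQL"],
})
course_fact = pd.DataFrame({
    "user_id": [10, 10, 10, 20],
    "course_id": [1, 2, 3, 1],
    "time_spent": [120.0, 60.0, 30.0, 200.0],  # unit assumed (e.g. minutes)
})

# Step 1: join the fact table to the dimension table on course_id.
joined = course_fact.merge(course_dim, on="course_id", how="inner")

# Steps 2-3: average time spent per user within each technology.
avg_time = (
    joined.groupby(["user_id", "technology"], as_index=False)["time_spent"]
    .mean()
    .rename(columns={"time_spent": "avg_time_spent"})
)
print(avg_time)
```

The result has one row per user and technology, which is a convenient shape for the later classification questions.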

#### **Potential Insights and Actions:**  
Beyond the time spent on courses and practice, a strong indicator of a student’s skill is their **deliverables**—what they can produce based on their learning.  
- We could brainstorm different skill levels and their expected deliverables.  
- Alternatively, we could consult professionals to understand what they expect from students at various levels.

- Once we have identified candidate metrics, we could use ML-based feature selection to find which of them best reflect the true skill of our students.


### **Question #2A:**  
**What threshold values separate high-performing, low-performing, and other categories of students?**  

#### **Relevance to Project DREAMS Objectives:**  
One of DREAMS' objectives is to assess students' skills. Once the key metrics are defined, we can determine the best value that classifies students as high-performing, low-performing, or other levels. This classification will allow for more specialized feedback.  

Additionally, distinguishing students into categories enables us to narrow our investigations, identify patterns, and make informed future decisions.  

#### **Tables and Columns Required from the DataCamp Data Connector:**  
If time spent is the chosen metric, the following tables may be needed:  
- `course_dim`: `course_id`, `technology`
- `course_fact`: `user_id`, `course_id`, `started_at`, `completed_at`, `time_spent`

#### **Methodology:**  

Our goal is to identify the best-performing students, the low-performing students, and any categories in between.

1. Set the threshold values that determine each student's level of competency
2. Join `course_dim` and `course_fact` on `course_id`
3. Group the rows by `user_id` and `technology`
4. Calculate the average time spent across the courses taken by each `user_id`
5. Assign each `user_id` to a level based on the thresholds from step 1
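
As a rough sketch of steps 1 and 5, assuming the averages from steps 2 to 4 are already in a table, `pd.cut` can assign levels from a set of cut-offs. The cut-offs and labels below are hypothetical placeholders, and whether more time spent actually signals higher performance is still an open question.

```python
import pandas as pd

# Assumed output of the join + average steps (toy values, unit assumed: minutes).
avg_time = pd.DataFrame({
    "user_id": [10, 20, 30, 40],
    "technology": ["Python"] * 4,
    "avg_time_spent": [30.0, 90.0, 150.0, 240.0],
})

# Step 1: hypothetical cut-offs; real ones should come from research or the data.
bins = [0, 60, 180, float("inf")]
labels = ["low", "mid", "high"]

# Step 5: assign each user to a level. Intervals are right-inclusive,
# i.e. (0, 60], (60, 180], (180, inf).
avg_time["level"] = pd.cut(avg_time["avg_time_spent"], bins=bins, labels=labels)

# How many users fall into each level per technology.
level_counts = avg_time.groupby(["technology", "level"], observed=True).size()
print(level_counts)
```

If no research-backed cut-offs exist, quantile-based bins (for example via `pd.qcut`) would be one alternative to try.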

#### **Potential Insights and Actions:**  
The first step is to clearly define the key metrics, as they form the basis for identifying the most meaningful classification values.  

- Once the metrics are established, we can explore research studies that use similar metrics to understand the significance of different values.
- Once we have data labeled on these features (such as time spent), we could train a supervised model to classify other students as well. Based on the time they spend on a material, the model could tell them which category they fall into, as a form of feedback.  


### **Question #2B:**  
**Assuming the values were extracted, what separates high-performing and low-performing (or other levels) students?**  

- If the metric is **X**, how much longer do high-performing students engage in **X** compared to low-performing students?  
- If the metric is **Y**, how much more do high-performing students complete compared to low-performing students?  
- ...  

#### **Relevance to Project DREAMS Objectives:**  
Understanding the differences between student categories will help analysts gather targeted insights and analyze specific groups, making the findings more personalized.  

#### **Tables and Columns Required from the DataCamp Data Connector:**  
If relevant, the following datasets may be needed:  
- `assessment_dim`: `assessment_id`, `technology`
- `assessment_fact`: entire table
- ... We can use almost all the tables since we are looking for the characteristics of the students

#### **Methodology:**  
1. Start from the users labeled in Question #2A
2. Group the `user_id` values by level
3. Compute descriptive statistics on their assessments, projects, and finished courses
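
A minimal sketch of this comparison, assuming a per-user summary table has already been built. The columns `courses_completed` and `avg_assessment_score` are hypothetical placeholders and not names taken from the Data Connector; the values are made up.

```python
import pandas as pd

# Hypothetical per-user summary with the level labels from Question #2A.
users = pd.DataFrame({
    "user_id": [10, 20, 30, 40],
    "level": ["high", "high", "low", "low"],
    "courses_completed": [8, 6, 2, 4],
    "avg_assessment_score": [120.0, 100.0, 60.0, 80.0],
})

# Steps 2-3: descriptive statistics per level.
metrics = ["courses_completed", "avg_assessment_score"]
summary = users.groupby("level")[metrics].agg(["mean", "median"])

# Difference between the two groups for each statistic.
gap = summary.loc["high"] - summary.loc["low"]
print(summary)
print(gap)
```

The `gap` values answer the "how much more" form of the question for each metric; with real data, a significance test would be a sensible next step before drawing conclusions.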

#### **Potential Insights and Actions:**  
Once the groups are identified based on the metrics, the dashboard can show how a user in a specific group should proceed with their learning. With these features, we can again use a classification model to show where each student falls and then provide feedback tailored to their needs.


### **Question #3:**  
**How long does it take to finish a course? Is the strategy to complete it in one sitting or spread it over a week/month?**  

#### **Relevance to Project DREAMS Objectives:**  
- This helps analyze how much time students spend on each course and its relationship with key metrics  
- Research suggests that spreading study time over a week is more effective than completing a course in one sitting. Understanding student behavior can reveal whether they are using effective study techniques or simply rushing through courses without proper learning.
- The relevant technique is spaced repetition, as discussed in the book *Make It Stick*.  

#### **Tables and Columns Required from the DataCamp Data Connector:**  
If relevant, the following datasets may be needed:  
- `course_dim`: `course_id`, `technology`
- `course_fact`: `user_id`, `course_id`, `started_at`, `completed_at`, `time_spent`
- `date_dim`: `id`

#### **Methodology:**  
1. Join `course_dim` and `course_fact` on `course_id`
2. Keep only the finished courses (rows whose `completed_at` is not NULL)
3. If my assumption is correct (a new record is generated whenever a user works on a specific course), join `date_dim` to the table from step 1
4. Get the average interval for every finished course (I am still thinking of a way to do this)
5. Aggregate to get the average interval for every user
6. Split the joined table by technology and user level
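
Since step 4 is still open, here is one possible simplification that does not depend on the per-session assumption in step 3: measure the calendar days between `started_at` and `completed_at` for each finished course. This is a sketch on made-up rows, and "same day" is only a rough proxy for "one sitting".

```python
import pandas as pd

# Toy rows for illustration; None marks a course that is not finished yet.
course_fact = pd.DataFrame({
    "user_id": [10, 10, 20],
    "course_id": [1, 2, 1],
    "started_at": pd.to_datetime(["2025-01-01", "2025-01-10", "2025-01-05"]),
    "completed_at": pd.to_datetime(["2025-01-01", "2025-01-17", None]),
})

# Step 2: keep only finished courses.
finished = course_fact.dropna(subset=["completed_at"]).copy()

# Elapsed calendar days from start to completion, per course.
finished["days_to_finish"] = (
    finished["completed_at"] - finished["started_at"]
).dt.days

# Rough proxy for "one sitting": started and completed on the same day.
finished["same_day"] = finished["days_to_finish"] == 0

# Step 5: one average value per user.
avg_days = finished.groupby("user_id")["days_to_finish"].mean()
print(avg_days)
```

Comparing `days_to_finish` with `time_spent` could further separate a course spread thinly over weeks from one done in a few long sessions.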

#### **Potential Insights and Actions:**  
- If data shows that students are rushing through courses without retaining knowledge, we could introduce a structured course plan to optimize learning.  
- Instead of finishing multiple courses in a single day, we could suggest a schedule that includes two unrelated but relevant courses within the same track to encourage better retention and learning balance.  


### **Question #4:**  
**How many courses or tracks do students take simultaneously?**  

#### **Relevance to Project DREAMS Objectives:**  
- Understanding how students manage multiple courses on DataCamp can help determine whether their learning strategies are effective.  
- By comparing student behavior with research on effective study techniques, we can assess whether their approach is beneficial or if improvements can be suggested.  

#### **Tables and Columns Required from the DataCamp Data Connector:**  
- `course_fact`: `user_id`, `course_id`, `started_at`, `completed_at`
- `course_dim`: `course_id`, `technology`

#### **Methodology:**  
1. Join the two tables on `course_id`
2. Determine active course periods: from `started_at` to `completed_at`, treating courses whose `completed_at` is NULL as still active
3. Identify overlapping courses: `started_at` of one course falls between `started_at` and `completed_at` of another course
4. Group by user and count simultaneous courses
5. Separate by technology and by skill level
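
One way to sketch steps 2 to 4 is a sweep over start and end events: add 1 when a course starts, subtract 1 when it ends, and take the peak of the running total per user. The rows and the `as_of` date (used as the end of still-active courses) are assumptions for illustration.

```python
import pandas as pd

# Toy rows; None in completed_at marks a course that is still active.
course_fact = pd.DataFrame({
    "user_id": [10, 10, 10, 20],
    "course_id": [1, 2, 3, 1],
    "started_at": pd.to_datetime(
        ["2025-01-01", "2025-01-03", "2025-01-20", "2025-01-05"]
    ),
    "completed_at": pd.to_datetime(
        ["2025-01-10", None, "2025-01-25", "2025-01-06"]
    ),
})

# Step 2: close still-active courses at an assumed reporting date.
as_of = pd.Timestamp("2025-02-01")
periods = course_fact.assign(end=course_fact["completed_at"].fillna(as_of))


def max_concurrent(group: pd.DataFrame) -> int:
    """Peak number of courses a user had open at the same time."""
    starts = pd.DataFrame({"ts": group["started_at"], "delta": 1})
    ends = pd.DataFrame({"ts": group["end"], "delta": -1})
    # Sorting ends before starts on ties means back-to-back courses
    # are not counted as overlapping.
    events = pd.concat([starts, ends]).sort_values(["ts", "delta"])
    return int(events["delta"].cumsum().max())


# Steps 3-4: peak simultaneous courses per user.
simultaneous = periods.groupby("user_id")[["started_at", "end"]].apply(max_concurrent)
print(simultaneous)
```

Joining `course_dim` first and grouping by `user_id` and `technology` would give the per-technology split from step 5.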

#### **Potential Insights and Actions:**  
- Based on this data, we could create a **personalized course scheduling plan** to help students maximize their learning.  
- For example, research may suggest that taking **two courses per week** is optimal, or that **spreading one course over a week** leads to better retention. The best approach can be determined based on relevant studies.  


### **Question #5:**  
**Do student demographics affect their progress in DataCamp?**  

#### **Relevance to Project DREAMS Objectives:**  
- Understanding more about our students allows us to tailor recommendations and support strategies to their specific needs.  

#### **Tables and Columns Required from the DataCamp Data Connector:**  
- Same as **Question #2A**  

#### **Methodology:**  
1. Define key metrics and skill levels.  
2. Analyze correlations between student demographics and their progress.  
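
A small sketch of step 2, assuming demographic data is available from a survey. The `year_level` column and all values are hypothetical, since the tables listed above do not obviously contain demographics.

```python
import pandas as pd

# Hypothetical survey answers joined to a progress metric (toy values).
survey = pd.DataFrame({
    "user_id": [10, 20, 30, 40, 50, 60],
    "year_level": [1, 1, 2, 2, 3, 3],
    "courses_completed": [2, 4, 4, 6, 6, 8],
})

# Average progress within each demographic group.
by_year = survey.groupby("year_level")["courses_completed"].mean()

# Pearson correlation between the demographic variable and progress.
r = survey["year_level"].corr(survey["courses_completed"])

print(by_year)
print(round(r, 2))
```

A correlation found this way would only suggest where to look further; it would not show that the demographic factor causes the difference in progress.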

#### **Potential Insights and Actions:**  
- Identify factors that may influence student progress, such as academic workload, personal challenges, or level of interest.  
- Determine if students are struggling due to external factors (e.g., school commitments) or lack of motivation.  
- Implement periodic check-ins through surveys to understand their challenges and provide better support.  


## 3. Conclusion

I designed the questions to first answer the main one: which metrics reflect the skill of our students. From those metrics, we can define student levels and analyze each level's learning habits and behaviors. I also asked questions that focus on behavior because I want students to develop a more effective study routine rather than just gain XP; this initiative could lead to a smart scheduling system. Lastly, I want to know how our students are doing not just inside DataCamp but also in their lives, because I believe these things affect how people learn, and simple check-ins can help students get past their struggles when studying data technology.