# Analysis for Continuous Improvement

Author Name: Lei Xiao 

9-digit PID: 7305-26509

Continuous Improvement embraces the belief that there is _always room to make things better_. It is a mindset and process we value and practice in this course. In this assignment, you are able to practice continuous improvement and contribute to the design ideas of the course.

## Brainstorming Ideas

Reflect on your personal experiences and observations in COMP110 and **brainstorm modifications to the course that _create value_ beyond its current design**. When brainstorming, try not to be critical of the ideas you come up with regarding scale, stakeholders impacted, or for any other reasons. In the markdown cell below, brainstorm 3 to 5 ideas you think would create value for you.

Each brainstormed idea should state a. the suggested change or addition, b. the expected value created, and c. which specific stakeholders would benefit. If helpful, expand on the following template "The course should (state idea here) because it will (state value created here) for (insert stakeholders here)."

Example A: "The course should use only examples from psychology experiments because it will be more relevant for students who are psychology majors."

Example B: "The course should not have post-lesson questions because they are not useful for most students in the class."

### Part 1. Creative Ideation

1. The course should allow students to retake tests because it will give them an incentive to relearn the material they missed.
2. The course should make the in-person lectures recorded and saved because the recorded videos are more engaging for the students.
3. The course should extend due dates on harder exercise assignments because students need more time to comprehend them.
4. The course, in general, should expand the size and have more professors because many students can not get into the class.
5. The course should not have time limits on tests because students often don't get enough minutes to finish the questions fully. 

## Connecting with Available Data

The data you have available for this analysis is limited to the anonymized course survey you and your peers filled out a few weeks ago. The data is found in the `survey.csv` file in this exercise directory. Each row represents an individual survey response. Each column has a description which can be found on the project write-up here: <https://22s.comp110.com/exercises/ex08.html>

Review the list of available data and identify which one of your ideas _does not_, or is _least likely to_, have relevant data to support the analysis of your idea to create value. In the box below, identify which of your ideas lacks data and suggest how we might be able to collect this data in the future. One aspect of _continuous improvement_ is trying to avoid "tunnel vision" where possible improvements are not considered because there is no data available to analyze it. Identifying new data sources can unlock improvements!

### Part 2. Identifying Missing Data

1. Idea without sufficient data to analyze:
- Idea four (The course, in general, should expand the size and have more professors because many students can not get into the class) does not have sufficient data to analyze because the survey did not ask any questions about students outside of the class.
- Idea five (The course should not have time limits on tests because students often don't get enough minutes to finish the questions fully) does not have sufficient data to analyze because the survey had no questions about the tests specifically.

2. Suggestion for how to collect data to support this idea in the future: 
- My suggestion would be to ask the students in the class for their opinion on the class size, from 1 meaning too few seats to 10 meaning too many seats.
- My suggestion would be to ask the students for their opinion on the difficulty of the tests, from 1 meaning very easy to 10 meaning very hard.


## Choosing an Idea to Analyze

Consider those of your ideas which _do_ seem likely to have relevant data to analyze. If none of your ideas do, spend a few minutes and brainstorm another idea or two with the added connection of data available on hand and add those ideas to your brainstormed ideas list.

Select the one idea which you believe is _most valuable_ to analyze relative to the others and has data to support the analysis of. In the markdown cell for Part 3 below, identify the idea you are exploring and articulate why you believe it is most valuable (e.g. widest impact, biggest opportunity for improvement, simplest change for significant improvement, and so on).

### Part 3. Choosing Your Analysis

1. Idea to analyze with available data:
- Idea two (The course should make the in-person lectures recorded and saved because the recorded videos are more engaging for the students)
2. This idea is more valuable than the others brainstormed because: 
- Personally, the live classes did help me understand the course materials, but not nearly as much as the YouTube videos. By making the in-person classes optional, I believe the general student body will see a boost in productivity because it will free up their schedules. In return, students can be more flexible when making time to watch the recorded videos. I chose to analyze this idea over the others because, first, I think it was the most realistic suggestion of the three ideas I had left. Second, in scanning through the survey data, I saw a lot of evidence that could be used to either support or refute my idea.


## Your Analysis

Before you begin analysis, a reminder that we do not expect the data to support everyone's ideas and you can complete this exercise for full credit even if the data does not clearly support your suggestion or even completely refutes it. What we are looking for is a logical attempt to explore the data using the techniques you have learned up until now in a way that _either_ supports, refutes, or does not have a clear result and then to reflect on your findings after the analysis.

Using the utility functions you created for the previous exercise, you will continue with your analysis in the following part. Before you begin, refer to the rubric on the technical expectations of this section in the exercise write-up.

In this section, you are expected to interleave code and markdown cells such that for each step of your analysis you are starting with an English description of what you are planning to do next in a markdown cell, followed by a Python cell that performs that step of the analysis.

### Part 4. Analysis

We begin by changing some settings in the notebook to automatically reload changes to imported files.

In [10]:
%reload_ext autoreload
%autoreload 2
print("Autoreload of imported modules enabled. Be sure to save your work in other modules!")

Autoreload of imported modules enabled. Be sure to save your work in other modules!


We continue by importing the helper functions from `data_utils`.

In [3]:
# TODO: You complete the code blocks from here forward!

Next, ... (you take it from here and add additional code and markdown cells to read in the CSV file and process it as needed)

In [11]:
SURVEY_DATA_CSV_FILE_PATH: str = "../../data/survey.csv"

### Using the `read_csv_rows` function
To begin my analysis, I need supporting information that relates to my idea. First, I will use the `read_csv_rows` function to get an organized overview of the data. Its output shows the number of survey responses (one per row) and the questions that were asked (one per column).


In [20]:
from data_utils import read_csv_rows
data_rows: list[dict[str, str]] = read_csv_rows(SURVEY_DATA_CSV_FILE_PATH)

if len(data_rows) == 0:
    print("Go implement read_csv_rows in data_utils.py")
    print("Be sure to save your work before re-evaluating this cell!")
else:
    print(f"Data File Read: {SURVEY_DATA_CSV_FILE_PATH}")
    print(f"{len(data_rows)} rows")
    print(f"{len(data_rows[0].keys())} columns")
    print(f"Columns names: {data_rows[0].keys()}")

Data File Read: ../../data/survey.csv
620 rows
35 columns
Columns names: dict_keys(['row', 'year', 'unc_status', 'comp_major', 'primary_major', 'data_science', 'prereqs', 'prior_exp', 'ap_principles', 'ap_a', 'other_comp', 'prior_time', 'languages', 'hours_online_social', 'hours_online_work', 'lesson_time', 'sync_perf', 'all_sync', 'flipped_class', 'no_hybrid', 'own_notes', 'own_examples', 'oh_visits', 'ls_effective', 'lsqs_effective', 'programming_effective', 'qz_effective', 'oh_effective', 'tutoring_effective', 'pace', 'difficulty', 'understanding', 'interesting', 'valuable', 'would_recommend'])
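
For readers without access to `data_utils.py`, here is a minimal sketch of what `read_csv_rows` might look like. The real implementation from the previous exercise is not shown in this notebook, so the body below is an assumption built on the standard library's `csv.DictReader`; the demo file and its two rows are made up so the sketch runs without `survey.csv`.

```python
import os
import tempfile
from csv import DictReader


def read_csv_rows(filename: str) -> list[dict[str, str]]:
    """Read a CSV file into a list of rows, one dict per row (assumed behavior)."""
    rows: list[dict[str, str]] = []
    # newline="" is the csv module's recommended way to open files for a reader.
    with open(filename, "r", encoding="utf8", newline="") as file_handle:
        # DictReader uses the first line as the column names for every row.
        for row in DictReader(file_handle):
            rows.append(dict(row))
    return rows


# Demo on a tiny, made-up file (hypothetical data, not the survey).
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "demo.csv")
    with open(demo_path, "w", encoding="utf8", newline="") as demo_file:
        demo_file.write("row,sync_perf\n0,2\n1,3\n")
    demo_rows = read_csv_rows(demo_path)

print(demo_rows)
```

Every value comes back as a `str`, which matches the `list[dict[str, str]]` annotation used in the cell above.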


### Using the `columnar` function
The `columnar` function converts the row-oriented data into a column-oriented dictionary: each key is a column name (a survey question), and each value is the list of every response to that question. This layout makes the data easier to organize and to pass to the functions that follow.

In [21]:
from data_utils import columnar

data_cols: dict[str, list[str]] = columnar(data_rows)

if len(data_cols.keys()) == 0:
    print("Complete your implementation of columnar in data_utils.py")
    print("Be sure to follow the guidelines above and save your work before re-evaluating!")
else:
    print(f"{len(data_cols.keys())} columns")
    print(f"{len(data_cols['row'])} rows")
    print(f"Columns names: {data_cols.keys()}")

35 columns
620 rows
Columns names: dict_keys(['row', 'year', 'unc_status', 'comp_major', 'primary_major', 'data_science', 'prereqs', 'prior_exp', 'ap_principles', 'ap_a', 'other_comp', 'prior_time', 'languages', 'hours_online_social', 'hours_online_work', 'lesson_time', 'sync_perf', 'all_sync', 'flipped_class', 'no_hybrid', 'own_notes', 'own_examples', 'oh_visits', 'ls_effective', 'lsqs_effective', 'programming_effective', 'qz_effective', 'oh_effective', 'tutoring_effective', 'pace', 'difficulty', 'understanding', 'interesting', 'valuable', 'would_recommend'])
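
The row-to-column transformation can be sketched as follows. This is one plausible implementation, not necessarily the one in `data_utils.py`; the helper `column_values` and the two demo rows are hypothetical and exist only for illustration.

```python
def column_values(rows: list[dict[str, str]], column: str) -> list[str]:
    """Collect one column's value from every row (hypothetical helper)."""
    return [row[column] for row in rows]


def columnar(rows: list[dict[str, str]]) -> dict[str, list[str]]:
    """Turn a list of row dicts into a dict of column lists (assumed behavior)."""
    if len(rows) == 0:
        return {}
    # The first row's keys are taken as the column names for the whole table.
    return {column: column_values(rows, column) for column in rows[0]}


# Made-up rows standing in for the survey data.
demo_rows = [
    {"row": "0", "sync_perf": "2"},
    {"row": "1", "sync_perf": "3"},
]
demo_table = columnar(demo_rows)
print(demo_table)
```

Note that this sketch assumes every row has the same keys as the first row; a row missing a key would raise a `KeyError`.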


### Using the `head` function
Because the survey data is long and hard to read in CSV format, the `head` function lets me take just the first few rows of each column for a quick scan. From there, I can start looking for patterns or information that stand out and could support my case.

In [22]:
from data_utils import head
from tabulate import tabulate

data_cols_head: dict[str, list[str]] = head(data_cols, 5)

if len(data_cols_head.keys()) != len(data_cols.keys()) or len(data_cols_head["row"]) != 5:
    print("Complete your implementation of head in data_utils.py")
    print("Be sure to follow the guidelines above and save your work before re-evaluating!")

tabulate(data_cols_head, data_cols_head.keys(), "html")

row,year,unc_status,comp_major,primary_major,data_science,prereqs,prior_exp,ap_principles,ap_a,other_comp,prior_time,languages,hours_online_social,hours_online_work,lesson_time,sync_perf,all_sync,flipped_class,no_hybrid,own_notes,own_examples,oh_visits,ls_effective,lsqs_effective,programming_effective,qz_effective,oh_effective,tutoring_effective,pace,difficulty,understanding,interesting,valuable,would_recommend
0,22,Returning UNC Student,No,Mathematics,No,"MATH 233, MATH 347, MATH 381",7-12 months,No,No,UNC,1 month or so,"Python, R / Matlab / SAS",3 to 5 hours,0 to 2 hours,6,2,2,1,2,4,4,0,7,3,7,5,,,1,1,7,5,6,5
1,25,Returning UNC Student,No,Mathematics,Yes,"MATH 130, MATH 231, STOR 155",None to less than one month!,,,,,,0 to 2 hours,5 to 10 hours,4,3,3,1,2,6,4,5,5,5,5,5,7.0,6.0,6,6,3,4,6,4
2,25,Incoming First-year Student,Yes - BA,Computer Science,No,"MATH 130, MATH 152, MATH 210",None to less than one month!,,,,,,3 to 5 hours,5 to 10 hours,3,3,4,2,1,7,7,2,5,6,7,7,4.0,,6,4,6,7,7,7
3,24,Returning UNC Student,Yes - BS,Computer Science,Maybe,"MATH 231, MATH 232, STOR 155",2-6 months,No,No,High school course (IB or other),None to less than one month!,Python,3 to 5 hours,3 to 5 hours,5,5,4,3,3,6,5,1,6,3,5,5,5.0,4.0,4,4,5,6,6,6
4,25,Incoming First-year Student,Yes - BA,Computer Science,No,MATH 130,None to less than one month!,,,,,,0 to 2 hours,3 to 5 hours,7,3,3,3,2,6,3,5,6,6,6,6,7.0,3.0,6,5,5,6,6,7
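
A minimal sketch of how `head` could work on a column-oriented table, assuming it keeps the first `n` values of every column. The implementation in `data_utils.py` is not shown here, so treat this as an illustration; the demo table is made up.

```python
def head(table: dict[str, list[str]], n: int) -> dict[str, list[str]]:
    """Keep only the first n values of each column (assumed behavior)."""
    # Slicing never raises when n exceeds the column length;
    # it simply returns the whole column.
    return {column: values[:n] for column, values in table.items()}


# Made-up table standing in for data_cols.
demo_table = {
    "row": ["0", "1", "2"],
    "sync_perf": ["2", "3", "3"],
}
demo_head = head(demo_table, 2)
print(demo_head)
```

Because the result is a new dictionary with new lists, the original table is left unchanged.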


### Using the `select` function
After scanning through the sample, I picked out three questions that I thought could help support my case. To isolate them, I use the `select` function to pull just those columns into a smaller table. Because I now have fewer columns, I can display more rows.

In [30]:
from data_utils import select

selected_data: dict[str, list[str]] = select(data_cols, ["sync_perf", "all_sync", "ls_effective"])

tabulate(head(selected_data, 20), selected_data.keys(), "html")

sync_perf,all_sync,ls_effective
2,2,7
3,3,5
3,4,5
5,4,6
3,3,6
2,2,7
3,3,7
2,2,7
5,4,7
2,2,7
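
Column selection can be sketched in a few lines. As with the other helpers, this is an assumed implementation rather than the code in `data_utils.py`, and the demo table is made up.

```python
def select(table: dict[str, list[str]], names: list[str]) -> dict[str, list[str]]:
    """Build a new table containing only the named columns (assumed behavior)."""
    # Columns appear in the order given by names, not the order in the source table.
    return {name: table[name] for name in names}


# Made-up table standing in for data_cols.
demo_table = {
    "row": ["0", "1"],
    "sync_perf": ["2", "3"],
    "ls_effective": ["7", "5"],
}
demo_selected = select(demo_table, ["ls_effective", "sync_perf"])
print(demo_selected)
```

Asking for a column that does not exist raises a `KeyError` in this sketch, which is a quick way to catch a misspelled column name.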


### Using the `common_value` function
I wrote `common_value` to show the most frequent response to each question, giving a quick glimpse of the most popular choice.

In [28]:
from data_utils import common_value

# common_value returns the single most frequent response, so the result is a str.
sync_perf: str = common_value(selected_data["sync_perf"])
print(f"sync_perf: {sync_perf}")

all_sync: str = common_value(selected_data["all_sync"])
print(f"all_sync: {all_sync}")

ls_effective: str = common_value(selected_data["ls_effective"])
print(f"ls_effective: {ls_effective}")


sync_perf: 2
all_sync: 1
ls_effective: 7
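
One way a most-frequent-value helper could be written is sketched below. The body of `common_value` in `data_utils.py` is not shown in this notebook, so the tally-then-`max` approach and the tie-breaking rule are assumptions; the demo list is made up.

```python
def common_value(values: list[str]) -> str:
    """Return the value that appears most often in the list (assumed behavior)."""
    counts: dict[str, int] = {}
    for value in values:
        counts[value] = counts.get(value, 0) + 1
    # max() returns the first key with the highest count, so a tie goes to
    # whichever value appeared first. An empty list raises ValueError.
    return max(counts, key=lambda value: counts[value])


# Made-up responses on the survey's 1-7 scale.
demo_responses = ["2", "1", "2", "3", "1", "2"]
demo_most_common = common_value(demo_responses)
print(demo_most_common)
```

A single most common value hides how close the runner-up is, which is why a full tally is a useful next step.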


### Using the `count` function
Although the `select` and `common_value` results were insightful, they still leave out most of the survey responses. To make sure the pattern holds, I can now use the `count` function to go through every response in the data. The `count` function shows how many respondents chose each answer.

In [38]:
from data_utils import count

sync_perf: dict[str, int] = count(selected_data["sync_perf"])
print(f"sync_perf: {sync_perf}")

all_sync: dict[str, int] = count(selected_data["all_sync"])
print(f"all_sync: {all_sync}")

ls_effective: dict[str, int] = count(selected_data["ls_effective"])
print(f"ls_effective: {ls_effective}")


sync_perf: {'2': 154, '3': 94, '5': 42, '4': 101, '6': 29, '1': 149, '7': 51}
all_sync: {'2': 155, '3': 71, '4': 85, '5': 44, '1': 201, '7': 39, '6': 25}
ls_effective: {'7': 257, '5': 120, '6': 154, '4': 46, '1': 8, '3': 28, '2': 7}
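
Raw counts are easier to compare as shares of all responses. The sketch below copies the three count dictionaries printed above and computes the fraction of respondents who chose the lowest two answers (1 or 2) on the attendance questions and the highest answer (7) on the lesson-video question. The helper `share` and the name `reported_counts` are introduced here for illustration only.

```python
# Counts copied from the output of the cell above.
reported_counts: dict[str, dict[str, int]] = {
    "sync_perf": {"2": 154, "3": 94, "5": 42, "4": 101, "6": 29, "1": 149, "7": 51},
    "all_sync": {"2": 155, "3": 71, "4": 85, "5": 44, "1": 201, "7": 39, "6": 25},
    "ls_effective": {"7": 257, "5": 120, "6": 154, "4": 46, "1": 8, "3": 28, "2": 7},
}


def share(counts: dict[str, int], answers: list[str]) -> float:
    """Fraction of all responses that fall in the given answers (hypothetical helper)."""
    total = sum(counts.values())
    chosen = sum(counts.get(answer, 0) for answer in answers)
    return chosen / total


questions = [
    ("sync_perf", ["1", "2"]),
    ("all_sync", ["1", "2"]),
    ("ls_effective", ["7"]),
]
for column, answers in questions:
    print(f"{column} {answers}: {share(reported_counts[column], answers):.1%}")
# sync_perf ['1', '2']: 48.9%
# all_sync ['1', '2']: 57.4%
# ls_effective ['7']: 41.5%
```

Each column sums to 620 responses, which matches the row count reported when the file was first read.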


## Conclusion

In the following markdown cell, write a reflective conclusion given the analysis you performed and identify recommendations.

If your analysis of the data supports your idea, state your recommendation for the change and summarize the data analysis results you found which support it. Additionally, describe any extensions or refinements to this idea which might be explored further. Finally, discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by this proposed change.

If your analysis of the data is inconclusive, summarize why your data analysis results were inconclusive in the support of your idea. Additionally, describe what experimental idea implementation or additional data collection might help build more confidence in assessing your idea. Finally, discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by experimenting with your idea.

Finally, if your analysis of the data does not support it, summarize your data analysis results and why it refutes your idea. Discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by this proposed change. If you disagree with the validity of the findings, describe why your idea still makes sense to implement and what alternative data would better support it. If you agree with the validity of the data analysis, describe what alternate ideas or extensions you would explore instead. 

### Part 5. Conclusion



My recommendation was, "The course should make the in-person lectures recorded and saved because the recorded videos are more engaging for the students." From my data analysis, I found a good deal of information that supports my case. When asked whether "Student's performance in this course would improve if every lecture were synchronous with required attendance during the regularly scheduled meeting time" and whether "Student would prefer this course to require every lecture be synchronous with required attendance during the regularly scheduled meeting time," students most often disagreed. The most common responses to both statements were a 2 or a 1 (the lowest possible, meaning strongly disagree): 303 of 620 respondents (about 49%) chose one of them for the first statement, and 356 of 620 (about 57%) did so for the second. Furthermore, when the students were asked to rate whether "Lesson videos are effective in helping the student learn the topics of the course," the most common response was 7 (the highest possible, meaning strongly agree), chosen by 257 of 620 respondents (about 41%).

To summarize, most students didn't want attendance to be required at set times, and they found the videos to be very valuable. Both findings call into question the efficiency of in-person lectures. I think the idea can be explored further by directly asking the students for their opinion on live lectures (for example, by adding a survey question that asks specifically about the effectiveness of live lectures and ways to improve them). The potential trade-off of this proposal is that it will affect the students who enjoy live teaching. There might be far fewer live lectures scheduled in general because there would be less incentive to hold them. Additionally, some students may take advantage of the change and use it as an opportunity to slack off, which in turn would diminish productivity.