# Analysis for Continuous Improvement

Author Name: Maia Vierengel

9-digit PID: 730225263

Continuous Improvement embraces a belief there is _always room to make things better_. It is a mindset and process we value and practice in this course. In this assignment, you are able to practice continuous improvement and contribute to the design ideas of the course.

## Brainstorming Ideas

Reflect on your personal experiences and observations in COMP110 and **brainstorm modifications to the course that _create value_ beyond its current design**. When brainstorming, try not to be critical of the ideas you come up with regarding scale, stakeholders impacted, or for any other reasons. In the markdown cell below, brainstorm 3 to 5 ideas you think would create value for you.

Each brainstormed idea should state a. the suggested change or addition, b. the expected value created, and c. which specific stakeholders would benefit. If helpful, expand on the following template: "The course should (state idea here) because it will (state value created here) for (insert stakeholders here)."

Example A: "The course should use only examples from psychology experiments because it will be more relevant for students who are psychology majors."

Example B: "The course should not have post-lesson questions because they are not useful for most students in the class."

### Part 1. Creative Ideation

1. The course should have an option for Zoom Office Hours because it will provide flexibility and convenience for COMP110 students and UTAs. Oftentimes, a trivial issue has kept me from progressing on my code and stalled my progress on an assignment for multiple days because I live off-campus.

2. The course should include more data analysis techniques (e.g. graph making and working with large datasets) because it will provide valuable experience for students in the sciences who want to utilize coding in research settings.

3. This course should have more worked-out problems (video solutions) like those that were available for Quiz 0, because they provide step-by-step solutions that help students fully comprehend topics and would reduce chaos during TA review sessions.

4. This course should change the reading assignments such that they are only free response answers, because this will help students to focus on the meaning of the passage rather than hunting for small details to fulfill points.

5. This course should have optional assignments for hopeful computer science majors, because they could have the opportunity to learn what the specific major will be like, and help them decide whether or not to pursue it. 

## Connecting with Available Data

The data you have available for this analysis is limited to the anonymized course survey you and your peers filled out a few weeks ago. The data is found in the `survey.csv` file in this exercise directory. Each row represents an individual survey response. Each column has a description which can be found on the project write-up here: <https://22s.comp110.com/exercises/ex08.html>

Review the list of available data and identify which one of your ideas _does not_, or is _least likely to_, have relevant data to support the analysis of your idea to create value. In the box below, identify which of your ideas lacks data and suggest how we might be able to collect this data in the future. One aspect of _continuous improvement_ is trying to avoid "tunnel vision" where possible improvements are not considered because there is no data available to analyze it. Identifying new data sources can unlock improvements!

### Part 2. Identifying Missing Data

1. Idea without sufficient data to analyze: 
Idea #3, about worked-out review videos for quiz preparation.

2. Suggestion for how to collect data to support this idea in the future: 
Ask a question about how effective the student believes those videos are for their quiz preparation and understanding of the course material. 

## Choosing an Idea to Analyze

Consider those of your ideas which _do_ seem likely to have relevant data to analyze. If none of your ideas do, spend a few minutes and brainstorm another idea or two with the added connection of data available on hand and add those ideas to your brainstormed ideas list.

Select the one idea which you believe is _most valuable_ to analyze relative to the others and has data to support the analysis of. In the markdown cell for Part 3 below, identify the idea you are exploring and articulate why you believe it is most valuable (e.g. widest impact, biggest opportunity for improvement, simplest change for significant improvement, and so on).

### Part 3. Choosing Your Analysis

1. Idea to analyze with available data: Idea #1, implementing a Zoom office hours option.

2. This idea is more valuable than the others brainstormed because: 
I believe it could be very simple to implement and would allow students (particularly those living off-campus) to get the help they need on assignments earlier. For example, if I begin my assignment at my house on a Friday afternoon and hit a snag with just setting up the assignment (not even the code itself), it will stall my progress for the next 3 days until Monday afternoon, when I will finally be on campus again. I could walk 30 minutes to campus for a 15-minute office hour session, but if it's a simple issue of either downloading something incorrectly or not refreshing my Jupyter notebook, it seems like a waste of time when it's not a conceptual issue. I think this would also decrease the chaos of office hours the day an assignment is due, as students will be able to make more progress earlier.


## Your Analysis

Before you begin analysis, a reminder that we do not expect the data to support everyone's ideas and you can complete this exercise for full credit even if the data does not clearly support your suggestion or even completely refutes it. What we are looking for is a logical attempt to explore the data using the techniques you have learned up until now in a way that _either_ supports, refutes, or does not have a clear result and then to reflect on your findings after the analysis.

Using the utility functions you created for the previous exercise, you will continue with your analysis in the following part. Before you begin, refer to the rubric on the technical expectations of this section in the exercise write-up.

In this section, you are expected to interleave code and markdown cells such that for each step of your analysis you are starting with an English description of what you are planning to do next in a markdown cell, followed by a Python cell that performs that step of the analysis.

### Part 4. Analysis

We begin by changing some settings in the notebook to automatically reload changes to imported files.

In [130]:
%reload_ext autoreload
%autoreload 2

We continue by importing the helper functions from `data_utils`.

In [131]:
from data_utils import *


Next, ... (you take it from here and add additional code and markdown cells to read in the CSV file and process it as needed)

## My Analysis

First, I'm going to read the rows of the survey data into a table with read_csv_rows. 

In [132]:
SURVEY_DATA_CSV_FILE_PATH: str = "../../data/survey.csv"

row_table: list[dict[str, str]] = read_csv_rows(SURVEY_DATA_CSV_FILE_PATH)
row_table


[{'row': '0',
  'year': '22',
  'unc_status': 'Returning UNC Student',
  'comp_major': 'No',
  'primary_major': 'Mathematics',
  'data_science': 'No',
  'prereqs': 'MATH 233, MATH 347, MATH 381',
  'prior_exp': '7-12 months',
  'ap_principles': 'No',
  'ap_a': 'No',
  'other_comp': 'UNC',
  'prior_time': '1 month or so',
  'languages': 'Python, R / Matlab / SAS',
  'hours_online_social': '3 to 5 hours',
  'hours_online_work': '0 to 2 hours',
  'lesson_time': '6',
  'sync_perf': '2',
  'all_sync': '2',
  'flipped_class': '1',
  'no_hybrid': '2',
  'own_notes': '4',
  'own_examples': '4',
  'oh_visits': '0',
  'ls_effective': '7',
  'lsqs_effective': '3',
  'programming_effective': '7',
  'qz_effective': '5',
  'oh_effective': '',
  'tutoring_effective': '',
  'pace': '1',
  'difficulty': '1',
  'understanding': '7',
  'interesting': '5',
  'valuable': '6',
  'would_recommend': '5'},
 {'row': '1',
  'year': '25',
  'unc_status': 'Returning UNC Student',
  'comp_major': 'No',
  'primary_m
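
The `data_utils` module is not shown in this notebook, so the following is only a minimal sketch of what a function like `read_csv_rows` might look like, assuming it wraps the standard library's `csv.DictReader`. The demo file and its contents are made up for illustration and are not the real `survey.csv`.

```python
import csv
import os
import tempfile


def read_csv_rows(filename: str) -> list[dict[str, str]]:
    """Read a CSV file into a list of rows, each a dict of column name -> value."""
    rows: list[dict[str, str]] = []
    # newline="" is what the csv module documentation recommends for file handles.
    with open(filename, "r", encoding="utf8", newline="") as file_handle:
        for row in csv.DictReader(file_handle):
            rows.append(dict(row))
    return rows


# Tiny made-up file (hypothetical data) to show the shape of the result.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.csv")
    with open(demo_path, "w", encoding="utf8", newline="") as demo_file:
        demo_file.write("row,oh_visits,oh_effective\n0,0,\n1,5,7\n")
    demo_rows = read_csv_rows(demo_path)

print(demo_rows)
```

Every value comes back as a string, including blank answers (`''`), which is why the later steps need a conversion before any numeric comparison.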

Then, I'm going to transform this row-oriented table into a column-oriented table using the columnar function. This gives me a dictionary where each key is a column name (e.g. unc_status, prereqs) and each value is the list of all responses in that column.

In [133]:
column_table: dict[str, list[str]] = columnar(row_table)
column_table

{'row': ['0',
  '1',
  '2',
  ...
  '120',
  '121',
  ... (output truncated)
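
As a rough illustration of the row-to-column transformation described above, here is a minimal sketch of a `columnar`-style function. It assumes every row has the same keys as the first row; the actual implementation in `data_utils` is not shown in this notebook and may differ.

```python
def columnar(row_table: list[dict[str, str]]) -> dict[str, list[str]]:
    """Turn a list of row dicts into a dict mapping column name -> list of values."""
    result: dict[str, list[str]] = {}
    if not row_table:
        return result
    # Assumption: the first row's keys are the column names for the whole table.
    for column in row_table[0]:
        result[column] = [row[column] for row in row_table]
    return result


# Hypothetical two-row table, for illustration only.
demo_rows = [
    {"row": "0", "oh_visits": "0"},
    {"row": "1", "oh_visits": "5"},
]
demo_columns = columnar(demo_rows)
print(demo_columns)
```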

## Choosing Analysis Parameters

For the idea of implementing Zoom office hours, I am particularly interested in the survey questions asking about whether synchronous classes are preferred (sync_perf, all_sync, no_hybrid) because they will give me an idea of whether students believe online learning is valuable and should continue to be a part of COMP 110. I'm also interested in oh_visits and oh_effective to understand which students use office hours and/or believe they are helpful.

First, I'm going to use the select function to create a table with only these five columns (sync_perf, all_sync, no_hybrid, oh_visits, oh_effective).

In [134]:
chosen: list[str] = ["sync_perf", "all_sync", "no_hybrid", "oh_visits", "oh_effective"]

zoom_oh_table: dict[str, list[str]] = select(column_table, chosen)
zoom_oh_table

{'sync_perf': ['2',
  '3',
  '3',
  '5',
  '3',
  '2',
  '3',
  '2',
  '5',
  '2',
  '4',
  '2',
  '5',
  '2',
  '6',
  '2',
  '4',
  '1',
  '5',
  '1',
  '1',
  '3',
  '1',
  '3',
  '7',
  '1',
  '4',
  '2',
  '1',
  '5',
  '2',
  '5',
  '3',
  '1',
  '4',
  '4',
  '7',
  '1',
  '7',
  '3',
  '1',
  '1',
  '2',
  '3',
  '3',
  '6',
  '1',
  '4',
  '2',
  '2',
  '4',
  '1',
  '4',
  '1',
  '4',
  '1',
  '1',
  '4',
  '4',
  '2',
  '3',
  '1',
  '1',
  '3',
  '1',
  '4',
  '5',
  '2',
  '2',
  '5',
  '1',
  '4',
  '1',
  '4',
  '3',
  '1',
  '1',
  '7',
  '3',
  '1',
  '2',
  '2',
  '2',
  '1',
  '1',
  '3',
  '3',
  '1',
  '1',
  '2',
  '7',
  '7',
  '2',
  '7',
  '7',
  '3',
  '4',
  '1',
  '3',
  '4',
  '6',
  '3',
  '7',
  '3',
  '3',
  '1',
  '1',
  '2',
  '2',
  '7',
  '2',
  '2',
  '1',
  '7',
  '2',
  '4',
  '2',
  '1',
  '1',
  '2',
  '3',
  '7',
  '1',
  '4',
  '1',
  '2',
  '5',
  '4',
  '2',
  '7',
  '2',
  '7',
  '2',
  '1',
  '4',
  '1',
  '1',
  '7',
  '2',
  '4',
  '3',
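
For reference, a `select`-style helper can be sketched in a few lines. This is an assumption about how the function in `data_utils` behaves (keep only the named columns, in the order requested), not a copy of it.

```python
def select(table: dict[str, list[str]], names: list[str]) -> dict[str, list[str]]:
    """Return a new column-oriented table containing only the named columns."""
    # A KeyError here would mean a requested column name is misspelled.
    return {name: table[name] for name in names}


# Hypothetical three-column table, for illustration only.
demo_table = {
    "row": ["0", "1"],
    "sync_perf": ["2", "3"],
    "oh_visits": ["0", "5"],
}
demo_selected = select(demo_table, ["sync_perf", "oh_visits"])
print(demo_selected)
```

Because the lookup is by exact column name, a typo such as `synch_perf` would raise a `KeyError` rather than silently returning an empty column.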


To verify that select has given me the correct table, I'm going to use the head function to show a smaller portion of the table, with only 5 rows. Additionally, I'm going to use tabulate to view my data as a more traditional-looking table. 

In [135]:
from tabulate import tabulate

small: dict[str, list[str]] = head(zoom_oh_table, 5)

tabulate(small, small.keys(), "html")

| sync_perf | all_sync | no_hybrid | oh_visits | oh_effective |
|---|---|---|---|---|
| 2 | 2 | 2 | 0 | |
| 3 | 3 | 2 | 5 | 7.0 |
| 3 | 4 | 1 | 2 | 4.0 |
| 5 | 4 | 3 | 1 | 5.0 |
| 3 | 3 | 2 | 5 | 7.0 |
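
A `head`-style helper like the one used above can be sketched with list slicing. This is a minimal version under the assumption that the function simply keeps the first `n` values of every column; the real one in `data_utils` is not shown.

```python
def head(table: dict[str, list[str]], n: int) -> dict[str, list[str]]:
    """Return a new column-oriented table with only the first n rows."""
    # Slicing never raises if n is larger than the column length.
    return {column: values[:n] for column, values in table.items()}


# Hypothetical small table, for illustration only.
demo_table = {
    "sync_perf": ["2", "3", "3", "5"],
    "oh_visits": ["0", "5", "2", "1"],
}
demo_head = head(demo_table, 2)
print(demo_head)
```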


### Utilizing Count

Then, I'm going to use the count function to give me an idea of how many students gave which answers for each parameter. 

In [136]:
sync_perf_count: dict[str, int] = count(zoom_oh_table["sync_perf"])
sync_perf_count

{'2': 154, '3': 94, '5': 42, '4': 101, '6': 29, '1': 149, '7': 51}

In [137]:
all_sync_count: dict[str, int] = count(zoom_oh_table["all_sync"])
all_sync_count

{'2': 155, '3': 71, '4': 85, '5': 44, '1': 201, '7': 39, '6': 25}

In [138]:
no_hybrid_count: dict[str, int] = count(zoom_oh_table["no_hybrid"])
no_hybrid_count

{'2': 120, '1': 317, '3': 69, '5': 27, '6': 12, '4': 63, '7': 12}

In [139]:
oh_visits_count: dict[str, int] = count(zoom_oh_table["oh_visits"])
oh_visits_count

{'0': 237, '5': 28, '2': 95, '1': 163, '4': 26, '3': 71}

In [140]:
oh_effective_count: dict[str, int] = count(zoom_oh_table["oh_effective"])
oh_effective_count

{'': 191, '7': 182, '4': 68, '5': 59, '6': 101, '3': 14, '1': 2, '2': 3}
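
To make the counting step concrete, here is a minimal sketch of a `count`-style function that tallies how often each answer appears. It is an assumed implementation, shown on the five `sync_perf` values from the small table earlier rather than on the full column.

```python
def count(values: list[str]) -> dict[str, int]:
    """Map each distinct value to the number of times it appears."""
    result: dict[str, int] = {}
    for value in values:
        result[value] = result.get(value, 0) + 1
    return result


# The first five sync_perf answers shown in the small table above.
first_five_sync_perf = ["2", "3", "3", "5", "3"]
demo_count = count(first_five_sync_perf)
print(demo_count)
```

Keys appear in the order each answer is first seen, which would explain why the count outputs above are not sorted from 1 to 7.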

## What percentage agree/disagree?

Now, I have an idea of how many students a) prefer in-person to online learning and b) make use of and value office hours. However, the count function portrays the data in a way that is not easily digestible; I cannot gauge how many students generally agree/disagree with each prompt.

Next, I can get a more general figure for each of these using the of_these_rows function I built, which gives the percentage of answers at or above a threshold. For the first agree/disagree survey question, I chose the threshold value to be 5, as anyone who answers 5 or higher agrees with the prompt to some extent.

In [141]:
of_these_rows(convert_str(zoom_oh_table["oh_effective"]), 5)

55.16129032258065

In [142]:
of_these_rows(convert_str(zoom_oh_table["oh_visits"]), 1)

61.7741935483871

From the above cells, we can see that 55.2% of all respondents (a total that includes the 191 who left this question blank) agree to some extent that office hours are effective at helping them learn topics in the course. Furthermore, 61.8% of students replied that they visit office hours at least once per week.
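
The helper functions `convert_str` and `of_these_rows` are not shown in this notebook, so the sketch below is only one plausible reading of them: it assumes blank answers are converted to 0 and that the threshold is inclusive. Under those assumptions, rebuilding the `oh_effective` column from the counts printed above reproduces the 55.2% figure.

```python
def convert_str(values: list[str]) -> list[float]:
    """Convert survey answers to numbers, treating blanks as 0 (an assumption)."""
    return [float(value) if value != "" else 0.0 for value in values]


def of_these_rows(values: list[float], threshold: float) -> float:
    """Percentage of values that are at or above the threshold."""
    if not values:
        return 0.0
    hits = sum(1 for value in values if value >= threshold)
    return hits / len(values) * 100


# Counts copied from the oh_effective output above; '' is a blank answer.
oh_effective_count = {"": 191, "7": 182, "4": 68, "5": 59, "6": 101, "3": 14, "1": 2, "2": 3}
oh_effective = [answer for answer, n in oh_effective_count.items() for _ in range(n)]

share_all = of_these_rows(convert_str(oh_effective), 5)

# Same calculation restricted to students who actually answered the question.
answered = [answer for answer in oh_effective if answer != ""]
share_answered = of_these_rows(convert_str(answered), 5)

print(round(share_all, 1))       # 55.2
print(round(share_answered, 1))  # 79.7
```

The second number shows how much the choice of denominator matters: among the 429 students who rated office hours at all, about 79.7% answered 5 or higher.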

In [143]:
of_these_rows(convert_str(zoom_oh_table["sync_perf"]), 4)

35.96774193548387

In [144]:
of_these_rows(convert_str(zoom_oh_table["all_sync"]), 4)

31.129032258064516

In [145]:
of_these_rows(convert_str(zoom_oh_table["no_hybrid"]), 4)

18.387096774193548

From the above three cells related to students' preferences between online learning and in-person learning, I set the threshold to 4 to have the percentages *include* those who neither agree nor disagree. Thus, I can find the percentage of students who disagree to some extent (answered 1-3) by subtracting this percentage from 100%. 

    sync_perf:  100% - 36.0% = 64.0%
    all_sync:  100% - 31.1% = 68.9% 
    no_hybrid:  100% - 18.4% = 81.6%


- 64.0% of students disagree their performance in this course would improve if every lecture were synchronous with required attendance
- 68.9% of students would not prefer this course to require every lecture be synchronous with required attendance
- 81.6% of students disagree that in-person lectures should not be live streamed so that everyone is required to attend in-person
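
The subtraction above can be cross-checked directly from the count outputs. The following is a small sketch (the helper name `percent_in_range` is my own, not part of `data_utils`) that adds up the students who answered 1 through 3 for each prompt.

```python
# Counts copied from the count outputs earlier in this analysis.
sync_perf_count = {"2": 154, "3": 94, "5": 42, "4": 101, "6": 29, "1": 149, "7": 51}
all_sync_count = {"2": 155, "3": 71, "4": 85, "5": 44, "1": 201, "7": 39, "6": 25}
no_hybrid_count = {"2": 120, "1": 317, "3": 69, "5": 27, "6": 12, "4": 63, "7": 12}


def percent_in_range(counts: dict[str, int], low: int, high: int) -> float:
    """Percentage of all responses whose answer falls between low and high, inclusive."""
    total = sum(counts.values())
    in_range = sum(
        n for answer, n in counts.items()
        if answer != "" and low <= int(answer) <= high
    )
    return in_range / total * 100


disagree = {
    "sync_perf": round(percent_in_range(sync_perf_count, 1, 3), 1),
    "all_sync": round(percent_in_range(all_sync_count, 1, 3), 1),
    "no_hybrid": round(percent_in_range(no_hybrid_count, 1, 3), 1),
}
print(disagree)
```

Counting the 1 to 3 answers directly gives the same 64.0%, 68.9%, and 81.6% as subtracting the "4 or higher" percentages from 100%.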

Based on these last three figures, it seems as though a majority of students believe that there should be some degree of flexibility with regard to in-person and online (asynchronous/livestream) options when it comes to the COMP 110 course.

## Conclusion

In the following markdown cell, write a reflective conclusion given the analysis you performed and identify recommendations.

If your analysis of the data supports your idea, state your recommendation for the change and summarize the data analysis results you found which support it. Additionally, describe any extensions or refinements to this idea which might be explored further. Finally, discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by this proposed change.

If your analysis of the data is inconclusive, summarize why your data analysis results were inconclusive in the support of your idea. Additionally, describe what experimental idea implementation or additional data collection might help build more confidence in assessing your idea. Finally, discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by experimenting with your idea.

Finally, if your analysis of the data does not support it, summarize your data analysis results and why it refutes your idea. Discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by this proposed change. If you disagree with the validity of the findings, describe why your idea still makes sense to implement and what alternative data would better support it. If you agree with the validity of the data analysis, describe what alternate ideas or extensions you would explore instead. 

### Part 5. Conclusion



Based on the analysis carried out, a majority of students utilize office hours and believe them to be effective at helping them with the course material. Additionally, a majority of students believe that COMP110 should not be strictly synchronous and in-person, which implies that students may appreciate more flexible options for office hours as well.

Although these data points help to begin an analysis on whether COMP 110 students would prefer or benefit from the option of Zoom office hours, more data is needed to make such a conclusion. However, I do believe this analysis should bring attention to this area for improvement. A question regarding the preference for an online Office Hours option should be present on the next course survey to address this. 

Although an online option could be more accessible, this proposal could have downsides. Implementing Zoom OH as an option may lower the stakes of using office hours and lead students to take them less seriously; for example, going to office hours each time they have a small problem rather than trying to figure it out themselves. However, if we continue to use the hour-long cool-down time and limit office hour visits per day, I believe this problem can be avoided. Furthermore, there are benefits to doing office hours in-person, as students can avoid the technical issues of Zoom and get straight to the problem at hand, saving time for both students and UTAs.

In conclusion, I believe we've all learned through the last two years that although in-person learning is extremely valuable, it's important to have alternative, flexible options so that when life happens, we are able to be resilient in our learning.