# Analysis for Continuous Improvement

Author Name: Ryan Dee

9-digit PID: 730464883

Continuous Improvement embraces a belief there is _always room to make things better_. It is a mindset and process we value and practice in this course. In this assignment, you are able to practice continuous improvement and contribute to the design ideas of the course.

## Brainstorming Ideas

Reflect on your personal experiences and observations in COMP110 and **brainstorm modifications to the course that _create value_ beyond its current design**. When brainstorming, try not to be critical of the ideas you come up with regarding scale, stakeholders impacted, or for any other reasons. In the markdown cell below, brainstorm 3 to 5 ideas you think would create value for you.

Each brainstormed idea should state a. the suggested change or addition, b. the expected value created, and c. which specific stakeholders would benefit. If helpful, expand on the following template: "The course should (state idea here) because it will (state value created here) for (insert stakeholders here)."

Example A: "The course should use only examples from psychology experiments because it will be more relevant for students who are psychology majors."

Example B: "The course should not have post-lesson questions because they are not useful for most students in the class."

### Part 1. Creative Ideation

1. Office hours should be expanded because they are the best way for students to interact with the course.
2. Only computer science majors should be able to take this class because most students are already computer science majors.
3. This class should have fewer quizzes because students find them less effective than lessons.
4. Lessons should be held only in class because students would be more likely to attend and retain information.
5. There should be an alternative class to COMP110 that moves faster, since most students have prior experience.

## Connecting with Available Data

The data you have available for this analysis is limited to the anonymized course survey you and your peers filled out a few weeks ago. The data is found in the `survey.csv` file in this exercise directory. Each row represents an individual survey response. Each column has a description which can be found on the project write-up here: <https://22s.comp110.com/exercises/ex08.html>

Review the list of available data and identify which one of your ideas _does not_, or is _least likely to_, have relevant data to support the analysis of your idea to create value. In the box below, identify which of your ideas lacks data and suggest how we might be able to collect this data in the future. One aspect of _continuous improvement_ is trying to avoid "tunnel vision" where possible improvements are not considered because there is no data available to analyze it. Identifying new data sources can unlock improvements!

### Part 2. Identifying Missing Data

1. Idea without sufficient data to analyze: Office hours should be expanded because they are the best way for students to interact with the course

2. Suggestion for how to collect data to support this idea in the future: Add a survey question asking students how they interact with instructors and TAs the most, so the information comes from those who know best.

## Choosing an Idea to Analyze

Consider those of your ideas which _do_ seem likely to have relevant data to analyze. If none of your ideas do, spend a few minutes and brainstorm another idea or two with the added connection of data available on hand and add those ideas to your brainstormed ideas list.

Select the one idea which you believe is _most valuable_ to analyze relative to the others and has data to support the analysis of. In the markdown cell for Part 3 below, identify the idea you are exploring and articulate why you believe it is most valuable (e.g. widest impact, biggest opportunity for improvement, simplest change for significant improvement, and so on).

### Part 3. Choosing Your Analysis

1. Idea to analyze with available data: Lessons should be only in class because students would be more likely to attend and retain information

2. This idea is more valuable than the others brainstormed because: While it is a seriously difficult change, it would bring the most significant improvement over the length of the course by ensuring students are able to ask questions in person, which videos do not allow.

## Your Analysis

Before you begin analysis, a reminder that we do not expect the data to support everyone's ideas and you can complete this exercise for full credit even if the data does not clearly support your suggestion or even completely refutes it. What we are looking for is a logical attempt to explore the data using the techniques you have learned up until now in a way that _either_ supports, refutes, or does not have a clear result and then to reflect on your findings after the analysis.

Using the utility functions you created for the previous exercise, you will continue with your analysis in the following part. Before you begin, refer to the rubric on the technical expectations of this section in the exercise write-up.

In this section, you are expected to interleave code and markdown cells such that for each step of your analysis you are starting with an English description of what you are planning to do next in a markdown cell, followed by a Python cell that performs that step of the analysis.

### Part 4. Analysis

We begin by changing some settings in the notebook to automatically reload changes to imported files.

In [101]:
%reload_ext autoreload
%autoreload 2

We continue by importing the helper functions from `data_utils`.

In [102]:
# Helper functions written for the previous exercise.
from data_utils import read_csv_rows, column_values, columnar, head, select, concat



Next, ... (you take it from here and add additional code and markdown cells to read in the CSV file and process it as needed)

In [103]:
SURVEY_DATA_CSV_FILE_PATH: str = "../../data/survey.csv"

To see whether students are learning from the online lesson videos, we will look at the responses in the `ls_effective` column alongside those in `no_hybrid`. Selecting both columns gives us a baseline for comparing the two.

In [104]:
# Read the survey rows and transform them into a column-oriented table.
data_rows: list[dict[str, str]] = read_csv_rows(SURVEY_DATA_CSV_FILE_PATH)
data_cols: dict[str, list[str]] = columnar(data_rows)

# Keep only the two columns this analysis compares.
selected_data: dict[str, list[str]] = select(data_cols, ['ls_effective'])
selected_data_no_hybrid: dict[str, list[str]] = select(data_cols, ['no_hybrid'])

print(selected_data)
print(selected_data_no_hybrid)

{'ls_effective': ['7', '5', '5', '6', '6', '7', '7', '7', '7', '7', '7', '6', '4', '7', '6', '6', '7', '4', '6', '7', '5', '6', '7', '7', '1', '7', '7', '6', '7', '5', '6', '5', '6', '7', '7', '6', '6', '6', '7', '5', '7', '7', '7', '6', '5', '6', '6', '5', '3', '6', '7', '7', '6', '4', '7', '7', '6', '6', '6', '7', '7', '6', '7', '7', '7', '4', '6', '7', '7', '6', '7', '5', '5', '2', '7', '6', '6', '7', '7', '6', '7', '6', '5', '6', '7', '7', '6', '7', '6', '6', '2', '5', '4', '2', '5', '7', '7', '5', '7', '5', '7', '5', '3', '5', '7', '7', '7', '6', '6', '6', '7', '5', '6', '4', '7', '7', '7', '5', '7', '5', '7', '5', '6', '3', '7', '7', '3', '7', '6', '5', '7', '1', '6', '7', '7', '7', '7', '6', '7', '6', '3', '5', '6', '6', '7', '7', '4', '7', '5', '4', '5', '4', '7', '7', '5', '7', '5', '7', '6', '6', '6', '4', '5', '7', '6', '7', '6', '5', '7', '1', '3', '7', '5', '6', '5', '7', '7', '5', '3', '7', '7', '5', '7', '5', '7', '5', '7', '7', '7', '5', '2', '5', '6', '5', '7', '4', '6
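The row-to-column transformation and column selection used in this step can be sketched on a tiny table. This is only an illustration: `columnar_sketch` and `select_sketch` are hypothetical stand-ins for the `data_utils` helpers (whose code is not shown here), and the sample rows are made up rather than taken from `survey.csv`.

```python
# Hypothetical stand-ins for the data_utils helpers; the real ones are not shown here.
def columnar_sketch(rows: list[dict[str, str]]) -> dict[str, list[str]]:
    """Turn a list of row dicts into a dict mapping column name -> list of values."""
    table: dict[str, list[str]] = {}
    for row in rows:
        for name, value in row.items():
            table.setdefault(name, []).append(value)
    return table


def select_sketch(table: dict[str, list[str]], names: list[str]) -> dict[str, list[str]]:
    """Build a new table containing only the named columns."""
    return {name: table[name] for name in names}


# Made-up sample rows, not real survey responses.
sample_rows: list[dict[str, str]] = [
    {"ls_effective": "7", "no_hybrid": "2", "other": "x"},
    {"ls_effective": "5", "no_hybrid": "1", "other": "y"},
]
sample_table = columnar_sketch(sample_rows)
print(select_sketch(sample_table, ["ls_effective"]))  # {'ls_effective': ['7', '5']}
```

Note that the values are still strings at this point, which is why a conversion step is needed before any arithmetic.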

Next we will convert each list of strings to a list of integers.

In [105]:
def convert(given: list[str]) -> list[int]:
    i: int = 0
    ret_list: list[int] = []
    while i < len(given):
        ret_list.append(int(given[i]))
        i = i + 1
    return ret_list

# Each selected table holds a single column, so convert that column directly.
ls_effective_list: list[int] = convert(selected_data['ls_effective'])
no_hybrid_list: list[int] = convert(selected_data_no_hybrid['no_hybrid'])

print('ls_effective')
print(ls_effective_list)
print('no_hybrid')
print(no_hybrid_list)


ls_effective
[7, 5, 5, 6, 6, 7, 7, 7, 7, 7, 7, 6, 4, 7, 6, 6, 7, 4, 6, 7, 5, 6, 7, 7, 1, 7, 7, 6, 7, 5, 6, 5, 6, 7, 7, 6, 6, 6, 7, 5, 7, 7, 7, 6, 5, 6, 6, 5, 3, 6, 7, 7, 6, 4, 7, 7, 6, 6, 6, 7, 7, 6, 7, 7, 7, 4, 6, 7, 7, 6, 7, 5, 5, 2, 7, 6, 6, 7, 7, 6, 7, 6, 5, 6, 7, 7, 6, 7, 6, 6, 2, 5, 4, 2, 5, 7, 7, 5, 7, 5, 7, 5, 3, 5, 7, 7, 7, 6, 6, 6, 7, 5, 6, 4, 7, 7, 7, 5, 7, 5, 7, 5, 6, 3, 7, 7, 3, 7, 6, 5, 7, 1, 6, 7, 7, 7, 7, 6, 7, 6, 3, 5, 6, 6, 7, 7, 4, 7, 5, 4, 5, 4, 7, 7, 5, 7, 5, 7, 6, 6, 6, 4, 5, 7, 6, 7, 6, 5, 7, 1, 3, 7, 5, 6, 5, 7, 7, 5, 3, 7, 7, 5, 7, 5, 7, 5, 7, 7, 7, 5, 2, 5, 6, 5, 7, 4, 6, 7, 7, 4, 5, 5, 5, 7, 5, 7, 3, 6, 7, 7, 5, 7, 7, 6, 7, 5, 7, 7, 5, 7, 7, 6, 1, 6, 7, 6, 7, 5, 7, 7, 7, 6, 7, 7, 7, 5, 7, 7, 7, 6, 4, 6, 6, 6, 6, 6, 5, 7, 5, 4, 7, 5, 7, 7, 2, 5, 7, 5, 7, 5, 7, 3, 7, 4, 7, 6, 5, 7, 7, 5, 6, 5, 7, 7, 5, 6, 6, 5, 6, 4, 7, 7, 6, 6, 6, 7, 7, 7, 4, 6, 6, 6, 7, 7, 7, 6, 7, 7, 6, 4, 7, 6, 7, 5, 6, 7, 6, 5, 5, 7, 7, 5, 7, 6, 4, 7, 7, 6, 4, 3, 3, 5, 7, 5, 6, 5, 6, 4, 7,

Now that both lists are converted to numbers, we can take the average of each. Comparing the two averages shows which option students rate more highly.

In [106]:
def average(numbers: list[int]) -> float:
    return sum(numbers) / len(numbers)

print(average(ls_effective_list))
print(average(no_hybrid_list))


5.827419354838709
2.108064516129032
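The convert-then-average steps can also be written more compactly. The sketch below is one possible variant, not the notebook's own code: `to_ints` and `mean` are hypothetical names, the sample values are made up, and skipping blank entries is an assumption about how missing answers might look in a CSV.

```python
def to_ints(values: list[str]) -> list[int]:
    """Convert numeric strings to ints, skipping blank entries (an assumption)."""
    return [int(v) for v in values if v.strip() != ""]


def mean(numbers: list[int]) -> float:
    """Arithmetic mean; raises ValueError on an empty list instead of dividing by zero."""
    if len(numbers) == 0:
        raise ValueError("cannot average an empty list")
    return sum(numbers) / len(numbers)


# Made-up sample responses, including one blank answer.
sample: list[str] = ["7", "5", "", "6"]
print(mean(to_ints(sample)))  # 6.0
```

Guarding against an empty list makes the failure explicit if a column name is misspelled or a filter removes every row.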


These averages summarize how effective students find the lesson videos (`ls_effective`) and whether they would rather attend in person (`no_hybrid`).  
To get a result closer to my own section, the first one, we can repeat the calculation on only that section's responses.
I don't know exactly how many people are in my section, so as an approximation we will look at the data from the first 310 people.

In [107]:

# Take the first 310 responses of each column as a rough stand-in for the first section.
first_section_ls: dict[str, list[str]] = head(selected_data, 310)
first_section_hybrid: dict[str, list[str]] = head(selected_data_no_hybrid, 310)

# Convert the subset columns to integers, reusing convert from the earlier cell.
ls_effective_list_selected: list[int] = convert(first_section_ls['ls_effective'])
no_hybrid_list_selected: list[int] = convert(first_section_hybrid['no_hybrid'])

# average is already defined in the previous cell.
print(average(ls_effective_list_selected))
print(average(no_hybrid_list_selected))


5.827419354838709
2.108064516129032


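As seen, the averages for the first 310 responses are the same as the averages for the whole class, so looking at this subset does not change the result. Because the printed values match the whole-class averages to every digit, it is also worth checking that `head` actually limited the data to 310 rows. Either way, with `ls_effective` averaging about 5.83 and `no_hybrid` about 2.11, it would be best to stick with the current system.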
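When a subset average comes out identical to the full average, a quick sanity check is to confirm the subset's row count before trusting the comparison. A minimal sketch, assuming a hypothetical `first_n` helper standing in for `head` and made-up sample values:

```python
def first_n(table: dict[str, list[str]], n: int) -> dict[str, list[str]]:
    """Hypothetical stand-in for head: keep only the first n values of each column."""
    return {name: values[:n] for name, values in table.items()}


def mean_of_strings(values: list[str]) -> float:
    """Average a list of numeric strings."""
    return sum(int(v) for v in values) / len(values)


# Made-up sample column, not real survey data.
sample_table: dict[str, list[str]] = {"ls_effective": ["7", "5", "6", "2"]}
subset = first_n(sample_table, 2)

# Confirm the subset really is smaller before comparing averages.
print(len(sample_table["ls_effective"]), len(subset["ls_effective"]))  # 4 2
print(mean_of_strings(sample_table["ls_effective"]))  # 5.0
print(mean_of_strings(subset["ls_effective"]))  # 6.0
```

If the two lengths come out the same, the subset step did not restrict anything and the matching averages are expected rather than informative.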

## Conclusion

In the following markdown cell, write a reflective conclusion given the analysis you performed and identify recommendations.

If your analysis of the data supports your idea, state your recommendation for the change and summarize the data analysis results you found which support it. Additionally, describe any extensions or refinements to this idea which might be explored further. Finally, discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by this proposed change.

If your analysis of the data is inconclusive, summarize why your data analysis results were inconclusive in the support of your idea. Additionally, describe what experimental idea implementation or additional data collection might help build more confidence in assessing your idea. Finally, discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by experimenting with your idea.

Finally, if your analysis of the data does not support it, summarize your data analysis results and why it refutes your idea. Discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by this proposed change. If you disagree with the validity of the findings, describe why your idea still makes sense to implement and what alternative data would better support it. If you agree with the validity of the data analysis, describe what alternate ideas or extensions you would explore instead. 

### Part 5. Conclusion 

Overall, my analysis showed that people prefer the online lessons to a forced in-person option.  

This refutes my idea that holding lessons only in class would lead students to attend and retain more information.
We know this because the average for `ls_effective` (about 5.83) was higher than the average for `no_hybrid`, the option to stop streaming (about 2.11). This means that, on average across the whole class, people feel they retain information better from videos than from lectures.
In this instance the stakeholders are mostly students, who would be excelling or not doing as well depending on the mode of instruction. I agree with the validity of the analysis; however, I wish a survey question had directly compared lessons with in-person class, which would make the data easier to understand. One could ask on a scale of 1 to 7, with 1 being in person and 7 being online lessons. This would make the analysis easier because you could average one value instead of comparing two.
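The single-question idea could be analyzed with one average compared against the midpoint of the scale. The sketch below is purely illustrative: no such question exists in the current survey, the responses are made up, and `lean` is a hypothetical helper name.

```python
SCALE_MIDPOINT: float = 4.0  # midpoint of a 1-to-7 scale


def lean(responses: list[int]) -> str:
    """Report which mode the average response leans toward (1 = in person, 7 = online)."""
    avg: float = sum(responses) / len(responses)
    if avg > SCALE_MIDPOINT:
        return "online lessons"
    if avg < SCALE_MIDPOINT:
        return "in person"
    return "no preference"


# Made-up responses to the hypothetical question.
made_up: list[int] = [6, 7, 3, 5, 2, 6]
print(lean(made_up))  # online lessons
```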