# Analysis for Continuous Improvement

Author Name: Leia Reilly

9-digit PID: 730308277

Continuous Improvement embraces a belief there is _always room to make things better_. It is a mindset and process we value and practice in this course. In this assignment, you are able to practice continuous improvement and contribute to the design ideas of the course.

## Brainstorming Ideas

Reflect on your personal experiences and observations in COMP110 and **brainstorm modifications to the course that _create value_ beyond its current design**. When brainstorming, try not to be critical of the ideas you come up with regarding scale, stakeholders impacted, or for any other reasons. In the markdown cell below, brainstorm 3 to 5 ideas you think would create value for you.

Each brainstormed idea should state a. the suggested change or addition, b. the expected value created, and c. which specific stakeholders would benefit. If helpful, expand on the following template: "The course should (state idea here) because it will (state value created here) for (insert stakeholders here)."

Example A: "The course should use only examples from psychology experiments because it will be more relevant for students who are psychology majors."

Example B: "The course should not have post-lesson questions because they are not useful for most students in the class."

### Part 1. Creative Ideation

1. "The course should offer more practice working with real data because major/minor students will be using data in real life situations"
2. "The course should offer in-person class on Tuesdays because many students are better able to focus while working in person."
3. "The course should continue to offer an online option because students would feel less stress about COVID and would be able to access education even if impacted by other life events."
4. "The course should offer more connections to outside projects so students interested in learning more can involve themselves in more of the course."
5. "This course should offer platforms for students to discuss theoretical questions and ideas, since discussion and working with other people have been shown to help information retention."
- it's a good course, I like it

## Connecting with Available Data

The data you have available for this analysis is limited to the anonymized course survey you and your peers filled out a few weeks ago. The data is found in the `survey.csv` file in this exercise directory. Each row represents an individual survey response. Each column has a description which can be found on the project write-up here: <https://22s.comp110.com/exercises/ex08.html>

Review the list of available data and identify which one of your ideas _does not_, or is _least likely to_, have relevant data to support the analysis of your idea to create value. In the box below, identify which of your ideas lacks data and suggest how we might be able to collect this data in the future. One aspect of _continuous improvement_ is trying to avoid "tunnel vision" where possible improvements are not considered because there is no data available to analyze it. Identifying new data sources can unlock improvements!

### Part 2. Identifying Missing Data

1. Idea without sufficient data to analyze:
     5

2. Suggestion for how to collect data to support this idea in the future:
After the first test, offer the platform to students before the second test and check whether scores improve.
* I'm aware we have a platform already, but it's not widely used or known, so having one with just the TAs as moderators might be nice. Having the opportunity to discuss theory and ideas with others would be helpful.

## Choosing an Idea to Analyze

Consider those of your ideas which _do_ seem likely to have relevant data to analyze. If none of your ideas do, spend a few minutes and brainstorm another idea or two with the added connection of data available on hand and add those ideas to your brainstormed ideas list.

Select the one idea which you believe is _most valuable_ to analyze relative to the others and has data to support the analysis of. In the markdown cell for Part 3 below, identify the idea you are exploring and articulate why you believe it is most valuable (e.g. widest impact, biggest opportunity for improvement, simplest change for significant improvement, and so on).

### Part 3. Choosing Your Analysis

1. Idea to analyze with available data:
"The course should offer more practice working with real data because major/minor students will be using data in real life situations"

2. This idea is more valuable than the others brainstormed because: 
    It will give students insight into what a career in data or computer science would look like, prepare them to work with data in the future, and help them write useful programs for processing data. Non-major students would be able to see how data can be used and learn whether it's something they wish to do.
    *I realize this exercise is the kind of work I'm asking for more of in the course; I'm enjoying it and finding it useful. My main point is that we should do more like this.


## Your Analysis

Before you begin analysis, a reminder that we do not expect the data to support everyone's ideas and you can complete this exercise for full credit even if the data does not clearly support your suggestion or even completely refutes it. What we are looking for is a logical attempt to explore the data using the techniques you have learned up until now in a way that _either_ supports, refutes, or does not have a clear result and then to reflect on your findings after the analysis.

Using the utility functions you created for the previous exercise, you will continue with your analysis in the following part. Before you begin, refer to the rubric on the technical expectations of this section in the exercise write-up.

In this section, you are expected to interleave code and markdown cells such that for each step of your analysis you are starting with an English description of what you are planning to do next in a markdown cell, followed by a Python cell that performs that step of the analysis.

### Part 4. Analysis

We begin by changing some settings in the notebook to automatically reload changes to imported files.

In [37]:
%reload_ext autoreload
%autoreload 2

We continue by importing the helper functions from `data_utils`.

In [38]:
from data_utils import read_csv_rows, column_values, columnar, head
from data_utils import select, concat, count, total
from tabulate import tabulate
from matplotlib import pyplot as plt


Next, we define the path to the data we need: the survey from the beginning of the year.

In [39]:
SURVEY_DATA_CSV_FILE_PATH: str = "../../data/survey.csv"
DATA_DIRECTORY: str = "../../data"
DATA_FILE_PATH: str = f"{DATA_DIRECTORY}/survey.csv"

Next, we read the entire CSV into a list of rows, each row represented as a `dict[str, str]`. We will be using the data from the major/minor column (`comp_major`).

In [40]:

data_rows: list[dict[str, str]] = read_csv_rows(DATA_FILE_PATH) 

# print(f"Data File Read: {DATA_FILE_PATH}")
# print(f"{len(data_rows)} rows")
# print(f"{len(data_rows[0].keys())} columns")
# print(f"Columns names: {data_rows[0].keys()}")
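The `data_utils` module itself is not shown in this notebook, so the following is only a minimal sketch of what a function like `read_csv_rows` might do, assuming it relies on the standard library's `csv.DictReader`. The name `read_csv_rows_sketch` and the two-row sample file are hypothetical stand-ins, not the real helper or the real `survey.csv`.

```python
import csv
import os
import tempfile


def read_csv_rows_sketch(filename: str) -> list[dict[str, str]]:
    """Read a CSV file into a list of rows, one dict per row keyed by column name."""
    rows: list[dict[str, str]] = []
    with open(filename, "r", encoding="utf8", newline="") as file_handle:
        # DictReader uses the first line of the file as the column names.
        for row in csv.DictReader(file_handle):
            rows.append(dict(row))
    return rows


# Hypothetical two-row file standing in for survey.csv (an assumption).
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "sample.csv")
    with open(sample_path, "w", encoding="utf8", newline="") as sample_file:
        sample_file.write("row,comp_major,data_science\n0,No,No\n1,Yes - BA,Yes\n")
    sample_rows = read_csv_rows_sketch(sample_path)

print(len(sample_rows))
print(sample_rows[0])
```

Every value comes back as a `str`, even numeric-looking ones such as the `row` index, which is why the rows are typed `dict[str, str]`.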

Produce a `list[str]` of all values in a single column whose name is the second parameter.
We will be doing this with the `comp_major` column.

In [41]:
comp_major: list[str] = column_values(data_rows, "comp_major")


print(f"Column 'comp_major' has {len(comp_major)} values.")
print("The first five values are:")
for i in range(5):
    print(comp_major[i])

Column 'comp_major' has 620 values.
The first five values are:
No
No
Yes - BA
Yes - BS
Yes - BA
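For reference, pulling one column out of row-oriented data can be sketched as below. This is an assumption about how a helper like `column_values` could work, not the actual `data_utils` code; `column_values_sketch` and the three sample rows are hypothetical.

```python
def column_values_sketch(rows: list[dict[str, str]], column: str) -> list[str]:
    """Collect the value stored under `column` from every row, in row order."""
    values: list[str] = []
    for row in rows:
        values.append(row[column])
    return values


# Hypothetical sample rows shaped like the survey data.
sample_rows: list[dict[str, str]] = [
    {"comp_major": "No", "data_science": "No"},
    {"comp_major": "No", "data_science": "Yes"},
    {"comp_major": "Yes - BA", "data_science": "No"},
]

sample_majors = column_values_sketch(sample_rows, "comp_major")
print(sample_majors)
```

The result has one entry per row, so its length always matches the number of rows.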


Now we will transform the row-oriented data into a column-oriented `dict[str, list[str]]` using the `columnar` function. This will let me see what the data columns are so I can select the ones I want.

In [42]:
data_cols: dict[str, list[str]] = columnar(data_rows)
print(f"{len(data_cols.keys())} columns")
print(f"{len(data_cols['comp_major'])} rows")
print(f"Columns names: {data_cols.keys()}")

35 columns
620 rows
Columns names: dict_keys(['row', 'year', 'unc_status', 'comp_major', 'primary_major', 'data_science', 'prereqs', 'prior_exp', 'ap_principles', 'ap_a', 'other_comp', 'prior_time', 'languages', 'hours_online_social', 'hours_online_work', 'lesson_time', 'sync_perf', 'all_sync', 'flipped_class', 'no_hybrid', 'own_notes', 'own_examples', 'oh_visits', 'ls_effective', 'lsqs_effective', 'programming_effective', 'qz_effective', 'oh_effective', 'tutoring_effective', 'pace', 'difficulty', 'understanding', 'interesting', 'valuable', 'would_recommend'])
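The row-to-column transformation can be illustrated with a short sketch. It assumes every row has the same keys as the first row; `columnar_sketch` and the sample rows are hypothetical and are not taken from `data_utils`.

```python
def columnar_sketch(rows: list[dict[str, str]]) -> dict[str, list[str]]:
    """Turn a list of row dicts into a dict mapping each column name to its values."""
    table: dict[str, list[str]] = {}
    if len(rows) == 0:
        return table
    # Assumption: all rows share the keys of the first row.
    for column in rows[0]:
        table[column] = [row[column] for row in rows]
    return table


sample_rows: list[dict[str, str]] = [
    {"row": "0", "comp_major": "No"},
    {"row": "1", "comp_major": "Yes - BS"},
]

sample_table = columnar_sketch(sample_rows)
print(list(sample_table.keys()))
print(sample_table["comp_major"])
```

In this layout, the number of keys is the number of columns and the length of any one list is the number of rows, which is what the two `len` calls in the cell above report.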


Next, I'm going to use the `head` function and `tabulate` to put the first five rows of my data into a more readable form.

In [43]:
data_cols_head: dict[str, list[str]] = head(data_cols, 5)
tabulate(data_cols_head, data_cols_head.keys(), "html")

row,year,unc_status,comp_major,primary_major,data_science,prereqs,prior_exp,ap_principles,ap_a,other_comp,prior_time,languages,hours_online_social,hours_online_work,lesson_time,sync_perf,all_sync,flipped_class,no_hybrid,own_notes,own_examples,oh_visits,ls_effective,lsqs_effective,programming_effective,qz_effective,oh_effective,tutoring_effective,pace,difficulty,understanding,interesting,valuable,would_recommend
0,22,Returning UNC Student,No,Mathematics,No,"MATH 233, MATH 347, MATH 381",7-12 months,No,No,UNC,1 month or so,"Python, R / Matlab / SAS",3 to 5 hours,0 to 2 hours,6,2,2,1,2,4,4,0,7,3,7,5,,,1,1,7,5,6,5
1,25,Returning UNC Student,No,Mathematics,Yes,"MATH 130, MATH 231, STOR 155",None to less than one month!,,,,,,0 to 2 hours,5 to 10 hours,4,3,3,1,2,6,4,5,5,5,5,5,7.0,6.0,6,6,3,4,6,4
2,25,Incoming First-year Student,Yes - BA,Computer Science,No,"MATH 130, MATH 152, MATH 210",None to less than one month!,,,,,,3 to 5 hours,5 to 10 hours,3,3,4,2,1,7,7,2,5,6,7,7,4.0,,6,4,6,7,7,7
3,24,Returning UNC Student,Yes - BS,Computer Science,Maybe,"MATH 231, MATH 232, STOR 155",2-6 months,No,No,High school course (IB or other),None to less than one month!,Python,3 to 5 hours,3 to 5 hours,5,5,4,3,3,6,5,1,6,3,5,5,5.0,4.0,4,4,5,6,6,6
4,25,Incoming First-year Student,Yes - BA,Computer Science,No,MATH 130,None to less than one month!,,,,,,0 to 2 hours,3 to 5 hours,7,3,3,3,2,6,3,5,6,6,6,6,7.0,3.0,6,5,5,6,6,7
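Taking the first few rows of a column-oriented table amounts to slicing every column. The sketch below assumes that behavior; `head_sketch` is a hypothetical name and the sample table is made up for illustration.

```python
def head_sketch(table: dict[str, list[str]], n: int) -> dict[str, list[str]]:
    """Return a new table holding only the first `n` values of every column."""
    result: dict[str, list[str]] = {}
    for column, values in table.items():
        # Slicing copies, so the original table is left unchanged.
        result[column] = values[:n]
    return result


sample_table: dict[str, list[str]] = {
    "row": ["0", "1", "2", "3"],
    "comp_major": ["No", "No", "Yes - BA", "Yes - BS"],
}

sample_head = head_sketch(sample_table, 2)
print(sample_head)
```

Slicing also handles the case where `n` is larger than the number of rows: the slice simply returns every value that exists.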


Next, I'm going to use the select function to gather the data that will help me analyze interest in computer science.

In [44]:
selected_data: dict[str, list[str]] = select(data_cols, ["comp_major", "data_science", "prior_time", "languages"])
tabulate(head(selected_data, 10), selected_data.keys(), "html")

| comp_major | data_science | prior_time | languages |
|---|---|---|---|
| No | No | 1 month or so | Python, R / Matlab / SAS |
| No | Yes | | |
| Yes - BA | No | | |
| Yes - BS | Maybe | None to less than one month! | Python |
| Yes - BA | No | | |
| Yes - BS | Maybe | 1 month or so | Python, Java / C#, JavaScript / TypeScript, HTML / CSS |
| Yes - BA | Yes | 7-12 months | Python, Java / C#, JavaScript / TypeScript, HTML / CSS, Bash |
| Yes - BA | No | | |
| Yes - BS | Yes | | |
| No | No | | |
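Selecting a subset of columns can be sketched as building a new dictionary from the requested names. This is an assumption about how a helper like `select` might behave; `select_sketch` and the sample table are hypothetical.

```python
def select_sketch(table: dict[str, list[str]], names: list[str]) -> dict[str, list[str]]:
    """Return a new table containing only the columns listed in `names`."""
    result: dict[str, list[str]] = {}
    for name in names:
        # Copy each list so later edits to the result do not touch the source table.
        result[name] = list(table[name])
    return result


sample_table: dict[str, list[str]] = {
    "row": ["0", "1"],
    "comp_major": ["No", "Yes - BA"],
    "data_science": ["No", "Yes"],
}

sample_selected = select_sketch(sample_table, ["comp_major", "data_science"])
print(list(sample_selected.keys()))
```

The columns appear in the order requested, which is why the tabulated output follows the order of the list passed in.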


Now we'll read a second CSV file into a `dict[str, list[str]]` with `columnar` and use `concat` to combine it with the first five survey rows.

In [45]:
additional_table: dict[str, list[str]] = columnar(read_csv_rows(f"{DATA_DIRECTORY}/nc_durham_2015_march_27.csv"))

combined = concat(data_cols_head, additional_table)

tabulate(head(combined, 10), combined.keys(), "html")

row,year,unc_status,comp_major,primary_major,data_science,prereqs,prior_exp,ap_principles,ap_a,other_comp,prior_time,languages,hours_online_social,hours_online_work,lesson_time,sync_perf,all_sync,flipped_class,no_hybrid,own_notes,own_examples,oh_visits,ls_effective,lsqs_effective,programming_effective,qz_effective,oh_effective,tutoring_effective,pace,difficulty,understanding,interesting,valuable,would_recommend,raw_row_number,date,time,location,county_name,subject_age,subject_race,subject_sex,officer_id_hash,department_name,type,arrest_made,citation_issued,warning_issued,outcome,contraband_found,contraband_drugs,contraband_weapons,frisk_performed,search_conducted,search_person,search_vehicle,search_basis,reason_for_frisk,reason_for_search,reason_for_stop,raw_Ethnicity,raw_Race,raw_action_description
0,22,Returning UNC Student,No,Mathematics,No,"MATH 233, MATH 347, MATH 381",7-12 months,No,No,UNC,1 month or so,"Python, R / Matlab / SAS",3 to 5 hours,0 to 2 hours,6,2,2,1,2,4,4,0,7,3,7,5,,,1,1,7,5,6,5,19179512,2015-03-27,01:02:00,"nan, Durham County",Durham County,25,black,male,a4d178e9f0,Durham Police Department,vehicular,False,False,True,warning,,,,False,False,False,False,,,,Vehicle Equipment Violation,N,B,Verbal Warning
1,25,Returning UNC Student,No,Mathematics,Yes,"MATH 130, MATH 231, STOR 155",None to less than one month!,,,,,,0 to 2 hours,5 to 10 hours,4,3,3,1,2,6,4,5,5,5,5,5,7.0,6.0,6,6,3,4,6,4,19179517,2015-03-27,03:06:00,"nan, Durham County",Durham County,46,white,male,0e55c98bd1,Durham Police Department,vehicular,False,False,True,warning,,,,False,False,False,False,,,,Speed Limit Violation,N,W,Verbal Warning
2,25,Incoming First-year Student,Yes - BA,Computer Science,No,"MATH 130, MATH 152, MATH 210",None to less than one month!,,,,,,3 to 5 hours,5 to 10 hours,3,3,4,2,1,7,7,2,5,6,7,7,4.0,,6,4,6,7,7,7,19179520,2015-03-27,03:42:00,"nan, Durham County",Durham County,46,hispanic,male,c0b31bf1de,Durham Police Department,vehicular,False,True,False,citation,,,,False,False,False,False,,,,Speed Limit Violation,H,W,Citation Issued
3,24,Returning UNC Student,Yes - BS,Computer Science,Maybe,"MATH 231, MATH 232, STOR 155",2-6 months,No,No,High school course (IB or other),None to less than one month!,Python,3 to 5 hours,3 to 5 hours,5,5,4,3,3,6,5,1,6,3,5,5,5.0,4.0,4,4,5,6,6,6,19179521,2015-03-27,06:55:00,"nan, Durham County",Durham County,25,white,male,8fbd51c440,Durham Police Department,vehicular,False,True,False,citation,,,,False,False,False,False,,,,Speed Limit Violation,N,W,Citation Issued
4,25,Incoming First-year Student,Yes - BA,Computer Science,No,MATH 130,None to less than one month!,,,,,,0 to 2 hours,3 to 5 hours,7,3,3,3,2,6,3,5,6,6,6,6,7.0,3.0,6,5,5,6,6,7,19179522,2015-03-27,07:30:00,"nan, Durham County",Durham County,38,white,female,dbdd0133c4,Durham Police Department,vehicular,False,False,True,warning,,,,False,False,False,False,,,,Speed Limit Violation,N,W,Verbal Warning
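Why the combined table shows the two sets of columns side by side can be seen from a sketch of concatenation. It assumes the helper extends columns that both tables share and adds any column that only one table has; `concat_sketch` is a hypothetical name, and the tiny tables below reuse a few values from the outputs above.

```python
def concat_sketch(
    left: dict[str, list[str]], right: dict[str, list[str]]
) -> dict[str, list[str]]:
    """Combine two column-oriented tables into a new one without mutating either."""
    result: dict[str, list[str]] = {}
    for column, values in left.items():
        result[column] = list(values)
    for column, values in right.items():
        if column in result:
            # Shared column: the right table's values are appended as extra rows.
            result[column].extend(values)
        else:
            # Column only in the right table: it is added as a new column.
            result[column] = list(values)
    return result


survey_sample = {"row": ["0", "1"], "comp_major": ["No", "No"]}
stops_sample = {"raw_row_number": ["19179512", "19179517"], "outcome": ["warning", "warning"]}

side_by_side = concat_sketch(survey_sample, stops_sample)
stacked = concat_sketch(survey_sample, {"row": ["2"], "comp_major": ["Yes - BA"]})
print(list(side_by_side.keys()))
print(stacked["comp_major"])
```

Under this assumption, tables with no column names in common end up placed next to each other, while tables with matching column names are stacked row on row.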


Now we'll count how many responses fall into each category of the key columns, allowing us to analyze what they mean.

In [46]:
major_counts: dict[str, int] = count(selected_data["comp_major"])
print(f"major_counts: {major_counts}")

data_sci_counts: dict[str, int] = count(selected_data["data_science"])
print(f"data_sci_counts: {data_sci_counts}")

prior_time_counts: dict[str, int] = count(selected_data["prior_time"])
print(f"prior_time_counts: {prior_time_counts}")

# languages_counts: dict[str, int] = count(selected_data["languages"])
# print(f"languages_counts: {languages_counts}")


major_counts: {'No': 335, 'Yes - BA': 78, 'Yes - BS': 172, 'Yes - Minor': 35}
data_sci_counts: {'No': 358, 'Yes': 93, 'Maybe': 169}
prior_time_counts: {'1 month or so': 69, '': 369, 'None to less than one month!': 102, '7-12 months': 14, '2-6 months': 49, '1-2 years': 10, '> 2 years': 7}
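Tallying the answers in a column can be sketched with a plain dictionary. This is an assumption about how a helper like `count` might work, not the `data_utils` source; `count_sketch` and the sample answers are hypothetical.

```python
def count_sketch(values: list[str]) -> dict[str, int]:
    """Map each distinct value to the number of times it appears in `values`."""
    counts: dict[str, int] = {}
    for value in values:
        if value in counts:
            counts[value] += 1
        else:
            counts[value] = 1
    return counts


# Hypothetical answers; the empty string stands for a skipped question.
sample_answers: list[str] = ["No", "Yes", "No", "", "Maybe", "No", ""]

sample_counts = count_sketch(sample_answers)
print(sample_counts)
```

Blank answers are counted under the empty-string key `''`, which is how the `prior_time` counts above end up with an entry for responses that left the question empty.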


## Conclusion

In the following markdown cell, write a reflective conclusion given the analysis you performed and identify recommendations.

If your analysis of the data supports your idea, state your recommendation for the change and summarize the data analysis results you found which support it. Additionally, describe any extensions or refinements to this idea which might be explored further. Finally, discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by this proposed change.

If your analysis of the data is inconclusive, summarize why your data analysis results were inconclusive in the support of your idea. Additionally, describe what experimental idea implementation or additional data collection might help build more confidence in assessing your idea. Finally, discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by experimenting with your idea.

Finally, if your analysis of the data does not support it, summarize your data analysis results and why it refutes your idea. Discuss the potential costs, trade-offs, or stakeholders who may be negatively impacted by this proposed change. If you disagree with the validity of the findings, describe why your idea still makes sense to implement and what alternative data would better support it. If you agree with the validity of the data analysis, describe what alternate ideas or extensions you would explore instead. 

### Part 5. Conclusion



In the previous exercise I discovered that, of the 620 survey respondents taking this computer science course, 335 are not computer science majors or minors, while the remaining 46% are. The computer science majors will probably face practical applications of their skills later, and the non-majors should have the chance to see whether they enjoy the kinds of projects computer science could bring.
Additionally, 27.26% of the class (169 of 620 respondents) said they may decide to add a data science major. Data analytics projects will help them narrow down their decision and see whether they enjoy working with real-life data.
An extra survey question asking what types of projects would interest the class would also be worthwhile. Breaking it into categories such as creating a game, analyzing data, or predicting trends would show what the class might be interested in.
This change will impact the entire class. While those who are undecided about data science might be swayed one way or another, there's a chance of less interest from the computer science majors, the prospective computer science majors, and others who are not interested in data analytics. However, real-world analysis has the potential to take the class from the theoretical to the practical, serving as a glimpse into future opportunities.
I recommend including more data analysis in this course. If we cannot fit everything into the course, we could offer links to real-world analysis projects for students to do independently, or offer a small analytics project as extra credit.

The computer science majors will enjoy the practical application, and there's a chance that the non-majors will too.
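The shares discussed in this conclusion can be recomputed from the counts printed in Part 4. The sketch below copies those two count dictionaries by hand; `percent_shares` is a hypothetical helper written for this check and is not part of `data_utils`.

```python
def percent_shares(counts: dict[str, int]) -> dict[str, float]:
    """Convert raw counts into percentages of the total, rounded to two decimals."""
    total_responses: int = sum(counts.values())
    shares: dict[str, float] = {}
    for answer, n in counts.items():
        shares[answer] = round(100 * n / total_responses, 2)
    return shares


# Counts copied from the Part 4 output.
major_counts: dict[str, int] = {"No": 335, "Yes - BA": 78, "Yes - BS": 172, "Yes - Minor": 35}
data_sci_counts: dict[str, int] = {"No": 358, "Yes": 93, "Maybe": 169}

major_shares = percent_shares(major_counts)
data_sci_shares = percent_shares(data_sci_counts)

# Everyone who did not answer "No" is a major or minor.
majors_or_minors = round(100 - major_shares["No"], 2)

print(f"Not a comp major or minor: {major_shares['No']}%")
print(f"Comp major or minor: {majors_or_minors}%")
print(f"Maybe on data science: {data_sci_shares['Maybe']}%")
```

Recomputing percentages directly from the count dictionaries, rather than by hand, makes it harder to pick up a number from the wrong column.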