# Scoring Proposal
My plan for a new scoring system, intended to improve the observability of trends and comparability over time.

Previous strategy: 
Organizations were each classified as 'high capacity', 'moderate capacity', or 'low capacity', based on their responses. 

It was established that 'agree' and 'strongly agree' indicated strong capacity, while 'neutral', 'disagree', and 'strongly disagree' indicated a lack of capacity. For each section of the survey, organizations were grouped based on the proportion of responses that were 'agree' or 'strongly agree'. An agency was defined as 'strong' in a given section if it had enough 'agree'/'strongly agree' responses in that section (threshold info below, if interested). Finally, organizations were grouped based on the number of sections identified as 'strong'.

I wanted to change two aspects of this process: update the scoring of sections to reflect the new survey structure, and create a different overall scoring system that allows for comparison over time.

## Survey Format
The survey format changed between 2023 and 2024. The 2023 survey was almost entirely multiple choice questions using the 'strongly disagree - strongly agree' scale. In 2024, each section had a combination of multiple choice questions using the 'strongly disagree - strongly agree' scale, multiple choice questions with different choices, and questions allowing multiple selections ('Select all that apply'). Since the proportion of each section made up of agreement-scale questions had decreased, it no longer made sense to classify sections as 'strong' or not based solely on those responses.

## Overall Scoring System
The 2023 scoring system simplified the responses by grouping agencies discretely, rather than recording responses continuously. As a result, the capacities of agencies were represented in an oversimplified way.
The 2023 survey grouped agencies into 3 categories (high, moderate, and low capacity) based on the number of sections labelled 'strong', and this label was applied to each section based on the proportion of 'agree'/'strongly agree' responses. This approach was not conducive to observing trends over time. The repeated discretization simplified the results too much, making it more difficult to measure an agency's progress. My goal was not to remove these discrete categories, but to add a continuous measurement system, which the categories could be based on.


First, section scoring. Instead of classifying each section as 'strong' or not, each section would have a numerical score. 

- Strongly disagree: 1
- Disagree: 2
- Neutral: 3
- Agree: 4
- Strongly agree: 5

This rubric could be rescaled later (e.g., to [-1, 1] or [0, 100]), but the use of integers allows for easier scoring initially.
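As a minimal sketch of how the rubric and a later rescaling might be applied with pandas: the names `LIKERT_SCORES`, `score_responses`, and `rescale` are hypothetical, and the linear rescaling is just one reasonable choice, not a decided part of the proposal.

```python
import pandas as pd

# Rubric from this proposal: integer scores on the five-point agreement scale.
LIKERT_SCORES = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly agree": 5,
}


def score_responses(responses: pd.Series) -> pd.Series:
    """Map text responses to 1-5 scores; unrecognized text becomes NaN."""
    # Normalize whitespace and capitalization so 'agree' and 'Agree' match.
    cleaned = responses.str.strip().str.capitalize()
    return cleaned.map(LIKERT_SCORES)


def rescale(scores, new_min, new_max, old_min=1, old_max=5):
    """Linearly rescale scores from [old_min, old_max] to [new_min, new_max]."""
    fraction = (scores - old_min) / (old_max - old_min)
    return fraction * (new_max - new_min) + new_min


# Made-up responses, for illustration only.
responses = pd.Series(["Strongly disagree", "neutral", "Agree", "Strongly agree"])

raw = score_responses(responses)   # 1, 3, 4, 5
centered = rescale(raw, -1, 1)     # -1.0, 0.0, 0.5, 1.0
percent = rescale(raw, 0, 100)     # 0.0, 50.0, 75.0, 100.0
```

Because the rescaling is linear, it changes only how scores are presented; the ordering of agencies and the relative size of changes stay the same.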
Note that this approach keeps 'neutral' as a truly neutral response, instead of classifying it negatively. The following table uses example data to illustrate the importance of a continuous scoring system.

### EXAMPLE Agency Survey Responses by Year

In [18]:
import numpy as np
import pandas as pd

In [28]:
df_ex = pd.DataFrame({
    'Year 1': {
        "Our space is physically accessible for all clients, including disabled clients and elderly clients.": "Strongly disagree",
        "We can efficiently move food from delivery zone to the storage area.": "Neutral",
        "We can easily move food from the storage area to the clients.": "Disagree",
        "All of our equipment is in working order.": "Strongly disagree",
        "Our storage area is organized.": "Neutral",
        "We are able to store all our food according to food safety standards.": "Agree",
    },
    'Year 2': {
        "Our space is physically accessible for all clients, including disabled clients and elderly clients.": "Strongly agree",
        "We can efficiently move food from delivery zone to the storage area.": "Agree",
        "We can easily move food from the storage area to the clients.": "Neutral",
        "All of our equipment is in working order.": "Disagree",
        "Our storage area is organized.": "Neutral",
        "We are able to store all our food according to food safety standards.": "Strongly agree",
    }
})

df_ex

| Question | Year 1 | Year 2 |
| --- | --- | --- |
| Our space is physically accessible for all clients, including disabled clients and elderly clients. | Strongly disagree | Strongly agree |
| We can efficiently move food from delivery zone to the storage area. | Neutral | Agree |
| We can easily move food from the storage area to the clients. | Disagree | Neutral |
| All of our equipment is in working order. | Strongly disagree | Disagree |
| Our storage area is organized. | Neutral | Neutral |
| We are able to store all our food according to food safety standards. | Agree | Strongly agree |


In this example, the agency has clearly improved its operational capacity: five of the six questions received a better response in Year 2. However, this improvement would be lost in analysis, because the agency's operational capacity classification would remain the same. The problem is not the discrete category of 'strong', but the lack of a continuous scale to track changes over time. For example, notice the jump from 'strongly disagree' to 'strongly agree' for the first question; this is a huge change, perhaps representing a big purchase like an elevator or a move to a new space. Under the previous approach, however, this increase would affect the agency's operational capacity score by the same amount as a shift from 'neutral' to 'agree', which would likely represent a much smaller change in true operational capacity.
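To make the point concrete, here is a rough sketch of the previous binary view applied to the example responses. The short question labels and the value of `ASSUMED_THRESHOLD` are assumptions for illustration only; the real section thresholds are not reproduced here.

```python
import pandas as pd

# Example responses from above, with shortened (hypothetical) question labels.
df = pd.DataFrame(
    {
        "Year 1": ["Strongly disagree", "Neutral", "Disagree",
                   "Strongly disagree", "Neutral", "Agree"],
        "Year 2": ["Strongly agree", "Agree", "Neutral",
                   "Disagree", "Neutral", "Strongly agree"],
    },
    index=["accessible", "delivery_to_storage", "storage_to_clients",
           "equipment", "organized", "food_safety"],
)

# Previous approach: only 'Agree' / 'Strongly agree' count toward a 'strong' section.
positive = df.isin(["Agree", "Strongly agree"])
share_positive = positive.mean()  # Year 1: 1 of 6, Year 2: 3 of 6

# Assumed threshold, NOT the real one; any value above one half gives the same labels.
ASSUMED_THRESHOLD = 0.75
is_strong = share_positive >= ASSUMED_THRESHOLD  # False in both years

# Per-question change as seen by the binary view: 1 = newly counted as positive.
binary_change = positive["Year 2"].astype(int) - positive["Year 1"].astype(int)
print(binary_change)
```

In this sketch the first question ('strongly disagree' to 'strongly agree') and the second ('neutral' to 'agree') both register as a single step, and the improvements on the third, fourth, and sixth questions do not register at all.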

Using the numeric scale, the agency would have received a section score of 14 in Year 1 and 22 in Year 2, out of a possible 30.
It is worth capturing the changes in scores, even if the overall designation ('strong' vs not) does not change. In the future, other departments at City Harvest might want to measure the effect of certain programs or initiatives on agency capacity, and the current thresholds are not specific enough to capture potential effects.
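One way to capture those changes, sketched under the assumption that responses are scored with the 1 to 5 rubric from the section scoring above. The short question labels and variable names are hypothetical.

```python
import pandas as pd

# Rubric from this proposal.
LIKERT_SCORES = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly agree": 5,
}

# Example responses from above, with shortened (hypothetical) question labels.
df = pd.DataFrame(
    {
        "Year 1": ["Strongly disagree", "Neutral", "Disagree",
                   "Strongly disagree", "Neutral", "Agree"],
        "Year 2": ["Strongly agree", "Agree", "Neutral",
                   "Disagree", "Neutral", "Strongly agree"],
    },
    index=["accessible", "delivery_to_storage", "storage_to_clients",
           "equipment", "organized", "food_safety"],
)

# Convert every response to its numeric score, column by column.
scores = df.apply(lambda col: col.map(LIKERT_SCORES))

# Year-over-year change per question, and the total change for the section.
change = scores["Year 2"] - scores["Year 1"]
total_change = change.sum()

print(change)
print("Total change:", total_change)
```

Here the accessibility question accounts for 4 of the 8 points gained, which matches the intuition that it reflects the largest real-world change.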


# Summary
The proposed changes are as follows: 
- Multiple choice questions on the 'strongly disagree' to 'strongly agree' scale will be numerically scored as follows: strongly disagree = 1, disagree = 2, neutral = 3, agree = 4, strongly agree = 5. This scale can be rescaled later, in order to present more intuitive final scores.
- Each section will receive a section score, and each agency will receive an overall score, which is a weighted average of the section scores. Based on the distribution of overall scores, agencies can be grouped into capacity levels.
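The second point could be sketched as follows. Everything specific here is an assumption for illustration: the agencies, section names, section scores, and weights are made up, and splitting agencies into three equal-sized groups with `pd.qcut` is only one possible way to base capacity levels on the score distribution.

```python
import pandas as pd

# Made-up section scores (mean rubric score per section) for five made-up agencies.
section_scores = pd.DataFrame(
    {
        "operations": [2.3, 3.7, 4.5, 3.0, 1.8],
        "staffing": [3.0, 4.0, 4.8, 2.5, 2.2],
        "funding": [2.0, 3.5, 4.0, 3.5, 1.5],
    },
    index=["Agency A", "Agency B", "Agency C", "Agency D", "Agency E"],
)

# Hypothetical section weights; the actual weights are not decided in this proposal.
weights = pd.Series({"operations": 0.5, "staffing": 0.3, "funding": 0.2})

# Weighted average of section scores; dividing by the weight total keeps the
# result on the 1-5 scale even if the weights do not sum to 1.
overall = (section_scores * weights).sum(axis=1) / weights.sum()

# Group agencies into three levels based on where they fall in the distribution.
capacity_level = pd.qcut(overall, q=3, labels=["low", "moderate", "high"])

print(pd.DataFrame({"overall": overall.round(2), "level": capacity_level}))
```

Because the levels are derived from the continuous overall score, the cut points can be revisited later without rescoring any responses.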

Benefits of recording numerical scores instead of exclusively binary classifications:
- Avoids unnecessary simplification of the data
- Allows for closer observation of changes over time
- Supports comparison of survey results over time, especially given changes in survey structure / questions