# Problem decomposition and computational thinking
#### Problem decomposition:
To create the program, we first split the problem into smaller parts:
1. Read and store data from records.csv
2. Sort each tutorial group into 10 teams
3. Write the data into a new csv
#### Abstraction:
We used functions to make the process flow easier to follow and reduce complexity.
#### Pattern recognition:
We recognised that every tutorial group has the same number of students, so we could write an algorithm that allocates teams for one tutorial group and repeat it for every tutorial group.
#### Algorithm design:
Considerations for each team:
1. School Affiliation - maximum 2 students from the same school
2. Gender - maximum 3 students of the same gender
3. Current CGPA - try to pair the highest CGPA with the lowest CGPA, the 2nd highest with the 2nd lowest, and so on
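
The CGPA consideration can be illustrated on its own with a small sketch. The helper name `pair_extremes` and the sample values are hypothetical and are not part of the program; the sketch only shows the idea of matching the highest value with the lowest, ignoring the school and gender limits.

```python
def pair_extremes(cgpas):
    """Pair the highest value with the lowest, the 2nd highest with the 2nd lowest, and so on."""
    ordered = sorted(cgpas)
    pairs = []
    low, high = 0, len(ordered) - 1
    # Walk inwards from both ends of the sorted list.
    while low < high:
        pairs.append((ordered[high], ordered[low]))
        low += 1
        high -= 1
    return pairs


# Hypothetical CGPAs, for illustration only.
example_pairs = pair_extremes([3.2, 4.8, 4.1, 3.9, 4.5, 3.5])
print(example_pairs)
```

With an odd number of values, the middle value is left unpaired in this sketch.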

# Pseudocode for allocate_teams


# Step 1: Define the functions we need.

#### allocate_teams

This is the main function. It allocates the students in each tutorial group into 10 lists, trying to meet the requirements as far as possible.

In [1]:
# 1. Initial setup
# all_students: a copy of student_list
# remaining_students: a copy of student_list. To track students that haven't been assigned to a group.
# group_list: initialize a nested list with 10 empty groups inside for future group assignment.


def allocate_teams(student_list):
    all_students = student_list[:]
    remaining_students = student_list[:]
    group_list = [[] for i in range(10)]


# 2. Assign the first 4 students of each group.
# the 1st and 3rd student: pick students with the highest cgpa, set index to -1, pick from the end of the cgpa list.
# the 2nd and 4th student: pick students with lowest cgpa, set index to 0, pick from the start of the cgpa list.

    #assign 4 students to each group
    for i in range(4):
        #1st and 3rd pick take highest cgpa
        if i % 2 == 0:
            index = -1
        #2nd and 4th pick take lowest cgpa
        else:
            index = 0

# 3. Add student to each group
# Loop through the lists for every group in `group_list`, add current student to each group list.
# 4. Check school and gender of current list
# If current list does not exceed gender and school threshold, add next student to the group.
# Otherwise, try to swap student with another group. However, if swapper cannot find any valid target to swap with, then proceed to assign the next student.

        for group_num in range(len(group_list)):
            next_student = remaining_students[index]
            group = group_list[group_num]
            if not exceed_gender_or_school(group, next_student):
                assign_group(next_student, group_num, group, remaining_students)
            else:
                if swapper(next_student, all_students, remaining_students, group_list, group_num, group):
                    pass
                else:
                    #if swapper cant find any valid target, just assign
                    assign_group(next_student, group_num, group, remaining_students)


# 5. Calculate the total cgpa of each group so far and store it with its group_num in group_cgpa.
    group_cgpa = []
    for group_num in range(len(group_list)):
        total_cgpa = 0
        for student in group_list[group_num]:
            total_cgpa += student[1]
        #store the running total (the original appended the undefined name `cgpa`)
        group_cgpa.append([group_num, total_cgpa])

# 6. Sort the groups by their total cgpa in ascending order and store them in the list sorted_output.
    sorted_output = []
    for group in group_cgpa:
        sort_by_ascending_cgpa(group, group[1], sorted_output)
    #sorted_output = [[0, 16.4], [1, 16.5], [7, 16.7], ...]

# 7. Each group has 1 slot left. Assign the remaining students based on gpa, e.g. the group with the lowest total gpa picks the highest gpa student.
# 8. Check if the student meets the gender and school requirements. If so, assign the student to the group whose index is group_num. If not, try to swap using `swapper`. If `swapper` cannot find any valid target, just assign the student. Finally, return `group_list`.
    for i in range(len(sorted_output)):
        group_num = sorted_output[i][0]
        group = group_list[group_num]
        student = remaining_students[-1]
        if not exceed_gender_or_school(group, student):
            assign_group(student, group_num, group, remaining_students)
        else:
            if swapper(student, all_students, remaining_students, group_list, group_num, group):
                pass
            else:
                #if swapper cant find any valid target, just assign
                assign_group(student, group_num, group, remaining_students)
    return group_list

# Now we get a list `group_list` containing 10 lists, each with 5 students.


#### assign_group

This function assigns a student to a group and removes the student from `remaining_students`.

In [2]:
def assign_group(student, group_num, group, remaining_students):
    # 1. Record the group number on the student, mark the student as assigned and append the student to the group whose index is group_num.
    student[6] = group_num
    student[5] = True
    group.append(student)
    if student in remaining_students:
        # 2. Remove the student from remaining_students.
        remaining_students.remove(student)
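
As a quick illustration of the record layout `[id, cgpa, gender, name, school, assigned, group number]`, the sketch below calls `assign_group` on two made-up students. The function is repeated so the snippet runs on its own, and the names, IDs and values are hypothetical.

```python
def assign_group(student, group_num, group, remaining_students):
    # Same logic as the function above, repeated so this snippet is self-contained.
    student[6] = group_num
    student[5] = True
    group.append(student)
    if student in remaining_students:
        remaining_students.remove(student)


# Hypothetical records: [id, cgpa, gender, name, school, assigned, group number]
alice = ["U001", 4.2, "Female", "Alice", "CCDS", False, -1]
bob = ["U002", 3.8, "Male", "Bob", "EEE", False, -1]
remaining = [bob, alice]
group_list = [[] for _ in range(10)]

assign_group(alice, 3, group_list[3], remaining)
print(alice)      # the assigned flag and group number are updated in place
print(remaining)  # only bob is left
```

Because the student lists are mutated in place, the same record object is visible from `group_list`, from `remaining` and from any other list that holds it.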

#### swap_validator

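This function supports the function `swapper()`. It checks whether `student` and `try_student` can trade places: `try_student` must fit into `group` (the group `student` was about to join), and `student` must fit into the group `try_student` is currently in, if any. If both checks pass, the function returns True.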

In [3]:
#1.Check whether adding try_student to the current group would exceed the gender or school limit. If it would, return False.
def swap_validator(student, try_student, group_list, group):
    if exceed_gender_or_school(group, try_student):
        return False

#2.If try_student is already in a group, copy that group and remove try_student from the copy.
# Then check whether student could take try_student's place without exceeding the limits. If so, return True.
    if try_student[5]:
        try_student_group_copy = group_list[try_student[6]][:]
        try_student_group_copy.remove(try_student)
        return not exceed_gender_or_school(try_student_group_copy, student)

#3.If try_student is not in a group, return True.
    else:
        return True

#### swapper

This function swaps the student with the student of nearest gpa for whom the swap keeps the gender and school requirements satisfied.

In [4]:
#1.Find the index of the student in the all_students list, initialize the parameters.
def swapper(student, all_students, remaining_students, group_list, group_num, group):
    index = find_index(student, all_students)
    swap = False
    search_up, search_down = True, True
    index_up, index_down = 1, 1

#2.Loop through every students and check if there are still any more students of higher or lower gpa to check.
    while search_up == True or search_down == True:
        #check if there are still any more students of higher/lower gpa to check
        if index + index_up > len(all_students) - 1:
            search_up = False
        if index - index_down < 0:
            search_down = False

#3.If there is still a student with a higher gpa to check, name this student try_student.
#  Check whether student can be swapped with try_student. If so, the swap_validator function returns True and we set swap to True.
#  Then assign try_student to swap_student and break the loop.
        if search_up == True:
            try_student = all_students[index + index_up]
            if swap_validator(student, try_student, group_list, group):
                swap = True
                swap_student = try_student
                break

#4.If the student with the higher gpa is not a valid swap, check the student with the lower gpa.
# Similarly, name this student try_student and check whether student can be swapped with them.
# If so, set swap to True, assign try_student to swap_student and break the loop.
        if search_down == True:
            try_student = all_students[index - index_down]
            if swap_validator(student, try_student, group_list, group):
                swap = True
                swap_student = try_student
                break

#5.If neither of the students nearest in gpa is a valid swap,
# increase index_up and index_down to check the students with the next nearest gpa.
        index_up += 1
        index_down += 1

#6.If we find someone ,swap the student and the swap_student from their own groups and return True.
#  If we cannot find a person suitable, return False.
    if swap == True:
        perform_swap(student, swap_student, remaining_students, group_num, group_list, group)
        return True
    else:
        return False
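
The outward search used by `swapper()` can be seen in isolation in the sketch below. The helper `nearest_match` and the sample data are hypothetical; the sketch assumes the list is sorted by cgpa and replaces `swap_validator` with an arbitrary predicate.

```python
def nearest_match(items, start, is_valid):
    """Return the index closest to `start` (excluding `start`) whose item passes `is_valid`.

    At each distance the higher index is checked before the lower one,
    mirroring the order used in swapper(). Returns None if nothing passes.
    """
    for distance in range(1, len(items)):
        up, down = start + distance, start - distance
        if up < len(items) and is_valid(items[up]):
            return up
        if down >= 0 and is_valid(items[down]):
            return down
    return None


# Hypothetical cgpas in ascending order.
cgpas = [3.1, 3.4, 3.6, 3.9, 4.2, 4.5]
found = nearest_match(cgpas, 2, lambda value: value >= 4.0)
print(found)
```

Because the search widens one step at a time in both directions, the first valid candidate it finds is also one of the closest in position, and therefore close in cgpa when the list is sorted.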

#### perform_swap

This function swaps `student` and `swap_student` between their groups.

In [5]:
#1.If the swap_student has already been assigned to a group, find his group and remove him from the group.
def perform_swap(student, swap_student, remaining_students, group_num, group_list, group):
    if swap_student[5] == True:
        swap_student_group_num = swap_student[6]
        swap_student_group = group_list[swap_student_group_num]
        swap_student_group.remove(swap_student)

#2.Then assign student to the group of swap_student and assign swap_student to the group whose index is group_num
 #i.e.the group which the student was supposed to be assigned to
        assign_group(student, swap_student_group_num, swap_student_group, remaining_students)
        assign_group(swap_student, group_num, group, remaining_students)

#3.If the swap_student hasn't been assigned to a group yet, just assign him to the group instead of student.
    else:
        assign_group(swap_student, group_num, group, remaining_students)

#### find_index

This function supports the `swapper()` function. It finds the position of the student in the list it is given and returns the index.

In [6]:
def find_index(value, list):
    for index in range(len(list)):
        if list[index] == value:
            return index
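
Note that `find_index` falls off the end of the loop and implicitly returns `None` when the value is not in the list. A minimal sketch that makes this explicit is shown below; the name `find_index_or_none` and the sample records are hypothetical.

```python
def find_index_or_none(value, items):
    """Return the index of the first item equal to `value`, or None if absent."""
    for index, item in enumerate(items):
        if item == value:
            return index
    return None


# Hypothetical shortened records, for illustration only.
students = [["U001", 3.5], ["U002", 4.0], ["U003", 4.4]]
position = find_index_or_none(["U002", 4.0], students)
missing = find_index_or_none(["U999", 2.0], students)
print(position, missing)
```

In `swapper()` the student is always taken from `all_students`, so the not-found case should not arise there.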

#### increase_count

This function supports the `exceed_gender_or_school()` function. It checks whether the argument `key` (i.e. the school/gender of the student) is in the argument `dict` (the `schools` or `genders` dictionary).

If the school of the student:
1.   already exists in the dictionary `schools`: add 1 to the value of the key (e.g. "CCDS").
2.   is not in the dictionary `schools`: add an item with the school name as key and set the value to 1.

It works the same for gender.

In [7]:
def increase_count(dict, key):
    if key not in dict:
        dict[key] = 1
    else:
        dict[key] += 1
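
For comparison, the standard library offers the same counting behaviour through `collections.Counter`, which treats a missing key as 0. This is only an alternative sketch with made-up school names, not what the program uses.

```python
from collections import Counter

# Counter starts every unseen key at 0, so no "key not in dict" check is needed.
schools = Counter()
for school in ["CCDS", "EEE", "CCDS"]:
    schools[school] += 1

print(schools["CCDS"], schools["EEE"], schools["MAE"])
```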

#### exceed_gender_or_school


Checks whether adding a new student to the current group would exceed the gender or school limit. If there would be more than 2 students from the same school, or more than 3 students of the same gender, it returns True.


3. Examine the school and gender of the new student and add 1 to the matching counts in the dictionaries.


4. Check the final dictionaries for school and gender, and return True if more than 2 students are from the same school or more than 3 students are of the same gender.

In [8]:
# 1. define function with group and new_student as input.
# Create two empty dictionaries for school and gender to keep track of the count.
def exceed_gender_or_school(group, new_student):
    schools = {}
    genders = {}
    # 2. loop through all existing students in current group, examine the school and gender of the student.
    # If the school/gender is already in the dictionaries, add 1 to the value of the key.
    for existing_student in group:
        #student[id, cgpa, gender, name, school, assigned, group number]
        increase_count(schools, existing_student[4])
        increase_count(genders, existing_student[2])
    #add gender and school of new student
    increase_count(schools, new_student[4])
    increase_count(genders, new_student[2])
    for school_count in schools.values():
        if school_count > 2:
            return True
    for gender_count in genders.values():
        if gender_count > 3:
            return True
    return False
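
The check can be tried out on a small made-up group. The sketch below is a compact variant written with `collections.Counter`; the name `exceeds_limits`, its keyword parameters and the sample students are hypothetical, and the record layout is assumed to be the same as above.

```python
from collections import Counter


def exceeds_limits(group, new_student, max_school=2, max_gender=3):
    """Return True if adding new_student would break the school or gender limit."""
    members = group + [new_student]
    # Index 4 holds the school and index 2 the gender.
    school_counts = Counter(student[4] for student in members)
    gender_counts = Counter(student[2] for student in members)
    return (max(school_counts.values()) > max_school
            or max(gender_counts.values()) > max_gender)


# Hypothetical records: [id, cgpa, gender, name, school, assigned, group number]
group = [
    ["U001", 4.5, "Male", "A", "CCDS", True, 0],
    ["U002", 3.1, "Female", "B", "CCDS", True, 0],
]
third_ccds = ["U003", 4.0, "Female", "C", "CCDS", False, -1]
from_eee = ["U004", 4.0, "Female", "D", "EEE", False, -1]

print(exceeds_limits(group, third_ccds))  # a third CCDS student breaks the school limit
print(exceeds_limits(group, from_eee))
```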

#### read_to_dict

In [9]:
def read_to_dict(file_path):

    data = []

    with open(file_path, mode='r', encoding='utf-8') as file:
        lines = file.readlines()
        for line in lines[1:]:
            values = line.strip().split(',')

            student_data = {
                'Tutorial Group': values[0],
                'Student ID': values[1],
                'Name': values[3],
                'School': values[2],
                'Gender': values[4],
                'CGPA': float(values[5]),
                'Assigned': False,
                'Group Number': -1
            }

            data.append(student_data)

    # Dictionary to store the output
    output_dict = {}

    # creating the dictionary
    for record in data:
        tutorial_group = record['Tutorial Group']
        student_id = record['Student ID']

        # create dictionary for each tutorial group
        if tutorial_group not in output_dict:
            output_dict[tutorial_group] = {}

        # Add student data to the dictionary
        output_dict[tutorial_group][student_id] = {
            'Name': record['Name'],
            'School': record['School'],
            'Gender': record['Gender'],
            'CGPA': record['CGPA'],
            'Assigned': record['Assigned'],
            'Group Number': record['Group Number']
        }


    return output_dict
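
Splitting each line on `','` works as long as no field contains a comma. If a name such as "Tan, Wei Ming" were quoted in the file, the `csv` module would handle it. The sketch below is an alternative under stated assumptions: the function name `read_rows`, the sample rows and the header line are hypothetical, and the column order is taken from the indexes used in `read_to_dict`.

```python
import csv
import io


def read_rows(file_obj):
    """Build {tutorial_group: {student_id: {...}}} from an open CSV file object."""
    reader = csv.reader(file_obj)
    next(reader)  # skip the header row
    records = {}
    for row in reader:
        if not row:  # ignore blank lines
            continue
        tutorial_group, student_id, school, name, gender, cgpa = row
        records.setdefault(tutorial_group, {})[student_id] = {
            'Name': name,
            'School': school,
            'Gender': gender,
            'CGPA': float(cgpa),
            'Assigned': False,
            'Group Number': -1,
        }
    return records


# Hypothetical sample data; io.StringIO stands in for an opened file.
sample = io.StringIO(
    'Tutorial Group,Student ID,School,Name,Gender,CGPA\n'
    'G-1,5002,CCDS,"Tan, Wei Ming",Male,4.02\n'
    'G-1,5003,EEE,Jane Lim,Female,3.9\n'
)
parsed = read_rows(sample)
print(parsed['G-1']['5002']['Name'])
```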

#### sort_by_ascending_cgpa



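This function supports the main function `allocate_teams()`. It is called once per entry and inserts that entry into an output list so that the list stays in ascending order of cgpa. In `allocate_teams()` it sorts the groups in `group_cgpa` by their `total_cgpa` and stores them in the list `sorted_output`.

1. Define the function.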

Mapping to main function `allocate_teams()`:


*   `to_sort` -> `group` (e.g. [0, 16.4])
*   `cgpa` -> `group[1]` (e.g. 16.4)
*   `output` -> `sorted_output`(e.g. [[0, 16.4]])

If output is empty, add current group to the output list.

In [10]:
def sort_by_ascending_cgpa(to_sort, cgpa, output):
    if output == []:
        output.append(to_sort)
        return
#loop through each existing entry to find where to insert current entry based on cgpa
# Let current entry(cgpa) move right in the list until it is less than or equal to another group's cgpa.
    index = 0
    for entry in output:
        entry_cgpa = entry[1]
        if cgpa > entry_cgpa:
            #move right in list
            index += 1
            #at end of list
            if index == len(output):
                output.append(to_sort)
                return
            continue
        elif cgpa <= entry_cgpa:
            output.insert(index, to_sort)
            return
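
For comparison, the built-in `sorted()` with a `key` function produces the same ascending order in one call. The sample `group_cgpa` values below are hypothetical. One difference to be aware of: `sorted()` keeps entries with equal cgpa in their original order, whereas `sort_by_ascending_cgpa` inserts a new entry in front of existing entries with the same cgpa.

```python
# Hypothetical [group_num, total_cgpa] entries.
group_cgpa = [[0, 16.5], [1, 16.4], [2, 17.1], [3, 16.4]]

# Sort by the total cgpa stored at index 1 of each entry.
sorted_output = sorted(group_cgpa, key=lambda entry: entry[1])
print(sorted_output)
```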

# Step 2: Assign the students to groups, collect the data for analysis and write the groups to a csv file.

This is the main code and we will use the functions defined above.

#### 1. Read file and store the information into a dictionary.

In [11]:
student_data_dict = read_to_dict("records.csv")

#student_data_dict={tutorial_group:{id:{'Name': x,
#                                       'School': x,
#                                       'Gender': x,
#                                       'CGPA': x,
#                                       'Assigned': False,
#                                       'Group Number': -1},
#                                   id2:{...},
#                                   ...
#                                   },
#                   tutorial_group2:{...},
#                   ...
#                   }

#### 2. Initialize required variables
`successful` keeps track of the number of groups that meet both requirements: maximum 3 of one gender and maximum 2 from one school <br>
`unsuccessful` keeps track of the number of groups that meet **neither** requirement <br>
`half` keeps track of the number of groups that meet **only one** requirement and do not meet the other <br>
`stdevs` is a list that will eventually store, for each tutorial group, the standard deviation of its groups' total CGPAs <br>
`final_data` is a list that will eventually store the student data that will be written into a new file

In [12]:
successful = 0
unsuccessful = 0
half = 0
stdevs = []
final_data = []

In [13]:

#3.Create the list `student_data` to store each student's information.
# Take the student's cgpa and insert student_data into the list sorted_student_data, which is kept in ascending order of cgpa.
for tgnum in student_data_dict:
    sorted_student_data = []
    tutorial_group = student_data_dict[tgnum]
    for id, data in tutorial_group.items():
        student_data = [id, data['CGPA'], data['Gender'], data['Name'], data['School'], data['Assigned'], data['Group Number']]
        cgpa = student_data[1]
        sort_by_ascending_cgpa(student_data, cgpa, sorted_student_data)

#4.Allocate the students from the list `sorted_student_data` into 10 lists stored in the list `grouped_student_data`.
    grouped_student_data = (allocate_teams(sorted_student_data))
#grouped_student_data=[[[student_id,cgpa,gender,name,school,assigned,group number],[...],[...],[...],[...]],...]

#5.Rearrange the order of the student data and store them into list `final_data`.
    for group in grouped_student_data:
        for student in group:
            final_student_data = [tgnum, student[0],student[4],student[3],student[2],str(student[1]),str(student[6])]
#                                Tutorial Group,Student ID ,School, Name, Gender, CGPA, Team Assigned
            final_data.append(final_student_data)

#6.Initialize the parameters
    cgpas = []

    for group in grouped_student_data:
        schoolpass = True
        genderpass = True
        cgpa = 0       #stores the cumulative cgpa of each group
        schools = {}
        genders = {}   #dictionaries to keep track of the number of each gender and school in the group

#7.Tally current genders and schools for each group
        for student in group:
            cgpa += student[1]
            #student=[id, cgpa, gender, name, school, assigned, group number]
            increase_count(schools, student[4])
            increase_count(genders, student[2])

#8.Check whether the group fails the school or the gender requirement.
        for school_count in schools.values():
            if school_count > 2:
                schoolpass = False
        for gender_count in genders.values():
            if gender_count > 3:
                genderpass = False

#9.Count a group as successful if it meets both requirements, half if it meets only one, and unsuccessful if it meets neither.
#Update the matching counter.
        if not schoolpass and not genderpass:
            unsuccessful += 1
        elif not schoolpass:
            half += 1
        elif not genderpass:
            half += 1
        else:
            successful += 1

#10.Append each group's total gpa to list`cgpas`.
        cgpas.append(cgpa)

#11.Compute the mean of the groups' total gpas.
    mean = sum(cgpas) / len(cgpas)

#12.Calculate squared differences.
    squared_diffs = [(x - mean) ** 2 for x in cgpas]

#13.Calculate the (population) variance by dividing by the number of groups.
#   (`data` is the last student's dictionary from step 3, so len(data) was not the number of groups.)
    variance = sum(squared_diffs) / len(cgpas)

#14.Calculate standard deviation of cgpas for the groups in the tutorial group
    stdev = variance ** 0.5
    stdevs.append(stdev)
#Now stdevs is a list of 120 standard deviations, one for each of the 120 tutorial groups.
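
As a sanity check on the formula, the population standard deviation divides the sum of squared differences by the number of values, which is what `statistics.pstdev` from the standard library computes. The sketch below compares a manual calculation with the library call on hypothetical group totals; it assumes the population (not sample) standard deviation is the intended measure.

```python
import statistics

# Hypothetical total cgpas of the groups in one tutorial group.
cgpas = [16.4, 16.5, 16.7, 16.6, 16.3]

mean = sum(cgpas) / len(cgpas)
# Population variance: divide by the number of values.
variance = sum((x - mean) ** 2 for x in cgpas) / len(cgpas)
manual_stdev = variance ** 0.5

library_stdev = statistics.pstdev(cgpas)
print(manual_stdev, library_stdev)
```

A smaller standard deviation means the groups' total cgpas are closer together, i.e. the teams are more evenly balanced.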

#### 15. Print the number of groups which were successful, unsuccessful and half successful in meeting the requirements

In [14]:
print(f"{successful} are successful, {unsuccessful} are unsuccessful and {half} are half successful")
#print(stdevs)

1174 are successful, 0 are unsuccessful and 26 are half successful


#### 16. Write the data in a new file

In [15]:
with open('new_records.csv','w') as f:
    f.write('Tutorial Group,Student ID,School,Name,Gender,CGPA,Team Assigned\n') #writing the header
    for student_data in final_data:
        f.write(','.join(student_data)) #every field is already a string
        f.write('\n')
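
Joining fields with `','` is fine while no field contains a comma. As an alternative sketch, `csv.writer` quotes such fields automatically. The sample row below is hypothetical, and `io.StringIO` stands in for a real file; with a real file, the `csv` documentation recommends opening it with `newline=''`.

```python
import csv
import io

header = ['Tutorial Group', 'Student ID', 'School', 'Name', 'Gender', 'CGPA', 'Team Assigned']
# Hypothetical row whose name contains a comma.
rows = [['G-1', '5002', 'CCDS', 'Tan, Wei Ming', 'Male', '4.02', '0']]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(header)
writer.writerows(rows)

written = buffer.getvalue()
print(written)
```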


# Step 3: Analyse the results with a pie chart.