### Ckodon Bootcamp Analysis Goals
A. Extract:
1. Total no. of students
2. Total no. of SHS students
3. Total no. of university students
4. Total no. of gap year students

B. Group university students:
1. KNUST students
2. University of Ghana Legon students
3. Other university students

C. Group gap year students:
1. Accra-based students
2. Kumasi-based students
3. Other-based students

D. Combine & Group:
1. KNUST + Kumasi-based students
2. University of Ghana + Accra-based students
3. Other university students + other-based students

E. Prepare DataFrames for Export
1. Add SAT Group column
2. Prepare Other (unsorted) students DataFrame
3. Prepare Kumasi students DataFrame
4. Prepare Accra students DataFrame
5. Prepare SHS students DataFrame

F. Export all DataFrames to Excel workbook

**A. Import the data set and extract each category of students**

In [1]:
import pandas as pd

# import the survey responses
students = pd.read_excel("../data/Ckodon SAT Bootcamp Survey (Responses).xlsx")
n_students = students.shape[0]

# test import
# print(students.head())
print("Total number of students:", n_students)

Total number of students: 388


In [2]:
# extract all SHS students
shs = students[students["Are you currently enrolled at a Senior High School (SHS)?"] == "Yes"]
n_shs = shs.shape[0]
print("Total number of SHS students:", n_shs)

Total number of SHS students: 24


In [3]:
# extract all university students
univ = students[students["Are you currently enrolled at a university?"] == "Yes"]
n_univ = univ.shape[0]
print("Total number of university students:", n_univ)

Total number of university students: 292


In [4]:
# extract all gap year students (enrolled at neither an SHS nor a university)
# combine both conditions into one mask; chaining two boolean filters triggers a reindexing warning
not_shs = students["Are you currently enrolled at a Senior High School (SHS)?"] == "No"
not_univ = students["Are you currently enrolled at a university?"] == "No"
gap = students[not_shs & not_univ]
n_gap = gap.shape[0]
print("Total number of gap year students:", n_gap)

Total number of gap year students: 72


In [5]:
# verify total
total = n_gap + n_univ + n_shs

print("n_students:", n_students)
print("Total:", total)  # should equal n_students

n_students: 388
Total: 388
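
The totals match only if no respondent answered "Yes" to both enrollment questions, because such a row would be counted in both `shs` and `univ`. A minimal sketch of an explicit overlap check follows. It uses made-up toy data, and the short column names `shs` and `univ` are hypothetical stand-ins for the full survey questions.

```python
import pandas as pd

# Toy data; "shs" and "univ" are shortened stand-ins for the survey questions.
toy = pd.DataFrame({
    "shs":  ["Yes", "No", "No", "No", "Yes"],
    "univ": ["No", "Yes", "No", "Yes", "Yes"],
})

is_shs = toy["shs"] == "Yes"
is_univ = toy["univ"] == "Yes"
is_gap = (toy["shs"] == "No") & (toy["univ"] == "No")

# Count how many categories claim each row.
membership = is_shs.astype(int) + is_univ.astype(int) + is_gap.astype(int)
overlap = toy[membership > 1]      # rows counted more than once
unassigned = toy[membership == 0]  # rows counted nowhere (e.g. blank answers)

print("Overlapping rows:", len(overlap))
print("Unassigned rows:", len(unassigned))
```

If both counts are zero on the real data, the three categories partition the respondents and the sum check is meaningful.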


**B. Group university students**

In [6]:
# Grouping university students (knust, legon, other_uni)
terms_knust = ["knust", "kwame nkrumah university of science and technology", "kwame", "nkrumah"]
terms_legon = ["legon", "university of ghana"]
knust = pd.DataFrame(columns=univ.columns)
legon = pd.DataFrame(columns=univ.columns)
other_uni = pd.DataFrame(columns=univ.columns)
added_rows = []

# group knust
for row in univ.index:
    for term in terms_knust:
        if term in univ.loc[row]["At which university are you currently enrolled?"].lower():
            knust.loc[len(knust)] = univ.loc[row]
            added_rows.append(row)
            break

# group legon
for row in univ.index:
    if row not in added_rows:
        for term in terms_legon:
            if term in univ.loc[row]["At which university are you currently enrolled?"].lower():
                legon.loc[len(legon)] = univ.loc[row]
                added_rows.append(row)
                break

# group other
for row in univ.index:
    if row not in added_rows:
        other_uni.loc[len(other_uni)] = univ.loc[row]

# check totals
n_knust = len(knust)
n_legon = len(legon)
n_other = len(other_uni)

print("Total number of KNUST students:", n_knust)
print("Total number of Legon students:", n_legon)
print("Total number of Other university students:", n_other, end="\n\n")

print("Total number of university students:", n_univ)
print("Combined total:", n_knust + n_legon + n_other)  # must match n_univ


Total number of KNUST students: 220
Total number of Legon students: 24
Total number of Other university students: 48

Total number of university students: 292
Combined total: 292
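
The same first-match-wins grouping can probably be expressed without row-by-row loops. The sketch below is one possible vectorized variant, not the notebook's method: it uses `Series.str.contains` and `numpy.select` on made-up university names. The `university` column, the `campus` column, and the `matches_any` helper are hypothetical names introduced for illustration.

```python
import re
import numpy as np
import pandas as pd

# Toy answers; the real column is the full survey question text.
toy = pd.DataFrame({"university": [
    "KNUST",
    "Kwame Nkrumah University of Science and Technology",
    "University of Ghana, Legon",
    "Ashesi University",
]})

terms_knust = ["knust", "kwame", "nkrumah"]
terms_legon = ["legon", "university of ghana"]

def matches_any(series, terms):
    """True where the text contains any of the terms (case-insensitive)."""
    pattern = "|".join(re.escape(t) for t in terms)
    return series.str.contains(pattern, case=False, na=False)

is_knust = matches_any(toy["university"], terms_knust)
is_legon = matches_any(toy["university"], terms_legon)

# np.select picks the first true condition, so KNUST takes priority over Legon,
# mirroring the order of the loops above.
toy["campus"] = np.select([is_knust, is_legon], ["knust", "legon"], default="other")
print(toy["campus"].tolist())
```

Passing `na=False` treats missing answers as non-matches, so they fall into the "other" bucket instead of raising an error.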


**C. Group Gap-Year Students**

In [7]:
# group gap year students (kumasi, accra, other)
terms_kumasi = ["kumasi", "ashanti"]
terms_accra = ["greater accra", "accra"]
kumasi = pd.DataFrame(columns=gap.columns)
accra = pd.DataFrame(columns=gap.columns)
other = pd.DataFrame(columns=gap.columns)
added_rows = []

# group kumasi
for row in gap.index:
    for term in terms_kumasi:
        if term in gap.loc[row]["Where do you currently live?"].lower():
            kumasi.loc[len(kumasi)] = gap.loc[row]
            added_rows.append(row)
            break

# group accra (skip rows already assigned to kumasi, so no student is counted twice)
for row in gap.index:
    if row not in added_rows:
        for term in terms_accra:
            if term in gap.loc[row]["Where do you currently live?"].lower():
                accra.loc[len(accra)] = gap.loc[row]
                added_rows.append(row)
                break

# group other
for row in gap.index:
    if row not in added_rows:
        other.loc[len(other)] = gap.loc[row]

# check totals
n_accra = len(accra)
n_kumasi = len(kumasi)
n_other = len(other)

print("Total number of Accra gap-year students:", n_accra)
print("Total number of Kumasi gap-year students:", n_kumasi)
print("Total number of Other gap-year students:", n_other)

print("Total no. of gap year students:", n_gap)
print("Combined total:", n_accra + n_kumasi + n_other)  # should equal n_gap

Total number of Accra gap-year students: 27
Total number of Kumasi gap-year students: 21
Total number of Other gap-year students: 24
Total no. of gap year students: 72
Combined total: 72


**D. Combine by Region & Group**

**Grouping Algorithm**
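1. Combine the university students and gap year students for the region.
2. Find the combined total, t.
3. Divide t by 30 and round to the nearest whole number to obtain the number of groups of roughly 30 members, n.
4. Create a list of n empty DataFrames.
5. Iterate over all females and assign each to the next DataFrame in turn (round-robin).
6. Iterate over all males and assign each in the same round-robin fashion, restarting from the first DataFrame.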
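
The steps above can be sketched on a small made-up table. This is an illustrative variant, not the notebook's code: `round_robin_groups` is a hypothetical helper, the names are invented, and rows inside each group keep their original order instead of listing females first.

```python
import pandas as pd

def round_robin_groups(df, n_groups, gender_col="Gender"):
    """Deal females, then males, across n_groups in turn (hypothetical helper)."""
    labels = pd.Series(-1, index=df.index)  # -1 means "not assigned"
    for gender in ["Female", "Male"]:
        members = df.index[df[gender_col] == gender]
        # the i-th member of this gender goes to group i mod n_groups
        labels.loc[members] = [i % n_groups for i in range(len(members))]
    # rows whose gender is neither "Female" nor "Male" stay unassigned
    return [df[labels == g].reset_index(drop=True) for g in range(n_groups)]

toy = pd.DataFrame({
    "Name": list("ABCDEFG"),
    "Gender": ["Female", "Male", "Male", "Female", "Male", "Male", "Female"],
})
groups = round_robin_groups(toy, n_groups=2)
for i, g in enumerate(groups, start=1):
    print(f"Group {i}:", g["Gender"].value_counts().to_dict())
```

Dealing each gender separately keeps the female count per group within one of every other group, which is the point of steps 5 and 6.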

In [8]:
# Group Kumasi + KNUST
kumasi_all = pd.concat([knust, kumasi], ignore_index=True)

t_kumasi_all = len(kumasi_all)
n_kumasi_all = round(t_kumasi_all / 30)

groups_kumasi = [pd.DataFrame(columns=kumasi_all.columns) for i in range(n_kumasi_all)]

# distribute females among groups
group_no = 0
for row in kumasi_all.index:
    if kumasi_all.loc[row]["Gender"] == "Female":
        groups_kumasi[group_no % n_kumasi_all].loc[len(groups_kumasi[group_no % n_kumasi_all])] = kumasi_all.loc[row]
        group_no += 1


# distribute males among groups
group_no = 0
for row in kumasi_all.index:
    if kumasi_all.loc[row]["Gender"] == "Male":
        groups_kumasi[group_no % n_kumasi_all].loc[len(groups_kumasi[group_no % n_kumasi_all])] = kumasi_all.loc[row]
        group_no += 1


# Gender analysis per group
print("=======KUMASI GROUPS GENDER ANALYSIS=======")
for i in range(len(groups_kumasi)):
    print(f"---------GROUP-{i+1}---------")
    n_males = len(groups_kumasi[i][groups_kumasi[i]["Gender"]=="Male"])
    n_females = len(groups_kumasi[i][groups_kumasi[i]["Gender"]=="Female"])
    print("Males:", n_males)
    print("Females:", n_females)
    print("---------------------------")


=======KUMASI GROUPS GENDER ANALYSIS=======
---------GROUP-1---------
Males: 27
Females: 4
---------------------------
---------GROUP-2---------
Males: 26
Females: 4
---------------------------
---------GROUP-3---------
Males: 26
Females: 4
---------------------------
---------GROUP-4---------
Males: 26
Females: 4
---------------------------
---------GROUP-5---------
Males: 26
Females: 4
---------------------------
---------GROUP-6---------
Males: 26
Females: 4
---------------------------
---------GROUP-7---------
Males: 26
Females: 4
---------------------------
---------GROUP-8---------
Males: 26
Females: 4
---------------------------


In [9]:
# Group Accra + University of Ghana
accra_all = pd.concat([legon, accra], ignore_index=True)
t_accra_all = len(accra_all)
n_accra_all = round(t_accra_all / 30)


groups_accra = [pd.DataFrame(columns=accra_all.columns) for i in range(n_accra_all)]


# distribute females among groups
group_no = 0
for row in accra_all.index:
    if accra_all.loc[row]["Gender"] == "Female":
        groups_accra[group_no % n_accra_all].loc[len(groups_accra[group_no % n_accra_all])] = accra_all.loc[row]
        group_no += 1

# distribute males among groups
group_no = 0
for row in accra_all.index:
    if accra_all.loc[row]["Gender"] == "Male":
        groups_accra[group_no % n_accra_all].loc[len(groups_accra[group_no % n_accra_all])] = accra_all.loc[row]
        group_no += 1

# Gender analysis per group
print("=======ACCRA GROUPS GENDER ANALYSIS=======")
for i in range(len(groups_accra)):
    print(f"---------GROUP-{i+1}---------")
    n_males = len(groups_accra[i][groups_accra[i]["Gender"]=="Male"])
    n_females = len(groups_accra[i][groups_accra[i]["Gender"]=="Female"])
    print("Males:", n_males)
    print("Females:", n_females)
    print("---------------------------")


=======ACCRA GROUPS GENDER ANALYSIS=======
---------GROUP-1---------
Males: 22
Females: 4
---------------------------
---------GROUP-2---------
Males: 22
Females: 3
---------------------------
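
The group counts follow from the totals: 241 Kumasi + KNUST students give round(241 / 30) = 8 groups, and 51 Accra + Legon students give round(51 / 30) = 2 groups. A small sketch of that arithmetic is below. `plan_groups` is a hypothetical helper, and its `max(1, ...)` guard is an added assumption to avoid zero groups for very small totals; the notebook has no such guard.

```python
def plan_groups(total, target=30):
    """Return (n_groups, sizes) for an even split into round(total / target) groups."""
    n_groups = max(1, round(total / target))  # guard is an assumption, not in the notebook
    base, extra = divmod(total, n_groups)
    # the first `extra` groups take one additional member
    sizes = [base + 1] * extra + [base] * (n_groups - extra)
    return n_groups, sizes

print(plan_groups(241))  # Kumasi + KNUST
print(plan_groups(51))   # Accra + Legon
```

Two caveats: Python's `round` rounds exact halves to the nearest even number, so a total of 75 yields 2 groups, not 3. Also, because females and males are dealt separately, the actual group sizes can differ from an even split by one extra member; for these two regions they happen to coincide.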


In [10]:
# Combine Other + Other_uni w/o grouping
other_all = pd.concat([other, other_uni], ignore_index=True)

n_other_all = len(other_all)
print("Total unsorted students:", n_other_all)

Total unsorted students: 72


**E. Prepare DataFrames for Export**

**Algorithm for combining groups**
1. For every group in groups:
     - Append a SAT Group column to the end of the group.
     - Assign the group's letter (A, B, C, ...) to the SAT Group column for all students in the group.
2. Concatenate all groups.
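
A minimal sketch of this labelling step, assuming each group is already a separate DataFrame. The two tiny groups and their names are made up, and `DataFrame.assign` is used here as one possible way to add the column.

```python
import string
import pandas as pd

# Two tiny stand-in groups; real groups would come from the grouping step.
groups = [
    pd.DataFrame({"Name": ["Ama", "Kofi"]}),
    pd.DataFrame({"Name": ["Esi"]}),
]

# Pair each group with the next letter and add it as a constant column.
labelled = [
    g.assign(**{"SAT Group": letter})
    for g, letter in zip(groups, string.ascii_uppercase)
]
sheet = pd.concat(labelled, ignore_index=True)
print(sheet)
```

Note that `zip` stops at the shorter input, so this sketch would silently drop any groups beyond the 26th.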

In [11]:
# append an empty SAT Group column to the students DataFrame
# (explicit dtype avoids the default-dtype warning for an empty Series on older pandas)
students["SAT Group"] = pd.Series(index=students.index, dtype="object")

# DataFrame for other (unsorted) group
sheet_unsorted = pd.DataFrame(columns=students.columns)
sheet_unsorted = pd.concat([sheet_unsorted, other_all], ignore_index=True)


# DataFrame for kumasi groups; group i gets the i-th letter (A, B, C, ...)
sheet_kumasi = pd.DataFrame(columns=students.columns)
group_letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
for i in range(len(groups_kumasi)):
    sheet_kumasi = pd.concat([sheet_kumasi, groups_kumasi[i]], ignore_index=True)
    # earlier groups already carry a letter, so only the newly added rows are filled;
    # assign the result back because in-place fillna on a column selection is unreliable
    sheet_kumasi["SAT Group"] = sheet_kumasi["SAT Group"].fillna(group_letters[i])


# DataFrame for accra groups
sheet_accra = pd.DataFrame(columns=students.columns)
for i in range(len(groups_accra)):
    sheet_accra = pd.concat([sheet_accra, groups_accra[i]], ignore_index=True)
    # assign the result back instead of using in-place fillna on a column selection
    sheet_accra["SAT Group"] = sheet_accra["SAT Group"].fillna(group_letters[i])

# dataframe for shs students
sheet_shs = pd.DataFrame(columns=students.columns)
sheet_shs = pd.concat([sheet_shs, shs], ignore_index=True)

**F. Export to Excel Workbook**
1. Export sheet_kumasi to the "Kumasi" sheet
2. Export sheet_accra to the "Accra" sheet
3. Export sheet_unsorted to the "Unsorted" sheet
4. Export sheet_shs to the "SHS" sheet

In [12]:
with pd.ExcelWriter("../data/output.xlsx") as writer:
    sheet_kumasi.to_excel(writer, sheet_name="Kumasi")
    sheet_accra.to_excel(writer, sheet_name="Accra")
    sheet_unsorted.to_excel(writer, sheet_name="Unsorted")
    sheet_shs.to_excel(writer, sheet_name="SHS")