<a href="https://colab.research.google.com/github/safry4/Research-Software-Engineering/blob/main/skills_audit_preprocessor.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Jupyter notebook export pre-processor

This notebook pre-processes the skills-audit data for the Pair/Team builder script.

Before running it, alter the questionnaire model below so that it matches the questionnaire used for the skills audit.

In [None]:
'''
    A basic function to determine if a course is computer science related.

    @param: course:String      A string representing the course to check

    @return: _:Boolean         A boolean flag stating if the course is related to comp. sci. or not
'''
def is_comp_sci_related (course):
    course = course.lower().strip()
    compsci_terms =  ['computer', 'comp', 'software', 'information', 'technology', 'database', 'machine learning', 'data', 'automation']
    for compsci_term in compsci_terms:
        if compsci_term in course:
            return True

    return False


def get_english_score (response, index=0):
    response_index = [
            'I am not confident',
            'I am confident',
            'I have studied academically in an English-speaking environment',
            'I am confident writing in English',
            'I am confident in presenting my work / leading a meeting in English',
            'I am fluent/native speaker of English'
        ].index(response.strip())

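    # Each pair below is [spoken score, written score], matching the order of the
    # 'keys' entry in the questionnaire model; index selects one of the two.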

    if response_index == 0:
        return [0, 0][index]
    elif response_index == 1:
        return [2, 2][index]
    elif response_index == 2:
        return [2, 2][index]
    elif response_index == 3:
        return [1, 2][index]
    elif response_index == 4:
        return [2, 1][index]
    elif response_index == 5:
        return [2, 2][index]
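
The branch ladder in `get_english_score` can also be read as a lookup table. The sketch below is one possible alternative, not the notebook's own code: the names `ENGLISH_SCORES` and `english_score_from_table` are hypothetical, the score pairs are copied from the function above, and the (spoken, written) ordering is an assumption based on the order of the `keys` entry in the questionnaire model.

```python
# Hypothetical table-driven equivalent of get_english_score.
# Each value is assumed to be (spoken score, written score).
ENGLISH_SCORES = {
    'I am not confident': (0, 0),
    'I am confident': (2, 2),
    'I have studied academically in an English-speaking environment': (2, 2),
    'I am confident writing in English': (1, 2),
    'I am confident in presenting my work / leading a meeting in English': (2, 1),
    'I am fluent/native speaker of English': (2, 2),
}


def english_score_from_table(response, index=0):
    """Return the score at position `index` for a questionnaire answer."""
    # A KeyError here plays the role of the ValueError raised by list.index
    return ENGLISH_SCORES[response.strip()][index]


print(english_score_from_table('I am confident writing in English', 0))  # spoken
print(english_score_from_table('I am confident writing in English', 1))  # written
```

With a table, adding or rewording an answer means touching one line, and the answer text and its scores cannot drift apart.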

### Modelling the Skills audit questionnaire

Here, we model the skills audit questionnaire as a dictionary. This model must reflect what the students were asked to supply.
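
Because the rest of the notebook reads fixed field names from each entry, a small check on the model can catch a misspelled or missing field before any data is processed. This is a minimal sketch under assumptions: `REQUIRED_FIELDS` and `find_model_problems` are hypothetical names, and the list of required fields is inferred from how the entries are used in this notebook.

```python
# Hypothetical sanity check for a questionnaire model.
REQUIRED_FIELDS = ('key', 'isOpenEnded', 'function', 'options', 'default')


def find_model_problems(model):
    """Return a list of human-readable problems found in the model."""
    problems = []
    for question, spec in model.items():
        for field in REQUIRED_FIELDS:
            if field not in spec:
                problems.append(f"{question}: missing field '{field}'")
        # open-ended questions need a scoring function, closed ones need options
        if spec.get('isOpenEnded') and not callable(spec.get('function')):
            problems.append(f"{question}: open-ended question has no function")
        if spec.get('isOpenEnded') is False and not spec.get('options'):
            problems.append(f"{question}: closed question has no options")
    return problems


# Made-up model with one deliberate mistake (a misspelled 'default' field)
sample_model = {
    'Response 1': {
        'key': 'DAP',
        'isOpenEnded': False,
        'function': None,
        'options': ['Yes', 'No'],
        'defualt': 1,
    },
}
print(find_model_problems(sample_model))
```

An empty list means the model has every field the sketch looks for; it does not prove that the options match the wording of the real questionnaire.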

In [None]:
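'''
    Here, we model the questionnaire as a dictionary.

    Each 'Response N' entry names a column in the Moodle export and describes it:
      key / keys   - name(s) of the output column(s)
      isOpenEnded  - True if 'function' scores the raw answer
      function     - scorer for open-ended answers (None otherwise)
      options      - listed answers; a closed answer is encoded as its index in this list
      default      - value used when the column is missing from a response
'''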


questionnaire = {
    'Response 1': {     # Open ended question about comp-sci proficiency
        'key': 'hasCompSciBackground',
        'isOpenEnded': True,
        'function': is_comp_sci_related,
        'options': None,
        'default': False
    },
    'Response 2': {     # english skill
        'key': 'EnglishProficiency',
        'keys': ['SpokenEnglishProficiency','WrittenEnglishProficiency'],
        'isOpenEnded': True,
        'function': get_english_score,
        'options': [
            'I am confident',
            'I am not confident',
            'I have studied academically in an English-speaking environment',
            'I am confident writing in English',
            'I am confident in presenting my work / leading a meeting in English',
            'I am fluent/native speaker of English',
        ],
        'default': 1
    },
    'Response 3': {     # programming skill
        'key': 'ProgrammingProficiency',
        'isOpenEnded': False,
        'function': None,
        'options': [
            'I have never programmed.',
            'I\'m not confident with my programming skills',
            'I\'m slightly confident with my programming skills',
            'I\'m very confident with my programming skills'
        ],
        'default': 1
    },
    'Response 4': {     # work experience
        'key': 'WorkExperience',
        'isOpenEnded': False,
        'function': None,
        'options': [
            'I have come straight from my first degree on to this programme',
            'I did a placement year as part of my first degree',
            'I have been working for at least one year since graduating',
            'I have been working for at least five years since graduating'
        ],
        'default': 1
    },
    'Response 5': {
        'key': 'DAP',
        'isOpenEnded': False,
        'function': None,
        'options': [
            'Yes',
            'No'
        ],
        'default': 1
    }
}

### Methods

In [None]:
'''
    process_data
    This method makes use of the questionnaire to transform the provided data into the expected format.

    @param data:List of Dictionary

    @return transformed data:List of Dictionary
'''

def process_data (data):
    transformed_data = []

    for entry in data:                  # for each response in the skills audit
        if entry['State'].strip() == 'In progress':     # skip unfinished attempts
            continue
        transformed_entry = {}
        for question, spec in questionnaire.items():
            # questions that produce several output columns list them under 'keys'
            keys = spec.get('keys', [spec['key']])
            if question not in entry:   # if the entry is incomplete, fill in the default value
                for key in keys:
                    transformed_entry[key] = str(spec['default'])
            elif not spec['isOpenEnded']:   # assign a numeric value to the response
                # list.index raises ValueError if the answer is not one of the listed options
                transformed_entry[spec['key']] = str(spec['options'].index(entry[question].strip()))
            elif 'keys' in spec:        # open-ended question with one score per output column
                for counter, key in enumerate(keys):
                    transformed_entry[key] = str(spec['function'](entry[question], counter))
            else:                       # evaluate the open-ended question
                transformed_entry[spec['key']] = str(spec['function'](entry[question]))
        # merge name fields and copy across email field
        transformed_entry['email'] = entry['Email address']
        transformed_entry['student'] = entry['First name'] + ' ' + entry['Surname']
        transformed_data.append(transformed_entry)
    return transformed_data



'''
    save_processed_data
    Saves the processed data into a file

    @param processed_data:List of Dictionary     processed responses (output of the process_data method)
    @param file_name:String                      the file to which we will be storing the values

    @return None                                 this method returns nothing.
'''
import csv      # also imported in the workflow cell; repeated so this cell runs on its own

def save_processed_data (processed_data, file_name):
    if len(processed_data) >= 1:
        header = processed_data[0].keys()
        # newline='' leaves line endings to the csv module, which also quotes values containing commas
        with open(file_name, 'w', newline='') as file_writer:
            writer = csv.writer(file_writer, lineterminator='\n')
            writer.writerow(header)
            for entry in processed_data:
                writer.writerow(entry.values())

### Workflow (actual process)

- Read in questionnaires (CSV export from the Moodle skills audit questionnaire)
- Process questionnaires
- Export results (export serves as an input to the pair builder script)
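
The first step relies on `csv.DictReader`, which yields one dictionary per row, keyed by the header line. The sketch below uses a made-up two-row export held in memory (the names, addresses and answers are invented for illustration) and a simplified stand-in for the processing step, the hypothetical `summarise_rows`, to show the read, process and export steps end to end.

```python
import csv
import io

# Made-up export; the column names are the ones this notebook reads from the Moodle CSV.
SAMPLE_EXPORT = (
    'Surname,First name,Email address,State,Response 5\n'
    'Lovelace,Ada,ada@example.org,Finished,Yes\n'
    'Turing,Alan,alan@example.org,In progress,No\n'
)


def summarise_rows(rows):
    """Simplified stand-in for process_data: skip unfinished rows, encode one answer."""
    options = ['Yes', 'No']
    result = []
    for row in rows:
        if row['State'].strip() == 'In progress':
            continue
        result.append({
            'DAP': str(options.index(row['Response 5'].strip())),
            'email': row['Email address'],
            'student': row['First name'] + ' ' + row['Surname'],
        })
    return result


# Read and process
rows = summarise_rows(csv.DictReader(io.StringIO(SAMPLE_EXPORT)))

# Export to an in-memory buffer instead of a file
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()), lineterminator='\n')
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

Only the finished attempt survives, so the output holds a header line and a single data row.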

In [None]:
import csv

# path to skills audit responses (exported from Moodle)
SKILLS_AUDIT_RESPONSES = 'CM50109-Skills Audit 2023-24-responses (2).csv'

# path to save the pre-processor's output
SAVE_AS = 'questionnaire_responses_cleaned_20223_final.csv'

USER_EXEMPTION_LIST = []

# DictReader reads lazily, so the rows must be processed while the file is still open
with open(SKILLS_AUDIT_RESPONSES, newline='') as responses_file:
    student_responses = csv.DictReader(responses_file)          # read in data
    processed_responses = process_data(student_responses)       # process responses
save_processed_data(processed_responses, SAVE_AS)               # output to a file

ValueError: ignored