
# Hidden Tests

In this file, the hidden tests for all the rubric points are to be described. The tests for the individual rubric points are enclosed within `# BEGIN <rubric_point>` and `# END <rubric_point>` NBConvert cells. `hidden_tests.py` works by executing the contents of those cells between those two tags for each `<rubric_point>`. In order to initialize variables, `hidden_tests.py` also executes all code within `BEGIN` and `END` tags that appear before the `original` test.

Code that is not enclosed within `BEGIN` and `END` tags is not executed by `hidden_tests.py`. It is used for generating the hidden datasets.

In [1]:
from hidden_tests import *
import otter_tests.gen_public_tests as gen_public_tests
import os, csv, json, copy, shutil
import random
import numpy as np

In [2]:
DIRECTORY = '..'
FILE = 'p7.ipynb'

In [3]:
results = {}

In [4]:
deductions = {}
rubric = parse_rubric_file(os.path.join(DIRECTORY, "rubric.md"))
directories = get_directories(rubric)
comments = get_all_comments(directories)

In [5]:
def write_readme(data, write_path):
    """write_readme(data, write_path) writes the contents of `data` into the README.txt file `write_path`"""
    f = open(write_path, encoding='utf-8')
    rubric_point = f.read().split("\n")[0].strip(" \n")
    f.close()
    
    f = open(write_path, 'w', encoding='utf-8')
    f.write(rubric_point + "\n\n" + data)
    f.close()

## Variables

Useful variables that are used by many rubric tests can be stored here. The contents of this tag will be executed before each rubric test, so these variables get initialized before each rubric test.

`verify_fn_defn` stores the source code of the function `verify_fn`, which verifies that the functions `expected` and `actual` produce the same output for every tuple of arguments in `var_inputs`.

In [6]:
verify_fn_defn = """
def verify_fn(expected, actual, var_inputs, test_format):
    for var in var_inputs:
        try:
            actual_val = actual(*var)
        except Exception as e:
            output = "%s results: " % actual.__name__
            output += "%s error encountered on %s%s" % (type(e).__name__, actual.__name__, repr(var))
            return output
        expected_val = expected(*var)
        check = public_tests.compare(expected_val, actual_val, test_format)
        if check != public_tests.PASS:
            output = "%s results: " % actual.__name__
            output += "%s%s output: %s" % (actual.__name__, repr(var), check)
            return output
    return "%s results: All test cases passed!" % actual.__name__"""
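
To see the reporting pattern in isolation, here is a minimal sketch of the same loop that runs without the grading infrastructure. The `compare` function and `PASS` constant are simplified stand-ins for `public_tests.compare` and `public_tests.PASS`, whose real behaviour is not shown in this file, and `true_double` and `buggy_double` are hypothetical functions used only for illustration.

```python
PASS = "All test cases passed!"  # assumed stand-in for public_tests.PASS


def compare(expected, actual, test_format):
    """Simplified stand-in for public_tests.compare: exact equality only."""
    if expected == actual:
        return PASS
    return "expected %r but found %r" % (expected, actual)


def verify_fn_sketch(expected, actual, var_inputs, test_format):
    """Report the first input on which `actual` crashes or disagrees with `expected`."""
    for var in var_inputs:
        try:
            actual_val = actual(*var)
        except Exception as e:
            output = "%s results: " % actual.__name__
            output += "%s error encountered on %s%s" % (type(e).__name__, actual.__name__, repr(var))
            return output
        check = compare(expected(*var), actual_val, test_format)
        if check != PASS:
            output = "%s results: " % actual.__name__
            output += "%s%s output: %s" % (actual.__name__, repr(var), check)
            return output
    return "%s results: All test cases passed!" % actual.__name__


def true_double(x):
    return 2 * x


def buggy_double(x):
    # Hypothetical student bug: agrees with true_double only when x == 2
    return x + 2


var_inputs = [(2,), (3,)]  # each entry is a tuple of positional arguments
good_report = verify_fn_sketch(true_double, true_double, var_inputs, "TEXT_FORMAT")
bad_report = verify_fn_sketch(true_double, buggy_double, var_inputs, "TEXT_FORMAT")
print(good_report)
print(bad_report)
```

Each entry of `var_inputs` is unpacked with `*var`, so a function that takes a single argument needs one-element tuples such as `(2,)`.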

`function_dependencies_functions` stores the previously defined functions that each function definition invokes. This variable is used for rubric points that check the logical correctness of functions as well as those that check whether a required function is used. For these rubric points, when we test a particular function, we use `function_dependencies_functions` to ensure that all the functions that it depends on are replaced with logically correct versions. This helps isolate the issue with the functions.

In [7]:
function_dependencies_functions = {}
function_dependencies_functions['format_euros'] = []
function_dependencies_functions['cell'] = ['format_euros']
function_dependencies_functions['average_stat_by_position'] = ['cell', 'format_euros']
function_dependencies_functions['best_player_of_team_at_position'] = ['cell', 'format_euros']
function_dependencies_functions['best_starting_players_of'] = ['best_player_of_team_at_position', 'cell', 'format_euros']

`function_dependencies_data_structures` stores the previously defined data structures that each function definition invokes. This variable is used for rubric points that check the logical correctness of functions as well as those that check whether a required function is used. For these rubric points, when we test a particular function, we use `function_dependencies_data_structures` to ensure that all the data structures that it depends on are replaced with logically correct versions. This helps isolate the issue with the functions.

In [8]:
function_dependencies_data_structures = {}
function_dependencies_data_structures['format_euros'] = []
function_dependencies_data_structures['cell'] = []
function_dependencies_data_structures['average_stat_by_position'] = ['players']
function_dependencies_data_structures['best_player_of_team_at_position'] = ['players']
function_dependencies_data_structures['best_starting_players_of'] = ['players']

`data_structure_dependencies_functions` stores the previously defined functions that each data structure definition invokes. This variable is used for rubric points that check the logical correctness of data structures as well as those that check whether a required data structure is used. For these rubric points, when we test a particular data structure, we use `data_structure_dependencies_functions` to ensure that all the functions that it depends on are replaced with logically correct versions. This helps isolate the issue with the data structures.

In [9]:
data_structure_dependencies_functions = {}
data_structure_dependencies_functions['players'] = ['cell', 'format_euros']

`data_structure_dependencies_data_structures` stores the previously defined data structures that each data structure definition accesses. This variable is used for rubric points that check the logical correctness of data structures as well as those that check whether a required data structure is used. For these rubric points, when we test a particular data structure, we use `data_structure_dependencies_data_structures` to ensure that all the data structures that it depends on are replaced with logically correct versions. This helps isolate the issue with the data structures.

In [10]:
data_structure_dependencies_data_structures = {}
data_structure_dependencies_data_structures['players'] = [] 
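
The dependency lists above appear to include indirect dependencies too (for example, `best_starting_players_of` lists `cell` and `format_euros`, which it only reaches through other definitions). Assuming that is the intended convention, the following sketch checks that each list already contains the dependencies of its own dependencies. The helper name `missing_indirect_dependencies` is hypothetical and is not part of `hidden_tests.py`.

```python
function_dependencies_functions = {
    'format_euros': [],
    'cell': ['format_euros'],
    'average_stat_by_position': ['cell', 'format_euros'],
    'best_player_of_team_at_position': ['cell', 'format_euros'],
    'best_starting_players_of': ['best_player_of_team_at_position', 'cell', 'format_euros'],
}


def missing_indirect_dependencies(deps):
    """Map each name to the indirect dependencies that its own list omits."""
    missing = {}
    for name, direct in deps.items():
        for dep in direct:
            for indirect in deps.get(dep, []):
                if indirect not in direct:
                    missing.setdefault(name, set()).add(indirect)
    return missing


missing = missing_indirect_dependencies(function_dependencies_functions)
print(missing)  # an empty dict means every list is closed under its dependencies
```

If a new function is added later, running this check is a cheap way to notice a forgotten entry before a rubric test silently keeps a student's buggy helper in place.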

## Functions

Useful functions that are used by many rubric tests can be stored here. The contents of this tag will be executed before each rubric test, so these function definitions get initialized before each rubric test.

`replace_with_false_function` replaces the given `function` with the **false version** of the function, and also replaces all **dependent** functions and data structures with their **true versions**.

In [11]:
def replace_with_false_function(nb, function, false_function):
    nb = replace_defn(nb, function, false_function)
    
    for dependent in function_dependencies_functions.get(function, []):
        nb = replace_defn(nb, dependent, true_functions[dependent])
    for dependent in function_dependencies_data_structures.get(function, []):
        idx = find_all_cell_indices(nb, "code", "grader.check('%s')" % (dependent))[-1]
        if idx == None:
            idx = find_all_cell_indices(nb, "markdown", "**Question 1:**")[-1]
        nb = inject_code(nb, idx, true_data_structures[dependent])
        nb = remove_initializations(nb, dependent, start=idx+1)
    return nb

`replace_with_false_data_structure` replaces the given `data_structure` with the **false** version of the data structure, and also replaces all **dependent** functions and data structures with their **true versions**.

In [12]:
def replace_with_false_data_structure(nb, data_structure, false_data_structure):
    idx = find_all_cell_indices(nb, "code", "grader.check('%s')" % (data_structure))[-1]
    if idx == None:
        idx = find_all_cell_indices(nb, "markdown", "**Question 1:**")[-1]
    nb = inject_code(nb, idx, false_data_structure)
    nb = remove_initializations(nb, data_structure, start=idx+1)
    
    for dependent in data_structure_dependencies_functions.get(data_structure, []):
        nb = replace_defn(nb, dependent, true_functions[dependent])
    for dependent in data_structure_dependencies_data_structures.get(data_structure, []):
        idx = find_all_cell_indices(nb, "code", "grader.check('%s')" % (dependent))[-1]
        if idx == None:
            idx = find_all_cell_indices(nb, "markdown", "**Question 1:**")[-1]
        nb = inject_code(nb, idx, true_data_structures[dependent])
        nb = remove_initializations(nb, dependent, start=idx+1)
    return nb

`get_test_text` returns test code that can be readily injected into the notebook. The input should be some code that updates the variable `test_output` and sets its value to be `"All test cases passed!"` when the conditions for passing the rubric test are met. This function will place this code inside a wrapper that ensures that it does not crash the student notebook during execution and also makes the output parsable.

In [13]:
def get_test_text(qnum, test_code):
    test_text = "\"\"\"grader.check('%s')\"\"\"\n\n" % (qnum)
    test_text += "test_output = '%s results: Test crashed!'\n" % (qnum)
    test_text += add_try_except(test_code)
    test_text += "\nprint(test_output)"
    return test_text

`inject_function_logic_check` injects code into the `nb` that detects whether `function` outputs the same as the **true version** of that function (all dependent functions and data structures are also replaced with their **true versions**) on every tuple of arguments in `var_inputs`, the list that `var_inputs_code` must define. The comparison between the outputs is performed assuming that the format of the answers is `test_format`.

In [14]:
def inject_function_logic_check(nb, function, var_inputs_code, test_format="TEXT_FORMAT"):
    for dependent in function_dependencies_functions.get(function, []):
        nb = replace_defn(nb, dependent, true_functions[dependent])
    for dependent in function_dependencies_data_structures.get(function, []):
        idx = find_all_cell_indices(nb, "code", "grader.check('%s')" % (dependent))[-1]
        if idx == None:
            idx = find_all_cell_indices(nb, "markdown", "**Question 1:**")[-1]
        nb = inject_code(nb, idx, true_data_structures[dependent])
        nb = remove_initializations(nb, dependent, start=idx+1)
        
    code = replace_call(true_functions[function], function, "true_"+function)
    code += "\n\n" + verify_fn_defn
    nb = inject_code(nb, len(nb['cells']), code)
    test_code = var_inputs_code + "\n"
    test_code += "test_output = verify_fn(true_%s, %s, var_inputs, '%s')" % (function, function, test_format)
    code = get_test_text(function, test_code)
    nb = inject_code(nb, len(nb['cells']), code)
    return nb

`inject_data_structure_check` injects code into the `nb` that detects whether `data_structure` has the same value as the **true version** of that data structure (all dependent functions and data structures are also replaced with their **true versions**). The comparison between the outputs is performed assuming that the format of the answers is `test_format`.

In [15]:
def inject_data_structure_check(nb, data_structure, test_format="TEXT_FORMAT"):
    for dependent in data_structure_dependencies_functions.get(data_structure, []):
        nb = replace_defn(nb, dependent, true_functions[dependent])
    for dependent in data_structure_dependencies_data_structures.get(data_structure, []):
        idx = find_all_cell_indices(nb, "code", "grader.check('%s')" % (dependent))[-1]
        if idx == None:
            idx = find_all_cell_indices(nb, "markdown", "**Question 1:**")[-1]
        nb = inject_code(nb, idx, true_data_structures[dependent])
        nb = remove_initializations(nb, dependent, start=idx+1)
        
    code = "import copy\n%s = copy.deepcopy(%s)\n\n" % (data_structure, data_structure)
    code += replace_variable(true_data_structures[data_structure], data_structure, "true_"+data_structure)
    nb = inject_code(nb, len(nb['cells']), code)
    
    test_code = "test_output = '%s results: '" % (data_structure)
    test_code += "+ public_tests.compare(true_%s, %s, '%s')" % (data_structure, data_structure, test_format)
    code = get_test_text(data_structure, test_code)
    nb = inject_code(nb, len(nb['cells']), code)
    return nb

## Random Data Generation

Here, functions are defined that can generate **random** data that is in the correct format.

**Warning:** This is the most complex function in the file, and is likely to have some bugs in it. So, **verify** this function **carefully**. The following **requirements** for this function **will not** be met by the function generated by GPT, it is **your responsibility** to modify the function so as to meet these requirements. Otherwise, the datasets are unlikely to produce interesting outputs for the project questions.
* The `ID` *158023* must appear in the dataset.
* There must be a **unique** highest `Value` in the dataset.
* There must be a **unique** highest `Wage` in the dataset.
* The `League` *Premier League (England)* must appear in the dataset.
* The `Team` *Manchester United* must appear in the dataset, and it must have a unique best player at `Position` *CDM*.
* The `Team` *Liverpool* must appear in the dataset, and it must have a unique best player at `Position` *RW*.
* The `Team` *Paris Saint Germain* must appear in the dataset.
* The `Team` *FC Bayern München* must appear in the dataset.
* The `Team` *FC Barcelona* must appear in the dataset.

In [16]:
import csv
import os
import random
from faker import Faker
from itertools import product

def random_data(directory, n=1000):
    faker = Faker()
    csv_filename = 'soccer_stars.csv'
    csv_path = os.path.join(directory, csv_filename)
    
    # Ensure that the IDs are unique
    IDs = list(set([random.randint(10, 1000000) for i in range(2*n)]))[:n]
    
    # Add required ID to list of IDs somewhere
    IDs[random.randint(1, n-1)] = 158023
    
    # Create some fake names in the correct format
    names = []
    for i in range(n):
        name_choice = random.randint(1, 10)
        if name_choice == 10:
            name = faker.last_name()
        elif name_choice == 9:
            name = faker.first_name_male() + " " + faker.last_name()
        elif name_choice == 8:
            name = faker.first_name_male() + ' Jr.'
        else:
            name = faker.first_name_male()[0] + ". " + faker.last_name()
        names.append(name)
        
    # Create some fake nations
    nations = list(set([faker.country() for i in range(2*(n//100 + 1))]))[:n//100+1]
    
    # Create some fake teams
    teams = list(set([faker.company() for i in range(2*(n//20 + 1))]))[:n//20+1]
    
    # Ensure that required teams appear in the dataset
    req_teams = ['Manchester United', 'Liverpool', 'Paris Saint Germain', 'FC Bayern München', 'FC Barcelona']
    for i in range(len(req_teams)):
        teams[i] = req_teams[i]
    random.shuffle(teams)
    
    # Create some fake leagues
    leagues = []
    for i in range(n//100):
        league_name = faker.company()
        country = random.choice(nations[:n//10+1])
        leagues.append(f"{league_name} ({country})")
        
    # Ensure that each team belongs to some league
    team_to_league = {}
    for team in teams:
        team_to_league[team] = random.choice(leagues)
        
    # Ensure that `Premier League (England)` appears as a league in the dataset
    for i in range(min(20, n//50+1)):
        team_to_league[teams[random.randint(0, len(teams)-1)]] = "Premier League (England)"
        
    # Create a list of values within the correct range
    raw_values = []
    for i in range(n):
        value_choice = random.randint(1, 10)
        if value_choice == 10:
            raw_values.append(random.randint(0, 3)*250)
        elif value_choice in [9, 8]:
            raw_values.append(random.randint(10**6, 10**7))
        elif value_choice == 7:
            raw_values.append(random.randint(10**7, 2*10**8))
        else:
            raw_values.append(random.randint(10**3, 10**6-1))
            
    # Ensure that the values are roughly and not exactly sorted in decreasing order
    raw_values.sort(reverse=True)
    # Ensure that the highest value is unique
    raw_values[0] += 5*10**7
    for i in range(0, n, 4):
        raw_values[i: i+5] = random.sample(raw_values[i: i+5], len(raw_values[i: i+5]))
        
    # Format the values correctly
    values = []
    for value in raw_values:
        if value >= 10**6:
            value_num = round(value/10**6, 1)
            value_char = 'M'
        elif value >= 10**3:
            value_num = round(value/10**3)
            value_char = 'K'
        else:
            value_num = value
            value_char = ''
        values.append('€%g%s' % (value_num, value_char))
        
    # Create a list of wages within the correct range    
    raw_wages = []
    for i in range(n):
        wage_choice = random.randint(1, 10)
        if wage_choice in [10, 9]:
            raw_wages.append(random.randint(0, 3)*250)
        elif wage_choice == 8:
            raw_wages.append(random.randint(10**5, 5*10**5))
        else:
            raw_wages.append(random.randint(10**3, 10**5))
            
    # Ensure that the wages are roughly and not exactly sorted in decreasing order
    raw_wages.sort(reverse=True)
    # Ensure that the highest wage is unique
    raw_wages[0] += 5 * 10**4
    for i in range(0, n, 4):
        raw_wages[i: i+5] = random.sample(raw_wages[i: i+5], len(raw_wages[i: i+5]))
        
    # Format the wages correctly
    wages = []
    for wage in raw_wages:
        if wage >= 10**3:
            wage_num = round(wage/10**3)
            wage_char = 'K'
        else:
            wage_num = wage
            wage_char = ''
        wages.append('€%g%s' % (wage_num, wage_char))
        
    
    # Create the list of unique positions
    positions = ['CAM', 'CB', 'CDM', 'CF', 'CM', 'GK', 
                 'LB', 'LM', 'LW', 'LWB', 'RB', 'RM', 
                 'RW', 'RWB', 'ST']

    # Create the list of unique Preferred Foot choices
    preferred_foot_choices = ['Right', 'Left']
    
    # For each pair (position, preferred_foot), create a random order of magnitude for each of the four stats
    order_of_magnitudes = {}
    for item in list(product(positions, preferred_foot_choices)):
        order_of_magnitudes[item] = {}
        for stat in ['Attacking', 'Defending', 'Movement', 'Goalkeeping']:
            order_of_magnitudes[item][stat] = random.randint(25, 75)
            
    # Create the data
    data = []
    for i in range(n):
        player_id = IDs[i]
        name = names[i]
        age = random.randint(20, 35) if random.random() < 0.75 else random.randint(15, 55)
        nationality = random.choice(nations)
        team = random.choice(teams)
        league = team_to_league[team]
        value = None
        wage = None
        position = random.choice(positions)
        preferred_foot = random.choice(preferred_foot_choices)
        order_tuple = (position, preferred_foot)
        player_stats = {}
        for stat in ['Attacking', 'Defending', 'Movement', 'Goalkeeping']:
            player_stats[stat] = order_of_magnitudes[order_tuple][stat] + random.uniform(-20, 20)
            player_stats[stat] = round(player_stats[stat], 1)
        sorted_stats = sorted(player_stats.values())
        overall_rating = round(0.75*sorted_stats[-1] + 0.25*sorted_stats[-2])
        height = f"{random.randint(150, 210)}cm"
        

        player_data = {
            'ID': player_id,
            'Name': name,
            'Age': age,
            'Nationality': nationality,
            'Team': team,
            'League': league,
            'Value': value,
            'Wage': wage,
            'Attacking': player_stats['Attacking'],
            'Movement': player_stats['Movement'],
            'Defending': player_stats['Defending'],
            'Goalkeeping': player_stats['Goalkeeping'],
            'Overall rating': overall_rating,
            'Position': position,
            'Height': height,
            'Preferred foot': preferred_foot
        }
        data.append(player_data)

    # Sort data by overall rating
    data.sort(key=lambda x: x['Overall rating'], reverse=True)
    
    # Ensure that the required teams have a unique best player at the required positions
    for row in data:
        if row['Team'] == 'Manchester United':
            break
    row['Position'] = 'CDM'
    row['Overall rating'] += 1
    for row in data:
        if row['Team'] == 'Liverpool':
            break
    row['Position'] = 'RW'
    row['Overall rating'] += 1
    data.sort(key=lambda x: x['Overall rating'], reverse=True)
    
    # Add Wage and Value data
    for i in range(n):
        data[i]['Value'] = values[i]
        data[i]['Wage'] = wages[i]
    
    # Write data to csv
    with open(csv_path, mode='w', newline='', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=['ID', 'Name', 'Age', 'Nationality', 'Team', 'League', 
                                                  'Value', 'Wage', 'Attacking', 'Movement', 'Defending', 
                                                  'Goalkeeping', 'Overall rating', 'Position', 'Height', 
                                                  'Preferred foot'])
        writer.writeheader()

        # Write the data into the file
        for row in data:
            writer.writerow(row)
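
Because the requirements listed in the warning above are not guaranteed by `random_data`, it can help to check a generated dataset against them. The following is a minimal sketch under stated assumptions: `euros_to_int`, `has_unique_max`, and `unmet_requirements` are hypothetical helpers, not part of `hidden_tests.py`, and the four sample rows (including their league names) are made up for illustration. Values and wages are compared after parsing the formatted strings, since rounding during formatting can turn distinct raw numbers into ties.

```python
import csv
import io

REQUIRED_TEAMS = ['Manchester United', 'Liverpool', 'Paris Saint Germain',
                  'FC Bayern München', 'FC Barcelona']


def euros_to_int(text):
    """Convert strings such as '€1.5M', '€20K' or '€500' to an integer."""
    text = text[1:]
    if text[-1] == 'K':
        return round(float(text[:-1]) * 10**3)
    if text[-1] == 'M':
        return round(float(text[:-1]) * 10**6)
    return round(float(text))


def has_unique_max(numbers):
    """True when `numbers` is non-empty and its maximum occurs exactly once."""
    return len(numbers) > 0 and numbers.count(max(numbers)) == 1


def unmet_requirements(rows):
    """Return descriptions of the dataset requirements that `rows` violates."""
    problems = []
    if '158023' not in [row['ID'] for row in rows]:
        problems.append('ID 158023 is missing')
    for col in ['Value', 'Wage']:
        if not has_unique_max([euros_to_int(row[col]) for row in rows]):
            problems.append('highest %s is not unique' % col)
    if 'Premier League (England)' not in [row['League'] for row in rows]:
        problems.append('Premier League (England) is missing')
    teams = set(row['Team'] for row in rows)
    for team in REQUIRED_TEAMS:
        if team not in teams:
            problems.append('team %s is missing' % team)
    for team, position in [('Manchester United', 'CDM'), ('Liverpool', 'RW')]:
        ratings = [int(row['Overall rating']) for row in rows
                   if row['Team'] == team and row['Position'] == position]
        if not has_unique_max(ratings):
            problems.append('%s has no unique best %s' % (team, position))
    return problems


# Made-up sample rows containing only the columns that the checks read
sample = """ID,Team,League,Value,Wage,Overall rating,Position
158023,FC Barcelona,League A (Spain),€100M,€500K,93,RW
2,Manchester United,Premier League (England),€50M,€200K,85,CDM
3,Liverpool,Premier League (England),€50M,€100K,88,RW
4,Paris Saint Germain,League B (France),€900K,€500,80,ST
"""
rows = list(csv.DictReader(io.StringIO(sample)))
problems = unmet_requirements(rows)
print(problems)
```

For a generated file, the rows can be read the same way by passing an open handle on `soccer_stars.csv` (with `encoding='utf-8'`) to `csv.DictReader`.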

## True Functions

Here, the **correct** versions of all functions that are defined in the notebook are stored. These functions are compared against the functions in the student notebook to check for their correctness.

In [17]:
true_functions = {}

In [18]:
true_functions['format_euros'] = """
def format_euros(euros):
    euros = euros[1:]
    if euros[-1] == 'K':
        euros = float(euros[:-1])*10**3
    elif euros[-1] == 'M':
        euros = float(euros[:-1])*10**6
    else:
        euros = float(euros)
    return round(euros)"""
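
As a quick worked example, the definition above can be run on its own against the three input shapes that the rubric distinguishes. The sample strings are illustrative and follow the `€<number><suffix>` format that `random_data` writes.

```python
def format_euros(euros):
    euros = euros[1:]  # drop the leading '€'
    if euros[-1] == 'K':
        euros = float(euros[:-1]) * 10**3
    elif euros[-1] == 'M':
        euros = float(euros[:-1]) * 10**6
    else:
        euros = float(euros)
    return round(euros)  # round() with one argument returns an int


examples = ['€110.5M', '€750K', '€500', '€0']
converted = [format_euros(text) for text in examples]
for text, number in zip(examples, converted):
    print(text, '->', number)
```

The final `round` both converts the float to an `int` and absorbs any floating-point noise introduced by the multiplication.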

In [19]:
true_functions['cell'] = """
import csv

def process_csv(filename):
    example_file = open(filename, encoding=\'utf-8\')
    example_reader = csv.reader(example_file)
    example_data = list(example_reader)
    example_file.close()
    return example_data
    
csv_data = process_csv(\'soccer_stars.csv\')
csv_header = csv_data[0]
csv_rows = csv_data[1:]

def cell(row_idx, col_name):
    col_idx = csv_header.index(col_name)
    val = csv_rows[row_idx][col_idx]
    if col_name in [\'ID\', \'Age\', \'Overall rating\']:
        return int(val)
    elif col_name in [\'Attacking\', \'Movement\', \'Defending\', \'Goalkeeping\']:
        return float(val)
    elif col_name in [\'Height\']:
        return int(val[:-2])
    elif col_name in [\'Wage\', \'Value\']:
        return format_euros(val)
    else:
        return val"""

In [20]:
true_functions['average_stat_by_position'] = """
def average_stat_by_position(col_name):
    position_total_count = {}
    position_total_stat = {}
    for player_id in players:
        position = players[player_id][\'Position\']
        player_stat = players[player_id][col_name]
        if position not in position_total_count:
            position_total_count[position] = 0
            position_total_stat[position] = 0
        position_total_count[position] += 1
        position_total_stat[position] += player_stat
    position_avg_stat = {}
    for position in position_total_count:
        position_avg_stat[position] = position_total_stat[position] / position_total_count[position]
    return position_avg_stat"""

In [21]:
true_functions['best_player_of_team_at_position'] = """
def best_player_of_team_at_position(position, team):
    best_player = None
    for player_id in players:
        player = players[player_id]
        if player[\'Position\'] != position or player[\'Team\'] != team:
            continue
        if best_player == None or player[\'Overall rating\'] > players[best_player][\'Overall rating\']:
            best_player = player_id
        elif player[\'Overall rating\'] == players[best_player][\'Overall rating\'] and player_id < best_player:
            best_player = player_id
    return best_player"""
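
The tie-breaking rule in the definition above (equal `Overall rating` resolves to the smaller player ID) is easy to miss, so here is a small illustration. The `players` dictionary below is made up and contains only the three keys that the function reads.

```python
# Made-up players; insertion order is chosen so that the tie-break branch runs
players = {
    30: {'Position': 'CDM', 'Team': 'A', 'Overall rating': 80},
    20: {'Position': 'CDM', 'Team': 'A', 'Overall rating': 85},
    10: {'Position': 'CDM', 'Team': 'A', 'Overall rating': 85},
    40: {'Position': 'RW', 'Team': 'A', 'Overall rating': 90},
}


def best_player_of_team_at_position(position, team):
    best_player = None
    for player_id in players:
        player = players[player_id]
        if player['Position'] != position or player['Team'] != team:
            continue
        if best_player == None or player['Overall rating'] > players[best_player]['Overall rating']:
            best_player = player_id
        elif player['Overall rating'] == players[best_player]['Overall rating'] and player_id < best_player:
            # Same rating: prefer the smaller ID
            best_player = player_id
    return best_player


best_cdm = best_player_of_team_at_position('CDM', 'A')
best_st = best_player_of_team_at_position('ST', 'A')
print(best_cdm)  # 10: players 20 and 10 tie on rating, and 10 is the smaller ID
print(best_st)   # None: team 'A' has no player at 'ST'
```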

In [22]:
true_functions['best_starting_players_of'] = """
positions = set()
for player_id in players:
    positions.add(players[player_id][\'Position\'])

def best_starting_players_of(team):
    starters = {}
    for position in positions:
        player_at_position = best_player_of_team_at_position(position, team)
        if player_at_position != None:
            starters[position] = player_at_position
    return starters"""

## True Data Structures

Here, the **correct** versions of all data structures that are defined in the notebook are stored. These data structures are compared against the data structures in the student notebook to check for their correctness.

In [23]:
true_data_structures = {}

In [24]:
true_data_structures['players'] = """
players = {}
for idx in range(len(csv_rows)):
    player = {}
    for col_name in csv_header:
        player[col_name] = cell(idx, col_name)
    players[player[\'ID\']] = player"""

## Original

The original test simply runs the student's notebook as it is (after removing cells with syntax errors, and performing other clean-up). This helps us detect if the student failed any public tests.

In [25]:
nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))

results['original'] = parse_nb(run_nb(nb, os.path.join(DIRECTORY, "hidden", "original", FILE)))

## Hardcode

The hardcode tests run the student's notebook on different datasets. However, `public_tests.py` remains unchanged. So, if the answers are hardcoded in the student's notebook, we expect their code to still pass the public tests on all the different datasets. If their code fails any one of the different hardcode datasets, we take that to mean that the answer is not hardcoded.
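
The acceptance check that the dataset-generation loop applies to each hardcode dataset can be sketched as a standalone predicate: a dataset is kept only if no graded question still passes on it, so that a later pass can only come from a hardcoded answer. The helper name `is_good_hardcode_dataset` and the sample result strings are hypothetical.

```python
PASSED = 'All test cases passed!'


def is_good_hardcode_dataset(results):
    """Return True when no graded question (a key starting with 'q') passes."""
    for qnum in results:
        if qnum.startswith('q') and results[qnum] == PASSED:
            return False
    return True


# Made-up parsed results for two candidate datasets
kept = is_good_hardcode_dataset({'q1': 'incorrect answer', 'q2': 'incorrect answer'})
rejected = is_good_hardcode_dataset({'q1': PASSED, 'q2': 'incorrect answer'})
print(kept)      # True: every question's answer changed with the dataset
print(rejected)  # False: q1 still passes, so the dataset must be regenerated
```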

In [26]:
for subdirectory in os.listdir(os.path.join(DIRECTORY, "hidden", "hardcode")):
    path = os.path.join(DIRECTORY, "hidden", "hardcode", subdirectory)
    good_dataset = False
    while not good_dataset:
        if os.path.exists(os.path.join(path, FILE)):
            nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))
        hardcode_results = parse_nb(run_nb(nb, os.path.join(path, FILE)))
        good_dataset = True
        for qnum in hardcode_results:
            if qnum.startswith('q') and hardcode_results[qnum] == 'All test cases passed!':
                print(qnum + ' failed!')
                good_dataset = False
                break
        if not good_dataset:
            random_data(path, 5000)
    print(subdirectory + ' done!')

q1 failed!
1 done!
q1 failed!
2 done!
q1 failed!
q4 failed!
3 done!


In [27]:
for hardcode in os.listdir(os.path.join(DIRECTORY, "hidden", "hardcode")):
    nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))
    results['hardcode: ' + hardcode] = parse_nb(run_nb(nb, os.path.join(DIRECTORY, "hidden", "hardcode", hardcode, FILE)))

## Rubric Tests

The tests for the rubric points will be defined below. Only the code inside the tags will be executed by `hidden_tests.py`; the code outside the tags is used for generating the hidden datasets in the first place.

### Instructions for creating rubric tests:

Functions inside `hidden_tests.py` can be used to modify the student notebook, before executing and parsing the outputs. It is recommended that before trying to create rubric tests, a user goes through all the functions inside `hidden_tests.py` first. Here is a list of commonly used functions that will be most useful:

* **`read_nb`**: `read_nb(file)` **reads** a `file` in the `.ipynb` file format and returns a `nb`.
* **`run_nb`**: `run_nb(nb, file)` **executes** `nb` at the location `file` and **writes** the contents back into `file`.
* **`parse_nb`**: `parse_nb(nb)` reads the contents of a student `nb` and **extracts** all graded questions and answers.
* **`truncate_nb`**: `truncate_nb(nb, start, end)` takes in a `nb`, and returns a **sliced** notebook between the cells indexed `start` and `end`.
* **`find_all_cell_indices`**: `find_all_cell_indices(nb, cell_type, marker)` returns **all** the indices in `nb` of cells of type `cell_type` whose source **contains** the `marker`.
* **`inject_code`**: `inject_code(nb, idx, code)` creates a **new** code cell in `nb` **after** the index `idx` with `code` in it.
* **`count_defns`**: `count_defns(nb, func_name)` **counts** the number of times `func_name` is defined in the `nb`.
* **`replace_defn`**: `replace_defn(nb, func_name, new_defn)` **replaces** the definition of `func_name` in `nb` with `new_defn`.
* **`replace_call`**: `replace_call(text, func_name, new_name)` **replaces** all **calls** and definition **names** to `func_name` with `new_name` in `text`.
* **`find_code`**: `find_code(nb, target)` returns the **number** of times that the **text** `target` appears in a code cell in `nb`.
* **`replace_code`**: `replace_code(nb, target, new_code, start, end)` **replaces** all instances of the **text** `target` in a code cell between the indices `start` and `end` with the **text** `new_code`.
* **`add_try_except`**: `add_try_except(text)` adds a (bare) **try/except block** around any given block of code.
* **`detect_restart_and_run_all`**: `detect_restart_and_run_all(nb)` flags if any **non-empty code cell** in `nb` is **not executed**.
* **`detect_imports`**: `detect_imports(nb)` returns a list of **all** the **import** statements in the `nb`.
* **`detect_ast_objects`**: `detect_ast_objects(nb, objects)` returns a dict of **all** cells in the `nb` with the **ast objects** `objects` in them.
* **`get_first_plot`**: `get_first_plot(nb, image_file)` returns the first **image** found in the output of a code cell in `nb`, and also stores it in `image_file` for reference.
* **`get_label_plot`**: `get_label_plot(plot, kind)` **crops** the `plot` and returns a plot containing just the **label** at the location indicated by `kind` - `"left"`, `"right"`, `"top"`, or `"bottom"`.
* **`get_without_label_plot`**: `get_without_label_plot(plot, kind)` **crops** the `plot` and returns a plot containing everything **except** the **label** at the location indicated by `kind` - `"left"`, `"right"`, `"top"`, or `"bottom"`.
* **`get_ticks_plot`**: `get_ticks_plot(plot, kind)` **crops** the `plot` and returns a plot containing just the **ticks** at the location indicated by `kind` - `"left"`, or `"bottom"`.
* **`get_without_ticks_plot`**: `get_without_ticks_plot(plot, kind)` **crops** the `plot` and returns a plot containing everything **except** the **ticks** at the location indicated by `kind` - `"left"`, or `"bottom"`.
* **`get_bounding_box_plot`**: `get_bounding_box_plot(plot)` **crops** the `plot` and returns a plot containing just the **bounding box** of the plot.
* **`check_text_in_plot`**: `check_text_in_plot(plot, expected_text)` checks if the `expected_text` is in the `plot`, and returns both the **missing** and the **extra** text in the given `plot`.

### format_euros: function logic is incorrect when the input ends with `"K"`

In [28]:
rubric_item = 'format_euros: function logic is incorrect when the input ends with `\"K\"`'
readme_text = """Check the implementation of `format_euros` for
converting inputs ending with 'K'. Ensure it
correctly interprets 'K' as thousands and rounds
the final integer appropriately."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [29]:
rubric_item = 'format_euros: function logic is incorrect when the input ends with `\"K\"`'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('format_euros')")[-1])

var_inputs_code = '''
import csv
def process_csv(filename):
    example_file = open(filename, encoding="utf-8")
    example_reader = csv.reader(example_file)
    example_data = list(example_reader)
    example_file.close()
    return example_data
    
csv_data = process_csv("soccer_stars.csv")
csv_header = csv_data[0]
csv_rows = csv_data[1:]

var_inputs = []
for row in csv_rows:
    if row[csv_header.index('Value')][-1] == 'K':
        var_inputs.append((row[csv_header.index('Value')],))
    if row[csv_header.index('Wage')][-1] == 'K':
        var_inputs.append((row[csv_header.index('Wage')],))
'''
nb = inject_function_logic_check(nb, 'format_euros', var_inputs_code, "TEXT_FORMAT")

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

### format_euros: function logic is incorrect when the input ends with `"M"`

In [30]:
rubric_item = 'format_euros: function logic is incorrect when the input ends with `\"M\"`'
readme_text = """Check your handling of inputs ending with "M". It
should correctly convert millions and round to the
nearest integer. Review conditions and
mathematical operations for accuracy."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [31]:
rubric_item = 'format_euros: function logic is incorrect when the input ends with `\"M\"`'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('format_euros')")[-1])

var_inputs_code = '''
import csv
def process_csv(filename):
    example_file = open(filename, encoding="utf-8")
    example_reader = csv.reader(example_file)
    example_data = list(example_reader)
    example_file.close()
    return example_data
    
csv_data = process_csv("soccer_stars.csv")
csv_header = csv_data[0]
csv_rows = csv_data[1:]

var_inputs = []
for row in csv_rows:
    if row[csv_header.index('Value')][-1] == 'M':
        var_inputs.append((row[csv_header.index('Value')],))
    if row[csv_header.index('Wage')][-1] == 'M':
        var_inputs.append((row[csv_header.index('Wage')],))
'''
nb = inject_function_logic_check(nb, 'format_euros', var_inputs_code, "TEXT_FORMAT")

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

### format_euros: function logic is incorrect when the input does not end with `"K"` or `"M"`

In [32]:
rubric_item = 'format_euros: function logic is incorrect when the input does not end with `\"K\"` or `\"M\"`'
readme_text = """The test found issues when the input does not end
with "K" or "M". Please ensure your function
handles such inputs correctly, converting the euro
amount directly to an integer and rounding to the
nearest integer if necessary. Review the handling
of different input cases."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [33]:
rubric_item = 'format_euros: function logic is incorrect when the input does not end with `\"K\"` or `\"M\"`'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('format_euros')")[-1])

var_inputs_code = '''
import csv
def process_csv(filename):
    example_file = open(filename, encoding="utf-8")
    example_reader = csv.reader(example_file)
    example_data = list(example_reader)
    example_file.close()
    return example_data
    
csv_data = process_csv("soccer_stars.csv")
csv_header = csv_data[0]
csv_rows = csv_data[1:]

var_inputs = []
for row in csv_rows:
    if row[csv_header.index('Value')][-1] not in ['K', 'M']:
        var_inputs.append((row[csv_header.index('Value')],))
    if row[csv_header.index('Wage')][-1] not in ['K', 'M']:
        var_inputs.append((row[csv_header.index('Wage')],))
'''

nb = inject_function_logic_check(nb, 'format_euros', var_inputs_code, 'TEXT_FORMAT')

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))
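
The three `format_euros` tests above split the `Value` and `Wage` strings by their final character (`"K"`, `"M"`, or neither). As a rough illustration of the behavior the README texts describe, here is a minimal sketch. The name `format_euros_sketch` is hypothetical, and this is only one possible implementation, not necessarily the reference solution used by the grader.

```python
def format_euros_sketch(euros):
    """Convert a string such as "€110.5M" or "€565K" into an integer number of euros."""
    amount = euros.lstrip("€")        # drop the leading currency symbol, if present
    if amount.endswith("K"):          # suffix "K": the number is in thousands
        return round(float(amount[:-1]) * 1_000)
    if amount.endswith("M"):          # suffix "M": the number is in millions
        return round(float(amount[:-1]) * 1_000_000)
    return round(float(amount))       # no suffix: the number is already in euros


for text in ["€565K", "€110.5M", "€0"]:
    print(text, "->", format_euros_sketch(text))
```

Each branch corresponds to one of the three rubric points, which is why the hidden tests build a separate `var_inputs` list per suffix.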

### cell: variables `csv_data`, `csv_header`, and `csv_rows` are not defined as expected

In [34]:
rubric_item = 'cell: variables `csv_data`, `csv_header`, and `csv_rows` are not defined as expected'
readme_text = """Ensure `csv_data`, `csv_header`, and `csv_rows`
are correctly defined and populated from
`soccer_stars.csv`. Verify that `process_csv` is
defined and called with the correct filename, and
`csv_data` is correctly split into `csv_header`
and `csv_rows`."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [35]:
random_data(directories[rubric_item], 100)

In [36]:
rubric_item = 'cell: variables `csv_data`, `csv_header`, and `csv_rows` are not defined as expected'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('cell')")[-1])

test_code = '''
import csv
def true_process_csv(filename):
    example_file = open(filename, encoding="utf-8")
    example_reader = csv.reader(example_file)
    example_data = list(example_reader)
    example_file.close()
    return example_data
    
true_csv_data = true_process_csv("soccer_stars.csv")
true_csv_header = true_csv_data[0]
true_csv_rows = true_csv_data[1:]

if public_tests.compare(true_csv_data, csv_data, 'TEXT_FORMAT_ORDERED_LIST') == public_tests.PASS:
    if public_tests.compare(true_csv_header, csv_header, 'TEXT_FORMAT_ORDERED_LIST') == public_tests.PASS:
        if public_tests.compare(true_csv_rows, csv_rows, 'TEXT_FORMAT_ORDERED_LIST') == public_tests.PASS:
            test_output = "cell results: All test cases passed!"
        else:
            test_output = "cell results: 'csv_rows' not defined as expected"
    else:
        test_output = "cell results: 'csv_header' not defined as expected"
else:
    test_output = "cell results: 'csv_data' not defined as expected"
'''
nb = inject_code(nb, len(nb['cells']), get_test_text("cell", test_code))

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))
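
The check above compares `csv_data`, `csv_header`, and `csv_rows` against values recomputed from the file. The split it expects can be sketched on a small in-memory CSV. The three-line sample below is made up for illustration and stands in for `soccer_stars.csv`.

```python
import csv
import io

# Hypothetical stand-in for soccer_stars.csv
sample = "ID,Name,Age\n1,Alice,30\n2,Bob,25\n"

csv_data = list(csv.reader(io.StringIO(sample)))
csv_header = csv_data[0]   # the first row holds the column names
csv_rows = csv_data[1:]    # every remaining row is a data row

print(csv_header)
print(csv_rows)
```

Note that `csv.reader` yields every field as a string, so any typecasting is left to later steps such as the `cell` function.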

### cell: function does not typecast the correct columns to `int` or `float` as expected

In [37]:
rubric_item = 'cell: function does not typecast the correct columns to `int` or `float` as expected'
readme_text = """Ensure that your `cell` function correctly
typecasts column values to the specified data
types. Review the list of columns and their target
data types, and compare these to your typecasting
logic. Consider using type conversion functions
like `int()` for integers and `float()` for
floating-point numbers where appropriate."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [38]:
rubric_item = 'cell: function does not typecast the correct columns to `int` or `float` as expected'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('cell')")[-1])

var_inputs_code = '''
csv_data = process_csv('soccer_stars.csv')
csv_header = csv_data[0]
csv_rows = csv_data[1:]
type_cast_cols = ['ID', 'Name', 'Age', 'Nationality', 'Team', 'League', 'Overall rating', 'Attacking', 'Movement', 'Defending', 'Goalkeeping', 'Position', 'Preferred foot']
var_inputs = [(idx, col) for idx in range(len(csv_rows)) for col in type_cast_cols]
'''
nb = inject_function_logic_check(nb, 'cell', var_inputs_code, "TEXT_FORMAT")

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

### cell: function does not format the `Height` column correctly

In [39]:
rubric_item = 'cell: function does not format the `Height` column correctly'
readme_text = """Check that your `cell` function correctly slices
off the unit from the `Height` value and converts
it to an integer. Revisit string slicing and typecasting
to integer in Python."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [40]:
rubric_item = 'cell: function does not format the `Height` column correctly'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('cell')")[-1])

var_inputs_code = '''
csv_data = process_csv('soccer_stars.csv')
csv_header = csv_data[0]
csv_rows = csv_data[1:]
var_inputs = [(idx, 'Height') for idx in range(len(csv_rows))]
'''
nb = inject_function_logic_check(nb, 'cell', var_inputs_code, "TEXT_FORMAT")

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

### cell: function does not use `format_euros` to format the relevant columns

In [41]:
rubric_item = 'cell: function does not use `format_euros` to format the relevant columns'
readme_text = """Ensure your `cell` function converts 'Value' and
'Wage' to integers using the `format_euros` helper
function. Test your function for different row
indices and especially on 'Value' and 'Wage'
columns to confirm the conversion."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [42]:
rubric_item = 'cell: function does not use `format_euros` to format the relevant columns'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('cell')")[-1])

var_inputs_code = '''
csv_data = process_csv('soccer_stars.csv')
csv_header = csv_data[0]
csv_rows = csv_data[1:]
var_inputs = [(i, 'Value') for i in range(len(csv_rows))] + \
             [(i, 'Wage') for i in range(len(csv_rows))]
'''
nb = inject_function_logic_check(nb, 'cell', var_inputs_code, 'TEXT_FORMAT')

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

### cell: function typecasts based on the column index and not the `col_name`

In [43]:
rubric_item = 'cell: function typecasts based on the column index and not the `col_name`'
readme_text = """Check that your `cell` function dynamically
locates the column index using the `col_name` and
does not rely on hardcoded indices, as column
order may change. Ensure typecasting is done
correctly based on column name, not index."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [44]:
import os
import csv
import copy
import random

# Define the modify_data function to permute the columns in the dataset
def modify_data(directory):
    # Define the path to the soccer_stars.csv file
    data_path = os.path.join(directory, 'soccer_stars.csv')
    
    # Read in the current data
    with open(data_path, mode='r', encoding='utf-8', newline='') as file:
        reader = csv.reader(file)
        data = list(reader)
    
    # Shuffle the order of the columns
    header = copy.deepcopy(data[0])
    random.shuffle(header)
    column_order = {column: data[0].index(column) for column in header}
    
    # Reorder each row according to the new column order
    new_data = [header]
    for row in data[1:]:
        new_data.append([row[column_order[column]] for column in header])
    
    # Write the permuted data back to the file
    with open(data_path, mode='w', encoding='utf-8', newline='') as file:
        writer = csv.writer(file)
        writer.writerows(new_data)

random_data(directories[rubric_item], 1000)
modify_data(directories[rubric_item])
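
The idea behind permuting the columns can be shown on a tiny in-memory table: a lookup that resolves the column by name keeps returning the same value after the shuffle, while a lookup with a hardcoded index generally does not. The data and the `lookup` helper below are hypothetical and only illustrate the principle.

```python
import random

header = ["ID", "Name", "Age"]
rows = [["1", "Alice", "30"], ["2", "Bob", "25"]]

# Permute the header, then reorder every row to match the new column order.
new_header = header[:]
random.shuffle(new_header)
new_rows = [[row[header.index(col)] for col in new_header] for row in rows]


def lookup(hdr, data, row_idx, col_name):
    """Name-based lookup: the result does not depend on the column order."""
    return data[row_idx][hdr.index(col_name)]


# Every (row, column name) pair resolves to the same value before and after.
for i in range(len(rows)):
    for col in header:
        assert lookup(header, rows, i, col) == lookup(new_header, new_rows, i, col)
```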

In [45]:
rubric_item = 'cell: function typecasts based on the column index and not the `col_name`'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('cell')")[-1])

# Create inputs to test the function
var_inputs_code = '''
csv_data = process_csv('soccer_stars.csv')
csv_header = csv_data[0]
csv_rows = csv_data[1:]
var_inputs = [(row_idx, col_name) for row_idx in range(5) for col_name in csv_header]
'''
nb = inject_function_logic_check(nb, 'cell', var_inputs_code, 'TEXT_FORMAT')

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

### players: data structure is not defined correctly

In [46]:
rubric_item = 'players: data structure is not defined correctly'
readme_text = """Check that your implementation of the `players`
dictionary ensures each inner dictionary properly
maps column names to their values. Ensure data
types for each value correspond to the example
given, such as integer for 'Age', 'Value', 'Wage',
and 'Overall rating'. Make sure to use the given
`cell` function correctly to retrieve cell data."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [47]:
rubric_item = 'players: data structure is not defined correctly'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('players')")[-1])

nb = inject_data_structure_check(nb, 'players', "TEXT_FORMAT_DICT")

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

### players: logic used to define data structure is incorrect

In [48]:
rubric_item = 'players: logic used to define data structure is incorrect'
readme_text = """Ensure your logic for creating the dictionary of
dictionaries matches the provided solution. Check
for correct looping through rows and columns, and
that each player's ID maps to their respective
data. Verify that column names correctly match
their values."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [49]:
random_data(directories[rubric_item], 2000)

In [50]:
rubric_item = 'players: logic used to define data structure is incorrect'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('players')")[-1])

nb = inject_data_structure_check(nb, 'players', "TEXT_FORMAT_DICT")

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

### players: `cell` function is not used to read data

In [51]:
rubric_item = 'players: `cell` function is not used to read data'
readme_text = """Ensure you are using the `cell` function correctly
to retrieve data from the dataset. Review its
usage in your code and check if you followed the
provided data structure specifications accurately.
If values don't match, it may be due to incorrect
indexing or misuse of the `cell` function."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [52]:
random_data(directories[rubric_item], 100)

In [53]:
rubric_item = 'players: `cell` function is not used to read data'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('players')")[-1])

nb = inject_data_structure_check(nb, 'players', "TEXT_FORMAT_DICT")

false_cell_function = '''
import csv
import random
random.seed(0)

def process_csv(filename):
    example_file = open(filename, encoding=\'utf-8\')
    example_reader = csv.reader(example_file)
    example_data = list(example_reader)
    example_file.close()
    return example_data
    
false_csv_data = process_csv(\'soccer_stars.csv\')
false_csv_header = false_csv_data[0]
false_csv_rows = false_csv_data[1:]

csv_col_data = []
for col in range(len(false_csv_header)):
    csv_col_data.append([])
    for idx in range(len(false_csv_rows)):
        csv_col_data[-1].append(false_csv_rows[idx][col])
    random.shuffle(csv_col_data[-1])

false_teams, false_leagues = [], []
for row in false_csv_rows:
    false_teams.append(row[false_csv_header.index('Team')])
    false_leagues.append(row[false_csv_header.index('League')])

false_teams = sorted(list(set(false_teams)))
false_leagues = sorted(list(set(false_leagues)))
false_team_to_league = {}
for team in false_teams:
    false_team_to_league[team] = random.choice(false_leagues)
    
scrambled_rows = []
for idx in range(len(false_csv_rows)):
    scrambled_rows.append([])
    for col in range(len(csv_col_data)):
        scrambled_rows[-1].append(csv_col_data[col][idx])
        if col == false_csv_header.index('League'):
            scrambled_rows[-1][-1] = false_team_to_league[scrambled_rows[-1][false_csv_header.index('Team')]]
            
scrambled_rows.sort(key=lambda row: int(row[false_csv_header.index('Overall rating')]), reverse=True)

def cell(row_idx, col_name):
    col_idx = false_csv_header.index(col_name)
    val = scrambled_rows[row_idx][col_idx]
    if col_name in ['ID', 'Age', 'Overall rating']:
        return int(val)
    elif col_name in ['Attacking', 'Movement', 'Defending', 'Goalkeeping']:
        return float(val)
    elif col_name in ['Height']:
        return int(val[:-2])
    elif col_name in ['Wage', 'Value']:
        return format_euros(val)
    else:
        return val
'''
nb = replace_with_false_function(nb, 'cell', false_cell_function)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

### q1: `players` data structure is not used to read data

In [54]:
rubric_item = 'q1: `players` data structure is not used to read data'
readme_text = """The test checks if the 'players' data structure is
correctly utilized to fetch statistics. To pass,
ensure you reference the 'players' dictionary with
the player ID as the key to get all column values.
Revisit your code and verify you're using
'players' as specified."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [55]:
random_data(directories[rubric_item], 1000)

In [56]:
rubric_item = 'q1: `players` data structure is not used to read data'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q1')")[-1])

new_players = true_data_structures['players'] + '''
import random
random.seed(0)

col_data = {}
for player_id in players:
    for col in players[player_id]:
        if col not in col_data:
            col_data[col] = []
        col_data[col].append(players[player_id][col])
        
for col in col_data:
    random.shuffle(col_data[col])

false_teams = sorted(list(set(col_data['Team'])))
false_leagues = sorted(list(set(col_data['League'])))
false_team_to_league = {}
for team in false_teams:
    false_team_to_league[team] = random.choice(false_leagues)
    
dict_rows = []
for idx in range(len(col_data['ID'])):
    dict_rows.append({})
    for col in col_data:
        dict_rows[-1][col] = col_data[col][idx]
        if col == 'League':
            dict_rows[-1][col] = false_team_to_league[dict_rows[-1]['Team']]
dict_rows.sort(key=lambda row: row['Overall rating'], reverse=True)
players = {}
for row in dict_rows:
    players[row['ID']] = row
'''
nb = replace_with_false_data_structure(nb, 'players', new_players)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [57]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))
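
The false `players` structure used above is built by shuffling every column independently. A reduced sketch of that idea follows, using made-up data and omitting the `Team`/`League` remapping and the sort by `Overall rating`. Each column keeps the same set of values, but the values no longer belong to the same player, so code that bypasses `players` should produce answers that disagree with the regenerated expected output.

```python
import random

random.seed(0)  # fixed seed so the scrambling is reproducible

# Hypothetical miniature version of the `players` dictionary
players = {
    1: {"ID": 1, "Name": "Alice", "Age": 30},
    2: {"ID": 2, "Name": "Bob", "Age": 25},
    3: {"ID": 3, "Name": "Cara", "Age": 41},
}

# Collect each column, then shuffle every column independently.
col_data = {}
for stats in players.values():
    for col, value in stats.items():
        col_data.setdefault(col, []).append(value)
for values in col_data.values():
    random.shuffle(values)

# Rebuild rows from the shuffled columns and key them by the (shuffled) ID.
scrambled = {}
for idx in range(len(players)):
    row = {col: col_data[col][idx] for col in col_data}
    scrambled[row["ID"]] = row

print(scrambled)
```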

### q2: incorrect logic is used to find the player with the highest `Value`

In [58]:
rubric_item = 'q2: incorrect logic is used to find the player with the highest `Value`'
readme_text = """Ensure your logic correctly identifies the player
with the highest `Value`. Verify that you loop
through all players, compare their `Value`
correctly, and update the highest valued player
accordingly. Consider boundary conditions and
initialization of variables. Review your
comparison and assignment statements."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [59]:
import os
import csv
import random

def modify_data(directory):
    """
    This function modifies the dataset by selecting a random player that does not have
    the highest value and then changing the 'Value' column so that it now becomes the
    highest value in the dataset.
    """
    # Load the dataset from the CSV file
    soccer_stars_csv = os.path.join(directory, 'soccer_stars.csv')
    
    with open(soccer_stars_csv, mode='r', encoding='utf-8', newline='') as file:
        reader = csv.DictReader(file)
        data = list(reader)
    
    # Find the current highest value in the dataset
    highest_value = max([player['Value'] for player in data], key=lambda x: float(x[1:].replace('M', 'e6').replace('K', 'e3')))
    
    # Select a random player that does not have the highest value
    player = random.choice([p for p in data if p['Value'] != highest_value])
    
    # Modify the 'Value' of the selected player to be higher than the current highest value
    new_value = float(highest_value[1:].replace('M', 'e6').replace('K', 'e3')) + 100e6
    
    # Format the new value with the right suffix and exactly one decimal place
    if new_value >= 1e6:
        new_value = f"€{new_value/1e6:.1f}M"  # Represent in millions with 'M'
    else:
        new_value = f"€{new_value/1e3:.1f}K"  # Represent in thousands with 'K'
    player['Value'] = new_value

    # Save the modified dataset back to the CSV file
    with open(soccer_stars_csv, mode='w', encoding='utf-8', newline='') as file:
        writer = csv.DictWriter(file, fieldnames=data[0].keys())
        writer.writeheader()
        writer.writerows(data)

# Call the modify_data function with the given project directory
modify_data(directories[rubric_item])
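
The `modify_data` function above compares `Value` strings by dropping the leading `€` and rewriting the suffix as a power-of-ten exponent that `float` can parse. A standalone sketch of that trick is shown below. The helper name `euros_to_float` and the sample strings are hypothetical.

```python
def euros_to_float(text):
    """Parse strings such as "€110.5M" by turning the suffix into an exponent."""
    # "€110.5M" -> "110.5M" -> "110.5e6" -> 110500000.0
    return float(text[1:].replace("M", "e6").replace("K", "e3"))


values = ["€565K", "€110.5M", "€0"]
highest = max(values, key=euros_to_float)
print(highest, euros_to_float(highest))
```

This relies on every string starting with exactly one currency character, since `text[1:]` drops the first character unconditionally.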

In [60]:
rubric_item = 'q2: incorrect logic is used to find the player with the highest `Value`'
nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q2')")[-1])

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [61]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q2: incorrect logic is used to find the statistics of the player with the highest `Value`

In [62]:
rubric_item = 'q2: incorrect logic is used to find the statistics of the player with the highest `Value`'
readme_text = """Check your code to ensure that it correctly
identifies the player with the highest 'Value' and
that you are extracting all their statistics
correctly. The test uses a modified dataset, so
make sure you're not relying on implicit dataset
patterns, but instead using logical comparisons to
find the correct result."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [63]:
import os
import csv
import random

def modify_data(directory):
    # Read the csv file and store the rows in a list
    soccer_stars_path = os.path.join(directory, 'soccer_stars.csv')
    with open(soccer_stars_path, mode='r', encoding='utf-8', newline='') as file:
        reader = csv.DictReader(file)
        rows = list(reader)

    # Find the index of the row with the highest 'Value'
    highest_value_index = None
    highest_value = -1
    for index, row in enumerate(rows):
        value = float(row['Value'][1:].replace('M', 'e6').replace('K', 'e3'))
        if value > highest_value:
            highest_value = value
            highest_value_index = index
    
    # Extract the column names without 'ID' and 'Value'
    column_names = [col for col in rows[0].keys() if col not in ('ID', 'Value')]

    # For each chosen column, modify the value to match that of the row with the highest 'Value'
    for column in column_names:
        value_to_copy = rows[highest_value_index][column]
        # Pick a random row other than the one with the highest 'Value'
        random_row_index = random.choice([i for i in range(len(rows)) if i != highest_value_index])
        rows[random_row_index][column] = value_to_copy

    # Write the modified rows back to the csv file
    with open(soccer_stars_path, mode='w', encoding='utf-8', newline='') as file:
        writer = csv.DictWriter(file, fieldnames=rows[0].keys())
        writer.writeheader()
        for row in rows:
            writer.writerow(row)

# Generate random data
random_data(directories[rubric_item], 1000)
modify_data(directories[rubric_item])

In [64]:
rubric_item = 'q2: incorrect logic is used to find the statistics of the player with the highest `Value`'
nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q2')")[-1])

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [65]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q2: `players` data structure is not used to read data

In [66]:
rubric_item = 'q2: `players` data structure is not used to read data'
readme_text = """The test checks if the 'players' data structure is
correctly utilized to fetch statistics. To pass,
ensure you reference the 'players' dictionary with
the player ID as the key to get all column values.
Revisit your code and verify you're using
'players' as specified."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [67]:
random_data(directories[rubric_item], 1000)

In [68]:
rubric_item = 'q2: `players` data structure is not used to read data'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q2')")[-1])

new_players = true_data_structures['players'] + '''
import random
random.seed(0)

col_data = {}
for player_id in players:
    for col in players[player_id]:
        if col not in col_data:
            col_data[col] = []
        col_data[col].append(players[player_id][col])
        
for col in col_data:
    random.shuffle(col_data[col])

false_teams = sorted(list(set(col_data['Team'])))
false_leagues = sorted(list(set(col_data['League'])))
false_team_to_league = {}
for team in false_teams:
    false_team_to_league[team] = random.choice(false_leagues)
    
dict_rows = []
for idx in range(len(col_data['ID'])):
    dict_rows.append({})
    for col in col_data:
        dict_rows[-1][col] = col_data[col][idx]
        if col == 'League':
            dict_rows[-1][col] = false_team_to_league[dict_rows[-1]['Team']]
dict_rows.sort(key=lambda row: row['Overall rating'], reverse=True)
players = {}
for row in dict_rows:
    players[row['ID']] = row
'''
nb = replace_with_false_data_structure(nb, 'players', new_players)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [69]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q3: incorrect logic is used to find the player with the highest `Wage`

In [70]:
rubric_item = 'q3: incorrect logic is used to find the player with the highest `Wage`'
readme_text = """Ensure that your logic correctly identifies the
unique player with the highest wage by comparing
each player's wage. Review your loop and
comparison conditions for potential errors."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [71]:
import os
import csv
import random

def modify_data(directory):
    # Complete path to the soccer_stars.csv file
    file_path = os.path.join(directory, 'soccer_stars.csv')

    # Read the data from csv
    with open(file_path, mode='r', newline='', encoding='utf-8') as file:
        reader = csv.DictReader(file)
        data = list(reader)
    
    # Get the maximum wage and its nationality
    max_wage = max(data, key=lambda x: float(x['Wage'][1:].replace('K','e3').replace('M','e6')))
    max_wage_value = float(max_wage['Wage'][1:].replace('K','e3').replace('M','e6'))
    max_wage_nationality = max_wage['Nationality']

    # Select a random player that does not have the maximum wage
    players_except_max_wage = [player for player in data if player['Wage'] != max_wage['Wage']]
    random_player = random.choice(players_except_max_wage)
    
    # Give the chosen player a wage higher than the current maximum (expressed in
    # thousands, hence the division by 1e3) and the Nationality of the original top earner
    random_player['Wage'] = f"€{(max_wage_value + 100e3)/1e3:.1f}K"
    random_player['Nationality'] = max_wage_nationality

    # Give every other player a Nationality different from the new top earner's,
    # so that the expected answer is unique
    other_nationalities = sorted(set(p['Nationality'] for p in data) - {max_wage_nationality})
    for player in data:
        if player['Wage'] != random_player['Wage']:
            player['Nationality'] = random.choice(other_nationalities)

    # Write the updated data back to csv
    with open(file_path, mode='w', newline='', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=data[0].keys())
        writer.writeheader()
        writer.writerows(data)

# Call the modify_data function with the appropriate directory
modify_data(directories[rubric_item])
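
When a modified amount is written back to the CSV, the divisor has to match the suffix: divide by `1e3` for `"K"` and by `1e6` for `"M"`. The sketch below shows a round trip under that rule. Both helper names are hypothetical and are not part of `hidden_tests.py`.

```python
def to_euro_string(amount):
    """Format a numeric euro amount with an 'M' or 'K' suffix and one decimal place."""
    if amount >= 1e6:
        return f"€{amount / 1e6:.1f}M"   # millions
    return f"€{amount / 1e3:.1f}K"       # thousands


def euros_to_float(text):
    """Inverse of to_euro_string, up to the rounding applied when formatting."""
    return float(text[1:].replace("M", "e6").replace("K", "e3"))


for amount in [450e3, 210.5e6]:
    text = to_euro_string(amount)
    print(amount, "->", text, "->", euros_to_float(text))
```

A mismatched divisor and suffix would shrink the amount by a factor of a thousand, so the player meant to be the top earner would no longer be the maximum.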

In [72]:
rubric_item = 'q3: incorrect logic is used to find the player with the highest `Wage`'
nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q3')")[-1])

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [73]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q3: incorrect logic is used to find the `Nationality` of the player with the highest `Wage`

In [74]:
rubric_item = 'q3: incorrect logic is used to find the `Nationality` of the player with the highest `Wage`'
readme_text = """Check that your code correctly identifies the
player with the highest wage by comparing wages,
not by other attributes. Ensure wages are accessed
appropriately and comparisons are made correctly."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [75]:
import os
import random
import csv

def modify_data(directory):
    # Load the data from the csv file
    file_path = os.path.join(directory, 'soccer_stars.csv')
    players = []
    with open(file_path, 'r', encoding='utf-8', newline='') as file:
        reader = csv.DictReader(file)
        for row in reader:
            players.append(row)
    
    # Find the player with the highest wage
    highest_wage = 0
    player_with_highest_wage = None
    for player in players:
        wage_in_euros = float(player['Wage'][1:].replace('M', 'e6').replace('K', 'e3'))
        if wage_in_euros > highest_wage:
            highest_wage = wage_in_euros
            player_with_highest_wage = player
    
    # Set of unique nationalities to avoid having the same nationality as the player with the highest wage
    unique_nationalities = set(player['Nationality'] for player in players)
    try:
        unique_nationalities.remove(player_with_highest_wage['Nationality'])
    except KeyError:
        pass
    
    # Extract the column names without 'ID', 'Wage', and 'Nationality'
    column_names = [col for col in players[0].keys() if col not in ('ID', 'Wage', 'Nationality')]

    # For each chosen column, modify the value to match that of the row with the highest 'Wage'
    for column in column_names:
        value_to_copy = player_with_highest_wage[column]
        # Pick a random row other than the one with the highest 'Wage'
        random_player = random.choice([p for p in players if p != player_with_highest_wage])
        random_player[column] = value_to_copy
        
    for player in players:
        if player != player_with_highest_wage:
            player['Nationality'] = random.choice(list(unique_nationalities))
    
    # Write the data back to the csv file
    with open(file_path, 'w', encoding='utf-8', newline='') as file:
        fieldnames = players[0].keys()
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(players)

# First, create completely modified random data
random_data(directories[rubric_item], 1000)

# Modify the random data as per the test requirements
modify_data(directories[rubric_item])

In [76]:
rubric_item = 'q3: incorrect logic is used to find the `Nationality` of the player with the highest `Wage`'
nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q3')")[-1])

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [77]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q3: `players` data structure is not used to read data

In [78]:
rubric_item = 'q3: `players` data structure is not used to read data'
readme_text = """The test checks if the 'players' data structure is
correctly utilized to fetch statistics. To pass,
ensure you reference the 'players' dictionary with
the player ID as the key to get all column values.
Revisit your code and verify you're using
'players' as specified."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [79]:
random_data(directories[rubric_item], 1000)

In [80]:
rubric_item = 'q3: `players` data structure is not used to read data'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q3')")[-1])

new_players = true_data_structures['players'] + '''
import random
random.seed(0)

col_data = {}
for player_id in players:
    for col in players[player_id]:
        if col not in col_data:
            col_data[col] = []
        col_data[col].append(players[player_id][col])
        
for col in col_data:
    random.shuffle(col_data[col])

false_teams = sorted(list(set(col_data['Team'])))
false_leagues = sorted(list(set(col_data['League'])))
false_team_to_league = {}
for team in false_teams:
    false_team_to_league[team] = random.choice(false_leagues)
    
dict_rows = []
for idx in range(len(col_data['ID'])):
    dict_rows.append({})
    for col in col_data:
        dict_rows[-1][col] = col_data[col][idx]
        if col == 'League':
            dict_rows[-1][col] = false_team_to_league[dict_rows[-1]['Team']]
dict_rows.sort(key=lambda row: row['Overall rating'], reverse=True)
players = {}
for row in dict_rows:
    players[row['ID']] = row
'''
nb = replace_with_false_data_structure(nb, 'players', new_players)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))
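
The replacement `players` above shuffles every column independently, so each row of the false structure mixes values that originally belonged to different players. A notebook that reads from `players` follows the shuffled values, while one that re-reads the CSV does not, and the two outputs diverge. Below is a minimal sketch of that idea on a made-up three-player dictionary. The helper name `shuffle_columns` and all sample values are hypothetical, and the `League` remapping step is left out for brevity.

```python
import random

# Hypothetical miniature of the `players` structure: ID -> row dict.
players = {
    1: {'ID': 1, 'Name': 'A', 'Team': 'X', 'Overall rating': 90},
    2: {'ID': 2, 'Name': 'B', 'Team': 'Y', 'Overall rating': 85},
    3: {'ID': 3, 'Name': 'C', 'Team': 'Z', 'Overall rating': 80},
}

def shuffle_columns(players, seed=0):
    """Return a new ID -> row dict in which every column is shuffled independently."""
    rng = random.Random(seed)
    # Collect the values of each column into its own list.
    col_data = {}
    for row in players.values():
        for col, value in row.items():
            col_data.setdefault(col, []).append(value)
    # Shuffle each column on its own, which breaks the row-wise association.
    for values in col_data.values():
        rng.shuffle(values)
    # Rebuild rows from the shuffled columns and key them by the (shuffled) ID.
    n_rows = len(col_data['ID'])
    rows = [{col: col_data[col][i] for col in col_data} for i in range(n_rows)]
    return {row['ID']: row for row in rows}

false_players = shuffle_columns(players)
```

Every column keeps the same set of values, so the false structure still looks plausible; only the pairing of values within a row changes.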

In [81]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q4: the player with the highest `Wage` is recomputed

In [82]:
rubric_item = 'q4: the player with the highest `Wage` is recomputed'
readme_text = """Ensure you're using the `ID` from Question 3 to
find the `Position` of the player with the highest
`Wage`. Check if you've stored the `ID` correctly
and are not computing the player with the highest
`Wage` again. Review how to access the `Position`
using the stored `ID`."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [83]:
import os
import random
import csv

def modify_data(directory):
    # Load the data from the csv file
    file_path = os.path.join(directory, 'soccer_stars.csv')
    players = []
    with open(file_path, 'r', encoding='utf-8', newline='') as file:
        reader = csv.DictReader(file)
        for row in reader:
            players.append(row)
    
    # Find the player with the highest wage
    highest_wage = 0
    player_with_highest_wage = None
    for player in players:
        wage_in_euros = float(player['Wage'][1:].replace('M', 'e6').replace('K', 'e3'))
        if wage_in_euros > highest_wage:
            highest_wage = wage_in_euros
            player_with_highest_wage = player
    
    # Set of unique positions to avoid having the same position as the player with the highest wage
    unique_positions = set(player['Position'] for player in players)
    try:
        unique_positions.remove(player_with_highest_wage['Position'])
    except KeyError:
        pass
        
    for player in players:
        if player != player_with_highest_wage:
            player['Position'] = random.choice(list(unique_positions))
    
    # Write the data back to the csv file
    with open(file_path, 'w', encoding='utf-8', newline='') as file:
        fieldnames = players[0].keys()
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(players)

# First, create completely modified random data
random_data(directories[rubric_item], 1000)

# Modify the random data as per the test requirements
modify_data(directories[rubric_item])
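
`modify_data` compares wages numerically by turning the wage strings into floats: the leading currency symbol is dropped, and the `K` and `M` suffixes become exponents that `float` can parse. A small sketch of that conversion follows. The helper name `wage_to_float` and the sample values are hypothetical, and the sketch assumes a single leading currency character, as the slice `[1:]` in the cell above does.

```python
def wage_to_float(wage):
    """Convert a wage string such as '€565K' or '€1.5M' to a float."""
    # Drop the leading currency symbol, then let float() parse the exponent form.
    return float(wage[1:].replace('M', 'e6').replace('K', 'e3'))

# Made-up sample values, not taken from soccer_stars.csv.
sample_wages = ['€565K', '€1.5M', '€900']
highest = max(sample_wages, key=wage_to_float)
```

Comparing the raw strings instead would order them lexicographically, which is why the conversion happens before the maximum is taken.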

In [84]:
rubric_item = 'q4: the player with the highest `Wage` is recomputed'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q4')")[-1])

# Inject code that finds the ID of the player with the highest Wage and lowers that Wage
inject_code_before_q4 = '''
max_wage = 0
max_wage_index = None
for i, player in players.items():
    if player['Wage'] > max_wage:
        max_wage = player['Wage']
        max_wage_index = i

# Modify the player's Wage
players[max_wage_index]['Wage'] = 100
'''
# Inject our modification code before the start of Question 4.
nb = inject_code(nb, find_all_cell_indices(nb, "markdown", '**Question 4:**')[-1], inject_code_before_q4)

# Now run the modified notebook and parse the output.
results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))
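
The injected code lowers the `Wage` of the current top earner just before Question 4, so a solution that recomputes the maximum lands on a different player than the one whose `ID` was stored in Question 3. A toy illustration of the effect, with hypothetical IDs, wages, and positions:

```python
# Hypothetical miniature of `players`, with wages already converted to numbers.
players = {
    10: {'Wage': 500, 'Position': 'ST'},
    20: {'Wage': 300, 'Position': 'GK'},
}

def highest_wage_id(players):
    """Return the ID of the player with the largest Wage."""
    return max(players, key=lambda pid: players[pid]['Wage'])

stored_id = highest_wage_id(players)   # what Question 3 is expected to store
players[stored_id]['Wage'] = 100       # the injected modification

# Reusing the stored ID still reaches the original player.
expected_position = players[stored_id]['Position']
# Recomputing the maximum now reaches a different player.
recomputed_position = players[highest_wage_id(players)]['Position']
```

Since `modify_data` assigns every other player a `Position` different from the top earner's, the recomputed answer cannot coincide with the expected one by accident.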

In [85]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q4: incorrect logic is used to find the `Position` of the player with the highest `Wage`

In [86]:
rubric_item = 'q4: incorrect logic is used to find the `Position` of the player with the highest `Wage`'
readme_text = """Check that the logic in your code uses the `ID`
from Question 3 to find the `Position` of the
player with the highest `Wage`. Avoid re-computing
or using any other attributes to identify the
player. Make sure the `ID` is correctly stored and
used in Question 4."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [87]:
import os
import random
import csv

def modify_data(directory):
    # Load the data from the csv file
    file_path = os.path.join(directory, 'soccer_stars.csv')
    players = []
    with open(file_path, 'r', encoding='utf-8', newline='') as file:
        reader = csv.DictReader(file)
        for row in reader:
            players.append(row)
    
    # Find the player with the highest wage
    highest_wage = 0
    player_with_highest_wage = None
    for player in players:
        wage_in_euros = float(player['Wage'][1:].replace('M', 'e6').replace('K', 'e3'))
        if wage_in_euros > highest_wage:
            highest_wage = wage_in_euros
            player_with_highest_wage = player
    
    # Set of unique positions to avoid having the same position as the player with the highest wage
    unique_positions = set(player['Position'] for player in players)
    try:
        unique_positions.remove(player_with_highest_wage['Position'])
    except KeyError:
        pass
    
    # Extract the column names, excluding 'ID', 'Wage', and 'Position'
    column_names = [col for col in players[0].keys() if col not in ('ID', 'Wage', 'Position')]

    # For each chosen column, modify the value to match that of the row with the highest 'Wage'
    for column in column_names:
        value_to_copy = player_with_highest_wage[column]
        # Pick a random row other than the one with the highest 'Wage'
        random_player = random.choice([p for p in players if p != player_with_highest_wage])
        random_player[column] = value_to_copy
        
    for player in players:
        if player != player_with_highest_wage:
            player['Position'] = random.choice(list(unique_positions))
    
    # Write the data back to the csv file
    with open(file_path, 'w', encoding='utf-8', newline='') as file:
        fieldnames = players[0].keys()
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(players)

# First, create completely modified random data
random_data(directories[rubric_item], 1000)

# Modify the random data as per the test requirements
modify_data(directories[rubric_item])
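
This data set plants decoys: the top earner's value in every column except `ID`, `Wage`, and `Position` is copied into some other randomly chosen row, so a lookup keyed on, say, a name can hit the wrong player. A minimal sketch with hypothetical rows (the column `Name` and all values are assumptions made for illustration):

```python
# Hypothetical rows; the second one carries a copied 'Name' as a decoy.
rows = [
    {'ID': '1', 'Name': 'Top Earner', 'Wage': 900, 'Position': 'ST'},
    {'ID': '2', 'Name': 'Top Earner', 'Wage': 200, 'Position': 'CB'},
    {'ID': '3', 'Name': 'Someone Else', 'Wage': 100, 'Position': 'GK'},
]

top = max(rows, key=lambda row: row['Wage'])

# A lookup by ID is unambiguous.
by_id = [row['Position'] for row in rows if row['ID'] == top['ID']]

# A lookup by name also matches the decoy row.
by_name = [row['Position'] for row in rows if row['Name'] == top['Name']]
```

Only the `ID` identifies the player uniquely here, which is the behaviour this rubric point is meant to check.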

In [88]:
rubric_item = 'q4: incorrect logic is used to find the `Position` of the player with the highest `Wage`'
nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q4')")[-1])

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [89]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q4: `players` data structure is not used to read data

In [90]:
rubric_item = 'q4: `players` data structure is not used to read data'
readme_text = """The test checks if the 'players' data structure is
correctly utilized to fetch statistics. To pass,
ensure you reference the 'players' dictionary with
the player ID as the key to get all column values.
Revisit your code and verify you're using
'players' as specified."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [91]:
random_data(directories[rubric_item], 1000)

In [92]:
rubric_item = 'q4: `players` data structure is not used to read data'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q4')")[-1])

new_players = true_data_structures['players'] + '''
import random
random.seed(0)

col_data = {}
for player_id in players:
    for col in players[player_id]:
        if col not in col_data:
            col_data[col] = []
        col_data[col].append(players[player_id][col])
        
for col in col_data:
    random.shuffle(col_data[col])

false_teams = sorted(list(set(col_data['Team'])))
false_leagues = sorted(list(set(col_data['League'])))
false_team_to_league = {}
for team in false_teams:
    false_team_to_league[team] = random.choice(false_leagues)
    
dict_rows = []
for idx in range(len(col_data['ID'])):
    dict_rows.append({})
    for col in col_data:
        dict_rows[-1][col] = col_data[col][idx]
        if col == 'League':
            dict_rows[-1][col] = false_team_to_league[dict_rows[-1]['Team']]
dict_rows.sort(key=lambda row: row['Overall rating'], reverse=True)
players = {}
for row in dict_rows:
    players[row['ID']] = row
'''
nb = replace_with_false_data_structure(nb, 'players', new_players)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [93]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q5: incorrect logic is used to answer

In [94]:
rubric_item = 'q5: incorrect logic is used to answer'
readme_text = """Ensure your code appends teams to the list only if
they belong to 'Premier League (England)'. Convert
the list to a set to remove duplicates, then
convert it back to a list. Review dataset
modifications to account for any test failures."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [95]:
random_data(directories[rubric_item], 2000)

In [96]:
rubric_item = 'q5: incorrect logic is used to answer'
nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q5')")[-1])

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [97]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q5: teams whose `League` is not exactly as required are added to the list

In [98]:
rubric_item = 'q5: teams whose `League` is not exactly as required are added to the list'
readme_text = """Ensure your code matches the league exactly,
considers duplicates, and is case-sensitive. The
list should only include unique team names. Check
string comparison and the `set` function for
uniqueness."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [99]:
import csv
import os
import random

# Define the function 'modify_data' to make modifications to the dataset
def modify_data(directory):
    # Define the path to the soccer_stars.csv file using os.path.join
    soccer_stars_path = os.path.join(directory, 'soccer_stars.csv')
    
    # Set variables for the incorrect league variations to add
    incorrect_leagues = [
        'Premier LeaguE (England)',
        'PremierLeague(England)',
        'Premier League',
        'Premier League (England) EPL',
        'English Premier League (England)'
    ]
    
    # Open the soccer_stars.csv file in read mode
    with open(soccer_stars_path, mode='r', encoding='utf-8', newline='') as file:
        # Read the contents of the CSV file
        reader = csv.DictReader(file)
        players = list(reader)  # Convert reader to a list for easy manipulation
        
    # Identify the League each Team belongs to
    team_to_league = {}
    for player in players:
        team_to_league[player['Team']] = player['League']
        
    # Replace the League of every team outside the Premier League with an incorrect variant
    for team in team_to_league:
        if team_to_league[team] != 'Premier League (England)':
            team_to_league[team] = random.choice(incorrect_leagues)
            
    # Write the (possibly altered) league of each team back to its players
    for player in players:
        player['League'] = team_to_league[player['Team']]

    # Open the soccer_stars.csv file in write mode to overwrite it with new data
    with open(soccer_stars_path, mode='w', encoding='utf-8', newline='') as file:
        fieldnames = players[0].keys()
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(players)

# First, create completely modified random data
random_data(directories[rubric_item], 2000)
# Then replace every other league with a near-miss variant
modify_data(directories[rubric_item])
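
The near-miss league names are chosen so that loose comparisons let extra teams through. The sketch below contrasts an exact match with two looser checks on a hypothetical team-to-league mapping. The team names are made up, and the looser checks are only examples of mistakes this test could catch, not a list of what students actually write.

```python
TARGET = 'Premier League (England)'

# Hypothetical team -> league pairs, including near-miss spellings.
team_to_league = {
    'Team A': 'Premier League (England)',
    'Team B': 'Premier LeaguE (England)',
    'Team C': 'Premier League',
    'Team D': 'Premier League (England) EPL',
}

# Exact, case-sensitive comparison: only the real match survives.
exact = sorted(t for t, league in team_to_league.items() if league == TARGET)

# Case-insensitive comparison wrongly admits 'Team B'.
case_insensitive = sorted(t for t, league in team_to_league.items()
                          if league.lower() == TARGET.lower())

# Substring test wrongly admits 'Team D'.
substring = sorted(t for t, league in team_to_league.items() if TARGET in league)
```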

In [100]:
rubric_item = 'q5: teams whose `League` is not exactly as required are added to the list'
nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q5')")[-1])

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [101]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q5: `players` data structure is not used to read data

In [102]:
rubric_item = 'q5: `players` data structure is not used to read data'
readme_text = """The test checks if the 'players' data structure is
correctly utilized to fetch statistics. To pass,
ensure you reference the 'players' dictionary with
the player ID as the key to get all column values.
Revisit your code and verify you're using
'players' as specified."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [103]:
random_data(directories[rubric_item], 1000)

In [104]:
rubric_item = 'q5: `players` data structure is not used to read data'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q5')")[-1])

new_players = true_data_structures['players'] + '''
import random
random.seed(0)

col_data = {}
for player_id in players:
    for col in players[player_id]:
        if col not in col_data:
            col_data[col] = []
        col_data[col].append(players[player_id][col])
        
for col in col_data:
    random.shuffle(col_data[col])

false_teams = sorted(list(set(col_data['Team'])))
false_leagues = sorted(list(set(col_data['League'])))
false_team_to_league = {}
for team in false_teams:
    false_team_to_league[team] = random.choice(false_leagues)
    
dict_rows = []
for idx in range(len(col_data['ID'])):
    dict_rows.append({})
    for col in col_data:
        dict_rows[-1][col] = col_data[col][idx]
        if col == 'League':
            dict_rows[-1][col] = false_team_to_league[dict_rows[-1]['Team']]
dict_rows.sort(key=lambda row: row['Overall rating'], reverse=True)
players = {}
for row in dict_rows:
    players[row['ID']] = row
'''
nb = replace_with_false_data_structure(nb, 'players', new_players)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [105]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q6: incorrect logic is used to answer

In [106]:
rubric_item = 'q6: incorrect logic is used to answer'
readme_text = """Examine if the counts were incremented correctly
for each `Preferred foot` and check whether the
foot values were extracted dynamically from the
`players` data structure without hardcoding.
Adjust your code to handle various foot entries
and to ensure proper tallying."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [107]:
random_data(directories[rubric_item], 1000)

In [108]:
rubric_item = 'q6: incorrect logic is used to answer'
nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q6')")[-1])

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [109]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q6: the keys of `preferred_foot_count` are hardcoded

In [110]:
rubric_item = 'q6: the keys of `preferred_foot_count` are hardcoded'
readme_text = """Check if the keys in your dictionary
`preferred_foot_count` are determined by iterating
through the `players` data structure, rather than
being explicitly defined in your code. Make sure
the keys are dynamically created based on the
dataset."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [111]:
import os
import csv
import random

def modify_data(directory):
    # Define the path for the soccer_stars.csv file using os.path.join
    file_path = os.path.join(directory, 'soccer_stars.csv')
    
    # Open the csv file for reading and create a list of dictionaries for its contents
    with open(file_path, mode='r', encoding='utf-8', newline='') as file:
        reader = csv.DictReader(file)
        data = list(reader)
    
    # A list of random new foot preferences (ensuring it's different from 'Left' and 'Right')
    new_foot_preferences = ['Ambidextrous', 'None', 'Hands']
    
    # Modify each player's 'Preferred foot' with a randomly chosen new preference
    for player in data:
        player['Preferred foot'] = random.choice(new_foot_preferences)
    
    # Open the csv file for writing and overwrite it with the modified data
    with open(file_path, mode='w', encoding='utf-8', newline='') as file:
        writer = csv.DictWriter(file, fieldnames=reader.fieldnames)
        writer.writeheader()
        writer.writerows(data)

# First, create completely modified random data
random_data(directories[rubric_item], 1000)

# Now modify the data using the function we just created
modify_data(directories[rubric_item])
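
With every `Preferred foot` value replaced, a dictionary whose keys are typed in by hand no longer matches the data. The following sketch contrasts the two approaches on a hypothetical miniature of `players`. It is an illustration of the failure mode, not the reference solution.

```python
# Hypothetical miniature of `players` using the replacement foot values.
players = {
    1: {'Preferred foot': 'Hands'},
    2: {'Preferred foot': 'None'},
    3: {'Preferred foot': 'Hands'},
}

# Keys come from the data, so previously unseen values are handled automatically.
preferred_foot_count = {}
for player in players.values():
    foot = player['Preferred foot']
    preferred_foot_count[foot] = preferred_foot_count.get(foot, 0) + 1

# A hardcoded version never sees the new values and stays at zero.
hardcoded_count = {'Left': 0, 'Right': 0}
for player in players.values():
    if player['Preferred foot'] in hardcoded_count:
        hardcoded_count[player['Preferred foot']] += 1
```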

In [112]:
rubric_item = 'q6: the keys of `preferred_foot_count` are hardcoded'
nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q6')")[-1])

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [113]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q6: `players` data structure is not used to read data

In [114]:
rubric_item = 'q6: `players` data structure is not used to read data'
readme_text = """The test checks if the 'players' data structure is
correctly utilized to fetch statistics. To pass,
ensure you reference the 'players' dictionary with
the player ID as the key to get all column values.
Revisit your code and verify you're using
'players' as specified."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [115]:
random_data(directories[rubric_item], 1000)

In [116]:
rubric_item = 'q6: `players` data structure is not used to read data'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q6')")[-1])

new_players = true_data_structures['players'] + '''
import random
random.seed(0)

col_data = {}
for player_id in players:
    for col in players[player_id]:
        if col not in col_data:
            col_data[col] = []
        col_data[col].append(players[player_id][col])
        
for col in col_data:
    random.shuffle(col_data[col])

false_teams = sorted(list(set(col_data['Team'])))
false_leagues = sorted(list(set(col_data['League'])))
false_team_to_league = {}
for team in false_teams:
    false_team_to_league[team] = random.choice(false_leagues)
    
dict_rows = []
for idx in range(len(col_data['ID'])):
    dict_rows.append({})
    for col in col_data:
        dict_rows[-1][col] = col_data[col][idx]
        if col == 'League':
            dict_rows[-1][col] = false_team_to_league[dict_rows[-1]['Team']]
dict_rows.sort(key=lambda row: row['Overall rating'], reverse=True)
players = {}
for row in dict_rows:
    players[row['ID']] = row
'''
nb = replace_with_false_data_structure(nb, 'players', new_players)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [117]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q7: incorrect logic is used to answer

In [118]:
rubric_item = 'q7: incorrect logic is used to answer'
readme_text = """Verify the logic your code uses to compute the
averages: check how the sums and counts are
accumulated and how the division is performed.
Ensure you extract the foot values dynamically and
divide each sum by the matching count. Review dict
comprehensions and the accumulation of sums and
counts."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [119]:
random_data(directories[rubric_item], 1000)

In [120]:
rubric_item = 'q7: incorrect logic is used to answer'
nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q7')")[-1])

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [121]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q7: the keys of `preferred_foot_avg_overall` are hardcoded

In [122]:
rubric_item = 'q7: the keys of `preferred_foot_avg_overall` are hardcoded'
readme_text = """The test checks for hardcoded keys in your
`preferred_foot_avg_overall` dictionary. Ensure
you are not directly specifying the foot names in
the dictionary. Instead, extract these dynamically
from your data. Re-examine your code to ensure the
foot names are derived from the `players`
dictionary keys or values."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [123]:
import os
import csv
import random

def modify_data(directory):
    # Read the current data from soccer_stars.csv
    soccer_stars_path = os.path.join(directory, 'soccer_stars.csv')
    with open(soccer_stars_path, mode='r', newline='', encoding='utf-8') as file:
        reader = csv.DictReader(file)
        players_data = list(reader)

    # Generate new unique preferred foot values that are different from 'Left' and 'Right'
    new_feet = ['Foot A', 'Foot B', 'Foot C']

    # Replace every 'Preferred foot' value with a random new one; this also changes the number of unique values
    for player in players_data:
        player['Preferred foot'] = random.choice(new_feet)

    # Write the modified data back to soccer_stars.csv
    with open(soccer_stars_path, mode='w', newline='', encoding='utf-8') as file:
        writer = csv.DictWriter(file, fieldnames=players_data[0].keys())
        writer.writeheader()
        writer.writerows(players_data)

# Call to completely modify the dataset
random_data(directories[rubric_item], 1000)  # The size of the dataset is specified as 1000

# Call modify_data with the updated project directory
modify_data(directories[rubric_item])
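
An average per `Preferred foot` that survives this data set needs a running sum and a running count for each value found in the data. One possible shape of that computation, on hypothetical rows with the replacement foot names (a sketch, not the reference solution):

```python
# Hypothetical miniature of `players` with numeric ratings.
players = {
    1: {'Preferred foot': 'Foot A', 'Overall rating': 90},
    2: {'Preferred foot': 'Foot B', 'Overall rating': 80},
    3: {'Preferred foot': 'Foot A', 'Overall rating': 70},
}

totals = {}  # foot -> sum of ratings
counts = {}  # foot -> number of players
for player in players.values():
    foot = player['Preferred foot']
    totals[foot] = totals.get(foot, 0) + player['Overall rating']
    counts[foot] = counts.get(foot, 0) + 1

# Divide each sum by the count for the same key.
preferred_foot_avg_overall = {foot: totals[foot] / counts[foot] for foot in totals}
```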

In [124]:
rubric_item = 'q7: the keys of `preferred_foot_avg_overall` are hardcoded'
nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q7')")[-1])

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [125]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q7: `players` data structure is not used to read data

In [126]:
rubric_item = 'q7: `players` data structure is not used to read data'
readme_text = """The test checks if the 'players' data structure is
correctly utilized to fetch statistics. To pass,
ensure you reference the 'players' dictionary with
the player ID as the key to get all column values.
Revisit your code and verify you're using
'players' as specified."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [127]:
random_data(directories[rubric_item], 1000)

In [128]:
rubric_item = 'q7: `players` data structure is not used to read data'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q7')")[-1])

new_players = true_data_structures['players'] + '''
import random
random.seed(0)

col_data = {}
for player_id in players:
    for col in players[player_id]:
        if col not in col_data:
            col_data[col] = []
        col_data[col].append(players[player_id][col])
        
for col in col_data:
    random.shuffle(col_data[col])

false_teams = sorted(list(set(col_data['Team'])))
false_leagues = sorted(list(set(col_data['League'])))
false_team_to_league = {}
for team in false_teams:
    false_team_to_league[team] = random.choice(false_leagues)
    
dict_rows = []
for idx in range(len(col_data['ID'])):
    dict_rows.append({})
    for col in col_data:
        dict_rows[-1][col] = col_data[col][idx]
        if col == 'League':
            dict_rows[-1][col] = false_team_to_league[dict_rows[-1]['Team']]
dict_rows.sort(key=lambda row: row['Overall rating'], reverse=True)
players = {}
for row in dict_rows:
    players[row['ID']] = row
'''
nb = replace_with_false_data_structure(nb, 'players', new_players)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [129]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q8: incorrect logic is used to answer

In [130]:
rubric_item = 'q8: incorrect logic is used to answer'
readme_text = """Ensure you are iterating over player IDs and
extracting position without hardcoding. Check if
you are correctly initializing and incrementing
position counts in the dictionary. The logic must
dynamically count positions from the given
dataset. Review and test your code with different
datasets."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [131]:
random_data(directories[rubric_item], 2000)

In [132]:
rubric_item = 'q8: incorrect logic is used to answer'
nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q8')")[-1])

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [133]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q8: the keys of `positions_count` are hardcoded

In [134]:
rubric_item = 'q8: the keys of `positions_count` are hardcoded'
readme_text = """Ensure you are dynamically extracting position
names from the dataset, not hardcoding them. Check
your loop and dictionary key assignment logic,
referring to the solution's method for updating
the count for each position. Update your code to
respond to varying datasets."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [135]:
import csv
import os
import random

def modify_data(directory):
    """
    Modify the 'soccer_stars.csv' file within the given project directory.
    The values under the 'Position' column are changed so none of the new values
    are the same as in the original dataset, and the number of unique values
    under this column is also made different from the original dataset.
    """
    soccer_stars_csv = os.path.join(directory, 'soccer_stars.csv')
    
    with open(soccer_stars_csv, 'r', newline='', encoding='utf-8') as file:
        reader = csv.DictReader(file)
        players_data = list(reader)
        
    # List of positions not in the original dataset
    new_positions = ['A', 'B', 'C', 'D', 'E']
    
    # Assign new positions randomly to each player
    for player in players_data:
        player['Position'] = random.choice(new_positions)
    
    # Write modified data back to csv
    with open(soccer_stars_csv, 'w', newline='', encoding='utf-8') as file:
        fieldnames = players_data[0].keys()  # get fieldnames from the data
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        
        writer.writeheader()
        writer.writerows(players_data)

# Call the function to completely modify the dataset as prescribed
random_data(directories[rubric_item])
modify_data(directories[rubric_item])
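
Each `modify_data` helper follows the same read, edit, write pattern with `csv.DictReader` and `csv.DictWriter`. The sketch below runs that pattern on an in-memory file (`io.StringIO`) instead of `soccer_stars.csv`, using made-up rows, to show that only the `Position` column changes while the other columns pass through untouched.

```python
import csv
import io
import random

# Made-up CSV content standing in for soccer_stars.csv.
original = "ID,Name,Position\n1,A,ST\n2,B,GK\n"

# Read: every row becomes a dict keyed by the header names.
rows = list(csv.DictReader(io.StringIO(original)))

# Edit: overwrite one column with values that do not occur in the original.
rng = random.Random(0)
new_positions = ['A', 'B', 'C', 'D', 'E']
for row in rows:
    row['Position'] = rng.choice(new_positions)

# Write: reuse the keys of the first row as the header.
buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
modified = buffer.getvalue()
```

Passing `newline=''` mirrors the `open(..., newline='')` calls in the cells above, which leaves line-ending handling to the `csv` module.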

In [136]:
rubric_item = 'q8: the keys of `positions_count` are hardcoded'
nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q8')")[-1])

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [137]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q8: `players` data structure is not used to read data

In [138]:
rubric_item = 'q8: `players` data structure is not used to read data'
readme_text = """The test checks if the 'players' data structure is
correctly utilized to fetch statistics. To pass,
ensure you reference the 'players' dictionary with
the player ID as the key to get all column values.
Revisit your code and verify you're using
'players' as specified."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [139]:
random_data(directories[rubric_item], 1000)

In [140]:
rubric_item = 'q8: `players` data structure is not used to read data'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q8')")[-1])

new_players = true_data_structures['players'] + '''
import random
random.seed(0)

col_data = {}
for player_id in players:
    for col in players[player_id]:
        if col not in col_data:
            col_data[col] = []
        col_data[col].append(players[player_id][col])
        
for col in col_data:
    random.shuffle(col_data[col])

false_teams = sorted(list(set(col_data['Team'])))
false_leagues = sorted(list(set(col_data['League'])))
false_team_to_league = {}
for team in false_teams:
    false_team_to_league[team] = random.choice(false_leagues)
    
dict_rows = []
for idx in range(len(col_data['ID'])):
    dict_rows.append({})
    for col in col_data:
        dict_rows[-1][col] = col_data[col][idx]
        if col == 'League':
            dict_rows[-1][col] = false_team_to_league[dict_rows[-1]['Team']]
dict_rows.sort(key=lambda row: row['Overall rating'], reverse=True)
players = {}
for row in dict_rows:
    players[row['ID']] = row
'''
nb = replace_with_false_data_structure(nb, 'players', new_players)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [141]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q9: incorrect logic is used to answer

In [142]:
rubric_item = 'q9: incorrect logic is used to answer'
readme_text = """This test runs on an altered dataset to examine
the logic used for calculating averages. Review
the division of total age by count for each
position. Ensure you're accounting for all players
and not using hardcoded values."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [143]:
random_data(directories[rubric_item], 2000)

In [144]:
rubric_item = 'q9: incorrect logic is used to answer'
nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q9')")[-1])

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [145]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q9: the keys of `positions_avg_age` are hardcoded

In [146]:
rubric_item = 'q9: the keys of `positions_avg_age` are hardcoded'
readme_text = """The test validates that `positions_avg_age` keys
are dynamically populated from the data. Verify
your code iterates through the `players`
dictionary and extracts position information
directly from it, ensuring it creates keys based
on the actual positions found, rather than using
hardcoded values."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [147]:
import os
import random
import csv

def modify_data(directory):
    """
    Modify the dataset such that the 'Position' column has completely different values
    from the original dataset and has a different number of unique values.
    """

    # Define the path to the soccer_stars.csv within the directory
    csv_path = os.path.join(directory, 'soccer_stars.csv')

    # Define new positions that are different from the original dataset
    new_positions = ['FWD', 'MID', 'DEF', 'GKP', 'WNG', 'SPT', 'DMF', 'AMF']

    # Read in the data from the existing csv file
    with open(csv_path, mode='r', newline='', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        rows = list(reader)
    
    # Change the 'Position' column to a new value from the list of new_positions
    for row in rows:
        row['Position'] = random.choice(new_positions)

    # Write the modified data back to the csv file
    with open(csv_path, mode='w', newline='', encoding='utf-8') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=reader.fieldnames)
        writer.writeheader()
        writer.writerows(rows)

# Generate the random dataset for this rubric item first.
random_data(directories[rubric_item], 1000)

# Then overwrite the 'Position' column of the generated file.
# modify_data must run after random_data.
modify_data(directories[rubric_item])

In [148]:
rubric_item = 'q9: the keys of `positions_avg_age` are hardcoded'
nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q9')")[-1])

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [149]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q9: `players` data structure is not used to read data
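
The false `players` structure used in this test shuffles every column independently, so each column keeps the same set of values but the association between values in a row is broken. Code that bypasses `players` (for example, by reading the CSV file again) would then be expected to produce different answers. The sketch below shows the shuffle on a toy dictionary; the four sample players are assumptions made for illustration, and the Team/League handling of the real test is left out.

```python
import random

random.seed(0)

# Hypothetical toy version of the `players` dictionary.
players = {
    1: {"ID": 1, "Position": "ST", "Age": 30},
    2: {"ID": 2, "Position": "ST", "Age": 20},
    3: {"ID": 3, "Position": "GK", "Age": 28},
    4: {"ID": 4, "Position": "GK", "Age": 40},
}

# Collect each column's values, then shuffle every column independently.
col_data = {}
for player in players.values():
    for col, value in player.items():
        col_data.setdefault(col, []).append(value)
for col in col_data:
    random.shuffle(col_data[col])

# Rebuild rows from the shuffled columns and key them by the (shuffled) ID.
false_players = {}
for idx in range(len(col_data["ID"])):
    row = {col: col_data[col][idx] for col in col_data}
    false_players[row["ID"]] = row
```

Every ID, position, and age still occurs exactly as often as before; only the pairing between them changes.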

In [150]:
rubric_item = 'q9: `players` data structure is not used to read data'
readme_text = """The test checks if the 'players' data structure is
correctly utilized to fetch statistics. To pass,
ensure you reference the 'players' dictionary with
the player ID as the key to get all column values.
Revisit your code and verify you're using
'players' as specified."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [151]:
random_data(directories[rubric_item], 1000)

In [152]:
rubric_item = 'q9: `players` data structure is not used to read data'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q9')")[-1])

new_players = true_data_structures['players'] + '''
import random
random.seed(0)

col_data = {}
for player_id in players:
    for col in players[player_id]:
        if col not in col_data:
            col_data[col] = []
        col_data[col].append(players[player_id][col])
        
for col in col_data:
    random.shuffle(col_data[col])

false_teams = sorted(list(set(col_data['Team'])))
false_leagues = sorted(list(set(col_data['League'])))
false_team_to_league = {}
for team in false_teams:
    false_team_to_league[team] = random.choice(false_leagues)
    
dict_rows = []
for idx in range(len(col_data['ID'])):
    dict_rows.append({})
    for col in col_data:
        dict_rows[-1][col] = col_data[col][idx]
        if col == 'League':
            dict_rows[-1][col] = false_team_to_league[dict_rows[-1]['Team']]
dict_rows.sort(key=lambda row: row['Overall rating'], reverse=True)
players = {}
for row in dict_rows:
    players[row['ID']] = row
'''
nb = replace_with_false_data_structure(nb, 'players', new_players)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [153]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q10: incorrect logic is used to answer

In [154]:
rubric_item = 'q10: incorrect logic is used to answer'
readme_text = """Ensure you cycle through all player entries to sum
heights and count occurrences by position, then
calculate the average height for each position.
Avoid hardcoding. Check and debug your loop and
dictionary operations for correct logic."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [155]:
random_data(directories[rubric_item], 2000)

In [156]:
rubric_item = 'q10: incorrect logic is used to answer'
nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q10')")[-1])

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [157]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q10: the keys of `positions_avg_height` are hardcoded

In [158]:
rubric_item = 'q10: the keys of `positions_avg_height` are hardcoded'
readme_text = """Ensure positions are dynamically determined from
the dataset and not hardcoded. Check how you
retrieve and aggregate `Position` data, and use
looping constructs to handle different `Position`
values present in the `players` dictionary."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [159]:
import os
import random
import csv

def modify_data(directory):
    """
    Modify the dataset such that the 'Position' column has completely different values
    from the original dataset and has a different number of unique values.
    """

    # Define the path to the soccer_stars.csv within the directory
    csv_path = os.path.join(directory, 'soccer_stars.csv')

    # Define new positions that are different from the original dataset
    new_positions = ['FWD', 'MID', 'DEF', 'GKP', 'WNG', 'SPT', 'DMF', 'AMF']

    # Read in the data from the existing csv file
    with open(csv_path, mode='r', newline='', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        rows = list(reader)
    
    # Change the 'Position' column to a new value from the list of new_positions
    for row in rows:
        row['Position'] = random.choice(new_positions)

    # Write the modified data back to the csv file
    with open(csv_path, mode='w', newline='', encoding='utf-8') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=reader.fieldnames)
        writer.writeheader()
        writer.writerows(rows)

# Generate the random dataset for this rubric item first.
random_data(directories[rubric_item], 1000)

# Then overwrite the 'Position' column of the generated file.
# modify_data must run after random_data.
modify_data(directories[rubric_item])

In [160]:
rubric_item = 'q10: the keys of `positions_avg_height` are hardcoded'
nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q10')")[-1])

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [161]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q10: `players` data structure is not used to read data

In [162]:
rubric_item = 'q10: `players` data structure is not used to read data'
readme_text = """The test checks if the 'players' data structure is
correctly utilized to fetch statistics. To pass,
ensure you reference the 'players' dictionary with
the player ID as the key to get all column values.
Revisit your code and verify you're using
'players' as specified."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [163]:
random_data(directories[rubric_item], 1000)

In [164]:
rubric_item = 'q10: `players` data structure is not used to read data'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q10')")[-1])

new_players = true_data_structures['players'] + '''
import random
random.seed(0)

col_data = {}
for player_id in players:
    for col in players[player_id]:
        if col not in col_data:
            col_data[col] = []
        col_data[col].append(players[player_id][col])
        
for col in col_data:
    random.shuffle(col_data[col])

false_teams = sorted(list(set(col_data['Team'])))
false_leagues = sorted(list(set(col_data['League'])))
false_team_to_league = {}
for team in false_teams:
    false_team_to_league[team] = random.choice(false_leagues)
    
dict_rows = []
for idx in range(len(col_data['ID'])):
    dict_rows.append({})
    for col in col_data:
        dict_rows[-1][col] = col_data[col][idx]
        if col == 'League':
            dict_rows[-1][col] = false_team_to_league[dict_rows[-1]['Team']]
dict_rows.sort(key=lambda row: row['Overall rating'], reverse=True)
players = {}
for row in dict_rows:
    players[row['ID']] = row
'''
nb = replace_with_false_data_structure(nb, 'players', new_players)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [165]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### average_stat_by_position: function logic is incorrect

In [166]:
rubric_item = 'average_stat_by_position: function logic is incorrect'
readme_text = """Check if the function correctly calculates
averages by initializing and updating counters and
sums for each position. Ensure correct dictionary
construction, use of keys for indexing, and
division to calculate averages. Review the
handling of different column names and
corresponding data types."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [167]:
rubric_item = 'average_stat_by_position: function logic is incorrect'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('average_stat_by_position')")[-1])

var_inputs_code = '''
col_names = ['Attacking', 'Movement', 'Defending', 'Goalkeeping']
var_inputs = [(col_name,) for col_name in col_names]
'''
nb = inject_function_logic_check(nb, 'average_stat_by_position', var_inputs_code, "TEXT_FORMAT_DICT")

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))
test_output = results[rubric_item][rubric_item.split(":")[0]]
if test_output != 'All test cases passed!':
    comments[rubric_item] += '\nFAILED TEST CASE: ' + test_output

### average_stat_by_position: function only works for certain numerical columns

In [168]:
rubric_item = 'average_stat_by_position: function only works for certain numerical columns'
readme_text = """Check that calculations of averages handle various
numerical data types and ensure that your function
iterates over all players and positions. Verify
that the function doesn't have hardcoded column
names or conditions that limit it to specific
columns."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [169]:
rubric_item = 'average_stat_by_position: function only works for certain numerical columns'
nb = read_nb(os.path.join(DIRECTORY, FILE))
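nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('average_stat_by_position')")[-1])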

var_inputs_code = '''
col_names = ['Age', 'Value', 'Wage', 'Overall rating', 'Height']
var_inputs = [(col_name,) for col_name in col_names]
'''
nb = inject_function_logic_check(nb, 'average_stat_by_position', var_inputs_code, "TEXT_FORMAT_DICT")

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))
test_output = results[rubric_item][rubric_item.split(":")[0]]
if test_output != 'All test cases passed!':
    comments[rubric_item] += '\nFAILED TEST CASE: ' + test_output

### average_stat_by_position: `players` data structure is not used to read data

In [170]:
rubric_item = 'average_stat_by_position: `players` data structure is not used to read data'
readme_text = """The test checks if the 'players' data structure is
correctly utilized to fetch statistics. To pass,
ensure you reference the 'players' dictionary with
the player ID as the key to get all column values.
Revisit your code and verify you're using
'players' as specified."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [171]:
random_data(directories[rubric_item], 1000)

In [172]:
rubric_item = 'average_stat_by_position: `players` data structure is not used to read data'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('average_stat_by_position')")[-1])

var_inputs_code = '''
col_names = ['Attacking', 'Movement', 'Defending', 'Goalkeeping']
var_inputs = [(col_name,) for col_name in col_names]
'''
nb = inject_function_logic_check(nb, 'average_stat_by_position', var_inputs_code, "TEXT_FORMAT_DICT")

new_players = true_data_structures['players'] + '''
import random
random.seed(0)

col_data = {}
for player_id in players:
    for col in players[player_id]:
        if col not in col_data:
            col_data[col] = []
        col_data[col].append(players[player_id][col])
        
for col in col_data:
    random.shuffle(col_data[col])

false_teams = sorted(list(set(col_data['Team'])))
false_leagues = sorted(list(set(col_data['League'])))
false_team_to_league = {}
for team in false_teams:
    false_team_to_league[team] = random.choice(false_leagues)
    
dict_rows = []
for idx in range(len(col_data['ID'])):
    dict_rows.append({})
    for col in col_data:
        dict_rows[-1][col] = col_data[col][idx]
        if col == 'League':
            dict_rows[-1][col] = false_team_to_league[dict_rows[-1]['Team']]
dict_rows.sort(key=lambda row: row['Overall rating'], reverse=True)
players = {}
for row in dict_rows:
    players[row['ID']] = row
'''
nb = replace_with_false_data_structure(nb, 'players', new_players)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

### q11: `average_stat_by_position` function is not used to answer
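
This test swaps `average_stat_by_position` for a stub that ignores its argument and returns sentinel values (the index of each position in sorted order). An answer that calls the function inherits those sentinels, while an answer that recomputes the averages itself does not. The sketch below illustrates this on a toy `players` dictionary, which is an assumption made for the example.

```python
# Hypothetical toy version of the `players` dictionary.
players = {
    1: {"Position": "ST", "Attacking": 80},
    2: {"Position": "GK", "Attacking": 20},
    3: {"Position": "CB", "Attacking": 40},
}

def average_stat_by_position(col_name):
    # Stub in the style of this test: ignores col_name and returns the index
    # of each position in sorted order, as a float.
    positions = sorted(set(player["Position"] for player in players.values()))
    return {pos: float(idx) for idx, pos in enumerate(positions)}

# An answer that calls the function inherits the sentinel values ...
via_function = average_stat_by_position("Attacking")

# ... while an answer computed by hand does not. With one player per
# position, the average is simply that player's own value.
by_hand = {p["Position"]: float(p["Attacking"]) for p in players.values()}
```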

In [173]:
rubric_item = 'q11: `average_stat_by_position` function is not used to answer'
readme_text = """The test checks if you used the
`average_stat_by_position` function to calculate
average attacking stats by position. Ensure your
code calls this function with 'Attacking' as an
argument and assigns the result to
`positions_avg_attacking`. Review the usage of
this function and update your code accordingly."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [174]:
random_data(directories[rubric_item], 1000)

In [175]:
rubric_item = 'q11: `average_stat_by_position` function is not used to answer'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q11')")[-1])

new_average_stat_by_position = '''
def average_stat_by_position(col_name):
    position_avg_stat = {pos: float(idx) for idx, pos in enumerate(sorted(set(player['Position'] for player in players.values())))}
    return position_avg_stat
'''
nb = replace_with_false_function(nb, 'average_stat_by_position', new_average_stat_by_position)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [176]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q12: `average_stat_by_position` function is not used to answer

In [177]:
rubric_item = 'q12: `average_stat_by_position` function is not used to answer'
readme_text = """The test checks if you used the
`average_stat_by_position` function to answer this
question. Ensure your code calls this function
with the relevant column name as an argument
instead of recomputing the averages yourself.
Review the usage of this function and update your
code accordingly."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [178]:
random_data(directories[rubric_item], 1000)

In [179]:
rubric_item = 'q12: `average_stat_by_position` function is not used to answer'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q12')")[-1])

new_average_stat_by_position = '''
def average_stat_by_position(col_name):
    position_avg_stat = {pos: float(idx) for idx, pos in enumerate(sorted(set(player['Position'] for player in players.values())))}
    return position_avg_stat
'''
nb = replace_with_false_function(nb, 'average_stat_by_position', new_average_stat_by_position)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [180]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q13: incorrect logic is used to answer

In [184]:
rubric_item = 'q13: incorrect logic is used to answer'
readme_text = """The test checks correct usage of comparison logic
to find the position with the highest average
'Defending' stat. Review your use of the
`average_stat_by_position` function and ensure you
are comparing average stat values correctly to
identify the highest one."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [185]:
random_data(directories[rubric_item], 2000)

In [186]:
rubric_item = 'q13: incorrect logic is used to answer'
nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q13')")[-1])

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [187]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q13: `average_stat_by_position` function is not used to answer

In [188]:
rubric_item = 'q13: `average_stat_by_position` function is not used to answer'
readme_text = """The test checks if you used the
`average_stat_by_position` function to calculate
average 'Defending' stats by position. Ensure your
code calls this function with 'Defending' as an
argument instead of recomputing the averages.
Review the usage of this function and update your
code accordingly."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [189]:
random_data(directories[rubric_item], 1000)

In [190]:
rubric_item = 'q13: `average_stat_by_position` function is not used to answer'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q13')")[-1])

new_average_stat_by_position = '''
def average_stat_by_position(col_name):
    position_avg_stat = {pos: float(idx) for idx, pos in enumerate(sorted(set(player['Position'] for player in players.values())))}
    return position_avg_stat
'''
nb = replace_with_false_function(nb, 'average_stat_by_position', new_average_stat_by_position)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [191]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### best_player_of_team_at_position: function logic is incorrect when there is a unique best player of the `team` at `position`
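
The three `best_player_of_team_at_position` logic tests split the `(position, team)` pairs into three groups: pairs with a unique best player, pairs with a tie for best, and pairs with no players at all. The sketch below shows one way to make that split on a toy dictionary. The sample data and the helper name `classify` are assumptions made for illustration and are not part of the test harness.

```python
# Hypothetical toy version of the `players` dictionary.
players = {
    1: {"Team": "A", "Position": "ST", "Overall rating": 90},
    2: {"Team": "A", "Position": "ST", "Overall rating": 85},
    3: {"Team": "B", "Position": "GK", "Overall rating": 80},
    4: {"Team": "B", "Position": "GK", "Overall rating": 80},
}

def classify(position, team):
    """Label a (position, team) pair as 'none', 'unique', or 'tied'."""
    ratings = sorted(
        (p["Overall rating"] for p in players.values()
         if p["Team"] == team and p["Position"] == position),
        reverse=True,
    )
    if not ratings:
        return "none"
    if len(ratings) == 1 or ratings[0] != ratings[1]:
        return "unique"
    return "tied"

labels = {(pos, team): classify(pos, team)
          for pos in ("ST", "GK") for team in ("A", "B")}
```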

In [192]:
rubric_item = 'best_player_of_team_at_position: function logic is incorrect when there is a unique best player of the `team` at `position`'
readme_text = """Check the conditions and comparisons in your
function to ensure that you are identifying the
unique best player correctly. Make sure to test
your function with scenarios where one player has
a higher 'Overall rating' at a given 'position'
for a 'team'. Also, check how you handle
comparisons and the return value for the best
player's 'ID'."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [193]:
rubric_item = 'best_player_of_team_at_position: function logic is incorrect when there is a unique best player of the `team` at `position`'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('best_player_of_team_at_position')")[-1])

var_inputs_code = '''
positions = set([player['Position'] for player in players.values()])
teams = set([player['Team'] for player in players.values()])
var_inputs = []
for position in positions:
    for team in teams:
        players_of_team_at_position = [(player_id, player['Overall rating']) for player_id, player in players.items() if player['Team'] == team and player['Position'] == position]
        players_of_team_at_position.sort(key=lambda row: row[1], reverse=True)
        if len(players_of_team_at_position) == 1 or (len(players_of_team_at_position) > 1 and players_of_team_at_position[1][1] != players_of_team_at_position[0][1]):
            var_inputs.append((position, team))
'''

nb = inject_function_logic_check(nb, 'best_player_of_team_at_position', var_inputs_code, "TEXT_FORMAT")

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))
test_output = results[rubric_item][rubric_item.split(":")[0]]
if test_output != 'All test cases passed!':
    comments[rubric_item] += '\nFAILED TEST CASE: ' + test_output

### best_player_of_team_at_position: function logic is incorrect when there are multiple players tied for best player of the `team` at `position`
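
This test also permutes the order of `players`, so a function that simply keeps the first or last tied player it encounters would be expected to give an answer that depends on that order. A minimal sketch of order-independent tie-breaking follows. The toy data and the three-argument helper `best_player` are assumptions made for this example; the function in `p7.ipynb` takes only `position` and `team`.

```python
import random

# Hypothetical toy data; IDs 7 and 3 are tied on rating.
players = {
    7: {"Team": "A", "Position": "ST", "Overall rating": 88},
    3: {"Team": "A", "Position": "ST", "Overall rating": 88},
    5: {"Team": "A", "Position": "ST", "Overall rating": 70},
}

def best_player(players, position, team):
    """Return the ID with the highest rating, breaking ties by smaller ID."""
    best = None
    for player_id, player in players.items():
        if player["Position"] != position or player["Team"] != team:
            continue
        if best is None:
            best = player_id
            continue
        rating = player["Overall rating"]
        best_rating = players[best]["Overall rating"]
        if rating > best_rating or (rating == best_rating and player_id < best):
            best = player_id
    return best

# Reorder the dictionary in the same way the test in this section does.
random.seed(0)
permuted = dict(random.sample(list(players.items()), len(players)))
```

Calling `best_player(players, "ST", "A")` and `best_player(permuted, "ST", "A")` gives the same ID, because the comparison never relies on iteration order.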

In [194]:
rubric_item = 'best_player_of_team_at_position: function logic is incorrect when there are multiple players tied for best player of the `team` at `position`'
readme_text = """Check whether your function correctly handles ties
by choosing the player with the smaller ID. Review
your comparisons to ensure ties are broken by ID
and that None is returned if there are no players
at a position for a team."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [195]:
rubric_item = 'best_player_of_team_at_position: function logic is incorrect when there are multiple players tied for best player of the `team` at `position`'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('best_player_of_team_at_position')")[-1])

var_inputs_code = '''
positions = set([player['Position'] for player in players.values()])
teams = set([player['Team'] for player in players.values()])
var_inputs = []
for position in positions:
    for team in teams:
        players_of_team_at_position = [(player_id, player['Overall rating']) for player_id, player in players.items() if player['Team'] == team and player['Position'] == position]
        players_of_team_at_position.sort(key=lambda row: row[1], reverse=True)
        if len(players_of_team_at_position) > 1 and players_of_team_at_position[1][1] == players_of_team_at_position[0][1]:
            var_inputs.append((position, team))
'''
nb = inject_function_logic_check(nb, 'best_player_of_team_at_position', var_inputs_code, "TEXT_FORMAT")

players_permute_code = true_data_structures["players"] + """
import random
random.seed(0)
players = dict(random.sample(list(players.items()), len(players)))"""
nb = replace_with_false_data_structure(nb, "players", players_permute_code)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))
test_output = results[rubric_item][rubric_item.split(":")[0]]
if test_output != 'All test cases passed!':
    comments[rubric_item] += '\nFAILED TEST CASE: ' + test_output

### best_player_of_team_at_position: function logic is incorrect when there are no players of the `team` at `position`

In [196]:
rubric_item = 'best_player_of_team_at_position: function logic is incorrect when there are no players of the `team` at `position`'
readme_text = """Check whether your function properly handles cases
where there are no players at the given position
for the given team, as it should return `None`.
Ensure the function iterates through all players
and that conditions for `position` and `team`
match correctly."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [197]:
rubric_item = 'best_player_of_team_at_position: function logic is incorrect when there are no players of the `team` at `position`'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('best_player_of_team_at_position')")[-1])

var_inputs_code = '''
positions = set([player['Position'] for player in players.values()])
teams = set([player['Team'] for player in players.values()])
var_inputs = []
for position in positions:
    for team in teams:
        players_of_team_at_position = [(player_id, player['Overall rating']) for player_id, player in players.items() if player['Team'] == team and player['Position'] == position]
        if len(players_of_team_at_position) == 0:
            var_inputs.append((position, team))
'''
nb = inject_function_logic_check(nb, 'best_player_of_team_at_position', var_inputs_code, "TEXT_FORMAT")

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))
test_output = results[rubric_item][rubric_item.split(":")[0]]
if test_output != 'All test cases passed!':
    comments[rubric_item] += '\nFAILED TEST CASE: ' + test_output

### best_player_of_team_at_position: `players` data structure is not used to read data

In [198]:
rubric_item = 'best_player_of_team_at_position: `players` data structure is not used to read data'
readme_text = """The test checks if the 'players' data structure is
correctly utilized to fetch statistics. To pass,
ensure you reference the 'players' dictionary with
the player ID as the key to get all column values.
Revisit your code and verify you're using
'players' as specified."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [199]:
random_data(directories[rubric_item], 1000)

In [200]:
rubric_item = 'best_player_of_team_at_position: `players` data structure is not used to read data'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('best_player_of_team_at_position')")[-1])

var_inputs_code = '''
positions = set([player['Position'] for player in players.values()])
teams = set([player['Team'] for player in players.values()])
team = sorted(list(teams))[0]
var_inputs = []
for position in positions:
    var_inputs.append((position, team))
'''
nb = inject_function_logic_check(nb, 'best_player_of_team_at_position', var_inputs_code, "TEXT_FORMAT")

new_players = true_data_structures['players'] + '''
import random
random.seed(0)

col_data = {}
for player_id in players:
    for col in players[player_id]:
        if col not in col_data:
            col_data[col] = []
        col_data[col].append(players[player_id][col])
        
for col in col_data:
    random.shuffle(col_data[col])

false_teams = sorted(list(set(col_data['Team'])))
false_leagues = sorted(list(set(col_data['League'])))
false_team_to_league = {}
for team in false_teams:
    false_team_to_league[team] = random.choice(false_leagues)
    
dict_rows = []
for idx in range(len(col_data['ID'])):
    dict_rows.append({})
    for col in col_data:
        dict_rows[-1][col] = col_data[col][idx]
        if col == 'League':
            dict_rows[-1][col] = false_team_to_league[dict_rows[-1]['Team']]
dict_rows.sort(key=lambda row: row['Overall rating'], reverse=True)
players = {}
for row in dict_rows:
    players[row['ID']] = row
'''
nb = replace_with_false_data_structure(nb, 'players', new_players)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

### q14: `best_player_of_team_at_position` function is not used to answer

In [201]:
rubric_item = 'q14: `best_player_of_team_at_position` function is not used to answer'
readme_text = """Ensure you use the
`best_player_of_team_at_position` function to find
the best CDM for Manchester United. Review your
code to confirm that this function is called
correctly with the appropriate arguments."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [202]:
random_data(directories[rubric_item], 1000)

In [203]:
rubric_item = 'q14: `best_player_of_team_at_position` function is not used to answer'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q14')")[-1])

false_best_player_of_team_at_position = '''
def best_player_of_team_at_position(position, team):
    # Incorrectly returns the player with the lowest Overall rating instead of the highest
    worst_player = None
    for player_id in players:
        player = players[player_id]
        if player['Position'] != position or player['Team'] != team:
            continue
        if worst_player is None or player['Overall rating'] < players[worst_player]['Overall rating']:
            worst_player = player_id
        elif player['Overall rating'] == players[worst_player]['Overall rating'] and player_id > worst_player:
            worst_player = player_id
    return worst_player
'''
nb = replace_with_false_function(nb, 'best_player_of_team_at_position', false_best_player_of_team_at_position)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [204]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q15: `best_player_of_team_at_position` function is not used to answer

In [205]:
rubric_item = 'q15: `best_player_of_team_at_position` function is not used to answer'
readme_text = """Ensure you use the
`best_player_of_team_at_position` function to find
the best RW for Liverpool. Review your
code to confirm that this function is called
correctly with the appropriate arguments."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [206]:
random_data(directories[rubric_item], 1000)

In [207]:
rubric_item = 'q15: `best_player_of_team_at_position` function is not used to answer'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q15')")[-1])

false_best_player_of_team_at_position = '''
def best_player_of_team_at_position(position, team):
    # Incorrectly returns the player with the lowest Overall rating instead of the highest
    worst_player = None
    for player_id in players:
        player = players[player_id]
        if player['Position'] != position or player['Team'] != team:
            continue
        if worst_player is None or player['Overall rating'] < players[worst_player]['Overall rating']:
            worst_player = player_id
        elif player['Overall rating'] == players[worst_player]['Overall rating'] and player_id > worst_player:
            worst_player = player_id
    return worst_player
'''
nb = replace_with_false_function(nb, 'best_player_of_team_at_position', false_best_player_of_team_at_position)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [208]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q15: `players` data structure is not used to read data
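
Unlike the column-wise shuffle used for earlier questions, the false `players` structure in this test keeps each row intact and only reassigns the IDs among the players. The sketch below shows that remapping on a toy dictionary. The sample rows are assumptions made for illustration, and the sketch copies each row instead of mutating it in place as the test code does.

```python
import random

random.seed(0)

# Hypothetical toy version of the `players` dictionary.
players = {
    1: {"ID": 1, "Name": "a"},
    2: {"ID": 2, "Name": "b"},
    3: {"ID": 3, "Name": "c"},
}

# Build a random one-to-one mapping from old IDs to new IDs.
original_ids = list(players)
shuffled_ids = original_ids.copy()
random.shuffle(shuffled_ids)
id_map = dict(zip(original_ids, shuffled_ids))

# Re-key every player under its new ID, keeping the other columns intact.
remapped = {}
for player in players.values():
    new_row = dict(player, ID=id_map[player["ID"]])  # copy the row, swap the ID
    remapped[new_row["ID"]] = new_row
```

The set of IDs is unchanged, but an ID may now refer to a different player, so an answer that hardcodes an ID instead of looking it up in `players` would be expected to differ.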

In [209]:
rubric_item = 'q15: `players` data structure is not used to read data'
readme_text = """The test checks if the 'players' data structure is
correctly utilized to fetch statistics. To pass,
ensure you reference the 'players' dictionary with
the player ID as the key to get all column values.
Revisit your code and verify you're using
'players' as specified."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [210]:
random_data(directories[rubric_item], 1000)

In [211]:
rubric_item = 'q15: `players` data structure is not used to read data'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q15')")[-1])

new_players = true_data_structures['players'] + '''
import random
random.seed(0)

original_id = list(players.keys())
shuffled_id = original_id.copy()
random.shuffle(shuffled_id)
id_map = {original_id[i]: shuffled_id[i] for i in range(len(original_id))}

players_list = list(players.values())
players_list.sort(key=lambda player: player['Overall rating'], reverse=True)
for i in range(len(players_list)):
    oid = players_list[i]['ID']
    nid = id_map[oid]
    players_list[i]['ID'] = nid

players = {}
for player in players_list:
    players[player['ID']] = player
'''
nb = replace_with_false_data_structure(nb, 'players', new_players)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [212]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))
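
The ID remapping used in this test can be sketched on a toy dictionary. This is a minimal illustration, assuming a hypothetical three-player `toy_players` dictionary (not data from `soccer_stars.csv`) and a hypothetical result name `remapped`. It shows that after the shuffle every key still matches its record's `'ID'`, while a given ID may now belong to a different player, so an answer that bypasses `players` will likely disagree with one that reads from it.

```python
import random

# Hypothetical toy data standing in for the real `players` dictionary.
toy_players = {
    1: {'ID': 1, 'Name': 'A', 'Overall rating': 90},
    2: {'ID': 2, 'Name': 'B', 'Overall rating': 85},
    3: {'ID': 3, 'Name': 'C', 'Overall rating': 80},
}

random.seed(0)
original_ids = list(toy_players.keys())
shuffled_ids = original_ids.copy()
random.shuffle(shuffled_ids)
id_map = dict(zip(original_ids, shuffled_ids))

# Rebuild the dictionary so that each record is stored under its new ID.
remapped = {}
for record in toy_players.values():
    new_record = dict(record)  # copy, so toy_players stays intact
    new_record['ID'] = id_map[record['ID']]
    remapped[new_record['ID']] = new_record
```

The set of IDs and the set of player records are unchanged; only the pairing between them is permuted.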

### best_starting_players_of: function logic is incorrect

In [213]:
rubric_item = 'best_starting_players_of: function logic is incorrect'
readme_text = """The function should correctly map the best player
at each position for a given team, considering
overall ratings and ID values for tie-breaking.
Verify your logic for handling tiebreaks and
ensure positions with no players are not included
in the output. Update your code to address these
requirements and pass all unique team inputs."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [214]:
import os
import csv
import random

def modify_data(directory):
    """
    This function modifies the dataset by randomly selecting up to 200 of the unique
    teams (with a fixed seed) and keeping only the players who belong to those teams.
    """
    # Load the dataset from the CSV file
    soccer_stars_csv = os.path.join(directory, 'soccer_stars.csv')
    
    with open(soccer_stars_csv, mode='r', encoding='utf-8', newline='') as file:
        reader = csv.DictReader(file)
        data = list(reader)
    
    # Find all the unique teams in the dataset
    all_teams = list(set([player['Team'] for player in data]))
    
    # Select 200 teams
    random.seed(0)
    random.shuffle(all_teams)
    selected_teams = set(all_teams[:200])

    # Keep only the players who belong to one of the selected teams
    new_data = []
    for item in data:
        if item['Team'] in selected_teams:
            new_data.append(item)

    # Save the modified dataset back to the CSV file
    with open(soccer_stars_csv, mode='w', encoding='utf-8', newline='') as file:
        writer = csv.DictWriter(file, fieldnames=new_data[0].keys())
        writer.writeheader()
        writer.writerows(new_data)

# Call the modify_data function with the given project directory
modify_data(directories[rubric_item])

In [215]:
rubric_item = 'best_starting_players_of: function logic is incorrect'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('best_starting_players_of')")[-1])

var_inputs_code = '''
var_inputs = [(team,) for team in list(set((player['Team'] for player in players.values())))]
'''
nb = inject_function_logic_check(nb, 'best_starting_players_of', var_inputs_code, "TEXT_FORMAT_DICT")

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))
test_output = results[rubric_item][rubric_item.split(":")[0]]
if test_output != 'All test cases passed!':
    comments[rubric_item] += '\nFAILED TEST CASE: ' + test_output
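
The behavior that this test expects can be sketched on toy data. The example below is an illustration under stated assumptions: `toy_players` is hypothetical, and the two functions are simplified re-implementations of the logic described in the README text above (highest `'Overall rating'` wins, ties go to the lower ID, positions with no players are omitted). It is not the reference solution.

```python
# Hypothetical toy data; IDs 10 and 11 tie on rating at the same position.
toy_players = {
    10: {'Team': 'A', 'Position': 'GK', 'Overall rating': 80},
    11: {'Team': 'A', 'Position': 'GK', 'Overall rating': 80},
    12: {'Team': 'A', 'Position': 'ST', 'Overall rating': 75},
    13: {'Team': 'B', 'Position': 'CB', 'Overall rating': 90},
}

def best_player_of_team_at_position(position, team):
    best = None
    for pid, player in toy_players.items():
        if player['Position'] != position or player['Team'] != team:
            continue
        if best is None or player['Overall rating'] > toy_players[best]['Overall rating']:
            best = pid
        elif player['Overall rating'] == toy_players[best]['Overall rating'] and pid < best:
            best = pid  # tie on rating: prefer the lower ID
    return best

def best_starting_players_of(team):
    # Loop over the unique positions only.
    positions = {player['Position'] for player in toy_players.values()}
    starters = {}
    for position in positions:
        pid = best_player_of_team_at_position(position, team)
        if pid is not None:  # skip positions the team does not fill
            starters[position] = pid
    return starters

starters_a = best_starting_players_of('A')
starters_b = best_starting_players_of('B')
```

Team `'A'` has no `'CB'` player, so that position does not appear in `starters_a`.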

### best_starting_players_of: all positions are looped through instead of just the unique positions

In [216]:
rubric_item = 'best_starting_players_of: all positions are looped through instead of just the unique positions'
readme_text = """Check for duplicate positions before looping and
eliminate them to improve function efficiency.
Ensure your code does not handle the same position
multiple times, which would cause unnecessary
repeated calls to
`best_player_of_team_at_position`. Also ensure
that inside the loop, you create a variable to
store the output of `best_player_of_team_at_position`
instead of calling it multiple times."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [217]:
import os
import csv
import random

# Put every player on a single team so that one call covers the whole dataset
def modify_data(directory):
    """
    Modifies the data in the soccer_stars.csv file in the specified directory by changing all 'Team'
    and 'League' values to 'Test Team' and 'Test League' respectively.
    """
    # Path to the soccer_stars.csv file
    file_path = os.path.join(directory, "soccer_stars.csv")
    
    # List to store the modified rows
    modified_rows = []
    
    # Open the CSV file for reading and modify the data
    with open(file_path, mode='r', newline='', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        for row in reader:
            # Modify 'Team' and 'League' fields for all players
            row['Team'] = 'Test Team'
            row['League'] = 'Test League'
            modified_rows.append(row)
    
    # Open the CSV file for writing and write the modified data
    with open(file_path, mode='w', newline='', encoding='utf-8') as csvfile:
        fieldnames = reader.fieldnames
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(modified_rows)

# Generate a random dataset, then overwrite its 'Team' and 'League' columns
random_data(directories[rubric_item], 1000)
modify_data(directories[rubric_item])

In [218]:
rubric_item = 'best_starting_players_of: all positions are looped through instead of just the unique positions'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('best_starting_players_of')")[-1])

false_best_player_of_team_at_position = '''
hidn_positions = []
def best_player_of_team_at_position(position, team):
    global hidn_positions
    hidn_positions.append(position)
    best_player = None
    for player_id in players:
        player = players[player_id]
        if player[\'Position\'] != position or player[\'Team\'] != team:
            continue
        if best_player == None or player[\'Overall rating\'] > players[best_player][\'Overall rating\']:
            best_player = player_id
        elif player[\'Overall rating\'] == players[best_player][\'Overall rating\'] and player_id < best_player:
            best_player = player_id
    return best_player
'''
nb = replace_with_false_function(nb, 'best_player_of_team_at_position', false_best_player_of_team_at_position)

test_code = '''
hidn_positions = []
test_call = best_starting_players_of('Test Team')  # this function call will update hidn_positions
if len(hidn_positions) <= 3*len(set(hidn_positions)):
    test_output = 'best_starting_players_of results: ' + public_tests.PASS
'''
nb = inject_code(nb, len(nb['cells']), get_test_text('best_starting_players_of', test_code))

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))
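
The call-recording idea behind this test can be sketched in isolation. The example below is a simplified illustration with hypothetical names (`recorded_positions`, `lookup`, `careless_caller`, `careful_caller`, `within_budget`); it only demonstrates the threshold check `len(calls) <= 3 * len(set(calls))` and is not the grader's actual code.

```python
recorded_positions = []

def lookup(position):
    """Stand-in for best_player_of_team_at_position; records each call."""
    recorded_positions.append(position)
    return position.lower()

# One entry per player, so positions repeat.
all_positions = ['GK', 'ST', 'GK', 'ST', 'GK', 'ST', 'GK', 'ST']

def careless_caller():
    # Loops over every position, duplicates included.
    return {p: lookup(p) for p in all_positions}

def careful_caller():
    # Loops over the unique positions only.
    return {p: lookup(p) for p in set(all_positions)}

def within_budget(calls):
    # At most three recorded calls per unique position are tolerated.
    return len(calls) <= 3 * len(set(calls))

recorded_positions.clear()
careless_caller()
careless_ok = within_budget(recorded_positions)  # 8 calls, 2 unique positions

recorded_positions.clear()
careful_caller()
careful_ok = within_budget(recorded_positions)   # 2 calls, 2 unique positions
```

Both callers return the same dictionary; only the number of recorded calls differs, which is what the check measures.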

### best_starting_players_of: `best_player_of_team_at_position` function is not used to answer

In [219]:
rubric_item = 'best_starting_players_of: `best_player_of_team_at_position` function is not used to answer'
readme_text = """Check that your `best_starting_players_of`
function utilizes the
`best_player_of_team_at_position` function
appropriately for each position. Ensure the
function handles ties by ID and excludes positions
with no players. Review your logic for correctness
and adherence to the question's requirements."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [220]:
random_data(directories[rubric_item], 2000)

In [221]:
rubric_item = 'best_starting_players_of: `best_player_of_team_at_position` function is not used to answer'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('best_starting_players_of')")[-1])

var_inputs_code = '''
unique_teams = set(player['Team'] for player in players.values())
var_inputs = [(team,) for team in unique_teams]
'''
nb = inject_function_logic_check(nb, 'best_starting_players_of', var_inputs_code, "TEXT_FORMAT_DICT")

false_best_player_of_team_at_position = '''
def best_player_of_team_at_position(position, team):
    # Incorrectly returns the player with the lowest Overall rating instead of the highest
    worst_player = None
    for player_id in players:
        player = players[player_id]
        if player['Position'] != position or player['Team'] != team:
            continue
        if worst_player == None or player['Overall rating'] < players[worst_player]['Overall rating']:
            worst_player = player_id
        elif player['Overall rating'] == players[worst_player]['Overall rating'] and player_id > worst_player:
            worst_player = player_id
    return worst_player
'''
nb = replace_with_false_function(nb, 'best_player_of_team_at_position', false_best_player_of_team_at_position)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

### best_starting_players_of: `players` data structure is not used to read data

In [222]:
rubric_item = 'best_starting_players_of: `players` data structure is not used to read data'
readme_text = """The test checks if the 'players' data structure is
correctly utilized to fetch statistics. To pass,
ensure you reference the 'players' dictionary with
the player ID as the key to get all column values.
Revisit your code and verify you're using
'players' as specified."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [223]:
random_data(directories[rubric_item], 1000)

In [224]:
rubric_item = 'best_starting_players_of: `players` data structure is not used to read data'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('best_starting_players_of')")[-1])

new_players = true_data_structures['players'] + '''
import random
random.seed(0)

col_data = {}
for player_id in players:
    for col in players[player_id]:
        if col not in col_data:
            col_data[col] = []
        col_data[col].append(players[player_id][col])
        
for col in col_data:
    random.shuffle(col_data[col])

false_teams = sorted(list(set(col_data['Team'])))
false_leagues = sorted(list(set(col_data['League'])))
false_team_to_league = {}
for team in false_teams:
    false_team_to_league[team] = random.choice(false_leagues)
    
dict_rows = []
for idx in range(len(col_data['ID'])):
    dict_rows.append({})
    for col in col_data:
        dict_rows[-1][col] = col_data[col][idx]
        if col == 'League':
            dict_rows[-1][col] = false_team_to_league[dict_rows[-1]['Team']]
dict_rows.sort(key=lambda row: row['Overall rating'], reverse=True)
players = {}
for row in dict_rows:
    players[row['ID']] = row
'''
nb = replace_with_false_data_structure(nb, 'players', new_players)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [225]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))
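
The column-wise shuffle used in this test keeps the values of each column but breaks the association between values in the same row. A minimal sketch on hypothetical toy rows (`toy_rows` and `shuffled_rows` are illustrative names only, and the `'League'` reassignment step is omitted):

```python
import random

# Hypothetical toy rows standing in for the values of `players`.
toy_rows = [
    {'ID': 1, 'Team': 'X', 'Overall rating': 90},
    {'ID': 2, 'Team': 'Y', 'Overall rating': 85},
    {'ID': 3, 'Team': 'X', 'Overall rating': 80},
    {'ID': 4, 'Team': 'Z', 'Overall rating': 75},
]

random.seed(0)

# Collect each column into its own list, then shuffle every list independently.
col_data = {col: [row[col] for row in toy_rows] for col in toy_rows[0]}
for col in col_data:
    random.shuffle(col_data[col])

# Reassemble rows from the shuffled columns.
shuffled_rows = [
    {col: col_data[col][idx] for col in col_data}
    for idx in range(len(toy_rows))
]
```

Every column still holds the same multiset of values, so the data looks plausible, yet statistics read from any other source will generally not match.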

### q16: `best_starting_players_of` function is not used to answer

In [226]:
rubric_item = 'q16: `best_starting_players_of` function is not used to answer'
readme_text = """The test checks if you used the
`best_starting_players_of` function as required.
Ensure you are calling this function with 'Paris
Saint Germain' as an argument and returning the
result without additional processing. Modify your
code to use the function and retest."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [227]:
random_data(directories[rubric_item], 1000)

In [228]:
rubric_item = 'q16: `best_starting_players_of` function is not used to answer'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q16')")[-1])

false_best_starting_players_of = '''
def false_best_player_of_team_at_position(position, team):
    best_player = None
    for player_id in players:
        player = players[player_id]
        if player[\'Position\'] != position or player[\'Team\'] != team:
            continue
        if best_player == None or player[\'Overall rating\'] < players[best_player][\'Overall rating\']:
            best_player = player_id
        elif player[\'Overall rating\'] == players[best_player][\'Overall rating\'] and player_id > best_player:
            best_player = player_id
    return best_player
    
def best_starting_players_of(team):
    positions = set()
    for player_id in players:
        positions.add(players[player_id]['Position'])

    starters = {}
    for position in positions:
        player_at_position = false_best_player_of_team_at_position(position, team)
        if player_at_position != None:
            starters[position] = player_at_position
    return starters
'''
nb = replace_with_false_function(nb, 'best_starting_players_of', false_best_starting_players_of)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [229]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q17: incorrect logic is used to answer

In [230]:
rubric_item = 'q17: incorrect logic is used to answer'
readme_text = """The test checks for correct logic in obtaining the
best starting players. Please review your
implementation for logical errors and ensure you
are using the `best_starting_players_of` function
correctly. Verify that your code successfully
handles different datasets."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [231]:
random_data(directories[rubric_item], 2000)

In [232]:
rubric_item = 'q17: incorrect logic is used to answer'
nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q17')")[-1])

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [233]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q17: `best_starting_players_of` function is not used to answer

In [234]:
rubric_item = 'q17: `best_starting_players_of` function is not used to answer'
readme_text = """The test checks if the `best_starting_players_of`
function is used appropriately. Review the usage
of this function in your code and ensure it is
implemented correctly to build the final
dictionary. Consider different scenarios on how
the function's output can affect your result."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [235]:
random_data(directories[rubric_item], 1000)

In [236]:
rubric_item = 'q17: `best_starting_players_of` function is not used to answer'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q17')")[-1])

false_best_starting_players_of = '''
def false_best_player_of_team_at_position(position, team):
    best_player = None
    for player_id in players:
        player = players[player_id]
        if player[\'Position\'] != position or player[\'Team\'] != team:
            continue
        if best_player == None or player[\'Overall rating\'] < players[best_player][\'Overall rating\']:
            best_player = player_id
        elif player[\'Overall rating\'] == players[best_player][\'Overall rating\'] and player_id > best_player:
            best_player = player_id
    return best_player
    
def best_starting_players_of(team):
    positions = set()
    for player_id in players:
        positions.add(players[player_id]['Position'])

    starters = {}
    for position in positions:
        player_at_position = false_best_player_of_team_at_position(position, team)
        if player_at_position != None:
            starters[position] = player_at_position
    return starters
'''

nb = replace_with_false_function(nb, 'best_starting_players_of', false_best_starting_players_of)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [237]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q17: `players` data structure is not used to read data

In [238]:
rubric_item = 'q17: `players` data structure is not used to read data'
readme_text = """The test checks if the 'players' data structure is
correctly utilized to fetch statistics. To pass,
ensure you reference the 'players' dictionary with
the player ID as the key to get all column values.
Revisit your code and verify you're using
'players' as specified."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [239]:
random_data(directories[rubric_item], 1000)

In [240]:
rubric_item = 'q17: `players` data structure is not used to read data'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q17')")[-1])

new_players = true_data_structures['players'] + '''
import random
random.seed(0)

col_data = {}
for player_id in players:
    for col in players[player_id]:
        if col not in col_data:
            col_data[col] = []
        col_data[col].append(players[player_id][col])
        
for col in col_data:
    random.shuffle(col_data[col])

false_teams = sorted(list(set(col_data['Team'])))
false_leagues = sorted(list(set(col_data['League'])))
false_team_to_league = {}
for team in false_teams:
    false_team_to_league[team] = random.choice(false_leagues)
    
dict_rows = []
for idx in range(len(col_data['ID'])):
    dict_rows.append({})
    for col in col_data:
        dict_rows[-1][col] = col_data[col][idx]
        if col == 'League':
            dict_rows[-1][col] = false_team_to_league[dict_rows[-1]['Team']]
dict_rows.sort(key=lambda row: row['Overall rating'], reverse=True)
players = {}
for row in dict_rows:
    players[row['ID']] = row
'''
nb = replace_with_false_data_structure(nb, 'players', new_players)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [241]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q18: incorrect logic is used to answer

In [242]:
rubric_item = 'q18: incorrect logic is used to answer'
readme_text = """Ensure your logic correctly differentiates between
the best starting players and the bench players
for the team 'FC Barcelona'. Calculate the
cumulative value by summing the 'Value' of only
the bench players, excluding the starting ones.
Review the use of relevant functions and
conditions in your code."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [243]:
random_data(directories[rubric_item], 2000)

In [244]:
rubric_item = 'q18: incorrect logic is used to answer'
nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q18')")[-1])

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [245]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q18: `best_starting_players_of` function is not used to find the best starting players

In [246]:
rubric_item = 'q18: `best_starting_players_of` function is not used to find the best starting players'
readme_text = """Check if you're correctly using the
`best_starting_players_of` function to identify
the best starting players for FC Barcelona and
then summing the value of all other players not in
that list. Review your usage of this function and
ensure you're considering all non-starter players
when calculating the cumulative value."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [247]:
random_data(directories[rubric_item], 2000)

In [248]:
rubric_item = 'q18: `best_starting_players_of` function is not used to find the best starting players'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q18')")[-1])

false_best_starting_players_of = '''
def false_best_player_of_team_at_position(position, team):
    best_player = None
    for player_id in players:
        player = players[player_id]
        if player[\'Position\'] != position or player[\'Team\'] != team:
            continue
        if best_player == None or player[\'Overall rating\'] < players[best_player][\'Overall rating\']:
            best_player = player_id
        elif player[\'Overall rating\'] == players[best_player][\'Overall rating\'] and player_id > best_player:
            best_player = player_id
    return best_player
    
def best_starting_players_of(team):
    positions = set()
    for player_id in players:
        positions.add(players[player_id]['Position'])

    starters = {}
    for position in positions:
        player_at_position = false_best_player_of_team_at_position(position, team)
        if player_at_position != None:
            starters[position] = player_at_position
    return starters
'''
nb = replace_with_false_function(nb, 'best_starting_players_of', false_best_starting_players_of)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [249]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q18: `players` data structure is not used to read data

In [250]:
rubric_item = 'q18: `players` data structure is not used to read data'
readme_text = """The test checks if the 'players' data structure is
correctly utilized to fetch statistics. To pass,
ensure you reference the 'players' dictionary with
the player ID as the key to get all column values.
Revisit your code and verify you're using
'players' as specified."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [251]:
random_data(directories[rubric_item], 1000)

In [252]:
rubric_item = 'q18: `players` data structure is not used to read data'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q18')")[-1])

new_players = true_data_structures['players'] + '''
import random
random.seed(0)

col_data = {}
for player_id in players:
    for col in players[player_id]:
        if col not in col_data:
            col_data[col] = []
        col_data[col].append(players[player_id][col])
        
for col in col_data:
    random.shuffle(col_data[col])

false_teams = sorted(list(set(col_data['Team'])))
false_leagues = sorted(list(set(col_data['League'])))
false_team_to_league = {}
for team in false_teams:
    false_team_to_league[team] = random.choice(false_leagues)
    
dict_rows = []
for idx in range(len(col_data['ID'])):
    dict_rows.append({})
    for col in col_data:
        dict_rows[-1][col] = col_data[col][idx]
        if col == 'League':
            dict_rows[-1][col] = false_team_to_league[dict_rows[-1]['Team']]
dict_rows.sort(key=lambda row: row['Overall rating'], reverse=True)
players = {}
for row in dict_rows:
    players[row['ID']] = row
'''
nb = replace_with_false_data_structure(nb, 'players', new_players)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [253]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q19: incorrect logic is used to answer

In [254]:
rubric_item = 'q19: incorrect logic is used to answer'
readme_text = """Examine your code to ensure that the logic you've
implemented for calculating the average attacking
stat is correct and adheres to the question's
requirements. Review the use of loops,
accumulation of the sum of attacking stats, and
the division by the number of best starting
players, comparing it to the provided solution."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [255]:
random_data(directories[rubric_item], 1000)

In [256]:
rubric_item = 'q19: incorrect logic is used to answer'
nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q19')")[-1])

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [257]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q19: best starting players of a single team is computed more than once

In [258]:
rubric_item = 'q19: best starting players of a single team is computed more than once'
readme_text = """Make sure to reuse the result from calling
`best_starting_players_of` for each team instead
of calling the function multiple times for the
same team. Store its output in a variable and use
that to calculate the average. This can improve
performance and avoid possible repetition in
computation."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [259]:
import os
import csv
import random

def modify_data(directory):
    # Define the path to the soccer_stars.csv using os.path.join
    soccer_stars_path = os.path.join(directory, 'soccer_stars.csv')
    
    # Open the soccer_stars.csv file in read mode
    with open(soccer_stars_path, mode='r', encoding='utf-8', newline='') as file:
        # Read the content of the file
        reader = csv.DictReader(file)
        rows = list(reader)
    
    # Open soccer_stars.csv file in write mode to make modifications
    with open(soccer_stars_path, mode='w', encoding='utf-8', newline='') as file:
        # Define fieldnames from the keys of a row in the dataset
        fieldnames = rows[0].keys()
        # Create a csv writer object with required fieldnames
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        # Write the header row
        writer.writeheader()
        
        # Iterate over rows to modify each 'League' value
        for row in rows:
            # Set 'League' value to 'Premier League (England)'
            row['League'] = 'Premier League (England)'
            # Write the modified row back to the file
            writer.writerow(row)

# Regenerate the dataset with random_data (size argument 1000), then overwrite
# every 'League' value so that all teams count as Premier League teams
random_data(directories[rubric_item], 1000)
modify_data(directories[rubric_item])

In [260]:
rubric_item = 'q19: best starting players of a single team is computed more than once'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
end_of_q19 = find_all_cell_indices(nb, "code", "grader.check('q19')")[-1]
nb = truncate_nb(nb, end=end_of_q19)

# Replace the best_starting_players_of function with a version that appends to hidn_teams
new_best_starting_players_of = '''
hidn_teams = []
def best_starting_players_of(team):
    global hidn_teams
    hidn_teams.append(team)
    positions = set()
    for player_id in players:
        positions.add(players[player_id][\'Position\'])
        
    starters = {}
    for position in positions:
        player_at_position = best_player_of_team_at_position(position, team)
        if player_at_position != None:
            starters[position] = player_at_position
    return starters
'''
nb = replace_with_false_function(nb, 'best_starting_players_of', new_best_starting_players_of)

# Inject the initialization of hidn_teams at the start of q19
init_hidn_teams_code = "hidn_teams = []"
q19_start_idx = find_all_cell_indices(nb, "markdown", '**Question 19:**')[-1] + 1
nb = inject_code(nb, q19_start_idx, init_hidn_teams_code)

# Inject the code at the end to check the length of hidn_teams
test_len_hidn_teams_code = '''
if len(hidn_teams) == len(set(hidn_teams)):
    test_output = "q19 results: All test cases passed!"
'''
nb = inject_code(nb, len(nb['cells']), get_test_text("q19", test_len_hidn_teams_code))

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [261]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))
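
The duplicate-call check used in this test can be sketched with a toy stand-in. The names below (`calls`, `starters_of`, `attacking`, `average_recomputing`, `average_reusing`) are hypothetical and exist only for illustration; the sketch contrasts calling the function twice per team with storing its result once.

```python
calls = []

def starters_of(team):
    """Hypothetical stand-in for best_starting_players_of; records each call."""
    calls.append(team)
    return {'GK': team + '-gk', 'ST': team + '-st'}

# Toy attacking stats keyed by the made-up player IDs returned above.
attacking = {'A-gk': 10, 'A-st': 80, 'B-gk': 20, 'B-st': 70}
teams = ['A', 'B']

def average_recomputing(team):
    # Calls starters_of twice for the same team.
    total = sum(attacking[p] for p in starters_of(team).values())
    return total / len(starters_of(team))

def average_reusing(team):
    # Calls starters_of once and reuses the stored result.
    starters = starters_of(team)
    return sum(attacking[p] for p in starters.values()) / len(starters)

calls.clear()
recomputed = [average_recomputing(t) for t in teams]
recompute_passes = len(calls) == len(set(calls))  # 4 calls, 2 unique teams

calls.clear()
reused = [average_reusing(t) for t in teams]
reuse_passes = len(calls) == len(set(calls))      # 2 calls, 2 unique teams
```

Both versions compute the same averages, so only the recorded call list distinguishes them.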

### q19: the list of unique teams in the league is recomputed

In [262]:
rubric_item = 'q19: the list of unique teams in the league is recomputed'
readme_text = """Ensure you are using the correct list of Premier
League teams. If you recompute or redefine the
list, verify it contains the appropriate teams.
Use the earlier computed list if available, and
calculate the average as instructed."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [263]:
random_data(directories[rubric_item], 2000)

In [264]:
rubric_item = 'q19: the list of unique teams in the league is recomputed'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
q19_start = find_all_cell_indices(nb, "markdown", "**Question 19:**")[-1]
q19_end = find_all_cell_indices(nb, "code", "grader.check('q19')")[-1]
nb = truncate_nb(nb, end=q19_end)

non_premier_teams_code = '''
premier_league_teams = list(set([players[player_id]['Team'] for player_id in players if players[player_id]['League'] != 'Premier League (England)']))
'''
nb = inject_code(nb, q19_start+1, non_premier_teams_code)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [265]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))
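
The overwrite used in this test can be sketched on toy data. This is an illustration under assumptions: `toy_players`, `reused_answer`, and `recomputed_answer` are hypothetical, and the sketch presumes that the expected answer is produced by code that reuses `premier_league_teams` after the overwrite.

```python
# Hypothetical toy data standing in for the real `players` dictionary.
toy_players = {
    1: {'Team': 'A', 'League': 'Premier League (England)'},
    2: {'Team': 'B', 'League': 'La Liga (Spain)'},
    3: {'Team': 'C', 'League': 'Premier League (England)'},
}

# The list as it might have been computed earlier in a notebook.
premier_league_teams = sorted({p['Team'] for p in toy_players.values()
                               if p['League'] == 'Premier League (England)'})

# The test overwrites it with the complementary, deliberately wrong list.
premier_league_teams = sorted({p['Team'] for p in toy_players.values()
                               if p['League'] != 'Premier League (England)'})

# Code that reuses the variable now sees the overwritten list ...
reused_answer = list(premier_league_teams)

# ... while code that recomputes the list ignores the overwrite.
recomputed_answer = sorted({p['Team'] for p in toy_players.values()
                            if p['League'] == 'Premier League (England)'})
```

Under that assumption, a notebook that recomputes the list produces a result that differs from the expected one, which is how the recomputation becomes visible.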

### q19: `best_starting_players_of` function is not used to answer

In [266]:
rubric_item = 'q19: `best_starting_players_of` function is not used to answer'
readme_text = """Check if you are using the
`best_starting_players_of` function to get the
best starters. Ensure you are iterating correctly
over teams and calculating the average by summing
the `Attacking` stats and dividing by the number
of best starting players. Reuse lists and
functions wherever possible to avoid redundancy
and extra computations."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [267]:
random_data(directories[rubric_item], 1000)

In [268]:
rubric_item = 'q19: `best_starting_players_of` function is not used to answer'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q19')")[-1])

false_best_starting_players_of = '''
def false_best_player_of_team_at_position(position, team):
    best_player = None
    for player_id in players:
        player = players[player_id]
        if player[\'Position\'] != position or player[\'Team\'] != team:
            continue
        if best_player == None or player[\'Overall rating\'] < players[best_player][\'Overall rating\']:
            best_player = player_id
        elif player[\'Overall rating\'] == players[best_player][\'Overall rating\'] and player_id > best_player:
            best_player = player_id
    return best_player
    
def best_starting_players_of(team):
    positions = set()
    for player_id in players:
        positions.add(players[player_id]['Position'])

    starters = {}
    for position in positions:
        player_at_position = false_best_player_of_team_at_position(position, team)
        if player_at_position != None:
            starters[position] = player_at_position
    return starters
'''
nb = replace_with_false_function(nb, 'best_starting_players_of', false_best_starting_players_of)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [269]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q19: `players` data structure is not used to read data

In [270]:
rubric_item = 'q19: `players` data structure is not used to read data'
readme_text = """The test checks if the 'players' data structure is
correctly utilized to fetch statistics. To pass,
ensure you reference the 'players' dictionary with
the player ID as the key to get all column values.
Revisit your code and verify you're using
'players' as specified."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [271]:
random_data(directories[rubric_item], 1000)

In [272]:
rubric_item = 'q19: `players` data structure is not used to read data'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q19')")[-1])

new_players = true_data_structures['players'] + '''
import random
random.seed(0)

col_data = {}
for player_id in players:
    for col in players[player_id]:
        if col not in col_data:
            col_data[col] = []
        col_data[col].append(players[player_id][col])
        
for col in col_data:
    random.shuffle(col_data[col])

false_teams = sorted(list(set(col_data['Team'])))
false_leagues = sorted(list(set(col_data['League'])))
false_team_to_league = {}
for team in false_teams:
    false_team_to_league[team] = random.choice(false_leagues)
    
dict_rows = []
for idx in range(len(col_data['ID'])):
    dict_rows.append({})
    for col in col_data:
        dict_rows[-1][col] = col_data[col][idx]
        if col == 'League':
            dict_rows[-1][col] = false_team_to_league[dict_rows[-1]['Team']]
dict_rows.sort(key=lambda row: row['Overall rating'], reverse=True)
players = {}
for row in dict_rows:
    players[row['ID']] = row
'''
nb = replace_with_false_data_structure(nb, 'players', new_players)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [273]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))
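
The false `players` structure above keeps every column's values but shuffles each column independently, so the rows no longer match the original file. The sketch below shows the same idea on a made-up three-row dictionary. It is an illustration under that assumption, not the grader's exact code.

```python
import random

random.seed(0)

# Toy stand-in for `players`; the rows are made up.
players = {
    1: {"ID": 1, "Team": "A", "Overall rating": 90},
    2: {"ID": 2, "Team": "B", "Overall rating": 80},
    3: {"ID": 3, "Team": "C", "Overall rating": 70},
}

# Collect each column into its own list.
col_data = {}
for row in players.values():
    for col, value in row.items():
        col_data.setdefault(col, []).append(value)

# Record the sorted values of each column before shuffling.
before = {col: sorted(values) for col, values in col_data.items()}

# Shuffle every column independently, breaking the link between columns.
for values in col_data.values():
    random.shuffle(values)

# Rebuild rows from the shuffled columns, keyed by the (shuffled) ID.
shuffled = {}
for idx in range(len(col_data["ID"])):
    row = {col: col_data[col][idx] for col in col_data}
    shuffled[row["ID"]] = row

# Each column still holds the same values, only in different rows.
after = {col: sorted(row[col] for row in shuffled.values()) for col in col_data}
```

Because the per-column values are preserved, the shuffled data still looks plausible, yet any answer computed from it should differ from one computed by re-reading the original data.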

### q20: incorrect logic is used to answer

In [274]:
rubric_item = 'q20: incorrect logic is used to answer'
readme_text = """Check your logic for finding the highest average
'Attacking' stat. Ensure you're using the
precomputed average for each team and not
recalculating it. Consider edge cases and verify
you're comparing the correct values."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [275]:
random_data(directories[rubric_item], 2000)

In [276]:
rubric_item = 'q20: incorrect logic is used to answer'
nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q20')")[-1])

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [277]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))

### q20: `avg_attacking_prem_league` is not used to answer

In [278]:
rubric_item = 'q20: `avg_attacking_prem_league` is not used to answer'
readme_text = """Ensure the existing `avg_attacking_prem_league`
dictionary is utilized to determine the team with
the highest average attacking stat. Avoid looping
through the dataset again or recomputing the
averages. Check that the dictionary is correctly
referenced and used in your solution."""

write_readme(readme_text, os.path.join(directories[rubric_item], "README.txt"))

In [279]:
random_data(directories[rubric_item], 1000)

In [280]:
rubric_item = 'q20: `avg_attacking_prem_league` is not used to answer'
nb = read_nb(os.path.join(DIRECTORY, FILE))
nb = clean_nb(nb)
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('q20')")[-1])

injected_code = '''
import random
random.seed(0)
avg_attacking_prem_league = {team: random.randint(70, 90) for team in sorted(avg_attacking_prem_league)}
'''

nb = inject_code(nb, find_all_cell_indices(nb, "markdown", '**Question 20:**')[-1], injected_code)

results[rubric_item] = parse_nb(run_nb(nb, os.path.join(directories[rubric_item], FILE)))

In [281]:
gen_public_tests.gen_public_tests(os.path.join(directories[rubric_item], FILE))
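
The injected code above overwrites `avg_attacking_prem_league` with seeded random integers while keeping the same keys. A solution that reads the dictionary then follows the new values, and one that recomputes the averages does not. A minimal sketch follows, assuming made-up team names and a hypothetical `best_team` function standing in for a student's answer.

```python
import random

random.seed(0)

# Hypothetical precomputed averages; team names and values are made up.
avg_attacking_prem_league = {"Team A": 80.0, "Team B": 75.5, "Team C": 60.0}
original = dict(avg_attacking_prem_league)

# Same override as the injected code: same keys, random integer values.
avg_attacking_prem_league = {
    team: random.randint(70, 90) for team in sorted(avg_attacking_prem_league)
}

def best_team(averages):
    # A solution that relies only on the dictionary it is given.
    return max(averages, key=averages.get)

same_keys = set(avg_attacking_prem_league) == set(original)
in_range = all(70 <= value <= 90 for value in avg_attacking_prem_league.values())
answer = best_team(avg_attacking_prem_league)
```

Iterating over `sorted(...)` fixes the order in which random values are drawn, so the override is reproducible for a given seed.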

### general_deductions: Did not save the notebook file prior to running the cell containing "export". We cannot see your output if you do not save before generating the zip file. This deduction will become stricter for future projects.

In [282]:
rubric_item = "general_deductions: Did not save the notebook file prior to running the cell containing \"export\". We cannot see your output if you do not save before generating the zip file. This deduction will become stricter for future projects."
nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('general_deductions')")[-1])

results[rubric_item] = {}
results[rubric_item]['general_deductions'] = rubric_item.split(":", 1)[1].strip()
if detect_restart_and_run_all(nb):
    results[rubric_item]['general_deductions'] = "All test cases passed!"

### general_deductions: Used concepts/modules such as pandas not covered in class yet - built-in functions that you have been introduced to can be used.

In [283]:
rubric_item = "general_deductions: Used concepts/modules such as pandas not covered in class yet - built-in functions that you have been introduced to can be used."
nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))
nb = truncate_nb(nb, end=find_all_cell_indices(nb, "code", "grader.check('general_deductions')")[-1])

results[rubric_item] = {}

allowed_modules = {"otter", "public_tests", "csv", "math"}
found_imports = {item for item in detect_imports(nb) if item.split(".")[0] not in allowed_modules}

if not found_imports:
    results[rubric_item]['general_deductions'] = "All test cases passed!"
else:
    results[rubric_item]['general_deductions'] = "found unexpected import(s): " + repr(sorted(found_imports))
    comments[rubric_item] = results[rubric_item]['general_deductions']
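
The check above filters imported module names against an allow-list by their top-level package. The sketch below shows one way such names could be collected from source code with the standard `ast` module. `find_imports` is a hypothetical stand-in: the real `detect_imports` from `hidden_tests` is not shown here and may work differently.

```python
import ast

def find_imports(source):
    """Return the module names imported anywhere in `source` (hypothetical helper)."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # `import a.b as c` contributes "a.b".
            found.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            # `from a.b import c` contributes "a.b".
            found.append(node.module)
    return found

allowed_modules = {"otter", "public_tests", "csv", "math"}
source = "import csv\nimport pandas as pd\nfrom os.path import join\n"

# Keep only names whose top-level package is not on the allow-list.
unexpected = sorted(
    name for name in find_imports(source)
    if name.split(".")[0] not in allowed_modules
)
```

Comparing only `name.split(".")[0]` means that allowing a package also allows its submodules.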

### general_deductions: Import statements are not all placed at the top of the notebook.

In [284]:
rubric_item = "general_deductions: Import statements are not all placed at the top of the notebook."
nb = clean_nb(read_nb(os.path.join(DIRECTORY, FILE)))
nb = truncate_nb(nb, start=find_all_cell_indices(nb, "markdown", "### Function 1: `format_euros(euros)`")[0]+1, end=find_all_cell_indices(nb, "code", "grader.check('general_deductions')")[-1])

results[rubric_item] = {}
results[rubric_item]['general_deductions'] = 'All test cases passed!'

found_imports = detect_imports(nb)
if found_imports != []:
    results[rubric_item]['general_deductions'] = "found unexpected import(s): " + repr(found_imports)
    comments[rubric_item] = results[rubric_item]['general_deductions']