#Robotics Data Analyst
In this activity, you are taking on the role of a Data Analyst for a Robotics Tournament. Your job is to clean up messy information about teams, matches, scores, and workshops so the event can run smoothly. You’ll decide which Python data structures (lists, dictionaries, tuples, and sets) are the best tools for different tasks—like sorting schedules, adding up scores, or checking for mistakes. Just like a real analyst, you’ll need to think carefully about how to organize data so it’s accurate, efficient, and easy to use.<br><br>

Your audience for this is someone who doesn't know about code, but wants to see how you got your results. They don't just want to see you write a paper with no supporting code to back it up, but they also don't want to see a big block of code with no supporting explanation. <br><br>

A key part of your job is to do the analysis, but also to explain your work. The final report should be a very intentional mix of code with text. **You should not have giant blocks of code**. For example, a function should be its own code block. Every 4-6 lines should include some explanation of what you are doing. That's just a rough estimate though. Think about your code in terms of discrete chunks. Each chunk should be a code block.

In [1]:
# teams.csv-like rows you parsed already:
raw_teams = [
    ["HHS-1729", "Hudson HS", "Ada", "ada@hudsonhs.org", "Lin", "lin@hudsonhs.org"],
    ["FRM-101",  "Framingham HS", "Sam", "sam@frhs.org", "Tess", "tess@frhs.org"],
    ["MRB-88",   "Marlborough HS", "Kai", "kai@mrbhs.org", "Ada", "ada@hudsonhs.org"], 
    ["WBY-77",   "Westborough HS", "Ira", "ira@wby.org", "Mo", "mo@wby.org"],
]

# provisional match schedule (field, start_time, teamA, teamB)
# Times are ISO strings in the format YYYY-MM-DDTHH:MM
randomized_matches = [
    ('Field-2', '2025-09-27T12:30', 'FRM-101', 'WBY-77'),
    ('Field-1', '2025-09-27T16:15', 'FRM-101', 'HHS-1729'),
    ('Field-2', '2025-09-27T11:45', 'WBY-77', 'FRM-101'),
    ('Field-1', '2025-09-27T11:00', 'HHS-1729', 'MRB-88'),
    ('Field-2', '2025-09-27T17:00', 'FRM-101', 'MRB-88'),
    ('Field-1', '2025-09-27T15:30', 'WBY-77', 'HHS-1729'),
    ('Field-1', '2025-09-27T17:45', 'MRB-88', 'WBY-77'),
    ('Field-2', '2025-09-27T17:45', 'HHS-1729', 'FRM-101'),
    ('Field-2', '2025-09-27T18:30', 'MRB-88', 'HHS-1729'),
    ('Field-1', '2025-09-27T13:15', 'HHS-1729', 'WBY-77'),
    ('Field-2', '2025-09-27T10:15', 'MRB-88', 'WBY-77'),
    ('Field-2', '2025-09-27T15:30', 'MRB-88', 'FRM-101'),
    ('Field-2', '2025-09-27T14:00', 'WBY-77', 'MRB-88'),
    ('Field-1', '2025-09-27T17:00', 'HHS-1729', 'WBY-77'),
    ('Field-1', '2025-09-27T14:45', 'HHS-1729', 'MRB-88'),
    ('Field-2', '2025-09-27T16:15', 'WBY-77', 'MRB-88'),
    ('Field-1', '2025-09-27T09:30', 'WBY-77', 'HHS-1729'),
    ('Field-1', '2025-09-27T12:30', 'MRB-88', 'HHS-1729'),
    ('Field-2', '2025-09-27T14:45', 'FRM-101', 'WBY-77'),
    ('Field-2', '2025-09-27T13:15', 'MRB-88', 'FRM-101'),
    ('Field-2', '2025-09-27T09:30', 'MRB-88', 'FRM-101'),
    ('Field-1', '2025-09-27T19:15', 'HHS-1729', 'FRM-101'),
    ('Field-1', '2025-09-27T14:00', 'FRM-101', 'HHS-1729'),
    ('Field-1', '2025-09-27T10:15', 'HHS-1729', 'FRM-101'),
    ('Field-1', '2025-09-27T18:30', 'WBY-77', 'FRM-101'),
]

# reported scores (field, time, teamA, ptsA, teamB, ptsB)
raw_scores = [
    ("Field-1", "2025-09-27T09:30", "WBY-77", 22, "HHS-1729", 18),
    ("Field-2", "2025-09-27T09:30", "MRB-88", 25, "FRM-101", 25),
    ("Field-1", "2025-09-27T10:15", "HHS-1729", 17, "FRM-101", 31),
    ("Field-2", "2025-09-27T10:15", "MRB-88", 27, "WBY-77", 23),
    ("Field-1", "2025-09-27T11:00", "HHS-1729", 22, "MRB-88", 19),

    ("Field-2", "2025-09-27T11:45", "WBY-77", 26, "FRM-101", 24),
    ("Field-1", "2025-09-27T12:30", "MRB-88", 28, "HHS-1729", 30),
    ("Field-2", "2025-09-27T12:30", "FRM-101", 24, "WBY-77", 22),
    ("Field-1", "2025-09-27T13:15", "HHS-1729", 19, "WBY-77", 23),
    ("Field-2", "2025-09-27T13:15", "MRB-88", 32, "FRM-101", 29),

    ("Field-1", "2025-09-27T14:00", "FRM-101", 27, "HHS-1729", 25),
    ("Field-2", "2025-09-27T14:00", "WBY-77", 20, "MRB-88", 22),
    ("Field-1", "2025-09-27T14:45", "HHS-1729", 24, "MRB-88", 26),
    ("Field-2", "2025-09-27T14:45", "FRM-101", 18, "WBY-77", 17),
    ("Field-1", "2025-09-27T15:30", "WBY-77", 28, "HHS-1729", 31),

    ("Field-2", "2025-09-27T15:30", "MRB-88", 24, "FRM-101", 26),
    ("Field-1", "2025-09-27T16:15", "FRM-101", 33, "HHS-1729", 20),
    ("Field-2", "2025-09-27T16:15", "WBY-77", 23, "MRB-88", 27),
    ("Field-1", "2025-09-27T17:00", "HHS-1729", 29, "WBY-77", 27),
    ("Field-2", "2025-09-27T17:00", "FRM-101", 22, "MRB-88", 24),

    ("Field-1", "2025-09-27T17:45", "MRB-88", 21, "WBY-77", 19),
    ("Field-2", "2025-09-27T17:45", "HHS-1729", 20, "FRM-101", 18),
    ("Field-1", "2025-09-27T18:30", "WBY-77", 28, "FRM-101", 26),
    ("Field-2", "2025-09-27T18:30", "MRB-88", 25, "HHS-1729", 21),
    ("Field-1", "2025-09-27T19:15", "HHS-1729", 26, "FRM-101", 24),
]

# workshop signups: participant_email, workshop_code
raw_workshops = [
    ("ada@hudsonhs.org", "VIS"), 
    ("lin@hudsonhs.org", "PID"),
    ("sam@frhs.org", "PID"), 
    ("tess@frhs.org", "PID"),
    ("kai@mrbhs.org", "PID"), 
    ("ira@wby.org", "VIS"), 
    ("mo@wby.org", "VIS"),
    ("ada@hudsonhs.org", "PID"),  
]

# workshop capacities (could also be given as flat CSV rows later)
workshop_caps = [
    ("PID", "4"),
    ("VIS", "3")
]

Whenever you start working with data, you should first explore it. We will get more tools for doing this as we learn more, but we can already investigate these lists. In the space below, you have a couple of different coding fields to work with. In that space, I want you to show evidence that you have explored these lists. That can be:


*   Looping through the lists printing items
*   Using type() to confirm data types
*   Using other tools such as len(), min(), max() to find key data points
*   Looking at individual dictionaries to confirm that you know how they work. <br>
You can add additional entries by hitting the 'b' key. After each code block, insert a short written block to explain what you found.
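As one possible illustration of the exploration steps listed above, here is a minimal sketch. It uses small stand-in lists (`sample_teams` and `sample_matches` are hypothetical names, shaped like `raw_teams` and `randomized_matches`) rather than the full data.

```python
# Small stand-in samples shaped like raw_teams and randomized_matches (assumed, not the full data)
sample_teams = [
    ["HHS-1729", "Hudson HS", "Ada", "ada@hudsonhs.org", "Lin", "lin@hudsonhs.org"],
    ["FRM-101", "Framingham HS", "Sam", "sam@frhs.org", "Tess", "tess@frhs.org"],
]
sample_matches = [
    ("Field-2", "2025-09-27T12:30", "FRM-101", "WBY-77"),
    ("Field-1", "2025-09-27T09:30", "WBY-77", "HHS-1729"),
    ("Field-1", "2025-09-27T19:15", "HHS-1729", "FRM-101"),
]

# Loop through the rows and check the type and length of each one
for row in sample_teams:
    print(type(row), len(row), row[0])

# ISO-formatted time strings sort correctly as plain text,
# so min() and max() find the earliest and latest start times
start_times = [match[1] for match in sample_matches]
earliest = min(start_times)
latest = max(start_times)
print(earliest, latest)
```

The same pattern should work on the real lists by swapping in `raw_teams` and `randomized_matches`.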






In [16]:
print(type(raw_teams))
print(len(randomized_matches))
print(type(randomized_matches))

<class 'list'>
25
<class 'list'>


#Part A — Choose representations (immutability vs. mutability)

1. Participants/teams: Convert each raw_teams row into a record you won’t accidentally mutate (e.g., for stable IDs).
*   The data structure should have information about each school: team number, high school name, grouped mentor 1 information, grouped mentor 2 information
*   **Deliverable**: a collection of team records; **justify** your data structure choice.

In [2]:
#Complete #1 here
new_raw_teams = []
for row in raw_teams:
    # Group each mentor's name and email into an inner tuple
    new_raw_teams.append((row[0], row[1], (row[2], row[3]), (row[4], row[5])))
new_raw_teams

[('HHS-1729',
  'Hudson HS',
  ('Ada', 'ada@hudsonhs.org'),
  ('Lin', 'lin@hudsonhs.org')),
 ('FRM-101',
  'Framingham HS',
  ('Sam', 'sam@frhs.org'),
  ('Tess', 'tess@frhs.org')),
 ('MRB-88',
  'Marlborough HS',
  ('Kai', 'kai@mrbhs.org'),
  ('Ada', 'ada@hudsonhs.org')),
 ('WBY-77', 'Westborough HS', ('Ira', 'ira@wby.org'), ('Mo', 'mo@wby.org'))]

Tuples make sense for storing the team records because the teams never change. The key property of tuples is that they are immutable (they cannot be modified after they are created), and since the team information should not be modified either, tuples are the most logical choice here. Each mentor's name and email are also grouped in their own inner tuple so the pair stays together.
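To make the immutability argument concrete, here is a small sketch that tries to overwrite a team ID in both a list and a tuple. The names `team_record`, `team_as_list`, and `mutation_blocked` are made up for this illustration.

```python
# One record in the same shape as the entries of new_raw_teams
team_record = ("HHS-1729", "Hudson HS", ("Ada", "ada@hudsonhs.org"), ("Lin", "lin@hudsonhs.org"))

# The same data stored as a list can be overwritten by accident
team_as_list = list(team_record)
team_as_list[0] = "oops"

# The tuple refuses item assignment, so the team ID stays stable
try:
    team_record[0] = "oops"
    mutation_blocked = False
except TypeError:
    mutation_blocked = True

print(mutation_blocked)
print(team_record[0])
```

The list quietly accepts the change, while the tuple raises a `TypeError`, which is the behavior we want for stable IDs.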


2. Build a fast lookup from team code → school/mentors.
*   **Deliverable**: a dictionary mapping `team_id` to a collection of information about that team. The information should include which school is represented by the ID and the team's mentors. We'll deal with points and records later.

In [3]:
#Complete #2 here
teams_dict = {}
for info in raw_teams:
    team_id, school, mentor_one, mentor_one_email, mentor_two, mentor_two_email = info
    teams_dict[team_id] = {"school":school, "mentor_one":mentor_one, "mentor_two":mentor_two,"mentor_one_email":mentor_one_email, "mentor_two_email":mentor_two_email}
teams_dict['HHS-1729']["school"]
teams_dict

{'HHS-1729': {'school': 'Hudson HS',
  'mentor_one': 'Ada',
  'mentor_two': 'Lin',
  'mentor_one_email': 'ada@hudsonhs.org',
  'mentor_two_email': 'lin@hudsonhs.org'},
 'FRM-101': {'school': 'Framingham HS',
  'mentor_one': 'Sam',
  'mentor_two': 'Tess',
  'mentor_one_email': 'sam@frhs.org',
  'mentor_two_email': 'tess@frhs.org'},
 'MRB-88': {'school': 'Marlborough HS',
  'mentor_one': 'Kai',
  'mentor_two': 'Ada',
  'mentor_one_email': 'kai@mrbhs.org',
  'mentor_two_email': 'ada@hudsonhs.org'},
 'WBY-77': {'school': 'Westborough HS',
  'mentor_one': 'Ira',
  'mentor_two': 'Mo',
  'mentor_one_email': 'ira@wby.org',
  'mentor_two_email': 'mo@wby.org'}}

I used a dictionary of dictionaries to store my teams in an organized fashion because it allows each team to be accessed by its team ID, and within that, any specific piece of information can be accessed by the key that corresponds to it rather than by an index (like you would have in a list or a tuple). This will help me write neater code later on because I can set helper variables to the various values in the dictionary.
(Making the dictionary itself is also pretty simple: I just iterated through the raw teams, stored the info as variables, and then added the team ID as a key with a dictionary of its info as the value.)
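The difference between index access and key access can be sketched as follows. This is only an illustration with one team; `team_row` and `teams_lookup` are hypothetical names that mirror a `raw_teams` row and a `teams_dict` entry.

```python
# One team stored both ways (same values as the Hudson HS row)
team_row = ["HHS-1729", "Hudson HS", "Ada", "ada@hudsonhs.org", "Lin", "lin@hudsonhs.org"]
teams_lookup = {
    "HHS-1729": {
        "school": "Hudson HS",
        "mentor_one": "Ada",
        "mentor_one_email": "ada@hudsonhs.org",
    }
}

# List access: the reader has to remember that position 3 is the first mentor's email
email_by_index = team_row[3]

# Dictionary access: the keys say what is being fetched
email_by_key = teams_lookup["HHS-1729"]["mentor_one_email"]

# .get() returns a default instead of raising KeyError for an unknown team ID
missing_team = teams_lookup.get("XYZ-000", "not registered")

print(email_by_index, email_by_key, missing_team)
```

Both lookups return the same email, but the dictionary version documents itself and does not break if the column order of the raw rows changes.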

#Part B - Sort a mixed schedule

Clean and sort `randomized_matches` by **start_time** then **field** (so ties on time are broken alphabetically by field). We'll talk a bit more about how to do that below.

*   **Deliverable**: a list of normalized match tuples

We should see something like below after we are done:  

("Field-1", "2025-09-27T09:30", "HHS-1729", "WBY-77"),
("Field-2", "2025-09-27T09:30", "FRM-101", "MRB-88"),
("Field-1", "2025-09-27T10:15", "FRM-101", "HHS-1729"),
("Field-2", "2025-09-27T10:15", "MRB-88", "WBY-77"),
("Field-1", "2025-09-27T11:00", "MRB-88", "HHS-1729")

In this task, we need to do multi-sorting. We might find that two matches have the same start time, which could cause a jam in our sorting. We need to break that jam by sorting using two factors. We can do that by creating an extra function to help us:


In [4]:
def sorting_key(match):
    # Return a tuple where the first element is the time of the match and the second is the field
    field = match[0]
    time = match[1]
    return (time, field)

If done correctly, we should see `("2025-09-27T12:30", "Field-2")` as the output from the function below:

In [5]:
print(sorting_key(randomized_matches[0]))

('2025-09-27T12:30', 'Field-2')


We can then apply this function to our sorting method:

In [6]:
sorted_matches = sorted(randomized_matches, key=sorting_key)  # Conventional ascending sort: compares the match time first, then the field
print(sorted_matches)

[('Field-1', '2025-09-27T09:30', 'WBY-77', 'HHS-1729'), ('Field-2', '2025-09-27T09:30', 'MRB-88', 'FRM-101'), ('Field-1', '2025-09-27T10:15', 'HHS-1729', 'FRM-101'), ('Field-2', '2025-09-27T10:15', 'MRB-88', 'WBY-77'), ('Field-1', '2025-09-27T11:00', 'HHS-1729', 'MRB-88'), ('Field-2', '2025-09-27T11:45', 'WBY-77', 'FRM-101'), ('Field-1', '2025-09-27T12:30', 'MRB-88', 'HHS-1729'), ('Field-2', '2025-09-27T12:30', 'FRM-101', 'WBY-77'), ('Field-1', '2025-09-27T13:15', 'HHS-1729', 'WBY-77'), ('Field-2', '2025-09-27T13:15', 'MRB-88', 'FRM-101'), ('Field-1', '2025-09-27T14:00', 'FRM-101', 'HHS-1729'), ('Field-2', '2025-09-27T14:00', 'WBY-77', 'MRB-88'), ('Field-1', '2025-09-27T14:45', 'HHS-1729', 'MRB-88'), ('Field-2', '2025-09-27T14:45', 'FRM-101', 'WBY-77'), ('Field-1', '2025-09-27T15:30', 'WBY-77', 'HHS-1729'), ('Field-2', '2025-09-27T15:30', 'MRB-88', 'FRM-101'), ('Field-1', '2025-09-27T16:15', 'FRM-101', 'HHS-1729'), ('Field-2', '2025-09-27T16:15', 'WBY-77', 'MRB-88'), ('Field-1', '2025-

A sorting key was useful for this because it lets us pass our matches and the key function to sorted() and get the ordered matches back. It also keeps the code readable because the key itself is very simple (it just returns the time and field in the order they should be compared), in contrast to something like a lambda function, which can look pretty confusing.
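For comparison, here is a sketch of the same two-level sort written three ways: with a named key function, with a lambda, and with `operator.itemgetter` from the standard library. The three-row `sample_matches` list is a stand-in for the full schedule.

```python
from operator import itemgetter

sample_matches = [
    ("Field-2", "2025-09-27T09:30", "MRB-88", "FRM-101"),
    ("Field-1", "2025-09-27T10:15", "HHS-1729", "FRM-101"),
    ("Field-1", "2025-09-27T09:30", "WBY-77", "HHS-1729"),
]

def sorting_key(match):
    # Same idea as the function above: (time, field)
    return (match[1], match[0])

by_function = sorted(sample_matches, key=sorting_key)
by_lambda = sorted(sample_matches, key=lambda match: (match[1], match[0]))
by_itemgetter = sorted(sample_matches, key=itemgetter(1, 0))

# Tuples compare element by element, so the field only matters when the times are equal
print(("2025-09-27T09:30", "Field-1") < ("2025-09-27T09:30", "Field-2"))
print(by_function == by_lambda == by_itemgetter)
```

All three produce the same order, so the choice between them is mostly about which one reads most clearly to your audience.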

#Part C — Detect cross-team mentor conflicts (duplicate emails)

Using the team data, find any emails that appear on multiple teams.
*   **Deliverable**: produce a report like `{"xxxx@xxxxx.org": {"xxx-####", "xxx-####"}}`, where we can see that xxxx@xxxxx.org was listed for two different teams.

Complete this using code that could be applied to a different data set, for example, one with more names. 


In [7]:
dupe_check = {}  # mentor email -> list of team IDs that list that email
dupes = {}       # emails that appear on more than one team -> those team IDs
for num, teams in teams_dict.items():
    current_mentor_one_email = teams["mentor_one_email"]
    current_mentor_two_email = teams["mentor_two_email"]
    # Compare emails rather than first names, since two different mentors could share a name
    if current_mentor_one_email in dupe_check:
        dupe_check[current_mentor_one_email].append(num)
        dupes[current_mentor_one_email] = dupe_check[current_mentor_one_email]
    else:
        dupe_check[current_mentor_one_email] = [num]
    if current_mentor_two_email in dupe_check:
        dupe_check[current_mentor_two_email].append(num)
        dupes[current_mentor_two_email] = dupe_check[current_mentor_two_email]
    else:
        dupe_check[current_mentor_two_email] = [num]
dupes

    

{'ada@hudsonhs.org': ['HHS-1729', 'MRB-88']}

To find out which mentors were on multiple teams, I created two dictionaries: one that keeps track of every mentor seen so far along with the teams they are listed on (dupe_check), and one that holds only the mentors who appear on more than one team (dupes). I iterated through the teams dictionary I made earlier, keeping track of each team ID and the data stored for that team. For each of the team's two mentors, I checked whether they were already in dupe_check. If they were, I appended the current team ID to their list so the code keeps track of every team they are on (even if it is more than two), and I added them to dupes with their email as the key and the list of their teams as the value. If they were not, I added them to dupe_check with the current team ID. Lastly, I displayed the dupes dictionary once I was done iterating and found that Ada was the only mentor on more than one team, being listed for both Hudson HS and Marlborough HS.
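The deliverable shows sets of team IDs as the values. As an alternative sketch (not the approach used above), `collections.defaultdict` with `set` values gives that shape directly. The names `sample_teams`, `teams_by_email`, and `conflicts` are assumptions for this example, and the rows follow the `raw_teams` layout.

```python
from collections import defaultdict

# Stand-in rows in the raw_teams layout: id, school, mentor 1 name/email, mentor 2 name/email
sample_teams = [
    ["HHS-1729", "Hudson HS", "Ada", "ada@hudsonhs.org", "Lin", "lin@hudsonhs.org"],
    ["MRB-88", "Marlborough HS", "Kai", "kai@mrbhs.org", "Ada", "ada@hudsonhs.org"],
    ["WBY-77", "Westborough HS", "Ira", "ira@wby.org", "Mo", "mo@wby.org"],
]

# Every email starts with an empty set, so no "is it there yet?" check is needed
teams_by_email = defaultdict(set)
for team_id, _school, _name_one, email_one, _name_two, email_two in sample_teams:
    teams_by_email[email_one].add(team_id)
    teams_by_email[email_two].add(team_id)

# Keep only the emails that are attached to more than one team
conflicts = {email: teams for email, teams in teams_by_email.items() if len(teams) > 1}
print(conflicts)
```

Because the values are sets, an email listed twice on the same team would not be reported as a conflict, and the code works unchanged for any number of teams.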

#Part D — Build standings with real aggregation tools

Compute **total points by team** across all **raw_scores**, then list the top 3. Handle ties by breaking ties with **team_id** alphabetically.



*   **Deliverable**: an ordered list like [("FRM-101", 56), ("MRB-88", 71), ...].

A tool that can be useful for us is:

In [8]:
for match in raw_scores:
    field, time, teamA, ptsA, teamB, ptsB = match
    
    print(f"The match was played on {field} at {time} {teamA} had {ptsA} and {teamB} had {ptsB}")

The match was played on Field-1 at 2025-09-27T09:30 WBY-77 had 22 and HHS-1729 had 18
The match was played on Field-2 at 2025-09-27T09:30 MRB-88 had 25 and FRM-101 had 25
The match was played on Field-1 at 2025-09-27T10:15 HHS-1729 had 17 and FRM-101 had 31
The match was played on Field-2 at 2025-09-27T10:15 MRB-88 had 27 and WBY-77 had 23
The match was played on Field-1 at 2025-09-27T11:00 HHS-1729 had 22 and MRB-88 had 19
The match was played on Field-2 at 2025-09-27T11:45 WBY-77 had 26 and FRM-101 had 24
The match was played on Field-1 at 2025-09-27T12:30 MRB-88 had 28 and HHS-1729 had 30
The match was played on Field-2 at 2025-09-27T12:30 FRM-101 had 24 and WBY-77 had 22
The match was played on Field-1 at 2025-09-27T13:15 HHS-1729 had 19 and WBY-77 had 23
The match was played on Field-2 at 2025-09-27T13:15 MRB-88 had 32 and FRM-101 had 29
The match was played on Field-1 at 2025-09-27T14:00 FRM-101 had 27 and HHS-1729 had 25
The match was played on Field-2 at 2025-09-27T14:00 WBY-77

This will extract six variables from each match. In this case I just printed the contents, but there could be more valuable uses!

Write your code in chunks with short explanations of what you are doing after each chunk.

In [9]:
wins_and_points = {}
for match in raw_scores:
    field, time, teamA, ptsA, teamB, ptsB = match
    if teamA not in wins_and_points:
        wins_and_points[teamA] = [0, 0]
    if teamB not in wins_and_points:
        wins_and_points[teamB] = [0, 0]
    current_teamA_wins = wins_and_points[teamA][0]
    current_teamA_pts = wins_and_points[teamA][1] + ptsA
    current_teamB_wins = wins_and_points[teamB][0]
    current_teamB_pts = wins_and_points[teamB][1] + ptsB
    
    #part two:
    
    if ptsA > ptsB:
        wins_and_points[teamA] = [current_teamA_wins + 1, current_teamA_pts]
        wins_and_points[teamB] = [current_teamB_wins, current_teamB_pts]
    elif ptsB > ptsA:
        wins_and_points[teamB] = [current_teamB_wins + 1, current_teamB_pts]
        wins_and_points[teamA] = [current_teamA_wins, current_teamA_pts]
    else:
        wins_and_points[teamA] = [current_teamA_wins, current_teamA_pts]
        wins_and_points[teamB] = [current_teamB_wins, current_teamB_pts]
wins_and_points
    

{'WBY-77': [4, 278],
 'HHS-1729': [6, 302],
 'MRB-88': [8, 300],
 'FRM-101': [6, 327]}

In this part, I found the number of wins and points for each team using a dictionary with lists as values to keep track of the teams' stats throughout the matches. To start, I created an empty dictionary that would be used to store the teams with their respective wins and points. I then iterated through all the matches and organized the data for each match by creating variables for the field, the time, the first team playing, the first team's points, the second team playing, and the second team's points. I checked whether each team was in the dictionary yet; if not, I added that team with 0 wins and 0 points. After that, I stored each team's current wins and points (plus the points they received from the current match).

Once I had all of the data stored in variables, I checked which team won the match by comparing the points each team earned in that match. The winner was stored back into the dictionary with one additional win, while the loser kept its win count; in a tie, neither team gets a win (for a tie everyone loses). Both teams' point totals were updated either way. Printing the dictionary shows that Westborough HS had 4 wins and 278 points, Hudson HS had 6 wins and 302 points, Marlborough HS had 8 wins and 300 points, and Framingham HS had 6 wins and 327 points.

In [10]:
def sorting_help(team):
    team_name = team[0]
    team_wins = team[1][0]
    team_score = team[1][1]
    # Negate wins and points so an ascending sort puts the highest values first,
    # while team names still break ties in A-to-Z order
    return (-team_wins, -team_score, team_name)
    
print(sorted(wins_and_points.items(), key=sorting_help))


[('MRB-88', [8, 300]), ('FRM-101', [6, 327]), ('HHS-1729', [6, 302]), ('WBY-77', [4, 278])]


Once I had a finished dictionary, it was fairly easy to create standings ordered by wins, with points and then alphabetical team IDs as the tiebreakers. I made a function that returns the values to sort by, in order: wins, points, ID. Wins and points are negated so that the default ascending sort puts the largest values first while the IDs stay in alphabetical order; passing reverse=True instead would also flip the IDs into reverse alphabetical order. I then printed the items of the wins and points dictionary, using the sorted function with this key, so that first place prints out first.
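The prompt for this part asks for total points and a top 3. One way to sketch just that piece is with `collections.Counter`, shown here on the first four reported matches only. The names `sample_scores`, `total_points`, `standings`, and `top_three` are chosen for this illustration.

```python
from collections import Counter

# A few rows in the raw_scores layout: field, time, teamA, ptsA, teamB, ptsB
sample_scores = [
    ("Field-1", "2025-09-27T09:30", "WBY-77", 22, "HHS-1729", 18),
    ("Field-2", "2025-09-27T09:30", "MRB-88", 25, "FRM-101", 25),
    ("Field-1", "2025-09-27T10:15", "HHS-1729", 17, "FRM-101", 31),
    ("Field-2", "2025-09-27T10:15", "MRB-88", 27, "WBY-77", 23),
]

# A Counter starts every missing key at 0, so no setup step is needed per team
total_points = Counter()
for _field, _time, team_a, pts_a, team_b, pts_b in sample_scores:
    total_points[team_a] += pts_a
    total_points[team_b] += pts_b

# Sort by points descending (negated), then by team_id A to Z for ties
standings = sorted(total_points.items(), key=lambda item: (-item[1], item[0]))
top_three = standings[:3]
print(top_three)
```

On these four matches the top three are FRM-101 with 56 points, MRB-88 with 52, and WBY-77 with 45. Running the same loop over all of `raw_scores` should give the full-tournament totals.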

#Part E — Team activity summary (grouping)

Produce, for each team, a summary dict:

```
{
  "HHS-1729": {
    "school": "Hudson HS",
    "mentors": ["Ada","Lin"],
    "matches": 3,
    "total_points": 57,
    "workshops": ["PID", "VIS"]
  },
  ...
}
```


In [11]:
summary_dict = {}
for team_num, team_info in teams_dict.items():
    number_of_matches = 0
    # Look values up by key instead of relying on the order of team_info.values()
    school = team_info["school"]
    mentor_one, mentor_two = team_info["mentor_one"], team_info["mentor_two"]
    mentor_one_email, mentor_two_email = team_info["mentor_one_email"], team_info["mentor_two_email"]
    workshops = set()  # a set ignores repeated workshop codes
    points = wins_and_points[team_num][1]
    for match in randomized_matches:
        number_of_matches += match.count(team_num)
    for workshop in raw_workshops:
        if mentor_one_email in workshop or mentor_two_email in workshop:
            workshops.add(workshop[1])
    summary_dict[team_num] = {
        "school": school,
        "mentors": [mentor_one, mentor_two],
        "matches": number_of_matches,
        "win_rate": f"{100 * wins_and_points[team_num][0] / number_of_matches:.2f}%",
        "total_points": points,
        "workshops": workshops,
    }

In [12]:
for team, information in summary_dict.items():
    print(f"{team}: {{")
    for key, item in information.items():
        print(f"\t{key}: {item}")



HHS-1729: {
	school: Hudson HS
	mentors: ['Ada', 'Lin']
	matches: 13
	win_rate: 46.15%
	total_points: 302
	workshops: {'PID', 'VIS'}
FRM-101: {
	school: Framingham HS
	mentors: ['Sam', 'Tess']
	matches: 13
	win_rate: 46.15%
	total_points: 327
	workshops: {'PID'}
MRB-88: {
	school: Marlborough HS
	mentors: ['Kai', 'Ada']
	matches: 12
	win_rate: 66.67%
	total_points: 300
	workshops: {'PID', 'VIS'}
WBY-77: {
	school: Westborough HS
	mentors: ['Ira', 'Mo']
	matches: 12
	win_rate: 33.33%
	total_points: 278
	workshops: {'VIS'}


To finish, I created a summary dictionary. I started by iterating through the teams dictionary I made in Part A and storing each team's values in variables. I then made a new set to store workshops (I chose a set because I don't want any repeats) and iterated through the raw workshop list, adding any workshop that had either of the mentors' emails associated with it to the set. I also iterated through the randomized matches list and counted the matches each team played by tallying the appearances of each team ID with the .count() method. I calculated the win rate by dividing the number of wins by the number of matches played and multiplying by 100 to get a percent. I put all of that information into a dictionary with the team ID as the key, and then printed it out by iterating through it and indenting the information for each team.
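Two details from this summary can be sketched in isolation: how a set drops repeated workshop codes, and how the win rate string is formatted. The signups below are a stand-in for `raw_workshops`, and `team_emails`, `workshop_list`, and `win_rate` are hypothetical names.

```python
# Signups in the raw_workshops layout: (participant_email, workshop_code)
sample_signups = [
    ("ada@hudsonhs.org", "VIS"),
    ("lin@hudsonhs.org", "PID"),
    ("ada@hudsonhs.org", "PID"),
    ("sam@frhs.org", "PID"),
]
team_emails = {"ada@hudsonhs.org", "lin@hudsonhs.org"}  # the two Hudson HS mentors

workshops = set()
for email, code in sample_signups:
    if email in team_emails:
        workshops.add(code)  # adding "PID" a second time leaves the set unchanged

# sorted() turns the set into an alphabetical list, like ["PID", "VIS"] in the target summary
workshop_list = sorted(workshops)

# Win rate for HHS-1729, using the 6 wins and 13 matches reported above
wins, matches = 6, 13
win_rate = f"{100 * wins / matches:.2f}%"

print(workshop_list, win_rate)
```

Converting the set with `sorted()` is optional, but it gives a list in a predictable order, which matches the list shown in the example summary at the start of this part.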