#Robotics Data Analyst
In this activity, you are taking on the role of a Data Analyst for a Robotics Tournament. Your job is to clean up messy information about teams, matches, scores, and workshops so the event can run smoothly. You’ll decide which Python data structures (lists, dictionaries, tuples, and sets) are the best tools for different tasks—like sorting schedules, adding up scores, or checking for mistakes. Just like a real analyst, you’ll need to think carefully about how to organize data so it’s accurate, efficient, and easy to use.<br><br>

Your audience for this is someone who doesn't know about code, but wants to see how you got your results. They don't want a paper with no supporting code to back it up, but they also don't want a big block of code with no supporting explanation. <br><br>

A key part of your job is to do the analysis, but also to explain your work. The final report should be a very intentional mix of code with text. **You should not have giant blocks of code**. For example, a function should be its own code block. Every 4-6 lines should include some explanation of what you are doing. That's just a rough estimate though. Think about your code in terms of discrete chunks. Each chunk should be a code block.

In [75]:
# teams.csv-like rows you parsed already:
raw_teams = [
    ["HHS-1729", "Hudson HS", "Ada", "ada@hudsonhs.org", "Lin", "lin@hudsonhs.org"],
    ["FRM-101",  "Framingham HS", "Sam", "sam@frhs.org", "Tess", "tess@frhs.org"],
    ["MRB-88",   "Marlborough HS", "Kai", "kai@mrbhs.org", "Ada", "ada@hudsonhs.org"], 
    ["WBY-77",   "Westborough HS", "Ira", "ira@wby.org", "Mo", "mo@wby.org"],
]

# provisional match schedule (field, start_time, teamA, teamB)
# Times are ISO strings in the format YYYY-MM-DDTHH:MM
raw_matches = [
    ('Field-2', '2025-09-27T12:30', 'FRM-101', 'WBY-77'),
    ('Field-1', '2025-09-27T16:15', 'FRM-101', 'HHS-1729'),
    ('Field-2', '2025-09-27T11:45', 'WBY-77', 'FRM-101'),
    ('Field-1', '2025-09-27T11:00', 'HHS-1729', 'MRB-88'),
    ('Field-2', '2025-09-27T17:00', 'FRM-101', 'MRB-88'),
    ('Field-1', '2025-09-27T15:30', 'WBY-77', 'HHS-1729'),
    ('Field-1', '2025-09-27T17:45', 'MRB-88', 'WBY-77'),
    ('Field-2', '2025-09-27T17:45', 'HHS-1729', 'FRM-101'),
    ('Field-2', '2025-09-27T18:30', 'MRB-88', 'HHS-1729'),
    ('Field-1', '2025-09-27T13:15', 'HHS-1729', 'WBY-77'),
    ('Field-2', '2025-09-27T10:15', 'MRB-88', 'WBY-77'),
    ('Field-2', '2025-09-27T15:30', 'MRB-88', 'FRM-101'),
    ('Field-2', '2025-09-27T14:00', 'WBY-77', 'MRB-88'),
    ('Field-1', '2025-09-27T17:00', 'HHS-1729', 'WBY-77'),
    ('Field-1', '2025-09-27T14:45', 'HHS-1729', 'MRB-88'),
    ('Field-2', '2025-09-27T16:15', 'WBY-77', 'MRB-88'),
    ('Field-1', '2025-09-27T09:30', 'WBY-77', 'HHS-1729'),
    ('Field-1', '2025-09-27T12:30', 'MRB-88', 'HHS-1729'),
    ('Field-2', '2025-09-27T14:45', 'FRM-101', 'WBY-77'),
    ('Field-2', '2025-09-27T13:15', 'MRB-88', 'FRM-101'),
    ('Field-2', '2025-09-27T09:30', 'MRB-88', 'FRM-101'),
    ('Field-1', '2025-09-27T19:15', 'HHS-1729', 'FRM-101'),
    ('Field-1', '2025-09-27T14:00', 'FRM-101', 'HHS-1729'),
    ('Field-1', '2025-09-27T10:15', 'HHS-1729', 'FRM-101'),
    ('Field-1', '2025-09-27T18:30', 'WBY-77', 'FRM-101'),
]

# reported scores (field, time, team1_id, team1_points, team2_id, team2_points)
raw_scores = [
    ("Field-1", "2025-09-27T09:30", "WBY-77", 22, "HHS-1729", 18),
    ("Field-2", "2025-09-27T09:30", "MRB-88", 25, "FRM-101", 25),
    ("Field-1", "2025-09-27T10:15", "HHS-1729", 17, "FRM-101", 31),
    ("Field-2", "2025-09-27T10:15", "MRB-88", 27, "WBY-77", 23),
    ("Field-1", "2025-09-27T11:00", "HHS-1729", 22, "MRB-88", 19),

    ("Field-2", "2025-09-27T11:45", "WBY-77", 26, "FRM-101", 24),
    ("Field-1", "2025-09-27T12:30", "MRB-88", 28, "HHS-1729", 30),
    ("Field-2", "2025-09-27T12:30", "FRM-101", 24, "WBY-77", 22),
    ("Field-1", "2025-09-27T13:15", "HHS-1729", 19, "WBY-77", 23),
    ("Field-2", "2025-09-27T13:15", "MRB-88", 32, "FRM-101", 29),

    ("Field-1", "2025-09-27T14:00", "FRM-101", 27, "HHS-1729", 25),
    ("Field-2", "2025-09-27T14:00", "WBY-77", 20, "MRB-88", 22),
    ("Field-1", "2025-09-27T14:45", "HHS-1729", 24, "MRB-88", 26),
    ("Field-2", "2025-09-27T14:45", "FRM-101", 18, "WBY-77", 17),
    ("Field-1", "2025-09-27T15:30", "WBY-77", 28, "HHS-1729", 31),

    ("Field-2", "2025-09-27T15:30", "MRB-88", 24, "FRM-101", 26),
    ("Field-1", "2025-09-27T16:15", "FRM-101", 33, "HHS-1729", 20),
    ("Field-2", "2025-09-27T16:15", "WBY-77", 23, "MRB-88", 27),
    ("Field-1", "2025-09-27T17:00", "HHS-1729", 29, "WBY-77", 27),
    ("Field-2", "2025-09-27T17:00", "FRM-101", 22, "MRB-88", 24),

    ("Field-1", "2025-09-27T17:45", "MRB-88", 21, "WBY-77", 19),
    ("Field-2", "2025-09-27T17:45", "HHS-1729", 20, "FRM-101", 18),
    ("Field-1", "2025-09-27T18:30", "WBY-77", 28, "FRM-101", 26),
    ("Field-2", "2025-09-27T18:30", "MRB-88", 25, "HHS-1729", 21),
    ("Field-1", "2025-09-27T19:15", "HHS-1729", 26, "FRM-101", 24),
]

# workshop signups: participant_email, workshop_code
raw_workshops = [
    ("ada@hudsonhs.org", "VIS"), 
    ("lin@hudsonhs.org", "PID"),
    ("sam@frhs.org", "PID"), 
    ("tess@frhs.org", "PID"),
    ("kai@mrbhs.org", "PID"), 
    ("ira@wby.org", "VIS"), 
    ("mo@wby.org", "VIS"),
    ("ada@hudsonhs.org", "PID"),  
]

# workshop capacities (could also be given as flat CSV rows later)
workshop_caps = [
    ("PID", "4"),
    ("VIS", "3")
]

Whenever you start working with data, you should first explore it. We will get more tools for doing this as we learn more, but we can still investigate it now. In the space below, you have a couple of different coding fields to work with. In that space, I want you to show evidence that you have explored these lists. That can be:


*   Looping through the lists printing items
*   Using type() to confirm data types
*   Using other tools such as len(), min(), max() to find key data points
*   Looking at individual dictionaries to confirm that you know how they work. <br>
You can add additional entries by hitting the 'b' key. After each code block, insert a short written block to explain what you found.
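
As one possible starting point, the sketch below explores a shortened copy of `raw_matches`. The names `sample_matches`, `start_times`, `earliest`, and `latest` are made up for this illustration; the same checks could be run on any of the raw lists.

```python
# A shortened copy of raw_matches so this cell runs on its own
sample_matches = [
    ('Field-2', '2025-09-27T12:30', 'FRM-101', 'WBY-77'),
    ('Field-1', '2025-09-27T16:15', 'FRM-101', 'HHS-1729'),
    ('Field-2', '2025-09-27T11:45', 'WBY-77', 'FRM-101'),
]

# How many rows are there, and what type is the list and each row?
print(len(sample_matches))        # 3
print(type(sample_matches))       # <class 'list'>
print(type(sample_matches[0]))    # <class 'tuple'>

# Pull the start time out of each row, then find the earliest and latest.
# ISO-formatted time strings sort correctly as plain text.
start_times = [match[1] for match in sample_matches]
earliest = min(start_times)
latest = max(start_times)
print(earliest, latest)
```

A short written note after a cell like this might say that each match is a tuple of four strings and that the times can be compared directly because they share one format.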






#Part A — Choose representations (immutability vs. mutability)

1. Participants/teams: Convert each raw_teams row into a record you won’t accidentally mutate (e.g., for stable IDs).
*   The data structure should have information about each school: team number, high school name, grouped mentor 1 information, grouped mentor 2 information
*   **Deliverable**: a collection of team records; **justify** your data structure choice.

In [300]:
#Complete #1 here
team_id_codes = []   # team IDs collected along the way
team_records = []    # the deliverable: one tuple record per team
for team in raw_teams:
    team_id, school, mentor1_name, mentor1_email, mentor2_name, mentor2_email = team
    team_id_codes.append(team_id)
    mentor1 = (mentor1_name, mentor1_email)
    mentor2 = (mentor2_name, mentor2_email)
    team_info = (team_id, school)
    raw_team = (team_info, mentor1, mentor2)
    team_records.append(raw_team)
    print(raw_team)

(('HHS-1729', 'Hudson HS'), ('Ada', 'ada@hudsonhs.org'), ('Lin', 'lin@hudsonhs.org'))
(('FRM-101', 'Framingham HS'), ('Sam', 'sam@frhs.org'), ('Tess', 'tess@frhs.org'))
(('MRB-88', 'Marlborough HS'), ('Kai', 'kai@mrbhs.org'), ('Ada', 'ada@hudsonhs.org'))
(('WBY-77', 'Westborough HS'), ('Ira', 'ira@wby.org'), ('Mo', 'mo@wby.org'))


Use this space to justify why tuples made sense:

Tuples make sense to use for this task because each team ID is associated with one school and each email is associated with one mentor name. Grouping everything into tuples, and then into one larger tuple for each school, ensures that the data is not accidentally edited later, since this is raw data that will not need to change.
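
To back up the point about accidental edits, here is a minimal sketch that tries to overwrite part of one team record. The variable names `record` and `changed` are made up for this illustration; the record values are copied from the output above.

```python
# One team record in the same nested-tuple shape as the records above
record = (("HHS-1729", "Hudson HS"), ("Ada", "ada@hudsonhs.org"), ("Lin", "lin@hudsonhs.org"))

try:
    record[0] = ("XXX-0", "Wrong HS")   # attempt to overwrite the team info
    changed = True
except TypeError as error:
    changed = False
    print("Tuple refused the change:", error)

# Reading is still easy: unpack the record into its three parts
team_info, mentor1, mentor2 = record
print(team_info[0], mentor1[1])
```

The assignment fails with a `TypeError`, which is the protection the justification describes. A list in the same position would have accepted the change silently.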


2. Build a fast lookup from team code → school/mentors.
*   **Deliverable**: a dictionary mapping `team_id` to a collection of information about that team. Information included should: which school is represented by the id and the team's mentors. We'll deal with points and records later

In [301]:
teams_dicti = {}
for team in raw_teams:
    team_id, school, mentor1_name, mentor1_email, mentor2_name, mentor2_email = team
    mentor1=(mentor1_name,mentor1_email)
    mentor2=(mentor2_name,mentor2_email)
    teams_dicti[team_id]= school,mentor1,mentor2


The code block above creates a dictionary where the key is the team ID and the value is the school and mentor info. The code block below takes a team ID and prints out the team info that corresponds to that ID.

In [302]:
inputid= "HHS-1729"
print(f"{inputid}: {teams_dicti[inputid]}")

HHS-1729: ('Hudson HS', ('Ada', 'ada@hudsonhs.org'), ('Lin', 'lin@hudsonhs.org'))


Use this space to justify your choice in data structure:
In this task I used a dictionary. I did this because the goal was to print out a team's info based on their team ID, meaning that each ID should be able to call up a bunch of info about the school. Using an ID as the dictionary key, I can easily pull up information about the school just by using the ID to access the dictionary.
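
One thing a lookup has to handle is an ID that is not in the dictionary. The sketch below shows one way to do that with `.get`; the names `teams_lookup` and `describe_team` are hypothetical and the dictionary is a two-team copy so the cell runs on its own.

```python
# A two-team copy of the lookup dictionary, in the same shape as teams_dicti
teams_lookup = {
    "HHS-1729": ("Hudson HS", ("Ada", "ada@hudsonhs.org"), ("Lin", "lin@hudsonhs.org")),
    "FRM-101": ("Framingham HS", ("Sam", "sam@frhs.org"), ("Tess", "tess@frhs.org")),
}

def describe_team(team_id):
    """Return a readable line for a team, or a notice if the ID is unknown."""
    info = teams_lookup.get(team_id)   # .get returns None instead of raising KeyError
    if info is None:
        return f"{team_id}: no such team"
    school, mentor1, mentor2 = info
    return f"{team_id}: {school}, mentors {mentor1[0]} and {mentor2[0]}"

print(describe_team("FRM-101"))
print(describe_team("ABC-1"))
```

Indexing with square brackets, as in the cell above, is fine when the ID is known to exist. `.get` is a gentler choice when the ID comes from user input.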

#Part B - Sort a mixed schedule

Clean and sort `raw_matches` by **start_time** then **field** (so ties on time are broken alphabetically by field). We'll talk a bit more about how to do that below.

*   **Deliverable**: a list of normalized match tuples

We should see something like below after we are done:  

("Field-1", "2025-09-27T09:30", "HHS-1729", "WBY-77"),
("Field-2", "2025-09-27T09:30", "FRM-101", "MRB-88"),
("Field-1", "2025-09-27T10:15", "FRM-101", "HHS-1729"),
("Field-2", "2025-09-27T10:15", "MRB-88", "WBY-77"),
("Field-1", "2025-09-27T11:00", "MRB-88", "HHS-1729")

In this task, we need to do multi-sorting. We might find that two matches have the same start time, which could cause a jam in our sorting. We need to break that jam by sorting using two factors. We can do that by creating an extra function to help us:


In [293]:
def sorting_key(match):
    field, start_time, team_a, team_b = match
    schedule=(start_time,field)
    return schedule

If done correctly, we should see `("2025-09-27T12:30", "Field-2")` as the output from the function call below, since the first row of `raw_matches` is the 12:30 match on Field-2:

In [294]:
print(sorting_key(raw_matches[0]))

('2025-09-27T12:30', 'Field-2')


We can then apply this function to our sorting method:

In [259]:
sorted_matches = sorted(raw_matches, key=sorting_key) # Conventional ascending sort: compares start time first, then field
print(sorted_matches)

[('Field-1', '2025-09-27T09:30', 'WBY-77', 'HHS-1729'), ('Field-2', '2025-09-27T09:30', 'MRB-88', 'FRM-101'), ('Field-1', '2025-09-27T10:15', 'HHS-1729', 'FRM-101'), ('Field-2', '2025-09-27T10:15', 'MRB-88', 'WBY-77'), ('Field-1', '2025-09-27T11:00', 'HHS-1729', 'MRB-88'), ('Field-2', '2025-09-27T11:45', 'WBY-77', 'FRM-101'), ('Field-1', '2025-09-27T12:30', 'MRB-88', 'HHS-1729'), ('Field-2', '2025-09-27T12:30', 'FRM-101', 'WBY-77'), ('Field-1', '2025-09-27T13:15', 'HHS-1729', 'WBY-77'), ('Field-2', '2025-09-27T13:15', 'MRB-88', 'FRM-101'), ('Field-1', '2025-09-27T14:00', 'FRM-101', 'HHS-1729'), ('Field-2', '2025-09-27T14:00', 'WBY-77', 'MRB-88'), ('Field-1', '2025-09-27T14:45', 'HHS-1729', 'MRB-88'), ('Field-2', '2025-09-27T14:45', 'FRM-101', 'WBY-77'), ('Field-1', '2025-09-27T15:30', 'WBY-77', 'HHS-1729'), ('Field-2', '2025-09-27T15:30', 'MRB-88', 'FRM-101'), ('Field-1', '2025-09-27T16:15', 'FRM-101', 'HHS-1729'), ('Field-2', '2025-09-27T16:15', 'WBY-77', 'MRB-88'), ('Field-1', '2025-

Explain how a sorting key was useful for this.

A sorting key was helpful here because it tells `sorted` what to compare for each match. The function returns the tuple `(start_time, field)`, and Python compares tuples one element at a time, so matches are ordered by start time first and any ties on time are broken alphabetically by field.
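
The element-by-element tuple comparison can be checked directly. This is a small sketch with made-up names (`key_a`, `key_b`, `key_c`, `ordered`), using keys in the same `(start_time, field)` shape that `sorting_key` returns.

```python
# Three keys in (start_time, field) form
key_a = ("2025-09-27T09:30", "Field-2")
key_b = ("2025-09-27T09:30", "Field-1")
key_c = ("2025-09-27T10:15", "Field-1")

# Same time, so the second element decides: "Field-1" comes before "Field-2"
print(key_b < key_a)   # True

# Different times, so the first element decides and the field is never compared
print(key_a < key_c)   # True

ordered = sorted([key_c, key_a, key_b])
print(ordered)
```

This is why no special tie-breaking code is needed: putting the fields in the tuple in priority order is enough.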

#Part C — Detect cross-team mentor conflicts (duplicate emails)

Using the team data, find any emails that appear on multiple teams.
*   **Deliverable**: produce a report like `{"xxxx@xxxxx.org": {"xxx-####", "xxx-####"}}`, where we can see that xxxx@xxxxx.org was listed for two different teams.

Complete this using code that could be applied to a different data set, for example, one with more names. 



In [275]:
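# Collect every mentor email from the team rows (columns 3 and 5)
emailing = [email for team in raw_teams for email in (team[3], team[5])]

unique_email = set()
duplicates = []

for mentor in emailing:
    if mentor in unique_email:
        duplicates.append(mentor)   # seen before, so it is a repeat
    else:
        unique_email.add(mentor)    # first time this email appears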

The code snippet above finds any duplicate mentor emails. The code below builds a dictionary that maps each email to the teams it is listed on.

In [400]:
team_mentors_emails={}
for team in raw_teams:
    team_id, school, mentor1_name, mentor1_email, mentor2_name, mentor2_email = team
    for email in [mentor1_email, mentor2_email]:
        if email not in team_mentors_emails:
            team_mentors_emails[email]=[]
        team_mentors_emails[email].append(team_id)

The code block below prints out the mentor email and team IDs only for the duplicate mentors

In [401]:
for emails in duplicates:
    print(f"{emails}: {team_mentors_emails[emails]}")

ada@hudsonhs.org: ['HHS-1729', 'MRB-88']


Explain your work:
I started by using a set to find the duplicate emails: on the pass through the list, each email is checked against the set. If it is not in the set, it gets added to the set; if it is already in the set, it gets added to the duplicates list. Then I use a dictionary to look up the team ID codes that belong to each duplicate email, which shows what teams that mentor is on.
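
Since the deliverable asks for each email paired with a set of team IDs, the two steps could also be folded into one reusable function. The sketch below is one possible way to do it, not the only one; `find_shared_emails`, `sample_rows`, and `report` are hypothetical names, and it assumes every row has the same six columns as `raw_teams`.

```python
from collections import defaultdict

def find_shared_emails(team_rows):
    """Map each email listed on more than one team to the set of those team IDs."""
    teams_by_email = defaultdict(set)   # a missing email starts with an empty set
    for team_id, school, name1, email1, name2, email2 in team_rows:
        teams_by_email[email1].add(team_id)
        teams_by_email[email2].add(team_id)
    # Keep only the emails that ended up on two or more teams
    return {email: ids for email, ids in teams_by_email.items() if len(ids) > 1}

# Two rows copied from raw_teams so this cell runs on its own
sample_rows = [
    ["HHS-1729", "Hudson HS", "Ada", "ada@hudsonhs.org", "Lin", "lin@hudsonhs.org"],
    ["MRB-88", "Marlborough HS", "Kai", "kai@mrbhs.org", "Ada", "ada@hudsonhs.org"],
]
report = find_shared_emails(sample_rows)
print(report)
```

Using a set for the team IDs means an email repeated within one team would not be reported as a conflict, and the function works unchanged on a longer list of teams.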

#Part D
Compute **total wins by team** and **total points** by teams across all raw_scores. Then list the teams in order. Handle ties by who scored more points.
    **Deliverable**: an ordered list like [("FRM-101", 4, 56), ("MRB-88", 3, 71), ...].
Write your code in chunks with short explanations of what you are doing after each chunk.

In [392]:
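team_points={}
for field, time, team1_id, team1_points, team2_id, team2_points in raw_scores:
    # team_points[team] holds [wins, total points]
    for team, points in [(team1_id, team1_points), (team2_id, team2_points)]:
        if team not in team_points:
            team_points[team]=[0,0]
        team_points[team][1] += points

    # Credit a win to the higher scorer; a tie gives neither team a win
    if team1_points > team2_points:
        team_points[team1_id][0] += 1
    elif team2_points > team1_points:
        team_points[team2_id][0] += 1

# Flatten the dictionary into (team, wins, points) tuples for sorting
result = [(team, wins, points) for team, (wins, points) in team_points.items()]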


The code block above finds the total number of wins as well as the total number of points for each team using a team_points dictionary and puts them into tuples that show the team ID, wins, and points. The code block below sorts the tuples so that the team with the most wins comes first, with ties on wins broken by total points.

In [393]:
def sorting_key2(row):
    team, wins, points = row
    return (-wins, -points)  # negate so larger values sort first
result.sort(key=sorting_key2)
print(result)

[('MRB-88', 8, 300), ('FRM-101', 6, 327), ('HHS-1729', 6, 302), ('WBY-77', 4, 278)]
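
The tie between FRM-101 and HHS-1729 (six wins each) is a good check that the tie-break works. The sketch below re-sorts the standings from a shuffled starting order; `standings`, `ranked`, and `ranked_alt` are made-up names, and the numbers are copied from the output above.

```python
# Standings copied from the output above, deliberately out of order: (team, wins, points)
standings = [("HHS-1729", 6, 302), ("WBY-77", 4, 278), ("FRM-101", 6, 327), ("MRB-88", 8, 300)]

# Negating both numbers turns the default ascending sort into "largest first"
ranked = sorted(standings, key=lambda row: (-row[1], -row[2]))

# An equivalent spelling: sort on (wins, points) and reverse the whole order
ranked_alt = sorted(standings, key=lambda row: (row[1], row[2]), reverse=True)

for place, (team, wins, points) in enumerate(ranked, start=1):
    print(place, team, wins, points)
```

Both spellings put FRM-101 ahead of HHS-1729 because 327 points beats 302. The negation form is handy when one field should sort descending and another ascending, which `reverse=True` alone cannot express.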


#Part E — Team activity summary (grouping)

Produce, for each team, a summary dict:

```
{
  "HHS-1729": {
    "school": "Hudson HS",
    "mentors": ["Ada","Lin"],
    "matches": 3,
    "total_points": 57,
    "workshops": ["PID", "VIS"]
  },
  ...
}
```


In [381]:
summary={}
for team in raw_teams:
    team_id, school, mentor1_name, mentor1_email, mentor2_name, mentor2_email = team
    summary[team_id]={"school":school, 
                      "mentors":[mentor1_name,mentor2_name], 
                      "Matches Won": team_points[team_id][0], 
                      "Total Points": team_points[team_id][1]}
    print(f"{team_id}:  {summary[team_id]}")

HHS-1729:  {'school': 'Hudson HS', 'mentors': ['Ada', 'Lin'], 'Matches Won': 6, 'Total Points': 302}
FRM-101:  {'school': 'Framingham HS', 'mentors': ['Sam', 'Tess'], 'Matches Won': 6, 'Total Points': 327}
MRB-88:  {'school': 'Marlborough HS', 'mentors': ['Kai', 'Ada'], 'Matches Won': 8, 'Total Points': 300}
WBY-77:  {'school': 'Westborough HS', 'mentors': ['Ira', 'Mo'], 'Matches Won': 4, 'Total Points': 278}


The summary dictionary uses each team ID as its key and maps it to a dictionary of information about the team: their school, mentors, matches won, and total points.
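
The target format also lists a `"workshops"` entry for each team. The sketch below shows one way that list could be built. It assumes that a team's workshops are the ones its mentors signed up for by email, which the task description does not state outright; `sample_teams`, `sample_workshops`, `codes_by_email`, and `team_workshops` are hypothetical names.

```python
# Shortened copies of the inputs so this cell runs on its own
sample_teams = [
    ["HHS-1729", "Hudson HS", "Ada", "ada@hudsonhs.org", "Lin", "lin@hudsonhs.org"],
    ["WBY-77", "Westborough HS", "Ira", "ira@wby.org", "Mo", "mo@wby.org"],
]
sample_workshops = [
    ("ada@hudsonhs.org", "VIS"),
    ("lin@hudsonhs.org", "PID"),
    ("ira@wby.org", "VIS"),
    ("mo@wby.org", "VIS"),
    ("ada@hudsonhs.org", "PID"),
]

# Step 1: group workshop codes by email; a set drops repeats automatically
codes_by_email = {}
for email, code in sample_workshops:
    codes_by_email.setdefault(email, set()).add(code)

# Step 2: for each team, combine the sets of its two mentors
# (assumption: signups are attributed to a team through its mentors' emails)
team_workshops = {}
for team_id, school, name1, email1, name2, email2 in sample_teams:
    codes = codes_by_email.get(email1, set()) | codes_by_email.get(email2, set())
    team_workshops[team_id] = sorted(codes)   # sorted() returns an alphabetical list

print(team_workshops)
```

Each list could then be stored under a `"workshops"` key in the team's summary dictionary. A set is used in the middle step so that two mentors signing up for the same workshop only count once.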