#Robotics Data Analyst
In this activity, you are taking on the role of a Data Analyst for a Robotics Tournament. Your job is to clean up messy information about teams, matches, scores, and workshops so the event can run smoothly. You’ll decide which Python data structures (lists, dictionaries, tuples, and sets) are the best tools for different tasks—like sorting schedules, adding up scores, or checking for mistakes. Just like a real analyst, you’ll need to think carefully about how to organize data so it’s accurate, efficient, and easy to use.<br><br>

Your audience is someone who doesn't know how to code but wants to see how you got your results. They don't want a paper with no code to back it up, but they also don't want a big block of code with no supporting explanation. <br><br>

A key part of your job is not only to do the analysis but also to explain your work. The final report should be an intentional mix of code and text. **You should not have giant blocks of code**. Every 2-4 lines of code should be followed by some explanation of what you are doing.

```python
# teams.csv-like rows you parsed already:
raw_teams = [
    ("HHS-1729", "Hudson HS", ("Ada", "ada@hudsonhs.org"), ("Lin", "lin@hudsonhs.org")),
    ("FRM-101",  "Framingham HS", ("Sam", "sam@frhs.org"), ("Tess", "tess@frhs.org")),
    ("MRB-88",   "Marlborough HS", ("Kai", "kai@mrbhs.org"), ("Ada", "ada@hudsonhs.org")),  # oops: duplicate mentor email across teams
    ("WBY-77",   "Westborough HS", ("Ira", "ira@wby.org"), ("Mo", "mo@wby.org")),
]

# provisional match schedule (field, start_time, teamA, teamB). Times are ISO strings.
raw_matches = [
    ("Field-2", "2025-09-27T09:30", "FRM-101", "MRB-88"),
    ("Field-1", "2025-09-27T09:30", "HHS-1729", "WBY-77"),
    ("Field-1", "2025-09-27T10:15", "FRM-101", "HHS-1729"),
    ("Field-2", "2025-09-27T10:15", "MRB-88", "WBY-77"),
    ("Field-1", "2025-09-27T11:00", "MRB-88", "HHS-1729"),
]

# reported scores coming in from refs (team_id -> points for that match)
# multiple rows per match; some typos/duplicates possible
raw_scores = [
    {"time": "2025-09-27T09:30", "field": "Field-1", "scores": {"HHS-1729": 18, "WBY-77": 22}},
    {"time": "2025-09-27T09:30", "field": "Field-2", "scores": {"FRM-101": 25, "MRB-88": 25}},  # tie
    {"time": "2025-09-27T10:15", "field": "Field-1", "scores": {"FRM-101": 31, "HHS-1729": 17}},
    {"time": "2025-09-27T10:15", "field": "Field-2", "scores": {"MRB-88": 27, "WBY-77": 27}},   # tie
    {"time": "2025-09-27T11:00", "field": "Field-1", "scores": {"MRB-88": 19, "HHS-1729": 22}},
]

# workshop signups: (participant_email, workshop_code)
raw_workshops = [
    ("ada@hudsonhs.org", "VIS"), ("lin@hudsonhs.org", "PID"),
    ("sam@frhs.org", "PID"), ("tess@frhs.org", "PID"),
    ("kai@mrbhs.org", "PID"), ("ira@wby.org", "VIS"), ("mo@wby.org", "VIS"),
    ("ada@hudsonhs.org", "PID"),  # duplicate signup; should count once per workshop
]
workshop_caps = {"PID": 4, "VIS": 3}  # PID control lab; Vision workshop
```


Whenever you start working with data, you should explore it first. We will learn more tools for this later, but we can already investigate quite a bit. In the space below, you have a couple of code cells to work with. Use them to show evidence that you have explored these lists. That can include:


*   Looping through the lists printing items
*   Using type() to confirm data types
*   Using other tools such as len(), min(), max() to find key data points
*   Looking at individual dictionaries to confirm that you know how they work. <br>
You can add more cells by pressing the 'b' key. After each code cell, add a short text cell explaining what you found.
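
As one possible starting point, the sketch below shows the kinds of checks listed above. It copies two rows each from `raw_teams` and `raw_scores` so it runs on its own; the names `first` and `all_points` are just illustrative choices.

```python
# Two rows copied from raw_teams and raw_scores so this cell runs on its own
raw_teams = [
    ("HHS-1729", "Hudson HS", ("Ada", "ada@hudsonhs.org"), ("Lin", "lin@hudsonhs.org")),
    ("FRM-101",  "Framingham HS", ("Sam", "sam@frhs.org"), ("Tess", "tess@frhs.org")),
]
raw_scores = [
    {"time": "2025-09-27T09:30", "field": "Field-1", "scores": {"HHS-1729": 18, "WBY-77": 22}},
    {"time": "2025-09-27T09:30", "field": "Field-2", "scores": {"FRM-101": 25, "MRB-88": 25}},
]

# How many rows are there, and what type is each row?
print(len(raw_teams), type(raw_teams[0]))   # 2 <class 'tuple'>

# Loop through the teams, unpacking each row into named parts
for team_id, school, mentor1, mentor2 in raw_teams:
    print(team_id, school, mentor1[0], mentor2[0])

# Look inside one score dictionary to see how its keys work
first = raw_scores[0]
print(first.keys())
print(first["scores"]["WBY-77"])   # 22

# Gather every individual score, then find the lowest and highest
all_points = [p for row in raw_scores for p in row["scores"].values()]
print(min(all_points), max(all_points))   # 18 25
```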






#Part A — Choose representations (immutability vs. mutability)

1. Participants/teams: Convert each `raw_teams` row into a record you won’t accidentally mutate (e.g., so team IDs stay stable).
*   Construct a structure for each team where the team code and school name are fixed, and mentors are stored as fixed pairs.
*   **Deliverable**: a collection of team records; **justify** why **tuples** (possibly nested) are appropriate for the atomic “record” parts.
*   *Hint*: immutable records (tuples) are often used as fixed keys or row-like containers.


2. Build a fast lookup from team code → school/mentors.
*   **Deliverable**: a dictionary mapping `team_id` to a collection of information about that team. It should include the school the ID represents and the team's mentors. We'll deal with points and records later.
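
A minimal sketch of one way to approach both deliverables, using two rows copied from `raw_teams`. The names `team_records` and `team_lookup` are hypothetical, and the record layout `(team_id, school, (mentor, mentor))` is an assumption, not the required format. The `try` block demonstrates the property that makes tuples a good fit: Python refuses to change them in place.

```python
# Two rows copied from raw_teams so this cell runs on its own
raw_teams = [
    ("HHS-1729", "Hudson HS", ("Ada", "ada@hudsonhs.org"), ("Lin", "lin@hudsonhs.org")),
    ("FRM-101",  "Framingham HS", ("Sam", "sam@frhs.org"), ("Tess", "tess@frhs.org")),
]

# 1. One tuple per team: (team_id, school, (mentor, mentor)); each mentor is a (name, email) pair
team_records = [(team_id, school, (m1, m2)) for team_id, school, m1, m2 in raw_teams]

# Tuples refuse in-place changes, which protects the stable team IDs
try:
    team_records[0][0] = "HHS-9999"
except TypeError as err:
    print("Blocked:", err)   # Blocked: 'tuple' object does not support item assignment

# 2. A dictionary keyed by team_id gives fast lookups by team code
team_lookup = {team_id: {"school": school, "mentors": mentors}
               for team_id, school, mentors in team_records}
print(team_lookup["FRM-101"]["school"])   # Framingham HS
```

Notice the split: the records themselves are tuples because they should never change, while the lookup is a dictionary because later parts will add points and match results to it.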

```python
# Complete #1 below
```

Use this space to justify why tuples make sense here.

```python
# Complete #2 below
```

Use this space to justify your choice of data structure.

#Part B - Sort a mixed schedule

Clean and sort `raw_matches` by **start_time** then **field** (so ties on time are broken alphabetically by field). We'll talk a bit more about how to do that below.

*   **Deliverable**: a list of normalized match tuples

In this task, we need a multi-key sort. Two matches can have the same start time, so time alone cannot decide their order. We break that tie by sorting on two values: first the start time, then the field. We can do that by writing a small helper function that returns both values:


```python
def sorting_key(match):
    # Return a tuple where the first element is the match's start time
    # and the second is its field name, e.g. "Field-2" (replace `pass` with your code)
    pass
```

If `sorting_key` is done correctly, the cell below should output `("2025-09-27T09:30", "Field-2")`:

```python
print(sorting_key(raw_matches[0]))
```

We can then pass this function to `sorted()` as its `key`:

```python
# Ascending sort: compares start times first, then field names to break ties
sorted_matches = sorted(raw_matches, key=sorting_key)
print(sorted_matches)
```

Explain how the sorting key helped here. Then choose another data set from the tournament and explain how you might apply a sorting key to it.
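
Why does a tuple key break ties? Python compares tuples one element at a time and only looks at the second element when the first elements are equal. The sketch below checks that directly, then applies the same idea to a different list: a few signups copied from `raw_workshops`, sorted by workshop code and then by email. The `lambda` is just a shorter way to write a key function like `sorting_key`, and the name `by_workshop` is illustrative. Comparing the times as plain strings works here because every ISO timestamp in the data has the same fixed-width format.

```python
# Tuples compare element by element; the second item only matters on a tie
print(("2025-09-27T09:30", "Field-2") < ("2025-09-27T10:15", "Field-1"))  # True: earlier time wins
print(("2025-09-27T09:30", "Field-1") < ("2025-09-27T09:30", "Field-2"))  # True: same time, field decides

# The same idea on a different list: three signups copied from raw_workshops
signups = [("sam@frhs.org", "PID"), ("ira@wby.org", "VIS"), ("lin@hudsonhs.org", "PID")]

# Sort by workshop code first, then by email within each workshop
by_workshop = sorted(signups, key=lambda s: (s[1], s[0]))
print(by_workshop)
# [('lin@hudsonhs.org', 'PID'), ('sam@frhs.org', 'PID'), ('ira@wby.org', 'VIS')]
```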

#Part C — Detect cross-team mentor conflicts (duplicate emails)

Using the team data, find any emails that appear on multiple teams.
*   **Deliverable**: produce a report like `{"ada@hudsonhs.org": {"HHS-1729", "MRB-88"}}`, which shows that ada@hudsonhs.org is listed on two different teams.
*   **Hint**: Think about how we can use sets and unions/intersections for this.
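
One possible sketch, assuming we read the mentor pairs straight from `raw_teams`: build a dictionary that maps each email to the *set* of team IDs it appears on, then keep only the emails whose set holds more than one team. Sets ignore repeats automatically, and the `&` (intersection) operator offers a second way to spot an email shared by two particular teams. The names `email_to_teams`, `conflicts`, and `team_emails` are hypothetical.

```python
raw_teams = [
    ("HHS-1729", "Hudson HS", ("Ada", "ada@hudsonhs.org"), ("Lin", "lin@hudsonhs.org")),
    ("FRM-101",  "Framingham HS", ("Sam", "sam@frhs.org"), ("Tess", "tess@frhs.org")),
    ("MRB-88",   "Marlborough HS", ("Kai", "kai@mrbhs.org"), ("Ada", "ada@hudsonhs.org")),
    ("WBY-77",   "Westborough HS", ("Ira", "ira@wby.org"), ("Mo", "mo@wby.org")),
]

# Map each mentor email to the SET of teams it appears on
email_to_teams = {}
for team_id, school, *mentors in raw_teams:
    for name, email in mentors:
        email_to_teams.setdefault(email, set()).add(team_id)

# Keep only emails that show up on more than one team
conflicts = {email: teams for email, teams in email_to_teams.items() if len(teams) > 1}
print(conflicts)  # one email mapped to a set of two team IDs (set order may vary)

# Intersection view: which emails do two specific teams share?
def team_emails(row):
    return {email for _, email in row[2:]}

shared = team_emails(raw_teams[0]) & team_emails(raw_teams[2])
print(shared)  # {'ada@hudsonhs.org'}
```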


Explain your work and how you used sets.

#Part D — Build standings with real aggregation tools

Compute **total points by team** across all of `raw_scores`, then list the top 3. Break ties alphabetically by **team_id**.



*   **Deliverable**: an ordered list of `(team_id, total_points)` tuples, highest total first, like [("MRB-88", 71), ...].

Write your code in chunks with short explanations of what you are doing after each chunk.
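
A sketch using `collections.Counter` from the standard library, one possible reading of the "real aggregation tools" this part mentions. `Counter.update()` adds the values from a mapping onto the running totals, so each score report can be folded in with one call. Sorting on `(-points, team_id)` puts the highest totals first and falls back to alphabetical team IDs on a tie. The names `totals` and `standings` are illustrative.

```python
from collections import Counter

raw_scores = [
    {"time": "2025-09-27T09:30", "field": "Field-1", "scores": {"HHS-1729": 18, "WBY-77": 22}},
    {"time": "2025-09-27T09:30", "field": "Field-2", "scores": {"FRM-101": 25, "MRB-88": 25}},
    {"time": "2025-09-27T10:15", "field": "Field-1", "scores": {"FRM-101": 31, "HHS-1729": 17}},
    {"time": "2025-09-27T10:15", "field": "Field-2", "scores": {"MRB-88": 27, "WBY-77": 27}},
    {"time": "2025-09-27T11:00", "field": "Field-1", "scores": {"MRB-88": 19, "HHS-1729": 22}},
]

# Add up each team's points across every match report
totals = Counter()
for report in raw_scores:
    totals.update(report["scores"])   # adds each team's points to its running total

# Highest total first; negating points lets ties fall back to team_id A to Z
standings = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
print(standings[:3])  # [('MRB-88', 71), ('HHS-1729', 57), ('FRM-101', 56)]
```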

#Part E — Team activity summary (grouping)

Produce, for each team, a summary dict:

```
{
  "HHS-1729": {
    "school": "Hudson HS",
    "mentors": ["Ada","Lin"],
    "matches": 3,
    "total_points": 57,
    "workshops": ["PID", "VIS"]
  },
  ...
}
```
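
A sketch of one way to build this summary, under two assumptions that are ours and not stated above: a team's workshops are the ones its mentors' emails signed up for in `raw_workshops`, and `matches` counts appearances in the `raw_matches` schedule. Under those assumptions the output for `HHS-1729` matches the example. Because Ada mentors two teams, her workshops show up for both `HHS-1729` and `MRB-88`, which is worth keeping in mind alongside the Part C conflict. Names such as `workshops_by_email`, `match_counts`, and `summary` are hypothetical.

```python
from collections import Counter, defaultdict

raw_teams = [
    ("HHS-1729", "Hudson HS", ("Ada", "ada@hudsonhs.org"), ("Lin", "lin@hudsonhs.org")),
    ("FRM-101",  "Framingham HS", ("Sam", "sam@frhs.org"), ("Tess", "tess@frhs.org")),
    ("MRB-88",   "Marlborough HS", ("Kai", "kai@mrbhs.org"), ("Ada", "ada@hudsonhs.org")),
    ("WBY-77",   "Westborough HS", ("Ira", "ira@wby.org"), ("Mo", "mo@wby.org")),
]
raw_matches = [
    ("Field-2", "2025-09-27T09:30", "FRM-101", "MRB-88"),
    ("Field-1", "2025-09-27T09:30", "HHS-1729", "WBY-77"),
    ("Field-1", "2025-09-27T10:15", "FRM-101", "HHS-1729"),
    ("Field-2", "2025-09-27T10:15", "MRB-88", "WBY-77"),
    ("Field-1", "2025-09-27T11:00", "MRB-88", "HHS-1729"),
]
raw_scores = [
    {"time": "2025-09-27T09:30", "field": "Field-1", "scores": {"HHS-1729": 18, "WBY-77": 22}},
    {"time": "2025-09-27T09:30", "field": "Field-2", "scores": {"FRM-101": 25, "MRB-88": 25}},
    {"time": "2025-09-27T10:15", "field": "Field-1", "scores": {"FRM-101": 31, "HHS-1729": 17}},
    {"time": "2025-09-27T10:15", "field": "Field-2", "scores": {"MRB-88": 27, "WBY-77": 27}},
    {"time": "2025-09-27T11:00", "field": "Field-1", "scores": {"MRB-88": 19, "HHS-1729": 22}},
]
raw_workshops = [
    ("ada@hudsonhs.org", "VIS"), ("lin@hudsonhs.org", "PID"),
    ("sam@frhs.org", "PID"), ("tess@frhs.org", "PID"),
    ("kai@mrbhs.org", "PID"), ("ira@wby.org", "VIS"), ("mo@wby.org", "VIS"),
    ("ada@hudsonhs.org", "PID"),
]

# Group workshop codes by email; a set ignores repeated signups
workshops_by_email = defaultdict(set)
for email, code in raw_workshops:
    workshops_by_email[email].add(code)

# Count matches per team from the schedule (each match lists two teams)
match_counts = Counter()
for _field, _start, team_a, team_b in raw_matches:
    match_counts[team_a] += 1
    match_counts[team_b] += 1

# Total points per team from the score reports
points = Counter()
for report in raw_scores:
    points.update(report["scores"])

# Assemble one summary dictionary per team
summary = {}
for team_id, school, *mentors in raw_teams:
    codes = set()
    for _, email in mentors:
        codes |= workshops_by_email[email]   # union of each mentor's workshops
    summary[team_id] = {
        "school": school,
        "mentors": [name for name, _ in mentors],
        "matches": match_counts[team_id],
        "total_points": points[team_id],
        "workshops": sorted(codes),
    }

print(summary["HHS-1729"])
```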
