### Input

Hi ChatGPT, I’m working on my Week 8 Instructor-led Lab for my graduate Python course (BGEN 632 - UM MSBA). 
I am not looking for final answers. I want help understanding how to approach the assignment step by step. Provide a step-by-step walk-through using a different hypothetical question, without giving me answers, so I can figure it out myself.
Can you walk me through how I should structure the logic using pseudo-code or examples, without giving me the full solution? I’ll write my own code based on what I learn.

# Instructor-led Lab: Advanced Data Manipulation

In this assignment you will practice your new skills in data manipulation with the *piping* expression for `pandas`. Please follow the instructions below.

## Sorting and Ordering Data

Last week you sorted, ordered, and filtered the data [github_teams.csv](/data/github_teams.csv) using basic pandas techniques. Now that you have learned advanced techniques that rely on piping expressions, you will perform similar operations again, but with your new skills.

Please perform the following operations in order:

* Select the columns `Team_type`, `human_work`, and `work_per_human`.
* Select columns that end in the letter `t`. Use the regex `t$`.
* Sort your data descending using the columns `Team_size_class`, `human_work`, `work_per_human`.
* Select `human-bot` teams that have a `bot_members_count` value greater than or equal to 3.
* Find the `human` teams that are `Large` and have a `human_gini` value greater than or equal to 0.75.
* How many teams are in the `Small` or `Large` category?
* How many teams are in the `Small` or `Large` category with a `human_gini` value less than or equal to 0.25?
* How many `human` teams are in the `Medium` category?
* Save the columns `Team_size_class` and `work_per_human` as a new DataFrame.
* Rename these two columns in the new DataFrame. Change `human_gini` to `work_inequality` and `eval_survival_day_median` to `issue_resolution_time`.

## Dataset Contents

| Variable  | Description | 
| :------------ |:---------------|
| name_h | Alphanumeric hash generated to anonymize repo name |
| Team_type |  A qualitative descriptor indicating whether the team was made of only humans or a blend of humans and bots (two levels: human, human-bot) |
| Team_size_class |  A qualitative descriptor indicating the size of the team, derived from human_members_count (three levels: small [2, 3], medium [4, 6], large [7, 246]) |
| human_members_count  | The number of human users in the repo |
| bot_members_count | The number of bots in the repo |
| human_work | The number of work events generated by humans in the repo |
| work_per_human | The ratio of work events to humans, derived from human_members_count and human_work |
| human_gini | Gini coefficient for human work in the repo |
| human_Push | The number of push events generated by humans in the repo |
| human_IssueComments | The number of issue comment events generated by humans in the repo |
| human_PRReviewComment | The number of pull request review comment events generated by humans in the repo |
| human_MergedPR | The number of merged pull request events generated by humans in the repo |
| bot_work | The number of work events generated by bots in the repo |
| bot_Push | The number of push events generated by bots in the repo |
| bot_IssueComments | The number of issue comment events generated by bots in the repo |
| bot_PRReviewComment | The number of pull request review comment events generated by bots in the repo |
| bot_MergedPR | The number of merged pull request events generated by bots in the repo |
| eval_survival_day_median | The median number of days that an issue remained open in the repo (teams who were not included in issue survival analysis have NA value) |
| issues_count | The number of issues in the repo |


### Output

Absolutely, Michelle! Here’s a step-by-step walk-through using pseudo-code and a different hypothetical example, so you can confidently apply the same logic to your GitHub teams lab.

⸻

🧠 Hypothetical Scenario (Just for Practice)

Imagine you have a dataset: startup_teams.csv, with the following columns:
	•	Team_category = type of team (human-only, human-ai)
	•	Team_size = size category (Small, Medium, Large)
	•	ai_members = number of AI agents
	•	human_effort = number of work units by humans
	•	effort_per_human = human_effort divided by human headcount
	•	effort_gini = Gini coefficient for effort distribution
	•	resolution_days_median = median time to resolve tasks
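
If you want to run the snippets below without a CSV file, a tiny in-memory version of this hypothetical dataset is enough. This is only a sketch: the column names come from the list above, but every value is made up for practice.

```python
import pandas as pd

# Made-up rows for the hypothetical startup_teams data (illustrative values only).
df = pd.DataFrame({
    "Team_category": ["human-only", "human-ai", "human-ai", "human-only"],
    "Team_size": ["Small", "Large", "Medium", "Large"],
    "ai_members": [0, 4, 1, 0],
    "human_effort": [120, 900, 300, 750],
    "effort_per_human": [60.0, 90.0, 60.0, 75.0],
    "effort_gini": [0.20, 0.80, 0.40, 0.78],
    "resolution_days_median": [3.0, 10.5, None, 7.0],  # None stands in for an NA value
})

print(df.shape)  # (4, 7)
```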

⸻

✅ Pseudo-Code Strategy by Task (Matches Your Real Assignment)

⸻

1. Select specific columns

Goal: Select "Team_category", "human_effort", "effort_per_human"

# Think: df %>% select(...)
(df
 .filter(["Team_category", "human_effort", "effort_per_human"])
)



⸻

2. Select columns ending in “t” using regex

Goal: Keep only columns whose names end in the letter t. In this dataset that is just human_effort; effort_per_human ends in n, so it is dropped.

(df
 .filter(regex="t$")
)
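
A quick way to check what a pattern matches is to look at the surviving column names rather than the data. The sketch below assumes only the hypothetical column names (no rows are needed) and contrasts the end anchor `$` with the start anchor `^`.

```python
import pandas as pd

# Column names from the hypothetical dataset; an empty frame is enough to test patterns.
df = pd.DataFrame(columns=[
    "Team_category", "Team_size", "ai_members", "human_effort",
    "effort_per_human", "effort_gini", "resolution_days_median",
])

# "t$" matches names whose last character is a lowercase t.
ends_in_t = df.filter(regex="t$").columns.tolist()

# "^effort" matches names that start with "effort".
starts_with_effort = df.filter(regex="^effort").columns.tolist()

print(ends_in_t)           # ['human_effort']
print(starts_with_effort)  # ['effort_per_human', 'effort_gini']
```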



⸻

3. Sort by multiple columns descending

Goal: Sort by Team_size, human_effort, effort_per_human

(df
 .sort_values(by=["Team_size", "human_effort", "effort_per_human"], ascending=False)
)
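
One thing to notice when a size column holds plain strings: sorting is alphabetical, so descending order runs Small, Medium, Large. Whether your lab expects that is something to confirm from its instructions. If a logical size order is ever wanted, an ordered categorical is one option; the sketch below uses made-up values.

```python
import pandas as pd

df = pd.DataFrame({
    "Team_size": ["Small", "Large", "Medium", "Large"],
    "human_effort": [120, 900, 300, 750],  # made-up values
})

# Plain strings sort alphabetically, so descending means S, then M, then L.
alphabetical = (df
 .sort_values(by=["Team_size", "human_effort"], ascending=False)
)

# An ordered categorical sorts by the declared order instead of the alphabet.
size_order = pd.CategoricalDtype(["Small", "Medium", "Large"], ordered=True)
logical = (df
 .assign(Team_size=lambda d: d["Team_size"].astype(size_order))
 .sort_values(by=["Team_size", "human_effort"], ascending=False)
)

print(alphabetical["Team_size"].tolist())  # ['Small', 'Medium', 'Large', 'Large']
print(logical["Team_size"].tolist())       # ['Large', 'Large', 'Medium', 'Small']
```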



⸻

4. Filter rows with two conditions

Goal: Only human-ai teams with ≥ 3 AI members

(df
 .query("Team_category == 'human-ai' and ai_members >= 3")
)
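
If the string syntax of .query() feels opaque, it can help to see the same filter written as a boolean mask inside .loc, which still chains. This is a sketch on made-up rows showing that the two forms select the same rows.

```python
import pandas as pd

df = pd.DataFrame({
    "Team_category": ["human-only", "human-ai", "human-ai", "human-only"],
    "ai_members": [0, 4, 1, 0],  # made-up values
})

# Row filter written as a query string.
via_query = (df
 .query("Team_category == 'human-ai' and ai_members >= 3")
)

# Same filter as a boolean mask; note & instead of "and", and the parentheses.
via_mask = (df
 .loc[lambda d: (d["Team_category"] == "human-ai") & (d["ai_members"] >= 3)]
)

print(via_query.equals(via_mask))  # True
```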



⸻

5. Filter based on size and inequality

Goal: Only large human-only teams with high effort Gini (≥ 0.75)

(df
 # The category label must match the data exactly: here it is 'human-only'.
 .query("Team_category == 'human-only' and Team_size == 'Large' and effort_gini >= 0.75")
)



⸻

6. Count teams in Small or Large category

(df
 .query("Team_size in ['Small', 'Large']")
 .shape[0]
)
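
A useful habit is to cross-check a count with a second method. The sketch below, on made-up rows, compares the .query() count with an .isin() count and with .value_counts(), which shows every category at once.

```python
import pandas as pd

df = pd.DataFrame({"Team_size": ["Small", "Large", "Medium", "Large", "Small"]})

# Count rows after filtering with a query string.
n_query = df.query("Team_size in ['Small', 'Large']").shape[0]

# Same count from a boolean mask: True counts as 1 when summed.
n_isin = int(df["Team_size"].isin(["Small", "Large"]).sum())

# Counts per category, handy for a sanity check.
per_class = df["Team_size"].value_counts()

print(n_query, n_isin)  # 4 4
```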



⸻

7. Count Small/Large teams with low Gini (≤ 0.25)

(df
 .query("Team_size in ['Small', 'Large'] and effort_gini <= 0.25")
 .shape[0]
)



⸻

8. Count human-only Medium teams

(df
 .query("Team_category == 'human-only' and Team_size == 'Medium'")
 .shape[0]
)



⸻

9. Save subset of two columns as new DataFrame

new_df = df.filter(["Team_size", "effort_per_human"])



⸻

10. Rename columns in the new DataFrame

If new_df had these columns, this would be the logic. Note that .rename() returns a new DataFrame, so assign the result to keep it:

new_df = new_df.rename(columns={
    "effort_gini": "effort_inequality",
    "resolution_days_median": "task_resolution_time"
})
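
One detail worth testing yourself: by default .rename() quietly skips any name it cannot find, so a rename on the wrong DataFrame looks like it worked. The sketch below uses made-up values; it chains the rename onto a column selection and then uses errors="raise" to surface a missing column.

```python
import pandas as pd

df = pd.DataFrame({
    "Team_size": ["Small", "Large"],
    "effort_gini": [0.20, 0.80],            # made-up values
    "resolution_days_median": [3.0, 10.5],  # made-up values
})

# Select the columns first, then rename them in the same pipeline.
renamed = (df
 .filter(["effort_gini", "resolution_days_median"])
 .rename(columns={
     "effort_gini": "effort_inequality",
     "resolution_days_median": "task_resolution_time",
 })
)

# errors="raise" turns a silently ignored missing name into a KeyError.
try:
    (df
     .filter(["Team_size"])
     .rename(columns={"effort_gini": "effort_inequality"}, errors="raise")
    )
    missing_detected = False
except KeyError:
    missing_detected = True

print(renamed.columns.tolist())
print(missing_detected)  # True
```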



⸻

🧭 General Tips for Your Real Lab
	•	Think in terms of: Filter → Query → Sort → Save → Rename
	•	Use .query() for row conditions, .filter() for column selection
	•	Wrap your pipelines in parentheses () for clean chaining
	•	Think of each .method() as the next %>% step in tidyverse
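
To see how these tips fit together, here is a sketch of one chained pipeline on the hypothetical columns. The values and the new column name effort are made up, and the order of steps is just one reasonable choice, not a template for your lab answers.

```python
import pandas as pd

df = pd.DataFrame({
    "Team_size": ["Small", "Large", "Medium", "Large"],
    "human_effort": [120, 900, 300, 750],  # made-up values
    "effort_gini": [0.20, 0.80, 0.40, 0.78],
})

result = (df
 .query("Team_size in ['Small', 'Large']")         # keep rows
 .sort_values(by="human_effort", ascending=False)  # order rows
 .filter(["Team_size", "human_effort"])            # keep columns
 .rename(columns={"human_effort": "effort"})       # rename
 .reset_index(drop=True)                           # tidy the index
)

print(result)
```

Each method returns a new DataFrame, which is why the steps can be stacked like this without intermediate variables.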

⸻

Let me know when you’re ready to test your logic on the actual dataset and I can help you debug or structure your answers!