In [9]:
import pandas as pd


data = {
    'tweet_id': [1, 2],
    'content': ['Let us Code', 'More than fifteen chars are here!']
}


tweets = pd.DataFrame(data)

Input: 
Tweets table:
+----------+-----------------------------------+
| tweet_id | content                           |
+----------+-----------------------------------+
| 1        | Let us Code                       |
| 2        | More than fifteen chars are here! |
+----------+-----------------------------------+
Output: 
+----------+
| tweet_id |
+----------+
| 2        |
+----------+


In [12]:

def invalid_tweets(tweets: pd.DataFrame) -> pd.DataFrame:
    """
    Returns the tweet_id of every tweet whose content is longer than 15 characters.
    
    :param tweets: DataFrame with 'tweet_id' and 'content' columns
    :return: single-column DataFrame ('tweet_id') listing the invalid tweets
    """
    
    return tweets[tweets['content'].str.len() > 15][['tweet_id']]

In [13]:
invalid_tweets(tweets)

Unnamed: 0,tweet_id
1,2


In tweets[tweets['content'].str.len() > 15], you are passing a Boolean mask to the DataFrame.

Each row gets evaluated: True if the content length is greater than 15, otherwise False.

This returns a new, filtered DataFrame containing only the rows where the mask is True.
The double brackets ([[...]]) select a subset of columns as a DataFrame (not just a Series).

Single brackets: tweets['tweet_id'] → returns a Series.

Double brackets: tweets[['tweet_id']] → returns a DataFrame.
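
The difference between the mask, the single-bracket result, and the double-bracket result can be checked directly. The following is a minimal sketch that rebuilds the same two-row tweets table used above; the names mask, as_series, and as_frame are introduced here only for illustration.

```python
import pandas as pd

tweets = pd.DataFrame({
    'tweet_id': [1, 2],
    'content': ['Let us Code', 'More than fifteen chars are here!'],
})

# Boolean Series: one True/False value per row
mask = tweets['content'].str.len() > 15

as_series = tweets[mask]['tweet_id']    # single brackets -> Series
as_frame = tweets[mask][['tweet_id']]   # double brackets -> DataFrame

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
```

Both results hold the same value (tweet_id 2); only the container type differs, which matters when a problem expects a DataFrame as the return value.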

## Table Schema

| Column Name | Type    |
|-------------|---------|
| employee_id | int     |
| name        | varchar |
| salary      | int     |

- `employee_id` is the primary key (column with unique values) for this table.
- Each row of this table indicates the employee ID, employee name, and salary.

---

## Task

Write a solution to calculate the bonus of each employee.

- The bonus of an employee is **100% of their salary** if:
  - The employee ID is an **odd number**, and
  - The employee's name **does not start with** the character `'M'`.
- The bonus of an employee is **0 otherwise**.

Return the result table **ordered by `employee_id`**.

---

## Example 1

### Input: `Employees` table

| employee_id | name    | salary |
|-------------|---------|--------|
| 2           | Meir    | 3000   |
| 3           | Michael | 3800   |
| 7           | Addilyn | 7400   |
| 8           | Juan    | 6100   |
| 9           | Kannon  | 7700   |

---

### Output

| employee_id | bonus |
|-------------|-------|
| 2           | 0     |
| 3           | 0     |
| 7           | 7400  |
| 8           | 0     |
| 9           | 7700  |


In [14]:
data = {
    'employee_id': [2, 3, 7, 8, 9],
    'name': ['Meir', 'Michael', 'Addilyn', 'Juan', 'Kannon'],
    'salary': [3000, 3800, 7400, 6100, 7700]
}

employees = pd.DataFrame(data)

In [None]:
def calculate_special_bonus(employees: pd.DataFrame) -> pd.DataFrame:

    employees = employees.copy()  # Avoid modifying the original DataFrame

    employees['bonus'] = 0  # Initialize the bonus column with zeros

    # Boolean mask (True/False for each row) marking who should get a bonus:
    #   employees['employee_id'] % 2 == 1       -> True for odd-numbered employee IDs
    #   ~employees['name'].str.startswith('M')  -> True for names that do not start with 'M' (~ is element-wise NOT)
    #   &                                       -> True only where both conditions are True
    condition = (employees['employee_id'] % 2 == 1) & (~employees['name'].str.startswith('M'))

    # Assign the salary as the bonus only for the rows that match the condition.
    # employees.loc[condition, 'bonus'] selects the 'bonus' column for the matching rows;
    # employees.loc[condition, 'salary'] selects the 'salary' column for the same rows.
    employees.loc[condition, 'bonus'] = employees.loc[condition, 'salary']

    # The task asks for the result ordered by employee_id
    return employees[['employee_id', 'bonus']].sort_values(by='employee_id')


In [37]:
calculate_special_bonus(employees)

Unnamed: 0,employee_id,bonus
0,2,0
1,3,0
2,7,7400
3,8,0
4,9,7700


.loc → Label-based

df.loc[rows, columns]
Used when you want to select rows/columns by label (name).

Rows can be selected with Boolean conditions (most common).

Columns are selected by name.

Example:

df.loc[df['salary'] > 5000, 'name']  # All names where salary > 5000




.iloc → Position-based

df.iloc[rows, columns]
Used when you want to select by integer position.

Rows/columns are typically given as integer positions, lists of positions, or slices.

Example:

df.iloc[0:2, 1]  # Rows at positions 0 and 1 (the slice end is excluded), column at position 1 (the 2nd column)
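
To make the difference between label-based and position-based selection concrete, here is a small sketch. The DataFrame and its string index labels ('a', 'b', 'c') are assumptions chosen for illustration, so that labels and positions are visibly different things.

```python
import pandas as pd

# Hypothetical data; the string index makes labels differ from positions
df = pd.DataFrame(
    {'name': ['Alice', 'Bob', 'Carol'], 'salary': [3000, 4000, 5000]},
    index=['a', 'b', 'c'],
)

# Label slice: both endpoints are included, so rows 'a' and 'b'
by_label = df.loc['a':'b', 'salary']

# Position slice: the end is excluded, so positions 0 and 1
by_position = df.iloc[0:2, 1]

# Boolean mask for the rows, column name for the column
high_earners = df.loc[df['salary'] > 3500, 'name']

print(by_label.tolist())      # [3000, 4000]
print(by_position.tolist())   # [3000, 4000]
print(high_earners.tolist())  # ['Bob', 'Carol']
```

Note the asymmetry: a .loc slice includes its end label, while an .iloc slice follows the usual Python convention and stops before its end position.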





A Series is a single column of data — like a column in Excel.

It has values and an index.

It's 1-dimensional.


df['salary']  # This is a Series
Think of it as:

Index   Value  
0       3000  
1       4000  
2       5000  


A DataFrame is a table — like a full Excel sheet.

It’s 2-dimensional: rows and columns.

Each column is a Series.

df[['salary', 'name']]  # This is a DataFrame




A Boolean Series is a column of True or False values.

You get one when you ask a yes/no question about a column:

df['salary'] > 5000
This returns something like (one True/False per row; the exact values depend on the data in df):

0    False  
1     True  
2     True  
And you use this to filter rows in a DataFrame:

df[df['salary'] > 5000]

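Element-wise Boolean operators (wrap each comparison in parentheses, because &, | and ~ bind more tightly than >, < and ==):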
& → "and" between arrays of True/False

| → "or"

~ → "not"
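
As a quick illustration of combining masks with these operators, the sketch below uses a small hypothetical table with name, salary, and is_active columns; the variable names are introduced only for this example.

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol'],
    'salary': [3000, 4000, 5000],
    'is_active': [True, False, True],
})

# AND: salary above 3500 and active
active_and_high = df[(df['salary'] > 3500) & df['is_active']]

# OR combined with NOT: salary below 3500, or not active
inactive_or_low = df[(df['salary'] < 3500) | ~df['is_active']]

print(active_and_high['name'].tolist())  # ['Carol']
print(inactive_or_low['name'].tolist())  # ['Alice', 'Bob']
```

The Python keywords and, or, and not do not work here: they try to reduce a whole Series to a single True/False and raise a ValueError.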

| name  | salary | is_active |
| ----- | ------ | --------- |
| Alice | 3000   | True       |
| Bob   | 4000   | False      |
| Carol | 5000   | True       |



df['salary'] → You're asking: "What is everyone's salary?"

df[df['is_active']] → You're asking: "Show me the rows where is_active is True"




In [38]:
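# More compact version (note: it adds the 'bonus' column to the caller's DataFrame in place)

def calculate_special_bonus(employees: pd.DataFrame) -> pd.DataFrame:
    employees['bonus'] = 0
    # Odd employee_id and a name that does not start with 'M'
    eligible = (employees['employee_id'] % 2 != 0) & (~employees['name'].str.startswith('M'))
    # The right-hand side is aligned on the index, so only the eligible rows receive their salary
    employees.loc[eligible, 'bonus'] = employees['salary']
    return employees[['employee_id', 'bonus']].sort_values(by='employee_id')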
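
The function above writes a bonus column into the DataFrame it receives. One possible alternative, shown here only as a sketch, builds the result without touching the input by using Series.where, which keeps a value where the condition is True and substitutes the second argument elsewhere. The function name calculate_special_bonus_where is hypothetical, and this is a swapped-in technique rather than the approach used in the cells above.

```python
import pandas as pd

def calculate_special_bonus_where(employees: pd.DataFrame) -> pd.DataFrame:
    # Odd employee_id and a name that does not start with 'M'
    eligible = (employees['employee_id'] % 2 == 1) & ~employees['name'].str.startswith('M')

    # Copy only the ID column so the caller's DataFrame is left unchanged
    result = employees[['employee_id']].copy()

    # Keep the salary where eligible, use 0 everywhere else
    result['bonus'] = employees['salary'].where(eligible, 0)
    return result.sort_values(by='employee_id')

employees = pd.DataFrame({
    'employee_id': [2, 3, 7, 8, 9],
    'name': ['Meir', 'Michael', 'Addilyn', 'Juan', 'Kannon'],
    'salary': [3000, 3800, 7400, 6100, 7700],
})

result = calculate_special_bonus_where(employees)
print(result['bonus'].tolist())  # [0, 0, 7400, 0, 7700]
```

After the call, employees still has only its three original columns, which can matter when the same DataFrame is reused in later cells.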