# 595. Big Countries

Table: World


| Column Name | Type    |
|:---:|:---:|
| name        | varchar |
| continent   | varchar |
| area        | int     |
| population  | int     |
| gdp         | bigint  |

name is the primary key (column with unique values) for this table.

Each row of this table gives information about the name of a country, the continent to which it belongs, its area, the population, and its GDP value.
 

A country is big if:

it has an area of at least three million (i.e., 3000000 km²), or

it has a population of at least twenty-five million (i.e., 25000000).

Write a solution to find the name, population, and area of the big countries.

Return the result table in any order.

The result format is in the following example.

 

Example 1:

Input: 
World table:

| name        | continent | area    | population | gdp          |
|:---:|:---:|:---:|:---:|:---:|
| Afghanistan | Asia      | 652230  | 25500100   | 20343000000  |
| Albania     | Europe    | 28748   | 2831741    | 12960000000  |
| Algeria     | Africa    | 2381741 | 37100000   | 188681000000 |
| Andorra     | Europe    | 468     | 78115      | 3712000000   |
| Angola      | Africa    | 1246700 | 20609294   | 100990000000 |

Output: 
| name        | population | area    |
|:---:|:---:|:---:|
| Afghanistan | 25500100   | 652230  |
| Algeria     | 37100000   | 2381741 |

In [2]:
import pandas as pd

data = {
    'name': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
    'continent': ['Asia', 'Europe', 'Africa', 'Europe', 'Africa'],
    'area': [652230, 28748, 2381741, 468, 1246700],
    'population': [25500100, 2831741, 37100000, 78115, 20609294],
    'gdp': [20343000000, 12960000000, 188681000000, 3712000000, 100990000000]
}

df = pd.DataFrame(data)

df

Unnamed: 0,name,continent,area,population,gdp
0,Afghanistan,Asia,652230,25500100,20343000000
1,Albania,Europe,28748,2831741,12960000000
2,Algeria,Africa,2381741,37100000,188681000000
3,Andorra,Europe,468,78115,3712000000
4,Angola,Africa,1246700,20609294,100990000000


In [14]:
def big_countries(world: pd.DataFrame) -> pd.DataFrame:
    return world.query('population >= 25000000 or area >= 3000000')[['name', 'population', 'area']]

big_countries(df)

Unnamed: 0,name,population,area
0,Afghanistan,25500100,652230
2,Algeria,37100000,2381741
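
The same filter can also be written with a boolean mask instead of a `query` string. This is only a sketch of an alternative, not a replacement for the solution above; the name `big_countries_mask` is made up for illustration.

```python
import pandas as pd

def big_countries_mask(world: pd.DataFrame) -> pd.DataFrame:
    # True for rows that meet at least one of the two thresholds.
    is_big = (world['area'] >= 3_000_000) | (world['population'] >= 25_000_000)
    # Select matching rows and only the requested columns.
    return world.loc[is_big, ['name', 'population', 'area']]

world = pd.DataFrame({
    'name': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
    'continent': ['Asia', 'Europe', 'Africa', 'Europe', 'Africa'],
    'area': [652230, 28748, 2381741, 468, 1246700],
    'population': [25500100, 2831741, 37100000, 78115, 20609294],
    'gdp': [20343000000, 12960000000, 188681000000, 3712000000, 100990000000],
})

big_result = big_countries_mask(world)
```

Note the parentheses around each comparison: `|` binds more tightly than `>=` in Python, so they are required.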


# 1757. Recyclable and Low Fat Products

Table: Products

| Column Name | Type    |
|:---:|:---:|
| product_id  | int     |
| low_fats    | enum    |
| recyclable  | enum    |

product_id is the primary key (column with unique values) for this table.

low_fats is an ENUM (category) of type ('Y', 'N') where 'Y' means this product is low fat and 'N' means it is not.

recyclable is an ENUM (category) of type ('Y', 'N') where 'Y' means this product is recyclable and 'N' means it is not.
 

Write a solution to find the ids of products that are both low fat and recyclable.

Return the result table in any order.

The result format is in the following example.

 

Example 1:

Input: 
Products table:
| product_id  | low_fats | recyclable |
|:---:|:---:|:---:|
| 0           | Y        | N          |
| 1           | Y        | Y          |
| 2           | N        | Y          |
| 3           | Y        | Y          |
| 4           | N        | N          |

Output: 
| product_id  |
|:---:|
| 1           |
| 3           |

Explanation: Only products 1 and 3 are both low fat and recyclable.

In [15]:
import pandas as pd

data = {
    'product_id': [0, 1, 2, 3, 4],
    'low_fats': ['Y', 'Y', 'N', 'Y', 'N'],
    'recyclable': ['N', 'Y', 'Y', 'Y', 'N']
}

df = pd.DataFrame(data)

df

Unnamed: 0,product_id,low_fats,recyclable
0,0,Y,N
1,1,Y,Y
2,2,N,Y
3,3,Y,Y
4,4,N,N


In [21]:
def find_products(products: pd.DataFrame) -> pd.DataFrame:
    return products.query('low_fats == "Y" and recyclable == "Y"')[['product_id']]

find_products(df)

Unnamed: 0,product_id
1,1
3,3


# 183. Customers Who Never Order

Table: Customers

| Column Name | Type    |
|:---:|:---:|
| id          | int     |
| name        | varchar |

id is the primary key (column with unique values) for this table.
Each row of this table indicates the ID and name of a customer.
 

Table: Orders

| Column Name | Type |
|:---:|:---:|
| id          | int  |
| customerId  | int  |

id is the primary key (column with unique values) for this table.

customerId is a foreign key (reference columns) of the ID from the Customers table.

Each row of this table indicates the ID of an order and the ID of the customer who ordered it.
 

Write a solution to find all customers who never order anything.

Return the result table in any order.

The result format is in the following example.

 

Example 1:

Input: 
Customers table:

| id | name  |
|:---:|:---:|
| 1  | Joe   |
| 2  | Henry |
| 3  | Sam   |
| 4  | Max   |

Orders table:

| id | customerId |
|:---:|:---:|
| 1  | 3          |
| 2  | 1          |

Output: 

| Customers |
|:---:|
| Henry     |
| Max       |

In [25]:
import pandas as pd

# Creating the Customers DataFrame
customers_data = {
    'id': [1, 2, 3, 4],
    'name': ['Joe', 'Henry', 'Sam', 'Max']
}
customers_df = pd.DataFrame(customers_data)

# Creating the Orders DataFrame
orders_data = {
    'id': [1, 2],
    'customerId': [3, 1]
}
orders_df = pd.DataFrame(orders_data)

display(customers_df)
display(orders_df)

Unnamed: 0,id,name
0,1,Joe
1,2,Henry
2,3,Sam
3,4,Max


Unnamed: 0,id,customerId
0,1,3
1,2,1


In [42]:
def find_customers(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    df_merged = pd.merge(customers, orders, how='left', left_on='id', right_on='customerId')
    df_merged['buys'] = ~df_merged['customerId'].isna()
    return df_merged.query('buys == False')[['name']].rename(columns={'name': 'Customers'})

find_customers(customers_df, orders_df)

Unnamed: 0,Customers
1,Henry
3,Max
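
A possible alternative that avoids the merge is an anti-join with `isin`: keep the customers whose `id` never appears in `Orders.customerId`. This is a sketch under the same example data; the name `find_customers_isin` is hypothetical.

```python
import pandas as pd

def find_customers_isin(customers: pd.DataFrame, orders: pd.DataFrame) -> pd.DataFrame:
    # True for customers whose id is absent from the orders table.
    never_ordered = ~customers['id'].isin(orders['customerId'])
    return customers.loc[never_ordered, ['name']].rename(columns={'name': 'Customers'})

customers_demo = pd.DataFrame({'id': [1, 2, 3, 4], 'name': ['Joe', 'Henry', 'Sam', 'Max']})
orders_demo = pd.DataFrame({'id': [1, 2], 'customerId': [3, 1]})

never_result = find_customers_isin(customers_demo, orders_demo)
```

Because no join is performed, the input is never widened with extra columns and no helper column is needed.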


# 1148. Article Views I

Table: Views

| Column Name   | Type    |
|:---:|:---:|
| article_id    | int     |
| author_id     | int     |
| viewer_id     | int     |
| view_date     | date    |

There is no primary key (column with unique values) for this table; the table may have duplicate rows.

Each row of this table indicates that some viewer viewed an article (written by some author) on some date. 

Note that equal author_id and viewer_id indicate the same person.
 

Write a solution to find all the authors that viewed at least one of their own articles.

Return the result table sorted by id in ascending order.

The result format is in the following example.

 

Example 1:

Input: 
Views table:

| article_id | author_id | viewer_id | view_date  |
|:---:|:---:|:---:|:---:|
| 1          | 3         | 5         | 2019-08-01 |
| 1          | 3         | 6         | 2019-08-02 |
| 2          | 7         | 7         | 2019-08-01 |
| 2          | 7         | 6         | 2019-08-02 |
| 4          | 7         | 1         | 2019-07-22 |
| 3          | 4         | 4         | 2019-07-21 |
| 3          | 4         | 4         | 2019-07-21 |

Output: 

| id   |
|:---:|
| 4    |
| 7    |

In [47]:
import pandas as pd

# Table data as a multi-line string
table_data = """
1          | 3         | 5         | 2019-08-01 |
1          | 3         | 6         | 2019-08-02 |
2          | 7         | 7         | 2019-08-01 |
2          | 7         | 6         | 2019-08-02 |
4          | 7         | 1         | 2019-07-22 |
3          | 4         | 4         | 2019-07-21 |
3          | 4         | 4         | 2019-07-21 |
"""

# Convert the string to a list of dictionaries
rows = [dict(zip(['article_id', 'author_id', 'viewer_id', 'view_date'], map(str.strip, row.split('|')))) for row in table_data.split('\n')[1:-1]]

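# Create the DataFrame; the parsed values are strings, so cast the ID
# columns to int to make comparisons and sorting numeric
views = pd.DataFrame(rows)
id_cols = ['article_id', 'author_id', 'viewer_id']
views[id_cols] = views[id_cols].astype(int)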

# Display the DataFrame
views

Unnamed: 0,article_id,author_id,viewer_id,view_date
0,1,3,5,2019-08-01
1,1,3,6,2019-08-02
2,2,7,7,2019-08-01
3,2,7,6,2019-08-02
4,4,7,1,2019-07-22
5,3,4,4,2019-07-21
6,3,4,4,2019-07-21


In [53]:
def article_views(views: pd.DataFrame) -> pd.DataFrame:
    return views.query('author_id == viewer_id')\
                    .drop_duplicates(subset=['author_id', 'viewer_id'])[['author_id']]\
                        .rename(columns={'author_id': 'id'})\
                            .sort_values(by='id')

article_views(views)

Unnamed: 0,id
5,4
2,7
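
The deduplication step can also be expressed with `unique()` on the filtered `author_id` column. The sketch below assumes integer ID columns and builds a small typed frame directly; `article_views_unique` is an illustrative name only.

```python
import pandas as pd

def article_views_unique(views: pd.DataFrame) -> pd.DataFrame:
    # Authors who appear as the viewer of one of their own articles.
    own_views = views.loc[views['author_id'] == views['viewer_id'], 'author_id']
    # unique() drops repeats; sorted() gives ascending order.
    return pd.DataFrame({'id': sorted(own_views.unique())})

views_demo = pd.DataFrame({
    'article_id': [1, 1, 2, 2, 4, 3, 3],
    'author_id': [3, 3, 7, 7, 7, 4, 4],
    'viewer_id': [5, 6, 7, 6, 1, 4, 4],
    'view_date': ['2019-08-01', '2019-08-02', '2019-08-01', '2019-08-02',
                  '2019-07-22', '2019-07-21', '2019-07-21'],
})

views_result = article_views_unique(views_demo)
```

Unlike the chained version, this returns a frame with a fresh 0-based index rather than the original row labels.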


# 1683. Invalid Tweets

Table: Tweets

| Column Name    | Type    |
|:---:|:---:|
| tweet_id       | int     |
| content        | varchar |

tweet_id is the primary key (column with unique values) for this table.

This table contains all the tweets in a social media app.
 

Write a solution to find the IDs of the invalid tweets. The tweet is invalid if the number of characters used in the content of the tweet is strictly greater than 15.

Return the result table in any order.

The result format is in the following example.

 

Example 1:

Input: 
Tweets table:

| tweet_id | content                          |
|:---:|:---:|
| 1        | Vote for Biden                   |
| 2        | Let us make America great again! |

Output: 

| tweet_id |
|:---:|
| 2        |

Explanation: 

Tweet 1 has length = 14. It is a valid tweet.

Tweet 2 has length = 32. It is an invalid tweet.


In [1]:
import pandas as pd

# Define the data
data = {
    'tweet_id': [1, 2],
    'content': ['Vote for Biden', 'Let us make America great again!']
}

# Create the DataFrame
tweets = pd.DataFrame(data)

# Display the DataFrame
tweets

Unnamed: 0,tweet_id,content
0,1,Vote for Biden
1,2,Let us make America great again!


In [6]:
def invalid_tweets(tweets: pd.DataFrame) -> pd.DataFrame:
    tweets['content_len'] = tweets['content'].apply(len)
    return tweets.query('content_len > 15')[['tweet_id']]

invalid_tweets(tweets)

Unnamed: 0,tweet_id
1,2
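
The solution above adds a `content_len` column to the DataFrame that was passed in. If that side effect is unwanted, the length check can be done with the vectorized `.str.len()` accessor and a mask. A minimal sketch, with the hypothetical name `invalid_tweets_no_mutation`:

```python
import pandas as pd

def invalid_tweets_no_mutation(tweets: pd.DataFrame) -> pd.DataFrame:
    # Character count per tweet, computed without storing a helper column.
    too_long = tweets['content'].str.len() > 15
    return tweets.loc[too_long, ['tweet_id']]

tweets_demo = pd.DataFrame({
    'tweet_id': [1, 2],
    'content': ['Vote for Biden', 'Let us make America great again!'],
})

tweets_result = invalid_tweets_no_mutation(tweets_demo)
```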


# 1873. Calculate Special Bonus

Table: Employees

| Column Name | Type    |
|:---:|:---:|
| employee_id | int     |
| name        | varchar |
| salary      | int     |

employee_id is the primary key (column with unique values) for this table.

Each row of this table indicates the employee ID, employee name, and salary.
 

Write a solution to calculate the bonus of each employee. The bonus of an employee is 100% of their salary if the ID of the employee is an odd number and the employee's name does not start with the character 'M'. The bonus of an employee is 0 otherwise.

Return the result table ordered by employee_id.

The result format is in the following example.

 

Example 1:

Input: 
Employees table:
| employee_id | name    | salary |
|:---:|:---:|:---:|
| 2           | Meir    | 3000   |
| 3           | Michael | 3800   |
| 7           | Addilyn | 7400   |
| 8           | Juan    | 6100   |
| 9           | Kannon  | 7700   |

Output: 
| employee_id | bonus |
|:---:|:---:|
| 2           | 0     |
| 3           | 0     |
| 7           | 7400  |
| 8           | 0     |
| 9           | 7700  |

Explanation: 

The employees with IDs 2 and 8 get 0 bonus because they have an even employee_id.

The employee with ID 3 gets 0 bonus because their name starts with 'M'.

The rest of the employees get a 100% bonus.


In [3]:
import pandas as pd

data = {
    'employee_id': [2, 3, 7, 8, 9],
    'name': ['Meir', 'Michael', 'Addilyn', 'Juan', 'Kannon'],
    'salary': [3000, 3800, 7400, 6100, 7700]
}

employees = pd.DataFrame(data)

employees

Unnamed: 0,employee_id,name,salary
0,2,Meir,3000
1,3,Michael,3800
2,7,Addilyn,7400
3,8,Juan,6100
4,9,Kannon,7700


In [22]:
def calculate_special_bonus(employees: pd.DataFrame) -> pd.DataFrame:
    employees.loc[(employees['employee_id'] % 2 != 0) & (employees['name'].str[0] != 'M'), 'is_suitable'] = True
    employees['bonus'] = employees['salary'] * employees['is_suitable'].fillna(0)
    return employees[['employee_id', 'bonus']].sort_values('employee_id')


calculate_special_bonus(employees)

Unnamed: 0,employee_id,bonus
0,2,0
1,3,0
2,7,7400
3,8,0
4,9,7700
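
The two conditions can also be combined into one boolean mask and applied with `Series.where`, which keeps the salary where the mask is true and substitutes 0 elsewhere. This is a sketch of that idea, not the solution used above; `calculate_special_bonus_where` is a made-up name.

```python
import pandas as pd

def calculate_special_bonus_where(employees: pd.DataFrame) -> pd.DataFrame:
    # Eligible: odd employee_id and a name that does not start with 'M'.
    eligible = (employees['employee_id'] % 2 == 1) & ~employees['name'].str.startswith('M')
    # Keep the salary for eligible rows, 0 for everyone else.
    bonus = employees['salary'].where(eligible, 0)
    result = pd.DataFrame({'employee_id': employees['employee_id'], 'bonus': bonus})
    return result.sort_values('employee_id')

employees_demo = pd.DataFrame({
    'employee_id': [2, 3, 7, 8, 9],
    'name': ['Meir', 'Michael', 'Addilyn', 'Juan', 'Kannon'],
    'salary': [3000, 3800, 7400, 6100, 7700],
})

bonus_result = calculate_special_bonus_where(employees_demo)
```

This version leaves the input untouched and keeps `bonus` as an integer column, since no intermediate column with missing values is created.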


# 1667. Fix Names in a Table

Table: Users

| Column Name    | Type    |
|:---:|:---:|
| user_id        | int     |
| name           | varchar |

user_id is the primary key (column with unique values) for this table.

This table contains the ID and the name of the user. The name consists of only lowercase and uppercase characters.
 

Write a solution to fix the names so that only the first character is uppercase and the rest are lowercase.

Return the result table ordered by user_id.

The result format is in the following example.

 

Example 1:

Input: 

Users table:

| user_id | name  |
|:---:|:---:|
| 1       | aLice |
| 2       | bOB   |

Output: 

| user_id | name  |
|:---:|:---:|
| 1       | Alice |
| 2       | Bob   |


In [1]:
import pandas as pd

# Create a list of dictionaries representing the table
data = [
    {'user_id': 1, 'name': 'aLice'},
    {'user_id': 2, 'name': 'bOB'}
]

# Convert the list into a Pandas DataFrame
users = pd.DataFrame(data)

users

Unnamed: 0,user_id,name
0,1,aLice
1,2,bOB


In [5]:
def fix_names(users: pd.DataFrame) -> pd.DataFrame:
    users['name'] = users['name'].str.capitalize()
    return users.sort_values('user_id')

fix_names(users)

Unnamed: 0,user_id,name
0,1,Alice
1,2,Bob
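
For names made up only of letters, `str.capitalize()` is equivalent to upper-casing the first character and lower-casing the rest. The sketch below spells that out explicitly and returns a new frame instead of modifying the input; `fix_names_explicit` is an illustrative name.

```python
import pandas as pd

def fix_names_explicit(users: pd.DataFrame) -> pd.DataFrame:
    # First character upper-cased, remaining characters lower-cased.
    fixed = users['name'].str[:1].str.upper() + users['name'].str[1:].str.lower()
    # assign() returns a copy, so the caller's DataFrame is left as it was.
    return users.assign(name=fixed).sort_values('user_id')

users_demo = pd.DataFrame({'user_id': [1, 2], 'name': ['aLice', 'bOB']})

names_result = fix_names_explicit(users_demo)
```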


# 1517. Find Users With Valid E-Mails

Table: Users

| Column Name   | Type    |
|:---:|:---:|
| user_id       | int     |
| name          | varchar |
| mail          | varchar |

user_id is the primary key (column with unique values) for this table.

This table contains information about the users signed up on a website. Some e-mails are invalid.
 

Write a solution to find the users who have valid emails.

A valid e-mail has a prefix name and a domain where:


The prefix name is a string that may contain letters (upper or lower case), digits, underscore '_', period '.', and/or dash '-'. The prefix name must start with a letter.

The domain is '@leetcode.com'.

Return the result table in any order.

The result format is in the following example.

 

Example 1:

Input: 
Users table:

| user_id | name      | mail                    |
|:---:|:---:|:---:|
| 1       | Winston   | winston@leetcode.com    |
| 2       | Jonathan  | jonathanisgreat         |
| 3       | Annabelle | bella-@leetcode.com     |
| 4       | Sally     | sally.come@leetcode.com |
| 5       | Marwan    | quarz#2020@leetcode.com |
| 6       | David     | david69@gmail.com       |
| 7       | Shapiro   | .shapo@leetcode.com     |

Output: 
| user_id | name      | mail                    |
|:---:|:---:|:---:|
| 1       | Winston   | winston@leetcode.com    |
| 3       | Annabelle | bella-@leetcode.com     |
| 4       | Sally     | sally.come@leetcode.com |

Explanation: 

The mail of user 2 does not have a domain.

The mail of user 5 has the # sign which is not allowed.

The mail of user 6 does not have the leetcode domain.

The mail of user 7 starts with a period.
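
The validity rules above map naturally onto a regular expression: one leading letter, then any number of letters, digits, `_`, `.` or `-`, then the literal domain. The following is a minimal sketch using Python's `re` module; the names `VALID_MAIL` and `is_valid_mail` are hypothetical and serve only to illustrate the rule.

```python
import re

# One letter, then any mix of letters, digits, '_', '.', '-',
# followed by the exact domain. The dot in the domain is escaped.
VALID_MAIL = re.compile(r'[A-Za-z][A-Za-z0-9_.-]*@leetcode\.com')

def is_valid_mail(mail: str) -> bool:
    # fullmatch requires the whole string to match, so nothing may
    # precede the prefix or follow the domain.
    return VALID_MAIL.fullmatch(mail) is not None

mails = [
    'winston@leetcode.com',
    'jonathanisgreat',
    'bella-@leetcode.com',
    'sally.come@leetcode.com',
    'quarz#2020@leetcode.com',
    'david69@gmail.com',
    '.shapo@leetcode.com',
]

valid_mails = [m for m in mails if is_valid_mail(m)]
```

Escaping the dot in `leetcode\.com` matters: an unescaped `.` would match any character, so a string such as `a@leetcodeXcom` would slip through.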


In [6]:
import pandas as pd

data = {
    'user_id': [1, 2, 3, 4, 5, 6, 7],
    'name': ['Winston', 'Jonathan', 'Annabelle', 'Sally', 'Marwan', 'David', 'Shapiro'],
    'mail': [
        'winston@leetcode.com',
        'jonathanisgreat',
        'bella-@leetcode.com',
        'sally.come@leetcode.com',
        'quarz#2020@leetcode.com',
        'david69@gmail.com',
        '.shapo@leetcode.com'
    ]
}

users = pd.DataFrame(data)

users

Unnamed: 0,user_id,name,mail
0,1,Winston,winston@leetcode.com
1,2,Jonathan,jonathanisgreat
2,3,Annabelle,bella-@leetcode.com
3,4,Sally,sally.come@leetcode.com
4,5,Marwan,quarz#2020@leetcode.com
5,6,David,david69@gmail.com
6,7,Shapiro,.shapo@leetcode.com


In [None]:
def valid_emails(users: pd.DataFrame) -> pd.DataFrame: