In [1]:
import pandas as pd

## Invalid Tweets

In [2]:
tweet_dict = {
     'tweet_id': [1, 2],
     'content':['Vote for Biden', 'Let us make America great again!'] 
}

In [3]:
tweets = pd.DataFrame(tweet_dict)

tweets

Unnamed: 0,tweet_id,content
0,1,Vote for Biden
1,2,Let us make America great again!


Write a solution to find the IDs of the invalid tweets. 

A tweet is invalid if the number of characters in its content is strictly greater than 15.

<span style="font-family:Comic Sans MS; color:red">The `str.len()` method computes the length of each element in a Series, **returning a Series** of the same length in which each value is the number of characters in the corresponding element.</span>

In [4]:
print(tweets['content'].str.len())

print(type(tweets['content'].str.len()))

0    14
1    32
Name: content, dtype: int64
<class 'pandas.core.series.Series'>


In [5]:
tweets.loc[tweets['content'].str.len() > 15, ['tweet_id']]

Unnamed: 0,tweet_id
1,2
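
The same filter can be wrapped in a reusable function. This is a minimal sketch: the function name `invalid_tweets` and the `sample` frame are assumptions for illustration, not part of the original cells.

```python
import pandas as pd

def invalid_tweets(tweets: pd.DataFrame) -> pd.DataFrame:
    # Boolean mask: True where the content has more than 15 characters
    too_long = tweets['content'].str.len() > 15
    # Select the matching rows and only the tweet_id column in one step
    return tweets.loc[too_long, ['tweet_id']]

# Same two rows as the example above
sample = pd.DataFrame({
    'tweet_id': [1, 2],
    'content': ['Vote for Biden', 'Let us make America great again!'],
})
result = invalid_tweets(sample)
print(result)
```

Using `.loc` with a row mask and a column list avoids chained indexing, which keeps the selection in a single step.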


## Fix Names in a Table

In [6]:
users_dict = {
     'user_id': [1, 2, 222], 
     'name': ['aLice', 'bOB', 'MaRRy aNN']   
}

users = pd.DataFrame(users_dict)
users

Unnamed: 0,user_id,name
0,1,aLice
1,2,bOB
2,222,MaRRy aNN


Write a solution to fix the names so that only the first character is uppercase and the rest are lowercase.

Return the result table ordered by user_id.

In [7]:
users['name'] = users['name'].str.title()

users
# str.title() capitalizes every word, so one test case failed: actual output 'Marry Ann', expected 'Marry ann'

Unnamed: 0,user_id,name
0,1,Alice
1,2,Bob
2,222,Marry Ann


In [8]:
name = 'MaRRy aNN'
name[0].upper() + name[1:].lower()


'Marry ann'

In [9]:
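# This logic worked: uppercase the first character, lowercase the rest
# (x[:1] instead of x[0] so an empty name does not raise IndexError)
users['name'] = users['name'].apply(lambda x: x[:1].upper() + x[1:].lower())

users.sort_values('user_id')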

Unnamed: 0,user_id,name
0,1,Alice
1,2,Bob
2,222,Marry ann
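
A possible alternative to the lambda is the built-in `str.capitalize()` accessor, which uppercases the first character and lowercases everything after it. The sketch below uses a hypothetical `users_demo` frame with the rows deliberately out of order, so the final sort by user_id is visible.

```python
import pandas as pd

# Hypothetical demo frame; rows are shuffled on purpose
users_demo = pd.DataFrame({
    'user_id': [222, 1, 2],
    'name': ['MaRRy aNN', 'aLice', 'bOB'],
})

# str.capitalize() uppercases the first character and lowercases the rest
users_demo['name'] = users_demo['name'].str.capitalize()

# Order the result by user_id, as the problem statement asks
fixed = users_demo.sort_values('user_id').reset_index(drop=True)
print(fixed)
```

Unlike `str.title()`, this leaves the second word in lowercase.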


## Find Users With Valid E-Mails

In [10]:
users_dict = {
     'user_id': [1, 2, 3, 4, 5, 6, 7],
     'name': ['Winston', 'Jonathan', 'Annabelle', 'Sally', 'Marwan', 'David', 'Shapiro'],
     'mail': ['winston@leetcode.com', 'jonathanisgreat', 'bella-@leetcode.com', 'sally.come@leetcode.com', 'quarz#2020@leetcode.com', 'david69@gmail.com', '.shapo@leetcode.com']     
}

users = pd.DataFrame(users_dict)
users

Unnamed: 0,user_id,name,mail
0,1,Winston,winston@leetcode.com
1,2,Jonathan,jonathanisgreat
2,3,Annabelle,bella-@leetcode.com
3,4,Sally,sally.come@leetcode.com
4,5,Marwan,quarz#2020@leetcode.com
5,6,David,david69@gmail.com
6,7,Shapiro,.shapo@leetcode.com


Write a solution to find the users who have valid emails.

A valid e-mail has a prefix name and a domain where:

The prefix name is a string that may contain letters (upper or lower case), digits, underscore '_', period '.', and/or dash '-'. The prefix name must start with a letter.
The domain is '@leetcode.com'.
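
One way to read these two rules as a regular expression is sketched below with `re.VERBOSE`, so each part of the pattern carries its own comment. The names `VALID_MAIL`, `mails`, and `checks` are assumptions for illustration; the addresses are the ones from the table above.

```python
import re

# Hypothetical name; each line of the pattern maps to one rule
VALID_MAIL = re.compile(r"""
    ^[a-zA-Z]          # the prefix must start with a letter
    [a-zA-Z0-9_.-]*    # then any number of letters, digits, '_', '.', '-'
    @leetcode\.com$    # then the domain, followed by the end of the string
""", re.VERBOSE)

mails = ['winston@leetcode.com', 'jonathanisgreat', 'bella-@leetcode.com',
         'sally.come@leetcode.com', 'quarz#2020@leetcode.com',
         'david69@gmail.com', '.shapo@leetcode.com']

# Map each address to whether it satisfies the pattern
checks = {mail: bool(VALID_MAIL.search(mail)) for mail in mails}
for mail, ok in checks.items():
    print(mail, ok)
```

The dash is placed last inside the character class so it is read as a literal dash rather than as a range.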

In [11]:
import re

x='.shapo@leetcode.com'

# Raw string avoids the invalid-escape warning for \. ; '/' is not an allowed prefix character
output = re.search(r'^[a-zA-Z][a-zA-Z0-9_.-]*@leetcode\.com$', x)
print(bool(output))

False


In [12]:
users['temp'] = users['mail'].apply(lambda x: bool(re.search(r'^[a-zA-Z][a-zA-Z0-9_.-]*@leetcode\.com$', x)))

users[users['temp']][['user_id', 'name', 'mail']]

Unnamed: 0,user_id,name,mail
0,1,Winston,winston@leetcode.com
2,3,Annabelle,bella-@leetcode.com
3,4,Sally,sally.come@leetcode.com


Use `str.contains` instead of `apply` with a lambda: `str.contains` is the pandas string method built for this job. It applies the regex pattern to every element of the column in one call, needs no helper column or explicit `re` call, and tends to be at least as fast as `apply` with a lambda.

`str.contains`: this method checks the regex pattern against each value in the 'mail' column and returns a boolean Series in which each element is True if the corresponding email matches the pattern and False otherwise.

In [13]:
# Improved performance and readability
pattern = r'^[a-zA-Z][a-zA-Z0-9_.-]*@leetcode\.com$'

mask = users['mail'].str.contains(pattern, regex=True)

print(mask)
users.loc[mask]

# Single line code: users.loc[users['mail'].str.contains(pattern, regex=True)]

0     True
1    False
2     True
3     True
4    False
5    False
6    False
Name: mail, dtype: bool


Unnamed: 0,user_id,name,mail,temp
0,1,Winston,winston@leetcode.com,True
2,3,Annabelle,bella-@leetcode.com,True
3,4,Sally,sally.come@leetcode.com,True
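
A related option, sketched here under the assumption of pandas 1.1 or later, is `str.fullmatch`, which requires the whole string to match, so the `^` and `$` anchors can be dropped. The `mails` Series and the `None` entry are hypothetical and only there to show how `na=False` treats a missing value.

```python
import pandas as pd

# Hypothetical Series with one missing value
mails = pd.Series(['winston@leetcode.com', 'jonathanisgreat',
                   'bella-@leetcode.com', '.shapo@leetcode.com', None])

# fullmatch must consume the entire string, so no ^ or $ is needed;
# na=False makes missing values count as "no match"
mask_full = mails.str.fullmatch(r'[a-zA-Z][a-zA-Z0-9_.-]*@leetcode\.com', na=False)

print(mask_full)
print(mails[mask_full])
```

Passing `na=False` keeps the mask purely boolean, so it can be used directly for row selection even when some addresses are missing.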


## Patients With a Condition

In [14]:
patients_dict = {
     'patient_id': [1,2,3,4,5,6], 
     'patient_name': ['Daniel', 'Alice', 'Bob', 'George', 'Alain', 'Winston'],
     'conditions': ['YFEV COUGH', '', 'DIAB100 MYOP', 'ACNE DIAB100', 'DIAB201', 'SADIAB100']      
}

patients = pd.DataFrame(patients_dict)
patients

Unnamed: 0,patient_id,patient_name,conditions
0,1,Daniel,YFEV COUGH
1,2,Alice,
2,3,Bob,DIAB100 MYOP
3,4,George,ACNE DIAB100
4,5,Alain,DIAB201
5,6,Winston,SADIAB100


Write a solution to find the patient_id, patient_name, and conditions of the patients who have Type I Diabetes. Type I Diabetes always starts with DIAB1 prefix.


In [15]:
# Match DIAB1 at the start of the string or right after a space; nothing has to follow the prefix
patients[patients['conditions'].str.contains(r'(?:^| )DIAB1', regex=True)]

Unnamed: 0,patient_id,patient_name,conditions
2,3,Bob,DIAB100 MYOP
3,4,George,ACNE DIAB100
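
As a cross-check on the regex, the same rule can be written without one: split the conditions on whitespace and test each code for the DIAB1 prefix. This is a sketch; `has_type1`, `patients_demo`, and the extra patient 'Eve' with a bare `DIAB1` code are assumptions added to cover an edge case that a pattern demanding extra characters after the prefix would miss.

```python
import pandas as pd

# Original six rows plus one hypothetical patient with a bare 'DIAB1' code
patients_demo = pd.DataFrame({
    'patient_id': [1, 2, 3, 4, 5, 6, 7],
    'patient_name': ['Daniel', 'Alice', 'Bob', 'George', 'Alain', 'Winston', 'Eve'],
    'conditions': ['YFEV COUGH', '', 'DIAB100 MYOP', 'ACNE DIAB100',
                   'DIAB201', 'SADIAB100', 'DIAB1'],
})

def has_type1(conditions: str) -> bool:
    # Split on whitespace and test every code for the DIAB1 prefix
    return any(code.startswith('DIAB1') for code in conditions.split())

by_split = patients_demo['conditions'].apply(has_type1)
# Prefix anchored at the start of the string or after a space
by_regex = patients_demo['conditions'].str.contains(r'(?:^| )DIAB1', regex=True)

print(patients_demo[by_split])
```

Both masks select the same rows here; `SADIAB100` is excluded because DIAB1 appears in the middle of the code rather than at its start.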


## Calculate Special Bonus

In [16]:
employee_dict = {
     'employee_id': [2,3,7,8,9],
     'name': ['Meir', 'Michael', 'Addilyn', 'Juan', 'Kannon'],
     'salary': [3000, 3800, 7400, 6100, 7700]
}

employee = pd.DataFrame(employee_dict)
employee

Unnamed: 0,employee_id,name,salary
0,2,Meir,3000
1,3,Michael,3800
2,7,Addilyn,7400
3,8,Juan,6100
4,9,Kannon,7700


Write a solution to calculate the bonus of each employee. The bonus of an employee is 100% of their salary if the ID of the employee is an odd number and the employee's name does not start with the character 'M'. The bonus of an employee is 0 otherwise.

Return the result table ordered by employee_id.


In [17]:
# Bonus equals salary only for odd IDs whose name does not start with an uppercase 'M'
employee['bonus'] = employee.apply(lambda row: row['salary'] if row['employee_id'] % 2 == 1 and not row['name'].startswith('M') else 0, axis=1)

employee[['employee_id','bonus']].sort_values(by='employee_id')

Unnamed: 0,employee_id,bonus
0,2,0
1,3,0
2,7,7400
3,8,0
4,9,7700


In [18]:
# Same rule, returning employee_id and bonus as two expanded columns
employee.apply(lambda row: [row['employee_id'], row['salary'] if row['employee_id'] % 2 == 1 and not row['name'].startswith('M') else 0], axis=1, result_type='expand')

Unnamed: 0,0,1
0,2,0
1,3,0
2,7,7400
3,8,0
4,9,7700
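
The row-wise `apply` can also be replaced with column-level boolean masks. The sketch below is one possible vectorized version; `employee_demo`, `odd_id`, `not_m`, and `bonus_table` are names assumed for illustration, and the data is the same as in the table above.

```python
import pandas as pd

employee_demo = pd.DataFrame({
    'employee_id': [2, 3, 7, 8, 9],
    'name': ['Meir', 'Michael', 'Addilyn', 'Juan', 'Kannon'],
    'salary': [3000, 3800, 7400, 6100, 7700],
})

# One boolean mask per condition
odd_id = employee_demo['employee_id'] % 2 == 1
not_m = ~employee_demo['name'].str.startswith('M')

# Keep the salary where both conditions hold, otherwise use 0
employee_demo['bonus'] = employee_demo['salary'].where(odd_id & not_m, 0)

bonus_table = employee_demo[['employee_id', 'bonus']].sort_values('employee_id')
print(bonus_table)
```

Splitting the rule into two named masks keeps each condition readable and avoids calling a Python function once per row.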
