- pandas has a string method ("str.contains()") that works like Excel's filter function.
- As in the 'product_name' column of the sample data,  
  a single column sometimes mixes two types of information (product type and country of origin).
- In this situation, if the data needs to be classified by product type alone,  
  use the str.contains() method to clean the data.
- There are two cleaning methods: dividing the data into a separate data frame per product type,  
  or adding a product type classification column to the entire data frame.
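
To illustrate the matching behavior described above, here is a minimal sketch on a small hand-made Series. The values are hypothetical stand-ins for the 'product_name' column, not the sample file itself, and the `na=False` argument is an addition that the cells below do not use.

```python
import pandas as pd

# Hypothetical values that mimic the mixed 'product_name' column
names = pd.Series(
    ["banana_chile", "apple, Australia", "Bananas from the Philippines", None]
)

# case=False ignores letter case; na=False treats missing values as non-matches
mask = names.str.contains("banana", case=False, na=False)
print(mask.tolist())  # [True, False, True, False]
```

The result is a boolean mask, which is what makes the row filtering in the following cells possible.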

In [16]:
import pandas as pd
import numpy as np

In [17]:
df = pd.read_excel("data/sample_data_filter.xlsx")
print(df.shape)
df

(17, 2)


Unnamed: 0,product_name,sales
0,banana_chile,2
1,banana_chile,3
2,"apple, Australia",1
3,"apple, Australia",2
4,Bananas from the Philippines,1
5,apple,3
6,apple,3
7,Bananas from the Philippines,5
8,"banana, india",1
9,"banana, india",5


### Method 1. Dividing data by product type

In [19]:
df_banana = df[df['product_name'].str.contains('banana', case=False)]

print(df_banana.shape)
df_banana

(8, 2)


Unnamed: 0,product_name,sales
0,banana_chile,2
1,banana_chile,3
4,Bananas from the Philippines,1
7,Bananas from the Philippines,5
8,"banana, india",1
9,"banana, india",5
10,banana_chile,4
11,banana_chile,1


In [20]:
df_apple = df[df['product_name'].str.contains('apple', case=False)]

print(df_apple.shape)
df_apple

(9, 2)


Unnamed: 0,product_name,sales
2,"apple, Australia",1
3,"apple, Australia",2
5,apple,3
6,apple,3
12,apple,4
13,apple,4
14,apple,5
15,"apple, Australia",1
16,"apple, Australia",2
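
When there are more than a couple of product types, the same filter can be applied in a loop rather than written out once per type. The following is a sketch under assumptions: the data frame is a small hypothetical stand-in for the sample file, and the `keywords` list and `subsets` dict are names introduced here for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the sample data
df = pd.DataFrame({
    "product_name": [
        "banana_chile",
        "apple, Australia",
        "Bananas from the Philippines",
        "apple",
    ],
    "sales": [2, 1, 5, 3],
})

# Build one filtered data frame per keyword, keyed by product type
keywords = ["banana", "apple"]
subsets = {
    kw: df[df["product_name"].str.contains(kw, case=False, na=False)]
    for kw in keywords
}

for kw, subset in subsets.items():
    print(kw, subset.shape)

# Sanity check: the subset sizes add up to the size of the original data
assert sum(len(s) for s in subsets.values()) == len(df)
```

Because str.contains() matches substrings, a keyword such as 'apple' would also match a name like 'pineapple', so it is worth checking the subset sizes against the original data as done in the last line.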


### Method 2. Creating a product type classification column in the entire data frame

In [21]:
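# Use None (not np.nan) as the fallback: mixing a string with np.nan in np.where turns the missing value into the string 'nan'
df['product_name_L'] = np.where(df['product_name'].str.contains('banana', case=False), 'banana', None)
df['product_name_L'] = np.where(df['product_name'].str.contains('apple', case=False), 'apple', df['product_name_L'])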

In [22]:
print(df.shape)
df

(17, 3)


Unnamed: 0,product_name,sales,product_name_L
0,banana_chile,2,banana
1,banana_chile,3,banana
2,"apple, Australia",1,apple
3,"apple, Australia",2,apple
4,Bananas from the Philippines,1,banana
5,apple,3,apple
6,apple,3,apple
7,Bananas from the Philippines,5,banana
8,"banana, india",1,banana
9,"banana, india",5,banana
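
As an alternative to chaining np.where calls, the classification column can also be derived in one step with str.extract(), which captures whichever keyword appears in the name. This is a different technique from the one used in the cell above, shown as a sketch on a hypothetical data frame; the 'cherry' row is included only to show what happens when nothing matches.

```python
import re

import pandas as pd

# Hypothetical stand-in for the sample data, plus one row that matches no keyword
df = pd.DataFrame({
    "product_name": [
        "banana_chile",
        "apple, Australia",
        "Bananas from the Philippines",
        "apple",
        "cherry",
    ],
    "sales": [2, 1, 5, 3, 4],
})

# Capture the first keyword found, ignoring case; rows with no match become NaN
df["product_name_L"] = (
    df["product_name"]
    .str.extract(r"(banana|apple)", flags=re.IGNORECASE, expand=False)
    .str.lower()
)

# With the classification column in place, sales can be totaled per product type
totals = df.groupby("product_name_L")["sales"].sum()
print(totals)
```

Rows without a match keep a missing value in the new column, and groupby leaves them out of the totals by default.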
