- Sometimes you need to check the 'code' column for errors  
  when every code must have a fixed number of digits.
- A simple check is to count the characters of each code with Python's len() function.
- len() does not accept numeric values,  
  so you must first convert a number to a string and then apply len().
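
As a minimal sketch of that conversion step, using made-up values that are not taken from the sample file:

```python
# Hypothetical values for illustration only.
code_as_text = "AB123"
code_as_number = 12345

# len() works directly on a string.
print(len(code_as_text))

# len() raises TypeError on an int, so convert to str first.
print(len(str(code_as_number)))
```

Keep in mind that str() includes a minus sign or decimal point, so the character count equals the digit count only for non-negative integers.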

- The sample data has two code columns, one of string type ('code_str') and one of numeric type ('code_num'),  
  and every code must be exactly 5 digits long.
- Use the len() function to find rows where a code is not 5 digits long.

In [17]:
import pandas as pd

In [18]:
df = pd.read_excel("data/sample_data_code.xlsx")
print(df.shape)
df.head()

(19, 2)


Unnamed: 0,code_str,code_num
0,AB123,12345
1,AB124,12346
2,AB125,12347
3,AB126,12348
4,AB127,12349


In [19]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19 entries, 0 to 18
Data columns (total 2 columns):
 #   Column    Non-Null Count  Dtype 
---  ------    --------------  ----- 
 0   code_str  19 non-null     object
 1   code_num  19 non-null     int64 
dtypes: int64(1), object(1)
memory usage: 432.0+ bytes


In [20]:
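# String codes: len() works directly on each value.
df['code_str_length'] = df.code_str.apply(len)
# Numeric codes: convert each value to str before calling len().
df['code_num_length'] = df.code_num.apply(lambda x: len(str(x)))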

In [21]:
print(df.shape)
df.head()

(19, 4)


Unnamed: 0,code_str,code_num,code_str_length,code_num_length
0,AB123,12345,5,5
1,AB124,12346,5,5
2,AB125,12347,5,5
3,AB126,12348,5,5
4,AB127,12349,5,5


- This adds two columns holding the length of each code ('code_str_length', 'code_num_length').
- Filtering on these columns for values other than 5  
  finds the invalid rows, as shown below.

In [22]:
df[(df['code_str_length'] != 5) | (df['code_num_length'] != 5)]

Unnamed: 0,code_str,code_num,code_str_length,code_num_length
18,AB14,1236,4,4
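
The same check can also be written without helper columns or a lambda. The sketch below is one possible alternative, not the method used above: it relies on pandas' vectorized `.str.len()` accessor, and it runs on a small hypothetical DataFrame (`sample`) that only borrows the column names from the sample file. `EXPECTED_LENGTH` is likewise a name introduced here for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the sample file; only the column names match.
sample = pd.DataFrame({
    "code_str": ["AB123", "AB124", "AB14"],
    "code_num": [12345, 12346, 1236],
})

EXPECTED_LENGTH = 5  # required code length, as stated in the text

# .str.len() returns the length of each string in the column.
str_length = sample["code_str"].str.len()
# Integers are converted to strings first, then measured the same way.
num_length = sample["code_num"].astype(str).str.len()

# Keep rows where either column has the wrong length.
invalid = sample[(str_length != EXPECTED_LENGTH) | (num_length != EXPECTED_LENGTH)]
print(invalid)
```

Because the lengths are kept in separate Series rather than added to the DataFrame, the original columns stay unchanged.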
