# Step 2: Check if billing address is set as default

If no field id/name such as tel or postcode appears more than once, we can assume that the form does not ask for separate billing and shipping addresses.<br>
<br>
→ If "tel" (or "phone") appears twice and "zipcode" or "postal-code" also appears twice in the same column, the form may have two places to fill in: one for the shipping address and one for the billing address.<br>
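
As a minimal sketch of this heuristic, the check can be written against a plain mapping of column name to values (for a DataFrame, `df.to_dict(orient="list")` produces that shape). The helper name `needs_separate_billing` and the sample ids below are hypothetical and are not taken from any of the parsed sites.

```python
def needs_separate_billing(columns):
    """columns: mapping of column name -> list of attribute values.

    Returns True if any single column mentions a phone field and a
    postal-code field more than once each.
    """
    for values in columns.values():
        texts = [str(v).lower() for v in values]
        tel_count = sum("tel" in t or "phone" in t for t in texts)
        zip_count = sum("zipcode" in t or "postal-code" in t for t in texts)
        if tel_count > 1 and zip_count > 1:
            return True
    return False


# Hypothetical form with separate shipping and billing blocks
two_blocks = {"id": ["shipping-tel", "shipping-zipcode",
                     "billing-tel", "billing-zipcode"]}
# Hypothetical form with a single address block
one_block = {"id": ["email", "tel", "zipcode"]}

print(needs_separate_billing(two_blocks))
print(needs_separate_billing(one_block))
```

Substring matching is deliberately loose here, so an id such as "hotel" would also count as a "tel" field; a stricter pattern may be needed on real forms.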

In [1]:
import pandas as pd
from bs4 import BeautifulSoup

In [51]:
# Add part of the definition made on step 1

def parse_html_inputs(filename):
    with open(filename, 'r') as f:
        html = f.read()
    soup = BeautifulSoup(html, 'html.parser')
    input_elements = soup.find_all('input', {'type': ['text', 'email', 'tel']})
    labels = soup.find_all('label',{'class':'ot-scrn-rdr'})

    # Create a list of dictionaries for each input element
    inputs_list = []
    for input_element in input_elements:
        inputs_list.append(input_element.attrs)

    # Create a DataFrame from the list of input elements
    df = pd.DataFrame(inputs_list)

    # Delete the search input(Special case 1)
    for index, row in df.iterrows():
        if any("search" in str(col).lower() for col in row):
            df.drop(index, inplace=True)

    #Delete the 'class':'ot-scrn-rdr'(Special case 2)
    for label in labels:
        input_row = df[df['id'] == label.get('for')]
        if not input_row.empty:
            for attr, value in label.attrs.items():
                df.loc[input_row.index, attr] = value
    for index, row in df.iterrows():
        if any("ot-scrn-rdr" in str(col).lower() for col in row):
            df.drop(index, inplace=True)    
    
    #Delete hidden(Special case 3)
    if 'class' in df.columns and 'aria-label' in df.columns:
        # Remove rows where 'class' contains 'visually-hidden'
        df = df[~df['class'].apply(lambda x: 'visually-hidden' in x if isinstance(x, list) else False)]
    
        # Remove rows where 'aria-label' equals 'PIN'
        df = df[df['aria-label'] != "PIN"]

        # Remove rows containing 'yotpo' in any column
        indices_to_drop = []
        for index, row in df.iterrows():
            if any("yotpo" in str(col).lower() for col in row):
                indices_to_drop.append(index)
        df = df.drop(indices_to_drop)
            
    return df  # added only for step2


#     if len(df) > 8:
#         print("The input form is too long")
#         print(df)
#     else:
#         print("The input form looks good")
#         print(df)
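
The keyword-based removals above (the "search" and "yotpo" cases) could also be applied to the attribute dictionaries before the DataFrame is built, which avoids dropping rows while iterating. This is only a sketch of that alternative; the helper name `drop_inputs_mentioning` and the sample data are hypothetical.

```python
def drop_inputs_mentioning(inputs_list, keyword):
    """Keep only attribute dicts in which no value mentions `keyword`."""
    keyword = keyword.lower()
    return [
        attrs for attrs in inputs_list
        if not any(keyword in str(value).lower() for value in attrs.values())
    ]


# Hypothetical attribute dicts, shaped like BeautifulSoup's `tag.attrs`
sample_inputs = [
    {"id": "email", "type": "email"},
    {"id": "site-search", "type": "text"},
    {"id": "review", "class": ["yotpo-input"], "type": "text"},
]

kept = drop_inputs_mentioning(sample_inputs, "search")
kept = drop_inputs_mentioning(kept, "yotpo")
print([attrs["id"] for attrs in kept])
```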

--- Check Muji JP ---

In [52]:
df = parse_html_inputs("無印良品.html")

No duplicated fields show up here, so we can assume that there is no need to enter the shipping address again and that it is already set as the default.<br>
On the website itself, it was also set as the default.


In [53]:
input_error = False
billing_address_message_printed = False

# Check each column for repeated phone and postal-code fields.
# The counts are taken over the column's values, not its name.
for col in df.columns:
    values = df[col].dropna().astype(str).str.lower()
    if values.str.contains('tel|phone').sum() > 1 and values.str.contains('zipcode|postal-code').sum() > 1:
        if not billing_address_message_printed:
            print("You need to fill the billing address")
            print(df)
            billing_address_message_printed = True
        input_error = True

# This check happens after all columns have been checked.
if not input_error:
    print("Your input for billing address/shipping address looks great")
    print(df)



Your input for billing address/shipping address looks great
                          id          placeholder maxlength sectionbtnfocus  \
1         customerInfo.email  abcdef1234@muji.com       NaN             NaN   
2      customerInfo.fullName                無印　太郎       NaN             NaN   
3  customerInfo.fullNameKana             ムジルシ　タロウ       NaN             NaN   
4       customerInfo.zipCode              1708424         8             NaN   
5      customerInfo.address1               東京都豊島区       NaN             NaN   
6      customerInfo.address2                  東池袋       NaN             NaN   
7      customerInfo.address3               ４ー２６ー３       NaN             NaN   
8      customerInfo.address4                  NaN       NaN             NaN   
9         customerInfo.telNo           0339894191        13             NaN   

  inputbackgroundcolor                                              class  \
1                  NaN  [ant-input, textField__Input-sc-17ccvkc-1, iSh..

--- Check ZARA US ---

In [54]:
df = parse_html_inputs("ZARA US.html")

In [55]:
input_error = False
billing_address_message_printed = False

# Check each column for repeated phone and postal-code fields.
# The counts are taken over the column's values, not its name.
for col in df.columns:
    values = df[col].dropna().astype(str).str.lower()
    if values.str.contains('tel|phone').sum() > 1 and values.str.contains('zipcode|postal-code').sum() > 1:
        if not billing_address_message_printed:
            print("You need to fill the billing address")
            print(df)
            billing_address_message_printed = True
        input_error = True

# This check happens after all columns have been checked.
if not input_error:
    print("Your input for billing address/shipping address looks great")
    print(df)


Your input for billing address/shipping address looks great
                                               class  \
0  [form-input-label__input, form-input-text__input]   
1  [form-input-label__input, form-input-text__input]   
2  [form-input-label__input, form-input-text__input]   
3  [form-input-label__input, form-input-text__input]   
4  [form-input-label__input, form-input-text__input]   
5  [form-input-label__input, form-input-text__input]   
6  [form-input-label__input, form-input-text__input]   
7  [form-input-label__input, form-input-text__input]   

                         name                        placeholder  type  \
0                   firstName                                     text   
1                    lastName                                     text   
2             addressLines[1]  Apartment, Suite, Building floor…  text   
3                     zipCode                                     text   
4                        city                                    

On the website, the next page (not the HTML parsed here) lets you choose between delivery to a home address and pickup at a store, but there seemed to be no need to enter the address again.

--- Check Muji US ---

In [56]:
df = parse_html_inputs("Muji US.html")

In [57]:
input_error = False
billing_address_message_printed = False

# Check each column for repeated phone and postal-code fields.
# The counts are taken over the column's values, not its name.
for col in df.columns:
    values = df[col].dropna().astype(str).str.lower()
    if values.str.contains('tel|phone').sum() > 1 and values.str.contains('zipcode|postal-code').sum() > 1:
        if not billing_address_message_printed:
            print("You need to fill the billing address")
            print(df)
            billing_address_message_printed = True
        input_error = True

# This check happens after all columns have been checked.
if not input_error:
    print("Your input for billing address/shipping address looks great")
    print(df)


Your input for billing address/shipping address looks great
                                          placeholder autocapitalize  \
0                                               Email            off   
11                                         First name            NaN   
12                                          Last name            NaN   
13                                 Company (optional)            NaN   
14  Put a hyphen between the unit/suite/apartment ...            NaN   
15         Enter door codes or drop-off instructions.            NaN   
16                                               City            NaN   
17                                           ZIP code            NaN   
18                                              Phone            NaN   
19                                Mobile phone number            off   
20                         Gift card or discount code            NaN   

   spellcheck             autocomplete data-shopify-pay-handle data-autofoc