## Challenge 402: Parsing Equipment Tags

Link: https://community.alteryx.com/t5/Weekly-Challenges/Challenge-402-Parsing-Equipment-Tags/td-p/1213271

Your task this week is to parse these concatenated equipment tags, making life smoother for everyone involved. Ensure the new tags start from the first letter and end with the last letter sequentially.

Create an Alteryx workflow to sequentially generate new equipment tags based on the provided dataset. For instance, if you have P-101A/E, the output should be P-101A, P-101B, P-101C, P-101D, and P-101E.
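Before turning to pandas, the expansion rule described above can be sketched in plain Python. This is a minimal illustration, not the challenge's reference solution: `expand_tag` is a hypothetical helper, and it assumes that two single letters separated by `/` or `-` denote an inclusive range, while any other suffix is an explicit list.

```python
import re

def expand_tag(tag):
    """Hypothetical helper: expand one concatenated tag into individual tags."""
    # Split into a base ending at the last digit and a trailing letter suffix.
    match = re.fullmatch(r"(.*\d)(.*)", tag)
    if match is None:
        return [tag]  # no digit found, so leave the tag unchanged
    base, suffix = match.groups()
    parts = re.split(r"[/-]", suffix)
    # Assumption: exactly two single letters denote an inclusive range.
    if len(parts) == 2 and all(len(p) == 1 and p.isalpha() for p in parts):
        start, end = parts
        parts = [chr(code) for code in range(ord(start), ord(end) + 1)]
    return [base + part for part in parts]

print(expand_tag("P-101A/E"))
```

A tag with no suffix, such as `B301`, comes back unchanged as a one-element list.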

### Prepare data

In [23]:
import pandas as pd
import numpy as np
import re

In [24]:
df = pd.read_csv("tags.csv")

In [25]:
df

Unnamed: 0,tag,Service,WorkingTemp,WorkingPressure
0,P101A/B/C,Oil,35,2000
1,P203A-D,Gas,41,3000
2,P401A/G,N2,10,2100
3,T101B,FW,40,1500
4,B301,CW,23,2400


In [26]:
# Get the first four characters of the tag column (one word character followed by three digits)
df['left_four'] = df['tag'].str.extract(r'(\w{1}\d{3}).*')

In [27]:
df

Unnamed: 0,tag,Service,WorkingTemp,WorkingPressure,left_four
0,P101A/B/C,Oil,35,2000,P101
1,P203A-D,Gas,41,3000,P203
2,P401A/G,N2,10,2100,P401
3,T101B,FW,40,1500,T101
4,B301,CW,23,2400,B301


In [28]:
# Get all characters after the first 4 characters in the tag column
df['tag_name'] = df['tag'].str.extract(r'\w{1}\d{3}(.*)')

In [29]:
df

Unnamed: 0,tag,Service,WorkingTemp,WorkingPressure,left_four,tag_name
0,P101A/B/C,Oil,35,2000,P101,A/B/C
1,P203A-D,Gas,41,3000,P203,A-D
2,P401A/G,N2,10,2100,P401,A/G
3,T101B,FW,40,1500,T101,B
4,B301,CW,23,2400,B301,
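The two `str.extract` calls above could also be combined into one, since a pattern with two capture groups returns one column per group. The sketch below rebuilds only the `tag` column inline (an assumption made so the example runs without `tags.csv`) and assigns both result columns at once.

```python
import pandas as pd

# Assumption: only the tag column is rebuilt here, with the values shown above.
demo = pd.DataFrame({"tag": ["P101A/B/C", "P203A-D", "P401A/G", "T101B", "B301"]})

# Group 1 captures the four-character base, group 2 captures whatever follows it.
demo[["left_four", "tag_name"]] = demo["tag"].str.extract(r"(\w\d{3})(.*)")

print(demo)
```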


In [30]:
# Expand the letter suffix into the full list of letters it stands for
def filling_missing_char(txt):
    # If the tag name is null, return null (np.nan is a value, not a function)
    if pd.isna(txt):
        return np.nan
    # A three-character value containing "-" is a range such as "A-D"
    if '-' in txt and len(txt) == 3:
        # Split on "-" to get the first and last letter
        start, end = txt.split('-')
        # Return every letter from start to end inclusive, using ASCII codes
        return [chr(i) for i in range(ord(start), ord(end) + 1)]
    # A three-character value containing "/" is also a range, such as "A/G"
    if '/' in txt and len(txt) == 3:
        start, end = txt.split('/')
        return [chr(i) for i in range(ord(start), ord(end) + 1)]
    # Otherwise the letters are listed explicitly, so split on "/"
    return txt.split('/')


In [31]:
# Create a new column called split_char to store the list of letters for each tag
df['split_char'] = df['tag_name'].apply(filling_missing_char)

In [32]:
df

Unnamed: 0,tag,Service,WorkingTemp,WorkingPressure,left_four,tag_name,split_char
0,P101A/B/C,Oil,35,2000,P101,A/B/C,"[A, B, C]"
1,P203A-D,Gas,41,3000,P203,A-D,"[A, B, C, D]"
2,P401A/G,N2,10,2100,P401,A/G,"[A, B, C, D, E, F, G]"
3,T101B,FW,40,1500,T101,B,[B]
4,B301,CW,23,2400,B301,,[]


In [33]:
# Use explode to give each letter in the list its own row, repeating the other columns
df = df.explode('split_char', ignore_index=True)

In [34]:
df

Unnamed: 0,tag,Service,WorkingTemp,WorkingPressure,left_four,tag_name,split_char
0,P101A/B/C,Oil,35,2000,P101,A/B/C,A
1,P101A/B/C,Oil,35,2000,P101,A/B/C,B
2,P101A/B/C,Oil,35,2000,P101,A/B/C,C
3,P203A-D,Gas,41,3000,P203,A-D,A
4,P203A-D,Gas,41,3000,P203,A-D,B
5,P203A-D,Gas,41,3000,P203,A-D,C
6,P203A-D,Gas,41,3000,P203,A-D,D
7,P401A/G,N2,10,2100,P401,A/G,A
8,P401A/G,N2,10,2100,P401,A/G,B
9,P401A/G,N2,10,2100,P401,A/G,C
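To see the explode step in isolation, here is a small sketch on a two-row frame. The frame `demo` is an illustrative assumption that reuses values from the output above. Each list element becomes its own row, and `ignore_index=True` renumbers the rows from 0.

```python
import pandas as pd

# Assumption: a two-row frame built inline with values from the output above.
demo = pd.DataFrame({
    "left_four": ["P203", "T101"],
    "split_char": [["A", "B", "C", "D"], ["B"]],
})

# One output row per list element; the other columns are repeated.
demo = demo.explode("split_char", ignore_index=True)

print(demo)
```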


In [35]:
# Concatenate the first four characters with each split letter
df['new_tag'] = df['left_four'] + df['split_char']

In [36]:
# Drop the helper columns that are no longer needed
df_result = df.drop(columns=['left_four', 'tag_name', 'split_char'])

In [37]:
df_result

Unnamed: 0,tag,Service,WorkingTemp,WorkingPressure,new_tag
0,P101A/B/C,Oil,35,2000,P101A
1,P101A/B/C,Oil,35,2000,P101B
2,P101A/B/C,Oil,35,2000,P101C
3,P203A-D,Gas,41,3000,P203A
4,P203A-D,Gas,41,3000,P203B
5,P203A-D,Gas,41,3000,P203C
6,P203A-D,Gas,41,3000,P203D
7,P401A/G,N2,10,2100,P401A
8,P401A/G,N2,10,2100,P401B
9,P401A/G,N2,10,2100,P401C
