## Stats_Access_Link 

---



Assume a database table includes the columns below, and you are asked to process the Stats_Access_Link column and extract the pure URL information it contains for each device type.

Rules: 
-   XML tags and protocol parts are guaranteed to be lower case.
-   The access link part we are interested in can contain only alphanumeric characters (case-insensitive), the underscore ( _ ) character, and the dot ( . ) character.

What would you use for this task? Please write a detailed answer with an exact solution, and provide the link to your code as the answer to this question.

Example: for the device type AXO145, we would like to get xcd32112.smart_meter.com regardless of whether its access protocol is SSL-secured or not.

In [3]:
# import the necessary libraries
import pandas as pd
import re

# create a sample DataFrame of device types and their access links (some wrapped in <xml> tags)
data = {
    'Device_Type': ['AXO145', 'BZO234', 'AXO145', 'CZO456', 'BZO234'],
    'Stats_Access_Link': [
        '<xml>https://abcd1234.smart_meter.com/path/to/resource</xml>',
        'http://efgh5678.smart_meter.com/path/to/resource',
        'https://ijkl9012.smart_meter.com/path/to/resource',
        '<xml>http://mnop3456.smart_meter.com/path/to/resource</xml>',
        'https://qrst7890.smart_meter.com/path/to/resource']
}
df = pd.DataFrame(data)


In the next step, we define a regular expression pattern:

In [9]:
pattern = re.compile(r'(https?://[a-z0-9_.]+\.smart_meter\.com)', re.IGNORECASE)

# re.IGNORECASE makes the match case-insensitive, so the character class [a-z0-9_.] also accepts upper-case letters.
# The capture group covers both the protocol and the host, e.g. https://abcd1234.smart_meter.com
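To see what the pattern accepts and rejects, here is a minimal sketch that runs it against a few hypothetical links (the mixed-case and `ftp` samples are made up for illustration and do not come from the dataset). Note that `re.IGNORECASE` applies to the whole pattern, so it would also accept an upper-case protocol, which is looser than the stated rules require.

```python
import re

# Same pattern as in the cell above.
pattern = re.compile(r'(https?://[a-z0-9_.]+\.smart_meter\.com)', re.IGNORECASE)

# Hypothetical sample links, used only to illustrate the matching behaviour.
samples = [
    '<xml>https://abcd1234.smart_meter.com/path/to/resource</xml>',  # wrapped in XML tags
    'http://EFGH5678.Smart_Meter.com/path',                          # mixed-case host
    'ftp://abcd1234.smart_meter.com/path',                           # protocol not covered by the pattern
]

results = []
for sample in samples:
    match = pattern.search(sample)
    # Store the captured text, or None when the pattern does not match.
    results.append(match.group(1) if match else None)

print(results)
# ['https://abcd1234.smart_meter.com', 'http://EFGH5678.Smart_Meter.com', None]
```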

Next, we define a function that takes a device type and returns the corresponding pure URL information:

In [5]:
def extract_url(device_type):
    urls = []
    for link in df.loc[df['Device_Type'] == device_type, 'Stats_Access_Link']:
        match = pattern.search(link)
        if match:
            urls.append(match.group(1))
    return urls


This function uses boolean indexing to select the DataFrame rows for the given device type. It then searches each access link in those rows for a match of the regular expression pattern. If a match is found, it appends the captured substring (group 1) to a list. Finally, it returns the list of URLs.
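If URLs are needed for every device type at once, calling `extract_url` per device filters the table again on each call. The following is a minimal sketch of a single-pass alternative. It assumes the rows are available as plain `(device_type, link)` tuples rather than a DataFrame, purely to keep the example self-contained; the helper name `extract_urls_by_device` is hypothetical.

```python
import re
from collections import defaultdict

# Same pattern as defined earlier.
pattern = re.compile(r'(https?://[a-z0-9_.]+\.smart_meter\.com)', re.IGNORECASE)

# (device_type, link) pairs mirroring the first three sample rows.
rows = [
    ('AXO145', '<xml>https://abcd1234.smart_meter.com/path/to/resource</xml>'),
    ('BZO234', 'http://efgh5678.smart_meter.com/path/to/resource'),
    ('AXO145', 'https://ijkl9012.smart_meter.com/path/to/resource'),
]

def extract_urls_by_device(rows):
    """Group matched URLs by device type in a single pass over the rows."""
    grouped = defaultdict(list)
    for device_type, link in rows:
        match = pattern.search(link)
        if match:
            grouped[device_type].append(match.group(1))
    return dict(grouped)

urls_by_device = extract_urls_by_device(rows)
print(urls_by_device['AXO145'])
# ['https://abcd1234.smart_meter.com', 'https://ijkl9012.smart_meter.com']
```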

In [11]:
urls = extract_url('AXO145')
print(urls) 

['https://abcd1234.smart_meter.com', 'https://ijkl9012.smart_meter.com']
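The output above keeps the protocol, whereas the example in the task statement asks for the bare host (xcd32112.smart_meter.com) whether the link uses http or https. A minimal sketch of a variant that drops the protocol is shown below. It assumes, following the rules, that the protocol is lower-case `http://` or `https://` and that the host contains only letters, digits, underscores, and dots; the name `extract_host` and the `/stats` path are hypothetical.

```python
import re

# Match the lower-case protocol but capture only the host that follows it.
# The character class mirrors the rules: letters (any case), digits, '_' and '.'.
host_pattern = re.compile(r'https?://([A-Za-z0-9_.]+)')

def extract_host(link):
    """Return the host part of an access link, or None if no link is found."""
    match = host_pattern.search(link)
    return match.group(1) if match else None

print(extract_host('<xml>https://xcd32112.smart_meter.com/stats</xml>'))  # xcd32112.smart_meter.com
print(extract_host('http://xcd32112.smart_meter.com'))                    # xcd32112.smart_meter.com
```

Because the character class excludes `/` and `<`, the match stops at the start of a path or at a closing XML tag, so no extra stripping of the wrapper is needed. Unlike the original pattern, this sketch does not require the host to end in `smart_meter.com`.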
