# Subdomain : Regex and Parsing

## Problem 1 : Detect Floating Point Number
* **You are given a string N.**
* **Your task is to verify that N is a floating point number.**
* **In this task, a valid float number must satisfy all of the following requirements:**
* **Number can start with +, - or . symbol.**


In [1]:
import re
for i in range(int(input())):
    print(bool(re.search(r"^[+-]?[0-9]*\.[0-9]+$", input())))

4
4.000
True
-1.00
True
+4.54
True
SomeRandomStuff
False
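
For testing outside the judge, the same check can be wrapped in a function. This is a minimal sketch: the names `FLOAT_RE` and `is_float` are hypothetical, and `re.fullmatch` is swapped in for the explicit `^...$` anchors so that a trailing newline is also rejected.

```python
import re

# Same pattern as the solution above, without anchors because fullmatch() is used.
FLOAT_RE = re.compile(r"[+-]?[0-9]*\.[0-9]+")

def is_float(text):
    """Return True for an optional sign, optional digits, a dot, then at least one digit."""
    return FLOAT_RE.fullmatch(text) is not None

for sample in ["4.000", "-1.00", "+4.54", "SomeRandomStuff", "12.", ".5"]:
    print(sample, is_float(sample))
```

Note that `12.` is rejected because at least one digit must follow the dot, while `.5` is accepted.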


## Problem 2 : Re.split()
* **You are given a string s consisting only of digits 0-9, commas ,, and dots .**
* **Your task is to complete the regex_pattern defined below, which will be used to re.split() all of the , and . symbols in s.**
* **It’s guaranteed that every comma and every dot in s is preceded and followed by a digit.**

In [2]:
regex_pattern = r"[.,]+" # Do not delete 'r'.

import re
print("\n".join(re.split(regex_pattern, input())))

100,000,000.000
100
000
000
000
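
The effect of the `+` quantifier in the pattern can be seen with a small sketch. The helper name `split_number` is hypothetical; the second call uses input that the task guarantees will not occur and is only there to show the difference.

```python
import re

def split_number(s):
    """Split s on runs of commas and dots."""
    return re.split(r"[.,]+", s)

print(split_number("100,000,000.000"))  # ['100', '000', '000', '000']

# Without '+', adjacent separators would produce an empty string between them.
print(re.split(r"[.,]", "1,,2"))        # ['1', '', '2']
print(split_number("1,,2"))             # ['1', '2']
```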


## Problem 3 : Group(), Groups() & Groupdict()
* **You are given a string S.**
* **Your task is to find the first occurrence of an alphanumeric character in S (read from left to right) that has consecutive repetitions.**

In [3]:
import re
string = input()
m = re.search(r'([a-zA-Z0-9])\1',string)
if m:
    print(m.group(1))
else:
    print(-1)

..12345678910111213141516171820212223
1
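
The solution above only calls `group(1)`. As a rough illustration of the three methods named in the heading, the sketch below rewrites the same search with a named group; the group name `char` and the helper `first_repeat` are assumptions made for this example.

```python
import re

# (?P<char>...) names the group; (?P=char) is a backreference to it.
REPEAT_RE = re.compile(r"(?P<char>[a-zA-Z0-9])(?P=char)")

def first_repeat(s):
    """Return the first alphanumeric character that is immediately repeated, or '-1'."""
    m = REPEAT_RE.search(s)
    return m.group("char") if m else "-1"

m = REPEAT_RE.search("..12345678910111213141516171820212223")
print(m.group(0))     # the whole match: '11'
print(m.group(1))     # the first group: '1'
print(m.groups())     # all groups as a tuple: ('1',)
print(m.groupdict())  # named groups as a dict: {'char': '1'}
```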


## Problem 4 : Re.findall() & Re.finditer()
* **You are given a string S. It consists of alphanumeric characters, spaces and symbols (+, -).**
* **Your task is to find all the substrings of S that contain 2 or more vowels.**
* **Also, these substrings must lie in between 2 consonants and should contain vowels only.**

In [4]:
import re
string = input()
vowels = 'aeiou'
consonants = 'qwrtypsdfghjklzxcvbnm'
# Lookbehind and lookahead require a consonant on each side without consuming it,
# so two neighbouring vowel runs may share the consonant between them.
match = re.findall(r'(?<=[' + consonants + '])([' + vowels + ']{2,})(?=[' + consonants + '])', string, flags=re.I)

if len(match) > 0:
    print("\n".join(match))
else:
    print(-1)

rabcdeefgyYhFjkIoomnpOeorteeeeet
ee
Ioo
Oeo
eeeee
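
`re.findall()` returns only the matched text. When positions are also needed, `re.finditer()` yields match objects instead. The following is a sketch under that assumption; the helper name `vowel_runs` is hypothetical.

```python
import re

VOWELS = "aeiou"
CONSONANTS = "qwrtypsdfghjklzxcvbnm"

# Vowel runs of length 2+ with a consonant immediately before and after.
PATTERN = re.compile(
    r"(?<=[%s])[%s]{2,}(?=[%s])" % (CONSONANTS, VOWELS, CONSONANTS),
    flags=re.I,
)

def vowel_runs(s):
    """Return (start_index, substring) pairs for each qualifying vowel run."""
    return [(m.start(), m.group()) for m in PATTERN.finditer(s)]

print(vowel_runs("rabcdeefgyYhFjkIoomnpOeorteeeeet"))
```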


## Problem 5 : Re.start() & Re.end()
* **You are given a string S.**
* **Your task is to find the indices of the start and end of string k in S.**

In [5]:
import re

S = input()
k = input()

pattern = re.compile(k)
match = pattern.search(S)
if not match: print('(-1, -1)')
while match:
    print('({0}, {1})'.format(match.start(), match.end() - 1))
    match = pattern.search(S, match.start() + 1)

aaadaa
aa
(0, 1)
(1, 2)
(4, 5)
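
The loop above restarts the search one character after each match so that overlapping occurrences are found. An alternative sketch wraps the pattern in a zero-width lookahead and lets `finditer()` do the stepping. The helper name `find_all_spans` is hypothetical, and `re.escape` is used on the assumption that k should be treated as literal text rather than as a regular expression.

```python
import re

def find_all_spans(s, k):
    """Return inclusive (start, end) index pairs of every occurrence of k in s,
    including overlapping ones, or [(-1, -1)] when there is none."""
    # The lookahead consumes nothing, so the scan advances one position at a time.
    pattern = re.compile(r"(?=(%s))" % re.escape(k))
    spans = [(m.start(1), m.end(1) - 1) for m in pattern.finditer(s)]
    return spans or [(-1, -1)]

print(find_all_spans("aaadaa", "aa"))  # [(0, 1), (1, 2), (4, 5)]
```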


## Problem 6 : Regex Substitution
* **You are given a text of N lines. The text contains && and || symbols.**
* **Your task is to modify those symbols to the following:**

    * **&& → and**
    * **|| → or**
* **Only replace the && and || symbols that have a space " " on both sides.**

In [6]:
import re

def change(match):
    if match.group(1) == '&&':
        return 'and'
    else:
        return 'or'

for i in range(int(input())):
    print(re.sub(r"(?<= )(\|\||&&)(?= )", change,input()))

11
a = 1;
a = 1;
b = input();
b = input();


if a + b > 0 && a - b < 0:
if a + b > 0 and a - b < 0:
    start()
    start()
elif a*b > 10 || a/b < 1:
elif a*b > 10 or a/b < 1:
    stop()
    stop()
print set(list(a)) | set(list(b)) 
print set(list(a)) | set(list(b)) 
#Note do not change &&& or ||| or & or |
#Note do not change &&& or ||| or & or |
#Only change those '&&' which have space on both sides.
#Only change those '&&' which have space on both sides.
#Only change those '|| which have space on both sides.
#Only change those '|| which have space on both sides.
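
The replacement function can also be expressed as a dictionary lookup. This is a sketch only; `REPLACEMENTS`, `PATTERN` and `translate` are hypothetical names. The lookbehind and lookahead check for the surrounding spaces without consuming them, which matters when two symbols share a single space.

```python
import re

REPLACEMENTS = {"&&": "and", "||": "or"}

# Match '&&' or '||' only when a space sits on each side; the spaces are not consumed.
PATTERN = re.compile(r"(?<= )(&&|\|\|)(?= )")

def translate(line):
    """Replace space-delimited && and || with 'and' and 'or'."""
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(1)], line)

print(translate("if a + b > 0 && a - b < 0:"))
print(translate("a && && b"))
print(translate("#Note do not change &&& or ||| or & or |"))
```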


## Problem 7 : Validating Roman Numerals
* **You are given a string, and you have to validate whether it's a valid Roman numeral. If it is valid, print True. Otherwise, print False. Try to create a regular expression for a valid Roman numeral.**

In [9]:
Thousand = 'M{0,3}'
Hundred = '(C[MD]|D?C{0,3})'
Ten = '(X[CL]|L?X{0,3})'
Digit = '(I[VX]|V?I{0,3})'
regex_pattern = r"%s%s%s%s$" % (Thousand, Hundred, Ten, Digit) # Do not delete 'r'.

import re
print(str(bool(re.match(regex_pattern, input()))))

CDXXI
True
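
Every part of the pattern above is optional, so an empty input also matches. A sketch of a helper that rejects the empty string explicitly and uses `re.fullmatch` in place of the trailing `$` follows; the names `ROMAN_RE` and `is_roman` are hypothetical.

```python
import re

ROMAN_RE = re.compile(
    r"M{0,3}"               # thousands: 0 to 3000
    r"(C[MD]|D?C{0,3})"     # hundreds: CM, CD, or optional D plus up to three C
    r"(X[CL]|L?X{0,3})"     # tens: XC, XL, or optional L plus up to three X
    r"(I[VX]|V?I{0,3})"     # units: IX, IV, or optional V plus up to three I
)

def is_roman(text):
    """Return True if text is a non-empty Roman numeral accepted by the pattern."""
    return bool(text) and ROMAN_RE.fullmatch(text) is not None

for sample in ["CDXXI", "MMMCMXCIX", "IIII", "MMMM", ""]:
    print(repr(sample), is_roman(sample))
```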


## Problem 8 : Validating phone numbers
* **Let's dive into the interesting topic of regular expressions! You are given some input, and you are required to check whether they are valid mobile numbers.**
* **A valid mobile number is a ten-digit number starting with 7, 8 or 9.**

In [10]:
import re

N = int(input())

for i in range(N):
    if re.match(r'^[789]\d{9}$',input()):
        print("YES")
    else:
        print("NO")

2
9587456281
YES
1252478965
NO


## Problem 9 : Validating and Parsing Email Addresses
* **A valid email address meets the following criteria:**
    * **It's composed of a username, domain name, and extension assembled in this format: username@domain.extension**
    * **The username starts with an English alphabetical character, and any subsequent characters consist of one or more of the following: alphanumeric characters, hyphens (-), dots (.) and underscores (_).**
    * **The domain and extension contain only English alphabetical characters.**
    * **The extension is 1, 2, or 3 characters in length.**
* **Given n pairs of names and email addresses as input, print each name and email address pair having a valid email address on a new line.**

In [11]:
import email.utils
import re

n = int(input())
for _ in range(n):
    s = input()
    parsed_email = email.utils.parseaddr(s)[1].strip()
    match_result = bool(re.match(r"(^[A-Za-z][A-Za-z0-9\._-]+)@([A-Za-z]+)\.([A-Za-z]{1,3})$", parsed_email))
    
    if match_result:
        print(s)

2
DEXTER <dexter@hotmail.com>
DEXTER <dexter@hotmail.com>
VIRUS <virus!@variable.:p>
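
`email.utils.parseaddr()` splits a line into a `(name, address)` pair, and `email.utils.formataddr()` joins such a pair back into `name <address>` form. The sketch below combines them with the same address pattern; `EMAIL_RE` and `valid_pair` are hypothetical names.

```python
import email.utils
import re

# Same rules as the pattern above; fullmatch() replaces the ^ and $ anchors.
EMAIL_RE = re.compile(r"[A-Za-z][A-Za-z0-9._-]+@[A-Za-z]+\.[A-Za-z]{1,3}")

def valid_pair(line):
    """Return the formatted 'name <address>' string if the address is valid, else None."""
    name, address = email.utils.parseaddr(line)
    if EMAIL_RE.fullmatch(address):
        return email.utils.formataddr((name, address))
    return None

print(valid_pair("DEXTER <dexter@hotmail.com>"))
print(valid_pair("VIRUS <virus!@variable.:p>"))
```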


## Problem 10 : Hex Color Code
* **CSS colors are defined using a hexadecimal (HEX) notation for the combination of Red, Green, and Blue color values (RGB).**
* **Specifications of HEX Color Code**
    * **It must start with a '#' symbol.**
    * **It can have 3 or 6 digits.**
    * **Each digit is in the range of 0 to F. (1,2,3,4,5,6,7,8,9,0,A,B,C,D,E and F).**
    * **A - F letters can be lower case.**

In [20]:
import re

n = int(input())
for _ in range(n):
    s = input()
    match_result = re.findall(r"(#[0-9A-Fa-f]{3}|#[0-9A-Fa-f]{6})(?:[;,.)]{1})", s)
    for i in match_result:
        if i != "":
            print(i)

11
#BED
    color: #FfFdF8; background-color:#aef;
#FfFdF8
#aef
    font-size: 123px;
    background: -webkit-linear-gradient(top, #f9f9f9, #fff);
#f9f9f9
#fff
}
#Cab
{
    background-color: #ABC;
#ABC
    border: 2px dashed #fff;
#fff
}
}


## Problem 11 : HTML Parser - Part 1
* **You are given an HTML code snippet of N lines.**
* **Your task is to print start tags, end tags and empty tags separately.**

In [14]:
from html.parser import HTMLParser

class MyParser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print("Start :", tag)
        for n, m in attrs:
            print("->", n, ">", m)

    def handle_startendtag(self, tag, attrs):
        print("Empty :", tag)
        for n, m in attrs:
            print("->", n, ">", m)

    def handle_endtag(self, tag):
        print("End   :", tag)

parser = MyParser()
for i in range(int(input())):
    parser.feed(input())

2
<html><head><title>HTML Parser - I</title></head>
Start : html
Start : head
Start : title
End   : title
End   : head
<body data-modal-target class='1'><h1>HackerRank</h1><br /></body></html>
Start : body
-> data-modal-target > None
-> class > 1
Start : h1
End   : h1
Empty : br
End   : body
End   : html
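
Printing from inside the handlers makes the parser hard to test. One possible variation, sketched here with the hypothetical names `TagCollector` and `collect`, records each event in a list and leaves the formatting to the caller.

```python
from html.parser import HTMLParser

class TagCollector(HTMLParser):
    """Record (kind, tag, attrs) tuples instead of printing them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("Start", tag, attrs))

    def handle_startendtag(self, tag, attrs):
        # Called for self-closing tags such as <br />.
        self.events.append(("Empty", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("End", tag, []))

def collect(html):
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser.events

for kind, tag, attrs in collect("<body data-modal-target class='1'><br /></body>"):
    print(kind, tag, attrs)
```

An attribute without a value, such as `data-modal-target`, is reported with `None` as its value.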


## Problem 12 : HTML Parser - Part 2
* **You are given an HTML code snippet of N lines.**
* **Your task is to print the single-line comments, multi-line comments and the data.**

In [15]:
from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def handle_comment(self, data):
        if "\n" in data:
            print(">>> Multi-line Comment")
            print(data)
        else:
            print(">>> Single-line Comment")
            print(data)
    def handle_data(self, data):
        if data != "\n":
            print(">>> Data")
            print(data)

html = ""       
for i in range(int(input())):
    html += input().rstrip()
    html += '\n'
    
parser = MyHTMLParser()
parser.feed(html)
parser.close()

4
<!--[if IE 9]>IE9-specific content
<![endif]-->
<div> Welcome to HackerRank</div>
<!--[if IE 9]>IE9-specific content<![endif]-->
>>> Multi-line Comment
[if IE 9]>IE9-specific content
<![endif]
>>> Data
 Welcome to HackerRank
>>> Single-line Comment
[if IE 9]>IE9-specific content<![endif]


## Problem 13 : Detect HTML Tags, Attributes and Attribute Values
* **You are given an HTML code snippet of N lines.**
* **Your task is to detect and print all the HTML tags, attributes and attribute values.**

In [16]:
from html.parser import HTMLParser


class html_parser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        print(tag)
        for name, value in attrs:
            print('-> {} > {}'.format(name, value))
        
html = '\n'.join([input() for _ in range(int(input()))])
parser = html_parser()
parser.feed(html)
parser.close()

9
<head>
<title>HTML</title>
</head>
<object type="application/x-flash" 
  data="your-file.swf" 
  width="0" height="0">
  <!-- <param name="movie" value="your-file.swf" /> -->
  <param name="quality" value="high"/>
</object>
head
title
object
-> type > application/x-flash
-> data > your-file.swf
-> width > 0
-> height > 0
param
-> name > quality
-> value > high


## Problem 14 : Validating UID
* **ABCXYZ company has up to  employees.
The company decides to create a unique identification number (UID) for each of its employees.
The company has assigned you the task of validating all the randomly generated UIDs.**

In [17]:
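import re
# Rules assumed from the original HackerRank task (not restated above):
# exactly 10 alphanumeric characters, at least 2 uppercase letters,
# at least 3 digits, and no repeated character.
for _ in range(int(input())):
    s = input()
    checks = [
        re.fullmatch(r"[A-Za-z0-9]{10}", s),  # exactly 10 alphanumeric characters
        re.search(r"([A-Z].*){2,}", s),       # at least 2 uppercase letters (case-sensitive)
        re.search(r"([0-9].*){3,}", s),       # at least 3 digits
        not re.search(r"(.).*\1", s),         # no character appears twice
    ]
    print("Valid" if all(checks) else "Invalid")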

2
B1CD102354
Invalid
B1CDEF2354
Valid
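
The repeated-character check relies on a backreference: `(.)` captures one character and `\1` requires the same character to appear again later. A set-based comparison gives the same answer without a regular expression. Both helpers below are hypothetical and shown only for comparison.

```python
import re

def has_repeat_regex(s):
    """True if any character occurs more than once (backreference version)."""
    return re.search(r"(.).*\1", s) is not None

def has_repeat_set(s):
    """True if any character occurs more than once (set version)."""
    return len(set(s)) != len(s)

for uid in ["B1CD102354", "B1CDEF2354"]:
    print(uid, has_repeat_regex(uid), has_repeat_set(uid))
```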


## Problem 15 : Validating Credit Card Numbers
* **You and Fredrick are good friends. Yesterday, Fredrick received N credit cards from ABCD Bank. He wants to verify whether his credit card numbers are valid or not. You happen to be great at regex so he is asking for your help!**

In [19]:
import re
n = int(input())
for _ in range(n):
    credit = input().strip()
    digits_only = credit.replace("-", "")
    # 16 digits starting with 4, 5 or 6, either plain or in hyphen-separated groups of four
    length_16 = re.match(r"^[4-6]\d{15}$", credit)
    length_19 = re.match(r"^[4-6]\d{3}-\d{4}-\d{4}-\d{4}$", credit)
    # Four identical consecutive digits, checked with the hyphens removed
    consecutive = re.search(r"(\d)\1\1\1", digits_only)
    if (length_16 or length_19) and not consecutive:
        print("Valid")
    else:
        print("Invalid")

6
4123456789123456
Valid
5123-4567-8912-3456
Valid
61234-567-8912-3456
Invalid
4123356789123456
Valid
5133-3367-8912-3456
Invalid
5123 - 3567 - 8912 - 3456
Invalid
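
The two format patterns above can be merged by capturing the optional separator and reusing it with a backreference, so a number is either fully hyphenated or not hyphenated at all. This is a sketch under that idea; `CARD_RE` and `is_valid_card` are hypothetical names.

```python
import re

# (-?) captures either '-' or the empty string; \1 demands the same separator each time.
CARD_RE = re.compile(r"[4-6]\d{3}(-?)\d{4}\1\d{4}\1\d{4}")

def is_valid_card(card):
    """Check the format, then reject four or more identical consecutive digits."""
    if not CARD_RE.fullmatch(card):
        return False
    digits = card.replace("-", "")
    return re.search(r"(\d)\1{3}", digits) is None

for card in ["4123456789123456", "5123-4567-8912-3456", "5133-3367-8912-3456"]:
    print(card, is_valid_card(card))
```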


# THE END!!