## Question 1: Regular Expressions
Write a regular expression pattern matching a _valid URL_. For the purposes of this exercise, a valid URL is any string of the form `protocol://domain/optional_file_path/optional_file_name`, where

   * `protocol` is one of `file`, `http`, `https`, or `ftp`.
   * `domain` is a sequence of labels separated by a single `.` (dot) character, where each label is a non-empty combination of alphanumeric characters (letters and digits) in either lower or upper case, and the rightmost label, which represents the top-level domain, does not consist solely of digits.
   * `optional_file_path` is a (potentially empty) sequence of labels separated by a `/` (forward slash) character, where each label is a combination of alphanumeric characters in either lower or upper case and hyphens (`-`).
   * `optional_file_name` is a sequence of at most two labels separated by a `.` (dot) character, where each label is a combination of alphanumeric characters in either lower or upper case and hyphens (`-`).

For example, all of the following strings are valid URLs: `https://my.Domain.com/some/file.html`, `ftp://com/my-file.json`, `http://123.456.12a/`, `http://bigdata`, and `http://cs5234.rhul.ac.uk/sub-dir/`, whereas `http://234.345`, `http://rhul.ac.uk/my.long.filename.html`, `http://.bigdata/`, and `http://big..data` are not.

In [1]:
import re

# Put your pattern inside ''
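# Raw string, so backslashes reach the regex engine unchanged.
# Domain: dot-terminated labels, then a top-level label containing at least one letter.
# After the domain: optional '/', slash-terminated path labels, optional file name of one or two labels.
url_regex = r'(file|https?|ftp)://([a-zA-Z0-9]+\.)*[0-9]*[a-zA-Z][a-zA-Z0-9]*(/([a-zA-Z0-9-]+/)*([a-zA-Z0-9-]+(\.[a-zA-Z0-9-]+)?)?)?'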
valid_tests = ['https://my.Domain.com/some/file.html', 'ftp://com/my-file.json', 'http://123.456.12a/',
               'http://bigdata', 'http://cs5234.rhul.ac.uk/sub-dir']
invalid_tests = ['http://234.345', 'http://rhul.ac.uk/my.long.filename.html', 'http://.bigdata/', 'http://big..data']

for s in valid_tests:
    print(re.compile(url_regex).fullmatch(s))

for s in invalid_tests:
    print(re.compile(url_regex).fullmatch(s))

<re.Match object; span=(0, 36), match='https://my.Domain.com/some/file.html'>
<re.Match object; span=(0, 22), match='ftp://com/my-file.json'>
<re.Match object; span=(0, 19), match='http://123.456.12a/'>
<re.Match object; span=(0, 14), match='http://bigdata'>
<re.Match object; span=(0, 32), match='http://cs5234.rhul.ac.uk/sub-dir'>
None
None
None
None




Your solution is correct if the value returned by `re.compile(url_regex).fullmatch(s)` is not
`None` for every string `s` that is a valid URL according to
the definition above, and `None` otherwise. *N.B. The tests in `valid_tests` and `invalid_tests` are given for the sake of example only; passing them is a necessary but not sufficient condition to receive full marks!* We will perform more exhaustive tests based on the definition above that your solution must also be able to handle.
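
The definition above can be read component by component. The following is a minimal sketch of one such reading, written with `re.VERBOSE` so that each part carries a comment. The names `URL_PATTERN` and `is_valid_url` are illustrative and not part of the exercise, and the handling of the part after the domain (a slash, then slash-terminated path labels, then an optional file name) is an assumption about the intended grammar.

```python
import re

# Hypothetical names for illustration; this encodes one reading of the definition.
URL_PATTERN = re.compile(r"""
    (?:file|https?|ftp)://              # protocol
    (?:[a-zA-Z0-9]+\.)*                 # leading domain labels, each followed by a dot
    [0-9]*[a-zA-Z][a-zA-Z0-9]*          # top-level label: must contain a letter
    (?:                                 # optional part after the domain
        /                               # slash that ends the domain
        (?:[a-zA-Z0-9-]+/)*             # path labels, each followed by a slash
        (?:                             # optional file name
            [a-zA-Z0-9-]+               #   first label
            (?:\.[a-zA-Z0-9-]+)?        #   optional second label
        )?
    )?
""", re.VERBOSE)


def is_valid_url(s):
    """Return True if the whole string s matches URL_PATTERN."""
    return URL_PATTERN.fullmatch(s) is not None


for url in ['http://bigdata', 'http://123.456.12a/', 'http://234.345', 'http://123']:
    print(url, is_valid_url(url))
```

Writing the top-level label as `[0-9]*[a-zA-Z][a-zA-Z0-9]*` rejects all-digit labels without a lookahead, and it also covers the single-label case such as `http://123`.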

## Question 2: Regular Expressions
Write a regular expression pattern matching any string consisting of non-empty _fields_ separated by _commas_. A field may include any printable character except whitespace and commas.
A valid string must start and end with a field.
For example, the strings `'ab1c,de_f,xyz'`, `'ab1c,de_%^f,xyz'`, and `'abc'`
are valid, whereas the strings `'ab1c,, de_f'` and
`'ab1c,de_f, xyz,'` are not.

In [24]:
# Put your pattern inside ''
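# One field, then zero or more groups of a mandatory comma followed by a field.
# Making the comma mandatory keeps the match unambiguous and avoids heavy backtracking.
csv_regex = r'[^\s,]+(,[^\s,]+)*'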
valid_tests = ['ab1c,de_f,xyz', 'ab1c,de_%^f,xyz', 'abc']
invalid_tests = ['ab1c,, de_f', 'ab1c,de_f, xyz,']

for s in valid_tests:
    print(re.compile(csv_regex).fullmatch(s))

for s in invalid_tests:
    print(re.compile(csv_regex).fullmatch(s))

<re.Match object; span=(0, 13), match='ab1c,de_f,xyz'>
<re.Match object; span=(0, 15), match='ab1c,de_%^f,xyz'>
<re.Match object; span=(0, 3), match='abc'>
None
None


Your solution is correct if the value returned by `re.compile(csv_regex).fullmatch(s)` is not
`None` for every string `s` that is valid according to
the definition above, and `None` otherwise. *N.B. The tests in `valid_tests` and `invalid_tests` are given for the sake of example only; passing them is a necessary but not sufficient condition to receive full marks!* We will perform more exhaustive tests based on the definition above that your solution must also be able to handle.
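
The structure described above (one field, then any number of comma-plus-field groups) can be sketched by building the pattern from a single field fragment. The names `FIELD`, `CSV_PATTERN`, and `is_valid_csv` are illustrative only. Using `[^\s,]` for a field character is an approximation: it also admits non-printable characters that are not whitespace.

```python
import re

# Hypothetical names for illustration.
FIELD = r'[^\s,]+'                                   # one non-empty field
CSV_PATTERN = re.compile(rf'{FIELD}(?:,{FIELD})*')   # field, then ",field" repeated


def is_valid_csv(s):
    """Return True if the whole string s is a comma-separated list of fields."""
    return CSV_PATTERN.fullmatch(s) is not None


for s in ['ab1c,de_f,xyz', 'abc', 'ab1c,, de_f', 'ab1c,de_f, xyz,', '']:
    print(repr(s), is_valid_csv(s))
```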

## Question 3: Generator Functions
Write a generator function `gen_running_count_from_csv_string(s)` that takes a string `s` matching the regular expression pattern described by `csv_regex` as argument and yields, for each field extracted from `s` in order, the number of fields seen so far that contain a digit (a running count). For example, `gen_running_count_from_csv_string('ab1c,de_f,xy4z5b6')` yields the sequence
`1`, `1`, `2`.

In [25]:
import re


def gen_running_count_from_csv_string(s):
    '''
    s: a string matching the pattern stored in csv_regex
    Yields a running count of fields containing a digit
    '''
    running_count = 0  # local state, so every call starts counting from zero
    for field in s.split(','):
        if re.search(r'\d', field):
            running_count += 1
        yield running_count


# tester

# test = list(gen_running_count_from_csv_string('ab1c,de_f,xy4z5b6'))
# if test == [1, 1, 2]:
#     print("test passed")
# else:
#     print("failed", test)
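
A running count is a prefix sum over 0/1 flags, so the same sequence can also be sketched with `itertools.accumulate`. The name `running_digit_count` is hypothetical, and this is an alternative formulation for comparison rather than the generator function the question asks for.

```python
import re
from itertools import accumulate


def running_digit_count(s):
    """Return an iterator over the running count of fields that contain a digit."""
    # 1 for a field with at least one digit, 0 otherwise
    flags = (1 if re.search(r'\d', field) else 0 for field in s.split(','))
    return accumulate(flags)


print(list(running_digit_count('ab1c,de_f,xy4z5b6')))  # [1, 1, 2]
```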

## Question 4: Lambda Expressions
Write the following lambda expressions:
1. `valid_url`: takes a string `s` as argument and returns `True` if `s` 
matches `url_regex`, and `False`, otherwise
2. `concat_csv_strings`: takes two strings `s1` and `s2` as arguments and 
returns a single string consisting of `s1` and `s2` separated by comma. For example, if
the strings
`'ab1c,de_f,xyz'` and `'ab1c,de_%^f,xyz'` are given as arguments, the output must be the string
`'ab1c,de_f,xyz,ab1c,de_%^f,xyz'`
3. `val_by_vec`: takes an object `x` and a sequence of objects `seq`, and returns a sequence
(i.e., an iterator) of tuples `(x, seq[0]), (x, seq[1]), ...`.<br>
_Hint_: Use a generator expression.
4. `not_self_loop`: takes a 2-tuple `(a, b)` and returns `True` if `a != b` and `False`, otherwise.

In [26]:
# Replace the right-hand side of each lambda with your code

# function1

valid_url = lambda s: re.compile(url_regex).fullmatch(s) is not None

# function 1 tester
# valid_tests = ['https://my.Domain.com/some/file.html', 'ftp://com/my-file.json', 'http://123.456.12a/',
#                'http://bigdata', 'http://cs5234.rhul.ac.uk/sub-dir']
# invalid_tests = ['http://234.345', 'http://rhul.ac.uk/my.long.filename.html', 'http://.bigdata/', 'http://big..data']
# passed = True
# for i in valid_tests:
#   if valid_url(i) != True:
#     passed = False
# for i in invalid_tests:
#   if valid_url(i) == True:
#     print("invalid",i)
#     passed = False
# if passed == True:
#     print("1 passed")
# else:
#     print("1 failed")

# function 2

concat_csv_strings = lambda s1, s2: f"{s1},{s2}"

# function 2 tester 
# 
# s1 = 'ab1c,de_f,xyz'
# s2 = 'ab1c,de_%^f,xyz'
# con = concat_csv_strings(s1, s2)
# if con == 'ab1c,de_f,xyz,ab1c,de_%^f,xyz':
#     print("2 passed")
# else:
#     print("2 failed", con)

# function 3 

# The second parameter must be named seq; otherwise the lambda silently reads a global seq
val_by_vec = lambda x, seq: ((x, t) for t in seq)

# function 3 tester 
# 
# x = "x"
# seq = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
# test = []
# for t in val_by_vec(x, seq):
#     test.append(t)
# if test == [('x', 0), ('x', 1), ('x', 2), ('x', 3), ('x', 4), ('x', 5), ('x', 6), ('x', 7), ('x', 8), ('x', 9)]:
#     print("3 passed")
# else:
#     print("3 failed", test)

# function 4

not_self_loop = lambda t: t[0] != t[1]

# function 4 tester 

# t = ("a", "b")
# if not_self_loop(t) == True:
#     print("4 passed")
# else:
#     print("4 failed")
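
As a sketch of how `val_by_vec` and `not_self_loop` might be combined, the example below pairs a node with a list of candidate neighbours and drops the pair that links the node to itself. The data and the name `edges` are made up for illustration, and the two lambdas are redefined here only so that the block runs on its own.

```python
# Redefined here so the example is self-contained
val_by_vec = lambda x, seq: ((x, t) for t in seq)
not_self_loop = lambda t: t[0] != t[1]

# Hypothetical data: node 'a' and its candidate neighbours
edges = list(filter(not_self_loop, val_by_vec('a', ['a', 'b', 'c'])))
print(edges)  # [('a', 'b'), ('a', 'c')]
```

Because `val_by_vec` returns a generator, the pairs are produced lazily and can be consumed only once; wrap the result in `list(...)` if it is needed more than once.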