# Exercise 3 INF201

### Task 0: warm-up

Assume a file with the following content:

    The following document contains several email addresses. They all have the form example.name@institution.domain, example-name@institution.domain, or example_name@institution.domain.

    Two examples with the nmbu domain are jonas.kusch@nmbu.no or lena.scholzer@nmbu.no.
    Two examples with a generic domain are jonas.kusch@gmail.domain or lena-scholzer@nmbu.domain.

Write a function that, given a file name, goes through the file (skipping its first line) and finds all email addresses. For each address, the program should identify the name, the institution, and the domain.

In [3]:
import re

def find_emails(filename):
    # Three groups: name (before @), institution, and domain (after the last dot)
    email_regex = r'([a-zA-Z0-9._-]+)@([a-zA-Z0-9.-]+)\.([a-zA-Z]{2,})'

    with open(filename) as data:
        data.readline()  # skip the first line, which only describes the format
        content = data.read()

    # findall returns one (name, institution, domain) tuple per address
    emails = re.findall(email_regex, content)

    return [
        {"name": name, "institution": institution, "domain": domain}
        for name, institution, domain in emails
    ]

emails_found = find_emails("data/email.txt")

for email in emails_found:
    print(f"Name: {email['name']}, Institution: {email['institution']}, Domain: {email['domain']}")


Name: jonas.kusch, Institution: nmbu, Domain: no
Name: lena.scholzer, Institution: nmbu, Domain: no
Name: jonas.kusch, Institution: gmail, Domain: domain
Name: lena-scholzer, Institution: nmbu, Domain: domain
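
To see how the three groups divide an address, the following sketch applies the same pattern to two short strings. The second address, with an extra subdomain, is a made-up example that does not appear in the exercise file, and the constant name `EMAIL_PATTERN` is only illustrative.

```python
import re

# Same pattern as in find_emails; the name EMAIL_PATTERN is illustrative.
EMAIL_PATTERN = re.compile(r'([a-zA-Z0-9._-]+)@([a-zA-Z0-9.-]+)\.([a-zA-Z]{2,})')

# A sentence-final period is not swallowed: the domain group only accepts letters.
with_period = EMAIL_PATTERN.findall("Write to lena.scholzer@nmbu.no.")

# Hypothetical address with a subdomain: the greedy institution group keeps
# everything up to the last dot, so only the final label becomes the domain.
with_subdomain = EMAIL_PATTERN.findall("lena@mail.nmbu.no")

print(with_period)     # [('lena.scholzer', 'nmbu', 'no')]
print(with_subdomain)  # [('lena', 'mail.nmbu', 'no')]
```

Whether "mail.nmbu" should count as the institution depends on how the task is read; the exercise file only contains addresses with a single dot after the @.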


### Task 1: Regex

Assume that we have sentences of the form

* Ali and Per are friends.
* Kari and Joe know each other.
* James has known Peter since school days.

The common structure here is that each sentence contains two names and that the names are the only words beginning with capital letters. Create a regular expression that

* matches these sentences (one sentence at a time)
* collects the names in groups

and write a program that goes through a list of such sentences and prints a table with the following form and formatting:

          Friendships
    -----------------------
           Ali - Per
          Kari - Joe
         James - Peter

Assume that
* each sentence begins with a name,
* all names contain only letters from A–Z and a–z,
* all names have at least two letters.

Make sure the program is efficient by ensuring that the regex is not re-parsed and compiled each time it is used. Your program should work for all sentences with two names that fulfill the previously mentioned rules.
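
The layout of the table above (a centered title, and names lined up around the dash) can be produced with the alignment options of Python's format specification. The sketch below is one possible approach, assuming a total width of 23 characters as in the example table; the helper name `format_row` is hypothetical.

```python
# Hypothetical helper, not prescribed by the exercise: formats one table row.
def format_row(left, right, width=10):
    # '>' right-aligns and '<' left-aligns the value within `width` characters,
    # so the " - " separator always ends up in the same columns.
    return f"{left:>{width}} - {right:<{width}}"

# '^' centers the title; 23 = 10 + len(" - ") + 10
title = f"{'Friendships':^23}"
row = format_row("Ali", "Per")

print(title)
print("-" * 23)
print(row)
```

With these widths, every row is exactly as wide as the dashed line, regardless of the length of the names (up to ten letters each).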

In [None]:
import re

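# Compiled once at import time, so the pattern is not re-parsed per sentence.
# [A-Za-z]+ also allows names with inner capitals; the greedy .* makes the
# second group capture the last capitalized word in the sentence.
regex = re.compile(r'\b([A-Z][A-Za-z]+)\b.*\b([A-Z][A-Za-z]+)\b')

def print_friendships(sentences):
    # 23 columns: 10 (right-aligned name) + 3 (" - ") + 10 (left-aligned name)
    print(f'{"Friendships":^23}')
    print('-' * 23)

    for sentence in sentences:
        # match() anchors at the start, where the first name is assumed to be
        match = regex.match(sentence)
        if match:
            name1, name2 = match.groups()
            print(f'{name1:>10} - {name2:<10}')

sentences = [
    "Ali and Per are friends.",
    "Kari and Joe know each other.",
    "James has known Peter since school days."
]

print_friendships(sentences)


      Friendships      
-----------------------
       Ali - Per       
      Kari - Joe       
     James - Peter     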


### Task 2: Password validation

Write a Python function validate_password that checks if a given password string is valid based on the following rules:

* Starts with an uppercase letter from I to Z.
* Contains at least one word character (a-z, A-Z, 0-9, or underscore).
* Is 4 to 5 characters long.
* Ends with a digit.
* May contain spaces between the characters but cannot start or end with a space.
* The password must end at the string's end.

Example passwords:

* Valid: J1234, I_ab5, Z9_w4
* Invalid: A1234 (starts with wrong letter), J12345 (too many characters), I__ (does not end with a digit)
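
One subtlety of the last rule: in Python, `$` also matches just before a trailing newline, whereas `\Z` (or `fullmatch`) only matches at the very end of the string. The sketch below illustrates the difference on a candidate with a trailing newline; the pattern is one possible encoding of the rules above, and the variable names are illustrative.

```python
import re

# Two variants of one possible pattern for the rules above; they differ
# only in how the end of the string is anchored.
pattern_dollar = re.compile(r'^[I-Z][\w ]{2,3}\d$')
pattern_end = re.compile(r'^[I-Z][\w ]{2,3}\d\Z')

candidate = "J1234\n"  # valid password followed by a newline

accepted_by_dollar = bool(pattern_dollar.match(candidate))
accepted_by_end = bool(pattern_end.match(candidate))

print(accepted_by_dollar)  # True: $ matches before the final newline
print(accepted_by_end)     # False: \Z requires the true end of the string
```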

In [None]:
import re

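# Compiled once; re.ASCII limits \w and \d to a-z, A-Z, 0-9 and underscore
PASSWORD_PATTERN = re.compile(r'[I-Z][\w ]{2,3}\d', re.ASCII)

def validate_password(password):
    # fullmatch anchors both ends of the string, so a trailing newline is
    # rejected as well. The leading [I-Z] and trailing \d already exclude a
    # space at either end and guarantee at least one word character.
    return PASSWORD_PATTERN.fullmatch(password) is not None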

passwords = ["J1234", "I_ab5", "Z9_w4", "A1234", "J12345", "I__", "  I12", "I12  "]
for pwd in passwords:
    print(f"Password: {pwd}, Valid: {validate_password(pwd)}")

Password: J1234, Valid: True
Password: I_ab5, Valid: True
Password: Z9_w4, Valid: True
Password: A1234, Valid: False
Password: J12345, Valid: False
Password: I__, Valid: False
Password:   I12, Valid: False
Password: I12  , Valid: False


### Task 3: challenge exercise

Write a program that goes through all the Python files in the current working directory and prints all imports per file. Here, an import means the word that follows the keyword «import». Your program should be able to find patterns beginning with «import» and «from».

Assume you have a file «dummy.py» with the following lines:

    import re # this imports re

    from pathlib import Path # this imports Path
    
Then, the output of your program should look like:

    path/to/file/dummy.py: ['re']

    path/to/file/dummy.py: ['Path']

You can assume that module and package names contain only word characters and that each imported name is followed by whitespace. Proceed as follows:

- Use the compile function to construct two regular expressions that find and return the package name in
  1. lines starting with «import»
  2. lines starting with «from»
- Use the pathlib module to find all files in the current working directory which end with «.py».
- Iterate through all the lines of the found files and extract the import names using the compiled regular expressions.
- If the file contains any imports, print the path of the file along with the packages that are imported.
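
For the file-discovery step, pathlib offers both `glob` and `rglob`. The sketch below, run in a temporary directory with made-up file names, shows that `glob('*.py')` only looks at the given directory, while `rglob('*.py')` also descends into subdirectories. Which of the two fits the task depends on whether subdirectories should be included.

```python
import tempfile
from pathlib import Path

# Build a small throwaway directory tree; all file names are made up.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "dummy.py").write_text("import re\n")
    (folder / "notes.txt").write_text("not a Python file\n")
    (folder / "sub").mkdir()
    (folder / "sub" / "nested.py").write_text("import os\n")

    # glob: only files directly inside `folder`
    top_level = sorted(p.name for p in folder.glob("*.py"))
    # rglob: files in `folder` and all of its subdirectories
    recursive = sorted(p.name for p in folder.rglob("*.py"))

print(top_level)  # ['dummy.py']
print(recursive)  # ['dummy.py', 'nested.py']
```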

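A few regex constructs that may be useful:
* ? zero or one occurrence of the preceding element
* + one or more occurrences of the preceding element
* {3,} three or more occurrences of the preceding element
* \A beginning of the string
* \s whitespace characters
* \w word characters
* \b word boundary (start or end of a word)
* () grouping; when a pattern contains a single group, findall returns only the text matched inside the parentheses.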
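
As a small illustration of how these constructs combine, the sketch below applies two compiled patterns to the lines of the «dummy.py» example. It is one possible formulation, not the only one, and the variable names are illustrative. Because each pattern contains exactly one group, `findall` returns a list with just the captured name, which is the list form shown in the expected output.

```python
import re

# \A anchors at the start of the string, \s+ skips whitespace,
# (\w+) captures the imported name, and \b ends it at a word boundary.
import_pattern = re.compile(r'\Aimport\s+(\w+)\b')
from_pattern = re.compile(r'\Afrom\s+\w+\s+import\s+(\w+)\b')

import_line = "import re # this imports re"
from_line = "from pathlib import Path # this imports Path"

found_import = import_pattern.findall(import_line)
found_from = from_pattern.findall(from_line)
# A «from» line does not start with «import», so the first pattern finds nothing.
not_found = import_pattern.findall(from_line)

print(found_import)  # ['re']
print(found_from)    # ['Path']
print(not_found)     # []
```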

In [None]:
import re
from pathlib import Path

def check_python_files(directory):
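    # Both patterns capture the imported name in group 1
    pattern_import = re.compile(r'^\s*import\s+(\w+)')
    pattern_from = re.compile(r'^\s*from\s+\w+\s+import\s+(\w+)\b')

    path_to_dir = Path(directory)
    # glob (unlike rglob) does not descend into subdirectories
    pathlist = path_to_dir.glob('*.py')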

    for path in pathlist:
        imports = []
        with open(path, 'r') as file:
            for line in file:
                import_match = pattern_import.match(line)
                if import_match:
                    imports.append(import_match.group(1))

                from_match = pattern_from.match(line)
                if from_match:
                    imports.append(from_match.group(1))

        if imports:
            full_path = str(path.resolve())
            for module in imports:
                print(f"{full_path}: {module}")

check_python_files('data/')


/Users/fredericstrand/Documents/INF201/data/dummy.py: re
/Users/fredericstrand/Documents/INF201/data/dummy.py: Path
