## Advanced Python HackerRank problems

1. Write a function which reads in a csv file, and then returns a dictionary where keys are (standardized) degrees and values are their frequencies in the file. 
    
    - Your function will be tested using the csv file located at: https://github.com/thisismetis/dsp/blob/master/lessons/data/faculty.csv
    - Use regular expressions so that your code considers "PhD" and "Ph.D." to be the same string. 
    - Do not use pandas. 

In [3]:
import csv
import re

In [126]:
def count_degrees(csv_file_name):
    # Read the degree column (second column), skipping the header row
    with open(csv_file_name, newline='') as csv_file:
        degrees = [row[1] for row in csv.reader(csv_file)][1:]

    # Standardize: drop periods, convert to upper case, split entries on whitespace
    degrees = [re.sub(r"\.", "", item).upper().split() for item in degrees]

    # Flatten the list of lists into a single list of degrees
    degrees = [item for sublist in degrees for item in sublist]

    # Count occurrences of each degree
    freq = {}
    for degree in degrees:
        freq[degree] = freq.get(degree, 0) + 1

    return freq

In [127]:
# read in csv file
file_path = "/Users/AuerPower/Metis/git/dsp/lessons/data/faculty.csv"
count_degrees(file_path)

{'SCD': 6,
 'PHD': 31,
 'MD': 1,
 'MPH': 2,
 'BSED': 1,
 'MS': 2,
 'JD': 1,
 'MA': 1,
 '0': 1}
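
The counting step can also be delegated to `collections.Counter`. The following is a minimal sketch of the same standardization (strip periods, upper-case, split on whitespace) applied to a few made-up degree strings rather than the real file. The helper name `standardize_degrees` and the sample values are assumptions for illustration only.

```python
import re
from collections import Counter

def standardize_degrees(raw_degrees):
    """Count degrees after removing periods and upper-casing each token."""
    tokens = []
    for entry in raw_degrees:
        # "Ph.D." becomes "PHD"; an entry such as "MD MPH" yields two tokens
        tokens.extend(re.sub(r"\.", "", entry).upper().split())
    return Counter(tokens)

# Hypothetical sample values, not read from faculty.csv
sample = [" Ph.D.", "PhD", " Sc.D.", "MD MPH", " Ph.D"]
counts = standardize_degrees(sample)
print(dict(counts))
```

A `Counter` is a `dict` subclass, so it can be returned directly wherever a dictionary of frequencies is expected.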

2. Write a function which reads in a csv file, and then returns a dictionary where keys are (standardized) titles and values are their frequencies in the file. 

    - Your function will be tested using the csv file located at: https://github.com/thisismetis/dsp/blob/master/lessons/data/faculty.csv
    - Note that there is an unintentional typo in the dataset. Your code needs to account for that. 
    - Do not use pandas.

In [149]:
def count_titles(csv_file_name):
    # Read the title column (third column), skipping the header row
    with open(csv_file_name, newline='') as csv_file:
        titles = [row[2] for row in csv.reader(csv_file)][1:]

    # Fix the typo: a stray " is " where " of " belongs
    titles = [re.sub(" is ", " of ", item) for item in titles]

    # Count rows with each title
    freq = {}
    for title in titles:
        freq[title] = freq.get(title, 0) + 1

    return freq

In [150]:
file_path = "/Users/AuerPower/Metis/git/dsp/lessons/data/faculty.csv"
count_titles(file_path)

{'Associate Professor of Biostatistics': 12,
 'Professor of Biostatistics': 13,
 'Assistant Professor of Biostatistics': 12}
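
The typo handling can be checked in isolation. Below is a small sketch that replaces a standalone word `is` with `of` and counts the cleaned titles. It uses a word-boundary pattern instead of the space-delimited one in the solution, and the function name `count_clean_titles` and the sample rows are hypothetical.

```python
import re
from collections import Counter

def count_clean_titles(titles):
    """Replace a standalone 'is' with 'of', then count the resulting titles."""
    # \b keeps the "is" inside words such as "Assistant" untouched
    cleaned = [re.sub(r"\bis\b", "of", title.strip()) for title in titles]
    return Counter(cleaned)

# Hypothetical sample rows, including one with the typo
sample_titles = [
    "Professor of Biostatistics",
    "Assistant Professor is Biostatistics",
    "Assistant Professor of Biostatistics",
]
title_counts = count_clean_titles(sample_titles)
print(dict(title_counts))
```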

3. Write a function which reads in a csv file, and then returns a list of emails in the file. 

    - Your function will be tested using the csv file located at: https://github.com/thisismetis/dsp/blob/master/lessons/data/faculty.csv
    - Do not use pandas

In [214]:
def emails(csv_file_name):
    # Read the email column (fourth column), skipping the header row
    with open(csv_file_name, newline='') as csv_file:
        return [row[3] for row in csv.reader(csv_file)][1:]

In [188]:
csv_file_name = "/Users/AuerPower/Metis/git/dsp/lessons/data/faculty.csv"
emails(csv_file_name)

['bellamys@mail.med.upenn.edu',
 'warren@upenn.edu',
 'bryanma@upenn.edu',
 'jinboche@upenn.edu',
 'sellenbe@upenn.edu',
 'jellenbe@mail.med.upenn.edu',
 'ruifeng@upenn.edu',
 'bcfrench@mail.med.upenn.edu',
 'pgimotty@upenn.edu',
 'wguo@mail.med.upenn.edu',
 'hsu9@mail.med.upenn.edu',
 'rhubb@mail.med.upenn.edu',
 'whwang@mail.med.upenn.edu',
 'mjoffe@mail.med.upenn.edu',
 'jrlandis@mail.med.upenn.edu',
 'liy3@email.chop.edu',
 'mingyao@mail.med.upenn.edu',
 'hongzhe@upenn.edu',
 'rlocalio@upenn.edu',
 'nanditam@mail.med.upenn.edu',
 'knashawn@mail.med.upenn.edu',
 'propert@mail.med.upenn.edu',
 'mputt@mail.med.upenn.edu',
 'sratclif@upenn.edu',
 'michross@upenn.edu',
 'jaroy@mail.med.upenn.edu',
 'msammel@cceb.med.upenn.edu',
 'shawp@upenn.edu',
 'rshi@mail.med.upenn.edu',
 'hshou@mail.med.upenn.edu',
 'jshults@mail.med.upenn.edu',
 'alisaste@mail.med.upenn.edu',
 'atroxel@mail.med.upenn.edu',
 'rxiao@mail.med.upenn.edu',
 'sxie@mail.med.upenn.edu',
 'dxie@upenn.edu',
 'weiyang@mail.m

4. Write a function that, given a list of emails, returns the unique email domains. 
    - Do not use pandas.

In [202]:
def unique_domains(emails):
    # Keep the part after the "@" of each address; the set removes duplicates
    domains = {email.split("@")[-1] for email in emails}
    return list(domains)

In [203]:
email_list = emails(csv_file_name)
unique_domains(email_list)

['cceb.med.upenn.edu', 'email.chop.edu', 'upenn.edu', 'mail.med.upenn.edu']
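
Because a `set` has no guaranteed order, the list of domains may come back in a different order from one run to the next. If a stable result is wanted, one option is to sort the domains. This is a sketch, and the name `unique_domains_sorted` is chosen here only for illustration.

```python
def unique_domains_sorted(emails):
    """Return the distinct domains of the given addresses in alphabetical order."""
    # rsplit with maxsplit=1 keeps everything after the last "@"
    return sorted({email.rsplit("@", 1)[-1] for email in emails})

sample_emails = ["warren@upenn.edu", "liy3@email.chop.edu", "dxie@upenn.edu"]
domains = unique_domains_sorted(sample_emails)
print(domains)  # ['email.chop.edu', 'upenn.edu']
```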

5. Write a function that, given a list of emails, writes the emails to a file called "emails.csv". Add a header of your choosing to your file as the first line.

In [234]:
def write_to_csv(list_of_emails):
    with open('emails.csv', 'w', newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=',')
        writer.writerow(['emails'])
        for e in list_of_emails:
            writer.writerow([e])

In [204]:
pwd

'/Users/AuerPower/Metis/git/dsp/lessons/python_advanced'

In [235]:
email_list = ['bellamys@mail.med.upenn.edu',
 'warren@upenn.edu',
 'bryanma@upenn.edu',
 'jinboche@upenn.edu',
 'sellenbe@upenn.edu',
 'jellenbe@mail.med.upenn.edu',
 'ruifeng@upenn.edu',
 'bcfrench@mail.med.upenn.edu',
 'pgimotty@upenn.edu',
 'wguo@mail.med.upenn.edu',
 'hsu9@mail.med.upenn.edu',
 'rhubb@mail.med.upenn.edu',
 'whwang@mail.med.upenn.edu',
 'mjoffe@mail.med.upenn.edu',
 'jrlandis@mail.med.upenn.edu',
 'liy3@email.chop.edu',
 'mingyao@mail.med.upenn.edu',
 'hongzhe@upenn.edu',
 'rlocalio@upenn.edu']

write_to_csv(email_list)
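
To confirm that a file in this layout (one header line, then one email per row) reads back correctly, a round trip can be sketched as follows. The helper `read_emails_csv` is hypothetical, and the example writes to a temporary directory so that it does not touch the real `emails.csv`.

```python
import csv
import os
import tempfile

def read_emails_csv(path):
    """Read a one-column CSV with a header row and return the emails."""
    with open(path, newline='') as csvfile:
        reader = csv.reader(csvfile)
        next(reader, None)  # skip the header line
        return [row[0] for row in reader if row]

# Write a small file in a temporary directory, then read it back
sample = ["warren@upenn.edu", "bryanma@upenn.edu"]
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "emails.csv")
    with open(path, "w", newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(["emails"])
        writer.writerows([e] for e in sample)
    round_trip = read_emails_csv(path)

print(round_trip)
```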

6. Building on the previous question, write a function which reads in "faculty.csv", and then returns a dictionary where keys are last names, and values are corresponding rows.

example = {'Li': [[' PhD', 'Assistant Professor of Biostatistics', 'liy3@email.chop.edu']], ...}

In [336]:
def get_dict():
    name_dict = {}
    with open('/Users/AuerPower/Metis/git/dsp/lessons/data/faculty.csv', newline='') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=',')
        next(csv_reader, None)  # skip the header row
        for row in csv_reader:
            # The last word of the name column is the last name
            last_name = row[0].split()[-1]
            # Faculty who share a last name are collected in one list
            name_dict.setdefault(last_name, []).append([row[1], row[2], row[3]])
    return name_dict
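
The grouping logic can be exercised without the file. Here is a minimal sketch using `collections.defaultdict` on a few in-memory rows whose values are copied from the outputs shown in this notebook. The function name `group_by_last_name` is an assumption for illustration.

```python
from collections import defaultdict

def group_by_last_name(rows):
    """Map each last name to the list of [degree, title, email] rows."""
    grouped = defaultdict(list)
    for name, degree, title, email in rows:
        # The last whitespace-separated word of the name is the key
        grouped[name.split()[-1]].append([degree, title, email])
    return dict(grouped)

rows = [
    ["Susan S Ellenberg", " Ph.D.", "Professor of Biostatistics", "sellenbe@upenn.edu"],
    ["Jonas H. Ellenberg", " Ph.D.", "Professor of Biostatistics", "jellenbe@mail.med.upenn.edu"],
    ["Rui Feng", " Ph.D", "Assistant Professor of Biostatistics", "ruifeng@upenn.edu"],
]
by_last_name = group_by_last_name(rows)
print(sorted(by_last_name))
```

Both Ellenberg rows end up under the same key, which is why each value is a list of rows rather than a single row.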

In [337]:
get_dict()

{'Bellamy': [[' Sc.D.',
   'Associate Professor of Biostatistics',
   'bellamys@mail.med.upenn.edu']],
 'Bilker': [['Ph.D.', 'Professor of Biostatistics', 'warren@upenn.edu']],
 'Bryan': [[' PhD',
   'Assistant Professor of Biostatistics',
   'bryanma@upenn.edu']],
 'Chen': [[' Ph.D.',
   'Associate Professor of Biostatistics',
   'jinboche@upenn.edu']],
 'Ellenberg': [[' Ph.D.', 'Professor of Biostatistics', 'sellenbe@upenn.edu'],
  [' Ph.D.', 'Professor of Biostatistics', 'jellenbe@mail.med.upenn.edu']],
 'Feng': [[' Ph.D',
   'Assistant Professor of Biostatistics',
   'ruifeng@upenn.edu']],
 'French': [[' PhD',
   'Associate Professor of Biostatistics',
   'bcfrench@mail.med.upenn.edu']],
 'Gimotty': [[' Ph.D', 'Professor of Biostatistics', 'pgimotty@upenn.edu']],
 'Guo': [[' Ph.D', 'Professor of Biostatistics', 'wguo@mail.med.upenn.edu']],
 'Hsu': [[' Ph.D.',
   'Assistant Professor of Biostatistics',
   'hsu9@mail.med.upenn.edu']],
 'Hubbard': [[' PhD',
   'Associate Professor of 

7. Write a function which reads in "faculty.csv", and then returns a dictionary where keys are tuples of names, and values are corresponding rows. You can assume that the tuples are unique, so there will be a unique row per tuple.
    - faculty_dict = { ('Benjamin', 'C.', 'French'): ['PhD', 'Associate Professor of Biostatistics', 'bcfrench@mail.med.upenn.edu'], ...}

In [359]:
def get_dict():
    faculty_dict = {}
    with open('/Users/AuerPower/Metis/git/dsp/lessons/data/faculty.csv', newline='') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=",")
        next(csv_reader, None)  # skip the header row
        for row in csv_reader:
            # Key: the name split into a tuple, e.g. ('Benjamin', 'C.', 'French')
            faculty_dict[tuple(row[0].split())] = [row[1], row[2], row[3]]
    return faculty_dict
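
Since some names carry a middle initial and others do not, the tuple keys vary in length, but the last element is always the last name. The sketch below looks up entries by last name in a dictionary of that shape. The helper `find_by_last_name` is hypothetical, and the sample entries are copied from the outputs shown in this notebook.

```python
def find_by_last_name(faculty_dict, last_name):
    """Return the entries whose name tuple ends with the given last name."""
    return {name: row for name, row in faculty_dict.items() if name[-1] == last_name}

sample_faculty = {
    ("Jinbo", "Chen"): [" Ph.D.", "Associate Professor of Biostatistics", "jinboche@upenn.edu"],
    ("Susan", "S", "Ellenberg"): [" Ph.D.", "Professor of Biostatistics", "sellenbe@upenn.edu"],
    ("Jonas", "H.", "Ellenberg"): [" Ph.D.", "Professor of Biostatistics", "jellenbe@mail.med.upenn.edu"],
}
matches = find_by_last_name(sample_faculty, "Ellenberg")
print(list(matches))
```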

In [360]:
get_dict()

{('Scarlett', 'L.', 'Bellamy'): [' Sc.D.',
  'Associate Professor of Biostatistics',
  'bellamys@mail.med.upenn.edu'],
 ('Warren', 'B.', 'Bilker'): ['Ph.D.',
  'Professor of Biostatistics',
  'warren@upenn.edu'],
 ('Matthew', 'W', 'Bryan'): [' PhD',
  'Assistant Professor of Biostatistics',
  'bryanma@upenn.edu'],
 ('Jinbo', 'Chen'): [' Ph.D.',
  'Associate Professor of Biostatistics',
  'jinboche@upenn.edu'],
 ('Susan', 'S', 'Ellenberg'): [' Ph.D.',
  'Professor of Biostatistics',
  'sellenbe@upenn.edu'],
 ('Jonas', 'H.', 'Ellenberg'): [' Ph.D.',
  'Professor of Biostatistics',
  'jellenbe@mail.med.upenn.edu'],
 ('Rui', 'Feng'): [' Ph.D',
  'Assistant Professor of Biostatistics',
  'ruifeng@upenn.edu'],
 ('Benjamin', 'C.', 'French'): [' PhD',
  'Associate Professor of Biostatistics',
  'bcfrench@mail.med.upenn.edu'],
 ('Phyllis', 'A.', 'Gimotty'): [' Ph.D',
  'Professor of Biostatistics',
  'pgimotty@upenn.edu'],
 ('Wensheng', 'Guo'): [' Ph.D',
  'Professor of Biostatistics',
  'wguo@