## Submission for Kits DatE-IT coding competition

### Comments on the solution
I have provided a general merge function that takes the paths to two CSV files and the path where the merged CSV should be written. The function also accepts three options:
* Which separator the input and resulting CSV files use
* Whether the function is allowed to be destructive and overwrite an already existing file when creating the merged CSV
* Whether rows with empty values should be removed from the merged CSV (`remove_empty`).
  * Enabling this option applies the filter that this challenge asks for.
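
To illustrate the effect of the empty-value option, here is a minimal sketch on two small in-memory tables. The frames `left` and `right` and their column names are made up for illustration and are not the competition data; the steps mirror what the merge function does (find shared columns, optionally drop rows with empty values, then merge).

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files (not the competition data).
left = pd.DataFrame({"id": ["1", "2", "3"], "name": ["A", None, "C"]})
right = pd.DataFrame({"id": ["1", "2", "3"], "schools": ["10", "20", None]})

# Columns present in both frames act as the merge key.
shared_cols = [col for col in left.columns if col in right.columns]

# Option off: rows with empty values are kept, so all three ids survive.
kept = pd.merge(left, right, on=shared_cols)

# Option on: rows with any empty value are dropped from each frame first.
filtered = pd.merge(
    left.dropna(how="any"), right.dropna(how="any"), on=shared_cols
)

print(len(kept))      # 3
print(len(filtered))  # 1
```

Only the row with `id` 1 is complete in both tables, so it is the only one left once the filter is applied.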

I have also included basic error messages that report whether something went wrong and why. One case is not covered by these messages: pandas raises its own exception when asked to read a file that does not exist, and I left that behavior in place.
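
If a caller would rather get an error message than an exception for a missing input file, one possible approach is to wrap the read in a `try`/`except`. The helper name `read_csv_or_error` and its message text are assumptions for this sketch and are not part of the submitted solution.

```python
import pandas as pd


def read_csv_or_error(path, separator=";"):
    """Hypothetical helper: return (DataFrame, "") or (None, error_message)."""
    try:
        return pd.read_csv(path, sep=separator, dtype=str), ""
    except FileNotFoundError:
        # pandas raises FileNotFoundError when the path does not exist.
        return None, f"No file found at {path}"


frame, error = read_csv_or_error("this_file_does_not_exist.csv")
print(frame is None, error)
```

This keeps the same "value plus error message" style as the merge function's return value.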

### Contact info

Name: Eric Carlsson  
Phone: nollsju 25 123 178 (to avoid spammers scraping my phone number)  
email: mail(at)ericcarlsson.com  

### Solution

In [1]:
import pandas as pd
from pathlib import Path


# helper function
def intersection(a, b):
    return list(set(a).intersection(b))


"""
Description:
A function that, given 2 csvs, tries to merge them.

Parameters:
 - csv_1_path, csv_2_path, csv_result_path: paths to the two input files and to the result file
 - separator: separator used in csv_1, csv_2 and the resulting csv
 - overwrite_result: determines whether this function is allowed to overwrite a previously written file.
     Allows the user to avoid destructive actions and potential loss of data
 - remove_empty: determines if rows with empty values should be removed instead of retained in the resulting csv

Return value: (bool, str) -> (successful, error_message)

"""
def merge_csvs(csv_1_path: str, csv_2_path: str, csv_result_path: str, separator=";", overwrite_result=False, remove_empty=False) -> tuple[bool, str]:
    
    # Error definitions
    errors = {
        "no_shared_key": "There was no shared key between the two provided csvs",
        "result_path": "There was already a file at the path provided for the result csv.\nIf you wish to overwrite then set {overwrite_result=True}",
    }

    # if {overwrite_result==False} check if a file of name {result_path} already exists 
    if not overwrite_result:
        res_path = Path(csv_result_path)
        if res_path.is_file():
            return (False, errors["result_path"])
    
    # load CSVs
    csv_1 = pd.read_csv(csv_1_path, sep=separator, dtype=str)
    csv_2 = pd.read_csv(csv_2_path, sep=separator, dtype=str)
    
    # find column(s) to merge on
    shared_cols = intersection(csv_1.columns, csv_2.columns)
    if len(shared_cols) == 0:
        return (False, errors["no_shared_key"])
    
    # remove empty if specified
    if remove_empty:
        csv_1 = csv_1.dropna(how="any")
        csv_2 = csv_2.dropna(how="any")
    
    # merge the csvs on the shared columns
    res = pd.merge(csv_1, csv_2, on=shared_cols)
    
    # create the result CSV
    res.to_csv(csv_result_path, sep=separator, index=False)
    
    return (True, "")

### Example usage

In [2]:
merge_csvs("kommuner.csv", "skolverksamhet.csv", "result.csv")

(True, '')

In [3]:
merge_csvs("kommuner.csv", "skolverksamhet.csv", "result.csv")

(False,
 'There was already a file at the path provided for the result csv.\nIf you wish to overwrite then set {overwrite_result=True}')

In [4]:
merge_csvs("kommuner.csv", "skolverksamhet.csv", "result.csv", overwrite_result=True)

(True, '')

In [5]:
merge_csvs("kommuner.csv", "skolverksamhet.csv", "result.csv", remove_empty=True, overwrite_result=True)

(True, '')
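
The overwrite guard shown in the calls above can be tried in isolation. The sketch below uses only the standard library and a temporary directory so that no real file is touched. The helper `can_write` is hypothetical and only mirrors the check at the top of `merge_csvs`.

```python
import tempfile
from pathlib import Path


def can_write(result_path, overwrite_result=False):
    """Hypothetical helper mirroring the guard at the top of merge_csvs."""
    return overwrite_result or not Path(result_path).is_file()


with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "result.csv"
    first = can_write(target)    # no file yet, so writing is allowed
    target.write_text("a;b\n1;2\n")
    second = can_write(target)   # file exists and overwrite is not allowed
    third = can_write(target, overwrite_result=True)  # explicit opt-in

print(first, second, third)  # True False True
```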