# Cohort Analysis 
By Sarah May
05/02/2017

### Used to merge separate surveys sent to the same group of students.

#### Determines which students were sent which surveys, and which students were sent all surveys.
#### Saves results within a results sub-directory.
#### Students are identified by their emails, which are unique even after graduating.
#### Both student emails and student_ids are stored in the output file. 

## Instructions: 
Below, replace the first three "None" values with:
1. the directory that contains the cohort survey files 
2. the column number (between 0 and n inclusive) of the email column
3. the column number (between 0 and n inclusive) of the student id column

IMPORTANT: if the files have a header row, change the value of the has_headers variable from False to True.

### Important information about the directory and format of the inputs: 
* The directory should contain only the CSV files that represent surveys meant to be part of the cohort. No extra CSV files should be present in the directory.
* Each CSV file should be in the same format (i.e. the keycol and idcol should be the same for each CSV file in the directory)
* The surveys will be identified by the names of the files, so make them descriptive. 
* This will create an output sub-folder called "results" within the current working directory. If a subfolder named "results" already exists, the files will be written there, replacing any existing results.csv and cohort.csv.
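
The format requirements above can be checked before running the merge. The following is a minimal sketch, not part of the original notebook: `check_survey_files` is a hypothetical helper that reports rows too short to contain both the email column and the id column, which would otherwise fail with an IndexError during the merge.

```python
import csv
import os


def check_survey_files(directory, keycol, idcol, has_headers=False):
    """Return (file_name, row_number) pairs for rows that are too short
    to contain both the email column and the id column.

    Hypothetical helper for illustration; row numbers count data rows,
    starting at 1 after any header row has been skipped.
    """
    needed = max(keycol, idcol) + 1  # minimum number of columns per row
    problems = []
    for name in sorted(os.listdir(directory)):
        # only look at files whose extension is .csv
        if os.path.splitext(name)[1].lower() != ".csv":
            continue
        with open(os.path.join(directory, name)) as f:
            rows = list(csv.reader(f))
        if has_headers:
            rows = rows[1:]
        for number, row in enumerate(rows, start=1):
            if len(row) < needed:
                problems.append((name, number))
    return problems
```

An empty list means every row in every CSV file has enough columns for the chosen keycol and idcol.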



In [8]:
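import os
import csv

# Replace the three None values below (see the instructions above)
directory = None     # folder containing the cohort survey CSV files
keycol = None        # column number of the email column
idcol = None         # column number of the student id column
has_headers = False  # change to True if the files have a header row

# Run the cells that define get_surveylist and find_cohort before this one
cd = os.getcwd()
file_names = []
for name in os.listdir(cd + "/" + directory): 
    # keep only files whose extension is .csv (also handles names like a.b.csv)
    if os.path.splitext(name)[1] == '.csv': 
        file_names.append(directory + "/" + name)
find_cohort(file_names, keycol, idcol, has_headers)
print("\nDONE.\n")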

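TypeError: cannot concatenate 'str' and 'NoneType' objects

(This error appears when the cell is run while directory is still None; it goes away once the three None values are replaced.)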

## Code below to read/edit

get_surveylist takes and updates a dictionary that maps each student email to a list of "survey:::id" entries, one for each survey the student was sent.
This is a helper function for find_cohort, defined below.
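
To make the shape of that dictionary concrete, here is a small sketch that works on in-memory rows instead of CSV files. The function name `add_survey`, the survey names, and the student data are hypothetical; the entry format follows the order used in the code cell below (survey name first, then id).

```python
def add_survey(survey_name, rows, keycol, idcol, survey_dict):
    """Record, for each row, that the student with this email was sent the survey."""
    for row in rows:
        entry = survey_name + ":::" + row[idcol]
        # create the list on first sight of an email, then append to it
        survey_dict.setdefault(row[keycol], []).append(entry)
    return survey_dict


# hypothetical data: column 0 is the student id, column 1 is the email
fall = [["101", "ana@example.edu"], ["102", "ben@example.edu"]]
spring = [["101", "ana@example.edu"]]

survey_dict = {}
add_survey("fall", fall, keycol=1, idcol=0, survey_dict=survey_dict)
add_survey("spring", spring, keycol=1, idcol=0, survey_dict=survey_dict)

print(survey_dict["ana@example.edu"])  # ['fall:::101', 'spring:::101']
print(survey_dict["ben@example.edu"])  # ['fall:::102']
```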

In [7]:
def get_surveylist(file, keycol, idcol, survey_dict, has_headers): 
    """Given a .csv file, the locations of email/id columns [0..n], inclusive, and a survey_dict that maps students
    by email and id to surveys, updates the survey_dict for the current file. """
    
    cd = os.getcwd()
    print("\ncurrent working directory: {c}\n".format(c=cd))
    with open(cd + '/' + file, 'rU') as f:
        rd_f = csv.reader(f)
        f_list = list(rd_f)
    
    if has_headers:
        f_list = f_list[1:]
    
    # label each entry "survey:::id", using the file name without its extension
    survey = os.path.splitext(file)[0]
    for row in f_list: 
        entry = survey + ":::" + row[idcol]
        # start a new list the first time an email is seen, then append
        survey_dict.setdefault(row[keycol], []).append(entry)
    
    return survey_dict

Below, we define the function find_cohort, which takes:
    1. a list of file names 
    2. keycol -- column of the student email 
    3. idcol -- column of the student id 
    4. has_headers -- True if each file starts with a header row 
and, using get_surveylist, writes two files:
    1. results/cohort.csv -- emails and ids of the students who were sent all surveys (+ names of surveys)
    2. results/results.csv -- emails and ids of all students, with the surveys each was sent 
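
The cohort test itself is small enough to show in isolation. The sketch below is a variant, not the notebook's exact rule: the notebook compares the number of entries with the number of files, whereas this hypothetical `select_cohort` compares sets of survey names, so a student listed twice in one file is not mistaken for a student who was sent two surveys. The survey names and emails are made up.

```python
def select_cohort(survey_dict, survey_names):
    """Return the sorted emails that have an entry for every survey."""
    wanted = set(survey_names)
    cohort = []
    for email, entries in survey_dict.items():
        # each entry looks like "survey:::id"; keep only the survey part
        seen = set(entry.split(":::")[0] for entry in entries)
        if seen == wanted:
            cohort.append(email)
    return sorted(cohort)


survey_dict = {
    "ana@example.edu": ["fall:::101", "spring:::101"],
    "ben@example.edu": ["fall:::102", "fall:::102"],  # listed twice in one file
}
print(select_cohort(survey_dict, ["fall", "spring"]))  # ['ana@example.edu']
```

With a pure length comparison, ben@example.edu would have two entries for two files and would be counted as part of the cohort.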

In [10]:
def find_cohort(file_names, keycol, idcol, has_headers):
    """Given a list of file names, writes to a results file the names of each 
    student_id, corresponding email, and the surveys they were sent.  The ids and emails 
    who were sent all surveys are also written to a cohort.csv file. """
    
    cd = os.getcwd()
    survey_dict = {}
    
    for f in file_names: 
        print("\nLooking at file: {f}\n".format(f=f))
        survey_dict = get_surveylist(f, keycol, idcol, survey_dict, has_headers)
    
    # Set up the results directory. Any pre-existing results.csv / cohort.csv
    # inside it is overwritten, because the files are opened in "w" mode.
    results_dir = cd + "/results"
    if not os.path.isdir(results_dir): 
        os.mkdir(results_dir)

    # "with" closes both files, so every row is flushed to disk; the working
    # directory is left unchanged so the notebook can be re-run as is
    with open(results_dir + "/cohort.csv", "w") as cohort_csv:
        with open(results_dir + "/results.csv", "w") as results_csv:
            w_cohort = csv.writer(cohort_csv)
            w_results = csv.writer(results_csv)
            for key, values in survey_dict.items():
                # row: the email followed by its "survey:::id" entries
                row = [key] + values
                w_results.writerow(row)
                # one entry per input file: the student was sent every survey
                if len(values) == len(file_names): 
                    w_cohort.writerow(row)
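
Each row written to results.csv and cohort.csv is the email followed by its "survey:::id" entries. As a sketch of how such a row might be read back, the hypothetical helper `parse_result_row` below splits a row into the email and a list of (survey, student id) pairs. The sample row is made up; the survey label includes the directory prefix because the file names passed to find_cohort include it.

```python
def parse_result_row(row):
    """Split one output row into the email and (survey, student_id) pairs."""
    email = row[0]
    # split on the first ":::" only, in case an id ever contains the separator
    pairs = [tuple(entry.split(":::", 1)) for entry in row[1:]]
    return email, pairs


row = ["ana@example.edu", "surveys/fall:::101", "surveys/spring:::101"]
email, pairs = parse_result_row(row)
print(email)  # ana@example.edu
print(pairs)  # [('surveys/fall', '101'), ('surveys/spring', '101')]
```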