# MS Summer Academy NPS Data Exploration (2016-2017)

## Questsions Asked:

    * How many promoters, passives, and detractors are there in both years? How do the scores differ by year?
    * What track of students had the best experience at the summer academy? What about the worst experience?
    * Did students feel as though the pacing increased as the program went on?
    * Which location had the best overall experience?
    * Did students at the NY location have a better or worse expeience as the program went on?
  

# General NPS Cleaning Process:

- Step 1: Import, Clean, and Aggregate Weeks 1-7 for 2016 Data
- Step 2: Import and Clean Week 8 for 2016 Data
- Step 2.5: Aggregate Weeks 1-7 with Week 8 to produce Full 2016 Dataset
- Step 3: Import and Clean 2017 Data
- Step 3.5: Aggregate Full 2016 Data with 2017 Data to produce Complete Dataset

### Promoters, Passives, Detractors for 2017: 78, 34, 8


In [1]:
# Pandas is a library for basic data analysis
import pandas as pd

# NumPy is a library for advanced mathematical computation
import numpy as np

# MatPlotLib is a library for basic data visualization
import matplotlib.pyplot as plt

# SeaBorn is a library for advanced data visualization
import seaborn as sns

import glob

## _Stretch Challenge_:

### Functionalize Data Manipulation code!

In [2]:
sns.set(style="white", context="notebook", palette="deep")

COLOR_COLUMNS = ["#66C2FF", "#5CD6D6", "#00CC99", "#85E085", "#FFD966", "#FFB366", "#FFB3B3", "#DAB3FF", "#C2C2D6"]

sns.set_palette(palette=COLOR_COLUMNS, n_colors=4)

---

# Data Cleaning and Aggregation

In [3]:
REL_PATH_DIRECTORY = "../datasets/SA_Feedback_Surveys_FINAL/2016/"
ALL_BUT_8_PATH = "Anon*.csv"

THE_8_PATH = "Week 8 Feedback (2016, incomplete) - results.csv"

### Weeks 1-7 (2016)

- NOTE: Data is _slightly_ different across various weeks and locations. **Approach with caution!**

In [4]:
all_but_8_2016_files = glob.glob(REL_PATH_DIRECTORY + ALL_BUT_8_PATH)

In [5]:
all_but_8_2016_files

['../datasets/SA_Feedback_Surveys_FINAL/2016/Anon Week 7 Feedback - Taipei.csv',
 '../datasets/SA_Feedback_Surveys_FINAL/2016/Anon Week 6 Feedback - Tokyo.csv',
 '../datasets/SA_Feedback_Surveys_FINAL/2016/Anon Week 1 Feedback - Singapore.csv',
 '../datasets/SA_Feedback_Surveys_FINAL/2016/Anon Week 7 Feedback - LA.csv',
 '../datasets/SA_Feedback_Surveys_FINAL/2016/Anon Week 4 Feedback - SF.csv',
 '../datasets/SA_Feedback_Surveys_FINAL/2016/Anon Week 5 Feedback - SV.csv',
 '../datasets/SA_Feedback_Surveys_FINAL/2016/Anon Week 4 Feedback - SG.csv',
 '../datasets/SA_Feedback_Surveys_FINAL/2016/Anon Week 6 Feedback - NY.csv',
 '../datasets/SA_Feedback_Surveys_FINAL/2016/Anon Week 5 Feedback - HK.csv',
 '../datasets/SA_Feedback_Surveys_FINAL/2016/Anon Week 1 Feedback - SF.csv',
 '../datasets/SA_Feedback_Surveys_FINAL/2016/Anon Week 2 Feedback - LA.csv',
 '../datasets/SA_Feedback_Surveys_FINAL/2016/Anon Week 6 Feedback - Taipei.csv',
 '../datasets/SA_Feedback_Surveys_FINAL/2016/Anon Week 3 F

## GOAL: Place _Week_ and _Location_ data in each 2016 (not Week 8) DF

In [6]:
filename = "Anon Week 1 Feedback - LA.csv"
fileparts = filename.split(" ")

test_file_path = REL_PATH_DIRECTORY + filename
df_test = pd.read_csv(test_file_path)

In [7]:
week_num, location = int(fileparts[2]), fileparts[5].split(".")[0]

In [8]:
df_garbage = pd.DataFrame([1, 2, 5, 3, 4, 1], columns=["trash_nums"])
df_garbage["Week"], df_garbage["Location"] = week_num, location
df_garbage

Unnamed: 0,trash_nums,Week,Location
0,1,1,LA
1,2,1,LA
2,5,1,LA
3,3,1,LA
4,4,1,LA
5,1,1,LA


In [9]:
 
# #Could set location and week columns manually to each file by setting a new DF for each filepath and running this function 
# def addLocationAndWeek(filepath, week, location):
#     location_and_week_arr
#     df = pd.read_csv(filepath)
#     df["Week"] = week
#     df["Location"] = place
#     return dataframe






# week = 48 - 53 
# import re
# # re.find()

# list_of_weeks = list()
# for filepath in all_but_8_2016_files:
#     week = (filepath[48:55])
#     list_of_weeks.append(week)
#     print(week)
    
    

In [10]:
def read_all_files():
    data_arr = list()
    for filename in all_but_8_2016_files:
        # When reading each file, we grab the Week & Location data and add it to two new columns
        week, location = _get_week_and_location(filename)
        df = pd.read_csv(filename)
        df["Week"], df["Location"] = week, location
        data_arr.append(df)
    return data_arr

def _get_week_and_location(filename):
    fileparts = filename.split(" ")
    week, location = int(fileparts[2]), fileparts[5].split(".")[0]
    return week, location

data = read_all_files()

### NOTE: The bad dataframe with `Unnamed: 0` is the `Anon Week 1 Feedback - SV.csv` DF

The `Unnamed: 0` column/feature is measuring Timestamp data (Datetime)

### NOTE: The bad dataframe with no `Timestamp` or `Unnamed: 0` is the `Anon Week 5 Feedback - SF.csv` DF

In [11]:
youre_the_one = None

for df in data:
    if "Unnamed: 0" not in df.columns and "Timestamp" not in df.columns:
        youre_the_one = df
        
youre_the_one

Unnamed: 0,What track are you in?,How would you rate your overall satisfaction with the Summer Academy this week?,How well is the schedule paced?,Week,Location
0,Apps,3,3,5,SF
1,Apps,4,3,5,SF
2,Apps,4,3,5,SF
3,Apps,4,4,5,SF
4,Apps,3,3,5,SF
5,Apps,5,3,5,SF
6,Apps,3,2,5,SF
7,Apps,5,3,5,SF
8,Apps,1,3,5,SF
9,Apps,4,3,5,SF


In [12]:
# week_one_through_seven_df = pd.DataFrame(index=index, columns=columns)
# Find a way to grab `week` and `location` data from filenames
# Find all potential unique columns
# Use unique columns to create master DF and put all data into that one (merging, copies)


column_names = dict()

for df in data:
    for column in df.columns:
        if column in column_names:
            column_names[column] += 1
        else:
            column_names[column] = 1
            
column_names

# for df in data:
#     week_one_through_seven_df = pd.concat(df)
    
# week_one_through_seven_df
# # week_one_through_seven_df .concat()

    
# week_one_through_seven_df + df
    
# week_one_through_seven_df

# rating = 'How would you rate your overall satisfaction with the Summer Academy this week?'
# data[data[rating]==4]

# dataframe["Week"] = week
# dataframe["Location"] = place



{'Timestamp': 37,
 'How would you rate your overall satisfaction with the Summer Academy this week?': 39,
 'How well is the schedule paced?': 33,
 'Week': 39,
 'Location': 39,
 'How well are the tutorials paced?': 6,
 'What track are you in?': 24,
 'Unnamed: 0': 1}

## TODO: Aggregate Schedule and Tutorial data into _Pacing_ column (mutual exclusion)

## Checking Each 2016 (not Week 8) DF for Unique Values

In [66]:
# Functionalized grabbing unique values

def check_unique_values_by_column(col_name):
    unique_values = set()
    for df in data:
        if col_name in df.columns:
            df_vals = df[col_name].unique().tolist()
            unique_values.update(df_vals)
    return unique_values

In [67]:
col_name_list = list(column_names.keys())
col_name_list

['Timestamp',
 'How would you rate your overall satisfaction with the Summer Academy this week?',
 'How well is the schedule paced?',
 'Week',
 'Location',
 'How well are the tutorials paced?',
 'What track are you in?',
 'Unnamed: 0']

### Create Dictionary of Unique Values per Column Name over All Data

In [99]:
uniques_dict = dict()
# For each column name in our list above, we create a dictionary element where...
# KEY: Column Name
# VALUE: All Unique Values for that Column Name across All Data
for index in range(8):
    col_name = col_name_list[index]
    uniques_dict[col_name] = check_unique_values_by_column(col_name)

In [100]:
uniques = list(uniques_dict["How would you rate your overall satisfaction with the Summer Academy this week?"])
uniques

[0, 1, 2, 3, 4, 5]

In [102]:
'''Map errors  to 0, then change datatype of column to type int '''
  
satisfaction_map = dict(zip(uniques, [1, 2, 3, 4, 5, 4, 3, 5, 0]))

satisfaction_column = 'How would you rate your overall satisfaction with the Summer Academy this week?'

for df in data:
    if 'How would you rate your overall satisfaction with the Summer Academy this week?' in df:
#         if '#REF!' in df[satisfaction_column]:
        df['How would you rate your overall satisfaction with the Summer Academy this week?'] = df['How would you rate your overall satisfaction with the Summer Academy this week?'].map(satisfaction_map).astype(int)
#         df['How would you rate your overall satisfaction with the Summer Academy this week?'] = df['How would you rate your overall satisfaction with the Summer Academy this week?'].astype(int)
            


for df in data:
    if 'How would you rate your overall satisfaction with the Summer Academy this week?' in df:
        print(df['How would you rate your overall satisfaction with the Summer Academy this week?'].unique(), "\n\n")

    

[5 4] 


[5 4] 


[4 5] 


[5 4] 


[5 4] 


[4 5] 


[4 5] 


[4 5] 


[4 5] 


[5 4] 


[4 5] 


[5 4] 


[4 5] 


[5 3 4] 


[5 4 3] 


[4 5] 


[4 5] 


[5 4] 


[4 5] 


[5 4] 


[4 5] 


[5 4] 


[4 5 3] 


[4 5] 


[5 4] 


[5 4] 


[5 3 4] 


[5 4] 


[5 4] 


[4 5] 


[5 4] 


[5 4 3] 


[4 5] 


[5 4] 


[5 4] 


[4 5 3] 


[5 4] 


[4 5] 


[5 4 2] 




In [None]:
# 2016 NOT Week 8 data dictionary mappings 

''' Mapping Schedule pacing to convert np.NaN and errors to 0 so that we can only deal with values greater than 0 in 
our calculations and making sure values are integers. '''

'''Our collection of mapping dictionaries to deal with np.NaN, #REF!, and wrong datatypes'''

schedule_pacing_map = {
    '#REF!': 0,
    np.NaN: 0, 
    1: 1, 
    2: 2, 
    '2': 2, 
    3: 3, 
    '3': 3, 
    '4': 4, 
    4: 4, 
    5: 5, 
    '5': 5
}


for df in data:
    if "How well is the schedule paced?" in df:
        df["How well is the schedule paced?"] = df["How well is the schedule paced?"].map(schedule_pacing_map)
        df = df.drop(df[df["How well is the schedule paced?"] > 0].index)


In [98]:
for df in data:
    if 'How would you rate your overall satisfaction with the Summer Academy this week?' in df:
        print(df['How would you rate your overall satisfaction with the Summer Academy this week?'].unique(), "\n\n")

[3 4 5] 


[3 4 5] 


[2 3 4 5] 


[3 5 4] 


[5 4 3] 


[4 5 3 2] 


[4 3] 


[2 4 5 3] 


[4 5] 


[3 4 5] 


[4 3 5] 


[5 4 3] 


[4 5 3] 


[5 3 1 4] 


[3 4 5 1] 


[4 3 2] 


[2 3 4 5] 


[3 5 4] 


[4 5] 


[3 4 5] 


[2 3 4 5] 


[3 4 5] 


[4 3 2 5 1] 


[4 5 3] 


[3 5 4 2] 


[5 4] 


[3 5 1 4] 


[3 4 5] 


[3 4 5] 


[4 5] 


[5 3 4] 


[5 3 4 1 2] 


[4 5 3 2] 


[5 4] 


[5 4 3] 


[2 3 4 5 1] 


[3 4 5] 


[4 3] 


[5 4 3 0] 




In [111]:
col_name_list

['Timestamp',
 'How would you rate your overall satisfaction with the Summer Academy this week?',
 'How well is the schedule paced?',
 'Week',
 'Location',
 'How well are the tutorials paced?',
 'What track are you in?',
 'Unnamed: 0']

In [123]:
unique_types = set()

for item in check_unique_values_by_column(col_name_list[0]):
    unique_types.update([type(item)])
    
# unique_types
print(col_name_list[0], ": ", list(unique_types))

Timestamp :  [<class 'str'>, <class 'float'>]


### Checks all unique datatypes across every feature across every dataset

In [133]:
type_dict

{'Timestamp': [str, float],
 'How would you rate your overall satisfaction with the Summer Academy this week?': [int],
 'How well is the schedule paced?': [str, int],
 'Week': [int],
 'Location': [str],
 'How well are the tutorials paced?': [int],
 'What track are you in?': [str],
 'Unnamed: 0': [str]}

In [None]:
def check_unique_values_by_column(col_name):
    unique_values = set()
    for df in data:
        if col_name in df.columns:
            df_vals = df[col_name].unique().tolist()
            unique_values.update(df_vals)
    return unique_values

In [178]:
def dict_of_unique_types():
    
    '''Returns a dict (KEY: Unique Columns in all dataframes, VALUE: Set of unique value types)'''
    type_dict = dict()

    for feature in range(len(col_name_list)):
        unique_types = set()

        for item in check_unique_values_by_column(col_name_list[feature]):
            unique_types.update([type(item)])

        type_dict[col_name_list[feature]] = list(unique_types)
        
type_dict

{'Timestamp': [str, float],
 'How would you rate your overall satisfaction with the Summer Academy this week?': [int],
 'How well is the schedule paced?': [str, int],
 'Week': [int],
 'Location': [str],
 'How well are the tutorials paced?': [int],
 'What track are you in?': [str],
 'Unnamed: 0': [str]}

In [109]:
# 'What track are you in?' looks pretty clean
unique_track_vals = check_unique_values_by_column('What track are you in?')
unique_track_vals

{'Apps', 'Average:', 'Games', 'Intro', 'VR'}

## Example Magic Type Mapper!

In [183]:
def map_types_to_value(df, column_name, type_to_change, change_to_value):
    '''Takes a Pandas Dataframe, column name, object type to change, and the value to replace with
        Returns a "mapped" dataframe based on value types that you want to replace'''
    for index, item in df.iterrows():
        changer = None
        if type(item[0]) == type_to_change:
            changer = change_to_value
            return df.set_value(index, column_name, changer)


In [185]:
johnny = pd.DataFrame(data=["asdf", "zxcv", "qwer", "pou", "ncmv", True], columns=["chen"])

mapped = map_types_to_value(johnny, "chen", bool, "WORKING")

mapped

# for index, item in johnny.iterrows():
#     changer = None
#     if type(item[0]) == bool:
#         changer = np.nan
#         johnny.set_value(index, "chen", changer)

  import sys


Unnamed: 0,chen
0,asdf
1,zxcv
2,qwer
3,pou
4,ncmv
5,WORKING


In [180]:
johnny

Unnamed: 0,chen
0,asdf
1,zxcv
2,qwer
3,pou
4,ncmv
5,


In [None]:
def map_types_to_value(df, column_name, type_to_change, change_to_value):
    '''Takes a Pandas Dataframe and type object to return a "mapped" dataframe based on value types'''
    for index, item in df.iterrows():
        changer = None
        if type(item[0]) == typeof:
            changer = change_to_value
            df.set_value(index, column_name, changer)
            
            


In [112]:
unique_timestamp_vals = check_unique_values_by_column('Timestamp')
unique_timestamp_vals

for df in data:
    if 'Timestamp' in df.columns:
        #Potentially map all strings to pandas.Timestamps, might need regex
        #map non-strings to empty strings
    
    

{'6/23/2016 13:17:43',
 '6/23/2016 13:19:38',
 '6/23/2016 13:19:41',
 '6/23/2016 13:19:58',
 '6/23/2016 13:21:46',
 '6/23/2016 14:44:06',
 '6/23/2016 14:48:47',
 '6/23/2016 15:55:12',
 '6/23/2016 16:11:39',
 '6/23/2016 16:14:04',
 '6/23/2016 22:07:29',
 '6/23/2016 22:09:23',
 '6/23/2016 22:11:39',
 '6/23/2016 22:11:44',
 '6/23/2016 22:13:04',
 '6/23/2016 22:13:09',
 '6/23/2016 22:13:25',
 '6/23/2016 22:15:33',
 '6/23/2016 22:24:53',
 '6/23/2016 22:41:02',
 '6/23/2016 23:56:36',
 '6/24/2016 0:18:42',
 '6/24/2016 11:05:49',
 '6/24/2016 11:34:57',
 '6/24/2016 11:37:05',
 '6/24/2016 11:40:59',
 '6/24/2016 11:41:34',
 '6/24/2016 11:41:42',
 '6/24/2016 11:41:47',
 '6/24/2016 11:42:30',
 '6/24/2016 11:46:04',
 '6/24/2016 11:47:36',
 '6/24/2016 11:53:20',
 '6/24/2016 11:53:29',
 '6/24/2016 12:58:35',
 '6/24/2016 13:05:10',
 '6/24/2016 13:07:03',
 '6/24/2016 13:07:59',
 '6/24/2016 13:09:46',
 '6/24/2016 13:11:15',
 '6/24/2016 13:16:51',
 '6/24/2016 13:18:32',
 '6/24/2016 13:20:34',
 '6/24/2016 

### Example of Merging Mutually Exclusive Columns

In [83]:
df1 = pd.DataFrame([1, 0, 3, 4, np.nan, np.nan, 0, 6, np.nan, np.nan, np.nan], columns=["a"])
df2 = pd.DataFrame([np.nan, np.nan, np.nan, np.nan, 5, 7, np.nan, np.nan, 2, 9, 8], columns=["b"])

Step 1: Replace all **NaNs**

In [84]:
df1["a"] = df1["a"].fillna(0)
df2["b"] = df2["b"].fillna(0)

In [86]:
df_final = df1['a'] + df2['b']

In [None]:
s1, s2 = "New York", "No Location"
s1 + s2

In [87]:
df_final

0     1.0
1     0.0
2     3.0
3     4.0
4     5.0
5     7.0
6     0.0
7     6.0
8     2.0
9     9.0
10    8.0
dtype: float64

In [None]:
pacing_name1 = 'How well is the schedule paced?'
pacing_name2 = 'How well are the tutorials pacecd?'

### Since we know that the `Schedule Pacing` and `Tutorial Pacing` are mutually exclusive columns, we can simply _ADD_ them together into a new column.

In [48]:
pacing_df =  pd.DataFrame(columns= ['How well is the schedule paced?','How well are the tutorials paced?'])

array_of_columns = list()


for df in data:
    if 'How well is the schedule paced?' in df.columns:
        array_of_columns.append(df['How well is the schedule paced?'])
        nan_df = pd.DataFrame(np.nan, index=df['How well is the schedule paced?'].size, columns='How well are the tutorials paced?')
        array_of_columns.append(nan_df)
#         pacing_df['How well is the schedule paced?'] = pacing_df['How well is the schedule paced?'] + df['How well is the schedule paced?']
    elif 'How well are the tutorials paced?' in df.columns:
        array_of_columns.append(df['How well are the tutorials paced?'])
        nan_df = pd.DataFrame(np.nan, index=df['How well are the tutorials paced?'].size, columns='How well are the tutorials paced?')
        array_of_columns.append(nan_df)
        
        
#         pacing_df[ 'How well are the tutorials paced?'] = pacing_df['How well are the tutorials paced?'] + df['How well are the tutorials paced?']
        
pacing_df

TypeError: object of type 'int' has no len()

In [62]:
pacing_df =  pd.DataFrame(columns= ['How well is the schedule paced?','How well are the tutorials paced?'])
p1 = list()
p2 = list()




for df in data:
    if 'How well is the schedule paced?' in df.columns:
        p1.append(df['How well is the schedule paced?'].values)
        temp_arr = [p1.append(i) for i in df['How well is the schedule paced?'].values]
        p2.append(np.nan * (df['How well is the schedule paced?'].size))
#         p1.append(df['How well is the schedule paced?'].values)

    elif 'How well are the tutorials paced?' in df.columns:
        temp_arr = [p2.append(i) for i in df['How well are the tutorials paced?'].values]
        p2.append(np.nan * (df['How well are the tutorials paced?'].size))

        
#         pacing_df[ 'How well are the tutorials paced?'] = pacing_df['How well are the tutorials paced?'] + df['How well are the tutorials paced?']
        
p1
# p2

[array([3, 3, 3, 4, 4, 3, 3, 3, 3, 3, 3]),
 3,
 3,
 3,
 4,
 4,
 3,
 3,
 3,
 3,
 3,
 3,
 array([4, 3, 1, 2, 4, 3, 5, 2, 4, 3, 3, 4, 3]),
 4,
 3,
 1,
 2,
 4,
 3,
 5,
 2,
 4,
 3,
 3,
 4,
 3,
 array([2, 3, 2, 3, 3, 3, 4, 3, 3]),
 2,
 3,
 2,
 3,
 3,
 3,
 4,
 3,
 3,
 array([3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 2, 3, 3, 4, 5, 3, 4, 4, 4, 5, 4, 3,
        3, 3, 3, 4, 3, 3, 3, 4, 4]),
 3,
 3,
 4,
 4,
 4,
 4,
 3,
 3,
 3,
 3,
 2,
 3,
 3,
 4,
 5,
 3,
 4,
 4,
 4,
 5,
 4,
 3,
 3,
 3,
 3,
 4,
 3,
 3,
 3,
 4,
 4,
 array([3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 4, 3, 5, 2, 3, 4, 4, 4, 3, 3,
        4, 3, 3, 5, 3, 3, 3, 3, 3, 3, 3, 4, 2, 3, 5, 2, 3, 3, 3]),
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 4,
 3,
 3,
 4,
 3,
 5,
 2,
 3,
 4,
 4,
 4,
 3,
 3,
 4,
 3,
 3,
 5,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 4,
 2,
 3,
 5,
 2,
 3,
 3,
 3,
 array([3, 4, 3, 3, 4, 3, 3, 4, 3]),
 3,
 4,
 3,
 3,
 4,
 3,
 3,
 4,
 3,
 array([2, 3, 4, 3, 3, 3, 3, 3, 3, 3, 4, 3, 3, 1, 3, 3, 3, 3, 5, 3]),
 2,
 3,
 4,
 3,
 3,
 3,
 3,
 3,
 3,
 3,
 4,
 3,
 3,
 1,


In [45]:
height = 10
width = 20
df_0 = pd.DataFrame(0, index=range(height), columns=range(width))
len(df_0.index)

10

### Week 8 (2016)

In [None]:
df_week_8 = pd.read_csv(REL_PATH_DIRECTORY + THE_8_PATH)

In [None]:
df_week_8["Week"] = 8
df_week_8.rename(columns={"location": "Location"}, inplace=True)
df_week_8

### 2017

In [50]:
FILEPATH = "../datasets/SA_Feedback_Surveys_FINAL/2017/Student_Feedback_Surveys_Superview.csv"
df_2017 = pd.read_csv(FILEPATH)

# 9 or 10 are promoters
# 7-8 are passives
# 0-6 are detractors

# df_2017.head()

# Clean Week data to remove redundant "week"

week_mapper = {
    "Week 1": 1,
    "Week 2": 2,
    "Week 3": 3,
    "Week 4": 4,
    "Week 5": 5,
    "Week 6": 6,
    "Week 7": 7,
    "Week 8": 8
}

df_2017["Week"] = df_2017["Week"].map(week_mapper)

In [52]:
df_2017.unique()

AttributeError: 'DataFrame' object has no attribute 'unique'

In [None]:
df_2017 = pd.read_csv(FILEPATH)
df_2017 = df_2017[df_2017["Rating (Num)"] != "#ERROR!"]
df_2017["Rating (Num)"] = df_2017["Rating (Num)"].astype(int)

df_promoters = df_2017.loc[(df_2017['Rating (Num)'] >= 9) & (df_2017['Week'] == 'Week 7')]
df_passives = df_2017.loc[(df_2017['Rating (Num)'] >= 7) & (df_2017['Rating (Num)'] <= 8) & (df_2017['Week'] == 'Week 7')]
df_detractors = df_2017.loc[(df_2017['Rating (Num)'] < 7) & (df_2017['Week'] == 'Week 7')]


# df_promoters
# df_passives
# df_detractors   

# len(df_promoters) #78
# len(df_passives) #34
# len(df_detractors) #8

# num_of_promoters = len(df_promoters)
# num_of_promoters


## Two choices for Pro-Pas-Det Data Divisions:

- **Divide** up your promoters, passives, and detractors into _three_ independent DataFrames
- **Convert** your logic for promoter, passive, and detractor identifiation into _arguments_ that you can pass to your global DataFrame at anytime

---

# Data Manipulation

In [None]:
promoter_count = 0
passive_count = 0
detractor_count = 0

index = []
columns = []

promoter_df = pd.DataFrame(index=index, columns=columns)
passive_df = pd.DataFrame(index=index, columns=columns)
detractor_df = pd.DataFrame(index=index, columns=columns)

In [None]:
arg_promoter = (df["Rating (Num)"] >= 9)
promoters = df[arg_promoter]


# week_one = (promoters["Week"] == "Week 1")

# week_one


# len(promoters)
# promoters

In [None]:
df2017.loc[:, ['ID', 'Track', 'Week', 'Rating (Num)']]


all_students = {}

for index, row in df.iterrows():
    
    