# MS Summer Academy NPS Data Exploration (2016-2017)

## Questions Asked:

    * How many promoters, passives, and detractors are there in both years? How do the scores differ by year?
    * What track of students had the best experience at the summer academy? What about the worst experience?
    * Did students feel as though the pacing increased as the program went on?
    * Which location had the best overall experience?
    * Did students at the NY location have a better or worse experience as the program went on?
  

In [None]:
### Promoters for 2016: 

In [39]:
# Pandas is a library for basic data analysis
import pandas as pd

# NumPy is a library for advanced mathematical computation
import numpy as np

# Matplotlib is a library for basic data visualization
import matplotlib.pyplot as plt

# Seaborn is a library for advanced data visualization
import seaborn as sns

import glob

## _Stretch Challenge_:

### Functionalize Data Manipulation code!

In [2]:
sns.set(style="white", context="notebook", palette="deep")

COLOR_COLUMNS = ["#66C2FF", "#5CD6D6", "#00CC99", "#85E085", "#FFD966", "#FFB366", "#FFB3B3", "#DAB3FF", "#C2C2D6"]

sns.set_palette(palette=COLOR_COLUMNS, n_colors=4)

---

# Data Cleaning and Aggregation

In [3]:
REL_PATH_DIRECTORY = "../datasets/SA_Feedback_Surveys_FINAL/2016/"
ALL_BUT_8_PATH = "Anon*.csv"

THE_8_PATH = "Week 8 Feedback (2016, incomplete) - results.csv"

### Weeks 1-7 (2016)

- NOTE: Data is _slightly_ different across various weeks and locations. **Approach with caution!**

In [4]:
all_but_8_2016_files = glob.glob(REL_PATH_DIRECTORY + ALL_BUT_8_PATH)

In [5]:
all_but_8_2016_files

['../datasets/SA_Feedback_Surveys_FINAL/2016/Anon Week 7 Feedback - Taipei.csv',
 '../datasets/SA_Feedback_Surveys_FINAL/2016/Anon Week 6 Feedback - Tokyo.csv',
 '../datasets/SA_Feedback_Surveys_FINAL/2016/Anon Week 1 Feedback - Singapore.csv',
 '../datasets/SA_Feedback_Surveys_FINAL/2016/Anon Week 7 Feedback - LA.csv',
 '../datasets/SA_Feedback_Surveys_FINAL/2016/Anon Week 4 Feedback - SF.csv',
 '../datasets/SA_Feedback_Surveys_FINAL/2016/Anon Week 5 Feedback - SV.csv',
 '../datasets/SA_Feedback_Surveys_FINAL/2016/Anon Week 4 Feedback - SG.csv',
 '../datasets/SA_Feedback_Surveys_FINAL/2016/Anon Week 6 Feedback - NY.csv',
 '../datasets/SA_Feedback_Surveys_FINAL/2016/Anon Week 5 Feedback - HK.csv',
 '../datasets/SA_Feedback_Surveys_FINAL/2016/Anon Week 1 Feedback - SF.csv',
 '../datasets/SA_Feedback_Surveys_FINAL/2016/Anon Week 2 Feedback - LA.csv',
 '../datasets/SA_Feedback_Surveys_FINAL/2016/Anon Week 6 Feedback - Taipei.csv',
 '../datasets/SA_Feedback_Surveys_FINAL/2016/Anon Week 3 F

In [6]:
# df["Kashy"] = np.ones(shape=1450)
# df.drop(columns=["Kashy"], inplace=True)
# df

In [14]:
def all_files():
    data_arr = list()
    for file in all_but_8_2016_files:
        data_arr.append(pd.read_csv(file))
    return data_arr

dataset = all_files()

[           Timestamp  \
 0   8/5/2016 1:39:41   
 1   8/5/2016 1:40:47   
 2   8/5/2016 1:40:50   
 3   8/5/2016 1:42:44   
 4   8/5/2016 1:45:13   
 5   8/5/2016 1:45:39   
 6   8/5/2016 1:49:21   
 7   8/8/2016 1:30:34   
 8   8/8/2016 1:33:45   
 9   8/8/2016 1:49:29   
 10  8/8/2016 1:51:00   
 
     How would you rate your overall satisfaction with the Summer Academy this week?  \
 0                                                   3                                 
 1                                                   4                                 
 2                                                   4                                 
 3                                                   4                                 
 4                                                   5                                 
 5                                                   4                                 
 6                                                   4                            

## TODO: 

* Find a way to grab `week` and `location` data from filenames
* Check through all DFs in `dataset` to find all potential unique columns
* Use unique columns to create master DF and put all data into that one (merging, copies)
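
A minimal sketch of how the TODO items above might be approached, assuming every weekly file follows the `Anon Week <N> Feedback - <Location>.csv` naming seen in the listing. The names `FILENAME_PATTERN`, `parse_week_and_location`, and `load_tagged` are hypothetical helpers, not part of the notebook.

```python
import re
from pathlib import Path

import pandas as pd

# Assumed pattern, inferred from names like "Anon Week 7 Feedback - Taipei.csv".
FILENAME_PATTERN = re.compile(r"Week (\d+) Feedback - (.+)\.csv$")


def parse_week_and_location(path):
    """Return (week, location) parsed from a file name, or (None, None)."""
    match = FILENAME_PATTERN.search(Path(path).name)
    if match is None:
        return None, None
    return int(match.group(1)), match.group(2).strip()


def load_tagged(paths):
    """Read each CSV, tag its rows with week and location, and stack them."""
    frames = []
    for path in paths:
        week, location = parse_week_and_location(path)
        frame = pd.read_csv(path)
        frame["week"] = week
        frame["location"] = location
        frames.append(frame)
    # concat aligns on the union of column names; cells a file lacks become NaN.
    return pd.concat(frames, ignore_index=True, sort=False)
```

Calling `load_tagged(all_but_8_2016_files)` would give one master DataFrame whose columns are the union of every file's columns. Location labels are kept exactly as written in the file names, so abbreviations such as `SG` and full names such as `Singapore` would still need to be reconciled by hand.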

### Week 8 (2016)

In [8]:
df_week_8 = pd.read_csv(REL_PATH_DIRECTORY + THE_8_PATH)

In [9]:
df_week_8.head(3)

Unnamed: 0,#,How likely is it that you would recommend the Make School Summer Academy to a friend?,location,track,Start Date (UTC),Submit Date (UTC),Network ID
0,00b836bda84e6bdbe780af97e249e59f,10,New York,summerApps,9/7/16 1:03,9/7/16 1:04,3212b7a834
1,39dde6dc0e1e375845d756fc7e39fc5f,10,San Francisco,summerIntro,9/7/16 1:03,9/7/16 1:04,f4954355aa
2,5e56b9de91670b308cb98dd2848b8739,10,New York,summerIntro,9/7/16 1:03,9/7/16 1:05,3d69ca289b


### 2017

In [21]:
FILEPATH = "../datasets/SA_Feedback_Surveys_FINAL/2017/Student_Feedback_Surveys_Superview.csv"
df_2017 = pd.read_csv(FILEPATH)

# 9 or 10 are promoters
# 7-8 are passives
# 0-6 are detractors

df_2017.head()

Unnamed: 0,ID,Location,Track,Week,Rating (Num),Schedule Pacing
0,134,San Francisco,"Apps, Explorer",Week 1,3,Just right
1,36,Los Angeles,Apps,Week 1,4,A little too fast
2,117,San Francisco,Games,Week 1,4,Way too slow
3,253,,,Week 2,4,A little too fast
4,350,New York City,"Apps, Explorer",Week 1,4,Just right


In [46]:
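df_2017 = pd.read_csv(FILEPATH)
# Drop rows whose rating failed to export, then copy the result so the
# cast below works on an independent DataFrame (avoids SettingWithCopyWarning).
df_2017 = df_2017[df_2017["Rating (Num)"] != "#ERROR!"].copy()
df_2017["Rating (Num)"] = df_2017["Rating (Num)"].astype(int)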

df_promoters = df_2017.loc[(df_2017['Rating (Num)'] >= 9) & (df_2017['Week'] == 'Week 7')]
df_passives = df_2017.loc[(df_2017['Rating (Num)'] >= 7) & (df_2017['Rating (Num)'] <= 8) & (df_2017['Week'] == 'Week 7')]
df_detractors = df_2017.loc[(df_2017['Rating (Num)'] < 7) & (df_2017['Week'] == 'Week 7')]


# df_promoters
# df_passives
# df_detractors   

# len(df_promoters) #78
# len(df_passives) #34
# len(df_detractors) #8

# num_of_promoters = len(df_promoters)
# num_of_promoters


8
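
The three groups are usually combined into a single Net Promoter Score: the percentage of promoters minus the percentage of detractors. The sketch below assumes the counts noted in the comments of the cell above (78, 34, 8) are still accurate for 2017, Week 7. `nps_score` is a hypothetical helper, not something defined elsewhere in this notebook.

```python
def nps_score(promoters, passives, detractors):
    """Net Promoter Score: percent promoters minus percent detractors."""
    total = promoters + passives + detractors
    if total == 0:
        raise ValueError("no responses to score")
    return 100 * (promoters - detractors) / total


# Counts as recorded in the comments of the cell above (2017, Week 7).
score = nps_score(promoters=78, passives=34, detractors=8)
print(round(score, 1))  # 58.3
```

Passives count toward the total but not toward the difference, which is why they still have to be passed in.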

## Two choices for Pro-Pas-Det Data Divisions:

- **Divide** up your promoters, passives, and detractors into _three_ independent DataFrames
- **Convert** your logic for promoter, passive, and detractor identification into _arguments_ that you can pass to your global DataFrame at any time
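
The two choices above can be sketched side by side. This is an illustration on toy rows, not survey results; the helper names `promoter_mask`, `passive_mask`, and `detractor_mask` are hypothetical, and only the `Rating (Num)` and `Week` column names come from the 2017 data.

```python
import pandas as pd

RATING = "Rating (Num)"  # column name from the 2017 data


def promoter_mask(df):
    """Boolean mask for ratings of 9 or 10."""
    return df[RATING] >= 9


def passive_mask(df):
    """Boolean mask for ratings of 7 or 8."""
    return df[RATING].between(7, 8)  # inclusive on both ends


def detractor_mask(df):
    """Boolean mask for ratings of 0 through 6."""
    return df[RATING] <= 6


# Toy rows for illustration only; these are not survey results.
toy = pd.DataFrame({
    "Week": ["Week 7", "Week 7", "Week 7", "Week 1"],
    RATING: [10, 8, 3, 9],
})
in_week_7 = toy["Week"] == "Week 7"

# Choice 1: materialize three independent DataFrames.
promoters_df = toy[promoter_mask(toy) & in_week_7]
passives_df = toy[passive_mask(toy) & in_week_7]
detractors_df = toy[detractor_mask(toy) & in_week_7]

# Choice 2: keep one DataFrame and combine masks whenever a count is needed.
week_7_counts = {
    "promoters": int((promoter_mask(toy) & in_week_7).sum()),
    "passives": int((passive_mask(toy) & in_week_7).sum()),
    "detractors": int((detractor_mask(toy) & in_week_7).sum()),
}
```

The second choice tends to scale better when the same split is needed per week, per track, and per location, since each new filter is just one more mask to combine with `&`.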

---

# Data Manipulation

In [12]:
promoter_count = 0
passive_count = 0
detractor_count = 0

index = []
columns = []

promoter_df = pd.DataFrame(index=index, columns=columns)
passive_df = pd.DataFrame(index=index, columns=columns)
detractor_df = pd.DataFrame(index=index, columns=columns)

In [13]:
# Build the mask from df_2017; the name `df` is never defined in this notebook.
arg_promoter = (df_2017["Rating (Num)"] >= 9)
promoters = df_2017[arg_promoter]


# week_one = (promoters["Week"] == "Week 1")

# week_one


# len(promoters)
# promoters

In [None]:
df_2017.loc[:, ['ID', 'Track', 'Week', 'Rating (Num)']]


all_students = {}

for index, row in df_2017.iterrows():
    
    