# Data Cleaning for Gender IDEAL

Before committing, please restart the kernel and clear all outputs to avoid merge conflicts between Jupyter and GitHub.

## Imports

In [2]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import altair as alt

## Reading in CSVs

#### Summary of the columns in the Vision file

* There are 44 columns in Vision & Commitment. The multiple-choice answers are not downloaded with their question label. Instead, the answers are the column names, and the first of these columns is the first option a user could choose.
* If the same multiple-choice response text was already used in an earlier question, pandas appends a number to the duplicate column name, like `None.1`.
* Tableau prefers survey data to be "tall and thin", i.e. one row per respondent and question instead of one row per respondent (https://www.tableau.com/about/blog/2018/2/prepare-survey-data-analysis-three-easy-steps-83122). This reshaping can be done in Tableau using its pivot functionality.
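As a sketch of the last two points, the snippet below reads a small, hypothetical wide export from memory (company names and options are made up), shows how pandas renames a repeated option header, and then reshapes the table to "tall and thin" with `DataFrame.melt`. This is a pandas alternative to Tableau's pivot, not the workflow used elsewhere in this notebook.

```python
import io

import pandas as pd

# Hypothetical export: answer options are column names, and both
# questions offer an "Other" option, so that header appears twice.
raw = io.StringIO(
    "Organization,Laundry services,Other,Remote work,Other\n"
    "Company A,Laundry services,,Remote work,\n"
    "Company B,,Other,,Other\n"
)
wide = pd.read_csv(raw)

# pandas de-duplicates the repeated header by appending ".1"
print(list(wide.columns))
# ['Organization', 'Laundry services', 'Other', 'Remote work', 'Other.1']

# Tall and thin: one row per (organization, answer option) pair
tall = wide.melt(id_vars="Organization", var_name="option", value_name="response")

# Blank cells mean the option was not selected, so drop them
tall = tall.dropna(subset=["response"])
print(tall)
```

Dropping the blank rows keeps only the options each organization actually selected, which is usually what a tall survey table needs.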



### Vision
This includes the benchmark questions, such as number of employees, number of years, industry, and workforce type.

It also includes "other relevant info", which is an open-response question. Can we remove this?

In [2]:
inclusion = pd.read_csv("../data_original/24_Benefits_Inclusion_anon.csv")
inclusion

Unnamed: 0,Benefits & Policies: Does your organization provide a universal family leave policy to all employees?,Less than 12 weeks for primary caregiver,Less than 12 weeks for secondary caregiver,12-25 weeks for primary caregiver,12-25 weeks for secondary caregiver,26-51 weeks for primary caregiver,26-51 for secondary caregiver,52+ weeks for primary caregiver,52+ weeks for secondary caregiver,None for primary caregiver,...,"Inclusion, Culture, Training & Community: Does your organization evaluate and confirm alignment on gender equity and diversity standards with potential recipients of philanthropy, community or business partners?","Inclusion, Culture, Training & Community: What % of annual corporate giving - covering both philanthropic or operating dollars (if relevant) - is targeted toward initiatives that have a gender equity and diversity focus?","Inclusion, Culture, Training & Community: Has your organization supported state, local, or federal-level gender equity initiatives? (ie - Business Coalition for Equality, Equal Rights Amendment, funding for reproductive health access).","Inclusion, Culture, Training & Community: Has your organization partnered with other organizations in your industry to advocate for improved standards and practices related to gender equity?","Inclusion, Culture, Training & Community: Before we move to the next section, is there anything else you want to tell us about how your organization approaches creating a gender diverse, inclusive culture and community?",Organization,benchmark_group,number_of_employees,number_of_years,workforce
0,1,,,12-25 weeks for primary caregiver,12-25 weeks for secondary caregiver,,,,,,...,0,20.0,no,"Yes - Yes, we have ongoing or have in the past...",Did not want to share retention metrics.,Company 1X,Technology,"1000-4,999 Employees",5-14 years,At least 80% of employees are Salaried
1,1,Less than 12 weeks for primary caregiver,,,,,,,,,...,0,,no,no,,Company 1W,NonProfit,Fewer than 100 Employees,15 or more years,We have a mixed workforce of hourly and salari...
2,1,,,,,,,,,None for primary caregiver,...,0,,"Yes - we support advocacy for womens, LGBTQ+ a...","Yes - we support advocacy for womens, LGBTQ+ a...",,Company 1V,Finance,Fewer than 100 Employees,5-14 years,At least 80% of employees are Salaried
3,1,Less than 12 weeks for primary caregiver,Less than 12 weeks for secondary caregiver,,,,,,,,...,0,,no,no,,Company 1U,Finance,Fewer than 100 Employees,15 or more years,At least 80% of employees are Salaried
4,1,,,12-25 weeks for primary caregiver,12-25 weeks for secondary caregiver,,,,,,...,0,,no,no,,Company 1T,Technology,100-249 Employees,5-14 years,We have a mixed workforce of hourly and salari...
5,1,,,12-25 weeks for primary caregiver,12-25 weeks for secondary caregiver,,,,,,...,0,0.0,no,no,,Company 1S,Technology,"1000-4,999 Employees",Fewer than 5 years,We have a mixed workforce of hourly and salari...
6,1,,,,,,,,,None for primary caregiver,...,0,0.0,no,no,,Company 1R,Technology,Fewer than 100 Employees,Fewer than 5 years,At least 80% of employees are Salaried
7,1,,,12-25 weeks for primary caregiver,12-25 weeks for secondary caregiver,,,,,,...,0,,No,No,,Company 1Q,Finance,250-999 Employees,5-14 years,At least 80% of employees are Salaried
8,1,Less than 12 weeks for primary caregiver,Less than 12 weeks for secondary caregiver,,,,,,,,...,0,,No,No,,Company 1P,,100-249 Employees,5-14 years,We have a mixed workforce of hourly and salari...
9,1,,,12-25 weeks for primary caregiver,12-25 weeks for secondary caregiver,,,,,,...,0,,no,no,no,Company 1O,Technology,Fewer than 100 Employees,Fewer than 5 years,At least 80% of employees are Salaried


In [3]:
df = inclusion.copy()
df

Unnamed: 0,Benefits & Policies: Does your organization provide a universal family leave policy to all employees?,Less than 12 weeks for primary caregiver,Less than 12 weeks for secondary caregiver,12-25 weeks for primary caregiver,12-25 weeks for secondary caregiver,26-51 weeks for primary caregiver,26-51 for secondary caregiver,52+ weeks for primary caregiver,52+ weeks for secondary caregiver,None for primary caregiver,...,"Inclusion, Culture, Training & Community: Does your organization evaluate and confirm alignment on gender equity and diversity standards with potential recipients of philanthropy, community or business partners?","Inclusion, Culture, Training & Community: What % of annual corporate giving - covering both philanthropic or operating dollars (if relevant) - is targeted toward initiatives that have a gender equity and diversity focus?","Inclusion, Culture, Training & Community: Has your organization supported state, local, or federal-level gender equity initiatives? (ie - Business Coalition for Equality, Equal Rights Amendment, funding for reproductive health access).","Inclusion, Culture, Training & Community: Has your organization partnered with other organizations in your industry to advocate for improved standards and practices related to gender equity?","Inclusion, Culture, Training & Community: Before we move to the next section, is there anything else you want to tell us about how your organization approaches creating a gender diverse, inclusive culture and community?",Organization,benchmark_group,number_of_employees,number_of_years,workforce
0,1,,,12-25 weeks for primary caregiver,12-25 weeks for secondary caregiver,,,,,,...,0,20.0,no,"Yes - Yes, we have ongoing or have in the past...",Did not want to share retention metrics.,Company 1X,Technology,"1000-4,999 Employees",5-14 years,At least 80% of employees are Salaried
1,1,Less than 12 weeks for primary caregiver,,,,,,,,,...,0,,no,no,,Company 1W,NonProfit,Fewer than 100 Employees,15 or more years,We have a mixed workforce of hourly and salari...
2,1,,,,,,,,,None for primary caregiver,...,0,,"Yes - we support advocacy for womens, LGBTQ+ a...","Yes - we support advocacy for womens, LGBTQ+ a...",,Company 1V,Finance,Fewer than 100 Employees,5-14 years,At least 80% of employees are Salaried
3,1,Less than 12 weeks for primary caregiver,Less than 12 weeks for secondary caregiver,,,,,,,,...,0,,no,no,,Company 1U,Finance,Fewer than 100 Employees,15 or more years,At least 80% of employees are Salaried
4,1,,,12-25 weeks for primary caregiver,12-25 weeks for secondary caregiver,,,,,,...,0,,no,no,,Company 1T,Technology,100-249 Employees,5-14 years,We have a mixed workforce of hourly and salari...
5,1,,,12-25 weeks for primary caregiver,12-25 weeks for secondary caregiver,,,,,,...,0,0.0,no,no,,Company 1S,Technology,"1000-4,999 Employees",Fewer than 5 years,We have a mixed workforce of hourly and salari...
6,1,,,,,,,,,None for primary caregiver,...,0,0.0,no,no,,Company 1R,Technology,Fewer than 100 Employees,Fewer than 5 years,At least 80% of employees are Salaried
7,1,,,12-25 weeks for primary caregiver,12-25 weeks for secondary caregiver,,,,,,...,0,,No,No,,Company 1Q,Finance,250-999 Employees,5-14 years,At least 80% of employees are Salaried
8,1,Less than 12 weeks for primary caregiver,Less than 12 weeks for secondary caregiver,,,,,,,,...,0,,No,No,,Company 1P,,100-249 Employees,5-14 years,We have a mixed workforce of hourly and salari...
9,1,,,12-25 weeks for primary caregiver,12-25 weeks for secondary caregiver,,,,,,...,0,,no,no,no,Company 1O,Technology,Fewer than 100 Employees,Fewer than 5 years,At least 80% of employees are Salaried


In [4]:
df.columns

Index(['Benefits & Policies: Does your organization provide a universal family leave policy to all employees?',
       'Less than 12 weeks for primary caregiver',
       'Less than 12 weeks for secondary caregiver',
       '12-25 weeks for primary caregiver',
       '12-25 weeks for secondary caregiver',
       '26-51 weeks for primary caregiver', '26-51 for secondary caregiver',
       '52+ weeks for primary caregiver', '52+ weeks for secondary caregiver',
       'None for primary caregiver',
       ...
       'Inclusion, Culture, Training & Community: Does your organization evaluate and confirm alignment  on gender equity and diversity standards with potential recipients of philanthropy, community or business partners? ',
       'Inclusion, Culture, Training & Community: What % of annual corporate giving - covering both philanthropic or operating dollars (if relevant) - is targeted toward initiatives that have a gender equity and diversity focus? ',
       'Inclusion, Culture, Traini

In [5]:
#Dropping all of the Benefits & Policies Columns
df2 = df.drop(columns={'Benefits & Policies: Does your organization provide a universal family leave policy to all employees?',
       'Less than 12 weeks for primary caregiver',
       'Less than 12 weeks for secondary caregiver',
       '12-25 weeks for primary caregiver',
       '12-25 weeks for secondary caregiver',
       '26-51 weeks for primary caregiver', '26-51 for secondary caregiver',
       '52+ weeks for primary caregiver', '52+ weeks for secondary caregiver',
       'None for primary caregiver','None for secondary caregiver',
       'Benefits & Policies: Does the plan above explicitly provide fully paid leave for adoption or surrogacy?',
       'Benefits & Policies: During the last 12 months, what % of qualified employees took their full paid family leave available? If your policy distinguishes between primary and secondary caregivers, respond to this question only considering all qualified primary caregivers.',
       'Benefits & Policies: If your policy differentiates between primary and secondary caregivers, during the last 12 months, what % of secondary caregivers took their full paid family leave available?  If your policy does not differentiate, just respond with N/A.',
       'Benefits & Policies: In addition to paid family-leave, does your workplace offer additional unpaid leave options for caregivers?',
       'Benefits & Policies: Does your organization have a written policy regarding flexible phase-in schedules for employees returning from family leave?',
       'Benefits & Policies: Does your organization provide guidance to all managers to develop supportive phase-in schedules for employees returning from family leave?',
       'Benefits & Policies: Do all offices and facilities have lactation rooms (windowless or covered windows, locked door, refrigerator and sink available) that are accessible to all lactating employees?',
       'Benefits & Policies: Are there on-site or nearby childcare options for at least 80% of your offices/facilities?',
       'Benefits & Policies: Does your organization subsidize the cost of childcare as a benefit available to all employees? ',
                      'Benefits & Policies: Does your organization cover the cost of the following: ',
       'Other', 'Laundry services', 'Grocery shopping services',
       'Meal prep options', 'On-site tutors*',
       'On-site primary and urgent care*', 'None', 'Other.1',
       'Regular work-from-home schedules',
                      'As-Needed work-from-home schedules',
       '100% remote work-from-home schedules', 'Condensed work-hour schedules',
       'Job-sharing opportunities',
       'Flexible work hours (employee determines schedule based on pre-determined hourly total)',
       'None.1', 'Other.2', 'N/A, no hourly employees',
       'Set shift schedules (advanced shift scheduling)',
       'Shift flexibility (allow for shift-swapping and trading amongst employees)',
                      'Guaranteed minimum payments (regardless of shift cancellations)',
       'None.2', 'Other.3',
       'Stated commitment against harassment (ie - a zero tolerance policy must be followed with clear statements of the behavior that will not be tolerated)',
       'Definitions of harassment that cover sexual harassment and harassment based on gender',
       'Clear definitions and examples covering  inappropriate behavior and inappropriate language',
       'Confidentiality and non-retaliation statements as part of the policy',
       'Reporting and accountability policies for the harassment policy',
       'None.3', 'Other.4', 'Benefits & Policies: Does your organization have a publicly-posted policy that defines gender identity and sexual orientation discrimination, and that states a zero tolerance harassment and discrimination policy?',
       'Clear guidance on what to expect from the whistleblower process, with clarity on how issues are elevated',
       'Whistleblowers with assured neutrality',
       'Ensures protection and anonymity of employees',
       'Articulates a fair process for evaluating concerns as they are reported',
       'Articulates how whistleblowers will be informed and engaged in the process',
       'None.4', 'Other.5', 'Sexual harassment', 'Gender identity', 'Sexual orientation discrimination', 'No',
                      'Benefits & Policies: In the last 12 months, have there been any gender-related or sexual harassment or discrimination claims brought by an employee or former employee against another employee of the company?',
       'Benefits & Policies: Tell us about the outcome or status of those claims.',
       'Benefits & Policies:  Does your organization have a written policy regarding the treatment of an employee who has violated the harassment or discrimination policy that includes escalation if unresolved over time ',
       'Benefits & Policies:  During the last 12 months, what % of employees who have harassed or discriminated against one or more colleagues experienced these repercussions?',
       'Benefits & Policies:  During the last two years, have all workers who have violated discrimination or harassment policies received additional bias and discrimination training?',
       'Benefits & Policies:  Have any workers who have a documented pattern of harassing or discriminating behavior been terminated? ',
       'Contraception coverage (male and female)', 'Abortion services',
       'Fertility treatment coverage (including IVF)', 'Egg freezing','Gender Reassignment surgery', 'Other.6', 'Employee only',
       'Married spouses (same and opposite sex)',
       'Domestic partners (same and opposite sex)', 'Other.7',
       'Benefits & Policies:  Do part-time employees who work 20+ hours qualify for health insurance?',
       'Benefits & Policies:  Is the ratio of monthly health insurance premium to lowest paid worker greater than 10%?',
       "Benefits & Policies:  Does your organization have written gender transition guidelines that documents policies or practices on issues pertinent to a workplace gender transition, including supportive guidance on restroom and facilities access, dress code and internal record-keeping that fully recognize an employee's full-time gender presentation and maximize privacy for the employee. ",
       'Benefits & Policies: Before we move to the next section, is there anything else you want to tell us about how your organization tracks, reports on or is working to improve its benefits and policies system?'})
df2.columns

Index(['Inclusion, Culture, Training & Community: How frequently does your workplace conduct culture survey/assessment that captures experiences impacted by gender identity?\n',
       'Inclusion, Culture, Training & Community: Can employees voluntarily disclose their gender identity and sexual orientation along with other demographic questions such as race, ethnicity, religion, parental status, disabilities in employee satisfaction surveys?',
       'Inclusion, Culture, Training & Community: What are the overall results of your most recent employee satisfaction survey?',
       'Inclusion, Culture, Training & Community: Are results disaggregated by gender, race and ethnicity, sexual orientation, religion or ability?',
       'Black or African American women', 'Latina women', 'Asian women',
       'Native American/Pacific Islander/Alaskan women', 'Mixed race women',
       'White/caucasian women', 'Working moms (of school-aged children)',
       'LGTBQ women', 'Gender non-binary person
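Listing every Benefits & Policies label by hand is error-prone. Assuming that every Benefits & Policies column (including its answer-option columns) appears before the first "Inclusion, Culture, Training & Community" column, a positional slice could do the same job. The helper `drop_before_section` is hypothetical, and the ordering assumption should be checked against the full `df.columns` before relying on it.

```python
import pandas as pd


def drop_before_section(df: pd.DataFrame, prefix: str) -> pd.DataFrame:
    """Keep columns from the first one whose name starts with `prefix` onward.

    Assumes all columns of the earlier section come before that column.
    """
    matches = [i for i, col in enumerate(df.columns) if str(col).startswith(prefix)]
    if not matches:
        raise KeyError(f"no column starts with {prefix!r}")
    return df.iloc[:, matches[0]:]


# Hypothetical miniature of the export's column layout
demo = pd.DataFrame(columns=[
    "Benefits & Policies: Q1", "Laundry services", "None",
    "Inclusion, Culture, Training & Community: Q32", "None.5", "Organization",
])
result = drop_before_section(demo, "Inclusion, Culture, Training & Community:")
print(list(result.columns))
# ['Inclusion, Culture, Training & Community: Q32', 'None.5', 'Organization']
```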

In [6]:
# Other columns to drop (e.g. open-response feedback)

In [7]:
# Rename columns to the survey question number (e.g. Q32), followed by _A, _B, _C, etc. for each option of a multiple-selection question
# Question numbers are from the survey at: https://gender-ideal.org/the-assessment
# The new code replaces the original column label entirely
df2=df2.rename(columns={'Inclusion, Culture, Training & Community: How frequently does your workplace conduct culture'+
                      ' survey/assessment that captures experiences impacted by gender identity?\n': "Q32",
       'Inclusion, Culture, Training & Community: Can employees voluntarily disclose their gender identity and sexual'+
                      ' orientation along with other demographic questions such as race, ethnicity, religion, parental'+
                      ' status, disabilities in employee satisfaction surveys?': "Q33",
       'Inclusion, Culture, Training & Community: What are the overall results of your most recent employee '+
                      'satisfaction survey?': "Q34",
       'Inclusion, Culture, Training & Community: Are results disaggregated by gender, race and ethnicity, sexual '+
                      'orientation, religion or ability?' : "Q35",
       'Black or African American women' : "Q36_A", 'Latina women' : "Q36_B", 'Asian women' : "Q36_C",
       'Native American/Pacific Islander/Alaskan women' : "Q36_D", 'Mixed race women': "Q36_E",
       'White/caucasian women' : "Q36_F", 'Working moms (of school-aged children)' : "Q36_G",
       'LGTBQ women': "Q36_H", 'Gender non-binary persons': "Q36_I",
       'No disparities in satisfaction across different employee groups': "Q36_J",
       'N/A (no disparity)': "Q37_A", 'Shared the results internally with leadership': "Q37_B",
       'Shared the results internally with all employees': "Q37_C", 'None.5': "Q37_D", 'Other.8': "Q37_E",
       'Conducted focus groups with marginalized populations  to identify key issues and brainstorm potential '+
                      'solutions' : "Q38_A",
       'Retention of a third-party to support in the development of a remediation plan': "Q38_B",
       'Assurance of anonymity of individuals that are part of marginalized populations through the process': "Q38_C",
       'Obtained Board approval for remediation plans and funding to implement': "Q38_D",
       'Systems in place to track and report on progress toward meeting goals of remediation plans': "Q38_E",
       'No.1': "Q38_F", 'Other.9': "Q38_Other",
       'Inclusion, Culture, Training & Community: If a plan has been developed, tell us more about the activities '+
                      'and target metrics that will address satisfaction disparities:' : "Q39",
       'Inclusion, Culture, Training & Community: If there was a workplan in place to address different employee '+
                      'satisfaction experiences, have target goals been achieved?' : "Q40",
       'Inclusion, Culture, Training & Community: Does the organization have Employee Resource Groups (ERGs) that '+
                      'are:\n- focused on gender diverse populations\n- receive -C-suite or senior management leadership and \n- budget for programming?' : "Q41",
       'Inclusion, Culture, Training & Community: Do employee satisfaction surveys gather feedback on the value of '+
                      'ERGs?' : "Q42",
       'Inclusion, Culture, Training & Community: Does your workplace have a written policy that explicitly allows '+
                      'workers to wear hair, clothing and jewelry in a way that reflects their personal identity? ' : "Q43",
       "Inclusion, Culture, Training & Community: Does your organization have a written policy providing guidance "+
                      "for how employees share gender pronouns and setting expectations for how employees use and "+
                      "respect each others' pronouns?" : "Q44",
       'Inclusion, Culture, Training & Community: Are there clearly marked unisex restrooms in all offices/faciliti'+
                      'es that are easily accessible for all employees across the organization?' : "Q45",
       'A structured listening format to ensure that all voices are heard' : "Q46_A",
       'Norms to set expectations on conduct and participation and set the "tone" (ie - check-in questions to kick '+
                      'off a meeting)' : "Q46_B",
       'Training for all managers on best practices for meeting inclusiveness' : "Q46_C",
       'None.6': "Q46_D", 'Other.10': "Q46_Other",
       'Inclusion, Culture, Training & Community: Do employee satisfaction surveys gather feedback on effectiveness '+
                      'of inclusive meeting practices?' : "Q47",
       'Guidance on socializing (work dinners, after-hours gatherings) to ensure no inadvertent gender bias - ie. '+
                      '  participants must be included and sensitivity around gender, race/ethnicity, religion, '+
                      'socioeconomic status': "Q48_A",
       'Guidance on travel logistics (car, air, lodging) are applied consistently with all employees regardless of '+
                      'seniority or role' : "Q48_B",
       'Prohibited activities on work travel' : "Q48_C", 'None.7': "Q48_D", 'Other.11': "Q48_Other",
       'microaggressions' : "Q49_A", 'microinequities' : "Q49_B", 'microaffirmations' : "Q49_C", 'None.8' : "Q49_D",
       'Other.12' : "Q49_Other",
       'Conducting in-person trainings and discussions for the whole organization' : "Q50_A",
       'Conducting in-person trainings and discussion for board/advisory groups' : "Q50_B",
       'Engaging third-party organizations to structure and lead trainings' : "Q50_C",
       'Providing on-going opportunities for employees to continue learning and engaging on issues related to bias' 
                       : "Q50_D", 'No.2' :  "Q50_E", 'Other.13' : "Q50_Other",
       'Inclusion, Culture, Training & Community: Are managers trained to establish "working norms" with their team '+
                      'members, designed to create clarity on expected working hours and frameworks to establish '+
                      'expected productivity and efficiency, with the goal of establishing acceptable work-life '+
                      'balance goals for teams?  ' : "Q51",
       'Inclusion, Culture, Training & Community: Are managers and colleagues held accountable to engaging and '+
                      'adhering to the "working norms" standards that are created for their teams?' : "Q52",
       'Inclusion, Culture, Training & Community: Does your workplace have a formal mentoring program that engages '+
                      'all entry or mid-career level gender diverse employees?  Is guidance on the purpose and '+
                      'structure of the mentoring program provided to ensure value and benefit is derived to all '+
                      'participants of the program? ' : "Q53",
       'Inclusion, Culture, Training & Community: Does your workplace take steps to encourage allyship through '+
                      'allyship training, integration into job descriptions or compensation plans?' : "Q54",
       'Inclusion, Culture, Training & Community: Do employee satisfaction surveys ask about the impact of mentor'+
                      'ship programs on entry and mid-level staff? If so, what are the results?' : "Q55",
       'Black or African American women.1' : "Q56_A", 'Latina women.1' : "Q56_B", 'Asian women.1' : "Q56_C",
       'Native American/Pacific Islander/Alaskan women.1' : "Q56_D",
       'Mixed race women.1' : "Q56_E", 'White/caucasian women.1' : "Q56_F",
       'Working moms (of school-aged children).1' : "Q56_G", 'LGTBQ women.1' : "Q56_H",
       'Gender non-binary persons.1'  : "Q56_I", 'No disparities in retention' : "Q56_J",
       'Unknown/not sure'  : "Q56_K", 'N/A (No difference in turnover rates)' : "Q57_A",
       'Shared with C-suite and Board' : "Q57_B",
       'Resulted in key populations being engaged in focus groups to better identify the issues resulting in '+
                      'higher-than-average turnover' : "Q57_C",
       'Developed into an action plan' : "Q57_D",
       'Action plan approved by the board with necessary budget allocation to reduce turnover rates amongst these '+
                      'populations' : "Q57_E",
       'No action taken yet' : "Q57_F",
       'Inclusion, Culture, Training & Community: If a retention action plan has been in place for at least 12 months, '+
                      'have the retention goals been met?' : "Q58",
       'Inclusion, Culture, Training & Community: Is there a written policy that sets expectations for gender equity '+
                      'and diversity representation at all internal and external-facing organization events or events '+
                      'at which the organization presents?' : "Q59",
       'Inclusion, Culture, Training & Community: Is there a written policy about the expected  gender equity and '+
                      'diversity alignment  standards that all philanthropic, community or other partner organizations '+
                      'would be expected to meet?' : "Q60",
       'Inclusion, Culture, Training & Community: Does your organization evaluate and confirm alignment  on gender '+
                      'equity and diversity standards with potential recipients of philanthropy, community or business '+
                      'partners? ' : "Q61",
       'Inclusion, Culture, Training & Community: What % of annual corporate giving - covering both philanthropic '+
                      'or operating dollars (if relevant) - is targeted toward initiatives that have a gender '+
                      'equity and diversity focus? ' : "Q62",
       'Inclusion, Culture, Training & Community: Has your organization supported state, local, or federal-level '+
                      'gender equity initiatives? (ie - Business Coalition for Equality, Equal Rights Amendment, '+
                      'funding for reproductive health access).' : "Q63",
       'Inclusion, Culture, Training & Community: Has your organization partnered with other organizations in your '+
                      'industry to advocate for improved standards and practices related to gender equity?' : "Q64",
       'Inclusion, Culture, Training & Community: Before we move to the next section, is there anything else you '+
                      'want to tell us about how your organization approaches creating a gender diverse, inclusive '+
                      'culture and community?' : "Q65"
    
})
df2

Unnamed: 0,Q32,Q33,Q34,Q35,Q36_A,Q36_B,Q36_C,Q36_D,Q36_E,Q36_F,...,Q61,Q62,Q63,Q64,Q65,Organization,benchmark_group,number_of_employees,number_of_years,workforce
0,Annually,1,<85% employees are satisfied or highly satisfied,1.0,,,,,,,...,0,20.0,no,"Yes - Yes, we have ongoing or have in the past...",Did not want to share retention metrics.,Company 1X,Technology,"1000-4,999 Employees",5-14 years,At least 80% of employees are Salaried
1,,0,N/A (no satisfaction surveys conducted),,,,,,,,...,0,,no,no,,Company 1W,NonProfit,Fewer than 100 Employees,15 or more years,We have a mixed workforce of hourly and salari...
2,,0,N/A (no satisfaction surveys conducted),,,,,,,,...,0,,"Yes - we support advocacy for womens, LGBTQ+ a...","Yes - we support advocacy for womens, LGBTQ+ a...",,Company 1V,Finance,Fewer than 100 Employees,5-14 years,At least 80% of employees are Salaried
3,,1,85-95% employees are satisfied or highly satis...,0.0,,,,,,,...,0,,no,no,,Company 1U,Finance,Fewer than 100 Employees,15 or more years,At least 80% of employees are Salaried
4,Semi-annually,1,85-95% employees are satisfied or highly satis...,1.0,,,,,,,...,0,,no,no,,Company 1T,Technology,100-249 Employees,5-14 years,We have a mixed workforce of hourly and salari...
5,,0,<85% employees are satisfied or highly satisfied,1.0,,,,,,,...,0,0.0,no,no,,Company 1S,Technology,"1000-4,999 Employees",Fewer than 5 years,We have a mixed workforce of hourly and salari...
6,,0,N/A (no satisfaction surveys conducted),,,,,,,,...,0,0.0,no,no,,Company 1R,Technology,Fewer than 100 Employees,Fewer than 5 years,At least 80% of employees are Salaried
7,Semi-annually,1,<85% employees are satisfied or highly satisfied,1.0,,,,,,,...,0,,No,No,,Company 1Q,Finance,250-999 Employees,5-14 years,At least 80% of employees are Salaried
8,,0,85-95% employees are satisfied or highly satis...,0.0,,,,,,,...,0,,No,No,,Company 1P,,100-249 Employees,5-14 years,We have a mixed workforce of hourly and salari...
9,In process of implementing,1,N/A (no satisfaction surveys conducted),,,,,,,,...,0,,no,no,no,Company 1O,Technology,Fewer than 100 Employees,Fewer than 5 years,At least 80% of employees are Salaried


In [8]:
#Printing the renamed columns
df2.columns

Index(['Q32', 'Q33', 'Q34', 'Q35', 'Q36_A', 'Q36_B', 'Q36_C', 'Q36_D', 'Q36_E',
       'Q36_F', 'Q36_G', 'Q36_H', 'Q36_I', 'Q36_J', 'Q37_A', 'Q37_B', 'Q37_C',
       'Q37_D', 'Q37_E', 'Q38_A', 'Q38_B', 'Q38_C', 'Q38_D', 'Q38_E', 'Q38_F',
       'Q38_Other', 'Q39', 'Q40', 'Q41', 'Q42', 'Q43', 'Q44', 'Q45', 'Q46_A',
       'Q46_B', 'Q46_C', 'Q46_D', 'Q46_Other', 'Q47', 'Q48_A', 'Q48_B',
       'Q48_C', 'Q48_D', 'Q48_Other', 'Q49_A', 'Q49_B', 'Q49_C', 'Q49_D',
       'Q49_Other', 'Q50_A', 'Q50_B', 'Q50_C', 'Q50_D', 'Q50_E', 'Q50_Other',
       'Q51', 'Q52', 'Q53', 'Q54', 'Q55', 'Q56_A', 'Q56_B', 'Q56_C', 'Q56_D',
       'Q56_E', 'Q56_F', 'Q56_G', 'Q56_H', 'Q56_I', 'Q56_J', 'Q56_K', 'Q57_A',
       'Q57_B', 'Q57_C', 'Q57_D', 'Q57_E', 'Q57_F', 'Q58', 'Q59', 'Q60', 'Q61',
       'Q62', 'Q63', 'Q64', 'Q65', 'Organization', 'benchmark_group',
       'number_of_employees', 'number_of_years', 'workforce'],
      dtype='object')
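By default, `DataFrame.rename` silently ignores mapping keys that match no column (`errors="ignore"`), so a label typed with a different amount of whitespace would just keep its long name. Several survey labels contain double spaces (e.g. "alignment  on gender"), so passing `errors="raise"` is a cheap safeguard. A minimal sketch with a hypothetical, shortened label:

```python
import pandas as pd

# Hypothetical, shortened survey label that contains a double space
label = "Does your organization evaluate and confirm alignment  on gender equity?"
demo = pd.DataFrame(columns=[label, "Organization"])

# This key was typed with a single space, so it does not match the label
mapping = {"Does your organization evaluate and confirm alignment on gender equity?": "Q61"}

# Default errors="ignore": the unmatched key is skipped without any message
renamed = demo.rename(columns=mapping)
print(list(renamed.columns))

# errors="raise" turns the silent miss into a KeyError
caught = False
try:
    demo.rename(columns=mapping, errors="raise")
except KeyError as exc:
    caught = True
    print("Unmatched rename keys:", exc)
```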

In [9]:
# Continue with the name df (this aliases df2; it does not copy it)
df = df2

### Examining Each Question
#### Question 32

In [10]:
# Select the Organization and Q32 columns
# This is a single-choice question without an "Other" option
df[['Organization',"Q32"]]

Unnamed: 0,Organization,Q32
0,Company 1X,Annually
1,Company 1W,
2,Company 1V,
3,Company 1U,
4,Company 1T,Semi-annually
5,Company 1S,
6,Company 1R,
7,Company 1Q,Semi-annually
8,Company 1P,
9,Company 1O,In process of implementing


Six of the ten organizations (1W, 1V, 1U, 1S, 1R and 1P) left Q32 blank, so this question does have missing data.
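The blank responses can be listed programmatically instead of by eye. The sketch below rebuilds Q32 from the output above (NaN stands for a blank cell); in the notebook itself the same selection is `df.loc[df["Q32"].isna(), "Organization"]`.

```python
import numpy as np
import pandas as pd

# Q32 responses copied from the output above
q32 = pd.DataFrame({
    "Organization": [f"Company 1{c}" for c in "XWVUTSRQPO"],
    "Q32": ["Annually", np.nan, np.nan, np.nan, "Semi-annually", np.nan, np.nan,
            "Semi-annually", np.nan, "In process of implementing"],
})

# Organizations whose Q32 cell is empty
blank = q32.loc[q32["Q32"].isna(), "Organization"]
print(f"{len(blank)} of {len(q32)} organizations left Q32 blank: {', '.join(blank)}")
# 6 of 10 organizations left Q32 blank: Company 1W, Company 1V, Company 1U, Company 1S, Company 1R, Company 1P
```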

#### Question 33

In [11]:
# Select the Organization and Q33 columns
# This is a single-choice question without an "Other" option
df[['Organization',"Q33"]]

Unnamed: 0,Organization,Q33
0,Company 1X,1
1,Company 1W,0
2,Company 1V,0
3,Company 1U,1
4,Company 1T,1
5,Company 1S,0
6,Company 1R,0
7,Company 1Q,1
8,Company 1P,0
9,Company 1O,1


Every organization has a response (0 or 1), so there is no missing data in this question.
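The same conclusion can be checked in code. This sketch copies the Q33 values from the output above and confirms that the column has no blanks and contains only 0 and 1.

```python
import pandas as pd

# Q33 values copied from the output above
q33 = pd.Series([1, 0, 0, 1, 1, 0, 0, 1, 0, 1], name="Q33")

has_missing = bool(q33.isna().any())   # any blank cells?
is_binary = bool(q33.isin([0, 1]).all())  # only 0/1 values?
print(has_missing, is_binary)
# False True
```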

#### Question 34

In [12]:
# Select the Organization and Q34 columns
# This is a single-choice question without an "Other" option
df[['Organization',"Q34"]]

Unnamed: 0,Organization,Q34
0,Company 1X,<85% employees are satisfied or highly satisfied
1,Company 1W,N/A (no satisfaction surveys conducted)
2,Company 1V,N/A (no satisfaction surveys conducted)
3,Company 1U,85-95% employees are satisfied or highly satis...
4,Company 1T,85-95% employees are satisfied or highly satis...
5,Company 1S,<85% employees are satisfied or highly satisfied
6,Company 1R,N/A (no satisfaction surveys conducted)
7,Company 1Q,<85% employees are satisfied or highly satisfied
8,Company 1P,85-95% employees are satisfied or highly satis...
9,Company 1O,N/A (no satisfaction surveys conducted)


In [13]:
#Some companies did not respond so I will fill in with No Response
df.loc[pd.isna(df["Q34"]),"Q34"]="No Response"
df[['Organization',"Q34"]]

Unnamed: 0,Organization,Q34
0,Company 1X,<85% employees are satisfied or highly satisfied
1,Company 1W,N/A (no satisfaction surveys conducted)
2,Company 1V,N/A (no satisfaction surveys conducted)
3,Company 1U,85-95% employees are satisfied or highly satis...
4,Company 1T,85-95% employees are satisfied or highly satis...
5,Company 1S,<85% employees are satisfied or highly satisfied
6,Company 1R,N/A (no satisfaction surveys conducted)
7,Company 1Q,<85% employees are satisfied or highly satisfied
8,Company 1P,85-95% employees are satisfied or highly satis...
9,Company 1O,N/A (no satisfaction surveys conducted)
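The mask-and-assign pattern used here for Q34 appears again for Q42, Q47, Q53 and other questions. It can also be written with `Series.fillna`. This small sketch checks that the two forms give the same result:

```python
import numpy as np
import pandas as pd

q34 = pd.Series([
    "<85% employees are satisfied or highly satisfied",
    np.nan,
    "N/A (no satisfaction surveys conducted)",
])

# Pattern used in the notebook: boolean mask + .loc assignment
via_loc = q34.copy()
via_loc.loc[via_loc.isna()] = "No Response"

# Equivalent one-liner
via_fillna = q34.fillna("No Response")

print(via_fillna.tolist())
```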


#### Question 35

In [14]:
#Selecting the Q35 column from the dataframe
#This is a single choice question without an other

df[['Organization',"Q35"]]

Unnamed: 0,Organization,Q35
0,Company 1X,1.0
1,Company 1W,
2,Company 1V,
3,Company 1U,0.0
4,Company 1T,1.0
5,Company 1S,1.0
6,Company 1R,
7,Company 1Q,1.0
8,Company 1P,0.0
9,Company 1O,


In [15]:
#Flory's recommendations
# If blank and prior response was "N/A"
#then fill in with "N/A - No satisfaction survey conducted"
df[['Organization',"Q34","Q35"]]

Unnamed: 0,Organization,Q34,Q35
0,Company 1X,<85% employees are satisfied or highly satisfied,1.0
1,Company 1W,N/A (no satisfaction surveys conducted),
2,Company 1V,N/A (no satisfaction surveys conducted),
3,Company 1U,85-95% employees are satisfied or highly satis...,0.0
4,Company 1T,85-95% employees are satisfied or highly satis...,1.0
5,Company 1S,<85% employees are satisfied or highly satisfied,1.0
6,Company 1R,N/A (no satisfaction surveys conducted),
7,Company 1Q,<85% employees are satisfied or highly satisfied,1.0
8,Company 1P,85-95% employees are satisfied or highly satis...,0.0
9,Company 1O,N/A (no satisfaction surveys conducted),


In [16]:
#Where Q35 is NaN, copy in the Q34 value (either "No Response" or "N/A (no satisfaction surveys conducted)")
df.loc[pd.isna(df["Q35"]),"Q35"]=df.loc[pd.isna(df["Q35"]),"Q34"]
df[['Organization',"Q34","Q35"]]

Unnamed: 0,Organization,Q34,Q35
0,Company 1X,<85% employees are satisfied or highly satisfied,1.0
1,Company 1W,N/A (no satisfaction surveys conducted),N/A (no satisfaction surveys conducted)
2,Company 1V,N/A (no satisfaction surveys conducted),N/A (no satisfaction surveys conducted)
3,Company 1U,85-95% employees are satisfied or highly satis...,0.0
4,Company 1T,85-95% employees are satisfied or highly satis...,1.0
5,Company 1S,<85% employees are satisfied or highly satisfied,1.0
6,Company 1R,N/A (no satisfaction surveys conducted),N/A (no satisfaction surveys conducted)
7,Company 1Q,<85% employees are satisfied or highly satisfied,1.0
8,Company 1P,85-95% employees are satisfied or highly satis...,0.0
9,Company 1O,N/A (no satisfaction surveys conducted),N/A (no satisfaction surveys conducted)
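Copying Q34 into the blank Q35 cells is a row-aligned fallback. `Series.fillna` accepts this directly when it is given another Series. Here is a sketch on three toy rows modeled on the output above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Q34": ["<85% employees are satisfied or highly satisfied",
            "N/A (no satisfaction surveys conducted)",
            "No Response"],
    "Q35": [1.0, np.nan, np.nan],
})

# Keep Q35 where present, otherwise take the Q34 value from the same row
df["Q35"] = df["Q35"].fillna(df["Q34"])
print(df["Q35"].tolist())
```

Either way, the column ends up mixing floats and strings. That is why the later cells compare Q35 against both `0.0` and text labels.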


#### Question 36

In [17]:
#Selecting all of the Q36 columns from the dataframe
#Choose all that apply

df[['Organization','Q36_A', 'Q36_B', 'Q36_C', 'Q36_D', 'Q36_E',
       'Q36_F', 'Q36_G', 'Q36_H', 'Q36_I', 'Q36_J']]

Unnamed: 0,Organization,Q36_A,Q36_B,Q36_C,Q36_D,Q36_E,Q36_F,Q36_G,Q36_H,Q36_I,Q36_J
0,Company 1X,,,,,,,,,,
1,Company 1W,,,,,,,,,,
2,Company 1V,,,,,,,,,,
3,Company 1U,,,,,,,,,,
4,Company 1T,,,,,,,Working moms (of school-aged children),,,
5,Company 1S,,,,,,,,,,No disparities in satisfaction across differen...
6,Company 1R,,,,,,,,,,
7,Company 1Q,,,,,,,,,,
8,Company 1P,,,,,,,,,,
9,Company 1O,,,,,,,,,,


In [18]:
#Flory advises: if blank and #35 was "No" or #34 was "N/A", fill in with "N/A - No satisfaction survey conducted"
#Q36_K will contain the N/A (no survey) label
df["Q36_K"] = np.nan
#Q36_L will contain No Response
df["Q36_L"] = np.nan
#Q36_M would contain N/A (no disaggregation); currently unused
#df["Q36_M"] = np.nan

#Following above logic
df.loc[(df["Q35"]=="N/A (no satisfaction surveys conducted)"),"Q36_K"]="N/A (no satisfaction surveys conducted) and/or results not disaggregated  by gender, race and ethnicity, sexual orientation, religion or ability"
df.loc[(df["Q35"]==0.0),"Q36_K"]="N/A (no satisfaction surveys conducted) and/or results not disaggregated  by gender, race and ethnicity, sexual orientation, religion or ability"
df.loc[(df["Q35"]=="No Response" ),"Q36_L"]="No Response"

#Companies 1X and 1Q answered Yes to Q35 but left Q36 blank, so mark them as No Response
df.loc[df["Organization"].isin(["Company 1X","Company 1Q"]),"Q36_L"] = "No Response"

df[['Organization','Q36_A', 'Q36_B', 'Q36_C', 'Q36_D', 'Q36_E',
       'Q36_F', 'Q36_G', 'Q36_H', 'Q36_I', 'Q36_J','Q36_K',"Q36_L","Q35"]]

Unnamed: 0,Organization,Q36_A,Q36_B,Q36_C,Q36_D,Q36_E,Q36_F,Q36_G,Q36_H,Q36_I,Q36_J,Q36_K,Q36_L,Q35
0,Company 1X,,,,,,,,,,,,No Response,1.0
1,Company 1W,,,,,,,,,,,N/A (no satisfaction surveys conducted) and/or...,,N/A (no satisfaction surveys conducted)
2,Company 1V,,,,,,,,,,,N/A (no satisfaction surveys conducted) and/or...,,N/A (no satisfaction surveys conducted)
3,Company 1U,,,,,,,,,,,N/A (no satisfaction surveys conducted) and/or...,,0.0
4,Company 1T,,,,,,,Working moms (of school-aged children),,,,,,1.0
5,Company 1S,,,,,,,,,,No disparities in satisfaction across differen...,,,1.0
6,Company 1R,,,,,,,,,,,N/A (no satisfaction surveys conducted) and/or...,,N/A (no satisfaction surveys conducted)
7,Company 1Q,,,,,,,,,,,,No Response,1.0
8,Company 1P,,,,,,,,,,,N/A (no satisfaction surveys conducted) and/or...,,0.0
9,Company 1O,,,,,,,,,,,N/A (no satisfaction surveys conducted) and/or...,,N/A (no satisfaction surveys conducted)
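The same N/A / No Response logic is repeated for Q36, Q37 and Q38. Only the column names and the hand-picked organizations change. Below is a sketch of a reusable helper; `add_skip_columns` is a hypothetical name, and the rules are copied from the cells in this section. It creates the new columns with object dtype, because recent pandas versions warn when strings are assigned into an all-NaN float column such as the one `np.nan` creates.

```python
import numpy as np
import pandas as pd

NO_SURVEY = "N/A (no satisfaction surveys conducted)"
# Label copied verbatim from the notebook cells (including its double space)
NA_LABEL = ("N/A (no satisfaction surveys conducted) and/or results not disaggregated  "
            "by gender, race and ethnicity, sexual orientation, religion or ability")


def add_skip_columns(df, na_col, nr_col, yes_but_blank=()):
    """Hypothetical helper: add the N/A and No Response columns for one question."""
    # Object dtype so assigning strings does not trigger a dtype-upcast warning
    df[na_col] = pd.Series(np.nan, index=df.index, dtype=object)
    df[nr_col] = pd.Series(np.nan, index=df.index, dtype=object)

    # Q35 was "No" (0.0) or no survey was conducted -> N/A label
    no_survey = (df["Q35"] == NO_SURVEY) | (df["Q35"] == 0.0)
    df.loc[no_survey, na_col] = NA_LABEL

    # Q35 had no response, or the org said Yes but left this question blank
    no_resp = (df["Q35"] == "No Response") | df["Organization"].isin(yes_but_blank)
    df.loc[no_resp, nr_col] = "No Response"
    return df


# Toy rows modeled on the output above
df = pd.DataFrame({
    "Organization": ["Company 1X", "Company 1W", "Company 1U", "Company 1T"],
    "Q35": [1.0, NO_SURVEY, 0.0, 1.0],
})
for na_col, nr_col, orgs in [("Q36_K", "Q36_L", ["Company 1X", "Company 1Q"]),
                             ("Q37_F", "Q37_G", ["Company 1Q"]),
                             ("Q38_H", "Q38_I", ["Company 1Q"])]:
    add_skip_columns(df, na_col, nr_col, orgs)

print(df[["Organization", "Q36_K", "Q36_L"]])
```

On the real frame, the loop would stand in for the three per-question cells, with the organization lists taken from the comments in those cells.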


#### Question 37

In [19]:
df[['Organization','Q37_A', 'Q37_B', 'Q37_C',
       'Q37_D', 'Q37_E']]

Unnamed: 0,Organization,Q37_A,Q37_B,Q37_C,Q37_D,Q37_E
0,Company 1X,,Shared the results internally with leadership,,,
1,Company 1W,,,,,
2,Company 1V,,,,,
3,Company 1U,,,,,
4,Company 1T,,,,,In progress
5,Company 1S,N/A (no disparity),,,,
6,Company 1R,,,,,
7,Company 1Q,,,,,
8,Company 1P,,,,,
9,Company 1O,,,,,


In [20]:
#Flory advises: if blank and #35 was "No" or #34 was "N/A", fill in with "N/A - No satisfaction survey conducted"
#Q37_F will contain the N/A (no survey) label
df["Q37_F"] = np.nan
#Q37_G will contain No Response
df["Q37_G"] = np.nan

#Following above logic
df.loc[(df["Q35"]=="N/A (no satisfaction surveys conducted)"),"Q37_F"]="N/A (no satisfaction surveys conducted) and/or results not disaggregated  by gender, race and ethnicity, sexual orientation, religion or ability"
df.loc[(df["Q35"]==0.0),"Q37_F"]="N/A (no satisfaction surveys conducted) and/or results not disaggregated  by gender, race and ethnicity, sexual orientation, religion or ability"
df.loc[(df["Q35"]=="No Response" ),"Q37_G"]="No Response"

#Company 1Q answered Yes to Q35 but left Q37 blank, so mark it as No Response
df.loc[df["Organization"].isin(["Company 1Q"]),"Q37_G"] = "No Response"

df[['Organization','Q37_A', 'Q37_B', 'Q37_C',
       'Q37_D', 'Q37_E', 'Q37_F', "Q37_G","Q35"]]

Unnamed: 0,Organization,Q37_A,Q37_B,Q37_C,Q37_D,Q37_E,Q37_F,Q37_G,Q35
0,Company 1X,,Shared the results internally with leadership,,,,,,1.0
1,Company 1W,,,,,,N/A (no satisfaction surveys conducted) and/or...,,N/A (no satisfaction surveys conducted)
2,Company 1V,,,,,,N/A (no satisfaction surveys conducted) and/or...,,N/A (no satisfaction surveys conducted)
3,Company 1U,,,,,,N/A (no satisfaction surveys conducted) and/or...,,0.0
4,Company 1T,,,,,In progress,,,1.0
5,Company 1S,N/A (no disparity),,,,,,,1.0
6,Company 1R,,,,,,N/A (no satisfaction surveys conducted) and/or...,,N/A (no satisfaction surveys conducted)
7,Company 1Q,,,,,,,No Response,1.0
8,Company 1P,,,,,,N/A (no satisfaction surveys conducted) and/or...,,0.0
9,Company 1O,,,,,,N/A (no satisfaction surveys conducted) and/or...,,N/A (no satisfaction surveys conducted)


#### Question 38

In [21]:
df[['Organization','Q38_A', 'Q38_B', 'Q38_C', 'Q38_D', 'Q38_E', 'Q38_F',
       'Q38_Other']]

Unnamed: 0,Organization,Q38_A,Q38_B,Q38_C,Q38_D,Q38_E,Q38_F,Q38_Other
0,Company 1X,,,,,,,Plan is in the process of being developed
1,Company 1W,,,,,,,
2,Company 1V,,,,,,,
3,Company 1U,,,,,,,
4,Company 1T,Conducted focus groups with marginalized popul...,,,,,,
5,Company 1S,,,,,,,N/A - No disparity
6,Company 1R,,,,,,,
7,Company 1Q,,,,,,,
8,Company 1P,,,,,,,
9,Company 1O,,,,,,,


In [22]:
#Flory advises: if blank and #35 was "No" or #34 was "N/A", fill in with "N/A - No satisfaction survey conducted"
#Q38_H will contain the N/A (no survey) label
df["Q38_H"] = np.nan
#Q38_I will contain No Response
df["Q38_I"] = np.nan

#Following above logic
df.loc[(df["Q35"]=="N/A (no satisfaction surveys conducted)"),"Q38_H"]="N/A (no satisfaction surveys conducted) and/or results not disaggregated  by gender, race and ethnicity, sexual orientation, religion or ability"
df.loc[(df["Q35"]==0.0),"Q38_H"]="N/A (no satisfaction surveys conducted) and/or results not disaggregated  by gender, race and ethnicity, sexual orientation, religion or ability"
df.loc[(df["Q35"]=="No Response" ),"Q38_I"]="No Response"

#Company 1Q answered Yes to Q35 but left Q38 blank, so mark it as No Response
df.loc[df["Organization"].isin(["Company 1Q"]),"Q38_I"] = "No Response"

df[['Organization','Q38_A', 'Q38_B', 'Q38_C', 'Q38_D', 'Q38_E', 'Q38_F',
       'Q38_Other',"Q38_H","Q38_I","Q35"]]

Unnamed: 0,Organization,Q38_A,Q38_B,Q38_C,Q38_D,Q38_E,Q38_F,Q38_Other,Q38_H,Q38_I,Q35
0,Company 1X,,,,,,,Plan is in the process of being developed,,,1.0
1,Company 1W,,,,,,,,N/A (no satisfaction surveys conducted) and/or...,,N/A (no satisfaction surveys conducted)
2,Company 1V,,,,,,,,N/A (no satisfaction surveys conducted) and/or...,,N/A (no satisfaction surveys conducted)
3,Company 1U,,,,,,,,N/A (no satisfaction surveys conducted) and/or...,,0.0
4,Company 1T,Conducted focus groups with marginalized popul...,,,,,,,,,1.0
5,Company 1S,,,,,,,N/A - No disparity,,,1.0
6,Company 1R,,,,,,,,N/A (no satisfaction surveys conducted) and/or...,,N/A (no satisfaction surveys conducted)
7,Company 1Q,,,,,,,,,No Response,1.0
8,Company 1P,,,,,,,,N/A (no satisfaction surveys conducted) and/or...,,0.0
9,Company 1O,,,,,,,,N/A (no satisfaction surveys conducted) and/or...,,N/A (no satisfaction surveys conducted)


#### Question 39

In [23]:
#Q39 is a free-text column and only 3 organizations responded, so I don't think we need to visualize it
df["Q39"]

0                                                   NaN
1                                                   NaN
2                                                   NaN
3                                                   NaN
4     We are in the process of designing the focus g...
5                                                   NaN
6                                                   NaN
7                                                   NaN
8                                                   NaN
9                                                   NaN
10                                                  NaN
11                                                  NaN
12                                                  NaN
13                                                  NaN
14                                                  NaN
15    We have an active task force for Diversity, Eq...
16    Plans to better structure performance review a...
17                                              
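To back up the "only 3 responded" count, since the printed Series above is cut off, you can count the non-blank answers directly. Below is a sketch with placeholder text. It also treats whitespace-only strings as blank, which is an assumption about how empty free-text boxes might be exported.

```python
import numpy as np
import pandas as pd

# Placeholder answers; the real Q39 text is not reproduced here
df = pd.DataFrame({
    "Organization": ["Company 1X", "Company 1T", "Company 1S", "Company 1R"],
    "Q39": [np.nan, "(free-text answer)", "   ", np.nan],
})

# Strip whitespace and treat empty strings as missing
q39 = df["Q39"].str.strip().replace("", np.nan)

n_answered = int(q39.notna().sum())
answered_by = df.loc[q39.notna(), "Organization"].tolist()
print(n_answered, answered_by)
```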

#### Question 40

In [24]:
#This is a single choice but Flory advised blanks should be filled in differently depending on earlier values
#If blank and #35 was "No" or #34 "N/A", then fill in with "N/A - No targets were established"
df[["Organization","Q35","Q34","Q40"]]

Unnamed: 0,Organization,Q35,Q34,Q40
0,Company 1X,1.0,<85% employees are satisfied or highly satisfied,N/A - No targets were established
1,Company 1W,N/A (no satisfaction surveys conducted),N/A (no satisfaction surveys conducted),N/A - No targets were established
2,Company 1V,N/A (no satisfaction surveys conducted),N/A (no satisfaction surveys conducted),N/A - No targets were established
3,Company 1U,0.0,85-95% employees are satisfied or highly satis...,
4,Company 1T,1.0,85-95% employees are satisfied or highly satis...,N/A - No targets were established
5,Company 1S,1.0,<85% employees are satisfied or highly satisfied,N/A - No targets were established
6,Company 1R,N/A (no satisfaction surveys conducted),N/A (no satisfaction surveys conducted),N/A - No targets were established
7,Company 1Q,1.0,<85% employees are satisfied or highly satisfied,N/A - No targets were established
8,Company 1P,0.0,85-95% employees are satisfied or highly satis...,
9,Company 1O,N/A (no satisfaction surveys conducted),N/A (no satisfaction surveys conducted),N/A - No targets were established


In [25]:
#This is a single choice but Flory advised blanks should be filled in differently depending on earlier values
#If blank and #35 was "No" or #34 "N/A", then fill in with "N/A - No targets were established"
blank_q40 = pd.isna(df["Q40"])
no_targets = (df["Q35"] == 0.0) | (df["Q34"] == "N/A (no satisfaction surveys conducted)")
df.loc[blank_q40 & no_targets, "Q40"] = "N/A - No targets were established"

#Any remaining blanks get No Response, as for the other questions
df.loc[pd.isna(df["Q40"]), "Q40"] = "No Response"

df[["Organization","Q35","Q34","Q40"]]

Unnamed: 0,Organization,Q35,Q34,Q40
0,Company 1X,1.0,<85% employees are satisfied or highly satisfied,N/A - No targets were established
1,Company 1W,N/A (no satisfaction surveys conducted),N/A (no satisfaction surveys conducted),N/A - No targets were established
2,Company 1V,N/A (no satisfaction surveys conducted),N/A (no satisfaction surveys conducted),N/A - No targets were established
3,Company 1U,0.0,85-95% employees are satisfied or highly satis...,N/A - No targets were established
4,Company 1T,1.0,85-95% employees are satisfied or highly satis...,N/A - No targets were established
5,Company 1S,1.0,<85% employees are satisfied or highly satisfied,N/A - No targets were established
6,Company 1R,N/A (no satisfaction surveys conducted),N/A (no satisfaction surveys conducted),N/A - No targets were established
7,Company 1Q,1.0,<85% employees are satisfied or highly satisfied,N/A - No targets were established
8,Company 1P,0.0,85-95% employees are satisfied or highly satis...,N/A - No targets were established
9,Company 1O,N/A (no satisfaction surveys conducted),N/A (no satisfaction surveys conducted),N/A - No targets were established


#### Question 41

In [26]:
#Don't see any missing values
df[["Organization","Q41"]]

Unnamed: 0,Organization,Q41
0,Company 1X,1
1,Company 1W,1
2,Company 1V,0
3,Company 1U,0
4,Company 1T,0
5,Company 1S,0
6,Company 1R,0
7,Company 1Q,1
8,Company 1P,0
9,Company 1O,0
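Several single-choice questions (for example Q33, Q41, Q43 to Q45 and Q47) are stored as 0/1 codes. For charts, it may help to map them to labels. This sketch assumes 1 means Yes and 0 means No, consistent with the Q35 cells treating `0.0` as "No". `to_yes_no` is a hypothetical helper that leaves any other value, such as "No Response", untouched.

```python
import pandas as pd

YES_NO = {1: "Yes", 0: "No"}


def to_yes_no(s):
    """Map 0/1 codes (int or float) to Yes/No; keep other values such as "No Response"."""
    # 1.0 and 1 hash the same, so float codes match the int keys
    return s.map(lambda v: YES_NO.get(v, v))


q41 = pd.Series([1, 1, 0, 0])               # int codes, like Q41
q47 = pd.Series(["No Response", 0.0, 1.0])  # mixed, like Q47 after recoding
print(to_yes_no(q41).tolist())
print(to_yes_no(q47).tolist())
```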


#### Question 42

In [27]:
#Some companies had missing values
df.loc[pd.isna(df["Q42"]),"Q42"]="No Response"
df[['Organization',"Q42"]]

Unnamed: 0,Organization,Q42
0,Company 1X,No
1,Company 1W,N/A (no satisfaction surveys conducted)
2,Company 1V,N/A (no satisfaction surveys conducted)
3,Company 1U,No
4,Company 1T,No
5,Company 1S,No
6,Company 1R,N/A (no satisfaction surveys conducted)
7,Company 1Q,No
8,Company 1P,N/A (no satisfaction surveys conducted)
9,Company 1O,N/A (no satisfaction surveys conducted)


#### Question 43

In [28]:
#No missing values
df[['Organization',"Q43"]]

Unnamed: 0,Organization,Q43
0,Company 1X,1
1,Company 1W,0
2,Company 1V,0
3,Company 1U,0
4,Company 1T,0
5,Company 1S,0
6,Company 1R,0
7,Company 1Q,0
8,Company 1P,0
9,Company 1O,0


#### Question 44

In [29]:
#No missing values
df[['Organization',"Q44"]]

Unnamed: 0,Organization,Q44
0,Company 1X,0
1,Company 1W,0
2,Company 1V,0
3,Company 1U,0
4,Company 1T,0
5,Company 1S,0
6,Company 1R,0
7,Company 1Q,0
8,Company 1P,0
9,Company 1O,0


#### Question 45

In [30]:
#No missing values
df[['Organization',"Q45"]]

Unnamed: 0,Organization,Q45
0,Company 1X,1
1,Company 1W,0
2,Company 1V,1
3,Company 1U,1
4,Company 1T,0
5,Company 1S,0
6,Company 1R,1
7,Company 1Q,1
8,Company 1P,1
9,Company 1O,0


#### Question 46

In [31]:
#Choose all that apply
#Companies 1B, 1C, 1E and 1F gave no response
df[['Organization','Q46_A',
       'Q46_B', 'Q46_C', 'Q46_D', 'Q46_Other']]

Unnamed: 0,Organization,Q46_A,Q46_B,Q46_C,Q46_D,Q46_Other
0,Company 1X,,,,,
1,Company 1W,,,,,
2,Company 1V,A structured listening format to ensure that a...,Norms to set expectations on conduct and parti...,Training for all managers on best practices fo...,,
3,Company 1U,,,,,
4,Company 1T,,,Training for all managers on best practices fo...,,
5,Company 1S,,,Training for all managers on best practices fo...,,
6,Company 1R,A structured listening format to ensure that a...,Norms to set expectations on conduct and parti...,Training for all managers on best practices fo...,,
7,Company 1Q,A structured listening format to ensure that a...,Norms to set expectations on conduct and parti...,Training for all managers on best practices fo...,,
8,Company 1P,,,,,
9,Company 1O,,,,,


In [32]:
#For the No Response recode, add a new column Q46_F that contains the value "No Response"
df["Q46_F"] = np.nan
df.loc[df["Organization"].isin(["Company 1B","Company 1C","Company 1E","Company 1F"]),"Q46_F"] = "No Response"

df[['Organization','Q46_A',
       'Q46_B', 'Q46_C', 'Q46_D', 'Q46_Other',"Q46_F"]]

Unnamed: 0,Organization,Q46_A,Q46_B,Q46_C,Q46_D,Q46_Other,Q46_F
0,Company 1X,,,,,,
1,Company 1W,,,,,,
2,Company 1V,A structured listening format to ensure that a...,Norms to set expectations on conduct and parti...,Training for all managers on best practices fo...,,,
3,Company 1U,,,,,,
4,Company 1T,,,Training for all managers on best practices fo...,,,
5,Company 1S,,,Training for all managers on best practices fo...,,,
6,Company 1R,A structured listening format to ensure that a...,Norms to set expectations on conduct and parti...,Training for all managers on best practices fo...,,,
7,Company 1Q,A structured listening format to ensure that a...,Norms to set expectations on conduct and parti...,Training for all managers on best practices fo...,,,
8,Company 1P,,,,,,
9,Company 1O,,,,,,
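Here, and again for Q48, the No Response rows are hard-coded by organization name. An alternative is to derive them from the data by flagging a row when every option column for the question is blank. The sketch below uses a hypothetical `flag_all_blank` helper and placeholder option text. Note that it is broader than the hand-picked list: in the output above, Company 1X and 1W are also blank for every Q46 option but are not in that list. The flagged rows would therefore need review against the survey's skip logic before they replace the manual list.

```python
import numpy as np
import pandas as pd


def flag_all_blank(df, option_cols, new_col, label="No Response"):
    """Hypothetical helper: set `label` where every option column is NaN, else NaN."""
    all_blank = df[option_cols].isna().all(axis=1)
    # .where keeps the label only on all-blank rows and leaves NaN elsewhere
    df[new_col] = pd.Series(label, index=df.index, dtype=object).where(all_blank)
    return df


# Placeholder option text, not the real answer wording
df = pd.DataFrame({
    "Organization": ["Company 1X", "Company 1T", "Company 1V"],
    "Q46_A": [np.nan, np.nan, "option A"],
    "Q46_B": [np.nan, np.nan, np.nan],
    "Q46_C": [np.nan, "option C", "option C"],
    "Q46_D": [np.nan, np.nan, np.nan],
    "Q46_Other": [np.nan, np.nan, np.nan],
})
q46_cols = ["Q46_A", "Q46_B", "Q46_C", "Q46_D", "Q46_Other"]
flag_all_blank(df, q46_cols, "Q46_F")
print(df[["Organization", "Q46_F"]])
```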


#### Question 47

In [33]:
#One company had no response
df.loc[pd.isna(df["Q47"]),"Q47"]="No Response"
df[['Organization','Q47']]

Unnamed: 0,Organization,Q47
0,Company 1X,No Response
1,Company 1W,0.0
2,Company 1V,0.0
3,Company 1U,0.0
4,Company 1T,0.0
5,Company 1S,0.0
6,Company 1R,0.0
7,Company 1Q,1.0
8,Company 1P,0.0
9,Company 1O,0.0


#### Question 48

In [34]:
df[["Organization",'Q48_A', 'Q48_B',
       'Q48_C', 'Q48_D', 'Q48_Other']]

Unnamed: 0,Organization,Q48_A,Q48_B,Q48_C,Q48_D,Q48_Other
0,Company 1X,,"Guidance on travel logistics (car, air, lodgin...",Prohibited activities on work travel,,
1,Company 1W,,,,,
2,Company 1V,,,,,
3,Company 1U,,,,,
4,Company 1T,,,,,
5,Company 1S,"Guidance on socializing (work dinners, after-h...","Guidance on travel logistics (car, air, lodgin...",Prohibited activities on work travel,,
6,Company 1R,"Guidance on socializing (work dinners, after-h...","Guidance on travel logistics (car, air, lodgin...",Prohibited activities on work travel,,
7,Company 1Q,,,,,
8,Company 1P,,,,,
9,Company 1O,,,,,


In [35]:
#Companies 1B, 1C, 1D and 1F did not respond
# If I recode to "No Response", I'll add a new column Q48_F that contains the value "No Response"
df["Q48_F"] = np.nan
df.loc[df["Organization"].isin(["Company 1B","Company 1C","Company 1D","Company 1F"]),"Q48_F"] = "No Response"
df[['Organization','Q48_A', 'Q48_B',
       'Q48_C', 'Q48_D', 'Q48_Other',"Q48_F"]]

Unnamed: 0,Organization,Q48_A,Q48_B,Q48_C,Q48_D,Q48_Other,Q48_F
0,Company 1X,,"Guidance on travel logistics (car, air, lodgin...",Prohibited activities on work travel,,,
1,Company 1W,,,,,,
2,Company 1V,,,,,,
3,Company 1U,,,,,,
4,Company 1T,,,,,,
5,Company 1S,"Guidance on socializing (work dinners, after-h...","Guidance on travel logistics (car, air, lodgin...",Prohibited activities on work travel,,,
6,Company 1R,"Guidance on socializing (work dinners, after-h...","Guidance on travel logistics (car, air, lodgin...",Prohibited activities on work travel,,,
7,Company 1Q,,,,,,
8,Company 1P,,,,,,
9,Company 1O,,,,,,


#### Question 49

In [36]:
#Note: several companies (e.g. 1W, 1U, 1S) left every Q49 option blank; they are not recoded to "No Response" here
df[["Organization",'Q49_A', 'Q49_B', 'Q49_C', 'Q49_D',
       'Q49_Other']]

Unnamed: 0,Organization,Q49_A,Q49_B,Q49_C,Q49_D,Q49_Other
0,Company 1X,microaggressions,,,,
1,Company 1W,,,,,
2,Company 1V,microaggressions,microinequities,microaffirmations,,
3,Company 1U,,,,,
4,Company 1T,,,,,In development now
5,Company 1S,,,,,
6,Company 1R,,,,,
7,Company 1Q,microaggressions,,,,
8,Company 1P,,,,,
9,Company 1O,,,,,


#### Question 50

In [37]:
#All companies provided a response!
df[["Organization",'Q50_A', 'Q50_B', 'Q50_C', 'Q50_D', 'Q50_E', 'Q50_Other']]

Unnamed: 0,Organization,Q50_A,Q50_B,Q50_C,Q50_D,Q50_E,Q50_Other
0,Company 1X,,,Engaging third-party organizations to structur...,Providing on-going opportunities for employees...,,
1,Company 1W,,,,,No,
2,Company 1V,Conducting in-person trainings and discussions...,Conducting in-person trainings and discussion ...,Engaging third-party organizations to structur...,Providing on-going opportunities for employees...,,
3,Company 1U,,,,,No,
4,Company 1T,,,Engaging third-party organizations to structur...,,,
5,Company 1S,Conducting in-person trainings and discussions...,Conducting in-person trainings and discussion ...,,Providing on-going opportunities for employees...,,
6,Company 1R,,,,Providing on-going opportunities for employees...,,
7,Company 1Q,Conducting in-person trainings and discussions...,,Engaging third-party organizations to structur...,Providing on-going opportunities for employees...,,
8,Company 1P,,,,,No,
9,Company 1O,,,,,No,


#### Question 51

In [38]:
#All companies provided a response!
df[["Organization","Q51"]]

Unnamed: 0,Organization,Q51
0,Company 1X,No
1,Company 1W,No
2,Company 1V,In progress
3,Company 1U,No
4,Company 1T,In progress
5,Company 1S,In progress
6,Company 1R,Yes
7,Company 1Q,No
8,Company 1P,Yes
9,Company 1O,Yes


#### Question 52

In [39]:
#All companies provided a response!
df[["Organization","Q52"]]

Unnamed: 0,Organization,Q52
0,Company 1X,No
1,Company 1W,No
2,Company 1V,In progress
3,Company 1U,No
4,Company 1T,No
5,Company 1S,In progress
6,Company 1R,Yes
7,Company 1Q,No
8,Company 1P,Yes
9,Company 1O,In progress


#### Question 53

In [40]:
#Company 1X did not respond
df.loc[pd.isna(df["Q53"]),"Q53"]="No Response"
df[["Organization","Q53"]]

Unnamed: 0,Organization,Q53
0,Company 1X,No Response
1,Company 1W,No
2,Company 1V,In progress
3,Company 1U,No
4,Company 1T,No
5,Company 1S,In progress
6,Company 1R,In progress
7,Company 1Q,No
8,Company 1P,No
9,Company 1O,In progress


#### Question 54

In [41]:
#Some companies missing a response
df.loc[pd.isna(df["Q54"]),"Q54"]="No Response"

df[["Organization","Q54"]]

Unnamed: 0,Organization,Q54
0,Company 1X,1.0
1,Company 1W,0.0
2,Company 1V,0.0
3,Company 1U,0.0
4,Company 1T,0.0
5,Company 1S,0.0
6,Company 1R,0.0
7,Company 1Q,1.0
8,Company 1P,0.0
9,Company 1O,0.0


#### Question 55

In [42]:
#All companies provided a response
df[["Organization","Q55"]]

Unnamed: 0,Organization,Q55
0,Company 1X,No
1,Company 1W,N/A (no satisfaction surveys conducted)
2,Company 1V,N/A (no satisfaction surveys conducted)
3,Company 1U,No
4,Company 1T,No
5,Company 1S,No
6,Company 1R,N/A (no satisfaction surveys conducted)
7,Company 1Q,No
8,Company 1P,N/A (no satisfaction surveys conducted)
9,Company 1O,No


#### Question 56

In [43]:
#Company 1X did not select any Q56 option

df[["Organization", 'Q56_A', 'Q56_B', 'Q56_C', 'Q56_D',
       'Q56_E', 'Q56_F', 'Q56_G', 'Q56_H', 'Q56_I', 'Q56_J', 'Q56_K']]

Unnamed: 0,Organization,Q56_A,Q56_B,Q56_C,Q56_D,Q56_E,Q56_F,Q56_G,Q56_H,Q56_I,Q56_J,Q56_K
0,Company 1X,,,,,,,,,,,
1,Company 1W,,,,,,,,,,,Unknown/not sure
2,Company 1V,,,,,,,,,,No disparities in retention,
3,Company 1U,,,,,,,,,,No disparities in retention,
4,Company 1T,,,,,,,,,,,Unknown/not sure
5,Company 1S,,,,,,,,,,No disparities in retention,
6,Company 1R,,,,,,,,,,,Unknown/not sure
7,Company 1Q,,,,,,,,,,,Unknown/not sure
8,Company 1P,,,,,,,,,,,Unknown/not sure
9,Company 1O,,,,,,,,,,No disparities in retention,


In [44]:
df["Q56_L"] = np.nan
df.loc[df["Organization"].isin(["Company 1X"]),"Q56_L"] = "No Response"
df[["Organization", 'Q56_A', 'Q56_B', 'Q56_C', 'Q56_D',
       'Q56_E', 'Q56_F', 'Q56_G', 'Q56_H', 'Q56_I', 'Q56_J', 'Q56_K',"Q56_L"]]

Unnamed: 0,Organization,Q56_A,Q56_B,Q56_C,Q56_D,Q56_E,Q56_F,Q56_G,Q56_H,Q56_I,Q56_J,Q56_K,Q56_L
0,Company 1X,,,,,,,,,,,,No Response
1,Company 1W,,,,,,,,,,,Unknown/not sure,
2,Company 1V,,,,,,,,,,No disparities in retention,,
3,Company 1U,,,,,,,,,,No disparities in retention,,
4,Company 1T,,,,,,,,,,,Unknown/not sure,
5,Company 1S,,,,,,,,,,No disparities in retention,,
6,Company 1R,,,,,,,,,,,Unknown/not sure,
7,Company 1Q,,,,,,,,,,,Unknown/not sure,
8,Company 1P,,,,,,,,,,,Unknown/not sure,
9,Company 1O,,,,,,,,,,No disparities in retention,,


#### Question 57

In [45]:
#Company 1X did not select any Q57 option
df[["Organization",'Q57_A',
       'Q57_B', 'Q57_C', 'Q57_D', 'Q57_E', 'Q57_F']]

Unnamed: 0,Organization,Q57_A,Q57_B,Q57_C,Q57_D,Q57_E,Q57_F
0,Company 1X,,,,,,
1,Company 1W,,,,,,No action taken yet
2,Company 1V,N/A (No difference in turnover rates),,,,,
3,Company 1U,N/A (No difference in turnover rates),,,,,
4,Company 1T,N/A (No difference in turnover rates),,,,,
5,Company 1S,N/A (No difference in turnover rates),,,,,
6,Company 1R,N/A (No difference in turnover rates),,,,,
7,Company 1Q,,,,,,No action taken yet
8,Company 1P,,,,,,No action taken yet
9,Company 1O,N/A (No difference in turnover rates),,,,,


In [46]:
df["Q57_G"] = np.nan
df.loc[df["Organization"].isin(["Company 1X"]),"Q57_G"] = "No Response"
df[["Organization",'Q57_A',
       'Q57_B', 'Q57_C', 'Q57_D', 'Q57_E', 'Q57_F',"Q57_G"]]

Unnamed: 0,Organization,Q57_A,Q57_B,Q57_C,Q57_D,Q57_E,Q57_F,Q57_G
0,Company 1X,,,,,,,No Response
1,Company 1W,,,,,,No action taken yet,
2,Company 1V,N/A (No difference in turnover rates),,,,,,
3,Company 1U,N/A (No difference in turnover rates),,,,,,
4,Company 1T,N/A (No difference in turnover rates),,,,,,
5,Company 1S,N/A (No difference in turnover rates),,,,,,
6,Company 1R,N/A (No difference in turnover rates),,,,,,
7,Company 1Q,,,,,,No action taken yet,
8,Company 1P,,,,,,No action taken yet,
9,Company 1O,N/A (No difference in turnover rates),,,,,,


#### Question 58

In [47]:
df[["Organization","Q58"]]

Unnamed: 0,Organization,Q58
0,Company 1X,
1,Company 1W,
2,Company 1V,N/A (no action plan in place)
3,Company 1U,N/A (no action plan in place)
4,Company 1T,N/A (no action plan in place)
5,Company 1S,N/A (no action plan in place)
6,Company 1R,N/A (no action plan in place)
7,Company 1Q,
8,Company 1P,
9,Company 1O,N/A (no action plan in place)


In [48]:
#From Flory's skip-logic notes:
#If Q58 is blank and the prior question was "No action taken yet" (Q57_F), fill with "N/A - No action taken yet"
df.loc[pd.isna(df["Q58"]) & pd.notna(df["Q57_F"]),"Q58"]="N/A - No action taken yet"
#Any remaining blanks are No Response
df.loc[pd.isna(df["Q58"]),"Q58"]="No Response"
df[['Organization',"Q58"]]

Unnamed: 0,Organization,Q58
0,Company 1X,No Response
1,Company 1W,N/A - No action taken yet
2,Company 1V,N/A (no action plan in place)
3,Company 1U,N/A (no action plan in place)
4,Company 1T,N/A (no action plan in place)
5,Company 1S,N/A (no action plan in place)
6,Company 1R,N/A (no action plan in place)
7,Company 1Q,N/A - No action taken yet
8,Company 1P,N/A - No action taken yet
9,Company 1O,N/A (no action plan in place)


#### Question 59

In [49]:
df.loc[pd.isna(df["Q59"]),"Q59"]="No Response"
df[['Organization',"Q59"]]

Unnamed: 0,Organization,Q59
0,Company 1X,0.0
1,Company 1W,0.0
2,Company 1V,0.0
3,Company 1U,0.0
4,Company 1T,0.0
5,Company 1S,0.0
6,Company 1R,0.0
7,Company 1Q,0.0
8,Company 1P,0.0
9,Company 1O,0.0


#### Question 60

In [50]:
df.loc[pd.isna(df["Q60"]),"Q60"]="No Response"
df[["Organization","Q60"]]

Unnamed: 0,Organization,Q60
0,Company 1X,0.0
1,Company 1W,0.0
2,Company 1V,0.0
3,Company 1U,0.0
4,Company 1T,0.0
5,Company 1S,0.0
6,Company 1R,0.0
7,Company 1Q,0.0
8,Company 1P,0.0
9,Company 1O,0.0


#### Question 61

In [51]:
df[["Organization","Q61"]]

Unnamed: 0,Organization,Q61
0,Company 1X,0
1,Company 1W,0
2,Company 1V,0
3,Company 1U,0
4,Company 1T,0
5,Company 1S,0
6,Company 1R,0
7,Company 1Q,0
8,Company 1P,0
9,Company 1O,0


#### Question 62

In [52]:
df.loc[pd.isna(df["Q62"]),"Q62"]="No Response"
df[["Organization","Q62"]]

Unnamed: 0,Organization,Q62
0,Company 1X,20.0
1,Company 1W,No Response
2,Company 1V,No Response
3,Company 1U,No Response
4,Company 1T,No Response
5,Company 1S,0.0
6,Company 1R,0.0
7,Company 1Q,No Response
8,Company 1P,No Response
9,Company 1O,No Response


## Write to new csv

In [53]:
#Print df one more time to double check
df

Unnamed: 0,Q32,Q33,Q34,Q35,Q36_A,Q36_B,Q36_C,Q36_D,Q36_E,Q36_F,...,Q36_K,Q36_L,Q37_F,Q37_G,Q38_H,Q38_I,Q46_F,Q48_F,Q56_L,Q57_G
0,Annually,1,<85% employees are satisfied or highly satisfied,1.0,,,,,,,...,,No Response,,,,,,,No Response,No Response
1,,0,N/A (no satisfaction surveys conducted),N/A (no satisfaction surveys conducted),,,,,,,...,N/A (no satisfaction surveys conducted) and/or...,,N/A (no satisfaction surveys conducted) and/or...,,N/A (no satisfaction surveys conducted) and/or...,,,,,
2,,0,N/A (no satisfaction surveys conducted),N/A (no satisfaction surveys conducted),,,,,,,...,N/A (no satisfaction surveys conducted) and/or...,,N/A (no satisfaction surveys conducted) and/or...,,N/A (no satisfaction surveys conducted) and/or...,,,,,
3,,1,85-95% employees are satisfied or highly satis...,0.0,,,,,,,...,N/A (no satisfaction surveys conducted) and/or...,,N/A (no satisfaction surveys conducted) and/or...,,N/A (no satisfaction surveys conducted) and/or...,,,,,
4,Semi-annually,1,85-95% employees are satisfied or highly satis...,1.0,,,,,,,...,,,,,,,,,,
5,,0,<85% employees are satisfied or highly satisfied,1.0,,,,,,,...,,,,,,,,,,
6,,0,N/A (no satisfaction surveys conducted),N/A (no satisfaction surveys conducted),,,,,,,...,N/A (no satisfaction surveys conducted) and/or...,,N/A (no satisfaction surveys conducted) and/or...,,N/A (no satisfaction surveys conducted) and/or...,,,,,
7,Semi-annually,1,<85% employees are satisfied or highly satisfied,1.0,,,,,,,...,,No Response,,No Response,,No Response,,,,
8,,0,85-95% employees are satisfied or highly satis...,0.0,,,,,,,...,N/A (no satisfaction surveys conducted) and/or...,,N/A (no satisfaction surveys conducted) and/or...,,N/A (no satisfaction surveys conducted) and/or...,,,,,
9,In process of implementing,1,N/A (no satisfaction surveys conducted),N/A (no satisfaction surveys conducted),,,,,,,...,N/A (no satisfaction surveys conducted) and/or...,,N/A (no satisfaction surveys conducted) and/or...,,N/A (no satisfaction surveys conducted) and/or...,,,,,


In [54]:
df.columns

Index(['Q32', 'Q33', 'Q34', 'Q35', 'Q36_A', 'Q36_B', 'Q36_C', 'Q36_D', 'Q36_E',
       'Q36_F', 'Q36_G', 'Q36_H', 'Q36_I', 'Q36_J', 'Q37_A', 'Q37_B', 'Q37_C',
       'Q37_D', 'Q37_E', 'Q38_A', 'Q38_B', 'Q38_C', 'Q38_D', 'Q38_E', 'Q38_F',
       'Q38_Other', 'Q39', 'Q40', 'Q41', 'Q42', 'Q43', 'Q44', 'Q45', 'Q46_A',
       'Q46_B', 'Q46_C', 'Q46_D', 'Q46_Other', 'Q47', 'Q48_A', 'Q48_B',
       'Q48_C', 'Q48_D', 'Q48_Other', 'Q49_A', 'Q49_B', 'Q49_C', 'Q49_D',
       'Q49_Other', 'Q50_A', 'Q50_B', 'Q50_C', 'Q50_D', 'Q50_E', 'Q50_Other',
       'Q51', 'Q52', 'Q53', 'Q54', 'Q55', 'Q56_A', 'Q56_B', 'Q56_C', 'Q56_D',
       'Q56_E', 'Q56_F', 'Q56_G', 'Q56_H', 'Q56_I', 'Q56_J', 'Q56_K', 'Q57_A',
       'Q57_B', 'Q57_C', 'Q57_D', 'Q57_E', 'Q57_F', 'Q58', 'Q59', 'Q60', 'Q61',
       'Q62', 'Q63', 'Q64', 'Q65', 'Organization', 'benchmark_group',
       'number_of_employees', 'number_of_years', 'workforce', 'Q36_K', 'Q36_L',
       'Q37_F', 'Q37_G', 'Q38_H', 'Q38_I', 'Q46_F', 'Q48_F', 'Q56_L', 'Q5

In [55]:
#Write out to csv
df.to_csv("inclusion_clean_final.csv")
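`to_csv` writes the row index by default. Reading the file back turns that index into an extra `Unnamed: 0` column, which is the `Unnamed: 0` row visible in the transposed output at the end of this notebook. Here is a small in-memory sketch of the difference. Passing `index=False` when writing, or `index_col=0` when reading, avoids the extra column.

```python
import io

import pandas as pd

df = pd.DataFrame({"Organization": ["Company 1X", "Company 1W"],
                   "Q32": ["Annually", None]})

# Default: the index is written as an unnamed first column
with_index = pd.read_csv(io.StringIO(df.to_csv()))

# index=False: only the real columns are written
without_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))

print(list(with_index.columns))
print(list(without_index.columns))
```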

# Visualizing

In [6]:
df[["Organization","benchmark_group"]]

Unnamed: 0,Organization,benchmark_group
0,Company 1X,Technology
1,Company 1W,NonProfit
2,Company 1V,Finance
3,Company 1U,Finance
4,Company 1T,Technology
5,Company 1S,Technology
6,Company 1R,Technology
7,Company 1Q,Finance
8,Company 1P,
9,Company 1O,Technology


In [4]:
#Reading in clean csv in case I only want to run this code block
df = pd.read_csv("../inclusion_clean_final.csv")

# Set it to None to display all rows in the dataframe
pd.set_option('display.max_rows', None)


In [5]:
# Set it to None to display all rows in the dataframe
pd.set_option('display.max_rows', None)

#Show me all of the rows for the company I want to visualize
#Cell > Toggle Scrolling off to see all rows
df[df["Organization"]=="Company 1X"].T

Unnamed: 0,0
Unnamed: 0,0
Q32,Annually
Q33,1
Q34,<85% employees are satisfied or highly satisfied
Q35,1.0
Q36_A,
Q36_B,
Q36_C,
Q36_D,
Q36_E,
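The summary at the top of the notebook notes that Tableau prefers "tall and thin" data and that the pivot can be done in Tableau. If you would rather reshape in pandas before export, `DataFrame.melt` produces one row per organization per question. Below is a sketch on toy columns. Dropping blank responses is an optional choice, and it would also drop the unchecked options of choose-all-that-apply questions.

```python
import numpy as np
import pandas as pd

# Toy frame with a few of the cleaned columns
df = pd.DataFrame({
    "Organization": ["Company 1X", "Company 1W"],
    "benchmark_group": ["Technology", "NonProfit"],
    "Q32": ["Annually", np.nan],
    "Q33": [1, 0],
    "Q41": [1, 1],
})

# One row per (organization, question) pair
tall = df.melt(id_vars=["Organization", "benchmark_group"],
               var_name="question", value_name="response")

# Optional: drop blank cells (here, Q32 for Company 1W)
tall = tall.dropna(subset=["response"])
print(tall)
```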
