## Irvington Volunteer Project 

It's that time of year again! 

In your small town of Irvington, there is an annual festival that brings in visitors from all over the state! It's a huge deal and depends completely on the help of the local townspeople. 

The festival is 100% volunteer-run and operated, and it's your job to make sure all the volunteers have everything they need to get started. It is also your job to make sure volunteers are well organized, well prepared, and ready to rock when the day of the festival arrives! 

In the months before the festival, you collected a lot of information on each of the 1,000 volunteers. This information will help you better organize the volunteer effort. Follow the prompts below to complete your task of leading the volunteer effort!

***

#### For this task, use the Irvington Volunteers dataset. This dataset includes the following columns:

    * First = First name of volunteer
    * Last = Last name of volunteer
    * Gender = Gender of volunteer
    * Age = Age of volunteer         
    * Shift = Shift of volunteer worker; 1 = first shift; 2 = second shift; 3 = third shift; 4 = fourth shift
    * Month = Month of year volunteer signed up for task
    * Volunteer Task = Assigned task for volunteer 
    * Task Level = Level of involvement for volunteer task; 1 = beginner, 2 = moderate, 3 = expert
    * Supervisor Number = Supervisor assigned to volunteer
    * Fees Owed = Outstanding volunteer fees that are owed to the community
    * Materials Needed = Does the volunteer still need task-specific materials; Y = yes; N = no
    * Volunteer Hours = Number of hours volunteer has spent preparing for task
    * Hours Pledged = Number of hours volunteer pledged to spend on task
    * Task Training Completed = Did the volunteer complete task-specific training; Y = yes; N = no
    * Recruited Volunteers = Number of other volunteers recruited by this volunteer

1. Import the "Irvington Volunteers" dataset. Check to make sure the dataset looks like what you're expecting.

In [3]:
import pandas as pd
df=pd.read_excel("Irvington_Volunteers.xlsx")

2. Confirm all your volunteers are accounted for and all of the information you need is within the dataset. What characteristics can you identify in the dataset?
***

* How many rows are in the dataset?
* How many columns are in the dataset?
* What types of data are within each column?

In [5]:
df.shape

(999, 15)

In [6]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 999 entries, 0 to 998
Data columns (total 15 columns):
 #   Column                   Non-Null Count  Dtype 
---  ------                   --------------  ----- 
 0   First                    999 non-null    object
 1   Last                     999 non-null    object
 2   Gender                   999 non-null    object
 3   Age                      999 non-null    int64 
 4   Shift                    999 non-null    int64 
 5   Month                    999 non-null    int64 
 6   Volunteer Task           999 non-null    object
 7   Task Level               999 non-null    int64 
 8   Supervisor Number        999 non-null    int64 
 9   Fees Owed                999 non-null    int64 
 10  Materials Needed         999 non-null    object
 11  Volunteer Hours          999 non-null    int64 
 12  Hours Pledged            999 non-null    int64 
 13  Task Training Completed  999 non-null    object
 14  Recruited Volunteers     999 non-null    int64

3. Before taking this list seriously, we need to check to make sure all the volunteers are eligible to work their specific task. The only eligibility requirement is age-based -- no volunteers can be under the age of 13 and no volunteers can be over the age of 75. Are there any volunteers that fall outside of this range?

In [10]:
# Use | (or), not & (and): no single age can be both under 13 and over 75
df.loc[(df['Age'] < 13) | (df['Age'] > 75)]

Unnamed: 0,First,Last,Gender,Age,Shift,Month,Volunteer Task,Task Level,Supervisor Number,Fees Owed,Materials Needed,Volunteer Hours,Hours Pledged,Task Training Completed,Recruited Volunteers


4. Did you miss anyone? Did you make sure to record ALL the information for ALL the volunteers? Quickly check if you're missing any data in this dataset.

In [11]:
df.isnull().sum()

First                      0
Last                       0
Gender                     0
Age                        0
Shift                      0
Month                      0
Volunteer Task             0
Task Level                 0
Supervisor Number          0
Fees Owed                  0
Materials Needed           0
Volunteer Hours            0
Hours Pledged              0
Task Training Completed    0
Recruited Volunteers       0
dtype: int64

5. A lot of the information we have organizes the volunteers into specific groups. It's important that none of these groups are mismatched (too many people in one group vs another). Let's check how many volunteers fall into each of the following groups. (Hint: you need to use the value counts function for this question). 
***

* Last (how many volunteers are from the same family?)
* Gender
* Shift
* Month
* Volunteer Task
* Task Level
* Supervisor Number
* Materials Needed
* Task Training Completed

In [24]:
df.value_counts('Shift')

Shift
3    258
2    251
4    246
1    244
dtype: int64
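The same count can be repeated for every grouping column in the list with a short loop. The block below is a minimal sketch on a tiny made-up sample (the `sample` frame and its values are assumptions for illustration, not rows from the real dataset); on the real data, the loop would run over `df` and the full list of columns.

```python
import pandas as pd

# Tiny made-up sample standing in for the volunteer data (assumption).
sample = pd.DataFrame({
    "Last": ["Adams", "Adams", "Kane", "Moore"],
    "Gender": ["F", "M", "M", "F"],
    "Shift": [1, 2, 2, 4],
})

# Columns to tally; extend this list with the other grouping columns.
group_columns = ["Last", "Gender", "Shift"]

# One value_counts() result per grouping column, keyed by column name.
counts = {col: sample[col].value_counts() for col in group_columns}

for col, table in counts.items():
    print(f"--- {col} ---")
    print(table)
```

Keeping the results in a dictionary makes it easy to look up a single group later, for example `counts["Shift"]`.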

6. Let's look at these same groups, but from another angle. It's important that the community members are mixing and mingling with members of all ages. What is the average age of volunteers within all the groups listed below? (Hint: you need to use the groupby function for this question). Once you determine the average age of volunteers in the groups below, determine the average number of volunteer hours for all the groups below. 
***

* Gender
* Shift
* Month
* Volunteer Task
* Task Level
* Supervisor Number
* Materials Needed
* Task Training Completed

In [33]:
df['Shift'].groupby(df['Gender']).mean()

Gender
F    2.531697
M    2.482353
Name: Shift, dtype: float64
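The question asks for the average age and the average volunteer hours within each group, so the column being averaged should be "Age" or "Volunteer Hours" and the grouping column should vary. The block below is a minimal sketch of that pattern on a small made-up sample (the `sample` frame and its values are assumptions for illustration).

```python
import pandas as pd

# Small made-up sample standing in for the volunteer data (assumption).
sample = pd.DataFrame({
    "Gender": ["F", "M", "M", "F"],
    "Shift": [1, 2, 2, 1],
    "Age": [20, 30, 50, 40],
    "Volunteer Hours": [10, 20, 30, 14],
})

# Grouping columns to loop over; extend with the rest of the list above.
group_columns = ["Gender", "Shift"]

# For each grouping column, compute the mean Age and mean Volunteer Hours.
averages = {
    col: sample.groupby(col)[["Age", "Volunteer Hours"]].mean()
    for col in group_columns
}

for col, table in averages.items():
    print(table, end="\n\n")
```

Selecting both columns before calling `mean()` answers the age and hours questions in one pass per group.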

In [34]:
df

Unnamed: 0,First,Last,Gender,Age,Shift,Month,Volunteer Task,Task Level,Supervisor Number,Fees Owed,Materials Needed,Volunteer Hours,Hours Pledged,Task Training Completed,Recruited Volunteers
0,Jackie,Jackson,F,59,4,12,Fire Safety,2,1,18,Y,19,28,Y,3
1,Mary,Patterson,F,31,4,10,First Aid Booth,1,1,7,N,27,28,N,6
2,Tanya,Adams,F,48,1,3,Concession Stand,3,1,22,N,16,28,Y,2
3,Tanya,Henderson,F,19,2,2,Information Desk,3,1,26,Y,9,28,Y,5
4,Walter,Franklin,M,25,1,2,Concession Stand,1,1,25,Y,11,28,Y,4
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
994,Walter,Mulgrew,M,63,3,4,Information Desk,2,5,10,Y,25,28,Y,5
995,Veronica,Looner,F,63,3,5,Clean Up,2,5,28,N,8,28,Y,4
996,Larry,Patterson,M,49,1,4,Clean Up,2,5,21,Y,16,28,N,3
997,Victor,Kane,M,39,3,7,Concession Stand,2,5,28,Y,8,28,Y,4


7. There are specific groups of volunteers that need to be followed-up with immediately! These are the volunteers that have some outstanding business to handle before the big festival. Locate the group of volunteers that fall within each of these conditions.
***
* Locate the volunteers who still need materials for their volunteer task
* Locate the volunteers who have not yet completed their volunteer training
* Locate the volunteers who are experts at their task (aka Task Level = 3)
* Locate the volunteers who are working fire safety and haven't yet completed their training - these folks need to get trained ASAP!

In [44]:
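# Note: a notebook cell displays only its last expression, so only the final filter appears below
# Volunteers who still need materials
df.loc[(df['Materials Needed'] == 'Y')]
# Volunteers who have not completed training
df.loc[(df['Task Training Completed'] == 'N')]
# Volunteers at the expert task level
df.loc[(df['Task Level'] == 3)]
# Fire safety volunteers who have not completed training
df.loc[(df['Volunteer Task'] == 'Fire Safety') & (df['Task Training Completed'] == 'N')]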

Unnamed: 0,First,Last,Gender,Age,Shift,Month,Volunteer Task,Task Level,Supervisor Number,Fees Owed,Materials Needed,Volunteer Hours,Hours Pledged,Task Training Completed,Recruited Volunteers
34,Tanya,Harrison,F,44,1,3,Fire Safety,1,1,24,N,11,28,N,5
62,Ronelle,Samuelson,F,55,1,1,Fire Safety,1,1,14,Y,21,28,N,5
74,Roger,Sapp,M,67,4,2,Fire Safety,2,1,9,Y,25,28,N,6
96,Adam,Patterson,M,23,2,2,Fire Safety,1,1,29,Y,7,28,N,4
118,Denise,Moore,F,66,2,10,Fire Safety,2,1,29,Y,9,28,N,2
164,Samantha,Franklin,F,61,3,6,Fire Safety,2,1,24,Y,12,28,N,4
173,Nicole,Kennedy,F,40,4,5,Fire Safety,1,1,6,Y,31,28,N,3
225,Karen,Looner,F,36,3,2,Fire Safety,2,2,9,N,27,28,N,4
227,Betty,Baker,F,17,4,5,Fire Safety,2,2,30,Y,7,28,N,3
230,Adam,Vaughn,M,39,4,9,Fire Safety,3,2,28,Y,8,28,N,4


8. To keep track of volunteers who meet specific conditions, make some additional columns to hold the new information. 
***

* Inexperienced volunteers may get confused or lost. Create a new column called "needsMentor". This column should be assigned a value of "1" if a volunteer has not completed training and is at Task Level 1. 
* How many volunteers haven't completed the volunteer hours they promised they would? Create a new column that subtracts the number of volunteer hours already worked ("Volunteer Hours") from the number of hours pledged ("Hours Pledged"). Call this column "hoursNeeded". 
* There are some overly committed volunteers - they have already worked all their pledged volunteer hours and are still going! These folks need to get a specific bonus when the festival is over - we need to track them somehow. Create a new column called "overtimeBonus" - if "hoursNeeded" is less than 0, this column should be "1". 

In [56]:
import numpy as np
df['needsMentor'] = np.where(((df['Task Training Completed'] == 'N') & (df["Task Level"] == 1)),1,0)
df['hoursNeeded'] = df['Hours Pledged'] - df['Volunteer Hours']
df['overtimeBonus'] = np.where(df['hoursNeeded'] < 0,1,0)
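To answer the "how many" part of the prompt, the new column can be turned into a count. The block below is a minimal sketch on three made-up rows (the `sample` frame and the `still_owing` name are assumptions for illustration); a positive "hoursNeeded" value means the volunteer still owes hours.

```python
import pandas as pd

# Three made-up volunteers (assumption): one behind, one over, one exactly on target.
sample = pd.DataFrame({
    "Volunteer Hours": [19, 31, 28],
    "Hours Pledged": [28, 28, 28],
})

# Hours still owed: pledged minus worked (negative means overtime).
sample["hoursNeeded"] = sample["Hours Pledged"] - sample["Volunteer Hours"]

# A boolean mask sums to the number of True values.
still_owing = int((sample["hoursNeeded"] > 0).sum())
print(still_owing)
```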

9. The Mayor wants to make a speech about the different types of community members that are volunteering for the festival. Specifically, the Mayor would like to present some information on the different age groups. Create a new column called "Age Group" and bin the "Age" column to create bins based on the age of the volunteer. Follow the guidance below for the variations in age. (Hint: you can use any code you like to complete this task - np.select is also an option!)
***

* Teen (0 - 17)
* Young Adult (17.1 - 35)
* Adult (35.1 - 65)
* Senior (65+)
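One possible way to build the "Age Group" column is `pd.cut`, which assigns each age to a labeled interval. The block below is a minimal sketch on made-up ages (the `sample` frame is an assumption for illustration). It also assumes that the boundary ages 17, 35, and 65 belong to the lower group, which is one reading of the ranges above.

```python
import numpy as np
import pandas as pd

# Made-up ages chosen to sit on and around each bin edge (assumption).
sample = pd.DataFrame({"Age": [13, 17, 18, 35, 36, 65, 66, 75]})

# Bin edges; intervals are closed on the right, so 17 falls in "Teen".
bins = [0, 17, 35, 65, np.inf]
labels = ["Teen", "Young Adult", "Adult", "Senior"]

# include_lowest=True makes the first interval include its left edge (0).
sample["Age Group"] = pd.cut(
    sample["Age"], bins=bins, labels=labels, include_lowest=True
)
print(sample)
```

`np.select` with explicit conditions would work as well; `pd.cut` simply keeps the edges and labels in two short lists.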

10. The Supervisor Number column is all messed up. Someone didn't record it correctly, and now it is meaningless! Drop it from this dataset - it's not providing any meaningful information. 

In [57]:
df.drop(columns = 'Supervisor Number',inplace=True)

11. There is a new policy about volunteer fees. It seems that some of the fees owed were actually for materials that the volunteer paid for out of pocket. Wipe out the debt for those volunteers who owe just a few dollars. If the volunteer owes less than 5 dollars, replace that value with 0!

In [58]:
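# Name the 'Fees Owed' column; without it, every column in the matching rows is set to 0
df.loc[df['Fees Owed'] < 5, 'Fees Owed'] = 0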

12. We need some code that we can keep around for next year -- when we have the same task but new data! Define a few functions that can complete tasks that we will need to repeat down the road. 

***

* Define a function to recode the column "Gender". Instead of "M" and "F" -- have the words spelled out "Male" and "Female". Apply this function to the "Gender" column. 
* Define a function to recode the column "Month". Convert the numeric "Month" to the name of the Month. Apply this function to the column "Month" and create a new column called "Name of Month". 
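The two recoding functions could look something like the sketch below. The function names `recode_gender` and `month_name`, the `GENDER_LABELS` mapping, and the two-row `sample` frame are all assumptions for illustration. Month names come from the standard library's `calendar.month_name`, which follows the current locale (English by default).

```python
import calendar

import pandas as pd

# Mapping from the single-letter codes to spelled-out labels.
GENDER_LABELS = {"M": "Male", "F": "Female"}


def recode_gender(code):
    """Return 'Male' or 'Female' for 'M' or 'F'; leave other values unchanged."""
    return GENDER_LABELS.get(code, code)


def month_name(number):
    """Return the month name for a month number from 1 to 12."""
    return calendar.month_name[int(number)]


# Two made-up rows standing in for the volunteer data (assumption).
sample = pd.DataFrame({"Gender": ["F", "M"], "Month": [12, 3]})

sample["Gender"] = sample["Gender"].apply(recode_gender)
sample["Name of Month"] = sample["Month"].apply(month_name)
print(sample)
```

Because the functions take a single value and return a single value, they can be reused next year with `apply` on a new dataset without changes.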