Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decide how to count 'time to hire' #9

Closed
leaflettuce opened this issue Mar 16, 2019 · 20 comments

Comments

@leaflettuce
Copy link
Owner

commented Mar 16, 2019

Split out from #3, we need to figure out how to track time to hire to answer business problem revolving around demo differences and hiring time #2.

As @mitchb63 mentioned, there are time_to_(color) variables in df_contact we can use for this.

Should time to blue be the dependent for this question? Or should we create one based on date user account was created to date turned blue? OR some other method?

@leaflettuce leaflettuce added this to the Business Solutions milestone Mar 16, 2019

@mitchb63

This comment has been minimized.

Copy link
Collaborator

commented Mar 16, 2019

There is also df_contact$Confirmed_Hired_Date__c
However, I think I read somewhere in the HH documentation that "hired" information should be taken from the "hiring" table

@leaflettuce

This comment has been minimized.

Copy link
Owner Author

commented Mar 16, 2019

Good point! Looks like there is a 'hire date' in hiring table which connects to contact via hire$client_name = contact$id

@ChRizzo1

This comment has been minimized.

Copy link
Collaborator

commented Mar 16, 2019

Hello,

There is a time to hire in df_feedback, although not sure if we have a join to other tables.

@mitchb63

This comment has been minimized.

Copy link
Collaborator

commented Mar 16, 2019

We sort of need to define where that time "starts" though too. Our group needs to work with "time in program" which includes "time to hire" as well as the time from "program start" to the time clients leave without being hired. Unless of course there's a "time in program" variable in the data. I haven't seen one though.

@mitchb63

This comment has been minimized.

Copy link
Collaborator

commented Mar 16, 2019

I guess by "We" I mean "our team." ;)
You guys don't necessarily have to do that!

@ChRizzo1

This comment has been minimized.

Copy link
Collaborator

commented Mar 16, 2019

I did not find time in the program but you have the var: Months_Unemployed__c in SalesForce hire info data set. Maybe useful if you can find when the client joined ...

"

@leaflettuce

This comment has been minimized.

Copy link
Owner Author

commented Mar 16, 2019

^^ Months unemployed could be the answer we're looking for.

With green being the 'actively seeking employment' label for HH< could we also find the difference between 'contact$date_turned_green and hire$start_date? This could be one answer to your question of when to define a client program 'start'- although they could be working within the program while labelled red or purple as well? Not sure on the best route here, anyone have a preference?

And while yours directly requires this type of answer @mitchb63, our team could also use it to see if volunteers affect this time-period or if demo has any significant effect on it.

@mitchb63

This comment has been minimized.

Copy link
Collaborator

commented Mar 16, 2019

Maybe we can get an answer to how HH wants us to determine the "start date" during the webinar Tuesday.

@mitchb63

This comment has been minimized.

Copy link
Collaborator

commented Mar 17, 2019

I did some experimenting with this today. "Created_Date" works to a degree but it doesn't seem to be the answer in some cases. Some accounts were apparently created after the fact because I came up with negative "Time in Program" values at times. I also don't think Date turned Green will work for our group because we also need to calculate "Time to date turned Black" which happens before they turn Green.

@mitchb63

This comment has been minimized.

Copy link
Collaborator

commented Mar 17, 2019

I think I may have found something!
df_contact$Dat_Initial_Assessment_was_Completed__c

This is described as "Date of first conversation with Transition Specialist".

@leaflettuce

This comment has been minimized.

Copy link
Owner Author

commented Mar 18, 2019

@mitchb63 Great find! I'm not sure how I missed that when I was looking through the data dictionary. Can we use this and date_turned_blue/green/black to help answer those bus questions you think?

I'll let you know if I find anything else, but it sounds like it could be useful to make some variables such as : time_to_green/black/blue. That would at least be a start and provide a look at how many days (or whatever interval) it takes for each person to move on in the process.

Thoughts?

@mitchb63

This comment has been minimized.

Copy link
Collaborator

commented Mar 18, 2019

I used it to create a "Days in Program" variable that includes all of the end point "colors" It also uses a somewhat arbitrary date of 3/1/2019 as the end date for those that are "currently in the program". Basically, I was just trying to get a very rough idea of what the variables involved in our problems looks like. I created a Word doc with some overview info to help team members get up to speed.

@leaflettuce

This comment has been minimized.

Copy link
Owner Author

commented Mar 18, 2019

Just browsed the word and it looks solid! My eda branch is looking to do the same for our teams bus questions but just fleshing it out now.

Could you share the code you used to create the days in prog variable pls? I'd like to explore it further as well! 😁

@mitchb63

This comment has been minimized.

Copy link
Collaborator

commented Mar 18, 2019

@leaflettuce

This comment has been minimized.

Copy link
Owner Author

commented Mar 18, 2019

@mitchb63

This comment has been minimized.

Copy link
Collaborator

commented Mar 18, 2019

Here is the code for what I did this morning. It's an add-on to the cleaner.r file. I haven't taken the time to split it out nicely like you have yet. As I mentioned, there are some questionable choices made in here but I was trying to crank something out to look at before the call tonight.

########################

TEAM2 TOPIC SPECIFIC

########################

merge contact and hire

df_hire_contact_join <- left_join(df_contact, df_hire, by = c("Id" = "Client_Name__c"))

Reduce variables

df_topic_edit1 <- df_hire_contact_join[,c(1,4:6,9:11,23:26,47,49,54,66,79,89,96,99,104,117,157,168,179,183,186,187,190,238:241,257,260,261,272,283,292,302,303,305:310,315,318,336,337,359,361,377,382:389,394,395)]

Add Age variable based on today??

df_topic_edit1["Age"] <- as.numeric(today() - ymd(as.character(df_topic_edit1$Date_Of_Birth__c)))/365

Replace NA's with 0's

for (i in 1:nrow(df_topic_edit1)){
if(is.na(df_topic_edit1[i,"Age"])){
df_topic_edit1[i,"Age"] <- 0
}
}

Bin the ages according to some standard I forget the name of!

and YES, these should probably be functions with lapply not for loops

df_topic_edit1["Age_bin"] <- ""

for (i in 1:nrow(df_topic_edit1)){
if (df_topic_edit1[i,"Age"] < 16) {
df_topic_edit1[i,"Age_bin"] <- 'Underage'
}
else if (df_topic_edit1[i,"Age"] >= 16 & df_topic_edit1[i,"Age"] < 21) {
df_topic_edit1[i,"Age_bin"] <- '16-20'
}
else if (df_topic_edit1[i,"Age"] >= 21 & df_topic_edit1[i,"Age"] < 26) {
df_topic_edit1[i,"Age_bin"] <- '21-25'
}
else if (df_topic_edit1[i,"Age"] >= 26 & df_topic_edit1[i,"Age"] < 31) {
df_topic_edit1[i,"Age_bin"] <- '26-30'
}
else if (df_topic_edit1[i,"Age"] >= 31 & df_topic_edit1[i,"Age"] < 36) {
df_topic_edit1[i,"Age_bin"] <- '31-35'
}
else if (df_topic_edit1[i,"Age"] >= 36 & df_topic_edit1[i,"Age"] < 41) {
df_topic_edit1[i,"Age_bin"] <- '36-40'
}
else if (df_topic_edit1[i,"Age"] >= 41 & df_topic_edit1[i,"Age"] < 46) {
df_topic_edit1[i,"Age_bin"] <- '41-45'
}
else if (df_topic_edit1[i,"Age"] >= 46 & df_topic_edit1[i,"Age"] < 51) {
df_topic_edit1[i,"Age_bin"] <- '46-50'
}
else if (df_topic_edit1[i,"Age"] >= 51 & df_topic_edit1[i,"Age"] < 56) {
df_topic_edit1[i,"Age_bin"] <- '51-55'
}
else if (df_topic_edit1[i,"Age"] >= 56 & df_topic_edit1[i,"Age"] < 61) {
df_topic_edit1[i,"Age_bin"] <- '56-60'
}
else if (df_topic_edit1[i,"Age"] >= 61 & df_topic_edit1[i,"Age"] < 66) {
df_topic_edit1[i,"Age_bin"] <- '61-65'
}
else if (df_topic_edit1[i,"Age"] >= 66 & df_topic_edit1[i,"Age"] < 71) {
df_topic_edit1[i,"Age_bin"] <- '66-70'
}
else if (df_topic_edit1[i,"Age"] >= 71 & df_topic_edit1[i,"Age"] < 76) {
df_topic_edit1[i,"Age_bin"] <- '71-75'
}
else if (df_topic_edit1[i,"Age"] >= 76 & df_topic_edit1[i,"Age"] < 81) {
df_topic_edit1[i,"Age_bin"] <- '76-80'
}
else {
df_topic_edit1[i,"Age_bin"] <- 'not specified'
}
}

df_topic_edit1$Age_bin <- as.factor(unlist(df_topic_edit1$Age_bin))

Create Days in Program

df_topic_edit1["Days_in_Program"] <- NA

for (i in 1:nrow(df_topic_edit1)){
if(df_topic_edit1[i,"Active_Color__c"] =="Black"){
df_topic_edit1[i,"Days_in_Program"] <- difftime(df_topic_edit1[i, "Date_Turned_Black__c"] , df_topic_edit1[i,"Dat_Initial_Assessment_was_Completed__c"], units="days")
}
else if (df_topic_edit1[i,"Active_Color__c"] =="Grey") {
df_topic_edit1[i,"Days_in_Program"] <- difftime(df_topic_edit1[i, "Date_turned_grey__c"] , df_topic_edit1[i,"Dat_Initial_Assessment_was_Completed__c"], units="days")
}
else if (df_topic_edit1[i,"Active_Color__c"] =="Blue") {
df_topic_edit1[i,"Days_in_Program"] <- difftime(df_topic_edit1[i, "Confirmed_Hired_Date__c.y"] , df_topic_edit1[i,"Dat_Initial_Assessment_was_Completed__c"], units="days")
}
else if (df_topic_edit1[i,"Active_Color__c"] =="Green" | df_topic_edit1[i,"Active_Color__c"] =="Purple" | df_topic_edit1[i,"Active_Color__c"] =="Red") {
df_topic_edit1[i,"Days_in_Program"] <- difftime(as.POSIXct("2019-3-1", format = "%Y-%m-%d", tz = "") , df_topic_edit1[i,"Dat_Initial_Assessment_was_Completed__c"], units="days")
}
else {
df_topic_edit1[i,"Days_in_Program"] <- NA
}
}

Get rid of rows that are not actual clients

df_clients <- filter(df_topic_edit1, Client__c == 1)

Get rid of a weird outlier row

df_clients_no <- filter(df_clients, Days_in_Program < 73000 & Days_in_Program >0)

Plot days in program for spouses vs vets

ggplot(df_clients_no, aes(x = Military_Spouse_Caregiver__c, y = Days_in_Program)) + geom_boxplot()

Create a df of spouse data

df_contact_spouses <- filter(df_topic_edit1, Military_Spouse_Caregiver__c ==1 & Client__c == 1)
str(df_contact_spouses, list.len=ncol(df_contact_spouses))
summary(df_contact_spouses)

Create a df of vet data

df_contact_vets <- filter(df_topic_edit1, Military_Spouse_Caregiver__c ==0 & Client__c == 1)
str(df_contact_vets, list.len=ncol(df_contact_vets))
summary(df_topic_edit1)

Plot various spouse demographics

ggplot(df_contact_spouses) + geom_bar(aes(x = MailingState))+ theme(axis.text.x = element_text(angle = 90, hjust = 1))
ggplot(df_contact_spouses) + geom_bar(aes(x = MailingCountry))+ theme(axis.text.x = element_text(angle = 90, hjust = 1))

df_contact_spouses_gender <- filter(df_contact_spouses, Gender__c !="" & Gender__c != "--None--")
ggplot(df_contact_spouses_gender) + geom_bar(aes(x = Gender__c))

df_contact_spouses_race <- filter(df_contact_spouses, Race__c !="")
ggplot(df_contact_spouses_race) + geom_bar(aes(x = Race__c))+ theme(axis.text.x = element_text(angle = 90, hjust = 1))

ggplot(df_contact_spouses) + geom_bar(aes(x = Highest_Level_of_Education_Completed__c))+ theme(axis.text.x = element_text(angle = 90, hjust = 1))
ggplot(df_contact_spouses) + geom_bar(aes(x = Status__c))+ theme(axis.text.x = element_text(angle = 90, hjust = 1))

df_contact_spouses_branch <- filter(df_contact_spouses, Service_Branch__c !="")
ggplot(df_contact_spouses_branch) + geom_bar(aes(x = Service_Branch__c))+ theme(axis.text.x = element_text(angle = 90, hjust = 1))

df_contact_spouses_status <- filter(df_contact_spouses, Service_Members_Status__c !="")
ggplot(df_contact_spouses_status) + geom_bar(aes(x = Service_Members_Status__c))+ theme(axis.text.x = element_text(angle = 90, hjust = 1))

df_contact_spouses_age <- filter(df_contact_spouses, Age_bin !="Underage")
ggplot(df_contact_spouses_age) + geom_bar(aes(x = Age_bin))+ theme(axis.text.x = element_text(angle = 90, hjust = 1))

@leaflettuce

This comment has been minimized.

Copy link
Owner Author

commented Mar 18, 2019

This is Awesome! Thanks for posting. Unless you wanted/planned to I'll toss the top few lines of this into the cleaner saved in here and print out the joined dataset which might just give us #13 !

Or it will at least give us a severely reduced size dataset to explore with. Thanks again!

@mitchb63

This comment has been minimized.

Copy link
Collaborator

commented Mar 18, 2019

@mitchb63

This comment has been minimized.

Copy link
Collaborator

commented Mar 18, 2019

@leaflettuce

This comment has been minimized.

Copy link
Owner Author

commented Mar 19, 2019

GTG!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.