Decide how to count 'time to hire' #9

leaflettuce · 2019-03-16T13:58:59Z

Split out from #3, we need to figure out how to track time to hire to answer business problem revolving around demo differences and hiring time #2.

As @mitchb63 mentioned, there are time_to_(color) variables in df_contact we can use for this.

Should time to blue be the dependent for this question? Or should we create one based on date user account was created to date turned blue? OR some other method?

mitchb63 · 2019-03-16T16:40:23Z

There is also df_contact$Confirmed_Hired_Date__c
However, I think I read somewhere in the HH documentation that "hired" information should be taken from the "hiring" table

leaflettuce · 2019-03-16T17:30:55Z

Good point! Looks like there is a 'hire date' in hiring table which connects to contact via hire$client_name = contact$id

ChRizzo1 · 2019-03-16T18:33:03Z

Hello,

There is a time to hire in df_feedback, although not sure if we have a join to other tables.

mitchb63 · 2019-03-16T18:37:30Z

We sort of need to define where that time "starts" though too. Our group needs to work with "time in program" which includes "time to hire" as well as the time from "program start" to the time clients leave without being hired. Unless of course there's a "time in program" variable in the data. I haven't seen one though.

mitchb63 · 2019-03-16T18:46:35Z

I guess by "We" I mean "our team." ;)
You guys don't necessarily have to do that!

ChRizzo1 · 2019-03-16T18:55:51Z

I did not find time in the program but you have the var: Months_Unemployed__c in SalesForce hire info data set. Maybe useful if you can find when the client joined ...

"

leaflettuce · 2019-03-16T20:13:46Z

^^ Months unemployed could be the answer we're looking for.

With green being the 'actively seeking employment' label for HH< could we also find the difference between 'contact$date_turned_green and hire$start_date? This could be one answer to your question of when to define a client program 'start'- although they could be working within the program while labelled red or purple as well? Not sure on the best route here, anyone have a preference?

And while yours directly requires this type of answer @mitchb63, our team could also use it to see if volunteers affect this time-period or if demo has any significant effect on it.

mitchb63 · 2019-03-16T21:32:07Z

Maybe we can get an answer to how HH wants us to determine the "start date" during the webinar Tuesday.

mitchb63 · 2019-03-17T19:14:15Z

I did some experimenting with this today. "Created_Date" works to a degree but it doesn't seem to be the answer in some cases. Some accounts were apparently created after the fact because I came up with negative "Time in Program" values at times. I also don't think Date turned Green will work for our group because we also need to calculate "Time to date turned Black" which happens before they turn Green.

mitchb63 · 2019-03-17T20:19:19Z

I think I may have found something!
df_contact$Dat_Initial_Assessment_was_Completed__c

This is described as "Date of first conversation with Transition Specialist".

leaflettuce · 2019-03-18T14:31:14Z

@mitchb63 Great find! I'm not sure how I missed that when I was looking through the data dictionary. Can we use this and date_turned_blue/green/black to help answer those bus questions you think?

I'll let you know if I find anything else, but it sounds like it could be useful to make some variables such as : time_to_green/black/blue. That would at least be a start and provide a look at how many days (or whatever interval) it takes for each person to move on in the process.

Thoughts?

mitchb63 · 2019-03-18T18:03:11Z

I used it to create a "Days in Program" variable that includes all of the end point "colors" It also uses a somewhat arbitrary date of 3/1/2019 as the end date for those that are "currently in the program". Basically, I was just trying to get a very rough idea of what the variables involved in our problems looks like. I created a Word doc with some overview info to help team members get up to speed.

leaflettuce · 2019-03-18T18:48:16Z

Just browsed the word and it looks solid! My eda branch is looking to do the same for our teams bus questions but just fleshing it out now.

Could you share the code you used to create the days in prog variable pls? I'd like to explore it further as well! 😁

mitchb63 · 2019-03-18T20:31:39Z

No problem. It’s a bit clumsy but it seems to work. When I get back to my computer I’ll send it. Btw, what does EDA stand for? 🤔

…

Sent from my iPhone

On Mar 18, 2019, at 2:48 PM, Andrew Trick ***@***.***> wrote: Just browsed the word and it looks solid! My eda branch is looking to do the same for our teams bus questions but just fleshing it out now. Could you share the code you used to create the days in prog variable pls? I'd like to explore it further as well! 😁 — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

leaflettuce · 2019-03-18T20:46:50Z

Exploratory data analysis. A little vague of a term really. The eda branch is just me exploring the data more through quick visuals in python. Any unexpected or useful findings I’ll be sure to store in a visuals/eda folder within the repo over the next few days. Thanks! Andy

…

On Mar 18, 2019, at 3:31 PM, mitchb63 ***@***.***> wrote: No problem. It’s a bit clumsy but it seems to work. When I get back to my computer I’ll send it. Btw, what does EDA stand for? 🤔 Sent from my iPhone > On Mar 18, 2019, at 2:48 PM, Andrew Trick ***@***.***> wrote: > > Just browsed the word and it looks solid! My eda branch is looking to do the same for our teams bus questions but just fleshing it out now. > > Could you share the code you used to create the days in prog variable pls? I'd like to explore it further as well! 😁 > > — > You are receiving this because you were mentioned. > Reply to this email directly, view it on GitHub, or mute the thread. — You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or mute the thread.

mitchb63 · 2019-03-18T23:39:27Z

Here is the code for what I did this morning. It's an add-on to the cleaner.r file. I haven't taken the time to split it out nicely like you have yet. As I mentioned, there are some questionable choices made in here but I was trying to crank something out to look at before the call tonight.

########################

TEAM2 TOPIC SPECIFIC

########################

merge contact and hire

df_hire_contact_join <- left_join(df_contact, df_hire, by = c("Id" = "Client_Name__c"))

Reduce variables

df_topic_edit1 <- df_hire_contact_join[,c(1,4:6,9:11,23:26,47,49,54,66,79,89,96,99,104,117,157,168,179,183,186,187,190,238:241,257,260,261,272,283,292,302,303,305:310,315,318,336,337,359,361,377,382:389,394,395)]

Add Age variable based on today??

df_topic_edit1["Age"] <- as.numeric(today() - ymd(as.character(df_topic_edit1$Date_Of_Birth__c)))/365

Replace NA's with 0's

for (i in 1:nrow(df_topic_edit1)){
if(is.na(df_topic_edit1[i,"Age"])){
df_topic_edit1[i,"Age"] <- 0
}
}

Bin the ages according to some standard I forget the name of!

and YES, these should probably be functions with lapply not for loops

df_topic_edit1["Age_bin"] <- ""

for (i in 1:nrow(df_topic_edit1)){
if (df_topic_edit1[i,"Age"] < 16) {
df_topic_edit1[i,"Age_bin"] <- 'Underage'
}
else if (df_topic_edit1[i,"Age"] >= 16 & df_topic_edit1[i,"Age"] < 21) {
df_topic_edit1[i,"Age_bin"] <- '16-20'
}
else if (df_topic_edit1[i,"Age"] >= 21 & df_topic_edit1[i,"Age"] < 26) {
df_topic_edit1[i,"Age_bin"] <- '21-25'
}
else if (df_topic_edit1[i,"Age"] >= 26 & df_topic_edit1[i,"Age"] < 31) {
df_topic_edit1[i,"Age_bin"] <- '26-30'
}
else if (df_topic_edit1[i,"Age"] >= 31 & df_topic_edit1[i,"Age"] < 36) {
df_topic_edit1[i,"Age_bin"] <- '31-35'
}
else if (df_topic_edit1[i,"Age"] >= 36 & df_topic_edit1[i,"Age"] < 41) {
df_topic_edit1[i,"Age_bin"] <- '36-40'
}
else if (df_topic_edit1[i,"Age"] >= 41 & df_topic_edit1[i,"Age"] < 46) {
df_topic_edit1[i,"Age_bin"] <- '41-45'
}
else if (df_topic_edit1[i,"Age"] >= 46 & df_topic_edit1[i,"Age"] < 51) {
df_topic_edit1[i,"Age_bin"] <- '46-50'
}
else if (df_topic_edit1[i,"Age"] >= 51 & df_topic_edit1[i,"Age"] < 56) {
df_topic_edit1[i,"Age_bin"] <- '51-55'
}
else if (df_topic_edit1[i,"Age"] >= 56 & df_topic_edit1[i,"Age"] < 61) {
df_topic_edit1[i,"Age_bin"] <- '56-60'
}
else if (df_topic_edit1[i,"Age"] >= 61 & df_topic_edit1[i,"Age"] < 66) {
df_topic_edit1[i,"Age_bin"] <- '61-65'
}
else if (df_topic_edit1[i,"Age"] >= 66 & df_topic_edit1[i,"Age"] < 71) {
df_topic_edit1[i,"Age_bin"] <- '66-70'
}
else if (df_topic_edit1[i,"Age"] >= 71 & df_topic_edit1[i,"Age"] < 76) {
df_topic_edit1[i,"Age_bin"] <- '71-75'
}
else if (df_topic_edit1[i,"Age"] >= 76 & df_topic_edit1[i,"Age"] < 81) {
df_topic_edit1[i,"Age_bin"] <- '76-80'
}
else {
df_topic_edit1[i,"Age_bin"] <- 'not specified'
}
}

df_topic_edit1$Age_bin <- as.factor(unlist(df_topic_edit1$Age_bin))

Create Days in Program

df_topic_edit1["Days_in_Program"] <- NA

for (i in 1:nrow(df_topic_edit1)){
if(df_topic_edit1[i,"Active_Color__c"] =="Black"){
df_topic_edit1[i,"Days_in_Program"] <- difftime(df_topic_edit1[i, "Date_Turned_Black__c"] , df_topic_edit1[i,"Dat_Initial_Assessment_was_Completed__c"], units="days")
}
else if (df_topic_edit1[i,"Active_Color__c"] =="Grey") {
df_topic_edit1[i,"Days_in_Program"] <- difftime(df_topic_edit1[i, "Date_turned_grey__c"] , df_topic_edit1[i,"Dat_Initial_Assessment_was_Completed__c"], units="days")
}
else if (df_topic_edit1[i,"Active_Color__c"] =="Blue") {
df_topic_edit1[i,"Days_in_Program"] <- difftime(df_topic_edit1[i, "Confirmed_Hired_Date__c.y"] , df_topic_edit1[i,"Dat_Initial_Assessment_was_Completed__c"], units="days")
}
else if (df_topic_edit1[i,"Active_Color__c"] =="Green" | df_topic_edit1[i,"Active_Color__c"] =="Purple" | df_topic_edit1[i,"Active_Color__c"] =="Red") {
df_topic_edit1[i,"Days_in_Program"] <- difftime(as.POSIXct("2019-3-1", format = "%Y-%m-%d", tz = "") , df_topic_edit1[i,"Dat_Initial_Assessment_was_Completed__c"], units="days")
}
else {
df_topic_edit1[i,"Days_in_Program"] <- NA
}
}

Get rid of rows that are not actual clients

df_clients <- filter(df_topic_edit1, Client__c == 1)

Get rid of a weird outlier row

df_clients_no <- filter(df_clients, Days_in_Program < 73000 & Days_in_Program >0)

Plot days in program for spouses vs vets

ggplot(df_clients_no, aes(x = Military_Spouse_Caregiver__c, y = Days_in_Program)) + geom_boxplot()

Create a df of spouse data

df_contact_spouses <- filter(df_topic_edit1, Military_Spouse_Caregiver__c ==1 & Client__c == 1)
str(df_contact_spouses, list.len=ncol(df_contact_spouses))
summary(df_contact_spouses)

Create a df of vet data

df_contact_vets <- filter(df_topic_edit1, Military_Spouse_Caregiver__c ==0 & Client__c == 1)
str(df_contact_vets, list.len=ncol(df_contact_vets))
summary(df_topic_edit1)

Plot various spouse demographics

ggplot(df_contact_spouses) + geom_bar(aes(x = MailingState))+ theme(axis.text.x = element_text(angle = 90, hjust = 1))
ggplot(df_contact_spouses) + geom_bar(aes(x = MailingCountry))+ theme(axis.text.x = element_text(angle = 90, hjust = 1))

df_contact_spouses_gender <- filter(df_contact_spouses, Gender__c !="" & Gender__c != "--None--")
ggplot(df_contact_spouses_gender) + geom_bar(aes(x = Gender__c))

df_contact_spouses_race <- filter(df_contact_spouses, Race__c !="")
ggplot(df_contact_spouses_race) + geom_bar(aes(x = Race__c))+ theme(axis.text.x = element_text(angle = 90, hjust = 1))

ggplot(df_contact_spouses) + geom_bar(aes(x = Highest_Level_of_Education_Completed__c))+ theme(axis.text.x = element_text(angle = 90, hjust = 1))
ggplot(df_contact_spouses) + geom_bar(aes(x = Status__c))+ theme(axis.text.x = element_text(angle = 90, hjust = 1))

df_contact_spouses_branch <- filter(df_contact_spouses, Service_Branch__c !="")
ggplot(df_contact_spouses_branch) + geom_bar(aes(x = Service_Branch__c))+ theme(axis.text.x = element_text(angle = 90, hjust = 1))

df_contact_spouses_status <- filter(df_contact_spouses, Service_Members_Status__c !="")
ggplot(df_contact_spouses_status) + geom_bar(aes(x = Service_Members_Status__c))+ theme(axis.text.x = element_text(angle = 90, hjust = 1))

df_contact_spouses_age <- filter(df_contact_spouses, Age_bin !="Underage")
ggplot(df_contact_spouses_age) + geom_bar(aes(x = Age_bin))+ theme(axis.text.x = element_text(angle = 90, hjust = 1))

leaflettuce · 2019-03-18T23:50:18Z

This is Awesome! Thanks for posting. Unless you wanted/planned to I'll toss the top few lines of this into the cleaner saved in here and print out the joined dataset which might just give us #13 !

Or it will at least give us a severely reduced size dataset to explore with. Thanks again!

mitchb63 · 2019-03-18T23:53:39Z

Just remember that I was selecting variables based on the Serving Spouses questions.

…

-- *Mitch Beckner* USA Cycling Official #183739 Ohio Cycling Association Officials Coordinator 937-875-0081

On Mon, Mar 18, 2019 at 7:50 PM Andrew Trick ***@***.***> wrote: This is Awesome! Thanks for posting. Unless you wanted/planned to I'll toss the top few lines of this into the cleaner saved in here and print out the joined dataset which might just give us #13 <#13> ! Or it will at least give us a severely reduced size dataset to explore with. Thanks again! — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#9 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AjJ74WFYssO5kpx2AZhuZbNEhx5IVxNsks5vYCY7gaJpZM4b30YA> .

mitchb63 · 2019-03-18T23:54:19Z

And yeah, that's fine! Some of that does really go in the cleaner.

…

-- *Mitch Beckner* USA Cycling Official #183739 Ohio Cycling Association Officials Coordinator 937-875-0081

On Mon, Mar 18, 2019 at 7:53 PM Mitchell Beckner ***@***.***> wrote: Just remember that I was selecting variables based on the Serving Spouses questions. -- *Mitch Beckner* USA Cycling Official #183739 Ohio Cycling Association Officials Coordinator 937-875-0081 On Mon, Mar 18, 2019 at 7:50 PM Andrew Trick ***@***.***> wrote: > This is Awesome! Thanks for posting. Unless you wanted/planned to I'll > toss the top few lines of this into the cleaner saved in here and print out > the joined dataset which might just give us #13 > <#13> ! > > Or it will at least give us a severely reduced size dataset to explore > with. Thanks again! > > — > You are receiving this because you were mentioned. > Reply to this email directly, view it on GitHub > <#9 (comment)>, > or mute the thread > <https://github.com/notifications/unsubscribe-auth/AjJ74WFYssO5kpx2AZhuZbNEhx5IVxNsks5vYCY7gaJpZM4b30YA> > . >

leaflettuce · 2019-03-19T00:35:17Z

GTG!

leaflettuce added the question Further information is requested label Mar 16, 2019

leaflettuce added this to the Business Solutions milestone Mar 16, 2019

leaflettuce closed this as completed Mar 19, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Decide how to count 'time to hire' #9

Decide how to count 'time to hire' #9

leaflettuce commented Mar 16, 2019

mitchb63 commented Mar 16, 2019

leaflettuce commented Mar 16, 2019

ChRizzo1 commented Mar 16, 2019

mitchb63 commented Mar 16, 2019

mitchb63 commented Mar 16, 2019

ChRizzo1 commented Mar 16, 2019

leaflettuce commented Mar 16, 2019

mitchb63 commented Mar 16, 2019

mitchb63 commented Mar 17, 2019

mitchb63 commented Mar 17, 2019

leaflettuce commented Mar 18, 2019

mitchb63 commented Mar 18, 2019

leaflettuce commented Mar 18, 2019

mitchb63 commented Mar 18, 2019 via email

leaflettuce commented Mar 18, 2019 via email

mitchb63 commented Mar 18, 2019

leaflettuce commented Mar 18, 2019

mitchb63 commented Mar 18, 2019 via email

mitchb63 commented Mar 18, 2019 via email

leaflettuce commented Mar 19, 2019

Decide how to count 'time to hire' #9

Decide how to count 'time to hire' #9

Comments

leaflettuce commented Mar 16, 2019

mitchb63 commented Mar 16, 2019

leaflettuce commented Mar 16, 2019

ChRizzo1 commented Mar 16, 2019

mitchb63 commented Mar 16, 2019

mitchb63 commented Mar 16, 2019

ChRizzo1 commented Mar 16, 2019

I did not find time in the program but you have the var: Months_Unemployed__c in SalesForce hire info data set. Maybe useful if you can find when the client joined ...

leaflettuce commented Mar 16, 2019

mitchb63 commented Mar 16, 2019

mitchb63 commented Mar 17, 2019

mitchb63 commented Mar 17, 2019

leaflettuce commented Mar 18, 2019

mitchb63 commented Mar 18, 2019

leaflettuce commented Mar 18, 2019

mitchb63 commented Mar 18, 2019 via email

leaflettuce commented Mar 18, 2019 via email

mitchb63 commented Mar 18, 2019

TEAM2 TOPIC SPECIFIC

merge contact and hire

Reduce variables

Add Age variable based on today??

Replace NA's with 0's

Bin the ages according to some standard I forget the name of!

and YES, these should probably be functions with lapply not for loops

Create Days in Program

Get rid of rows that are not actual clients

Get rid of a weird outlier row

Plot days in program for spouses vs vets

Create a df of spouse data

Create a df of vet data

Plot various spouse demographics

leaflettuce commented Mar 18, 2019

mitchb63 commented Mar 18, 2019 via email

mitchb63 commented Mar 18, 2019 via email

leaflettuce commented Mar 19, 2019