# PROJECT RESULTS

## INSIGHTS

### Baseline (AS IS)

**Leads:**
- 96.9% of the total leads are from India.
- 34.9% of the total leads are from Mumbai city.
- 39.5% of potential customers do not provide their city of residence.
- The vast majority (89.7%) of the interested customers that the company is currently attracting are unemployed.
- Only 7.7% of leads are working professionals.
- Activity and profile index features are missing for 45.4% of the leads.
- For the profile index feature, almost no leads fall under the '03. Low' category.


**Lead-to-customer conversion rate**
- The current lead-to-customer conversion rate is 38.6%.
- Working professionals have a high conversion rate (92.5%), especially those from the management sector.
- Unemployed leads, although high in number, have a low conversion rate (33.9%).
- Almost all leads coming from the 'Reference' (91.7% conv. rate) and 'Welingak Website' (98.1% conv. rate) sources end up buying the product. However, only 2.8% of leads come from these sources.
- Leads who hear about the company through recommendations (Word of mouth, Student of someschool) have higher conversion rates.
- 94.8% of the leads marked as 'high in relevance' by employees end up becoming paying customers. However, leads classified as 'low in relevance' or 'might be' also have high conversion rates (81.9% and 75.6% respectively).
- For the activity index feature, leads in the '02. Medium' category obtain better conversion rates than those in the '01. High' category.
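
Per-group conversion rates like the ones above can be obtained by averaging the binary `converted` target within each group. The following is a minimal sketch on invented toy data (not the project's dataset); it assumes `converted` is encoded as 0/1 and reuses the `ocupation` column name from the scoring code later in this document.

```python
import pandas as pd

# Toy data, invented purely for illustration (not the project's dataset).
leads = pd.DataFrame({
    "ocupation": ["Unemployed", "Unemployed", "Unemployed", "Unemployed",
                  "Working Professional", "Working Professional"],
    "converted": [1, 0, 0, 0, 1, 1],
})

# The mean of a 0/1 target is the conversion rate; the count is the group size.
conversion_by_occupation = (
    leads.groupby("ocupation")["converted"]
         .agg(leads="count", conversion_rate="mean")
         .sort_values("conversion_rate", ascending=False)
)
print(conversion_by_occupation)
```

Reporting the group size next to the rate helps to spot cases like the 'Reference' source, where a very high conversion rate rests on a small share of leads.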


**Commercial and marketing channels**
- Most leads come from Google (30.8%) and Direct traffic (27.8%). Organic search and Olark chat generate a significant number of leads too (12.7% and 18.8% of the total number of leads respectively).
- Converted leads spent a median of 10 minutes more viewing the website than those who did not convert, regardless of the number of visits and pages viewed.
- Only 0.24% of the leads indicated on the form that they have seen advertisements from the company.
- Employees are not sure about the quality of 63% of leads, and only have time/information to fill in the lead profile field for 26.1% of all leads.
- Around 92% of leads do not like to be called or receive emails about the course.
- Only 14.8% of leads who want to be contacted by email end up converting into paying customers.
- Email marketing campaigns have untapped potential: for 30%-37% of all leads, the last activity (or last notable activity) was opening an email, but only about 37% of those leads went on to convert.
- SMS campaigns achieved conversion rates of 60%-70% and reached a significant number of leads.
- There are leads tagged as 'Ringing' who selected 'do not want to receive phone calls' on the form.
- Most potential customers were not interested in getting a free copy of the lead magnet. Leads interested in downloading the lead magnet are mostly unemployed and download it mainly from the landing page.
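
The channel shares and the median time gap quoted above are simple descriptive statistics. As a hedged illustration, the sketch below computes both on invented toy data; the column names `source`, `total_time_website` and `converted` follow the scoring code later in this document, while the values and the time unit are assumptions.

```python
import pandas as pd

# Toy data, invented for illustration only; the time unit is arbitrary here.
leads = pd.DataFrame({
    "source": ["Google", "Google", "Direct Traffic", "Olark Chat", "Organic Search"],
    "total_time_website": [900, 120, 1500, 60, 300],
    "converted": [1, 0, 1, 0, 0],
})

# Share of leads per source, as a percentage of all leads.
source_share = leads["source"].value_counts(normalize=True).mul(100).round(1)

# Median time on the website for converted (1) and non-converted (0) leads.
median_time = leads.groupby("converted")["total_time_website"].median()
median_gap = median_time.loc[1] - median_time.loc[0]

print(source_share)
print(median_gap)
```

The median is used rather than the mean because time-on-site distributions tend to be heavily skewed by a few very long sessions.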

### Actions to improve company's customer knowledge

1. Improve the quality of the survey or form questions to collect more user input and reduce NaN/default ('Select') values.
2. Improve the algorithm for the activity and profile score/index so that it produces complete and more accurate results.
3. Collect timestamps of website visits for seasonality analysis, and implement cookies to identify and track users as they navigate between pages on the website.
4. Implement a new lead segmentation algorithm that identifies the company's different lead profiles and assigns each new lead to the group it fits best, so that more personalised commercial actions can be carried out.
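
To track progress on the first action, the share of unanswered questions has to count the form's default 'Select' value as missing, not only true NaN values. A minimal sketch, assuming the placeholder is stored literally as the string 'Select' and using invented rows:

```python
import pandas as pd

# Toy form answers, invented for illustration; 'Select' is the untouched default.
form = pd.DataFrame({
    "city": ["Mumbai", "Select", None, "Thane"],
    "specialization": ["Select", "Finance Management", "Select", None],
})

# Blank out the placeholder so it is counted together with real missing values.
cleaned = form.mask(form.eq("Select"))

# Percentage of leads with no usable answer, per question.
missing_share = cleaned.isna().mean().mul(100).round(1)
print(missing_share)
```

Running the same check before and after a form redesign gives a direct measure of whether the new questions actually collect more input.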

### Actions to improve lead-to-customer conversion rate

1. Implement a predictive lead scoring algorithm that identifies the people who are most likely to convert into paying customers through the most efficient channel for each lead. The algorithm would also relieve the sales team of manually filling in features such as lead_quality, lead_profile or tags, so they can spend more time contacting the most promising leads.

### Actions to improve commercial and marketing channels performance

1. Improve the content strategy of the website, lead magnet and emails to attract traffic and increase the time people spend on the website by creating tailored content mainly for working professionals in the Indian management sector.
2. Create a referral program to encourage existing customers to recommend the course to their friends, family, and colleagues.
3. Invest more resources into acquiring leads from the 'Welingak Website' source.
4. Increase investments in SMS campaigns as they are performing well.
5. Check whether the default value for advertisement features is set to 'No' in the web form, which could explain the high percentage of 'No' for all of them. If this is not the case, then the advertising investment strategy should be completely revised, as it is generating almost no leads (0.24%).
6. Check that the sales team is only contacting people who have given their consent to do so.
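
The consent check in item 6 can be approximated directly from the lead data by looking for leads tagged as 'Ringing' who opted out of phone calls. The sketch below uses invented rows; the `tags` and `do_not_call` column names come from this project, but the 'Yes'/'No' encoding and the other tag value are assumptions.

```python
import pandas as pd

# Toy leads, invented for illustration; 'id' mimics the lead number index.
leads = pd.DataFrame(
    {
        "do_not_call": ["No", "Yes", "Yes", "No"],
        "tags": ["Ringing", "Ringing", "Interested in other courses", None],
    },
    index=pd.Index([101, 102, 103, 104], name="id"),
)

# Leads that were phoned ('Ringing') despite opting out of calls.
called_without_consent = leads[
    (leads["tags"] == "Ringing") & (leads["do_not_call"] == "Yes")
]
print(called_without_consent)
```

An empty result would suggest the opt-out is being respected; any rows returned are candidates for a manual review with the sales team.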

## LEAD SEGMENTATION MODEL

### Identified segment profiles

**Segment 0**
- Origin: API.
- Last activity: Most leads have conversations via Olark chat.
- Segment with lower presence of working professionals.
- Time spent on the website far below average.
- Almost no leads in this segment buy the company's product.

**Segment 1**
- Origin: Landing Page.
- Last activity: Email Opened.
- Some presence of working professionals.
- Above-average time spent on the website.
- Slightly lower conversion rate than the company's current average conversion rate.

**Segment 2**
- Origin: Landing Page.
- Last activity: Most of them have received an SMS. Some of them have visited the website.
- Some presence of working professionals.
- Above-average time spent on the website.
- Slightly higher conversion rate than the company's current average conversion rate.

**Segment 3**
- Origin: API.
- Last activity: Email Opened.
- Some presence of working professionals.
- Below-average time spent on the website.
- Slightly lower conversion rate than the company's current one. Similar to the conversion rate of segment 1.

**Segment 4**
- Origin: Lead Add Form (main sources: References, Welingak website).
- Last activity: Email Opened, SMS sent or unknown.
- High presence of working professionals.
- Time spent on the website far below average.
- 9 out of 10 leads in this segment end up buying the company's product.

**Segment 5**
- Origin: API.
- Last activity: SMS sent.
- Notable presence of working professionals.
- Above-average time spent on the website.
- Conversion rate significantly higher than the company's current average conversion rate.

### Actionable initiatives

1. The company's most valuable leads are those that come from referrals or from Welingak website, and even more so if they are working professionals. As proposed in the exploratory data analysis section, the company should seriously consider creating a referral programme to encourage existing customers to recommend the course to their close circle.


2. SMS campaigns are performing quite well. However, these campaigns should:
    - Focus on working professionals coming from API or landing page who spend above-average time on the website.
    - Focus on leads coming from References or the Welingak website, regardless of their occupation and the time spent on the website.
    - Avoid leads who come from API and have spent a short time on the site.
    

3. Olark chat is not performing well. The company should consider withdrawing investment in this service and, for leads coming from API, replacing it with:
    - Email marketing campaigns for working professionals who spent a short time on the website and for unemployed leads.
    - SMS campaigns for working professionals who spend above-average time on the website, as discussed in the previous point.
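
The channel rules in initiatives 2 and 3 can be written down as a small decision function. This is only a sketch of one possible encoding: the function name `suggest_channel`, its parameters and the exact label strings are hypothetical, and it returns `None` for combinations that the initiatives do not cover.

```python
from typing import Optional

# Assumed label strings; adjust to the values actually present in the data.
REFERRAL_SOURCES = {"Reference", "Welingak Website"}


def suggest_channel(origin: str, source: str, occupation: str,
                    above_avg_time: bool) -> Optional[str]:
    """Return 'SMS', 'Email', or None when the initiatives give no guidance."""
    is_professional = occupation == "Working Professional"

    # Referral and Welingak leads: SMS regardless of occupation and time on site.
    if source in REFERRAL_SOURCES:
        return "SMS"
    # Engaged working professionals from API or landing page: SMS.
    if origin in {"API", "Landing Page"} and is_professional and above_avg_time:
        return "SMS"
    # Other API leads with a short visit, or unemployed: email instead of chat.
    if origin == "API" and (not above_avg_time or occupation == "Unemployed"):
        return "Email"
    return None


print(suggest_channel("API", "Google", "Working Professional", False))
```

Keeping the rules in one function makes it easy to review them with the marketing team and to replace them later with the channel recommendation of a predictive model.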

## LEAD SCORING MODEL

In [15]:
import cloudpickle
import pandas as pd
from numpy import where
from janitor import clean_names

# Data importation
project_path = (r'C:\Users\pedro\PEDRO\DS\Portfolio\LEAD_SCORING').replace('\\','/')
data_file_name = 'validation.csv'
full_path = project_path + '/02_Data/02_Validation/' + data_file_name
df = pd.read_csv(full_path,sep=',')

# Data quality
df = clean_names(df) \
       .rename(columns={'lead_number':'id',
                        'lead_source':'source',
                        'totalvisits':'total_visits',
                         'total_time_spent_on_website':'total_time_website',
                         'how_did_you_hear_about_x_education':'hear_about',
                         'what_is_your_current_occupation':'ocupation',
                         'what_matters_most_to_you_in_choosing_a_course':'matters_most',
                         'asymmetrique_activity_score':'activity_score',
                         'asymmetrique_profile_score':'profile_score',
                         'a_free_copy_of_mastering_the_interview':'lead_magnet'}) \
       .drop_duplicates() \
       .set_index('id')
df = df.loc[~((df.last_activity=='Email Bounced')|(df.last_notable_activity=='Email Bounced'))]

# Final features
final_features = ['activity_score',
                  'city',
                  'country',
                  'do_not_call',
                  'do_not_email',
                  'hear_about',
                  'last_activity',                  
                  'last_notable_activity',                  
                  'lead_magnet',   
                  'lead_origin',  
                  'matters_most',     
                  'ocupation',
                  'page_views_per_visit',
                  'profile_score',
                  'source',
                  'specialization',
                  'total_time_website',
                  'total_visits']

# Ground truth
y_true = df['converted']

# Final dataset                 
df = df[final_features]

# Loading execution pipe
name_pipe_execution = 'pipe_execution.pickle'
path_pipe_ejecucion = project_path + '/04_Models/' + name_pipe_execution

with open(path_pipe_ejecucion, mode='rb') as file:
   pipe_execution = cloudpickle.load(file)

# Loading optimal discrimination threshold
name_disc_threshold = 'optimal_disc_threshold.pickle'
path_disc_threshold = project_path + '/04_Models/' + name_disc_threshold

with open(path_disc_threshold, mode='rb') as file:
   optimal_disc_threshold = cloudpickle.load(file)

# Execution and results
scoring = pipe_execution.predict_proba(df)[:, 1]
manage_lead = where(scoring>optimal_disc_threshold, 'Yes', 'No')
results = pd.DataFrame({'lead_score':scoring, 'manage_lead':manage_lead}).set_index(df.index)

# Model performance dataframe
model_performance = pd.concat([results,y_true],axis=1)

### KPIs

#### Lead-to-customer conversion rate

##### As is

Conversion rate without applying the developed machine learning model:

In [25]:
cr_asis = model_performance.converted.mean()
cr_asis

0.38960313024035775

##### To be 

Conversion rate when managing only the leads that the model identifies as profitable customers (`manage_lead == 'Yes'`):

In [26]:
cr_tobe = model_performance[model_performance.manage_lead=='Yes'].converted.mean()
cr_tobe

0.498546511627907

KPI improvement:

In [111]:
improvements_cr = round(cr_tobe - cr_asis,2)

#### Sales team workload

##### As is

Number of leads to be managed without applying the developed model:

In [31]:
workload_asis = model_performance.shape[0]
workload_asis

1789

##### To be

Number of leads selected by the algorithm to be managed:

In [33]:
workload_tobe = model_performance[model_performance.manage_lead=='Yes'].shape[0]
workload_tobe

1376

KPI improvement:

In [110]:
improvements_workload = round(100 - workload_tobe*100/workload_asis,2)

#### Lost investment in not converted lead management

##### As is

Amount lost in managing leads that ultimately did not buy the product, without applying the developed machine learning model (`ltc_avg_cost` is the lead-to-customer average cost defined in the Sales profit section below):

In [64]:
lost_asis = round(model_performance[model_performance.converted==0].shape[0]*ltc_avg_cost,2)
lost_asis

3549.0

##### To be

Amount lost in managing leads that ultimately did not buy the product if the developed machine learning model is applied:

In [65]:
lost_tobe = round(model_performance[(model_performance.manage_lead=='Yes')&(model_performance.converted==0)] \
                  .shape[0]*ltc_avg_cost,2)
lost_tobe 

2242.5

KPI improvement:

In [109]:
improvements_lost = round(100 - lost_tobe*100/lost_asis,2)

#### Sales profit

As reported by the company:
- Product selling price (online course for industry professionals): 49.99 \$
- Lead-to-customer average cost: 3.25 \$ per converted customer.

In [37]:
product_price = 49.99
ltc_avg_cost = 3.25

##### As is

Profit from product sales without applying the developed machine learning model:

In [75]:
sales_asis = round(model_performance[model_performance.converted==1] \
                   .shape[0]*(product_price-ltc_avg_cost) - lost_asis,2)
sales_asis

29028.78

##### To be

Profit from product sales if the developed machine learning model is applied:

In [76]:
sales_tobe = round(model_performance[(model_performance.manage_lead=='Yes')&(model_performance.converted==1)] \
                   .shape[0]*(product_price-ltc_avg_cost) - lost_tobe,2)
sales_tobe

29821.14

KPI improvement:

In [108]:
improvement_sales = round(sales_tobe*100/sales_asis - 100,2)

### Measurable improvements

By applying the developed lead scoring predictive model, the company would be able to:
1. Increase its sales profit by 2.7%.
2. Reduce by 36.8% the money lost on managing leads that do not convert.
3. Increase the lead-to-customer conversion rate from 39% to 50% (11 percentage points).
4. Save about 23% of the time employees spend managing leads, since 23.1% fewer leads need to be managed.
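
The four figures above can be re-derived by hand from the numbers reported in the KPI sections. The sketch below is a cross-check under one assumption: the converted counts (697 and 686) are not printed anywhere in this document and are inferred here from the reported conversion rates and workloads. The helper `kpis` is hypothetical.

```python
# Inputs reported in the KPI sections.
product_price = 49.99
ltc_avg_cost = 3.25
workload_asis = 1789
workload_tobe = 1376

# Assumption: converted counts inferred from the reported rates
# (697 / 1789 = 0.38960..., 686 / 1376 = 0.49854...).
converted_asis = 697
converted_tobe = 686


def kpis(managed, converted):
    """Return (conversion rate, lost investment, sales profit) for managed leads."""
    lost = (managed - converted) * ltc_avg_cost
    profit = converted * (product_price - ltc_avg_cost) - lost
    return converted / managed, lost, round(profit, 2)


cr_asis, lost_asis, sales_asis = kpis(workload_asis, converted_asis)
cr_tobe, lost_tobe, sales_tobe = kpis(workload_tobe, converted_tobe)

print(round(cr_tobe - cr_asis, 2))                          # 0.11
print(round(100 - workload_tobe * 100 / workload_asis, 2))  # 23.09
print(round(100 - lost_tobe * 100 / lost_asis, 2))          # 36.81
print(round(sales_tobe * 100 / sales_asis - 100, 2))        # 2.73
```

Note that the profit formula charges `ltc_avg_cost` once per managed lead: it is subtracted from the price for converted leads and counted as lost investment for the rest, which mirrors the notebook cells above.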

In [107]:
pd.DataFrame({'Conversion Rate':[cr_asis,cr_tobe,'Increased by ' + str(improvements_cr)],
              'Workload':[workload_asis,workload_tobe,'Reduced by ' + str(improvements_workload) + '%'],
              'Lost investments':[lost_asis,lost_tobe,'Reduced by ' + str(improvements_lost) + '%'],
              'Sales profit':[sales_asis,sales_tobe,'Increased by ' + str(improvement_sales) + '%']}) \
  .T.rename_columns({0:'As is',1:'To be',2:'Improvements'})

| KPI | As is | To be | Improvements |
|---|---|---|---|
| Conversion Rate | 0.389603 | 0.498547 | Increased by 0.11 |
| Workload | 1789.0 | 1376.0 | Reduced by 23.09% |
| Lost investments | 3549.0 | 2242.5 | Reduced by 36.81% |
| Sales profit | 29028.78 | 29821.14 | Increased by 2.73% |
