### **Link:** https://platform.stratascratch.com/data-projects/modelling-churn-energy-company

### **Difficulty:** Hard

# Modelling Churn in Energy Company

<div><p><em>This data project has been used as a take-home assignment in the recruitment process for the data science positions at BCG Gamma.</em></p>
<h2>Assignment</h2>
<p><strong>Scenario:</strong></p>
<p>Our client, PowerCo, is a major utility company providing gas and electricity to corporate, SME and residential customers. In recent years, following the liberalization of the energy market in Europe, PowerCo has had a growing problem with customer defections rising above the industry average. PowerCo has therefore asked BCG to work alongside them to identify the drivers of this problem and to devise and implement a strategy to counter it. The churn issue is most acute in the SME division, so they want it to be the first priority.</p>
<p>The head of the SME division has asked whether it is possible to predict which customers are most likely to churn so that they can trial a range of pre-emptive actions. He has a hypothesis that clients are switching to cheaper providers, so the first action to be tried will be to offer a 20% discount to customers with a high propensity to churn.</p>
<p><strong>Your task:</strong></p>
<p>We have scheduled a meeting in one week's time with the head of the SME division, in which you will present our findings on the churn issue and your recommendations on how to address it.</p>
<p>You are in charge of building the model and of suggesting which commercial actions should be taken as a result of the model's outcome. The client would also like answers to the following questions:</p>
<ol>
<li>Which variables best explain churn?</li>
<li>Is there a correlation between subscribed power and consumption?</li>
<li>Is there a link between sales channel and churn?</li>
</ol>
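<p>As a rough illustration of how questions 2 and 3 could be approached, the sketch below computes a Pearson correlation between <code>pow_max</code> and <code>cons_12m</code> and a churn rate per <code>channel_sales</code> value. The records are made-up toy values (only the column names come from the data description), the helper names <code>pearson</code> and <code>churn_rate_by</code> are hypothetical, and only the standard library is used so that the example stays self-contained.</p>

```python
from collections import defaultdict
from math import sqrt

# Toy records: the values are invented for illustration; only the
# column names are taken from the data description.
customers = [
    {"pow_max": 10.0, "cons_12m": 5000, "channel_sales": "ch_a", "churned": 0},
    {"pow_max": 15.0, "cons_12m": 9000, "channel_sales": "ch_a", "churned": 1},
    {"pow_max": 20.0, "cons_12m": 12000, "channel_sales": "ch_b", "churned": 0},
    {"pow_max": 30.0, "cons_12m": 20000, "channel_sales": "ch_b", "churned": 0},
    {"pow_max": 40.0, "cons_12m": 24000, "channel_sales": "ch_a", "churned": 1},
]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


def churn_rate_by(records, key):
    """Share of churned customers for each distinct value of `key`."""
    counts = defaultdict(lambda: [0, 0])  # value -> [churned, total]
    for rec in records:
        counts[rec[key]][0] += rec["churned"]
        counts[rec[key]][1] += 1
    return {value: churned / total for value, (churned, total) in counts.items()}


power_consumption_corr = pearson(
    [c["pow_max"] for c in customers],
    [c["cons_12m"] for c in customers],
)
rates = churn_rate_by(customers, "channel_sales")

print(f"corr(pow_max, cons_12m) = {power_consumption_corr:.3f}")
print(rates)
```

<p>On the real data, a difference in churn rate between channels would still need a significance check (for example a chi-squared test) before being presented as a link, since some channels may contain very few customers.</p>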
<p>The first stage is to establish the viability of such a model. For training your model you are provided with a dataset which includes features of SME customers in January 2016 as well as the information about whether or not they have churned by March 2016. In addition to that you have received the prices from 2015 for these customers. Of particular interest for the client is how you frame the problem for training.</p>
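<p>One possible way to frame the training set, sketched below, is one row per customer: the January 2016 features, a few aggregates of that customer's 2015 price history, and the <code>churned</code> flag as the label. This assumes the price records carry the customer <code>id</code> alongside <code>price_date</code>, which the brief does not state explicitly. The feature names <code>p1_var_mean_2015</code> and <code>p1_var_change_2015</code> are hypothetical and all values are invented.</p>

```python
from collections import defaultdict
from statistics import mean

# Toy 2015 price history (assumed to be keyed by customer id).
prices = [
    {"id": "c1", "price_date": "2015-01-01", "price_p1_var": 0.150},
    {"id": "c1", "price_date": "2015-06-01", "price_p1_var": 0.160},
    {"id": "c1", "price_date": "2015-12-01", "price_p1_var": 0.170},
    {"id": "c2", "price_date": "2015-01-01", "price_p1_var": 0.140},
    {"id": "c2", "price_date": "2015-12-01", "price_p1_var": 0.140},
]

# Toy January 2016 snapshot with the churn label.
customers = [
    {"id": "c1", "cons_12m": 12000, "churned": 1},
    {"id": "c2", "cons_12m": 8000, "churned": 0},
]


def price_features(rows):
    """Collapse each customer's price history into a few summary features."""
    history = defaultdict(list)
    for row in rows:
        history[row["id"]].append(row)
    features = {}
    for customer_id, records in history.items():
        # ISO-formatted dates sort chronologically as plain strings.
        records.sort(key=lambda rec: rec["price_date"])
        values = [rec["price_p1_var"] for rec in records]
        features[customer_id] = {
            "p1_var_mean_2015": mean(values),
            "p1_var_change_2015": values[-1] - values[0],
        }
    return features


features = price_features(prices)
# Left join: customers without price history simply get no price features.
training_rows = [{**c, **features.get(c["id"], {})} for c in customers]

for row in training_rows:
    print(row)
```

<p>Keeping every feature dated before the churn window (January to March 2016) is what prevents information from the outcome period leaking into the model.</p>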
<p>Given that this is the first time the client is resorting to predictive modelling, it is beneficial to leverage descriptive statistics and visualisation to extract interesting insights from the provided data before diving into the model. Also, while it is not mandatory, you are encouraged to test multiple algorithms. If you do so, it will be helpful to describe the tested algorithms in a simple manner.</p>
<p>Using the trained model you shall “score” customers in the verification data set (provided in the eponymous file) and put them in descending order of the propensity to churn.  You should also classify these customers into two classes:  those which you predict to churn are to be labelled "1" and the remaining customers should be labelled "0" in the result template.</p>
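<p>The scoring step could look roughly like the sketch below: sort customers by predicted churn probability in descending order, then label them with a cut-off. The 0.5 threshold, the output keys <code>churn_probability</code> and <code>churn_prediction</code>, and the scores themselves are assumptions for illustration, since the layout of the result template is not given here.</p>

```python
def rank_and_label(scores, threshold=0.5):
    """Order customers by churn propensity (highest first) and assign 0/1 labels.

    `scores` maps customer id -> predicted churn probability.
    `threshold` is an assumed cut-off, not one prescribed by the brief.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [
        {
            "id": customer_id,
            "churn_probability": probability,
            "churn_prediction": int(probability >= threshold),
        }
        for customer_id, probability in ranked
    ]


# Invented model outputs for four customers.
scores = {"c1": 0.12, "c2": 0.81, "c3": 0.47, "c4": 0.66}
result = rank_and_label(scores)

for row in result:
    print(row)
```

<p>In practice the threshold is a business decision rather than a modelling default: it should reflect the cost of discounting a customer who would have stayed against the cost of losing one who leaves.</p>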
<p>Finally, the client would like a view on whether offering the 20% discount to customers predicted to churn is a good measure. Given that it is a steep discount that brings their price below that of all competitors, we can assume for now that everyone who is offered the discount will accept it. According to regulations, the client cannot raise a customer's price within a year of that customer accepting the discount. Offering it excessively will therefore hit revenues hard.</p>
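<p>A back-of-the-envelope way to reason about the discount is sketched below. It rests on simplifying assumptions that go beyond the brief: a customer who accepts the discount stays for the full year, a customer who churns contributes no revenue over that year, and revenue (rather than margin) is the quantity of interest. The function name <code>expected_gain</code> and the figures are hypothetical.</p>

```python
def expected_gain(p_churn, annual_revenue, discount=0.20):
    """Expected one-year revenue change from offering the discount to one customer.

    Assumptions (simplifications, not stated in the brief):
      - an offered customer accepts and stays for the whole year;
      - a churned customer contributes zero revenue over the year.
    """
    revenue_without_offer = (1 - p_churn) * annual_revenue
    revenue_with_offer = (1 - discount) * annual_revenue
    return revenue_with_offer - revenue_without_offer


# Invented example: a customer worth 1000 per year at current prices.
for p in (0.1, 0.2, 0.5):
    print(p, round(expected_gain(p, 1000.0), 2))
```

<p>Under these assumptions the gain simplifies to <code>(p_churn - discount) * annual_revenue</code>, so the offer only pays off for customers whose churn probability exceeds the discount rate (20% here). This is why offering the discount indiscriminately erodes revenue.</p>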
<h2>Data Description</h2>
<p>The table below describes all the data fields which are found in the data. You will notice that the contents of some fields are meaningless text strings. This is due to "hashing" of text fields for data privacy. While their commercial interpretation is lost as a result of the hashing, they may still have predictive power.</p>
<table><thead><tr><th>Field name</th><th>Description</th></tr></thead><tbody><tr><td>id</td><td>contact id</td></tr><tr><td>activity_new</td><td>category of the company's activity</td></tr><tr><td>campaign_disc_ele</td><td>code of the electricity campaign the customer last subscribed to</td></tr><tr><td>channel_sales</td><td>code of the sales channel</td></tr><tr><td>cons_12m</td><td>electricity consumption of the past 12 months</td></tr><tr><td>cons_gas_12m</td><td>gas consumption of the past 12 months</td></tr><tr><td>cons_last_month</td><td>electricity consumption of the last month</td></tr><tr><td>date_activ</td><td>date of activation of the contract</td></tr><tr><td>date_end</td><td>registered date of the end of the contract</td></tr><tr><td>date_first_activ</td><td>date of first contract of the client</td></tr><tr><td>date_modif_prod</td><td>date of last modification of the product</td></tr><tr><td>date_renewal</td><td>date of the next contract renewal</td></tr><tr><td>forecast_base_bill_ele</td><td>forecasted electricity bill baseline for next month</td></tr><tr><td>forecast_base_bill_year</td><td>forecasted electricity bill baseline for calendar year</td></tr><tr><td>forecast_bill_12m</td><td>forecasted electricity bill baseline for 12 months</td></tr><tr><td>forecast_cons</td><td>forecasted electricity consumption for next month</td></tr><tr><td>forecast_cons_12m</td><td>forecasted electricity consumption for next 12 months</td></tr><tr><td>forecast_cons_year</td><td>forecasted electricity consumption for next calendar year</td></tr><tr><td>forecast_discount_energy</td><td>forecasted value of current discount</td></tr><tr><td>forecast_meter_rent_12m</td><td>forecasted bill of meter rental for the next 12 months</td></tr><tr><td>forecast_price_energy_p1</td><td>forecasted energy price for 1st period</td></tr><tr><td>forecast_price_energy_p2</td><td>forecasted energy price for 2nd period</td></tr><tr><td>forecast_price_pow_p1</td><td>forecasted power price for 1st period</td></tr><tr><td>has_gas</td><td>indicates if client is also a gas client</td></tr><tr><td>imp_cons</td><td>current paid consumption</td></tr><tr><td>margin_gross_pow_ele</td><td>gross margin on power subscription</td></tr><tr><td>margin_net_pow_ele</td><td>net margin on power subscription</td></tr><tr><td>nb_prod_act</td><td>number of active products and services</td></tr><tr><td>net_margin</td><td>total net margin</td></tr><tr><td>num_years_antig</td><td>antiquity of the client (in number of years)</td></tr><tr><td>origin_up</td><td>code of the electricity campaign the customer first subscribed to</td></tr><tr><td>pow_max</td><td>subscribed power</td></tr><tr><td>price_date</td><td>reference date</td></tr><tr><td>price_p1_var</td><td>price of energy for the 1st period</td></tr><tr><td>price_p2_var</td><td>price of energy for the 2nd period</td></tr><tr><td>price_p3_var</td><td>price of energy for the 3rd period</td></tr><tr><td>price_p1_fix</td><td>price of power for the 1st period</td></tr><tr><td>price_p2_fix</td><td>price of power for the 2nd period</td></tr><tr><td>price_p3_fix</td><td>price of power for the 3rd period</td></tr><tr><td>churned</td><td>has the client churned over the next 3 months</td></tr></tbody></table>
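<p>Hashed text fields such as <code>channel_sales</code> or <code>origin_up</code> can still be fed to a model as categories. The sketch below one-hot encodes such a column and treats missing entries as their own category. The hash-like strings are invented, and the helper <code>one_hot</code> and the <code>"MISSING"</code> label are hypothetical choices rather than anything prescribed by the brief.</p>

```python
def one_hot(values, missing_label="MISSING"):
    """One-hot encode a categorical column, keeping missing values as a category.

    Returns the sorted category list and one indicator row per input value.
    """
    cleaned = [value if value else missing_label for value in values]
    categories = sorted(set(cleaned))
    rows = [[int(value == category) for category in categories] for value in cleaned]
    return categories, rows


# Invented stand-ins for hashed channel codes; None marks a missing entry.
channel_sales = ["a1b2c3", "d4e5f6", None, "a1b2c3"]
categories, encoded = one_hot(channel_sales)

print(categories)
for row in encoded:
    print(row)
```

<p>Keeping missing values as an explicit category, instead of dropping those rows, lets the model learn whether the absence of a code is itself associated with churn.</p>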
<p>A whole host of rich investigations is possible. Your ideas on possible next steps, armed with such data, are also of interest.</p>
<h2>Practicalities</h2>
<p>This is a fictional case study designed to loosely resemble the work you might undertake on a GAMMA project. It will test your ability to handle big data and perform statistical/machine learning analyses as well as your ability to communicate your findings and derive commercial insight from your technical work.</p>
<p>You may perform the analyses using any computational language you wish (including at least one tool other than Excel, since the majority of data sets we receive from clients are too large for us to be able to use it).</p></div>

## **Data:**

## **Solution:**