
# **MBA742** Data Science and AI in Business
## *Spring 2026*
Daniel M. Ringel  

www.ringel.ai

## **Class 04** - Data Science with AI: Cloud computing  
*(Customer Churn Prediction Part 1)*

*January 14, 2026*  
Version 3.1

# Today's Agenda
> Familiarize ourselves with Google CoLab, Gemini AI, and how to use ChatGPT to help us analyze data.

1. **Today's Business Problem: Customer Churn**
2. **Python Runtimes, Markdown and Code Cells**
3. **Load Data into CoLab**
4. **Data Wrangling and Cleaning with GenAI (ChatGPT)**
5. **Exploratory Data Analysis with GenAI (Gemini vs. ChatGPT)**


## Prep-Check:
- Have a Google Account
- Have a Google Drive

## Additional Resources:
> https://colab.research.google.com/#scrollTo=C1XfGyQXQK1n  

> https://colab.research.google.com/github/cs231n/cs231n.github.io/blob/master/python-colab.ipynb

# 1. Today's Business Problem: Customer Churn


![Lab vs Real-World](https://atrium.ai/wp-content/uploads/2021/07/What-stops-customer-churn-Having-a-centralized-data-hub-does-and-heres-why.jpeg)


#### ***Customer churn***
- customer attrition
- customer turnover
- customer defection

***is the loss of clients or customers***

Firms that have subscription or membership business models usually monitor customer churn closely:

- Banks
- Telephone service companies
- Internet service providers
- Pay TV companies
- Insurance firms
- Gyms
- etc.   

----------------

#### ***Customer churn rates*** are often a key business metric (along with cash flow, EBITDA (earnings before interest, taxes, depreciation, and amortization), etc.)
* Cost of retaining an existing customer is far less than acquiring a new one.
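
As a quick worked example, a churn rate is commonly computed as the number of customers lost during a period divided by the number of customers at the start of that period. The sketch below uses that common definition; the function name and the numbers are hypothetical and only illustrate the arithmetic.

```python
def churn_rate(customers_at_start: int, customers_lost: int) -> float:
    """Share of the customers at the start of a period who left during it."""
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    return customers_lost / customers_at_start


# Hypothetical numbers for illustration only (not taken from the bank data)
monthly_rate = churn_rate(customers_at_start=2000, customers_lost=50)
print(f"Monthly churn rate: {monthly_rate:.1%}")
```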

-----------------

#### Dedicated departments attempt to ***prevent churn*** and ***win back churned customers***   
- long-term customers can be worth more than newly acquired customers

#### ***BUT: Competitors*** may make special offers to entice customers away
- Customers leave in hope of better service or value for money
- ***Switching cost*** can create hurdles

-----------------
#### Important business activity: ***Customer Retention***
- Can be costly -  *why?*
- To focus retention efforts, must understand ***which customers are at risk of churning***.  

  

*Source: definition adapted from Wikipedia*

## 1.2 **The Business Challenge:** How to identify customers that are at risk of churning?
- We will use a dataset that is based on real bank data, but was slightly modified for the purpose of this case study to
    - preserve real customers' privacy  
    - preserve the bank's privacy  
    - allow for richer analysis  

## 1.3 Let's work our way through ***The Data Science Pipeline***

<img src="https://ringel.ai/UNC/2026/img/datasciencepipeline.png" width="600"/>

# 2. Python Runtimes, Markdown, and Code Cells

### You will use cloud computing for this analysis. The menu item "Runtime" lets you start, interrupt, and terminate your Python session on Google Colab.

<img src="https://ringel.ai/UNC/2026/img/runtime-colab.png" width="400"/>


### You can see the status of your runtime at the top right:
<img src="https://ringel.ai/UNC/2026/img/runtime-status.png" width="400"/>





## 2.1 Add a Markdown Cell to write comments, instructions, etc.
<img src="https://ringel.ai/UNC/2026/img/new-markdown.png" width="600"/>


## 2.2 Add code cell to write and execute code
<img src="https://ringel.ai/UNC/2026/img/new-code-cell.png" width="600"/>



### Click on "PLAY" to execute code!

<img src="https://ringel.ai/UNC/2026/img/play-to-run.png" width="1000"/>

```python
print("Welcome to the Age of AI")
```

```
Welcome to the Age of AI
```


# 3. Load Data into CoLab
The bank provides us with the following data set:
1. Data on bank customers that previously churned / did not churn (Training Set)

This dataset is available online at:
https://www.ringel.ai/UNC/2026/MBA742/Class04/data/Bank_Churn_Train.json

> **Right click** and select "Save Link As ..." to download

**Let's put the data on your Google Drive.**
- I recommend that you create a folder in the root of your Google Drive called "MBA742".
- Create another folder inside of this folder called "Class04".
- Download the data file and copy it into the Class04 folder on your Google Drive.

**Now, we need to connect our Google Drive to this Colab session.**
  -  How to do that? Why not get the help of Gemini, the AI embedded in CoLab?
> Add a code cell (already done for you) and use Gemini (click on **generate** in *Start coding or generate with AI.*):   
--> Tell Gemini to connect your google drive and navigate to your folder "MBA742/Class04" (where your data are).   
--> Tell Gemini to also show you the contents of that folder.
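
For orientation, the code Gemini generates for this step often looks roughly like the sketch below. It is a minimal sketch, not the only way to do it: the `MyDrive` path segment is an assumption about where Colab usually exposes the root of your Drive, and `list_folder` is a hypothetical helper added for illustration. The `try`/`except` only exists so that the cell does not crash when it runs outside of Colab.

```python
import os

# Folder from the instructions above. "MyDrive" is where Colab usually exposes
# the root of your Google Drive (an assumption: check the path in your session).
DATA_DIR = "/content/drive/MyDrive/MBA742/Class04"


def list_folder(path: str) -> list[str]:
    """Return the sorted names of the files and folders inside `path`."""
    return sorted(os.listdir(path))


try:
    from google.colab import drive  # this module only exists inside Colab

    drive.mount("/content/drive")  # Colab asks for permission to access Drive
    os.chdir(DATA_DIR)             # navigate to the class folder
    print(list_folder(DATA_DIR))   # show the contents of the folder
except ImportError:
    print("Not running in Colab: skipping the Google Drive mount.")
```

If the folder listing does not show `Bank_Churn_Train.json`, the file is probably in a different Drive folder than the one named in `DATA_DIR`.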

Next, add another code cell and tell Gemini to generate code to load the Bank_Churn_Train.json file
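
A loading step might look like the sketch below. It assumes the JSON file has a table-like layout that `pandas.read_json` can read directly; if the actual file is structured differently, the call needs adjusting. `load_churn_data` is a hypothetical helper name, and the two records are made up so that the example runs without the real file.

```python
import io

import pandas as pd


def load_churn_data(path_or_buffer) -> pd.DataFrame:
    """Read a JSON file (or file-like object) into a DataFrame."""
    return pd.read_json(path_or_buffer)


# Made-up records standing in for the real file, so the sketch runs anywhere.
sample_json = io.StringIO(
    '[{"Age": 40, "Gender": "Female"}, {"Age": 52, "Gender": "Male"}]'
)
df_sample = load_churn_data(sample_json)
print(df_sample.shape)

# In Colab, after connecting Google Drive, the call would look something like:
# df = load_churn_data("/content/drive/MyDrive/MBA742/Class04/Bank_Churn_Train.json")
```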

# 4. Data Wrangling and Cleaning with GenAI (ChatGPT)
Create a new code cell and ask Gemini to describe each column of the data (HINT: tell it to display, not to print)
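
One possible shape for such a column overview is sketched below. The small `df_demo` table is made up for illustration (the real columns come from the bank file), and the fallback to `print` is only there so the cell also runs outside a notebook, where `display` is not defined.

```python
import pandas as pd

# Tiny made-up sample; the real columns and values come from the bank file.
df_demo = pd.DataFrame({
    "Age": [34, 51, 29, 46],
    "Balance": [1200.5, -50.0, 0.0, 98000.0],
    "Gender": ["Female", "Male", "Male", "Female"],
})

# Summary statistics for every column, transposed to one row per column
summary = df_demo.describe(include="all").T
summary["dtype"] = df_demo.dtypes          # data type of each column
summary["missing"] = df_demo.isna().sum()  # number of missing values

try:
    display(summary)  # rich, scrollable table in Colab/Jupyter
except NameError:
    print(summary)    # plain-text fallback outside a notebook
```

The hint about `display` matters because `print` turns the table into plain text, while `display` keeps the formatted notebook table.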

**What can you learn from the available data?**

## 4.1 Data Wrangling
For now, we may not need to do anything here:
- We have a single data file (flat file)
- We have a good set of variables
- *We may come back later to some feature engineering (= create new features/variables/columns from existing ones)*

***Question***: What is the dependent variable in this dataset?
- Examine the data by outputting the dataframe ("output the dataframe")
- How many rows are there? That is, how many observations do you have?
> **Add a text cell and write your answer in it!**
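
As a small illustration of counting observations, the sketch below reads the number of rows and columns from a DataFrame's `shape`. The `df_demo` table is made up; in the notebook you would use your own `df`.

```python
import pandas as pd

# Made-up stand-in for the loaded data
df_demo = pd.DataFrame({"Age": [34, 51, 29], "Products": [1, 2, 1]})

n_rows, n_cols = df_demo.shape  # (number of rows, number of columns)
print(f"{n_rows} observations, {n_cols} columns")
```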

## 4.2 Data Cleaning

Use ChatGPT (or Claude) to help you generate code that cleans the data.


- ensure consistent data types
- discover implausible or invalid values (e.g., in the "Active" column, amongst others)
  -  consider recoding variables and enforcing boundaries

- impute missing values
  - what about implausible/invalid values?
- remove outliers
- other?
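
Before fixing anything, it helps to look at what is wrong. The sketch below shows a few diagnostic checks on a made-up table: data types, missing values, the distinct spellings in a categorical column, and rows outside a plausible range. The values and the age bounds (18 to 100) are assumptions for illustration, not rules taken from the bank data.

```python
import numpy as np
import pandas as pd

# Made-up sample with typical problems: mixed spellings, missing and odd values
df_demo = pd.DataFrame({
    "Active": ["Yes", "yes", "No", "1", None],
    "Age": [34, 210, 29, np.nan, 46],
})

print(df_demo.dtypes)  # are the data types what you expect?

missing = df_demo.isna().sum()  # missing values per column
active_levels = df_demo["Active"].value_counts(dropna=False)  # distinct spellings

# Rows whose Age falls outside an assumed plausible range of 18 to 100
implausible_age = df_demo[(df_demo["Age"] < 18) | (df_demo["Age"] > 100)]

print(missing)
print(active_levels)
print(implausible_age)
```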

**Steps with ChatGPT (or Claude or similar)**
1. Upload the dataset to ChatGPT
2. Ask it how these data might be cleaned.
> *I want to create a predictive model. How would I clean these data?*
3. Check if the **categorical variables are consistent**:   
> *Are these categorical columns consistent: 'Gender', 'Subsidiary', 'BankCC', 'Active', 'LifeInsur', 'PlatStatus', 'Terminated' ?*
>  
> *Make the categorical variables consistent. Do not set to missing. Replace inconsistent with appropriate values*
4. Make sure that the **categorical variables are of type "categorical"**.
> *Make the categorical variables of type categorical*
5. Focus on the numerical variables that are interesting to us: ***Are the numerical values valid?***
> *Ask yourself:*
> - *what is a valid FICO Score? (FICOScore)*
> - *what is a valid Age? (Age)*
> - *what is a valid Account Balance? Negative balances are possible, right? (Balance)*
> - *what is a plausible number of products? (Products)*
> - *what is a plausible Regular Deposits? (RegDeposits)*
>
6. Fix **implausible values** and **remove outliers**:
> *Investigate and fix these numerical variables so that they fall in the right range by (1) imputing invalid or missing values, and (2) removing outliers*
7. Now, **transfer the code to CoLab** and run it here
> *Give me the entire cleaning code from the previous steps so that I can run it in Google CoLab. I already loaded the data into dataframe df*
8. If you get **errors or undesirable outcomes in Colab**, go back to ChatGPT (copy and paste the code and the error messages) and ask ChatGPT to fix it, then go back to Step 7.
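
The cleaning code you get back from ChatGPT will differ, but it often follows a pattern like the sketch below. Everything specific here is an assumption for illustration: the made-up sample rows, the spelling map for `Gender`, the choice to fill unrecognized categories with the most frequent value, median imputation, and the 1.5 × IQR outlier rule. The FICO range of 300 to 850 is the standard range for that score. `clean_demo` is a hypothetical function name.

```python
import numpy as np
import pandas as pd


def clean_demo(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative cleaning pass: categories, valid ranges, outliers."""
    df = df.copy()

    # 1. Make a categorical column consistent, then store it as "category".
    gender_map = {"f": "Female", "female": "Female", "m": "Male", "male": "Male"}
    key = df["Gender"].astype(str).str.strip().str.lower()
    mapped = key.map(gender_map)                 # unknown spellings become NaN
    mapped = mapped.fillna(mapped.mode()[0])     # fill with most frequent value
    df["Gender"] = mapped.astype("category")

    # 2. Treat out-of-range values as missing, then impute with the median.
    valid = df["FICOScore"].between(300, 850)    # NaN counts as not valid
    df["FICOScore"] = df["FICOScore"].where(valid)
    df["FICOScore"] = df["FICOScore"].fillna(df["FICOScore"].median())

    # 3. Remove outliers in Balance with the 1.5 * IQR rule.
    q1, q3 = df["Balance"].quantile([0.25, 0.75])
    iqr = q3 - q1
    keep = df["Balance"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return df[keep].reset_index(drop=True)


# Made-up rows with typical problems (not real bank data)
raw = pd.DataFrame({
    "Gender": ["F", "female", "Male", " m ", "??", "Female"],
    "FICOScore": [720, 9999, 650, np.nan, 580, 700],
    "Balance": [1000, 1200, 900, 1100, 1050, 500000],
})
cleaned = clean_demo(raw)
print(cleaned)
```

Note that removing outliers drops rows, so it is worth comparing `len(raw)` and `len(cleaned)` to see how many observations the cleaning step costs you.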


### Make sure to build your code in this notebook!

### **WHY?**
> Add a text cell and write your explanation in it!
- Can you create a list of reasons?
- Can you bold what is important?
- Can you write a main message that is in a larger font size?

# 5. Exploratory Data Analysis with GenAI (Gemini vs. ChatGPT)

Let's do some EDA with Gemini (ColabAI) and ChatGPT or another AI of your choice, such as Claude from Anthropic.

#### ***What is EDA and what is the value of it?***
> *Write your answer in a new text cell.*

## 5.1 Let's ask Gemini to do some EDA on your wrangled and cleaned data.

## 5.2 Let's ask ChatGPT to help you with some EDA on your wrangled and cleaned data.

Here are some examples you might ask for:
- Distributions of numerical data
- Boxplots and Outliers
- Categorical by Churn Status
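
The examples above can be sketched numerically as follows. The table is made up, and treating `Terminated` as the churn indicator with "Yes"/"No" values is an assumption for illustration; check how churn is actually coded in your cleaned data.

```python
import pandas as pd

# Made-up sample; "Terminated" is assumed to mark churned customers
df_demo = pd.DataFrame({
    "Age": [25, 40, 61, 35, 58, 44],
    "Gender": ["Female", "Male", "Female", "Male", "Female", "Male"],
    "Terminated": ["No", "No", "Yes", "No", "Yes", "Yes"],
})

# Distribution of a numerical column, split by churn status
age_by_churn = df_demo.groupby("Terminated")["Age"].describe()

# Share of churned vs. retained customers within each category level
churn_by_gender = pd.crosstab(
    df_demo["Gender"], df_demo["Terminated"], normalize="index"
)

print(age_by_churn)
print(churn_by_gender)
```

In Colab, `df_demo.hist()` and `df_demo.boxplot(column="Age", by="Terminated")` draw the corresponding histograms and boxplots (both rely on matplotlib, which Colab provides).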

## 5.3 EDA Insights

What are three insights from your EDA that are valuable to what you are trying to achieve?

Add a text cell where you specify:
1. What you want to Achieve
2. Your three Insights and how they add value

# What's Next?
1. Build a model that can predict customers likely to churn

2. Evaluate your model's performance (validate it)

3. Quantify the potential value of your model

4. Consider how to use it to improve customer retention of the bank


# **Looking Ahead**

### **MLK Day** on Monday, January 19th - ***No Class!***

------

### **Class 05:** *Data Analytics with AI: Customer Churn Prediction*
*Wednesday, January 21st*

### **PREP**:
***Work through the Python Notebook of Class 05***. I recommend that you only look at the solution version if you get stuck.
> We ***will discuss*** the notebook in class.  
> We ***will not*** get into the details of the code because ChatGPT, Claude, Gemini and many more can easily explain those to you (if you are interested)!

