# Neural Network: The Origin of Deep Learning

<img src="https://towardsdatascience.com/wp-content/uploads/2021/12/1hkYlTODpjJgo32DoCOWN5w.png">



## Consider this sample data

In [22]:
import warnings
warnings.filterwarnings("ignore")

import pandas as pd

# Sample churn dataset
url = 'https://raw.githubusercontent.com/bonchae/data/refs/heads/master/WA_Fn-UseC_-Telco-Customer-Churn.csv'
df = pd.read_csv(url)
df.head()

| | customerID | gender | SeniorCitizen | Partner | Dependents | tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | DeviceProtection | TechSupport | StreamingTV | StreamingMovies | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | 0 | Yes | No | 1 | No | No phone service | DSL | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No |
| 1 | 5575-GNVDE | Male | 0 | No | No | 34 | Yes | No | DSL | Yes | ... | Yes | No | No | No | One year | No | Mailed check | 56.95 | 1889.5 | No |
| 2 | 3668-QPYBK | Male | 0 | No | No | 2 | Yes | No | DSL | Yes | ... | No | No | No | No | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes |
| 3 | 7795-CFOCW | Male | 0 | No | No | 45 | No | No phone service | DSL | Yes | ... | Yes | Yes | No | No | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No |
| 4 | 9237-HQITU | Female | 0 | No | No | 2 | Yes | No | Fiber optic | No | ... | No | No | No | No | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes |


Telco Customer Churn Dataset: Feature Definitions

| Feature Name        | Description                                                                 |
|---------------------|-----------------------------------------------------------------------------|
| `customerID`        | Unique ID assigned to each customer (dropped in modeling)                  |
| `gender`            | Customer’s gender: `Male`, `Female`                                        |
| `SeniorCitizen`     | Indicates if the customer is a senior (1) or not (0)                       |
| `Partner`           | Whether the customer has a partner (`Yes`/`No`)                            |
| `Dependents`        | Whether the customer has dependents (`Yes`/`No`)                           |
| `tenure`            | Number of months the customer has been with the company                    |
| `PhoneService`      | Whether the customer has phone service (`Yes`/`No`)                        |
| `MultipleLines`     | Whether the customer has multiple phone lines (`Yes`/`No`/`No phone service`) |
| `InternetService`   | Type of internet: `DSL`, `Fiber optic`, or `No`                            |
| `OnlineSecurity`    | Whether the customer has an online security add-on                         |
| `OnlineBackup`      | Whether the customer has online backup service                             |
| `DeviceProtection`  | Whether the customer has a device protection plan                          |
| `TechSupport`       | Whether the customer has technical support access                          |
| `StreamingTV`       | Whether the customer has streaming TV service                              |
| `StreamingMovies`   | Whether the customer has streaming movies access                           |
| `Contract`          | Type of contract: `Month-to-month`, `One year`, `Two year`                |
| `PaperlessBilling`  | Whether billing is paperless (`Yes`/`No`)                                  |
| `PaymentMethod`     | Method of payment: `Electronic check`, `Mailed check`, etc.                |
| `MonthlyCharges`    | Amount charged to the customer monthly (in dollars)                        |
| `TotalCharges`      | Total amount charged over the customer’s lifetime                          |
| `Churn`             | Target variable: whether the customer left in the last month (`Yes`/`No`)  |

**Note:** After preprocessing, categorical variables are converted into dummy variables, and `Churn_Yes` becomes the binary target (1 = churned, 0 = stayed).
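A minimal sketch of that preprocessing step, assuming `pandas.get_dummies` with `drop_first=True` (so a Yes/No column such as `Churn` collapses to a single `Churn_Yes` indicator). To keep it self-contained it uses three rows and a handful of columns copied from the `head()` output above instead of the full download; the column subset, the `drop_first` choice, and treating `TotalCharges` as text on load are assumptions, not necessarily the notebook's exact pipeline.

```python
import pandas as pd

# Three rows copied from df.head() above (column subset chosen for illustration)
sample = pd.DataFrame({
    "customerID": ["7590-VHVEG", "5575-GNVDE", "3668-QPYBK"],
    "tenure": [1, 34, 2],
    "Contract": ["Month-to-month", "One year", "Month-to-month"],
    "MonthlyCharges": [29.85, 56.95, 53.85],
    "TotalCharges": ["29.85", "1889.5", "108.15"],  # assumption: read in as text
    "Churn": ["No", "No", "Yes"],
})

# Drop the identifier: it is unique per customer and carries no predictive signal
features = sample.drop(columns=["customerID"])

# Convert TotalCharges to numbers; anything unparseable becomes NaN
features["TotalCharges"] = pd.to_numeric(features["TotalCharges"], errors="coerce")

# One-hot encode categoricals; drop_first keeps k-1 dummies per column,
# so the Yes/No Churn column becomes a single Churn_Yes indicator
encoded = pd.get_dummies(features, columns=["Contract", "Churn"], drop_first=True, dtype=int)

y_churn = encoded.pop("Churn_Yes")  # binary target: 1 = churned, 0 = stayed
X_churn = encoded                   # remaining columns are the model inputs
print(X_churn.columns.tolist())
print(y_churn.tolist())
```

On the full dataset, `Contract` has three categories, so `drop_first=True` would leave two `Contract_*` dummies rather than one.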


<center><img src="https://miro.medium.com/v2/format:webp/1*Ne7jPeR6Vrl1f9d7pLLG8Q.jpeg"></center>

[Source](https://medium.com/@b.terryjack/introduction-to-deep-learning-feed-forward-neural-networks-ffnns-a-k-a-c688d83a309d)

Input Layer
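- This is where data enters the model
- Each input feature (column) in your data (e.g., **tenure, contract type, monthly charges**) feeds one input neuron; after dummy encoding, a categorical feature such as contract type contributes one neuron per dummy column
- It passes the raw values on to the next layer; it has no weights of its own, so no learning happens here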

Hidden Layers
- These are the “thinking” layers of the model
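- Each neuron computes a **weighted sum of its inputs** plus a bias and passes the result through a nonlinear **activation function** (e.g., ReLU); training adjusts the weights so these transformations capture patterns in the data
- More hidden layers or neurons let the model represent more **complex patterns**, but also make it easier to overfit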

Output Layer
- This layer gives the **final prediction**
- Example: 1 neuron with sigmoid activation → **churn probability** (a value between 0 and 1)
- In multi-class problems: one neuron per class with softmax, which turns the outputs into probabilities that sum to 1
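To make the three layer types concrete, here is a minimal NumPy sketch of one forward pass through a network with 3 inputs, one hidden layer of 4 ReLU neurons, and a single sigmoid output, plus a 3-class softmax head for comparison. The weights are random placeholders (a trained model would have learned them), and the three input values are hypothetical, already-scaled numbers.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())  # subtract the max for numerical stability
    return e / e.sum()

# Input layer: one value per feature, e.g. [tenure, is_month_to_month, monthly_charges]
# (hypothetical scaled values). No weights live here.
x = np.array([0.1, 1.0, 0.6])

# Hidden layer: weighted sum of the inputs plus a bias, then a nonlinear activation
W1 = rng.normal(size=(4, 3))  # 4 hidden neurons, 3 input weights each
b1 = np.zeros(4)
h = relu(W1 @ x + b1)

# Output layer (binary): one sigmoid neuron gives a probability in (0, 1)
W2 = rng.normal(size=(1, 4))
b2 = np.zeros(1)
p_churn = sigmoid(W2 @ h + b2)[0]

# Output layer (multi-class): one neuron per class, softmax makes them sum to 1
W3 = rng.normal(size=(3, 4))
class_probs = softmax(W3 @ h)

print(f"hidden activations: {h.round(3)}")
print(f"churn probability:  {p_churn:.3f}")
print(f"class probabilities: {class_probs.round(3)}")
```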

For structured business data (like churn), 1–2 hidden layers are usually enough.
More layers don’t always help and can even hurt if the data is small or noisy.

# What is “Deep”?


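“Deep” simply means **many layers** of neurons stacked one after another. Each layer works on the output of the previous one, so successive layers **extract increasingly complex features** from the data (e.g., images, text). In an image model such as a CNN: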

- **Early layers (Convolution + ReLU) → Low-level features**  
  Detect **edges**, **colors**, **textures**, **blobs**.

- **Middle layers (More Convolution + Pooling) → Mid-level patterns**  
  Recognize **corners**, **shapes**, **simple parts of objects**.

- **Deeper layers (Final Convolutional Stages) → High-level representations**  
  Detect **eyes**, **faces**, **objects**, **categories**.

> This hierarchical feature extraction is a key reason CNNs work so well on image-related tasks.
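The "early layers detect edges" idea can be seen with a single hand-made filter. The NumPy sketch below slides a 3×3 Sobel-style vertical-edge kernel over a tiny 6×6 image that is dark on the left and bright on the right, applies ReLU, then 2×2 max pooling. The kernel is chosen by hand here for illustration; a CNN learns its kernels from data. Like CNN layers in common deep-learning libraries, it computes cross-correlation (the kernel is not flipped).

```python
import numpy as np

# 6x6 grayscale image: dark (0) in columns 0-2, bright (1) in columns 3-5
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Vertical-edge kernel: responds when brightness increases from left to right
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)

def conv2d_valid(img, k):
    """Slide k over img with stride 1 and no padding (cross-correlation)."""
    kh, kw = k.shape
    out = np.zeros((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def max_pool2x2(fmap):
    """Keep the strongest response in each non-overlapping 2x2 window."""
    h, w = fmap.shape[0] // 2 * 2, fmap.shape[1] // 2 * 2
    return fmap[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

feature_map = np.maximum(0.0, conv2d_valid(image, kernel))  # convolution + ReLU
pooled = max_pool2x2(feature_map)

print(feature_map)  # strong responses only where the window straddles the edge
print(pooled)
```

Stacking more such layers lets later filters combine these edge responses into corners and shapes, which is the hierarchy described above.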

<img src="https://i0.wp.com/developersbreach.com/wp-content/uploads/2020/08/cnn_banner.png?fit=1400%2C658&ssl=1">

<h2 style="text-align: center;">Deep Learning Algorithm Hierarchy with Business Applications</h2>

<table border="1" cellspacing="0" cellpadding="8" style="margin: auto; text-align: left; font-family: sans-serif;">
  <tr style="background-color: #8cbeb2; text-align: center;">
    <th><strong>Type</strong></th>
    <th><strong>Business Applications</strong></th>
    <th><strong>Real-World Examples</strong></th>
  </tr>

  <tr>
    <td><strong>CNNs</strong><br><small>Convolutional Neural Networks</small></td>
    <td>
      <ul>
        <li><strong>Image classification (e.g., product recognition)</strong></li>
        <li>Medical image diagnostics</li>
        <li>Quality control in manufacturing</li>
        <li>Facial recognition in security</li>
      </ul>
    </td>
    <td>
      <ul>
        <li>Google Lens</li>
        <li>Apple Face ID</li>
        <li>Siemens AI Medical Imaging</li>
        <li>Amazon Go Surveillance</li>
      </ul>
    </td>
  </tr>

  <tr>
    <td><strong>RNNs</strong><br><small>Recurrent Neural Networks</small></td>
    <td>
      <ul>
        <li>Stock and sales forecasting</li>
        <li>Customer churn prediction</li>
        <li>Speech-to-text transcription</li>
        <li>Chatbot conversation modeling</li>
      </ul>
    </td>
    <td>
      <ul>
        <li>Google Voice</li>
        <li>Nuance Dragon Speech</li>
        <li>Apple Siri (early versions)</li>
        <li>Amazon Forecast</li>
      </ul>
    </td>
  </tr>

  <tr>
    <td><strong>GANs</strong><br><small>Generative Adversarial Networks</small></td>
    <td>
      <ul>
        <li>Synthetic data generation</li>
        <li>Product design and prototyping</li>
        <li>AI-generated media and marketing visuals</li>
        <li>Art and creative content generation</li>
      </ul>
    </td>
    <td>
      <ul>
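        <li><strong>DALL·E</strong> (OpenAI; transformer/diffusion-based rather than a GAN)</li>
        <li>Midjourney (generally described as diffusion-based rather than a GAN)</li>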
        <li>StyleGAN (NVIDIA)</li>
        <li>Runway ML</li>
      </ul>
    </td>
  </tr>

  <tr>
    <td><strong>Transformers</strong><br><small>LLMs / Attention-Based Models</small></td>
    <td>
      <ul>
        <li>Customer support chatbots</li>
        <li>Sentiment analysis and social listening</li>
        <li>Text summarization and document search</li>
        <li>Personalized recommendations and Q&A</li>
      </ul>
    </td>
    <td>
      <ul>
        <li><strong>ChatGPT</strong> (OpenAI)</li>
        <li>Google Gemini (formerly Bard)</li>
        <li>GitHub Copilot</li>
        <li>Google Translate</li>
      </ul>
    </td>
  </tr>

  <tr>
    <td><strong>DRL</strong><br><small>Deep Reinforcement Learning</small></td>
    <td>
      <ul>
        <li>Autonomous vehicles and robotics</li>
        <li>Dynamic pricing systems</li>
        <li>Supply chain and inventory optimization</li>
        <li>Simulated training environments</li>
      </ul>
    </td>
    <td>
      <ul>
        <li>Waymo Self-Driving Cars</li>
        <li>DeepMind AlphaGo</li>
        <li>OpenAI Five (Dota 2)</li>
        <li>Amazon Warehouse Optimization</li>
      </ul>
    </td>
  </tr>
</table>


# Popularity Ranking of DL Architectures (as of 2025)

| Rank | Architecture     | Popularity               | Primary Use Cases                                        |
|------|------------------|--------------------------|----------------------------------------------------------|
| 1️⃣   | Transformers      | ⭐⭐⭐⭐⭐ *(most popular)*    | LLMs, NLP, vision, audio, multimodal                     |
| 2️⃣   | CNNs              | ⭐⭐⭐⭐                     | Image classification, object detection, vision tasks     |
| 3️⃣   | GANs              | ⭐⭐⭐                      | Image generation, style transfer, data augmentation      |
| 4️⃣   | RNNs / LSTMs      | ⭐⭐                       | Legacy NLP, time series prediction, audio modeling       |


# Two Broad Ways to Use Deep Learning Models



## 1. Using Pre-trained Models

You take a model that has already been trained on a large dataset (like **ImageNet**, **COCO**, or a large text corpus).

You can:
- Use it **as-is** for inference
- **Fine-tune** it on your own smaller dataset

- **Pros**: Faster, and needs far less data and compute than training from scratch; reusing knowledge learned on one task for another is what **transfer learning** means





## 2. Training Models from Scratch

You define your own model architecture and train it from randomly initialized weights, using only your dataset.

- **Pros**: Complete control, better for novel or domain-specific problems  
- **Cons**: Requires large datasets, longer training time, and more compute resources

# A Quick Demo

## Emotion Detection using Pre-trained Models
Use case: go beyond "positive/negative" and detect specific emotions such as joy, anger, and sadness.

<img src="https://webflow-amber-prod.gumlet.io/620e4101b2ce12a1a6bff0e8/66ab6846124b51c486c24b3e_640f1bb03074900cbf0f28f3_What-are-the-Ivy-League-schools.webp" width=700>

**"I can't believe I got in! I'm so happy and feel very grateful."**

In [23]:
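from transformers import pipeline
import pandas as pd

# Load a pre-trained emotion classifier; top_k=None returns a score for every label
emotion = pipeline("text-classification",
                   model="j-hartmann/emotion-english-distilroberta-base",
                   top_k=None)

results = emotion("I can't believe I got in! I'm so happy and feel very grateful.")

# results holds one list of {"label", "score"} dicts per input text;
# flatten it into rows, then pivot to one column per emotion
rows = []
for i, row in enumerate(results):
    for item in row:
        rows.append({"text_id": i, "label": item["label"], "score": item["score"]})
scores_df = pd.DataFrame(rows)  # separate name so the churn `df` is not overwritten
print(scores_df.pivot(index="text_id", columns="label", values="score").round(3))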


Device set to use cpu


label    anger  disgust  fear    joy  neutral  sadness  surprise
text_id                                                         
0        0.002    0.001   0.0  0.965    0.002    0.003     0.027


In [24]:
from transformers import pipeline
emotion_classifier = pipeline(
    "text-classification",
    model="bhadresh-savani/distilbert-base-uncased-emotion",
    top_k=None
)

text = "I can't believe I got in! I'm so happy and feel very grateful."
results = emotion_classifier(text)

for row in results:
    for item in row:
        print(f"{item['label']:<10} -> {item['score']:.4f}")

Device set to use cpu


joy        -> 0.9989
love       -> 0.0007
sadness    -> 0.0002
surprise   -> 0.0001
fear       -> 0.0001
anger      -> 0.0001


In [25]:
# Try a different model and language: a multilingual *sentiment* model
# (positive/neutral/negative labels), not an emotion model
from transformers import pipeline
sentiment_classifier = pipeline(
    "text-classification",
    model="cardiffnlp/twitter-xlm-roberta-base-sentiment",
    top_k=None
)

text = "Sunt fericit și foarte recunoscător."  # Romanian: "I am happy and very grateful."
results = sentiment_classifier(text)

for row in results:
    for item in row:
        print(f"{item['label']:<10} -> {item['score']:.4f}")

Device set to use cpu


positive   -> 0.9321
neutral    -> 0.0533
negative   -> 0.0146


## Training an Emotion Detection Model from Scratch

In [26]:
# import packages

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from sklearn.preprocessing import LabelEncoder
import pandas as pd

In [27]:
# Tiny dataset for demonstration
data = {
    "text": [
        "I'm so happy and grateful",
        "This is terrible. I'm really mad",
        "I feel scared and anxious",
        "What a surprise! I didn’t expect that",
        "I'm feeling so loved and supported",
        "This makes me really sad"
    ],
    "emotion": [
        "joy",
        "anger",
        "fear",
        "surprise",
        "love",
        "sadness"
    ]
}

df = pd.DataFrame(data)
df

| | text | emotion |
|---|---|---|
| 0 | I'm so happy and grateful | joy |
| 1 | This is terrible. I'm really mad | anger |
| 2 | I feel scared and anxious | fear |
| 3 | What a surprise! I didn’t expect that | surprise |
| 4 | I'm feeling so loved and supported | love |
| 5 | This makes me really sad | sadness |


In [28]:
# Tokenize text
tokenizer = Tokenizer()
tokenizer.fit_on_texts(df['text'])
X = tokenizer.texts_to_sequences(df['text'])
X = pad_sequences(X, padding='post')

# Encode labels
le = LabelEncoder()
y = le.fit_transform(df['emotion'])

# Check vocab size
vocab_size = len(tokenizer.word_index) + 1
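What `Tokenizer`, `pad_sequences`, and `LabelEncoder` do can be approximated in plain Python. This is a simplified sketch, not the Keras or scikit-learn source. It assumes Keras's default tokenizer behavior: lowercase the text, replace the characters in the default `filters` string with spaces (the apostrophe is not among them, so `I'm` stays `i'm`), number words from 1 in order of decreasing frequency with ties in first-seen order, and reserve 0 for padding. The two example sentences are taken from the tiny dataset above.

```python
from collections import Counter

# Assumed to match the Keras Tokenizer default `filters`; the apostrophe is absent
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'
_STRIP = str.maketrans({c: " " for c in FILTERS})

def tokenize(text):
    """Lowercase, turn filtered punctuation into spaces, split on whitespace."""
    return text.lower().translate(_STRIP).split()

def fit_word_index(texts):
    """Give each word a 1-based id, most frequent first; 0 stays free for padding."""
    counts = Counter(w for t in texts for w in tokenize(t))
    # Counter keeps first-seen order and sorted() is stable, so ties keep that order
    return {w: i + 1 for i, w in enumerate(sorted(counts, key=lambda w: -counts[w]))}

def to_padded(texts, word_index, maxlen=None):
    """Map words to ids (unknown words are dropped), then zero-pad at the end ('post')."""
    seqs = [[word_index[w] for w in tokenize(t) if w in word_index] for t in texts]
    if maxlen is None:
        maxlen = max(len(s) for s in seqs)
    # Over-long sequences lose ids from the front, like the default truncating='pre'
    return [s[len(s) - maxlen:] if len(s) > maxlen else s + [0] * (maxlen - len(s))
            for s in seqs]

texts = ["I'm so happy and grateful", "This makes me really sad"]
labels = ["joy", "sadness"]

word_index = fit_word_index(texts)
X_demo = to_padded(texts, word_index)

# LabelEncoder-style targets: class ids follow the sorted order of the labels
classes = sorted(set(labels))
y_demo = [classes.index(lbl) for lbl in labels]

print(word_index)
print(X_demo)
print(to_padded(["So sad!"], word_index, maxlen=5))  # unseen punctuation/words vanish
print(y_demo)
```

Because the class ids are assigned in sorted order, index `k` of the model's softmax output corresponds to `le.classes_[k]`, which is why `le.inverse_transform([pred.argmax()])` recovers the emotion name later on.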

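Unlike keyword-based sentiment analysis, the model below does more than match individual words: the `Embedding` layer learns a dense vector for each word, and the `LSTM` layer reads those vectors in order, so together they can pick up word-sequence patterns (given enough training data). For example, it can learn templates such as: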

```"I’m so ____ and ____"``` → joy

```"This is ____"``` → anger or sadness
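Under the hood, `Embedding` is a lookup table (row `i` is the learned vector for token id `i`), and the `LSTM` reads those vectors one step at a time while carrying a hidden state and a cell state forward. Below is a minimal NumPy sketch of that forward pass using the standard LSTM cell equations (input, forget, and output gates plus a candidate cell state), followed by a softmax head. The sizes roughly mirror the Keras model below, but the weights are random placeholders and the token ids are hypothetical. Like that model (its `Embedding` does not set `mask_zero=True`), the sketch feeds the padding id 0 through the LSTM as an ordinary step.

```python
import numpy as np

rng = np.random.default_rng(42)
vocab_size, embed_dim, units, n_classes = 27, 16, 16, 6  # illustrative sizes

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Embedding: a (vocab_size, embed_dim) table; row 0 belongs to the padding id
E = rng.normal(scale=0.1, size=(vocab_size, embed_dim))

# LSTM weights for the four gates, stacked: input, forget, candidate, output
W = rng.normal(scale=0.1, size=(4 * units, embed_dim))  # applied to the current input
U = rng.normal(scale=0.1, size=(4 * units, units))      # applied to the previous hidden state
b = np.zeros(4 * units)

# Dense softmax head: one output per emotion
W_out = rng.normal(scale=0.1, size=(n_classes, units))
b_out = np.zeros(n_classes)

def lstm_forward(token_ids):
    """Run one sequence through embedding -> LSTM and return the final hidden state."""
    h = np.zeros(units)
    c = np.zeros(units)
    for t in token_ids:
        x = E[t]                                  # embedding lookup
        i, f, g, o = np.split(W @ x + U @ h + b, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        g = np.tanh(g)
        c = f * c + i * g                         # keep part of the old memory, add new content
        h = o * np.tanh(c)                        # expose a gated view of the memory
    return h

def predict_proba(token_ids):
    logits = W_out @ lstm_forward(token_ids) + b_out
    e = np.exp(logits - logits.max())
    return e / e.sum()

# Same words in a different order give a different final state and different probabilities
p1 = predict_proba([3, 7, 5, 0, 0])
p2 = predict_proba([5, 7, 3, 0, 0])
print(p1.round(4))
print(p2.round(4))
```

Word order matters here only because of the recurrence: a bag-of-words model would give `[3, 7, 5]` and `[5, 7, 3]` identical inputs.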

In [29]:
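from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Embedding, LSTM, Dense

model = Sequential([
    # Declaring the padded sequence length up front builds the model,
    # so summary() can report parameter counts before training
    Input(shape=(X.shape[1],), dtype="int32"),
    Embedding(input_dim=vocab_size, output_dim=16),  # word id -> 16-dim learned vector
    LSTM(16),                                        # reads the word vectors in order
    Dense(len(le.classes_), activation='softmax')    # one probability per emotion (6 here)
])

model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()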
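`model.summary()` reports a parameter count per layer (its output is not shown here). The counts follow from standard formulas, sketched below: an `Embedding` stores one `output_dim` vector per vocabulary entry; a Keras `LSTM` has four gates, each with input weights, recurrent weights, and a bias; a `Dense` layer has one weight per input per unit plus one bias per unit. The `vocab_size` of 27 is an assumption: it is what the six training sentences give under the Tokenizer's default lowercasing and punctuation filtering (26 distinct words plus the reserved index 0), so compare it with the value computed in the notebook.

```python
def embedding_params(vocab_size, output_dim):
    # one learned vector per vocabulary entry
    return vocab_size * output_dim

def lstm_params(input_dim, units):
    # 4 gates x (input weights + recurrent weights + bias), with use_bias=True
    return 4 * (units * input_dim + units * units + units)

def dense_params(input_dim, units):
    # full weight matrix plus one bias per unit
    return input_dim * units + units

vocab_size = 27  # assumption; see the note above
counts = {
    "embedding": embedding_params(vocab_size, 16),
    "lstm": lstm_params(16, 16),
    "dense": dense_params(16, 6),
}
for name, n in counts.items():
    print(f"{name:<10} {n:>6,}")
print(f"{'total':<10} {sum(counts.values()):>6,}")
```

Roughly 2,600 trainable parameters learned from six sentences is a strong hint that this model can memorize its training data but has little to generalize from.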


In [30]:
model.fit(X, y, epochs=30, verbose=1)

Epoch 1/30
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 2s/step - accuracy: 0.1667 - loss: 1.7940
Epoch 2/30
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 53ms/step - accuracy: 0.1667 - loss: 1.7923
Epoch 3/30
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 54ms/step - accuracy: 0.3333 - loss: 1.7905
Epoch 4/30
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 52ms/step - accuracy: 0.3333 - loss: 1.7887
Epoch 5/30
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 52ms/step - accuracy: 0.3333 - loss: 1.7870
Epoch 6/30
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 56ms/step - accuracy: 0.6667 - loss: 1.7852
Epoch 7/30
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 51ms/step - accuracy: 0.6667 - loss: 1.7834
Epoch 8/30
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 52ms/step - accuracy: 0.6667 - loss: 1.7816
Epoch 9/30
[1m1/1[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m

<keras.src.callbacks.history.History at 0x7840d0360bc0>
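The logged loss starts near 1.79 for a simple reason. Sparse categorical cross-entropy averages $-\ln \hat{p}_{y}$, the negative log of the probability the model assigns to the true class. A model that spreads probability evenly over the 6 emotions gives every true class $\hat{p}_{y} = 1/6$, so

$$
L_{\text{uniform}} = -\ln\frac{1}{6} = \ln 6 \approx 1.792
$$

At epoch 8 the loss is still 1.7816, meaning the true class receives, as a geometric mean over the six examples, only about $e^{-1.7816} \approx 0.168$, barely above $1/6 \approx 0.167$. Training accuracy can climb while the outputs stay close to uniform, because accuracy only checks which class has the largest probability.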

In [31]:
# Your test sentence
test_text = ["I can't believe I got in! I'm so happy and feel very grateful."]

# Preprocess the test input
test_seq = tokenizer.texts_to_sequences(test_text)
test_pad = pad_sequences(test_seq, maxlen=X.shape[1], padding='post')

# Predict emotion
pred = model.predict(test_pad)
emotion_label = le.inverse_transform([pred.argmax()])[0]

# Print Output
print(f"Text: {test_text[0]}")
print(f"Predicted Emotion: {emotion_label}")


1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 195ms/step
Text: I can't believe I got in! I'm so happy and feel very grateful.
Predicted Emotion: love


In [32]:
# Try a different sentence

test_text = ["I had a great childhood."]

# Preprocess the test input
test_seq = tokenizer.texts_to_sequences(test_text)
test_pad = pad_sequences(test_seq, maxlen=X.shape[1], padding='post')

# Predict emotion
pred = model.predict(test_pad)
emotion_label = le.inverse_transform([pred.argmax()])[0]

# Print Output
print(f"Text: {test_text[0]}")
print(f"Predicted Emotion: {emotion_label}")

1/1 ━━━━━━━━━━━━━━━━━━━━ 0s 62ms/step
Text: I had a great childhood.
Predicted Emotion: fear
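The from-scratch model labels the first sentence `love` (the pre-trained models above said `joy`) and "I had a great childhood." as `fear`. One concrete reason is that words never seen in training have no index, so `texts_to_sequences` silently drops them. The plain-Python check below approximates the Keras tokenization (lowercasing and removing the default `filters` punctuation, an assumption rather than the Keras code itself) and lists which test words the model actually gets to see.

```python
# Punctuation removed before splitting (assumed to match the Keras default `filters`)
FILTERS = '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n'
_STRIP = str.maketrans({c: " " for c in FILTERS})

def tokenize(text):
    """Lowercase, replace filtered punctuation with spaces, split on whitespace."""
    return text.lower().translate(_STRIP).split()

# The six training sentences and the two test sentences from the cells above
train_texts = [
    "I'm so happy and grateful",
    "This is terrible. I'm really mad",
    "I feel scared and anxious",
    "What a surprise! I didn’t expect that",
    "I'm feeling so loved and supported",
    "This makes me really sad",
]
test_texts = [
    "I can't believe I got in! I'm so happy and feel very grateful.",
    "I had a great childhood.",
]

vocab = {w for t in train_texts for w in tokenize(t)}

report = {}
for text in test_texts:
    words = tokenize(text)
    seen = [w for w in words if w in vocab]
    unseen = [w for w in words if w not in vocab]
    report[text] = (seen, unseen)
    print(text)
    print("  kept (seen in training):", seen)
    print("  dropped (never seen):   ", unseen)
```

For the second sentence only `i` and `a` survive. In the training data, `i` occurs only in the fear and surprise examples and `a` only in the surprise example, so the model has almost nothing to go on; `great` and `childhood`, the words that carry the meaning, are dropped entirely.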
