Project Charter — Smartphone Battery Life Prediction
Goal

Build a machine learning model that predicts the remaining battery time (in minutes) of a smartphone based on real or simulated usage data.

Prediction Target

minutes_remaining → continuous value (stored as minutes_remaining_target in the synthetic data)
Type: Regression

Success Metric

Primary: MAE (Mean Absolute Error) in minutes

Secondary: RMSE, R²
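
A minimal sketch of how the three metrics could be computed once a model produces predictions. The function name regression_report and the sample values are hypothetical and only illustrate the arithmetic; scikit-learn's sklearn.metrics module offers mean_absolute_error, mean_squared_error and r2_score for the same purpose.

```python
import numpy as np

def regression_report(y_true, y_pred):
    """Return MAE and RMSE (both in minutes) plus R² for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))            # average absolute miss, in minutes
    rmse = np.sqrt(np.mean(err ** 2))     # penalizes large misses more than MAE
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot            # undefined when y_true is constant
    return {"mae": float(mae), "rmse": float(rmse), "r2": float(r2)}

# Made-up values, for illustration only (not project results).
actual = [120, 90, 60, 30]
predicted = [110, 95, 70, 30]
report = regression_report(actual, predicted)
print(report)
```

MAE is the primary metric because it reads directly as "minutes off on average"; RMSE and R² are kept as secondary checks on large errors and explained variance.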

Why This Project Matters

Accurate battery predictions improve:

User experience

Power management

Device optimization

Data Needed

Battery %

CPU usage

Screen on/off

App usage

Network activity

Timestamps

Data Strategy

Start with synthetic time-series battery logs

Later, optionally add real logs or Kaggle datasets

Day 1 Setup Completed
Folders Created
battery_project/
 ├── data
 ├── notebooks
 ├── src
 ├── models
 └── reports
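
The folder layout above could also be created from Python instead of by hand. This is a small sketch using only the standard library; the helper name create_scaffold is hypothetical, and the folder names are taken from the tree above.

```python
from pathlib import Path

# Sub-folders listed in the project tree.
SUBFOLDERS = ("data", "notebooks", "src", "models", "reports")

def create_scaffold(root):
    """Create the project root and its sub-folders; safe to run repeatedly."""
    root = Path(root)
    for name in SUBFOLDERS:
        # exist_ok=True makes the call a no-op if the folder already exists.
        (root / name).mkdir(parents=True, exist_ok=True)
    # Return the sub-folder names that now exist, sorted for easy checking.
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```

Calling create_scaffold("battery_project") from the parent directory would produce the same structure as shown in the tree.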

Git Setup

Initialized repository → git init

Added README

First commit → "project scaffold"

Conda Environment

Created environment:

conda create -n batteryml python=3.10
conda activate batteryml

Installed Libraries
numpy
pandas
matplotlib
seaborn
scikit-learn
jupyterlab

Next Steps

Create synthetic dataset

Explore features

Build baseline model

In [2]:
import numpy, pandas, sklearn
print("NumPy:", numpy.__version__)
print("Pandas:", pandas.__version__)
print("scikit-learn:", sklearn.__version__)
print("\nEnvironment OK ✔")


NumPy: 2.3.5
Pandas: 2.3.3
scikit-learn: 1.7.1

Environment OK ✔


Synthetic Data — purpose & notes

We generate a small, reasonably realistic session so that baseline models can be built and iterated on without waiting for real logs. The generator produces minute-level telemetry (battery percentage, CPU usage, screen state, and per-minute drain) plus a computed ground-truth target, minutes_remaining_target.

In [5]:
import os
os.getcwd()


'C:\\Users\\madan\\battery_project'

In [6]:
# df is produced by simulate_session (defined in a later cell); run that cell first.
df.to_csv(r"C:\Users\madan\battery_project\data\synthetic_session1.csv", index=False)


In [7]:
import os
os.path.exists(r"C:\Users\madan\battery_project\data\synthetic_session1.csv")


True

In [1]:
import os
os.getcwd()


'C:\\Users\\madan\\battery_project\\notebooks'
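
The two os.getcwd() results above differ (project root in one session, notebooks folder in the other), so hard-coded absolute paths and cwd-relative paths both become fragile. One possible way around this, sketched with the standard library only, is to locate the project root by walking upward until a data folder is found. The helper names find_project_root and session_csv_path are hypothetical, and using the data folder as the marker is an assumption.

```python
from pathlib import Path

def find_project_root(start=None, marker="data"):
    """Walk upward from `start` (default: cwd) to the first folder containing `marker`."""
    current = (Path(start) if start is not None else Path.cwd()).resolve()
    for candidate in (current, *current.parents):
        if (candidate / marker).is_dir():
            return candidate
    raise FileNotFoundError(f"No '{marker}' folder found at or above {current}")

def session_csv_path(start=None, filename="synthetic_session1.csv"):
    """Build the CSV path inside the project's data folder."""
    return find_project_root(start) / "data" / filename
```

With this, df.to_csv(session_csv_path(), index=False) would write to the same file whether the notebook is started from battery_project or from battery_project\notebooks.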

In [3]:
import numpy as np
import pandas as pd
from datetime import datetime, timedelta

def simulate_session(start_percent=90, minutes=240, base_drain_per_min=0.08, seed=1):
    """Simulate one battery discharge session at one-minute resolution."""
    np.random.seed(seed)  # seed the global NumPy RNG so runs are reproducible
    # Capture the start time once so rows are spaced exactly one minute apart.
    start = datetime.now()
    times = [start + timedelta(minutes=i) for i in range(minutes)]
    # CPU load in percent, clipped to the valid 0-100 range.
    cpu = np.clip(np.random.normal(loc=20, scale=10, size=minutes), 0, 100)
    # Screen is on in roughly 35% of the minutes.
    screen_on = (np.random.rand(minutes) < 0.35).astype(int)
    # Drain in battery % per minute: baseline + CPU term + screen term + noise.
    drain = base_drain_per_min + (cpu/100)*0.18 + screen_on*0.45 + np.random.normal(0, 0.02, minutes)
    # Battery level after each minute, floored at 0%.
    percent = np.maximum(0, start_percent - np.cumsum(drain))
    df = pd.DataFrame({
        'timestamp': times,
        'battery_percent': np.round(percent,3),
        'cpu_pct': np.round(cpu,2),
        'screen_on': screen_on,
        'drain_per_min': np.round(drain,4),
    })
    # Target: current charge divided by the mean drain rate over the rest of the session.
    minutes_remaining = []
    for i in range(len(df)):
        current_pct = df.loc[i, 'battery_percent']
        future_drains = df.loc[i:, 'drain_per_min'].values
        avg_future = future_drains.mean() if future_drains.size else np.nan
        rem = int(np.ceil(current_pct / avg_future)) if avg_future > 0 else 0
        minutes_remaining.append(rem)
    df['minutes_remaining_target'] = minutes_remaining
    return df

df = simulate_session(start_percent=95, minutes=300, base_drain_per_min=0.06, seed=42)
df.head()


                    timestamp  battery_percent  cpu_pct  screen_on  drain_per_min  minutes_remaining_target
0  2025-12-03 19:18:39.320908           94.478    24.97          1         0.5216                       378
1  2025-12-03 19:19:39.320926           93.924    18.62          1         0.5544                       377
2  2025-12-03 19:20:39.320931           93.83     26.48          0         0.0944                       379
3  2025-12-03 19:21:39.320934           93.695    35.23          0         0.1348                       377
4  2025-12-03 19:22:39.320938           93.618    17.66          0         0.0765                       376
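
As a rough sanity check on the targets above, the expected drain rate can be worked out by hand from the parameters in simulate_session. The sketch below assumes the mean CPU load equals the distribution's loc of 20 (the small effect of clipping at 0 is ignored) and that the noise term averages to zero; the variable names are illustrative only.

```python
# Values taken from simulate_session and the call that produced the table.
start_percent = 95
base_drain_per_min = 0.06
mean_cpu_pct = 20        # loc of the CPU distribution; clipping at 0 ignored (assumption)
screen_on_prob = 0.35    # probability that the screen is on in a given minute
cpu_coeff = 0.18         # drain added at 100% CPU
screen_coeff = 0.45      # drain added while the screen is on

# Expected drain in battery % per minute (the noise term has zero mean).
expected_drain = (base_drain_per_min
                  + (mean_cpu_pct / 100) * cpu_coeff
                  + screen_on_prob * screen_coeff)
expected_minutes = start_percent / expected_drain

print(f"{expected_drain:.4f} %/min, about {expected_minutes:.0f} minutes")
# prints: 0.2535 %/min, about 375 minutes
```

The estimate of about 375 minutes is close to the 376 to 379 minutes in the minutes_remaining_target column, which suggests the target calculation behaves as intended for this session.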


In [4]:
df.to_csv("../data/synthetic_session1.csv", index=False)
os.path.exists("../data/synthetic_session1.csv")


True