A small Python script that reads a CSV of sales and prints a clear text summary: totals, revenue by category, top products, and revenue by month. It auto-parses a date
column (if present) into real datetimes for correct time-based grouping.
-
Reads a CSV and computes:
- Total rows, total units, total revenue
- Revenue by category (if the column exists)
- Top 3 products by revenue (if the column exists)
- Revenue by month (if a
date
column exists)
-
Safely validates required columns (
units
,unit_price
) -
Works cross-platform (Windows/macOS/Linux)
-
Simple CLI interface
- Python 3.9+ (3.11+ recommended)
pandas
python -m pip install --upgrade pip
python -m pip install pandas
# in your project folder
py -3.13 -m venv .venv
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
.\.venv\Scripts\Activate.ps1
python -m pip install --upgrade pip
pip install -r requirements.txt
py -3.13 -m venv .venv
.\.venv\Scripts\activate.bat
python -m pip install --upgrade pip
pip install -r requirements.txt
py -3.13 -m venv .venv
source .venv/Scripts/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
Create a
requirements.txt
with:pandas
python summarize_sales.py /path/to/your.csv
Examples
-
PowerShell:
python .\summarize_sales.py .\sample_sales.csv
-
CMD:
python summarize_sales.py sample_sales.csv
-
Bash/WSL/macOS:
python summarize_sales.py sample_sales.csv
The script prints a report to stdout.
At minimum you need:
units
(integer)unit_price
(number)
Recommended columns (optional but unlock more features):
date
(ISO formatYYYY-MM-DD
preferred) → enables monthly groupingproduct
category
region
date,order_id,product,category,units,unit_price,region
2025-07-01,1001,DeskMate Pro,Office,2,149.99,West
2025-07-01,1002,SlimKey 60,Electronics,1,89.50,West
2025-07-02,1003,Focus Lamp,Home,3,24.99,South
2025-07-02,1004,DeskMate Pro,Office,1,149.99,South
2025-07-03,1005,Ergo Chair,Office,1,299.00,West
2025-07-03,1006,SlimKey 60,Electronics,2,89.50,East
2025-07-05,1007,Focus Lamp,Home,5,24.99,East
2025-07-07,1008,AirPods Case,Accessories,4,12.00,West
2025-07-09,1009,USB-C Hub 7-in-1,Electronics,3,39.99,South
2025-08-01,1010,Ergo Chair,Office,2,299.00,East
2025-08-02,1011,DeskMate Pro,Office,1,149.99,West
2025-08-03,1012,Focus Lamp,Home,2,24.99,South
2025-08-04,1013,USB-C Hub 7-in-1,Electronics,4,39.99,East
2025-08-10,1014,SlimKey 60,Electronics,1,89.50,South
2025-08-12,1015,AirPods Case,Accessories,6,12.00,West
2025-08-15,1016,Focus Lamp,Home,1,24.99,West
=== SUMMARY REPORT ===
Rows: 16
Total units sold: 39
Total revenue: $2,529.78
Revenue by category:
- Office: $1,496.96
- Electronics: $637.93
- Home: $274.89
- Accessories: $120.00
Top 3 products by revenue:
- Ergo Chair: $897.00
- DeskMate Pro: $599.96
- SlimKey 60: $358.00
Revenue by month:
- 2025-07: $1,385.36
- 2025-08: $1,144.42
The script tries to parse a date
column only if it exists:
header = pd.read_csv(csv_path, nrows=0).columns
parse_cols = ["date"] if "date" in header else []
df = pd.read_csv(csv_path, parse_dates=parse_cols or None)
- If the CSV contains
date
, pandas converts it todatetime64[ns]
. - If not, the script reads the file normally.
- Parsing dates enables correct chronological sorting, monthly grouping, and
.dt
accessors.
Prefer ISO format (YYYY-MM-DD
). If your file uses another format, you can enforce it:
df = pd.read_csv(csv_path)
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d", errors="raise")
.
├─ summarize_sales.py
├─ sample_sales.csv # optional example file
└─ requirements.txt # contains: pandas
-
pip: command not found
Usepython -m pip install ...
(orpy -m pip ...
on Windows). -
Activation errors on Windows PowerShell needs an execution policy for this session:
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass .\.venv\Scripts\Activate.ps1
-
CSV missing columns The script requires
units
andunit_price
. Add them or rename your columns. -
Date parsing looks wrong Your CSV might use a different format. Force a format with
pd.to_datetime(..., format=...)
.
Q: What is df
?
A: It’s a pandas.DataFrame
(a 2-D labeled table). A single column like df["revenue"]
is a pandas.Series
(1-D).
Q: Do I need a virtual environment? A: You can run without it, but a venv keeps dependencies clean and reproducible.
Q: Can I group by something else (e.g., region)? A: Yes. Example:
df.groupby("region")["revenue"].sum().sort_values(ascending=False)