Skip to content

jynamba/python-csv-read-and-analyze

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 

Repository files navigation

summarize-sales

A small Python script that reads a CSV of sales and prints a clear text summary: totals, revenue by category, top products, and revenue by month. It auto-parses a date column (if present) into real datetimes for correct time-based grouping.

Features

  • Reads a CSV and computes:

    • Total rows, total units, total revenue
    • Revenue by category (if the column exists)
    • Top 3 products by revenue (if the column exists)
    • Revenue by month (if a date column exists)
  • Safely validates required columns (units, unit_price)

  • Works cross-platform (Windows/macOS/Linux)

  • Simple CLI interface

Requirements

  • Python 3.9+ (3.11+ recommended)
  • pandas

Installation

Option A: Quick start (global install)

python -m pip install --upgrade pip
python -m pip install pandas

Option B: Recommended (virtual environment)

Windows – PowerShell

# in your project folder
py -3.13 -m venv .venv
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
.\.venv\Scripts\Activate.ps1

python -m pip install --upgrade pip
pip install -r requirements.txt

Windows – CMD

py -3.13 -m venv .venv
.\.venv\Scripts\activate.bat

python -m pip install --upgrade pip
pip install -r requirements.txt

Windows – Git Bash

py -3.13 -m venv .venv
source .venv/Scripts/activate

python -m pip install --upgrade pip
pip install -r requirements.txt

macOS/Linux

python3 -m venv .venv
source .venv/bin/activate

python -m pip install --upgrade pip
pip install -r requirements.txt

Create a requirements.txt with:

pandas

Usage

python summarize_sales.py /path/to/your.csv

Examples

  • PowerShell:

    python .\summarize_sales.py .\sample_sales.csv
  • CMD:

    python summarize_sales.py sample_sales.csv
  • Bash/WSL/macOS:

    python summarize_sales.py sample_sales.csv

The script prints a report to stdout.

CSV Format

At minimum you need:

  • units (integer)
  • unit_price (number)

Recommended columns (optional but unlock more features):

  • date (ISO format YYYY-MM-DD preferred) → enables monthly grouping
  • product
  • category
  • region

Sample CSV (save as sample_sales.csv)

date,order_id,product,category,units,unit_price,region
2025-07-01,1001,DeskMate Pro,Office,2,149.99,West
2025-07-01,1002,SlimKey 60,Electronics,1,89.50,West
2025-07-02,1003,Focus Lamp,Home,3,24.99,South
2025-07-02,1004,DeskMate Pro,Office,1,149.99,South
2025-07-03,1005,Ergo Chair,Office,1,299.00,West
2025-07-03,1006,SlimKey 60,Electronics,2,89.50,East
2025-07-05,1007,Focus Lamp,Home,5,24.99,East
2025-07-07,1008,AirPods Case,Accessories,4,12.00,West
2025-07-09,1009,USB-C Hub 7-in-1,Electronics,3,39.99,South
2025-08-01,1010,Ergo Chair,Office,2,299.00,East
2025-08-02,1011,DeskMate Pro,Office,1,149.99,West
2025-08-03,1012,Focus Lamp,Home,2,24.99,South
2025-08-04,1013,USB-C Hub 7-in-1,Electronics,4,39.99,East
2025-08-10,1014,SlimKey 60,Electronics,1,89.50,South
2025-08-12,1015,AirPods Case,Accessories,6,12.00,West
2025-08-15,1016,Focus Lamp,Home,1,24.99,West

Example Output

=== SUMMARY REPORT ===
Rows: 16
Total units sold: 39
Total revenue: $2,529.78

Revenue by category:
  - Office: $1,496.96
  - Electronics: $637.93
  - Home: $274.89
  - Accessories: $120.00

Top 3 products by revenue:
  - Ergo Chair: $897.00
  - DeskMate Pro: $599.96
  - SlimKey 60: $358.00

Revenue by month:
  - 2025-07: $1,385.36
  - 2025-08: $1,144.42

How date parsing works

The script tries to parse a date column only if it exists:

header = pd.read_csv(csv_path, nrows=0).columns
parse_cols = ["date"] if "date" in header else []
df = pd.read_csv(csv_path, parse_dates=parse_cols or None)
  • If the CSV contains date, pandas converts it to datetime64[ns].
  • If not, the script reads the file normally.
  • Parsing dates enables correct chronological sorting, monthly grouping, and .dt accessors.

Prefer ISO format (YYYY-MM-DD). If your file uses another format, you can enforce it:

df = pd.read_csv(csv_path)
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d", errors="raise")

Project Structure

.
├─ summarize_sales.py
├─ sample_sales.csv         # optional example file
└─ requirements.txt         # contains: pandas

Troubleshooting

  • pip: command not found Use python -m pip install ... (or py -m pip ... on Windows).

  • Activation errors on Windows PowerShell needs an execution policy for this session:

    Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
    .\.venv\Scripts\Activate.ps1
  • CSV missing columns The script requires units and unit_price. Add them or rename your columns.

  • Date parsing looks wrong Your CSV might use a different format. Force a format with pd.to_datetime(..., format=...).

FAQ

Q: What is df? A: It’s a pandas.DataFrame (a 2-D labeled table). A single column like df["revenue"] is a pandas.Series (1-D).

Q: Do I need a virtual environment? A: You can run without it, but a venv keeps dependencies clean and reproducible.

Q: Can I group by something else (e.g., region)? A: Yes. Example:

df.groupby("region")["revenue"].sum().sort_values(ascending=False)

About

python-csv-read-and-analyze

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages