
idrissbado/DataStory


📊 DataStory - Automatic Storytelling from Data

PyPI version Python 3.8+ License: MIT

Turn raw data into compelling business narratives automatically.

DataStory analyzes your datasets and generates full written reports with insights, trends, and recommendations. No LLMs needed, just pure Python intelligence.

🚀 The Problem

  • Dashboards don't explain insights: they show graphs, not stories
  • People want narratives: business stakeholders need context, not just charts
  • Manual analysis takes time: writing reports is tedious and repetitive
  • Insights get lost: important patterns stay buried in spreadsheets

💡 The Solution

DataStory automatically:

  • ✅ Analyzes your data for trends, patterns, and anomalies
  • ✅ Generates natural language business narratives
  • ✅ Identifies risks and opportunities
  • ✅ Provides actionable recommendations
  • ✅ Exports to text, markdown, HTML, or PDF

All with a single line of code!

πŸ“¦ Installation

pip install datastory

For full features (charts, Excel, PDF):

pip install "datastory[full]"

🎯 Quick Start

One-Line Magic

from datastory import narrate

report = narrate("sales.csv")
print(report)

Output:

📊 EXECUTIVE SUMMARY
==================================================
Analyzed 1,247 records across 8 dimensions.

🟡 3 high-priority insights identified.

Key Highlights:
1. Sales increased by 12.3% from $450,000 to $505,000.
2. Customer churn rose in April by 8.5%, requiring attention.
3. West Africa region dominates sales, accounting for 45.2% of revenue.

📈 KEY FINDINGS
==================================================

**Performance Trends:**
• Sales Shows Strong Growth: Sales increased by 12.3% from $450,000 to $505,000.
• Revenue per Customer Rising: Average order value grew by 15.7%.

**Notable Anomalies:**
• Unusual Values Detected in Order Quantity: Found 23 outliers (1.8% of data).

**Relationships Discovered:**
• Strong Positive Link: Marketing Spend and Revenue move together (correlation: 0.85).

πŸ” DETAILED ANALYSIS
==================================================

**High-Priority Insights:**

🟡 Customer Churn Rising
   Customer churn increased by 8.5% in April. This represents a significant concern.

🟡 Low Stock Risk: Product X
   Minimum inventory is 12 units, significantly below average of 150. Consider restocking.

💡 RECOMMENDATIONS
==================================================
1. Investigate the decline in customer retention and implement recovery strategies
2. Capitalize on the growth in revenue per customer to maximize returns
3. Replenish product_x inventory to avoid stockouts
4. Review outliers in order quantity to identify root causes
5. Leverage identified relationships between metrics for predictive insights

==================================================
Report generated on December 03, 2025 at 1:20 PM
Powered by DataStory - Automatic Storytelling from Data
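The trend sentences in the sample report (for example "Sales increased by 12.3% from $450,000 to $505,000.") read like filled-in templates. The sketch below shows one way such a sentence could be produced from a start and an end value. `describe_change` is a hypothetical helper for illustration, not part of DataStory's documented API, and the numbers are made up.

```python
def describe_change(metric: str, start: float, end: float) -> str:
    """Fill a sentence template describing the change from start to end."""
    if start == 0:
        # A percentage change from zero is undefined.
        raise ValueError("start must be non-zero")
    if end == start:
        return f"{metric} was unchanged at ${start:,.0f}."
    # Percent change relative to the starting value.
    pct = (end - start) / abs(start) * 100
    verb = "increased" if pct > 0 else "decreased"
    return f"{metric} {verb} by {abs(pct):.1f}% from ${start:,.0f} to ${end:,.0f}."


print(describe_change("Sales", 400_000, 450_000))
# Sales increased by 12.5% from $400,000 to $450,000.
```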

🔥 Key Features

1. Pure Python Intelligence

  • No LLMs or AI APIs required
  • Works offline
  • Fast and deterministic
  • Zero-cost analysis

2. Comprehensive Analysis

  • Statistical summaries
  • Trend detection
  • Anomaly identification
  • Correlation discovery
  • Time series patterns
  • Risk assessment
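Trend detection can be approached in several ways. One simple option, assumed here rather than taken from DataStory's source, is to fit a least-squares line to the series and label its direction by the sign of the slope. The function names and the 1% relative tolerance are hypothetical.

```python
from typing import Sequence


def linear_trend(values: Sequence[float]) -> float:
    """Least-squares slope of values against their positions 0..n-1.

    slope = sum((i - mean_i) * (y - mean_y)) / sum((i - mean_i) ** 2)
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to estimate a trend")
    mean_i = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_i) * (y - mean_y) for i, y in enumerate(values))
    den = sum((i - mean_i) ** 2 for i in range(n))
    return num / den


def classify_trend(values: Sequence[float], tolerance: float = 0.01) -> str:
    """Label a series 'upward', 'downward' or 'flat'.

    A slope counts as flat when its size per step is below `tolerance`
    times the series' mean absolute value (an assumed rule of thumb).
    """
    slope = linear_trend(values)
    # Fall back to 1.0 so an all-zero series does not give a zero threshold.
    scale = sum(abs(v) for v in values) / len(values) or 1.0
    if slope > tolerance * scale:
        return "upward"
    if slope < -tolerance * scale:
        return "downward"
    return "flat"


print(classify_trend([100, 110, 120, 130]))  # upward (slope = 10 per step)
```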

3. Natural Language Output

  • Business-friendly narratives
  • Context-aware descriptions
  • Action-oriented recommendations
  • Multiple detail levels
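One way to support multiple detail levels is to render the same findings through different templates. This sketch assumes findings are (title, description) pairs and reuses the level names brief, medium and detailed from the configuration options shown under Advanced Usage. `render_findings` is hypothetical.

```python
from typing import List, Tuple

Finding = Tuple[str, str]  # (title, description)


def render_findings(findings: List[Finding], detail_level: str = "medium") -> str:
    """Render the same findings at one of three detail levels."""
    if detail_level == "brief":
        # One summary sentence listing titles only.
        titles = "; ".join(title for title, _ in findings)
        return f"{len(findings)} findings: {titles}."
    if detail_level == "medium":
        # One bullet per finding, titles only.
        return "\n".join(f"• {title}" for title, _ in findings)
    if detail_level == "detailed":
        # One bullet per finding, title plus explanation.
        return "\n".join(f"• {title}: {desc}" for title, desc in findings)
    raise ValueError(f"unknown detail level: {detail_level!r}")


findings = [
    ("Sales Shows Strong Growth", "Sales increased by 12.5% over the period."),
    ("Customer Churn Rising", "Churn rose by 8.5% in April."),
]
print(render_findings(findings, "brief"))
# 2 findings: Sales Shows Strong Growth; Customer Churn Rising.
```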

4. Flexible Export

from datastory import DataStory

story = DataStory()
story.load("data.csv")

# Export to different formats
story.export("report.txt", format="text")
story.export("report.md", format="markdown")
story.export("report.html", format="html", include_charts=True)
story.export("report.pdf", format="pdf")
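Format-specific export is commonly implemented as a lookup from a format name to a formatter function. The sketch below covers text, Markdown and HTML with only the standard library (PDF would need an extra dependency such as reportlab, listed under Dependencies). `FORMATTERS` and `export` here are hypothetical stand-ins, not DataStory's internals.

```python
import html
from pathlib import Path
from typing import Callable, Dict, List


def to_text(title: str, paragraphs: List[str]) -> str:
    underline = "=" * len(title)
    return f"{title}\n{underline}\n\n" + "\n\n".join(paragraphs) + "\n"


def to_markdown(title: str, paragraphs: List[str]) -> str:
    return f"# {title}\n\n" + "\n\n".join(paragraphs) + "\n"


def to_html(title: str, paragraphs: List[str]) -> str:
    # Escape data so values like "<" or "&" cannot break the markup.
    body = "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
    return f"<h1>{html.escape(title)}</h1>\n{body}\n"


FORMATTERS: Dict[str, Callable[[str, List[str]], str]] = {
    "text": to_text,
    "markdown": to_markdown,
    "html": to_html,
}


def export(path: str, title: str, paragraphs: List[str], format: str = "text") -> None:
    """Write the report to `path` with the formatter registered for `format`.

    The parameter is named `format` only to mirror the keyword in the README.
    """
    formatter = FORMATTERS.get(format)
    if formatter is None:
        raise ValueError(f"unsupported format: {format!r}")
    Path(path).write_text(formatter(title, paragraphs), encoding="utf-8")
```

Adding a new output type then only means registering one more function in the dictionary.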

5. Multiple Data Sources

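import sqlite3

import pandas as pd
from datastory import DataStory

story = DataStory()

# CSV, Excel, JSON, Parquet
story.load("sales.csv")
story.load("data.xlsx")
story.load("records.json")
story.load("dataset.parquet")

# URLs
story.load("https://example.com/data.csv")

# Pandas DataFrames, e.g. the result of a SQL query
# ("sales.db" is a placeholder SQLite database path)
conn = sqlite3.connect("sales.db")
df = pd.read_sql("SELECT * FROM sales", conn)
conn.close()
story.load(df)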
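Multi-format loading is typically a dispatch on the file extension. This stdlib-only sketch handles CSV and JSON; Excel and Parquet would need extra libraries (for example pandas with openpyxl or a Parquet engine), and URL handling is left out. The names `LOADERS` and `load` are hypothetical.

```python
import csv
import json
from pathlib import Path
from typing import Any, Dict, List

Records = List[Dict[str, Any]]


def load_csv(path: Path) -> Records:
    # DictReader yields one dict per row, keyed by the header line.
    with path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def load_json(path: Path) -> Records:
    with path.open(encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of objects")
    return data


# Map lower-cased file extensions to loader functions.
LOADERS = {".csv": load_csv, ".json": load_json}


def load(source: str) -> Records:
    path = Path(source)
    loader = LOADERS.get(path.suffix.lower())
    if loader is None:
        raise ValueError(f"no loader registered for {path.suffix!r}")
    return loader(path)


# Usage (assumes a local sales.csv exists):
# rows = load("sales.csv")
```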

📖 Advanced Usage

Customization

from datastory import DataStory

# Configure narrative style
config = {
    "style": "business",  # business, casual, technical
    "detail_level": "detailed",  # brief, medium, detailed
    "include_recommendations": True
}

story = DataStory(config=config)
story.load("sales.csv")
narrative = story.generate_narrative()
print(narrative)
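The README does not document default values for these options. A plausible pattern, sketched here with assumed defaults, is to merge the user's dict over a default dict and reject unknown keys or values. `DEFAULT_CONFIG`, `ALLOWED` and `resolve_config` are hypothetical.

```python
from typing import Any, Dict, Optional

# Assumed defaults; the README does not state DataStory's actual defaults.
DEFAULT_CONFIG: Dict[str, Any] = {
    "style": "business",
    "detail_level": "medium",
    "include_recommendations": True,
}

# Allowed values taken from the comments in the config example above.
ALLOWED = {
    "style": {"business", "casual", "technical"},
    "detail_level": {"brief", "medium", "detailed"},
}


def resolve_config(user_config: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Overlay user options on the defaults and validate the result."""
    config = {**DEFAULT_CONFIG, **(user_config or {})}
    unknown = set(config) - set(DEFAULT_CONFIG)
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    for key, allowed in ALLOWED.items():
        if config[key] not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}, got {config[key]!r}")
    return config


print(resolve_config({"style": "casual"}))
# {'style': 'casual', 'detail_level': 'medium', 'include_recommendations': True}
```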

Programmatic Access

from datastory import DataStory

# Access insights directly
story = DataStory()
story.load("data.csv")

insights = story.extract_insights()
for insight in insights:
    print(f"{insight.type}: {insight.title}")
    print(f"Priority: {insight.priority}")
    print(f"Description: {insight.description}\n")
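The loop above reads `type`, `title`, `priority` and `description` from each insight. A minimal stand-in for such an object, assuming priorities are the strings "high", "medium" and "low" (the README does not specify the values), could be a frozen dataclass plus a helper that lists high-priority items first. `Insight`, `PRIORITY_RANK` and `by_priority` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Assumed priority labels, ordered from most to least urgent.
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


@dataclass(frozen=True)
class Insight:
    type: str         # e.g. "trend", "anomaly", "correlation"
    title: str
    priority: str     # expected to be one of PRIORITY_RANK's keys
    description: str


def by_priority(insights: List[Insight]) -> List[Insight]:
    """Return insights sorted high, then medium, then low; unknown labels go last."""
    return sorted(insights, key=lambda i: PRIORITY_RANK.get(i.priority, len(PRIORITY_RANK)))


insights = [
    Insight("anomaly", "Unusual order quantities", "medium", "23 outliers found."),
    Insight("trend", "Customer churn rising", "high", "Churn up 8.5% in April."),
]
for insight in by_priority(insights):
    print(f"{insight.type}: {insight.title} [{insight.priority}]")
```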

Analysis Results

from datastory import DataStory

# Get raw analysis results
story = DataStory()
story.load("data.csv")

results = story.analyze()
print(results["trends"])
print(results["anomalies"])
print(results["correlations"])
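The exact contents of the `trends`, `anomalies` and `correlations` entries are not documented here. As an illustration only, this sketch builds a dict with those three keys from numeric columns using deliberately simple techniques: first-to-last percent change for trends, Tukey's 1.5 × IQR fences for outliers, and Pearson correlation above an assumed threshold of 0.7. All function names are hypothetical.

```python
import math
import statistics
from itertools import combinations
from typing import Dict, List, Tuple


def iqr_outliers(values: List[float], k: float = 1.5) -> List[float]:
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    if len(values) < 4:
        return []  # too few points for meaningful quartiles
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = k * (q3 - q1)
    return [v for v in values if v < q1 - spread or v > q3 + spread]


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation of two equal-length series; NaN if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return math.nan
    return cov / (sx * sy)


def analyze(columns: Dict[str, List[float]], min_corr: float = 0.7) -> dict:
    """Build a results dict with 'trends', 'anomalies' and 'correlations' keys."""
    trends = {
        name: (col[-1] - col[0]) / abs(col[0]) * 100
        for name, col in columns.items()
        if len(col) >= 2 and col[0] != 0
    }
    anomalies = {}
    for name, col in columns.items():
        outliers = iqr_outliers(col)
        if outliers:
            anomalies[name] = outliers
    correlations: List[Tuple[str, str, float]] = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if not math.isnan(r) and abs(r) >= min_corr:
            correlations.append((a, b, round(r, 2)))
    return {"trends": trends, "anomalies": anomalies, "correlations": correlations}


results = analyze({
    "marketing_spend": [1, 2, 3, 4, 5, 6],
    "revenue": [2, 4, 6, 8, 10, 12],
    "order_quantity": [10, 11, 12, 13, 14, 100],
})
print(results["trends"])
print(results["anomalies"])
print(results["correlations"])
```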

🎓 Use Cases

1. Business Intelligence

Generate executive summaries from sales, marketing, or financial data.

2. Data Science Reports

Automatically document exploratory data analysis (EDA) findings.

3. Automated Monitoring

Create daily/weekly reports on KPIs and metrics.

4. Client Reporting

Transform raw analytics into client-ready narratives.

5. Academic Research

Quickly summarize dataset characteristics and patterns.

🆚 Why DataStory?

| Feature | DataStory | Traditional BI | LLM-based |
|---|---|---|---|
| Setup Time | Instant | Hours/Days | API setup |
| Cost | Free | $$$$ | $$$ per call |
| Offline Use | ✅ Yes | ❌ No | ❌ No |
| Customizable | ✅ Full control | ⚠️ Limited | ❌ Black box |
| Speed | ⚡ Instant | 🐌 Slow | ⏳ API delays |
| Privacy | 🔒 Local | ⚠️ Cloud | ❌ Sent to API |
| Deterministic | ✅ Yes | ✅ Yes | ❌ No |

📊 Example Datasets

The examples/ directory includes sample datasets:

  • sales.csv - Sales performance data
  • customer_churn.csv - Customer retention data
  • inventory.csv - Stock levels and products

πŸ› οΈ Technical Details

Architecture

  • Core Analyzer: Statistical analysis using pandas/numpy
  • Insight Extractor: Pattern recognition and business logic
  • Narrative Generator: Template-based natural language generation
  • Data Loaders: Multi-format support (CSV, Excel, JSON, Parquet)
  • Report Formatters: Export to text, markdown, HTML, PDF
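The five components read as a pipeline in which each stage consumes the previous stage's output. A toy version of the first three stages (the formatter is omitted), with hypothetical function names, a deliberately simple analysis, and an assumed 10% threshold, might look like this:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Finding:
    title: str
    priority: str


def analyze(table: Dict[str, List[float]]) -> Dict[str, float]:
    """Core Analyzer stand-in: percent change from first to last value."""
    return {
        name: (col[-1] - col[0]) / abs(col[0]) * 100
        for name, col in table.items()
        if len(col) >= 2 and col[0] != 0
    }


def extract_insights(stats: Dict[str, float], threshold: float = 10.0) -> List[Finding]:
    """Insight Extractor stand-in: keep changes of at least `threshold` percent."""
    findings = []
    for name, pct in stats.items():
        if abs(pct) >= threshold:
            priority = "high" if abs(pct) >= 2 * threshold else "medium"
            findings.append(Finding(f"{name} changed by {pct:+.1f}%", priority))
    return findings


def generate_narrative(findings: List[Finding]) -> str:
    """Narrative Generator stand-in: one templated bullet per finding."""
    if not findings:
        return "No notable changes detected."
    return "\n".join(f"• [{f.priority}] {f.title}" for f in findings)


def run_pipeline(table: Dict[str, List[float]]) -> str:
    # Each stage consumes the previous stage's output.
    return generate_narrative(extract_insights(analyze(table)))


print(run_pipeline({"sales": [400, 450], "costs": [100, 102]}))
# • [medium] sales changed by +12.5%
```

Keeping the stages as separate functions means each can be tested or replaced independently.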

Dependencies

  • Core: pandas, numpy
  • Optional: matplotlib (charts), openpyxl (Excel), reportlab (PDF)

Performance

  • Analyzes 100K rows in <2 seconds
  • Generates narrative in <1 second
  • Low memory footprint

🤝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Submit a pull request

πŸ“ License

MIT License - see LICENSE file for details.

πŸ™ Acknowledgments

Built with ❤️ by Idriss Bado

Inspired by the need for better data communication in business.

📧 Contact


⭐ Star this repo if you find it useful!

πŸ› Found a bug? Open an issue

💡 Have an idea? Start a discussion
