📊 DataStory - Automatic Storytelling from Data

Turn raw data into compelling business narratives automatically.

DataStory analyzes your datasets and generates full written reports with insights, trends, and recommendations - no LLMs needed, pure Python intelligence.

🚀 The Problem

Dashboards don't explain insights - they show graphs, not stories
People want narratives - business stakeholders need context, not just charts
Manual analysis takes time - writing reports is tedious and repetitive
Insights get lost - important patterns stay buried in spreadsheets

💡 The Solution

DataStory automatically:

✅ Analyzes your data for trends, patterns, and anomalies
✅ Generates natural language business narratives
✅ Identifies risks and opportunities
✅ Provides actionable recommendations
✅ Exports to text, markdown, HTML, or PDF

All with a single line of code!

📦 Installation

pip install datastory

For full features (charts, Excel, PDF):

pip install "datastory[full]"

🎯 Quick Start

One-Line Magic

from datastory import narrate

report = narrate("sales.csv")
print(report)

Output:

📊 EXECUTIVE SUMMARY
==================================================
Analyzed 1,247 records across 8 dimensions.

🟡 3 high-priority insights identified.

Key Highlights:
1. Sales increased by 12.3% from $450,000 to $505,000.
2. Customer churn rose in April by 8.5%, requiring attention.
3. West Africa region dominates sales, accounting for 45.2% of revenue.

📈 KEY FINDINGS
==================================================

**Performance Trends:**
• Sales Shows Strong Growth: Sales increased by 12.3% from $450,000 to $505,000.
• Revenue per Customer Rising: Average order value grew by 15.7%.

**Notable Anomalies:**
• Unusual Values Detected in Order Quantity: Found 23 outliers (1.8% of data).

**Relationships Discovered:**
• Strong Positive Link: Marketing Spend and Revenue move together (correlation: 0.85).

🔍 DETAILED ANALYSIS
==================================================

**High-Priority Insights:**

🟡 Customer Churn Rising
   Customer churn increased by 8.5% in April. This represents a significant concern.

🟡 Low Stock Risk: Product X
   Minimum inventory is 12 units, significantly below average of 150. Consider restocking.

💡 RECOMMENDATIONS
==================================================
1. Investigate the decline in customer retention and implement recovery strategies
2. Capitalize on the growth in revenue per customer to maximize returns
3. Replenish product_x inventory to avoid stockouts
4. Review outliers in order quantity to identify root causes
5. Leverage identified relationships between metrics for predictive insights

==================================================
Report generated on December 03, 2025 at 1:20 PM
Powered by DataStory - Automatic Storytelling from Data

🔥 Key Features

1. Pure Python Intelligence

No LLMs or AI APIs required
Works offline
Fast and deterministic
Zero-cost analysis

2. Comprehensive Analysis

Statistical summaries
Trend detection
Anomaly identification
Correlation discovery
Time series patterns
Risk assessment
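
The checks listed above can be illustrated with plain pandas. This is a minimal sketch of how trend, anomaly, and correlation detection might look, not DataStory's actual implementation: the function names, the IQR rule, the 0.8 threshold, and the toy data are all assumptions made for illustration.

```python
import pandas as pd


def percent_change(series: pd.Series) -> float:
    """Percent change from the first to the last value of a series."""
    first, last = series.iloc[0], series.iloc[-1]
    return (last - first) / first * 100.0


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    mask = (series < q1 - k * iqr) | (series > q3 + k * iqr)
    return series[mask]


def strong_correlations(df: pd.DataFrame, threshold: float = 0.8) -> list:
    """List (col_a, col_b, r) for numeric column pairs with |r| >= threshold."""
    corr = df.select_dtypes(include="number").corr()
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = corr.loc[a, b]
            if abs(r) >= threshold:
                pairs.append((a, b, float(r)))
    return pairs


# Toy data, invented for this illustration only.
df = pd.DataFrame({
    "sales": [100.0, 110.0, 120.0, 130.0, 140.0, 150.0],
    "marketing_spend": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
    "order_quantity": [5, 6, 5, 6, 5, 60],
})

print(f"Sales changed by {percent_change(df['sales']):.1f}%")
print(f"Outliers in order_quantity: {iqr_outliers(df['order_quantity']).tolist()}")
print(strong_correlations(df))
```

Each helper returns plain values, so the results can be passed to a narrative step without depending on any particular report format.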

3. Natural Language Output

Business-friendly narratives
Context-aware descriptions
Action-oriented recommendations
Multiple detail levels

4. Flexible Export

from datastory import DataStory

story = DataStory()
story.load("data.csv")

# Export to different formats
story.export("report.txt", format="text")
story.export("report.md", format="markdown")
# Charts and PDF need the optional dependencies from datastory[full]
story.export("report.html", format="html", include_charts=True)
story.export("report.pdf", format="pdf")

5. Multiple Data Sources

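import sqlite3

import pandas as pd
from datastory import DataStory

story = DataStory()

# Local files: CSV, Excel, JSON, Parquet
story.load("sales.csv")
story.load("data.xlsx")
story.load("records.json")
story.load("dataset.parquet")

# URLs
story.load("https://example.com/data.csv")

# Pandas DataFrames
conn = sqlite3.connect("sales.db")  # placeholder database; any DB-API connection works
df = pd.read_sql("SELECT * FROM sales", conn)
story.load(df)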

📖 Advanced Usage

Customization

from datastory import DataStory

# Configure narrative style
config = {
    "style": "business",  # business, casual, technical
    "detail_level": "detailed",  # brief, medium, detailed
    "include_recommendations": True
}

story = DataStory(config=config)
story.load("sales.csv")
narrative = story.generate_narrative()
print(narrative)

Programmatic Access

from datastory import DataStory

# Access insights directly
story = DataStory()
story.load("data.csv")

insights = story.extract_insights()
for insight in insights:
    print(f"{insight.type}: {insight.title}")
    print(f"Priority: {insight.priority}")
    print(f"Description: {insight.description}\n")

Analysis Results

from datastory import DataStory

# Get raw analysis results
story = DataStory()
story.load("data.csv")

results = story.analyze()
print(results["trends"])
print(results["anomalies"])
print(results["correlations"])

🎓 Use Cases

1. Business Intelligence

Generate executive summaries from sales, marketing, or financial data.

2. Data Science Reports

Automatically document exploratory data analysis (EDA) findings.

3. Automated Monitoring

Create daily/weekly reports on KPIs and metrics.

4. Client Reporting

Transform raw analytics into client-ready narratives.

5. Academic Research

Quickly summarize dataset characteristics and patterns.
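
For the automated monitoring use case, a scheduled job could save each narrative under a dated file name. The sketch below is one possible wrapper, not part of DataStory: `write_dated_report` and its parameters are hypothetical, and the narrative function is passed in so that `datastory.narrate` (or any other callable returning text) can be supplied by the caller.

```python
from datetime import date
from pathlib import Path
from typing import Callable


def write_dated_report(
    source: str,
    out_dir: str,
    narrate_fn: Callable[[str], str],
    today: date,
) -> Path:
    """Generate a narrative for `source` and save it under a dated file name."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    target = out / f"report-{today.isoformat()}.txt"
    # str() guards against narrate_fn returning a report object rather than text.
    target.write_text(str(narrate_fn(source)), encoding="utf-8")
    return target


# Intended usage (assumes datastory is installed and sales.csv exists):
#   from datastory import narrate
#   write_dated_report("sales.csv", "reports", narrate, date.today())
```

Running this from cron or a CI schedule would produce one report file per day, which keeps a history of narratives for later comparison.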

🆚 Why DataStory?

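| Feature       | DataStory       | Traditional BI | LLM-based      |
|---------------|-----------------|----------------|----------------|
| Setup Time    | Instant         | Hours/Days     | API setup      |
| Cost          | Free            | $$$$           | $$$ per call   |
| Offline Use   | ✅ Yes          | ❌ No          | ❌ No          |
| Customizable  | ✅ Full control | ⚠️ Limited     | ❌ Black box   |
| Speed         | ⚡ Instant      | 🐌 Slow        | ⏳ API delays  |
| Privacy       | 🔒 Local        | ⚠️ Cloud       | ❌ Sent to API |
| Deterministic | ✅ Yes          | ✅ Yes         | ❌ No          |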

📊 Example Datasets

The examples/ directory includes sample datasets:

sales.csv - Sales performance data
customer_churn.csv - Customer retention data
inventory.csv - Stock levels and products

🛠️ Technical Details

Architecture

Core Analyzer: Statistical analysis using pandas/numpy
Insight Extractor: Pattern recognition and business logic
Narrative Generator: Template-based natural language generation
Data Loaders: Multi-format support (CSV, Excel, JSON, Parquet)
Report Formatters: Export to text, markdown, HTML, PDF
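
To make the "template-based natural language generation" step concrete, here is a minimal sketch of how an insight record could be turned into a sentence and ranked by priority. It is an illustration under assumptions, not DataStory's code: the `Insight` dataclass only mirrors the attribute names used in the Programmatic Access example, and the templates, the 10% priority threshold, and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Insight:
    """Hypothetical insight record with the fields used in the README examples."""
    type: str
    title: str
    priority: str
    description: str


# Hypothetical sentence templates keyed by insight type.
TEMPLATES = {
    "trend": "{metric} {direction} by {pct:.1f}% from {start:,.0f} to {end:,.0f}.",
    "anomaly": "Found {count} outliers in {metric} ({share:.1f}% of data).",
}


def describe_trend(metric: str, start: float, end: float) -> Insight:
    """Fill the trend template from a start and an end value."""
    pct = (end - start) / start * 100.0
    direction = "increased" if pct >= 0 else "decreased"
    text = TEMPLATES["trend"].format(
        metric=metric, direction=direction, pct=abs(pct), start=start, end=end
    )
    # Assumed rule: changes of 10% or more count as high priority.
    priority = "high" if abs(pct) >= 10 else "medium"
    return Insight("trend", f"{metric} {direction}", priority, text)


def describe_outliers(metric: str, count: int, total: int) -> Insight:
    """Fill the anomaly template from an outlier count and the row total."""
    share = count / total * 100.0
    text = TEMPLATES["anomaly"].format(metric=metric, count=count, share=share)
    return Insight("anomaly", f"Unusual values in {metric}", "medium", text)


def render(insights: list) -> str:
    """Number the insight descriptions, highest priority first."""
    order = {"high": 0, "medium": 1, "low": 2}
    ranked = sorted(insights, key=lambda i: order[i.priority])
    return "\n".join(f"{n}. {i.description}" for n, i in enumerate(ranked, 1))


print(render([
    describe_outliers("Order Quantity", 23, 1247),
    describe_trend("Sales", 200.0, 250.0),
]))
```

Because every sentence comes from a fixed template and computed numbers, the same input always yields the same text, which is what makes this style of generation deterministic.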

Dependencies

Core: pandas, numpy
Optional: matplotlib (charts), openpyxl (Excel), reportlab (PDF)
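
Before relying on charts, Excel input, or PDF export, it can help to check which optional packages are importable in the current environment. The sketch below uses only the standard library; `OPTIONAL_DEPS` and `available_features` are hypothetical names for this illustration and are not part of DataStory.

```python
from importlib.util import find_spec

# Optional packages and the feature each one enables, as listed above.
OPTIONAL_DEPS = {
    "matplotlib": "charts",
    "openpyxl": "Excel",
    "reportlab": "PDF",
}


def available_features() -> dict:
    """Map each optional feature to True if its package can be imported."""
    return {
        feature: find_spec(package) is not None
        for package, feature in OPTIONAL_DEPS.items()
    }


for feature, ok in available_features().items():
    status = "available" if ok else 'missing (try: pip install "datastory[full]")'
    print(f"{feature}: {status}")
```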

Performance

Analyzes 100K rows in <2 seconds
Generates narrative in <1 second
Low memory footprint

🤝 Contributing

Contributions are welcome! Please:

Fork the repository
Create a feature branch
Submit a pull request

📝 License

MIT License - see LICENSE file for details.

🙏 Acknowledgments

Built with ❤️ by Idriss Bado

Inspired by the need for better data communication in business.

📧 Contact

GitHub: @idrissbado
PyPI: datastory

⭐ Star this repo if you find it useful!

🐛 Found a bug? Open an issue

💡 Have an idea? Start a discussion
