DataGuy is a Python package that simplifies data science workflows using Large Language Models (LLMs). It provides tools for automated data wrangling, intelligent analysis, and AI-assisted visualization, and is aimed at small-to-medium datasets.
- GitHub: View the source code on GitHub
- PyPI: Install from PyPI
- Documentation: Read the full documentation
- Demo: Try the demo
- Automated Data Wrangling: Clean and preprocess your data with minimal effort using LLM-generated code.
- AI-Powered Data Visualization: Generate insightful plots and visualizations based on natural language descriptions.
- Intelligent Data Analysis: Perform descriptive and inferential analysis with the help of LLMs.
- Customizable Workflows: Integrate with pandas, matplotlib, and other Python libraries for seamless data manipulation.
- Safe Code Execution: Built-in safeguards to ensure only safe and trusted code is executed.
Install the package using pip:
pip install dataguy
- Load your Anthropic API key into your environment:
    import os
    os.environ["ANTHROPIC_API_KEY"] = "your_api_key_here"
  Replace your_api_key_here with your actual API key from Anthropic.
- Import the Package:
    from dataguy import DataGuy
- Initialize a DataGuy Instance:
    dg = DataGuy()
- Load Your Data:
    import pandas as pd
    data = pd.DataFrame({"age": [25, 30, None], "score": [88, 92, 75]})
    dg.set_data(data)
- Summarize Your Data:
    summary = dg.summarize_data()
    print(summary)
- Wrangle Your Data:
    cleaned_data = dg.wrangle_data()
- Visualize Your Data:
    dg.plot_data("age", "score")
- Analyze Your Data:
    results = dg.analyze_data()
    print(results)
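The API key step above is easy to get wrong, for example by leaving the placeholder in place. The sketch below shows one way to check the environment variable before creating a DataGuy instance. The helper name `require_api_key` is hypothetical and not part of DataGuy; only the `ANTHROPIC_API_KEY` variable name comes from the steps above.

```python
import os


def require_api_key(name="ANTHROPIC_API_KEY"):
    """Return the key stored in the environment variable `name`, or raise.

    Hypothetical helper for illustration; it is not part of the DataGuy API.
    """
    key = os.environ.get(name, "").strip()
    # A missing variable, an empty string, and the README placeholder
    # are all treated as "not configured".
    if not key or key == "your_api_key_here":
        raise RuntimeError(
            f"{name} is not set; export it before creating a DataGuy instance."
        )
    return key
```

Calling `require_api_key()` once at the top of a script turns a missing key into an immediate, readable error. Setting the key in the shell rather than in source code also keeps it out of version control.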
from dataguy import DataGuy
import pandas as pd
# Initialize DataGuy
dg = DataGuy()
# Load data
data = pd.read_csv("path/to/data.csv")
dg.set_data(data)
# Summarize data
summary = dg.summarize_data()
print("Data Summary:", summary)
# Wrangle data
cleaned_data = dg.wrangle_data()
# Visualize data
dg.plot_data("column_x", "column_y")
# Analyze data
analysis_results = dg.analyze_data()
print("Analysis Results:", analysis_results)
- set_data(obj): Load data into the DataGuy instance. Supports pandas DataFrames, dictionaries, lists, numpy arrays, and CSV files.
- summarize_data(): Generate a summary of the dataset, including shape, columns, missing values, and means.
- wrangle_data(): Automatically clean and preprocess the dataset for analysis.
- plot_data(column_x, column_y): Create a scatter plot of two columns using matplotlib.
- analyze_data(): Perform an automated analysis of the dataset, returning descriptive statistics and insights.
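To make the summary fields listed above concrete, here is a plain-pandas sketch that computes the same four items (shape, columns, missing values, means) for the sample data from the quick-start steps. It is an illustration only: `basic_summary` is a hypothetical function, and the actual structure returned by `summarize_data()` may differ.

```python
import pandas as pd


def basic_summary(df):
    """Collect shape, column names, missing-value counts, and numeric means.

    Hypothetical stand-in written with pandas only; not DataGuy's implementation.
    """
    return {
        "shape": df.shape,
        "columns": list(df.columns),
        # Number of missing (NaN/None) entries per column.
        "missing": df.isna().sum().to_dict(),
        # Column means, skipping missing values and non-numeric columns.
        "means": df.mean(numeric_only=True).to_dict(),
    }


data = pd.DataFrame({"age": [25, 30, None], "score": [88, 92, 75]})
summary = basic_summary(data)
print(summary)
```

For this sample the `age` column has one missing value, so its mean is taken over the two remaining entries.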
- Python 3.8 or higher
- Dependencies:
  - pandas
  - numpy
  - matplotlib
  - scikit-learn
  - claudette
  - anthropic
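The requirements above can be checked from Python before running DataGuy. This is a small sketch using only the standard library; the function names are hypothetical, and the module list simply mirrors the dependencies named above (note that scikit-learn is imported as `sklearn`).

```python
import sys
from importlib.util import find_spec

# Import names for the listed dependencies; scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["pandas", "numpy", "matplotlib", "sklearn", "claudette", "anthropic"]


def python_ok(minimum=(3, 8)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of `names` that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]
```

An empty list from `missing_modules()` together with `python_ok()` returning `True` suggests the environment meets the stated requirements. Installing with pip normally pulls in the dependencies automatically.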
Contributions are welcome! Please submit issues or pull requests via the GitHub repository.
This project is licensed under the MIT License. See the LICENSE file for details.
- István Magyary
- Sára Viemann
- Kristóf Bálint
For inquiries, contact: magistak@gmail.com