# Demonstrating the `DataGuy` Package

This notebook walks through the main features of the `DataGuy` package: loading, summarizing, wrangling, visualizing, and analyzing data. It also looks at two mechanisms that run in the background: LLM-based code generation and safety checks on generated code before it is executed.

In [None]:
# Install dependencies into the running kernel's environment (%pip, unlike !pip, targets the active kernel)
%pip install pandas numpy matplotlib scikit-learn claudette anthropic
%pip install dataguy

## Step 1: Importing the Package

We start by importing the `DataGuy` class from the `dataguy` package, along with pandas, NumPy, and matplotlib.

In [None]:
from dataguy import DataGuy
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

## Step 2: Initializing `DataGuy`

We create a `DataGuy` instance and set `max_code_history=50` to show that its behavior can be configured at construction time.

In [None]:
# Initialize DataGuy
data_guy = DataGuy(max_code_history=50)
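
The notebook does not spell out what `max_code_history` controls. Assuming it caps how many generated code snippets an instance keeps, a bounded history could be sketched as follows; `CodeHistory` is a hypothetical name and is not part of `DataGuy`.

```python
from collections import deque


class CodeHistory:
    """Hypothetical sketch: assumes max_code_history caps stored code snippets.

    This is an illustration, not DataGuy's actual implementation.
    """

    def __init__(self, max_code_history: int):
        # A deque with maxlen drops the oldest entry automatically when full.
        self._snippets = deque(maxlen=max_code_history)

    def add(self, code: str) -> None:
        self._snippets.append(code)

    def recent(self) -> list:
        # Oldest-to-newest copy of the retained snippets.
        return list(self._snippets)


history = CodeHistory(max_code_history=3)
for i in range(5):
    history.add(f"step_{i}()")
print(history.recent())  # ['step_2()', 'step_3()', 'step_4()']
```

Using `deque(maxlen=...)` keeps memory bounded without any manual trimming logic.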

## Step 3: Loading Data

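`DataGuy` accepts several input formats, including pandas DataFrames, dictionaries, lists, NumPy arrays, and CSV files. Here, we load a small DataFrame with a few missing values so that the later summarization and cleaning steps have something to report and fix.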

In [None]:
# Create a sample dataset
data = pd.DataFrame({
    "age": [25, 30, None, 45, 50, None],
    "score": [88, 92, 75, None, 85, 90]
})

# Load data into DataGuy
data_guy.set_data(data)
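
To illustrate how several input formats can be normalized into a single DataFrame, here is a minimal sketch using only pandas. `to_dataframe` is a hypothetical helper, not `DataGuy`'s loader, and treating strings, paths, and file objects as CSV sources is an assumption.

```python
import io
import os

import numpy as np
import pandas as pd


def to_dataframe(data) -> pd.DataFrame:
    """Hypothetical helper: convert common input types into a DataFrame."""
    if isinstance(data, pd.DataFrame):
        return data.copy()
    if isinstance(data, (str, os.PathLike, io.IOBase)):
        # Assumption: strings, paths, and file-like objects are CSV sources.
        return pd.read_csv(data)
    if isinstance(data, (dict, list, np.ndarray)):
        # pandas builds frames from dicts of columns, lists of records, and 2-D arrays.
        return pd.DataFrame(data)
    raise TypeError(f"Unsupported data type: {type(data).__name__}")


frames = [
    to_dataframe({"age": [25, 30], "score": [88, 92]}),
    to_dataframe([{"age": 25, "score": 88}, {"age": 30, "score": 92}]),
    to_dataframe(np.array([[25, 88], [30, 92]])),
    to_dataframe(io.StringIO("age,score\n25,88\n30,92\n")),
]
for frame in frames:
    print(frame.shape)  # (2, 2) for each input
```

Note that the NumPy array produces integer column labels (`0`, `1`), since a bare array carries no column names.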

## Step 4: Summarizing Data

`summarize_data()` returns an overview of the dataset: its shape, column names, missing-value counts, and column means.

In [None]:
# Summarize the data
summary = data_guy.summarize_data()
print("Data Summary:", summary)
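
For comparison, the same kinds of fields can be computed directly with pandas. This `summarize` function is a hypothetical sketch; the structure of the result that `summarize_data()` actually returns may differ.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Hypothetical sketch of a summary with shape, columns, missing values, and means."""
    return {
        "shape": df.shape,
        "columns": list(df.columns),
        # Count of NaN/None values per column.
        "missing": df.isna().sum().to_dict(),
        # numeric_only skips non-numeric columns; NaNs are ignored in the mean.
        "means": df.mean(numeric_only=True).to_dict(),
    }


sample = pd.DataFrame({
    "age": [25, 30, None, 45, 50, None],
    "score": [88, 92, 75, None, 85, 90],
})
summary_sketch = summarize(sample)
print(summary_sketch)  # means: age 37.5, score 86.0
```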

## Step 5: Wrangling Data

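`wrangle_data()` asks an LLM to generate code that cleans and preprocesses the dataset, runs that code, and returns the cleaned data. Because the code is generated by a model, the exact cleaning steps (for example, how missing values are filled) can differ from run to run.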

In [None]:
# Wrangle the data
cleaned_data = data_guy.wrangle_data()
print("Cleaned Data:")
print(cleaned_data)
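
The code the LLM produces is not shown in this notebook. As an illustration only, a typical cleaning step for this dataset is median imputation of the numeric columns; `fill_missing_with_median` is a hypothetical function, not output captured from `DataGuy`.

```python
import pandas as pd


def fill_missing_with_median(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative cleaning step: replace NaNs in numeric columns with each column's median."""
    cleaned = df.copy()  # leave the caller's DataFrame unchanged
    numeric_cols = cleaned.select_dtypes(include="number").columns
    # fillna with a Series fills each column using the value under its label.
    cleaned[numeric_cols] = cleaned[numeric_cols].fillna(cleaned[numeric_cols].median())
    return cleaned


sample = pd.DataFrame({
    "age": [25, 30, None, 45, 50, None],
    "score": [88, 92, 75, None, 85, 90],
})
cleaned_sketch = fill_missing_with_median(sample)
print(cleaned_sketch)  # missing ages become 37.5, the missing score becomes 88.0
```

The median is a common default here because it is less sensitive to outliers than the mean.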

## Step 6: Visualizing Data

`DataGuy` can generate visualizations using matplotlib. Here, we create a scatter plot of two columns.

In [None]:
# Plot the data
data_guy.plot_data("age", "score")
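
A plain-matplotlib equivalent of a two-column scatter plot is sketched below, assuming rows with a missing value in either column are dropped. `scatter_columns` is hypothetical; `plot_data` may style or filter the data differently.

```python
import matplotlib.pyplot as plt
import pandas as pd


def scatter_columns(df: pd.DataFrame, x: str, y: str):
    """Hypothetical sketch: scatter plot of column y against column x."""
    # Keep only rows where both values are present.
    subset = df[[x, y]].dropna()
    fig, ax = plt.subplots()
    ax.scatter(subset[x], subset[y])
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.set_title(f"{y} vs. {x}")
    return fig, ax


sample = pd.DataFrame({
    "age": [25, 30, None, 45, 50, None],
    "score": [88, 92, 75, None, 85, 90],
})
fig, ax = scatter_columns(sample, "age", "score")
# In a notebook the figure renders automatically; in a script, call plt.show().
```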

## Step 7: Analyzing Data

`DataGuy` performs automated analysis of the dataset, returning descriptive statistics and insights.

In [None]:
# Analyze the data
analysis_results = data_guy.analyze_data()
print("Analysis Results:", analysis_results)
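
As a rough point of reference, descriptive statistics and pairwise correlations can be computed with pandas alone. `describe_and_correlate` is a hypothetical sketch; which statistics `analyze_data()` actually returns is not documented here.

```python
import pandas as pd


def describe_and_correlate(df: pd.DataFrame) -> dict:
    """Hypothetical sketch of automated descriptive analysis."""
    numeric = df.select_dtypes(include="number")
    return {
        # count, mean, std, min, quartiles, and max for each numeric column
        "describe": numeric.describe(),
        # Pearson correlations, each computed over rows where both columns are present
        "correlation": numeric.corr(),
    }


sample = pd.DataFrame({
    "age": [25, 30, None, 45, 50, None],
    "score": [88, 92, 75, None, 85, 90],
})
results = describe_and_correlate(sample)
print(results["describe"])
print(results["correlation"])
```

With only three rows where both `age` and `score` are present, any correlation from this sample should be read as illustrative rather than meaningful.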

## Step 8: Exploring Background Functionalities

### Safe Code Execution
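`DataGuy` checks generated code before executing it by parsing the code into an Abstract Syntax Tree (AST) and inspecting the tree for unsafe operations, such as the shell command in the example below. Static checks like this lower the risk of running model-generated code, but they do not amount to a full sandbox.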
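
The general AST technique can be sketched with Python's standard `ast` module. The blocked modules and calls below are illustrative assumptions, not `DataGuy`'s actual rules, and a denylist like this can be bypassed, so it complements rather than replaces sandboxing.

```python
import ast

# Illustrative denylists (assumptions, not DataGuy's real policy).
BLOCKED_MODULES = {"os", "sys", "subprocess", "shutil", "socket"}
BLOCKED_CALLS = {"eval", "exec", "compile", "open", "__import__"}


def is_safe_code(code: str) -> bool:
    """Return False if the code imports a blocked module or calls a blocked builtin."""
    try:
        tree = ast.parse(code)
    except SyntaxError:
        return False  # unparseable code is rejected outright
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] in BLOCKED_MODULES for alias in node.names):
                return False
        elif isinstance(node, ast.ImportFrom):
            if node.module and node.module.split(".")[0] in BLOCKED_MODULES:
                return False
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
            if node.func.id in BLOCKED_CALLS:
                return False
        elif isinstance(node, ast.Attribute) and node.attr.startswith("__"):
            # Blocks dunder access such as ().__class__.__bases__
            return False
    return True


print(is_safe_code("import os; os.system('rm -rf /')"))  # False
print(is_safe_code("df['age'].fillna(df['age'].median())"))  # True
```

Because the check only parses the string, the dangerous command in the first example is never run.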

### LLM Integration
The package uses large language models (LLMs) to generate code for tasks such as data wrangling and visualization. If the generated code fails, `DataGuy` asks the model for a corrected version and retries.
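
A generate, execute, and retry loop of this kind can be sketched without a real LLM by passing in any function that returns code. `run_with_retries` and `fake_llm` are hypothetical names; in `DataGuy` the generator would be a model call, and the code would be safety-checked before it is executed.

```python
from typing import Callable, Optional


def run_with_retries(
    generate: Callable[[str, Optional[str]], str],
    task: str,
    namespace: dict,
    max_attempts: int = 3,
) -> dict:
    """Hypothetical loop: generate code, run it, and feed errors back on failure."""
    error = None
    for attempt in range(1, max_attempts + 1):
        code = generate(task, error)
        try:
            # Caution: exec runs arbitrary code; check it for safety first.
            exec(code, namespace)
            return namespace
        except Exception as exc:
            # Pass the failure back so the next generation can correct it.
            error = f"Attempt {attempt} failed: {type(exc).__name__}: {exc}"
    raise RuntimeError(f"All {max_attempts} attempts failed. Last error: {error}")


def fake_llm(task: str, error: Optional[str]) -> str:
    # Stand-in for a model: the first answer has a bug, the retry fixes it.
    return "result = undefined_name + 1" if error is None else "result = 41 + 1"


ns = run_with_retries(fake_llm, "compute 42", {})
print(ns["result"])  # 42
```

Capping `max_attempts` keeps a persistently failing task from looping forever.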

In [None]:
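# Example: safety check on generated code.
# The string below is only passed to the checker; nothing here executes it.
# Note: _is_safe_code is a private helper (leading underscore) and may change between versions.
try:
    unsafe_code = "import os; os.system('rm -rf /')"
    is_safe = data_guy._is_safe_code(unsafe_code)
    print("Is the code safe?", is_safe)
except Exception as e:
    print("Error during safety check:", e)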

## Conclusion

This notebook walked through the main `DataGuy` workflow (loading, summarizing, wrangling, visualizing, and analyzing data) and the mechanisms behind it: LLM-generated code with automatic retries, and AST-based safety checks before that code runs.