# Feature Analysis and EDA Framework Tutorial

This notebook demonstrates how to use the Feature Analysis framework for different types of data:
1. Tabular Data (Pandas DataFrame)
2. Image Data
3. Geospatial Data
4. Text Data

First, let's import the required libraries and set up the framework.

In [None]:
import os
import json

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from PIL import Image
import geopandas as gpd
from shapely.geometry import Point
import folium
from IPython.display import display, HTML

# Paste the FeatureAnalyzer class definition here before running the
# remaining cells; this notebook assumes it is already defined.
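The `FeatureAnalyzer` source isn't reproduced in this notebook. As a rough, hypothetical illustration of how a constructor like `FeatureAnalyzer(path, data_type)` might route each data type to a loader, here is a small registry sketch. The names `load_data` and `LOADERS`, and the `'text'` type string, are assumptions and not part of the framework; `'pandas'`, `'image'`, and `'geojson'` match the type strings used in the cells below.

```python
from pathlib import Path

import pandas as pd


def _load_pandas(path):
    # Tabular data: read a CSV into a DataFrame
    return pd.read_csv(path)


def _load_image(path):
    # Image data: imported lazily so Pillow is only needed for images
    import numpy as np
    from PIL import Image
    with Image.open(path) as img:
        return np.asarray(img.convert("RGB"))


def _load_geojson(path):
    # Geospatial data: imported lazily so geopandas is only needed here
    import geopandas as gpd
    return gpd.read_file(path)


def _load_text(path):
    # Text data: read the whole file as one string
    return Path(path).read_text(encoding="utf-8")


# Hypothetical mapping from data_type string to loader function
LOADERS = {
    "pandas": _load_pandas,
    "image": _load_image,
    "geojson": _load_geojson,
    "text": _load_text,
}


def load_data(path, data_type):
    """Load `path` with the loader registered for `data_type` (hypothetical helper)."""
    try:
        loader = LOADERS[data_type]
    except KeyError:
        raise ValueError(
            f"Unsupported data_type {data_type!r}; expected one of {sorted(LOADERS)}"
        ) from None
    return loader(path)
```

A dictionary of loaders keeps the dispatch in one place, so supporting a new format means registering one function rather than adding another `if` branch.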

## 1. Analyzing Tabular Data

Let's start with a synthetic dataset of 1,000 rows (one numeric feature, one categorical feature, and a binary target) to demonstrate the basic functionality.

In [None]:
# Create sample dataset
np.random.seed(42)
df = pd.DataFrame({
    'numeric_feature': np.random.normal(0, 1, 1000),
    'categorical_feature': np.random.choice(['A', 'B', 'C'], 1000),
    'target': np.random.randint(0, 2, 1000)
})

# Save temporary CSV
df.to_csv('temp_data.csv', index=False)

# Initialize analyzer
tabular_analyzer = FeatureAnalyzer('temp_data.csv', 'pandas')

In [None]:
# Perform data quality analysis
quality_metrics = tabular_analyzer.analyze_data_quality()
print("Data Quality Metrics:")
display(pd.DataFrame(quality_metrics))
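The exact metrics returned by `analyze_data_quality()` depend on the framework. For comparison, a minimal pandas sketch of common per-column quality checks (dtype, missing values, cardinality, constant columns) could look like this; `column_quality` is a hypothetical helper, not part of `FeatureAnalyzer`.

```python
import pandas as pd


def column_quality(df):
    """Per-column data-quality summary (hypothetical sketch, not FeatureAnalyzer's output)."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),                 # count of missing values
        "missing_pct": df.isna().mean() * 100,      # share of missing values, in percent
        "unique": df.nunique(dropna=True),          # distinct non-missing values
        # A column with at most one distinct value carries no information for modelling
        "is_constant": df.nunique(dropna=False) <= 1,
    })
```

Calling `column_quality(df)` on the sample dataset above should report zero missing values, since it was generated without gaps.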

In [None]:
# Perform univariate analysis
univariate_stats = tabular_analyzer.perform_univariate_analysis()
print("Univariate Statistics:")
display(pd.DataFrame(univariate_stats))
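Univariate analysis summarizes each column on its own. A hypothetical pandas sketch, assuming numeric columns get quantiles and moments while all other columns get level frequencies, might look like this; `univariate_summary` is an illustrative name only.

```python
import pandas as pd


def univariate_summary(df):
    """Hypothetical univariate summary: moments for numeric columns, frequencies for the rest."""
    numeric = df.select_dtypes(include="number")
    stats = numeric.describe().T            # count, mean, std, min, quartiles, max
    stats["skew"] = numeric.skew()
    stats["kurtosis"] = numeric.kurt()      # excess kurtosis (normal distribution -> 0)
    categorical = df.select_dtypes(exclude="number")
    # Relative frequency of each level, one Series per non-numeric column
    freqs = {col: categorical[col].value_counts(normalize=True) for col in categorical}
    return stats, freqs
```

For `numeric_feature`, drawn from a standard normal distribution, skew and excess kurtosis should both be close to zero.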

In [None]:
# Perform bivariate analysis
bivariate_stats = tabular_analyzer.perform_bivariate_analysis(target_column='target')
print("Bivariate Statistics:")
display(pd.DataFrame(bivariate_stats))
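For a binary target like `target` above, two simple bivariate measures are the Pearson correlation of each numeric feature with the 0/1 target (which equals the point-biserial correlation) and the positive-target rate per category. `bivariate_summary` is a hypothetical sketch of that idea; the framework's own method may compute something different.

```python
import pandas as pd


def bivariate_summary(df, target):
    """Hypothetical bivariate sketch for a binary 0/1 target column."""
    y = df[target]
    features = df.drop(columns=[target])
    # Pearson r against a 0/1 target is the point-biserial correlation
    correlations = features.select_dtypes(include="number").corrwith(y)
    categorical = features.select_dtypes(exclude="number")
    # Share of positive targets within each category level
    target_rates = {col: y.groupby(df[col]).mean() for col in categorical}
    return correlations, target_rates
```

Because `target` in the sample data is drawn independently of the features, expect a correlation near zero and roughly equal rates for `A`, `B`, and `C`.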

## 2. Analyzing Image Data

Now let's analyze some image data. We'll create a simple test image first.

In [None]:
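# Create a sample 100x100 RGB image of random noise
# (randint's upper bound is exclusive, so 256 covers the full 0-255 range)
img = Image.fromarray(np.random.randint(0, 256, (100, 100, 3), dtype=np.uint8))
# Note: JPEG is lossy, so the saved pixels differ slightly from the array
img.save('temp_image.jpg')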

# Initialize analyzer
image_analyzer = FeatureAnalyzer('temp_image.jpg', 'image')

In [None]:
# Analyze image quality
image_quality = image_analyzer.analyze_data_quality()
print("Image Quality Metrics:")
display(pd.DataFrame([image_quality]))
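The image metrics that `analyze_data_quality()` reports aren't listed here. As an illustrative assumption, simple quality indicators can be computed directly from a NumPy array: size, mean brightness, RMS contrast, and the fraction of clipped pixels. The helper `image_quality` is hypothetical and uses a plain channel average rather than a weighted luminance formula.

```python
import numpy as np


def image_quality(arr):
    """Hypothetical quality metrics for an HxW or HxWxC uint8 image array."""
    arr = np.asarray(arr)
    # Collapse color channels to one intensity value per pixel (simple average)
    gray = arr.mean(axis=2) if arr.ndim == 3 else arr.astype(float)
    return {
        "height": arr.shape[0],
        "width": arr.shape[1],
        "channels": arr.shape[2] if arr.ndim == 3 else 1,
        "brightness": float(gray.mean()),   # mean intensity on a 0-255 scale
        "contrast": float(gray.std()),      # RMS contrast: std of intensities
        # Share of fully black or white pixels hints at under/over-exposure
        "clipped_frac": float(np.mean((gray <= 0) | (gray >= 255))),
    }
```

`image_quality(np.asarray(Image.open('temp_image.jpg')))` applies it to the sample image; for uniform random noise, brightness should land near the middle of the 0-255 range.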

## 3. Analyzing Geospatial Data

Let's create a simple GeoJSON file and analyze it.

In [None]:
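# Create sample geospatial data; shapely expects (longitude, latitude) order
points = [
    Point(-73.935242, 40.730610),   # New York
    Point(-122.419416, 37.774929),  # San Francisco
]
# Declare WGS84 lon/lat coordinates so the CRS is explicit
gdf = gpd.GeoDataFrame(geometry=points, crs='EPSG:4326')
gdf.to_file('temp_geo.geojson', driver='GeoJSON')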

# Initialize analyzer
geo_analyzer = FeatureAnalyzer('temp_geo.geojson', 'geojson')

In [None]:
# Analyze geospatial quality
geo_quality = geo_analyzer.analyze_data_quality()
print("Geospatial Quality Metrics:")
display(pd.DataFrame([geo_quality]))
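Before any spatial analysis, a basic sanity check is that coordinates fall within valid longitude and latitude ranges. The sketch below, `coordinate_quality`, is hypothetical and works on plain `(lon, lat)` tuples, assuming WGS84 degrees; it also counts exact duplicates and computes a bounding box.

```python
def coordinate_quality(coords):
    """Hypothetical checks for (lon, lat) pairs in WGS84 degrees."""
    coords = [tuple(c) for c in coords]
    # Keep only points inside the valid longitude/latitude ranges
    valid = [(lon, lat) for lon, lat in coords if -180 <= lon <= 180 and -90 <= lat <= 90]
    lons = [lon for lon, _ in valid]
    lats = [lat for _, lat in valid]
    return {
        "n_points": len(coords),
        "n_out_of_range": len(coords) - len(valid),
        "n_duplicates": len(coords) - len(set(coords)),
        # Bounding box of the valid points: (min_lon, min_lat, max_lon, max_lat)
        "bounds": (min(lons), min(lats), max(lons), max(lats)) if valid else None,
    }
```

With the GeoDataFrame above, `coordinate_quality([(p.x, p.y) for p in gdf.geometry])` reports two points, both in range.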

## 4. Additional Analysis Features

Let's use the tabular analyzer to generate visualizations and a JSON report.

In [None]:
import os

# Generate visualizations for tabular data
tabular_analyzer.generate_visualizations('output_viz')

# Display the first three visualizations (sorted for a stable order;
# non-image files are skipped)
image_files = sorted(
    f for f in os.listdir('output_viz') if f.lower().endswith(('.png', '.jpg', '.jpeg'))
)[:3]
plt.figure(figsize=(15, 5))
for i, img_path in enumerate(image_files):
    plt.subplot(1, 3, i + 1)
    img = plt.imread(os.path.join('output_viz', img_path))
    plt.imshow(img)
    plt.axis('off')
    plt.title(img_path)
plt.tight_layout()
plt.show()
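The format and content of the files that `generate_visualizations()` writes aren't documented here. One plausible approach, sketched with a hypothetical `save_histograms` helper, writes one PNG histogram per numeric column. It builds `matplotlib.figure.Figure` objects directly instead of going through `pyplot`, so figures are neither shown inline nor left open while looping.

```python
import os

import pandas as pd
from matplotlib.figure import Figure


def save_histograms(df, out_dir, bins=30):
    """Hypothetical helper: one PNG histogram per numeric column; returns the file paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for col in df.select_dtypes(include="number").columns:
        # Figure() bypasses pyplot, so nothing is displayed or kept in pyplot's registry
        fig = Figure(figsize=(5, 4))
        ax = fig.subplots()
        ax.hist(df[col].dropna(), bins=bins)
        ax.set_title(f"Distribution of {col}")
        ax.set_xlabel(col)
        ax.set_ylabel("count")
        path = os.path.join(out_dir, f"hist_{col}.png")
        fig.savefig(path, dpi=100)
        paths.append(path)
    return paths
```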

In [None]:
import json

# Generate comprehensive report
tabular_analyzer.generate_report('analysis_report.json')

# Display report contents
with open('analysis_report.json', 'r') as f:
    report = json.load(f)
print(json.dumps(report, indent=2))
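If a report contains NumPy values (for example `np.int64` counts produced by pandas), `json.dump` raises `TypeError`. A common workaround, shown here in a hypothetical `write_report` helper, is a `default` hook that converts NumPy scalars and arrays to built-in types. `generate_report()` may already handle this; the sketch is only illustrative.

```python
import json

import numpy as np


def to_builtin(obj):
    """json.dump `default` hook: convert NumPy scalars/arrays to plain Python types."""
    if isinstance(obj, np.integer):
        return int(obj)
    if isinstance(obj, np.floating):
        return float(obj)
    if isinstance(obj, np.bool_):
        return bool(obj)
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


def write_report(report, path):
    """Write `report` as indented JSON, converting NumPy types on the fly."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(report, f, indent=2, default=to_builtin)
```

By default `json.dump` writes `NaN` for missing floats, which is not valid standard JSON; passing `allow_nan=False` makes it raise `ValueError` instead.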

## 5. Cleanup

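Remove the temporary files and the visualization directory created during the tutorial.

In [None]:
# Cleanup temporary files
import os
import shutil

files_to_remove = ['temp_data.csv', 'temp_image.jpg', 'temp_geo.geojson', 'analysis_report.json']
for file in files_to_remove:
    if os.path.exists(file):
        os.remove(file)

# Remove the generated visualizations as well
shutil.rmtree('output_viz', ignore_errors=True)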
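An alternative to listing scratch files by hand is to write them into a `tempfile.TemporaryDirectory`, which is deleted together with its contents when the `with` block exits, even if a cell raises an error partway through. The trade-off is that every analyzer call has to run inside the block. The file name below is reused from this tutorial purely for illustration.

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as workdir:
    csv_path = os.path.join(workdir, "temp_data.csv")
    pd.DataFrame({"x": [1, 2, 3]}).to_csv(csv_path, index=False)
    # ... run the analyzers on csv_path here ...
    print(os.path.exists(csv_path))  # True: the file exists inside the block

print(os.path.exists(csv_path))  # False: the directory and its contents are gone
```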

## Conclusion

This notebook demonstrated the key features of our Feature Analysis framework:

1. Handling multiple data types
2. Comprehensive quality analysis
3. Statistical analysis
4. Visualization generation
5. Report generation

You can extend this framework by:
- Adding more specialized metrics for each data type
- Implementing additional visualization types
- Adding export capabilities for different formats
- Implementing more advanced statistical analyses
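One way to support the first extension idea, adding specialized metrics per data type, is a decorator-based registry, so new metrics can be added without editing the analyzer class. `register_metric`, `compute_metrics`, and `METRICS` are hypothetical names for this sketch, not part of the framework.

```python
from collections import defaultdict

import pandas as pd

# Hypothetical registry: data type -> {metric name: function(data) -> value}
METRICS = defaultdict(dict)


def register_metric(data_type, name):
    """Decorator that adds a metric function to the registry for `data_type`."""
    def decorator(func):
        METRICS[data_type][name] = func
        return func
    return decorator


def compute_metrics(data, data_type):
    """Run every metric registered for `data_type` and collect the results."""
    return {name: func(data) for name, func in METRICS[data_type].items()}


@register_metric("pandas", "duplicate_rows")
def duplicate_rows(df):
    return int(df.duplicated().sum())


@register_metric("pandas", "n_rows")
def n_rows(df):
    return len(df)
```

A data type with no registered metrics simply yields an empty dictionary, so new data types can be introduced before any metrics exist for them.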