Title: Grouping & Aggregating Data using Pandas<br>
Objective: Learn how to group data and perform aggregations on these groups.

Task 1: Grouping by a Single Column<br>

Task: Group the dataset by 'region' and calculate total sales per region.<br>
Steps:<br>
10. Load the dataset.<br>
11. Use groupby('region') on the DataFrame.<br>
12. Apply .sum() to the 'sales' column.

In [1]:
import pandas as pd
import os

# Step 10: Load the dataset
# Collect the CSV files in the current directory once (sorted for a stable order)
csv_files = sorted(f for f in os.listdir() if f.endswith('.csv'))
print("Available files:")
for file in csv_files:
    print(f"- {file}")

# Use the first CSV file found, or set filename manually to pick another one
if csv_files:
    filename = csv_files[0]
    print(f"\nUsing file: {filename}")
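    df = pd.read_csv(filename)

    # Fail early with a clear message if the expected columns are absent
    missing = {'region', 'sales'} - set(df.columns)
    if missing:
        raise KeyError(f"{filename} is missing required columns: {sorted(missing)}")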
    
    # Step 11: Use groupby('region') on the DataFrame
    # Step 12: Apply .sum() to the 'sales' column
    region_sales = df.groupby('region')['sales'].sum()
    
    print("\nTotal sales per region:")
    print(region_sales)
else:
    print("\nNo CSV files found in the current directory.")
    print("Please place a CSV file with 'region' and 'sales' columns in the current directory.")

Available files:

No CSV files found in the current directory.
Please place a CSV file with 'region' and 'sales' columns in the current directory.
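
Because no CSV file was available for the run above, the cell printed no grouped result. As a minimal sketch of what Steps 11 and 12 produce, the example below uses a small in-memory DataFrame instead of a file. The region names and sales figures are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data standing in for the missing CSV file
df = pd.DataFrame({
    "region": ["East", "West", "East", "North", "West"],
    "sales": [100, 200, 150, 50, 300],
})

# Group rows by 'region', then sum the 'sales' values within each group
region_sales = df.groupby("region")["sales"].sum()

print("Total sales per region:")
print(region_sales)
```

The result is a Series indexed by region, with one total per group. By default groupby sorts the group keys, so the regions appear in alphabetical order rather than in order of first appearance.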


Task 2: Grouping by Multiple Columns<br>

Task: Group the dataset by 'region' and 'category', then find the average sales.<br>
Steps:<br>
13. Group by ['region', 'category'].<br>
14. Use .mean() on the 'sales' column.<br>
15. Examine the structure of the result (a Series with a two-level MultiIndex).

In [2]:
import pandas as pd
import os

# Collect the CSV files in the current directory once (sorted for a stable order)
csv_files = sorted(f for f in os.listdir() if f.endswith('.csv'))
print("Available files:")
for file in csv_files:
    print(f"- {file}")

# Use the first CSV file found, or set filename manually to pick another one
if csv_files:
    filename = csv_files[0]
    print(f"\nUsing file: {filename}")
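    df = pd.read_csv(filename)

    # Fail early with a clear message if the expected columns are absent
    missing = {'region', 'category', 'sales'} - set(df.columns)
    if missing:
        raise KeyError(f"{filename} is missing required columns: {sorted(missing)}")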
    
    # Step 13: Group by ['region', 'category']
    # Step 14: Use .mean() on the 'sales' column
    region_category_avg = df.groupby(['region', 'category'])['sales'].mean()
    
    print("\nAverage sales by region and category:")
    print(region_category_avg)
    
    # Step 15: Examine the structure of the result.
    # Selecting a single column before .mean() yields a Series, not a DataFrame.
    print("\nResult structure information:")
    print(f"Type: {type(region_category_avg)}")
    print(f"Shape: {region_category_avg.shape}")
    print(f"Index type: {type(region_category_avg.index)}")
    print(f"Index levels: {region_category_avg.index.names}")
    
    # Convert to regular DataFrame for easier viewing
    reset_df = region_category_avg.reset_index()
    print("\nAs regular DataFrame (reset_index):")
    print(reset_df.head())
else:
    print("\nNo CSV files found in the current directory.")
    print("Please place a CSV file with 'region', 'category', and 'sales' columns in the current directory.")

Available files:

No CSV files found in the current directory.
Please place a CSV file with 'region', 'category', and 'sales' columns in the current directory.
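
Since the run above found no data, here is a minimal sketch of Steps 13 to 15 on an in-memory DataFrame. All values are hypothetical. It shows that grouping by two columns and selecting 'sales' gives a Series with a two-level MultiIndex, and two common ways to reshape it.

```python
import pandas as pd

# Hypothetical sample data standing in for the missing CSV file
df = pd.DataFrame({
    "region": ["East", "East", "East", "West", "West"],
    "category": ["Toys", "Toys", "Books", "Toys", "Books"],
    "sales": [100, 200, 50, 300, 80],
})

# One mean per (region, category) pair; the pair becomes a MultiIndex
avg = df.groupby(["region", "category"])["sales"].mean()
print(avg)
print(f"Index levels: {list(avg.index.names)}")

# Option 1: move the index levels back into ordinary columns
flat = avg.reset_index()
print(flat)

# Option 2: pivot the 'category' level into columns for a region-by-category table
wide = avg.unstack("category")
print(wide)
```

reset_index is convenient when the result feeds further row-wise processing, while unstack gives a compact table for reading values side by side.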


Task 3: Aggregating Multiple Functions<br>

Task: Group data by 'category' and apply multiple aggregation functions (sum and count) on 'quantity'.<br>
Steps:<br>
16. Group by 'category'.<br>
17. Use .agg(['sum', 'count']) on 'quantity'.<br>
18. Analyze the result to understand how multiple aggregations work.

In [3]:
import pandas as pd
import os

# Collect the CSV files in the current directory once (sorted for a stable order)
csv_files = sorted(f for f in os.listdir() if f.endswith('.csv'))
print("Available files:")
for file in csv_files:
    print(f"- {file}")

# Use the first CSV file found, or set filename manually to pick another one
if csv_files:
    filename = csv_files[0]
    print(f"\nUsing file: {filename}")
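    df = pd.read_csv(filename)

    # Fail early with a clear message if the expected columns are absent
    missing = {'category', 'quantity'} - set(df.columns)
    if missing:
        raise KeyError(f"{filename} is missing required columns: {sorted(missing)}")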
    
    # Step 16: Group by 'category'
    # Step 17: Use .agg(['sum', 'count']) on 'quantity'
    category_agg = df.groupby('category')['quantity'].agg(['sum', 'count'])
    
    print("\nMultiple aggregations (sum and count) by category:")
    print(category_agg)
    
    # Step 18: Analyze the result
    print("\nAnalysis of multiple aggregations:")
    print(f"DataFrame shape: {category_agg.shape}")
    print(f"Column names: {category_agg.columns.tolist()}")
    print(f"Index name: {category_agg.index.name}")
    
    # Calculate average quantity per category using the aggregated results
    category_agg['average'] = category_agg['sum'] / category_agg['count']
    print("\nAdding calculated average column:")
    print(category_agg)
    
    # Sort by sum to see which category has highest total quantity
    print("\nCategories sorted by total quantity (descending):")
    print(category_agg.sort_values('sum', ascending=False))
else:
    print("\nNo CSV files found in the current directory.")
    print("Please place a CSV file with 'category' and 'quantity' columns in the current directory.")

Available files:

No CSV files found in the current directory.
Please place a CSV file with 'category' and 'quantity' columns in the current directory.
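
With no CSV file present, the cell above printed no aggregated table. The sketch below reproduces Steps 16 to 18 on a small in-memory DataFrame with hypothetical values, including one missing quantity, to show how the 'sum' and 'count' columns behave.

```python
import pandas as pd

# Hypothetical sample data; one 'quantity' value is deliberately missing
df = pd.DataFrame({
    "category": ["Toys", "Books", "Toys", "Books", "Toys"],
    "quantity": [3, 5, None, 1, 4],
})

# Passing a list of function names yields one output column per function
category_agg = df.groupby("category")["quantity"].agg(["sum", "count"])
print(category_agg)

# 'count' ignores missing values, whereas size() counts every row in the group
sizes = df.groupby("category").size()
print(sizes)

# sum / count therefore matches the group mean, which also skips missing values
average = category_agg["sum"] / category_agg["count"]
means = df.groupby("category")["quantity"].mean()
print(average)
print(means)
```

The result of .agg with a list is a DataFrame indexed by category, with columns named after the functions. Here the 'Toys' group has three rows but a count of two, because the missing quantity is excluded from both the sum and the count.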
