# Assignment 1
### The `automated_stat_analyzer` Function
- Scenario: A retail company needs a utility to quickly summarize sales data. Students must create a function that identifies the "Central Tendency" and "Dispersion" of any numerical column.

### Requirements:

* Accept a Pandas DataFrame and a column name.

* Calculate the Mean, Median, and Standard Deviation.

* Identify whether the data is "Skewed" by comparing the Mean and Median.


* Bonus: If the column is categorical, return the Mode instead.
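One common way to turn the mean-versus-median comparison into a single number is Pearson's second skewness coefficient, which scales the gap by the standard deviation. The assignment only asks for the comparison itself, so using this coefficient is an optional design choice. Worked by hand on the ten `Sales_Amount` values from the test dataset, with the sample standard deviation (the default of pandas' `.std()`):

$$
\bar{x} = \frac{2750}{10} = 275, \qquad
\tilde{x} = \frac{190 + 200}{2} = 195, \qquad
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2} = \sqrt{\frac{600500}{9}} \approx 258.31
$$

$$
\mathrm{Sk}_2 = \frac{3(\bar{x} - \tilde{x})}{s} = \frac{3(275 - 195)}{258.31} \approx 0.93
$$

A positive value means the mean sits above the median (right skew); here the single 1000 sale is what pulls the mean up.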

### Your Data

In [8]:
import pandas as pd
import numpy as np

# Create a synthetic Company Sales Dataset
data = {
    'Transaction_ID': range(1, 11),
    'Product_Category': ['Electronics', 'Home', 'Electronics', 'Sports', 'Home', 
                         'Electronics', 'Home', 'Sports', 'Electronics', 'Electronics'],
    'Sales_Amount': [150, 200, 155, 300, 210, 180, 205, 1000, 190, 160], # 1000 is an Outlier
    'Customer_Age': [25, 34, np.nan, 45, 23, 31, 29, np.nan, 38, 40],    # Contains Nulls (NaN)
    'Rating': [5, 4, 3, 5, 2, 4, 5, 2, 4, 3]
}

df_test = pd.DataFrame(data)

# Save to CSV for students to practice loading files
df_test.to_csv('company_sales_test.csv', index=False)
print("Test dataset created successfully!")

Test dataset created successfully!
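Since the cell above saves the data with `index=False`, a quick round trip shows what students get back when they load the file. This is a minimal sketch; writing to a temporary directory via `tempfile` is an illustration choice (the assignment writes to the working directory).

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Same synthetic sales data as in the cell above
data = {
    "Transaction_ID": range(1, 11),
    "Product_Category": ["Electronics", "Home", "Electronics", "Sports", "Home",
                         "Electronics", "Home", "Sports", "Electronics", "Electronics"],
    "Sales_Amount": [150, 200, 155, 300, 210, 180, 205, 1000, 190, 160],
    "Customer_Age": [25, 34, np.nan, 45, 23, 31, 29, np.nan, 38, 40],
    "Rating": [5, 4, 3, 5, 2, 4, 5, 2, 4, 3],
}

# Write to a temporary directory so the sketch does not touch the working directory
path = os.path.join(tempfile.mkdtemp(), "company_sales_test.csv")
pd.DataFrame(data).to_csv(path, index=False)

loaded = pd.read_csv(path)
print(loaded.dtypes)          # Customer_Age comes back as float64 because of the NaNs
print(loaded.isnull().sum())  # the two missing ages survive the round trip
```

Because `index=False` was used, no extra `Unnamed: 0` column appears; writing the index as well would add one unless the file is read back with `index_col=0`.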


In [9]:
df_test.head()

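|   | Transaction_ID | Product_Category | Sales_Amount | Customer_Age | Rating |
|---|---|---|---|---|---|
| 0 | 1 | Electronics | 150 | 25.0 | 5 |
| 1 | 2 | Home | 200 | 34.0 | 4 |
| 2 | 3 | Electronics | 155 | NaN | 3 |
| 3 | 4 | Sports | 300 | 45.0 | 5 |
| 4 | 5 | Home | 210 | 23.0 | 2 |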
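The first step of the analyzer is deciding whether a column is numerical or categorical. A short sketch, using only the first three rows of the test dataset, shows which branch each column would take with `pd.api.types.is_numeric_dtype`:

```python
import numpy as np
import pandas as pd

# First three rows of the test dataset (enough to show the dtypes)
df_head = pd.DataFrame({
    "Transaction_ID": [1, 2, 3],
    "Product_Category": ["Electronics", "Home", "Electronics"],
    "Sales_Amount": [150, 200, 155],
    "Customer_Age": [25, 34, np.nan],
    "Rating": [5, 4, 3],
})

# Label each column with the branch a numeric/categorical check would take
column_kinds = {
    col: "numerical" if pd.api.types.is_numeric_dtype(df_head[col]) else "categorical"
    for col in df_head.columns
}
print(column_kinds)
```

Note that `Transaction_ID` is classified as numerical even though its mean carries no meaning, and `Customer_Age` stays numerical despite its missing value (it is stored as `float64`). Excluding identifier columns is a judgment call the dtype check alone cannot make.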


In [10]:
import pandas as pd

def automated_stat_analyzer(df, column_name):
    """
    Company Task: Provide a summary report of a specific data variable.
    
    Instructions:
    1. Check if the column is numerical or categorical.
    2. For numerical: Calculate Mean, Median, and Standard Deviation.
    3. For categorical: Calculate the Mode.
    4. Return a dictionary with these statistical measures.
    """
    # TODO: Implement using df[column_name].mean(), .median(), .std(), or .mode(). You can use Sales_Amount for your test case.
    column_data = df[column_name]
    # 1. Decide which branch applies based on the column's dtype
    if pd.api.types.is_numeric_dtype(column_data):
        # 2. Compute the numerical measures
        # Note: each scalar is broadcast into a new column of df itself, so the caller's
        # DataFrame is modified and the whole DataFrame is returned (not a dictionary)
        df['Mean'] = column_data.mean()
        df['Median'] = column_data.median()
        df['Std_Dev'] = column_data.std()
    else:
        # 3. Compute the mode for categorical data
        # In pandas, mode() returns a Series, so take its first value [0]
        df['Mode'] = column_data.mode()[0] if not column_data.mode().empty else None

    return df

 

In [11]:
print("Numerical Summary:", automated_stat_analyzer(df_test, 'Sales_Amount'))

print("Categorical Summary:", automated_stat_analyzer(df_test, 'Product_Category'))

Numerical Summary:    Transaction_ID Product_Category  Sales_Amount  Customer_Age  Rating   Mean  \
0               1      Electronics           150          25.0       5  275.0   
1               2             Home           200          34.0       4  275.0   
2               3      Electronics           155           NaN       3  275.0   
3               4           Sports           300          45.0       5  275.0   
4               5             Home           210          23.0       2  275.0   
5               6      Electronics           180          31.0       4  275.0   
6               7             Home           205          29.0       5  275.0   
7               8           Sports          1000           NaN       2  275.0   
8               9      Electronics           190          38.0       4  275.0   
9              10      Electronics           160          40.0       3  275.0   

   Median    Std_Dev  
0   195.0  258.30645  
1   195.0  258.30645  
2   195.0  258.30645
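The recorded version above returns the whole DataFrame instead of the dictionary the docstring asks for, does not report skewness, and writes into `df_test`, which is why the `Mean`, `Median`, `Std_Dev` and `Mode` columns carry over into the Assignment 2 output. Below is a minimal sketch of a version that follows the task description. It assumes a hypothetical `tolerance` parameter for the mean-versus-median comparison, because the assignment does not specify a threshold.

```python
import pandas as pd


def automated_stat_analyzer(df: pd.DataFrame, column_name: str, tolerance: float = 0.05) -> dict:
    """Summarize one column as a dictionary without modifying df.

    Assumption (hypothetical rule): the column is reported as skewed when
    |mean - median| is larger than `tolerance` standard deviations.
    """
    column_data = df[column_name]

    if pd.api.types.is_numeric_dtype(column_data):
        mean = float(column_data.mean())
        median = float(column_data.median())
        std = float(column_data.std())  # sample standard deviation (ddof=1), the pandas default
        gap = mean - median

        if pd.isna(std):
            skew = "Undetermined (fewer than two non-missing values)"
        elif abs(gap) <= tolerance * std:
            skew = "Approximately symmetric"
        elif gap > 0:
            skew = "Right-skewed (mean > median)"
        else:
            skew = "Left-skewed (mean < median)"

        return {"Mean": mean, "Median": median, "Std_Dev": std, "Skewness": skew}

    # Bonus branch: categorical column -> most frequent value
    modes = column_data.mode()
    return {"Mode": modes.iloc[0] if not modes.empty else None}


# Two columns from the test dataset
sales = pd.DataFrame({
    "Product_Category": ["Electronics", "Home", "Electronics", "Sports", "Home",
                         "Electronics", "Home", "Sports", "Electronics", "Electronics"],
    "Sales_Amount": [150, 200, 155, 300, 210, 180, 205, 1000, 190, 160],
})

print("Numerical Summary:", automated_stat_analyzer(sales, "Sales_Amount"))
print("Categorical Summary:", automated_stat_analyzer(sales, "Product_Category"))
```

Converting each statistic with `float()` keeps the returned dictionary free of NumPy scalar types, and because nothing is assigned back into `df`, calling the function repeatedly leaves the input DataFrame unchanged.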

# Assignment 2
### The `null_handling_strategy` Function


- Scenario: Incoming user data often has missing values. Students must implement a flexible strategy to handle these "Null Values" to prepare data for Machine Learning.
### Requirements:

* Check for null values in the DataFrame.

* Apply a strategy selected by a parameter: "drop_rows", "fill_mean", or "fill_median".

* Ensure the function only fills numerical columns when using mean or median.
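Before choosing between `fill_mean` and `fill_median`, it helps to see how far apart the two fill values are for the column with missing data. A short sketch using the `Customer_Age` values from the test dataset:

```python
import numpy as np
import pandas as pd

# Customer_Age column from the test dataset (two missing entries)
ages = pd.Series([25, 34, np.nan, 45, 23, 31, 29, np.nan, 38, 40], name="Customer_Age")

missing = int(ages.isnull().sum())
mean_value = ages.mean()      # NaNs are skipped by default (skipna=True)
median_value = ages.median()  # middle of the 8 known ages

print(f"Missing values: {missing}")
print(f"fill_mean would insert {mean_value}, fill_median would insert {median_value}")

mean_filled = ages.fillna(mean_value)
median_filled = ages.fillna(median_value)
```

Here the two candidates are close (33.125 versus 32.5) because the known ages contain no extreme value. For a column shaped like `Sales_Amount`, where one 1000 sale pulls the mean to 275 against a median of 195, the median would be the more robust fill.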

In [15]:
def null_handling_strategy(test_df, strategy="fill_mean"):
    """
    Company Task: Clean a dataset by resolving missing (NaN) values.
    """
    # TODO: Implement using .isnull(), .dropna(), or .fillna(). You can use Customer_Age for your test case.
    # Work on a copy so the caller's DataFrame is left unchanged
    clean_df = test_df.copy()
    null_count = test_df.isnull().sum().sum()  # total NaNs across all columns
    print(f"Total null values found: {null_count}")

    if strategy == "drop_rows":
        # Drop every row that contains at least one NaN
        clean_df = clean_df.dropna()

    elif strategy in ["fill_mean", "fill_median"]:
        # Only numerical columns can be filled with a mean or median
        num_cols = clean_df.select_dtypes(include=['number']).columns

        for col in num_cols:
            if strategy == "fill_mean":
                fill_value = clean_df[col].mean()
            else:
                fill_value = clean_df[col].median()

            clean_df[col] = clean_df[col].fillna(fill_value)

    else:
        print("Strategy not recognized. No changes applied.")

    return clean_df
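The function above only prints a message for an unknown strategy, so a typo such as `"fill_meen"` silently returns the data unchanged. The alternative sketch below raises a `ValueError` instead and fills all numerical columns in one vectorized call. The name `null_handling_strategy_strict` is hypothetical, and the small `demo` frame uses made-up values.

```python
import numpy as np
import pandas as pd


def null_handling_strategy_strict(test_df: pd.DataFrame, strategy: str = "fill_mean") -> pd.DataFrame:
    """Hypothetical variant: same three strategies, but unknown names raise an error."""
    if strategy == "drop_rows":
        return test_df.dropna()  # drops every row with at least one NaN

    if strategy not in ("fill_mean", "fill_median"):
        raise ValueError(
            f"Unknown strategy {strategy!r}; expected 'drop_rows', 'fill_mean' or 'fill_median'"
        )

    clean_df = test_df.copy()
    num_cols = clean_df.select_dtypes(include="number").columns
    # One fill value per numerical column; fillna matches them to columns by name
    if strategy == "fill_mean":
        fill_values = clean_df[num_cols].mean()
    else:
        fill_values = clean_df[num_cols].median()
    clean_df[num_cols] = clean_df[num_cols].fillna(fill_values)
    return clean_df


# Usage on a made-up frame: the categorical NaN is left alone, the numerical NaN is filled
demo = pd.DataFrame({
    "Product_Category": ["Electronics", None, "Home"],
    "Customer_Age": [25.0, np.nan, 35.0],
})
print(null_handling_strategy_strict(demo, "fill_median"))
```

Failing loudly is usually preferable in a data-preparation step, because a silently skipped cleaning step tends to surface much later as a model-training error.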

In [16]:
result = null_handling_strategy(df_test, strategy="fill_mean")
print(result)

Total null values found: 2
   Transaction_ID Product_Category  Sales_Amount  Customer_Age  Rating   Mean  \
0               1      Electronics           150        25.000       5  275.0   
1               2             Home           200        34.000       4  275.0   
2               3      Electronics           155        33.125       3  275.0   
3               4           Sports           300        45.000       5  275.0   
4               5             Home           210        23.000       2  275.0   
5               6      Electronics           180        31.000       4  275.0   
6               7             Home           205        29.000       5  275.0   
7               8           Sports          1000        33.125       2  275.0   
8               9      Electronics           190        38.000       4  275.0   
9              10      Electronics           160        40.000       3  275.0   

   Median    Std_Dev         Mode  
0   195.0  258.30645  Electronics  
1   195.0