In [181]:
import pandas as pd
import numpy as np 
from scipy.stats import chi2_contingency
from itertools import combinations

# Load the CSV file
file_path = r'C:\Users\Dell\Downloads\Imports_Exports_Dataset.csv'
df = pd.read_csv(file_path)

# Draw a random sample of 2001 records (fixed seed for reproducibility)
random_df = df.sample(n=2001, random_state=55020)

# View the first few records to ensure the dataset is correct
random_df.head()

Unnamed: 0,Transaction_ID,Country,Product,Import_Export,Quantity,Value,Date,Category,Port,Customs_Code,Weight,Shipping_Method,Supplier,Customer,Invoice_Number,Payment_Terms
2407,92931f54-4dca-41c1-aa1f-7a9391340a41,Cambodia,set,Import,9258,4079.8,13-05-2022,Clothing,Sallystad,537420,4424.72,Air,Klein-Wise,Mark Garcia,7748664,Net 30
5021,9d486bf1-914c-475a-b3cd-5e52eaf68027,Northern Mariana Islands,watch,Export,9147,3881.42,08-05-2024,Clothing,North Christopherberg,333634,2801.65,Sea,"Rivas, Mann and Turner",Stephanie Gates,33084263,Net 30
4605,5de95471-4d60-4cc6-82f9-91676a71a0e5,Mayotte,onto,Import,6556,501.74,28-12-2022,Clothing,Leonfurt,664774,3611.25,Sea,Young and Sons,Craig Harper,58456025,Net 60
11017,17d737c9-65da-4d27-8562-6be92b52308f,Philippines,agreement,Export,3517,341.83,19-12-2019,Clothing,Clarkebury,648115,3700.05,Air,Davis-Edwards,Richard Gray,60567165,Net 60
12207,7b2b45c0-7d6c-4f1e-aa22-6fbbe75b6190,Pakistan,friend,Export,3716,6997.8,25-03-2022,Clothing,Hernandeztown,475381,1238.25,Sea,Jones-Brandt,Erin Sutton,80468789,Net 60


In [183]:
# Identify categorical columns
categorical_columns = df.select_dtypes(include=['object', 'category']).columns.tolist()

In [185]:
# Identify non-categorical columns
non_categorical_columns = df.select_dtypes(include=[np.number]).columns.tolist()

In [187]:
# Split into categorical and non-categorical datasets
categorical_df = df[categorical_columns]
non_categorical_df = df[non_categorical_columns]

# Display the categorical and non-categorical columns
print("Categorical Columns:", categorical_columns)
print("Non-Categorical Columns:", non_categorical_columns)


Categorical Columns: ['Transaction_ID', 'Country', 'Product', 'Import_Export', 'Date', 'Category', 'Port', 'Shipping_Method', 'Supplier', 'Customer', 'Payment_Terms']
Non-Categorical Columns: ['Quantity', 'Value', 'Customs_Code', 'Weight', 'Invoice_Number']


In [189]:
# Descriptive statistics for non-categorical data
descriptive_stats = random_df[non_categorical_columns].describe()
print("\nDescriptive Statistics for Non-Categorical Data:\n", descriptive_stats)


Descriptive Statistics for Non-Categorical Data:
            Quantity        Value   Customs_Code       Weight  Invoice_Number
count   2001.000000  2001.000000    2001.000000  2001.000000    2.001000e+03
mean    5014.079960  5008.034533  535893.304848  2499.504393    5.024264e+07
std     2847.085928  2899.149164  261004.151748  1438.597982    2.902805e+07
min       18.000000   102.870000  100041.000000     1.980000    7.801700e+04
25%     2555.000000  2447.630000  315430.000000  1285.910000    2.479617e+07
50%     4986.000000  5039.770000  517674.000000  2421.580000    5.066739e+07
75%     7448.000000  7513.780000  760770.000000  3764.270000    7.439026e+07
max    10000.000000  9993.020000  999768.000000  4994.900000    9.997707e+07


In [191]:
# Descriptive statistics for categorical data 
descriptive_stats = random_df[categorical_columns].describe()
print("\nDescriptive Statistics for Categorical Data:\n", descriptive_stats)


Descriptive Statistics for Categorical Data:
                               Transaction_ID  Country   Product Import_Export  \
count                                   2001     2001      2001          2001   
unique                                  2001      243       837             2   
top     92931f54-4dca-41c1-aa1f-7a9391340a41  Mayotte  positive        Import   
freq                                       1       18         8          1016   

              Date   Category          Port Shipping_Method     Supplier  \
count         2001       2001          2001            2001         2001   
unique        1237          5          1854               3         1906   
top     08-11-2021  Furniture  South Robert             Air  Smith Group   
freq             7        410             5             679            4   

                 Customer Payment_Terms  
count                2001          2001  
unique               1968             4  
top     Samantha Williams        Net 60 

In [193]:
# Range 
range_values = random_df[non_categorical_columns].max() - random_df[non_categorical_columns].min()

# Standard Deviation
std_dev = random_df[non_categorical_columns].std()

# Skewness
skewness = random_df[non_categorical_columns].skew()

# Kurtosis
kurtosis = random_df[non_categorical_columns].kurt()

# Correlation Matrix
correlation_matrix = random_df[non_categorical_columns].corr()

# Print the measures of dispersion
print("\n--- Measures of Dispersion for Non-Categorical Data ---")
print("\nRange:\n", range_values)
print("\nStandard Deviation:\n", std_dev)
print("\nSkewness:\n", skewness)
print("\nKurtosis:\n", kurtosis)
print("\nCorrelation Matrix:\n", correlation_matrix)


--- Measures of Dispersion for Non-Categorical Data ---

Range:
 Quantity              9982.00
Value                 9890.15
Customs_Code        899727.00
Weight                4992.92
Invoice_Number    99899055.00
dtype: float64

Standard Deviation:
 Quantity          2.847086e+03
Value             2.899149e+03
Customs_Code      2.610042e+05
Weight            1.438598e+03
Invoice_Number    2.902805e+07
dtype: float64

Skewness:
 Quantity          0.014186
Value             0.008251
Customs_Code      0.070813
Weight            0.045819
Invoice_Number   -0.019080
dtype: float64

Kurtosis:
 Quantity         -1.175037
Value            -1.221326
Customs_Code     -1.193286
Weight           -1.189971
Invoice_Number   -1.196075
dtype: float64

Correlation Matrix:
                 Quantity     Value  Customs_Code    Weight  Invoice_Number
Quantity        1.000000  0.017422      0.009492  0.039891       -0.028761
Value           0.017422  1.000000     -0.001446  0.018703        0.031917
Custom
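The near-zero skewness and the excess kurtosis of roughly -1.2 in every numeric column are what a uniform distribution produces (its excess kurtosis is exactly -6/5). The sketch below illustrates this on synthetic data only; `uniform_col` is a made-up stand-in for a column such as Quantity, not the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a numeric column (assumption: not the real data).
# Values are drawn uniformly between 1 and 10,000.
rng = np.random.default_rng(0)
uniform_col = pd.Series(rng.uniform(1, 10_000, size=100_000))

# pandas reports sample skewness and *excess* kurtosis (normal = 0).
sample_skew = uniform_col.skew()
sample_kurt = uniform_col.kurt()

print(f"skewness: {sample_skew:.3f}")
print(f"excess kurtosis: {sample_kurt:.3f}")
```

If the real columns behave like this, the values are probably spread evenly over their range rather than clustered around a typical value, which fits a synthetically generated dataset.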

In [195]:
# Initialize a dictionary to store analysis data
analysis_data = {
    'Column': [],
    'Count': [],
    'Frequency': [],
    'Proportion': [],
    'Minimum': [],
    'Maximum': [],
    'Mode': [],
    'Rank': []
}

# Analysis of categorical columns in tabular form
for col in categorical_columns:
    analysis_data['Column'].append(col)
    
    # 1. Count of non-null entries
    count = categorical_df[col].count()
    analysis_data['Count'].append(count)
    
    # 2. Frequency of the most frequent value
    frequency = categorical_df[col].value_counts().max()
    analysis_data['Frequency'].append(frequency)
    
    # 3. Proportion of the most frequent value
    proportion = categorical_df[col].value_counts(normalize=True).max()
    analysis_data['Proportion'].append(proportion)
    
    # 4. Minimum and Maximum (for numeric-like categorical columns, if applicable)
    if pd.api.types.is_numeric_dtype(df[col]):
        min_val = categorical_df[col].min()
        max_val = categorical_df[col].max()
        analysis_data['Minimum'].append(min_val)
        analysis_data['Maximum'].append(max_val)
    else:
        analysis_data['Minimum'].append('N/A')
        analysis_data['Maximum'].append('N/A')
    
    # 5. Mode (Most frequent value)
    mode = categorical_df[col].mode()[0]
    analysis_data['Mode'].append(mode)
    
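    # 6. Rank of the most frequent value by frequency
    #    (1.0 unless values tie; tied values share the average rank,
    #    which is why an all-unique column such as Transaction_ID gets 7500.5)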
    rank = categorical_df[col].value_counts().rank(ascending=False).iloc[0]
    analysis_data['Rank'].append(rank)

# Convert the dictionary to a DataFrame for tabular output
analysis_df = pd.DataFrame(analysis_data)

# Display the analysis as a table
print(analysis_df)

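# Correlation between categorical variables (in tabular format)
# Convert categorical columns to numeric codes for correlation analysis.
# Caution: the codes follow sorted label order, so for nominal columns these
# correlations describe the arbitrary coding rather than a real association.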
cat_codes = categorical_df.apply(lambda x: x.astype('category').cat.codes)

# Correlation matrix for categorical columns
corr_matrix = cat_codes.corr()

# Display the correlation matrix
print("\nCorrelation Matrix:")
print(corr_matrix)

             Column  Count  Frequency  Proportion Minimum Maximum  \
0    Transaction_ID  15000          1    0.000067     N/A     N/A   
1           Country  15000        133    0.008867     N/A     N/A   
2           Product  15000         28    0.001867     N/A     N/A   
3     Import_Export  15000       7569    0.504600     N/A     N/A   
4              Date  15000         19    0.001267     N/A     N/A   
5          Category  15000       3048    0.203200     N/A     N/A   
6              Port  15000         20    0.001333     N/A     N/A   
7   Shipping_Method  15000       5054    0.336933     N/A     N/A   
8          Supplier  15000         21    0.001400     N/A     N/A   
9          Customer  15000          8    0.000533     N/A     N/A   
10    Payment_Terms  15000       3831    0.255400     N/A     N/A   

                                    Mode    Rank  
0   00073cc2-c801-467c-9039-fca63c78c6a9  7500.5  
1                                  Congo     1.0  
2                 
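Correlating integer category codes depends on how the labels happen to sort, so a measure built on the contingency table is usually a better fit for nominal columns. The sketch below computes Cramér's V, swapped in here as an alternative to the code-based correlation above. The helper name `cramers_v` and the toy data are assumptions for illustration, not part of the notebook.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(x, y):
    """Cramér's V between two categorical Series (0 = none, 1 = perfect association)."""
    table = pd.crosstab(x, y)
    # correction=False: no Yates continuity correction, so V can reach exactly 1.
    chi2 = chi2_contingency(table, correction=False)[0]
    n = table.to_numpy().sum()
    min_dim = min(table.shape) - 1
    return float(np.sqrt(chi2 / (n * min_dim)))

# Hypothetical toy data: Shipping_Method is fully determined by Import_Export,
# while Payment_Terms is balanced across both groups.
toy = pd.DataFrame({
    "Import_Export": ["Import", "Import", "Export", "Export"] * 25,
    "Shipping_Method": ["Air", "Air", "Sea", "Sea"] * 25,
    "Payment_Terms": ["Net 30", "Net 60", "Net 30", "Net 60"] * 25,
})

v_dependent = cramers_v(toy["Import_Export"], toy["Shipping_Method"])
v_independent = cramers_v(toy["Import_Export"], toy["Payment_Terms"])
print(f"dependent pair:   V = {v_dependent:.3f}")
print(f"independent pair: V = {v_independent:.3f}")
```

On the real data the same helper could be called on pairs of low-cardinality columns such as `Category` and `Shipping_Method`. Columns with thousands of unique values (for example `Transaction_ID`) would not give a useful result.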

In [197]:
from scipy import stats
# Initialize a dictionary to store the results
composite_measures = {
    'Column': [],
    'Mean': [],
    'Standard Deviation': [],
    'Coefficient of Variation (CV)': [],
    '95% Confidence Interval (Lower)': [],
    '95% Confidence Interval (Upper)': []
}

# Calculate composite measures for each non-categorical column
for col in non_categorical_columns:
    mean_val = df[col].mean()
    std_dev = df[col].std()
    count = df[col].count()
    
    # Coefficient of Variation (CV)
    cv = std_dev / mean_val
    
    # Confidence Interval (95%)
    confidence_level = 0.95
    z_value = stats.norm.ppf((1 + confidence_level) / 2)  # Z-score for 95% confidence
    margin_of_error = z_value * (std_dev / np.sqrt(count))
    ci_lower = mean_val - margin_of_error
    ci_upper = mean_val + margin_of_error
    
    # Store the results
    composite_measures['Column'].append(col)
    composite_measures['Mean'].append(mean_val)
    composite_measures['Standard Deviation'].append(std_dev)
    composite_measures['Coefficient of Variation (CV)'].append(cv)
    composite_measures['95% Confidence Interval (Lower)'].append(ci_lower)
    composite_measures['95% Confidence Interval (Upper)'].append(ci_upper)

# Convert the dictionary into a DataFrame for tabular display
composite_measures_df = pd.DataFrame(composite_measures)

# Display the results as a table
print(composite_measures_df)

           Column          Mean  Standard Deviation  \
0        Quantity  4.980555e+03        2.866167e+03   
1           Value  5.032931e+03        2.857594e+03   
2    Customs_Code  5.495080e+05        2.608869e+05   
3          Weight  2.492119e+03        1.451379e+03   
4  Invoice_Number  5.020677e+07        2.889888e+07   

   Coefficient of Variation (CV)  95% Confidence Interval (Lower)  \
0                       0.575471                     4.934687e+03   
1                       0.567779                     4.987201e+03   
2                       0.474765                     5.453330e+05   
3                       0.582387                     2.468892e+03   
4                       0.575597                     4.974430e+07   

   95% Confidence Interval (Upper)  
0                     5.026422e+03  
1                     5.078661e+03  
2                     5.536829e+05  
3                     2.515345e+03  
4                     5.066924e+07  


In [199]:
from scipy.stats import ttest_ind

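# Two-sample t-test comparing the means of Quantity and Value
# (the two columns come from the same rows and measure different things,
# so the result should be read with caution)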
t_stat, p_value = ttest_ind(df['Quantity'], df['Value'])
print(f"T-test Statistic: {t_stat}, P-value: {p_value}")

T-test Statistic: -1.5849450669854537, P-value: 0.1129893567049956


In [201]:
from scipy.stats import f_oneway

# Perform ANOVA on different categories (Group by Category and compare Quantity)
groups = [df['Quantity'][df['Category'] == category] for category in df['Category'].unique()]
f_stat, p_value = f_oneway(*groups)
print(f"ANOVA F-Statistic: {f_stat}, P-value: {p_value}")

ANOVA F-Statistic: 0.2990216089664796, P-value: 0.8787372680249909


In [203]:
from scipy.stats import levene

# Levene’s test for equal variance between Quantity and Value
stat, p_value = levene(df['Quantity'], df['Value'])
print(f"Levene’s Statistic: {stat}, P-value: {p_value}")

Levene’s Statistic: 0.1683953325598296, P-value: 0.6815448123110948


In [205]:
from scipy.stats import bartlett

# Bartlett’s test for equal variance between Quantity and Value
stat, p_value = bartlett(df['Quantity'], df['Value'])
print(f"Bartlett’s Statistic: {stat}, P-value: {p_value}")

Bartlett’s Statistic: 0.1345827997059782, P-value: 0.7137269672324467


In [207]:
from statsmodels.stats.proportion import proportions_ztest

# z-test comparing the proportions of Air and Sea shipments
# (both proportions come from the same rows, so they are not independent samples)
count = [df['Shipping_Method'].value_counts()['Air'], df['Shipping_Method'].value_counts()['Sea']]
nobs = [df['Shipping_Method'].count(), df['Shipping_Method'].count()]
stat, p_value = proportions_ztest(count, nobs)
print(f"Z-test Statistic: {stat}, P-value: {p_value}")

Z-test Statistic: -0.7217202345286419, P-value: 0.47046649899058324


In [209]:
from scipy.stats import chi2_contingency

# Chi-square test between Country and Product
contingency_table = pd.crosstab(df['Country'], df['Product'])
chi2_stat, p_value, dof, expected = chi2_contingency(contingency_table)
print(f"Chi-square Statistic: {chi2_stat}, P-value: {p_value}")

Chi-square Statistic: 233796.6910236458, P-value: 0.9158357435731135
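Country and Product each have hundreds of distinct levels, so 15,000 rows leave most cells of this contingency table with an expected count far below 1. A common rule of thumb asks for expected counts of about 5 or more in most cells before trusting the chi-square approximation. The sketch below shows one way to check that. The helper `share_of_small_expected` and the toy series are illustrative assumptions.

```python
import pandas as pd
from scipy.stats import chi2_contingency

def share_of_small_expected(x, y, threshold=5):
    """Fraction of contingency-table cells with an expected count below `threshold`."""
    table = pd.crosstab(x, y)
    expected = chi2_contingency(table)[3]  # fourth return value: expected frequencies
    return float((expected < threshold).mean())

# Hypothetical sparse case: 40 rows spread over 20 x 8 levels.
sparse_x = pd.Series([str(i % 20) for i in range(40)])
sparse_y = pd.Series([str(i % 8) for i in range(40)])

# Hypothetical dense case: 200 rows over 2 x 2 levels.
dense_x = pd.Series(["A", "B"] * 100)
dense_y = pd.Series(["U", "U", "V", "V"] * 50)

sparse_share = share_of_small_expected(sparse_x, sparse_y)
dense_share = share_of_small_expected(dense_x, dense_y)
print(f"sparse table: {sparse_share:.0%} of cells below 5")
print(f"dense table:  {dense_share:.0%} of cells below 5")
```

When the share is high, grouping rare levels or testing lower-cardinality columns (such as `Category` against `Shipping_Method`) tends to give a more reliable test.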


In [211]:
from scipy.stats import pearsonr

# Pearson correlation test between Quantity and Value
corr_coeff, p_value = pearsonr(df['Quantity'], df['Value'])
print(f"Correlation Coefficient: {corr_coeff}, P-value: {p_value}")

Correlation Coefficient: -0.002876247947060439, P-value: 0.7246595079319986


In [212]:
from scipy.stats import shapiro

# Shapiro-Wilk test for normality in Quantity
# (SciPy warns that the p-value may be inaccurate for samples larger than 5000)
stat, p_value = shapiro(df['Quantity'])
print(f"Shapiro-Wilk Statistic: {stat}, P-value: {p_value}")

Shapiro-Wilk Statistic: 0.9567630851274996, P-value: 2.9092870668660095e-54



In [215]:
from scipy.stats import kstest

# Kolmogorov-Smirnov test of Quantity against the standard normal N(0, 1)
# ('norm' without extra arguments means mean 0 and standard deviation 1)
stat, p_value = kstest(df['Quantity'], 'norm')
print(f"Kolmogorov-Smirnov Statistic: {stat}, P-value: {p_value}")

Kolmogorov-Smirnov Statistic: 0.9998663800150948, P-value: 0.0
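A statistic this close to 1 comes from comparing raw quantities (in the thousands) with a standard normal, since `kstest(..., 'norm')` assumes mean 0 and standard deviation 1 unless told otherwise. The sketch below uses synthetic data as a stand-in for Quantity. It contrasts that call with one that passes the sample mean and standard deviation through `args`.

```python
import numpy as np
from scipy.stats import kstest

# Synthetic stand-in for a column like Quantity (assumption: not the real data).
rng = np.random.default_rng(42)
x = rng.uniform(1, 10_000, size=5_000)

# Against N(0, 1): almost every value lies far in the right tail.
stat_standard, p_standard = kstest(x, "norm")

# Against a normal with the sample's own mean and standard deviation.
stat_fitted, p_fitted = kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))

print(f"vs N(0, 1):        D = {stat_standard:.4f}")
print(f"vs fitted normal:  D = {stat_fitted:.4f}")
```

Because the parameters are estimated from the same data, the p-value of the second call is only approximate (it tends to be too large). A Lilliefors-type correction is the usual remedy.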


In [217]:
from scipy.stats import mannwhitneyu

# Mann-Whitney U test comparing Quantity between Import and Export
group1 = df[df['Import_Export'] == 'Import']['Quantity']
group2 = df[df['Import_Export'] == 'Export']['Quantity']
stat, p_value = mannwhitneyu(group1, group2)
print(f"Mann-Whitney U Statistic: {stat}, P-value: {p_value}")

Mann-Whitney U Statistic: 28338040.5, P-value: 0.4165570432895699


In [219]:
from scipy.stats import kruskal

# Kruskal-Wallis test for Quantity across different Categories
groups = [df['Quantity'][df['Category'] == category] for category in df['Category'].unique()]
stat, p_value = kruskal(*groups)
print(f"Kruskal-Wallis Statistic: {stat}, P-value: {p_value}")

Kruskal-Wallis Statistic: 1.1772345044419383, P-value: 0.8818323970549906


In [221]:
import statsmodels.api as sm

# Define the independent (X) and dependent (Y) variables
X = df['Quantity']
Y = df['Value']

# Add a constant to the independent variable (required for intercept in statsmodels)
X = sm.add_constant(X)

# Fit the linear regression model
model = sm.OLS(Y, X).fit()

# Print the summary of the regression
print(model.summary())

                            OLS Regression Results                            
Dep. Variable:                  Value   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                 -0.000
Method:                 Least Squares   F-statistic:                    0.1241
Date:                Fri, 20 Sep 2024   Prob (F-statistic):              0.725
Time:                        02:37:31   Log-Likelihood:            -1.4065e+05
No. Observations:               15000   AIC:                         2.813e+05
Df Residuals:                   14998   BIC:                         2.813e+05
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const       5047.2136     46.781    107.890      0.0

In [223]:
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Define the independent (X) and dependent (Y) variables
X = df[['Quantity']]
Y = df['Value']

# Create polynomial features (degree 2 for quadratic)
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

# Fit the polynomial regression model
model = LinearRegression().fit(X_poly, Y)

# Print coefficients and intercept
print("Coefficients:", model.coef_)
print("Intercept:", model.intercept_)

# Predict values
Y_pred = model.predict(X_poly)

Coefficients: [ 0.00000000e+00  1.63459123e-02 -1.91827039e-06]
Intercept: 5014.861252168136


In [225]:
# Convert Date column to datetime format, assuming the format is day-month-year
df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

# Sort by Date
df = df.sort_values(by='Date')

# Set Date as index (optional for time-series analysis)
df.set_index('Date', inplace=True)

# Display the updated dataframe
print(df.head())

                                  Transaction_ID         Country   Product  \
Date                                                                         
2019-09-07  1b966375-c291-4bf1-9004-5f63a53c7195   Cote d'Ivoire       ask   
2019-09-07  2732acd6-2f34-4ad2-a688-f05e7e6f332d         Burundi      hard   
2019-09-07  c0bb3a74-e5dc-4bf5-896a-3496066308ca  Cayman Islands     happy   
2019-09-07  691e4c70-752b-48f6-97f2-063bd09e39b5         Moldova  attorney   
2019-09-07  4f5ac05c-2a83-4718-8d49-f4838a447865           Ghana     build   

           Import_Export  Quantity    Value   Category               Port  \
Date                                                                        
2019-09-07        Export       365  8035.52  Machinery         Suarezland   
2019-09-07        Import       654  5527.36  Machinery  East Lindsayshire   
2019-09-07        Export      8867  4142.66   Clothing          Debrafurt   
2019-09-07        Import      3353  3286.39  Furniture          Pate

In [227]:
# Define independent (X) and dependent (Y) variables
X = df['Quantity']
Y = df['Value']

# Add a constant for intercept
X = sm.add_constant(X)

# Refit the same OLS model on the date-sorted data
# (row order does not change OLS estimates, so the results match the earlier fit)
model = sm.OLS(Y, X).fit()

# Print the summary of the regression
print(model.summary())

                            OLS Regression Results                            
Dep. Variable:                  Value   R-squared:                       0.000
Model:                            OLS   Adj. R-squared:                 -0.000
Method:                 Least Squares   F-statistic:                    0.1241
Date:                Fri, 20 Sep 2024   Prob (F-statistic):              0.725
Time:                        02:37:36   Log-Likelihood:            -1.4065e+05
No. Observations:               15000   AIC:                         2.813e+05
Df Residuals:                   14998   BIC:                         2.813e+05
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const       5047.2136     46.781    107.890      0.0

In [229]:
%pip install linearmodels




In [230]:
from linearmodels.panel import PanelOLS

# Set multi-index for Panel Data (Supplier as entity, Date as time)
df = df.set_index(['Supplier', 'Date'])

# Define independent (X) and dependent (Y) variables
X = df[['Quantity']]
Y = df['Value']

# Add a constant
X = sm.add_constant(X)

# Fit the panel data regression model
model = PanelOLS(Y, X, entity_effects=True).fit()

# Print the summary of the regression
print(model.summary)

KeyError: "None of ['Date'] are in the columns"
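The KeyError follows from the earlier `df.set_index('Date', inplace=True)`: once Date is the index, it is no longer among the columns, so a second `set_index` that names it cannot find it. The sketch below reproduces this on a small hypothetical frame and shows one possible fix using `reset_index()`. It does not run `PanelOLS` itself, and whether the model then accepts the data (for example, when a supplier has several rows on one date) is not checked here.

```python
import pandas as pd

# Hypothetical toy frame mimicking the relevant columns.
toy = pd.DataFrame({
    "Date": pd.to_datetime(["2022-01-01", "2022-01-02", "2022-01-01"]),
    "Supplier": ["A", "A", "B"],
    "Quantity": [10, 20, 30],
})

# Same step as the earlier cell: Date moves from the columns into the index.
toy = toy.set_index("Date")
date_is_column = "Date" in toy.columns

# Naming 'Date' as a column now fails, as in the notebook.
try:
    toy.set_index(["Supplier", "Date"])
    raised = False
except KeyError:
    raised = True

# Possible fix: restore Date as a column, then build the two-level index.
panel = toy.reset_index().set_index(["Supplier", "Date"])
index_names = list(panel.index.names)
print(date_is_column, raised, index_names)
```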

In [None]:
import matplotlib.pyplot as plt

# Scatter plot of two columns (e.g., Quantity vs. Value)
plt.figure(figsize=(8, 6))
plt.scatter(df['Quantity'], df['Value'], color='blue', alpha=0.5)
plt.title('Scatter Plot of Quantity vs. Value')
plt.xlabel('Quantity')
plt.ylabel('Value')
plt.grid(True)
plt.show()

In [None]:
# Line plot of a time series (e.g., Date vs. Quantity)
import matplotlib.pyplot as plt

# Check the columns of the DataFrame
print("Columns in the DataFrame:", df.columns)

# If 'Date' is in the index, reset the index to make it a column
if 'Date' not in df.columns and 'Date' in df.index.names:
    df.reset_index(inplace=True)
    print("'Date' was in the index. Resetting the index.")

# If 'Date' column has different name (e.g., 'date' or contains spaces), rename it
if 'Date' not in df.columns:
    for col in df.columns:
        if 'date' in col.lower():
            df.rename(columns={col: 'Date'}, inplace=True)
            print(f"Renaming column '{col}' to 'Date'.")

# Check if 'Date' exists after adjustments
if 'Date' in df.columns:
    # Plot the line graph using the 'Date' and 'Quantity' columns
    plt.figure(figsize=(10, 6))
    plt.plot(df['Date'], df['Quantity'], color='green', linestyle='-', marker='o')
    plt.title('Line Plot of Quantity Over Time')
    plt.xlabel('Date')
    plt.ylabel('Quantity')
    plt.xticks(rotation=45)
    plt.grid(True)
    plt.show()
else:
    print("The 'Date' column still does not exist. Please check your dataset.")

In [None]:
import seaborn as sns

# Box-Whisker plot for Quantity by some categorical variable (e.g., Import_Export)
plt.figure(figsize=(8, 6))
sns.boxplot(x='Import_Export', y='Quantity', data=df)
plt.title('Box-Whisker Plot of Quantity by Import/Export')
plt.xlabel('Import/Export')
plt.ylabel('Quantity')
plt.show()

In [None]:
import scipy.stats as stats

# Observed counts per Category and the expected counts if all categories were equally likely
observed = df['Category'].value_counts().values
expected = [len(df) / len(observed)] * len(observed)  # Uniform expected frequencies

# Perform Chi-square goodness of fit test
chi2_stat, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"Chi-Square Statistic: {chi2_stat}, P-value: {p_value}")

In [None]:
# Mann-Whitney U Test (for comparing two independent groups in a non-categorical column)
for col in non_categorical_columns:
    for cat_col in categorical_columns:
        unique_groups = random_df[cat_col].dropna().unique()
        if len(unique_groups) == 2:
            group1 = random_df[random_df[cat_col] == unique_groups[0]][col].dropna()
            group2 = random_df[random_df[cat_col] == unique_groups[1]][col].dropna()
            stat, p_val = mannwhitneyu(group1, group2)
            print(f"Mann-Whitney U Test for {col} by {cat_col} - Statistic: {stat}, p-value: {p_val}")

In [None]:
# Histogram of the 'Value' column
plt.figure(figsize=(8, 6))
sns.histplot(data=df, x='Value', bins=30, kde=True)
plt.title('Distribution of Values')
plt.show()

In [None]:
import matplotlib.pyplot as plt

# Select relevant categorical column 'Import_Export' to make a pie chart
import_export_counts = random_df['Import_Export'].value_counts()

# Create the pie chart
plt.figure(figsize=(7, 7))
plt.pie(import_export_counts, labels=import_export_counts.index, autopct='%1.1f%%', startangle=90, colors=['lightblue', 'lightgreen'])
plt.title('Proportion of Import vs Export in Random Sample')
plt.show()

# Project Report

1. **Project Information**

  Project Title: DEVP Project 1

  Student Name(s): [KARTIK TALWAR, AMARTYA RAJ SINGH]

  Enrollment Number(s): [055020, 055053]

  Group Number(s): [10]

2. **Description of Data**

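(i) Data Source & Size: https://www.kaggle.com/datasets/chakilamvishwas/imports-exports-15000 
 The full dataset contains 15,000 records. A random sample of 2,001 records is drawn for the descriptive statistics, while the hypothesis tests and regressions run on the full dataset.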


(ii) Data Type: Panel data


(iii) Data Dimension:

a. Number of Variables: 16

b. Number of Observations: 2001 (random sample drawn from 15,000 records)


(iv) Data Variable Type


a. Categorical Variable:
Transaction ID, Country, Product, Import/Export, Date, Category, Port, Shipping Method, Supplier, Customer, Payment Terms

b. Non-Categorical Variable:
Quantity, Value, Customs Code, Weight, Invoice Number


(v) Data Variable Category:

a. Categorical (Nominal): Transaction ID, Country, Product, Import/Export, Category, Port, Shipping Method, Supplier, Customer

b. Categorical (Ordinal): Payment Terms

c. Non-Categorical: Quantity, Value, Customs Code, Weight, Invoice Number


3. **Project Objectives | Problem Statements**

Objective 1: Analyze the trends in trade transactions across different countries and products.


Objective 2: Identify key factors that affect trade volume, value, and method of shipping.


Problem Statement: How do different shipping methods, product categories, and countries impact the value and weight of transactions? What are the significant patterns in trade operations?

4. **Analysis of Data**

Descriptive Statistics for Non-Categorical Data (2,001-record sample):

Quantity: The average quantity traded is 5,014 units with a standard deviation of 2,847. The minimum quantity is 18, and the maximum is 10,000.

Value: The mean transaction value is 5,008 with a standard deviation of 2,899. Values range from 102.87 to 9,993.02.

Customs Code: Codes range from 100,041 to 999,768. Because the customs code is an identifier, its mean (535,893) has no practical interpretation.

Weight: The mean shipment weight is 2,499.5 kg, with a minimum of 1.98 kg and a maximum of 4,994.9 kg.

Invoice Number: Invoice numbers are stored as numbers and range from 78,017 to 99,977,070. Like customs codes, they are identifiers rather than measurements.

5. **Observations | Findings**
   
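Higher-value transactions may be associated with certain product categories such as clothing, but the notebook compares only Quantity across categories, where no significant difference appears (ANOVA F = 0.30, p = 0.88; Kruskal-Wallis p = 0.88).

Export transactions may show greater variance in value and weight than imports, although this was not tested directly. For Quantity, imports and exports do not differ significantly (Mann-Whitney U, p = 0.42).

Transaction counts are spread thinly across countries. The most frequent country in the full dataset is Congo, with 133 of 15,000 records (about 0.9%), and in the sample it is Mayotte, with 18 of 2,001 records, so no single country dominates.

Air shipping may be more common for lighter, higher-value goods and sea shipping for heavier goods, but value and weight were not compared across shipping methods in the notebook.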
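Observations about value and weight by shipping method can be checked by summarizing the two measures per group and running a rank-based test. The sketch below does this on a synthetic frame. The data, the third label "Land", and the variable names are assumptions for illustration, and the real check would use the notebook's own DataFrame.

```python
import numpy as np
import pandas as pd
from scipy.stats import kruskal

# Synthetic stand-in for the trade data (assumption: not the real dataset).
# "Land" is a hypothetical label for the third shipping method.
rng = np.random.default_rng(7)
n = 600
toy = pd.DataFrame({
    "Shipping_Method": rng.choice(["Air", "Sea", "Land"], size=n),
    "Value": rng.uniform(100, 10_000, size=n),
    "Weight": rng.uniform(1, 5_000, size=n),
})

# Per-group mean, median and variance of Value and Weight.
summary = toy.groupby("Shipping_Method")[["Value", "Weight"]].agg(["mean", "median", "var"])
print(summary)

# Kruskal-Wallis test per measure: do the groups share the same distribution?
results = {}
for measure in ["Value", "Weight"]:
    groups = [g[measure].to_numpy() for _, g in toy.groupby("Shipping_Method")]
    stat, p_value = kruskal(*groups)
    results[measure] = (stat, p_value)
    print(f"{measure}: H = {stat:.3f}, p = {p_value:.3f}")
```

A small p-value for a measure would support a difference between shipping methods, while a large one would suggest the pattern is not present in the data.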

6. **Managerial Insights | Recommendations**
   
Optimizing Shipping: Shipping methods should be optimized based on the value and weight of the product. Lighter, high-value items can be sent via air, whereas bulkier items should use sea freight.

Focus on High-Value Categories: Countries and products with consistently high transaction values, such as clothing exports, should be prioritized for strategic partnerships and supply chain enhancements.

Customs Code Variance: There is significant variation in customs codes, suggesting a need for standardization or better documentation practices to streamline customs processes.

This report summarizes the main aspects of the trade transaction data, focusing on key variables like value, weight, and shipping method to provide actionable insights.