### Understanding Microshocks and Macro Variability: A Detailed Exploration

**Background**

In the complex dynamics of economic systems, the study of micro-level shocks and their macro-level implications is crucial. Firms or economic agents often experience microshocks — small, individual-level disturbances that can arise from a multitude of sources like market changes, technological innovations, or regulatory shifts. Understanding how these microshocks aggregate and manifest at a macro level, such as impacting an entire sector's volatility, is vital for economic forecasting, policy-making, and risk management.

**Microshocks and Their Aggregation**

Microshocks, in this context, are deviations of a firm's log sales from that firm's own average over time. These deviations can be modeled with different distributions, each capturing a different aspect of real-world behavior. Gaussian (normal) distributions are simple and symmetric, and represent common, everyday fluctuations. Laplace distributions are also symmetric but have a sharper peak and heavier tails, so extreme positive and negative shocks are equally likely and occur more often than under a Gaussian with the same variance. Empirical distributions, resampled from the observed deviations, are the most realistic because they carry the variability actually present in the data.
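
To make the contrast between the first two distributions concrete, the sketch below draws Gaussian and Laplace samples scaled to the same standard deviation and compares how often each lands more than three standard deviations from zero. The sample size, seed, shock scale, and the helper `tail_share` are illustrative assumptions, not taken from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
sigma, size = 0.3, 200_000      # illustrative values (assumptions)

gauss = rng.normal(0.0, sigma, size)
# A Laplace distribution with scale b has variance 2 * b**2,
# so b = sigma / sqrt(2) gives the same standard deviation as the Gaussian.
lapl = rng.laplace(0.0, sigma / np.sqrt(2), size)

def tail_share(x, k=3.0):
    """Fraction of draws further than k standard deviations from zero."""
    return np.mean(np.abs(x) > k * sigma)

print("std       :", gauss.std(), lapl.std())
print("P(|x| > 3s):", tail_share(gauss), tail_share(lapl))
```

Both samples have nearly the same standard deviation, but the Laplace sample should show a noticeably larger share of extreme draws.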

The aggregation of these microshocks gives insight into how individual firm-level variances accumulate to create sectoral or market-level volatility. This aggregation is not merely a sum of individual variances but a complex interaction that can lead to amplification or dampening of overall volatility.
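
The dampening case can be sketched in a few lines. Assuming independent Gaussian shocks and equally weighted firms (both simplifying assumptions), the volatility of the aggregate falls roughly as one over the square root of the number of firms. The function `aggregate_std` and all parameter values are hypothetical; amplification, for example through unequal firm sizes or correlated shocks, is not covered here.

```python
import numpy as np

rng = np.random.default_rng(1)
sigma, T = 0.3, 2_000  # illustrative shock scale and number of periods

def aggregate_std(n):
    """Std over time of the average of 10**shock across n equally weighted firms."""
    shocks = rng.normal(0.0, sigma, (n, T))
    return np.power(10, shocks).mean(axis=0).std()

for n in (5, 50, 500):
    print(n, aggregate_std(n))
```

Each tenfold increase in the number of firms should cut the aggregate standard deviation by a factor of about three.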


1. **Introduction and Setup**
   - Import necessary libraries and set display options.
   - A brief description of the notebook's purpose and the data being analyzed.


In [None]:
import pandas as pd
import numpy as np


from IPython.display import display, HTML



2. **Data Loading and Initial Exploration**
   - Load the dataset and provide a brief overview.
   - Initial exploration and preprocessing of the data.
   - Generate summary statistics to understand the dataset better.

In [16]:
df = pd.read_csv('./../../../data/processed/ID_Y.csv')
# df = pd.read_csv('./../../data/processed/.csv')  # Alternative path, if needed

# Display basic information about the dataset
print("Dataset dimensions:", df.shape)
print("Column names:", df.columns.tolist())
print("First few rows of the dataset:")
display(df.head())

# Keep rows where IMPORT equals Mbool (0), then sum VART by firm (ID) and year
Mbool = 0
sales = df.loc[df.IMPORT == Mbool].groupby(['ID', 'YEAR'])['VART'].sum().unstack()

# Sort firms by total sales over all years, ascending
sales = sales.loc[sales.sum(axis=1).sort_values().index]

# Summary statistics of the sales data
sales_summary = sales.describe()
print("Summary Statistics of Sales Data:")
print(sales_summary)


Dataset dimensions: (3745743, 5)
Column names: ['ID', 'IMPORT', 'YEAR', 'VART', 'VFTE']
First few rows of the dataset:


| | ID | IMPORT | YEAR | VART | VFTE |
|---|---|---|---|---|---|
| 0 | 0 | 0 | 1997 | 39221936 | 9663564 |
| 1 | 0 | 1 | 1997 | 45264143 | 17060750 |
| 2 | 215 | 0 | 1997 | 656617 | 656593 |
| 3 | 223 | 1 | 1997 | 335002 | 335387 |
| 4 | 330 | 0 | 1997 | 23402 | 23402 |


Summary Statistics of Sales Data:
YEAR           1997          1998          1999          2000          2001  \
count  1.092510e+05  1.121870e+05  1.136910e+05  1.154570e+05  1.144670e+05   
mean   2.317227e+06  2.403965e+06  2.464215e+06  2.794554e+06  2.864126e+06   
std    5.027168e+07  5.550315e+07  5.787878e+07  6.421588e+07  6.530466e+07   
min    1.000000e+03  1.000000e+03  1.000000e+03  1.000000e+03  1.000000e+03   
25%    6.857500e+03  6.860000e+03  6.893500e+03  7.104000e+03  7.035000e+03   
50%    3.787400e+04  3.864700e+04  3.814900e+04  4.014400e+04  4.009400e+04   
75%    2.514745e+05  2.606820e+05  2.614915e+05  2.777180e+05  2.951255e+05   
max    1.072210e+10  1.340463e+10  1.421550e+10  1.514020e+10  1.440674e+10   

YEAR           2002          2003          2004          2005          2006  \
count  1.141060e+05  1.115990e+05  1.097970e+05  1.087200e+05  1.077490e+05   
mean   2.850487e+06  2.856554e+06  3.048637e+06  3.264325e+06  3.608280e+06   
std    6.194888e+


3. **Sales Data Analysis**
   - Detailed analysis of sales data.
   - Calculation of logarithmic sales and their distribution.
   - Examination of sales size and partitioning into quantiles.


In [19]:
# Base-10 logarithm of sales
logsales = np.log10(sales)

# Demean the log sales: subtract each firm's own mean across years
demlogsales = logsales.subtract(logsales.mean(axis=1), axis=0)

# Total sales per firm over all years, sorted ascending
sizes = sales.sum(axis=1).sort_values()

# Partition firms into Q bins of equal cumulative sales share (the "quantiles" below)
Q = 10  # Number of quantiles
parts = pd.cut(sizes.cumsum()/sizes.sum(), Q, labels=range(Q))
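
Because the bins are cut on the cumulative share of sales rather than on firm rank, each bin holds roughly the same amount of sales but a very different number of firms. The sketch below illustrates this on hypothetical heavy-tailed sizes; `toy_sizes`, `toy_parts`, and the Pareto parameter are assumptions standing in for the real `sizes`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
# Hypothetical heavy-tailed firm sizes, sorted ascending like `sizes` above.
toy_sizes = pd.Series(np.sort(rng.pareto(1.1, 10_000) + 1.0))

Q = 10
cum_share = toy_sizes.cumsum() / toy_sizes.sum()
# Equal-width bins on the cumulative sales share, labelled 0 .. Q-1.
toy_parts = pd.cut(cum_share, Q, labels=range(Q))

# Number of firms per bin: many small firms in bin 0, very few large ones at the top.
counts = toy_parts.value_counts().sort_index()
print(counts)
```

With a heavy-tailed size distribution, some upper bins may even be empty when a single firm accounts for more than one bin's worth of sales.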



4. **Effective Quantile Analysis**
   - Compute and analyze the effective number of quantiles.
   - Explore how sales data is distributed across these quantiles.


In [20]:
# Effective number of firms per quantile: non-missing observations per year, averaged over years
eff_nq = sales.groupby(parts).count().mean(1).round().astype(int)
print("Effective number of data points in each quantile:")
print(eff_nq)


Effective number of data points in each quantile:
0    98643
1     4212
2     1414
3      635
4      319
5      168
6       92
7       46
8       17
9        5
dtype: int64


5. **Microshock Analysis**
   - Analyze the standard deviation within parts.
   - Discuss microshocks in the context of the data and their implications.

In [8]:
## Microshocks
demlogsales['parts'] = parts
# Keep only firms with more than two yearly observations (the last column is 'parts')
std_data = demlogsales.loc[demlogsales.iloc[:, :-1].count(axis=1) > 2]

# Reshape to long format: one demeaned log-sales value per (ID, parts, YEAR)
std_info = std_data.reset_index().set_index(['ID', 'parts']).stack()
# Standard deviation within each quantile
std_q = std_info.groupby(level='parts').std()
# Standard deviation pooled over all quantiles
avg_std = std_info.std()

# Displaying the results
print("Pooled standard deviation across quantiles:", avg_std)
display(std_q)

# emp_nqs = np.round(nq.sort_values()).astype(int)

# Extracting empirical shocks
emp_shocks = std_info.values

# Average value of empirical shocks
print("Average of empirical shocks:", emp_shocks.mean())

# I don't know the possible mus, because for every firm I subtracted the observed mean value.
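
The point in the last comment can be seen on a toy panel: once each firm's own mean is subtracted, the pooled deviations average zero by construction, so the data carry no information about a non-zero `mu`. The panel below is hypothetical (random lognormal sales for four firms over six years), and the names `toy_sales`, `toy_dem`, and `pooled` are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
# Hypothetical panel: 4 firms x 6 years of sales, with one missing observation.
toy_sales = pd.DataFrame(rng.lognormal(10, 1, (4, 6)),
                         index=pd.Index(range(4), name="ID"),
                         columns=range(2000, 2006))
toy_sales.iloc[1, 2] = np.nan

toy_log = np.log10(toy_sales)
toy_dem = toy_log.subtract(toy_log.mean(axis=1), axis=0)  # demean each firm

pooled = toy_dem.stack().dropna()  # long format without the missing value
print("pooled mean:", pooled.mean())  # zero up to floating-point error
print("pooled std :", pooled.std())
```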

**The Experiment**

The experiment in the code below simulates these phenomena. It creates synthetic microshocks for a set of firms over time, drawn from three distributions (Gaussian, Laplace, empirical), and then aggregates them to observe the resulting macro-level variance.

**Key Components of the Experiment**

1. **Generation of Microshocks**: Microshocks are generated for each firm in the dataset, varying in intensity and distribution. The parameters 'mu' (mean) and 'sigma' (standard deviation) are varied to simulate different scenarios. The 'empirical' shocks are drawn from actual data to mimic real-world conditions closely.

2. **Aggregation Process**: The microshocks are aggregated to understand their cumulative effect. This step is crucial as it mimics the real-world scenario where individual firm-level disturbances contribute to the overall sectoral or market volatility.

3. **Analysis of Macro Variability**: The aggregated results are analyzed in terms of mean, standard deviation, and variance. These metrics provide insights into the overall impact of microshocks on the macro economy. The log ratios help in understanding the multiplicative effect of these shocks.
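
In symbols, the aggregation and analysis steps can be written as follows. The notation ($\varepsilon_{it}$, $R_t$, $\ell_t$) is introduced here for illustration and does not appear in the notebook.

$$
R_t = \frac{1}{n}\sum_{i=1}^{n} 10^{\varepsilon_{it}}, \qquad \ell_t = \log_{10} R_t, \qquad t = 1, \dots, T
$$

Here $\varepsilon_{it}$ is the microshock of firm $i$ in period $t$ and $n$ is the number of firms. The reported metrics are the mean, standard deviation, and variance of $R_t$ and of $\ell_t$ across the $T$ periods. For Gaussian shocks with mean $\mu$ and standard deviation $\sigma$, each term $10^{\varepsilon_{it}}$ is lognormal, so

$$
\mathbb{E}\left[10^{\varepsilon_{it}}\right] = 10^{\mu}\exp\left(\frac{(\sigma \ln 10)^2}{2}\right),
$$

which exceeds $10^{\mu}$ whenever $\sigma > 0$. Symmetric shocks in logs therefore raise the expected aggregate ratio above its no-shock level, which is the multiplicative effect the log ratios help to expose.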

**Purpose of Precision and Specificity**

Being precise and specific in this study allows for a nuanced understanding of economic dynamics. It helps in:

- **Identifying the Impact of Different Shock Types**: By comparing different distributions, we can understand which types of shocks (common vs. extreme) have more significant impacts on the macro economy.


6. **Parameter Setup for Experiments**
   - Define and explain the parameters used in the experiments (e.g., `mus`, `ss`, `M`, `T`).
   - Discuss the rationale behind these parameter choices.


In [None]:
import numpy as np
import pandas as pd

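def generate_shocks(distribution, mu, sigma, n, T, emp_shocks=None):
    """Return an (n, T) array of shocks centered on mu with standard deviation sigma."""
    if distribution == 'norm':
        return np.random.normal(mu, sigma, (n, T))
    elif distribution == 'lapl':
        # Laplace variance is 2 * scale**2, so scale = sigma / sqrt(2) gives std sigma
        return np.random.laplace(mu, sigma / np.sqrt(2), (n, T))
    elif distribution == 'emp':
        # Resample the observed shocks with replacement and rescale them to std sigma
        s0 = emp_shocks.std()
        return (mu + np.random.choice(emp_shocks, n * T) * (sigma / s0)).reshape(n, T)
    raise ValueError(f"Unknown distribution: {distribution!r}")

def calculate_ratios(shocks, n):
    """Summarize the aggregate ratio and its log10 across the T periods."""
    # Per-period average of 10**shock over the n firms
    ratio = np.power(10, shocks).sum(0) / n
    log_ratio = np.log10(ratio)
    return ratio.mean(), ratio.std(), ratio.var(), log_ratio.mean(), log_ratio.std(), log_ratio.var()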

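# Parameters
Q = 10
# Analysis parameters
partition = eff_nq.astype(int)  # Number of firms per quantile
ss = np.arange(0.1, 0.8, 0.1)  # Shock scales (sigma): 0.1, 0.2, ..., 0.7
# ASSUMPTION: `mus` is used in the experiment loop but is not defined in the cells shown.
# The saved results contain mu = 0.0 and mu = 0.1, and their row count implies five values;
# the evenly spaced grid below is a placeholder, not the original definition.
mus = np.linspace(0.0, 0.1, 5)  # Shock means (assumed values)
M = 200  # Number of simulations
T = 17   # Time periods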
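
The empirical branch of `generate_shocks` resamples observed shocks with replacement and rescales them by `sigma / s0`. The sketch below checks, on a hypothetical heavy-tailed stand-in for `emp_shocks`, that this rescaling yields draws with standard deviation close to `sigma`. The helper `resample_scaled`, the Student-t stand-in, and all parameter values are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
# Stand-in for the empirical shocks: a hypothetical heavy-tailed sample.
toy_emp = rng.standard_t(df=6, size=50_000) * 0.2

def resample_scaled(pool, mu, sigma, n, T):
    """Draw n*T shocks from `pool` with replacement, rescaled to std `sigma`."""
    draws = rng.choice(pool, size=n * T, replace=True)
    return (mu + draws * (sigma / pool.std())).reshape(n, T)

shocks = resample_scaled(toy_emp, mu=0.0, sigma=0.5, n=2_000, T=17)
print(shocks.shape, shocks.std())
```

The rescaling changes only the spread. Any non-zero mean in the pool is scaled by the same factor and added to `mu`, which is harmless when the pool has been demeaned.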



7. **Experimentation with Different Distributions**
   - Conduct experiments with different distributions (Gaussian, Laplace, empirical).
   - Explore the impact of these distributions on the results.

8. **Result Compilation and Export**
   - Compile the results from the experiments into a DataFrame.
   - Export the results to a CSV file for further analysis or reporting.


In [21]:

results = []
for dist in ['norm', 'lapl', 'emp']:
    print(f"Distribution: {dist}")
    # Skip quantile 0 (the smallest firms, by far the largest group)
    for q, n in enumerate(partition.values[1:], start=1):
        print(f"Quantile: {q}, N: {n}")
        for s in ss:
            for mu in mus:
                for m in range(M):
                    shocks = generate_shocks(dist, mu, s, n, T, emp_shocks=emp_shocks if dist == 'emp' else None)
                    mean_ratio, std_ratio, var_ratio, mean_log_ratio, std_log_ratio, var_log_ratio = calculate_ratios(shocks, n)
                    results.append([dist, s, mu, n, m, mean_ratio, std_ratio, var_ratio, mean_log_ratio, std_log_ratio, var_log_ratio])

result_df = pd.DataFrame(results, columns=['dist', 's', 'mu', 'nq', 'repeat', 'mean_ratio', 'std_ratio', 'var_ratio', 'mean_log_ratio', 'std_log_ratio', 'var_log_ratio'])


Quantile: 1, N: 4212
Quantile: 2, N: 1414
Quantile: 3, N: 635
Quantile: 4, N: 319
Quantile: 5, N: 168
Quantile: 6, N: 92
Quantile: 7, N: 46
Quantile: 8, N: 17
Quantile: 9, N: 5
Quantile: 1, N: 4212
Quantile: 2, N: 1414
Quantile: 3, N: 635
Quantile: 4, N: 319
Quantile: 5, N: 168
Quantile: 6, N: 92
Quantile: 7, N: 46
Quantile: 8, N: 17
Quantile: 9, N: 5
Quantile: 1, N: 4212
Quantile: 2, N: 1414
Quantile: 3, N: 635
Quantile: 4, N: 319
Quantile: 5, N: 168
Quantile: 6, N: 92
Quantile: 7, N: 46
Quantile: 8, N: 17
Quantile: 9, N: 5


In [22]:
result_df.to_csv('./../../../data/processed/microshocks.csv', index=False)
# result.to_csv('./experiment_3.csv', index = False)
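
Once the results are saved, one way to read them is to average the per-run volatility over repetitions for each configuration. The sketch below does this on a small synthetic stand-in (three firm counts, 50 repetitions, Gaussian shocks only). The names `toy_results` and `summary` are hypothetical, and the numbers it prints are not the notebook's results.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
rows = []
for n in (5, 46, 319):          # a few of the firm counts used above
    for repeat in range(50):    # fewer repetitions than M, to keep this quick
        shocks = rng.normal(0.0, 0.3, (n, 17))
        log_ratio = np.log10(np.power(10, shocks).mean(axis=0))
        rows.append(("norm", 0.3, n, repeat, log_ratio.std()))

toy_results = pd.DataFrame(rows, columns=["dist", "s", "nq", "repeat", "std_log_ratio"])

# Average the per-run volatility over repetitions for each configuration.
summary = toy_results.groupby(["dist", "s", "nq"])["std_log_ratio"].mean()
print(summary)
```

The same `groupby` applied to `result_df` would give one average per distribution, shock scale, and firm count.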

In [25]:
result_df.head()

| | dist | s | mu | nq | repeat | mean_ratio | std_ratio | var_ratio | mean_log_ratio | std_log_ratio | var_log_ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | norm | 0.1 | 0.0 | 4212 | 0 | 1.026879 | 0.004687 | 2.2e-05 | 0.011515 | 0.001982 | 4e-06 |
| 1 | norm | 0.1 | 0.0 | 4212 | 1 | 1.026152 | 0.003554 | 1.3e-05 | 0.011209 | 0.001506 | 2e-06 |
| 2 | norm | 0.1 | 0.0 | 4212 | 2 | 1.027322 | 0.00464 | 2.2e-05 | 0.011702 | 0.00196 | 4e-06 |
| 3 | norm | 0.1 | 0.0 | 4212 | 3 | 1.026874 | 0.003201 | 1e-05 | 0.011515 | 0.001354 | 2e-06 |
| 4 | norm | 0.1 | 0.0 | 4212 | 4 | 1.027669 | 0.00365 | 1.3e-05 | 0.011851 | 0.001542 | 2e-06 |
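
These first rows can be checked against the closed form for Gaussian shocks. Assuming independent shocks with mean zero, `10**shock` is lognormal, which gives the expected ratio and the standard deviation of an average over `n` firms directly. The variable names below are illustrative; the configuration (`sigma = 0.1`, `n = 4212`) is the one shown in the rows above.

```python
import numpy as np

sigma, n = 0.1, 4212          # first configuration in the output above (mu = 0)
s = sigma * np.log(10)        # shock scale in natural-log units

# Lognormal moments of 10**eps with eps ~ N(0, sigma**2)
mean_theory = np.exp(s**2 / 2)
std_single = np.sqrt((np.exp(s**2) - 1) * np.exp(s**2))
std_theory = std_single / np.sqrt(n)  # std of the average over n independent firms

print(mean_theory)            # about 1.0269
print(std_theory)             # about 0.0037
print(np.log10(mean_theory))  # about 0.0115
```

The simulated `mean_ratio` and `mean_log_ratio` sit close to these values. The simulated `std_ratio` scatters around the theoretical one because each run estimates it from only 17 periods.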


In [24]:
result_df.tail()

| | dist | s | mu | nq | repeat | mean_ratio | std_ratio | var_ratio | mean_log_ratio | std_log_ratio | var_log_ratio |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 188995 | emp | 0.7 | 0.1 | 5 | 195 | 3.817261 | 3.622178 | 13.120174 | 0.419501 | 0.363732 | 0.132301 |
| 188996 | emp | 0.7 | 0.1 | 5 | 196 | 3.252686 | 1.971284 | 3.885962 | 0.417249 | 0.307898 | 0.094801 |
| 188997 | emp | 0.7 | 0.1 | 5 | 197 | 2.271387 | 1.444192 | 2.08569 | 0.272243 | 0.272949 | 0.074501 |
| 188998 | emp | 0.7 | 0.1 | 5 | 198 | 2.403494 | 2.003725 | 4.014913 | 0.292215 | 0.253133 | 0.064076 |
| 188999 | emp | 0.7 | 0.1 | 5 | 199 | 5.959136 | 8.887765 | 78.992362 | 0.495335 | 0.433848 | 0.188224 |
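
The last index also allows a quick consistency check on the experiment grid. The definition of `mus` does not appear in the cells shown, but the row count pins down how many values it must have had. The arithmetic below uses only numbers visible above: 3 distributions, 9 quantiles, 7 shock scales, and `M = 200` repetitions.

```python
n_dists, n_quantiles, n_scales, M = 3, 9, 7, 200
n_rows = 188_999 + 1  # last index shown by result_df.tail(), plus one

runs_without_mu = n_dists * n_quantiles * n_scales * M
n_mus, remainder = divmod(n_rows, runs_without_mu)
print(runs_without_mu, n_mus, remainder)  # 37800 5 0
```

The grid therefore used five values of `mu`, of which 0.0 and 0.1 are visible in the head and tail of the results.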
