# Task
Simulate rolling a die a variable number of times (from 1 to 50) and repeating each simulation 100, 500, 1000, 5000, 10000, 50000, and 100000 times. For each combination of die rolls and repetitions, calculate the sum of the rolls (excluding the case of a single die roll), plot and save a histogram of the sums, and calculate the mean and variance of the sums.

## Define parameters

### Subtask:
Define the range of die rolls (1 to 50) and the number of experiments for each die roll count (100, 500, 1000, 5000, 10000, 50000, and 100000).


**Reasoning**:
Define the range of die rolls and the list of experiment counts as specified in the instructions.



In [None]:
num_die_rolls = range(1, 51)
num_experiments = [100, 500, 1000, 5000, 10000, 50000, 100000]

## Simulation function

### Subtask:
Create a function that simulates rolling a die a specified number of times and returns the sum.


**Reasoning**:
Define the function `simulate_die_rolls` to simulate rolling a die a specified number of times and return the sum.



In [None]:
import random

def simulate_die_rolls(num_rolls):
  """Simulates rolling a die a specified number of times and returns the sum.

  Args:
    num_rolls: The number of times to roll the die.

  Returns:
    The sum of the simulated die rolls.
  """
  total_sum = 0
  for _ in range(num_rolls):
    total_sum += random.randint(1, 6)
  return total_sum
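
The per-roll loop above is easy to read but slow at the larger experiment counts. As a possible alternative, the sketch below draws every roll for one (rolls, experiments) combination in a single NumPy call. The function name `simulate_sums_vectorized` and its `seed` parameter are assumptions made for this illustration and are not part of the notebook.

```python
import numpy as np

def simulate_sums_vectorized(num_rolls, num_experiments, seed=None):
    """Return an array holding one sum of `num_rolls` die rolls per experiment."""
    rng = np.random.default_rng(seed)
    # integers() excludes the upper bound, so high=7 yields faces 1..6.
    rolls = rng.integers(1, 7, size=(num_experiments, num_rolls))
    # Sum across the roll axis: one total per experiment (row).
    return rolls.sum(axis=1)

sums = simulate_sums_vectorized(num_rolls=10, num_experiments=1000, seed=0)
print(sums.shape, sums.min(), sums.max())
```

The largest case (100000 experiments of 50 rolls) allocates a 100000 x 50 integer array, which is roughly 40 MB with 64-bit integers, so memory should not be a concern on a typical machine.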

## Run experiments

### Subtask:
Iterate through the range of die rolls and the number of experiments, calling the simulation function and storing the sums.


**Reasoning**:
Initialize a dictionary to store the simulation results and then iterate through the specified ranges of die rolls and experiments, calling the simulation function and storing the sums in the dictionary.



In [None]:
simulation_results = {}

for rolls in num_die_rolls:
    simulation_results[rolls] = {}
    for experiments in num_experiments:
        sums = []
        for _ in range(experiments):
            sums.append(simulate_die_rolls(rolls))
        simulation_results[rolls][experiments] = sums
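
To get a feel for how much work the nested loops above perform, the following sketch counts the combinations, the stored sums, and the individual die rolls implied by the parameters defined earlier. It only repeats those parameters and does arithmetic on them.

```python
num_die_rolls = range(1, 51)
num_experiments = [100, 500, 1000, 5000, 10000, 50000, 100000]

# One list of sums per (rolls, experiments) pair.
combinations = len(num_die_rolls) * len(num_experiments)
# Every experiment stores one sum, for every roll count.
stored_sums = len(num_die_rolls) * sum(num_experiments)
# An experiment with n rolls calls random.randint n times.
individual_rolls = sum(num_die_rolls) * sum(num_experiments)

print(combinations)      # 350
print(stored_sums)       # 8330000
print(individual_rolls)  # 212415000
```

With over 200 million calls to `random.randint`, this cell is likely to be the slowest step of the notebook.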

## Calculate mean and variance

### Subtask:
For each set of sums (corresponding to a specific number of die rolls and experiments), calculate the mean and variance.


**Reasoning**:
Import the numpy library and iterate through the simulation results to calculate the mean and variance for each set of sums, storing the results in a new dictionary.



In [None]:
import numpy as np

analysis_results = {}

for rolls, experiments_data in simulation_results.items():
    analysis_results[rolls] = {}
    for experiments, sums in experiments_data.items():
        # The single-roll case is excluded from the mean/variance analysis
        if rolls > 1:
            mean_sum = np.mean(sums)
            variance_sum = np.var(sums)  # population variance (ddof=0)
            analysis_results[rolls][experiments] = {'mean': mean_sum, 'variance': variance_sum}
        else:
            # Keep placeholder entries so every (rolls, experiments) key exists
            analysis_results[rolls][experiments] = {'mean': None, 'variance': None}
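
One detail worth knowing about the cell above: `np.var` divides by N by default (population variance). Passing `ddof=1` divides by N - 1 instead (sample variance). The small example below, using made-up values rather than simulation output, shows the difference. At 100 or more experiments the two estimates differ by at most about 1%.

```python
import numpy as np

values = [1, 2, 3, 4]  # illustrative numbers, not simulation results

population_variance = np.var(values)      # divides by N
sample_variance = np.var(values, ddof=1)  # divides by N - 1

print(population_variance)  # 1.25
print(round(sample_variance, 4))  # 1.6667
```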

## Plot histograms

### Subtask:
For each set of sums, plot a histogram and save it. There will be 7 histograms (one per experiment count) for each number of die rolls from 2 to 50, since the single-roll case is skipped, resulting in a total of 49 * 7 = 343 histograms.


**Reasoning**:
Iterate through the simulation results, plot histograms for sums when the number of rolls is greater than 1, add titles and labels, save the plots, and close them.



In [None]:
import matplotlib.pyplot as plt

for rolls, experiments_data in simulation_results.items():
    for experiments, sums in experiments_data.items():
        if rolls > 1:
            plt.figure(figsize=(10, 6))
            plt.hist(sums, bins='auto', edgecolor='black')
            plt.title(f"Histogram of Sums for {rolls} Die Rolls ({experiments} Experiments)")
            plt.xlabel("Sum of Die Rolls")
            plt.ylabel("Frequency")
            plt.savefig(f"histogram_rolls_{rolls}_experiments_{experiments}.png")
            plt.close()
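
Because the sums are integers, `bins='auto'` can produce bins that cover an uneven number of integer values, which shows up as artificial spikes or gaps. One possible remedy, sketched below, is to place bin edges at half-integers so that each possible sum gets its own bar. The helper names `integer_bin_edges` and `plot_sum_histogram` are hypothetical and do not appear in the notebook.

```python
def integer_bin_edges(sums):
    """Return edges at half-integers so each integer sum gets its own bin."""
    low, high = min(sums), max(sums)
    return [value - 0.5 for value in range(low, high + 2)]

def plot_sum_histogram(sums, path):
    """Plot and save a histogram with one bar per integer sum."""
    # Imported here so integer_bin_edges works without matplotlib installed.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(10, 6))
    ax.hist(sums, bins=integer_bin_edges(sums), edgecolor="black")
    ax.set_xlabel("Sum of Die Rolls")
    ax.set_ylabel("Frequency")
    fig.savefig(path)
    plt.close(fig)

print(integer_bin_edges([2, 3, 5]))  # [1.5, 2.5, 3.5, 4.5, 5.5]
```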

## Organize results

### Subtask:
Store the calculated means and variances in a structured format (e.g., a dictionary or DataFrame) for easy access and analysis.


**Reasoning**:
Create a pandas DataFrame from the analysis_results dictionary and then restructure it to have separate columns for mean and variance for easier analysis.



In [None]:
import pandas as pd

# Create a DataFrame from the analysis_results dictionary
analysis_df = pd.DataFrame.from_dict(analysis_results, orient='index')

# Restructure the DataFrame
# Stack the experiments columns to long format
analysis_df_stacked = analysis_df.stack().reset_index()
analysis_df_stacked.columns = ['num_rolls', 'num_experiments', 'metrics']

# Expand the metrics dictionary into separate columns
analysis_df_restructured = pd.json_normalize(analysis_df_stacked['metrics']).set_index(analysis_df_stacked['num_rolls'])
analysis_df_restructured['num_experiments'] = analysis_df_stacked['num_experiments'].values

# Reorder columns
analysis_df_restructured = analysis_df_restructured[['num_experiments', 'mean', 'variance']]

display(analysis_df_restructured.sample(10))

| num_rolls | num_experiments | mean | variance |
|---|---|---|---|
| 41 | 10000 | 143.444 | 119.375664 |
| 9 | 5000 | 31.566 | 25.695644 |
| 32 | 5000 | 112.1162 | 94.850298 |
| 5 | 10000 | 17.5388 | 14.516495 |
| 10 | 10000 | 34.9658 | 29.73103 |
| 20 | 100 | 70.83 | 63.1611 |
| 26 | 50000 | 91.0106 | 75.084888 |
| 29 | 500 | 102.164 | 91.297104 |
| 13 | 5000 | 45.5278 | 37.968027 |
| 16 | 5000 | 55.9946 | 47.028571 |
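
The stack-and-normalize steps in the cell above can also be replaced by building one flat record per (rolls, experiments) pair. This is a sketch of that alternative, not the notebook's method. The small `analysis_results` dictionary here is a stand-in with placeholder numbers, and `tidy_df` is a name chosen for this example.

```python
import pandas as pd

# Stand-in for the notebook's analysis_results; the numbers are placeholders.
analysis_results = {
    2: {100: {"mean": 7.1, "variance": 5.9}, 500: {"mean": 6.9, "variance": 5.8}},
    3: {100: {"mean": 10.4, "variance": 8.6}, 500: {"mean": 10.5, "variance": 8.8}},
}

# Flatten the nested dictionary into one record per combination.
records = [
    {"num_rolls": rolls, "num_experiments": experiments, **metrics}
    for rolls, by_experiments in analysis_results.items()
    for experiments, metrics in by_experiments.items()
]

tidy_df = pd.DataFrame(records).set_index("num_rolls")
print(tidy_df)
```

The resulting frame has the same layout as `analysis_df_restructured`: `num_rolls` as the index and `num_experiments`, `mean`, and `variance` as columns.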


## Summary:

### Data Analysis Key Findings

*   The simulation successfully generated sums for rolling a die a variable number of times (2 to 50) across various numbers of experiments (100 to 100,000).
*   Histograms were generated and saved for each combination of die rolls (greater than 1) and number of experiments, totaling 49 * 7 = 343 plots.
*   The mean and variance of the simulated sums were calculated for each combination of die rolls (greater than 1) and experiments.
*   The calculated means and variances were organized into a pandas DataFrame for structured storage and analysis.

### Insights or Next Steps

*   Analyze how the mean and variance of the sums change as the number of die rolls and the number of experiments increase.
*   Compare the calculated means and variances to the theoretically expected values for the sum of n fair die rolls (mean 3.5n, variance 35n/12) to observe the effects of the Law of Large Numbers as the number of experiments grows.
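
As a starting point for the comparison suggested above, the sketch below computes the theoretical mean and variance of the sum of n fair six-sided die rolls. A single roll has mean 3.5 and variance 35/12, and both scale linearly with n for independent rolls. The function names are assumptions made for this example. The simulated values are copied from three rows of the sampled output shown earlier.

```python
def theoretical_mean(num_rolls):
    """Expected sum of `num_rolls` fair six-sided die rolls."""
    return 3.5 * num_rolls

def theoretical_variance(num_rolls):
    """Variance of the sum of `num_rolls` independent fair die rolls."""
    return num_rolls * 35 / 12

# (num_rolls, num_experiments, simulated mean, simulated variance)
sampled_rows = [
    (10, 10000, 34.9658, 29.73103),
    (20, 100, 70.83, 63.1611),
    (41, 10000, 143.444, 119.375664),
]

for rolls, experiments, sim_mean, sim_var in sampled_rows:
    print(
        f"rolls={rolls:>2} experiments={experiments:>6} "
        f"mean {sim_mean:.3f} vs {theoretical_mean(rolls):.3f}, "
        f"variance {sim_var:.3f} vs {theoretical_variance(rolls):.3f}"
    )
```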


In [None]:
analysis_df_restructured.to_csv('analysis_results.csv')