# Table of contents
- [<font color='grey'>Pre-Section 1. What is hacker statistics</font>](#pre-sectionA)

- [<font color='grey'>Pre-Section 2. Why hacker statistics?</font>](#pre-section2)

- [Section 1. Statistical inference with hacker statistics](#section1)
    - [1.1. Bootstrap confidence interval](#1-1)
    
    - [1.2. Hypothesis test](#1-2)
    
        - [<font color='grey'>1.2.1. Data pre-processing</font>](#1-2-1)
    
        - [1.2.2. Permutation test](#1-2-2)
    
        - [1.2.3. Bonus: Bootstrap hypothesis test](#1-2-3)

# Pre-Section 1. What is hacker statistics <a name="pre-sectionA"></a>

### Import all the modules

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#import dc_stat_think as dcst

plt.rcParams['figure.figsize'] = (10, 5)

### Coin tossing example  

In [None]:
# set the seed
np.random.seed(4)

# flip one coin
random_number = np.random.random(size = 1)
random_number

In [None]:
# See if the coin flip is a head or tail 
head = random_number < 0.5
head

In [None]:
# Flip the coin 10000 times in a for loop and estimate the probability of heads
num_heads = 0 # Initialize the number of heads to zero
for _ in range(10000):
    head = np.random.random() < 0.5
    if head:
        num_heads += 1
num_heads/10000
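
The same experiment can be run without an explicit loop. The following is a minimal sketch, assuming NumPy's `default_rng` generator (not used elsewhere in this notebook) and an arbitrarily chosen seed; it draws all 10000 flips at once and averages the boolean results:

```python
import numpy as np

rng = np.random.default_rng(4)  # seed chosen arbitrarily for reproducibility

# Draw 10000 uniform numbers in [0, 1) and mark those below 0.5 as heads
flips = rng.random(10000) < 0.5

# The mean of a boolean array is the fraction of True values
prob_heads = flips.mean()
print(prob_heads)
```

The result should be close to 0.5, just like the loop version, but the vectorized form is usually much faster for large numbers of flips.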

### Speed of Light example

In [None]:
import os
os.getcwd()

In [None]:
# Read in the light speed data
df = pd.read_csv("../data/light_speed.csv", names = ["speed"])
df.head()

In [None]:
# Save the 100 light speed measurements into a numpy array `light_speed`
light_speed = np.array(df['speed'])
light_speed

In [None]:
# The "square root rule" is a commonly-used rule of thumb for choosing number of bins: 
# choose the number of bins to be the square root of the number of samples. 
n_bins = int(np.sqrt(len(light_speed)))
print(n_bins)

_ = plt.hist(light_speed, bins = n_bins)
_ = plt.axvline(x=np.mean(light_speed), color='r', linestyle='-', alpha = 1)
_ = plt.xlabel('Speed of light')
_ = plt.ylabel('Counts')
plt.show()

In [None]:
mean_speed = np.mean(light_speed)
print(mean_speed)

But can the mean or median of these 100 measurements represent the real speed of light? What if we repeat the experiment: will the mean or median speed be the same? What if we repeat it again, and again?

In [None]:
for i in range(50):
    bs_sample = np.random.choice(light_speed, size = len(light_speed))
    bs_mean = np.mean(bs_sample)
    _ = plt.hist(bs_sample, alpha = 0.2, color = 'grey')
    _ = plt.axvline(x=bs_mean, color='r', linestyle='-', alpha = 0.2)
_ = plt.xlabel('Speed of light')
_ = plt.ylabel('Counts')
plt.show()

It is often hard to repeat an experiment over and over again to get the "true" statistics of a population. This is where hacker statistics comes in handy. 

**Hacker statistics** uses simulated repeated measurements to learn more about the data. The basic idea is that instead of literally repeating the data acquisition over and over again, we can simulate those repeated measurements through **resampling from the original dataset**.
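
As a rough illustration of what "resampling from the original dataset" means, here is a small sketch. The `measurements` array is hypothetical synthetic data standing in for a real experiment, and `default_rng` with its seed is an assumption made for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for 100 measurements from one experiment
measurements = rng.normal(loc=300.0, scale=10.0, size=100)

# One simulated "repeat" of the experiment:
# draw as many values as we measured, with replacement
resample = rng.choice(measurements, size=len(measurements), replace=True)

# Every resampled value comes from the original data; some appear more than once
print(np.isin(resample, measurements).all())
print(measurements.mean(), resample.mean())
```

The two means differ slightly, which mimics the variation we would expect if the experiment were actually repeated.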

# Pre-Section 2. Why hacker statistics? <a name="pre-section2"></a>

Pros: 

- Resampling based methods do not require the population of interest to be normally distributed

- Can be applied to complex estimators

- Easy to understand and implement


Cons: 

- All the resampled samples are drawn from the initially observed sample. Thus the observed sample should be randomly drawn from the population

- Can be computationally expensive
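
To make the first two pros concrete, here is a hedged sketch that bootstraps an estimator with no simple textbook formula for its uncertainty (the interquartile range) on a skewed sample. The data are synthetic, and the helper `iqr` and the choice of 2000 resamples are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical skewed sample; clearly not normally distributed
sample = rng.lognormal(mean=0.0, sigma=1.0, size=200)

def iqr(x):
    """Interquartile range: 75th percentile minus 25th percentile."""
    q75, q25 = np.percentile(x, [75, 25])
    return q75 - q25

# Resample with replacement and recompute the estimator each time
reps = np.array([iqr(rng.choice(sample, size=len(sample))) for _ in range(2000)])

# Percentile interval for the interquartile range
ci = np.percentile(reps, [2.5, 97.5])
print(iqr(sample), ci)
```

The 2000 passes over the data also hint at the computational cost listed among the cons.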

# Section 1. Statistical inference with hacker statistics <a name="section1"></a>



## 1.1. Bootstrap confidence interval <a name="1-1"></a>

Note: The 100 light speed measurements from Michelson's experiment are saved in the `light_speed` variable as a numpy array.


**1. Create a bootstrap sample for Michelson's light speed data, and compute a bootstrap replicate for a given statistic:** 

In [None]:
np.random.seed(123)
bs_sample = np.random.choice(light_speed, size = len(light_speed), replace = True)
print(bs_sample)

In [None]:
np.mean(bs_sample)

In [None]:
np.var(bs_sample)

Now we know 

- how to draw a bootstrap sample from a given one dimensional sample array using `np.random.choice`, 

- and how to compute the bootstrap replicate we are interested in from that bootstrap sample with functions in the `numpy` package, such as `np.mean`. 

To make the code easier to reuse, we can wrap the steps above in a **function** that **takes** 1) the original one dimensional sample array and 2) the function used to compute the bootstrap replicate, and **returns** the computed bootstrap replicate. Let's do it! 

### Your turn: 

- **Exercise 1.1.1.** Write a function to draw one bootstrap replicate from a given one dimensional sample array with 2 inputs: sample data, function to calculate the bootstrap replicate.


*Instructions*:  

1. Use the `np.random.choice` function to generate a bootstrap sample from the given `data` array. Use the `size` parameter to give the bootstrap sample the same length as the `data` array.

    

In [None]:
# A function to generate one bootstrap replicate with given 1-d sample and statistic
def bs_replicate_1d(data, func):
    bs_sample = ____.____.____(____, size = len(data))
    return func(bs_sample)

- **Exercise 1.1.2.** Use the `bs_replicate_1d` function you just wrote to calculate the median of a bootstrap sample generated from Michelson's `light_speed` data

In [None]:
bs_replicate_1d(____, ____)

**2. Calculate bootstrap confidence interval**

**A "p percent (p$\%$) bootstrap confidence interval" means**: if we resampled the data over and over again, p percent of the resulting bootstrap replicates would lie within the interval. 
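
As a small worked example of this definition, the sketch below uses a hypothetical array of 1000 evenly spaced values as a stand-in for bootstrap replicates and checks what fraction of them falls inside the interval bounded by the 2.5th and 97.5th percentiles:

```python
import numpy as np

# Hypothetical stand-in for bootstrap replicates: the integers 0..999
values = np.arange(1000)

# Bounds of the central 95% interval
lower, upper = np.percentile(values, [2.5, 97.5])

# Fraction of the values that lie inside the interval
inside = np.mean((values >= lower) & (values <= upper))
print(lower, upper)
print(inside)  # 0.95
```

Cutting off 2.5 percent of the values at each end leaves 95 percent of them inside the interval.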

We want to repeat the resampling of the original sample (e.g., `light_speed`) over and over again, say 10000 times. For each resample, we calculate a summary statistic, the bootstrap replicate (e.g., with `np.mean`). A for loop is a good option here:

In [None]:
# Create an empty array of length 10000 
bs_replicates = np.empty(10000)

# Resample the `light_speed` data 10000 times and calculate a Bootstrap replicate each time, 
# and save each bootstrap replicate to `bs_replicates` array
for i in range(10000):
    bs_replicates[i] = bs_replicate_1d(light_speed, np.mean)

In [None]:
bs_replicates

`bs_replicates` is now an array containing 10000 means calculated from 10000 resamples of the `light_speed` array. Typically, we want to know the range in which most of the replicates fall. A **confidence interval** tells you that! Let's see how to use the `np.percentile` function to get the $95\%$ confidence interval from the `bs_replicates` array:

In [None]:
# Get the confidence interval of the mean light speed
conf_95 = np.percentile(bs_replicates, [2.5, 97.5])
print(conf_95)

In [None]:
_ = plt.hist(bs_replicates, density = True, bins = 100)
_ = plt.xlabel('mean of light speed')
_ = plt.ylabel('density')
plt.ticklabel_format(useOffset=False)
_ = plt.axvline(x=conf_95[0], color='r', linestyle='-', alpha = 0.5)
_ = plt.axvline(x=conf_95[1], color='r', linestyle='-', alpha = 0.5)

plt.show()

Now you know how to:

- generate a large number (10000) of bootstrap samples from a given data array, 
- calculate the bootstrap replicate of interest from each of these resampled samples, 
- and get the confidence interval of the repeated bootstrap replicates. 

Amazing! Now we will write a function that summarizes the first two steps listed above. Given a sample array (e.g., `light_speed`), a function to calculate the bootstrap replicate of interest (e.g., `np.mean`), and the number of times to resample (e.g., `size = 10000`), your function will return an array of bootstrap replicates. 



### Your turn: 

- **Exercise 1.1.3.** Write a function, `draw_bs_reps(data, func, size=1)`, that draws bootstrap replicates from 3 inputs: the sample data, the function used to calculate the statistic, and the number of replicates. This function will come in handy as you compute bootstrap confidence intervals in the later sections.



*Instructions*:  

1. Initialize an empty array of length `size` using the `np.empty()` function, and save it as `bs_replicates`. 


2. Start a for loop over the given range (`size`). In each iteration, generate a bootstrap replicate from the input `data` and `func` variables using the `bs_replicate_1d` function you built in the earlier section, and save the replicate to the `i`-th position of your `bs_replicates` array. 

In [None]:
# Write a function to draw bootstrap replicates with 3 inputs: sample data, function to calculate statistic, number of replicates
def draw_bs_reps(data, func, size = 1):
    """Draw Bootstrap replicates"""
    
    # Initialize an empty array for the bootstrap replicates: bs_replicates
    bs_replicates = np.empty(____)
    
    # Generate replicates
    for i in range(____):
        bs_replicates[i] = bs_replicate_1d(____, ____)
        
    return(bs_replicates)

- **Exercise 1.1.4.** Use the `draw_bs_reps` function you just created to generate an array of medians (`np.median`) as bootstrap replicates for the `light_speed` data, using `10000` bootstrap repetitions 

In [None]:
bs = draw_bs_reps(____, ____, size = 10000)
bs

In [None]:
# Get the 95% confidence interval for bs sample 
np.percentile(bs, [2.5, 97.5])

**Interpretation** of this $95\%$ bootstrap confidence interval: If we resampled the data over and over again, $95\%$ of the bootstrap medians would lie within the $95\%$ confidence interval, which is $[____, ____]$ (as calculated above). 
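
The loop-based approach used in this section is easy to read but can be slow for many replicates. As an optional alternative, here is a vectorized sketch that draws all bootstrap samples in one call. It assumes NumPy's `default_rng` generator and uses a hypothetical synthetic `data` array rather than the notebook's `light_speed`:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical sample of 100 measurements
data = rng.normal(loc=50.0, scale=5.0, size=100)
n_reps = 10000

# Each row is one bootstrap sample of the same length as the data
bs_samples = rng.choice(data, size=(n_reps, len(data)), replace=True)

# One bootstrap replicate (the mean) per row
bs_means = bs_samples.mean(axis=1)

# 95% percentile confidence interval of the mean
ci_95 = np.percentile(bs_means, [2.5, 97.5])
print(ci_95)
```

This trades memory for speed: the full `n_reps` by `len(data)` matrix is held in memory at once, which is fine here but may not be for very large datasets.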

## 1.2. Hypothesis test <a name="1-2"></a>

### 1.2.1. Read in data & Data pre-processing <a name="1-2-1"></a>

### Read in the Oklahoma earthquake data



In [None]:
df = pd.read_csv('../data/oklahoma_earthquakes_1950-2017.csv', skiprows=2)

In [None]:
df.head()

### Process the time data and visualize the data with a magnitude vs time plot

Process the time data:

In [None]:
from datetime import datetime as dt
import time
# Resource: https://stackoverflow.com/questions/6451655/how-to-convert-python-datetime-dates-to-decimal-float-years

def toYearFraction(date):
    def sinceEpoch(date): # returns seconds since epoch
        return time.mktime(date.timetuple())
    s = sinceEpoch

    year = date.year
    startOfThisYear = dt(year=year, month=1, day=1)
    startOfNextYear = dt(year=year+1, month=1, day=1)

    yearElapsed = s(date) - s(startOfThisYear)
    yearDuration = s(startOfNextYear) - s(startOfThisYear)
    fraction = yearElapsed/yearDuration

    return date.year + fraction

In [None]:
times = pd.to_datetime(df['time']).apply(toYearFraction).to_numpy()
mags = df['mag'].to_numpy()
times
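
`toYearFraction` relies on `time.mktime`, which interprets dates in the machine's local time zone. As an alternative sketch, assuming naive (time-zone-free) `datetime` inputs, the same fraction can be computed with plain `timedelta` arithmetic. The name `year_fraction` is hypothetical and not part of this notebook:

```python
from datetime import datetime

def year_fraction(date):
    """Convert a naive datetime to a decimal year, e.g. mid-2010 -> 2010.5."""
    start_of_year = datetime(date.year, 1, 1)
    start_of_next_year = datetime(date.year + 1, 1, 1)
    # Dividing two timedeltas gives a plain float between 0 and 1
    fraction = (date - start_of_year) / (start_of_next_year - start_of_year)
    return date.year + fraction

print(year_fraction(datetime(2010, 1, 1)))      # 2010.0
print(year_fraction(datetime(2010, 7, 2, 12)))  # 2010.5
```

Noon on 2 July is exactly 182.5 days into the 365-day year 2010, which is why the second call lands on the half-year mark.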

Visualize the data:

In [None]:
_ = plt.plot(times, mags, marker = '.', linestyle = 'none', alpha = 0.1)
_ = plt.xlabel('Time')
_ = plt.ylabel('Magnitude')
plt.show()

### Generate two data arrays `mags_pre` and `mags_post` for Permutation test example  

Note: 

- We will focus on earthquakes with a magnitude of at least 3  

- `mags_pre`: an array containing the earthquake magnitudes before year 2010 ($1980 \le \text{year} < 2010$)

- `mags_post`: an array containing the earthquake magnitudes from year 2010 onward ($2010 \le \text{year} \le 2017$)   

Generate the two data arrays:

In [None]:
mags_pre = mags[(times < 2010) & (times >= 1980)]
mags_post = mags[times >= 2010]

mt = 3  # Lowest magnitude to include in the study 
mags_pre = mags_pre[mags_pre >= mt]
mags_post = mags_post[mags_post >= mt]

Basic EDA on `mags_pre` and `mags_post`: 

Mean:

In [None]:
np.mean(mags_pre)

In [None]:
np.mean(mags_post)

Two-sample t-test: 

**NOTE**: I didn't check the assumptions of the t-test here. So the result is NOT trustworthy!!!

In [None]:
import scipy.stats as stats

stats.ttest_ind(mags_pre, mags_post)

Bootstrap confidence interval:

In [None]:
bs_mags_pre = draw_bs_reps(mags_pre, np.mean, size = 10000)
bs_mags_post = draw_bs_reps(mags_post, np.mean, size = 10000)

In [None]:
np.percentile(bs_mags_pre, [2.5, 97.5]) 

In [None]:
np.percentile(bs_mags_post, [2.5, 97.5])

### Generate two data arrays `dt_pre` and `dt_post` for Bootstrap test example 

Note: 

- We will focus on earthquakes with a magnitude of at least 3  

- `dt_pre`: an array containing the time gaps (in days) between consecutive earthquakes before year 2010 ($1980 \le \text{year} < 2010$), used here as a measure of earthquake frequency

- `dt_post`: an array containing the time gaps (in days) between consecutive earthquakes from year 2010 onward ($2010 \le \text{year} \le 2017$)  

Generate the two arrays:

In [None]:
df_over3 = df[df['mag'] >= 3]
df_over3.head()

In [None]:
# Use assign() to get a new DataFrame with parsed timestamps; this avoids pandas' SettingWithCopyWarning
df_over3 = df_over3.assign(time=pd.to_datetime(df_over3['time']))
df_over3.head()

In [None]:
dt_pre_df = df_over3[(df_over3['time'].dt.year < 2010) & (df_over3['time'].dt.year >= 1980)]
dt_post_df = df_over3[df_over3['time'].dt.year >= 2010]

In [None]:
# Time gaps between consecutive earthquakes in days (86400 seconds per day); drop the leading NaN from diff()
dt_pre = dt_pre_df['time'].diff().dt.total_seconds().to_numpy()[1:] / 86400
dt_post = dt_post_df['time'].diff().dt.total_seconds().to_numpy()[1:] / 86400

In [None]:
dt_post

Basic EDA for the two arrays of time gaps between earthquakes: 

Mean: 

In [None]:
dt_pre_mean = np.mean(dt_pre)
dt_pre_mean

In [None]:
dt_post_mean = np.mean(dt_post)
dt_post_mean

Bootstrap Confidence interval:

In [None]:
bs_dt_pre = draw_bs_reps(dt_pre, np.mean, size = 10000)
bs_dt_post = draw_bs_reps(dt_post, np.mean, size = 10000)

In [None]:
np.percentile(bs_dt_pre, [2.5, 97.5])

In [None]:
np.percentile(bs_dt_post, [2.5, 97.5])

### 1.2.2. Permutation test <a name="1-2-2"></a>

In this section, you will learn how to use a permutation test on the magnitudes of earthquakes before and after 2010.

**Note**: 

- The null hypothesis of a permutation test assumes that the two populations being compared have **the same distribution**

- The earthquake magnitude data before and after year 2010 are saved in the `mags_pre` and `mags_post` arrays respectively, and are ready for you to use.

**1. Generate a pair of permutation samples from the `mags_pre` and `mags_post` data and calculate a test statistic**

In [None]:
# Concatenate `mags_pre` and `mags_post` array into one big array
combined_mags = np.concatenate((mags_pre, mags_post))

In [None]:
# Randomly reorder the concatenated array
permuted_mags = np.random.permutation(combined_mags)

In [None]:
# Extract permutation samples
perm_mags_pre = permuted_mags[:len(mags_pre)]
perm_mags_post = permuted_mags[len(mags_pre):]

In [None]:
# difference of mean magnitudes calculated from permutation sample
diff_perm = np.mean(perm_mags_post) - np.mean(perm_mags_pre)
diff_perm

Now you know how to: 

- concatenate two sample arrays you want to compare with the `np.concatenate` function

- randomly permute / reorder an array with the `np.random.permutation` function

- extract a pair of permutation samples from the permuted array above

- calculate a test statistic from the permutation samples   


It's time to create a function that summarizes the first three steps listed above and generates a pair of permutation samples from two given one-dimensional input arrays. Let's do it!

### Your turn

- **Exercise 1.2.2.1.** Write a function `permutation_sample()` that takes two input arrays, concatenates, permutes, and returns two permuted arrays. 

*Instruction*:

1. Use `np.concatenate` function to concatenate the two input arrays

2. Permute the concatenated array with `np.random.permutation` function

3. Split the permuted array into two: 

    - 1. the first array `perm_out_1` contains the first "$n = \text{length of first input array}$" entries in the permuted array
    
    - 2. the second array `perm_out_2` contains the rest of the entries in the permuted array

In [None]:
# A function to get permutation sample
def permutation_sample(data1, data2):
    '''Generate a permutation sample from two input 1-d arrays'''
    
    # Concatenate the two input datasets into one array
    data = ____.____((data1, data2))
    
    # Permute the concatenated data
    permute_data = ____.____.____(____)
    
    # Split the permuted array into two
    perm_out_1 = permute_data[:____]
    perm_out_2 = permute_data[len(data1):]
    
    # Return the two permuted arrays
    return(perm_out_1, perm_out_2)


- **Exercise 1.2.2.2.** Use the `permutation_sample()` function you just created to generate a pair of permutation samples from the `mags_pre` and `mags_post` arrays. 


*Instruction*:

1. Pass the two sample arrays to the `permutation_sample()` function, and save the outputs to the `p_pre` and `p_post` variables respectively

2. Print the sizes of the `p_pre` and `p_post` arrays using the `len()` function


In [None]:
p_pre, p_post = permutation_sample(____, ____)
print("The size of the earthquake magnitudes permutation sample before 2010 is", len(p_pre))
print("The size of the earthquake magnitudes permutation sample after 2010 is", ____(____))

- **Exercise 1.2.2.3.** Write a function `mean_diff()` that takes two input arrays, and returns the difference of means of these two arrays. 

*Instruction*:

- `np.mean()` function can be used to calculate the means of each array

In [None]:
def mean_diff(arr1, arr2):
    diff = ____.____(arr2) - ____.____(arr1)
    return(diff)

- **Exercise 1.2.2.4.** Use the `mean_diff()` function you created in Exercise 1.2.2.3 to calculate the difference of means between the `p_pre` and `p_post` arrays you generated in Exercise 1.2.2.2

*Instruction*:

- Pass the two arrays (`p_pre`, `p_post`) to `mean_diff()` function, and save the output to variable `p_diff`

In [None]:
# Use the `mean_diff()` function to calculate the difference between means of 
# p_pre and p_post arrays you generated in the exercise above
p_diff = mean_diff(____, ____)

print(p_diff)

**2. Generate a large number of permutation samples and perform permutation test**

Generate 10000 permutation samples and calculate test statistic for each sample:

In [None]:
# Create an empty array of length 10000 
perm_mag_ts = np.empty(10000)

for i in range(10000):
    perm_1, perm_2 = permutation_sample(mags_pre, mags_post)
    perm_mag_ts[i] = mean_diff(perm_1, perm_2)

Calculate the observed test statistic:

In [None]:
# Observed difference in mean magnitudes: diff_obs
diff_obs = np.mean(mags_post) - np.mean(mags_pre)
print(diff_obs)

Our test statistic is the difference in mean magnitude before and after 2010. We now have the observed test statistic and 10000 test statistics calculated from the 10000 simulated permutation samples. We want to know how extreme our observed test statistic is, given that the null hypothesis (under which all the permutation samples were generated) is true. 

Let's plot all the test statistics:

In [None]:
_ = plt.hist(perm_mag_ts, bins = 100)
_ = plt.xlabel('test statistics')
_ = plt.ylabel('counts')
plt.ticklabel_format(useOffset=False)
_ = plt.axvline(x=diff_obs, color='r', linestyle='-', alpha = 0.5)
# Mirror line at -diff_obs marks the opposite tail of the two-sided test
_ = plt.axvline(x=-diff_obs, color='r', linestyle='-', alpha = 0.5)

plt.show()

To calculate the p-value for a permutation test, we simply count the number of test-statistics as or more extreme than our initially observed test statistic, and divide that number by the total number of test-statistics we calculated.   
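
As a toy illustration of this counting rule, the sketch below uses six made-up permutation test statistics and a made-up observed value (none of these numbers come from the earthquake data). "As or more extreme" is taken in the two-sided sense, so absolute values are compared with `>=`:

```python
import numpy as np

# Hypothetical test statistics from six permutation samples
perm_stats = np.array([-0.3, -0.1, 0.0, 0.1, 0.2, 0.4])

# Hypothetical observed test statistic
observed = 0.2

# Fraction of permutation statistics at least as extreme as the observed one
p_value = np.mean(np.abs(perm_stats) >= abs(observed))
print(p_value)  # 0.5
```

Three of the six statistics (-0.3, 0.2 and 0.4) are at least as far from zero as the observed value, so the p-value is 3/6.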

Calculate p-value:

In [None]:
np.sum(abs(perm_mag_ts) >= abs(diff_obs)) / len(perm_mag_ts)

With a p-value $> 0.05$, we fail to reject the null hypothesis. We do not have enough evidence to conclude that waste water injection in Oklahoma in 2010 changed the magnitudes of the earthquakes that happened there. 

Now you know how to:

- generate a large number of permutation samples

- calculate a test statistic for each sample

- compute p-value for a permutation test

Next we will write a function that summarizes the first two steps above: given two input arrays, a function to compute the test statistic, and the number of permutation resamplings to perform, it returns an array of permutation test statistics. 

### Your turn  

- **Exercise 1.2.2.5.** Write a function `draw_permutation_reps()` that generates permutation samples of given size, calculates test statistics with given function for each sample and returns all the test statistics in an array. 

*Instruction*:

1. Initialize an empty array of given `size` using `np.empty` function

2. Within the for loop

    - 1. generate a pair of permutation samples from the two input arrays using a function you created above
    
    - 2. calculate the test statistic for the newly generated pair of permutation samples, and save the value to the `i`-th entry of the `perm_reps` array


In [None]:
# A function to generate permutation statistic replicates
def draw_permutation_reps(data1, data2, func, size = 1):
    '''Generate permutation statistic replicates of given size'''
    
    # Initiate an empty permutation array 
    perm_reps = ____.____(____)
    
    for i in range(____):
        
        # Generate a pair of permutation samples from the given arrays
        perm_sample_1, perm_sample_2 = permutation_sample(____, ____)
        
        # calculate test statistic using the input func
        perm_reps[i] = func(perm_sample_1, perm_sample_2)
        
    return(perm_reps)

- **Exercise 1.2.2.6.** Based on the `mags_pre` and `mags_post` arrays, use the `draw_permutation_reps` function you just created to resample and get `size = 100000` test statistics calculated by `mean_diff` function. (It may take several minutes to run)

*Instruction*: 

1. Pass the 4 arguments in the right order to `draw_permutation_reps` function

2. Calculate the observed test statistics, and save the result to `diff_obs` variable

3. To calculate the p-value, count the permutation test statistics (`perm_reps`) whose absolute values are as extreme as or more extreme than that of the originally observed test statistic (`diff_obs`), and then divide this count by the total number of permutation test statistics

In [None]:
# Generate permutation replicates: perm_reps
perm_reps = draw_permutation_reps(____, ____, ____, size=____)

# Calculate observed test statistic from `mags_pre` and `mags_post` arrays: diff_obs
diff_obs = ...

# Compute and print p-value
p_val = np.sum(abs(____) >= abs(____)) / ____
print('p =', p_val)

### 1.2.3. Bonus: Bootstrap hypothesis test <a name="1-2-3"></a>


From the permutation test in the last section, we didn't find enough evidence to conclude that the injection of waste water in Oklahoma in 2010 changed the earthquake magnitudes there. But as shown in the figure below, the frequency of earthquakes since 2010 seems to have changed compared with that before 2010. Thus, in this section, we are interested in comparing the average frequencies of earthquakes before and after year 2010. 


In [None]:
_ = plt.plot(times, mags, marker = '.', linestyle = 'none', alpha = 0.1)
_ = plt.xlabel('Time')
_ = plt.ylabel('Magnitude')
plt.show()

While the null hypothesis of a permutation test assumes that the two populations being compared have the same distribution, injecting waste water into the ground may have already changed the distribution of earthquake frequencies after 2010. What if we do not care whether the two distributions are the same, but **only want to know if the average earthquake frequencies in Oklahoma before and after year 2010 are the same**? A bootstrap hypothesis test can answer that.  

In this section, you will learn how to run a bootstrap hypothesis test on the frequency of earthquakes before and after 2010.

**Note**: the time gaps (in days) between consecutive earthquakes before and after year 2010 are saved in the `dt_pre` and `dt_post` arrays respectively, and are ready for you to use.
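
A bootstrap test of equal means needs data that satisfy the null hypothesis. One common way to get there, sketched below with hypothetical synthetic arrays `a` and `b` rather than the real `dt_pre` and `dt_post`, is to shift one sample so that its mean matches the other while leaving its spread untouched:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical time gaps in days for two periods with very different means
a = rng.exponential(scale=30.0, size=80)
b = rng.exponential(scale=5.0, size=400)

# Shift b so that its mean equals the mean of a (the null hypothesis of equal means)
b_shifted = b - b.mean() + a.mean()

print(np.isclose(b_shifted.mean(), a.mean()))  # True: the means now agree
print(np.isclose(b_shifted.std(), b.std()))    # True: the spread is unchanged
```

Only the location of `b` changes, so bootstrap samples drawn from `b_shifted` keep the variability of the original data while obeying the null hypothesis.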

Confidence interval for `dt_pre` and `dt_post`:

In [None]:
bs_dt_pre = draw_bs_reps(dt_pre, np.mean, size = 10000)
bs_dt_post = draw_bs_reps(dt_post, np.mean, size = 10000)

print("Bootstrap confidence interval for dt_pre:", np.percentile(bs_dt_pre, [2.5, 97.5]))
print("Bootstrap confidence interval for dt_post:", np.percentile(bs_dt_post, [2.5, 97.5]))

In [None]:
# Compute the observed test statistic, which is the difference of dt_pre and dt_post means
dt_pre_mean = np.mean(dt_pre)
dt_post_mean = np.mean(dt_post)
observed_diff = dt_pre_mean - dt_post_mean


# Shift the dt_post data under the null hypothesis (so that they have the same mean as the dt_pre data)
shifted_dt_post = dt_post - np.mean(dt_post) + np.mean(dt_pre)

# Generate bootstrap replicated means for dt_pre and shifted_dt_post
bs_dt_pre = draw_bs_reps(dt_pre, np.mean, size = 10000)
bs_shifted_dt_post = draw_bs_reps(shifted_dt_post, np.mean, size = 10000)

# Calculate the replicated test statistic diff
bs_reps_diff = bs_dt_pre - bs_shifted_dt_post


# Calculate the p-values
p = np.sum(bs_reps_diff >= observed_diff)/len(bs_reps_diff)
p

A p-value of 0 means that, under the null hypothesis, none of the 10000 simulated test statistics was as or more extreme than the observed one. With 10000 replicates this is better read as $p < 10^{-4}$ than as exactly zero. Thus, the injection of waste water in Oklahoma probably influenced the earthquake frequencies there. 

**Reference**: 

- This notebook was built mainly based on Dr. Justin Bois's three statistical courses on [Datacamp website](https://www.datacamp.com/instructors/bois)

- An interesting tutorial about [permutation test](https://www.jwilber.me/permutationtest/) may help you understand it better