# Analyzing Distributions of Numbers

### Import `numpy` library

```python
import numpy as np
```


<hr>
<br>
<br>

## Central Tendency: Mean vs Median

```python
some_dist = np.array([0,1,2,3,4,5,6,7,8,9,10,34,56,100])
print(some_dist)

print("mean:", np.mean(some_dist))
print("median:", np.median(some_dist))
```

In [7]:
some_dist = np.array([0,1,2,3,4,5,6,7,8,9,10])
print(some_dist)

print("mean:", np.mean(some_dist))
print("median:", np.median(some_dist))


[ 0  1  2  3  4  5  6  7  8  9 10]
mean: 5.0
median: 5.0


In [8]:
some_dist = np.array([0,1,2,3,4,5,6,7,8,9,10,34,56,100])
print(some_dist)

print("mean:", np.mean(some_dist))
print("median:", np.median(some_dist))

[  0   1   2   3   4   5   6   7   8   9  10  34  56 100]
mean: 17.5
median: 6.5
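
The two runs above suggest that a few extreme values pull the mean much further than the median. A minimal sketch that measures each shift directly, reusing the same numbers; the names `base`, `with_outliers`, `mean_shift`, and `median_shift` are illustrative and not part of the notebook:

```python
import numpy as np

# The evenly spaced values, then the same values plus three large outliers.
base = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
with_outliers = np.append(base, [34, 56, 100])

# How far each statistic moves once the outliers are added.
mean_shift = np.mean(with_outliers) - np.mean(base)
median_shift = np.median(with_outliers) - np.median(base)

print("mean shift:", mean_shift)      # 17.5 - 5.0 = 12.5
print("median shift:", median_shift)  # 6.5 - 5.0 = 1.5
```

For skewed data like this, the median is usually the safer description of a "typical" value.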


<hr>
<br>
<br>

## Generating "Fake" Data

#### Custom Function

```python
def generate_data(array, n_samples=100):
    indices = np.random.randint(0, array.shape[0], size=n_samples)
    return array[indices]


# We'll use these from within `generate_data` to see what it's doing.
#     print("len of 'indices':", len(indices))
#     print("------")
#     print("what is this?:\n", indices)
#     print("------")
```

In [29]:
def generate_data(array, n_samples=100):
    indices = np.random.randint(0, array.shape[0], size=n_samples)
    print(array.shape, array.shape[0], array)
    return array[indices]
sample = np.array([1,2,3,4,5])
gen_data = generate_data(sample, 1000)


(5,) 5 [1 2 3 4 5]
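
`generate_data` draws random indices, so it samples from `array` with replacement and returns different values on every run. When repeatable results are wanted, one option is NumPy's `Generator` API. The sketch below is an assumed variant, not the notebook's function: the name `generate_data_seeded` and the `seed` parameter are hypothetical.

```python
import numpy as np

def generate_data_seeded(array, n_samples=100, seed=None):
    """Draw n_samples values from array with replacement."""
    rng = np.random.default_rng(seed)  # seed=None gives a fresh random state
    return rng.choice(array, size=n_samples, replace=True)

sample = np.array([1, 2, 3, 4, 5])
first = generate_data_seeded(sample, 1000, seed=0)
second = generate_data_seeded(sample, 1000, seed=0)

print(first.shape)                    # (1000,)
print(np.array_equal(first, second))  # True: same seed, same draws
```

Passing the same `seed` reproduces the same draws, which makes the later summaries comparable between runs.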


#### Create an "over" sampling from original sample

```python
sample = np.array([1,2,3,4,5])
gen_data = generate_data(sample, 1000)

print(sample)
print("-"*75)
print(gen_data)
```

In [26]:
sample = np.array([1,2,3,4,5])
gen_data = generate_data(sample, 1000)

print(sample)
print("-"*75)
print(gen_data)

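(5,) 5 [1 2 3 4 5]
[1 2 3 4 5]
---------------------------------------------------------------------------

The final `print(gen_data)` then shows the 1000 sampled values, which differ on every run (a `None` here would mean `generate_data` is missing its `return`).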


#### Unique Values in a distribution

```python
print("unique values:", np.unique(sample))
print("unique values:", np.unique(gen_data))
print(sample)
print(gen_data)
```

In [14]:
print("unique values:", np.unique(sample))
print("unique values:", np.unique(gen_data))
print(sample)
print(gen_data)

unique values: [1 2 3 4 5]
unique values: [1 2 3 4 5]
[1 2 3 4 5]
[5 5 2 5 5 2 2 3 5 4 4 4 3 3 4 5 4 4 5 5 2 3 4 5 4 5 3 3 3 3 1 2 4 3 1 4 4
 4 1 2 5 5 2 2 5 2 4 4 5 2 1 3 3 5 5 2 4 1 3 4 2 2 5 2 2 2 5 1 4 3 2 3 2 1
 1 4 1 1 4 4 3 1 4 2 1 4 5 5 4 5 5 1 4 1 1 1 4 4 3 2 2 5 4 4 4 2 4 4 2 5 1
 2 2 1 1 5 2 3 3 3 5 2 3 5 1 4 4 3 3 1 2 3 2 4 1 3 4 1 3 5 2 2 3 2 2 5 5 3
 5 3 4 5 5 4 4 5 5 2 4 3 1 5 1 1 1 4 2 5 4 5 1 2 1 3 5 5 4 3 2 2 5 4 2 1 2
 4 5 4 5 3 5 2 5 1 2 3 5 5 3 5 1 3 4 4 4 3 4 1 2 2 3 5 3 2 5 1 5 2 5 4 5 5
 1 3 4 2 2 3 4 3 5 2 1 3 5 2 2 5 2 3 5 4 5 1 1 3 3 4 4 3 3 4 1 4 4 3 1 1 4
 2 1 2 5 4 5 3 2 2 5 1 1 3 3 2 4 2 1 2 3 2 4 2 5 3 1 5 4 1 4 4 5 4 2 1 3 4
 4 4 3 3 5 1 1 1 5 1 1 1 2 4 2 3 3 3 5 5 4 5 3 5 2 3 5 4 4 1 1 1 1 1 5 3 5
 2 1 5 4 1 5 2 3 3 3 2 1 5 3 3 3 4 5 1 1 2 3 2 1 4 3 2 3 1 1 1 4 4 3 1 5 3
 1 4 2 4 1 3 3 2 1 5 5 2 5 5 3 4 3 4 4 1 2 2 3 5 1 4 1 2 1 2 2 5 5 3 1 2 5
 2 2 3 1 5 5 5 5 2 2 5 2 5 4 3 1 1 1 4 5 3 1 5 1 3 3 2 1 2 2 3 3 1 1 1 3 3
 3 5 5 3 5 4 5 4 1 3 5 3 3 4 3 5 4
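
`np.unique` shows which values occur but not how often. A short sketch, assuming a seeded generator for repeatability (the seed and the `rng` name are illustrative), that also counts each value with `return_counts=True`:

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen only for repeatability
sample = np.array([1, 2, 3, 4, 5])
gen_data = rng.choice(sample, size=1000)  # sampling with replacement

# values: sorted unique entries; counts: how many times each one occurs
values, counts = np.unique(gen_data, return_counts=True)
for value, count in zip(values, counts):
    print(value, count, count / gen_data.size)
```

With uniform sampling, each of the five values should appear in roughly a fifth of the draws.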

<hr>
<br>
<br>

## 5 Number Summary

#### Custom Function

```python
def five_num_summary(distr):
    _max = np.max(distr)
    _min = np.min(distr)
    # 25th, 50th (median), and 75th percentiles
    quartiles = np.percentile(distr, [25, 50, 75])

    print("Sample Size:", len(distr))
    print("min:", _min)
    print("25th %:", quartiles[0])
    print("Median:", quartiles[1])
    print("75th %:", quartiles[2])
    print("max:", _max)
```

In [19]:
def five_num_summary(distr):
    _max = np.max(distr)
    _min = np.min(distr)
    # 25th, 50th (median), and 75th percentiles
    quartiles = np.percentile(distr, [25, 50, 75])

    print("Sample Size:", len(distr))
    print("min:", _min)
    print("25th %:", quartiles[0])
    print("Median:", quartiles[1])
    print("75th %:", quartiles[2])
    print("max:", _max)

#### 5 Num Summary: `sample` vs `gen_data`

```python
five_num_summary(sample)
print("-"*80)
five_num_summary(gen_data)
```

In [20]:
five_num_summary(sample)
print("-"*80)
five_num_summary(gen_data)

Sample Size: 5
min: 1
25th %: 2.0
Median: 3.0
75th %: 4.0
max: 5
--------------------------------------------------------------------------------
Sample Size: 1000

The remaining lines for `gen_data` depend on the random draw. With 1000 draws spread roughly evenly over the five values, its quartiles typically come out at 2.0, 3.0, and 4.0, matching `sample`.
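
A summary that only prints cannot be compared or reused in later cells. The sketch below is an assumed variant that returns the five numbers instead; the name `five_num_values` and the dictionary keys are hypothetical, and it relies on `np.percentile` with its default linear interpolation.

```python
import numpy as np

def five_num_values(distr):
    """Return the five-number summary as a dict instead of printing it."""
    q1, median, q3 = np.percentile(distr, [25, 50, 75])
    return {
        "min": np.min(distr),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": np.max(distr),
    }

summary = five_num_values(np.array([1, 2, 3, 4, 5]))
print(summary["q1"], summary["median"], summary["q3"])  # 2.0 3.0 4.0
print("IQR:", summary["q3"] - summary["q1"])            # IQR: 2.0
```

Returning values also makes derived measures, such as the interquartile range shown in the last line, a one-line calculation.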
