# 🔢 Chapter 02: Numeric Data Types and Summary Functions

## 🎯 Numeric Data Types in PostgreSQL

<div align="left">

### Integer Types

<table style="text-align: left;">
  <tr>
    <th>Type</th>
    <th>Size</th>
    <th>Description</th>
    <th>Range</th>
  </tr>
  <tr>
    <td>smallint / int2</td>
    <td>2 bytes</td>
    <td>Small-range integer</td>
    <td>-32,768 to 32,767</td>
  </tr>
  <tr>
    <td>integer / int / int4</td>
    <td>4 bytes</td>
    <td>Typical integer</td>
    <td>-2,147,483,648 to 2,147,483,647</td>
  </tr>
  <tr>
    <td>bigint / int8</td>
    <td>8 bytes</td>
    <td>Large-range integer</td>
    <td>-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807</td>
  </tr>
</table>
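
A quick way to confirm these sizes and ranges in a PostgreSQL session: `pg_column_size()` reports the bytes a value occupies, and a value outside a type's range raises an error instead of wrapping around.

```sql
-- Bytes used to store one value of each integer type
SELECT pg_column_size(1::smallint) AS smallint_bytes,  -- 2
       pg_column_size(1::integer)  AS integer_bytes,   -- 4
       pg_column_size(1::bigint)   AS bigint_bytes;    -- 8

-- The largest smallint is accepted
SELECT 32767::smallint;

-- One more fails with: ERROR: smallint out of range
-- SELECT 32768::smallint;
```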

### Auto-incrementing Types

<table style="text-align: left;">
  <tr>
    <th>Type</th>
    <th>Size</th>
    <th>Description</th>
    <th>Range</th>
  </tr>
  <tr>
    <td>serial</td>
    <td>4 bytes</td>
    <td>Auto-incrementing</td>
    <td>1 to 2,147,483,647</td>
  </tr>
  <tr>
    <td>smallserial</td>
    <td>2 bytes</td>
    <td>Small auto-increment</td>
    <td>1 to 32,767</td>
  </tr>
  <tr>
    <td>bigserial</td>
    <td>8 bytes</td>
    <td>Large auto-increment</td>
    <td>1 to 9,223,372,036,854,775,807</td>
  </tr>
</table>
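
According to the PostgreSQL documentation, the serial types are not true types but shorthand for an integer column whose default value comes from an automatically created sequence. Because a value taken from the sequence is used up even if the insert fails, gaps can appear in the column. A minimal sketch using a hypothetical `serial_demo` temp table:

```sql
-- Hypothetical demo table: id becomes an integer column fed by a sequence
CREATE TEMP TABLE serial_demo (
    id   serial PRIMARY KEY,
    name text
);

-- id is filled in automatically
INSERT INTO serial_demo (name) VALUES ('first'), ('second'), ('third');

SELECT id, name FROM serial_demo ORDER BY id;  -- ids 1, 2, 3

-- Name of the sequence created behind the column
SELECT pg_get_serial_sequence('serial_demo', 'id');
```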

### Decimal and Floating Point

<table style="text-align: left;">
  <tr>
    <th>Type</th>
    <th>Size</th>
    <th>Description</th>
    <th>Precision</th>
  </tr>
  <tr>
    <td>numeric / decimal</td>
    <td>Variable</td>
    <td>Exact, user-specified precision</td>
    <td>Up to 131,072 digits before the decimal point; up to 16,383 after</td>
  </tr>
  <tr>
    <td>real</td>
    <td>4 bytes</td>
    <td>Inexact floating point</td>
    <td>~6 decimal digits</td>
  </tr>
  <tr>
    <td>double precision</td>
    <td>8 bytes</td>
    <td>Inexact floating point</td>
    <td>~15 decimal digits</td>
  </tr>
</table>
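
The difference between exact and inexact types shows up as soon as you test for equality. A small sketch with literal values:

```sql
-- double precision stores binary approximations, so 0.1 + 0.2 is not exactly 0.3
SELECT 0.1::double precision + 0.2::double precision
     = 0.3::double precision;                        -- false

-- numeric stores decimal digits exactly
SELECT 0.1::numeric + 0.2::numeric = 0.3::numeric;  -- true

-- numeric(precision, scale): at most 5 digits in total, 2 after the decimal point
SELECT 3.14159::numeric(5, 2);                       -- 3.14
```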

</div>

## 🎯 Division Behavior

```sql
SELECT 10 / 4;     -- Integer division → 2 (fraction discarded)
SELECT 10 / 4.0;   -- Numeric division → 2.5000000000000000
```
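
To get a fractional result from integer values or columns, cast one operand before dividing. The last two lines show how integer division and `%` treat negative numbers.

```sql
SELECT 10::numeric / 4;           -- cast to numeric: 2.5 (displayed with trailing zeros)
SELECT 10 / 4::double precision;  -- cast to double precision: 2.5
SELECT -7 / 2;                    -- -3: integer division truncates toward zero
SELECT -7 % 2;                    -- -1: the remainder takes the sign of the dividend
```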

## 🎯 Summary Statistics

### Minimum and Maximum

```sql
SELECT min(question_pct) FROM stackoverflow; -- 0
SELECT max(question_pct) FROM stackoverflow; -- 0.071957428
```

### Average (Mean)

```sql
SELECT avg(question_pct) FROM stackoverflow;
-- 0.00379494620059319
```
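
One detail worth knowing: `min`, `max`, `avg`, and `count(column)` all skip `NULL` values, while `count(*)` counts rows. A sketch on hypothetical inline values (`v` is an illustrative column name):

```sql
-- Hypothetical inline data with one NULL
SELECT min(v),    -- 1
       max(v),    -- 5
       avg(v),    -- (1 + 2 + 5) / 3, roughly 2.667: the NULL is ignored
       count(v),  -- 3: counts non-NULL values only
       count(*)   -- 4: counts rows
FROM (VALUES (1), (2), (NULL), (5)) AS t(v);
```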

## 🎯 Variance

```sql
-- Population variance
SELECT var_pop(question_pct) FROM stackoverflow;

-- Sample variance
SELECT var_samp(question_pct) FROM stackoverflow;

-- Equivalent to var_samp
SELECT variance(question_pct) FROM stackoverflow;
```
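
The two variances differ only in the divisor: `var_pop` divides the sum of squared deviations from the mean by n, while `var_samp` divides it by n - 1. A sketch that checks this by hand on a small hypothetical data set (mean 5, sum of squared deviations 32):

```sql
WITH data(x) AS (
    -- Hypothetical inline values
    VALUES (2), (4), (4), (4), (5), (5), (7), (9)
),
stats AS (
    SELECT avg(x) AS mean, count(*) AS n FROM data
)
SELECT var_pop(x)                                   AS builtin_pop,   -- 4
       sum((x - mean) * (x - mean)) / min(n)        AS manual_pop,    -- 32 / 8
       var_samp(x)                                  AS builtin_samp,  -- about 4.571
       sum((x - mean) * (x - mean)) / (min(n) - 1)  AS manual_samp    -- 32 / 7
FROM data
CROSS JOIN stats;
```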

## 🎯 Standard Deviation

```sql
-- Sample standard deviation
SELECT stddev_samp(question_pct) FROM stackoverflow;

-- Alias for stddev_samp
SELECT stddev(question_pct) FROM stackoverflow;

-- Population standard deviation
SELECT stddev_pop(question_pct) FROM stackoverflow;
```
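
Each standard deviation is the square root of the matching variance. A sketch on a small hypothetical data set whose population variance is 4 and sample variance is 32 / 7:

```sql
-- Hypothetical inline values
SELECT stddev_pop(x)      AS sd_pop,         -- 2, the square root of 4
       sqrt(var_pop(x))   AS sd_pop_check,
       stddev_samp(x)     AS sd_samp,        -- about 2.138, the square root of 32 / 7
       sqrt(var_samp(x))  AS sd_samp_check
FROM (VALUES (2), (4), (4), (4), (5), (5), (7), (9)) AS t(x);
```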

## 🎯 Rounding

```sql
SELECT round(42.1256, 2); -- 42.13
```
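
A few more rounding cases. With one argument, `round` rounds to the nearest integer (for `numeric`, halves round away from zero), and a negative second argument rounds to the left of the decimal point. The two-argument form is defined only for `numeric`, so `real` or `double precision` values need a cast first.

```sql
SELECT round(2.5), round(-2.5);   -- 3, -3
SELECT round(12345, -2);          -- 12300

-- round(double precision, integer) does not exist, so cast to numeric first
SELECT round((1 / 3::double precision)::numeric, 4);  -- 0.3333
```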

## 🎯 Grouped Summary

```sql
SELECT tag,
       min(question_pct),
       avg(question_pct),
       max(question_pct)
FROM stackoverflow
GROUP BY tag;
```

<div align="left">

**Result (excerpt):**

<table style="text-align: left;">
  <tr>
    <th>tag</th>
    <th>min</th>
    <th>avg</th>
    <th>max</th>
  </tr>
  <tr>
    <td>amazon-sqs</td>
    <td>6.91e-05</td>
    <td>8.08e-05</td>
    <td>9.6e-05</td>
  </tr>
  <tr>
    <td>mongodb</td>
    <td>0.0049625</td>
    <td>0.00577465885069125</td>
    <td>0.00631164</td>
  </tr>
</table>

</div>
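
A grouped summary is often easier to read with rounded output, a per-group row count, and an explicit sort order. A sketch on hypothetical inline rows: the tags and values are made up, and `pct` stands in for `question_pct` (it is `numeric` here, so `round` applies directly).

```sql
-- Hypothetical inline rows standing in for the stackoverflow table
SELECT tag,
       count(*)                   AS n,
       round(avg(pct), 3)         AS avg_pct,
       round(stddev_samp(pct), 3) AS sd_pct
FROM (VALUES ('tag-a', 0.10), ('tag-a', 0.30),
             ('tag-b', 0.40), ('tag-b', 0.60)) AS t(tag, pct)
GROUP BY tag
ORDER BY avg_pct DESC;   -- tag-b (0.500) first, then tag-a (0.200)
```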

## 🎯 Counting Distribution Values

```sql
SELECT unanswered_count, count(*)
FROM stackoverflow
WHERE tag = 'amazon-ebs'
GROUP BY unanswered_count
ORDER BY unanswered_count;
```

<div align="left">

**Result (excerpt):**

<table style="text-align: left;">
  <tr>
    <th>unanswered_count</th>
    <th>count</th>
  </tr>
  <tr>
    <td>37</td>
    <td>12</td>
  </tr>
  <tr>
    <td>38</td>
    <td>40</td>
  </tr>
  <tr>
    <td>54</td>
    <td>131</td>
  </tr>
  <tr>
    <td>55</td>
    <td>34</td>
  </tr>
</table>

</div>
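
To turn the counts into shares of the total, a window function can sum the grouped counts. A sketch with hypothetical values, where `v` stands in for `unanswered_count`:

```sql
-- Hypothetical inline values
SELECT v,
       count(*) AS n,
       round(100.0 * count(*) / sum(count(*)) OVER (), 1) AS pct_of_rows
FROM (VALUES (37), (37), (38), (38), (38), (54)) AS t(v)
GROUP BY v
ORDER BY v;
-- 37: 2 rows, 33.3 | 38: 3 rows, 50.0 | 54: 1 row, 16.7
```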

## 🎯 Truncating Values

```sql
SELECT trunc(42.1256, 2);  -- 42.12
SELECT trunc(12345, -3);   -- 12000
```
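
`trunc` simply drops digits, which is not the same as rounding down once negative numbers are involved. A comparison with a literal value:

```sql
SELECT trunc(-4.7),  -- -4: drops the fraction (toward zero)
       floor(-4.7),  -- -5: rounds down (toward negative infinity)
       ceil(-4.7),   -- -4: rounds up (toward positive infinity)
       round(-4.7);  -- -5: rounds to the nearest integer
```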

## 🎯 Truncate & Group

```sql
SELECT trunc(unanswered_count, -1) AS trunc_ua,
       count(*)
FROM stackoverflow
WHERE tag='amazon-ebs'
GROUP BY trunc_ua
ORDER BY trunc_ua;
```

<div align="left">

**Result:**

<table style="text-align: left;">
  <tr>
    <th>trunc_ua</th>
    <th>count</th>
  </tr>
  <tr>
    <td>30</td>
    <td>74</td>
  </tr>
  <tr>
    <td>40</td>
    <td>194</td>
  </tr>
  <tr>
    <td>50</td>
    <td>480</td>
  </tr>
</table>

</div>
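
`trunc(x, -1)` only produces buckets whose width is a power of ten. For other widths on non-negative integers, one option is integer division followed by multiplication. A sketch with a bucket width of 5 and hypothetical values:

```sql
-- Hypothetical inline values; (v / 5) * 5 maps each value to the start of its width-5 bucket
SELECT (v / 5) * 5 AS bucket_start,
       count(*)
FROM (VALUES (37), (38), (41), (44), (46), (54), (55)) AS t(v)
GROUP BY bucket_start
ORDER BY bucket_start;
-- 35: 2 | 40: 2 | 45: 1 | 50: 1 | 55: 1
```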

## 🎯 Generate Series

```sql
SELECT generate_series(1, 10, 2);
-- 1, 3, 5, 7, 9

SELECT generate_series(0, 1, 0.1);
-- 0, 0.1, 0.2, ..., 1.0
```
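
Calling `generate_series` in the `FROM` clause lets you name the output column. A negative step counts down, and the stop value is included only when a step lands on it exactly:

```sql
SELECT n FROM generate_series(10, 1, -3) AS s(n);  -- 10, 7, 4, 1
SELECT n FROM generate_series(1, 10, 4) AS s(n);   -- 1, 5, 9
```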

## 🎯 Creating Bins

```sql
WITH bins AS (
    SELECT generate_series(30, 60, 5) AS lower,
           generate_series(35, 65, 5) AS upper
),
ebs AS (
    SELECT unanswered_count
    FROM stackoverflow
    WHERE tag = 'amazon-ebs'
)
SELECT lower, upper, count(unanswered_count)
FROM bins
LEFT JOIN ebs
  ON unanswered_count >= lower AND unanswered_count < upper
GROUP BY lower, upper
ORDER BY lower;
```

<div align="left">

**Result (excerpt):**

<table style="text-align: left;">
  <tr>
    <th>lower</th>
    <th>upper</th>
    <th>count</th>
  </tr>
  <tr>
    <td>30</td>
    <td>35</td>
    <td>0</td>
  </tr>
  <tr>
    <td>35</td>
    <td>40</td>
    <td>74</td>
  </tr>
  <tr>
    <td>40</td>
    <td>45</td>
    <td>155</td>
  </tr>
  <tr>
    <td>50</td>
    <td>55</td>
    <td>445</td>
  </tr>
  <tr>
    <td>60</td>
    <td>65</td>
    <td>0</td>
  </tr>
</table>

</div>
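
An alternative way to build the same kind of bins, sketched with hypothetical values: generate only the lower bounds and compute each upper bound as `lower + 5`, so the two columns cannot drift out of step. Counting the joined column (`count(v)` rather than `count(*)`) is what gives empty bins a count of 0 after the `LEFT JOIN`.

```sql
WITH bins AS (
    SELECT lower, lower + 5 AS upper
    FROM generate_series(30, 60, 5) AS s(lower)
),
vals AS (
    -- Hypothetical values standing in for unanswered_count
    SELECT v FROM (VALUES (37), (38), (41), (54)) AS t(v)
)
SELECT lower, upper, count(v) AS n
FROM bins
LEFT JOIN vals
  ON v >= lower AND v < upper
GROUP BY lower, upper
ORDER BY lower;
-- 7 bins from 30-35 to 60-65; only 35-40 (2), 40-45 (1) and 50-55 (1) are non-zero
```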

## 🎯 Correlation

```sql
SELECT corr(assets, equity)
FROM fortune500;
-- 0.637710143588615
```
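
`corr` returns the Pearson correlation coefficient, a value between -1 and 1; rows where either input is `NULL` are ignored. A sanity check on hypothetical inline pairs where one column is an exact linear function of the other:

```sql
-- Hypothetical data, y = 2x: a perfect positive linear relationship
SELECT corr(y, x) FROM (VALUES (1, 2), (2, 4), (3, 6)) AS t(x, y);  -- about 1

-- Hypothetical data, y falls as x rises: a perfect negative linear relationship
SELECT corr(y, x) FROM (VALUES (1, 6), (2, 4), (3, 2)) AS t(x, y);  -- about -1
```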

## 🎯 Median and Percentiles

```sql
-- Discrete percentile
SELECT percentile_disc(0.5)
  WITHIN GROUP (ORDER BY val)
FROM nums;

-- Continuous (interpolated) percentile
SELECT percentile_cont(0.5)
  WITHIN GROUP (ORDER BY val)
FROM nums;
```

<div align="left">

**Sample Result:**

<table style="text-align: left;">
  <tr>
    <th>percentile_disc</th>
    <th>percentile_cont</th>
  </tr>
  <tr>
    <td>3</td>
    <td>3.5</td>
  </tr>
</table>

</div>
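
The contents of `nums` are not shown; one data set that reproduces the sample result is the integers 1 through 6, used below as an assumption. Passing an array of fractions returns several percentiles at once:

```sql
-- Assumed data: the integers 1 to 6
SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY val) AS median_disc,  -- 3
       percentile_cont(0.5) WITHIN GROUP (ORDER BY val) AS median_cont,  -- 3.5
       percentile_disc(ARRAY[0.25, 0.5, 0.75])
           WITHIN GROUP (ORDER BY val) AS quartiles_disc,                -- {2,3,5}
       percentile_cont(ARRAY[0.25, 0.5, 0.75])
           WITHIN GROUP (ORDER BY val) AS quartiles_cont                 -- {2.25,3.5,4.75}
FROM generate_series(1, 6) AS s(val);
```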

## 🎯 Common Numeric Data Issues

* **Special codes for missing values**: `-99`, `99`, `NA`, `NaN`, `N/A`
* **Zeros used as placeholders?**
* **Outliers**: Extremely high/low values
* **Not truly numeric?**
  (e.g., zip codes, survey categories)
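
For special missing-value codes, `NULLIF` can turn the code into `NULL` so aggregates skip it. A sketch assuming `-99` is the missing-value code in a hypothetical column `v`:

```sql
-- Hypothetical inline values with one -99 placeholder
SELECT avg(v)              AS naive_avg,  -- (10 + 20 - 99) / 3 = -23
       avg(NULLIF(v, -99)) AS clean_avg   -- (10 + 20) / 2 = 15
FROM (VALUES (10), (20), (-99)) AS t(v);
```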

## 🎯 Creating Temporary Tables

### Method 1: `CREATE TEMP TABLE ... AS`

```sql
CREATE TEMP TABLE top_companies AS
SELECT rank, title
FROM fortune500
WHERE rank <= 10;
```

### Method 2: `SELECT INTO TEMP`

```sql
SELECT rank, title
INTO TEMP TABLE top_companies
FROM fortune500
WHERE rank <= 10;
```
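
Both methods create a table named `top_companies`, so running the second one after the first fails because the table already exists. PostgreSQL's documentation recommends `CREATE TABLE AS`, since PL/pgSQL interprets the `INTO` clause of `SELECT INTO` differently. A re-runnable sketch, with hypothetical inline rows in place of `fortune500`:

```sql
-- pg_temp limits the DROP to this session's temporary schema
DROP TABLE IF EXISTS pg_temp.top_companies;

-- Hypothetical rows standing in for the fortune500 query
CREATE TEMP TABLE top_companies AS
SELECT rank, title
FROM (VALUES (1, 'Company A'), (2, 'Company B')) AS t(rank, title);
```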

### Add More Rows

```sql
INSERT INTO top_companies
SELECT rank, title
FROM fortune500
WHERE rank BETWEEN 11 AND 20;
```

### View Table

```sql
SELECT * FROM top_companies;
```

### Delete Table

```sql
DROP TABLE top_companies;
-- Or, to skip the error if the table does not exist:
DROP TABLE IF EXISTS top_companies;
```
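
Temporary tables are dropped automatically when the session ends. To limit one to a single transaction instead, add `ON COMMIT DROP`. A sketch using a hypothetical `scratch_ranks` table:

```sql
BEGIN;

-- Hypothetical scratch table that exists only until the transaction ends
CREATE TEMP TABLE scratch_ranks ON COMMIT DROP AS
SELECT n AS rank
FROM generate_series(1, 10) AS s(n);

SELECT count(*) FROM scratch_ranks;  -- 10

COMMIT;  -- scratch_ranks is dropped here
```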