# 🧠 2.0 Introduction to Programming for Data Analysis

Welcome to the foundation of programming! This notebook introduces key programming concepts and explains what programming is, why it's useful, and how Python fits into the world of data analysis—especially in nutrition, food, and sensory sciences.

**Objectives**:
- Understand what programming is and what it is used for.
- Learn about different programming paradigms.
- Recognise Python’s strengths and why it’s a good fit for this course.

**Context**: Just as recipes guide a chef, code tells the computer what to do. Let’s explore how! 🦛

## 🧾 What is Programming?

Programming is the process of writing **instructions** that a computer can follow to perform tasks.

In nutrition and food science, we use programs to:
- Analyse large datasets (e.g., NDNS or lab results)
- Perform calculations (e.g., nutrient intake, sensory scores)
- Create visualisations (e.g., graphs of dietary intake)
- Automate repetitive tasks

Think of a program like a recipe: clear steps to achieve a tasty result. 🍲
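
As a small illustration of the recipe analogy, the sketch below estimates the energy in a meal from its macronutrients. The meal values are made up for illustration, the function name `estimate_energy_kcal` is hypothetical, and the conversion factors are the general Atwater factors (4 kcal/g for protein and carbohydrate, 9 kcal/g for fat).

```python
# Hypothetical macronutrient content of one meal, in grams (illustrative values only)
meal = {"protein_g": 20, "carbohydrate_g": 50, "fat_g": 10}

# General Atwater factors: kcal per gram of each macronutrient
KCAL_PER_GRAM = {"protein_g": 4, "carbohydrate_g": 4, "fat_g": 9}

def estimate_energy_kcal(macros):
    """Step through each macronutrient and add up its energy contribution."""
    total = 0
    for nutrient, grams in macros.items():
        total += grams * KCAL_PER_GRAM[nutrient]
    return total

energy = estimate_energy_kcal(meal)
print(f"Estimated energy: {energy} kcal")  # → Estimated energy: 370 kcal
```

Each line is one step of the "recipe": define the ingredients, define the rule, apply the rule, report the result.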

## 🌐 Programming Languages Overview

Just like there are different languages spoken around the world, there are many **programming languages**. Some common ones include:

- Python (our focus)
- R (used in statistics)
- JavaScript (web development)
- C/C++ (high-performance computing)
- SQL (databases)

> **Turing completeness:** All of these general-purpose languages can compute the same class of problems in principle, given enough memory and time.

<details>
<summary>🧪 Examples of Popular Languages</summary>

### Python
```python
print("Hello, hippo!")
```

### R
```r
print("Hello, hippo!")
```

### JavaScript
```javascript
console.log("Hello, hippo!");
```

### C
```c
#include <stdio.h>
int main() {
    printf("Hello, hippo!\n");
    return 0;
}
```

### SQL
```sql
SELECT 'Hello, hippo!' AS message;
```

</details>

<details>
<summary>🧠 Less Common Programming Languages</summary>

#### Ada
```ada
with Ada.Text_IO;
procedure Hello is
begin
   Ada.Text_IO.Put_Line("Hello, hippo!");
end Hello;
```

#### Fortran
```fortran
program hello
  print *, "Hello, hippo!"
end program hello
```

#### COBOL
```cobol
IDENTIFICATION DIVISION.
PROGRAM-ID. HELLO.
PROCEDURE DIVISION.
    DISPLAY "Hello, hippo!".
    STOP RUN.
```

#### Forth
```forth
." Hello, hippo!" CR
```

#### Brainfuck
```brainfuck
+[----->+++<]>+.---.+++++++..+++.[--->+<]>-----.---[->+++<]>-.--------.+++.
```

#### Rust
```rust
fn main() {
    println!("Hello, hippo!");
}
```

#### Lisp
```lisp
(print "Hello, hippo!")
```

#### Julia
```julia
println("Hello, hippo!")
```

#### Pascal
```pascal
program HelloHippo;
begin
  writeln('Hello, hippo!');
end.
```

#### Haskell
```haskell
main = putStrLn "Hello, hippo!"
```

#### Prolog
```prolog
hello :- write('Hello, hippo!'), nl.
```

<details>
<summary>🔍 Very Obscure Languages</summary>

##### INTERCAL
```intercal
PLEASE DO ,1 <- #13
PLEASE READ OUT ,1
```

##### Malbolge
```malbolge
(=<`#9]~6ZY32Vx/4Rs+0Po-&Jk%"Fh#?Dc'BA@;:ZYX
```

##### Whitespace
```whitespace
   	
```

##### Befunge
```befunge
>25*"!dlroW olleH":v
                v:,_@
                >  ^
```

##### Piet
A program written as a 2D image in which colours direct control flow. See https://www.dangermouse.net/esoteric/piet.html

</details>
</details>

## 🧭 Programming Paradigms

**Paradigms** are different styles of organising code. The main ones are:

- **Procedural Programming** (like a recipe): You write step-by-step instructions.
- **Object-Oriented Programming** (like describing ingredients as objects): You define “things” (objects) and actions they can perform.
- **Functional Programming** (like maths): You build programs by combining pure functions, which compute a result from their inputs without changing anything else.

In this course, we’ll mostly use **procedural** and a little **object-oriented** programming.
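
To make the three styles concrete, here is a minimal sketch that computes the same total in each of them. The data and the `Meal` class are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass
from functools import reduce

portions_g = [150, 200, 100]  # hypothetical portion weights in grams

# Procedural: explicit step-by-step instructions that update a running total
total_procedural = 0
for grams in portions_g:
    total_procedural += grams

# Object-oriented: a "thing" (Meal) that bundles data with actions it can perform
@dataclass
class Meal:
    portions_g: list

    def total_weight(self):
        return sum(self.portions_g)

total_oo = Meal(portions_g).total_weight()

# Functional: combine values with a pure function, without changing any state
total_functional = reduce(lambda a, b: a + b, portions_g, 0)

print(total_procedural, total_oo, total_functional)  # → 450 450 450
```

All three give the same answer. The paradigms differ in how the code is organised, not in what it can compute.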

## 🔬 Theoretical Foundations

Before we write any code, let’s cover some core ideas that underlie *all* programming:

1. **Algorithm**  
   - A precise, step-by-step procedure for solving a problem.  
   - *Example:* A recipe is an algorithm for baking a cake.

2. **Computational Thinking** (Wing, 2006)  
   - **Decomposition**: Break a big problem into smaller parts.  
   - **Pattern Recognition**: Spot similarities across problems.  
   - **Abstraction**: Focus on the important details, ignore the rest.  
   - **Algorithm Design**: Combine the above into a clear procedure.

3. **Complexity & Big-O**  
   - Measures how runtime (or memory) grows with input size.  
   - *Example:* Linear search → **O(n)**; Bubble sort → **O(n²)**.  
   - Why it matters for large datasets such as NDNS: at a million rows, an O(n²) algorithm does roughly a million times more work than an O(n) one.

4. **Von Neumann Architecture**  
   - Programs and data live in the same memory (“stored-program” concept).  
   - The CPU fetches instructions one by one and executes them.

5. **Compiled vs. Interpreted Languages**  
   - **Compiled (C/C++)**: Translated to machine code ahead of time → typically very fast at runtime.  
   - **Interpreted (Python, R)**: Executed by an interpreter at runtime → typically slower execution, but faster development and iteration.

> **Why it matters**: Python’s interpreter makes exploration quick, while libraries like NumPy use compiled C under the hood for performance.
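
In practice the compiled/interpreted split is less sharp than it sounds. CPython, the standard Python implementation, first compiles source code to bytecode and then interprets that bytecode. The sketch below uses the standard-library `dis` module to peek at this intermediate step. The function `add_calories` is a hypothetical example, and the exact instruction names vary between Python versions.

```python
import dis

def add_calories(breakfast, lunch):
    return breakfast + lunch

# CPython has already compiled the function body to bytecode;
# dis lists the instructions the interpreter will execute one by one.
instructions = list(dis.get_instructions(add_calories))
for instruction in instructions:
    print(instruction.opname, instruction.argrepr)
```

The printed list is a small-scale view of the fetch-and-execute idea: a sequence of simple instructions processed in order.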

## 📚 Abstraction Layers

Programming happens at many levels. Each layer lets you ignore details below it until you need them:

1. **Machine Code** (binary)
2. **Assembly** (mnemonics & registers)
3. **High-Level Languages** (Python, R, JavaScript)
4. **Libraries/Frameworks** (pandas, ggplot2, Dash)
5. **Applications** (your final report or dashboard)

- You don’t write CPU instructions in Python—but under the hood your code becomes reads/writes to memory and arithmetic in the ALU.
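- Libraries like pandas build on NumPy, whose core is written in C, and implement their own performance-critical parts in compiled code (Cython and C), while exposing a friendly Python API so you can work at a high level.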
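
A minimal sketch of what moving up a layer buys you: the same mean computed by spelling out each step and by calling a library function. The intake values are hypothetical, and the standard-library `statistics` module stands in for heavier libraries such as pandas.

```python
import statistics

calories = [1800, 2200, 2000, 2400]  # hypothetical daily intakes in kcal

# Lower level: spell out every step of the calculation yourself
total = 0
count = 0
for value in calories:
    total += value
    count += 1
mean_manual = total / count

# Higher level: let a library function hide those steps
mean_library = statistics.mean(calories)

print(mean_manual, mean_library)
```

Both approaches give the same value. The library call is shorter and harder to get wrong, which is the point of working at a higher layer.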

### 🏃‍♂️ Complexity Demo: Linear vs Quadratic

We’ll time two functions:
- `sum_list()` – sums a list in **O(n)** time  
- `pairwise_sum()` – adds up `i + j` for every ordered pair of elements in **O(n²)** time

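```python
import time

def sum_list(lst):
    """Sum a list in a single pass: O(n)."""
    return sum(lst)

def pairwise_sum(lst):
    """Add i + j for every ordered pair of elements: O(n²)."""
    total = 0
    for i in lst:
        for j in lst:
            total += i + j
    return total

# Prepare data
data = list(range(2000))  # 2k elements

for fn in (sum_list, pairwise_sum):
    # perf_counter() is a high-resolution clock intended for timing short intervals
    start = time.perf_counter()
    fn(data)
    elapsed = time.perf_counter() - start
    # Six decimal places so the very fast O(n) case does not display as 0.000
    print(f"{fn.__name__:12} → {elapsed:.6f} sec")
```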
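
Wall-clock timings vary between machines and between runs. As a complementary sketch, assuming that counting loop iterations is a fair proxy for the work done, the growth pattern can also be shown deterministically. The helper names `count_linear_steps` and `count_pairwise_steps` are hypothetical.

```python
def count_linear_steps(n):
    """Count loop iterations for a single pass over n items: O(n)."""
    steps = 0
    for _ in range(n):
        steps += 1
    return steps

def count_pairwise_steps(n):
    """Count loop iterations when visiting every ordered pair of n items: O(n²)."""
    steps = 0
    for _ in range(n):
        for _ in range(n):
            steps += 1
    return steps

for n in (10, 100, 1000):
    linear = count_linear_steps(n)
    pairwise = count_pairwise_steps(n)
    print(f"n={n:5}  linear={linear:8}  pairwise={pairwise:8}")
```

Multiplying `n` by 10 multiplies the linear count by 10 and the pairwise count by 100, which is what O(n) and O(n²) predict.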

## 🧪 Deep-Dive Exercises

1. **Algorithm Timing**  
   - Implement your own linear search (`find_item`) and a naive quadratic routine that compares every pair of elements, starting with a list of 1,000 elements.  
   - Time each at several list sizes (e.g., 1,000, 2,000 and 4,000 elements) and compare how the runtime grows.

2. **Pandas Complexity**  
   - Load a small slice of `hippo_diets.csv`.  
   - Time a `groupby` operation (e.g., average calories per hippo) and estimate its complexity.

3. **Reflect**  
   - Why does understanding Big-O help when working with large nutrition datasets?


## 🐍 Why Python?

Python is widely used in data science because it’s:
- Easy to read and write (clean syntax)
- Free and open-source
- Supported by a huge community
- Rich in data analysis tools (e.g., `pandas`, `matplotlib`)

Python works well for everything from analysing diet logs to building interactive visualisations or machine learning models!

## ✅ Summary

In this notebook, we’ve covered:
- What programming is and key paradigms
- Theoretical foundations: algorithms, complexity, abstraction layers
- A practical complexity demo to see Big-O in action
- Why Python is our language of choice for nutrition data analysis

**Next Steps**: Head to `2.1_syntax_variables_comments.ipynb` to write your first lines of Python code! 🦛