
# Apache Log Format Explanation

### Introduction
This notebook explains the structure of **Apache Access Logs** (Combined Log Format). Each line in such a log file records one HTTP request handled by the server. Understanding these fields helps analyze user behavior, track performance, and detect errors.

---

### Example Log Entry

```
79.133.215.123 - - [14/Jun/2014:10:30:13 -0400] "GET /home HTTP/1.1" 200 1671 "-" "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.153"
```

---

### Field-by-Field Explanation

| Field | Example | Description |
|:------|:---------|:-------------|
| **1. IP Address** | `79.133.215.123` | The IP address of the client (visitor) making the request. If `HostnameLookups` is enabled, the server logs the resolved hostname here instead. |
| **2. Identity (RFC 1413)** | `-` | Client identity determined by identd (rarely used today). Usually `-`. |
| **3. User ID (AuthUser)** | `-` | The username if HTTP authentication was used; otherwise `-`. |
| **4. Timestamp** | `[14/Jun/2014:10:30:13 -0400]` | Date and time when the request was received, in the form `[day/month/year:hour:minute:second zone]`. `-0400` is the offset from UTC (four hours behind). |
| **5. Request Line** | `"GET /home HTTP/1.1"` | The HTTP request line including method (`GET`), resource path (`/home`), and protocol version (`HTTP/1.1`). |
| **6. Status Code** | `200` | The HTTP response status code sent to the client, for example `200` (success), `404` (not found), or `500` (server error). |
| **7. Response Size** | `1671` | The size of the response returned to the client in bytes, excluding the HTTP headers. Logged as `-` when no bytes were sent. |
| **8. Referrer** | `"-"` | The URL of the page that referred the client to this resource, as reported by the client in the `Referer` request header. `"-"` means none. |
| **9. User-Agent** | `"Mozilla/5.0 (Windows NT 6.1; WOW64)..."` | Information about the browser or client software making the request. |
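
The nine fields above can be pulled out of a line with a regular expression. The following is a minimal sketch, not a complete parser: the pattern, the group names (`host`, `ident`, `user`, and so on), and the helper `parse_line` are illustrative choices rather than anything defined by Apache, and the pattern assumes the quoted fields contain no escaped double quotes.

```python
import re

# Illustrative pattern for one Combined Log Format line.
# Group names are our own labels, not Apache terminology.
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of the nine fields, or None if the line does not match."""
    match = COMBINED_RE.match(line)
    return match.groupdict() if match else None

# The example entry from this notebook.
sample = (
    '79.133.215.123 - - [14/Jun/2014:10:30:13 -0400] '
    '"GET /home HTTP/1.1" 200 1671 "-" '
    '"Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/35.0.1916.153"'
)

fields = parse_line(sample)
print(fields["host"], fields["status"], fields["request"])
# prints: 79.133.215.123 200 GET /home HTTP/1.1
```

All values come back as strings. Returning `None` for lines that do not match lets the caller count or skip malformed entries instead of failing on them.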

---

### Visual Structure of Apache Combined Log Format

```
%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
```

| Symbol | Field Name | Description |
|:--------|:------------|:-------------|
| `%h` | Remote Host | Client IP address (or hostname, if `HostnameLookups` is enabled) |
| `%l` | Remote Logname | Identity from identd (if available) |
| `%u` | Remote User | Authenticated user (if any) |
| `%t` | Time | Request time |
| `%r` | Request | Request line from the client |
| `%>s` | Status | Final HTTP status code |
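| `%b` | Bytes Sent | Size of response in bytes, excluding HTTP headers (`-` if no bytes were sent) |
| `%{Referer}i` | Referrer | URL of the referring page, from the `Referer` request header |
| `%{User-Agent}i` | User Agent | Client software identification, from the `User-Agent` request header |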
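
Once the raw strings are extracted, `%t`, `%b`, and `%r` usually need converting before analysis. The sketch below shows one way to do that with the standard library; the helper names `parse_timestamp`, `parse_size`, and `split_request` are hypothetical, and the month abbreviation is assumed to be in English (as `strptime` expects under the default C locale).

```python
from datetime import datetime, timedelta

def parse_timestamp(raw):
    """Convert a %t value such as '[14/Jun/2014:10:30:13 -0400]'
    into a timezone-aware datetime."""
    return datetime.strptime(raw.strip("[]"), "%d/%b/%Y:%H:%M:%S %z")

def parse_size(raw):
    """Convert a %b value to an int; '-' (no bytes sent) becomes 0."""
    return 0 if raw == "-" else int(raw)

def split_request(raw):
    """Split a %r value into (method, path, protocol).
    Malformed request lines yield (None, None, None)."""
    parts = raw.split()
    return tuple(parts) if len(parts) == 3 else (None, None, None)

ts = parse_timestamp("[14/Jun/2014:10:30:13 -0400]")
method, path, protocol = split_request("GET /home HTTP/1.1")
print(ts.isoformat(), parse_size("1671"), method, path, protocol)
```

Keeping the UTC offset on the parsed timestamp makes it safe to compare entries from servers in different time zones.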

---

### Example Analysis Goals

- Count requests per IP address.
- Identify the most visited pages.
- Detect 404 (page not found) errors.
- Understand user browsers (based on User-Agent).
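
The goals above can be sketched with `collections.Counter`. This example assumes each log line has already been parsed into a dict; the keys (`host`, `path`, `status`, `user_agent`), the helper `browser_family`, and all sample records are made up for illustration, with only the first record mirroring the example entry in this notebook.

```python
from collections import Counter

CHROME_UA = ("Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
             "(KHTML, like Gecko) Chrome/35.0.1916.153")
FIREFOX_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:30.0) Gecko/20100101 Firefox/30.0"

# Hypothetical, already-parsed records (not real log data).
records = [
    {"host": "79.133.215.123", "path": "/home", "status": 200, "user_agent": CHROME_UA},
    {"host": "79.133.215.123", "path": "/about", "status": 200, "user_agent": CHROME_UA},
    {"host": "192.0.2.10", "path": "/home", "status": 200, "user_agent": FIREFOX_UA},
    {"host": "192.0.2.10", "path": "/missing", "status": 404, "user_agent": FIREFOX_UA},
    {"host": "198.51.100.7", "path": "/home", "status": 200, "user_agent": "curl/7.37.0"},
]

def browser_family(user_agent):
    """Very rough classification by substring; real User-Agent parsing is messier."""
    for name in ("Chrome", "Firefox", "curl"):
        if name in user_agent:
            return name
    return "Other"

# Count requests per IP address.
requests_per_ip = Counter(r["host"] for r in records)

# Identify the most visited pages.
page_hits = Counter(r["path"] for r in records)

# Detect 404 (page not found) errors.
not_found_paths = [r["path"] for r in records if r["status"] == 404]

# Summarize client software based on User-Agent.
browsers = Counter(browser_family(r["user_agent"]) for r in records)

print(requests_per_ip.most_common())
print(page_hits.most_common(1))
print(not_found_paths)
print(browsers)
```

The substring check is deliberately crude. For real traffic, a dedicated User-Agent parsing library is likely a better fit, since many browsers include each other's names in their strings.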

---

### Summary
Apache Access Logs provide a detailed record of every request handled by a web server. Each line helps in understanding **who** accessed the server, **when**, **what resource**, and **how** the server responded.

You can use Python, PySpark, or tools like ELK (Elasticsearch, Logstash, Kibana) to analyze these logs for insights into website performance and user activity.
