# User Navigation Analysis with Markov-Like Model

Analyze user navigation behavior on a website using page-to-page transition data. The focus is on understanding bounce rates, initial entry points, and transition probabilities between pages - without needing full session tracking.

## Assumptions:
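Input data is in `site_data/site_data.csv`, one transition per line with no header row and two comma-separated columns:
`last_page_id`, `next_page_id`

Page IDs are read as strings (e.g. `"3"` or `"5"`), even when they look like numbers.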

**Special cases:**

+ `last_page_id = "-1"` → session entry point

+ `next_page_id = "B"` → bounce (user left immediately)

+ `next_page_id = "C"` → close (user finished session normally)
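A minimal sketch of how a single row could be labeled using these sentinel values. The helper name `describe_row` is hypothetical. Returning a list of labels is also an assumption, chosen so that a row such as `-1,B` (an entry that immediately bounces), if it occurs, gets both labels.

```python
ENTRY = "-1"   # sentinel source: the session starts here
BOUNCE = "B"   # sentinel destination: user left immediately
CLOSE = "C"    # sentinel destination: session ended normally


def describe_row(last_page_id, next_page_id):
    """Hypothetical helper: label one transition row by its special values."""
    labels = []
    if last_page_id == ENTRY:
        labels.append("entry")
    if next_page_id == BOUNCE:
        labels.append("bounce")
    elif next_page_id == CLOSE:
        labels.append("close")
    # A row with no special value is an ordinary page-to-page move
    return labels or ["page-to-page"]


print(describe_row("-1", "8"))  # ['entry']
print(describe_row("1", "B"))   # ['bounce']
print(describe_row("4", "8"))   # ['page-to-page']
```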

## Core steps:
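1. Parse the CSV and extract each transition `(s, e)` from source page `s` to destination `e`.

2. Count how often each `(s, e)` pair occurs, storing the counts in a flat dictionary keyed by the tuple, and count the total number of transitions leaving each source page `s`.

3. Divide each pair count by its source page's total to get the transition probability P(e | s), as in a simplified (first-order) Markov model.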

4. Output insights:

    + Distribution of entry pages (`s == "-1"`)

    + Bounce rates per page (`e == "B"`)
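The steps above can be sketched end to end on a handful of in-memory rows. Step 1 is replaced by an already-parsed list, and this version swaps the notebook's manual dictionary counting for `collections.Counter` (same counts, less bookkeeping). The sample rows are made up for illustration and are not taken from `site_data.csv`.

```python
from collections import Counter

# Made-up example rows (last_page_id, next_page_id); not taken from site_data.csv
rows = [("-1", "1"), ("-1", "2"), ("1", "2"), ("1", "B"), ("2", "C"), ("-1", "1")]

# Step 2: count each (s, e) pair and the transitions leaving each source s
pair_counts = Counter(rows)
row_totals = Counter(s for s, _ in rows)

# Step 3: normalize counts into P(e | s)
probs = {(s, e): n / row_totals[s] for (s, e), n in pair_counts.items()}

# Step 4: entry-page distribution (s == "-1") and bounce rate per page (e == "B")
entry_dist = {e: p for (s, e), p in probs.items() if s == "-1"}
bounce_rate = {s: p for (s, e), p in probs.items() if e == "B"}

print(entry_dist)
print(bounce_rate)  # {'1': 0.5}
```

Here page `1` receives two of the three entries (P = 2/3), and half of the transitions out of page `1` are bounces.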

## Key benefits:
+ No need for full session history - only direct page-to-page data.

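+ Fast and lightweight: a single pass over the file, with memory proportional to the number of distinct page pairs rather than the number of rows.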

+ Works well even without JavaScript or client-side instrumentation.
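The first benefit relies on the first-order Markov assumption: if the next page depends only on the current page, the probability of a whole path factorizes into page-to-page terms, so pairwise counts are enough to score full paths. A small sketch, where `path_probability` is a hypothetical helper and the probabilities are made-up values in the `{(s, e): p}` shape used in this notebook:

```python
# Made-up transition probabilities in the {(s, e): p} shape
transitions = {("-1", "1"): 2 / 3, ("-1", "2"): 1 / 3,
               ("1", "2"): 0.5, ("1", "B"): 0.5,
               ("2", "C"): 1.0}


def path_probability(path, transitions):
    """Hypothetical helper: P(path) as a product of one-step probabilities."""
    prob = 1.0
    for s, e in zip(path, path[1:]):
        prob *= transitions.get((s, e), 0.0)  # unseen transitions count as 0
    return prob


print(path_probability(["-1", "1", "2", "C"], transitions))  # 0.3333333333333333
```

The result is 2/3 × 1/2 × 1. If the Markov assumption does not hold (for example, if users who arrive on page 2 via page 1 behave differently from direct entries), these path probabilities are only an approximation.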


# Import

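```python
# iteritems() comes from the `future` Python 2/3 compatibility package;
# on Python 3 it is equivalent to calling dict.items().
from future.utils import iteritems
```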

# Display the data

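```python
# Preview the first five rows of the raw file (no header row)
with open('site_data/site_data.csv') as f:
    for i, line in enumerate(f):
        if i == 5:
            break
        print(line.rstrip())  # strip the newline so rows are not double-spaced
```

```
-1,8
4,8
-1,2
1,B
-1,5
```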


# Collect counts

```python
transitions = {}  # (s, e) -> number of times page s was followed by e
row_sums = {}     # s -> total number of transitions leaving page s

with open('site_data/site_data.csv') as f:
    for line in f:
        s, e = line.rstrip().split(',')
        transitions[(s, e)] = transitions.get((s, e), 0) + 1
        row_sums[s] = row_sums.get(s, 0) + 1

# transitions → {('-1', '8'): <count of -1 → 8 rows>, ...}
# row_sums    → {'-1': <count of rows starting at -1>, ...}
```
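A quick consistency check on the two dictionaries: every source page's pair counts should add up to its entry in `row_sums`, and all row sums together should equal the number of rows read. This sketch rebuilds both dictionaries with `collections.Counter` from the five preview rows so it runs on its own; in the notebook the same loop could be pointed at the real `transitions` and `row_sums` before normalization.

```python
from collections import Counter

# The five preview rows shown earlier, in (last_page_id, next_page_id) form
rows = [("-1", "8"), ("4", "8"), ("-1", "2"), ("1", "B"), ("-1", "5")]

transitions = Counter(rows)             # (s, e) -> count
row_sums = Counter(s for s, _ in rows)  # s -> total outgoing transitions

# Each source's pair counts must add up to its row sum
for s, total in row_sums.items():
    pair_total = sum(n for (src, _), n in transitions.items() if src == s)
    assert pair_total == total, (s, pair_total, total)

# Every row contributes exactly one transition
assert sum(row_sums.values()) == len(rows)
print(dict(row_sums))  # {'-1': 3, '4': 1, '1': 1}
```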

# Normalize

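```python
# Convert counts to conditional probabilities: P(e | s) = count(s, e) / row_sums[s].
# Only values change (no keys are added or removed), so updating during iteration is safe.
for k, v in iteritems(transitions):
    s, e = k
    transitions[k] = v / row_sums[s]

# transitions → {('-1', '8'): 0.10152591025834719, ...}
```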
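After normalization the flat dictionary is equivalent to a sparse transition matrix. A sketch of turning it into a dense NumPy matrix and checking that each observed source row sums to 1. The probabilities are made-up values in the same `{(s, e): p}` shape, and treating `"B"` and `"C"` as states with no outgoing transitions (all-zero rows) is an assumption of this sketch.

```python
import numpy as np

# Made-up normalized transitions in the {(s, e): p} shape
transitions = {("-1", "1"): 2 / 3, ("-1", "2"): 1 / 3,
               ("1", "2"): 0.5, ("1", "B"): 0.5,
               ("2", "C"): 1.0}

# Index every state that appears as a source or a destination
states = sorted({s for s, _ in transitions} | {e for _, e in transitions})
index = {name: i for i, name in enumerate(states)}

# Dense matrix: P[i, j] = P(next = states[j] | current = states[i])
P = np.zeros((len(states), len(states)))
for (s, e), p in transitions.items():
    P[index[s], index[e]] = p

# Rows of observed sources must sum to 1; "B" and "C" rows stay all-zero
sources = sorted({s for s, _ in transitions})
assert np.allclose(P[[index[s] for s in sources]].sum(axis=1), 1.0)

print(states)  # ['-1', '1', '2', 'B', 'C']
```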

# Initial state distribution

```python
print('Initial state distribution:')
# Entry transitions are the rows whose source is the sentinel '-1'
for k, v in iteritems(transitions):
    s, e = k
    if s == '-1':
        print(e, v)
```

```
Initial state distribution:
8 0.10152591025834719
2 0.09507982071813466
5 0.09779926474291183
9 0.10384247368686106
0 0.10298635241980159
6 0.09800070504104345
7 0.09971294757516241
1 0.10348995316513068
4 0.10243239159993957
3 0.09513018079266758
```
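To read the distribution more easily, it can be ranked and checked to sum to 1. The values below are copied from the output above; since every row with source `-1` leads to a page (no `B` or `C` appears), the probabilities should add up to 1 up to floating-point rounding.

```python
import math

# Entry-page probabilities copied from the output above
initial = {
    '8': 0.10152591025834719, '2': 0.09507982071813466,
    '5': 0.09779926474291183, '9': 0.10384247368686106,
    '0': 0.10298635241980159, '6': 0.09800070504104345,
    '7': 0.09971294757516241, '1': 0.10348995316513068,
    '4': 0.10243239159993957, '3': 0.09513018079266758,
}

# Rank entry pages from most to least common
for page, p in sorted(initial.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{page}: {p:.3f}")

assert math.isclose(sum(initial.values()), 1.0, rel_tol=1e-9)
```

Entry points are spread almost evenly: every page receives between about 9.5% (page 2) and 10.4% (page 9) of session starts.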


# Which page has the highest bounce rate?

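```python
# Bounce rate of page s is P(B | s): the fraction of transitions out of s that are bounces
for k, v in iteritems(transitions):
    s, e = k
    if e == 'B':
        print(f"Bounce rate for {s}:{v}")
```

```
Bounce rate for 1:0.125939617991374
Bounce rate for 2:0.12649551345962112
Bounce rate for 8:0.12529550827423167
Bounce rate for 6:0.1208153180975911
Bounce rate for 7:0.12371650388179314
Bounce rate for 3:0.12743384922616077
Bounce rate for 4:0.1255756067205974
Bounce rate for 5:0.12369559684398065
Bounce rate for 0:0.1279673590504451
Bounce rate for 9:0.13176232104396302
```

Page 9 has the highest bounce rate (about 13.2%) and page 6 the lowest (about 12.1%); all ten pages fall within about 1.1 percentage points of each other.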
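The heading's question can also be answered in code rather than by scanning the printout. A sketch with a hypothetical `highest_bounce` helper that takes the normalized `{(s, e): p}` dictionary; the example dictionary holds only the bounce entries copied from the output above.

```python
def highest_bounce(transitions):
    """Hypothetical helper: return (page, rate) with the largest P(B | page)."""
    rates = {s: p for (s, e), p in transitions.items() if e == 'B'}
    return max(rates.items(), key=lambda kv: kv[1])


# Bounce entries copied from the output above (other transitions omitted)
transitions = {
    ('1', 'B'): 0.125939617991374, ('2', 'B'): 0.12649551345962112,
    ('8', 'B'): 0.12529550827423167, ('6', 'B'): 0.1208153180975911,
    ('7', 'B'): 0.12371650388179314, ('3', 'B'): 0.12743384922616077,
    ('4', 'B'): 0.1255756067205974, ('5', 'B'): 0.12369559684398065,
    ('0', 'B'): 0.1279673590504451, ('9', 'B'): 0.13176232104396302,
}

page, rate = highest_bounce(transitions)
print(page, rate)  # 9 0.13176232104396302
```

Because non-bounce entries are filtered out inside the helper, the same call would also work on the full normalized `transitions` dictionary from the notebook.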
