<a href="https://colab.research.google.com/github/pathwaycom/pathway-examples/blob/main/tutorials/suspicious_user_activity.ipynb" target="_parent"><img src="https://pathway.com/assets/colab-badge.svg" alt="Run In Colab" class="inline"/></a>

# [Colab-specific instructions] Installing Pathway with Python 3.8+

> In the cell below we install pathway into a Python 3.8+ Linux runtime.
> Please:
> 1. **Paste the pip install command** given to you with your beta access **into the form below**.
> 2. **Run the colab notebook (Ctrl+F9)**, disregarding the 'not authored by Google' warning. **The installation and loading time is less than 1 minute**.


In [None]:
#@title ⚙️ Pathway installer. Please provide the pip install link for Pathway:
# Please copy here the installation line:
PATHWAY_INSTALL_LINE='pip install --extra-index-url https://packages.pathway.com/... pathway' #@param {type:"string"}

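# str.replace returns a new string, so the result must be assigned back;
# the check below expects the line to start with the package index URL.
PATHWAY_INSTALL_LINE = PATHWAY_INSTALL_LINE.replace('pip install --extra-index-url ', '')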

class InterruptExecution(Exception):
    def _render_traceback_(self):
        pass

if '...' in PATHWAY_INSTALL_LINE or not PATHWAY_INSTALL_LINE.startswith('https://packages.pathway.com/'):
    print(
        "⛔ Please register at https://pathway.com/developers/documentation/introduction/installation-and-first-steps\n"
        "to Copy & Paste the Linux pip install line for Pathway!"
    )
    raise InterruptExecution

DO_INSTALL = False
import sys
if sys.version_info >= (3, 8):
    print(f'✅ Python {sys.version} is active.')
    try:
        import pathway as pw
        print('✅ Pathway successfully imported.')
    except ImportError:
        DO_INSTALL = True
else:
    print("⛔ Pathway requires Python 3.8 or higher.")
    raise InterruptExecution

if DO_INSTALL:
    !ls $(dirname $(which python))/../lib/python*/*-packages/pathway 1>/dev/null 2>/dev/null || echo "⌛ Installing Pathway. This usually takes a few seconds..."
    !ls $(dirname $(which python))/../lib/python*/*-packages/pathway 1>/dev/null 2>/dev/null || pip install --extra-index-url {PATHWAY_INSTALL_LINE} 1>/dev/null 2>/dev/null
    !ls $(dirname $(which python))/../lib/python*/*-packages/pathway 1>/dev/null 2>/dev/null || echo "⛔ Installation failed. Don't be shy to reach out to the community at https://pathway.com !"
    !ls $(dirname $(which python))/../lib/python*/*-packages/pathway 1>/dev/null 2>/dev/null && echo "✅ All installed. Enjoy Pathway!"


# Detecting suspicious user activity with Tumbling Window group-by

Our task is to detect suspicious login activity: IP addresses that produce many failed login attempts within a short period of time.
The main ingredient used is grouping over a tumbling window.
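To make the idea concrete: a tumbling window splits the timeline into fixed-size, non-overlapping intervals, so every event falls into exactly one window. The following is a minimal plain-Python sketch of that bucketing, not Pathway code. The helper name `window_start`, the 60-second width, and the sample timestamps are assumptions made for illustration.

```python
def window_start(timestamp: int, width: int = 60) -> int:
    """Return the start of the tumbling window that contains `timestamp`."""
    # Subtracting the remainder floors the timestamp to a multiple of `width`.
    return timestamp - timestamp % width


# Hypothetical Unix timestamps (in seconds), not taken from the tutorial data.
events = [1545730200, 1545730215, 1545730259, 1545730260]
windows = [window_start(t) for t in events]
# The first three events share one window; the last one opens the next window.
```

Grouping events by the value returned by `window_start` is then an ordinary group-by, which is the approach the rest of this tutorial follows with one-minute windows.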

We have an input data table with the following columns:
* `username`,
* whether the login was `successful`,
* `time` of a login attempt,
* `ip_address` of a login.


First we ingest the data.

In [1]:
# Uncomment to download the required files.
# %%capture --no-display
# !wget https://public-pathway-releases.s3.eu-central-1.amazonaws.com/data/suspicious_users_tutorial_logins.csv -O logins.csv

In [2]:
from datetime import datetime

import pathway as pw

logins = pw.csv.read(
    "logins.csv", value_columns=["username", "successful", "time", "ip_address"]
)

In [3]:
logins = logins.select(
    *pw.this.without(pw.this.successful), successful=(pw.this.successful == "True")
)

In [4]:
logins = logins.select(
    *pw.this.without(pw.this.successful),
    successful=pw.cast(bool, pw.this.successful),
)

We then filter attempts and keep only the unsuccessful ones.

In [5]:
processed = logins.filter(~pw.this.successful)

We now group the remaining attempts by login `time` and `ip_address`, ignoring the seconds in `time` so that all attempts made within the same minute fall into the same group.

In [6]:
by_minutes = processed.select(
    pw.this.ip_address,
    time=pw.apply(
        lambda timestamp_str: (datetime.fromtimestamp(int(timestamp_str)).isoformat())[
            :-2
        ]
        + "00",
        pw.this.time,
    ),
)
grouped_by_minutes = by_minutes.groupby(pw.this.time, pw.this.ip_address)
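The lambda above drops the seconds by slicing the ISO string. As a sketch of an equivalent alternative, the helper below zeroes the seconds on the `datetime` object instead, which may be easier to read. The name `truncate_to_minute` is hypothetical. Like the lambda, it interprets the timestamp in the local time zone of the machine running the notebook.

```python
from datetime import datetime


def truncate_to_minute(timestamp_str: str) -> str:
    """Convert a Unix timestamp string to an ISO string with seconds set to 00."""
    # fromtimestamp uses the local time zone, matching the lambda in the tutorial.
    moment = datetime.fromtimestamp(int(timestamp_str))
    return moment.replace(second=0, microsecond=0).isoformat()
```

Under these assumptions it could be passed to `pw.apply` in place of the lambda, for example `time=pw.apply(truncate_to_minute, pw.this.time)`.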

The next step is to count the logins...

In [7]:
logins_counted = grouped_by_minutes.reduce(
    by_minutes.time, by_minutes.ip_address, count=pw.reducers.count(by_minutes.id)
)

...and to keep only the incidents where the number of failed logins reached the threshold (at least 5 within one minute).

In [8]:
suspicious_logins = logins_counted.filter(pw.this.count >= 5)
pw.debug.compute_and_print(suspicious_logins)

            | time                | ip_address    | count
^DKEYHS4... | 2018-12-25T10:30:00 | 50.37.169.241 | 7
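As a sanity check of the logic (filter failures, bucket by minute and IP address, count, apply the threshold), the same computation can be sketched in plain Python on a few hand-made records. The records and the helper name `find_suspicious` are made up for illustration and do not come from `logins.csv`. This is not how Pathway evaluates the pipeline, only a way to reason about the expected result.

```python
from collections import Counter


def find_suspicious(records, threshold=5, width=60):
    """Count failed logins per (window start, ip) and keep counts >= threshold."""
    counts = Counter(
        (time - time % width, ip)  # floor the time to the start of its window
        for (ip, time, successful) in records
        if not successful  # keep only the failed attempts
    )
    return {key: n for key, n in counts.items() if n >= threshold}


# Hypothetical records: (ip_address, unix_time, successful)
records = [("10.0.0.1", 1545730200 + i, False) for i in range(5)]
records += [("10.0.0.2", 1545730200 + i, False) for i in range(4)]
records += [("10.0.0.1", 1545730290, True)]
suspicious = find_suspicious(records)
```

Here only `10.0.0.1` reaches five failed attempts inside a single one-minute window, so it is the only entry kept in `suspicious`.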
