<a href="https://colab.research.google.com/github/pathwaycom/pathway-examples/blob/main/documentation/survival_guide.ipynb" target="_parent"><img src="https://pathway.com/assets/colab-badge.svg" alt="Run In Colab" class="inline"/></a>

# [Colab-specific instructions] Installing Python 3.8+ and Pathway

> Pathway requires Python >=3.8 and works best with Python 3.10.
>
> In the cell below we install pathway into any Python 3.8+ runtime!
> Please:
> 1. **IF running under Google Colab, do "File" -> "Save a copy in Drive"**, before running any cell.
> 2. **Insert in the form below the pip install link** given to you with your beta access.
> 3. **Run the colab notebook (Ctrl+F9)**, disregarding the 'not authored by Google' warning. **The installation and loading time is about 2 minutes**.


In [None]:
#@title ⚙️ Pathway installer. Please provide the pip install link for Pathway:
# Please copy here the installation line:
PATHWAY_INSTALL_LINE='' #@param {type:"string"}

if PATHWAY_INSTALL_LINE.startswith('pip install '):
    PATHWAY_INSTALL_LINE=PATHWAY_INSTALL_LINE[len('pip install '):]

class InterruptExecution(Exception):
    def _render_traceback_(self):
        pass

if '...' in PATHWAY_INSTALL_LINE or not PATHWAY_INSTALL_LINE.startswith('https://'):
    print(
        "⛔ Please register at https://pathway.com/developers/documentation/introduction/installation-and-first-steps\n"
        "to Copy & Paste the Linux pip install line for Pathway!"
    )
    raise InterruptExecution

DO_INSTALL = False
import sys
if sys.version_info >= (3, 8):
    print(f'✅ Python {sys.version} is active.')
    try:
        import pathway as pw
        print('✅ Pathway successfully imported.')
    except ImportError:
        DO_INSTALL = True
else:
    print("⛔ Pathway requires Python 3.8 or higher.")
    raise InterruptExecution

if DO_INSTALL:
    !ls $(dirname $(which python))/../lib/python*/*-packages/pathway 1>/dev/null 2>/dev/null || echo "⌛ Installing Pathway. This usually takes 2-3 minutes..."
    !ls $(dirname $(which python))/../lib/python*/*-packages/pathway 1>/dev/null 2>/dev/null || pip install {PATHWAY_INSTALL_LINE} 1>/dev/null 2>/dev/null
    !ls $(dirname $(which python))/../lib/python*/*-packages/pathway 1>/dev/null 2>/dev/null || echo "⛔ Installation failed. Don't be shy to reach out to the community at https://pathway.com !"
    !ls $(dirname $(which python))/../lib/python*/*-packages/pathway 1>/dev/null 2>/dev/null && echo "✅ All installed. Enjoy Pathway!"


# Pathway: a survival guide
A must-read for first-timers and veterans alike, this guide gathers the most commonly used basic elements of Pathway.


While the Pathway programming framework comes with advanced functionalities such as [classifiers](https://pathway.com/developers/showcases/lsh/lsh_chapter1) or [fuzzy-joins](https://pathway.com/developers/showcases/fuzzy_join/fuzzy_join_chapter1), it is essential to master the basic operations at the core of the framework.
As part of this survival guide, we are going to walk through the following topics:
* [Select and notations](#select-and-notations)
* [Manipulating the table](#manipulating-the-table)
* [Working with multiples tables: union, concatenation, join](#working-with-multiple-tables-union-concatenation-join)
* [Updating](#updating)
* [Computing](#operations)

If you want more information, you can see our complete [API docs](https://pathway.com/developers/documentation/api-docs/pathway) or some of our [tutorials](https://pathway.com/developers/tutorials/suspicious_activity_tumbling_window).

## Prerequisite

Be sure to import Pathway; we also need some tables:

In [1]:
import pathway as pw

t_name = pw.debug.table_from_markdown(
    """
    | name
 1  | Alice
 2  | Bob
 3  | Carole
 """
)
t_age = pw.debug.table_from_markdown(
    """
    | age
 1  | 25
 2  | 32
 3  | 28
 """
)
t_david = pw.debug.table_from_markdown(
    """
    | name  | age
 4  | David | 25
 """
)

We can display a snapshot of our table (for debugging purposes) using `pw.debug.compute_and_print()`:

In [2]:
pw.debug.compute_and_print(t_name)

            | name
^2TMTFGY... | Alice
^YHZBTNY... | Bob
^SERVYWW... | Carole


In the following, we omit this call for clarity, but keep in mind that it is required to print the actual data at a given time.

## Select and notations

 The main way to manipulate a table in Pathway is by using the `select` operation.

 * **The dot notation**: we can use `select` to select a particular column, using the dot notation to specify its name. For example, we can access the column "name" of our `t_david` table:

In [3]:
t_david.select(t_david.name)

            | name
^8GR6BSX... | David


 * **The bracket notation**: we can also access a column by its name given as a string, using the bracket notation. The previous example is equivalent to `t_david.select(t_david["name"])`.

 * The **this notation**: to refer to the table currently being manipulated, we can use `pw.this`. Our example becomes `t_david.select(pw.this.name)`.
This notation works for all standard transformers.
    > It can be used to refer to a table even if it has not been given a name, for example in successive operations:

In [4]:
t_new_age = t_david.select(new_age=pw.this.age).select(
    new_age_plus_7=pw.this.new_age + 7
)
pw.debug.compute_and_print(t_new_age)

            | new_age_plus_7
^8GR6BSX... | 32


In this example, it would be impossible to refer to the table obtained after the first select (with the column `new_age`) without using `pw.this`, as `t_david` still refers to the initial, unmodified table.
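To make the scoping of `pw.this` concrete, here is a plain-Python model of the two chained `select` calls. It is a sketch, not Pathway's API: a table is assumed to be a dict mapping row ids to rows, and the helper `select` and the lambda parameter `this` are hypothetical stand-ins for `Table.select` and `pw.this`.

```python
# Plain-Python model, not Pathway's API: a table is a dict {row_id: {column: value}}.
from typing import Any, Callable, Dict

Row = Dict[str, Any]
Table = Dict[int, Row]


def select(table: Table, **columns: Callable[[Row], Any]) -> Table:
    """Hypothetical stand-in for Table.select: keep the ids and compute each
    output column from the current row (the role played by pw.this)."""
    return {
        row_id: {name: compute(row) for name, compute in columns.items()}
        for row_id, row in table.items()
    }


t_david: Table = {4: {"name": "David", "age": 25}}

# Mirrors t_david.select(new_age=pw.this.age).select(new_age_plus_7=pw.this.new_age + 7):
# `this` in the second select is a row of the intermediate table, which only has `new_age`.
t_new_age = select(
    select(t_david, new_age=lambda this: this["age"]),
    new_age_plus_7=lambda this: this["new_age"] + 7,
)
print(t_new_age)  # {4: {'new_age_plus_7': 32}}
```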

 * **left and right notations**: similarly to the this notation, `pw.left` and `pw.right` can be used to manipulate the different tables used in a [join](#working-with-multiples-tables-union-concatenation-join).
    > `left_table.join(right_table, pw.left.C1==pw.right.C2).select(pw.left.C3, pw.right.C4)`
    
For more information about the join and the use of `pw.left` and `pw.right`, you can see the dedicated [section](#working-with-multiples-tables-union-concatenation-join) and [manual](join-manual).

 * The **star * notation**: this notation is used to select all the columns of the manipulated table. `table.select(*pw.this)` will return the full table.
It can be combined with `.without` to remove the unwanted columns:

    > In our example, instead of selecting the "name" column, we want to select all the columns except the "age" one. This is obtained as follows:

In [5]:
t_david.select(*pw.this.without(pw.this.age))

            | name
^8GR6BSX... | David


## Manipulating the table

In addition to `select`, Pathway provides more operators to manipulate and index the tables.

 * **Filtering**: we can use `filter` to keep only the rows satisfying a given condition.

In [6]:
t_age.filter(pw.this.age>30)

            | age
^YHZBTNY... | 32
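The id in the output above is the same as the id of that row in `t_age`: filtering drops rows but does not reindex the remaining ones. The following plain-Python model illustrates this behavior; it is not Pathway's API, and both the dict-of-rows table and the `filter_rows` helper are assumptions of the sketch.

```python
# Plain-Python model, not Pathway's API: a table is a dict {row_id: {column: value}}.
from typing import Any, Callable, Dict

Row = Dict[str, Any]
Table = Dict[int, Row]


def filter_rows(table: Table, predicate: Callable[[Row], bool]) -> Table:
    """Hypothetical stand-in for Table.filter: keep the rows satisfying the
    predicate; surviving rows keep their original ids."""
    return {row_id: row for row_id, row in table.items() if predicate(row)}


t_age: Table = {1: {"age": 25}, 2: {"age": 32}, 3: {"age": 28}}
over_30 = filter_rows(t_age, lambda this: this["age"] > 30)
print(over_30)  # {2: {'age': 32}}
```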


 * **Reindexing**: you can change the ids (accessible by `table.id`) by using `.with_id_from()`.
We need a table with new ids:

In [7]:
t_new_ids = pw.debug.table_from_markdown(
    """
    | new_id_source
 1  | 4
 2  | 5
 3  | 6
 """
)

In [8]:
t_name.unsafe_promise_universe_is_subset_of(t_new_ids).with_id_from(t_new_ids.new_id_source)

            | name
^8GR6BSX... | Alice
^76QPWK3... | Bob
^C4S6S48... | Carole


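Here we need to use `unsafe_promise_universe_is_subset_of` because the new ids are taken from a column of another table, `t_new_ids`: it tells Pathway, without checking, that every id of `t_name` also exists in `t_new_ids`. You can find a detailed explanation in our [article](https://pathway.com/developers/documentation/introduction/key-concepts) about Pathway's concepts.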
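Judging from the outputs, the new id is derived deterministically from the value: Alice, reindexed from the value 4, gets the id `^8GR6BSX...` that `t_david` received from its key 4. The plain-Python sketch below models the re-keying; it is not Pathway's implementation, and the hypothetical `with_id_from` helper uses the raw value as the id instead of deriving one from it. Its lookup into `id_source` only succeeds if every id of `table` exists there, which is what the promise above asserts.

```python
# Plain-Python model, not Pathway's API: a table is a dict {row_id: {column: value}}.
from typing import Any, Dict

Row = Dict[str, Any]
Table = Dict[Any, Row]


def with_id_from(table: Table, id_source: Table, column: str) -> Table:
    """Hypothetical stand-in for Table.with_id_from: re-key each row using the
    value of `column` in the row of `id_source` that has the same id."""
    reindexed: Table = {}
    for row_id, row in table.items():
        # A KeyError here means `table` has an id that `id_source` lacks,
        # i.e. the "universe is a subset" promise would be false.
        new_id = id_source[row_id][column]
        # This sketch rejects collisions rather than silently overwriting a row.
        if new_id in reindexed:
            raise ValueError(f"two rows would get the same id: {new_id!r}")
        reindexed[new_id] = row
    return reindexed


t_name: Table = {1: {"name": "Alice"}, 2: {"name": "Bob"}, 3: {"name": "Carole"}}
t_new_ids: Table = {1: {"new_id_source": 4}, 2: {"new_id_source": 5}, 3: {"new_id_source": 6}}
print(with_id_from(t_name, t_new_ids, "new_id_source"))
# {4: {'name': 'Alice'}, 5: {'name': 'Bob'}, 6: {'name': 'Carole'}}
```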

* **ix**: uses a column's values as indexes into another table.
For example, if we have a table containing indexes pointing to rows of another table, we can use `ix_ref` to retrieve those rows:

In [9]:
t_selected_ids = pw.debug.table_from_markdown(
    """
      | selected_id
 100  | 1
 200  | 3
 """
)
t_selected_ids.select(selected=t_name.ix_ref(t_selected_ids.selected_id).name)

            | selected
^M1T2QKJ... | Alice
^9WGHV46... | Carole
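The ids in the output above differ from those of `t_name`: each result row belongs to the row of `t_selected_ids` it was computed from. The plain-Python model below shows this lookup; it is not Pathway's API, and the `ix` helper and dict-of-rows tables are assumptions of the sketch (in this model, a pointer to a missing row raises `KeyError`).

```python
# Plain-Python model, not Pathway's API: a table is a dict {row_id: {column: value}}.
from typing import Any, Dict

Row = Dict[str, Any]
Table = Dict[int, Row]


def ix(target: Table, pointer: int) -> Row:
    """Hypothetical stand-in for ix/ix_ref: fetch the row of `target` whose id
    is `pointer`. In this model a missing id raises KeyError."""
    return target[pointer]


t_name: Table = {1: {"name": "Alice"}, 2: {"name": "Bob"}, 3: {"name": "Carole"}}
t_selected_ids: Table = {100: {"selected_id": 1}, 200: {"selected_id": 3}}

# The result keeps the ids of t_selected_ids (100 and 200), not those of t_name.
selected = {
    row_id: {"selected": ix(t_name, row["selected_id"])["name"]}
    for row_id, row in t_selected_ids.items()
}
print(selected)  # {100: {'selected': 'Alice'}, 200: {'selected': 'Carole'}}
```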


* **Group-by**: we can use `groupby` to aggregate data sharing a common property and then use a reducer to compute an aggregated value.

In [10]:
t_spending = pw.debug.table_from_markdown(
    """
    | name  | amount
 1  | Bob   | 100
 2  | Alice | 50
 3  | Alice | 125
 4  | Bob   | 200
 """
)
t_spending.groupby(pw.this.name).reduce(pw.this.name, sum=pw.reducers.sum(pw.this.amount))

            | name  | sum
^TSP7EFT... | Alice | 175
^4PVZ777... | Bob   | 300


You can group by multiple columns at once (e.g. `.groupby(t.colA, t.colB)`).
The list of all the available reducers can be found [here](https://pathway.com/developers/documentation/api-docs/reducers).

If you want to find out more about the `groupby` and `reduce` operations, you can read our [article](groupby-reduce-manual) about it.
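Conceptually, `groupby` followed by `reduce` buckets rows by their key and folds each bucket into a single value. The plain-Python sketch below reproduces the `sum` example; it is not Pathway's API, and `groupby_sum` is a hypothetical helper. It takes a tuple of column names as the key, so grouping by several columns works the same way.

```python
# Plain-Python model, not Pathway's API: a table is a dict {row_id: {column: value}}.
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Table = Dict[int, Row]


def groupby_sum(
    table: Table, keys: Tuple[str, ...], value: str
) -> Dict[Tuple[Any, ...], Any]:
    """Bucket rows by the values of `keys`, then sum `value` within each bucket."""
    buckets: Dict[Tuple[Any, ...], List[Any]] = defaultdict(list)
    for row in table.values():
        buckets[tuple(row[k] for k in keys)].append(row[value])
    return {group: sum(values) for group, values in buckets.items()}


t_spending: Table = {
    1: {"name": "Bob", "amount": 100},
    2: {"name": "Alice", "amount": 50},
    3: {"name": "Alice", "amount": 125},
    4: {"name": "Bob", "amount": 200},
}
totals = groupby_sum(t_spending, ("name",), "amount")
print(totals)  # {('Bob',): 300, ('Alice',): 175}
```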

## Working with multiples tables: union, concatenation, join

 * **Union**: we can use the operator `+` or `+=` to compute the union of two tables sharing the same ids, i.e., a table containing the columns of both:

In [11]:
t_age = t_age.unsafe_promise_same_universe_as(t_name)
t_union = t_name + t_age
t_union

            | name   | age
^2TMTFGY... | Alice  | 25
^YHZBTNY... | Bob    | 32
^SERVYWW... | Carole | 28
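The column-wise union can be modeled in plain Python as follows. This is a sketch, not Pathway's API: `add_tables` is a hypothetical helper, it assumes the two tables have no column name in common, and its id check plays the role of `unsafe_promise_same_universe_as`.

```python
# Plain-Python model, not Pathway's API: a table is a dict {row_id: {column: value}}.
from typing import Any, Dict

Row = Dict[str, Any]
Table = Dict[int, Row]


def add_tables(left: Table, right: Table) -> Table:
    """Hypothetical stand-in for `left + right`: merge the columns of rows that
    share an id. Assumes the two tables have distinct column names."""
    if left.keys() != right.keys():
        raise ValueError("both tables must have exactly the same ids")
    return {row_id: {**left[row_id], **right[row_id]} for row_id in left}


t_name: Table = {1: {"name": "Alice"}, 2: {"name": "Bob"}, 3: {"name": "Carole"}}
t_age: Table = {1: {"age": 25}, 2: {"age": 32}, 3: {"age": 28}}
t_union = add_tables(t_name, t_age)
print(t_union[2])  # {'name': 'Bob', 'age': 32}
```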


* **Concatenation**: we can use `Table.concat(t1, t2)` to concatenate two tables, but they need to have the same columns.

In [12]:
pw.Table.concat(t_union,t_david)

            | name   | age
^531BJZ8... | Alice  | 25
^9SVRC47... | Bob    | 32
^R5XMQ21... | Carole | 28
^C4VQQCA... | David  | 25


As you can see, Pathway may reindex the obtained tables.
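Unlike the union, concatenation stacks rows. In the plain-Python model below (not Pathway's API; `concat` is a hypothetical helper), the rows of all tables are copied under fresh ids, mirroring the reindexing visible in the output above, and tables with different column sets are rejected.

```python
# Plain-Python model, not Pathway's API: a table is a dict {row_id: {column: value}}.
from itertools import count
from typing import Any, Dict, Optional, Set

Row = Dict[str, Any]
Table = Dict[int, Row]


def concat(*tables: Table) -> Table:
    """Hypothetical stand-in for Table.concat: stack the rows of tables with
    identical columns, giving every row a fresh id."""
    fresh_ids = count(1)
    expected_columns: Optional[Set[str]] = None
    result: Table = {}
    for table in tables:
        for row in table.values():
            columns = set(row)
            if expected_columns is None:
                expected_columns = columns
            elif columns != expected_columns:
                raise ValueError(
                    f"column mismatch: {sorted(columns)} vs {sorted(expected_columns)}"
                )
            result[next(fresh_ids)] = dict(row)
    return result


t_union: Table = {
    1: {"name": "Alice", "age": 25},
    2: {"name": "Bob", "age": 32},
    3: {"name": "Carole", "age": 28},
}
t_david: Table = {4: {"name": "David", "age": 25}}
t_all = concat(t_union, t_david)
print([row["name"] for row in t_all.values()])  # ['Alice', 'Bob', 'Carole', 'David']
```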

> **Info for Databricks Delta users**: Concatenation is highly similar to the SQL [`MERGE INTO`](https://docs.databricks.com/sql/language-manual/delta-merge-into.html) operation.

* **Join**: we can do all usual types of joins in Pathway (inner, outer, left, right). The example below presents an inner join:

In [13]:
t_age.join(t_name, t_age.id==t_name.id).select(t_age.age, t_name.name)

            | age | name
^VJ3K9DF... | 25  | Alice
^R0GE4WM... | 28  | Carole
^V1RPZW8... | 32  | Bob


Note that in the equality `t_age.id==t_name.id`, the left part must be a column of the table on which the join is called, namely `t_age` in our example. Doing `t_name.id==t_age.id` would throw an error.

For better readability, the `pw.left` and `pw.right` notations should be used:

In [14]:
t_age.join(t_name, pw.left.id == pw.right.id).select(pw.left.age, pw.right.name)

            | age | name
^VJ3K9DF... | 25  | Alice
^R0GE4WM... | 28  | Carole
^V1RPZW8... | 32  | Bob


If you want more info about joins, we have an entire [manual](join-manual) about it!
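For intuition, an inner join on ids can be modeled in plain Python as a hash join. This is a sketch, not how Pathway implements joins: `inner_join` is a hypothetical helper whose key functions receive a row id and a row, and the `left_row`/`right_row` names play the roles of `pw.left` and `pw.right`.

```python
# Plain-Python model, not Pathway's API: a table is a dict {row_id: {column: value}}.
from collections import defaultdict
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
Table = Dict[int, Row]
KeyFn = Callable[[int, Row], Any]


def inner_join(
    left: Table, right: Table, left_key: KeyFn, right_key: KeyFn
) -> List[Tuple[Row, Row]]:
    """Return every (left_row, right_row) pair whose join keys are equal."""
    # Build phase: index the right table by its join key.
    index: Dict[Any, List[Row]] = defaultdict(list)
    for row_id, row in right.items():
        index[right_key(row_id, row)].append(row)
    # Probe phase: look up each left row's key in the index.
    return [
        (left_row, right_row)
        for row_id, left_row in left.items()
        for right_row in index.get(left_key(row_id, left_row), [])
    ]


t_age: Table = {1: {"age": 25}, 2: {"age": 32}, 3: {"age": 28}}
t_name: Table = {1: {"name": "Alice"}, 2: {"name": "Bob"}, 3: {"name": "Carole"}}

# Mirrors t_age.join(t_name, pw.left.id == pw.right.id).select(pw.left.age, pw.right.name)
pairs = inner_join(t_age, t_name, lambda i, _: i, lambda i, _: i)
result = [{"age": left_row["age"], "name": right_row["name"]} for left_row, right_row in pairs]
print(result)
# [{'age': 25, 'name': 'Alice'}, {'age': 32, 'name': 'Bob'}, {'age': 28, 'name': 'Carole'}]
```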

## Updating

* **Adding a new column with a default value** with `select`:

In [15]:
t_age.select(*pw.this, adult=True)

            | age | adult
^2TMTFGY... | 25  | True
^SERVYWW... | 28  | True
^YHZBTNY... | 32  | True


The value can be a basic operation on the columns:

In [16]:
t_age.select(*pw.this, thirties=pw.this.age>=30)

            | age | thirties
^2TMTFGY... | 25  | False
^SERVYWW... | 28  | False
^YHZBTNY... | 32  | True


* **Renaming** with `select`:

In [17]:
t_name.select(surname=pw.this.name)

            | surname
^2TMTFGY... | Alice
^YHZBTNY... | Bob
^SERVYWW... | Carole


 * **Renaming** with `rename_columns`:

In [18]:
t_name.rename_columns(surname=pw.this.name)

            | surname
^2TMTFGY... | Alice
^YHZBTNY... | Bob
^SERVYWW... | Carole


 * **Updating cells**: you can update the values of cells using `update_cells`, which can also be done with the binary operator `<<`. The ids and column names should be the same.

In [19]:
t_updated_names = pw.debug.table_from_markdown(
    """
    | name
 1  | Alicia
 2  | Bobby
 3  | Caro
 """
)
t_updated_names = t_updated_names.unsafe_promise_same_universe_as(t_name)
t_name.update_cells(t_updated_names)

            | name
^2TMTFGY... | Alicia
^YHZBTNY... | Bobby
^SERVYWW... | Caro
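The semantics of `update_cells` (and `<<`) can be modeled in plain Python as below. It is a sketch, not Pathway's API: `update_cells` here is a hypothetical function that leaves the original table untouched, overwrites the cells present in the update table, and rejects ids or columns that the original table lacks.

```python
# Plain-Python model, not Pathway's API: a table is a dict {row_id: {column: value}}.
from typing import Any, Dict

Row = Dict[str, Any]
Table = Dict[int, Row]


def update_cells(table: Table, updates: Table) -> Table:
    """Return a copy of `table` where the cells present in `updates` are overwritten."""
    unknown_ids = updates.keys() - table.keys()
    if unknown_ids:
        raise KeyError(f"ids missing from the original table: {sorted(unknown_ids)}")
    result = {row_id: dict(row) for row_id, row in table.items()}
    for row_id, changes in updates.items():
        unknown_columns = changes.keys() - result[row_id].keys()
        if unknown_columns:
            raise KeyError(f"unknown columns: {sorted(unknown_columns)}")
        result[row_id].update(changes)
    return result


t_name: Table = {1: {"name": "Alice"}, 2: {"name": "Bob"}, 3: {"name": "Carole"}}
t_updated_names: Table = {1: {"name": "Alicia"}, 2: {"name": "Bobby"}, 3: {"name": "Caro"}}
t_result = update_cells(t_name, t_updated_names)
print(t_result)  # {1: {'name': 'Alicia'}, 2: {'name': 'Bobby'}, 3: {'name': 'Caro'}}
```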


## Operations

* **Row-centered operations** with `apply`: you can apply a function to each value of a column (or more) by using `apply` in a `select`.

In [20]:
t_age.select(thirties=pw.apply(lambda x: x>30, pw.this.age))

            | thirties
^2TMTFGY... | False
^SERVYWW... | False
^YHZBTNY... | True


Operations on multiple values of a single row can easily be done this way:

In [21]:
t_multiples_values = pw.debug.table_from_markdown(
    """
    | valA    | valB
 1  | 1       | 10
 2  | 100     | 1000
 """
)
t_multiples_values.select(sum=pw.apply(lambda x,y: x+y, pw.this.valA, pw.this.valB))

            | sum
^2TMTFGY... | 11
^YHZBTNY... | 1100
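`pw.apply` calls an ordinary Python function once per row with the values of the given columns. The plain-Python sketch below models that behavior on the two-column sum example; it is not Pathway's API, and `apply` here is a hypothetical helper that returns a per-row function.

```python
# Plain-Python model, not Pathway's API: a table is a dict {row_id: {column: value}}.
from typing import Any, Callable, Dict

Row = Dict[str, Any]
Table = Dict[int, Row]


def apply(fn: Callable[..., Any], *columns: str) -> Callable[[Row], Any]:
    """Build a per-row function that calls `fn` with the values of `columns`."""
    return lambda row: fn(*(row[c] for c in columns))


t_multiples_values: Table = {1: {"valA": 1, "valB": 10}, 2: {"valA": 100, "valB": 1000}}
row_sum = apply(lambda x, y: x + y, "valA", "valB")
sums = {row_id: {"sum": row_sum(row)} for row_id, row in t_multiples_values.items()}
print(sums)  # {1: {'sum': 11}, 2: {'sum': 1100}}
```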


* Other operations with **transformer classes**: Pathway enables complex computations on data streams through transformer classes.
It is a bit advanced for this survival guide, but you can find all the information about transformer classes in [our tutorial](https://pathway.com/developers/documentation/transformer-classes/transformer-intro).