<a href="https://colab.research.google.com/github/pathwaycom/pathway-examples/blob/main/documentation/survival_guide_colab.ipynb" target="_parent"><img src="https://pathway.com/assets/colab-badge.svg" alt="Run In Colab" class="inline"/></a>

# Setting up Python and Pathway

Pathway can be installed into a Python 3.10 environment using pip. Please register at https://pathway.com to get beta access to the package.

Insert the pip package link below:

In [None]:
PIP_PACKAGE_ADDRESS=""

In [None]:
if not PIP_PACKAGE_ADDRESS:
    print(
        "⛔ Please register at https://pathway.com/developers/documentation/introduction/installation-and-first-steps\n"
        "to get the pip package installation link!"
    )

In [None]:
import sys

if sys.version_info[:2] != (3, 10):
    raise Exception("Pathway is only built for Python 3.10 at the moment")

In [None]:
# Install pathway's package
!pip install {PIP_PACKAGE_ADDRESS} 1>/dev/null 2>/dev/null

# Pathway: a survival guide
A must-read for first-timers and veterans alike, this guide gathers the most commonly used basic elements of Pathway.


While the Pathway programming framework comes with advanced features such as [classifiers](https://pathway.com/developers/showcases/lsh/lsh_chapter1) or [fuzzy joins](https://pathway.com/developers/showcases/fuzzy_join/fuzzy_join_chapter1), it is essential to master the basic operations at the core of the framework.
As part of this survival guide, we are going to walk through the following topics:
* [Selecting and indexing](#selecting-and-indexing)
* [Working with multiple tables: union, concatenation, join](#working-with-multiple-tables-union-concatenation-join)
* [Updating](#updating)
* [Computing](#operations)

For more information, see our complete [API docs](https://pathway.com/developers/documentation/api-docs/pathway) or some of our [tutorials](https://pathway.com/developers/tutorials/suspicious_activity_tumbling_window).

## Prerequisites

First, import Pathway and create a few tables:

In [1]:
import pathway as pw

t_name = pw.debug.table_from_markdown(
    """
    | name
 1  | Alice
 2  | Bob
 3  | Carole
 """
)
t_age = pw.debug.table_from_markdown(
    """
    | age
 1  | 25
 2  | 32
 3  | 28
 """
)
t_name_extra = pw.debug.table_from_markdown(
    """
    | name  | age
 4  | David | 25
 """
)

We can display a snapshot of our table (for debugging purposes) using `pw.debug.compute_and_print()`:

In [2]:
pw.debug.compute_and_print(t_name)

            | name
^2TMTFGY... | Alice
^YHZBTNY... | Bob
^SERVYWW... | Carole


Keep in mind that this call is required to print the actual data at a given time, which is why the examples below wrap each result in it.

## Selecting and indexing
 * **Select**: we can use `select` to select a particular column, using the dot notation to specify the column name.

In [3]:
pw.debug.compute_and_print(t_name_extra.select(t_name_extra.name))

            | name
^8GR6BSX... | David

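To make the semantics concrete without depending on Pathway, the following plain-Python sketch models a table as a dict mapping each row id to a dict of column values (an assumption of the sketch, not Pathway's internals). `select_columns` is a hypothetical helper, not part of Pathway's API: like `select`, it keeps only the requested columns and leaves the row ids untouched.

```python
# Plain-Python model (not Pathway's implementation): a table maps each row id
# to a dict holding that row's column values.
Table = dict[int, dict[str, object]]


def select_columns(table: Table, *names: str) -> Table:
    """Keep only the requested columns; row ids are left unchanged."""
    return {
        row_id: {name: row[name] for name in names}
        for row_id, row in table.items()
    }


t_name_extra = {4: {"name": "David", "age": 25}}
print(select_columns(t_name_extra, "name"))  # {4: {'name': 'David'}}
```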

 * **Filtering**: we can use `filter` to keep only the rows that satisfy a given condition.

In [4]:
pw.debug.compute_and_print(t_age.filter(t_age.age > 30))

            | age
^YHZBTNY... | 32

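Filtering can be sketched in plain Python as a dictionary comprehension with a predicate, again modeling a table as a dict from row id to column values (an assumption of this sketch). `filter_rows` is hypothetical; note that the surviving row keeps its original id, just as the row with age 32 keeps the id `^YHZBTNY...` in the output above.

```python
from typing import Callable

# Plain-Python model (not Pathway's implementation): row id -> column values.
Table = dict[int, dict[str, object]]


def filter_rows(table: Table, predicate: Callable[[dict[str, object]], bool]) -> Table:
    """Keep the rows for which `predicate` is true; they keep their ids."""
    return {row_id: row for row_id, row in table.items() if predicate(row)}


t_age = {1: {"age": 25}, 2: {"age": 32}, 3: {"age": 28}}
print(filter_rows(t_age, lambda row: row["age"] > 30))  # {2: {'age': 32}}
```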

 * **Reindexing**: you can change the ids (accessible via `table.id`) by using `.with_id_from()`.
To do so, we need a table containing the new ids:

In [5]:
t_new_ids = pw.debug.table_from_markdown(
    """
    | new_id_source
 1  | 4
 2  | 5
 3  | 6
 """
)

In [6]:
pw.debug.compute_and_print(
    t_name.unsafe_promise_universe_is_subset_of(t_new_ids).with_id_from(
        t_new_ids.new_id_source
    )
)

            | name
^8GR6BSX... | Alice
^76QPWK3... | Bob
^C4S6S48... | Carole


Here we need to use `unsafe_promise_universe_is_subset_of`; you can find the explanation in our [article](https://pathway.com/developers/documentation/introduction/key-concepts) about Pathway's concepts.
XXX: `with_id_from()` works the same, but take the ids of as new ids, as opposed to a dedicated column as in our previous example.

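The reindexing step can be sketched in plain Python, modeling tables as dicts from row id to column values (an assumption of this sketch). The hypothetical `reindex_from` reads, for each row, the new id from the row with the same id in another table, and raises an error when that row is missing: this is the situation `unsafe_promise_universe_is_subset_of` promises Pathway will not happen. As a simplification, the raw value becomes the new id, whereas Pathway derives an opaque id from it; the outputs above show this is deterministic, since Alice, reindexed from the value 4, gets the same id `^8GR6BSX...` as David, whose markdown id is 4.

```python
# Plain-Python model (not Pathway's implementation): row id -> column values.
Table = dict[int, dict[str, object]]


def reindex_from(table: Table, id_source: Table, column: str) -> Table:
    """Give each row of `table` the id stored in `column` of the row with the
    same id in `id_source`. Simplification: the raw value is used as the id."""
    # Each id of `table` must also exist in `id_source`; this is what
    # `unsafe_promise_universe_is_subset_of` promises to Pathway.
    missing = table.keys() - id_source.keys()
    if missing:
        raise KeyError(f"ids missing from id_source: {sorted(missing)}")
    return {id_source[row_id][column]: row for row_id, row in table.items()}


t_name = {1: {"name": "Alice"}, 2: {"name": "Bob"}, 3: {"name": "Carole"}}
t_new_ids = {1: {"new_id_source": 4}, 2: {"new_id_source": 5}, 3: {"new_id_source": 6}}
print(reindex_from(t_name, t_new_ids, "new_id_source"))
# {4: {'name': 'Alice'}, 5: {'name': 'Bob'}, 6: {'name': 'Carole'}}
```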
* **ix**: uses a column's values as indexes into another table.
As an example, if we have a table containing indexes that point to another table, we can use `ix_ref` to obtain the corresponding rows:

In [7]:
t_selected_ids = pw.debug.table_from_markdown(
    """
      | selected_id
 100  | 1
 200  | 3
 """
)
pw.debug.compute_and_print(
    t_selected_ids.select(selected=t_name.ix_ref(t_selected_ids.selected_id).name)
)

            | selected
^M1T2QKJ... | Alice
^9WGHV46... | Carole

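In plain Python, with tables modeled as dicts from row id to column values (an assumption of this sketch), this lookup boils down to using a column's values as keys into another table. The helper `lookup` and its parameters are hypothetical. As in the Pathway output above, the result keeps the ids of the table holding the references (100 and 200 here), not those of `t_name`.

```python
# Plain-Python model (not Pathway's implementation): row id -> column values.
Table = dict[int, dict[str, object]]


def lookup(
    pointers: Table, key_column: str, target: Table, target_column: str, out: str
) -> Table:
    """For each row of `pointers`, read `target_column` from the row of `target`
    whose id is stored in `key_column`. Unknown keys raise KeyError."""
    return {
        row_id: {out: target[row[key_column]][target_column]}
        for row_id, row in pointers.items()
    }


t_name = {1: {"name": "Alice"}, 2: {"name": "Bob"}, 3: {"name": "Carole"}}
t_selected_ids = {100: {"selected_id": 1}, 200: {"selected_id": 3}}
print(lookup(t_selected_ids, "selected_id", t_name, "name", "selected"))
# {100: {'selected': 'Alice'}, 200: {'selected': 'Carole'}}
```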

* **Group-by**: we can use `groupby` to aggregate data sharing a common property and then use a reducer to compute an aggregated value.

In [8]:
t_spending = pw.debug.table_from_markdown(
    """
    | name  | amount
 1  | Bob   | 100
 2  | Alice | 50
 3  | Alice | 125
 4  | Bob   | 200
 """
)
pw.debug.compute_and_print(
    t_spending.groupby(t_spending.name).reduce(
        t_spending.name, sum=pw.reducers.sum(t_spending.amount)
    )
)

            | name  | sum
^TSP7EFT... | Alice | 175
^4PVZ777... | Bob   | 300


You can group by multiple columns at once (e.g. `.groupby(t.colA, t.colB)`).
A list of all the available reducers is coming soon.

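A group-by followed by a `sum` reducer can be sketched in plain Python with a `defaultdict`, modeling the table as a dict from row id to column values (an assumption of this sketch). `groupby_sum` is a hypothetical helper, not a Pathway function. Its grouping keys are tuples, so passing several column names groups by the combination of their values, like `.groupby(t.colA, t.colB)`.

```python
from collections import defaultdict

# Plain-Python model (not Pathway's implementation): row id -> column values.
Table = dict[int, dict[str, object]]


def groupby_sum(table: Table, keys: tuple[str, ...], value: str) -> dict[tuple, int]:
    """Group rows by the values of the `keys` columns and sum `value` per group."""
    sums: defaultdict[tuple, int] = defaultdict(int)
    for row in table.values():
        group = tuple(row[k] for k in keys)  # one tuple per combination of key values
        sums[group] += row[value]
    return dict(sums)


t_spending = {
    1: {"name": "Bob", "amount": 100},
    2: {"name": "Alice", "amount": 50},
    3: {"name": "Alice", "amount": 125},
    4: {"name": "Bob", "amount": 200},
}
print(groupby_sum(t_spending, ("name",), "amount"))  # {('Bob',): 300, ('Alice',): 175}
```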
## Working with multiple tables: union, concatenation, join

 * **Union**: we can use the operator `+` or `+=` to compute the union of two tables sharing the same ids, i.e. to combine their columns row by row.

In [9]:
t_age = t_age.unsafe_promise_same_universe_as(t_name)
t_union = t_name + t_age
pw.debug.compute_and_print(t_union)

            | name   | age
^2TMTFGY... | Alice  | 25
^YHZBTNY... | Bob    | 32
^SERVYWW... | Carole | 28

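Under a plain-Python model where a table is a dict from row id to column values (an assumption of this sketch), the union merges, row by row, the columns of two tables with identical ids. The hypothetical `union_columns` checks that condition explicitly, whereas in the cell above `unsafe_promise_same_universe_as` simply promises it to Pathway.

```python
# Plain-Python model (not Pathway's implementation): row id -> column values.
Table = dict[int, dict[str, object]]


def union_columns(left: Table, right: Table) -> Table:
    """Merge, row by row, the columns of two tables that have exactly the same ids."""
    if left.keys() != right.keys():
        raise ValueError("both tables must have the same ids")
    # On a shared column name, the value from `right` would win in this sketch.
    return {row_id: {**left[row_id], **right[row_id]} for row_id in left}


t_name = {1: {"name": "Alice"}, 2: {"name": "Bob"}, 3: {"name": "Carole"}}
t_age = {1: {"age": 25}, 2: {"age": 32}, 3: {"age": 28}}
print(union_columns(t_name, t_age)[2])  # {'name': 'Bob', 'age': 32}
```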

* **Concatenation**: we can use `Table.concat(t1, t2)` to concatenate two tables, but they need to have the same columns.

In [10]:
pw.debug.compute_and_print(pw.Table.concat(t_union, t_name_extra))

            | name   | age
^531BJZ8... | Alice  | 25
^9SVRC47... | Bob    | 32
^R5XMQ21... | Carole | 28
^C4VQQCA... | David  | 25


As you can see, Pathway may reindex the obtained tables.

> **Info for Databricks Delta users**: Concatenation is highly similar to the SQL [`MERGE INTO`](https://docs.databricks.com/sql/language-manual/delta-merge-into.html) operation.

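Concatenation stacks rows rather than columns. In this plain-Python sketch, tables are modeled as dicts from row id to column values and `concat_tables` is hypothetical: all tables must share the same columns, and the result gets fresh ids `0, 1, 2, …` to mirror the reindexing visible above. Pathway computes its own ids instead; the numbering here is only a simplification.

```python
# Plain-Python model (not Pathway's implementation): row id -> column values.
Table = dict[int, dict[str, object]]


def concat_tables(*tables: Table) -> Table:
    """Stack the rows of tables sharing the same columns, renumbering ids from 0."""
    column_sets = {frozenset(row) for table in tables for row in table.values()}
    if len(column_sets) > 1:
        raise ValueError("all tables must have the same columns")
    rows = [row for table in tables for row in table.values()]
    return dict(enumerate(rows))


t_union = {
    1: {"name": "Alice", "age": 25},
    2: {"name": "Bob", "age": 32},
    3: {"name": "Carole", "age": 28},
}
t_name_extra = {4: {"name": "David", "age": 25}}
print(concat_tables(t_union, t_name_extra)[3])  # {'name': 'David', 'age': 25}
```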
* **Join**: we can do all usual types of joins in Pathway (inner, outer, left, right). The example below presents an inner join:

In [11]:
pw.debug.compute_and_print(
    t_age.join(t_name, t_age.id == t_name.id).select(t_age.age, t_name.name)
)

            | age | name
^VJ3K9DF... | 25  | Alice
^R0GE4WM... | 28  | Carole
^V1RPZW8... | 32  | Bob


Note that in the equality `t_age.id==t_name.id`, the left-hand side must be a column of the table on which the join is called, namely `t_age` in our example. Writing `t_name.id==t_age.id` would throw an error.

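An inner join can be sketched in plain Python as a hash join: index one table by its join key, then probe that index with every row of the other. Tables are modeled as dicts from row id to column values, and `inner_join` with its key-function parameters is hypothetical, not Pathway's API. Joining on row ids, as with `t_age.id == t_name.id`, means both key functions return the row id; `t_age` plays the left table, matching the table on which `join` is called.

```python
from typing import Callable

# Plain-Python model (not Pathway's implementation): row id -> column values.
Row = dict[str, object]
Table = dict[int, Row]
KeyFn = Callable[[int, Row], object]  # (row id, row) -> join key


def inner_join(left: Table, right: Table, left_key: KeyFn, right_key: KeyFn) -> list[Row]:
    """Hash join: index `right` by its key, then probe the index with each row of `left`."""
    index: dict[object, list[Row]] = {}
    for row_id, row in right.items():
        index.setdefault(right_key(row_id, row), []).append(row)
    # Rows of `left` without a match are dropped, which makes this an inner join.
    return [
        {**row, **match}
        for row_id, row in left.items()
        for match in index.get(left_key(row_id, row), [])
    ]


def by_id(row_id: int, row: Row) -> int:
    return row_id


t_age = {1: {"age": 25}, 2: {"age": 32}, 3: {"age": 28}}
t_name = {1: {"name": "Alice"}, 2: {"name": "Bob"}, 3: {"name": "Carole"}}
print(inner_join(t_age, t_name, by_id, by_id))
# [{'age': 25, 'name': 'Alice'}, {'age': 32, 'name': 'Bob'}, {'age': 28, 'name': 'Carole'}]
```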
## Updating

* **Renaming** with `select`:

In [12]:
pw.debug.compute_and_print(t_name.select(surname=t_name.name))

            | surname
^2TMTFGY... | Alice
^YHZBTNY... | Bob
^SERVYWW... | Carole


 * **Renaming** with `rename_columns`:

In [13]:
pw.debug.compute_and_print(t_name.rename_columns(surname=t_name.name))

            | surname
^2TMTFGY... | Alice
^YHZBTNY... | Bob
^SERVYWW... | Carole

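Both renaming styles above produce the same table. In a plain-Python model where a table is a dict from row id to column values (an assumption of this sketch), renaming only rewrites the column keys. The hypothetical `rename_columns` below takes `new_name="old_name"` string pairs, unlike Pathway's version, which receives column references such as `t_name.name`.

```python
# Plain-Python model (not Pathway's implementation): row id -> column values.
Table = dict[int, dict[str, object]]


def rename_columns(table: Table, **new_to_old: str) -> Table:
    """Rename columns from `new_name="old_name"` pairs; other columns are kept."""
    old_to_new = {old: new for new, old in new_to_old.items()}
    return {
        row_id: {old_to_new.get(col, col): value for col, value in row.items()}
        for row_id, row in table.items()
    }


t_name = {1: {"name": "Alice"}, 2: {"name": "Bob"}, 3: {"name": "Carole"}}
print(rename_columns(t_name, surname="name")[1])  # {'surname': 'Alice'}
```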

 * **Updating cells**: you can update the values of cells using `update_cells`, which can also be done with the binary operator `<<`. The ids and column names should be the same.

In [14]:
t_updated_names = pw.debug.table_from_markdown(
    """
    | name
 1  | Alicia
 2  | Bobby
 3  | Caro
 """
)
t_updated_names = t_updated_names.unsafe_promise_same_universe_as(t_name)
pw.debug.compute_and_print(t_name.update_cells(t_updated_names))

            | name
^2TMTFGY... | Alicia
^YHZBTNY... | Bobby
^SERVYWW... | Caro

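In a plain-Python model where tables are dicts from row id to column values (an assumption of this sketch), updating cells overwrites values id by id. The hypothetical `update_cells` below enforces the requirement stated above: the update table must have the same ids, and its columns must already exist in the original table.

```python
# Plain-Python model (not Pathway's implementation): row id -> column values.
Table = dict[int, dict[str, object]]


def update_cells(table: Table, updates: Table) -> Table:
    """Return a copy of `table` whose cells are overwritten by those of `updates`."""
    if updates.keys() != table.keys():
        raise KeyError("updates must have the same ids as the table")
    for row_id, new_values in updates.items():
        unknown = new_values.keys() - table[row_id].keys()
        if unknown:
            raise KeyError(f"unknown columns in row {row_id}: {sorted(unknown)}")
    return {row_id: {**row, **updates[row_id]} for row_id, row in table.items()}


t_name = {1: {"name": "Alice"}, 2: {"name": "Bob"}, 3: {"name": "Carole"}}
t_updated_names = {1: {"name": "Alicia"}, 2: {"name": "Bobby"}, 3: {"name": "Caro"}}
print(update_cells(t_name, t_updated_names)[3])  # {'name': 'Caro'}
```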

## Operations

* **Row-centered operations** with `apply`: you can apply a function to each value of one or more columns by using `pw.apply` in a `select`.

In [15]:
pw.debug.compute_and_print(t_age.select(thirties=pw.apply(lambda x: x > 30, t_age.age)))

            | thirties
^2TMTFGY... | False
^SERVYWW... | False
^YHZBTNY... | True


Operations on multiple values of a single row can be done the same way:

In [16]:
t_multiples_values = pw.debug.table_from_markdown(
    """
    | valA    | valB
 1  | 1       | 10
 2  | 100     | 1000
 """
)
pw.debug.compute_and_print(
    t_multiples_values.select(
        sum=pw.apply(
            lambda x, y: x + y, t_multiples_values.valA, t_multiples_values.valB
        )
    )
)

            | sum
^2TMTFGY... | 11
^YHZBTNY... | 1100

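`pw.apply` calls a Python function once per row. The plain-Python sketch below models that behavior on a dict from row id to column values (an assumption of this sketch, not Pathway's implementation). `apply_column` is a hypothetical helper that passes the listed columns as positional arguments, covering both the single-column and the two-column examples above.

```python
from typing import Callable

# Plain-Python model (not Pathway's implementation): row id -> column values.
Table = dict[int, dict[str, object]]


def apply_column(table: Table, out: str, fn: Callable[..., object], *columns: str) -> Table:
    """Call `fn` once per row with the values of `columns` and store the result in `out`."""
    return {
        row_id: {out: fn(*(row[c] for c in columns))}
        for row_id, row in table.items()
    }


t_age = {1: {"age": 25}, 2: {"age": 32}, 3: {"age": 28}}
print(apply_column(t_age, "thirties", lambda x: x > 30, "age"))
# {1: {'thirties': False}, 2: {'thirties': True}, 3: {'thirties': False}}

t_multiples_values = {1: {"valA": 1, "valB": 10}, 2: {"valA": 100, "valB": 1000}}
print(apply_column(t_multiples_values, "sum", lambda x, y: x + y, "valA", "valB"))
# {1: {'sum': 11}, 2: {'sum': 1100}}
```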

* Other operations with **transformer classes**: Pathway enables complex computations on data streams by using transformer classes.
They are a bit advanced for this survival guide, but you can find all the information about transformer classes in [our tutorial](https://pathway.com/developers/documentation/transformer-classes/transformer-intro).